Great Lakes

Anirudh Yadav 2020-04-15 3 minute read

Transferring files

One option is to use Cyberduck, which works fine, but can sometimes be a bit annoying/slow (and requires Duo authentication). Another option is to use the scp command in the terminal. For example, to transfer a file called test.txt from my desktop to my home directory on Great Lakes:

cd Desktop
scp test.txt gl-xfer:/home/asyadav/test.txt

To transfer an entire directory use the -r option; e.g.

scp -r localdir gl-xfer:/home/asyadav

The command can be reversed to transfer from Great Lakes to my laptop. See the Great Lakes user guide for more info.

A couple of notes on using scp:

  1. The great lakes user guide says that you’ll need to authenticate via Duo to complete the transfer, but I’ve only had to enter my password so far…
  2. I modified my ~/.ssh/config file so that I can use the shorthand gl-xfer rather than typing out the entire host name uniqname@greatlakes-xfer.arc-ts.umich.edu; see this linuxize post.
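For reference, the shorthand comes from a Host entry in ~/.ssh/config along these lines (uniqname is a placeholder for your own username):

```
# ~/.ssh/config
Host gl-xfer
    HostName greatlakes-xfer.arc-ts.umich.edu
    User uniqname
```

With this in place, scp and ssh both accept gl-xfer wherever the full host name would go.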

Batch jobs

The main way of submitting jobs to Great Lakes is via the sbatch command. The command is designed to reject the job at submission time if there are requests or constraints that Slurm cannot fulfill as specified, giving users a chance to modify their job specifications.

Submitting a job

To submit a batch job you first need to create a simple batch job script which tells Slurm the job specifications (e.g. how many nodes, processors, memory, etc.) and the program to execute.

Here is an example batch script that I have used in the past, named calibration_SAMIN_mktclearing.sh. The fields are pretty self-explanatory.

#!/bin/bash
# The interpreter used to execute the script

# "#SBATCH" directives that convey submission options:

#SBATCH --job-name=calib_mktclear
#SBATCH --mail-user=asyadav@umich.edu
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --cpus-per-task=10
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1g
#SBATCH --time=48:10:00
#SBATCH --account=lsa3
#SBATCH --partition=standard
#SBATCH --output=/home/%u/%x-%j.log

# The application(s) to execute along with its input arguments and options:

julia -p 10 calibration_SAMIN_mktclearing.jl
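The --output line above uses Slurm's filename patterns: %u expands to the username, %x to the job name, and %j to the job ID. As an illustrative sketch (with a made-up job ID), the log path for this script would expand like:

```shell
# Hypothetical values: %u = asyadav, %x = calib_mktclear, %j = 123456
u=asyadav; x=calib_mktclear; j=123456
echo "/home/$u/$x-$j.log"
# prints /home/asyadav/calib_mktclear-123456.log
```

This makes it easy to match a log file back to the job that produced it.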

To submit the job, simply navigate to the directory where the batch job script is located and use the sbatch command:

sbatch calibration_SAMIN_mktclearing.sh

Note that you need to specify a Slurm account for the job to run. To view which accounts you can submit to, use the command:

sacctmgr show assoc user=$USER

UPDATE: an email received from HPC support on April 21 says that I should be using lsa1!

Common errors

Often when running a new job you’ll find errors in your code/script, so some iteration is involved. Once the code/script is OK, you may encounter an “out of memory” error. The usual solution is simply to increase the memory allocated to the job via the --mem-per-cpu option. Great Lakes defaults to 1 GB per CPU; if this isn’t sufficient, double it until the job goes through.

Useful job commands

List queued and running jobs

squeue -u$USER

Cancel a queued job or kill a running job

scancel <job_id>

Cancel all jobs

scancel -u$USER

To monitor CPU/memory usage of running jobs you can ssh into the compute node the job is running on. To find the nodes your jobs are running on, use the squeue -u$USER command. Then ssh into the node:

ssh c13n03

Once you’re on the compute node, use the ps command to get an instantaneous snapshot of your CPU/memory usage:

ps -u$USER -o %cpu,rss,args

You can also use the top command here.