SLURM
Some important commands
sinfo                                   # Show partitions and node states
sinfo -N -l                             # Node-oriented, long-format listing
sinfo --long --partition=big_compute    # Detailed view of one partition
sbatch script.sh                        # Submit a batch script
squeue                                  # Show the job queue
squeue --me                             # Show only your own jobs
scancel <jobid>                         # Cancel a job
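A typical round trip with these commands looks like the following (the job ID here is only an illustration; sbatch prints the real one when the job is accepted):
sbatch script.sh          # prints "Submitted batch job 123456"
squeue --me               # watch the job while it is pending or running
scancel 123456            # cancel it if something looks wrong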
Sample SLURM script
#!/bin/bash
#SBATCH --partition=big_compute # Partition name
#SBATCH --nodes=5 # Number of nodes
#SBATCH --ntasks-per-node=192 # ranks per node <= no. of cores/node
#SBATCH --cpus-per-task=1 # 1 is default
#SBATCH --time=0-01:00:00 # days-hours:min:sec
#SBATCH --job-name=NAME # Job name
#SBATCH -o slurm.%j.out # Slurm output file (%j = job ID)
#SBATCH -e slurm.%j.err # Slurm error file
#SBATCH --mail-user=you@mail.com
#SBATCH --mail-type=ALL # Mail on job BEGIN, END, FAIL
#SBATCH --export=ALL # Export environment variables
cd "$SLURM_SUBMIT_DIR"  # Run from the directory the job was submitted from
# Report job context
echo "Job $SLURM_JOB_ID began"
echo "Running on host $(hostname)"
echo "Time is $(date)"
echo "Directory is $(pwd)"
echo "Using ${SLURM_NTASKS} processors across ${SLURM_JOB_NUM_NODES} nodes"
# Load proper env and modules
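# A minimal sketch; the module names below are only illustrative, use your site's modules:
# module purge
# module load gcc openmpi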
# Executable
EXE=./build/main
INPUT=input.prm
LOG=log.txt
mpirun -np $SLURM_NTASKS $EXE $INPUT > $LOG
echo "Job $SLURM_JOB_ID ended"
echo "Time is $(date)"
In this script, the total number of ranks = nodes * ntasks-per-node (5 * 192 = 960), which SLURM exposes as SLURM_NTASKS. Instead of specifying --nodes and --ntasks-per-node, you can request the total number of ranks directly:
#SBATCH --ntasks=960 # Total number of ranks
and let SLURM decide how many ranks to place on each node.
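A minimal variant of the resource-request lines using this approach (the remaining #SBATCH directives and the body of the sample script stay the same):
#SBATCH --partition=big_compute # Partition name
#SBATCH --ntasks=960 # Total number of ranks; SLURM picks the node layout
#SBATCH --cpus-per-task=1 # 1 is default
#SBATCH --time=0-01:00:00 # days-hours:min:sec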