The Stanford AI Lab cluster aggregates research compute nodes from various groups within the lab and controls them via a central batch queueing system, which coordinates all jobs running on the cluster. Do not access the nodes directly: the scheduler is exclusively responsible for managing resources such as CPU, memory, and GPU for each job.
Once you have access to the cluster, you can submit, monitor, and cancel jobs from the headnode, sc.stanford.edu. This machine should not be used for any compute-intensive work; however, you can get a shell on a compute node simply by starting an interactive job. You can also monitor your jobs and the status of the cluster (read-only) using the web-based dashboard at https://sc.stanford.edu.
You can use the cluster by starting batch jobs or interactive jobs. Interactive jobs give you access to a shell on one of the nodes, from which you can execute commands by hand, whereas batch jobs run from a given shell script in the background and automatically terminate when finished.
If you encounter any problems using the cluster, please send us a request via http://support.cs.stanford.edu, being as specific as you can when describing the issue.
To gain access to the cluster, please have your advisor or one of your research group leaders submit a request via http://support.cs.stanford.edu stating the following: your CS login ID, the name of the advisor you are working with (please cc them on the form), and the estimated access expiration date.
By default, compute resources are not shared between partitions. In other words, only use the partition(s) and compute resources that belong to your own group. For collaborations, please present us with written approval from your collaborators and their advisors. If we observe deliberate attempts at abuse, the incident will be reported and may have negative consequences.
Access to the headnode, sc.stanford.edu, is only available on the Stanford Network or via the Stanford VPN service. You will need to put the VPN in full tunnel mode, not split tunnel.
If we have any trouble with your job, we will try to get in touch with you, but we reserve the right to kill your jobs at any time. In particular, these machines are intended for compute-intensive jobs only. They must not be used for heavy I/O operations or to crawl or scrape data from the internet. For those tasks, see the Dedicated data-transfer node section below.
- Do NOT run intensive processes on the sc headnode (no vscode, ipython, tensorboard, etc.); they will be killed automatically
- Do NOT run jobs on a partition that does not belong to your group (even for testing)
- Do NOT start parallel downloads or uploads on scdt, especially with tools like gsutil, unless previously arranged and explained
- Always be mindful of other users
If you have questions about the cluster, send us a request at http://support.cs.stanford.edu.
Use of the cluster is coordinated by a batch queue scheduler, which assigns compute nodes to jobs in an order that depends on various factors, such as the time submitted, the number of nodes requested, and the availability of the resources being requested (GPU, memory, etc.).
There are two basic types of jobs on the cluster: interactive and batch.
Interactive jobs give you access to a shell on one of the nodes, from which you can execute commands by hand, whereas batch jobs run from a given shell script in the background and automatically terminate when finished.
Generally speaking, interactive jobs are used for building, prototyping and testing, while batch jobs are used thereafter.
Batch jobs are the preferred way to use the cluster, and are useful when you do not need to interact with the shell to perform the desired task. Two clear advantages are that your job is managed automatically after submission, and that placing your setup commands in a shell script lets you efficiently dispatch multiple similar jobs. To start a simple batch job on a partition (the group you work with; see the bottom of the page), ssh into sc and type the following, replacing mypartition with the name of your partition and myjob.sh with your own script:
sbatch --partition=mypartition myjob.sh
There are many parameters you can define based on your requirements. You can reference a sample submit script at: /sailhome/software/sample-batch.sh.
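For orientation, here is a minimal sketch of what a submit script might look like. It is not a copy of the sample file above: the script name myjob.sh, the job and output names, and every resource value are assumptions chosen for illustration, and mypartition is the same placeholder used elsewhere on this page. The block writes the script to the current directory and checks it for shell syntax errors without submitting anything.

```shell
#!/bin/bash
# Sketch only: writes a hypothetical submit script, myjob.sh, and syntax-checks it.
# All names and resource values below are placeholders, not cluster requirements.
cat > myjob.sh <<'EOF'
#!/bin/bash
# Partition (your group's partition; placeholder name)
#SBATCH --partition=mypartition
# Job name shown in the queue (hypothetical)
#SBATCH --job-name=myjob
# Output file; %j expands to the job ID
#SBATCH --output=myjob-%j.out
# Walltime as days-hours:minutes:seconds
#SBATCH --time=1-00:00:00
# Memory, CPU cores, and one GPU of any type
#SBATCH --mem=8G
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1

echo "Job ${SLURM_JOB_ID:-unknown} running on $(uname -n)"
# Replace the line below with your own workload.
echo "hello from the cluster"
EOF

# Check the generated script for shell syntax errors without running it.
bash -n myjob.sh && echo "myjob.sh looks syntactically valid"

# On sc, submit it with:
#   sbatch myjob.sh
```

Because the #SBATCH lines are ordinary comments to the shell, the same file can be test-run locally with bash before it is submitted.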
For further documentation on submitting batch jobs via Slurm, see the online sbatch documentation via SchedMD.
Our friends at the Stanford Research Computing Center, who run the Sherlock cluster via Slurm, also have a wonderful write-up that largely applies to us too: Sherlock Cluster.
Interactive jobs are useful for compiling and prototyping code intended to run on the cluster, performing one-time tasks, and executing software that requires runtime feedback. To start an interactive job, ssh into sc and type:
srun --partition=mypartition --pty bash
The above will allocate a node in mypartition (replace that name with the name of your partition) and drop you into a bash shell. You can also add other parameters as necessary.
srun --partition=mypartition --nodelist=node1 --gres=gpu:1 --pty bash
The above will allocate node1 in mypartition with 1 GPU and drop you into a bash shell.
If you need X11 forwarding, please make sure you have an X server installed (such as XQuartz) and add --x11 to your srun command:
srun --partition=mypartition --nodelist=node1 --gres=gpu:1 --pty --x11 xclock
You can request a specific type of GPU or specify a GPU memory constraint:
srun --partition=mypartition --gres=gpu:titanx:1 --pty bash
The above will request 1 TitanX GPU from any node in mypartition.
srun --partition=mypartition --gres=gpu:1 --constraint=12G --pty bash
The above will request 1 GPU with 12G VRAM from any node in mypartition.
Of course, this varies from partition to partition depending on hardware configuration. Visit https://sc.stanford.edu and click on "Partition" at the top right to see the types of GPU available in each partition. For constraints, refer to Nvidia's specifications for each card's memory (1080ti = 11G, titan = 12G, etc.).
For further documentation on the srun command, see the online srun documentation via SchedMD.
One tool we have found very useful, and have installed on the sc cluster, is "pestat": https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat.
It gives you an overview of the entire cluster or just a specific partition/node/user, line-by-line.
Status of each node on the cluster:
pestat -G
Status of each node within a partition:
pestat -p mypartition -G
Status of a specific node:
pestat -n mynode -G
List nodes that have a job owned by a specific user:
pestat -u myuser -G
You can also use standard Slurm commands. To view a list of all jobs running on the cluster, type:
squeue
You can view detailed information for a specific job by typing:
scontrol show job jobid
To cancel a job you started, type:
scancel jobid
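The standard Slurm commands for submitting, inspecting, and cancelling a job can be strung together once the job ID is captured. The following is a sketch under assumptions: the helper names submit_job and show_job and the script name myjob.sh are hypothetical, and only standard sbatch, squeue, scontrol, and scancel options are used.

```shell
#!/bin/bash
# Sketch only: small helper functions wrapping standard Slurm commands.
# The function names are hypothetical, not part of Slurm or of this cluster.

# Submit a script and print the numeric job ID.
# usage: submit_job PARTITION SCRIPT
submit_job() {
    # --parsable makes sbatch print "jobid" or "jobid;cluster" instead of a sentence
    jobid=$(sbatch --parsable --partition="$1" "$2") || return 1
    # Strip an optional ";cluster" suffix
    echo "${jobid%%;*}"
}

# Show the queue entry and the detailed record for one job.
# usage: show_job JOBID
show_job() {
    squeue -j "$1"
    scontrol show job "$1"
}

# Example session on sc (myjob.sh is a hypothetical script):
#   id=$(submit_job mypartition myjob.sh)
#   show_job "$id"
#   squeue -u "$USER"     # list only your own jobs
#   scancel "$id"         # cancel the job
```

Capturing the ID at submission time avoids having to search the squeue output for it afterwards.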
There is also a Slurm web dashboard, if you prefer, at https://sc.stanford.edu; note that it is only accessible from within the Stanford network or via the Stanford VPN.
For a good comparison of Torque/PBS and Slurm commands, see https://www.sdsc.edu/~hocks/FG/PBS.slurm.html
There are several storage options for the scail cluster. Replace CSID with your CS username.
Home directory: /sailhome/CSID
All sc cluster nodes mount a common network volume for your home directory. This is a good option for submission scripts, outputs, etc. There is a quota of 20GB for each user.
Scratch Storage via NFS:
Each group has its own network storage mounted via autofs (meaning it mounts only when you reference the path). The amount of space varies from group to group, so ask your group for details or contact us if you are not sure where to store your files.
Dedicated data-transfer node:
To keep resource contention to a minimum, we have a dedicated machine for handling data I/O and access to off-campus resources. If you need to move large amounts of data or run any prolonged I/O operations, please do so on SCDT (scdt.stanford.edu). Because this machine is equipped with higher-bandwidth interfaces and mounts all network storage within the cluster, transfers are also likely to be faster there than anywhere else.
You should be aware of a number of global defaults:
Memory per job = 4GB; users can specify more via --mem
Cores per job = 2; users can specify more via --cpus-per-task
Walltime = varies by partition; check https://sc.stanford.edu/ (Partition). Most are 7 days; users can specify up to 21 days via --time
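As an illustration of overriding these defaults for an interactive job, here is a minimal sketch. The function name interactive is hypothetical, and the values in the usage line (16G, 8 cores, 2 days) are arbitrary examples that must fit within your partition's limits.

```shell
#!/bin/bash
# Sketch only: a hypothetical wrapper that starts an interactive shell
# with explicit memory, core, and walltime requests instead of the defaults.
# usage: interactive PARTITION MEM CPUS WALLTIME
interactive() {
    srun --partition="$1" --mem="$2" --cpus-per-task="$3" --time="$4" --pty bash
}

# Example on sc: 16G of memory, 8 cores, 2 days (days-hours:minutes:seconds)
#   interactive mypartition 16G 8 2-00:00:00
```

The same three options can be given as #SBATCH lines in a batch script.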
Many users wrap their interactive jobs in screen or tmux so they can detach and re-attach later. While this is a workable approach, be aware that if there is any network interruption between the headnode (sc) and the compute nodes (and these do happen occasionally), such jobs will be cancelled automatically by Slurm. Jobs submitted via sbatch, on the other hand, can better withstand these kinds of interruptions.
Virtual Environment for Python:
Almost all users work in some kind of Python virtual environment (virtualenv, anaconda, miniconda, etc.). We install a small number of default Python packages to get things going, but you are responsible for creating your own environment.
At the moment, CUDA 9.0 is the default across the cluster, but each group (partition) can have its own default. Contact us if you think your group is ready for a new version of CUDA, which can be added alongside the existing one (multiple CUDA versions can co-exist). This often requires a GPU driver update, which in turn requires a reboot of all the nodes.
Do not run IPython/Jupyter notebooks on the headnode (they can cause memory spikes). Instead, run them on one of the compute nodes:
srun -p mypartition --pty bash    # add --gres=gpu:1 if you need a GPU
export XDG_RUNTIME_DIR=""    # important
jupyter-notebook --no-browser --port=8880 --ip='0.0.0.0'
Open the resulting URL in your browser to access your notebook.
Extra credit: if you do this often, you can easily convert the above into a script and use sbatch to run it in batch mode.
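A rough sketch of that conversion, not an official script: the block below writes a hypothetical batch script, jupyter.sh, containing the same two commands, and checks its syntax. The file name, the log file name, and the partition name are assumptions for illustration.

```shell
#!/bin/bash
# Sketch only: writes a hypothetical batch script, jupyter.sh, that wraps
# the notebook commands from the steps above.
cat > jupyter.sh <<'EOF'
#!/bin/bash
# Placeholder partition name; to request a GPU, add a --gres=gpu:1 directive.
#SBATCH --partition=mypartition
# stdout and stderr (including the notebook URL) go to this file; %j is the job ID.
#SBATCH --output=jupyter-%j.log

export XDG_RUNTIME_DIR=""
jupyter-notebook --no-browser --port=8880 --ip='0.0.0.0'
EOF

# Check the generated script for shell syntax errors without running it.
bash -n jupyter.sh && echo "jupyter.sh looks syntactically valid"

# On sc:
#   sbatch jupyter.sh
#   cat jupyter-<jobid>.log    # find the notebook URL once the job has started
#   scancel <jobid>            # stop the notebook when you are done
```

Since the notebook keeps running until its walltime expires, cancelling the job when you are finished frees the node for other users.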