Compute Resources
The Università della Svizzera italiana has a high-performance compute cluster dedicated to research in the Faculty of Informatics. The cluster comprises 42 nodes, each with two 10-core Intel Xeon processors, for a total of 840 cores connected by a high-speed InfiniBand network. In addition, 11 nodes are equipped with NVIDIA GPUs to increase throughput for massively parallel computational problems.
Cluster Specifications | |
---|---|
Number of compute nodes | 42 |
Number of compute cores | 2 sockets per node, each with 10 cores (840 cores total) |
Processors | 2 x Intel Xeon E5-2650 v3 (25M Cache, 2.3 GHz) |
Memory per node | 64GB to 512GB DDR4 @ 2133MHz |
Disk capacity per node | 2TB SATA 6Gb |
Network storage | 15TB + 32TB RAID array |
Node interconnect | Intel InfiniBand 40Gbps QDR |
GPUs | 2 x NVIDIA GeForce RTX 2080 Ti (11GB GDDR6, 4352 CUDA cores) or 2 x GTX 1080 Ti (11GB GDDR5X, 3584 CUDA cores) |
HPC Cluster Documentation
Introduction
This webpage provides support to researchers and students who need to use the Rosa cluster.
It includes basic information about the hardware, software, and network configuration and the scientific software installed on the machines. It also explains how to access the infrastructure and how to run jobs on the cluster. The final sections provide hints on how to install additional software and the necessary support contact information.
Cluster Specification
The USI HPC cluster “Rosa” is composed of 41 compute nodes and is managed by a master node. Each node runs CentOS 8.2.2004.x86_64 and provides a wide range of scientific applications. Resource allocation and job scheduling are performed by Slurm, to which users submit their jobs.
The following tables list the main hardware resources provided by the cluster.
(Note that users are not allowed to access the compute nodes directly: they can only access the login node through SSH and then run batch jobs on the compute nodes through the Slurm job scheduler.)
Login node
Login node | |
---|---|
NODES | icslogin01 |
CPU | 2 x Intel Xeon E5-2650 v3 @ 2.30GHz, 20 (2 x 10) cores |
RAM | 64GB DDR4 @ 2133MHz |
HDD | 1 x 1TB SATA 6Gb |
INFINIBAND ADAPTER | Intel 40Gbps QDR |
Compute nodes
Fat nodes | |
---|---|
NODES | icsnode[01-04,07] |
CPU | 2 x Intel Xeon E5-2650 v3 @ 2.30GHz, 20 (2 x 10) cores |
RAM | 128GB DDR4 @ 2133MHz (note: icsnode07 is equipped with 512GB) |
HDD | 1 x 1TB SATA 6Gb |
INFINIBAND ADAPTER | Intel 40Gbps QDR |
GPU nodes | |
---|---|
NODES | icsnode[05,06,08-16] |
CPU | 2 x Intel Xeon E5-2650 v3 @ 2.30GHz, 20 (2 x 10) cores |
RAM | 128GB DDR4 @ 2133MHz (note: icsnode15 is equipped with 512GB) |
HDD | 1 x 1TB SATA 6Gb |
INFINIBAND ADAPTER | Intel 40Gbps QDR |
GPU | icsnode[05-06]: 1 x NVIDIA A100-PCIe-40GB Tensor Core GPU, 40GB HBM2; icsnode[08,13]: 2 x NVIDIA GeForce GTX 1080 Ti Titan, 11GB GDDR5X, 3584 CUDA cores; icsnode[09-12,14-16]: 1 x NVIDIA GeForce GTX 1080 Founders Edition, 8GB GDDR5X, 2560 CUDA cores |
Multi GPU nodes | |
---|---|
NODES | icsnode[41,42] |
CPU | 2 x Intel Xeon E5-2650 v3 @ 2.30GHz, 20 (2 x 10) cores |
RAM | 100GB DDR4 @ 2666MHz |
HDD | 1 x 1TB SATA 6Gb |
INFINIBAND ADAPTER | Intel 40Gbps QDR |
GPU | icsnode41: 2 x NVIDIA GeForce RTX 2080 Ti, 11GB GDDR6, 4352 CUDA cores; icsnode42: 2 x NVIDIA GeForce GTX 1080 Ti, 11GB GDDR5X, 3584 CUDA cores |
Network
Hardware & Configuration | |
---|---|
INFINIBAND | Intel True Scale Fabric 12200 switch, 36 ports, 40Gbps QDR |
WAN | Cluster accessible at rosa.usi.ch via SSH |
Private LAN | Low-performance tasks, cluster management, access to nodes |
Infiniband LAN | High-performance tasks, job scheduling and synchronization, data movement |
Storage
Hardware |
---|
24-bay storage with RAID controller |
16-bay JBOD storage |
16-Gb Fibre Channel controller |
Partitions | |
---|---|
/scratch | 15TB in RAID 10 with SAS3 HDDs and one dedicated spare. This volume has a much faster write speed than the Data_Apps volume (/apps and /home), so prefer it for I/O-intensive applications. The scratch filesystem is intended for temporary storage while a job is running; important data should be moved to alternative storage facilities as soon as a job is completed. |
/apps and /home | 32TB in RAID 6 with SAS3 NL drives and one dedicated spare. The /apps and /home directories are located on this volume. |
Warning
We do not have any backup system for the cluster. Don’t store any critical data on the cluster without making your own backups. We are not responsible for any loss of data.
Disk Quotas
We have enabled disk quotas on each user's home directory to allow fair usage of shared resources. Each user has a soft limit of 90GB and a hard limit of 100GB. Once you have reached the hard limit, you will not be able to write any more data to your home directory. You may use the /scratch partition if you need more disk space. If you need more space in your home directory, write to cluster support with your requirements; if possible, more disk space may be granted.
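If the standard Linux quota tools are available on the login node (an assumption; ask support if the command is missing), you can check your current usage roughly like this:
quota -s        # show soft/hard limits and current usage in human-readable units
du -sh $HOME    # alternatively, measure the total size of your home directory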
Supported Applications
We provide a wide range of scientific applications, which are organized with environment modules.
Environment Modules
To allow users to switch between different versions of installed programs and libraries we use a module concept. A module is a user interface that provides utilities for the dynamic modification of a user's environment, i.e., users do not have to manually modify their environment variables (PATH, LD_LIBRARY_PATH, ...) to access compilers, loaders, libraries, and utilities. For all applications, tools, libraries, etc. the correct environment can be set with, e.g., module load gcc. If several versions are installed, a specific one can be chosen, e.g., module load gcc/13.2.0-gcc-8.5.0-5hqhkwo. A list of all modules is shown by module avail. Other important commands are:
Command | Description |
---|---|
module avail | lists all available modules (on the current system) |
module list | lists all currently loaded modules |
module show <module> | displays information about module <module> |
module load <module> | loads module <module> |
module switch <module1> <module2> | unloads module <module1> and loads module <module2> |
module rm <module> | unloads module <module> |
module purge | unloads all loaded modules |
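A typical session might look like the following sketch (the module names and the version string are only illustrative; check module avail for what is actually installed):
module avail                                    # see what is installed
module load gcc                                 # load the default GCC version
module list                                     # verify what is currently loaded
module switch gcc gcc/13.2.0-gcc-8.5.0-5hqhkwo  # swap to a specific version
module purge                                    # return to a clean environment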
Installing New Applications
If you need any additional applications, you are welcome to install them yourself. Support and guidance for the installation process may be provided upon request.
Access to HPC Services
- To get an account on the cluster, please contact the Cluster Administrator, mentioning your group head in the request.
- For help with access and account management, please use the serviceportal.usi.ch platform.
- Access to the HPC cluster is only possible via secure protocols (ssh, sftp, scp, rsync). The HPC cluster is only accessible from inside the USI network. If you would like to connect from a computer that is not inside the USI network, you need to establish a VPN connection first. Outgoing connections to computers inside the USI network are not blocked.
- If you already have an active account, you can connect to the cluster via SSH using your USI LDAP password:
ssh username@rosa.usi.ch (where username is your short USI user name, e.g. schenko)
- The LDAP account, which remains active for the entire stay at the University, is a UNIX/Linux account given to students, assistants, and academic and administrative staff at the Faculty of Informatics.
- We strongly recommend accessing the cluster using SSH keys. You can find information about generating SSH keys here; a minimal sketch follows this list.
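For example, a minimal key-based setup from your own machine could look like this (a sketch: the key type is a common default, the user name schenko is a placeholder, and it assumes ssh-copy-id is available locally):
ssh-keygen -t ed25519              # generate a key pair, accept the default file location
ssh-copy-id schenko@rosa.usi.ch    # install the public key on the cluster (asks for your LDAP password once)
ssh schenko@rosa.usi.ch            # subsequent logins can use the key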
Slurm
We use Slurm (version 23.02.6) as the batch management system on the cluster.
You can find detailed tutorials, examples, man pages, etc. at https://slurm.schedmd.com/
Slurm Partitions
Partition name | Max runtime | # of nodes | Range | Main features |
---|---|---|---|---|
slim | 48 hours | 22 | icsnode[17-38] | cpu-only, 64GB RAM |
gpu | 48 hours | 10 | icsnode[05-06,08-15] | Nvidia GPUs, 128GB RAM |
fat | 48 hours | 4 | icsnode[01-04] | cpu-only, 128GB RAM |
bigMem | 48 hours | 2 | icsnode[07,15] | 512GB RAM |
multi_gpu | 4 hours | 2 | icsnode[41,42] | 100GB RAM, 2 Nvidia GPUs each |
debug-slim | 4 hours | 1 | icsnode39 | cpu-only, 64GB RAM |
debug-gpu | 4 hours | 1 | icsnode16 | Nvidia GPUs, 128GB RAM |
Note: The sinfo command gives you a quick status overview.
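For example (both commands use standard Slurm options):
sinfo               # one line per partition with node counts and states
sinfo -p gpu -N -l  # detailed, node-oriented view of the gpu partition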
Job Submission
Jobs can be submitted with the srun command or by submitting a Slurm job file using sbatch.
Some options of srun / sbatch are:
Slurm Option | Description |
---|---|
-c or --cpus-per-task | this option is needed for multithreaded (e.g. OpenMP) jobs; it tells Slurm to allocate N cores per task; typically N should be equal to the number of threads your program spawns, e.g. it should be set to the same number as OMP_NUM_THREADS |
-e or --error | specify a file name that will be used to store all error output (stderr); you can use %j (job id) and %N (name of first node) to automatically adapt the file name to the job; by default stderr goes to "slurm-%j.out" as well |
-o or --output | specify a file name that will be used to store all normal output (stdout); you can use %j (job id) and %N (name of first node) to automatically adapt the file name to the job; by default stdout goes to "slurm-%j.out" |
-n or --ntasks | set the number of tasks to N (default: 1); this determines how many processes will be spawned by srun (for MPI jobs) |
-N or --nodes | set the number of nodes that will be part of the job; on each node, ntasks-per-node processes will be started; if the option --ntasks-per-node is not given, 1 process per node will be started |
--ntasks-per-node | how many tasks per allocated node to start, as stated in the line before |
-p or --partition | select the type of nodes where you want to execute your job |
--time | specify the maximum runtime of your job, if you just put a single number in, it will be interpreted as minutes |
-J or --job-name | give your job a name which is shown in the queue |
--exclusive | tell SLURM that only your job is allowed on the nodes allocated to this job; please be aware that you will be charged for all CPUs/cores on the node |
-a or --array | submit an array job |
-w node1, node2,... | restrict job to run on specific nodes only |
-x node1, node2,... | exclude specific nodes from job |
--mem=MB | minimum amount of real memory per node, in MB |
--mem-per-cpu | specify the memory need per allocated CPU in MB, mem >= mem-per-cpu if mem is specified |
--mincpus | minimum number of logical processors (threads) per node |
--reservation=name | allocate resources from named reservation |
--gres=list | required generic resources, e.g. --gres=gpu:1 to request one GPU |
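As a quick test, several of these options can be combined directly on the srun command line (a sketch; the partition, resource amounts, and binary path are placeholders):
srun -p slim -n 4 --time=10 --mem=2000 ./path/to/binary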
It might be more convenient to put the options directly into a job file that you can then submit with sbatch.
The following example job files show how Slurm job files can be used:
Simple Slurm job file
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --output=simulation-m-%j.out
#SBATCH --error=simulation-m-%j.err
#SBATCH --nodes=4
#SBATCH --mem=12020
#SBATCH --ntasks-per-node=20
echo Starting Program
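Assuming the lines above are saved in a file called simulation.sh (the file name is just an example), the job is submitted and monitored like this:
sbatch simulation.sh    # prints the job id on submission
squeue --me             # check the job's state in the queue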
OMP Slurm job file
#!/bin/bash
#SBATCH -J OpenMP_job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --time=00:10:00
#SBATCH --mem=12020
export OMP_NUM_THREADS=20
./path/to/binary
MPI Slurm job file
#!/bin/bash
#SBATCH -J MPI_job
#SBATCH --ntasks=80
#SBATCH --time=00:10:00
#SBATCH --mem=12020
mpirun ./path/to/binary
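Depending on how the installed MPI library is integrated with Slurm (an assumption; mpirun as shown above is the documented way), the MPI ranks can often also be started directly through Slurm:
srun ./path/to/binary    # starts --ntasks MPI ranks under Slurm's control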
GPU Slurm job file
#!/bin/bash
#SBATCH -J GPU_job
#SBATCH --ntasks=1
#SBATCH --partition=gpu
#SBATCH --time=00:10:00
#SBATCH --mem=12020
#SBATCH --gres=gpu:1
./path/to/binary
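To verify which GPU was allocated, the job script can call the NVIDIA driver utility before starting the actual program (assuming nvidia-smi is installed on the GPU nodes, which is typical but not stated on this page):
nvidia-smi    # lists the GPU(s) visible to this job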
During runtime, the environment variable SLURM_JOB_ID will be set to the id of your job.
Job and Slurm Monitoring
On the command line, use squeue to watch the scheduling queue. To list only the jobs of a specific user, use squeue -u <username>; to list just your own jobs, use squeue --me. This command also tells you why a job is not running (the reason is shown in the last column of the output). More information about job parameters can be obtained with scontrol -d show job <jobid>.
Here are detailed descriptions of the possible job reasons:
Reason | Long description |
---|---|
Dependency | This job is waiting for a dependent job to complete. |
None | No reason is set for this job. |
PartitionDown | The partition required by this job is in a DOWN state. |
PartitionNodeLimit | The number of nodes required by this job is outside of its partitions current limits. Can also indicate that required nodes are DOWN or DRAINED. |
PartitionTimeLimit | The job’s time limit exceeds its partition’s current time limit. |
Priority | One or more higher priority jobs exist for this partition. |
Resources | The job is waiting for resources to become available. |
NodeDown | A node required by the job is down. |
BadConstraints | The job’s constraints can not be satisfied. |
SystemFailure | Failure of the SLURM system, a file system, the network, etc. |
JobLaunchFailure | The job could not be launched. This may be due to a file system problem, invalid program name, etc. |
NonZeroExitCode | The job was terminated with a non-zero exit code. |
TimeLimit | The job exhausted its time limit. |
InactiveLimit | The job reached the system InactiveLimit. |
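If you want a compact view that shows the reason column explicitly, squeue's output format can be customized (a sketch using standard squeue format codes):
squeue --me -o "%.10i %.9P %.20j %.8T %.10M %R"    # job id, partition, name, state, runtime, reason/nodes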
Killing jobs
If you want to kill a running job, you can use the scancel command.
scancel <jobid> kills a single job and removes it from the queue.
With scancel -u <username> you can kill all of your jobs at once.
Reservations
If you would like to reserve some nodes for running your jobs, you can ask the Cluster Administrator for a reservation.
Please add the following information to your request mail:
- start time (note: start time has to be later than the day of the request)
- duration or end time (note: the longest jobs run 7 days)
- account
- node count or cpu count
- partition
After we have agreed on your requirements, we will send you an e-mail with your reservation name.
You can then see more information about your reservation with the following command:
scontrol show res reservation_name
If you want to use your reservation, add the parameter --reservation=reservation_name either in your sbatch script or to your srun or salloc command.
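For example (my_reservation is a placeholder for the name you received by e-mail):
#SBATCH --reservation=my_reservation
or, on the command line:
srun --reservation=my_reservation -p slim -n 1 ./path/to/binary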
Contact Admin
For issues regarding support, you can use the serviceportal.usi.ch platform. On the platform you can explore the full range of services available and engage in dialogue with IT support. For issues related to the cluster, please type “High-performance computing (HPC)” in the search bar (or use this link), select one of the categories, and describe your problem. A detailed guide on how to use the service portal can be found here.