The Minnesota Supercomputing Institute
The Minnesota Supercomputing Institute (MSI) provides the software, hardware, storage, and expertise to support research projects in all fields.
Registering for Access
UMN Employees
Request an MSI account; it will be created from your UMN InternetID (formerly called x500).
Non-UMN Employees
In order to request an MSI account as a non-UMN employee, a Person of Interest (POI) account must first be created. The requester needs written approval from Dr. Damien Fair or another DCAN Leadership team member (Drs. Eric Feczko, Oscar Miranda-Dominguez, Steve Nelson, Anita Randolph, and Amanda Reuter).
Send written approval to Nora Byington (bying015@umn.edu) to initiate the POI account creation process.
Gaining Access
Establish VPN
To log into MSI remotely, you need to establish a VPN connection. Detailed instructions can be found on the HST/AHC page VPN and Remote Desktop Setup for Windows, OS X, and Linux.
NOTE: If you already have a VPN client on your computer for another institution, complete the following steps:
- Type "tc-vpn-1.umn.edu" in the connection box
- Choose "anyconnect-UofMvpnfull" from the group dropdown
- Enter your username and password
This installs the settings UMN's VPN requires. Afterwards, the dropdown will list the UMN choices and you can use "split tunnel" from then on.
If you receive an error message the first time, simply re-open the Cisco client.
Permissions
To ensure that the data and code you create can be accessed by everyone in the lab, update your bashrc as follows (this only needs to be done the first time you access MSI). See here for some info on what a "bashrc" is.
- Open your bashrc file with a text editor, e.g. emacs ~/.bashrc.
- Set umask to 0007 by adding the line umask 0007. The umask controls the default permissions applied to the files you create. Permissions are set separately for yourself, your group (e.g. faird), and all other users, and each can be given read, write, and execute access. 0007 removes all access for other users (who could be anyone in the university) while leaving you and your group members full access: new files are typically created as rw-rw---- and new directories as rwxrwx---. See here for details.
- Close the file and type source ~/.bashrc into the terminal to apply the changes.
- Your bashrc is loaded each time you log in, so you only need to source it when you edit it mid-session.
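To see what umask 0007 actually does, you can create a file and a directory in a throwaway temporary directory and print their permissions. This is a minimal sketch: umask_demo is a hypothetical helper, and it assumes GNU stat (as on Linux systems such as MSI's), since the -c format flag differs on macOS.

```shell
# Hypothetical helper: show the permissions that "umask 0007" gives new files/dirs.
# Everything happens in a temporary directory that is deleted afterwards.
umask_demo() {
  local dir
  dir=$(mktemp -d)
  (
    cd "$dir" || exit 1
    umask 0007              # same value you put in ~/.bashrc; only affects this subshell
    touch example_file      # files are normally requested with mode 666
    mkdir example_dir       # directories are normally requested with mode 777
    # 666 with 007 masked off -> 660 (rw-rw----); 777 -> 770 (rwxrwx---)
    stat -c '%n %a %A' example_file example_dir
  )
  rm -rf "$dir"
}

umask_demo
```

Because the umask is set inside a subshell, running the demo does not change the umask of your current terminal session.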
Structure of High Performance Computing (HPC) Systems
- To learn more about the structure of the HPC system, visit the MSI website for a list of tutorials.
- Launch
- Using clusters: To perform more advanced tasks, log in to one of MSI's clusters, like Mesabi or Mangi.
  - From a login node: ssh -Y <user>@mesabi or ssh -Y <user>@mangi.
  - Request resources for interactive computing, for performing simple analyses. For batch processing of many independent jobs (e.g. one job per ABCD session), see Submitting Jobs below. By default, you should request an interactive node. In some cases (e.g. high priority projects) you might request access to the dcan node.
    - On Mesabi/Mangi (see here for more detail):
      srun -N 1 --ntasks-per-node=4 --mem-per-cpu=4gb -t 4:00:00 -p interactive --x11 --pty bash
      - -N is the number of nodes, each of which can have multiple cores/CPUs. Usually, you only need one.
      - --ntasks-per-node is the number of tasks per node (one CPU each by default), i.e. the number of CPUs.
      - To change memory, change the argument of the flag --mem-per-cpu. Total memory is mem-per-cpu times ntasks-per-node.
      - To change time, change the argument of the flag -t. Be sure to specify time completely (hours:minutes:seconds) so you don't end up requesting 4 minutes instead of 4 hours.
    - Using the dcan node: reserve this node for high priority/high RAM/CPU power demands.
      srun -N 1 --cpus-per-task=4 --mem-per-cpu=4gb -t 4:00:00 --x11 -p dcan --pty bash
  - Load the module you need. For example, if you want to run Matlab, just type module load matlab in the terminal.
    - Other common modules include: fsl, R, python2 or python3, and HCPworkbench.
    - Typing e.g. module avail R will show you all the versions available to load, in case versioning matters for your application.
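Since total memory is mem-per-cpu times ntasks-per-node, it can help to check the arithmetic before requesting a node. The sketch below is a hypothetical bash helper (interactive_srun is not an MSI tool) that prints the total memory and the matching srun command instead of running it, so you can check both before pasting the command into the terminal.

```shell
# Hypothetical helper: print the total memory and the interactive srun command
# for a given number of tasks, memory per CPU (GB), and whole hours.
# Usage: interactive_srun NTASKS MEM_PER_CPU_GB HOURS [PARTITION]
interactive_srun() {
  local ntasks=$1 mem_per_cpu_gb=$2 hours=$3 partition=${4:-interactive}
  # Total memory on the single node = mem-per-cpu x ntasks-per-node
  printf 'total memory: %d GB\n' $(( ntasks * mem_per_cpu_gb ))
  # Time is written as hours:minutes:seconds so "4" means 4 hours, not 4 minutes
  printf 'srun -N 1 --ntasks-per-node=%d --mem-per-cpu=%dgb -t %d:00:00 -p %s --x11 --pty bash\n' \
    "$ntasks" "$mem_per_cpu_gb" "$hours" "$partition"
}

interactive_srun 4 4 4
```

With the arguments 4 4 4 this reproduces the Mesabi/Mangi command above and reports 16 GB in total.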
Data
To optimize computing resources, MSI offers two types of storage:
- High Performance Storage: Data in the High Performance Storage is backed up and is accessible from all MSI systems.
- Second Tier Storage: Big and archival data is stored in the Second Tier. Data for each project can be found in the space of the PI who leads that project.
Storage locations should be determined in conversation with one or more DCAN Lab PIs, and may change as space needs change.
Submitting Jobs
For non-interactive jobs, use slurm for job submissions.
In short, slurm lets you submit a script to be executed not on the login/interactive node you are on, but on MSI's clusters, which are divided into partitions with different specifications. This link points to a table that shows the memory and time capacity of each partition for non-interactive jobs. If your job's resource request doesn't fit the partition you choose, the job will not run.
There are two main ways to submit a script to slurm.
- A script can have a complete #SBATCH header as described in the link above. Any option you leave out falls back to a default, and the defaults are very low: if you don't specify a time limit, for example, your job will be given only 5 minutes to execute. Submitting these jobs to slurm can be as simple as sbatch file.sh.
- The same #SBATCH header options can be given as options to sbatch, e.g. sbatch --time=12:00:00 file.sh. This is useful when parameters vary by job, or when setting job names with -J: #SBATCH lines are Bash comments, so shell variables and script arguments are never expanded inside them.
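For example, to submit one job per session with a different job name for each, you can loop over session IDs on the command line. This is a dry-run sketch: submit_sessions is a hypothetical function, the session IDs are placeholders, and it assumes a job script called job.sh that takes the session ID as its first argument. It only prints the sbatch commands; remove the echo to actually submit.

```shell
# Hypothetical dry-run loop: print one sbatch command per session ID.
# Assumes job.sh reads the session ID from its first argument ($1).
submit_sessions() {
  local session
  for session in "$@"; do
    # Per-job values (name, arguments) go on the command line, because
    # #SBATCH lines inside job.sh cannot see $session.
    echo sbatch -J "proc_${session}" --time=12:00:00 job.sh "$session"
  done
}

submit_sessions sub-01 sub-02
```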
Scripts submitted to slurm must include any module calls necessary.
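A minimal job script with a complete #SBATCH header might look like the one below. It is a sketch under assumptions: the resource values are the ones used elsewhere on this page, --account=faird stands in for whichever PI allocation you have permission to use, no partition is set (add -p from the partition table as needed), and my_analysis.py is a hypothetical script. The snippet writes job.sh and only syntax-checks it with bash -n; it does not submit anything.

```shell
# Write an example job script. Placeholders (edit before use):
#   --account=faird  -> a PI allocation you are allowed to use
#   my_analysis.py   -> hypothetical analysis script
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --time=12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=4gb
#SBATCH --account=faird
#SBATCH --output=example_job_%j.out

# Modules must be loaded inside the job script itself.
module load python3

# $1 is the first argument given after the script name in the sbatch call.
python3 my_analysis.py "$1"
EOF

# Check the script parses as valid bash (does not run it).
bash -n job.sh && echo "job.sh: syntax OK"
```

You would then submit it with sbatch job.sh sub-01, where sub-01 becomes $1 inside the script. Options given on the sbatch command line, e.g. sbatch --time=24:00:00 job.sh sub-01, take precedence over the matching #SBATCH lines.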
Parameters
The main parameters are number of cores (--ntasks-per-node), memory (--mem or --mem-per-cpu), and time (--time). It is also sometimes necessary to specify the amount of "scratch space" in temporary storage on /tmp with --tmp. A complete list of options can be found here.
Request enough resources for your job, but only just enough: larger jobs wait longer in the queue, while smaller jobs can be fit in between larger ones. We recommend testing a handful of jobs to establish the parameters you need; it can be quite frustrating when a job fails at 36 hours with only an hour left of processing!
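Slurm reads a bare number given to --time (-t) as minutes, and a value like 4:00 as minutes:seconds, which is how 4 hours turns into 4 minutes. One way to avoid that is to always write times in the unambiguous days-hours:minutes:seconds form. A minimal sketch, using a hypothetical helper slurm_time that converts a number of minutes into that form:

```shell
# Hypothetical helper: convert whole minutes into Slurm's time format,
# using days-hours:minutes:seconds once the request reaches a full day.
slurm_time() {
  local total_min=$1
  local days=$(( total_min / 1440 ))
  local hours=$(( (total_min % 1440) / 60 ))
  local mins=$(( total_min % 60 ))
  if (( days > 0 )); then
    printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
  else
    printf '%02d:%02d:00\n' "$hours" "$mins"
  fi
}

slurm_time 240    # 4 hours, prints 04:00:00
slurm_time 2160   # 36 hours, prints 1-12:00:00
```

It can be used directly in a submission, e.g. sbatch --time="$(slurm_time 2160)" job.sh.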
Your jobs will be given a numeric ID, and you can view all your jobs with squeue -u <user> or squeue --me. squeue alone will list all jobs submitted to MSI across the university!
You can see how long a job actually took, and how much memory it used, with seff <ID>:
Job ID: 10000000
Cluster: mesabi
User/Group: park2589/dorfmank
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:03:12
CPU Efficiency: 87.67% of 00:03:39 core-walltime
Job Wall-clock time: 00:03:39
Memory Utilized: 289.38 MB
Memory Efficiency: 14.47% of 1.95 GB
This shows us that 4 minutes was probably a good estimate for this job (although it would have to be contextualized among all the jobs with similar inputs), but 2 GB was probably more than necessary.
This example job is very small, so I wouldn't bother choosing between e.g. 0.5 and 2 GB, but the difference between 24 and 36 hours, or between 6 and 24 GB of memory, can definitely influence how long your job takes to start on slurm.
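If you check many jobs, the relevant seff lines can be pulled out automatically. A sketch, assuming seff's output contains the Job Wall-clock time and Memory Utilized lines shown above; seff_summary is a hypothetical helper, and the 1.5x memory headroom is an arbitrary illustration rather than an MSI or DCAN recommendation.

```shell
# Hypothetical helper: read seff output on stdin, report wall-clock time and
# memory used, and suggest a memory request with 1.5x headroom (assumed factor).
seff_summary() {
  awk -F': ' '
    /^Job Wall-clock time:/ { wall = $2 }
    /^Memory Utilized:/     { split($2, m, " "); mem = m[1]; unit = m[2] }
    END {
      printf "wall-clock: %s\n", wall
      printf "memory used: %s %s\n", mem, unit
      printf "suggested request (1.5x): %.0f %s\n", mem * 1.5, unit
    }'
}
```

Run it as seff <ID> | seff_summary. For the example job above, it reports a wall-clock time of 00:03:39, 289.38 MB used, and a suggested request of 434 MB.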
Accounts
Each PI has their own allocation on MSI: Damien Fair: faird; Eric Feczko: feczk001; Oscar Miranda-Dominguez: miran045; Steve Nelson: ??; Anita Randolph: rando149. These allocations have storage and grid time allocated to them. The more an account is used, the lower its priority becomes relative to all other accounts on MSI.
You can see your priority ("FairShare") on each account with sshare -U <user> (note the capital -U). Higher numbers are better.
If you have permission to use a given PI's account for a given project, you can choose the account with the highest FairShare (select it with sbatch's --account or -A option) to get your jobs running sooner and to avoid further loading accounts that already have a low FairShare.
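To compare accounts quickly, sshare can print just the account and FairShare columns in a machine-readable form, and a short awk filter can pick the highest. A sketch, assuming the -U, -u, --parsable2, --noheader and --format flags from the Slurm sshare documentation behave the same on MSI (check man sshare); best_account is a hypothetical helper that reads "Account|FairShare" lines.

```shell
# Hypothetical helper: read "Account|FairShare" lines on stdin and print the
# account with the highest FairShare (higher is better).
best_account() {
  awk -F'|' '
    {
      gsub(/^ +| +$/, "", $1)          # trim padding around the account name
      if ($1 == "" || $2 == "") next   # skip blank or malformed lines
      if (best == "" || $2 + 0 > max) { max = $2 + 0; best = $1 }
    }
    END { if (best != "") print best }'
}
```

A possible pipeline is sshare -U -u "$USER" --parsable2 --noheader --format=Account,FairShare | best_account, after which you pass the result to sbatch --account. Only consider accounts you actually have permission to use for the project in question.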
Canceling jobs
Cancel a single job with scancel <ID>, or all of your jobs with scancel -u <user>.
Delays and Problems
If you experience delays or problems accessing MSI or a particular node, MSI may not be fully operational at that time; for example, the system is down for maintenance on the first Wednesday of each month. Use the following links to check the MSI system and node status at any time:
Reporting Errors
If you are experiencing errors/issues, contact the following people:
- General usage or slurm issues: MSI help desk (help@msi.umn.edu)
- Parallelization or code running issues: Tim Hendrickson (hendr522@umn.edu)
- Pipeline issues:
  - Infant pipeline: Luci Moore (lmoore@umn.edu)
  - Human pipeline: Anders Perrone (perr0372@umn.edu)
  - Macaque pipeline: Thomas Madison (madisoth@ohsu.edu)
  - Rodent pipeline: Thomas Madison (madisoth@ohsu.edu)