User Guide
This guide provides the essential information you need to get started,
understand the system rules, and run jobs effectively.
1. Getting Started: Your Account
Your account provides access to the cluster's resources.
Connecting: You can connect to the cluster's login node via SSH using its
IP address.
ssh [email protected]
Username: Your username is the part of the institute email address you
provided in the registration form that comes before the "@" symbol,
converted to lowercase.
Example: If your email is [email protected], your username is rajesh.kumar.
Password: Your initial password is the 8-character account key provided to
you after filling out the registration form. The password is case-sensitive.
Changing Your Password: You can and should change your password after
your first login. To do so, run the following command in the terminal and
follow the prompts:
passwd
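The username rule above (take the part before the "@", then lowercase it) can be sketched in shell. This is only an illustration: the address below is a hypothetical placeholder, not a real account.

```shell
# Hypothetical address, used only to illustrate the rule.
email="Rajesh.Kumar@example.org"

# Keep everything before the first "@".
local_part=${email%%@*}

# Convert to lowercase to obtain the cluster username.
username=$(printf '%s' "$local_part" | tr '[:upper:]' '[:lower:]')

echo "$username"   # prints rajesh.kumar
```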
2. System Overview
The cluster consists of the following resources.
CPUs: 256 CPU Threads (Dual AMD EPYC 9554)
GPUs: 4 x NVIDIA RTX 6000 Ada Generation GPUs
3. Use Slurm for All Jobs
The login node is strictly for light tasks such as editing files and managing your
jobs. It is NOT for running calculations, data processing, or any other computationally
intensive tasks.
All computationally intensive tasks MUST be submitted to the compute node
through the Slurm scheduler.
Running heavy processes on the login node slows it down for everyone and is a
misuse of the system.
IMPORTANT: Any resource-intensive user processes found running directly on
the login node can lead to the termination of the responsible account.
4. Your Environment & Limits
To ensure fair use for all users, the following limits are in place:
Home Directory: Your default home directory is located at
/mnt/home2/home/your_username .
Storage Quota: Each user is restricted to a total of 220 GB of storage
space. You can check your current usage by running the quota command.
Job Time Limit: The maximum runtime for any single job is 1 day (24
hours).
GPU Limit: Each job can use at most 1 GPU, and each user can hold at most
1 GPU at a time.
CPU Limit: A single job can request a maximum of 40 CPU threads.
Memory Limit: Jobs are allocated a maximum of 4 GB (4000 MB) of RAM
for each CPU thread they request. For example, a job requesting 10 CPUs
can use up to 40 GB of RAM.
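The memory rule above is simple arithmetic, and a small helper can make it easier to check a request before submitting. The sketch below is not part of the cluster's tooling; the function name max_mem_mb and the variable names are assumptions made for illustration, and the numbers are the limits listed above (40 CPU threads per job, 4000 MB per CPU thread).

```shell
#!/bin/sh
# Limits taken from the list above.
MAX_CPUS_PER_JOB=40
MEM_PER_CPU_MB=4000

# Hypothetical helper: print the maximum memory (in MB) that a job
# requesting the given number of CPU threads may use.
max_mem_mb() {
    cpus=$1
    if [ "$cpus" -lt 1 ] || [ "$cpus" -gt "$MAX_CPUS_PER_JOB" ]; then
        echo "error: CPU count must be between 1 and $MAX_CPUS_PER_JOB" >&2
        return 1
    fi
    echo $((cpus * MEM_PER_CPU_MB))
}

max_mem_mb 10   # prints 40000
max_mem_mb 40   # prints 160000
```

A request for more than 40 CPU threads makes the helper print an error and return a non-zero status instead of a memory figure.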
5. Running Jobs with Slurm
You can submit jobs interactively with srun for testing or non-interactively with
sbatch for longer runs.
Example 1: Run a simple interactive CPU command
This command requests 2 CPUs and 8 GB of memory, then runs the hostname
command on the compute node.
srun --cpus-per-task=2 --mem=8G hostname
Example 2: Start an interactive session with a GPU
This is very useful for development and debugging.
srun --gres=gpu:1 --cpus-per-task=4 --mem=16G --pty sh -i
Example 3: Submit a Python script as a batch job
This is the standard way to run non-interactive jobs.
sbatch --cpus-per-task=4 --mem=16G --gres=gpu:1 --time=02:30:00 job.sh
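The guide does not show the contents of job.sh, so the following is only a sketch of what such a file might look like. It writes a batch script whose #SBATCH lines repeat the resource flags from Example 3, which means the script could also be submitted with a plain sbatch job.sh. The job name, the output file pattern, and the Python file my_script.py are hypothetical placeholders.

```shell
# Create a hypothetical job.sh in the current directory.
cat > job.sh <<'EOF'
#!/bin/bash
# Resource requests, mirroring the flags used in Example 3.
#SBATCH --job-name=python-example
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
#SBATCH --time=02:30:00
# Write the job's output to slurm-<job id>.out.
#SBATCH --output=slurm-%j.out

# Record where the job ran, then start the (hypothetical) Python script.
echo "Running on $(hostname)"
python3 my_script.py
EOF
```

Options given on the sbatch command line take precedence over #SBATCH lines in the script, so a single run can still be adjusted, for example with a shorter --time, without editing the file.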
6. Checking Job Status
See your jobs in the queue:
squeue -u your_username
See the status of the partitions:
sinfo
Cancel a running or pending job:
scancel <your_job_id>