
User Guide

This guide provides the essential information you need to get started,
understand the system rules, and run jobs effectively.

1. Getting Started: Your Account


Your account provides access to the cluster's resources.

Connecting: You can connect to the cluster's login node via SSH using its
IP address.

ssh [email protected]

Username: Your username is the part of your institute email address before the "@" symbol, converted to lowercase.

Example: If your email is [email protected] , your username is rajesh.kumar .
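If you want to double-check the conversion, the part before the "@" can be lowercased in the shell. A trivial sketch (the email address below is a hypothetical example, not a real account):

```shell
# Derive the cluster username from an institute email address:
# take everything before the "@" and convert it to lowercase.
email="Rajesh.Kumar@example.edu"   # hypothetical example address
username=$(printf '%s' "$email" | cut -d@ -f1 | tr '[:upper:]' '[:lower:]')
echo "$username"
```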

Password: Your initial password is the 8-character account key provided to you after filling out the registration form. The password is case-sensitive.

Changing Your Password: You can and should change your password after
your first login. To do so, run the following command in the terminal and
follow the prompts:

passwd

2. System Overview
The cluster consists of the following resources.

CPUs: 256 CPU Threads (Dual AMD EPYC 9554)

GPUs: 4 x NVIDIA RTX 6000 Ada Generation GPUs

3. Use Slurm for All Jobs


The login node is strictly for light tasks like editing files and managing your jobs. It is NOT for running calculations, data processing, or any computationally intensive tasks.
All computationally intensive tasks MUST be submitted to the compute node
through the Slurm scheduler.
Running heavy processes on the login node slows it down for everyone and is a
misuse of the system.
IMPORTANT: Any resource-intensive process found running directly on the login node may lead to termination of the responsible account.

4. Your Environment & Limits


To ensure fair use for all users, the following limits are in place:

Home Directory: Your default home directory is located at /mnt/home2/home/your_username .

Storage Quota: Each user is restricted to a total of 220 GB of storage space. You can check your current usage by running the quota command.

Job Time Limit: The maximum runtime for any single job is 1 day (24
hours).

GPU Limit: You are restricted to using a maximum of 1 GPU per job and per
user.

CPU Limit: A single job can request a maximum of 40 CPU threads.

Memory Limit: Jobs are allocated a maximum of 4 GB (4000 MB) of RAM for each CPU thread they request. For example, a job requesting 10 CPU threads can use up to 40 GB of RAM.
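The per-thread memory rule above reduces to simple arithmetic: requested CPU threads multiplied by 4 GB. A trivial shell sketch:

```shell
# Memory cap = requested CPU threads x 4 GB per thread.
cpus=10
mem_gb=$((cpus * 4))
echo "A job requesting ${cpus} CPU threads may use up to ${mem_gb} GB of RAM."
```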

5. Running Jobs with Slurm


You can submit jobs interactively with srun for testing or non-interactively with
sbatch for longer runs.

Example 1: Run a simple interactive CPU command

This command requests 2 CPU threads and 8 GB of memory, then runs the hostname command:

srun --cpus-per-task=2 --mem=8G hostname

Example 2: Start an interactive session with a GPU
This is very useful for development and debugging.

srun --gres=gpu:1 --cpus-per-task=4 --mem=16G --pty sh -i

Example 3: Submit a Python script as a batch job

This is the standard way to run non-interactive jobs.

sbatch --cpus-per-task=4 --mem=16G --gres=gpu:1 --time=02:30:00 job.sh
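The contents of job.sh are not shown in this guide. A minimal sketch of what such a batch script might look like is below; the job name, resource values, and the Python file my_script.py are hypothetical placeholders, and the requests must stay within the limits in Section 4:

```shell
#!/bin/bash
# Hypothetical batch script (job.sh) -- adjust the values for your job,
# keeping within the limits in Section 4 (max 40 CPU threads, 1 GPU,
# 24 h runtime, 4 GB of RAM per CPU thread).
#SBATCH --job-name=example_job
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
#SBATCH --time=02:30:00

# The commands below run on the allocated compute node.
python my_script.py
```

Note that #SBATCH directives inside the script serve the same purpose as the command-line flags shown above, so resource requests can live in the script itself.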

6. Checking Job Status


See your jobs in the queue:

squeue -u your_username

See the status of the partitions:

sinfo

Cancel a running or pending job:

scancel <your_job_id>
