The Hellgate computing cluster is a heterogeneous cluster composed of 45 CPU compute nodes and 25 GPU nodes.
When setting up an sbatch script there are a number of sbatch options that can be used to request resources. The complete list of nodes and their specific resources is given below. Note: some nodes may be undergoing maintenance and temporarily unavailable. To see the current list of nodes and their resources you can run the command "sinfo -Nl | grep <partition_name>", replacing <partition_name> with a specific partition (listed at the bottom of this page); an example is shown after the node tables.
CPU Nodes:
Node Name | CPUs | Sockets:Cores:Threads | Memory (GB) |
hgcpu1-1 | 72 | 2:18:2 | 754.5 |
hgcpu1-2 | 72 | 2:18:2 | 754.5 |
hgcpu1-3 | 72 | 2:18:2 | 754.5 |
hgcpu1-4 | 72 | 2:18:2 | 754.5 |
hgcpu1-5 | 72 | 2:18:2 | 754.5 |
hgcpu1-6 | 72 | 2:18:2 | 754.5 |
hgcpu1-7 | 72 | 2:18:2 | 754.5 |
hgcpu1-8 | 72 | 2:18:2 | 754.5 |
hgcpu2-1 | 32 | 2:8:2 | 251.7 |
hgcpu2-2 | 32 | 2:8:2 | 251.7 |
hgcpu2-3 | 32 | 2:8:2 | 251.7 |
hgcpu2-4 | 32 | 2:8:2 | 251.7 |
hgcpu3-1 | 36 | 2:18:1 | 187.5 |
hgcpu3-2 | 36 | 2:18:1 | 171.8 |
hgcpu3-3 | 36 | 2:18:1 | 187.5 |
hgcpu3-4 | 36 | 2:18:1 | 187.5 |
hgcpu3-5 | 36 | 2:18:1 | 187.5 |
hgcpu3-6 | 36 | 2:18:1 | 171.8 |
hgcpu3-7 | 36 | 2:18:1 | 187.5 |
hgcpu3-9 | 36 | 2:18:1 | 187.5 |
hgcpu3-10 | 36 | 2:18:1 | 171.7 |
hgcpu3-11 | 36 | 2:18:1 | 171.7 |
hgcpu3-12 | 36 | 2:18:1 | 187.5 |
hgcpu6-1 | 48 | 2:24:1 | 187.5 |
hgcpu6-2 | 48 | 2:24:1 | 187.5 |
hgcpu6-3 | 48 | 2:24:1 | 187.5 |
hgcpu6-4 | 48 | 2:24:1 | 187.5 |
hgcpu6-5 | 48 | 2:24:1 | 187.5 |
hgcpu6-6 | 48 | 2:24:1 | 187.5 |
hgcpu6-7 | 48 | 2:24:1 | 187.5 |
hgcpu6-8 | 48 | 2:24:1 | 187.5 |
hgcpu6-9 | 48 | 2:24:1 | 187.5 |
hgcpu6-10 | 48 | 2:24:1 | 187.5 |
hgcpu6-11 | 48 | 2:24:1 | 187.5 |
hgcpu6-12 | 48 | 2:24:1 | 187.5 |
hgcpu6-13 | 48 | 2:24:1 | 187.5 |
hgcpu6-14 | 48 | 2:24:1 | 187.5 |
hgcpu6-15 | 48 | 2:24:1 | 187.5 |
hgcpu6-16 | 48 | 2:24:1 | 187.5 |
hgcpu7-1 | 128 | 4:16:2 | 182.6 |
hgcpu8-1 | 56 | 2:28:1 | 251.5 |
hgcpu8-2 | 56 | 2:28:1 | 251.5 |
hgcpu8-3 | 56 | 2:28:1 | 251.5 |
hgcpu8-4 | 56 | 2:28:1 | 251.5 |
GPU Nodes:
Node Name | CPUs | Sockets:Cores:Threads | Memory (GB) | GPUs (count x model) |
hggpu4-1 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-2 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-3 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-4 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-5 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-6 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-7 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-8 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-9 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-10 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-11 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu4-12 | 36 | 2:18:1 | 502.5 | 4xNVIDIA RTX2080Ti |
hggpu5-1 | 48 | 2:24:1 | 439.5 | 4xNVIDIA A40 |
hggpu9-1 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-2 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-3 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-4 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-5 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-6 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-7 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-8 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-9 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-10 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-11 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
hggpu9-12 | 48 | 2:24:1 | 503.5 | 4xNVIDIA A4500 Ada gen |
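For example, to check the current state of nodes before submitting a job (the partition name "gpu" and node name "hggpu4-1" below are only illustrative; substitute whichever partition or node you care about):

    # List every node in a partition along with its state, CPUs, and memory
    sinfo -Nl | grep gpu

    # Show the full configuration and current state of a single node
    scontrol show node hggpu4-1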
When creating your job's Slurm submission script (sbatch), in addition to the specific amounts of resources (RAM, CPUs, etc.) you will need to specify which partition (group of nodes) you are requesting those resources from. Because the majority of our nodes are grant funded, most nodes give priority access to the PIs of those grants: their lab groups can submit jobs to the grant's partition, and those jobs will preempt (end and requeue) lower-priority jobs. Priority tiers run from 1 to 3, with 1 being the lowest; jobs can be preempted by jobs from partitions with a higher priority number. Partitions marked (safe) are not tied to a specific grant or PI group and are not at risk of being preempted; however, you should still aim to checkpoint your jobs.
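If you submit to a preemptable partition, you can make your job easier to resume by asking Slurm to requeue it and to send a warning signal shortly before it is stopped. The sketch below is only illustrative: the partition name comes from the table further down (verify the exact name with sinfo), whether a warning signal actually arrives depends on how preemption and grace time are configured on the cluster, and the checkpoint step and program name are placeholders for your own program's checkpoint/restart mechanism.

    #!/bin/bash
    #SBATCH --job-name=preempt_example   # placeholder job name
    #SBATCH --partition=normal           # preemptable partition (verify the exact name with sinfo)
    #SBATCH --requeue                    # allow Slurm to requeue this job if it is preempted
    #SBATCH --open-mode=append           # append to the output file when the job restarts
    #SBATCH --signal=B:USR1@300          # send USR1 to this batch script ~300 s before the job is stopped

    # Placeholder checkpoint handler: replace the echo with your program's own checkpoint command
    trap 'echo "Received USR1, writing checkpoint"' USR1

    # Run the (hypothetical) program in the background so the trap can fire
    ./my_program &
    child=$!

    # The first wait returns early if the USR1 trap fires; wait again so the
    # script does not exit before the program itself has finished
    wait "$child"
    wait "$child"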
Quality of Service (QOS) queues are currently in the works; they will change the way resources are requested, while the partitions will stay largely the same. Specific resource limits will also be implemented at that time.
The partitions, the nodes they contain, and their priorities are as follows (note: while most partitions do not yet have maximum time limits implemented, jobs running longer than 7 days are at risk of being ended early for maintenance or other issues). A minimal example submission script is given after the table.
Partitions:
Partition | Nodes | Max Time | Priority |
normal(preemptable) | hgcpu1-[1-8],hgcpu2-[1-4],hgcpu3-[1-12],hggpu4-[1-12],hggpu5-1,hgcpu6-[1-16],hgcpu7-1,hgcpu8-[1-4] | N/A | 1 |
cpu(all) | hgcpu1-[1-8],hgcpu2-[1-4],hgcpu3-[1-12],hgcpu6-[1-16],hgcpu7-1,hgcpu8-[1-4] | N/A | 2 |
cpu(safe) | hgcpu3-[9-12],hgcpu6-[1-16] | N/A | 2 |
gpu(all) | hggpu4-[1-12],hggpu5-1 | N/A | 2 |
gpu(safe) | hggpu4-[1-12] | N/A | 2 |
Reincarnation(Priority) | hgcpu7-1 | N/A | 3 |
Brinkerhoff(Priority) | hggpu5-1 | N/A | 3 |
Martens(Priority) | hgcpu8-[1-4] | N/A | 3 |
GSCC(Priority) | hgcpu1-[1-8],hgcpu2-[1-4] | N/A | 3 |
INBRE(Priority) | hgcpu3-[1-8] | N/A | 3 |
interactive | hggpu9-[1-12] | 24 hours | 3 |
batch | hggpu9-[1-12] | 7 days | 1 |
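For reference, a minimal submission script might look like the sketch below. The job name, partition, resource amounts, time limit, and program are all placeholders; verify the exact partition names with sinfo and keep the requests within the per-node limits listed in the tables above.

    #!/bin/bash
    #SBATCH --job-name=example           # placeholder job name
    #SBATCH --partition=cpu              # placeholder; use a partition from the table above (verify the exact name with sinfo)
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8            # must fit within the CPU count of the nodes in that partition
    #SBATCH --mem=32G                    # must fit within the node memory listed above
    #SBATCH --time=2-00:00:00            # 2 days; keep jobs under the 7-day guideline
    ##SBATCH --gres=gpu:1                # remove one '#' on GPU partitions to request one GPU

    ./my_program                         # hypothetical program

Submit the script with "sbatch my_script.sh" and check its status with "squeue -u $USER".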