How many jobs can you run on the ULHPC?

Disclaimer

The following knowledge nugget simplifies some information for the sake of clarity.
Note that:

  • The numbers below apply to running jobs. You can submit more, but jobs above these thresholds will wait in the queue.
  • The resources actually available to you can be lower, as they are shared among the user groups.

I want to run GPU jobs

For one job, you can reserve up to:

  • 4 nodes, if your job lasts up to 2 days
  • 2 nodes, if you need a longer job (up to 14 days)

Note: each GPU node contains 4 GPUs, so a 4-node job can use up to 16 GPUs.
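The GPU count follows directly from the node limit. A minimal Python sketch of that arithmetic, using only the figures stated above (the function name `max_gpus` is made up for illustration):

```python
GPUS_PER_NODE = 4  # from the note above: each GPU node contains 4 GPUs


def max_gpus(nodes: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Return the number of GPUs on `nodes` fully reserved GPU nodes."""
    return nodes * gpus_per_node


print(max_gpus(4))  # job of up to 2 days, 4 nodes: 16
print(max_gpus(2))  # job of up to 14 days, 2 nodes: 8
```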

For multiple jobs of up to 2 days each, you can:

  • Have a maximum of 50 jobs
  • Each of those jobs can request a maximum of 4 nodes
  • The total amount of resources for all the jobs cannot exceed 4 nodes

For multiple jobs of up to 14 days each, you can:

  • Have a maximum of 4 jobs
  • Each of those jobs can request a maximum of 2 nodes
  • The total amount of resources for all the jobs cannot exceed 2 nodes
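The three rules in each list (job count, nodes per job, total nodes) combine into one check. The sketch below is a simplified illustration, not how the scheduler actually enforces the limits: it counts whole nodes only, and the names `Limits` and `can_run_together` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_jobs: int            # maximum number of running jobs
    max_nodes_per_job: int   # maximum nodes a single job can request
    max_total_nodes: int     # maximum nodes summed over all running jobs


# Figures taken from the two lists above.
GPU_UP_TO_2_DAYS = Limits(max_jobs=50, max_nodes_per_job=4, max_total_nodes=4)
GPU_UP_TO_14_DAYS = Limits(max_jobs=4, max_nodes_per_job=2, max_total_nodes=2)


def can_run_together(node_counts: list[int], limits: Limits) -> bool:
    """Return True if jobs with these node counts could all run at once."""
    return (
        len(node_counts) <= limits.max_jobs
        and all(1 <= n <= limits.max_nodes_per_job for n in node_counts)
        and sum(node_counts) <= limits.max_total_nodes
    )


print(can_run_together([1, 1, 2], GPU_UP_TO_2_DAYS))  # True: 4 nodes in total
print(can_run_together([4, 1], GPU_UP_TO_2_DAYS))     # False: 5 nodes in total
print(can_run_together([2, 1], GPU_UP_TO_14_DAYS))    # False: 3 nodes in total
```

Jobs that fail the check are not rejected; as stated in the disclaimer, they stay queued until resources are freed.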

I want to run a job that requires a lot of memory

When you need more than 256 GB of RAM on a single node, your only option is to use the bigmem nodes, which have 3 TB of RAM each. If your RAM requirements can be distributed over multiple nodes, please consider using AION nodes instead: the bigmem nodes are heavily booked and few in number (only 4!).
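The decision described above can be sketched in Python. This is an illustration under stated assumptions: the 256 GB and 3 TB figures come from the text, the memory actually usable on a node is typically somewhat lower than the nominal value, 3 TB is taken as 3000 GB, and the function name `suggest_nodes` is hypothetical.

```python
import math

REGULAR_NODE_RAM_GB = 256  # per-node threshold quoted above
BIGMEM_NODE_RAM_GB = 3000  # 3 TB, assuming decimal units


def suggest_nodes(total_ram_gb: float, distributable: bool) -> tuple[str, int]:
    """Return (node type, node count) for a job needing `total_ram_gb` of RAM."""
    if total_ram_gb <= REGULAR_NODE_RAM_GB:
        return ("regular", 1)
    if distributable:
        # Spread the memory over several regular nodes.
        return ("regular", math.ceil(total_ram_gb / REGULAR_NODE_RAM_GB))
    if total_ram_gb > BIGMEM_NODE_RAM_GB:
        raise ValueError("does not fit on a single bigmem node")
    return ("bigmem", 1)


print(suggest_nodes(200, distributable=False))   # ('regular', 1)
print(suggest_nodes(1000, distributable=True))   # ('regular', 4)
print(suggest_nodes(1000, distributable=False))  # ('bigmem', 1)
```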

For one job, you can reserve up to 1 node (112 cores and 3 TB of RAM).

For multiple jobs, you can:

  • Have a maximum of 4 jobs
  • Each of those jobs can request a maximum of 1 node
  • The total amount of resources for all the jobs cannot exceed 1 node

I want to run a long job (longer than 2 days)

  • You can have a maximum of 4 jobs

GPU node

  • Each of those jobs can request a maximum of 2 nodes
  • The total amount of resources for all the jobs cannot exceed 2 nodes

Bigmem node

  • Each of those jobs can request a maximum of 1 node
  • The total amount of resources for all the jobs cannot exceed 1 node

Regular node

  • Each of those jobs can request a maximum of 2 nodes
  • The total amount of resources for all the jobs cannot exceed 12 nodes

I want to run a job on a lot of regular (non-GPU, non-bigmem) nodes

  • You can have a maximum of 50 jobs
  • Each of those jobs can request a maximum of 64 nodes (on AION this means 8192 cores!)
  • The total amount of resources for all the jobs cannot exceed 64 nodes
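To translate a core count into a node request, the figures above imply 8192 / 64 = 128 cores per AION node. A small Python sketch, assuming whole nodes are reserved (the helper name `nodes_needed` is made up for illustration):

```python
AION_CORES_PER_NODE = 8192 // 64  # 128, derived from the figures above
MAX_NODES = 64                    # per-job and total limit stated above


def nodes_needed(cores: int) -> int:
    """Return the number of whole AION nodes needed to provide `cores` cores."""
    return -(-cores // AION_CORES_PER_NODE)  # ceiling division


for cores in (1000, 8192, 8193):
    n = nodes_needed(cores)
    status = "within the limit" if n <= MAX_NODES else "above the limit"
    print(f"{cores} cores -> {n} nodes ({status})")
```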