Queues
When we say that a compute node “belongs” to the Department of Biostatistics, what does that mean? The basic idea is that the node is there for us to use whenever we want, although if we’re not using it, other people are allowed to. This works both ways: if there are compute nodes not being used, we can use them in addition to our own node(s).
All this is managed through a batch queuing system – you place your computing request with the scheduler, and it decides whose jobs run on which node. As part of the request, you specify a queue (essentially, you declare which line you want to wait in). There are three queues worth knowing about:
TUKEY
: This is the name of the general-purpose Department of Biostatistics queue. It has 32 2.6 GHz Intel Xeon processors and 512 GB of RAM (spread out over two physical nodes).

UI
: This is the name of the general-purpose University of Iowa queue. As a member of the University of Iowa, these compute nodes belong to you just as much as the TUKEY queue does, although there is a limit of 10 jobs you can run at once on the UI queue.

all.q
: Submitting jobs to this queue means that they will run on any compute node that is available. In particular, it means that your job could run on a node belonging to someone else, and if they want it back, your job would be immediately terminated.
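
As a concrete illustration, here is a minimal sketch of a submission script that declares which queue to wait in. The job name and the R command are hypothetical placeholders; the -q, -N, and -cwd directives are standard SGE options:

    #!/bin/bash
    #$ -q TUKEY        # wait in the Biostatistics line
    #$ -N my-sim       # name for the job (hypothetical)
    #$ -cwd            # run in the directory the job was submitted from

    R CMD BATCH sim.R  # the actual work; sim.R is a hypothetical script

Submitting this script with qsub places the request with the scheduler, which starts the job on a TUKEY node once one is free.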
There are also some other queues that could be worth knowing about if you have specific resource requests, such as high memory or accelerator cards; see the Neon documentation for more details on those queues.
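
Specific resource needs are typically expressed with SGE's -l option at submission time. The resource name and value below are assumptions for the sake of illustration; check the Neon documentation for the resources actually defined on the cluster:

    qsub -l h_vmem=64G myjob.sh   # hypothetical request for a 64 GB memory limit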
The Sun Grid Engine
There are a variety of batch schedulers; the one used by the Neon cluster is the Sun Grid Engine (SGE). The next several pages discuss the SGE commands for submitting, controlling, and monitoring jobs submitted to the compute nodes.
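
As a preview of those pages, the day-to-day workflow comes down to three SGE commands; the script name and job number below are hypothetical:

    qsub myjob.sh   # submit a job script to the scheduler
    qstat           # check the status of your jobs
    qdel 487291     # cancel job number 487291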