qsub

OK, the preliminaries are out of the way and we are now ready to use the batch scheduler for its main purpose: submitting jobs to be run on the compute nodes. The command for this is qsub, and we will illustrate its use with the sim.R file from earlier. Let’s create an executable script that runs sim.R in batch mode:

File:
sim
#!/bin/bash
R CMD BATCH --no-save --no-restore "--args $1" sim.R .sim.Rout

Note that this version of sim allows you to pass an argument to sim.R. Thus:

>
./sim 5

creates a file called sim5.RData and so on. At this point, let’s log out of the TUKEY node back into a login node (if you ran the above command from a login node, don’t worry about it – this simulation isn’t particularly computer-intensive. But in general it is very important to avoid running simulations on the login nodes).

Running this simulation on a compute node is (almost) as easy as submitting qsub sim 5. However, before we do this, we need to mention two important arguments to qsub. One is -cwd, which says that we want to run the command in our current working directory (I can’t imagine why you would ever not want to do this); the other is -q TUKEY, which specifies the queue (again, you could also submit to UI or all.q). So, let’s run the simulation on a compute node:

>
qsub -cwd -q TUKEY sim 3

After a little while, a file sim3.RData will appear. You may also notice that two other files appeared: sim.e1002669 and sim.o1002669 (your numbers will be different). These contain a log of the errors (stderr) and output (stdout) from your submission. Typically, they are not interesting (in this case, they are both empty), but they are certainly valuable if you are trying to debug things or if you have a command that prints out important information when it finishes.

Personally, I like to keep the error and output streams in separate folders to avoid clutter. Also, I always specify -cwd and -V, which passes along the current environmental variables (especially PATH), and almost always want to specify -q TUKEY. Rather than type all these out every time, I placed the following alias in my .bash_profile file:

File:
~/.bash_profile
<...more...>
alias qsub="qsub -cwd -V -e ~/err -o ~/out -q TUKEY"
<...more...>

The commands in .bash_profile are run at each login, but we can force them to run now with:

>
source ~/.bash_profile

You don’t ever have to do this again; it will happen automatically from now on when you log in.

Now, running our simulation on a compute node really is as easy as submitting:

>
qsub sim 5

The downside to the use of aliases is that it prevents us from changing the queue with -q. There are many ways around this, such as leaving the -q option out of the alias, having separate aliases for each queue, or writing shell functions. By all means, use what works best for you, but in what follows, I’ll use this version of qsub with the defaults modified to avoid clutter. Keep in mind, though, that if you have not modified qsub in this way, you’ll have to type in all the -cwd, etc. arguments.