Imperial College provides research computing resources for all College researchers, with the standard service being free at the point of use. You must be registered in order to access our systems:
- Permanent academic staff: please register yourself for access. Once you are registered you may register members of your group; note that by doing so, you accept responsibility for their behaviour on the systems. (Our registration pages are only accessible while connected to the College network.)
- Post-docs, PhD and research postgraduate students: please contact your group leader or supervisor who will be able to register you.
- Undergraduates and taught postgraduates: our systems are for research only. If you are working on a computational project your supervisor may elect to register you. Please consult with them.
- Academic visitors: visitors require a full College account (NOT just a guest account). To arrange this, the hosting department's nominated Username Contact should raise an Ask form. The request should be generic, for "an external user account for HPC access", and must include the visitor's full name, business email and affiliation, the name of the hosting researcher at IC, and an expiry date (max one year). Once the account has been created, it can be enrolled as a service user by any permanent member of academic staff (see above).
Please be aware that our systems are general purpose computing resources. If you are working with sensitive personal data, even if anonymised, please consult with the Service Manager before commencing work.
We operate three systems:
- CX1 for general and high throughput work. This is the place to start if you are new to the service, and is likely to satisfy most of your needs.
- CX2 for high-end parallel work. This is only for parallel programs that use 10s-100s of cores at once. Most applications in this category are physics or engineering programs and use MPI.
- AX4 for big data analysis. This is for processing multi-terabyte datasets by a single program that cannot be parallelised across multiple nodes of our other systems.
These are all cluster systems, meaning that they are composed of many independent computers, or nodes. These are managed by a batch system which is responsible for matching your compute tasks, known as jobs, to available nodes. As a user, you submit a job to the batch system which holds it in a queue, along with those of other users, until there are sufficient nodes free to run it.
New to Linux
All our systems run Linux, and you'll need some familiarity with working in a terminal. If this is new to you, please read our introductions to the command-line and shell scripting. We also offer regular training courses.
Running your first job
Our resources are batch processing systems. Rather than being run directly from the command line, jobs get submitted to a queue where they are held until compute resources become free. A job is defined by a shell script that contains all of the commands to be run.
Anatomy of a job script
Every job script must start with two #PBS directive lines that describe the resources required by the work. The first specifies the maximum time the job will be allowed to run, in hours:minutes:seconds:
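On PBS systems this directive typically takes the following form; the 24-hour value here is illustrative only:

```
#PBS -l walltime=24:00:00
```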
The second gives the number of cores N and memory M that the job needs:
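In the select syntax used by PBS Pro this might look as follows; the core and memory values are illustrative:

```
#PBS -l select=1:ncpus=8:mem=16gb
```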
Next come the module loads for all of the software the job needs, for example:
module load anaconda3/personal
The initial working directory of a job is a private temporary directory created just for the job, and deleted once it is done. Take that into account when crafting paths to input files. The path of the directory that the job was submitted from is present in the environment variable PBS_O_WORKDIR, and that of the temporary directory in TMPDIR.
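To make the relationship between the two directories concrete, here is a small self-contained sketch that mimics them with temporary stand-ins; in a real job, PBS_O_WORKDIR and TMPDIR are set by the batch system, not by you:

```shell
# Stand-ins for the directories a job would see (illustrative only;
# the batch system sets the real PBS_O_WORKDIR and TMPDIR)
PBS_O_WORKDIR=$(mktemp -d)   # mimics the submission directory
TMPDIR=$(mktemp -d)          # mimics the job's private temporary directory

# An input file living in the "submission directory"
echo "example input" > "$PBS_O_WORKDIR/input.txt"

# A job references its input via $PBS_O_WORKDIR, e.g. copying it
# into the temporary working directory before processing
cp "$PBS_O_WORKDIR/input.txt" "$TMPDIR/"
```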
Next come the commands for the program you actually want to run, for example:
python $HOME/myprog.py $PBS_O_WORKDIR/path/to/input.txt
Finally, stage any output files back from TMPDIR to permanent storage in your WORK directory:
mkdir -p $WORK/$PBS_JOBID
cp * $WORK/$PBS_JOBID
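Putting the pieces together, a complete job script might look like the following sketch; the resource values, module, and paths are illustrative, not prescriptive:

```
#PBS -l walltime=02:00:00
#PBS -l select=1:ncpus=8:mem=16gb

# Load the software the job needs
module load anaconda3/personal

# Run the program; input is read from the submission directory
python $HOME/myprog.py $PBS_O_WORKDIR/path/to/input.txt

# Stage output from the temporary working directory back to WORK
mkdir -p $WORK/$PBS_JOBID
cp * $WORK/$PBS_JOBID
```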
Choosing job resources
It's very important that you accurately specify your job's resource requirements in the #PBS directives.
Advice on choosing the right resource requirements for your work is in our job sizing guidelines. If you are using the system for the first time, you probably need to start in the throughput class, or in the general class if you know that the program you intend to use is capable of parallel execution.
As a general rule, the smaller the resource request, the less time the job will spend queuing.
There are some sample job scripts to help you get started on CX1. To get them, do:
module load my-first-job
Submitting and monitoring a job
Submit a job with the qsub command, passing your job script as the argument.
Once a job is submitted, you can follow its progress with the qstat command.
As an example, consider a job script called blastp.pbs, which starts a BLAST job on 16 cores on one node. The job is submitted with qsub blastp.pbs. This returns a unique id for the job (9582789).
Jobs belonging to one user can be monitored with qstat. In this example, the first invocation shows the job waiting in the queue (the "S" status column shows "Q"); the second shows the job running ("R"). You can also monitor the state of jobs via the web.
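The submit-and-monitor cycle might look roughly like this in a terminal session; the output is abridged and indicative, not literal:

```
$ qsub blastp.pbs
9582789.cx1
$ qstat
Job id         Name        User      Time Use  S  Queue
9582789.cx1    blastp.pbs  username  0         Q  throughput
$ qstat
Job id         Name        User      Time Use  S  Queue
9582789.cx1    blastp.pbs  username  00:02:10  R  throughput
```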
When a job finishes, it disappears from the queue. Any text output is captured by the system and returned to the submission directory, in two files named after the job script and suffixed with the job id.
If you need to delete a job, either while it is still queuing or while it is running, use the command qdel jobid.