Running SUMO on UGent HPC


Introduction

This page provides instructions on how to use the HPC infrastructure of Ghent University, although they might apply to other clusters as well. Make sure you have the latest nightly build of the SUMO Toolbox, which you can find here, as these instructions are not supported by SUMO Toolbox version 7.0.2. For more information about using the toolbox with distributed backends go here. Note though that many features of the toolbox have not yet been tested :).

For more information about the UGent HPC infrastructure itself, please visit their website. To learn how to run Matlab in general on the UGent HPC, see this page.

Compiling a standalone copy of the SUMO Toolbox for use on the HPC

The SUMO Toolbox needs to be compiled because the worker nodes of the HPC cannot connect to the license server and because the number of Matlab licenses is limited. By compiling the toolbox, you create a standalone program that does not need to connect to the license server. The compiled version of the toolbox will not have all functionality, however: some features, such as the GA algorithms and the ANN Toolbox, are not available.

The process of compiling the toolbox is straightforward. Copy the toolbox to your working directory on the server, then load the required modules for compilation. If the separate components of the toolbox still need to be compiled (such as the Java classes and some binary code for certain model types), do so using make. By default the examples directory is not included in the compiled version of the toolbox, so you probably want to add your example to the mcc options. To make a standalone SUMO Toolbox version, run make dist-csumo. The steps you need to follow are summarized below.

Currently, compiled MATLAB code can be run on haunter, gastly (recommended) and gengar. All other clusters (including the default delcatty) do not have the MCR installed. Also, remember that the checkpointing framework (for jobs > 72 hours) does not work with MATLAB, so SUMO runs can take no more than 72 hours. Therefore, include only one run (with no repeat) in your configs, and submit multiple jobs to repeat experiments (for example using array jobs).

  • Log on to the HPC
  • Upload a copy of the toolbox if you haven't done so. Tip: place and install the toolbox in $VSC_DATA
  • On the login node, select the proper cluster. For example, to use the gastly cluster:
        module swap cluster/gastly
  • Load the Matlab and ant module using these commands (you may want to change the versions to suit your needs):
        module load MATLAB/2012b
        module load ant
        export _JAVA_OPTIONS="-Xmx1024M -Xms512M" (use this option in case you need more heap space)
  • Open the Makefile in the SUMO root directory and change MATLABDIR to ${EBROOTMATLAB}
        MATLABDIR ?= ${EBROOTMATLAB}
  • In the terminal, change directory to the SUMO root directory and type in "make" to compile the toolbox for use (this will compile all the Java classes and other libraries needed by the SUMO Toolbox)
  • Verify that this first compilation has worked by starting Matlab in the terminal and doing a test run (this will only work on the login node)
  • Edit the Makefile again and add the path to the example(s) you plan to run. This is only required if you use a MATLAB script as simulator (MCR can only run scripts that have been included by mcc). The mcc line:
        ${MATLABDIR}/bin/mcc -m -v -a pathToMyExample -a ./src/matlab/ -a ./configure.m -a ./startup.m -R '-nodesktop, -nosplash' -d './dist/csumo-toolbox' ./go.m
  • Compile the SUMO Toolbox for standalone use by typing the following command in the terminal: make dist-csumo
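
Before running the two make steps above, it can help to check that the environment is actually ready. The following is only a minimal sketch: check_build_env is a hypothetical helper, not part of the SUMO Toolbox, and it assumes that the MATLAB module exports EBROOTMATLAB and that you call it from the SUMO root directory.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not shipped with the SUMO Toolbox).
# Returns 0 when the build environment looks usable, non-zero otherwise.
check_build_env() {
  # EBROOTMATLAB is expected to be set by "module load MATLAB/<version>".
  if [ -z "${EBROOTMATLAB:-}" ]; then
    echo "EBROOTMATLAB is not set; load the MATLAB module first" >&2
    return 1
  fi
  # mcc is the MATLAB compiler that the Makefile invokes.
  if [ ! -x "${EBROOTMATLAB}/bin/mcc" ]; then
    echo "No executable mcc found in ${EBROOTMATLAB}/bin" >&2
    return 2
  fi
  # The Makefile lives in the SUMO root directory.
  if [ ! -f Makefile ]; then
    echo "No Makefile in $(pwd); change to the SUMO root directory first" >&2
    return 3
  fi
  return 0
}
```

A possible use, assuming the function has been sourced in the current shell, is check_build_env && make && make dist-csumo, so that the compilation only starts when all three checks pass.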

Testing on the HPC infrastructure

You can only run MATLAB code that was included in the mcc options. To run the standalone version of the toolbox, use the run_go.sh bash script, which takes two arguments: the MCR root and your configuration file. Note: test your configuration file extensively, both locally and on the debug queue of the HPC, to avoid wasting HPC resources, as the infrastructure can be very busy at times.


  • Log on to the HPC and request some worker nodes (see the HPC wiki for more information). For example, this command requests an interactive session:
        qsub -I 
  • Wait for the session to be started. In the terminal, change directory to where the compiled SUMO Toolbox is located (by default this is dist/csumo-toolbox)
  • Set the MCRROOT environment variable to point to the MCR root, e.g.:
        export MCRROOT=/apps/gent/gengar/harpertown/software/MATLAB/MCR_2011a/v715
  • Try to run a configuration xml-file using:
        ./run_go.sh $MCRROOT pathToYourConfig/yourConfig.xml
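
The test run above fails in unhelpful ways when the MCR root or the configuration path is wrong. As a rough sketch, a small wrapper can check both before starting the toolbox. run_sumo_test is a hypothetical name, not something provided by the toolbox, and it assumes you are in the directory that contains run_go.sh.

```shell
#!/bin/bash
# Hypothetical wrapper around run_go.sh for interactive test runs.
# Usage: run_sumo_test <MCR root> <config.xml>
run_sumo_test() {
  local mcrroot="$1" config="$2"
  # The first argument of run_go.sh must be the MCR root directory.
  if [ ! -d "$mcrroot" ]; then
    echo "MCR root $mcrroot is not a directory" >&2
    return 1
  fi
  # The second argument is the SUMO configuration file.
  if [ ! -f "$config" ]; then
    echo "Configuration file $config not found" >&2
    return 2
  fi
  # run_go.sh is generated by mcc next to the compiled executable.
  if [ ! -x ./run_go.sh ]; then
    echo "No executable run_go.sh in $(pwd)" >&2
    return 3
  fi
  ./run_go.sh "$mcrroot" "$config"
}
```

With the function sourced, a test run would then look like run_sumo_test "$MCRROOT" pathToYourConfig/yourConfig.xml.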

Only use this method for testing! Submit jobs for the real work (see below).

Submit SUMO jobs

The following example job script runs a SUMO job on the HPC. It reserves a separate MCR cache per job to avoid issues when two MATLAB jobs end up on the same node.

#!/bin/bash

#PBS -N JOBNAME
#PBS -l nodes=1:ppn=1
#PBS -l walltime=11:59:00
#PBS -l vmem=8gb

## name of .m file
name=go

## options to pass to the executable (= parameters for go script)
opts="config/default.xml"

## directory where the executable and script can be found
## PBS_O_WORKDIR variable points to the directory you are submitting from
dir=/path/to/compiled/sumo-toolbox

## version: version of MATLAB
version=2012b

module load MATLAB/${version}

if [ ! -d "$dir" ]
then
  echo "Directory $dir is not a directory"
  exit 1
fi

cd "$dir"

if [ ! -x "$name" ]
then
  echo "No executable $name found."
  exit 2
fi
script=run_${name}.sh
if [ ! -x "$script" ]
then
  echo "No run script $script found"
  exit 3
fi

## make cache dir
## TMPDIR is set and created by torque. 1 unique dir per job
cdir=$TMPDIR/mcrcache
mkdir -p "$cdir"
if [ ! -d "$cdir" ]
then
  echo "No tempdir $cdir found."
  exit 1
fi

## set dir
export MCR_CACHE_ROOT=$cdir
## 1GB cache (more than large enough)
export MCR_CACHE_SIZE=$((1024*1024*1024))

## real running
./$script ${EBROOTMATLAB} $opts > ~/${PBS_JOBNAME}.log 2> ~/${PBS_JOBNAME}.err

Ideally, configure output to be stored on the local disk of the node to avoid network overhead (although this is currently not possible because environment variables are unavailable in the XML configuration). An alternative is $VSC_SCRATCH (fast-access network storage): echo ${VSC_SCRATCH} prints the absolute path, which can be used in the config.

Do not claim many cores: the advantage of multicore computing in MATLAB is quite limited, so 1 or 2 should be enough. Small requests have the additional advantage that the job can be placed on nodes which are only partially filled.
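
Since environment variables cannot be used inside the XML, one possible workaround is to generate the config from a template just before running. The sketch below is an assumption, not a toolbox feature: render_config is a hypothetical helper, and @OUTPUT_DIR@ is a placeholder token you would put in your own template wherever the output directory is configured.

```shell
#!/bin/bash
# Hypothetical helper: print a copy of a config template in which the
# placeholder @OUTPUT_DIR@ is replaced by an absolute directory path.
# Usage: render_config <template.xml> <output directory>
render_config() {
  local template="$1" outdir="$2"
  # '|' is used as the sed delimiter so that slashes in the path need no
  # escaping; this assumes the path itself contains no '|' or '&'.
  sed "s|@OUTPUT_DIR@|${outdir}|g" "$template"
}
```

In a job script this could be used as render_config config/template.xml "$VSC_SCRATCH/sumo-output" > config/generated.xml, after which opts points at config/generated.xml. The file and directory names here are illustrative only.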

To submit the job to the scheduler:

        qsub -q long run_script.sh

To get an overview of all queues and their properties:

        qstat -q

To get an overview of the status of your submitted jobs:

        qstat -u `whoami`

For more advanced topics (such as array jobs), please refer to the HPC user wiki.
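
As a starting point for repeating an experiment with one run per job, the fragment below sketches how a job script could pick a different config for each array task. It assumes Torque-style array jobs, submitted with something like qsub -t 1-10 run_script.sh, where the task index is exposed as PBS_ARRAYID; check the HPC user wiki for the mechanism currently supported. The config/run_NN.xml naming scheme and the config_for_index helper are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: map an array task index to a config file name,
# e.g. index 3 -> config/run_03.xml. The naming scheme is an assumption.
config_for_index() {
  local idx="$1"
  printf 'config/run_%02d.xml\n' "$idx"
}

# PBS_ARRAYID is assumed to be set by Torque for array jobs; fall back
# to 1 so the script also works as a single, non-array job.
opts="$(config_for_index "${PBS_ARRAYID:-1}")"
echo "Selected config: $opts"
```

In the example job script, this would replace the fixed opts="config/default.xml" line, so that each array task runs its own single-run configuration and stays within the 72-hour limit.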