SUMOwiki - User contributions [en]

Running SUMO on UGent HPC

2014-03-17T15:14:50Z

Javdrher: /* Testing on the HPC infracture */

== Introduction ==

This page provides instructions on how to use the HPC infrastructure of Ghent University. The instructions are written for that infrastructure only, although they might apply to other clusters as well. Make sure you have the latest nightly build of the SUMO Toolbox, which you can find [[Downloading|here]], as these instructions are not supported by SUMO Toolbox version 7.0.2. For more information about using the toolbox with distributed backends, go [[Add_Distributed_Backend|here]]. Note, though, that many features of the toolbox are not yet tested :).

For more information about the UGent HPC infrastructure itself, please visit their [http://hpc.ugent.be/userwiki/index.php/Main_Page website].
To learn how to run MATLAB in general on the UGent HPC, see this [http://hpc.ugent.be/userwiki/index.php/Main_Page/User:MATLAB page].

== Compiling a standalone copy of the SUMO Toolbox for use on the HPC ==
The SUMO Toolbox needs to be compiled because the worker nodes of the HPC cannot connect to the license server and because the number of MATLAB licenses is limited. By compiling the toolbox, you create a standalone program that does not need to connect to the license server. The compiled version of the toolbox does not have all functionality, however: some '''functionalities''', such as the GA algorithms and the ANN Toolbox, are not available.

The process of compiling the toolbox is fairly straightforward. Copy the toolbox to your working directory on the server, then load the required modules for compilation. If the separate components of the toolbox still need to be compiled (such as the Java classes and some binary code for certain model types), do so using <code>make</code>.
By default, the examples directory is not included in the compiled version of the toolbox, so you probably want to add your example to the mcc options. To build the standalone SUMO Toolbox, run <code>make dist-csumo</code>. The steps you need to follow are summarized below.

Currently, compiled MATLAB code can be run on haunter, gastly (recommended) and gengar. All other clusters (including the default, delcatty) do not have the MCR installed. Also, remember that the checkpointing framework (for jobs > 72 hours) does not work with MATLAB, so SUMO runs can take no more than 72 hours. Therefore, include only one run (with no repeat) in your configs, and submit multiple jobs to repeat experiments (for example using array jobs).

* Log on to the HPC
* Upload a copy of the toolbox if you haven't done so. Tip: place and install the toolbox in $VSC_DATA
* On the login node, select the proper cluster. For example, to use the gastly cluster:
module swap cluster/gastly
* Load the MATLAB and ant modules using these commands (you may want to change the versions to suit your needs):
module load MATLAB/2012b
module load ant
export _JAVA_OPTIONS="-Xmx1024M -Xms512M"   # use this option in case you need more heap space
* Open the Makefile in the SUMO root directory and change MATLABDIR to ${EBROOTMATLAB}
MATLABDIR ?= ${EBROOTMATLAB}
* In the terminal, change directory to the SUMO root directory and type in "make" to compile the toolbox for use (this will compile all the Java classes and other libraries needed by the SUMO Toolbox)
* Verify that this first compilation has worked by starting MATLAB in the terminal and doing a test run (this will only work on the login node)
* Edit the Makefile again and add the path to the example(s) you plan to run. This is only required if you use a MATLAB script as simulator (MCR can only run scripts that have been included by mcc). The mcc line:
${MATLABDIR}/bin/mcc -m -v -a '''pathToMyExample''' -a ./src/matlab/ -a ./configure.m -a ./startup.m -R '-nodesktop, -nosplash' -d './dist/csumo-toolbox' ./go.m
* Compile the SUMO Toolbox for ''standalone'' use by typing the following command into the terminal: <code>make dist-csumo</code>
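
The command-line part of the steps above can be sketched as a small script. This is only an illustration under a few assumptions: the function names <code>run</code> and <code>build_csumo</code> are made up for this example, the SUMO root path is whatever you pass in, and passing <code>MATLABDIR</code> on the <code>make</code> command line is assumed to be equivalent to editing the Makefile (command-line variables take precedence in make). By default the script only prints the commands, so it can be tried safely anywhere. The test run in MATLAB and the edit of the mcc line remain manual steps.

```shell
#!/bin/bash
# Sketch of the build steps. By default it only prints the commands
# (DRY_RUN=1); set DRY_RUN=0 on a login node to actually execute them.

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_csumo() {
    local sumo_root="$1"               # SUMO root directory (your own path)
    local cluster="${2:-gastly}"       # a cluster that has the MCR installed
    local matlab="${3:-MATLAB/2012b}"  # module version, adjust as needed

    run module swap "cluster/${cluster}" || return 1
    run module load "${matlab}" || return 1
    run module load ant || return 1
    run cd "${sumo_root}" || return 1
    # Assumption: overriding MATLABDIR here replaces the Makefile edit.
    # EBROOTMATLAB is only set after the MATLAB module has been loaded.
    run make "MATLABDIR=${EBROOTMATLAB:-}" || return 1
    # If your simulator is a MATLAB script, add it to the mcc line in the
    # Makefile before this last step.
    run make "MATLABDIR=${EBROOTMATLAB:-}" dist-csumo || return 1
}
```

For example, <code>build_csumo "$VSC_DATA/sumo-toolbox"</code> lists the commands, and <code>DRY_RUN=0 build_csumo "$VSC_DATA/sumo-toolbox"</code> runs them (the directory name is a placeholder).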

== Testing on the HPC infrastructure ==
You can only run MATLAB code that was included in the mcc options. To run the standalone version of the toolbox, use the <code>run_go.sh</code> bash script, which takes two arguments: the MCR root and your configuration file. Note: make sure you test your configuration file extensively, both locally and on the debug queue of the HPC, to avoid wasting your HPC resources, as the infrastructure can be very busy at times.

* Log on to the HPC and request some worker nodes (see the HPC wiki for more information). For example, this command requests an interactive session:
qsub -I
* Wait for the session to be started. In the terminal, change directory to where the compiled SUMO Toolbox is located (by default this is dist/csumo-toolbox)
* Set the MCRROOT environment variable to point to the MCR root, e.g.:
export MCRROOT=/apps/gent/gengar/harpertown/software/MATLAB/MCR_2011a/v715
* Try to run a configuration xml-file using:
./run_go.sh $MCRROOT pathToYourConfig/yourConfig.xml

'''Only use this method for testing! Submit jobs for the real work (see below).'''
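
Since interactive sessions can be scarce, it may help to check the two arguments before calling <code>run_go.sh</code>. The following is a minimal sketch, not part of the toolbox: the function name <code>check_sumo_inputs</code> is hypothetical, and it assumes you are already in the directory that contains the compiled toolbox.

```shell
#!/bin/bash
# Pre-flight check for the two arguments of run_go.sh.
# Reports every problem it finds and returns non-zero if there is any.

check_sumo_inputs() {
    local mcrroot="$1" config="$2" status=0

    if [ ! -d "$mcrroot" ]; then
        echo "MCR root '$mcrroot' is not a directory" >&2
        status=1
    fi
    if [ ! -r "$config" ]; then
        echo "Configuration file '$config' is not readable" >&2
        status=1
    fi
    if [ ! -x ./run_go.sh ]; then
        echo "run_go.sh not found or not executable in $(pwd)" >&2
        status=1
    fi
    return "$status"
}
```

A possible use is <code>check_sumo_inputs "$MCRROOT" pathToYourConfig/yourConfig.xml && ./run_go.sh "$MCRROOT" pathToYourConfig/yourConfig.xml</code>, so the toolbox only starts when all three checks pass.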

== Submit SUMO jobs ==
Example job script to run a SUMO job on the HPC. It reserves an MCR cache to avoid issues when 2 MATLAB jobs end up on the same node.

<source lang="Bash">
#!/bin/bash

#PBS -N JOBNAME
#PBS -l nodes=1:ppn=1
#PBS -l walltime=11:59:00
#PBS -l vmem=8gb

## name of .m file
name=go

## options to pass to the executable (= parameters for go script)
opts="config/default.xml"

## directory where the executable and script can be found
## PBS_O_WORKDIR variable points to the directory you are submitting from
dir=/path/to/compiled/sumo-toolbox

## version: version of MATLAB
version=2012b

module load MATLAB/${version}

if [ ! -d "$dir" ]
then
    echo "Directory $dir is not a directory"
    exit 1
fi

cd "$dir"

if [ ! -x "$name" ]
then
    echo "No executable $name found."
    exit 2
fi
script=run_${name}.sh
if [ ! -x "$script" ]
then
    echo "No run script $script found"
    exit 3
fi

## make cache dir
## TMPDIR is set and created by torque. 1 unique dir per job
cdir=$TMPDIR/mcrcache
mkdir -p "$cdir"
if [ ! -d "$cdir" ]
then
    echo "No tempdir $cdir found."
    exit 1
fi

## set dir
export MCR_CACHE_ROOT=$cdir
## 1GB cache (more than large enough)
export MCR_CACHE_SIZE=$((1024*1024*1024))

## real running
./$script ${EBROOTMATLAB} $opts > ~/${PBS_JOBNAME}.log 2> ~/${PBS_JOBNAME}.err
</source>
Ideally, configure the output to be stored on the local disk of the node to avoid network overhead (although this is currently not possible, as environment variables are unavailable in the XML!). An alternative is $VSC_SCRATCH (fast-access network storage): <code>echo ${VSC_SCRATCH}</code> prints the absolute path, which can be used in the config. Do not claim many cores: the advantage of multicore computing in MATLAB is quite limited, so 1 or 2 should be OK. This has the additional advantage that the job can be placed on nodes which are only partially filled.
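
Because environment variables cannot be used inside the XML, one possible workaround is to fill in the absolute path before submitting. The sketch below assumes you keep a copy of your config in which the output directory is written as the placeholder <code>@OUTPUT_DIR@</code>; both the placeholder and the function name <code>render_config</code> are invented for this example and are not SUMO Toolbox conventions.

```shell
#!/bin/bash
# Replace the (hypothetical) @OUTPUT_DIR@ placeholder in a config template
# with an absolute directory and write the result to a new file.

render_config() {
    local template="$1" output_dir="$2" target="$3"
    local escaped
    # Escape the characters that are special in a sed replacement: & | and \
    escaped=$(printf '%s' "$output_dir" | sed 's/[&|\\]/\\&/g')
    sed "s|@OUTPUT_DIR@|${escaped}|g" "$template" > "$target"
}
```

For instance, <code>render_config config/template.xml "${VSC_SCRATCH}/sumo-output" config/run.xml</code> would produce a config that can be passed to the job script through <code>opts</code> (file and directory names are placeholders).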

To submit the job to the scheduler:
qsub -q long run_script.sh

To get an overview of all queues and their properties:
qstat -q

To get an overview of the status of your submitted jobs:
qstat -u `whoami`

For more advanced topics (such as array jobs), please refer to the HPC user wiki.
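
As a rough illustration of repeating an experiment with one run per job, the fragment below selects a different config for each array task. It assumes a Torque-style scheduler that exposes the task index as <code>PBS_ARRAYID</code>, and a naming scheme <code>config/run_&lt;index&gt;.xml</code> that is made up for this example; check the HPC user wiki for the exact array job syntax on your cluster.

```shell
#!/bin/bash
# Pick a per-run config file from the array index.

config_for_run() {
    # Hypothetical naming scheme: config/run_<index>.xml
    printf 'config/run_%s.xml\n' "$1"
}

# PBS_ARRAYID is assumed to be set by the scheduler for array jobs;
# default to 1 so the fragment also runs outside the scheduler.
run_id="${PBS_ARRAYID:-1}"
opts="$(config_for_run "$run_id")"
echo "run ${run_id} would use: ${opts}"
```

In the job script above, these lines would replace the fixed <code>opts="config/default.xml"</code> assignment. You may also want to include the index in the log file names so that the runs do not overwrite each other's output.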

Running SUMO on UGent HPC

2014-03-17T15:14:34Z

Javdrher: /* Testing on the HPC infracture */

Running SUMO on UGent HPC

2014-03-17T15:14:16Z

Javdrher: /* Testing on the HPC infracture */

Running SUMO on UGent HPC

2014-03-17T15:14:08Z

Javdrher: /* Testing on the HPC infracture */

Running SUMO on UGent HPC

2014-03-17T15:13:46Z

Javdrher: /* Running an example on the HPC infracture */

== Introduction ==

This page provides instructions on how to use the HPC infrastructure of Ghent University. The instructions are written for that infrastructure only, although they might apply to other clusters as well. Make sure you have the latest nightly build of the SUMO Toolbox, which you can find [[Downloading|here]], as these instructions are not supported by SUMO Toolbox version 7.0.2. For more information about using the toolbox with distributed backends, go [[Add_Distributed_Backend|here]]. Note, though, that many features of the toolbox are not yet tested :).

For more information about the UGent HPC infrastructure itself, please visit their [http://hpc.ugent.be/userwiki/index.php/Main_Page website].
To learn how to run MATLAB in general on the UGent HPC, see this [http://hpc.ugent.be/userwiki/index.php/Main_Page/User:MATLAB page].

== Compiling a standalone copy of the SUMO Toolbox for use on the HPC ==
The SUMO Toolbox needs to be compiled because the worker nodes of the HPC cannot connect to the license server and because the number of MATLAB licenses is limited. By compiling the toolbox, you create a standalone program that does not need to connect to the license server. The compiled version of the toolbox does not have all functionality, however: some '''functionalities''', such as the GA algorithms and the ANN Toolbox, are not available.

The process of compiling the toolbox is fairly straightforward. Copy the toolbox to your working directory on the server, then load the required modules for compilation. If the separate components of the toolbox still need to be compiled (such as the Java classes and some binary code for certain model types), do so using <code>make</code>.
By default, the examples directory is not included in the compiled version of the toolbox, so you probably want to add your example to the mcc options. To build the standalone SUMO Toolbox, run <code>make dist-csumo</code>. The steps you need to follow are summarized below.

Currently, compiled MATLAB code can be run on haunter, gastly (recommended) and gengar. All other clusters (including the default, delcatty) do not have the MCR installed. Also, remember that the checkpointing framework (for jobs > 72 hours) does not work with MATLAB, so SUMO runs can take no more than 72 hours. Therefore, include only one run (with no repeat) in your configs, and submit multiple jobs to repeat experiments (for example using array jobs).

* Log on to the HPC
* Upload a copy of the toolbox if you haven't done so. Tip: place and install the toolbox in $VSC_DATA
* On the login node, select the proper cluster. For example, to use the gastly cluster:
module swap cluster/gastly
* Load the MATLAB and ant modules using these commands (you may want to change the versions to suit your needs):
module load MATLAB/2012b
module load ant
export _JAVA_OPTIONS="-Xmx1024M -Xms512M" (use this option in case you need more heap space)
* Open the Makefile in the SUMO root directory and change MATLABDIR to ${EBROOTMATLAB}
MATLABDIR ?= ${EBROOTMATLAB}
* In the terminal, change directory to the SUMO root directory and type in "make" to compile the toolbox for use (this will compile all the Java classes and other libraries needed by the SUMO Toolbox)
* Verify that this first compilation has worked by starting MATLAB in the terminal and doing a test run (this will only work on the login node)
* Edit the Makefile again and add the path to the example(s) you plan to run. This is only required if you use a MATLAB script as simulator (MCR can only run scripts that have been included by mcc). The mcc line:
${MATLABDIR}/bin/mcc -m -v -a '''pathToMyExample''' -a ./src/matlab/ -a ./configure.m -a ./startup.m -R '-nodesktop, -nosplash' -d './dist/csumo-toolbox' ./go.m
* Compile the SUMO Toolbox for ''standalone'' use by running the following command in the terminal: <code>make dist-csumo</code>
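
Before running <code>make</code>, it can help to confirm that the modules were actually loaded. The following is a minimal sketch and not part of the SUMO Toolbox: the function name <code>check_build_env</code> is hypothetical, and it assumes only what the steps above state, namely that the MATLAB module sets <code>EBROOTMATLAB</code>, that mcc lives in <code>${EBROOTMATLAB}/bin/mcc</code>, and that the ant module puts <code>ant</code> on the PATH.

```shell
#!/bin/bash
# Hypothetical helper (not part of SUMO): verify the build environment
# described in the steps above before invoking make.
check_build_env() {
    # EBROOTMATLAB is expected to be set by "module load MATLAB/<version>".
    if [ -z "${EBROOTMATLAB:-}" ]; then
        echo "EBROOTMATLAB is not set; load the MATLAB module first" >&2
        return 1
    fi
    # The Makefile calls ${MATLABDIR}/bin/mcc, so it must be executable.
    if [ ! -x "${EBROOTMATLAB}/bin/mcc" ]; then
        echo "mcc not found in ${EBROOTMATLAB}/bin" >&2
        return 2
    fi
    # ant is assumed to be on the PATH after "module load ant".
    if ! command -v ant >/dev/null 2>&1; then
        echo "ant not found on PATH; load the ant module first" >&2
        return 3
    fi
    echo "Build environment looks complete"
}
```

A possible use is <code>check_build_env && make</code>, so that the build only starts when all three checks pass.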

== Testing on the HPC infrastructure ==
You can only run MATLAB code that was included in the mcc options. To run the standalone version of the toolbox, use the <code>run_go.sh</code> bash script, which takes two arguments: the MCR root and your configuration file. Note: test your configuration file extensively, both locally and on the debug queue of the HPC, to avoid wasting HPC resources, as the infrastructure can be very busy at times.

* Log on to the HPC and request a worker node (see the HPC wiki for more information). For example, this command requests an interactive session:
qsub -I
* Wait for the session to be started. In the terminal, change directory to where the compiled SUMO Toolbox is located (by default this is SUMORoot/csumo)
* Set the MCRROOT environment variable to point to the MCR root, e.g.:
export MCRROOT=/apps/gent/gengar/harpertown/software/MATLAB/MCR_2011a/v715
* Try to run a configuration XML file using:
./run_go.sh $MCRROOT pathToYourConfig/yourConfig.xml

'''Only use this method for testing!'''
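
For such test runs, a small wrapper can catch the most common mistakes (a wrong MCR root, a mistyped config path, or the wrong working directory) before the MCR starts. This is only a sketch under assumptions: the function name <code>test_run</code> is hypothetical, and it relies solely on the two-argument interface of <code>run_go.sh</code> described above.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of SUMO): check the two arguments of
# run_go.sh before starting an interactive test run.
test_run() {
    local mcrroot="$1" config="$2"
    if [ ! -d "$mcrroot" ]; then
        echo "MCR root '$mcrroot' is not a directory" >&2
        return 1
    fi
    if [ ! -r "$config" ]; then
        echo "Configuration file '$config' is not readable" >&2
        return 2
    fi
    # run_go.sh is expected in the current directory (the compiled toolbox).
    if [ ! -x ./run_go.sh ]; then
        echo "run_go.sh not found; cd to the compiled toolbox first" >&2
        return 3
    fi
    ./run_go.sh "$mcrroot" "$config"
}
```

It would be called as <code>test_run "$MCRROOT" pathToYourConfig/yourConfig.xml</code> from the directory that contains the compiled toolbox.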

== Submit SUMO jobs ==
The following example job script runs a SUMO job on the HPC. It reserves a separate MCR cache for each job to avoid issues when two MATLAB jobs end up on the same node.

<source lang="Bash">
#!/bin/bash

#PBS -N JOBNAME
#PBS -l nodes=1:ppn=1
#PBS -l walltime=11:59:00
#PBS -l vmem=8gb

## name of .m file
name=go

## options to pass to the executable (= parameters for go script)
opts="config/default.xml"

## directory where the executable and script can be found
## PBS_O_WORKDIR variable points to the directory you are submitting from
dir=/path/to/compiled/sumo-toolbox

## version: version of MATLAB
version=2012b

module load MATLAB/${version}

if [ ! -d $dir ]
then
echo "$dir is not a directory"
exit 1
fi

cd $dir

if [ ! -x $name ]
then
echo "No executable $name found."
exit 2
fi
script=run_${name}.sh
if [ ! -x $script ]
then
echo "No run script $script found"
exit 3
fi

## make cache dir
## TMPDIR is set and created by torque. 1 unique dir per job
cdir=$TMPDIR/mcrcache
mkdir -p $cdir
if [ ! -d $cdir ]
then
echo "No tempdir $cdir found."
exit 1
fi

## set dir
export MCR_CACHE_ROOT=$cdir
## 1GB cache (more than large enough)
export MCR_CACHE_SIZE=$((1024*1024*1024))

## real running
./$script ${EBROOTMATLAB} $opts > ~/${PBS_JOBNAME}.log 2> ~/${PBS_JOBNAME}.err
</source>
Ideally, output is stored on the local disk of the node to avoid network overhead, although this is currently not possible because environment variables are unavailable in the XML. An alternative is $VSC_SCRATCH (fast-access network storage): <code>echo ${VSC_SCRATCH}</code> prints the absolute path, which can be used in the config. Do not claim many cores: the advantage of multicore computing in MATLAB is quite limited, so 1 or 2 should be enough. Small jobs have the additional advantage that they can be placed on nodes which are only partially filled.
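
Because environment variables are not expanded inside the XML, one possible workaround is to generate the config from a template just before submitting. The sketch below is an assumption, not a SUMO feature: the placeholder <code>@OUTPUT_DIR@</code>, the function name <code>render_config</code> and the file names are all hypothetical, and where the output directory is set in a real SUMO config is not covered here.

```shell
#!/bin/bash
# Hypothetical helper: copy a config template, replacing the placeholder
# @OUTPUT_DIR@ (an invented marker, not SUMO syntax) with an absolute path.
render_config() {
    local template="$1" outdir="$2" target="$3"
    # '|' is used as the sed delimiter so slashes in the path need no escaping.
    # Paths containing '|' or '&' would need extra escaping.
    sed "s|@OUTPUT_DIR@|${outdir}|g" "$template" > "$target"
}
```

For example, <code>render_config config/template.xml "${VSC_SCRATCH}/sumo-output" config/default.xml</code> would write a config that contains the absolute scratch path.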

To submit the job to the scheduler:
qsub -q long run_script.sh

To get an overview of all queues and their properties:
qstat -q

To get an overview of the status of your submitted jobs:
qstat -u `whoami`

For more advanced topics (such as array jobs), please refer to the HPC user wiki.
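
As a rough illustration of repeating an experiment with an array job, the sketch below assumes a Torque scheduler, where <code>-t</code> requests an index range and each element receives its index in <code>PBS_ARRAYID</code>; check the HPC user wiki for what your cluster supports. The naming scheme <code>config/run_N.xml</code> and the function <code>config_for_index</code> are hypothetical.

```shell
#!/bin/bash
#PBS -N sumo_array
#PBS -l nodes=1:ppn=1
#PBS -l walltime=11:59:00
#PBS -t 1-5

# Hypothetical naming scheme: one config file per repetition.
config_for_index() {
    echo "config/run_${1}.xml"
}

# PBS_ARRAYID is set by Torque for array jobs; fall back to 1 elsewhere.
opts=$(config_for_index "${PBS_ARRAYID:-1}")

# In a real job script this line would be replaced by the run_go.sh call
# from the example job script, using $opts as the configuration file.
echo "Array element ${PBS_ARRAYID:-1} would use $opts"
```

Each array element then runs a single SUMO run with its own config, which keeps every job below the 72 hour limit.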

Running SUMO on UGent HPC

2014-03-17T15:12:19Z

Javdrher: /* Compiling a standalone copy of the SUMO Toolbox for use on the HPC */


Running SUMO on UGent HPC

2014-03-17T15:10:30Z

Javdrher: /* Submit SUMO jobs */


Running SUMO on UGent HPC

2014-03-17T15:10:06Z

Javdrher: /* Submit SUMO jobs */


Running SUMO on UGent HPC

2014-03-17T15:08:27Z

Javdrher: /* Submit SUMO jobs */


Running SUMO on UGent HPC

2014-03-17T15:07:56Z

Javdrher: /* Submit SUMO jobs */

== Introduction ==

This page explains how to run the SUMO Toolbox on the HPC infrastructure of Ghent University. The instructions target this infrastructure only, although they might apply to other clusters as well. Make sure you have the latest nightly build of the SUMO Toolbox, which you can find [[Downloading|here]], as these instructions are not supported by SUMO Toolbox version 7.0.2. For more information about using the toolbox with distributed backends, go [[Add_Distributed_Backend|here]]. Note though that many features of the toolbox are not yet tested :).

For more information about the UGent HPC infrastructure itself, please visit their [http://hpc.ugent.be/userwiki/index.php/Main_Page website].
To learn how to run Matlab in general on the UGent HPC see this [http://hpc.ugent.be/userwiki/index.php/Main_Page/User:MATLAB page].

== Compiling a standalone copy of the SUMO Toolbox for use on the HPC ==
The SUMO Toolbox needs to be compiled because the worker nodes of the HPC cannot connect to the license server and because the number of Matlab licenses is limited. By compiling the toolbox, you create a standalone program that does not need to connect to the license server. The compiled version does not offer all functionality, however: some '''functionalities''', such as the GA algorithms and the ANN Toolbox, are not available.

The process of compiling the toolbox is pretty straightforward. Copy the toolbox to your working directory on the server, then load the required modules for compilation. If the separate components of the toolbox still need to be compiled (such as the Java classes and some binary code for certain model types), do so using make.
By default the examples directory is not included in the compiled version of the toolbox and you probably want to add your example to the mcc options. To make a standalone SUMO Toolbox version, type in make-csumo. The steps you need to follow are summarized below.

Currently, compiled MATLAB code can be ran on haunter, gastly (recommended) and gengar. All other clusters (including the default delcatty) do not have MCR installed. Also, remember that the checkpointing framework (for jobs > 72 hours) does not work with MATLAB. SUMO Runs can take no more than 72 hours.

* Log on to the HPC
* Upload a copy of the toolbox if you haven't done so. Tip: place and install the toolbox in $VSC_DATA
* On the Login node, select the proper cluster. For example, to use the gastly cluster:
module swap cluster/gastly
* Load the Matlab and ant module using these commands (you may want to change the versions to suit your needs):
module load MATLAB/2012b
module load ant
export _JAVA_OPTIONS="-Xmx1024M -Xms512M" (use this option in case you need more heap space)
* Open the Makefile in the SUMO root directory and change MATLABDIR to ${EBROOTMATLAB}
MATLABDIR ?= ${EBROOTMATLAB}
* In the terminal, change directory to the SUMO root directory and type in "make" to compile the toolbox for use (this will compile all the Java classes and other libraries needed by the SUMO Toolbox)
* Verify that this first compilation has worked by starting Matlab in the terminal and doing a test run (this will only work on the login node)
* Edit the Makefile again and add the path to the example(s) you plan to run. This is only required if you use a MATLAB script as simulator (MCR can only run scripts that have been included by mcc). The mcc line:
${MATLABDIR}/bin/mcc -m -v -a '''pathToMyExample''' -a ./src/matlab/ -a ./configure.m -a ./startup.m -R '-nodesktop, -nosplash' -d './dist/csumo-toolbox' ./go.m
* Compile the SUMO Toolbox for ''standalone'' use by typing the following command into the terminal: <code>make dist-csumo</code>
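
The command-line part of the steps above can be collected in a small script. This is only a sketch: the names <code>SUMO_ROOT</code>, <code>DRY_RUN</code>, <code>run</code> and <code>build_csumo</code> are hypothetical, the default toolbox location is an assumption, and the Makefile edits (MATLABDIR and the mcc line) as well as the Matlab test run still have to be done by hand. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/bash
# Sketch of the build sequence described above (hypothetical helper names).
# SUMO_ROOT: assumed location of the toolbox; adjust to your own copy.
SUMO_ROOT="${SUMO_ROOT:-${VSC_DATA:-}/sumo-toolbox}"
# DRY_RUN=1 prints the commands, DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_csumo() {
    run module swap cluster/gastly   # select the cluster
    run module load MATLAB/2012b     # versions may need changing
    run module load ant
    run cd "$SUMO_ROOT"
    run make                         # Java classes and other libraries
    run make dist-csumo              # standalone build via mcc
}
```

After sourcing the file on the login node, <code>build_csumo</code> lists the commands; <code>DRY_RUN=0</code> must be set before sourcing to actually execute them.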

== Running an example on the HPC infrastructure ==
You can only run Matlab code that was included in the mcc options. To run the standalone version of the toolbox, use the <code>run_go.sh</code> bash script, which takes two arguments: the MCR root and your configuration file. Note: test your configuration file extensively, both locally and on the debug queue of the HPC, to avoid wasting your HPC resources, as the infrastructure can be very busy at times.

* Log on to the HPC and request a worker node (see the HPC wiki for more information). For example, this command requests a node in the debug queue:
qsub -I
* In the terminal, change directory to where the compiled SUMO Toolbox is located (by default this is SUMORoot/csumo)
* Set the MCRROOT environment variable to point to the MCR root, e.g.:
export MCRROOT=/apps/gent/gengar/harpertown/software/MATLAB/MCR_2011a/v715
* Run a configuration XML file using:
./run_go.sh $MCRROOT pathToYourConfig/yourConfig.xml
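
Since a mistyped MCR root or configuration path only shows up once the run starts, it can help to check both arguments before calling <code>run_go.sh</code>. The sketch below shows one way to do that. The function <code>check_run_inputs</code> and its return codes are hypothetical and not part of the toolbox.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the two arguments of run_go.sh.
# Returns 1 if the MCR root is not a directory, 2 if the configuration
# file cannot be read, and 0 otherwise.
check_run_inputs() {
    local mcrroot="$1" config="$2"
    if [ ! -d "$mcrroot" ]; then
        echo "MCR root '$mcrroot' is not a directory" >&2
        return 1
    fi
    if [ ! -r "$config" ]; then
        echo "Configuration file '$config' is not readable" >&2
        return 2
    fi
    return 0
}
```

A possible use is <code>check_run_inputs "$MCRROOT" pathToYourConfig/yourConfig.xml && ./run_go.sh "$MCRROOT" pathToYourConfig/yourConfig.xml</code>, so the toolbox is only started when both paths exist.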

== Submit SUMO jobs ==
Example job script to run a SUMO job on the HPC. It reserves an MCR cache to avoid issues when 2 MATLAB jobs end up on the same node.

<source lang="Bash">
#!/bin/bash

#PBS -N JOBNAME
#PBS -l nodes=1:ppn=1
#PBS -l walltime=11:59:00
#PBS -l vmem=8gb

## name of .m file
name=go

## options to pass to the executable (= parameters for go script)
opts="config/default.xml"

## directory where the executable and script can be found
## PBS_O_WORKDIR variable points to the directory you are submitting from
dir=/path/to/compiled/sumo-toolbox

## version: version of MATLAB
version=2012b

module load MATLAB/${version}

if [ ! -d $dir ]
then
echo "Directory $dir is not a directory"
exit 1
fi

cd $dir

if [ ! -x $name ]
then
echo "No executable $name found."
exit 2
fi
script=run_${name}.sh
if [ ! -x $script ]
then
echo "No run script $script found"
exit 3
fi

## make cache dir
## TMPDIR is set and created by torque. 1 unique dir per job
cdir=$TMPDIR/mcrcache
mkdir -p $cdir
if [ ! -d $cdir ]
then
echo "No tempdir $cdir found."
exit 1
fi

## set dir
export MCR_CACHE_ROOT=$cdir
## 1GB cache (more than large enough)
export MCR_CACHE_SIZE=$((1024*1024*1024))

## real running
./$script ${EBROOTMATLAB} $opts > ~/${PBS_JOBNAME}.log 2> ~/${PBS_JOBNAME}.err
</source>
Ideally, configure the output to be stored on the local disk of the node to avoid network overhead (although this is currently not possible, as environment variables are unavailable in the XML!). An alternative is $VSC_SCRATCH (fast-access network storage). <code>echo ${VSC_SCRATCH}</code> prints the absolute path, which can be used in the config.

Do not claim many cores: the advantage of multicore computing in MATLAB is quite limited, so 1 or 2 cores should be enough. Requesting few cores has the additional advantage that the job can be placed on nodes which are only partially filled. For more advanced topics (such as array jobs), please refer to the HPC user wiki.
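
The job script requests its run time with <code>#PBS -l walltime=HH:MM:SS</code>, and SUMO runs are limited to 72 hours because checkpointing does not work with MATLAB. The following sketch checks a walltime string against that limit before a job is submitted. The function names <code>walltime_seconds</code> and <code>walltime_ok</code> are hypothetical, and the input is assumed to be in the <code>HH:MM:SS</code> form used in the script above.

```shell
#!/bin/bash
# Hypothetical helpers: validate a PBS walltime (HH:MM:SS) against the
# 72-hour limit mentioned on this page.
walltime_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so fields such as "08" are not read as octal
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_ok() {
    [ "$(walltime_seconds "$1")" -le $(( 72 * 3600 )) ]
}
```

For example, <code>walltime_ok 11:59:00 && qsub myjob.sh</code> would only submit when the requested time fits within the limit (<code>myjob.sh</code> is a placeholder name).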

Running SUMO on UGent HPC

2014-03-17T15:06:35Z

Javdrher: /* Submit SUMO jobs */

== Introduction ==

This page provides instructions on how to use the HPC infrastructure of Ghent University. The instructions are written for that infrastructure only, although they might apply to other clusters as well. Make sure you have the latest nightly build of the SUMO Toolbox, which you can find [[Downloading|here]], as these instructions are not supported by SUMO Toolbox version 7.0.2. For more information about using the toolbox with distributed backends go [[Add_Distributed_Backend|here]]. Note though that many features of the toolbox are not yet tested :).

For more information about the UGent HPC infrastructure itself please visit their [http://hpc.ugent.be/userwiki/index.php/Main_Page website].
To learn how to run Matlab in general on the UGent HPC see this [http://hpc.ugent.be/userwiki/index.php/Main_Page/User:MATLAB page].

== Compiling a standalone copy of the SUMO Toolbox for use on the HPC ==
The SUMO Toolbox needs to be compiled because the worker nodes of the HPC cannot connect to the license server and because the number of Matlab licenses is limited. By compiling the toolbox, you create a standalone program that does not need to connect to the license server. The compiled version of the toolbox does not have all functionality, however: some '''functionalities''', such as the GA algorithms and the ANN Toolbox, are not available.

The process of compiling the toolbox is straightforward. Copy the toolbox to your working directory on the server, then load the required modules for compilation. If the separate components of the toolbox still need to be compiled (such as the Java classes and some binary code for certain model types), do so using <code>make</code>.
By default the examples directory is not included in the compiled version of the toolbox, so you probably want to add your example to the mcc options. To make a standalone SUMO Toolbox version, type <code>make dist-csumo</code>. The steps you need to follow are summarized below.

Currently, compiled MATLAB code can be run on haunter, gastly (recommended) and gengar. All other clusters (including the default delcatty) do not have MCR installed. Also, remember that the checkpointing framework (for jobs > 72 hours) does not work with MATLAB, so SUMO runs can take no more than 72 hours.

* Log on to the HPC
* Upload a copy of the toolbox if you haven't done so. Tip: place and install the toolbox in $VSC_DATA
* On the Login node, select the proper cluster. For example, to use the gastly cluster:
module swap cluster/gastly
* Load the Matlab and ant module using these commands (you may want to change the versions to suit your needs):
module load MATLAB/2012b
module load ant
export _JAVA_OPTIONS="-Xmx1024M -Xms512M" (use this option in case you need more heap space)
* Open the Makefile in the SUMO root directory and change MATLABDIR to ${EBROOTMATLAB}
MATLABDIR ?= ${EBROOTMATLAB}
* In the terminal, change directory to the SUMO root directory and type in "make" to compile the toolbox for use (this will compile all the Java classes and other libraries needed by the SUMO Toolbox)
* Verify that this first compilation has worked by starting Matlab in the terminal and doing a test run (this will only work on the login node)
* Edit the Makefile again and add the path to the example(s) you plan to run. This is only required if you use a MATLAB script as simulator (MCR can only run scripts that have been included by mcc). The mcc line:
${MATLABDIR}/bin/mcc -m -v -a '''pathToMyExample''' -a ./src/matlab/ -a ./configure.m -a ./startup.m -R '-nodesktop, -nosplash' -d './dist/csumo-toolbox' ./go.m
* Compile the SUMO Toolbox for ''standalone'' use by typing the following command into the terminal: <code>make dist-csumo</code>

== Running an example on the HPC infrastructure ==
You can only run Matlab code that was included in the mcc options. To run the standalone version of the toolbox, use the <code>run_go.sh</code> bash script, which takes two arguments: the MCR root and your configuration file. Note: test your configuration file extensively, both locally and on the debug queue of the HPC, to avoid wasting your HPC resources, as the infrastructure can be very busy at times.

* Log on to the HPC and request a worker node (see the HPC wiki for more information). For example, this command requests a node in the debug queue:
qsub -I
* In the terminal, change directory to where the compiled SUMO Toolbox is located (by default this is SUMORoot/csumo)
* Set the MCRROOT environment variable to point to the MCR root, e.g.:
export MCRROOT=/apps/gent/gengar/harpertown/software/MATLAB/MCR_2011a/v715
* Run a configuration XML file using:
./run_go.sh $MCRROOT pathToYourConfig/yourConfig.xml

== Submit SUMO jobs ==
Example job script to run a SUMO job on the HPC. It reserves an MCR cache to avoid issues when 2 MATLAB jobs end up on the same node.

<source lang="Bash">
#!/bin/bash

#PBS -N JOBNAME
#PBS -l nodes=1:ppn=1
#PBS -l walltime=11:59:00
#PBS -l vmem=8gb

## name of .m file
name=go

## options to pass to the executable (= parameters for go script)
opts="config/default.xml"

## directory where the executable and script can be found
## PBS_O_WORKDIR variable points to the directory you are submitting from
dir=/path/to/compiled/sumo-toolbox

## version: version of MATLAB
version=2012b

module load MATLAB/${version}

if [ ! -d $dir ]
then
echo "Directory $dir is not a directory"
exit 1
fi

cd $dir

if [ ! -x $name ]
then
echo "No executable $name found."
exit 2
fi
script=run_${name}.sh
if [ ! -x $script ]
then
echo "No run script $script found"
exit 3
fi

## make cache dir
## TMPDIR is set and created by torque. 1 unique dir per job
cdir=$TMPDIR/mcrcache
mkdir -p $cdir
if [ ! -d $cdir ]
then
echo "No tempdir $cdir found."
exit 1
fi

## set dir
export MCR_CACHE_ROOT=$cdir
## 1GB cache (more than large enough)
export MCR_CACHE_SIZE=$((1024*1024*1024))

## real running
./$script ${EBROOTMATLAB} $opts > ~/${PBS_JOBNAME}.log 2> ~/${PBS_JOBNAME}.err
</source>
Ideally, configure the output to be stored on the local disk of the node to avoid network overhead (although this is currently not possible, as environment variables are unavailable in the XML!). An alternative is $VSC_SCRATCH (fast-access network storage). <code>echo ${VSC_SCRATCH}</code> prints the absolute path, which can be used in the config.

For more advanced topics (such as array jobs), please refer to the HPC user wiki.

Running SUMO on UGent HPC

2014-03-17T15:04:22Z

Javdrher: /* Compiling a standalone copy of the SUMO Toolbox for use on the HPC */

== Introduction ==

This page provides instructions on how to use the HPC infrastructure of Ghent University. The instructions are written for that infrastructure only, although they might apply to other clusters as well. Make sure you have the latest nightly build of the SUMO Toolbox, which you can find [[Downloading|here]], as these instructions are not supported by SUMO Toolbox version 7.0.2. For more information about using the toolbox with distributed backends go [[Add_Distributed_Backend|here]]. Note though that many features of the toolbox are not yet tested :).

For more information about the UGent HPC infrastructure itself please visit their [http://hpc.ugent.be/userwiki/index.php/Main_Page website].
To learn how to run Matlab in general on the UGent HPC see this [http://hpc.ugent.be/userwiki/index.php/Main_Page/User:MATLAB page].

== Compiling a standalone copy of the SUMO Toolbox for use on the HPC ==
The SUMO Toolbox needs to be compiled because the worker nodes of the HPC cannot connect to the license server and because the number of Matlab licenses is limited. By compiling the toolbox, you create a standalone program that does not need to connect to the license server. The compiled version of the toolbox does not have all functionality, however: some '''functionalities''', such as the GA algorithms and the ANN Toolbox, are not available.

The process of compiling the toolbox is straightforward. Copy the toolbox to your working directory on the server, then load the required modules for compilation. If the separate components of the toolbox still need to be compiled (such as the Java classes and some binary code for certain model types), do so using <code>make</code>.
By default the examples directory is not included in the compiled version of the toolbox, so you probably want to add your example to the mcc options. To make a standalone SUMO Toolbox version, type <code>make dist-csumo</code>. The steps you need to follow are summarized below.

Currently, compiled MATLAB code can be run on haunter, gastly (recommended) and gengar. All other clusters (including the default delcatty) do not have MCR installed. Also, remember that the checkpointing framework (for jobs > 72 hours) does not work with MATLAB, so SUMO runs can take no more than 72 hours.

* Log on to the HPC
* Upload a copy of the toolbox if you haven't done so. Tip: place and install the toolbox in $VSC_DATA
* On the Login node, select the proper cluster. For example, to use the gastly cluster:
module swap cluster/gastly
* Load the Matlab and ant module using these commands (you may want to change the versions to suit your needs):
module load MATLAB/2012b
module load ant
export _JAVA_OPTIONS="-Xmx1024M -Xms512M" (use this option in case you need more heap space)
* Open the Makefile in the SUMO root directory and change MATLABDIR to ${EBROOTMATLAB}
MATLABDIR ?= ${EBROOTMATLAB}
* In the terminal, change directory to the SUMO root directory and type in "make" to compile the toolbox for use (this will compile all the Java classes and other libraries needed by the SUMO Toolbox)
* Verify that this first compilation has worked by starting Matlab in the terminal and doing a test run (this will only work on the login node)
* Edit the Makefile again and add the path to the example(s) you plan to run. This is only required if you use a MATLAB script as simulator (MCR can only run scripts that have been included by mcc). The mcc line:
${MATLABDIR}/bin/mcc -m -v -a '''pathToMyExample''' -a ./src/matlab/ -a ./configure.m -a ./startup.m -R '-nodesktop, -nosplash' -d './dist/csumo-toolbox' ./go.m
* Compile the SUMO Toolbox for ''standalone'' use by typing the following command into the terminal: <code>make dist-csumo</code>

== Running an example on the HPC infrastructure ==
You can only run Matlab code that was included in the mcc options. To run the standalone version of the toolbox, use the <code>run_go.sh</code> bash script, which takes two arguments: the MCR root and your configuration file. Note: test your configuration file extensively, both locally and on the debug queue of the HPC, to avoid wasting your HPC resources, as the infrastructure can be very busy at times.

* Log on to the HPC and request a worker node (see the HPC wiki for more information). For example, this command requests a node in the debug queue:
qsub -I
* In the terminal, change directory to where the compiled SUMO Toolbox is located (by default this is SUMORoot/csumo)
* Set the MCRROOT environment variable to point to the MCR root, e.g.:
export MCRROOT=/apps/gent/gengar/harpertown/software/MATLAB/MCR_2011a/v715
* Run a configuration XML file using:
./run_go.sh $MCRROOT pathToYourConfig/yourConfig.xml

== Submit SUMO jobs ==
Example job script to run a SUMO job on the HPC. It reserves an MCR cache to avoid issues when 2 MATLAB jobs end up on the same node.

<source lang="Bash">
#!/bin/bash

#PBS -N JOBNAME
#PBS -l nodes=1:ppn=1
#PBS -l walltime=11:59:00
#PBS -l vmem=8gb

## name of .m file
name=go

## options to pass to the executable (= parameters for go script)
opts="config/default.xml"

## directory where the executable and script can be found
## PBS_O_WORKDIR variable points to the directory you are submitting from
dir=/path/to/compiled/sumo-toolbox

## version: version of MATLAB
version=2012b

module load MATLAB/${version}

if [ ! -d $dir ]
then
echo "Directory $dir is not a directory"
exit 1
fi

cd $dir

if [ ! -x $name ]
then
echo "No executable $name found."
exit 2
fi
script=run_${name}.sh
if [ ! -x $script ]
then
echo "No run script $script found"
exit 3
fi

## make cache dir
## TMPDIR is set and created by torque. 1 unique dir per job
cdir=$TMPDIR/mcrcache
mkdir -p $cdir
if [ ! -d $cdir ]
then
echo "No tempdir $cdir found."
exit 1
fi

## set dir
export MCR_CACHE_ROOT=$cdir
## 1GB cache (more than large enough)
export MCR_CACHE_SIZE=$((1024*1024*1024))

## real running
./$script ${EBROOTMATLAB} $opts > ~/${PBS_JOBNAME}.log 2> ~/${PBS_JOBNAME}.err
</source>

Running SUMO on UGent HPC

2014-03-17T15:03:14Z

Javdrher: /* Submit SUMO jobs */

== Introduction ==

This page provides instructions on how to use the HPC infrastructure of Ghent University. The instructions are written for that infrastructure only, although they might apply to other clusters as well. Make sure you have the latest nightly build of the SUMO Toolbox, which you can find [[Downloading|here]], as these instructions are not supported by SUMO Toolbox version 7.0.2. For more information about using the toolbox with distributed backends go [[Add_Distributed_Backend|here]]. Note though that many features of the toolbox are not yet tested :).

For more information about the UGent HPC infrastructure itself please visit their [http://hpc.ugent.be/userwiki/index.php/Main_Page website].
To learn how to run Matlab in general on the UGent HPC see this [http://hpc.ugent.be/userwiki/index.php/Main_Page/User:MATLAB page].

== Compiling a standalone copy of the SUMO Toolbox for use on the HPC ==
The SUMO Toolbox needs to be compiled because the worker nodes of the HPC cannot connect to the license server and because the number of Matlab licenses is limited. By compiling the toolbox, you create a standalone program that does not need to connect to the license server. The compiled version of the toolbox does not have all functionality, however: some '''functionalities''', such as the GA algorithms and the ANN Toolbox, are not available.

The process of compiling the toolbox is straightforward. Copy the toolbox to your working directory on the server, then load the required modules for compilation. If the separate components of the toolbox still need to be compiled (such as the Java classes and some binary code for certain model types), do so using <code>make</code>.
By default the examples directory is not included in the compiled version of the toolbox, so you probably want to add your example to the mcc options. To make a standalone SUMO Toolbox version, type <code>make dist-csumo</code>. The steps you need to follow are summarized below.

Currently, compiled MATLAB code can be run on haunter, gastly (recommended) and gengar. All other clusters (including the default delcatty) do not have MCR installed. Also, remember that the checkpointing framework (for jobs > 72 hours) does not work with MATLAB, so SUMO runs can take no more than 72 hours.

* Log on to the HPC
* Upload a copy of the toolbox if you haven't done so
* On the Login node, select the proper cluster. For example, to use the gastly cluster:
module swap cluster/gastly
* Load the Matlab and ant module using these commands (you may want to change the versions to suit your needs):
module load MATLAB/2012b
module load ant
export _JAVA_OPTIONS="-Xmx1024M -Xms512M" (use this option in case you need more heap space)
* Open the Makefile in the SUMO root directory and change MATLABDIR to ${EBROOTMATLAB}
MATLABDIR ?= ${EBROOTMATLAB}
* In the terminal, change directory to the SUMO root directory and type in "make" to compile the toolbox for use (this will compile all the Java classes and other libraries needed by the SUMO Toolbox)
* Verify that this first compilation has worked by starting Matlab in the terminal and doing a test run (this will only work on the login node)
* Edit the Makefile again and add the path to the example(s) you plan to run. This is only required if you use a MATLAB script as simulator (MCR can only run scripts that have been included by mcc). The mcc line:
${MATLABDIR}/bin/mcc -m -v -a '''pathToMyExample''' -a ./src/matlab/ -a ./configure.m -a ./startup.m -R '-nodesktop, -nosplash' -d './dist/csumo-toolbox' ./go.m
* Compile the SUMO Toolbox for ''standalone'' use by typing the following command into the terminal: <code>make dist-csumo</code>

== Running an example on the HPC infrastructure ==
You can only run Matlab code that was included in the mcc options. To run the standalone version of the toolbox, use the <code>run_go.sh</code> bash script, which takes two arguments: the MCR root and your configuration file. Note: test your configuration file extensively, both locally and on the debug queue of the HPC, to avoid wasting your HPC resources, as the infrastructure can be very busy at times.

* Log on to the HPC and request a worker node (see the HPC wiki for more information). For example, this command requests a node in the debug queue:
qsub -I
* In the terminal, change directory to where the compiled SUMO Toolbox is located (by default this is SUMORoot/csumo)
* Set the MCRROOT environment variable to point to the MCR root, e.g.:
export MCRROOT=/apps/gent/gengar/harpertown/software/MATLAB/MCR_2011a/v715
* Run a configuration XML file using:
./run_go.sh $MCRROOT pathToYourConfig/yourConfig.xml

== Submit SUMO jobs ==
Example job script to run a SUMO job on the HPC. It reserves an MCR cache to avoid issues when 2 MATLAB jobs end up on the same node.

<source lang="Bash">
#!/bin/bash

#PBS -N JOBNAME
#PBS -l nodes=1:ppn=1
#PBS -l walltime=11:59:00
#PBS -l vmem=8gb

## name of .m file
name=go

## options to pass to the executable (= parameters for go script)
opts="config/default.xml"

## directory where the executable and script can be found
## PBS_O_WORKDIR variable points to the directory you are submitting from
dir=/path/to/compiled/sumo-toolbox

## version: version of MATLAB
version=2012b

module load MATLAB/${version}

if [ ! -d $dir ]
then
echo "Directory $dir is not a directory"
exit 1
fi

cd $dir

if [ ! -x $name ]
then
echo "No executable $name found."
exit 2
fi
script=run_${name}.sh
if [ ! -x $script ]
then
echo "No run script $script found"
exit 3
fi

## make cache dir
## TMPDIR is set and created by torque. 1 unique dir per job
cdir=$TMPDIR/mcrcache
mkdir -p $cdir
if [ ! -d $cdir ]
then
echo "No tempdir $cdir found."
exit 1
fi

## set dir
export MCR_CACHE_ROOT=$cdir
## 1GB cache (more than large enough)
export MCR_CACHE_SIZE=$((1024*1024*1024))

## real running
./$script ${EBROOTMATLAB} $opts > ~/${PBS_JOBNAME}.log 2> ~/${PBS_JOBNAME}.err
</source>

Running SUMO on UGent HPC

2014-03-17T15:01:07Z

Javdrher: /* Submit SUMO jobs */

== Introduction ==

This page provides instructions on how to use the HPC infrastructure of Ghent University. The instructions are written for that infrastructure only, although they might apply to other clusters as well. Make sure you have the latest nightly build of the SUMO Toolbox, which you can find [[Downloading|here]], as these instructions are not supported by SUMO Toolbox version 7.0.2. For more information about using the toolbox with distributed backends go [[Add_Distributed_Backend|here]]. Note though that many features of the toolbox are not yet tested :).

For more information about the UGent HPC infrastructure itself please visit their [http://hpc.ugent.be/userwiki/index.php/Main_Page website].
To learn how to run Matlab in general on the UGent HPC see this [http://hpc.ugent.be/userwiki/index.php/Main_Page/User:MATLAB page].

== Compiling a standalone copy of the SUMO Toolbox for use on the HPC ==
The SUMO Toolbox needs to be compiled because the worker nodes of the HPC cannot connect to the license server and because the number of Matlab licenses is limited. By compiling the toolbox, you create a standalone program that does not need to connect to the license server. The compiled version of the toolbox does not have all functionality, however: some '''functionalities''', such as the GA algorithms and the ANN Toolbox, are not available.

The process of compiling the toolbox is straightforward. Copy the toolbox to your working directory on the server, then load the required modules for compilation. If the separate components of the toolbox still need to be compiled (such as the Java classes and some binary code for certain model types), do so using <code>make</code>.
By default the examples directory is not included in the compiled version of the toolbox, so you probably want to add your example to the mcc options. To make a standalone SUMO Toolbox version, type <code>make dist-csumo</code>. The steps you need to follow are summarized below.

Currently, compiled MATLAB code can be run on haunter, gastly (recommended) and gengar. All other clusters (including the default delcatty) do not have MCR installed. Also, remember that the checkpointing framework (for jobs > 72 hours) does not work with MATLAB, so SUMO runs can take no more than 72 hours.

* Log on to the HPC
* Upload a copy of the toolbox if you haven't done so
* On the Login node, select the proper cluster. For example, to use the gastly cluster:
module swap cluster/gastly
* Load the Matlab and ant module using these commands (you may want to change the versions to suit your needs):
module load MATLAB/2012b
module load ant
export _JAVA_OPTIONS="-Xmx1024M -Xms512M" (use this option in case you need more heap space)
* Open the Makefile in the SUMO root directory and change MATLABDIR to ${EBROOTMATLAB}
MATLABDIR ?= ${EBROOTMATLAB}
* In the terminal, change directory to the SUMO root directory and type in "make" to compile the toolbox for use (this will compile all the Java classes and other libraries needed by the SUMO Toolbox)
* Verify that this first compilation has worked by starting Matlab in the terminal and doing a test run (this will only work on the login node)
* Edit the Makefile again and add the path to the example(s) you plan to run. This is only required if you use a MATLAB script as simulator (MCR can only run scripts that have been included by mcc). The mcc line:
${MATLABDIR}/bin/mcc -m -v -a '''pathToMyExample''' -a ./src/matlab/ -a ./configure.m -a ./startup.m -R '-nodesktop, -nosplash' -d './dist/csumo-toolbox' ./go.m
* Compile the SUMO Toolbox for ''standalone'' use by typing the following command into the terminal: <code>make dist-csumo</code>

== Running an example on the HPC infrastructure ==
You can only run Matlab code that was included in the mcc options. To run the standalone version of the toolbox, use the <code>run_go.sh</code> bash script, which takes two arguments: the MCR root and your configuration file. Note: test your configuration file extensively, both locally and on the debug queue of the HPC, to avoid wasting your HPC resources, as the infrastructure can be very busy at times.

* Log on to the HPC and request a worker node (see the HPC wiki for more information). For example, this command requests a node in the debug queue:
qsub -I
* In the terminal, change directory to where the compiled SUMO Toolbox is located (by default this is SUMORoot/csumo)
* Set the MCRROOT environment variable to point to the MCR root, e.g.:
export MCRROOT=/apps/gent/gengar/harpertown/software/MATLAB/MCR_2011a/v715
* Run a configuration XML file using:
./run_go.sh $MCRROOT pathToYourConfig/yourConfig.xml

== Submit SUMO jobs ==
Example job script to run a SUMO job on the HPC. It reserves an MCR cache to avoid issues when 2 MATLAB jobs end up on the same node.

<code>
#!/bin/bash

#PBS -N JOBNAME
#PBS -l nodes=1:ppn=1
#PBS -l walltime=11:59:00
#PBS -l vmem=8gb

## name of .m file
name=go

## options to pass to the executable (= parameters for go script)
opts="config/default.xml"

## directory where the executable and script can be found
## PBS_O_WORKDIR variable points to the directory you are submitting from
dir=/path/to/compiled/sumo-toolbox

## version: version of MATLAB
version=2012b

module load MATLAB/${version}

if [ ! -d $dir ]
then
echo "Directory $dir is not a directory"
exit 1
fi

cd $dir

if [ ! -x $name ]
then
echo "No executable $name found."
exit 2
fi
script=run_${name}.sh
if [ ! -x $script ]
then
echo "No run script $script found"
exit 3
fi

## make cache dir
## TMPDIR is set and created by torque. 1 unique dir per job
cdir=$TMPDIR/mcrcache
mkdir -p $cdir
if [ ! -d $cdir ]
then
echo "No tempdir $cdir found."
exit 1
fi

## set dir
export MCR_CACHE_ROOT=$cdir
## 1GB cache (more than large enough)
export MCR_CACHE_SIZE=$((1024*1024*1024))

## real running
./$script ${EBROOTMATLAB} $opts > ~/${PBS_JOBNAME}.log 2> ~/${PBS_JOBNAME}.err
</code>

Running SUMO on UGent HPC

2014-03-17T15:00:11Z

Javdrher:

== Introduction ==

This page provides instructions on how to use the HPC infrastructure of Ghent University. The instructions are written for that infrastructure only, although they might apply to other clusters as well. Make sure you have the latest nightly build of the SUMO Toolbox, which you can find [[Downloading|here]], as these instructions are not supported by SUMO Toolbox version 7.0.2. For more information about using the toolbox with distributed backends go [[Add_Distributed_Backend|here]]. Note though that many features of the toolbox are not yet tested :).

For more information about the UGent HPC infrastructure itself please visit their [http://hpc.ugent.be/userwiki/index.php/Main_Page website].
To learn how to run Matlab in general on the UGent HPC see this [http://hpc.ugent.be/userwiki/index.php/Main_Page/User:MATLAB page].

== Compiling a standalone copy of the SUMO Toolbox for use on the HPC ==
The SUMO Toolbox needs to be compiled because the worker nodes of the HPC cannot connect to the license server and because the number of Matlab licenses is limited. By compiling the toolbox, you create a standalone program that does not need to connect to the license server. The compiled version of the toolbox does not have all functionality, however: some '''functionalities''', such as the GA algorithms and the ANN Toolbox, are not available.

The process of compiling the toolbox is straightforward. Copy the toolbox to your working directory on the server, then load the required modules for compilation. If the separate components of the toolbox still need to be compiled (such as the Java classes and some binary code for certain model types), do so using <code>make</code>.
By default the examples directory is not included in the compiled version of the toolbox, so you probably want to add your example to the mcc options. To make a standalone SUMO Toolbox version, type <code>make dist-csumo</code>. The steps you need to follow are summarized below.

Currently, compiled MATLAB code can be run on haunter, gastly (recommended) and gengar. All other clusters (including the default delcatty) do not have MCR installed. Also, remember that the checkpointing framework (for jobs > 72 hours) does not work with MATLAB, so SUMO runs can take no more than 72 hours.

* Log on to the HPC
* Upload a copy of the toolbox if you haven't done so
* On the Login node, select the proper cluster. For example, to use the gastly cluster:
module swap cluster/gastly
* Load the Matlab and ant module using these commands (you may want to change the versions to suit your needs):
module load MATLAB/2012b
module load ant
export _JAVA_OPTIONS="-Xmx1024M -Xms512M" (use this option in case you need more heap space)
* Open the Makefile in the SUMO root directory and change MATLABDIR to ${EBROOTMATLAB}
MATLABDIR ?= ${EBROOTMATLAB}
* In the terminal, change directory to the SUMO root directory and type in "make" to compile the toolbox for use (this will compile all the Java classes and other libraries needed by the SUMO Toolbox)
* Verify that this first compilation has worked by starting Matlab in the terminal and doing a test run (this will only work on the login node)
* Edit the Makefile again and add the path to the example(s) you plan to run. This is only required if you use a MATLAB script as simulator (MCR can only run scripts that have been included by mcc). The mcc line:
${MATLABDIR}/bin/mcc -m -v -a '''pathToMyExample''' -a ./src/matlab/ -a ./configure.m -a ./startup.m -R '-nodesktop, -nosplash' -d './dist/csumo-toolbox' ./go.m
* Compile the SUMO Toolbox for ''standalone'' use by typing the following command into the terminal: <code>make dist-csumo</code>
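
The module and make steps above can be wrapped in a small helper so that both compilation passes use the same environment. This is only a sketch under stated assumptions: <code>sumo_make</code>, <code>SUMO_CLUSTER</code> and <code>SUMO_MATLAB_VERSION</code> are hypothetical names, and passing <code>MATLABDIR</code> on the make command line is used here as an alternative to editing the Makefile (GNU make lets command-line variables override Makefile assignments). Check that it behaves as expected with your copy of the Makefile.

```shell
# Sketch: run a make target of the SUMO Toolbox with the HPC modules loaded.
# sumo_make, SUMO_CLUSTER and SUMO_MATLAB_VERSION are hypothetical names.
# Usage: sumo_make <SUMO root directory> [make target...]
sumo_make() (
    # The function body is a subshell, so the cd and the loaded modules
    # do not leak into the calling shell.
    sumo_root="$1"
    shift
    cd "$sumo_root" || exit 1

    # Select a cluster that has MCR installed, then load the build tools
    module swap "cluster/${SUMO_CLUSTER:-gastly}" || exit 1
    module load "MATLAB/${SUMO_MATLAB_VERSION:-2012b}" || exit 1
    module load ant || exit 1

    # Assumption: MATLABDIR is overridden on the command line
    # instead of editing the Makefile.
    make MATLABDIR="${EBROOTMATLAB}" "$@"
)
```

A possible use is <code>sumo_make /path/to/sumo-toolbox</code> for the first compilation, then (after the test run and after adding your example to the mcc line) <code>sumo_make /path/to/sumo-toolbox dist-csumo</code>.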

== Running an example on the HPC infrastructure ==
You can only run Matlab code that was included in the mcc options. To run the standalone version of the toolbox, use the <code>run_go.sh</code> bash script, which takes two arguments: the MCR root and your configuration file. Note: make sure you test your configuration file extensively, both locally and on the debug queue of the HPC, to avoid wasting your HPC resources, as the infrastructure can be very busy at times.

* Log on to the HPC and request some worker nodes (see the HPC wiki for more information). For example, this command requests a node in the debug queue:
qsub -I
* In the terminal, change directory to where the compiled SUMO Toolbox is located (by default this is SUMORoot/csumo)
* Set the MCRROOT environment variable to point to the MCR root, e.g.:
export MCRROOT=/apps/gent/gengar/harpertown/software/MATLAB/MCR_2011a/v715
* Run a configuration XML file using:
./run_go.sh $MCRROOT pathToYourConfig/yourConfig.xml
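
Since a failed start wastes queue time, it can help to check the three ingredients of the command above before launching it. The helper below is a minimal sketch; <code>check_sumo_run</code> is a hypothetical name, not part of the toolbox, and the exit codes are arbitrary.

```shell
# Sketch: pre-flight check before calling run_go.sh (hypothetical helper).
# Usage: check_sumo_run <compiled SUMO directory> <MCR root> <config file>
check_sumo_run() (
    dir="$1"
    mcr="$2"
    config="$3"

    # The run script generated by mcc must exist and be executable
    if [ ! -x "$dir/run_go.sh" ]; then
        echo "No executable run_go.sh in $dir" >&2
        exit 1
    fi
    # The MCR root must be an existing directory
    if [ ! -d "$mcr" ]; then
        echo "MCR root $mcr is not a directory" >&2
        exit 2
    fi
    # The configuration file must be readable
    if [ ! -r "$config" ]; then
        echo "Cannot read configuration file $config" >&2
        exit 3
    fi
    echo "ok"
)
```

For example, <code>check_sumo_run . "$MCRROOT" pathToYourConfig/yourConfig.xml && ./run_go.sh "$MCRROOT" pathToYourConfig/yourConfig.xml</code> only starts the run when all three checks pass.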

== Submit SUMO jobs ==
Example job script to run a SUMO job on the HPC. It reserves an MCR cache to avoid issues when 2 MATLAB jobs end up on the same node.

#!/bin/bash

#PBS -N JOBNAME
#PBS -l nodes=1:ppn=1
#PBS -l walltime=11:59:00
#PBS -l vmem=8gb

## name of .m file
name=go

## options to pass to the executable (= parameters for go script)
opts="config/default.xml"

## directory where the executable and script can be found
## PBS_O_WORKDIR variable points to the directory you are submitting from
dir=/path/to/compiled/sumo-toolbox

## version: version of MATLAB
version=2012b

module load MATLAB/${version}

if [ ! -d $dir ]
then
echo "Directory $dir is not a directory"
exit 1
fi

cd $dir

if [ ! -x $name ]
then
echo "No executable $name found."
exit 2
fi
script=run_${name}.sh
if [ ! -x $script ]
then
echo "No run script $script found"
exit 3
fi

## make cache dir
## TMPDIR is set and created by torque. 1 unique dir per job
cdir=$TMPDIR/mcrcache
mkdir -p $cdir
if [ ! -d $cdir ]
then
echo "No tempdir $cdir found."
exit 1
fi

## set dir
export MCR_CACHE_ROOT=$cdir
## 1GB cache (more than large enough)
export MCR_CACHE_SIZE=$((1024*1024*1024))

## real running
./$script ${EBROOTMATLAB} $opts > ~/${PBS_JOBNAME}.log 2> ~/${PBS_JOBNAME}.err
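
Because checkpointing does not work with MATLAB, the walltime requested in a job script like the one above has to stay within 72 hours. The sketch below converts a <code>HH:MM:SS</code> walltime to seconds and compares it to that limit; <code>walltime_to_seconds</code> and <code>walltime_ok</code> are hypothetical helper names, and only the <code>HH:MM:SS</code> form is handled.

```shell
# Sketch: check a PBS walltime (HH:MM:SS) against the 72 hour limit.
# Both function names are hypothetical helpers.

# Convert HH:MM:SS to seconds; awk reads fields such as "08" as decimal
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed only if the walltime fits within the limit in hours (default 72)
walltime_ok() {
    [ "$(walltime_to_seconds "$1")" -le $(( ${2:-72} * 3600 )) ]
}
```

For instance, <code>walltime_ok 11:59:00 || echo "walltime too long"</code> stays silent, since the walltime of the example job script is well below the limit.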

Running SUMO on UGent HPC

2014-03-17T14:47:10Z

Javdrher: /* Compiling a standalone copy of the SUMO Toolbox for use on the HPC */

Running SUMO on UGent HPC

2014-03-17T14:43:14Z

Javdrher: /* Compiling a standalone copy of the SUMO Toolbox for use on the HPC */

Running SUMO on UGent HPC

2014-03-17T14:41:58Z

Javdrher: /* Compiling a standalone copy of the SUMO Toolbox for use on the HPC */

FAQ

2014-03-13T09:21:13Z

Javdrher: /* What happened to the M3-Toolbox? */

== General ==

=== What is a global surrogate model? ===

A global [http://en.wikipedia.org/wiki/Surrogate_model surrogate model] is a mathematical model that mimics the behavior of a computationally expensive simulation code over '''the complete parameter space''' as accurately as possible, using as few data points as possible. So note that optimization is not the primary goal, although it can be done as a post-processing step. Global surrogate models are useful for:

* design space exploration, to get a ''feel'' of how the different parameters behave
* sensitivity analysis
* ''what-if'' analysis
* prototyping
* visualization
* ...

In addition, they are a cheap way to model large-scale systems: multiple global surrogate models can be chained together in a model cascade.

See also the [[About]] page.

=== What about surrogate driven optimization? ===

When hearing the term '''surrogate driven optimization''' most people associate it with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate which is optimized to find an optimum. Afterwards, a move limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently the surrogate is rebuilt and optimized. In other words, the surrogate zooms in on the global optimum. For instance the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies where the surrogate construction is separated from optimization.

Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox.

Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sampling selection strategy that balances local and global search. In other words, it balances between exploring the input space and exploiting the information the surrogate gives us.

A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].

=== What is (adaptive) sampling? Why is it used? ===

In classical Design of Experiments you need to specify the design of your experiment up-front. Or in other words, you have to say up-front how many data points you need and how they should be distributed. Two examples are Central Composite Designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., an expensive simulation code) it is not clear how many points are needed up-front. Instead data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in regions that are the most interesting is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].
SUMO provides a number of different sampling algorithms: [[Config:SequentialDesign|SequentialDesign]]

Of course sometimes you do not want to do sampling. For example, if you have a fixed dataset you just want to load all the data in one go and model that. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].

=== What about dynamical, time dependent data? ===

The original design and purpose was to tackle static input-output systems, where there is no memory. Just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox but that typically is not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...

The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.

Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[Config:SequentialDesign|SequentialDesign]].

For more information on this topic [[Contact]] us.

=== What about classification problems? ===

The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on the well known two spiral test problem. It is possible to specify a run as a classification problem by setting the 'classificationMode' and 'numberOfClasses' option in ContextConfig in the configuration file. Classification models from WEKA are also available in SUMO. Please refer to the default configuration file for the explanation on usage of WEKA model types available through SUMO. The LOLA-Voronoi sample selection scheme also supports classification, and its usage is documented in the default configuration file as well.

=== Does SUMO support discrete inputs/outputs ===

Not if you mean in a smart way. There is a way to flag an input/output as discrete but it is not used anywhere. It is on the wishlist but we have not been able to get to it yet. Discrete inputs are just handled as if they were continuous. Depending on how many levels there are and whether there is an ordering, this may work OK or not work at all. You could of course add your own model type that can handle these :) As for discrete outputs see [[FAQ#What_about_classification_problems.3F]].

=== Can the toolbox drive my simulation code directly? ===

Yes it can. See the [[Interfacing with the toolbox]] page.

=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===

The SUMO toolbox is a complete, feature-full framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof-of-principle.

=== What happened to the M3-Toolbox? ===

The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ugent.be for any inquiries and requests about the M3 Toolbox.

=== How can I stay up to date with the latest news? ===

To stay up to date with the latest news and releases, we also recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.

You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].

=== What is the roadmap for the future? ===

There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.

You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].

=== Will there be an R/Scilab/Octave/Sage/.. version? ===

At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:

* Existing experience and know-how of the development team
* The widespread use of the Matlab platform in the target application domains
* The quality and amount of available Matlab documentation
* The quality and number of Matlab toolboxes
* Support for object orientation (inheritance, polymorphism, etc.)
* Many well documented interfacing options (especially the seamless integration with Java)

Matlab, as a proprietary platform, definitely has its problems and deficiencies but the number of advanced algorithms and available toolboxes make it a very attractive platform. Equally important is the fact that every function is properly documented, tested, and includes examples, tutorials, and in some cases GUI tools. A lot of things would have been a lot harder and/or time consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus given our situation, goals, and resources at the time, Matlab was the best choice for us.

The other platforms remain on our radar however, and we do look into them from time to time. Though, with our limited resources porting to one of those platforms is not (yet) cost effective.

=== What are collaboration options? ===

We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.

The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to optimally help you it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.

If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.

In either case (code or dataset) remember:

* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])
* include a short description of your data:
** number of inputs and number of outputs
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)
** if the outputs are real or complex valued
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).
** if possible the expected range of each output (or scaled if you do not wish to disclose this)
** if possible the names of each input/output + a short description of what they mean
** any further insight you have about the data, expected behavior, expected importance of each input, etc.
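
Before sending a data file, a quick consistency check of the column format can catch rows that were cut off or merged. The following is a minimal sketch that assumes whitespace-separated columns; <code>check_columns</code> is a hypothetical helper name. The printed column count should equal the number of inputs plus the number of outputs.

```shell
# Sketch: verify that every non-empty row of an ASCII data file has the
# same number of whitespace-separated columns (hypothetical helper).
# Prints the column count on success; reports the first bad row otherwise.
check_columns() {
    awk '
        NF == 0   { next }            # skip blank lines
        cols == 0 { cols = NF }       # the first data row sets the width
        NF != cols {
            printf("row %d has %d columns, expected %d\n", NR, NF, cols) > "/dev/stderr"
            bad = 1
            exit                      # jumps to the END rule
        }
        END { if (bad) exit 1; print cols }
    ' "$1"
}
```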

If you have any further questions or comments related to this please [[Contact]] us.

=== Can you help me model my problem? ===

Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]

== Installation and Configuration ==

=== What is the relationship between Matlab and Java? ===

Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable Java support, but to use the SUMO Toolbox it must stay enabled. To check if Java is enabled you can use the 'usejava' command.

=== What is Java, why do I need it, do I have to install it, etc. ? ===

The short answer is: no, do not worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language instead of a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it is not working of course, in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].

=== What is XML? ===

XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.

The way SUMO uses XML is really simple and there is not much to understand (for more information on how SUMO uses XML go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]]).

First some simple terminology. Take the following example:

<source lang="xml">
<Foo attr="bar">bla bla bla</Foo>
</source>

Here we have '''a tag''' called ''Foo'' containing text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you, you choose :)

Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:

<source lang="xml">
<recipe category="dessert">
  <title>Pancakes</title>
  <author>sumo@intec.ugent.be</author>
  <date>Wed, 14 Jun 95</date>
  <description>
    Good old fashioned pancakes.
  </description>
  <ingredients>
    <item>
      <amount>3</amount>
      <type>eggs</type>
    </item>

    <item>
      <amount>0.5 tablespoon</amount>
      <type>salt</type>
    </item>
    ...
  </ingredients>
  <preparation>
    ...
  </preparation>
</recipe>
</source>

So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO uses it to store and structure configuration options, and this works well due to the nice hierarchical nature of XML.

If you understand this there is nothing else to it in order to be able to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]

=== Why does SUMO use XML? ===

XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it is an intuitive way to structure data and it is used everywhere. As a result, a very large number of libraries and programming languages can parse and handle XML easily, which means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.
So basically, it would not make sense for SUMO to use any other format :). For more information on how SUMO uses XML go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]].

=== I get an error that SUMO is not yet activated ===

Make sure you installed the activation file that was mailed to you as is explained in the [[Installation]] instructions. Also double check your system meets the [[System requirements]] and that [http://www.sumowiki.intec.ugent.be/index.php/FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27|java java is enabled]. To fully verify that the activation file installation is correct ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.

Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].

== Upgrading ==

=== How do I upgrade to a newer version? ===

Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.

If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.

== Using ==

=== I have no idea how to use the toolbox, what should I do? ===

See: [[Running#Getting_started]]

=== I want to try one of the different examples ===

See [[Running#Running_different_examples]].

=== I want to model my own problem ===

See : [[Adding an example]].

=== I want to contribute some data/patch/documentation/... ===

See : [[Contributing]].

=== How do I interface with the SUMO Toolbox? ===

See : [[Interfacing with the toolbox]].

=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===

See [[General_guidelines]].

=== Ok, I generated a model, what can I do with it? ===

See: [[Using a model]].

=== How can I share a model created by the SUMO Toolbox? ===

See : [[Using a model#Model_portability| Model portability]].

=== I dont like the final model generated by SUMO how do I improve it? ===

Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what for you constitutes a good model and what constitutes a poor model. Then you should rank those properties depending on how important you find them. Examples are:

* accuracy in the training data
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing
* accuracy outside the training data
** this is the validation or test error, how important is proper generalization (usually this is very important)
* what does accuracy mean to you? a low maximum error, a low average error, both, ...
* smoothness
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example
* are some regions of the response more important than others?
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important
* are there particular special features that your model should have
** for example, capture underlying poles or discontinuities correctly
* extrapolation capability
* ...

It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].

Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].

Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:

* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.
* you should switch to a different model parameter optimization algorithm (e.g., instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)
* the model type you are using is not ideally suited to your data
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension
* maybe the sample distribution is causing troubles for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very very small)

You may also refer to the following [[General_guidelines]]. Finally, of course it may be that your problem is simply a very difficult one and does not approximate well. But, still you should at least get something satisfactory.

If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.

=== My data contains noise can the SUMO-Toolbox help me? ===

The original purpose of the SUMO-Toolbox was for it to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will just have to be more careful about how you apply the methods and how you perform model selection. It is only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.

Note though that the toolbox is not a statistical package. If you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc., you should look towards other tools.

=== What is the difference between a ModelBuilder and a ModelFactory? ===

See [[Add Model Type]].

=== Why are the Neural Networks so slow? ===

The ANN models are an extremely powerful model type that give very good results in many problems. However, they are quite slow to use. There are some things you can do:

* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.
* If you are using [[Measures#CrossValidation| CrossValidation]] try to switch to a different measure since CrossValidation is very expensive to use. CrossValidation is used by default if you have not defined a [[Measures| measure]] yourself. When using one of the neural network model types, try to use a different measure if you can. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.
* Finally, as with any model type things will slow down if you have many dimensions or very large amounts of data. If that is the case, try some dimensionality reduction or subsampling techniques.

See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]

=== How can I make the toolbox run faster? ===

There are a number of things you can do to speed things up. These are listed below. Remember though that the main reason the toolbox may seem to be slow is due to the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.

* First of all check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates this really slows things down and your computer becomes unusable.

* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files

* This is an important one. For most model builders there is an option "maxFunEals", "maxIterations", or equivalent. This value sets the maximum number of models built between 2 sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be, so lower it to speed things up. Equivalently, for the Genetic model builders reduce the population size and the number of generations.

* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])

* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.

* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:

<source lang="xml">
<Profiling>
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true">
<Output type="toImage"/>
<Output type="toFile"/>
</Profiler>

<Profiler name=".*" enabled="true">
<Output type="toFile"/>
</Profiler>
</Profiling>
</source>

The ".*" means match zero or more of any character ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name are enabled and are saved as a text file (toFile) AND as an image file (toImage). All the other profilers are saved just to file. The idea is to only save to image what you want as an image, since image generation is expensive. If you do this or switch off image generation completely you will see everything run much faster.
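
To get a feel for which names such a pattern selects, you can try it on the command line first. The sketch below uses <code>grep -E -x</code> (extended regular expressions, whole-line match) as a stand-in for the Java regular expression engine; the two dialects agree on this simple pattern but differ in more advanced features. The profiler names and the helper name <code>match_profilers</code> are made up for illustration and are not taken from the toolbox.

```shell
# Sketch: test a profiler name pattern with grep (hypothetical helper).
# -E: extended regular expressions, -x: the whole name must match.
match_profilers() {
    grep -E -x "$1"
}

# Made-up profiler names, one per line
printf '%s\n' model_share ensemble_size BestModelLevel sample_count |
    match_profilers '.*share.*|.*ensemble.*|.*Level.*'
```

Here the first three names are printed and <code>sample_count</code> is not, so in the configuration above it would only be caught by the second, catch-all <code>.*</code> profiler entry. Note that the match is case sensitive: a name containing "level" in lower case would not match <code>.*Level.*</code>.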

* Decrease the logging granularity: a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.

* If you have a multi-core/multi-cpu machine:
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)

* Don't use the Min-Max measure; it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]

* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]

* If you are having problems with very slow or seemingly hanging runs:
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html&http://www.google.be/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=nl&q=matlab+profiler&meta=&btnG=Google+zoeken Matlab profiler] and see where most time is spent.

** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange.

* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].
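
The two multi-core settings from the list above could look roughly like the fragment below. This is only a sketch: it assumes both settings are written as <Option> elements, the way options appear elsewhere in this FAQ, and the value 4 is just an example core count. Check the comments in default.xml for the exact form used by your version.

<source lang="xml">
<!-- inside the ContextConfig definition (assumed Option form) -->
<Option key="parallelMode" value="true"/>

<!-- inside the LocalSampleEvaluator definition (assumed Option form) -->
<!-- 4 is an example: use the number of cores/CPUs of your machine -->
<Option key="threadCount" value="4"/>
</source>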

To check whether your SUMO run has hung, monitor your log file (with the level set to at least FINE). If you see no changes for about 30 minutes the toolbox has probably stalled. In that case, please [[Reporting problems|report the problem]].

Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.
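
As a sketch of how the log level mentioned above might be set: the fragment below assumes the level is an <Option> of the <FileHandler> tag inside the <Logging> tag, modeled on the "Pattern" option described in the troubleshooting section. The key name "Level" is an assumption, so verify it against default.xml.

<source lang="xml">
<Logging>
<FileHandler>
<!-- assumed key name; FINE is detailed enough to spot a stalled run -->
<Option key="Level" value="FINE"/>
</FileHandler>
</Logging>
</source>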

=== How do I build models with more than one output ===

Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]

=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===

See : [[Adaptive Modeling Mode]].

=== How do I change the error function (relative error, RMSE, ...)? ===

The [[Measures| <Measure>]] tag specifies the algorithm to use to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' to use, in the measure. The default error function is '<code>rootRelativeSquareError</code>'.

Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:

<source lang="xml">
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/>
</source>

On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:

<source lang="xml">
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/>
</source>

These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important, so think it through carefully. Also see [[Multi-Objective Modeling]].

=== How do I enable more profilers? ===

Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.

=== What regular expressions can I use to filter profilers? ===

See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].

=== How can I ensure deterministic results? ===

See : [[Random state]].

=== How do I get a simple closed-form model (symbolic expression)? ===

See : [[Using a model]].

=== How do I enable the Heterogenous evolution to automatically select the best model type? ===

Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.

=== What is the combineOutputs option? ===

See [[Running#Models_with_multiple_outputs]]

=== What error function should I use? ===

The default error function is the Root Relative Square Error (RRSE). The meanRelativeError may be more intuitive, but you have to be careful if you have function values close to zero, since there the relative error explodes or even becomes infinite. You could also use one of the combined relative error functions (which contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error, which is hard to interpret.

To be safe, an absolute error (such as the RMSE) seems the best bet. However, in that case you have to come up with sensible accuracy targets, and you must realize that the models will try to fit the regions of high absolute value better than the regions of low absolute value.

Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.
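
As a worked illustration of the trade-off described above, using generic textbook definitions (the toolbox's own implementations in <code>src/matlab/tools/errorFunctions</code> may differ in detail): for a true value <math>y</math> and a prediction <math>\hat{y}</math>, take

<math>e_\mathrm{abs} = |y - \hat{y}|, \qquad e_\mathrm{rel} = \frac{|y - \hat{y}|}{|y|}, \qquad e_\mathrm{comb} = \frac{|y - \hat{y}|}{1 + |y|}</math>

Here <math>e_\mathrm{comb}</math> is an assumed form of a "combined" relative error with a +1 in the denominator. For the same absolute deviation of 0.01:

* <math>y = 0.001,\ \hat{y} = 0.011</math>: <math>e_\mathrm{rel} = 0.01 / 0.001 = 10</math> (1000%), while <math>e_\mathrm{comb} = 0.01 / 1.001 \approx 0.00999</math>, which is almost the absolute error.
* <math>y = 1000,\ \hat{y} = 1000.01</math>: <math>e_\mathrm{rel} = 0.01 / 1000 = 10^{-5}</math> and <math>e_\mathrm{comb} = 0.01 / 1001 \approx 9.99 \times 10^{-6}</math>, which is almost the relative error.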

A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].

=== I just want to generate an initial design (no sampling, no modeling) ===

Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.
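
A minimal sketch of that setting, assuming the option is written as an <Option> element like the other options shown in this FAQ (attributes of the <SUMO> tag and its other options are omitted here):

<source lang="xml">
<SUMO>
<!-- other options unchanged -->
<!-- 0 modeling iterations: only the initial design is generated and evaluated -->
<Option key="maxModelingIterations" value="0"/>
</SUMO>
</source>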

=== How do I start a run with the samples of a previous run, or with a custom initial design? ===

Use a Dataset design component, for example:

<source lang="xml">
<InitialDesign type="DatasetDesign">
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/>
</InitialDesign>
</source>

The points of a previous run can be found in the samples.txt file in the output directory of the run you want to continue.

As a side note, you can start the toolbox with the ''data points'' of a previous run, but not with the ''models'' of a previous run.

=== What is a level plot? ===

A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:
<gallery>
Image:levelplot.png
</gallery>
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.

=== How do I force the output of the model to lie in a certain range ===

See [[Measures#MinMax]].

=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===

That depends. Remember that the main focus of SUMO is to generate accurate 'global' models. If you want to do sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. Beyond that point you need to see whether you can extend the models with domain-specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you don't need to do sample selection but have a fixed dataset that you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM type models are independent of the dimension and thus can always be applied, though things like feature selection are always recommended.

=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===

When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question, but this is not always straightforward. For those cases a whole set of algorithms exists for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox, but none are currently implemented; doing this properly is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination, the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.

=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===

When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., sets the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).

Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable with [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.

HOWEVER, there is one important consideration: Matlab does not support threads so this means that if you use a matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. This means the modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient.

On the other hand, using a shell script/executable, does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).

As a side note, if you have already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. Of course this is not very elegant and adds some overhead, but depending on your situation it may be worth it.

=== How can I look at the internal structure of a SUMO model ===

See [[Using_a_model#Available_methods]].

=== Is there any design documentation available? ===

An in depth overview of the rationale and philosophy, including a treatment of the software architecture underlying the SUMO Toolbox is available in the form of a PhD dissertation. A copy of this dissertation [http://www.sumo.intec.ugent.be/?q=system/files/2010_04_PhD_DirkGorissen.pdf is available here].

== Troubleshooting ==

=== I have a problem and I want to report it ===

See : [[Reporting problems]].

===I am getting a java out of memory error, what happened?===
Datasets are loaded through java. This means that the java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might experience problems with the maximum heap size. You can solve this by raising the heap size as described on the following webpage:
[http://www.mathworks.com/support/solutions/data/1-18I2C.html]

=== I sometimes get flat models when using rational functions ===

First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.

The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in numerator and denominator. We have experienced that some models just find a flat model as the best least squares fit. There are several possible causes for this:

* There are few sample points, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always interpolate exactly. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code>, you still get weird, spiky models, you simply need more samples or you should switch to a different model type.
* Another possibility is that, given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use: the 0-function (= a flat model) has a relative error of exactly 1 everywhere, which is lower than the maximum relative error of a function that overshoots the true behavior by more than a factor of two in some places but is otherwise correct.
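
The first remedy in the list above could look like the following fragment. It is a sketch: the value comes from the text, but it is an assumption that <code>percentBounds</code> is set through an <Option> element. The enclosing elements of the RationalFactory section are omitted, so check your own configuration file for their exact form.

<source lang="xml">
<!-- inside the RationalFactory section of the configuration file -->
<!-- raises the lower bound; "100,100" would force exact interpolation -->
<Option key="percentBounds" value="60,100"/>
</source>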

=== When using rational functions I sometimes get 'spikes' (poles) in my model ===

When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.

So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.

However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:

* Simply wait for more data, usually spikes disappear (but not always).
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.
* Increase the population size if using the genetic version.
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]]. It was designed to get rid of this problem more quickly, but it only selects one sample at a time.

However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time. Automatic order selection is not an easy problem.
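
As a sketch of lowering the upper bound as suggested in the list above: with 500 data points, an upper bound of 10 percent caps the polynomial degrees at 0.10 × 500 = 50. The lower bound of 1 is only a placeholder, and the <Option> form is an assumption based on how other options are written in this FAQ.

<source lang="xml">
<!-- inside the RationalFactory section of the configuration file -->
<!-- upper bound 10: with 500 samples the maximum degree becomes 50 -->
<!-- the lower bound (1) is a placeholder value -->
<Option key="percentBounds" value="1,10"/>
</source>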

=== There is no noise in my data yet the rational functions don't interpolate ===

[[FAQ#I sometimes get flat models when using rational functions |See this question]].

=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===

You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path.

In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.

=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===

This means you have specified a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which ids are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.

=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===

This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files, the Matlab implementation will be used and this will not cause any problems.

=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===

This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure the FANN libraries (stored in src/matlab/contrib/fann/src/.libs/) are in your library path, e.g., on unix systems, make sure they are included in LD_LIBRARY_PATH.

=== Undefined function or method 'createFann' for input arguments of type 'double'. ===

See [[FAQ#When_using_FANN_models_I_sometimes_get_.22Invalid_MEX-file_createFann.mexa64.2C_libfann.so.2:_cannot_open_shared_object_file:_No_such_file_or_directory..22]]

=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===

You forgot to build the SVM mex files for your platform. For Windows they are pre-compiled for you; on other systems you have to compile them yourself with the makefile.

=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===

First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].

This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.

If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:

# Typing <code>usejava jvm</code> should return 1.
# Typing <code>s = java.lang.String</code> should ''not'' give an error.
# Typing <code>version('-java')</code> should return at least version 1.5.0.

* If (1) returns 0, then the jvm of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?).
* If (2) fails but (1) is ok, there is a very weird problem; check the Matlab documentation.
* If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer jvm (see the Matlab docs for how to do this).

=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===

You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.

=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===

See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]

=== I get a warning about duplicate samples being selected, why is this? ===

Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.

=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===

There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].

# '''Sampling off'''
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does [[reporting problems|report it as a bug]]
## ''Multi objective'': There is a very small chance the error can temporarily increase, but this should be safe to ignore. In this case it is best to use a multi objective enabled modeling algorithm.
# '''Sampling on'''
## ''Single objective'': inside each modeling iteration the error should always monotonically decrease. At each sampling iteration the best models are updated (to reflect the new data), thus there the best model score may increase, this is normal behavior(*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the datapoints is causing numerical problems).
## ''Multi objective'': Combination of 1.2 and 2.1.

(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!
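
A sketch of that override, assuming the option is written as an <Option> element like the other options shown in this FAQ (attributes of the <SUMO> tag and its other options are omitted here):

<source lang="xml">
<SUMO>
<!-- other options unchanged -->
<!-- only sensible for measures that do not depend on the number of samples -->
<Option key="keepOldModels" value="true"/>
</SUMO>
</source>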

=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===

This is normal, the warning printed out before the error explains why:

''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''

By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".
So note that there is nothing to worry about, everything has run correctly, it is just the movie creation that is failing.
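
If you prefer to switch movie creation off, the setting could look like this. The fragment is a sketch that assumes the option is written as an <Option> element; attributes of the <SUMO> tag and its other options are omitted.

<source lang="xml">
<SUMO>
<!-- other options unchanged -->
<!-- skip movie generation at the end of the run -->
<Option key="createMovie" value="false"/>
</SUMO>
</source>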

=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===

This error means that SUMO is unable to create the log file. Check the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not in some toolbox sub directory! This is very important.

If you still have problems you can override the default logfile name and location as follows:

In the <FileHandler> tag inside the <Logging> tag add the following option:

<source lang="xml">
<Option key="Pattern" value="My_SUMO_Log_file.log"/>
</source>

This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].
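
As a further sketch, the pattern can also point outside the toolbox directory by using the escapes documented in the FileHandler Javadoc. This assumes the toolbox passes the value to the Java FileHandler unchanged; the directory name <code>sumo-logs</code> is hypothetical and should already exist, since a missing directory is a common cause of the "Couldn't get lock" error.

<source lang="xml">
<Logging>
<FileHandler>
<!-- %h = user home directory, %g = generation number, %u = unique number -->
<!-- "sumo-logs" is a hypothetical directory; create it beforehand -->
<Option key="Pattern" value="%h/sumo-logs/SUMO-Toolbox.%g.%u.log"/>
</FileHandler>
</Logging>
</source>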

=== The Toolbox crashes with "Too many open files" what should I do? ===

This is a known bug, see [[Known_bugs#Version_6.1]].

If this does not fix your problem then do the following:

On Windows, try increasing the limit as dictated by the error message. Also, when you get the error, use the <code>fopen('all')</code> command to list the identifiers of all open files (<code>fopen(fid)</code> returns the file name for an identifier) and send us the list of filenames. Then we can maybe help you debug the problem further. Even better would be to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can debug the problem further.

On Linux again don't shut down Matlab but:

* open a new terminal window
* type:
<source lang="bash">
lsof > openFiles.txt
</source>
* Then [[Contact|send us]] the following information:
** the file openFiles.txt
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).
** the output of
<source lang="bash">
uname -a ; df -T ; mount
</source>

As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.

In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.

=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===

The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files, the toolbox will use the Matlab implementation and give a warning, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]]; this can be done very easily. Alternatively, simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.

=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===

You most likely forgot to [[Installation#Extension_pack|install the extension pack]].

=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===

Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics, your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; it is just that you have to log in again since your graphical session has terminated. This FAQ entry is only about the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.

So what happens is that after a few seconds when the toolbox wants to plot the first model [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - Display server interaction.

What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:

* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)

<source lang="bash">
/etc/X11/xorg.conf
</source>

Note: the exact location of the xorg.conf file may vary on your system.

* Look for the following line:

<source lang="bash">
Load "glx"
</source>

* Comment it out by replacing it by:

<source lang="bash">
# Load "glx"
</source>

* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].

Note:
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]
* if you wish to debug further yourself please check the Xorg log files and those in /var/log
* another possible workaround is to start matlab with the "-nodisplay" option. That could work as well.

=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===

This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.

=== The toolbox seems to keep on running forever, when or how will it stop? ===

The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].

Of course, choosing targets a priori is not always easy and there is no real solution for this, except thinking well about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). When in doubt you can always use a small value (or 0) and then simply quit the running toolbox using Ctrl-C when you think it has run long enough.

While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.

FAQ

2014-03-13T09:20:55Z

Javdrher: /* What about dynamical, time dependent data? */

== General ==

=== What is a global surrogate model? ===

A global [http://en.wikipedia.org/wiki/Surrogate_model surrogate model] is a mathematical model that mimics the behavior of a computationally expensive simulation code over '''the complete parameter space''' as accurately as possible, using as few data points as possible. So note that optimization is not the primary goal, although it can be done as a post-processing step. Global surrogate models are useful for:

* design space exploration, to get a ''feel'' of how the different parameters behave
* sensitivity analysis
* ''what-if'' analysis
* prototyping
* visualization
* ...

In addition, they are a cheap way to model large scale systems: multiple global surrogate models can be chained together in a model cascade.

See also the [[About]] page.

=== What about surrogate driven optimization? ===

When hearing the term '''surrogate driven optimization''', most people associate it with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate, which is optimized to find an optimum. Afterwards, a move limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently the surrogate is rebuilt and optimized again. In other words, the surrogate zooms in on the global optimum. For instance, the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies, where the surrogate construction is separated from optimization.

Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox.

Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sampling selection strategy that balances local and global search. In other words, it balances between exploring the input space and exploiting the information the surrogate gives us.

A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].
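
To make the balance between exploration and exploitation concrete, here is a minimal Python sketch of the textbook expected improvement criterion for minimization. It is an illustration only and not the toolbox's implementation; the function name and its arguments are hypothetical. It assumes the surrogate returns a predicted mean <code>mu</code> and a standard deviation <code>sigma</code> at a candidate point, and that <code>f_min</code> is the best value sampled so far.

<source lang="python">
import math

def expected_improvement(mu, sigma, f_min):
    """Textbook expected improvement (minimization) at one candidate point."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is known exactly.
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a low predicted value (exploitation),
    # second term rewards a high uncertainty (exploration).
    return (f_min - mu) * cdf + sigma * pdf
</source>

For a fixed predicted mean the criterion grows with <code>sigma</code>, which is what pulls new samples towards unexplored regions, while a low predicted mean pulls them towards the current optimum.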

=== What is (adaptive) sampling? Why is it used? ===

In classical Design of Experiments you need to specify the design of your experiment up-front. Or in other words, you have to say up-front how many data points you need and how they should be distributed. Two examples are Central Composite Designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., an expensive simulation code) it is not clear how many points are needed up-front. Instead data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in regions that are the most interesting is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].
SUMO provides a number of different sampling algorithms: [[Config:SequentialDesign|SequentialDesign]]
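
The idea of starting from a fixed initial design and then adding points a few at a time can be sketched in a few lines of Python. This is a deliberately simple one-dimensional illustration, not one of the toolbox's sampling algorithms: the function name, the scoring rule, and the <code>exploration</code> weight are all assumptions made for the example.

<source lang="python">
import math

def adaptive_sample(f, lower, upper, n_initial=5, n_total=15, exploration=0.1):
    """Sequentially sample f on [lower, upper] (1-D illustration only)."""
    # Fixed initial design: equally spaced points.
    step = (upper - lower) / (n_initial - 1)
    xs = [lower + i * step for i in range(n_initial)]
    ys = [f(x) for x in xs]
    while len(xs) < n_total:
        # Score every interval between neighbouring samples: wide intervals
        # (exploration) and large output changes (exploitation) score higher.
        scores = [
            (xs[i + 1] - xs[i]) * (exploration + abs(ys[i + 1] - ys[i]))
            for i in range(len(xs) - 1)
        ]
        i = scores.index(max(scores))
        x_new = 0.5 * (xs[i] + xs[i + 1])
        # One new (expensive) evaluation per iteration.
        xs.insert(i + 1, x_new)
        ys.insert(i + 1, f(x_new))
    return xs, ys

# Example: a response with a steep transition around x = 0.
xs, ys = adaptive_sample(lambda x: math.tanh(20.0 * x), -1.0, 1.0)
print(xs)
</source>

In this example most of the added points end up near the transition at x = 0, where the response changes quickly, while the flat regions receive only a few points.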

Of course sometimes you don't want to do sampling. For example, if you have a fixed dataset you just want to load all the data in one go and model that. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].

=== What about dynamical, time dependent data? ===

The original design and purpose was to tackle static input-output systems, where there is no memory. Just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox but that typically is not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...
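
To illustrate what such a prediction task looks like as data, the Python sketch below cuts a series into pairs of "recent values" and "next value". This is a generic illustration and not a feature of the toolbox; the function name and the <code>window</code> parameter are hypothetical.

<source lang="python">
def make_windows(series, window):
    """Turn a series into (inputs, target) pairs for one-step-ahead prediction."""
    pairs = []
    for k in range(window, len(series)):
        inputs = list(series[k - window:k])  # the `window` most recent values
        target = series[k]                   # the value to predict
        pairs.append((inputs, target))
    return pairs

pairs = make_windows([1.0, 2.0, 3.0, 4.0, 5.0], window=2)
for inputs, target in pairs:
    print(inputs, "->", target)
</source>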

The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.

Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[Config:SequentialDesign|SequentialDesign]].

For more information on this topic [[Contact]] us.

=== What about classification problems? ===

The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on the well known two spiral test problem. It is possible to specify a run as a classification problem by setting the 'classificationMode' and 'numberOfClasses' options in ContextConfig in the configuration file. Classification models from WEKA are also available in SUMO. Please refer to the default configuration file for an explanation of how to use the WEKA model types available through SUMO. The LOLA-Voronoi sample selection scheme also supports classification, and its usage is documented in the default configuration file as well.

=== Does SUMO support discrete inputs/outputs ===

Not if you mean in a smart way. There is a way to flag an input/output as discrete but it is not used anywhere. It is on the wishlist but we have not been able to get to it yet. Discrete inputs are just handled as if they were continuous. Depending on how many levels there are and whether there is an ordering, this may work fine or not work at all. You could of course add your own model type that can handle these :) As for discrete outputs see [[FAQ#What_about_classification_problems.3F]].

=== Can the toolbox drive my simulation code directly? ===

Yes it can. See the [[Interfacing with the toolbox]] page.

=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===

The SUMO toolbox is a complete, feature-full framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof-of-principle.

=== What happened to the M3-Toolbox? ===

The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.

=== How can I stay up to date with the latest news? ===

To stay up to date with the latest news and releases, we also recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.

You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].

=== What is the roadmap for the future? ===

There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.

You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].

=== Will there be an R/Scilab/Octave/Sage/.. version? ===

At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:

* Existing experience and know-how of the development team
* The widespread use of the Matlab platform in the target application domains
* The quality and amount of available Matlab documentation
* The quality and number of Matlab toolboxes
* Support for object orientation (inheritance, polymorphism, etc.)
* Many well documented interfacing options (especially the seamless integration with Java)

Matlab, as a proprietary platform, definitely has its problems and deficiencies but the number of advanced algorithms and available toolboxes make it a very attractive platform. Equally important is the fact that every function is properly documented, tested, and includes examples, tutorials, and in some cases GUI tools. A lot of things would have been a lot harder and/or time consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus given our situation, goals, and resources at the time, Matlab was the best choice for us.

The other platforms remain on our radar however, and we do look into them from time to time. Though, with our limited resources porting to one of those platforms is not (yet) cost effective.

=== What are collaboration options? ===

We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.

The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to optimally help you it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.

If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.

In either case (code or dataset) remember:

* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])
* include a short description of your data:
** number of inputs and number of outputs
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)
** if the outputs are real or complex valued
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).
** if possible the expected range of each output (or scaled if you do not wish to disclose this)
** if possible the names of each input/output + a short description of what they mean
** any further insight you have about the data, expected behavior, expected importance of each input, etc.
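
As an illustration of the checklist above, the following Python sketch formats a small data set as one data point per row and collects the basic facts for the description. The numbers are made up, and the space delimiter and the column order (inputs first, then outputs) are assumptions; see [[Interfacing_with_the_toolbox]] for the format the toolbox actually expects.

<source lang="python">
def format_rows(inputs, outputs):
    """Return one text line per data point: inputs first, then outputs."""
    lines = []
    for x_row, y_row in zip(inputs, outputs):
        values = list(x_row) + list(y_row)
        lines.append(" ".join(f"{v:.10g}" for v in values))
    return lines

def describe(inputs, outputs):
    """Summarise the dimensions and the range of each input."""
    input_columns = list(zip(*inputs))
    return {
        "number_of_inputs": len(input_columns),
        "number_of_outputs": len(outputs[0]),
        "input_ranges": [(min(c), max(c)) for c in input_columns],
    }

# Two inputs, one output, three data points (made-up numbers).
inputs = [[10.0, 0.5], [20.0, 1.5], [30.0, 1.0]]
outputs = [[0.12], [0.34], [0.56]]

lines = format_rows(inputs, outputs)
summary = describe(inputs, outputs)
print("\n".join(lines))
print(summary)
</source>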

If you have any further questions or comments related to this please [[Contact]] us.

=== Can you help me model my problem? ===

Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]

== Installation and Configuration ==

=== What is the relationship between Matlab and Java? ===

Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable java support but in order to use the SUMO Toolbox it should not be. To check if Java is enabled you can use the 'usejava' command.

=== What is Java, why do I need it, do I have to install it, etc. ? ===

The short answer is: no, don't worry about it. The long answer is: Some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language instead of a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course, in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].

=== What is XML? ===

XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.

The way SUMO uses XML is really simple and there is not much to understand (for more information on how SUMO uses XML go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]]).

First some simple terminology. Take the following example:

<source lang="xml">
<Foo attr="bar">bla bla bla</Foo>
</source>

Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you, you choose :)
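
To show that any XML library sees exactly these three pieces, here is a small Python sketch that parses the ''Foo'' example with the standard library. It is only an illustration of the terminology; the toolbox itself does not use Python.

<source lang="python">
import xml.etree.ElementTree as ET

element = ET.fromstring('<Foo attr="bar">bla bla bla</Foo>')

print(element.tag)             # the name of the tag
print(element.attrib["attr"])  # the value of the attribute
print(element.text)            # the text between opening and closing tag
</source>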

Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:

<source lang="xml">
<recipe category="dessert">
<title>Pancakes</title>
<author>sumo@intec.ugent.be</author>
<date>Wed, 14 Jun 95</date>
<description>
Good old fashioned pancakes.
</description>
<ingredients>
<item>
<amount>3</amount>
<type>eggs</type>
</item>

<item>
<amount>0.5 tablespoon</amount>
<type>salt</type>
</item>
...
</ingredients>
<preparation>
...
</preparation>
</recipe>
</source>

So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO uses it to store and structure configuration options, and this works well due to the nice hierarchical nature of XML.

If you understand this there is nothing else to it in order to be able to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]

=== Why does SUMO use XML? ===

XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it is an intuitive way to structure data and it is used everywhere. As a result, there are a very large number of libraries and programming languages available that can parse and handle XML easily. That means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.
So basically, it would not make sense for SUMO to use any other format :). For more information on how SUMO uses XML go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]].

=== I get an error that SUMO is not yet activated ===

Make sure you installed the activation file that was mailed to you as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [http://www.sumowiki.intec.ugent.be/index.php/FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27 Java is enabled]. To fully verify that the activation file installation is correct, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.

Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].

== Upgrading ==

=== How do I upgrade to a newer version? ===

Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.

If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.

== Using ==

=== I have no idea how to use the toolbox, what should I do? ===

See: [[Running#Getting_started]]

=== I want to try one of the different examples ===

See [[Running#Running_different_examples]].

=== I want to model my own problem ===

See : [[Adding an example]].

=== I want to contribute some data/patch/documentation/... ===

See : [[Contributing]].

=== How do I interface with the SUMO Toolbox? ===

See : [[Interfacing with the toolbox]].

=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===

See [[General_guidelines]].

=== Ok, I generated a model, what can I do with it? ===

See: [[Using a model]].

=== How can I share a model created by the SUMO Toolbox? ===

See : [[Using a model#Model_portability| Model portability]].

=== I dont like the final model generated by SUMO how do I improve it? ===

Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what for you constitutes a good model and what constitutes a poor model. Then you should rank those properties depending on how important you find them. Examples are:

* accuracy in the training data
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing
* accuracy outside the training data
** this is the validation or test error, how important is proper generalization (usually this is very important)
* what does accuracy mean to you? a low maximum error, a low average error, both, ...
* smoothness
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example
* are some regions of the response more important than others?
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important
* are there particular special features that your model should have
** for example, capture underlying poles or discontinuities correctly
* extrapolation capability
* ...

It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].

Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].

Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:

* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.
* you should switch to a different model parameter optimization algorithm (e.g., instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)
* the model type you are using is not ideally suited to your data
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension
* maybe the sample distribution is causing troubles for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very very small)

You may also refer to the [[General_guidelines]]. Finally, of course it may be that your problem is simply a very difficult one and does not approximate well. Even so, you should at least get something satisfactory.

If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.

=== My data contains noise can the SUMO-Toolbox help me? ===

The original purpose of the SUMO-Toolbox was for it to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will just have to be more careful about how you apply the methods and how you perform model selection. It is only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.

Note, though, that the toolbox is not a statistical package. If you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc., you should look towards other tools.

=== What is the difference between a ModelBuilder and a ModelFactory? ===

See [[Add Model Type]].

=== Why are the Neural Networks so slow? ===

The ANN models are an extremely powerful model type that give very good results in many problems. However, they are quite slow to use. There are some things you can do:

* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.
* If you are using [[Measures#CrossValidation| CrossValidation]] try to switch to a different measure since CrossValidation is very expensive to use. CrossValidation is used by default if you have not defined a [[Measures| measure]] yourself. When using one of the neural network model types, try to use a different measure if you can. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.
* Finally, as with any model type things will slow down if you have many dimensions or very large amounts of data. If that is the case, try some dimensionality reduction or subsampling techniques.
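
The rescaling advice in the list above can be sketched as follows. This is a generic Python illustration with made-up response values, not toolbox code; the function name and the target range of [-1, 1] are assumptions. The point is to keep the inverse mapping so that predictions can be converted back to the original units.

<source lang="python">
def fit_scaler(values, low=-1.0, high=1.0):
    """Return (forward, inverse) functions mapping the data range onto [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot rescale a constant response")
    span = v_max - v_min

    def forward(v):
        return low + (high - low) * (v - v_min) / span

    def inverse(s):
        return v_min + span * (s - low) / (high - low)

    return forward, inverse

responses = [2e-9, 5e-7, 3e-6, 1e-5]       # made-up, very small outputs
forward, inverse = fit_scaler(responses)
scaled = [forward(v) for v in responses]   # values to train the network on
restored = [inverse(s) for s in scaled]    # map predictions back afterwards
print(scaled)
</source>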

See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]

=== How can I make the toolbox run faster? ===

There are a number of things you can do to speed things up. These are listed below. Remember though that the main reason the toolbox may seem to be slow is due to the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.

* First of all check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates this really slows things down and your computer becomes unusable.

* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files

* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between 2 sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Similarly, for the Genetic model builders reduce the population size and the number of generations.

* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]]).

* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.

* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:

<source lang="xml">
<Profiling>
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true">
<Output type="toImage"/>
<Output type="toFile"/>
</Profiler>

<Profiler name=".*" enabled="true">
<Output type="toFile"/>
</Profiler>
</Profiling>
</source>

The ".*" means match zero or more arbitrary characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name are enabled and saved both as a text file (toFile) and as an image file (toImage). All the other profilers are saved just to file. The idea is to only save to image what you want as an image, since image generation is expensive. If you do this or switch off image generation completely you will see everything run much faster.
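
If you want to check which names a pattern catches before starting a run, a few lines of Python are enough, since these simple patterns behave the same in Python and in Java. The sketch below is not toolbox code: the profiler names are hypothetical, and it assumes that a pattern must match the whole name and that the first matching <code><Profiler></code> entry wins.

<source lang="python">
import re

# Patterns copied from the name attributes of the two Profiler tags.
image_pattern = re.compile(r".*share.*|.*ensemble.*|.*Level.*")
catch_all = re.compile(r".*")

def outputs_for(profiler_name):
    """Return the output types of the first pattern matching the whole name."""
    if image_pattern.fullmatch(profiler_name):
        return ["toImage", "toFile"]
    if catch_all.fullmatch(profiler_name):
        return ["toFile"]
    return []

# Hypothetical profiler names, used only to exercise the patterns.
for name in ["ModelLevelPlot", "ensembleWeights", "SampleCount"]:
    print(name, outputs_for(name))
</source>

Note that the patterns are case sensitive: a name containing "Share" would not be caught by <code>.*share.*</code>.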

* Decrease the logging granularity: a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.

* If you have a multi-core/multi-cpu machine:
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)

* Don't use the Min-Max measure, it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]

* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]

* If you are having problems with very slow or seemingly hanging runs:
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html&http://www.google.be/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=nl&q=matlab+profiler&meta=&btnG=Google+zoeken Matlab profiler] and see where most time is spent.

** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange.

* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].

To check whether your SUMO run has hung, monitor your log file (with the level set to at least FINE). If you see no changes for about 30 minutes the toolbox has probably stalled. In that case, please [[Reporting problems|report the problem]].

Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.
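
The log file check described above can be automated with a small script run next to the toolbox. The Python sketch below only looks at the modification time of the log file; the function name is hypothetical and the 30 minute threshold is the rule of thumb from this page, not a toolbox setting.

<source lang="python">
import os
import time

def seems_stalled(log_path, max_idle_minutes=30.0, now=None):
    """Return True if the log file has not been modified for max_idle_minutes."""
    if now is None:
        now = time.time()
    idle_seconds = now - os.path.getmtime(log_path)
    return idle_seconds > max_idle_minutes * 60.0
</source>

Call it with the path of the log file of your run, for example once every few minutes.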

=== How do I build models with more than one output ===

Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]

=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===

See : [[Adaptive Modeling Mode]].

=== How do I change the error function (relative error, RMSE, ...)? ===

The [[Measures| <Measure>]] tag specifies the algorithm to use to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' to use, in the measure. The default error function is '<code>rootRelativeSquareError</code>'.

Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:

<source lang="xml">
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/>
</source>

On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:

<source lang="xml">
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/>
</source>

These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important, so think carefully about it. Also see [[Multi-Objective Modeling]].

=== How do I enable more profilers? ===

Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.

=== What regular expressions can I use to filter profilers? ===

See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].

=== How can I ensure deterministic results? ===

See : [[Random state]].

=== How do I get a simple closed-form model (symbolic expression)? ===

See : [[Using a model]].

=== How do I enable the Heterogenous evolution to automatically select the best model type? ===

Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.

=== What is the combineOutputs option? ===

See [[Running#Models_with_multiple_outputs]]

=== What error function should I use? ===

The default error function is the Root Relative Square Error (RRSE). On the other hand meanRelativeError may be more intuitive but in that case you have to be careful if you have function values close to zero since in that case the relative error explodes or even gives infinity. You could also use one of the combined relative error functions (contain a +1 in the denominator to account for small values) but then you get something between a relative and absolute error (=> hard to interpret).

So to be sure an absolute error seems the safest bet (like the RMSE), however in that case you have to come up with sensible accuracy targets and realize that you will build models that try to fit the regions of high absolute value better than the low ones.
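
The difference between these error functions is easy to see on a small example. The Python sketch below uses the standard textbook definitions, which may differ in detail from the toolbox's own Matlab implementations; the function names and the data are made up for illustration.

<source lang="python">
import math

def rmse(y_true, y_pred):
    """Root mean square error (an absolute error)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def root_relative_square_error(y_true, y_pred):
    """Squared error relative to that of always predicting the mean."""
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return math.sqrt(num / den)

def mean_relative_error(y_true, y_pred):
    """Mean of |error| / |true value|; reported as infinite if a true value is 0."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 0.0:
            return math.inf
        total += abs(t - p) / abs(t)
    return total / len(y_true)

# The absolute error is 0.01 everywhere, but one true value is close to zero.
y_true = [1.0, 2.0, 0.001]
y_pred = [1.01, 2.01, 0.011]

print("RMSE:", rmse(y_true, y_pred))
print("RRSE:", root_relative_square_error(y_true, y_pred))
print("mean relative error:", mean_relative_error(y_true, y_pred))
</source>

The RMSE and the RRSE stay small here, while the mean relative error is dominated by the single point near zero, which is the "explosion" described above.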

Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.

A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].

=== I just want to generate an initial design (no sampling, no modeling) ===

Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.

=== How do I start a run with the samples of of a previous run, or with a custom initial design? ===

Use a Dataset design component, for example:

<source lang="xml">
<InitialDesign type="DatasetDesign">
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/>
</InitialDesign>
</source>

The points of a previous run can be found in the samples.txt file in the output directory of the run you want to continue.

Note that you can start the toolbox with the ''data points'' of a previous run, but not with the ''models'' of a previous run.

=== What is a level plot? ===

A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:
<gallery>
Image:levelplot.png
</gallery>
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.

=== How do I force the output of the model to lie in a certain range ===

See [[Measures#MinMax]].

=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===

That depends. Remember that the main focus of SUMO is to generate accurate 'global' models. If you want to do sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and how cheap the simulations are!), since the more dimensions there are, the more space you need to fill. Beyond that point you need to see if you can extend the models with domain-specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you don't need to do sample selection but have a fixed dataset which you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM type models are independent of the dimension and thus can always be applied, though things like feature selection are always recommended.
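To get a feeling for why sampling becomes impractical in many dimensions, consider covering the input space with a full grid. The sketch below is a back-of-the-envelope illustration only (the choice of 5 levels per input is an arbitrary assumption, and the toolbox does not sample on full grids); it simply counts how many simulations such a grid would need.

```python
def grid_points(levels_per_dim, dims):
    """Number of points in a full grid with the same number of levels per input."""
    return levels_per_dim ** dims

# 5 levels per input is a coarse grid, yet the count grows very quickly
for d in (2, 4, 6, 8, 10):
    print(d, "inputs:", grid_points(5, d), "simulations")
```

With an expensive simulator, even the 6-input case (15625 points) is usually out of reach, which is why adaptive sampling and dimensionality reduction matter.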

=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===

When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question but this is not always straightforward. In those cases a whole set of algorithms exists for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox, but none are currently implemented; that is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.

=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===

When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that can take the requested points from SUMO, set up the simulation engine (e.g., set the necessary input files), call the simulator for all the requested points, read the output (e.g., one or more output files), and return the results to SUMO (see [[Interfacing with the toolbox]]).

Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable with [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.

HOWEVER, there is one important consideration: Matlab does not support threads so this means that if you use a matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. This means the modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient.

On the other hand, using a shell script/executable, does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).

Note that if you have already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. Of course this is not very elegant and adds some overhead, but depending on your situation it may be worth it.
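A wrapper of this kind can be sketched as a small executable. The Python code below is only an illustration under stated assumptions: the script name <code>mySimulationScript</code> is hypothetical, <code>matlab</code> is assumed to be on the system path, and how the requested points and results are exchanged with SUMO is not shown (see [[Interfacing with the toolbox]]). The <code>try/catch</code> around the script makes sure Matlab exits even when the script fails, so the wrapper never hangs.

```python
import subprocess

def matlab_command(script_name, matlab_exe="matlab"):
    """Build a command line that runs one Matlab script and then exits."""
    # The try/catch ensures Matlab exits (with a non-zero code) if the script throws
    code = "try, {0}; catch err, disp(getReport(err)); exit(1); end; exit(0);".format(script_name)
    return [matlab_exe, "-nodisplay", "-r", code]

def run_simulation(script_name):
    """Start Matlab, run the script, and return the process exit code."""
    return subprocess.call(matlab_command(script_name))

if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Matlab
    print(matlab_command("mySimulationScript"))
```

Each call starts a fresh Matlab process, which is the overhead mentioned above; in exchange, several wrapper instances can run next to the modeling code.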

=== How can I look at the internal structure of a SUMO model ===

See [[Using_a_model#Available_methods]].

=== Is there any design documentation available? ===

An in-depth overview of the rationale and philosophy, including a treatment of the software architecture underlying the SUMO Toolbox, is available in the form of a PhD dissertation. A copy of this dissertation [http://www.sumo.intec.ugent.be/?q=system/files/2010_04_PhD_DirkGorissen.pdf is available here].

== Troubleshooting ==

=== I have a problem and I want to report it ===

See : [[Reporting problems]].

===I am getting a java out of memory error, what happened?===
Datasets are loaded through java. This means that the java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might experience problems with the maximum heap size. You can solve this by raising the heap size as described on the following webpage:
[http://www.mathworks.com/support/solutions/data/1-18I2C.html]

=== I sometimes get flat models when using rational functions ===

First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.

The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in numerator and denominator. We have experienced that some models just find a flat model as the best least squares fit. There are three possible causes for this:

* There are few sample points, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always exactly interpolate. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code>, you still get weird, spiky models, you simply need more samples or you should switch to a different model type.
* Given a set of monomial degrees, the flat function may simply be the best possible least squares fit. In that case you simply need to wait for more samples.
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.
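The last point is easy to verify by hand. The sketch below (plain Python with made-up data, not toolbox code) compares a flat zero model with a model that is exact everywhere except for one overshoot.

```python
def max_relative_error(y_true, y_pred):
    """Largest |t - p| / |t| over all points (true values must be non-zero)."""
    return max(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))

y_true = [1.0, 2.0, 4.0, 2.0, 1.0]
flat = [0.0] * 5                       # the 0-function
overshoot = [1.0, 2.0, 9.0, 2.0, 1.0]  # exact except for one overshoot

print(max_relative_error(y_true, flat))       # 1.0
print(max_relative_error(y_true, overshoot))  # 1.25
```

The 0-function always scores exactly 1 (100 percent), so any model that overshoots by more than the true value in a single point is ranked worse than a flat model.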

=== When using rational functions I sometimes get 'spikes' (poles) in my model ===

When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.

So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.
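To check whether a given one-dimensional rational model has poles inside the domain, you can look for zeros of its denominator. The sketch below is an independent illustration, not a toolbox function: it assumes the denominator coefficients are available in descending order and scans a grid for sign changes. Zeros of even multiplicity (where the sign does not change) are missed by this simple test.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial with coefficients in descending order (Horner)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def sign_changes(coeffs, lo, hi, steps=1000):
    """Return the grid intervals in [lo, hi] where the polynomial changes sign."""
    found = []
    prev_x = lo
    prev_v = polyval(coeffs, lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        v = polyval(coeffs, x)
        # A zero exactly on a grid point is reported once, on the following interval
        if prev_v == 0.0 or prev_v * v < 0.0:
            found.append((prev_x, x))
        prev_x, prev_v = x, v
    if prev_v == 0.0:
        found.append((prev_x, prev_x))
    return found

# Hypothetical denominator x^2 - 0.25: zeros (poles of the model) at -0.5 and 0.5
print(sign_changes([1.0, 0.0, -0.25], -1.0, 1.0))
# x^2 + 0.25 has no real zeros, so no poles on the domain
print(sign_changes([1.0, 0.0, 0.25], -1.0, 1.0))
```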

However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:

* Simply wait for more data, usually spikes disappear (but not always).
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.
* Increase the population size if using the genetic version.
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]]. It was designed to get rid of this problem more quickly, but it only selects one sample at a time.

However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time. Automatic order selection is not an easy problem.

=== There is no noise in my data yet the rational functions don't interpolate ===

[[FAQ#I sometimes get flat models when using rational functions |see this question]].

=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===

You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path.

In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.

=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===

This means you have specified to use a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which ids are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.

=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===

This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files; the Matlab implementation will then be used and this will not cause any problems.

=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===

This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure the FANN libraries (stored in src/matlab/contrib/fann/src/.libs/) are in your library path, e.g., on unix systems, make sure they are included in LD_LIBRARY_PATH.

=== Undefined function or method 'createFann' for input arguments of type 'double'. ===

See [[FAQ#When_using_FANN_models_I_sometimes_get_.22Invalid_MEX-file_createFann.mexa64.2C_libfann.so.2:_cannot_open_shared_object_file:_No_such_file_or_directory..22]]

=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===

You forgot to build the SVM mex files for your platform. For Windows they are pre-compiled for you; on other systems you have to compile them yourself with the makefile.

=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===

First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].

This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.

If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:

# typing 'usejava jvm' should return 1
# typing 's = java.lang.String' should ''not'' give an error
# typing 'version('-java')' should return at least version 1.5.0

If (1) returns 0, then the JVM of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?).

If (2) fails but (1) is ok, there is a very weird problem; check the Matlab documentation.

If (3) returns a version before 1.5.0, you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer JVM (see the Matlab docs for how to do this).

=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===

You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.

=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===

See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]

=== I get a warning about duplicate samples being selected, why is this? ===

Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.

=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===

There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].

# '''Sampling off'''
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does [[reporting problems|report it as a bug]]
## ''Multi objective'': There is a very small chance the error can temporarily increase but it should be safe to ignore. In this case it is best to use a multi objective enabled modeling algorithm.
# '''Sampling on'''
## ''Single objective'': inside each modeling iteration the error should always monotonically decrease. At each sampling iteration the best models are updated (to reflect the new data), so at that point the best model score may increase; this is normal behavior(*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the datapoints is causing numerical problems).
## ''Multi objective'': Combination of 1.2 and 2.1.

(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!

=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===

This is normal, the warning printed out before the error explains why:

''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''

By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".
So note that there is nothing to worry about, everything has run correctly, it is just the movie creation that is failing.

=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===

This error means that SUMO is unable to create the log file. Check the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not in some toolbox sub directory! This is very important.

If you still have problems you can override the default logfile name and location as follows:

In the <FileHandler> tag inside the <Logging> tag add the following option:

<code>
<Option key="Pattern" value="My_SUMO_Log_file.log"/>
</code>

This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].

=== The Toolbox crashes with "Too many open files" what should I do? ===

This is a known bug, see [[Known_bugs#Version_6.1]].

If this does not fix your problem then do the following:

On Windows, try increasing the limit as dictated by the error message. Also, when you get the error, use the fopen("all") command to see which files are open and send us the list of filenames, so that we can help you debug the problem further. Even better is to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]: when you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can further debug the problem.

On Linux again don't shut down Matlab but:

* open a new terminal window
* type:
<source lang="bash">
lsof > openFiles.txt
</source>
* Then [[Contact|send us]] the following information:
** the file openFiles.txt
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).
** the output of
<source lang="bash">
uname -a ; df -T ; mount
</source>

As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.

In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.

=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===

The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files, the toolbox will use the Matlab implementation and give a warning, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]] (this can be done very easily), or simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.

=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===

You most likely forgot to [[Installation#Extension_pack|install the extension pack]].

=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===

Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts it means your operating system is still running fine; it's just that you have to log in again since your graphical session has terminated. This FAQ entry is only about the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.

So what happens is that after a few seconds when the toolbox wants to plot the first model [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - Display server interaction.

What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:

* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)

<source lang="bash">
/etc/X11/xorg.conf
</source>

Note: the exact location of the xorg.conf file may vary on your system.

* Look for the following line:

<source lang="bash">
Load "glx"
</source>

* Comment it out by replacing it by:

<source lang="bash">
# Load "glx"
</source>

* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].

Note:
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]
* if you wish to debug further yourself please check the Xorg log files and those in /var/log
* another possible workaround is to start matlab with the "-nodisplay" option. That could work as well.

=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===

This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.

=== The toolbox seems to keep on running forever, when or how will it stop? ===

The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].

Of course, choosing targets a priori is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). When in doubt you can always use a small value (or 0) and then simply quit the running toolbox using Ctrl-C when you think it has run long enough.

While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.

FAQ

2014-03-13T09:20:35Z

Javdrher: /* What is (adaptive) sampling? Why is it used? */

== General ==

=== What is a global surrogate model? ===

A global [http://en.wikipedia.org/wiki/Surrogate_model surrogate model] is a mathematical model that mimics the behavior of a computationally expensive simulation code over '''the complete parameter space''' as accurately as possible, using as few data points as possible. Note that optimization is therefore not the primary goal, although it can be done as a post-processing step. Global surrogate models are useful for:

* design space exploration, to get a ''feel'' of how the different parameters behave
* sensitivity analysis
* ''what-if'' analysis
* prototyping
* visualization
* ...

In addition they are a cheap way to model large scale systems: multiple global surrogate models can be chained together in a model cascade.

See also the [[About]] page.

=== What about surrogate driven optimization? ===

When hearing the term '''surrogate driven optimization''', most people associate it with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate which is optimized to find an optimum. Afterwards, a move limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently the surrogate is rebuilt and optimized again, i.e., the surrogate zooms in on the global optimum. For instance the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies where the surrogate construction is separated from optimization.

Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox.

Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sampling selection strategy that balances local and global search. In other words, it balances between exploring the input space and exploiting the information the surrogate gives us.

A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].

=== What is (adaptive) sampling? Why is it used? ===

In classical Design of Experiments you need to specify the design of your experiment up-front. Or in other words, you have to say up-front how many data points you need and how they should be distributed. Two examples are Central Composite Designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., an expensive simulation code) it is not clear how many points are needed up-front. Instead data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in regions that are the most interesting is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].
SUMO provides a number of different sampling algorithms: [[Config:SequentialDesign|SequentialDesign]]
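The control flow described above (fixed initial design first, then a few points at a time) can be sketched in a few lines. The Python code below is a deliberately simplified, one-dimensional illustration and not one of the toolbox's algorithms: the simulator is a made-up test function, and new points are chosen purely by distance to existing samples (space filling), whereas real sequential design methods also use information from the model.

```python
import math

def expensive_simulation(x):
    """Stand-in for a costly simulator (hypothetical test function)."""
    return math.sin(6.0 * x) * x

def next_sample(samples, candidates):
    """Pick the candidate whose nearest existing sample is farthest away."""
    return max(candidates, key=lambda c: min(abs(c - s) for s in samples))

def sequential_design(n_initial=3, n_total=8, n_candidates=101):
    # Fixed initial design: equally spaced points on [0, 1]
    samples = [i / (n_initial - 1) for i in range(n_initial)]
    values = [expensive_simulation(x) for x in samples]
    candidates = [i / (n_candidates - 1) for i in range(n_candidates)]
    # Sequential phase: one new point per iteration until the budget is used
    while len(samples) < n_total:
        x_new = next_sample(samples, candidates)
        samples.append(x_new)
        values.append(expensive_simulation(x_new))
    return samples, values

samples, values = sequential_design()
print(sorted(samples))
```

The important property is that the total number of simulations does not have to be fixed up front: the loop can stop as soon as a model built on the current samples is accurate enough.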

Of course sometimes you don't want to do sampling. For example, if you have a fixed dataset you just want to load all the data in one go and model that. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].

=== What about dynamical, time dependent data? ===

The original design and purpose was to tackle static input-output systems, where there is no memory. Just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox but that typically is not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...

The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.

Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].

For more information on this topic [[Contact]] us.

=== What about classification problems? ===

The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on the well known two spiral test problem. It is possible to specify a run as a classification problem by setting the 'classificationMode' and 'numberOfClasses' option in ContextConfig in the configuration file. Classification models from WEKA are also available in SUMO. Please refer to the default configuration file for the explanation on usage of WEKA model types available through SUMO. The LOLA-Voronoi sample selection scheme also supports classification, and its usage is documented in the default configuration file as well.

=== Does SUMO support discrete inputs/outputs ===

Not in a smart way. There is a way to flag an input/output as discrete but it is not used anywhere. It is on the wishlist but we have not been able to get to it yet. Discrete inputs are just handled as if they were continuous. Depending on how many levels there are and whether there is an ordering, this may work ok or not work at all. You could of course add your own model type that can handle these :) As for discrete outputs see [[FAQ#What_about_classification_problems.3F]].

=== Can the toolbox drive my simulation code directly? ===

Yes it can. See the [[Interfacing with the toolbox]] page.

=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===

The SUMO toolbox is a complete, feature-full framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof-of-principle.

=== What happened to the M3-Toolbox? ===

The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.

=== How can I stay up to date with the latest news? ===

To stay up to date with the latest news and releases, we recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.

You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].

=== What is the roadmap for the future? ===

There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.

You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].

=== Will there be an R/Scilab/Octave/Sage/.. version? ===

At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:

* Existing experience and know-how of the development team
* The widespread use of the Matlab platform in the target application domains
* The quality and amount of available Matlab documentation
* The quality and number of Matlab toolboxes
* Support for object orientation (inheritance, polymorphism, etc.)
* Many well documented interfacing options (especially the seamless integration with Java)

Matlab, as a proprietary platform, definitely has its problems and deficiencies but the number of advanced algorithms and available toolboxes make it a very attractive platform. Equally important is the fact that every function is properly documented, tested, and includes examples, tutorials, and in some cases GUI tools. A lot of things would have been a lot harder and/or time consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus given our situation, goals, and resources at the time, Matlab was the best choice for us.

The other platforms remain on our radar, however, and we do look into them from time to time. With our limited resources, though, porting to one of those platforms is not (yet) cost-effective.

=== What are collaboration options? ===

We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.

The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to help you optimally, it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.

If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.

In either case (code or dataset) remember:

* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])
* include a short description of your data:
** number of inputs and number of outputs
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)
** if the outputs are real or complex valued
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).
** if possible the expected range of each output (or scaled if you do not wish to disclose this)
** if possible the names of each input/output + a short description of what they mean
** any further insight you have about the data, expected behavior, expected importance of each input, etc.
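
As a rough illustration of the checklist above, the following Python sketch scales each input to [-1 1] and writes one data point per row. The variable names, the example values, and the choice of a space as column separator are assumptions made for this example only; see [[Interfacing_with_the_toolbox]] for the format the toolbox actually expects.

<source lang="python">
import io


def scale_to_unit_range(value, lower, upper):
    """Map value linearly from [lower, upper] onto [-1, 1]."""
    return 2.0 * (value - lower) / (upper - lower) - 1.0


def write_dataset(rows, bounds, stream):
    """Write one data point per row: the scaled inputs followed by the outputs.

    rows   -- list of (inputs, outputs) tuples
    bounds -- list of (lower, upper) pairs, one per input
    """
    for inputs, outputs in rows:
        scaled = [scale_to_unit_range(x, lo, hi) for x, (lo, hi) in zip(inputs, bounds)]
        stream.write(" ".join(f"{v:.6g}" for v in scaled + list(outputs)) + "\n")


# Hypothetical problem: 2 inputs with known ranges and 1 real-valued output.
bounds = [(1.0, 5.0), (10.0, 20.0)]
rows = [((1.0, 10.0), (0.25,)), ((3.0, 15.0), (0.5,)), ((5.0, 20.0), (0.75,))]

buffer = io.StringIO()  # replace with open("mydata.txt", "w") to write a real file
write_dataset(rows, bounds, buffer)
dataset_text = buffer.getvalue()
print(dataset_text)
</source>

Scaling only hides the physical ranges; the number of inputs and outputs and the noise level still need to be described separately.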

If you have any further questions or comments related to this please [[Contact]] us.

=== Can you help me model my problem? ===

Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]

== Installation and Configuration ==

=== What is the relationship between Matlab and Java? ===

Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable Java support, but the SUMO Toolbox requires it to be enabled. To check whether Java is enabled, use the 'usejava' command (e.g., <code>usejava('jvm')</code>, which returns 1 if the Java virtual machine is available).

=== What is Java, why do I need it, do I have to install it, etc. ? ===

The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language instead of a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code, there is nothing you need to do or worry about (see the previous FAQ entry). Unless it is not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].

=== What is XML? ===

XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides rules for how you can annotate or structure existing text.

The way SUMO uses XML is really simple and there is not much to understand (for more information on how SUMO uses XML go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]]).

First some simple terminology. Take the following example:

<source lang="xml">
<Foo attr="bar">bla bla bla</Foo>
</source>

Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you, you choose :)

Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:

<source lang="xml">
<recipe category="dessert">
<title>Pancakes</title>
<author>sumo@intec.ugent.be</author>
<date>Wed, 14 Jun 95</date>
<description>
Good old fashioned pancakes.
</description>
<ingredients>
<item>
<amount>3</amount>
<type>eggs</type>
</item>

<item>
<amount>0.5 tablespoon</amount>
<type>salt</type>
</item>
...
</ingredients>
<preparation>
...
</preparation>
</recipe>
</source>

So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO simply uses it to store and structure configuration options, and this works well due to the nice hierarchical nature of XML.

If you understand this there is nothing else to it in order to be able to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]
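
To see that XML really is just structured data, here is a small Python sketch that reads a shortened version of the pancake recipe using the standard <code>xml.etree.ElementTree</code> module. It is only an illustration of how a program walks tags and attributes; it says nothing about how SUMO itself parses its configuration files.

<source lang="python">
import xml.etree.ElementTree as ET

# Shortened copy of the recipe example, kept well-formed.
recipe_xml = """
<recipe category="dessert">
  <title>Pancakes</title>
  <ingredients>
    <item><amount>3</amount><type>eggs</type></item>
    <item><amount>0.5 tablespoon</amount><type>salt</type></item>
  </ingredients>
</recipe>
"""

root = ET.fromstring(recipe_xml)

category = root.get("category")   # value of an attribute on the opening tag
title = root.findtext("title")    # text between <title> and </title>
ingredients = [
    (item.findtext("amount"), item.findtext("type"))
    for item in root.iter("item")  # every <item> tag, at any depth
]

print(category, title, ingredients)
</source>

The program only retrieves what was written down; all of the meaning lies in how the tags are nested.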

=== Why does SUMO use XML? ===

XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it is an intuitive way to structure data and it is used everywhere. As a result, a very large number of libraries are available, for many programming languages, that can parse and handle XML easily. That means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.
So basically, it would not make sense for SUMO to use any other format :). For more information on how SUMO uses XML go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]].

=== I get an error that SUMO is not yet activated ===

Make sure you installed the activation file that was mailed to you, as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [http://www.sumowiki.intec.ugent.be/index.php/FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27 Java is enabled]. To fully verify that the activation file is installed correctly, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.

Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].

== Upgrading ==

=== How do I upgrade to a newer version? ===

Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.

If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.

== Using ==

=== I have no idea how to use the toolbox, what should I do? ===

See: [[Running#Getting_started]]

=== I want to try one of the different examples ===

See [[Running#Running_different_examples]].

=== I want to model my own problem ===

See : [[Adding an example]].

=== I want to contribute some data/patch/documentation/... ===

See : [[Contributing]].

=== How do I interface with the SUMO Toolbox? ===

See : [[Interfacing with the toolbox]].

=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===

See [[General_guidelines]].

=== Ok, I generated a model, what can I do with it? ===

See: [[Using a model]].

=== How can I share a model created by the SUMO Toolbox? ===

See : [[Using a model#Model_portability| Model portability]].

=== I dont like the final model generated by SUMO how do I improve it? ===

Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor model. Then you should rank those properties by how important you find them. Examples are:

* accuracy in the training data
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing
* accuracy outside the training data
** this is the validation or test error, how important is proper generalization (usually this is very important)
* what does accuracy mean to you? a low maximum error, a low average error, both, ...
* smoothness
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example
* are some regions of the response more important than others?
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important
* are there particular special features that your model should have
** for example, capture underlying poles or discontinuities correctly
* extrapolation capability
* ...

It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].

Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].

Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:

* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.
* you should switch to a different model parameter optimization algorithm (e.g., instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)
* the model type you are using is not ideally suited to your data
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension
* maybe the sample distribution is causing troubles for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is extremely small)

You may also refer to the following [[General_guidelines]]. Finally, of course it may be that your problem is simply a very difficult one and does not approximate well. But, still you should at least get something satisfactory.

If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.

=== My data contains noise can the SUMO-Toolbox help me? ===

The original purpose of the SUMO-Toolbox was for it to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will have to be more careful about how you apply the methods and how you perform model selection. It is only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.

Note though, that the toolbox is not a statistical package, if you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc. you should look towards other tools.

=== What is the difference between a ModelBuilder and a ModelFactory? ===

See [[Add Model Type]].

=== Why are the Neural Networks so slow? ===

The ANN models are an extremely powerful model type that give very good results in many problems. However, they are quite slow to use. There are some things you can do:

* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.
* If you are using [[Measures#CrossValidation| CrossValidation]] try to switch to a different measure since CrossValidation is very expensive to use. CrossValidation is used by default if you have not defined a [[Measures| measure]] yourself. When using one of the neural network model types, try to use a different measure if you can. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.
* Finally, as with any model type things will slow down if you have many dimensions or very large amounts of data. If that is the case, try some dimensionality reduction or subsampling techniques.
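
The advice above about very small output ranges can be sketched as follows. This is a minimal Python example of min-max rescaling done before the data is handed to any modeling tool; the function name and the sample values are made up for illustration, and it does not describe any scaling the toolbox may perform internally.

<source lang="python">
def fit_output_scaler(values):
    """Return (forward, inverse) functions mapping the observed range onto [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("constant response, nothing to rescale")
    span = highest - lowest

    def forward(y):
        return (y - lowest) / span

    def inverse(scaled):
        return scaled * span + lowest

    return forward, inverse


# Hypothetical responses spanning several orders of magnitude near zero.
responses = [2e-9, 4e-7, 1e-5]
forward, inverse = fit_output_scaler(responses)

scaled = [forward(y) for y in responses]   # what the model would be trained on
restored = [inverse(s) for s in scaled]    # map predictions back afterwards
print(scaled)
</source>

Keep the <code>inverse</code> function (or the two constants behind it) with the model, otherwise its predictions cannot be converted back to the original units.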

See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]

=== How can I make the toolbox run faster? ===

There are a number of things you can do to speed things up. These are listed below. Remember though that the main reason the toolbox may seem to be slow is due to the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.

* First of all check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates this really slows things down and your computer becomes unusable.

* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files

* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between two sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the Genetic model builders reduce the population size and the number of generations.

* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]]).

* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.

* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:

<source lang="xml">
<Profiling>
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true">
<Output type="toImage"/>
<Output type="toFile"/>
</Profiler>

<Profiler name=".*" enabled="true">
<Output type="toFile"/>
</Profiler>
</Profiling>
</source>

The ".*" means match any sequence of zero or more characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name are enabled and are saved as a text file (toFile) AND as an image file (toImage). All the other profilers are saved just to file. The idea is to only save to image what you want as an image, since image generation is expensive. If you do this or switch off image generation completely you will see everything run much faster.
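
To get a feel for which profiler names such a pattern selects, you can try it outside the toolbox. The Python sketch below uses the standard <code>re</code> module, whose syntax agrees with Java's for a simple pattern like this one. The profiler names are hypothetical, and the rule that the first matching pattern wins is an assumption made for this example, not a documented property of the toolbox.

<source lang="python">
import re

# Patterns copied from the configuration example, paired with their outputs.
profiler_rules = [
    (r".*share.*|.*ensemble.*|.*Level.*", ["toImage", "toFile"]),
    (r".*", ["toFile"]),
]


def outputs_for(profiler_name):
    """Return the outputs of the first rule whose pattern matches the whole name."""
    for pattern, outputs in profiler_rules:
        if re.fullmatch(pattern, profiler_name):
            return outputs
    return []


# Hypothetical profiler names, for illustration only.
for name in ["LevelPlot", "ensembleWeights", "bestModelScore"]:
    print(name, "->", outputs_for(name))
</source>

Note that <code>.*</code> on its own also matches an empty string, which is why it works as a catch-all.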

* Decrease the logging granularity: a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.

* If you have a multi-core/multi-cpu machine:
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)

* Don't use the Min-Max measure, it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]

* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]

* If you are having problems with very slow or seemingly hanging runs:
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html&http://www.google.be/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=nl&q=matlab+profiler&meta=&btnG=Google+zoeken Matlab profiler] and see where most time is spent.

** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange.

* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].

To check whether your SUMO run has hung, monitor your log file (with the level set to at least FINE). If you see no changes for about 30 minutes, the toolbox has probably stalled. In that case please [[Reporting problems|report the problem]].

Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.
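
Watching the log file by hand gets tedious for long runs. A minimal Python sketch of the same check compares the file's last modification time against a threshold. The 30 minute default follows the rule of thumb above; the function names are made up for this example, and the demo uses a throwaway temporary file instead of a real SUMO log.

<source lang="python">
import os
import tempfile
import time


def seconds_since_last_write(log_path, now=None):
    """Seconds elapsed since the file at log_path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path)


def looks_stalled(log_path, threshold_minutes=30, now=None):
    """True if the log has not changed for longer than threshold_minutes."""
    return seconds_since_last_write(log_path, now) > threshold_minutes * 60


# Demo on a temporary file; point log_path at your own log file instead.
with tempfile.NamedTemporaryFile(suffix=".log") as log_file:
    fresh = looks_stalled(log_file.name)
    an_hour_later = looks_stalled(log_file.name, now=time.time() + 3600)

print(fresh, an_hour_later)
</source>

A quiet log is only a hint: a single very expensive simulation can also keep the log silent for a long time.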

=== How do I build models with more than one output ===

Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]

=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===

See : [[Adaptive Modeling Mode]].

=== How do I change the error function (relative error, RMSE, ...)? ===

The [[Measures| <Measure>]] tag specifies the algorithm to use to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' to use, in the measure. The default error function is '<code>rootRelativeSquareError</code>'.

Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:

<source lang="xml">
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/>
</source>

On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:

<source lang="xml">
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/>
</source>

The available error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important, so think it through carefully. Also see [[Multi-Objective Modeling]].

=== How do I enable more profilers? ===

Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.

=== What regular expressions can I use to filter profilers? ===

See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].

=== How can I ensure deterministic results? ===

See : [[Random state]].

=== How do I get a simple closed-form model (symbolic expression)? ===

See : [[Using a model]].

=== How do I enable the Heterogenous evolution to automatically select the best model type? ===

Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.

=== What is the combineOutputs option? ===

See [[Running#Models_with_multiple_outputs]]

=== What error function should I use? ===

The default error function is the Root Relative Square Error (RRSE). The meanRelativeError may be more intuitive, but you have to be careful if you have function values close to zero, since the relative error then explodes or even becomes infinite. You could also use one of the combined relative error functions (these contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error (=> hard to interpret).

So to be sure an absolute error seems the safest bet (like the RMSE), however in that case you have to come up with sensible accuracy targets and realize that you will build models that try to fit the regions of high absolute value better than the low ones.

Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.

A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].
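
The trade-offs discussed above are easy to reproduce on a toy example. The Python sketch below uses the textbook definitions of RMSE, root relative square error, and mean relative error; the toolbox's own Matlab implementations may differ in detail, and the data values are invented. One true value is placed close to zero to show how the relative error blows up while the other two stay small.

<source lang="python">
import math


def rmse(y_true, y_pred):
    """Root mean square error (an absolute error)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def root_relative_square_error(y_true, y_pred):
    """Squared error relative to that of always predicting the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    numerator = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    denominator = sum((t - mean_true) ** 2 for t in y_true)
    return math.sqrt(numerator / denominator)


def mean_relative_error(y_true, y_pred):
    """Average of |error| / |true value|; undefined when a true value is zero."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented data: the first true value is very close to zero.
y_true = [1e-6, 1.0, 2.0, 3.0]
y_pred = [1e-2, 1.1, 1.9, 3.0]

print("RMSE               ", rmse(y_true, y_pred))
print("RRSE               ", root_relative_square_error(y_true, y_pred))
print("mean relative error", mean_relative_error(y_true, y_pred))
</source>

With these definitions an RRSE of 1 means the model is no better than predicting the mean of the data, which makes targets below 1 easier to interpret than a raw RMSE value.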

=== I just want to generate an initial design (no sampling, no modeling) ===

Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.

=== How do I start a run with the samples of of a previous run, or with a custom initial design? ===

Use a Dataset design component, for example:

<source lang="xml">
<InitialDesign type="DatasetDesign">
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/>
</InitialDesign>
</source>

The points of a previous run can be found in the samples.txt file in the output directory of the run you want to continue.

As a side note, you can start the toolbox with the ''data points'' of a previous run, but not with the ''models'' of a previous run.

=== What is a level plot? ===

A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:
<gallery>
Image:levelplot.png
</gallery>
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.

=== How do I force the output of the model to lie in a certain range ===

See [[Measures#MinMax]].

=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===

That depends. Remember that the main focus of SUMO is to generate accurate 'global' models. If you want to do sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. At that point you need to see if you can extend the models with domain-specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you do not need sample selection but have a fixed dataset that you want to model, then the performance on high-dimensional data just depends on the model type. For example, SVM type models are independent of the dimension and thus can always be applied. Still, things like feature selection are always recommended.
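
To see why the space to fill grows so quickly, consider a full grid with only 5 levels per input. The short Python sketch below is plain arithmetic, not a description of how the toolbox's sample selection works, but it shows the trend that makes sampling impractical beyond a handful of dimensions.

<source lang="python">
def full_grid_size(levels_per_dimension, dimensions):
    """Number of points in a full factorial grid."""
    return levels_per_dimension ** dimensions


grid_sizes = {d: full_grid_size(5, d) for d in (2, 4, 6, 8, 10)}
for dimensions, samples in grid_sizes.items():
    print(f"{dimensions:2d} inputs -> {samples:,} samples")
</source>

At 8 inputs this already amounts to 390,625 simulations, which is out of reach for most expensive simulators.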

=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===

When tackling high-dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question, but this is not always straightforward. For those cases a whole set of algorithms exists for dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox, but none are currently implemented; that is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.

=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===

When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., sets the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).

Which one you choose (Matlab script with the [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable with the [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.

HOWEVER, there is one important consideration: Matlab does not support threads, so if you use a Matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. The modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient.

On the other hand, using a shell script/executable, does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).
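
One common way to make an interface script safe to run several times in parallel is to give every invocation its own working directory. The Python sketch below illustrates the idea with a toy stand-in for the simulator; the function names and the file name <code>input.txt</code> are made up for this example, and the way the toolbox hands points to your script is not shown here (see [[Interfacing with the toolbox]]).

<source lang="python">
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_in_private_directory(simulate, point):
    """Call simulate(point, workdir) with a fresh directory that is removed afterwards."""
    with tempfile.TemporaryDirectory(prefix="sim_") as workdir:
        return simulate(point, workdir)


def toy_simulator(point, workdir):
    """Stand-in for a real simulator: writes an input file, reads it back, sums squares."""
    input_path = os.path.join(workdir, "input.txt")
    with open(input_path, "w") as handle:
        handle.write(" ".join(str(x) for x in point))
    with open(input_path) as handle:
        values = [float(x) for x in handle.read().split()]
    return sum(v * v for v in values)


# Three evaluations running concurrently, none sharing files with another.
points = [(0.0, 1.0), (2.0, 3.0), (-1.0, 0.5)]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda p: run_in_private_directory(toy_simulator, p), points))

print(results)
</source>

Because every call writes to its own directory, two simultaneous evaluations can never overwrite each other's input or output files.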

As a side note, if you already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. Of course this is not very elegant and adds some overhead, but depending on your situation it may be worth it.
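
The wrapper described above boils down to a single command line. The Python sketch below only builds that command so you can inspect it; <code>run_simulation</code> is a hypothetical name for your own Matlab script, and the <code>try</code>/<code>catch</code> around it is an addition of this example so that Matlab still exits when the script fails.

<source lang="python">
import shlex


def matlab_batch_command(script_name, use_jvm=True):
    """Build the argument list that starts Matlab, runs script_name, and exits."""
    statement = f"try, {script_name}; catch err, disp(err.message); end; exit;"
    flags = ["-nodisplay"] if use_jvm else ["-nojvm"]
    return ["matlab", *flags, "-r", statement]


# Hypothetical script name; Matlab must be able to find it on its path.
command = matlab_batch_command("run_simulation")
print(shlex.join(command))
</source>

To actually launch it, pass the list to <code>subprocess.run(command, check=True)</code>, or paste the printed line into a shell script.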

=== How can I look at the internal structure of a SUMO model ===

See [[Using_a_model#Available_methods]].

=== Is there any design documentation available? ===

An in depth overview of the rationale and philosophy, including a treatment of the software architecture underlying the SUMO Toolbox is available in the form of a PhD dissertation. A copy of this dissertation [http://www.sumo.intec.ugent.be/?q=system/files/2010_04_PhD_DirkGorissen.pdf is available here].

== Troubleshooting ==

=== I have a problem and I want to report it ===

See [[Reporting problems]].

===I am getting a java out of memory error, what happened?===
Datasets are loaded through Java. This means that the Java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might run into the maximum heap size. You can solve this by raising the heap size as described on the following webpage:
[http://www.mathworks.com/support/solutions/data/1-18I2C.html]

=== I sometimes get flat models when using rational functions ===

First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.

The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. We have experienced that some models just find a flat model as the best least squares fit. There are several possible causes for this:

* There are too few sample points, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always exactly interpolate. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code>, you still get weird, spiky models, you simply need more samples or you should switch to a different model type.
* Given a set of monomial degrees, the flat function may simply be the best possible least squares fit. In that case you simply need to wait for more samples.
* The measure you are using is not accurately estimating the true error. Try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.

=== When using rational functions I sometimes get 'spikes' (poles) in my model ===

When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.

So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.

However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:

* Simply wait for more data, usually spikes disappear (but not always).
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points: if the maximum of the <code>"percentBounds"</code> option is set to 100 percent, the degrees of the polynomials in the rational function can go up to 500. If you set the maximum to 10, on the other hand, the maximum degree is 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.
* Increase the population size if using the genetic version.
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]]. It was designed to get rid of this problem more quickly, but it only selects one sample at a time.
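
The arithmetic behind the <code>"percentBounds"</code> example in the list above can be checked with a small sketch. The function name is hypothetical, and the use of integer division is an assumption; the toolbox may round differently.

<source lang="bash">
# Sketch: maximum polynomial degree implied by the upper percentBounds value.
# Integer division is an assumption; the toolbox may round differently.
max_degree() {
    local num_samples=$1 upper_percent=$2
    echo $(( num_samples * upper_percent / 100 ))
}

max_degree 500 100   # upper bound 100 percent: degrees up to 500
max_degree 500 10    # upper bound 10 percent: degrees up to 50
</source>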

However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time. Automatic order selection is not an easy problem.

=== There is no noise in my data yet the rational functions don't interpolate ===

[[FAQ#I sometimes get flat models when using rational functions |See this question]].

=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===

You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path.

In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.

=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===

This means you have specified a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which ids are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.

=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===

This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files; the Matlab implementation will then be used, and it does not have this problem.

=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===

This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure the FANN libraries (stored in src/matlab/contrib/fann/src/.libs/) are in your library path, e.g., on unix systems, make sure they are included in LD_LIBRARY_PATH.
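
On a unix system, setting the library path could look like the sketch below. The variable <code>SUMO_ROOT</code> and its default value are assumptions; point it at your own toolbox root directory.

<source lang="bash">
# Sketch: make the FANN shared libraries visible to the dynamic linker.
# SUMO_ROOT is an assumed variable pointing at your toolbox root directory.
SUMO_ROOT=${SUMO_ROOT:-$HOME/SUMO-Toolbox}
FANN_LIBS="$SUMO_ROOT/src/matlab/contrib/fann/src/.libs"

# Prepend the FANN directory, keeping any existing entries
export LD_LIBRARY_PATH="$FANN_LIBS${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Start Matlab from this same shell so that it inherits the variable
echo "$LD_LIBRARY_PATH"
</source>

The variable only applies to the current shell and to programs started from it, so Matlab has to be launched from the same terminal (or the lines have to go into your shell startup file).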

=== Undefined function or method 'createFann' for input arguments of type 'double'. ===

See [[FAQ#When_using_FANN_models_I_sometimes_get_.22Invalid_MEX-file_createFann.mexa64.2C_libfann.so.2:_cannot_open_shared_object_file:_No_such_file_or_directory..22]]

=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===

You forgot to build the SVM mex files for your platform. For Windows they are pre-compiled for you; on other systems you have to compile them yourself with the makefile.

=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===

First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].

This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.

If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:

# Typing <code>usejava jvm</code> should return 1.
# Typing <code>s = java.lang.String</code> should ''not'' give an error.
# Typing <code>version('-java')</code> should return at least version 1.5.0.

* If (1) returns 0, then the JVM of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?).
* If (2) fails but (1) is OK, there is a very weird problem; check the Matlab documentation.
* If (3) returns a version before 1.5.0, you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer JVM (see the Matlab docs for how to do this).

=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===

You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.

=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===

See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]

=== I get a warning about duplicate samples being selected, why is this? ===

Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.

=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===

There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].

# '''Sampling off'''
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does, [[reporting problems|report it as a bug]].
## ''Multi objective'': There is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi objective enabled modeling algorithm.
# '''Sampling on'''
## ''Single objective'': inside each modeling iteration the error should always monotonically decrease. At each sampling iteration the best models are updated (to reflect the new data), so at that point the best model score may increase; this is normal behavior(*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the datapoints is causing numerical problems).
## ''Multi objective'': Combination of 1.2 and 2.1.

(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!

=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===

This is normal, the warning printed out before the error explains why:

''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''

By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".
So note that there is nothing to worry about, everything has run correctly, it is just the movie creation that is failing.

=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===

This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox subdirectory! This is very important.
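
On Linux, a quick way to check the directory is sketched below. The function name is hypothetical; the directory name <code>output</code> is taken from the error message and is assumed to be relative to the toolbox root directory.

<source lang="bash">
# Sketch: check that a log directory exists and is writable by the current user.
check_output_dir() {
    local dir=${1:-output}
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    elif [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Run this from the toolbox root directory
check_output_dir output || echo "create the directory or fix its permissions"
</source>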

If you still have problems you can override the default logfile name and location as follows:

In the <FileHandler> tag inside the <Logging> tag add the following option:

<code>
<Option key="Pattern" value="My_SUMO_Log_file.log"/>
</code>

This means that from now on the SUMO log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].

=== The Toolbox crashes with "Too many open files" what should I do? ===

This is a known bug, see [[Known_bugs#Version_6.1]].

If this does not fix your problem then do the following:

On Windows, try increasing the limit in Windows as dictated by the error message. Also, when you get the error, use the <code>fopen('all')</code> command to see which files are open and send us the list of filenames. Then we may be able to help you debug the problem further. Even better would be to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, do not shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can further debug the problem.

On Linux again don't shut down Matlab but:

* open a new terminal window
* type:
<source lang="bash">
lsof > openFiles.txt
</source>
* Then [[Contact|send us]] the following information:
** the file openFiles.txt
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).
** the output of
<source lang="bash">
uname -a ; df -T ; mount
</source>

As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.
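
A minimal sketch of the temporary workaround for a single terminal session, using the standard <code>ulimit</code> shell builtin. It only raises the soft limit up to the existing hard limit; raising the hard limit itself requires administrator rights and is system specific.

<source lang="bash">
# Sketch: inspect and raise the per-process open-file limit for this shell.
echo "soft limit: $(ulimit -Sn)"
echo "hard limit: $(ulimit -Hn)"

# Raise the soft limit up to the hard limit. This only affects this shell and
# programs started from it (e.g., Matlab), not other sessions.
ulimit -Sn "$(ulimit -Hn)" 2>/dev/null || echo "could not raise the limit"
echo "soft limit now: $(ulimit -Sn)"
</source>

Start Matlab from the same terminal afterwards so that it inherits the new limit.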

In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.

=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===

The LS-SVMs have a C implementation and a Matlab implementation. If you do not have the compiled mex files, the toolbox will use the Matlab implementation and give a warning, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]]; this can be done very easily. Alternatively, simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.

=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===

You most likely forgot to [[Installation#Extension_pack|install the extension pack]].

=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===

Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts it means your operating system is still running fine, it's just that you have to log in again since your graphical session has terminated. This FAQ entry is only about the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.

So what happens is that after a few seconds when the toolbox wants to plot the first model [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - Display server interaction.

What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:

* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)

<source lang="bash">
/etc/X11/xorg.conf
</source>

Note: the exact location of the xorg.conf file may vary on your system.

* Look for the following line:

<source lang="bash">
Load "glx"
</source>

* Comment it out by replacing it by:

<source lang="bash">
# Load "glx"
</source>

* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not, please [[Reporting_problems| report it]].

Note:
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]
* if you wish to debug further yourself please check the Xorg log files and those in /var/log
* another possible workaround is to start matlab with the "-nodisplay" option. That could work as well.
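
If you want to inspect the Xorg log yourself, the following sketch lists its error lines. The path <code>/var/log/Xorg.0.log</code> is a common default and an assumption here; the location may differ on your system.

<source lang="bash">
# Sketch: list error lines from an Xorg log file.
# /var/log/Xorg.0.log is a common default location; yours may differ.
xorg_errors() {
    local logfile=${1:-/var/log/Xorg.0.log}
    # Error lines carry the "(EE)" marker, optionally after a "[ time ]" stamp
    grep -E '^(\[[^]]*\] )?\(EE\)' "$logfile"
}

xorg_errors 2>/dev/null || echo "no (EE) lines found, or log file not readable"
</source>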

=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===

This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.

=== The toolbox seems to keep on running forever, when or how will it stop? ===

The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].

Of course choosing a priori targets is not always easy and there is no real solution for this, except thinking well about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). When in doubt you can always use a small value (or 0) and then simply quit the running toolbox using Ctrl-C when you think it has run long enough.

While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.

Running

2014-03-13T09:18:22Z

Javdrher: /* Understanding the control flow */

== Getting started ==

If you are just getting started with the toolbox and you want to find out how everything works, this section should help you on your way.

* The '''features''' and scope of the SUMO Toolbox are detailed on this [[About#Intended_use|page]] where you can find out whether the SUMO Toolbox suits your needs. To find out more about the SUMO Toolbox in general, check out the documentation on this [[About#Documentation|page]].

* If you want to get hands-on with the SUMO Toolbox, we recommend using this [http://www.sumowiki.intec.ugent.be/images/7/7b/SUMO_hands_on.pdf guide]. The guide explains the basic SUMO framework, how to '''install''' the SUMO Toolbox on your computer and provides some '''examples''' on running the toolbox.

* Since the SUMO Toolbox is [[Configuration|configured]] by editing XML files it might be a good idea to read [[FAQ#What is XML?|this page]], if you are not familiar with XML files. You can also check out this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file| page]] which has more info on how the SUMO Toolbox uses XML.

* The '''installation''' information can also be found [[Installation|on this wiki]] and more information on running a different example with the SUMO Toolbox can be found [[Running#Running_different_examples|here]].

* The SUMO Toolbox also comes with a set of '''demos''' showing the different uses of the toolbox. You can find the configuration files for these demos in the 'config/demo' directory.

* We have also provided some [[General_guidelines|general modelling guidelines]] which you can use as a starting point to model your problems.

* Also be sure to check out the '''Frequently Asked Questions''' ([[FAQ|FAQ]]) page as it might answer some of your questions.

Finally, if you get stuck or have any problems [[Reporting problems|feel free to let us know]] and we will do our best to help you.

''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up to date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''

== Running different examples ==

=== Prerequisites ===
This section is about running a different example problem, if you want to model your own problem see [[Adding an example]]. Make sure you [[configuration|understand the difference between the simulator configuration file and the toolbox configuration file]] and understand how these configuration files are [[Toolbox configuration#Structure|structured]].

=== Changing the configuration xml ===
The <code>examples/</code> directory contains many example simulators that you can use to test the toolbox with. These examples range from predefined functions, to datasets from various domains, to native simulation code. If you want to try one of the examples, open <code>config/default.xml</code> and edit the [[Simulator| <Simulator>]] tag to suit your needs (for more information about editing the configuration xml, go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]]).

For example, originally the default '''configuration xml''' file, default.xml, contains:

<source lang="xml">
<Simulator>Math/Academic/Academic2DTwice.xml</Simulator>
</source>

The toolbox will look in the examples directory for a project directory called <code>Math/Academic</code> and load the '''simulator xml''' file named 'Academic2DTwice.xml'. If no simulator xml file name is specified, the SUMO Toolbox will load the simulator xml with the same name as the directory. For example <code>Math/Peaks</code> is equivalent to <code>Math/Peaks/Peaks.xml</code>.
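
The lookup rule described above can be illustrated with a small sketch. This is not the toolbox's actual code, and the function name is hypothetical; it only mimics the naming convention.

<source lang="bash">
# Sketch of the lookup rule described above (not the toolbox's actual code).
resolve_simulator() {
    local entry=$1
    case "$entry" in
        *.xml) echo "$entry" ;;                           # explicit simulator xml file
        *)     echo "$entry/$(basename "$entry").xml" ;;  # default: same name as the directory
    esac
}

resolve_simulator Math/Academic/Academic2DTwice.xml   # prints Math/Academic/Academic2DTwice.xml
resolve_simulator Math/Peaks                          # prints Math/Peaks/Peaks.xml
</source>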

Now let's say you want to run a different example problem, for example the 'Michalewicz' example. In this case you would replace the original Simulator tag with:

<source lang="xml">
<Simulator>Math/Michalewicz</Simulator>
</source>

In addition you would have to change the <code><Outputs></code> tag. The <code>Math/Academic/Academic2DTwice.xml</code> example has two outputs (''out'' and ''outinverse''). However, the Michalewicz example has only one (''out''). Thus telling the SUMO Toolbox to model the ''outinverse'' output in that case makes no sense since it does not exist for the 'Michalewicz' example. So the following output configuration suffices:

<source lang="xml">
<Outputs>
<Output name="out">
</Output>
</Outputs>
</source>

The rest of default.xml can be kept the same, then simply type in '<code>go</code>' in the SUMO root to run the example. If you do not specify any arguments, the SUMO Toolbox will use the settings in the default.xml file. If you wish to run a different configuration file use the following command '<code>go('pathToYourConfig/yourConfig.xml')</code>' where pathToYourConfig is the path to where your configuration XML-file is located, and yourConfig.xml is the name of your configuration XML-file.

As noted above, it is also possible to specify an absolute path or refer to a particular simulator xml file directly. For example:

<source lang="xml">
<Simulator>/path/to/your/project/directory</Simulator>
</source>

or:

<source lang="xml">
<Simulator>Ackley/Ackley2D.xml</Simulator>
</source>

=== Important notes ===

If you start changing default.xml to try out different examples, there are a number of important things you should be aware of.

==== Select matching Inputs and Outputs ====
Using the <code><Inputs></code> and <code><Outputs></code> tags in the SUMO-Toolbox configuration file you can tell the toolbox which outputs should be modeled and how. Note that these tags are optional. You can delete them and then the toolbox will simply model all available inputs and outputs. If you do specify a particular output, make sure it exists for the example you are running. For example, say you tell the toolbox to model output ''temperature'' of the simulator 'ChemistryProblem'. If you then change the configuration file to model 'BiologyProblem' you will have to change the name of the selected output (or input) since most likely 'BiologyProblem' will not have an output called ''temperature''.

Information on how to further customize the modelling of the outputs can be found [[Outputs|here]].

==== Select a matching SampleEvaluator ====
There is one important caveat. Some examples consist of a fixed data set, some are implemented as a Matlab function, others as a C++ executable, etc. When running a different example you have to tell the SUMO Toolbox how the example is implemented so the toolbox knows how to extract data (e.g., should it load a data file or should it call a Matlab function). This is done by specifying the correct [[Config:DataSource|DataSource]] tag. The default DataSource is:

<source lang="xml">
<DataSource>matlab</DataSource>
</source>

This means that the toolbox expects the example you want to run to be implemented as a Matlab function. Thus it is no use running an example that is implemented as a static dataset using the '[[Config:DataSource#matlab|matlab]]' or '[[Config:DataSource#local|local]]' sample evaluators. Doing this will result in an error. In this case you should use '[[Config:DataSource#scatteredDataset|scatteredDataset]]' (or sometimes [[Config:DataSource#griddedDataset|griddedDataset]]).

To see how an example is implemented, open the XML file inside the example directory and look at the <code><Implementation></code> tag. To see which DataSources are available see [[Config:DataSource]].

==== Select an appropriate model type ====
The choice of the model type which you use to model your problem has a great impact on the overall accuracy. If you switch to a different example you may also have to change the model type used. For example, if you are using a spline model (which only works in 2D) and you decide to model a problem with many dimensions (e.g., CompActive or BostonHousing) you will have to switch to a different model type (e.g., any of the SVM or LS-SVM model builders).

The <ModelBuilder> tag specifies which model type is used to model the problem. In most cases the 'ModelBuilder' also specifies an optimization algorithm to find the best 'hyperparameters' of the models. Hyperparameters are parameters which define the model, such as the order of a polynomial or the number of hidden nodes of an Artificial Neural Network. To see all the ModelBuilder options and what they do go to [[Config:ModelBuilder| this page]].

=== One-shot designs ===
If you want to use the toolbox to simply model all your data in one go, instead of using the default sequential approach, see [[Adaptive_Modeling_Mode]] for how to do this.

== Running different configuration files ==

If you just type "go" the SUMO-Toolbox will run using the configuration options in default.xml. However, you may want to make a copy of default.xml and play around with that, leaving your original default.xml intact. So the question is, how do you run that file? Let's say your copy is called MyConfigFile.xml. In order to tell SUMO to run that file you would type:

<source lang="matlab">
go('/path/to/MyConfigFile.xml')
</source>

The path can be an absolute path, or a path relative to the SUMO Toolbox root directory.
To see what other options you have when running go, type ''help go''.

'''Remember to always run go from the toolbox root directory.'''

=== Merging your configuration ===

If you know what you are doing, you can merge your own custom configuration with the default configuration by using the '-merge' option. Options or tags that are missing in this custom file will then be filled up with the values from the default configuration. This prevents you from having to duplicate tags in default.xml and creates xml files which are easier to manipulate. However, if you are unfamiliar with XML and not quite sure what you are doing we advise against using it.

=== Running optimization examples ===
The SUMO Toolbox can also be used for minimizing the simulator in an intelligent way. There are 2 examples included in <code>config/Optimization</code>. Running these examples works exactly as before, e.g. <code>go('config/optimization/Branin.xml')</code>. The only difference is in the sample selector which is specified in the configuration file itself.
<gallery>
Image:ISCSampleSelector2.png
</gallery>
The example configuration files are well documented; it is advised to go through them for more detailed information.

== Understanding the control flow ==

[[Image:sumo-control-flow.png|thumb|300px|right|The general SUMO-Toolbox control flow]]

When the toolbox is running you might wonder what exactly is going on. The high level control flow that the toolbox goes through is illustrated in the flow chart and explained in more detail below. You may also refer to the [[About#Presentation|general SUMO presentation]].

# Select samples according to the [[InitialDesign|initial design]] and execute the [[Simulator]] for each of the points
# Once enough points are available, start the [[Add_Model_Type#Models.2C_Model_builders.2C_and_Factories|Model builder]] which will start producing models as it optimizes the model parameters
## the number of models generated depends on the [[Config:ModelBuilder|ModelBuilder]] used. Usually the ModelBuilder tag contains a setting like ''maxFunEvals'' or ''popSize''. This tells the algorithm that is optimizing the model parameters (and thus generating models) how many models it should generate at most before stopping. By increasing this number you will generate more models in between sampling iterations and thus have a higher chance of getting a better model, at the cost of a longer computation time. This step is what we refer to as a ''modeling iteration''.
## optimization over the model parameters is driven by the [[Measures|Measure(s)]] that are enabled. Selection of the Measure is thus very important for the modeling process!
## each time the model builder generates a model that has a lower measure score than the previous best model, the toolbox will trigger a "New best model found" event, save the model, generate a plot, and trigger all the profilers to update themselves.
## by default, you therefore only see something happen when a new best model is found; you do not see all the other models that are being generated in the background. If you want to see those, you must increase the logging granularity (or just look in the log file) or [[FAQ#How_do_I_enable_more_profilers.3F|enable more profilers]].
# So the model builder will run until it has completed
# Then, if the current best model satisfies all the targets in the enabled Measures, it means we have reached the requirements and the toolbox terminates.
# If not, the [[Config:SequentialDesign|SequentialDesign]] selects a new set of samples (= a ''sampling iteration''), they are simulated, and the model building resumes or is restarted according to the configured restart strategy
# This whole loop continues (thus the toolbox will keep running) until one of the following conditions is true:
## the targets specified in the active measure tags have been reached (each Measure has a target value which you can set). Note, though, that when you are using multiple measures (see [[Multi-Objective Modeling]]) or single measures like AIC or LRM, it becomes difficult to set targets a priori since you cannot really interpret the scores (in contrast to the simple case of a single measure like CrossValidation, where your target is simply the error you require). In those cases you should usually set the targets to 0 and use one of the other criteria below to make sure the toolbox stops.
## the maximum running time has been reached (''maximumTime'' property in the [[Config:SUMO]] tag)
## the maximum number of samples has been reached (''maximumTotalSamples'' property in the [[Config:SUMO]] tag)
## the maximum number of modeling iterations has been reached (''maxModelingIterations'' property in the [[Config:SUMO]] tag)

Note that it is also possible to disable the sample selection loop, see [[Adaptive Modeling Mode]]. Also note that while you might think the toolbox is not doing anything, it is actually building models in the background (see above for how to see the details). The toolbox will only inform you (unless configured otherwise) if it finds a model that is better than the previous best model (using that particular measure!!). If not it will continue running until one of the stopping conditions is true.
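The three non-target stopping criteria listed above could be set together as in the sketch below. This is only an illustration: it assumes that properties of the SUMO tag are written as <code>Option</code> key/value elements, in the same way as the logging handlers on this page, and the values shown are arbitrary placeholders (the unit of ''maximumTime'' is not stated here). Check default.xml for the exact form.

<source lang="xml">
<SUMO>
   <!-- Assumed Option syntax; the values are illustrative placeholders only -->
   <Option key="maximumTime" value="60"/>
   <Option key="maximumTotalSamples" value="500"/>
   <Option key="maxModelingIterations" value="100"/>
   <!-- the remaining contents of the SUMO tag stay as they are in default.xml -->
</SUMO>
</source>

With the Measure targets set to 0, as suggested above for scores that are hard to interpret, whichever of these limits is reached first would end the run.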

== SUMO Toolbox output ==

All output is stored under the [[Config:ContextConfig#OutputDirectory|directory]] specified in the [[Config:ContextConfig]] section of the configuration file (by default this is set to "<code>output</code>").

Starting from version 6.0, the output directory is always relative to the project directory of your example, unless you specify an absolute path.

After completion of a SUMO Toolbox run, the following files and directories can be found there (e.g., in the <code>output/<run_name+date+time>/</code> subdirectory):

* <code>config.xml</code>: The xml file that was used by this run. Can be used to reproduce the entire modeling process for that run.
* <code>randstate.dat</code>: contains states of the random number generators, so that it becomes possible to deterministically repeat a run (see the [[Random state]] page).
* <code>samples.txt</code>: a list of all the samples that were evaluated, and their outputs.
* <code>profilers</code>-dir: contains information and plots about convergence rates, resource usage, and so on.
* <code>best</code>-dir: contains the best models (+ plots) of all outputs that were constructed during the run. This is continuously updated as the modeling progresses.
* <code>models_outputName</code>-dir: contains a history of all intermediate models (+ plots + movie) for each output that was modeled.

If you generated models [[Multi-Objective Modeling|multi-objectively]] you will also find the following directory:

* <code>paretoFronts</code>-dir: contains snapshots of the population during multi-objective optimization of the model parameters.

== Debugging ==

Remember to always check the log file first if problems occur!
When [[reporting problems]] please attach your log file and the xml configuration file you used.

To aid understanding and debugging, set the console and file logging level to FINE (or even FINER or FINEST): change the level of the ConsoleHandler tag to FINE, FINER or FINEST, and do the same for the FileHandler tag.

<source lang="xml">

<ConsoleHandler>
<Option key="Level" value="FINE"/>
</ConsoleHandler>
</source>
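
The matching change for the log file might look like the sketch below. It assumes that the FileHandler tag accepts the same <code>Level</code> option as the ConsoleHandler tag shown above; any other options already present in your FileHandler tag should be left untouched.

<source lang="xml">
<FileHandler>
   <!-- Assumed to mirror the ConsoleHandler option; FINER or FINEST give even more detail -->
   <Option key="Level" value="FINE"/>
</FileHandler>
</source>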

== Using models ==

Once you have generated a model, you might wonder what you can do with it. To see how to load, export, and use SUMO generated models see the [[Using a model]] page.

== Modelling complex outputs ==

The toolbox supports the modeling of complex valued data. If you do not specify any specific <[[Outputs|Output]]> tags, all outputs will be modeled with [[Outputs#Complex_handling|complexHandling]] set to '<code>complex</code>'. This means that a real output will be modeled as a real value, and a complex output will be modeled as a complex value (with a real and imaginary part). If you don't want this (i.e., you want to model the modulus of a complex output or you want to model real and imaginary parts separately), you explicitly have to set [[Outputs#Complex_handling|complexHandling]] to 'modulus', 'real', 'imaginary', or 'split'.

More information on this subject can be found at the [[Outputs#Complex_handling|Outputs]] page.

== Models with multiple outputs ==

If multiple [[Outputs]] are selected, by default the toolbox will model each output separately using a separate adaptive model builder object. So if you have a system with 3 outputs you will get three different models each with one output. However, sometimes you may want a single model with multiple outputs. For example instead of having a neural network for each component of a complex output (real/imaginary) you might prefer a single network with 2 outputs. To do this simply set the 'combineOutputs' attribute of the <AdaptiveModelBuilder> tag to 'true'. That means that each time that model builder is selected for an output, the same model builder object will be used instead of creating a new one.

Note though, that not all model types support multiple outputs. If they don't you will get an error message.

Also note that you can also generate models with multiple outputs in a multi-objective fashion. For information on this see the page on [[Multi-Objective Modeling]].

== Multi-Objective Model generation ==

See the page on [[Multi-Objective Modeling]].

== Interfacing with the SUMO Toolbox ==

To learn how to interface with the toolbox or model your own problem see the [[Adding an example]] and [[Interfacing with the toolbox]] pages.

== Test Suite ==

A test harness is provided that can be run manually or automatically as part of a cron job. The test suite consists of a number of test XML files (in the config/test/ directory), each describing a particular surrogate modeling experiment. The file config/test/suite.xml dictates which tests are run and their order. The suite.xml file also contains the accuracy and sample bounds that are checked after each test. If the final model found does not fall within the accuracy or number-of-samples bounds, the test is considered failed.

Note also that some of the predefined test cases may rely on data sets or simulation code that are not publicly available for confidentiality reasons. However, since these test problems typically make very good benchmark problems, we left them in for illustration purposes.

The coordinating class is the Matlab TestSuite class found in the src/matlab directory. Besides running the tests defined in suite.xml it also tests each of the model member functions.

Assuming the SUMO Toolbox is set up properly and the necessary libraries are compiled ([[Installation#Optional:_Compiling_libraries|see here]]), the test suite should be run as follows (from the SUMO root directory):

<source lang="matlab">
s = TestEngine('config/test/suite.xml') ; s.run()
</source>

The "run()" method also supports an optional parameter (a vector) that dictates which tests to run (e.g., run([2 5 3]) will run tests 2,5 and 3).

''Note that due to randomization the final accuracy and number of samples used may vary slightly from run to run (causing failed tests). Thus the bounds must be set sufficiently loose.''

== Tips ==

See the [[Tips]] page for various tips and gotchas.

Tips

2014-03-13T09:17:13Z

Javdrher:

* If you want to benchmark your computer for Matlab speed simply run "bench" in Matlab

* You can switch off adaptive sample selection if you do not specify a [[Config:SequentialDesign| <SequentialDesign>]] tag. See [[Adaptive Modeling Mode]].

* Remember that the Measure (and error function you use) strongly influence the quality and fit of the surrogate model. If you are unhappy with the final model, try a different [[Measures| Measure]] and/or [[FAQ#How_do_I_change_the_error_function_.28relative_error.2C_RMS.2C_....29.3F| error function]]. See also [[Multi-Objective Modeling]].

* If the toolbox is too slow for you, you can speed it up in different ways. See: [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]

* By default Matlab warnings are turned off. To turn them on, either edit configure.m or type 'warning on' before running 'go'

* '''A blog''' covering related research can be found here [http://sumolab.blogspot.com/ http://sumolab.blogspot.com]

Tips

2014-03-13T09:16:47Z

Javdrher:

* If you want to benchmark your computer for Matlab speed simply run "bench" in Matlab

* You can switch off adaptive sample selection if you do not specify a [[SequentialDesign| <SequentialDesign>]] tag. See [[Adaptive Modeling Mode]].

* Remember that the Measure (and error function you use) strongly influence the quality and fit of the surrogate model. If you are unhappy with the final model, try a different [[Measures| Measure]] and/or [[FAQ#How_do_I_change_the_error_function_.28relative_error.2C_RMS.2C_....29.3F| error function]]. See also [[Multi-Objective Modeling]].

* If the toolbox is too slow for you, you can speed it up in different ways. See: [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]

* By default Matlab warnings are turned off. To turn them on, either edit configure.m or type 'warning on' before running 'go'

* '''A blog''' covering related research can be found here [http://sumolab.blogspot.com/ http://sumolab.blogspot.com]

Running

2014-03-13T09:14:05Z

Javdrher: /* Select an appropriate model type */

== Getting started ==

If you are just getting started with the toolbox and you want to find out how everything works, this section should help you on your way.

* The '''features''' and scope of the SUMO Toolbox are detailed on this [[About#Intended_use|page]] where you can find out whether the SUMO Toolbox suits your needs. To find out more about the SUMO Toolbox in general, check out the documentation on this [[About#Documentation|page]].

* If you want to get hands-on with the SUMO Toolbox, we recommend using this [http://www.sumowiki.intec.ugent.be/images/7/7b/SUMO_hands_on.pdf guide]. The guide explains the basic SUMO framework, how to '''install''' the SUMO Toolbox on your computer and provides some '''examples''' on running the toolbox.

* Since the SUMO Toolbox is [[Configuration|configured]] by editing XML files it might be a good idea to read [[FAQ#What is XML?|this page]], if you are not familiar with XML files. You can also check out this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file| page]] which has more info on how the SUMO Toolbox uses XML.

* The '''installation''' information can also be found [[Installation|on this wiki]] and more information on running a different example with the SUMO Toolbox can be found [[Running#Running_different_examples|here]].

* The SUMO Toolbox also comes with a set of '''demos''' showing the different uses of the toolbox. You can find the configuration files for these demos in the 'config/demo' directory.

* We have also provided some [[General_guidelines|general modelling guidelines]] which you can use as a starting point to model your problems.

* Also be sure to check out the '''Frequently Asked Questions''' ([[FAQ|FAQ]]) page as it might answer some of your questions.

Finally, if you get stuck or have any problems, [[Reporting problems|feel free to let us know]] and we will do our best to help you.

''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up to date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''

== Running different examples ==

=== Prerequisites ===
This section is about running a different example problem; if you want to model your own problem, see [[Adding an example]]. Make sure you [[configuration|understand the difference between the simulator configuration file and the toolbox configuration file]] and understand how these configuration files are [[Toolbox configuration#Structure|structured]].

=== Changing the configuration xml ===
The <code>examples/</code> directory contains many example simulators that you can use to test the toolbox with. These examples range from predefined functions, to datasets from various domains, to native simulation code. If you want to try one of the examples, open <code>config/default.xml</code> and edit the [[Simulator| <Simulator>]] tag to suit your needs (for more information about editing the configuration xml, go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]]).

For example, originally the default '''configuration xml''' file, default.xml, contains:

<source lang="xml">
<Simulator>Math/Academic/Academic2DTwice.xml</Simulator>
</source>

The toolbox will look in the examples directory for a project directory called <code>Math/Academic</code> and load the '''simulator xml''' file named 'Academic2DTwice.xml'. If no simulator xml file name is specified, the SUMO Toolbox will load the simulator xml with the same name as the directory. For example, <code>Math/Peaks</code> is equivalent to <code>Math/Peaks/Peaks.xml</code>.

Now let's say you want to run one of the other example problems, for example the 'Michalewicz' example. In this case you would replace the original Simulator tag with:

<source lang="xml">
<Simulator>Math/Michalewicz</Simulator>
</source>

In addition you would have to change the <code><Outputs></code> tag. The <code>Math/Academic/Academic2DTwice.xml</code> example has two outputs (''out'' and ''outinverse''). However, the Michalewicz example has only one (''out''). Thus telling the SUMO Toolbox to model the ''outinverse'' output in that case makes no sense since it does not exist for the 'Michalewicz' example. So the following output configuration suffices:

<source lang="xml">
<Outputs>
<Output name="out">
</Output>
</Outputs>
</source>

The rest of default.xml can be kept the same. Then simply type '<code>go</code>' in the SUMO root to run the example. If you do not specify any arguments, the SUMO Toolbox will use the settings in the default.xml file. If you wish to run a different configuration file, use the command '<code>go('pathToYourConfig/yourConfig.xml')</code>', where pathToYourConfig is the path to the directory containing your configuration XML file and yourConfig.xml is the name of that file.

As noted above, it is also possible to specify an absolute path or refer to a particular simulator xml file directly. For example:

<source lang="xml">
<Simulator>/path/to/your/project/directory</Simulator>
</source>

or:

<source lang="xml">
<Simulator>Ackley/Ackley2D.xml</Simulator>
</source>

=== Important notes ===

If you start changing default.xml to try out different examples, there are a number of important things you should be aware of.

==== Select matching Inputs and Outputs ====
Using the <code><Inputs></code> and <code><Outputs></code> tags in the SUMO-Toolbox configuration file you can tell the toolbox which outputs should be modeled and how. Note that these tags are optional: if you delete them, the toolbox will simply model all available inputs and outputs. If you do specify a particular output, keep it in sync with the simulator. For example, say you tell the toolbox to model the output ''temperature'' of the simulator 'ChemistryProblem'. If you then change the configuration file to model 'BiologyProblem', you will have to change the name of the selected output (or input), since 'BiologyProblem' will most likely not have an output called ''temperature''.

Information on how to further customize the modelling of the outputs can be found [[Outputs|here]].

==== Select a matching SampleEvaluator ====
There is one important caveat. Some examples consist of a fixed data set, some are implemented as a Matlab function, others as a C++ executable, etc. When running a different example you have to tell the SUMO Toolbox how the example is implemented so the toolbox knows how to extract data (eg: should it load a data file or should it call a Matlab function). This is done by specifying the correct [[Config:DataSource|DataSource]] tag. The default DataSource is:

<source lang="xml">
<DataSource>matlab</DataSource>
</source>

This means that the toolbox expects the example you want to run to be implemented as a Matlab function. It is therefore no use running an example that is implemented as a static dataset using the '[[Config:DataSource#matlab|matlab]]' or '[[Config:DataSource#local|local]]' sample evaluators; doing so will result in an error. In this case you should use '[[Config:DataSource#scatteredDataset|scatteredDataset]]' (or sometimes [[Config:DataSource#griddedDataset|griddedDataset]]).

To see how an example is implemented, open the XML file inside the example directory and look at the <code><Implementation></code> tag. To see which DataSources are available, see [[Config:DataSource]].
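
For an example that ships as a static, scattered data set, the switch described above could look like the sketch below. It assumes the tag replaces the default <code>matlab</code> DataSource in the same place in the configuration file and needs no further options, which may not hold for every data set; see [[Config:DataSource]] for the options each DataSource actually accepts.

<source lang="xml">
<!-- Replaces the default <DataSource>matlab</DataSource> for dataset-based examples -->
<DataSource>scatteredDataset</DataSource>
</source>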

=== Select an appropriate model type ===
The choice of the model type which you use to model your problem has a great impact on the overall accuracy. If you switch to a different example you may also have to change the model type used. For example, if you are using a spline model (which only works in 2D) and you decide to model a problem with many dimensions (e.g., CompActive or BostonHousing) you will have to switch to a different model type (e.g., any of the SVM or LS-SVM model builders).

The <ModelBuilder> tag specifies which model type is used to model the problem. In most cases the 'ModelBuilder' also specifies an optimization algorithm to find the best 'hyperparameters' of the models. Hyperparameters are parameters which define the model, such as the order of a polynomial or the number of hidden nodes of an Artificial Neural Network. To see all the ModelBuilder options and what they do, go to [[Config:ModelBuilder| this page]].

=== One-shot designs ===
If you want to use the toolbox to simply model all your data without using the default sequential approach, see [[Adaptive_Modeling_Mode]] for how to do this.

== Running different configuration files ==

If you just type "go" the SUMO-Toolbox will run using the configuration options in default.xml. However, you may want to make a copy of default.xml and play around with that, leaving your original default.xml intact. So the question is, how do you run that file? Let's say your copy is called MyConfigFile.xml. In order to tell SUMO to run that file you would type:

<source lang="matlab">
go('/path/to/MyConfigFile.xml')
</source>

The path can be an absolute path, or a path relative to the SUMO Toolbox root directory.
To see what other options you have when running go type ''help go''.

'''Remember to always run go from the toolbox root directory.'''

=== Merging your configuration ===

If you know what you are doing, you can merge your own custom configuration with the default configuration by using the '-merge' option. Options or tags that are missing from the custom file are then filled in with the values from the default configuration. This saves you from duplicating tags from default.xml and yields xml files that are easier to manipulate. However, if you are unfamiliar with XML and not quite sure what you are doing, we advise against using it.

=== Running optimization examples ===
The SUMO Toolbox can also be used to minimize the simulator output in an intelligent way. Two examples are included in <code>config/Optimization</code>. They are run in exactly the same way as any other configuration, e.g. <code>go('config/optimization/Branin.xml')</code>. The only difference is the sample selector, which is specified in the configuration file itself.
<gallery>
Image:ISCSampleSelector2.png
</gallery>
The example configuration files are well documented; going through them is advised for more detailed information.

== Understanding the control flow ==

[[Image:sumo-control-flow.png|thumb|300px|right|The general SUMO-Toolbox control flow]]

When the toolbox is running you might wonder what exactly is going on. The high level control flow that the toolbox goes through is illustrated in the flow chart and explained in more detail below. You may also refer to the [[About#Presentation|general SUMO presentation]].

# Select samples according to the [[InitialDesign|initial design]] and execute the [[Simulator]] for each of the points
# Once enough points are available, start the [[Add_Model_Type#Models.2C_Model_builders.2C_and_Factories|Model builder]] which will start producing models as it optimizes the model parameters
## the number of models generated depends on the [[Config:ModelBuilder|ModelBuilder]] used. Usually the ModelBuilder tag contains a setting like ''maxFunEvals'' or ''popSize''. This indicates to the algorithm that is optimizing the model parameters (and thus generating models) how many models it should maximally generate before stopping. By increasing this number you will generate more models in between sampling iterations, thus have a higher chance of getting a better model, but increasing the computation time. This step is what we refer to as a ''modeling iteration''.
## optimization over the model parameters is driven by the [[Measures|Measure(s)]] that are enabled. Selection of the Measure is thus very important for the modeling process!
## each time the model builder generates a model that has a lower measure score than the previous best model, the toolbox will trigger a "New best model found" event, save the model, generate a plot, and trigger all the profilers to update themselves.
## by default, you therefore only see something happen when a new best model is found; you do not see all the other models that are being generated in the background. If you want to see those, you must increase the logging granularity (or just look in the log file) or [[FAQ#How_do_I_enable_more_profilers.3F|enable more profilers]].
# So the model builder will run until it has completed
# Then, if the current best model satisfies all the targets in the enabled Measures, it means we have reached the requirements and the toolbox terminates.
# If not, the [[SequentialDesign]] selects a new set of samples (= a ''sampling iteration''), they are simulated, and the model building resumes or is restarted according to the configured restart strategy
# This whole loop continues (thus the toolbox will keep running) until one of the following conditions is true:
## the targets specified in the active measure tags have been reached (each Measure has a target value which you can set). Note, though, that when you are using multiple measures (see [[Multi-Objective Modeling]]) or single measures like AIC or LRM, it becomes difficult to set targets a priori since you cannot really interpret the scores (in contrast to the simple case of a single measure like CrossValidation, where your target is simply the error you require). In those cases you should usually set the targets to 0 and use one of the other criteria below to make sure the toolbox stops.
## the maximum running time has been reached (''maximumTime'' property in the [[Config:SUMO]] tag)
## the maximum number of samples has been reached (''maximumTotalSamples'' property in the [[Config:SUMO]] tag)
## the maximum number of modeling iterations has been reached (''maxModelingIterations'' property in the [[Config:SUMO]] tag)

Note that it is also possible to disable the sample selection loop, see [[Adaptive Modeling Mode]]. Also note that while you might think the toolbox is not doing anything, it is actually building models in the background (see above for how to see the details). The toolbox will only inform you (unless configured otherwise) if it finds a model that is better than the previous best model (using that particular measure!!). If not it will continue running until one of the stopping conditions is true.

== SUMO Toolbox output ==

All output is stored under the [[Config:ContextConfig#OutputDirectory|directory]] specified in the [[Config:ContextConfig]] section of the configuration file (by default this is set to "<code>output</code>").

Starting from version 6.0, the output directory is always relative to the project directory of your example, unless you specify an absolute path.

After completion of a SUMO Toolbox run, the following files and directories can be found there (e.g., in the <code>output/<run_name+date+time>/</code> subdirectory):

* <code>config.xml</code>: The xml file that was used by this run. Can be used to reproduce the entire modeling process for that run.
* <code>randstate.dat</code>: contains states of the random number generators, so that it becomes possible to deterministically repeat a run (see the [[Random state]] page).
* <code>samples.txt</code>: a list of all the samples that were evaluated, and their outputs.
* <code>profilers</code>-dir: contains information and plots about convergence rates, resource usage, and so on.
* <code>best</code>-dir: contains the best models (+ plots) of all outputs that were constructed during the run. This is continuously updated as the modeling progresses.
* <code>models_outputName</code>-dir: contains a history of all intermediate models (+ plots + movie) for each output that was modeled.

If you generated models [[Multi-Objective Modeling|multi-objectively]] you will also find the following directory:

* <code>paretoFronts</code>-dir: contains snapshots of the population during multi-objective optimization of the model parameters.

== Debugging ==

Remember to always check the log file first if problems occur!
When [[reporting problems]] please attach your log file and the xml configuration file you used.

To aid understanding and debugging, set the console and file logging level to FINE (or even FINER or FINEST): change the level of the ConsoleHandler tag to FINE, FINER or FINEST, and do the same for the FileHandler tag.

<source lang="xml">

<ConsoleHandler>
<Option key="Level" value="FINE"/>
</ConsoleHandler>
</source>

== Using models ==

Once you have generated a model, you might wonder what you can do with it. To see how to load, export, and use SUMO generated models see the [[Using a model]] page.

== Modelling complex outputs ==

The toolbox supports the modeling of complex valued data. If you do not specify any specific <[[Outputs|Output]]> tags, all outputs will be modeled with [[Outputs#Complex_handling|complexHandling]] set to '<code>complex</code>'. This means that a real output will be modeled as a real value, and a complex output will be modeled as a complex value (with a real and imaginary part). If you don't want this (i.e., you want to model the modulus of a complex output or you want to model real and imaginary parts separately), you explicitly have to set [[Outputs#Complex_handling|complexHandling]] to 'modulus', 'real', 'imaginary', or 'split'.

More information on this subject can be found at the [[Outputs#Complex_handling|Outputs]] page.
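
As an illustration, modeling only the modulus of a complex output might be configured as in the sketch below. It assumes that <code>complexHandling</code> is written as an attribute of the <code><Output></code> tag and that the output is named ''out'', as in the examples elsewhere on this page; the [[Outputs#Complex_handling|Outputs]] page is the authoritative reference for the exact syntax.

<source lang="xml">
<Outputs>
   <!-- Assumed attribute placement; 'real', 'imaginary' or 'split' would be set the same way -->
   <Output name="out" complexHandling="modulus">
   </Output>
</Outputs>
</source>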

== Models with multiple outputs ==

If multiple [[Outputs]] are selected, by default the toolbox will model each output separately using a separate adaptive model builder object. So if you have a system with 3 outputs you will get three different models each with one output. However, sometimes you may want a single model with multiple outputs. For example instead of having a neural network for each component of a complex output (real/imaginary) you might prefer a single network with 2 outputs. To do this simply set the 'combineOutputs' attribute of the <AdaptiveModelBuilder> tag to 'true'. That means that each time that model builder is selected for an output, the same model builder object will be used instead of creating a new one.

Note though, that not all model types support multiple outputs. If they don't you will get an error message.

Also note that you can also generate models with multiple outputs in a multi-objective fashion. For information on this see the page on [[Multi-Objective Modeling]].
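
A minimal sketch of the <code>combineOutputs</code> setting described above is given below. Only the attribute itself is taken from this page; any other attributes and the contents of the tag are omitted here and should stay exactly as they are in your own configuration file.

<source lang="xml">
<AdaptiveModelBuilder combineOutputs="true">
   <!-- existing contents of your AdaptiveModelBuilder tag go here, unchanged -->
</AdaptiveModelBuilder>
</source>

If the chosen model type does not support multiple outputs, expect the error message mentioned above rather than a combined model.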

== Multi-Objective Model generation ==

See the page on [[Multi-Objective Modeling]].

== Interfacing with the SUMO Toolbox ==

To learn how to interface with the toolbox or model your own problem see the [[Adding an example]] and [[Interfacing with the toolbox]] pages.

== Test Suite ==

A test harness is provided that can be run manually or automatically as part of a cron job. The test suite consists of a number of test XML files (in the config/test/ directory), each describing a particular surrogate modeling experiment. The file config/test/suite.xml dictates which tests are run and their order. The suite.xml file also contains the accuracy and sample bounds that are checked after each test. If the final model found does not fall within the accuracy or number-of-samples bounds, the test is considered failed.

Note also that some of the predefined test cases may rely on data sets or simulation code that are not publicly available for confidentiality reasons. However, since these test problems typically make very good benchmark problems, we left them in for illustration purposes.

The coordinating class is the Matlab TestSuite class found in the src/matlab directory. Besides running the tests defined in suite.xml it also tests each of the model member functions.

Assuming the SUMO Toolbox is set up properly and the necessary libraries are compiled ([[Installation#Optional:_Compiling_libraries|see here]]), the test suite should be run as follows (from the SUMO root directory):

<source lang="matlab">
s = TestEngine('config/test/suite.xml') ; s.run()
</source>

The "run()" method also supports an optional parameter (a vector) that dictates which tests to run (e.g., run([2 5 3]) will run tests 2,5 and 3).

''Note that due to randomization the final accuracy and number of samples used may vary slightly from run to run (causing failed tests). Thus the bounds must be set sufficiently loose.''

== Tips ==

See the [[Tips]] page for various tips and gotchas.

Running

2014-03-13T09:13:24Z

Javdrher: /* Select a matching SampleEvaluator */

== Getting started ==

If you are just getting started with the toolbox and you want to find out how everything works, this section should help you on your way.

* The '''features''' and scope of the SUMO Toolbox are detailed on this [[About#Intended_use|page]] where you can find out whether the SUMO Toolbox suits your needs. To find out more about the SUMO Toolbox in general, check out the documentation on this [[About#Documentation|page]].

* If you want to get hands-on with the SUMO Toolbox, we recommend using this [http://www.sumowiki.intec.ugent.be/images/7/7b/SUMO_hands_on.pdf guide]. The guide explains the basic SUMO framework, how to '''install''' the SUMO Toolbox on your computer and provides some '''examples''' on running the toolbox.

* Since the SUMO Toolbox is [[Configuration|configured]] by editing XML files it might be a good idea to read [[FAQ#What is XML?|this page]], if you are not familiar with XML files. You can also check out this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file| page]] which has more info on how the SUMO Toolbox uses XML.

* The '''installation''' information can also be found [[Installation|on this wiki]] and more information on running a different example with the SUMO Toolbox can be found [[Running#Running_different_examples|here]].

* The SUMO Toolbox also comes with a set of '''demos''' showing the different uses of the toolbox. You can find the configuration files for these demos in the 'config/demo' directory.

* We have also provided some [[General_guidelines|general modelling guidelines]] which you can use as a starting point to model your problems.

* Also be sure to check out the '''Frequently Asked Questions''' ([[FAQ|FAQ]]) page as it might answer some of your questions.

Finally, if you get stuck or have any problems, [[Reporting problems|feel free to let us know]] and we will do our best to help you.

''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up to date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''

== Running different examples ==

=== Prerequisites ===
This section is about running a different example problem. If you want to model your own problem, see [[Adding an example]]. Make sure you [[configuration|understand the difference between the simulator configuration file and the toolbox configuration file]] and understand how these configuration files are [[Toolbox configuration#Structure|structured]].

=== Changing the configuration xml ===
The <code>examples/</code> directory contains many example simulators that you can use to test the toolbox with. These examples range from predefined functions, to datasets from various domains, to native simulation code. If you want to try one of the examples, open <code>config/default.xml</code> and edit the [[Simulator| <Simulator>]] tag to suit your needs (for more information about editing the configuration xml, go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]]).

For example, originally the default '''configuration xml''' file, default.xml, contains:

<source lang="xml">
<Simulator>Math/Academic/Academic2DTwice.xml</Simulator>
</source>

The toolbox will look in the examples directory for a project directory called <code>Math/Academic</code> and load the '''simulator xml''' file named 'Academic2DTwice.xml'. If no simulator xml file name is specified, the SUMO Toolbox will load the simulator xml with the same name as the directory. For example, <code>Math/Peaks</code> is equivalent to <code>Math/Peaks/Peaks.xml</code>.

Now let's say you want to run a different example problem, for example the 'Michalewicz' example. In this case you would replace the original Simulator tag with:

<source lang="xml">
<Simulator>Math/Michalewicz</Simulator>
</source>

In addition you would have to change the <code><Outputs></code> tag. The <code>Math/Academic/Academic2DTwice.xml</code> example has two outputs (''out'' and ''outinverse''). However, the Michalewicz example has only one (''out''). Thus telling the SUMO Toolbox to model the ''outinverse'' output in that case makes no sense since it does not exist for the 'Michalewicz' example. So the following output configuration suffices:

<source lang="xml">
<Outputs>
  <Output name="out">
  </Output>
</Outputs>
</source>

The rest of default.xml can be kept the same. Then simply type '<code>go</code>' in the SUMO root directory to run the example. If you do not specify any arguments, the SUMO Toolbox will use the settings in the default.xml file. If you wish to run a different configuration file, use the command <code>go('pathToYourConfig/yourConfig.xml')</code>, where pathToYourConfig is the path to the directory in which your configuration XML file is located and yourConfig.xml is the name of that file.

As noted above, it is also possible to specify an absolute path or refer to a particular simulator xml file directly. For example:

<source lang="xml">
<Simulator>/path/to/your/project/directory</Simulator>
</source>

or:

<source lang="xml">
<Simulator>Ackley/Ackley2D.xml</Simulator>
</source>

=== Important notes ===

If you start changing default.xml to try out different examples, there are a number of important things you should be aware of.

==== Select matching Inputs and Outputs ====
Using the <code><Inputs></code> and <code><Outputs></code> tags in the SUMO Toolbox configuration file you can tell the toolbox which inputs and outputs should be modeled and how. Note that these tags are optional: if you delete them, the toolbox will simply model all available inputs and outputs. If you do specify a particular output, it must exist in the selected simulator. For example, say you tell the toolbox to model the output ''temperature'' of the simulator 'ChemistryProblem'. If you then change the configuration file to model 'BiologyProblem', you will have to change the name of the selected output (or input), since 'BiologyProblem' most likely will not have an output called ''temperature''.

Information on how to further customize the modelling of the outputs can be found [[Outputs|here]].

==== Select a matching SampleEvaluator ====
There is one important caveat. Some examples consist of a fixed data set, some are implemented as a Matlab function, others as a C++ executable, etc. When running a different example you have to tell the SUMO Toolbox how the example is implemented so the toolbox knows how to extract data (eg: should it load a data file or should it call a Matlab function). This is done by specifying the correct [[Config:DataSource|DataSource]] tag. The default DataSource is:

<source lang="xml">
<DataSource>matlab</DataSource>
</source>

This means that the toolbox expects the example you want to run to be implemented as a Matlab function. It is therefore no use running an example that is implemented as a static dataset with the '[[Config:DataSource#matlab|matlab]]' or '[[Config:DataSource#local|local]]' sample evaluators; doing so will result in an error. In that case you should use '[[Config:DataSource#scatteredDataset|scatteredDataset]]' (or sometimes '[[Config:DataSource#griddedDataset|griddedDataset]]').

To see how an example is implemented open the XML file inside the example directory and look at the <source lang="xml"><Implementation></source> tag. To see which DataSources are available see [[Config:DataSource]].
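
As a minimal sketch of the switch described above: if the example you selected is a static, scattered data set, the DataSource tag in your toolbox configuration would name the dataset evaluator instead of the default. Only the tag name and the two values come from this page; any further options the evaluator may need are documented on [[Config:DataSource]] and are not shown here.

<source lang="xml">
<!-- default: the example is evaluated by calling a Matlab function -->
<!-- <DataSource>matlab</DataSource> -->

<!-- static data set: read the samples from the data file instead -->
<DataSource>scatteredDataset</DataSource>
</source>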

=== Select an appropriate model type ===
The choice of the model type which you use to model your problem has a great impact on the overall accuracy. If you switch to a different example you may also have to change the model type used. For example, if you are using a spline model (which only works in 2D) and you decide to model a problem with many dimensions (e.g., CompActive or BostonHousing) you will have to switch to a different model type (e.g., any of the SVM or LS-SVM model builders).

The <AdaptiveModelBuilder> tag specifies which model type is used to model the problem. In most cases the 'AdaptiveModelBuilder' also specifies an optimization algorithm to find the best 'hyperparameters' of the models. Hyperparameters are parameters which define the model, such as the order of a polynomial or the number of hidden nodes of an Artificial Neural Network. To see all the AdaptiveModelBuilder options and what they do, go to [[Config:AdaptiveModelBuilder| this page]].

=== One-shot designs ===
If you want to use the toolbox to simply model all your data at once instead of using the default sequential approach, see [[Adaptive_Modeling_Mode]] for how to do this.

== Running different configuration files ==

If you just type "go", the SUMO Toolbox will run using the configuration options in default.xml. However, you may want to make a copy of default.xml and play around with that, leaving your original default.xml intact. So the question is, how do you run that file? Let's say your copy is called MyConfigFile.xml. In order to tell SUMO to run that file you would type:

<source lang="matlab">
go('/path/to/MyConfigFile.xml')
</source>

The path can be an absolute path, or a path relative to the SUMO Toolbox root directory.
To see what other options you have when running go type ''help go''.

'''Remember to always run go from the toolbox root directory.'''

=== Merging your configuration ===

If you know what you are doing, you can merge your own custom configuration with the default configuration by using the '-merge' option. Options or tags that are missing in this custom file will then be filled up with the values from the default configuration. This prevents you from having to duplicate tags in default.xml and creates xml files which are easier to manipulate. However, if you are unfamiliar with XML and not quite sure what you are doing we advise against using it.

=== Running optimization examples ===
The SUMO Toolbox can also be used for minimizing the simulator in an intelligent way. There are 2 examples included in <code>config/Optimization</code>. These examples are run in exactly the same way as always, e.g. <code>go('config/optimization/Branin.xml')</code>. The only difference is in the sample selector, which is specified in the configuration file itself.
<gallery>
Image:ISCSampleSelector2.png
</gallery>
The example configuration files are well documented; it is advised to go through them for more detailed information.

== Understanding the control flow ==

[[Image:sumo-control-flow.png|thumb|300px|right|The general SUMO-Toolbox control flow]]

When the toolbox is running you might wonder what exactly is going on. The high level control flow that the toolbox goes through is illustrated in the flow chart and explained in more detail below. You may also refer to the [[About#Presentation|general SUMO presentation]].

# Select samples according to the [[InitialDesign|initial design]] and execute the [[Simulator]] for each of the points
# Once enough points are available, start the [[Add_Model_Type#Models.2C_Model_builders.2C_and_Factories|Model builder]] which will start producing models as it optimizes the model parameters
## the number of models generated depends on the [[Config:ModelBuilder|ModelBuilder]] used. Usually the ModelBuilder tag contains a setting like ''maxFunEvals'' or ''popSize''. This indicates to the algorithm that is optimizing the model parameters (and thus generating models) how many models it should maximally generate before stopping. By increasing this number you will generate more models in between sampling iterations, thus have a higher chance of getting a better model, but increasing the computation time. This step is what we refer to as a ''modeling iteration''.
## optimization over the model parameters is driven by the [[Measures|Measure(s)]] that are enabled. Selection of the Measure is thus very important for the modeling process!
## each time the model builder generates a model that has a lower measure score than the previous best model, the toolbox will trigger a "New best model found" event, save the model, generate a plot, and trigger all the profilers to update themselves.
## so note that by default, you only see something happen when a new best model is found, you do not see all the other models that are being generated in the background. If you want to see those, you must increase the logging granularity (or just look in the log file) or [[FAQ#How_do_I_enable_more_profilers.3F|enable more profilers]].
# So the model builder will run until it has completed
# Then, if the current best model satisfies all the targets in the enabled Measures, it means we have reached the requirements and the toolbox terminates.
# If not, the [[SequentialDesign]] selects a new set of samples (= a ''sampling iteration''), they are simulated, and the model building resumes or is restarted according to the configured restart strategy
# This whole loop continues (thus the toolbox will keep running) until one of the following conditions is true:
## the targets specified in the active measure tags have been reached (each Measure has a target value which you can set). Note though, that when you are using multiple measures (see [[Multi-Objective Modeling]]) or when using single measures like AIC or LRM, it becomes difficult to set a priori targets since you cannot really interpret the scores (in contrast to the simple case with a single measure like CrossValidation where your target is simply the error you require). In those cases you should usually set the targets to 0 and use one of the other criteria below to make sure the toolbox stops.
## the maximum running time has been reached (''maximumTime'' property in the [[Config:SUMO]] tag)
## the maximum number of samples has been reached (''maximumTotalSamples'' property in the [[Config:SUMO]] tag)
## the maximum number of modeling iterations has been reached (''maxModelingIterations'' property in the [[Config:SUMO]] tag)
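
The three limits above might be set as follows. This is only a sketch: the property names are the ones listed above, but the <code><Option key="..." value="..."/></code> form is an assumption borrowed from the logging example in the Debugging section, and the values are placeholders. Check [[Config:SUMO]] and default.xml for the actual syntax, units and defaults.

<source lang="xml">
<SUMO>
  <!-- placeholder values; the Option syntax is assumed, see Config:SUMO -->
  <Option key="maximumTime" value="60"/>
  <Option key="maximumTotalSamples" value="500"/>
  <Option key="maxModelingIterations" value="100"/>
</SUMO>
</source>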

Note that it is also possible to disable the sample selection loop, see [[Adaptive Modeling Mode]]. Also note that while you might think the toolbox is not doing anything, it is actually building models in the background (see above for how to see the details). The toolbox will only inform you (unless configured otherwise) if it finds a model that is better than the previous best model (using that particular measure!!). If not it will continue running until one of the stopping conditions is true.
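
To make the loop more concrete, here is a self-contained toy script in plain Matlab. It is not the toolbox's implementation and uses no SUMO classes: the "simulator" is an anonymous function, the "model builder" tries polynomial orders with <code>polyfit</code>, the "measure" is the maximum error on a dense grid, and new samples are drawn at random. All names and values are illustrative assumptions.

<source lang="matlab">
% Toy illustration of the sampling/modeling loop (NOT SUMO code).
f          = @(x) sin(3*x);       % stand-in simulator
target     = 1e-3;                % stand-in measure target
maxSamples = 50;                  % stand-in for maximumTotalSamples

x  = linspace(-1, 1, 5)';         % initial design
y  = f(x);                        % simulate the initial samples
xv = linspace(-1, 1, 201)';       % dense grid used by the stand-in measure
best = Inf;                       % score of the best model so far

while true
    % modeling iteration: search over the hyperparameter (polynomial order)
    for order = 1:min(numel(x) - 1, 8)
        p   = polyfit(x, y, order);
        err = max(abs(polyval(p, xv) - f(xv)));
        if err < best
            best = err;
            fprintf('New best model found: order %d, error %g\n', order, err);
        end
    end

    % stop when the target is reached or the sample budget is used up
    if best <= target || numel(x) >= maxSamples
        break;
    end

    % sampling iteration: select new samples and simulate them
    xnew = 2*rand(3, 1) - 1;
    x = [x; xnew];
    y = [y; f(xnew)];
end
</source>

Note how output only appears when a better model is found; between such events the loop is still fitting models, which mirrors the behaviour described above.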

== SUMO Toolbox output ==

All output is stored under the [[Config:ContextConfig#OutputDirectory|directory]] specified in the [[Config:ContextConfig]] section of the configuration file (by default this is set to "<code>output</code>").

Starting from version 6.0, the output directory is always relative to the project directory of your example, unless you specify an absolute path.

After completion of a SUMO Toolbox run, the following files and directories can be found there (e.g. : in <code>output/<run_name+date+time>/</code> subdirectory) :

* <code>config.xml</code>: The xml file that was used by this run. Can be used to reproduce the entire modeling process for that run.
* <code>randstate.dat</code>: contains states of the random number generators, so that it becomes possible to deterministically repeat a run (see the [[Random state]] page).
* <code>samples.txt</code>: a list of all the samples that were evaluated, and their outputs.
* <code>profilers</code>-dir: contains information and plots about convergence rates, resource usage, and so on.
* <code>best</code>-dir: contains the best models (+ plots) of all outputs that were constructed during the run. This is continuously updated as the modeling progresses.
* <code>models_outputName</code>-dir: contains a history of all intermediate models (+ plots + movie) for each output that was modeled.

If you generated models [[Multi-Objective Modeling|multi-objectively]] you will also find the following directory:

* <code>paretoFronts</code>-dir: contains snapshots of the population during multi-objective optimization of the model parameters.

== Debugging ==

Remember to always check the log file first if problems occur!
When [[reporting problems]] please attach your log file and the xml configuration file you used.

To aid understanding and debugging you should set the console and file logging level to FINE (or even FINER, FINEST)
as follows:

Change the level of the ConsoleHandler tag to FINE, FINER or FINEST. Do the same for the FileHandler tag.

<source lang="xml">

<ConsoleHandler>
<Option key="Level" value="FINE"/>
</ConsoleHandler>
</source>
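
Following the instruction above to do the same for the FileHandler tag, the corresponding fragment would presumably look like this (a sketch that mirrors the ConsoleHandler example; any other options already present inside your FileHandler tag should be left as they are):

<source lang="xml">
<FileHandler>
  <!-- same key/value pair as for the ConsoleHandler -->
  <Option key="Level" value="FINE"/>
</FileHandler>
</source>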

== Using models ==

Once you have generated a model, you might wonder what you can do with it. To see how to load, export, and use SUMO generated models see the [[Using a model]] page.

== Modelling complex outputs ==

The toolbox supports the modeling of complex valued data. If you do not specify any specific <[[Outputs|Output]]> tags, all outputs will be modeled with [[Outputs#Complex_handling|complexHandling]] set to '<code>complex</code>'. This means that a real output will be modeled as a real value, and a complex output will be modeled as a complex value (with a real and imaginary part). If you don't want this (i.e., you want to model the modulus of a complex output or you want to model real and imaginary parts separately), you explicitly have to set [[Outputs#Complex_handling|complexHandling]] to 'modulus', 'real', 'imaginary', or 'split'.
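
For illustration, modeling only the modulus of a complex output might look like the fragment below. The output name ''out'' is hypothetical, and writing complexHandling as an attribute of the Output tag is an assumption; the [[Outputs#Complex_handling|Outputs]] page has the authoritative syntax.

<source lang="xml">
<Outputs>
  <!-- hypothetical output name; attribute placement is assumed -->
  <Output name="out" complexHandling="modulus"/>
</Outputs>
</source>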

More information on this subject can be found at the [[Outputs#Complex_handling|Outputs]] page.

== Models with multiple outputs ==

If multiple [[Outputs]] are selected, by default the toolbox will model each output separately using a separate adaptive model builder object. So if you have a system with 3 outputs you will get three different models each with one output. However, sometimes you may want a single model with multiple outputs. For example instead of having a neural network for each component of a complex output (real/imaginary) you might prefer a single network with 2 outputs. To do this simply set the 'combineOutputs' attribute of the <AdaptiveModelBuilder> tag to 'true'. That means that each time that model builder is selected for an output, the same model builder object will be used instead of creating a new one.

Note though, that not all model types support multiple outputs. If they don't you will get an error message.
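
A sketch of the attribute in place is shown below. Only the tag name and the 'combineOutputs' attribute are taken from the text above; the contents of the tag are whatever your chosen model builder already specifies and are deliberately left out.

<source lang="xml">
<AdaptiveModelBuilder combineOutputs="true">
  <!-- existing model builder settings stay unchanged here -->
</AdaptiveModelBuilder>
</source>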

Also note that you can also generate models with multiple outputs in a multi-objective fashion. For information on this see the page on [[Multi-Objective Modeling]].

== Multi-Objective Model generation ==

See the page on [[Multi-Objective Modeling]].

== Interfacing with the SUMO Toolbox ==

To learn how to interface with the toolbox or model your own problem see the [[Adding an example]] and [[Interfacing with the toolbox]] pages.

== Test Suite ==

A test harness is provided that can be run manually or automatically as part of a cron job. The test suite consists of a number of test XML files (in the config/test/ directory), each describing a particular surrogate modeling experiment. The file config/test/suite.xml dictates which tests are run and their order. The suite.xml file also contains the accuracy and sample bounds that are checked after each test. If the final model found does not fall within the accuracy or number-of-samples bounds, the test is considered failed.

Note also that some of the predefined test cases may rely on data sets or simulation code that are not publicly available for confidentiality reasons. However, since these test problems typically make very good benchmark problems, we left them in for illustration purposes.

The coordinating class is the Matlab TestSuite class found in the src/matlab directory. Besides running the tests defined in suite.xml it also tests each of the model member functions.

Assuming the SUMO Toolbox is set up properly and the necessary libraries are compiled ([[Installation#Optional:_Compiling_libraries|see here]]), the test suite should be run as follows (from the SUMO root directory):

<source lang="matlab">
s = TestEngine('config/test/suite.xml') ; s.run()
</source>

The "run()" method also supports an optional parameter (a vector) that dictates which tests to run and in which order (e.g., run([2 5 3]) will run tests 2, 5 and 3).
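
Put together with the call shown earlier, running only a subset of the suite would look like this (same class and file name as in the example above; the test numbers are just the ones used in the text and require the toolbox to be on the Matlab path):

<source lang="matlab">
% run from the SUMO root directory
s = TestEngine('config/test/suite.xml');
s.run([2 5 3]);   % run only tests 2, 5 and 3, in that order
</source>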

''Note that due to randomization the final accuracy and number of samples used may vary slightly from run to run (causing failed tests). Thus the bounds must be set sufficiently loose.''

== Tips ==

See the [[Tips]] page for various tips and gotchas.

Running

2014-03-13T09:13:00Z

Javdrher: /* Select a matching SampleEvaluator */

== Getting started ==

If you are just getting started with the toolbox and you want to find out how everything works, this section should help you on your way.

* The '''features''' and scope of the SUMO Toolbox are detailed on this [[About#Intended_use|page]] where you can find out whether the SUMO Toolbox suits your needs. To find out more about the SUMO Toolbox in general, check out the documentation on this [[About#Documentation|page]].

* If you want to get hands-on with the SUMO Toolbox, we recommend using this [http://www.sumowiki.intec.ugent.be/images/7/7b/SUMO_hands_on.pdf guide]. The guide explains the basic SUMO framework, how to '''install''' the SUMO Toolbox on your computer and provides some '''examples''' on running the toolbox.

* Since the SUMO Toolbox is [[Configuration|configured]] by editing XML files it might be a good idea to read [[FAQ#What is XML?|this page]], if you are not familiar with XML files. You can also check out this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file| page]] which has more info on how the SUMO Toolbox uses XML.

* The '''installation''' information can also be found [[Installation|on this wiki]] and more information on running a different example with the SUMO Toolbox can be found [[Running#Running_different_examples|here]].

* The SUMO Toolbox also comes with a set of '''demos''' showing the different uses of the toolbox. You can find the configuration files for these demos in the 'config/demo' directory.

* We have also provided some [[General_guidelines|general modelling guidelines]] which you can use as a starting point to model your problems.

* Also be sure to check out the '''Frequently Asked Questions''' ([[FAQ|FAQ]]) page as it might answer some of your questions.

Finally, if you get stuck or have any problems, [[Reporting problems|feel free to let us know]] and we will do our best to help you.

''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up to date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''

== Running different examples ==

=== Prerequisites ===
This section is about running a different example problem. If you want to model your own problem, see [[Adding an example]]. Make sure you [[configuration|understand the difference between the simulator configuration file and the toolbox configuration file]] and understand how these configuration files are [[Toolbox configuration#Structure|structured]].

=== Changing the configuration xml ===
The <code>examples/</code> directory contains many example simulators that you can use to test the toolbox with. These examples range from predefined functions, to datasets from various domains, to native simulation code. If you want to try one of the examples, open <code>config/default.xml</code> and edit the [[Simulator| <Simulator>]] tag to suit your needs (for more information about editing the configuration xml, go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]]).

For example, originally the default '''configuration xml''' file, default.xml, contains:

<source lang="xml">
<Simulator>Math/Academic/Academic2DTwice.xml</Simulator>
</source>

The toolbox will look in the examples directory for a project directory called <code>Math/Academic</code> and load the '''simulator xml''' file named 'Academic2DTwice.xml'. If no simulator xml file name is specified, the SUMO Toolbox will load the simulator xml with the same name as the directory. For example, <code>Math/Peaks</code> is equivalent to <code>Math/Peaks/Peaks.xml</code>.

Now let's say you want to run a different example problem, for example the 'Michalewicz' example. In this case you would replace the original Simulator tag with:

<source lang="xml">
<Simulator>Math/Michalewicz</Simulator>
</source>

In addition you would have to change the <code><Outputs></code> tag. The <code>Math/Academic/Academic2DTwice.xml</code> example has two outputs (''out'' and ''outinverse''). However, the Michalewicz example has only one (''out''). Thus telling the SUMO Toolbox to model the ''outinverse'' output in that case makes no sense since it does not exist for the 'Michalewicz' example. So the following output configuration suffices:

<source lang="xml">
<Outputs>
  <Output name="out">
  </Output>
</Outputs>
</source>

The rest of default.xml can be kept the same. Then simply type '<code>go</code>' in the SUMO root directory to run the example. If you do not specify any arguments, the SUMO Toolbox will use the settings in the default.xml file. If you wish to run a different configuration file, use the command <code>go('pathToYourConfig/yourConfig.xml')</code>, where pathToYourConfig is the path to the directory in which your configuration XML file is located and yourConfig.xml is the name of that file.

As noted above, it is also possible to specify an absolute path or refer to a particular simulator xml file directly. For example:

<source lang="xml">
<Simulator>/path/to/your/project/directory</Simulator>
</source>

or:

<source lang="xml">
<Simulator>Ackley/Ackley2D.xml</Simulator>
</source>

=== Important notes ===

If you start changing default.xml to try out different examples, there are a number of important things you should be aware of.

==== Select matching Inputs and Outputs ====
Using the <code><Inputs></code> and <code><Outputs></code> tags in the SUMO Toolbox configuration file you can tell the toolbox which inputs and outputs should be modeled and how. Note that these tags are optional: if you delete them, the toolbox will simply model all available inputs and outputs. If you do specify a particular output, it must exist in the selected simulator. For example, say you tell the toolbox to model the output ''temperature'' of the simulator 'ChemistryProblem'. If you then change the configuration file to model 'BiologyProblem', you will have to change the name of the selected output (or input), since 'BiologyProblem' most likely will not have an output called ''temperature''.

Information on how to further customize the modelling of the outputs can be found [[Outputs|here]].

==== Select a matching SampleEvaluator ====
There is one important caveat. Some examples consist of a fixed data set, some are implemented as a Matlab function, others as a C++ executable, etc. When running a different example you have to tell the SUMO Toolbox how the example is implemented so the toolbox knows how to extract data (eg: should it load a data file or should it call a Matlab function). This is done by specifying the correct [[Config:DataSource|DataSource]] tag. The default DataSource is:

<source lang="xml">
<DataSource>matlab</DataSource>
</source>

This means that the toolbox expects the example you want to run to be implemented as a Matlab function. It is therefore no use running an example that is implemented as a static dataset with the '[[Config:DataSource#matlab|matlab]]' or '[[Config:DataSource#local|local]]' sample evaluators; doing so will result in an error. In that case you should use '[[Config:DataSource#scatteredDataset|scatteredDataset]]' (or sometimes '[[Config:DataSource#griddedDataset|griddedDataset]]').

To see how an example is implemented open the XML file inside the example directory and look at the <source lang="xml"><Implementation></source> tag. To see which DataSources are available see [[Config:DataSource]].

=== Select an appropriate model type ===
The choice of the model type which you use to model your problem has a great impact on the overall accuracy. If you switch to a different example you may also have to change the model type used. For example, if you are using a spline model (which only works in 2D) and you decide to model a problem with many dimensions (e.g., CompActive or BostonHousing) you will have to switch to a different model type (e.g., any of the SVM or LS-SVM model builders).

The <AdaptiveModelBuilder> tag specifies which model type is used to model the problem. In most cases the 'AdaptiveModelBuilder' also specifies an optimization algorithm to find the best 'hyperparameters' of the models. Hyperparameters are parameters which define the model, such as the order of a polynomial or the number of hidden nodes of an Artificial Neural Network. To see all the AdaptiveModelBuilder options and what they do, go to [[Config:AdaptiveModelBuilder| this page]].

=== One-shot designs ===
If you want to use the toolbox to simply model all your data at once instead of using the default sequential approach, see [[Adaptive_Modeling_Mode]] for how to do this.

== Running different configuration files ==

If you just type "go", the SUMO Toolbox will run using the configuration options in default.xml. However, you may want to make a copy of default.xml and play around with that, leaving your original default.xml intact. So the question is, how do you run that file? Let's say your copy is called MyConfigFile.xml. In order to tell SUMO to run that file you would type:

<source lang="matlab">
go('/path/to/MyConfigFile.xml')
</source>

The path can be an absolute path, or a path relative to the SUMO Toolbox root directory.
To see what other options you have when running go type ''help go''.

'''Remember to always run go from the toolbox root directory.'''

=== Merging your configuration ===

If you know what you are doing, you can merge your own custom configuration with the default configuration by using the '-merge' option. Options or tags that are missing in this custom file will then be filled up with the values from the default configuration. This prevents you from having to duplicate tags in default.xml and creates xml files which are easier to manipulate. However, if you are unfamiliar with XML and not quite sure what you are doing we advise against using it.

=== Running optimization examples ===
The SUMO Toolbox can also be used for minimizing the simulator in an intelligent way. There are 2 examples included in <code>config/Optimization</code>. These examples are run in exactly the same way as always, e.g. <code>go('config/optimization/Branin.xml')</code>. The only difference is in the sample selector, which is specified in the configuration file itself.
<gallery>
Image:ISCSampleSelector2.png
</gallery>
The example configuration files are well documented; it is advised to go through them for more detailed information.

== Understanding the control flow ==

[[Image:sumo-control-flow.png|thumb|300px|right|The general SUMO-Toolbox control flow]]

When the toolbox is running you might wonder what exactly is going on. The high level control flow that the toolbox goes through is illustrated in the flow chart and explained in more detail below. You may also refer to the [[About#Presentation|general SUMO presentation]].

# Select samples according to the [[InitialDesign|initial design]] and execute the [[Simulator]] for each of the points
# Once enough points are available, start the [[Add_Model_Type#Models.2C_Model_builders.2C_and_Factories|Model builder]] which will start producing models as it optimizes the model parameters
## the number of models generated depends on the [[Config:ModelBuilder|ModelBuilder]] used. Usually the ModelBuilder tag contains a setting like ''maxFunEvals'' or ''popSize''. This indicates to the algorithm that is optimizing the model parameters (and thus generating models) how many models it should maximally generate before stopping. By increasing this number you will generate more models in between sampling iterations, thus have a higher chance of getting a better model, but increasing the computation time. This step is what we refer to as a ''modeling iteration''.
## optimization over the model parameters is driven by the [[Measures|Measure(s)]] that are enabled. Selection of the Measure is thus very important for the modeling process!
## each time the model builder generates a model that has a lower measure score than the previous best model, the toolbox will trigger a "New best model found" event, save the model, generate a plot, and trigger all the profilers to update themselves.
## Note that by default you only see something happen when a new best model is found; you do not see all the other models that are being generated in the background. If you want to see those, you must increase the logging granularity (or just look in the log file) or [[FAQ#How_do_I_enable_more_profilers.3F|enable more profilers]].
# So the model builder will run until it has completed
# Then, if the current best model satisfies all the targets in the enabled Measures, it means we have reached the requirements and the toolbox terminates.
# If not, the [[SequentialDesign]] selects a new set of samples (= a ''sampling iteration''), they are simulated, and the model building resumes or is restarted according to the configured restart strategy
# This whole loop continues (thus the toolbox will keep running) until one of the following conditions is true:
## the targets specified in the active measure tags have been reached (each Measure has a target value which you can set). Note though, that when you are using multiple measures (see [[Multi-Objective Modeling]]) or when using single measures like AIC or LRM, it becomes difficult to set a priori targets since you cannot really interpret the scores (in contrast to the simple case with a single measure like CrossValidation where your target is simply the error you require). In those cases you should usually set the targets to 0 and use one of the other criteria below to make sure the toolbox stops.
## the maximum running time has been reached (''maximumTime'' property in the [[Config:SUMO]] tag)
## the maximum number of samples has been reached (''maximumTotalSamples'' property in the [[Config:SUMO]] tag)
## the maximum number of modeling iterations has been reached (''maxModelingIterations'' property in the [[Config:SUMO]] tag)

Note that it is also possible to disable the sample selection loop, see [[Adaptive Modeling Mode]]. Also note that while you might think the toolbox is not doing anything, it is actually building models in the background (see above for how to see the details). The toolbox will only inform you (unless configured otherwise) if it finds a model that is better than the previous best model (using that particular measure!!). If not it will continue running until one of the stopping conditions is true.
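
As an illustration, the stopping criteria listed above could be configured roughly as follows. This is a sketch only: the property names come from the list above, but the <code>Option</code> key/value syntax is assumed by analogy with the logging options in the Debugging section, and the values are arbitrary placeholders. See [[Config:SUMO]] and default.xml for the authoritative syntax, units, and defaults.

<source lang="xml">
<SUMO>
<!-- Hypothetical values; check Config:SUMO for the units and defaults -->
<Option key="maximumTime" value="60"/>
<Option key="maximumTotalSamples" value="500"/>
<Option key="maxModelingIterations" value="100"/>
</SUMO>
</source>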

== SUMO Toolbox output ==

All output is stored under the [[Config:ContextConfig#OutputDirectory|directory]] specified in the [[Config:ContextConfig]] section of the configuration file (by default this is set to "<code>output</code>").

Starting from version 6.0, the output directory is always relative to the project directory of your example, unless you specify an absolute path.

After completion of a SUMO Toolbox run, the following files and directories can be found there (e.g., in the <code>output/<run_name+date+time>/</code> subdirectory):

* <code>config.xml</code>: The xml file that was used by this run. Can be used to reproduce the entire modeling process for that run.
* <code>randstate.dat</code>: contains states of the random number generators, so that it becomes possible to deterministically repeat a run (see the [[Random state]] page).
* <code>samples.txt</code>: a list of all the samples that were evaluated, and their outputs.
* <code>profilers</code>-dir: contains information and plots about convergence rates, resource usage, and so on.
* <code>best</code>-dir: contains the best models (+ plots) of all outputs that were constructed during the run. This is continuously updated as the modeling progresses.
* <code>models_outputName</code>-dir: contains a history of all intermediate models (+ plots + movie) for each output that was modeled.

If you generated models [[Multi-Objective Modeling|multi-objectively]] you will also find the following directory:

* <code>paretoFronts</code>-dir: contains snapshots of the population during multi-objective optimization of the model parameters.

== Debugging ==

Remember to always check the log file first if problems occur!
When [[reporting problems]] please attach your log file and the xml configuration file you used.

To aid understanding and debugging you should set the console and file logging level to FINE (or even FINER or FINEST) as follows:

Change the level of the ConsoleHandler tag to FINE, FINER or FINEST. Do the same for the FileHandler tag.

<source lang="xml">

<ConsoleHandler>
<Option key="Level" value="FINE"/>
</ConsoleHandler>
</source>
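
The corresponding change for the file log might look as follows. This is a minimal sketch that assumes the FileHandler tag accepts the same <code>Level</code> option as the ConsoleHandler tag shown above; check default.xml for the exact options available in your version.

<source lang="xml">
<!-- Assumed to mirror the ConsoleHandler block above; verify against default.xml -->
<FileHandler>
<Option key="Level" value="FINE"/>
</FileHandler>
</source>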

== Using models ==

Once you have generated a model, you might wonder what you can do with it. To see how to load, export, and use SUMO generated models see the [[Using a model]] page.

== Modelling complex outputs ==

The toolbox supports the modeling of complex valued data. If you do not specify any specific <[[Outputs|Output]]> tags, all outputs will be modeled with [[Outputs#Complex_handling|complexHandling]] set to '<code>complex</code>'. This means that a real output will be modeled as a real value, and a complex output will be modeled as a complex value (with a real and imaginary part). If you don't want this (i.e., you want to model the modulus of a complex output or you want to model real and imaginary parts separately), you explicitly have to set [[Outputs#Complex_handling|complexHandling]] to 'modulus', 'real', 'imaginary', or 'split'.

More information on this subject can be found at the [[Outputs#Complex_handling|Outputs]] page.
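
For instance, to model the real and imaginary parts of a complex output separately, an Output tag along these lines could be used. The output name <code>S11</code> is only a placeholder; substitute the name of an output of your own simulator.

<source lang="xml">
<Outputs>
<!-- "S11" is a placeholder output name -->
<Output name="S11" complexHandling="split" />
</Outputs>
</source>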

== Models with multiple outputs ==

If multiple [[Outputs]] are selected, by default the toolbox will model each output separately using a separate adaptive model builder object. So if you have a system with 3 outputs you will get three different models each with one output. However, sometimes you may want a single model with multiple outputs. For example instead of having a neural network for each component of a complex output (real/imaginary) you might prefer a single network with 2 outputs. To do this simply set the 'combineOutputs' attribute of the <AdaptiveModelBuilder> tag to 'true'. That means that each time that model builder is selected for an output, the same model builder object will be used instead of creating a new one.
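
A sketch of what this could look like is given below. Only the <code>combineOutputs</code> attribute is taken from the description above; all other attributes and child options of the tag are omitted here and should be kept exactly as they appear in your existing configuration.

<source lang="xml">
<!-- Other attributes omitted; keep those from your existing model builder definition -->
<AdaptiveModelBuilder combineOutputs="true">
<!-- existing model builder options go here, unchanged -->
</AdaptiveModelBuilder>
</source>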

Note though, that not all model types support multiple outputs. If they don't you will get an error message.

Note also that you can generate models with multiple outputs in a multi-objective fashion. For information on this see the page on [[Multi-Objective Modeling]].

== Multi-Objective Model generation ==

See the page on [[Multi-Objective Modeling]].

== Interfacing with the SUMO Toolbox ==

To learn how to interface with the toolbox or model your own problem see the [[Adding an example]] and [[Interfacing with the toolbox]] pages.

== Test Suite ==

A test harness is provided that can be run manually or automatically as part of a cron job. The test suite consists of a number of test XML files (in the config/test/ directory), each describing a particular surrogate modeling experiment. The file config/test/suite.xml dictates which tests are run and their order. The suite.xml file also contains the accuracy and sample bounds that are checked after each test. If the final model found does not fall within the accuracy or number-of-samples bounds, the test is considered failed.

Note also that some of the predefined test cases may rely on data sets or simulation code that are not publicly available for confidentiality reasons. However, since these test problems typically make very good benchmark problems we left them in for illustration purposes.

The coordinating class is the Matlab TestSuite class found in the src/matlab directory. Besides running the tests defined in suite.xml it also tests each of the model member functions.

Assuming the SUMO Toolbox is set up properly and the necessary libraries are compiled ([[Installation#Optional:_Compiling_libraries|see here]]), the test suite should be run as follows (from the SUMO root directory):

<source lang="matlab">
s = TestEngine('config/test/suite.xml') ; s.run()
</source>

The "run()" method also supports an optional parameter (a vector) that dictates which tests to run (e.g., run([2 5 3]) will run tests 2,5 and 3).

''Note that due to randomization the final accuracy and number of samples used may vary slightly from run to run (causing failed tests). Thus the bounds must be set sufficiently loose.''

== Tips ==

See the [[Tips]] page for various tips and gotchas.

Config:LevelPlot

2014-02-28T08:54:50Z

Javdrher: /* SampleEvaluator */

Outputs

2014-02-28T08:52:04Z

Javdrher: /* Complex handling */

There are three levels on which you can configure the way the different outputs in the simulator file are modeled.

== Default behavior, all outputs modeled ==

If no Outputs tag is defined in the configuration file, all outputs are modeled and evaluated with the [[Measures|measure]] specified by the plan or the run. Complex outputs are modeled directly. The default error function is rootRelativeSquareError.m.

== Default behavior, selected outputs modeled ==

To change this default behavior, you must specify an <Outputs> tag in the Run configuration. Inside you specify an <Output> tag for each output you want to model. As an example, the Academic2DTwice example has two outputs: 'out' and 'outinverse'. If you only want to model 'outinverse' you specify:

<source lang="xml">
<Outputs>
<Output name="outinverse" />
</Outputs>
</source>

If on the other hand you want to model both you can delete the <Outputs> tag altogether and fall back to the default behavior, or you can put:

<source lang="xml">
<Outputs>
<Output name="out" />
<Output name="outinverse" />
</Outputs>
</source>

== Custom behavior for each output ==

=== Component customization ===

If you also want to change or fine-tune the behavior of the toolbox for each output separately, you can add subelements to each Output tag to customize the toolbox for that particular output. This allows you to use different sample selectors and/or model builders for each output, change the default measure, or combine multiple measures.

Several examples of valid Output configurations can be found commented in default.xml.

Here is an example of an output configuration for the Academic2DTwice test function:

<source lang="xml">
<Outputs>
<Output name="out">
<SequentialDesign>lola</SequentialDesign>
<Measure type="CrossValidation" target=".01" errorFcn="meanSquareError" use="on" />
<ModelBuilder>lssvmps</ModelBuilder>
</Output>

<Output name="outinverse">
<SequentialDesign>delaunay</SequentialDesign>
<Measure type="CrossValidation" target=".05" errorFcn="maxRelativeError" use="on" />
<Measure type="MinMax" />
</Output>
</Outputs>
</source>

This configuration models the first Output, named "out", using the CrossValidation [[Measures|measure]] (with the meanSquareError function and a target of 0.01) and the LOLA sample selector to select new samples. It builds models for this output using the lssvmps model builder. The second output, named "outinverse", uses CrossValidation in combination with the MinMax measure, and uses the delaunay sample selector to select new samples for this output. It builds models using the model builder defined higher up in the xml hierarchy. As you can see, you can add multiple measures to one Output, so that each measure has something to say in deciding the accuracy of the models. However, you can only select one SequentialDesign and one ModelBuilder per Output.

For information on how multiple measures are handled and the configuration options of each measure, see [[Measures]].

=== Complex handling ===

By default, a complex output is treated as such and is passed in its original form to all the components of the toolbox. However, some components do not support complex numbers directly and are therefore incompatible with the default setting (they will give an error). In order to get these components to work with complex outputs, the outputs will have to be pre-processed. This is done by changing the complexHandling attribute of the Output. There are 5 valid values for this attribute:

#'''complex''': Complex outputs are treated as is. Will not work with components that do not support complex numbers.
#'''split''': Complex numbers are split in real and imaginary parts, and each part is modeled separately. This means that two models will be built, and the toolbox will have twice the work it would normally have. Also any correlation between the real and imaginary part is lost.
#'''real''': Only the real part of the complex number is modeled, the imaginary part is discarded.
#'''imaginary''': Only the imaginary part of the complex number is modeled, the real part is discarded. The imaginary part is treated as one real output.
#'''modulus''': The modulus of the complex number is modeled instead of the original number. Since the modulus is a real number, it can be modeled using all the components available.

A full-blown example for the InductivePosts test function that uses all the options mentioned in this article can be found below:

<source lang="xml">


<Output name="S11,S12" complexHandling="real">
<SequentialDesign>lola</SequentialDesign>
<Measure type="CrossValidation" target=".01" />
</Output>



<Output name="S12" complexHandling="complex">
<SequentialDesign>error</SequentialDesign>
<ModelBuilder>svmps</ModelBuilder>
<Measure type="ValidationSet" target=".05" />
<Measure type="CrossValidation" target=".05" use="off" />
</Output>



<Output name="S22" complexHandling="modulus">
<Measure type="CrossValidation" target=".05" />
</Output>
</source>

== Multi-Output Modeling ==

See the page [[Running#Models_with_multiple_outputs]].

Outputs

2014-02-28T08:51:39Z

Javdrher: /* Component customization */

There are three levels on which you can configure the way the different outputs in the simulator file are modeled.

== Default behavior, all outputs modeled ==

If no Outputs tag is defined in the configuration file, all outputs are modeled and evaluated with the [[Measures|measure]] specified by the plan or the run. Complex outputs are modeled directly. The default error function is rootRelativeSquareError.m.

== Default behavior, selected outputs modeled ==

To change this default behavior, you must specify an <Outputs> tag in the Run configuration. Inside you specify an <Output> tag for each output you want to model. As an example, the Academic2DTwice example has two outputs: 'out' and 'outinverse'. If you only want to model 'outinverse' you specify:

<source lang="xml">
<Outputs>
<Output name="outinverse" />
</Outputs>
</source>

If on the other hand you want to model both you can delete the <Outputs> tag altogether and fall back to the default behavior, or you can put:

<source lang="xml">
<Outputs>
<Output name="out" />
<Output name="outinverse" />
</Outputs>
</source>

== Custom behavior for each output ==

=== Component customization ===

If you also want to change or fine-tune the behavior of the toolbox for each output separately, you can add subelements to each Output tag to customize the toolbox for that particular output. This allows you to use different sample selectors and/or model builders for each output, change the default measure, or combine multiple measures.

Several examples of valid Output configurations can be found commented in default.xml.

Here is an example of an output configuration for the Academic2DTwice test function:

<source lang="xml">
<Outputs>
<Output name="out">
<SequentialDesign>lola</SequentialDesign>
<Measure type="CrossValidation" target=".01" errorFcn="meanSquareError" use="on" />
<ModelBuilder>lssvmps</ModelBuilder>
</Output>

<Output name="outinverse">
<SequentialDesign>delaunay</SequentialDesign>
<Measure type="CrossValidation" target=".05" errorFcn="maxRelativeError" use="on" />
<Measure type="MinMax" />
</Output>
</Outputs>
</source>

This configuration models the first Output, named "out", using the CrossValidation [[Measures|measure]] (with the meanSquareError function and a target of 0.01) and the LOLA sample selector to select new samples. It builds models for this output using the lssvmps model builder. The second output, named "outinverse", uses CrossValidation in combination with the MinMax measure, and uses the delaunay sample selector to select new samples for this output. It builds models using the model builder defined higher up in the xml hierarchy. As you can see, you can add multiple measures to one Output, so that each measure has something to say in deciding the accuracy of the models. However, you can only select one SequentialDesign and one ModelBuilder per Output.

For information on how multiple measures are handled and the configuration options of each measure, see [[Measures]].

=== Complex handling ===

By default, a complex output is treated as such and is passed in its original form to all the components of the toolbox. However, some components do not support complex numbers directly and are therefore incompatible with the default setting (they will give an error). In order to get these components to work with complex outputs, the outputs will have to be pre-processed. This is done by changing the complexHandling attribute of the Output. There are 5 valid values for this attribute:

#'''complex''': Complex outputs are treated as is. Will not work with components that do not support complex numbers.
#'''split''': Complex numbers are split in real and imaginary parts, and each part is modeled separately. This means that two models will be built, and the toolbox will have twice the work it would normally have. Also any correlation between the real and imaginary part is lost.
#'''real''': Only the real part of the complex number is modeled, the imaginary part is discarded.
#'''imaginary''': Only the imaginary part of the complex number is modeled, the real part is discarded. The imaginary part is treated as one real output.
#'''modulus''': The modulus of the complex number is modeled instead of the original number. Since the modulus is a real number, it can be modeled using all the components available.

A full-blown example for the InductivePosts test function that uses all the options mentioned in this article can be found below:

<source lang="xml">


<Output name="S11,S12" complexHandling="real">
<SampleSelector>lola</SampleSelector>
<Measure type="CrossValidation" target=".01" />
</Output>



<Output name="S12" complexHandling="complex">
<SampleSelector>error</SampleSelector>
<AdaptiveModelBuilder>svmps</AdaptiveModelBuilder>
<Measure type="ValidationSet" target=".05" />
<Measure type="CrossValidation" target=".05" use="off" />
</Output>



<Output name="S22" complexHandling="modulus">
<Measure type="CrossValidation" target=".05" />
</Output>
</source>

== Multi-Output Modeling ==

See the page [[Running#Models_with_multiple_outputs]].

Running SUMO on UGent HPC

2014-02-27T15:43:40Z

Javdrher: /* Compiling a standalone copy of the SUMO Toolbox for use on the HPC */

Running SUMO on UGent HPC

2014-02-27T15:39:19Z

Javdrher: /* Introduction */

Adaptive Modeling Mode

2014-02-27T15:35:47Z

Javdrher:

It is possible to switch off sample selection and run the toolbox for a fixed set of samples, only optimizing the model parameters. This is what we call running in Adaptive Modeling Mode.

You can switch off adaptive sample selection by simply not specifying a <SequentialDesign> tag in your configuration file: just remove it from the <Plan> section. When you then run the toolbox, all the available data will be used and only adaptive modeling will be done. There are two possibilities:

#Your simulator is a dataset: the whole dataset will be read in one go and all of the data therein will be used for modeling
#Your simulator is an executable or Matlab script: only the initial design will be generated and used for adaptive modeling, no further samples will be selected.

This is useful if you just want to see the best model you can get for a fixed dataset in a certain amount of time.
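
Schematically, the change amounts to the following. This is only a sketch: the other entries of the Plan section are left out, and the selector name <code>lola</code> is just an example of a value that might have been configured.

<source lang="xml">
<Plan>
<!-- ... other Plan entries stay unchanged ... -->
<!-- Removed (here: commented out) to switch off sample selection:
<SequentialDesign>lola</SequentialDesign>
-->
</Plan>
</source>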

Running

2014-02-27T15:33:25Z

Javdrher: /* Understanding the control flow */

== Getting started ==

If you are just getting started with the toolbox and you want to find out how everything works, this section should help you on your way.

* The '''features''' and scope of the SUMO Toolbox are detailed on this [[About#Intended_use|page]] where you can find out whether the SUMO Toolbox suits your needs. To find out more about the SUMO Toolbox in general, check out the documentation on this [[About#Documentation|page]].

* If you want to get hands-on with the SUMO Toolbox, we recommend using this [http://www.sumowiki.intec.ugent.be/images/7/7b/SUMO_hands_on.pdf guide]. The guide explains the basic SUMO framework, how to '''install''' the SUMO Toolbox on your computer and provides some '''examples''' on running the toolbox.

* Since the SUMO Toolbox is [[Configuration|configured]] by editing XML files it might be a good idea to read [[FAQ#What is XML?|this page]], if you are not familiar with XML files. You can also check out this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file| page]] which has more info on how the SUMO Toolbox uses XML.

* The '''installation''' information can also be found [[Installation|on this wiki]] and more information on running a different example with the SUMO Toolbox can be found [[Running#Running_different_examples|here]].

* The SUMO Toolbox also comes with a set of '''demos''' showing the different uses of the toolbox. You can find the configuration files for these demos in the 'config/demo' directory.

* We have also provided some [[General_guidelines|general modelling guidelines]] which you can use as a starting point to model your problems.

* Also be sure to check out the '''Frequently Asked Questions''' ([[FAQ|FAQ]]) page as it might answer some of your questions.

Finally, if you get stuck or have any problems, [[Reporting problems|feel free to let us know]] and we will do our best to help you.

''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up to date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''

== Running different examples ==

=== Prerequisites ===
This section is about running a different example problem. If you want to model your own problem, see [[Adding an example]]. Make sure you [[configuration|understand the difference between the simulator configuration file and the toolbox configuration file]] and understand how these configuration files are [[Toolbox configuration#Structure|structured]].

=== Changing the configuration xml ===
The <code>examples/</code> directory contains many example simulators that you can use to test the toolbox with. These examples range from predefined functions, to datasets from various domains, to native simulation code. If you want to try one of the examples, open <code>config/default.xml</code> and edit the [[Simulator| <Simulator>]] tag to suit your needs (for more information about editing the configuration xml, go to this [[Config:ToolboxConfiguration#Interpreting_the_configuration_file|page]]).

For example, originally the default '''configuration xml''' file, default.xml, contains:

<source lang="xml">
<Simulator>Math/Academic/Academic2DTwice.xml</Simulator>
</source>

The toolbox will look in the examples directory for a project directory called <code>Math/Academic</code> and load the '''simulator xml''' file named 'Academic2DTwice.xml'. If no simulator xml file name is specified, the SUMO Toolbox will load the simulator xml with the same name as the directory. For example, <code>Math/Peaks</code> is equivalent to <code>Math/Peaks/Peaks.xml</code>.

Now let's say you want to run a different example problem, for example the 'Michalewicz' example. In this case you would replace the original Simulator tag with:

<source lang="xml">
<Simulator>Math/Michalewicz</Simulator>
</source>

In addition you would have to change the <code><Outputs></code> tag. The <code>Math/Academic/Academic2DTwice.xml</code> example has two outputs (''out'' and ''outinverse''). However, the Michalewicz example has only one (''out''). Thus telling the SUMO Toolbox to model the ''outinverse'' output in that case makes no sense since it does not exist for the 'Michalewicz' example. So the following output configuration suffices:

<source lang="xml">
<Outputs>
<Output name="out">
</Output>
</Outputs>
</source>

The rest of default.xml can be kept the same. Then simply type '<code>go</code>' in the SUMO root to run the example. If you do not specify any arguments, the SUMO Toolbox will use the settings in the default.xml file. If you wish to run a different configuration file, use the command <code>go('pathToYourConfig/yourConfig.xml')</code>, where pathToYourConfig is the path to where your configuration XML file is located, and yourConfig.xml is the name of your configuration XML file.

As noted above, it is also possible to specify an absolute path or refer to a particular simulator xml file directly. For example:

<source lang="xml">
<Simulator>/path/to/your/project/directory</Simulator>
</source>

or:

<source lang="xml">
<Simulator>Ackley/Ackley2D.xml</Simulator>
</source>

=== Important notes ===

If you start changing default.xml to try out different examples, there are a number of important things you should be aware of.

==== Select matching Inputs and Outputs ====
Using the <code><Inputs></code> and <code><Outputs></code> tags in the SUMO Toolbox configuration file you can tell the toolbox which inputs and outputs should be modeled and how. Note that these tags are optional: if you delete them, the toolbox will simply model all available inputs and outputs. If you do select particular inputs or outputs, remember to update them when you switch examples. For example, say you tell the toolbox to model the output ''temperature'' of the simulator 'ChemistryProblem'. If you then change the configuration file to model 'BiologyProblem', you will have to change the name of the selected output (or input), since 'BiologyProblem' will most likely not have an output called ''temperature''.

Information on how to further customize the modelling of the outputs can be found [[Outputs|here]].

==== Select a matching SampleEvaluator ====
There is one important caveat. Some examples consist of a fixed data set, some are implemented as a Matlab function, others as a C++ executable, etc. When running a different example you have to tell the SUMO Toolbox how the example is implemented so the toolbox knows how to extract data (e.g., should it load a data file or should it call a Matlab function). This is done by specifying the correct [[Config:SampleEvaluator|SampleEvaluator]] tag. The default SampleEvaluator is:

<source lang="xml">
<SampleEvaluator>matlab</SampleEvaluator>
</source>

This means that the toolbox expects the example you want to run to be implemented as a Matlab function. Thus it is no use running an example that is implemented as a static dataset using the '[[Config:SampleEvaluator#matlab|matlab]]' or '[[Config:SampleEvaluator#local|local]]' sample evaluators; doing this will result in an error. In this case you should use '[[Config:SampleEvaluator#scatteredDataset|scatteredDataset]]' (or sometimes [[Config:SampleEvaluator#griddedDataset|griddedDataset]]).
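
For an example that is implemented as a static data set, the tag would therefore be changed along these lines (a minimal sketch; use griddedDataset instead where that evaluator applies to your data):

<source lang="xml">
<!-- Replaces the default "matlab" evaluator when the example is a static data set -->
<SampleEvaluator>scatteredDataset</SampleEvaluator>
</source>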

To see how an example is implemented, open the XML file inside the example directory and look at the <code><Implementation></code> tag. To see which SampleEvaluators are available see [[Config:SampleEvaluator]].

==== Select an appropriate model type ====
The choice of the model type which you use to model your problem has a great impact on the overall accuracy. If you switch to a different example you may also have to change the model type used. For example, if you are using a spline model (which only works in 2D) and you decide to model a problem with many dimensions (e.g., CompActive or BostonHousing) you will have to switch to a different model type (e.g., any of the SVM or LS-SVM model builders).

The <AdaptiveModelBuilder> tag specifies which model type is used to model the problem. In most cases the 'AdaptiveModelBuilder' also specifies an optimization algorithm to find the best 'hyperparameters' of the models. Hyperparameters are parameters which define the model, such as the order of a polynomial or the number of hidden nodes of an Artificial Neural Network. To see all the AdaptiveModelBuilder options and what they do, go to [[Config:AdaptiveModelBuilder| this page]].

=== One-shot designs ===
If you want to use the toolbox to simply model all your data without using the default sequential approach, see [[Adaptive_Modeling_Mode]] for how to do this.

== Running different configuration files ==

If you just type "go" the SUMO-Toolbox will run using the configuration options in default.xml. However you may want to make a copy of default.xml and play around with that, leaving your original default.xml intact. So the question is, how do you run that file? Let's say your copy is called MyConfigFile.xml. In order to tell SUMO to run that file you would type:

<source lang="matlab">
go('/path/to/MyConfigFile.xml')
</source>

The path can be an absolute path, or a path relative to the SUMO Toolbox root directory.
To see what other options you have when running go type ''help go''.

'''Remember to always run go from the toolbox root directory.'''

=== Merging your configuration ===

If you know what you are doing, you can merge your own custom configuration with the default configuration by using the '-merge' option. Options or tags that are missing in this custom file will then be filled up with the values from the default configuration. This prevents you from having to duplicate tags in default.xml and creates xml files which are easier to manipulate. However, if you are unfamiliar with XML and not quite sure what you are doing we advise against using it.

=== Running optimization examples ===
The SUMO Toolbox can also be used to minimize the simulator output in an intelligent way. Two examples are included in <code>config/Optimization</code>. These examples are run in exactly the same way as any other, e.g. <code>go('config/optimization/Branin.xml')</code>. The only difference is in the sample selector, which is specified in the configuration file itself.
<gallery>
Image:ISCSampleSelector2.png
</gallery>
The example configuration files are well documented; it is advised to go through them for more detailed information.

== Understanding the control flow ==

[[Image:sumo-control-flow.png|thumb|300px|right|The general SUMO-Toolbox control flow]]

When the toolbox is running you might wonder what exactly is going on. The high level control flow that the toolbox goes through is illustrated in the flow chart and explained in more detail below. You may also refer to the [[About#Presentation|general SUMO presentation]].

# Select samples according to the [[InitialDesign|initial design]] and execute the [[Simulator]] for each of the points
# Once enough points are available, start the [[Add_Model_Type#Models.2C_Model_builders.2C_and_Factories|Model builder]] which will start producing models as it optimizes the model parameters
## the number of models generated depends on the [[Config:ModelBuilder|ModelBuilder]] used. Usually the ModelBuilder tag contains a setting like ''maxFunEvals'' or ''popSize''. This indicates to the algorithm that is optimizing the model parameters (and thus generating models) how many models it should maximally generate before stopping. By increasing this number you will generate more models in between sampling iterations, thus have a higher chance of getting a better model, but increasing the computation time. This step is what we refer to as a ''modeling iteration''.
## optimization over the model parameters is driven by the [[Measures|Measure(s)]] that are enabled. Selection of the Measure is thus very important for the modeling process!
## each time the model builder generates a model that has a lower measure score than the previous best model, the toolbox will trigger a "New best model found" event, save the model, generate a plot, and trigger all the profilers to update themselves.
## note that by default you only see something happen when a new best model is found; you do not see all the other models that are being generated in the background. If you want to see those, you must increase the logging granularity (or just look in the log file) or [[FAQ#How_do_I_enable_more_profilers.3F|enable more profilers]].
# The model builder runs until it has completed
# Then, if the current best model satisfies all the targets in the enabled Measures, it means we have reached the requirements and the toolbox terminates.
# If not, the [[SequentialDesign]] selects a new set of samples (= a ''sampling iteration''), they are simulated, and the model building resumes or is restarted according to the configured restart strategy
# This whole loop continues (thus the toolbox will keep running) until one of the following conditions is true:
## the targets specified in the active measure tags have been reached (each Measure has a target value which you can set). Note though that when you are using multiple measures (see [[Multi-Objective Modeling]]), or single measures like AIC or LRM, it becomes difficult to set targets a priori since you cannot really interpret the scores (in contrast to the simple case with a single measure like CrossValidation, where your target is simply the error you require). In those cases you should usually set the targets to 0 and use one of the other criteria below to make sure the toolbox stops.
## the maximum running time has been reached (''maximumTime'' property in the [[Config:SUMO]] tag)
## the maximum number of samples has been reached (''maximumTotalSamples'' property in the [[Config:SUMO]] tag)
## the maximum number of modeling iterations has been reached (''maxModelingIterations'' property in the [[Config:SUMO]] tag)

Note that it is also possible to disable the sample selection loop; see [[Adaptive Modeling Mode]]. Also note that while it may look as if the toolbox is not doing anything, it is actually building models in the background (see above for how to see the details). Unless configured otherwise, the toolbox only informs you when it finds a model that is better than the previous best model (according to the measure in use). Otherwise it continues running until one of the stopping conditions is true.
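
The loop described above can be sketched in a few lines of plain Matlab. This is only a toy illustration and not SUMO Toolbox code: the simulator, the polynomial "model builder", the hold-out measure and the largest-gap sample selection are all hypothetical stand-ins, and only the names ''maximumTotalSamples'' and ''maxModelingIterations'' are borrowed from the configuration properties mentioned above.

<source lang="matlab">
% Toy illustration of the control flow; this is not SUMO Toolbox code.
simulator = @(x) sin(3*x) + x.^2;    % hypothetical stand-in for a simulator
target = 1e-3;                       % target value of the measure
maximumTotalSamples = 40;            % stopping criterion
maxModelingIterations = 20;          % stopping criterion

x = linspace(-1, 1, 6)';             % initial design
y = simulator(x);                    % simulate the initial points
iteration = 0;
reason = '';

while isempty(reason)
    % Modeling iteration: generate models by varying a model parameter
    % (here the polynomial degree) and keep the best one.
    iteration = iteration + 1;
    trainIdx = 1:2:numel(x);         % points used to fit the model
    testIdx = 2:2:numel(x);          % held-out points used by the measure
    bestScore = Inf;                 % scores are recomputed on the new data
    for degree = 1:min(5, numel(trainIdx) - 1)
        coeffs = polyfit(x(trainIdx), y(trainIdx), degree);
        score = max(abs(polyval(coeffs, x(testIdx)) - y(testIdx)));
        if score < bestScore
            bestScore = score;       % "new best model found"
            bestCoeffs = coeffs;
        end
    end

    % Stopping conditions, checked after the model builder has completed.
    if bestScore <= target
        reason = 'target reached';
    elseif numel(x) >= maximumTotalSamples
        reason = 'maximumTotalSamples reached';
    elseif iteration >= maxModelingIterations
        reason = 'maxModelingIterations reached';
    else
        % Sampling iteration: add one point in the largest gap and simulate it.
        [~, k] = max(diff(x));
        xNew = (x(k) + x(k + 1)) / 2;
        [x, order] = sort([x; xNew]);
        yAll = [y; simulator(xNew)];
        y = yAll(order);
    end
end
fprintf('Stopped after %d modeling iterations with %d samples: %s\n', ...
        iteration, numel(x), reason);
</source>

The sketch shows why nothing may seem to happen for a while: all the work inside the <code>for</code> loop is silent unless a model improves on the current best.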

== SUMO Toolbox output ==

All output is stored under the [[Config:ContextConfig#OutputDirectory|directory]] specified in the [[Config:ContextConfig]] section of the configuration file (by default this is set to "<code>output</code>").

Starting from version 6.0, the output directory is always relative to the project directory of your example, unless you specify an absolute path.

After completion of a SUMO Toolbox run, the following files and directories can be found there (e.g. in the <code>output/<run_name+date+time>/</code> subdirectory):

* <code>config.xml</code>: The xml file that was used by this run. Can be used to reproduce the entire modeling process for that run.
* <code>randstate.dat</code>: contains states of the random number generators, so that it becomes possible to deterministically repeat a run (see the [[Random state]] page).
* <code>samples.txt</code>: a list of all the samples that were evaluated, and their outputs.
* <code>profilers</code>-dir: contains information and plots about convergence rates, resource usage, and so on.
* <code>best</code>-dir: contains the best models (+ plots) of all outputs that were constructed during the run. This is continuously updated as the modeling progresses.
* <code>models_outputName</code>-dir: contains a history of all intermediate models (+ plots + movie) for each output that was modeled.

If you generated models [[Multi-Objective Modeling|multi-objectively]] you will also find the following directory:

* <code>paretoFronts</code>-dir: contains snapshots of the population during multi-objective optimization of the model parameters.

== Debugging ==

Remember to always check the log file first if problems occur!
When [[reporting problems]] please attach your log file and the xml configuration file you used.

To aid understanding and debugging you should set the console and file logging level to FINE (or even FINER or FINEST) as follows:

Change the level of the ConsoleHandler tag to FINE, FINER or FINEST. Do the same for the FileHandler tag.

<source lang="xml">

<ConsoleHandler>
<Option key="Level" value="FINE"/>
</ConsoleHandler>
<FileHandler>
<Option key="Level" value="FINE"/>
</FileHandler>
</source>

== Using models ==

Once you have generated a model, you might wonder what you can do with it. To see how to load, export, and use SUMO generated models see the [[Using a model]] page.

== Modelling complex outputs ==

The toolbox supports the modeling of complex valued data. If you do not specify any specific <[[Outputs|Output]]> tags, all outputs will be modeled with [[Outputs#Complex_handling|complexHandling]] set to '<code>complex</code>'. This means that a real output will be modeled as a real value, and a complex output will be modeled as a complex value (with a real and imaginary part). If you don't want this (i.e., you want to model the modulus of a complex output or you want to model real and imaginary parts separately), you explicitly have to set [[Outputs#Complex_handling|complexHandling]] to 'modulus', 'real', 'imaginary', or 'split'.

More information on this subject can be found at the [[Outputs#Complex_handling|Outputs]] page.
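
As a rough illustration of what the different settings mean numerically, the following plain Matlab snippet derives the quantity that would be modeled in each case from some made-up complex data. It does not call the toolbox. In particular, representing 'split' as two separate real outputs is an assumption based on the description above.

<source lang="matlab">
% Hypothetical complex-valued simulator output at three sample points
z = [3 + 4i; 1 - 2i; -2 + 0.5i];

modulusOutput   = abs(z);              % 'modulus': model |z|
realOutput      = real(z);             % 'real': model only the real part
imaginaryOutput = imag(z);             % 'imaginary': model only the imaginary part
splitOutputs    = [real(z), imag(z)];  % 'split': assumed to give two real outputs

disp([modulusOutput, realOutput, imaginaryOutput]);
</source>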

== Models with multiple outputs ==

If multiple [[Outputs]] are selected, by default the toolbox will model each output separately using a separate adaptive model builder object. So if you have a system with 3 outputs you will get three different models each with one output. However, sometimes you may want a single model with multiple outputs. For example instead of having a neural network for each component of a complex output (real/imaginary) you might prefer a single network with 2 outputs. To do this simply set the 'combineOutputs' attribute of the <AdaptiveModelBuilder> tag to 'true'. That means that each time that model builder is selected for an output, the same model builder object will be used instead of creating a new one.

Note though, that not all model types support multiple outputs. If they don't you will get an error message.

Note that you can also generate models with multiple outputs in a multi-objective fashion. For information on this see the page on [[Multi-Objective Modeling]].

== Multi-Objective Model generation ==

See the page on [[Multi-Objective Modeling]].

== Interfacing with the SUMO Toolbox ==

To learn how to interface with the toolbox or model your own problem see the [[Adding an example]] and [[Interfacing with the toolbox]] pages.

== Test Suite ==

A test harness is provided that can be run manually or automatically as part of a cron job. The test suite consists of a number of test XML files (in the config/test/ directory), each describing a particular surrogate modeling experiment. The file config/test/suite.xml dictates which tests are run and their order. The suite.xml file also contains the accuracy and sample bounds that are checked after each test. If the final model found does not fall within the accuracy or number-of-samples bounds, the test is considered failed.

Note also that some of the predefined test cases may rely on data sets or simulation code that are not publicly available for confidentiality reasons. However, since these test problems typically make very good benchmark problems, we left them in for illustration purposes.

The coordinating class is the Matlab TestSuite class found in the src/matlab directory. Besides running the tests defined in suite.xml it also tests each of the model member functions.

Assuming the SUMO Toolbox is set up properly and the necessary libraries are compiled ([[Installation#Optional:_Compiling_libraries|see here]]), the test suite should be run as follows (from the SUMO root directory):

<source lang="matlab">
s = TestEngine('config/test/suite.xml') ; s.run()
</source>

The "run()" method also supports an optional parameter (a vector) that dictates which tests to run (e.g., run([2 5 3]) will run tests 2, 5 and 3, in that order).

''Note that due to randomization the final accuracy and number of samples used may vary slightly from run to run (causing failed tests). Thus the bounds must be set sufficiently loose.''

== Tips ==

See the [[Tips]] page for various tips and gotchas.

General guidelines

2014-02-27T15:30:16Z

Javdrher: /* Adaptive Model Builders */

The <code>[[Config:ToolboxConfiguration|default.xml]]</code> file can be used as a starting point for default behavior for the SUMO Toolbox. If you are a new user, you should initially leave most options at their default values. The default settings were chosen since they produce good results on average.

However, usually the optimal choice of components depends on the problem itself, so that the default settings aren't necessarily the best. This page will give the user general guidelines to decide which component to use for each situation they may encounter. The user is of course free to ignore these rules and experiment with other settings.

Note that this list is very brief and incomplete. Feel free to [[Contact]] us if you have any further questions.

== Measures ==

The default [[Measures| Measure]] is [[Measures#CrossValidation| CrossValidation]]. Even though this is a very good, accurate, overall measure, there are some considerations to make in the following cases:

* '''Expensive modelers (ann):''' If it is relatively expensive to train a model (for example, with neural networks), CrossValidation is also very slow, because it has to train a model for each fold (which is 5 by default). If modeling takes too long, you might want to use a faster alternative, such as [[Measures#ValidationSet|ValidationSet]] or a combination of [[Measures#SampleError|SampleError]] and [[Measures#LRMMeasure|LRMMeasure]].
* '''ErrorSampleSelector:''' CrossValidation might give a biased result when combined with the [[SampleSelector#ErrorSampleSelector|ErrorSampleSelector]]. This is because the ErrorSampleSelector tends to cluster samples around one point, which will result in very accurate surrogate models for all the points in this cluster (and thus good results with CrossValidation). So when using CrossValidation and ErrorSampleSelector together, keep in mind that the real accuracy might be slightly lower than the estimated one.
* '''Rational modeler:''' When using the Rational modeler, you might want to manually add a [[Measures#MinMax| MinMax]] measure (if you have a rough estimate of the minimum and maximum values for your outputs) and use it together with CrossValidation. By adding the MinMax measure, you eliminate models which have poles in the design space, because these poles always break the minimum and maximum bounds. This usually results in better models and quicker convergence.
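
The idea behind the MinMax check can be sketched as follows. This is a minimal plain Matlab illustration under stated assumptions, not the toolbox's implementation: the two rational models, the bounds and the dense-grid evaluation are all made up for the example.

<source lang="matlab">
% Two hypothetical rational models on the design space [-1, 1]
modelA = @(x) (1 + x) ./ (2 + x.^2);   % denominator never zero: no pole
modelB = @(x) (1 + x) ./ (x - 0.3);    % pole at x = 0.3, inside the design space

outputMin = -5;                        % rough estimate of the smallest output
outputMax = 5;                         % rough estimate of the largest output
xGrid = linspace(-1, 1, 2001)';        % dense grid over the design space

% A model passes if all its predictions stay within the expected bounds.
withinBounds = @(model) all(model(xGrid) >= outputMin & model(xGrid) <= outputMax);

fprintf('modelA within bounds: %d\n', withinBounds(modelA));
fprintf('modelB within bounds: %d\n', withinBounds(modelB));
</source>

Near the pole the predictions of <code>modelB</code> grow without bound, so it violates the output range and would be rejected, while <code>modelA</code> stays within it.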

Selecting a good Measure '''is a very important''' part of the modeling process! It is CRUCIAL that you think carefully about this. Make sure you also read [[Multi-Objective Modeling]].
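
To see why CrossValidation becomes slow for expensive model types, consider this minimal plain Matlab sketch of 5-fold cross-validation. It is an illustration only and not the toolbox's CrossValidation code: the data, the cubic polynomial model and the interleaved fold assignment are assumptions made for the example.

<source lang="matlab">
% Hypothetical data set
x = linspace(-1, 1, 25)';
y = exp(x);

k = 5;                                   % number of folds
foldOf = mod((0:numel(x) - 1)', k) + 1;  % interleaved fold assignment
trainings = 0;
squaredErrors = zeros(size(x));

for fold = 1:k
    testMask = (foldOf == fold);
    % One full model training per fold, on the remaining k-1 folds.
    coeffs = polyfit(x(~testMask), y(~testMask), 3);
    trainings = trainings + 1;
    squaredErrors(testMask) = (polyval(coeffs, x(testMask)) - y(testMask)).^2;
end

cvScore = sqrt(mean(squaredErrors));     % root mean square error over all folds
fprintf('%d trainings for one score of %g\n', trainings, cvScore);
</source>

Every candidate model the model builder generates costs <code>k</code> trainings to score. For a cheap polynomial fit this is negligible, but for a neural network it multiplies the training time by the number of folds.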

== Sequential Design ==

The default [[Config:SequentialDesign|Sequential Design]] is the [[Config:SequentialDesign#lola-voronoi|LOLA-Voronoi sample selector]] combined with the [[Config:SequentialDesign#error|error-based sample selector]], with a weight of 0.7 for LOLA and 0.3 for error. This is a very robust sample selector, capable of dealing with most situations. There are, however, some cases in which it is advisable to choose a different one:

* '''Large-scale problems (1000+ samples):''' LOLA-Voronoi's time complexity is O(n²) in the number of samples n, so for large-scale experiments in which many samples are taken, LOLA-Voronoi becomes quite slow. Depending on the time it takes to perform one simulation, this may or may not be a problem. If it takes a long time to perform one simulation, the cost of selecting new samples with LOLA-Voronoi might still be negligible. If, however, you need a quicker sample selector, it is advised to use [[Config:SequentialDesign#voronoi|voronoi]] or [[Config:SequentialDesign#error|error]] instead.
* '''Rational modeler:''' Benchmarks have shown that the gain of LOLA-Voronoi over the [[Config:SequentialDesign#error|error-based sample selector]] when using global approximation methods (mainly rational/polynomial) is pretty much zero. It is therefore advisable to use the (much faster) [[Config:SampleSelector#error|error-based sample selector]] when using the Rational modeler. This can be done by changing the weights in default.xml to 1.0 for error and 0.0 for LOLA.
* If you need to sample multiple outputs at once, with one sample selector, or you need an auto-sampled input (for example: a frequency input), you should use [[Config:SequentialDesign#lola-voronoi|LOLA-Voronoi]]. It is the only sample selector with fully integrated and optimized support for these features.

When using the [[Config:SequentialDesign#error|error-based sample selector]] separately, it is always a good idea to combine it with the [[Config:SequentialDesign#voronoi|voronoi]], to combat stability/robustness issues the error-based sample selector often causes. It is a good idea to select about 60% of the samples with error, and 40% with the voronoi. This will ensure that at least the entire design space is covered to a certain degree. This additional sample selector is NOT necessary when using LOLA-Voronoi. To combine sample selectors, create a CombinedSampleSelector. See the [[Config:SequentialDesign#default|default sample selector]] for an example.
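
As a small arithmetic illustration of the 60%/40% guideline, the snippet below splits a batch of new samples between two selectors according to their weights. This is plain Matlab and only a sketch: the batch size is made up, and how the toolbox itself distributes samples over combined selectors is not described here.

<source lang="matlab">
batchSize = 10;                 % hypothetical number of new samples per iteration
weights = [0.6, 0.4];           % [error-based, voronoi]

counts = round(weights * batchSize);
% Give any rounding remainder to the first selector so the total is preserved.
counts(1) = counts(1) + (batchSize - sum(counts));

fprintf('error-based: %d samples, voronoi: %d samples\n', counts(1), counts(2));
</source>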

== Model Builders ==

The question that always gets asked is ''Which model type should I use for my data?'' Unfortunately there is no straightforward answer, since it all depends on your problem: how many dimensions, how many points, whether your function is rugged, smooth, or both, whether there is noise, etc. Based on this knowledge it is possible to say which model types are more likely to do well, but it remains a heuristic. It is best to try a few and see what happens, or to use the ''heterogenetic'' model builder to try multiple model types in parallel and automatically try to determine the best type.

However, since this question keeps coming up, some very rough intuition is the following:

# The models SVM, RBF, DACE, Kriging, RBFNN, GaussianProcess all belong to the same family, thus their general performance with respect to the data distribution will also be similar
# SVM and LS-SVM perform pretty much the same, though LS-SVM is faster
# The SVM models are usually the best to use for a high number of dimensions. They become slower to use if the number of datapoints increases though (> 1000).
# The SVM models also tend to converge quite quickly. You will quickly get a smooth fit, but for high accuracy you often need a lot of datapoints.
# If your function is uniformly smooth pretty much any model type will do well with a nice spread out data distribution
# If your function is uniformly rugged ('bumpy') the SVM/RBF/Kriging/... type models will tend to do quite well
# If your function is smooth but with some sharp non-linearities, the SVM/RBF/Kriging/... family tends to need quite a lot of samples to get the error low enough. In this case the ANN models perform much better.
# The rational models can behave very erratically and are not recommended for difficult bumpy problems or if the dimension exceeds 3.
# The ANN models generally perform very well across all problems but are very slow to use. Also, if the function is uniformly rugged, the Kriging/RBF/... models will give a better fit with far fewer points (e.g. the Ackley function).
# The FANN and NANN models are much faster than the ANN models, but usually the accuracy of the ANN models is much better

Finally, a related question is which model builder variant to use (e.g., svmsim, svmga, svmps, svnoptim, etc.). The best optimization algorithm to use will usually depend on how many model parameters you have. For example, since SVM models only have 2 or 3 parameters, most algorithms do well and you won't see that much difference. On the other hand, if you are fitting a 5D Kriging model (thus you have at least 5 model parameters to optimize), you will most likely see better performance using the GA or PSO versions than, for example, the pattern search or gradient descent versions.

However, our general experience is that it does not make that much of a difference (outside the obvious extremes like gradient descent vs GA). Only if data is really expensive and you want to be sure of the best model with least samples should you really start worrying about this.

'''Note that this is just some very rough intuition gained from our experience with different datasets; your mileage may vary! If you have any suggestions, [[Contact|let us know]].'''

General guidelines

2014-02-27T15:30:04Z

Javdrher: /* Sample Selectors */

The <code>[[Config:ToolboxConfiguration|default.xml]]</code> file can be used as a starting point for default behavior for the SUMO Toolbox. If you are a new user, you should initially leave most options at their default values. The default settings were chosen since they produce good results on average.

However, usually the optimal choice of components depends on the problem itself, so that the default settings aren't necessarily the best. This page will give the user general guidelines to decide which component to use for each situation they may encounter. The user is of course free to ignore these rules and experiment with other settings.

Note that this list is very brief and incomplete. Feel free to [[Contact]] us if you have any further questions.

== Measures ==

The default [[Measures| Measure]] is [[Measures#CrossValidation| CrossValidation]]. Even though this is a very good, accurate, overall measure, there are some considerations to make in the following cases:

* '''Expensive modelers (ann):''' If it is relatively expensive to train a model (for example, with neural networks), CrossValidation is also very slow, because it has to train a model for each fold (which is 5 by default). If modeling takes too long, you might want to use a faster alternative, such as [[Measures#ValidationSet|ValidationSet]] or a combination of [[Measures#SampleError|SampleError]] and [[Measures#LRMMeasure|LRMMeasure]].
* '''ErrorSampleSelector:''' CrossValidation might give a biased result when combined with the [[SampleSelector#ErrorSampleSelector|ErrorSampleSelector]]. This is because the ErrorSampleSelector tends to cluster samples around one point, which will result in very accurate surrogate models for all the points in this cluster (and thus good results with CrossValidation). So when using CrossValidation and ErrorSampleSelector together, keep in mind that the real accuracy might be slightly lower than the estimated one.
* '''Rational modeler:''' When using the Rational modeler, you might want to manually add a [[Measures#MinMax| MinMax]] measure (if you have a rough estimate of the minimum and maximum values for your outputs) and use it together with CrossValidation. By adding the MinMax measure, you eliminate models which have poles in the design space, because these poles always break the minimum and maximum bounds. This usually results in better models and quicker convergence.

Selecting a good Measure '''is a very important''' part of the modeling process! It is CRUCIAL that you think carefully about this. Make sure you also read [[Multi-Objective Modeling]].

== Sequential Design ==

The default [[Config:SequentialDesign|Sequential Design]] is the [[Config:SequentialDesign#lola-voronoi|LOLA-Voronoi sample selector]] combined with the [[Config:SequentialDesign#error|error-based sample selector]], with a weight of 0.7 for LOLA and 0.3 for error. This is a very robust sample selector, capable of dealing with most situations. There are, however, some cases in which it is advisable to choose a different one:

* '''Large-scale problems (1000+ samples):''' LOLA-Voronoi's time complexity is O(n²) in the number of samples n, so for large-scale experiments in which many samples are taken, LOLA-Voronoi becomes quite slow. Depending on the time it takes to perform one simulation, this may or may not be a problem. If it takes a long time to perform one simulation, the cost of selecting new samples with LOLA-Voronoi might still be negligible. If, however, you need a quicker sample selector, it is advised to use [[Config:SequentialDesign#voronoi|voronoi]] or [[Config:SequentialDesign#error|error]] instead.
* '''Rational modeler:''' Benchmarks have shown that the gain of LOLA-Voronoi over the [[Config:SequentialDesign#error|error-based sample selector]] when using global approximation methods (mainly rational/polynomial) is pretty much zero. It is therefore advisable to use the (much faster) [[Config:SampleSelector#error|error-based sample selector]] when using the Rational modeler. This can be done by changing the weights in default.xml to 1.0 for error and 0.0 for LOLA.
* If you need to sample multiple outputs at once, with one sample selector, or you need an auto-sampled input (for example: a frequency input), you should use [[Config:SequentialDesign#lola-voronoi|LOLA-Voronoi]]. It is the only sample selector with fully integrated and optimized support for these features.

When using the [[Config:SequentialDesign#error|error-based sample selector]] separately, it is always a good idea to combine it with the [[Config:SequentialDesign#voronoi|voronoi]], to combat stability/robustness issues the error-based sample selector often causes. It is a good idea to select about 60% of the samples with error, and 40% with the voronoi. This will ensure that at least the entire design space is covered to a certain degree. This additional sample selector is NOT necessary when using LOLA-Voronoi. To combine sample selectors, create a CombinedSampleSelector. See the [[Config:SequentialDesign#default|default sample selector]] for an example.

== Adaptive Model Builders ==

The question that always gets asked is ''Which model type should I use for my data?'' Unfortunately there is no straightforward answer, since it all depends on your problem: how many dimensions, how many points, whether your function is rugged, smooth, or both, whether there is noise, etc. Based on this knowledge it is possible to say which model types are more likely to do well, but it remains a heuristic. It is best to try a few and see what happens, or to use the ''heterogenetic'' model builder to try multiple model types in parallel and automatically try to determine the best type.

However, since this question keeps coming up, some very rough intuition is the following:

# The models SVM, RBF, DACE, Kriging, RBFNN, GaussianProcess all belong to the same family, thus their general performance with respect to the data distribution will also be similar
# SVM and LS-SVM perform pretty much the same, though LS-SVM is faster
# The SVM models are usually the best to use for a high number of dimensions. They become slower to use if the number of datapoints increases though (> 1000).
# The SVM models also tend to converge quite quickly. You will quickly get a smooth fit, but for high accuracy you often need a lot of datapoints.
# If your function is uniformly smooth pretty much any model type will do well with a nice spread out data distribution
# If your function is uniformly rugged ('bumpy') the SVM/RBF/Kriging/... type models will tend to do quite well
# If your function is smooth but with some sharp non-linearities, the SVM/RBF/Kriging/... family tends to need quite a lot of samples to get the error low enough. In this case the ANN models perform much better.
# The rational models can behave very erratically and are not recommended for difficult bumpy problems or if the dimension exceeds 3.
# The ANN models generally perform very well across all problems but are very slow to use. Also, if the function is uniformly rugged, the Kriging/RBF/... models will give a better fit with far fewer points (e.g. the Ackley function).
# The FANN and NANN models are much faster than the ANN models, but usually the accuracy of the ANN models is much better

Finally, a related question is which model builder variant to use (e.g., svmsim, svmga, svmps, svnoptim, etc.). The best optimization algorithm to use will usually depend on how many model parameters you have. For example, since SVM models only have 2 or 3 parameters, most algorithms do well and you won't see that much difference. On the other hand, if you are fitting a 5D Kriging model (thus you have at least 5 model parameters to optimize), you will most likely see better performance using the GA or PSO versions than, for example, the pattern search or gradient descent versions.

However, our general experience is that it does not make that much of a difference (outside the obvious extremes like gradient descent vs GA). Only if data is really expensive and you want to be sure of the best model with least samples should you really start worrying about this.

'''Note that this is just some very rough intuition gained from our experience with different datasets; your mileage may vary! If you have any suggestions, [[Contact|let us know]].'''

General guidelines

2014-02-27T15:28:40Z

Javdrher: /* Sample Selectors */

The <code>[[Config:ToolboxConfiguration|default.xml]]</code> file can be used as a starting point for default behavior for the SUMO Toolbox. If you are a new user, you should initially leave most options at their default values. The default settings were chosen since they produce good results on average.

However, usually the optimal choice of components depends on the problem itself, so that the default settings aren't necessarily the best. This page will give the user general guidelines to decide which component to use for each situation they may encounter. The user is of course free to ignore these rules and experiment with other settings.

Note that this list is very brief and incomplete. Feel free to [[Contact]] us if you have any further questions.

== Measures ==

The default [[Measures| Measure]] is [[Measures#CrossValidation| CrossValidation]]. Even though this is a very good, accurate, overall measure, there are some considerations to make in the following cases:

* '''Expensive modelers (ann):''' If it is relatively expensive to train a model (for example, with neural networks), CrossValidation is also very slow, because it has to train a model for each fold (which is 5 by default). If modeling takes too long, you might want to use a faster alternative, such as [[Measures#ValidationSet|ValidationSet]] or a combination of [[Measures#SampleError|SampleError]] and [[Measures#LRMMeasure|LRMMeasure]].
* '''ErrorSampleSelector:''' CrossValidation might give a biased result when combined with the [[SampleSelector#ErrorSampleSelector|ErrorSampleSelector]]. This is because the ErrorSampleSelector tends to cluster samples around one point, which will result in very accurate surrogate models for all the points in this cluster (and thus good results with CrossValidation). So when using CrossValidation and ErrorSampleSelector together, keep in mind that the real accuracy might be slightly lower than the estimated one.
* '''Rational modeler:''' When using the Rational modeler, you might want to manually add a [[Measures#MinMax| MinMax]] measure (if you have a rough estimate of the minimum and maximum values for your outputs) and use it together with CrossValidation. By adding the MinMax measure, you eliminate models which have poles in the design space, because these poles always break the minimum and maximum bounds. This usually results in better models and quicker convergence.

Selecting a good Measure '''is a very important''' part of the modeling process! It is CRUCIAL that you think carefully about this. Make sure you also read [[Multi-Objective Modeling]].

== Sample Selectors ==

The default [[Config:SequentialDesign|Sequential Design]] is the [[Config:SequentialDesign#lola-voronoi|LOLA-Voronoi sample selector]] combined with the [[Config:SequentialDesign#error|error-based sample selector]], with a weight of 0.7 for LOLA and 0.3 for error. This is a very robust sample selector, capable of dealing with most situations. There are, however, some cases in which it is advisable to choose a different one:

* '''Large-scale problems (1000+ samples):''' LOLA-Voronoi's time complexity is O(n²) in the number of samples n, so for large-scale experiments in which many samples are taken, LOLA-Voronoi becomes quite slow. Depending on the time it takes to perform one simulation, this may or may not be a problem. If it takes a long time to perform one simulation, the cost of selecting new samples with LOLA-Voronoi might still be negligible. If, however, you need a quicker sample selector, it is advised to use [[Config:SampleSelector#voronoi|voronoi]] or [[Config:SampleSelector#error|error]] instead.
* '''Rational modeler:''' Benchmarks have shown that the gain of LOLA-Voronoi over the [[Config:SampleSelector#error|error-based sample selector]] when using global approximation methods (mainly rational/polynomial) is pretty much zero. It is therefore advisable to use the (much faster) [[Config:SampleSelector#error|error-based sample selector]] when using the Rational modeler. This can be done by changing the weights in default.xml to 1.0 for error and 0.0 for LOLA.
* If you need to sample multiple outputs at once, with one sample selector, or you need an auto-sampled input (for example: a frequency input), you should use [[Config:SampleSelector#lola-voronoi|LOLA-Voronoi]]. It is the only sample selector with fully integrated and optimized support for these features.

When using the [[Config:SampleSelector#error|error-based sample selector]] separately, it is always a good idea to combine it with the [[Config:SampleSelector#voronoi|voronoi]], to combat stability/robustness issues the error-based sample selector often causes. It is a good idea to select about 60% of the samples with error, and 40% with the voronoi. This will ensure that at least the entire design space is covered to a certain degree. This additional sample selector is NOT necessary when using LOLA-Voronoi. To combine sample selectors, create a CombinedSampleSelector. See the [[Config:SampleSelector#default|default sample selector]] for an example.

== Adaptive Model Builders ==

The question that always gets asked is ''Which model type should I use for my data?'' Unfortunately there is no straightforward answer, since it all depends on your problem: how many dimensions, how many points, whether your function is rugged, smooth, or both, whether there is noise, etc. Based on this knowledge it is possible to say which model types are more likely to do well, but it remains a heuristic. It is best to try a few and see what happens, or to use the ''heterogenetic'' model builder to try multiple model types in parallel and automatically try to determine the best type.

However, since this question keeps coming up, some very rough intuition is the following:

# The models SVM, RBF, DACE, Kriging, RBFNN, GaussianProcess all belong to the same family, thus their general performance with respect to the data distribution will also be similar
# SVM and LS-SVM perform pretty much the same, though LS-SVM is faster
# The SVM models are usually the best to use for a high number of dimensions. They become slower to use if the number of datapoints increases though (> 1000).
# The SVM models also tend to converge quite quickly. You will quickly get a smooth fit, but for high accuracy you often need a lot of datapoints.
# If your function is uniformly smooth pretty much any model type will do well with a nice spread out data distribution
# If your function is uniformly rugged ('bumpy') the SVM/RBF/Kriging/... type models will tend to do quite well
# If your function is smooth but with some sharp non-linearities, the SVM/RBF/Kriging/... family tends to need quite a lot of samples to get the error low enough. In this case the ANN models perform much better.
# The rational models can behave very erratically and are not recommended for difficult bumpy problems or if the dimension exceeds 3.
# The ANN models generally perform very well across all problems but are very slow to use. Also, if the function is uniformly rugged, the Kriging/RBF/... models will give a better fit with far fewer points (e.g., the Ackley function).
# The FANN and NANN models are much faster than the ANN models, but usually the accuracy of the ANN models is much better

Finally, a related question is: which model builder variant should I use (e.g., svmsim, svmga, svmps, svmoptim, etc.)? The best optimization algorithm to use will usually depend on how many model parameters you have. For example, since SVM models only have 2 or 3 parameters, most algorithms do well and you won't see that much difference. On the other hand, if you are fitting a 5D Kriging model (thus you have at least 5 model parameters to optimize) you will most likely see better performance using the GA or PSO versions than with, for example, the pattern search or gradient descent versions.

However, our general experience is that it does not make that much of a difference (outside the obvious extremes like gradient descent vs GA). Only if data is really expensive and you want to be sure of the best model with least samples should you really start worrying about this.

'''Note this is just some very rough intuition gained from our experience with different datasets, your mileage may vary! If you have any suggestions [[Contact|let us know]]'''

Config:ToolboxConfiguration

2014-02-27T15:26:22Z

Javdrher: /* Changing configuration components of an experimental run */

== Toolbox configuration file ==
This is the default SUMO Toolbox configuration; it is what gets used when you run 'go' without any arguments. You can edit this file directly or make a copy and run that. See the wiki for detailed information.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:ToolboxConfiguration|ToolboxConfiguration]] version="7.0">
<[[Config:Plan|Plan]]/>
<[[Config:ContextConfig|ContextConfig]]/>
<[[Config:Logging|Logging]]/>
<[[Config:LevelPlot|LevelPlot]]/>
<[[Config:SUMO|SUMO]]/>
<[[Config:DataSource|DataSource]]/>
<[[Config:SequentialDesign|SequentialDesign]]/>
<[[Config:ModelBuilder|ModelBuilder]]/>
<[[Config:BasisFunction|BasisFunction]]/>
<[[Config:InitialDesign|InitialDesign]]/>
<[[Config:Optimizer|Optimizer]]/>
</[[Config:ToolboxConfiguration|ToolboxConfiguration]]>
</source>

== Interpreting the configuration file ==

When first looking at default.xml you are presented with a lot of information, and it can be a bit difficult to understand what is going on, especially if you are not familiar with XML. However, there is method in the madness, and this section will help you break down the components that make up the configuration file.

=== Comments ===
Comments in the XML are displayed like so:

<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<!-- This is a comment -->
</source>

The default.xml contains a lot of comments with information about the different sections or example usage.

=== Tags ===
XML groups pieces of information that logically belong together within so-called '''tags'''. Here is an example of a ''recipe'' tag. It is a recipe for pancakes, grouped in a logical way: all the ingredient ''items'' are grouped within the ''ingredients'' tag. The ''items'' themselves in turn group information about the ''type'' of ingredient and the required ''amount''. Tags can also have attributes; here the ''recipe'' tag has an attribute called ''category'' with the value ''dessert''.

<source lang="xml">
<recipe category="dessert">
<title>Pancakes</title>
<author>sumo@intec.ugent.be</author>
<date>Wed, 14 Jun 95</date>
<description>
Good old fashioned pancakes.
</description>
<ingredients>
<item>
<amount>3</amount>
<type>eggs</type>
</item>

<item>
<amount>0.5 tablespoon</amount>
<type>salt</type>
</item>
...
</ingredients>
<preparation>
...
</preparation>
</recipe>
</source>

The configuration file uses XML to group information about the Toolbox into logical units. Here is an example configuration of a [[Config:SampleSelector#delaunay|SampleSelector]] called delaunay. It has three attributes: ''id'', ''type'' and ''combineOutputs''. The SUMO Toolbox uses the ''id'' to refer to this configuration section in other places in the configuration file. The ''type'' tells the toolbox what class of sample selector it has to look for in the <code>src</code> folder, in this case the class ''PipeLineSampleSelector'', which you can find under <code>src/matlab/sampleselector/@PipeLineSampleSelector</code>. The ''combineOutputs'' attribute tells the SUMO Toolbox how the SampleSelector has to deal with multiple outputs.

If you look at the implementation of the PipeLineSampleSelector you will see that it requires a ''CandidateGenerator'', ''CandidateRanker'' and a ''MergeCriterion'' all of which are specified here within the ''SampleSelector'' tag.

Other components (such as [[Config:ModelBuilder|ModelBuilder]], [[Config:InitialDesign|initial design]], etc...) require different information. To find what options and configurations you need/can give to a component check out their wiki page or their implementation in the Toolbox.

<source lang=xml>

<SequentialDesign id="delaunay" type="PipelineSampleSelector" combineOutputs="false">

<CandidateGenerator type="DelaunayCandidateGenerator"/>

<CandidateRanker type="modelDifference">
<Option key="criterion_parameter" value="2"/>
</CandidateRanker>
<CandidateRanker type="delaunayVolume"/>

<MergeCriterion type="WeightedAverage" weights="[1 1]"/>

</SequentialDesign>
</source>

=== Changing configuration components of an experimental run ===
The SUMO Toolbox was written with flexibility in mind, making it easy to experiment with different combinations of algorithms. Here is a snippet from a configuration XML. The snippet shows the <Plan> tag, which determines how the experimental runs are configured, and two different pre-defined <SampleSelector> tags, ''random'' and ''delaunay''.

<source xmlns:saxon="http://icl.com/saxon" lang="xml">

<?xml version="1.0" encoding="ISO-8859-1" ?>
<ToolboxConfiguration version="7.0">

<Plan>
<ContextConfig>default</ContextConfig>
<SUMO>default</SUMO>
<LevelPlot>default</LevelPlot>
<Simulator>Math/Academic/Academic2DTwice.xml</Simulator>
<Run name="" repeat="1">
<InitialDesign>lhdWithCornerPoints</InitialDesign>
<SequentialDesign>random</SequentialDesign>
<SampleEvaluator>matlab</SampleEvaluator>
<ModelBuilder>kriging</ModelBuilder>
<Measure type="CrossValidation" target="0.01" errorFcn="rootRelativeSquareError" use="on" />
<Outputs>
<Output name="out">
</Output>
<Output name="outinverse">
</Output>
</Outputs>
</Run>
</Plan>

...

<SampleSelector id="random" type="RandomSampleSelector" combineOutputs="false"/>

<SampleSelector id="delaunay" type="PipelineSampleSelector" combineOutputs="false">

<CandidateGenerator type="DelaunayCandidateGenerator"/>

<CandidateRanker type="modelDifference">
<Option key="criterion_parameter" value="2"/>
</CandidateRanker>
<CandidateRanker type="delaunayVolume"/>

<MergeCriterion type="WeightedAverage" weights="[1 1]"/>

</SampleSelector>

</ToolboxConfiguration>

</source>

All the tags within the <Plan> tag, except for Measure, Inputs and Outputs, refer to a configuration section defined further in the configuration xml. These tags determine what algorithm will be used for the experimental run. For example in this case the tag:

<source lang="xml">
<SequentialDesign>random</SequentialDesign>
</source>

refers to the SampleSelector tag with an ''id=random''. So even though both the ''random'' SampleSelector and ''delaunay'' SampleSelector are defined, it is the random SampleSelector which will be used in the experimental run. To use the ''delaunay'' SampleSelector simply replace ''random'' by ''delaunay'' in the <Plan>.
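
As a sketch, assuming a configuration section with the ''id'' delaunay is defined as in the snippet above, the corresponding line in the run would change as follows (the other tags of the run are elided and stay as they are):

<source lang="xml">
<Run name="" repeat="1">
  ...
  <!-- was: <SequentialDesign>random</SequentialDesign> -->
  <SequentialDesign>delaunay</SequentialDesign>
  ...
</Run>
</source>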

The default.xml already has a number of configuration sections defined, making it easier for you to experiment with them. All you have to do is use the appropriate ''id'' in the right tags.

=== Modifying/Creating your own configuration section ===

In some cases the default configuration sections might not suit your needs, for example if you require a lhdWithCornerPoints [[Config:InitialDesign|initial design]] with 40 instead of the default 20 points. In this case you can either create a new configuration section like so:

<source lang="xml">
<InitialDesign id="lhdWithCornerPoints40" type="CombinedDesign">
<InitialDesign type="LatinHypercubeDesign">

<Option key="points" value="40"/>
</InitialDesign>

<InitialDesign type="FactorialDesign">
<Option key="levels" value="2" />
</InitialDesign>
</InitialDesign>
</source>

and refer to it in the plan using the ''id'' lhdWithCornerPoints40 or you can simply edit the default lhdWithCornerPoints :). When making your own configuration section, make sure you specify all necessary tags and options and you're set to go!
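
As a minimal sketch of the first option, the relevant part of the plan might then look as follows. Only the ''id'' lhdWithCornerPoints40 comes from the section above; the remaining tags of the run are elided and stay unchanged.

<source lang="xml">
<Run name="" repeat="1">
  <!-- refers to the new section by its id -->
  <InitialDesign>lhdWithCornerPoints40</InitialDesign>
  ...
</Run>
</source>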

Config:ToolboxConfiguration

2014-02-27T15:26:01Z

Javdrher: /* Changing configuration components of an experimental run */

== Toolbox configuration file ==
This is the default SUMO Toolbox configuration; it is what gets used when you run 'go' without any arguments. You can edit this file directly or make a copy and run that. See the wiki for detailed information.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:ToolboxConfiguration|ToolboxConfiguration]] version="7.0">
<[[Config:Plan|Plan]]/>
<[[Config:ContextConfig|ContextConfig]]/>
<[[Config:Logging|Logging]]/>
<[[Config:LevelPlot|LevelPlot]]/>
<[[Config:SUMO|SUMO]]/>
<[[Config:DataSource|DataSource]]/>
<[[Config:SequentialDesign|SequentialDesign]]/>
<[[Config:ModelBuilder|ModelBuilder]]/>
<[[Config:BasisFunction|BasisFunction]]/>
<[[Config:InitialDesign|InitialDesign]]/>
<[[Config:Optimizer|Optimizer]]/>
</[[Config:ToolboxConfiguration|ToolboxConfiguration]]>
</source>

== Interpreting the configuration file ==

When first looking at default.xml you are presented with a lot of information, and it can be a bit difficult to understand what is going on, especially if you are not familiar with XML. However, there is method in the madness, and this section will help you break down the components that make up the configuration file.

=== Comments ===
Comments in the XML are displayed like so:

<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<!-- This is a comment -->
</source>

The default.xml contains a lot of comments with information about the different sections or example usage.

=== Tags ===
XML groups pieces of information that logically belong together within so-called '''tags'''. Here is an example of a ''recipe'' tag. It is a recipe for pancakes, grouped in a logical way: all the ingredient ''items'' are grouped within the ''ingredients'' tag. The ''items'' themselves in turn group information about the ''type'' of ingredient and the required ''amount''. Tags can also have attributes; here the ''recipe'' tag has an attribute called ''category'' with the value ''dessert''.

<source lang="xml">
<recipe category="dessert">
<title>Pancakes</title>
<author>sumo@intec.ugent.be</author>
<date>Wed, 14 Jun 95</date>
<description>
Good old fashioned pancakes.
</description>
<ingredients>
<item>
<amount>3</amount>
<type>eggs</type>
</item>

<item>
<amount>0.5 tablespoon</amount>
<type>salt</type>
</item>
...
</ingredients>
<preparation>
...
</preparation>
</recipe>
</source>

The configuration file uses XML to group information about the Toolbox into logical units. Here is an example configuration of a [[Config:SampleSelector#delaunay|SampleSelector]] called delaunay. It has three attributes: ''id'', ''type'' and ''combineOutputs''. The SUMO Toolbox uses the ''id'' to refer to this configuration section in other places in the configuration file. The ''type'' tells the toolbox what class of sample selector it has to look for in the <code>src</code> folder, in this case the class ''PipeLineSampleSelector'', which you can find under <code>src/matlab/sampleselector/@PipeLineSampleSelector</code>. The ''combineOutputs'' attribute tells the SUMO Toolbox how the SampleSelector has to deal with multiple outputs.

If you look at the implementation of the PipeLineSampleSelector you will see that it requires a ''CandidateGenerator'', ''CandidateRanker'' and a ''MergeCriterion'' all of which are specified here within the ''SampleSelector'' tag.

Other components (such as [[Config:ModelBuilder|ModelBuilder]], [[Config:InitialDesign|initial design]], etc...) require different information. To find what options and configurations you need/can give to a component check out their wiki page or their implementation in the Toolbox.

<source lang=xml>

<SequentialDesign id="delaunay" type="PipelineSampleSelector" combineOutputs="false">

<CandidateGenerator type="DelaunayCandidateGenerator"/>

<CandidateRanker type="modelDifference">
<Option key="criterion_parameter" value="2"/>
</CandidateRanker>
<CandidateRanker type="delaunayVolume"/>

<MergeCriterion type="WeightedAverage" weights="[1 1]"/>

</SequentialDesign>
</source>

=== Changing configuration components of an experimental run ===
The SUMO Toolbox was written with flexibility in mind, making it easy to experiment with different combinations of algorithms. Here is a snippet from a configuration XML. The snippet shows the <Plan> tag, which determines how the experimental runs are configured, and two different pre-defined <SampleSelector> tags, ''random'' and ''delaunay''.

<source xmlns:saxon="http://icl.com/saxon" lang="xml">

<?xml version="1.0" encoding="ISO-8859-1" ?>
<ToolboxConfiguration version="7.0">

<Plan>
<ContextConfig>default</ContextConfig>
<SUMO>default</SUMO>
<LevelPlot>default</LevelPlot>
<Simulator>Math/Academic/Academic2DTwice.xml</Simulator>
<Run name="" repeat="1">
<InitialDesign>lhdWithCornerPoints</InitialDesign>
<SampleSelector>random</SampleSelector>
<SampleEvaluator>matlab</SampleEvaluator>
<ModelBuilder>kriging</ModelBuilder>
<Measure type="CrossValidation" target="0.01" errorFcn="rootRelativeSquareError" use="on" />
<Outputs>
<Output name="out">
</Output>
<Output name="outinverse">
</Output>
</Outputs>
</Run>
</Plan>

...

<SampleSelector id="random" type="RandomSampleSelector" combineOutputs="false"/>

<SampleSelector id="delaunay" type="PipelineSampleSelector" combineOutputs="false">

<CandidateGenerator type="DelaunayCandidateGenerator"/>

<CandidateRanker type="modelDifference">
<Option key="criterion_parameter" value="2"/>
</CandidateRanker>
<CandidateRanker type="delaunayVolume"/>

<MergeCriterion type="WeightedAverage" weights="[1 1]"/>

</SampleSelector>

</ToolboxConfiguration>

</source>

All the tags within the <Plan> tag, except for Measure, Inputs and Outputs, refer to a configuration section defined further in the configuration xml. These tags determine what algorithm will be used for the experimental run. For example in this case the tag:

<source lang="xml">
<SampleSelector>random</SampleSelector>
</source>

refers to the SampleSelector tag with an ''id=random''. So even though both the ''random'' SampleSelector and ''delaunay'' SampleSelector are defined, it is the random SampleSelector which will be used in the experimental run. To use the ''delaunay'' SampleSelector simply replace ''random'' by ''delaunay'' in the <Plan>.

The default.xml already has a number of configuration sections defined, making it easier for you to experiment with them. All you have to do is use the appropriate ''id'' in the right tags.

=== Modifying/Creating your own configuration section ===

In some cases the default configuration sections might not suit your needs, for example if you require a lhdWithCornerPoints [[Config:InitialDesign|initial design]] with 40 instead of the default 20 points. In this case you can either create a new configuration section like so:

<source lang="xml">
<InitialDesign id="lhdWithCornerPoints40" type="CombinedDesign">
<InitialDesign type="LatinHypercubeDesign">

<Option key="points" value="40"/>
</InitialDesign>

<InitialDesign type="FactorialDesign">
<Option key="levels" value="2" />
</InitialDesign>
</InitialDesign>
</source>

and refer to it in the plan using the ''id'' lhdWithCornerPoints40 or you can simply edit the default lhdWithCornerPoints :). When making your own configuration section, make sure you specify all necessary tags and options and you're set to go!

Config:ToolboxConfiguration

2014-02-27T15:25:26Z

Javdrher: /* Tags */

== Toolbox configuration file ==
This is the default SUMO Toolbox configuration; it is what gets used when you run 'go' without any arguments. You can edit this file directly or make a copy and run that. See the wiki for detailed information.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:ToolboxConfiguration|ToolboxConfiguration]] version="7.0">
<[[Config:Plan|Plan]]/>
<[[Config:ContextConfig|ContextConfig]]/>
<[[Config:Logging|Logging]]/>
<[[Config:LevelPlot|LevelPlot]]/>
<[[Config:SUMO|SUMO]]/>
<[[Config:DataSource|DataSource]]/>
<[[Config:SequentialDesign|SequentialDesign]]/>
<[[Config:ModelBuilder|ModelBuilder]]/>
<[[Config:BasisFunction|BasisFunction]]/>
<[[Config:InitialDesign|InitialDesign]]/>
<[[Config:Optimizer|Optimizer]]/>
</[[Config:ToolboxConfiguration|ToolboxConfiguration]]>
</source>

== Interpreting the configuration file ==

When first looking at default.xml you are presented with a lot of information, and it can be a bit difficult to understand what is going on, especially if you are not familiar with XML. However, there is method in the madness, and this section will help you break down the components that make up the configuration file.

=== Comments ===
Comments in the XML are displayed like so:

<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<!-- This is a comment -->
</source>

The default.xml contains a lot of comments with information about the different sections or example usage.

=== Tags ===
XML groups pieces of information that logically belong together within so-called '''tags'''. Here is an example of a ''recipe'' tag. It is a recipe for pancakes, grouped in a logical way: all the ingredient ''items'' are grouped within the ''ingredients'' tag. The ''items'' themselves in turn group information about the ''type'' of ingredient and the required ''amount''. Tags can also have attributes; here the ''recipe'' tag has an attribute called ''category'' with the value ''dessert''.

<source lang="xml">
<recipe category="dessert">
<title>Pancakes</title>
<author>sumo@intec.ugent.be</author>
<date>Wed, 14 Jun 95</date>
<description>
Good old fashioned pancakes.
</description>
<ingredients>
<item>
<amount>3</amount>
<type>eggs</type>
</item>

<item>
<amount>0.5 tablespoon</amount>
<type>salt</type>
</item>
...
</ingredients>
<preparation>
...
</preparation>
</recipe>
</source>

The configuration file uses XML to group information about the Toolbox into logical units. Here is an example configuration of a [[Config:SampleSelector#delaunay|SampleSelector]] called delaunay. It has three attributes: ''id'', ''type'' and ''combineOutputs''. The SUMO Toolbox uses the ''id'' to refer to this configuration section in other places in the configuration file. The ''type'' tells the toolbox what class of sample selector it has to look for in the <code>src</code> folder, in this case the class ''PipeLineSampleSelector'', which you can find under <code>src/matlab/sampleselector/@PipeLineSampleSelector</code>. The ''combineOutputs'' attribute tells the SUMO Toolbox how the SampleSelector has to deal with multiple outputs.

If you look at the implementation of the PipeLineSampleSelector you will see that it requires a ''CandidateGenerator'', ''CandidateRanker'' and a ''MergeCriterion'' all of which are specified here within the ''SampleSelector'' tag.

Other components (such as [[Config:ModelBuilder|ModelBuilder]], [[Config:InitialDesign|initial design]], etc...) require different information. To find what options and configurations you need/can give to a component check out their wiki page or their implementation in the Toolbox.

<source lang=xml>

<SequentialDesign id="delaunay" type="PipelineSampleSelector" combineOutputs="false">

<CandidateGenerator type="DelaunayCandidateGenerator"/>

<CandidateRanker type="modelDifference">
<Option key="criterion_parameter" value="2"/>
</CandidateRanker>
<CandidateRanker type="delaunayVolume"/>

<MergeCriterion type="WeightedAverage" weights="[1 1]"/>

</SequentialDesign>
</source>

=== Changing configuration components of an experimental run ===
The SUMO Toolbox was written with flexibility in mind, making it easy to experiment with different combinations of algorithms. Here is a snippet from a configuration XML. The snippet shows the <Plan> tag, which determines how the experimental runs are configured, and two different pre-defined <SampleSelector> tags, ''random'' and ''delaunay''.

<source xmlns:saxon="http://icl.com/saxon" lang="xml">

<?xml version="1.0" encoding="ISO-8859-1" ?>
<ToolboxConfiguration version="7.0">

<Plan>
<ContextConfig>default</ContextConfig>
<SUMO>default</SUMO>
<LevelPlot>default</LevelPlot>
<Simulator>Math/Academic/Academic2DTwice.xml</Simulator>
<Run name="" repeat="1">
<InitialDesign>lhdWithCornerPoints</InitialDesign>
<SampleSelector>random</SampleSelector>
<SampleEvaluator>matlab</SampleEvaluator>
<AdaptiveModelBuilder>kriging</AdaptiveModelBuilder>
<Measure type="CrossValidation" target="0.01" errorFcn="rootRelativeSquareError" use="on" />
<Outputs>
<Output name="out">
</Output>
<Output name="outinverse">
</Output>
</Outputs>
</Run>
</Plan>

...

<SampleSelector id="random" type="RandomSampleSelector" combineOutputs="false"/>

<SampleSelector id="delaunay" type="PipelineSampleSelector" combineOutputs="false">

<CandidateGenerator type="DelaunayCandidateGenerator"/>

<CandidateRanker type="modelDifference">
<Option key="criterion_parameter" value="2"/>
</CandidateRanker>
<CandidateRanker type="delaunayVolume"/>

<MergeCriterion type="WeightedAverage" weights="[1 1]"/>

</SampleSelector>

</ToolboxConfiguration>

</source>

All the tags within the <Plan> tag, except for Measure, Inputs and Outputs, refer to a configuration section defined further in the configuration xml. These tags determine what algorithm will be used for the experimental run. For example in this case the tag:

<source lang="xml">
<SampleSelector>random</SampleSelector>
</source>

refers to the SampleSelector tag with an ''id=random''. So even though both the ''random'' SampleSelector and ''delaunay'' SampleSelector are defined, it is the random SampleSelector which will be used in the experimental run. To use the ''delaunay'' SampleSelector simply replace ''random'' by ''delaunay'' in the <Plan>.

The default.xml already has a number of configuration sections defined, making it easier for you to experiment with them. All you have to do is use the appropriate ''id'' in the right tags.

=== Modifying/Creating your own configuration section ===

In some cases the default configuration sections might not suit your needs, for example if you require a lhdWithCornerPoints [[Config:InitialDesign|initial design]] with 40 instead of the default 20 points. In this case you can either create a new configuration section like so:

<source lang="xml">
<InitialDesign id="lhdWithCornerPoints40" type="CombinedDesign">
<InitialDesign type="LatinHypercubeDesign">

<Option key="points" value="40"/>
</InitialDesign>

<InitialDesign type="FactorialDesign">
<Option key="levels" value="2" />
</InitialDesign>
</InitialDesign>
</source>

and refer to it in the plan using the ''id'' lhdWithCornerPoints40 or you can simply edit the default lhdWithCornerPoints :). When making your own configuration section, make sure you specify all necessary tags and options and you're set to go!

Config:ToolboxConfiguration

2014-02-27T15:24:46Z

Javdrher: /* Toolbox configuration file */

== Toolbox configuration file ==
This is the default SUMO Toolbox configuration; it is what gets used when you run 'go' without any arguments. You can edit this file directly or make a copy and run that. See the wiki for detailed information.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:ToolboxConfiguration|ToolboxConfiguration]] version="7.0">
<[[Config:Plan|Plan]]/>
<[[Config:ContextConfig|ContextConfig]]/>
<[[Config:Logging|Logging]]/>
<[[Config:LevelPlot|LevelPlot]]/>
<[[Config:SUMO|SUMO]]/>
<[[Config:DataSource|DataSource]]/>
<[[Config:SequentialDesign|SequentialDesign]]/>
<[[Config:ModelBuilder|ModelBuilder]]/>
<[[Config:BasisFunction|BasisFunction]]/>
<[[Config:InitialDesign|InitialDesign]]/>
<[[Config:Optimizer|Optimizer]]/>
</[[Config:ToolboxConfiguration|ToolboxConfiguration]]>
</source>

== Interpreting the configuration file ==

When first looking at default.xml you are presented with a lot of information, and it can be a bit difficult to understand what is going on, especially if you are not familiar with XML. However, there is method in the madness, and this section will help you break down the components that make up the configuration file.

=== Comments ===
Comments in the XML are displayed like so:

<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<!-- This is a comment -->
</source>

The default.xml contains a lot of comments with information about the different sections or example usage.

=== Tags ===
XML groups pieces of information that logically belong together within so-called '''tags'''. Here is an example of a ''recipe'' tag. It is a recipe for pancakes, grouped in a logical way: all the ingredient ''items'' are grouped within the ''ingredients'' tag. The ''items'' themselves in turn group information about the ''type'' of ingredient and the required ''amount''. Tags can also have attributes; here the ''recipe'' tag has an attribute called ''category'' with the value ''dessert''.

<source lang="xml">
<recipe category="dessert">
<title>Pancakes</title>
<author>sumo@intec.ugent.be</author>
<date>Wed, 14 Jun 95</date>
<description>
Good old fashioned pancakes.
</description>
<ingredients>
<item>
<amount>3</amount>
<type>eggs</type>
</item>

<item>
<amount>0.5 tablespoon</amount>
<type>salt</type>
</item>
...
</ingredients>
<preparation>
...
</preparation>
</recipe>
</source>

The configuration file uses XML to group information about the Toolbox into logical units. Here is an example configuration of a [[Config:SampleSelector#delaunay|SampleSelector]] called delaunay. It has three attributes: ''id'', ''type'' and ''combineOutputs''. The SUMO Toolbox uses the ''id'' to refer to this configuration section in other places in the configuration file. The ''type'' tells the toolbox what class of sample selector it has to look for in the <code>src</code> folder, in this case the class ''PipeLineSampleSelector'', which you can find under <code>src/matlab/sampleselector/@PipeLineSampleSelector</code>. The ''combineOutputs'' attribute tells the SUMO Toolbox how the SampleSelector has to deal with multiple outputs.

If you look at the implementation of the PipeLineSampleSelector you will see that it requires a ''CandidateGenerator'', ''CandidateRanker'' and a ''MergeCriterion'' all of which are specified here within the ''SampleSelector'' tag.

Other components (such as [[Config:AdaptiveModelBuilder|AdaptiveModelBuilder]], [[Config:InitialDesign|initial design]], etc...) require different information. To find what options and configurations you need/can give to a component check out their wiki page or their implementation in the Toolbox.

<source lang=xml>

<SampleSelector id="delaunay" type="PipelineSampleSelector" combineOutputs="false">

<CandidateGenerator type="DelaunayCandidateGenerator"/>

<CandidateRanker type="modelDifference">
<Option key="criterion_parameter" value="2"/>
</CandidateRanker>
<CandidateRanker type="delaunayVolume"/>

<MergeCriterion type="WeightedAverage" weights="[1 1]"/>

</SampleSelector>
</source>

=== Changing configuration components of an experimental run ===
The SUMO Toolbox was written with flexibility in mind, making it easy to experiment with different combinations of algorithms. Here is a snippet from a configuration XML. The snippet shows the <Plan> tag, which determines how the experimental runs are configured, and two different pre-defined <SampleSelector> tags, ''random'' and ''delaunay''.

<source xmlns:saxon="http://icl.com/saxon" lang="xml">

<?xml version="1.0" encoding="ISO-8859-1" ?>
<ToolboxConfiguration version="7.0">

<Plan>
<ContextConfig>default</ContextConfig>
<SUMO>default</SUMO>
<LevelPlot>default</LevelPlot>
<Simulator>Math/Academic/Academic2DTwice.xml</Simulator>
<Run name="" repeat="1">
<InitialDesign>lhdWithCornerPoints</InitialDesign>
<SampleSelector>random</SampleSelector>
<SampleEvaluator>matlab</SampleEvaluator>
<AdaptiveModelBuilder>kriging</AdaptiveModelBuilder>
<Measure type="CrossValidation" target="0.01" errorFcn="rootRelativeSquareError" use="on" />
<Outputs>
<Output name="out">
</Output>
<Output name="outinverse">
</Output>
</Outputs>
</Run>
</Plan>

...

<SampleSelector id="random" type="RandomSampleSelector" combineOutputs="false"/>

<SampleSelector id="delaunay" type="PipelineSampleSelector" combineOutputs="false">

<CandidateGenerator type="DelaunayCandidateGenerator"/>

<CandidateRanker type="modelDifference">
<Option key="criterion_parameter" value="2"/>
</CandidateRanker>
<CandidateRanker type="delaunayVolume"/>

<MergeCriterion type="WeightedAverage" weights="[1 1]"/>

</SampleSelector>

</ToolboxConfiguration>

</source>

All the tags within the <Plan> tag, except for Measure, Inputs and Outputs, refer to a configuration section defined further in the configuration xml. These tags determine what algorithm will be used for the experimental run. For example in this case the tag:

<source lang="xml">
<SampleSelector>random</SampleSelector>
</source>

refers to the SampleSelector tag with an ''id=random''. So even though both the ''random'' SampleSelector and ''delaunay'' SampleSelector are defined, it is the random SampleSelector which will be used in the experimental run. To use the ''delaunay'' SampleSelector simply replace ''random'' by ''delaunay'' in the <Plan>.

The default.xml already has a number of configuration sections defined, making it easier for you to experiment with them. All you have to do is use the appropriate ''id'' in the right tags.

=== Modifying/Creating your own configuration section ===

In some cases the default configuration sections might not suit your needs, for example if you require a lhdWithCornerPoints [[Config:InitialDesign|initial design]] with 40 instead of the default 20 points. In this case you can either create a new configuration section like so:

<source lang="xml">
<InitialDesign id="lhdWithCornerPoints40" type="CombinedDesign">
<InitialDesign type="LatinHypercubeDesign">

<Option key="points" value="40"/>
</InitialDesign>

<InitialDesign type="FactorialDesign">
<Option key="levels" value="2" />
</InitialDesign>
</InitialDesign>
</source>

and refer to it in the plan using the ''id'' lhdWithCornerPoints40 or you can simply edit the default lhdWithCornerPoints :). When making your own configuration section, make sure you specify all necessary tags and options and you're set to go!

Toolbox configuration

2014-02-27T15:23:40Z

Javdrher: /* Components */

The toolbox can be configured by means of an [[FAQ#What is XML?|XML]] file.
Examples can be found in the <code>config/</code> and <code>demo/</code> subdirectories of the SUMO installation directory.
The default configuration file is '''<code>config/default.xml</code>'''.

== Structure ==

If you do not know what a tag or XML is please see [[FAQ#What is XML?]] first.

=== Plans and Runs ===

The general structure of the toolbox is as follows:

* The top-level <[[Config:Plan|Plan]]> type defines a surrogate modeling experiment, and an experiment may consist of multiple <[[Config:Plan#Run|Run]]> tags.
* Each <[[Config:Plan#Run|Run]]> tag can be configured separately.

For example, say you want to model some problem from electronics and you have at your disposal 3 algorithms for selecting data points. Now let's assume you want to compare the different algorithms on your problem and see which one gives you the best model with the fewest data samples. In this case your <[[Config:Plan|Plan]]> tag would contain 3 <[[Config:Plan#Run|Run]]> tags and each <[[Config:Plan#Run|Run]]> tag would contain a different <[[SequentialDesign|SequentialDesign]]> tag. For example:

<source lang="xml">
<Plan>
...
<Run name="lola-run" repeat="1">
<SequentialDesign>lola</SequentialDesign>
</Run>
<Run name="density-run" repeat="1">
<SequentialDesign>density</SequentialDesign>
</Run>
<Run name="random-run" repeat="1">
<SequentialDesign>random</SequentialDesign>
</Run>
...
</Plan>
...
</source>

Thus, this concept of a plan and multiple runs allows you to set up different configurations beforehand and try them all in one go.

As you can see it is also possible to specify a '<code>repeat</code>' attribute. Setting it to 5, for example, will ensure that that particular run is repeated 5 times. This is usually a good idea if there is a lot of randomness in the algorithms (as is usually the case).

Remember though to set the '<code>[[Random_state|seedRandomState]]</code>' option in the [[Config:SUMO| <SUMO>]] tag to '<code>random</code>', or otherwise you might get deterministic results:

<source lang="xml">
<Option key="seedRandomState" value="random"/>
</source>
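
Putting the two together, a run that is repeated five times with a random seed might look like the sketch below. Only the <code>repeat</code> attribute and the <code>seedRandomState</code> option are taken from the text above; the run name is illustrative, and any attributes or other options of the SUMO tag are omitted.

<source lang="xml">
<Plan>
...
<!-- this run is executed 5 times -->
<Run name="lola-run" repeat="5">
<SequentialDesign>lola</SequentialDesign>
</Run>
...
</Plan>
...
<SUMO>
...
<!-- avoids getting the same (deterministic) result on every repetition -->
<Option key="seedRandomState" value="random"/>
...
</SUMO>
</source>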

=== Declarations and Definitions ===

Each component in the toolbox has its own configuration section. Inside the <[[Config:Plan#Run|Run]]> tag you ''declare'' what components you would like to use. This declaration refers to the ''definition'' of each component, further down the file. So when you see a line like:

<source lang="xml">
<SequentialDesign>lola</SequentialDesign>
</source>

This means we want to use the '<code>lola</code>' sample selection algorithm. The word '<code>lola</code>' is a unique identifier that refers to the <[[SequentialDesign|SequentialDesign]]> tag that has '<code>[[SampleSelector#GradientSampleSelector|lola]]</code>' as its "id" attribute. In this case your configuration file would have the following structure:

<source lang="xml">
<Plan>
<Run>

<SequentialDesign>lola</SequentialDesign>
...
</Run>
...
</Plan>
...


<SequentialDesign id="lola">
...

</SequentialDesign>
...
</source>

If you would like to use a different algorithm (e.g., '<code>[[SampleSelector#ErrorSampleSelector|error]]</code>' for the Error sample selector), you simply fill in a different id in the <[[SequentialDesign|SequentialDesign]]> tag in the <[[Config:Plan#Run|Run]]> tag:

<source lang="xml">
<SequentialDesign>error</SequentialDesign>
</source>

You just have to make sure there is a matching definition lower down the file for the id you have filled in.

All the other components ([[Config:ModelBuilder|ModelBuilder]], [[Config:DataSource|DataSource]], ...) work in exactly the same way.
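
As a sketch of the same pattern for another component, the fragment below declares a model builder in a run and defines it further down the file. The id <code>myModelBuilder</code> is a hypothetical placeholder and the tag name <code>ModelBuilder</code> is assumed from the component list; attributes and options of the definition are elided, so check default.xml for the ids and settings that actually exist.

<source lang="xml">
<Plan>
<Run>
<!-- declaration: "myModelBuilder" is a hypothetical id -->
<ModelBuilder>myModelBuilder</ModelBuilder>
...
</Run>
...
</Plan>
...
<!-- definition: its id must match the declaration above -->
<ModelBuilder id="myModelBuilder">
...
</ModelBuilder>
...
</source>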

== Running a configuration ==

See the [[Running]] page for how to run the toolbox with a different example or with your own configuration file.

== Components ==

''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up-to-date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''

The following components can be configured separately:

* [[Config:Plan|Plan]]
* [[Config:Plan|Simulator tag]]
* [[Config:Plan|Inputs]]
* [[Outputs]]
* [[Measures]]
* [[Config:ContextConfig|ContextConfig]]
* [[Config:SUMO|SUMO]]
* [[Config:InitialDesign|Initial Designs]]
* [[Config:DataSource|Data Sources]]
* [[Config:SequentialDesign|Sequential designs]]
* [[Config:ModelBuilder|Model Builders]]

== General guidelines ==

Some general guidelines on how to configure the toolbox for different situations can be found on [[General guidelines|this page]].

Config:SampleSelector

2014-02-27T15:23:17Z

Javdrher: Javdrher moved page Config:SampleSelector to Config:SequentialDesign

#REDIRECT [[Config:SequentialDesign]]

Config:SequentialDesign

2014-02-27T15:23:16Z

Javdrher: Javdrher moved page Config:SampleSelector to Config:SequentialDesign

'''Generated for SUMO toolbox version 7.0'''.
''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up-to-date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''
== SampleSelector ==

=== empty ===
Does not select any new samples. This is useful when modeling multiple outputs and you don't want to involve one of these outputs in the sampling process.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#EmptySampleSelector|EmptySampleSelector]]" combineOutputs="false"/>
</source>
=== random ===
Selects new samples randomly in the design space.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#RandomSampleSelector|RandomSampleSelector]]" combineOutputs="false"/>
</source>
=== delaunay ===
This sample selector uses a Delaunay triangulation of the data to select samples in locations far from previous samples, or in locations where the estimated model error is largest. This algorithm uses QHull, which is very slow for high dimensions, so you should only use this sample selector for fewer than 6 dimensions and fewer than 1000 samples.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#PipelineSampleSelector|PipelineSampleSelector]]" combineOutputs="false">

<[[Config:CandidateGenerator|CandidateGenerator]] type="[[CandidateGenerator#DelaunayCandidateGenerator|DelaunayCandidateGenerator]]"/>

<[[Config:CandidateRanker|CandidateRanker]] type="[[CandidateRanker#modelDifference|modelDifference]]">
<Option key="criterion_parameter" value="2"/>
</[[Config:CandidateRanker|CandidateRanker]]>
<[[Config:CandidateRanker|CandidateRanker]] type="[[CandidateRanker#delaunayVolume|delaunayVolume]]"/>

<[[Config:MergeCriterion|MergeCriterion]] type="[[MergeCriterion#WeightedAverage|WeightedAverage]]" weights="[1 1]"/>

</[[Config:SampleSelector|SampleSelector]]>
</source>
=== density ===
A space-filling sampling algorithm which uses an approximation of the Voronoi tessellation of the design space. Will only sample within the "allowed" areas if constraints are specified.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#VoronoiSampleRanker|VoronoiSampleRanker]]" combineOutputs="false"/>
</source>
=== error ===
An adaptive sample selection algorithm (error based), driven by the evaluation of your model on a dense grid, which selects samples in locations where the model error is estimated to be the largest.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#PipelineSampleSelector|PipelineSampleSelector]]" combineOutputs="false">

<[[Config:CandidateGenerator|CandidateGenerator]] type="[[CandidateGenerator#GridCandidateGenerator|GridCandidateGenerator]]"/>

<[[Config:CandidateRanker|CandidateRanker]] type="[[CandidateRanker#modelDifference|modelDifference]]">
<Option key="criterion_parameter" value="4"/>
</[[Config:CandidateRanker|CandidateRanker]]>

<[[Config:MergeCriterion|MergeCriterion]] type="[[MergeCriterion#ClosenessThreshold|ClosenessThreshold]]">


<Option key="closenessThreshold" value="0.05"/>

<Option key="randomPercentage" value="20"/>

<Option key="debug" value="off"/>
</[[Config:MergeCriterion|MergeCriterion]]>
</[[Config:SampleSelector|SampleSelector]]>
</source>
=== lola-voronoi ===
A highly adaptive sampling algorithm which performs a trade-off between exploration (filling up the design space as evenly as possible) and exploitation (selecting data points in highly nonlinear regions). lola-voronoi is the only sample selector which currently supports multiple outputs, auto-sampled inputs and constraints.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#LOLAVoronoiSampleSelector|LOLAVoronoiSampleSelector]]" combineOutputs="false">

<Option key="neighbourhoodSize" value="2"/>

<Option key="frequencies" value="11"/>
</[[Config:SampleSelector|SampleSelector]]>
</source>
=== rationalPoleSupression ===
A sampling algorithm aimed at suppressing poles in rational models by sampling them (only for Rational models).
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#OptimizeCriterion|OptimizeCriterion]]" combineOutputs="false">


<[[Config:Optimizer|Optimizer]]>patternsearch</[[Config:Optimizer|Optimizer]]>

<[[Config:CandidateRanker|CandidateRanker]] type="[[CandidateRanker#rationalPoleSupression|rationalPoleSupression]]" scaling="none"/>
<[[Config:CandidateRanker|CandidateRanker]] type="[[CandidateRanker#modelDifference|modelDifference]]" scaling="none"/>


<Option key="debug" value="off"/>
</[[Config:SampleSelector|SampleSelector]]>
</source>
=== expectedImprovement ===
A sampling algorithm aimed at optimization problems (only for Kriging and RBF)
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#OptimizeCriterion|OptimizeCriterion]]" combineOutputs="false">


<[[Config:Optimizer|Optimizer]]>patternsearch</[[Config:Optimizer|Optimizer]]>

<[[Config:CandidateRanker|CandidateRanker]] type="[[CandidateRanker#expectedImprovement|expectedImprovement]]" scaling="none">
</[[Config:CandidateRanker|CandidateRanker]]>
<[[Config:CandidateRanker|CandidateRanker]] type="[[CandidateRanker#maxvar|maxvar]]" scaling="none"/>


<Option key="debug" value="off"/>
</[[Config:SampleSelector|SampleSelector]]>
</source>
=== extremaLOLA ===
LOLA-Voronoi sample selector supplemented with 1 sample at the minimum and maximum
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#CombinedSampleSelector|CombinedSampleSelector]]" combineOutputs="false">

<[[Config:SampleSelector|SampleSelector]] weight="0.8">lola-voronoi</[[Config:SampleSelector|SampleSelector]]>
<[[Config:SampleSelector|SampleSelector]] weight="0.1">sampleMinimum</[[Config:SampleSelector|SampleSelector]]>
<[[Config:SampleSelector|SampleSelector]] weight="0.1">sampleMaximum</[[Config:SampleSelector|SampleSelector]]>

<[[Config:MergeCriterion|MergeCriterion]] type="[[MergeCriterion#ClosenessThreshold|ClosenessThreshold]]">


<Option key="closenessThreshold" value="0.05"/>

<Option key="randomPercentage" value="0"/>

<Option key="debug" value="off"/>
</[[Config:MergeCriterion|MergeCriterion]]>
</[[Config:SampleSelector|SampleSelector]]>
</source>
=== sampleMinimum ===
Selects one sample at the minimum of the model.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#OptimizeCriterion|OptimizeCriterion]]" combineOutputs="false">
<[[Config:Optimizer|Optimizer]]>patternsearch</[[Config:Optimizer|Optimizer]]>
<[[Config:CandidateRanker|CandidateRanker]] type="[[CandidateRanker#minmodel|minmodel]]" scaling="none"/>
</[[Config:SampleSelector|SampleSelector]]>
</source>
=== sampleMaximum ===
Selects one sample at the maximum of the model.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#OptimizeCriterion|OptimizeCriterion]]" combineOutputs="false">
<[[Config:Optimizer|Optimizer]]>patternsearch</[[Config:Optimizer|Optimizer]]>
<[[Config:CandidateRanker|CandidateRanker]] type="[[CandidateRanker#maxmodel|maxmodel]]" scaling="none"/>
</[[Config:SampleSelector|SampleSelector]]>
</source>
=== default ===
LOLA sample selector combined with error based sample selector
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleSelector|SampleSelector]] type="[[SampleSelector#CombinedSampleSelector|CombinedSampleSelector]]" combineOutputs="false">
<[[Config:SampleSelector|SampleSelector]] weight="0.7">lola-voronoi</[[Config:SampleSelector|SampleSelector]]>
<[[Config:SampleSelector|SampleSelector]] weight="0.3">error</[[Config:SampleSelector|SampleSelector]]>

<[[Config:MergeCriterion|MergeCriterion]] type="[[MergeCriterion#ClosenessThreshold|ClosenessThreshold]]">

<Option key="closenessThreshold" value="0.05"/>

<Option key="randomPercentage" value="0"/>

<Option key="debug" value="off"/>
</[[Config:MergeCriterion|MergeCriterion]]>
</[[Config:SampleSelector|SampleSelector]]>
</source>

Toolbox configuration

2014-02-27T15:22:22Z

Javdrher: /* Components */

The toolbox can be configured by means of an [[FAQ#What is XML?|XML]] file.
Examples can be found in the <code>config/</code> and <code>demo/</code> subdirectories of the SUMO installation directory.
The default configuration file is '''<code>config/default.xml</code>'''.

== Structure ==

If you do not know what a tag or XML is please see [[FAQ#What is XML?]] first.

=== Plans and Runs ===

The general structure of the toolbox is as follows:

* The top-level <[[Config:Plan|Plan]]> type defines a surrogate modeling experiment, and an experiment may consist of multiple <[[Config:Plan#Run|Run]]> tags.
* Each <[[Config:Plan#Run|Run]]> tag can be configured separately.

For example, say you want to model some problem from electronics and you have at your disposal 3 algorithms for selecting data points. Now let's assume you want to compare the different algorithms on your problem and see which one gives you the best model with the fewest data samples. In this case your <[[Config:Plan|Plan]]> tag would contain 3 <[[Config:Plan#Run|Run]]> tags and each <[[Config:Plan#Run|Run]]> tag would contain a different <[[SequentialDesign|SequentialDesign]]> tag. For example:

<source lang="xml">
<Plan>
...
<Run name="lola-run" repeat="1">
<SequentialDesign>lola</SequentialDesign>
</Run>
<Run name="density-run" repeat="1">
<SequentialDesign>density</SequentialDesign>
</Run>
<Run name="random-run" repeat="1">
<SequentialDesign>random</SequentialDesign>
</Run>
...
</Plan>
...
</source>

Thus, this concept of a plan and multiple runs allows you to set up different configurations beforehand and try them all in one go.

As you can see it is also possible to specify a '<code>repeat</code>' attribute. Setting it to 5, for example, will ensure that that particular run is repeated 5 times. This is usually a good idea if there is a lot of randomness in the algorithms (as is usually the case).

Remember though to set the '<code>[[Random_state|seedRandomState]]</code>' option in the [[Config:SUMO| <SUMO>]] tag to '<code>random</code>', or otherwise you might get deterministic results:

<source lang="xml">
<Option key="seedRandomState" value="random"/>
</source>

=== Declarations and Definitions ===

Each component in the toolbox has its own configuration section. Inside the <[[Config:Plan#Run|Run]]> tag you ''declare'' what components you would like to use. This declaration refers to the ''definition'' of each component, further down the file. So when you see a line like:

<source lang="xml">
<SequentialDesign>lola</SequentialDesign>
</source>

This means we want to use the '<code>lola</code>' sample selection algorithm. The word '<code>lola</code>' is a unique identifier that refers to the <[[SequentialDesign|SequentialDesign]]> tag that has '<code>[[SampleSelector#GradientSampleSelector|lola]]</code>' as its "id" attribute. In this case your configuration file would have the following structure:

<source lang="xml">
<Plan>
<Run>

<SequentialDesign>lola</SequentialDesign>
...
</Run>
...
</Plan>
...


<SequentialDesign id="lola">
...

</SequentialDesign>
...
</source>

If you would like to use a different algorithm (e.g., '<code>[[SampleSelector#ErrorSampleSelector|error]]</code>' for the Error sample selector), you simply fill in a different id in the <[[SequentialDesign|SequentialDesign]]> tag in the <[[Config:Plan#Run|Run]]> tag:

<source lang="xml">
<SequentialDesign>error</SequentialDesign>
</source>

You just have to make sure there is a matching definition lower down the file for the id you have filled in.

All the other components ([[Config:ModelBuilder|ModelBuilder]], [[Config:DataSource|DataSource]], ...) work in exactly the same way.

== Running a configuration ==

See the [[Running]] page for how to run the toolbox with a different example or with your own configuration file.

== Components ==

''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up-to-date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''

The following components can be configured separately:

* [[Config:Plan|Plan]]
* [[Config:Plan|Simulator tag]]
* [[Config:Plan|Inputs]]
* [[Outputs]]
* [[Measures]]
* [[Config:ContextConfig|ContextConfig]]
* [[Config:SUMO|SUMO]]
* [[Config:InitialDesign|Initial Designs]]
* [[Config:DataSource|Data Sources]]
* [[Config:Sequential Design|Sequential designs]]
* [[Config:ModelBuilder|Model Builders]]

== General guidelines ==

Some general guidelines on how to configure the toolbox for different situations can be found on [[General guidelines|this page]].

Toolbox configuration

2014-02-27T15:20:56Z

Javdrher: /* Declarations and Definitions */

The toolbox can be configured by means of an [[FAQ#What is XML?|XML]] file.
Examples can be found in the <code>config/</code> and <code>demo/</code> subdirectories of the SUMO installation directory.
The default configuration file is '''<code>config/default.xml</code>'''.

== Structure ==

If you do not know what a tag or XML is please see [[FAQ#What is XML?]] first.

=== Plans and Runs ===

The general structure of the toolbox is as follows:

* The top-level <[[Config:Plan|Plan]]> type defines a surrogate modeling experiment, and an experiment may consist of multiple <[[Config:Plan#Run|Run]]> tags.
* Each <[[Config:Plan#Run|Run]]> tag can be configured separately.

For example, say you want to model some problem from electronics and you have at your disposal 3 algorithms for selecting data points. Now let's assume you want to compare the different algorithms on your problem and see which one gives you the best model with the fewest data samples. In this case your <[[Config:Plan|Plan]]> tag would contain 3 <[[Config:Plan#Run|Run]]> tags and each <[[Config:Plan#Run|Run]]> tag would contain a different <[[SequentialDesign|SequentialDesign]]> tag. For example:

<source lang="xml">
<Plan>
...
<Run name="lola-run" repeat="1">
<SequentialDesign>lola</SequentialDesign>
</Run>
<Run name="density-run" repeat="1">
<SequentialDesign>density</SequentialDesign>
</Run>
<Run name="random-run" repeat="1">
<SequentialDesign>random</SequentialDesign>
</Run>
...
</Plan>
...
</source>

Thus, this concept of a plan and multiple runs allows you to set up different configurations beforehand and try them all in one go.

As you can see it is also possible to specify a '<code>repeat</code>' attribute. Setting it to 5, for example, will ensure that that particular run is repeated 5 times. This is usually a good idea if there is a lot of randomness in the algorithms (as is usually the case).

Remember though to set the '<code>[[Random_state|seedRandomState]]</code>' option in the [[Config:SUMO| <SUMO>]] tag to '<code>random</code>', or otherwise you might get deterministic results:

<source lang="xml">
<Option key="seedRandomState" value="random"/>
</source>

=== Declarations and Definitions ===

Each component in the toolbox has its own configuration section. Inside the <[[Config:Plan#Run|Run]]> tag you ''declare'' what components you would like to use. This declaration refers to the ''definition'' of each component, further down the file. So when you see a line like:

<source lang="xml">
<SequentialDesign>lola</SequentialDesign>
</source>

This means we want to use the '<code>lola</code>' sample selection algorithm. The word '<code>lola</code>' is a unique identifier that refers to the <[[SequentialDesign|SequentialDesign]]> tag that has '<code>[[SampleSelector#GradientSampleSelector|lola]]</code>' as its "id" attribute. In this case your configuration file would have the following structure:

<source lang="xml">
<Plan>
<Run>

<SequentialDesign>lola</SequentialDesign>
...
</Run>
...
</Plan>
...


<SequentialDesign id="lola">
...

</SequentialDesign>
...
</source>

If you would like to use a different algorithm (e.g., '<code>[[SampleSelector#ErrorSampleSelector|error]]</code>' for the Error sample selector), you simply fill in a different id in the <[[SequentialDesign|SequentialDesign]]> tag in the <[[Config:Plan#Run|Run]]> tag:

<source lang="xml">
<SequentialDesign>error</SequentialDesign>
</source>

You just have to make sure there is a matching definition lower down the file for the id you have filled in.

All the other components ([[Config:ModelBuilder|ModelBuilder]], [[Config:DataSource|DataSource]], ...) work in exactly the same way.

== Running a configuration ==

See the [[Running]] page for how to run the toolbox with a different example or with your own configuration file.

== Components ==

''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up-to-date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''

The following components can be configured separately:

* [[Config:Plan|Plan]]
* [[Config:Plan|Simulator tag]]
* [[Config:Plan|Inputs]]
* [[Outputs]]
* [[Measures]]
* [[Config:ContextConfig|ContextConfig]]
* [[Config:SUMO|SUMO]]
* [[Config:InitialDesign|Initial Designs]]
* [[Config:SampleEvaluator|Sample Evaluators]]
* [[Config:SampleSelector|Sample Selectors]]
* [[Config:AdaptiveModelBuilder|Adaptive Model Builders]]

== General guidelines ==

Some general guidelines on how to configure the toolbox for different situations can be found on [[General guidelines|this page]].

Config:SampleEvaluator

2014-02-27T15:20:25Z

Javdrher: Javdrher moved page Config:SampleEvaluator to Config:DataSource: Component was renamed

#REDIRECT [[Config:DataSource]]

Config:DataSource

2014-02-27T15:20:25Z

Javdrher: Javdrher moved page Config:SampleEvaluator to Config:DataSource: Component was renamed

This page lists the various SampleEvaluators used by the SUMO Toolbox. To find out more about the data formats and how to define your own data generating code go [[Interfacing_with_the_toolbox|here]].

'''Generated for SUMO toolbox version 7.0'''.
''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up-to-date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''
== SampleEvaluator ==

=== local ===
Use this if your data generator is a native executable, shell script, or Java class.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleEvaluator|SampleEvaluator]] id="local" type="ibbt.sumo.sampleevaluators.LocalSampleEvaluator">

<Option key="maxResubmissions" value="1"/>

<Option key="sampleTimeout" value="-1"/>

<Option key="simulatorType" value=""/>



<Option key="threadCount" value="1"/>
</[[Config:SampleEvaluator|SampleEvaluator]]>
</source>
=== matlabOld ===
Evaluate samples using a Matlab script (i.e., your simulator is a Matlab script). The evaluation is handled via the Java side of the toolbox.

<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleEvaluator|SampleEvaluator]] id="matlab" type="ibbt.sumo.sampleevaluators.matlab.MatlabSampleEvaluator">

<Option key="maxResubmissions" value="1"/>

<Option key="sampleTimeout" value="-1"/>
</[[Config:SampleEvaluator|SampleEvaluator]]>
</source>

=== matlab ===
Evaluate samples using a Matlab script '''without''' using Java. This is the default evaluator for Matlab m-files.

<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<SampleEvaluator id="matlabDirect" type="MatlabDirectSampleEvaluator"/>
</source>

=== griddedDataset ===
Evaluate samples using a gridded dataset. This data format does not include any inputs; it lists only outputs and only works for a uniform grid of data points. The order in which the output values are given determines their location in the grid.

For example, if you want to define a 3-dimensional dataset with grid size 2x3x2 on the [-1,1] domain, you must provide the outputs for the samples in the following order (the last input varies fastest):

<code><pre>
value at [-1, -1, -1]
value at [-1, -1, 1]
value at [-1, 0, -1]
value at [-1, 0, 1]
value at [-1, 1, -1]
value at [-1, 1, 1]
value at [ 1, -1, -1]
value at [ 1, -1, 1]
value at [ 1, 0, -1]
value at [ 1, 0, 1]
value at [ 1, 1, -1]
value at [ 1, 1, 1]
</pre></code>

<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleEvaluator|SampleEvaluator]] id="griddedDataset" type="ibbt.sumo.sampleevaluators.datasets.GriddedDatasetSampleEvaluator">


</[[Config:SampleEvaluator|SampleEvaluator]]>
</source>

=== scatteredDataset ===
Evaluate samples using a scattered dataset. Each row of the dataset represents a data point. If the dimensionality of the problem is D (i.e. there are D inputs), the first D columns represent the inputs and the remaining columns the outputs. Note that complex numbers need to be given as separate columns.

<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleEvaluator|SampleEvaluator]] id="scatteredDataset" type="ibbt.sumo.sampleevaluators.datasets.ScatteredDatasetSampleEvaluator">


</[[Config:SampleEvaluator|SampleEvaluator]]>
</source>

=== calcua ===
Evaluate samples on an SGE-administered cluster through a remote, SSH-reachable front node.
<source xmlns:saxon="http://icl.com/saxon" lang="xml">
<[[Config:SampleEvaluator|SampleEvaluator]] id="calcua" type="ibbt.sumo.sampleevaluators.distributed.sge.RemoteSGESampleEvaluator">

<Option key="maxResubmissions" value="1"/>

<Option key="sampleTimeout" value="-1"/>


<[[Config:Executable|Executable]] platform="linux" arch="x86_64"/>
<[[Config:Backend|Backend]] id="remoteSGE" type="ibbt.sumo.sampleevaluators.distributed.sge.RemoteSGEBackend">

<Option key="user" value="dgorisse"/>

<Option key="frontNode" value="submit.calcua.ua.ac.be"/>

<Option key="remoteDirectory" value="/storeA/users/dgorisse/output"/>

<Option key="pollInterval" value="20"/>

<Option key="queues" value="all.q,fast.q"/>

<Option key="queueRevisionRate" value="10"/>

<Option key="environmentCommand" value=". ~/.profile;"/>
</[[Config:Backend|Backend]]>
</[[Config:SampleEvaluator|SampleEvaluator]]>
</source>

Toolbox configuration

2014-02-27T15:20:04Z

Javdrher: /* Declarations and Definitions */

The toolbox can be configured by means of an [[FAQ#What is XML?|XML]] file.
Examples can be found in the <code>config/</code> and <code>demo/</code> subdirectories of the SUMO installation directory.
The default configuration file is '''<code>config/default.xml</code>'''.

== Structure ==

If you do not know what a tag or XML is please see [[FAQ#What is XML?]] first.

=== Plans and Runs ===

The general structure of the toolbox is as follows:

* The top-level <[[Config:Plan|Plan]]> type defines a surrogate modeling experiment, and an experiment may consist of multiple <[[Config:Plan#Run|Run]]> tags.
* Each <[[Config:Plan#Run|Run]]> tag can be configured separately.

For example, say you want to model some problem from electronics and you have at your disposal 3 algorithms for selecting data points. Now let's assume you want to compare the different algorithms on your problem and see which one gives you the best model with the fewest data samples. In this case your <[[Config:Plan|Plan]]> tag would contain 3 <[[Config:Plan#Run|Run]]> tags and each <[[Config:Plan#Run|Run]]> tag would contain a different <[[SequentialDesign|SequentialDesign]]> tag. For example:

<source lang="xml">
<Plan>
...
<Run name="lola-run" repeat="1">
<SequentialDesign>lola</SequentialDesign>
</Run>
<Run name="density-run" repeat="1">
<SequentialDesign>density</SequentialDesign>
</Run>
<Run name="random-run" repeat="1">
<SequentialDesign>random</SequentialDesign>
</Run>
...
</Plan>
...
</source>

Thus, this concept of a plan and multiple runs allows you to set up different configurations beforehand and try them all in one go.

As you can see it is also possible to specify a '<code>repeat</code>' attribute. Setting it to 5, for example, will ensure that that particular run is repeated 5 times. This is usually a good idea if there is a lot of randomness in the algorithms (as is usually the case).

Remember though to set the '<code>[[Random_state|seedRandomState]]</code>' option in the [[Config:SUMO| <SUMO>]] tag to '<code>random</code>', or otherwise you might get deterministic results:

<source lang="xml">
<Option key="seedRandomState" value="random"/>
</source>

=== Declarations and Definitions ===

Each component in the toolbox has its own configuration section. Inside the <[[Config:Plan#Run|Run]]> tag you ''declare'' what components you would like to use. This declaration refers to the ''definition'' of each component, further down the file. So when you see a line like:

<source lang="xml">
<SequentialDesign>lola</SequentialDesign>
</source>

This means we want to use the '<code>lola</code>' sample selection algorithm. The word '<code>lola</code>' is a unique identifier that refers to the <[[SequentialDesign|SequentialDesign]]> tag that has '<code>[[SampleSelector#GradientSampleSelector|lola]]</code>' as its "id" attribute. In this case your configuration file would have the following structure:

<source lang="xml">
<Plan>
<Run>

<SequentialDesign>lola</SequentialDesign>
...
</Run>
...
</Plan>
...


<SequentialDesign id="lola">
...

</SequentialDesign>
...
</source>

If you would like to use a different algorithm (e.g., '<code>[[SampleSelector#ErrorSampleSelector|error]]</code>' for the Error sample selector), you simply fill in a different id in the <[[SequentialDesign|SequentialDesign]]> tag in the <[[Config:Plan#Run|Run]]> tag:

<source lang="xml">
<SequentialDesign>error</SequentialDesign>
</source>

You just have to make sure there is a matching definition lower down the file for the id you have filled in.

All the other components ([[Config:ModelBuilder|ModelBuilder]], [[Config:SampleEvaluator|SampleEvaluator]], ...) work in exactly the same way.

== Running a configuration ==

See the [[Running]] page for how to run the toolbox with a different example or with your own configuration file.

== Components ==

''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up-to-date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''

The following components can be configured separately:

* [[Config:Plan|Plan]]
* [[Config:Plan|Simulator tag]]
* [[Config:Plan|Inputs]]
* [[Outputs]]
* [[Measures]]
* [[Config:ContextConfig|ContextConfig]]
* [[Config:SUMO|SUMO]]
* [[Config:InitialDesign|Initial Designs]]
* [[Config:SampleEvaluator|Sample Evaluators]]
* [[Config:SampleSelector|Sample Selectors]]
* [[Config:AdaptiveModelBuilder|Adaptive Model Builders]]

== General guidelines ==

Some general guidelines on how to configure the toolbox for different situations can be found on [[General guidelines|this page]].

Config:AdaptiveModelBuilder

2014-02-27T15:19:18Z

Javdrher: Javdrher moved page Config:AdaptiveModelBuilder to Config:ModelBuilder: Component was renamed

#REDIRECT [[Config:ModelBuilder]]