Difference between revisions of "Interfacing with the toolbox"

From SUMOwiki
Latest revision as of 10:55, 6 March 2010

For information on how to model your own problem or data, see the Adding an example page.

IMPORTANT

Input/Output Format

  • The SUMO Toolbox works on any input domain (= design space = input parameter ranges) specified in the simulator configuration file by a 'minimum' and 'maximum' attribute, for each input parameter.
    • If a 'minimum' is not specified, the default value of '-1' is assumed.
    • If a 'maximum' is not specified, the default value of '+1' is assumed.
    • Example:
<InputParameters>
    <Parameter name="a" type="real" minimum="47.0" maximum="50.0"/>
    <Parameter name="b" type="real" minimum="-20.0"/>
</InputParameters>
  • Be aware that all input values outside the specified input domain are trimmed, i.e., they are not used in the modeling process.
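Because points outside the input domain are silently trimmed, it can help to check a dataset before giving it to the toolbox. The following is a minimal shell/awk sketch, not part of the SUMO Toolbox; the helper name count_out_of_domain and its argument format are hypothetical. It assumes a scattered dataset in the format described under Scattered datasets, with the inputs in the first columns, and takes the bounds as comma-separated lists (use -1 and 1 where the configuration file relies on the defaults).

```shell
# count_out_of_domain FILE MINS MAXS   (hypothetical helper, not a toolbox command)
#   FILE: scattered dataset, one point per row, input columns first
#   MINS/MAXS: comma-separated bounds, one per input, e.g. "47,-20" and "50,1"
# Prints one line per row whose inputs fall outside the domain, then a
# summary line "out_of_domain=<count>". Output columns are not checked.
count_out_of_domain() {
    awk -v mins="$2" -v maxs="$3" '
    BEGIN {
        n = split(mins, lo, ",")   # number of inputs = number of bounds given
        split(maxs, hi, ",")
    }
    NF > 0 {
        for (i = 1; i <= n; i++) {
            if ($i + 0 < lo[i] + 0 || $i + 0 > hi[i] + 0) {
                print "row " NR " outside domain"
                bad++
                next               # one report per row is enough
            }
        }
    }
    END { print "out_of_domain=" (bad + 0) }
    ' "$1"
}
```

For the configuration above, count_out_of_domain data.txt "47,-20" "50,1" would list the rows of a hypothetical data.txt whose a or b values fall outside [47, 50] or [-20, 1].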

Also remember that:

  • A complex output should always be returned as 2 real values (i.e., the real part and the imaginary part separately).

Make sure your data source complies with these requirements. This is your responsibility.
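As an illustration of the complex-output rule for a native simulator, the sketch below returns one complex output z = exp(i*x) as two lines. The function name complex_sim and the model are hypothetical, and it assumes the real part comes first, following the "(real and imaginary)" order used on this page.

```shell
# Hypothetical native simulator with one real input x and one complex
# output z = exp(i*x). The complex value is written as two real values,
# real part first and imaginary part second, one value per line.
complex_sim() {
    awk -v x="$1" 'BEGIN {
        printf "%.6f\n", cos(x)   # real part of exp(i*x)
        printf "%.6f\n", sin(x)   # imaginary part of exp(i*x)
    }'
}
```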

Batch Mode

By default the toolbox calls your simulation code or script with one point at a time. However, this is not always efficient, and you may want to execute multiple simulations in one go. This is referred to as Batch Mode.

If you want to use batch mode, you must add the batch and batchSize attributes to the <Executable> tag in your simulator file. For example:

<Executable platform="matlab" batch="true" batchSize="9">/path/to/your/executable</Executable>

This means that we want to use batch mode (batch = true) and evaluate at most 9 points per batch.

Passing data directly

It is possible to pass data directly to the toolbox. For details on how to do this, type "help go". Remember, though, that the dimensions of your data must still match the information in the toolbox configuration file used.

Scattered datasets

Your data source may also be a dataset containing scattered data points. Scattered means the points do not have to be in any order, i.e., they may be distributed in any way (e.g., randomly). In this case your dataset must be stored in plain-text format and contain exactly one data point per row, with inputs and outputs separated by spaces.

For example, for a problem with 3 inputs and 2 outputs, your text file looks like:

  -1.5743   -0.0328    0.2732   -0.6980   -0.8389
  -0.7347   -1.8929    0.2294   -0.9992   -1.5545
   0.7472    0.5474   -0.8233    0.9931    1.5339
   0.3766    0.8020   -0.0336    0.9758    1.4774
     ...       ...       ...       ...       ...
   0.8785    0.0362   -1.4864    0.8407    1.1173

So the first three columns are the inputs and the last two are the outputs. The file must not contain any other comments or text. Again, remember that a complex output should be stored as two columns (real and imaginary).
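A quick way to catch formatting problems before a run is to check every row. This is a hypothetical helper, not a toolbox function; NCOLS is the number of inputs plus the number of real output columns (two per complex output). It reports any row with the wrong number of fields (a blank line counts as such a row) or with a non-numeric field, and returns a non-zero status if it finds one.

```shell
# check_scattered FILE NCOLS   (hypothetical helper, not a toolbox command)
# Every row must contain exactly NCOLS whitespace-separated numeric fields.
# Problems are printed to stdout; the exit status is 0 only for a clean file.
check_scattered() {
    awk -v ncols="$2" '
    NF != ncols {
        printf "row %d: %d columns, expected %d\n", NR, NF, ncols
        bad = 1
        next
    }
    {
        for (i = 1; i <= NF; i++) {
            # plain or exponent notation, e.g. -1.5743, .5, 1e-3
            if ($i !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/) {
                printf "row %d: non-numeric field %s\n", NR, $i
                bad = 1
                next
            }
        }
    }
    END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For the example above, check_scattered data.txt 5 should succeed silently.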

Native simulator

If your simulator is a native binary (e.g., an exe file) or a shell script, it is expected to produce one output value per line. So every output should be on a new line, with complex outputs using two lines. Your code/script should NOT produce any other output.

There are two ways your code/script can be called by the SUMO Toolbox: batch mode and command line mode.

In command line mode (= the default option), the inputs are given to the simulator as command line arguments. A call to a simulator in command line mode looks like (for a problem with 3 input parameters):

$ ./yourSimulationCode  0.5  0.6  0.5

Your code/script should then produce one value per output per line (as discussed above).
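A minimal sketch of a native simulator for command line mode, written as a shell function with awk doing the floating-point arithmetic. The two outputs are placeholder formulas, not a real model. To use it as the executable, save it in a script whose last line is yourSimulationCode "$@". It accepts arguments beyond the first three so that options (described below) do not cause an error.

```shell
# Hypothetical command line mode simulator: 3 inputs as arguments,
# 2 outputs printed one per line, and nothing else on stdout.
yourSimulationCode() {
    if [ "$#" -lt 3 ]; then
        return 1                  # too few inputs; print nothing
    fi
    awk -v x1="$1" -v x2="$2" -v x3="$3" 'BEGIN {
        print x1 + x2 + x3        # output 1 (placeholder model)
        print x1 * x2 - x3        # output 2 (placeholder model)
    }'
}
```

With the call shown above, yourSimulationCode 0.5 0.6 0.5 prints 1.6 and -0.2 on two lines.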

If your simulator is called in batch mode, multiple samples can be evaluated in batches. The simulation code is called with NO command line arguments (except for any options, see below). The input points for a batch are instead passed to your simulation code/script through standard input (stdin). First, the size of the batch (the number of samples) is placed on stdin. Then, one line is written for each sample. This means that in total, 1 + (batchSize * inputDimension) numbers are written to stdin. For example, a batch of 3 samples with 3 inputs each (1 + 3 * 3 = 10 numbers) looks like:

3
0.5  0.6  0.5
0.2  0.7  0.3
0.2  0.6  0.8
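The stdin format above can be read with a few lines of awk. This sketch is hypothetical: the function name yourBatchSimulationCode, the placeholder model, and especially the output layout are assumptions, since this page does not spell out what a native simulator must write back in batch mode. It assumes one value per line, all outputs of the first sample, then those of the second, and so on; check this against the toolbox version you use.

```shell
# Hypothetical batch-mode simulator for 3 inputs and 2 outputs.
# stdin: batch size on the first line, then one line of 3 inputs per sample.
# ASSUMPTION: outputs are written one value per line, sample after sample.
yourBatchSimulationCode() {
    awk '
    NR == 1 { n = $1; next }      # first line: number of samples in the batch
    NF > 0 {
        if (NF != 3) exit 1       # each sample line must hold 3 inputs
        print $1 + $2 + $3        # output 1 (placeholder model)
        print $1 * $2 - $3        # output 2 (placeholder model)
        seen++
    }
    END { if (seen != n) exit 1 } # batch size must match the sample count
    '
}
```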

The executable 'yourSimulationCode' must be in your path or the absolute path to the executable must be specified in the simulator configuration xml file.

If your xml file contains options, these will be passed to the simulator as command line arguments (in both command line and batch mode). For example:

$ ./yourSimulationCode  0.5  0.6  0.5  option1=value1  option2=value2  etc..
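Since options arrive as name=value words after the input values, a script has to separate the two. The helpers below (get_option and count_inputs, both hypothetical names) sketch one way to do this in POSIX shell.

```shell
# get_option NAME DEFAULT ARGS...   (hypothetical helper)
# Print the value of option NAME found among ARGS, or DEFAULT if absent.
get_option() {
    name=$1
    value=$2
    shift 2
    for arg in "$@"; do
        case "$arg" in
            "$name"=*) value=${arg#*=} ;;   # text after the first '='
        esac
    done
    printf '%s\n' "$value"
}

# count_inputs ARGS...   (hypothetical helper)
# Count the arguments that are not name=value options, i.e. the inputs.
count_inputs() {
    n=0
    for arg in "$@"; do
        case "$arg" in
            *=*) ;;                 # option, skip
            *) n=$((n + 1)) ;;      # input value
        esac
    done
    echo "$n"
}
```

Inside a simulator script these would be called as, e.g., get_option option1 default "$@".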

Matlab simulator

Matlab function

If your simulator is a Matlab file and you are not using batch mode, you just have to provide a function of the following form (for the same 3-input example):

function [output1, output2, output3] = mySimulationCode(input1, input2, input3)
   ...
   % do the calculation

You can also pass the inputs and outputs as single matrices. Make sure the Matlab file is in your project directory (see Adding an example).

Options (if present) are passed to the simulator as an extra cell array parameter:


function [output1, output2, output3] = mySimulationCode(input1, input2, input3, options)

where 'options' is a cell array of strings of the form:

options : {'option1','value1','option2','value2',...}

Note: see also the FAQ entry "Should I use a Matlab script or a shell script for interfacing with my simulation code?"

Batch Mode

If you ARE using batch mode, then your function must look like this:

function [result] = mySimulationCode(inputs)
   ...
   % do the calculation, producing a matrix 'outputs' with one row per row of 'inputs'
   ...
   % IMPORTANT: you MUST return both inputs and output values
   result = [inputs outputs];

Java simulator

You can also implement your simulator as a Java class. All you need to do is write a class that implements the Simulator interface and make sure the class file is on the Matlab Java path.

Options are passed as a Java Properties object.

Gridded datasets

Gridded datasets assume that the data is spread uniformly over a grid. By making this assumption, there is no need to store the sample locations as in a scattered dataset: only the output values are stored. However, you must specify the 'gridSize' attribute in the Simulator configuration file. For example, setting 'gridSize="20,40,50"' means the toolbox will expect the gridded dataset to contain 40000 values per output (20-by-40-by-50 grid = 40000 points for one output).

Because the input values are not stored, the dataset must adhere to a strict order in which the output values are specified: the points must be specified in lexicographic order, i.e., the first input varies slowest and the last input varies fastest. For example, if you want to define a 3-dimensional dataset with grid size 2x3x2 on the [-1,1] domain, you must provide the outputs for the samples in the following order:

value at [-1, -1, -1]
value at [-1, -1,  1]
value at [-1,  0, -1]
value at [-1,  0,  1]
value at [-1,  1, -1]
value at [-1,  1,  1]
value at [ 1, -1, -1]
value at [ 1, -1,  1]
value at [ 1,  0, -1]
value at [ 1,  0,  1]
value at [ 1,  1, -1]
value at [ 1,  1,  1]
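The required ordering can be generated mechanically, exactly like three nested loops with the last input in the innermost loop. The hypothetical helper below (grid_order, not a toolbox function) prints the points of an N1 x N2 x N3 grid on [-1,1]^3 in that order, assuming equally spaced values with at least two points per dimension; the k-th line of its output is the point whose output value must appear k-th in the gridded dataset.

```shell
# grid_order N1 N2 N3   (hypothetical helper; each Nk must be >= 2)
# Print the grid points in lexicographic order: last coordinate fastest.
grid_order() {
    awk -v n1="$1" -v n2="$2" -v n3="$3" '
    # i-th of n equally spaced values from -1 to 1
    function coord(i, n) { return -1 + 2 * i / (n - 1) }
    BEGIN {
        for (i = 0; i < n1; i++)
            for (j = 0; j < n2; j++)
                for (k = 0; k < n3; k++)
                    printf "[%g, %g, %g]\n", coord(i, n1), coord(j, n2), coord(k, n3)
    }'
}
```

grid_order 2 3 2 reproduces the 12-point listing above (without the column padding), and grid_order 20 40 50 produces 40000 lines, matching the gridSize example.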

The advantage of gridded datasets is that they are somewhat faster to work with. However, they are harder to interpret and to transfer to other programs that expect a scattered format. In general, we recommend simply using the scattered format.