FAQ
General
What is a global surrogate model?
A global surrogate model is a mathematical model that mimics the behavior of a computationally expensive simulation code over the complete parameter space as accurately as possible, using as few data points as possible. Note that optimization is therefore not the primary goal, although it can be done as a post-processing step. Global surrogate models are useful for:
- design space exploration, to get a feel for how the different parameters behave
- sensitivity analysis
- what-if analysis
- prototyping
- ...
In addition, they are a cheap way to model large-scale systems: multiple global surrogate models can be chained together in a model cascade.
What about surrogate driven optimization?
When hearing the term surrogate driven optimization, most people associate it with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate, which is optimized to find an optimum. Afterwards, a move limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently the surrogate is rebuilt and optimized again. In other words, the surrogate zooms in on the global optimum. For instance, the DAKOTA Toolbox implements such strategies, where the surrogate construction is separated from the optimization.
Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox.
Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sampling selection strategy that balances local and global search. In other words, it balances between exploring the input space and exploiting the information the surrogate gives us.
A configuration example can be found here. For more information see the InfillSamplingCriterion.
What is the difference between the M3-Toolbox and the SUMO-Toolbox?
The SUMO Toolbox is a complete, full-featured framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof-of-principle.
What happened to the M3-Toolbox?
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.
How can I stay up to date with the latest news?
To stay up to date with the latest news and releases, we recommend subscribing to our newsletter here. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.
What is the roadmap for the future?
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the Whats new page.
Installation and Configuration
Will the SUMO Toolbox work with Matlab R2008 and later?
Initial tests have shown that the SUMO Toolbox will work with R2008, but we do not guarantee that every component will work flawlessly. R2008 also introduced a new way of handling classes and objects to which the SUMO Toolbox will be ported when R2008a and later become more widespread. At that point compatibility with previous Matlab versions will be dropped.
Upgrading
How do I upgrade to a newer version?
Delete your old <SUMO-Toolbox-directory> and replace it with the new one.
Using
I want to model my own problem
See : Adding an example.
I want to contribute some data/patch/documentation/...
See : Contributing.
How do I interface with the SUMO Toolbox?
See : Interfacing with the toolbox.
See : Model portability.
Why are the Neural Networks so slow?
You are probably using the CrossValidation measure. CrossValidation is used by default if you have not defined a measure yourself. Since they need to be trained, neural nets will always be slower than the other models. Using CrossValidation slows things down much further (5 times slower by default). Therefore, when using one of the neural network model types, please use a different measure, such as ValidationSet or SampleError. See the comments in default.xml for examples.
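To see why cross-validation multiplies the training cost, consider the minimal k-fold sketch below. It is not the toolbox's implementation: `k_fold_splits`, `cross_validation_error` and `fit_mean` are hypothetical names chosen for illustration, and k = 5 is assumed so that it matches the 5-times slowdown mentioned above.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


def cross_validation_error(xs, ys, fit, k=5):
    """Average held-out squared error; `fit` is called once per fold."""
    fold_errors = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared = [(model(xs[i]) - ys[i]) ** 2 for i in test]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


calls = {"fit": 0}  # counts how often the (expensive) training runs


def fit_mean(train_x, train_y):
    """Stand-in for an expensive training routine: predicts the mean."""
    calls["fit"] += 1
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


xs = list(range(20))
ys = [2.0 * x for x in xs]
score = cross_validation_error(xs, ys, fit_mean, k=5)
print(calls["fit"])  # one training run per fold
```

With a single validation set the model would be trained once per candidate; here every candidate costs k training runs, which is what makes slow-to-train models such as neural networks so expensive under this measure.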
Note: Starting from version 5.0, two new neural network backends are available (based on FANN and NNSYSID). These are a lot faster than the default backend based on the Matlab Neural Network Toolbox. However, their accuracy is not as good.
How can I speed things up?
There are a number of things you can do to speed things up:
- Disable some or all of the profilers, or disable the output handlers that draw charts
- Turn off the plotting of models in ContextConfig; you can always generate plots from the saved mat files
- If you have a multi-core/multi-cpu machine, set the threadCount variable in LocalSampleEvaluator equal to the number of cores/CPUs
- Upgrade to Matlab 7.4 or later which has better multi-threaded support
How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)?
See : Adaptive Modeling Mode.
How do I change the error function (relative error, RMS, ...)?
The <Measure> tag specifies the algorithm used to assign models a score, e.g., CrossValidation. Within the measure you can also specify which error function to use. The default error function is 'rootRelativeSquareError'.
Say you want to use CrossValidation with the maximum absolute error, then you would put:
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/>
On the other hand, if you wanted to use the ValidationSet measure with a relative root-mean-square error you would put:
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/>
These error functions can be found in the src/matlab/tools/errorFunctions directory. You are free to modify them and add your own.
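As a rough illustration of what such error functions compute, the sketch below gives textbook definitions of a maximum absolute error and a root relative square error in Python. These are assumptions based on the names alone; the Matlab files in src/matlab/tools/errorFunctions are the authoritative definitions and may differ in detail.

```python
import math


def max_absolute_error(true_values, predicted):
    """Largest absolute deviation between prediction and truth."""
    return max(abs(t - p) for t, p in zip(true_values, predicted))


def root_relative_square_error(true_values, predicted):
    """Squared error relative to that of always predicting the mean (textbook form)."""
    mean_true = sum(true_values) / len(true_values)
    residual = sum((t - p) ** 2 for t, p in zip(true_values, predicted))
    spread = sum((t - mean_true) ** 2 for t in true_values)
    return math.sqrt(residual / spread)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
print(max_absolute_error(y_true, y_pred))
print(root_relative_square_error(y_true, y_pred))
```

The first function is dominated by the single worst point, while the second averages over all points and is scale-free, which is why the choice of error function can change which model is ranked best.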
How do I enable more profilers?
Go to the <Profiling> tag and put "*" as the regular expression. See also the next question.
What regular expressions can I use to filter profilers?
See the syntax here.
How can I ensure deterministic results?
See : Random state.
How do I get a simple closed-form model (symbolic expression)?
See : Using a model.
Use the getExpression(..) function.
How do I enable the Heterogeneous evolution to automatically select the best model type?
Due to a limitation of the Matlab Genetic Algorithm and Direct Search (GADS) Toolbox, you first have to manually edit the file src/matlab/contrib/modifiedMigrate.m. Open it and follow the instructions. Once that is done you can use the heterogenetic modelbuilder as you would any other.
Troubleshooting
I have a problem and I want to report it
See : Reporting problems.
I sometimes get flat models when using rational functions
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.
The RationalModel tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. In our experience, the fit sometimes settles on a flat model as the best least squares fit. There are two possible causes for this:
- The number of sample points is small, and the model parameters (as explained here and here) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalXYZInterface (RationalSequentialInterface or RationalGeneticInterface) section of your configuration file: change the "percentBounds" option to "60,100", "80,100", or even "100,100". A setting of "100,100" will force the polynomial models to always exactly interpolate. However, note that this does not scale very well with the number of samples (to counter this you can set "maxDegrees"). If, after increasing the "percentBounds", you still get weird, spiky models, you simply need more samples or you should switch to a different model type.
- Another possibility is that, given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.
When using rational functions I sometimes get 'spikes' (poles) in my model
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:
- Simply wait for more data; spikes usually disappear (but not always).
- Lower the maximum of the "percentBounds" option in the RationalXYZInterface (RationalSequentialInterface or RationalGeneticInterface) section of your configuration file. For example, say you have 500 data points: if the maximum of the "percentBounds" option is set to 100 percent, the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the "percentBounds" option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the "maxDegrees" option to set an absolute bound.
- If you roughly know the output range your data should have, an easy way to eliminate poles is to use the MinMax Measure together with your current measure (CrossValidation by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, so spikes should disappear. See : Combining measures.
- Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.
- Try using the RationalPoleSuppressionSampleSelector. It was designed to get rid of this problem more quickly, but it only selects one sample at a time.
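To check whether a given rational model actually has a pole inside the domain, you can look for zeros of its denominator. The Python sketch below does this for the one-dimensional case by scanning a grid for sign changes. It is an illustration only: the coefficient-list representation and the function names are hypothetical, the toolbox's models are generally multi-dimensional, and zeros of even multiplicity that fall between grid points will be missed.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial given in increasing-degree order (Horner scheme)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def pole_suspects(den_coeffs, lower, upper, n_grid=1000):
    """Return grid intervals in which the denominator changes sign or is zero."""
    suspects = []
    step = (upper - lower) / n_grid
    prev_x, prev_v = lower, polyval(den_coeffs, lower)
    for i in range(1, n_grid + 1):
        x = lower + i * step
        v = polyval(den_coeffs, x)
        if prev_v == 0.0 or prev_v * v < 0.0:
            suspects.append((prev_x, x))
        prev_x, prev_v = x, v
    if prev_v == 0.0:  # zero exactly on the upper boundary
        suspects.append((prev_x, prev_x))
    return suspects


# Denominator x - 0.5 has a zero inside [-1, 1]; 1 + x^2 has none.
with_pole = pole_suspects([-0.5, 1.0], -1.0, 1.0)
without_pole = pole_suspects([1.0, 0.0, 1.0], -1.0, 1.0)
print(len(with_pole), len(without_pole))
```

An empty result does not prove the model is pole-free, but a non-empty one tells you where on the domain the spike comes from, which helps in deciding whether the model is still usable for your purpose.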
There is no noise in my data yet the rational functions don't interpolate
See : this question.
When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure."
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path.
In short: Start Matlab, run <SUMO-Toolbox-directory>/startup.m (to ensure the toolbox is in your path) and then try to load your model.
When running the SUMO Toolbox I get an error like "No component with id 'ann' of type 'adaptive model builder' found in config file."
This means you have specified a component with a certain id (in this case an AdaptiveModelBuilder component with id 'ann'), but a component with that id does not exist further down in the configuration file (in this particular case 'ann' does not exist but 'anngenetic' does, as a quick search through the configuration file will show). So make sure you only declare components that have a definition lower down. To see which components are available, simply scroll down the configuration file and check which ids are specified. Please also refer to the Declarations and Definitions page.
When using RBF neural network models I sometimes get a crash in "newrb"
This is an error in the Matlab Neural Network Toolbox implementation and not something we can fix (a workaround is available on request). This should be fixed by Matlab 7.5.
When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite"
This is a problem in the mex implementation of the NNSYSID toolbox. Simply delete the mex files; the Matlab implementation will then be used, and it does not have this problem.
This means Matlab cannot find the FANN library itself to link to dynamically. Make sure it is in your library path, i.e., on Unix systems, make sure it is included in LD_LIBRARY_PATH.
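A quick way to check the library path on a Unix system is to look for the shared library in each directory listed in LD_LIBRARY_PATH. The Python sketch below does that; the file name `libfann.so` is an assumption about how the FANN library is named on your system, and `find_library` is a hypothetical helper written for this example.

```python
import os


def find_library(filename, search_path):
    """Return the directories of a path-list string that contain `filename`."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


# Assumed library file name; adjust it to match your FANN installation.
hits = find_library("libfann.so", os.environ.get("LD_LIBRARY_PATH", ""))
print(hits or "libfann.so not found on LD_LIBRARY_PATH")
```

Note that the dynamic linker also searches system default locations, so an empty result only tells you that LD_LIBRARY_PATH itself does not point at the library. Also make sure the variable is set in the environment from which Matlab is started.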
When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector'
You forgot to build the SVM mex files for your platform. For Windows they are pre-compiled for you; on other systems you have to compile them yourself with the makefile.