http://sumowiki.intec.ugent.be/api.php?action=feedcontributions&user=Dgorissen&feedformat=atomSUMOwiki - User contributions [en]2020-04-04T18:51:56ZUser contributionsMediaWiki 1.31.2http://sumowiki.intec.ugent.be/index.php?title=FAQ&diff=5501FAQ2011-05-24T15:43:48Z<p>Dgorissen: /* Why are the Neural Networks so slow? */</p>
<hr />
<div>== General ==<br />
<br />
=== What is a global surrogate model? ===<br />
<br />
A global [http://en.wikipedia.org/wiki/Surrogate_model surrogate model] is a mathematical model that mimics the behavior of a computationally expensive simulation code over '''the complete parameter space''' as accurately as possible, using as few data points as possible. Note that optimization is not the primary goal, although it can be done as a post-processing step. Global surrogate models are useful for:<br />
<br />
* design space exploration, to get a ''feel'' of how the different parameters behave<br />
* sensitivity analysis<br />
* ''what-if'' analysis<br />
* prototyping<br />
* visualization<br />
* ...<br />
<br />
In addition, they are a cheap way to model large-scale systems: multiple global surrogate models can be chained together in a model cascade.<br />
<br />
See also the [[About]] page.<br />
<br />
=== What about surrogate driven optimization? ===<br />
<br />
Most people associate the term '''surrogate driven optimization''' with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate, which is then optimized to find an optimum. Afterwards, a move-limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently the surrogate is rebuilt and optimized; in effect, the surrogate zooms in on the global optimum. For instance, the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies, where the surrogate construction is separated from the optimization.<br />
<br />
Such a framework was implemented in an earlier version of the SUMO Toolbox but was deprecated since it did not fit the philosophy and design of the toolbox. <br />
<br />
Instead another, equally powerful, approach was taken: the current optimization framework is in fact a sample selection strategy that balances local and global search. In other words, it balances exploring the input space against exploiting the information the surrogate gives us.<br />
<br />
A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].<br />
<br />
=== What is (adaptive) sampling? Why is it used? ===<br />
<br />
In classical Design of Experiments you need to specify the design of your experiment up-front. In other words, you have to say in advance how many data points you need and how they should be distributed; two examples are Central Composite designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., by an expensive simulation code) it is not clear up-front how many points are needed. Instead, data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in the regions that are the most interesting is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere, so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].<br />
SUMO provides a number of different sampling algorithms: [[SampleSelector]]<br />
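<br />
The loop described above can be written down generically as follows (a toy illustration, not the SUMO implementation: the <code>simulate</code> function, the grid-based initial design, and the purely distance-based selection criterion are all assumptions made for the example):<br />
<br />
<source lang="python">
import random

random.seed(0)  # for reproducibility of the sketch

def simulate(x):
    # Stand-in for an expensive simulation code (an assumption for this sketch).
    return x * x

def initial_design(n, lo, hi):
    # Fixed, classic design to bootstrap the process (here simply a uniform grid).
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def adaptive_sampling(budget, lo=-1.0, hi=1.0):
    xs = initial_design(4, lo, hi)
    ys = [simulate(x) for x in xs]
    while len(xs) < budget:
        # Pick the candidate furthest from all existing samples (pure exploration);
        # a real strategy would also exploit the surrogate, e.g. expected improvement.
        candidates = [random.uniform(lo, hi) for _ in range(100)]
        new_x = max(candidates, key=lambda c: min(abs(c - x) for x in xs))
        xs.append(new_x)
        ys.append(simulate(new_x))
    return xs, ys

xs, ys = adaptive_sampling(10)
print(len(xs))  # 10
</source>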
<br />
Of course sometimes you don't want to do sampling. For example, if you have a fixed dataset you may just want to load all the data in one go and model it. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].<br />
<br />
=== What about dynamical, time dependent data? ===<br />
<br />
The original design and purpose was to tackle static input-output systems, where there is no memory, just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox, but that is typically not a desirable solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1, k+2, ...<br />
<br />
The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.<br />
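<br />
As an illustration of the idea, recasting a time series as a static regression problem is itself simple (a generic sketch; the window length and the data are made up for the example):<br />
<br />
<source lang="python">
def make_lagged_dataset(series, k):
    """Recast time series prediction as static regression:
    the k previous values form the input, the next value is the target."""
    X, y = [], []
    for t in range(k, len(series)):
        X.append(series[t - k:t])
        y.append(series[t])
    return X, y

series = [0, 1, 1, 2, 3, 5, 8, 13]
X, y = make_lagged_dataset(series, k=2)
print(X[0], y[0])  # [0, 1] 1
</source>
Any static regression technique can then be trained on (X, y); evaluating the resulting model recursively yields multi-step predictions.<br />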
<br />
Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].<br />
<br />
For more information on this topic [[Contact]] us.<br />
<br />
=== What about classification problems? ===<br />
<br />
The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].<br />
<br />
=== Can the toolbox drive my simulation code directly? ===<br />
<br />
Yes it can. See the [[Interfacing with the toolbox]] page.<br />
<br />
=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===<br />
<br />
The SUMO Toolbox is a complete, feature-rich framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof of principle.<br />
<br />
=== What happened to the M3-Toolbox? ===<br />
<br />
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.<br />
<br />
=== How can I stay up to date with the latest news? ===<br />
<br />
To stay up to date with the latest news and releases, we recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic is kept to a minimum (one message every 2-3 months) and you can unsubscribe at any time.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== What is the roadmap for the future? ===<br />
<br />
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== Will there be an R/Scilab/Octave/Sage/.. version? ===<br />
<br />
At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:<br />
<br />
* Existing experience and know-how of the development team<br />
* The widespread use of the Matlab platform in the target application domains<br />
* The quality and amount of available Matlab documentation<br />
* The quality and number of Matlab toolboxes<br />
* Support for object orientation (inheritance, polymorphism, etc.)<br />
* Many well documented interfacing options (especially the seamless integration with Java)<br />
<br />
Matlab, as a proprietary platform, definitely has its problems and deficiencies, but the number of advanced algorithms and available toolboxes makes it a very attractive platform. Equally important is the fact that every function is properly documented and tested, and comes with examples, tutorials, and in some cases GUI tools. A lot of things would have been much harder and/or more time-consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus, given our situation, goals, and resources at the time, Matlab was the best choice for us. <br />
<br />
The other platforms remain on our radar, however, and we do look into them from time to time. Still, with our limited resources, porting to one of them is not (yet) cost-effective.<br />
<br />
=== What are collaboration options? ===<br />
<br />
We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.<br />
<br />
The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive, and you want an accurate model while minimizing the number of data points. In this case, in order for us to help you optimally, it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.<br />
<br />
If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.<br />
<br />
In either case (code or dataset) remember:<br />
<br />
* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])<br />
* include a short description of your data:<br />
** number of inputs and number of outputs<br />
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)<br />
** if the outputs are real or complex valued<br />
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).<br />
** if possible the expected range of each output (or scaled if you do not wish to disclose this)<br />
** if possible the names of each input/output + a short description of what they mean<br />
** any further insight you have about the data, expected behavior, expected importance of each input, etc.<br />
<br />
If you have any further questions or comments related to this please [[Contact]] us.<br />
<br />
=== Can you help me model my problem? ===<br />
<br />
Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]<br />
<br />
== Installation and Configuration ==<br />
<br />
=== What is the relationship between Matlab and Java? ===<br />
<br />
Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., <code>s = java.lang.String('hello')</code>). It is possible to disable Java support, but in order to use the SUMO Toolbox it should not be. To check if Java is enabled you can use the <code>usejava</code> command.<br />
<br />
=== What is Java, why do I need it, do I have to install it, etc. ? ===<br />
<br />
The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language instead of a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code, there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].<br />
<br />
=== What is XML? ===<br />
<br />
XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer-programming-language-stuff-thingy. This is of course not the case (we ignore some of the fancy things you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.<br />
<br />
The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:<br />
<br />
<source lang="xml"><br />
<Foo attr="bar">bla bla bla</Foo> <br />
</source><br />
<br />
Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags and attributes is totally up to you :)<br />
<br />
Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:<br />
<br />
<source lang="xml"><br />
<recipe category="dessert"><br />
<title>Pancakes</title><br />
<author>sumo@intec.ugent.be</author><br />
<date>Wed, 14 Jun 95</date><br />
<description><br />
Good old fashioned pancakes.<br />
</description><br />
<ingredients><br />
<item><br />
<amount>3</amount><br />
<type>eggs</type><br />
</item><br />
<br />
<item><br />
<amount>0.5 tablespoon</amount><br />
<type>salt</type><br />
</item><br />
...<br />
</ingredients><br />
<preparation><br />
...<br />
</preparation><br />
</recipe><br />
</source><br />
<br />
So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO uses it to store and structure configuration options, which works well thanks to the naturally hierarchical nature of XML.<br />
<br />
If you understand this, there is nothing more you need in order to understand the SUMO configuration files. If you need more information, see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the Wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]<br />
<br />
=== Why does SUMO use XML? ===<br />
<br />
XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel, for example), to configuration data, to scientific data; there are even whole database systems based solely on XML. In short, it is an intuitive way to structure data and it is used everywhere. As a result, there are a very large number of libraries and programming languages that can parse and handle XML easily, which means less work for the programmer. Then of course there is tooling like XSLT, XQuery, etc. that makes life even easier.<br />
So basically, it would not make sense for SUMO to use any other format :)<br />
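<br />
For example, parsing the pancake recipe from the previous question takes only a few lines using the XML support built into most languages (here Python's standard library; any XML library would do equally well):<br />
<br />
<source lang="python">
import xml.etree.ElementTree as ET

doc = """
<recipe category="dessert">
  <title>Pancakes</title>
  <ingredients>
    <item><amount>3</amount><type>eggs</type></item>
    <item><amount>0.5 tablespoon</amount><type>salt</type></item>
  </ingredients>
</recipe>
"""

recipe = ET.fromstring(doc)
print(recipe.get("category"))    # dessert
print(recipe.findtext("title"))  # Pancakes
for item in recipe.iter("item"):
    print(item.findtext("amount"), item.findtext("type"))
</source>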
<br />
=== I get an error that SUMO is not yet activated ===<br />
<br />
Make sure you installed the activation file that was mailed to you as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27|Java is enabled]]. To fully verify that the activation file is installed correctly, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.<br />
<br />
Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].<br />
<br />
== Upgrading ==<br />
<br />
=== How do I upgrade to a newer version? ===<br />
<br />
Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it with the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab, and make sure the default run works. To port your old configuration files to the new version, make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.<br />
<br />
If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.<br />
<br />
== Using ==<br />
<br />
=== I have no idea how to use the toolbox, what should I do? ===<br />
<br />
See: [[Running#Getting_started]]<br />
<br />
=== I want to try one of the different examples ===<br />
<br />
See [[Running#Running_different_examples]].<br />
<br />
=== I want to model my own problem ===<br />
<br />
See : [[Adding an example]].<br />
<br />
=== I want to contribute some data/patch/documentation/... ===<br />
<br />
See : [[Contributing]].<br />
<br />
=== How do I interface with the SUMO Toolbox? ===<br />
<br />
See : [[Interfacing with the toolbox]].<br />
<br />
=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===<br />
<br />
See [[General_guidelines]].<br />
<br />
=== Ok, I generated a model, what can I do with it? ===<br />
<br />
See: [[Using a model]].<br />
<br />
=== How can I share a model created by the SUMO Toolbox? ===<br />
<br />
See : [[Using a model#Model_portability| Model portability]].<br />
<br />
=== I don't like the final model generated by SUMO, how do I improve it? ===<br />
<br />
Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor one. Then you should rank those properties by how important you find them. Examples are:<br />
<br />
* accuracy in the training data<br />
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing<br />
* accuracy outside the training data<br />
** this is the validation or test error, how important is proper generalization (usually this is very important)<br />
* what does accuracy mean to you? a low maximum error, a low average error, both, ...<br />
* smoothness<br />
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example<br />
* are some regions of the response more important than others?<br />
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important<br />
* are there particular special features that your model should have<br />
** for example, capture underlying poles or discontinuities correctly<br />
* extrapolation capability<br />
* ...<br />
<br />
It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].<br />
<br />
Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].<br />
<br />
Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Possible reasons are:<br />
<br />
* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.<br />
* you should switch to a different model parameter optimization algorithm (e.g., instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)<br />
* the model type you are using is not ideally suited to your data<br />
* there simply is not enough data; use a larger initial design or perform more sampling iterations to get more information per dimension<br />
* maybe the sample distribution is causing trouble for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.<br />
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very small)<br />
<br />
You may also refer to the [[General_guidelines]]. Finally, it may of course be that your problem is simply a very difficult one that does not approximate well. Still, you should at least get something satisfactory.<br />
<br />
If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.<br />
<br />
=== My data contains noise can the SUMO-Toolbox help me? ===<br />
<br />
The original purpose of the SUMO-Toolbox was to be used in conjunction with computer simulations. Since these are fully deterministic, you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data; you will just have to be more careful about how you apply the methods and how you perform model selection. It is only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.<br />
<br />
Note, though, that the toolbox is not a statistical package: if you have noisy data and need noise estimation algorithms, kernel smoothing algorithms, etc., you should look towards other tools.<br />
<br />
=== What is the difference between a ModelBuilder and a ModelFactory? ===<br />
<br />
See [[Add Model Type]].<br />
<br />
=== Why are the Neural Networks so slow? ===<br />
<br />
The ANN models are an extremely powerful model type that gives very good results on many problems. However, they are quite slow to train. There are some things you can do:<br />
<br />
* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.<br />
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.<br />
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.<br />
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.<br />
* If you are using [[Measures#CrossValidation| CrossValidation]], try to switch to a different measure, since CrossValidation is very expensive. Note that CrossValidation is used by default if you have not defined a [[Measures| measure]] yourself. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.<br />
* Finally, as with any model type things will slow down if you have many dimensions or very large amounts of data. If that is the case, try some dimensionality reduction or subsampling techniques.<br />
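<br />
The rescaling suggested above is a one-liner in most environments; a minimal sketch with made-up response values:<br />
<br />
<source lang="python">
def rescale(ys, lo=-1.0, hi=1.0):
    """Linearly map response values to [lo, hi] so the network sees a
    sane output range instead of, e.g., values of the order of 1e-9."""
    y_min, y_max = min(ys), max(ys)
    return [lo + (hi - lo) * (y - y_min) / (y_max - y_min) for y in ys]

print(rescale([0.0, 5e-9, 1e-8]))  # [-1.0, 0.0, 1.0]
</source>
Remember to apply the inverse mapping when using the model's predictions afterwards.<br />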
<br />
See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]<br />
<br />
=== How can I make the toolbox run faster? ===<br />
<br />
There are a number of things you can do to speed things up. These are listed below. Remember though that the main reason the toolbox may seem to be slow is due to the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.<br />
<br />
* First of all check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates this really slows things down and your computer becomes unusable.<br />
<br />
* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files<br />
<br />
* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between two sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the genetic model builders, reduce the population size and the number of generations.<br />
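<br />
As an illustration, such an option is typically set in the configuration file along these lines (a hypothetical sketch: the model builder type and the exact element layout depend on your version's <code>default.xml</code>, so check the comments there):<br />
<br />
<source lang="xml">
<AdaptiveModelBuilder type="anngenetic">
  <!-- Hypothetical illustration: build at most 50 models per modeling iteration -->
  <Option key="maxFunEvals" value="50"/>
</AdaptiveModelBuilder>
</source>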
<br />
* If you are using [[Measures#CrossValidation]], see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])<br />
<br />
* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.<br />
<br />
* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true"><br />
<Output type="toImage"/><br />
<Output type="toFile"/><br />
</Profiler><br />
<br />
<Profiler name=".*" enabled="true"><br />
<Output type="toFile"/><br />
</Profiler><br />
</Profiling><br />
</source><br />
<br />
The ".*" means match any sequence of characters, including none ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus, in this example, all the profilers that have "share", "ensemble", or "Level" in their name should be enabled and saved as a text file (toFile) AND as an image file (toImage). All the other profilers are saved just to file. The idea is to only save to image what you actually want as an image, since image generation is expensive. If you do this, or switch off image generation completely, you will see everything run much faster.<br />
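<br />
You can quickly check what such a pattern matches with any compatible regex engine (the profiler names below are made up for the example):<br />
<br />
<source lang="python">
import re

# The same pattern as in the profiler example above.
pattern = re.compile(r".*share.*|.*ensemble.*|.*Level.*")

for name in ["modelshare", "ensembleWeights", "LevelPlot", "accuracy"]:
    # fullmatch mirrors Java's Matcher.matches(): the whole name must match.
    print(name, bool(pattern.fullmatch(name)))
</source>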
<br />
* Decrease the logging granularity; a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.<br />
<br />
* If you have a multi-core/multi-cpu machine:<br />
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.<br />
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)<br />
<br />
* Don't use the Min-Max measure; it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]<br />
<br />
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]<br />
<br />
* If you are having problems with very slow or seemingly hanging runs:<br />
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_env/f9-17018.html Matlab profiler] and see where most time is spent.<br />
<br />
** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange. <br />
<br />
* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].<br />
<br />
To check if your SUMO run has hung, monitor your log file (with the level set to at least FINE). If you see no changes for about 30 minutes, the toolbox has probably stalled; [[Reporting problems| report the problem here]].<br />
<br />
Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.<br />
<br />
=== How do I build models with more than one output? ===<br />
<br />
Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===<br />
<br />
See : [[Adaptive Modeling Mode]].<br />
<br />
=== How do I change the error function (relative error, RMSE, ...)? ===<br />
<br />
The [[Measures| <Measure>]] tag specifies the algorithm used to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' to use within the measure. The default error function is '<code>rootRelativeSquareError</code>'.<br />
<br />
Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/><br />
</source><br />
<br />
On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/><br />
</source><br />
<br />
These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important! Make sure you think carefully about it. Also see [[Multi-Objective Modeling]].<br />
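<br />
For reference, the root relative square error is commonly defined as the squared-error sum normalized by the spread of the true values, so a score of 1.0 means "no better than always predicting the mean". A minimal sketch of that standard definition (an illustration, not necessarily the exact SUMO implementation):<br />
<br />
<source lang="python">
import math

def root_relative_square_error(y_true, y_pred):
    # sqrt( sum((y - yhat)^2) / sum((y - mean(y))^2) )
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return math.sqrt(num / den)

print(root_relative_square_error([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 1.0
</source>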
<br />
=== How do I enable more profilers? ===<br />
<br />
Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.<br />
<br />
=== What regular expressions can I use to filter profilers? ===<br />
<br />
See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].<br />
<br />
=== How can I ensure deterministic results? ===<br />
<br />
See : [[Random state]].<br />
<br />
=== How do I get a simple closed-form model (symbolic expression)? ===<br />
<br />
See : [[Using a model]].<br />
<br />
=== How do I enable the Heterogeneous evolution to automatically select the best model type? ===<br />
<br />
Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.<br />
<br />
=== What is the combineOutputs option? ===<br />
<br />
See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== What error function should I use? ===<br />
<br />
The default error function is the Root Relative Square Error (RRSE). meanRelativeError may be more intuitive, but then you have to be careful if you have function values close to zero, since there the relative error explodes or even becomes infinite. You could also use one of the combined relative error functions (these contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error, which is hard to interpret.<br />
<br />
So, to be safe, an absolute error (like the RMSE) seems the best bet; however, in that case you have to come up with sensible accuracy targets and realize that you will build models that fit the regions of high absolute value better than the low ones.<br />
<br />
Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.<br />
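<br />
As a quick illustration of the relative-error problem mentioned above (a sketch only; the actual error-function implementations in <code>src/matlab/tools/errorFunctions</code> may differ in detail), a fixed absolute deviation produces an exploding relative error as the true value approaches zero:<br />
<br />
<source lang="matlab"><br />
% true values, the last one close to zero<br />
y    = [10 1 0.001];<br />
yhat = y + 0.01;                 % a fixed absolute deviation of 0.01<br />
absErr = abs(yhat - y)           % absolute error: 0.01 everywhere<br />
relErr = abs(yhat - y)./abs(y)   % relative error: 0.001, 0.01, 10 -> explodes near zero<br />
</source><br />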
<br />
A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].<br />
<br />
=== I just want to generate an initial design (no sampling, no modeling) ===<br />
<br />
Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.<br />
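<br />
Such a configuration could look like this (a sketch; the rest of the <SUMO> tag is elided and the exact placement of the option may differ, check default.xml):<br />
<br />
<source lang="xml"><br />
<SUMO><br />
  <Option key="maxModelingIterations" value="0"/><br />
</SUMO><br />
</source><br />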
<br />
=== How do I start a run with the samples of a previous run, or with a custom initial design? ===<br />
<br />
Use a Dataset design component, for example:<br />
<br />
<source lang="xml"><br />
<InitialDesign type="DatasetDesign"><br />
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/><br />
</InitialDesign><br />
</source><br />
<br />
The points of a previous run can be found in the samples.txt file in the output directory of the run you want to continue.<br />
<br />
As a side note, you can start the toolbox with the ''data points'' of a previous run, but not with the ''models'' of a previous run.<br />
<br />
=== What is a level plot? ===<br />
<br />
A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:<br />
<gallery><br />
Image:levelplot.png<br />
</gallery><br />
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.<br />
<br />
===I am getting a java out of memory error, what happened?===<br />
Datasets are loaded through java. This means that the java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might experience problems with the maximum heap size. You can solve this by raising the heap size as described on the following webpage:<br />
[http://www.mathworks.com/support/solutions/data/1-18I2C.html]<br />
<br />
=== How do I force the output of the model to lie in a certain range ===<br />
<br />
See [[Measures#MinMax]].<br />
<br />
=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===<br />
<br />
That depends. Remember that the main focus of SUMO is to generate accurate ''global'' models. If you want to do sampling, the practical dimensionality is limited to around 6-8 (though this depends on the problem and on how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. Beyond that you need to see if you can extend the models with domain-specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you don't need to do sample selection but have a fixed dataset that you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM-type models are independent of the dimension and can thus always be applied. Still, feature selection is always recommended.<br />
<br />
=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===<br />
<br />
When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question but this is not always straightforward. In those cases a whole set of algorithms exist for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox but are not currently implemented. That is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.<br />
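<br />
A call to this function could look something like the following (a hedged sketch: the model file name is hypothetical and the exact signature of <code>ARD()</code> may differ, see [[Using_a_model#Available_methods]]):<br />
<br />
<source lang="matlab"><br />
% load a previously generated LS-SVM model (run startup.m first so the<br />
% SUMO classes are on the Matlab path)<br />
s = load('my_lssvm_model.mat');   % hypothetical file name<br />
relevantInputs = ARD(s.model)     % list of the inputs the model deems most important<br />
</source><br />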
<br />
=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===<br />
<br />
When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., prepares the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).<br />
<br />
Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable with the [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.<br />
<br />
HOWEVER, there is one important consideration: Matlab does not support threads so this means that if you use a matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. This means the modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient.<br />
<br />
On the other hand, using a shell script/executable, does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).<br />
<br />
As a side note, if you have already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. It is not very elegant and adds some overhead, but depending on your situation it may be worth it.<br />
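<br />
Such a wrapper could look something like this (a sketch only; <code>mySimulationScript</code> is a hypothetical name for your own Matlab interface script):<br />
<br />
<source lang="bash"><br />
#!/bin/sh<br />
# run the Matlab interface script non-interactively, then exit Matlab again<br />
matlab -nodisplay -nojvm -r "mySimulationScript; exit" < /dev/null<br />
</source><br />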
<br />
=== How can I look at the internal structure of a SUMO model ===<br />
<br />
See [[Using_a_model#Available_methods]].<br />
<br />
=== Is there any design documentation available? ===<br />
<br />
An in depth overview of the rationale and philosophy, including a treatment of the software architecture underlying the SUMO Toolbox is available in the form of a PhD dissertation. A copy of this dissertation [http://www.sumo.intec.ugent.be/?q=system/files/2010_04_PhD_DirkGorissen.pdf is available here].<br />
<br />
== Troubleshooting ==<br />
<br />
=== I have a problem and I want to report it ===<br />
<br />
See : [[Reporting problems]].<br />
<br />
=== I sometimes get flat models when using rational functions ===<br />
<br />
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.<br />
<br />
The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. We have found that some models simply end up with a flat model as the best least squares fit. There are several possible causes:<br />
<br />
* The number of sample points is few, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always exactly interpolate. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code> you still get weird, spiky, models you simply need more samples or you should switch to a different model type.<br />
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.<br />
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.<br />
<br />
=== When using rational functions I sometimes get 'spikes' (poles) in my model ===<br />
<br />
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected, these models get replaced by better ones and the spikes should disappear.<br />
<br />
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.<br />
<br />
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:<br />
<br />
* Simply wait for more data, usually spikes disappear (but not always).<br />
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.<br />
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.<br />
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.<br />
* Increase the population size if you are using the genetic version.<br />
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]], it was designed to get rid of this problem more quickly, but it only selects one sample at the time.<br />
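<br />
For instance, the MinMax suggestion above amounts to declaring a MinMax measure next to your existing one. The following is only a sketch: the bound values are placeholders and the exact option names are assumptions, so check [[Measures#MinMax]] and default.xml for the real syntax:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001"/><br />
<Measure type="MinMax"><br />
  <!-- placeholder bounds: the expected output range of your simulator --><br />
  <Option key="min" value="-1"/><br />
  <Option key="max" value="1"/><br />
</Measure><br />
</source><br />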
<br />
However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time; automatic order selection is not an easy problem.<br />
<br />
=== There is no noise in my data yet the rational functions don't interpolate ===<br />
<br />
[[FAQ#I sometimes get flat models when using rational functions |see this question]].<br />
<br />
=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===<br />
<br />
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path. <br />
<br />
In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.<br />
<br />
=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===<br />
<br />
This means you have specified a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist, but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which id's are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.<br />
<br />
=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===<br />
<br />
This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files, the Matlab implementation will be used and this will not cause any problems.<br />
<br />
=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===<br />
<br />
This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure the FANN libraries (stored in src/matlab/contrib/fann/src/.libs/) are in your library path, e.g., on unix systems, make sure they are included in LD_LIBRARY_PATH.<br />
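<br />
On Linux, for example, you could do something like the following before starting Matlab (a sketch; adapt the leading path to wherever your SUMO installation lives):<br />
<br />
<source lang="bash"><br />
# make the FANN shared libraries visible to Matlab<br />
export LD_LIBRARY_PATH=/path/to/SUMO/src/matlab/contrib/fann/src/.libs:$LD_LIBRARY_PATH<br />
matlab<br />
</source><br />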
<br />
=== Undefined function or method 'createFann' for input arguments of type 'double'. ===<br />
<br />
See [[FAQ#When_using_FANN_models_I_sometimes_get_.22Invalid_MEX-file_createFann.mexa64.2C_libfann.so.2:_cannot_open_shared_object_file:_No_such_file_or_directory..22]]<br />
<br />
=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===<br />
<br />
You forgot to build the SVM mex files for your platform. For windows they are pre-compiled for you, on other systems you have to compile them yourself with the makefile.<br />
<br />
=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===<br />
<br />
First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].<br />
<br />
This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.<br />
<br />
If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:<br />
<br />
# typing 'usejava jvm' should return 1 <br />
# typing 's = java.lang.String', this should ''not'' give an error<br />
# typing 'version('-java')' should return at least version 1.5.0<br />
<br />
If (1) returns 0, then the jvm of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?)<br />
If (2) fails but (1) is ok, there is a very weird problem, check the Matlab documentation.<br />
If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer, jvm (See the Matlab docs for how to do this).<br />
<br />
=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===<br />
<br />
You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.<br />
<br />
=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===<br />
<br />
See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]<br />
<br />
=== I get a warning about duplicate samples being selected, why is this? ===<br />
<br />
Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.<br />
<br />
=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===<br />
<br />
There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].<br />
<br />
# '''Sampling off'''<br />
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does [[reporting problems|report it as a bug]]<br />
## ''Multi objective'': There is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi-objective enabled modeling algorithm.<br />
# '''Sampling on'''<br />
## ''Single objective'': inside each modeling iteration the error should always decrease monotonically. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase; this is normal behavior (*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the data points is causing numerical problems).<br />
## ''Multi objective'': Combination of 1.2 and 2.1.<br />
<br />
(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!<br />
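<br />
Setting that option could look something like this (a sketch; the rest of the <SUMO> tag is elided, check default.xml for the exact placement):<br />
<br />
<source lang="xml"><br />
<SUMO><br />
  <Option key="keepOldModels" value="true"/><br />
</SUMO><br />
</source><br />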
<br />
=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===<br />
<br />
This is normal, the warning printed out before the error explains why:<br />
<br />
''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''<br />
<br />
By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".<br />
Note that there is nothing to worry about: everything has run correctly, it is just the movie creation that fails.<br />
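<br />
Disabling movie creation could look like this (a sketch; the rest of the <SUMO> tag is elided and the exact placement may differ in your default.xml):<br />
<br />
<source lang="xml"><br />
<SUMO><br />
  <Option key="createMovie" value="false"/><br />
</SUMO><br />
</source><br />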
<br />
=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===<br />
<br />
This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive, this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox sub directory! This is very important.<br />
<br />
If you still have problems you can override the default logfile name and location as follows:<br />
<br />
In the <FileHandler> tag inside the <Logging> tag add the following option:<br />
<br />
<source lang="xml"><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</source><br />
<br />
This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.<br />
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].<br />
<br />
=== The Toolbox crashes with "Too many open files" what should I do? ===<br />
<br />
This is a known bug, see [[Known_bugs#Version_6.1]].<br />
<br />
If this does not fix your problem then do the following:<br />
<br />
On Windows, try increasing the limit as dictated by the error message. Also, when you get the error, use the <code>fopen('all')</code> command to see which files are open and send us the list of filenames; then we can help you debug the problem further. Even better is to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can further debug the problem.<br />
<br />
On Linux again don't shut down Matlab but:<br />
<br />
* open a new terminal window<br />
* type:<br />
<source lang="bash"><br />
lsof > openFiles.txt<br />
</source><br />
* Then [[Contact|send us]] the following information:<br />
** the file openFiles.txt <br />
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).<br />
** the output of<br />
<source lang="bash"><br />
uname -a ; df -T ; mount<br />
</source><br />
<br />
As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.<br />
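<br />
Raising the limit for the current shell before starting Matlab could look like this (a sketch; the value is a placeholder, and raising system-wide limits may require root access as described in the link above):<br />
<br />
<source lang="bash"><br />
# raise the per-process open-file limit for this shell, then start Matlab<br />
ulimit -n 4096<br />
matlab<br />
</source><br />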
<br />
In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.<br />
<br />
=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===<br />
<br />
The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files, the Matlab implementation will be used and a warning is printed, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]] (this can be done very easily), or simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.<br />
<br />
=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===<br />
<br />
You most likely forgot to [[Installation#Extension_pack|install the extension pack]].<br />
<br />
=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===<br />
<br />
Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics, your system completely freezes (you have to reset manually) or your computer does a full reboot; luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; it's just that you have to log in again since your graphical session has terminated. This FAQ entry only covers the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.<br />
<br />
So what happens is that, after a few seconds, when the toolbox wants to plot the first model, [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the interaction between Matlab and the display server.<br />
<br />
What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:<br />
<br />
* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])<br />
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)<br />
<br />
<source lang="bash"><br />
/etc/X11/xorg.conf<br />
</source><br />
<br />
Note: the exact location of the xorg.conf file may vary on your system.<br />
<br />
* Look for the following line:<br />
<br />
<source lang="bash"><br />
Load "glx"<br />
</source><br />
<br />
* Comment it out by replacing it by:<br />
<br />
<source lang="bash"><br />
# Load "glx"<br />
</source><br />
<br />
* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)<br />
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].<br />
<br />
Note:<br />
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]<br />
* if you wish to debug further yourself please check the Xorg log files and those in /var/log<br />
* another possible workaround is to start matlab with the "-nodisplay" option. That could work as well.<br />
<br />
=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===<br />
<br />
This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.<br />
<br />
=== The toolbox seems to keep on running forever, when or how will it stop? ===<br />
<br />
The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].<br />
<br />
Of course, choosing a-priori targets up front is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). If in doubt you can always use a small value (or 0) and then simply quit the running toolbox using Ctrl-C when you think it has run long enough.<br />
<br />
While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Measures&diff=5212Measures2010-08-24T15:54:46Z<p>Dgorissen: /* ValidationSet */</p>
<hr />
<div>== What is a Measure ==<br />
A crucial aspect of generating models is estimating their quality. Some metric is needed to decide whether a model is poor, good, very good, etc. In the SUMO Toolbox this role is played by the ''Measure'' component. A measure is an object that, given a model, returns an estimate of its quality. This can be something as simple as the error of the model fit on the training data, or it may involve a complex calculation to see if a model satisfies some physical constraint.<br />
<br />
The Measure is used in the SUMO Toolbox to drive the [[Add_Model_Type#Models.2C_Model_builders.2C_and_Factories|model parameter optimization]] (see also the [[Running#Understanding_the_control_flow|toolbox control flow]]). This can be done in a single objective or [[Multi-Objective Modeling|multi-objective]] fashion.<br />
<br />
There are two aspects to a Measure:<br />
<br />
# The quality estimation algorithm<br />
# The error function<br />
<br />
The first is the algorithm used to estimate the model quality: for example, the in-sample error, or the 5-fold crossvalidation score. The error function determines what kind of error you want to use. You can calculate the in-sample error using the average absolute error, the root mean square error, a maximum relative error, etc. Note that the error function may not be relevant for every type of measure (e.g., AIC).<br />
<br />
It cannot be stressed enough that:<br />
<br />
<center>'''A proper choice of Measure and Error function is CRUCIAL to the success of your modeling'''</center><br />
<br />
That choice will depend on your problem characteristics, the data distribution and the model type you will use to fit the data. It is '''extremely important''' to think about this well ("what do you want?") before starting any modeling.<br />
<br />
As a side remark, note that Measures and [[SampleSelector|sample selectors]] are closely related: the same criterion that is used to decide whether a model is good or bad can be used to identify interesting locations at which to select new data points.<br />
<br />
Also note that some model builders (notably GeneticModelBuilder) support constraints directly. This means you can also implement what you would otherwise implement as a Measure as constraint in the model parameter optimization algorithm itself. This is a superior solution to using a Measure.<br />
<br />
== Using Measures ==<br />
<br />
A recommended read is [http://www.sumo.intec.ugent.be/files/2009_08_EWC.pdf the paper available here].<br />
<br />
In general, the default measure, 5-fold CrossValidation, is an acceptable choice. However, it is also very expensive, as it requires that a model be re-trained for each fold. This may slow things down if a model is expensive to train (e.g., neural nets). CrossValidation can also give biased results if the data is clustered or scarce; increasing the number of folds may help here. A cheaper alternative is ValidationSet (see below) or AIC. For a full list of available measures see the <code>src/matlab/measures</code> subdirectory.<br />
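<br />
In the configuration file a measure is declared with a <Measure> tag (see the ValidationSet example further down this page). As an illustrative sketch only (where exactly this tag goes depends on your configuration file), switching from the default to a cheaper measure is a one-line change:<br />
<br />
<source lang="xml"><br />
<!-- the default: 5-fold cross-validation --><br />
<Measure type="CrossValidation" target="0.001"/><br />
<br />
<!-- a cheaper alternative (the target accuracy may simply be omitted) --><br />
<Measure type="AIC"/><br />
</source><br />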
<br />
Note that multiple measures may also be combined. For more information see [[Multi-Objective Modeling]].<br />
<br />
For how to change the error function see [[FAQ#How_do_I_change_the_error_function_.28relative_error.2C_RMS.2C_....29.3F| this FAQ entry]].<br />
<br />
Below is a list of some available measures and the configuration options available for each of them. Each measure also has a target accuracy attribute, which can be omitted and which defaults to 0.001. In certain cases, such as the binary MinMax measure, the target accuracy is irrelevant.<br />
<br />
== Defining your own Measure ==<br />
<br />
see [[Add_Measure]]<br />
<br />
== Measure types ==<br />
<br />
<br />
''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can, but much is limited by available time and manpower. We are a university research group, after all. The most up to date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''<br />
<br />
=== CrossValidation ===<br />
<br />
The CrossValidation measure is the default choice and performs an n-fold cross-validation on the model to obtain an efficient estimate of its accuracy. Several options are available to customize this measure.<br />
<br />
{{OptionsHeader}}<br />
{{Option<br />
|name = folds<br />
|values = positive integer<br />
|default = 5<br />
|description = The number of folds used for the measure. A higher number means that more models will be built, but that a better accuracy estimate is achieved.<br />
}}<br />
{{Option<br />
|name = randomThreshold<br />
|values = positive integer<br />
|default = 1000<br />
|description = If the number of samples is greater than this threshold, a random partitioning is used.<br />
}}<br />
{{Option<br />
|name = partitionMethod<br />
|values = [uniform,random]<br />
|default = uniform<br />
|description = This option defines whether the test sets for the folds are chosen randomly, or are chosen in such a way as to maximize the domain coverage. Random is generally much faster, but might result in pessimistic scoring, as unlucky test set choice can result in an inaccurate error. This can partly be fixed by enabling the resetFolds option.<br />
}}<br />
{{Option<br />
|name = resetFolds<br />
|values = boolean<br />
|default = no<br />
|description = Folds are generated from scratch for each model that is evaluated using this measure. If the same model is evaluated twice (for example, after a rebuild), new folds are used. Enabling this feature can be very costly for large sample sizes. As a rule of thumb, enable this in combination with the random partition method, or disable this when using the uniform method.<br />
}}<br />
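<br />
As a configuration sketch (tag syntax as in the ValidationSet example below; option names and values taken from the table above), a CrossValidation measure using random partitioning with folds regenerated for each model could look like:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001"><br />
  <Option key="folds" value="5"/><br />
  <Option key="partitionMethod" value="random"/><br />
  <Option key="resetFolds" value="yes"/><br />
</Measure><br />
</source><br />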
<br />
=== ValidationSet ===<br />
<br />
The ValidationSet measure has two different methods of operation.<br />
<br />
# In the first method, the list of samples that have been evaluated is split into a validation set and a training set. A model is then built using the training set, and evaluated using the validation set (which is by default 20% of the total sample pool).<br />
# However, an external data file containing a validation set can also be specified. In this case, all the evaluated samples are used for training, and the external set is used for validation only. Which of these two methods is used depends on the configuration options below. By default, no external validation set is loaded.<br />
<br />
If you want to use an external validation set, you will have to provide a SampleEvaluator configuration so that the validation set can be loaded from an external source. Here is a ValidationSet configuration example which loads the validation set from the scattered data file provided in the simulator file:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target=".001"><br />
<Option key="type" value="file"/><br />
<SampleEvaluator type="ibbt.sumo.sampleevaluators.datasets.ScatteredDatasetSampleEvaluator"/><br />
</Measure><br />
</source><br />
<br />
{{OptionsHeader}}<br />
{{Option<br />
|name = type<br />
|values = [distance, random, file]<br />
|default = distance<br />
|description = Method used to acquire samples for the validation set. The default method, 'distance', tries to select a validation set which covers the entire domain as well as possible, ensuring that not all validation samples are chosen in the same part of the domain. This is achieved using a distance heuristic, which gives no guarantees on optimal coverage but performs very well in almost all situations. The 'random' method just picks a random set of samples from the entire pool to be used as the validation set.<br />
Finally, the 'file' method does not take samples at all from the pool, but loads a validation set from an external dataset.<br />
}}<br />
{{Option<br />
|name = percentUsed<br />
|values = [0,100]<br />
|default = 20<br />
|description = Percent of samples used for the validation set. By default 20% of all samples are used for validation, while the remaining 80% are used for training. This option is irrelevant if the 'type' option is set to 'file'.<br />
}}<br />
{{Option<br />
|name = randomThreshold<br />
|values = positive integer<br />
|default = 1000<br />
|description = When the sample pool is very large, the distance heuristic used by default becomes too slow, and the toolbox switches to random sample selection automatically. This is done when the number of samples is larger than this value, which defaults to 1000. This option should not be changed unless the performance is unacceptable even for sample sets smaller than this amount.<br />
}}<br />
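<br />
For the first mode of operation (an internal split, no external file), no SampleEvaluator is needed. A sketch using a random 30/70 validation/training split, with the option values documented above:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001"><br />
  <Option key="type" value="random"/><br />
  <Option key="percentUsed" value="30"/><br />
</Measure><br />
</source><br />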
<br />
=== MinMax ===<br />
<br />
The MinMax measure is used to eliminate models whose response falls below a given minimum or above a given maximum. This measure can be used to detect models that have poles in the model domain and to guide the modeling process in the right direction. If the output is known to lie within certain value bounds, these can be added to the simulator file as follows:<br />
<br />
<source lang="xml"><br />
<OutputParameters><br />
<Parameter name="out" type="real" minimum="-1" maximum="1"/><br />
</OutputParameters><br />
</source><br />
<br />
When the MinMax measure is defined, these values will be used to ensure that all models stay within these bounds. If only the minimum or only the maximum is defined, naturally only these are enforced. There are no further configuration options for this measure. In case of complex outputs the modulus is used.<br />
<br />
Remember, though, that no guarantee can be given that the poles will really disappear. Using this measure only combats the symptoms, not the cause of the problem. Also, this measure can be reasonably slow, because it evaluates the model on a dense grid to decide whether it crosses the boundaries; if the model is slow to evaluate, this can take a considerable amount of time. Moreover, in higher dimensions the grid becomes sparser, so no absolute certainty is given.<br />
Finally, note that this is quite a strong constraint on the model building process. It means that a model which is otherwise very good, but simply overshoots the data in one place, will be penalized quite heavily. Thus, if you can, try not to give too strict bounds.<br />
<br />
'''Tips''':<br />
# even if you don't know the exact bounds on your output, you can still use this measure by specifying very broad bounds (e.g., [-10000 10000]). This can still allow you to catch poles since, by definition, they extend to infinity.<br />
# if you are using the ANN models and you want your output to lie within [0 1] you can simply use a 'logsig' transfer function for the output layer. Then BY DEFINITION your output will lie in [0 1] and no extra measure is needed. To get a different range you could even rename logsig.m and add your own scaling. This solution is far superior to using MinMax.<br />
# if you are using the GeneticModelBuilder with population type custom, the MinMax idea can also be implemented in the GA itself as a constraint. Then you do not need the measure; this is also a better solution.<br />
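<br />
Tip 1 in configuration terms: even rough knowledge of the output scale can be encoded in the simulator file, using the same syntax as above (the bounds below are purely illustrative):<br />
<br />
<source lang="xml"><br />
<OutputParameters><br />
  <Parameter name="out" type="real" minimum="-10000" maximum="10000"/><br />
</OutputParameters><br />
</source><br />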
<br />
=== SampleError ===<br />
<br />
This measure simply calculates the error in the training data. Note that this measure is useless for interpolating methods like Kriging and RBF.<br />
<br />
=== ModelDifference ===<br />
<br />
The toolbox keeps track of the n best models found so far. The ModelDifference measure uses the disagreement between those models as a heuristic for ranking them. A model that differs considerably from the other models is assumed to be of poor quality. Remember that this is just a heuristic! We recommend that ModelDifference never be used alone, but always in combination with some other measure.<br />
<br />
=== LRMMeasure ===<br />
<br />
This is a very useful measure that can be used to complement other measures (see [[Multi-Objective Modeling]]) with good results, for example as a cheaper alternative to cross-validation with neural networks. It penalizes models that show unwanted 'bumps' or 'ripples' in the response.<br />
<br />
=== AIC ===<br />
<br />
Implements [http://en.wikipedia.org/wiki/Akaike_information_criterion Akaike's Information Criterion]. Note that this requires a proper implementation of the freeParams of a Model.</div>
<hr />
<div>
<br />
Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sample selection strategy that balances local and global search. In other words, it balances exploring the input space with exploiting the information the surrogate gives us.<br />
<br />
A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].<br />
<br />
=== What is (adaptive) sampling? Why is it used? ===<br />
<br />
In classical Design of Experiments you need to specify the design of your experiment up-front. In other words, you have to say up-front how many data points you need and how they should be distributed. Two examples are Central Composite Designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., by an expensive simulation code) it is not clear up-front how many points are needed. Instead, data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in the regions that are most interesting is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere, so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].<br />
SUMO provides a number of different sampling algorithms: [[SampleSelector]]<br />
<br />
Of course, sometimes you don't want to do sampling. For example, if you have a fixed dataset you just want to load all the data in one go and model it. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].<br />
<br />
=== What about dynamical, time dependent data? ===<br />
<br />
The original design and purpose was to tackle static input-output systems, where there is no memory. Just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox but that typically is not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...<br />
<br />
The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.<br />
<br />
Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].<br />
<br />
For more information on this topic [[Contact]] us.<br />
<br />
=== What about classification problems? ===<br />
<br />
The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].<br />
<br />
=== Can the toolbox drive my simulation code directly? ===<br />
<br />
Yes it can. See the [[Interfacing with the toolbox]] page.<br />
<br />
=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===<br />
<br />
The SUMO toolbox is a complete, feature-full framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof-of-principle.<br />
<br />
=== What happened to the M3-Toolbox? ===<br />
<br />
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.<br />
<br />
=== How can I stay up to date with the latest news? ===<br />
<br />
To stay up to date with the latest news and releases, we recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== What is the roadmap for the future? ===<br />
<br />
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== Will there be an R/Scilab/Octave/Sage/.. version? ===<br />
<br />
At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:<br />
<br />
* Existing experience and know-how of the development team<br />
* The widespread use of the Matlab platform in the target application domains<br />
* The quality and amount of available Matlab documentation<br />
* The quality and number of Matlab toolboxes<br />
* Support for object orientation (inheritance, polymorphism, etc.)<br />
* Many well documented interfacing options (especially the seamless integration with Java)<br />
<br />
Matlab, as a proprietary platform, definitely has its problems and deficiencies but the number of advanced algorithms and available toolboxes make it a very attractive platform. Equally important is the fact that every function is properly documented, tested, and includes examples, tutorials, and in some cases GUI tools. A lot of things would have been a lot harder and/or time consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus given our situation, goals, and resources at the time, Matlab was the best choice for us. <br />
<br />
The other platforms remain on our radar, however, and we do look into them from time to time. Still, with our limited resources, porting to one of those platforms is not (yet) cost-effective.<br />
<br />
=== What are collaboration options? ===<br />
<br />
We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.<br />
<br />
The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to optimally help you it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.<br />
<br />
If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.<br />
<br />
In either case (code or dataset) remember:<br />
<br />
* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])<br />
* include a short description of your data:<br />
** number of inputs and number of outputs<br />
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)<br />
** if the outputs are real or complex valued<br />
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).<br />
** if possible the expected range of each output (or scaled if you do not wish to disclose this)<br />
** if possible the names of each input/output + a short description of what they mean<br />
** any further insight you have about the data, expected behavior, expected importance of each input, etc.<br />
<br />
If you have any further questions or comments related to this please [[Contact]] us.<br />
<br />
=== Can you help me model my problem? ===<br />
<br />
Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]<br />
<br />
== Installation and Configuration ==<br />
<br />
=== What is the relationship between Matlab and Java? ===<br />
<br />
Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable java support but in order to use the SUMO Toolbox it should not be. To check if Java is enabled you can use the 'usejava' command.<br />
<br />
=== What is Java, why do I need it, do I have to install it, etc. ? ===<br />
<br />
The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language rather than a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].<br />
<br />
=== What is XML? ===<br />
<br />
XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.<br />
<br />
The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:<br />
<br />
<source lang="xml"><br />
<Foo attr="bar">bla bla bla</Foo> <br />
</source><br />
<br />
Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags and attributes is totally up to you :)<br />
<br />
Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:<br />
<br />
<source lang="xml"><br />
<recipe category="dessert"><br />
<title>Pancakes</title><br />
<author>sumo@intec.ugent.be</author><br />
<date>Wed, 14 Jun 95</date><br />
<description><br />
Good old fashioned pancakes.<br />
</description><br />
<ingredients><br />
<item><br />
<amount>3</amount><br />
<type>eggs</type><br />
</item><br />
<br />
<item><br />
<amount>0.5 tablespoon</amount><br />
<type>salt</type><br />
</item><br />
...<br />
</ingredients><br />
<preparation><br />
...<br />
</preparation><br />
</recipe><br />
</source><br />
<br />
So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO uses it to store and structure configuration options, and this works well due to the nicely hierarchical nature of XML.<br />
<br />
If you understand this there is nothing else to it in order to be able to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]<br />
<br />
=== Why does SUMO use XML? ===<br />
<br />
XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel, for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it is an intuitive way to structure data and it is used everywhere. As a result, there are a very large number of libraries and programming languages available that can parse and handle XML easily. That means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.<br />
So basically, it would not make sense for SUMO to use any other format :)<br />
<br />
=== I get an error that SUMO is not yet activated ===<br />
<br />
Make sure you installed the activation file that was mailed to you as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [http://www.sumowiki.intec.ugent.be/index.php/FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27 Java is enabled]. To fully verify that the activation file installation is correct, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.<br />
<br />
Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].<br />
<br />
== Upgrading ==<br />
<br />
=== How do I upgrade to a newer version? ===<br />
<br />
Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.<br />
<br />
If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.<br />
<br />
== Using ==<br />
<br />
=== I have no idea how to use the toolbox, what should I do? ===<br />
<br />
See: [[Running#Getting_started]]<br />
<br />
=== I want to try one of the different examples ===<br />
<br />
See [[Running#Running_different_examples]].<br />
<br />
=== I want to model my own problem ===<br />
<br />
See : [[Adding an example]].<br />
<br />
=== I want to contribute some data/patch/documentation/... ===<br />
<br />
See : [[Contributing]].<br />
<br />
=== How do I interface with the SUMO Toolbox? ===<br />
<br />
See : [[Interfacing with the toolbox]].<br />
<br />
=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===<br />
<br />
See [[General_guidelines]].<br />
<br />
=== Ok, I generated a model, what can I do with it? ===<br />
<br />
See: [[Using a model]].<br />
<br />
=== How can I share a model created by the SUMO Toolbox? ===<br />
<br />
See : [[Using a model#Model_portability| Model portability]].<br />
<br />
=== I dont like the final model generated by SUMO how do I improve it? ===<br />
<br />
Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor model. Then you should rank those properties depending on how important you find them. Examples are:<br />
<br />
* accuracy in the training data<br />
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing<br />
* accuracy outside the training data<br />
** this is the validation or test error, how important is proper generalization (usually this is very important)<br />
* what does accuracy mean to you? a low maximum error, a low average error, both, ...<br />
* smoothness<br />
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example<br />
* are some regions of the response more important than others?<br />
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important<br />
* are there particular special features that your model should have<br />
** for example, capture underlying poles or discontinuities correctly<br />
* extrapolation capability<br />
* ...<br />
<br />
It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].<br />
<br />
Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].<br />
<br />
Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:<br />
<br />
* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.<br />
* you should switch to a different model parameter optimization algorithm (e.g., for example instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder.)<br />
* the model type you are using is not ideally suited to your data<br />
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension<br />
* maybe the sample distribution is causing troubles for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.<br />
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very small)<br />
<br />
See also the [[General_guidelines]]. Finally, it may of course be that your problem is simply a very difficult one that does not approximate well. Still, you should at least get something satisfactory.<br />
<br />
If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.<br />
<br />
=== My data contains noise, can the SUMO-Toolbox help me? ===<br />
<br />
The SUMO-Toolbox was originally designed to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will have to be more careful about how you apply the methods and how you perform model selection. It is only when you use the toolbox with a noisy simulation engine that a few special options may need to be set; in that case [[Contact]] us for more information.<br />
<br />
Note, though, that the toolbox is not a statistical package; if you have noisy data and need noise estimation algorithms, kernel smoothing algorithms, etc. you should look towards other tools.<br />
<br />
=== What is the difference between a ModelBuilder and a ModelFactory? ===<br />
<br />
See [[Add Model Type]].<br />
<br />
=== Why are the Neural Networks so slow? ===<br />
<br />
The ANN models are an extremely powerful model type that gives very good results on many problems. However, they are quite slow to use. There are some things you can do:<br />
<br />
* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.<br />
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.<br />
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.<br />
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.<br />
* If you are using [[Measures#CrossValidation| CrossValidation]] (the default if you have not defined a [[Measures| measure]] yourself), try to switch to a different measure, since CrossValidation is very expensive. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.<br />
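The output-range advice above can be sketched in a few lines (plain Python, not part of the toolbox; the function names are hypothetical): min-max scale the responses to [0, 1] before training, and invert the scaling when evaluating the model.<br />
<br />
```python
def rescale(y, lo=0.0, hi=1.0):
    """Min-max scale response values y into [lo, hi]."""
    y_min, y_max = min(y), max(y)
    span = y_max - y_min
    if span == 0.0:
        return [lo] * len(y), (y_min, y_max)
    return [lo + (hi - lo) * (v - y_min) / span for v in y], (y_min, y_max)

def unscale(scaled, bounds, lo=0.0, hi=1.0):
    """Invert rescale(): map scaled values back to the original range."""
    y_min, y_max = bounds
    return [y_min + (y_max - y_min) * (v - lo) / (hi - lo) for v in scaled]

# Responses in a tiny range are hard for a neural net to learn...
y = [1.2e-9, 5.0e-9, 8.7e-9, 9.9e-9]
scaled, bounds = rescale(y)   # ...so train on the rescaled values instead
```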
<br />
See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]<br />
<br />
=== How can I make the toolbox run faster? ===<br />
<br />
There are a number of things you can do to speed things up, listed below. Remember though that the main reason the toolbox may seem slow is the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.<br />
<br />
* First of all, check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates, this really slows things down and your computer becomes unusable.<br />
<br />
* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files<br />
<br />
* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between two sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the Genetic model builders reduce the population size and the number of generations.<br />
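For instance, a hypothetical configuration fragment (the exact key name varies per model builder; check the comments in <code>default.xml</code> for the one yours supports):<br />
<br />
```xml
<!-- Hypothetical sketch: limit the hyperparameter search to 50 models
     between two sampling iterations -->
<Option key="maxFunEvals" value="50"/>
```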
<br />
* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])<br />
<br />
* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.<br />
<br />
* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true"><br />
<Output type="toImage"/><br />
<Output type="toFile"/><br />
</Profiler><br />
<br />
<Profiler name=".*" enabled="true"><br />
<Output type="toFile"/><br />
</Profiler><br />
</Profiling><br />
</source><br />
<br />
The "." matches any single character and "*" means 'zero or more times', so ".*" matches any sequence of characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full pattern syntax]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name should be enabled and saved as a text file (toFile) AND as an image file (toImage). All the other profilers are saved to file only. The idea is to only save to image what you actually want as an image, since image generation is expensive. If you do this, or switch off image generation completely, you will see everything run much faster.<br />
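To illustrate, the matching can be reproduced with any compatible regex engine (Python's <code>re</code> is used here; the profiler names are made up):<br />
<br />
```python
import re

# Same pattern as the first <Profiler> above
pattern = re.compile(r".*share.*|.*ensemble.*|.*Level.*")

names = ["share_density", "ensembleSize", "LevelPlot", "bestModelScore"]
matched = [n for n in names if pattern.fullmatch(n)]
# "bestModelScore" matches none of the alternatives, so it falls through
# to the catch-all ".*" profiler. Note the matching is case-sensitive:
# a name containing only "Share" would not match ".*share.*".
```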
<br />
* Decrease the logging granularity; a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.<br />
<br />
* If you have a multi-core/multi-cpu machine:<br />
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. All model training then occurs in parallel. This may give unexpected errors in some cases, so use with care.<br />
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)<br />
<br />
* Don't use the Min-Max measure, it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]<br />
<br />
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]<br />
<br />
* If you are having problems with very slow or seemingly hanging runs:<br />
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html Matlab profiler] and see where most time is spent.<br />
<br />
** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange. <br />
<br />
* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].<br />
<br />
To check whether your SUMO run has hung, monitor your log file (with the level set to at least FINE). If you see no changes for about 30 minutes the toolbox has probably stalled. [[Reporting problems| Report the problems here]].<br />
<br />
Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.<br />
<br />
=== How do I build models with more than one output ===<br />
<br />
Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===<br />
<br />
See : [[Adaptive Modeling Mode]].<br />
<br />
=== How do I change the error function (relative error, RMSE, ...)? ===<br />
<br />
The [[Measures| <Measure>]] tag specifies the algorithm used to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' the measure should use. The default error function is '<code>rootRelativeSquareError</code>'.<br />
<br />
Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/><br />
</source><br />
<br />
On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/><br />
</source><br />
<br />
These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important, so think about it carefully! Also see [[Multi-Objective Modeling]].<br />
<br />
=== How do I enable more profilers? ===<br />
<br />
Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.<br />
<br />
=== What regular expressions can I use to filter profilers? ===<br />
<br />
See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].<br />
<br />
=== How can I ensure deterministic results? ===<br />
<br />
See : [[Random state]].<br />
<br />
=== How do I get a simple closed-form model (symbolic expression)? ===<br />
<br />
See : [[Using a model]].<br />
<br />
=== How do I enable the Heterogeneous evolution to automatically select the best model type? ===<br />
<br />
Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.<br />
<br />
=== What is the combineOutputs option? ===<br />
<br />
See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== What error function should I use? ===<br />
<br />
The default error function is the Root Relative Square Error (RRSE). meanRelativeError may be more intuitive, but then you have to be careful if you have function values close to zero, since the relative error explodes or even becomes infinite. You could also use one of the combined relative error functions (which contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error, which is hard to interpret.<br />
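The explosion near zero is easy to demonstrate. The following sketch uses simplified textbook formulas in plain Python, not the toolbox's own implementations from <code>src/matlab/tools/errorFunctions</code>:<br />
<br />
```python
import math

def root_relative_square_error(y, y_hat):
    """RRSE: error relative to the spread of the true responses."""
    mean_y = sum(y) / len(y)
    num = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    den = sum((a - mean_y) ** 2 for a in y)
    return math.sqrt(num / den)

def mean_relative_error(y, y_hat):
    """Mean of |error| / |true value|; blows up when y is near zero."""
    return sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)

# The same small absolute misfit (0.01) on every point...
y_true = [1e-6, 0.5, 1.0, 2.0]
y_pred = [v + 0.01 for v in y_true]

rrse = root_relative_square_error(y_true, y_pred)  # stays modest
mre = mean_relative_error(y_true, y_pred)          # dominated by the 1e-6 point
```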
<br />
So, to be safe, an absolute error (like the RMSE) seems the best bet. However, in that case you have to come up with sensible accuracy targets, and realize that you will build models that fit the regions of high absolute value better than the low ones.<br />
<br />
Picking an error function is a very tricky business, and many people do not realize this. Which one is best for you, and what targets you use, ultimately depends on your application and on what kind of model you want. There is no general answer.<br />
<br />
A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].<br />
<br />
=== I just want to generate an initial design (no sampling, no modeling) ===<br />
<br />
Do a regular SUMO run, but set 'maxModelingIterations' in the SUMO tag to 0. The run will then only generate (and evaluate) the initial design and save it to samples.txt in the output directory.<br />
<br />
=== How do I start a run with the samples of a previous run, or with a custom initial design? ===<br />
<br />
Use a Dataset design component, for example:<br />
<br />
<source lang="xml"><br />
<InitialDesign type="DatasetDesign"><br />
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/><br />
</InitialDesign><br />
</source><br />
<br />
The points of a previous run can be found in the samples.txt file in the output directory of the run you want to continue.<br />
<br />
As a side note, you can start the toolbox with the *data points* of a previous run, but not with the *models* of a previous run.<br />
<br />
=== What is a level plot? ===<br />
<br />
A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:<br />
<gallery><br />
Image:levelplot.png<br />
</gallery><br />
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.<br />
<br />
=== I am getting a Java out of memory error, what happened? ===<br />
Datasets are loaded through Java, which means the Java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might run into the maximum heap size. You can solve this by raising the heap size as described [http://www.mathworks.com/support/solutions/data/1-18I2C.html here].<br />
<br />
=== How do I force the output of the model to lie in a certain range ===<br />
<br />
See [[Measures#MinMax]].<br />
<br />
=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===<br />
<br />
That depends. Remember that the main focus of SUMO is to generate accurate 'global' models. If you want to do sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and on how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. At that point you need to see if you can extend the models with domain specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you do not need to do sample selection but have a fixed dataset you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM type models are independent of the dimension and can thus always be applied, though feature selection is always recommended.<br />
<br />
=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===<br />
<br />
When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question but this is not always straightforward. In those cases a whole set of algorithms exist for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox but are not currently implemented. That is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.<br />
<br />
=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===<br />
<br />
When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., writes the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).<br />
<br />
Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable + [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.<br />
<br />
HOWEVER, there is one important consideration: Matlab does not support threads, so if you use a Matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. The modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient.<br />
<br />
On the other hand, using a shell script/executable, does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).<br />
<br />
As a side note, if you already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. This is not very elegant and adds some overhead, but depending on your situation it may be worth it.<br />
<br />
=== How can I look at the internal structure of a SUMO model ===<br />
<br />
See [[Using_a_model#Available_methods]].<br />
<br />
=== Is there any design documentation available? ===<br />
<br />
An in-depth overview of the rationale and philosophy, including a treatment of the software architecture underlying the SUMO Toolbox, is available in the form of a PhD dissertation. A copy of this dissertation [http://www.sumo.intec.ugent.be/?q=system/files/2010_04_PhD_DirkGorissen.pdf is available here].<br />
<br />
== Troubleshooting ==<br />
<br />
=== I have a problem and I want to report it ===<br />
<br />
See : [[Reporting problems]].<br />
<br />
=== I sometimes get flat models when using rational functions ===<br />
<br />
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.<br />
<br />
The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. We have experienced that some runs just find a flat model as the best least squares fit. There are several causes for this:<br />
<br />
* The number of sample points is small, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always interpolate exactly. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing <code>"percentBounds"</code>, you still get weird, spiky models you simply need more samples, or you should switch to a different model type.<br />
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.<br />
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.<br />
<br />
=== When using rational functions I sometimes get 'spikes' (poles) in my model ===<br />
<br />
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.<br />
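The effect is easy to reproduce with a toy rational function in plain Python (unrelated to the toolbox's actual rational models): a denominator zero inside the domain produces a huge spike nearby, while the response stays modest elsewhere.<br />
<br />
```python
def toy_rational(x):
    """p(x) / q(x) with a denominator zero (pole) at x = 0.3."""
    p = 1.0 + x    # numerator polynomial
    q = x - 0.3    # denominator polynomial, zero inside [0, 1]
    return p / q

far_from_pole = toy_rational(0.9)     # ~3.2, perfectly reasonable
near_pole = toy_rational(0.3001)      # ~13000, the 'spike'
```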
<br />
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.<br />
<br />
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:<br />
<br />
* Simply wait for more data, usually spikes disappear (but not always).<br />
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.<br />
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.<br />
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.<br />
* Increase the population size if using the genetic version<br />
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]]; it was designed to get rid of this problem more quickly, but it only selects one sample at a time.<br />
<br />
However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time; automatic order selection is not an easy problem.<br />
<br />
=== There is no noise in my data yet the rational functions don't interpolate ===<br />
<br />
[[FAQ#I sometimes get flat models when using rational functions |see this question]].<br />
<br />
=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===<br />
<br />
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path. <br />
<br />
In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.<br />
<br />
=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===<br />
<br />
This means you have specified a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). Make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which ids are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.<br />
<br />
=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===<br />
<br />
This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files, the Matlab implementation will be used and this will not cause any problems.<br />
<br />
=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===<br />
<br />
This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure the FANN libraries (stored in src/matlab/contrib/fann/src/.libs/) are in your library path, e.g., on unix systems, make sure they are included in LD_LIBRARY_PATH.<br />
<br />
=== Undefined function or method 'createFann' for input arguments of type 'double'. ===<br />
<br />
See [[FAQ#When_using_FANN_models_I_sometimes_get_.22Invalid_MEX-file_createFann.mexa64.2C_libfann.so.2:_cannot_open_shared_object_file:_No_such_file_or_directory..22]]<br />
<br />
=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===<br />
<br />
You forgot to build the SVM mex files for your platform. For windows they are pre-compiled for you, on other systems you have to compile them yourself with the makefile.<br />
<br />
=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===<br />
<br />
First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].<br />
<br />
This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.<br />
<br />
If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:<br />
<br />
# typing 'usejava jvm' should return 1 <br />
# typing 's = java.lang.String', this should ''not'' give an error<br />
# typing 'version('-java')' should return at least version 1.5.0<br />
<br />
If (1) returns 0, the JVM of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?).<br />
If (2) fails but (1) is ok, there is a very weird problem; check the Matlab documentation.<br />
If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer JVM (see the Matlab docs for how to do this).<br />
<br />
=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===<br />
<br />
You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.<br />
<br />
=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===<br />
<br />
See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]<br />
<br />
=== I get a warning about duplicate samples being selected, why is this? ===<br />
<br />
Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.<br />
<br />
=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===<br />
<br />
There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].<br />
<br />
# '''Sampling off'''<br />
## ''Single objective'': the error should always decrease monotonically; you should never see it rise. If it does, [[reporting problems|report it as a bug]]<br />
## ''Multi objective'': There is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi-objective enabled modeling algorithm.<br />
# '''Sampling on'''<br />
## ''Single objective'': inside each modeling iteration the error should always decrease monotonically. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase; this is normal behavior (*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the data points is causing numerical problems).<br />
## ''Multi objective'': Combination of 1.2 and 2.1.<br />
<br />
(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!<br />
<br />
=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===<br />
<br />
This is normal, the warning printed out before the error explains why:<br />
<br />
''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''<br />
<br />
By default, at the end of a run, the toolbox tries to generate a movie of all the intermediate model plots. This requires the extension pack to be installed (you can download it from the SUMO lab website). Install the extension pack and you will no longer get the error. Alternatively, simply set the "createMovie" option in the <SUMO> tag to "false".<br />
Note that there is nothing to worry about: everything has run correctly, it is just the movie creation that failed.<br />
<br />
=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===<br />
<br />
This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox subdirectory! This is very important.<br />
<br />
If you still have problems you can override the default logfile name and location as follows:<br />
<br />
In the <FileHandler> tag inside the <Logging> tag add the following option:<br />
<br />
<source lang="xml"><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</source><br />
<br />
This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.<br />
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].<br />
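<br />
Put together, the relevant part of the configuration could look as follows (a sketch; the surrounding tags in your configuration file may carry extra attributes):<br />
<br />
<source lang="xml"><br />
<Logging><br />
<FileHandler><br />
<!-- write the log file to a custom location --><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</FileHandler><br />
</Logging><br />
</source><br />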
<br />
=== The Toolbox crashes with "Too many open files" what should I do? ===<br />
<br />
This is a known bug, see [[Known_bugs#Version_6.1]].<br />
<br />
If this does not fix your problem then do the following:<br />
<br />
On Windows, try increasing the limit as dictated by the error message. Also, when you get the error, use the fopen('all') command to see which files are open and send us the list of filenames. Then we can maybe further help you debug the problem. Even better would be to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can further debug the problem.<br />
<br />
On Linux again don't shut down Matlab but:<br />
<br />
* open a new terminal window<br />
* type:<br />
<source lang="bash"><br />
lsof > openFiles.txt<br />
</source><br />
* Then [[Contact|send us]] the following information:<br />
** the file openFiles.txt <br />
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).<br />
** the output of<br />
<source lang="bash"><br />
uname -a ; df -T ; mount<br />
</source><br />
<br />
As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.<br />
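<br />
As an illustration, checking and raising the limit from a shell could look like this (a sketch; 4096 is an arbitrary example value, and the change only lasts for the current shell session):<br />
<br />
<source lang="bash"><br />
# show the current soft limit on open file descriptors<br />
ulimit -n<br />
# raise the soft limit for this session (cannot exceed the hard limit, see 'ulimit -Hn');<br />
# start Matlab from this same shell afterwards so it inherits the new limit<br />
ulimit -n 4096 || echo "hard limit too low; ask your system administrator"<br />
</source><br />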
<br />
In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.<br />
<br />
=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===<br />
<br />
The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files it will use the Matlab implementation and give a warning, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]], this can be done very easily. Or simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.<br />
<br />
=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===<br />
<br />
You most likely forgot to [[Installation#Extension_pack|install the extension pack]].<br />
<br />
=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===<br />
<br />
Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics, your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; you just have to log in again since your graphical session has terminated. This FAQ entry is only about the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.<br />
<br />
What happens is that after a few seconds, when the toolbox wants to plot the first model, [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the interaction between Matlab and the display server.<br />
<br />
What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:<br />
<br />
* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])<br />
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)<br />
<br />
<source lang="bash"><br />
/etc/X11/xorg.conf<br />
</source><br />
<br />
Note: the exact location of the xorg.conf file may vary on your system.<br />
<br />
* Look for the following line:<br />
<br />
<source lang="bash"><br />
Load "glx"<br />
</source><br />
<br />
* Comment it out by replacing it by:<br />
<br />
<source lang="bash"><br />
# Load "glx"<br />
</source><br />
<br />
* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)<br />
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].<br />
<br />
Note:<br />
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]<br />
* if you wish to debug further yourself please check the Xorg log files and those in /var/log<br />
* another possible workaround is to start matlab with the "-nodisplay" option. That could work as well.<br />
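<br />
The edit above can also be scripted with sed. A sketch, demonstrated on a scratch file first (apply the same command to <code>/etc/X11/xorg.conf</code> as root only after checking the result):<br />
<br />
<source lang="bash"><br />
# create a scratch file mimicking the relevant xorg.conf section<br />
printf 'Section "Module"\n    Load "glx"\nEndSection\n' > /tmp/xorg.conf.test<br />
# comment out the 'Load "glx"' line, preserving indentation<br />
sed -i 's/^\([[:space:]]*\)Load[[:space:]]*"glx"/\1# Load "glx"/' /tmp/xorg.conf.test<br />
cat /tmp/xorg.conf.test<br />
</source><br />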
<br />
=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===<br />
<br />
This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.<br />
<br />
=== The toolbox seems to keep on running forever, when or how will it stop? ===<br />
<br />
The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].<br />
<br />
Of course choosing targets a priori is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). When in doubt you can always use a small value (or 0) and then simply quit the running toolbox with Ctrl-C when you think it has run long enough.<br />
<br />
While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.</div>
<hr />
<div>== General ==<br />
<br />
=== What about surrogate driven optimization? ===<br />
<br />
When coining the term '''surrogate driven optimization''' most people associate it with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate which is optimized to find an optimum. Afterwards, a move limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently the surrogate is rebuilt and optimized, i.e., the surrogate zooms in on the global optimum. For instance the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies where the surrogate construction is separated from optimization.<br />
<br />
Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox. <br />
<br />
Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sampling selection strategy that balances local and global search. In other words, it balances between exploring the input space and exploiting the information the surrogate gives us.<br />
<br />
A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].<br />
<br />
=== What is (adaptive) sampling? Why is it used? ===<br />
<br />
In classical Design of Experiments you need to specify the design of your experiment up front; in other words, you have to say in advance how many data points you need and how they should be distributed. Two examples are Central Composite designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., by an expensive simulation code) it is not clear up front how many points are needed. Instead, data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in the regions that are most interesting is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere, so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].<br />
SUMO provides a number of different sampling algorithms: [[SampleSelector]]<br />
<br />
Of course sometimes you don't want to do sampling. For example, if you have a fixed dataset you just want to load all the data in one go and model it. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].<br />
<br />
=== What about dynamical, time dependent data? ===<br />
<br />
The original design and purpose was to tackle static input-output systems, where there is no memory: just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox, but that is typically not a desirable solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...<br />
<br />
The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.<br />
<br />
Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].<br />
<br />
For more information on this topic [[Contact]] us.<br />
<br />
=== What about classification problems? ===<br />
<br />
The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].<br />
<br />
=== Can the toolbox drive my simulation code directly? ===<br />
<br />
Yes it can. See the [[Interfacing with the toolbox]] page.<br />
<br />
=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===<br />
<br />
The SUMO Toolbox is a complete, feature-rich framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof of principle.<br />
<br />
=== What happened to the M3-Toolbox? ===<br />
<br />
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.<br />
<br />
=== How can I stay up to date with the latest news? ===<br />
<br />
To stay up to date with the latest news and releases, we recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== What is the roadmap for the future? ===<br />
<br />
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== Will there be an R/Scilab/Octave/Sage/.. version? ===<br />
<br />
At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:<br />
<br />
* Existing experience and know-how of the development team<br />
* The widespread use of the Matlab platform in the target application domains<br />
* The quality and amount of available Matlab documentation<br />
* The quality and number of Matlab toolboxes<br />
* Support for object orientation (inheritance, polymorphism, etc.)<br />
* Many well documented interfacing options (especially the seamless integration with Java)<br />
<br />
Matlab, as a proprietary platform, definitely has its problems and deficiencies, but the number of advanced algorithms and available toolboxes make it a very attractive platform. Equally important is the fact that every function is properly documented, tested, and includes examples, tutorials, and in some cases GUI tools. A lot of things would have been much harder and/or more time-consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus, given our situation, goals, and resources at the time, Matlab was the best choice for us. <br />
<br />
The other platforms remain on our radar, however, and we do look into them from time to time. With our limited resources, though, porting to one of those platforms is not (yet) cost-effective.<br />
<br />
=== What are collaboration options? ===<br />
<br />
We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.<br />
<br />
The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive, and you want an accurate model while minimizing the number of data points. In this case, in order for us to optimally help you it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.<br />
<br />
If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.<br />
<br />
In either case (code or dataset) remember:<br />
<br />
* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])<br />
* include a short description of your data:<br />
** number of inputs and number of outputs<br />
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)<br />
** if the outputs are real or complex valued<br />
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).<br />
** if possible the expected range of each output (or scaled if you do not wish to disclose this)<br />
** if possible the names of each input/output + a short description of what they mean<br />
** any further insight you have about the data, expected behavior, expected importance of each input, etc.<br />
<br />
If you have any further questions or comments related to this please [[Contact]] us.<br />
<br />
=== Can you help me model my problem? ===<br />
<br />
Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]<br />
<br />
== Installation and Configuration ==<br />
<br />
=== What is the relationship between Matlab and Java? ===<br />
<br />
Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., s = java.lang.String('hello')). It is possible to disable Java support, but in order to use the SUMO Toolbox it should not be. To check if Java is enabled you can use the 'usejava' command.<br />
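<br />
For example, a quick check you can run at the Matlab prompt (illustrative only):<br />
<br />
<source lang="matlab"><br />
% verify that the embedded JVM is available before starting SUMO<br />
if usejava('jvm')<br />
    disp('Java support is enabled');<br />
else<br />
    error('Java support is disabled; restart Matlab without the -nojvm option');<br />
end<br />
</source><br />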
<br />
=== What is Java, why do I need it, do I have to install it, etc. ? ===<br />
<br />
The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language instead of a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].<br />
<br />
=== What is XML? ===<br />
<br />
XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated programming-language thing. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.<br />
<br />
The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:<br />
<br />
<source lang="xml"><br />
<Foo attr="bar">bla bla bla</Foo> <br />
</source><br />
<br />
Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you :)<br />
<br />
Let's take a more interesting example. Here we have used XML to represent a recipe for pancakes:<br />
<br />
<source lang="xml"><br />
<recipe category="dessert"><br />
<title>Pancakes</title><br />
<author>sumo@intec.ugent.be</author><br />
<date>Wed, 14 Jun 95</date><br />
<description><br />
Good old fashioned pancakes.<br />
</description><br />
<ingredients><br />
<item><br />
<amount>3</amount><br />
<type>eggs</type><br />
</item><br />
<br />
<item><br />
<amount>0.5 tablespoon</amount><br />
<type>salt</type><br />
</item><br />
...<br />
</ingredients><br />
<preparation><br />
...<br />
</preparation><br />
</recipe><br />
</source><br />
<br />
So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO uses it to store and structure configuration options, which works well due to the nice hierarchical nature of XML.<br />
<br />
If you understand this there is nothing else to it in order to be able to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]<br />
<br />
=== Why does SUMO use XML? ===<br />
<br />
XML is the de facto standard way of structuring information, ranging from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it is an intuitive way to structure data and it is used everywhere. As a result, there are a very large number of libraries and programming languages available that can parse and handle XML easily. That means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.<br />
In short, it would not make sense for SUMO to use any other format :)<br />
<br />
=== I get an error that SUMO is not yet activated ===<br />
<br />
Make sure you installed the activation file that was mailed to you, as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27|Java is enabled]]. To fully verify that the activation file installation is correct, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.<br />
<br />
Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].<br />
<br />
== Upgrading ==<br />
<br />
=== How do I upgrade to a newer version? ===<br />
<br />
Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.<br />
<br />
If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.<br />
<br />
== Using ==<br />
<br />
=== I have no idea how to use the toolbox, what should I do? ===<br />
<br />
See: [[Running#Getting_started]]<br />
<br />
=== I want to try one of the different examples ===<br />
<br />
See [[Running#Running_different_examples]].<br />
<br />
=== I want to model my own problem ===<br />
<br />
See : [[Adding an example]].<br />
<br />
=== I want to contribute some data/patch/documentation/... ===<br />
<br />
See : [[Contributing]].<br />
<br />
=== How do I interface with the SUMO Toolbox? ===<br />
<br />
See : [[Interfacing with the toolbox]].<br />
<br />
=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===<br />
<br />
See [[General_guidelines]].<br />
<br />
=== Ok, I generated a model, what can I do with it? ===<br />
<br />
See: [[Using a model]].<br />
<br />
=== How can I share a model created by the SUMO Toolbox? ===<br />
<br />
See : [[Using a model#Model_portability| Model portability]].<br />
<br />
=== I dont like the final model generated by SUMO how do I improve it? ===<br />
<br />
Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor model. Then you should rank those properties depending on how important you find them. Examples are:<br />
<br />
* accuracy in the training data<br />
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing<br />
* accuracy outside the training data<br />
** this is the validation or test error: how important is proper generalization to you? (usually this is very important)<br />
* what does accuracy mean to you? a low maximum error, a low average error, both, ...<br />
* smoothness<br />
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example<br />
* are some regions of the response more important than others?<br />
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important<br />
* are there particular special features that your model should have<br />
** for example, capture underlying poles or discontinuities correctly<br />
* extrapolation capability<br />
* ...<br />
<br />
It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].<br />
<br />
Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].<br />
<br />
Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:<br />
<br />
* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.<br />
* you should switch to a different model parameter optimization algorithm (for example, instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)<br />
* the model type you are using is not ideally suited to your data<br />
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension<br />
* maybe the sample distribution is causing trouble for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.<br />
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very small)<br />
<br />
You may also refer to the following [[General_guidelines]]. Finally, of course it may be that your problem is simply a very difficult one that does not approximate well. Still, you should at least get something satisfactory.<br />
<br />
If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.<br />
<br />
=== My data contains noise can the SUMO-Toolbox help me? ===<br />
<br />
The original purpose of the SUMO-Toolbox was to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will just have to be more careful about how you apply the methods and how you perform model selection. It is only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.<br />
<br />
Note though, that the toolbox is not a statistical package, if you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc. you should look towards other tools.<br />
<br />
=== What is the difference between a ModelBuilder and a ModelFactory? ===<br />
<br />
See [[Add Model Type]].<br />
<br />
=== Why are the Neural Networks so slow? ===<br />
<br />
The ANN models are an extremely powerful model type that gives very good results on many problems. However, they are quite slow to train. There are some things you can do:<br />
<br />
* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.<br />
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.<br />
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9, for example, it will be very hard for the neural net to learn. In that case rescale your data to a saner range.<br />
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.<br />
* If you are using [[Measures#CrossValidation| CrossValidation]], try switching to a different measure, since CrossValidation is very expensive. CrossValidation is used by default if you have not defined a [[Measures| measure]] yourself. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.<br />
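The SampleError + LRMMeasure combination mentioned above could be sketched as follows. Note that the exact attributes for enabling and combining measures are an assumption here; check the comments in <code>default.xml</code> and [[Multi-Objective Modeling]] for the verified syntax.<br />
<br />
<source lang="xml"><br />
<!-- assumed sketch: two cheap measures replacing CrossValidation --><br />
<Measure type="SampleError" target="0.001"/><br />
<Measure type="LRMMeasure" target="0.001"/><br />
</source><br />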
<br />
See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]<br />
<br />
=== How can I make the toolbox run faster? ===<br />
<br />
There are a number of things you can do to speed things up; these are listed below. Remember, though, that the main reason the toolbox may seem slow is the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying the more advanced options.<br />
<br />
* First of all, check that your virus scanner is not interfering with Matlab. If McAfee or any other program scans every file SUMO generates, this really slows things down and can make your computer unusable.<br />
<br />
* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]]; you can always generate plots from the saved mat files later.<br />
<br />
* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between two sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the genetic model builders, reduce the population size and the number of generations.<br />
<br />
* If you are using [[Measures#CrossValidation]], see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])<br />
<br />
* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.<br />
<br />
* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true"><br />
<Output type="toImage"/><br />
<Output type="toFile"/><br />
</Profiler><br />
<br />
<Profiler name=".*" enabled="true"><br />
<Output type="toFile"/><br />
</Profiler><br />
</Profiling><br />
</source><br />
<br />
The ".*" matches any sequence of characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full regular expression syntax]). Thus, in this example all the profilers that have "share", "ensemble", or "Level" in their name should be enabled and saved as a text file (toFile) AND as an image file (toImage). All the other profilers are saved to file only. The idea is to only save to image what you actually want as an image, since image generation is expensive. If you do this, or switch off image generation completely, everything will run much faster.<br />
<br />
* Decrease the logging granularity: a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.<br />
<br />
* If you have a multi-core/multi-cpu machine:<br />
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.<br />
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)<br />
<br />
* Don't use the Min-Max measure; it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]<br />
<br />
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]<br />
<br />
* If you are having problems with very slow or seemingly hanging runs:<br />
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html Matlab profiler] and see where most time is spent.<br />
<br />
** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange. <br />
<br />
* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].<br />
<br />
To check whether your SUMO run has hung, monitor your log file (with the level set to at least FINE). If you see no changes for about 30 minutes, the toolbox has probably stalled; [[Reporting problems| report the problem here]].<br />
<br />
Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.<br />
<br />
=== How do I build models with more than one output ===<br />
<br />
Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===<br />
<br />
See : [[Adaptive Modeling Mode]].<br />
<br />
=== How do I change the error function (relative error, RMSE, ...)? ===<br />
<br />
The [[Measures| <Measure>]] tag specifies the algorithm used to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' to use in the measure. The default error function is '<code>rootRelativeSquareError</code>'.<br />
<br />
Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/><br />
</source><br />
<br />
On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/><br />
</source><br />
<br />
These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important; make sure you think it through. Also see [[Multi-Objective Modeling]].<br />
<br />
=== How do I enable more profilers? ===<br />
<br />
Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.<br />
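Based on the profiler example given in the speed-up question above, enabling every profiler while only writing text output would look like:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
	<Profiler name=".*" enabled="true"><br />
		<Output type="toFile"/><br />
	</Profiler><br />
</Profiling><br />
</source><br />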
<br />
=== What regular expressions can I use to filter profilers? ===<br />
<br />
See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].<br />
<br />
=== How can I ensure deterministic results? ===<br />
<br />
See : [[Random state]].<br />
<br />
=== How do I get a simple closed-form model (symbolic expression)? ===<br />
<br />
See : [[Using a model]].<br />
<br />
=== How do I enable the Heterogeneous evolution to automatically select the best model type? ===<br />
<br />
Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.<br />
<br />
=== What is the combineOutputs option? ===<br />
<br />
See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== What error function should I use? ===<br />
<br />
The default error function is the Root Relative Square Error (RRSE). meanRelativeError may be more intuitive, but then you have to be careful with function values close to zero, since there the relative error explodes or even becomes infinite. You could also use one of the combined relative error functions (these contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error, which is hard to interpret.<br />
<br />
So, to be safe, an absolute error (like the RMSE) seems the safest bet. However, in that case you have to come up with sensible accuracy targets, and realize that you will build models that fit the regions of high absolute value better than the low ones.<br />
<br />
Picking an error function is a very tricky business, and many people do not realize this. Which one is best for you, and what targets you use, ultimately depends on your application and on what kind of model you want. There is no general answer.<br />
<br />
A recommended read [http://www.springerlink.com/content/24104526223221u3/ is this paper]. See also the page on [[Multi-Objective Modeling]].<br />
<br />
=== I just want to generate an initial design (no sampling, no modeling) ===<br />
<br />
Do a regular SUMO run, but set 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.<br />
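As a sketch (whether 'maxModelingIterations' is set as an attribute or as an option inside the <SUMO> tag may depend on your toolbox version; check <code>default.xml</code> for the verified form):<br />
<br />
<source lang="xml"><br />
<SUMO maxModelingIterations="0"><br />
	<!-- rest of your usual SUMO configuration --><br />
</SUMO><br />
</source><br />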
<br />
=== How do I start a run with the samples of a previous run, or with a custom initial design? ===<br />
<br />
Use a Dataset design component, for example:<br />
<br />
<source lang="xml"><br />
<InitialDesign type="DatasetDesign"><br />
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/><br />
</InitialDesign><br />
</source><br />
<br />
As a side note, you can start the toolbox with the ''data points'' of a previous run, but not with the ''models'' of a previous run.<br />
<br />
=== What is a level plot? ===<br />
<br />
A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:<br />
<gallery><br />
Image:levelplot.png<br />
</gallery><br />
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.<br />
<br />
===I am getting a java out of memory error, what happened?===<br />
Datasets are loaded through Java, which means the Java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might run into the maximum heap size. You can solve this by raising the heap size as described on [http://www.mathworks.com/support/solutions/data/1-18I2C.html this MathWorks page].<br />
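The linked MathWorks solution boils down to creating a <code>java.opts</code> file (in the directory from which you start Matlab, or in <code>$MATLABROOT/bin/<arch></code>; treat the exact location as an assumption and check the linked page) containing a larger maximum heap size, for example:<br />
<br />
<source lang="text"><br />
-Xmx512m<br />
</source><br />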
<br />
=== How do I force the output of the model to lie in a certain range ===<br />
<br />
See [[Measures#MinMax]].<br />
<br />
=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===<br />
<br />
That depends. Remember that the main focus of SUMO is to generate accurate ''global'' models. If you want to do adaptive sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and on how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. Beyond that you need to see if you can extend the models with domain-specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you do not need sample selection but have a fixed dataset you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM type models are independent of the dimension and can thus always be applied. Still, things like feature selection are always recommended.<br />
<br />
=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===<br />
<br />
When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question but this is not always straightforward. In those cases a whole set of algorithms exist for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox but are not currently implemented. That is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.<br />
<br />
=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===<br />
<br />
When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., writes the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).<br />
<br />
Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable + [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.<br />
<br />
HOWEVER, there is one important consideration: Matlab does not support threads, so if you use a Matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. The modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run, this is not very efficient.<br />
<br />
On the other hand, using a shell script/executable does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).<br />
<br />
As a side note, if you have already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. It is not very elegant and adds some overhead, but depending on your situation it may be worth it.<br />
<br />
=== How can I look at the internal structure of a SUMO model ===<br />
<br />
See [[Using_a_model#Available_methods]].<br />
<br />
=== Is there any design documentation available? ===<br />
<br />
An in depth overview of the rationale and philosophy, including a treatment of the software architecture underlying the SUMO Toolbox is available in the form of a PhD dissertation. A copy of this dissertation [http://www.sumo.intec.ugent.be/?q=system/files/2010_04_PhD_DirkGorissen.pdf is available here].<br />
<br />
== Troubleshooting ==<br />
<br />
=== I have a problem and I want to report it ===<br />
<br />
See : [[Reporting problems]].<br />
<br />
=== I sometimes get flat models when using rational functions ===<br />
<br />
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.<br />
<br />
The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. We have experienced that some runs simply find a flat model as the best least squares fit. There are a few possible causes:<br />
<br />
* The number of sample points is small, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small number of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always interpolate exactly. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code>, you still get weird, spiky models, you simply need more samples or you should switch to a different model type.<br />
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.<br />
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.<br />
<br />
=== When using rational functions I sometimes get 'spikes' (poles) in my model ===<br />
<br />
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models are only recognized as 'the best' for a short period of time. As more samples get selected, these models get replaced by better ones and the spikes should disappear.<br />
<br />
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.<br />
<br />
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:<br />
<br />
* Simply wait for more data, usually spikes disappear (but not always).<br />
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.<br />
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.<br />
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.<br />
* Increase the population size if you are using the genetic version.<br />
* Try the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]]; it was designed to get rid of this problem more quickly, but it only selects one sample at a time.<br />
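The MinMax suggestion above amounts to enabling two measures side by side. The exact combination syntax is described in [[Multi-Objective Modeling]]; the attributes below are assumptions modeled on the error-function examples earlier in this FAQ, so verify them against <code>default.xml</code>.<br />
<br />
<source lang="xml"><br />
<!-- assumed sketch: penalize models whose response leaves the known output range --><br />
<Measure type="CrossValidation" target="0.001"/><br />
<Measure type="MinMax"/><br />
</source><br />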
<br />
However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time; automatic order selection is not an easy problem.<br />
<br />
=== There is no noise in my data yet the rational functions don't interpolate ===<br />
<br />
[[FAQ#I sometimes get flat models when using rational functions |see this question]].<br />
<br />
=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===<br />
<br />
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path. <br />
<br />
In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.<br />
<br />
=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===<br />
<br />
This means you have specified a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso'), but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist, but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). Make sure you only declare components that have a definition lower down. To see which components are available, simply scroll down the configuration file and see which ids are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.<br />
<br />
=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===<br />
<br />
This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files, the Matlab implementation will be used and this will not cause any problems.<br />
<br />
=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===<br />
<br />
This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure the FANN libraries (stored in src/matlab/contrib/fann/src/.libs/) are in your library path, e.g., on unix systems, make sure they are included in LD_LIBRARY_PATH.<br />
<br />
=== Undefined function or method 'createFann' for input arguments of type 'double'. ===<br />
<br />
See [[FAQ#When_using_FANN_models_I_sometimes_get_.22Invalid_MEX-file_createFann.mexa64.2C_libfann.so.2:_cannot_open_shared_object_file:_No_such_file_or_directory..22]]<br />
<br />
=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===<br />
<br />
You forgot to build the SVM mex files for your platform. For windows they are pre-compiled for you, on other systems you have to compile them yourself with the makefile.<br />
<br />
=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===<br />
<br />
First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].<br />
<br />
This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.<br />
<br />
If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:<br />
<br />
# typing 'usejava jvm' should return 1 <br />
# typing 's = java.lang.String', this should ''not'' give an error<br />
# typing 'version('-java')' should return at least version 1.5.0<br />
<br />
If (1) returns 0, then the jvm of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?)<br />
If (2) fails but (1) is ok, there is a very weird problem, check the Matlab documentation.<br />
If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer, jvm (See the Matlab docs for how to do this).<br />
<br />
=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===<br />
<br />
You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.<br />
<br />
=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===<br />
<br />
See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]<br />
<br />
=== I get a warning about duplicate samples being selected, why is this? ===<br />
<br />
Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.<br />
<br />
=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===<br />
<br />
There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].<br />
<br />
# '''Sampling off'''<br />
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does [[reporting problems|report it as a bug]]<br />
## ''Multi objective'': There is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi-objective enabled modeling algorithm<br />
# '''Sampling on'''<br />
## ''Single objective'': inside each modeling iteration the error should always decrease monotonically. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase there; this is normal behavior (*). It is also possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen, you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the data points is causing numerical problems).<br />
## ''Multi objective'': Combination of 1.2 and 2.1.<br />
<br />
(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!<br />
<br />
=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===<br />
<br />
This is normal, the warning printed out before the error explains why:<br />
<br />
''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''<br />
<br />
By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".<br />
So there is nothing to worry about: everything has run correctly, it is just the movie creation that is failing.<br />
<br />
=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===<br />
<br />
This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox sub directory! This is very important.<br />
<br />
If you still have problems you can override the default logfile name and location as follows:<br />
<br />
In the <FileHandler> tag inside the <Logging> tag add the following option:<br />
<br />
<code><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</code><br />
<br />
This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.<br />
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].<br />
<br />
=== The Toolbox crashes with "Too many open files" what should I do? ===<br />
<br />
This is a known bug, see [[Known_bugs#Version_6.1]].<br />
<br />
If this does not fix your problem then do the following:<br />
<br />
On Windows, try increasing the limit as dictated by the error message. Also, when you get the error, use the <code>fopen('all')</code> command to see which files are open and send us the list of filenames; then we can help you debug the problem further. Even better is to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can debug the problem further.<br />
<br />
On Linux again don't shut down Matlab but:<br />
<br />
* open a new terminal window<br />
* type:<br />
<source lang="bash"><br />
lsof > openFiles.txt<br />
</source><br />
* Then [[Contact|send us]] the following information:<br />
** the file openFiles.txt <br />
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).<br />
** the output of<br />
<source lang="bash"><br />
uname -a ; df -T ; mount<br />
</source><br />
<br />
As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.<br />
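For example, you can inspect the current per-process limits from a terminal as follows (raising them permanently is distribution specific, see the link above):<br />

```shell
# Soft limit on open file descriptors for this shell (and its children)
ulimit -Sn
# Hard limit (the ceiling up to which the soft limit may be raised)
ulimit -Hn
# To raise the soft limit for this session, then start Matlab from the
# same shell, you would run e.g.:  ulimit -n 4096
```

Remember that a raised limit only applies to processes started from that same shell.<br />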
<br />
In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.<br />
<br />
=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===<br />
<br />
The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files the Matlab implementation will be used and a warning issued, but everything will still work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]] (this is very easy to do), or simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.<br />
<br />
=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===<br />
<br />
You most likely forgot to [[Installation#Extension_pack|install the extension pack]].<br />
<br />
=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===<br />
<br />
Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; you just have to log in again since your graphical session has terminated. This FAQ entry is only about the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.<br />
<br />
So what happens is that after a few seconds, when the toolbox wants to plot the first model, [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - display server interaction.<br />
<br />
What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:<br />
<br />
* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])<br />
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)<br />
<br />
<source lang="bash"><br />
/etc/X11/xorg.conf<br />
</source><br />
<br />
Note: the exact location of the xorg.conf file may vary on your system.<br />
<br />
* Look for the following line:<br />
<br />
<source lang="bash"><br />
Load "glx"<br />
</source><br />
<br />
* Comment it out by replacing it by:<br />
<br />
<source lang="bash"><br />
# Load "glx"<br />
</source><br />
<br />
* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)<br />
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].<br />
<br />
Note:<br />
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]<br />
* if you wish to debug further yourself please check the Xorg log files and those in /var/log<br />
* another possible workaround is to start Matlab with the "-nodisplay" option. That could work as well.<br />
<br />
=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===<br />
<br />
This happens if you run the toolbox on Matlab version 2008a and you have the Parallel Computing Toolbox installed. You can simply ignore this error message; it does not cause any problems. If you want to use SUMO with the Parallel Computing Toolbox you will need Matlab 2008b.<br />
<br />
=== The toolbox seems to keep on running forever, when or how will it stop? ===<br />
<br />
The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].<br />
<br />
Of course choosing targets a priori is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). If in doubt you can always use a small value (or 0) and then simply quit the running toolbox with Ctrl-C when you think it has run long enough.<br />
<br />
While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.<br />
<br />
Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sample selection strategy that balances local and global search. In other words, it balances between exploring the input space and exploiting the information the surrogate gives us.<br />
<br />
A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].<br />
<br />
=== What is (adaptive) sampling? Why is it used? ===<br />
<br />
In classical Design of Experiments you need to specify the design of your experiment up front: you have to say in advance how many data points you need and how they should be distributed. Two examples are Central Composite designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., by an expensive simulation code) it is not clear up front how many points are needed. Instead, data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in the regions that are most interesting is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere, so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].<br />
SUMO provides a number of different sampling algorithms: [[SampleSelector]]<br />
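As a toy illustration of this incremental loop (deliberately simplified: it uses a 1D polynomial as the surrogate and, unlike a real sample selector, peeks at the true function to decide where the fit is worst):<br />
<br />
<source lang="matlab"><br />
f = @(x) sin(5*x) + 0.5*x;                 % stand-in for an expensive simulator<br />
X = linspace(0, 1, 5)'; y = f(X);          % small fixed initial design<br />
for it = 1:10                              % until a termination criterion fires<br />
    p = polyfit(X, y, min(numel(X)-1, 7)); % fit a cheap surrogate model<br />
    xc = setdiff(linspace(0, 1, 201)', X); % candidate points<br />
    [~, i] = max(abs(polyval(p, xc) - f(xc)));  % where is the fit worst?<br />
    X = [X; xc(i)]; y = [y; f(xc(i))];     % evaluate and add the new sample<br />
end<br />
</source><br />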
<br />
Of course sometimes you don't want to do sampling. For example, if you have a fixed dataset you just want to load all the data in one go and model it. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].<br />
<br />
=== What about dynamical, time dependent data? ===<br />
<br />
The original design and purpose was to tackle static input-output systems, where there is no memory: just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox, but that is typically not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...<br />
<br />
The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.<br />
<br />
Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].<br />
<br />
For more information on this topic [[Contact]] us.<br />
<br />
=== What about classification problems? ===<br />
<br />
The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].<br />
<br />
=== Can the toolbox drive my simulation code directly? ===<br />
<br />
Yes it can. See the [[Interfacing with the toolbox]] page.<br />
<br />
=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===<br />
<br />
The SUMO Toolbox is a complete, full-featured framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof of principle.<br />
<br />
=== What happened to the M3-Toolbox? ===<br />
<br />
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.<br />
<br />
=== How can I stay up to date with the latest news? ===<br />
<br />
To stay up to date with the latest news and releases, we recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== What is the roadmap for the future? ===<br />
<br />
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== Will there be an R/Scilab/Octave/Sage/.. version? ===<br />
<br />
At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:<br />
<br />
* Existing experience and know-how of the development team<br />
* The widespread use of the Matlab platform in the target application domains<br />
* The quality and amount of available Matlab documentation<br />
* The quality and number of Matlab toolboxes<br />
* Support for object orientation (inheritance, polymorphism, etc.)<br />
* Many well documented interfacing options (especially the seamless integration with Java)<br />
<br />
Matlab, as a proprietary platform, definitely has its problems and deficiencies, but the number of advanced algorithms and available toolboxes make it a very attractive platform. Equally important is the fact that every function is properly documented and tested, and comes with examples, tutorials, and in some cases GUI tools. A lot of things would have been much harder and/or more time consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus, given our situation, goals, and resources at the time, Matlab was the best choice for us. <br />
<br />
The other platforms remain on our radar however, and we do look into them from time to time. Though, with our limited resources porting to one of those platforms is not (yet) cost effective.<br />
<br />
=== What are collaboration options? ===<br />
<br />
We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.<br />
<br />
The most interesting case is if your problem involves sample selection and modeling: you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to help you optimally, it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about which settings to use.<br />
<br />
If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.<br />
<br />
In either case (code or dataset) remember:<br />
<br />
* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])<br />
* include a short description of your data:<br />
** number of inputs and number of outputs<br />
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)<br />
** if the outputs are real or complex valued<br />
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).<br />
** if possible the expected range of each output (or scaled if you do not wish to disclose this)<br />
** if possible the names of each input/output + a short description of what they mean<br />
** any further insight you have about the data, expected behavior, expected importance of each input, etc.<br />
<br />
If you have any further questions or comments related to this please [[Contact]] us.<br />
<br />
=== Can you help me model my problem? ===<br />
<br />
Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]<br />
<br />
== Installation and Configuration ==<br />
<br />
=== What is the relationship between Matlab and Java? ===<br />
<br />
Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable Java support, but in order to use the SUMO Toolbox it must not be. To check whether Java is enabled you can use the 'usejava' command.<br />
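A quick check from the Matlab prompt:<br />
<br />
<source lang="matlab"><br />
usejava('jvm')             % returns 1 (true) if the Java virtual machine is running<br />
version('-java')           % shows which Java version this Matlab bundles<br />
s = java.lang.String('hi') % creating a Java object only works when Java is enabled<br />
</source><br />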
<br />
=== What is Java, why do I need it, do I have to install it, etc. ? ===<br />
<br />
The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since Java makes a lot more sense in many situations and is a proper programming language rather than a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].<br />
<br />
=== What is XML? ===<br />
<br />
XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.<br />
<br />
The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:<br />
<br />
<source lang="xml"><br />
<Foo attr="bar">bla bla bla</Foo> <br />
</source><br />
<br />
Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you, you choose :)<br />
<br />
Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:<br />
<br />
<source lang="xml"><br />
<recipe category="dessert"><br />
<title>Pancakes</title><br />
<author>sumo@intec.ugent.be</author><br />
<date>Wed, 14 Jun 95</date><br />
<description><br />
Good old fashioned pancakes.<br />
</description><br />
<ingredients><br />
<item><br />
<amount>3</amount><br />
<type>eggs</type><br />
</item><br />
<br />
<item><br />
<amount>0.5 tablespoon</amount><br />
<type>salt</type><br />
</item><br />
...<br />
</ingredients><br />
<preparation><br />
...<br />
</preparation><br />
</recipe><br />
</source><br />
<br />
So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO uses it to store and structure configuration options, and this works well due to the nice hierarchical nature of XML.<br />
<br />
If you understand this there is nothing else to it in order to be able to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]<br />
<br />
=== Why does SUMO use XML? ===<br />
<br />
XML is the de facto standard way of structuring information, ranging from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it's an intuitive way to structure data and it is used everywhere. As a result there are a very large number of libraries and programming languages available that can parse and handle XML easily, which means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.<br />
So basically, it would not make sense for SUMO to use any other format :)<br />
<br />
=== I get an error that SUMO is not yet activated ===<br />
<br />
Make sure you installed the activation file that was mailed to you as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [http://www.sumowiki.intec.ugent.be/index.php/FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27 Java is enabled]. To fully verify that the activation file installation is correct, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.<br />
<br />
Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].<br />
<br />
== Upgrading ==<br />
<br />
=== How do I upgrade to a newer version? ===<br />
<br />
Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.<br />
<br />
If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.<br />
<br />
== Using ==<br />
<br />
=== I have no idea how to use the toolbox, what should I do? ===<br />
<br />
See: [[Running#Getting_started]]<br />
<br />
=== I want to try one of the different examples ===<br />
<br />
See [[Running#Running_different_examples]].<br />
<br />
=== I want to model my own problem ===<br />
<br />
See : [[Adding an example]].<br />
<br />
=== I want to contribute some data/patch/documentation/... ===<br />
<br />
See : [[Contributing]].<br />
<br />
=== How do I interface with the SUMO Toolbox? ===<br />
<br />
See : [[Interfacing with the toolbox]].<br />
<br />
=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===<br />
<br />
See [[General_guidelines]].<br />
<br />
=== Ok, I generated a model, what can I do with it? ===<br />
<br />
See: [[Using a model]].<br />
<br />
=== How can I share a model created by the SUMO Toolbox? ===<br />
<br />
See : [[Using a model#Model_portability| Model portability]].<br />
<br />
=== I dont like the final model generated by SUMO how do I improve it? ===<br />
<br />
Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor model. Then you should rank those properties depending on how important you find them. Examples are:<br />
<br />
* accuracy in the training data<br />
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing<br />
* accuracy outside the training data<br />
** this is the validation or test error, how important is proper generalization (usually this is very important)<br />
* what does accuracy mean to you? a low maximum error, a low average error, both, ...<br />
* smoothness<br />
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example<br />
* are some regions of the response more important than others?<br />
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important<br />
* are there particular special features that your model should have<br />
** for example, capture underlying poles or discontinuities correctly<br />
* extrapolation capability<br />
* ...<br />
<br />
It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].<br />
<br />
Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].<br />
<br />
Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Possible reasons:<br />
<br />
* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.<br />
* you should switch to a different model parameter optimization algorithm (e.g., instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)<br />
* the model type you are using is not ideally suited to your data<br />
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension<br />
* maybe the sample distribution is causing troubles for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.<br />
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very small)<br />
<br />
You may also refer to the following [[General_guidelines]]. Finally, of course it may be that your problem is simply a very difficult one and does not approximate well. But, still you should at least get something satisfactory.<br />
<br />
If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.<br />
<br />
=== My data contains noise can the SUMO-Toolbox help me? ===<br />
<br />
The original purpose of the SUMO-Toolbox was for it to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data; you will just have to be more careful about how you apply the methods and how you perform model selection. It's only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.<br />
<br />
Note though, that the toolbox is not a statistical package, if you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc. you should look towards other tools.<br />
<br />
=== What is the difference between a ModelBuilder and a ModelFactory? ===<br />
<br />
See [[Add Model Type]].<br />
<br />
=== Why are the Neural Networks so slow? ===<br />
<br />
The ANN models are an extremely powerful model type that gives very good results on many problems. However, they are quite slow to build. There are some things you can do:<br />
<br />
* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower. If results with trainlm are not good enough, try using msereg as the performance function.<br />
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.<br />
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.<br />
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox], but the accuracy is usually not as good.<br />
* If you are using [[Measures#CrossValidation| CrossValidation]] (which is used by default if you have not defined a [[Measures| measure]] yourself), try to switch to a different measure since CrossValidation is very expensive. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.<br />
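Outside of SUMO, in plain [http://www.mathworks.com/products/neuralnet/ Neural Network Toolbox] terms, the first two tips correspond to settings like the following (a hedged sketch with made-up data; within SUMO these choices are made through the ANN model builder's configuration instead):<br />
<br />
<source lang="matlab"><br />
X = rand(2, 50); y = sum(X.^2, 1);  % made-up 2-input training data<br />
net = newff(X, y, 10);              % feed-forward net with 10 hidden neurons<br />
net.trainFcn = 'trainscg';          % scaled conjugate gradient: faster than trainbr<br />
net.performFcn = 'msereg';          % regularized MSE recovers some smoothing<br />
net.trainParam.goal = 1e-5;         % small positive training goal instead of 0<br />
net = train(net, X, y);<br />
</source><br />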
<br />
See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]<br />
<br />
=== How can I make the toolbox run faster? ===<br />
<br />
There are a number of things you can do to speed things up; they are listed below. Remember though that the main reason the toolbox may seem slow is the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying the more advanced options.<br />
<br />
* First of all, check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates, this really slows things down and your computer becomes unusable.<br />
<br />
* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files<br />
<br />
* This is an important one. Most model builders have an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between two sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the genetic model builders reduce the population size and the number of generations.<br />
<br />
* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])<br />
<br />
* If you are using a very dense [[Measures#ValidationSet]] as your Measure, every single model will be evaluated on that data set. For some models, like RBF, Kriging, and SVM, this can slow things down.<br />
<br />
* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true"><br />
<Output type="toImage"/><br />
<Output type="toFile"/><br />
</Profiler><br />
<br />
<Profiler name=".*" enabled="true"><br />
<Output type="toFile"/><br />
</Profiler><br />
</Profiling><br />
</source><br />
<br />
The ".*" matches any sequence of characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name should be enabled and saved both as a text file (toFile) AND as an image file (toImage). All the other profilers are saved to file only. The idea is to only save to image what you really want as an image, since image generation is expensive. If you do this, or switch off image generation completely, you will see everything run much faster.<br />
<br />
* Decrease the logging granularity: a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.<br />
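<br />
A sketch of lowering the log level (the exact layout of the Logging section may differ in your configuration file; use the existing entry in <code>default.xml</code> as a guide):<br />
<br />
<source lang="xml"><br />
<Logging><br />
  <!-- FINE logs far less than the default FINEST/ALL --><br />
  <Option key="level" value="FINE"/><br />
</Logging><br />
</source><br />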
<br />
* If you have a multi-core/multi-cpu machine:<br />
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.<br />
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)<br />
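<br />
As a sketch, the two options above would look something like this (option names as mentioned; where exactly they go is described on the linked configuration pages):<br />
<br />
<source lang="xml"><br />
<!-- in ContextConfig: train models in parallel (requires the Parallel Computing Toolbox) --><br />
<Option key="parallelMode" value="true"/><br />
<br />
<!-- in LocalSampleEvaluator: run up to 4 simulator instances concurrently --><br />
<Option key="threadCount" value="4"/><br />
</source><br />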
<br />
* Don't use the Min-Max measure, it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]<br />
<br />
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]<br />
<br />
* If you are having problems with very slow or seemingly hanging runs:<br />
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_env/f9-17018.html Matlab profiler] and see where most time is spent.<br />
<br />
** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange. <br />
<br />
* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].<br />
<br />
To check whether your SUMO run has hung, monitor your log file (with the level set to at least FINE). If you see no changes for about 30 minutes the toolbox has probably stalled. [[Reporting problems| Report the problem here]].<br />
<br />
Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.<br />
<br />
=== How do I build models with more than one output ===<br />
<br />
Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===<br />
<br />
See : [[Adaptive Modeling Mode]].<br />
<br />
=== How do I change the error function (relative error, RMSE, ...)? ===<br />
<br />
The [[Measures| <Measure>]] tag specifies the algorithm used to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' the measure should use. The default error function is '<code>rootRelativeSquareError</code>'.<br />
<br />
Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/><br />
</source><br />
<br />
On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/><br />
</source><br />
<br />
These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important, so think it through carefully! Also see [[Multi-Objective Modeling]].<br />
<br />
=== How do I enable more profilers? ===<br />
<br />
Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.<br />
<br />
=== What regular expressions can I use to filter profilers? ===<br />
<br />
See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].<br />
<br />
=== How can I ensure deterministic results? ===<br />
<br />
See : [[Random state]].<br />
<br />
=== How do I get a simple closed-form model (symbolic expression)? ===<br />
<br />
See : [[Using a model]].<br />
<br />
=== How do I enable the Heterogeneous evolution to automatically select the best model type? ===<br />
<br />
Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.<br />
<br />
=== What is the combineOutputs option? ===<br />
<br />
See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== What error function should I use? ===<br />
<br />
The default error function is the Root Relative Square Error (RRSE). The meanRelativeError may be more intuitive, but then you have to be careful with function values close to zero, since there the relative error explodes or even becomes infinite. You could also use one of the combined relative error functions (which contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error, which is hard to interpret.<br />
<br />
So, to be safe, an absolute error seems the best bet (like the RMSE). However, in that case you have to come up with sensible accuracy targets, and realize that you will build models that fit the regions of high absolute value better than the low ones.<br />
<br />
Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.<br />
<br />
A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].<br />
<br />
=== I just want to generate an initial design (no sampling, no modeling) ===<br />
<br />
Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.<br />
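<br />
A sketch (shown here as an attribute on the <SUMO> tag; check <code>default.xml</code> for the exact form and keep your other settings unchanged):<br />
<br />
<source lang="xml"><br />
<!-- 0 modeling iterations: only the initial design is generated and evaluated --><br />
<SUMO maxModelingIterations="0"><br />
  ...<br />
</SUMO><br />
</source><br />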
<br />
=== How do I start a run with the samples of a previous run, or with a custom initial design? ===<br />
<br />
Use a Dataset design component, for example:<br />
<br />
<source lang="xml"><br />
<InitialDesign type="DatasetDesign"><br />
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/><br />
</InitialDesign><br />
</source><br />
<br />
=== What is a level plot? ===<br />
<br />
A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:<br />
<gallery><br />
Image:levelplot.png<br />
</gallery><br />
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.<br />
<br />
===I am getting a java out of memory error, what happened?===<br />
Datasets are loaded through Java, which means the Java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might run into the maximum heap size. You can solve this by raising the heap size as described on the following webpage:<br />
[http://www.mathworks.com/support/solutions/data/1-18I2C.html]<br />
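<br />
On recent Matlab versions you can also raise the JVM heap by creating a file called <code>java.opts</code> in your Matlab startup directory, containing a single line such as the one below (512 MB is just an example value):<br />
<br />
<source lang="bash"><br />
-Xmx512m<br />
</source><br />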
<br />
=== How do I force the output of the model to lie in a certain range ===<br />
<br />
See [[Measures#MinMax]].<br />
<br />
=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===<br />
<br />
That depends. Remember that the main focus of SUMO is generating accurate ''global'' models. If you want to do sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and on how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. At that point you need to see if you can extend the models with domain specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you don't need to do sample selection but have a fixed dataset you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM type models are independent of the dimension and can thus always be applied. Still, feature selection is always recommended.<br />
<br />
=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===<br />
<br />
When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question, but this is not always straightforward. In those cases a whole set of algorithms exists for dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox, but none are currently implemented; that is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination, the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.<br />
<br />
=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===<br />
<br />
When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., writes the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).<br />
<br />
Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable + [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference, take whatever is easiest for you.<br />
<br />
HOWEVER, there is one important consideration: Matlab does not support threads, so if you use a Matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. This means the modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run, this is not very efficient.<br />
<br />
On the other hand, using a shell script/executable does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).<br />
<br />
As a side note, if you already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. It is not very elegant and adds some overhead, but depending on your situation it may be worth it.<br />
<br />
=== How can I look at the internal structure of a SUMO model ===<br />
<br />
See [[Using_a_model#Available_methods]].<br />
<br />
=== Is there any design documentation available? ===<br />
<br />
An in depth overview of the rationale and philosophy, including a treatment of the software architecture underlying the SUMO Toolbox is available in the form of a PhD dissertation. A copy of this dissertation [http://www.sumo.intec.ugent.be/?q=system/files/2010_04_PhD_DirkGorissen.pdf is available here].<br />
<br />
== Troubleshooting ==<br />
<br />
=== I have a problem and I want to report it ===<br />
<br />
See : [[Reporting problems]].<br />
<br />
=== I sometimes get flat models when using rational functions ===<br />
<br />
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.<br />
<br />
The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. We have experienced that some models just find a flat model as the best least squares fit. There are several causes for this:<br />
<br />
* The number of sample points is small, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small number of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always interpolate exactly. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code>, you still get weird, spiky models, you simply need more samples or you should switch to a different model type.<br />
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.<br />
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.<br />
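<br />
For example, to force interpolation as described above, the RationalFactory section would contain something like the following (the maxDegrees value of 30 is purely illustrative):<br />
<br />
<source lang="xml"><br />
<!-- use 100% of the available degrees of freedom: exact interpolation --><br />
<Option key="percentBounds" value="100,100"/><br />
<!-- optionally cap the absolute polynomial degree --><br />
<Option key="maxDegrees" value="30"/><br />
</source><br />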
<br />
=== When using rational functions I sometimes get 'spikes' (poles) in my model ===<br />
<br />
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.<br />
<br />
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.<br />
<br />
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:<br />
<br />
* Simply wait for more data, usually spikes disappear (but not always).<br />
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.<br />
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.<br />
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.<br />
* Increase the population size if using the genetic version<br />
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]], it was designed to get rid of this problem more quickly, but it only selects one sample at a time.<br />
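<br />
As a sketch of the MinMax suggestion above, you would enable the MinMax measure alongside your existing measure (the bounds themselves, and any further attributes, are configured as described on [[Measures#MinMax]]):<br />
<br />
<source lang="xml"><br />
<!-- keep scoring models with CrossValidation... --><br />
<Measure type="CrossValidation" target="0.001"/><br />
<!-- ...but additionally penalize models whose response leaves the known output range --><br />
<Measure type="MinMax"/><br />
</source><br />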
<br />
However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time; automatic order selection is not an easy problem.<br />
<br />
=== There is no noise in my data yet the rational functions don't interpolate ===<br />
<br />
[[FAQ#I sometimes get flat models when using rational functions |see this question]].<br />
<br />
=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===<br />
<br />
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path. <br />
<br />
In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.<br />
<br />
=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===<br />
<br />
This means you have specified a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which id's are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.<br />
<br />
=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===<br />
<br />
This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files, the Matlab implementation will be used and this will not cause any problems.<br />
<br />
=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===<br />
<br />
This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure the FANN libraries (stored in src/matlab/contrib/fann/src/.libs/) are in your library path, e.g., on unix systems, make sure they are included in LD_LIBRARY_PATH.<br />
<br />
=== Undefined function or method 'createFann' for input arguments of type 'double'. ===<br />
<br />
See [[FAQ#When_using_FANN_models_I_sometimes_get_.22Invalid_MEX-file_createFann.mexa64.2C_libfann.so.2:_cannot_open_shared_object_file:_No_such_file_or_directory..22]]<br />
<br />
=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===<br />
<br />
You forgot to build the SVM mex files for your platform. For windows they are pre-compiled for you, on other systems you have to compile them yourself with the makefile.<br />
<br />
=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===<br />
<br />
First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].<br />
<br />
This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.<br />
<br />
If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:<br />
<br />
# typing 'usejava jvm' should return 1 <br />
# typing 's = java.lang.String', this should ''not'' give an error<br />
# typing 'version('-java')' should return at least version 1.5.0<br />
<br />
If (1) returns 0, then the jvm of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?)<br />
If (2) fails but (1) is ok, there is a very weird problem, check the Matlab documentation.<br />
If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer, jvm (See the Matlab docs for how to do this).<br />
<br />
=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===<br />
<br />
You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.<br />
<br />
=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===<br />
<br />
See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]<br />
<br />
=== I get a warning about duplicate samples being selected, why is this? ===<br />
<br />
Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.<br />
<br />
=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===<br />
<br />
There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].<br />
<br />
# '''Sampling off'''<br />
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does, [[reporting problems|report it as a bug]].<br />
## ''Multi objective'': There is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi objective enabled modeling algorithm.<br />
# '''Sampling on'''<br />
## ''Single objective'': inside each modeling iteration the error should always decrease monotonically. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase there; this is normal behavior (*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the data points is causing numerical problems).<br />
## ''Multi objective'': Combination of 1.2 and 2.1.<br />
<br />
(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!<br />
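<br />
A sketch of the keepOldModels override (shown as an <Option> in the <SUMO> tag; check <code>default.xml</code> for the exact form, and remember the caution above):<br />
<br />
<source lang="xml"><br />
<SUMO><br />
  <!-- never replace the current best model by a worse-scoring one --><br />
  <Option key="keepOldModels" value="true"/><br />
</SUMO><br />
</source><br />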
<br />
=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===<br />
<br />
This is normal, the warning printed out before the error explains why:<br />
<br />
''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''<br />
<br />
By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".<br />
Note that there is nothing to worry about: everything has run correctly, it is just the movie creation that is failing.<br />
<br />
=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===<br />
<br />
This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox sub directory! This is very important.<br />
<br />
If you still have problems you can override the default logfile name and location as follows:<br />
<br />
In the <FileHandler> tag inside the <Logging> tag add the following option:<br />
<br />
<source lang="xml"><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</source><br />
<br />
This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.<br />
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].<br />
<br />
=== The Toolbox crashes with "Too many open files" what should I do? ===<br />
<br />
This is a known bug, see [[Known_bugs#Version_6.1]].<br />
<br />
If this does not fix your problem then do the following:<br />
<br />
On Windows, try increasing the limit as dictated by the error message. Also, when you get the error, use the fopen('all') command to see which files are open and send us the list of filenames; then we can maybe help you debug the problem further. Even better would be to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can further debug the problem.<br />
<br />
On Linux again don't shut down Matlab but:<br />
<br />
* open a new terminal window<br />
* type:<br />
<source lang="bash"><br />
lsof > openFiles.txt<br />
</source><br />
* Then [[Contact|send us]] the following information:<br />
** the file openFiles.txt <br />
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).<br />
** the output of<br />
<source lang="bash"><br />
uname -a ; df -T ; mount<br />
</source><br />
<br />
As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.<br />
<br />
In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.<br />
<br />
=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===<br />
<br />
The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files, the Matlab implementation is used and a warning is given, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]] (this can be done very easily), or simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.<br />
<br />
=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===<br />
<br />
You most likely forgot to [[Installation#Extension_pack|install the extension pack]].<br />
<br />
=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===<br />
<br />
Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics, your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; you just have to log in again since your graphical session has terminated. This FAQ entry is only about the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.<br />
<br />
So what happens is that after a few seconds when the toolbox wants to plot the first model [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - Display server interaction.<br />
<br />
What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:<br />
<br />
* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])<br />
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)<br />
<br />
<source lang="bash"><br />
/etc/X11/xorg.conf<br />
</source><br />
<br />
Note: the exact location of the xorg.conf file may vary on your system.<br />
<br />
* Look for the following line:<br />
<br />
<source lang="bash"><br />
Load "glx"<br />
</source><br />
<br />
* Comment it out by replacing it by:<br />
<br />
<source lang="bash"><br />
# Load "glx"<br />
</source><br />
<br />
* Then save the file and restart your X server (if you do not know how to do this, simply reboot your computer)<br />
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].<br />
<br />
Note:<br />
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]<br />
* if you wish to debug further yourself please check the Xorg log files and those in /var/log<br />
* another possible workaround is to start matlab with the "-nodisplay" option. That could work as well.<br />
<br />
=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===<br />
<br />
This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.<br />
<br />
=== The toolbox seems to keep on running forever, when or how will it stop? ===<br />
<br />
The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].<br />
<br />
Of course, choosing targets a priori is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). If in doubt, you can always use a small value (or 0) and then simply quit the running toolbox with Ctrl-C when you think it has run long enough.<br />
<br />
While one could implement fancy automatic stopping algorithms, their actual benefit is questionable.<br />
<br />
=== What about surrogate driven optimization? ===<br />
<br />
When coining the term '''surrogate driven optimization''' most people associate it with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate which is optimized to find an optimum. Afterwards, a move-limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently the surrogate is rebuilt and optimized, i.e., the surrogate zooms in on the global optimum. For instance the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies, where the surrogate construction is separated from the optimization.<br />
<br />
Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox. <br />
<br />
Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sampling selection strategy that balances local and global search. In other words, it balances between exploring the input space and exploiting the information the surrogate gives us.<br />
<br />
A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].<br />
<br />
=== What is (adaptive) sampling? Why is it used? ===<br />
<br />
In classical Design of Experiments you need to specify the design of your experiment up-front: you have to say in advance how many data points you need and how they should be distributed. Two examples are Central Composite designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., by an expensive simulation code), it is not clear in advance how many points are needed. Instead, data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in the most interesting regions is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start somewhere, so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].<br />
SUMO provides a number of different sampling algorithms: [[SampleSelector]]<br />
<br />
Of course sometimes you don't want to do sampling. For example, if you have a fixed dataset you may just want to load all the data in one go and model it. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].<br />
<br />
=== What about dynamical, time dependent data? ===<br />
<br />
The original design and purpose was to tackle static input-output systems, where there is no memory. Just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox but that typically is not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...<br />
<br />
The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.<br />
<br />
Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].<br />
<br />
For more information on this topic [[Contact]] us.<br />
<br />
=== What about classification problems? ===<br />
<br />
The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].<br />
<br />
=== Can the toolbox drive my simulation code directly? ===<br />
<br />
Yes it can. See the [[Interfacing with the toolbox]] page.<br />
<br />
=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===<br />
<br />
The SUMO Toolbox is a complete, full-featured framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof of principle.<br />
<br />
=== What happened to the M3-Toolbox? ===<br />
<br />
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.<br />
<br />
=== How can I stay up to date with the latest news? ===<br />
<br />
To stay up to date with the latest news and releases, we recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic is kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== What is the roadmap for the future? ===<br />
<br />
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== Will there be an R/Scilab/Octave/Sage/.. version? ===<br />
<br />
At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:<br />
<br />
* Existing experience and know-how of the development team<br />
* The widespread use of the Matlab platform in the target application domains<br />
* The quality and amount of available Matlab documentation<br />
* The quality and number of Matlab toolboxes<br />
* Support for object orientation (inheritance, polymorphism, etc.)<br />
* Many well documented interfacing options (especially the seamless integration with Java)<br />
<br />
Matlab, as a proprietary platform, definitely has its problems and deficiencies, but the number of advanced algorithms and available toolboxes makes it a very attractive platform. Equally important is the fact that every function is properly documented and tested, and comes with examples, tutorials, and in some cases GUI tools. A lot of things would have been much harder and/or more time-consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus, given our situation, goals, and resources at the time, Matlab was the best choice for us. <br />
<br />
The other platforms remain on our radar, however, and we do look into them from time to time. Still, with our limited resources, porting to one of those platforms is not (yet) cost-effective.<br />
<br />
=== What are collaboration options? ===<br />
<br />
We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.<br />
<br />
The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to optimally help you, it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.<br />
<br />
If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.<br />
<br />
In either case (code or dataset) remember:<br />
<br />
* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])<br />
* include a short description of your data:<br />
** number of inputs and number of outputs<br />
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)<br />
** if the outputs are real or complex valued<br />
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).<br />
** if possible the expected range of each output (or scaled if you do not wish to disclose this)<br />
** if possible the names of each input/output + a short description of what they mean<br />
** any further insight you have about the data, expected behavior, expected importance of each input, etc.<br />
<br />
If you have any further questions or comments related to this please [[Contact]] us.<br />
<br />
=== Can you help me model my problem? ===<br />
<br />
Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]<br />
<br />
== Installation and Configuration ==<br />
<br />
=== What is the relationship between Matlab and Java? ===<br />
<br />
Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable java support but in order to use the SUMO Toolbox it should not be. To check if Java is enabled you can use the 'usejava' command.<br />
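To illustrate, here is a minimal sketch using only standard Matlab built-ins (nothing in it is SUMO-specific):

```matlab
% Verify that Java support is enabled in this Matlab session
if ~usejava('jvm')
    error('Java support is disabled; the SUMO Toolbox requires it.');
end

% Java objects can be created directly from the Matlab command line
s = java.lang.String('hello');
disp(char(s.toUpperCase()))   % displays: HELLO
```

If the first check fails, Matlab was probably started with the -nojvm option.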
<br />
=== What is Java, why do I need it, do I have to install it, etc. ? ===<br />
<br />
The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since Java makes a lot more sense in many situations and is a proper programming language rather than a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code, there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].<br />
<br />
=== What is XML? ===<br />
<br />
XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer-programming-language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.<br />
<br />
The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:<br />
<br />
<source lang="xml"><br />
<Foo attr="bar">bla bla bla</Foo> <br />
</source><br />
<br />
Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags and attributes is totally up to you :)<br />
<br />
Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:<br />
<br />
<source lang="xml"><br />
<recipe category="dessert"><br />
<title>Pancakes</title><br />
<author>sumo@intec.ugent.be</author><br />
<date>Wed, 14 Jun 95</date><br />
<description><br />
Good old fashioned pancakes.<br />
</description><br />
<ingredients><br />
<item><br />
<amount>3</amount><br />
<type>eggs</type><br />
</item><br />
<br />
<item><br />
<amount>0.5 tablespoon</amount><br />
<type>salt</type><br />
</item><br />
...<br />
</ingredients><br />
<preparation><br />
...<br />
</preparation><br />
</recipe><br />
</source><br />
<br />
So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO simply uses it to store and structure configuration options, which works well thanks to the hierarchical nature of XML.<br />
<br />
If you understand this there is nothing else to it in order to be able to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]<br />
<br />
=== Why does SUMO use XML? ===<br />
<br />
XML is the de facto standard way of structuring information, ranging from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it is an intuitive way to structure data and it is used everywhere. As a result, there are a very large number of libraries and programming languages available that can parse and handle XML easily. That means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.<br />
So basically, it would not make sense for SUMO to use any other format :)<br />
<br />
=== I get an error that SUMO is not yet activated ===<br />
<br />
Make sure you installed the activation file that was mailed to you as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27|Java is enabled]]. To fully verify that the activation file installation is correct, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.<br />
<br />
Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].<br />
<br />
== Upgrading ==<br />
<br />
=== How do I upgrade to a newer version? ===<br />
<br />
Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.<br />
<br />
If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.<br />
<br />
== Using ==<br />
<br />
=== I have no idea how to use the toolbox, what should I do? ===<br />
<br />
See: [[Running#Getting_started]]<br />
<br />
=== I want to try one of the different examples ===<br />
<br />
See [[Running#Running_different_examples]].<br />
<br />
=== I want to model my own problem ===<br />
<br />
See : [[Adding an example]].<br />
<br />
=== I want to contribute some data/patch/documentation/... ===<br />
<br />
See : [[Contributing]].<br />
<br />
=== How do I interface with the SUMO Toolbox? ===<br />
<br />
See : [[Interfacing with the toolbox]].<br />
<br />
=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===<br />
<br />
See [[General_guidelines]].<br />
<br />
=== Ok, I generated a model, what can I do with it? ===<br />
<br />
See: [[Using a model]].<br />
<br />
=== How can I share a model created by the SUMO Toolbox? ===<br />
<br />
See : [[Using a model#Model_portability| Model portability]].<br />
<br />
=== I dont like the final model generated by SUMO how do I improve it? ===<br />
<br />
Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor model. Then you should rank those properties depending on how important you find them. Examples are:<br />
<br />
* accuracy in the training data<br />
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing<br />
* accuracy outside the training data<br />
** this is the validation or test error, how important is proper generalization (usually this is very important)<br />
* what does accuracy mean to you? a low maximum error, a low average error, both, ...<br />
* smoothness<br />
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example<br />
* are some regions of the response more important than others?<br />
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important<br />
* are there particular special features that your model should have<br />
** for example, capture underlying poles or discontinuities correctly<br />
* extrapolation capability<br />
* ...<br />
<br />
It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].<br />
<br />
Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].<br />
<br />
Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:<br />
<br />
* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.<br />
* you should switch to a different model parameter optimization algorithm (e.g., instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)<br />
* the model type you are using is not ideally suited to your data<br />
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension<br />
* maybe the sample distribution is causing troubles for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.<br />
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very small)<br />
<br />
You may also refer to the following [[General_guidelines]]. Finally, of course, it may be that your problem is simply a very difficult one that does not approximate well. But still, you should at least get something satisfactory.<br />
<br />
If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.<br />
<br />
=== My data contains noise can the SUMO-Toolbox help me? ===<br />
<br />
The original purpose of the SUMO-Toolbox was to be used in conjunction with computer simulations. Since these are fully deterministic, you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will have to be more careful about how you apply the methods and how you perform model selection. It's only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.<br />
<br />
Note though, that the toolbox is not a statistical package, if you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc. you should look towards other tools.<br />
<br />
=== What is the difference between a ModelBuilder and a ModelFactory? ===<br />
<br />
See [[Add Model Type]].<br />
<br />
=== Why are the Neural Networks so slow? ===<br />
<br />
The ANN models are an extremely powerful model type that give very good results in many problems. However, they are quite slow to use. There are some things you can do:<br />
<br />
* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.<br />
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.<br />
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.<br />
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.<br />
* If you are using [[Measures#CrossValidation| CrossValidation]] (the default if you have not defined a [[Measures| measure]] yourself), try to switch to a different measure, since CrossValidation is very expensive. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.<br />
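For illustration, such a measure combination might be configured along these lines. This is a sketch only: the <Measure> tag format is taken from the error-function examples elsewhere in this FAQ, the target values are placeholders, and [[Multi-Objective Modeling]] remains the authoritative reference for combining measures:

```xml
<!-- Sketch: drive model selection by two cheap measures instead of CrossValidation -->
<Measure type="SampleError" target="0.001"/>
<Measure type="LRMMeasure" target="0.001"/>
```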
<br />
See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]<br />
<br />
=== How can I make the toolbox run faster? ===<br />
<br />
There are a number of things you can do to speed things up. These are listed below. Remember though that the main reason the toolbox may seem to be slow is due to the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.<br />
<br />
* First of all, check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates, this really slows things down and your computer becomes unusable.<br />
<br />
* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files<br />
<br />
* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between 2 sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the genetic model builders, reduce the population size and the number of generations.<br />
<br />
* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])<br />
<br />
* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.<br />
<br />
* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true"><br />
<Output type="toImage"/><br />
<Output type="toFile"/><br />
</Profiler><br />
<br />
<Profiler name=".*" enabled="true"><br />
<Output type="toFile"/><br />
</Profiler><br />
</Profiling><br />
</source><br />
<br />
The ".*" matches any sequence of zero or more characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name should be enabled and saved as a text file (toFile) AND as an image file (toImage). All the other profilers are saved just to file. The idea is to only save to image what you want as an image, since image generation is expensive. If you do this or switch off image generation completely you will see everything run much faster.<br />
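The profiler names are matched with standard Java regular expressions, so you can test a pattern outside the toolbox before putting it in your configuration. A minimal, self-contained illustration (plain Java, independent of SUMO; the profiler names are made up for the example):

```java
import java.util.regex.Pattern;

public class ProfilerNameMatch {
    public static void main(String[] args) {
        String pattern = ".*share.*|.*ensemble.*|.*Level.*";
        // A profiler name containing "ensemble" matches the second alternative
        System.out.println(Pattern.matches(pattern, "bestModelensemble")); // true
        // A name containing none of the three substrings does not match
        System.out.println(Pattern.matches(pattern, "AccuracyProfiler"));  // false
    }
}
```

Names that fail this pattern would fall through to the catch-all ".*" profiler rule in the configuration above.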
<br />
* Decrease the logging granularity: a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.<br />
<br />
* If you have a multi-core/multi-cpu machine:<br />
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.<br />
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)<br />
<br />
* Don't use the Min-Max measure; it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]<br />
<br />
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]<br />
<br />
* If you are having problems with very slow or seemingly hanging runs:<br />
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html Matlab profiler] and see where most time is spent.<br />
<br />
** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange. <br />
<br />
* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].<br />
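For Matlab releases of that era, the linked instructions amount to creating a plain-text file called java.opts in the directory Matlab starts in (this is our reading of those instructions; verify them for your specific version) containing a single JVM heap option, for example:

```
-Xmx512m
```

Restart Matlab afterwards; evaluating `java.lang.Runtime.getRuntime.maxMemory` at the Matlab prompt should then reflect the new limit.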
<br />
To check if your SUMO run has hanged, monitor your log file (with the level set to at least FINE). If you see no changes for about 30 minutes the toolbox has probably stalled; in that case [[Reporting problems| report the problem here]].<br />
<br />
Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.<br />
<br />
=== How do I build models with more than one output ===<br />
<br />
Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===<br />
<br />
See : [[Adaptive Modeling Mode]].<br />
<br />
=== How do I change the error function (relative error, RMSE, ...)? ===<br />
<br />
The [[Measures| <Measure>]] tag specifies the algorithm to use to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify in the measure which '''error function''' to use. The default error function is '<code>rootRelativeSquareError</code>'.<br />
<br />
Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/><br />
</source><br />
<br />
On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/><br />
</source><br />
<br />
These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important, so think carefully about it! Also see [[Multi-Objective Modeling]].<br />
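If you do add your own error function, a minimal sketch could look as follows. Note that the two-argument convention (true values first, predicted values second) is an assumption here; mirror the signature of an existing file such as <code>rootRelativeSquareError.m</code> before using it:<br />
<br />
<source lang="matlab"><br />
% Hypothetical custom error function: maximum relative error.<br />
% NOTE: the argument convention is an assumption; check an existing<br />
% function in src/matlab/tools/errorFunctions for the exact signature.<br />
function e = myMaxRelativeError(trueValues, predictedValues)<br />
  e = max(abs((trueValues - predictedValues) ./ trueValues));<br />
end<br />
</source><br />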
<br />
=== How do I enable more profilers? ===<br />
<br />
Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.<br />
<br />
=== What regular expressions can I use to filter profilers? ===<br />
<br />
See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].<br />
<br />
=== How can I ensure deterministic results? ===<br />
<br />
See : [[Random state]].<br />
<br />
=== How do I get a simple closed-form model (symbolic expression)? ===<br />
<br />
See : [[Using a model]].<br />
<br />
=== How do I enable the Heterogenous evolution to automatically select the best model type? ===<br />
<br />
Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.<br />
<br />
=== What is the combineOutputs option? ===<br />
<br />
See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== What error function should I use? ===<br />
<br />
The default error function is the Root Relative Square Error (RRSE). meanRelativeError may be more intuitive, but then you have to be careful with function values close to zero, since there the relative error explodes or even becomes infinite. You could also use one of the combined relative error functions (these contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error (=> hard to interpret).<br />
<br />
So, to be safe, an absolute error (like the RMSE) seems the safest bet. However, in that case you have to come up with sensible accuracy targets, and realize that you will build models that fit the regions of high absolute value better than the low ones.<br />
<br />
Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.<br />
<br />
A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].<br />
<br />
=== I just want to generate an initial design (no sampling, no modeling) ===<br />
<br />
Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.<br />
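For example, if the options of the <SUMO> tag in your configuration follow the usual <Option> convention, the relevant fragment would look something like this (a sketch only; keep the rest of your <SUMO> tag from default.xml unchanged):<br />
<br />
<source lang="xml"><br />
<SUMO><br />
  <!-- only generate and evaluate the initial design, then stop --><br />
  <Option key="maxModelingIterations" value="0"/><br />
</SUMO><br />
</source><br />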
<br />
=== How do I start a run with the samples of a previous run, or with a custom initial design? ===<br />
<br />
Use a Dataset design component, for example:<br />
<br />
<source lang="xml"><br />
<InitialDesign type="DatasetDesign"><br />
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/><br />
</InitialDesign><br />
</source><br />
<br />
=== What is a level plot? ===<br />
<br />
A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:<br />
<gallery><br />
Image:levelplot.png<br />
</gallery><br />
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.<br />
<br />
===I am getting a java out of memory error, what happened?===<br />
Datasets are loaded through Java. This means that the Java heap space is used for storing the data. If you try to load a huge dataset (> 50 MB), you might run into the maximum heap size. You can solve this by raising the heap size as described on the following webpage:<br />
[http://www.mathworks.com/support/solutions/data/1-18I2C.html]<br />
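For the (older) Matlab versions this wiki targets, the usual mechanism is a <code>java.opts</code> file containing extra JVM options, placed in the location the linked MathWorks page describes for your installation. For example, to raise the maximum Java heap to 512 MB the file would contain the single line:<br />
<br />
<source lang="text"><br />
-Xmx512m<br />
</source><br />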
<br />
=== How do I force the output of the model to lie in a certain range ===<br />
<br />
See [[Measures#MinMax]].<br />
<br />
=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===<br />
<br />
That depends. Remember that the main focus of SUMO is to generate accurate 'global' models, and the more dimensions there are, the more space you need to fill. If you want to do adaptive sampling, the practical dimensionality is therefore limited to around 6-8 (though it depends on the problem and on how cheap the simulations are!). At that point you need to see if you can extend the models with domain specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you don't need sample selection but have a fixed dataset that you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM type models are independent of the dimension and can thus always be applied, though things like feature selection are always recommended.<br />
<br />
=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===<br />
<br />
When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question but this is not always straightforward. In those cases a whole set of algorithms exist for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox but are not currently implemented. That is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.<br />
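As an illustration (the exact calling convention of <code>ARD</code> is an assumption here; check the available methods of your model object, see [[Using_a_model#Available_methods]]):<br />
<br />
<source lang="matlab"><br />
% Sketch: load a SUMO-generated LS-SVM model and ask which inputs it<br />
% considers most relevant. The call below is not a guaranteed signature.<br />
s = load('model_0002.mat');<br />
relevance = ARD(s.model);<br />
</source><br />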
<br />
=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===<br />
<br />
When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., writes the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).<br />
<br />
Which one you choose (a Matlab script with the [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or a shell script/executable with the [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference, so take whatever is easiest for you.<br />
<br />
HOWEVER, there is one important consideration: Matlab does not support threads, which means that if you use a Matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. The modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient. In version 6.2 we will probably fix this by using the Parallel Computing Toolbox.<br />
<br />
On the other hand, using a shell script/executable, does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).<br />
<br />
As a side note, if you have already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. This is not very elegant and adds some overhead, but depending on your situation it may be worth it.<br />
<br />
=== How can I look at the internal structure of a SUMO model ===<br />
<br />
See [[Using_a_model#Available_methods]].<br />
<br />
=== Is there any design documentation available? ===<br />
<br />
An in depth overview of the rationale and philosophy, including a treatment of the software architecture underlying the SUMO Toolbox is available in the form of a PhD dissertation. A copy of this dissertation [http://www.sumo.intec.ugent.be/?q=system/files/2010_04_PhD_DirkGorissen.pdf is available here].<br />
<br />
== Troubleshooting ==<br />
<br />
=== I have a problem and I want to report it ===<br />
<br />
See : [[Reporting problems]].<br />
<br />
=== I sometimes get flat models when using rational functions ===<br />
<br />
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.<br />
<br />
The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in numerator and denominator. We have experienced that some models just find a flat model as the best least squares fit. There are several possible causes:<br />
<br />
* The number of sample points is small, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always interpolate exactly. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code>, you still get weird, spiky models you simply need more samples, or you should switch to a different model type.<br />
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.<br />
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.<br />
<br />
=== When using rational functions I sometimes get 'spikes' (poles) in my model ===<br />
<br />
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.<br />
<br />
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.<br />
<br />
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:<br />
<br />
* Simply wait for more data, usually spikes disappear (but not always).<br />
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.<br />
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.<br />
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.<br />
* Increase the population size if using the genetic version<br />
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]], it was designed to get rid of this problem more quickly, but it only selects one sample at a time.<br />
<br />
However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time; automatic order selection is not an easy problem.<br />
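As a concrete sketch of the <code>"percentBounds"</code>/<code>"maxDegrees"</code> advice above (the "min,max" value format follows the examples in the flat-models question; the numbers themselves are only illustrative, pick values suited to your problem):<br />
<br />
<source lang="xml"><br />
<!-- inside the RationalFactory section of your configuration --><br />
<Option key="percentBounds" value="0,10"/><br />
<Option key="maxDegrees" value="50"/><br />
</source><br />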
<br />
=== There is no noise in my data yet the rational functions don't interpolate ===<br />
<br />
[[FAQ#I sometimes get flat models when using rational functions |see this question]].<br />
<br />
=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===<br />
<br />
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path. <br />
<br />
In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.<br />
<br />
=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===<br />
<br />
This means you have specified to use a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which id's are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.<br />
<br />
=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===<br />
<br />
This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files; the Matlab implementation will then be used, which does not cause any problems.<br />
<br />
=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===<br />
<br />
This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure the FANN libraries (stored in src/matlab/contrib/fann/src/.libs/) are in your library path, e.g., on unix systems, make sure they are included in LD_LIBRARY_PATH.<br />
<br />
=== Undefined function or method 'createFann' for input arguments of type 'double'. ===<br />
<br />
See [[FAQ#When_using_FANN_models_I_sometimes_get_.22Invalid_MEX-file_createFann.mexa64.2C_libfann.so.2:_cannot_open_shared_object_file:_No_such_file_or_directory..22]]<br />
<br />
=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===<br />
<br />
You forgot to build the SVM mex files for your platform. For Windows they are pre-compiled for you; on other systems you have to compile them yourself with the makefile.<br />
<br />
=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===<br />
<br />
First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].<br />
<br />
This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.<br />
<br />
If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:<br />
<br />
# typing 'usejava jvm' should return 1 <br />
# typing 's = java.lang.String', this should ''not'' give an error<br />
# typing 'version('-java')' should return at least version 1.5.0<br />
<br />
If (1) returns 0, then the jvm of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?)<br />
If (2) fails but (1) is ok, there is a very weird problem, check the Matlab documentation.<br />
If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer, jvm (See the Matlab docs for how to do this).<br />
<br />
=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===<br />
<br />
You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.<br />
<br />
=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===<br />
<br />
See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]<br />
<br />
=== I get a warning about duplicate samples being selected, why is this? ===<br />
<br />
Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.<br />
<br />
=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===<br />
<br />
There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].<br />
<br />
# '''Sampling off'''<br />
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does [[reporting problems|report it as a bug]]<br />
## ''Multi objective'': There is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi-objective enabled modeling algorithm<br />
# '''Sampling on'''<br />
## ''Single objective'': inside each modeling iteration the error should always decrease monotonically. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase; this is normal behavior (*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the datapoints is causing numerical problems).<br />
## ''Multi objective'': Combination of 1.2 and 2.1.<br />
<br />
(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!<br />
<br />
=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===<br />
<br />
This is normal, the warning printed out before the error explains why:<br />
<br />
''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''<br />
<br />
By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".<br />
Note that there is nothing to worry about: everything has run correctly, it is just the movie creation that is failing.<br />
<br />
=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===<br />
<br />
This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox sub directory! This is very important.<br />
<br />
If you still have problems you can override the default logfile name and location as follows:<br />
<br />
In the <FileHandler> tag inside the <Logging> tag add the following option:<br />
<br />
<source lang="xml"><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</source><br />
<br />
This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.<br />
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].<br />
<br />
=== The Toolbox crashes with "Too many open files" what should I do? ===<br />
<br />
This is a known bug, see [[Known_bugs#Version_6.1]].<br />
<br />
If this does not fix your problem then do the following:<br />
<br />
On Windows try increasing the limit in Windows as dictated by the error message. Also, when you get the error, use the fopen('all') command to see which files are open and send us the list of filenames. Then we can maybe help you debug the problem further. Even better would be to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can debug the problem further.<br />
<br />
On Linux again don't shut down Matlab but:<br />
<br />
* open a new terminal window<br />
* type:<br />
<source lang="bash"><br />
lsof > openFiles.txt<br />
</source><br />
* Then [[Contact|send us]] the following information:<br />
** the file openFiles.txt <br />
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).<br />
** the output of<br />
<source lang="bash"><br />
uname -a ; df -T ; mount<br />
</source><br />
<br />
As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.<br />
<br />
In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.<br />
<br />
=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===<br />
<br />
The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files the Matlab implementation will be used and a warning is given, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]] (this can be done very easily), or simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.<br />
<br />
=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===<br />
<br />
You most likely forgot to [[Installation#Extension_pack|install the extension pack]].<br />
<br />
=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===<br />
<br />
Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics, your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; you just have to log in again since your graphical session has terminated. This FAQ entry is only about the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.<br />
<br />
So what happens is that after a few seconds, when the toolbox wants to plot the first model, [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - display server interaction.<br />
<br />
What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:<br />
<br />
* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])<br />
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)<br />
<br />
<source lang="bash"><br />
/etc/X11/xorg.conf<br />
</source><br />
<br />
Note: the exact location of the xorg.conf file may vary on your system.<br />
<br />
* Look for the following line:<br />
<br />
<source lang="bash"><br />
Load "glx"<br />
</source><br />
<br />
* Comment it out by replacing it by:<br />
<br />
<source lang="bash"><br />
# Load "glx"<br />
</source><br />
<br />
* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)<br />
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].<br />
<br />
Note:<br />
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]<br />
* if you wish to debug further yourself please check the Xorg log files and those in /var/log<br />
* another possible workaround is to start matlab with the "-nodisplay" option. That could work as well.<br />
<br />
=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===<br />
<br />
This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.<br />
<br />
=== The toolbox seems to keep on running forever, when or how will it stop? ===<br />
<br />
The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].<br />
<br />
Of course choosing targets a priori is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). If in doubt you can always use a small value (or 0) and then simply quit the running toolbox with Ctrl-C when you think it has run long enough.<br />
<br />
While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.</div>Dgorissen
http://sumowiki.intec.ugent.be/index.php?title=Using_a_model&diff=5205
Using a model
2010-08-18T07:23:20Z
<p>Dgorissen: /* Available methods */</p>
<hr />
<div>This page explains what you can do with a SUMO generated model.<br />
<br />
== Loading a model from disk ==<br />
<br />
As the SUMO Toolbox builds models, each current best model is stored as a Matlab mat file in the output directory (e.g.: <code>output/Academic_2D_Twice_rep01_run00_2008.05.20_10-27-18/models_out/model_0002.mat</code>). <br />
<br />
In order to load this model from disk and actually use it, do the following:<br />
<br />
* Start Matlab, make sure the SUMO Toolbox is in your path and navigate to the directory where the model file is stored<br />
* Load the model from disk as follows:<br />
** >> <code>modelFile = load('model_0002.mat');</code><br />
** >> <code>model = modelFile.model;</code><br />
<br />
Now the model is available as the variable 'model' in the Matlab workspace.<br />
<br />
== Model portability ==<br />
<br />
This section explains how to exchange and/or export SUMO models.<br />
<br />
=== The other person has the SUMO Toolbox installed ===<br />
<br />
The model 'mat' files can be shared with other people. In order for somebody else to use your saved model the following conditions need to be satisfied:<br />
<br />
* The person has the SUMO Toolbox in their Matlab path<br />
* The person should be using a similar Matlab version (including toolboxes) as was used to create the model file (preferably equal)<br />
* The person should be using a similar SUMO Toolbox version as was used to create the model file (preferably equal)<br />
<br />
We do not guarantee portability if the above versions differ.<br />
<br />
=== The other person does NOT have the SUMO Toolbox installed ===<br />
<br />
In this case you can use the ''getExpression'' and ''exportToMFile'' (available from v6.0) methods. See below.<br />
<br />
== Model space vs Simulator space ==<br />
<br />
It is important to note the difference between '''Model space''' and '''Simulator space'''. When a data point is in model space, it means its inputs all lie in the range [-1 1]. When the point is in simulator space its inputs lie in the range specified by the [[Simulator configuration]] file.<br />
<br />
Internally the toolbox only works in model space. The toolbox will take care of translating points from simulator space into model space and back (this happens in the SampleManager object). You will note that many methods have a ''XXXinModelSpace'' variant. This just means that the method does exactly the same, except it expects points to be in model space. You should normally not have to care about model space unless you are writing your own extensions to the toolbox. In that case see [[Add Model Type]].<br />
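For illustration only (this is ''not'' the toolbox's internal code, which lives in the SampleManager object): translating a point between the two spaces amounts to a linear rescaling of each input dimension. The sketch below assumes hypothetical row vectors <code>lb</code> and <code>ub</code> holding the lower and upper bounds from the [[Simulator configuration]]:<br />
<br />
<source lang="matlab"><br />
% simulator space -> model space ([-1 1] in every dimension)<br />
xModel = 2 .* (xSim - lb) ./ (ub - lb) - 1;<br />
<br />
% model space -> simulator space<br />
xSim2 = (xModel + 1) ./ 2 .* (ub - lb) + lb;<br />
</source><br />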
<br />
== Available methods ==<br />
<br />
Once the model is loaded you can invoke a number of methods on it. We list the main ones below. For a full list of available methods, use the Matlab 'methods' command:<br />
<br />
<source lang="matlab"><br />
>>methods(model)<br />
</source><br />
<br />
If you want to understand the structure of the model, i.e., how the model object is built up you can do two things:<br />
<br />
# open the class file for that model, e.g., for an ANNModel object, open the src/matlab/models/@ANNModel/ANNModel.m file<br />
# use the struct command to convert the model object to a structure. For example:<br />
<br />
<source lang="matlab"><br />
>>str = struct(model)<br />
</source><br />
<br />
Also, some model types provide methods to access the internal model representation. For example, if m is an object of type ANNModel, then executing m.getNetwork() will return the nested Matlab neural network object (from the Matlab Neural Network Toolbox).<br />
<br />
=== guiPlotModel ===<br />
<br />
The easiest way to explore a model is to use the graphical model browser. [[Model Visualization GUI|See here for more information]]<br />
<br />
=== plotModel ===<br />
<source lang="matlab"><br />
>>[figureHandle] = plotModel(model,[outputNumber],[options])<br />
</source><br />
<br />
<code>plotModel</code> will generate an indicative plot of the model surface. To do so, it evaluates the model on a reasonably dense grid of points.<br />
<br />
<code>plotModel</code> optional parameters:<br />
* <code>outputNumber</code>: optional parameter, an integer specifying which output to plot<br />
* <code>options</code>: optional parameter, a struct containing a number of options you can set. To get the default options simply call <code>Model.getPlotDefaults()</code>.<br />
<br />
<br />
To determine which kind of plot is generated, one makes a distinction based on the dimension of the input space:<br />
* '''One dimensional models''' are always plotted in a simple XY line chart. Samples are shown as dots.<br />
* '''Two dimensional models''' are plotted as a Matlab ''mesh'' plot, i.e. a colored surface. The colors are just an indication of height and don't have any further meaning. The samples are plotted as dots, and should (hopefully) approach the surface.<br />
* '''Three dimensional problems''' are plotted using a custom-built [[Slice Plot]].<br />
* '''Four dimensional problems''' are plotted using 3 [[Slice Plot]]s. The leftmost plot fixes the fourth variable at -1, the middle plot at 0 and the rightmost plot at 1 (thus reducing the function to a three dimensional function, making a slice plot possible).<br />
* '''Five dimensional problems''' are plotted using 9 [[Slice Plot]]s. The fourth and fifth variables are fixed at values of -1, 0 and 1. Indicators below the plots show where the variables were fixed.<br />
* '''Higher dimensional problems''': All variables after the fifth are fixed at 0, and plotting proceeds as if the model were five dimensional.<br />
<br />
For plotting purposes, the toolbox handles complex-valued outputs as their modulus (= absolute value = magnitude). These plots are just visual aids for monitoring the modeling process. Phase data can be extracted from the model files.<br />
<br />
=== evaluate ===<br />
<source lang="matlab"><br />
>> values = evaluate(model, samples);<br />
</source><br />
<br />
This evaluates the model on the given samples. The samples should be provided in simulator space. Simulator space is defined by the range in the [[Simulator configuration]]. If no range (minimum and maximum) was specified, the domain is assumed to be [-1,1].<br />
<br />
See also [[Using_a_model#Model_object_interfacing_and_optimization]]<br />
<br />
=== evaluateDerivative ===<br />
<source lang="matlab"><br />
>> values = evaluateDerivative(model, samples, [outputIndex]);<br />
</source><br />
<br />
This approximates the partial derivatives of the model at each given sample. Note that the base class implementation is a very simple approximation. Models can override this function to provide more accurate derivatives (e.g., Kriging does this already). However, in its current form it is already useful.<br />
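As a sketch of what such a simple approximation could look like (this is ''not'' the actual base class implementation), a central finite difference over each input dimension of a single sample <code>x</code> (a row vector in simulator space) might be written as:<br />
<br />
<source lang="matlab"><br />
h = 1e-6;                       % hypothetical step size<br />
d = zeros(size(x));             % one partial derivative per input<br />
for i = 1:numel(x)<br />
    step = zeros(size(x));<br />
    step(i) = h;<br />
    d(i) = (evaluate(model, x + step) - evaluate(model, x - step)) / (2*h);<br />
end<br />
</source><br />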
<br />
=== getSamples ===<br />
<source lang="matlab"><br />
>> samples = getSamples(model);<br />
</source><br />
<br />
Returns the samples that were used to fit the model. The samples are returned in simulator space.<br />
<br />
=== getValues ===<br />
<source lang="matlab"><br />
>> values = getValues(model);<br />
</source><br />
<br />
Returns the values that correspond to the samples from getSamples().<br />
<br />
=== getDescription ===<br />
<source lang="matlab"><br />
>> desc = getDescription(model);<br />
</source><br />
<br />
Returns a string with a user-friendly description of the model.<br />
<br />
=== getExpression ===<br />
<source lang="matlab"><br />
>> desc = getExpression(model,[outputNumber]);<br />
</source><br />
<br />
Returns the symbolic mathematical expression of this model (e.g., 3*x1^2 - 2*x2 +5). Note that not all model types implement this.<br />
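For example, the returned string can be turned into a plain Matlab function handle, so that someone without the SUMO Toolbox can still evaluate the model. The snippet below assumes a model with two scalar inputs named x1 and x2 and an expression in valid Matlab syntax:<br />
<br />
<source lang="matlab"><br />
expr = getExpression(model);        % e.g. '3*x1^2 - 2*x2 + 5'<br />
f = str2func(['@(x1,x2) ' expr]);   % build a handle from the string<br />
y = f(0.5, -1);                     % evaluate without the toolbox<br />
</source><br />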
<br />
=== construct ===<br />
<source lang="matlab"><br />
>> model = construct(model,samples);<br />
</source><br />
<br />
This will build, train, or fit the model on the given set of data points and return the updated model.<br />
<br />
=== complexity ===<br />
<source lang="matlab"><br />
>> n = complexity(model);<br />
</source><br />
<br />
Returns the number of free parameters in the model. By default this returns the number of datapoints the model was built with but this is overridden by some model types. For example, an ANN model returns the number of weights in the network while a rational model returns the number of coefficients.<br />
<br />
== Model object interfacing and optimization ==<br />
<br />
You may want to use the model as part of a larger Matlab program, or you may simply want to optimize the model. To do this easily, you can create a function handle to the model object as follows (example for the 3D case):<br />
<br />
<source lang="matlab"><br />
handle = @(x,y,z) evaluate( model, [x,y,z] );<br />
</source><br />
<br />
Afterwards, you can pass that handle to your optimization procedure, or use it through <code>feval</code>:<br />
<br />
<source lang="matlab"><br />
fmincon( handle, ... );<br />
feval( handle, 0, 1, -1 );<br />
</source></div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Using_a_model&diff=5204Using a model2010-08-18T07:21:05Z<p>Dgorissen: /* Available methods */</p>
<hr />
<div>This page explains what you can do with a SUMO generated model.<br />
<br />
== Loading a model from disk ==<br />
<br />
As the SUMO Toolbox builds models, each current best model is stored as a Matlab mat file in the output directory (e.g.: <code>output/Academic_2D_Twice_rep01_run00_2008.05.20_10-27-18/models_out/model_0002.mat</code>). <br />
<br />
In order to load this model from disk and actually use it, do the following:<br />
<br />
* Start Matlab, make sure the SUMO Toolbox is in your path and navigate to the directory where the model file is stored<br />
* Load the model from disk as follows:<br />
** >> <code>modelFile = load('model_0002.mat');</code><br />
** >> <code>model = modelFile.model;</code><br />
<br />
Now the model is available as the variable 'model' in the Matlab workspace.<br />
<br />
== Model portability ==<br />
<br />
This section explains how to exchange and/or export SUMO models.<br />
<br />
=== The other person has the SUMO Toolbox installed ===<br />
<br />
The model 'mat' files can be shared with other people. In order for somebody else to use your saved model the following conditions need to be satisfied:<br />
<br />
* The person has the SUMO Toolbox in their Matlab path<br />
* The person should be using a similar Matlab version (including toolboxes) as was used to create the model file (preferably equal)<br />
* The person should be using a similar SUMO Toolbox version as was used to create the model file (preferably equal)<br />
<br />
We do not guarantee portability if the above versions differ.<br />
<br />
=== The other person does NOT have the SUMO Toolbox installed ===<br />
<br />
In this case you can use the ''getExpression'' and ''exportToMFile'' (available from v6.0) methods. See below.<br />
<br />
== Model space vs Simulator space ==<br />
<br />
It is important to note the difference between '''Model space''' and '''Simulator space'''. When a data point is in model space, it means its inputs all lie in the range [-1 1]. When the point is in simulator space its inputs lie in the range specified by the [[Simulator configuration]] file.<br />
<br />
Internally the toolbox only works in model space. The toolbox will take care of translating points from simulator space into model space and back (this happens in the SampleManager object). You will note that many methods have a ''XXXinModelSpace'' variant. This just means that the method does exactly the same, except it expects points to be in model space. You should normally not have to care about model space unless you are writing your own extensions to the toolbox. In that case see [[Add Model Type]].<br />
<br />
== Available methods ==<br />
<br />
Once the model is loaded you can invoke a number of methods on it. We list the main ones below. For a full list of available methods, use the Matlab 'methods' command:<br />
<br />
<source lang="matlab"><br />
>>methods(model)<br />
</source><br />
<br />
If you want to understand the structure of the model, i.e., how the model object is built up you can do two things:<br />
<br />
# open the class file for that model, e.g., for an ANNModel object, open the src/matlab/models/@ANNModel/ANNModel.m file<br />
# use the struct command to convert the model object to a structure. For example:<br />
<br />
<source lang="matlab"><br />
>>str = struct(model)<br />
</source><br />
<br />
=== guiPlotModel ===<br />
<br />
The easiest way to explore a model is to use the graphical model browser. [[Model Visualization GUI|See here for more information]]<br />
<br />
=== plotModel ===<br />
<source lang="matlab"><br />
>>[figureHandle] = plotModel(model,[outputNumber],[options])<br />
</source><br />
<br />
<code>plotModel</code> will generate an indicative plot of the model surface. To do so, it evaluates the model on a reasonably dense grid of points.<br />
<br />
<code>plotModel</code> optional parameters:<br />
* <code>outputNumber</code>: optional parameter, an integer specifying which output to plot<br />
* <code>options</code>: optional parameter, a struct containing a number of options you can set. To get the default options simply call <code>Model.getPlotDefaults()</code>.<br />
<br />
<br />
To determine which kind of plot is generated, one makes a distinction based on the dimension of the input space:<br />
* '''One dimensional models''' are always plotted in a simple XY line chart. Samples are shown as dots.<br />
* '''Two dimensional models''' are plotted as a Matlab ''mesh'' plot, i.e. a colored surface. The colors are just an indication of height and don't have any further meaning. The samples are plotted as dots, and should (hopefully) approach the surface.<br />
* '''Three dimensional problems''' are plotted using a custom-built [[Slice Plot]].<br />
* '''Four dimensional problems''' are plotted using 3 [[Slice Plot]]s. The leftmost plot fixes the fourth variable at -1, the middle plot at 0 and the rightmost plot at 1 (thus reducing the function to a three dimensional function, making a slice plot possible).<br />
* '''Five dimensional problems''' are plotted using 9 [[Slice Plot]]s. The fourth and fifth variables are fixed at values of -1, 0 and 1. Indicators below the plots show where the variables were fixed.<br />
* '''Higher dimensional problems''': All variables after the fifth are fixed at 0, and plotting proceeds as if the model were five dimensional.<br />
<br />
For plotting purposes, the toolbox handles complex-valued outputs as their modulus (= absolute value = magnitude). These plots are just visual aids for monitoring the modeling process. Phase data can be extracted from the model files.<br />
<br />
=== evaluate ===<br />
<source lang="matlab"><br />
>> values = evaluate(model, samples);<br />
</source><br />
<br />
This evaluates the model on the given samples. The samples should be provided in simulator space. Simulator space is defined by the range in the [[Simulator configuration]]. If no range (minimum and maximum) was specified, the domain is assumed to be [-1,1].<br />
<br />
See also [[Using_a_model#Model_object_interfacing_and_optimization]]<br />
<br />
=== evaluateDerivative ===<br />
<source lang="matlab"><br />
>> values = evaluateDerivative(model, samples, [outputIndex]);<br />
</source><br />
<br />
This approximates the partial derivatives of the model at each given sample. Note that the base class implementation is a very simple approximation. Models can override this function to provide more accurate derivatives (e.g., Kriging does this already). However, in its current form it is already useful.<br />
<br />
=== getSamples ===<br />
<source lang="matlab"><br />
>> samples = getSamples(model);<br />
</source><br />
<br />
Returns the samples that were used to fit the model. The samples are returned in simulator space.<br />
<br />
=== getValues ===<br />
<source lang="matlab"><br />
>> values = getValues(model);<br />
</source><br />
<br />
Returns the values that correspond to the samples from getSamples().<br />
<br />
=== getDescription ===<br />
<source lang="matlab"><br />
>> desc = getDescription(model);<br />
</source><br />
<br />
Returns a string with a user-friendly description of the model.<br />
<br />
=== getExpression ===<br />
<source lang="matlab"><br />
>> desc = getExpression(model,[outputNumber]);<br />
</source><br />
<br />
Returns the symbolic mathematical expression of this model (e.g., 3*x1^2 - 2*x2 +5). Note that not all model types implement this.<br />
<br />
=== construct ===<br />
<source lang="matlab"><br />
>> model = construct(model,samples);<br />
</source><br />
<br />
This will build, train, or fit the model on the given set of data points and return the updated model.<br />
<br />
=== complexity ===<br />
<source lang="matlab"><br />
>> n = complexity(model);<br />
</source><br />
<br />
Returns the number of free parameters in the model. By default this returns the number of datapoints the model was built with but this is overridden by some model types. For example, an ANN model returns the number of weights in the network while a rational model returns the number of coefficients.<br />
<br />
== Model object interfacing and optimization ==<br />
<br />
You may want to use the model as part of a larger Matlab program, or you may simply want to optimize the model. To do this easily, you can create a function handle to the model object as follows (example for the 3D case):<br />
<br />
<code><pre><br />
handle = @(x,y,z) evaluate( model, [x,y,z] );<br />
</pre></code><br />
<br />
Afterwards, you can pass that handle to your optimization procedure, or use it through <code>feval</code>:<br />
<br />
<code><pre><br />
fmincon( handle, ... );<br />
feval( handle, 0, 1, -1 );<br />
</pre></code></div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Using_a_model&diff=5203Using a model2010-08-18T07:20:44Z<p>Dgorissen: /* Available methods */</p>
<hr />
<div>This page explains what you can do with a SUMO generated model.<br />
<br />
== Loading a model from disk ==<br />
<br />
As the SUMO Toolbox builds models, each current best model is stored as a Matlab mat file in the output directory (e.g.: <code>output/Academic_2D_Twice_rep01_run00_2008.05.20_10-27-18/models_out/model_0002.mat</code>). <br />
<br />
In order to load this model from disk and actually use it, do the following:<br />
<br />
* Start Matlab, make sure the SUMO Toolbox is in your path and navigate to the directory where the model file is stored<br />
* Load the model from disk as follows:<br />
** >> <code>modelFile = load('model_0002.mat');</code><br />
** >> <code>model = modelFile.model;</code><br />
<br />
Now the model is available as the variable 'model' in the Matlab workspace.<br />
<br />
== Model portability ==<br />
<br />
This section explains how to exchange and/or export SUMO models.<br />
<br />
=== The other person has the SUMO Toolbox installed ===<br />
<br />
The model 'mat' files can be shared with other people. In order for somebody else to use your saved model the following conditions need to be satisfied:<br />
<br />
* The person has the SUMO Toolbox in their Matlab path<br />
* The person should be using a similar Matlab version (including toolboxes) as was used to create the model file (preferably equal)<br />
* The person should be using a similar SUMO Toolbox version as was used to create the model file (preferably equal)<br />
<br />
We do not guarantee portability if the above versions differ.<br />
<br />
=== The other person does NOT have the SUMO Toolbox installed ===<br />
<br />
In this case you can use the ''getExpression'' and ''exportToMFile'' (available from v6.0) methods. See below.<br />
<br />
== Model space vs Simulator space ==<br />
<br />
It is important to note the difference between '''Model space''' and '''Simulator space'''. When a data point is in model space, it means its inputs all lie in the range [-1 1]. When the point is in simulator space its inputs lie in the range specified by the [[Simulator configuration]] file.<br />
<br />
Internally the toolbox only works in model space. The toolbox will take care of translating points from simulator space into model space and back (this happens in the SampleManager object). You will note that many methods have a ''XXXinModelSpace'' variant. This just means that the method does exactly the same, except it expects points to be in model space. You should normally not have to care about model space unless you are writing your own extensions to the toolbox. In that case see [[Add Model Type]].<br />
<br />
== Available methods ==<br />
<br />
Once the model is loaded you can invoke a number of methods on it. We list the main ones below. For a full list of available methods, use the Matlab 'methods' command:<br />
<br />
<source lang="matlab"><br />
>>methods(model)<br />
</source><br />
<br />
If you want to understand the structure of the model, i.e., how the model object is built up you can do two things:<br />
<br />
# open the class file for that model (e.g., for an ANNModel object, open the src/matlab/models/@ANNModel/ANNModel.m file)<br />
# use the struct command to convert the model object to a structure. For example:<br />
<br />
<source lang="matlab"><br />
>>str = struct(model)<br />
</source><br />
<br />
<br />
=== guiPlotModel ===<br />
<br />
The easiest way to explore a model is to use the graphical model browser. [[Model Visualization GUI|See here for more information]]<br />
<br />
=== plotModel ===<br />
<source lang="matlab"><br />
>>[figureHandle] = plotModel(model,[outputNumber],[options])<br />
</source><br />
<br />
<code>plotModel</code> will generate an indicative plot of the model surface. To do so, it evaluates the model on a reasonably dense grid of points.<br />
<br />
<code>plotModel</code> optional parameters:<br />
* <code>outputNumber</code>: optional parameter, an integer specifying which output to plot<br />
* <code>options</code>: optional parameter, a struct containing a number of options you can set. To get the default options simply call <code>Model.getPlotDefaults()</code>.<br />
<br />
<br />
To determine which kind of plot is generated, one makes a distinction based on the dimension of the input space:<br />
* '''One dimensional models''' are always plotted in a simple XY line chart. Samples are shown as dots.<br />
* '''Two dimensional models''' are plotted as a Matlab ''mesh'' plot, i.e. a colored surface. The colors are just an indication of height and don't have any further meaning. The samples are plotted as dots, and should (hopefully) approach the surface.<br />
* '''Three dimensional problems''' are plotted using a custom-built [[Slice Plot]].<br />
* '''Four dimensional problems''' are plotted using 3 [[Slice Plot]]s. The leftmost plot fixes the fourth variable at -1, the middle plot at 0 and the rightmost plot at 1 (thus reducing the function to a three dimensional function, making a slice plot possible).<br />
* '''Five dimensional problems''' are plotted using 9 [[Slice Plot]]s. The fourth and fifth variables are fixed at values of -1, 0 and 1. Indicators below the plots show where the variables were fixed.<br />
* '''Higher dimensional problems''': All variables after the fifth are fixed at 0, and plotting proceeds as if the model were five dimensional.<br />
<br />
For plotting purposes, the toolbox handles complex-valued outputs as their modulus (= absolute value = magnitude). These plots are just visual aids for monitoring the modeling process. Phase data can be extracted from the model files.<br />
<br />
=== evaluate ===<br />
<source lang="matlab"><br />
>> values = evaluate(model, samples);<br />
</source><br />
<br />
This evaluates the model on the given samples. The samples should be provided in simulator space. Simulator space is defined by the range in the [[Simulator configuration]]. If no range (minimum and maximum) was specified, the domain is assumed to be [-1,1].<br />
<br />
See also [[Using_a_model#Model_object_interfacing_and_optimization]]<br />
<br />
=== evaluateDerivative ===<br />
<source lang="matlab"><br />
>> values = evaluateDerivative(model, samples, [outputIndex]);<br />
</source><br />
<br />
This approximates the partial derivatives of the model at each given sample. Note that the base class implementation is a very simple approximation. Models can override this function to provide more accurate derivatives (e.g., Kriging does this already). However, in its current form it is already useful.<br />
<br />
=== getSamples ===<br />
<source lang="matlab"><br />
>> samples = getSamples(model);<br />
</source><br />
<br />
Returns the samples that were used to fit the model. The samples are returned in simulator space.<br />
<br />
=== getValues ===<br />
<source lang="matlab"><br />
>> values = getValues(model);<br />
</source><br />
<br />
Returns the values that correspond to the samples from getSamples().<br />
<br />
=== getDescription ===<br />
<source lang="matlab"><br />
>> desc = getDescription(model);<br />
</source><br />
<br />
Returns a string with a user-friendly description of the model.<br />
<br />
=== getExpression ===<br />
<source lang="matlab"><br />
>> desc = getExpression(model,[outputNumber]);<br />
</source><br />
<br />
Returns the symbolic mathematical expression of this model (e.g., 3*x1^2 - 2*x2 +5). Note that not all model types implement this.<br />
<br />
=== construct ===<br />
<source lang="matlab"><br />
>> model = construct(model,samples);<br />
</source><br />
<br />
This will build, train, or fit the model on the given set of data points and return the updated model.<br />
<br />
=== complexity ===<br />
<source lang="matlab"><br />
>> n = complexity(model);<br />
</source><br />
<br />
Returns the number of free parameters in the model. By default this returns the number of datapoints the model was built with but this is overridden by some model types. For example, an ANN model returns the number of weights in the network while a rational model returns the number of coefficients.<br />
<br />
== Model object interfacing and optimization ==<br />
<br />
You may want to use the model as part of a larger Matlab program, or you may simply want to optimize the model. To do this easily, you can create a function handle to the model object as follows (example for the 3D case):<br />
<br />
<code><pre><br />
handle = @(x,y,z) evaluate( model, [x,y,z] );<br />
</pre></code><br />
<br />
Afterwards, you can pass that handle to your optimization procedure, or use it through <code>feval</code>:<br />
<br />
<code><pre><br />
fmincon( handle, ... );<br />
feval( handle, 0, 1, -1 );<br />
</pre></code></div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Using_a_model&diff=5202Using a model2010-08-18T07:20:17Z<p>Dgorissen: /* Available methods */</p>
<hr />
<div>This page explains what you can do with a SUMO generated model.<br />
<br />
== Loading a model from disk ==<br />
<br />
As the SUMO Toolbox builds models, each current best model is stored as a Matlab mat file in the output directory (e.g.: <code>output/Academic_2D_Twice_rep01_run00_2008.05.20_10-27-18/models_out/model_0002.mat</code>). <br />
<br />
In order to load this model from disk and actually use it, do the following:<br />
<br />
* Start Matlab, make sure the SUMO Toolbox is in your path and navigate to the directory where the model file is stored<br />
* Load the model from disk as follows:<br />
** >> <code>modelFile = load('model_0002.mat');</code><br />
** >> <code>model = modelFile.model;</code><br />
<br />
Now the model is available as the variable 'model' in the Matlab workspace.<br />
<br />
== Model portability ==<br />
<br />
This section explains how to exchange and/or export SUMO models.<br />
<br />
=== The other person has the SUMO Toolbox installed ===<br />
<br />
The model 'mat' files can be shared with other people. In order for somebody else to use your saved model the following conditions need to be satisfied:<br />
<br />
* The person has the SUMO Toolbox in their Matlab path<br />
* The person should be using a similar Matlab version (including toolboxes) as was used to create the model file (preferably equal)<br />
* The person should be using a similar SUMO Toolbox version as was used to create the model file (preferably equal)<br />
<br />
We do not guarantee portability if the above versions differ.<br />
<br />
=== The other person does NOT have the SUMO Toolbox installed ===<br />
<br />
In this case you can use the ''getExpression'' and ''exportToMFile'' (available from v6.0) methods. See below.<br />
<br />
== Model space vs Simulator space ==<br />
<br />
It is important to note the difference between '''Model space''' and '''Simulator space'''. When a data point is in model space, it means its inputs all lie in the range [-1 1]. When the point is in simulator space its inputs lie in the range specified by the [[Simulator configuration]] file.<br />
<br />
Internally the toolbox only works in model space. The toolbox will take care of translating points from simulator space into model space and back (this happens in the SampleManager object). You will note that many methods have a ''XXXinModelSpace'' variant. This just means that the method does exactly the same, except it expects points to be in model space. You should normally not have to care about model space unless you are writing your own extensions to the toolbox. In that case see [[Add Model Type]].<br />
<br />
== Available methods ==<br />
<br />
Once the model is loaded you can invoke a number of methods on it. We list the main ones below. For a full list of available methods, use the Matlab 'methods' command:<br />
<br />
<source lang="matlab"><br />
>>methods(model)<br />
</source><br />
<br />
If you want to understand the structure of the model, i.e., how the model object is built up you can do two things:<br />
<br />
# open the class file for that model (e.g., for an ANNModel object, open the src/matlab/models/@ANNModel/ANNModel.m file)<br />
# use the struct command to convert the model object to a structure. For example:<br />
<br />
<source lang="matlab"><br />
>>str = struct(model)<br />
</source><br />
<br />
<br />
=== guiPlotModel ===<br />
<br />
The easiest way to explore a model is to use the graphical model browser. [[Model Visualization GUI|See here for more information]]<br />
<br />
=== plotModel ===<br />
<code><pre><br />
>>[figureHandle] = plotModel(model,[outputNumber],[options])<br />
</pre></code><br />
<br />
<code>plotModel</code> will generate an indicative plot of the model surface. To do so, it evaluates the model on a reasonably dense grid of points.<br />
<br />
<code>plotModel</code> optional parameters:<br />
* <code>outputNumber</code>: optional parameter, an integer specifying which output to plot<br />
* <code>options</code>: optional parameter, a struct containing a number of options you can set. To get the default options simply call <code>Model.getPlotDefaults()</code>.<br />
<br />
<br />
To determine which kind of plot is generated, one makes a distinction based on the dimension of the input space:<br />
* '''One dimensional models''' are always plotted in a simple XY line chart. Samples are shown as dots.<br />
* '''Two dimensional models''' are plotted as a Matlab ''mesh'' plot, i.e. a colored surface. The colors are just an indication of height and don't have any further meaning. The samples are plotted as dots, and should (hopefully) approach the surface.<br />
* '''Three dimensional problems''' are plotted using a custom-built [[Slice Plot]].<br />
* '''Four dimensional problems''' are plotted using 3 [[Slice Plot]]s. The leftmost plot fixes the fourth variable at -1, the middle plot at 0 and the rightmost plot at 1 (thus reducing the function to a three dimensional function, making a slice plot possible).<br />
* '''Five dimensional problems''' are plotted using 9 [[Slice Plot]]s. The fourth and fifth variables are fixed at values of -1, 0 and 1. Indicators below the plots show where the variables were fixed.<br />
* '''Higher dimensional problems''': All variables after the fifth are fixed at 0, and plotting proceeds as if the model were five dimensional.<br />
<br />
For plotting purposes, the toolbox handles complex-valued outputs as their modulus (= absolute value = magnitude). These plots are just visual aids for monitoring the modeling process. Phase data can be extracted from the model files.<br />
<br />
=== evaluate ===<br />
<code><pre><br />
>> values = evaluate(model, samples);<br />
</pre></code><br />
<br />
This evaluates the model on the given samples. The samples should be provided in simulator space. Simulator space is defined by the range in the [[Simulator configuration]]. If no range (minimum and maximum) was specified, the domain is assumed to be [-1,1].<br />
<br />
See also [[Using_a_model#Model_object_interfacing_and_optimization]]<br />
<br />
=== evaluateDerivative ===<br />
<code><pre><br />
>> values = evaluateDerivative(model, samples, [outputIndex]);<br />
</pre></code><br />
<br />
This approximates the partial derivatives of the model at each given sample. Note that the base class implementation is a very simple approximation. Models can override this function to provide more accurate derivatives (e.g., Kriging does this already). However, in its current form it is already useful.<br />
<br />
=== getSamples ===<br />
<code><pre><br />
>> samples = getSamples(model);<br />
</pre></code><br />
<br />
Returns the samples that were used to fit the model. The samples are returned in simulator space.<br />
<br />
=== getValues ===<br />
<code><pre><br />
>> values = getValues(model);<br />
</pre></code><br />
<br />
Returns the values that correspond to the samples from getSamples().<br />
<br />
=== getDescription ===<br />
<code><pre><br />
>> desc = getDescription(model);<br />
</pre></code><br />
<br />
Returns a string with a user friendly description of the model.<br />
<br />
=== getExpression ===<br />
<code><pre><br />
>> desc = getExpression(model,[outputNumber]);<br />
</pre></code><br />
<br />
Returns the symbolic mathematical expression of this model (e.g., 3*x1^2 - 2*x2 +5). Note that not all model types implement this.<br />
<br />
=== construct ===<br />
<code><pre><br />
>> model = construct(model,samples);<br />
</pre></code><br />
<br />
This will build/train/fit.. the model on the given set of data points and return the updated model.<br />
<br />
=== complexity ===<br />
<code><pre><br />
>> n = complexity(model);<br />
</pre></code><br />
<br />
Returns the number of free parameters in the model. By default this returns the number of datapoints the model was built with but this is overridden by some model types. For example, an ANN model returns the number of weights in the network while a rational model returns the number of coefficients.<br />
<br />
== Model object interfacing and optimization ==<br />
<br />
You may want to use the model as part of a larger Matlab program, or you may simply want to optimizer the model. To easily do this you can create a function handle to the model object. You can do this as follows (example for the 3D case):<br />
<br />
<code><pre><br />
handle = @(x,y,z) evaluate( model, [x,y,z] );<br />
</pre></code><br />
<br />
Afterwards, you can pass that handle to your optimization procedure, or use it through <code>feval</code>:<br />
<br />
<code><pre><br />
fmincon( handle, ... );<br />
feval( handle, 0, 1, -1 );<br />
</pre></code></div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Using_a_model&diff=5201Using a model2010-08-18T07:16:31Z<p>Dgorissen: /* Available methods */</p>
<hr />
<div>This page explains what you can do with a SUMO generated model.<br />
<br />
== Loading a model from disk ==<br />
<br />
As the SUMO Toolbox builds models, each current best model is stored as a Matlab mat file in the output directory (e.g.: <code>output/Academic_2D_Twice_rep01_run00_2008.05.20_10-27-18/models_out/model_0002.mat</code>). <br />
<br />
In order to load this model from disk and actually use it, do the following:<br />
<br />
* Start Matlab, make sure the SUMO Toolbox is in your path and navigate to the directory where the model file is stored<br />
* Load the model from disk as follows:<br />
** >> <code>modelFile = load('model_0002.mat');</code><br />
** >> <code>model = modelFile.model;</code><br />
<br />
Now the model is available as the variable 'model' in the Matlab workspace.<br />
<br />
== Model portability ==<br />
<br />
This section explains how to exchange and export SUMO models.<br />
<br />
=== The other person has the SUMO Toolbox installed ===<br />
<br />
The model 'mat' files can be shared with other people. In order for somebody else to use your saved model the following conditions need to be satisfied:<br />
<br />
* The person must have the SUMO Toolbox in their Matlab path<br />
* The person should use a Matlab version (including toolboxes) similar, and preferably identical, to the one used to create the model file<br />
* The person should use a SUMO Toolbox version similar, and preferably identical, to the one used to create the model file<br />
<br />
We do not guarantee portability if the above versions differ.<br />
<br />
=== The other person does NOT have the SUMO Toolbox installed ===<br />
<br />
In this case you can use the ''getExpression'' and ''exportToMFile'' (available from v6.0) methods. See below.<br />
<br />
== Model space vs Simulator space ==<br />
<br />
It is important to note the difference between '''Model space''' and '''Simulator space'''. When a data point is in model space, its inputs all lie in the range [-1, 1]. When the point is in simulator space, its inputs lie in the range specified by the [[Simulator configuration]] file.<br />
<br />
Internally the toolbox only works in model space. The toolbox will take care of translating points from simulator space into model space and back (this happens in the SampleManager object). You will note that many methods have a ''XXXinModelSpace'' variant. This just means that the method does exactly the same, except it expects points to be in model space. You should normally not have to care about model space unless you are writing your own extensions to the toolbox. In that case see [[Add Model Type]].<br />
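The translation between the two spaces is just a per-dimension linear rescaling. The sketch below (in Python, for illustration only; the actual implementation lives in the SampleManager and may differ) shows the idea:<br />

```python
def simulator_to_model(point, lo, hi):
    """Rescale one point (list of floats) from simulator space [lo_i, hi_i]
    per dimension to model space [-1, 1]."""
    return [2.0 * (x - l) / (h - l) - 1.0 for x, l, h in zip(point, lo, hi)]

def model_to_simulator(point, lo, hi):
    """Inverse mapping: model space [-1, 1] back to simulator space [lo_i, hi_i]."""
    return [(x + 1.0) / 2.0 * (h - l) + l for x, l, h in zip(point, lo, hi)]
```

For example, with a first input ranging over [0, 10] and a second over [-10, 10], the simulator-space point (5, 0) maps to (0, 0) in model space, and the inverse mapping recovers it.<br />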
<br />
== Available methods ==<br />
<br />
Once the model is loaded you can invoke a number of methods on it. We list the main ones below. For a full list of available methods just use the Matlab <code>methods</code> command:<br />
<br />
<source lang="matlab"><br />
>>methods(model)<br />
</source><br />
<br />
<br />
<br />
=== guiPlotModel ===<br />
<br />
The easiest way to explore a model is to use the graphical model browser. [[Model Visualization GUI|See here for more information]]<br />
<br />
=== plotModel ===<br />
<code><pre><br />
>>[figureHandle] = plotModel(model,[outputNumber],[options])<br />
</pre></code><br />
<br />
<code>plotModel</code> will generate an indicative plot of the model surface. To do so, it evaluates the model on a reasonably dense grid of points.<br />
<br />
<code>plotModel</code> optional parameters:<br />
* <code>outputNumber</code>: optional parameter, an integer specifying which output to plot<br />
* <code>options</code>: optional parameter, a struct containing a number of options you can set. To get the default options simply call <code>Model.getPlotDefaults()</code>.<br />
<br />
<br />
To determine which kind of plot is generated, one makes a distinction based on the dimension of the input space:<br />
* '''One dimensional models''' are always plotted in a simple XY line chart. Samples are shown as dots.<br />
* '''Two dimensional models''' are plotted as a Matlab ''mesh'' plot, i.e. a colored surface. The colors are just an indication of height and don't have any further meaning. The samples are plotted as dots, and should (hopefully) approach the surface.<br />
* '''Three dimensional problems''' are plotted using a custom-built [[Slice Plot]].<br />
* '''Four dimensional problems''' are plotted using 3 [[Slice Plot]]s. The leftmost plot fixes the fourth variable at -1, the middle plot at 0 and the rightmost plot at 1 (thus reducing the function to a three dimensional function, making a slice plot possible).<br />
* '''Five dimensional problems''' are plotted using 9 [[Slice Plot]]s. The fourth and fifth variables are fixed at values of -1, 0 and 1. Indicators below the plots show where the variables were fixed.<br />
* '''Higher dimensional problems''': All variables after the fifth are fixed at 0, and plotting proceeds as if the model were five dimensional.<br />
<br />
For plotting purposes, the toolbox reduces complex valued outputs to their modulus (= absolute value = magnitude). These plots are just visual aids for monitoring the modeling process; phase data can be extracted from the model files.<br />
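The slicing scheme above boils down to: variables four and five each take the fixed values -1, 0 and 1, and every variable beyond the fifth is pinned at 0. A small illustrative sketch of this enumeration (Python, not toolbox code):<br />

```python
from itertools import product

def slice_grid(dimension):
    """Enumerate the fixed values of the extra input variables for the slice
    plots described above: variables 4 and 5 are sliced at -1, 0 and 1, and
    any variable beyond the fifth is pinned at 0."""
    sliced = max(0, min(dimension, 5) - 3)   # variables that get -1/0/1 slices
    pinned = max(0, dimension - 5)           # variables beyond the fifth, fixed at 0
    return [combo + (0.0,) * pinned
            for combo in product((-1.0, 0.0, 1.0), repeat=sliced)]
```

A 4-dimensional model thus yields 3 slice plots, a 5-dimensional one 9, and anything higher still yields 9 with the remaining variables held at 0.<br />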
<br />
=== evaluate ===<br />
<code><pre><br />
>> values = evaluate(model, samples);<br />
</pre></code><br />
<br />
This evaluates the model on the given samples. The samples should be provided in simulator space. Simulator space is defined by the range in the [[Simulator configuration]]. If no range (minimum and maximum) was specified, the domain is assumed to be [-1,1].<br />
<br />
See also [[Using_a_model#Model_object_interfacing_and_optimization]]<br />
<br />
=== evaluateDerivative ===<br />
<code><pre><br />
>> values = evaluateDerivative(model, samples, [outputIndex]);<br />
</pre></code><br />
<br />
This approximates the partial derivatives of the model at each given sample. Note that the base class implementation is a very simple approximation. Models can override this function to provide more accurate derivatives (e.g., Kriging does this already). However, in its current form it is already useful.<br />
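A simple approximation of the kind described can be sketched with central finite differences; this is an illustration of the idea in Python, not the toolbox's actual base-class code:<br />

```python
def finite_difference_gradient(f, x, h=1e-6):
    """Approximate the partial derivatives of f at point x (a list of floats)
    using central differences: df/dx_i ~= (f(x + h*e_i) - f(x - h*e_i)) / (2h)."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h  # step forward along dimension i
        xm[i] -= h  # step backward along dimension i
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad
```

Model types with analytic derivatives (such as Kriging) can return exact values instead of this numerical estimate.<br />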
<br />
=== getSamples ===<br />
<code><pre><br />
>> samples = getSamples(model);<br />
</pre></code><br />
<br />
Returns the samples that were used to fit the model. The samples are returned in simulator space.<br />
<br />
=== getValues ===<br />
<code><pre><br />
>> values = getValues(model);<br />
</pre></code><br />
<br />
Returns the values that correspond to the samples from getSamples().<br />
<br />
=== getDescription ===<br />
<code><pre><br />
>> desc = getDescription(model);<br />
</pre></code><br />
<br />
Returns a string with a user friendly description of the model.<br />
<br />
=== getExpression ===<br />
<code><pre><br />
>> desc = getExpression(model,[outputNumber]);<br />
</pre></code><br />
<br />
Returns the symbolic mathematical expression of this model (e.g., 3*x1^2 - 2*x2 +5). Note that not all model types implement this.<br />
<br />
=== construct ===<br />
<code><pre><br />
>> model = construct(model,samples);<br />
</pre></code><br />
<br />
This will build (train, fit) the model on the given set of data points and return the updated model.<br />
<br />
=== complexity ===<br />
<code><pre><br />
>> n = complexity(model);<br />
</pre></code><br />
<br />
Returns the number of free parameters in the model. By default this returns the number of data points the model was built with, but this is overridden by some model types. For example, an ANN model returns the number of weights in the network, while a rational model returns the number of coefficients.<br />
<br />
== Model object interfacing and optimization ==<br />
<br />
You may want to use the model as part of a larger Matlab program, or you may simply want to optimize the model. The easiest way to do this is to create a function handle to the model object, as follows (example for the 3D case):<br />
<br />
<code><pre><br />
handle = @(x,y,z) evaluate( model, [x,y,z] );<br />
</pre></code><br />
<br />
Afterwards, you can pass that handle to your optimization procedure, or use it through <code>feval</code>:<br />
<br />
<code><pre><br />
fmincon( handle, ... );<br />
feval( handle, 0, 1, -1 );<br />
</pre></code></div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Citing&diff=5174Citing2010-08-10T08:48:07Z<p>Dgorissen: </p>
<hr />
<div>When reporting on results obtained with the SUMO Toolbox please refer to:<br />
<br />
<br />
'''A Surrogate Modeling and Adaptive Sampling Toolbox for Computer Based Design'''<br />
<br>D. Gorissen, K. Crombecq, I. Couckuyt, T. Dhaene, P. Demeester,<br />
<br>Journal of Machine Learning Research,<br />
<br>Vol. 11, pp. 2051−2055, July 2010.<br />
<br>[http://www.jmlr.org/papers/volume11/gorissen10a/gorissen10a.pdf JMLR link]<br />
<br />
<br />
For a list of SUMO related publications see [http://www.sumo.intec.ugent.be/?q=publications The SUMO-lab home page]. <br />
For publications from other authors/institutions that are similar in idea and scope to the SUMO project, see [[Related publications]].</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Whats_new&diff=5169Whats new2010-08-02T08:04:38Z<p>Dgorissen: </p>
<hr />
<div>This page gives a high level overview of the major changes in each toolbox version. For the detailed list of changes please refer to the [[Changelog]] page. For a list of features in the current version [[About#Features|see the about page]].<br />
<br />
== 7.0.2 - 1 August 2010 ==<br />
<br />
A minor cosmetic update to correspond with the upcoming JMLR Software publication.<br />
<br />
== 7.0.1 - 15 January 2010 ==<br />
<br />
This release fixes a couple of known bugs, the most important one being a clustering related bug in the LOLA sample selection algorithm. All users are strongly encouraged to upgrade.<br />
<br />
== 7.0 - 29 January 2010 ==<br />
<br />
The biggest change of this release is the move to a new license model. From now on the SUMO Toolbox will be available under an '''open source''' license for non-commercial use. This means there is no longer a time or user limit and there is no need for activation files. Details can be found in the [[License terms]].<br />
<br />
Besides this the code has seen some improvements and cleanups, most notably the Sample Evaluator and (Blind) Kriging components.<br />
<br />
== 6.2.1 - 19 October 2009 ==<br />
<br />
A bug fix release, all users are strongly requested to upgrade.<br />
<br />
== 6.2 - 6 October 2009 ==<br />
<br />
=== Sample Selection infrastructure ===<br />
<br />
The sample selection infrastructure has been dramatically refactored into a highly flexible and pluggable system. Different sample selection criteria can now be combined in a variety of ways and the road has been opened towards dynamic sample selection criteria.<br />
<br />
The LOLA-Voronoi algorithm has also seen some improvement with the addition of support for input constraints, sampling multiple outputs simultaneously, and improved support for dealing with auto-sampled inputs.<br />
<br />
Sample points are now also assigned a priority by the sampling algorithm which is reflected in the order they are evaluated. Finally, the Latin Hypercube design has been much improved. It will now attempt to download known optimal designs automatically before attempting to generate one itself.<br />
<br />
=== Model building infrastructure ===<br />
<br />
The two main changes here are, firstly, the addition of an "ann" modelbuilder besides the existing "anngenetic" one. The new one runs faster and is more configurable, while the quality of the models is roughly the same. <br />
<br />
Secondly, the (Blind) Kriging models have been much improved. A new implementation was added that replaces (and outperforms) the existing DACE Toolbox plugin. Support has also been added for automatically selecting the Kriging correlation functions.<br />
<br />
=== Other changes ===<br />
<br />
Other noteworthy changes include: the addition of an interpolation model type, cleanups and fixes in the error functions, improved stability in LRMMeasure, faster measures in a multi-output setting, and more informative help texts. Additionally the Model Browser and Profiler GUIs have seen some improvements in usability and functionality.<br />
<br />
At the same time the code has seen more cleanups (it is now fully Classdef compliant) and the use of the parallel computing toolbox (if available) has been improved.<br />
<br />
As always, a detailed list of changes can be found in the [[Changelog]].<br />
<br />
== 6.1.1 - 17 April 2009 ==<br />
<br />
This is a bugfix release that contains some cleanups and fixes to the [[Known bugs]] of version 6.1<br />
<br />
== 6.1 - 16 February 2009 ==<br />
<br />
The main improvements of 6.1 over 6.0.1 are stability, robustness, speed, and improved interfacing. However, a number of major new features have been added as well.<br />
<br />
=== Multi-Objective Modeling ===<br />
<br />
Full [[Multi-Objective Modeling|multi-objective]] support when optimizing the model parameters. This allows an engineer to enforce multiple criteria on the models produced (instead of just a single accuracy measure). This will also allow the efficient generation of models with multiple outputs (already possible through the combineOutputs option but not yet in a multi-objective setting). Together with the automatic model type selection algorithm (heterogenetic) this allows the automatic selection of the best model type per output. See [[Multi-Objective Modeling]] for more information and usage.<br />
<br />
=== Smoothness Measure ===<br />
<br />
A new measure: Linear Reference Model (LRM) has been added. This measure is best used together with other measures and helps to enforce a smooth model surface.<br />
<br />
=== Parallel Computing ===<br />
<br />
Added experimental support for the Matlab Parallel Computing Toolbox (local scheduler only). This means that when the parallelMode option in ContextConfig is switched on, model construction will make use of all available cores/CPUs in order to build models in parallel. This can result in some significant speedups.<br />
<br />
=== General Modeling ===<br />
<br />
The ''heterogenetic'' model builder for automatic model type selection has seen many cleanups and the code has been improved. Now there should be no more manual hacks in order to use it. The rational models now support all available optimization algorithms for order selection and two new model types have been added: Blind Kriging and Gaussian Process Models. An Efficient Global Optimization (EGO) modelbuilder has also been added. This means that a nested kriging model is used internally to predict which model parameters (e.g., of an SVM model) will result in the most accurate fit. All models can now also be queried for derivatives at any point in their domain (regardless of the model type).<br />
<br />
=== Code improvements ===<br />
<br />
From now on Matlab 2008a or later will be required to run the toolbox (see [[System requirements]]). The reason is that most of the modeling code has been ported to Matlab's new [[OO_Programming_in_Matlab|Object Orientation]] implementation. The result is that the modeling code has become much cleaner and much less prone to bugs. The interfaces have become more well-defined and it should be much easier to incorporate your own model type or hyperparameter optimization algorithm.<br />
<br />
Note also that the Gradient Sample Selection algorithm has been renamed to LOLA.<br />
<br />
=== General Improvements ===<br />
<br />
In general, many bugs have been fixed, features and error reporting have been improved, and performance has been enhanced. Also note that the default error function is now the [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4107991 Bayesian Error Estimation Quotient (BEEQ)]. Trivial dependencies on the Statistics Toolbox have been removed.<br />
<br />
== 6.0.1 - Released 23 August 2008 ==<br />
<br />
* This is a bugfix release that fixes a few things in the 6.0 release (including a crash on startup in some cases, see [[Known bugs]])<br />
<br />
== 6.0 - Released 6 August 2008 ==<br />
<br />
Originally this was supposed to be 5.1 but after many fixes and added features we decided to promote it to 6.0. Some of the things that can be expected for 6.0 are:<br />
<br />
* Some important modeling related bugs have been fixed leading to improved model accuracy convergence<br />
* A nice graphical user interface (GUI) for loading models, browsing through dimensions, plotting errors, generating movies, ... ([[Model Visualization GUI|See here for more information]])<br />
* Introduction of project directories. All files belonging to a particular problem (simulation code, datasets, XML files, documentation, ...) are now grouped together in a project directory instead of being spread out over 3 different places.<br />
* Support for autosampling, one or more dimensions can be ignored during adaptive sampling. This is useful if the simulation code can generate samples for that dimension itself (e.g., frequency samples in the case of a frequency domain simulator in Electro-Magnetism)<br />
* Models now remember axis labels, measure scores, and output names<br />
* An export function has been added to export models to a standalone Matlab script (.m file). Not supported for all model types yet.<br />
* Proper support for Matlab R2008<br />
* A simple new model type "PolynomialModel" that builds polynomial models with a fixed (user defined) order<br />
* Note that in some cases loading models generated by older toolbox versions will not work and give an error<br />
<br />
And of course countless bugfixes, performance, and feature enhancements. '''Upgrading is strongly advised'''.<br />
<br />
== 5.0 - Released 8 April 2008 ==<br />
<br />
=== SUMO Toolbox ===<br />
<br />
In April 2008, the first public release of the '''SUrrogate MOdeling (SUMO) Toolbox''' occurred.<br />
<br />
=== Sampling related changes ===<br />
<br />
The sample selection and evaluation backends have seen some major improvements. <br />
<br />
The number of samples selected each iteration no longer needs to be chosen a priori but is determined on the fly, based on the time needed for modeling, the average length of the past 'n' simulations and the number of compute nodes (or CPU cores) available. Of course, a user defined upper bound can still be specified. It is now also possible to evaluate data points in batches instead of always one-by-one. This is useful if, for example, there is a considerable overhead for submitting one point.<br />
<br />
In addition, data points can be assigned priorities by the sample selection algorithm. These priorities are then reflected in the scheduling decisions made by the sample evaluator. It now also becomes possible to add different priority management policies. For example, one could require that 'interest' in sample points be renewed, else their priorities will degrade with time.<br />
<br />
A new sample selection algorithm has been added that can use any function as a criterion for where to select new samples. This function can use all the information the surrogate provides to calculate how interesting a certain sample is. Internally, a numeric global optimizer is applied to the criterion to determine the next sample point(s). Several criteria are implemented, mostly for global optimization. For instance, the 'expected improvement' criterion is very effective for global optimization as it balances optimization itself against refining the surrogate.<br />
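For reference, the expected improvement criterion for a minimization problem is commonly written as EI(x) = (f_min - mu(x)) * Phi(z) + sigma(x) * phi(z), with z = (f_min - mu(x)) / sigma(x), where mu and sigma are the surrogate's predicted mean and standard deviation at x. A minimal Python sketch of this standard formula (an illustration, not the toolbox's implementation):<br />

```python
import math

def expected_improvement(mu, sigma, f_min):
    """Expected improvement over the best observed value f_min, given the
    surrogate's predicted mean mu and standard deviation sigma at a point.
    EI = (f_min - mu) * Phi(z) + sigma * phi(z), with z = (f_min - mu) / sigma."""
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (f_min - mu) * cdf + sigma * pdf
```

The first term rewards points predicted to improve on f_min (exploitation); the second rewards points where the surrogate is uncertain (exploration), which is exactly the balancing act described above.<br />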
<br />
Finally the handling of failed or 'lost' data points has become much more robust. Pending points are automatically removed if their evaluation time exceeds a multiple of the average evaluation time. Failed points can also be re-submitted a number of times before being regarded as permanently failed.<br />
<br />
=== Modeling related changes ===<br />
<br />
The modeling code has seen some much needed cleanups. Adding new model types and improving the existing ones is now much more straightforward.<br />
<br />
Since the default Matlab neural network model implementation is quite slow, two additional implementations were added based on [http://fann.sf.net FANN] and [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID], which are much faster. In addition, the NNSYSID implementation also supports pruning. However, the Matlab implementation still outperforms both of them accuracy-wise.<br />
<br />
An intelligent seeding strategy has been enabled. The starting point/population of each new model parameter optimization run is now chosen intelligently, so that the model parameter space is searched more efficiently. This leads to better models, faster.<br />
<br />
=== Optimization related changes ===<br />
<br />
* The Optimization framework was removed due to [[FAQ#What_about_surrogate_driven_optimization.3F|several reasons]].<br />
* Added an [[Optimizer|optimizer]] class hierarchy for solving subproblems transparently.<br />
* Added several criteria for optimization, available through the [[Config:SampleSelector#isc|InfillSamplingCriterion]].<br />
<br />
=== Various changes ===<br />
<br />
The default 'error function' is now the root relative square error (= a global relative error) instead of the absolute root mean square error. <br />
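<br />
The root relative square error normalizes the squared residuals by the spread of the true data around its mean, giving a scale-independent global error. A minimal sketch of the textbook definition (not the toolbox's own implementation):<br />
<br />
```python
import math

def rrse(y_true, y_pred):
    """Root relative square error: the RMSE of the model relative to the
    RMSE of the trivial predictor that always outputs the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)  # zero only for constant data
    return math.sqrt(num / den)
```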
<br />
The memory usage has been drastically reduced when performing many runs with multiple datasets (datasets are loaded only once).<br />
<br />
The default settings have been harmonized and much improved. For example, the SVM parameter space is now searched in log10 instead of natural log space. The MinMax measure is now also enabled by default if you do not specify any other measure. This means that if you specify minimum and maximum bounds in the simulator xml file, models which do not respect these bounds are penalized.<br />
<br />
Finally this release has seen countless cleanups, bug fixes and feature enhancements.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Changelog&diff=5168Changelog2010-08-02T08:03:56Z<p>Dgorissen: </p>
<hr />
<div>Below you will find the detailed list of changes in every new release. For a more high level overview see the [[Whats new]] page.<br />
<br />
== 7.0.2 - 1 August 2010 ==<br />
<br />
* Minor cosmetic updates<br />
<br />
== 7.0.1 - 15 June 2010 ==<br />
<br />
* Bugfix release<br />
<br />
== 7.0 - 29 January 2010 ==<br />
<br />
* Move to a dual license model, with an open source licence (AGPLv3) for non-commercial use, see [[License terms]]<br />
* Experimental support for classification and 3D geometric modeling problems (see the 2 new demos)<br />
* Thorough cleanup of SampleEvaluator related classes and package structure<br />
* Improved speed and stability in (Blind) Kriging models and fixed the correlation function derivatives.<br />
* Vastly improved the utilization of compute nodes if a distributed sample evaluator is used that interfaces with a cluster or grid<br />
* Support for plotting the prediction uncertainty in the model browser GUI <br />
* Support for quasi random sequences as initial design<br />
<br />
== 6.2.1 - 19 October 2009 ==<br />
<br />
* This release fixes a number of bugs from 6.2. All users are strongly requested to upgrade.<br />
<br />
== 6.2 - 6 October 2009 ==<br />
<br />
* A new neural network modelbuilder "ann". This is a lot faster than the existing "anngenetic" and the quality of the models is roughly the same<br />
* The sample selection infrastructure is now much more powerful: sample selection criteria can be combined much more flexibly. This opens the way to dynamic variation of sampling criteria.<br />
* Support for Input constraints / multiple output sampling in the LOLA-Voronoi sample selection algorithm<br />
* Support for auto-sampled inputs (e.g., frequency in an EM context) in LOLA-Voronoi. This is useful if a particular input is already sampled by your simulator.<br />
* Automatic filtering of samples close to each other in CombinedSampleSelector<br />
* Support for TriScatteredInterp in InterpolationModel when it is available (Matlab version 2009a and later)<br />
* Sample selectors that support it (for example: LOLA-Voronoi) now assign priorities to new samples, so that samples are submitted and evaluated in order of importance.<br />
* Support for pre-calculated Latin Hypercube Designs, these will be automatically downloaded and used where possible and will improve performance<br />
* The Blind Kriging models have been improved and can now also be used as ordinary Kriging models. Since these models are superior to the existing DACE Toolbox models, the DACE Toolbox backend has been removed.<br />
* The EGOModelBuilder (do model parameter optimization using the EGO algorithm) now uses a nested blind kriging model instead of one based on the DACE Toolbox. This allows for better accuracy<br />
* The Kriging correlation functions can now be chosen automatically (instead of only the correlation parameters)<br />
* Support for multiobjective optimization in the EGO framework (extended version of probability of improvement)<br />
* DelaunaySampleSelector and OptimizeCriterion support the same set of criteria<br />
* EGO Improvement criteria can now be used together with DACEModel, RBFModel, and SVMModel (LS-SVM backend only)<br />
* Added a model type and builder that does linear/cubic/nearest neighbour interpolation<br />
* All error functions and measures now consistently deal with complex valued data and multiple output models<br />
* Various improvements in the Model Info GUI as part of the Model browser tool<br />
* Improved stability in LRMMeasure, a behavioral complexity metric to help ensure parsimonious models<br />
* The profiler GUI has been updated and improved, and support for textual profilers has been added.<br />
* Improved performance when using Measures, especially for models with multiple outputs.<br />
* Improved management of the best model trace, also in pareto mode<br />
* Removed the debug output when using (LS-)SVM models and added compiled mex files for Windows<br />
* Ported the remaining classes to Matlab's classdef format<br />
* Increased use of the parallel computing toolbox (if available) in order to speed up modeling<br />
* Improved the Matlab file headers so the help text is more informative (always includes at least the signature)<br />
* Support for plotting the model prediction uncertainty in the model browser (only for 1D plots and not supported by all model types)<br />
* Added support for so-called "reference by id" on every level of the config. If a tag of a particular type is defined on top-level with an id, it can be referenced everywhere else, instead of copying it entirely. See rationalPoleSupression sample selector and patternsearch Optimizer, for example.<br />
* EmptyModelBuilder added - in case you just want to use the sequential design facilities of the toolbox, but not its models.<br />
* Various cleanups and bugfixes<br />
<br />
== 6.1.1 - 17 April 2009 ==<br />
<br />
* Various cleanups and bugfixes (see [[Known bugs]] for 6.1)<br />
<br />
== 6.1 - 16 February 2009 ==<br />
<br />
* The default error function is now the Bayesian Error Estimation Quotient (BEEQ)<br />
* Full support for multi-objective model generation, multiple measures can now be enforced simultaneously. This can also be applied to generating models with multiple outputs (combineOutputs = true). Together with the automatic model type selection algorithm (heterogenetic) this allows the automatic selection of the best model type per output.<br />
* The model browser GUI now supports QQ plots<br />
* The Gradient Sample Selection Algorithm has been renamed to the Local Linear Sample Selector (LOLASampleSelector)<br />
* The modelbuilders have been refactored and some removed. This is a result of the optimizer hierarchy being cleaned up. Adding a new model parameter optimization routine should now be more straightforward.<br />
* The interface classes have been renamed to factories as this is more correct. All implementations have been ported to Matlab's new classdef format and the inheritance hierarchy has been cleaned up. It should now be significantly easier to add support for new approximation types.<br />
* The ModelInterfaces are now known as ModelFactories, which is more correct. Note that the XML tag names have been changed as well.<br />
* The Model class hierarchy has been converted to the new Classdef format. This means that models generated with previous versions of the toolbox will no longer be loadable in this version.<br />
* The heterogenetic model builder for automatic model type selection has been cleaned up and made more robust.<br />
* Rational models now support all available modelbuilders. This means that order selection can be done by PSO DIRECT, Simulated Annealing, ... instead of just GA and Sequential.<br />
* A new optimizer has been added (it can also be used as a model builder): Differential Evolution<br />
* Added a Blind Kriging model type implementation as a backend of KrigingModel<br />
* Addition of an EGO model builder. This allows optimization of the model parameters using the well known Efficient Global Optimization (EGO) algorithm. In essence this uses a nested Kriging Model to predict which parameters should be used to build the next model.<br />
* Trivial dependencies on the Statistics Toolbox have been removed<br />
* Added a new smoothness measure (LRMMeasure) that helps to ensure smooth models and reduce erratic bumps. It works best when combined with other Measures (such as SampleError for ANN models) <br />
* Models now have a simple evaluateDerivative() method that allows one to easily get gradient information. The base class implementation is very simple but works. Models can override this method to get more efficient implementations.<br />
* Added experimental support for the Matlab Parallel Computing Toolbox (local scheduler only). This means that when the parallelMode option in ContextConfig is switched on, model construction will make use of all available cores/CPUs.<br />
* Many speed improvements, some quite significant.<br />
* Various cleanups and bugfixes<br />
<br />
== 6.0.1 - Released 23 August 2008 ==<br />
<br />
* Fixed a number of (minor) bugs in the 6.0 release<br />
<br />
== 6.0 - Released 6 August 2008 ==<br />
<br />
* Many important bugs have been fixed that could have resulted in sub-optimal models<br />
* Addition of a Model Browser GUI, this allows you to easily 'walk' through multi-dimensional models<br />
* Moved the InitialDesign tag outside of the SUMO tag<br />
* Some speed improvements<br />
* Removed support for dummy inputs<br />
* Measure scores and input/output names are saved inside the models, allowing for more usable plots<br />
* Added the project directory concept, each example is now self contained in its own directory<br />
* #simulatorname# can now be used in the run name, it will get replaced by the real simulator name<br />
* Input dimensions can be ignored during sampling if the simulator samples them for you. This is useful in EM applications for example where frequency points can be cheap.<br />
* Logging framework revamped, logs can now be saved on a per run basis<br />
* The global score calculation has changed! It is now a weighted sum of all individual measures (the weights are configurable but default to 1).<br />
* Added a simple polynomial model where the orders can be chosen manually<br />
* Countless cleanups, minor bugfixes and feature enhancements<br />
<br />
== 5.0 - Released 8 April 2008 ==<br />
<br />
* In April 2008, the first public release of the '''Surrogate Modeling (SUMO) Toolbox''' (v5.0) occurred. <br />
* A major new release with countless fixes, improvements, new sampling and modeling algorithms, and much more.<br />
<br />
List of changes:<br />
<br />
* Fixed the 'Known bugs' for v4.2 (see Wiki)<br />
* data points now have priorities (assigned by the sample selectors)<br />
* Vastly reworked and improved the sample evaluator framework<br />
** robust handling of failed or 'lost' data points<br />
** pluggable input queue infrastructure to make advanced scheduling policies possible<br />
* The number of samples to select each iteration is now chosen dynamically, based on the time needed for modeling, the length of one simulation, the number of compute nodes available, ... A user specified upper bound can still be specified, of course.<br />
* Model plots are now in the original space instead of the normalized ([-1 1]) space<br />
* The default error function is now the root relative square error (= a global relative error)<br />
* Intelligent seeding of each new model parameter optimization iteration. This means the model parameter space is searched much more efficiently and completely<br />
* Added a fast Neural Network Modeler based on FANN (http://fann.sf.net)<br />
* Added a Neural Network Modeler based on NNSYSID (http://www.iau.dtu.dk/research/control/nnsysid.html)<br />
* The LS-SVM model type has been merged with the SVM model type. The SVM model now supports three backends: libSVM, SVMlight, and lssvm<br />
* Added a SampleSelector using infill sampling criteria (ISC).<br />
** The expected improvement from EGO/superEGO is provided among others. (only usable with Kriging and RBF)<br />
* More robust handling of SSH sessions when running simulators on a remote cluster<br />
* The TestSamples measure has been renamed to ValidationSet<br />
* The Polynomial model type has been renamed to the more apt Rational model<br />
* The grid and voronoi sample selectors have been renamed to Error and Density respectively<br />
* Drastically reduced memory usage when performing many runs with multiple datasets (datasets are cached)<br />
* Added utility functions for easily summarizing profiler data from a large number of runs<br />
* Lots of speed improvements in the gradient sample selector<br />
* The default settings have been harmonized and much improved<br />
* The (LS)SVM parameter space is now searched in log10 instead of ln space<br />
* Added a TestMinimum measure <br />
** compares the minimum of the surrogate model against a predefined value (for instance a known minimum)<br />
* Added a MinimumProfiler<br />
** tracks the minimum of the surrogate model versus the number of iterations<br />
* Movie creation now works on all supported platforms<br />
* Added an optimizer class hierarchy for solving subproblems transparently<br />
* Cleaned up the structure of all the model classes so they no longer contain an interface object. This was confusing and led to error prone code. Virtually all subsref and subsasgn implementations have also been removed.<br />
* The MinMax measure is now enabled by default<br />
* The Optimization framework was removed (and replaced) for various reasons, see: http://sumowiki.intec.ugent.be/index.php/FAQ#What_about_surrogate_driven_optimization.3F<br />
* Fixed the file output of the profiler, formatting is correct now<br />
* New implementation of a maximin latin hypercube design<br />
** Minimizes pairwise correlation<br />
** Minimizes intersite distance<br />
* Removed dependency of factorial design on the statistics toolbox<br />
* Added a plotOptions tag, this allows for more customisability of model plots (grey scale, light effects, ...)<br />
* Profiler plots can now also be saved as JPG, PNG, EPS, PDF, PS and SVG<br />
* Countless cleanups, minor bugfixes and feature enhancements<br />
<br />
== 4.2 - Released 18 October 2007 ==<br />
<br />
* Fixed the 'Known bugs' for v4.1 (see Wiki)<br />
* Simulators can be passed options through an <Options> tag<br />
* Added a fixed model builder so you can manually force which model parameters to use<br />
* Removed ProActive dependency for the SGE distributed backend<br />
* Improved Makefile under unix/linux<br />
* Data produced by simulators no longer needs to be pre-scaled to [-1 1], this can be done automatically from the simulator configuration file<br />
* Deprecated the optimization framework. It is currently being re-designed and a better, more integrated version will be released with the next toolbox version.<br />
* Lots of cleanups, minor bugfixes and small feature enhancements<br />
* In October 2007, the development of the M3-Toolbox was discontinued.<br />
<br />
== 4.1 - Released 27 July 2007 ==<br />
<br />
* Fixed the 'Known bugs' for v4.0 (see Wiki)<br />
* Vastly improved test sample distribution if a test set is created on the fly<br />
* Gradient sample selector now works with complex outputs and has improved neighbourhood selection<br />
* Speed and usability improvements in the profiler framework<br />
* Improvements in the profiler DockedView widget (added a right click context menu)<br />
* Addition of some new examples<br />
* Added an option (on by default) that selects a certain percentage of the grid sample selector's points randomly, making the algorithm more robust<br />
* Some cleanups, minor bugfixes and feature enhancements<br />
<br />
== 4.0 - Released 22 June 2007 ==<br />
<br />
* IMPORTANT: the best model score is now 0 instead of 1, this is more intuitive<br />
* Reworked and improved the model scoring mechanism, now based on a pareto analysis. This makes it possible to combine multiple measures in a sensible way.<br />
* Added a proof of concept surrogate driven optimization framework. Note this is an initial implementation which works, but don't expect state of the art results.<br />
* Cleanup and refactoring of the profiler framework<br />
* The profiling of model parameters has been totally reworked and this can now easily be tracked in a nice GUI widget<br />
* Cleanup of error function logic so you can now easily use different error functions (relative, RMS, ...) in the measures<br />
* Improved model plotting<br />
* Support for the SVMlight library (you must download it yourself in order to use it)<br />
* Added a MinMax measure which can be used to suppress spikes in rational models<br />
* Support for extinction prevention in the heterogenetic modeler<br />
* Fixed warnings (and in some cases errors) when loading models from disk<br />
* Respect the maximum running time more accurately<br />
* Many cleanups, minor bugfixes and feature enhancements<br />
<br />
== 3.3 - Released 2 May 2007 ==<br />
<br />
* Fixed incorrect summary at the end of a run<br />
* Fixed bug due to duplicate sample points<br />
* Ability to evaluate multiple samples in parallel locally (support for dual/multi-core machines)<br />
* Speedups when reading in datasets<br />
* Added new modelbuilders that optimize the parameters using:<br />
** Pattern Search (requires the Matlab direct search toolbox)<br />
** Simulated Annealing (requires Matlab v7.4 and the direct search toolbox)<br />
** The Matlab Optimization Toolbox (includes different gradient based methods like BGFS)<br />
* A new density based sample selection algorithm (VoronoiSampleSelector)<br />
* New simulator examples to test with<br />
* Addition of a profiler to generate levelplots<br />
* Ability to generate Matlab API documentation using m2html<br />
* New neural network training algorithms based on Differential Evolution and Particle Swarm Optimization<br />
* It is now possible to call the toolbox with specific samples/values directly, e.g., go('myConfigFile.xml',xValues,yValues);<br />
* Many minor bugfixes and feature enhancements<br />
<br />
== 3.2 - Released 9 Mar 2007 ==<br />
<br />
* Many important bugfixes<br />
* Documentation improvements<br />
* Fully working support for RBF models<br />
* New measure profilers that track the errors on measures<br />
* Many new predefined functions and datasets to test with. We now have over 50 examples!<br />
<br />
== 3.1 - Released 28 Feb 2007 ==<br />
<br />
* Small bugfixes and usability improvements<br />
* Improved documentation<br />
* Working implementation of a heterogenous evolutionary modelbuilder<br />
* More examples<br />
<br />
== 3.0 - Released 14 Feb 2007 ==<br />
<br />
* Availability of pre-built binaries<br />
* Extensive refactoring and code cleanups<br />
* Many bugfixes and usability improvements<br />
* Resilience against simulator crashes<br />
* Ability to set the maximum running time for one sample evaluation<br />
* Vastly improved Genetic model builder + a neural network implementation<br />
* Addition of a RandomModelBuilder to use as a baseline benchmark<br />
* Possible to add dummy input variables or to model only a subset of the available inputs while clamping others<br />
* Improved multiple output support<br />
** outputs can be modeled in parallel<br />
** each output can be configured separately (eg. per output: model type, accuracy requirements (measure), sample selection algorithm, complex handling flag, etc) <br />
** multiple outputs can be combined into one model if the model type supports this<br />
* Noisy (gaussian, outliers, ...) versions of a given output can be automatically added <br />
* New and improved directory structure for output data<br />
* New model types:<br />
** Kriging (based on the DACE MATLAB Kriging Toolbox by Lophaven, Nielsen and Sondergaard)<br />
** Splines (based on the MATLAB Splines Toolbox, only for 1D and 2D)<br />
* Now matlab scripts can be used as datasources (simulators) as well<br />
* New initial experimental design<br />
** Based on a dataset<br />
** Combination of existing designs<br />
** Based on the complexity of different 1D fits<br />
* Addition of new datasets and predefined functions as modeling examples<br />
<br />
== 2.0 - Released 15 Nov 2006 ==<br />
<br />
* Initial release of the M3-Toolbox - open source</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Running&diff=5166Running2010-07-03T12:29:04Z<p>Dgorissen: /* Test Suite */</p>
<hr />
<div>== Getting started ==<br />
<br />
If you are just getting started with the toolbox and you have no idea how everything works, this section should help you on your way.<br />
First make sure you [[About#Intended_use|know what the toolbox is used for]], you have finished the toolbox [[Installation]] and you have done a successful [[Installation#Test_run|test run]] by running the default configuration. If that works you know everything is working correctly. Then:<br />
<br />
# Go through the presentation [[About#Documentation|available here]], paying specific attention to the control flow<br />
# The behavior of the toolbox is fully configured through two XML files. If you do not know what XML is please read [[FAQ#What is XML?]] first.<br />
# Read [[Toolbox_configuration|the toolbox configuration structure section]]. This is very important. Then print out ''config/default.xml'' and take your time to read it through and understand the structure and the way things work.<br />
# Do the [[Installation#Test_run|test run]] again, this time pay closer attention to what is happening and see if you understand what is going on. If you still have no idea you can refer to the [[Running#Understanding_the_control_flow|Understanding the control flow]] section below.<br />
# Ok, by now you should have a rough idea of how the configuration file is structured and how the control flow works. Now change ''default.xml'' to run a different example; this [[Running#Running_different_examples|is explained below]]. If you can do that and it works, you have mastered all the basic skills needed to use the toolbox. You can now browse through the rest of the wiki as needed.<br />
<br />
If you get stuck or have any problems [[Reporting problems|please let us know]].<br />
<br />
''We are well aware that the documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can, but much is limited by available time and manpower. We are a university research group, after all. The most up to date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''<br />
<br />
== Running the default configuration ==<br />
<br />
Once the SUMO Toolbox is [[Installation|installed]] you can do a simple test run to check if everything is working as expected. This is explained on the [[Installation#Test_run | installation page]].<br />
<br />
== Running different examples ==<br />
<br />
=== Prerequisites ===<br />
This section is about running a different example problem, if you want to model your own problem see [[Adding an example]]. Make sure you [[configuration|understand the difference between the simulator configuration file and the toolbox configuration file]]. You should also have read [[Toolbox configuration#Structure]].<br />
<br />
=== Changing default.xml ===<br />
The <code>examples/</code> directory contains many example simulators that you can use to test the toolbox with. These examples range from predefined functions, to datasets from various domains, to native simulation code. If you want to try one of the examples, open <code>config/default.xml</code> and edit the [[Simulator| <Simulator>]] tag to suit your needs.<br />
<br />
For example, originally default.xml contains:<br />
<br />
<source lang="xml"><br />
<Simulator>Academic2DTwice</Simulator><br />
</source><br />
<br />
This means the toolbox will look in the examples directory for a project directory called <code>Academic2DTwice</code> and load the xml file with the same name inside that directory (in this case: <code>Academic2DTwice/Academic2DTwice.xml</code>).<br />
<br />
Now let's say you want to run a different example problem, for instance the Michalewicz example. In this case you would replace the original Simulator tag with: <br />
<br />
<source lang="xml"><br />
<Simulator>Michalewicz</Simulator><br />
</source><br />
<br />
In addition you would have to change the <code><Outputs></code> tag. The <code>Academic2DTwice</code> example has two outputs (''out'' and ''outinverse''). However, the Michalewicz example has only one (''out''). Thus telling the SUMO Toolbox to model the ''outinverse'' output in that case makes no sense since it does not exist for the Michalewicz example. So the following output configuration suffices:<br />
<br />
<source lang="xml"><br />
<Outputs><br />
<Output name="out"><br />
</Output><br />
</Outputs><br />
</source><br />
<br />
The rest of default.xml can be kept the same. Then simply run '<code>go</code>' to run the example (making sure that the toolbox is in your Matlab path of course).<br />
<br />
Note that it is also possible to specify an absolute path or refer to a particular xml file directly. For example:<br />
<br />
<source lang="xml"><br />
<Simulator>/path/to/your/project/directory</Simulator><br />
</source><br />
<br />
or:<br />
<br />
<source lang="xml"><br />
<Simulator>Ackley/Ackley2D.xml</Simulator><br />
</source><br />
<br />
=== Important notes ===<br />
<br />
If you start changing default.xml to try out different examples, there are a number of important things you should be aware of.<br />
<br />
==== Select a matching Input and Outputs ====<br />
Using the <code><Inputs></code> and <code><Outputs></code> tags in the SUMO-Toolbox configuration file you can tell the toolbox which outputs should be modeled and how. Note that these tags are optional. You can delete them, and the toolbox will then simply model all available inputs and outputs. If you do specify a particular output (say you tell the toolbox to model the output ''temperature'' of the simulator ''ChemistryProblem'') and you then change the configuration file to model ''BiologyProblem'', you will have to change the name of the selected output (or input), since ''BiologyProblem'' will most likely not have an output called ''temperature''.<br />
Another concrete example is given above with the Michalewicz example.<br />
<br />
==== Select a matching SampleEvaluator ====<br />
There is one important caveat. Some examples consist of a fixed data set, some are implemented as a Matlab function, others as a C++ executable, etc. When running a different example you have to tell the SUMO Toolbox how the example is implemented so the toolbox knows how to extract data (eg: should it load a data file or should it call a Matlab function). This is done by specifying the correct [[Config:SampleEvaluator|SampleEvaluator]] tag. The default SampleEvaluator is:<br />
<br />
<source lang="xml"><br />
<SampleEvaluator>matlab</SampleEvaluator><br />
</source><br />
<br />
So this means that the toolbox expects the example you want to run to be implemented as a Matlab function. Thus it makes no sense to run an example that is implemented as a static dataset using the '[[Config:SampleEvaluator#matlab|matlab]]' or '[[Config:SampleEvaluator#local|local]]' sample evaluators. Doing this will result in an error. In this case you should use '[[Config:SampleEvaluator#scatteredDataset|scatteredDataset]]' (or sometimes [[Config:SampleEvaluator#griddedDataset|griddedDataset]]).<br />
<br />
To see how an example is implemented open the XML file inside the example directory and look at the <source lang="xml"><Implementation></source> tag. To see which SampleEvaluators are available see [[Config:SampleEvaluator]].<br />
<br />
==== Select an appropriate AdaptiveModelBuilder ====<br />
Also remember that if you switch to a different example you may also have to change the [[Config:AdaptiveModelBuilder]] used. For example, if you are using a spline model (which only works in 2D) and you decide to model a problem with many dimensions (e.g., CompActive or BostonHousing) you will have to switch to a different model type (e.g., any of the SVM or LS-SVM model builders).<br />
<br />
==== Switch off Sample Selection if not needed ====<br />
If you are modeling a fixed, small size dataset it may make no sense to select samples incrementally. Instead you will probably load all the data at once and only generate models. See [[Adaptive_Modeling_Mode]] for how to do this.<br />
<br />
Finally, the question remains: what settings should I use for my problem? There is no single best answer to this question; see [[General_guidelines]].<br />
<br />
== Running different configuration files ==<br />
<br />
If you just type "go", the SUMO-Toolbox will run using the configuration options in default.xml. However, you may want to make a copy of default.xml and play around with that, leaving your original default.xml intact. So the question is: how do you run that file? Let's say your copy is called MyConfigFile.xml. In order to tell SUMO to run that file you would type:<br />
<br />
<source lang="matlab"><br />
go('/path/to/MyConfigFile.xml')<br />
</source><br />
<br />
The path can be an absolute path, or a path relative to the SUMO Toolbox root directory.<br />
To see what other options you have when running go type ''help go''.<br />
<br />
'''Remember to always run go from the toolbox root directory.'''<br />
<br />
=== Merging your configuration ===<br />
<br />
If you know what you are doing, you can merge your own custom configuration with the default configuration by using the '-merge' option. Options or tags that are missing in your custom file will then be filled in with the values from the default configuration. This saves you from having to duplicate tags from default.xml. However, if you are unfamiliar with XML and not quite sure what you are doing, we advise against using it.<br />
<br />
=== Running optimization examples ===<br />
The SUMO toolbox can also be used for minimizing the simulator in an intelligent way. There are two examples included in <code>config/Optimization</code>. Running these examples works exactly the same as always, e.g. <code>go('config/Optimization/Branin.xml')</code>. The only difference is in the sample selector, which is specified in the configuration file itself.<br />
<gallery><br />
Image:ISCSampleSelector2.png<br />
</gallery><br />
The example configuration files are well documented, it is advised to go through them for more detailed information.<br />
<br />
== Understanding the control flow ==<br />
<br />
[[Image:sumo-control-flow.png|thumb|300px|right|The general SUMO-Toolbox control flow]]<br />
<br />
When the toolbox is running you might wonder what exactly is going on. The high level control flow that the toolbox goes through is illustrated in the flow chart and explained in more detail below. You may also refer to the [[About#Presentation|general SUMO presentation]].<br />
<br />
# Select samples according to the [[InitialDesign|initial design]] and execute the [[Simulator]] for each of the points<br />
# Once enough points are available, start the [[Add_Model_Type#Models.2C_Model_builders.2C_and_Factories|Model builder]] which will start producing models as it optimizes the model parameters<br />
## the number of models generated depends on the [[Config:AdaptiveModelBuilder|AdaptiveModelBuilder]] used. Usually the AdaptiveModelBuilder tag contains a setting like ''maxFunEvals'' or ''popSize''. This tells the algorithm that is optimizing the model parameters (and thus generating models) how many models it may generate at most before stopping. Increasing this number generates more models in between sampling iterations, giving a higher chance of finding a better model, but at the cost of more computation time. This step is what we refer to as a ''modeling iteration''.<br />
## optimization over the model parameters is driven by the [[Measures|Measure(s)]] that are enabled. Selection of the Measure is thus very important for the modeling process!<br />
## each time the model builder generates a model that has a lower measure score than the previous best model, the toolbox will trigger a "New best model found" event, save the model, generate a plot, and trigger all the profilers to update themselves.<br />
## note that, by default, you only see something happen when a new best model is found; you do not see all the other models being generated in the background. If you want to see those, you must increase the logging granularity (or just look in the log file) or [[FAQ#How_do_I_enable_more_profilers.3F|enable more profilers]].<br />
# The model builder runs until it has completed<br />
# Then, if the current best model satisfies all the targets in the enabled Measures, it means we have reached the requirements and the toolbox terminates.<br />
# If not, the [[SampleSelector]] selects a new set of samples (= a ''sampling iteration''), they are simulated, and the model building resumes or is restarted according to the configured restart strategy<br />
# This whole loop continues (thus the toolbox will keep running) until one of the following conditions is true:<br />
## the targets specified in the active measure tags have been reached (each Measure has a target value which you can set). Note though, that when you are using multiple measures (see [[Multi-Objective Modeling]]) or single measures like AIC or LRM, it becomes difficult to set a priori targets since you can't really interpret the scores (in contrast to the simple case with a single measure like CrossValidation, where your target is simply the error you require). In those cases you should usually set the targets to 0 and use one of the other criteria below to make sure the toolbox stops.<br />
## the maximum running time has been reached (''maximumTime'' property in the [[Config:SUMO]] tag)<br />
## the maximum number of samples has been reached (''maximumTotalSamples'' property in the [[Config:SUMO]] tag)<br />
## the maximum number of modeling iterations has been reached (''maxModelingIterations'' property in the [[Config:SUMO]] tag)<br />
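<br />
These three limits are all properties of the [[Config:SUMO]] tag. As a sketch (the values below are made up for illustration and the attribute placement is an assumption; check your own <code>default.xml</code> for the exact form):<br />
<br />
<source lang="xml"><br />
<!-- stop after 60 minutes, 500 samples, or 20 modeling iterations, whichever comes first --><br />
<SUMO maximumTime="60" maximumTotalSamples="500" maxModelingIterations="20"><br />
  ...<br />
</SUMO><br />
</source><br />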
<br />
<br />
Note that it is also possible to disable the sample selection loop, see [[Adaptive Modeling Mode]]. Also note that while you might think the toolbox is not doing anything, it is actually building models in the background (see above for how to see the details). The toolbox will only inform you (unless configured otherwise) if it finds a model that is better than the previous best model (using that particular measure!!). If not it will continue running until one of the stopping conditions is true.<br />
<br />
== Output ==<br />
<br />
All output is stored under the [[Config:ContextConfig#OutputDirectory|directory]] specified in the [[Config:ContextConfig]] section of the configuration file (by default this is set to "<code>output</code>"). <br />
<br />
Starting from version 6.0 the output directory is always relative to the project directory of your example, unless you specify an absolute path.<br />
<br />
After completion of a SUMO Toolbox run, the following files and directories can be found there (i.e., in the <code>output/<run_name+date+time>/</code> subdirectory):<br />
<br />
* <code>config.xml</code>: The xml file that was used by this run. Can be used to reproduce the entire modeling process for that run.<br />
* <code>randstate.dat</code>: contains states of the random number generators, so that it becomes possible to deterministically repeat a run (see the [[Random state]] page).<br />
* <code>samples.txt</code>: a list of all the samples that were evaluated, and their outputs.<br />
* <code>profilers</code>-dir: contains information and plots about convergence rates, resource usage, and so on.<br />
* <code>best</code>-dir: contains the best models (+ plots) of all outputs that were constructed during the run. This is continuously updated as the modeling progresses.<br />
* <code>models_outputName</code>-dir: contains a history of all intermediate models (+ plots + movie) for each output that was modeled.<br />
<br />
If you generated models [[Multi-Objective Modeling|multi-objectively]] you will also find the following directory:<br />
<br />
* <code>paretoFronts</code>-dir: contains snapshots of the population during multi-objective optimization of the model parameters.<br />
<br />
== Debugging ==<br />
<br />
Remember to always check the log file first if problems occur!<br />
When [[reporting problems]] please attach your log file and the xml configuration file you used.<br />
<br />
To aid understanding and debugging you should set the console and file logging level to FINE (or even FINER, FINEST)<br />
as follows: <br />
<br />
Change the level of the ConsoleHandler tag to FINE, FINER or FINEST. Do the same for the FileHandler tag. <br />
<br />
<source lang="xml"><br />
<!-- Configure ConsoleHandler instances --><br />
<ConsoleHandler><br />
<Option key="Level" value="FINE"/><br />
</ConsoleHandler><br />
</source><br />
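<br />
The FileHandler is configured analogously; a sketch, assuming it accepts the same ''Level'' option as the ConsoleHandler (check your default.xml to be sure):<br />
<br />
<source lang="xml"><br />
<!-- Configure FileHandler instances --><br />
<FileHandler><br />
  <Option key="Level" value="FINEST"/><br />
</FileHandler><br />
</source><br />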
<br />
== Using models ==<br />
<br />
Once you have generated a model, you might wonder what you can do with it. To see how to load, export, and use SUMO generated models see the [[Using a model]] page.<br />
<br />
== Modeling complex outputs ==<br />
<br />
The toolbox supports the modeling of complex valued data. If you do not specify any specific <[[Outputs|Output]]> tags, all outputs will be modeled with [[Outputs#Complex_handling|complexHandling]] set to '<code>complex</code>'. This means that a real output will be modeled as a real value, and a complex output will be modeled as a complex value (with a real and imaginary part). If you don't want this (i.e., you want to model the modulus of a complex output or you want to model real and imaginary parts separately), you explicitly have to set [[Outputs#Complex_handling|complexHandling]] to 'modulus', 'real', 'imaginary', or 'split'.<br />
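<br />
For example, to model the modulus of a complex output the configuration would look something like the sketch below (the tag layout and the output name ''S11'' are illustrative assumptions; see the [[Outputs#Complex_handling|Outputs]] page for the authoritative syntax):<br />
<br />
<source lang="xml"><br />
<Outputs><br />
  <!-- model the modulus of the complex output instead of the complex value itself --><br />
  <Output name="S11" complexHandling="modulus"/><br />
</Outputs><br />
</source><br />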
<br />
More information on this subject can be found at the [[Outputs#Complex_handling|Outputs]] page.<br />
<br />
== Models with multiple outputs ==<br />
<br />
If multiple [[Outputs]] are selected, by default the toolbox will model each output separately using a separate adaptive model builder object. So if you have a system with 3 outputs you will get three different models each with one output. However, sometimes you may want a single model with multiple outputs. For example instead of having a neural network for each component of a complex output (real/imaginary) you might prefer a single network with 2 outputs. To do this simply set the 'combineOutputs' attribute of the <AdaptiveModelBuilder> tag to 'true'. That means that each time that model builder is selected for an output, the same model builder object will be used instead of creating a new one.<br />
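<br />
A sketch of what this looks like in the configuration file (the model type shown and the tag contents are only illustrative assumptions):<br />
<br />
<source lang="xml"><br />
<!-- one shared model builder object producing a single model with multiple outputs --><br />
<AdaptiveModelBuilder type="ann" combineOutputs="true"><br />
  ...<br />
</AdaptiveModelBuilder><br />
</source><br />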
<br />
Note though, that not all model types support multiple outputs. If they don't you will get an error message.<br />
<br />
Also note that you can also generate models with multiple outputs in a multi-objective fashion. For information on this see the page on [[Multi-Objective Modeling]].<br />
<br />
== Multi-Objective Model generation ==<br />
<br />
See the page on [[Multi-Objective Modeling]].<br />
<br />
== Interfacing with the SUMO Toolbox ==<br />
<br />
To learn how to interface with the toolbox or model your own problem see the [[Adding an example]] and [[Interfacing with the toolbox]] pages.<br />
<br />
== Test Suite ==<br />
<br />
A test harness is provided that can be run manually or automatically as part of a cron job. The test suite consists of a number of test XML files (in the config/test/ directory), each describing a particular surrogate modeling experiment. The file config/test/suite.xml dictates which tests are run and their order. The suite.xml file also contains the accuracy and sample bounds that are checked after each test. If the final model does not fall within the accuracy or number-of-samples bounds, the test is considered failed.<br />
<br />
Note also that some of the predefined test cases may rely on data sets or simulation code that are not publicly available for confidentiality reasons. However, since these test problems typically make very good benchmarks, we left them in for illustration purposes.<br />
<br />
The coordinating class is the Matlab TestSuite class found in the src/matlab directory. Besides running the tests defined in suite.xml it also tests each of the model member functions.<br />
<br />
Assuming the SUMO Toolbox is setup properly and the necessary libraries are compiled ([[Installation#Optional:_Compiling_libraries|see here]]), the test suite should be run as follows (from the SUMO root directory):<br />
<br />
<source lang="matlab"><br />
s = TestEngine('config/test/suite.xml') ; s.run()<br />
</source><br />
<br />
The "run()" method also supports an optional parameter (a vector) that dictates which tests to run (e.g., run([2 5 3]) will run tests 2,5 and 3).<br />
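<br />
For example, running only tests 2, 5 and 3 (in that order) looks like this:<br />
<br />
<source lang="matlab"><br />
s = TestEngine('config/test/suite.xml');<br />
s.run([2 5 3])  % runs tests 2, 5 and 3, in that order<br />
</source><br />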
<br />
''Note that due to randomization the final accuracy and number of samples used may vary slightly from run to run (causing failed tests). Thus the bounds must be set sufficiently loose.''<br />
<br />
== Tips ==<br />
<br />
See the [[Tips]] page for various tips and gotchas.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Installation&diff=5165Installation2010-07-03T12:27:22Z<p>Dgorissen: /* Optional: Compiling libraries */</p>
<hr />
<div>== Introduction ==<br />
This page will walk you through the SUMO Toolbox installation. Please refer to the [[system requirements]] first. See the [[downloading]] section on how to download the toolbox.<br />
<br />
== Quick start ==<br />
<br />
Quick and dirty instructions:<br />
<br />
# Log into the SUMO lab website with the account information mailed to you and download the toolbox<br />
# Unzip the toolbox zip file, it will create a directory (= the toolbox installation directory)<br />
<!-- # Unzip the activation zip file '''INTO the toolbox installation directory''' (this file was mailed to you after you registered) --><br />
# Start Matlab<br />
# Go to the toolbox directory<br />
# Run '<code>startup</code>'<br />
# Run '<code>go</code>'<br />
<br />
== Basic Installation ==<br />
<br />
=== Toolbox ===<br />
Unzip the toolbox zip file to a directory somewhere on your hard disk; the full path of the SUMO Toolbox (including installation directory) will be referred to as the toolbox installation directory (e.g., c:\software\SUMO-Toolbox-6.3). Note that you do '''not''' have to put the toolbox in the Matlab installation directory; we actually advise against it since it can cause confusing errors.<br />
<br />
Once you have unzipped the toolbox zip file the directory structure looks like this:<br />
<br />
* <code><toolbox installation directory> ''(e.g., c:\software\SUMO-Toolbox-6.3)''</code><br />
** <code>bin/</code> : binaries, executable scripts, ...<br />
** <code>config/</code> : configuration files, location of <code>default.xml</code><br />
** <code>config/demo</code> : a couple of demo configuration files that may help you<br />
** <code>doc/</code> : some documentation<br />
** <code>doc/apidoc</code> : Javadoc and other api docs<br />
** <code>lib/</code> : required libraries (eg: dom4j)<br />
** <code>output/</code> : some output may be placed here (e.g., a global log file)<br />
** <code>src/</code> : all source code<br />
** <code>examples/</code> : project directories of different examples (you can test with these problems and use them as an example to [[Adding an example|add your own]])<br />
<br />
<!--<br />
=== Activation file ===<br />
<br />
Once you have received the activation file simply unzip it '''INTO''' in your toolbox installation directory. So place the zip file in the toolbox installation directory and unzip it there, it should place all files in the correct places (see also the README file in the activation zip). DO NOT unzip the activation file into its own directory somewhere else. Make sure you restart Matlab (if it was running) after you have done this.<br />
<br />
=== Extension pack ===<br />
<br />
There are a number of third party tools and modeling libraries that the SUMO Toolbox can use but that we cannot distribute together with the toolbox. These have been bundled in an extension pack. Only minor patches have been made to the original code to make them work better with SUMO (e.g., remove debug output). To install the extension pack, download the zip file, and unzip it INTO your toolbox installation directory. The files should be placed in the correct directories. Simply re-run 'startup' to make Matlab aware of the new files.<br />
<br />
If you download and/or use these files please respect their licenses (found in doc/licenses), '''THIS IS YOUR RESPONSIBILITY !!!'''.<br />
--><br />
=== Setup ===<br />
<br />
Setting up the toolbox is very easy. Start Matlab, navigate to the toolbox installation directory (not anywhere else, this is important!!) and run '<code>startup</code>'.<br />
<br />
=== Test run ===<br />
<br />
To ensure everything is working you can do a simple run of the toolbox with the default configuration. This means the toolbox will use the setting specified in <code><SUMO-Toolbox-installation-dir>/config/default.xml</code>.<br />
<br />
# Make sure that you are in the toolbox installation directory and you have run '<code>startup</code>' (see above)<br />
# Type '<code>go</code>' and press enter.<br />
# The toolbox will start to model the ''Academic2DTwice'' simulator. This simulator has 2 inputs and 2 outputs, and will be modeled using Kriging models, scored using [[Measures#CrossValidation| CrossValidation]], and samples selected using a combined sample selection method.<br />
# To see the exact settings used open <code>config/default.xml</code>. Feel free to edit this file and play around with the different options.<br />
<br />
The examples directory contains many example simulators that you can use to test the toolbox with. See [[Running#Running_different_examples]].<br />
<br />
== Ok, the test run works, now what? ==<br />
<br />
See [[Running]] page.<br />
<br />
== Problems ==<br />
<br />
See the [[reporting problems]] page.<br />
<br />
== Optional: Compiling libraries ==<br />
<br />
There are some alternative libraries and simulators available that have to be compiled for your specific platform. Instructions depend on your operating system. Ensure you have installed the extension pack before continuing.<br />
<br />
=== Linux/Unix/OSX ===<br />
<br />
# Ensure you have the following environment variables set:<br />
## <code>MATLABDIR=/path/to/your/matlab/installation</code><br />
## <code>JAVA_HOME=/path/to/your/SDK/installation</code><br />
# Ensure you have the usual build tools installed: gcc, g++, autotools, make, etc<br />
# From the command line shell (so NOT from inside Matlab): Go to the toolbox installation directory and type '<code>make</code>'. This will build everything for you (C/C++ files, SVM libraries, ...). If you only want to build certain packages simply '<code>make Package</code>' in the toolbox installation directory. <br />
## Note: if this is giving you problems, and you just want to compile the LS-SVMs you can try running makeLSSVM from inside Matlab (see the Windows instructions below)<br />
# A complete list of available packages follows:<br />
<br />
<br />
{| style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
! Package<br />
! Description<br />
! Requires extension pack<br />
|-<br />
| contrib<br />
| Builds the FANN, SVM (libsvm, LS-SVMlab) and NNSYSID libraries<br />
| Yes<br />
|-<br />
| cexamples<br />
| Builds the binaries for several C/C++ simulators<br />
| No<br />
|}<br />
<br />
Note: if you want to use FANN you will have to ensure Matlab knows of the FANN libraries. See [[FAQ#When_using_FANN_models_I_sometimes_get_.22Invalid_MEX-file_createFann.mexa64.2C_libfann.so.2:_cannot_open_shared_object_file:_No_such_file_or_directory..22 | the FAQ]].<br />
<br />
=== Windows ===<br />
<br />
# Compiling C/C++ codes (examples):<br />
## You will have to do this on your own using a C/C++ compiler of your choice: Dev-c++/Visual Studio/...<br />
# Compiling LS-SVM libraries:<br />
## In order to use the [http://www.esat.kuleuven.be/sista/lssvmlab/ LS-SVM] backend, you will have to compile the LS-SVM mex files (it will work if you don't, but you will get a lot of warning messages about a missing CFile implementation).<br />
## This can be done using the built-in LCC compiler of Matlab, by calling '<code>makeLSSVM</code>' from the Matlab command prompt (make sure the SUMO Toolbox is in your path)<br />
# Compiling ANN libraries:<br />
## In order to use the [http://leenissen.dk/fann/ FANN] backend, you will have to compile the FANN library and mex files.<br />
## So far nobody has yet got it to work under Windows, but don't let that stop you.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=FAQ&diff=5164FAQ2010-07-03T12:26:37Z<p>Dgorissen: </p>
<hr />
<div>== General ==<br />
<br />
=== What is a global surrogate model? ===<br />
<br />
A global [http://en.wikipedia.org/wiki/Surrogate_model surrogate model] is a mathematical model that mimics the behavior of a computationally expensive simulation code over '''the complete parameter space''' as accurately as possible, using as few data points as possible. Note that optimization is not the primary goal, although it can be done as a post-processing step. Global surrogate models are useful for:<br />
<br />
* design space exploration, to get a ''feel'' of how the different parameters behave<br />
* sensitivity analysis<br />
* ''what-if'' analysis<br />
* prototyping<br />
* visualization<br />
* ...<br />
<br />
In addition, they are a cheap way to model large-scale systems: multiple global surrogate models can be chained together in a model cascade.<br />
<br />
See also the [[About]] page.<br />
<br />
=== What about surrogate driven optimization? ===<br />
<br />
Most people associate the term '''surrogate driven optimization''' with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate which is optimized to find an optimum. Afterwards, a move limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently the surrogate is rebuilt and optimized; i.e., the surrogate zooms in on the global optimum. For instance the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies, where the surrogate construction is separated from optimization.<br />
<br />
Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox. <br />
<br />
Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sample selection strategy that balances local and global search. In other words, it balances between exploring the input space and exploiting the information the surrogate gives us.<br />
<br />
A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].<br />
<br />
=== What is (adaptive) sampling? Why is it used? ===<br />
<br />
In classical Design of Experiments you need to specify the design of your experiment up-front. In other words, you have to say in advance how many data points you need and how they should be distributed. Two examples are Central Composite Designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., an expensive simulation code) it is not clear up-front how many points are needed. Instead, data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in the regions that are most interesting is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere, so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].<br />
SUMO provides a number of different sampling algorithms: [[SampleSelector]]<br />
<br />
Of course sometimes you don't want to do sampling. For example, if you have a fixed dataset you just want to load all the data in one go and model that. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].<br />
<br />
=== What about dynamical, time dependent data? ===<br />
<br />
The original design and purpose was to tackle static input-output systems, where there is no memory. Just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox but that typically is not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...<br />
<br />
The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.<br />
<br />
Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].<br />
<br />
For more information on this topic [[Contact]] us.<br />
<br />
=== What about classification problems? ===<br />
<br />
The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].<br />
<br />
=== Can the toolbox drive my simulation code directly? ===<br />
<br />
Yes it can. See the [[Interfacing with the toolbox]] page.<br />
<br />
=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===<br />
<br />
The SUMO toolbox is a complete, feature-rich framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof of principle.<br />
<br />
=== What happened to the M3-Toolbox? ===<br />
<br />
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.<br />
<br />
=== How can I stay up to date with the latest news? ===<br />
<br />
To stay up to date with the latest news and releases, we also recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== What is the roadmap for the future? ===<br />
<br />
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== Will there be an R/Scilab/Octave/Sage/.. version? ===<br />
<br />
At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:<br />
<br />
* Existing experience and know-how of the development team<br />
* The widespread use of the Matlab platform in the target application domains<br />
* The quality and amount of available Matlab documentation<br />
* The quality and number of Matlab toolboxes<br />
* Support for object orientation (inheritance, polymorphism, etc.)<br />
* Many well documented interfacing options (especially the seamless integration with Java)<br />
<br />
Matlab, as a proprietary platform, definitely has its problems and deficiencies but the number of advanced algorithms and available toolboxes make it a very attractive platform. Equally important is the fact that every function is properly documented, tested, and includes examples, tutorials, and in some cases GUI tools. A lot of things would have been a lot harder and/or time consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus given our situation, goals, and resources at the time, Matlab was the best choice for us. <br />
<br />
The other platforms remain on our radar however, and we do look into them from time to time. Though, with our limited resources porting to one of those platforms is not (yet) cost effective.<br />
<br />
=== What are collaboration options? ===<br />
<br />
We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.<br />
<br />
The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to optimally help you, it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.<br />
<br />
If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.<br />
<br />
In either case (code or dataset) remember:<br />
<br />
* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])<br />
* include a short description of your data:<br />
** number of inputs and number of outputs<br />
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)<br />
** if the outputs are real or complex valued<br />
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).<br />
** if possible the expected range of each output (or scaled if you do not wish to disclose this)<br />
** if possible the names of each input/output + a short description of what they mean<br />
** any further insight you have about the data, expected behavior, expected importance of each input, etc.<br />
<br />
If you have any further questions or comments related to this please [[Contact]] us.<br />
<br />
=== Can you help me model my problem? ===<br />
<br />
Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]<br />
<br />
== Installation and Configuration ==<br />
<br />
=== What is the relationship between Matlab and Java? ===<br />
<br />
Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable java support but in order to use the SUMO Toolbox it should not be. To check if Java is enabled you can use the 'usejava' command.<br />
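<br />
You can check this directly from the Matlab prompt (''usejava'' is a standard Matlab function and ''jvm'' one of its documented arguments):<br />
<br />
<source lang="matlab"><br />
% returns true (logical 1) if the Java virtual machine is available<br />
usejava('jvm')<br />
</source><br />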
<br />
=== What is Java, why do I need it, do I have to install it, etc. ? ===<br />
<br />
The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language rather than a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].<br />
<br />
=== What is XML? ===<br />
<br />
XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.<br />
<br />
The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:<br />
<br />
<source lang="xml"><br />
<Foo attr="bar">bla bla bla</Foo> <br />
</source><br />
<br />
Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you, you choose :)<br />
<br />
Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:<br />
<br />
<source lang="xml"><br />
<recipe category="dessert"><br />
<title>Pancakes</title><br />
<author>sumo@intec.ugent.be</author><br />
<date>Wed, 14 Jun 95</date><br />
<description><br />
Good old fashioned pancakes.<br />
</description><br />
<ingredients><br />
<item><br />
<amount>3</amount><br />
<type>eggs</type><br />
</item><br />
<br />
<item><br />
<amount>0.5 tablespoon</amount><br />
<type>salt</type><br />
</item><br />
...<br />
</ingredients><br />
<preparation><br />
...<br />
</preparation><br />
</recipe><br />
</source><br />
<br />
So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO simply uses it to store and structure configuration options, which works well due to the nice hierarchical nature of XML.<br />
<br />
If you understand this, you know everything needed to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]<br />
<br />
=== Why does SUMO use XML? ===<br />
<br />
XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it is an intuitive way to structure data and it is used everywhere. As a result there are a very large number of libraries and programming languages available that can parse and handle XML easily. That means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.<br />
So it would not make sense for SUMO to use any other format :)<br />
<br />
=== I get an error that SUMO is not yet activated ===<br />
<br />
Make sure you installed the activation file that was mailed to you as is explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [http://www.sumowiki.intec.ugent.be/index.php/FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27 java is enabled]. To fully verify that the activation file installation is correct, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.<br />
<br />
Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].<br />
<br />
== Upgrading ==<br />
<br />
=== How do I upgrade to a newer version? ===<br />
<br />
Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it with the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.<br />
<br />
If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.<br />
<br />
== Using ==<br />
<br />
=== I have no idea how to use the toolbox, what should I do? ===<br />
<br />
See: [[Running#Getting_started]]<br />
<br />
=== I want to try one of the different examples ===<br />
<br />
See [[Running#Running_different_examples]].<br />
<br />
=== I want to model my own problem ===<br />
<br />
See : [[Adding an example]].<br />
<br />
=== I want to contribute some data/patch/documentation/... ===<br />
<br />
See : [[Contributing]].<br />
<br />
=== How do I interface with the SUMO Toolbox? ===<br />
<br />
See : [[Interfacing with the toolbox]].<br />
<br />
=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===<br />
<br />
See [[General_guidelines]].<br />
<br />
=== Ok, I generated a model, what can I do with it? ===<br />
<br />
See: [[Using a model]].<br />
<br />
=== How can I share a model created by the SUMO Toolbox? ===<br />
<br />
See : [[Using a model#Model_portability| Model portability]].<br />
<br />
=== I don't like the final model generated by SUMO, how do I improve it? ===<br />
<br />
Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor model. Then you should rank those properties depending on how important you find them. Examples are:<br />
<br />
* accuracy in the training data<br />
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing<br />
* accuracy outside the training data<br />
** this is the validation or test error, how important is proper generalization (usually this is very important)<br />
* what does accuracy mean to you? a low maximum error, a low average error, both, ...<br />
* smoothness<br />
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example<br />
* are some regions of the response more important than others?<br />
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important<br />
* are there particular special features that your model should have<br />
** for example, capture underlying poles or discontinuities correctly<br />
* extrapolation capability<br />
* ...<br />
<br />
It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].<br />
<br />
Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].<br />
<br />
Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:<br />
<br />
* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.<br />
* you should switch to a different model parameter optimization algorithm (for example, instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)<br />
* the model type you are using is not ideally suited to your data<br />
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension<br />
* maybe the sample distribution is causing trouble for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.<br />
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very small)<br />
<br />
You may also refer to the following [[General_guidelines]]. Finally, of course, it may be that your problem is simply a very difficult one and does not approximate well. But even then you should at least get something satisfactory.<br />
<br />
If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.<br />
<br />
=== My data contains noise can the SUMO-Toolbox help me? ===<br />
<br />
The original purpose of the SUMO-Toolbox was to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, you will just have to be more careful about how you apply the methods and how you perform model selection. It is only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.<br />
<br />
Note, though, that the toolbox is not a statistical package; if you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc., you should look towards other tools.<br />
<br />
=== What is the difference between a ModelBuilder and a ModelFactory? ===<br />
<br />
See [[Add Model Type]].<br />
<br />
=== Why are the Neural Networks so slow? ===<br />
<br />
The ANN models are an extremely powerful model type that give very good results in many problems. However, they are quite slow to use. There are some things you can do:<br />
<br />
* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.<br />
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.<br />
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.<br />
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.<br />
* If you are using [[Measures#CrossValidation| CrossValidation]], try to switch to a different measure, since CrossValidation is very expensive. CrossValidation is used by default if you have not defined a [[Measures| measure]] yourself. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.<br />
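The measure combination mentioned above could be configured along these lines. This is a hedged sketch only: it reuses the <Measure> tag convention shown later in this FAQ, and the exact attributes and weighting options are described in [[Multi-Objective Modeling]] and the comments in <code>default.xml</code>.<br />
<br />
<source lang="xml"><br />
<!-- Sketch: replace the (expensive) CrossValidation measure by two cheaper ones --><br />
<Measure type="SampleError" target="0.001"/><br />
<Measure type="LRMMeasure" target="0.001"/><br />
</source><br />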
<br />
See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]<br />
<br />
=== How can I make the toolbox run faster? ===<br />
<br />
There are a number of things you can do to speed things up, listed below. Remember though that the main reason the toolbox may seem slow is the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying the more advanced options.<br />
<br />
* First of all check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates this really slows things down and your computer becomes unusable.<br />
<br />
* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files<br />
<br />
* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between two sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the genetic model builders reduce the population size and the number of generations.<br />
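For a model builder configured with <code><Option></code> tags this could look as follows (a hedged sketch: the exact option name and its sensible value differ per model builder, so check the corresponding section of <code>default.xml</code>):<br />
<br />
<source lang="xml"><br />
<!-- Sketch: build at most 50 models between two sampling iterations --><br />
<Option key="maxFunEvals" value="50"/><br />
</source><br />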
<br />
* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])<br />
<br />
* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.<br />
<br />
* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true"><br />
<Output type="toImage"/><br />
<Output type="toFile"/><br />
</Profiler><br />
<br />
<Profiler name=".*" enabled="true"><br />
<Output type="toFile"/><br />
</Profiler><br />
</Profiling><br />
</source><br />
<br />
The ".*" matches any sequence of characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name should be enabled and saved as a text file (toFile) AND as an image file (toImage). All the other profilers are saved just to file. The idea is to only save to image what you actually want as an image, since image generation is expensive. If you do this, or switch off image generation completely, you will see everything run much faster.<br />
<br />
* Decrease the logging granularity: a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.<br />
<br />
* If you have a multi-core/multi-cpu machine:<br />
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.<br />
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)<br />
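The threadCount setting could look like this inside the LocalSampleEvaluator definition (a hedged sketch reusing the <code><Option></code> key/value convention used elsewhere in the configuration file; see [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] for the exact form):<br />
<br />
<source lang="xml"><br />
<!-- Sketch: evaluate up to 4 samples in parallel on a quad-core machine --><br />
<Option key="threadCount" value="4"/><br />
</source><br />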
<br />
* Don't use the Min-Max measure, it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]<br />
<br />
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]<br />
<br />
* If you are having problems with very slow or seemingly hanging runs:<br />
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html Matlab profiler] and see where most time is spent.<br />
<br />
** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange. <br />
<br />
* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].<br />
<br />
To check if your SUMO run has hung, monitor your log file (with the level set at least to FINE). If you see no changes for about 30 minutes the toolbox has probably stalled; [[Reporting problems| report the problem here]].<br />
<br />
Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.<br />
<br />
=== How do I build models with more than one output ===<br />
<br />
Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]].<br />
<br />
=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===<br />
<br />
See : [[Adaptive Modeling Mode]].<br />
<br />
=== How do I change the error function (relative error, RMSE, ...)? ===<br />
<br />
The [[Measures| <Measure>]] tag specifies the algorithm to use to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' to use in the measure. The default error function is '<code>rootRelativeSquareError</code>'.<br />
<br />
Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/><br />
</source><br />
<br />
On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/><br />
</source><br />
<br />
These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important! Make sure you think well about it. Also see [[Multi-Objective Modeling]].<br />
<br />
=== How do I enable more profilers? ===<br />
<br />
Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.<br />
<br />
=== What regular expressions can I use to filter profilers? ===<br />
<br />
See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].<br />
<br />
=== How can I ensure deterministic results? ===<br />
<br />
See : [[Random state]].<br />
<br />
=== How do I get a simple closed-form model (symbolic expression)? ===<br />
<br />
See : [[Using a model]].<br />
<br />
=== How do I enable the Heterogenous evolution to automatically select the best model type? ===<br />
<br />
Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.<br />
<br />
=== What is the combineOutputs option? ===<br />
<br />
See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== What error function should I use? ===<br />
<br />
The default error function is the Root Relative Square Error (RRSE). meanRelativeError may be more intuitive, but then you have to be careful if you have function values close to zero, since in that case the relative error explodes or even becomes infinite. You could also use one of the combined relative error functions (these contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error (=> hard to interpret).<br />
<br />
So an absolute error (like the RMSE) seems the safest bet; however, in that case you have to come up with sensible accuracy targets and realize that you will build models that fit the regions of high absolute value better than the low ones.<br />
<br />
Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.<br />
<br />
A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].<br />
<br />
=== I just want to generate an initial design (no sampling, no modeling) ===<br />
<br />
Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.<br />
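As a sketch, this could look as follows (hedged: it assumes the attribute is set directly on the SUMO tag as described above; check your own configuration file for the exact spelling and surrounding attributes):<br />
<br />
<source lang="xml"><br />
<!-- Sketch: only generate and evaluate the initial design, then stop --><br />
<SUMO maxModelingIterations="0"><br />
  ...<br />
</SUMO><br />
</source><br />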
<br />
=== How do I start a run with the samples of a previous run, or with a custom initial design? ===<br />
<br />
Use a Dataset design component, for example:<br />
<br />
<source lang="xml"><br />
<InitialDesign type="DatasetDesign"><br />
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/><br />
</InitialDesign><br />
</source><br />
<br />
=== What is a level plot? ===<br />
<br />
A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:<br />
<gallery><br />
Image:levelplot.png<br />
</gallery><br />
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.<br />
<br />
===I am getting a java out of memory error, what happened?===<br />
Datasets are loaded through Java. This means that the Java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might experience problems with the maximum heap size. You can solve this by raising the heap size as described on [http://www.mathworks.com/support/solutions/data/1-18I2C.html this MathWorks page].<br />
<br />
=== How do I force the output of the model to lie in a certain range ===<br />
<br />
See [[Measures#MinMax]].<br />
<br />
=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===<br />
<br />
That depends. Remember that the main focus of SUMO is to generate accurate 'global' models. If you want to do sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. At that point you need to see if you can extend the models with domain specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you don't need to do sample selection but have a fixed dataset which you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM type models are independent of the dimension and thus can always be applied, though things like feature selection are always recommended.<br />
<br />
=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===<br />
<br />
When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question but this is not always straightforward. In those cases a whole set of algorithms exist for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox but are not currently implemented. That is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.<br />
<br />
=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===<br />
<br />
When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., writes the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).<br />
<br />
Which one you choose (matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable with [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference, take whatever is easiest for you.<br />
<br />
HOWEVER, there is one important consideration: Matlab does not support threads so this means that if you use a matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. This means the modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient. In version 6.2 we will probably fix this by using the Parallel Computing Toolbox.<br />
<br />
On the other hand, using a shell script/executable, does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).<br />
<br />
As a side note, if you have already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. Of course this is not very elegant and adds some overhead, but depending on your situation it may be worth it.<br />
<br />
=== Is there any design documentation available? ===<br />
<br />
There is a PhD thesis fully describing the software architecture and design rationale behind the toolbox. It will be put online in the future. Until then you can [[Contact]] us to obtain a copy.<br />
<br />
== Troubleshooting ==<br />
<br />
=== I have a problem and I want to report it ===<br />
<br />
See : [[Reporting problems]].<br />
<br />
=== I sometimes get flat models when using rational functions ===<br />
<br />
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.<br />
<br />
The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. We have experienced that some runs just find a flat model as the best least squares fit. There are several possible causes:<br />
<br />
* The number of sample points is small, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always interpolate exactly. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code>, you still get weird, spiky models you simply need more samples or you should switch to a different model type.<br />
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.<br />
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.<br />
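The <code>"percentBounds"</code> change from the first cause above could look like this in the RationalFactory section (a hedged sketch reusing the <code><Option></code> key/value convention used elsewhere in the configuration file; check <code>default.xml</code> for the exact form):<br />
<br />
<source lang="xml"><br />
<!-- Sketch: force the rational models to interpolate exactly --><br />
<Option key="percentBounds" value="100,100"/><br />
</source><br />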
<br />
=== When using rational functions I sometimes get 'spikes' (poles) in my model ===<br />
<br />
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.<br />
<br />
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.<br />
<br />
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:<br />
<br />
* Simply wait for more data, usually spikes disappear (but not always).<br />
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.<br />
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.<br />
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.<br />
* Increase the population size if using the genetic version<br />
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]]; it was designed to get rid of this problem more quickly, but it only selects one sample at a time.<br />
<br />
However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time; automatic order selection is not an easy problem.<br />
<br />
=== There is no noise in my data yet the rational functions don't interpolate ===<br />
<br />
[[FAQ#I sometimes get flat models when using rational functions |see this question]].<br />
<br />
=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===<br />
<br />
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path. <br />
<br />
In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.<br />
<br />
=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===<br />
<br />
This means you have declared a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but no component with that id is defined further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which ids are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.<br />
<br />
=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===<br />
<br />
This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files, the Matlab implementation will be used and this will not cause any problems.<br />
<br />
=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===<br />
<br />
This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure the FANN libraries (stored in src/matlab/contrib/fann/src/.libs/) are in your library path, e.g., on unix systems, make sure they are included in LD_LIBRARY_PATH.<br />
<br />
=== Undefined function or method 'createFann' for input arguments of type 'double'. ===<br />
<br />
See [[FAQ#When_using_FANN_models_I_sometimes_get_.22Invalid_MEX-file_createFann.mexa64.2C_libfann.so.2:_cannot_open_shared_object_file:_No_such_file_or_directory..22]]<br />
<br />
=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===<br />
<br />
You forgot to build the SVM mex files for your platform. For Windows they are pre-compiled for you; on other systems you have to compile them yourself with the makefile.<br />
<br />
=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===<br />
<br />
First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].<br />
<br />
This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.<br />
<br />
If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:<br />
<br />
# typing 'usejava jvm' should return 1 <br />
# typing 's = java.lang.String', this should ''not'' give an error<br />
# typing 'version('-java')' should return at least version 1.5.0<br />
<br />
If (1) returns 0, then the jvm of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?)<br />
If (2) fails but (1) is ok, there is a very weird problem, check the Matlab documentation.<br />
If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer, jvm (See the Matlab docs for how to do this).<br />
<br />
=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===<br />
<br />
You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.<br />
<br />
=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===<br />
<br />
See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]<br />
<br />
=== I get a warning about duplicate samples being selected, why is this? ===<br />
<br />
Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.<br />
<br />
=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===<br />
<br />
There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].<br />
<br />
# '''Sampling off'''<br />
## ''Single objective'': the error should always decrease monotonically; you should never see it rise. If it does, [[reporting problems|report it as a bug]].<br />
## ''Multi objective'': there is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi-objective enabled modeling algorithm.<br />
# '''Sampling on'''<br />
## ''Single objective'': inside each modeling iteration the error should always monotonically decrease. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase there; this is normal behavior (*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the data points is causing numerical problems).<br />
## ''Multi objective'': Combination of 1.2 and 2.1.<br />
<br />
(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!<br />
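A sketch of what this could look like (the text above only states that ''keepOldModels'' is an option of the SUMO tag; rendering it as an attribute here is an assumption, so check ''default.xml'' for the authoritative syntax):<br />
<br />
<source lang="xml"><br />
<!-- Sketch: force a monotonically decreasing best-model error. --><br />
<!-- Attribute form is an assumption; verify against default.xml. --><br />
<SUMO keepOldModels="true"><br />
  ...<br />
</SUMO><br />
</source><br />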
<br />
=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===<br />
<br />
This is normal, the warning printed out before the error explains why:<br />
<br />
''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''<br />
<br />
By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".<br />
So note that there is nothing to worry about: everything has run correctly; it is just the movie creation that is failing.<br />
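For example (a sketch: the text above states ''createMovie'' lives in the <SUMO> tag, but whether it is an attribute as shown here is an assumption; check ''default.xml''):<br />
<br />
<source lang="xml"><br />
<!-- Sketch: disable movie generation at the end of a run. --><br />
<SUMO createMovie="false"><br />
  ...<br />
</SUMO><br />
</source><br />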
<br />
=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===<br />
<br />
This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox sub directory! This is very important.<br />
<br />
If you still have problems you can override the default logfile name and location as follows:<br />
<br />
In the <FileHandler> tag inside the <Logging> tag add the following option:<br />
<br />
<code><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</code><br />
<br />
This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.<br />
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].<br />
<br />
=== The Toolbox crashes with "Too many open files" what should I do? ===<br />
<br />
This is a known bug, see [[Known_bugs#Version_6.1]].<br />
<br />
If this does not fix your problem then do the following:<br />
<br />
On Windows, try increasing the limit as dictated by the error message. Also, when you get the error, use the <code>fopen('all')</code> command to see which files are open and send us the list of filenames; then we can maybe help you debug the problem further. Even better would be to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can further debug the problem.<br />
<br />
On Linux again don't shut down Matlab but:<br />
<br />
* open a new terminal window<br />
* type:<br />
<source lang="bash"><br />
lsof > openFiles.txt<br />
</source><br />
* Then [[Contact|send us]] the following information:<br />
** the file openFiles.txt <br />
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).<br />
** the output of<br />
<source lang="bash"><br />
uname -a ; df -T ; mount<br />
</source><br />
<br />
As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.<br />
<br />
In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.<br />
<br />
=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===<br />
<br />
The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files, the Matlab implementation will be used and a warning will be given, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]] (this can be done very easily), or simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.<br />
<br />
=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===<br />
<br />
You most likely forgot to [[Installation#Extension_pack|install the extension pack]].<br />
<br />
=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===<br />
<br />
Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; you just have to log in again since your graphical session has terminated. This FAQ entry is only about the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.<br />
<br />
So what happens is that after a few seconds, when the toolbox wants to plot the first model, [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the interaction between Matlab and the display server.<br />
<br />
What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:<br />
<br />
* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])<br />
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)<br />
<br />
<source lang="bash"><br />
/etc/X11/xorg.conf<br />
</source><br />
<br />
Note: the exact location of the xorg.conf file may vary on your system.<br />
<br />
* Look for the following line:<br />
<br />
<source lang="bash"><br />
Load "glx"<br />
</source><br />
<br />
* Comment it out by replacing it by:<br />
<br />
<source lang="bash"><br />
# Load "glx"<br />
</source><br />
<br />
* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)<br />
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].<br />
<br />
Note:<br />
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]<br />
* if you wish to debug further yourself please check the Xorg log files and those in /var/log<br />
* another possible workaround is to start matlab with the "-nodisplay" option. That could work as well.<br />
<br />
=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===<br />
<br />
This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.<br />
<br />
=== The toolbox seems to keep on running forever, when or how will it stop? ===<br />
<br />
The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].<br />
<br />
Of course, choosing targets a priori is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). If in doubt you can always use a small value (or 0) and then simply quit the running toolbox using Ctrl-C when you think it has run long enough.<br />
<br />
While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Running&diff=5125Running2010-06-16T12:54:50Z<p>Dgorissen: /* Test Suite */</p>
<hr />
<div>== Getting started ==<br />
<br />
If you are just getting started with the toolbox and you have no idea how everything works, this section should help you on your way.<br />
First make sure you [[About#Intended_use|know what the toolbox is used for]], you have finished the toolbox [[Installation]] and you have done a successful [[Installation#Test_run|test run]] by running the default configuration. If that works you know everything is working correctly. Then:<br />
<br />
# Go through the presentation [[About#Documentation|available here]], paying specific attention to the control flow<br />
# The behavior of the toolbox is fully configured through two XML files. If you do not know what XML is please read [[FAQ#What is XML?]] first.<br />
# Read [[Toolbox_configuration|the toolbox configuration structure section]]. This is very important. Then print out ''config/default.xml'' and take your time to read it through and understand the structure and the way things work.<br />
# Do the [[Installation#Test_run|test run]] again, this time paying closer attention to what is happening, and see if you understand what is going on. If you still have no idea you can refer to the [[Running#Understanding_the_control_flow|Understanding the control flow]] section below.<br />
# Ok, by now you should have a rough idea of how the configuration file is structured and how the control flow works. Now change ''default.xml'' to run a different example. This [[Running#Running_different_examples|is explained below]]. If you can do that and it works, you have mastered all the basic skills needed to use the toolbox. You can now browse through the rest of the wiki as needed.<br />
<br />
If you get stuck or have any problems [[Reporting problems|please let us know]].<br />
<br />
''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up to date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''<br />
<br />
== Running the default configuration ==<br />
<br />
Once the SUMO Toolbox is [[Installation|installed]] you can do a simple test run to check if everything is working as expected. This is explained on the [[Installation#Test_run | installation page]].<br />
<br />
== Running different examples ==<br />
<br />
=== Prerequisites ===<br />
This section is about running a different example problem, if you want to model your own problem see [[Adding an example]]. Make sure you [[configuration|understand the difference between the simulator configuration file and the toolbox configuration file]]. You should also have read [[Toolbox configuration#Structure]].<br />
<br />
=== Changing default.xml ===<br />
The <code>examples/</code> directory contains many example simulators that you can use to test the toolbox with. These examples range from predefined functions, to datasets from various domains, to native simulation code. If you want to try one of the examples, open <code>config/default.xml</code> and edit the [[Simulator| <Simulator>]] tag to suit your needs.<br />
<br />
For example, originally default.xml contains:<br />
<br />
<source lang="xml"><br />
<Simulator>Academic2DTwice</Simulator><br />
</source><br />
<br />
This means the toolbox will look in the examples directory for a project directory called <code>Academic2DTwice</code> and load the xml file with the same name inside that directory (in this case: <code>Academic2DTwice/Academic2DTwice.xml</code>).<br />
<br />
Now let's say you want to run a different example problem, for instance the Michalewicz example. In this case you would replace the original Simulator tag with: <br />
<br />
<source lang="xml"><br />
<Simulator>Michalewicz</Simulator><br />
</source><br />
<br />
In addition you would have to change the <code><Outputs></code> tag. The <code>Academic2DTwice</code> example has two outputs (''out'' and ''outinverse''). However, the Michalewicz example has only one (''out''). Thus telling the SUMO Toolbox to model the ''outinverse'' output in that case makes no sense since it does not exist for the Michalewicz example. So the following output configuration suffices:<br />
<br />
<source lang="xml"><br />
<Outputs><br />
<Output name="out"><br />
</Output><br />
</Outputs><br />
</source><br />
<br />
The rest of default.xml can be kept the same. Then simply run '<code>go</code>' to run the example (making sure that the toolbox is in your Matlab path of course).<br />
<br />
Note that it is also possible to specify an absolute path or refer to a particular xml file directly. For example:<br />
<br />
<source lang="xml"><br />
<Simulator>/path/to/your/project/directory</Simulator><br />
</source><br />
<br />
or:<br />
<br />
<source lang="xml"><br />
<Simulator>Ackley/Ackley2D.xml</Simulator><br />
</source><br />
<br />
=== Important notes ===<br />
<br />
If you start changing default.xml to try out different examples, there are a number of important things you should be aware of.<br />
<br />
==== Select a matching Input and Outputs ====<br />
Using the <code><Inputs></code> and <code><Outputs></code> tags in the SUMO-Toolbox configuration file you can tell the toolbox which outputs should be modeled and how. Note that these tags are optional: if you delete them, the toolbox will simply model all available inputs and outputs. If you do specify a particular output (say, you tell the toolbox to model the output ''temperature'' of the simulator ''ChemistryProblem'') and then change the configuration file to model ''BiologyProblem'', you will have to change the name of the selected output (or input), since most likely ''BiologyProblem'' will not have an output called ''temperature''.<br />
Another concrete example is given above with the Michalewicz example.<br />
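Following the <code><Outputs></code> syntax shown earlier on this page, selecting the ''temperature'' output of the hypothetical ''ChemistryProblem'' simulator would look like:<br />
<br />
<source lang="xml"><br />
<Outputs><br />
  <Output name="temperature"><br />
  </Output><br />
</Outputs><br />
</source><br />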
<br />
==== Select a matching SampleEvaluator ====<br />
There is one important caveat. Some examples consist of a fixed data set, some are implemented as a Matlab function, others as a C++ executable, etc. When running a different example you have to tell the SUMO Toolbox how the example is implemented so the toolbox knows how to extract data (eg: should it load a data file or should it call a Matlab function). This is done by specifying the correct [[Config:SampleEvaluator|SampleEvaluator]] tag. The default SampleEvaluator is:<br />
<br />
<source lang="xml"><br />
<SampleEvaluator>matlab</SampleEvaluator><br />
</source><br />
<br />
So this means the toolbox expects the example you want to run to be implemented as a Matlab function. Thus it is no use running an example that is implemented as a static dataset using the '[[Config:SampleEvaluator#matlab|matlab]]' or '[[Config:SampleEvaluator#local|local]]' sample evaluators; doing so will result in an error. In that case you should use '[[Config:SampleEvaluator#scatteredDataset|scatteredDataset]]' (or sometimes '[[Config:SampleEvaluator#griddedDataset|griddedDataset]]').<br />
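For example, to run an example implemented as a static dataset you would change the tag to:<br />
<br />
<source lang="xml"><br />
<SampleEvaluator>scatteredDataset</SampleEvaluator><br />
</source><br />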
<br />
To see how an example is implemented open the XML file inside the example directory and look at the <source lang="xml"><Implementation></source> tag. To see which SampleEvaluators are available see [[Config:SampleEvaluator]].<br />
<br />
==== Select an appropriate AdaptiveModelBuilder ====<br />
Also remember that if you switch to a different example you may also have to change the [[Config:AdaptiveModelBuilder]] used. For example, if you are using a spline model (which only works in 2D) and you decide to model a problem with many dimensions (e.g., CompActive or BostonHousing) you will have to switch to a different model type (e.g., any of the SVM or LS-SVM model builders).<br />
<br />
==== Switch off Sample Selection if not needed ====<br />
If you are modeling a fixed, small size dataset it may make no sense to select samples incrementally. Instead you will probably load all the data at once and only generate models. See [[Adaptive_Modeling_Mode]] for how to do this.<br />
<br />
Finally the question may remain, what settings should I use for my problem? Well there is no best answer to this question, see [[General_guidelines]].<br />
<br />
== Running different configuration files ==<br />
<br />
If you just type "go" the SUMO-Toolbox will run using the configuration options in default.xml. However, you may want to make a copy of default.xml and play around with that, leaving your original default.xml intact. So the question is: how do you run that file? Let's say your copy is called MyConfigFile.xml. In order to tell SUMO to run that file you would type:<br />
<br />
<source lang="matlab"><br />
go('/path/to/MyConfigFile.xml')<br />
</source><br />
<br />
The path can be an absolute path, or a path relative to the SUMO Toolbox root directory.<br />
To see what other options you have when running go type ''help go''.<br />
<br />
'''Remember to always run go from the toolbox root directory.'''<br />
<br />
=== Merging your configuration ===<br />
<br />
If you know what you are doing, you can merge your own custom configuration with the default configuration by using the '-merge' option. Options or tags that are missing in this custom file will then be filled up with the values from the default configuration. This prevents you from having to duplicate tags in default.xml. However, if you are unfamiliar with XML and not quite sure what you are doing we advise against using it.<br />
<br />
=== Running optimization examples ===<br />
The SUMO toolbox can also be used for minimizing the simulator in an intelligent way. There are two examples included in <code>config/Optimization</code>. Running these examples works exactly the same as always, e.g., <code>go('config/Optimization/Branin.xml')</code>. The only difference is in the sample selector, which is specified in the configuration file itself.<br />
<gallery><br />
Image:ISCSampleSelector2.png<br />
</gallery><br />
The example configuration files are well documented; it is advised to go through them for more detailed information.<br />
<br />
== Understanding the control flow ==<br />
<br />
[[Image:sumo-control-flow.png|thumb|300px|right|The general SUMO-Toolbox control flow]]<br />
<br />
When the toolbox is running you might wonder what exactly is going on. The high level control flow that the toolbox goes through is illustrated in the flow chart and explained in more detail below. You may also refer to the [[About#Presentation|general SUMO presentation]].<br />
<br />
# Select samples according to the [[InitialDesign|initial design]] and execute the [[Simulator]] for each of the points<br />
# Once enough points are available, start the [[Add_Model_Type#Models.2C_Model_builders.2C_and_Factories|Model builder]] which will start producing models as it optimizes the model parameters<br />
## the number of models generated depends on the [[Config:AdaptiveModelBuilder|AdaptiveModelBuilder]] used. Usually the AdaptiveModelBuilder tag contains a setting like ''maxFunEvals'' or ''popSize''. This indicates to the algorithm that is optimizing the model parameters (and thus generating models) how many models it should maximally generate before stopping. By increasing this number you will generate more models in between sampling iterations, and thus have a higher chance of getting a better model, but at the cost of increased computation time. This step is what we refer to as a ''modeling iteration''.<br />
## optimization over the model parameters is driven by the [[Measures|Measure(s)]] that are enabled. Selection of the Measure is thus very important for the modeling process!<br />
## each time the model builder generates a model that has a lower measure score than the previous best model, the toolbox will trigger a "New best model found" event, save the model, generate a plot, and trigger all the profilers to update themselves.<br />
## so note that, by default, you only see something happen when a new best model is found; you do not see all the other models being generated in the background. If you want to see those, you must increase the logging granularity (or just look in the log file) or [[FAQ#How_do_I_enable_more_profilers.3F|enable more profilers]].<br />
# So the model builder will run until it has completed<br />
# Then, if the current best model satisfies all the targets in the enabled Measures, it means we have reached the requirements and the toolbox terminates.<br />
# If not, the [[SampleSelector]] selects a new set of samples (= a ''sampling iteration''), they are simulated, and the model building resumes or is restarted according to the configured restart strategy<br />
# This whole loop continues (thus the toolbox will keep running) until one of the following conditions is true:<br />
## the targets specified in the active measure tags have been reached (each Measure has a target value which you can set). Note, though, that when you are using multiple measures (see [[Multi-Objective Modeling]]) or single measures like AIC or LRM, it becomes difficult to set a priori targets since you can't really interpret the scores (in contrast to the simple case with a single measure like CrossValidation, where your target is simply the error you require). In those cases you should usually set the targets to 0 and use one of the other criteria below to make sure the toolbox stops.<br />
## the maximum running time has been reached (''maximumTime'' property in the [[Config:SUMO]] tag)<br />
## the maximum number of samples has been reached (''maximumTotalSamples'' property in the [[Config:SUMO]] tag)<br />
## the maximum number of modeling iterations has been reached (''maxModelingIterations'' property in the [[Config:SUMO]] tag)<br />
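As a sketch, the three limits above are properties of the [[Config:SUMO]] tag; assuming they are attributes (an assumption, verify against ''default.xml''), a configuration could look like:<br />
<br />
<source lang="xml"><br />
<!-- Sketch: stop after the time limit, 500 samples, or 50 modeling iterations, --><br />
<!-- whichever comes first. Attribute form and the unit of maximumTime are assumptions. --><br />
<SUMO maximumTime="60" maximumTotalSamples="500" maxModelingIterations="50"><br />
  ...<br />
</SUMO><br />
</source><br />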
<br />
<br />
Note that it is also possible to disable the sample selection loop, see [[Adaptive Modeling Mode]]. Also note that while you might think the toolbox is not doing anything, it is actually building models in the background (see above for how to see the details). The toolbox will only inform you (unless configured otherwise) if it finds a model that is better than the previous best model (using that particular measure!!). If not it will continue running until one of the stopping conditions is true.<br />
<br />
== Output ==<br />
<br />
All output is stored under the [[Config:ContextConfig#OutputDirectory|directory]] specified in the [[Config:ContextConfig]] section of the configuration file (by default this is set to "<code>output</code>"). <br />
<br />
Starting from version 6.0 the output directory is always relative to the project directory of your example, unless you specify an absolute path.<br />
<br />
After completion of a SUMO Toolbox run, the following files and directories can be found there (e.g. : in <code>output/<run_name+date+time>/</code> subdirectory) :<br />
<br />
* <code>config.xml</code>: The xml file that was used by this run. Can be used to reproduce the entire modeling process for that run.<br />
* <code>randstate.dat</code>: contains states of the random number generators, so that it becomes possible to deterministically repeat a run (see the [[Random state]] page).<br />
* <code>samples.txt</code>: a list of all the samples that were evaluated, and their outputs.<br />
* <code>profilers</code>-dir: contains information and plots about convergence rates, resource usage, and so on.<br />
* <code>best</code>-dir: contains the best models (+ plots) of all outputs that were constructed during the run. This is continuously updated as the modeling progresses.<br />
* <code>models_outputName</code>-dir: contains a history of all intermediate models (+ plots + movie) for each output that was modeled.<br />
<br />
If you generated models [[Multi-Objective Modeling|multi-objectively]] you will also find the following directory:<br />
<br />
* <code>paretoFronts</code>-dir: contains snapshots of the population during multi-objective optimization of the model parameters.<br />
<br />
== Debugging ==<br />
<br />
Remember to always check the log file first if problems occur!<br />
When [[reporting problems]] please attach your log file and the xml configuration file you used.<br />
<br />
To aid understanding and debugging, set the console and file logging levels to FINE (or even FINER or FINEST)<br />
as follows:<br />
<br />
Change the Level option of the ConsoleHandler tag to FINE, FINER, or FINEST, and do the same for the FileHandler tag.<br />
<br />
<source lang="xml"><br />
<!-- Configure ConsoleHandler instances --><br />
<ConsoleHandler><br />
<Option key="Level" value="FINE"/><br />
</ConsoleHandler><br />
</source><br />
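<br />
Based on the note above about the FileHandler tag, it can presumably be configured with the same <code>Option</code> syntax as the ConsoleHandler (a sketch; verify the exact tag name in your configuration file):<br />
<br />
<source lang="xml"><br />
<!-- Configure FileHandler instances (Option syntax assumed to mirror the ConsoleHandler) --><br />
<FileHandler><br />
<Option key="Level" value="FINE"/><br />
</FileHandler><br />
</source><br />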
<br />
== Using models ==<br />
<br />
Once you have generated a model, you might wonder what you can do with it. To see how to load, export, and use SUMO generated models see the [[Using a model]] page.<br />
<br />
== Modeling complex outputs ==<br />
<br />
The toolbox supports the modeling of complex-valued data. If you do not specify any specific <[[Outputs|Output]]> tags, all outputs will be modeled with [[Outputs#Complex_handling|complexHandling]] set to '<code>complex</code>'. This means that a real output will be modeled as a real value, and a complex output will be modeled as a complex value (with a real and an imaginary part). If you don't want this (i.e., you want to model the modulus of a complex output, or model the real and imaginary parts separately), you have to explicitly set [[Outputs#Complex_handling|complexHandling]] to 'modulus', 'real', 'imaginary', or 'split'.<br />
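<br />
As an illustrative sketch (the output name is made up, and the exact tag syntax should be checked on the [[Outputs#Complex_handling|Outputs]] page), modeling only the modulus of a complex output might look like:<br />
<br />
<source lang="xml"><br />
<!-- "S11" is a hypothetical output name; complexHandling may also be 'complex', 'real', 'imaginary', or 'split' --><br />
<Output name="S11" complexHandling="modulus"/><br />
</source><br />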
<br />
More information on this subject can be found at the [[Outputs#Complex_handling|Outputs]] page.<br />
<br />
== Models with multiple outputs ==<br />
<br />
If multiple [[Outputs]] are selected, the toolbox will by default model each output separately, using a separate adaptive model builder object. So a system with 3 outputs yields three different models, each with one output. Sometimes, however, you may want a single model with multiple outputs. For example, instead of a neural network for each component of a complex output (real/imaginary) you might prefer a single network with two outputs. To do this, simply set the 'combineOutputs' attribute of the <AdaptiveModelBuilder> tag to 'true'. Then, each time that model builder is selected for an output, the same model builder object will be reused instead of a new one being created.<br />
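<br />
Based on the description above, enabling this could look as follows (the model type and the omitted attributes are placeholders; keep whatever your configuration file already uses for this tag):<br />
<br />
<source lang="xml"><br />
<!-- reuse one model builder object for all outputs, yielding a single model with multiple outputs --><br />
<AdaptiveModelBuilder type="ann" combineOutputs="true"><br />
...<br />
</AdaptiveModelBuilder><br />
</source><br />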
<br />
Note, though, that not all model types support multiple outputs; if a model type does not, you will get an error message.<br />
<br />
Also note that you can also generate models with multiple outputs in a multi-objective fashion. For information on this see the page on [[Multi-Objective Modeling]].<br />
<br />
== Multi-Objective Model generation ==<br />
<br />
See the page on [[Multi-Objective Modeling]].<br />
<br />
== Interfacing with the SUMO Toolbox ==<br />
<br />
To learn how to interface with the toolbox or model your own problem see the [[Adding an example]] and [[Interfacing with the toolbox]] pages.<br />
<br />
== Test Suite ==<br />
<br />
A test harness is provided that can be run manually or automatically as part of a cron job. The test suite consists of a number of test XML files (in the <code>config/test/</code> directory), each describing a particular surrogate modeling experiment. The file <code>config/test/suite.xml</code> dictates which tests are run and their order, and also contains the accuracy and sample bounds that are checked after each test. If the final model found does not fall within the accuracy or number-of-samples bounds, the test is considered failed.<br />
<br />
Note also that some of the predefined test cases may rely on data sets or simulation code that are not publicly available for confidentiality reasons. However, since these test problems typically make very good benchmark problems, we left them in for illustration purposes. You can simply comment out the relevant tests in <code>config/test/suite.xml</code>.<br />
<br />
The coordinating class is the Matlab TestSuite class found in the src/matlab directory. Besides running the tests defined in suite.xml it also tests each of the model member functions.<br />
<br />
Assuming the SUMO Toolbox is set up properly, the test suite can be run as follows (from the SUMO root directory):<br />
<br />
<source lang="matlab"><br />
s = TestEngine('config/test/suite.xml'); s.run()<br />
</source><br />
<br />
The "run()" method also supports an optional parameter (a vector) that dictates which tests to run (e.g., run([2 5 3]) will run tests 2, 5 and 3, in that order).<br />
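<br />
For example, building on the call shown above:<br />
<br />
<source lang="matlab"><br />
s = TestEngine('config/test/suite.xml');<br />
s.run([2 5 3]);   % runs only tests 2, 5 and 3, in that order<br />
</source><br />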
<br />
''Note that due to randomization the final accuracy and number of samples used may vary slightly from run to run (causing failed tests). Thus the bounds must be set sufficiently loose.''<br />
<br />
== Tips ==<br />
<br />
See the [[Tips]] page for various tips and gotchas.</div>
Dgorissen
http://sumowiki.intec.ugent.be/index.php?title=Changelog&diff=5124
Changelog, 2010-06-13T14:40:31Z
<p>Dgorissen: </p>
<hr />
<div>Below you will find the detailed list of changes in every new release. For a more high level overview see the [[Whats new]] page.<br />
<br />
== 7.0.1 - 15 June 2010 ==<br />
<br />
* Bugfix release<br />
<br />
== 7.0 - 29 January 2010 ==<br />
<br />
* Move to a dual license model, with an open source license (AGPLv3) for non-commercial use, see [[License terms]]<br />
* Experimental support for classification and 3D geometric modeling problems (see the 2 new demos)<br />
* Thorough cleanup of SampleEvaluator related classes and package structure<br />
* Improved speed and stability in (Blind) Kriging models and fixed the correlation function derivatives.<br />
* Vastly improved the utilization of compute nodes if a distributed sample evaluator is used that interfaces with a cluster or grid<br />
* Support for plotting the prediction uncertainty in the model browser GUI <br />
* Support for quasi random sequences as initial design<br />
<br />
== 6.2.1 - 19 October 2009 ==<br />
<br />
* This release fixes a number of bugs from 6.2. All users are strongly requested to upgrade.<br />
<br />
== 6.2 - 6 October 2009 ==<br />
<br />
* A new neural network modelbuilder "ann". This is a lot faster than the existing "anngenetic" and the quality of the models is roughly the same<br />
* The sample selection infrastructure is now much more powerful, sample selection criteria can be combined with much more flexibility. This opens the way to dynamic variation of sampling criteria.<br />
* Support for Input constraints / multiple output sampling in the LOLA-Voronoi sample selection algorithm<br />
* Support for auto-sampled inputs (e.g., frequency in an EM context) in LOLA-Voronoi. This is useful if a particular input is already sampled by your simulator.<br />
* Automatic filtering of samples close to each other in CombinedSampleSelector<br />
* Support for TriScatteredInterp in InterpolationModel when it is available (Matlab version 2009a and later)<br />
* Sample selectors that support it (for example: LOLA-Voronoi) now give priorities to new samples, so that samples are submitted and evaluated in order of importance.<br />
* Support for pre-calculated Latin Hypercube Designs, these will be automatically downloaded and used where possible and will improve performance<br />
* The Blind Kriging models have been improved and can now also be used as ordinary Kriging models. Since these models are superior to the existing DACE Toolbox models, the DACE Toolbox backend has been removed.<br />
* The EGOModelBuilder (do model parameter optimization using the EGO algorithm) now uses a nested blind kriging model instead of one based on the DACE Toolbox. This allows for better accuracy<br />
* The Kriging correlation functions can now be chosen automatically (instead of only the correlation parameters)<br />
* Support for multiobjective optimization in the EGO framework (extended version of probability of improvement)<br />
* DelaunaySampleSelector and OptimizeCriterion now support the same set of criteria<br />
* EGO Improvement criteria can now be used together with DACEModel, RBFModel, and SVMModel (LS-SVM backend only)<br />
* Added a model type and builder that does linear/cubic/nearest neighbour interpolation<br />
* All error functions and measures now consistently deal with complex valued data and multiple output models<br />
* Various improvements in the Model Info GUI as part of the Model browser tool<br />
* Improved stability in LRMMeasure, a behavioral complexity metric to help ensure parsimonious models<br />
* The profiler GUI has been updated and improved, and support for textual profilers has been added.<br />
* Improved performance when using Measures, especially for models with multiple outputs.<br />
* Improved management of the best model trace, also in pareto mode<br />
* Removed the debug output when using (LS-)SVM models and added compiled mex files for Windows<br />
* Ported the remaining classes to Matlab's Classdef format<br />
* Increased use of the parallel computing toolbox (if available) in order to speed up modeling<br />
* Improved the Matlab file headers so the help text is more informative (always includes at least the signature)<br />
* Support for plotting the model prediction uncertainty in the model browser (only for 1D plots and not supported by all model types)<br />
* Added support for so-called "reference by id" on every level of the config. If a tag of a particular type is defined on top-level with an id, it can be referenced everywhere else, instead of copying it entirely. See rationalPoleSupression sample selector and patternsearch Optimizer, for example.<br />
* EmptyModelBuilder added - in case you just want to use the sequential design facilities of the toolbox, but not its models.<br />
* Various cleanups and bugfixes<br />
<br />
== 6.1.1 - 17 April 2009 ==<br />
<br />
* Various cleanups and bugfixes (see [[Known bugs]] for 6.1)<br />
<br />
== 6.1 - 16 February 2009 ==<br />
<br />
* The default error function is now the Bayesian Error Estimation Quotient (BEEQ)<br />
* Full support for multi-objective model generation, multiple measures can now be enforced simultaneously. This can also be applied to generating models with multiple outputs (combineOutputs = true). Together with the automatic model type selection algorithm (heterogenetic) this allows the automatic selection of the best model type per output.<br />
* The model browser GUI now supports QQ plots<br />
* The Gradient Sample Selection Algorithm has been renamed to the Local Linear Sample Selector (LOLASampleSelector)<br />
* The modelbuilders have been refactored and some removed. This is a result of the optimizer hierarchy being cleaned up. Adding a new model parameter optimization routine should now be more straightforward.<br />
* The interface classes have been renamed to factories as this is more correct. All implementations have been ported to Matlab's new Classdef format and the inheritance hierarchy has been cleaned up. It should now be significantly easier to add support for new approximation types.<br />
* The ModelInterfaces are now known as ModelFactories, this is more correct. Note that the XML tagnames have been changed as well.<br />
* The Model class hierarchy has been converted to the new Classdef format. This means that models generated with previous versions of the toolbox will no longer be loadable in this version.<br />
* The heterogenetic model builder for automatic model type selection has been cleaned up and made more robust.<br />
* Rational models now support all available modelbuilders. This means that order selection can be done by PSO, DIRECT, Simulated Annealing, ... instead of just GA and Sequential.<br />
* New optimizers added are (they can also be used as model builders): Differential Evolution<br />
* Added a Blind Kriging model type implementation as a backend of KrigingModel<br />
* Addition of an EGO model builder. This allows optimization of the model parameters using the well known Efficient Global Optimization (EGO) algorithm. In essence this uses a nested Kriging Model to predict which parameters should be used to build the next model.<br />
* Trivial dependencies on the Statistics Toolbox have been removed<br />
* Added a new smoothness measure (LRMMeasure) that helps to ensure smooth models and reduce erratic bumps. It works best when combined with other Measures (such as SampleError for ANN models) <br />
* Models now have a simple evaluateDerivative() method that allows one to easily get gradient information. The base class implementation is very simple but works. Models can override this method to get more efficient implementations.<br />
* Added experimental support for the Matlab Parallel Computing Toolbox (local scheduler only). This means that when the parallelMode option in ContextConfig is switched on, model construction will make use of all available cores/cpu's.<br />
* Many speed improvements, some quite significant.<br />
* Various cleanups and bugfixes<br />
<br />
== 6.0.1 - Released 23 August 2008 ==<br />
<br />
* Fixed a number of (minor) bugs in the 6.0 release<br />
<br />
== 6.0 - Released 6 August 2008 ==<br />
<br />
* Many important bugs have been fixed that could have resulted in sub-optimal models<br />
* Addition of a Model Browser GUI, this allows you to easily 'walk' through multi-dimensional models<br />
* Moved the InitialDesign tag outside of the SUMO tag<br />
* Some speed improvements<br />
* Removed support for dummy inputs<br />
* Measure scores and input/output names are saved inside the models, allowing for more usable plots<br />
* Added the project directory concept, each example is now self contained in its own directory<br />
* #simulatorname# can now be used in the run name, it will get replaced by the real simulator name<br />
* Input dimensions can be ignored during sampling if the simulator samples them for you. This is useful in EM applications for example where frequency points can be cheap.<br />
* Logging framework revamped, logs can now be saved on a per run basis<br />
* The global score calculation has changed! It is now a weighted sum of all individual measures (the weights are configurable but default to 1)<br />
* Added a simple polynomial model where the orders can be chosen manually<br />
* Countless cleanups, minor bugfixes and feature enhancements<br />
<br />
== 5.0 - Released 8 April 2008 ==<br />
<br />
* In April 2008, the first public release of the '''Surrogate Modeling (SUMO) Toolbox''' (v5.0) occurred. <br />
* A major new release with countless fixes, improvements, new sampling and modeling algorithms, and much more.<br />
<br />
List of changes:<br />
<br />
* Fixed the 'Known bugs' for v4.2 (see Wiki)<br />
* data points now have priorities (assigned by the sample selectors)<br />
* Vastly reworked and improved the sample evaluator framework<br />
** robust handling of failed or 'lost' data points<br />
** pluggable input queue infrastructure to make advanced scheduling policies possible<br />
* The number of samples to select each iteration is now determined dynamically, based on the time needed for modeling, the length of one simulation, the number of compute nodes available, ... A user-specified upper bound can still be set, of course.<br />
* Model plots are now in the original space instead of the normalized ([-1 1]) space<br />
* The default error function is now the root relative square error (= a global relative error)<br />
* Intelligent seeding of each new model parameter optimization iteration. This means the model parameter space is searched much more efficiently and completely<br />
* Added a fast Neural Network Modeler based on FANN (http://fann.sf.net)<br />
* Added a Neural Network Modeler based on NNSYSID (http://www.iau.dtu.dk/research/control/nnsysid.html)<br />
* The LS-SVM model type has been merged with the SVM model type. The SVM model now supports three backends: libSVM, SVMlight, and lssvm<br />
* Added a SampleSelector using infill sampling criteria (ISC).<br />
** The expected improvement from EGO/superEGO is provided, among others (only usable with Kriging and RBF)<br />
* More robust handling of SSH sessions when running simulators on a remote cluster<br />
* The TestSamples measure has been renamed to ValidationSet<br />
* The Polynomial model type has been renamed to the more apt Rational model<br />
* The grid and voronoi sample selectors have been renamed to Error and Density respectively<br />
* Drastically reduced memory usage when performing many runs with multiple datasets (datasets are cached)<br />
* Added utility functions for easily summarizing profiler data from a large number of runs<br />
* Lots of speed improvements in the gradient sample selector<br />
* The default settings have been harmonized and much improved<br />
* The (LS)SVM parameter space is now searched in log10 instead of ln space<br />
* Added a TestMinimum measure <br />
** compares the minimum of the surrogate model against a predefined value (for instance a known minimum)<br />
* Added a MinimumProfiler<br />
** tracks the minimum of the surrogate model versus the number of iterations<br />
* Movie creation now works on all supported platforms<br />
* Added an optimizer class hierarchy for solving subproblems transparently<br />
* Cleaned up the structure of all the model classes so they no longer contain an interface object. This was confusing and led to error-prone code. Virtually all subsref and subsasgn implementations have also been removed.<br />
* The MinMax measure is now enabled by default<br />
* The Optimization framework was removed (and replaced) for various reasons, see: http://sumowiki.intec.ugent.be/index.php/FAQ#What_about_surrogate_driven_optimization.3F<br />
* Fixed the file output of the profiler, formatting is correct now<br />
* New implementation of a maximin latin hypercube design<br />
** Minimizes pairwise correlation<br />
** Minimizes intersite distance<br />
* Removed dependency of factorial design on the statistics toolbox<br />
* Added a plotOptions tag, this allows for more customisability of model plots (grey scale, light effects, ...)<br />
* Profiler plots can now also be saved as JPG, PNG, EPS, PDF, PS and SVG<br />
* Countless cleanups, minor bugfixes and feature enhancements<br />
<br />
== 4.2 - Released 18 October 2007 ==<br />
<br />
* Fixed the 'Known bugs' for v4.1 (see Wiki)<br />
* Simulators can be passed options through an <Options> tag<br />
* Added a fixed model builder so you can manually force which model parameters to use<br />
* Removed ProActive dependency for the SGE distributed backend<br />
* Improved Makefile under unix/linux<br />
* Data produced by simulators no longer needs to be pre-scaled to [-1 1], this can be done automatically from the simulator configuration file<br />
* Deprecated the optimization framework. It is currently under re-design and a better, more integrated version, will be released with the next toolbox version.<br />
* Lots of cleanups, minor bugfixes and small feature enhancements<br />
* In October 2007, the development of the M3-Toolbox was discontinued.<br />
<br />
== 4.1 - Released 27 July 2007 ==<br />
<br />
* Fixed the 'Known bugs' for v4.0 (see Wiki)<br />
* Vastly improved test sample distribution if a test set is created on the fly<br />
* Gradient sample selector now works with complex outputs and has improved neighbourhood selection<br />
* Speed and usability improvements in the profiler framework<br />
* Improvements in the profiler DockedView widget (added a right click context menu)<br />
* Addition of some new examples<br />
* Added an option (on by default) that selects a certain percentage of the grid sample selector's points randomly, making the algorithm more robust<br />
* Some cleanups, minor bugfixes and feature enhancements<br />
<br />
== 4.0 - Released 22 June 2007 ==<br />
<br />
* IMPORTANT: the best model score is now 0 instead of 1, this is more intuitive<br />
* Reworked and improved the model scoring mechanism, now based on a Pareto analysis. This makes it possible to combine multiple measures in a sensible way.<br />
* Added a proof of concept surrogate driven optimization framework. Note this is an initial implementation which works, but don't expect state of the art results.<br />
* Cleanup and refactoring of the profiler framework<br />
* The profiling of model parameters has been totally reworked and this can now easily be tracked in a nice GUI widget<br />
* Cleanup of error function logic so you can now easily use different error functions (relative, RMS, ...) in the measures<br />
* Improved model plotting<br />
* Support for the SVMlight library (you must download it yourself in order to use it)<br />
* Added a MinMax measure which can be used to suppress spikes in rational models<br />
* Support for extinction prevention in the heterogenetic modeler<br />
* Fixed warnings (and in some cases errors) when loading models from disk<br />
* Respect the maximum running time more accurately<br />
* Many cleanups, minor bugfixes and feature enhancements<br />
<br />
== 3.3 - Released 2 May 2007 ==<br />
<br />
* Fixed incorrect summary at the end of a run<br />
* Fixed bug due to duplicate sample points<br />
* Ability to evaluate multiple samples in parallel locally (support for dual/multi-core machines)<br />
* Speedups when reading in datasets<br />
* Added 2 new modelbuilders that optimize the parameters using;<br />
** Pattern Search (requires the Matlab direct search toolbox)<br />
** Simulated Annealing (requires Matlab v7.4 and the direct search toolbox)<br />
** The Matlab Optimization Toolbox (includes different gradient based methods like BGFS)<br />
* A new density-based sample selection algorithm (VoronoiSampleSelector)<br />
* New simulator examples to test with<br />
* Addition of a profiler to generate levelplots<br />
* Ability to generate Matlab API documentation using m2html<br />
* New neural network training algorithms based on Differential Evolution and Particle Swarm Optimization<br />
* It is now possible to call the toolbox with specific samples/values directly, e.g., go('myConfigFile.xml',xValues,yValues);<br />
* Many minor bugfixes and feature enhancements<br />
<br />
== 3.2 - Released 9 Mar 2007 ==<br />
<br />
* Many important bugfixes<br />
* Documentation improvements<br />
* Fully working support for RBF models<br />
* New measure profilers that track the errors on measures<br />
* Many new predefined functions and datasets to test with. We now have over 50 examples!<br />
<br />
== 3.1 - Released 28 Feb 2007 ==<br />
<br />
* Small bugfixes and usability improvements<br />
* Improved documentation<br />
* Working implementation of a heterogeneous evolutionary modelbuilder<br />
* More examples<br />
<br />
== 3.0 - Released 14 Feb 2007 ==<br />
<br />
* Availability of pre-built binaries<br />
* Extensive refactoring and code cleanups<br />
* Many bugfixes and usability improvements<br />
* Resilience against simulator crashes<br />
* Ability to set the maximum running time for one sample evaluation<br />
* Vastly improved Genetic model builder + a neural network implementation<br />
* Addition of a RandomModelBuilder to use as a baseline benchmark<br />
* Possible to add dummy input variables or to model only a subset of the available inputs while clamping others<br />
* Improved multiple output support<br />
** outputs can be modeled in parallel<br />
** each output can be configured separately (e.g., per output: model type, accuracy requirements (measure), sample selection algorithm, complex handling flag, etc.)<br />
** multiple outputs can be combined into one model if the model type supports this<br />
* Noisy (Gaussian, outliers, ...) versions of a given output can be automatically added<br />
* New and improved directory structure for output data<br />
* New model types:<br />
** Kriging (based on the DACE MATLAB Kriging Toolbox by Lophaven, Nielsen and Sondergaard)<br />
** Splines (based on the MATLAB Splines Toolbox, only for 1D and 2D)<br />
* Matlab scripts can now be used as data sources (simulators) as well<br />
* New initial experimental design<br />
** Based on a dataset<br />
** Combination of existing designs<br />
** Based on the complexity of different 1D fits<br />
* Addition of new datasets and predefined functions as modeling examples<br />
<br />
== 2.0 - Released 15 Nov 2006 ==<br />
<br />
* Initial release of the M3-Toolbox - open source</div>
Dgorissen
http://sumowiki.intec.ugent.be/index.php?title=Whats_new&diff=5123
Whats new, 2010-06-13T14:40:05Z
<p>Dgorissen: </p>
<hr />
<div>This page gives a high level overview of the major changes in each toolbox version. For the detailed list of changes please refer to the [[Changelog]] page. For a list of features in the current version [[About#Features|see the about page]].<br />
<br />
== 7.0.1 - 15 June 2010 ==<br />
<br />
This release fixes a couple of known bugs, the most important one being a clustering related bug in the LOLA sample selection algorithm. All users are strongly encouraged to upgrade.<br />
<br />
== 7.0 - 29 January 2010 ==<br />
<br />
The biggest change of this release is the move to a new license model. From now on the SUMO Toolbox will be available under an '''open source''' license for non-commercial use. This means there is no longer a time or user limit, and there is no need for activation files. Details can be found in the [[License terms]].<br />
<br />
Besides this the code has seen some improvements and cleanups, most notably the Sample Evaluator and (Blind) Kriging components.<br />
<br />
== 6.2.1 - 19 October 2009 ==<br />
<br />
A bug fix release, all users are strongly requested to upgrade.<br />
<br />
== 6.2 - 6 October 2009 ==<br />
<br />
=== Sample Selection infrastructure ===<br />
<br />
The sample selection infrastructure has been dramatically refactored into a highly flexible and pluggable system. Different sample selection criteria can now be combined in a variety of ways, and the road has been opened towards dynamic sample selection criteria.<br />
<br />
The LOLA-Voronoi algorithm has also seen some improvement with the addition of support for input constraints, sampling multiple outputs simultaneously, and improved support for dealing with auto-sampled inputs.<br />
<br />
Sample points are now also assigned a priority by the sampling algorithm which is reflected in the order they are evaluated. Finally, the Latin Hypercube design has been much improved. It will now attempt to download known optimal designs automatically before attempting to generate one itself.<br />
<br />
=== Model building infrastructure ===<br />
<br />
The two main changes here are firstly the addition of an "ann" modelbuilder beside the existing "anngenetic" one. This one runs faster, is more configurable and the quality of the models is roughly the same. <br />
<br />
Secondly, the (Blind) Kriging models have been much improved. A new implementation was added that replaces (and outperforms) the existing DACE Toolbox plugin. Support has also been added for automatically selecting the Kriging correlation functions.<br />
<br />
=== Other changes ===<br />
<br />
Other noteworthy changes include: the addition of an interpolation model type, cleanups and fixes in the error functions, improved stability in LRMMeasure, faster measures in a multi-output setting, and more informative help texts. Additionally the Model Browser and Profiler GUIs have seen some improvements in usability and functionality.<br />
<br />
At the same time the code has seen more cleanups (it is now fully Classdef compliant) and the use of the parallel computing toolbox (if available) has been improved.<br />
<br />
As always, a detailed list of changes can be found in the [[Changelog]].<br />
<br />
== 6.1.1 - 17 April 2009 ==<br />
<br />
This is a bugfix release that contains some cleanups and fixes to the [[Known bugs]] of version 6.1<br />
<br />
== 6.1 - 16 February 2009 ==<br />
<br />
The main improvements of 6.1 over 6.0.1 are stability, robustness, speed, and improved interfacing. However, a number of major new features have been added as well.<br />
<br />
=== Multi-Objective Modeling ===<br />
<br />
Full [[Multi-Objective Modeling|multi-objective]] support when optimizing the model parameters. This allows an engineer to enforce multiple criteria on the models produced (instead of just a single accuracy measure). This also allows the efficient generation of models with multiple outputs (already possible through the combineOutputs option, but not yet in a multi-objective setting). Together with the automatic model type selection algorithm (heterogenetic) this allows the automatic selection of the best model type per output. See [[Multi-Objective Modeling]] for more information and usage.<br />
<br />
=== Smoothness Measure ===<br />
<br />
A new measure: Linear Reference Model (LRM) has been added. This measure is best used together with other measures and helps to enforce a smooth model surface.<br />
<br />
=== Parallel Computing ===<br />
<br />
Added experimental support for the Matlab Parallel Computing Toolbox (local scheduler only). This means that when the parallelMode option in ContextConfig is switched on, model construction will make use of all available cores/cpu's in order to build models in parallel. This can result in some significant speedups.<br />
<br />
=== General Modeling ===<br />
<br />
The ''heterogenetic'' model builder for automatic model type selection has seen many cleanups and the code has been improved. Now there should be no more manual hacks in order to use it. The rational models now support all available optimization algorithms for order selection and two new model types have been added: Blind Kriging and Gaussian Process Models. An Efficient Global Optimization (EGO) modelbuilder has also been added. This means that a nested kriging model is used internally to predict which model parameters (e.g., of an SVM model) will result in the most accurate fit. All models can now also be queried for derivatives at any point in their domain (regardless of the model type).<br />
<br />
=== Code improvements ===<br />
<br />
From now on Matlab 2008a or later will be required to run the toolbox (see [[System requirements]]). The reason is that most of the modeling code has been ported to Matlab's new [[OO_Programming_in_Matlab|Object Orientation]] implementation. The result is that the modeling code has become much cleaner and much less prone to bugs. The interfaces have become more well-defined and it should be much easier to incorporate your own model type or hyperparameter optimization algorithm.<br />
<br />
Note also that the Gradient Sample Selection algorithm has been renamed to LOLA.<br />
<br />
=== General Improvements ===<br />
<br />
In general, many bugs have been fixed, features and error reporting have been improved, and performance has been enhanced. Also note that the default error function is now the [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4107991 Bayesian Error Estimation Quotient (BEEQ)]. Trivial dependencies on the Statistics Toolbox have been removed.<br />
<br />
== 6.0.1 - Released 23 August 2008 ==<br />
<br />
* This is a bugfix release that fixes a few things in the 6.0 release (including a crash on startup in some cases, see [[Known bugs]])<br />
<br />
== 6.0 - Released 6 August 2008 ==<br />
<br />
Originally this was supposed to be 5.1, but after many fixes and added features we decided to promote it to 6.0. Some of the things that can be expected in 6.0:<br />
<br />
* Some important modeling related bugs have been fixed leading to improved model accuracy convergence<br />
* A nice graphical user interface (GUI) for loading models, browsing through dimensions, plotting errors, generating movies, ... ([[Model Visualization GUI|See here for more information]])<br />
* Introduction of project directories. All files belonging to a particular problem (simulation code, datasets, XML files, documentation, ...) are now grouped together in a project directory instead of being spread out over 3 different places.<br />
* Support for autosampling: one or more dimensions can be ignored during adaptive sampling. This is useful if the simulation code can generate samples for that dimension itself (e.g., frequency samples in the case of a frequency domain simulator in electro-magnetics)<br />
* Models now remember axis labels, measure scores, and output names<br />
* An export function has been added to export models to a standalone Matlab script (.m file). Not supported for all model types yet.<br />
* Proper support for Matlab R2008<br />
* A simple new model type "PolynomialModel" that builds polynomial models with a fixed (user defined) order<br />
* Note that in some cases loading models generated by older toolbox versions will not work and give an error<br />
<br />
And of course countless bugfixes, performance, and feature enhancements. '''Upgrading is strongly advised'''.<br />
<br />
== 5.0 - Released 8 April 2008 ==<br />
<br />
=== SUMO Toolbox ===<br />
<br />
In April 2008, the first public release of the '''SUrrogate MOdeling (SUMO) Toolbox''' occurred.<br />
<br />
=== Sampling related changes ===<br />
<br />
The sample selection and evaluation backends have seen some major improvements. <br />
<br />
The number of samples selected each iteration no longer needs to be chosen a priori but is determined on the fly, based on the time needed for modeling, the average duration of the past 'n' simulations, and the number of compute nodes (or CPU cores) available. Of course, a user specified upper bound can still be set. It is now also possible to evaluate data points in batches instead of always one-by-one. This is useful if, for example, there is a considerable overhead for submitting a single point.<br />
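As a rough sketch, such an on-the-fly batch size could be computed as follows. This is an illustrative heuristic in Python, not the toolbox's actual (Matlab) formula; all names and defaults are hypothetical:<br />

```python
import math

def samples_per_iteration(modeling_time, recent_sim_times, n_nodes, upper_bound=10):
    """Illustrative heuristic: pick enough new points to keep the compute
    nodes busy while the next modeling iteration runs, capped by a user
    specified upper bound."""
    avg_sim_time = sum(recent_sim_times) / len(recent_sim_times)
    n = math.ceil(modeling_time / avg_sim_time) * n_nodes
    return max(1, min(n, upper_bound))
```

For example, if modeling takes 10 seconds, simulations average 5 seconds, and 2 nodes are available, the sketch would select 4 points per iteration.<br />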
<br />
In addition, data points can be assigned priorities by the sample selection algorithm. These priorities are then reflected in the scheduling decisions made by the sample evaluator. It now also becomes possible to add different priority management policies. For example, one could require that 'interest' in sample points be renewed, else their priorities will degrade with time.<br />
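A 'renew or degrade' priority policy like the one mentioned above could look as follows. This is a hypothetical Python sketch (the toolbox itself is Matlab/Java, and the decay factor is invented for illustration):<br />

```python
def update_priorities(priorities, renewed, decay=0.9):
    """Illustrative policy: degrade the priority of every pending point
    whose 'interest' was not renewed this iteration; renewed points
    keep their full priority."""
    return {point: (prio if point in renewed else prio * decay)
            for point, prio in priorities.items()}
```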
<br />
A new sample selection algorithm has been added that can use any function as a criterion for where to select new samples. This function can use all the information the surrogate provides to calculate how interesting a certain sample is. Internally, a numeric global optimizer is applied to the criterion to determine the next sample point(s). Several criteria are implemented, mostly for global optimization. For instance, the 'expected improvement' criterion is very efficient for global optimization as it balances optimization itself against refining the surrogate.<br />
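For reference, the standard expected improvement of a candidate point (for minimization) needs only the surrogate's predicted mean and standard deviation at that point, plus the best objective value found so far. A minimal Python sketch of the textbook formula (not the toolbox's implementation):<br />

```python
import math

def expected_improvement(mu, sigma, f_min):
    """Textbook expected improvement for minimization, given the
    surrogate's predicted mean mu and standard deviation sigma at a
    candidate point, and the best observed value f_min."""
    if sigma <= 0.0:
        return 0.0  # no predicted uncertainty -> no expected improvement
    z = (f_min - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf
    return (f_min - mu) * cdf + sigma * pdf
```

Note how both terms appear: the first rewards candidates predicted to improve on f_min (exploitation), the second rewards candidates with large uncertainty (exploration).<br />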
<br />
Finally, the handling of failed or 'lost' data points has become much more robust. Pending points are automatically removed if their evaluation time exceeds a multiple of the average evaluation time. Failed points can also be re-submitted a number of times before being regarded as permanently failed.<br />
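The timeout and re-submission logic described here can be sketched as follows. The timeout factor and retry budget are illustrative defaults, not the toolbox's actual settings:<br />

```python
def is_lost(pending_time, avg_eval_time, timeout_factor=5.0):
    """A pending point is considered lost once its evaluation has taken
    much longer than the average evaluation time."""
    return pending_time > timeout_factor * avg_eval_time

def handle_failure(failure_count, max_retries=3):
    """Re-submit a failed point until the retry budget is exhausted,
    then regard it as permanently failed."""
    return "resubmit" if failure_count < max_retries else "failed"
```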
<br />
=== Modeling related changes ===<br />
<br />
The modeling code has seen some much needed cleanups. Adding new model types and improving the existing ones is now much more straightforward.<br />
<br />
Since the default Matlab neural network model implementation is quite slow, two additional, much faster implementations were added, based on [http://fann.sf.net FANN] and [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID]. The NNSYSID implementation also supports pruning. However, though these two implementations are faster, the Matlab implementation still outperforms them accuracy-wise.<br />
<br />
An intelligent seeding strategy has been added. The starting point/population of each new model parameter optimization run is now chosen intelligently in order to search the model parameter space more efficiently. This leads to better models, faster.<br />
<br />
=== Optimization related changes ===<br />
<br />
* The Optimization framework was removed due to [[FAQ#What_about_surrogate_driven_optimization.3F|several reasons]].<br />
* Added an [[Optimizer|optimizer]] class hierarchy for solving subproblems transparently.<br />
* Added several criteria for optimization, available through the [[Config:SampleSelector#isc|InfillSamplingCriterion]].<br />
<br />
=== Various changes ===<br />
<br />
The default 'error function' is now the root relative square error (= a global relative error) instead of the absolute root mean square error. <br />
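The root relative square error relates the squared prediction errors to the variance of the true values, which makes it a global ''relative'' measure: a value of 0 means a perfect fit, and a value of 1 means the model does no better than always predicting the mean. A plain Python sketch of the formula (for illustration only):<br />

```python
import math

def rrse(y_true, y_pred):
    """Root relative square error: RMS error of the model relative to
    the RMS error of the trivial mean predictor."""
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return math.sqrt(num / den)
```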
<br />
The memory usage has been drastically reduced when performing many runs with multiple datasets (datasets are loaded only once).<br />
<br />
The default settings have been harmonized and much improved. For example, the SVM parameter space is now searched on a log10 scale instead of a loge scale. The MinMax measure is now also enabled by default if you do not specify any other measure. This means that if you specify minimum and maximum bounds in the simulator XML file, models which do not respect these bounds are penalized.<br />
<br />
Finally this release has seen countless cleanups, bug fixes and feature enhancements.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=About&diff=5122About2010-06-13T14:37:46Z<p>Dgorissen: /* Documentation */</p>
<hr />
<div>== History ==<br />
In 2004, research within the (former) COMS research group, led by professor [http://www.sumo.intec.ugent.be/?q=tomd Tom Dhaene], was focused on developing efficient, adaptive and accurate algorithms for polynomial and rational modeling of linear time-invariant (LTI) systems. This work resulted in a set of Matlab scripts that were used as a testing ground for new ideas and concepts. Research progressed, and with time these scripts were re-worked and refactored into one coherent Matlab toolbox, tentatively named the Multivariate MetaModeling (M3) Toolbox. The first public release of the toolbox (v2.0) occurred in November 2006. In October 2007, the development of the M3 Toolbox was discontinued.<br />
<br />
In April 2008, the first public release of the Surrogate Modeling (SUMO) Toolbox (v5.0) occurred.<br />
<br />
For a list of changes since then refer to the [[Changelog]] and [[Whats new]] pages.<br />
<br />
== Intended use ==<br />
<br />
=== Global Surrogate Models ===<br />
The SUMO Toolbox was originally designed to solve the following problem:<br />
<br />
<center>''Automatically generate a highly accurate surrogate model (= a regression model) for a computationally expensive simulation code,<br />
<br>requiring as few data points and as little user-interaction as possible.''</center><br />
<br />
In addition the toolbox provides powerful, adaptive algorithms and a whole suite of model types for<br />
* data fitting problems (regression, function approximation, curve fitting)<br />
* response surface modeling (RSM)<br />
* scattered data interpolation<br />
* model selection<br />
* Design Of Experiments (DoE)<br />
* model parameter optimization, e.g., finding the optimal neural network topology, SVM kernel parameters, rational function order, etc. (= hyperparameter optimization)<br />
* iterative adaptive sample selection (also known as sequential design or active learning)<br />
<br />
Note that the SUMO toolbox is able to drive the simulation code directly.<br />
<br />
For domain experts or engineers the SUMO Toolbox provides a flexible, pluggable platform to which the response surface modeling task can be delegated. For researchers in surrogate modeling it provides a common framework to implement, test and benchmark new modeling and sampling algorithms.<br />
<br />
See the Wikipedia [http://en.wikipedia.org/wiki/Surrogate_model surrogate model] page to find out more.<br />
<br />
=== Surrogate Driven Optimization ===<br />
While the main focus of the SUMO Toolbox is to create accurate global surrogate models, it can be used for other goals too.<br />
<br />
For instance, the toolbox can be used to create consecutive local surrogate models for optimization purposes. The information obtained from the local surrogate models is used to guide the adaptive sampling process to the global optimum.<br />
<br />
A good sampling strategy for surrogate driven optimization seeks a balance between local and global search, i.e., between finding the optimum and refining the surrogate model.<br />
Such a sampling strategy is implemented (akin to (Super)EGO); see the different [[Sample_Selectors#expectedImprovement|sample selectors]] for more information.<br />
<br />
=== Dynamic systems or Time series prediction ===<br />
<br />
See [[FAQ#What_about_dynamical.2C_time_dependent_data.3F]].<br />
<br />
=== Classification ===<br />
<br />
See [[FAQ#What_about_classification_problems.3F]].<br />
<br />
== Application range ==<br />
The SUMO Toolbox has already been applied successfully to a wide range of problems from domains as diverse as aerodynamics, geology, metallurgy, electro-magnetics (EM), electronics, engineering and economics. The SUMO Toolbox can be applied to any situation where the problem can be described as a function that maps a set of inputs onto a set of outputs. We generally refer to this function as the [[Simulator]].<br />
<br />
<br />
[[Image:sumotask.png|center|SUMO-Toolbox : Generating an approximation for a reference model]]<br />
<br />
Across the different problems to which we have applied the toolbox, the input dimension has ranged from 1 to 130 and the output dimension from 1 to 70 (including both complex and real valued outputs). The number of data points has ranged from as little as 15 to as many as 100000.<br />
<br />
== Design goals ==<br />
<br />
The SUMO Toolbox was designed with a number of goals in mind:<br />
<br />
* A flexible tool that integrates different modeling methods and does not tie the user down to one particular set of problems. Reliance on domain specific features should be avoided.<br />
<br />
* The focus should be on adaptivity, i.e., relieving the burden on the domain expert as much as possible. Given a simulation model, the software should produce an accurate surrogate model with minimal user interaction. This also includes easily integrating with the existing design environment.<br />
<br />
* At the same time keeping in mind that there is no such thing as 'one-size-fits-all'. Different problems need to be modeled differently and require different a priori process knowledge. Therefore the software should be modular and easily extensible to new methods.<br />
<br />
* Engineers or domain experts do not tend to trust a black box system that generates models but is unclear about the reasons why a particular model should be preferred. Therefore an important design goal was that the expert user should be able to have full manual control over the modeling process if necessary. In addition the toolbox should support fine grain logging and profiling capabilities so its modeling and sampling decisions can be retraced.<br />
<br />
Given this design philosophy, the toolbox can cater to both the researchers working on novel surrogate modeling techniques as well as to the engineers who need the surrogate model as part of their design process. For the former, the toolbox provides a common platform on which to deploy, test, and compare new modeling algorithms and sampling techniques. For the latter, the software functions as a highly configurable and flexible component to which surrogate model construction can be delegated, easing the burden of the user and enhancing productivity.<br />
<br />
== Features ==<br />
The main features of the toolbox are listed below. For an overview of recent changes see the [[Whats new]] page. A detailed list of changes can be found in the [[Changelog]].<br />
<br />
{| class="wikitable" style="text-align:left" border="0" cellpadding="5" cellspacing="0"<br />
! Implementation Language <br />
| Matlab, Java, and where applicable C, C++<br />
|- <br />
! Design patterns<br />
| Fully object oriented, with the focus on clean design and encapsulation.<br />
|- <br />
! Minimum Requirements<br />
| See the [[system requirements]] page<br />
|-<br />
! Supported data sources*<br />
| Local executable/script, simulation engine, Java class, Matlab script, dataset (txt file) (see [[Interfacing with the toolbox]])<br />
|-<br />
! Supported data types<br />
| Supports multi-dimensional inputs and outputs. Outputs can be any combination of real/complex.<br />
|-<br />
! Supported problem types<br />
| Regression ([[FAQ#What_about_classification_problems.3F|classification]], [[FAQ#What_about_dynamical.2C_time_dependent_data.3F|time series prediction]])<br />
|-<br />
! Configuration<br />
| Extensively configurable through one main [[FAQ#What_is_XML.3F|XML]] configuration file.<br />
|-<br />
! Flexibility<br />
| Virtually every component of the modeling process can be configured, replaced or extended by a user specific, custom implementation<br />
|-<br />
! Predefined accuracy<br />
| The toolbox will run until the user-required accuracy has been reached, the maximum number of samples has been exceeded, or a timeout has occurred<br />
|-<br />
! Model Types*<br />
| Out of the box support for:<br />
* Polynomial/Rational functions<br />
* Feedforward Neural Networks, 3 implementations<br />
** One based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network toolbox]<br />
** One based on the [http://leenissen.dk/fann/ Fast Artificial Neural Network Library (FANN)]<br />
** One based on the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID Toolbox]<br />
* Radial Basis Function (RBF) Models<br />
* RBF Neural Networks<br />
* Gaussian Process Models (based on [http://www.GaussianProcess.org/gpml/code GPML])<br />
* Kriging Models (two custom implementations)<br />
* Blind Kriging Models<br />
* Smoothing spline models<br />
* Support Vector Machines (SVM)<br />
** Least Squares SVM (based on [http://www.esat.kuleuven.ac.be/sista/lssvmlab/ LS-SVMlab])<br />
** epsilon-SVM (based on [http://www.csie.ntu.edu.tw/~cjlin/libsvm/ LIBSVM] or [http://svmlight.joachims.org/ SVMlight])<br />
** nu-SVM (based on [http://www.csie.ntu.edu.tw/~cjlin/libsvm/ LIBSVM])<br />
|-<br />
! Model parameter optimization algorithms*<br />
| Pattern Search, EGO, Simulated Annealing, Genetic Algorithm, BFGS, DIRECT, Particle Swarm Optimization (PSO), NSGA-II, ...<br />
|-<br />
! Sample selection algorithms (=sequential design, active learning)*<br />
| Random, error-based, density-based, gradient-based, and many different hybrids<br />
|-<br />
! Experimental design*<br />
| Latin Hypercube Sampling, Central Composite, Box-Behnken, random, user defined, full factorial<br />
|-<br />
! Model selection measures*<br />
| Validation set, cross-validation, leave-one-out, model difference, AIC (also in a multi-objective context, see [[Multi-Objective Modeling]])<br />
|-<br />
! Sample Evaluation*<br />
| On the local machine (taking advantage of multi-core CPUs) or in parallel on a cluster/grid<br />
|-<br />
! Supported distributed middlewares*<br />
| [http://gridengine.sunsource.net/ Sun Grid Engine], LCG Grid middleware (both accessed through a SSH accessible frontnode)<br />
|-<br />
! Logging<br />
| Extensive logging to enable close monitoring of the modeling process. Logging granularity is fully configurable and log streams can be easily redirected (to file, console, a remote machine, ...).<br />
|-<br />
! Profiling*<br />
| Extensive profiling framework for easy gathering (and plotting) of modeling metrics (average sample evaluation time, hyperparameter optimization trace, ...)<br />
|-<br />
! Easy tracking of modeling progress<br />
| Automatic storing of best models and their plots. Ability to automatically generate a movie of the sequence of plots.<br />
|-<br />
! Model browser GUI<br />
| A graphical tool is available to easily visualize high dimensional models and browse through data ([[Model Visualization GUI|more information here]])<br />
|-<br />
! Available test problems*<br />
| Out of the box support for many built-in functions (Ackley, Camel Back, Goldstein-Price, ...) and datasets (Abalone, Boston Housing, FishLength, ...) from various application domains. Including a number of datasets (and some simulation code) from electronics. In total over 50 examples are available.<br />
|-<br />
! License<br />
| [[License terms]]<br />
|}<br />
<br />
<nowiki>*</nowiki> Custom implementations can easily be added<br />
<br />
== Screenshots ==<br />
A number of screenshots to give a feel of the SUMO Toolbox. Note these screenshots do not necessarily reflect the latest toolbox version.<br />
<br />
<gallery><br />
Image:octagon.png<br />
Image:metamodel-sumo-hourglass.png<br />
Image:SUMO_Toolbox1.png<br />
Image:SUMO_Toolbox2.png<br />
Image:SUMO_Toolbox3.png<br />
Image:SUMO_Toolbox4.png<br />
Image:ISCSampleSelector1.png<br />
Image:ISCSampleSelector2.png<br />
Image:SUMO_Gui1.png<br />
Image:SUMO_Gui2.png<br />
Image:Contour1.png<br />
Image:TwoDim1.png<br />
Image:TwoDim2.png<br />
Image:ThreeDim1.png<br />
Image:ThreeDim2.png<br />
Image:ThreeDim3.png<br />
Image:FEBioTrekEI.png<br />
Image:FEBioTrekFunc.png<br />
</gallery><br />
<br />
== Movies ==<br />
<br />
[[Image:youtube-logo.jpg|right|70px|link=http://www.youtube.com/sumolab|]] A number of video clips generated by or related to the SUMO Toolbox [http://www.youtube.com/sumolab can be found at our YouTube channel]. Feel free to make suggestions or leave comments.<br />
<br />
Note these movies do not necessarily reflect the latest toolbox version. Improvements and/or interface adjustments may have been made since then.<br />
<br />
== Documentation ==<br />
<br />
An in depth overview of the rationale and philosophy, including a treatment of the software architecture underlying the SUMO Toolbox is available in the form of a PhD dissertation. A copy of this dissertation [http://www.sumo.intec.ugent.be/?q=system/files/2010_04_PhD_DirkGorissen.pdf is available here].<br />
<br />
In addition the following poster and presentation give a high level overview:<br />
<br />
* Poster: [[Media:SUMO_poster.pdf|SUMO poster]]<br />
* Presentation: [[Media:SUMO_presentation.pdf|SUMO slides]]<br />
<br />
To stay up to date with the latest news and releases, we also recommend subscribing to the [http://www.sumo.intec.ugent.be SUMO newsletter]. <br />
Traffic will be kept to a minimum and you can unsubscribe at any time.<br />
<br />
A blog covering related research can be found here [http://sumolab.blogspot.com http://sumolab.blogspot.com].<br />
<br />
== Citations ==<br />
<br />
See [[Citing|Citing the toolbox]].</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=FAQ&diff=5121FAQ2010-06-13T14:34:59Z<p>Dgorissen: /* Will there be an R/Scilab/Octave/Sage/.. version? */</p>
<hr />
<div>== General ==<br />
<br />
=== What is a global surrogate model? ===<br />
<br />
A global [http://en.wikipedia.org/wiki/Surrogate_model surrogate model] is a mathematical model that mimics the behavior of a computationally expensive simulation code over '''the complete parameter space''' as accurately as possible, using as few data points as possible. Note that optimization is not the primary goal, although it can be done as a post-processing step. Global surrogate models are useful for:<br />
<br />
* design space exploration, to get a ''feel'' of how the different parameters behave<br />
* sensitivity analysis<br />
* ''what-if'' analysis<br />
* prototyping<br />
* visualization<br />
* ...<br />
<br />
In addition, they are a cheap way to model large-scale systems: multiple global surrogate models can be chained together in a model cascade.<br />
<br />
See also the [[About]] page.<br />
<br />
=== What about surrogate driven optimization? ===<br />
<br />
Most people associate the term '''surrogate driven optimization''' with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate which is optimized to find an optimum. Afterwards, a move limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently, the surrogate is rebuilt and optimized, i.e., the surrogate zooms in on the global optimum. For instance, the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies, where surrogate construction is separated from optimization.<br />
<br />
Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox. <br />
<br />
Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sample selection strategy that balances local and global search. In other words, it balances exploring the input space against exploiting the information the surrogate gives us.<br />
<br />
A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].<br />
<br />
=== What is (adaptive) sampling? Why is it used? ===<br />
<br />
In classical Design of Experiments you need to specify the design of your experiment up-front. In other words, you have to say in advance how many data points you need and how they should be distributed. Two examples are Central Composite designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., an expensive simulation code) it is not clear up-front how many points are needed. Instead, data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in the most interesting regions is called adaptive sampling, sequential design, or active learning. Of course, the sampling process needs to start from somewhere, so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].<br />
SUMO provides a number of different sampling algorithms: [[SampleSelector]]<br />
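Schematically, this control flow boils down to the following loop. This is a language-neutral sketch in Python (the toolbox itself is Matlab/Java) where the design, model fitting, selection criterion, and stopping test are all supplied as functions:<br />

```python
def adaptive_sampling(simulate, initial_design, select_next, fit, accurate, max_points):
    """Sketch of the adaptive sampling loop: start from a fixed classic
    design, then alternate modeling and sample selection until the model
    is accurate enough or the sample budget is spent."""
    X = list(initial_design)
    y = [simulate(x) for x in X]
    model = fit(X, y)
    while not accurate(model) and len(X) < max_points:
        new_points = select_next(model, X)  # e.g. error/density/gradient based
        X.extend(new_points)
        y.extend(simulate(x) for x in new_points)
        model = fit(X, y)
    return model
```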
<br />
Of course, sometimes you don't want to do sampling. For example, if you have a fixed dataset you just want to load all the data in one go and model that. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].<br />
<br />
=== What about dynamical, time dependent data? ===<br />
<br />
The original design and purpose was to tackle static input-output systems, where there is no memory, just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox, but that is typically not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1, k+2, ...<br />
<br />
The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.<br />
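To illustrate the fixed-time-interval idea mentioned above: the simplest way to fit a time series into a static input-output view is a lag embedding, where the previous n output values become the inputs for predicting the next value. A minimal sketch (this is not a toolbox function):<br />

```python
def lag_embed(series, n_lags):
    """Turn a time series into a static regression dataset:
    inputs = the previous n_lags values, output = the next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y
```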
<br />
Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].<br />
<br />
For more information on this topic [[Contact]] us.<br />
<br />
=== What about classification problems? ===<br />
<br />
The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].<br />
<br />
=== Can the toolbox drive my simulation code directly? ===<br />
<br />
Yes it can. See the [[Interfacing with the toolbox]] page.<br />
<br />
=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===<br />
<br />
The SUMO Toolbox is a complete, feature-rich framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof of principle.<br />
<br />
=== What happened to the M3-Toolbox? ===<br />
<br />
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.<br />
<br />
=== How can I stay up to date with the latest news? ===<br />
<br />
To stay up to date with the latest news and releases, we recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== What is the roadmap for the future? ===<br />
<br />
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== Will there be an R/Scilab/Octave/Sage/.. version? ===<br />
<br />
At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:<br />
<br />
* Existing experience and know-how of the development team<br />
* The widespread use of the Matlab platform in the target application domains<br />
* The quality and amount of available Matlab documentation<br />
* The quality and number of Matlab toolboxes<br />
* Support for object orientation (inheritance, polymorphism, etc.)<br />
* Many well documented interfacing options (especially the seamless integration with Java)<br />
<br />
Matlab, as a proprietary platform, definitely has its problems and deficiencies but the number of advanced algorithms and available toolboxes make it a very attractive platform. Equally important is the fact that every function is properly documented, tested, and includes examples, tutorials, and in some cases GUI tools. A lot of things would have been a lot harder and/or time consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus given our situation, goals, and resources at the time, Matlab was the best choice for us. <br />
<br />
The other platforms remain on our radar, however, and we do look into them from time to time. Still, with our limited resources, porting to one of those platforms is not (yet) cost effective.<br />
<br />
=== What are collaboration options? ===<br />
<br />
We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.<br />
<br />
The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to help you optimally it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.<br />
<br />
If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.<br />
<br />
In either case (code or dataset) remember:<br />
<br />
* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])<br />
* include a short description of your data:<br />
** number of inputs and number of outputs<br />
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)<br />
** if the outputs are real or complex valued<br />
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).<br />
** if possible the expected range of each output (or scaled if you do not wish to disclose this)<br />
** if possible the names of each input/output + a short description of what they mean<br />
** any further insight you have about the data, expected behavior, expected importance of each input, etc.<br />
<br />
If you have any further questions or comments related to this please [[Contact]] us.<br />
<br />
=== Can you help me model my problem? ===<br />
<br />
Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]<br />
<br />
== Installation and Configuration ==<br />
<br />
=== What is the relationship between Matlab and Java? ===<br />
<br />
Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable java support but in order to use the SUMO Toolbox it should not be. To check if Java is enabled you can use the 'usejava' command.<br />
<br />
=== What is Java, why do I need it, do I have to install it, etc. ? ===<br />
<br />
The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language instead of a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].<br />
<br />
=== What is XML? ===<br />
<br />
XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.<br />
<br />
The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:<br />
<br />
<source lang="xml"><br />
<Foo attr="bar">bla bla bla</Foo> <br />
</source><br />
<br />
Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you, you choose :)<br />
<br />
Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:<br />
<br />
<source lang="xml"><br />
<recipe category="dessert"><br />
<title>Pancakes</title><br />
<author>sumo@intec.ugent.be</author><br />
<date>Wed, 14 Jun 95</date><br />
<description><br />
Good old fashioned pancakes.<br />
</description><br />
<ingredients><br />
<item><br />
<amount>3</amount><br />
<type>eggs</type><br />
</item><br />
<br />
<item><br />
<amount>0.5 tablespoon</amount><br />
<type>salt</type><br />
</item><br />
...<br />
</ingredients><br />
<preparation><br />
...<br />
</preparation><br />
</recipe><br />
</source><br />
<br />
So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO uses it to store and structure configuration options, which works well thanks to the hierarchical nature of XML.<br />
<br />
If you understand this, there is nothing more you need in order to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the Wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]<br />
<br />
=== Why does SUMO use XML? ===<br />
<br />
XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel, for example), to configuration data, to scientific data, and so on. There are even whole database systems based solely on XML. So basically, it's an intuitive way to structure data and it is used everywhere. As a result, there are a very large number of libraries and programming languages that can parse and handle XML easily. That means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.<br />
So basically, it would not make sense for SUMO to use any other format :)<br />
<br />
=== I get an error that SUMO is not yet activated ===<br />
<br />
Make sure you installed the activation file that was mailed to you as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [http://www.sumowiki.intec.ugent.be/index.php/FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27 Java is enabled]. To fully verify that the activation file installation is correct, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.<br />
<br />
Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].<br />
<br />
== Upgrading ==<br />
<br />
=== How do I upgrade to a newer version? ===<br />
<br />
Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.<br />
<br />
If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.<br />
<br />
== Using ==<br />
<br />
=== I have no idea how to use the toolbox, what should I do? ===<br />
<br />
See: [[Running#Getting_started]]<br />
<br />
=== I want to try one of the different examples ===<br />
<br />
See [[Running#Running_different_examples]].<br />
<br />
=== I want to model my own problem ===<br />
<br />
See : [[Adding an example]].<br />
<br />
=== I want to contribute some data/patch/documentation/... ===<br />
<br />
See : [[Contributing]].<br />
<br />
=== How do I interface with the SUMO Toolbox? ===<br />
<br />
See : [[Interfacing with the toolbox]].<br />
<br />
=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===<br />
<br />
See [[General_guidelines]].<br />
<br />
=== Ok, I generated a model, what can I do with it? ===<br />
<br />
See: [[Using a model]].<br />
<br />
=== How can I share a model created by the SUMO Toolbox? ===<br />
<br />
See : [[Using a model#Model_portability| Model portability]].<br />
<br />
=== I dont like the final model generated by SUMO how do I improve it? ===<br />
<br />
Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor model. Then you should rank those properties by how important you find them. Examples are:<br />
<br />
* accuracy in the training data<br />
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing<br />
* accuracy outside the training data<br />
** this is the validation or test error, how important is proper generalization (usually this is very important)<br />
* what does accuracy mean to you? a low maximum error, a low average error, both, ...<br />
* smoothness<br />
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example<br />
* are some regions of the response more important than others?<br />
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important<br />
* are there particular special features that your model should have<br />
** for example, capture underlying poles or discontinuities correctly<br />
* extrapolation capability<br />
* ...<br />
<br />
It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].<br />
<br />
Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].<br />
<br />
Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:<br />
<br />
* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.<br />
* you should switch to a different model parameter optimization algorithm (e.g., instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)<br />
* the model type you are using is not ideally suited to your data<br />
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension<br />
* maybe the sample distribution is causing troubles for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.<br />
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very very small)<br />
<br />
You may also refer to the following [[General_guidelines]]. Finally, it may of course be that your problem is simply a very difficult one and does not approximate well. Still, you should at least get something satisfactory.<br />
<br />
If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.<br />
<br />
=== My data contains noise can the SUMO-Toolbox help me? ===<br />
<br />
The original purpose of the SUMO-Toolbox was to be used in conjunction with computer simulations. Since these are fully deterministic, you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will just have to be more careful about how you apply the methods and how you perform model selection. It's only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.<br />
<br />
Note, though, that the toolbox is not a statistical package; if you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc., you should look towards other tools.<br />
<br />
=== What is the difference between a ModelBuilder and a ModelFactory? ===<br />
<br />
See [[Add Model Type]].<br />
<br />
=== Why are the Neural Networks so slow? ===<br />
<br />
The ANN models are an extremely powerful model type that give very good results in many problems. However, they are quite slow to use. There are some things you can do:<br />
<br />
* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.<br />
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.<br />
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.<br />
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.<br />
* If you are using [[Measures#CrossValidation| CrossValidation]] (the default if you have not defined a [[Measures| measure]] yourself), try to switch to a different measure, since CrossValidation is very expensive to use. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.<br />
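A hedged sketch of what declaring two cheaper measures together might look like in the configuration file (attribute values are illustrative; see the comments in <code>default.xml</code> and the [[Multi-Objective Modeling]] page for the exact syntax and for how the measures are combined):<br />
<br />
<source lang="xml"><br />
<!-- Sketch: two measures declared together instead of CrossValidation;<br />
     how they are combined is described on the Multi-Objective Modeling page --><br />
<Measure type="SampleError" target="0.001"/><br />
<Measure type="LRMMeasure" target="0.001"/><br />
</source><br />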
<br />
See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]<br />
<br />
=== How can I make the toolbox run faster? ===<br />
<br />
There are a number of things you can do to speed things up. These are listed below. Remember though that the main reason the toolbox may seem to be slow is due to the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.<br />
<br />
* First of all check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates this really slows things down and your computer becomes unusable.<br />
<br />
* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files<br />
<br />
* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between 2 sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the Genetic model builders reduce the population size and the number of generations.<br />
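As a sketch, such an option could be set in the relevant model builder section like this (the option name and value are illustrative; check your configuration file for the exact name your model builder uses):<br />
<br />
<source lang="xml"><br />
<!-- Illustrative only: limit the number of models built between sampling iterations --><br />
<Option key="maxFunEvals" value="50"/><br />
</source><br />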
<br />
* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])<br />
<br />
* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.<br />
<br />
* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true"><br />
<Output type="toImage"/><br />
<Output type="toFile"/><br />
</Profiler><br />
<br />
<Profiler name=".*" enabled="true"><br />
<Output type="toFile"/><br />
</Profiler><br />
</Profiling><br />
</source><br />
<br />
The ".*" means match any one or more characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name shoud be enabled and should be saved as a text file (toFile) AND as an image file (toImage). All the other profilers should be saved just to file. The idea is to only save to image what you want as an image since image generation is expensive. If you do this or switch off image generation completely you will see everything run much faster.<br />
<br />
* Decrease the logging granularity; a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.<br />
<br />
* If you have a multi-core/multi-cpu machine:<br />
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.<br />
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)<br />
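As a sketch, the threadCount option might be set as follows in the [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] section (the value is illustrative; use your actual number of cores):<br />
<br />
<source lang="xml"><br />
<!-- Illustrative only: allow up to 4 simulation instances to run in parallel --><br />
<Option key="threadCount" value="4"/><br />
</source><br />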
<br />
* Don't use the Min-Max measure, it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]<br />
<br />
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]<br />
<br />
* If you are having problems with very slow or seemingly hanging runs:<br />
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_env/f9-17018.html Matlab profiler] and see where most time is spent.<br />
<br />
** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange. <br />
<br />
* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].<br />
<br />
To check whether your SUMO run has hung, monitor your log file (with the level set at least to FINE). If you see no changes for about 30 minutes the toolbox has probably stalled. [[Reporting problems| Report the problem here]].<br />
<br />
Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.<br />
<br />
=== How do I build models with more than one output? ===<br />
<br />
Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===<br />
<br />
See : [[Adaptive Modeling Mode]].<br />
<br />
=== How do I change the error function (relative error, RMSE, ...)? ===<br />
<br />
The [[Measures| <Measure>]] tag specifies the algorithm used to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' to use in the measure. The default error function is '<code>rootRelativeSquareError</code>'.<br />
<br />
Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/><br />
</source><br />
<br />
On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/><br />
</source><br />
<br />
These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important, so think carefully about it. Also see [[Multi-Objective Modeling]].<br />
<br />
=== How do I enable more profilers? ===<br />
<br />
Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.<br />
<br />
=== What regular expressions can I use to filter profilers? ===<br />
<br />
See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].<br />
<br />
=== How can I ensure deterministic results? ===<br />
<br />
See : [[Random state]].<br />
<br />
=== How do I get a simple closed-form model (symbolic expression)? ===<br />
<br />
See : [[Using a model]].<br />
<br />
=== How do I enable the Heterogeneous evolution to automatically select the best model type? ===<br />
<br />
Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.<br />
<br />
=== What is the combineOutputs option? ===<br />
<br />
See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== What error function should I use? ===<br />
<br />
The default error function is the Root Relative Square Error (RRSE). On the other hand, meanRelativeError may be more intuitive, but then you have to be careful if you have function values close to zero, since in that case the relative error explodes or even becomes infinite. You could also use one of the combined relative error functions (these contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error (=> hard to interpret).<br />
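A small Matlab sketch of why relative errors explode near zero (the values are illustrative only):<br />
<br />
<source lang="matlab"><br />
% Relative error is fine for large values but blows up near zero<br />
yTrue  = [10    1    0.001];<br />
yPred  = [ 9.9  0.9  0.011];<br />
relErr = abs(yTrue - yPred) ./ abs(yTrue)<br />
% relErr = [0.01  0.10  10.0] -> a tiny absolute mistake dominates the score<br />
</source><br />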
<br />
So an absolute error (like the RMSE) seems the safest bet; however, in that case you have to come up with sensible accuracy targets and realize that you will build models that fit the regions of high absolute value better than the low ones.<br />
<br />
Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.<br />
<br />
A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].<br />
<br />
=== I just want to generate an initial design (no sampling, no modeling) ===<br />
<br />
Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.<br />
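As a sketch, this could look as follows (whether the setting is an attribute of the SUMO tag or a nested option may depend on your version; check <code>default.xml</code>):<br />
<br />
<source lang="xml"><br />
<!-- Illustrative only: 0 modeling iterations means only the initial design<br />
     is generated and evaluated --><br />
<SUMO maxModelingIterations="0"><br />
  ...<br />
</SUMO><br />
</source><br />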
<br />
=== How do I start a run with the samples of a previous run, or with a custom initial design? ===<br />
<br />
Use a Dataset design component, for example:<br />
<br />
<source lang="xml"><br />
<InitialDesign type="DatasetDesign"><br />
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/><br />
</InitialDesign><br />
</source><br />
<br />
=== What is a level plot? ===<br />
<br />
A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:<br />
<gallery><br />
Image:levelplot.png<br />
</gallery><br />
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.<br />
<br />
===I am getting a java out of memory error, what happened?===<br />
Datasets are loaded through java. This means that the java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might experience problems with the maximum heap size. You can solve this by raising the heap size as described on the following webpage:<br />
[http://www.mathworks.com/support/solutions/data/1-18I2C.html]<br />
<br />
=== How do I force the output of the model to lie in a certain range ===<br />
<br />
See [[Measures#MinMax]].<br />
<br />
=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===<br />
<br />
That depends. Remember that the main focus of SUMO is to generate accurate 'global' models. If you want to do sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and on how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. At that point you need to see if you can extend the models with domain-specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you don't need to do sample selection but have a fixed dataset which you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM type models are independent of the dimension and can thus always be applied. Though things like feature selection are always recommended.<br />
<br />
=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===<br />
<br />
When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question but this is not always straightforward. In those cases a whole set of algorithms exist for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox but are not currently implemented. That is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.<br />
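For example, once SUMO has produced an LS-SVM model you could query it like this (a sketch; the file path is illustrative and the model object is assumed to be loaded into the variable 'model'):<br />
<br />
<source lang="matlab"><br />
% Load a model produced by a SUMO run (path is illustrative)<br />
load('output/myrun/model.mat');<br />
<br />
% Ask the LS-SVM model which inputs it considers most relevant<br />
relevantInputs = ARD(model)<br />
</source><br />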
<br />
=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===<br />
<br />
When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., writes the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).<br />
<br />
Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable + [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.<br />
<br />
HOWEVER, there is one important consideration: Matlab does not support threads, which means that if you use a Matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. The modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient. In version 6.2 we will probably fix this by using the Parallel Computing Toolbox.<br />
<br />
On the other hand, using a shell script/executable, does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).<br />
<br />
As a side note, if you have already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. It is not very elegant and adds some overhead, but depending on your situation it may be worth it.<br />
<br />
=== Is there any design documentation available? ===<br />
<br />
There is a PhD thesis fully describing the software architecture and design rationale behind the toolbox. It will be put online in the future. Until then you can [[Contact]] us to obtain a copy.<br />
<br />
== Troubleshooting ==<br />
<br />
=== I have a problem and I want to report it ===<br />
<br />
See : [[Reporting problems]].<br />
<br />
=== I sometimes get flat models when using rational functions ===<br />
<br />
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.<br />
<br />
The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. We have experienced that some runs just find a flat model as the best least squares fit. There are several causes for this:<br />
<br />
* The number of sample points is small, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small number of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always interpolate exactly. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code>, you still get weird, spiky models, you simply need more samples or you should switch to a different model type.<br />
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.<br />
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.<br />
<br />
=== When using rational functions I sometimes get 'spikes' (poles) in my model ===<br />
<br />
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being `the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.<br />
<br />
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.<br />
<br />
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:<br />
<br />
* Simply wait for more data, usually spikes disappear (but not always).<br />
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.<br />
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.<br />
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.<br />
* Increase the population size if using the genetic version<br />
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]]; it was designed to get rid of this problem more quickly, but it only selects one sample at a time.<br />
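As a sketch, the two degree-limiting options mentioned above might look as follows in the RationalFactory section (values are illustrative; check your configuration file for the exact syntax):<br />
<br />
<source lang="xml"><br />
<!-- Illustrative only: keep polynomial degrees between 10% and 50% of the<br />
     sample count, and never above an absolute degree of 50 --><br />
<Option key="percentBounds" value="10,50"/><br />
<Option key="maxDegrees" value="50"/><br />
</source><br />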
<br />
However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time; automatic order selection is not an easy problem.<br />
<br />
=== There is no noise in my data yet the rational functions don't interpolate ===<br />
<br />
[[FAQ#I sometimes get flat models when using rational functions |see this question]].<br />
<br />
=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===<br />
<br />
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path. <br />
<br />
In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.<br />
<br />
=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===<br />
<br />
This means you have specified to use a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which id's are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.<br />
<br />
=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===<br />
<br />
This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files, the Matlab implementation will be used and this will not cause any problems.<br />
<br />
=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===<br />
<br />
This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure it is in your library path, i.e., on Unix systems, make sure it is included in <code>LD_LIBRARY_PATH</code>.<br />
<br />
=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===<br />
<br />
You forgot to build the SVM mex files for your platform. For Windows they are pre-compiled for you; on other systems you have to compile them yourself with the makefile.<br />
<br />
=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===<br />
<br />
First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].<br />
<br />
This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.<br />
<br />
If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:<br />
<br />
# typing 'usejava jvm' should return 1 <br />
# typing 's = java.lang.String', this should ''not'' give an error<br />
# typing 'version('-java')' should return at least version 1.5.0<br />
<br />
If (1) returns 0, then the jvm of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?)<br />
If (2) fails but (1) is ok, there is a very weird problem, check the Matlab documentation.<br />
If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer, jvm (See the Matlab docs for how to do this).<br />
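<br />
The three checks above can be run together from the Matlab command window (same commands as in the list, gathered into one snippet):<br />
<br />
<source lang="matlab"><br />
% Java sanity checks (commands from the list above)<br />
usejava jvm          % (1) should return 1<br />
s = java.lang.String % (2) should not give an error<br />
version('-java')     % (3) should report at least version 1.5.0<br />
</source><br />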
<br />
=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===<br />
<br />
You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.<br />
<br />
=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===<br />
<br />
See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]<br />
<br />
=== I get a warning about duplicate samples being selected, why is this? ===<br />
<br />
Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modeled in one run and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.<br />
<br />
=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===<br />
<br />
There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].<br />
<br />
# '''Sampling off'''<br />
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does [[reporting problems|report it as a bug]]<br />
## ''Multi objective'': There is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi-objective enabled modeling algorithm.<br />
# '''Sampling on'''<br />
## ''Single objective'': inside each modeling iteration the error should always decrease monotonically. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase there; this is normal behavior (*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the data points is causing numerical problems).<br />
## ''Multi objective'': Combination of 1.2 and 2.1.<br />
<br />
(*) This is normal if you are using a measure like cross validation, which is less reliable on little data than on more data. However, in some cases you may wish to override this behavior, for example if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!<br />
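<br />
A sketch of this override (the option name <code>keepOldModels</code> comes from the text above; its exact placement and syntax inside the <SUMO> tag are assumed, following the <code><Option></code> convention used elsewhere in the configuration file):<br />
<br />
<source lang="xml"><br />
<!-- Assumed form: force a monotonically decreasing best-model error --><br />
<Option key="keepOldModels" value="true"/><br />
</source><br />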
<br />
=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===<br />
<br />
This is normal, the warning printed out before the error explains why:<br />
<br />
''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''<br />
<br />
By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".<br />
Note that there is nothing to worry about: everything has run correctly, it is just the movie creation that is failing.<br />
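<br />
If you prefer to disable movie creation, the option goes in the <SUMO> tag; a sketch (the option name <code>createMovie</code> comes from the text above, the exact attribute-versus-option form may differ in your default.xml):<br />
<br />
<source lang="xml"><br />
<!-- Assumed form: skip movie generation at the end of a run --><br />
<Option key="createMovie" value="false"/><br />
</source><br />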
<br />
=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===<br />
<br />
This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox sub-directory! This is very important.<br />
<br />
If you still have problems you can override the default logfile name and location as follows:<br />
<br />
In the <FileHandler> tag inside the <Logging> tag add the following option:<br />
<br />
<code><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</code><br />
<br />
This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.<br />
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].<br />
<br />
=== The Toolbox crashes with "Too many open files" what should I do? ===<br />
<br />
This is a known bug, see [[Known_bugs#Version_6.1]].<br />
<br />
If this does not fix your problem then do the following:<br />
<br />
On Windows, try increasing the limit as dictated by the error message. Also, when you get the error, use the <code>fopen('all')</code> command to see which files are open and send us the list of filenames; then we can help you debug the problem further. Even better is to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can debug the problem further.<br />
<br />
On Linux again don't shut down Matlab but:<br />
<br />
* open a new terminal window<br />
* type:<br />
<source lang="bash"><br />
lsof > openFiles.txt<br />
</source><br />
* Then [[Contact|send us]] the following information:<br />
** the file openFiles.txt <br />
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).<br />
** the output of<br />
<source lang="bash"><br />
uname -a ; df -T ; mount<br />
</source><br />
<br />
As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.<br />
<br />
In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.<br />
<br />
=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===<br />
<br />
The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files, the Matlab implementation is used and a warning is given, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]] (this can be done very easily), or simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.<br />
<br />
=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===<br />
<br />
You most likely forgot to [[Installation#Extension_pack|install the extension pack]].<br />
<br />
=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===<br />
<br />
Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics, your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; it is just that you have to log in again since your graphical session has terminated. This FAQ entry only covers the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system administrator.<br />
<br />
So what happens is that after a few seconds, when the toolbox wants to plot the first model, [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - display server interaction.<br />
<br />
What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:<br />
<br />
* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])<br />
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)<br />
<br />
<source lang="bash"><br />
/etc/X11/xorg.conf<br />
</source><br />
<br />
Note: the exact location of the xorg.conf file may vary on your system.<br />
<br />
* Look for the following line:<br />
<br />
<source lang="bash"><br />
Load "glx"<br />
</source><br />
<br />
* Comment it out by replacing it by:<br />
<br />
<source lang="bash"><br />
# Load "glx"<br />
</source><br />
<br />
* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)<br />
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].<br />
<br />
Note:<br />
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]<br />
* if you wish to debug further yourself please check the Xorg log files and those in /var/log<br />
* another possible workaround is to start Matlab with the "-nodisplay" option; that could work as well.<br />
<br />
=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===<br />
<br />
This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.<br />
<br />
=== The toolbox seems to keep on running forever, when or how will it stop? ===<br />
<br />
The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].<br />
<br />
Of course, choosing targets a priori is not always easy and there is no real solution for this, except thinking well about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). If in doubt you can always use a small value (or 0) and then simply quit the running toolbox using Ctrl-C when you think it has run long enough.<br />
<br />
While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Running&diff=5120Running2010-06-13T14:08:15Z<p>Dgorissen: </p>
<hr />
<div>== Getting started ==<br />
<br />
If you are just getting started with the toolbox and you have no idea how everything works, this section should help you on your way.<br />
First make sure you [[About#Intended_use|know what the toolbox is used for]], you have finished the toolbox [[Installation]] and you have done a successful [[Installation#Test_run|test run]] by running the default configuration. If that works you know everything is working correctly. Then:<br />
<br />
# Go through the presentation [[About#Documentation|available here]], paying specific attention to the control flow<br />
# The behavior of the toolbox is fully configured through two XML files. If you do not know what XML is please read [[FAQ#What is XML?]] first.<br />
# Read [[Toolbox_configuration|the toolbox configuration structure section]]. This is very important. Then print out ''config/default.xml'' and take your time to read it through and understand the structure and the way things work.<br />
# Do the [[Installation#Test_run|test run]] again, this time play closer attention to what is happening and see if you understand what is going on. If you still have no idea you can refer to the [[Running#Understanding_the_control_flow|Understanding the control flow]] section below.<br />
# Ok, by now you should have a rough idea of how the configuration file is structured and how the control flow works. Now change ''default.xml'' to run a different example. This [[Running#Running_different_examples|is explained below]]. If you can do that and it works, you have mastered all the basic skills needed to use the toolbox. You can now browse through the rest of the wiki as needed.<br />
<br />
If you get stuck or have any problems [[Reporting problems|please let us know]].<br />
<br />
''We are well aware that documentation is not always complete and possibly even out of date in some cases. We try to document everything as best we can but much is limited by available time and manpower. We are a university research group after all. The most up to date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear please don't hesitate to [[Reporting problems|ask]].''<br />
<br />
== Running the default configuration ==<br />
<br />
Once the SUMO Toolbox is [[Installation|installed]] you can do a simple test run to check if everything is working as expected. This is explained on the [[Installation#Test_run | installation page]].<br />
<br />
== Running different examples ==<br />
<br />
=== Prerequisites ===<br />
This section is about running a different example problem, if you want to model your own problem see [[Adding an example]]. Make sure you [[configuration|understand the difference between the simulator configuration file and the toolbox configuration file]]. You should also have read [[Toolbox configuration#Structure]].<br />
<br />
=== Changing default.xml ===<br />
The <code>examples/</code> directory contains many example simulators that you can use to test the toolbox with. These examples range from predefined functions, to datasets from various domains, to native simulation code. If you want to try one of the examples, open <code>config/default.xml</code> and edit the [[Simulator| <Simulator>]] tag to suit your needs.<br />
<br />
For example, originally default.xml contains:<br />
<br />
<source lang="xml"><br />
<Simulator>Academic2DTwice</Simulator><br />
</source><br />
<br />
This means the toolbox will look in the examples directory for a project directory called <code>Academic2DTwice</code> and load the xml file with the same name inside that directory (in this case: <code>Academic2DTwice/Academic2DTwice.xml</code>).<br />
<br />
Now let's say you want to run a different example problem, for example the Michalewicz example. In this case you would replace the original Simulator tag with: <br />
<br />
<source lang="xml"><br />
<Simulator>Michalewicz</Simulator><br />
</source><br />
<br />
In addition you would have to change the <code><Outputs></code> tag. The <code>Academic2DTwice</code> example has two outputs (''out'' and ''outinverse''). However, the Michalewicz example has only one (''out''). Thus telling the SUMO Toolbox to model the ''outinverse'' output in that case makes no sense since it does not exist for the Michalewicz example. So the following output configuration suffices:<br />
<br />
<source lang="xml"><br />
<Outputs><br />
	<Output name="out"/><br />
</Outputs><br />
</source><br />
<br />
The rest of default.xml can be kept the same. Then simply run '<code>go</code>' to run the example (making sure that the toolbox is in your Matlab path of course).<br />
<br />
Note that it is also possible to specify an absolute path or refer to a particular xml file directly. For example:<br />
<br />
<source lang="xml"><br />
<Simulator>/path/to/your/project/directory</Simulator><br />
</source><br />
<br />
or:<br />
<br />
<source lang="xml"><br />
<Simulator>Ackley/Ackley2D.xml</Simulator><br />
</source><br />
<br />
=== Important notes ===<br />
<br />
If you start changing default.xml to try out different examples, there are a number of important things you should be aware of.<br />
<br />
==== Select a matching Input and Outputs ====<br />
Using the <code><Inputs></code> and <code><Outputs></code> tags in the SUMO-Toolbox configuration file you can tell the toolbox which outputs should be modeled and how. Note that these tags are optional: you can delete them and the toolbox will simply model all available inputs and outputs. If you do specify a particular output, say you tell the toolbox to model the output ''temperature'' of the simulator ''ChemistryProblem'', and you then change the configuration file to model ''BiologyProblem'', you will have to change the name of the selected output (or input), since most likely ''BiologyProblem'' will not have an output called ''temperature''.<br />
Another concrete example is given above with the Michalewicz example.<br />
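<br />
Using the hypothetical ''ChemistryProblem'' example above, a matching output selection would look like this (following the same <code><Outputs></code> syntax shown earlier on this page):<br />
<br />
<source lang="xml"><br />
<Outputs><br />
	<Output name="temperature"/><br />
</Outputs><br />
</source><br />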
<br />
==== Select a matching SampleEvaluator ====<br />
There is one important caveat. Some examples consist of a fixed data set, some are implemented as a Matlab function, others as a C++ executable, etc. When running a different example you have to tell the SUMO Toolbox how the example is implemented so the toolbox knows how to extract data (e.g., should it load a data file or should it call a Matlab function). This is done by specifying the correct [[Config:SampleEvaluator|SampleEvaluator]] tag. The default SampleEvaluator is:<br />
<br />
<source lang="xml"><br />
<SampleEvaluator>matlab</SampleEvaluator><br />
</source><br />
<br />
So this means the toolbox expects the example you want to run to be implemented as a Matlab function. Thus it is no use running an example that is implemented as a static dataset using the '[[Config:SampleEvaluator#matlab|matlab]]' or '[[Config:SampleEvaluator#local|local]]' sample evaluators; doing so will result in an error. In this case you should use '[[Config:SampleEvaluator#scatteredDataset|scatteredDataset]]' (or sometimes [[Config:SampleEvaluator#griddedDataset|griddedDataset]]).<br />
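<br />
For example, for an example implemented as a fixed, scattered dataset you would replace the default with (evaluator id taken from the links above):<br />
<br />
<source lang="xml"><br />
<SampleEvaluator>scatteredDataset</SampleEvaluator><br />
</source><br />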
<br />
To see how an example is implemented, open the XML file inside the example directory and look at the <code><Implementation></code> tag. To see which SampleEvaluators are available see [[Config:SampleEvaluator]].<br />
<br />
==== Select an appropriate AdaptiveModelBuilder ====<br />
Also remember that if you switch to a different example you may also have to change the [[Config:AdaptiveModelBuilder]] used. For example, if you are using a spline model (which only works in 2D) and you decide to model a problem with many dimensions (e.g., CompActive or BostonHousing) you will have to switch to a different model type (e.g., any of the SVM or LS-SVM model builders).<br />
<br />
==== Switch off Sample Selection if not needed ====<br />
If you are modeling a fixed, small dataset it may make no sense to select samples incrementally. Instead you will probably load all the data at once and only generate models. See [[Adaptive_Modeling_Mode]] for how to do this.<br />
<br />
Finally the question may remain, what settings should I use for my problem? Well there is no best answer to this question, see [[General_guidelines]].<br />
<br />
== Running different configuration files ==<br />
<br />
If you just type "go" the SUMO-Toolbox will run using the configuration options in default.xml. However, you may want to make a copy of default.xml and play around with that, leaving your original default.xml intact. So the question is: how do you run that file? Let's say your copy is called MyConfigFile.xml. In order to tell SUMO to run that file you would type:<br />
<br />
<source lang="matlab"><br />
go('/path/to/MyConfigFile.xml')<br />
</source><br />
<br />
The path can be an absolute path, or a path relative to the SUMO Toolbox root directory.<br />
To see what other options you have when running go type ''help go''.<br />
<br />
'''Remember to always run go from the toolbox root directory.'''<br />
<br />
=== Merging your configuration ===<br />
<br />
If you know what you are doing, you can merge your own custom configuration with the default configuration by using the '-merge' option. Options or tags that are missing in this custom file will then be filled in with the values from the default configuration. This saves you from having to duplicate tags from default.xml. However, if you are unfamiliar with XML and not quite sure what you are doing, we advise against using it.<br />
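<br />
Assuming the option is passed as an extra argument to <code>go</code> (this exact call form is an assumption; type ''help go'' for the definitive syntax):<br />
<br />
<source lang="matlab"><br />
% Merge MyConfigFile.xml with the default configuration (assumed call form)<br />
go('/path/to/MyConfigFile.xml', '-merge')<br />
</source><br />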
<br />
=== Running optimization examples ===<br />
The SUMO toolbox can also be used for minimizing the simulator in an intelligent way. There are two examples included in <code>config/Optimization</code>. Running these examples works exactly the same as always, e.g. <code>go('config/optimization/Branin.xml')</code>. The only difference is in the sample selector, which is specified in the configuration file itself.<br />
<gallery><br />
Image:ISCSampleSelector2.png<br />
</gallery><br />
The example configuration files are well documented, it is advised to go through them for more detailed information.<br />
<br />
== Understanding the control flow ==<br />
<br />
[[Image:sumo-control-flow.png|thumb|300px|right|The general SUMO-Toolbox control flow]]<br />
<br />
When the toolbox is running you might wonder what exactly is going on. The high level control flow that the toolbox goes through is illustrated in the flow chart and explained in more detail below. You may also refer to the [[About#Presentation|general SUMO presentation]].<br />
<br />
# Select samples according to the [[InitialDesign|initial design]] and execute the [[Simulator]] for each of the points<br />
# Once enough points are available, start the [[Add_Model_Type#Models.2C_Model_builders.2C_and_Factories|Model builder]] which will start producing models as it optimizes the model parameters<br />
## the number of models generated depends on the [[Config:AdaptiveModelBuilder|AdaptiveModelBuilder]] used. Usually the AdaptiveModelBuilder tag contains a setting like ''maxFunEvals'' or ''popSize''. This indicates to the algorithm that is optimizing the model parameters (and thus generating models) how many models it should at most generate before stopping. By increasing this number you will generate more models between sampling iterations, and thus have a higher chance of getting a better model, at the cost of more computation time. This step is what we refer to as a ''modeling iteration''.<br />
## optimization over the model parameters is driven by the [[Measures|Measure(s)]] that are enabled. Selection of the Measure is thus very important for the modeling process!<br />
## each time the model builder generates a model that has a lower measure score than the previous best model, the toolbox will trigger a "New best model found" event, save the model, generate a plot, and trigger all the profilers to update themselves.<br />
## note that, by default, you only see something happen when a new best model is found; you do not see all the other models being generated in the background. If you want to see those, you must increase the logging granularity (or just look in the log file) or [[FAQ#How_do_I_enable_more_profilers.3F|enable more profilers]].<br />
# So the model builder will run until it has completed<br />
# Then, if the current best model satisfies all the targets in the enabled Measures, it means we have reached the requirements and the toolbox terminates.<br />
# If not, the [[SampleSelector]] selects a new set of samples (= a ''sampling iteration''), they are simulated, and the model building resumes or is restarted according to the configured restart strategy<br />
# This whole loop continues (thus the toolbox will keep running) until one of the following conditions is true:<br />
## the targets specified in the active measure tags have been reached (each Measure has a target value which you can set). Note, though, that when you are using multiple measures (see [[Multi-Objective Modeling]]) or single measures like AIC or LRM, it becomes difficult to set a priori targets since you can't really interpret the scores (in contrast to the simple case of a single measure like CrossValidation, where your target is simply the error you require). In those cases you should usually set the targets to 0 and use one of the other criteria below to make sure the toolbox stops.<br />
## the maximum running time has been reached (''maximumTime'' property in the [[Config:SUMO]] tag)<br />
## the maximum number of samples has been reached (''maximumTotalSamples'' property in the [[Config:SUMO]] tag)<br />
## the maximum number of modeling iterations has been reached (''maxModelingIterations'' property in the [[Config:SUMO]] tag)<br />
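<br />
The three limits above are properties of the [[Config:SUMO]] tag. As a sketch (the property names come from the list above; the attribute form and the values shown are assumptions, check your default.xml for the definitive syntax):<br />
<br />
<source lang="xml"><br />
<!-- Sketch: stopping criteria on the SUMO tag (values illustrative) --><br />
<SUMO maximumTime="120" maximumTotalSamples="1000" maxModelingIterations="50"><br />
	<!-- rest of the SUMO configuration --><br />
</SUMO><br />
</source><br />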
<br />
<br />
Note that it is also possible to disable the sample selection loop, see [[Adaptive Modeling Mode]]. Also note that while you might think the toolbox is not doing anything, it is actually building models in the background (see above for how to see the details). The toolbox will only inform you (unless configured otherwise) when it finds a model that is better than the previous best model (using that particular measure!). If not, it will continue running until one of the stopping conditions is true.<br />
<br />
== Output ==<br />
<br />
All output is stored under the [[Config:ContextConfig#OutputDirectory|directory]] specified in the [[Config:ContextConfig]] section of the configuration file (by default this is set to "<code>output</code>"). <br />
<br />
Starting from version 6.0 the output directory is always relative to the project directory of your example. Unless you specify an absolute path.<br />
<br />
After completion of a SUMO Toolbox run, the following files and directories can be found there (e.g. in the <code>output/<run_name+date+time>/</code> subdirectory):<br />
<br />
* <code>config.xml</code>: The xml file that was used by this run. Can be used to reproduce the entire modeling process for that run.<br />
* <code>randstate.dat</code>: contains states of the random number generators, so that it becomes possible to deterministically repeat a run (see the [[Random state]] page).<br />
* <code>samples.txt</code>: a list of all the samples that were evaluated, and their outputs.<br />
* <code>profilers</code>-dir: contains information and plots about convergence rates, resource usage, and so on.<br />
* <code>best</code>-dir: contains the best models (+ plots) of all outputs that were constructed during the run. This is continuously updated as the modeling progresses.<br />
* <code>models_outputName</code>-dir: contains a history of all intermediate models (+ plots + movie) for each output that was modeled.<br />
<br />
If you generated models [[Multi-Objective Modeling|multi-objectively]] you will also find the following directory:<br />
<br />
* <code>paretoFronts</code>-dir: contains snapshots of the population during multi-objective optimization of the model parameters.<br />
<br />
== Debugging ==<br />
<br />
Remember to always check the log file first if problems occur!<br />
When [[reporting problems]] please attach your log file and the xml configuration file you used.<br />
<br />
To aid understanding and debugging you should set the console and file logging level to FINE (or even FINER, FINEST)<br />
as follows: <br />
<br />
Change the level of the ConsoleHandler tag to FINE, FINER or FINEST. Do the same for the FileHandler tag. <br />
<br />
<source lang="xml"><br />
<!-- Configure ConsoleHandler instances --><br />
<ConsoleHandler><br />
<Option key="Level" value="FINE"/><br />
</ConsoleHandler><br />
</source><br />
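<br />
The equivalent change for the log file (assuming the FileHandler accepts the same option key as the ConsoleHandler):<br />
<br />
<source lang="xml"><br />
<!-- Configure FileHandler instances --><br />
<FileHandler><br />
  <Option key="Level" value="FINE"/><br />
</FileHandler><br />
</source><br />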
<br />
== Using models ==<br />
<br />
Once you have generated a model, you might wonder what you can do with it. To see how to load, export, and use SUMO generated models see the [[Using a model]] page.<br />
<br />
== Modeling complex outputs ==<br />
<br />
The toolbox supports the modeling of complex valued data. If you do not specify any specific <[[Outputs|Output]]> tags, all outputs will be modeled with [[Outputs#Complex_handling|complexHandling]] set to '<code>complex</code>'. This means that a real output will be modeled as a real value, and a complex output will be modeled as a complex value (with a real and imaginary part). If you don't want this (i.e., you want to model the modulus of a complex output or you want to model real and imaginary parts separately), you explicitly have to set [[Outputs#Complex_handling|complexHandling]] to 'modulus', 'real', 'imaginary', or 'split'.<br />
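<br />
For example, to model the modulus of one output and split another into real and imaginary parts (the output names here are purely illustrative; see the [[Outputs]] page for the exact syntax):<br />
<br />
<source lang="xml"><br />
<Outputs><br />
  <Output name="out1" complexHandling="modulus"/><br />
  <Output name="out2" complexHandling="split"/><br />
</Outputs><br />
</source><br />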
<br />
More information on this subject can be found at the [[Outputs#Complex_handling|Outputs]] page.<br />
<br />
== Models with multiple outputs ==<br />
<br />
If multiple [[Outputs]] are selected, by default the toolbox will model each output separately using a separate adaptive model builder object. So if you have a system with 3 outputs you will get three different models each with one output. However, sometimes you may want a single model with multiple outputs. For example instead of having a neural network for each component of a complex output (real/imaginary) you might prefer a single network with 2 outputs. To do this simply set the 'combineOutputs' attribute of the <AdaptiveModelBuilder> tag to 'true'. That means that each time that model builder is selected for an output, the same model builder object will be used instead of creating a new one.<br />
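<br />
A minimal sketch (the <code>type</code> value is just an illustration; any other attributes of your existing configuration stay as they are):<br />
<br />
<source lang="xml"><br />
<AdaptiveModelBuilder type="ann" combineOutputs="true"><br />
  ...<br />
</AdaptiveModelBuilder><br />
</source><br />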
<br />
Note though, that not all model types support multiple outputs. If they don't you will get an error message.<br />
<br />
Also note that you can also generate models with multiple outputs in a multi-objective fashion. For information on this see the page on [[Multi-Objective Modeling]].<br />
<br />
== Multi-Objective Model generation ==<br />
<br />
See the page on [[Multi-Objective Modeling]].<br />
<br />
== Interfacing with the SUMO Toolbox ==<br />
<br />
To learn how to interface with the toolbox or model your own problem see the [[Adding an example]] and [[Interfacing with the toolbox]] pages.<br />
<br />
== Test Suite ==<br />
<br />
A test harness is provided that can be run manually or automatically as part of a cron job. The test suite consists of a number of test XML files (in the config/test/ directory), each describing a particular surrogate modeling experiment. The file config/test/suite.xml dictates which tests are run and in which order. The suite.xml file also contains the accuracy and sample bounds that are checked after each test. If the final model found does not fall within the accuracy or number-of-samples bounds, the test is considered failed. Note that due to randomization the final accuracy and number of samples used may vary slightly, so the bounds must be set sufficiently loose.<br />
<br />
Note also that some of the predefined test cases may rely on data sets or simulation code that are not publicly available. However, since these test problems typically make very good benchmark problems we left them in for illustration purposes.<br />
<br />
The coordinating class is the Matlab TestSuite class. Besides running the tests defined in suite.xml it also tests each of the model member functions.<br />
<br />
The test suite may be run as follows (assuming the SUMO Toolbox is setup properly):<br />
<br />
<source lang="matlab"><br />
s = TestEngine('config/test/suite.xml'); s.run()<br />
</source><br />
<br />
The "run()" method also supports an optional parameter (a vector) that dictates which tests to run (e.g., run([2 5 3]) will run tests 2,5 and 3).<br />
<br />
== Tips ==<br />
<br />
See the [[Tips]] page for various tips and gotchas.<br />
<br />
== General ==<br />
<br />
=== What about surrogate driven optimization? ===<br />
<br />
Instead of such a trust-region framework, another, equally powerful, approach was taken. The current optimization framework is in fact a sample selection strategy that balances local and global search. In other words, it balances between exploring the input space and exploiting the information the surrogate gives us.<br />
<br />
A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].<br />
<br />
=== What is (adaptive) sampling? Why is it used? ===<br />
<br />
In classical Design of Experiments you need to specify the design of your experiment up-front: how many data points you need and how they should be distributed. Two examples are Central Composite designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., by an expensive simulation code) it is not clear up-front how many points are needed. Instead data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in the most interesting regions is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere, so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].<br />
SUMO provides a number of different sampling algorithms: [[SampleSelector]]<br />
<br />
Of course sometimes you don't want to do sampling; for example, if you have a fixed data set you just want to load all the data in one go and model that. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].<br />
<br />
=== What about dynamical, time dependent data? ===<br />
<br />
The original design and purpose was to tackle static input-output systems, where there is no memory: just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox, but that typically is not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...<br />
<br />
The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.<br />
<br />
Naturally though, you would be unable to use sample selection (since it makes no sense for those problems), unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].<br />
<br />
For more information on this topic [[Contact]] us.<br />
<br />
=== What about classification problems? ===<br />
<br />
The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].<br />
<br />
=== Can the toolbox drive my simulation code directly? ===<br />
<br />
Yes it can. See the [[Interfacing with the toolbox]] page.<br />
<br />
=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===<br />
<br />
The SUMO toolbox is a complete, fully featured framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof of principle.<br />
<br />
=== What happened to the M3-Toolbox? ===<br />
<br />
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.<br />
<br />
=== How can I stay up to date with the latest news? ===<br />
<br />
To stay up to date with the latest news and releases, we also recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== What is the roadmap for the future? ===<br />
<br />
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== Will there be an R/Scilab/Octave/Sage/.. version? ===<br />
<br />
At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:<br />
<br />
* The quality and amount of available Matlab documentation<br />
* The quality and number of Matlab toolboxes<br />
* Support for object orientation (inheritance, polymorphism, etc.)<br />
* Many well documented interfacing options (especially the seamless integration with Java)<br />
* Existing experience and know-how of the development team<br />
* The widespread use of the Matlab platform in the target application domains<br />
<br />
Matlab, as a proprietary platform, sure has its problems and deficiencies but the number of advanced algorithms and available toolboxes make it a very attractive platform. Equally important is the fact that every function is properly documented, tested, and includes examples, tutorials, and in some cases GUI tools. A lot of things would have been a lot harder and/or time consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus given our situation, goals, and resources at the time, Matlab was the best choice for us. <br />
<br />
The other platforms remain on our radar however, and we do look into them from time to time. Though, with our limited resources porting to one of those platforms is not (yet) cost effective.<br />
<br />
=== What are collaboration options? ===<br />
<br />
We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.<br />
<br />
The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to optimally help you it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.<br />
<br />
If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.<br />
<br />
In either case (code or dataset) remember:<br />
<br />
* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])<br />
* include a short description of your data:<br />
** number of inputs and number of outputs<br />
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)<br />
** if the outputs are real or complex valued<br />
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).<br />
** if possible the expected range of each output (or scaled if you do not wish to disclose this)<br />
** if possible the names of each input/output + a short description of what they mean<br />
** any further insight you have about the data, expected behavior, expected importance of each input, etc.<br />
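<br />
For example, a data file for a problem with two inputs and one output could look like this (the values are purely illustrative):<br />
<br />
<source lang="text"><br />
0.25  -1.00   3.1415<br />
0.25   0.00   2.7182<br />
0.50   1.00   1.4142<br />
</source><br />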
<br />
If you have any further questions or comments related to this please [[Contact]] us.<br />
<br />
=== Can you help me model my problem? ===<br />
<br />
Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]<br />
<br />
== Installation and Configuration ==<br />
<br />
=== What is the relationship between Matlab and Java? ===<br />
<br />
Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable java support but in order to use the SUMO Toolbox it should not be. To check if Java is enabled you can use the 'usejava' command.<br />
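<br />
For example, at the Matlab prompt:<br />
<br />
<source lang="matlab"><br />
usejava('jvm')   % returns 1 (true) if the Java Virtual Machine is available<br />
</source><br />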
<br />
=== What is Java, why do I need it, do I have to install it, etc. ? ===<br />
<br />
The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language instead of a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].<br />
<br />
=== What is XML? ===<br />
<br />
XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.<br />
<br />
The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:<br />
<br />
<source lang="xml"><br />
<Foo attr="bar">bla bla bla</Foo> <br />
</source><br />
<br />
Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you :)<br />
<br />
Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:<br />
<br />
<source lang="xml"><br />
<recipe category="dessert"><br />
<title>Pancakes</title><br />
<author>sumo@intec.ugent.be</author><br />
<date>Wed, 14 Jun 95</date><br />
<description><br />
Good old fashioned pancakes.<br />
</description><br />
<ingredients><br />
<item><br />
<amount>3</amount><br />
<type>eggs</type><br />
</item><br />
<br />
<item><br />
<amount>0.5 tablespoon</amount><br />
<type>salt</type><br />
</item><br />
...<br />
</ingredients><br />
<preparation><br />
...<br />
</preparation><br />
</recipe><br />
</source><br />
<br />
So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO uses it to store and structure configuration options, which works well thanks to the hierarchical nature of XML.<br />
<br />
If you understand this there is nothing else to it in order to be able to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]<br />
<br />
=== Why does SUMO use XML? ===<br />
<br />
XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it's an intuitive way to structure data and it is used everywhere. As a result there are a very large number of libraries and programming languages available that can parse and handle XML easily, which means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.<br />
In short, it would not make sense for SUMO to use any other format :)<br />
<br />
=== I get an error that SUMO is not yet activated ===<br />
<br />
Make sure you installed the activation file that was mailed to you as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [http://www.sumowiki.intec.ugent.be/index.php/FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27 Java is enabled]. To fully verify that the activation file installation is correct, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.<br />
<br />
Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].<br />
<br />
== Upgrading ==<br />
<br />
=== How do I upgrade to a newer version? ===<br />
<br />
Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.<br />
<br />
If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.<br />
<br />
== Using ==<br />
<br />
=== I have no idea how to use the toolbox, what should I do? ===<br />
<br />
See: [[Running#Getting_started]]<br />
<br />
=== I want to try one of the different examples ===<br />
<br />
See [[Running#Running_different_examples]].<br />
<br />
=== I want to model my own problem ===<br />
<br />
See : [[Adding an example]].<br />
<br />
=== I want to contribute some data/patch/documentation/... ===<br />
<br />
See : [[Contributing]].<br />
<br />
=== How do I interface with the SUMO Toolbox? ===<br />
<br />
See : [[Interfacing with the toolbox]].<br />
<br />
=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===<br />
<br />
See [[General_guidelines]].<br />
<br />
=== Ok, I generated a model, what can I do with it? ===<br />
<br />
See: [[Using a model]].<br />
<br />
=== How can I share a model created by the SUMO Toolbox? ===<br />
<br />
See : [[Using a model#Model_portability| Model portability]].<br />
<br />
=== I dont like the final model generated by SUMO how do I improve it? ===<br />
<br />
Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor model. Then you should rank those properties depending on how important you find them. Examples are:<br />
<br />
* accuracy in the training data<br />
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing<br />
* accuracy outside the training data<br />
** this is the validation or test error, how important is proper generalization (usually this is very important)<br />
* what does accuracy mean to you? a low maximum error, a low average error, both, ...<br />
* smoothness<br />
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example<br />
* are some regions of the response more important than others?<br />
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important<br />
* are there particular special features that your model should have<br />
** for example, capture underlying poles or discontinuities correctly<br />
* extrapolation capability<br />
* ...<br />
<br />
It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].<br />
<br />
Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].<br />
<br />
Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:<br />
<br />
* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.<br />
* you should switch to a different model parameter optimization algorithm (e.g., instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)<br />
* the model type you are using is not ideally suited to your data<br />
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension<br />
* maybe the sample distribution is causing troubles for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.<br />
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very small)<br />
<br />
You may also refer to the following [[General_guidelines]]. Finally, of course it may be that your problem is simply a very difficult one that does not approximate well. But still, you should at least get something satisfactory.<br />
<br />
If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.<br />
<br />
=== My data contains noise can the SUMO-Toolbox help me? ===<br />
<br />
The original purpose of the SUMO-Toolbox was to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will just have to be more careful about how you apply the methods and how you perform model selection. It's only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.<br />
<br />
Note though, that the toolbox is not a statistical package, if you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc. you should look towards other tools.<br />
<br />
=== What is the difference between a ModelBuilder and a ModelFactory? ===<br />
<br />
See [[Add Model Type]].<br />
<br />
=== Why are the Neural Networks so slow? ===<br />
<br />
The ANN models are an extremely powerful model type that give very good results in many problems. However, they are quite slow to use. There are some things you can do:<br />
<br />
* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.<br />
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.<br />
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.<br />
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.<br />
* If you are using [[Measures#CrossValidation| CrossValidation]] try to switch to a different measure, since CrossValidation is very expensive to use. Note that CrossValidation is used by default if you have not defined a [[Measures| measure]] yourself. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.<br />
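<br />
In plain Matlab Neural Network Toolbox terms, the first two suggestions above amount to something like the following sketch (inside SUMO these settings are controlled through the configuration file rather than by hand):<br />
<br />
<source lang="matlab"><br />
net = feedforwardnet(10);      % a small feed-forward network<br />
net.trainFcn = 'trainscg';     % faster than 'trainbr'<br />
net.trainParam.goal = 1e-5;    % small positive training goal instead of 0<br />
</source><br />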
<br />
See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]<br />
<br />
=== How can I make the toolbox run faster? ===<br />
<br />
There are a number of things you can do to speed things up. These are listed below. Remember though that the main reason the toolbox may seem to be slow is due to the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.<br />
<br />
* First of all check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates this really slows things down and your computer becomes unusable.<br />
<br />
* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files<br />
<br />
* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between two sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the genetic model builders reduce the population size and the number of generations.<br />
<br />
* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])<br />
<br />
* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.<br />
<br />
* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true"><br />
<Output type="toImage"/><br />
<Output type="toFile"/><br />
</Profiler><br />
<br />
<Profiler name=".*" enabled="true"><br />
<Output type="toFile"/><br />
</Profiler><br />
</Profiling><br />
</source><br />
<br />
The ".*" is a regular expression that matches any sequence of characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full syntax]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name should be enabled and saved as a text file (toFile) AND as an image file (toImage). All the other profilers are saved just to file. The idea is to only save to image what you really want as an image, since image generation is expensive. If you do this, or switch off image generation completely, you will see everything run much faster.<br />
<br />
* Decrease the logging granularity: a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.<br />
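<br />
As an illustrative sketch only (the exact tag and option names for controlling the log level may differ in your toolbox version; compare with the <Logging> section of your default.xml):<br />
<br />
<source lang="xml"><br />
<Logging><br />
  <!-- hypothetical syntax, check default.xml for the exact form --><br />
  <Option key="level" value="FINE"/><br />
</Logging><br />
</source><br />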
<br />
* If you have a multi-core/multi-cpu machine:<br />
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. All model training will then occur in parallel. This may give unexpected errors in some cases, so be careful when using it.<br />
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)<br />
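<br />
For example, a sketch of the threadCount setting (the enclosing tag is abbreviated and the type attribute is an assumption; see [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] for the exact syntax):<br />
<br />
<source lang="xml"><br />
<SampleEvaluator type="local"><br />
  <!-- assumed: one thread per core; only safe if your script can run multiple instances in parallel --><br />
  <Option key="threadCount" value="4"/><br />
</SampleEvaluator><br />
</source><br />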
<br />
* Don't use the Min-Max measure; it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]<br />
<br />
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]<br />
<br />
* If you are having problems with very slow or seemingly hanging runs:<br />
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html Matlab profiler] and see where most time is spent.<br />
<br />
** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange. <br />
<br />
* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].<br />
<br />
To check if your SUMO run has hung, monitor your log file (with the level set at least to FINE). If you see no changes for about 30 minutes the toolbox has probably stalled. [[Reporting problems|Report the problems here]].<br />
<br />
Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.<br />
<br />
=== How do I build models with more than one output ===<br />
<br />
Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===<br />
<br />
See : [[Adaptive Modeling Mode]].<br />
<br />
=== How do I change the error function (relative error, RMSE, ...)? ===<br />
<br />
The [[Measures| <Measure>]] tag specifies the algorithm used to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' the measure should use. The default error function is '<code>rootRelativeSquareError</code>'.<br />
<br />
Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/><br />
</source><br />
<br />
On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/><br />
</source><br />
<br />
These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important, so choose it carefully! Also see [[Multi-Objective Modeling]].<br />
<br />
=== How do I enable more profilers? ===<br />
<br />
Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.<br />
<br />
=== What regular expressions can I use to filter profilers? ===<br />
<br />
See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].<br />
<br />
=== How can I ensure deterministic results? ===<br />
<br />
See : [[Random state]].<br />
<br />
=== How do I get a simple closed-form model (symbolic expression)? ===<br />
<br />
See : [[Using a model]].<br />
<br />
=== How do I enable the heterogeneous evolution to automatically select the best model type? ===<br />
<br />
Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.<br />
<br />
=== What is the combineOutputs option? ===<br />
<br />
See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== What error function should I use? ===<br />
<br />
The default error function is the Root Relative Square Error (RRSE). The meanRelativeError may be more intuitive, but you have to be careful if you have function values close to zero, since there the relative error explodes or even becomes infinite. You could also use one of the combined relative error functions (these contain a +1 in the denominator to account for small values) but then you get something between a relative and an absolute error (=> hard to interpret).<br />
<br />
So an absolute error such as the RMSE seems the safest bet; however, in that case you have to come up with sensible accuracy targets and realize that you will build models that fit the regions of high absolute value better than the low ones.<br />
<br />
Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.<br />
<br />
A recommended read [http://www.springerlink.com/content/24104526223221u3/ is this paper]. See also the page on [[Multi-Objective Modeling]].<br />
<br />
=== I just want to generate an initial design (no sampling, no modeling) ===<br />
<br />
Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.<br />
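<br />
A sketch of such a configuration (whether 'maxModelingIterations' is an XML attribute or an <Option> child of the <SUMO> tag may depend on your toolbox version; compare with your default.xml):<br />
<br />
<source lang="xml"><br />
<SUMO maxModelingIterations="0"><br />
  <!-- rest of the SUMO configuration unchanged --><br />
</SUMO><br />
</source><br />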
<br />
=== How do I start a run with the samples of a previous run, or with a custom initial design? ===<br />
<br />
Use a Dataset design component, for example:<br />
<br />
<source lang="xml"><br />
<InitialDesign type="DatasetDesign"><br />
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/><br />
</InitialDesign><br />
</source><br />
<br />
=== What is a level plot? ===<br />
<br />
A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:<br />
<gallery><br />
Image:levelplot.png<br />
</gallery><br />
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.<br />
<br />
===I am getting a java out of memory error, what happened?===<br />
Datasets are loaded through java. This means that the java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might experience problems with the maximum heap size. You can solve this by raising the heap size as described on the following webpage:<br />
[http://www.mathworks.com/support/solutions/data/1-18I2C.html]<br />
<br />
=== How do I force the output of the model to lie in a certain range ===<br />
<br />
See [[Measures#MinMax]].<br />
<br />
=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===<br />
<br />
That depends. Remember that the main focus of SUMO is to generate accurate 'global' models. If you want to do adaptive sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and on how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. Beyond that you need to see if you can extend the models with domain-specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you don't need sample selection but have a fixed dataset you want to model, then the performance on high dimensional data just depends on the model type. For example, SVM-type models are independent of the dimension and can always be applied. Still, things like feature selection are always recommended.<br />
<br />
=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===<br />
<br />
When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question but this is not always straightforward. In those cases a whole set of algorithms exist for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox but are not currently implemented. That is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.<br />
<br />
=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===<br />
<br />
When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that can take the requested points from SUMO, setup the simulation engine (e.g., set necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).<br />
<br />
Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable + [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.<br />
<br />
HOWEVER, there is one important consideration: Matlab does not support threads, so if you use a Matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. The modeling code will sit idle until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient. In version 6.2 we will probably fix this by using the Parallel Computing Toolbox.<br />
<br />
On the other hand, using a shell script/executable, does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).<br />
<br />
As a side note, if you have already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. It is not very elegant and adds some overhead, but depending on your situation it may be worth it.<br />
<br />
=== Is there any design documentation available? ===<br />
<br />
There is a PhD thesis fully describing the software architecture and design rationale behind the toolbox. It will be put online in the future. Until then you can [[Contact]] us to obtain a copy.<br />
<br />
== Troubleshooting ==<br />
<br />
=== I have a problem and I want to report it ===<br />
<br />
See : [[Reporting problems]].<br />
<br />
=== I sometimes get flat models when using rational functions ===<br />
<br />
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.<br />
<br />
The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. We have experienced that sometimes a flat model is simply the best least squares fit. There are several causes for this:<br />
<br />
* The number of sample points is small, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always interpolate exactly. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing <code>"percentBounds"</code>, you still get weird, spiky models, you simply need more samples or you should switch to a different model type.<br />
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.<br />
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.<br />
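<br />
An illustrative sketch of the percentBounds fix mentioned above (the enclosing RationalFactory tag is abbreviated here; check its section in your configuration file for the exact form):<br />
<br />
<source lang="xml"><br />
<!-- force exact interpolation; set "maxDegrees" as well to keep this scalable --><br />
<Option key="percentBounds" value="100,100"/><br />
</source><br />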
<br />
=== When using rational functions I sometimes get 'spikes' (poles) in my model ===<br />
<br />
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.<br />
<br />
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.<br />
<br />
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:<br />
<br />
* Simply wait for more data, usually spikes disappear (but not always).<br />
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.<br />
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.<br />
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.<br />
* Increase the population size if using the genetic version<br />
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]], it was designed to get rid of this problem more quickly, but it only selects one sample at a time.<br />
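<br />
A hedged sketch of combining the MinMax measure with the default measure (how the output bounds are passed is described in [[Measures#MinMax| MinMax]]; only the combination of the two <Measure> tags is shown here):<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001"/><br />
<Measure type="MinMax"/><br />
</source><br />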
<br />
However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time; automatic order selection is not an easy problem.<br />
<br />
=== There is no noise in my data yet the rational functions don't interpolate ===<br />
<br />
[[FAQ#I sometimes get flat models when using rational functions |see this question]].<br />
<br />
=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===<br />
<br />
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path. <br />
<br />
In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.<br />
<br />
=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===<br />
<br />
This means you have specified a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which ids are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.<br />
<br />
=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===<br />
<br />
This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files; the Matlab implementation will then be used and this will not cause any problems.<br />
<br />
=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===<br />
<br />
This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure it is in your library path, i.e., on Unix systems, make sure it is included in LD_LIBRARY_PATH.<br />
<br />
=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===<br />
<br />
You forgot to build the SVM mex files for your platform. For Windows they are pre-compiled; on other systems you have to compile them yourself with the makefile.<br />
<br />
=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===<br />
<br />
First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].<br />
<br />
This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.<br />
<br />
If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:<br />
<br />
# typing 'usejava jvm' should return 1 <br />
# typing 's = java.lang.String', this should ''not'' give an error<br />
# typing 'version('-java')' should return at least version 1.5.0<br />
<br />
If (1) returns 0, then the jvm of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?)<br />
If (2) fails but (1) is ok, there is a very weird problem, check the Matlab documentation.<br />
If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer, jvm (See the Matlab docs for how to do this).<br />
<br />
=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===<br />
<br />
You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.<br />
<br />
=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===<br />
<br />
See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]<br />
<br />
=== I get a warning about duplicate samples being selected, why is this? ===<br />
<br />
Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.<br />
<br />
=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===<br />
<br />
There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].<br />
<br />
# '''Sampling off'''<br />
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does [[reporting problems|report it as a bug]]<br />
## ''Multi objective'': There is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi-objective enabled modeling algorithm.<br />
# '''Sampling on'''<br />
## ''Single objective'': inside each modeling iteration the error should always decrease monotonically. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase; this is normal behavior (*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the data points is causing numerical problems).<br />
## ''Multi objective'': Combination of 1.2 and 2.1.<br />
<br />
(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!<br />
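<br />
A sketch of that override (whether 'keepOldModels' is an attribute or an <Option> child of the <SUMO> tag may differ per toolbox version; compare with your default.xml):<br />
<br />
<source lang="xml"><br />
<SUMO keepOldModels="true"><br />
  <!-- rest of the configuration unchanged --><br />
</SUMO><br />
</source><br />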
<br />
=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===<br />
<br />
This is normal, the warning printed out before the error explains why:<br />
<br />
''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''<br />
<br />
By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".<br />
Note that there is nothing to worry about: everything has run correctly, it is just the movie creation that failed.<br />
<br />
=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===<br />
<br />
This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox sub-directory! This is very important.<br />
<br />
If you still have problems you can override the default logfile name and location as follows:<br />
<br />
In the <FileHandler> tag inside the <Logging> tag add the following option:<br />
<br />
<code><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</code><br />
<br />
This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.<br />
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].<br />
<br />
=== The Toolbox crashes with "Too many open files" what should I do? ===<br />
<br />
This is a known bug, see [[Known_bugs#Version_6.1]].<br />
<br />
If this does not fix your problem then do the following:<br />
<br />
On Windows try increasing the limit as dictated by the error message. Also, when you get the error, use the fopen('all') command to see which files are open and send us the list of filenames; then we can help you debug the problem further. Even better is to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can debug the problem further.<br />
<br />
On Linux again don't shut down Matlab but:<br />
<br />
* open a new terminal window<br />
* type:<br />
<source lang="bash"><br />
lsof > openFiles.txt<br />
</source><br />
* Then [[Contact|send us]] the following information:<br />
** the file openFiles.txt <br />
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).<br />
** the output of<br />
<source lang="bash"><br />
uname -a ; df -T ; mount<br />
</source><br />
<br />
As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.<br />
<br />
In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.<br />
<br />
=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===<br />
<br />
The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files the Matlab implementation is used and a warning is given, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]] (this can be done very easily), or simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.<br />
<br />
=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===<br />
<br />
You most likely forgot to [[Installation#Extension_pack|install the extension pack]].<br />
<br />
=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===<br />
<br />
Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics, your system freezes completely (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; you just have to log in again since your graphical session has terminated. This FAQ entry only covers the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system administrator.<br />
<br />
What happens is that after a few seconds, when the toolbox wants to plot the first model, [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - display server interaction.<br />
<br />
What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:<br />
<br />
* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])<br />
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)<br />
<br />
<source lang="bash"><br />
/etc/X11/xorg.conf<br />
</source><br />
<br />
Note: the exact location of the xorg.conf file may vary on your system.<br />
<br />
* Look for the following line:<br />
<br />
<source lang="bash"><br />
Load "glx"<br />
</source><br />
<br />
* Comment it out by replacing it by:<br />
<br />
<source lang="bash"><br />
# Load "glx"<br />
</source><br />
<br />
* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)<br />
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].<br />
<br />
Note:<br />
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]<br />
* if you wish to debug further yourself please check the Xorg log files and those in /var/log<br />
* another possible workaround is to start Matlab with the "-nodisplay" option. That could work as well.<br />
<br />
=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===<br />
<br />
This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.<br />
<br />
=== The toolbox seems to keep on running forever, when or how will it stop? ===<br />
<br />
The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].<br />
<br />
Of course choosing targets a priori is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). If in doubt you can always use a small value (or 0) and then simply quit the running toolbox using Ctrl-C when you think it has run long enough.<br />
<br />
While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.<br />
<br />
=== What about surrogate driven optimization? ===<br />
<br />
When coining the term '''surrogate driven optimization''' most people associate it with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate which is optimized to find an optimum. Afterwards, a move limit strategy decides how the local surrogate is scaled and/or moved through the input space. Subsequently the surrogate is rebuild and optimized. I.e. the surrogate zooms in to the global optimum. For instance the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies where the surrogate construction is separated from optimization.<br />
<br />
Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox. <br />
<br />
Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sample selection strategy that balances local and global search. In other words, it balances exploring the input space against exploiting the information the surrogate gives us.<br />
<br />
A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].<br />
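<br />
As an illustration of this exploration/exploitation balance, the well-known expected improvement criterion can be sketched in a few lines. This is the generic textbook formula in Python, not SUMO's actual implementation:<br />

```python
import math

def expected_improvement(mu, sigma, fmin):
    """Expected improvement at a candidate point where the surrogate
    predicts mean mu and standard deviation sigma, given the best
    (lowest) objective value fmin observed so far."""
    if sigma <= 0.0:
        return max(fmin - mu, 0.0)  # no model uncertainty left
    z = (fmin - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    # first term rewards predicted improvement, second rewards uncertainty
    return (fmin - mu) * cdf + sigma * pdf

# A point predicted slightly worse than fmin but very uncertain can score
# higher than a point predicted slightly better but known accurately:
print(expected_improvement(mu=1.2, sigma=2.0, fmin=1.0) >
      expected_improvement(mu=0.9, sigma=0.01, fmin=1.0))  # True
```

Maximizing such a criterion over the input space yields the next sample location: both high predicted quality (exploitation) and high model uncertainty (exploration) raise the score.<br />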
<br />
=== What is (adaptive) sampling? Why is it used? ===<br />
<br />
In classical Design of Experiments you need to specify the design of your experiment up front: you have to say in advance how many data points you need and how they should be distributed. Two examples are Central Composite designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., by an expensive simulation code) it is not clear up front how many points are needed. Instead, data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in the regions that are most interesting is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere, so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].<br />
SUMO provides a number of different sampling algorithms: [[SampleSelector]]<br />
<br />
Of course sometimes you don't want to do sampling. For example, if you have a fixed dataset you just want to load all the data in one go and model that. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].<br />
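<br />
To make the incremental idea concrete, here is a toy sketch of such a loop in Python (purely illustrative; SUMO's [[SampleSelector]] algorithms are far more sophisticated). It starts from a small fixed design and then repeatedly adds one point in the middle of the largest gap between existing samples:<br />

```python
def adaptive_sampling(simulate, lo, hi, n_initial=4, budget=10):
    """Toy 1-D adaptive sampling loop: start from a small fixed design,
    then repeatedly add a point in the middle of the largest gap between
    existing samples until the evaluation budget is spent."""
    # initial fixed design (the equivalent of a classic DoE)
    xs = [lo + i * (hi - lo) / (n_initial - 1) for i in range(n_initial)]
    data = {x: simulate(x) for x in xs}
    while len(data) < budget:
        pts = sorted(data)
        # pick the largest gap as a crude "most interesting region"
        gaps = [(b - a, a, b) for a, b in zip(pts, pts[1:])]
        _, a, b = max(gaps)
        x_new = (a + b) / 2.0
        data[x_new] = simulate(x_new)  # one expensive evaluation
    return data

samples = adaptive_sampling(lambda x: x * x, lo=-1.0, hi=1.0, budget=9)
print(len(samples))  # 9 evaluations in total
```

A real sample selector would of course use model information (error estimates, gradients, uncertainty) instead of only the geometry of the points.<br />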
<br />
=== What about dynamical, time dependent data? ===<br />
<br />
The original design and purpose was to tackle static input-output systems, where there is no memory: just a complex mapping that must be learned and approximated. Of course you can take a fixed time interval and apply the toolbox, but that typically is not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1, k+2, ...<br />
<br />
The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.<br />
<br />
Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].<br />
<br />
For more information on this topic [[Contact]] us.<br />
<br />
=== What about classification problems? ===<br />
<br />
The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].<br />
<br />
=== Can the toolbox drive my simulation code directly? ===<br />
<br />
Yes it can. See the [[Interfacing with the toolbox]] page.<br />
<br />
=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===<br />
<br />
The SUMO Toolbox is a complete, full-featured framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof of principle.<br />
<br />
=== What happened to the M3-Toolbox? ===<br />
<br />
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.<br />
<br />
=== How can I stay up to date with the latest news? ===<br />
<br />
To stay up to date with the latest news and releases, we recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== What is the roadmap for the future? ===<br />
<br />
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== Will there be an R/Scilab/Octave/Sage/.. version? ===<br />
<br />
At the start of the project we considered moving from Matlab to one of the available open source alternatives. However, after much discussion we decided against this for several reasons, including:<br />
<br />
* The quality and amount of available Matlab documentation<br />
* The quality and number of Matlab toolboxes<br />
* Many well documented interfacing options (especially the seamless integration with Java)<br />
* Existing experience and know-how of the development team<br />
* The widespread use of the Matlab platform in the target application domains<br />
<br />
Matlab, as a proprietary platform, certainly has its problems and deficiencies, but the number of advanced algorithms and available toolboxes make it a very attractive platform. Equally important is the fact that every function is properly documented, tested, and includes examples, tutorials, and in some cases GUI tools. A lot of things would have been much harder and/or more time-consuming to implement on one of the other platforms. Add to that the fact that many engineers (particularly in aerospace) already use Matlab quite heavily. Thus, given our situation, goals, and resources at the time, Matlab was the best choice for us. <br />
<br />
The other platforms remain on our radar however, and we do look into them from time to time. Though, with our limited resources porting to one of those platforms is not (yet) cost effective.<br />
<br />
=== What are collaboration options? ===<br />
<br />
We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.<br />
<br />
The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to optimally help you it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.<br />
<br />
If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.<br />
<br />
In either case (code or dataset) remember:<br />
<br />
* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])<br />
* include a short description of your data:<br />
** number of inputs and number of outputs<br />
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)<br />
** if the outputs are real or complex valued<br />
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).<br />
** if possible the expected range of each output (or scaled if you do not wish to disclose this)<br />
** if possible the names of each input/output + a short description of what they mean<br />
** any further insight you have about the data, expected behavior, expected importance of each input, etc.<br />
<br />
If you have any further questions or comments related to this please [[Contact]] us.<br />
<br />
=== Can you help me model my problem? ===<br />
<br />
Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]<br />
<br />
== Installation and Configuration ==<br />
<br />
=== What is the relationship between Matlab and Java? ===<br />
<br />
Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable Java support, but in order to use the SUMO Toolbox it must be left enabled. To check whether Java is enabled you can use the 'usejava' command.<br />
<br />
=== What is Java, why do I need it, do I have to install it, etc. ? ===<br />
<br />
The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language instead of a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].<br />
<br />
=== What is XML? ===<br />
<br />
XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.<br />
<br />
The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:<br />
<br />
<source lang="xml"><br />
<Foo attr="bar">bla bla bla</Foo> <br />
</source><br />
<br />
Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you, you choose :)<br />
<br />
Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:<br />
<br />
<source lang="xml"><br />
<recipe category="dessert"><br />
<title>Pancakes</title><br />
<author>sumo@intec.ugent.be</author><br />
<date>Wed, 14 Jun 95</date><br />
<description><br />
Good old fashioned pancakes.<br />
</description><br />
<ingredients><br />
<item><br />
<amount>3</amount><br />
<type>eggs</type><br />
</item><br />
<br />
<item><br />
<amount>0.5 tablespoon</amount><br />
<type>salt</type><br />
</item><br />
...<br />
</ingredients><br />
<preparation><br />
...<br />
</preparation><br />
</recipe><br />
</source><br />
<br />
So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO simply uses it to store and structure configuration options, and this works well due to the nice hierarchical nature of XML.<br />
<br />
If you understand this there is nothing else to it in order to be able to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]<br />
<br />
=== Why does SUMO use XML? ===<br />
<br />
XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it is an intuitive way to structure data and it is used everywhere. As a result there are a very large number of libraries and programming languages available that can parse and handle XML easily. That means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.<br />
So basically, it would not make sense for SUMO to use any other format :)<br />
<br />
=== I get an error that SUMO is not yet activated ===<br />
<br />
Make sure you installed the activation file that was mailed to you as is explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [http://www.sumowiki.intec.ugent.be/index.php/FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27 Java is enabled]. To fully verify that the activation file installation is correct, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.<br />
<br />
Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].<br />
<br />
== Upgrading ==<br />
<br />
=== How do I upgrade to a newer version? ===<br />
<br />
Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.<br />
<br />
If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.<br />
<br />
== Using ==<br />
<br />
=== I have no idea how to use the toolbox, what should I do? ===<br />
<br />
See: [[Running#Getting_started]]<br />
<br />
=== I want to try one of the different examples ===<br />
<br />
See [[Running#Running_different_examples]].<br />
<br />
=== I want to model my own problem ===<br />
<br />
See : [[Adding an example]].<br />
<br />
=== I want to contribute some data/patch/documentation/... ===<br />
<br />
See : [[Contributing]].<br />
<br />
=== How do I interface with the SUMO Toolbox? ===<br />
<br />
See : [[Interfacing with the toolbox]].<br />
<br />
=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===<br />
<br />
See [[General_guidelines]].<br />
<br />
=== Ok, I generated a model, what can I do with it? ===<br />
<br />
See: [[Using a model]].<br />
<br />
=== How can I share a model created by the SUMO Toolbox? ===<br />
<br />
See : [[Using a model#Model_portability| Model portability]].<br />
<br />
=== I dont like the final model generated by SUMO how do I improve it? ===<br />
<br />
Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor model. Then you should rank those properties depending on how important you find them. Examples are:<br />
<br />
* accuracy in the training data<br />
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing<br />
* accuracy outside the training data<br />
** this is the validation or test error, how important is proper generalization (usually this is very important)<br />
* what does accuracy mean to you? a low maximum error, a low average error, both, ...<br />
* smoothness<br />
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example<br />
* are some regions of the response more important than others?<br />
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important<br />
* are there particular special features that your model should have<br />
** for example, capture underlying poles or discontinuities correctly<br />
* extrapolation capability<br />
* ...<br />
<br />
It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].<br />
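<br />
As a small illustration of working with conflicting criteria, one generic approach is to keep only the models that no other model beats on every criterion at once (a Pareto filter). This is an illustrative Python sketch with made-up model names and scores, not the toolbox's actual selection code:<br />

```python
def pareto_front(models):
    """Return the names of models whose score tuples (lower is better)
    are not dominated: no other model is at least as good on every
    criterion and different (i.e. strictly better somewhere)."""
    front = []
    for name, scores in models.items():
        dominated = any(
            oname != name
            and all(o <= s for o, s in zip(other, scores))
            and other != scores
            for oname, other in models.items()
        )
        if not dominated:
            front.append(name)
    return sorted(front)

# hypothetical candidate models scored as (training error, validation error)
candidates = {
    "overfit":  (0.00, 0.90),  # zero training error, poor generalization
    "smooth":   (0.10, 0.20),
    "balanced": (0.05, 0.30),
    "poor":     (0.20, 0.40),  # beaten by "smooth" on both criteria
}
print(pareto_front(candidates))  # → ['balanced', 'overfit', 'smooth']
```

Note that the noisy-data trade-off above shows up directly here: the "overfit" model survives the filter on its perfect training error alone, which is exactly why generalization criteria matter.<br />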
<br />
Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].<br />
<br />
Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:<br />
<br />
* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.<br />
* you should switch to a different model parameter optimization algorithm (for example, instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)<br />
* the model type you are using is not ideally suited to your data<br />
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension<br />
* maybe the sample distribution is causing trouble for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.<br />
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very small)<br />
<br />
You may also refer to the following [[General_guidelines]]. Finally, of course it may be that your problem is simply a very difficult one and does not approximate well. But still, you should at least get something satisfactory.<br />
<br />
If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.<br />
<br />
=== My data contains noise can the SUMO-Toolbox help me? ===<br />
<br />
The original purpose of the SUMO-Toolbox was for it to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will just have to be more careful about how you apply the methods and how you perform model selection. It's only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.<br />
<br />
Note though, that the toolbox is not a statistical package, if you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc. you should look towards other tools.<br />
<br />
=== What is the difference between a ModelBuilder and a ModelFactory? ===<br />
<br />
See [[Add Model Type]].<br />
<br />
=== Why are the Neural Networks so slow? ===<br />
<br />
The ANN models are an extremely powerful model type that give very good results in many problems. However, they are quite slow to use. There are some things you can do:<br />
<br />
* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.<br />
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.<br />
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.<br />
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.<br />
* If you are using [[Measures#CrossValidation| CrossValidation]] try to switch to a different measure, since CrossValidation is very expensive to use. CrossValidation is used by default if you have not defined a [[Measures| measure]] yourself. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.<br />
<br />
See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]<br />
<br />
=== How can I make the toolbox run faster? ===<br />
<br />
There are a number of things you can do to speed things up. These are listed below. Remember though that the main reason the toolbox may seem to be slow is due to the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.<br />
<br />
* First of all check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates this really slows things down and your computer becomes unusable.<br />
<br />
* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files<br />
<br />
* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between 2 sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the genetic model builders reduce the population size and the number of generations.<br />
<br />
* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])<br />
<br />
* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.<br />
<br />
* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true"><br />
<Output type="toImage"/><br />
<Output type="toFile"/><br />
</Profiler><br />
<br />
<Profiler name=".*" enabled="true"><br />
<Output type="toFile"/><br />
</Profiler><br />
</Profiling><br />
</source><br />
<br />
The ".*" means match any sequence of characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name should be enabled and saved as a text file (toFile) AND as an image file (toImage). All the other profilers should be saved just to file. The idea is to only save to image what you want as an image, since image generation is expensive. If you do this or switch off image generation completely you will see everything run much faster.<br />
<br />
* Decrease the logging granularity; a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.<br />
<br />
* If you have a multi-core/multi-cpu machine:<br />
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.<br />
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)<br />
<br />
* Don't use the Min-Max measure, it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]<br />
<br />
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]<br />
<br />
* If you are having problems with very slow or seemingly hanging runs:<br />
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html Matlab profiler] and see where most time is spent.<br />
<br />
** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange. <br />
<br />
* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].<br />
<br />
To check whether your SUMO run has hung, monitor your log file (with the level set to at least FINE). If you see no changes for about 30 minutes the toolbox has probably stalled. [[Reporting problems| Report the problems here]].<br />
<br />
Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.<br />
<br />
=== How do I build models with more than one output ===<br />
<br />
Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===<br />
<br />
See : [[Adaptive Modeling Mode]].<br />
<br />
=== How do I change the error function (relative error, RMSE, ...)? ===<br />
<br />
The [[Measures| <Measure>]] tag specifies the algorithm used to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' to use in the measure. The default error function is '<code>rootRelativeSquareError</code>'.<br />
<br />
Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/><br />
</source><br />
<br />
On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/><br />
</source><br />
<br />
These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory; you are free to modify them and add your own. Remember that the choice of error function is very important, so think carefully about it. Also see [[Multi-Objective Modeling]].<br />
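As an illustration, a custom error function could look like the following sketch (the function name and exact signature here are assumptions; follow the existing files in <code>src/matlab/tools/errorFunctions</code> for the real convention):<br />
<br />
<source lang="matlab"><br />
% Hypothetical custom error function: largest absolute deviation<br />
% between the true values and the model predictions.<br />
function err = myMaxAbsoluteError(trueValues, predictedValues)<br />
    err = max(abs(trueValues(:) - predictedValues(:)));<br />
</source><br />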
<br />
=== How do I enable more profilers? ===<br />
<br />
Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.<br />
<br />
=== What regular expressions can I use to filter profilers? ===<br />
<br />
See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].<br />
<br />
=== How can I ensure deterministic results? ===<br />
<br />
See : [[Random state]].<br />
<br />
=== How do I get a simple closed-form model (symbolic expression)? ===<br />
<br />
See : [[Using a model]].<br />
<br />
=== How do I enable the Heterogenous evolution to automatically select the best model type? ===<br />
<br />
Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.<br />
<br />
=== What is the combineOutputs option? ===<br />
<br />
See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== What error function should I use? ===<br />
<br />
The default error function is the Root Relative Square Error (RRSE). The meanRelativeError may be more intuitive, but then you have to be careful if you have function values close to zero, since the relative error explodes or even becomes infinite there. You could also use one of the combined relative error functions (which contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error, which is hard to interpret.<br />
<br />
So to be safe, an absolute error (like the RMSE) seems the best bet; however, in that case you have to come up with sensible accuracy targets, and realize that the models will fit regions of high absolute value better than regions of low absolute value.<br />
<br />
Picking an error function is a very tricky business, and many people do not realize this. Which one is best for you, and which targets you use, ultimately depends on your application and on what kind of model you want. There is no general answer.<br />
<br />
A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].<br />
<br />
=== I just want to generate an initial design (no sampling, no modeling) ===<br />
<br />
Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.<br />
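In the configuration file this could look like the following sketch (placement and any other attributes or options of the <SUMO> tag follow your own configuration; only the relevant option is shown):<br />
<br />
<source lang="xml"><br />
<SUMO><br />
  <!-- only generate and evaluate the initial design, no modeling --><br />
  <Option key="maxModelingIterations" value="0"/><br />
</SUMO><br />
</source><br />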
<br />
=== How do I start a run with the samples of a previous run, or with a custom initial design? ===<br />
<br />
Use a Dataset design component, for example:<br />
<br />
<source lang="xml"><br />
<InitialDesign type="DatasetDesign"><br />
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/><br />
</InitialDesign><br />
</source><br />
<br />
=== What is a level plot? ===<br />
<br />
A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:<br />
<gallery><br />
Image:levelplot.png<br />
</gallery><br />
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.<br />
<br />
===I am getting a java out of memory error, what happened?===<br />
Datasets are loaded through Java, which means the Java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might run into the maximum heap size. You can solve this by raising the heap size as described in [http://www.mathworks.com/support/solutions/data/1-18I2C.html these MathWorks instructions].<br />
<br />
=== How do I force the output of the model to lie in a certain range? ===<br />
<br />
See [[Measures#MinMax]].<br />
<br />
=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===<br />
<br />
That depends. Remember that the main focus of SUMO is to generate accurate ''global'' models. If you want to do adaptive sampling, the practical dimensionality is limited to around 6-8 (though it depends on the problem and on how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. Beyond that, you need to see if you can extend the models with domain specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you do not need sample selection but have a fixed dataset you want to model, then the performance on high dimensional data depends only on the model type. For example, SVM type models are independent of the dimension and can thus always be applied, though feature selection is still recommended.<br />
<br />
=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===<br />
<br />
When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question, but this is not always straightforward. For those cases a whole set of algorithms exists for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox, but they are not currently implemented; that is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination, the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.<br />
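For example (a hedged sketch; the file path and variable name are assumptions about what your particular run produced):<br />
<br />
<source lang="matlab"><br />
% Hypothetical model file produced by a SUMO run with LS-SVM models<br />
load('output/lssvm_run/best_model.mat');   % assume it loads a variable 'model'<br />
ranking = ARD(model)                       % inputs deemed most relevant<br />
</source><br />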
<br />
=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===<br />
<br />
When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that takes the requested points from SUMO, sets up the simulation engine (e.g., writes the necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).<br />
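As a minimal sketch of such a script (the one-point-per-line file convention and the toy x*x + y "simulator" are assumptions; see [[Interfacing with the toolbox]] for the real I/O convention):<br />
<br />
<source lang="bash"><br />
run_points() {<br />
  # Read one input point (two parameters x and y) per line from the<br />
  # file given as $1, run the "simulator" on it, and print one result<br />
  # per line. The awk command is a stand-in for a real simulation<br />
  # binary; replace it with a call to your own code.<br />
  while read -r x y; do<br />
    awk -v x="$x" -v y="$y" 'BEGIN { print x * x + y }'<br />
  done < "$1"<br />
}<br />
</source><br />
<br />
Because each invocation only touches its own input file and standard output, several instances can safely run in parallel.<br />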
<br />
Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable + [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.<br />
<br />
HOWEVER, there is one important consideration: Matlab does not support threads, so if you use a Matlab script to interface with the simulation engine, simulations and modeling will happen sequentially, NOT in parallel. The modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient. In version 6.2 we will probably fix this by using the Parallel Computing Toolbox.<br />
<br />
On the other hand, using a shell script/executable does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., no shared global directories or variables that can cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).<br />
<br />
As a side note, if you already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. It is not very elegant and adds some overhead, but depending on your situation it may be worth it.<br />
<br />
=== Is there any design documentation available? ===<br />
<br />
There is a PhD thesis fully describing the software architecture and design rationale behind the toolbox. It will be put online in the future. Until then you can [[Contact]] us to obtain a copy.<br />
<br />
== Troubleshooting ==<br />
<br />
=== I have a problem and I want to report it ===<br />
<br />
See : [[Reporting problems]].<br />
<br />
=== I sometimes get flat models when using rational functions ===<br />
<br />
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.<br />
<br />
The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. We have experienced that for some problems a flat model simply is the best least squares fit. There are several causes for this:<br />
<br />
* The number of sample points is small, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small number of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always interpolate exactly. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code>, you still get weird, spiky models, you simply need more samples or you should switch to a different model type.<br />
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.<br />
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.<br />
<br />
=== When using rational functions I sometimes get 'spikes' (poles) in my model ===<br />
<br />
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases such models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.<br />
<br />
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.<br />
<br />
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:<br />
<br />
* Simply wait for more data, usually spikes disappear (but not always).<br />
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.<br />
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.<br />
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.<br />
* Increase the population size if using the genetic version<br />
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]]; it was designed to get rid of this problem more quickly, but it only selects one sample at a time.<br />
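The degree-capping options above can be sketched as follows (values are illustrative; the options go in the RationalFactory section of your configuration file):<br />
<br />
<source lang="xml"><br />
<!-- cap the polynomial degrees at 10 percent of the number of samples,<br />
     with an absolute maximum degree of 50 --><br />
<Option key="percentBounds" value="0,10"/><br />
<Option key="maxDegrees" value="50"/><br />
</source><br />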
<br />
However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time; automatic order selection is not an easy problem.<br />
<br />
=== There is no noise in my data yet the rational functions don't interpolate ===<br />
<br />
[[FAQ#I sometimes get flat models when using rational functions |see this question]].<br />
<br />
=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===<br />
<br />
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path. <br />
<br />
In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.<br />
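As a sketch (the toolbox and model paths are assumptions; use your own locations):<br />
<br />
<source lang="matlab"><br />
cd('/path/to/SUMO-Toolbox')   % toolbox root directory<br />
startup                       % puts the toolbox on the Matlab path<br />
load('mymodel.mat')           % now the model class is known<br />
</source><br />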
<br />
=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===<br />
<br />
This means you have specified to use a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso'), but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which ids are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.<br />
<br />
=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===<br />
<br />
This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files, the Matlab implementation will be used and this will not cause any problems.<br />
<br />
=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===<br />
<br />
This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure it is in your library path, i.e., on Unix systems, make sure it is included in LD_LIBRARY_PATH.<br />
<br />
=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===<br />
<br />
You forgot to build the SVM mex files for your platform. For windows they are pre-compiled for you, on other systems you have to compile them yourself with the makefile.<br />
<br />
=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===<br />
<br />
First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].<br />
<br />
This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.<br />
<br />
If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:<br />
<br />
# typing 'usejava jvm' should return 1 <br />
# typing 's = java.lang.String', this should ''not'' give an error<br />
# typing 'version('-java')' should return at least version 1.5.0<br />
<br />
If (1) returns 0, then the jvm of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?)<br />
If (2) fails but (1) is ok, there is a very weird problem, check the Matlab documentation.<br />
If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer, jvm (See the Matlab docs for how to do this).<br />
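The three checks can be typed directly at the Matlab prompt:<br />
<br />
<source lang="matlab"><br />
usejava('jvm')                 % should return 1<br />
s = java.lang.String('test')   % should not give an error<br />
version('-java')               % should report at least version 1.5.0<br />
</source><br />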
<br />
=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===<br />
<br />
You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.<br />
<br />
=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===<br />
<br />
See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]<br />
<br />
=== I get a warning about duplicate samples being selected, why is this? ===<br />
<br />
Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.<br />
<br />
=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===<br />
<br />
There is no short answer here, it depends on the situation. Below 'single objective' refers to the case where during the hyperparameter optimization (= the modeling iteration) combineOutputs=false, and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].<br />
<br />
# '''Sampling off'''<br />
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does [[reporting problems|report it as a bug]]<br />
## ''Multi objective'': There is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi-objective enabled modeling algorithm<br />
# '''Sampling on'''<br />
## ''Single objective'': inside each modeling iteration the error should always decrease monotonically. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase there; this is normal behavior(*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the data points is causing numerical problems).<br />
## ''Multi objective'': Combination of 1.2 and 2.1.<br />
<br />
(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!<br />
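The override can be sketched as follows (only the relevant option of the <SUMO> tag is shown):<br />
<br />
<source lang="xml"><br />
<!-- force a monotonic decrease of the best-model error; use with caution! --><br />
<Option key="keepOldModels" value="true"/><br />
</source><br />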
<br />
=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===<br />
<br />
This is normal, the warning printed out before the error explains why:<br />
<br />
''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''<br />
<br />
By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".<br />
Note that there is nothing to worry about: everything has run correctly, it is just the movie creation that is failing.<br />
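Disabling movie creation can be sketched as follows (only the relevant option of the <SUMO> tag is shown):<br />
<br />
<source lang="xml"><br />
<Option key="createMovie" value="false"/><br />
</source><br />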
<br />
=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===<br />
<br />
This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox sub directory! This is very important.<br />
<br />
If you still have problems you can override the default logfile name and location as follows:<br />
<br />
In the <FileHandler> tag inside the <Logging> tag add the following option:<br />
<br />
<code><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</code><br />
<br />
This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.<br />
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].<br />
<br />
=== The Toolbox crashes with "Too many open files" what should I do? ===<br />
<br />
This is a known bug, see [[Known_bugs#Version_6.1]].<br />
<br />
If this does not fix your problem then do the following:<br />
<br />
On Windows, try increasing the limit as dictated by the error message. Also, when you get the error, use the fopen('all') command to see which files are open and send us the list of filenames; then we can maybe help you debug the problem further. Even better would be to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]. When you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can further debug the problem.<br />
<br />
On Linux again don't shut down Matlab but:<br />
<br />
* open a new terminal window<br />
* type:<br />
<source lang="bash"><br />
lsof > openFiles.txt<br />
</source><br />
* Then [[Contact|send us]] the following information:<br />
** the file openFiles.txt <br />
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).<br />
** the output of<br />
<source lang="bash"><br />
uname -a ; df -T ; mount<br />
</source><br />
<br />
As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.<br />
<br />
In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.<br />
<br />
=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===<br />
<br />
The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files, the Matlab implementation will be used and a warning printed, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]]; this can be done very easily. Alternatively, simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.<br />
<br />
=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===<br />
<br />
You most likely forgot to [[Installation#Extension_pack|install the extension pack]].<br />
<br />
=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===<br />
<br />
Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics, your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; you just have to log in again since your graphical session has terminated. This FAQ entry only covers the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system administrator.<br />
<br />
So what happens is that after a few seconds when the toolbox wants to plot the first model [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - Display server interaction.<br />
<br />
What you should do first is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does, please [[Reporting_problems| report it]]. If the problem does not occur, you can then try the following:<br />
<br />
* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])<br />
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)<br />
<br />
<source lang="bash"><br />
/etc/X11/xorg.conf<br />
</source><br />
<br />
Note: the exact location of the xorg.conf file may vary on your system.<br />
<br />
* Look for the following line:<br />
<br />
<source lang="bash"><br />
Load "glx"<br />
</source><br />
<br />
* Comment it out by replacing it by:<br />
<br />
<source lang="bash"><br />
# Load "glx"<br />
</source><br />
<br />
* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)<br />
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].<br />
<br />
Note:<br />
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]<br />
* if you wish to debug further yourself please check the Xorg log files and those in /var/log<br />
* another possible workaround is to start matlab with the "-nodisplay" option. That could work as well.<br />
<br />
=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===<br />
<br />
This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.<br />
<br />
=== The toolbox seems to keep on running forever, when or how will it stop? ===<br />
<br />
The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully, so how long the toolbox runs simply depends on what targets you choose. Please see [[Running#Understanding_the_control_flow]].<br />
<br />
Of course, choosing targets a priori is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). If in doubt, you can always use a small value (or 0) and then simply quit the running toolbox using Ctrl-C when you think it has run long enough.<br />
<br />
While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=About&diff=5100About2010-03-22T12:05:15Z<p>Dgorissen: /* Documentation */</p>
<hr />
<div>== History ==<br />
In 2004, research within the (former) COMS research group, led by professor [http://www.sumo.intec.ugent.be/?q=tomd Tom Dhaene], was focused on developing efficient, adaptive and accurate algorithms for polynomial and rational modeling of linear time-invariant (LTI) systems. This work resulted in a set of Matlab scripts that were used as a testing ground for new ideas and concepts. Research progressed, and with time these scripts were re-worked and refactored into one coherent Matlab toolbox, tentatively named the Multivariate MetaModeling (M3) Toolbox. The first public release of the toolbox (v2.0) occurred in November 2006. In October 2007, the development of the M3 Toolbox was discontinued.<br />
<br />
In April 2008, the first public release of the Surrogate Modeling (SUMO) Toolbox (v5.0) occurred.<br />
<br />
For a list of changes since then refer to the [[Changelog]] and [[Whats new]] pages.<br />
<br />
== Intended use ==<br />
<br />
=== Global Surrogate Models ===<br />
The SUMO Toolbox was originally designed to solve the following problem:<br />
<br />
<center>''Automatically generate a highly accurate surrogate model (= a regression model) for a computationally expensive simulation code<br />
<br>requiring as few data points and as little user-interaction as possible.''</center><br />
<br />
In addition the toolbox provides powerful, adaptive algorithms and a whole suite of model types for<br />
* data fitting problems (regression, function approximation, curve fitting)<br />
* response surface modeling (RSM)<br />
* scattered data interpolation<br />
* model selection<br />
* Design Of Experiments (DoE)<br />
* model parameter optimization, e.g., finding the optimal neural network topology, SVM kernel parameters, rational function order, etc. (= hyperparameter optimization)<br />
* iterative adaptive sample selection (also known as sequential design or active learning)<br />
<br />
Note that the SUMO toolbox is able to drive the simulation code directly.<br />
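Driving the simulation code directly means the toolbox, not the user, owns the loop of choosing inputs, calling the simulator, and refitting. A minimal sketch of that loop (hypothetical Python names; the real toolbox is configured through XML rather than programmed like this):<br />

```python
# Illustrative sketch only: the modeling tool drives the simulator itself.
def simulator(x):
    # Stand-in for an expensive simulation code.
    return x * x

def drive(simulator, initial_design, n_rounds, pick_next):
    # Evaluate a fixed initial design, then add points adaptively.
    data = [(x, simulator(x)) for x in initial_design]
    for _ in range(n_rounds):
        x_new = pick_next(data)                 # adaptive sample selection
        data.append((x_new, simulator(x_new)))  # evaluate on demand
    return data

# Trivial selector: midpoint of the widest gap between evaluated inputs
# (ties broken toward the larger midpoint by tuple comparison).
def pick_next(data):
    xs = sorted(x for x, _ in data)
    gaps = [(b - a, (a + b) / 2) for a, b in zip(xs, xs[1:])]
    return max(gaps)[1]

data = drive(simulator, [0.0, 1.0], n_rounds=3, pick_next=pick_next)
```

Starting from the two endpoints, three rounds fill in 0.5, 0.75 and 0.25, each evaluated through the simulator as needed.<br />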
<br />
For domain experts or engineers the SUMO Toolbox provides a flexible, pluggable platform to which the response surface modeling task can be delegated. For researchers in surrogate modeling it provides a common framework to implement, test and benchmark new modeling and sampling algorithms.<br />
<br />
See the Wikipedia [http://en.wikipedia.org/wiki/Surrogate_model surrogate model] page to find out more.<br />
<br />
=== Surrogate Driven Optimization ===<br />
While the main focus of the SUMO Toolbox is to create accurate global surrogate models, it can be used for other goals too.<br />
<br />
For instance, the toolbox can be used to create consecutive local surrogate models for optimization purposes. The information obtained from the local surrogate models is used to guide the adaptive sampling process to the global optimum.<br />
<br />
A good sampling strategy for surrogate driven optimization balances local and global search, i.e., refining the surrogate model versus zooming in on the optimum.<br />
Such a strategy is implemented (akin to (Super)EGO); see the different [[Sample_Selectors#expectedImprovement|sample selectors]] for more information.<br />
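For reference, the expected improvement criterion used in EGO-style strategies can be written down compactly. The sketch below (illustrative Python, not toolbox code) assumes the surrogate supplies a Gaussian prediction with mean mu and standard deviation sigma at each candidate point:<br />

```python
import math

# Standard expected-improvement criterion (as in EGO), for minimization:
#   EI(x) = (f_min - mu) * Phi(z) + sigma * phi(z),  z = (f_min - mu) / sigma
# where mu/sigma are the surrogate's prediction and uncertainty at x, and
# f_min is the best objective value observed so far.
def expected_improvement(mu, sigma, f_min):
    if sigma == 0.0:
        return 0.0  # no uncertainty: nothing left to gain at this point
    z = (f_min - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (f_min - mu) * cdf + sigma * pdf

# A point predicted well below f_min scores high (exploitation), while a
# confident prediction equal to f_min scores zero; uncertain regions still
# get a positive score (exploration).
ei_exploit = expected_improvement(mu=0.5, sigma=0.1, f_min=1.0)
ei_neutral = expected_improvement(mu=1.0, sigma=0.0, f_min=1.0)
```

The single criterion thus trades off promising predictions against unexplored, uncertain regions, which is exactly the local/global balance described above.<br />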
<br />
=== Dynamic systems or Time series prediction ===<br />
<br />
See [[FAQ#What_about_dynamical.2C_time_dependent_data.3F]].<br />
<br />
=== Classification ===<br />
<br />
See [[FAQ#What_about_classification_problems.3F]].<br />
<br />
== Application range ==<br />
The SUMO Toolbox has already been applied successfully to a wide range of problems from domains as diverse as aerodynamics, geology, metallurgy, electro-magnetics (EM), electronics, engineering and economics. The SUMO Toolbox can be applied to any situation where the problem can be described as a function that maps a set of inputs onto a set of outputs. We generally refer to this function as the [[Simulator]].<br />
<br />
<br />
[[Image:sumotask.png|center|SUMO-Toolbox : Generating an approximation for a reference model]]<br />
<br />
Across the different problems to which we have applied the toolbox, the input dimension has ranged from 1 to 130 and the output dimension from 1 to 70 (including both complex and real valued outputs). The number of data points has ranged from as little as 15 to as many as 100000.<br />
<br />
== Design goals ==<br />
<br />
The SUMO Toolbox was designed with a number of goals in mind:<br />
<br />
* A flexible tool that integrates different modeling methods and does not tie the user down to one particular set of problems. Reliance on domain specific features should be avoided.<br />
<br />
* The focus should be on adaptivity, i.e., relieving the burden on the domain expert as much as possible. Given a simulation model, the software should produce an accurate surrogate model with minimal user interaction. This also includes easily integrating with the existing design environment.<br />
<br />
* At the same time keeping in mind that there is no such thing as 'one-size-fits-all'. Different problems need to be modeled differently and require different a priori process knowledge. Therefore the software should be modular and easily extensible with new methods.<br />
<br />
* Engineers or domain experts do not tend to trust a black box system that generates models but is unclear about why a particular model should be preferred. Therefore an important design goal was that the expert user should be able to take full manual control over the modeling process if necessary. In addition, the toolbox should support fine-grained logging and profiling capabilities so that its modeling and sampling decisions can be retraced.<br />
<br />
Given this design philosophy, the toolbox can cater to both the researchers working on novel surrogate modeling techniques as well as to the engineers who need the surrogate model as part of their design process. For the former, the toolbox provides a common platform on which to deploy, test, and compare new modeling algorithms and sampling techniques. For the latter, the software functions as a highly configurable and flexible component to which surrogate model construction can be delegated, easing the burden of the user and enhancing productivity.<br />
<br />
== Features ==<br />
The main features of the toolbox are listed below. For an overview of recent changes see the [[Whats new]] page. A detailed list of changes can be found in the [[Changelog]].<br />
<br />
{| class="wikitable" style="text-align:left" border="0" cellpadding="5" cellspacing="0"<br />
! Implementation Language <br />
| Matlab, Java, and where applicable C, C++<br />
|- <br />
! Design patterns<br />
| Fully object oriented, with the focus on clean design and encapsulation.<br />
|- <br />
! Minimum Requirements<br />
| See the [[system requirements]] page<br />
|-<br />
! Supported data sources*<br />
| Local executable/script, simulation engine, Java class, Matlab script, dataset (txt file) (see [[Interfacing with the toolbox]])<br />
|-<br />
! Supported data types<br />
| Supports multi-dimensional inputs and outputs. Outputs can be any combination of real/complex.<br />
|-<br />
! Supported problem types<br />
| Regression ([[FAQ#What_about_classification_problems.3F|classification]], [[FAQ#What_about_dynamical.2C_time_dependent_data.3F|time series prediction]])<br />
|-<br />
! Configuration<br />
| Extensively configurable through one main [[FAQ#What_is_XML.3F|XML]] configuration file.<br />
|-<br />
! Flexibility<br />
| Virtually every component of the modeling process can be configured, replaced or extended by a user specific, custom implementation<br />
|-<br />
! Predefined accuracy<br />
| The toolbox will run until the user-required accuracy has been reached, the maximum number of samples has been exceeded, or a timeout has occurred<br />
|-<br />
! Model Types*<br />
| Out of the box support for:<br />
* Polynomial/Rational functions<br />
* Feedforward Neural Networks, 3 implementations<br />
** One based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network toolbox]<br />
** One based on the [http://leenissen.dk/fann/ Fast Artificial Neural Network Library (FANN)]<br />
** One based on the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID Toolbox]<br />
* Radial Basis Function (RBF) Models<br />
* RBF Neural Networks<br />
* Gaussian Process Models (based on [http://www.GaussianProcess.org/gpml/code GPML])<br />
* Kriging Models (two custom implementations)<br />
* Blind Kriging Models<br />
* Smoothing spline models<br />
* Support Vector Machines (SVM)<br />
** Least Squares SVM (based on [http://www.esat.kuleuven.ac.be/sista/lssvmlab/ LS-SVMlab])<br />
** epsilon-SVM (based on [http://www.csie.ntu.edu.tw/~cjlin/libsvm/ LIBSVM] or [http://svmlight.joachims.org/ SVMlight])<br />
** nu-SVM (based on [http://www.csie.ntu.edu.tw/~cjlin/libsvm/ LIBSVM])<br />
|-<br />
! Model parameter optimization algorithms*<br />
| Pattern Search, EGO, Simulated Annealing, Genetic Algorithm, BFGS, DIRECT, Particle Swarm Optimization (PSO), NSGA-II, ...<br />
|-<br />
! Sample selection algorithms (=sequential design, active learning)*<br />
| Random, error-based, density-based, gradient-based, and many different hybrids<br />
|-<br />
! Experimental design*<br />
| Latin Hypercube Sampling, Central Composite, Box-Behnken, random, user defined, full factorial<br />
|-<br />
! Model selection measures*<br />
| Validation set, cross-validation, leave-one-out, model difference, AIC (also in a multi-objective context, see [[Multi-Objective Modeling]])<br />
|-<br />
! Sample Evaluation*<br />
| On the local machine (taking advantage of multi-core CPUs) or in parallel on a cluster/grid<br />
|-<br />
! Supported distributed middlewares*<br />
| [http://gridengine.sunsource.net/ Sun Grid Engine], LCG Grid middleware (both accessed through a SSH accessible frontnode)<br />
|-<br />
! Logging<br />
| Extensive logging to enable close monitoring of the modeling process. Logging granularity is fully configurable and log streams can be easily redirected (to file, console, a remote machine, ...).<br />
|-<br />
! Profiling*<br />
| Extensive profiling framework for easy gathering (and plotting) of modeling metrics (average sample evaluation time, hyperparameter optimization trace, ...)<br />
|-<br />
! Easy tracking of modeling progress<br />
| Automatic storing of best models and their plots. Ability to automatically generate a movie of the sequence of plots.<br />
|-<br />
! Model browser GUI<br />
| A graphical tool is available to easily visualize high dimensional models and browse through data ([[Model Visualization GUI|more information here]])<br />
|-<br />
! Available test problems*<br />
| Out of the box support for many built-in functions (Ackley, Camel Back, Goldstein-Price, ...) and datasets (Abalone, Boston Housing, FishLength, ...) from various application domains. Including a number of datasets (and some simulation code) from electronics. In total over 50 examples are available.<br />
|-<br />
! License<br />
| [[License terms]]<br />
|}<br />
<br />
<nowiki>*</nowiki> Custom implementations can easily be added<br />
<br />
== Screenshots ==<br />
A number of screenshots to give a feel for the SUMO Toolbox. Note that these screenshots do not necessarily reflect the latest toolbox version.<br />
<br />
<gallery><br />
Image:octagon.png<br />
Image:metamodel-sumo-hourglass.png<br />
Image:SUMO_Toolbox1.png<br />
Image:SUMO_Toolbox2.png<br />
Image:SUMO_Toolbox3.png<br />
Image:SUMO_Toolbox4.png<br />
Image:ISCSampleSelector1.png<br />
Image:ISCSampleSelector2.png<br />
Image:SUMO_Gui1.png<br />
Image:SUMO_Gui2.png<br />
Image:Contour1.png<br />
Image:TwoDim1.png<br />
Image:TwoDim2.png<br />
Image:ThreeDim1.png<br />
Image:ThreeDim2.png<br />
Image:ThreeDim3.png<br />
Image:FEBioTrekEI.png<br />
Image:FEBioTrekFunc.png<br />
</gallery><br />
<br />
== Movies ==<br />
<br />
[[Image:youtube-logo.jpg|right|70px|link=http://www.youtube.com/sumolab|]] A number of video clips generated by or related to the SUMO Toolbox [http://www.youtube.com/sumolab can be found at our YouTube channel]. Feel free to make suggestions or leave comments.<br />
<br />
Note these movies do not necessarily reflect the latest toolbox version. Improvements and/or interface adjustments may have been made since then.<br />
<br />
== Documentation ==<br />
<br />
An in-depth overview of the rationale and philosophy, including a treatment of the software architecture underlying the SUMO Toolbox, is available in the form of a PhD dissertation. A copy of this dissertation is available [[Contact|on request]].<br />
<br />
In addition the following poster and presentation give a high level overview:<br />
<br />
* Poster: [[Media:SUMO_poster.pdf|SUMO poster]]<br />
* Presentation: [[Media:SUMO_presentation.pdf|SUMO slides]]<br />
<br />
To stay up to date with the latest news and releases, we also recommend subscribing to the [http://www.sumo.intec.ugent.be SUMO newsletter]. <br />
Traffic will be kept to a minimum and you can unsubscribe at any time.<br />
<br />
A blog covering related research can be found here [http://sumolab.blogspot.com http://sumolab.blogspot.com].<br />
<br />
== Citations ==<br />
<br />
See [[Citing|Citing the toolbox]].</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Useful_Links&diff=5088Useful Links2010-02-24T12:07:08Z<p>Dgorissen: /* Related projects */</p>
<hr />
<div>=== Related publications ===<br />
<br />
See the [[Related publications]] page.<br />
<br />
=== Related projects ===<br />
<br />
A list of projects with similar ideas and scope.<br />
* [http://www.cs.rtu.lv/jekabsons/regression.html Matlab regression software]<br />
* [http://www.nutechsolutions.com/prod_cv_analytics.asp ClearVu Analytics]<br />
* [http://fchegury.googlepages.com/surrogatestoolbox Surrogates Toolbox]: Another matlab surrogate modeling toolbox<br />
* [http://www.evolved-analytics.com Evolved-Analytics]: DataModeler package.<br />
* [http://www.muneda.com/ MunEDA]: MunEDA provides leading EDA software technology for analysis, modelling and optimization of yield and performance of analog, mixed-signal and digital designs.<br />
* [http://www.lsoptsupport.com/faqs/setting-parameters-for-metamodel-based-optimization-strategies LS-OPT: Functionality similar to the SUMO-Toolbox in LS-OPT]<br />
* [http://mucm.aston.ac.uk/MUCM/MUCMToolkit/index.php?page=MetaHomePage.html MUCM Toolkit]: A toolbox for sensitivity analysis using surrogate models<br />
* [http://home.mit.bme.hu/~kollar/topics/fdident.html FDIDENT: frequency domain identification toolbox]<br />
* [http://www.dmoz.org/Science/Math/Statistics/Software/Regression_and_Curve_Fitting/ Regression software on Open Directory]<br />
* [http://www.infiniscale.com/ Infiniscale]: TechModeler, TechAnalyzer: automatic model generation tools for EM applications<br />
* [http://www.vrand.com/visualDOC.html visualDOC]: VisualDOC, a design optimization tool<br />
* [http://www.salford-systems.com/mars.php MARS]: A spline based modeling tool<br />
* [http://www.phoenix-int.com/products/modelcenter.php Modelcenter]: A datamining and modeling tool<br />
* [http://www.engineous.com/product_iSIGHT.htm iSIGHT]: A datamining and modeling tool<br />
* [http://www.tmpinc.com/datascape_overview.html Datascape]: A datamining and modeling tool<br />
* [http://www.friendship-systems.com/ Friendship systems]: Tools for modeling ship hulls parametrically and performing the modeling calculations (Equilibrium tool)<br />
* [http://www.csse.monash.edu.au/%7Edavida/nimrod/nimrodg.htm Nimrod/G]: Execute parameter sweeps on the grid<br />
* [http://www-sop.inria.fr/oasis/ProActive/ ProActive]: Java grid library <br />
* [http://www.csse.monash.edu.au/%7Edavida/nimrod/nimrodo.htm Nimrod/O]: Grid-enabled optimization toolkit <br />
* [http://www.ece.northwestern.edu/OTC/ NEOS]: Distributed optimization toolkit <br />
* [http://www.nas.nasa.gov/SC2000/ARC/ilab.html iLab]: automated parameter study toolkit <br />
* [http://software.sci.utah.edu/scirun.html SciRun]: Problem Solving Environment (PSE), for simulation, modeling, and visualization of scientific problems. <br />
* [http://www.esteco.it/ ModeFRONTIER]: environment dedicated to the set up of design assessment chains and efficient investigation of the design space. <br />
* [http://www.gridbus.org/ Gridbus]: metascheduler for the grid <br />
* [http://icl.cs.utk.edu/netsolve/overview/ Gridsolve/Netsolve]: grid enabled scientific computation toolbox <br />
* [http://www.dtreg.com/ DTREG]: a powerful statistical analysis program that generates classification and regression trees and Support Vector Machine models that can be used to predict parameter values. <br />
* [http://www.soton.ac.uk/%7Epbn/MDO/ MDO]: A collection of MDO links <br />
* [http://www.research.att.com/~njas/gosset/index.html GOSSET]: A general purpose program for designing experiments <br />
* [http://www.cs.sandia.gov/DAKOTA/ DAKOTA]: Design Analysis Kit for Optimization and Terascale Applications<br />
* [http://www.wesc.ac.uk/projectsite/dipso/index.html DIPSO]: Wide-Area Distributed Problem Solving (DIPSO)<br />
* [http://www.geodise.org GEODISE]: Grid Enabled Optimisation and Design Search for Engineering<br />
* [http://czms.mit.edu/poseidon/new1/ Poseidon]: A distributed information system for ocean processes.<br />
* [http://www.fast.u-psud.fr/ezyfit/ EZfit] : Free curve fitting toolbox for matlab<br />
* [http://www.ians.uni-stuttgart.de/spinterp/about.html SGIT] : A Sparse grid interpolation toolbox<br />
* [http://www.csie.ntu.edu.tw/~yien/quickrbf/quickstart.php QuickRBF] : an RBF fitting library (native)<br />
* [http://www.farfieldtechnology.com/products/toolbox/ FastRBF] : another RBF fitting library (matlab)<br />
* [http://www.neuromat.com/models.html Neuromat Predictor] : neural networks fitting library<br />
* [http://simlab.jrc.ec.europa.eu/ Sensitivity Analysis library]<br />
* Gaussian Process Matlab code<br />
** [http://www.gaussianprocess.org/gpml/code/matlab/doc/ Code] based on Rasmussen's book<br />
** [http://www.kyb.mpg.de/publication.html?publ=2689 Sparse Gaussian Processes]<br />
** [http://www.cs.man.ac.uk/~neill/gp/ Gaussian Process Software]<br />
** [http://www.ios.htwg-konstanz.de/joomla_mof/index.php?option=com_content&view=article&id=48:polyreg-polynomial-gaussian-process-regression&catid=36:code&Itemid=81 GP] using polynomial covariance functions<br />
<br />
=== Related labs ===<br />
<br />
A list of some of the labs/researchers with similar ideas and scope.<br />
<br />
* [http://aerospace.engin.umich.edu/index.html Department of Aerospace Engineering at the University of Michigan]<br />
* [http://www.mae.ufl.edu/~mdo/research.html The Structural and Multidisciplinary Optimization Group at the University of Florida]<br />
* [http://web.engr.oregonstate.edu/~tgd/ School of Electrical Engineering and Computer Science, Oregon State U]<br />
* [http://www.soton.ac.uk/~cedc/ Computational Engineering and Design Center]<br />
* [http://edog.mne.psu.edu/research.html Engineering Design & Optimization Group (Penn state)]<br />
* [http://www.nd.edu/~sgano/research.html Aerospace and Mechanical Engineering] [http://www.gano.name/shawn/ Homepage]<br />
* [http://shyylab.engin.umich.edu/research/design-optimization Computational Thermo-Fluids Group]<br />
* [http://www.ensc.sfu.ca/~gwa5/index.htm Product Design and Optimization Laboratory (PDOL)]<br />
* [http://www.cerfacs.fr/4-25708-Home.php Computational Fluid Dynamics (CFD) group at CERFACS]<br />
* [http://webuser.uni-weimar.de/~roos1/ Dirk Roos]<br />
* [http://www.cerfacs.fr/~duchaine/HTML/research.htm Florent Duchaine]<br />
* [http://www.nlr.nl/ National aerospace lab]<br />
<br />
=== Data sets - Simulation code ===<br />
<br />
A list of publicly available datasets and simulation codes, useful for testing.<br />
<br />
* [http://matlabdb.mathematik.uni-stuttgart.de/files.jsp?MC_ID=1&SC_ID=2 Matlab scientific computing database] : Nicely documented Matlab simulation code examples<br />
* [http://www.mat.univie.ac.at/~neum/stat.html Statistics links] : A nice collection of data fitting and analysis codes<br />
* [http://en.wikipedia.org/wiki/Classic_data_sets Wikipedia Classic Datasets]<br />
* [http://www.uni-koeln.de/themen/Statistik/data/rousseeuw/ The ROUSSEEUW datasets]<br />
* [ftp://ftp.sas.com/pub/neural/dojo/dojo.html Donoho-Johnstone Benchmarks] <br />
* [http://people.scs.fsu.edu/~burkardt/datasets/datasets.html datasets]<br />
* [http://www.cise.ufl.edu/~mpf/sch/ Schrödinger wave simulations]<br />
* [http://www-syscom.univ-mlv.fr/~vignat/Signal/Space/index.html Spice like simulator for matlab]<br />
* [http://www.itl.nist.gov/div898/strd/general/dataarchive.html Nist dataset archive]<br />
* [http://homes.esat.kuleuven.be/~smc/daisy/daisydata.html Daisy datasets]<br />
* [http://lib.stat.cmu.edu/datasets/ Statlib dataset archive]<br />
* [http://www.iau.dtu.dk/nnbook/systems.html Datasets from the book "Neural networks for the modeling and control of dynamic systems"]<br />
* [http://www.gpc.de/eposes.html A simulation environment for production and transport, logistics and automation systems]<br />
* [http://phy.asu.edu/shumway/codes.html Nanostructure Simulation and Modeling Programs]<br />
* [http://pedsim.silmaril.org/ A Modular, Distributed Pedestrian Crowd Simulation System]<br />
* [http://g95.sourceforge.net/g95_status.html Fortran simulation codes]<br />
* [http://www.genie.ac.uk/ Grid ENabled Integrated Earth system model]<br />
* [http://gcmd.nasa.gov/KeywordSearch/Home.do?Portal=GCMD&MetadataType=0 Nasa datasets]<br />
* [http://funapp.cs.bilkent.edu.tr/DataSets/ Function approximation repository]<br />
* [http://www.google.com/Top/Computers/Artificial_Intelligence/Machine_Learning/Datasets/ Google directory datasets]<br />
* [http://www.idsia.ch/~andrea/simtools.html A Collection of Modeling and Simulation Resources on the Internet]<br />
* [http://www.grc.nasa.gov/WWW/K-12/freesoftware_page.htm Free simulation software from Nasa]<br />
* [http://opensees.berkeley.edu/index.php Earthquake simulation]<br />
* [http://www.pdl.cmu.edu/DiskSim/ Harddisk simulator]<br />
* [http://spib.rice.edu/spib/mtn_top.html Mountain top radar data]<br />
* [http://www.statsci.org/datasets.html Statsci dataset repository]<br />
* [http://statwww.epfl.ch/davison/BMA/Data4BMA/ dataset repository]<br />
* [http://astrostatistics.psu.edu/datasets/ dataset repository]<br />
* [http://www.maths.uq.edu.au/CEToolBox/ problems from the Cross Entropy toolbox]<br />
* [http://www.cs.waikato.ac.nz/~ml/weka/index_datasets.html Weka datasets]<br />
* [http://www.ailab.si/orange/datasets.asp?Inst=on&Atts=on&Class=on&Values=on&Description=on&sort=Data+Set dataset repository]<br />
* [http://www.itee.uq.edu.au/%7Emarcusg/msg.html Max Set of Gaussians Landscape Generator]<br />
* [http://www.mathworks.fr/matlabcentral/fileexchange/loadAuthor.do?objectId=364966&objectType=author Collection of useful Matlab scripts by Thomas Abrahamsson ]<br />
* [http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=10731&objectType=FILE# A Matlab toolbox for Linear Structural Dynamics Analysis]<br />
* [http://webscripts.softpedia.com/script/Scientific-Engineering-Ruby/Controls-and-Systems-Modeling/StructDyn-32656.html Matlab simulators for control and systems modeling]<br />
* [http://www.ae.uiuc.edu/m-selig/ads.html UIUC Airfoil Data Site]<br />
* [http://citeseer.ist.psu.edu/112408.html Proben1 benchmark datasets]<br />
* [http://public.ca.sandia.gov/TNF/abstract.html TNF workshop data archives]<br />
* [http://www.gaussianprocess.org Code for kriging models]<br />
* [http://www.comsol.be/ Comsol multiphysics] package<br />
<br />
=== Predefined functions ===<br />
<br />
* [http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume24/ortizboyer05a-html/node6.html Continuous benchmark problems]<br />
* [http://www2.imm.dtu.dk/~km/GlobOpt/testex/ Optimization test functions]<br />
* [http://www.geatbx.com/docu/fcnindex-01.html Optimization test functions]<br />
* [http://www.ewh.ieee.org/soc/es/May2001/14/Begin.htm Optimization test functions]<br />
* [http://www.it.lut.fi/ip/evo/functions/functions.html Optimization test functions]<br />
* [http://www.cs.colostate.edu/~genitor/functions.html Optimization test functions]<br />
* [http://www-optima.amp.i.kyoto-u.ac.jp/member/student/hedar/Hedar_files/TestGO_files/Page364.htm Optimization test functions]<br />
* [http://ntu-cg.ntu.edu.sg/ysong/journal/GLSurrogate.pdf Surrogate modeling test functions]<br />
* [http://www.it.lut.fi/ip/evo/functions/functions.html Functions with multiple global optima]<br />
* [http://www.mat.univie.ac.at/~neum/glopt/moretest More test set]<br />
* [http://www.mat.univie.ac.at/~neum/software/dixon.tar.gz Dixon-Szegö test set]<br />
* [http://titan.princeton.edu/TestProblems/ Handbook of Test Problems for local and global optimization]<br />
<br />
=== Optimization ===<br />
<br />
* [http://www.tik.ee.ethz.ch/sop/pisa/ PISA optimization framework]<br />
* [http://www.jeo.org/emo/EMOOsoftware.html Impl. of multiobjective optimization methods]<br />
* [http://www.hvass-labs.org/projects/swarmops/ swarmOps]<br />
* [http://www.icsi.berkeley.edu/~storn/code.html Differential Evolution]<br />
* [http://control.ee.ethz.ch/~joloef/wiki/pmwiki.php YALMIP] High-level optimization problem solver, uses external solvers as backend<br />
* [http://www.mat.univie.ac.at/~neum/glopt/software_g.html Global optimization software]<br />
* [http://www.gerad.ca/NOMAD/Abramson/nomadm.html Mesh-Adaptive Direct Search (MADS) software in Matlab]<br />
* [http://www-rocq.inria.fr/~gilbert/modulopt/modulopt.html Modulopt]: fortran implementations and some matlab<br />
<br />
=== Various links ===<br />
<br />
* [http://videolectures.net/Top/Computer_Science/Machine_Learning/ An excellent collection of machine learning lecture videos]<br />
* [http://home.online.no/~pjacklam/matlab/software/util/fullindex.html A collection of useful Matlab scripts]<br />
* [http://home.online.no/~pjacklam/matlab/doc/mtt/ MATLAB array manipulation tips and tricks]<br />
* [http://www.adaptivebox.net/research/bookmark/psocodes_link.html Particle Swarm implementations]<br />
* [http://mloss.org A great selection of open source machine learning software]<br />
* [http://www.cimlcommunity.org/ Another repository of machine learning tools]<br />
* [http://stommel.tamu.edu/~baum/toolboxes.html A list of Matlab toolboxes]<br />
* [http://mdoboard.proboards59.com/ ISSMO-REASON: Research and Engineering Applications in Structural Optimization Network]<br />
* [http://www.nafems.org/about/ NAFEMS]<br />
* [http://www.kat-net.net/ The European Coordinating Action on Key Aerodynamic Technologies]<br />
* [http://www.altairhyperworks.co.uk/Default.aspx Altair Hyperworks]<br />
* [http://www.ifte.de/english/research/index.html IFTE]<br />
* [http://www.optiy.eu/Features.html OptiY]<br />
* [http://www.technet-alliance.com/ Technet Alliance]<br />
<br />
=== Conference links ===<br />
<br />
* [http://www.conferencealerts.com Conference alerts]<br />
* [http://www.conferencealerts.com/engineer.htm Engineering conferences]<br />
* [http://www.conferencealerts.com/ai.htm AI Conference alerts]<br />
* [http://www.ieee.org/web/conferences/search/index.html IEEE conferences]<br />
* [http://www.aiaa.org/content.cfm?pageid=1 AIAA Conferences]<br />
* [http://www.wikicfp.com/ Wiki of conferences]<br />
* [http://ddl.me.cmu.edu/ddwiki/index.php/ASME_IDETC/CIE_Conferences CIE Conferences]<br />
* [http://users.jyu.fi/~miettine/lista.html#Conferences Kaisa's conferences]<br />
* [http://www.conference-service.com/conferences/neural-networks.html AI Conferences]<br />
* [http://ieee-cis.org/conferences/co_sponsorship_1/ IEEE CIS conferences]<br />
* [http://www.makhfi.com/events.htm Neural Network Events]<br />
* [http://www.adam.ntu.edu.sg/~mgeorg/lately_announced.pl?timeback=30 Neural Network Conferences]<br />
* [http://openresearch.org/mw/index.php?title=Upcoming_deadlines&field=Machine+learning Open research AI conferences]<br />
<br />
<br />
=== Journal Links ===<br />
<br />
*[http://www.linklings.net/tomacs/charter.html ACM Transactions on Modeling and Computer Simulation]<br />
*[http://www.elsevier.com/wps/find/journaldescription.cws_home/622330/description#description Simulation Modelling Practice and Theory]<br />
*[http://www.elsevier.com/wps/find/journaldescription.cws_home/422911/description#description Advances in Engineering Software]<br />
*[http://journaltool.asme.org/Content/JournalDescriptions.cfm?journalId=12 Journal of Mechanical Design ]<br />
*[http://journaltool.asme.org/Content/JournalDescriptions.cfm?journalId=3&Journal=JCISE Journal of Computing and Information Science in Engineering]<br />
*[http://www.elsevier.com/wps/find/journaldescription.cws_home/975/description#description Engineering Applications of Artificial Intelligence]<br />
*[http://www.elsevier.com/wps/find/journaldescription.cws_home/622240/description#description Advanced Engineering Informatics]<br />
*[http://www.springer.com/computer/information+systems/journal/366 Engineering with Computers]<br />
*[http://www.springer.com/computer/mathematics/journal/521 Neural Computing and Applications]<br />
*[http://www.aiaa.org/content.cfm?pageid=322&lupubid=2 AIAA Journal]<br />
*[http://journals.cambridge.org/action/displayJournal?jid=aie Artificial Intelligence for Engineering Design, Analysis and Manufacturing ]<br />
*[http://jmlr.csail.mit.edu/ Journal of Machine Learning Research]<br />
*[http://www.elsevier.com/wps/find/journaldescription.cws_home/505645/description#description Computer Methods in Applied Mechanics and Engineering]<br />
*[http://www.elsevier.com/wps/find/journaldescription.cws_home/524998/description#description Applied Mathematical Modelling]<br />
*[http://www.siam.org/journals/sisc.php SIAM Journal on Scientific Computing]<br />
*[http://www.iop.org/EJ/journal/-page=scope/0266-5611 Inverse Problems]<br />
*[http://www.tandf.co.uk/journals/titles/17415977.asp Inverse problems in science and engineering]<br />
*[http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=3468 IEEE Transactions on Systems, Man and Cybernetics, Part A]<br />
*[http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=3477 IEEE Transactions on Systems, Man and Cybernetics, Part B]<br />
*[http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=5326 IEEE Transactions on Systems, Man and Cybernetics, Part C]<br />
*[http://www.techscience.com/cmes/aims_scope.html Computer Modeling in Engineering & Sciences]<br />
*[http://www.informaworld.com/smpp/title~db=all~content=t713723652~tab=summary Journal of Experimental & Theoretical Artificial Intelligence]<br />
*[http://www.elsevier.com/locate/jocs Journal of Computational Science]</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=FAQ&diff=5077FAQ2010-02-10T18:44:17Z<p>Dgorissen: /* Using */</p>
<hr />
<div>== General ==<br />
<br />
=== What is a global surrogate model? ===<br />
<br />
A global [http://en.wikipedia.org/wiki/Surrogate_model surrogate model] is a mathematical model that mimics the behavior of a computationally expensive simulation code over '''the complete parameter space''' as accurately as possible, using as few data points as possible. Note that optimization is not the primary goal, although it can be done as a post-processing step. Global surrogate models are useful for:<br />
<br />
* design space exploration, to get a ''feel'' of how the different parameters behave<br />
* sensitivity analysis<br />
* ''what-if'' analysis<br />
* prototyping<br />
* visualization<br />
* ...<br />
<br />
In addition, they are a cheap way to model large scale systems: multiple global surrogate models can be chained together in a model cascade.<br />
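As a toy illustration of such a cascade (purely hypothetical stages, not toolbox code), the output of one surrogate simply becomes the input of the next:<br />

```python
# Sketch of a model cascade: a large system modeled as a chain of small
# global surrogates, each cheap to evaluate. Stages here are illustrative.
def cascade(models, x):
    for model in models:
        x = model(x)  # feed each surrogate's output into the next
    return x

# Two toy "surrogates" standing in for trained global models.
stage1 = lambda x: 2 * x   # e.g., a component-level model
stage2 = lambda x: x + 1   # e.g., a system-level model
y = cascade([stage1, stage2], 3)
```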
<br />
See also the [[About]] page.<br />
<br />
=== What about surrogate driven optimization? ===<br />
<br />
Most people associate the term '''surrogate driven optimization''' with trust-region strategies and simple polynomial models. These frameworks first construct a local surrogate, which is optimized to find an optimum. Afterwards, a move limit strategy decides how the local surrogate is scaled and/or moved through the input space, and the surrogate is then rebuilt and optimized again; i.e., the surrogate zooms in on the global optimum. For instance, the [http://www.cs.sandia.gov/DAKOTA/ DAKOTA] Toolbox implements such strategies, where surrogate construction is separated from optimization.<br />
<br />
Such a framework was earlier implemented in the SUMO Toolbox but was deprecated as it didn't fit the philosophy and design of the toolbox. <br />
<br />
Instead another, equally powerful, approach was taken. The current optimization framework is in fact a sample selection strategy that balances local and global search; in other words, it balances exploring the input space against exploiting the information the surrogate provides.<br />
<br />
A configuration example can be found [[Config:SampleSelector#expectedImprovement|here]].<br />
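One simple way to picture such a balance (a generic sketch, not the toolbox's actual criterion) is to score each candidate point by a weighted sum of an exploitation term and an exploration term:<br />

```python
# Hypothetical illustration of an exploration/exploitation trade-off for a
# minimization problem; not the criterion the toolbox actually implements.
def score(candidate, predict, evaluated, w_explore=0.5):
    exploit = -predict(candidate)                         # low prediction = promising
    explore = min(abs(candidate - x) for x in evaluated)  # distance to nearest sample
    return (1 - w_explore) * exploit + w_explore * explore

evaluated = [0.0, 1.0]                  # points already simulated
predict = lambda x: (x - 0.3) ** 2      # toy surrogate prediction
best = max([0.1, 0.5, 0.9], key=lambda c: score(c, predict, evaluated))
```

With equal weights the candidate at 0.5 wins: it is reasonably promising *and* far from all evaluated points, illustrating why neither pure exploitation nor pure exploration is used on its own.<br />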
<br />
=== What is (adaptive) sampling? Why is it used? ===<br />
<br />
In classical Design of Experiments you must specify the design of your experiment up front; in other words, you have to say in advance how many data points you need and how they should be distributed. Two examples are Central Composite designs and Latin Hypercube designs. However, if your data is expensive to generate (e.g., by an expensive simulation code), it is not clear up front how many points will be needed. Instead, data points are selected adaptively, only a couple at a time. This process of incrementally selecting new data points in the most interesting regions is called adaptive sampling, sequential design, or active learning. Of course the sampling process needs to start from somewhere, so the very first set of points is selected based on a fixed, classic experimental design. See also [[Running#Understanding_the_control_flow]].<br />
SUMO provides a number of different sampling algorithms: [[SampleSelector]]<br />
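<br />
As a hedged illustration (the exact tag and type names here are assumptions; check <code>default.xml</code> and the [[SampleSelector]] page for your version), an adaptive sampling setup combines a small initial design with a sample selector:<br />
<br />
<source lang="xml"><br />
<!-- Sketch only: start from a small fixed design, then let a<br />
     sample selector add points adaptively, a few per iteration --><br />
<InitialDesign type="LatinHypercubeDesign"><br />
  <Option key="points" value="20"/><br />
</InitialDesign><br />
<SampleSelector type="default"/><br />
</source><br />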
<br />
Of course sometimes you don't want to do sampling. For example, if you have a fixed dataset, you just want to load all the data in one go and model it. For how to do this see [[FAQ#How_do_I_turn_off_adaptive_sampling_.28run_the_toolbox_for_a_fixed_set_of_samples.29.3F]].<br />
<br />
=== What about dynamical, time dependent data? ===<br />
<br />
The original design and purpose was to tackle static input-output systems, where there is no memory. Just a complex mapping that must be learnt and approximated. Of course you can take a fixed time interval and apply the toolbox but that typically is not a desired solution. Usually you are interested in time series prediction, e.g., given a set of output values from time t=0 to t=k, predict what happens at time t=k+1,k+2,...<br />
<br />
The toolbox was originally not intended for this purpose. However, it is quite easy to add support for recurrent models. Automatic generation of dynamical models would involve adding a new model type (just like you would add a new regression technique) or require adapting an existing one. For example it would not be too much work to adapt the ANN or SVM models to support dynamic problems. The only extra work besides that would be to add a new [[Measures|Measure]] that can evaluate the fidelity of the models' prediction.<br />
<br />
Naturally though, you would be unable to use sample selection (since it makes no sense in those problems). Unless of course there is a specialized need for it. In that case you would add a new [[SampleSelector]].<br />
<br />
For more information on this topic [[Contact]] us.<br />
<br />
=== What about classification problems? ===<br />
<br />
The main focus of the SUMO Toolbox is on regression/function approximation. However, the framework for hyperparameter optimization, model selection, etc. can also be used for classification. Starting from version 6.3 a demo file is included in the distribution that shows how this works on a well known test problem. If you want to play around with this feature without waiting for 6.3 to be released [[Contact|just let us know]].<br />
<br />
=== Can the toolbox drive my simulation code directly? ===<br />
<br />
Yes it can. See the [[Interfacing with the toolbox]] page.<br />
<br />
=== What is the difference between the M3-Toolbox and the SUMO-Toolbox? ===<br />
<br />
The SUMO toolbox is a complete, full-featured framework for automatically generating approximation models and performing adaptive sampling. In contrast, the M3-Toolbox was more of a proof-of-principle.<br />
<br />
=== What happened to the M3-Toolbox? ===<br />
<br />
The M3 Toolbox project has been discontinued (Fall 2007) and superseded by the SUMO Toolbox. Please contact tom.dhaene@ua.ac.be for any inquiries and requests about the M3 Toolbox.<br />
<br />
=== How can I stay up to date with the latest news? ===<br />
<br />
To stay up to date with the latest news and releases, we recommend subscribing to our newsletter [http://www.sumo.intec.ugent.be here]. Traffic will be kept to a minimum (1 message every 2-3 months) and you can unsubscribe at any time.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== What is the roadmap for the future? ===<br />
<br />
There is no explicit roadmap since much depends on where our research leads us, what feedback we get, which problems we are working on, etc. However, to get an idea of features to come you can always check the [[Whats new]] page.<br />
<br />
You can also follow our blog: [http://sumolab.blogspot.com/ http://sumolab.blogspot.com/].<br />
<br />
=== Will there be an R/Scilab/Octave/Sage/.. version? ===<br />
<br />
At the start of the project we considered moving to one of the available open source alternatives to Matlab. However, after much discussion we decided against this for several reasons(*), including:<br />
<br />
* The quality and amount of available Matlab documentation <br />
* The quality and number of Matlab toolboxes<br />
* Many well documented interfacing options (esp. Java)<br />
* Existing experience and know-how<br />
<br />
Matlab sure has its problems and deficiencies, but the number of advanced algorithms and toolboxes makes it a very attractive platform. Equally important is the fact that every function is properly documented and includes examples, tutorials, and in some cases GUI tools. A lot of things would have been much harder and/or more time-consuming to implement on one of the other platforms. The other platforms remain on our radar however, and we do look into them from time to time. In principle it would even be possible to write a bridge between Matlab and them.<br />
<br />
(*) We are not saying those projects are poor or useless, quite the contrary. It's just that given our situation, goals, and resources at the time, Matlab was the best choice for us.<br />
<br />
=== What are collaboration options? ===<br />
<br />
We will gladly help out with any SUMO-Toolbox related questions or problems. However, since we are a university research group the most interesting goal for us is to work towards some joint publication (e.g., we can help with the modeling of your problem). Alternatively, it is always nice if we could use your data/problem (fully referenced and/or anonymized if necessary of course) as an example application during a conference presentation or in a PhD thesis.<br />
<br />
The most interesting case is if your problem involves sample selection and modeling. This means you have some simulation code or script to drive and you want an accurate model while minimizing the number of data points. In this case, in order for us to optimally help you it would be easiest if we could run your simulation code (or script) locally or access it remotely. Otherwise it is difficult to give good recommendations about what settings to use.<br />
<br />
If this is not possible (e.g., expensive, proprietary or secret modeling code) or if your problem does not involve sample selection, you can send us a fixed data set that is representative of your problem. Again, this may be fully anonymized and will be kept confidential of course.<br />
<br />
In either case (code or dataset) remember:<br />
<br />
* the data file should be an ASCII file in column format (each row containing one data point) (see also [[Interfacing_with_the_toolbox]])<br />
* include a short description of your data:<br />
** number of inputs and number of outputs<br />
** the range of each input (or scaled to [-1 1] if you do not wish to disclose this)<br />
** if the outputs are real or complex valued<br />
** how noisy the data is or if it is completely deterministic (computer simulation) (please also see: [[FAQ#My_data_contains_noise_can_the_SUMO-Toolbox_help_me.3F]]).<br />
** if possible the expected range of each output (or scaled if you do not wish to disclose this)<br />
** if possible the names of each input/output + a short description of what they mean<br />
** any further insight you have about the data, expected behavior, expected importance of each input, etc.<br />
<br />
If you have any further questions or comments related to this please [[Contact]] us.<br />
<br />
=== Can you help me model my problem? ===<br />
<br />
Please see the previous question: [[FAQ#What_are_collaboration_options.3F]]<br />
<br />
== Installation and Configuration ==<br />
<br />
=== What is the relationship between Matlab and Java? ===<br />
<br />
Many people do not know this, but your Matlab installation automatically includes a Java virtual machine. By default, Matlab seamlessly integrates with Java, allowing you to create Java objects from the command line (e.g., 's = java.lang.String'). It is possible to disable java support but in order to use the SUMO Toolbox it should not be. To check if Java is enabled you can use the 'usejava' command.<br />
<br />
=== What is Java, why do I need it, do I have to install it, etc. ? ===<br />
<br />
The short answer is: no, don't worry about it. The long answer is: some of the code of the SUMO Toolbox is written in [http://en.wikipedia.org/wiki/Java_(programming_language) Java], since it makes a lot more sense in many situations and is a proper programming language instead of a scripting language like Matlab. Since Matlab automatically includes a JVM to run Java code there is nothing you need to do or worry about (see the previous FAQ entry). Unless it's not working of course; in that case see [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27]].<br />
<br />
=== What is XML? ===<br />
<br />
XML stands for eXtensible Markup Language and is related to HTML (= the stuff web pages are written in). The first thing you have to understand is that XML '''does not do anything'''. Honest. Many engineers are not used to it and think it is some complicated computer programming language-stuff-thingy. This is of course not the case (we ignore some of the fancy stuff you can do with it for now). XML is a markup language, meaning it provides some rules for how you can annotate or structure existing text.<br />
<br />
The way SUMO uses XML is really simple and there is not much to understand. First some simple terminology. Take the following example:<br />
<br />
<source lang="xml"><br />
<Foo attr="bar">bla bla bla</Foo> <br />
</source><br />
<br />
Here we have '''a tag''' called ''Foo'' containing the text ''bla bla bla''. The tag Foo also has an '''attribute''' ''attr'' with value ''bar''. '<Foo>' is what we call the '''opening tag''', and '</Foo>' is the '''closing tag'''. Each time you open a tag you must close it again. How you name the tags or attributes is totally up to you :)<br />
<br />
Let's take a more interesting example. Here we have used XML to represent information about a recipe for pancakes:<br />
<br />
<source lang="xml"><br />
<recipe category="dessert"><br />
<title>Pancakes</title><br />
<author>sumo@intec.ugent.be</author><br />
<date>Wed, 14 Jun 95</date><br />
<description><br />
Good old fashioned pancakes.<br />
</description><br />
<ingredients><br />
<item><br />
<amount>3</amount><br />
<type>eggs</type><br />
</item><br />
<br />
<item><br />
<amount>0.5 tablespoon</amount><br />
<type>salt</type><br />
</item><br />
...<br />
</ingredients><br />
<preparation><br />
...<br />
</preparation><br />
</recipe><br />
</source><br />
<br />
So basically, you see that XML is just a way to structure, order, and group information. That's it! SUMO basically uses it to store and structure configuration options, and this works well due to the nice hierarchical nature of XML.<br />
<br />
If you understand this there is nothing else to it in order to be able to understand the SUMO configuration files. If you need more information see the tutorial here: [http://www.w3schools.com/XML/xml_whatis.asp http://www.w3schools.com/XML/xml_whatis.asp]. You can also have a look at the wikipedia page here: [http://en.wikipedia.org/wiki/XML http://en.wikipedia.org/wiki/XML]<br />
<br />
=== Why does SUMO use XML? ===<br />
<br />
XML is the de facto standard way of structuring information. This ranges from spreadsheet files (Microsoft Excel for example), to configuration data, to scientific data, ... There are even whole database systems based solely on XML. So basically, it's an intuitive way to structure data and it is used everywhere. As a result, there is a very large number of libraries and programming languages available that can parse and handle XML easily. That means less work for the programmer. Then of course there is stuff like XSLT, XQuery, etc. that makes life even easier.<br />
In short, it would not make sense for SUMO to use any other format :)<br />
<br />
=== I get an error that SUMO is not yet activated ===<br />
<br />
Make sure you installed the activation file that was mailed to you as explained in the [[Installation]] instructions. Also double check that your system meets the [[System requirements]] and that [[FAQ#When_running_the_toolbox_you_get_something_like_.27.3F.3F.3F_Undefined_variable_.22ibbt.22_or_class_.22ibbt.sumo.config.ContextConfig.setRootDirectory.22.27|Java is enabled]]. To fully verify that the activation file installation is correct, ensure that the file ContextConfig.class is present in the directory ''<SUMO installation directory>/bin/java/ibbt/sumo/config''.<br />
<br />
Please note that more flexible research licenses are available if it is possible to [[FAQ#What_are_collaboration_options.3F|collaborate in any way]].<br />
<br />
== Upgrading ==<br />
<br />
=== How do I upgrade to a newer version? ===<br />
<br />
Delete your old <code><SUMO-Toolbox-directory></code> completely and replace it by the new one. Install the new activation file / extension pack as before (see [[Installation]]), start Matlab and make sure the default run works. To port your old configuration files to the new version: make a copy of default.xml (from the new version) and copy over your custom changes (from the old version) one by one. This should prevent any weirdness if the XML structure has changed between releases.<br />
<br />
If you had a valid activation file for the previous version, just [[Contact]] us (giving your SUMOlab website username) and we will send you a new activation file. Note that to update an activation file you must first unzip a copy of the toolbox to a new directory and install the activation file as if it was the very first time. Upgrading of an activation file without performing a new toolbox install is (unfortunately) not (yet) supported.<br />
<br />
== Using ==<br />
<br />
=== I have no idea how to use the toolbox, what should I do? ===<br />
<br />
See: [[Running#Getting_started]]<br />
<br />
=== I want to try one of the different examples ===<br />
<br />
See [[Running#Running_different_examples]].<br />
<br />
=== I want to model my own problem ===<br />
<br />
See : [[Adding an example]].<br />
<br />
=== I want to contribute some data/patch/documentation/... ===<br />
<br />
See : [[Contributing]].<br />
<br />
=== How do I interface with the SUMO Toolbox? ===<br />
<br />
See : [[Interfacing with the toolbox]].<br />
<br />
=== What configuration options (model type, sample selection algorithm, ...) should I use for my problem? ===<br />
<br />
See [[General_guidelines]].<br />
<br />
=== Ok, I generated a model, what can I do with it? ===<br />
<br />
See: [[Using a model]].<br />
<br />
=== How can I share a model created by the SUMO Toolbox? ===<br />
<br />
See : [[Using a model#Model_portability| Model portability]].<br />
<br />
=== I don't like the final model generated by SUMO, how do I improve it? ===<br />
<br />
Before you start the modeling you should really ask yourself this question: ''What properties do I want to see in the final model?'' You have to think about what, for you, constitutes a good model and what constitutes a poor one. Then you should rank those properties depending on how important you find them. Examples are:<br />
<br />
* accuracy in the training data<br />
** is it important that the error in the training data is exactly 0, or do you prefer some smoothing<br />
* accuracy outside the training data<br />
** this is the validation or test error, how important is proper generalization (usually this is very important)<br />
* what does accuracy mean to you? a low maximum error, a low average error, both, ...<br />
* smoothness<br />
** should your model be perfectly smooth or is it acceptable that you have a few small ripples here and there for example<br />
* are some regions of the response more important than others?<br />
** for example you may want to be certain that the minima/maxima are captured very accurately but everything in between is less important<br />
* are there particular special features that your model should have<br />
** for example, capture underlying poles or discontinuities correctly<br />
* extrapolation capability<br />
* ...<br />
<br />
It is important to note that often these criteria may be conflicting. The classical example is fitting noisy data: the lower your training error the higher your testing error. A natural approach is to combine multiple criteria, see [[Multi-Objective Modeling]].<br />
<br />
Once you have decided on a set of requirements the question is then, can the SUMO-Toolbox produce a model that meets them? In SUMO model generation is driven by one or more [[Measures]]. So you should choose the combination of [[Measures]] that most closely match your requirements. Of course we can not provide a Measure for every single property, but it is very straightforward to [[Add_Measure|add your own Measure]].<br />
<br />
Now, let's say you have chosen what you think are the best Measures but you are still not happy with the final model. Reasons could be:<br />
<br />
* you need more modeling iterations or you need to build more models per iteration (see [[Running#Understanding_the_control_flow]]). This will result in a more extensive search of the model parameter space, but will take longer to run.<br />
* you should switch to a different model parameter optimization algorithm (e.g., instead of the Pattern Search variant, try the Genetic Algorithm variant of your AdaptiveModelBuilder)<br />
* the model type you are using is not ideally suited to your data<br />
* there simply is not enough data, use a larger initial design or perform more sampling iterations to get more information per dimension<br />
* maybe the sample distribution is causing troubles for your model (e.g., Kriging can have problems with clustered data). In that case it could be worthwhile to choose a different sample selection algorithm.<br />
* the range of your response variable is not ideal (for example, neural networks have trouble modeling data if the range of the outputs is very small)<br />
<br />
You may also refer to the following [[General_guidelines]]. Finally, of course, it may be that your problem is simply a very difficult one that does not approximate well. Still, you should at least get something satisfactory.<br />
<br />
If you are having these kinds of problems, please [[Reporting_problems|let us know]] and we will gladly help out.<br />
<br />
=== My data contains noise can the SUMO-Toolbox help me? ===<br />
<br />
The original purpose of the SUMO-Toolbox was to be used in conjunction with computer simulations. Since these are fully deterministic you do not have to worry about noise in the data and all the problems it causes. However, the methods in the toolbox are general fitting methods that work on noisy data as well. So yes, the toolbox can be used with noisy data, but you will just have to be more careful about how you apply the methods and how you perform model selection. It's only when you use the toolbox with a noisy simulation engine that a few special options may need to be set. In that case [[Contact]] us for more information.<br />
<br />
Note though, that the toolbox is not a statistical package, if you have noisy data and you need noise estimation algorithms, kernel smoothing algorithms, etc. you should look towards other tools.<br />
<br />
=== What is the difference between a ModelBuilder and a ModelFactory? ===<br />
<br />
See [[Add Model Type]].<br />
<br />
=== Why are the Neural Networks so slow? ===<br />
<br />
The ANN models are an extremely powerful model type that give very good results in many problems. However, they are quite slow to use. There are some things you can do:<br />
<br />
* use trainlm or trainscg instead of the default training function trainbr. trainbr gives very good, smooth results but is slower to use. If results with trainlm are not good enough, try using msereg as a performance function.<br />
* try setting the training goal (= the SSE to reach during training) to a small positive number (e.g., 1e-5) instead of 0.<br />
* check that the output range of your problem is not very small. If your response data lies between 10e-5 and 10e-9 for example it will be very hard for the neural net to learn it. In that case rescale your data to a more sane range.<br />
* switch from ANN to one of the other neural network modelers: fanngenetic or nanngenetic. These are a lot faster than the default backend based on the [http://www.mathworks.com/products/neuralnet/ Matlab Neural Network Toolbox]. However, the accuracy is usually not as good.<br />
* If you are using [[Measures#CrossValidation| CrossValidation]] (the default if you have not defined a [[Measures| measure]] yourself), try to switch to a different measure, since CrossValidation is very expensive to use. For example, our tests have shown that minimizing the sum of [[Measures#SampleError| SampleError]] and [[Measures#LRMMeasure| LRMMeasure]] can give equal or even better results than CrossValidation, while being much cheaper (see [[Multi-Objective Modeling]] for how to combine multiple measures). See also the comments in <code>default.xml</code> for examples.<br />
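<br />
The SampleError + LRMMeasure combination mentioned above might be configured roughly as follows (a sketch only; see [[Multi-Objective Modeling]] and the comments in <code>default.xml</code> for the exact syntax in your version):<br />
<br />
<source lang="xml"><br />
<!-- Sketch: replace the default CrossValidation measure by the<br />
     much cheaper combination of two measures --><br />
<Measure type="SampleError" target="0.001"/><br />
<Measure type="LRMMeasure" target="0.001"/><br />
</source><br />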
<br />
See also [[FAQ#How_can_I_make_the_toolbox_run_faster.3F]]<br />
<br />
=== How can I make the toolbox run faster? ===<br />
<br />
There are a number of things you can do to speed things up. These are listed below. Remember though that the main reason the toolbox may seem to be slow is due to the many models being built as part of the hyperparameter optimization. Please make sure you fully understand the [[Running#Understanding_the_control_flow|control flow described here]] before trying more advanced options.<br />
<br />
* First of all check that your virus scanner is not interfering with Matlab. If McAfee or any other program wants to scan every file SUMO generates this really slows things down and your computer becomes unusable.<br />
<br />
* Turn off the plotting of models in [[Config:ContextConfig#PlotOptions| ContextConfig]], you can always generate plots from the saved mat files<br />
<br />
* This is an important one. For most model builders there is an option "maxFunEvals", "maxIterations", or equivalent. Change this value to change the maximum number of models built between 2 sampling iterations. The higher this number, the slower the run, but the better the models ''may'' be. Equivalently, for the Genetic model builders reduce the population size and the number of generations.<br />
<br />
* If you are using [[Measures#CrossValidation]] see if you can avoid it and use one of the other measures or a combination of measures (see [[Multi-Objective Modeling]])<br />
<br />
* If you are using a very dense [[Measures#ValidationSet]] as your Measure, this means that every single model will be evaluated on that data set. For some models like RBF, Kriging, SVM, this can slow things down.<br />
<br />
* Disable some, or even all of the [[Config:ContextConfig#Profiling| profilers]] or disable the output handlers that draw charts. For example, you might use the following configuration for the profilers:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
<Profiler name=".*share.*|.*ensemble.*|.*Level.*" enabled="true"><br />
<Output type="toImage"/><br />
<Output type="toFile"/><br />
</Profiler><br />
<br />
<Profiler name=".*" enabled="true"><br />
<Output type="toFile"/><br />
</Profiler><br />
</Profiling><br />
</source><br />
<br />
The ".*" matches any sequence of characters ([http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html see here for the full list of supported wildcards]). Thus in this example all the profilers that have "share", "ensemble", or "Level" in their name should be enabled and saved both as a text file (toFile) AND as an image file (toImage). All the other profilers are saved just to file. The idea is to only save to image what you actually want as an image, since image generation is expensive. If you do this, or switch off image generation completely, everything will run much faster.<br />
<br />
* Decrease the logging granularity, a log level of FINE (the default is FINEST or ALL) is more than granular enough. Setting it to FINE, INFO, or even WARNING should speed things up.<br />
<br />
* If you have a multi-core/multi-cpu machine:<br />
** if you have the Matlab Parallel Computing Toolbox, try setting the parallelMode option to true in [[Config:ContextConfig]]. Now all model training occurs in parallel. This may give unexpected errors in some cases so beware when using.<br />
** if you are using a native executable or script as the sample evaluator set the threadCount variable in [[Config:SampleEvaluator#LocalSampleEvaluator| LocalSampleEvaluator]] equal to the number of cores/CPUs (only do this if it is ok to start multiple instances of your simulation script in parallel!)<br />
<br />
* Don't use the Min-Max measure, it can slow things down. See also [[FAQ#How_do_I_force_the_output_of_the_model_to_lie_in_a_certain_range]]<br />
<br />
* If you are using neural networks see [[FAQ#Why_are_the_Neural_Networks_so_slow.3F]]<br />
<br />
* If you are having problems with very slow or seemingly hanging runs:<br />
** Do a run inside the [http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/matlab_env/f9-17018.html Matlab profiler] and see where most time is spent.<br />
<br />
** Monitor CPU and physical/virtual memory usage while the SUMO toolbox is running and see if you notice anything strange. <br />
<br />
* Also note that by default Matlab only allocates about 117 MB memory space for the Java Virtual Machine. If you would like to increase this limit (which you should) please follow the instructions [http://www.mathworks.com/support/solutions/data/1-18I2C.html?solution=1-18I2C here]. See also the general memory instructions [http://www.mathworks.com/support/tech-notes/1100/1106.html here].<br />
<br />
To check if your SUMO run has hung, monitor your log file (with the level set to at least FINE). If you see no changes for about 30 minutes the toolbox has probably stalled. [[Reporting problems| Report the problem here]].<br />
<br />
Such problems are hard to identify and fix so it is best to work towards a reproducible test case if you think you found a performance or scalability issue.<br />
<br />
=== How do I build models with more than one output? ===<br />
<br />
Sometimes you have multiple responses that you want to model at once. See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== How do I turn off adaptive sampling (run the toolbox for a fixed set of samples)? ===<br />
<br />
See : [[Adaptive Modeling Mode]].<br />
<br />
=== How do I change the error function (relative error, RMSE, ...)? ===<br />
<br />
The [[Measures| <Measure>]] tag specifies the algorithm used to assign models a score, e.g., [[Measures#CrossValidation| CrossValidation]]. It is also possible to specify which '''error function''' the measure should use. The default error function is '<code>rootRelativeSquareError</code>'.<br />
<br />
Say you want to use [[Measures#CrossValidation| CrossValidation]] with the maximum absolute error, then you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" target="0.001" errorFcn="maxAbsoluteError"/><br />
</source><br />
<br />
On the other hand, if you wanted to use the [[Measures#ValidationSet| ValidationSet]] measure with a relative root-mean-square error you would put:<br />
<br />
<source lang="xml"><br />
<Measure type="ValidationSet" target="0.001" errorFcn="relativeRms"/><br />
</source><br />
<br />
These error functions can be found in the <code>src/matlab/tools/errorFunctions</code> directory. You are free to modify them and add your own. Remember that the choice of error function is very important! Make sure you think well about it. Also see [[Multi-Objective Modeling]].<br />
<br />
=== How do I enable more profilers? ===<br />
<br />
Go to the [[Config:ContextConfig#Profiling| <Profiling>]] tag and put <code>"<nowiki>.*</nowiki>"</code> as the regular expression. See also the next question.<br />
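<br />
For example, following the profiler syntax shown earlier on this page:<br />
<br />
<source lang="xml"><br />
<Profiling><br />
  <!-- ".*" matches every profiler name, so all profilers are enabled --><br />
  <Profiler name=".*" enabled="true"><br />
    <Output type="toFile"/><br />
  </Profiler><br />
</Profiling><br />
</source><br />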
<br />
=== What regular expressions can I use to filter profilers? ===<br />
<br />
See the syntax [http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html here].<br />
<br />
=== How can I ensure deterministic results? ===<br />
<br />
See : [[Random state]].<br />
<br />
=== How do I get a simple closed-form model (symbolic expression)? ===<br />
<br />
See : [[Using a model]].<br />
<br />
=== How do I enable the Heterogenous evolution to automatically select the best model type? ===<br />
<br />
Simply use the [[Config:AdaptiveModelBuilder#heterogenetic| heterogenetic modelbuilder]] as you would any other.<br />
<br />
=== What is the combineOutputs option? ===<br />
<br />
See [[Running#Models_with_multiple_outputs]]<br />
<br />
=== What error function should I use? ===<br />
<br />
The default error function is the Root Relative Square Error (RRSE). On the other hand, meanRelativeError may be more intuitive, but then you have to be careful if you have function values close to zero, since the relative error explodes or even becomes infinite. You could also use one of the combined relative error functions (which contain a +1 in the denominator to account for small values), but then you get something between a relative and an absolute error (=> hard to interpret).<br />
<br />
An absolute error (like the RMSE) therefore seems the safest bet; however, in that case you have to come up with sensible accuracy targets and realize that the models will try to fit the regions of high absolute value better than the low ones.<br />
<br />
Picking an error function is a very tricky business and many people do not realize this. Which one is best for you and what targets you use ultimately depends on your application and on what kind of model you want. There is no general answer.<br />
<br />
A recommended read is [http://www.springerlink.com/content/24104526223221u3/ this paper]. See also the page on [[Multi-Objective Modeling]].<br />
<br />
=== I just want to generate an initial design (no sampling, no modeling) ===<br />
<br />
Do a regular SUMO run, except set the 'maxModelingIterations' in the SUMO tag to 0. The resulting run will only generate (and evaluate) the initial design and save it to samples.txt in the output directory.<br />
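<br />
For example (a sketch only; whether this is an attribute or an <Option> inside the SUMO tag may differ per version, so check <code>default.xml</code>):<br />
<br />
<source lang="xml"><br />
<SUMO><br />
  <!-- 0 modeling iterations: only the initial design is generated and evaluated --><br />
  <Option key="maxModelingIterations" value="0"/><br />
</SUMO><br />
</source><br />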
<br />
=== How do I start a run with the samples of a previous run, or with a custom initial design? ===<br />
<br />
Use a Dataset design component, for example:<br />
<br />
<source lang="xml"><br />
<InitialDesign type="DatasetDesign"><br />
<Option key="file" value="/path/to/the/file/containing/the/points.txt"/><br />
</InitialDesign><br />
</source><br />
<br />
=== What is a level plot? ===<br />
<br />
A level plot is a plot that shows how the error histogram changes as the best model improves. An example is:<br />
<gallery><br />
Image:levelplot.png<br />
</gallery><br />
Level plots only work if you have a separate dataset (test set) that the model can be checked against. See the comments in default.xml for how to enable level plots.<br />
<br />
===I am getting a java out of memory error, what happened?===<br />
Datasets are loaded through Java, which means the Java heap space is used for storing the data. If you try to load a huge dataset (> 50MB), you might run into the maximum heap size. You can solve this by raising the heap size as described on the following webpage:<br />
[http://www.mathworks.com/support/solutions/data/1-18I2C.html]<br />
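In short, the fix described there boils down to creating a <code>java.opts</code> file that Matlab reads at startup and that sets the maximum Java heap size, for example (the 256 MB figure below is just an example value; pick a size that fits your dataset and machine):<br />
<br />
<code><br />
-Xmx256m<br />
</code><br />
<br />
Where exactly the <code>java.opts</code> file must be placed depends on your Matlab version (typically the directory you start Matlab from, or <code>$MATLABROOT/bin/$ARCH</code>); see the linked page for the details.<br />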
<br />
=== How do I force the output of the model to lie in a certain range ===<br />
<br />
See [[Measures#MinMax]].<br />
<br />
=== My problem is high dimensional and has a lot of input parameters (more than 10). Can I use SUMO? ===<br />
<br />
That depends. Remember that the main focus of SUMO is to generate accurate 'global' models. If you want to do adaptive sampling, the practical dimensionality is limited to around 6-8 (though this depends on the problem and on how cheap the simulations are!), since the more dimensions you have, the more space you need to fill. At that point you should check whether you can extend the models with domain-specific knowledge (to improve performance) or apply a dimensionality reduction method ([[FAQ#Can_the_toolbox_tell_me_which_are_the_most_important_inputs_.28.3D_variable_selection.29.3F|see the next question]]). On the other hand, if you do not need sample selection but have a fixed dataset you want to model, the performance on high dimensional data depends only on the model type. For example, SVM-type models are independent of the dimension and can thus always be applied, though things like feature selection are still recommended.<br />
<br />
=== Can the toolbox tell me which are the most important inputs (= variable selection)? ===<br />
<br />
When tackling high dimensional problems a crucial question is "Are all my input parameters relevant?". Normally domain knowledge would answer this question, but this is not always straightforward. For those cases a whole set of algorithms exists for doing dimensionality reduction (= feature selection). Support for some of these algorithms may eventually make it into the toolbox, but none are currently implemented; that is a whole PhD thesis on its own. However, if a model type provides functions for input relevance determination, the toolbox can leverage this. For example, the LS-SVM model available in the toolbox supports Automatic Relevance Determination (ARD). This means that if you use the SUMO Toolbox to generate an LS-SVM model, you can call the function ''ARD()'' on the model and it will give you a list of the inputs it thinks are most important.<br />
<br />
=== Should I use a Matlab script or a shell script for interfacing with my simulation code? ===<br />
<br />
When you want to link SUMO with an external simulation engine (ADS Momentum, SPECTRE, FEBIO, SWAT, ...) you need a [http://en.wikipedia.org/wiki/Shell_script shell script] (or executable) that can take the requested points from SUMO, setup the simulation engine (e.g., set necessary input files), calls the simulator for all the requested points, reads the output (e.g., one or more output files), and returns the results to SUMO (see [[Interfacing with the toolbox]]).<br />
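To make this concrete, here is a minimal sketch of such a script. All file names, the point file format, and the awk call standing in for the simulation engine are assumptions made for this example; the real conventions are described on [[Interfacing with the toolbox]].<br />
<br />
```shell
# points.txt stands in for the file of requested points passed to the script:
# one point per line, inputs separated by spaces (assumed format).
printf '2 3\n1 1\n' > points.txt

# Evaluate a stand-in "simulator" (y = x1^2 + x2) for every requested point
# and collect one output value per line in results.txt to hand back to SUMO.
: > results.txt
while read -r x1 x2; do
  # Replace this awk call with the invocation of your real simulation code.
  awk -v a="$x1" -v b="$x2" 'BEGIN { print a * a + b }' >> results.txt
done < points.txt
```
<br />
In a real script you would of course not generate points.txt yourself, and you would add whatever setup (input files, licenses, temporary directories) your simulation engine needs.<br />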
<br />
Which one you choose (Matlab script + [[Config:SampleEvaluator#matlab|Matlab Sample Evaluator]], or shell script/executable + [[Config:SampleEvaluator#local|Local Sample Evaluator]]) is basically a matter of preference; take whatever is easiest for you.<br />
<br />
HOWEVER, there is one important consideration: Matlab does not support threads. This means that if you use a Matlab script to interface with the simulation engine, simulation and modeling will happen sequentially, NOT in parallel: the modeling code will sit around waiting, doing nothing, until the simulation(s) have finished. If your simulation code takes a long time to run this is not very efficient. In version 6.2 we will probably fix this by using the Parallel Computing Toolbox.<br />
<br />
On the other hand, using a shell script/executable does allow the modeling and simulation to occur in parallel (at least if you wrote your interface script in such a way that it can be run multiple times in parallel, i.e., with no shared global directories or variables that could cause [http://en.wikipedia.org/wiki/Race_condition race conditions]).<br />
<br />
As a side note, if you have already put work into a Matlab script, it is still possible to use a shell script: write a shell script that starts Matlab (using the -nodisplay or -nojvm options), executes your script (using the -r option), and exits Matlab again. This is not very elegant and adds some overhead, but depending on your situation it may be worth it.<br />
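Such a wrapper could look like the following sketch, where the script name <code>evaluate_points</code> and the <code>MATLAB_BIN</code> variable are placeholders for your own setup:<br />
<br />
```shell
# Sketch of a shell wrapper around an existing Matlab interface script.
MATLAB_BIN="${MATLAB_BIN:-matlab}"
MATLAB_CMD="try, evaluate_points; catch err, disp(err.message); end; exit"
echo "$MATLAB_CMD" > matlab_cmd.txt   # keep a record of the command that is run

if command -v "$MATLAB_BIN" >/dev/null 2>&1; then
  # -nodisplay avoids starting the GUI, -r runs the given command
  "$MATLAB_BIN" -nodisplay -r "$MATLAB_CMD"
else
  echo "Matlab executable '$MATLAB_BIN' not found; nothing was run"
fi
```
<br />
The try/catch around the script call makes sure Matlab always reaches the exit statement, even when your script throws an error; otherwise a failing simulation would leave a Matlab process hanging around.<br />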
<br />
=== Is there any design documentation available? ===<br />
<br />
There is a PhD thesis fully describing the software architecture and design rationale behind the toolbox. It will be put online in the future. Until then you can [[Contact]] us to obtain a copy.<br />
<br />
== Troubleshooting ==<br />
<br />
=== I have a problem and I want to report it ===<br />
<br />
See : [[Reporting problems]].<br />
<br />
=== I sometimes get flat models when using rational functions ===<br />
<br />
First make sure the model is indeed flat, and does not just appear so on the plot. You can verify this by looking at the output axis range and making sure it is within reasonable bounds. When there are poles in the model, the axis range is sometimes stretched to make it possible to plot the high values around the pole, causing the rest of the model to appear flat. If the model contains poles, refer to the next question for the solution.<br />
<br />
The [[Config:AdaptiveModelBuilder#rational| RationalModel]] tries to do a least squares fit, based on which monomials are allowed in the numerator and denominator. We have experienced that some models just find a flat model as the best least squares fit. There are several possible causes:<br />
<br />
* The number of sample points is small, and the model parameters (as explained [[Model types explained#PolynomialModel|here]]) force the model to use only a very small set of degrees of freedom. The solution in this case is to increase the minimum percentage bound in the RationalFactory section of your configuration file: change the <code>"percentBounds"</code> option to <code>"60,100"</code>, <code>"80,100"</code>, or even <code>"100,100"</code>. A setting of <code>"100,100"</code> will force the polynomial models to always interpolate exactly. However, note that this does not scale very well with the number of samples (to counter this you can set <code>"maxDegrees"</code>). If, after increasing the <code>"percentBounds"</code>, you still get weird, spiky models, you simply need more samples or you should switch to a different model type.<br />
* Another possibility is that given a set of monomial degrees, the flat function is just the best possible least squares fit. In that case you simply need to wait for more samples.<br />
* The measure you are using is not accurately estimating the true error; try a different measure or error function. Note that a maximum relative error is dangerous to use, since the 0-function (= a flat model) has a lower maximum relative error than a function which overshoots the true behavior in some places but is otherwise correct.<br />
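As a sketch, the first suggestion looks as follows in the configuration file (the option name comes from the text above; the value is only an example):<br />
<br />
<source lang="xml"><br />
<!-- inside the RationalFactory section of your configuration file --><br />
<Option key="percentBounds" value="80,100"/><br />
</source><br />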
<br />
=== When using rational functions I sometimes get 'spikes' (poles) in my model ===<br />
<br />
When the denominator polynomial of a rational model has zeros inside the domain, the model will tend to infinity near these points. In most cases these models will only be recognized as being 'the best' for a short period of time. As more samples get selected these models get replaced by better ones and the spikes should disappear.<br />
<br />
So, it is possible that a rational model with 'spikes' (caused by poles inside the domain) will be selected as best model. This may or may not be an issue, depending on what you want to use the model for. If it doesn't matter that the model is very inaccurate at one particular, small spot (near the pole), you can use the model with the pole and it should perform properly.<br />
<br />
However, if the model should have a reasonable error on the entire domain, several methods are available to reduce the chance of getting poles or remove the possibility altogether. The possible solutions are:<br />
<br />
* Simply wait for more data, usually spikes disappear (but not always).<br />
* Lower the maximum of the <code>"percentBounds"</code> option in the RationalFactory section of your configuration file. For example, say you have 500 data points and if the maximum of the <code>"percentBounds"</code> option is set to 100 percent it means the degrees of the polynomials in the rational function can go up to 500. If you set the maximum of the <code>"percentBounds"</code> option to 10, on the other hand, the maximum degree is set at 50 (= 10 percent of 500). You can also use the <code>"maxDegrees"</code> option to set an absolute bound.<br />
* If you roughly know the output range your data should have, an easy way to eliminate poles is to use the [[Measures#MinMax| MinMax]] [[Measures| Measure]] together with your current measure ([[Measures#CrossValidation| CrossValidation]] by default). This will cause models whose response falls outside the min-max bounds to be penalized extra, thus spikes should disappear.<br />
* Use a different model type (RBF, ANN, SVM,...), as spikes are a typical problem of rational functions.<br />
* Increase the population size if using the genetic version<br />
* Try using the [[SampleSelector#RationalPoleSuppressionSampleSelector| RationalPoleSuppressionSampleSelector]], it was designed to get rid of this problem more quickly, but it only selects one sample at the time.<br />
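To illustrate the MinMax suggestion, here is a sketch of what the measure configuration might look like. The tag and attribute names below are assumptions for illustration only (see the [[Measures]] page for the actual syntax), and the bounds are arbitrary example values:<br />
<br />
<source lang="xml"><br />
<Measure type="CrossValidation" use="on"/><br />
<Measure type="MinMax" use="on"><br />
  <Option key="min" value="-1"/><br />
  <Option key="max" value="1"/><br />
</Measure><br />
</source><br />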
<br />
However, these solutions may still not suffice in some cases. The underlying reason is that the order selection algorithm contains quite a lot of randomness, making it prone to over-fitting. This issue is being worked on but will take some time; automatic order selection is not an easy problem.<br />
<br />
=== There is no noise in my data yet the rational functions don't interpolate ===<br />
<br />
[[FAQ#I sometimes get flat models when using rational functions |see this question]].<br />
<br />
=== When loading a model from disk I get "Warning: Class ':all:' is an unknown object class. Object 'model' of this class has been converted to a structure." ===<br />
<br />
You are trying to load a model file without the SUMO Toolbox in your Matlab path. Make sure the toolbox is in your Matlab path. <br />
<br />
In short: Start Matlab, run <code><SUMO-Toolbox-directory>/startup.m</code> (to ensure the toolbox is in your path) and then try to load your model.<br />
<br />
=== When running the SUMO Toolbox you get an error like "No component with id 'annpso' of type 'adaptive model builder' found in config file." ===<br />
<br />
This means you have specified to use a component with a certain id (in this case an AdaptiveModelBuilder component with id 'annpso') but a component with that id does not exist further down in the configuration file (in this particular case 'annpso' does not exist, but 'anngenetic' or 'ann' does, as a quick search through the configuration file will show). So make sure you only declare components which have a definition lower down. To see which components are available, simply scroll down the configuration file and see which ids are specified. Please also refer to the [[Toolbox configuration#Declarations and Definitions | Declarations and Definitions]] page.<br />
<br />
=== When using NANN models I sometimes get "Runtime error in matrix library, Choldc failed. Matrix not positive definite" ===<br />
<br />
This is a problem in the mex implementation of the [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID] toolbox. Simply delete the mex files; the Matlab implementation will then be used, and this will not cause any problems.<br />
<br />
=== When using FANN models I sometimes get "Invalid MEX-file createFann.mexa64, libfann.so.2: cannot open shared object file: No such file or directory." ===<br />
<br />
This means Matlab cannot find the [http://leenissen.dk/fann/ FANN] library itself to link to dynamically. Make sure it is in your library path, ie, on unix systems, make sure it is included in LD_LIBRARY_PATH.<br />
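For example, on a Linux system (the path <code>/usr/local/lib</code> below is only a placeholder for wherever <code>libfann.so.2</code> actually lives on your machine):<br />
<br />
```shell
# Prepend the directory containing libfann.so.2 to the dynamic linker
# search path, then start Matlab from this same shell session.
FANN_DIR="/usr/local/lib"   # placeholder: use the real location of libfann
LD_LIBRARY_PATH="$FANN_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
```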
<br />
=== When trying to use SVM models I get 'Error during fitness evaluation: Error using ==> svmtrain at 170, Group must be a vector' ===<br />
<br />
You forgot to build the SVM mex files for your platform. For Windows they are pre-compiled for you; on other systems you have to compile them yourself with the makefile.<br />
<br />
=== When running the toolbox you get something like '??? Undefined variable "ibbt" or class "ibbt.sumo.config.ContextConfig.setRootDirectory"' ===<br />
<br />
First see [[FAQ#What_is_the_relationship_between_Matlab_and_Java.3F | this FAQ entry]].<br />
<br />
This means Matlab cannot find the needed Java classes. This typically means that you forgot to run 'startup' (to set the path correctly) before running the toolbox (using 'go'). So make sure you always run 'startup' before running 'go' and that both commands are always executed in the toolbox root directory.<br />
<br />
If you did run 'startup' correctly and you are still getting an error, check that Java is properly enabled:<br />
<br />
# typing 'usejava jvm' should return 1 <br />
# typing 's = java.lang.String', this should ''not'' give an error<br />
# typing 'version('-java')' should return at least version 1.5.0<br />
<br />
If (1) returns 0, then the jvm of your Matlab installation is not enabled. Check your Matlab installation or startup parameters (did you start Matlab with -nojvm?)<br />
If (2) fails but (1) is ok, there is a very weird problem, check the Matlab documentation.<br />
If (3) returns a version before 1.5.0 you will have to upgrade Matlab to a newer version or force Matlab to use a custom, newer, jvm (See the Matlab docs for how to do this).<br />
<br />
=== You get errors related to ''gaoptimset'',''psoptimset'',''saoptimset'',''newff'' not being found or unknown ===<br />
<br />
You are trying to use a component of the SUMO toolbox that requires a Matlab toolbox that you do not have. See the [[System requirements]] for more information.<br />
<br />
=== After upgrading I get all kinds of weird errors or warnings when I run my XML files ===<br />
<br />
See [[FAQ#How_do_I_upgrade_to_a_newer_version.3F]]<br />
<br />
=== I get a warning about duplicate samples being selected, why is this? ===<br />
<br />
Sometimes, in special circumstances, multiple sample selectors may select the same sample at the same time. Even though in most cases this is detected and avoided, it can still happen when multiple outputs are modelled in one run, and each output is sampled by a different sample selector. These sample selectors may then accidentally choose the same new sample location.<br />
<br />
=== I sometimes see the error of the best model go up, shouldn't it decrease monotonically? ===<br />
<br />
There is no short answer; it depends on the situation. Below, 'single objective' refers to the case where, during the hyperparameter optimization (= the modeling iteration), combineOutputs=false and there is only a single measure set to 'on'. The other cases are classified as 'multi objective'. See also [[Multi-Objective Modeling]].<br />
<br />
# '''Sampling off'''<br />
## ''Single objective'': the error should always decrease monotonically, you should never see it rise. If it does [[reporting problems|report it as a bug]]<br />
## ''Multi objective'': there is a very small chance the error can temporarily increase, but it should be safe to ignore. In this case it is best to use a multi-objective enabled modeling algorithm<br />
# '''Sampling on'''<br />
## ''Single objective'': inside each modeling iteration the error should always decrease monotonically. At each sampling iteration the best models are updated (to reflect the new data), so the best model score may increase at that point; this is normal behavior (*). It is possible that the error increases for a short while, but as more samples come in it should decrease again. If this does not happen you are using a poor measure or a poor hyperparameter optimization algorithm, or there is a problem with the modeling technique itself (e.g., clustering in the data points is causing numerical problems).<br />
## ''Multi objective'': Combination of 1.2 and 2.1.<br />
<br />
(*) This is normal if you are using a measure like cross validation that is less reliable on little data than on more data. However, in some cases you may wish to override this behavior if you are using a measure that is independent of the number of samples the model is trained with (e.g., a dense, external validation set). In this case you can force a monotonic decrease by setting the 'keepOldModels' option in the SUMO tag to true. Use with caution!<br />
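As a sketch (assuming the <SUMO> tag accepts <Option> entries like the other components do; check your own configuration file for the exact syntax):<br />
<br />
<source lang="xml"><br />
<!-- inside the <SUMO> tag: force a monotonically decreasing best-model error --><br />
<Option key="keepOldModels" value="true"/><br />
</source><br />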
<br />
=== At the end of a run I get Undefined variable "ibbt" or class "ibbt.sumo.util.JpegImagesToMovie.createMovie" ===<br />
<br />
This is normal, the warning printed out before the error explains why:<br />
<br />
''[WARNING] jmf.jar not found in the java classpath, movie creation may not work! Did you install the SUMO extension pack? Alternatively you can install the java media framwork from java.sun.com''<br />
<br />
By default, at the end of a run, the toolbox will try to generate a movie of all the intermediate model plots. To do this it requires the extension pack to be installed (you can download it from the SUMO lab website). So install the extension pack and you will no longer get the error. Alternatively you can simply set the "createMovie" option in the <SUMO> tag to "false".<br />
Note that there is nothing to worry about: everything has run correctly, it is just the movie creation that fails.<br />
<br />
=== On startup I get the error "java.io.IOException: Couldn't get lock for output/SUMO-Toolbox.%g.%u.log" ===<br />
<br />
This error means that SUMO is unable to create the log file. Check that the output directory exists and has the correct permissions. If your output directory is on a shared (network) drive this could also cause problems. Also make sure you are running the toolbox (calling 'go') from the toolbox root directory, and not from some toolbox sub directory! This is very important.<br />
<br />
If you still have problems you can override the default logfile name and location as follows:<br />
<br />
In the <FileHandler> tag inside the <Logging> tag add the following option:<br />
<br />
<code><br />
<Option key="Pattern" value="My_SUMO_Log_file.log"/><br />
</code><br />
<br />
This means that from now on the sumo log file will be saved as the file "My_SUMO_Log_file.log" in the SUMO root directory. You can use any path you like.<br />
For more information about this option see [http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html the FileHandler Javadoc].<br />
<br />
=== The Toolbox crashes with "Too many open files" what should I do? ===<br />
<br />
This is a known bug, see [[Known_bugs#Version_6.1]].<br />
<br />
If this does not fix your problem then do the following:<br />
<br />
On Windows, try increasing the limit as dictated by the error message. Also, when you get the error, use the <code>fopen('all')</code> command to see which files are open and send us the list of file names; then we can maybe help you debug the problem further. Even better is to use the Process Explorer utility [http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx available here]: when you get the error, don't shut down Matlab but start Process Explorer and see which SUMO-Toolbox related files are open. If you then [[Reporting_problems|let us know]] we can debug the problem further.<br />
<br />
On Linux again don't shut down Matlab but:<br />
<br />
* open a new terminal window<br />
* type:<br />
<source lang="bash"><br />
lsof > openFiles.txt<br />
</source><br />
* Then [[Contact|send us]] the following information:<br />
** the file openFiles.txt <br />
** the exact Linux distribution you are using (Red Hat 10, CentOS 5, SUSE 11, etc).<br />
** the output of<br />
<source lang="bash"><br />
uname -a ; df -T ; mount<br />
</source><br />
<br />
As a temporary workaround you can try increasing the maximum number of open files ([http://www.linuxforums.org/forum/redhat-fedora-linux-help/64716-where-chnage-file-max-permanently.html see for example here]). We are currently debugging this issue.<br />
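For example, to inspect and raise the soft limit on open files for the current shell (and anything started from it, such as Matlab). The value 4096 is arbitrary; raising it above the hard limit requires root privileges and distribution-specific configuration, as explained in the linked thread:<br />
<br />
```shell
# Show the current soft limit on open file descriptors.
ulimit -n

# Try to raise it for this shell session; this fails if 4096 exceeds
# the hard limit, in which case a short message is printed instead.
ulimit -n 4096 2>/dev/null || echo "could not raise the limit (hard limit too low?)"
```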
<br />
In general: to be safe it is always best to do a SUMO run from a clean Matlab startup, especially if the run is important or may take a long time.<br />
<br />
=== When using the LS-SVM models I get lots of warnings: "make sure lssvmFILE.x (lssvmFILE.exe) is in the current directory, change now to MATLAB implementation..." ===<br />
<br />
The LS-SVMs have a C implementation and a Matlab implementation. If you don't have the compiled mex files, the Matlab implementation will be used and a warning is printed, but everything will work properly. To get rid of the warnings, compile the mex files [[Installation#Windows|as described here]] (this can be done very easily), or simply comment out the lines that produce the output in the lssvmlab directory in src/matlab/contrib.<br />
<br />
=== I get an error "Undefined function or method 'trainlssvm' for input arguments of type 'cell'" ===<br />
<br />
You most likely forgot to [[Installation#Extension_pack|install the extension pack]].<br />
<br />
=== When running the SUMO-Toolbox under Linux, the [http://en.wikipedia.org/wiki/X_Window_System X server] suddenly restarts and I am logged out of my session ===<br />
<br />
Note that in Linux there is an explicit difference between the [http://en.wikipedia.org/wiki/Linux_kernel kernel] and the [http://en.wikipedia.org/wiki/X_Window_System X display server]. If the kernel crashes or panics, your system completely freezes (you have to reset manually) or your computer does a full reboot. Luckily this is very rare. However, if your display server (X) crashes or restarts, your operating system is still running fine; it is just that you have to log in again since your graphical session has terminated. This FAQ entry only covers the latter. If you find your kernel is panicking or freezing, that is a more fundamental problem and you should contact your system admin.<br />
<br />
What happens is that after a few seconds, when the toolbox wants to plot the first model, [http://en.wikipedia.org/wiki/X_Window_System X] crashes and you are suddenly presented with a login screen. The problem is not due to SUMO but rather to the Matlab - display server interaction.<br />
<br />
What you should first do is set plotModels to false in the [[Config:ContextConfig]] tag, run again and see if the problem occurs again. If it does please [[Reporting_problems| report it]]. If the problem does not occur you can then try the following:<br />
<br />
* Log in as root (or use [http://en.wikipedia.org/wiki/Sudo sudo])<br />
* Edit the following configuration file using a text editor (pico, nano, vi, kwrite, gedit,...)<br />
<br />
<source lang="bash"><br />
/etc/X11/xorg.conf<br />
</source><br />
<br />
Note: the exact location of the xorg.conf file may vary on your system.<br />
<br />
* Look for the following line:<br />
<br />
<source lang="bash"><br />
Load "glx"<br />
</source><br />
<br />
* Comment it out by replacing it by:<br />
<br />
<source lang="bash"><br />
# Load "glx"<br />
</source><br />
<br />
* Then save the file, restart your X server (if you do not know how to do this simply reboot your computer)<br />
* Log in again, and try running the toolbox (making sure plotModels is set to true again). It should now work. If it still does not please [[Reporting_problems| report it]].<br />
<br />
Note:<br />
* this is just an empirical workaround, if you have a better idea please [[Contact|let us know]]<br />
* if you wish to debug further yourself please check the Xorg log files and those in /var/log<br />
* another possible workaround is to start matlab with the "-nodisplay" option. That could work as well.<br />
<br />
=== I get the error "Failed to close Matlab pool cleanly, error is Too many output arguments" ===<br />
<br />
This happens if you run the toolbox on Matlab version 2008a and you have the parallel computing toolbox installed. You can simply ignore this error message, it does not cause any problems. If you want to use SUMO with the parallel computing toolbox you will need Matlab 2008b.<br />
<br />
=== The toolbox seems to keep on running forever, when or how will it stop? ===<br />
<br />
The toolbox will keep on generating models and selecting data until one of the termination criteria has been reached. It is up to ''you'' to choose these targets carefully; how long the toolbox runs simply depends on the targets you choose. Please see [[Running#Understanding_the_control_flow]].<br />
<br />
Of course choosing targets up front is not always easy and there is no real solution for this, except thinking carefully about what type of model you want (see [[FAQ#I_dont_like_the_final_model_generated_by_SUMO_how_do_I_improve_it.3F]]). If in doubt you can always use a small value (or 0) and then simply quit the running toolbox using Ctrl-C when you think it has run long enough.<br />
<br />
While one could implement fancy, automatic stopping algorithms, their actual benefit is questionable.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Installation&diff=5057Installation2010-02-02T20:57:46Z<p>Dgorissen: </p>
<hr />
<div>== Introduction ==<br />
This page will walk you through the SUMO Toolbox installation. Please refer to the [[system requirements]] first. See the [[downloading]] section on how to download the toolbox.<br />
<br />
== Quick start ==<br />
<br />
Quick and dirty instructions:<br />
<br />
# Log into the SUMO lab website with the account information mailed to you and download the toolbox<br />
# Unzip the toolbox zip file, it will create a directory (= the toolbox installation directory)<br />
<!-- # Unzip the activation zip file '''INTO the toolbox installation directory''' (this file was mailed to you after you registered) --><br />
# Start Matlab<br />
# Go to the toolbox directory<br />
# Run '<code>startup</code>'<br />
# Run '<code>go</code>'<br />
<br />
== Basic Installation ==<br />
<br />
=== Toolbox ===<br />
Unzip the toolbox zip file to a directory somewhere on your hard disk; the full path of the SUMO Toolbox (including installation directory) will be referred to as the toolbox installation directory (e.g., c:\software\SUMO-Toolbox-6.3). Note that you do '''not''' have to put the toolbox in the Matlab installation directory; we actually advise against it since it can cause confusing errors.<br />
<br />
Once you have unzipped the toolbox zip file the directory structure looks like this:<br />
<br />
* <code><toolbox installation directory> ''(e.g., c:\software\SUMO-Toolbox-6.3)''</code><br />
** <code>bin/</code> : binaries, executable scripts, ...<br />
** <code>config/</code> : configuration files, location of <code>default.xml</code><br />
** <code>config/demo</code> : a couple of demo configuration files that may help you<br />
** <code>doc/</code> : some documentation<br />
** <code>doc/apidoc</code> : Javadoc and other api docs<br />
** <code>lib/</code> : required libraries (eg: dom4j)<br />
** <code>output/</code> : some output may be placed here (e.g., a global log file)<br />
** <code>src/</code> : all source code<br />
** <code>examples/</code> : project directories of different examples (you can test with these problems and use them as an example to [[Adding an example|add your own]])<br />
<br />
<!--<br />
=== Activation file ===<br />
<br />
Once you have received the activation file simply unzip it '''INTO''' in your toolbox installation directory. So place the zip file in the toolbox installation directory and unzip it there, it should place all files in the correct places (see also the README file in the activation zip). DO NOT unzip the activation file into its own directory somewhere else. Make sure you restart Matlab (if it was running) after you have done this.<br />
<br />
=== Extension pack ===<br />
<br />
There are a number of third party tools and modeling libraries that the SUMO Toolbox can use but that we cannot distribute together with the toolbox. These have been bundled in an extension pack. Only minor patches have been made to the original code to make them work better with SUMO (e.g., remove debug output). To install the extension pack, download the zip file, and unzip it INTO your toolbox installation directory. The files should be placed in the correct directories. Simply re-run 'startup' to make Matlab aware of the new files.<br />
<br />
If you download and/or use these files please respect their licenses (found in doc/licenses), '''THIS IS YOUR RESPONSIBILITY !!!'''.<br />
--><br />
=== Setup ===<br />
<br />
Setting up the toolbox is very easy. Start Matlab, navigate to the toolbox installation directory (not anywhere else, this is important!!) and run '<code>startup</code>'.<br />
<br />
=== Test run ===<br />
<br />
To ensure everything is working you can do a simple run of the toolbox with the default configuration. This means the toolbox will use the setting specified in <code><SUMO-Toolbox-installation-dir>/config/default.xml</code>.<br />
<br />
# Make sure that you are in the toolbox installation directory and you have run '<code>startup</code>' (see above)<br />
# Type '<code>go</code>' and press enter.<br />
# The toolbox will start to model the ''Academic2DTwice'' simulator. This simulator has 2 inputs and 2 outputs, and will be modeled using Kriging models, scored using [[Measures#CrossValidation| CrossValidation]], and samples selected using a combined sample selection method.<br />
# To see the exact settings used open <code>config/default.xml</code>. Feel free to edit this file and play around with the different options.<br />
<br />
The examples directory contains many example simulators that you can use to test the toolbox with. See [[Running#Running_different_examples]].<br />
<br />
== Ok, the test run works, now what? ==<br />
<br />
See [[Running]] page.<br />
<br />
== Problems ==<br />
<br />
See the [[reporting problems]] page.<br />
<br />
== Optional: Compiling libraries ==<br />
<br />
There are some alternative libraries and simulators available that have to be compiled for your specific platform. Instructions depend on your operating system. Ensure you have installed the extension pack before continuing.<br />
<br />
=== Linux/Unix/OSX ===<br />
<br />
# Ensure you have the following environment variables set:<br />
## <code>MATLABDIR=/path/to/your/matlab/installation</code><br />
## <code>JAVA_HOME=/path/to/your/SDK/installation</code><br />
# Ensure you have the usual build tools installed: gcc, g++, autotools, make, etc<br />
# From the command line shell (so NOT from inside Matlab): Go to the toolbox installation directory and type '<code>make</code>'. This will build everything for you (C/C++ files, SVM libraries, ...). If you only want to build certain packages simply '<code>make Package</code>' in the toolbox installation directory. <br />
## Note: if this is giving you problems, and you just want to compile the LS-SVMs you can try running makeLSSVM from inside Matlab (see the Windows instructions below)<br />
# A complete list of available packages follows:<br />
<br />
<br />
{| style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
! Package<br />
! Description<br />
! Requires extension pack<br />
|-<br />
| contrib<br />
| Builds the FANN, SVM (libsvm, LS-SVMlab) and NNSYSID libraries<br />
| Yes<br />
|-<br />
| cexamples<br />
| Builds the binaries for several C/C++ simulators<br />
| No<br />
|}<br />
<br />
=== Windows ===<br />
<br />
# Compiling C/C++ codes (examples):<br />
## You will have to do this on your own using a C/C++ compiler of your choice: Dev-c++/Visual Studio/...<br />
# Compiling LS-SVM libraries:<br />
## In order to use the [http://www.esat.kuleuven.be/sista/lssvmlab/ LS-SVM] backend, you will have to compile the LS-SVM mex files (it will work if you don't, but you will get a lot of warning messages about a missing CFile implementation).<br />
## This can be done using the built-in LCC compiler of matlab, by calling '<code>makeLSSVM</code>' from the Matlab command prompt (make sure the SUMO Toolbox is in your path)<br />
# Compiling ANN libraries:<br />
## In order to use the [http://leenissen.dk/fann/ FANN] backend, you will have to compile the FANN library and mex files.<br />
## So far nobody has yet got it to work under Windows, but don't let that stop you.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Installation&diff=5056Installation2010-02-02T20:55:35Z<p>Dgorissen: </p>
<hr />
<div>== Introduction ==<br />
This page will walk you through the SUMO Toolbox installation. Please refer to the [[system requirements]] first. See the [[downloading]] section on how to download the toolbox.<br />
<br />
== Quick start ==<br />
<br />
Quick and dirty instructions:<br />
<br />
# Log into the SUMO lab website with the account information mailed to you and download the toolbox<br />
# Unzip the toolbox zip file, it will create a directory (= the toolbox installation directory)<br />
<!-- # Unzip the activation zip file '''INTO the toolbox installation directory''' (this file was mailed to you after you registered) --><br />
# Start Matlab<br />
# Go to the toolbox directory<br />
# Run '<code>startup</code>'<br />
# Run '<code>go</code>'<br />
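The quick-start steps above can be sketched from a shell prompt as follows; the download filename and target directory are hypothetical placeholders, so substitute your own:<br />
<br />
```shell
# Quick-start sketch (hypothetical paths -- adjust to your system)
SUMO_ZIP="$HOME/Downloads/SUMO-Toolbox.zip"   # the zip you downloaded
SUMO_DIR="$HOME/software"                     # where to unpack it
# unzip "$SUMO_ZIP" -d "$SUMO_DIR"            # creates the installation directory
# cd "$SUMO_DIR"/SUMO-Toolbox-*               # go to the toolbox directory,
# matlab -r "startup; go"                     # start Matlab, run 'startup' then 'go'
```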
<br />
== Basic Installation ==<br />
<br />
=== Toolbox ===<br />
Unzip the toolbox zip file to a directory somewhere on your hard disk; the full path of the unzipped SUMO Toolbox directory (e.g., c:\software\SUMO-Toolbox-6.3) will be referred to as the toolbox installation directory. Note that you do '''not''' have to put the toolbox in the Matlab installation directory; we actually advise against it since it can cause confusing errors.<br />
<br />
Once you have unzipped the toolbox zip file the directory structure looks like this:<br />
<br />
* <code><toolbox installation directory> ''(e.g., c:\software\SUMO-Toolbox-6.3)''</code><br />
** <code>bin/</code> : binaries, executable scripts, ...<br />
** <code>config/</code> : configuration files, location of <code>default.xml</code><br />
** <code>config/demo</code> : a couple of demo configuration files that may help you<br />
** <code>doc/</code> : some documentation<br />
** <code>doc/apidoc</code> : Javadoc and other api docs<br />
** <code>lib/</code> : required libraries (eg: dom4j)<br />
** <code>output/</code> : some output may be placed here (e.g., a global log file)<br />
** <code>src/</code> : all source code<br />
** <code>examples/</code> : project directories of different examples (you can test with these problems and use them as an example to [[Adding an example|add your own]])<br />
<br />
<!--<br />
=== Activation file ===<br />
<br />
Once you have received the activation file simply unzip it '''INTO''' in your toolbox installation directory. So place the zip file in the toolbox installation directory and unzip it there, it should place all files in the correct places (see also the README file in the activation zip). DO NOT unzip the activation file into its own directory somewhere else. Make sure you restart Matlab (if it was running) after you have done this.<br />
<br />
=== Extension pack ===<br />
<br />
There are a number of third party tools and modeling libraries that the SUMO Toolbox can use but that we cannot distribute together with the toolbox. These have been bundled in an extension pack. Only minor patches have been made to the original code to make them work better with SUMO (e.g., remove debug output). To install the extension pack, download the zip file, and unzip it INTO your toolbox installation directory. The files should be placed in the correct directories. Simply re-run 'startup' to make Matlab aware of the new files.<br />
<br />
If you download and/or use these files please respect their licenses (found in doc/licenses), '''THIS IS YOUR RESPONSIBILITY !!!'''.<br />
--><br />
== Setup ==<br />
<br />
Setting up the toolbox is very easy. Start Matlab, navigate to the toolbox installation directory (not anywhere else, this is important!!) and run '<code>startup</code>'.<br />
<br />
== Test run ==<br />
<br />
To ensure everything is working you can do a simple run of the toolbox with the default configuration. This means the toolbox will use the settings specified in <code><SUMO-Toolbox-installation-dir>/config/default.xml</code>.<br />
<br />
# Make sure that you are in the toolbox installation directory and you have run '<code>startup</code>' (see above)<br />
# Type '<code>go</code>' and press enter.<br />
# The toolbox will start to model the ''Academic2DTwice'' simulator. This simulator has 2 inputs and 2 outputs, and will be modeled using Kriging models, scored using [[Measures#CrossValidation| CrossValidation]], and samples selected using a combined sample selection method.<br />
# To see the exact settings used open <code>config/default.xml</code>. Feel free to edit this file and play around with the different options.<br />
<br />
The examples directory contains many example simulators that you can use to test the toolbox with. See [[Running#Running_different_examples]].<br />
<br />
== Ok, the test run works, now what? ==<br />
<br />
See the [[Running]] page.<br />
<br />
== Problems ==<br />
<br />
See the [[reporting problems]] page.<br />
<br />
== Optional: Compiling libraries ==<br />
<br />
There are some alternative libraries and simulators available that have to be compiled for your specific platform. Instructions depend on your operating system. Ensure you have installed the extension pack before continuing.<br />
<br />
=== Linux/Unix/OSX ===<br />
<br />
# Ensure you have the following environment variables set:<br />
## <code>MATLABDIR=/path/to/your/matlab/installation</code><br />
## <code>JAVA_HOME=/path/to/your/SDK/installation</code><br />
# Ensure you have the usual build tools installed: gcc, g++, autotools, make, etc<br />
# From the command line shell (so NOT from inside Matlab): go to the toolbox installation directory and type '<code>make</code>'. This will build everything for you (C/C++ files, SVM libraries, ...). If you only want to build certain packages, simply run '<code>make</code>' followed by the package name (e.g., '<code>make contrib</code>') in the toolbox installation directory. <br />
## Note: if this gives you problems and you just want to compile the LS-SVMs, you can try running '<code>makeLSSVM</code>' from inside Matlab (see the Windows instructions below)<br />
# A complete list of available packages follows:<br />
<br />
<br />
{| style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
! Package<br />
! Description<br />
! Requires extension pack<br />
|-<br />
| contrib<br />
| Builds the FANN, SVM (libsvm, LS-SVMlab) and NNSYSID libraries<br />
| Yes<br />
|-<br />
| cexamples<br />
| Builds the binaries for several C/C++ simulators<br />
| No<br />
|}<br />
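Putting the Linux/Unix/OSX steps above together, a minimal shell sketch might look like the following; the Matlab and JDK paths are hypothetical examples, so point them at your own installations:<br />
<br />
```shell
# Environment needed by the build (hypothetical example paths)
export MATLABDIR=/usr/local/MATLAB/R2009b     # your Matlab installation
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk  # your JDK installation

# Then, from the toolbox installation directory (NOT from inside Matlab):
#   make            # builds everything (C/C++ files, SVM libraries, ...)
#   make contrib    # builds only the FANN, SVM and NNSYSID libraries
#   make cexamples  # builds only the C/C++ example simulators
```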
<br />
=== Windows ===<br />
<br />
# Compiling C/C++ codes (examples):<br />
## You will have to do this on your own using a C/C++ compiler of your choice: Dev-c++/Visual Studio/...<br />
# Compiling LS-SVM libraries:<br />
## In order to use the [http://www.esat.kuleuven.be/sista/lssvmlab/ LS-SVM] backend, you will have to compile the LS-SVM mex files (the toolbox will still work if you don't, but you will get a lot of warning messages about a missing CFile implementation).<br />
## This can be done using Matlab's built-in LCC compiler, by calling '<code>makeLSSVM</code>' from the Matlab command prompt (make sure the SUMO Toolbox is on your path)<br />
# Compiling ANN libraries:<br />
## In order to use the [http://leenissen.dk/fann/ FANN] backend, you will have to compile the FANN library and mex files.<br />
## So far nobody has managed to get it to work under Windows, but don't let that stop you.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Downloading&diff=5055Downloading2010-02-02T20:51:22Z<p>Dgorissen: /* Obtaining the SUMO Toolbox */</p>
<hr />
<div>== Obtaining the SUMO Toolbox ==<br />
<br />
The SUMO Toolbox is available as open source software for non-commercial use. For commercial use a license must be obtained. For details please see the [[License terms]]. In both cases, we are always open to [[FAQ#What_are_collaboration_options.3F|some form of collaboration]]; if you are interested, please [[Contact|let us know]].<br />
<br />
Download instructions can be found [http://www.sumo.intec.ugent.be/?q=SUMO_toolbox#download on the SUMO lab website].<br />
<br />
== Nightly builds ==<br />
<br />
Nightly snapshots of our development tree are available [[Contact|on request]]. These contain the latest bugfixes and newest features, but may also be unstable.<br />
<br />
== Installation ==<br />
<br />
See the [[Installation|installation instructions here]].<br />
<br />
== Latest Features ==<br />
<br />
To get an overview of what has changed in each version please consult the [[Whats new]] and [[Changelog]] pages.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Whats_new&diff=5053Whats new2010-01-31T18:47:19Z<p>Dgorissen: /* 7.0 - 29 January 2010 */</p>
<hr />
<div>This page gives a high level overview of the major changes in each toolbox version. For the detailed list of changes please refer to the [[Changelog]] page. For a list of features in the current version [[About#Features|see the about page]].<br />
<br />
== 7.0 - 29 January 2010 ==<br />
<br />
The biggest change of this release is the move to a new license model. From now on the SUMO Toolbox will be available under an '''open source''' license for non-commercial use. This means there no longer is a time or user limit and there is no need for activation files. Details can be found in the [[License terms]].<br />
<br />
Besides this, the code has seen some improvements and cleanups, most notably in the Sample Evaluator and (Blind) Kriging components.<br />
<br />
== 6.2.1 - 19 October 2009 ==<br />
<br />
A bug fix release; all users are strongly advised to upgrade.<br />
<br />
== 6.2 - 6 October 2009 ==<br />
<br />
=== Sample Selection infrastructure ===<br />
<br />
The sample selection infrastructure has been dramatically refactored into a highly flexible and pluggable system. Different sample selection criteria can now be combined in a variety of ways and the road has been opened towards dynamic sample selection criteria.<br />
<br />
The LOLA-Voronoi algorithm has also seen some improvement with the addition of support for input constraints, sampling multiple outputs simultaneously, and improved support for dealing with auto-sampled inputs.<br />
<br />
Sample points are now also assigned a priority by the sampling algorithm, which is reflected in the order in which they are evaluated. Finally, the Latin Hypercube design has been much improved. It will now attempt to download known optimal designs automatically before attempting to generate one itself.<br />
<br />
=== Model building infrastructure ===<br />
<br />
The two main changes here are, firstly, the addition of an "ann" modelbuilder alongside the existing "anngenetic" one. The new one runs faster, is more configurable, and produces models of roughly the same quality. <br />
<br />
Secondly, the (Blind) Kriging models have been much improved. A new implementation was added that replaces (and outperforms) the existing DACE Toolbox plugin. Support has also been added for automatically selecting the Kriging correlation functions.<br />
<br />
=== Other changes ===<br />
<br />
Other noteworthy changes include: the addition of an interpolation model type, cleanups and fixes in the error functions, improved stability in LRMMeasure, faster measures in a multi-output setting, and more informative help texts. Additionally the Model Browser and Profiler GUIs have seen some improvements in usability and functionality.<br />
<br />
At the same time the code has seen more cleanups (it is now fully Classdef compliant) and the use of the parallel computing toolbox (if available) has been improved.<br />
<br />
As always, a detailed list of changes can be found in the [[Changelog]].<br />
<br />
== 6.1.1 - 17 April 2009 ==<br />
<br />
This is a bugfix release that contains some cleanups and fixes to the [[Known bugs]] of version 6.1<br />
<br />
== 6.1 - 16 February 2009 ==<br />
<br />
The main improvements of 6.1 over 6.0.1 are stability, robustness, speed, and improved interfacing. However, a number of major new features have been added as well.<br />
<br />
=== Multi-Objective Modeling ===<br />
<br />
Full [[Multi-Objective Modeling|multi-objective]] support when optimizing the model parameters. This allows an engineer to enforce multiple criteria on the models produced (instead of just a single accuracy measure). This will also allow the efficient generation of models with multiple outputs (already possible through the combineOutputs option but not yet in a multi-objective setting). Together with the automatic model type selection algorithm (heterogenetic) this allows the automatic selection of the best model type per output. See [[Multi-Objective Modeling]] for more information and usage.<br />
<br />
=== Smoothness Measure ===<br />
<br />
A new measure: Linear Reference Model (LRM) has been added. This measure is best used together with other measures and helps to enforce a smooth model surface.<br />
<br />
=== Parallel Computing ===<br />
<br />
Added experimental support for the Matlab Parallel Computing Toolbox (local scheduler only). This means that when the parallelMode option in ContextConfig is switched on, model construction will make use of all available CPU cores in order to build models in parallel. This can result in significant speedups.<br />
<br />
=== General Modeling ===<br />
<br />
The ''heterogenetic'' model builder for automatic model type selection has seen many cleanups and the code has been improved. Now there should be no more manual hacks in order to use it. The rational models now support all available optimization algorithms for order selection and two new model types have been added: Blind Kriging and Gaussian Process Models. An Efficient Global Optimization (EGO) modelbuilder has also been added. This means that a nested kriging model is used internally to predict which model parameters (e.g., of an SVM model) will result in the most accurate fit. All models can now also be queried for derivatives at any point in their domain (regardless of the model type).<br />
<br />
=== Code improvements ===<br />
<br />
From now on Matlab 2008a or later will be required to run the toolbox (see [[System requirements]]). The reason is that most of the modeling code has been ported to Matlab's new [[OO_Programming_in_Matlab|Object Orientation]] implementation. The result is that the modeling code has become much cleaner and much less prone to bugs. The interfaces have become better defined and it should be much easier to incorporate your own model type or hyperparameter optimization algorithm.<br />
<br />
Note also that the Gradient Sample Selection algorithm has been renamed to LOLA.<br />
<br />
=== General Improvements ===<br />
<br />
In general, many bugs have been fixed, features and error reporting have been improved, and performance has been enhanced. Also note that the default error function is now the [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4107991 Bayesian Error Estimation Quotient (BEEQ)]. Trivial dependencies on the Statistics Toolbox have been removed.<br />
<br />
== 6.0.1 - Released 23 August 2008 ==<br />
<br />
* This is a bugfix release that fixes a few things in the 6.0 release (including a crash on startup in some cases, see [[Known bugs]])<br />
<br />
== 6.0 - Released 6 August 2008 ==<br />
<br />
Originally this was supposed to be 5.1 but after many fixes and added features we decided to promote it to 6.0. Some of the things that can be expected for 6.0 are:<br />
<br />
* Some important modeling related bugs have been fixed leading to improved model accuracy convergence<br />
* A nice graphical user interface (GUI) for loading models, browsing through dimensions, plotting errors, generating movies, ... ([[Model Visualization GUI|See here for more information]])<br />
* Introduction of project directories. All files belonging to a particular problem (simulation code, datasets, XML files, documentation, ...) are now grouped together in a project directory instead of being spread out over 3 different places.<br />
* Support for autosampling: one or more dimensions can be ignored during adaptive sampling. This is useful if the simulation code can generate samples for that dimension itself (e.g., frequency samples in the case of a frequency domain simulator in Electro-Magnetism)<br />
* Models now remember axis labels, measure scores, and output names<br />
* An export function has been added to export models to a standalone Matlab script (.m file). Not supported for all model types yet.<br />
* Proper support for Matlab R2008<br />
* A simple new model type "PolynomialModel" that builds polynomial models with a fixed (user defined) order<br />
* Note that in some cases loading models generated by older toolbox versions will not work and give an error<br />
<br />
And of course countless bugfixes, performance, and feature enhancements. '''Upgrading is strongly advised'''.<br />
<br />
== 5.0 - Released 8 April 2008 ==<br />
<br />
=== SUMO Toolbox ===<br />
<br />
In April 2008, the first public release of the '''SUrrogate MOdeling (SUMO) Toolbox''' occurred.<br />
<br />
=== Sampling related changes ===<br />
<br />
The sample selection and evaluation backends have seen some major improvements. <br />
<br />
The number of samples selected each iteration need no longer be chosen a priori but is determined on the fly based on the time needed for modeling, the average length of the past 'n' simulations and the number of compute nodes (or CPU cores) available. Of course, a user specified upper bound can still be specified. It is now also possible to evaluate data points in batches instead of always one-by-one. This is useful if, for example, there is a considerable overhead for submitting one point.<br />
<br />
In addition, data points can be assigned priorities by the sample selection algorithm. These priorities are then reflected in the scheduling decisions made by the sample evaluator. It now also becomes possible to add different priority management policies. For example, one could require that 'interest' in sample points be renewed, else their priorities will degrade with time.<br />
<br />
A new sample selection algorithm has been added that can use any function as a criterion for where to select new samples. This function is able to use all the information the surrogate provides to calculate how interesting a certain sample is. Internally, a numeric global optimizer is applied to the criterion to determine the next sample point(s). There are several criteria implemented, mostly for global optimization. For instance the 'expected improvement' criterion is very efficient for global optimization as it balances optimization itself against refining the surrogate.<br />
<br />
Finally the handling of failed or 'lost' data points has become much more robust. Pending points are automatically removed if their evaluation time exceeds a multiple of the average evaluation time. Failed points can also be re-submitted a number of times before being regarded as permanently failed.<br />
<br />
=== Modeling related changes ===<br />
<br />
The modeling code has seen some much needed cleanups. Adding new model types and improving the existing ones is now much more straightforward.<br />
<br />
Since the default Matlab neural network model implementation is quite slow, two additional implementations were added based on [http://fann.sf.net FANN] and [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID], which are much faster. In addition, the NNSYSID implementation also supports pruning. However, though these two implementations are faster, the Matlab implementation still outperforms them accuracy-wise.<br />
<br />
An intelligent seeding strategy has been enabled. The starting point/population of each new model parameter optimization run is now chosen intelligently in order to search the model parameter space more effectively. This leads to better models, faster.<br />
<br />
=== Optimization related changes ===<br />
<br />
* The Optimization framework was removed due to [[FAQ#What_about_surrogate_driven_optimization.3F|several reasons]].<br />
* Added an [[Optimizer|optimizer]] class hierarchy for solving subproblems transparently.<br />
* Added several criteria for optimization, available through the [[Config:SampleSelector#isc|InfillSamplingCriterion]].<br />
<br />
=== Various changes ===<br />
<br />
The default 'error function' is now the root relative square error (= a global relative error) instead of the absolute root mean square error. <br />
<br />
The memory usage has been drastically reduced when performing many runs with multiple datasets (datasets are loaded only once).<br />
<br />
The default settings have been harmonized and much improved. For example the SVM parameter space is now searched in log10 instead of loge. The MinMax measure is now also enabled by default if you do not specify any other measure. This means that if you specify minimum and maximum bounds in the simulator xml file, models which do not respect these bounds are penalized.<br />
<br />
Finally this release has seen countless cleanups, bug fixes and feature enhancements.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Whats_new&diff=5052Whats new2010-01-31T18:46:26Z<p>Dgorissen: </p>
<hr />
<div>This page gives a high level overview of the major changes in each toolbox version. For the detailed list of changes please refer to the [[Changelog]] page. For a list of features in the current version [[About#Features|see the about page]].<br />
<br />
== 7.0 - 29 January 2010 ==<br />
<br />
The biggest change of this release is the move to a new license model. From now on the SUMO Toolbox will be available under an open source license for non-commercial use. This means there no longer is a time or user limit and there is no need for activation files. Details can be found in the [[License terms]].<br />
<br />
Besides this the code has seen some improvements and cleanups, most notably the Sample Evaluator and (Blind) Kriging components.<br />
<br />
== 6.2.1 - 19 October 2009 ==<br />
<br />
A bug fix release, all users are strongly requested to upgrade.<br />
<br />
== 6.2 - 6 October 2009 ==<br />
<br />
=== Sample Selection infrastructure ===<br />
<br />
The sample selection infrastructure has been dramatically refactored in to a highly flexible and pluggable system. Different sample selection criteria can now be combined in a variety of different ways and the road has been opened towards dynamic sample selection criteria.<br />
<br />
The LOLA-Voronoi algorithm has also seen some improvement with the addition of support for input constraints, sampling multiple outputs simultaneously, and improved support for dealing with auto-sampled inputs.<br />
<br />
Sample points are now also assigned a priority by the sampling algorithm which is reflected in the order they are evaluated. Finally, the Latin Hypercube design has been much improved. It will now attempt to download known optimal designs automatically before attempting to generate one itself.<br />
<br />
=== Model building infrastructure ===<br />
<br />
The two main changes here are firstly the addition of an "ann" modelbuilder beside the existing "anngenetic" one. This one runs faster, is more configurable and the quality of the models is roughly the same. <br />
<br />
Secondly, the (Blind) Kriging models have been much improved. A new implementation was added that replaces (and outperforms) the existing DACE Toolbox plugin. Support has also been added for automatically selecting the Kriging correlation functions.<br />
<br />
=== Other changes ===<br />
<br />
Other noteworthy changes include: the addition of an interpolation model type, cleanups and fixes in the error functions, improved stability in LRMMeasure, faster measures in a multi-output setting, and more informative help texts. Additionally the Model Browser and Profiler GUIs have seen some improvements in usability and functionality.<br />
<br />
At the same time the code has seen more cleanups (it is now fully Classdef compliant) and the use of the parallel computing toolbox (if available) has been improved.<br />
<br />
As always, a detailed list of changes can be found in the [[Changelog]].<br />
<br />
== 6.1.1 - 17 April 2009 ==<br />
<br />
This is a bugfix release that contains some cleanups and fixes to the [[Known bugs]] of version 6.1<br />
<br />
== 6.1 - 16 February 2009 ==<br />
<br />
The main improvements of 6.1 over 6.0.1 are stability, robustness, speed, and improved interfacing. However, a number of major new features have been added as well.<br />
<br />
=== Multi-Objective Modeling ===<br />
<br />
Full [[Multi-Objective Modeling|multi-objective]] support when optimizing the model parameters. This allows an engineer to enforce multiple criteria on the models produced (instead of just a single accuracy measure). This will also allow the efficient generation of model with multiple outputs (already possible through the combineOutputs option but not yet in a multi-objective setting). Together with the automatic model type selection algorithm (heterogenetic) this allows the automatic selection of the best model type per output. See [[Multi-Objective Modeling]] for more information and usage.<br />
<br />
=== Smoothness Measure ===<br />
<br />
A new measure: Linear Reference Model (LRM) has been added. This measure is best used together with other measures and helps to enforce a smooth model surface.<br />
<br />
=== Parallel Computing ===<br />
<br />
Added experimental support for the Matlab Parallel Computing Toolbox (local scheduler only). This means that when the parallelMode option in ContextConfig is switched on, model construction will make use of all available cores/cpu's in order to build models in parallel. This can result in some significant speedups.<br />
<br />
=== General Modeling ===<br />
<br />
The ''heterogenetic'' model builder for automatic model type selection has seen many cleanups and the code has been improved. Now there should be no more manual hacks in order to use it. The rational models now support all available optimization algorithms for order selection and two new model types have been added: Blind Kriging and Gaussian Process Models. An Efficient Global Optimization (EGO) modelbuilder has also been added. This means that a nested kriging model is used internally to predict which model parameters (e.g., of an SVM model) will result in the most accurate fit. All models can now also be queried for derivatives at any point in their domain (regarless of the model type).<br />
<br />
=== Code improvements ===<br />
<br />
From now on Matlab 2008a or later will be required to run the toolbox (see [[System requirements]]). The reason is that most of the modeling code has been ported to Matlabs new [[OO_Programming_in_Matlab|Object Orientation]] implementation. The result is that the modeling code has become much cleaner and much less prone to bugs. The interfaces have become more well-defined and it should be much easier to incorporate your own model type or hyperparameter optimization algorithm.<br />
<br />
Note also that the Gradient Sample Selection algorithm has been renamed to LOLA.<br />
<br />
=== General Improvements ===<br />
<br />
In general, many bugs have been fixed, features, and error reporting improved and performance enhanced. Also note that the default error function is now the [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4107991 Bayesian Error Estimation Quotient (BEEQ)]. Trivial dependencies on the Statistics Toolbox have been removed.<br />
<br />
== 6.0.1 - Released 23 August 2008 ==<br />
<br />
* This is a bugfix release that fixes a few things in the 6.0 release (including a crash on startup in some cases, see [[Known bugs]])<br />
<br />
== 6.0 - Released 6 August 2008 ==<br />
<br />
Originally this was supposed to be 5.1 but after many fixes and added features we decided to to promote it to 6.0. Some of the things that can be expected for 6.0 are:<br />
<br />
* Some important modeling related bugs have been fixed leading to improved model accuracy convergence<br />
* A nice graphical user interface (GUI) for loading models, browsing through dimensions, plotting errors, generating movies, ... ([[Model Visualization GUI|See here for more information]])<br />
* Introduction of project directories. All files belonging to a particular problem (simulation code, datasets, XML files, documentation, ...) are now grouped together in a project directory instead of being spread out over 3 different places.<br />
* Support for autosampling, one or more dimensions can be ignored during adaptive sampling. This is useful if the simulation code can generate samples for that dimension itself (e.g., frequency samples in the case of a frequency domain simulator in Electro-Magnetism)<br />
* Models now remember axis lables, measure scores, and output names<br />
* An export function has been added to export models to a standalone Matlab script (.m file). Not supported for all model types yet.<br />
* Proper support for Matlab R2008<br />
* A simple new model type "PolynomialModel" that builds polynomial models with a fixed (user defined) order<br />
* Note that in some cases loading models generated by older toolbox versions will not work and give an error<br />
<br />
And of course countless bugfixes, performance, and feature enhancements. '''Upgrading is strongly advised'''.<br />
<br />
== 5.0 - Released 8 April 2008 ==<br />
<br />
=== SUMO Toolbox ===<br />
<br />
In April 2008, the first public release of the '''SUrrogate MOdeling (SUMO) Toolbox''' occurred.<br />
<br />
=== Sampling related changes ===<br />
<br />
The sample selection and evaluation backends have seen some major improvements. <br />
<br />
The number of samples selected each iteration need no longer be chosen a priori but is determined on the fly based on the time needed for modeling, the average length of the past 'n' simulations and the number of compute nodes (or CPU cores) available. Of course, a user specified upper bound can still be specified. It is now also possible to evaluate data points in batches instead of always one-by-one. This is useful if, for example, there is a considerable overhead for submitting one point.<br />
<br />
In addition, data points can be assigned priorities by the sample selection algorithm. These priorities are then reflected in the scheduling decisions made by the sample evaluator. It now also becomes possible to add different priority management policies. For example, one could require that 'interest' in sample points be renewed, else their priorities will degrade with time.<br />
<br />
A new sample selection algorithm has been added that can use any function as a criterion of where to select new samples. This function is able to use all the information the surrogate provides to calculate how interesting a certain sample is. Internally, a numeric global optimizer is applied on the criterion to determine the next sample point(s). There are several criterions implemented, mostly for global optimization. For instance the 'expected improvement criterion' is very efficient for global optimization as it balances between optimization itself and refining the surrogate.<br />
<br />
Finally the handling of failed or 'lost' data points has become much more robust. Pending points are automatically removed if their evaluation time exceeds a multiple of the average evaluation time. Failed points can also be re-submitted a number of times before being regarded as permanently failed.<br />
<br />
=== Modeling related changes ===<br />
<br />
The modeling code has seen some much needed cleanups. Adding new model types and improving the existing ones is now much more straightforward.<br />
<br />
Since the default Matlab neural network model implementation is quite slow, two additional, much faster implementations were added, based on [http://fann.sf.net FANN] and [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID]. The NNSYSID implementation also supports pruning. However, while both are faster, the Matlab implementation still outperforms them accuracy-wise.<br />
<br />
An intelligent seeding strategy has been enabled. The starting point/population of each new model parameter optimization run is now chosen intelligently, giving a more effective search of the model parameter space. This leads to better models, faster.<br />
<br />
=== Optimization related changes ===<br />
<br />
* The Optimization framework was removed due to [[FAQ#What_about_surrogate_driven_optimization.3F|several reasons]].<br />
* Added an [[Optimizer|optimizer]] class hierarchy for solving subproblems transparently.<br />
* Added several criteria for optimization, available through the [[Config:SampleSelector#isc|InfillSamplingCriterion]].<br />
<br />
=== Various changes ===<br />
<br />
The default 'error function' is now the root relative square error (= a global relative error) instead of the absolute root mean square error. <br />
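To make the difference between the old and new defaults concrete, here is an illustrative Python sketch of both error functions (the SUMO implementations are in Matlab; these definitions follow the standard formulas):

```python
import math

def rmse(y_true, y_pred):
    """Absolute root mean square error (the old default); scale-dependent."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rrse(y_true, y_pred):
    """Root relative square error (the new default): the model's squared error
    relative to that of the mean predictor, so it is a global *relative*
    error independent of the output's scale."""
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return math.sqrt(num / den)
```

An RRSE of 1 means the model does no better than simply predicting the mean of the data, regardless of the output's units.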
<br />
The memory usage has been drastically reduced when performing many runs with multiple datasets (datasets are loaded only once).<br />
<br />
The default settings have been harmonized and much improved. For example, the SVM parameter space is now searched in log10 space instead of ln space. The MinMax measure is now also enabled by default if you do not specify any other measure. This means that if you specify minimum and maximum bounds in the simulator xml file, models which do not respect these bounds are penalized.<br />
<br />
Finally this release has seen countless cleanups, bug fixes and feature enhancements.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=License_terms&diff=5051License terms2010-01-31T18:42:42Z<p>Dgorissen: </p>
<hr />
<div>[[Image:osilogo.jpg|90px|right|Open Source Initiative]]<br />
<br />
The SUMO Toolbox is available under a dual license model. For '''non-commercial''' use, the toolbox is available under the [http://www.fsf.org/licensing/licenses/agpl-3.0.html GNU Affero General Public License version 3] (AGPLv3), an [http://www.opensource.org/ OSI] approved [http://en.wikipedia.org/wiki/Open_source open source] license. For use in a commercial setting, a commercial license must be obtained.<br />
<br />
In addition we require that any reference to the SUMO Toolbox be accompanied by the [[Citing|corresponding publication]].<br />
<br />
== License terms ==<br />
<br />
This program is free software; you can redistribute it and/or modify it under<br />
the terms of the GNU Affero General Public License version 3 as published by the<br />
Free Software Foundation.<br />
<br />
This program is distributed in the hope that it will be useful, but WITHOUT ANY<br />
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A<br />
PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.<br />
<br />
You should have received a copy of the GNU Affero General Public License along<br />
with this program; if not, see http://www.gnu.org/licenses or write to the Free<br />
Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA<br />
02110-1301 USA, or download the license from the following URL:<br />
<br />
[http://www.fsf.org/licensing/licenses/agpl-3.0.html http://www.fsf.org/licensing/licenses/agpl-3.0.html]<br />
<br />
In accordance with Section 7(b) of the GNU Affero General Public License, these<br />
Appropriate Legal Notices must retain the display of the "SUMO Toolbox" text and<br />
homepage. In addition, when mentioning the program in written work, reference<br />
must be made to the [[Citing|corresponding publication]].<br />
<br />
You can be released from these requirements by purchasing a commercial license.<br />
Buying such a license is in most cases mandatory as soon as you develop<br />
commercial activities involving the SUMO Toolbox software. Commercial activities<br />
include: consultancy services or using the SUMO Toolbox in commercial projects <br />
(standalone, on a server, through a webservice or other remote access technology).<br />
<br />
For details about a commercial license please [[Contact]] us.<br />
<br />
== Notice ==<br />
<br />
Only our own original work is licensed under the terms described above.<br />
The licenses of some libraries used might impose different redistribution or<br />
general licensing terms than those stated above. Users and<br />
redistributors are hereby requested to verify these conditions and agree upon<br />
them.<br />
<br />
A list of the licenses of these 3rd party libraries can be found in doc/licenses.<br />
The corresponding code can be found in the src/matlab/contrib directory and the<br />
ibbt.sumo.contrib Java package.</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=Changelog&diff=5050Changelog2010-01-31T18:41:21Z<p>Dgorissen: </p>
<hr />
<div>Below you will find the detailed list of changes in every new release. For a higher-level overview, see the [[Whats new]] page.<br />
<br />
== 7.0 - 29 January 2010 ==<br />
<br />
* Move to an open source license (AGPL), see [[License terms]]<br />
* Experimental support for classification and 3D geometric modeling problems (see the 2 new demos)<br />
* Thorough cleanup of SampleEvaluator related classes and package structure<br />
* Improved speed and stability in (Blind) Kriging models and fixed the correlation function derivatives.<br />
* Vastly improved the utilization of compute nodes if a distributed sample evaluator is used that interfaces with a cluster or grid<br />
* Support for plotting the prediction uncertainty in the model browser GUI <br />
* Support for quasi random sequences as initial design<br />
<br />
== 6.2.1 - 19 October 2009 ==<br />
<br />
* This release fixes a number of bugs from 6.2. All users are strongly requested to upgrade.<br />
<br />
== 6.2 - 6 October 2009 ==<br />
<br />
* A new neural network modelbuilder, "ann". It is much faster than the existing "anngenetic" while the quality of the models is roughly the same<br />
* The sample selection infrastructure is now much more powerful; sample selection criteria can be combined much more flexibly. This opens the way to dynamic variation of sampling criteria.<br />
* Support for Input constraints / multiple output sampling in the LOLA-Voronoi sample selection algorithm<br />
* Support for auto-sampled inputs (e.g., frequency in an EM context) in LOLA-Voronoi. This is useful if a particular input is already sampled by your simulator.<br />
* Automatic filtering of samples close to each other in CombinedSampleSelector<br />
* Support for TriScatteredInterp in InterpolationModel when it is available (Matlab version 2009a and later)<br />
* Sample selectors that support it (for example: LOLA-Voronoi) now give priorities to new samples, so that samples are submitted and evaluated in order of importance.<br />
* Support for pre-calculated Latin Hypercube Designs, these will be automatically downloaded and used where possible and will improve performance<br />
* The Blind Kriging models have been improved and can now also be used as ordinary Kriging models. Since these models are superior to the existing DACE Toolbox models, the DACE Toolbox backend has been removed.<br />
* The EGOModelBuilder (do model parameter optimization using the EGO algorithm) now uses a nested blind kriging model instead of one based on the DACE Toolbox. This allows for better accuracy<br />
* The Kriging correlation functions can now be chosen automatically (instead of only the correlation parameters)<br />
* Support for multiobjective optimization in the EGO framework (extended version of probability of improvement)<br />
* DelaunaySampleSelector and OptimizeCriterion now support the same set of criteria<br />
* EGO Improvement criteria can now be used together with DACEModel, RBFModel, and SVMModel (LS-SVM backend only)<br />
* Added a model type and builder that does linear/cubic/nearest neighbour interpolation<br />
* All error functions and measures now consistently deal with complex valued data and multiple output models<br />
* Various improvements in the Model Info GUI as part of the Model browser tool<br />
* Improved stability in LRMMeasure, a behavioral complexity metric to help ensure parsimonious models<br />
* The profiler GUI has been updated and improved, and support for textual profilers has been added.<br />
* Improved performance when using Measures, especially for models with multiple outputs.<br />
* Improved management of the best model trace, also in pareto mode<br />
* Removed the debug output when using (LS-)SVM models and added compiled mex files for Windows<br />
* Ported the remaining classes to Matlab's Classdef format<br />
* Increased use of the parallel computing toolbox (if available) in order to speed up modeling<br />
* Improved the Matlab file headers so the help text is more informative (always includes at least the signature)<br />
* Support for plotting the model prediction uncertainty in the model browser (only for 1D plots and not supported by all model types)<br />
* Added support for so-called "reference by id" on every level of the config. If a tag of a particular type is defined on top-level with an id, it can be referenced everywhere else, instead of copying it entirely. See rationalPoleSupression sample selector and patternsearch Optimizer, for example.<br />
* EmptyModelBuilder added - in case you just want to use the sequential design facilities of the toolbox, but not its models.<br />
* Various cleanups and bugfixes<br />
<br />
== 6.1.1 - 17 April 2009 ==<br />
<br />
* Various cleanups and bugfixes (see [[Known bugs]] for 6.1)<br />
<br />
== 6.1 - 16 February 2009 ==<br />
<br />
* The default error function is now the Bayesian Error Estimation Quotient (BEEQ)<br />
* Full support for multi-objective model generation, multiple measures can now be enforced simultaneously. This can also be applied to generating models with multiple outputs (combineOutputs = true). Together with the automatic model type selection algorithm (heterogenetic) this allows the automatic selection of the best model type per output.<br />
* The model browser GUI now supports QQ plots<br />
* The Gradient Sample Selection Algorithm has been renamed to the Local Linear Sample Selector (LOLASampleSelector)<br />
* The modelbuilders have been refactored and some removed. This is a result of the optimizer hierarchy being cleaned up. Adding a new model parameter optimization routine should now be more straightforward.<br />
* The interface classes have been renamed to factories as this is more correct. All implementations have been ported to Matlab's new Classdef format and the inheritance hierarchy has been cleaned up. It should now be significantly easier to add support for new approximation types.<br />
* The ModelInterfaces are now known as ModelFactories, this is more correct. Note that the XML tagnames have been changed as well.<br />
* The Model class hierarchy has been converted to the new Classdef format. This means that models generated with previous versions of the toolbox will no longer be loadable in this version.<br />
* The heterogenetic model builder for automatic model type selection has been cleaned up and made more robust.<br />
* Rational models now support all available modelbuilders. This means that order selection can be done by PSO, DIRECT, Simulated Annealing, ... instead of just GA and Sequential.<br />
* New optimizers added (they can also be used as model builders): Differential Evolution<br />
* Added a Blind Kriging model type implementation as a backend of KrigingModel<br />
* Addition of an EGO model builder. This allows optimization of the model parameters using the well known Efficient Global Optimization (EGO) algorithm. In essence this uses a nested Kriging Model to predict which parameters should be used to build the next model.<br />
* Trivial dependencies on the Statistics Toolbox have been removed<br />
* Added a new smoothness measure (LRMMeasure) that helps to ensure smooth models and reduce erratic bumps. It works best when combined with other Measures (such as SampleError for ANN models) <br />
* Models now have a simple evaluateDerivative() method that allows one to easily get gradient information. The base class implementation is very simple but works. Models can override this method to get more efficient implementations.<br />
* Added experimental support for the Matlab Parallel Computing Toolbox (local scheduler only). This means that when the parallelMode option in ContextConfig is switched on, model construction will make use of all available cores/cpu's.<br />
* Many speed improvements, some quite significant.<br />
* Various cleanups and bugfixes<br />
<br />
== 6.0.1 - Released 23 August 2008 ==<br />
<br />
* Fixed a number of (minor) bugs in the 6.0 release<br />
<br />
== 6.0 - Released 6 August 2008 ==<br />
<br />
* Many important bugs have been fixed that could have resulted in sub-optimal models<br />
* Addition of a Model Browser GUI, this allows you to easily 'walk' through multi-dimensional models<br />
* Moved the InitialDesign tag outside of the SUMO tag<br />
* Some speed improvements<br />
* Removed support for dummy inputs<br />
* Measure scores and input/output names are saved inside the models, allowing for more usable plots<br />
* Added the project directory concept, each example is now self contained in its own directory<br />
* #simulatorname# can now be used in the run name, it will get replaced by the real simulator name<br />
* Input dimensions can be ignored during sampling if the simulator samples them for you. This is useful in EM applications for example where frequency points can be cheap.<br />
* Logging framework revamped, logs can now be saved on a per run basis<br />
* The global score calculation has changed! It is now a weighted sum of all individual measures (the weights are configurable but default to 1)<br />
* Added a simple polynomial model where the orders can be chosen manually<br />
* Countless cleanups, minor bugfixes and feature enhancements<br />
<br />
== 5.0 - Released 8 April 2008 ==<br />
<br />
* In April 2008, the first public release of the '''Surrogate Modeling (SUMO) Toolbox''' (v5.0) occurred. <br />
* A major new release with countless fixes, improvements, new sampling and modeling algorithms, and much more.<br />
<br />
List of changes:<br />
<br />
* Fixed the 'Known bugs' for v4.2 (see Wiki)<br />
* data points now have priorities (assigned by the sample selectors)<br />
* Vastly reworked and improved the sample evaluator framework<br />
** robust handling of failed or 'lost' data points<br />
** pluggable input queue infrastructure to make advanced scheduling policies possible<br />
* The number of samples to select each iteration is now determined dynamically, based on the time needed for modeling, the length of one simulation, the number of compute nodes available, ... A user-specified upper bound can still be specified of course.<br />
* Model plots are now in the original space instead of the normalized ([-1 1]) space<br />
* The default error function is now the root relative square error (= a global relative error)<br />
* Intelligent seeding of each new model parameter optimization iteration. This means the model parameter space is searched much more efficiently and completely<br />
* Added a fast Neural Network Modeler based on FANN (http://fann.sf.net)<br />
* Added a Neural Network Modeler based on NNSYSID (http://www.iau.dtu.dk/research/control/nnsysid.html)<br />
* The LS-SVM model type has been merged with the SVM model type. The SVM model now supports three backends: libSVM, SVMlight, and lssvm<br />
* Added a SampleSelector using infill sampling criteria (ISC).<br />
** The expected improvement from EGO/superEGO is provided among others. (only usable with Kriging and RBF)<br />
* More robust handling of SSH sessions when running simulators on a remote cluster<br />
* The TestSamples measure has been renamed to ValidationSet<br />
* The Polynomial model type has been renamed to the more apt Rational model<br />
* The grid and voronoi sample selectors have been renamed to Error and Density respectively<br />
* Drastically reduced memory usage when performing many runs with multiple datasets (datasets are cached)<br />
* Added utility functions for easily summarizing profiler data from a large number of runs<br />
* Lots of speed improvements in the gradient sample selector<br />
* The default settings have been harmonized and much improved<br />
* The (LS)SVM parameter space is now searched in log10 instead of ln space<br />
* Added a TestMinimum measure <br />
** compares the minimum of the surrogate model against a predefined value (for instance a known minimum)<br />
* Added a MinimumProfiler<br />
** tracks the minimum of the surrogate model versus the number of iterations<br />
* Movie creation now works on all supported platforms<br />
* Added an optimizer class hierarchy for solving subproblems transparently<br />
* Cleaned up the structure of all the model classes so they no longer contain an interface object. This was confusing and led to error-prone code. Virtually all subsref and subsasgn implementations have also been removed.<br />
* The MinMax measure is now enabled by default<br />
* The Optimization framework was removed (and replaced) for various reasons, see: http://sumowiki.intec.ugent.be/index.php/FAQ#What_about_surrogate_driven_optimization.3F<br />
* Fixed the file output of the profiler, formatting is correct now<br />
* New implementation of a maximin latin hypercube design<br />
** Minimizes pairwise correlation<br />
** Minimizes intersite distance<br />
* Removed dependency of factorial design on the statistics toolbox<br />
* Added a plotOptions tag, which allows for more customisation of model plots (grey scale, light effects, ...)<br />
* Profiler plots can now also be saved as JPG, PNG, EPS, PDF, PS and SVG<br />
* Countless cleanups, minor bugfixes and feature enhancements<br />
<br />
== 4.2 - Released 18 October 2007 ==<br />
<br />
* Fixed the 'Known bugs' for v4.1 (see Wiki)<br />
* Simulators can be passed options through an <Options> tag<br />
* Added a fixed model builder so you can manually force which model parameters to use<br />
* Removed ProActive dependency for the SGE distributed backend<br />
* Improved Makefile under unix/linux<br />
* Data produced by simulators no longer needs to be pre-scaled to [-1 1], this can be done automatically from the simulator configuration file<br />
* Deprecated the optimization framework. It is currently under re-design and a better, more integrated version, will be released with the next toolbox version.<br />
* Lots of cleanups, minor bugfixes and small feature enhancements<br />
* In October 2007, the development of the M3-Toolbox was discontinued.<br />
<br />
== 4.1 - Released 27 July 2007 ==<br />
<br />
* Fixed the 'Known bugs' for v4.0 (see Wiki)<br />
* Vastly improved test sample distribution if a test set is created on the fly<br />
* Gradient sample selector now works with complex outputs and has improved neighbourhood selection<br />
* Speed and usability improvements in the profiler framework<br />
* Improvements in the profiler DockedView widget (added a right click context menu)<br />
* Addition of some new examples<br />
* Added an option (on by default) that selects a certain percentage of the grid sample selector's points randomly, making the algorithm more robust<br />
* Some cleanups, minor bugfixes and feature enhancements<br />
<br />
== 4.0 - Released 22 June 2007 ==<br />
<br />
* IMPORTANT: the best model score is now 0 instead of 1, this is more intuitive<br />
* Reworked and improved the model scoring mechanism, now based on a pareto analysis. This makes it possible to combine multiple measures in a sensible way.<br />
* Added a proof of concept surrogate driven optimization framework. Note this is an initial implementation which works, but don't expect state of the art results.<br />
* Cleanup and refactoring of the profiler framework<br />
* The profiling of model parameters has been totally reworked and this can now easily be tracked in a nice GUI widget<br />
* Cleanup of error function logic so you can now easily use different error functions (relative, RMS, ...) in the measures<br />
* Improved model plotting<br />
* Support for the SVMlight library (you must download it yourself in order to use it)<br />
* Added a MinMax measure which can be used to suppress spikes in rational models<br />
* Support for extinction prevention in the heterogenetic modeler<br />
* Fixed warnings (and in some cases errors) when loading models from disk<br />
* Respect the maximum running time more accurately<br />
* Many cleanups, minor bugfixes and feature enhancements<br />
<br />
== 3.3 - Released 2 May 2007 ==<br />
<br />
* Fixed incorrect summary at the end of a run<br />
* Fixed bug due to duplicate sample points<br />
* Ability to evaluate multiple samples in parallel locally (support for dual/multi-core machines)<br />
* Speedups when reading in datasets<br />
* Added 2 new modelbuilders that optimize the parameters using:<br />
** Pattern Search (requires the Matlab direct search toolbox)<br />
** Simulated Annealing (requires Matlab v7.4 and the direct search toolbox)<br />
** The Matlab Optimization Toolbox (includes different gradient based methods like BFGS)<br />
* A new density based sample selection algorithm (VoronoiSampleSelector)<br />
* New simulator examples to test with<br />
* Addition of a profiler to generate levelplots<br />
* Ability to generate Matlab API documentation using m2html<br />
* New neural network training algorithms based on Differential Evolution and Particle Swarm Optimization<br />
* It is now possible to call the toolbox with specific samples/values directly, e.g., go('myConfigFile.xml',xValues,yValues);<br />
* Many minor bugfixes and feature enhancements<br />
<br />
== 3.2 - Released 9 Mar 2007 ==<br />
<br />
* Many important bugfixes<br />
* Documentation improvements<br />
* Fully working support for RBF models<br />
* New measure profilers that track the errors on measures<br />
* Many new predefined functions and datasets to test with. We now have over 50 examples!<br />
<br />
== 3.1 - Released 28 Feb 2007 ==<br />
<br />
* Small bugfixes and usability improvements<br />
* Improved documentation<br />
* Working implementation of a heterogeneous evolutionary modelbuilder<br />
* More examples<br />
<br />
== 3.0 - Released 14 Feb 2007 ==<br />
<br />
* Availability of pre-built binaries<br />
* Extensive refactoring and code cleanups<br />
* Many bugfixes and usability improvements<br />
* Resilience against simulator crashes<br />
* Ability to set the maximum running time for one sample evaluation<br />
* Vastly improved Genetic model builder + a neural network implementation<br />
* Addition of a RandomModelBuilder to use as a baseline benchmark<br />
* Possible to add dummy input variables or to model only a subset of the available inputs while clamping others<br />
* Improved multiple output support<br />
** outputs can be modeled in parallel<br />
** each output can be configured separately (e.g., per output: model type, accuracy requirements (measure), sample selection algorithm, complex handling flag, etc.) <br />
** multiple outputs can be combined into one model if the model type supports this<br />
* Noisy (Gaussian, outliers, ...) versions of a given output can be automatically added <br />
* New and improved directory structure for output data<br />
* New model types:<br />
** Kriging (based on the DACE MATLAB Kriging Toolbox by Lophaven, Nielsen and Sondergaard)<br />
** Splines (based on the MATLAB Splines Toolbox, only for 1D and 2D)<br />
* Now matlab scripts can be used as datasources (simulators) as well<br />
* New initial experimental design<br />
** Based on a dataset<br />
** Combination of existing designs<br />
** Based on the complexity of different 1D fits<br />
* Addition of new datasets and predefined functions as modeling examples<br />
<br />
== 2.0 - Released 15 Nov 2006 ==<br />
<br />
* Initial release of the M3-Toolbox - open source</div>Dgorissenhttp://sumowiki.intec.ugent.be/index.php?title=License_terms2&diff=5049License terms22009-12-10T15:26:30Z<p>Dgorissen: </p>
<hr />
<div>[[Image:osilogo.jpg|90px|right|Open Source Initiative]]<br />
<br />
The SUMO Toolbox is av