About
History
In 2004, research within the (former) COMS research group, led by Professor Tom Dhaene, focused on developing efficient, adaptive and accurate algorithms for polynomial and rational modeling of linear time-invariant (LTI) systems. This work resulted in a set of Matlab scripts that served as a testing ground for new ideas and concepts. As research progressed, these scripts were reworked and refactored into one coherent Matlab toolbox, tentatively named the Multivariate MetaModeling (M3) Toolbox. The first public release of the toolbox (v2.0) occurred in November 2006. In October 2007, development of the M3 Toolbox was discontinued.
In April 2008, the first public release of the Surrogate Modeling (SUMO) Toolbox (v5.0) occurred.
For a list of changes since then, refer to the Changelog and What's new pages.
Intended use
Global Surrogate Models
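The SUMO Toolbox was originally designed to solve the following problem: automatically generate an accurate surrogate model (also known as a metamodel or response surface model) of a computationally expensive simulation code, requiring as few data points and as little user interaction as possible.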
In addition, the toolbox provides powerful adaptive algorithms and a whole suite of model types for:
 data fitting problems (regression, function approximation, curve fitting)
 response surface modeling (RSM)
 scattered data interpolation
 model selection
 design of experiments (DoE)
 model parameter optimization, e.g., finding the optimal neural network topology, SVM kernel parameters, rational function order, etc. (= hyperparameter optimization)
 iterative adaptive sample selection (also known as sequential design or active learning)
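Both model selection and hyperparameter optimization need a score that ranks candidate models. A minimal sketch of one such score, leave-one-out cross-validation, is shown below in Python. It assumes 1D inputs and uses k-nearest-neighbour regression as a stand-in model type whose hyperparameter is k; the function names (knn_predict, loo_mse, select_k) are hypothetical and are not part of the toolbox's API:

```python
from typing import List, Sequence


def knn_predict(xs: Sequence[float], ys: Sequence[float], x: float, k: int) -> float:
    """Predict y at x as the mean output of the k training points nearest to x."""
    # sorted() is stable, so ties in distance keep the training-set order
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))
    nearest = order[:k]
    return sum(ys[i] for i in nearest) / len(nearest)


def loo_mse(xs: List[float], ys: List[float], k: int) -> float:
    """Leave-one-out mean squared error of the k-NN model on (xs, ys)."""
    total = 0.0
    for i in range(len(xs)):
        # Refit on every sample except sample i, then predict sample i
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        total += (knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2
    return total / len(xs)


def select_k(xs: List[float], ys: List[float], candidates: Sequence[int]) -> int:
    """Return the candidate hyperparameter with the lowest leave-one-out error."""
    return min(candidates, key=lambda k: loo_mse(xs, ys, k))


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = list(xs)  # a linear response y = x
    print(select_k(xs, ys, [1, 2, 3]))  # prints 2
```

On this linear data, k = 2 wins because averaging the two neighbours of an interior point reproduces it exactly, while k = 1 and k = 3 incur larger errors. The same loop works for any model type; only the fit-and-predict step changes.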
Note that the SUMO Toolbox can drive the simulation code directly.
For domain experts or engineers, the SUMO Toolbox provides a flexible, pluggable platform to which the response surface modeling task can be delegated. For researchers in surrogate modeling, it provides a common framework to implement, test and benchmark new modeling and sampling algorithms.
See the Wikipedia surrogate model page to find out more.
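The adaptive loop behind this (fit a surrogate, check it against new simulator evaluations, refine where it is inaccurate, stop on accuracy or sample budget) can be sketched in a few lines. This is a deliberately simplified illustration, not the toolbox's algorithm: it assumes a scalar input on [lo, hi], uses a piecewise-linear interpolant as the surrogate, and estimates the error by comparing the surrogate's prediction at an interval midpoint with the simulator's value there. The names adaptive_sample and predict and all their parameters are hypothetical:

```python
import math
from bisect import bisect_right
from collections import deque
from typing import Callable, List, Tuple


def predict(xs: List[float], ys: List[float], x: float) -> float:
    """Piecewise-linear surrogate through the sorted samples (xs, ys)."""
    i = min(max(bisect_right(xs, x), 1), len(xs) - 1)  # interval [xs[i-1], xs[i]]
    x0, x1 = xs[i - 1], xs[i]
    return ys[i - 1] + (ys[i] - ys[i - 1]) * (x - x0) / (x1 - x0)


def adaptive_sample(
    simulator: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-3,
    n_initial: int = 5,
    max_samples: int = 100,
) -> Tuple[List[float], List[float]]:
    """Refine intervals until the surrogate predicts new samples within tol.

    n_initial must be at least 2.
    """
    # Initial design: n_initial equally spaced points
    design = [lo + k * (hi - lo) / (n_initial - 1) for k in range(n_initial)]
    samples = {x: simulator(x) for x in design}
    pending = deque(zip(design[:-1], design[1:]))
    # Stop when no interval needs refinement or the sample budget is spent
    while pending and len(samples) < max_samples:
        a, b = pending.popleft()
        m = 0.5 * (a + b)
        predicted = 0.5 * (samples[a] + samples[b])  # surrogate value at m
        samples[m] = simulator(m)  # one (expensive) simulator call
        if abs(samples[m] - predicted) > tol:
            pending.extend([(a, m), (m, b)])  # surrogate is off: refine locally
    xs = sorted(samples)
    return xs, [samples[x] for x in xs]


if __name__ == "__main__":
    xs, ys = adaptive_sample(lambda x: math.sin(3 * x), 0.0, math.pi, tol=1e-3)
    print(len(xs), predict(xs, ys, 1.0))
```

Starting from a small initial design matters: with only the two end points, a periodic response such as sin(x) on [0, 2π] would look perfectly linear at the first midpoint and the loop would stop immediately.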
Surrogate Driven Optimization
While the main focus of the SUMO Toolbox is to create accurate global surrogate models, it can be used for other goals too.
For instance, the toolbox can be used to create consecutive local surrogate models for optimization purposes. The information obtained from the local surrogate models is used to guide the adaptive sampling process to the global optimum.
A good sampling strategy for surrogate driven optimization seeks a balance between global search (refining the surrogate model) and local search (finding the optimum). Such a strategy is implemented (akin to (Super)EGO); see the different sample selectors for more information.
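The expected improvement (EI) criterion from EGO is one standard way to strike this balance. As a sketch, assuming a model (such as Kriging) that returns a Gaussian prediction with mean mu and standard deviation sigma at a candidate point, EI for minimization can be computed as follows. The function name and the candidate values are hypothetical, and the toolbox's sample selectors may use a different variant:

```python
import math


def expected_improvement(mu: float, sigma: float, f_min: float) -> float:
    """Expected improvement over f_min (minimization) for a prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    # First term rewards a low predicted value (local search), the second
    # rewards high uncertainty (global search)
    return (f_min - mu) * cdf + sigma * pdf


if __name__ == "__main__":
    # Hypothetical candidates: x -> (predicted mean, predicted std) from a Kriging model
    candidates = {0.1: (1.2, 0.05), 0.5: (1.0, 0.4), 0.9: (0.8, 0.01)}
    f_min = 0.9  # best simulator output observed so far
    best = max(candidates, key=lambda x: expected_improvement(*candidates[x], f_min))
    print(best)  # prints 0.5
```

In the example, the candidate at 0.5 wins even though 0.9 has the lower predicted mean, because its larger uncertainty leaves more room for improvement.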
Dynamic systems or Time series prediction
See the FAQ: What about dynamical, time dependent data?
Classification
See the FAQ: What about classification problems?
Application range
The SUMO Toolbox has already been applied successfully to a wide range of problems from domains as diverse as aerodynamics, geology, metallurgy, electromagnetics (EM), electronics, engineering and economics. The SUMO Toolbox can be applied to any situation where the problem can be described as a function that maps a set of inputs onto a set of outputs. We generally refer to this function as the Simulator.
Across the problems to which we have applied the toolbox, the input dimension has ranged from 1 to 130 and the output dimension from 1 to 70 (including both complex- and real-valued outputs). The number of data points has ranged from as few as 15 to as many as 100,000.
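To the toolbox, every problem is a Simulator: a function from a set of inputs to a set of outputs. As a minimal sketch in Python (the toolbox itself is written in Matlab and Java, so this signature is illustrative only and the function name is hypothetical), here is the Ackley function, one of the built-in test problems, written as a simulator with a d-dimensional input and a single real output. It uses the common parameterization a = 20, b = 0.2, c = 2π:

```python
import math
from typing import Sequence


def ackley(x: Sequence[float]) -> float:
    """Ackley test function: maps a d-dimensional input to one real output."""
    d = len(x)
    mean_sq = sum(xi * xi for xi in x) / d  # mean of squared inputs
    mean_cos = sum(math.cos(2.0 * math.pi * xi) for xi in x) / d
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e


if __name__ == "__main__":
    print(ackley([0.0, 0.0]))  # global minimum, 0 up to rounding
    print(ackley([1.0, 1.0]))
```

Any other code that accepts an input vector and returns outputs (an external executable, a Java class, a lookup into a dataset) fits the same mold.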
Design goals
The SUMO Toolbox was designed with a number of goals in mind:
 A flexible tool that integrates different modeling methods and does not tie the user to one particular class of problems. Reliance on domain-specific features should be avoided.
 The focus should be on adaptivity, i.e., relieving the burden on the domain expert as much as possible. Given a simulation model, the software should produce an accurate surrogate model with minimal user interaction. This also includes easy integration with the existing design environment.
 At the same time, there is no such thing as a 'one-size-fits-all' solution. Different problems need to be modeled differently and require different a priori process knowledge. Therefore the software should be modular and easily extensible to new methods.
 Engineers and domain experts tend not to trust a black-box system that generates models without making clear why a particular model should be preferred. Therefore an important design goal was that expert users should be able to take full manual control of the modeling process if necessary. In addition, the toolbox should support fine-grained logging and profiling so that its modeling and sampling decisions can be retraced.
Given this design philosophy, the toolbox caters both to researchers working on novel surrogate modeling techniques and to engineers who need surrogate models as part of their design process. For the former, the toolbox provides a common platform on which to deploy, test, and compare new modeling algorithms and sampling techniques. For the latter, the software functions as a highly configurable and flexible component to which surrogate model construction can be delegated, easing the burden on the user and enhancing productivity.
Features
The main features of the toolbox are listed below. For an overview of recent changes, see the What's new page. A detailed list of changes can be found in the Changelog.
Implementation language: Matlab, Java, and, where applicable, C and C++
Design patterns: fully object oriented, with the focus on clean design and encapsulation
Minimum requirements: see the system requirements page
Supported data sources*: local executable/script, simulation engine, Java class, Matlab script, dataset (txt file) (see Interfacing with the toolbox)
Supported data types: multidimensional inputs and outputs; outputs can be any combination of real and complex values
Configuration: extensively configurable through one main XML configuration file
Flexibility: virtually every component of the modeling process can be configured, replaced or extended by a user-specific, custom implementation
Predefined accuracy: the toolbox runs until the accuracy required by the user has been reached, the maximum number of samples has been exceeded, or a timeout has occurred
Model types*: out-of-the-box support for:
Model parameter optimization algorithms*: Pattern Search, EGO, Simulated Annealing, Genetic Algorithm, BFGS, DIRECT, Particle Swarm Optimization (PSO), NSGA-II, ...
Sample selection algorithms (= sequential design, active learning)*: random, error-based, density-based, gradient-based, and many different hybrids
Experimental design*: Latin Hypercube Sampling, Central Composite, Box-Behnken, random, user defined, full factorial
Model selection measures*: validation set, cross-validation, leave-one-out, model difference, AIC (also in a multi-objective context, see Multi-Objective Modeling)
Sample evaluation*: on the local machine (taking advantage of multicore CPUs) or in parallel on a cluster/grid
Supported distributed middlewares*: Sun Grid Engine, LCG Grid middleware (both accessed through an SSH-accessible front node)
Logging: extensive logging to enable close monitoring of the modeling process. Logging granularity is fully configurable and log streams can easily be redirected (to file, console, a remote machine, ...)
Profiling*: extensive profiling framework for easy gathering (and plotting) of modeling metrics (average sample evaluation time, hyperparameter optimization trace, ...)
Easy tracking of modeling progress: automatic storing of the best models and their plots, with the ability to automatically generate a movie of the sequence of plots
Model browser GUI: a graphical tool to easily visualize high dimensional models and browse through data (more information here)
Available test problems*: out-of-the-box support for many built-in functions (Ackley, Camel Back, Goldstein-Price, ...) and datasets (Abalone, Boston Housing, FishLength, ...) from various application domains, including a number of datasets (and some simulation code) from electronics. In total, over 50 examples are available.
License: see the License terms
* Custom implementations can easily be added
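Latin Hypercube Sampling, listed above under experimental design, can be sketched in a few lines. This is the textbook random LHS without any space-filling optimization (such as maximin), written in Python with a hypothetical function name; the toolbox's own implementation may differ:

```python
import random
from typing import List, Optional, Sequence, Tuple


def latin_hypercube(
    n: int, bounds: Sequence[Tuple[float, float]], seed: Optional[int] = None
) -> List[Tuple[float, ...]]:
    """Draw n points so that each input's range is split into n equal strata
    and every stratum of every input contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        # One uniformly random position inside each assigned stratum
        columns.append([lo + (hi - lo) * (s + rng.random()) / n for s in strata])
    # Transpose: point i takes the i-th entry of every column
    return [tuple(col[i] for col in columns) for i in range(n)]


if __name__ == "__main__":
    for point in latin_hypercube(5, [(0.0, 1.0), (-2.0, 2.0)], seed=42):
        print(point)
```

Compared with plain random sampling, every one-dimensional projection of the design is evenly covered, which is why LHS is a common choice for the initial design before adaptive sampling takes over.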
Screenshots
A number of screenshots give a feel for the SUMO Toolbox. Note that these screenshots do not necessarily reflect the latest toolbox version.
Movies
The following movies illustrate how the modeling process progresses as more data becomes available.
 Modeling the StepDiscontinuity (= electromagnetic problem)
 Modeling the Ackley function (= mathematical function)
 Particle Swarm Optimization in the parameter space of Kriging (theta)
 PSO movie: shows how the correlation parameters are optimized as the SUMO Toolbox searches for better models. Note that the data distribution is not constant but is continually updated.
 Modeling David's face (Data courtesy of the Digital Michelangelo Project)
The following movies were created with the Model Visualization GUI.
 Visualizing a 3D model from video compression data
 Modeling an exponentially tapered TML (described in Microwave Engineering, 2nd edition, D.M. Pozar) using an artificial neural network
 Visualizing 3 (out of 5) input dimensions with a high frame rate and lighting; no samples shown.
 A movie of a 3D model with sample points, but without lighting.
 Visualizing a 3D model on a contour plot.
 A movie of a 2D model on a 1D plot.
Note that these movies do not necessarily reflect the latest toolbox version. Improvements and/or interface adjustments may have been made since then.
Documentation
Presentation
 Poster: SUMO poster
 Presentation: SUMO slides
Newsletter
To stay up to date with the latest news and releases, we also recommend subscribing to the SUMO newsletter.
Traffic will be kept to a minimum and you can unsubscribe at any time.
Blog
A blog covering related research can be found at http://sumolab.blogspot.com.
Citations
See Citing the toolbox.