Whats new
This page gives a high-level overview of the major changes in each toolbox version.  For the detailed list of changes, please refer to the [[Changelog]] page.  For a list of features in the current version, [[About#Features|see the about page]].

== 7.0.2 - 1 August 2010 ==
  
A minor cosmetic update to correspond with the upcoming JMLR Software publication.
== 7.0.1 - 15 January 2010 ==

This release fixes a couple of known bugs, the most important being a clustering-related bug in the LOLA sample selection algorithm.  All users are strongly encouraged to upgrade.

== 7.0 - 29 January 2010 ==

The biggest change in this release is the move to a new license model.  From now on, the SUMO Toolbox will be available under an '''open source''' license for non-commercial use.  This means there is no longer a time or user limit, and there is no need for activation files.  Details can be found in the [[License terms]].
 
Besides this, the code has seen some improvements and cleanups, most notably in the Sample Evaluator and (Blind) Kriging components.

== 6.2.1 - 19 October 2009 ==

A bug fix release; all users are strongly encouraged to upgrade.
== 6.2 - 6 October 2009 ==

=== Sample Selection infrastructure ===

The sample selection infrastructure has been dramatically refactored into a highly flexible and pluggable system.  Different sample selection criteria can now be combined in a variety of ways, and the road has been opened towards dynamic sample selection criteria.

The LOLA-Voronoi algorithm has also seen some improvements, with the addition of support for input constraints, sampling multiple outputs simultaneously, and improved handling of auto-sampled inputs.

Sample points are now also assigned a priority by the sampling algorithm, which is reflected in the order in which they are evaluated.  Finally, the Latin Hypercube design has been much improved.  It will now attempt to download known optimal designs automatically before attempting to generate one itself.
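For reference, the core idea behind a (random) Latin Hypercube design is easy to sketch.  This is an illustrative Python version only, not the toolbox's implementation (which additionally looks up published optimal designs):

```python
import random

def latin_hypercube(n_points, n_dims, rng=random.Random(0)):
    """Generate an n_points x n_dims Latin Hypercube design in [0, 1]^d.

    Each dimension is split into n_points equal bins; every bin is
    sampled exactly once, so points are well spread along each axis.
    """
    design = []
    # One independent permutation of the bins per dimension.
    perms = [rng.sample(range(n_points), n_points) for _ in range(n_dims)]
    for i in range(n_points):
        # Draw one point uniformly inside the assigned bin of each dimension.
        point = [(perms[d][i] + rng.random()) / n_points for d in range(n_dims)]
        design.append(point)
    return design
```

Because each axis is stratified, no two points share a bin in any dimension, which is exactly the property an "optimal" downloaded design refines further.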
=== Model building infrastructure ===

There are two main changes here.  Firstly, an "ann" modelbuilder has been added besides the existing "anngenetic" one.  The new one runs faster, is more configurable, and the quality of the models is roughly the same.

Secondly, the (Blind) Kriging models have been much improved.  A new implementation was added that replaces (and outperforms) the existing DACE Toolbox plugin.  Support has also been added for automatically selecting the Kriging correlation functions.
=== Other changes ===

Other noteworthy changes include the addition of an interpolation model type, cleanups and fixes in the error functions, improved stability in LRMMeasure, faster measures in a multi-output setting, and more informative help texts.  Additionally, the Model Browser and Profiler GUIs have seen some improvements in usability and functionality.

At the same time, the code has seen more cleanups (it is now fully classdef compliant) and the use of the Parallel Computing Toolbox (if available) has been improved.

As always, a detailed list of changes can be found in the [[Changelog]].
== 6.1.1 - 17 April 2009 ==

This is a bugfix release that contains some cleanups and fixes for the [[Known bugs]] of version 6.1.

== 6.1 - 16 February 2009 ==

The main improvements of 6.1 over 6.0.1 are in stability, robustness, speed, and interfacing.  However, a number of major new features have been added as well.
=== Multi-Objective Modeling ===

Full [[Multi-Objective Modeling|multi-objective]] support when optimizing the model parameters.  This allows an engineer to enforce multiple criteria on the models produced (instead of just a single accuracy measure).  It also allows the efficient generation of models with multiple outputs (already possible through the combineOutputs option, but not yet in a multi-objective setting).  Together with the automatic model type selection algorithm (heterogenetic), this allows the best model type to be selected automatically for each output.  See [[Multi-Objective Modeling]] for more information and usage.

=== Smoothness Measure ===

A new measure, the Linear Reference Model (LRM), has been added.  This measure is best used together with other measures and helps to enforce a smooth model surface.
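The exact LRMMeasure implementation lives in the toolbox itself; conceptually, a measure of this kind penalizes a model for deviating from a linear reference between neighbouring data points.  A hypothetical one-dimensional Python sketch of that idea (the function name and the midpoint rule here are assumptions for illustration, not the toolbox's code):

```python
def lrm_score(model, points, values):
    """Penalty for deviating from linear behaviour between sample pairs.

    model:  callable mapping a scalar input to a prediction
    points: sorted 1-D sample locations
    values: observed outputs at those locations
    Returns the mean absolute deviation of the model's midpoint
    predictions from straight-line interpolation between neighbours
    (0 means the model is locally linear between every sample pair).
    """
    deviations = []
    for (x0, y0), (x1, y1) in zip(zip(points, values), zip(points[1:], values[1:])):
        mid = (x0 + x1) / 2.0
        linear_ref = (y0 + y1) / 2.0  # linear interpolation at the midpoint
        deviations.append(abs(model(mid) - linear_ref))
    return sum(deviations) / len(deviations)
```

Used alongside an accuracy measure, a penalty like this discourages models that wiggle between the data points.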
=== Parallel Computing ===

Added experimental support for the Matlab Parallel Computing Toolbox (local scheduler only).  This means that when the parallelMode option in ContextConfig is switched on, model construction will make use of all available cores/CPUs in order to build models in parallel.  This can result in significant speedups.

=== General Modeling ===

The ''heterogenetic'' model builder for automatic model type selection has seen many cleanups, and the code has been improved.  There should no longer be any need for manual hacks in order to use it.  The rational models now support all available optimization algorithms for order selection, and two new model types have been added: Blind Kriging and Gaussian Process models.  An Efficient Global Optimization (EGO) modelbuilder has also been added: a nested Kriging model is used internally to predict which model parameters (e.g., of an SVM model) will result in the most accurate fit.  All models can now also be queried for derivatives at any point in their domain (regardless of the model type).
=== Code improvements ===

From now on, Matlab 2008a or later is required to run the toolbox (see [[System requirements]]).  The reason is that most of the modeling code has been ported to Matlab's new [[OO_Programming_in_Matlab|object orientation]] implementation.  As a result, the modeling code has become much cleaner and much less prone to bugs.  The interfaces are more well-defined, and it should be much easier to incorporate your own model type or hyperparameter optimization algorithm.

Note also that the Gradient Sample Selection algorithm has been renamed to LOLA.

=== General Improvements ===

In general, many bugs have been fixed, features and error reporting have been improved, and performance has been enhanced.  Also note that the default error function is now the [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4107991 Bayesian Error Estimation Quotient (BEEQ)].  Trivial dependencies on the Statistics Toolbox have been removed.
== 6.0.1 - Released 23 August 2008 ==

* This is a bugfix release that fixes a few problems in the 6.0 release (including a crash on startup in some cases, see [[Known bugs]]).

== 6.0 - Released 6 August 2008 ==
Originally this was supposed to be 5.1, but after many fixes and added features we decided to promote it to 6.0.  Some of the highlights of 6.0 are:

* Some important modeling-related bugs have been fixed, leading to improved model accuracy convergence
* A nice graphical user interface (GUI) for loading models, browsing through dimensions, plotting errors, generating movies, ... ([[Model Visualization GUI|see here for more information]])
* Introduction of project directories: all files belonging to a particular problem (simulation code, datasets, XML files, documentation, ...) are now grouped together in a project directory instead of being spread out over three different places
* Support for autosampling: one or more dimensions can be ignored during adaptive sampling.  This is useful if the simulation code can generate samples for that dimension itself (e.g., frequency samples in the case of a frequency-domain simulator in electromagnetics)
* Models now remember axis labels, measure scores, and output names
* An export function has been added to export models to a standalone Matlab script (.m file).  Not yet supported for all model types
* Proper support for Matlab R2008
* A simple new model type, "PolynomialModel", that builds polynomial models with a fixed (user-defined) order
* Note that in some cases loading models generated by older toolbox versions will not work and will give an error

And of course countless bugfixes, performance improvements, and feature enhancements.  '''Upgrading is strongly advised'''.
== 5.0 - Released 8 April 2008 ==

=== SUMO Toolbox ===

In April 2008, the first public release of the '''SUrrogate MOdeling (SUMO) Toolbox''' occurred.
  
 
=== Sampling related changes ===

The sample selection and evaluation backends have seen some major improvements.

The number of samples selected in each iteration need no longer be chosen a priori, but is determined on the fly based on the time needed for modeling, the average duration of the past 'n' simulations, and the number of compute nodes (or CPU cores) available.  Of course, an upper bound can still be specified by the user.  It is now also possible to evaluate data points in batches instead of always one-by-one.  This is useful if, for example, there is considerable overhead in submitting a single point.

In addition, data points can be assigned priorities by the sample selection algorithm.  These priorities are then reflected in the scheduling decisions made by the sample evaluator.  It also becomes possible to add different priority management policies.  For example, one could require that 'interest' in sample points be renewed, or else their priorities degrade over time.

A new sample selection algorithm has been added that can use any function as a criterion for where to select new samples.  This function can use all the information the surrogate provides to calculate how interesting a certain sample is.  Internally, a numeric global optimizer is applied to the criterion to determine the next sample point(s).  Several criteria are implemented, mostly for global optimization.  For instance, the 'expected improvement' criterion is very efficient for global optimization, as it balances optimization itself against refining the surrogate.
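As a concrete illustration of such a criterion (a textbook Python sketch, not the toolbox's code): for a Kriging/Gaussian-process surrogate with predicted mean mu(x) and standard deviation sigma(x), the classic expected improvement for minimization is:

```python
import math

def expected_improvement(mu, sigma, f_min):
    """Expected improvement of sampling a candidate point (minimization).

    mu:    surrogate's predicted mean at the candidate point
    sigma: surrogate's predicted standard deviation at that point
    f_min: best (lowest) objective value observed so far
    """
    if sigma <= 0.0:
        return 0.0  # no predicted uncertainty -> no expected improvement
    z = (f_min - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards promising predicted values (exploitation);
    # second term rewards model uncertainty (exploration of the surrogate).
    return (f_min - mu) * cdf + sigma * pdf
```

A global optimizer then maximizes this function over the input domain to choose the next sample point(s), which is how the criterion simultaneously optimizes and refines the surrogate.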
  
 
Finally, the handling of failed or 'lost' data points has become much more robust.  Pending points are automatically removed if their evaluation time exceeds a multiple of the average evaluation time.  Failed points can also be re-submitted a number of times before being regarded as permanently failed.
=== Modeling related changes ===

The modeling code has seen some much-needed cleanups.  Adding new model types and improving existing ones is now much more straightforward.
  
Since the default Matlab neural network implementation is quite slow, two additional implementations were added, based on [http://fann.sf.net FANN] and [http://www.iau.dtu.dk/research/control/nnsysid.html NNSYSID], which are much faster.  In addition, the NNSYSID implementation also supports pruning.  However, though these two implementations are faster, the Matlab implementation still outperforms them accuracy-wise.
  
 
An intelligent seeding strategy has been enabled.  The starting point/population of each new model parameter optimization run is now chosen intelligently in order to achieve a better search of the model parameter space.  This leads to better models, faster.
 
=== Optimization related changes ===

* The Optimization framework was removed for [[FAQ#What_about_surrogate_driven_optimization.3F|several reasons]].
* Added an [[Optimizer|optimizer]] class hierarchy for solving subproblems transparently.
* Added several criteria for optimization, available through the [[Config:SampleSelector#isc|InfillSamplingCriterion]].
  
 
=== Various changes ===

The default error function is now the root relative square error (a global relative error) instead of the absolute root mean square error.

The memory usage has been drastically reduced when performing many runs with multiple datasets (datasets are now loaded only once).
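Concretely, the root relative square error normalizes the model's squared errors by the spread of the true values around their mean, giving a scale-independent score.  A quick Python sketch for reference:

```python
def rrse(y_true, y_pred):
    """Root relative square error.

    Equals the model's RMSE divided by the RMSE of the trivial
    predictor that always answers mean(y_true).  A value below 1
    means the model beats predicting the mean; 0 is a perfect fit.
    """
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return (num / den) ** 0.5
```

Unlike the absolute root mean square error, this value does not change when the output is rescaled, which is why it works as a global relative error.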
  
 
The default settings have been harmonized and much improved.  For example, the SVM parameter space is now searched in log10 rather than loge.  The MinMax measure is now also enabled by default if you do not specify any other measure.  This means that if you specify minimum and maximum bounds in the simulator XML file, models which do not respect these bounds are penalized.
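To see why searching in log10 matters: hyperparameters such as SVM's regularization constant span several orders of magnitude, so candidates should be spaced uniformly in the exponent rather than in the raw value.  A small illustrative Python sketch (the bounds are arbitrary, not the toolbox's defaults):

```python
def log10_grid(lo_exp, hi_exp, steps):
    """Candidate values spaced uniformly in log10, from 10^lo_exp to 10^hi_exp."""
    width = (hi_exp - lo_exp) / (steps - 1)
    return [10 ** (lo_exp + i * width) for i in range(steps)]
```

For example, `log10_grid(-3, 3, 7)` covers 0.001 up to 1000 with a constant ratio of 10 between neighbours, whereas a linear grid over the same range would waste almost every point on the largest decade.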
  
Finally, this release has seen countless cleanups, bug fixes, and feature enhancements.

''Latest revision as of 09:04, 2 August 2010.''