Difference between revisions of "SED:SED toolbox"
(→DACE toolbox interface) 
(→Introduction) 

(22 intermediate revisions by 2 users not shown)  
Line 1:  Line 1:  
== Introduction == 
== Introduction == 

−  [[Image: 
+  [[Image:SED.png|110 px|right|SED Toolbox]] 
−  The SED 
+  The SED Toolbox (Sequential Experimental Design) is a powerful Matlab toolbox for the sequential ''Design of Experiments (DoE)''. 
+  In traditional experimental design, all the design points are selected up front, before performing any (computer or real-life) experiment, and afterwards, no additional design points are selected. This traditional approach is prone to oversampling and/or undersampling, because it is often very difficult to estimate up front the required number of design points. 

+  The SED Toolbox solves this problem by providing the user with state-of-the-art algorithms that generate an experimental design in a sequential way, i.e. one design point at a time, without having to provide the total number of design points in advance. This is called ''sequential experimental design (SED)''. The SED Toolbox was designed to be extremely fast and easy to use, yet very powerful. 

−  TO BE UPDATED 

−  the popular Gaussian Process based kriging surrogate models. Kriging is in particular popular for approximating (and optimizing) deterministic computer experiments. Given a dataset the toolbox automatically fits a kriging surrogate model to it. Afterwards the kriging surrogate can be fully exploited instead of the (probably more expensive) simulation code. 

+  Central to the experimental design problem is the trade-off between the ''intersite'' (maximin) and ''projected'' (non-collapsing) requirements. 

−  The toolbox is aimed for solving complex applications (expensive simulation codes, physical experiments, ...) and for researching new kriging extensions and techniques. 

+  The ''intersite distance'' is the smallest distance between two design points in the design space; this value should be as high as possible, in order to have the points spread out as evenly as possible. 

+  In addition to the intersite distance, the projected distance is also important. 

+  The ''projected distance'' is the smallest distance between any two points after they have been projected onto one of the axes of the design space. This measure is especially important if the relative importance of the design parameters is unknown. E.g., if one of the design parameters does not influence the output behavior, two design points which only differ in this (irrelevant) parameter have the same behavior, and can be seen as the "same" design point. Thus, the projected distance must also be maximized. 

+  
+  All the algorithms in the SED Toolbox were optimized to produce designs that score well on both the intersite and projected distance. 

== Download == 
== Download == 

Line 16:  Line 21:  
== Quick start guide == 
== Quick start guide == 

+  
−  '''IMPORTANT''': Before the toolbox can be used you have to include the toolbox in Matlab's search path. You can do this manually by running startup, or, if Matlab is started in the root toolbox directory, then startup will be run automatically. 

+  '''IMPORTANT''': Before the toolbox can be used, you have to set it up for use, by browsing to the directory in which the toolbox was unpacked and running the startup command: 

<source lang="matlab"> 
<source lang="matlab"> 

Line 22:  Line 28:  
</source> 
</source> 

+  Now the toolbox is ready to be used. The SED Toolbox can be used in several ways, based on how much freedom you want in configuring and fine-tuning the parameters of the algorithms. We will now describe the three ways the toolbox can be used, in order of complexity, based on your requirements. If you prefer to learn by example, you can check out the examples directory in the distribution, which contains several applications and example problems for the toolbox. 

−  Now the toolbox is ready to be used. The blindDACE toolbox is designed in an object oriented (OO) fashion. 

−  It is strongly recommended to exploit the OO design directly, i.e., use the Kriging and Optimizer matlab classes. 

−  However, for convenience wrapper scripts (dacefit, predictor) are provided that emulate the DACE toolbox interface (see [[#DACE toolbox interface|wrapper scripts]] for more information). 

−  +  === You want an ND design of X points === 

+  In order to quickly generate a good ND design in X points, you can use the following code: 

−  <b>samples</b> holds the input parameters n-by-d array (each row is one observation) and <b>values</b> is the corresponding n-by-1 array containing the output values. 

−  <b>lb</b> and <b>ub</b> are 1-by-d arrays defining the lower bounds and upper bounds, respectively, needed to optimize the hyperparameters (<b>theta</b>). In addition, a set of starting values for <b>theta</b> has to be specified (i.e., <b>theta0</b> is also a 1-by-d array) 

+  <source lang="matlab"> 

−  As of version 0.2 of the blindDACE toolbox a script is provided, blinddacefit, that just takes your dataset (a <b>samples</b> and <b>values</b> matrix) and returns a fitted kriging object, all other parameters (<b>theta0</b>, etc.) are set to some sensible defaults. 

+  startup % configure the toolbox 

+  config.inputs.nInputs = N; % set the number of inputs in the config struct 

+  generator = SequentialDesign(config); % set up the sequential design 

+  generator = generator.generateTotalPoints(X); % generate a total of X points 

+  points = generator.getAllPoints(); % return the entire design 

+  
+  % optional: 

+  generator.plot(); % plot the design 

+  generator.getMetrics(); % get some metrics about the quality of the design 

+  </source> 

+  
+  === You want to use the more advanced features of the SED Toolbox === 

+  
+  If you want to use some of the more advanced features of the SED Toolbox, such as input ranges, weights and constraints, you have two options. The first one is to use Matlab structs as in the previous example. The second one is to use simple XML files to configure the toolbox. Note that constraints will only work with XML configuration. You can open the 'problem.xml' file in the SED directory to get an idea of what a problem configuration looks like. You can edit this file to suit your needs and use it to configure the toolbox using the following command: 

−  For more flexibility the full example code to fit the dataset is: 

<source lang="matlab"> 
<source lang="matlab"> 

+  % generate a sequential design for the problem defined in problem.xml: 

−  ... 

+  generator = SequentialDesign('problem.xml'); 

−  % Generate kriging options structure 

−  opts = getDefaultOptions(); 

−  opts.hpBounds = [lb ; ub]; % hyperparameter optimization bounds 

+  % generate a sequential design using the specified method for the problem defined in problem.xml: 

−  % configure the optimization algorithm (only one optimizer is included) 

+  generator = SequentialDesign('problem.xml', 'methods/mcintersiteprojectedthreshold.xml'); 

−  % the Matlab Optimization toolbox is REQUIRED 

+  </source> 

−  optimopts.GradObj = 'on'; 

−  optimopts.DerivativeCheck = 'off'; 

−  optimopts.Diagnostics = 'off'; 

−  optimopts.Algorithm = 'active-set'; 

−  opts.hpOptimizer = MatlabOptimizer( dim, 1, optimopts ); 

+  If you instead prefer to use Matlab structs, you can use the following code to configure the toolbox: 

−  % build and fit Kriging object 

−  k = Kriging( opts, theta0, 'regpoly0', @corrgauss ); 

−  k = k.fit( samples, values ); 

+  <source lang="matlab"> 

−  % k represents the approximation and can now be used, e.g., 

+  config.inputs.nInputs = 2; % this is a 2D example 

−  [y mse] = k.predict( [1 2] ) 

+  config.inputs.minima = [-1 -1]; % define the minimum of each input 

−  ... 

+  config.inputs.maxima = [3 1]; % define the maximum of each input 

+  config.inputs.weights = [2 0]; % the first input is twice as important as the second one 

+  generator = SequentialDesign(config); % set up the sequential design 

</source> 
</source> 

+  
−  See the included demo.m script for more example code on how to use the blindDACE toolbox (including more advanced features such as using blind kriging or how to use regression instead of interpolation). For more information on the classes and their methods please refer to the source files. 

+  === You want full control over all the method parameters === 

+  
+  If you want full control over all the parameters of both the problem specification and the sequential design method, XML files are the only option. By editing the method XML files, you can tweak each method to your own preferences. Even though the options are documented, it might be difficult to understand their effect on the sampling process. Note that the default settings have been chosen based on extensive studies and comparisons, and are in most cases the best choice. If you have any questions or suggestions, please contact the authors at '''Karel dot Crombecq at ua.ac.be'''. 

+  
+  In addition to the methods provided by the XML files packaged with the SED Toolbox, SED also contains a large library of components (such as candidate generators, optimizers, metrics) from which users can compose their own sequential design methods. This feature is undocumented and unsupported, but users are free to experiment with these components. 

== SED toolbox interface == 
== SED toolbox interface == 

+  A reference of all the functions available in the SED Toolbox can be found on [[SED:SED_reference|this page]]. 

−  The SED toolbox provides 

+  == Rules of thumb for selecting the right sequential design method == 

−  TO BE UPDATED 

+  The default sequential design method for the SED Toolbox is ''mcintersiteprojectedthreshold.xml''. This is an intelligent Monte Carlo method which generates Monte Carlo points only in parts of the design space where the projected distance is above a certain threshold. From the remaining points, the best point in terms of intersite distance is picked as the next design point. 

−  two scripts dacefit.m and predictor.m that emulate the behavior of the DACE toolbox ([http://www2.imm.dtu.dk/~hbn/dace/]). Note, that full compatibility between blindDACE and the DACE toolbox is not provided. The scripts merely aim to ease the transition from the DACE toolbox to the blindDACE toolbox. 

+  
+  The default method is very fast and can be applied to high-dimensional problems and large designs. It also works well with constraints and input weights. However, in some cases one of the other methods might be a better choice. The subsections below give rules of thumb for picking the right method for the job. 

+  
+  === Constraints === 

+  
+  The default method '''mcintersiteprojectedthreshold''' can run into problems when you are using very strict constraints. Because the Monte Carlo points are filtered by the projected distance threshold, it is possible that no candidates remain that satisfy the constraints. In that case, '''mcintersiteprojected''' can be a good alternative. It produces slightly worse designs but is much more robust in terms of constraints. Additionally, '''mcintersiteprojectedthreshold''' and all other methods besides '''mcintersiteprojected''' need the corner points [-1,...,-1] and [1,...,1] to start, and these points are selected even if they violate the constraints. You can later request the design without these corner points using the getAllPointsWithoutInitialDesign() function, so this might not be an issue, but keep it in mind. 

+  
+  === Quality vs speed === 

+  
+  The slowest method available in SED is '''optimizerintersite''', but this method also generates the best designs (slightly better than '''mcintersiteprojectedthreshold'''). If you have the time, consider using this method instead. It supports constraints as well, but it too can run into problems with very tight constraints. 

+  
+  If time is of no concern, you can also consider increasing some of the method parameters to further improve the design. For '''mcintersiteprojectedthreshold''', the ''candidatesPerSample'' option can be increased to improve the quality at the cost of speed. For '''optimizerintersite''', both the ''nPop'' and ''maxIterations'' options can be increased. 

+  
+  === Dimensionality === 

+  
+  The Monte Carlo methods scale very well with the number of dimensions and points and should work for high-dimensional problems. However, the optimizer methods suffer more from the curse of dimensionality. '''optimizerintersite''' should work up to 10D, but will run into memory problems for higher dimensions. 

−  Example code: 

−  <source lang="matlab"> 

−  krige = dacefit(samples, values, 'regpoly0', 'corrgauss', theta0, lb, ub ) 

−  y = predictor([1 2], krige) 

−  </source> 

−  Obviously, a lot less code is used to copy the setup described above. However, less code means less flexibility (e.g., blind kriging and regression kriging are not available using the wrapper scripts). Hence, it is suggested to learn the object oriented interface of SED and use it instead. 

== Contribute == 
== Contribute == 
Latest revision as of 18:54, 24 February 2011
Introduction
The SED Toolbox (Sequential Experimental Design) is a powerful Matlab toolbox for the sequential Design of Experiments (DoE). In traditional experimental design, all the design points are selected up front, before performing any (computer or real-life) experiment, and afterwards, no additional design points are selected. This traditional approach is prone to oversampling and/or undersampling, because it is often very difficult to estimate up front the required number of design points.
The SED Toolbox solves this problem by providing the user with state-of-the-art algorithms that generate an experimental design in a sequential way, i.e. one design point at a time, without having to provide the total number of design points in advance. This is called sequential experimental design (SED). The SED Toolbox was designed to be extremely fast and easy to use, yet very powerful.
Central to the experimental design problem is the trade-off between the intersite (maximin) and projected (non-collapsing) requirements.
The intersite distance is the smallest distance between two design points in the design space; this value should be as high as possible, in order to have the points spread out as evenly as possible.
In addition to the intersite distance, the projected distance is also important.
The projected distance is the smallest distance between any two points after they have been projected onto one of the axes of the design space. This measure is especially important if the relative importance of the design parameters is unknown. E.g., if one of the design parameters does not influence the output behavior, two design points which only differ in this (irrelevant) parameter have the same behavior, and can be seen as the "same" design point. Thus, the projected distance must also be maximized.
All the algorithms in the SED Toolbox were optimized to produce designs that score well on both the intersite and projected distance.
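The two distances defined above can be computed for any small design with a few lines of plain Matlab. The following is a minimal sketch only: it does not use the SED Toolbox, and the variable names (P, intersite, projected) as well as the example points are assumptions chosen for illustration.

```matlab
% Illustrative script, not part of the SED Toolbox API.
% P is an n-by-d matrix with one design point per row (hypothetical data).
P = [-1 -1; 1 1; -1 1; 0.2 -0.4];

n = size(P, 1);
intersite = inf;   % smallest Euclidean distance between any two points
projected = inf;   % smallest single-axis distance between any two points
for i = 1:n-1
    for j = i+1:n
        delta = abs(P(i, :) - P(j, :));          % per-axis differences
        intersite = min(intersite, norm(delta)); % distance in the full space
        projected = min(projected, min(delta));  % closest projection on any axis
    end
end
fprintf('intersite = %.4f, projected = %.4f\n', intersite, projected);
```

In this example the points [-1 -1] and [-1 1] share their first coordinate, so the projected distance is 0 even though the points are far apart in the full design space. This is exactly the collapsing behavior that the projected distance is meant to penalize.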
Download
See: download page
Quick start guide
IMPORTANT: Before the toolbox can be used, you have to set it up for use, by browsing to the directory in which the toolbox was unpacked and running the startup command:
startup
Now the toolbox is ready to be used. The SED Toolbox can be used in several ways, based on how much freedom you want in configuring and fine-tuning the parameters of the algorithms. We will now describe the three ways the toolbox can be used, in order of complexity, based on your requirements. If you prefer to learn by example, you can check out the examples directory in the distribution, which contains several applications and example problems for the toolbox.
You want an ND design of X points
In order to quickly generate a good ND design in X points, you can use the following code:
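startup % configure the toolbox
config.inputs.nInputs = N; % set the number of inputs in the config struct
generator = SequentialDesign(config); % set up the sequential design
generator = generator.generateTotalPoints(X); % generate a total of X points
points = generator.getAllPoints(); % return the entire design
% optional:
generator.plot(); % plot the design
generator.getMetrics(); % get some metrics about the quality of the design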
You want to use the more advanced features of the SED Toolbox
If you want to use some of the more advanced features of the SED Toolbox, such as input ranges, weights and constraints, you have two options. The first one is to use Matlab structs as in the previous example. The second one is to use simple XML files to configure the toolbox. Note that constraints will only work with XML configuration. You can open the 'problem.xml' file in the SED directory to get an idea of what a problem configuration looks like. You can edit this file to suit your needs and use it to configure the toolbox using the following command:
% generate a sequential design for the problem defined in problem.xml:
generator = SequentialDesign('problem.xml');
% generate a sequential design using the specified method for the problem defined in problem.xml:
generator = SequentialDesign('problem.xml', 'methods/mcintersiteprojectedthreshold.xml');
If you instead prefer to use Matlab structs, you can use the following code to configure the toolbox:
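config.inputs.nInputs = 2; % this is a 2D example
config.inputs.minima = [-1 -1]; % define the minimum of each input
config.inputs.maxima = [3 1]; % define the maximum of each input
config.inputs.weights = [2 0]; % the first input is twice as important as the second one
generator = SequentialDesign(config); % set up the sequential design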
You want full control over all the method parameters
If you want full control over all the parameters of both the problem specification and the sequential design method, XML files are the only option. By editing the method XML files, you can tweak each method to your own preferences. Even though the options are documented, it might be difficult to understand their effect on the sampling process. Note that the default settings have been chosen based on extensive studies and comparisons, and are in most cases the best choice. If you have any questions or suggestions, please contact the authors at Karel dot Crombecq at ua.ac.be.
In addition to the methods provided by the XML files packaged with the SED Toolbox, SED also contains a large library of components (such as candidate generators, optimizers, metrics) from which users can compose their own sequential design methods. This feature is undocumented and unsupported, but users are free to experiment with these components.
SED toolbox interface
A reference of all the functions available in the SED Toolbox can be found on this page.
Rules of thumb for selecting the right sequential design method
The default sequential design method for the SED Toolbox is mcintersiteprojectedthreshold.xml. This is an intelligent Monte Carlo method which generates Monte Carlo points only in parts of the design space where the projected distance is above a certain threshold. From the remaining points, the best point in terms of intersite distance is picked as the next design point.
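The idea behind the default method can be sketched in plain Matlab. This is a simplified illustration under stated assumptions, not the toolbox implementation: it draws candidates uniformly in [-1, 1]^d and then rejects those below the threshold (instead of sampling only the allowed regions), and the names nCandidates and threshold, the threshold value, and the starting design are all hypothetical.

```matlab
% Hypothetical sketch of one threshold-filtered Monte Carlo step.
% None of the names below are SED Toolbox options.
rng(0);                        % fixed seed so the run is repeatable
P = [-1 -1; 1 1];              % current design: the two corner points
nCandidates = 1000;            % number of random candidates to draw
threshold = 0.25;              % assumed minimum allowed projected distance

d = size(P, 2);
C = 2 * rand(nCandidates, d) - 1;   % uniform candidates in [-1, 1]^d

projDist = zeros(nCandidates, 1);   % projected distance to the design
siteDist = zeros(nCandidates, 1);   % intersite distance to the design
for k = 1:nCandidates
    delta = abs(P - repmat(C(k, :), size(P, 1), 1));
    projDist(k) = min(delta(:));
    siteDist(k) = min(sqrt(sum(delta .^ 2, 2)));
end

keep = find(projDist >= threshold);  % drop candidates that nearly collapse
assert(~isempty(keep), 'no candidate passed the threshold');
[~, best] = max(siteDist(keep));     % best remaining intersite distance
newPoint = C(keep(best), :);
P = [P; newPoint];                   % the design grows by one point
```

Repeating the step adds one point at a time, which is what makes the design sequential. The assert marks the failure mode in which no candidate survives the filter.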
The default method is very fast and can be applied to high-dimensional problems and large designs. It also works well with constraints and input weights. However, in some cases one of the other methods might be a better choice. The subsections below give rules of thumb for picking the right method for the job.
Constraints
The default method mcintersiteprojectedthreshold can run into problems when you are using very strict constraints. Because the Monte Carlo points are filtered by the projected distance threshold, it is possible that no candidates remain that satisfy the constraints. In that case, mcintersiteprojected can be a good alternative. It produces slightly worse designs but is much more robust in terms of constraints. Additionally, mcintersiteprojectedthreshold and all other methods besides mcintersiteprojected need the corner points [-1,...,-1] and [1,...,1] to start, and these points are selected even if they violate the constraints. You can later request the design without these corner points using the getAllPointsWithoutInitialDesign() function, so this might not be an issue, but keep it in mind.
Quality vs speed
The slowest method available in SED is optimizerintersite, but this method also generates the best designs (slightly better than mcintersiteprojectedthreshold). If you have the time, consider using this method instead. It supports constraints as well, but it too can run into problems with very tight constraints.
If time is of no concern, you can also consider increasing some of the method parameters to further improve the design. For mcintersiteprojectedthreshold, the candidatesPerSample option can be increased to improve the quality at the cost of speed. For optimizerintersite, both the nPop and maxIterations options can be increased.
Dimensionality
The Monte Carlo methods scale very well with the number of dimensions and points and should work for high-dimensional problems. However, the optimizer methods suffer more from the curse of dimensionality. optimizerintersite should work up to 10D, but will run into memory problems for higher dimensions.
Contribute
Suggestions on how to improve the SED toolbox are always welcome. For more information please see the feedback page.