Add Sampling Algorithm
This wiki page is written for version 6.2 of the SUMO Toolbox.
The toolbox comes with a number of sample selection algorithms, both for experimental design (initial samples) and sequential design. Of course you are free to add your own.
You can govern how new samples are selected in a number of ways. The easiest is to implement your own sample selector class that derives from the SampleSelector base class in src/matlab/sampleSelectors. Only two methods are needed:
- a constructor for reading in the configuration extracted from the XML file. See other sample selectors for the structure of this configuration.
- a selectSamples.m file that, given the toolbox state, returns the next batch of samples.
The toolbox state is a Matlab struct with the following fields:
- samples: the samples that were previously evaluated.
- values: the values of the output for which new samples must be selected.
- lastModels: the best models so far.
- numNewSamples: the number of new samples that must be selected. This is based on environmental information such as the modeling time, the number of available computational nodes (CPU cores, grid nodes) and so on.
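The contract described above (state in, next batch of samples out) can be sketched in a few lines. The following is a language-neutral illustration in Python, not toolbox code: the function name `select_samples`, the `bounds` argument and the dictionary layout are assumptions made for this example, and a real sample selector is a MATLAB class deriving from SampleSelector.

```python
import random


def select_samples(state, bounds, seed=None):
    """Return state['numNewSamples'] points drawn uniformly inside bounds.

    state  : dict mirroring the fields listed above (hypothetical layout)
    bounds : list of (low, high) tuples, one per input dimension (assumption)
    """
    rng = random.Random(seed)
    n_new = state["numNewSamples"]
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_new)]


# A toy state with the four fields described in the text.
state = {
    "samples": [[0.0, 0.0], [1.0, 1.0]],  # previously evaluated samples
    "values": [[0.0], [2.0]],             # outputs at those samples
    "lastModels": [],                     # best models so far (unused here)
    "numNewSamples": 3,                   # batch size requested by the toolbox
}

new_samples = select_samples(state, bounds=[(-1.0, 1.0), (-1.0, 1.0)], seed=0)
```

This random selector ignores samples, values and lastModels; an adaptive selector would use them to decide where the next batch should go.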
The SUMO Toolbox contains an extensive framework for sampling, which makes it possible to combine your sampling algorithm in a number of ways with the existing algorithms. Below, a number of options are discussed. Note that these are optional, and it is also possible to write your own sample selector as mentioned above, without paying attention to the options below.
The combined sample selector lets multiple sample selectors sample the same output during one run of the toolbox. This can be useful if there are multiple criteria that you want to use together to select new samples. Different weights can be given to the sample selectors, so that each sample selector gets to select a number of samples proportional to its weight. Sample code for combining three selectors with different weights can be found below.
Note the <MergeCriterion> tag. This tag defines the criterion which is used to combine the selected samples from the different sample selectors in one set of new samples. The ClosenessThreshold merge criterion simply filters out samples that are too close to each other. This avoids the problem that two sample selectors might select new samples really close to each other, thus evaluating two (almost) identical samples. The code from this example can be copied to your own CombinedSampleSelector as is.
<!-- Allows you to combine multiple sample selector algorithms -->
<SampleSelector id="combo" type="CombinedSampleSelector" combineOutputs="false">
    <!-- A highly adaptive sampling algorithm, error and density based -->
    <SampleSelector id="lola" type="LOLASampleSelector" combineOutputs="false" weight="0.5" />
    <!-- An adaptive sample selection algorithm (error based), driven by the evaluation of your model on a dense grid -->
    <SampleSelector id="error" type="ErrorSampleSelector" combineOutputs="false" weight="0.3" />
    <!-- Each sampling iteration, new samples are selected randomly -->
    <SampleSelector id="random" type="RandomSampleSelector" combineOutputs="false" weight="0.2" />
    <!-- Remove samples that are too close to each other -->
    <MergeCriterion type="ClosenessThreshold">
        <!-- Closeness threshold, Double -->
        <Option key="closenessThreshold" value="0.2"/>
        <!-- Set a % of the maximumSamples to be randomly chosen -->
        <Option key="randomPercentage" value="0"/>
        <Option key="debug" value="off" />
    </MergeCriterion>
</SampleSelector>
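To make the two ideas in this configuration concrete, here is a small Python sketch of weight-proportional allocation and closeness filtering. It is an illustration only and not the toolbox's implementation: the function names `allocate` and `filter_close`, the largest-remainder rounding and the use of Euclidean distance are all assumptions.

```python
import math


def allocate(num_new, weights):
    """Split num_new samples over selectors in proportion to their weights.

    Uses largest-remainder rounding so the counts always sum to num_new
    (the rounding rule is an assumption, not taken from the toolbox).
    """
    total = sum(weights.values())
    shares = {k: num_new * w / total for k, w in weights.items()}
    counts = {k: math.floor(s) for k, s in shares.items()}
    leftover = num_new - sum(counts.values())
    # Hand the remaining samples to the selectors with the largest fractions.
    by_fraction = sorted(shares, key=lambda k: shares[k] - counts[k], reverse=True)
    for k in by_fraction[:leftover]:
        counts[k] += 1
    return counts


def filter_close(points, threshold):
    """Keep a point only if it is at least `threshold` away from every kept point."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= threshold for q in kept):
            kept.append(p)
    return kept


# Weights and threshold mirror the XML example above.
counts = allocate(10, {"lola": 0.5, "error": 0.3, "random": 0.2})
merged = filter_close([(0.0, 0.0), (0.1, 0.0), (0.5, 0.5)], threshold=0.2)
```

With these inputs the second point lies within 0.2 of the first and is dropped, which is the situation the ClosenessThreshold criterion is meant to prevent.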
The pipeline sample selector is an extensive sampling framework used by several of the predefined sample selectors. It splits the sampling process into three separate tasks, which are executed one after another to arrive at a final set of sample locations. In this section, we will briefly discuss the three steps in the pipeline process. Please look into the existing sample selectors (such as delaunay and error) for example implementations.
The candidate generator is responsible for generating an initial set of candidate new samples. Out of these candidates, a number of new samples will eventually be picked. Examples of candidate generators are a grid, a set of random points, etc. To implement your own candidate generator, you need only make a function with the following declaration:
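The toolbox's own candidate generators are MATLAB functions; see the existing generators in the source tree for their exact declaration. As a language-neutral illustration of the two examples mentioned above (a grid and a set of random points), a Python sketch could look as follows. The function names and the `bounds` argument are assumptions made for this example.

```python
import itertools
import random


def grid_candidates(bounds, points_per_dim):
    """Return all points of a regular grid with points_per_dim points per axis."""
    if points_per_dim < 2:
        raise ValueError("points_per_dim must be at least 2")
    axes = [
        [lo + i * (hi - lo) / (points_per_dim - 1) for i in range(points_per_dim)]
        for lo, hi in bounds
    ]
    # Cartesian product of the per-axis coordinates gives the full grid.
    return [list(p) for p in itertools.product(*axes)]


def random_candidates(bounds, count, seed=None):
    """Return `count` points drawn uniformly at random inside bounds."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(count)]


grid = grid_candidates([(0.0, 1.0), (0.0, 1.0)], points_per_dim=3)
cloud = random_candidates([(0.0, 1.0), (0.0, 1.0)], count=5, seed=1)
```

Either function produces far more points than will be evaluated; the later pipeline steps decide which of them become actual samples.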
All the candidates generated by the candidate generator are ranked by one or more candidate rankers. These rankers give a score (or ranking) to each candidate. To make your own candidate ranker, you have to derive your class from the CandidateRanker base class. A minimal example candidate ranker is shown below.
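As a conceptual illustration of what a ranker computes, the Python sketch below scores each candidate by its distance to the nearest existing sample, so candidates in sparsely sampled regions rank higher. This is not the toolbox's CandidateRanker interface: the class name `MinDistanceRanker`, the method name `score` and the scoring rule are assumptions chosen for this example.

```python
import math


class MinDistanceRanker:
    """Hypothetical ranker: higher score means farther from existing samples."""

    def score(self, candidates, samples):
        """Return one score per candidate, in the same order as `candidates`."""
        return [
            min(math.dist(c, s) for s in samples)  # distance to nearest sample
            for c in candidates
        ]


samples = [[0.0, 0.0], [1.0, 1.0]]
candidates = [[0.5, 0.5], [0.1, 0.0]]
scores = MinDistanceRanker().score(candidates, samples)
```

Here the first candidate sits in the middle of the unexplored region and receives the higher score, while the second is close to an existing sample and scores low.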
Finally, the rankings provided by the different candidate rankers are used to select the final set of new samples out of the candidate samples. The merge criterion has the task of combining the different rankings and using them to select the most appropriate candidates. In addition to a set of new samples, the merge criterion also has to assign a priority value to each sample. These priorities (a high value means high priority) determine the order in which the samples are evaluated. A minimal example merge criterion is shown below.
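One simple way to combine rankings is a weighted sum, with the combined score doubling as the priority. The Python sketch below illustrates that idea only; it is not the toolbox's merge criterion interface, and the function name `merge_rankings`, the weighted-sum rule and the argument layout are assumptions.

```python
def merge_rankings(candidates, rankings, weights, num_new):
    """Pick the num_new best candidates from several rankings.

    rankings : one list of scores per ranker, each aligned with `candidates`
    weights  : one weight per ranker (weighted sum is an assumed rule)
    Returns (selected_samples, priorities); a higher priority is evaluated first.
    """
    combined = [
        sum(w * r[i] for w, r in zip(weights, rankings))
        for i in range(len(candidates))
    ]
    # Indices of the candidates, best combined score first.
    order = sorted(range(len(candidates)), key=lambda i: combined[i], reverse=True)
    chosen = order[:num_new]
    return [candidates[i] for i in chosen], [combined[i] for i in chosen]


candidates = [[0], [1], [2]]
rankings = [[1, 3, 2],   # scores from a first ranker
            [0, 0, 5]]   # scores from a second ranker
selected, priorities = merge_rankings(candidates, rankings, weights=[1, 1], num_new=2)
```

With equal weights the combined scores are 1, 3 and 7, so the third candidate is selected first and receives the highest priority, followed by the second.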