Difference between revisions of "Add Sampling Algorithm"

From SUMOwiki

Revision as of 13:30, 28 September 2009

The toolbox comes with a number of sample selection algorithms, both for experimental design (initial samples) and sequential design. Of course you are free to add your own.

You can control how new samples are selected in a number of ways. The easiest is to implement your own sample selector class that derives from the SampleSelector base class in src/matlab/sampleSelectors. Again, only two methods are needed:

  • a constructor for reading in the configuration extracted from the XML file. See other sample selectors for the structure of this configuration.
  • a selectSamples.m file that, given the toolbox state, returns the next batch of samples.

The toolbox state is a Matlab struct with the following fields:

  • samples: the samples that were previously evaluated.
  • values: the values of the output for which new samples must be selected.
  • lastModels: the best models so far.
  • numNewSamples: the number of new samples that must be selected. This is based on environmental information such as the modeling time, the number of available computational nodes (CPU cores, grid nodes) and so on.
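
As an illustration of the second method, here is a minimal sketch of a selection routine that simply returns random points. It is not taken from the toolbox: the plain-function signature, the layout of state.samples (one sample per row) and the [-1, 1] input domain are all assumptions, and a real method on a SampleSelector subclass will most likely also receive and return the object itself. See the existing sample selectors for the actual interface.

```matlab
function newSamples = selectSamples(state)
% Hypothetical sketch: choose state.numNewSamples uniformly random points.
% Assumption: state.samples is an n-by-d matrix with one sample per row.
% Assumption: the input domain is normalized to [-1, 1] in every dimension.
    d = size(state.samples, 2);        % input dimension
    n = state.numNewSamples;           % batch size requested by the toolbox
    newSamples = 2 * rand(n, d) - 1;   % n-by-d matrix, uniform in [-1, 1]^d
end
```

This sketch ignores state.values and state.lastModels; an adaptive selector would use them to decide where new samples are most informative.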

The SUMO Toolbox contains an extensive framework for sampling, which makes it possible to combine your sampling algorithm in a number of ways with the existing algorithms. Below, a number of options are discussed. Note that these are optional, and it is also possible to write your own sample selector as mentioned above, without paying attention to the options below.


CombinedSampleSelector

The combined sample selector can be used to have multiple different sample selectors sample the same output during one run of the toolbox. This can be useful if there are multiple criteria that you want to use together to select new samples. Different weights can be given to the sample selectors, so that each sample selector gets to sample a number of samples proportional to its weight. Sample code for combining 3 selectors with different weights can be found below.
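
To make the proportional split concrete, the sketch below distributes a batch of new samples over several selectors according to their weights. This is only an illustration: allocateSamples is a hypothetical helper, and the largest-remainder rounding used here is an assumption, not necessarily what the toolbox does internally.

```matlab
function counts = allocateSamples(weights, numNewSamples)
% Hypothetical helper: split numNewSamples over selectors in proportion
% to their weights, using largest-remainder rounding (an assumption).
    shares = numNewSamples * weights / sum(weights);  % ideal fractional shares
    counts = floor(shares);                           % round every share down
    leftover = numNewSamples - sum(counts);           % samples still unassigned
    [~, order] = sort(shares - counts, 'descend');    % largest remainders first
    counts(order(1:leftover)) = counts(order(1:leftover)) + 1;
end
```

With the weights 0.5, 0.3 and 0.2 and a batch of 10 samples, this sketch assigns 5, 3 and 2 samples respectively.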

Note the <MergeCriterion> tag. This tag defines the criterion used to combine the samples selected by the different sample selectors into one set of new samples. The ClosenessThreshold merge criterion simply filters out samples that are too close to each other. This prevents two sample selectors from selecting new samples very close to each other, which would mean evaluating two (almost) identical samples. The code from this example can be copied to your own CombinedSampleSelector as is.


<!--Allows you to combine multiple sample selector algorithms-->
<SampleSelector id="combo" type="CombinedSampleSelector" combineOutputs="false">
	
	<!-- A highly adaptive sampling algorithm, error and density based -->
	<SampleSelector id="lola" type="LOLASampleSelector" combineOutputs="false" weight="0.5" />
		
	<!-- An adaptive sample selection algorithm (error based), driven by the evaluation of your model on a dense grid -->
	<SampleSelector id="error" type="ErrorSampleSelector" combineOutputs="false" weight="0.3" />
		
	<!--Each sampling iteration, new samples are selected randomly-->
	<SampleSelector id="random" type="RandomSampleSelector" combineOutputs="false" weight="0.2" />

	<!--Remove samples that are too close to each other-->
	<MergeCriterion type="ClosenessThreshold">
		
		<!-- Closeness threshold, Double -->
		<Option key="closenessThreshold" value="0.2"/>
		<!-- Set a % of the maximumSamples to be randomly chosen -->
		<Option key="randomPercentage" value="0"/>
	
		<Option key="debug" value="off" />
	</MergeCriterion>

</SampleSelector>
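
The idea behind a closeness-based merge can be sketched in a few lines of Matlab. The function below is a hypothetical illustration, not the toolbox's ClosenessThreshold implementation: it assumes Euclidean distance and a greedy pass in which a candidate is kept only if it lies at least the threshold away from every sample kept so far.

```matlab
function merged = mergeByCloseness(candidates, threshold)
% Hypothetical sketch of a closeness filter (Euclidean distance assumed).
% candidates: m-by-d matrix, one proposed sample per row.
    merged = zeros(0, size(candidates, 2));   % samples accepted so far
    for i = 1:size(candidates, 1)
        % Distance from candidate i to every accepted sample
        diffs = merged - repmat(candidates(i, :), size(merged, 1), 1);
        dists = sqrt(sum(diffs .^ 2, 2));
        % all() of an empty vector is true, so the first candidate is kept
        if all(dists >= threshold)
            merged(end + 1, :) = candidates(i, :); %#ok<AGROW>
        end
    end
end
```

Because the pass is greedy, the order of the candidates matters: samples listed first take priority over later samples that fall too close to them.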


PipelineSampleSelector

The pipeline sample selector is an extensive sampling framework used by several of the predefined sample selectors. It splits the sampling process into three separate tasks, which are executed one after another to arrive at a final set of sample locations. In this section, we will briefly discuss the three steps in the pipeline process. Please look into the existing sample selectors (such as delaunay and error) for example implementations.

CandidateGenerator

The candidate generator is responsible for generating an initial set of candidate new samples. Out of these candidates, a number of new samples will eventually be picked. Examples of candidate generators are a grid, a set of random points, etc. To implement your own candidate generator, you only need to write a function with the following declaration:

function [state, candidates] = MyCandidateGenerator(state)
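
A minimal body for such a function might look like the sketch below, which generates a set of random candidate points. It assumes that the state passed to the generator carries the samples and numNewSamples fields described earlier, that samples are stored one per row, and that the input domain is [-1, 1] in every dimension. The factor of 100 candidates per requested sample is an arbitrary illustrative choice.

```matlab
function [state, candidates] = MyCandidateGenerator(state)
% Hypothetical sketch: propose uniformly random candidate points.
% Assumption: state has the fields samples (n-by-d) and numNewSamples.
% Assumption: the input domain is normalized to [-1, 1]^d.
    d = size(state.samples, 2);                    % input dimension
    numCandidates = 100 * state.numNewSamples;     % arbitrary oversampling factor
    candidates = 2 * rand(numCandidates, d) - 1;   % one candidate per row
end
```

The state is returned unchanged here; returning it in the declaration suggests that a generator may update it, but this sketch does not rely on that.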