# Model types explained

This page gives additional information about the models implemented in the SUMO Toolbox. Note that only the Model classes are described here; additional options may be available in the corresponding ModelFactory.

*We are well aware that the documentation is not always complete and may even be out of date in some cases. We try to document everything as best we can, but much is limited by available time and manpower. We are a university research group after all. The most up-to-date documentation can always be found (if not here) in the default.xml configuration file and, of course, in the source files. If something is unclear, please don't hesitate to ask.*

Also see the Using a model and Add Model Type pages.

## Model (Abstract base class)

The `Model` class serves as the overall base class; it is the interface to which all models should adhere.

## RationalModel

A rational model tries to interpolate or approximate data by a rational function: either a polynomial, like <math>3 x^2 y + 5 x y + 2x + 1</math>, or a quotient of two polynomials, like <math>\dfrac{xy + 2x + 6y + 2}{xy + 1}</math>.

To decide which degrees (monomials) are present in numerator and denominator, the rational model depends on three model parameters:

- Variable weights W1 ... Wd (integer values, one for each dimension)
- Variable flags F1 ... Fd (boolean values, one for each dimension)
- An indicator for the degrees of freedom, P

To determine which degrees to use, this procedure is followed:

- Determine the number of sample points N
- Use P to determine the requested degrees of freedom: freedom = N * P / 100
- Select degrees based on the weights and flags, using the following rules:
  - Only variables for which Fi == false (0) are allowed in the denominator
  - Monomials with degrees (a1 ... ad) get precedence over monomials with degrees (b1 ... bd) if and only if a1*W1 + ... + ad*Wd < b1*W1 + ... + bd*Wd
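The procedure above can be illustrated with a minimal Python sketch. This is not the toolbox's Degrees class or its Java solver. The function names are hypothetical, the rounding of the degrees of freedom is an assumption, and so is the lexicographic tie-breaking between monomials with equal weighted degree (the rules above only cover strict inequality).

```python
import heapq


def degrees_of_freedom(num_samples, percentage):
    """freedom = N * P / 100, rounded down (the rounding is an assumption)."""
    return (num_samples * percentage) // 100


def ordered_monomials(weights, count):
    """Return the first `count` exponent tuples, lowest weighted degree first.

    Ties in the weighted degree are broken lexicographically (an assumption).
    """
    if any(w < 1 for w in weights):
        raise ValueError("weights must be positive integers")
    start = (0,) * len(weights)
    heap = [(0, start)]          # entries are (weighted degree, exponents)
    seen = {start}
    result = []
    while heap and len(result) < count:
        weighted, exponents = heapq.heappop(heap)
        result.append(exponents)
        # Each successor raises one exponent by 1, so its weighted degree grows.
        for i, w in enumerate(weights):
            nxt = exponents[:i] + (exponents[i] + 1,) + exponents[i + 1:]
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (weighted + w, nxt))
    return result


def allowed_in_denominator(exponents, flags):
    """A monomial may enter the denominator only if every flagged variable has exponent 0."""
    return all(e == 0 for e, flagged in zip(exponents, flags) if flagged)
```

With weights (1, 2), `ordered_monomials((1, 2), 6)` lists powers of the first variable earlier than powers of the second, so the variable with the larger weight is introduced more slowly.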

The calculation of these suitable degrees is delegated to the Degrees class, which uses a Java implementation (Diophantine Solver) to order the monomials using the weighted expression.

Rationale: The weighting scheme is in place because the number of model parameters had to be restricted in some way. [Geest et al.] used such a weighting scheme before to discriminate between variables. The toolbox tries to select suitable values adaptively. In this way, it can eliminate variables which have no significant impact on the output (like in the Kotanchek example).

Please note that *larger weights* correspond to *less important variables*.

Two extra options are of interest:

- Selection of base function (a function handle)
- The frequencyVariable parameter (a variable name, or the keywords auto or off)

It is sometimes advantageous to use a different set of base functions. The default set consists of the power functions: 1, x, x^2, ... The toolbox provides two alternatives: the Legendre base functions and the Chebyshev base functions. Each occurrence of x^i in the above should be read as *the value of the i'th base function in x*.
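To make "the value of the i'th base function in x" concrete, the following Python sketch evaluates the three families using their standard three-term recurrences. The function names are hypothetical, and whether the toolbox normalises its Legendre or Chebyshev functions differently is not stated here, so treat this as an illustration only.

```python
def power(i, x):
    """i'th power base function: x^i."""
    return x ** i


def chebyshev(i, x):
    """i'th Chebyshev polynomial of the first kind: T_{n+1} = 2 x T_n - T_{n-1}."""
    prev, cur = 1.0, x
    if i == 0:
        return prev
    for _ in range(1, i):
        prev, cur = cur, 2 * x * cur - prev
    return cur


def legendre(i, x):
    """i'th Legendre polynomial: (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    prev, cur = 1.0, x
    if i == 0:
        return prev
    for n in range(1, i):
        prev, cur = cur, ((2 * n + 1) * x * cur - n * prev) / (n + 1)
    return cur
```

All three families agree for i = 0 and i = 1; they differ from degree 2 onwards.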

When the frequencyVariable parameter resolves to a variable name, or when it is set to auto and there is a variable named f, freq or frequency, that variable is treated differently. First of all, it is scaled to be strictly positive. To do this, the standard interval [-1,1] is rescaled to [j,2*j], in order to represent a true complex frequency. Then, a standard RationalModel is built, using *only real coefficients*. Only use this feature when complexHandling is set to 'complex'; otherwise it is pointless.
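As a rough illustration of the rescaling, the Python sketch below maps a frequency value from [-1,1] onto the imaginary segment [j,2*j] and evaluates a univariate rational function with real coefficients at that point. The mapping is assumed to be linear and both function names are hypothetical; the toolbox's own code may differ.

```python
def to_complex_frequency(x):
    """Map x in [-1, 1] linearly onto the imaginary segment [j, 2j] (assumed linear)."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]")
    return 1j * (1.5 + 0.5 * x)


def evaluate_rational(num_coeffs, den_coeffs, s):
    """Evaluate a rational function in s; coefficients are real, in ascending powers."""
    num = sum(c * s ** k for k, c in enumerate(num_coeffs))
    den = sum(c * s ** k for k, c in enumerate(den_coeffs))
    return num / den
```

Even though every coefficient is real, the output is complex because the frequency variable itself is imaginary, which is why the feature only makes sense with complex output handling.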

## SVMModel

SVMModel is the SUMO class for Support Vector Machines (SVM). It is a wrapper class for three different SVM implementations: LibSVM [1], SVM-Light [2] and LS-SVM [3]. You can find more information about each implementation on their websites and in the SVMModel code.

SVMModel has the following properties (note that some of these properties only apply to one specific implementation):

- backend: Chooses which SVM implementation to use. The valid options are libSVM, lssvm, and the keyword for SVM-Light (see default.xml for its exact spelling). By default LibSVM is used.
- kernel: Chooses the kernel of the SVM. The following options are available: rbf (radial basis functions), lin (linear), poly (polynomial), and sig (sigmoid)
- kernelParams: Lets you specify the specific kernel parameters, the options here are dependent on the backend used
- regularizationParam: Lets you specify the trade-off parameter between the margin and the training error. The value is given on a log10 scale: the regularization parameter will be 10^regularizationParam, so a value of 2 yields a regularization parameter of 100
- epsilon: Sets the epsilon parameter of support vector regression (SVR).
- stoppingTolerance: Sets the tolerance for the stopping criterion.
- nu: Sets the nu parameter of nu-SVR and nu-SVC (only for LibSVM)
- type: Sets the type of LibSVM (nu-SVR, epsilon-SVR, etc.)
- crossvalidationFolds: Sets the number of folds for LibSVM in cross-validation mode

## EureqaModel

The EureqaModel implements the interface between the SUMO Toolbox and the Eureqa Toolbox. This interface is modified code from [4]. The EureqaModel implementation does not do any modelling itself; instead, it connects to a Eureqa server from which it retrieves the solution. Hence, the Eureqa server needs to be installed first (see the installation instructions). The options for the EureqaModel are the following:

- fitness: Chooses the fitness function of the genetic search. The fitness functions are coded, e.g. 0 -> mean absolute error. The complete list of fitness functions can be found here.
- operators: A string with the operators allowed in the genetic search. The operators can be given in arbitrary order; spaces are not necessary. The set of possible operators is limited compared to the standalone GUI version of Eureqa. Only 'constant, +, -, *, /, ^, exp, log, sin, cos, abs, tan' are supported.
- duration: The amount of time allocated to the genetic search, in seconds.
- debug: Displays intermediate information during the search.
- doStart: If this is true, SUMO will automatically start a Eureqa server. If additional options about the server are given (see below) SUMO will start the server with these options. Otherwise a Eureqa server has to be started manually.
- pathToServer: The path where the Eureqa server is located. By default SUMO will look in the directory of the EureqaFactory class.
- host: The name of the host of the Eureqa server, by default this is localhost.
- port: The port with which to connect to the Eureqa Server.
- forceCores: Forces the server to use a specified number of CPU cores.
- maxCores: Sets the maximum number of CPU cores the Eureqa server can use.
- scaleFactors: Models in SUMO normally work in Model space, i.e. the input space is scaled to [-1 1] for all dimensions. The scaleFactors property tells SUMO how this [-1 1] should be scaled instead. When calling a EureqaModel from the EureqaFactory, the scaleFactors will be set equal to those defined in the Simulator.xml of the problem.
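The kind of rescaling that scaleFactors describes can be sketched in Python as a per-dimension linear map from model space to user-defined bounds. How the toolbox actually encodes scaleFactors is not described on this page, so the lower/upper bound vectors and the function name below are assumptions made for illustration.

```python
def scale_from_model_space(x, lower, upper):
    """Map each coordinate of x from [-1, 1] to [lower_i, upper_i] linearly.

    `lower` and `upper` are hypothetical per-dimension bounds, not the
    toolbox's actual scaleFactors representation.
    """
    if not (len(x) == len(lower) == len(upper)):
        raise ValueError("x, lower and upper must have the same length")
    return [lo + (xi + 1.0) * (hi - lo) / 2.0
            for xi, lo, hi in zip(x, lower, upper)]
```

For instance, a point at the lower corner of model space, -1 in every dimension, maps to the lower bound of each dimension.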

EureqaModel returns a solution string, which can be accessed with the method model.getFormula. The method model.getAllSolutions returns the entire Pareto front of solutions as a cell array of strings.

## MOVFModel

The MOVFModel implements the Multivariate Orthonormal Vector Fitting (MOVF) technique [5]. This is a rational model that fits the data in a least-squares sense. This model is not included in the release version of the SUMO Toolbox due to licensing issues. Due to the implementation, the **frequency has to be the last parameter** of the data given to the MOVFModel.

The properties for MOVFModel are the following:

- order: A vector with the order of each input parameter. If the basis used is rational, all the orders have to be even. If the basis used is polynomial, the orders for the geometrical parameters can be any positive integer, and only the order of the frequency has to be even.
- iter: Determines the number of internal optimization iterations of the MOVFModel.
- basis: Specifies the basis used to model the geometric parameters of the MOVFModel. The basis can either be *rational* or *polynomial*.
- errorFunc: A string with the name of an error function used to optimize the MOVFModel internally. The valid options can be found in src/matlab/tools/errorFunctions
- scaleFactors: Scales the data from the default [-1 1] input of SUMO to the desired bounds. When calling the MOVFModel from MOVFFactory the bounds defined in the Simulator.xml of the problem will be used.
- batchSize: Sets the number of samples that are evaluated simultaneously.
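The constraints on the order property can be summarised in a small Python check. This sketch only encodes the rules stated above, with frequency as the last parameter; the function name is hypothetical and the toolbox may enforce further conditions.

```python
def is_valid_order(order, basis):
    """Check an order vector against the stated rules; frequency is the last entry."""
    if not order:
        raise ValueError("order must contain at least the frequency order")
    if basis == "rational":
        # Rational basis: every order has to be even.
        return all(o % 2 == 0 for o in order)
    if basis == "polynomial":
        # Polynomial basis: geometric orders are positive integers,
        # only the frequency order has to be even.
        *geometric, frequency = order
        return (all(isinstance(o, int) and o > 0 for o in geometric)
                and frequency % 2 == 0)
    raise ValueError("basis must be 'rational' or 'polynomial'")
```

For example, the order vector [1, 3, 4] is acceptable for a polynomial basis but not for a rational one.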

The coefficients of the MOVFModel can be accessed via model.getMOVF.coef.