train (cv.SVM) - mexopencv

Trains the statistical model

status = model.train(samples, responses)
status = model.train(csvFilename, [])
[...] = model.train(..., 'OptionName', optionValue, ...)

Input

samples matrix of training samples. It should have single type. By default, each row represents a sample (see the Layout option).
responses matrix of associated responses. If the responses are scalar, they should be stored as a vector (as a single row or a single column matrix). The matrix should have type single or int32 (in the former case the responses are considered as ordered (numerical) by default; in the latter case as categorical). You can override the defaults using the VarType option.
csvFilename The input CSV file name from which to load dataset. In this variant, you should set the second argument to an empty array.

Output

status Success flag.

Options

Data Training data options, specified as a cell array of key/value pairs of the form {'key',val, ...}. See below.
Flags The optional training flags, model-dependent. Not used. default 0

Options for `Data` (first variant with samples and responses)

Layout Sample types. Default 'Row'. One of:
- Row each training sample is a row of samples.
- Col each training sample occupies a column of samples.
VarIdx vector specifying which variables to use for training. It can be an integer vector (int32) containing 0-based variable indices or logical vector (uint8 or logical) containing a mask of active variables. Not set by default, which uses all variables in the input data.
SampleIdx vector specifying which samples to use for training. It can be an integer vector (int32) containing 0-based sample indices or logical vector (uint8 or logical) containing a mask of training samples of interest. Not set by default, which uses all samples in the input data.
SampleWeights optional floating-point vector with weights for each sample. Some samples may be more important than others for training. You may want to raise the weight of certain classes to find the right balance between hit-rate and false-alarm rate, and so on. Not set by default, which effectively assigns an equal weight of 1 for all samples.
VarType optional vector of type uint8 and size <num_of_vars_in_samples> + <num_of_vars_in_responses>, containing types of each input and output variable. By default, all variables (both input and output) are considered numerical. If there is only one output variable and it is of integer type, it is considered categorical. You can also specify a cell array of strings, or one string of single characters (e.g. 'NNNC'). Possible values:
- Numerical, N same as 'Ordered'
- Ordered, O ordered variables
- Categorical, C categorical variables
MissingMask Indicator mask for missing observation (not currently implemented). Not set by default
TrainTestSplitCount divides the dataset into train/test sets, by specifying number of samples to use for the test set. By default all samples are used for the training set.
TrainTestSplitRatio divides the dataset into train/test sets, by specifying ratio of samples to use for the test set. By default all samples are used for the training set.
TrainTestSplitShuffle when splitting dataset into train/test sets, specify whether to shuffle the samples. Otherwise samples are assigned sequentially (first train then test). default true
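
As a minimal sketch of the two textual VarType forms listed above, the following builds the same type specification as a character string and as a cell array of strings. The synthetic data, the variable names, and the default-constructed cv.SVM object are assumptions made for illustration only.

```matlab
% Synthetic data, for illustration only: 40 samples, 4 features, 2 classes
rng(0);
samples = single([randn(20,4) - 1; randn(20,4) + 1]);
responses = int32([zeros(20,1); ones(20,1)]);

% Form 1: one character per variable (4 inputs followed by 1 response)
varTypeChars = 'NNNNC';

% Form 2: cell array of strings, one entry per variable
varTypeCell = [repmat({'Numerical'}, 1, 4), {'Categorical'}];

% Assumption: a default-constructed model is sufficient for this sketch
model = cv.SVM();
status = model.train(samples, responses, ...
    'Data',{'Layout','Row', 'VarType',varTypeCell});
```

Either varTypeChars or varTypeCell can be passed as the VarType value; each has one entry per input variable plus one for the response.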

Options for `Data` (second variant for loading CSV file)

HeaderLineCount The number of lines in the beginning to skip; besides the header, the function also skips empty lines and lines starting with '#'. default 1
ResponseStartIdx Index of the first output variable. If -1, the function considers the last variable as the response. If the dataset only contains input variables and no responses, use ResponseStartIdx = -2 and ResponseEndIdx = 0; the output variables vector will then just contain zeros. default -1
ResponseEndIdx Index of the last output variable + 1. If -1, then there is single response variable at ResponseStartIdx. default -1
VarTypeSpec The optional text string that specifies the variables' types. It has the format ord[n1-n2,n3,n4-n5,...]cat[n6,n7-n8,...]. That is, variables from n1 to n2 (inclusive range), n3, n4 to n5 ... are considered ordered and n6, n7 to n8 ... are considered as categorical. The range [n1..n2] + [n3] + [n4..n5] + ... + [n6] + [n7..n8] should cover all the variables. If VarTypeSpec is not specified, then algorithm uses the following rules:
- all input variables are considered ordered by default. If some column contains non-numerical values, e.g. 'apple', 'pear', 'apple', 'apple', 'mango', the corresponding variable is considered categorical.
- if there are several output variables, they are all considered as ordered. Errors are reported when non-numerical values are used.
- if there is a single output variable, then if its values are non-numerical or are all integers, then it's considered categorical. Otherwise, it's considered ordered.
Delimiter The character used to separate values in each line. default ','
Missing The character used to specify missing measurements. It should not be a digit. Although it is a non-numerical value, it does not affect the decision of whether the variable is ordered or categorical. default '?'
TrainTestSplitCount same as above.
TrainTestSplitRatio same as above.
TrainTestSplitShuffle same as above.
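
The CSV options above can be sketched as follows. The example writes a small temporary file and loads it with an explicit VarTypeSpec. The file contents, the column names, and the assumption that column indices are 0-based (as in the VarIdx option) are illustrative and not taken from this page.

```matlab
% Write a small illustrative CSV: 4 numeric features and an integer label
csvFile = [tempname '.csv'];
fid = fopen(csvFile, 'w');
fprintf(fid, 'x1,x2,x3,x4,label\n');          % one header line
rng(0);
for i = 1:20
    cls = mod(i, 2);                          % alternate between classes 0 and 1
    fprintf(fid, '%.3f,%.3f,%.3f,%.3f,%d\n', randn(1,4) + 2*cls, cls);
end
fclose(fid);

% Assumption: columns are indexed from 0, so the label is column 4
model = cv.SVM();
status = model.train(csvFile, [], ...
    'Data',{'HeaderLineCount',1, 'Delimiter',',', ...
            'ResponseStartIdx',4, 'ResponseEndIdx',5, ...
            'VarTypeSpec','ord[0-3]cat[4]'});

delete(csvFile);                              % clean up the temporary file
```

Here ResponseEndIdx is one past the response column, and the ord[...] and cat[...] ranges together cover all five variables, as the VarTypeSpec description requires.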

The method trains the SVM model. It follows the conventions of the generic train approach with the following limitations:

Input variables are all ordered.
Output variables can be either categorical (Type=C_SVC or Type=NU_SVC), or ordered (Type=EPS_SVR or Type=NU_SVR), or not required at all (Type=ONE_CLASS).
Missing measurements are not supported.
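
As a hedged sketch of the ordered-response case, the following trains a regression model on single-precision responses. The Type value comes from the list above; the P property and its value are assumptions about the cv.SVM class that this page does not document, and the data is synthetic.

```matlab
% Synthetic regression data, for illustration only: noisy sine curve
rng(0);
x = single(linspace(0, 2*pi, 60)');           % 60 samples, 1 ordered input
y = single(sin(x) + 0.1*randn(60,1));         % single type => ordered response

model = cv.SVM();
model.Type = 'EPS_SVR';                       % ordered output variable
model.P = 0.1;                                % assumed property: epsilon of the loss
status = model.train(x, y);
```

For classification (Type=C_SVC or Type=NU_SVC), the responses would instead be int32 so that they are treated as categorical.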

SVM models may be trained on a selected feature subset, and/or on a selected sample subset of the training set. To make it easier for you, the data options include the VarIdx and SampleIdx parameters. The former parameter identifies variables (features) of interest, and the latter one identifies samples of interest. Both vectors are either integer vectors (lists of 0-based indices) or logical masks of active variables/samples. You may pass empty input instead of either of the arguments, meaning that all of the variables/samples are used for training.
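
A minimal sketch of both selection styles, assuming synthetic data and a default-constructed model: VarIdx is given as a list of 0-based indices and SampleIdx as a logical mask. The equivalent variable mask is also built to show how the two forms correspond.

```matlab
% Synthetic data, for illustration only: 100 samples, 4 features, 2 classes
rng(0);
samples = single([randn(50,4) - 1; randn(50,4) + 1]);
responses = int32([zeros(50,1); ones(50,1)]);

% First and third variables, given as 0-based int32 indices
varIdx = int32([0 2]);

% The same selection written as a logical mask over the 4 variables
varMask = false(1, 4);
varMask(double(varIdx) + 1) = true;           % shift to MATLAB's 1-based indexing

% Sample mask: keep everything except the last 5 samples of each class
sampleMask = true(100, 1);
sampleMask([46:50, 96:100]) = false;

model = cv.SVM();
status = model.train(samples, responses, ...
    'Data',{'VarIdx',varIdx, 'SampleIdx',sampleMask});
```

Passing varMask instead of varIdx should select the same two variables.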

Example

For example, an Nx4 samples matrix in row layout with four numerical variables, together with an Nx1 categorical response variable, can be specified as:

model.train(samples, responses, 'Flags',0, ...
    'Data',{'Layout','Row', 'VarType','NNNNC'});

Example

You can also directly load a dataset from a CSV file:

model.train('C:\path\to\data.csv', [], 'Flags',0, ...
    'Data',{'HeaderLineCount',1, 'Delimiter',','});

Access	public
Sealed	false
Static	false
