Classification demo

This demo applies several machine learning algorithms to a simple binary classification problem. It trains each classifier on the same data samples and compares their training time, prediction time, and accuracy.

Prepare data: two classes, each drawn from a 5-dimensional normal distribution (means +0.5 and -0.5)

X = double([randn(1000,5)+.5; randn(1000,5)-.5]); % features
Y =  int32([    ones(1000,1);    -ones(1000,1)]); % labels
test_idx = mod(1:numel(Y),3)==0;                  % train/test split (every 3rd sample is held out for testing)
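
As a quick sanity check on the split above, the following sketch rebuilds only the labels and the test mask and counts how many samples land in each set. It uses base MATLAB only; the variable names n_train, n_test, n_test_pos, and n_test_neg are introduced here for illustration and are not part of the demo.

```matlab
% Rebuild the labels and the test mask exactly as in the demo
Y = int32([ones(1000,1); -ones(1000,1)]);
test_idx = mod(1:numel(Y),3)==0;

% Count samples in each partition (names below are illustrative)
n_test  = nnz(test_idx);             % every 3rd sample
n_train = nnz(~test_idx);            % the remaining samples

% Check that both classes are equally represented in the test set
n_test_pos = nnz(Y(test_idx) == 1);
n_test_neg = nnz(Y(test_idx) == -1);

fprintf('train: %d, test: %d (pos: %d, neg: %d)\n', ...
    n_train, n_test, n_test_pos, n_test_neg);
```

With 2000 samples, this gives 1334 training samples and 666 test samples, the latter split evenly between the two classes, so accuracy on the test set is not skewed by class imbalance.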

Try a range of classifiers (using their default options)

models = { ...
    cv.ANN_MLP(), ...
    cv.NormalBayesClassifier(), ...
    cv.KNearest(), ...
    cv.SVM(), ...
    cv.SVMSGD(), ...
    cv.DTrees(), ...
    cv.Boost(), ...
    cv.RTrees(), ...
    ... cv.ERTrees(), ...
    ... cv.GBTrees(), ...
    cv.LogisticRegression() ...
};

For each classifier: train, predict, and evaluate

for i = 1:numel(models)
    try

        classifier = models{i};
        fprintf('=== %s ===\n', class(classifier));

        Ytrain = Y(~test_idx,:);
        if isa(classifier, 'cv.ANN_MLP')
            % ANN_MLP must be initialized properly with non-default values
            classifier.LayerSizes = [size(X,2), 2];
            classifier.setActivationFunction('Sigmoid', ...
                'Param1',1, 'Param2',1);

            % Unroll labels to an indicator representation
            Ytrain = double([Ytrain==1, Ytrain==-1]);
        end

        % train
        tic;
        classifier.train(X(~test_idx,:), Ytrain);
        fprintf('Training time %f seconds\n', toc);

        % predict
        tic;
        Yhat = classifier.predict(X(test_idx,:));
        fprintf('Prediction time %f seconds\n', toc);

        if isa(classifier, 'cv.ANN_MLP')
            % Get it back to a categorical vector
            Yhat = (Yhat(:,1) > Yhat(:,2))*2 - 1;
        end

        % evaluate
        Yhat = int32(Yhat);
        accuracy = nnz(Yhat == Y(test_idx)) / nnz(test_idx);
        fprintf('Accuracy: %.2f%%\n', accuracy*100);

    catch ME
        % report the failure and continue with the next classifier
        %disp(ME.getReport())
        fprintf('error: %s\n', ME.message);
    end
end
=== cv.ANN_MLP ===
Training time 0.007717 seconds
Prediction time 0.003091 seconds
Accuracy: 86.94%
=== cv.NormalBayesClassifier ===
Training time 0.006115 seconds
Prediction time 0.002396 seconds
Accuracy: 86.49%
=== cv.KNearest ===
Training time 0.004083 seconds
Prediction time 0.004525 seconds
Accuracy: 83.63%
=== cv.SVM ===
Training time 0.022067 seconds
Prediction time 0.007866 seconds
Accuracy: 83.03%
=== cv.SVMSGD ===
Training time 0.336153 seconds
Prediction time 0.004351 seconds
Accuracy: 86.19%
=== cv.DTrees ===
Training time 0.008569 seconds
Prediction time 0.003721 seconds
Accuracy: 81.23%
=== cv.Boost ===
Training time 0.064558 seconds
Prediction time 0.004667 seconds
Accuracy: 82.88%
=== cv.RTrees ===
Training time 0.084264 seconds
Prediction time 0.005877 seconds
Accuracy: 85.14%
=== cv.LogisticRegression ===
Training time 0.395183 seconds
Prediction time 0.011265 seconds
Accuracy: 86.64%
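
The ANN_MLP branch of the loop converts the +1/-1 labels to a two-column indicator matrix before training and maps the two network outputs back to +1/-1 after prediction. The sketch below illustrates that round trip in isolation, using base MATLAB only. The values in scores are made up for illustration and stand in for network outputs; they are not produced by cv.ANN_MLP.

```matlab
% Four example labels in the demo's +1/-1 convention
labels = int32([1; -1; -1; 1]);

% Encode: column 1 marks class +1, column 2 marks class -1
onehot = double([labels==1, labels==-1]);

% Hypothetical per-class scores (illustrative stand-in for network outputs)
scores = [0.9 0.1; 0.2 0.7; 0.4 0.6; 0.8 0.3];

% Decode: choose +1 where the first score is larger, -1 otherwise
decoded = int32((scores(:,1) > scores(:,2))*2 - 1);

disp(onehot);
disp(decoded');
```

Note that the decoding step assigns -1 when the two scores are exactly equal, since the comparison is strict.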