Facemark AAM training demo

The user should provide the list of training images accompanied by their corresponding landmark locations in separate files.

See below for a description of file formats.

Examples of datasets are available at https://ibug.doc.ic.ac.uk/resources/facial-point-annotations/.

Preparation

Before you continue with this tutorial, you should download a training dataset for facial landmark detection.

We suggest downloading the LFPW dataset, which can be retrieved at https://ibug.doc.ic.ac.uk/download/annotations/lfpw.zip.
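For example, you can fetch and extract it from within MATLAB with something like the following sketch (the destination folder /data/lfpw is just an example; the folder layout inside the archive may differ from the paths used below):

websave('lfpw.zip', 'https://ibug.doc.ic.ac.uk/download/annotations/lfpw.zip');
unzip('lfpw.zip', '/data/lfpw');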

The first step is to make two text files containing the list of image files and the list of annotation files, respectively. Make sure that the order of images and annotations in both files match. Furthermore, it is advised to use absolute paths instead of relative paths.

Example of creating the file lists on a Linux machine:

ls /data/lfpw/trainset/*.png > images_train.txt
ls /data/lfpw/trainset/*.pts > annotations_train.txt

Optionally, you can also create similar file lists for the test set.
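For example, assuming the test set was extracted to /data/lfpw/testset:

ls /data/lfpw/testset/*.png > images_test.txt
ls /data/lfpw/testset/*.pts > annotations_test.txt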

Example of content in the images_train.txt file:

/data/lfpw/trainset/image_0001.png
/data/lfpw/trainset/image_0002.png
/data/lfpw/trainset/image_0003.png
...

Example of content in the annotations_train.txt file:

/data/lfpw/trainset/image_0001.pts
/data/lfpw/trainset/image_0002.pts
/data/lfpw/trainset/image_0003.pts
...

where each .pts file contains the positions of the face landmarks. Make sure that the annotation format is supported by the API; the contents should look like the following snippet:

version: 1
n_points:  68
{
212.716603 499.771793
230.232816 566.290071
...
}
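If you want to inspect the annotations yourself, a minimal sketch of a .pts reader could look like the following (the demo itself relies on cv.Facemark.loadFacePoints later on, so this is for illustration only; the hard-coded path is just an example):

% minimal .pts reader sketch
fid = fopen('/data/lfpw/trainset/image_0001.pts', 'rt');
fgetl(fid);                               % skip the "version: 1" line
n = sscanf(fgetl(fid), 'n_points: %d');   % number of landmarks (68)
fgetl(fid);                               % skip the opening "{"
pts = fscanf(fid, '%f %f', [2 n])';       % n-by-2 matrix of [x y] coordinates
fclose(fid);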

Once the model is trained, we show how to use it to detect face landmarks in a test image.

In this tutorial, a pre-trained model is not provided due to its large file size (~500MB). By following this tutorial, you will be able to train and obtain your own model within a few minutes.
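Note that, once a model has been saved, you can later reload it instead of retraining. A minimal sketch, assuming the cv.Facemark wrapper exposes loadModel like the underlying OpenCV Facemark class (modelFile is defined in the Options section below):

obj2 = cv.Facemark('AAM');
obj2.loadModel(modelFile);   % reload the previously saved AAM model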

Options

% [INPUT] path of a text file containing the list of paths to all training images
imgList = fullfile(mexopencv.root(),'test','facemark','lfpw','images.lst');
assert(exist(imgList, 'file') == 2, 'missing images list file');

% [INPUT] path of a text file containing the list of paths to all annotation files
ptsList = fullfile(mexopencv.root(),'test','facemark','lfpw','annotations.lst');
assert(exist(ptsList, 'file') == 2, 'missing annotations list file');

% [OUTPUT] path for saving the trained model
modelFile = fullfile(tempdir(), 'model_aam.yaml');

% [INPUT] path to the cascade xml file for the face detector
xmlFace = fullfile(mexopencv.root(),'test','haarcascade_frontalface_alt.xml');
download_classifier_xml(xmlFace);

% [INPUT] path to the cascade xml file for the eyes detector
xmlEyes = fullfile(mexopencv.root(),'test','haarcascade_eye_tree_eyeglasses.xml');
download_classifier_xml(xmlEyes);

% path to test image
testImg = fullfile(mexopencv.root(),'test','lena.jpg');

Init

create the facemark instance

scales = [2.0, 4.0];
obj = cv.Facemark('AAM', 'Scales',scales, ...
    'ModelFilename',modelFile, 'SaveModel',true, 'Verbose',true);

In this case, we modify the default list of scaling factors. By default, the scaling factor used is 1.0 (no scaling). Here we add two more scaling factors, which makes the instance train two more models at scales 2 and 4 (2 times and 4 times smaller, with faster fitting time). However, you should make sure that the scaling factors are not too big, since the image would then be scaled down to a very small size and lose the information needed for landmark detection.

Data

load the dataset, and add training samples one-by-one

disp('Loading data...')
[imgFiles, ptsFiles] = cv.Facemark.loadDatasetList(imgList, ptsList);
for i=1:numel(imgFiles)
    % load image and its corresponding annotation data, then add pair
    img = cv.imread(imgFiles{i});
    pts = cv.Facemark.loadFacePoints(ptsFiles{i});
    obj.addTrainingSample(img, pts);
end
Loading data...

Train

train the algorithm; the model will be saved to the specified file

disp('Training...')
tic
obj.training();
toc
Training...
Elapsed time is 6.612023 seconds.

Prepare for Test

Since the AAM algorithm needs initialization parameters (rotation, translation, and scaling), we need to declare the required variables to store this information, which will be obtained using a custom function. The implementation of the getInitialFitting function in this example is not optimal; you can always create your own function.

The initialization is obtained by comparing the base shape of the trained model with the current face image. In this case, the rotation is obtained by comparing the angle of the line formed by the two eyes in the input face image with the angle of the same line in the base shape. Meanwhile, the scaling is obtained by comparing the length of the line between the eyes in the input image with its length in the base shape.
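Concretely, with c1 and c2 the detected eye centers and c1Base and c2Base the corresponding eye centers of the base shape (the same variable names are used in the getInitialFitting helper at the end of this demo), the core of this computation boils down to:

scale = norm(c2 - c1) / norm(c2Base - c1Base);            % ratio of eye-line lengths
aBase = atan2(c2Base(2)-c1Base(2), c2Base(1)-c1Base(1));  % eye-line angle in base shape
a     = atan2(c2(2)-c1(2), c2(1)-c1(1));                  % eye-line angle in input face
R = cv.getRotationMatrix2D([0 0], rad2deg(aBase - a), 1.0);
R = R(1:2,1:2);                                           % keep the 2x2 rotation part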

The fitting process starts by detecting faces in the given image.

If at least one face is found, the next step is computing the initialization parameters. In this case, since the getInitialFitting function is not optimal, it may not find a pair of eyes in a given face. Therefore, we filter out faces without initialization parameters; each element in the confs vector then represents the initialization parameters of one of the remaining faces.

create cascade detector objects (for face and eyes)

ccFace = cv.CascadeClassifier(xmlFace);
ccEyes = cv.CascadeClassifier(xmlEyes);

detect faces

img = cv.imread(testImg);
faces = myFaceDetector(img, ccFace);
assert(~isempty(faces), 'no faces found');
fprintf('%d faces\n', numel(faces));
1 faces

get base shape from trained model

s0 = obj.getData();
s0 = cat(1, s0{:});

compute initialization params for each detected face

S = struct('R',eye(2), 't',[0 0], 'scale',1);
confs = S([]);
faces_eyes = {};
for i=1:numel(faces)
    [conf, found] = getInitialFitting(img, faces{i}, s0, ccEyes);
    if found
        confs(end+1) = conf;
        faces_eyes{end+1} = faces{i};
    end
end
assert(~isempty(confs), 'failed to compute initialization params');
fprintf('%d faces with eyes\n', numel(confs));
1 faces with eyes

For the fitting parameters stored in the confs vector, the scaleIdx field represents the 0-based index of the scaling factor that will be used in the fitting process. In this example the fitting uses the biggest scaling factor (4), which is expected to have the fastest computation time compared to the other scales. If the index is bigger than the number of trained scales in the model, the model with the biggest scale is used.

[confs.scaleIdx] = deal(numel(scales) - 1);  % set the scale index for all faces

Test

The fitting process is quite simple: you just need to pass the image, an array of rectangles representing the ROIs of all faces in the given image, and the configuration params. It returns the landmark points.

tic
landmarks = obj.fit(img, faces_eyes, 'Configs',confs);
toc
Elapsed time is 0.481008 seconds.

After the fitting process is finished, we can visualize the result

for i=1:numel(landmarks)
    img = cv.Facemark.drawFacemarks(img, landmarks{i}, 'Color',[0 255 0]);
end
imshow(img)

Helper functions

function download_classifier_xml(fname)
    if exist(fname, 'file') ~= 2
        % attempt to download trained Haar/LBP/HOG classifier from Github
        url = 'https://raw.githubusercontent.com/opencv/opencv/3.4.0/data/';
        [~, f, ext] = fileparts(fname);
        if strncmpi(f, 'haarcascade_', length('haarcascade_'))
            url = [url, 'haarcascades/'];
        elseif strncmpi(f, 'lbpcascade_', length('lbpcascade_'))
            url = [url, 'lbpcascades/'];
        elseif strncmpi(f, 'hogcascade_', length('hogcascade_'))
            url = [url, 'hogcascades/'];
        else
            error('File not found');
        end
        urlwrite([url f ext], fname);
    end
end

function faces = myFaceDetector(img, ccFace)
    %MYFACEDETECTOR  Detect faces
    %
    %    faces = myFaceDetector(img, ccFace)
    %
    % ## Input
    % * __img__ input image
    % * __ccFace__ cascade object for face detection
    %
    % ## Output
    % * __faces__ detected faces, `{[x,y,w,h], ...}`
    %
    % See also: cv.Facemark.getFacesHAAR
    %

    if size(img,3) > 1
        gray = cv.cvtColor(img, 'RGB2GRAY');
    else
        gray = img;
    end
    gray = cv.equalizeHist(gray);
    faces = ccFace.detect(gray, 'ScaleFactor',1.4, 'MinNeighbors',2, ...
        'ScaleImage',true, 'MinSize',[30 30]);
end

function [conf, found] = getInitialFitting(img, face, s0, ccEyes)
    %GETINITIALFITTING  Calculate AAM initial fit params
    %
    %     [conf, found] = getInitialFitting(img, face, s0, ccEyes)
    %
    % ## Input
    % * __img__ input image
    % * __face__ detected face `[x,y,w,h]`
    % * __s0__ base shape of the trained model
    % * __ccEyes__ cascade object for eyes detection
    %
    % ## Output
    % * __conf__ struct with rotation, translation, and scale
    % * __found__ success flag
    %

    found = false;
    conf = struct('R',eye(2), 't',[0 0], 'scale',1.0);

    % detect eyes in face
    if cv.Rect.area(face) == 0, return; end
    faceROI = cv.Rect.crop(img, face);
    eyes = ccEyes.detect(faceROI, 'ScaleFactor',1.1, 'MinNeighbors',2, ...
        'ScaleImage',true, 'MinSize',[20 20]);
    if numel(eyes) ~= 2, return; end

    % make sure that first is left eye, second is right eye
    if eyes{2}(1) < eyes{1}(1)
        eyes = eyes([2 1]);
    end

    % eyes centers in detected face
    c1 = face(1:2) + eyes{1}(1:2) + eyes{1}(3:4)/2;  % left eye
    c2 = face(1:2) + eyes{2}(1:2) + eyes{2}(3:4)/2;  % right eye
    assert(c1(1) < c2(1), 'eyes not ordered correctly (left then right)');

    % eyes centers in base shape (shifted to middle of image)
    base = bsxfun(@plus, s0, [size(img,2) size(img,1)]/2);
    c1Base = (base(37,:) + base(40,:)) / 2;  % left eye
    c2Base = (base(43,:) + base(46,:)) / 2;  % right eye

    % scale between the two line length in detected and base shape
    scale = norm(c2 - c1) / norm(c2Base - c1Base);

    % eyes centers in scaled base shape (not shifted)
    base = s0 * scale;
    c1Base = (base(37,:) + base(40,:)) / 2;
    c2Base = (base(43,:) + base(46,:)) / 2;

    % angle of the line connecting eyes centers in scaled base shape
    aBase = atan2(c2Base(2) - c1Base(2), c2Base(1) - c1Base(1));

    % angle of the line connecting eyes centers in detected face
    a = atan2(c2(2) - c1(2), c2(1) - c1(1));

    % rotation matrix from the two angles
    R = cv.getRotationMatrix2D([0 0], rad2deg(aBase-a), 1.0);
    R = R(1:2,1:2);

    % eyes centers in transformed base shape (scaled then rotated)
    base = (R * scale * s0')';
    c1Base = (base(37,:) + base(40,:)) / 2;
    c2Base = (base(43,:) + base(46,:)) / 2;

    % translation between detected and transformed base shape
    t = c1 - c1Base;

    % fill output
    found = true;
    conf.R = R;
    conf.t = t;
    conf.scale = scale;
end