Histogram-based face tracker with CAMShift

In this demo, we implement a simple histogram-based face tracker and apply it to an input video.

Video

Create the video file reader

if true
    v = which('vipcolorsegmentation.avi');
    s = 3;                    % scale since frames are a bit too small
    win = [40 45 25 25] * s;  % hardcoded object location
else
    v = which('visionface.avi');
    s = 1;
    win = [275 125 75 100];
end
if isempty(v)
    if true
        filtspec = strjoin(strcat('*.', {'avi','mpg','mpeg','mp4','wmv'}), ';');
        [fn,fp] = uigetfile(filtspec, 'Select a video file');
        % uigetfile returns 0 (numeric) when the dialog is cancelled
        if isequal(fn,0), error('No file selected'); end
        v = fullfile(fp,fn);
    else
        v = 0;
    end
    s = 1;
    win = [];
end
vid = cv.VideoCapture(v);
assert(vid.isOpened(), 'Could not initialize capturing');

Read the first video frame which contains the object to track

img = vid.read();
assert(~isempty(img), 'Failed to read frame');
if s ~= 1, img = cv.resize(img, s, s); end
sz = size(img);

Options

% visualization options
use_hg = false;
vis_prob = false;
str = {'meanshift', 'camshift', 'camshift rotated'};
clr = 255 * eye(3);

% mean shift termination criteria
crit = struct('type','Count+EPS', 'maxCount',10, 'epsilon',1.0);

Prepare plot

hImg = imshow(img);
title('Histogram-based Tracker (MeanShift & CamShift)')
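if use_hg
    % the region may still be unknown here (win = []), so start from a
    % placeholder position; the handles are updated on every frame
    pos = win;
    if isempty(pos), pos = [1 1 1 1]; end
    hRectMS = rectangle('Position',pos, 'EdgeColor','r');
    hRectCS = rectangle('Position',pos, 'EdgeColor','g');
    hLineCS = line(NaN, NaN, 'Color','b');
end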

Detect a face to track

Define the object region. This initial window is typically found with an object detector, such as a face detector. Alternatively, you can select the region interactively with the mouse using IMRECT. The object must occupy the majority of the region, because every pixel inside the region contributes to the color model.

if isempty(win)
    if false
        % automatically detect biggest face in image
        xmlfile = fullfile(mexopencv.root(),'test','haarcascade_frontalface_alt2.xml');
        obj = cv.CascadeClassifier(xmlfile);
        faces = obj.detect(cv.equalizeHist(cv.cvtColor(img, 'RGB2GRAY')));
        clear obj
        if ~isempty(faces)
            %TODO: we can further improve this by detecting nose within face,
            % as it provides more accurate measure of skin tone with less
            % background pixels
            [~,idx] = max(cellfun(@(f) cv.Rect.area(f), faces));
            win = faces{idx};
        end
    elseif ~mexopencv.isOctave() && mexopencv.require('images')
        % interactively select region with mouse
        hRect = imrect(gca);
        setColor(hRect, clr(1,:)/255);
        win = wait(hRect);
        mask = uint8(createMask(hRect, hImg) * 255);
        delete(hRect);
        win(1:2) = win(1:2) - 1;
    else
        % fallback to using an input dialog to prompt for region
        win = inputdlg(strcat('win.',{'x','y','w','h'}), 'Window', 1);
        win = str2double(win(:)).';  % 1x4 row vector [x y w h]
    end
end
assert(~isempty(win) && cv.Rect.area(win) > 0, 'invalid object region');
win = cv.Rect.intersect(win, [0 0 sz(2) sz(1)]);
winMS = win;
winCS = win;

% set up ROI mask marking the object to track
if true
    mask = zeros(sz(1:2), 'uint8');
    mask = cv.rectangle(mask, win, 'Color',255, 'Thickness','Filled');
elseif true
    w = win + [1 1 0 0];
    mask = false(sz(1:2));
    % a region of height h and width w spans exactly h rows and w columns
    mask(w(2):w(2)+w(4)-1, w(1):w(1)+w(3)-1) = true;
end
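
The offsets applied above (subtracting 1 after IMRECT, adding 1 before indexing) suggest that the region is stored as [x y w h] with a 0-based top-left corner, while MATLAB indexing is 1-based. The following is a minimal sketch of that conversion in plain MATLAB, assuming this convention holds; the image size and region are hypothetical values chosen only for illustration.

```matlab
sz  = [6 8];              % hypothetical image size [rows cols]
win = [2 1 3 4];          % hypothetical region [x y w h], 0-based top-left

% clip the region to the image, still in 0-based coordinates
x0 = max(win(1), 0);
y0 = max(win(2), 0);
x1 = min(win(1) + win(3), sz(2));   % exclusive right edge
y1 = min(win(2) + win(4), sz(1));   % exclusive bottom edge

% the 0-based half-open range [x0,x1) maps to the 1-based range x0+1:x1
mask = zeros(sz, 'uint8');
mask(y0+1:y1, x0+1:x1) = 255;

disp(nnz(mask));          % number of marked pixels (w*h when fully inside)
```

Using an exclusive end index keeps the marked area at exactly w-by-h pixels, which avoids an off-by-one row and column at the bottom and right edges.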

Facial feature to track (hue color)

Model the object by its hue: convert the first video frame to the HSV color space and compute the hue histogram of the object region.

imgHSV = cv.cvtColor(img, 'RGB2HSV');
if true
    H = cv.calcHist(imgHSV(:,:,1), 0:180, 'Mask',mask);
else
    % index the hue channel (not the 3-channel image) with a logical mask
    hue = imgHSV(:,:,1);
    H = histc(double(hue(mask ~= 0)), 0:179);
end
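
To make the color model concrete, here is a small sketch in plain MATLAB of a masked hue histogram and of how such a histogram can serve as a lookup table that scores every pixel, which is the idea the tracking loop relies on. It assumes 8-bit hue values in the range 0..179 and a 0/255 mask; the toy arrays and the variable names sel and Dn are made up for illustration, and this is not how the library computes the histogram internally.

```matlab
% toy 8-bit hue channel and a 0/255 mask marking the object region
hue  = uint8([ 10  10 120;
               10  90 120;
              179  90 120]);
mask = uint8([255 255   0;
              255 255   0;
                0   0   0]);

% histogram of the masked pixels; bin k counts hue value k-1
sel = hue(mask > 0);
H   = accumarray(double(sel(:)) + 1, 1, [180 1]);

% back projection: replace each pixel by the count of its hue bin
D = H(double(hue) + 1);

% scale the scores to [0,1]
Dn = D ./ max(D(:));

disp(H([11 91 121]).');   % counts for hue 10, 90 and 120
disp(Dn);
```

Pixels whose hue is frequent inside the region receive a high score, while hues that never occur in the region (120 and 179 here) score zero.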

Track the face

Track and display the object in each video frame. The while loop reads each frame, converts it to the HSV color space, and back-projects the hue histogram to obtain a per-pixel probability map in which the object is distinct from the background. The object is then tracked on that map with mean shift and CamShift. Finally, the tracking windows are drawn on the frame and the result is displayed.

while ishghandle(hImg)
    % next video frame
    img = vid.read();
    if isempty(img), break; end
    if s ~= 1, img = cv.resize(img, s, s); end

    % probability according to histogram empirical model
    imgHSV = cv.cvtColor(img, 'RGB2HSV');
    if true
        D = cv.calcBackProject(imgHSV(:,:,1), H, 0:180);
    else
        [~,idx] = histc(imgHSV(:,:,1), 0:179);
        D = H(idx);
    end

    % normalize the probability
    if true
        D = cv.normalize(D, 'NormType','MinMax', ...
            'Alpha',0, 'Beta',1, 'DType','double');
    else
        D = double(D);
        D = (D - min(D(:))) ./ (max(D(:)) - min(D(:)));
    end

    % find new window
    if true
        %TODO: camshift tends to give larger windows??
        % so we use the meanshift window
        winCS = winMS;
    end
    winMS = cv.meanShift(D, winMS, 'Criteria',crit);
    [boxCS,winCS] = cv.CamShift(D, winCS, 'Criteria',crit);
    %winCS = cv.RotatedRect.boundingRect(boxCS);

    % visualize backprojection instead of raw frame
    if vis_prob
        img = cv.cvtColor(uint8(D * 255), 'GRAY2RGB');
    end

    % draw meanshift and camshift tracking windows
    boxPts = cv.RotatedRect.points(boxCS);
    if use_hg
        set(hRectMS, 'Position',winMS);
        set(hRectCS, 'Position',winCS);
        set(hLineCS, 'XData',boxPts([1:4 1],1), 'YData',boxPts([1:4 1],2));
    else
        img = cv.rectangle(img, winMS, 'Color',clr(1,:));
        img = cv.rectangle(img, winCS, 'Color',clr(2,:));
        img = cv.polylines(img, {boxPts}, 'Color',clr(3,:), 'Closed',true, ...
            'LineType','AA');
        for i=1:3
            % draw legend
            img = cv.putText(img, ['- ' str{i}], [10 i*10], ...
                'Color',clr(i,:), 'FontScale',0.4, 'LineType','AA');
        end
    end

    % show result
    set(hImg, 'CData',img);
    pause(0.05);
end
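
For intuition about the window update, the following is a simplified mean shift sketch in plain MATLAB on a synthetic probability map. It is not the library's implementation, which may differ in rounding and boundary handling; the map, the start window, and the variable names are hypothetical. The variables maxCount and epsilon play the same roles as the fields of the termination criteria defined earlier.

```matlab
% synthetic probability map: a 5x5 blob on an empty background
D = zeros(40, 60);
D(21:25, 31:35) = 1;          % blob centered at column 33, row 23

win      = [20 12 15 15];     % hypothetical start window [x y w h], 0-based x,y
maxCount = 10;                % maximum number of iterations
epsilon  = 1.0;               % stop when the shift is smaller than this

for iter = 1:maxCount
    % 1-based pixel ranges covered by the window, clipped to the map
    cols = max(win(1)+1, 1) : min(win(1)+win(3), size(D,2));
    rows = max(win(2)+1, 1) : min(win(2)+win(4), size(D,1));
    P = D(rows, cols);
    m = sum(P(:));
    if m == 0, break; end     % no probability mass inside the window

    % centroid of the probability mass inside the window
    cx = sum(P, 1) * cols(:) / m;
    cy = rows * sum(P, 2) / m;

    % offset between the centroid and the window center
    center = [win(1) + (win(3)+1)/2, win(2) + (win(4)+1)/2];
    shift  = [cx cy] - center;
    if norm(shift) < epsilon, break; end

    % move the window so that its center lands on the centroid
    win(1:2) = win(1:2) + round(shift);
end

disp(win);
```

Mean shift keeps the window size fixed and only moves it toward the local center of mass. CamShift builds on the same idea but also adapts the window size and estimates an orientation from the moments of the probability map, which is why it returns a rotated box in addition to a window.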

Release the video reader

vid.release();