4 min read

Conceptualizing Haar cascades

When we talk about classifying objects and tracking their location, what exactly are we hoping to pinpoint? What constitutes a recognizable part of an object?

Photographic images, even from a webcam, may contain a lot of detail for our (human) viewing pleasure. However, image detail tends to be unstable with respect to variations in lighting, viewing angle, viewing distance, camera shake, and digital noise. Moreover, even real differences in physical detail might not interest us for the purpose of classification. I was taught in school, that no two snowflakes look alike under a microscope. Fortunately, as a Canadian child, I had already learned how to recognize snowflakes without a microscope, as the similarities are more obvious in bulk.

Thus, some means of abstracting image detail is useful in producing stable classification and tracking results. The abstractions are called features, which are said to be extracted from the image data. There should be far fewer features than pixels, though any pixel might influence multiple features. The level of similarity between two images can be evaluated based on distances between the images’ corresponding features. For example, distance might be defined in terms of spatial coordinates or color coordinates. Haar-like features are one type of feature that is often applied to real-time face tracking. They were first used f or this purpose by Paul Viola and Michael Jones in 2001. Each Haar-like feature describes the pattern of contrast among adjacent image regions. For example, edges, vertices, and thin lines each generate distinctive features. For any given image, the features may vary depending on the regions’ size, which may be called the window size. Two images that differ only in scale should be capable of yielding similar features, albeit for different window sizes. Thus, it is useful to generate features for multiple window sizes. Such a collection of features is called a cascade. We may say a Haar cascade is scale-invariant or, in other words, robust to changes in scale. OpenCV provides a classifier and tracker for scale-invariant Haar cascades, whic h it expects to be in a certain file format. Haar cascades, as implemented in OpenCV, are not robust to changes in rotation. For example, an upside-down face is not considered similar to an upright face and a face viewed in profile is not considered similar to a face viewed from the front. A more complex and more resource-intensive implementation could improve Haar cascades’ robustness to rotation by considering multiple transformations of images as well as multiple window sizes. However, we will confine ourselves to the implementation in OpenCV.

Getting Haar cascade data

As part of your OpenCV setup, you probably have a directory called haarcascades. It contains cascades that are trained for certain subjects using tools that come with OpenCV. The directory’s full path depends on your system and method of setting up OpenCV, as follows:

  • Build from source archive:: <unzip_destination>/data/haarcascades
  • Windows with self-extracting ZIP:<unzip_destination>/data/haarcascades
  • Mac with MacPorts:MacPorts: /opt/local/share/OpenCV/haarcascades
  • Mac with Homebrew:The haarcascades file is not included; to get it, download the source archive

  • Ubuntu with apt or Software Center: The haarcascades file is not included; to get it, download the source archive

If you cannot find haarcascades, then download the source archive from http://sourceforge.net/projects/opencvlibrary/files/opencv-unix/2.4.3/OpenCV-2.4.3.tar.bz2/download (or the Windows self-extracting ZIP from http://sourceforge.net/projects/opencvlibrary/files/opencvwin/ 2.4.3/OpenCV-2.4.3.exe/download), unzip it, and look for <unzip_destination>/data/haarcascades.

Once you find haarcascades, create a directory called cascades in the same folder as cameo.py and copy the following files from haarcascades into cascades:

haarcascade_frontalface_alt.xml
haarcascade_eye.xml
haarcascade_mcs_nose.xml
haarcascade_mcs_mouth.xml

As their names suggest, these cascades are for tracking faces, eyes, noses, and mouths. They require a frontal, upright view of the subject. With a lot of patience and a powerful computer, you can make your own cascades, trained for various types of objects.

Creating modules

We should continue to maintain good separation between application-specific code and reusable code. Let’s make new modules for tracking classes and their helpers.

A file called trackers.py should be created in the same directory as cameo.py (and, equivalently, in the parent directory of cascades ). Let’s put the following import statements at the start of trackers.py:

import cv2
import rects
import utils

Alongside trackers.py and cameo.py, let’s make another file called rects.py containing the following import statement:

import cv2

Our face tracker and a definition of a face will go in trackers.py, while various helpers will go in rects.py and our preexisting utils.py file.

LEAVE A REPLY

Please enter your comment!
Please enter your name here