Deep Learning Haar Cascade Explained
Alright! This is where we start having some fun! The concept behind the Cascade and how it is used in the real world is nothing short of amazing. So what is it?
Haar Cascade is a machine learning object detection algorithm used to identify objects in an image or video. It is based on the concept of features proposed by Paul Viola and Michael Jones in their 2001 paper "Rapid Object Detection using a Boosted Cascade of Simple Features".
It is a machine learning based approach where a cascade function is trained from a lot of positive and negative images. It is then used to detect objects in other images.
The algorithm has four stages:
- Haar Feature Selection
- Creating Integral Images
- AdaBoost Training
- Cascading Classifiers
It is well known for being able to detect faces and body parts in an image, but can be trained to identify almost any object.
Let's take face detection as an example. Initially, the algorithm needs a lot of positive images of faces and negative images without faces to train the classifier. Then we need to extract features from them.
The first step is to collect the Haar features. A Haar feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in each region, and calculates the difference between these sums.
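To make that concrete, here is a minimal sketch of one two-rectangle feature in plain Python. The function names, the left-minus-right sign convention, and the tiny 4x4 "image" are my own assumptions for illustration; real detectors use several rectangle layouts and far larger windows.

```python
def region_sum(image, top, left, height, width):
    """Sum the pixel intensities inside one rectangle by visiting every pixel."""
    return sum(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def two_rect_feature(image, top, left, height, width):
    """Split the rectangle into a left and a right half and return
    sum(left half) - sum(right half). The sign convention is an assumption."""
    half = width // 2
    left_sum = region_sum(image, top, left, height, half)
    right_sum = region_sum(image, top, left + half, height, half)
    return left_sum - right_sum


# Hypothetical image: dark columns on the left, bright columns on the right.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

edge_response = two_rect_feature(image, 0, 0, 4, 4)  # window straddles the edge
flat_response = two_rect_feature(image, 0, 2, 4, 2)  # window sits on a uniform area
print(edge_response, flat_response)
```

The window that straddles the dark-to-bright edge gives a large magnitude, while the window over the uniform area gives zero. That contrast is what lets such a simple feature say something about image structure.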
Integral images are used to make this calculation super fast.
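The speed-up comes from an integral image (also called a summed-area table): once it is built, the sum of any rectangle takes four lookups, no matter how big the rectangle is. Below is a small sketch of the idea; the function names and the zero-padded table layout are my own choices, not taken from any particular library.

```python
def integral_image(image):
    """Build a table ii with an extra leading row and column of zeros, where
    ii[r][c] holds the sum of all pixels above and to the left of (r, c)."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_total = 0
        for c in range(cols):
            row_total += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_total
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)
whole_image = rect_sum(ii, 0, 0, 3, 3)
bottom_right = rect_sum(ii, 1, 1, 2, 2)  # covers 5, 6, 8, 9
print(whole_image, bottom_right)
```

Because every rectangle sum costs the same four lookups, evaluating thousands of features per window stays cheap.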
But among all the features we calculated, most are irrelevant. For example, consider the image below. The top row shows two good features. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks. The second feature selected relies on the property that the eyes are darker than the bridge of the nose. But the same windows applied to the cheeks or any other place are irrelevant.
So how do we select the best features out of 160,000+ features? This is accomplished using a concept called AdaBoost, which both selects the best features and trains the classifiers that use them. This algorithm constructs a “strong” classifier as a linear combination of weighted simple “weak” classifiers. The process is as follows.
During the detection phase, a window of the target size is moved over the input image, and for each subsection of the image the Haar features are calculated. You can see this in action in the video below. Each feature's difference is then compared to a learned threshold that separates non-objects from objects. Because each feature is only a "weak classifier" (its detection quality is slightly better than random guessing), a large number of features are necessary to describe an object with sufficient accuracy. They are therefore organized into cascade classifiers to form a strong classifier.
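The "moving window" part can be sketched in a few lines. This is only an illustration: the generator name, the square window, and the fixed step size are assumptions of mine, and real detectors typically repeat the scan at several scales as well.

```python
def sliding_windows(image_width, image_height, window_size, step):
    """Yield the (left, top) corner of every square window that fits fully
    inside the image, scanning left to right and top to bottom."""
    for top in range(0, image_height - window_size + 1, step):
        for left in range(0, image_width - window_size + 1, step):
            yield left, top


# Hypothetical 8x6 image scanned with a 4x4 window, moving 2 pixels at a time.
positions = list(sliding_windows(image_width=8, image_height=6, window_size=4, step=2))
print(positions)
```

Each position returned here is one candidate region. The features are evaluated on it and the result is handed to the classifier.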
The cascade classifier consists of a collection of stages, where each stage is an ensemble of weak learners. The weak learners are simple classifiers called decision stumps. Each stage is trained using a technique called boosting. Boosting provides the ability to train a highly accurate classifier by taking a weighted average of the decisions made by the weak learners.
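Here is a rough sketch of boosting with decision stumps, to show how the "weighted vote of weak learners" idea fits together. It is a textbook-style discrete AdaBoost written for illustration, not the exact training procedure from the paper or from any library. The function names, the toy data, and the clamping constant are all assumptions.

```python
import math


def stump_predict(sample, feature_index, threshold, polarity):
    """Decision stump: vote +1 (object) when the chosen feature falls on the
    'object' side of the threshold, otherwise -1 (non-object)."""
    value = sample[feature_index]
    return 1 if polarity * value < polarity * threshold else -1


def best_stump(samples, labels, weights):
    """Try every feature, threshold and polarity; keep the lowest weighted error."""
    best = None
    for feature_index in range(len(samples[0])):
        for threshold in sorted({s[feature_index] for s in samples}):
            for polarity in (1, -1):
                error = sum(
                    w for s, y, w in zip(samples, labels, weights)
                    if stump_predict(s, feature_index, threshold, polarity) != y
                )
                if best is None or error < best[0]:
                    best = (error, feature_index, threshold, polarity)
    return best


def train_adaboost(samples, labels, rounds):
    """Return a list of (alpha, feature_index, threshold, polarity) tuples."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, feature_index, threshold, polarity = best_stump(samples, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # more accurate stump, bigger vote
        ensemble.append((alpha, feature_index, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest, then renormalize.
        weights = [
            w * math.exp(-alpha * y * stump_predict(s, feature_index, threshold, polarity))
            for s, y, w in zip(samples, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def strong_predict(ensemble, sample):
    """Strong classifier: the sign of the alpha-weighted sum of stump votes."""
    score = sum(alpha * stump_predict(sample, i, t, p) for alpha, i, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data: feature 0 separates the classes, feature 1 is noise.
samples = [(1, 5), (2, 1), (3, 4), (7, 2), (8, 5), (9, 1)]
labels = [1, 1, 1, -1, -1, -1]
ensemble = train_adaboost(samples, labels, rounds=3)
print(ensemble[0][1], strong_predict(ensemble, (2, 9)))
```

On this toy data the search settles on feature 0 and ignores the noisy feature 1. It is a tiny version of the feature selection described earlier.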
Each stage of the classifier labels the region defined by the current location of the sliding window as either positive or negative. Positive indicates that an object was found and negative indicates no objects were found. If the label is negative, the classification of this region is complete, and the detector slides the window to the next location. If the label is positive, the classifier passes the region to the next stage. The detector reports an object found at the current window location when the final stage classifies the region as positive.
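The stage-by-stage logic can be sketched as a loop that bails out on the first negative label. Everything named here (the function, the feature names, the thresholds) is hypothetical and only meant to show the control flow.

```python
def run_cascade(stages, window_features):
    """Run the stages in order. Return (detected, stages_evaluated).
    The first stage that labels the window negative ends the evaluation."""
    for index, stage in enumerate(stages, start=1):
        if not stage(window_features):
            return False, index
    return True, len(stages)


# Hypothetical stages: each one is a function returning True (positive)
# or False (negative) for the features of the current window.
stages = [
    lambda f: f["eyes_vs_cheeks"] > 10,
    lambda f: f["eyes_vs_nose_bridge"] > 5,
    lambda f: f["mouth_vs_chin"] > 2,
]

background = {"eyes_vs_cheeks": 0, "eyes_vs_nose_bridge": 0, "mouth_vs_chin": 0}
near_miss = {"eyes_vs_cheeks": 12, "eyes_vs_nose_bridge": 1, "mouth_vs_chin": 0}
face = {"eyes_vs_cheeks": 15, "eyes_vs_nose_bridge": 9, "mouth_vs_chin": 4}

background_result = run_cascade(stages, background)
near_miss_result = run_cascade(stages, near_miss)
face_result = run_cascade(stages, face)
print(background_result, near_miss_result, face_result)
```

The background window is thrown out after a single stage, and only the face-like window pays for all three.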
The stages are designed to reject negative samples as fast as possible. The assumption is that the vast majority of windows do not contain the object of interest. Conversely, true positives are rare and worth taking the time to verify.
- A true positive occurs when a positive sample is correctly classified.
- A false positive occurs when a negative sample is mistakenly classified as positive.
- A false negative occurs when a positive sample is mistakenly classified as negative.
To work well, each stage in the cascade must have a low false negative rate. If a stage incorrectly labels an object as negative, the classification stops, and you cannot correct the mistake. However, each stage can have a high false positive rate. Even if the detector incorrectly labels a nonobject as positive, you can correct the mistake in subsequent stages. Adding more stages reduces the overall false positive rate, but it also reduces the overall true positive rate.
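A quick back-of-the-envelope calculation shows the trade-off. If we assume the stages behave independently (a simplification), the overall rate is the product of the per-stage rates. The per-stage numbers below are made up for illustration.

```python
import math


def overall_rate(per_stage_rates):
    """Multiply per-stage rates, assuming the stages act independently."""
    return math.prod(per_stage_rates)


num_stages = 10
# Hypothetical per-stage rates: each stage keeps 99% of true objects
# but lets through 50% of non-objects.
overall_true_positive = overall_rate([0.99] * num_stages)
overall_false_positive = overall_rate([0.50] * num_stages)
print(round(overall_true_positive, 3), round(overall_false_positive, 6))
```

With these assumed numbers, ten stages cut the false positive rate to roughly 0.001 while the true positive rate only drops to about 0.904. That is why each stage can afford a high false positive rate but not a high false negative rate.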
Cascade classifier training requires a set of positive samples and a set of negative images. You must provide a set of positive images with regions of interest specified to be used as positive samples. In MATLAB, for example, you can use the Image Labeler app to label objects of interest with bounding boxes. The Image Labeler outputs a table to use for positive samples. You also must provide a set of negative images from which the training function generates negative samples automatically. To achieve acceptable detector accuracy, set the number of stages, feature type, and other function parameters.
The video below shows this in action.
Haar Cascade - Facial Detection IN ACTION
If a picture is worth a thousand words this would be a million words. This is where it all comes together. The Ahh-hah moment.
This simple video helped crystallize for me how this algorithm works. Here are some observations:
- Notice how the algorithm moves the window systematically over the image, applying the features as it is trying to detect the face. This is depicted by the green rectangles.
- Notice that underneath the red boundary square, we see the classifier executing its stages, quickly discarding window frames that are clearly not a match (stages 1-25).
- To the right of the stage, we see how well it performed in identifying the face.
- Notice that as it gets closer and closer to identifying the face, the number of stages increases into the 20s (around the 1 minute mark). This demonstrates the cascading effect: the early stages discard input they have identified as irrelevant, and as the detector gets closer to finding a face it pays closer attention.
Let me know if you have any questions or have any comments below.
I want to make sure I got this post right. It will be critical that you understand this before we go into the next section where we will implement a full Custom Object Cascade detector.
I don't know about you, but I find the best way to understand something is by doing it. Conceptually, we now have an idea of how machine learning Cascade object detection works. Now let's build a real-world custom Object Detector, train it, and see it in action. I have a really cool example for us! Click on the button below.
1. Wikipedia, Wikipedia. “AdaBoost.” Wikipedia, Wikimedia Foundation, 13 Jan. 2018, en.wikipedia.org/wiki/.
2. Docs, OpenCV. “Face Detection Using Cascades.” OpenCV: Face Detection Using Cascades, 4 Aug. 2017, docs.opencv.org/3.3.0/d7/d8b/tutorial_py_face_detection.html.
3. Wikipedia, Wikipedia. “Viola–Jones object detection framework.” Wikipedia, Wikimedia Foundation, 4 November. 2017, en.wikipedia.org/wiki/Viola–Jones_object_detection_framework.
4. Wikipedia, Wikipedia. “Cascading Classifiers.” Wikipedia, Wikimedia Foundation, 15 October. 2013, en.wikipedia.org/wiki/Cascading_classifiers.
5. Mathworks, Mathworks. “Train a Cascade Object Detector” Mathworks, 2017, www.mathworks.com/help/vision/ug/train-a-cascade-object-detector.html.