Face recognition with OpenCV: Haar Cascade

Computer vision is a field of study that aims to extract a high-level understanding from digital images or videos. Combined with AI and ML techniques, it now attracts investment in research and solutions from many industries. Consider one example: many studies are under way on security cameras with object detection capabilities. Imagine a camera in a train station that can tell, from the movement it captures, whether a fight is occurring: it could immediately alert the nearest police officer and keep that fight from getting worse.

Object detection is a powerful tool and, throughout this article, I’m going to explain the structure behind the algorithm we will employ, as well as provide a practical example (specifically, face detection). For this purpose, I will use OpenCV (Open Source Computer Vision Library), an open-source computer vision and machine learning library that is easy to import in Python. In particular, I’m going to use the Haar Cascade algorithm.

Haar Cascade is a machine learning object detection algorithm proposed by Paul Viola and Michael Jones in their 2001 paper “Rapid Object Detection using a Boosted Cascade of Simple Features”. A cascade function (I will explain this concept later on) is trained on a large number of positive and negative images, where positive images contain the object to be detected and negative images do not. It is then used to detect objects in other images. Luckily, OpenCV offers pre-trained Haar cascade models, organized into categories (faces, eyes and so forth) depending on the images they have been trained on.

Now let’s see how this algorithm concretely works. The idea of Haar cascade is to extract features from images using a kind of ‘filter’, similar in concept to the convolutional kernel of a CNN. These filters are called Haar features and look like this:

[Figure: examples of Haar features, including the Edge feature (a)]

Source: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_objdetect/py_face_detection/py_face_detection.html

The idea is to slide these filters over the image, inspecting one portion (or window) at a time. For each window, the pixel intensities under the white and black portions of the filter are summed separately. The difference between those two sums is the value of the extracted feature. Ideally, a large value means the feature is relevant. For example, if we consider the Edge feature (a) and apply it to the following B&W picture:

[Figure: sample black-and-white picture]

We will obtain a large value, hence the algorithm will report an edge feature with high probability. Of course, real pixel intensities are rarely pure white or pure black, and we will often face a less clear-cut situation, with shades of gray on both sides of the edge.

Nevertheless, the idea remains the same: the higher the result (that is, the difference between the black and white sums), the higher the probability that the window contains a relevant feature.
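To make the computation concrete, here is a minimal sketch of an edge feature applied to a single window. It uses plain Python lists instead of real images, and the function name and the layout (white region on the left half, black region on the right half) are my own assumptions for illustration, not OpenCV code.

```python
def edge_feature_value(window):
    """Value of a two-rectangle edge feature on a grayscale window.

    Assumed layout: the left half of the window lies under the white
    region of the filter, the right half under the black region.
    """
    height = len(window)
    width = len(window[0])
    half = width // 2
    white_sum = sum(window[r][c] for r in range(height) for c in range(half))
    black_sum = sum(window[r][c] for r in range(height) for c in range(half, width))
    # The feature value is the difference between the two sums
    return black_sum - white_sum


# A sharp edge, a softer edge in shades of gray, and a flat region
sharp_edge = [[0, 0, 255, 255] for _ in range(4)]
soft_edge = [[30, 60, 190, 220] for _ in range(4)]
flat = [[100, 100, 100, 100] for _ in range(4)]

for name, window in [("sharp", sharp_edge), ("soft", soft_edge), ("flat", flat)]:
    print(name, edge_feature_value(window))
```

The sharp edge gives the largest value, the soft edge a smaller but still clear one, and the flat region gives zero, which matches the intuition described above.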

Now, imagine the huge number of features returned by this computation. To give you an idea, even a 24x24 window yields over 160000 features, and an image contains a great many windows. How can we make this process more efficient? The solution is the summed-area table, also known as the integral image: a data structure and algorithm for quickly generating the sum of values in a rectangular subset of a grid. Once the table is built, the sum of the pixel intensities inside any rectangle can be obtained from just four table lookups, whatever the size of the rectangle. I won’t dive deeper into that topic in this article; I cover the integral image in full in a separate article.
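As a quick illustration of the idea, the sketch below builds a summed-area table for a tiny grid and then reads rectangle sums from it. The function names are mine, and the table is padded with an extra row and column of zeros to keep the lookups simple; this is a sketch of the technique, not the code OpenCV runs internally.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros.

    ii[r][c] holds the sum of all pixels above and to the left of (r, c),
    i.e. the sum of img[0:r][0:c].
    """
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0
        for c in range(width):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # bottom-right 2x2 block: 5 + 6 + 8 + 9
```

The table is computed once per image; after that, every white or black region of every feature costs the same four lookups.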

The next step also concerns efficiency and optimization. Besides being numerous, features might also be irrelevant. Among the more than 160000 features we obtain, how can we decide which ones are good? The answer relies on the concept of ensemble methods: by combining many algorithms, each weak on its own, we can create a strong one. This is accomplished using AdaBoost, which both selects the best features and trains the classifiers that use them. The algorithm constructs a “strong” classifier as a weighted linear combination of simple “weak” classifiers.
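The sketch below shows how boosting can select features. It is a textbook discrete AdaBoost with threshold "stumps" as weak classifiers, which is an assumption on my part: it is not the exact variant from the Viola-Jones paper, and the toy feature values are made up. Each sample is a list of feature values, labelled +1 (object present) or -1 (object absent).

```python
import math


def stump_predict(sample, feature, threshold, polarity):
    """Weak classifier: compares a single feature value with a threshold."""
    return polarity if sample[feature] >= threshold else -polarity


def train_adaboost(samples, labels, rounds):
    """Return a list of (alpha, feature, threshold, polarity) weak classifiers."""
    n = len(samples)
    weights = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error over all features
        best = None
        for feature in range(len(samples[0])):
            for threshold in sorted({s[feature] for s in samples}):
                for polarity in (1, -1):
                    err = sum(
                        w for s, y, w in zip(samples, labels, weights)
                        if stump_predict(s, feature, threshold, polarity) != y
                    )
                    if best is None or err < best[0]:
                        best = (err, feature, threshold, polarity)
        err, feature, threshold, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        strong.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, then normalize
        weights = [
            w * math.exp(-alpha * y * stump_predict(s, feature, threshold, polarity))
            for s, y, w in zip(samples, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return strong


def strong_predict(strong, sample):
    """Strong classifier: sign of the weighted vote of the weak classifiers."""
    score = sum(a * stump_predict(sample, f, t, p) for a, f, t, p in strong)
    return 1 if score >= 0 else -1


# Toy data: only the feature at index 1 separates the two classes
samples = [[5, 90, 3], [7, 80, 1], [6, 85, 9], [5, 10, 2], [7, 20, 8], [6, 15, 4]]
labels = [1, 1, 1, -1, -1, -1]

strong = train_adaboost(samples, labels, rounds=3)
print("selected features:", [feature for _, feature, _, _ in strong])
print("predictions:", [strong_predict(strong, s) for s in samples])
```

On this toy data the informative feature is the one that gets selected, while the two uninformative ones are ignored; this is the feature selection effect described above, on a very small scale.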

We are almost done. The last concept to introduce is a final optimization, this time in terms of computation time. Even though we reduced our 160000+ features to a more manageable number, that number is still high: applying all the features to all the windows would take a lot of time. That’s why we use a cascade of classifiers: instead of applying all the features to a window at once, the algorithm groups them into successive stages of classifiers and applies the stages one by one. If a window fails the first stage, which normally includes only a few features (translated: the difference between the white and black sums is low), the algorithm discards it and does not evaluate the remaining features on it. If the window passes, the algorithm applies the second stage of features and continues the process.
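The early-rejection logic can be sketched in a few lines. The stage layout, thresholds and weights below are hypothetical values I chose for illustration; a real cascade stores trained values in its XML file.

```python
def run_cascade(window_features, stages):
    """Evaluate a window stage by stage and stop at the first failure.

    stages is a list of (weak_classifiers, stage_threshold) pairs, where
    each weak classifier is a (feature_index, threshold, weight) tuple.
    Returns (accepted, number_of_stages_evaluated).
    """
    for stage_number, (weak_classifiers, stage_threshold) in enumerate(stages, start=1):
        score = sum(
            weight
            for feature_index, threshold, weight in weak_classifiers
            if window_features[feature_index] >= threshold
        )
        if score < stage_threshold:
            # Rejected: the remaining stages are never evaluated
            return False, stage_number
    return True, len(stages)


# Hypothetical two-stage cascade: a cheap first stage, a larger second one
stages = [
    ([(0, 50, 1.0)], 1.0),
    ([(1, 30, 0.6), (2, 40, 0.7)], 1.0),
]

background = [10, 80, 90]   # fails the first stage
partial = [120, 60, 10]     # passes stage 1, fails stage 2
face_like = [120, 60, 75]   # passes every stage

for name, features in [("background", background), ("partial", partial), ("face_like", face_like)]:
    print(name, run_cascade(features, stages))
```

Since most windows in an image are background, most of them are rejected by the first, cheapest stage, which is where the time saving comes from.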

Great, now that the concept of Haar Cascade is clearer, let’s dive into some lines of code using Python and the OpenCV library mentioned above:

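import cv2

# Load the pre-trained cascades (the XML files must be in the working directory)
face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier("haarcascade_eye.xml")

# Read the image and convert it to grayscale for detection
img = cv2.imread("image.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces with scaleFactor=1.3 and minNeighbors=5
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
for (x, y, w, h) in faces:
    # Draw a blue rectangle (colors are in BGR order) around each face
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    # Search for eyes only inside the face region, and mark them in green
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()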

[Figure: detection result, with faces outlined in blue and eyes in green]

As you can see, our algorithm worked pretty well! If you explore the whole library of Haar cascades, you will see that there are specific models trained on different features of the human body, hence you can improve your model by detecting more features.

If you are interested in further reading, I strongly recommend the OpenCV documentation.
