
Real-Time Head Pose Estimation in Python

Source: https://mc.ai/real-time-head-pose-estimation-in-python-2/

Requirements

For this project, we need OpenCV and TensorFlow, so let's install them.

# Using pip
pip install opencv-python
pip install tensorflow

# Using conda
conda install -c conda-forge opencv
conda install -c conda-forge tensorflow

Face Detection

Our first step is to find the faces in the image, on which we can then locate facial landmarks. For this task, we will be using a Caffe model with OpenCV's DNN module. If you are wondering how it fares against other models like Haar Cascades or Dlib's frontal face detector, or you want to know more about it in depth, you can refer to this article.

You can download the required models from my GitHub repository.

import cv2
import numpy as np

# Pre-trained Caffe face detector (SSD with a ResNet-10 backbone).
modelFile = "models/res10_300x300_ssd_iter_140000.caffemodel"
configFile = "models/deploy.prototxt.txt"
net = cv2.dnn.readNetFromCaffe(configFile, modelFile)

img = cv2.imread('test.jpg')
h, w = img.shape[:2]
# Resize to 300x300 and subtract the training-set mean values.
blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 1.0,
                             (300, 300), (104.0, 117.0, 123.0))
net.setInput(blob)
faces = net.forward()

# Draw the detected faces on the image.
for i in range(faces.shape[2]):
    confidence = faces[0, 0, i, 2]
    if confidence > 0.5:
        box = faces[0, 0, i, 3:7] * np.array([w, h, w, h])
        (x, y, x1, y1) = box.astype("int")
        cv2.rectangle(img, (x, y), (x1, y1), (0, 0, 255), 2)

Load the network with cv2.dnn.readNetFromCaffe, passing the layer configuration (prototxt) and the trained weights (caffemodel) as its arguments. The detector performs best on images resized to 300×300.
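
To reuse this per frame in a real-time loop, it helps to wrap the detection step in a helper. Here is a minimal sketch; the function name detect_faces and the 0.5 confidence threshold are my own choices, not from the original post.

import cv2
import numpy as np

def detect_faces(net, img, conf_threshold=0.5):
    """Return a list of (x, y, x1, y1) face boxes from the Caffe SSD."""
    h, w = img.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 1.0,
                                 (300, 300), (104.0, 117.0, 123.0))
    net.setInput(blob)
    detections = net.forward()
    boxes = []
    for i in range(detections.shape[2]):
        if detections[0, 0, i, 2] > conf_threshold:
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            boxes.append(box.astype("int"))
    return boxes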

Facial Landmark Detection

The most commonly used detector is Dlib's facial landmark detector, which gives us 68 landmarks but does not offer good accuracy. Instead, we will use the facial landmark detector provided by Yin Guobing in this Github repo. It also gives 68 landmarks, and it is a TensorFlow CNN trained on five datasets! The pre-trained model can be found here. The author has also written a series of posts covering the background, dataset, preprocessing, model architecture, training, and deployment, which can be found here. I have provided a very brief summary below, but I would strongly encourage you to read them.

In the first post of the series, he describes the problem of keeping facial landmarks stable across video frames, then surveys existing solutions such as OpenFace and Dlib's facial landmark detection, along with the available datasets. The third post is all about preprocessing the data and making it ready to use. The next two posts cover extracting the faces and applying facial landmarks to them, so they can be stored as TFRecord files ready for training a CNN. In the sixth post, a model is trained using TensorFlow, and we see how important the choice of loss function is: he first used tf.losses.mean_pairwise_squared_error, which uses the relationships between points as the basis for optimization, and it did not generalize well. In contrast, tf.losses.mean_squared_error worked well. In the final post, the model is exported as an API, with a demonstration of how to use it in Python.
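
For illustration, here is roughly what the two loss calls compared above look like in the TensorFlow 1.x-style API used in the series; marks_gt and marks_pred are hypothetical tensors holding the 136 flattened landmark coordinates, not names from the original code.

import tensorflow as tf  # TF 1.x API, as in the original series

# Hypothetical [batch, 136] tensors: 68 landmarks flattened to (x, y).
marks_gt = tf.placeholder(tf.float32, [None, 136])
marks_pred = tf.placeholder(tf.float32, [None, 136])

# Optimizes pairwise relationships between points; per the series,
# this did not generalize well:
loss = tf.losses.mean_pairwise_squared_error(labels=marks_gt,
                                             predictions=marks_pred)

# Optimizes absolute coordinates directly; this worked well:
loss = tf.losses.mean_squared_error(labels=marks_gt,
                                    predictions=marks_pred)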

The model takes square boxes of size 128×128 containing faces and returns 68 facial landmarks. The code provided below is taken from here, and it can also be used to draw 3D annotation boxes on the face. The code is modified to draw facial landmarks on all detected faces, unlike the original, which draws them on only one.
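
Since that script is linked rather than reproduced here, below is a minimal sketch of the landmark step. It assumes the pre-trained CNN loads as a callable TensorFlow SavedModel (the 'models/pose_model' path is illustrative, not guaranteed by the repo), takes a (1, 128, 128, 3) face crop, and outputs 68 (x, y) pairs normalized to the crop; get_landmarks and the detect_faces helper from earlier are my own names.

import cv2
import numpy as np
import tensorflow as tf

# Assumption: the pre-trained landmark CNN loads as a callable
# SavedModel; the path and output layout below are illustrative.
model = tf.saved_model.load('models/pose_model')

def get_landmarks(img, box):
    x, y, x1, y1 = box
    # Crop the (roughly square) face box and resize to the expected
    # 128x128 model input.
    face = cv2.resize(img[y:y1, x:x1], (128, 128))
    inp = tf.convert_to_tensor(face[np.newaxis].astype(np.float32))
    marks = np.array(model(inp)).reshape(-1, 2)  # 68 normalized (x, y)
    # Scale normalized coordinates back to the original face box.
    marks *= (x1 - x, y1 - y)
    marks += (x, y)
    return marks.astype("int")

for box in detect_faces(net, img):
    for (px, py) in get_landmarks(img, box):
        cv2.circle(img, (px, py), 2, (0, 255, 0), -1)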

The full code draws facial landmarks on all the faces, as shown below.

Drawing facial landmarks

Using draw_annotation_box(), we can also draw a 3D annotation box on the face, as shown below.

With annotation box

Pose Estimation

There is a great article on Learn OpenCV explaining head pose estimation for images, with a lot of math about converting the points to 3D space and using cv2.solvePnP to find the rotational and translational vectors. A quick read-through of that article is a great way to understand the inner workings, so I will describe it only briefly here.

We need six points of the face: the nose tip, the chin, the extreme left and right corners of the mouth, the left corner of the left eye, and the right corner of the right eye. We take standard 3D coordinates of these facial landmarks and estimate the rotational and translational vectors at the nose tip. For an accurate estimate, we also need the intrinsic parameters of the camera: focal length, optical center, and radial distortion coefficients. We can approximate the first two and assume there is no radial distortion to keep things simple. After obtaining the required vectors, we can project 3D points onto the 2D surface of our image.
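
Here is a condensed sketch of that pipeline, assuming marks is the 68-landmark array from the previous step; the 3D model coordinates follow the generic model from the Learn OpenCV article, and the camera intrinsics are the rough approximations described above.

import cv2
import numpy as np

# Generic 3D model points for the six landmarks (arbitrary units,
# nose tip at the origin), as in the Learn OpenCV article.
model_points = np.array([
    (0.0, 0.0, 0.0),          # Nose tip
    (0.0, -330.0, -65.0),     # Chin
    (-225.0, 170.0, -135.0),  # Left corner of the left eye
    (225.0, 170.0, -135.0),   # Right corner of the right eye
    (-150.0, -150.0, -125.0), # Left corner of the mouth
    (150.0, -150.0, -125.0)   # Right corner of the mouth
])

# Matching 2D points taken from the 68 detected landmarks; these
# indices follow the usual 68-point layout.
image_points = marks[[30, 8, 36, 45, 48, 54]].astype("double")

# Approximate the intrinsics: focal length ~ image width, optical
# center ~ image center, and no radial distortion.
h, w = img.shape[:2]
camera_matrix = np.array([[w, 0, w / 2],
                          [0, w, h / 2],
                          [0, 0, 1]], dtype="double")
dist_coeffs = np.zeros((4, 1))

ok, rotation_vector, translation_vector = cv2.solvePnP(
    model_points, image_points, camera_matrix, dist_coeffs,
    flags=cv2.SOLVEPNP_ITERATIVE)

# Project a 3D point 1000 units in front of the nose tip back onto
# the image; the line from the nose tip to it shows where the head
# is pointing.
(nose_end_2d, _) = cv2.projectPoints(np.array([(0.0, 0.0, 1000.0)]),
                                     rotation_vector, translation_vector,
                                     camera_matrix, dist_coeffs)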

If we use only the code available and find the angle with the x-axis, we obtain the result shown below.

Result

It works well for detecting the head moving up and down, but not left or right. So how do we capture that? Recall the annotation box we drew on the face above; we can use it to measure left and right movement.

With annotation box

We can take the line in the middle of the two dark blue lines as our pointer, and its angle with the y-axis gives the angle of left-right movement.
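
As a rough sketch (this helper is my own, not from the repository), both angles can be read off with atan2:

import math

# Illustrative helper: p1 and p2 are the 2D endpoints of a line, e.g.
# the nose tip and the projected nose-direction point (for up/down),
# or the two ends of the annotation box's middle line (for left/right).
def line_angles(p1, p2):
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    ang_x = math.degrees(math.atan2(dy, dx))  # angle with the x-axis
    ang_y = math.degrees(math.atan2(dx, dy))  # angle with the y-axis
    return ang_x, ang_y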

Result

Combining both angles, we can detect head movement in whichever direction we want. The complete code can be found here at my GitHub repository, along with various other sub-models for an online proctoring solution.

