Building a Deep Learning Person Classifier

Accurately identify images of people with and without faces

Photo by Ryoji Iwata on Unsplash

Introduction

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known [1].

“Shallow” learning techniques such as Support Vector Machines (SVMs) can produce effective classifiers from modest-sized data sets. More recently, deep learning classifiers have rivaled humans in identification tasks, but they require much larger data sets and substantial computing resources to do so. In particular, deep learning techniques have produced Convolutional Neural Networks (CNNs) that are state-of-the-art image classifiers.

This article will show you how to develop a CNN-based person classifier using TensorFlow that can outperform standard face recognition techniques in certain situations. The standard methods typically use a shallow learner like an SVM to classify patterns generated by facial embeddings (see, for example, the superb face_recognition program). These techniques are extremely good at recognition when most of the face is visible. However, my application needed a way to accurately classify people with only a partial view of the face, and ideally from behind. This provided the motivation to develop a person classifier using deep learning, and my goal was to match the recognition accuracy of a human looking at an image of a person they know, with or without a visible face.

The following steps are required to build an accurate CNN-based person classifier that identifies to which of a set of known and unknown people a new observation belongs. These steps are also applicable to training other types of CNN-based classifiers.

  1. Collect images of each person you want to classify, as well as of strangers, to create your training set and preprocess it. You should use data augmentation methods that automatically generate related but new images from the collected data to increase the size of the data set. Your data set will likely be unbalanced, meaning it doesn’t contain the same number of observations for each class. You will need to compensate for this, otherwise your model will overemphasize classes with more observations and its accuracy will suffer. Lastly, your data set needs to be converted into a format suitable for the training process. You should plan for a training set of at least 1,000 observations per class. Note that the terms observations and samples are used interchangeably in this article.
  2. Select a CNN that has already been trained on a standard data set and use it as a layer in a new model that will become your classifier. The standard data set should contain classes similar to what you want to classify. In this case, ImageNet is a good choice since “person” (in the generic sense) is already a class it has been trained to recognize. There are many CNN models available from TensorFlow with varying complexity versus accuracy trade-offs. You should choose the least complex model that achieves your application’s inference accuracy requirements. Of course, you can fine-tune on any class similar to the ImageNet classes, not just people.
  3. Apply transfer learning to your CNN using your image data. Deep neural networks are prone to overfitting, and you will learn several mitigation techniques to combat it. Note that transfer learning is also known as fine-tuning, and training a model is also known as fitting a model to a data set. These terms are used interchangeably in this article.
  4. Evaluate the fine-tuned model. You should check the accuracy of your fine-tuned model to see if it meets your application’s requirements and, if not, refit it with more data or better training parameters (“hyperparameters”).
  5. Save your model and prepare it for inference. SavedModel is the canonical way to save and serve a TensorFlow program, but you may also want to generate a TensorFlow Lite representation of it for deployment on a mobile or edge device. Both are covered below.

Please note that I used an Intel i5 + Nvidia 1080Ti machine with 64GB of main memory to train my models. You will need at least a similar machine to feasibly train a deep learning model.

Also note that much of this work was done for the smart-zoneminder project and you can leverage that work. If you just want to quickly fine-tune a TensorFlow CNN without diving into the details, install the machine learning platform on a suitable machine, prepare your data set as described below, and run the Python program train.py with options that suit your configuration. The rest of this article will break down train.py section by section to help you understand how it works, so you can use it better and perhaps modify it for your own purposes.

Data Set Preparation

Create a sub-directory for each person you want recognized, named for the person, in a directory called “dataset” (I used my Google Photos to seed this directory). Also create a sub-directory called “Unknown” that will hold faces of random strangers. In each person’s sub-directory you can optionally place another sub-directory holding images of that person without a full or partial view of their face. You can decide to include such images as members of the person’s class or of the Unknown class. The dataset directory will look like the following.

dataset
 |-name_of_person_1
    |-sample_image_1
    |-sample_image_2
    |-sample_image_n
    |-images_with_no_face
       |-sample_image_1
       |-sample_image_2
       |-sample_image_n
 |-name_of_person_2
 |-name_of_person_n
 |-Unknown

The function below creates a pandas dataframe from the contents of the dataset directory.

Creating a dataframe from the Data Set
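As a rough sketch (not the exact code in train.py), such a function might look like the following. It assumes the images are JPEG or PNG files and that the dataframe uses the default “filename” and “class” column names that flow_from_dataframe expects.

import os
import pandas as pd

def create_dataframe(dataset_dir='dataset'):
    """Walk the dataset directory and build a dataframe of image paths and labels."""
    records = []
    for person in sorted(os.listdir(dataset_dir)):
        person_dir = os.path.join(dataset_dir, person)
        if not os.path.isdir(person_dir):
            continue
        # Images in sub-folders (e.g. images_with_no_face) are treated here
        # as members of the person's class; move them to Unknown if preferred.
        for root, _, files in os.walk(person_dir):
            for name in files:
                if name.lower().endswith(('.jpg', '.jpeg', '.png')):
                    records.append({'filename': os.path.join(root, name),
                                    'class': person})
    return pd.DataFrame(records)

df = create_dataframe()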

The dataframe is input to the flow_from_dataframe method of the tf.keras.preprocessing.image.ImageDataGenerator class. The ImageDataGenerator generates batches of tensor image data with real-time data augmentation and creates the training and validation sets. The flow_from_dataframe method creates a Python generator for each set. This code is shown below.

Creating Training and Validation Generators
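A minimal sketch of the generator setup is shown below. The augmentation parameters, 224x224 target size, batch size and 80/20 validation split are illustrative assumptions rather than the exact values used in train.py.

import tensorflow as tf

# Preprocessor matching the InceptionResNetV2 base model used later.
preprocessor = tf.keras.applications.inception_resnet_v2.preprocess_input

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=preprocessor,
    horizontal_flip=True,      # example augmentations; tune for your data
    rotation_range=10,
    zoom_range=0.1,
    validation_split=0.2)      # hold out 20% of the samples for validation

train_generator = datagen.flow_from_dataframe(
    df, x_col='filename', y_col='class',
    target_size=(224, 224), batch_size=32,
    class_mode='categorical', subset='training')

validation_generator = datagen.flow_from_dataframe(
    df, x_col='filename', y_col='class',
    target_size=(224, 224), batch_size=32,
    class_mode='categorical', subset='validation',
    shuffle=False)             # keep order fixed so labels line up at evaluation time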

Although it is possible to use train_generator and validation_generator directly to fit the CNN model, using a Dataset object from tf.data.Dataset instead will give you better fitting performance. In train.py, this is done as follows.

Creating a tf.data Dataset
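The sketch below shows one way to wrap the Keras generators in tf.data Datasets, assuming TensorFlow 2.4 or later (where output_signature and tf.data.AUTOTUNE are available); train.py may differ in the details.

import tensorflow as tf

# Wrap the Keras generators in tf.data Datasets for a faster input pipeline.
output_signature = (
    tf.TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32),
    tf.TensorSpec(shape=(None, train_generator.num_classes), dtype=tf.float32))

train_dataset = tf.data.Dataset.from_generator(
    lambda: train_generator, output_signature=output_signature)
validation_dataset = tf.data.Dataset.from_generator(
    lambda: validation_generator, output_signature=output_signature)

# Prefetch so the GPU is not starved while the next batch is being prepared.
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)
validation_dataset = validation_dataset.prefetch(tf.data.AUTOTUNE)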

Since it is likely you will have more sample images in some classes than others, you will need to convey the class weighting to the model fitting process. The code to do that is shown below, along with the number of steps the fitting process should use for training and validation.

Determine Class Weights and Fitting Steps
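A simple way to derive the class weights and step counts from the training generator is sketched below; it uses the common “balanced” weighting formula, which may not match train.py exactly.

import numpy as np

# Weight each class inversely to its frequency so under-represented people
# are not drowned out by classes with many more samples.
counts = np.bincount(train_generator.classes)
total = len(train_generator.classes)
class_weight = {i: total / (len(counts) * c) for i, c in enumerate(counts)}

# One training epoch should see roughly every sample once.
steps_per_epoch = train_generator.samples // train_generator.batch_size
validation_steps = validation_generator.samples // validation_generator.batch_size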

The data is now ready to be used to fit a model.

Model Preparation

The function below creates a CNN model from an ImageNet pre-trained tf.keras.applications model. Although the function can create a model based on VGG16, InceptionResNetV2, MobileNetV2 or ResNet50, only InceptionResNetV2 is shown. The final softmax layer from the base model is removed, new dense classifier and softmax layers are added, and then the model is compiled. Pay particular attention to the hyperparameter constants since they will likely need to be adjusted to suit your data set, but the default values gave me good results. The function includes various overfitting mitigation techniques including L2 Regularization, Label Smoothing and Dropout. It also selects the appropriate tf.keras preprocessor that formats the data samples correctly.

Python Function to Create a CNN Model
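The sketch below illustrates the general shape of such a function for the InceptionResNetV2 case. The hyperparameter constants and layer sizes are placeholder assumptions, not the defaults from train.py.

import tensorflow as tf

# Placeholder hyperparameters; tune these for your own data set.
DENSE_UNITS = 128
DROPOUT_RATE = 0.5
L2_WEIGHT = 1e-4
LABEL_SMOOTHING = 0.1

def create_model(num_classes, input_shape=(224, 224, 3)):
    """Build a classifier on top of an ImageNet pre-trained InceptionResNetV2 base."""
    base = tf.keras.applications.InceptionResNetV2(
        include_top=False, weights='imagenet',
        input_shape=input_shape, pooling='avg')
    base.trainable = False  # frozen for the first fine-tuning pass

    regularizer = tf.keras.regularizers.l2(L2_WEIGHT)
    x = tf.keras.layers.Dropout(DROPOUT_RATE)(base.output)
    x = tf.keras.layers.Dense(DENSE_UNITS, activation='relu',
                              kernel_regularizer=regularizer)(x)
    x = tf.keras.layers.Dropout(DROPOUT_RATE)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax',
                                    kernel_regularizer=regularizer)(x)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)

    # Label smoothing is applied through the loss; the matching image
    # preprocessor (inception_resnet_v2.preprocess_input) was already
    # supplied to the ImageDataGenerator above.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=LABEL_SMOOTHING),
        metrics=['accuracy'])
    return model, base

model, base = create_model(num_classes=train_generator.num_classes)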

The model is now ready to be fine-tuned.

Model Fine-Tuning

Fine-tuning is done in two passes. The first pass uses a high learning rate and trains just the newly added dense and softmax layers, since they were initialized with random weights. The second pass uses a much smaller learning rate and trains the final layers along with much of the base model. The two-pass scheme, along with regularization, is designed to preserve as much as possible of the original weights from the base model while still learning the patterns in the new data set. Both passes use early stopping based on validation loss, which is another way to mitigate overfitting.

The first pass fine-tuning code is shown below. Again, note that only the new layers are being fitted.

Pass 1 of Fine-Tuning
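A sketch of the first pass, assuming the model, datasets, class weights and step counts from the earlier sketches, might look like this; the epoch count and early-stopping patience are illustrative.

# Pass 1: train only the new head; the base was frozen in create_model.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3, restore_best_weights=True)

model.fit(train_dataset,
          epochs=20,
          steps_per_epoch=steps_per_epoch,
          validation_data=validation_dataset,
          validation_steps=validation_steps,
          class_weight=class_weight,
          callbacks=[early_stop])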

The second pass of the fine-tuning is shown below. Note that some of the base model’s bottom layers are frozen and won’t be trained in this step, but all the others will be. The number of frozen layers is a hyperparameter and represents a balance between preserving the general features the base model learned from ImageNet and allowing its top layers to learn the features specific to your data set. The more layers you unfreeze, the more data you will need to prevent overfitting, despite all the mitigations employed.

Pass 2 of Fine-Tuning
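The second pass can be sketched as follows. The number of frozen layers and the learning rate are placeholder values to be tuned for your data set.

# Pass 2: unfreeze the base, then re-freeze its bottom layers so only the
# upper part of the network adapts to the new data.
FROZEN_LAYERS = 250   # hypothetical value; treat as a hyperparameter

base.trainable = True
for layer in base.layers[:FROZEN_LAYERS]:
    layer.trainable = False

# Recompile with a much smaller learning rate so the pre-trained weights
# are only nudged, not overwritten.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=LABEL_SMOOTHING),
    metrics=['accuracy'])

model.fit(train_dataset,
          epochs=20,
          steps_per_epoch=steps_per_epoch,
          validation_data=validation_dataset,
          validation_steps=validation_steps,
          class_weight=class_weight,
          callbacks=[early_stop])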

The model is now fine-tuned and will be evaluated for accuracy and saved for inference in the next steps.

Final Model Evaluation

Your model should be evaluated to determine if its accuracy meets your application’s requirements. Generating a scikit-learn classification report from its predictions is one way to do this, as shown in the code below.

Generating a Classification Report
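A minimal version of this evaluation is sketched below. It assumes the validation generator was created with shuffle=False (as in the earlier sketch) so that the predictions line up with the generator’s class labels.

import numpy as np
from sklearn.metrics import classification_report

# The validation generator was created with shuffle=False, so predictions
# line up with validation_generator.classes.
probabilities = model.predict(validation_generator)
predicted = np.argmax(probabilities, axis=-1)
class_names = list(validation_generator.class_indices.keys())

print(classification_report(validation_generator.classes, predicted,
                            target_names=class_names))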

An example classification report for one of my runs is shown below. Although I’m pleased that the model is biased toward its creator ;), more work is required to raise the accuracy on the other classes through additional observations in the data set and hyperparameter optimization.

Example Classification Report

Currently my best results are with the InceptionResNetV2 base model, which achieves an overall accuracy of about 92%. My goal is greater than 95%, which is my subjective estimate of what a normal person could achieve on images with and without faces. For comparison, the standard approach mentioned above yields an accuracy of about 90% on images with a full or mostly full view of the face, while its accuracy on the same data set, with full, partial and no faces, is less than 70%.

Saving the Model for Inference

train.py includes options for saving the model in various ways: a frozen TensorFlow graph (deprecated starting in TensorFlow 2), an eight-bit quantized TensorFlow Lite program optimized for inference on IoT devices, a compiled TensorFlow Lite program for Google’s Edge TPU, and the TensorFlow SavedModel format (the canonical method starting in TensorFlow 2). The code below shows how train.py saves to the SavedModel format.

Saving Final Model to the SavedModel Format
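A sketch of the SavedModel export, with an optional TensorFlow Lite conversion, is shown below; the output paths are illustrative.

import tensorflow as tf

# Export in the SavedModel format, the canonical serving format in TensorFlow 2.
model.save('person-classifier-savedmodel')

# Optionally convert the SavedModel to TensorFlow Lite for edge deployment.
# Optimize.DEFAULT applies basic post-training quantization; full eight-bit
# quantization additionally requires a representative data set.
converter = tf.lite.TFLiteConverter.from_saved_model('person-classifier-savedmodel')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
with open('person-classifier.tflite', 'wb') as f:
    f.write(tflite_model)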

The final model is now ready for inference via its SavedModel or one of the other representations. For additional details on saving and using a model at the edge, see Using Keras On Google’s Edge TPU.

Conclusion

You can leverage TensorFlow to quickly fine-tune a CNN classifier with data sets that are modest in size compared to the original data set used to train the base model, as shown above for the person classifier. The Python program train.py from the smart-zoneminder project can be used to build your own classifier as-is, or as a reference for developing your own program.

Fitting deep learning models to limited data can be tricky since they can quickly memorize the data set and won’t generalize well to new observations; this is known as overfitting. To mitigate it you can use the following techniques.

  • Gathering more observations.
  • Data augmentation.
  • Choosing the least complex model that meets your application’s accuracy requirements.
  • L2 Regularization.
  • Dropout.
  • Label Smoothing.
  • Preventing many base model layers from being trained (“freezing”).
  • Early stopping of the training process based on validation loss.

You will also most likely have to deal with an unequal number of samples for each class in your data set (i.e., an unbalanced set), which will bias the model’s accuracy. To compensate, you’ll need to use class weighting in the model fitting process.

The steps to build a CNN classifier are as follows.

  1. Collect and preprocess your data set.
  2. Select an appropriate CNN base model.
  3. Fine-tune the model.
  4. Evaluate your fine-tuned model’s accuracy.
  5. Prepare your final model for inference and save it.

References

[1] Statistical classification, from Wikipedia, The Free Encyclopedia.

