

From zero to Real-Time Hand Keypoints detection in five months with OpenCV, Tensorflow and Fastai
source link: https://www.tuicool.com/articles/bQRVfm3

In this article, I will show you, step by step, how to build your own real-time hand keypoints detector with OpenCV, Tensorflow and Fastai (Python 3.7). I will focus on the challenges I faced while building it during a fascinating five-month intensive journey.
You can see the models in action here:

Motivation:
It all started with an incredible obsession to understand the dynamics at the heart of Artificial Intelligence. Five months ago, I googled "AI vs Machine learning vs Deep learning" in my first attempt to grasp the nuances between the different concepts.
After reviewing multiple videos and articles, I decided to start with computer vision by developing my own hand key points detector using a mobile camera.
Knowing that the human brain runs on only about 20 watts, my aim was, and will always be, to keep things simple and reduce the computational requirements of any model wherever possible. Complicated designs demand heavy computation, which is itself highly energy-intensive.
A few words about my learning curve:
I have a civil engineering academic background with some visual basic coding skills. I have worked in the field of finance since graduation.
Unusually, I started my journey by learning JavaScript (ex1, ex2). That helped me understand the "general logic" behind code and was certainly useful when I later started learning Python and Django.
Three and a half months into intensive coding, I started the Andrew Ng machine learning course while reading hundreds and hundreds of articles. It was important to understand all the mechanics under the hood by building my own artificial neural network from scratch and coding forward propagation and back-propagation myself.
The pipeline:
My process for detecting hand keypoints with a camera follows this architecture:

⁃ The camera grabs an image;
⁃ A first deep learning model detects the hand in the image and estimates the coordinates of the box around it (done by retraining a Tensorflow Object Detection API model on hand detection; you could also achieve this by building a custom deep learning model);
⁃ A second deep learning regression model takes the image inside the box and estimates the coordinates of all 21 hand keypoints (achieved by transfer learning from a resnet34 with a customised head). A minimal sketch of the whole loop follows below.
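To make the flow concrete, here is a minimal sketch of the loop. It is illustrative only: detect_hand (returning a box or None) and predict_keypoints (returning 21 (x, y) points relative to the crop) are hypothetical wrappers around the two models described below.

import cv2

cap = cv2.VideoCapture(0)  # webcam or mobile camera stream
while True:
    ok, frame = cap.read()
    if not ok:
        break
    box = detect_hand(frame)  # stage 1: hand detection
    if box is not None:
        x1, y1, x2, y2 = box
        # stage 2: keypoint regression on the cropped hand
        for px, py in predict_keypoints(frame[y1:y2, x1:x2]):
            cv2.circle(frame, (x1 + int(px), y1 + int(py)), 3, (0, 255, 0), -1)
    cv2.imshow('hand keypoints', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()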
Hand detection:
For this part, I decided to retrain one of Tensorflow's object detection models (trained on the COCO dataset) on a hand dataset. I picked MobileNet_v2 for speed.
I won't cover this part in detail; you can find many tutorials from public sources.
In case you are using the Open Images dataset, I wrote a customized script to convert the data to the required format:
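The script itself is not reproduced here, but the general idea is straightforward: Open Images ships its boxes as normalized corner coordinates in a CSV, while the Object Detection API expects TFRecords. A rough sketch of the conversion (column names from the Open Images box CSV; a single 'hand' class and JPEG images are assumptions):

import io
import pandas as pd
import tensorflow as tf
from PIL import Image

def make_example(image_path, boxes):
    # Read the encoded JPEG and recover its dimensions.
    with tf.io.gfile.GFile(image_path, 'rb') as f:
        encoded = f.read()
    width, height = Image.open(io.BytesIO(encoded)).size
    feature = {
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded])),
        'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpeg'])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        # Open Images coordinates are already normalized to [0, 1],
        # which is what the Object Detection API expects.
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=boxes['XMin'].tolist())),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=boxes['XMax'].tolist())),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=boxes['YMin'].tolist())),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=boxes['YMax'].tolist())),
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'hand'] * len(boxes))),
        'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1] * len(boxes))),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

annotations = pd.read_csv('train-annotations-bbox.csv')  # Open Images box file
with tf.io.TFRecordWriter('hands_train.record') as writer:
    for image_id, boxes in annotations.groupby('ImageID'):
        writer.write(make_example(f'images/{image_id}.jpg', boxes).SerializeToString())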

It took me about 6 hours to retrain the model.
Keypoints detection:
I tried different approaches before settling on Fastai:
1- I first tried Keras and Tensorflow, but ran into the challenge of data augmentation at an early stage. I had no choice but to implement my own augmentation in Python using Tensorpack (a low-level API), which was quite complicated because of the number of transformations I had to perform (zooming, cropping, stretching, lighting and rotating), and because every geometric transformation applied to an image also had to be applied to its keypoint coordinates, which are stored in JSON or CSV format. That difficulty is sketched just below.
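To illustrate, here is a minimal sketch (not my original Tensorpack code) of what every geometric augmentation has to do, using a rotation as the example:

import cv2
import numpy as np

def rotate_with_keypoints(image, keypoints, angle_deg):
    # Rotate the image around its centre...
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)  # 2x3 affine matrix
    rotated = cv2.warpAffine(image, M, (w, h))
    # ...and push the (x, y) keypoints through the very same matrix.
    pts = np.hstack([keypoints, np.ones((len(keypoints), 1))])  # homogeneous coordinates
    return rotated, (M @ pts.T).T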
2- The second approach was to draw the coordinates associated with each hand on a grayscale image (see the mask below for illustration) and to use ImageDataGenerator from Keras to perform data augmentation on both the images and their corresponding masks, as sketched below. The model performed well as far as the metrics (loss and accuracy) showed, but the predictions were chaotic. I couldn't figure out what was wrong and moved to a different approach. Keras is a great API but was difficult to debug in my case.
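For reference, the usual Keras pattern for paired augmentation looks roughly like this (a sketch, assuming images and masks are numpy arrays of the same length):

from keras.preprocessing.image import ImageDataGenerator

# Identical parameters plus an identical seed guarantee that each image
# and its mask receive exactly the same random transform.
aug = dict(rotation_range=15, zoom_range=0.2,
           width_shift_range=0.1, height_shift_range=0.1)
image_gen = ImageDataGenerator(**aug).flow(images, batch_size=32, seed=42)
mask_gen = ImageDataGenerator(**aug).flow(masks, batch_size=32, seed=42)
train_gen = zip(image_gen, mask_gen)  # yields (image_batch, mask_batch) pairs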

3- The next move proved to be successful. After reading about Fastai, I decided to give it a try. The first advantage of Fastai is that you can debug all your code; the second is that coordinate augmentation is part of the library's core.
I followed the first lesson tutorial to get used to it and immediately started implementing my code in a Jupyter notebook.
The most interesting thing about Fastai and Pytorch is that the whole code boils down to the following short script (easy, right?):
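The original notebook is not reproduced here; the following is a rough reconstruction in fastai v1, based on the pieces mentioned in this article (a resnet34 body, a custom head with a Reshape module, coordinate-aware transforms). The folder layout, get_points (a lookup from filename to the 21 annotated points) and the MSE loss stand-in are assumptions:

from fastai.vision import *

# Rough reconstruction, not the exact notebook. Assumes hand crops under
# data/hands and a get_points(filename) -> tensor of 21 (y, x) pairs that
# you provide from your annotation file.
path = Path('data/hands')

data = (PointsItemList.from_folder(path)
        .split_by_rand_pct(0.2)
        .label_from_func(get_points)
        .transform(get_transforms(), tfm_y=True, size=224)  # tfm_y moves the points with the image
        .databunch(bs=16)
        .normalize(imagenet_stats))

class Reshape(nn.Module):
    # Turn the flat head output into 21 (y, x) pairs.
    def forward(self, x):
        return x.view(-1, 21, 2)

# resnet34's body yields 512x7x7 features for a 224px input.
head = nn.Sequential(Flatten(), nn.Linear(512 * 7 * 7, 42), Reshape(), nn.Tanh())
learn = cnn_learner(data, models.resnet34, custom_head=head,
                    loss_func=MSELossFlat())  # stand-in for the custom loss class

learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(36, slice(6e-3))

The Tanh keeps the 21 predicted points in [-1, 1], matching the normalized targets fastai uses for points; this is why the prediction code below adds 1 and scales by half the image size.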


After running learn.lr_find() and learn.recorder.plot() to determine the optimal learning rate, I trained for three days in total over different cycles (on a CPU!).
The last cycle, learn.fit_one_cycle(36, slice(6e-3)), ended with the following results:

To make predictions, use one of the following snippets:

from fastai.vision import open_image, ImagePoints, FlowField
import torch

img = open_image('path_to/hand_image.png')
preds = learn.predict(img)
img.show(y=preds[0])  # preds[0] is already an ImagePoints object

or:

img = open_image('path_to/hand_image.png')
preds = learn.predict(img)
preds = preds[1] + torch.ones(21, 2)  # denormalizing: [-1, 1] -> [0, 2]
preds = torch.mm(preds, torch.tensor([[img.size[0] / 2, 0],
                                      [0, img.size[1] / 2]],
                                     dtype=torch.float))  # scale to pixel coordinates
preds = ImagePoints(FlowField(img.size, preds))
img.show(y=preds)
Inference and visualization:
The model is exported for inference with learn.export(). Note that Fastai failed to export the Reshape module and the custom loss class; these must be declared in your script before invoking the model for inference.
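A minimal sketch (assuming Reshape was defined as in the training script above; do the same for your custom loss class):

import torch.nn as nn

# Re-declare, verbatim, every custom object the learner was built with;
# otherwise unpickling the exported .pkl raises an AttributeError.
class Reshape(nn.Module):
    def forward(self, x):
        return x.view(-1, 21, 2)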
To draw the keypoints, you need to add the following to your visualization code:
First:
learn = load_learner('path_to_export.pkl')  # load your inference model saved previously with learn.export()
Then:
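The original snippet is not reproduced here; the sketch below shows one way to run the exported learner on an OpenCV frame and draw the points back. draw_keypoints and its offset argument (the crop's top-left corner in the full frame) are illustrative names:

import cv2
import numpy as np
import torch
from fastai.vision import Image, pil2tensor

def draw_keypoints(frame, crop_bgr, offset):
    # Convert the OpenCV BGR crop into a fastai Image.
    rgb = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2RGB)
    img = Image(pil2tensor(rgb, np.float32).div_(255))
    # Same denormalization as above: [-1, 1] -> pixel coordinates.
    pts = learn.predict(img)[1] + torch.ones(21, 2)
    pts = torch.mm(pts, torch.tensor([[img.size[0] / 2, 0],
                                      [0, img.size[1] / 2]], dtype=torch.float))
    # fastai stores points as (y, x); offset shifts them back into the full frame.
    for y, x in pts.tolist():
        cv2.circle(frame, (offset[0] + int(x), offset[1] + int(y)), 3, (0, 255, 0), -1)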

Where do I go from here?
1- I would like to develop an equity trading model using deep learning. I developed a few quant models in the past, and they were verbose and complicated to implement. Now I am very curious to see what markets look like through the lens of DL.
2- I would also like to ship a fun end-to-end iOS app at the intersection of computer vision and augmented reality.
Thank you for your interest.
If you’ve got any questions, feel free to email me at [email protected] or connect with me on LinkedIn.