

Multi-threaded Camera Caffe Inferencing
source link: https://jkjung-avt.github.io/camera-caffe-threaded/

Jun 14, 2018
2019-05-16 update: I just added the Installing and Testing SSD Caffe on Jetson Nano post. If you are testing SSD/caffe on a Jetson Nano, or on a Jetson TX2 / AGX Xavier with JetPack-4.2, do check out the new post.
Quick link: tegra-cam-caffe-threaded.py
A while ago, I wrote the post Capture Camera Video and Do Caffe Inferencing with Python on Jetson TX2. I was subsequently asked whether I could post another example that does camera capturing and caffe inferencing in 2 different threads. I spent some time and developed a Python script accordingly. As usual, I shared the code on my Gist repository (refer to the “quick link” above).
I think the code in this new tegra-cam-caffe-threaded.py script is mostly straightforward. However, there are some design considerations worth mentioning, and thus I decided to write a post about it.
Design consideration #1: how to divide the work between the 2 threads
I have prior experience with multi-threaded Python scripts that do both camera image capturing and caffe inferencing. What I’ve found is that I had to initialize caffe and do caffe inferencing in the same thread; otherwise caffe inferencing would not behave properly (more specifically, caffe.set_mode_gpu() would not work and caffe kept running very slowly in CPU mode). So when I wrote this tegra-cam-caffe-threaded.py code, I decided to move only the camera image capturing part to a sub-thread, and let the main thread do all the rest of the work, including caffe initialization, inferencing and image rendering.
Here’s the code snippet for initiating the sub-thread to do camera image capturing and for terminating it when done.
import threading

#
# This 'grab_img' function is designed to be run in the sub-thread.
# Once started, this thread continues to grab new images and put them
# into the global IMG_HANDLE, until THREAD_RUNNING is set to False.
#
def grab_img(cap):
    global THREAD_RUNNING
    global IMG_HANDLE
    while THREAD_RUNNING:
        _, IMG_HANDLE = cap.read()

def main():
    global THREAD_RUNNING
    ......
    # Start the sub-thread, which is responsible for grabbing images
    THREAD_RUNNING = True
    th = threading.Thread(target=grab_img, args=(cap,))
    th.start()
    ......
    # Terminate the sub-thread
    THREAD_RUNNING = False
    th.join()
Design consideration #2: synchronization between the 2 threads
Our multi-threaded camera caffe code actually fits the classical producer-consumer model (diagram courtesy of howtodoinjava.com).
The camera image capturing thread acts as the producer, while the main (caffe inferencing) thread acts as the consumer. In such a producer-consumer model, we would normally place a queue between the two to decouple the production and consumption of items, i.e. captured image frames in our case. This way we would not need to worry about matching the rates of production and consumption; we could simply monitor the fullness of the queue to decide whether to drop items or throttle the producer.
In our case, I figured the producer (camera capturing, at 30 fps) was likely running faster than the consumer (caffe inferencing, whose rate would depend on how complicated the caffe model was). So I didn’t really need to implement a queue in between. Instead, I only needed to keep track of the latest image frame produced by the camera capturing thread. By taking advantage of Python’s garbage collector, I didn’t even need a mutex to protect the reference to the kept (latest) frame.
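For the more general case where the producer and consumer rates are unknown, that intermediate queue could look like the following minimal sketch. This is a hypothetical, stand-alone example (none of these names come from the original script) using Python's queue module, with integers standing in for camera frames; when the bounded queue is full, the oldest frame is dropped to make room for the newest one:

```python
import queue
import threading

# Hypothetical bounded queue between producer and consumer. A real
# script would put camera images (numpy arrays) here instead of ints.
FRAME_QUEUE = queue.Queue(maxsize=4)

def produce_frames(n_frames):
    """Producer: push frames, dropping the oldest when the queue is full."""
    for frame in range(n_frames):
        if FRAME_QUEUE.full():
            try:
                FRAME_QUEUE.get_nowait()   # drop the oldest buffered frame
            except queue.Empty:
                pass
        FRAME_QUEUE.put(frame)

producer = threading.Thread(target=produce_frames, args=(100,))
producer.start()
producer.join()

# A slow consumer now sees only the most recent few frames.
remaining = []
while not FRAME_QUEUE.empty():
    remaining.append(FRAME_QUEUE.get())
print(remaining)   # only the last 4 frames survive: [96, 97, 98, 99]
```

With maxsize=1 this degenerates into exactly the "keep only the latest frame" policy discussed next, which is why the single global reference suffices in this script.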
I used a single global variable, IMG_HANDLE, to reference the image frame. This IMG_HANDLE gets updated every time the producer (camera capturing thread) gets a new frame from the camera. On the other hand, whenever the consumer (caffe inferencing thread) is ready to process the next image frame, it just dereferences IMG_HANDLE and thus always gets the latest image frame. I think this is what happens when we run the code.

So, what happens to frames #2, #4 and #5 when they get discarded, you might ask?
I think they get garbage-collected by Python, since there is no reference to them anymore in the program. In fact, frames #1, #3 and #6 also get garbage-collected once the caffe inferencing thread finishes processing them (and no longer keeps any reference to them).
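The latest-frame-only behavior can be demonstrated with a minimal, self-contained sketch that mimics the script's IMG_HANDLE/THREAD_RUNNING pattern, using a counter as a stand-in for camera frames (no camera required). Rebinding a single global reference is effectively atomic under CPython's GIL, which is why no mutex is needed; the frames the consumer skips simply lose their last reference and become garbage:

```python
import threading
import time

THREAD_RUNNING = False
IMG_HANDLE = 0

def grab_img():
    # Simulated producer: in the real script this loop would call
    # cap.read() and store the returned frame in IMG_HANDLE.
    global IMG_HANDLE
    frame_no = 0
    while THREAD_RUNNING:
        frame_no += 1
        IMG_HANDLE = frame_no      # rebind the single global reference
        time.sleep(0.001)          # producer runs fast (~1000 fps here)

THREAD_RUNNING = True
th = threading.Thread(target=grab_img)
th.start()

seen = []
for _ in range(5):                 # slow consumer (the inferencing loop)
    time.sleep(0.01)
    seen.append(IMG_HANDLE)        # always dereference the latest frame

THREAD_RUNNING = False
th.join()
print(seen)   # non-decreasing frame numbers; intermediate frames skipped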
How to run the code
Please refer to my previous post Capture Camera Video and Do Caffe Inferencing with Python on Jetson TX2. Make sure all prerequisites have been completed on the target JTX2 platform. Then run the code exactly the same way as the old tegra-cam-caffe.py script.
$ python3 tegra-cam-caffe-threaded.py --usb --vid 1
Discussion and conclusion
Let’s consider one question: does this multi-threaded design help to improve the throughput of our caffe inferencing script? That is, would we be able to inference more frames per second (fps) with this design (maybe because the main thread does not need to block while waiting for the next camera frame to arrive)?
My answer is “most likely not”. In the original single-threaded design, assuming the camera image capturing part (the producer) generates image frames faster than the caffe inferencing part (the consumer) processes them, the cap.read() calls would always return immediately (without blocking), since there is always an image frame ready for processing. More specifically, an old image frame gets queued in either the V4L2 driver buffers or the gstreamer/opencv stack and gets returned by cap.read() immediately. And that old image frame is likely not the latest frame grabbed by the camera…
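That staleness can be mimicked with a plain FIFO queue standing in for the driver buffers. This is a deliberately simplified, hypothetical sketch (the real buffering happens inside V4L2/gstreamer, not in Python): because buffered reads are first-in-first-out, a lagging consumer keeps receiving the oldest queued frame rather than the newest one:

```python
import queue

# Hypothetical stand-in for the V4L2/gstreamer frame buffers: frames
# queue up faster than the consumer drains them.
driver_buffer = queue.Queue()
for frame in range(10):        # camera has already grabbed frames 0..9
    driver_buffer.put(frame)

stale = driver_buffer.get()    # what a buffered cap.read() would return
print(stale)                   # 0 -- far behind the latest frame (9)
```

So even though such reads never block, the consumer processes frames that lag behind the camera, which is the throughput-vs-latency distinction drawn below.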
So, what’s the real benefit of this multi-threaded design?
In our tegra-cam-caffe-threaded.py, we only keep the latest image frame in the global variable IMG_HANDLE, so the caffe inferencing (main) thread always gets the latest grabbed image frame to process. In conclusion, I’d say this multi-threaded design helps to reduce the latency of the caffe inferencing program.