

source link: https://jkjung-avt.github.io/caffe-time/

Measuring Caffe Model Inference Speed on Jetson TX2
Feb 27, 2018
When deploying Caffe models onto an embedded platform such as Jetson TX2, inference speed is an essential factor to consider. I think the best way to verify whether a Caffe model runs fast enough is to measure it on the target platform.
Here’s how I measure Caffe model inference time on Jetson TX2.
Prerequisite:
- Build and install Caffe on the target Jetson TX2. Reference: How to Install Caffe and PyCaffe on Jetson TX2
- Prepare deploy.prototxt for the Caffe models to be measured
In the following examples I used my own fork of ssd-caffe.
Reference:
- Check out the official Caffe ‘Interfaces’ documentation for a description of the caffe time command.
Step-by-step:
Assuming a version of Caffe has been built at ~/project/ssd-caffe, we would use the built caffe executable to measure inference time of the models.
Important: During measurement, caffe uses whatever input batch size is specified in the deploy.prototxt. If you compare the inference speed of two Caffe models whose deploy.prototxt files specify different batch sizes, you would not be making a fair comparison.
For practical purposes I care most about inference time at batch size 1 (inferencing a single image at a time). So when measuring, I set the input batch size to 1 for all models being compared.
Take AlexNet for example. First make a copy of its deploy.prototxt.
$ cp ~/project/ssd-caffe/models/bvlc_alexnet/deploy.prototxt /tmp/alexnet_deploy.prototxt
### Set TX2 to max performance mode before measuring
$ sudo nvpmodel -m 0
$ sudo ~/jetson_clocks.sh
### Modify input batch size as described below
$ vim /tmp/alexnet_deploy.prototxt
Then modify line #6 of the prototxt to specify a batch size of 1.
- input_param { shape: { dim: 10 dim: 3 dim: 227 dim: 227 } }
+ input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }
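(In the input shape, the four dims are batch size, channels, height and width, so only the first dim needs to change.)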
Run the caffe time command.
$ cd ~/project/ssd-caffe
$ ./build/tools/caffe time -gpu 0 -model /tmp/alexnet_deploy.prototxt
I0228 11:53:37.071836 7979 caffe.cpp:343] Use GPU with device ID 0
I0228 11:53:37.616500 7979 net.cpp:58] Initializing net from parameters:
......
I0228 11:53:41.861127 7979 caffe.cpp:412] Average Forward pass: 12.9396 ms.
I0228 11:53:41.861150 7979 caffe.cpp:414] Average Backward pass: 35.2972 ms.
I0228 11:53:41.861168 7979 caffe.cpp:416] Average Forward-Backward: 48.4081 ms.
I0228 11:53:41.861196 7979 caffe.cpp:418] Total Time: 2420.4 ms.
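Note that caffe time runs 50 forward-backward iterations by default (adjustable with the -iterations flag), so the averages above are over 50 runs and Total Time is roughly 50 × 48.4 ms ≈ 2420 ms.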
So the measured inference time (forward pass only) of bvlc_alexnet on JTX2 is about 12.9396 ms.
Next, repeat the measurement for bvlc_googlenet (with input batch size also set to 1). The result is 24.6415 ms.
$ ./build/tools/caffe time -gpu 0 -model /tmp/googlenet_deploy.prototxt
I0228 12:00:19.444232 8129 caffe.cpp:343] Use GPU with device ID 0
I0228 12:00:19.983999 8129 net.cpp:58] Initializing net from parameters:
......
I0228 12:00:25.924129 8129 caffe.cpp:412] Average Forward pass: 24.6415 ms.
I0228 12:00:25.924151 8129 caffe.cpp:414] Average Backward pass: 41.9625 ms.
I0228 12:00:25.924170 8129 caffe.cpp:416] Average Forward-Backward: 66.9036 ms.
I0228 12:00:25.924201 8129 caffe.cpp:418] Total Time: 3345.18 ms.
I also downloaded VGG16 and ResNet-50 from links on the Caffe Model Zoo and did the same measurements. Here are all the results.
Model            Inference Time
bvlc_alexnet     12.9396 ms
bvlc_googlenet   24.6415 ms
VGG16            91.82 ms
ResNet-50        64.0829 ms
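At batch size 1, these forward-pass times translate to roughly 77 images per second for bvlc_alexnet (1000 / 12.94), 41 for bvlc_googlenet, 16 for ResNet-50 and 11 for VGG16.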
A big take-away for me from these measurements is that bvlc_googlenet, which has classification accuracy similar to VGG16, actually runs much faster than VGG16 on JTX2. So it could be a better (speedier) CNN feature extractor for object detection models such as Faster R-CNN, YOLO, and SSD.
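Incidentally, the same measurement can also be done from Python, which is handy if you want to time inference inside your own PyCaffe application. Below is a minimal sketch, assuming PyCaffe is installed (see the prerequisite above) and the deploy.prototxt has already been edited to batch size 1; the file path and iteration counts are only examples, and the numbers may differ slightly from what caffe time reports.

import time
import numpy as np
import caffe

caffe.set_device(0)   # same GPU as 'caffe time -gpu 0'
caffe.set_mode_gpu()

# Example path -- point this at a deploy.prototxt with batch size 1.
# No pretrained weights are needed for a pure speed measurement, so the
# net is created with randomly initialized parameters.
net = caffe.Net('/tmp/alexnet_deploy.prototxt', caffe.TEST)

# Fill the input blob with random data of the correct shape.
in_blob = net.inputs[0]
net.blobs[in_blob].data[...] = np.random.rand(*net.blobs[in_blob].data.shape)

# Warm up, then average the forward-pass time over a number of iterations.
for _ in range(10):
    net.forward()
iterations = 50
t0 = time.time()
for _ in range(iterations):
    net.forward()
print('Average forward pass: %.2f ms' %
      ((time.time() - t0) * 1000.0 / iterations))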