Testing nebullvm, the open-source AI accelerator, on TensorFlow, PyTorch and Hugging Face


You may have read on some blogs about nebullvm, the open-source library that optimizes AI models to make them faster at inference.

The real question is: does nebullvm actually achieve what it claims? Does it really deliver ~10x inference acceleration for deep learning models just by adding a few lines to your code?

Let’s test its full potential.

Introduction to the open-source library

Nebullvm takes an AI model as input and accelerates it by leveraging a technology called deep learning compilers. The library outputs an optimized version of the model that performs inference 5–20 times faster, where acceleration depends strongly on the input model and the hardware on which the optimization is performed.

As stated on GitHub, the goal of nebullvm is to let any developer benefit from deep learning (DL) compilers without having to spend tons of hours understanding, installing, testing and debugging this powerful technology.

Testing nebullvm on your models

We built 3 notebooks where the library can be tested on the most popular AI frameworks: TensorFlow, PyTorch and Hugging Face.

The notebooks will run locally on your hardware so you can get an idea of the performance you would achieve with nebullvm on your AI models.

Note that the first installation may take a few minutes, as the library also installs the deep learning compilers responsible for the optimization.
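To give an idea of what "a few lines of code" means in practice, here is a minimal sketch based on the PyTorch entry point shown in the project's early README (optimize_torch_model). The exact function name and arguments may differ between nebullvm versions, so treat this as illustrative and check the documentation on GitHub for the release you install.

```python
# First install the library (the first install may take a few minutes,
# since the deep learning compilers are installed as well):
#   pip install nebullvm

import torch
import torchvision.models as models

# Entry point as shown in the early README; name/signature may vary by version.
from nebullvm import optimize_torch_model

# Any PyTorch model works; ResNet-50 is used here as an example.
model = models.resnet50(pretrained=True)

# nebullvm tries the installed deep learning compilers on this machine and
# returns the fastest version of the model. batch_size and input_sizes
# describe the inference input; save_dir is where the optimized artifacts
# are stored (argument names assumed from the documentation of the time).
optimized_model = optimize_torch_model(
    model,
    batch_size=1,
    input_sizes=[(3, 224, 224)],
    save_dir=".",
)

# The optimized model is then used like a regular PyTorch module at inference time.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    y = optimized_model(x)
```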

Benchmarks

We have tested nebullvm on popular AI models and hardware from leading vendors.

  • Hardware: M1 Pro, NVIDIA T4, Intel Xeon, AMD EPYC
  • AI Models: EfficientNet, ResNet, SqueezeNet, BERT, GPT-2

The table below shows the response time in milliseconds (ms) of the non-optimized model and the optimized model for the various model-hardware pairings, averaged over 100 experiments. It also displays the speedup provided by nebullvm, where speedup is defined as the response time of the non-optimized model divided by the response time of the optimized model.
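The snippet below is not the authors' benchmarking code; it is only a rough sketch of how the average response time over 100 runs and the resulting speedup can be measured locally, reusing the model and optimized_model names from the sketch above.

```python
import time
import torch

def avg_latency_ms(model, x, n_runs=100):
    """Average forward-pass latency in milliseconds over n_runs."""
    with torch.no_grad():
        # A few warm-up runs so one-off initialization does not skew the numbers.
        for _ in range(10):
            model(x)
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        end = time.perf_counter()
    return (end - start) / n_runs * 1000

# model and optimized_model are assumed to be defined as in the previous sketch.
x = torch.randn(1, 3, 224, 224)
baseline_ms = avg_latency_ms(model, x)
optimized_ms = avg_latency_ms(optimized_model, x)

# Speedup: non-optimized response time divided by optimized response time.
speedup = baseline_ms / optimized_ms
print(f"{baseline_ms:.2f} ms -> {optimized_ms:.2f} ms ({speedup:.1f}x speedup)")
```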

Benchmarking of nebullvm on EfficientNet, ResNet, SqueezeNet, BERT, GPT-2. Source: Image by the author

At first glance, we can observe that the speedup varies greatly across hardware-model pairings. Overall, the library delivers strongly positive results, with most speedups ranging from 2x to over 30x.

To summarize, the results are:

  • Nebullvm provides positive acceleration to non-optimized AI models
  • These early results show poorer (yet positive) performance on Hugging Face models. Support for Hugging Face has just been released and improvements will be included in future versions
  • The library provides a ~2–3x boost on Intel and AMD hardware. This more modest gain is most likely due to the already highly optimized implementation of PyTorch for x86 devices
  • Nebullvm delivers extremely good performance on NVIDIA machines
  • The library also provides great performance on Apple M1 chips

Across all scenarios, nebullvm proves very useful for its ease of use, allowing you to take advantage of deep learning compilers without having to spend hours studying, testing and debugging this technology.

Remarks on pre-optimized models

Nebullvm is benchmarked against models that have not already been optimized with another accelerator. If you are already using a deep learning compiler on your models, such as Apache TVM, TensorRT or OpenVINO, it is unlikely that you will get a 5–20x speedup with nebullvm over your pre-optimized model.

Even in this case, however, nebullvm can still be of great help for its ease of use.

More about nebullvm

Full documentation on nebullvm is provided on GitHub. The main contributor to the library is Diego Fiori, with support from the GitHub community.

The library quickly grew to 800+ GitHub stars in its first month after launch, and it aims to keep expanding in performance and coverage. As mentioned on GitHub, the library aims to be:

  • Deep learning model agnostic. Nebullvm supports all the most popular architectures such as transformers, LSTMs, CNNs and FCNs.
  • Hardware agnostic. The library now works on most CPUs and GPUs and will soon support TPUs and other deep learning-specific ASICs.
  • Framework agnostic. Nebullvm supports the most widely used frameworks (PyTorch, TensorFlow and Hugging Face) and will soon support many more.
  • Secure. Everything runs locally on your machine.
  • Easy-to-use. It takes a few lines of code to install the library and optimize your models.
  • Leveraging the best deep learning compilers. There are many DL compilers that optimize the way your AI models run on your hardware, and it would take a developer countless hours to install and test them at every model deployment. The library does it for you!
