DeciNets AI models arrive with Intel CPU optimization

Feb 16, 2022 — by Eric Brown

Deci unveiled AutoNAC-generated “DeciNets” models for Intel Cascade Lake CPUs, claimed to be much faster and more accurate than other image classification models for CPUs. Meanwhile, Aaeon announced that the Hailo-8 NPU is available on its UP boards.

Last July, Deci announced its DeciNets family of pre-trained image classification models, which are generated from the Israel-based company’s proprietary Automated Neural Architecture Construction (AutoNAC) technology. Today, Deci said that the pre-trained DeciNets are now available for Intel Cascade Lake processors, such as 2nd Gen Xeon Scalable CPUs. The DeciNets, running on Intel’s Cascade Lake, “deliver more than 2x improvement in runtime, coupled with improved accuracy, as compared to the most powerful models publicly available such as EfficientNets, developed by Google,” claims Deci.

In other AI-related news, Aaeon announced it was making Hailo’s up to 26-TOPS Hailo-8 M.2 accelerator card available on the UP Squared Pro, UP Squared 6000, and UP Xtreme i11 SBCs (see farther below).

DeciNets neural net accuracy/latency tradeoffs on a Jetson Xavier NX GPU vs. EfficientNet, ResNet, MobileNet, etc.
Source: Deci

DeciNets can run on all major processor types, including GPUs and FPGAs. The technology has already been optimized for Nvidia Jetson Xavier NX GPUs for edge AI (see diagram above), as well as Nvidia T4 GPUs for the cloud.

With the Intel collaboration and the Cascade Lake support, Deci is focusing on the relatively untapped market for image classification on CPUs. CPUs tend to be cheaper and more prevalent than GPUs, and they handle a variety of workloads, whereas GPUs in AI deployments are often dedicated solely to inference. Yet GPUs continue to rule in AI because deep learning models typically run 3x to 10x slower on a CPU than on a GPU, says Deci.
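
That gap is easy to measure on one's own hardware. As a rough illustration, here is a minimal timing sketch using PyTorch and torchvision, which are this example's assumptions rather than tools named in the article:

```python
# Minimal CPU-vs-GPU latency comparison (illustrative sketch only;
# PyTorch/torchvision are this example's assumption, not from the article).
import time
import torch
import torchvision.models as models

def measure_latency(model, device, runs=50, warmup=10):
    model = model.to(device).eval()
    x = torch.randn(1, 3, 224, 224, device=device)
    with torch.no_grad():
        for _ in range(warmup):          # warm up caches / CUDA kernels
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()     # wait for async GPU work to finish
    return (time.perf_counter() - start) / runs * 1000  # ms per inference

model = models.resnet50(weights=None)    # random weights are fine for timing
cpu_ms = measure_latency(model, torch.device("cpu"))
print(f"CPU: {cpu_ms:.1f} ms/image")
if torch.cuda.is_available():
    gpu_ms = measure_latency(model, torch.device("cuda"))
    print(f"GPU: {gpu_ms:.1f} ms/image ({cpu_ms / gpu_ms:.1f}x gap)")
```

On typical hardware the printed ratio should land somewhere near the range Deci cites, though results vary widely with batch size and CPU thread settings.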

DeciNets “closes the gap significantly between GPU and CPU performance for CNNs,” enabling tasks to be carried out on a CPU that were previously too resource intensive, says Deci. With DeciNets, “the gap between a model’s inference performance on a GPU versus a CPU is cut in half, without sacrificing the model’s accuracy.”

Deep learning can be accelerated on many levels, starting with instructions within the processors, such as the Deep Learning Boost instructions built into Cascade Lake, Tiger Lake, and Alder Lake CPUs. Other solutions include hardware accelerator add-ons such as the Hailo-8, Intel’s Myriad X, or Google’s Edge TPU NPUs. Runtimes and compilers like Nvidia’s TensorRT or Intel’s OpenVINO, which are built into Deci’s AutoNAC, also play a role, but one of the more important factors is the convolutional neural network (CNN) deep learning model itself.
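
At the runtime layer, CPU inference through OpenVINO follows a simple read/compile/infer pattern. The sketch below uses OpenVINO's 2022-era Python API with a hypothetical IR file name; it illustrates the general call pattern, not Deci's integration:

```python
# Sketch of CPU inference via OpenVINO's Python API (OpenVINO 2022+).
# "model.xml" is a hypothetical IR file name, not from the article.
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")          # OpenVINO IR (xml + bin pair)
compiled = core.compile_model(model, "CPU")   # target the CPU plugin

input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([input_tensor])             # run a single inference
output = result[compiled.output(0)]           # fetch the first output tensor
print(output.shape)
```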

AutoNAC

A deep learning model’s performance is linked to the neural architecture that was used to develop it. Deci’s AutoNAC, which competes with similar technologies such as Google’s Neural Architecture Search (NAS) technology, is an algorithmic acceleration technology that is hardware-aware and works on top of other optimization techniques.
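
AutoNAC's internals are proprietary, but the basic shape of a hardware-aware search can be illustrated with a toy loop: generate candidate architectures, measure each one's latency on the actual target device, and keep the fastest candidate that preserves baseline accuracy. The sketch below is a deliberately simplified illustration of that concept, not Deci's algorithm:

```python
# Toy hardware-aware architecture search (illustrative concept only;
# this is NOT AutoNAC, whose internals are proprietary).
import time
import torch
import torch.nn as nn

def make_candidate(width):
    """Tiny CNN whose compute cost scales with channel width."""
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(width, 10),
    )

def latency_ms(model, runs=20):
    """Measure average inference latency on the target device (here, CPU)."""
    x = torch.randn(1, 3, 32, 32)
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1000

def accuracy(model):
    """Stand-in: a real search would train/evaluate each candidate."""
    return 0.90  # hypothetical placeholder value

baseline_acc = 0.90
best = None
for width in (16, 32, 64, 128):              # the "search space"
    cand = make_candidate(width)
    lat, acc = latency_ms(cand), accuracy(cand)
    if acc >= baseline_acc and (best is None or lat < best[1]):
        best = (width, lat)                  # keep fastest accurate candidate
print(f"selected width={best[0]} at {best[1]:.2f} ms on target hardware")
```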

AutoNAC contains a NAS component that revises a given trained model to “optimally speed up its runtime by as much as 15x, while preserving the model’s baseline accuracy,” says Deci. Deci has previously collaborated with Intel on using AutoNAC to accelerate the inference speed of ResNet50 neural networks running on Intel CPUs, “reducing the submitted models’ latency by a factor of up to 11.8x and increasing throughput by up to 11x,” says the company.

The AutoNAC optimization process allows each of the DeciNets to be optimized for a specific application’s target inference hardware. This process requires rather extensive pre-training time, with customer input, using Deci’s Infery and RTiC deployment tools. Yet, as noted in a ZDNet story on DeciNets, which includes an interview with Yonatan Geifman, co-founder and CEO of Deci, the extra training results in reduced operation costs and latency and improved performance.
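
The article does not document Infery's API. As a loudly hedged sketch, Deci's published examples at the time followed a load-then-predict pattern roughly like the one below; treat every name and keyword argument here as an assumption rather than a verified interface:

```python
# Hedged sketch of Infery-style deployment (names and kwargs are assumptions
# patterned on Deci's marketing examples, not a verified API).
import numpy as np
import infery  # Deci's inference runtime

# Load an optimized DeciNet exported to ONNX for the target CPU.
# "decinet.onnx" is a hypothetical file name.
model = infery.load(
    model_path="decinet.onnx",
    framework_type="onnx",
    inference_hardware="cpu",
)

inputs = np.random.rand(1, 3, 224, 224).astype(np.float32)
predictions = model.predict(inputs)  # single synchronous inference
print(type(predictions))
```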

Multiple DeciNets can be modified to offer different tradeoffs in latency vs. cost of operation. Although the initial focus for the DeciNets Cascade Lake project is on the datacenter, the technology’s hardware-specific approach should also help get the most out of constrained edge AI hardware, says the story.

“As deep learning practitioners, our goal is not only to find the most accurate models, but to uncover the most resource-efficient models which work seamlessly in production — this combination of effectiveness and accuracy constitutes the ‘holy grail’ of deep learning,” stated Deci CEO Geifman. “AutoNAC creates the best computer vision models to date, and now, the new class of DeciNets can be applied and effectively run AI applications on CPUs.”

Hailo-8 now available on UP boards

Aaeon offers its own mini-PCIe and M.2 modules equipped with Intel’s Movidius Myriad X deep learning accelerators for its community-backed, Intel-based “UP Bridge the Gap” boards. These include the UP AI Core XM 2280 module with 2x Myriad X VPUs, which is available as an option on the Apollo Lake based UP Squared Pro, the Elkhart Lake powered UP Squared 6000, and the 11th Gen Tiger Lake-U driven UP Xtreme i11. Now these same three SBCs are the first UP boards to offer an option for Hailo’s up to 26-TOPS Hailo-8 M.2 AI Acceleration Module.

Hailo claims the 3-TOPS-per-watt Hailo-8 NPU supplied by the $199 M.2 module vastly outperforms Google’s Edge TPU and Intel’s Movidius Myriad X on a TOPS per watt basis when running AI semantic segmentation and object detection applications, including ResNet50. Hailo-8 on UP deployments require Ubuntu.
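
Those two headline figures imply a power envelope that is simple to back out:

```python
# Implied peak power from the claimed figures (simple arithmetic check).
peak_tops = 26.0     # Hailo-8 peak throughput, in TOPS
tops_per_watt = 3.0  # claimed efficiency
print(f"implied peak power ~ {peak_tops / tops_per_watt:.1f} W")  # ~8.7 W
```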

The Hailo-8 is available on a growing list of Linux-driven embedded systems, most recently on Axiomtek’s Arm-based RSC100 edge AI system. Most of the other systems also use Arm processors, although Kontron is offering Hailo-8 to Linux customers of its 8th Gen Whiskey Lake based KBox A-150-WKL system.

Further information

DeciNets models are available now for Cascade Lake with Community, Professional, and Enterprise platform pricing schemes. More information may be found on Deci’s DeciNets page.

More on the Hailo-8 on UP board option may be found on the Hailo-8 on UP GitHub page. The option is available on the supported UP shopping pages as a $199 accessory.

