There was a time when Graphics Processing Units (GPUs) were confined to high-end video adapters powering PC gaming and graphics rendering. The average computer user, and even the average programmer, didn't care much about GPUs. Thanks to the rise of machine learning (ML) and artificial intelligence (AI), GPUs are now in high demand. From public cloud vendors to academic research labs, GPUs have become an essential part of computing.
NVIDIA, the largest maker of GPUs, enjoys a near-monopoly in the market. Its GPUs ship with software drivers and computing toolkits that can run computing jobs in parallel. CUDA, NVIDIA's parallel computing platform, and cuDNN, its deep learning library, are popular among AI researchers and enterprise data scientists.
Machine learning algorithms are, at their core, mathematical and statistical equations adapted to solve business problems. Training a machine learning model involves evaluating these equations in parallel, repeatedly substituting different values for their variables. When a combination of values is found that makes the model's predictions sufficiently accurate, the model is said to be fully trained.
GPUs come with hundreds of cores that can run these complex mathematical computations in parallel. CPUs are fast too, but they are general-purpose processors that are not designed for massive parallelization. GPUs complement CPUs by taking over the mathematical heavy lifting. Many workloads, including machine learning, High-Performance Computing (HPC), graphics rendering, and game development, rely on GPUs. The increasing popularity of ML and AI has made GPUs almost as pervasive as CPUs.
Machine learning models have two phases — training and inference.
In supervised machine learning, training involves feeding many combinations of variable values into these equations until the predictions come close to the actual values. This technique applies to everything from simple linear regression to complex deep learning models for computer vision and natural language processing.
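The training loop described above can be sketched with a toy example. The snippet below fits a one-variable linear model with gradient descent; the data, learning rate, and epoch count are illustrative and not tied to any product in this article.

```python
# Minimal sketch of supervised training: repeatedly adjust the model's
# variables (w, b) until predictions come close to the actual values.

def train(xs, ys, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Predictions with the current variable values
        preds = [w * x + b for x in xs]
        # Gradients of mean squared error with respect to w and b
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Nudge the variables toward values that reduce the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 3x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 4, 7, 10, 13]
w, b = train(xs, ys)
print(round(w, 2), round(b, 2))  # close to 3.0 and 1.0
```

Each pass over the data nudges the variables a little closer to the combination that minimizes the prediction error, which is exactly the search process that GPUs accelerate at scale.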
Once a model is fully trained, it is ready to predict or classify unseen data points. Inference is the process of using a fully trained model to make predictions. Though inference is less computationally intense than training, it still involves evaluating complex mathematical equations to produce the expected outcome. In computer vision, for example, an input image is instantly converted into a massive array of pixel values that is fed to the model, kicking off a computation involving millions of variables.
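As a sketch of that preprocessing step, the snippet below converts a fabricated camera frame into the normalized tensor a vision model would consume. The 224x224 input size and the batch/height/width/channels layout are common conventions, not specifics from this article; a real frame would come from a camera capture API.

```python
import numpy as np

# Fabricate a 224x224 RGB frame of random bytes standing in for a
# captured image (224x224 is a typical input size for vision models).
frame = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Scale pixel values from [0, 255] to [0, 1] and add a batch dimension,
# producing the (1, height, width, channels) layout many frameworks expect.
tensor = frame.astype(np.float32) / 255.0
tensor = tensor[np.newaxis, ...]

print(tensor.shape)   # (1, 224, 224, 3)
print(tensor.size)    # 150528 values fed into the model
```

Every one of those ~150,000 values flows through the model's layers during inference, which is why even prediction benefits from a hardware accelerator.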
Most trained machine learning models run offline at the edge computing layer, and these edge devices need accelerators to speed up inference. If inference runs only on a CPU, the response may be too slow. In scenarios such as face detection and object detection, users expect the prediction or classification to happen in milliseconds. The classic example is Face ID authentication on the iPhone: when the camera captures a matching face, the phone unlocks immediately, thanks to a GPU in the iPhone that accelerates the model's computation. This mechanism of running a fully trained machine learning model on end-user devices is called inference at the edge.
Edge computing is going to drive the adoption of specialized AI chips and accelerators.
Realizing the need for a complementary processor for the CPU at the edge, chip manufacturers such as Intel and Qualcomm are investing in specialized processors that accelerate ML inferencing. These chips are highly customized for specific use cases including computer vision and natural language processing.
We will look at three specific implementations of specialized chips in computer vision.
Intel Movidius Neural Compute Stick
In 2016, Intel acquired Movidius, a niche chipmaker whose computer vision processors were used in drones and virtual reality devices. The flagship product of Movidius was Myriad, a chip purpose-built for processing images and video streams. It is positioned as a Vision Processing Unit (VPU) for its ability to handle computer vision workloads.
After acquiring Movidius, Intel packaged the Myriad 2 in a USB thumb-drive form factor, sold as the Neural Compute Stick (NCS). The best thing about the NCS is that it works with both x86 and ARM devices: it can be plugged into an Intel NUC or a Raspberry Pi to run inference, drawing power from the host device without the need for an external power supply.
Machine learning models built with Caffe or TensorFlow can be ported to the NCS. Intel ships an SDK and tools that help you profile, tune, and deploy existing models on Movidius hardware. The SDK includes many samples based on popular neural network architectures such as AlexNet, MobileNet, and Inception, along with ImageNet-trained classifiers.
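As a rough sketch of the porting workflow, the NCSDK provides command-line tools for compiling and profiling a network for the stick. The file names below are placeholders, and the exact flags may vary across NCSDK versions.

```shell
# Compile a Caffe model definition and weights into a graph file
# that the Neural Compute Stick can execute (-s sets the number of
# SHAVE vector cores to use on the Myriad VPU):
mvNCCompile deploy.prototxt -w model.caffemodel -s 12 -o graph

# Profile the same network to see per-layer timing on the device:
mvNCProfile deploy.prototxt -s 12
```

The resulting graph file is what an application loads onto the stick at runtime to run inference.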
When connected to a Raspberry Pi equipped with a camera module, the Intel Movidius NCS can perform object detection in just a few milliseconds.
The Intel Movidius NCS is available on Amazon for $80.
Horned Sungem AI
Horned Sungem is a Chinese company specializing in AI hardware. Its AI development board is built for developers, students, hobbyists, and enthusiasts to create their own AI applications with ease.
The device has a USB-C connector that can be plugged into a Raspberry Pi or any other computing device. It has native support for the Raspberry Pi Camera connected through the CSI interface.
According to the manufacturer, the chip's unique, integrated front-end design achieves high performance at low power consumption (under 3 W). Horned Sungem ships its own Python-based development toolkit, which supports MacOS, Linux, and Android, with Windows support to be released soon.
HS comes with fully trained ML models out of the box. Developers can easily get started with the toolkit, which has no dependency on deep learning frameworks or complex libraries: after running a short installation script, the device is ready to identify a variety of objects.
Interestingly, HS is based on Intel Movidius MA245X VPU chips that power many devices in production.
Horned Sungem AI is available for US$129.
Google AIY Vision Kit
Though it may look like a hobby kit based on the Google Cardboard project, the AIY Vision Kit packs a punch. It comes with everything needed to build a full-blown computer vision application on a tiny device: the Raspberry Pi Zero.
Google has partnered with Intel to build a custom board called the Vision Bonnet. It’s not surprising to see that the board is powered by an Intel Movidius VPU.
The AIY Vision Kit comes with a Raspberry Pi Camera module that is directly connected to the Vision Bonnet. This avoids the latency involved in forwarding the image frame to the VPU for processing.
Google has completely rewritten the Movidius SDK, making it much simpler to run TensorFlow models for inference. The Python SDK also has APIs for programming the buttons and LEDs that come with the kit.
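A hedged sketch of what that Python SDK looks like in use, based on the kit's bundled face-detection demo; the module and function names reflect the AIY software image, and the code requires the kit's hardware (Vision Bonnet plus camera) to actually run.

```python
# Sketch: run the kit's bundled face-detection model on the Vision
# Bonnet and report how many faces appear in each camera frame.
from picamera import PiCamera
from aiy.vision.inference import CameraInference
from aiy.vision.models import face_detection

with PiCamera() as camera:
    # CameraInference streams frames from the Pi Camera straight
    # through the Movidius VPU on the Vision Bonnet.
    with CameraInference(face_detection.model()) as inference:
        for result in inference.run():
            faces = face_detection.get_faces(result)
            print('Faces detected:', len(faces))
```

Because the camera feeds the Vision Bonnet directly, the loop above runs entirely on-device with no round trip to a server.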
The kit ships with half-a-dozen models that are ready for inference. Developers can easily consume these models to develop custom applications.
The Google AIY Vision Kit is available from Target for US$89.99.
Even though AI and edge computing are in their infancy, they are expected to pick up momentum quickly. GPUs in the public cloud and specialized AI chips at the edge will drive the next wave of computing.