
Tutorial: Accelerate AI at Edge with ONNX Runtime and Intel Neural Compute Stick 2

31 Jul 2020 10:00am

This post is the fifth and last in a series of introductory tutorials on the Open Neural Network Exchange (ONNX), an initiative from AWS, Microsoft, and Facebook to define a standard for interoperability across machine learning platforms. See: Part 1, Part 2, Part 3, and Part 4.

In the previous parts of this series, we explored the ONNX model format and runtime. In this final tutorial, I will walk you through the steps of accelerating an ONNX model on an edge device powered by the Intel Movidius Neural Compute Stick (NCS) 2 and Intel’s Distribution of OpenVINO Toolkit. We will run the Tiny YOLO v2 model first on a desktop CPU and then on an edge device with almost no change to the code.

Quick Recap — ONNX Runtime

Apart from bringing interoperability across deep learning frameworks, ONNX promises optimized execution of the neural network graph based on the available hardware. The ONNX Runtime abstracts various hardware architectures such as AMD64 CPU, ARM64 CPU, GPU, FPGA, and VPU.

For example, the same ONNX model can deliver better inference performance when it is run against a GPU backend, without any changes to the model itself. This is possible because of the plugin model of ONNX Runtime, which supports multiple execution providers.

A hint provided to ONNX Runtime just before creating the inference session translates to a considerable performance boost.

For example, such a hint can direct ONNX Runtime to use an Intel Integrated Graphics backend.

When the same model is used in a smart camera powered by an Intel NCS device, the backend can be changed to target the MYRIAD Vision Processing Unit (VPU).
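The original snippets for these hints are not reproduced here, so the following is only a minimal sketch of the idea. It assumes an ONNX Runtime build that includes the OpenVINO execution provider and that accepts a `device_type` provider option; the helper names `pick_providers` and `create_session` are hypothetical, and the device strings (`GPU_FP32`, `MYRIAD_FP16`) vary between releases, so check them against the version you have installed.

```python
# Sketch only: helper names are hypothetical, and the device strings are
# assumptions that depend on the ONNX Runtime / OpenVINO release in use.

# Preferred order: OpenVINO first (Intel GPU or MYRIAD VPU), then plain CPU.
PREFERRED = ["OpenVINOExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that this build actually offers, in order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("no usable execution provider in: %r" % (available,))
    return chosen


def create_session(model_path, device_type="GPU_FP32"):
    """Create an inference session, passing the device hint to OpenVINO if present."""
    import onnxruntime as ort  # imported lazily so pick_providers works without it

    providers = pick_providers(ort.get_available_providers())
    # One options dict per provider; only OpenVINO receives the device hint.
    options = [
        {"device_type": device_type} if name == "OpenVINOExecutionProvider" else {}
        for name in providers
    ]
    return ort.InferenceSession(
        model_path, providers=providers, provider_options=options
    )
```

Under these assumptions, moving from integrated graphics to the VPU would only mean calling `create_session(path, device_type="MYRIAD_FP16")`; the rest of the inference code stays the same.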

In the sections below, we will build a simple object detection system based on the popular Tiny YOLO v2 model. We will first run it on a PC against a CPU backend before moving it to the edge device with a VPU.


To finish this tutorial, you need the following:

Setting up the Environment

Start by creating a Python virtual environment for the project.

Create a requirements.txt file with the required Python modules.

Since we are going to detect up to 20 objects, create a file called labels.txt with the below labels:
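The label list itself is missing from this copy of the tutorial. As a sketch, the snippet below assumes the model was trained on the 20 Pascal VOC classes in their usual order and that labels.txt holds one label per line; both the ordering and the file layout are assumptions, and the function names are hypothetical.

```python
# Assumption: the 20 Pascal VOC class names, in the order commonly used
# with Tiny YOLO v2. The one-label-per-line layout is also an assumption.
VOC_LABELS = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def labels_text(labels=VOC_LABELS):
    """Render the labels as file content, one label per line."""
    return "\n".join(labels) + "\n"


def parse_labels(text):
    """Parse file content back into a list, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_labels(path="labels.txt"):
    """Write the label file to disk."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(labels_text())
```

Calling `write_labels()` once in the project directory creates the file; the inference script can then read it back with `parse_labels(open("labels.txt").read())`.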

Finally, download the Tiny YOLO v2 model from the ONNX Model Zoo.

Object Detection with Tiny YOLO v2 on Desktop

We are now ready to code the inference program based on Tiny YOLO v2 and ONNX Runtime. Create a file with the below code:

If you are familiar with OpenCV and basic Convolutional Neural Networks (CNN), the code is self-explanatory.

It does three things:

    1. Grabs the frame from the webcam
    2. Converts and preprocesses the frame as expected by the model
    3. Performs inference on the frame, keeps the detections that meet the confidence threshold, and pairs each one with a label from the labels.txt file
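The three steps above can be sketched roughly as follows. This is not the original program: it assumes a 1x3x416x416 float32 input with raw 0-255 pixel values in RGB order, which you should verify against the model you downloaded. The names `filter_detections`, `preprocess`, and `run`, and the caller-supplied `decode` function that turns the raw output into `(class_index, score)` pairs, are all hypothetical.

```python
INPUT_SIZE = 416  # assumed input width and height for Tiny YOLO v2


def filter_detections(scored, labels, threshold=0.3):
    """Keep (label, score) pairs whose score meets the confidence threshold.

    `scored` is an iterable of (class_index, score) tuples.
    """
    return [(labels[index], score) for index, score in scored if score >= threshold]


def preprocess(frame):
    """Resize a BGR webcam frame and reorder it to a 1x3x416x416 float32 tensor."""
    import cv2
    import numpy as np

    resized = cv2.resize(frame, (INPUT_SIZE, INPUT_SIZE))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)  # channel order is an assumption
    chw = rgb.transpose(2, 0, 1).astype(np.float32)  # HWC -> CHW
    return chw[np.newaxis, ...]  # add the batch dimension


def run(model_path, labels, decode, cam=0, threshold=0.3):
    """Grab frames, run inference on each, and print the labels that pass."""
    import cv2
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    capture = cv2.VideoCapture(cam)  # step 1: open the webcam
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            tensor = preprocess(frame)  # step 2: convert and preprocess
            outputs = session.run(None, {input_name: tensor})  # step 3: inference
            for label, score in filter_detections(decode(outputs[0]), labels, threshold):
                print(f"{label}: {score:.2f}")
    finally:
        capture.release()
```

Keeping the thresholding in its own small function means it can be checked without a camera or a model file.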

If you have multiple cameras attached to the machine, don’t forget to update the camera index by changing the value of the cam variable.

Executing the code shows the objects it found along with the confidence score of each. Adjust the confidence threshold to suit your requirements.
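To make the confidence score concrete, here is a minimal sketch of how one can be computed from the raw model output. It assumes the usual YOLO v2 layout: a 1x125x13x13 output in which each grid cell holds 5 anchors of 25 values each (4 box terms, 1 objectness logit, 20 class logits). The function names are hypothetical, and the original program may compute its score differently.

```python
import math

NUM_CLASSES = 20  # assumed: the 20 labels in labels.txt


def sigmoid(x):
    """Map a logit to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    """Numerically stable softmax over a list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_anchor(raw):
    """Score one anchor of one grid cell.

    `raw` holds 25 floats: tx, ty, tw, th, objectness logit, 20 class logits.
    Returns (class_index, score), where score = objectness * best class probability.
    """
    if len(raw) != 5 + NUM_CLASSES:
        raise ValueError("expected %d values, got %d" % (5 + NUM_CLASSES, len(raw)))
    objectness = sigmoid(raw[4])
    probabilities = softmax(raw[5:])
    best = max(range(NUM_CLASSES), key=probabilities.__getitem__)
    return best, objectness * probabilities[best]
```

A detection is reported only when this score reaches the threshold, so raising the threshold trades missed objects for fewer false positives.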

This scenario represents ONNX Runtime performing inference against a CPU backend. In the next step, we will port this code to run on an edge device powered by Intel NCS 2.

Object Detection with Tiny YOLO v2 at the Edge

Assuming you have an Ubuntu 18.04 machine connected to an Intel NCS 2 device running the latest version of Intel OpenVINO Toolkit, you are ready to execute the code at the edge. Otherwise, follow the steps to configure Intel NCS 2 and OpenVINO Toolkit as per the documentation.

If you have an Up Squared AI Vision X Kit, you can use it for this tutorial.

Even if you don’t install the entire OpenVINO Toolkit, ensure you install the Myriad udev rules for the NCS on the host machine according to the reference.

Microsoft provides Docker images and Dockerfiles for mainstream environments. Let’s start by downloading the container image for OpenVINO Toolkit with Myriad.

Create a directory, tinyyolo, on the Ubuntu machine and copy the files from your PC. Your directory should contain the below files:




Before we execute the code, let’s add a line that tells ONNX Runtime about the presence of the Intel NCS device.

Open the inference script and add the below line just before creating the inference session variable.
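The exact line is not preserved in this copy. One way to get the same effect, sketched here under the assumption that your ONNX Runtime build exposes the OpenVINO execution provider with a `device_type` option, is to choose the session arguments from an environment variable so that one script serves both the desktop and the edge device. The variable name `OPENVINO_DEVICE` and the helper `session_kwargs` are hypothetical.

```python
def session_kwargs(env):
    """Build InferenceSession keyword arguments from an environment mapping.

    With OPENVINO_DEVICE unset, fall back to the CPU provider. Otherwise pass
    its value (for example "MYRIAD_FP16", an assumed device string) to OpenVINO.
    """
    device = env.get("OPENVINO_DEVICE", "").strip()
    if not device:
        return {"providers": ["CPUExecutionProvider"]}
    return {
        "providers": ["OpenVINOExecutionProvider", "CPUExecutionProvider"],
        # One options dict per provider, in the same order as above.
        "provider_options": [{"device_type": device}, {}],
    }
```

The script would then create its session with `onnxruntime.InferenceSession(model_path, **session_kwargs(os.environ))`, and exporting `OPENVINO_DEVICE=MYRIAD_FP16` inside the container would switch the backend without touching the code.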

We are set to run the inference code within the Docker container based on the Myriad device.

Let’s launch the Docker container by mapping the /dev directory and mounting the tinyyolo directory. We also need to add the --privileged and --network host flags to provide appropriate permissions to access the camera and the NCS USB device.

While in the tinyyolo directory, execute the below command:
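The command itself is missing here, so the sketch below only assembles the flags described above into a `docker run` invocation. The image name is left as a parameter because the exact tag is not given in this copy; the mount point `/tinyyolo` and the function names are assumptions.

```python
import shlex


def docker_run_args(image, project_dir, mount_point="/tinyyolo"):
    """Build the argument list for launching the container interactively."""
    return [
        "docker", "run", "-it",
        "--privileged",            # access to the camera and the NCS USB device
        "--network", "host",       # share the host network
        "-v", "/dev:/dev",         # map the /dev directory
        "-v", f"{project_dir}:{mount_point}",  # mount the tinyyolo directory
        image,
    ]


def as_shell(args):
    """Render the argument list as a single shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in args)
```

From the tinyyolo directory, `print(as_shell(docker_run_args(image, os.getcwd())))` shows the command to paste into a terminal, where `image` is the name of the container image you pulled earlier.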

After getting into the shell, let’s move into the mounted directory and install the prerequisites.

Execute the code to see the inference output in the terminal.

It may take a few minutes for the graph to get loaded and warmed up. You should now see the objects detected by the camera in the terminal.

This scenario can be easily extended to publish the inference output to an MQTT channel configured locally or in the cloud. Refer to my previous AIoT tutorial and a video demo of this use case.
