Tutorial: Accelerate Deep Learning Models with Intel Movidius

Intel Movidius Neural Compute Stick accelerates machine learning inferencing at the edge. I covered the details of this device last week. In this tutorial, we will take an existing Caffe deep learning model and optimize it for Intel Movidius.
This guide is based on Intel Movidius NCS 1 and NCSDK 2. The most recent version of the device uses the Intel OpenVINO Toolkit, which is not compatible with the previous versions of the SDK. However, Intel is still selling NCS 1 devices while actively maintaining the SDK.
Apart from the NCS 1 device, you need an Ubuntu 16.04 PC with a free USB 3 port. You can use VirtualBox or VMware Fusion/Workstation to set up and configure the SDK. Optionally, a Raspberry Pi 3 device may be used to run the optimized graph.
Installing Intel Neural Compute SDK 2
Let’s start by confirming that the Intel Movidius NCS USB device is recognized. Running lsusb should show a device with ID 03e7:2150.
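If you prefer to script this check, the following is a minimal sketch. It assumes lsusb prints one device per line containing "ID vendor:product"; the helper names find_usb_device and ncs_is_attached are illustrative and not part of the NCSDK.

```python
import re
import subprocess

NCS_USB_ID = "03e7:2150"  # ID that lsusb should report for the NCS, as noted above


def find_usb_device(lsusb_output, usb_id=NCS_USB_ID):
    """Return the first lsusb line carrying the given vendor:product ID, or None."""
    pattern = re.compile(r"\bID\s+" + re.escape(usb_id) + r"\b", re.IGNORECASE)
    for line in lsusb_output.splitlines():
        if pattern.search(line):
            return line.strip()
    return None


def ncs_is_attached():
    """Run lsusb (assumed to be on PATH) and look for the NCS ID in its output."""
    result = subprocess.run(
        ["lsusb"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return find_usb_device(result.stdout) is not None
```

Calling ncs_is_attached() on the Ubuntu machine should return True once the stick is plugged in and, when using a virtual machine, passed through to the guest.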
We will now install the prerequisites – Python3, Pip, and Git packages on Ubuntu.
    sudo apt-get upgrade
    sudo apt install python3
    sudo apt install python3-pip
    sudo apt install git-all
Clone the NCSDK 2 GitHub repository and build the SDK.
    git clone -b ncsdk2 http://github.com/Movidius/ncsdk
    cd ncsdk
    make install
It may take a few minutes for the setup to complete. After it is done, verify that the SDK is properly installed by running the hello_ncs_py sample.
    cd ~/ncsdk/examples/apps/hello_ncs_py
    make run
If the sample runs without errors, it confirms that the SDK is able to access the Intel Movidius NCS device.
Generating the Graph from Caffe Deep Learning Model
In one of the previous tutorials, I used NVIDIA DIGITS to build a Convolutional Neural Network (CNN) that classifies images. We will use the fully-trained model from that demo to classify the images of dogs and cats.
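Download the trained Caffe model from the links below. Note that wget needs the uppercase -O option to set the output file name (lowercase -o writes a log file instead), and the URLs are quoted so that the shell does not interpret the ? character:
    mkdir cat-dog && cd cat-dog
    wget "https://www.dropbox.com/s/vxyby375e82vq1b/cat-dog.caffemodel?dl=0" -O cat-dog.caffemodel
    wget "https://www.dropbox.com/s/byvf1d4ul09ujn9/deploy.prototxt?dl=0" -O deploy.prototxt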
Before we use the model for inference, we need to generate a graph optimized for Intel Movidius. For this, we will use mvNCCompile, one of the command-line tools available in the NCSDK. This tool takes the trained model as an input and generates the required graph.
    mvNCCompile deploy.prototxt -w cat-dog.caffemodel -s 12 -is 227 227
The first two parameters point to the Caffe model: deploy.prototxt describes the network, and -w supplies the trained weights. The -s 12 option denotes that we are using 12 SHAVE cores for the graph. The last parameter, -is, is the size of the input image, which is 227x227.
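To make the role of each option explicit, here is a small sketch that assembles the same command from named values. The helper build_compile_command is hypothetical and only builds the argument list; it does not call the NCSDK itself. It also assumes that -is takes the width followed by the height, which makes no difference for a square 227x227 input.

```python
import shlex


def build_compile_command(prototxt, weights, shaves=12, input_size=(227, 227)):
    """Assemble the mvNCCompile argument list used in this tutorial."""
    width, height = input_size
    return [
        "mvNCCompile", prototxt,          # network description (.prototxt)
        "-w", weights,                    # trained weights (.caffemodel)
        "-s", str(shaves),                # number of SHAVE cores to use
        "-is", str(width), str(height),   # input image size
    ]


cmd = build_compile_command("deploy.prototxt", "cat-dog.caffemodel")
print(" ".join(shlex.quote(part) for part in cmd))
# prints: mvNCCompile deploy.prototxt -w cat-dog.caffemodel -s 12 -is 227 227
```

On a machine with the NCSDK installed, the list can be passed to subprocess.run(cmd, check=True) from inside the cat-dog directory to run the compiler.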
You should now find two new files – graph and output_expected.npy which can be loaded onto the NCS device for inference.
It’s time for us to write Python code that loads the graph and does inference.
    import sys

    import numpy
    import skimage.io
    import skimage.transform
    import mvnc.mvncapi as mvnc

    GRAPH = "graph"
    SIZE = [227, 227]

    # Find the first NCS device and open it
    devices = mvnc.enumerate_devices()
    if len(devices) == 0:
        print("No devices found")
        sys.exit(1)

    device = mvnc.Device(devices[0])
    device.open()

    # Read the compiled graph file into memory
    with open(GRAPH, mode='rb') as f:
        blob = f.read()

    # Create the graph object, load the graph buffer into the NCS, and set up the FIFOs
    graph = mvnc.Graph(GRAPH)
    fifo_in, fifo_out = graph.allocate_with_fifos(device, blob)

    # Load the image passed on the command line and resize it to the network input size
    img = skimage.io.imread(sys.argv[1])
    img = skimage.transform.resize(img, SIZE, preserve_range=True, mode='constant')

    # Read the class labels, one per line
    with open("./labels.txt") as f:
        labels = [line.rstrip('\n') for line in f if line != 'classes\n']

    print("\n==============================================================")

    # Send the image to the NCS as a float32 array
    graph.queue_inference_with_fifo_elem(fifo_in, fifo_out, img.astype(numpy.float32), None)

    # Get the results from the NCS
    output, userobj = fifo_out.read_elem()

    # Get execution time
    inference_time = graph.get_option(mvnc.GraphOption.RO_TIME_TAKEN)

    # Find the index of highest confidence
    top_prediction = output.argmax()

    # Print the top prediction for the image
    print("Predicted " + sys.argv[1] + " as " + labels[top_prediction]
          + " in %.2f ms" % (numpy.sum(inference_time))
          + " with %3.1f%%" % (100.0 * output[top_prediction])
          + " confidence.")

    print("==============================================================\n")

    # Release the FIFOs, the graph, and the device
    fifo_in.destroy()
    fifo_out.destroy()
    graph.destroy()
    device.close()
    device.destroy()
Save the above file as run.py. Before we pass sample images to test the graph, we need to create a labels file with just two entries on separate lines.
    echo cat > labels.txt
    echo dog >> labels.txt
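The order of the lines matters, because run.py uses the index of the highest score to look up the label. The sketch below mirrors that lookup in plain Python without the NCS. The scores list is a made-up stand-in for the network output, and the example assumes that index 0 corresponds to cat and index 1 to dog in the trained model.

```python
def load_labels(lines):
    """Strip newlines and skip a 'classes' header line, mirroring run.py."""
    return [line.rstrip("\n") for line in lines if line != "classes\n"]


def top_prediction(scores, labels):
    """Return (label, confidence in percent) for the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], 100.0 * scores[best]


labels = load_labels(["cat\n", "dog\n"])  # the two lines of labels.txt
scores = [0.25, 0.75]                     # hypothetical output, not a real result
label, confidence = top_prediction(scores, labels)
print("%s with %3.1f%% confidence" % (label, confidence))
# prints: dog with 75.0% confidence
```

If the predictions come out consistently swapped, the label order most likely does not match the class order used during training.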
You may want to download sample preprocessed images from this link.
Run inference on one of the images with the command below:
    python3 run.py images/3.jpg
In one of my upcoming tutorials, I will demonstrate how to use Intel Movidius NCS 2 with OpenVINO Toolkit.
Janakiram MSV’s Webinar series, “Machine Intelligence and Modern Infrastructure (MI2)” offers informative and insightful sessions covering cutting-edge technologies. Sign up for the upcoming MI2 webinar for a deep dive on accelerating machine learning inference with Intel Movidius.
Feature image by F. Muhammad from Pixabay.