In the previous part of this series, I introduced Nvidia DIGITS as a user-friendly interface for building deep learning models. In this tutorial, I will walk you through the steps involved in building a convolutional neural network (CNN) that can classify images. We will use the popular cats-versus-dogs dataset to train our model. By the end of this tutorial, you will know everything it takes to train a model with DIGITS and use it for inference.
You need a Linux machine with a GPU to train the model. For inference, any OS that can run Docker is sufficient.
Preparing the Environment
Let’s start by downloading the dataset to the host machine on which you will run the Nvidia DIGITS container. The dataset is similar to the one used in the Kaggle Dogs vs. Cats competition. It contains about 4,000 images of cats and dogs, which is just enough for this tutorial. The images are already resized to 256×256, so no preprocessing is needed.
Since the dataset is too big for GitHub, I uploaded it to Google Drive. Download the file and unzip it into a folder. You should see two directories, train and test. The train directory contains 2,500 images of each category, while the test directory has about 1,500 images.
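Before launching DIGITS, it’s worth sanity-checking the directory layout. The minimal Python sketch below counts image files per class directory; the train/test folder names and cat/dog subfolders match the dataset described above, but the exact counts on your machine may differ:

```python
import os

def count_images(root):
    """Count image files in each class subdirectory under root."""
    counts = {}
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if os.path.isdir(class_dir):
            counts[class_name] = sum(
                1 for f in os.listdir(class_dir)
                if f.lower().endswith((".jpg", ".jpeg", ".png"))
            )
    return counts

# Print per-class counts for whichever split directories exist
for split in ("train", "test"):
    if os.path.isdir(split):
        print(split, count_images(split))
```

If the per-class counts look wildly unbalanced or empty, fix the folder structure before pointing DIGITS at it.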
Let’s launch the Nvidia DIGITS Docker container, mapping the folder that contains the train and test directories. The dataset will be available in the /data directory within the container.
$ docker run --runtime=nvidia --name digits -d -p 5000:5000 -v $PWD:/data nvidia/digits
Training the Convolutional Neural Network
With the container up and running, we can access the web UI at port 5000.
Click on the Datasets tab, and choose Classification.
Point DIGITS to the train and test directories. Give a name to the dataset and click Create.
Within a few minutes, DIGITS will parse the directories to create three databases — train, val, and test.
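DIGITS carves the validation set out of the training images for you (by default it holds out a percentage of them, typically 25 percent). The plain-Python sketch below illustrates that kind of shuffled hold-out split; it is only an illustration of the idea, not what DIGITS runs internally:

```python
import random

def train_val_split(files, val_fraction=0.25, seed=42):
    """Shuffle a file list and split it into train/val subsets."""
    rng = random.Random(seed)
    shuffled = files[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical file names, just to show the split proportions
train, val = train_val_split(["img_%d.jpg" % i for i in range(100)])
```

With the default fraction, 100 input images yield 75 training and 25 validation samples, with no overlap between the two.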
Go back to the home page and select the Models tab. Under Images drop-down, select Classification.
Choose the dataset created in the previous step. Under Standard Networks, make sure you select AlexNet. Give a name to the model and click Create.
The above step kicks off the training job, which will take a few minutes to complete. You will notice that the accuracy increases with each epoch. Also, the learning rate gets adjusted with each epoch.
By the time it hits the 30th epoch, the model reaches an accuracy of 85 percent.
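The learning-rate adjustment you see in the DIGITS graph follows one of Caffe’s standard policies; a commonly used one is the "step" policy, which multiplies the base rate by a decay factor at fixed intervals. A sketch of that formula, with hypothetical base_lr, gamma, and step values (Caffe actually steps by iteration count; epochs are used here for simplicity):

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Caffe-style 'step' policy: lr = base_lr * gamma^(epoch // step_size)."""
    return base_lr * (gamma ** (epoch // step_size))

# Hypothetical schedule: start at 0.01, drop 10x every 10 epochs
schedule = [step_lr(0.01, 0.1, 10, e) for e in range(30)]
```

So over a 30-epoch run the rate would step down twice, ending two orders of magnitude below where it started.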
Let’s test our trained model with a cat and dog image from the web. Under Trained Models, browse for an image, and click on Classify One.
Our model should accurately classify dogs vs. cats.
Congratulations! You have successfully built a CNN model without writing a single line of code.
In the next step, we will use the trained model for inferencing.
Model Inferencing with Caffe
Let’s run the trained model on our local machine. Click on the Download Model button, which will start downloading the compressed model along with the weights.
Clone this repo which contains Python code for classification along with the shell script to run the Caffe Docker container for inference. It also contains a few sample images for testing the model.
Uncompress the downloaded model and copy all the files from the inference directory of the cloned repo into the same folder. Your directory should look similar to the screenshot below.
Set the MODEL_NAME environment variable to the Caffe model file, and then run the infer.sh script to perform inference.
$ export MODEL_NAME=snapshot_iter_1080.caffemodel
$ bash infer.sh images/1.jpg
This command will pull the Caffe Docker image for the CPU on the local machine.
$ bash infer.sh images/1.jpg
Unable to find image 'bvlc/caffe:cpu' locally
cpu: Pulling from bvlc/caffe
22dc81ace0ea: Pull complete
1a8b3c87dba3: Pull complete
91390a1c435a: Pull complete
07844b14977e: Pull complete
b78396653dae: Pull complete
efebd366640a: Pull complete
4e325e9a951a: Pull complete
384c8a5cd8c4: Pull complete
0df2c13a8aa1: Pull complete
41474c5b537e: Pull complete
Status: Downloaded newer image for bvlc/caffe:cpu
Processed 1/1 images in 0.344489 seconds ...
------------------------- Prediction for images/1.jpg --------------------------
78.1073% - "dog"
21.8927% - "cat"
Script took 1.915341 seconds.
If you try sending a cat image, you will see output like the following:
$ bash infer.sh images/3.jpg
Processed 1/1 images in 0.329652 seconds ...
------------------------- Prediction for images/3.jpg --------------------------
90.1773% - "cat"
9.8227% - "dog"
Script took 1.849655 seconds.
Let’s take a closer look at the infer.sh script:
docker run -it --rm --name caffe -v $PWD:/infer bvlc/caffe:cpu bash -c "cd /infer && python -W ignore classify.py -m mean.binaryproto -l labels.txt $MODEL_NAME deploy.prototxt $1 --nogpu"
The script essentially maps the current directory with the model and weights inside the Caffe container. It then invokes classify.py with appropriate parameters such as the model name, weights, labels, and the image to be classified.
Since the container has the Caffe framework and all other dependencies, it can execute classify.py to run inference.
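The core of a script like classify.py is a forward pass through the network followed by sorting the output probabilities and printing them against the label names. The Caffe call itself requires the framework to be installed, but the post-processing stage can be sketched in plain Python. The softmax and formatting helpers below are an illustration of that stage, not the actual classify.py code; only the "dog"/"cat" label names and the output format come from the runs above:

```python
import math

def softmax(scores):
    """Convert raw network scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def format_predictions(scores, labels):
    """Pair labels with probabilities, highest first, in classify.py's style."""
    probs = softmax(scores)
    ranked = sorted(zip(probs, labels), reverse=True)
    return ['%.4f%% - "%s"' % (p * 100, label) for p, label in ranked]

# Hypothetical raw scores for a dog image
for line in format_predictions([2.0, 0.5], ["dog", "cat"]):
    print(line)
```

Each line mirrors the `78.1073% - "dog"` style output that infer.sh printed earlier.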
This tutorial covered the workflow involved in training a model with Nvidia DIGITS running on a GPU-backed Linux machine, and then using the same model on a Mac or Windows machine for inference.