A Primer on Nvidia-Docker — Where Containers Meet GPUs
GPUs are critical for training deep learning models and neural networks. While they may not be needed for simple models based on linear regression and logistic regression, complex models built around convolutional neural networks (CNNs) and recurrent neural networks rely heavily on GPUs. Computer vision models in particular, built with frameworks such as Caffe2 and TensorFlow, depend on the GPU.
In supervised machine learning, a set of features and labels is used to train a model. Deep learning algorithms don't even need explicitly engineered features to arrive at a trained model; they pretty much "learn" from existing datasets designated for training, testing, and evaluation.
Neural networks perform complex computations on tens of thousands of matrices before the final model emerges. When an image is fed to a CNN, it gets translated into a matrix of real numbers. Depending on the size and color depth of the image, multiple such matrices are generated by the network. These matrices are added and multiplied with other matrices during forward propagation and backward propagation until appropriate weights are derived.
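As a rough illustration of why training is so matrix-heavy, here is a minimal NumPy sketch of one fully connected layer being fitted by gradient descent. The layer sizes, learning rate, and targets below are arbitrary, and no particular framework is assumed; the point is that both the forward and backward passes reduce to matrix multiplications.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy batch: 4 flattened "images" of 64 features each
X = rng.standard_normal((4, 64))
y = rng.standard_normal((4, 10))          # arbitrary targets
W = rng.standard_normal((64, 10)) * 0.1   # weights of one dense layer

for step in range(100):
    # Forward propagation: one big matrix multiply
    pred = X @ W
    # Squared-error loss, averaged over the batch
    loss = ((pred - y) ** 2).sum() / len(X)
    # Backward propagation: the gradient w.r.t. W is again a matrix multiply
    grad_W = 2 * X.T @ (pred - y) / len(X)
    # Gradient descent update of the weights
    W -= 0.01 * grad_W

# The loss shrinks toward zero as the weights fit the targets
print(f"final loss: {loss:.6f}")
```

A real CNN repeats this pattern across many layers and far larger matrices, which is exactly the workload GPUs accelerate.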
A trained model can be run on CPUs for inference. Since inference is not as intense as training, GPUs are strictly optional when running models for inference.
CPUs are not designed to deal with such a rapid rate of computation. While an individual CPU core is faster at sequential number crunching, CPUs are not meant for parallelizing mathematical operations across thousands of threads. That's where GPUs play a crucial role: a single GPU core may not have the horsepower of a CPU core, but thousands of them working together can perform massively parallelized calculations.
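The distinction is easy to see even on a CPU with a vectorized library like NumPy, which applies one operation to a whole array at once. This is a conceptual sketch only, not actual GPU code, but the data-parallel style in the second half is the same pattern CUDA maps onto thousands of GPU threads.

```python
import numpy as np

a = np.arange(100_000, dtype=np.float64)
b = np.arange(100_000, dtype=np.float64)

# Sequential style: one element at a time, the way a single core works
out_loop = np.empty_like(a)
for i in range(len(a)):
    out_loop[i] = a[i] * b[i]

# Data-parallel style: one operation expressed over the whole array.
# On a GPU, each element would be handled by its own thread.
out_vec = a * b

assert np.array_equal(out_loop, out_vec)
```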
Nvidia CUDA and cuDNN
Traditional programs cannot access GPUs directly; they need a special parallel programming interface to move computations to the GPU. Nvidia, the most popular graphics card manufacturer, created the Compute Unified Device Architecture (CUDA), a parallel computing platform and programming model for general computing on GPUs. With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
In GPU-enabled applications, the sequential part of the workload continues to run on the CPU, which is optimized for single-threaded performance, while the parallelized, compute-intensive part of the application is offloaded to run on thousands of GPU cores in parallel. To integrate CUDA, developers program in popular languages such as C, C++, Fortran, Python, and MATLAB, expressing parallelism through extensions in the form of a few basic keywords.
Deep learning frameworks and neural networks rely heavily on CUDA for parallel computation. Nvidia has also developed a specialized library called the CUDA Deep Neural Network library (cuDNN), a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
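To make those primitives concrete, here is a deliberately naive NumPy sketch of forward convolution and max pooling, two of the routines cuDNN ships as heavily tuned GPU kernels. This is illustrative only: real applications reach cuDNN through a framework, not through code like this, and the edge-detector kernel below is an arbitrary example.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2D 'valid' convolution (really cross-correlation, as in most frameworks)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output element is a dot product of the kernel with a patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling that downsamples the feature map."""
    h, w = feature_map.shape
    fm = feature_map[:h - h % size, :w - w % size]
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge_kernel = np.array([[1.0, -1.0]])              # crude horizontal edge detector
features = conv2d(image, edge_kernel)              # shape (6, 5)
pooled = max_pool(features)                        # shape (3, 2)
```

On a GPU, cuDNN replaces these Python loops with batched, hardware-specific kernels, which is where most of the training speedup comes from.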
Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe2, MATLAB, Microsoft Cognitive Toolkit, TensorFlow, Theano, and PyTorch.
Nvidia-Docker — Bridging the Gap Between Containers and GPU
In 2016, Nvidia created a runtime for Docker called Nvidia-Docker. The goal of this open source project was to bring the ease and agility of containers to the CUDA programming model.
Since Docker didn't support GPUs natively, this project instantly became a hit with the CUDA community. Nvidia-Docker is basically a wrapper around the docker CLI that transparently provisions a container with the necessary dependencies to execute code on the GPU. The wrapper is only necessary for commands that execute a GPU-using container, such as run; all other docker commands work unchanged.
You can run Nvidia-Docker on Linux machines that have a GPU along with the required drivers installed. Refer to the readme on Nvidia-Docker's GitHub repository for details on the prerequisites.
Below are the commands I used to install and verify Nvidia-Docker on an Ubuntu 16.04 machine powered by an Nvidia Quadro P4000 GPU.
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker
$ sudo pkill -SIGHUP dockerd # Reload the Docker daemon configuration
$ sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
The nvidia-smi command runs the System Management Interface (SMI) to confirm that the Docker container is able to access the GPU. Behind the scenes, SMI talks to the GPU through the Nvidia driver.
We can also verify that CUDA is installed by running the command below.
$ sudo nvidia-docker run --rm nvidia/cuda nvcc -V
Data scientists and developers can use this Docker image as the base for running popular neural network frameworks such as TensorFlow, Apache MXNet, Microsoft CNTK, and Caffe2.
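For example, a custom image for a GPU-accelerated framework can start from the nvidia/cuda base image. The tag and TensorFlow version below are purely illustrative placeholders; check the Docker Hub and PyPI listings for a CUDA/cuDNN/framework combination that matches your driver.

```dockerfile
# Hypothetical sketch — pick a CUDA/cuDNN tag that matches your installed driver
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04

RUN apt-get update && apt-get install -y python3 python3-pip && \
    pip3 install tensorflow-gpu==1.5.0

# When launched via nvidia-docker, the container can see the GPU
CMD ["python3", "-c", "import tensorflow as tf; print(tf.test.gpu_device_name())"]
```

Build it with docker build and run it with nvidia-docker run, just like the nvidia/cuda image above.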
How about Kubernetes?
Google Kubernetes Engine (GKE) has added beta support for GPUs, making it one of the first managed Kubernetes services to support Nvidia Tesla K80 and Tesla P100 GPUs. This addition opens up interesting avenues for developers and data scientists: we can now train deep learning models and neural networks in Kubernetes. And it's not just machine learning; any parallelizable workload that needs access to GPUs, such as video rendering or game development, can take advantage of this feature.
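On a cluster with GPU nodes, a pod asks for a GPU through the extended resource nvidia.com/gpu. The manifest below is a minimal sketch; the pod name is a placeholder, and the image is the same nvidia/cuda image used earlier.

```yaml
# Sketch of a pod requesting one GPU on a cluster with GPU nodes
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  containers:
  - name: cuda-test
    image: nvidia/cuda          # any CUDA-enabled image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1       # schedules the pod onto a GPU node
  restartPolicy: Never
```

Applying this with kubectl and checking the pod logs should show the same nvidia-smi output we saw from Nvidia-Docker on a single machine.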
As a deep learning enthusiast with a passion for containers, I was very excited to see GPU support for Kubernetes. I ported a few Caffe models to GKE to test the waters. The overall experience has been pretty smooth for a beta release.
In the upcoming multipart tutorial, I will walk you through all the steps required to create a GPU-based Kubernetes cluster, train a convolutional neural network (CNN), and run the trained model for inference on a development machine that doesn't have a GPU. Though the tutorial doesn't delve into the concepts of deep learning and CNNs, it gives you everything you need to jumpstart model training on Kubernetes.