Open Neural Network Exchange Brings Interoperability to Machine Learning Frameworks

9 Jul 2020 1:56pm

This post is the first in a series of introductory tutorials on the Open Neural Network Exchange (ONNX). Check back tomorrow for the second in this series.

The fields of machine learning and deep learning are becoming increasingly complex. The rise of diverse frameworks, toolkits, and custom hardware architectures is one of the reasons why building, deploying and managing deep learning models has become tough.

The choice of machine learning frameworks includes TensorFlow, PyTorch, Apache MXNet and Microsoft CNTK. To accelerate training, hardware AI accelerators such as Graphics Processing Units (GPU), Field Programmable Gate Arrays (FPGA), Tensor Processing Units (TPU) and Application Specific Integrated Circuits (ASIC) are used. NVIDIA GPUs, Intel FPGAs and Google Cloud TPU are examples of AI accelerators. Each of these accelerators comes with custom drivers and libraries to interface with the machine learning programs. NVIDIA CUDA/cuDNN and Intel oneDNN are just two of the software stacks acting as interfaces to the underlying hardware accelerators.

Once training is done and a model is produced, it is consumed by different applications running in the public cloud, desktops, browsers, edge devices and mobile phones. To speed up inference, different AI accelerators are used. NVIDIA Jetson Family, Intel Movidius and Myriad VPU, Google Edge TPU and Qualcomm Adreno GPU are examples of the hardware accelerators used at the edge. Like their training counterparts, inference accelerators come with their own software stacks. NVIDIA JetPack, Intel OpenVINO Toolkit, Google TensorFlow Lite and Qualcomm Neural Processing SDK form the software stack for accelerating AI at the edge. Models that are trained in the cloud or in the data center need to be optimized to take advantage of the combination of the AI hardware and software.

The extreme disparity of hardware and software in the machine learning ecosystem introduces complexity and friction. We are dealing with two key challenges with the current approach to building and deploying ML models:

  • Lack of interoperability among models.
  • Lack of consistent runtime for training and inference.

For example, a complex model trained in TensorFlow cannot be easily used by a PyTorch developer for inference. She has to retrain the same model in PyTorch before using the model. Lack of model interoperability reduces the productivity of ML developers, forcing them to constantly reinvent the wheel.

A trained model cannot be instantly consumed by an application. It has to be optimized and converted to the target environment. For example, a TensorFlow model has to be exported to a TensorRT model to take advantage of NVIDIA GPUs. The same is the case with Intel Movidius and OpenVINO Toolkit. There is no consistent runtime layer that abstracts the combination of hardware and software of an AI accelerator.

Open Neural Network Exchange — A Standard for ML Interoperability

In 2017, Microsoft and Facebook launched the Open Neural Network Exchange (ONNX), with AWS joining shortly afterward. ONNX defines a standard for ML interoperability and has two components: a common set of operators and a common file format.

Operators are the building blocks of machine learning and deep learning models. By standardizing a common set of operators, ONNX makes it easy to consume deep learning models trained in any of the supported frameworks. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types.

The common file format of ONNX becomes the lowest common denominator to represent a model. Once a model is exported to ONNX, irrespective of the framework it is trained in, it exposes a standard graph and set of operators based on the specification. Every model is converted into a standard intermediate representation (IR) that is well-defined and well-documented.

By providing a common representation of the computation graph, ONNX helps developers choose the right framework for their task, allows authors to focus on innovative enhancements, and enables hardware vendors to streamline optimizations for their platforms.

ONNX comes with libraries that make it easy to convert a model into the ONNX format. For example, the Python code below shows how a model trained in Scikit-learn is converted into ONNX format.

Recent versions of PyTorch include a built-in ONNX exporter, making it easy to consume the model in other frameworks. ONNX supports both traditional machine learning models and deep learning models. ONNX-ML, an extension of ONNX, is designed for exporting traditional machine learning models trained with frameworks such as Scikit-learn.

Last year, ONNX became a part of the LF AI Foundation, an umbrella foundation of the Linux Foundation that supports open source innovation in artificial intelligence, machine learning and deep learning.

ONNX Runtime

Microsoft, one of the co-founders of ONNX, has built and open-sourced ONNX Runtime, a high-performance inference engine for machine learning models in the ONNX format on Linux, Windows, and macOS.

ONNX Runtime abstracts the underlying hardware by exposing a consistent interface for inference. It has backend support for NVIDIA TensorRT, NVIDIA JetPack, Intel OpenVINO Toolkit and other accelerators.

Microsoft is betting big on ONNX Runtime. For Windows ML, the machine learning component of Windows, Microsoft has chosen ONNX Runtime as the default runtime. According to Microsoft, internal teams are using ONNX Runtime to improve the scoring latency and efficiency for models used in core scenarios in Bing Search, Bing Ads, Office productivity services, and more.

Azure Custom Cognitive Services includes a mechanism to export the model directly into ONNX format, which can then be deployed on edge devices.

ONNX Runtime is available as a Python library. It also supports other language bindings, including C# and Java.

Once ONNX Runtime is installed, it can be used to load any ONNX model for inference. The code snippet below is an extension of the Scikit-learn program shown in the previous section.

ONNX Runtime does for deep learning frameworks what the Java Virtual Machine (JVM) and the Common Language Runtime (CLR) did for programming languages. The intermediate representation of an ONNX model targets the same runtime, irrespective of the framework the model was trained in. ONNX Runtime has execution providers that take care of optimizing the model for the target hardware environment.

For example, when an ONNX model runs on an NVIDIA T4 GPU with CUDA/cuDNN, ONNX Runtime can automatically optimize it with TensorRT. For this to work, ONNX Runtime must first be built and deployed with support for the TensorRT execution provider. The same is the case with Intel OpenVINO Toolkit and Android NNAPI.

The combination of ONNX and ONNX Runtime promises portability, interoperability, and optimization of deep learning models.

To make it easy to target the right execution provider, Microsoft has built Dockerfiles and container images for a variety of environments.

Microsoft is heavily investing in ONNX Runtime. It plays an important role in products such as Windows, AzureML, Cognitive Services, IoT Edge, and Visual AI Developer Kit. Microsoft is also working towards making ONNX Runtime ideal for training models.

ONNX Ecosystem

Apart from AWS, Facebook, and Microsoft, there are over 30 companies participating in the ONNX community. The project is now a part of the Linux Foundation, where it recently attained graduated status.

The community has created an ONNX Model Zoo with popular neural network models, such as AlexNet, ResNet, MobileNet, VGG, GoogLeNet, Tiny YOLO, and BERT. These models can be downloaded and used for inference along with ONNX Runtime. Each model comes with a model.onnx file and test data to evaluate the model.

Along with the libraries, runtime and the model zoo, the ONNX ecosystem has also built tools to visualize and explore the models. Netron can load an ONNX model and inspect the network structure. It’s extremely useful in exploring an ONNX model to understand the input layer, hidden layers, operators, data types and the output layer of a neural network graph.

VisualDL from Baidu is a deep learning visualization tool that can help design deep learning jobs. It was originally created to visualize deep learning models trained with the PaddlePaddle framework. It includes features such as scalar, parameter distribution, model structure and image visualization.

In the next part of the ONNX series, we will see how to use a pre-trained model from the model zoo for inference. Stay tuned!

Janakiram MSV’s Webinar series, “Machine Intelligence and Modern Infrastructure (MI2)” offers informative and insightful sessions covering cutting-edge technologies. Sign up for the upcoming MI2 webinar at http://mi2.live.

