Train and Deploy TensorFlow Models Optimized for Google Edge TPU

Edge computing devices are becoming the logical destination for running deep learning models. While the public cloud is the preferred environment for training, it is the edge that runs the models for inferencing. Since most edge devices are constrained in available CPU and GPU resources, purpose-built AI chips have emerged to accelerate inferencing. These AI accelerators complement the CPU by speeding up the calculations involved in inferencing. They are designed to optimize the forward propagation of neural networks deployed at the edge.
Google Edge TPU is one of the AI accelerators on the market that’s highly optimized for running TensorFlow models in inferencing mode. Developers can get started with the Edge TPU through the Google Coral Dev Kit and the Coral USB Accelerator. For details on the configuration and specs of these two devices, refer to my previous article.
Typical TensorFlow models trained on CPUs, GPUs, and TPUs cannot be directly deployed on the Edge TPU. The models need to be converted and optimized before they can take full advantage of the acceleration provided by the Edge TPU.
TensorFlow developers are familiar with TensorFlow Lite, a toolkit to convert and run TensorFlow models on mobile, embedded, and IoT devices. TensorFlow Lite is exclusively designed to run models efficiently on mobile and embedded devices with resource constraints. The workflow of converting an existing TensorFlow model to TF Lite involves either the Python SDK or the CLI. Models in the SavedModel, frozen graph, and Keras HDF5 formats can be easily converted into TF Lite through the SDK or the CLI. Converted models can be deployed and run on mobile or embedded devices that have the TF Lite interpreter.
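As an illustration, here is a minimal sketch of converting a SavedModel to TF Lite with the Python SDK; the directory and file names are placeholders:

```python
import tensorflow as tf

# Convert a model from a SavedModel directory (path is a placeholder)
converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
tflite_model = converter.convert()

# Write the converted FlatBuffer to disk
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```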
The TensorFlow Lite interpreter is a library that takes a model file, executes the operations it defines on input data, and provides access to the output. This interpreter works across multiple platforms and provides a simple API for running TensorFlow Lite models from Java, Swift, Objective-C, C++, and Python.
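In Python, running a converted model through the interpreter looks roughly like this (the model file name and the dummy input are assumptions):

```python
import numpy as np
import tensorflow as tf

# Load the TF Lite model and allocate its tensors (file name is a placeholder)
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input matching the model's expected shape and dtype
input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

# Read the prediction from the output tensor
output_data = interpreter.get_tensor(output_details[0]["index"])
print(output_data)
```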
Unfortunately, the Edge TPU cannot run standard TF Lite models as-is. Developers are expected to further optimize a TF Lite model for the Edge TPU. For this, Google ships a command-line tool, the Edge TPU Compiler (edgetpu_compiler), which converts and optimizes TensorFlow Lite models for the Edge TPU.
TensorFlow supports a model optimization technique called quantization, which is required by the Edge TPU. Quantizing a model essentially means converting all the 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers. This technique makes the model smaller and faster. Even though this results in reduced precision, the model still maintains an acceptable level of accuracy.
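Post-training quantization can be requested through the TF Lite converter. Below is a minimal sketch, assuming a SavedModel with 224×224 RGB inputs and using random data in place of real calibration images (the exact flags vary slightly across TensorFlow versions):

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield sample inputs so the converter can calibrate the dynamic
    # range of activations (random data here; use ~100 real images in practice)
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Require full 8-bit integer quantization, as the Edge TPU expects
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

quantized_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(quantized_model)

# The quantized model can then be compiled for the accelerator:
#   edgetpu_compiler model_quant.tflite
```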
Since the Edge TPU executes deep feed-forward neural networks, it supports only TensorFlow Lite models that are fully 8-bit quantized and then compiled specifically for the Edge TPU.
This article discusses two specific mechanisms to train, optimize, and deploy TensorFlow models on the Edge TPU.
Google Cloud AutoML Vision
AutoML attempts to accelerate the process of training a model by automating the majority of the steps. Users upload their datasets and wait for the trained model to become available. From feature engineering to hyperparameter tuning, AutoML tackles the most complex steps of the pipeline.
Google is one of the first to offer AutoML for computer vision. Cloud AutoML Vision accesses image datasets uploaded to Cloud Storage buckets and trains a model that’s ready for inference. The trained model can be hosted in the cloud for online predictions or deployed at the edge for offline inferencing. For a detailed guide on using Cloud AutoML Vision, follow the tutorial I published earlier.
Cloud AutoML Vision can generate standard TensorFlow Lite models that can be deployed on mobile and embedded devices, including the Raspberry Pi. It also supports generating a model highly optimized for the Edge TPU.
Once the model is trained, it can be further optimized for the target inference environment. The screenshot below shows the supported devices.
The service uploads the Edge TPU-optimized TF Lite model to a Cloud Storage bucket, from which it can be downloaded and deployed for inference.
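On the device, the downloaded model can be loaded through the TF Lite interpreter with the Edge TPU delegate. Here is a minimal sketch using the tflite_runtime package; the model file name is a placeholder, and the delegate library name shown is the Linux one (it differs on other platforms):

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Load the Edge TPU-compiled model with the Edge TPU delegate
# ("libedgetpu.so.1" is the Linux library; macOS and Windows names differ)
interpreter = tflite.Interpreter(
    model_path="model_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()

# Fully quantized models expect 8-bit input; feed a dummy frame here
frame = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()

scores = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(scores)
```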
The other option is to train a standard TF Lite model and then compile it for the Edge TPU, which is discussed in the next section.
The best thing about using the Cloud AutoML Vision service is the no-code approach to training and generating models. Depending on the size of the dataset, the model can be ready in a couple of hours. Google has done a fantastic job of connecting the AutoML Vision service with the Edge TPU.
Transfer Learning Combined with Edge TPU SDK and CLI
The other technique is to use the powerful transfer learning process to train the model and then optimize it for the Edge TPU. This approach gives better control over tuning the neural network architecture and the hyperparameters.
Training a convolutional neural network from the ground up can take days or even weeks of computing time and demands large amounts of training data. But transfer learning allows developers to start with an existing neural network architecture that’s already trained for a similar task and then perform further training to teach the model to classify new data points using a smaller training dataset. We can do this by retraining the whole model (adjusting the weights across the whole network), but we can also achieve very accurate results by simply removing the final layer that performs classification and training a new layer on top to recognize the new classes.
For example, we can take an existing MobileNet model and reuse it to train a model that classifies dog images by breed. Since 90% of the neural network architecture remains the same, the training will take just a few minutes.
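In Keras, the idea can be sketched as follows: freeze a MobileNetV2 feature extractor pre-trained on ImageNet and train only a new classification head on top (the input size and the count of 120 breeds are illustrative assumptions):

```python
import tensorflow as tf

# Reuse MobileNetV2 trained on ImageNet, dropping its classification head
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pre-trained feature extractor

# Add a new classifier on top for 120 dog breeds (illustrative count)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(120, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # train on the breed dataset
```

Only the weights of the new classification layer are updated during training, which is why the process completes quickly even on modest hardware.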
Once the model is generated from the transfer learning-based training, it can be easily converted into a TF Lite model optimized for the Edge TPU. Google has made popular models optimized for the Edge TPU available for download. These models can be used as the baseline for performing transfer learning. Some of these models can even be used for on-device training, where the entire training process runs on the Google Coral Dev Kit without the need for the cloud or a powerful computing environment. Since the Dev Kit has enough horsepower, it can be exploited to train models on smaller datasets through transfer learning.
Google not only extended the Cloud TPU to the edge, but it also made it simple for developers to convert and optimize TensorFlow models for the Edge TPU.
Janakiram MSV’s Webinar series, “Machine Intelligence and Modern Infrastructure (MI2)” offers informative and insightful sessions covering cutting-edge technologies. Sign up for the upcoming MI2 webinar at http://mi2.live.