How Azure ML Streamlines Cloud-based Machine Learning

Microsoft was one of the first public cloud providers to jump on the machine learning PaaS bandwagon. From cognitive APIs to GPU-based deep learning virtual machines, Azure offers the whole spectrum of tools for building and deploying machine learning models.
Recently, at Microsoft Ignite, the company announced a revamped service that aims to standardize and simplify ML development on Azure. Branded as Azure ML Service, the new platform is designed to match the changing dynamics of evolving machine learning models. Microsoft hopes to make Azure the preferred cloud for building, managing, and hosting machine learning models.
Currently in preview, Azure ML Service seems to be a step in the right direction for Microsoft. The service exposes a Python API that can be easily integrated with existing ML projects, and it strikes the right balance between local and cloud-based development: anything that can be done interactively through the portal can also be scripted through the Python API.
Microsoft is embracing notebooks as the preferred environment for Azure ML. Developers can use Jupyter Notebooks on their local workstation, Azure Notebooks in the cloud, or Databricks notebooks for data prep. These notebooks consume the Python API for seamless integration with other Azure services such as compute and storage.
I was able to integrate Azure ML to run experiments on my local testbed, an Ubuntu 16.04 machine running TensorFlow backed by an NVIDIA GPU.
Azure ML closely resembles Amazon’s SageMaker. Both services rely on notebooks for development, object storage for data sets, a container registry for model management, and container engines for serving. The key differentiator for Azure ML is its support for automated machine learning (AutoML), which can be used for regression and classification tasks. I will publish a deep-dive comparison of Amazon SageMaker and Azure ML in the near future.
Azure ML Service is composed of multiple components. Once configured, it can become a powerful testbed for both beginners and experienced data scientists. The platform provides tools and APIs for both model training and model hosting.
Let’s take a closer look at the architecture of Azure ML Service.
Workspaces
In Azure ML, a workspace acts as a logical boundary for all the assets and artifacts related to a machine learning project. From datasets and notebooks to trained models, hyperparameters, and container images, a workspace holds everything together. The very first step in getting started with Azure ML is to create a workspace in the public cloud.
When you access a workspace through the portal, you can see all the resources associated with it.
Every workspace has a configuration associated with it, which is a simple JSON file that contains details such as the Azure subscription, resource group, and workspace name. Azure ML’s Python API uses this configuration to associate the current environment with the cloud-based workspace.
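As a minimal sketch, creating a workspace and persisting its configuration through the Python API might look like the following; the workspace name, resource group, subscription ID, and region are placeholders.

```python
# Sketch: create an Azure ML workspace and write its configuration to a local JSON file.
from azureml.core import Workspace

ws = Workspace.create(
    name="my-ml-workspace",              # hypothetical workspace name
    subscription_id="<subscription-id>",  # placeholder
    resource_group="my-ml-rg",            # hypothetical resource group
    create_resource_group=True,
    location="eastus2",
)

# Persist config.json locally so the SDK can reconnect to this workspace
# from any development environment.
ws.write_config()
```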
Development Environments
Once the workspace is created, the configuration can be used to create a development environment based on Jupyter Notebooks (local), Azure Notebooks (cloud), Python IDEs such as VS Code or PyCharm (local), or the Data Science Virtual Machine (cloud). The only prerequisite for connecting the development environment with an Azure ML workspace is the Python API, which can be installed with a single pip install command.
With an environment connected to the workspace, the next step is creating an experiment, which tracks all the iterations involved in training the model.
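Assuming the SDK has been installed with pip install azureml-sdk, connecting to the workspace and creating an experiment takes only a few lines of Python; the experiment name below is a placeholder.

```python
# Sketch: load the workspace from the local config.json and create an experiment.
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()  # reads the config.json written when the workspace was created
exp = Experiment(workspace=ws, name="salary-prediction")  # hypothetical experiment name
```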
Experimentation — Target Compute Environment
An experiment needs a target execution environment for training the models. The target execution environment can be any compute platform, ranging from a local Python virtual environment to Azure Batch AI. The supported platforms include a local computer, the Data Science Virtual Machine, Azure Batch AI, and Azure HDInsight. Depending on the complexity of the model, users can choose anything from simple to highly sophisticated compute environments powered by GPUs and FPGAs.
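As a rough sketch, a managed GPU cluster can be provisioned as the compute target through the SDK; the cluster name, VM size, and node counts below are illustrative choices.

```python
# Sketch: provision a managed GPU cluster as the training compute target.
from azureml.core.compute import AmlCompute, ComputeTarget

provisioning_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_NC6",  # single NVIDIA K80 GPU per node
    min_nodes=0,             # scale down to zero when idle
    max_nodes=4,
)

compute_target = ComputeTarget.create(ws, "gpu-cluster", provisioning_config)
compute_target.wait_for_completion(show_output=True)
```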
Experimentation — Target Storage Environment
After defining the compute environment, Azure ML Service needs to know the location of the input dataset. If the dataset is stored on the local computer, it needs to be copied to Azure Storage for centralized access, and there is a Python API to perform this task. Standard packages such as Pandas and Matplotlib may be used to explore and visualize the data.
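For example, uploading a local folder to the workspace’s default datastore might look like this minimal sketch; the source and target paths are placeholders.

```python
# Sketch: copy a local dataset to the workspace's default datastore (Azure Storage).
ds = ws.get_default_datastore()
ds.upload(src_dir="./data", target_path="salary-data", overwrite=True, show_progress=True)
```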
Estimator — Evaluating Training Runs
The next step is to kick off the training based on the identified dataset and the compute target. Developers are free to use the framework of their choice to train the model; Azure ML Service doesn’t dictate the training framework. It works with popular frameworks including Scikit-learn, Keras, TensorFlow, CNTK, MXNet, and PyTorch. For PyTorch and TensorFlow jobs, Azure Machine Learning also provides custom PyTorch and TensorFlow estimators that make it easy to use these frameworks.
Azure ML expects you to include an estimator that records the accuracy and precision of each iteration. An estimator is the bridge between the chosen training framework and Azure ML Service. The most accurate iteration reported by the estimator is used to finalize and register the model. The output of each iteration is stored as a PKL file or in any other serialized format; Azure ML doesn’t impose a specific format for finalizing the model. The only expectation is that the scripts used for inference know how to deserialize the model.
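Here is a minimal sketch of submitting a training run through a generic estimator, reusing the workspace, experiment, and compute target from the earlier snippets; the script directory, entry script, and metric names are assumptions.

```python
# Sketch: submit a training run through a generic Estimator.
from azureml.train.estimator import Estimator

estimator = Estimator(
    source_directory="./training",   # hypothetical folder containing the training script
    entry_script="train.py",         # hypothetical training script
    compute_target=compute_target,   # the cluster provisioned earlier
    conda_packages=["scikit-learn"],
)

run = exp.submit(estimator)

# Inside train.py, the script can log metrics for each iteration, for example:
#   from azureml.core import Run
#   run = Run.get_context()
#   run.log("accuracy", acc)
```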
Developers can test the model by loading sample data, running predictions, and evaluating the results. They can use standard techniques such as the ROC curve and the confusion matrix to evaluate each run of the training process.
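For instance, a held-out test set can be scored with standard Scikit-learn metrics; in this sketch, model, X_test, and y_test stand in for the trained model and the test data.

```python
# Sketch: evaluate a classification run with standard Scikit-learn metrics.
# `model`, `X_test`, and `y_test` are placeholders for the trained model and held-out data.
from sklearn.metrics import confusion_matrix, roc_auc_score

y_pred = model.predict(X_test)
y_scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print(confusion_matrix(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, y_scores))
```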
While the training process is running in the target compute environment, the progress can be monitored from the Azure Portal. It can also be monitored within the notebook through the RunDetails API.
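Within the notebook, monitoring boils down to a couple of calls; this is a minimal sketch assuming the run object from the submission above.

```python
# Sketch: monitor the submitted run from within the notebook.
from azureml.widgets import RunDetails

RunDetails(run).show()                     # live widget with metrics and logs
run.wait_for_completion(show_output=True)  # block until the run finishes
```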
Model Registration
The last step in training is registering the model with the Azure ML workspace so that it can be used for inference. This step uploads the finalized model from the training environment to the workspace.
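Registration itself is a single call; the model name and path in this sketch are placeholders.

```python
# Sketch: register the serialized model produced by the run with the workspace.
model = run.register_model(model_name="salary-model", model_path="outputs/model.pkl")
print(model.name, model.version)
```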
Model Inference
With the trained and evaluated model in place, it can be used for inference at scale. There are two steps to deploying the model: creating a container image and hosting the web service. Azure ML exposes APIs for both. The container image consists of the finalized model, the hyperparameters, the script to score the model, and a YAML file that defines the runtime dependencies. This self-contained Docker image has everything it takes to perform inference from the trained model.
Finally, the image can be deployed to a hosting environment where it is exposed as a web service. The target environments for hosting the web service include Azure Container Instances (ACI), Azure Kubernetes Service (AKS), Azure IoT Edge, and field-programmable gate arrays (FPGAs).
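Under the preview SDK, the two deployment steps might look roughly like the sketch below, assuming a scoring script (score.py) and a conda dependencies file (myenv.yml) have already been written; the service name and sizing are illustrative.

```python
# Sketch: package the registered model into a Docker image and deploy it to ACI.
from azureml.core.image import ContainerImage
from azureml.core.webservice import AciWebservice, Webservice

# Image configuration: scoring script plus runtime dependencies (assumed to exist).
image_config = ContainerImage.image_configuration(
    execution_script="score.py",
    runtime="python",
    conda_file="myenv.yml",
)

# Modest sizing for a test deployment on Azure Container Instances.
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Webservice.deploy_from_model(
    workspace=ws,
    name="salary-service",        # hypothetical service name
    models=[model],               # the model registered earlier
    image_config=image_config,
    deployment_config=aci_config,
)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```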
In one of the upcoming articles, we will build and host a model to predict the salary of a developer based on the Stack Overflow dataset using Azure ML. Stay tuned!
Microsoft is a sponsor of The New Stack.