Tutorial: Create Training and Inferencing Pipelines with Azure ML Designer

15 May 2020 8:31am

In the first part of this series, I introduced the concept of Azure ML Pipelines. In the current tutorial, we will explore Azure ML’s interactive designer to build training and inference pipelines for a simple machine learning model.

By the end of this tutorial, we will build a binary classification/logistic regression model to predict whether or not a patient has diabetes based on certain diagnostic measurements included in the dataset.

For background on Azure ML, refer to this article and tutorial.

Create an Azure Resource Group and ML Workspace

Start by creating a new ML workspace in one of the supported Azure regions. Make sure you choose the enterprise edition of the workspace, as the designer is not available in the basic edition.

Once the workspace is ready, click the Launch now button to switch to the new Azure ML studio interface, which includes the designer.

Create the Dataset

For this tutorial, we will use the Pima Indian Diabetes dataset published at the University of California Irvine Machine Learning Repository. You can download the CSV file from Kaggle.

Once you have downloaded the CSV file, register it as a dataset with Azure ML. Click the Datasets link in the left navigation bar and choose the option to create a dataset from local files.

Upload the file from your local drive. In the settings and preview section, make sure you choose “use headers from the first file” so that the first row is treated as column names.

Finally, click on the create button to finish registering the dataset.
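Before uploading, it can help to confirm that the first row of the file really is a header row, since the header setting relies on it. The following is a minimal sketch in plain Python. The column names are an assumption based on the Kaggle copy of the dataset (only Outcome is named in this tutorial), and the sample row is made up for illustration.

```python
import csv
import io

# Assumed column names from the Kaggle copy of the dataset; only "Outcome"
# is named in this tutorial, so adjust the list if your file differs.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def read_header(csv_text):
    """Return the first row of the CSV text as a list of column names."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


def header_matches(csv_text, expected=EXPECTED_COLUMNS):
    """Return True when the first row equals the expected column names."""
    return read_header(csv_text) == list(expected)


# Illustrative sample only, not real patient data.
SAMPLE = (
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,25,80,28.5,0.45,33,0\n"
)

if __name__ == "__main__":
    print(header_matches(SAMPLE))
```

To check a real file, read it with `open(path, newline="").read()` and pass the text to `header_matches`.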

Building the Training Pipeline

With the dataset in place, let’s now build the training pipeline. From the designer environment, choose the first option to create a new pipeline draft.

Rename the pipeline and click Select compute target to create the cluster that will be used for training.

The compute target will be used for various stages of the pipeline including data preparation, transformation, training, scoring, and evaluation.

Let’s create a two-node cluster with a pre-defined configuration. We will call this demo-cluster.

Wait for the compute cluster to become available. It may take a few minutes.

Let’s start by defining the dataset used for the pipeline. Drag the diabetes dataset registered in the previous step onto the canvas. It is available in the Datasets tab of the palette, under the My Datasets section.

Now, drag the Select Columns in Dataset module, available under the Data Transformation tab, onto the canvas.

In the select columns dialogue, choose all columns.

Next, create the training and test datasets by splitting the data with the Split Data module. Set the fraction of rows to 0.7 so that 70% of the data goes to the training dataset and the remaining 30% to the test dataset.

We are now ready to train, score, and evaluate the model. Drag the following modules onto the canvas, then connect them so that the algorithm and the training split feed Train Model, the trained model and the test split feed Score Model, and the scored output feeds Evaluate Model:
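To make the 0.7 fraction concrete, here is a rough Python sketch of a 70/30 split. It assumes a randomized split with a fixed seed; the function name and the seed value are illustrative and are not part of the designer.

```python
import random


def split_rows(rows, fraction=0.7, seed=42):
    """Shuffle the rows and split them into (train, test) by the fraction."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # round() avoids losing a row to floating-point truncation.
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = split_rows(range(100))
    print(len(train), len(test))
```

Every row lands in exactly one of the two lists, so the test set stays unseen during training.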

  • Two-Class Logistic Regression (Machine Learning Algorithms)
  • Train Model (Model Training)
  • Score Model (Model Scoring & Evaluation)
  • Evaluate Model (Model Scoring & Evaluation)

Set the label column for the train module to Outcome. This is the value we want the model to predict.
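For readers curious about what a two-class logistic regression learns, the sketch below fits one with plain batch gradient descent. It is a simplified illustration, not the designer module's actual implementation. The single standardized feature and its values are made up, and the label plays the role of Outcome.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(weights, bias, x):
    """Probability that the sample x belongs to the positive class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_probability(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Made-up, standardized single-feature data; the label stands in for Outcome.
TOY_FEATURES = [[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]]
TOY_LABELS = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    w, b = train_logistic_regression(TOY_FEATURES, TOY_LABELS)
    print(predict_probability(w, b, [1.0]) > 0.5)
```

The label column is the only value the model is fitted to predict, which is why the train module needs to know which column it is.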

Click Submit to run the pipeline, and create a new experiment for the run when prompted.

Wait for the run to finish. A green check mark on each module indicates that it ran successfully.

Right-click on the evaluate model module to visualize the accuracy metrics from the scored model.
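The metrics for a binary classifier are derived from the counts of correct and incorrect predictions on the test data. As a rough sketch of how accuracy, precision, recall, and F1 are computed from actual and scored labels (the label lists below are made up for illustration):

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0 and 1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(actual, predicted):
    """Compute accuracy, precision, recall, and F1 from the labels."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
ACTUAL = [1, 0, 1, 1, 0, 0, 1, 0]
SCORED = [1, 0, 0, 1, 0, 1, 1, 0]

if __name__ == "__main__":
    print(classification_metrics(ACTUAL, SCORED))
```

For a medical screening problem such as this one, recall often deserves as much attention as accuracy, since a false negative means a missed diagnosis.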

We are now ready to create the inference pipeline.

Create Inference Pipeline

In this step, we will create a REST endpoint for predicting the outcome from the model. Azure ML designer does the heavy lifting of creating the pipeline that deploys and exposes the model.

Click the Create inference pipeline button and choose Real-time inference pipeline. This creates a new draft pipeline on the canvas.

Click on submit and choose the same experiment used for training.

Wait for the pipeline run to finish.

We need a compute environment to host the model. Before publishing the pipeline, let’s create an Azure Kubernetes Service (AKS) cluster which will become the compute target for the inference pipeline.

Navigate to the compute section of the workspace and choose inference clusters. Create a Kubernetes cluster with the desired configuration. Don’t forget to select Dev-test as the environment. This will enable us to deploy models with fewer CPU cores.

Click on deploy and choose the AKS cluster as the compute target.

In a few minutes, the model gets deployed in the AKS cluster. It can be accessed through the REST endpoint from any client that supports sending an HTTP request.

Navigate to the Endpoints section of the workspace to find the diabetes-real-time-inference endpoint created by the designer.


We can now invoke the endpoint by sending an HTTP payload that matches the schema of the dataset.

Create a file with the below JSON payload and send it via curl.
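As an alternative to curl, the same request can be sketched in Python with the standard library. Several things here are assumptions: the `Inputs`/`WebServiceInput0` wrapper follows the request shape the designer typically generates, the column names come from the Kaggle copy of the dataset, the patient values are made up, and the URL and key are hypothetical placeholders. Check the endpoint's Consume tab for the exact schema, scoring URI, and key.

```python
import json
import urllib.request

# Hypothetical placeholders; replace them with the values shown for your
# endpoint in Azure ML studio.
SCORING_URI = "http://example.invalid/api/v1/service/diabetes-real-time-inference/score"
API_KEY = "replace-with-your-key"

# Made-up measurements. Outcome is included only to satisfy the schema.
SAMPLE_PATIENT = {
    "Pregnancies": 2, "Glucose": 120, "BloodPressure": 70,
    "SkinThickness": 25, "Insulin": 80, "BMI": 28.5,
    "DiabetesPedigreeFunction": 0.45, "Age": 33, "Outcome": 0,
}


def build_request(patient, uri=SCORING_URI, key=API_KEY):
    """Build (but do not send) a POST request carrying one patient row."""
    # Assumed request shape; verify it against the endpoint's sample request.
    payload = {
        "Inputs": {"WebServiceInput0": [patient]},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        uri,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + key,
        },
        method="POST",
    )


def send(request):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    req = build_request(SAMPLE_PATIENT)
    print(req.get_method(), req.full_url)
    # print(send(req))  # uncomment once SCORING_URI and API_KEY are real
```

Building the request separately from sending it makes it easy to inspect the payload before any network call is made.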

You will get the label and scored probability from the web service. Note that we need to include the label, Outcome, in the request to meet the schema requirements. However, Azure ML ignores that value during inference.

In the next part of this series, we will use the Python SDK to create the pipelines. Stay tuned.
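A small helper can pull the two values of interest out of the response. The field names used below (`Results`, `WebServiceOutput0`, `Scored Labels`, `Scored Probabilities`) are assumptions based on the response shape designer endpoints typically return, and the sample response is fabricated for illustration, so compare it with what your endpoint actually sends back.

```python
import json


def extract_prediction(response_text, output_name="WebServiceOutput0"):
    """Return (scored_label, scored_probability) for the first result row."""
    body = json.loads(response_text)
    # Assumed response layout; adjust the keys if your endpoint differs.
    row = body["Results"][output_name][0]
    return row["Scored Labels"], row["Scored Probabilities"]


# Fabricated response used only to exercise the parser.
SAMPLE_RESPONSE = json.dumps({
    "Results": {
        "WebServiceOutput0": [
            {"Glucose": 120, "Age": 33, "Outcome": 0,
             "Scored Labels": 1, "Scored Probabilities": 0.82}
        ]
    }
})

if __name__ == "__main__":
    label, probability = extract_prediction(SAMPLE_RESPONSE)
    print(label, probability)
```

Note that the echoed Outcome value and the scored label can differ, because the Outcome sent in the request is only a schema placeholder.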
