Build and Deploy a Machine Learning Model with Azure ML Service

In this tutorial, we will build and deploy a machine learning model that predicts salary from the Stack Overflow dataset. By the end of it, you will be able to invoke a RESTful web service to get the predictions.
Since the objective is to demonstrate the workflow, we will use a simple two-column dataset with years of experience and salary for the experiment. For details on the dataset, refer to my previous article on linear regression.
Prerequisites
- Basic knowledge of Python and Scikit-learn
- Active Microsoft Azure Subscription
- Anaconda or Miniconda
Configuring the Development Environment
Configure a virtual environment with the Azure ML SDK. Run the commands below to install the Python SDK and launch Jupyter Notebook. Then start a new Python 3 kernel from Jupyter.
    $ conda create -n aml -y python=3.6
    $ conda activate aml
    $ conda install nb_conda
    $ pip install azureml-sdk[notebooks]
    $ jupyter notebook
Initializing Azure ML Environment
Let’s start by importing all the required Python modules, which include standard Scikit-learn modules and the Azure ML modules.
    import datetime
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone package
    import joblib

    import azureml.core
    from azureml.core import Workspace
    from azureml.core.model import Model
    from azureml.core import Experiment
    from azureml.core.webservice import Webservice
    from azureml.core.image import ContainerImage
    from azureml.core.webservice import AciWebservice
    from azureml.core.conda_dependencies import CondaDependencies
We need to create an Azure ML Workspace that acts as the logical boundary for our experiment. A Workspace creates a Storage Account for storing the dataset, a Key Vault for secrets, a Container Registry for maintaining the image repositories, and Application Insights for logging the metrics.
Don’t forget to replace the empty subscription_id placeholder with your subscription ID.
    ws = Workspace.create(name='salary',
                          subscription_id='',
                          resource_group='mi2',
                          create_resource_group=True,
                          location='southeastasia')
After a few minutes, we will see the resources created within the Workspace.
We can now create an Experiment to start logging the metrics. Since we don’t have many parameters to log, we are capturing the start time of the training process.
    exp = Experiment(workspace=ws, name='salexp')
    run = exp.start_logging()
    run.log("Experiment start time", str(datetime.datetime.now()))
Training and Testing the Scikit-learn ML Model
We will now proceed to train and test the model through Scikit-learn.
    sal = pd.read_csv('data/sal.csv', header=0, index_col=None)
    X = sal[['x']]
    y = sal['y']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=10)

    lm = LinearRegression()
    lm.fit(X_train, y_train)
The trained model will be serialized as a pickle file in the outputs directory. Azure ML automatically copies the content of the outputs directory to the cloud.
    import os
    os.makedirs('outputs', exist_ok=True)  # create the directory if it is missing
    filename = 'outputs/sal_model.pkl'
    joblib.dump(lm, filename)
Let’s complete the experiment by logging the slope, intercept, and the end time of the training job.
    run.log('Intercept :', lm.intercept_)
    run.log('Slope :', lm.coef_[0])

    run.log("Experiment end time", str(datetime.datetime.now()))
    run.complete()
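For intuition about the two values being logged, the slope and intercept of a one-feature linear regression have a closed form. The sketch below computes them in plain Python. The data points are made up for illustration and are not taken from sal.csv, and fit_line is a hypothetical helper, not part of Scikit-learn or Azure ML.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data only: years of experience vs. salary (made-up numbers).
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40000.0, 50000.0, 60000.0, 70000.0, 80000.0]

slope, intercept = fit_line(years, salary)
print(slope, intercept)
```

On this perfectly linear toy data the fit recovers a slope of 10000 and an intercept of 30000. For the real dataset, lm.coef_[0] and lm.intercept_ play the same roles.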
We can track the metrics and the execution time from the Azure Dashboard.
Registering and Serving the Trained Model
Each time we freeze the model, it can be registered with Azure ML with a unique version. This gives us the ability to easily switch between different models when serving.
Let’s register the salary model from the above training job by pointing the SDK to the location of the pickle file. We also attach metadata to the model in the form of tags.
    model = Model.register(model_path="outputs/sal_model.pkl",
                           model_name="sal_model",
                           tags={"key": "1"},
                           description="Salary Prediction",
                           workspace=ws)
Check the Models section of the Workspace to ensure that our model is registered.
It’s time for us to package and deploy the model as a container image which will be exposed as a web service.
For the container image to get created, we need to tell Azure ML about the environment needed by the model. We will then pass a Python script that includes code to predict the values based on an inbound data point.
The Azure ML API provides handy methods for both. Let’s first create the environment file, salenv.yml, which tells the runtime to include Scikit-learn in the container image.
    salenv = CondaDependencies()
    salenv.add_conda_package("scikit-learn")

    with open("salenv.yml", "w") as f:
        f.write(salenv.serialize_to_string())

    with open("salenv.yml", "r") as f:
        print(f.read())
The below snippet, when executed from the Jupyter Notebook, creates a file called score.py that contains the inference logic for the model.
    %%writefile score.py
    import json
    import numpy as np
    # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone package
    import joblib

    from azureml.core.model import Model

    def init():
        global model
        # retrieve the path to the model file using the model name
        model_path = Model.get_model_path('sal_model')
        model = joblib.load(model_path)

    def run(raw_data):
        data = np.array(json.loads(raw_data)['data'])
        # make prediction
        y_hat = model.predict(data)
        return json.dumps(y_hat.tolist())
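The contract between a client and score.py is plain JSON: the request body carries a data key holding a list of rows, and the response is a JSON list of predictions. The sketch below exercises that contract locally with a stand-in model. FakeModel and its coefficients are hypothetical, and plain lists replace NumPy arrays so that the example runs without dependencies.

```python
import json


class FakeModel:
    """Hypothetical stand-in for the trained model (made-up coefficients)."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def predict(self, rows):
        # Each row holds a single feature: years of experience.
        return [self.slope * row[0] + self.intercept for row in rows]


model = FakeModel(slope=10000.0, intercept=30000.0)


def run(raw_data):
    # Same shape as run() in score.py: parse the JSON, predict, return JSON.
    data = json.loads(raw_data)["data"]
    y_hat = model.predict(data)
    return json.dumps(y_hat)


request_body = json.dumps({"data": [[2.0], [5.0]]})
response_body = run(request_body)
print(response_body)
```

Each inner list is one row, so a single-feature model still expects a list of one-element lists rather than a flat list of numbers.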
Now, let’s connect the dots by passing the inference file and environment configuration to the image.
    %%time
    image_config = ContainerImage.image_configuration(execution_script="score.py",
                                                      runtime="python",
                                                      conda_file="salenv.yml")
When the service is deployed, this configuration results in a container image that shows up in the Images section of the Workspace.
We are all set to create the deployment configuration, which defines the target environment, and launch the model as a web service hosted in Azure Container Instances as a single-VM container. We may also choose AKS or an IoT Edge environment as the deployment target.
    aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                                   memory_gb=1,
                                                   tags={"data": "Salary", "method": "sklearn"},
                                                   description='Predict Stackoverflow Salary')

    service = Webservice.deploy_from_model(workspace=ws,
                                           name='salary-svc',
                                           deployment_config=aciconfig,
                                           models=[model],
                                           image_config=image_config)

    service.wait_for_deployment(show_output=True)
The Azure Resource Group now has an Azure Container Instance running the inference for the model.
We can get the URL of the inference service from the service’s scoring_uri property:
    print(service.scoring_uri)
Let’s go ahead and invoke the web service through cURL. We can do this from the same Jupyter Notebook.
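As a Python alternative to cURL, the request can be assembled with the standard library. This is a minimal sketch: the URL below is a placeholder rather than a real endpoint, build_scoring_request is a hypothetical helper, and the payload shape assumes the data key that score.py reads.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows):
    """Build a POST request carrying the rows as {"data": [...]} JSON."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; in the notebook, pass service.scoring_uri instead.
request = build_scoring_request("http://example.invalid/score", [[4.5]])
print(request.get_method(), request.full_url)

# Sending the request needs a live endpoint, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The value 4.5 stands for years of experience. The response body, once the call is made against a live service, is the JSON list returned by run() in score.py.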
You can access the dataset and Jupyter Notebook from the GitHub repo.
The uniqueness of this approach is that we can perform all the tasks from a Python kernel running inside the Jupyter Notebook. Developers can do everything it takes to train and deploy ML models from code. This is the real value of using an ML PaaS like Azure ML Service.