Tutorial: Train Machine Learning Models with the Automated ML Feature of Azure ML

This is the final part of our series on Azure ML, in which we explore the AutoML capabilities of the platform.
As in the last two tutorials (part 2 and part 3), we will work with the Pima Indians Diabetes dataset. But instead of training a logistic regression model ourselves, we will use the AutoML SDK to let Azure ML choose the best algorithm.
You can run the code in a Jupyter Notebook on your workstation, but the AutoML job itself runs on a remote compute cluster provisioned in Azure.
Follow the steps described in the previous tutorial to create a workspace in Azure and install the Python Azure ML SDK on your local development machine.
Start by importing the relevant Python modules.
import logging
import os

import azureml.core
from azureml.core import Workspace, Experiment, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.automl import AutoMLConfig
from pandas import read_csv
Let’s initialize the Azure ML workspace. Make sure you have the config.json file in the same directory.
ws = Workspace.from_config()

if not os.path.exists('project_folder'):
    os.makedirs('project_folder')
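As a quick sanity check, you can print a few properties of the workspace object to confirm the SDK is connected to the right workspace:

# Confirm we are connected to the expected workspace
print(ws.name, ws.resource_group, ws.location, sep='\n')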
Let’s load the CSV file, add headers and save it back.
filename = './data/pima-indians-diabetes.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = read_csv(filename, names=names)

# Write the file back with headers; index=False keeps the row index
# from being added as an extra column
dataframe.to_csv('./data/diabetes.csv', index=False)
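Before moving on, it is worth a quick look at the DataFrame to confirm the headers were applied:

# Sanity check: the Pima dataset has 768 rows and 9 columns
print(dataframe.shape)
print(dataframe.head())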
We can now upload the CSV file and register it as a dataset in Azure ML.
def_blob_store = ws.get_default_datastore()
def_blob_store.upload_files(["./data/diabetes.csv"], target_path="data", overwrite=True)

# The file now lives under the 'data' folder of the datastore
diabetes_data = Dataset.Tabular.from_delimited_files(def_blob_store.path('data/diabetes.csv'))
diabetes_data = diabetes_data.register(ws, 'diabetes_data', create_new_version=True)
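To verify the registration, you can fetch the dataset back from the workspace by name and materialize a few rows as a Pandas DataFrame:

# Retrieve the registered dataset and inspect a few rows
check = Dataset.get_by_name(ws, 'diabetes_data')
print(check.to_pandas_dataframe().head())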
With the dataset in place, we will now define the compute target, an Azure ML compute cluster.
aml_compute_target = "demo-cluster"
try:
    aml_compute = AmlCompute(ws, aml_compute_target)
    print("found existing compute target.")
except ComputeTargetException:
    print("creating new compute target")
    provisioning_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                                min_nodes=1,
                                                                max_nodes=4)
    aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)
    aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
print("Azure Machine Learning Compute attached")
The next step is to define the AutoML configuration.
automl_settings = {
    "iteration_timeout_minutes": 1,
    "iterations": 20,
    "primary_metric": 'accuracy',
    "featurization": 'auto',   # featurization expects 'auto' or 'off', not a boolean
    "verbosity": logging.INFO,
    "n_cross_validations": 2
}
This instructs Azure AutoML to run 20 iterations, each trying a different algorithm and capped at one minute, before settling on the best one. Candidate models are ranked by the accuracy metric, evaluated with two-fold cross-validation.
We are now ready to kick off the AutoML job.
automl_config = AutoMLConfig(task='classification',
                             debug_log='automated_ml_errors.log',
                             path="./project_folder",
                             compute_target=aml_compute_target,
                             training_data=diabetes_data,
                             label_column_name="class",
                             **automl_settings)
experiment = Experiment(ws, "diabetes-experiment")
remote_run = experiment.submit(automl_config, show_output=True)
We choose classification as the task type and point to the column that represents the label in the dataset. This step connects the dots across the dataset, the compute target, and the problem type.
The whole process takes about 30 minutes. With show_output=True, the progress of each iteration is streamed to the Jupyter Notebook cell.
We can also use the RunDetails widget from the Azure ML Python SDK, which provides insight into the training job.
from azureml.widgets import RunDetails
RunDetails(remote_run).show()
Finally, you can retrieve the best model based on accuracy.
best_run, fitted_model = remote_run.get_output()
print(best_run)
print(fitted_model)
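The fitted model behaves like a scikit-learn pipeline, so you can try it out on a few local samples. Here is a minimal sketch, assuming the ./data/diabetes.csv file created earlier is still available:

# Illustrative only: score a few rows with the best model
sample = read_csv('./data/diabetes.csv').head(5)
predictions = fitted_model.predict(sample.drop(columns=['class']))
print(predictions)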
Follow the steps from this tutorial to register and deploy the model.
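As a starting point, the model artifact can be registered straight from the best run. A minimal sketch; the model_path value below is an assumption based on the conventional AutoML artifact location, so inspect the run's outputs to confirm it:

# The artifact path is an assumption; list best_run.get_file_names()
# to confirm where the model was saved
model = best_run.register_model(model_name='diabetes-automl',
                                model_path='outputs/model.pkl')
print(model.name, model.version)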
Janakiram MSV’s Webinar series, “Machine Intelligence and Modern Infrastructure (MI2)” offers informative and insightful sessions covering cutting-edge technologies. Sign up for the upcoming MI2 webinar at http://mi2.live.
Feature photo by Lucas Myers on Unsplash.