Machine Learning

Tutorial: Use the Amazon SageMaker Python SDK to Train AutoML Models with Autopilot

28 Feb 2020 8:23am

In the last tutorial, we saw how to use Amazon SageMaker Studio to create models with Autopilot.

In this installment, we will take a closer look at the Python SDK to script an end-to-end workflow to train and deploy a model. We will use batch inferencing and store the output in an Amazon S3 bucket.

The walkthrough is based on the same dataset and problem type discussed in the previous tutorial.

Follow the steps mentioned in the previous tutorial to configure and set up the environment for Autopilot. Launch a new Jupyter notebook to run the Python code that uses the SDK.

This step initializes the environment and returns the default S3 bucket associated with SageMaker.
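A minimal sketch of that initialization cell, assuming the SageMaker Python SDK is installed in the notebook environment (the `prefix` value is an arbitrary name chosen for this walkthrough, not something from the service):

```python
import sagemaker

# Create a SageMaker session; the default bucket is created on first use
# and is named sagemaker-<region>-<account-id>.
session = sagemaker.Session()
bucket = session.default_bucket()
region = session.boto_region_name
prefix = "autopilot-tutorial"  # hypothetical S3 key prefix for this walkthrough

print(f"Using bucket {bucket} in region {region}")
```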

We downloaded the dataset from datahub.io.

This verifies the dataset and displays it in a grid.
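Loading and inspecting the data can be sketched with pandas; the URL below is a placeholder, not the actual datahub.io location used in the tutorial:

```python
import pandas as pd

# Placeholder URL -- substitute the actual datahub.io CSV for the dataset.
DATASET_URL = "https://datahub.io/<publisher>/<dataset>/r/data.csv"

df = pd.read_csv(DATASET_URL)
print(df.shape)   # quick sanity check on rows and columns
df.head(10)       # renders as a grid in a Jupyter notebook
```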

We split the dataset and upload it to an S3 bucket.
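A sketch of the split-and-upload step. The label column name `y` is an assumption; a tiny stand-in DataFrame shows the call shape, whereas the notebook would use the DataFrame loaded in the previous cell:

```python
import pandas as pd

def split_dataset(df, train_frac=0.8, seed=42):
    """Shuffle a DataFrame and split it into train and test partitions."""
    shuffled = df.sample(frac=1.0, random_state=seed)
    cutoff = int(len(shuffled) * train_frac)
    return shuffled.iloc[:cutoff], shuffled.iloc[cutoff:]

# Stand-in data; in the notebook, use the DataFrame loaded earlier.
df = pd.DataFrame({"age": [25, 32, 47, 51], "y": ["no", "yes", "no", "yes"]})
train_df, test_df = split_dataset(df, train_frac=0.75)

train_df.to_csv("train.csv", index=False)
# The batch inference input should not contain the label column
# ("y" is an assumed column name):
test_df.drop(columns=["y"]).to_csv("test.csv", index=False, header=False)

# Then upload with the SageMaker session from the first step:
# train_uri = session.upload_data("train.csv", key_prefix=f"{prefix}/input")
# test_uri  = session.upload_data("test.csv",  key_prefix=f"{prefix}/input")
```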

Now that the dataset is ready, we will define the input, output, and job configuration of an Autopilot experiment.

This cell contains the most critical parameters of an Autopilot experiment. It specifies where the dataset is located, the target label, where the final artifacts will be uploaded, the completion criteria for the job, the problem type, and the metric used to evaluate model performance.
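As a sketch, the configuration might look like the following; the job name, S3 paths, label column `y`, candidate cap, and `F1` objective are all assumptions carried over from the earlier steps, not values from the article:

```python
auto_ml_job_name = "automl-tutorial-job"  # hypothetical job name

input_data_config = [
    {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://<bucket>/autopilot-tutorial/input/train.csv",
            }
        },
        "TargetAttributeName": "y",  # the label column (assumed name)
    }
]

output_data_config = {"S3OutputPath": "s3://<bucket>/autopilot-tutorial/output"}

# Cap the number of candidate pipelines so the job finishes quickly.
automl_job_config = {"CompletionCriteria": {"MaxCandidates": 5}}

problem_type = "BinaryClassification"
job_objective = {"MetricName": "F1"}
```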

With the configuration in place, we will create an AutoML job.
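Creating the job is a single call through the low-level boto3 SageMaker client, reusing the configuration objects described above; the role ARN below is a placeholder (in a notebook, `sagemaker.get_execution_role()` returns it):

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder -- an IAM role with SageMaker permissions.
role_arn = "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"

sm.create_auto_ml_job(
    AutoMLJobName=auto_ml_job_name,
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    AutoMLJobConfig=automl_job_config,
    ProblemType=problem_type,
    AutoMLJobObjective=job_objective,
    RoleArn=role_arn,
)
```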

This cell will continue to print the status of the job every 30 seconds.
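The polling loop can be sketched as below; the `describe_fn` callable stands in for the `describe_auto_ml_job` API call so the loop itself is easy to test:

```python
import time

def wait_for_automl_job(describe_fn, poll_seconds=30):
    """Poll an Autopilot job until it leaves the InProgress state.

    describe_fn: zero-argument callable returning a
    describe_auto_ml_job-style response dict.
    """
    while True:
        response = describe_fn()
        status = response["AutoMLJobStatus"]
        print(status, "-", response.get("AutoMLJobSecondaryStatus", ""))
        if status in ("Completed", "Failed", "Stopped"):
            return status
        time.sleep(poll_seconds)

# In the notebook:
# wait_for_automl_job(
#     lambda: sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name))
```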

Once the job is complete, we can retrieve the data exploration notebook, candidate definition notebook, and the name of the candidate with the best model.
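A sketch of that retrieval, assuming the `sm` client and job name from the earlier cells; the notebook locations and best candidate come straight out of the `describe_auto_ml_job` response:

```python
job = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)

artifacts = job["AutoMLJobArtifacts"]
data_notebook = artifacts["DataExplorationNotebookLocation"]
candidate_notebook = artifacts["CandidateDefinitionNotebookLocation"]

best_candidate = job["BestCandidate"]
print("Best candidate:", best_candidate["CandidateName"])
print("Objective value:",
      best_candidate["FinalAutoMLJobObjectiveMetric"]["Value"])
```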

This will download the Jupyter notebooks from the S3 bucket to the local environment.
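One way to sketch the download, assuming the two notebook S3 URIs retrieved in the previous step and the AWS CLI available in the notebook environment:

```python
import subprocess

# Copy both generated notebooks from S3 to the local working directory.
# In Jupyter, the equivalent shell form is: !aws s3 cp <uri> .
for uri in (data_notebook, candidate_notebook):
    subprocess.run(["aws", "s3", "cp", uri, "."], check=True)
```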

In the next few steps, we will create a model from the best candidate, deploy it, and perform batch inferencing.
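Creating the model can be sketched as follows (the model name is a hypothetical choice). The best candidate carries the full inference pipeline, feature transforms plus the algorithm, as a list of containers:

```python
model_name = "automl-tutorial-best-model"  # hypothetical name

sm.create_model(
    ModelName=model_name,
    Containers=best_candidate["InferenceContainers"],
    ExecutionRoleArn=role_arn,
)
```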

To perform batch inferencing, we need to transform the test dataset stored in the S3 bucket and send it to the model.
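A sketch of the batch transform job; the job name, S3 paths, and instance type are assumptions, and the input must be the label-free test CSV uploaded earlier:

```python
transform_job_name = "automl-tutorial-transform"  # hypothetical name

sm.create_transform_job(
    TransformJobName=transform_job_name,
    ModelName=model_name,
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://<bucket>/autopilot-tutorial/input/test.csv",
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",  # send the CSV line by line to the model
    },
    TransformOutput={"S3OutputPath": "s3://<bucket>/autopilot-tutorial/transform"},
    TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
)
```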

Wait until the job status shows Completed.
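Rather than hand-rolling another polling loop, boto3 ships a waiter for this; a sketch, assuming the transform job name from the previous cell:

```python
# Blocks until the transform job reaches Completed or Stopped
# (raises WaiterError on failure).
waiter = sm.get_waiter("transform_job_completed_or_stopped")
waiter.wait(TransformJobName=transform_job_name)
```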

We can now download and print the output from the inferencing job.
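A sketch of fetching the result; batch transform writes one `.out` file per input file, and the bucket name and key layout below are assumptions matching the output path used earlier:

```python
import boto3

s3 = boto3.client("s3")

# Assumed key layout: <output-prefix>/<input-filename>.out
output_key = "autopilot-tutorial/transform/test.csv.out"
s3.download_file("<bucket>", output_key, "test.csv.out")

with open("test.csv.out") as f:
    print(f.read())
```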

This step concludes the tutorial on using the Amazon SageMaker Python SDK to train models with Autopilot.

Janakiram MSV’s Webinar series, “Machine Intelligence and Modern Infrastructure (MI2)” offers informative and insightful sessions covering cutting-edge technologies. Sign up for the upcoming MI2 webinar at http://mi2.live.

Amazon Web Services is a sponsor of The New Stack.

Feature image via Pixabay.
