Review: Build an ML Model with Amazon SageMaker Canvas

23 Dec 2021 3:00am, by

Last month, at its annual re:Invent user conference, Amazon Web Services launched a new machine learning service specifically built for non-developers.

Built on Amazon SageMaker, Amazon SageMaker Canvas is a new visual, no-code capability designed for business analysts to build ML models and generate predictions through a user interface with minimal coding, or so the company claims.

In this post, we will do a hands-on evaluation of Amazon SageMaker Canvas. Follow along to train a binary classification model.

Step 1: Preparing the Dataset

For this tutorial, we will use the open source bank marketing dataset, which is available under a Creative Commons CC0: Public Domain license. We will replace the header row for clarity.

Let’s download the CSV file and update the header.

CSV file
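The header replacement can also be scripted. Here is a minimal sketch using only the Python standard library. The column names are illustrative assumptions (the only name this tutorial relies on is the last one, Deposit), and the file names in the usage note are placeholders, so adjust both to your copy of the dataset.

```python
import csv

# Hypothetical column names; only "Deposit" (the label) is used in this tutorial.
NEW_HEADER = [
    "Age", "Job", "Marital", "Education", "Default", "Balance", "Housing",
    "Loan", "Contact", "Day", "Month", "Duration", "Campaign", "Pdays",
    "Previous", "Poutcome", "Deposit",
]


def replace_header(src_path, dst_path, new_header):
    """Copy a CSV file, swapping its first row for new_header.

    Returns the old header so the caller can check what was replaced.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        old_header = next(reader)
        # Refuse to write a header that does not match the column count.
        if len(old_header) != len(new_header):
            raise ValueError(
                f"expected {len(old_header)} column names, got {len(new_header)}"
            )
        writer.writerow(new_header)
        writer.writerows(reader)  # data rows are copied unchanged
    return old_header
```

For example, `replace_header("bank.csv", "bank-renamed.csv", NEW_HEADER)` would write a renamed copy next to the original, assuming the downloaded file has 17 columns.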

The last column, Deposit, is our label. It indicates whether a customer signed up for a term deposit. The value 1 represents a negative outcome (no deposit is made at the bank) and 2 represents a positive outcome (a deposit is made at the bank).
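Before training, it can help to check how the two label values are distributed. The sketch below counts them, assuming the label column is named Deposit and stores the values 1 and 2 as described above. The file name in the usage note is a placeholder.

```python
import csv
from collections import Counter

# Label encoding as described in this tutorial: 1 = no deposit, 2 = deposit.
LABEL_NAMES = {"1": "no deposit", "2": "deposit"}


def label_distribution(csv_path, label_column="Deposit"):
    """Return a dict mapping each label (by readable name) to its row count."""
    with open(csv_path, newline="") as f:
        counts = Counter(row[label_column] for row in csv.DictReader(f))
    # Unknown label values are kept under their raw value.
    return {LABEL_NAMES.get(value, value): n for value, n in counts.items()}
```

Calling `label_distribution("bank-renamed.csv")` returns the number of rows per outcome, which gives useful context for reading the accuracy figure later on.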

We will upload this dataset to Amazon S3. Replace the bucket name with yours.

upload the dataset.
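If you prefer to upload from code rather than the command line, the following sketch uses the boto3 S3 client's `upload_file` call. The bucket name, object key, and local file name are placeholders, and the helper names are our own, not part of any AWS tooling.

```python
def upload_dataset(s3_client, local_path, bucket, key):
    """Upload a local file to S3 and return its s3:// URI."""
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


def make_s3_client():
    """Create an S3 client from the default AWS credentials chain."""
    # Imported lazily so upload_dataset can be exercised without boto3 installed.
    import boto3

    return boto3.client("s3")


# Usage (placeholder bucket and key; replace with yours):
# uri = upload_dataset(make_s3_client(), "bank-renamed.csv",
#                      "my-canvas-bucket", "canvas/bank-renamed.csv")
```

Passing the client in as an argument keeps the helper easy to test with a stub and lets you reuse one client across calls.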

Verify the object by accessing the bucket in AWS Console.

Verify the object set in S3.
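The same check can be done programmatically. This sketch assumes a boto3 S3 client and uses its `head_object` call, which fetches an object's metadata without downloading it. If the object does not exist, boto3 raises a `ClientError`. The bucket and key in the usage comment are placeholders.

```python
def object_size(s3_client, bucket, key):
    """Return the size in bytes of an S3 object, read from its metadata."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["ContentLength"]


# Usage (placeholder bucket and key; requires boto3 and AWS credentials):
# import boto3
# size = object_size(boto3.client("s3"), "my-canvas-bucket", "canvas/bank-renamed.csv")
# print(f"{size} bytes uploaded")
```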

Step 2: Import the Dataset

We will now import the CSV file into Amazon SageMaker Canvas to create a dataset. If you have a SageMaker Domain provisioned, you can launch Canvas from there.

Launch Canvas

SageMaker Canvas has four steps, which are explained in the splash screen that shows up when we launch the environment.

Canvas splash screen.

Navigate to the Datasets section in the left navigation bar and click on Import.

import dataset

Import the CSV file we uploaded to the S3 bucket to create the dataset.

dataset ui

Import UI

Step 3: Build the Model

Create a new model and give it a meaningful name. Select the dataset in the next step.

select dataset

Choose the Deposit column as the target. The model type automatically switches to 2 category prediction (binary classification).

select a column to predict
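To see why the Deposit column leads to a two-category model, consider the rough heuristic below. It is only an illustration of the idea (a target with exactly two distinct values is a binary classification problem) and is not Canvas's actual selection logic, which is not documented in this post.

```python
def infer_model_type(target_values):
    """Guess a model type from the values in the target column.

    Illustrative heuristic only; not how Canvas decides internally.
    """
    # Ignore blanks so missing labels do not count as a category.
    distinct = {v for v in target_values if v not in ("", None)}
    if len(distinct) < 2:
        raise ValueError("target column needs at least two distinct values")
    if len(distinct) == 2:
        return "2 category prediction"
    return "other"
```

With the Deposit column, whose values are 1 and 2, this returns "2 category prediction".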

Clicking on any column shows the details and quality of data.

data quality details.
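You can compute a similar per-column summary yourself before importing the file. The sketch below counts rows, missing values, and distinct values for one column using the standard library. It is our own approximation of a data quality check, not a reproduction of what Canvas calculates.

```python
import csv


def column_summary(csv_path, column):
    """Count rows, missing values, and distinct values for one column."""
    rows = missing = 0
    distinct = set()
    with open(csv_path, newline="") as f:
        for record in csv.DictReader(f):
            rows += 1
            # Short rows yield None for absent fields; treat them as missing.
            value = (record[column] or "").strip()
            if value == "":
                missing += 1
            else:
                distinct.add(value)
    return {"rows": rows, "missing": missing, "unique": len(distinct)}
```

For example, `column_summary("bank-renamed.csv", "Deposit")` should report two unique values if the label column is clean (the file name is a placeholder).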

Choose Standard build and start the training job. This will give us an option to access the model from SageMaker Studio.

It will take a couple of hours for the model to become ready. You can track the progress on the same page.

Step 4: Analyze the Model

Once the model is ready, you can analyze it through metrics such as the F1 score, ROC/AUC, and the confusion matrix.

The basic analysis shows that the model has an accuracy of 90%, which is impressive.

Basic analysis.

The screenshot below shows the confusion matrix of the model, along with its F1 and AUC scores. F1 is the harmonic mean of precision and recall, and AUC (area under the ROC curve) measures how well the model separates the two classes.

F1 and AUC scores.
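To make these metrics concrete, here is a small sketch that derives accuracy, precision, recall, and F1 from the four cells of a confusion matrix, and computes AUC as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The function names are our own, and the counts in the usage note are made up for illustration, not taken from the Canvas model.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F1 from confusion matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(labels, scores, positive_label=2):
    """AUC via pairwise comparison of positive and negative scores.

    Ties count as half a win. This is quadratic in the number of rows,
    so it is meant for small samples only.
    """
    positives = [s for y, s in zip(labels, scores) if y == positive_label]
    negatives = [s for y, s in zip(labels, scores) if y != positive_label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

With the made-up counts `tp=5, fp=5, fn=5, tn=85`, accuracy is 0.90 while F1 is 0.50, which is why the F1 and AUC figures are worth reading alongside the headline accuracy.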

Finally, we can perform a prediction through the UI by changing the values.

Perform a prediction.

Step 5: Access the Experiment from SageMaker Studio

Clicking the Share button generates a link that can be opened in SageMaker Studio.

Share with SageMaker Studio.

Opening the link in SageMaker Studio shows us all the trials performed by SageMaker Autopilot to choose the best candidate.

SageMaker trials.

It even shows the links to the artifacts generated during the trial.

Trial report.

Behind the scenes, SageMaker Canvas runs an Autopilot experiment on the structured dataset. You can download the Jupyter notebooks to explore the code for each trial.
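Because the trials are ordinary Autopilot candidates, they can also be listed from code. The sketch below uses the boto3 SageMaker client's `list_candidates_for_auto_ml_job` call and ranks the candidates by their final objective metric. The job name in the usage comment is a placeholder (look up the real one in SageMaker Studio), and the ranking assumes a metric where higher is better, such as F1 or AUC.

```python
def rank_candidates(sm_client, job_name):
    """Return (candidate name, objective metric value) pairs, best first.

    Assumes the objective metric is one where a higher value is better.
    """
    candidates = []
    kwargs = {"AutoMLJobName": job_name}
    # Follow NextToken until every page of candidates has been read.
    while True:
        page = sm_client.list_candidates_for_auto_ml_job(**kwargs)
        candidates.extend(page["Candidates"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    # Candidates that did not finish may lack a final metric; skip them.
    scored = [
        (c["CandidateName"], c["FinalAutoMLJobObjectiveMetric"]["Value"])
        for c in candidates
        if "FinalAutoMLJobObjectiveMetric" in c
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage (placeholder job name; requires boto3 and AWS credentials):
# import boto3
# for name, value in rank_candidates(boto3.client("sagemaker"), "my-canvas-job"):
#     print(name, value)
```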

This concludes the tutorial on Amazon SageMaker Canvas, the no-code tool for training ML models.