Prisma Cloud from Palo Alto Networks is sponsoring our coverage of AWS re:Invent 2021.
Amazon SageMaker Canvas, built on Amazon SageMaker, is a new visual, no-code capability designed for business analysts to build ML models and generate predictions through a user interface with minimal coding, or so the company claims.
In this post, we will do a hands-on evaluation of Amazon SageMaker Canvas. Follow along to train a binary classification model.
Step 1: Preparing the Dataset
For this tutorial, we will use the open-source bank marketing dataset, which is available under a Creative Commons CC0: Public Domain license. We will replace the header row for clarity.
Let’s download the CSV file and update the header.
wget -O bank-marketing-raw.csv https://datahub.io/machine-learning/bank-marketing/r/bank-marketing.csv
sed -e 's/V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,Class.*/Age,Job,MaritalStatus,Education,Default,Balance,Housing,Loan,Contact,Day,Month,Duration,Campaign,PDays,Previous,POutcome,Deposit/' bank-marketing-raw.csv > bank-marketing.csv
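Before moving on, it is worth confirming that the header was actually replaced, since sed leaves the file unchanged if the pattern does not match. The helper below is a minimal sketch; `check_header` is a name made up for this tutorial, and it assumes a plain comma-separated header with no quoted fields.

```shell
# check_header is a hypothetical helper, not part of any AWS tooling.
# It prints the number of comma-separated columns in the first line of
# the given file, followed by the header line itself.
check_header() {
  head -n 1 "$1" | awk -F',' '{ print NF " columns: " $0 }'
}
```

Running `check_header bank-marketing.csv` should report 17 columns, starting with Age and ending with Deposit, if the substitution worked.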
The last column, Deposit, is our label, which represents whether a customer signed up for a term deposit. The value 1 represents a negative outcome (a deposit is not made at the bank) and 2 represents a positive outcome (a deposit is made at the bank).
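To see how the two label values are distributed, we can count them in the last column. This is a rough sketch: `label_counts` is a hypothetical helper, and it assumes that no field contains a quoted comma, so a simple comma split is safe.

```shell
# label_counts is a hypothetical helper for this tutorial.
# It skips the header row, tallies each distinct value found in the
# last column, and prints one "value count" pair per line.
label_counts() {
  awk -F',' 'NR > 1 { counts[$NF]++ } END { for (v in counts) print v, counts[v] }' "$1" | sort
}
```

Calling `label_counts bank-marketing.csv` prints one line for label 1 and one for label 2, which gives a quick sense of how balanced the classes are.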
We will upload this dataset to Amazon S3. Replace the bucket name with yours.
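One way to do the upload is with the AWS CLI. The sketch below assumes the CLI is installed and configured with credentials that can write to the bucket; `your-bucket-name` is a placeholder and `upload_dataset` is a hypothetical wrapper around the standard `aws s3 cp` command.

```shell
# BUCKET is a placeholder; replace it with the name of a bucket you own.
BUCKET="your-bucket-name"

# upload_dataset is a hypothetical wrapper for this tutorial.
# It copies a local file ($1) to the root of an S3 bucket ($2),
# keeping the original file name as the object key.
upload_dataset() {
  aws s3 cp "$1" "s3://$2/$(basename "$1")"
}
```

With the bucket name filled in, `upload_dataset bank-marketing.csv "$BUCKET"` performs the copy, and `aws s3 ls "s3://$BUCKET/"` lists the bucket contents from the command line.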
Verify that the object exists by opening the bucket in the AWS Console.
Step 2: Import the Dataset
We will now import this file into Amazon SageMaker Canvas to create a dataset. If you have a SageMaker Domain provisioned, you can launch Canvas from there.
SageMaker Canvas has four steps, which are explained in the splash screen that shows up when we launch the environment.
Navigate to the Datasets section in the left navigation bar and click on Import to load the CSV file we uploaded to the S3 bucket and create the dataset.
Step 3: Build the Model
Create a new model and give it a meaningful name. Select the dataset in the next step.
Select the Deposit column as the target. The model type automatically switches to 2 category prediction (binary classification).
Clicking on any column shows the details and quality of data.
Choose Standard build and start the training job. This gives us the option to access the model from SageMaker Studio.
It will take a couple of hours for the model to become ready. You can track the progress on the same page.
Step 4: Analyze the Model
Once the model is ready, you can analyze it through metrics such as the F1 score, ROC/AUC, and the confusion matrix.
The basic analysis shows that the model has an accuracy of 90%, which is impressive.
The screenshot below shows the model's confusion matrix along with its F1 and AUC scores. F1 is the harmonic mean of precision and recall, and AUC (Area Under the Curve) measures how well the model separates the two classes across classification thresholds.
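To make the relationship between the confusion matrix and these scores concrete, here is a small sketch that derives accuracy, precision, recall, and F1 from the four cells of a binary confusion matrix. `classification_metrics` is a hypothetical helper, not something Canvas provides, and it assumes the counts are such that no denominator is zero.

```shell
# classification_metrics is a hypothetical helper for this tutorial.
# Usage: classification_metrics TP FP FN TN
#   TP = true positives,  FP = false positives,
#   FN = false negatives, TN = true negatives.
classification_metrics() {
  awk -v tpos="$1" -v fpos="$2" -v fneg="$3" -v tneg="$4" 'BEGIN {
    precision = tpos / (tpos + fpos)              # share of predicted positives that are correct
    recall    = tpos / (tpos + fneg)              # share of actual positives that were found
    f1        = 2 * precision * recall / (precision + recall)
    accuracy  = (tpos + tneg) / (tpos + fpos + fneg + tneg)
    printf "accuracy=%.3f precision=%.3f recall=%.3f f1=%.3f\n", accuracy, precision, recall, f1
  }'
}
```

For example, with made-up counts, `classification_metrics 40 10 10 40` prints `accuracy=0.800 precision=0.800 recall=0.800 f1=0.800`. Plugging in the counts from the Canvas confusion matrix shows how the reported scores relate to each other.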
Finally, we can perform a prediction through the UI by changing the values.
Step 5: Access the Experiment from SageMaker Studio
The Share button generates a link that can be opened in SageMaker Studio.
Opening the link in SageMaker Studio shows us all the trials performed by SageMaker Autopilot to choose the best candidate.
It even shows the links to the artifacts generated during the trial.
Behind the scenes, SageMaker Canvas runs an Autopilot experiment on the structured dataset. You can download the Jupyter notebooks to explore the code for each trial.
This concludes the tutorial on Amazon SageMaker Canvas — the no-code tool for training ML models.
Amazon Web Services is a sponsor of The New Stack.