
Tutorial: Deploying TensorFlow Models with Amazon SageMaker Serverless Inference

22 Dec 2021 3:00am, by

This guide is the last part of a series covering Amazon SageMaker Studio Lab.

As we mentioned in previous posts, Amazon SageMaker Studio Lab is a standalone service that allows users to experiment with building machine learning models. It has no dependencies on Amazon Web Services itself. The environment is based on the popular and familiar JupyterLab notebooks. JupyterLab is the only thing Studio Lab has in common with the SageMaker Studio available from the AWS Console. Anyone with an email account can sign up for the service.

The service is completely free. Amazon has opened up an IDE and environment for building machine learning models with no strings attached. This may be the first AWS service that lives outside of the IAM realm with an infinite number of free tier hours.

Except for the branding, the service has almost nothing to do with SageMaker.

In previous posts, we explored the basics of SageMaker Studio Lab and SageMaker Serverless Inference. This tutorial takes the next step and shows how to publish serverless inference endpoints for TensorFlow models.

When you have a model trained within SageMaker Studio Lab or any other environment, you can host that model within the SageMaker Studio environment for inference at scale. If you have followed the steps to train the image classification model based on the cats vs. dogs dataset, you can extend the scenario to deploy the same model within the SageMaker Serverless Inference service.

SageMaker architecture.

Prerequisites

You need the following to complete this tutorial:

  1. AWS account
  2. Access Key and Secret Key of your AWS account
  3. SageMaker Execution Role

Follow the steps mentioned in the Amazon SageMaker documentation to create the SageMaker IAM role with the appropriate permissions required to deploy the model.

Step 1: Preparing the Environment

Amazon SageMaker Studio Lab comes with the AWS CLI, which can be used to configure the environment. For this tutorial, we will use the Jupyter notebook and AWS SDK for Python (Boto3) to configure the credentials expected by the SDK.

Run the commands below in a new notebook based on the tf2:python kernel created in the previous tutorial.

Don’t forget to replace the credentials with your own keys.

These commands configure the AWS environment expected by Boto3.
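A minimal sketch of this configuration step, assuming the default profile files under `~/.aws` are the mechanism used. Boto3 reads `aws_access_key_id` and `aws_secret_access_key` from `~/.aws/credentials` and `region` from `~/.aws/config`. The helper name `write_aws_config` and its `aws_dir` parameter are hypothetical and introduced only for illustration.

```python
import configparser
import os
from pathlib import Path


def write_aws_config(access_key, secret_key, region, aws_dir=None):
    """Write the [default] profile files that Boto3 looks up for credentials.

    Note: this overwrites any existing credentials and config files in aws_dir.
    """
    aws_dir = Path(aws_dir) if aws_dir else Path.home() / ".aws"
    aws_dir.mkdir(parents=True, exist_ok=True)

    # interpolation=None keeps special characters in the keys untouched.
    credentials = configparser.ConfigParser(interpolation=None)
    credentials["default"] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    credentials_path = aws_dir / "credentials"
    with open(credentials_path, "w") as f:
        credentials.write(f)
    os.chmod(credentials_path, 0o600)  # readable by the current user only

    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {"region": region}
    with open(aws_dir / "config", "w") as f:
        config.write(f)

    return aws_dir
```

In a notebook cell, a call such as `write_aws_config("YOUR_ACCESS_KEY", "YOUR_SECRET_KEY", "eu-west-1")` would write both files under `~/.aws`, where Boto3 picks them up the next time a client is created.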

Let’s prepare the model by archiving it into a tarball. It will later be uploaded to an Amazon S3 bucket so that it can be registered with SageMaker.
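The archiving step could be sketched with Python's standard `tarfile` module, as below. It assumes the TensorFlow SavedModel was exported into a directory whose immediate child is a numeric version folder (for example `export/1/saved_model.pb`). The directory name `export` and the helper name `make_model_tarball` are hypothetical, and the layout the inference container expects should be confirmed against its documentation.

```python
import tarfile
from pathlib import Path


def make_model_tarball(model_dir, tar_path="model.tar.gz"):
    """Archive the contents of model_dir so they sit at the root of the tarball."""
    model_dir = Path(model_dir)
    with tarfile.open(str(tar_path), "w:gz") as tar:
        for child in sorted(model_dir.iterdir()):
            # arcname drops the parent path, so "export/1/..." becomes "1/...".
            tar.add(str(child), arcname=child.name)
    return Path(tar_path)
```

With the assumed layout, `make_model_tarball("export")` would produce `model.tar.gz` containing `1/saved_model.pb` together with the rest of the version folder.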

Finally, set the variables used to configure the inference endpoints.

Don’t forget to replace SAGEMAKER_ROLE_ARN with the ARN of the role created as a part of the prerequisites. We are setting the AWS region to Dublin (eu-west-1). Feel free to replace it with any region that supports the serverless inference feature. The last line points to the container image that SageMaker will use when creating and registering the model. The TensorFlow SavedModel will be mounted within this container, which already has the code for inference. If you choose a region other than eu-west-1, update the image accordingly. You can access the list of available images here.
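The variables might look like the sketch below. Every value is a placeholder. The role ARN, the endpoint names, the image tag `2.6.0-cpu`, and the helper `inference_image_uri` are hypothetical. The registry account `763104351884` is assumed to host the AWS Deep Learning Containers for eu-west-1, and it may differ in other regions, so check the account and tag against the list of available images.

```python
# Placeholder values; replace them with your own before running.
AWS_REGION = "eu-west-1"  # Dublin
SAGEMAKER_ROLE_ARN = "arn:aws:iam::123456789012:role/YourSageMakerExecutionRole"
MODEL_NAME = "dogs-vs-cats"
ENDPOINT_CONFIG_NAME = "dogs-vs-cats-serverless-config"  # hypothetical name
ENDPOINT_NAME = "dogs-vs-cats-serverless"  # hypothetical name


def inference_image_uri(region, tag, registry_account="763104351884"):
    """Build the ECR URI of a TensorFlow inference container image.

    The registry account and tag are assumptions; verify both for your region.
    """
    return (
        f"{registry_account}.dkr.ecr.{region}.amazonaws.com/"
        f"tensorflow-inference:{tag}"
    )


IMAGE_URI = inference_image_uri(AWS_REGION, tag="2.6.0-cpu")
```

Building the URI from the region keeps the image and the endpoint in the same region when `AWS_REGION` changes.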

Step 2: Creating the Amazon SageMaker Model

In this step, we will upload the model tarball to an S3 bucket and associate it with the deep learning container image for inference.
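A sketch of the upload and registration, assuming Boto3's S3 `upload_file` and SageMaker `create_model` calls are used. The helper `register_model` is hypothetical, and so are the bucket and key shown in the usage comment. The clients are passed in as arguments so that the function itself stays free of credentials and region settings.

```python
def register_model(s3_client, sm_client, tar_path, bucket, key,
                   model_name, image_uri, role_arn):
    """Upload the model tarball to S3 and register it as a SageMaker model."""
    s3_client.upload_file(str(tar_path), bucket, key)
    model_data_url = f"s3://{bucket}/{key}"

    response = sm_client.create_model(
        ModelName=model_name,
        PrimaryContainer={
            "Image": image_uri,              # inference container image
            "ModelDataUrl": model_data_url,  # tarball uploaded above
        },
        ExecutionRoleArn=role_arn,
    )
    return response["ModelArn"]


# Example usage in a notebook (needs boto3 and valid credentials):
#   import boto3
#   s3 = boto3.client("s3", region_name="eu-west-1")
#   sm = boto3.client("sagemaker", region_name="eu-west-1")
#   register_model(s3, sm, "model.tar.gz", "your-bucket",
#                  "dogs-vs-cats/model.tar.gz", "dogs-vs-cats",
#                  "<image URI>", "<execution role ARN>")
```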

The last code snippet has everything SageMaker needs to create a model with the name dogs-vs-cats.

If you access the S3 bucket used by Amazon SageMaker, you will find the model tarball.

choose data source

If you navigate to the Models section of SageMaker in the AWS Console, you will see the model registered there.

the data model

Step 3: Defining SageMaker Serverless Inference Endpoint Configuration

This is the most crucial step: here we configure the endpoint for serverless inference.

The ServerlessConfig attribute tells the SageMaker runtime to provision serverless compute resources that are autoscaled based on its parameters: 2GB of RAM and a maximum of 20 concurrent invocations.
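The configuration described above could be expressed with Boto3's `create_endpoint_config` call, as in this sketch. The helper name and the variant name `AllTraffic` are assumptions. `MemorySizeInMB` and `MaxConcurrency` are the two `ServerlessConfig` fields, set here to the 2GB and 20 invocations mentioned in the text.

```python
def create_serverless_endpoint_config(sm_client, config_name, model_name,
                                      memory_mb=2048, max_concurrency=20):
    """Create an endpoint configuration backed by serverless compute."""
    response = sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",  # assumed variant name
                "ModelName": model_name,
                # ServerlessConfig takes the place of an instance type and count.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    )
    return response["EndpointConfigArn"]
```

Because the variant carries a `ServerlessConfig` block, it specifies no instance type or instance count.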

When this finishes executing, you can see the endpoint configuration in the AWS Console.

Step 4: Creating the Serverless Inference Endpoint

We are ready to create the endpoint based on the configuration defined in the previous step.
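A sketch of the endpoint creation, assuming Boto3's `create_endpoint` call followed by the `endpoint_in_service` waiter, which blocks until the endpoint is in service. The helper name `create_endpoint_and_wait` is hypothetical.

```python
def create_endpoint_and_wait(sm_client, endpoint_name, config_name):
    """Create the endpoint, then block until SageMaker reports it in service."""
    response = sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
    # The waiter polls the endpoint status and raises if creation fails.
    waiter = sm_client.get_waiter("endpoint_in_service")
    waiter.wait(EndpointName=endpoint_name)
    return response["EndpointArn"]
```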

This results in the final inference endpoint being ready to accept requests.

define inference endpoint.

Step 5: Invoking the Serverless Inference Endpoint

Let’s go ahead and test the endpoint by sending it an image of a dog.
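An invocation could be sketched as below, assuming the container accepts the TensorFlow Serving REST format (`{"instances": [...]}` in, `{"predictions": [...]}` out) through the `sagemaker-runtime` client's `invoke_endpoint` call. The helpers `predict` and `label_for` are hypothetical. Image preprocessing depends on the input shape the model was trained with and is left out. The mapping of scores to labels assumes a single sigmoid output where values near 1 mean "dog", which should be checked against the class order used in training.

```python
import json


def predict(runtime_client, endpoint_name, instances):
    """Send a batch of preprocessed inputs and return the model's predictions."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"instances": instances}).encode("utf-8"),
    )
    return json.loads(response["Body"].read())["predictions"]


def label_for(score, threshold=0.5):
    """Map a sigmoid score to a label (assumed class order: cat=0, dog=1)."""
    return "dog" if score >= threshold else "cat"


# Example usage in a notebook (needs boto3 and a live endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime", region_name="eu-west-1")
#   predictions = predict(runtime, "<endpoint name>", [image_array.tolist()])
#   print(label_for(predictions[0][0]))
```

Here `image_array` stands for a hypothetical NumPy array holding one image that has already been resized and scaled the way the model expects.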

You should see the endpoint classifying the image correctly.

image being classified.

This concludes the tutorial on publishing serverless inference endpoints for TensorFlow models. Hope you found it useful.