Gradient is a machine learning platform as a service (ML PaaS) offering from Paperspace. The key building blocks of the platform include a job execution engine, one-click Jupyter Notebook access, a native Python API, and a CI/CD pipeline integrated with GitHub for model management.
Gradient also supports experiments for rapid iteration of models. An experiment is a collection of jobs with different hyperparameters and metrics. Experiments execute under the context of a project which acts as the logical boundary for datasets, jobs, and model artifacts.
For the background and context, refer to the article on Gradient published at The New Stack.
In this tutorial, we will explore the workflow involved in training and deploying models with Gradient. Since the emphasis is on the flow, we will pick a simple linear regression problem that predicts the salary of a developer based on their experience.
The dataset is partially based on the sample Stack Overflow salary calculator, which we’ll use to build a univariate linear regression model with one feature (experience) and one label (salary).
The Stack Overflow Salary model was discussed at length in The New Stack article, Machine Learning and Linear Regression for Mere Mortals. There are two jobs involved in this workflow — training and deploying. The first job generates a Python pickle file that gets stored in the shared storage service of Gradient. The same pickle file will be used by the second job running a Flask web server to expose a REST endpoint. This job will serve the model through the inferencing endpoint.
Before going ahead with the tutorial, sign up with Gradient.
Step 1: Create a Project and an Experiment for Training the Model
After signing in, click on Gradient in the navigation bar on the left to choose Projects. Click on the Create Project button and select Create Standalone Project. Enter a name for the project when prompted.
Within this project, we will create an experiment that trains the model.
Experiment Builder is a wizard-style GUI tool to submit a job to the Job Runner component.
The first step is to choose a machine type for scheduling the job. Gradient takes advantage of Google Compute Engine’s preemptible instances to provide low-cost infrastructure. Make sure you check the Enable low-cost instances checkbox. Choose the G1 machine type, which comes with 1 CPU core, 1.7GB RAM, and a 250GB SSD. This configuration is sufficient for the Scikit-learn training job. For TensorFlow and PyTorch, you can select a GPU-based machine type to accelerate the job.
In Gradient, jobs are based on a container image that provides the runtime and dependencies. For this tutorial, we are using an image that contains the Python 3 runtime and the Scikit-learn framework. I built the image by adding Python dependencies to the lightweight Alpine Linux image.
On a side note, if you want to use your own container image, feel free to use or modify the Dockerfile and push it to a public container registry such as Docker Hub.
LABEL MAINTAINER="Janakiram MSV <firstname.lastname@example.org>"
# Linking of locale.h as xlocale.h
# This is done to ensure successful install of python numpy package
# see https://forum.alpinelinux.org/comment/690#comment-690 for more information.
RUN apk add --no-cache --virtual build-dependencies python3 \
&& apk add --virtual build-runtime \
build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
&& ln -s /usr/include/locale.h /usr/include/xlocale.h \
&& python3 -m ensurepip \
&& rm -r /usr/lib/python*/ensurepip \
&& pip3 install --upgrade pip setuptools \
&& ln -sf /usr/bin/python3 /usr/bin/python \
&& ln -sf pip3 /usr/bin/pip \
&& rm -r /root/.cache \
&& pip install --no-cache-dir $PYTHON_PACKAGES \
&& apk del build-runtime \
&& apk add --no-cache --virtual build-dependencies $PACKAGES \
&& rm -rf /var/cache/apk/*
With the runtime container in place, we now need to point Gradient to the dataset and the training script. This is done through the integration with GitHub. Gradient pulls the GitHub repo into the experiment workspace and uses the assets for the training job.
Feel free to explore the GitHub repo that contains the dataset along with the code for training and deployment.
Let’s point Gradient to the tutorial repo on GitHub.
Finally, let’s define the command for the job, which is the Python script that runs inside the container. When the script exits gracefully, the job is marked as complete.
The command for this job is python train/train.py -i ./data/sal.csv -o /storage/salary. The script, train.py, takes the source location of the dataset (sal.csv) and the target location to save the trained model (/storage/salary).
In Gradient, any file that is saved under the /storage location becomes available to other jobs and experiments. By persisting the model to this location, we will be able to access it from the inference job.
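To make the shape of such a training command concrete, here is a minimal sketch of a script that accepts the same -i and -o flags and writes model.pkl to the output directory. It is not the train.py from the repo: the tutorial's script uses Scikit-learn, while this stand-in fits the line with a closed-form least squares formula so that it runs on the standard library alone. The CSV layout (a header row followed by experience,salary rows) and the dictionary stored in the pickle are assumptions made for illustration.

```python
import argparse
import csv
import os
import pickle


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept).

    Stand-in for Scikit-learn's LinearRegression, which the tutorial uses.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def main(argv=None):
    parser = argparse.ArgumentParser(description="Train a salary model")
    parser.add_argument("-i", dest="input_csv", required=True,
                        help="path to the dataset, e.g. ./data/sal.csv")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="directory for the model, e.g. /storage/salary")
    args = parser.parse_args(argv)

    # Assumption: first row is a header, then "experience,salary" rows.
    with open(args.input_csv, newline="") as f:
        rows = [r for r in list(csv.reader(f))[1:] if r]
    xs = [float(r[0]) for r in rows]
    ys = [float(r[1]) for r in rows]

    slope, intercept = fit_line(xs, ys)

    # Persist the model where later jobs can read it.
    os.makedirs(args.output_dir, exist_ok=True)
    model_path = os.path.join(args.output_dir, "model.pkl")
    with open(model_path, "wb") as f:
        pickle.dump({"slope": slope, "intercept": intercept}, f)
    return model_path
```

Calling main(["-i", "./data/sal.csv", "-o", "/storage/salary"]) mirrors the job command above; returning normally corresponds to the graceful exit that marks the job as complete.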
We are now ready to kick off the training job by clicking on the Submit Experiment button.
Gradient adds the job to the queue and schedules it on a machine of the chosen type. In a few minutes, the job execution completes.
You can check the logs, which show values such as the Mean Squared Error (MSE), intercept, and slope that the training script prints to stdout.
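As a rough illustration of where those logged values come from, the sketch below computes the MSE of a fitted line and prints it alongside the intercept and slope. The function names and the output format are hypothetical and are not taken from the repo's train.py.

```python
def mean_squared_error(xs, ys, slope, intercept):
    """Average of squared differences between predictions and labels."""
    squared_errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / len(squared_errors)


def report(xs, ys, slope, intercept):
    """Print the three values to stdout, which is what the job log captures."""
    mse = mean_squared_error(xs, ys, slope, intercept)
    lines = [
        f"MSE: {mse:.2f}",
        f"Intercept: {intercept:.2f}",
        f"Slope: {slope:.2f}",
    ]
    for line in lines:
        print(line)
    return lines
```

A lower MSE means the line sits closer to the data points; the slope is the predicted salary increase per additional year of experience, and the intercept is the predicted salary at zero years.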
Feel free to explore the environment and files section of the job.
The fully trained and pickled model (model.pkl) is now available at the /storage/salary location. You can now safely delete the job to clean up the project.
Step 2: Hosting and Serving the Model
With the training job done, we will now host a long-running job that exposes a REST endpoint for serving the model. The GitHub repo has the code for loading the pickled model file and running a Flask-based web server.
Start by creating a new job with the following parameters. Notice that it is similar to the training job except for the command: pip install flask && python deploy/infer.py -m /storage/salary/model.pkl.
We first install the Flask module and then launch infer.py which picks up the model file /storage/salary/model.pkl. Feel free to explore the code of infer.py to understand how I load the pickled model file and wire it to the GET request.
Since we are running a web server, we also need to enable port mapping. This is done by entering 8080:8080 in the Ports section of the job.
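For readers who want to see the pattern without opening the repo, here is a minimal sketch of a server that loads a pickled model and answers GET requests on port 8080. It is not the repo's infer.py: that script uses Flask, while this sketch swaps in the standard library's http.server so it has no dependencies. The /predict path, the experience query parameter, and the slope/intercept dictionary inside the pickle are all assumptions made for illustration; the tutorial's model.pkl holds a Scikit-learn object.

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def predict(model, experience):
    """Apply the fitted line: salary = slope * experience + intercept."""
    return model["slope"] * experience + model["intercept"]


def handle_query(model, path):
    """Map a request path such as '/predict?experience=25' to (status, body)."""
    query = parse_qs(urlparse(path).query)
    try:
        experience = float(query["experience"][0])
    except (KeyError, ValueError):
        return 400, {"error": "expected ?experience=<number>"}
    return 200, {"experience": experience, "salary": predict(model, experience)}


def make_handler(model):
    """Build a request handler class bound to an already loaded model."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = handle_query(model, self.path)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
    return Handler


def serve(model_path, port=8080):
    """Load the model once, then block and serve requests until stopped."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    # Bind to all interfaces so the mapped container port is reachable.
    HTTPServer(("0.0.0.0", port), make_handler(model)).serve_forever()
```

Calling serve("/storage/salary/model.pkl") starts the server. Loading the model once at startup, rather than on every request, keeps each prediction cheap.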
Submit the experiment and wait for the job to enter the running mode.
The logs from the job confirm that the web server is up and running.
Before we can send a GET request to the endpoint, we need to access the URL of the job which is available under the Environment section.
Open a terminal window and invoke the REST endpoint.
The response contains the predicted salary of a developer with 25 years of experience.
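The same request can be issued from Python instead of a terminal. The sketch below is only illustrative: the /predict path and the experience query parameter are assumptions, and the base URL must be replaced with the job URL shown under the Environment section.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url, experience):
    """Compose the request URL; path and parameter name are hypothetical."""
    query = urlencode({"experience": experience})
    return f"{base_url.rstrip('/')}/predict?{query}"


def fetch_prediction(base_url, experience, timeout=10):
    """Send the GET request and decode a JSON response body."""
    with urlopen(build_url(base_url, experience), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, fetch_prediction("http://<job-url>:8080", 25) would request the prediction for 25 years of experience, with <job-url> standing in for the address of your own job.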
Gradient projects and experiments make it simple to train and serve machine learning models. The same workflow can be initiated from the CLI, which is useful for automating jobs.
In one of the upcoming tutorials, I will demonstrate how to configure CI/CD pipelines for machine learning through GradientCI. Stay tuned!