Cloud Services

Paperspace Gradient: A Modern PaaS for Machine Learning

16 Nov 2018 3:00am

This article is part of a series exploring cloud-based machine learning services. After covering Azure ML Services, Google Cloud ML Engine, Amazon SageMaker, and IBM Watson Studio Cloud, we take a closer look at Paperspace Gradient.

The rise of ML and AI prompted the cloud computing industry to introduce a new segment of Platform as a Service (PaaS) offerings. These ML platforms take some of the best features of the original PaaS vendors, such as Heroku and Engine Yard, and combine them with the key use cases and scenarios of machine learning. The result is an easy-to-use, self-service PaaS capable of running ML training jobs, paired with a scalable hosting environment for model deployment.

In traditional PaaS, developers write and test code on their local machines before deploying it in production. The input to PaaS is code packaged as a zip file or a Docker image that automatically gets provisioned and managed. Each iteration of the build process is tested independently before pushing it as a versioned artifact into the PaaS.

ML PaaS follows the same workflow as the traditional PaaS environments. Developers and data scientists write Python or R code on local machines and test it with smaller data sets. The same code is moved to the cloud along with large data sets to initiate a training job. These jobs take advantage of powerful infrastructure based on high-end CPUs and GPUs. Developers typically run the training job multiple times with different sets of hyperparameters until they are satisfied with the accuracy of the model. The fully trained, frozen model is then deployed for inference, either in the same PaaS or in a different environment.

Paperspace Gradient is a contemporary platform that aims to bring the simplicity and flexibility of a traditional PaaS to building machine learning models in the cloud. Paperspace also has an offering called Core that exposes raw VMs powered by NVIDIA GPUs and Google TPUs to customers who prefer additional control over the infrastructure.

Paperspace Gradient is built on top of Core as an abstraction layer that hides the complexity involved in provisioning and managing VMs. This PaaS service relies on Docker and NVIDIA containers to simplify the lifecycle management of ML models.

Let’s explore the components of Paperspace Gradient.

At a high level, Gradient has three essential elements: Jobs, Notebooks, and Storage. Jobs are the workhorses of the platform while Notebooks act as the IDE by providing the tooling support. The storage layer is used by Jobs and Notebooks to store and retrieve data sets, models, and other related artifacts. These components of the PaaS can be accessed through the portal, CLI, or the API.

Jobs

Simply put, a job in Paperspace is a Docker container that contains executable code in the form of a shell script or Python code. The job is scheduled on one or more machines powered by a GPU or TPU.

When a developer uses the CLI or portal to submit code to the job runner, Gradient picks up the code, compresses it, and creates a Docker image. The machine type can also be specified during job submission. If a container image is already available in a registry such as Docker Hub, Gradient can be pointed to the location of the image, along with the credentials required to pull it. For complex workflows involving pipelines, users can build and test a Docker image locally before submitting it to Gradient via the registry.
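A typical submission from the CLI looks something like the sketch below. The flag names and values here are illustrative, based on the Paperspace CLI of this era; check the current CLI help for the exact syntax.

```shell
# Submit the code in the current directory as a job (illustrative flags).
# --machineType selects the GPU/CPU instance; --container names the
# Docker Hub image to run; --command is the entry point inside it.
paperspace jobs create \
  --name "mnist-training" \
  --machineType "P5000" \
  --container "tensorflow/tensorflow:latest-gpu" \
  --command "python train.py"
```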

If the source code lives in a GitHub repo, Gradient can clone it on the fly before building the Docker image. The choice of local source directory, Docker image, or GitHub repository offers flexibility to developers and DevOps teams.

Once a job is kicked off by the job runner, the portal can show metrics emitted by the code. Log lines written to stdout in the format {"chart": "<identifier>", "y": <value>, "x": <value>} are picked up by the portal for visualization. The CLI can also stream the logs to the local console.
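Emitting a metric in this format is a one-liner from any training loop. A minimal sketch (the helper name is our own; only the stdout JSON shape comes from the format above):

```python
import json

def log_metric(chart, x, y):
    """Emit one data point to stdout in the shape the Gradient
    portal charts: {"chart": ..., "y": ..., "x": ...}."""
    print(json.dumps({"chart": chart, "y": y, "x": x}))

if __name__ == "__main__":
    # e.g. chart the training loss per epoch
    for epoch, loss in enumerate([0.9, 0.5, 0.2]):
        log_metric("loss", x=epoch, y=loss)
```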

One of the best features of the Gradient PaaS is the ability to access the IP address and ports exposed by the container. Each job gets a public address, with specific ports mapped to the associated container. This is very helpful for accessing the web UI exposed by tools such as TensorBoard or NVIDIA DIGITS. Basic inferencing can be done by including a simple Flask or Django app inside the container that can be accessed while the job is running.
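A minimal Flask sketch of that pattern is shown below. The route, port, and the stand-in predict function are our own assumptions; a real job would load a trained model instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    # Stand-in for a real model call; replace with your trained model.
    return sum(features)

@app.route("/predict", methods=["POST"])
def predict_route():
    # Expects a JSON body like {"features": [1, 2, 3]}
    features = request.get_json()["features"]
    return jsonify({"prediction": predict(features)})

# Inside the job container, bind to 0.0.0.0 so the port Gradient maps
# to the public address is reachable:
# app.run(host="0.0.0.0", port=8000)
```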

Jobs have access to a storage location available at /storage to persist trained models, evaluation metrics, or any other artifact that needs to survive across different iterations of the same job.
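A checkpoint helper along these lines lets a later run of the same job resume where the last one stopped. Only the /storage mount point comes from the platform; the helper names and the JSON checkpoint format are our own illustration:

```python
import json
import os

def save_checkpoint(state, name, storage_dir="/storage"):
    """Write a small JSON checkpoint under the persistent mount."""
    os.makedirs(storage_dir, exist_ok=True)
    path = os.path.join(storage_dir, name)
    with open(path, "w") as f:
        json.dump(state, f)
    return path

def load_checkpoint(name, storage_dir="/storage"):
    """Return the saved state, or None on the first-ever run."""
    path = os.path.join(storage_dir, name)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)
```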

Notebooks

Paperspace Gradient can be used to quickly launch Jupyter Notebooks without the need for provisioning and configuring VMs.

Like Jobs, Notebooks are also packaged and launched via Docker containers. Paperspace has multiple pre-configured container images consisting of a variety of deep learning and machine learning frameworks along with Jupyter Notebooks.

Notebook containers can be launched on VMs backed by GPUs. Each Notebook session can last up to 12 hours, after which it is automatically stopped. The artifacts generated by Notebooks can be persisted to shared storage that is available to other Notebooks and Jobs.

Similar to the concept of custom job containers, developers can build and upload custom Notebook containers to Docker Hub that Gradient can access to instantiate custom Notebook environments. This is helpful in managing highly customized configurations of frameworks used by developers and data scientists.

Storage

Gradient includes three types of storage that are available within the context of a running Job or Notebook container: Persistent, Artifact, and Workspace. Each storage type serves a different purpose at various stages of experimentation and evaluation of ML models.

Persistent storage is backed by a filesystem and is ideal for storing data like images, datasets, and model checkpoints. It is a designated storage location where you can read and write files during a Job.

Artifact storage is collected after a job completes and made available through the CLI and web interface. Notebooks and Jobs use this location to store any artifacts generated by the code.

Workspace storage is typically imported from the local directory in which the job is started. The contents of that directory are zipped up and uploaded to the container where the job actually runs. It is a temporary storage location that exists only for the duration of the job run.
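The zip-and-ship step can be approximated with the standard library. This sketch only mirrors the packaging half of what the CLI does; the upload to the job container is not shown, and the function name is our own:

```python
import os
import shutil
import tempfile

def package_workspace(workspace_dir):
    """Zip the contents of a local directory, mirroring how the CLI
    packages the workspace before uploading it with a job.
    Returns the path to the created .zip archive."""
    archive_base = os.path.join(tempfile.mkdtemp(), "workspace")
    return shutil.make_archive(archive_base, "zip", root_dir=workspace_dir)
```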

Paperspace makes popular datasets such as MNIST and COCO available in a read-only directory mounted at /datasets. All Jobs and Notebooks have access to the data sets stored in this location.

Paperspace sets a fine example of how a modern ML PaaS should be designed. It is well positioned to give tough competition to the public cloud vendors who are busy repositioning themselves as the preferred platforms for ML development and deployment.

