Google Cloud ML Engine is a hosted platform for running machine learning training jobs and predictions at scale. The service treats these two processes, training and prediction, independently. It is possible to use Google Cloud ML Engine just to train a complex model by leveraging its GPU and TPU infrastructure. The outcome of this step, a fully trained machine learning model, can be hosted in other environments, including on-premises infrastructure and other public clouds. The service can also be used to deploy a model that was trained in an external environment. Cloud ML Engine automates all resource provisioning and monitoring for running the jobs, and it can also manage the lifecycle of deployed models and their versions.
Apart from training and hosting, Cloud ML Engine can also perform hyperparameter tuning, which influences the accuracy of predictions. Without automated hyperparameter tuning, data scientists would have to experiment with multiple values manually while evaluating the accuracy of the results.
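To see what automated tuning replaces, here is a minimal local sketch of that manual search using scikit-learn's grid search (one of the toolkits ML Engine supports). The dataset and parameter grid are illustrative, not part of ML Engine itself:

```python
# A sketch of the manual hyperparameter search that ML Engine automates:
# try several candidate values, evaluate each, and keep the best one.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Sweep the regularization strength C over a handful of candidate values,
# scoring each with 3-fold cross-validation.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_)  # the best-scoring hyperparameter value
```

ML Engine performs a smarter version of this search on its own infrastructure, driven by a tuning configuration rather than hand-written loops.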
Let’s take a closer look at the steps involved in training and predicting from a machine learning model deployed in Google Cloud ML Engine.
Data preparation involves acquiring the data and getting it ready for machine learning experiments. This phase typically takes place outside of ML Engine. Google Cloud Platform has several services that help with data preparation.
During data preparation, data scientists explore and analyze the quality of the data, transforming the original dataset into a format that makes it easy to train a model. Typical steps include identifying missing data, splitting existing columns, removing duplicates, and so on. GCP services such as BigQuery, Cloud Dataproc, Cloud Dataflow, and Cloud Dataprep are used for acquiring and preparing the data. Cloud Dataprep, a service based on Trifacta, is an intelligent, serverless data service for visually exploring, cleaning, and preparing structured and unstructured data.
The final step in this phase is copying the prepared dataset to a Google Cloud Storage bucket, which makes it available to the distributed training job initiated by ML Engine. Datasets prepared outside of Google Cloud can also be uploaded to Cloud Storage.
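The cleanup steps described above can be sketched locally with pandas. The toy dataset and column names here are purely illustrative:

```python
# Typical data-prep steps: remove duplicates, fill missing values,
# and split an existing column. The toy data is illustrative.
import pandas as pd

raw = pd.DataFrame({
    "name": ["Ann Lee", "Bob Roy", "Bob Roy", "Cy Day"],
    "age":  [34, None, None, 45],
})

df = raw.drop_duplicates()                                 # drop duplicate rows
df = df.assign(age=df["age"].fillna(df["age"].median()))   # fill missing ages
df[["first", "last"]] = df["name"].str.split(" ", expand=True)  # split a column
print(df)
```

In a real pipeline, the cleaned frame would then be written out (for example as CSV) and copied to the Cloud Storage bucket for training.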
This is the most important phase, in which developers and data scientists code the model in their local environment. Google Cloud ML Engine supports Python-based toolkits for creating machine learning models.
The supported frameworks and toolkits include scikit-learn, XGBoost, and of course TensorFlow. Developers can use a subset of the original dataset to test and debug the code before submitting it as a training job to run in the cloud.
The Python programs written for model creation need not contain any Cloud ML Engine-specific code. They can follow the standard conventions and flow typically used for creating ML models.
Once thoroughly tested, the code is ready to be submitted to ML Engine.
This is a critical phase in the lifecycle of a machine learning model, which involves feeding the training data to the model, evaluating it, and tuning the parameters to increase the accuracy.
When training a model, we feed in known data points, called features, along with the known outcome as a label. During evaluation, we feed in data from the test dataset and compare the predicted values with the actual labels. This process is repeated until the difference between the predicted and actual labels is minimized. For sophisticated models such as artificial neural networks, ML Engine also provides hyperparameter tuning. When the predictions match the actual labels for most of the data points in the dataset, the training process is stopped and the final model is ready for consumption.
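The train-then-evaluate loop described above can be sketched locally with scikit-learn. The dataset and model choice are illustrative; the same code needs no changes to run as an ML Engine training job:

```python
# Train on known (features, label) pairs, then compare predictions on a
# held-out test set against the actual labels.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)                    # feed features with known labels

predicted = model.predict(X_test)              # predictions on unseen data
accuracy = accuracy_score(y_test, predicted)   # fraction matching actual labels
print(f"accuracy: {accuracy:.3f}")
```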
Depending on the complexity of the model and the size of the dataset, the training job can run across a cluster of machines backed by GPUs and TPUs. ML Engine lets us choose the right scale tier for scheduling the training job. The compute resources required for running the job are dynamically managed by ML Engine.
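As a rough sketch, the scale tier is declared in a job configuration file passed when submitting the training job; the machine types shown here are illustrative:

```yaml
# config.yaml (sketch): a custom tier with GPU-backed workers.
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
  workerType: standard_gpu
  workerCount: 4
```

Predefined tiers such as BASIC or BASIC_GPU can be named instead of CUSTOM when no per-machine control is needed.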
For initial testing, ML Engine can train the model on local workstations. In most cases, however, training is done in the cloud, exploiting powerful infrastructure based on CPU, GPU, and TPU clusters.
During this phase, the model is serialized into a supported format and uploaded to a Google Cloud Storage bucket. For example, scikit-learn models are saved as pickle or joblib files, while TensorFlow models are serialized into checkpoint or ProtoBuf files.
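For a scikit-learn model, the serialization step looks like the sketch below. The filename is illustrative, and uploading the file to a Cloud Storage bucket (for example with `gsutil cp`) is a separate step not shown here:

```python
# Serialize a trained scikit-learn model with joblib, then reload it
# to verify the round trip. The model and filename are illustrative.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(model, "model.joblib")        # write the serialized model to disk

restored = joblib.load("model.joblib")    # round-trip check
print((restored.predict(X) == model.predict(X)).all())
```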
A model is then registered in Cloud ML Engine, pointing to the location of the Cloud Storage bucket where the serialized object is uploaded.
Since models are often retrained on new data, newer versions of a model can be registered with Cloud ML Engine.
The core value of machine learning is derived from accurate predictions. Google Cloud ML Engine hosts fully trained models for prediction. Just as it manages training jobs across multiple resources, ML Engine also dynamically manages the infrastructure required for model hosting.
There are two ways to get predictions from trained models: online prediction (also called HTTP prediction) and batch prediction. Online prediction deals with one or a few data points at a time, while batch prediction can accept an entire dataset. In both cases, we pass input data to a cloud-hosted machine learning model and get inferences for each instance.
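An online-prediction request body is a single JSON object with an `instances` list, one entry per data point. The feature values below are illustrative; the real request is POSTed to the deployed model's predict endpoint:

```python
# Build the JSON body for an online-prediction request.
# Each entry in "instances" is one data point; values are illustrative.
import json

instances = [
    [5.1, 3.5, 1.4, 0.2],   # first data point
    [6.7, 3.0, 5.2, 2.3],   # second data point
]
body = json.dumps({"instances": instances})
print(body)
```

Batch prediction, by contrast, reads its input from files in a Cloud Storage bucket rather than from a request body.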
Cloud ML Engine predictions are tightly integrated with Stackdriver, GCP's monitoring tool. We can monitor predictions on an ongoing basis by invoking the APIs to examine running jobs.
Developers and DevOps engineers can manage most of the phases discussed above through the CLI or the API provided by the Google Cloud SDK.
In the next part of this series, we will learn how to train, evaluate, and host a simple machine learning model in Google Cloud. Stay tuned!
Feature Image: RawPixel.