A Close Look at Cloud-Based Machine Learning Platforms: IBM and Oracle
This is the third and final part of the ML PaaS series where we explore IBM Cloud Pak for Data and Oracle Machine Learning. We follow the same framework of classifying the features and services of these platforms into the five stages of machine learning.
IBM Watson Machine Learning
IBM has a comprehensive set of tools and services for building and deploying machine learning models. The key differentiating factor of IBM is the ability to run the data science and machine learning platform in a variety of environments including public cloud, on-premises, and hybrid cloud.
IBM Cloud Pak for Data as a Service is an end-to-end platform available on IBM Cloud. It runs on Red Hat OpenShift as a cloud native solution.
Watson Studio is an integrated development environment running within IBM Cloud Pak for Data. Customers use this tool to manage almost all the stages of the machine learning project.
The platform supports adding data from disparate data sources through extensible connectors. Data from IBM’s managed services such as Cloudant, Db2, and Informix, or from external services such as Amazon S3, Amazon RDS, or Microsoft Azure SQL Database, can be ingested into the environment. Customers can bring unstructured data stored in object storage or structured data from relational databases into IBM Cloud Pak.
When dealing with large datasets, the DataStage component, which is an ETL tool, can be used to transform and integrate data in projects.
Ingested data is cleansed and processed through the Data Refinery component of the platform. The final dataset that acts as an input to the training phase can be saved in Avro, CSV, JSON, or Parquet format.
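The cleanse-then-save flow described above can be pictured with a minimal sketch. It uses only the Python standard library and does not call Data Refinery; the column names, the sample rows, and the cleansing rules (drop rows with missing values, trim and title-case city names) are assumptions made for illustration. Only the CSV and JSON output formats are shown.

```python
import csv
import io
import json

# Illustrative raw rows; the columns and values are invented for this sketch.
raw_rows = [
    {"id": "1", "age": "34", "city": " Austin "},
    {"id": "2", "age": "", "city": "Boston"},      # missing age, dropped below
    {"id": "3", "age": "51", "city": "denver"},
]


def cleanse(rows):
    """Drop rows with empty fields, cast numeric columns, normalize text."""
    cleaned = []
    for row in rows:
        if any(value.strip() == "" for value in row.values()):
            continue
        cleaned.append({
            "id": int(row["id"]),
            "age": int(row["age"]),
            "city": row["city"].strip().title(),
        })
    return cleaned


dataset = cleanse(raw_rows)

# Serialize the final dataset as CSV (in memory, to keep the sketch self-contained).
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["id", "age", "city"])
writer.writeheader()
writer.writerows(dataset)
csv_text = csv_buffer.getvalue()

# Serialize the same dataset as JSON.
json_text = json.dumps(dataset)
```

Avro and Parquet would need third-party libraries, so they are left out of this sketch.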
Watson Studio, which is an integral part of IBM Cloud Pak for Data as a Service, is the platform for building models. It offers both JupyterLab and RStudio as collaborative development environments for data scientists and developers.
Watson Studio comes with many preinstalled Python and R modules. Once the environment is provisioned, developers can use popular machine learning and deep learning tools and frameworks to build models.
Like other MLaaS offerings, IBM ships a Python SDK for Watson Machine Learning which provides access to IBM Cloud services such as object storage and managed databases.
Training jobs can be initiated interactively through Watson Studio or declaratively by defining the specifications in a YAML file.
In IBM Watson Machine Learning, a deep learning experiment can be created through a logical grouping of one or more model definitions. When an experiment is run, it creates a training run for each model definition that is part of the experiment. Each definition is backed by a YAML file that points to the model definition and specifies the CPU/GPU resources required for the experiment.
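The relationship between an experiment, its model definitions, and the resulting training runs can be sketched in plain Python. This is a conceptual model only, not the Watson Machine Learning SDK: the class names, field names, training commands, and compute labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelDefinition:
    """Hypothetical stand-in for one model definition in an experiment."""
    name: str
    command: str   # training command to execute (illustrative)
    compute: str   # requested compute, e.g. "k80" or "v100" (illustrative labels)


@dataclass
class Experiment:
    """A logical grouping of one or more model definitions."""
    name: str
    definitions: List[ModelDefinition] = field(default_factory=list)

    def run(self) -> List[dict]:
        # Running the experiment yields one training run per model definition.
        return [
            {
                "experiment": self.name,
                "definition": d.name,
                "compute": d.compute,
                "status": "pending",
            }
            for d in self.definitions
        ]


experiment = Experiment(
    name="mnist-comparison",
    definitions=[
        ModelDefinition("cnn-small", "python train.py --layers 2", "k80"),
        ModelDefinition("cnn-large", "python train.py --layers 8", "v100"),
    ],
)
training_runs = experiment.run()
```

Here `training_runs` holds two entries, one for each definition in the group.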
IBM Watson Machine Learning offers automated hyperparameter tuning and AutoAI, which provide different levels of AutoML capabilities.
Hyperparameter optimization is a mechanism for automatically exploring a search space of potential hyperparameters, building a series of models, and comparing the models using metrics of interest.
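To make the idea concrete, here is a minimal sketch of hyperparameter optimization using plain random search. Random search is chosen for brevity and is not claimed to be the strategy IBM uses. The `validation_loss` function is a toy stand-in for training a model and measuring a metric, and the search space values are assumptions.

```python
import random


def validation_loss(learning_rate: float, batch_size: int) -> float:
    """Toy stand-in for training a model and returning a validation metric."""
    return (learning_rate - 0.01) ** 2 + ((batch_size - 64) / 1000) ** 2


def random_search(search_space: dict, trials: int, seed: int = 0) -> dict:
    """Sample configurations, evaluate each one, and keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        # Pick one candidate value per hyperparameter.
        config = {name: rng.choice(values) for name, values in search_space.items()}
        loss = validation_loss(**config)
        # Compare models on the metric of interest (lower loss is better here).
        if best is None or loss < best["loss"]:
            best = {"config": config, "loss": loss}
    return best


search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}
best = random_search(search_space, trials=20)
```

A managed service would replace `validation_loss` with real training runs and could use a smarter search strategy, but the explore, build, and compare loop has the same shape.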
IBM Watson Machine Learning supports training models in parallel with an on-demand GPU compute cluster based on NVIDIA K80 and V100 GPUs.
The AutoAI graphical tool in Watson Studio automatically analyzes datasets and generates candidate model pipelines customized for a predictive modeling problem. These model pipelines are created iteratively as AutoAI analyzes the dataset and discovers data transformations, algorithms, and parameter settings that work best for the problem set. Results are displayed on a leaderboard, showing the automatically generated model pipelines ranked according to your problem optimization objective.
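The leaderboard idea can be sketched as a simple ranking of candidate pipelines by the chosen objective. This is not AutoAI's implementation: the pipeline names, algorithms, and accuracy values below are made up for illustration, and `build_leaderboard` is a hypothetical helper.

```python
from typing import Dict, List


def build_leaderboard(pipelines: List[Dict], metric: str,
                      higher_is_better: bool = True) -> List[Dict]:
    """Return the pipelines sorted by the chosen metric, best first, with a rank."""
    ordered = sorted(pipelines, key=lambda p: p[metric], reverse=higher_is_better)
    return [{"rank": i, **p} for i, p in enumerate(ordered, start=1)]


# Made-up candidates and scores, purely for illustration.
candidates = [
    {"pipeline": "P1", "algorithm": "logistic regression", "accuracy": 0.81},
    {"pipeline": "P2", "algorithm": "random forest", "accuracy": 0.86},
    {"pipeline": "P3", "algorithm": "gradient boosting", "accuracy": 0.84},
]
leaderboard = build_leaderboard(candidates, metric="accuracy")
```

Switching the objective is a matter of passing a different `metric`, or setting `higher_is_better=False` for error-style metrics.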
In IBM Watson Machine Learning, a deployment space acts as a registry to manage the assets associated with a model deployment. The deployment assets may include a serialized model, scoring script, schema of the inference dataset, and more.
From the deployment space, a model can be deployed and exposed as a web service through a secure HTTPS endpoint. Developers can use the Python SDK to programmatically deploy and configure the REST endpoint.
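As a rough illustration of calling such an endpoint, the sketch below builds an HTTPS scoring request with the standard library instead of the IBM SDK. The endpoint URL, the token placeholder, and the field names are hypothetical, and the `input_data` payload layout is an assumption that should be checked against the service documentation before use. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url, token, fields, rows):
    """Build (but do not send) a POST request carrying rows to score."""
    # Assumed payload shape: a list of field names plus a list of value rows.
    payload = {"input_data": [{"fields": fields, "values": rows}]}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_scoring_request(
    "https://example.com/ml/deployments/my-deployment/predictions",  # placeholder URL
    token="<access-token>",                                          # placeholder token
    fields=["age", "income"],
    rows=[[42, 55000]],
)
# Sending would be: urllib.request.urlopen(request), omitted to avoid a network call.
```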
IBM Watson Machine Learning comes with a robust ModelOps framework to manage deployed models.
When multiple versions of the same model are trained and deployed, customers need to track various versions and key events for the model. Model lineage provides the means to track and manage versions of Watson Machine Learning models.
Through the model activities component, it is possible to track the lifecycle of a model, including critical stages such as re-evaluating, retraining, and replacing the active model.
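The kind of bookkeeping that lineage and activity tracking involve can be sketched with a small in-memory registry. This is a conceptual illustration, not the Watson Machine Learning API: the `Lineage` class, its methods, and the event names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    parent: Optional[int]                            # version this one was retrained from
    events: List[str] = field(default_factory=list)  # lifecycle events, oldest first


class Lineage:
    """Tracks model versions, their ancestry, and which one is active."""

    def __init__(self) -> None:
        self.versions: Dict[int, ModelVersion] = {}
        self.active: Optional[int] = None

    def register(self, parent: Optional[int] = None) -> ModelVersion:
        """Record a newly trained version, optionally derived from a parent."""
        model = ModelVersion(len(self.versions) + 1, parent, ["trained"])
        self.versions[model.version] = model
        return model

    def promote(self, version: int) -> None:
        """Make a version active and mark the previous active one as replaced."""
        if self.active is not None:
            self.versions[self.active].events.append("replaced")
        self.versions[version].events.append("deployed")
        self.active = version

    def ancestry(self, version: int) -> List[int]:
        """Walk parent links from a version back to the original model."""
        chain: List[int] = []
        current: Optional[int] = version
        while current is not None:
            chain.append(current)
            current = self.versions[current].parent
        return chain


lineage = Lineage()
v1 = lineage.register()
lineage.promote(v1.version)
v2 = lineage.register(parent=v1.version)   # retrained from version 1
lineage.promote(v2.version)
```

After these calls, version 1 carries the events trained, deployed, and replaced, and `lineage.ancestry(2)` walks back from version 2 to version 1.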
Oracle Machine Learning
Oracle Cloud Infrastructure Data Science Platform and Oracle Cloud Infrastructure Machine Learning are two building blocks of Oracle Cloud Infrastructure (OCI) that deliver end-to-end machine learning capabilities. Like its competitors, Oracle built its managed machine learning platform on top of its core offerings such as Oracle Autonomous Database and Exadata Cloud Service.
For data preparation and pre-processing, OCI has multiple services that help developers ingest and process data from disparate sources.
Oracle Cloud Infrastructure Data Integration is a serverless platform that simplifies extract-transform-load (ETL) and extract-load-transform (ELT) jobs. It has an intuitive interface providing a no-code approach to data flow design. Behind the scenes, the service uses Apache Spark for processing large datasets.
Oracle Cloud Infrastructure Data Flow is a managed big data platform based on Apache Spark. Unlike the Data Integration service, Data Flow offers a programmatic approach to creating Big Data applications that run in the context of Apache Spark.
A combination of the above services can be used to ingest, process, and prepare data available in Oracle Autonomous Database or streamed in real time through Oracle Cloud Infrastructure Streaming.
For building machine learning models, OCI has the Oracle Cloud Infrastructure Data Science platform, which comes with the Accelerated Data Science (ADS) SDK, a Python library that provides access to relevant OCI services.
Oracle Cloud Infrastructure Data Science has a built-in, cloud-hosted IDE based on JupyterLab Notebooks that allows teams of data scientists to build and train models with a familiar user interface. The platform supports building ML models with TensorFlow, PyTorch, or other frameworks of choice.
Data scientists and developers can access CPU and GPU infrastructure from the JupyterLab Notebooks. The platform has support for NVIDIA P100 and V100 GPUs.
The Accelerated Data Science (ADS) SDK supports Oracle’s own AutoML, as well as open source tools such as H2O 3 and auto-sklearn. Oracle’s AutoML offers automated feature selection, adaptive sampling, and automated algorithm selection.
Trained models are packaged as an artifact (ZIP file) that contains the serialized model and the inference code. These artifacts are tagged and stored in the model catalog for deployment.
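A minimal sketch of such an artifact, built with the Python standard library, is shown below. The file names `model.pkl` and `score.py`, the function names inside the inference code, and the toy dictionary "model" are assumptions for illustration; the real artifact layout should be taken from the OCI documentation.

```python
import io
import pickle
import zipfile

# Inference code bundled next to the serialized model (names are illustrative).
INFERENCE_CODE = '''\
import pickle


def load_model(path="model.pkl"):
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(data, model=None):
    if model is None:
        model = load_model()
    return {"prediction": [model["bias"] + sum(row) for row in data]}
'''


def package_artifact(model) -> bytes:
    """Bundle the serialized model and inference code into one ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("model.pkl", pickle.dumps(model))
        archive.writestr("score.py", INFERENCE_CODE)
    return buffer.getvalue()


# A toy "model" stands in for a trained estimator.
artifact = package_artifact({"bias": 0.5})

# Reopen the archive to confirm what it contains.
with zipfile.ZipFile(io.BytesIO(artifact)) as archive:
    names = sorted(archive.namelist())
    restored = pickle.loads(archive.read("model.pkl"))
```

Keeping the model and its inference code in one archive means the catalog entry is enough to reproduce a deployment.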
Deployed models are exposed as HTTP endpoints through a load balancing mechanism.
OCI doesn’t support deploying models to managed Kubernetes clusters as microservices.
Deployed models emit access logs and predict logs providing visibility into the inference.
The access log category is a custom log that captures detailed information about requests sent to the model endpoint. Predict logs are emitted by the inference code as defined by the developers. Logs written to stdout and stderr are captured by predict logs.
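The sketch below shows what developer-defined output to stdout and stderr might look like in inference code. The `predict` function and its messages are hypothetical. The streams are captured locally with `contextlib` only to show what a log collector would receive; no OCI service is involved.

```python
import contextlib
import io
import sys


def predict(data):
    """Hypothetical inference function that writes to stdout and stderr."""
    print(f"received {len(data)} rows")           # goes to stdout
    if not data:
        print("empty input", file=sys.stderr)     # goes to stderr
        return []
    return [sum(row) for row in data]


# Capture both streams locally to mimic what a log collector would see.
out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    result = predict([[1, 2], [3, 4]])
    predict([])

captured_stdout = out.getvalue()
captured_stderr = err.getvalue()
```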
The access and predict logs emitted from a model deployment can be accessed using the OCI Logging service for further analysis.
OCI doesn’t support model drift detection, which is typically an extension of model management.