This article introduces the fundamental building blocks of Kubeflow and their significance in MLOps.
An out-of-the-box installation of Kubeflow comes with Pipelines, Notebook Servers, Katib, and KFServing. Let’s take a look at each of these components.
The process of training a machine learning model is highly iterative. It is often compared to a science experiment due to the trial and error approach adopted by researchers and engineers to arrive at an accurate model.
Model training is not only repetitive but also needs to be consistent. ML engineers need a mechanism to tweak a few parameters while retaining all the previous experiment values. They may also want to compare and contrast each iteration’s outcome to decide which experiment yields the best result.
Training a machine learning model involves running multiple steps in a sequence and sometimes in parallel. Data acquisition, data processing, data preparation, model training, model evaluation, hyperparameter tuning, and model deployment are the stages of the model lifecycle. The output of one step becomes the input for the other. For example, during the data preparation phase, tens of thousands of images are resized, augmented, and converted to a format such as TFRecord (TensorFlow) or Image IO (Apache MXNet), which is accessed by model training in the next step. The training step may run in parallel, leveraging the available GPU/CPU infrastructure to speed up the process.
Cloud-based machine learning platforms such as Amazon SageMaker and Azure ML offer integrated pipelines capability for performing MLOps. They let engineers define each step of the pipeline independently and connect them seamlessly to form a pipeline. Given the modularity and portability of containers, it’s not surprising that these platforms map each step of the pipeline to a container image. Object storage is the preferred persistence layer for sharing the artifacts among the components of the pipeline.
Kubeflow, the open source machine learning platform, offers similar pipeline capabilities as the cloud-based enterprise ML platforms.
A pipeline is a description of an ML workflow, including all of the workflow components and how they combine in the form of a graph. The pipeline includes the definition of the inputs (parameters) required to run the pipeline and the inputs and outputs of each component. A pipeline component is a self-contained set of user code packaged as a container image that performs one step in the pipeline.
Kubeflow pipelines can be defined through a YAML specification, Python SDK, or by annotating existing Python code or Jupyter Notebooks. The YAML file is a declaration of the container images participating in the pipeline, the entry point for each container, and the location to persist the artifacts.
There are four interfaces available for Kubeflow pipelines:
1) Kubeflow Pipelines UI
2) Kubeflow Pipelines CLI
3) Kubeflow Pipelines SDK
4) Kubeflow Pipelines API
Kubeflow pipelines support importing and exporting pipelines. For example, a pipeline definition for running in the public cloud can be imported into a Kubeflow environment running on-premises. The entire pipeline may be exported to a tarball and imported to another environment.
The Python SDK and API provide programmatic access to the pipeline runtime. They make it possible to extend an existing Python application to leverage the workflow.
In one of the upcoming tutorials of this series, I will walk you through an end-to-end scenario using Kubeflow pipelines to train and deploy a model.
The Jupyter Notebook environment has become the most preferred development environment for developing data-driven applications. Unlike traditional development tools, Jupyter Notebooks simplify collaboration among data scientists and engineers.
Kubeflow comes with an integrated Jupyter Notebook platform in the form of Kubeflow Notebook Servers. Unlike the stand-alone version of Jupyter, Kubeflow supports role-based access control (RBAC) for fine-grained access to the Notebooks. Administrators can define roles with different permissions for each Notebooks server.
Though Kubernetes doesn’t have strict multitenancy, Kubeflow Notebook Server brings strong isolation between environments through namespaces and integrated RBAC.
Each Notebook Server is based on a container image that comes with the libraries, frameworks, and tools needed by a data scientist team. This approach enables administrators to build a diverse set of environments based on containers catering to each team’s needs. For example, the team performing data preparation will launch a Notebook Server based on CPU, while the model training team will have a custom TensorFlow image optimized for GPU. A Notebook Server can be pointed to a private or public container registry with the images.
Behind the scenes, each Notebook Server translates to a Kubernetes StatefulSet. Each Pod of the StatefulSet may be attached to a dedicated Persistence Volume Claim (PVC) and a shared PVC. The dedicated PVC acts as the home directory to store artifacts and data specific to the user. In contrast, the shared PVC is mounted across multiple Pods to share datasets and other artifacts. Kubeflow needs a container storage backend such as NFS that supports read/write operations (RWX) by multiple Pods.
Kubeflow Notebook Servers is a robust and collaborative development environment for data scientists and engineers. I will have a dedicated tutorial to demonstrate how to set up, configure and use Jupyter Notebooks on Kubeflow.
As mentioned earlier, training machine learning models is an iterative process. One of the critical aspects of training is hyperparameter tuning, which influences the model’s accuracy and precision.
Hyperparameters are the variables that control the model training process. When training a model based on deep learning, the learning rate parameters, number of layers in a neural network, and the number of nodes in each layer define accuracy. They may need to be adjusted for each run of an experiment to evaluate and select the most optimal combination.
In summary, hyperparameter tuning optimizes the hyperparameter values to maximize the predictive accuracy of the model.
Katib, a component of Kubeflow, provides hyperparameter tuning for deep learning models. It runs several training jobs (known as trials) within each hyperparameter tuning job (experiment). Each trial tests a different set of hyperparameter configurations. At the end of the experiment, Katib outputs the optimized values for the hyperparameters.
In addition to hyperparameter tuning, Katib offers a neural architecture search (NAS) feature that aims to maximize a model’s predictive accuracy and performance. Both hyperparameter tuning and NAS are a subset of AutoML, a technique that offers a low-code/no-code approach to training sophisticated machine learning models.
Katib works with mainstream ML frameworks, including TensorFlow, Apache MXNet, and PyTorch. It has a Python SDK, making it possible to integrate hyperparameter tuning with Kubeflow Pipelines and Notebook Servers.
Katib deserves a detailed discussion and walkthrough, which I plan to cover in a separate article of this series.
Model serving exposes a fully-trained model through a standard API for applications to integrate machine learning capabilities. It is one of the crucial stages in the MLOps pipeline.
Kubernetes is one of the proven platforms for deploying APIs and web UI at scale. Since model serving translates to an API, it makes sense to deploy it in Kubernetes. KFServing bridges the gap between model serving components and Kubernetes.
Kubeflow supports two model serving systems that allow multiframework model serving: KFServing and Seldon Core. Alternatively, we can also use a standalone model serving system.
KFServing and Seldon Core are both open source systems that allow multiframework model serving.
KFServing on Kubeflow brings existing model serving components of TensorFlow, PyTorch, and MXNet to Kubernetes. You can use KFServing to run a multiframework model serving that supports TensorFlow, XGBoost, Scikit-learn, NVIDIA Triton Inference Server, ONNX, and PyTorch.
KfServing can access a model artifact stored in a persistent layer such as an object storage bucket or an NFS share. It exposes a well-defined and standardized API endpoint to perform inference on models.
KFServing can be used for online predictions or batch predictions. It’s a scalable, cloud native, multiframework model serving engine tightly integrated with Kubeflow.
In the next part of this series, we will get started with Kubeflow Notebook Server, where I will walk you through creating customized container images and using them at various stages of MLOps. Stay tuned!
MinIO is a sponsor of The New Stack.