TensorFlow Model Deployment and Inferencing with Kubeflow

In the last part of this series, we trained a TensorFlow model to classify images of cats and dogs. The model is stored in a shared Kubernetes persistent volume claim (PVC), which can be accessed by another Kubeflow Notebook Server to test the model.
Remember, this series aims not to build an extremely complex neural network but to demonstrate how Kubeflow helps organizations with machine learning operations (MLOps).
Launch a new CPU-based Jupyter Notebook Server and upload the notebook available on GitHub. This notebook validates the model by passing a few images.
Follow the same steps to launch the Notebook Server based on the image janakiramm/infer. Make sure you mount the shared PVC, models.
This notebook loads the TensorFlow model and performs the classification based on sample images.
The infer function accepts a file and returns the prediction.
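While the complete notebook is available in the GitHub repository, a minimal sketch of what such an infer function might look like is shown below. The model path is an assumption based on how the shared PVC could be mounted in the Notebook Server, and the input size and label mapping are taken from the client code shown later in this tutorial.

import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

# Load the trained model from the shared PVC (the exact mount path may differ in your Notebook Server)
model = load_model('/home/jovyan/models/1')

def infer(file_path):
    # Resize and normalize the image the same way it was preprocessed during training
    img = image.img_to_array(image.load_img(file_path, target_size=(128, 128))) / 255.
    # Add a batch dimension and pick the class with the highest score
    pred = model.predict(np.expand_dims(img, axis=0))
    return "Dog" if pred.argmax(axis=1)[0] == 1 else "Cat"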
Let’s now deploy the model in TensorFlow Serving running in Kubernetes. Start by cloning the GitHub repository that has everything we need to run the inference code.
git clone https://github.com/janakiramm/kubeflow-notebook-tutorial.git
Navigate to the inference directory to find the YAML files and other related assets.
Let’s deploy TensorFlow Serving in the kubeflow-user-example-com namespace and expose it as a NodePort service. It’s the same namespace where the Jupyter Notebook Servers are running.
cd inference
kubectl apply -f tf-serve-deploy.yaml
kubectl apply -f tf-serve-service.yaml
Below are YAML specifications for the TF Serving deployment and service.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dogs-vs-cats
  name: dogs-vs-cats-v1
  namespace: kubeflow-user-example-com
spec:
  selector:
    matchLabels:
      app: dogs-vs-cats
  template:
    metadata:
      labels:
        app: dogs-vs-cats
        version: v1
    spec:
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=dogs-vs-cats
        - --model_base_path=/models
        command:
        - /usr/bin/tensorflow_model_server
        image: tensorflow/serving:latest
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: dogs-vs-cats
        ports:
        - containerPort: 9000
        - containerPort: 8500
        volumeMounts:
        - mountPath: /models
          name: model-serve-storage
      volumes:
      - name: model-serve-storage
        persistentVolumeClaim:
          claimName: models
apiVersion: v1
kind: Service
metadata:
  labels:
    app: dogs-vs-cats
  name: dogs-vs-cats-service
  namespace: kubeflow-user-example-com
spec:
  ports:
  - name: http-tf-serving
    port: 8500
    targetPort: 8500
    nodePort: 31000
  - name: grpc-tf-serving
    port: 9000
    targetPort: 9000
    nodePort: 31001
  selector:
    app: dogs-vs-cats
  type: NodePort
We are essentially mounting the same PVC used by the Jupyter Notebook Servers to serve the model.
The TF Serving endpoint is available as a NodePort on the Kubeflow cluster.
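Assuming you have kubectl access to the cluster, you can verify that the TF Serving pod is running and confirm the NodePort assignments defined in the service spec:

kubectl get pods -n kubeflow-user-example-com -l app=dogs-vs-cats
kubectl get service dogs-vs-cats-service -n kubeflow-user-example-com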
Since Kubeflow relies on Istio for authorizing requests, we need to apply an authorization policy to allow requests to TF Serving.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: default
  namespace: kubeflow-user-example-com
spec:
  rules:
  - to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/v1/models/*"]
kubectl apply -f tf-serve-auth.yaml
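As a quick sanity check, you can query the TF Serving model status endpoint over the NodePort. The IP address below is only an example; use one of your own node addresses. If the model loaded correctly, the response should report the model version with a state of AVAILABLE.

curl http://10.0.0.54:31000/v1/models/dogs-vs-cats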
It’s time to invoke the endpoint from a Python client. Let’s create a virtual environment and install the required modules.
python3 -m venv inferenv
source inferenv/bin/activate
pip install -r requirements.txt
Below is the Python client code we use for inference.
import argparse
import json

import numpy as np
import requests
import tensorflow
import PIL
from tensorflow.keras.preprocessing import image

# Parse the image path and the TF Serving URI from the command line
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path of the image")
ap.add_argument("-u", "--uri", required=True, help="URI of model server")
args = vars(ap.parse_args())

image_path = args['image']
uri = args['uri']

# Load the image, resize it to the model's input shape, and normalize pixel values
img = image.img_to_array(image.load_img(image_path, target_size=(128, 128))) / 255.

# Build the REST payload expected by TF Serving
payload = {
    "instances": [{'conv2d_3_input': img.tolist()}]
}

# Call the predict endpoint and map the highest-scoring class to a label
r = requests.post(uri + '/v1/models/dogs-vs-cats:predict', json=payload)
pred = json.loads(r.content.decode('utf-8'))
predict = np.asarray(pred['predictions']).argmax(axis=1)[0]
print("Dog" if predict == 1 else "Cat")
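For reference, a successful response from the predict endpoint is a JSON document with a predictions array. Assuming the model’s final layer emits two class scores, it looks roughly like this (the values are illustrative):

{
  "predictions": [[0.02, 0.98]]
}

The conv2d_3_input key in the payload corresponds to the input tensor name in the model’s serving signature, so it has to match the name generated when the model was saved.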
Let’s run the Python client by passing the TF Serving URL and a sample image. When sending sample1.jpg, we see the prediction as a dog; with sample2.jpg, we see a cat.
HOST=http://10.0.0.54:31000
python infer.py -i sample1.jpg -u $HOST
Replace HOST with an appropriate IP address and port based on your cluster and the TF Serving NodePort service.
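If you are not sure which address to use, listing the nodes with kubectl shows their IPs:

kubectl get nodes -o wide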
HOST=http://10.0.0.54:31000
python infer.py -i sample2.jpg -u $HOST
As you can see, the classification is accurate for the images that we sent.
This concludes the series on Kubeflow Jupyter Notebook Servers, where we explored the end-to-end MLOps scenario of configuring the environment, performing data preparation, training, deployment, and inference.