Deploy Nvidia Triton Inference Server with MinIO as Model Store

3 Dec 2021 6:00am

This tutorial is the latest installment in a series in which we build an end-to-end stack for machine learning inference at the edge. In the previous part, we installed the MinIO object storage service on SUSE Rancher’s RKE2 Kubernetes distribution. Here we extend that setup by deploying Nvidia Triton Inference Server and configuring it to use the MinIO tenant as its model store.

AI Inference cluster illustration

By the end of this tutorial, we will have a fully configured model server and registry ready for inference.

Step 1 — Populate the MinIO Model Store with Sample Models

Before deploying the model server, we need to have the model store or repository populated with a few models.

Start by cloning the Triton Inference Server GitHub repository.

git clone https://github.com/triton-inference-server/server.git

We will now run a shell script to download the models to the local filesystem, after which we will upload them to a MinIO bucket.

Run the ./fetch_models.sh script located in the server/docs/examples directory.

Wait for all the models to be downloaded to the model_repository directory. This may take a few minutes, depending on your Internet connection.

Model repository
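Before uploading, it can help to confirm that the download produced the layout Triton generally expects: one directory per model, each holding numbered version subdirectories and, usually, a config.pbtxt file. The sketch below is a hypothetical helper, not part of the Triton repository; the function names are assumptions made for illustration.

```python
from pathlib import Path


def summarize_model_repository(root):
    """Map each model directory under root to its versions and config status."""
    summary = {}
    for model_dir in sorted(Path(root).iterdir()):
        if not model_dir.is_dir():
            continue  # skip stray files at the top level
        # Version directories are named with integers, e.g. "1".
        versions = sorted(
            int(p.name)
            for p in model_dir.iterdir()
            if p.is_dir() and p.name.isdecimal()
        )
        summary[model_dir.name] = {
            "versions": versions,
            "has_config": (model_dir / "config.pbtxt").is_file(),
        }
    return summary


def report(root="."):
    """Print one line per model found under root."""
    for name, info in summarize_model_repository(root).items():
        print(f"{name}: versions={info['versions']} config.pbtxt={info['has_config']}")
```

Calling report("model_repository") from server/docs/examples prints one line per model. A model with an empty version list is worth investigating before the upload, since it probably means the download did not complete.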

Let’s use the MinIO CLI to upload the models from the model_repository directory to the models bucket. This bucket belongs to the model-registry tenant that we set up in the last tutorial.

Run the following command from the model_repository directory to copy the files to the bucket.

mc --insecure cp --recursive . model-registry/models

Check the uploads by visiting the MinIO Console. You should be able to see the directories copied to the models bucket.

MinIO console for building models.
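The same check can be done programmatically through MinIO's S3-compatible API. The following is a minimal sketch using boto3; the endpoint URL and credentials are placeholders (assumptions, since the tenant's external address is not given here), and the helper names are hypothetical.

```python
def minio_s3_client_kwargs(endpoint_url, access_key, secret_key, verify_tls=False):
    """Build keyword arguments for boto3.client("s3", ...) that target MinIO."""
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        # False mirrors `mc --insecure`; pass a CA bundle path to verify instead.
        "verify": verify_tls,
    }


def list_model_directories(client, bucket="models"):
    """Return the top-level 'directories' (key prefixes) in the bucket."""
    response = client.list_objects_v2(Bucket=bucket, Delimiter="/")
    return [p["Prefix"].rstrip("/") for p in response.get("CommonPrefixes", [])]


def main():
    import boto3  # imported lazily so the helpers above work without it

    # Placeholder endpoint and credentials: substitute your tenant's values.
    client = boto3.client(
        "s3",
        **minio_s3_client_kwargs("https://MINIO_ENDPOINT", "ACCESS_KEY", "SECRET_KEY"),
    )
    print(list_model_directories(client))
```

Running main() against the tenant should list the same model directories that appear in the console.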

We are now ready to point NVIDIA Triton Inference Server to MinIO.

Step 2 — Deploy Triton Inference Server on RKE2

Triton can read its model repository from Amazon S3 or from any S3-compatible object store. Because MinIO exposes the S3 API, Triton can treat the MinIO bucket as an S3 bucket. To access the bucket, it needs a secret containing AWS-style credentials.

In our case, these are the MinIO tenant credentials saved from the last tutorial. Create a namespace, and then create the secret within it.

kubectl create ns model-server

kubectl create secret generic aws-credentials --from-literal=AWS_ACCESS_KEY_ID=admin --from-literal=AWS_SECRET_ACCESS_KEY=7c5c084d-9e8e-477b-9a2c-52bbf22db9af -n model-server

Don’t forget to replace the credentials with your values.
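For readers who prefer a declarative manifest, the sketch below builds a Secret equivalent to what the command above creates. Kubernetes stores each --from-literal value base64-encoded under the data field; note that base64 is an encoding, not encryption. The helper names and the placeholder secret key are assumptions for illustration.

```python
import base64
import json


def build_secret_manifest(name, namespace, literals):
    """Return a Secret manifest whose values are base64-encoded, as Kubernetes expects."""
    data = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in literals.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": data,
    }


def example():
    """Print a manifest that could be saved and applied with kubectl apply -f."""
    manifest = build_secret_manifest(
        "aws-credentials",
        "model-server",
        # Placeholder values: substitute your MinIO tenant credentials.
        {"AWS_ACCESS_KEY_ID": "admin", "AWS_SECRET_ACCESS_KEY": "YOUR_SECRET_KEY"},
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output can be saved to a file and applied directly.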

Now, create the deployment and service manifests and apply them.

kubectl apply -f triton-deploy.yaml
kubectl apply -f triton-service.yaml

kubectl get pods command

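So that the Triton pod can reach the MinIO service over TLS, we resolved the certificate issue by adding the cluster CA certificate to the container's trusted certificates with the following command: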

cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /usr/local/share/ca-certificates && update-ca-certificates

We passed the MinIO bucket to Triton as an S3 path that includes the endpoint's scheme, host and port: s3://https://minio.model-registry.svc.cluster.local:443/models/
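The contents of triton-deploy.yaml are not reproduced here, so the following is only a sketch of how the certificate step, the S3 path and the credentials secret might fit together in a container spec. Everything in it is an assumption rather than the manifest used in this tutorial: the helper names, the container name, the use of envFrom, and the caller-supplied image. It also assumes the container runs with enough privileges to update the trust store. The ports are Triton's defaults for HTTP, gRPC and metrics.

```python
CA_TRUST_COMMAND = (
    "cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt "
    "/usr/local/share/ca-certificates && update-ca-certificates"
)


def triton_s3_path(host, port, bucket, prefix="", use_https=True):
    """Build an S3 model repository path that points at a custom endpoint."""
    scheme = "https://" if use_https else ""
    path = f"s3://{scheme}{host}:{port}/{bucket}/"
    if prefix:
        path += prefix.strip("/") + "/"
    return path


def build_triton_container(image, model_repository, secret_name="aws-credentials"):
    """Return a container spec (as a dict) for a Deployment's pod template."""
    # Trust the cluster CA first, then start the server against the bucket.
    start = f"{CA_TRUST_COMMAND} && tritonserver --model-repository={model_repository}"
    return {
        "name": "triton",  # hypothetical container name
        "image": image,    # supply the Triton image tag you intend to run
        "command": ["/bin/sh", "-c"],
        "args": [start],
        # Expose AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the secret.
        "envFrom": [{"secretRef": {"name": secret_name}}],
        "ports": [
            {"name": "http", "containerPort": 8000},
            {"name": "grpc", "containerPort": 8001},
            {"name": "metrics", "containerPort": 8002},
        ],
    }
```

For example, triton_s3_path("minio.model-registry.svc.cluster.local", 443, "models") yields the path shown above, which can then be passed to build_triton_container along with the image of your choice.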

Finally, check the logs of the Triton pod to make sure everything is working properly. Replace the pod name below with the one from your cluster.

kubectl logs triton-59994bb95c-7hgt7 -n model-server

Kubectl command for fetching Triton logs.

If you see the above in the output, it means that Triton is able to download the models from the model store and serve them through the HTTP and gRPC endpoints.
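Besides reading the logs, Triton's HTTP endpoint exposes health probes at /v2/health/live and /v2/health/ready. The sketch below polls one of them using only the standard library; the helper names are hypothetical, and the endpoint argument is assumed to be a host:port pair reachable over plain HTTP.

```python
import urllib.request


def health_url(endpoint, probe="ready"):
    """Build the health probe URL for a Triton HTTP endpoint given as host:port."""
    if probe not in ("live", "ready"):
        raise ValueError("probe must be 'live' or 'ready'")
    return f"http://{endpoint}/v2/health/{probe}"


def is_healthy(endpoint, probe="ready", timeout=5.0):
    """Return True if the probe answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(health_url(endpoint, probe), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts and non-2xx HTTP responses.
        return False
```

A call such as is_healthy("NODE_IP:NODE_PORT"), with the placeholders replaced by your node address and the service's NodePort, should return True once the models have loaded.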

Step 3 — Run Inference Client against Triton

Start by cloning the repo to get the code for inference.

git clone https://github.com/triton-inference-server/client.git

Create a requirements.txt file listing the Python dependencies.

cat <<EOF > requirements.txt
pillow
numpy
attrdict
tritonclient
google-api-python-client
grpcio
geventhttpclient
boto3
EOF

pip3 install -r requirements.txt

Navigate to the client/src/python/examples directory and execute the following command:


python3 image_client.py \
-u TRITON_HTTP_ENDPOINT \
-m inception_graphdef \
-s INCEPTION \
-x 1 \
-c 1 \
car.jpg

Replace TRITON_HTTP_ENDPOINT with the host and NodePort of the Triton service. When you send an image of a car, you should see the output below:

Inference output.
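In the command above, -m names the model, -x selects the model version, -c sets how many top classes to return, and -s INCEPTION selects the pixel scaling applied before the image is sent. To the best of our understanding of the example client, INCEPTION scaling maps 8-bit pixel values into the range [-1, 1]. The sketch below illustrates that arithmetic in plain Python; the function name is hypothetical and the code is not taken from image_client.py.

```python
def inception_scale(pixels):
    """Scale 8-bit pixel values (0..255) into the range [-1.0, 1.0]."""
    return [(value / 127.5) - 1.0 for value in pixels]


def example():
    """Show how black, mid-gray and white pixel values are mapped."""
    for value, scaled in zip([0, 127.5, 255], inception_scale([0, 127.5, 255])):
        print(f"{value} -> {scaled}")
```

The model was trained on inputs in this range, so sending unscaled pixels would likely produce poor classifications even though the request itself succeeds.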

The client has invoked the Triton inference endpoint with a request to run the Inception model, which Triton had already loaded from the model store. Triton performed the inference and returned the classification labels, which the client then printed.
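When classification is requested, each result entry comes back as a colon-separated string of score, class index and, if the model has a labels file, the label. The sketch below parses such an entry. The function name is hypothetical, and the sample values in the usage note are made up for illustration rather than taken from a real run.

```python
def parse_classification(entry):
    """Split a 'score:index:label' entry into (float, int, str)."""
    if isinstance(entry, bytes):
        entry = entry.decode("utf-8")
    # Limit the split so a label containing ':' stays intact.
    parts = entry.split(":", 2)
    score = float(parts[0])
    index = int(parts[1])
    label = parts[2] if len(parts) > 2 else ""  # label may be absent
    return score, index, label
```

For instance, parse_classification(b"0.75:818:SPORTS CAR") yields the score 0.75, the class index 818 and the label "SPORTS CAR".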

Congratulations! You have successfully deployed and configured the model server backed by a model store running at the edge.