Deploy Nvidia Triton Inference Server with MinIO as Model Store

This tutorial is the latest part of a series where we build an end-to-end stack to perform machine learning inference at the edge. In the previous part of this series, we installed the MinIO object storage service on SUSE Rancher’s RKE2 Kubernetes distribution. We will extend that setup by deploying the Nvidia Triton Inference Server, which treats the MinIO tenant as its model store.
By the end of this tutorial, we will have a fully configured model server and registry ready for inference.
Step 1 — Populate the MinIO Model Store with Sample Models
Before deploying the model server, we need to have the model store or repository populated with a few models.
Start by cloning the Triton Inference Server GitHub repository.
git clone https://github.com/triton-inference-server/server.git
We will now run a shell script to download the models to the local filesystem, after which we will upload them to a MinIO bucket.
Run the ./fetch_models.sh script available in the server/docs/examples directory. Wait for all the models to be downloaded into the model_repository directory. This may take a few minutes, depending on your Internet connection.
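Put together, this step looks roughly as follows, run from the directory where you cloned the repository:
cd server/docs/examples
./fetch_models.sh
ls model_repository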
Let’s use the MinIO CLI (mc) to upload the models from the model_repository directory to the models bucket. The bucket was created within the model-registry tenant set up in the last tutorial.
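The next command assumes the mc client already has an alias named model-registry pointing at the tenant. If it does not, you can register one first; the endpoint below is a placeholder, so substitute your tenant’s service or NodePort address and credentials:
mc --insecure alias set model-registry https://MINIO_TENANT_ENDPOINT:443 admin YOUR_SECRET_ACCESS_KEY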
Run the command from the model_repository directory to copy the files to the bucket.
mc --insecure cp --recursive . model-registry/models
Check the uploads by visiting the MinIO Console. You should be able to see the directories copied to the models bucket.
We are now ready to point NVIDIA Triton Inference Server to MinIO.
Step 2 — Deploy Triton Inference Server on RKE2
Triton can use an Amazon S3-compatible object store as its model repository, which is exactly what MinIO exposes. To access the bucket, it needs a secret with AWS-style credentials.
In our case, these credentials are essentially the MinIO tenant credentials saved from the last tutorial.
Create a namespace and then create the secret within it.
kubectl create ns model-server
kubectl create secret generic aws-credentials --from-literal=AWS_ACCESS_KEY_ID=admin --from-literal=AWS_SECRET_ACCESS_KEY=7c5c084d-9e8e-477b-9a2c-52bbf22db9af -n model-server
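A quick sanity check confirms the secret landed in the right namespace:
kubectl get secret aws-credentials -n model-server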
Don’t forget to replace the credentials with your values.
Now, create the Triton deployment and service manifests and apply them.
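The exact manifests are not reproduced here; the snippet below is a minimal sketch of what triton-deploy.yaml and triton-service.yaml could look like. It assumes a CPU-only pod, a generic nvcr.io/nvidia/tritonserver image tag, the default Triton ports (8000 HTTP, 8001 gRPC, 8002 metrics), and the aws-credentials secret created above. Adjust the image tag, resources, and GPU settings for your environment.
# Sketch only: image tag, replica count, and service type are assumptions
cat <<EOF > triton-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton
  namespace: model-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton
  template:
    metadata:
      labels:
        app: triton
    spec:
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:22.04-py3
        command: ["tritonserver"]
        args:
        # Model repository path as described later in this tutorial
        - --model-repository=s3://https://minio.model-registry.svc.cluster.local:443/models/
        envFrom:
        # Injects AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
        - secretRef:
            name: aws-credentials
        ports:
        - containerPort: 8000
          name: http
        - containerPort: 8001
          name: grpc
        - containerPort: 8002
          name: metrics
EOF

cat <<EOF > triton-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: triton
  namespace: model-server
spec:
  type: NodePort
  selector:
    app: triton
  ports:
  - name: http
    port: 8000
    targetPort: 8000
  - name: grpc
    port: 8001
    targetPort: 8001
  - name: metrics
    port: 8002
    targetPort: 8002
EOF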
kubectl apply -f triton-deploy.yaml
kubectl apply -f triton-service.yaml
To let the Triton pod access the MinIO service over TLS, we fixed the certificate trust issue by running the following command inside the Triton container:
cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /usr/local/share/ca-certificates && update-ca-certificates
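Since the command has to run inside the container, one way to do it is with kubectl exec against the Triton pod (the pod name here is the one used in the logs step below; substitute your own):
kubectl exec -n model-server triton-59994bb95c-7hgt7 -- bash -c "cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /usr/local/share/ca-certificates && update-ca-certificates"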
We passed the MinIO bucket to Triton as its model repository using Triton’s S3 path convention, which allows a custom endpoint to be embedded in the URL: s3://https://minio.model-registry.svc.cluster.local:443/models/
Finally, check the logs of the Triton pod and make sure everything is working properly.
kubectl logs triton-59994bb95c-7hgt7 -n model-server
If the logs show that the models have been loaded and the HTTP and gRPC services have started, Triton is able to download the models from the model store and serve them through those endpoints.
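As an additional check, you can hit Triton’s standard HTTP health endpoint through the service; the node IP and port below are placeholders for your NodePort values:
curl -v http://NODE_IP:HTTP_NODEPORT/v2/health/ready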
Step 3 — Run Inference Client against Triton
Start by cloning the repo to get the code for inference.
git clone https://github.com/triton-inference-server/client.git
cat <<EOF > requirements.txt
pillow
numpy
attrdict
tritonclient
google-api-python-client
grpcio
geventhttpclient
boto3
EOF
pip3 install -r requirements.txt
Navigate to the client/src/python/examples directory and execute the following command:
python3 image_client.py \
-u TRITON_HTTP_ENDPOINT \
-m inception_graphdef \
-s INCEPTION \
-x 1 \
-c 1 \
car.jpg
Replace TRITON_HTTP_ENDPOINT with the host and NodePort of the Triton service. Send an image of a car, and the client will print the classification labels returned by the model.
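To find that value, list the services in the namespace and note the NodePort mapped to Triton’s HTTP port 8000:
kubectl get svc -n model-server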
The client has invoked the Triton inference endpoint with a request to run the inception_graphdef model already available in the model store. Triton has performed the inference and printed the labels based on the classification.
Congratulations! You have successfully deployed and configured the model server backed by a model store running at the edge.