
Deploy Nvidia Triton Inference Server with MinIO as Model Store

This tutorial shows how to set up the Nvidia Triton Inference Server that treats the MinIO tenant as a model store.
Dec 3rd, 2021 6:00am

This tutorial is the latest part of a series where we build an end-to-end stack to perform machine learning inference at the edge. In the previous part of this series, we installed the MinIO object storage service on SUSE Rancher's RKE2 Kubernetes distribution. We will extend that use case by deploying the Nvidia Triton Inference Server, which treats the MinIO tenant as its model store.

AI Inference cluster illustration

By the end of this tutorial, we will have a fully configured model server and registry ready for inference.

Step 1 — Populate the MinIO Model Store with Sample Models

Before deploying the model server, we need to have the model store or repository populated with a few models.

Start by cloning the Triton Inference Server GitHub repository.

git clone https://github.com/triton-inference-server/server.git

We will now run a shell script to download the models to the local filesystem, after which we will upload them to a MinIO bucket.

Run the fetch_models.sh script available in the server/docs/examples directory.

Wait for all the models to get downloaded in the model_repository directory. It may take a few minutes, depending on your Internet connection.
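Triton expects each model to follow its model-repository layout: one directory per model, numeric version subdirectories holding the model files, and a config.pbtxt describing the model. After the script finishes, the fetched examples look roughly like this (abbreviated; the script downloads several models):

```
model_repository/
├── densenet_onnx/
│   ├── 1/
│   │   └── model.onnx
│   └── config.pbtxt
└── inception_graphdef/
    ├── 1/
    │   └── model.graphdef
    └── config.pbtxt
```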

Model repository

Let’s use the MinIO CLI to upload the models from the model_repository directory to the models bucket. The bucket was created within the model-registry tenant created in the last tutorial.

Run the command from the model_repository directory to copy the files to the bucket.

mc --insecure cp --recursive . model-registry/models

Check the uploads by visiting the MinIO Console. You should be able to see the directories copied to the models bucket.

Minio console for building models.

We are now ready to point NVIDIA Triton Inference Server to MinIO.

Step 2 — Deploy Triton Inference Server on RKE2

Triton supports Amazon S3-compatible object stores as model repositories. To access the bucket, it needs a Kubernetes secret containing AWS-style credentials.

In our case, these credentials are simply the MinIO tenant credentials saved from the last tutorial.
Create a namespace and the secret within that.

kubectl create ns model-server

kubectl create secret generic aws-credentials --from-literal=AWS_ACCESS_KEY_ID=admin --from-literal=AWS_SECRET_ACCESS_KEY=7c5c084d-9e8e-477b-9a2c-52bbf22db9af -n model-server

Don’t forget to replace the credentials with your values.

Now, create the deployment and service manifests and apply them.
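The manifests themselves are not reproduced here; a minimal sketch of what triton-deploy.yaml could look like, assuming the nvcr.io Triton image and the aws-credentials secret created above (the image tag and ports are placeholders you should adjust):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton
  namespace: model-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton
  template:
    metadata:
      labels:
        app: triton
    spec:
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:21.10-py3
        args:
        - tritonserver
        - --model-repository=s3://https://minio.model-registry.svc.cluster.local:443/models/
        envFrom:
        - secretRef:
            name: aws-credentials   # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
        ports:
        - containerPort: 8000   # HTTP
        - containerPort: 8001   # gRPC
        - containerPort: 8002   # metrics
```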

kubectl apply -f triton-deploy.yaml
kubectl apply -f triton-service.yaml

kubectl get pods command

To let the Triton pod access the MinIO service over TLS, we worked around the self-signed certificate issue by copying the cluster CA into the pod's trust store:

cp /var/run/secrets/ /usr/local/share/ca-certificates && update-ca-certificates

We passed the MinIO bucket to Triton using Triton's S3 path convention, which embeds the HTTPS endpoint of a non-AWS store directly in the URL: s3://https://minio.model-registry.svc.cluster.local:443/models/
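For S3-compatible stores that are not on AWS, Triton embeds the endpoint in the path, i.e. s3://&lt;endpoint&gt;/&lt;bucket&gt;. A rough sketch of how such a URL splits into endpoint and bucket (illustrative only, not Triton's actual parser):

```python
def split_triton_s3_url(url: str):
    """Split a Triton-style S3 URL into (endpoint, bucket_path).

    Triton allows s3://https://host:port/bucket to target non-AWS,
    S3-compatible stores such as MinIO.
    """
    rest = url[len("s3://"):]
    if rest.startswith(("http://", "https://")):
        scheme, _, tail = rest.partition("://")
        host, _, path = tail.partition("/")
        return f"{scheme}://{host}", path.strip("/")
    # Plain AWS form: s3://bucket/prefix
    return None, rest.strip("/")

endpoint, bucket = split_triton_s3_url(
    "s3://https://minio.model-registry.svc.cluster.local:443/models/"
)
print(endpoint, bucket)
```

With our tenant, this yields the in-cluster MinIO endpoint and the models bucket.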

Finally, check the logs of the Triton pod and make sure everything is working properly.

kubectl logs triton-59994bb95c-7hgt7 -n model-server

Kubectl command for fetching Triton logs.

If you see the above in the output, it means that Triton is able to download the models from the model store and serve them through the HTTP and gRPC endpoints.

Step 3 — Run Inference Client against Triton

Start by cloning the Triton client repository to get the example inference code.

git clone https://github.com/triton-inference-server/client.git
Create a requirements.txt with the client dependencies — at minimum the tritonclient package and Pillow for image preprocessing — and install them:

cat <<EOF >> requirements.txt
tritonclient[all]
pillow
EOF

pip3 install -r requirements.txt

Navigate to the client/src/python/examples directory and execute the following command:

python3 image_client.py \
-m inception_graphdef \
-x 1 \
-c 1 \
-u TRITON_HTTP_ENDPOINT \
car.jpg

Replace TRITON_HTTP_ENDPOINT with the host and nodeport of the Triton service. Send an image of a car and you should see the below output:

Inference output.

The client has invoked the Triton inference endpoint with a request to run the inception model already available in the model store. Triton has performed the inference and returned the labels from the classification.
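The -c flag controls how many classifications the client prints; conceptually, the postprocessing is a top-k selection over the model's class scores. A simplified sketch (not the actual image_client code):

```python
def top_k_classes(scores, labels, k=1):
    """Return the k highest-scoring (label, score) pairs,
    mimicking what a classification client prints with -c k."""
    ranked = sorted(zip(labels, scores), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for a car image
labels = ["sports car", "convertible", "golf ball"]
scores = [0.83, 0.12, 0.05]
print(top_k_classes(scores, labels, k=1))  # → [('sports car', 0.83)]
```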

Congratulations! You have successfully deployed and configured the model server backed by a model store running at the edge.
