Validate Service-Level Objectives of REST APIs Using Iter8
Validating the service-level objectives (SLOs) of REST-based apps before release is a central concern for DevOps/MLOps/SRE teams. Google’s classic book on site reliability engineering popularized the notions of error budgets and of SLOs based on mean and tail latency. In this article, we will explore a simple approach to SLO validation of REST-based apps using Iter8.
Iter8 is an open-source, cloud-native (Kubernetes-based) experimentation platform that makes it easy to optimize, validate, and safely release new versions of apps. Iter8 introduces the notion of an experiment, which automates release engineering tasks such as SLO validation, A/B(/n) testing with business metrics, chaos injection, dark launches, canary releases, and progressive rollouts with mirroring, incremental traffic shifting, user segmentation, and session affinity. Iter8 experiments can be packaged as Helm charts so that they can be reused across apps. Iter8 also provides a command-line utility, iter8ctl, that helps a human operator gain a deeper understanding of an experiment.
In this article, we will explore an Iter8 SLO validation experiment. Iter8 experiments are deployed together with the apps they validate. This experiment generates synthetic HTTP requests for the app, builds the app's latency and error profile from its responses, and determines whether the app satisfies the SLOs specified in the experiment. We will demonstrate the experiment in two scenarios: an app that implements a REST API serving HTTP GET requests, and an app that implements a REST API serving HTTP POST requests with a JSON payload. The experiment is general-purpose: it does not require instrumenting the app for metrics collection, nor does it assume a metrics database such as Prometheus or a Kubernetes service mesh such as Istio or Linkerd.
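To make the idea of a latency and error profile concrete, here is a minimal shell sketch of the kind of computation such an experiment performs. It is not Iter8's implementation. The function name `slo_check`, its input format (one `latency_ms http_status` pair per line on stdin), the nearest-rank 95th percentile, and counting any status of 400 or above as an error are all assumptions made for illustration; Iter8 may compute percentiles and classify errors differently.

```shell
# slo_check MEAN_LIMIT_MS P95_LIMIT_MS ERROR_RATE_LIMIT   (hypothetical helper)
# Reads "latency_ms http_status" pairs on stdin, one request per line,
# prints the latency/error profile, and returns 0 only if every SLO holds.
slo_check() {
  local mean_limit="$1" p95_limit="$2" err_limit="$3"
  # Sort numerically by latency so the percentile can be read by rank.
  sort -n -k1,1 | awk -v ml="$mean_limit" -v pl="$p95_limit" -v el="$err_limit" '
    {
      lat[NR] = $1
      sum += $1
      if ($2 >= 400) errs++    # assumption: any 4xx/5xx response is an error
    }
    END {
      if (NR == 0) { print "no requests"; exit 2 }
      mean = sum / NR
      # Nearest-rank percentile: the ceil(0.95 * N)-th smallest latency.
      rank = int(0.95 * NR)
      if (rank < 0.95 * NR) rank++
      p95 = lat[rank]
      err = errs / NR
      printf "mean=%.1f p95=%.1f errorRate=%.3f\n", mean, p95, err
      if (mean <= ml && p95 <= pl && err <= el) exit 0
      exit 1
    }'
}
```

For example, `printf '%s 200\n' 10 20 30 40 50 60 70 80 90 100 | slo_check 60 100 0` prints `mean=55.0 p95=100.0 errorRate=0.000` and returns 0, while tightening the mean-latency limit to 50 makes it return 1.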
Cluster & Local Setup
Install Iter8 in your Kubernetes cluster, clone the Iter8 GitHub repo locally, set the ITER8 environment variable to the root of your local Iter8 repo, and install the helm and iter8ctl command-line tools on your local machine. These steps are shown in the following asciicast, whose commands you can copy and paste:
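The local part of this setup can also be scripted. The sketch below uses two hypothetical helpers that are not part of Iter8: `require_tools` checks that the listed CLIs are on your PATH, and `setup_iter8_repo` clones https://github.com/iter8-tools/iter8 (unless a checkout already exists) and exports ITER8. It does not install Iter8 into the cluster (follow the asciicast for that). It also clones the default branch, which may no longer contain `samples/first-exp`; check out the Iter8 release that matches your cluster install.

```shell
# Hypothetical helper: report every required CLI missing from PATH.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: clone the Iter8 repo if needed and export ITER8.
setup_iter8_repo() {
  local dir="${1:-$PWD/iter8}"
  # Clone only if the directory is not already a git checkout.
  if [ ! -d "$dir/.git" ]; then
    git clone https://github.com/iter8-tools/iter8.git "$dir" || return 1
  fi
  # Resolve to an absolute path so $ITER8 works from any directory.
  ITER8="$(cd "$dir" && pwd)" || return 1
  export ITER8
}
```

A typical invocation is `require_tools kubectl helm iter8ctl git && setup_iter8_repo`, which stops before cloning if any tool is missing.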
Scenario 1: HTTP GET API
The SLO validation experiment for an HTTP GET API is in the following asciicast:
In the above asciicast, we deploy a simple hello app consisting of a Kubernetes service and a deployment that serves HTTP GET requests at the cluster-local URL: http://hello.default.svc.cluster.local:8080. Next, we create an Iter8 SLO validation experiment using the following Helm command:
helm upgrade -n default my-exp "$ITER8/samples/first-exp" \
  --set URL='http://hello.default.svc.cluster.local:8080' \
  --set limitMeanLatency=50.0 \
  --set limitErrorRate=0.0 \
  --set limit95thPercentileLatency=100.0 \
  --install
This creates an Iter8 experiment in the cluster. The experiment generates a stream of HTTP requests for the app, builds the app's latency and error-rate profile from the HTTP responses, and verifies that the app satisfies the mean latency (50 msec), error rate (0.0), and 95th percentile tail latency (100 msec) SLOs. The experiment is short and is intended to complete within a few seconds. We can assert that the experiment has completed and that the app satisfies the specified SLOs using the following command:
iter8ctl assert -c completed -c winnerFound
Iter8 declares the app the winner of this experiment only if it satisfies all the SLOs. In that case, the above command exits normally (code 0) after printing a message stating that all assertions are satisfied. If the app fails to satisfy the SLOs, the command exits with an error (code 1) and a relevant message. In both cases, the results of the experiment can be described in detail using the following command:
iter8ctl describe
This command displays detailed information about the progress of the experiment, metrics collected for the app, which SLOs were satisfied, and which ones were not.
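Because `iter8ctl assert` reports its result through the exit code, it can gate a CI/CD pipeline. The `slo_gate` function below is a hypothetical sketch, not an Iter8 feature. It assumes that `iter8ctl assert -c completed` fails until the experiment has finished, so it polls once per second up to a timeout, then asserts `winnerFound` and prints the `iter8ctl describe` report whenever the gate fails.

```shell
# slo_gate [TIMEOUT_SECONDS]   (hypothetical helper)
# Waits for the experiment to complete, then checks that the app met all SLOs.
slo_gate() {
  local timeout="${1:-60}" waited=0
  # Assumption: "assert -c completed" exits non-zero until the experiment completes.
  until iter8ctl assert -c completed >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "experiment did not complete within ${timeout}s" >&2
      iter8ctl describe
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # Completed: exit code 0 means the app was declared the winner (all SLOs met).
  if iter8ctl assert -c completed -c winnerFound; then
    return 0
  fi
  # SLOs not met: print the detailed report before failing the pipeline.
  iter8ctl describe
  return 1
}
```

In a pipeline step, `slo_gate 120 || exit 1` placed right after the helm command stops the release when the app misses its SLOs. Polling on `completed` first keeps "not finished yet" distinct from "finished but SLOs missed".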
Scenario 2: HTTP POST API
The SLO validation experiment for an HTTP POST API is in the following asciicast:
This scenario differs from the HTTP GET API scenario in two ways. First, the httpbin app deployed in this scenario exposes a POST API endpoint that accepts a JSON payload. Second, the Helm command used to deploy the Iter8 experiment includes the payloadURL and contentType parameters. The experiment downloads the JSON object from the payloadURL and uses it as the payload in each POST request it sends to the app.
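The Helm command for this scenario is shown only in the asciicast, so here is a hedged sketch of what it might look like, wrapped in a hypothetical `deploy_post_slo_exp` function. It assumes the same `$ITER8/samples/first-exp` chart and `my-exp` release name as Scenario 1 and reuses Scenario 1's SLO limits; only the `payloadURL` and `contentType` values come from the description above.

```shell
# deploy_post_slo_exp URL PAYLOAD_URL [CONTENT_TYPE]   (hypothetical helper)
# Assumes the Scenario 1 chart also accepts payloadURL and contentType.
deploy_post_slo_exp() {
  local url="$1" payload_url="$2" content_type="${3:-application/json}"
  if [ -z "$url" ] || [ -z "$payload_url" ]; then
    echo "usage: deploy_post_slo_exp URL PAYLOAD_URL [CONTENT_TYPE]" >&2
    return 2
  fi
  # --install creates the release on first use and upgrades it afterwards.
  helm upgrade --install -n default my-exp "$ITER8/samples/first-exp" \
    --set URL="$url" \
    --set payloadURL="$payload_url" \
    --set contentType="$content_type" \
    --set limitMeanLatency=50.0 \
    --set limitErrorRate=0.0 \
    --set limit95thPercentileLatency=100.0
}
```

For example, `deploy_post_slo_exp 'http://httpbin.default.svc.cluster.local/post' 'https://example.com/payload.json'`, where both URLs are placeholders for your httpbin service endpoint and the location of your JSON payload.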
The iter8ctl assert and describe subcommands check the experiment's outcome and describe the experiment in the same manner as in Scenario 1.
Use the SLO validation workflows introduced in this article as is, or extend them to suit your API testing needs. Here are a few options.
Your apps: Perform SLO validation experiments with your own apps by setting the appropriate URLs in the Helm values. The apps need not be implemented as Kubernetes deployments; they may use StatefulSets or custom resources such as serverless services (e.g., Knative) or machine learning inference services (e.g., KFServing, Seldon). You can even perform these experiments for non-Kubernetes web services that implement HTTP GET or POST APIs.
Version promotion: Extend the Helm charts used in the above scenarios so that the experiment promotes the app to a production cluster after verifying that it satisfies the SLOs. Here is an example of Iter8 performing SLO validation followed by version promotion through a GitHub pull request.
More content types: In the POST API scenario, you may use any valid HTTP content type. For example, to send a JPEG image in the POST request, host the image at the payloadURL (i.e., a GET request to the payloadURL should return a response containing the image), and set contentType to image/jpeg in the Helm command.
Other Helm values: In the above experiments, the number of requests sent during the experiment and the rate at which they are sent (queries per second, qps) take their default values of 100 and 8.0, respectively. These parameters can also be set as Helm values. Use the helm get values subcommand to see both the user-supplied and the computed Helm values.
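Before switching contentType, it helps to confirm that the payloadURL really serves that type. The hypothetical `check_payload_url` function below is a pre-flight check of our own, not something Iter8 performs; it uses curl's `%{content_type}` write-out variable to read the Content-Type of a GET response and ignores parameters such as `; charset=...` when comparing.

```shell
# check_payload_url URL EXPECTED_CONTENT_TYPE   (hypothetical helper)
# Succeeds only if a GET on URL succeeds and returns the expected Content-Type.
check_payload_url() {
  local url="$1" expected="$2" actual
  # -f fails on HTTP errors; -w '%{content_type}' prints the response Content-Type.
  actual="$(curl -fsS -o /dev/null -w '%{content_type}' "$url")" || return 1
  # Drop parameters such as "; charset=utf-8" before comparing.
  actual="${actual%%;*}"
  if [ "$actual" != "$expected" ]; then
    echo "payload URL serves '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

For example, `check_payload_url 'https://example.com/image.jpg' image/jpeg` (a placeholder URL) should succeed before you pass `--set contentType=image/jpeg`. To inspect the request count, qps, and other defaults of a running experiment, `helm get values -n default my-exp --all` prints both the user-supplied and the computed values.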