Iter8: Simple A/B/n Testing of Kubernetes Apps, ML Models

Here's how to do A/B/n testing of distributed Kubernetes applications and ML models, using Iter8 SDK to make A/B/n testing easy.
Mar 1st, 2023 10:00am

A common architecture for distributed applications has a user-facing frontend component that interacts with one or more backend components. In this article, we focus on such a distributed architecture and describe how to test multiple versions of the backend component.

In the figure below, the frontend component might be an online store. It relies on a backend model-driven recommendation component to make product suggestions. We are interested in A/B/n testing multiple versions of the backend recommendation component. The figure shows two versions: v1 (the current, or baseline, version) and v2 (the candidate version).

A/B/n testing is valuable because it allows an application owner to intelligently choose between multiple candidate versions of application components, maximizing benefit. In Kubernetes, A/B/n testing presents two important challenges in terms of metrics and traffic engineering.

Challenge: Metrics

A/B/n testing relies on business metrics — metrics that measure the benefit, or value, of an application. For example, for an online store, relevant metrics might be sales revenue or user engagement. Because business metrics are application specific, they cannot be computed by the infrastructure but must be computed by the application itself.

Business metrics are commonly computed by user-facing (frontend) application components. The contribution of backend components is indirectly included in the metric calculation. However, the frontend component cannot directly attribute the metric value to a particular version of the backend component because it does not usually know which version of the backend was used for a particular request.

Challenge: Session Stickiness

Requests from the same user session should all be routed to the same version of the backend component (session stickiness). Otherwise, the user experience may be inconsistent. Furthermore, if user requests from the same user session are distributed to multiple versions of the backend, attributing a business metric will be impossible.
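
One common way to get this kind of stickiness is to derive the version deterministically from the session identifier. The following is a minimal sketch of that idea only; the helper pickVersion() is hypothetical, and nothing here is claimed to be how Iter8 itself assigns sessions.

```javascript
const crypto = require('crypto');

// Hypothetical helper (not part of the Iter8 SDK): deterministically map a
// user session to one of the available backend versions.
function pickVersion(userSession, versions) {
  const digest = crypto.createHash('sha256').update(userSession).digest();
  // Interpret the first four bytes as an unsigned integer, then reduce it
  // modulo the number of versions to get a stable index.
  const index = digest.readUInt32BE(0) % versions.length;
  return versions[index];
}

const versions = ['v1', 'v2'];
const first = pickVersion('session-42', versions);
const second = pickVersion('session-42', versions);
console.log(first === second); // prints true
```

The assignment stays stable only while the list of versions is unchanged; adding or removing a version can move a session to a different one.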

Iter8 SDK

Iter8 is an open-source Kubernetes-release optimizer that can help you easily test Kubernetes apps. With Iter8, you can perform various kinds of experiments, such as SLO validation, canary tests, chaos injection tests and A/B/n tests. A/B/n tests in Iter8 are enabled by the Iter8 SDK.

To enable A/B/n testing, the Iter8 SDK introduces the concept of a “track identifier.” A track is a logical version of a Kubernetes application. The set of valid track identifiers is fixed over the lifetime of the application. The version of the application associated with a given track identifier changes over time as new versions are developed and deployed.

The size of this set determines how many versions of the application can be deployed and tested at the same time. Because the set is fixed, track identifiers can be used to configure routing to the application.

The Iter8 SDK provides two APIs for frontend application components:

  • Lookup(component, user_session) – Given an application and user session, this returns a track identifier. So long as there are no configuration changes, the same inputs will result in the same track identifier.
  • WriteMetric(metric_value, component, user_session) – Given an application, a user session, a metric name and its value, this associates the metric value with the appropriate version of the application.

The following sequence diagram illustrates the interaction. In response to a user request, the shopping component calls Lookup() to obtain a track identifier. Lookup() ensures session stickiness: the same track identifier is returned for the same user session. The shopping component then sends its request to the recommendation component, using the track identifier as the routing key.

Later, when the shopping component computes a business metric for the user session, it can be safely associated with the right version of the recommendation component. WriteMetric() assists by eliminating the need for the shopping component to maintain a mapping to version labels.
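
The interaction can be modeled with a small in-memory stand-in. This is a sketch under stated assumptions: the class FakeAbnService, its round-robin assignment and its storage are invented for illustration. The real Iter8 service is reached over gRPC, and its internals are not described here.

```javascript
// Hypothetical in-memory stand-in for the Iter8 service; not the real SDK.
class FakeAbnService {
  constructor(trackToVersion) {
    this.trackToVersion = trackToVersion; // track identifier -> version
    this.sessions = new Map();            // component/session -> track identifier
    this.metrics = new Map();             // version -> recorded metric values
    this.next = 0;                        // round-robin counter (assumption)
  }

  // Returns the same track for the same session (session stickiness).
  lookup(component, userSession) {
    const key = `${component}/${userSession}`;
    if (!this.sessions.has(key)) {
      const tracks = Object.keys(this.trackToVersion);
      this.sessions.set(key, tracks[this.next % tracks.length]);
      this.next += 1;
    }
    return this.sessions.get(key);
  }

  // Attributes the metric value to the version behind the session's track,
  // so the caller never has to keep its own session-to-version mapping.
  writeMetric(metricValue, component, userSession) {
    const version = this.trackToVersion[this.lookup(component, userSession)];
    if (!this.metrics.has(version)) this.metrics.set(version, []);
    this.metrics.get(version).push(metricValue);
    return version;
  }
}

const abn = new FakeAbnService({ backend: 'v1', 'backend-candidate-1': 'v2' });
const track = abn.lookup('backend', 'alice'); // sticky for 'alice'
abn.writeMetric(87, 'backend', 'alice');      // credited to the version behind that track
```

The frontend only handles track identifiers and session identifiers; the mapping from track to version lives entirely in the service.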

Running an A/B Test

We show how an A/B test can be run for the recommendation component described above. First, the Iter8 service and the application are deployed. Next, a candidate version of the backend recommendation component is deployed and an A/B experiment is run. Finally, we show how to promote the candidate version, if desired.

Deploy the Iter8 Service

We can deploy the Iter8 service with Helm:

Here, the service is configured to watch for versions of the backend application in the default namespace. The Iter8 service watches for service and deployment objects comprising a new version. When they are observed to be present and ready, the service will include the new version in response to Lookup() requests.

Deploy the Application

The following manual steps can be used to deploy the application. First, deploy the frontend online store component:

Next, deploy the current, or baseline, version (v1) of the backend recommendation component:

Run the A/B Test

We are now ready to run an A/B test comparing the currently deployed baseline version of the backend recommendation component and a new candidate version. Broadly speaking, there are two steps to running the test. The first step is to deploy one or more candidate versions of the component. The second step is to launch an Iter8 experiment to evaluate collected metrics. This experiment will periodically execute until it is deleted. On each execution, metrics are re-evaluated.

Deploy the Candidate Version

The following manual steps show the deployment of a candidate version. However, any CI/CD process can be used to deploy candidate versions.

In this example, the candidate version is v2. The objects all share the same name which, as above, corresponds to the track identifier: backend-candidate-1.

In practice, testing depends on the user load applied to the frontend component. In this tutorial, we apply load using a simple load generation script. To use it, forward local requests to the cluster:

and call the load generator:

Launch an Iter8 Experiment

Launch an Iter8 experiment that uses the predefined abnmetrics task to periodically read the business metrics:

Inspect Results

Inspect the results of an experiment (using the command iter8 k report [-o html]) to decide whether to promote the candidate version or not. While the experiment runs, the report will be updated approximately once every minute. A sample report is:

Promoting a Winner

Once the experiment is completed, the candidate version can be deleted and the baseline version can be upgraded. Again, manual steps to promote the candidate version of the recommendation component are provided.

Delete the candidate version:

Then upgrade the baseline version:

How Does It Work? Some Details

The Iter8 Service

To watch for new candidate versions of an application, the Iter8 service is configured to know which objects make up a new version. The configuration requires only the types of the expected resource objects. Iter8 can deduce the object names given these simplifying assumptions:

  • All objects for all versions are deployed in the same namespace.
  • There is only one resource object of a given type in each version.
  • The name of each object in the version associated with the baseline track is <application_name>.
  • The name of each object in the version associated with a candidate track is of the form <application_name>-candidate-<index> where index is 1, 2, etc.
  • The baseline track identifier is <application_name>.
  • Track identifiers associated with candidate versions are of the form <application_name>-candidate-<index>.
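
The naming convention above can be expressed as a small helper. The function name trackIdentifiers() and its numCandidates parameter are illustrative assumptions, not part of Iter8.

```javascript
// Hypothetical helper: list the track identifiers implied by the naming
// convention for an application with `numCandidates` candidate tracks.
function trackIdentifiers(applicationName, numCandidates = 1) {
  const tracks = [applicationName]; // the baseline track uses the bare name
  for (let index = 1; index <= numCandidates; index += 1) {
    tracks.push(`${applicationName}-candidate-${index}`);
  }
  return tracks;
}

const ids = trackIdentifiers('backend', 2);
// ids is ['backend', 'backend-candidate-1', 'backend-candidate-2']
```

Because the same strings name both the tracks and the Kubernetes objects, knowing the application name and the number of candidates is enough to know which objects to watch.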

Finally, Iter8 assumes by default that there is just one candidate version (supporting an A/B test). To test multiple candidate versions at once (A/B/n), specify the maximum number for the application using maxNumCandidates:

Deploying Candidate Versions

As versions of the backend component are deployed or deleted, the Iter8 service maintains a mapping of track identifiers to the available version. To build and maintain this mapping, the Iter8 service watches the resource objects specified in its configuration (see above). In particular, the configuration requires that the Kubernetes objects comprising the backend component adhere to the specified naming convention. Further, they should have the label set to the version identifier.
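
The bookkeeping described here can be sketched as follows. The class TrackVersionMap and its method names are hypothetical; a real watcher would receive these events from the Kubernetes API, which this sketch leaves out.

```javascript
// Hypothetical bookkeeping for the track-to-version mapping.
class TrackVersionMap {
  constructor(trackIds) {
    // Every valid track starts with no version deployed.
    this.versions = new Map(trackIds.map((id) => [id, null]));
  }

  // Called when the objects named `objectName` are present and ready.
  // Names that do not follow the convention are ignored.
  onReady(objectName, versionLabel) {
    if (this.versions.has(objectName)) this.versions.set(objectName, versionLabel);
  }

  // Called when the objects for a track are deleted.
  onDeleted(objectName) {
    if (this.versions.has(objectName)) this.versions.set(objectName, null);
  }

  // Tracks that currently have a version and could be returned by a lookup.
  availableTracks() {
    return [...this.versions].filter(([, v]) => v !== null).map(([id]) => id);
  }
}

const mapping = new TrackVersionMap(['backend', 'backend-candidate-1']);
mapping.onReady('backend', 'v1');
mapping.onReady('backend-candidate-1', 'v2');
mapping.onDeleted('backend-candidate-1'); // candidate removed after the test
```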

Implementing a Frontend with the SDK

We demonstrate, with a Node.js example, how easy it is to modify an application’s frontend component to enable A/B/n testing with the Iter8 SDK.

First, require the gRPC library and source files generated from the specification:

Create a client for the Iter8 service:

Track identifiers are mapped to a static set of endpoints. Here, routes are stored in a map indexed by track identifier:

Identify the default route to use in case of failure:
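
A minimal sketch of such a route table and fallback, assuming one candidate track. The service URLs and port are hypothetical placeholders, and routeFor() is an illustrative helper rather than part of the SDK.

```javascript
// Static routes indexed by track identifier (URLs are placeholders).
const routes = new Map([
  ['backend', 'http://backend:8091'],
  ['backend-candidate-1', 'http://backend-candidate-1:8091'],
]);

// Fall back to the baseline when no usable track identifier is available.
const defaultRoute = routes.get('backend');

function routeFor(track) {
  return routes.has(track) ? routes.get(track) : defaultRoute;
}
```

Falling back to the baseline means a failed lookup degrades to the current version instead of failing the user's request.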

Call Lookup(), passing the backend component and the user session (in this example, the value of the X-User request header):
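
The shape of such a call can be sketched with a stub in place of the generated gRPC client. The wrapper lookupTrack(), the request field names (name, user) and the response field (track) are all assumptions made for this sketch; the generated client's actual message types may differ.

```javascript
// Hypothetical wrapper around a callback-style client. Calls `done` with the
// track identifier, or with null when the lookup fails or returns nothing.
function lookupTrack(client, component, userSession, done) {
  // The request field names are assumptions for this sketch.
  client.lookup({ name: component, user: userSession }, (err, session) => {
    if (err || !session || !session.track) {
      done(null);
      return;
    }
    done(session.track);
  });
}

// Stub standing in for the generated gRPC client so the sketch runs alone.
const stubClient = {
  lookup(request, callback) {
    if (!request.user) {
      callback(new Error('no user session supplied'));
    } else {
      callback(null, { track: 'backend-candidate-1' });
    }
  },
};

// In an HTTP handler, the session would come from the X-User request header.
lookupTrack(stubClient, 'backend', 'alice', (track) => {
  console.log(track === null ? 'use default route' : `route via ${track}`);
});
```

Reporting failure as null keeps the routing decision in one place: the caller uses the default route whenever no track comes back.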

Or, to write a metric, call WriteMetric(), passing the backend component, the user session, and the metric name and value (in this example, a random number between 0 and 100):
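
As a rough sketch of the metric-writing side, again using a stub instead of the generated client. The wrapper reportMetric(), the metric name sample_metric and the request field names are assumptions for illustration only.

```javascript
// Random metric value in the range 0 to 100, standing in for a real
// business metric.
function randomMetricValue() {
  return Math.random() * 100;
}

// Hypothetical wrapper: reports one metric value for a user session and
// calls `done` with true on success, false on failure.
function reportMetric(client, component, userSession, metricName, value, done) {
  // Field names are assumptions, not the generated message API.
  const request = {
    name: component,
    user: userSession,
    metric: metricName,
    value: String(value),
  };
  client.writeMetric(request, (err) => done(!err));
}

// Stub client that records what it receives.
const received = [];
const stubMetricsClient = {
  writeMetric(request, callback) {
    received.push(request);
    callback(null);
  },
};

reportMetric(stubMetricsClient, 'backend', 'alice', 'sample_metric', randomMetricValue(), (ok) => {
  console.log(ok ? 'metric recorded' : 'metric write failed');
});
```

The frontend passes only the session and the metric; it never names a backend version.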

A working copy of this code can be found here.

These changes are one-time — no further changes are required no matter how many tests are run.

Conclusions and Next Steps

We’ve explored some of the challenges of A/B/n testing, especially for an application’s backend components. These challenges center on the ability of a frontend component to correctly attribute business metrics to the version of the backend component being tested. The Iter8 SDK enables a frontend component to correctly make this attribution by providing a lookup interface that allows the frontend component to know which version of a backend component is used for each user.

In this way, it can reliably assign business metrics to the correct version. We showed how easy it is to modify a frontend service using the Iter8 SDK and to run A/B/n tests — only a few lines of additional code were needed. Enabling candidate versions for testing simply involves the addition of a few labels.

After working through the tutorial, try Iter8 with your own application. If you need help, have questions or want to contribute, join the Iter8 community on GitHub and Slack.
