Simulate Kubernetes Cluster Behavior with SimKube

SimKube can replay a trace from a Kubernetes production cluster in a simulated or development cluster. Good for troubleshooting and parameter testing.
Dec 29th, 2023 1:28pm by
David Morrison speaking at KubeCon.

Kubernetes is complex and, once in production, costly to debug. What if you could test a new Kubernetes deployment in a realistic fashion before you actually run up the cloud provider bills?

SimKube is an effort to offer this sort of capability. It can record some behavior on an actual cluster, then play it back on a simulated cluster so the behavior can be inspected in detail. Down the road, such a technology could go beyond simulated playback and even help the user try out different scenarios.

It is the brainchild of David Morrison, founder of Applied Computing Research Labs, a research and development company focusing on modeling, scheduling and optimization for distributed systems. He discussed SimKube and the need for such a technology in a session at KubeCon + CloudNativeCon North America, held earlier this year in Chicago.

Though functional, SimKube is still in the early stages of development. But as Kubernetes moves into more production settings, more behavioral analysis and monitoring tools such as this one will be necessary, not only to replay troubled scenarios but also to sketch out and try new ones, Morrison argued.

What is the mean time to set up a cluster or to start a node? Can we compare two different simulations to see which one is more efficient?

“Being able to answer those questions is really important, and I would say in the Kubernetes ecosystem, we are lacking those kinds of tools now,” Morrison said.

Imagine the Possible

Helpfully, Morrison sketched out how a Kubernetes simulation tool can be used in a variety of scenarios.

One obvious use case is troubleshooting: A cluster could suffer an outage whose root cause is not known. An admin could replay a trace of the incident on their laptop. And when a potential fix is formulated, it can be tested on the simulated cluster first.

It could be used to prevent potential issues as well.

A CI/CD engineer responsible for all the pipelines leading to the clusters might want to ensure users do not introduce regressions with their new K8s configurations. The engineer could do so by requiring users to run a simulation as part of the onboarding process.

Simulation software could also be a powerful ally in extracting the most value from your Kubernetes deployments.

The Kubernetes kube-scheduler has plenty of knobs to turn, in terms of setting priorities for assigning workloads to nodes, though it leaves one guessing as to what the optimum scheduling would be. Imagine running different scenarios on a laptop, using real production data, and seeing some numbers about which configuration works best. You could even feed the ops data into a machine learning system for “hyper-parameter” tuning.

In terms of scheduling, batch processing has been a particular challenge with Kubernetes, which does not have a robust set of batch primitives to work with. But large batch processing jobs are becoming more prevalent, thanks to MLOps, which must handle large language model or diffusion model workloads. This has led to alternative schedulers such as Volcano, which can also be easily tested in simulation before making a commitment.

Meet SimKube

SimKube, created by Morrison and written mostly in Rust, is a package of six tools to simulate Kubernetes scheduling and autoscaling behavior.

A command line utility, sk-ctrl, provides a way to export operational data from the production clusters and then replay that data on a simulated cluster, in effect reproducing the same behavior.

How SimKube works.

Placed in the production environment, sk-tracer collects data off the API server. It can watch and create a timeline (called a trace) of resources and pods being spun up or down on a cluster, and take note of any pre-defined special events that take place. If you have a custom controller, it can watch that as well.

When requested by a user, sk-tracer saves the trace to a file. The trace object itself is mostly a series of timeline objects (serialized in a JSON-like binary format).
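
To make the idea of a trace concrete, here is a minimal Python sketch of a timeline of cluster events that can be saved and reloaded. It is not SimKube's actual trace format: the TraceEvent and Trace classes and their field names are hypothetical, and plain JSON stands in for the binary serialization the tool uses.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TraceEvent:
    # All field names here are illustrative assumptions.
    ts: int        # seconds since the start of the trace
    action: str    # "create" or "delete"
    kind: str      # e.g. "Deployment" or "Pod"
    name: str
    spec: dict = field(default_factory=dict)


@dataclass
class Trace:
    events: list = field(default_factory=list)

    def record(self, event):
        # Keep the timeline ordered even if events arrive out of order.
        self.events.append(event)
        self.events.sort(key=lambda e: e.ts)

    def dumps(self):
        # Plain JSON is used here as a stand-in for a binary format.
        return json.dumps([asdict(e) for e in self.events])

    @staticmethod
    def loads(raw):
        return Trace([TraceEvent(**e) for e in json.loads(raw)])
```

Saving a trace and loading it back should give an identical timeline, which is the property a replay tool depends on.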

When the user wants to rerun a trace, sk-driver downloads the trace object and runs it through the API server on the simulated cluster (either one in a development environment or a kind cluster running on a laptop).
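
The replay step can be pictured as walking the timeline in order and applying each event to the target cluster, waiting out the gaps between events. The Python sketch below is only an illustration of that idea, not how sk-driver is implemented: the event dictionaries, the replay function, its speed parameter and the FakeCluster class are all hypothetical, and an in-memory dictionary stands in for a real API server.

```python
import time


def replay(events, apply, speed=1.0, sleep=time.sleep):
    """Apply events in timestamp order, pausing for the gaps between them.

    A speed above 1.0 compresses the original timing (an assumed feature).
    """
    last_ts = None
    for event in sorted(events, key=lambda e: e["ts"]):
        if last_ts is not None:
            sleep((event["ts"] - last_ts) / speed)
        apply(event)
        last_ts = event["ts"]


class FakeCluster:
    """In-memory stand-in for the simulated cluster's API server."""

    def __init__(self):
        self.objects = {}

    def apply(self, event):
        key = (event["kind"], event["name"])
        if event["action"] == "create":
            self.objects[key] = event.get("spec", {})
        elif event["action"] == "delete":
            self.objects.pop(key, None)


events = [
    {"ts": 0, "action": "create", "kind": "Deployment", "name": "web", "spec": {"replicas": 3}},
    {"ts": 30, "action": "create", "kind": "Deployment", "name": "batch", "spec": {"replicas": 2}},
    {"ts": 90, "action": "delete", "kind": "Deployment", "name": "web"},
]
```

Passing a custom sleep function makes it possible to run the replay without actually waiting, which is handy when testing the replay logic itself.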

Virtual Nodes

Providing the imaginary bits of a simulated cluster are sk-vnode and sk-cloudprov. sk-vnode is basically a Virtual Kubelet pretending to be a node. It responds with a status when pinged, but there is no container there.

“You can spin up hundreds and thousands of these things just on a local laptop. It is really lightweight,” Morrison enthused.
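
Why such nodes are cheap is easy to see in a toy model. The Python sketch below is a hypothetical illustration and not the sk-vnode or Virtual Kubelet code: the VirtualNode class, its fields and its methods are invented for this example. Each node reports a healthy status and records pods as running, but starts no containers.

```python
class VirtualNode:
    """Hypothetical stand-in for a node that reports status but runs nothing."""

    def __init__(self, name, cpu, memory_gib):
        self.name = name
        self.capacity = {"cpu": cpu, "memory_gib": memory_gib}
        self.pods = {}

    def status(self):
        # Always answers the way a healthy node would.
        return {"name": self.name, "ready": True, "capacity": self.capacity}

    def run_pod(self, pod_name):
        # No container is started; the pod is only marked as running.
        self.pods[pod_name] = "Running"
        return self.pods[pod_name]


# Creating many of these is cheap because each one is just a small object.
nodes = [VirtualNode(f"vnode-{i}", cpu=4, memory_gib=16) for i in range(1000)]
```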

To mimic your production cluster, you export the production configuration using kubectl, then use the node definitions in it to set up identical simulated nodes.
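
As a rough illustration of that step, the Python sketch below reduces exported node definitions to the few fields a simulated node would need. It assumes the export was made with kubectl get nodes -o json, which returns a list object with an items array. The node_templates function, the output shape and the sim- name prefix are hypothetical and do not reflect sk-vnode's real configuration format.

```python
import json


def node_templates(node_list_json, prefix="sim-"):
    """Reduce exported Node objects to the fields a simulated node needs."""
    templates = []
    for node in json.loads(node_list_json)["items"]:
        templates.append({
            "name": prefix + node["metadata"]["name"],
            "labels": node["metadata"].get("labels", {}),
            "capacity": node["status"]["capacity"],
        })
    return templates


# A trimmed-down example of what the exported JSON might contain.
exported = json.dumps({
    "items": [
        {
            "metadata": {"name": "node-a", "labels": {"zone": "us-east-1a"}},
            "status": {"capacity": {"cpu": "4", "memory": "16Gi"}},
        }
    ]
})
```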

You deploy the simulated cluster just like you would a regular one, using Kubernetes Cluster Autoscaler. This software gives the user the ability to provision and de-provision nodes across all major cloud service providers, though it also provides the ability to do this through a custom interface. Morrison built sk-cloudprov to mimic a gRPC-based cloud service provider, though in reality, it just dispatches the requests to sk-vnode, which in turn deploys virtual nodes on your simulated cluster.

Hypothetical Traces

Right now, you can replay the results on SimKube and watch them unfold (again) in Grafana and Prometheus.

But beyond re-running traces of actual events, the technology could set the stage for answering what-if scenarios. Trace objects could be modified or even created from scratch. Hypothetical traces could be applied in simulated clusters, to answer questions such as “What if we had a deployment that scaled up to 10,000 pods?”

“That could be a really powerful thing,” Morrison said.
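
A hypothetical trace of the kind described above could, in principle, be generated by a few lines of code. The Python sketch below builds a timeline that grows a deployment to 10,000 replicas in steps. It is an assumption-laden illustration: the scale_up_trace function, its parameters and the event fields are invented here and are not part of SimKube.

```python
def scale_up_trace(name, target, step, interval):
    """Build events that grow a deployment to `target` replicas in steps."""
    if step <= 0:
        raise ValueError("step must be positive")
    events = []
    replicas, ts = 0, 0
    while replicas < target:
        # Never overshoot the target on the final step.
        replicas = min(replicas + step, target)
        events.append({
            "ts": ts,
            "kind": "Deployment",
            "name": name,
            "replicas": replicas,
        })
        ts += interval
    return events


# One scaling step per minute, up to 10,000 pods.
trace = scale_up_trace("big-app", target=10_000, step=3_000, interval=60)
```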

More Kubernetes Sims

SimKube is not the only Kubernetes simulator in town. At KubeCon, two Apple researchers also discussed a similar effort, called KWOK, short for Kubernetes Without Kubelet.

KWOK “is a toolkit that enables setting up a cluster of thousands of nodes in seconds. Under the scene, all nodes are simulated to behave like real ones, so the overall approach employs a pretty low-resource footprint that you can easily play around on your laptop,” the documentation explains.

Kubemark, the Virtual Kubelet and the KCP (Kubernetes Control Plane) are other efforts along these lines.

“So there is a ton of duplicated work going on here. Let’s stop duplicating all this stuff and focus on one thing,” Morrison said, adding that he hopes SimKube could be part of this work.

Catch Morrison’s entire presentation here:
