Optimizing Kubernetes for Peak Traffic and Avoiding Setbacks
Organizations have long struggled with traffic spikes at times of peak usage.
You can provision for expected traffic levels, but it’s still difficult to handle sudden spikes in application load, such as those retailers experience during holidays or after a product launch backed by a large marketing campaign. You might expect organizations to be ready for predictable spikes, such as Western commercial holidays like Thanksgiving or Christmas, but even with planning, spikes commonly result in poor performance or outages.
The same issues can also arise for regular events, such as end-of-week processing for businesses or monthly invoicing and HR payroll cycles. These events may be expected, but they can still seriously disrupt applications if the underlying platform is not prepared to handle the increased load. In microservices-based application architectures, acute disruptions often lead only to a momentary loss of performance. But if the underlying problem isn’t resolved, sustained peak demand could eventually affect the overall availability of the applications, resulting in downtime.
Scaling Kubernetes Deployments
While correct sizing of systems matters for any kind of workload, it’s particularly important for Kubernetes deployments because of the extra complexity of the multiple abstractions and layers within a Kubernetes infrastructure. These range from the physical or virtual systems and Kubernetes API server orchestration to the running pods and the applications within containers.
Tuning therefore demands careful resource management to avoid issues such as running out of memory or having insufficient processing power.
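As a rough illustration of resource sizing, the sketch below derives a memory request and limit for a container from observed usage samples. The function names, the 90th-percentile request and the 25% headroom on the limit are assumptions chosen for illustration, not a Kubernetes or StormForge recommendation.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_memory_mib(samples_mib, request_pct=90, limit_headroom=1.25):
    """Suggest a memory request and limit (MiB) from usage samples.

    Hypothetical heuristic: request at a high percentile of observed
    usage, limit at the observed peak plus some headroom.
    """
    request = percentile(samples_mib, request_pct)
    limit = math.ceil(max(samples_mib) * limit_headroom)
    return {"request_mib": request, "limit_mib": limit}


if __name__ == "__main__":
    usage = [200, 210, 220, 230, 240, 250, 260, 270, 280, 500]
    print(suggest_memory_mib(usage))
```

A request set too low invites eviction or out-of-memory kills under load, while one set too high wastes capacity, which is why the percentile and headroom values are worth validating against real traffic rather than taking from a sketch like this.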
Kubernetes’ various autoscaling capabilities play a role in handling expected and unexpected traffic spikes, but accurate scaling does not come out of the box. Elasticity is a core characteristic of Kubernetes and containerized workloads in general, yet it creates the burden of ensuring that those dynamic processes are properly tuned for a wide array of traffic scenarios.
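To make the tuning burden concrete, consider the Horizontal Pod Autoscaler. The Kubernetes documentation describes its core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The Python sketch below mirrors only that formula; the function name and the default bounds are assumptions, and the real controller also accounts for pod readiness, missing metrics and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified version of the documented HPA replica calculation."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (minReplicas / maxReplicas).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
    # A much larger spike is capped by max_replicas.
    print(desired_replicas(4, 300, 60, max_replicas=10))
```

Even in this simplified form, the result depends on the target value, the bounds and the tolerance, each of which has to suit the traffic pattern. A maximum set too low caps the response to a spike no matter how well the rest is tuned.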
Let’s look at the challenges of preparing for spikes. Starting from the planning phase, we’ll share several best practices to help you prepare for peak traffic. Then we’ll look at the integration of artificial intelligence (AI) and machine learning (ML), as used by StormForge, to help you optimize for specific load scenarios using testing and simulations before pushing your applications to production.
How to Prepare Your Kubernetes Application for Peak Traffic
It’s important to emphasize that this preparation should be conducted at regular intervals. That’s easier said than done, because not all peak usage scenarios are the same. Application updates can nullify known good configurations, so load testing and optimization would ideally be integrated into the deployment pipeline to account for any and all platform changes, including Kubernetes updates as well as updates to containerized applications. Peak traffic can be expected or unexpected, so maintaining the tools and processes that give you the best chance of handling it gracefully is an ongoing effort as applications and business conditions change.
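One way to picture load testing inside a deployment pipeline is a gate that fails the build when a load test misses its targets. The sketch below is a minimal, hypothetical example: the function names and the thresholds (a 500 ms p95 latency and a 1% error rate) are assumptions, and the latency samples would come from whatever load-testing tool you already use.

```python
import math


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def load_test_gate(latencies_ms, errors, total_requests,
                   max_p95_ms=500, max_error_rate=0.01):
    """Return (passed, observed_p95, error_rate) for one load-test run."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    observed_p95 = p95(latencies_ms)
    error_rate = errors / total_requests
    passed = observed_p95 <= max_p95_ms and error_rate <= max_error_rate
    return passed, observed_p95, error_rate


if __name__ == "__main__":
    samples = [100] * 90 + [900] * 10  # hypothetical latencies in ms
    passed, latency, rate = load_test_gate(samples, errors=0, total_requests=100)
    print("pass" if passed else "fail", latency, rate)
```

Running a check like this on every release is what catches the case where an application update quietly invalidates a configuration that performed well last month.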
Example: Online Ski Store
Let’s look at an example. An online retailer that sells skiing equipment anticipates a spike in usage because heavy snow is forecast for the coming winter. Suppose you’ve configured your Kubernetes platform with basic autoscaling capabilities but you don’t have confidence that it can gracefully handle the expected surge. Let’s imagine a few scenarios where sporadic and excessive traffic can result in poor user experience or even downtime:
- Resource exhaustion: During peak traffic, there may be an influx of requests, resulting in the exhaustion of some or all available resources. This can cause containerized applications, or possibly even the full cluster, to become unresponsive or crash.
- Network congestion: If the network infrastructure is not sufficient to handle the volume of traffic during peak times, or there are more connections coming into the cluster than it can handle, this can result in slow response times and potentially cause services or clusters to go down.
- Application failure: If an application service within the cluster cannot handle the increase in traffic during peak times, it may crash or become unresponsive, leading to downtime for the application and, in some cases, the entire cluster.
- Inadequate scaling: If the cluster is not properly configured to scale out or up during peak traffic, it may become overwhelmed, resulting in downtime.
- Container crashes: If a single container deployment with dependencies goes into a crash loop during peak traffic, for whatever reason, it may disrupt other applications and lead to cluster downtime.
You could say these adverse scenarios should have been prevented. After all, the retailer was aware of inbound weather patterns and expected a corresponding increase in traffic. However, it’s extremely difficult to estimate how much additional load the spike in traffic will place on their systems. There will always be a certain amount of guesswork, especially when manually tuning the litany of scaling parameters available to Kubernetes architects and admins.
Experiment to Optimize
The biggest challenge is fine-tuning to account for these potential issues. Manually tuning the long list of parameters that drive autoscaling behavior in Kubernetes demands extensive experimentation and observation to dial in an optimal configuration. First, you may need to set up a testing environment, which itself consumes resources. Next, you need to deploy observability tools to get useful telemetry data from that environment. From there, you need to design the actual experiments, running multiple sequences while altering the parameters. Then you will likely need to repeat the whole process multiple times.
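The experiment loop described above can be sketched as a small grid search. Everything here is hypothetical: `simulate_peak` is a toy stand-in for a real load test, the parameter names are invented for illustration, and the cost function simply adds always-on pods to peak pods. In practice, each trial would be a test run against a real environment with real telemetry.

```python
import itertools


def simulate_peak(target_pct, min_replicas, peak_rps, rps_per_pod):
    """Toy model: pods serve rps_per_pod * target_pct / 100 requests each."""
    divisor = rps_per_pod * target_pct
    needed = (peak_rps * 100 + divisor - 1) // divisor  # integer ceiling
    replicas = max(min_replicas, needed)
    # Pods beyond the warm minimum must start while the spike is under way.
    cold_starts = max(0, needed - min_replicas)
    return {"replicas": replicas, "cold_starts": cold_starts}


def sweep(peak_rps, target_pcts, min_replica_options,
          max_cold_starts, rps_per_pod=100):
    """Try every combination and keep the cheapest one that stays within
    the allowed number of cold starts. Returns None if none qualifies."""
    best = None
    for target_pct, min_replicas in itertools.product(target_pcts,
                                                      min_replica_options):
        result = simulate_peak(target_pct, min_replicas, peak_rps, rps_per_pod)
        if result["cold_starts"] > max_cold_starts:
            continue
        cost = min_replicas + result["replicas"]
        if best is None or cost < best["cost"]:
            best = {"target_pct": target_pct, "min_replicas": min_replicas,
                    "cost": cost, **result}
    return best


if __name__ == "__main__":
    print(sweep(peak_rps=1000, target_pcts=[50, 70, 80],
                min_replica_options=[2, 5, 10], max_cold_starts=8))
```

Even this toy version needs nine trials for two parameters with three values each. Real configurations have far more parameters, and each trial is a full load test, which is why manual sweeps become expensive so quickly.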
It’s important to focus on experimentation and observation. Starting from system behavior (heavy sustained load, spikes or outage resiliency), your Ops team will learn from the data gathered and reconfigure the platform and the various application deployments accordingly. However, achieving optimal results is hard if you’re doing this manually, and not everyone is comfortable with turning all of the knobs to 11 as a panacea.
StormForge Optimize Pro
Another challenge is that not all organizations can afford to run a test environment that closely reflects or is identical to the production environment. Even in the example of seasonal activity, you could start with the same traffic load as the previous year, but this year’s traffic will not be exactly the same, so a configuration tuned to last year’s peak could still lead to outages.
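Because last year’s load is only a starting point, it can help to test against a range of projected peaks rather than a single number. The sketch below scales a previous year’s peak by several growth factors and estimates the pods needed at each one. The traffic figures, growth factors and per-pod capacity are made-up values for illustration.

```python
import math


def project_peak_replicas(last_year_rps, growth_factors, rps_per_pod):
    """Map each growth factor to the pod count needed at the projected peak."""
    if not last_year_rps:
        raise ValueError("last_year_rps must not be empty")
    baseline_peak = max(last_year_rps)
    plan = {}
    for factor in growth_factors:
        projected_peak = baseline_peak * factor
        plan[factor] = math.ceil(projected_peak / rps_per_pod)
    return plan


if __name__ == "__main__":
    # Hypothetical hourly requests per second from last year's busiest day.
    last_year = [120, 300, 650, 800, 720, 400]
    for factor, pods in project_peak_replicas(
            last_year, [1.0, 1.25, 1.5, 2.0], rps_per_pod=100).items():
        print(f"growth x{factor}: {pods} pods at peak")
```

Each projected peak then becomes a separate load scenario to test, which shows where the current configuration stops coping.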
Instead of relying on manual reconfiguration of your Kubernetes environment to account for peak usage, your Ops or platform teams could consider using ML to assist in scenario optimization. For example, StormForge Optimize Pro can be used to test both clusters and containerized applications under a wide variety of traffic scenarios, with its ML determining the optimal configuration of the many parameters available in Kubernetes.