
KubeCon 2023: Managing Pets, Cattle … and Starfish?

This year's keynote themes addressed concerns about security, complexity, AI and other questions attendees had about taking the cloud native plunge.
Nov 14th, 2023 11:41am
Feature image via Pexels.

This year’s KubeCon + CloudNativeCon North America keynotes covered a lot of ground on the challenges of adopting and managing Kubernetes — as opposed to serving up talks that might be construed as cheerleading for the shift to cloud native.

While Kubernetes, warts and all, arguably represents the best way to scale operations and software deployment across distributed environments, this year’s keynote themes addressed concerns about security, complexity, AI and other questions attendees had about taking the cloud native plunge.

When Things Go Bad

Things can and will go wrong, and with Kubernetes, things can go very wrong indeed. In his keynote “Containers Might be Ephemeral, But Can Your Business Afford to Be?”, Chris Wiborg, vice president of product and solutions marketing at data management tools provider Veritas, described the ways organizations lose data. He listed the usual suspects, including basic dumb human error, insufficient backups, lack of geographic distribution and other common data loss threats.

“There are all of these Cloud Native Computing Foundation projects that are focused on enhancing cloud data security, and all the great vendors out there on the show floor with possible solutions that you should try to prevent the worst from happening,” Wiborg said. “And yet, they still do, as we see from the headlines. But I’m guessing that these days I don’t really need to scare anyone in this room about things like ransomware — we are all living with it now. But there are also more mundane ways to lose your data.”

Resiliency is key. Referring back to the well-known pets vs. cattle analogy of managing applications, Wiborg added another animal to the equation: a starfish. That is because, if a starfish loses a limb, it can grow back. “In other words, starfish are more resilient to harm… Data and infrastructure need to be thought of the same way. Can we make it more resilient to keep apps up and running even when something disappears?”
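The starfish idea maps naturally onto Kubernetes’ own resilience primitives. As a hedged sketch (names, replica counts and images here are illustrative, not from the talk), a Deployment with replicas spread across zones, paired with a PodDisruptionBudget, keeps an app serving traffic even when a node “limb” disappears:

```yaml
# Illustrative only: app name, counts and image are placeholder assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                        # losing one node still leaves two replicas
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:     # spread replicas across availability zones
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: web
      containers:
      - name: web
        image: nginx:1.25
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2                    # voluntary disruptions can't drop below 2 pods
  selector:
    matchLabels:
      app: web
```

The spread constraint limits how many replicas share one zone, and the disruption budget stops drains and upgrades from evicting too many pods at once — the “grow back” happens while the survivors keep serving.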

Monitoring software provider Datadog offered a transparent review of what can happen when Kubernetes goes horribly wrong — and lessons learned along the way.

Datadog experienced a massive global outage that took almost 24 hours to mitigate and a further 24 hours to backfill data after restoring full app availability, Datadog’s Laurent Bernaille, principal engineer, and Hemanth Malla, senior software engineer, said during their keynote “Everything, Everywhere, All At Once.”

They described how Datadog lost more than 60% of its Kubernetes nodes in less than an hour, and the challenges associated with attempting to recover the tens of thousands of impacted nodes across hundreds of clusters.

The idea was to detail “the hardest incident we ever had to deal with at Datadog,” Malla said.

Datadog had ample engineering support (but what about a nimble startup that would not have had these resources?). Around 400 engineers were on call to put out what was more than just an outage. They began to recover some of the lost Kubernetes nodes by simply rebooting their instances on Google Cloud. Once the nodes were recovered, they went to work analyzing the logs. “Those system logs told us that there was an unattended upgrade on these nodes.”

Since the incident, Datadog has been “working very hard” to build more lifecycle automation blocks that can replace thousands of nodes every day with “minimal impact.” “And it does that in Kubernetes. As you can imagine, this is a lot of migration.”

Much discussion inevitably centered on the profound changes coming with AI, as Tim Hockin, a distinguished engineer at Google Cloud, noted during his keynote “A Vision for Vision — Kubernetes in Its Second Decade.”

“Opportunities and threats” were the top concerns that a number of Kubernetes maintainers, adopters and others entrenched in cloud native expressed when asked about the next 10 years of Kubernetes, he said. But how AI will manifest itself during the next two decades remains a mystery, and “that’s okay,” Hockin said. “What does that mean for Kubernetes? Honestly, I’m not sure. I don’t really understand it.”

Still, Kubernetes is “really well positioned to be the platform of choice for AI and ML,” Hockin said. “This conference will show you that, but we don’t really know yet all of the things that we’re going to need to do to make it successful, so we need to be listening and watching and probing and asking questions.”

In addition to AI, one of the major elephants in the room was the changing climate. Yes, there are many initiatives to reduce computing resources as a way to indirectly reduce CO2 emissions. And, as I would argue, computing frameworks for climate and weather analysis, resource consumption and pure scientific research will be necessary if a solution is to be found.

For Kubernetes and IT in general, reducing the consumption of redundant CPU servers across the cloud and in data centers already represents low-hanging fruit for cutting power consumption and thus CO2 emissions.
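The arithmetic behind that low-hanging fruit is straightforward. As a rough, hedged sketch (all constants here — idle and peak wattage, PUE, grid carbon intensity — are illustrative assumptions, not measured values), a simple linear power model shows why consolidating lightly loaded nodes saves energy:

```python
# Hedged sketch: estimate server power draw and CO2 from CPU utilization.
# Every constant below is an illustrative assumption, not a measured figure.

def estimate_power_watts(cpu_util: float, idle_w: float = 100.0,
                         max_w: float = 300.0) -> float:
    """Linear power model: idle draw plus utilization-scaled dynamic draw."""
    return idle_w + (max_w - idle_w) * cpu_util

def estimate_co2_grams(watts: float, hours: float, pue: float = 1.5,
                       grid_g_per_kwh: float = 400.0) -> float:
    """Convert facility energy (PUE covers cooling overhead) to grams CO2e."""
    kwh = watts * pue * hours / 1000.0
    return kwh * grid_g_per_kwh

# Consolidation example: two redundant nodes at 10% utilization vs.
# one node carrying the same work at 20% utilization.
two_idle = 2 * estimate_power_watts(0.10)   # 2 x 120 W = 240 W
one_busy = estimate_power_watts(0.20)       # 140 W
print(f"two idle nodes: {two_idle:.0f} W, consolidated: {one_busy:.0f} W")
print(f"daily CO2 saved: {estimate_co2_grams(two_idle - one_busy, 24.0):.0f} g")
```

The point is the idle floor: a server draws substantial power doing nothing, so packing the same workload onto fewer, busier nodes cuts consumption even though total useful work is unchanged. This is the kind of measurement that projects such as Kepler aim to make routine.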

However, as Niki Manoledaki, a software engineer at Grafana Labs, noted during the panel keynote “Environmental Sustainability in the Cloud Is Not a Mythical Creature,” we are, unfortunately, just at the starting block. “Measuring and reducing the energy and carbon footprint of our software is not very widespread, but this is changing and we are seeing momentum,” Manoledaki said.

But global warming is already close to exceeding the 1.5 degrees Celsius tipping point, and “we have already exceeded this tipping point in certain regions,” Manoledaki said. “It is scary to think about it.”
