Can Kubernetes Orchestrate the Infrastructure?
Is it possible that managing stateful applications on Kubernetes will become easier than handling the stateless apps containers were originally designed for? Murli Thirumale, CEO of Portworx, said that’s what customers are telling him.
When it comes to encouraging the adoption of Kubernetes in production in an enterprise setting, that is good news, but not something everyone would agree with. In a recent poll by The New Stack, 10% of respondents thought that improving Kubernetes’ integration with storage was the top challenge for the community — after concerns about user experience and support for multitenancy.
But Thirumale isn’t the only one who thinks the Kubernetes ecosystem is ready for stateful workloads. “People are starting to do serious stateful workloads in the cloud and in Kubernetes, in particular,” Quinton Hoole, technical vice president of Huawei’s Futurewei Technologies, said on a recent edition of The New Stack Makers podcast.
In fact, many of Portworx’s customers were at KubeCon talking about the ways they are using Kubernetes to manage data-rich applications, often at incredible scale. Kubernetes is being used to manage real-time data analytics on data feeds from edge devices, to provide a free computer science course to a million students around the globe and to help established companies with thousands of legacy applications modernize, both in the cloud and on-premises.
“I think that Kubernetes is going to move from managing applications to managing infrastructure, like storage,” Thirumale said. “My belief is that 2020 is the beginning of where Kubernetes becomes the new control plane across the data center and cloud, looking upwards to manage applications and downwards to manage infrastructure.”
What might that future look like?
Real-Time Big Data Analytics
ESRI’s geospatial mapping application gives customers a way to process data from edge devices and translate it into visual dashboards, generally maps. ESRI’s IoT platform ingests data from sensors and other edge devices that clients have in the field (sometimes literally out in a field, in agricultural uses). It’s also used at airports, in weather and other environmental monitoring, for law enforcement and dozens of other use cases.
“The application we’re using Portworx for enables customers to connect sensors to their GIS applications so they can see data move around and see what’s happening in real-time, from sensors out in the field,” explained Adam Mollenkopf, real-time and big data GIS capability lead at ESRI.
Before developing this SaaS platform, ESRI helped companies set up an on-premises system to do real-time data analytics from IoT devices. While customers were generally happy with that system, Mollenkopf said, it was also very expensive to set up, leading ESRI to look for a way to provide the same functionality in a cloud-based, SaaS format that would be accessible to a larger number of customers.
The ArcGIS management console, as ESRI’s application is called, sits on top of a hot data store, streaming data analytics and big data analytics. As the feeds come in (the type of feed depends on the data source; there are more than 50 different types), they are written to a Kafka cluster. ArcGIS then uses Spark for real-time analytics, with the results written to Elasticsearch databases.
This stack clearly involves a series of stateful services, all of which are connected to AKS with Portworx’s open source STORK (Storage Orchestration for Kubernetes) as well as to Portworx’s data platform, which ensures high availability and disaster recovery and manages all data encryption.
In addition, ESRI uses Portworx’s Autopilot to automatically resize persistent volume claims, disks and pools, based on pre-determined rules. This ensures there’s no downtime involved in provisioning additional storage while removing the need to manually provision volumes.
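To make the idea of rule-based resizing concrete, here is a minimal sketch in Python. It is not Portworx Autopilot’s API or configuration format. The `ResizeRule` fields, the `next_capacity_gib` function and the numbers used are all hypothetical, chosen only to illustrate how a pre-determined rule might turn a usage reading into a new volume size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResizeRule:
    """Hypothetical rule; field names are illustrative, not Autopilot's schema."""
    usage_threshold_pct: float  # act only when usage exceeds this percentage
    grow_by_pct: float          # grow capacity by this percentage of current size
    max_size_gib: float         # never grow beyond this cap


def next_capacity_gib(capacity_gib: float, used_gib: float, rule: ResizeRule) -> float:
    """Return the capacity the volume should have after applying the rule."""
    usage_pct = 100.0 * used_gib / capacity_gib
    if usage_pct <= rule.usage_threshold_pct:
        return capacity_gib  # below the threshold: leave the volume alone
    grown = capacity_gib * (1 + rule.grow_by_pct / 100.0)
    return min(grown, rule.max_size_gib)  # respect the configured ceiling


# Assumed example rule: above 80% usage, grow by 50%, up to 400 GiB.
rule = ResizeRule(usage_threshold_pct=80, grow_by_pct=50, max_size_gib=400)
print(next_capacity_gib(100, 50, rule))   # half full: no change
print(next_capacity_gib(100, 90, rule))   # 90% full: grows
print(next_capacity_gib(300, 290, rule))  # growth is capped at the maximum
```

In a real system a controller would evaluate such a rule continuously against live metrics and then expand the volume online, which is what removes both the downtime and the manual provisioning step.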
Teaching a Million Students, for Free
How do you run an online computer science course, with 30,000 simultaneous users spread around the globe, that needs to be up around the clock? Harvard’s CS50 course has about a thousand students on campus in Cambridge and a million registered students online. The course has been running since 1989 and moved to a cloud-based IDE in 2015. Even using AWS Cloud9 directly with EC2 instances and EBS for storage, there were challenges — especially, but not exclusively, related to cost.
“We had to allocate an EC2 instance and an EBS volume for each user,” explained Kareem Zidane, software engineer at Harvard University. “We also had to assign each user an availability zone, which meant that if there was an instance available but it was in a different availability zone, we couldn’t use it.” For a free course with a million users, this clearly created a cost problem.
Moving the CS50 application to Kubernetes solved many of the problems related to managing compute resources — instead of assigning an EC2 instance for each user, each student would get a container that would spin up when they were active and would be automatically killed when they were done. Kubernetes would pack multiple containers in each instance. Kubernetes was able to abstract away most of the management required to run containers but didn’t solve the storage challenges. With Portworx and Kubernetes together, Harvard could provision both storage and compute thinly, with Kubernetes handling the EC2 instances and Portworx managing the EBS volumes.
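The cost difference between one instance per user and containers packed onto shared instances can be illustrated with a small sketch. This is a deliberately simplified first-fit packing in Python, not how the Kubernetes scheduler actually places pods, and the CPU figures (500 millicores per student container, 2,000 millicores per instance) are assumptions made up for the example.

```python
def pack_first_fit(requests_millicores, capacity_millicores):
    """Place each container on the first instance with room.

    Returns a list with the total CPU load placed on each instance.
    """
    instances = []
    for request in requests_millicores:
        if request > capacity_millicores:
            raise ValueError("a single request exceeds instance capacity")
        for i, load in enumerate(instances):
            if load + request <= capacity_millicores:
                instances[i] = load + request
                break
        else:
            # No existing instance had room, so start a new one.
            instances.append(request)
    return instances


# Assumed workload: 10 active students, each container requesting 500 millicores.
active_students = [500] * 10
loads = pack_first_fit(active_students, capacity_millicores=2000)
print(len(active_students), "instances with one instance per user")
print(len(loads), "instances when containers are packed:", loads)
```

Because containers exist only while a student is active, the number of requests to pack tracks concurrent users rather than registered users, which is where most of the savings would come from.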
From an end-user perspective, Portworx solved some additional pain points. Uploads and downloads had been slow and fragile, but using Portworx volumes solved that problem. In addition, Portworx automatically pushes students’ work to GitHub, so if a pod fails no work is lost.
Modernizing Legacy Apps
Granted, ESRI is a technology company and Harvard’s CS50 is run by computer scientists. What does using Kubernetes for stateful services look like in a company with less tech-savvy leaders?
“Cloud native is great, but cloud native has state somewhere,” explained Satish Puranam, technical specialist at Ford Motor Company, on theCUBE. “Can we do that with Kubernetes? I think we’ve done a reasonably good job. We have quite a few workloads in production that are stateful.” These include data messaging applications and databases.
Like many companies, Ford started its digital transformation journey in an effort to reduce costs and increase the speed at which applications can be delivered. Puranam said that net new applications are all being written to be cloud native, including those that need to be stateful. Ford also has thousands of legacy applications to think about, though, and many of them are what Puranam calls ‘table stakes applications.’ As Ford works to pay down its technical debt, moving these applications to Kubernetes, including those that require storage, is one of the company’s technical priorities.
“The idea is heavier and heavier workloads are going to be landing on Kubernetes,” Puranam said. “That’s what we’re going to be focusing on in 2020.”