Puppet Engineer: Peeking Inside Containers, You Find the Stack Isn’t All That New
Are the containerized applications you’re finally moving into production truly reformed, cloud-ready, and tailored for orchestration and continuous integration? Or are many of them mainly containerized refits of the client/server applications your organization has been using since the Clinton administration?
In a presentation delivered Monday afternoon at Configuration Management Camp 2017 in Gent, Belgium, Puppet senior software engineer Gareth Rushgrove updated his familiar theme of enterprises not really knowing what they’re containerizing. That update brought with it some new and startling evidence: organizations simply containerizing existing legacy workloads vastly outnumber those taking the opportunity to re-engineer them for efficiency and security.
“I think, among people who just focus on containers and just think about that worldview,” said Rushgrove in a follow-up interview with The New Stack, “the conversations are often being quite idealized.”
Rushgrove suggested that if one were to survey developers and ask how many are using scratch containers — which he characterized as the “ideal” often posited by proponents of new stacks — or using Nix or a similar tool for reproducible builds, or using a minimal Linux distribution such as Alpine Linux within their containers, one would see a great many hands raised.
“That’s what they’re doing, and that’s what the people around them are doing,” he said. “From seeing what some of our customers and what some larger organizations are doing, pulling their data from GitHub, it’s really not what’s happening in the real world.”
The New Stack and the Big World
Rushgrove was inspired by a recent blog post by Docker contributor David Gageot, who created a SQL query for Google BigQuery capable of extracting the identities of the base Linux images specified by Docker files in GitHub’s public archive — some 281,212 Docker files, by Gageot’s count. Some 9.5 percent of the queried files used the full Ubuntu 14.04 as their base Linux image, and about 19.4 percent of the files in total were shown to include some version of Ubuntu.
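Gageot’s actual BigQuery SQL is not reproduced here, but the heart of the analysis — pulling `FROM` lines out of Dockerfiles and tallying the base images they name — can be sketched in a few lines of Python. The regular expression and the sample data below are illustrative stand-ins, not Gageot’s query:

```python
import re
from collections import Counter

# A Dockerfile declares its base image on a FROM line.
FROM_LINE = re.compile(r"^\s*FROM\s+(\S+)", re.IGNORECASE | re.MULTILINE)

def base_images(dockerfile_text):
    """Return every base image named in a Dockerfile's FROM lines."""
    return FROM_LINE.findall(dockerfile_text)

# A tiny stand-in for the 281,212 Dockerfiles in the public GitHub dataset.
dockerfiles = [
    "FROM ubuntu:14.04\nRUN apt-get update",
    'FROM ubuntu:14.04\nCMD ["bash"]',
    "FROM alpine:3.4\nRUN apk add --no-cache curl",
]

counts = Counter(img for df in dockerfiles for img in base_images(df))
print(counts.most_common())  # → [('ubuntu:14.04', 2), ('alpine:3.4', 1)]
```

At GitHub scale the same tally is what surfaces Ubuntu’s dominance and Alpine’s distant 30th-place showing.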
Alpine Linux — an intentionally lightweight distribution suggested, though not necessarily intended, for use in containers — ranked far lower on Gageot’s list, with the most frequently used version placing #30 among Linux versions, and with “Alpine” showing up in the titles of fewer than two one-hundredths of one percent of Docker files on GitHub.
“Yes, Alpine usage appears to be growing more quickly than other things,” said Rushgrove. “But it’s coming from such a low start.”
Actually, Rushgrove may be intrigued to learn that surveyed developers do not tilt so heavily towards the “ideal,” either.
In an August 2016 survey of container developers conducted by container analysis firm Anchore, given a list of 15 container host operating systems including minimal, container-specific distributions, plus “None” and “Other,” some 34.5 percent of respondents said they use Ubuntu, compared to 13.7 percent for CoreOS (now called Container Linux), 3.6 percent for RancherOS, 1.8 percent for Red Hat Atomic, and 2.5 percent for Ubuntu’s own Core.
As Rushgrove deduced from Gageot’s data, citing one of his slides from Monday, “The majority of people using Docker are using images containing an entire operating system filesystem.” That statement prompted Chef product manager Julian C. Dunn to tweet later that day, “Yeah, this is awful and terrifying.”
Ever since Docker first burst onto the scene, there have been suggestions — and there have also been demonstrations — that a full Linux operating system filesystem hosted atop another full Linux system presented an attack surface the size of a large planet. Last December, Aporeto engineer Stefano Stabellini argued in The New Stack that, while such a configuration may be difficult to hack, it may be equally difficult to reliably secure.
Paving the Cow Paths
But speaking with us, Rushgrove surprised us by playing down that concern.
“I think the idea that that’s wrong is sort of misleading,” he said. The container “purist,” in his view, may argue that the ideal of containerization is based on the original Google Borg model. Though Gageot’s data presents startling evidence that this ideal has not spread as widely as idealists may wish, Rushgrove suggests that the answer to the terror Dunn cited is not to step up evangelism for the ideal model, but to begin securing the model that organizations are actually choosing.
“The thing that Docker hit on, in particular, was a really easy way for people to use their software inside containers,” he went on, “without having to rebuild the world. And I think the direct result of that path of least resistance is using entire operating system filesystems inside containers. It doesn’t matter whether it’s good or bad, or a better or worse idea. It’s what people are doing for compatibility reasons. It’s what will get us to a world of containers in the real world.”
The real problem that the developer tool community must now face, Rushgrove suggested, deals only partly with security: Tools makers are building and refining their products and methods, he argued, for the idealists and first adopters who manage the hyperscale systems patterned after Google, Facebook, or Netflix. While that pool is indeed growing, it’s the ocean of general adopters who are scaling out far more quickly. Configuration management, among other jobs, will be different for this latter group because of the choices they’ve made. What’s more, those choices are only likely to become more permanent over time.
“It’s sort of an upsetting realization,” he told us.
Then Why Containerize?
But if organizations prefer to take the path of least resistance, I asked Rushgrove, wouldn’t they prefer to avoid the entire effort of containerization and stick with floating their first-generation virtual machines on vSphere, XenServer, or KVM? Why not take the slipperiest, most friction-free path available to them?
His response was somewhat roundabout. The ideal of containerization tends to conflate two advantages, he said. First, there is the scheduler, upon which developers rely to enable resource portability, resource optimization, and an overall boost in the robustness of the software infrastructure. Schedulers have already encoded the best practices of operations and infrastructure managers and DevOps teams over the last decade-and-a-half.
Secondly, there is the principle that containers catalyze better development architectures, such as microservices. This conflation suggests that an organization cannot have one without the other — that to take advantage of scheduling and portability, one needs to undertake the seemingly insurmountable task of rewriting everything for this new architecture.
Yet the prospect of uncoupling the dependencies that most software has already built in (a chain forged by Linux, if I may add, just as much as by Windows) may be driving organizations to choose containerization — and to choose it more quickly — as a way of avoiding re-architecture, bypassing the big mountain by way of the valley.
Then should the security community and the container tools community come together, I asked, to build new security methods for this broader attack surface, with the understanding that most enterprises will choose the valley path, if you will?
“If we appreciate what people are actually doing, then we can — individually or collectively — educate people to do, quote-unquote, ‘better’ things,” Puppet’s Rushgrove answered. “On the other hand, there’s an aspect of ‘paving the cow paths…’ There’s actually a lot of tools that we’ve used over time to solve that type of [security] problem. But containers are changing the game. It’s not about using the same tools; it’s about going back to the problem those tools were designed to solve, and then say, ‘What would a tool to solve that problem look like in this new context?’”
Your Garden Variety Operating System
Thing is, many of those older tools carry a Windows logo. If we’re busy paving the way for enterprises likely to take the path of least resistance, then before long, won’t this pastoral landscape start looking less and less like Linux after all?
Rushgrove admitted that his data reveals very little Windows Container usage presently. “But I’d quite like to redo the queries in, for example, a year, and track some of this over time. I think that will increase quite rapidly.”
One step Rushgrove suggests the container evolutionary track should take is to incorporate a more detailed inventory of contained components and their dependencies. While such a list may prove only moderately useful for microservices, it could greatly aid schedulers and orchestrators in managing and securing the bulkier containers in more general use today. That type of list may be especially helpful, he said, as Windows Containers become more widely adopted.
“The idea is to more easily ask any of these containers, with a consistent API, questions about themselves,” he told The New Stack. “From that idea, you can imagine high-level tools that are useful for solving real operations problems.”
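No such consistent API exists yet; as a sketch of what Rushgrove’s idea might look like in practice, imagine a container exposing a machine-readable inventory that higher-level tools can query. The document shape and field names below are invented for illustration, not any real standard:

```python
# Hypothetical inventory document a container could expose through a
# consistent API. Every field name here is illustrative, not a standard.
inventory = {
    "base_image": "ubuntu:14.04",
    "packages": [
        {"name": "openssl", "version": "1.0.1f"},
        {"name": "bash", "version": "4.3"},
    ],
}

def find_package(inventory, name):
    """Answer an operations question: is this package inside the box,
    and at what version?"""
    return [p for p in inventory["packages"] if p["name"] == name]

# e.g. an orchestrator sweeping its fleet for a vulnerable library
hits = find_package(inventory, "openssl")
print(hits)  # → [{'name': 'openssl', 'version': '1.0.1f'}]
```

A scheduler or security scanner could run the same query against every container it manages, which is precisely the kind of “high-level tool solving real operations problems” Rushgrove describes.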
In his presentation Monday, Rushgrove asserted that such a tool would address the need all containerization platforms have for immutability. As one of his slides put it, “Immutability means we need to know what we put inside the box.”
The New Stack’s Lawrence Hecht provided information for this report.
Feature image: “Cow on the way to the Double Lake in the Valley of the Seven Lakes in Triglav National Park,” by Dreamy Pixel, licensed under Creative Commons 4.0.