So, You Want to Go Cloud-Native? First, Ask Why
Over the last couple of years, the term “cloud-native” has entered the collective consciousness of those designing and building applications and the infrastructure that supports them.
At its heart, cloud-native refers to a software architecture paradigm tailored for the cloud. It calls for applications to 1) employ containers as the atomic unit for packaging and deployment, 2) be autonomic, that is, centrally orchestrated and dynamically scheduled, and 3) be microservices-oriented, that is, built as loosely coupled, modular services, each running as an independent process and most often communicating with one another over HTTP via an API.
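To make the third characteristic concrete, here is a minimal sketch of one such service, written with nothing but the Python standard library. The endpoint paths (/healthz, /greeting), the handler name and the helper functions are all hypothetical, chosen for illustration; a real service would sit behind an orchestrator and would more likely use a production-grade web framework.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class GreetingHandler(BaseHTTPRequestHandler):
    """A tiny single-purpose service exposing a JSON-over-HTTP API.

    The routes below are illustrative assumptions, not a standard.
    """

    def do_GET(self):
        if self.path == "/healthz":
            # A health endpoint is what an orchestrator would typically probe.
            self._send_json(200, {"status": "ok"})
        elif self.path == "/greeting":
            self._send_json(200, {"message": "hello"})
        else:
            self._send_json(404, {"error": "not found"})

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging to keep the example output clean.
        pass


def start_service(port=0):
    """Start the service in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), GreetingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(host, port, path):
    """What a peer service would do: call the API over plain HTTP."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_service()
    host, port = server.server_address[0], server.server_address[1]
    print(call_service(host, port, "/healthz"))
    print(call_service(host, port, "/greeting"))
    server.shutdown()
    server.server_close()
```

Note that nothing here involves a container or a scheduler. The service is just an independent process with an HTTP interface; packaging and orchestration are separate layers added on top.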
Dissected further, those characteristics imply that modern applications need to be platform-independent (i.e., decoupled from physical and/or virtual resources so they work equally well across cloud and compute substrates), highly elastic, highly available and easily maintainable.
By the sound of that, it would seem that building cloud-native applications is a no-brainer for every organization, whether or not it considers writing software business-critical. In practice, however, going cloud-native, much like adopting DevOps, requires putting into place a broad set of new technologies and practices that meaningfully shift the overhead costs associated with writing, deploying and managing software. So before going cloud-native, it’s imperative to understand the motivations for this architectural transformation, both technical and organizational.
A good place to start is with Google, the poster child for this highly distributed, autonomic computing paradigm. Google has been running on containerized infrastructure for nearly a decade and manages resource allocation, scheduling, orchestration and deployment through a proprietary system called Borg.
“Borg provides three main benefits,” a 2015 Google research paper, Large-scale cluster management at Google with Borg, explained. The approach “hides the details of resource management and failure handling so its users can focus on application development instead.” A Borg-ian approach also “operates with very high reliability and availability, and supports applications that do the same [and] lets us run workloads across tens of thousands of machines effectively.”
So Google’s rationale for going cloud-native is to achieve 1) agility, as defined by developer productivity and self-service, 2) fault tolerance and 3) horizontal scalability. And while almost no organization has to operate at the massive scale of Google, every company in the world asks itself “how do I go faster?” and “how do I minimize risk?”
Problems arise, however, when going cloud-native becomes an end, not a means. While containers, autonomic scheduling and microservices-oriented design are all tools which can facilitate operational agility and reduce risk associated with shipping software, they are far from a panacea and involve shifting meaningful costs from dev to prod. Martin Fowler and others have termed this phenomenon the “microservices premium.”
“The [cloud native] approach is all about handling a complex system, but in order to do so the approach introduces its own set of complexities. When you [adopt cloud-native architectures] you have to work on automated deployment, monitoring, dealing with failure, eventual consistency, and other [complexities] that a distributed system introduces,” Fowler wrote.
The prevailing fallacy is to conflate using Docker as a package format with the need to build an application as a complex distributed system from the get-go.
The first rule of thumb is “if it ain’t broke, don’t fix it,” so there’s no need for added complexity if your team is functioning at a high level, releases are on schedule and your app is resilient and scaling to meet user demand. Sustained high levels of developer productivity, continuous deployment and fault-tolerant systems can be, and often are, achieved without ever so much as touching a Dockerfile (though Docker can radically simplify the development workflow). In fact, many of the most elegant delivery pipelines in high-performance software organizations are AMI-based and deployed by Slackbots!
However, as your engineering organization balloons to 100+ devs, going cloud-native, including standing up the entire distributed runtime, could very well begin to make sense. Just remember, all these decisions are tradeoffs, where complexity is merely shifted, not reduced.
Docker is a sponsor of The New Stack.
Feature Image by Ryan McGuire via Gratisography.