Cloud-Native, Seven Years On…
Back in 2010, I published a blog post defining the term cloud-native, based on discussions I’d had with my colleagues at WSO2. At around the same time, Netflix also started using the term in presentations. Since then, interest in cloud-native has rocketed, producing many blog posts and books and, of course, the Cloud Native Computing Foundation (CNCF).
The high-level concept of cloud-native is simple: systems that give users a better experience because they operate in the cloud in a genuinely cloud-centric way. For example, the cloud may make an existing database easier to start up, but if the database doesn’t support elasticity, it can’t take advantage of the cloud’s scaling capabilities.
The motivation for defining cloud-native had two distinct aspects. First, we wanted to capture the thinking and architecture that went into creating properly “cloudy” systems. Second, we wanted to highlight that not every system rebranded as “cloud” was (or is) actually taking proper advantage of the cloud.
Fast forward to today, and we have a new definition of cloud-native from the CNCF. The new definition is much simpler, offering three main characteristics:
- Container packaged
- Dynamically orchestrated
- Microservices oriented
In this blog post, we will explore the differences and similarities between the two definitions and answer the question: How has the vision of cloud-native changed in seven years, and how much do the definitions overlap?
There is clear overlap between the previous definition and the new one, which I’ll demonstrate by showing that the old characteristics can be implemented by systems that have the new characteristics. The differences are therefore fundamentally about how to get there efficiently, not about the core meaning of cloud-native.
Back in 2010, we were building systems that tried to make existing and new code run well in the cloud by automating control of the underlying Infrastructure-as-a-Service (IaaS). Since then, the most notable change has been the emergence of container orchestration: tools like Kubernetes, Docker Swarm, Mesos Containerizer and others that can schedule and manage thousands or even millions of Docker containers. These tools are the fundamental reason the new cloud-native definition is simpler and more effective: building components to fit into this world provides many capabilities that needed more thought, attention and machinery before container orchestration existed.
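To make “schedule and manage” a little more concrete, here is a minimal sketch of the declarative control loop that orchestrators are built around: you declare how many replicas of each service you want, and the orchestrator repeatedly compares that with what is actually running and starts or stops containers to close the gap. This is a deliberately simplified Python illustration, not how Kubernetes, Docker Swarm or Mesos are implemented; `Cluster` and `reconcile` are hypothetical names, and real orchestrators also deal with placement, health checks and failed actions.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for an orchestrator's view of what is running (hypothetical)."""
    # service name -> number of running container replicas
    running: dict[str, int] = field(default_factory=dict)


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """One pass of a simplified control loop.

    Compares the declared desired state with the observed state and returns
    the start/stop actions needed to converge, updating the cluster as if
    every action succeeded.
    """
    actions: list[str] = []
    for service, want in desired.items():
        have = cluster.running.get(service, 0)
        if have < want:
            actions += [f"start {service}"] * (want - have)
        elif have > want:
            actions += [f"stop {service}"] * (have - want)
        cluster.running[service] = want
    # Anything still running but no longer declared is scaled to zero.
    for service in list(cluster.running):
        if service not in desired:
            actions += [f"stop {service}"] * cluster.running.pop(service)
    return actions
```

Starting from `Cluster({"web": 1, "legacy": 2})`, a call to `reconcile({"web": 3, "api": 1}, cluster)` starts two `web` containers and one `api` container and stops both `legacy` containers; calling it again with the same desired state returns no actions, which is what lets an orchestrator run the loop continuously.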
Let’s look at the six key attributes of being cloud-native I identified back in 2010:
- Distributed and dynamically wired: This first attribute ensures that workloads can be split across multiple servers and relocated. For example, if you have a microservice available on a port, then container networking overlays like Weave and Flannel, or a service mesh built on a proxy like Envoy, allow workloads to be load-balanced or moved dynamically. This is covered by two aspects of the new cloud-native definition: dynamically orchestrated (meaning that the orchestrator can move services) and microservices oriented (which requires loosely coupled systems with dependencies explicitly described, e.g. as service endpoints).
- Elastic: the elastic nature of cloud workloads simply means that they can be scaled up and down as demand requires. This seems obvious now, but back in 2010 many vendors were taking existing code to the cloud with no plan or model for elastic scaling. Even today, multimaster databases with dynamic sharding like Couchbase or Cassandra have a significant advantage in cloud-native scenarios over many traditional master-slave databases, which only support elastic scaling of the slaves. In general, elasticity requires the code running in containers to be stateless or to use a scalable caching mechanism to manage state. This is implicit in the dynamically orchestrated and microservices-oriented characteristics of the new definition.
- Multitenant: In 2010, we needed cloud-native systems to have some concept of multitenancy. This is because the per-tenant cost of running a large, traditional monolithic system was prohibitive, and one of the main aims of the cloud as a whole is to enable self-service and software-as-a-service. The big change here is the arrival of container packaging. Containers provide a much more effective way to implement multitenancy by enabling each tenant to have a set of containers running on its behalf. As an example, the work I did for my PhD shows that it’s possible for ordinary people to have their own container running (handling privacy management for their IoT devices) at a cost of less than $1/year/user. Another example is Borg, where Google launches a new container for each Gmail session, providing tenant isolation through containers.
- Self-service: Cloudy systems are inherently “as-a-Service”. In other words, they let users sign up, configure remotely, choose levels of service, etc. For SaaS systems, this means self-service for end users. For IT systems, it means self-service for developers, DevOps, operations staff and others who create, build, deploy, manage and monitor systems. It is important to make sure this self-service fits its users properly. For example, fitting into developers’ tools and mindset usually means that self-service control of a cloud-native environment should be based on config files, command-line tools, git pull requests, etc.
- Granularly metered and billed: of the criteria identified in 2010, this seems the furthest from the CNCF’s three simple criteria. However, it is simply a logical extension of the previous three characteristics: if you allow arbitrary resources to be elastically scaled, across tenants and with self-service, then there clearly need to be controls on resources, and the best control is simply cost. AWS Elastic Container Service, Google Kubernetes Engine and Docker Cloud all demonstrate that cloud orchestration services can do this. In fact, containers simplify this considerably. In the early days of PaaS, we built complex billing around tenants, code and services; today we can simply monitor containers across an orchestration layer using systems like Prometheus, cAdvisor, InfluxDB and others.
- Incrementally deployed and tested: this area has progressed massively in the past seven years. The progress is really a confluence of all three aspects of the new definition: better orchestration, container packaging and the microservice model. Patterns that we were describing in 2010 are now widely used and have better names too: Blue/Green Deployment, Canary Release, sociable unit testing and more.
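The elasticity attribute is usually realized by an autoscaler that turns a load metric into a replica count. The sketch below follows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)), clamped to bounds. The function name and the default `min_replicas`/`max_replicas` values are assumptions, and the real autoscaler adds a tolerance band and stabilization windows that are left out here. The rule only works if the containers are stateless or keep their state in a shared store, as described above.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule in the style of the Kubernetes HPA.

    current_metric and target_metric are per-replica averages in the same
    unit (e.g. percent CPU). Tolerance and stabilization are omitted.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the replica count by how far observed load is from the target.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds (hypothetical defaults).
    return max(min_replicas, min(max_replicas, raw))
```

For example, four replicas averaging 90% CPU against a 60% target give `desired_replicas(4, 90, 60) == 6`; the same four replicas at 15% CPU shrink to one.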
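Granular metering becomes much simpler once the container is the unit of accounting. As a rough sketch, assuming per-container usage samples that are labelled with a tenant (the kind of data a Prometheus and cAdvisor setup can collect), billing reduces to an aggregation. The unit prices, the `bill_by_tenant` name and the `(tenant, container_id, cpu_seconds, memory_gb_seconds)` sample shape are all hypothetical, not any provider’s real rates or schema.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical unit prices, for illustration only.
PRICE_PER_CPU_SECOND = Decimal("0.00001")
PRICE_PER_GB_SECOND = Decimal("0.000002")


def bill_by_tenant(samples):
    """Aggregate per-container usage samples into a cost per tenant.

    Each sample is (tenant, container_id, cpu_seconds, memory_gb_seconds).
    """
    totals = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for tenant, _container_id, cpu_seconds, memory_gb_seconds in samples:
        # Convert via str() so float inputs don't carry binary rounding error.
        cost = (Decimal(str(cpu_seconds)) * PRICE_PER_CPU_SECOND
                + Decimal(str(memory_gb_seconds)) * PRICE_PER_GB_SECOND)
        totals[tenant] += cost
    return dict(totals)
```

Because every tenant’s workload runs in its own containers, no knowledge of the application code is needed to attribute cost; the container labels carry the tenant identity.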
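For incremental deployment, a Canary Release sends a small, adjustable share of traffic to a new version before rolling it out fully. One common way to split traffic, sketched below under the assumption that every request carries a stable user ID, is to hash the ID into one of 100 buckets so that each user consistently sees the same version. `route` is a hypothetical helper; in practice this logic usually lives in a load balancer or service mesh rather than in application code.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Deterministically assign a user to the 'canary' or 'stable' version.

    Hashing the user ID into one of 100 buckets keeps each user on the same
    version across requests; raising canary_percent only moves users from
    stable to canary, never back.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # 0..99 (tiny modulo bias)
    return "canary" if bucket < canary_percent else "stable"
```

Rolling forward is then just a matter of raising `canary_percent` step by step (for example 1, 10, 50, 100) while watching error rates, and rolling back means setting it to 0.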
Why do these changes matter? Fundamentally cloud-native has given us a more adaptive infrastructure that changes more quickly to meet business needs:
- Faster redeployment: Organizations have moved to (or are moving towards) Continuous Integration and Continuous Delivery (CI/CD). Containerized deployment has been one of the significant approaches to making this happen effectively.
- More effective scaling: Code deployed as microservices into container orchestration can be scaled faster and matched more closely to incoming load. Different parts of the application can be scaled independently, adapting to how heavily each part is used.
- Better evolution: Because cloud-native workloads deployed in containers inherently communicate via network sockets, we can treat these interfaces as service contracts. As a result, each container can be evolved, rewritten and redeployed without disrupting the application, as long as the contract is maintained.
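One way to keep such a contract stable while a service evolves is the tolerant-reader style sketched below: the client reads only the fields it depends on and ignores everything else, so the provider can add fields in a new version without breaking existing consumers. The JSON payloads, field names and `parse_order` function are hypothetical examples.

```python
import json


def parse_order(payload: str) -> dict:
    """Read an order response, keeping only the fields this client relies on.

    Unknown fields are ignored, so the provider can add fields in a new
    version without breaking this consumer. Missing required fields still
    fail loudly with KeyError.
    """
    data = json.loads(payload)
    return {"id": data["id"], "total": data["total"]}
```

A v2 response such as `{"id": "o-1", "total": 42, "currency": "EUR"}` parses to exactly the same result as the v1 response `{"id": "o-1", "total": 42}`, so the provider’s container can be redeployed independently of its consumers.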
If I’ve made it sound as though the things we struggled to do back in 2010 are all simple now, that isn’t quite true yet. What do you still need to consider today?
- Making your microservices and container images support a shared-nothing architecture, with clean, fast startup, a low memory footprint, migration and clean shutdown. This enables granular orchestration, low multitenancy overhead and fast elasticity.
- There are still challenges for multitenancy in the database and storage layer. Containers give us effective multitenancy for business logic and microservices, but the back end storage can still be a challenge.
- As we’ve seen above, many of the challenges of the last seven years have been solved by advanced container orchestration systems. Making sure that your choice of orchestrator can help manage self-service, multitenancy and incremental deployment should be high on your to-do list.
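Clean shutdown is one of the easier items on that list to get right in code. Orchestrators such as Kubernetes, and `docker stop`, send SIGTERM first and only force-kill the container after a grace period, so a service can use that window to finish in-flight work. The sketch below is a minimal Python illustration, assuming work arrives on an in-process queue; `Worker`, `request_shutdown` and `main` are hypothetical names, and feeding the queue is left out.

```python
import queue
import signal
import threading


class Worker:
    """Processes queued jobs and exits cleanly once shutdown is requested."""

    def __init__(self) -> None:
        self.jobs = queue.Queue()  # callables to run
        self.done = []             # results of completed jobs
        self._stopping = threading.Event()

    def request_shutdown(self, signum=None, frame=None) -> None:
        # Signature matches a signal handler: handler(signum, frame).
        self._stopping.set()

    def run(self) -> None:
        while True:
            try:
                job = self.jobs.get(timeout=0.1)
            except queue.Empty:
                # Exit only once shutdown was requested AND no work is left.
                if self._stopping.is_set():
                    return
                continue
            self.done.append(job())
            self.jobs.task_done()


def main() -> None:
    worker = Worker()
    # Kubernetes and `docker stop` send SIGTERM before a hard SIGKILL.
    signal.signal(signal.SIGTERM, worker.request_shutdown)
    worker.run()
```

A real service would also stop accepting new requests once shutdown starts (for example by failing its readiness check), so that the queue actually drains before the grace period expires.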
Let’s look at some examples of systems that are trying to optimize for cloud-native and provide a better experience for their users.
- A number of people have foretold the death of Java in a cloud-native world because of JVM overheads. However, the Eclipse OpenJ9 project has put significant effort into reducing the memory footprint and startup time of JVM-based microservices in containers, including sharing loaded class data across containers.
- Couchbase, Cassandra and DynamoDB have all shown how databases can be scaled effectively in the cloud, using dynamic sharding and repartitioning schemes. Systems like Vitess have adapted existing databases to make them more cloud-native.
- The Ballerina language, led by my colleague Sanjiva Weerawarana, is an initiative to create composite microservices with a low memory footprint and fast startup times, compiling straight into containers.
- The recently announced hosting of Envoy at the CNCF moves key service-oriented concerns such as load balancing, circuit breaking and observability out of application code and into a cloud-native substrate based on sidecars.
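Circuit breaking is a good example of logic that used to be hand-written into every service. The sketch below shows the classic closed/open/half-open circuit-breaker pattern in application code, i.e. the kind of responsibility a sidecar takes over. It illustrates the pattern only and is not Envoy’s implementation, which is configured through thresholds (such as maximum connections and pending requests) and outlier detection; the class name and the threshold and timeout defaults are assumptions.

```python
import time


class CircuitBreaker:
    """Classic closed/open/half-open breaker (illustrative, not Envoy's)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of hammering a sick dependency.
                raise RuntimeError("circuit open")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Moving this into a sidecar means every service gets the same behaviour, configured once, regardless of the language it is written in.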
Cloud-native has moved on in the last seven years, and the technologies behind cloud-native are orders of magnitude more powerful and better architected around containers than back in 2010, but the core message of building systems that take full advantage of cloud is still the same.
The Cloud Native Computing Foundation is a sponsor of The New Stack.