Apache Cassandra: Persistent Data Storage for Scalable Microservices Applications

10 Feb 2020 11:55am, by

Patrick Callaghan
Patrick Callaghan was recently appointed VP of Enablement at DataStax, where he was previously an Enterprise Architect and Strategic Business Advisor for the last six years. Patrick works with companies implementing and supporting mission-critical applications across hybrid and multi-cloud environments. Prior to DataStax, Patrick held roles in the banking and finance sectors architecting and developing real-time applications.

Software engineers and architects are moving away from monolithic applications that are built on one single, chunky code base. This approach has become more difficult to manage and scale as companies need to run globally, conduct business around the clock, and work in a world that requires more agility and responsiveness to customer demand.

The new approach of microservices architecture has emerged to fill this gap. Enterprise software teams have drawn on their experience with domain-driven design and embraced continuous integration and continuous delivery (CI/CD) to get software into production more efficiently. Using microservices, teams can respond faster to changes without having to rework the entire application. They have championed small teams that own a service throughout its lifecycle and have demonstrated that it is possible to build robust systems that scale up to cope with today’s application requirements.

Planning the Move to Microservices

The microservices approach builds on all the practices and technologies that have come before to create complex applications formed by small, independent processes that communicate with each other using language-agnostic and lightweight application program interfaces (APIs) such as REST. The services scale by being distributed across multiple servers or machine images, which are then replicated as more capacity is required.

These services are small building blocks, highly decoupled and focused on doing a small task, facilitating a modular approach to building systems. Each of these services is independently deployed and managed. Technologies such as containers are becoming the default choice for creating such services.

If you have an existing monolithic application, making the move to microservices involves carving out tasks into discrete services. Over time, all or most functionality will be implemented in the microservices architecture. You can split monoliths based on business logic, front end, and data access. As a monolith is split along the modules of the application, it gradually shrinks. When new functionality is required, you can create a microservice rather than adding more code to the monolith.

Running services independently delivers some significant benefits:

  • A polyglot approach: As long as the service’s endpoint API returns the desired output, you can select any language or technology for developing it.
  • Deployment joy: The independence of microservices makes them much easier to deploy. Unlike a monolith application, updating or scaling a component doesn’t require taking down the entire application.
  • Fewer cascades: Similarly, a failure in any one service doesn’t cause a cascading failure across the application. Partial failure can become an issue if you do not follow a good design approach (for example, the Netflix method), but service independence does make the debugging process more focused.
  • Recycling: Once you are marching down the microservices road, the code of a service can also provide functionality that is easy to re-use in other projects.
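
The article does not spell out which design approach it has in mind for partial failure. One widely used technique is the circuit-breaker pattern, popularized by Netflix's Hystrix library. The sketch below is a minimal illustration of that pattern only, not Netflix's implementation; the class name, thresholds, and fallback value are all hypothetical.

```python
import time
from typing import Callable, List, Optional


class CircuitBreaker:
    """Hypothetical minimal circuit breaker: after repeated failures, stop
    calling the downstream service for a while and return a fallback."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], str], fallback: str) -> str:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                return fallback  # open: fail fast without touching the service
            self.opened_at = None  # cool-down elapsed: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # open (or re-open) the circuit
            return fallback
        self.failures = 0  # a success closes the circuit again
        return result


attempts: List[int] = []  # records how often the downstream service was really called


def flaky_service() -> str:
    attempts.append(1)
    raise ConnectionError("downstream unavailable")


breaker = CircuitBreaker(max_failures=2, reset_after_s=60.0)
results = [breaker.call(flaky_service, fallback="cached response") for _ in range(5)]
```

Here the caller receives a fallback on every request, but the failing service is only contacted twice before the breaker opens, which is what keeps one unhealthy service from dragging down its callers.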

The Challenges with Microservices Applications

Along with some significant benefits, microservice architecture brings with it some challenges. However, many of these can be solved by applying the right approach.

From the outset, it is very important to choose the right functionality for a service. Creating a microservice for each function of a monolith will bring unnecessary complexity. The goal of microservices is to break down the application to enable agile application development and deployment. A useful rule of thumb, advocated by Sam Newman, a leading thinker on microservices, suggests that when a codebase is too big to be managed by a small team then it’s sensible to consider breaking it down.

Similarly, interservice communication can be costly as well if you don’t implement it correctly. You will need to choose the method that fits the requirements with the least overhead from options such as message passing and RPC. For example, a notification to a customer that their taxi or parcel is arriving only requires a one-to-one, one-way request, not a one-to-many notification that then expects a reply within a specified timeframe.
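
To make the difference between the two styles concrete, here is a minimal sketch that uses an in-process queue as a stand-in for a message broker. The function names, the message fields, and the pricing rule are hypothetical and exist only for illustration.

```python
import queue

# Hypothetical in-process stand-in for a message broker.
notifications: "queue.Queue[dict]" = queue.Queue()


def notify_arrival(customer_id: str, eta_minutes: int) -> None:
    """One-way, one-to-one message: enqueue it and return without waiting for a reply."""
    notifications.put({"customer_id": customer_id, "eta_minutes": eta_minutes})


def get_quote(distance_km: float, rate_per_km: float = 1.5) -> float:
    """Request/reply (RPC style): the caller blocks until it has the answer."""
    return round(distance_km * rate_per_km, 2)


notify_arrival("cust-42", 3)            # fire and forget
message = notifications.get_nowait()    # a consumer picks the message up later
quote = get_quote(10.0)                 # the caller needs the result immediately
```

The arrival notification never needs a response, so the sender pays only the cost of enqueueing it, while the quote is a case where the caller cannot proceed without the reply.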

Another challenge is complexity. Deploying a microservices application will normally require a distributed environment, which can range from different servers in a data center up to fully distributed environments like the cloud. These environments then need to be managed with a container orchestration tool, such as Kubernetes. Thinking through how to automate processes, such as having Kubernetes create new containers when they are needed, can take away a significant scaling headache.

Alongside this, you will have to consider how to test and manage the application over time. End-to-end testing of a microservices application can be challenging because of the number of services and platforms involved, and their interdependencies. Ultimately, as long as you are aware of these challenges, microservices will continue to make sense for your enterprise when your applications are complex and continuously evolving.

Microservices and Data

Alongside the application processes, it’s important to look at the data that they will create. Each microservice can be either stateless or stateful. In practical terms, stateless microservices do not maintain any state within the service across calls. The service takes in a request, processes it, and sends a response back without persisting any state information for future calls.

A stateful microservice will persist state in some form to enable it to function. A system that uses microservices will typically have a mix of stateless and stateful components. For example, a service to change a file may not need to keep a copy of that file over time, while the components behind a customer service application will create data that has to be stored.

When you require state, rather than storing it internally, it’s easier to store state information externally in a type of data store. The type of data store that can be used to persist state will depend on your needs and how much data you expect to create over time. Options here include traditional relational database management systems (RDBMS), NoSQL databases, or some type of cloud storage. Persisting the state externally provides availability, reliability, scalability and consistency for the state information.
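
As a rough illustration of the distinction, the sketch below contrasts a stateless function with a stateful service that keeps its state in an external store. The `InMemoryStore` class is a hypothetical stand-in for a real database, and every name here is an assumption made for the example.

```python
from typing import Dict, Protocol


class StateStore(Protocol):
    """Hypothetical interface for an external data store."""

    def get(self, key: str) -> int: ...
    def put(self, key: str, value: int) -> None: ...


class InMemoryStore:
    """Stand-in for an external database; a real service would use a network client."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str) -> int:
        return self._data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


def to_upper(text: str) -> str:
    """Stateless: the response depends only on the request."""
    return text.upper()


class TicketService:
    """Stateful behavior, but the instance itself holds no state between calls."""

    def __init__(self, store: StateStore) -> None:
        self._store = store

    def open_ticket(self, customer_id: str) -> int:
        count = self._store.get(customer_id) + 1
        self._store.put(customer_id, count)
        return count


store = InMemoryStore()
first_instance = TicketService(store)
second_instance = TicketService(store)   # e.g. a replica started during scale-up

first_instance.open_ticket("cust-42")
total = second_instance.open_ticket("cust-42")
```

Because both instances read and write through the same store, either one can serve the next request, which is what makes the service instances themselves easy to replace or replicate.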

For applications that create large volumes of data that have to be organized, a database will normally be a better option than object storage or cloud storage. For applications that will involve transactions or customer service, performance at scale can be important too. Looking at NoSQL databases like Apache Cassandra can help here.

Because Apache Cassandra can scale linearly by adding more nodes, it has become a popular persistent data storage choice for microservices applications. At Monzo, for instance, the combination of microservices and Cassandra has enabled the challenger bank to quadruple its customer base every year without issues. At peak times, Monzo says it can handle 300,000 reads per second across the 1,500 microservices that are now present in the bank, all of which connect to their Apache Cassandra cluster.

With so many moving parts involved in any microservices application, any database implementation that supports the app has to be able to scale easily and connect to all the components. Using a container orchestration tool, such as Kubernetes, to manage both the application and the database instance can help here. By using Kubernetes operators to manage the process of adding database nodes when they are needed, some of the management headaches of scaling up the database can be avoided.

Similarly, it is worth looking at the issue of data consistency and resilience. For distributed computing environments, a missed message can be replayed and a transaction recreated. This is not as simple in a distributed database, so looking at how to handle data consistency may be needed too. Cassandra addresses this with tunable, eventual consistency: each read or write specifies how many replicas must acknowledge it, and replicas that miss an update converge on the latest value afterward. For operations that need compare-and-set guarantees, Cassandra also offers lightweight transactions built on the Paxos consensus protocol. For applications that require full ACID transactions, a separate database instance may be needed, but the vast majority of applications can run using eventual consistency, where replicas typically converge within milliseconds.
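
Cassandra lets each read and write choose a consistency level, and a common rule of thumb is that a read is guaranteed to see the latest acknowledged write when the replicas contacted for reads and writes overlap, that is, when R + W > RF. The sketch below works through that arithmetic for three of Cassandra's consistency levels (ONE, QUORUM, ALL); the helper function names are hypothetical, and no driver or cluster is involved.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge an operation at a given level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


rf = 3
strong = reads_see_latest_write("QUORUM", "QUORUM", rf)   # 2 + 2 > 3
eventual = reads_see_latest_write("ONE", "ONE", rf)       # 1 + 1 > 3 is false
```

With a replication factor of 3, QUORUM reads and writes overlap on at least one replica, while ONE/ONE trades that guarantee for lower latency and relies on replicas converging afterward.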

The Future for Microservices

Microservices meet the demands of today’s IT. They can reside across multiple data center environments around the world or be implemented across hybrid cloud deployments to provide scale and fast access to data. This can address application requirements such as continuous availability and low latency. Looking at microservices applications also means looking at the data that the applications will create.

Using a distributed NoSQL database provides the same kind of application design approach — widely spread, scalable and able to run across multiple environments. By thinking through both application and database design together, you can make the most of your microservices for the future.
