How Redis Simplifies Microservices Design Patterns
Microservice architecture continues to grow in popularity, yet it is widely misunderstood. While most conceptually agree that microservices should be fine-grained and business-oriented, there is often a lack of awareness regarding the architecture’s tradeoffs and complexity. For example, it’s common for DevOps architects to reduce microservices to Kubernetes deployments, or an application developer to boil implementation down to using Spring Boot. While these technologies are relevant, neither container orchestration nor development frameworks can overcome microservice architecture pitfalls on their own — specifically at the data tier.
Martin Fowler, Chris Richardson, and fellow thought-leaders have long addressed the trade-offs associated with microservice architecture and defined characteristics that guide successful implementations. These include the tenets of isolation, empowerment of autonomous teams, embracing eventual consistency, and infrastructure automation. While keeping with these tenets can avoid the pains felt by early adopters and DIYers, the complexity of incorporating them into an architecture amplifies the need for best practices and design patterns — especially as implementations scale to hundreds of microservices.
With Redis rapidly becoming a staple across microservice architecture, it’s worth discussing how it can simplify the implementation of design patterns such as bounded contexts, asynchronous messaging, choreography-based sagas, event-sourcing, CQRS, telemetry, and more.
At RedisDays New York 2022, I hosted a session on this topic, so feel free to watch the recorded presentation if you prefer an audio version.
Design Patterns Are Best Visualized. Let’s Start With a Diagram…
The following architectural diagram is a composition of microservice design patterns. If it seems busy and complex, don’t be discouraged. We will decompose this mockup of an event-driven payment-processing workflow into its many embedded design patterns. We’ll discuss the challenges solved by each pattern and how their implementation can be simplified by using Redis. Since certain patterns are foundational to the implementation of others, we will cover them in an order that allows them to build upon each other.
By the end of this article, readers will be able to identify microservice design patterns where they once saw chaos. Think of it like Neo eventually seeing the code behind the Matrix. For now:
Design Pattern: Bounded Context -> Domain-Driven Design
Our first challenge is to logically segment the business into micro-subdomains, so that each can be supported by a small empowered autonomous team. Each subdomain’s scope should be bound by its team’s capacity to manage the lifecycle of its supporting microservice(s) — from inception to post-production. This shift from working on transient-projects to autonomous domain-ownership incentivizes accountability for all aspects of microservice design and empowers agile decision-making — which results in improved time-to-market.
Think of the prefix ‘micro’ as alluding to the size of the team needed to support the entire lifecycle of the microservice(s) within its bounded business subdomain.
Within the context of our mockup architecture, let’s begin the organizational design process by starting with the payment-processing domain — which includes fraud detection, payments, settlement, and more. Since this scope is likely too complicated for a small team to manage, let’s choose to narrow their ownership boundary down to just the fraud-detection subdomain.
The diagram above shows that fraud-detection is composed of the workflow’s first three microservices — which include digital identities, statistical analysis, and machine learning-based online transaction risk-scoring. Since their scope is likely still too broad for a small team to manage, let’s split fraud detection further down into two subdomains — which finally seems more manageable.
At a very high level, the process we just followed is called Domain-Driven Design (DDD), which is supported by the recommended pattern to bind each microservice’s scope and ownership claim to a business subdomain called bounded context. But wait a minute — where does Redis fit in?
Notice that each microservice has its own dedicated database for isolation. The empowered autonomous team that owns the purple bounded context chose RedisJSON to support their “Digital Identity Authentication” microservice, and RedisBloom to support their “Probabilistic Transaction Filter” microservice. Meanwhile, a separate team that owns the green bounded context chose Redis as its feature store to support real-time “Online Transaction Risk Scoring”.
While each microservice required its own optimal data model to handle its unique data access pattern and SLAs, Redis saved the teams from having to evaluate, onboard, manage, and administer three distinct databases. In fact, with Redis Enterprise they could deploy all three across a single multitenant cluster without coupling their release cycles or becoming noisy neighbors.
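To make the idea of a probabilistic filter concrete, here is a minimal plain-Python sketch of a Bloom filter, the kind of structure RedisBloom exposes through commands such as BF.ADD and BF.EXISTS. It does not connect to Redis, and the class name, sizing parameters, and the card-and-merchant key format are assumptions chosen for illustration rather than details from the architecture above.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly seen before"; False means "definitely not seen".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical usage: remember which card/merchant pairs have transacted before.
seen_pairs = BloomFilter()
seen_pairs.add("card-123:merchant-42")
known = seen_pairs.might_contain("card-123:merchant-42")
```

The trade-off is memory for certainty: the filter stores a fixed number of bits regardless of how many items are added, at the cost of an occasional false positive that a downstream service would need to tolerate.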
Design Pattern: Asynchronous Messaging -> Interservice Communication
Now that we’ve identified a bounded context and optimal data model for each microservice, our next challenge is to enable communication between them without violating isolation. This can be solved by embracing eventual consistency, which presumes that the microservice on the receiving end of interservice communication will not be available during outbound transmission but can consume the message as soon as its availability is restored.
The recommended pattern for interservice communication is asynchronous messaging using a publish-subscribe message broker as its event distribution hub. In this pattern, a producer can publish an event without requisite awareness of whether or not any consumer is listening, and — in the same way — consumers of that event can react to it at their convenience or ignore it altogether. This is typically the foundation of an event-driven architecture.
Since we have already chosen Redis as the primary database for multiple microservices, we can simplify our architecture by also using it to implement this pattern with Redis Streams. Redis Streams is an immutable time-ordered log data structure that allows a producer to publish asynchronous messages to which multiple consumers can subscribe. This ensures the microservice that is publishing events will remain decoupled from the microservice(s) consuming them — so there are no cross-dependencies on availability and release cycles.
In addition, Redis Streams can be configured for different delivery guarantees, and it supports consumer groups and other features that are similar in nature to Apache Kafka topic partitions.
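The decoupling described above can be sketched with a small in-memory model of an append-only log with consumer groups. This is plain Python, not a Redis client: the `Stream` class and its method names are hypothetical stand-ins for what the XADD, XREADGROUP, and XACK commands do against a real stream, and the event and group names are invented for the example.

```python
from collections import defaultdict


class Stream:
    """Append-only log; each consumer group tracks its own read position."""

    def __init__(self):
        self.entries = []                 # (entry_id, event) pairs, never mutated
        self.cursors = defaultdict(int)   # group name -> index of next unread entry
        self.pending = defaultdict(set)   # group name -> delivered but unacknowledged ids

    def add(self, event):
        # The producer appends without knowing whether anyone is listening.
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, event))
        return entry_id

    def read_group(self, group, count=10):
        # Each group reads from its own cursor, so groups never interfere.
        start = self.cursors[group]
        batch = self.entries[start:start + count]
        self.cursors[group] = start + len(batch)
        self.pending[group].update(entry_id for entry_id, _ in batch)
        return batch

    def ack(self, group, entry_id):
        self.pending[group].discard(entry_id)


stream = Stream()
# Events are published before any consumer has connected.
stream.add({"type": "IdentityVerified", "txn": "t-1"})
stream.add({"type": "IdentityVerified", "txn": "t-2"})

# Two independent groups later receive the same events, in order.
scoring = stream.read_group("risk-scoring")
audit = stream.read_group("audit")
for entry_id, _ in scoring:
    stream.ack("risk-scoring", entry_id)
```

Because the log is retained and each group keeps its own position, a consumer that was offline during publication simply catches up when it returns, which is the eventual-consistency behavior the pattern relies on.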
Design Pattern: Choreography-Based Saga -> Distributed Transactions
Now that we’ve enabled interservice communication, our next challenge is to handle transactions that span across multiple bounded contexts without breaking compliance to isolation. In the past, this was trivial to implement, since all operations within the transactional scope were executed against a single RDBMS that provided row-locking, deadlock-detection, and roll-back features. Once data became distributed across multiple databases, the Two-Phase Commit protocol (2PC) became a standard for distributed transactions. However, while both approaches worked, they were not designed with eventual consistency in mind.
If we presume a dependency will be unavailable during a distributed transaction, then we should also presume frequent rollbacks will cause sporadic unavailability across the system — which is neither cloud native nor improves time-to-market.
This can be solved by relaxing strict requirements for ACID guarantees, which have propped up relational databases across most traditional architectures for decades. Though relational databases still have a place within microservice architectures, their relevance becomes much more situational. For example, if referential integrity is not a requirement, then why wouldn’t an empowered autonomous team choose to optimize their microservice with a NoSQL database that is purpose-built to handle their specific data access patterns and SLAs?
Recall that our payment-processing workflow is composed of multiple microservices that are organized into separate bounded contexts and supported by Redis — a NoSQL database. Within this context, the recommended pattern to handle distributed transactions is a choreography-based saga, which performs a sequence of isolated local transactions with published events facilitating the transition between workflow stages.
Each microservice participating in the saga will listen only for its own workflow-related event, which will notify it to perform a local database transaction and subsequently publish its own event to the message broker. This event-driven choreography can include compensating microservices for rollback purposes and decision services for complex business processes.
It’s worth noting that in a choreography-based saga there is no central orchestrator, which avoids coupling the release cycles of participating microservices. However, it is not always the right solution. There can be cases where strong consistency is an absolute requirement, such as account transfers. In those cases, either an orchestration-based saga or a 2PC between microservices within the same bounded context might be better suited.
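As a rough illustration of choreography, the sketch below wires a few handlers together purely through published events, with no orchestrator. It is plain Python with an in-memory queue standing in for the message broker; the event names, the risk threshold, and the "session" field used for compensation are all hypothetical.

```python
from collections import defaultdict, deque

handlers = defaultdict(list)   # event type -> subscribed handlers
queue = deque()                # stands in for the message broker
log = []                       # every event, in the order it was delivered


def publish(event_type, txn):
    queue.append((event_type, txn))


def subscribe(event_type):
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register


@subscribe("PaymentRequested")
def verify_identity(txn):
    txn["session"] = "open"                      # local transaction
    publish("IdentityVerified", txn)


@subscribe("IdentityVerified")
def score_risk(txn):
    txn["risk"] = 0.9 if txn["amount"] > 1000 else 0.1   # made-up scoring rule
    publish("RiskRejected" if txn["risk"] > 0.5 else "RiskAccepted", txn)


@subscribe("RiskAccepted")
def approve_payment(txn):
    txn["status"] = "approved"
    publish("PaymentApproved", txn)


@subscribe("RiskRejected")
def revoke_session(txn):
    txn["session"] = "revoked"                   # compensating transaction
    txn["status"] = "rejected"
    publish("PaymentRejected", txn)


def run(txn):
    """Deliver events until the saga has no more work to do."""
    publish("PaymentRequested", txn)
    while queue:
        event_type, payload = queue.popleft()
        log.append(event_type)
        for handler in handlers[event_type]:
            handler(payload)
    return txn


ok = run({"id": "t-1", "amount": 50})
bad = run({"id": "t-2", "amount": 5000})
```

Each handler knows only the event it listens for and the event it emits, so adding or replacing a participant does not require changing the others.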
Design Pattern: Message Relay -> Consistency
Now that we’ve choreographed transactions that span multiple bounded contexts, our next challenge is to mitigate the risks of inconsistency between a microservice’s database and the message broker — even if Redis is used for both. Recall that in the previous two design patterns, each microservice committed locally to its database and subsequently published an event. If this is implemented using some variation of the dual writes pattern, communication could become lost and parts of the distributed transactions could become orphaned — especially in a cloud environment.
Code complexity can be added to each microservice to handle various failure and inconsistency scenarios. However, consider this effort multiplied across hundreds of teams, along with the risk of incorrect implementations, all of which add no business value.
To avoid the costs and variance of disparate application-level implementations, the recommended pattern is to use a message relay. Redis simplifies the implementation of this pattern, also known as write-behind, by using RedisGears. RedisGears is an in-memory computation engine that runs within Redis on one or more secondary threads to listen for changed-data events, durably store them in time order, and publish them to the message broker whenever it’s available. This can be uniformly enabled or upgraded on each Redis database with infrastructure automation.
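The relay idea can be sketched in a few lines of plain Python. This is not RedisGears code: the `Database`, `Broker`, and `relay` names are hypothetical, and the in-memory list only imitates the durable, time-ordered change log that a real relay would keep.

```python
class Broker:
    """Stand-in for the message broker; can be switched off to simulate an outage."""

    def __init__(self):
        self.available = True
        self.messages = []

    def publish(self, message):
        if not self.available:
            raise ConnectionError("broker unavailable")
        self.messages.append(message)


class Database:
    """Key-value store that records each write in an ordered change log."""

    def __init__(self):
        self.data = {}
        self.change_log = []   # unpublished changes, oldest first

    def set(self, key, value):
        # The write and its change record happen together, so the application
        # never performs a second, separate write to the broker.
        self.data[key] = value
        self.change_log.append({"key": key, "value": value})


def relay(db, broker):
    """Publish pending changes in order; stop at the first failure."""
    sent = 0
    while db.change_log:
        try:
            broker.publish(db.change_log[0])
        except ConnectionError:
            break              # keep the entry and retry on the next run
        db.change_log.pop(0)
        sent += 1
    return sent


db, broker = Database(), Broker()
broker.available = False
db.set("txn:1", "verified")
db.set("txn:2", "verified")
first = relay(db, broker)      # nothing is lost while the broker is down

broker.available = True
second = relay(db, broker)     # changes arrive later, in their original order
```

A change is removed from the log only after it has been published, which gives at-least-once delivery: a crash between publishing and removal would resend the event, so consumers should be prepared to see duplicates.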
Design Pattern: Telemetry -> Observability
Now that we’ve mitigated the risks of inconsistency between the primary database and secondary data platform(s), our next challenge is to measure the health of microservices across the architecture and their supported business transactions — known as observability.
Observability is a must-have within a distributed system filled with hundreds of isolated and eventually consistent components.
Observability is built on three pillars — metrics, logging, and traceability. We’ll first focus on metrics, which are typically stored within a time-series data model that can handle heavy ingestion of time-ordered events and point-in-time queries. Optimally, metrics would be tracked in real-time so that SLA/SLO anomalies can be detected and potentially mitigated as they occur.
To observe the health of a distributed system, we’ll first need its data. The recommended pattern is telemetry, which is the automatic collection and transmission of data from a remote source for monitoring. Redis simplifies the implementation of this pattern by building on its write-behind capability to seamlessly ingest data into another Redis data model — RedisTimeSeries. Notice with Redis that we only need a single platform to implement this pattern.
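To show what a time-series data model buys us, here is a minimal plain-Python sketch of ingesting samples and averaging them per time bucket, roughly what RedisTimeSeries offers through TS.ADD and TS.RANGE with an aggregation. The `TimeSeries` class, the latency figures, and the SLO threshold are assumptions made up for the example.

```python
from collections import defaultdict


class TimeSeries:
    def __init__(self):
        self.samples = []  # (timestamp_ms, value) pairs, appended in time order

    def add(self, timestamp_ms, value):
        self.samples.append((timestamp_ms, value))

    def range_avg(self, start_ms, end_ms, bucket_ms):
        """Average value per fixed-size bucket for samples in [start_ms, end_ms]."""
        buckets = defaultdict(list)
        for ts, value in self.samples:
            if start_ms <= ts <= end_ms:
                buckets[ts - ts % bucket_ms].append(value)
        return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Hypothetical request latencies in milliseconds, keyed by arrival time.
latency = TimeSeries()
for ts, ms in [(0, 10), (400, 20), (1000, 30), (1500, 50), (2100, 100)]:
    latency.add(ts, ms)

per_second = latency.range_avg(0, 1999, bucket_ms=1000)

# Flag any one-second window whose average latency exceeds a made-up 35 ms SLO.
breaches = [start for start, avg in per_second if avg > 35]
```

Down-sampling into buckets is what makes metrics cheap to query at scale: the raw samples can be summarized without losing the trend that an SLA/SLO check needs.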
Now that metrics are available within RedisTimeSeries, we can query them in real-time across multiple dimensions — Business KPIs, Application SLA/SLO, Infrastructure Utilization, etc. As an example, here’s how infrastructure-level metrics could be visualized using RedisInsight.
Design Pattern: Event Sourcing -> Auditing and Replay
Now that we’ve implemented telemetry for metrics data, our next challenge is to enable the remaining pillars of observability: logging and traceability. Unlike metrics, logs would not benefit from a time-series data model, since they cannot be aggregated or down-sampled. Instead, they require an immutable and time-ordered data structure that can be used for auditing, recovery, or replaying a chain-of-events in the order they occurred.
Since microservices require isolation, they cannot depend on a shared RDBMS to maintain a transaction log that captures all events within a monolith. Therefore the recommended pattern is event sourcing, which records every changed-data event within an immutable and time-ordered log — at the microservice-database level. This pattern is common across most event-driven architectures.
Event sourcing is typically implemented using the combination of a message broker and an event-store. Recall that we have already implemented patterns that used RedisGears to relay changed-data events and stored them within Redis Streams — an immutable time-ordered log data structure. Therefore, Redis can be used as a database, message broker, and an event-store, all in isolation as multitenant components on the same cluster.
Now that we have captured changed-data events within Redis Streams, we can natively visualize them using different filters for observability, such as microservice ID, job ID, and transaction-correlation ID.
Redis Streams can also add value to event-sourcing, beyond the scope of a single microservice, by allowing external processes to subscribe to its event stream as isolated consumer groups. This allows for observability at the business process, domain, or even architectural level for systemwide analytics.
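A minimal sketch of event sourcing, in plain Python, is shown below. The in-memory `events` list stands in for an immutable, time-ordered stream, and the service names, event kinds, and field names are hypothetical. The point is that current state, historical state, and a per-transaction trace can all be derived from the same log.

```python
events = []  # append-only and time-ordered; entries are never updated or deleted


def record(service, correlation_id, kind, data):
    events.append({
        "seq": len(events) + 1,
        "service": service,
        "correlation_id": correlation_id,
        "kind": kind,
        "data": data,
    })


def replay(log, upto_seq=None):
    """Rebuild state by applying events in order, optionally stopping early."""
    state = {}
    for event in log:
        if upto_seq is not None and event["seq"] > upto_seq:
            break
        state.setdefault(event["correlation_id"], {}).update(event["data"])
    return state


def trace(log, correlation_id):
    """Follow one business transaction across services."""
    return [(e["service"], e["kind"]) for e in log if e["correlation_id"] == correlation_id]


record("identity", "t-1", "IdentityVerified", {"identity_ok": True})
record("risk", "t-1", "RiskScored", {"risk": 0.1})
record("identity", "t-2", "IdentityVerified", {"identity_ok": True})
record("payment", "t-1", "PaymentApproved", {"status": "approved"})

current = replay(events)               # state as of now
as_of_2 = replay(events, upto_seq=2)   # state as it was after the second event
path = trace(events, "t-1")            # audit trail for a single transaction
```

Filtering by correlation ID is the same idea as the observability filters mentioned above, just expressed as a list comprehension instead of a stream query.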
Design Pattern: Command Query Responsibility Segregation (CQRS) -> Performance
Notice that when we defined our fraud-related bounded contexts, we left out the final stage of the payment-processing workflow. This was because its empowered autonomous team chose a non-Redis database to support its microservice.
So, let’s now assume that the “Payment Approval” microservice is supported by a disk-based database which is not optimized for query performance. Since it presumably has strong durability guarantees, it’s a logical choice for record-keeping. However, what if its bounded context also includes a microservice that needs to query the same data? How can we optimize query performance without Redis as the system-of-record?
The recommended pattern is CQRS, which segregates the responsibility for a dataset’s writes (Command) from its reads (Query). Implementing CQRS with separate databases lets each side optimize its data structure, or data model, for its own data access pattern and SLAs. Since our goal is to read-optimize our queries, data will be replicated into Redis from a disk-based database (e.g., MongoDB, Cassandra, or an RDBMS). Easy, right?
Here’s the catch: to implement this pattern we will need to solve for near-real-time continuous data replication, maintain eventual consistency between heterogeneous databases, and transform the data to avoid an impedance mismatch between Command and Query data structures. This should sound familiar, since we did this using RedisGears as the message relay when Redis was the source database. However, since most other databases don’t support write-behind, we’ll need an external implementation to replicate changed-data events.
Within this context, we can simplify the implementation of CQRS by using a Change Data Capture (CDC) framework that can integrate with both Command and Query databases. CDC frameworks typically use transaction-log tailing or polling-publisher patterns to scan for changed-data events on the Command database and replicate them as a transformed payload to the Query database.
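The polling-publisher variant can be sketched as follows. This is plain Python with in-memory stand-ins for both databases; the table layout, the version column used as a checkpoint, and the key format on the query side are assumptions for illustration, not the behavior of any particular CDC framework.

```python
command_db = []   # write side: rows carrying a monotonically increasing version
query_db = {}     # read side: documents keyed for fast lookup


def write_payment(payment_id, customer, cents):
    command_db.append({
        "version": len(command_db) + 1,
        "payment_id": payment_id,
        "customer": customer,
        "amount_cents": cents,
    })


def to_query_model(row):
    # Transform the Command shape into the shape the Query side wants to serve.
    return {"customer": row["customer"], "amount": row["amount_cents"] / 100}


def poll_changes(last_version):
    """Copy rows newer than last_version to the query side; return the new checkpoint."""
    for row in command_db:
        if row["version"] > last_version:
            query_db[f"payment:{row['payment_id']}"] = to_query_model(row)
            last_version = row["version"]
    return last_version


write_payment("p-1", "alice", 1250)
checkpoint = poll_changes(0)

write_payment("p-2", "bob", 300)
stale = "payment:p-2" in query_db      # the read side lags until the next poll
checkpoint = poll_changes(checkpoint)
```

The window in which `stale` is false is the eventual-consistency gap the text refers to; transaction-log tailing narrows that gap by reacting to each committed change instead of waiting for the next poll.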
Design Pattern: Shared Data -> Reusability
Now that we’ve addressed optimizing performance when Redis is not the system-of-record, our next challenge is to handle shared data between microservices that are separated by different bounded contexts. Here are a few solution patterns that can be simplified with Redis:
- Read Replicas – replicate change databases
- Shared Database – allow cross-dependencies between separate bounded contexts
- Domain-Driven Design – include microservices sharing data
But wait a minute — while these options address shared data between a few bounded contexts, how do we address this concern on a global scale?
The recommended pattern for global data is an isolated database dedicated to the API gateway. However, since this database could potentially be accessed by every transaction that flows through the architecture, we must consider business continuity, scalability, and performance as critical success criteria for its selection. Luckily, this is where Redis Enterprise shines across thousands of production deployments.
Redis Enterprise is the de facto standard for mission-critical session data, authentication tokens and ephemeral data storage due to its sub-millisecond performance at scale and 99.999% SLAs with its Active-Active cross-cluster replication.
Microservice architecture can be a game-changer to beat competition to market and reduce barriers for an organization’s cloud migration. With digital transformations well on their way, the motivation to re-platform as cloud native microservices will only grow.
But like everything in life, microservices come with tradeoffs. Luckily, best practice design patterns are well documented and platforms like Redis can help us simplify their implementation. While this article only scratches the surface of awaiting challenges, my hope is that it empowers readers to identify patterns where they once saw chaos and reduces implementation complexity by using Redis beyond caching.
Redis updated this post on 9/14/2022.