
Cloud Native Applications: Stateless or Stateful Services?

5 Aug 2020 12:00pm, by

Lightbend sponsored this post.

Hugh McKee
Hugh is a developer advocate at Lightbend. He has had a long career building applications that evolved slowly, used their infrastructure inefficiently, and were brittle and prone to failure. His focus now is on helping other developers and architects build resilient and scalable distributed systems. Hugh frequently speaks at conferences, and is the author of Designing Reactive Systems: The Role of Actors in Distributed Architecture (O'Reilly).

For a long time, stateless services have been the primary choice for developers. One reason for using a stateless protocol is that it provides resilience to failures, straightforward recovery when failures occur, and the option to scale processing capacity up and down to handle variations in traffic.

On the surface, building a stateless web app or microservice may seem like the right way to go, but it is not always the best approach for cloud native services. In this post, we’ll discuss the differences between stateless and stateful application design, and provide a path for further exploration.

There are two key advantages to building stateless systems. First, programming complexity is reduced: incoming requests are received, processed, and forgotten. Second, there is no session state to maintain. Maintaining session state is often where the complexity lies, because it typically involves replicating that state across the cluster so that it survives when one of the servers goes offline.

The Wikipedia definition of a stateless protocol ends with this sentence: “This property of stateless protocols makes them ideal in high volume applications, increasing performance by removing server load caused by retention of session information.” While it is true that the stateless approach avoids the overhead of maintaining session state, it introduces processing patterns that carry their own overhead and performance costs.

The term stateless is somewhat misleading. Applications by their very nature deal with the state of things: they create, read, update, and delete stateful items. The typical processing flow of a stateless process is to receive a request, retrieve the state from a persistence store such as a relational database, make the requested state changes, store the changed state back into the persistence store, and then forget that anything happened.
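To make that flow concrete, here is a minimal Python sketch of a stateless request handler. The `carts` table, the `open_store` and `handle_add_item` names, and the use of an in-memory SQLite database are all assumptions for illustration. Every request pays for one read and one write against the store, and nothing is kept once the function returns:

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create a throwaway in-memory store with one row per cart (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE carts (cart_id TEXT PRIMARY KEY, items TEXT NOT NULL)")
    return conn


def handle_add_item(conn: sqlite3.Connection, cart_id: str, item_id: str, qty: int) -> dict:
    """Stateless request handler: no cart state survives between calls."""
    # 1. Retrieve the current state from the persistence store.
    row = conn.execute("SELECT items FROM carts WHERE cart_id = ?", (cart_id,)).fetchone()
    items = json.loads(row[0]) if row else {}
    # 2. Make the requested change in memory.
    items[item_id] = items.get(item_id, 0) + qty
    # 3. Store the changed state back, overwriting the previous row.
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO carts (cart_id, items) VALUES (?, ?)",
            (cart_id, json.dumps(items)),
        )
    # 4. Forget: the local variables are discarded when the function returns.
    return items


conn = open_store()
handle_add_item(conn, "cart-1", "1567", 1)
print(handle_add_item(conn, "cart-1", "1567", 2))  # {'1567': 3}
```

Because the handler keeps nothing between calls, any server can handle any request, which is what makes scaling out the application tier easy; the price is that every request goes back to the database.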

Not maintaining session state on the servers may reduce some overhead, but delegating state management outside of the application, making the persistence layer solely responsible for it, has costs of its own. These costs often appear when the persistence layer slows down under high contention. It is true that processing capacity at the application layer can be scaled by increasing or decreasing the number of stateless servers; however, the persistence layer does not have unlimited processing capacity. Once that capacity is exceeded, the application often cannot go any faster.
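A deliberately simplified model makes this ceiling visible. The sketch below assumes, hypothetically, that each stateless request costs a fixed number of database operations and that nothing is cached, so sustainable throughput is the smaller of the application tier's capacity and the persistence layer's capacity. The function name and all numbers are illustrative, not measurements:

```python
def max_throughput(servers: int, per_server_rps: float,
                   db_ops_per_sec: float, db_ops_per_request: int) -> float:
    """Sustainable requests/sec in a simplified model: the lower of the
    application tier's capacity and the persistence layer's capacity."""
    app_capacity = servers * per_server_rps             # grows with every server added
    db_capacity = db_ops_per_sec / db_ops_per_request   # fixed by the database
    return float(min(app_capacity, db_capacity))


# Hypothetical numbers: 500 req/s per server, a database that sustains
# 6,000 ops/s, and one read plus one write per stateless request.
for n in (2, 4, 6, 10, 20):
    print(n, max_throughput(n, 500, 6_000, 2))
```

With these made-up numbers, throughput stops growing at 3,000 requests per second once there are six servers; adding more servers beyond that only adds idle capacity.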

It is important to understand that the decision to use a stateless approach contributes to these persistence capacity limits, because it moves state management from the application layer to the persistence layer.

When to Use a Stateful Approach

For many of us building cloud native applications, intuition tells us that a stateful approach has some advantages on the surface. The most obvious is the potential to reduce the overhead of retrieving state on every request. However, intuition also tells us that maintaining state comes at a cost: potentially increased complexity.
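That most obvious benefit can be sketched in a few lines. In this hypothetical `StatefulCartService`, a cart is loaded from the store only on its first request and then kept in memory, so later requests skip the read; changes are still written through to the store. The class, its `load`/`save` callbacks, and the dict standing in for a database are assumptions for illustration:

```python
from typing import Callable, Dict

Cart = Dict[str, int]


class StatefulCartService:
    """Hypothetical stateful service: cart state stays in memory between requests."""

    def __init__(self, load: Callable[[str], Cart], save: Callable[[str, Cart], None]) -> None:
        self._load = load                    # reads a cart from the persistence store
        self._save = save                    # writes a cart to the persistence store
        self._carts: Dict[str, Cart] = {}    # in-memory state, keyed by cart id
        self.store_reads = 0                 # how many times the store had to be read

    def add_item(self, cart_id: str, item_id: str, qty: int) -> Cart:
        cart = self._carts.get(cart_id)
        if cart is None:
            # Only the first request for a cart pays for a read from the store.
            cart = dict(self._load(cart_id))
            self.store_reads += 1
            self._carts[cart_id] = cart
        cart[item_id] = cart.get(item_id, 0) + qty
        # Changes are still persisted, but later requests skip the read.
        self._save(cart_id, dict(cart))
        return dict(cart)


# A plain dict stands in for the database in this sketch.
db: Dict[str, Cart] = {}
service = StatefulCartService(load=lambda cid: db.get(cid, {}), save=db.__setitem__)
for _ in range(3):
    service.add_item("cart-1", "1567", 1)
print(service.store_reads, db["cart-1"])  # 1 {'1567': 3}
```

What the sketch leaves out is exactly where the perceived complexity lives: in a cluster, every request for a given cart has to reach the instance that holds it, and the in-memory state must be recoverable if that instance fails.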

Often, however, this perception of increased complexity arises because we are looking at the problem through the lens of our current way of doing things: our current approach to maintaining state across a cluster, and our current relational, CRUD-based way of handling persistence.

Let’s look at event-based state persistence. This stateful alternative takes an events-first approach to processing and persisting state changes. In the classic shopping cart scenario, each change to the state of a shopping cart is persisted as an event, producing a sequence of events.

Figure 1. Persisted events

The figure above illustrates a series of shopping cart state-change events, an example of an event log. Each event is persisted to the event log, stored in a database, as it occurs.
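An in-memory stand-in for such a log might look like the following Python sketch. The `CartEvent` fields, the event kind names, and the `EventLog` class are hypothetical; a real implementation would write each event to a database. The point is the shape of the API: events are frozen once created and can only be appended, never updated or deleted:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class CartEvent:
    cart_id: str
    timestamp: str                # "HH:MM" for brevity; a real log would use full timestamps
    kind: str                     # hypothetical kinds, e.g. "ItemAdded", "ItemRemoved"
    item_id: Optional[str] = None
    quantity: int = 0


class EventLog:
    """Append-only event log: there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._events: List[CartEvent] = []

    def append(self, event: CartEvent) -> None:
        # Each state change is recorded as a new event, as it occurs.
        self._events.append(event)

    def events_for(self, cart_id: str) -> Tuple[CartEvent, ...]:
        # Return an immutable tuple so callers cannot rewrite history.
        return tuple(e for e in self._events if e.cart_id == cart_id)


log = EventLog()
log.append(CartEvent("your-cart", "08:10", "ItemAdded", "1567", 1))
log.append(CartEvent("your-cart", "08:12", "ShippingAddressSet"))
print(len(log.events_for("your-cart")))  # 2
```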

Events are statements of fact: a log of things that happened at some point in the past, and a historical record. In the event log above, the events aggregate to show that your shopping cart contains one item, item 1567, along with shipping and billing addresses. My shopping cart contains two items along with shipping and billing addresses. Finally, the third shopping cart contains two items.

Note also that the event log records the changes made to each cart along the way. For example, you removed an item from your cart, and the other user changed one of their cart items. This is the kind of historical data that is typically lost with the traditional CRUD-based persistence approach.

The state of a shopping cart at any point in time can be determined by replaying its events up to that time. This makes it possible to recover the current state of any shopping cart, and also to view a cart's state at a time in the past. For example, your cart contained two items at 08:20.
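Replaying events up to a chosen time can be sketched as a simple fold over the log. The event tuple shape, the `replay` function, and the rule that an `ItemChanged` event sets a new quantity are assumptions; the timestamps and the second item `2048` are made up, since the figure's actual events are not reproduced here:

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp "HH:MM", kind, item_id, quantity).
Event = Tuple[str, str, str, int]


def replay(events: Iterable[Event], until: str) -> Dict[str, int]:
    """Rebuild a cart's items by applying, in time order, every event at or before `until`."""
    items: Dict[str, int] = {}
    for ts, kind, item_id, qty in sorted(events, key=lambda e: e[0]):
        if ts > until:  # zero-padded "HH:MM" strings compare chronologically
            break
        if kind == "ItemAdded":
            items[item_id] = items.get(item_id, 0) + qty
        elif kind == "ItemChanged":
            items[item_id] = qty  # assumed: a change sets the new quantity
        elif kind == "ItemRemoved":
            items.pop(item_id, None)
    return items


# Hypothetical timeline: item "2048" and all timestamps are made up for illustration.
your_cart = [
    ("08:10", "ItemAdded", "1567", 1),
    ("08:18", "ItemAdded", "2048", 1),
    ("08:31", "ItemRemoved", "2048", 0),
]
print(replay(your_cart, "08:20"))  # {'1567': 1, '2048': 1}  (two items at 08:20)
print(replay(your_cart, "23:59"))  # {'1567': 1}             (current state)
```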

One advantage of persisting data as events is that every interesting event that led to the current state of each shopping cart is recorded. This event data, such as items being removed or changed, can be extremely valuable for downstream data analytics.
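As a small illustration of that downstream value, the hypothetical `removed_items` function below counts how often each item was removed from a cart across the whole log. The event shape and the sample data are invented for the example:

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape: (cart_id, kind, item_id).
Event = Tuple[str, str, str]


def removed_items(events: Iterable[Event]) -> Counter:
    """Count how often each item was removed from any cart.

    A CRUD table holding only each cart's current contents has no record of these.
    """
    return Counter(item_id for _cart_id, kind, item_id in events if kind == "ItemRemoved")


events = [
    ("cart-a", "ItemAdded", "2048"),
    ("cart-a", "ItemRemoved", "2048"),
    ("cart-b", "ItemAdded", "2048"),
    ("cart-b", "ItemRemoved", "2048"),
    ("cart-b", "ItemAdded", "1567"),
    ("cart-b", "ItemRemoved", "1567"),
]
print(removed_items(events).most_common(1))  # [('2048', 2)]
```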

Another advantage of the event log is that its persistence data structure is a simple key-value pair. In this example, the key is the user ID, item ID, and time, and the value is the event data. The event log is also immutable: events are insert-only, with no updates and no deletes. This insert-only approach reduces load and contention in the persistence layer.
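The key-value, insert-only structure can be sketched with an in-memory SQLite table whose primary key is the (user ID, item ID, time) triple from the example and whose value is the event data as JSON. The table, column, and function names are hypothetical, and a production event store would look different. The only write path is an `INSERT`; a second write with the same key is rejected rather than overwriting anything, which also makes retrying a write safe:

```python
import json
import sqlite3


def open_event_log() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Key: (user_id, item_id, event_time). Value: the event data, stored as JSON.
    conn.execute(
        "CREATE TABLE event_log ("
        " user_id TEXT NOT NULL, item_id TEXT NOT NULL, event_time TEXT NOT NULL,"
        " event_data TEXT NOT NULL,"
        " PRIMARY KEY (user_id, item_id, event_time))"
    )
    return conn


def append_event(conn: sqlite3.Connection, user_id: str, item_id: str,
                 event_time: str, event_data: dict) -> bool:
    """Insert-only write; returns False if an event with this key already exists."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO event_log VALUES (?, ?, ?, ?)",
                (user_id, item_id, event_time, json.dumps(event_data)),
            )
        return True
    except sqlite3.IntegrityError:
        # Existing events are never overwritten; the duplicate is simply rejected.
        return False


def events_for_user(conn: sqlite3.Connection, user_id: str) -> list:
    """Read a user's events in time order. There are no UPDATE or DELETE paths."""
    rows = conn.execute(
        "SELECT item_id, event_time, event_data FROM event_log"
        " WHERE user_id = ? ORDER BY event_time",
        (user_id,),
    )
    return [(item, ts, json.loads(data)) for item, ts, data in rows]


log = open_event_log()
append_event(log, "you", "1567", "2020-08-05T08:10:00", {"type": "ItemAdded", "qty": 1})
print(append_event(log, "you", "1567", "2020-08-05T08:10:00", {"type": "ItemAdded", "qty": 1}))  # False
print(len(events_for_user(log, "you")))  # 1
```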


As usual with cloud native software systems, the best approach depends on the specific circumstances, and this certainly applies to the choice between stateless and stateful systems. In many cases, the stateless approach is an acceptable solution; however, there is a growing number of scenarios where a stateful approach is the better alternative. This is especially true given the ever-increasing demand for high-performance, near-real-time, and stream-based systems.

If you would like to explore stateful applications in greater detail, download a copy of “Build Stateful Cloud Native Applications” by Jonas Bonér, creator of Akka and CTO at Lightbend — and get started on the path to running stateful services in a simple and efficient way.

Feature image via Pixabay.
