Workflow automation? This is not for us! We do not have human task management and task lists. We fully automate our core business. And our load is so huge that workflow engines could not handle it anyway.
I regularly hear opinions like this, and they are so wrong! Workflow automation is so much more than human task management. That's why I want to elaborate on my view of use cases for workflow automation today.
TL;DR: I see five clusters of use cases for workflow automation technology, ranging from very technical use cases (like stateful retries in case a remote service is not available) to typical business processes (like order-to-cash). Modern workflow engines are lightweight and can also handle high volumes at low latency, making them applicable to every problem that requires a state machine.
Use Cases for Workflow Automation
Workflow automation unifies multiple use cases around long-running behavior. I drew a picture to visualize my personal, opinionated view:
I sorted the use cases into five clusters, which I will go over in the rest of the article. I rated them on two dimensions:
- Business or IT, meaning the main driver of the requirement at hand. This can be business departments that want to implement core business capabilities. But it might just as well be IT, because it wants to solve technical challenges. Most use cases meet somewhere in the middle anyway.
- The duration of a workflow instance, which can range from very short running to very long running. The latter might mean that workflow instances take hours, days or even weeks to complete. The term "long running" is misleading, by the way, as it basically means that the workflow instance is not running but waiting most of the time. That's why you need to handle persistent state the further right you move on the scale.
One important side remark: Just because a use case is more technical doesn't mean it is not relevant for the business. It is very relevant to decide whether you want to retry a failed service call (e.g. to get a customer rating) or just return a default value (e.g. green). It is equally relevant to decide which inconsistencies you can live with and which you have to resolve.
In case you wonder why you should use a workflow engine at all, there are two main value propositions:
- It is a persistent state machine that also solves subsequent requirements like versioning of process models, scheduling mechanisms, operational control and much more.
- For long-running workflows, the graphical visibility is vital for successful BizDevOps.
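To make "persistent state machine" a bit more tangible, here is a minimal sketch, not how any particular engine works. It assumes a hypothetical order workflow with made-up state and event names and uses an in-memory dict as a stand-in for a database. The point is that the instance is loaded, moved one transition forward and written back, so between events it is not running but simply waiting in storage.

```python
import json

# Hypothetical order workflow: (current state, event) -> next state.
TRANSITIONS = {
    ("started", "payment_received"): "awaiting_shipment",
    ("awaiting_shipment", "goods_shipped"): "completed",
}

# Stand-in for durable storage (a database in a real engine).
store: dict[str, str] = {}


def start_instance(instance_id: str) -> None:
    store[instance_id] = json.dumps({"state": "started", "history": []})


def handle_event(instance_id: str, event: str) -> str:
    # Load the instance from storage; nothing stays in memory between events.
    instance = json.loads(store[instance_id])
    key = (instance["state"], event)
    if key not in TRANSITIONS:
        raise ValueError(f"event {event!r} not allowed in state {instance['state']!r}")
    instance["state"] = TRANSITIONS[key]
    instance["history"].append(event)
    # Persist before returning: the instance is now waiting, not running.
    store[instance_id] = json.dumps(instance)
    return instance["state"]


start_instance("order-1")
print(handle_event("order-1", "payment_received"))  # awaiting_shipment
```

A real engine adds everything this sketch leaves out, such as versioning of models, timers and operational tooling, which is exactly the point of the first value proposition.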
So let’s get started and go through the five clusters one by one. I will also give links to source code examples for each use case in order to make it more concrete (at least for developers). These examples are directly runnable on the open source workflow engine from Camunda (hint: I am biased in tool selection as I co-founded Camunda and therefore naturally have examples for this one at hand).
1. Business process automation
Business processes implement important core capabilities of a company like delivering goods or services a customer ordered (“order-to-cash”). Business processes are often long running in nature. They might involve:
- Straight through processing/service orchestration
- Waiting for internal or external messages, timers (e.g. the promised delivery date) or other events
- Human task management
Typical examples include:
- Order fulfillment
- Application management
- Invoice management/billing
- Inbound/outbound management (often named input/output management)
- Approval of various things (orders, purchase orders, travel expenses, invoices, etc.)
- Stock trading
- Content preparation and delivery
Real-life examples of workflow models implementing end-to-end business processes (credit card application, 2-factor authentication, claim handling)
Code example here.
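The difference between straight-through steps and wait states (human tasks, timers) can be sketched with a tiny model interpreter. This is a hypothetical illustration, not Camunda's API: the step kinds, the order-to-cash step names and the `advance`/`complete_wait` functions are all assumptions. Service steps run immediately; the instance stops at a human task or timer and only continues when something external completes the wait.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Step:
    name: str
    kind: str  # "service", "user_task" or "timer"
    wait: timedelta = timedelta(0)  # only used by timer steps


@dataclass
class Instance:
    position: int = 0
    log: List[str] = field(default_factory=list)
    waiting_for: Optional[str] = None
    due: Optional[datetime] = None


# Hypothetical order-to-cash model; step names are illustrative only.
ORDER_TO_CASH = [
    Step("check_credit", "service"),
    Step("approve_order", "user_task"),
    Step("wait_for_delivery_date", "timer", timedelta(days=3)),
    Step("send_invoice", "service"),
]


def advance(model: List[Step], inst: Instance, now: datetime) -> None:
    """Execute service steps straight through and stop at the next wait state."""
    inst.waiting_for, inst.due = None, None
    while inst.position < len(model):
        step = model[inst.position]
        if step.kind == "service":
            inst.log.append(step.name)  # a real engine would invoke the service here
            inst.position += 1
        else:
            # Human task or timer: the instance would be persisted and wait here.
            inst.waiting_for = step.name
            if step.kind == "timer":
                inst.due = now + step.wait
            return


def complete_wait(model: List[Step], inst: Instance, now: datetime) -> None:
    """Called when the human task is completed or the timer fires."""
    inst.log.append(model[inst.position].name)
    inst.position += 1
    advance(model, inst, now)
```

Starting an instance with `advance(ORDER_TO_CASH, Instance(), now)` runs the credit check and then stops at `approve_order`, which is where a task list would show the work item to a human.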
2. Communication in distributed systems
Distributed systems are becoming the new normal in IT. They are complicated because of the eight fallacies of distributed computing. Most developers I know are not yet aware of the magnitude of the changes coming because remote communication is unreliable, faults have to be accepted and you trade your transactional guarantees for eventual consistency. We as developers really have to adjust our toolboxes in order to cope with these new challenges. I wrote about some examples in "Three common pitfalls in microservice integration — and how to avoid them."
Workflow engines are an important ingredient to solve a couple of challenges you will run into.
- Retry service invocations if the services are not available or not responding. Retrying might be done for several hours or even days. I call this stateful retry.
- Wait for messages (e.g. an asynchronous response or an event).
- Time out when waiting for messages.
- Correlate several messages (e.g. something happens only if 3 messages all arrive).
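Waiting for several messages with a timeout boils down to keeping a little state per instance. The sketch below is a hypothetical illustration: the class name, the message names and the assumption that some timer job periodically calls `status` are mine, not part of any engine's API.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable


class MessageCorrelation:
    """Waits until all expected messages have arrived or a deadline passes."""

    def __init__(self, expected: Iterable[str], started: datetime, timeout: timedelta):
        self.expected = set(expected)
        self.received: Dict[str, object] = {}
        self.deadline = started + timeout

    def status(self, now: datetime) -> str:
        # A timer job would call this periodically to detect the timeout.
        if self.expected.issubset(self.received):
            return "complete"
        return "timed_out" if now > self.deadline else "waiting"

    def on_message(self, name: str, payload: object, now: datetime) -> str:
        # Messages arriving after the deadline are no longer correlated.
        if now <= self.deadline and name in self.expected:
            self.received[name] = payload
        return self.status(now)
```

In a workflow engine this state (received messages, deadline) would be persisted with the instance, so the wait can span hours or days without anything running.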
When serving this use case you might get very small workflow models which is perfectly fine. These models feel like integration flows you probably also know from ESB-like tools.
These workflows are potentially long running: if everything goes well, you get synchronous results in milliseconds, but you might need seconds, minutes or much longer to resolve failure situations.
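The stateful retry mentioned above can be sketched as retry bookkeeping that lives with the workflow instance instead of in a sleeping thread. Assumptions: the backoff parameters (1 minute doubling up to 6 hours), the `ConnectionError` signalling unavailability and the fallback value "green" (echoing the customer rating example from earlier) are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class RetryState:
    # Stored with the workflow instance, so retries survive restarts.
    attempts: int = 0
    next_attempt_at: Optional[datetime] = None
    result: Optional[str] = None


def backoff(attempt: int, base: timedelta = timedelta(minutes=1),
            cap: timedelta = timedelta(hours=6)) -> timedelta:
    # Exponential backoff: 1 min, 2 min, 4 min, ... but never more than the cap.
    return min(base * 2 ** (attempt - 1), cap)


def try_call(state: RetryState, call: Callable[[], str], now: datetime,
             max_attempts: int = 10, fallback: str = "green") -> RetryState:
    state.attempts += 1
    try:
        state.result = call()
        state.next_attempt_at = None
    except ConnectionError:
        if state.attempts >= max_attempts:
            # Give up and use a default value (alternatively: raise an incident).
            state.result = fallback
            state.next_attempt_at = None
        else:
            # Nothing waits in memory; a scheduler picks this up when it is due.
            state.next_attempt_at = now + backoff(state.attempts)
    return state
```

Whether to fall back to a default or escalate to a human after the last attempt is exactly the kind of business decision the side remark at the beginning refers to.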
3. Distributed Transactions
As mentioned, you cannot rely on ACID transactions in distributed scenarios. ACID stands for atomicity, consistency, isolation and durability, and it is what you know from working with a typical relational database (begin transaction, do some stuff, commit or roll back). Attempts like two-phase commit (XA) bring ACID to distributed scenarios but are rarely used in real life because they do not scale. But you still have to solve the business requirement of all-or-nothing semantics for multiple activities.
This is typically addressed by remembering which activities were already executed and invoking so-called compensation activities whenever the business transaction fails. A compensation activity semantically undoes the original activity (e.g. refunding money you have taken from a credit card). It is important to note that this model accepts temporarily inconsistent states but makes sure everything becomes consistent in the end. This relaxed view on consistency is known as eventual consistency and is sufficient for most real-life use cases.
This is also known as the Saga pattern. I plan to write a dedicated, more in-depth article on this soon.
Typical examples are all activity chains that care about consistency in distributed systems. The classical example is booking a trip, where you book a hotel, a car and a flight one after the other and need to cancel bookings if something goes wrong. Actual real-life use cases are often much more trivial.
Code example here.
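The trip booking example can be sketched as a minimal saga: remember which activities completed and, on failure, call their compensations in reverse order. This is a plain-Python illustration under stated assumptions (the `ActivityFailed` exception and the no-op booking lambdas are hypothetical), not the linked Camunda example. In a workflow engine, the list of completed activities would be persisted, so compensation still happens after a crash.

```python
from typing import Callable, List, Tuple

# (name, action, compensation) for one activity in the saga.
Activity = Tuple[str, Callable[[], None], Callable[[], None]]


class ActivityFailed(Exception):
    pass


def run_saga(activities: List[Activity]) -> Tuple[bool, List[str]]:
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []  # persisted by a real engine
    for name, action, compensate in activities:
        try:
            action()
        except ActivityFailed:
            log.append(f"{name} failed")
            # Semantically undo what already happened, in reverse order.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name} compensated")
            return False, log
        log.append(f"{name} done")
        completed.append((name, compensate))
    return True, log


def book_flight() -> None:
    raise ActivityFailed("no seats left")  # simulate a failing booking


trip: List[Activity] = [
    ("hotel", lambda: None, lambda: None),
    ("car", lambda: None, lambda: None),
    ("flight", book_flight, lambda: None),
]
ok, log = run_saga(trip)
print(ok, log)
# False ['hotel done', 'car done', 'flight failed', 'car compensated', 'hotel compensated']
```

Note that the hotel and car were really booked for a while before being cancelled; that is the temporary inconsistency eventual consistency accepts.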
4. Orchestration
Modern architectures are all about decomposition, e.g. into microservices or serverless functions. When you have many small components, each doing one thing well, you are forced to connect the dots to implement real use cases. This is where orchestration plays a big role (see for example "Orchestrating Azure Functions using BPMN and Camunda — a case study"). Orchestration basically means invoking components (or services, activities, functions) in a certain sequence. Typical examples:
- One microservice invokes three others in a sequence
- Multiple serverless functions need to be executed in order
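At its core, orchestration means calling components in a defined order and remembering how far you got. The sketch below is a hypothetical illustration: the `instance` dict stands in for persisted workflow state and the three order functions are made up. Because progress is recorded after every step, a failed run can be resumed at the failing step without re-invoking the steps that already succeeded.

```python
from typing import Any, Callable, Dict, List


def orchestrate(functions: List[Callable[[Any], Any]], instance: Dict[str, Any]) -> Any:
    """Invoke functions in sequence, recording progress after every step."""
    while instance["next"] < len(functions):
        fn = functions[instance["next"]]
        instance["data"] = fn(instance["data"])  # output of one step feeds the next
        instance["next"] += 1  # a real engine would commit this to durable storage
    return instance["data"]


# Hypothetical serverless functions, each doing one thing.
def validate(order: dict) -> dict:
    return {**order, "valid": True}


def charge(order: dict) -> dict:
    return {**order, "charged": order["amount"]}


def ship(order: dict) -> dict:
    return {**order, "shipped": True}


instance = {"next": 0, "data": {"amount": 42}}
print(orchestrate([validate, charge, ship], instance))
# {'amount': 42, 'valid': True, 'charged': 42, 'shipped': True}
```

If a function raises, the exception propagates and `instance["next"]` still points at the failed step, so calling `orchestrate` again continues from there.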
5. Decision Automation
As a workflow guy, I see decision management as "the wingman" of workflow automation. Of course, it is a discipline of its own and deserves its own article, but today I will only look at it from the workflow automation perspective. From that perspective, it is a great tool to extract business decisions and separate them from routing decisions, for example:
- Automated evaluation of eligibility or approval rules
- Validation of data
- Fraud detection or risk rating
- Calculation of derived values (e.g. discount, shipping costs)
- Determination of assignees, e.g. who is best suited to work on a human task
Code example here.
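The separation of business decisions from routing decisions can be sketched with a small rule table evaluated top to bottom, where the first matching rule wins (similar in spirit to the FIRST hit policy of a DMN decision table). The rules, thresholds and field names below are hypothetical and only serve as an illustration; the workflow code only looks at the decision result.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    condition: Callable[[dict], bool]
    output: str


# Hypothetical risk-rating rules; the first matching rule wins.
RISK_RULES: List[Rule] = [
    Rule(lambda a: a["amount"] > 10_000 and a["customer_years"] < 1, "red"),
    Rule(lambda a: a["amount"] > 10_000, "yellow"),
    Rule(lambda a: True, "green"),
]


def decide(rules: List[Rule], facts: dict) -> str:
    for rule in rules:
        if rule.condition(facts):
            return rule.output
    raise LookupError("no rule matched")


def route(application: dict) -> str:
    # Routing decision in the workflow; the business rule lives in RISK_RULES.
    return "human_review" if decide(RISK_RULES, application) == "red" else "auto_approve"
```

The business can change thresholds in the rule table without touching the routing logic, which is the main benefit of keeping the two apart.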
Use Cases Get Mixed in Real-Life
In real-life scenarios, the use cases are often combined. For example, you might want to fully automate your application processing (classical business process automation with straight through processing). To do this, you might have to invoke (orchestrate) several web services, which means you are dealing with communication in distributed systems. Business rules could automatically assess the risk of fraud (decision management). If fraud is suspected, the case is routed to a clerk (human task management).
At this point, a very quick side note: There are many options for setting up your architecture around a workflow engine. Using one does not mean that you have to introduce a central component that forces a low-code approach on you.
You might be interested in Architecture options to run a workflow engine or Avoiding the “BPM monolith” when using bounded contexts for more details.
Performance and Scalability
Opening up workflow automation to all these use cases typically raises the question: Can the respective tools really handle the load we will get if we use them for each and every service invocation? This also includes use cases known as "low latency, high throughput." Hence the engines have to be really fast, even under very high loads.
Zeebe, for example, reaches these new horizons of scalability through a completely different internal architecture. A Zeebe broker is a distributed system of its own that handles replication and scales efficiently. It uses techniques like event sourcing, append-only logs, the single writer principle and Raft consensus. Don't worry if you don't understand these concepts right away; that's why there are middleware and cloud providers taking care of them. But it also gives you a taste of what I meant earlier about the changes coming to the way we develop software.
All of this allows for horizontal scalability never seen before in a workflow engine. My co-founder calls this "big workflow."
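To give a rough feel for two of these concepts, here is a toy sketch of event sourcing on top of an append-only log. It is emphatically not how Zeebe is implemented, and it leaves out replication and Raft entirely. The event types and field names are hypothetical. The log only ever grows, and the current state of all workflow instances is derived by replaying the events. In this sketch a single owner performs all appends (the single writer idea), assuming it is used from one thread.

```python
from typing import Dict, List


class EventLog:
    """Append-only log: entries are never updated or deleted."""

    def __init__(self) -> None:
        self._entries: List[dict] = []

    def append(self, event: dict) -> int:
        # Only this method writes; the returned position acts as the event's offset.
        self._entries.append(event)
        return len(self._entries) - 1

    def read(self, from_offset: int = 0) -> List[dict]:
        return list(self._entries[from_offset:])


def replay(events: List[dict]) -> Dict[str, str]:
    """Rebuild the current state of all workflow instances from the events."""
    state: Dict[str, str] = {}
    for e in events:
        if e["type"] == "instance_created":
            state[e["id"]] = "created"
        elif e["type"] == "task_completed":
            state[e["id"]] = f"completed:{e['task']}"
        elif e["type"] == "instance_completed":
            state[e["id"]] = "completed"
    return state
```

Because state is derived rather than updated in place, any earlier state can be reconstructed by replaying a prefix of the log.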
The next logical, inevitable stage is the “big workflow problem”: How can we handle activity chains that are complex, distributed, long-running and mission-critical? How can we handle them on a massive scale? This problem has three critical dimensions: Software development, technical operations and business visibility.
Currently, I am discussing use cases with customers that involve hundreds of thousands of instances per second. We are getting there!
Workflow automation unifies multiple use cases around long-running behavior. In this article, I have named and clustered them. I showed that all are valid use cases and that modern technology can be a great help for all of these problems and related requirements.