Meet D-ASYNC: A Framework for Writing Distributed Cloud-Native Applications
Although serverless platforms have been on the market for a while, they have only recently started to gain real momentum. Many people see big potential in the idea itself, and many also think that serverless is a threat to containerization. At the same time, many would agree that serverless platforms are not yet mature enough to be the successor. Suppose serverless is the future: would it be the final step of evolution for hosting microservices and running distributed workflows?
Perhaps it’s too early to answer that question. Nevertheless, my D·ASYNC technology tries to lay out a vision of the next step.
Abstraction and Expertise
Companies that started hosting physical servers for you took the first important step towards separating the concern of managing compute units from the concern of using them, thus creating a layer of abstraction for end users. That wasn’t enough, so new features were later added to make running software even easier, such as custom operating system images, dynamic allocation of virtual machines of various ‘shapes and sizes,’ and so on.
Today we have containers, their orchestration, and serverless, all of which greatly simplify the development of cloud applications. However, although many frameworks and services are available, it can still be hard for software developers to build an in-house distributed system and microservice ecosystem from different components, simply because the problem is inherently complex.
Engineers can handle this task, but it can also be further optimized, abstracted away, handled by experts, and provided as a service by other companies. That’s where D·ASYNC technology steps in, with a proposal for how to get to the next level of abstraction. The intent is to help software developers focus on business logic even more and to help companies move even faster at a lower cost.
D·ASYNC, where D stands for “distributed,” is a framework for writing cloud-native distributed applications in a general-purpose programming language. It uses just the language’s own syntax, the paradigms of object-oriented programming, and design patterns, and it requires a language in which functions can be compiled into finite state machines (the “async” and “await” keywords). The current implementation supports C# only, and the next candidate is Python.
About 20 years ago, parallel programming on a single machine wasn’t really a thing, simply because multicore CPUs were not generally available. Then we had to build custom thread pools and put the burden of scheduling and running work items on the application logic. Things changed drastically with the introduction of async-await syntax in C# 5.0, which made asynchronous programming “natural” (but not necessarily most efficient) by hiding the underlying details of a multicore or multi-CPU system. To achieve that, an “async” function is compiled into a finite state machine (FSM), where each state transition of the FSM can be scheduled on a thread pool as a work item.
A distributed workflow system does almost the same thing as a set of “async” functions: it runs small steps, or work items (ideally idempotent actions), with no guaranteed affinity to a particular process or compute unit (just as async-await does not guarantee execution on a particular thread in a process). A distributed workflow framework must also be able to save and restore the state (input, output, contextual data) of a work item to achieve scalability and resiliency.
Building a state machine by hand can be hard, tedious, and error-prone. It’s not easy to follow the code flow through a large number of small functions that represent state transitions. Most importantly, as with the early generations of thread pools, it puts the burden of being aware of a distributed environment on the application logic.
By contrast, it is very easy to build a state machine with the async-await keywords. If you look at how the C# compiler generates state machines, you will notice that the “await” keyword serves as a “delimiter” between state transitions.
async Task FiniteStateMachine1()
{
    // state transition 1
    await Task.Yield(); // placeholder awaitable; any 'await' delimits the two transitions
    // state transition 2
}
Putting everything together: if you can control the execution of such FSMs (suspend before await, resume after await) and capture and restore the state (including input arguments and local variables) at a suspension point, then you can build a framework that runs your “async” functions as a distributed workflow. This is the key to the concept of the D·ASYNC technology, although there is much more beyond that.
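To make the suspend, capture, and resume idea concrete, here is a minimal hand-written sketch. It is not how D·ASYNC is implemented (the source does not show that); it only imitates what a compiler-generated state machine does, with the state serialized to JSON between transitions. The names `WorkflowState`, `ResumableWorkflow`, `Start`, and `RunOneStep` are hypothetical.

```csharp
using System;
using System.Text.Json;

// Hypothetical snapshot of a suspended workflow: which transition runs next,
// plus the input argument and the local variable that must survive suspension.
public sealed class WorkflowState
{
    public int NextStep { get; set; }
    public int Input { get; set; }
    public int Result { get; set; }
}

public static class ResumableWorkflow
{
    // Captures the initial state (the input argument) as JSON.
    public static string Start(int input) =>
        JsonSerializer.Serialize(new WorkflowState { Input = input });

    // Restores the state, runs exactly one state transition, and captures
    // the state again. A real engine would persist the returned JSON.
    public static string RunOneStep(string savedState)
    {
        var state = JsonSerializer.Deserialize<WorkflowState>(savedState)
                    ?? throw new ArgumentException("Empty state.", nameof(savedState));
        switch (state.NextStep)
        {
            case 0: // the code before the 'await'
                state.Result = state.Input * 2;
                state.NextStep = 1;
                break;
            case 1: // the code after the 'await'
                state.Result += 1;
                state.NextStep = -1; // -1 marks completion
                break;
        }
        return JsonSerializer.Serialize(state);
    }
}
```

Calling `RunOneStep` twice on the output of `Start(20)` ends with `NextStep == -1` and `Result == 41`. Because only a JSON string travels between the calls, the two transitions could just as well run in different processes.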
Functions written with async-await naturally fall into the category of Event Driven Architecture, which has a lot of benefits over the standard request-response design practiced in modern microservices. Also, using the “await” syntax creates (probably in 99 percent of cases) a special case of the publisher-subscriber model, where you have exactly one publisher (the “async” function being called) and exactly one subscriber (the continuation of the calling “async” function).
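The one-publisher, one-subscriber reading of “await” can be illustrated in plain .NET. In this sketch a `TaskCompletionSource<int>` stands in for the called function (the publisher), and the code after the `await` is the single subscriber. The class and method names are hypothetical, and nothing here is specific to D·ASYNC.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitAsSubscription
{
    // Returns the order in which the steps were observed.
    public static async Task<List<string>> Run()
    {
        var log = new List<string>();
        // The pending result of the callee plays the role of the publisher.
        var publisher = new TaskCompletionSource<int>();

        async Task Caller()
        {
            log.Add("caller: subscribed");
            int value = await publisher.Task;         // subscribe and suspend
            log.Add($"caller: resumed with {value}"); // the single subscriber
        }

        Task caller = Caller();   // runs up to the first incomplete await
        log.Add("callee: publishing");
        publisher.SetResult(42);  // publish the completion event
        await caller;             // make sure the continuation has finished
        return log;
    }
}
```

The returned log contains three entries in order: the caller subscribing, the callee publishing, and the caller resuming with the published value.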
D·ASYNC treats interfaces as candidates for the contract definition of a microservice and classes as candidates for its implementation.
using System.Threading.Tasks;
// Declaration of the interface of
// another service that might be
// deployed in a different environment.
public interface IFooService
{
    Task Foo();
}
public class BarService
{
    private readonly IFooService _fooService;
    // Another service can be consumed by
    // injecting it as a dependency. All calls
    // to that service will be routed to that
    // particular deployment using its
    // communication mechanism. All replies
    // will be routed back to this service.
    // This is where Dependency Injection meets
    // Service Discovery and Service Mesh.
    public BarService(IFooService fooService)
    {
        _fooService = fooService;
    }
    public async Task Bar()
    {
        // A simple call to another service may
        // ensure transactionality between the two.
        // The state of 'Bar' is saved, and
        // 'Foo' is scheduled for execution.
        // That complexity is hidden to help you
        // focus on the business logic.
        await _fooService.Foo();
        // The state of 'Bar' execution is
        // restored upon 'Foo' completion.
    }
}
There is nothing spectacular or innovative in the code itself. However, the idea is to combine this approach with the distributed workflows expressed with “async” functions described above. Thus, methods on classes stop being merely entry points to fairly small routines following a request-response model and instead become part of a distributed workflow governed by an event-driven service inter-communication mechanism.
Defining services simply as interfaces, and consuming them simply by injecting them as dependencies, binds Inversion of Control to Service Discovery and makes a good layer of abstraction for the application code. Such services can evolve and be deployed independently, yet still form a distributed workflow.
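One way to picture how an injected interface could stand for a remote deployment is a dynamic proxy. The sketch below uses .NET’s `System.Reflection.DispatchProxy` to turn each interface call into a recorded message. This is an assumption about a possible mechanism, not a description of D·ASYNC internals; `Outbox`, `RoutingProxy`, and `ServiceResolver` are hypothetical names, and the proxy only handles methods that return a non-generic `Task`.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Threading.Tasks;

public interface IFooService
{
    Task Foo();
}

// Hypothetical stand-in for a message transport. It only records what
// would be sent to the remote deployment of a service.
public static class Outbox
{
    public static readonly List<string> Messages = new List<string>();
}

// Implements any service interface by turning each call into a message
// instead of executing code locally.
public class RoutingProxy : DispatchProxy
{
    protected override object? Invoke(MethodInfo? targetMethod, object?[]? args)
    {
        Outbox.Messages.Add($"{targetMethod?.DeclaringType?.Name}.{targetMethod?.Name}");
        // A real engine would complete this task when the reply arrives.
        // Assumption: the called method returns a non-generic Task.
        return Task.CompletedTask;
    }
}

public static class ServiceResolver
{
    // What a DI container could call when asked for a remote service.
    public static T Resolve<T>() where T : class =>
        DispatchProxy.Create<T, RoutingProxy>();
}
```

With this in place, `ServiceResolver.Resolve<IFooService>()` returns an object that can be passed to the `BarService` constructor, and awaiting `Foo()` on it adds the message `IFooService.Foo` to the outbox rather than running any local implementation.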
The examples are in C#, but that does not restrict such microservices to .NET or to D·ASYNC-based applications. You can connect existing services that use request-response or other event-driven mechanisms, on the simple condition that they share a communication contract, such as the recently announced CNCF CloudEvents specification.
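As an illustration of what such a communication contract might look like on the wire, the sketch below wraps a call in a JSON envelope carrying the four required CloudEvents context attributes (`specversion`, `id`, `source`, `type`). It assumes the attribute names of CloudEvents 1.0, which was published after this article was written, and the `CloudEventEnvelope` class and the example source and type values are hypothetical. How D·ASYNC itself maps calls to events is not shown in the source.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class CloudEventEnvelope
{
    // Wraps a payload in an envelope that uses the required
    // CloudEvents 1.0 context attributes plus two optional ones.
    public static string Create(string source, string type, object data)
    {
        var envelope = new Dictionary<string, object>
        {
            ["specversion"] = "1.0",
            ["id"] = Guid.NewGuid().ToString(),
            ["source"] = source,
            ["type"] = type,
            ["datacontenttype"] = "application/json",
            ["data"] = data,
        };
        return JsonSerializer.Serialize(envelope);
    }
}
```

For example, `CloudEventEnvelope.Create("/services/bar", "com.example.foo.invoke", new { orderId = 7 })` produces a JSON document that any consumer aware of the same contract could parse, regardless of the language it is written in.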
Code Is the Abstraction
The concepts described so far are not enough to express a distributed app in code, but the technology implements more of them (see my D·ASYNC syntax mapping post), and other concepts are yet to come. Even so, another crucial ingredient is needed: a continuous delivery pipeline that deploys your apps to the cloud. You just push the code to a version control system, it gets deployed automatically, and the application runs as a distributed workflow composed of microservices (as demonstrated in the D·ASYNC on Azure Functions post).
Therefore, the application code itself, with its special syntax, becomes the final level of abstraction on top of serverless platforms (and/or containers, or anything else). It expresses how an application should run without being fully aware of the hosting environment or of the exact techniques used to make the application distributed, much as async-await hides the complexity of multithreading.
Yet Another RPC? No, Thanks
At this point, you might think “Here we go again… this RPC is doomed along with CORBA and DCOM.” If you do, pause that thought for a minute. Distributed computing is hard, but if nobody tries to make it easier, it’s never going to be easy. Instead of predicting failure, we should learn from past mistakes. If microservices were merely a reincarnation of former Service Oriented Architecture implementation practices, they would have failed a long time ago. Similarly, while D·ASYNC technology might resemble a form of RPC, ideologically it is something else, and I’ll try to briefly explain why.
Developers of any framework try to make it as easy to use as possible by providing a high-level interface and hiding implementation details. Distributed workflow engines are no exception. Some come as UI tools that usually target business processes (the workflow definition is stored in XML or JSON, for example), and others as frameworks for software developers that expose an API (the application code itself defines the workflow by calling that API).
Whichever path you choose, there will always be a boundary between the workflow definition and the application’s business logic, because there is no mature general-purpose programming language that incorporates a workflow definition language. D·ASYNC technology closes that gap and makes distributed workflows easy to write and understand. To achieve that, it happens to reuse and re-purpose the existing syntax of programming languages (at least for now), which is why the technology might look like another RPC. But standard RPC with a request-response model does not work because of the fallacies of distributed computing. An “async” function at least implies that it might take some time to complete, which is better, but still might not be acceptable.
D·ASYNC does not await the completion of an “async” function in the same process. Instead, it saves the state of the workflow (for resiliency and scale) and releases compute resources, which is not possible with RPC. The persistence part makes it possible to completely separate failures at the framework level from failures at the application level. That is, if an exception occurs in the execution engine, it will be handled by the engine and will never “bubble up” to application code, so there are no more ugly error handlers mixed with business logic. If an exception is raised by application code, it will be handled by the nearest ‘catch’ statement in the application.
That also applies to service request timeouts: no timeout exceptions are visible to the application code. For time-sensitive operations, you should use a CancellationToken in .NET instead. With D·ASYNC you don’t distribute your objects, but create a workflow in the same programming language as its business logic.
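The CancellationToken approach mentioned above can be sketched in plain .NET as follows. The caller decides how long an operation may take and handles the cancellation itself, instead of reacting to a transport-level timeout exception. `TimeBoundCall`, `SlowOperation`, and `CallWithDeadline` are hypothetical names, and how a D·ASYNC engine would flow the token across services is not covered by the source.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeBoundCall
{
    // Hypothetical slow operation that cooperates with cancellation.
    public static async Task<string> SlowOperation(CancellationToken token)
    {
        await Task.Delay(TimeSpan.FromSeconds(30), token);
        return "done";
    }

    public static async Task<string> CallWithDeadline(TimeSpan deadline)
    {
        // The token source cancels itself once the deadline passes.
        using var cts = new CancellationTokenSource(deadline);
        try
        {
            return await SlowOperation(cts.Token);
        }
        catch (OperationCanceledException)
        {
            // An application-level decision: the caller chose the deadline,
            // so the caller handles its expiry in its own 'catch' block.
            return "cancelled";
        }
    }
}
```

Awaiting `CallWithDeadline(TimeSpan.FromMilliseconds(50))` yields `"cancelled"`, because the 30-second delay is cancelled long before it completes.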
You might also think, “If I don’t know whether an ‘async’ function is part of a workflow or not, I might still run into the troubles described in those fallacies.” That statement is absolutely correct. Once again, a general-purpose programming language does not have a syntax to describe a workflow, so you’ll have to abide by certain rules to make an ‘async’ function part of a workflow in the first place. That is definitely not intuitive and increases the risk of the associated problems, but there are some developments in progress to address it by drawing a clear visual boundary between a workflow definition and its business logic and raising awareness of that boundary.
If you are familiar with LINQ, you most likely know that not all query expressions are supported by every query provider, which is sometimes hard to anticipate unless you test it. But that does not stop engineers from using it, because it saves development time and is simply convenient. D·ASYNC might suffer from similar symptoms, and that’s fine. The technology does not have to be perfect or the most efficient; it can satisfy the majority of cases and be chosen for its simplicity, convenience, and cost reduction.
State of the Project
Before D·ASYNC can reveal its full potential, some changes need to happen to serverless platforms, along with the evolution of programming languages themselves and the relevant tools. The technology has certain benefits, but it might not offer the specific features or meet the requirements needed to tackle every scenario in the best way possible. However, if it can reduce complexity and save a lot of effort in 80 percent of cases, that can be good enough to balance out the remaining 20 percent of sophisticated work.