Avoiding Serverless Anti-Patterns with Observability
Thundra sponsored this post.
Serverless offers opportunities that are transforming how we think about building in the cloud. The days of worrying about the complex and brittle cloud infrastructure on which your entire business logic sits are coming to an end. These responsibilities are increasingly delegated to a cloud vendor, allowing you to focus primarily on your business logic.
However, as we push more responsibilities onto the cloud vendor, we give up not only control but also observability. This leads to a black box situation in which we lose sight of how and why our serverless architecture behaves as it does. That, in turn, makes anti-patterns harder to detect, which compounds the problem.
New adopters of serverless are more susceptible to anti-patterns. Not being aware of these anti-patterns, or not understanding their effects, can be frustrating and acts as a barrier to serverless adoption.
Observability mitigates this black box effect, and understanding the possible anti-patterns allows us to monitor the right metrics and take the right actions. Therefore, this article goes through some of the major anti-patterns unique to serverless and describes how the right strategy in observability can cushion the impact of anti-patterns creeping into your serverless architectures.
Using Async Like Sync
Serverless applications tend to work best when they are asynchronous. Eric Johnson made this case in his talk at ServerlessDays Istanbul, titled “Thinking Async with Serverless.” He later went on to present a longer version of the talk at ServerlessDays Nashville.
As teams and companies begin to adopt serverless, one of the biggest mistakes they can make is designing their architecture with a monolith mentality. The result is a lift and shift of their previous architecture, which introduces large controller functions and misplaced awaits.
As a result, a function that sits idle while it waits for a response is still billed, since it is technically still active. This goes against the pay-as-you-go principle of serverless.
This problem is further exacerbated when functions are chained together: one function calls another and waits for its response, while the second function performs a read/write operation against a storage service. This reduces reliability, because the first function might time out before the second one finishes. The situation is even worse when functions call storage services outside the vendor’s ecosystem, or on-prem storage services.
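To make the cost and timeout argument concrete, here is a minimal sketch of the arithmetic. The function names and the millisecond figures are hypothetical and chosen only for illustration; real billing also depends on memory size and the vendor's pricing, which this sketch ignores.

```python
# A rough model of billed time for two chained functions.
# All names and numbers here are illustrative assumptions, not vendor pricing.

def billed_ms_sync(caller_work_ms: int, callee_ms: int) -> int:
    """Caller blocks until the callee responds, so the wait is billed on both sides."""
    caller_billed = caller_work_ms + callee_ms  # own work plus idle waiting
    return caller_billed + callee_ms


def billed_ms_async(caller_work_ms: int, callee_ms: int) -> int:
    """Caller hands off an event and returns; each function bills only its own work."""
    return caller_work_ms + callee_ms


def caller_times_out(caller_work_ms: int, callee_ms: int, timeout_ms: int) -> bool:
    """In the synchronous case the caller's duration includes the callee's duration."""
    return caller_work_ms + callee_ms > timeout_ms


if __name__ == "__main__":
    print(billed_ms_sync(100, 900))
    print(billed_ms_async(100, 900))
    print(caller_times_out(100, 900, timeout_ms=900))
```

Under these assumptions the synchronous chain bills the callee's duration twice, and a slow storage operation in the callee can push the caller past its own timeout.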
What Should You Observe?
The visible effects of this anti-pattern are potentially higher costs and a higher probability of timeouts. So the first step is to keep an eye on the cost, duration and timeouts of your functions.
Depending on your monitoring tool, the process can be made more efficient by setting up alerts on these metrics. For example, Thundra allows you to set up alerts on all of these metrics. It even gives you the flexibility to define the rate of the metrics within the desired time intervals.
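As a tool-agnostic illustration of a rate-based alert over a time interval, the sketch below computes the share of timed-out invocations inside a window and compares it with a threshold. The `Invocation` record, the function names and the 5% default threshold are assumptions made for this example; they do not reflect Thundra's or any other vendor's API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Invocation:
    """Hypothetical invocation record; field names are assumptions."""
    timestamp_s: float
    duration_ms: float
    timed_out: bool


def timeout_rate(invocations: Iterable[Invocation],
                 window_start_s: float, window_end_s: float) -> float:
    """Fraction of invocations in [window_start_s, window_end_s) that timed out."""
    in_window = [i for i in invocations
                 if window_start_s <= i.timestamp_s < window_end_s]
    if not in_window:
        return 0.0
    return sum(i.timed_out for i in in_window) / len(in_window)


def should_alert(invocations: Iterable[Invocation],
                 window_start_s: float, window_end_s: float,
                 max_timeout_rate: float = 0.05) -> bool:
    """True when the timeout rate in the window exceeds the chosen threshold."""
    return timeout_rate(invocations, window_start_s, window_end_s) > max_timeout_rate
```

The same shape works for duration or cost: swap the numerator for the metric of interest and pick a threshold that matches your own baseline.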
For an in-depth analysis, investigating the distributed traces of your application may lead to more fruitful and well-founded insights. As the transition to microservices occurs, the system itself becomes distributed, so monitoring has to take a more holistic approach in which each transaction flow is measured as a trace. This makes it possible to monitor how the services in a business flow interact with each other, whether synchronously or asynchronously. An error or delay in one service might be caused by any of its upstream or downstream services, or both.
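One way to read a trace is to ask where the time was actually spent rather than which span was longest overall. The sketch below assumes a simplified span model (the `Span` fields are hypothetical, not any tracing library's schema) and assumes that child spans run sequentially without overlapping, so a span's self time is its duration minus the durations of its direct children.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical, simplified span; real tracing formats carry more fields."""
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: int
    end_ms: int


def self_time_by_span(spans: List[Span]) -> Dict[str, int]:
    """Duration of each span minus the time spent in its direct children."""
    child_total: Dict[str, int] = defaultdict(int)
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] += span.end_ms - span.start_ms
    return {s.span_id: (s.end_ms - s.start_ms) - child_total[s.span_id]
            for s in spans}


def slowest_service(spans: List[Span]) -> str:
    """Service owning the span with the largest self time."""
    self_times = self_time_by_span(spans)
    by_id = {s.span_id: s for s in spans}
    worst_id = max(self_times, key=self_times.get)
    return by_id[worst_id].service
```

In a trace where an API call wraps a function that in turn wraps a slow database write, the outer spans have long durations but small self times, and the sketch points at the database span instead.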
The Need for Sharing
There are scenarios where libraries, business logic, or even just basic code have to be shared between functions. This, however, leads to a form of dependency and coupling that works against the serverless architecture.
The most prominent pitfall is that it hampers scalability. As your systems scale and functions constantly rely on one another, the risk of errors, downtime, and latency increases. Coupling therefore works directly against the scalability that serverless promises.
An example where such issues spring up is machine learning, where large libraries have to be shared across various functions used to process test, validation and training datasets. AWS provides Lambda layers in an attempt to offer some resolution, but this may not always be the ideal solution.
In most cases, the need to share code libraries and logic is not only an anti-pattern but also runs into a technical limit on serverless functions. For example, AWS Lambda functions have a hard limit of 512MB on /tmp storage. Developers building AWS Lambda function code must always be aware of this limit.
AWS recently addressed this problem with the release of a much-coveted Amazon EFS and AWS Lambda integration. This new integration allows functions to access shared libraries or data via an attached Amazon EFS file system. Nevertheless, this does not justify making functions dependent on one another. Just because something is now achievable does not mean it is the most effective solution, considering the risks of the anti-pattern described above.
What Should You Observe?
If sharing information and coupling serverless functions is intentional, and no preventive measure would resolve the issue, it becomes imperative to measure the effects of such an architectural set-up. Cold starts in particular are a metric to measure, as the operation of one function may depend on another due to coupling. If one of the functions experiences latency due to a cold start, it may have a ripple effect on all other coupled functions.
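The ripple effect can be sketched with simple arithmetic. The example below assumes a chain of functions that call each other synchronously, so the end-to-end latency is the sum of each function's warm latency plus a cold start penalty for every function that happens to be cold. The function names and millisecond values are made up for illustration.

```python
from typing import Iterable, List, Tuple

# Each entry: (function name, warm latency in ms, extra cold start latency in ms).
# These are hypothetical figures, not measurements.
Chain = List[Tuple[str, int, int]]


def chain_latency_ms(chain: Chain, cold: Iterable[str] = ()) -> int:
    """End-to-end latency of a synchronous chain, given which functions are cold."""
    cold_names = set(cold)
    return sum(warm_ms + (cold_ms if name in cold_names else 0)
               for name, warm_ms, cold_ms in chain)


if __name__ == "__main__":
    chain = [("ingest", 50, 400), ("enrich", 80, 600), ("store", 30, 300)]
    print(chain_latency_ms(chain))                      # every function warm
    print(chain_latency_ms(chain, cold={"enrich"}))     # one cold function
    print(chain_latency_ms(chain, cold={"ingest", "enrich", "store"}))
```

Under these assumptions a single cold function in the middle of the chain delays every caller above it, which is why cold start counts are worth tracking per function in a coupled set-up.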
Overall, the entire architecture should be mapped. Both AWS and Thundra can provide an overview of your cloud architecture. Being aware of how your cloud architecture is built is the only way to avoid the issue effectively.
Building upon the notion of breaking large, compact business cases into smaller independent functions, it is possible to reach a level of granularity that eventually proves detrimental. Once the push to break business logic down into individual functions passes a certain point, the overhead negates the benefits. The need to communicate events between individual functions leads to webhooks and APIs, and therefore to more engineering effort, more security risk, and more latency. As the number of functions scales, these concerns are multiplied.
What Should You Observe?
Architectures in general can get extremely complicated as your system grows. Therefore, the first thing to obtain as you begin to adopt serverless is a map of your distributed system architecture.
Another sign of overly granular architectures is when serverless functions become overly chatty. The major overhead of a granular architecture is communication and that is what should be avoided. Communication overhead and unnecessary calls to AWS Lambda functions mean more engineering complexity and potentially higher costs. Therefore, it would be beneficial to check for costs and total invocation count.
It is also recommended to dive deeper into invocations and keep track of triggering components. If one Lambda function is triggered by the same Lambda function a substantially high number of times, merging the two Lambda functions may be worth considering.
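A minimal sketch of that check, assuming you can export invocation records as (triggering function, triggered function) pairs from your monitoring tool. The record shape, the function names and the threshold are all assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (triggering function, triggered function)


def chatty_pairs(invocations: Iterable[Pair], min_calls: int) -> List[Tuple[Pair, int]]:
    """Return trigger/target pairs seen at least min_calls times, most frequent first."""
    counts = Counter(invocations)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_calls]


if __name__ == "__main__":
    log = [("resize", "thumbnail")] * 5 + [("upload", "resize")] * 2
    for (trigger, target), calls in chatty_pairs(log, min_calls=5):
        print(f"{trigger} -> {target}: {calls} calls")
```

A pair that dominates this list is a candidate for merging, although the count alone does not show whether the two functions have different scaling or permission needs.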
Additionally, as mentioned, the move from monolith to microservices and eventually to a pure serverless distributed architecture creates the need for communication infrastructure. It therefore becomes necessary to monitor the size of the payloads being sent over these communication channels.
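As a rough sketch of payload monitoring, the example below measures the serialized size of JSON events and flags the ones above a limit. The limit is a parameter you would set from your own channel's documented quota; the value used in the usage example is an arbitrary assumption, as are the function names.

```python
import json
from typing import Dict, Iterable, List, Tuple


def payload_bytes(event: Dict) -> int:
    """Size in bytes of the event once serialized as UTF-8 JSON."""
    return len(json.dumps(event).encode("utf-8"))


def oversized(events: Iterable[Dict], limit_bytes: int) -> List[Tuple[int, int]]:
    """Return (index, size) for every event whose serialized size exceeds the limit."""
    return [(i, size)
            for i, size in enumerate(payload_bytes(e) for e in events)
            if size > limit_bytes]


if __name__ == "__main__":
    events = [{"id": 1}, {"id": 2, "blob": "x" * 1000}]
    # 256 bytes is an arbitrary illustrative limit, not a vendor quota.
    print(oversized(events, limit_bytes=256))
```

Tracking these sizes over time shows whether functions are passing whole objects around when a reference, such as a storage key, would do.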
In conclusion, serverless is booming, but it doesn’t come without its own pitfalls. There are various best practices for avoiding anti-patterns; however, the ultimate solution is to couple these best practices with a strong commitment to observability. Those adopting the technology need to be aware not only of the anti-patterns that are possible, but also of what to monitor if they do find their way into the system architecture.
Observability is still a maturing field, and cloud vendors are now heeding the call. However, built-in monitoring solutions still do not fulfill the overall needs and offer only basic observability. To achieve true observability across the three pillars of “metrics,” “traces” and “logs,” it would be advisable to look towards a third-party monitoring tool specialized for the job.
Amazon Web Services is a sponsor of The New Stack.
Feature image via Pixabay.