Serverless Doesn’t Mean DevOpsLess or NoOps
When selecting serverless and Lambda as the preferred architecture for your cloud operations, you need to understand the inherent limitations in order to scale once your application and product code start to grow in size and complexity.
Serverless is a good choice to get ramped up quickly when building your application code, but there is a common misconception that serverless means DevOpsLess or NoOps, and this simply is not the case.
What’s more, sometimes you have to really invest in design and architecture in advance to not hit a wall later or incur a lot of technical debt. In this post we’ll provide an overview of some of the limitations we encountered when building Jit, a software-as-a-service DevSecOps platform, based on serverless and event-based architecture.
A Quick Overview of the Serverless Gotchas
When your applications start to grow there are challenges unique to the serverless paradigm that are quickly encountered if you don’t plan for them in advance. We’d like to help those exploring serverless be aware of what they need to design their applications for, before they even get started (or possibly quickly rework, if they already have).
Lambda throttling occurs when you exceed the number of function instances that can run simultaneously. (This post explains how to overcome that.)
AWS Lambda caps concurrent executions at 1,000 by default (you can request an increase, but you need to be aware that the limit exists in the first place). However, do note that raising it has cost implications, so you shouldn't automatically increase your threshold before examining the design of your architecture and ensuring you truly need to.
This means that the more events or services you run simultaneously, the more rapidly you will hit a wall. This is something that you need to think about as early as the first line of code if you plan to run entirely on serverless architecture.
Remember, this limitation’s purpose is to protect you as well as AWS from mistakes or bad design. For instance, imagine a case where somehow a Lambda calls itself in a loop. Without this built-in protection mechanism, you could reach millions of invocations.
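Beyond AWS's built-in concurrency cap, a cheap application-level safeguard against the self-invoking-loop scenario is to carry a depth counter in the event payload and refuse to re-invoke past a limit. This is a minimal sketch of that idea; the `depth` field and `MAX_DEPTH` value are our own convention, not an AWS feature.

```python
# Guard against runaway self-invocation by tracking recursion depth in the event.
# The "depth" field and MAX_DEPTH limit are an application convention, not AWS's.
MAX_DEPTH = 5

def next_event(event):
    """Return a copy of the event for the next self-invocation,
    or None if the depth limit has been reached."""
    depth = event.get("depth", 0)
    if depth + 1 > MAX_DEPTH:
        return None  # stop the chain instead of invoking forever
    return {**event, "depth": depth + 1}
```

A Lambda that re-invokes itself would call `next_event` first and simply stop when it returns `None`, so a bug in the termination condition burns at most `MAX_DEPTH` invocations instead of millions.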
While your design might work today at a small scale, you have to think very early on if this will linearly scale when you have thousands of tenants or customers (or more). Depending on how you architect your resources, Lambdas and microservices (and how “micro” each service is), if you break down your services into too small chunks, you may end up breaking your entire service chain flow due to throttling from too many parallel events.
This means you need to be well aware of how much traffic you currently handle, in events per minute, including spikes and outlier traffic. All of this needs to be considered alongside the invocation method employed for each service — sync or async (we dig into this more deeply later) — since with synchronous invocation each service or system call adds up and can overload the system.
It's important to be aware that throttling isn't exactly predictable. Even if you have monitoring in place to ensure you don't reach 1,000 parallel events, and you think you're covered, your first spike may trigger throttling at an even lower threshold — surprising behavior, but documented in the AWS docs. A good practice, therefore, is to architect your systems to recover when that happens (for example, through idempotency and retries).
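A simple building block for that kind of recovery is a retry wrapper with exponential backoff. The sketch below, for illustration, retries on any exception; in real code you would catch only throttling errors (for example, botocore's `ClientError` with a throttling error code) rather than every failure.

```python
import time

def with_retries(fn, attempts=3, base_delay=0.1):
    """Call fn, retrying with exponential backoff between attempts.
    Re-raises the last exception once attempts are exhausted.
    Illustrative only: a real version would catch throttling errors
    specifically, not every Exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
```

Combined with idempotent handlers, a wrapper like this lets a flow absorb a throttled call instead of failing the whole chain.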
As the name implies, serverless functions do not run on servers that live forever; they run in an ephemeral runtime with, at best, a 15-minute window per invocation. This limits the size of the input your service can handle: if the size of the input keeps growing, and processing time grows with it, you will eventually hit timeouts during runtime.
Therefore, a recommended design pattern for services or functions with a runtime that linearly scales with the size of your input is to split the input into chunks or batches and handle them in different Lambdas. It is also a good practice to use queues, when doing so, to avoid throttling.
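A minimal sketch of that chunking pattern: split the input into fixed-size batches, then fan each batch out to its own Lambda (here via SQS, shown as hedged pseudocode — the queue URL and batch size are placeholders, not values from the original post).

```python
def chunks(items, size):
    """Split a list into fixed-size batches so each batch can be
    processed by a separate Lambda, well under the 15-minute limit."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical fan-out through SQS (commented out; requires boto3 and AWS creds):
# import json, boto3
# sqs = boto3.client("sqs")
# for batch in chunks(records, 100):
#     sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(batch))
```

Sending batches through a queue, rather than invoking Lambdas directly in a loop, also smooths out spikes and helps avoid the throttling issues described above.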
The event-driven design pattern, popular in serverless-based systems, often requires a diversity of services in the chain for event handling — an API gateway, SQS (Amazon Simple Queue Service), EventBridge, SNS (Amazon's pub/sub service) — and each of these has different event size limits you need to be aware of, so passing data along the chain may break when you send a large payload.
This means you can't send limitless payloads between resources. Each resource in the chain is capable of processing different-sized payloads, so you will have failed events if you don't take this into account in advance and ensure your payloads can be received across all of your system's services and resources.
One solution, essentially a workaround, can be to pass large payloads through an S3 bucket by leveraging a different resource that does support the payload size. [Pro tip: Search for “AWS service quotas” to learn more about resources you use. This is a good reference to get started.]
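This workaround is often called the claim-check pattern: store the large payload in S3 and pass only a reference through the chain. A minimal sketch, with assumptions: the size threshold is an example value, and the `upload` callable is injected so a real S3 client (for example, a wrapper around boto3's `put_object`) or a test stub can be supplied.

```python
import hashlib
import json

MAX_INLINE_BYTES = 200_000  # example threshold, below SQS's 256 KB message limit

def package_payload(payload, upload):
    """Return a message carrying either the inline payload or an S3 reference.
    `upload(key, body)` is injected by the caller — in production it would
    write to an S3 bucket; the key is a content hash, so retried uploads
    of the same payload are naturally deduplicated."""
    body = json.dumps(payload)
    if len(body.encode()) <= MAX_INLINE_BYTES:
        return {"inline": payload}
    key = hashlib.sha256(body.encode()).hexdigest()
    upload(key, body)
    return {"s3_key": key}
```

The consumer at the other end of the chain checks for `s3_key` and fetches the body from the bucket when present.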
Due to all the challenges outlined above, and because failure will always happen, Lambdas and serverless resources are often built on retry mechanisms. This is where idempotence is critical: services need to deliver the same result for a given input, no matter how many times they are retried or partially retried (meaning that even if only part of the flow is retried, the result still needs to be the same).
You need to design for idempotence in advance so that retries and replays do not affect the state of your production systems. A good practice is to create unique, yet deterministic (not random), IDs for each instance of your data. This is a good guide for doing this right.
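One standard way to get unique-but-deterministic IDs is a name-based UUID (`uuid5`): the same logical event always hashes to the same ID, so a retried or replayed event can be recognized and deduplicated. The field names below are hypothetical examples, not from the original post.

```python
import uuid

def event_id(tenant_id, event_type, payload_key):
    """Deterministic event ID: the same (tenant, type, key) triple always
    yields the same UUID, so retries and replays can be deduplicated.
    Field names here are illustrative placeholders."""
    name = f"{tenant_id}:{event_type}:{payload_key}"
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, name))
```

A handler can then record processed IDs (for example, with a conditional write to a database) and skip any event whose ID it has already seen.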
To understand how memory leaks happen, you first need to understand how the mechanism that runs your code works, because it too has its limitations. When it comes to Lambda functions, the same Lambda runner is reused again and again until it dies. Perhaps it runs 1,000 times perfectly, but it can start to break down on the 1,001st run and can cause issues with your services.
For example, take Python code where the same interpreter is used again and again. If this code adds objects to global state on each run, those objects persist across invocations and can lead to memory leaks, where you exceed the instance's memory limit — and then your Lambda will crash.
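The leak looks innocuous in code. In the sketch below (handler names are illustrative), the module-level list survives between invocations on a warm container, so it grows with every call; the fixed version keeps state scoped to a single invocation.

```python
# Anti-pattern: module-level state persists across warm Lambda invocations.
_results_cache = []  # lives as long as the container, not one invocation

def leaky_handler(event, context=None):
    _results_cache.append(event)   # grows on every warm invocation
    return len(_results_cache)

def safe_handler(event, context=None):
    results = [event]              # scoped to this invocation only
    return len(results)
```

On a freshly started container both handlers behave identically, which is exactly why this class of bug tends to surface only in production, on the 1,001st run.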
This is particularly important when using shared resources and with multitenancy architecture. You need to ensure that you don’t leave behind unused resources, sensitive data or essentially other garbage. When it comes to tenant isolation, if you’re using shared memory, you need to be very careful that data does not leak between instances, because then data can leak between tenants. We shared in a post about tenant isolation on the data layer, but this is equally true for the runtime.
Sync vs. Async Invocation
Synchronous invocation in serverless can lead to many issues (some of which are noted above, like throttling). When possible, and immediate responses are not required, the asynchronous invocation pattern is by far preferred with serverless.
Serverless generally was designed to be more asynchronous and stateless than synchronous and stateful, so it is always best to play to the technology’s strength. When you do require synchronous invocation, make sure to have the right guardrails in place like using an API gateway and have visibility through proper logging.
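In boto3, the choice between the two patterns comes down to the `InvocationType` argument of the Lambda client's `invoke` call: `"Event"` is fire-and-forget (asynchronous, with built-in retries), while `"RequestResponse"` blocks the caller until the function returns. A small sketch of a helper that builds those arguments (the function name and payload are placeholders):

```python
import json

def invoke_kwargs(function_name, payload, wait_for_response):
    """Build arguments for boto3's Lambda client.invoke().
    "Event" = async fire-and-forget; "RequestResponse" = synchronous."""
    return {
        "FunctionName": function_name,
        "InvocationType": "RequestResponse" if wait_for_response else "Event",
        "Payload": json.dumps(payload),
    }

# Hypothetical usage with a real client (requires boto3 and AWS credentials):
# import boto3
# boto3.client("lambda").invoke(
#     **invoke_kwargs("my-func", {"id": 1}, wait_for_response=False))
```

Defaulting to the asynchronous form wherever an immediate response isn't needed plays to serverless's strengths, as noted above.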
Guardrails for Cost and Fault Tolerance
That was a mouthful, and probably daunting for those exploring serverless as their infrastructure of choice. That said, serverless is extremely powerful, scalable and flexible, and with the right guardrails in place you can avoid these issues almost entirely.
Another common concern when it comes to running on serverless is, of course, cost, and this should not be ignored. The way you design and architect your applications has direct cost implications. You need the proper mechanisms in place to avoid runaway resource consumption, starting with billing alerts and, more generally, cost-aware system design — an extremely important practice with serverless.
Other areas that deserve a full blog post unto themselves are monitoring and testing of serverless applications. It is critical to have the right monitoring, observability, logging and tracing in place so that when you write an application composed of 50 Lambdas, you can verify that the flow works properly and continues to work correctly during runtime in production.
This is particularly true when production means 10,000 tenants. An added advantage of being aware of how serverless architecture works under the hood: by following the guidelines suggested here, you will achieve significant cost improvements as a byproduct, alongside better system design.
Designing for Resilience and Robustness with Serverless Applications
Serverless is an excellent choice for those looking to run fast, focus on delivery and roll out products and features without too much infrastructure management and overhead. When choosing to run serverless, you need to bear in mind that this means using a number of different AWS resources, and it's critical to understand how each works independently as well as together, alongside their limitations and flaws.
In our next post, we will dive into additional recommended serverless design patterns including tenant isolation, connection pools for databases, and least privilege, alongside how to avoid antipatterns like tenant starvation, infinite loops, inefficient invocations, and overly large CPU and RAM instances.
Remember, like all cloud native applications in the unpredictable and dynamic cloud landscape, ensuring you have the proper visibility into the way your applications are working through monitoring and logging is especially important with serverless applications.
It is also essential to ensure security, as well as data privacy, to not compromise critical data when using shared resources and multitenancy. There are excellent DevSecOps tools that can help you do so. When you understand how your technologies work under the hood, you can optimize your design, architecture and application code for better performance, safety and fault tolerance.