Benchmarking Serverless: IBM Scientists Devise a Test Suite to Quantify Performance
Serverless technologies promise to simplify scalability. But while delegating the job of running functions to a cloud provider, and letting it decide how to manage execution, sounds like a good idea, the developer needs to know what to expect in terms of baseline performance.
A pair of IBM researchers are developing a test suite to better understand, and compare, the performance characteristics of serverless platforms.
“What we wanted to do was have a way to understand how these serverless platforms work,” said Michael “Dr. Max” Maximilien, an IBM engineer who presented the work with fellow IBM cloud engineer Nima Kaviani in a session at the Cloud Foundry Summit Silicon Valley 2017.
Named in the grand tradition of the SPEC series of benchmarks that cover the supercomputing community, SPECserverless will be a set of tests for defining baseline performance of serverless offerings.
To characterize serverless performance, they devised four types of serverless jobs:
- CPU-intensive, involving finding a large prime number.
- Memory-intensive, with lots of matrix multiplications.
- Jobs with a heavy reliance on a back-end data source, such as a database.
- Jobs with heavy network requirements.
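To make the first two job types concrete, here is a minimal Python sketch of what a CPU-bound and a memory-bound test function might look like. This is an illustration only, not the researchers' code: the function names (`nth_prime`, `matmul`, `timed`) and the choice of trial division and naive list-based multiplication are assumptions made for the example.

```python
import time


def is_prime(n):
    """Trial division: deliberately simple and CPU-bound."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def nth_prime(k):
    """CPU-intensive job (hypothetical): return the k-th prime, 1-indexed."""
    count, candidate = 0, 1
    while count < k:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


def matmul(a, b):
    """Memory-intensive job (hypothetical): naive product of two list-of-lists matrices."""
    cols_b = list(zip(*b))  # transpose b once so columns can be reused
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def timed(job, *args):
    """Run a job and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = job(*args)
    return result, time.perf_counter() - start
```

In a real run, each function would be deployed to the platform under test and invoked remotely. Timing it locally with `timed(nth_prime, 5000)` only gives a reference point for the work itself.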
They also looked at three dimensions:
- Invocation: The variance of random, spiked, or periodic workloads.
- Payload: The size of the input and output of workloads.
- Concurrency: The number of jobs a service could execute in parallel.
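The three invocation patterns can be pictured as different schedules of request start times. The sketch below is one possible way to generate such schedules. The function names, the use of exponentially distributed gaps for the random pattern, and the burst model for the spiked pattern are all assumptions for illustration, not details from the talk.

```python
import random


def periodic(n, interval):
    """n requests spaced exactly `interval` seconds apart."""
    return [i * interval for i in range(n)]


def random_arrivals(n, rate, seed=0):
    """n requests with exponentially distributed gaps (mean 1/rate seconds)."""
    rng = random.Random(seed)  # seeded so a test run can be repeated
    t = 0.0
    times = []
    for _ in range(n):
        t += rng.expovariate(rate)
        times.append(t)
    return times


def spiked(n, burst_size, gap):
    """Bursts of `burst_size` simultaneous requests, one burst every `gap` seconds."""
    return [(i // burst_size) * gap for i in range(n)]
```

A test driver would then sleep until each timestamp and fire a request, so the same function can be exercised under all three patterns.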
There are also a number of features that can be captured as part of the testing process. They include:
- Target clouds
- Public, dedicated, or local deployments
- Scheduling support
- Dashboard interface
- Service-level agreement
At the conference, the duo shared some preliminary results from running the test suite on four serverless packages: IBM OpenWhisk, Microsoft Azure Functions, Iron.io and Amazon Web Services’ Lambda. All of the tests were conducted around the third quarter of 2016.
They also set up their own Cloud Foundry-based service, nicknamed “cf serverless,” to compare serverless performance with that of an in-house PaaS-based service.
The researchers deliberately avoided making any performance comparisons among different serverless platforms, and the numbers might have changed since then anyway, Kaviani noted (“If you draw any comparative conclusions, go do it yourself,” he added). The purpose of this run was to better understand how platforms behave in general.
“It’s important to understand how your cloud platform scales if you launch a lot of parallel requests,” Kaviani said.
All about the Baseline
The IBM engineers shared some intriguing results from their work, such as those dealing with concurrency. With Azure Functions, for instance, the more jobs sent to the service at the same time, the slower the response time became for each job, until a certain threshold in the total number of jobs was reached. After this threshold, the time to complete the jobs actually went down.
The point at which response time got better was probably the point that the service launched another container to execute the jobs, Kaviani noted. Both Lambda and OpenWhisk were able to better accommodate increasing workloads without as much variance in response times, he said.
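A concurrency test of this kind can be approximated by firing a growing number of simultaneous requests and recording the mean latency at each level. The sketch below assumes a hypothetical `invoke` function standing in for a call to a deployed serverless function. Here it only sleeps, so the numbers it produces say nothing about any real platform.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def invoke(payload):
    """Hypothetical stand-in for a request to a serverless endpoint."""
    time.sleep(0.01)
    return payload


def measure(call, payload):
    """Return the wall-clock latency of one call, in seconds."""
    start = time.perf_counter()
    call(payload)
    return time.perf_counter() - start


def sweep(call, levels, payload=None):
    """For each concurrency level n, launch n calls in parallel and
    return a dict mapping n to the mean per-call latency."""
    results = {}
    for n in levels:
        with ThreadPoolExecutor(max_workers=n) as pool:
            latencies = list(pool.map(lambda _: measure(call, payload), range(n)))
        results[n] = mean(latencies)
    return results
```

Calling `sweep(invoke, [1, 10, 50, 100])` against a real endpoint and plotting the result would show whether latency climbs steadily or drops after a threshold, as described above for Azure Functions.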
Those memory-intensive functions that are launched sequentially — after an initial warm-up period when the service sets up a container on the back-end to run the function — are pretty linear in execution times. However, if multiple instances of some memory-intensive function are launched in parallel, then “it can significantly degrade the performance of the system,” said Kaviani, speculating that too many jobs may crash a container, which would need to then be rebooted.
“For something heavily memory-intensive, we noticed that Azure was not able to cope and scale well when there are a good number of requests that go in,” Kaviani said.
One takeaway from the tests was that spending more money on serverless does not always guarantee a corresponding improvement in performance. “It very much depends on the architecture of the platform, the way they manage the function, the way they manage the containers,” said Kaviani.
Certainly, the timing for SPECserverless is apt, as many investigating the use of serverless are quantifying their own experiences with the technology.
For instance, John Chapin, co-founder of the Symphonia serverless consultancy, experienced much frustration in trying to establish a baseline for how Lambda responds to low-memory requests.
“Establishing performance baselines using just a few invocations of a Lambda … is simply not sufficient to predict the behavior of the Lambda over a longer period of time,” Chapin wrote in a blog post describing his work.
Chapin wrote a basic AWS Lambda function using Java 8 to execute a recursive Fibonacci algorithm. Since the different Lambda pricing tiers are based on the amount of memory used, Chapin surmised that the more memory procured, the better the performance would be. “Lambda configured for 256MB of memory would be twice as powerful as a 128MB Lambda, and a 1GB Lambda should be twice as powerful as a 512MB Lambda,” he wrote.
This turned out not to be the case though. Not all the time, anyway.
AWS bases performance scaling on the worst possible times. So there may be times when a 128MB service equals the performance of the much more expensive 1.5GB option. AWS is only claiming that the worst performance of something running in a 1.5GB environment will not be as bad as the worst performance in the 128MB environment.
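The practical upshot is that a baseline needs many samples and should report the worst case alongside the typical one. The following is a rough Python sketch of that idea. Chapin's original function was written in Java 8, so this is not his code, and the helper names (`sample`, `summarize`) and the simple index-based 99th-percentile estimate are assumptions for illustration.

```python
import time
from statistics import median


def fib(n):
    """Recursive Fibonacci, used here purely as a CPU-bound workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def sample(job, arg, runs):
    """Run job(arg) `runs` times and return the list of durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        job(arg)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Report best, typical, near-worst and worst durations."""
    ordered = sorted(durations)
    # Simple index-based estimate of the 99th percentile.
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "min": ordered[0],
        "median": median(ordered),
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }
```

Comparing the `max` and `p99` values across memory tiers, rather than only the median of a handful of runs, is closer to the kind of worst-case comparison described above.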
Cloud Foundry is a sponsor of The New Stack.
Feature image: Dr. Max, at the Cloud Foundry Summit Silicon Valley.