How Amazon Web Services Isn’t Winning and the Problems It Poses
Today, Amazon Web Services (AWS) is the undisputed leader of the public infrastructure-as-a-service (IaaS) market. And as more and more organizations realize the massive business-agility benefits of public IaaS, it is difficult to envision any near future in which Amazon is not a critical infrastructure provider to an increasing number of companies. This is especially true given the intelligent investment Amazon has put into continuously improving its public IaaS, from a growing list of instance types to easier and more efficient ways to use AWS, like Lambda.
But it is not enough for AWS to iteratively improve the public IaaS market that it created. Because while public IaaS delivers great business agility to companies, there are even greater agility gains to be had beyond today’s best-practices public IaaS deployments.
Specifically, the next innovation seems to be in moving from servers-as-cattle to no servers at all.
AWS seems to understand what the future holds (see the aforementioned Lambda). But Amazon’s ability to compete effectively is held in check by the way it is built. And it will continue to be held back unless Amazon changes some fundamental aspects of AWS. In fact, a fair analysis of the serverless space shows that instead of being the leader, Amazon is a distant follower.
An example best serves to show how Amazon is behind. After going through the example, I will outline why I think Amazon will need to make some fundamental structural changes to compete effectively in this new serverless battleground.
So let’s walk through the example of building an application from the ground up. Instagram is a useful and relevant example application, because it’s relatively simple to understand and has some core features that many modern applications — even in the enterprise — are asked to include: mobile app, responsive web app, some element of social sharing, and some element of private data ownership. And for the purpose of this example, let’s think about an Instagram-like application with these features: sign in with social media credentials, search for and add friends to your “private sharing”, and post photos either as public or private.
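To make the feature list concrete, here is a hypothetical sketch of the example app’s core entities in Python. All of the names and fields are illustrative; this is not any real Instagram or Parse schema.

```python
from dataclasses import dataclass, field

# Illustrative data model for the Instagram-like example app.

@dataclass
class User:
    user_id: str
    social_provider: str                        # sign in with social credentials
    friends: set = field(default_factory=set)  # the "private sharing" list

@dataclass
class Photo:
    photo_id: str
    owner_id: str
    url: str
    public: bool = True                          # False -> visible to friends only

def add_friend(user: User, other: User) -> None:
    """Searching for friends is out of scope; this just records the link."""
    user.friends.add(other.user_id)

alice = User("alice", "facebook")
bob = User("bob", "twitter")
add_friend(alice, bob)
photo = Photo("p1", "alice", "https://example.com/p1.jpg", public=False)
```

Every architecture we compare below has to store and enforce exactly these relationships; the only question is how much of that machinery the application team has to run itself.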
Let’s compare Instagram’s published 2012 architecture (which most people would still consider essentially “best practices” for a public IaaS deployment today) with an AWS serverless architecture today and with a Parse serverless architecture today. I’m not going to talk about the mobile app or responsive web app architecture, because they should be essentially the same in each case. For our purposes, we’re just talking about the backend (and middle tier).
| | Instagram 2012 | AWS Serverless | Parse Serverless |
| --- | --- | --- | --- |
| Application Servers | 25+ running | | |
| Data Store: Users, Metadata | 12+ PostgreSQL, using EBS RAID, copying lots of data | | Parse Data Store |
| Data Store: Photos | S3 | S3 through … | Parse Data Store, writing into Parse |
| Asynchronous Processing | Gearman on EC2 | Lambda | Cloud Code |
The table best illustrates why the serverless future is so desirable. The Instagram 2012 architecture relies on many different technologies and many servers, which means Instagram 2012 requires deep knowledge of each of those technologies. Updating Instagram 2012 also requires thinking about how all of those systems will be affected.
It’s certainly much better than having to know all those technologies and also operate a data center. But why, exactly, should Instagram have to know about mdadm or Fabric? Those are general-purpose tools that could be outsourced to other companies while Instagram focuses on only Instagram-specific requirements.
And that’s exactly what the serverless future does. It’s not that technologies like mdadm or Fabric go away, but they can be run by other companies. Amazon recognizes this, and that’s why AWS has talked up Lambda so much — it’s a way of running code without needing to run servers.
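To make “running code without needing to run servers” concrete, here is a minimal sketch of a Lambda-style function in Python. The `handler(event, context)` signature matches AWS Lambda’s Python runtime; the event shape and the thumbnail logic are purely illustrative assumptions.

```python
# A Lambda-style function: you write only the business logic; the
# provider runs, scales, and patches the machines underneath.
def handler(event, context):
    # Hypothetical event shape: a photo-upload notification.
    photo_key = event["photo_key"]
    size = event.get("thumbnail_size", 128)
    # Business-specific work (e.g. resizing and storing) would go here.
    return {"status": "ok", "thumbnail": f"thumb_{size}_{photo_key}"}
```

Everything outside this function — provisioning, load balancing, patching, scaling to zero — is somebody else’s problem, which is exactly the appeal.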
The problem is that AWS’s serverless future (at least as it exists today) is substantially worse than one of its competitors — Parse, owned by Facebook.
That’s right, Facebook — a company not exactly pitched as a cloud powerhouse, but which currently owns a company that does serverless better than AWS (at least for this example).
You can also see by the number of different services listed as you go from left to right that while serverless AWS today certainly beats Instagram 2012 for simplicity, Parse is an even better integrated serverless solution than AWS. I’ll explain why, but first let me give more detail on why AWS lags behind Parse.
Sharing is the best example of how Parse beats AWS in the serverless Instagram architecture. AWS’s serverless identity service is called Cognito, but it is a single-user data-synchronization service. That is most useful when you’ve written a video game and want to save the level the user is on, and completely unhelpful if you want to do any kind of sharing that isn’t 100 percent public to the world.
I worry that data sharing through Cognito is a substantial problem for AWS to implement, because of the way its security models work. Specifically, AWS is reusing its identity and access management (IAM) service, which is mostly used for organizations’ own internal access to AWS, as the way to handle access control for all of the various external users that are coming into Cognito through mobile apps and the Web. As such, actually being able to write IAM permissions and policies is a pretty big deal, and even though AWS has some amazing flexibility built into IAM, AWS isn’t built to store massive numbers of different IAM policies.
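For context, per-user access in this model is typically expressed as an IAM policy that substitutes the caller’s Cognito identity into the resource conditions, sketched below as a Python dict. The `dynamodb:LeadingKeys` condition and the `${cognito-identity.amazonaws.com:sub}` variable follow AWS’s documented fine-grained-access pattern; the `Photos` table name is hypothetical. Note what the policy can and cannot say: it scopes each user to rows keyed by their own identity, but has no natural way to express “also readable by these three friends.”

```python
import json

# Sketch of an IAM policy scoping a Cognito-authenticated user to
# DynamoDB items keyed by their own identity. Good for single-user
# data sync; no way to grant selective access to other users.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query"],
        "Resource": "arn:aws:dynamodb:*:*:table/Photos",  # hypothetical table
        "Condition": {
            "ForAllValues:StringEquals": {
                "dynamodb:LeadingKeys": [
                    "${cognito-identity.amazonaws.com:sub}"
                ]
            }
        }
    }]
}
print(json.dumps(policy, indent=2))
```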
Parse and Firebase, another serverless service now owned by Google, do data sharing through saving object-specific permissions, which means having a ton of different security rules. This is largely a result of having “untrusted” client connections directly to your data stores, instead of through a “trusted” middle-tier application server. It seems likely that there is no way to handle serverless social sharing without object-specific permissions and a large number of security rules. That means Amazon will have to make substantial changes in order to support modern sharing functionality.
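The object-specific permissions look roughly like the sketch below. The ACL shape follows Parse’s documented format (user ids as keys, `"*"` for the public entry); the enforcement function is an illustrative sketch of the check a serverless backend must run on every read from an untrusted client.

```python
# Each stored object carries its own ACL, Parse-style: keys are
# user ids (or "*" for public), values grant read/write.
photo = {
    "objectId": "p1",
    "url": "https://example.com/p1.jpg",
    "ACL": {
        "alice": {"read": True, "write": True},  # owner
        "bob":   {"read": True},                 # shared privately
    },
}

def can_read(obj: dict, user_id: str) -> bool:
    """Check an object's ACL for an untrusted client's read request."""
    acl = obj.get("ACL", {})
    if acl.get("*", {}).get("read"):  # public to the world
        return True
    return bool(acl.get(user_id, {}).get("read"))
```

A million users sharing privately means millions of per-object rules like these, which is precisely the scale at which a centrally stored set of IAM policies breaks down.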
AWS has fundamental structural problems to solve even if it can figure out how to do serverless sharing. It has the IAM issue and a number of other issues that make a serverless AWS environment substantially worse than using Parse. Specifically, AWS has been built as an API-driven service that does discrete general-purpose tasks. This means:
- It requires asking IAM to handle trusted internal users as well as completely untrusted external users.
- Through this API-driven process, AWS is asking each of its other services to also work with untrusted external users.
- This forces AWS services to work in complicated ways that were never originally anticipated. Ultimately, instead of AWS being a series of microservices, it begins to feel more like one monolithic application that seems to take an ever-increasing amount of time to add significant features. You can see some of this in a recent review of AWS’s API Gateway.
For software developers, there is no contest between today’s serverless AWS environment and Parse or Firebase: AWS pales in comparison.
The developer experience (DX) with both Parse and Firebase is dramatically simpler than what AWS offers. It’s just not clear that AWS is going after frontend developers (usually the most important developers as far as business customer experience goes) in any kind of serious way. All of the AWS Lambda and Cognito tutorials are rudimentary from a frontend perspective; at best, they handle authentication and some minor data transfer. Compare that to Parse’s litany of rich tutorials, and you’ll see why frontend developers don’t pick AWS as their serverless backend.
I also see a philosophical issue at play here. Amazon seems to believe the base building blocks that companies need are API-driven virtualized hardware. In 2008, Amazon went after the data center and the IT staff who controlled machine and VM deployment, and allowed developers to do all of those jobs, using APIs. Amazon has stayed true to this vision, and continues to add more and more API-driven services, adding to the complexity of using Amazon Web Services, and forcing organizations to hire cloud architects and DevOps engineers to manage these services. Even the services that are theoretically more abstracted, like Lambda, in practice require some significant systems administration and/or DevOps experience to run properly.
In contrast, the serverless future seeks only to manage those services that require business-specific logic. Running virtual machines, storage, backups, load balancing, caching, and so many of the things that AWS excels at are not business-specific, and can be outsourced without issue for the vast majority of applications. The base building block for the serverless future is an abstracted set of compute resources that execute business-specific code.
So why is Amazon having a hard time competing head-to-head with either Parse or Firebase now? Perhaps it is because Amazon sees Parse and Firebase as tackling only small parts of the market. But thick client apps that hit a serverless backend are clearly where businesses are going, and Amazon sees that too; that’s why it’s putting out Cognito, Lambda and API Gateway. It’s just that Amazon is hamstrung by its existing services, existing customers and overriding product philosophy.
The only way for Amazon to preserve its lead into the serverless future is to separate its serverless services from its virtual-data-center-focused services, and drive forward without the server-focused albatross.