Why Parler Can’t Rebuild a Scalable Cloud Service from Scratch

How screwed is Parler or can just anyone build their own cloud?
Jan 19th, 2021 9:00am by
Feature image by Jasmin Schreiber on Unsplash.

When Amazon Web Services informed Parler that it was terminating its hosting deal, the social platform initially maintained that it would be back online in a week. It appears to have secured DNS hosting from Epik, the Sammamish, WA company that hosts sites dropped by other providers, such as 8chan, the Daily Stormer, Gab and Stormfront, and that recently lost its relationship with PayPal as a result.

But a later statement indicated that other providers are not willing to host the platform and, as a result, the service may not return. “Most people with enough servers to host us have shut their doors to us,” a statement to the Parler user base read, and the company’s legal team suggested in court that the service might not return without access to AWS.

Even if the social network finds a hosting provider, setting up all the different services it needs could be prohibitively slow, or even impossible. For all the talk of hybrid and edge computing, hyperscale cloud still has some hard-to-beat advantages for organizations with small infrastructure teams and little capital expenditure budget that need to build large services quickly.

A Very Cross-Cloud Migration

Parler hasn’t published specific details of its architecture, although it initially said it doesn’t use any AWS-specific services that would tie it to one cloud. “We prepared for events like this by never relying on amazons [sic] proprietary infrastructure and building bare metal products.”

Certainly, Railsbridge founder Sarah Mei deduced from the details about outages in summer 2020 that the primary data store is relational, and that the infrastructure team is unusually small. Using a relational data store does make it easier to design features like allowing users to edit posts (something Twitter would find hard to support because of the sharding in its distributed data store), and running relational databases is a well-known task (although Parler appears not to have much experience with relational database indexes). Parler may even have chosen to run relational databases in VMs rather than using RDS to make any potential migration easier. But a relational data store is an unusual implementation decision for a social network because of how many table joins common operations will require, so it’s going to need a hosting platform with the same kind of performance and latency as AWS to avoid slowing down the user experience.

But Parler’s own lawsuit against AWS stated that “both the apps and the website are written to work with AWS’s technology.” Internet sleuths found multiple AWS services that Parler was using: AWS Certificate Manager, AWS CloudFront caching, AWS Application Load Balancer (with routing and redirect rules), Elastic Load Balancing for the Parler API, Route 53 for DNS resolution based on geolocation, S3 storage and even Amazon Simple Email Service for mail.

It’s logical to use the high-level services in the cloud provider you use for hosting; they’re cheaper, more efficient and more secure than building your own load balancing with NGINX or HAProxy, using Squid for caching and picking a separate provider for geo-based DNS routing. ELB has four different load balancing options built in. Parler didn’t even choose to run its own mail server: a relatively simple task, but a commodity that adds no business differentiation.

But it also ties you to AWS because to migrate to another cloud, you’re going to have to reimplement all of those choices with the equivalent services in the cloud you move to, which is why most migrations from one cloud provider to another are long-term operations done after plenty of planning — and often with assistance from the cloud you’re moving to.

“All of those are relatively building-block level primitives, and people delude themselves into thinking that if I’m using those things, those are things that have equivalent across all the other providers, so it should be easy to move. It’s not,” Corey Quinn, chief cloud economist at The Duckbill Group, told The New Stack.

“That is the lie everyone tells themselves. So what they’re doing is slowing down their own feature development, because they have to build things from those primitives [rather than using higher-level services]. Yet when they have to move off it turns out ‘oh, we have too many dependencies on things that don’t really exist in physical data centers.’ The idea that you’re building things with a strategic exit path; you just think you are. You’re cosplaying as people who know what they’re doing.”

The idea of building for “multicloud” in the sense that you can easily switch back and forth between different clouds is appealing, but in practice, it ties you to lowest-common-denominator features, and there will still be work to do to move anything beyond a self-contained VM.

Tools like Terraform produce scripts that are specific to the cloud provider. Even VMware services running on different clouds differ in day-to-day operations.

And while containers and Kubernetes go some way to addressing this with self-contained packages that run on ostensibly interchangeable infrastructure and even the option to burst out from a Kubernetes cluster to a cloud Kubernetes service using virtual kubelets, making that infrastructure truly interchangeable means making the package truly self-contained. That means that instead of calling out to cloud services for auth, storage or anything else, you have to have your own implementation of all of them.
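To make “your own implementation” concrete, here is a minimal sketch of what a self-contained package has to carry for storage alone. Everything in it is hypothetical and for illustration only: the `BlobStore` interface, the `LocalDiskStore` class and the `save_avatar` function are invented names, not anything Parler or a cloud provider is known to use.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical minimal storage interface the application codes against."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class LocalDiskStore:
    """Self-hosted stand-in: plain files under one directory.

    It has no replication, durability guarantees or access control. Those
    are the parts a cloud object store provides and that you would have
    to build and operate yourself.
    """

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Flatten slashes so each key maps to a single file name.
        return self.root / key.replace("/", "__")

    def put(self, key: str, data: bytes) -> None:
        self._path(key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def save_avatar(store: BlobStore, user_id: int, image: bytes) -> str:
    """Application code sees only the interface, never a provider SDK."""
    key = f"avatars/{user_id}.png"
    store.put(key, image)
    return key


# Usage: the same call would work against any other BlobStore implementation.
demo_store = LocalDiskStore(Path(tempfile.mkdtemp()))
demo_key = save_avatar(demo_store, 42, b"\x89PNG")
```

The interface is the easy part. The portability cost sits in everything `LocalDiskStore` leaves out, and the same exercise repeats for auth, queues, email and every other service the application calls.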

Hybrid cloud hardware that promises to put the cloud into your data center is the most effective way to do private cloud without replicating all the work public clouds have done. But again, Quinn noted, “that presupposes that you’re able to do business with companies.”

“AWS Outposts need to be able to sync to a primary region at least every six hours.” Azure Stack Hub can run without a continuous connection to Azure, but that degrades or blocks some features, and you still need to have an Azure subscription. In both cases, there’s a limited subset of cloud services available and you need to have expertise in running hybrid cloud systems.

One emerging option is Oracle Cloud dedicated regions, Quinn said. “They claim every service [running] in your facility at full bore, and the price for an endeavor like that is something like $4 million or $6 million a year.”

Building a Cloud Is Hard

Running a large and popular service doesn’t necessarily need the scale of a hyperscale cloud. A site like Stack Overflow with over 100 million monthly visitors can run on a relatively small number of servers (somewhere between 20 and 50 servers for serving the site itself, although with more hardware for networking, logging, monitoring, backup and other key infrastructure tasks). (Quinn notes that Stack Overflow also has built out a CDN, so it’s not accurate to think of it being served from a single location on a handful of servers.)

Although Stack Overflow uses machine learning for recommending questions to show on the home page, that’s not on the scale of say, Facebook, which has developed its own AI acceleration hardware that you’d need to use AWS, Azure, or GCP to match. If a service like Parler wanted to build an algorithmic feed to recommend posts, groups or other members, that would require more infrastructure.

But even without that, a service needs more than just web site hosting and load balancing. DDoS attacks keep on rising; in 2020 they doubled from Q1 to Q2 and then doubled again in Q3 according to Cloudflare. A growing number are small, short attacks that might be over before you can spot them with manual analysis — but can still do damage. Given the rise in ransom-driven DDoS attacks, every service is going to need a DDoS provider, preferably one with tools sophisticated enough to distinguish good visitors from malicious bots; doubly so for one like Parler.

Building everything from the MFA and authentication services for signing up new users and changing passwords securely to billing systems to backup and monitoring is like going back in time. “You wind up basically turning the clock back 20 years to a time before all these things existed and you’re building it all yourself from scratch. It’s possible but it’s the undifferentiated heavy lifting that took so much time, energy and money, and the miracle of cloud is that you don’t have to do that.”

Just designing the right architecture and infrastructure for a social network takes expertise (and there is plenty of competition for staff who have that), Quinn pointed out. “This is an area where experience counts for everything, where you have to have had experience scaling things at large scale and understanding how these things break under pressure in order to be able to effectively reason about these things.”

Trying to scale compute and storage and be resilient across multiple regions (to avoid downtime and deliver good connectivity for users across the country and around the world) can be tricky to do even in the cloud. Building all the infrastructure yourself, and building it out as your business grows, is another thing entirely.

Take cloud storage. “Every large cloud provider has an object-store. If you want to run an object-store in your own data center, you’re either running some janky open source project that’s going to have performance issues, or you’re installing some vendor’s storage solution on top of a bunch of big servers and drives.” What cloud offers isn’t just the ready-made object storage but extreme levels of durability which means that hard drive failures don’t take production down; in fact, they can happen at any time and have no impact on availability.

Cloud services like AWS, Microsoft Azure and the Google Cloud Platform have expertise in storage (and every other area of their business) that’s hard to match, and that’s what a cloud subscription lets you take advantage of. “There’s a team of people who have devoted their entire careers to exploring the extended sum total of human knowledge about how hard drives fail, and how to avoid that and how to replace them in the most optimal way for the least possible amount of cost.”

Cloud services use erasure coding; AWS doesn’t give specific details but Microsoft documented how it uses erasure coding in Azure to save space and reduce costs some years ago. “They chop a file or an object into, let’s say, 100 blocks and any 70 of those blocks can be reassembled to deliver the file. That’s the only way to get those ridiculous durability numbers; AWS can lose an entire facility and not have S3 subject to data loss.”
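The “any 70 of 100 blocks” property Quinn describes can be illustrated with a toy Reed-Solomon-style code. This is a sketch under stated assumptions, not the scheme AWS or Azure actually runs: it treats each byte as one block, works modulo the prime 257, and the function names `encode` and `decode` are invented for this example. It needs Python 3.9 or later.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, n: int) -> list[tuple[int, int]]:
    """Turn k data bytes into n shares (x, y); any k shares recover the data."""
    k = len(data)
    if not k <= n <= P:
        raise ValueError("need len(data) <= n <= 257")
    points = list(enumerate(data))  # the polynomial passes through (i, data[i])
    return [(x, _interpolate(points, x)) for x in range(n)]


def decode(shares: list[tuple[int, int]], k: int) -> bytes:
    """Rebuild the original k bytes from any k surviving shares."""
    if len(shares) < k:
        raise ValueError("not enough shares survived")
    points = shares[:k]
    return bytes(_interpolate(points, x) for x in range(k))


# Usage: 6 data blocks spread over 10 shares, then 4 shares are lost.
original = b"parler"
all_shares = encode(original, 10)
recovered = decode(all_shares[4:], k=len(original))
```

The trade-off is storage overhead against durability. In the quoted example, keeping 100 blocks for every 70 blocks of data costs about 1.43 times the raw size, compared with 3 times for three full copies, while still tolerating the loss of any 30 blocks.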

You can do erasure coding in Windows Server with Storage Spaces Direct but that doesn’t get you the equivalent of cloud storage, which likely doesn’t make sense in your own data center. “Object storage is obnoxious, finicky and takes you out of your way, because it’s an intermediary layer on top of the SAN. Why not talk directly to the SAN and save yourself the complexity at the point where things can break?” Of course, that will mean rewriting the part of your stack that was predicated on object storage.

But you’re also not going to get the near-infinite storage of cloud. “You can’t write to it faster than they can add additional capacity,” Quinn noted, adding that if necessary, Amazon can temporarily evict its own workloads to give customers more resources while it builds out capacity.

Provisioning VMs is different in a hosting or colocation facility, or in your own data center, than in a cloud IaaS service, where agents will be automatically applied to the VM and some services will be available immediately.

Networking in cloud services is software-defined and virtualized and backed by globe-spanning connectivity: a mix of peering, points of presence, interconnects, shared and wholly-owned subsea cables and even satellite connectivity that means that any instance in any region can connect to services and other instances and be accessible externally as soon as you configure that in software. The internet connectivity you get from an ISP or as part of a hosting package will require more configuration and much more management (and you’ll need alternate fiber and suppliers in case a backhoe hits your connection). Prepare to get used to dealing with mistaken or malicious BGP routing that redirects internet traffic: cloud providers take care of that for you.

Network connectivity inside a cloud region is also under a lot of pressure, Quinn pointed out: “Top of rack switch congestion is something most people don’t think about because most people have not built these things at scale.” Changes to the network also have to be made carefully; Azure, for example, emulates its entire Azure network and practices all major changes in advance to avoid mistakes.

Of course, that’s once you have a data center and servers. Compute, storage and networking all have to be provisioned for peak load in your own data center; that means negotiating server and network hardware prices with vendors, getting them supplied and delivered to a facility that’s ready to install them (trickier in pandemic 2021, and something that typically takes two to three months between vendors and hosting facility). Hardware, data center space and internet connectivity are going to add up to either a large bill upfront or multiyear contracts.

Don’t forget about power and cooling. Green energy might not be a focus for Parler, but for organizations committed to renewables as part of corporate social responsibility programs, cloud is a meaningful option, Paul Johnston, former AWS architect and tech environmental consultant, told us. “Cloud is brilliant at bringing the economies of scale for things like upgrading a grid to renewables. So the consolidation of the data center/cloud industry is a good thing in the sense that it gets the focus on greening our workloads.”
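The cost of provisioning for peak load is easy to see with back-of-the-envelope arithmetic. The sketch below uses invented numbers throughout: the request rates, per-server capacity and 30% headroom are assumptions for illustration, not figures from Parler or any provider, and `servers_needed` is a hypothetical helper.

```python
import math


def servers_needed(requests_per_second: float,
                   capacity_per_server: float,
                   headroom: float = 0.3) -> int:
    """Servers required to carry a load while keeping `headroom` spare."""
    usable = capacity_per_server * (1 - headroom)
    return math.ceil(requests_per_second / usable)


# Hypothetical workload: peak traffic is five times the typical level.
typical_rps = 2_000
peak_rps = 10_000
per_server_rps = 500

owned = servers_needed(peak_rps, per_server_rps)       # what you must buy
typical = servers_needed(typical_rps, per_server_rps)  # what you usually use
idle_fraction = 1 - typical / owned                    # paid for, sitting idle
```

With these assumed figures, a data center sized for the peak needs 29 servers while a typical day keeps only 6 of them busy, so roughly four-fifths of the purchased capacity sits idle most of the time. On a cloud provider that gap is rented only when the peak arrives.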

Cloud Changed the Game

“Every AWS region costs multiple billions of dollars and takes years,” Quinn pointed out. If you need even a fraction of cloud services, you can’t build the infrastructure or the stack overnight, especially when it’s uncertain how fast you will grow or even whether you will stay in business.

“The real power of the cloud is that instead of having to invest hundreds of thousands of dollars building out even a relatively small setup and spending months doing it, I punch a credit card into any of these cloud providers. I get access to the console immediately. I start clicking around or using APIs, I get empty server like things or higher-level managed service offerings, and I’m able to begin working with them immediately.”

Experimenting is cheap, and if the business is a success, you can grow on the same cloud provider to the scale of a major organization. Along the way, that cloud provider will be building new features and offering new instance types that are faster or cheaper, at a speed and scale that private data centers find hard to match; it’s a virtuous cycle. “Effectively no one is starting with an idea today and trying to prove it out and not using a cloud provider — just because why would you?”

Even migrating to the cloud takes time and that’s with cloud providers and third-party services offering ready-built migration tools. There’s little help to go the other way.

“Maybe at some point in the far future, I decided to pull a Facebook, and realize for these workloads for the scale that we’re at, maybe it doesn’t make sense to have this live on the cloud. Dropbox tried that, it didn’t go well… At that point, I can very intentionally build out a data center or rent data center space and migrate those very specific workloads that can’t live in the cloud for whatever reason.” But that’s not what Parler would be trying to do, Quinn said.

“How would I move off AWS if using it becomes untenable for a variety of reasons is a very real, very valid concern that a lot of companies have, and there is absolutely nothing wrong with asking that question.” The solution to that could be cloud migration, multicloud, hybrid cloud or a private data center. But that’s a different issue from the one Parler faces. “What if my company becomes so radioactive that no respectable company will agree to do business with us?”

Most organizations don’t have to view cloud terms of service as an existential threat. But if AWS won’t deal with you, you will have trouble becoming a customer of the mainstream data hosting services, DDoS protection services and other vendors that have the capacity and connectivity to keep a service like Parler online, even if you can build out your own technology stack.
