Software as a Service has become the de facto delivery model for core business applications today. The most business-critical network between the end user and the application is now the public internet.
We tend to think of the internet as reasonably fast. We equate lots of bandwidth with speed and performance, but that’s not really the case. Bandwidth is only potential speed; actual speed is measured by throughput. And limited throughput is where you see glaring evidence of internet performance issues. If you’ve ever run a speed test on your 200 Mbps home internet connection and achieved an upload speed that’s only a small fraction of that, you’ve experienced this.
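One way to see the gap between bandwidth and throughput is the classic ceiling on a single TCP flow: it can deliver at most one window of data per round trip, no matter how fat the pipe is. The sketch below is a rough illustration only. The 64 KiB window, the round-trip times and the function name are assumptions chosen for the example, not measurements.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP flow: one window of data per round trip."""
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000.0 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


link_mbps = 200        # advertised bandwidth (assumed)
window = 64 * 1024     # 64 KiB window with no window scaling (assumed)

for rtt in (10, 80, 200):
    # The flow can never exceed the link rate or the window/RTT ceiling.
    ceiling = min(link_mbps, max_tcp_throughput_mbps(window, rtt))
    print(f"RTT {rtt:>3} ms -> at most {ceiling:.1f} Mbps on a {link_mbps} Mbps link")
```

Under these assumptions, a path with an 80 ms round trip tops out at about 6.5 Mbps on a 200 Mbps connection. Real stacks use larger windows, but packet loss and congestion pull throughput down in a similar way.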
The internet isn’t designed to be a high-performance network. In fact, it’s not one singular network at all, but a collection of many networks that is designed to be resilient to points of failure. In many cases, the internet does perform well — but a high level of performance is not assured, especially as traffic is carried around the world and into geographic regions with dubious infrastructure.
SaaS providers are highly dependent on their end users having a consistently good experience with internet performance and response time. As more and more applications are delivered via the SaaS model, developers need to understand some basic characteristics of the internet that could have a major effect on their application delivery. Two of the most common misperceptions are that the internet was designed for optimal performance, and that BGP (Border Gateway Protocol) routing helps traffic find the fastest route across the internet.
The Internet Was Designed for Optimal Performance
If there is a grand design to internet routing at all, it’s really to minimize costs and optimize profits.
The public internet is made up of tens of thousands of independently operated networks around the world, known as autonomous systems (ASes). These systems hold and exchange routing information that is basically a map telling each AS where to send data packets. Since ASes are not always directly connected with each other, they need to route their traffic through other ASes. Thus, data packets typically traverse several ASes on the way from where they originate to their final destination.
Now suppose that Network A has data packets it must send to Network Z, and this traffic is going to have to travel through a variety of ASes along the way. Each time there is a hand-off, the routing decision isn’t based on how to get the packet delivered to Network Z in the fastest way. Instead, it is based on which route will cost the sending networks the least amount of money.
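To make the hand-off decision concrete, here is a minimal sketch of a network choosing among neighbors. The AS names, prices and latencies are invented for illustration. Real providers express this preference through routing policy rather than a literal price lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    next_as: str         # hypothetical neighbor network
    cost_per_gb: float   # what the sender pays to use this neighbor (assumed)
    latency_ms: float    # measured latency toward Network Z (assumed)


def cheapest(options):
    """What the sending network actually optimizes for."""
    return min(options, key=lambda o: o.cost_per_gb)


def fastest(options):
    """What an end user would prefer."""
    return min(options, key=lambda o: o.latency_ms)


options = [
    Handoff("AS-B", cost_per_gb=0.02, latency_ms=40),
    Handoff("AS-C", cost_per_gb=0.00, latency_ms=95),  # settlement-free peer
    Handoff("AS-D", cost_per_gb=0.05, latency_ms=25),
]

print("chosen :", cheapest(options).next_as)
print("fastest:", fastest(options).next_as)
```

In this made-up example the traffic goes out through AS-C because it is free, even though AS-D would deliver it almost four times faster.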
It’s crazy to think that our public internet determines how to send traffic based on lowest cost rather than best performance, but that’s the way it is. “The internet was designed for optimal performance.” This myth is busted!
The Internet Uses BGP Routing to Find the Fastest Path
BGP manages how packets are routed across the internet through the exchange of routing and reachability information between edge routers. It is sometimes used within a single AS, but more often it connects one AS to another. There’s a misperception that BGP looks for the fastest path. It doesn’t: BGP selects routes based on operator policy and the number of ASes in the path, not on latency or available throughput.
BGP was created rather hastily nearly three decades ago to accommodate routing across a burgeoning public internet system. Unfortunately, BGP has some pretty serious flaws that have magnified over time.
For example, BGP doesn’t respond well to congestion. If a BGP route exists between two points and that route gets congested, there’s no feedback mechanism for BGP to recognize that it’s sending traffic down a congested route and that it should choose another route instead.
Now consider this in context with the previous discussion on how internet providers hand-off to each other based on least cost. Provider A is handing off to Provider B using a route that was selected initially based on least-cost routing. Once that route starts getting congested, the protocol in use doesn’t have any mechanism to back off and establish another route; Provider A just keeps stuffing traffic down that congested lane. In effect, congested lanes tend to get more congested, rather than be alleviated by routing intelligence.
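The sketch below shows why the congested lane keeps getting picked. It models only the first two steps of BGP route selection (highest local preference, then shortest AS path) and is a simplification, not a full implementation of the decision process. The `utilization` field is hypothetical: it stands in for congestion data that BGP never receives. The AS numbers come from the range reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str
    local_pref: int      # operator policy, often reflecting business cost
    as_path: tuple       # ASes the announcement has traversed
    utilization: float   # hypothetical load (0.0 to 1.0); BGP never sees this


def best_path(routes):
    """Highest local preference wins; ties go to the shorter AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


routes = [
    Route("Provider B", local_pref=200, as_path=(64500, 64501, 64502), utilization=0.98),
    Route("Provider C", local_pref=100, as_path=(64503,), utilization=0.10),
]

chosen = best_path(routes)
print(f"{chosen.next_hop} selected at {chosen.utilization:.0%} utilization")
```

Provider B wins on policy even though it is nearly saturated and Provider C is both shorter and idle. Nothing in the selection logic ever reads the load figure.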
BGP does respond to outages, albeit slowly. If a major link goes offline, it can take minutes for the BGP routing tables across the internet to sync up and recognize that the link is no longer in service. During that time, sites are unreachable, data gets lost, and people just can’t get where they want to go on the internet.
Beyond the performance issues, BGP has some serious security flaws as well. There are well-documented cases of BGP hijacking, where a legitimate advertised route to an AS is hijacked by a malicious actor just announcing they have the best path to a destination. It’s really that simple — with BGP, you just say you’re the right path and other ASes believe you.
In April 2018, roughly 1,300 IP addresses within the Amazon Web Services space dedicated to cloud DNS were hijacked through injection of a bogus route advertisement into an ISP in Columbus, Ohio. Several AS peering partners blindly propagated the erroneous re-routing announcement. By subverting Amazon’s domain resolution service, the attackers masqueraded as the cryptocurrency website MyEtherWallet.com and stole about $150,000 in digital coins from unwitting end users.
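As a general illustration of why bogus announcements work (not a reconstruction of the incident above), consider a router that accepts every announcement it hears and forwards traffic to the most specific matching prefix. The class name is hypothetical, and the prefixes and AS numbers come from the ranges reserved for documentation.

```python
import ipaddress


class NaiveRoutingTable:
    """Accepts any announcement without checking who may originate it."""

    def __init__(self):
        self.routes = {}  # network prefix -> origin AS number

    def announce(self, prefix: str, origin_as: int) -> None:
        self.routes[ipaddress.ip_network(prefix)] = origin_as

    def lookup(self, address: str):
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        # Longest-prefix match: the most specific route wins.
        most_specific = max(matches, key=lambda net: net.prefixlen)
        return self.routes[most_specific]


table = NaiveRoutingTable()
table.announce("203.0.113.0/24", 64496)   # legitimate origin
print(table.lookup("203.0.113.10"))       # 64496

table.announce("203.0.113.0/25", 64511)   # attacker claims a more specific slice
print(table.lookup("203.0.113.10"))       # 64511
```

The legitimate route is still in the table, but traffic for the covered addresses now flows to the attacker. Origin validation schemes exist to reject such announcements, but they only help where networks deploy them.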
With BGP deeply entrenched in the foundations of the internet, the entire internet routing system is essentially based on the honor system. And once again we’ve busted another myth, “The internet uses BGP routing to find the fastest path.”
SaaS Providers Don’t Have to Be at the Mercy of the Public Internet
Relying on the public internet for SaaS traffic routing is a bit like playing Russian roulette; the traffic might get delivered without problems, or it might not. There’s simply too much business at risk to play this game.
There are solutions that give SaaS providers more control over how their specific traffic is delivered to and from their customers. For example, one prominent solution deploys an overlay network on top of some of the largest public cloud providers on the internet, such as AWS, Google Cloud and Alibaba Cloud. Sensors within these providers’ network fabrics collect data in real time about the performance of the various routes that the providers have available to them. A cloud-based orchestrator can then make decisions about how to route traffic most efficiently between a particular SaaS provider and its customers.
The orchestrator also controls virtualized routing engines that get deployed across the fabric of those public cloud providers. This routing infrastructure establishes the fastest available path between a user and a SaaS provider at any given time, for example to improve data upload performance for a cloud-based storage application. Routes are adjusted in real time whenever a different route performs better. The goal is to get the best throughput, the lowest latency, and the tightest control over packet loss between user and provider.
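A performance-aware route decision of this kind might look roughly like the sketch below. It is an illustration, not any vendor’s algorithm. The route names, the scoring weights and the switching margin are all assumptions, and a real orchestrator would work from continuous sensor feeds rather than a static list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteSample:
    name: str               # hypothetical overlay route
    latency_ms: float
    loss_pct: float
    throughput_mbps: float


def score(s: RouteSample) -> float:
    """Lower is better. The weights are arbitrary, for illustration only."""
    return s.latency_ms + 50.0 * s.loss_pct - 0.1 * s.throughput_mbps


def choose_route(current: str, samples, margin: float = 5.0) -> str:
    """Switch only when another route beats the current one by a clear margin."""
    best = min(samples, key=score)
    by_name = {s.name: s for s in samples}
    if current in by_name and score(by_name[current]) - score(best) < margin:
        return current  # improvement too small to justify moving traffic
    return best.name


samples = [
    RouteSample("via-cloud-a", latency_ms=80, loss_pct=0.5, throughput_mbps=100),
    RouteSample("via-cloud-b", latency_ms=60, loss_pct=0.1, throughput_mbps=150),
]
print(choose_route("via-cloud-a", samples))
```

Unlike the BGP selection described earlier, the measurements themselves drive the choice here. The margin is a design choice that keeps traffic from flapping between two routes that perform almost identically.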
This kind of solution scales almost without limit. When traffic goes up, more virtual cloud routers can spin up, and when traffic goes down, those router containers are discarded until they are needed again. It’s an elegant way of handling capacity demands. Overall, this kind of solution gives SaaS providers the kind of performance control they could never get from the public internet, and that makes for happy users and low customer churn.
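The scaling rule can be pictured as a simple capacity calculation. The sketch below assumes a fixed per-router capacity of 500 Mbps and a floor of one router. Both numbers and the function name are invented for the example.

```python
import math


def routers_needed(traffic_mbps: float,
                   capacity_per_router_mbps: float = 500.0,
                   minimum: int = 1) -> int:
    """How many virtual routers should be running for the current load."""
    return max(minimum, math.ceil(traffic_mbps / capacity_per_router_mbps))


for load in (0, 400, 1200, 5000):
    print(f"{load:>5} Mbps -> {routers_needed(load)} router(s)")
```

As load rises, the count grows, and as it falls, the surplus routers can be discarded.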