The 4 Definitions of Multicloud: Part 4 — Traffic Portability
Part 1: Data Portability
Part 2: Workflow Portability
Part 3: Workload Portability
Part 4: Traffic Portability
To help focus the discussion on which types of multicloud capabilities are worth pursuing, this series concludes with a look at multicloud through the lens of traffic portability.
Multicloud traffic portability means you can shift traffic between environments dynamically. If you have geographically dispersed users, traffic portability would allow you to route traffic to the nearest cloud provider that could service them. So, if your app can run on both Azure and AWS, and an AWS data center is closer to a given customer than the nearest Azure one, you can send that customer’s traffic to AWS. Or maybe one cloud vendor works better for data sovereignty in Europe, so you route to a particular vendor only for those requests.
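The nearest-provider and data-sovereignty routing described above can be sketched in a few lines. This is a minimal illustration, not a production router: the site names, latency figures and the `PINNED_SITES` policy table are all hypothetical values invented for the example.

```python
# Hypothetical measured latencies (ms) from each user region to each site.
LATENCY_MS = {
    "us-east": {"aws-site": 12, "azure-site": 85},
    "eu-west": {"aws-site": 10, "azure-site": 15},
    "ap-south": {"aws-site": 210, "azure-site": 140},
}

# Hypothetical sovereignty policy: requests from these regions may only
# be served by the listed sites, even if another site is closer.
PINNED_SITES = {"eu-west": {"azure-site"}}


def pick_site(user_region: str) -> str:
    """Return the lowest-latency site that policy allows for this region."""
    candidates = LATENCY_MS[user_region]
    allowed = PINNED_SITES.get(user_region)
    if allowed is not None:
        candidates = {s: ms for s, ms in candidates.items() if s in allowed}
    return min(candidates, key=candidates.get)
```

In this sketch, `eu-west` users are sent to `azure-site` even though `aws-site` is slightly closer, because the policy table overrides the latency choice for that region.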
In most cases, the goal of traffic portability is the ability to shift traffic quickly and dynamically between multiple cloud platforms and on-premises data centers. This could mean splitting traffic 50/50 between AWS and Azure. Or maybe you’re doing maintenance in your Google Cloud environments, so you move 100% of traffic to another cloud temporarily. Canary deployments are another example, where you test something new with 5% of traffic on a public cloud provider and keep the other 95% of traffic in your data center.
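One way to picture the 50/50 split, the 100% maintenance shift and the 5% canary is as a single weighted routing function with different weight tables. The sketch below is an assumption about how such a router might work, not a description of any particular load balancer: it hashes a request key into one of 100 buckets so the same key always lands in the same environment. The environment names are hypothetical.

```python
import hashlib


def bucket(request_key: str) -> int:
    """Map a key (for example a user ID) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(request_key: str, weights: dict[str, int]) -> str:
    """Pick an environment; weights are integer percentages summing to 100."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    b = bucket(request_key)
    upper = 0
    for env, weight in weights.items():
        upper += weight
        if b < upper:
            return env
    raise AssertionError("unreachable when weights sum to 100")


# Hypothetical weight tables for the three scenarios in the text.
BALANCED = {"aws": 50, "azure": 50}
MAINTENANCE = {"gcp": 0, "aws": 100}
CANARY = {"datacenter": 95, "cloud": 5}
```

Shifting traffic then means swapping the weight table, for example calling `route(user_id, MAINTENANCE)` while the Google Cloud environment is being serviced and going back to the previous table afterward.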
There are three types of traffic portability:
- Ingress-only traffic portability
- Partial failover traffic portability
- Full failover traffic portability
Each type comes with a different set of trade-offs in speed, reliability and cost. Some of these types only become realistic at large-scale companies.
Enabling Ingress-Only Traffic Portability
Ingress-only traffic portability is implemented with a hub-and-spoke architecture, where one “hub” data center or cloud contains most of your data and does most of the coordination. At the end of the “spokes” in this architecture are all of your front ends, and they’re mainly taking ingress traffic. You could also call it “front-end-only.”
The main attributes of ingress-only traffic portability are:
- Ingress traffic can reach any front end
- One request may touch multiple environments
- Useful for caching at the edge and reducing latency
- Requires workflow portability
- Upfront investment: Medium
The caching and latency benefits of this architecture come from the fact that you’re baking in a hierarchical design. This requires that the applications are aware of this architecture, and it usually requires workflow portability so you don’t have to manage ingress to different front ends in different environments with different workflows.
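A toy model can make the hub-and-spoke hierarchy concrete. In the sketch below, the `Hub` and `FrontEnd` classes are hypothetical names for illustration: each front end keeps a local cache and only calls the hub on a miss. Cache invalidation and expiry are deliberately left out to keep the example short.

```python
class Hub:
    """The central environment that holds the data of record."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0  # counts how often a spoke had to call the hub

    def get(self, key):
        self.reads += 1
        return self.data[key]


class FrontEnd:
    """A spoke: takes ingress traffic and caches hub responses locally."""

    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.cache = {}

    def handle(self, key):
        # Serve from the edge cache when possible; otherwise the request
        # touches a second environment (the hub).
        if key not in self.cache:
            self.cache[key] = self.hub.get(key)
        return self.cache[key]
```

Repeated requests for the same key at one front end reach the hub only once, which is where the latency benefit comes from. The hub is still required for every cache miss, which is why it remains a single point of failure in this design.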
Partial Failover Traffic Portability
This type of traffic portability builds on ingress-only by moving more data and services out to each location. It adds the requirement for partial replication of your backend systems and data in each cloud region and data center. It’s not a carbon copy of your backend data center, but most of your backend services and data are there. This might include data sharded by region and some data in a central location for all regions.
The main attributes of partial failover traffic portability are:
- Data might be sharded by region
- Some backend systems and data replicated between regions
- Improves high availability (HA) and disaster recovery (DR)
- Requires workflow portability
- May require limited data portability
- Upfront investment: Large
Unlike the hub-and-spoke model, where there’s a single point of failure, this model has more HA and DR capabilities if your central data center goes down. However, this has greater data portability requirements than the ingress-only model, and both require some form of workflow portability.
The advantage of partial failover over ingress-only is that the app and its traffic become more portable: when you start shaping traffic across different clouds and data centers, the important data is already there.
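The difference from ingress-only can be sketched as a read path that tries a regional shard first and falls back to a central store only for data that is not replicated. The `Store`, `StoreDown` and `read` names, and the example keys, are hypothetical and chosen only for illustration.

```python
class StoreDown(Exception):
    """Raised when a store cannot be reached."""


class Store:
    """A very small stand-in for a regional shard or a central data store."""

    def __init__(self, data, up=True):
        self.data = dict(data)
        self.up = up

    def get(self, key):
        if not self.up:
            raise StoreDown("store is unavailable")
        return self.data.get(key)


def read(key, regional, central):
    """Serve from the regional shard when possible, else ask the central store."""
    value = regional.get(key)
    if value is not None:
        return value
    return central.get(key)  # may raise StoreDown during a central outage
```

With this layout, an outage of the central store only affects requests for data that was never replicated to the region. Requests for regionally sharded data keep working, which is the partial HA and DR gain described above.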
Full Failover Traffic Portability
Full failover traffic portability is the most complex type of traffic portability. With it, you gain the ability to completely take down one site (on-prem or cloud) and fail over traffic to any other site. Your sites all have the necessary data in this scenario, so this allows ingress traffic to reach any front end. Full failover also provides maximum HA and DR. With full failover, if you have copies running in multiple clouds, all your traffic shaping will be portable.
The main attributes of full failover traffic portability are:
- Ingress traffic can reach any front end
- Requests can be completed in each environment (no call to a “hub” needed)
- All systems and data are replicated
- Maximum HA and DR capabilities
- Requires data, workflow and workload portability
- Upfront investment: Very large
This type of traffic portability is extremely rare since it requires all three of the other types of multicloud portability: data, workflow and workload. There’s no need for a hub-and-spoke architecture anymore because each request can be processed entirely in any location.
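Because every site can complete requests on its own, taking one site down reduces to redistributing its share of traffic across the others. The sketch below assumes traffic is described by a table of fractional weights per site. The `drain` function and the site names are hypothetical.

```python
def drain(weights: dict[str, float], site: str) -> dict[str, float]:
    """Remove one site and spread its share proportionally over the rest."""
    remaining = {s: w for s, w in weights.items() if s != site}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no remaining site can take traffic")
    # Rescale so the surviving sites' weights sum to 1 again.
    return {s: w / total for s, w in remaining.items()}


# Hypothetical starting split across two clouds and one data center.
CURRENT = {"aws": 0.5, "azure": 0.3, "onprem": 0.2}
```

Calling `drain(CURRENT, "aws")` leaves Azure with roughly 60% and the on-prem site with roughly 40% of traffic. The same call would not be safe under ingress-only or partial failover, where the remaining sites might lack the data needed to finish the request.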
Trade-offs for Each Type
Most of the trade-offs for each type of traffic portability are related to cost, speed and reliability.
- Cost: It gets much more expensive as you go from ingress-only to partial and full failover.
- Latency: As you move from ingress-only to partial failover to full failover, more data is available in each location, so requests complete faster because they make fewer network calls to remote locations.
- Reliability: HA and DR improve as you move from ingress-only to partial failover, and they peak with full failover traffic portability.
Because any form of traffic portability brings large cost increases, a deep assessment of costs and benefits is required. Partial and full failover traffic portability should only be a consideration for large, web-scale companies. These forms of traffic portability are rare even among companies that can afford them.
Usually, the customers we see using partial or full failover do it because of regulatory or contractual requirements. As an example, Walmart requires most of its vendors to run their workloads on non-AWS platforms.
At a smaller scale, ingress-only traffic portability is a more realistic option. It helps you avoid situations where traffic must be forwarded on a per-application basis, with some services always having to go to a particular cloud, region or on-prem data center.
Recapping Data, Workflow, Workload and Traffic Portability
To wrap up this article series, here’s a summary of the four main definitions of “being multicloud”:
- Data portability: You have the ability to move data from one cloud provider to another, either continuously or during a break-glass event.
- Workflow portability: You have development and operations workflows that are compatible across multiple environments, whether they be cloud or on-premises.
- Workload portability: You can push a button and move a workload from one cloud or on-premises data center to another.
- Traffic portability: You can shift traffic between environments in a dynamic way.
The one multicloud capability I encourage every company to plan on building, even if you’re not yet in a multicloud or hybrid cloud scenario, is workflow portability. Even if you’re firmly in the single-cloud camp, there’s minimal extra investment required to start using tools and workflows that are cloud-agnostic.
Data, workload and traffic portability are rarer forms of multicloud portability. Although they can bring significant gains in speed and reliability, there are other ways to get those things without the added complexity and sometimes massive investment of those three options. Often, it’s better to accept data, workload and some traffic lock-in rather than trying to build complex architectures to support any contingency.
Workflow portability, in contrast, removes a great deal of complexity when multicloud finally comes for you. Having a separate workflow, toolset and required expertise for two different clouds (or cloud and on-prem) is a waste of your engineers’ time and resources when a unified workflow is a viable option. It’s this view of the options that led us to build HashiCorp’s products the way we did — on the premise that workflow portability provides the most cost-effective and useful benefit for operating applications and infrastructure in the real-world, multicloud environments enterprises are working with today.
The Other Definitions of Multicloud
If you haven’t read about the other three definitions of multicloud — data portability, workflow portability and workload portability — check out the first three articles in this series to understand the trade-offs and enablement patterns for each.