Part 1: Data Portability
Part 2: Workflow Portability
Part 3: Workload Portability
Part 4: Traffic Portability
HashiCorp sponsored this post.
The world is going multicloud. But what does that mean precisely?
When we talk about the word “multicloud” we tend to talk about it in a monolithic way. But if you look just beneath the surface, you’ll find that people have different ideas about what it means — so these conversations need to be more specific. This article series is aimed at fostering more productive discussions on this topic and understanding which types of multicloud capabilities are worth pursuing.
The 4 Definitions of Multicloud
The different use cases for multicloud are not apples-to-apples comparisons. They don’t always preclude one another, meaning you can apply more than one to your systems. When we talk about multicloud definitions, they’re all connected through the lens of portability. This series will cover four multicloud definitions: data portability, workflow portability, workload portability and traffic portability.
In this article, we’ll talk about data portability.
Multicloud data portability means having the ability to move data from one cloud provider to another. One important goal is being able to respond to outages, market conditions, changing prices or shifting vendor relationships. Another goal is being able to pick up and move your data as you see fit.
The biggest impediments to large-scale data portability are the speed of light and the cost of bandwidth. Bandwidth and latency will always be big bottlenecks in computing. Network egress charges tend to be relatively high, so cloud architects tend to place workloads and their data on the same cloud to minimize those costs and reduce latency.
This concept is called data gravity. Moving large amounts of data across the network to another cloud costs a lot of money and takes a lot of time. At a certain point, it’s more efficient to physically load disk drives onto a truck and drive them to a new data center.
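To make data gravity concrete, here is a rough back-of-the-envelope sketch in Python. The data size, link speed and egress price below are illustrative assumptions, not figures from any provider’s price list, and the function name `transfer_estimate` is made up for this example.

```python
def transfer_estimate(size_tb, bandwidth_gbps, egress_usd_per_gb):
    """Estimate (hours, dollars) to move size_tb terabytes across the network.

    Assumes decimal units (1 TB = 1000 GB), a fully saturated link and a
    flat per-GB egress price. All three are simplifications.
    """
    size_gb = size_tb * 1000
    size_gigabits = size_gb * 8                 # bytes to bits
    hours = size_gigabits / bandwidth_gbps / 3600
    cost = size_gb * egress_usd_per_gb
    return hours, cost


# Hypothetical scenario: 500 TB over a 10 Gbps link at $0.05 per GB.
hours, cost = transfer_estimate(size_tb=500, bandwidth_gbps=10, egress_usd_per_gb=0.05)
print(f"{hours:.1f} hours, ${cost:,.0f} in egress fees")
```

Under those assumed numbers, the move takes more than four days of sustained transfer and costs about $25,000, which is why shipping physical drives starts to look reasonable as the data set grows.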
Break-Glass vs. Continuous
There are two ways you can architect for data portability:
- Break-glass portability: You want the option to move your data as an escape hatch or potential business decision down the road.
- Continuous replication: You want your data continuously available in multiple cloud regions.
In many cases, continuous replication is not an option for existing systems because the organization didn’t choose or consider it early on. Those organizations can only change what they do with new data systems.
Organizations that are at a starting point for data storage have two fairly distinct choices with very different business costs. Depending on which one you pick, you’ll have to make significant architectural choices to support it.
- Continuous: The costs are constant. You’re actively paying the cost to replicate data across multiple sites. As each additional record comes in, you’re paying an incremental cost for it then.
- Break-glass: The costs are incurred once. You might be accumulating data in one location in a giant data lake, and then when you want to exercise the option to move all that data, your bill is much larger. Instead of paying for ten records every hour, you’re going to pay for a billion records to be moved in a single migration.
If you’re not replicating data on a consistent, smaller basis, the cost of a one-time data move (the blue “Option” line in the accompanying graph) grows over time, while the cost of continuous replication stays flat. However, in many cases, your data increases exponentially, which would make the Option line an upward curve and the Continuous line a rising line instead. Depending on the workload, the cost of any data portability can be extremely high.
- Initial cost: For both types, the initial cost is low because you don’t have much data yet.
- Ongoing cost: The ongoing data portability cost is essentially zero for break-glass portability, but for continuously moving data, you’re gradually paying down the cost of data portability by replicating each new data record to multiple locations.
- Deferred cost: The deferred cost of break-glass portability is very high because if you need to move and replicate all of your data, the bill might be enormous. With continuous replication, you’ve already paid for those data moves in advance each time new records are created.
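The three cost profiles above can be sketched as a small model. This is a simplified illustration, assuming linear data growth and the same per-record transfer cost on both paths. The names `PeriodCost` and `portability_costs` and the numbers used are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeriodCost:
    period: int
    continuous_payment: float    # paid this period to replicate the new records
    break_glass_deferred: float  # one-time bill if everything were migrated now


def portability_costs(new_records_per_period, periods, cost_per_record):
    """Compare the ongoing cost of continuous replication with the deferred
    cost that accumulates under break-glass portability."""
    stored = 0
    timeline = []
    for period in range(1, periods + 1):
        stored += new_records_per_period
        timeline.append(
            PeriodCost(
                period=period,
                continuous_payment=new_records_per_period * cost_per_record,
                break_glass_deferred=stored * cost_per_record,
            )
        )
    return timeline


# Hypothetical workload: 10 new records per hour for 24 hours at $0.01 per record.
for row in portability_costs(10, 24, 0.01)[::8]:
    print(row)
```

In this model the continuous payment is the same every period, while the deferred break-glass bill grows with everything stored so far. With exponential data growth, both numbers would rise, matching the curve described earlier.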
A good analogy is a stock option versus insurance. Break-glass portability is like a stock option: you pay a small cost upfront to have the option, but it’s expensive should you choose to exercise it. Continuous replication is like insurance: you pay your bill every month. It may not be useful to you that often, but when you do need it, it doesn’t bankrupt you.
Speed, Scalability, Resilience, Observability, Manageability
Other factors such as speed, scalability, resilience, observability and manageability often favor the continuous replication path. A small collection of data in the gigabytes to low terabytes can certainly be manageable, resilient, observable and reasonably fast to move in a one-time portability scenario, but those transfers become less practical as the number of transactions and the data size increase.
Small-scale data systems and certain use cases might not need continuous replication in the first place — in these instances, the speed or reliability of continuous replication is irrelevant. Where continuous replication really matters is in systems that want to take advantage of dynamic, cloud native applications where large portions of data need the ability to become instantly available and globally distributed to another region or cloud vendor.
Third Choice: A Plug-and-Play Proprietary Architecture
There is also a direction that gives you neither break-glass portability nor continuous replication: cloud proprietary solutions. If you choose a proprietary cloud database service such as Amazon DynamoDB, Azure Cosmos DB or Google Cloud Spanner, you’re locked in with that cloud vendor. Your data can be very difficult to move if your organization starts using another technology. You’d get all the usual benefits of a commercially supported solution, and portability between that vendor’s solutions would likely be fairly smooth, but locking yourself into one cloud database service could be a substantial risk depending on your domain and needs.
Enabling Break-Glass Portability
For the break-glass route, you need to have a common interface to each of your data regions. This means you’ll want to use open source databases or proprietary solutions that are widely available. Hosted versions of MySQL, PostgreSQL and other open source DBs will work just fine. What matters more than the specific product is having a compatible interface, where your applications speak the same API. For example, you can use Amazon Aurora but still have application-level compatibility with MySQL because of a common API. Your data systems also need to have import/export functionality, so that you can easily move the data out of one location and into another.
You don’t need to be able to do this in real time, since these are batch jobs that are only executed on demand. One downside, however, is that you might have to shut your site down temporarily for the data migration, depending on your datastore and whether it supports incremental export/import.
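As a minimal sketch of the export/import pattern, the following Python example dumps one database to portable SQL text and replays it into another. It uses Python’s built-in `sqlite3` module purely as a stand-in so the example is self-contained. With hosted MySQL or PostgreSQL you would more likely reach for tools such as `mysqldump` or `pg_dump`. The table and helper names are made up for illustration.

```python
import sqlite3


def export_sql(conn):
    """Dump schema and rows as SQL text (the batch export step)."""
    return "\n".join(conn.iterdump())


def import_sql(conn, dump):
    """Replay a SQL dump into an empty target database (the import step)."""
    conn.executescript(dump)


# Source database, standing in for the data store in the original cloud.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
source.executemany("INSERT INTO orders (item) VALUES (?)", [("disk",), ("truck",)])
source.commit()

# Target database, standing in for the data store in the new cloud.
target = sqlite3.connect(":memory:")
import_sql(target, export_sql(source))

rows = target.execute("SELECT id, item FROM orders ORDER BY id").fetchall()
print(rows)
```

Because the dump is plain SQL, the source and target only need to agree on the interface, not on the underlying implementation. Any writes that land in the source after the dump is taken are not captured, which is where the potential downtime comes from.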
Enabling Continuous Replication
For the continuous route, you need systems that support real-time replication. There are several data systems that focus more on this cloud native use case, including CockroachDB, Cassandra and others. These systems provide continuous availability of data across regions. Many traditional database systems also support real-time streaming and replication, but they may have more complex failure modes or may only support active/passive configurations. Continuous replication also needs a compatible interface, just like break-glass portability, but in addition, it requires a compatible implementation.
With break-glass portability, we are doing an export and import, so the implementations can differ. With continuous replication, the application must support replicating data on a transactional level. This implies that you cannot mix implementations, such as AWS Aurora and standard MySQL, to set up continuous replication.
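To illustrate the difference from a batch export, here is a toy Python model of write-time replication. It is only a sketch of the idea: the `ReplicatedStore` class and the location names are hypothetical, and real systems such as CockroachDB or Cassandra handle consensus, conflicts and partial failures that this model ignores.

```python
class ReplicatedStore:
    """Toy key-value store that copies every write to all locations."""

    def __init__(self, locations):
        # One independent in-memory copy of the data per location.
        self.replicas = {name: {} for name in locations}

    def put(self, key, value):
        # Apply the write to every replica before returning, so each new
        # record pays its replication cost at the moment it is created.
        for data in self.replicas.values():
            data[key] = value

    def get(self, key, location):
        # Reads can be served from whichever location is closest.
        return self.replicas[location][key]


store = ReplicatedStore(["cloud-a", "cloud-b"])
store.put("order:1", "disk")
print(store.get("order:1", location="cloud-b"))
```

Every replica has to interpret each write in exactly the same way, which is one way to see why continuous replication needs a compatible implementation and not just a compatible interface.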
Understand Your Needs Clearly Before Choosing
The key to harnessing multicloud data portability is clearly understanding your system needs as early as possible. You might decide that data portability won’t be necessary down the road and that using a cloud proprietary technology is fine. You might want to design a system that at least has the option for a large data migration later in a break-glass scenario. Alternatively, you might choose to design a continuous data replication system so you don’t have a large bill down the line or to support global deployment.
Regardless of your choice, every decision involves upfront system design considerations and an analysis of its business cost profile. It’s also important to note that the cost difference between break-glass portability and continuous replication might vary widely based on the types of workloads you’re running in your system. Your system domain and how much data you have will also be major factors in which path you choose.
The Other Definitions of Multicloud
As this series continues, you can read about the other three definitions of multicloud — workflow portability, workload portability and traffic portability — to understand the trade-offs and enablement patterns for each.
Featured image via Pixabay.