
How Database Virtualization Could Break Vendor Lock-in

4 Jan 2022 10:00am, by
Mike Waas
Mike Waas is the founder and CEO of Datometry, a SaaS database virtualization platform enabling existing applications to run natively on modern cloud data management systems without being rewritten. He has held senior engineering positions at Microsoft, Amazon, EMC, and Pivotal and is the architect of Greenplum’s ORCA query optimizer. He has more than 40 peer-reviewed scientific publications and holds more than 20 patents.

In the age of public cloud, shouldn’t vendor lock-in be a thing of the past? Every other discipline in IT has been transformed in the past 20 years by virtualization. From storage and compute to networking, virtualization has revolutionized the space. And then there’s database. Database is not only one of the oldest branches of IT, but also one of the last bastions of vendor lock-in.

How strong is the grip of legacy database vendors? Frank Slootman, CEO of Snowflake, admitted on a recent earnings call, “It’s not that easy to pick up that workload and move it as it costs a lot of money. [Teradata have] done a good job making it bloody hard.” Slootman is certainly not one to play the vulnerability card easily.

The reasons for this vexing lock-in are twofold. First, workloads on legacy systems are demanding, and anybody who wants to replace these systems needs a strong alternative. Second, rewriting applications to make them work with a new database is an excruciating exercise, and until now the tooling for it has been sorely inadequate.

A new discipline of virtualization, however, has the potential to level the playing field. By virtualizing the database — not the data, and not the hardware on which it runs — applications can run on different databases without having to change their SQL or API calls.

Virtualization has been the big equalizer in IT. Here’s how it can break the lock-in on databases.

How to Virtualize a Database Anyway?

Database virtualization is implemented as a component connecting applications and databases. Effectively, it serves as a hypervisor for queries and API calls. An application written for one database can then run natively on an entirely different database — without having to change its SQL or API calls.

Database virtualization translates queries and data in real time, back and forth. Because it sits in the data path, its expressivity is much higher than that of static conversion of queries. For example, database virtualization can emulate dynamic concepts that require knowledge that is only available at runtime.
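To make the idea of a component in the data path concrete, here is a minimal sketch in Python. It is not how any particular product works. It assumes a legacy dialect that accepts the shorthand SEL and the TOP n clause (modeled loosely on Teradata-style syntax) and a destination, here SQLite, that understands neither. The names translate and VirtualConnection are hypothetical, and a real layer would parse statements properly instead of pattern-matching them.

```python
import re
import sqlite3

# Illustrative rewrite patterns; both are assumptions made for this sketch.
_SEL = re.compile(r"^\s*SEL\b", re.IGNORECASE)
_TOP = re.compile(
    r"^\s*SELECT\s+TOP\s+(\d+)\s+(.*?);?\s*$", re.IGNORECASE | re.DOTALL
)


def translate(legacy_sql):
    """Rewrite one legacy statement into the destination dialect."""
    # Expand the SEL shorthand to SELECT.
    sql = _SEL.sub("SELECT", legacy_sql, count=1)
    # Move "TOP n" to a trailing "LIMIT n", which SQLite understands.
    match = _TOP.match(sql)
    if match:
        sql = f"SELECT {match.group(2)} LIMIT {match.group(1)}"
    return sql


class VirtualConnection:
    """Sits between the application and the destination database."""

    def __init__(self, destination):
        self._destination = destination

    def execute(self, legacy_sql):
        # The application keeps sending legacy SQL; translation happens here.
        return self._destination.execute(translate(legacy_sql)).fetchall()


# Usage: the application's query is unchanged, the destination is SQLite.
destination = sqlite3.connect(":memory:")
destination.execute("CREATE TABLE customers (name TEXT)")
destination.executemany(
    "INSERT INTO customers VALUES (?)", [("Cy",), ("Ann",), ("Bob",)]
)
app_db = VirtualConnection(destination)
rows = app_db.execute("SEL TOP 2 name FROM customers ORDER BY name;")
```

The application only ever talks to VirtualConnection, so swapping the destination means changing the translation rules, not the application.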

Interestingly, there is no hard limit as to what database virtualization can emulate. It can emulate data types or concepts for which no real equivalent exists in the new destination database. Stored Procedures, recursive views, proprietary data types, or Global Temporary Tables are just a few concepts database virtualization emulates efficiently.
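As one illustration of emulating a concept the destination lacks, consider a Global Temporary Table: its definition is shared by everyone, but each session sees only its own rows. The sketch below is an assumption-laden toy, not a vendor implementation. The classes GlobalTempTableEmulator and Session are hypothetical, each session is modeled as its own in-memory SQLite connection, and the caller names the tables a statement touches because the sketch does no SQL parsing.

```python
import sqlite3


class GlobalTempTableEmulator:
    """Keeps the shared catalog of global temporary table definitions."""

    def __init__(self):
        self._definitions = {}  # table name -> column definition SQL

    def define(self, name, columns_sql):
        # The definition is recorded once and is visible to every session.
        self._definitions[name] = columns_sql

    def open_session(self):
        return Session(self._definitions)


class Session:
    """One application session with its own private copy of the rows."""

    def __init__(self, definitions):
        self._definitions = definitions
        self._conn = sqlite3.connect(":memory:")
        self._materialized = set()

    def execute(self, sql, uses=()):
        # Only at runtime do we know whether this session has already
        # touched the table; materialize it lazily on first use.
        for table in uses:
            if table in self._definitions and table not in self._materialized:
                columns = self._definitions[table]
                self._conn.execute(f"CREATE TEMP TABLE {table} ({columns})")
                self._materialized.add(table)
        return self._conn.execute(sql).fetchall()


# Usage: two sessions share the definition but not the data.
emulator = GlobalTempTableEmulator()
emulator.define("staging", "id INTEGER")
first, second = emulator.open_session(), emulator.open_session()
first.execute("INSERT INTO staging VALUES (1)", uses=["staging"])
first_rows = first.execute("SELECT id FROM staging", uses=["staging"])
second_rows = second.execute("SELECT id FROM staging", uses=["staging"])
```

The decision to create the table depends on session state, which is exactly the kind of knowledge a static query converter does not have.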

Viewed differently, database virtualization automates the work of consultants. Instead of laborious rewrites, including modifications to application logic to work around actual or perceived limitations, database virtualization does the conversion in real time. Unlike a consultant, database virtualization scales, does not fatigue, and leaves far less room for error.

Data Virtualization Is Not Database Virtualization

While the concept of database virtualization has started attracting attention and several products have become available, there is still a lot of skepticism and confusion. To be fair, the notion of virtualization has been used loosely in the context of databases. From data virtualization to the abstraction of database files, the label virtualization is used liberally.

The concept of data virtualization has been around for decades and may warrant a clean delineation. Data virtualization is the idea of using a universal query language across all applications. While this seems beneficial in the long run, it requires all applications to be rewritten to use this artificial language.

This does not solve the original problem: to overcome vendor lock-in, the enterprise has to, well, overcome vendor lock-in in the first place. Put differently, data virtualization is certainly aspirational, but it does not solve the actual problem of moving existing applications between databases.

When Not to Use Database Virtualization

A standard reaction by technologists when first introduced to database virtualization is “too good to be true.” Then follows a barrage of highly specific questions about intricate features and edge cases. And while there is no theoretical limit as to what database virtualization can emulate, there are situations where it might not be the right tool for the job.

Not surprisingly, database virtualization works best when the destination system has equal or at least similar processing prowess. Virtualizing a workload of, say, 5 million daily statements onto a system that can barely handle 2 million is a nonstarter. In this situation, rewriting the workload and crafting a bespoke solution for the destination system may be feasible, albeit costly.

Another poor match is a small system with only a few applications, or even just a single one. The situation is akin to running VMware on a laptop. Usually, that’s not a good idea. Retooling the application may be preferable. Then again, vendor lock-in doesn’t pose a significant problem in this case anyway.

To decide if database virtualization is the right fit, one needs to ask their vendor for an upfront analysis. A reputable vendor will give a detailed breakdown of workloads, features used, and overall suitability.
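What such a breakdown might contain can be sketched in a few lines of Python. This is only a rough illustration of the idea, not any vendor's analysis: the function summarize_workload and the patterns in FEATURE_PATTERNS are hypothetical, and a real assessment would parse a full query log instead of matching keywords.

```python
import re
from collections import Counter

# Hypothetical detectors for features that may need emulation.
FEATURE_PATTERNS = {
    "stored_procedure_call": re.compile(r"^\s*CALL\b", re.IGNORECASE),
    "recursive_query": re.compile(r"\bWITH\s+RECURSIVE\b", re.IGNORECASE),
    "global_temporary_table": re.compile(
        r"\bGLOBAL\s+TEMPORARY\b", re.IGNORECASE
    ),
}


def summarize_workload(statements):
    """Count statements and tally which tracked features they use."""
    counts = Counter()
    for statement in statements:
        for feature, pattern in FEATURE_PATTERNS.items():
            if pattern.search(statement):
                counts[feature] += 1
    return {"total_statements": len(statements), "features": dict(counts)}


# Usage with a tiny, made-up sample of logged statements.
sample_log = [
    "CALL refresh_sales()",
    "SELECT 1",
    "CREATE GLOBAL TEMPORARY TABLE t (id INT)",
    "WITH RECURSIVE r(n) AS (SELECT 1) SELECT * FROM r",
]
report = summarize_workload(sample_log)
```

Even a crude tally like this shows how much of a workload relies on features the destination would have to emulate.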

Off to the Races

Database virtualization is still a new discipline. However, the concept is straightforward and intuitive. It has sent the first shockwaves across the industry already. As enterprises increasingly move core data assets to the cloud, this new virtualization technology will certainly become more prominent.

The push of enterprises to the cloud not only created an environment that makes database virtualization attractive, but it might very well need database virtualization to succeed. Without it, migrating the current installed base of databases with conventional means to the cloud will put a burden of well over $100 billion on the economy. With it, that barrier will be lowered by more than an order of magnitude.

Enterprise data management has an exciting future ahead of it — vendor lock-in need not be part of it.
