Data 2023 Outlook: Rethink the Modern Data Stack
As we enter the fourth year of the 2020s, it’s no secret that the global economy remains in upheaval. Will there be a recession or not? It’s anyone’s guess, and that uncertainty took a toll on what had been hot tech growth stocks like Snowflake, while providing reprieves for less sexy, old-guard mainstays like IBM and VMware.
The bulk of the breakneck growth has been in the cloud. Since the dawn of the pandemic, the narrative has centered on COVID accelerating the existing secular trend toward cloud adoption. Going forward, the usual suspects are still predicting cloud spending growth in 2023, but Snowflake’s lowered Q4 guidance points to a possible narrative of “what goes up must (at some point) come down,” or at least slow down.
In this post, we voice our predictions on the operational side of cloud data platforms and analytics. Tomorrow, in a follow-up post, we’ll turn the focus to what’s coming on the management side of data. But first, let’s understand the bigger picture: what is happening, and why.
This won’t be Dot Com Bust 2.0.
Here’s the general context: Despite, or because of, economic uncertainty, cloud adoption will continue to advance.
There’s little debate that the cloud is no longer just a budgeting maneuver for shifting costs from the capital to the operating budget; that makes overall adoption fairly resilient to swings in the economy. It is an enabler and accelerator for business transformation, as it removes many of the barriers to launching new apps and business services, and it provides the flexibility to change gears far more readily than systems running on their own dedicated infrastructure inside the data center.
Economic uncertainty simply ramps up the pressure for businesses to transform. So, nope, this won’t be a repeat of the dot-com bust. Even as internet giants like Amazon, Meta, and maybe even Google shed jobs, enterprises in the mainstream economy will gladly soak up all the cloud infrastructure, cybersecurity, data science, and AI expertise they can find.
While we’re not about to see retrenchment in cloud adoption, we will see more scrutiny of cloud spend. Cloud compute and storage might be cheap, but a lot of cheap eventually gets expensive. More to the point, that scrutiny will show up in more ways than arbitrary budget caps; we expect it will also surface in platform choices and in a desire to streamline the modern data stacks that, until recently, the cloud had been disaggregating. The cloud was supposed to make IT simple, and in the coming year, enterprises will hold the hyperscalers’ feet to the fire to live up to that promise.
Of course, there are multiple planes of attack for optimizing cloud spend. There is a well-established and varied ecosystem of solutions, ranging from monitoring tools from the hyperscalers (e.g., AWS CloudWatch) to providers like BMC, CloudBolt, Datadog, Dynatrace, Flexera, Micro Focus, ServiceNow, VMware, Yotascale, Zerto, and many, many others addressing cost control, security, governance, observability, and workload optimization. Many of these solutions can drill down to granular reporting of consumption by SaaS service, app, and line organization. Consider this universe of tools the latter-day cloud manifestation of traditional IT service management and IT chargeback solutions.
There’s another side to this coin that goes beyond the traditional single pane of glass: how services are delivered. AWS, for instance, offers hundreds of services addressing everything from analytics to application integration, contact center, containers, databases, gaming, IoT, machine learning, quantum computing, security, storage, and others. Trying to optimize across this horn of plenty is challenging enough. That brings us back to our core point: complexity is the enemy of efficiency, and complexity adds cost. Data is the lane we live in, and in data we have an all-too-inviting target: The Modern Data Stack.
So let’s focus our sights on how data management gets rationalized. Time to cut to the chase.
Simplify the Modern Data Stack for the Cloud
If you’re aiming to be smarter about how you use the cloud, complexity is your enemy. We ranted about this last year in our post, When Will the Cloud Get Simpler?
We expect to see a refactoring of what’s been termed the “Modern Data Stack,” as described by providers as diverse as Fivetran and MongoDB. That stack has typically encompassed a data pipeline for harvesting, transforming, and ingesting data (the modern-day successor to ETL tools), the data warehouse, and the various visualization and analytic tools for gaining insights. To all this, we would add the operational or transaction database, which, more often than not, is the primary source for this data.
What made the data stack modern? Well, it is hosted and delivered in the cloud and it takes advantage of the cloud’s elasticity. OK, that’s a start; customers no longer have to worry about provisioning or the housekeeping for patches and upgrades, and with many of these modern data stack services being serverless, there’s a lot less upfront hassle and a lot more flexibility.
But that’s just not enough. The modern data stack boasts an almost too-rich array of data and analytics SaaS services, and while each SaaS service makes its own process simpler to launch and manage, customers are still on the hook for integrating them. And did we neglect to mention that these toolchains can get highly complex?
We’ve called on database and analytics SaaS providers to, literally, get it together. Make life simpler for the customer. Simpler is more economical, and simpler is smarter. Fewer wasted cycles and less expense for the customer, more consumption of value-added services for the provider. Everybody should win. Over the past year, we saw a few providers start feeling customers’ pain, and this is where we expect to see more positive responses in 2023.
Look for Bundling
The low-hanging fruit is offering combos of services that are frequently used together. This has long been an established pattern in the on-premises world. Here are some predictions and suggestions on where we expect to see more linkages this year.
Bundling presents a golden opportunity for hyperscalers to add new platinum tiers to their partner programs. It would involve extending core database and analytic services by pre-integrating and bundling popular third-party services, offering them at promotional pricing, and stitching them together with under-the-hood orchestration. The goal is to shift the integration burden off the customer’s shoulders, work with the most popular partners, and price those combos attractively to stimulate adoption.
Let’s throw out some examples. Add light analytics to transaction databases, and for more “serious” use cases (where you don’t want analytics to slow down transaction processing), prepackage change feeds to data warehousing services where you can perform ELT. And as for ELT, have ready-made integrations in the target. That’s where the opportunity for competition comes in. AWS Glue, Azure Data Factory, and Google Cloud Data Fusion will have their home-court advantages with Redshift, Synapse Analytics, and BigQuery, respectively. But developers are not about to abandon their own favorites like dbt or Fivetran. That’s where your partner program kicks in with bundled, pre-integrated stacks. And, by the way, the same holds true for analytics and AutoML services.
In-database machine learning has already become a checkbox feature for cloud data warehousing services, although the degree to which data must be moved still varies by provider. Blending light, operational analytics into transaction databases is also already happening. Google and Oracle introduced API-compatible implementations of PostgreSQL and MySQL, respectively, that combine the trifecta: transaction processing, analytics, and in-database AutoML. Meanwhile, SingleStore reinvented tiered storage and indexing. And even Snowflake has gotten into the act, dipping its toes into lightweight transaction processing with Unistore.
Of course, the so-called augmented analytics or translytical database (the label depends on whether you use Gartner’s or Forrester’s terminology) is not new. Pairing column and row stores is a practice that dates back over a decade, with IBM BLU, Oracle Database In-Memory, and MariaDB SkySQL, among others.
But as noted above, let’s not stop there. Blend in ELT. AWS at least simplified the Aurora-to-Redshift data pipeline with a prebuilt zero-ETL change data capture feed. Google already builds change data capture support into BigQuery, while Azure Synapse Analytics pre-integrates Azure Data Factory. For almost every analytic platform, there is also plenty of opportunity to blend in streaming by integrating data flow pipelines and Kafka PubSub feeds. Customers should not have to configure each of these integrations themselves and pay a la carte pricing.
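To make the idea of a change feed a bit more concrete, here is a minimal Python sketch of what a prebuilt change data capture integration does under the hood: replay insert, update, and delete events from the transaction database into a target table in commit order, then run a transformation inside the target (the “T” in ELT). The ChangeEvent record, its fields, and both helper functions are hypothetical, chosen for illustration; they do not reflect the event format of Aurora zero-ETL, BigQuery, or any other provider, and real feeds must also handle schema changes, ordering guarantees, and delivery semantics.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class ChangeEvent:
    """Hypothetical CDC record emitted by the source transaction database."""
    op: str                    # "insert", "update", or "delete"
    key: int                   # primary key of the affected row
    row: Optional[Row] = None  # new row image; None for deletes


def apply_change_feed(target: Dict[int, Row], events: List[ChangeEvent]) -> Dict[int, Row]:
    """Replay events in commit order so the target table mirrors the source."""
    for ev in events:
        if ev.op in ("insert", "update"):
            target[ev.key] = dict(ev.row or {})  # upsert the latest row image
        elif ev.op == "delete":
            target.pop(ev.key, None)             # tolerate deletes of unknown keys
        else:
            raise ValueError(f"unknown op: {ev.op}")
    return target


def total_by_region(table: Dict[int, Row]) -> Dict[str, float]:
    """The in-warehouse transformation step: aggregate loaded rows by region."""
    totals: Dict[str, float] = {}
    for row in table.values():
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


if __name__ == "__main__":
    feed = [
        ChangeEvent("insert", 1, {"region": "east", "amount": 100}),
        ChangeEvent("insert", 2, {"region": "west", "amount": 50}),
        ChangeEvent("update", 1, {"region": "east", "amount": 120}),
        ChangeEvent("delete", 2),
        ChangeEvent("insert", 3, {"region": "east", "amount": 30}),
    ]
    warehouse_table = apply_change_feed({}, feed)
    print(total_by_region(warehouse_table))  # {'east': 150}
```

The point of a bundled integration is that the customer never writes or operates this replay loop; the provider runs it, and the customer only writes the transformation.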
Separate platforms for transaction databases and data warehouses won’t go away, and end users won’t give up their visualization or reporting tools. But we expect hyperscalers and third-party SaaS services to get more creative with blending, bundling, and pricing. Here’s a potential example. In Google Cloud, several data services (e.g., AlloyDB, BigQuery, Dataproc) share common storage. With unified governance, provided courtesy of Google’s Dataplex data fabric, the data could be selectively surfaced by the engine of choice, paid for on a per-use basis.
Serverless Plays a Supporting Role
Another move toward smarter cloud consumption is the growth of serverless. It provides an obvious form of simplification: customers no longer need to worry about provisioning or capacity sizing. A critical mass of hyperscaler databases already offer it; for instance, with OpenSearch Serverless, AWS just filled the last serverless gap in its analytics portfolio. We’d like to see newer entries, like Oracle MySQL HeatWave and Google AlloyDB, offer serverless options as well. Of course, serverless is not the answer for everything: if your workloads are stable and/or predictable, reserved instances are usually the better way to go. But for new workloads, serverless removes friction, not to mention the cost of overprovisioning, which should encourage more development for the dollar. Serverless can be an entry-level stage for new workloads, which can shift to reserved instances as their uptake matures.
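The tradeoff between serverless and reserved capacity comes down to utilization, and a back-of-the-envelope model shows where the crossover sits. This Python sketch assumes a deliberately simplified pricing model: serverless bills only for active hours, reserved capacity bills for every hour of the month, and both rates are hypothetical placeholders rather than any provider’s actual prices. Real bills add per-request charges, minimum capacity units, and commitment discounts that this sketch ignores.

```python
"""Back-of-the-envelope comparison of serverless vs. reserved capacity.

All rates are HYPOTHETICAL placeholders, not any provider's actual prices.
"""

HOURS_PER_MONTH = 730  # approximate hours in a month, common in cloud cost estimates


def serverless_cost(utilization: float, rate_per_active_hour: float) -> float:
    """Serverless: pay only for the fraction of hours the workload actually runs."""
    return utilization * HOURS_PER_MONTH * rate_per_active_hour


def reserved_cost(rate_per_reserved_hour: float) -> float:
    """Reserved: pay for every hour of the commitment, busy or idle."""
    return HOURS_PER_MONTH * rate_per_reserved_hour


def breakeven_utilization(rate_per_active_hour: float, rate_per_reserved_hour: float) -> float:
    """Utilization (0..1) above which reserved capacity becomes the cheaper option."""
    return rate_per_reserved_hour / rate_per_active_hour


if __name__ == "__main__":
    SERVERLESS_RATE = 0.50  # hypothetical $ per active hour
    RESERVED_RATE = 0.20    # hypothetical $ per committed hour
    for u in (0.10, 0.25, 0.80):
        s = serverless_cost(u, SERVERLESS_RATE)
        r = reserved_cost(RESERVED_RATE)
        cheaper = "serverless" if s < r else "reserved"
        print(f"utilization {u:.0%}: serverless ${s:,.2f} vs reserved ${r:,.2f} -> {cheaper}")
    print(f"break-even utilization: {breakeven_utilization(SERVERLESS_RATE, RESERVED_RATE):.0%}")
```

Under these made-up rates, the break-even point is 40% utilization: a new, spiky workload running a fraction of the time is cheaper on serverless, while one that settles into steady, high utilization becomes a candidate for reserved instances, which is the maturation path described above.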
While serverless is pretty well entrenched with cloud data warehouses, analytics, and machine learning services, we expect more transaction and operational services to add the option this year.
What about Multicloud?
All too much verbiage has been written about a core fact of cloud life: Most organizations are going to use more than one cloud. We’ve railed on and on about the administrative overhead that managing multiple clouds will bring on. But as with plate tectonics, there’s no realistic option of turning back, and besides, for competitive reasons, should any organization tie its fortunes to one cloud?
Our take is that multicloud is about freedom of cloud: the freedom to run a workload on the cloud of your choice. We have not been big believers in running the same workload, or database, across multiple public clouds, given the latencies, varying security and access management structures, and infrastructure differences across hyperscalers. Of course, nature abhors a vacuum, and this is where a variety of third parties are jumping in. With regard to data services, freedom of cloud is prominent in the messaging from the likes of Databricks, Snowflake, MongoDB, and others. They promise that, whatever the infrastructure and administrative differences among hyperscalers, their database services will look operationally the same regardless of which cloud you run in.
Still, multicloud is the next frontier for simplification. We’ll throw a shout-out to SiliconANGLE for identifying an emerging tier of the cloud ecosystem that they term Supercloud. But in the meantime, hyperscalers will have their hands full simplifying the rat’s nest of connections in their own backyards.
Tomorrow, we’ll take a look at what 2023 will mean for the Data Nerds.