Data, Analytics and AI: What Will Happen in 2023
As we roll into the first full business week of 2023, it’s a good time to consider what’s likely to take place in the data and analytics arena this year. There’s no shortage of opinions out there, of course. As in past years, as 2022 came to a close, public relations folks happily shared with me industry predictions from their clients’ brightest tech leaders and executives. And, also as in years past, I took on the project of collecting the predictions, sorting through them and trying to make some sense of them.
Though this is my first year writing up predictions for The New Stack, this year’s batch, larger than those of years before it, fit well into the tradition. The document I copied and pasted all the predictions into is over 40 pages long, and that’s not counting predictions from a couple of sources that were so large as to live in their own documents. The only way to make sense of such a huge volume of prognostications is to read through them all and see where themes and commonalities emerge.
That’s what I’ve done here and, as a result, I present this year’s predictions broken down by: the impact of the economy on the world of data and analytics; how automation will impact the data arena; the role of data governance, data security and access; how the Data Mesh approach to analytics will play out this year; the importance of data sharing; the emergence of data observability; data and analytics’ impact on the cloud (and vice versa); trends in containers and Kubernetes; and, of course, what is to come in the realm of artificial intelligence.
That’s enough preamble. Let’s now take a look at what the industry believes awaits it, and us, in 2023.
Recession Drives Efficiency, More Judicious Spending
It seems like we spent most of 2022 hearing about an impending recession that still has not arrived. It’s no surprise, then, that a number of predictions for 2023 focus on how an anticipated recessionary environment will manifest itself in the data and analytics world.
Mark Van de Wiel, Field Chief Technology Officer at Fivetran, gets to the point quickly: “the global economy is having a cold; perhaps it caught COVID. As a result, this is putting the brakes on lavish spending on modern data stack projects.” But Van de Wiel doesn’t think it’s all doom and gloom, adding “however, 2023 will still bring more data projects. They will just be more focused on a particular use case, or driving efficiencies in general.”
The notion that data projects must show efficacy in order to be funded is shared by Matillion CEO Matthew Scullion, who states that “…economic conditions are going to be a forcing function for enterprises to realize that the key to competitive success will be making sure a substantial percentage of that data becomes useful.” Alexander Lovell, Head of Product at Fivetran, says “2023 will be put up or shut up time for data teams.” Lovell says this not just because data teams will need to justify the costs and expenditures they incur, but because recessionary times actually make data-driven insights more valuable “because executive intuition is less reliable when markets are in flux.” As a result, Lovell thinks “the best data teams will grow and become more central in importance. Data teams that do not generate actionable insight will see increased budget pressure.”
Dremio’s co-founder and Chief Product Officer, Tomer Shiran; Staff Product Manager, Ben Hudson; and Director of Product Management, Jason Hughes, collectively write that “customers are going to be much more cautious… and will put up walls of approvals and rules regarding who’s allowed to use and access what.” While Dremio said this specifically in the context of its competitor Snowflake, the fact remains that this cautious approach is a danger in general. It harkens back to the old (expensive) days of data warehousing and is antithetical to the more contemporary idea that analytics should be used wherever and whenever possible. And it’s not just Dremio articulating this warning. Matillion’s Scullion offers his equally wary opinion that “in the face of a recession, incurring surplus cloud costs just to store… data, regardless of its usefulness, won’t be advisable.”
Automation, Not Headcount
Staying with the theme of “doing more with less,” the value of automation should not be underestimated. A good summary of the automation thesis comes from Satish Jayanthi, Chief Technology Officer and co-founder of Coalesce, who believes that “rather than continuing to grow teams, companies… will look towards ways to automate data processes that they once did manually.”
There’s lots of consensus around this. For example, Dremio’s folks say “Cost-saving data lakehouse automation will rise as lakehouses become mainstream.” Ulfar Erlingsson, Chief Architect at Lacework, says “CISOs will need to look for third-party technologies…based mostly on automation… to augment the capabilities and strengths of their teams.” And Exasol’s SVP of Product & Innovation, Jens Graupmann, says “In 2023, the role of metadata in the data ecosystem will continue to grow, spurred by… the need to speed up data delivery through the automation of data pipelines and warehouse automation.”
And there’s more. Anmol Bhasin, CTO at ServiceTitan, says “automation through AI is going to be a major factor in staying competitive.” Privitar’s CEO, Jason du Preez, says data security platform “…policies, based on the logical use of metadata, will enable automation of security and privacy controls, providing scale and flexibility.”
GigaOm Analyst Paul Stringfellow says customers will want to “automate as much as [they] can. Because people have a lack of skilled resources, I think they are looking at IT vendors to solve those problems for them, [to] ‘automate processes for me’.”
Data Governance, Security and Access
But what about data management? We do have lots of predictions here, both for its subcategories and even for specific individual technologies. One big area here is around data governance, and how that cascades into security and access.
Fivetran’s Van de Wiel isn’t shy about this: “Data security and access to data will be top topics of concern for C-suites in 2023,” he says, acknowledging how this discipline, once considered very dry and of importance only to IT and database administrators, is now a concern at the highest executive levels. Van de Wiel sees data governance as the solution, saying “Governance is going to address the challenges. Good data governance enables the citizen analyst, driving efficiencies for data teams. Integration between various vendors will become mainstream, and it will drive investment.”
And it’s not just conventional, structured data that needs governing, according to one of our predictors. Dr. Suzanne Weller, head of research at Privitar, thinks “unstructured data will need the same degree of governance and privacy protection as has been achieved for structured and semi-structured data sources.”
And the governance context keeps growing. Dima Spivak, Chief Operating Officer of Products at StreamSets, thinks this issue extends past data itself, and includes data pipelines as well: “As data integration technology becomes more accessible, the focus will shift to operationalizing those technologies. The ability to scale simple-to-build pipelines and comply with enterprise governance requirements will be seen as more important than simply being able to connect to lots of environments.”
Sophie Stalla-Bourdillon, Immuta’s Senior Privacy Counsel and Legal Engineer, even sees a nexus between governance and enterprises’ multicloud strategies: “organizations… will build multicloud data analytics environments and will have to abstract or federate their governance layer to adequately govern data in multiple locations.”
Weaving a More Sophisticated Data Mesh
Exasol’s Graupmann even sees data mesh (2022’s analytics paradigm darling) and data governance clustering together: “One growing trend is the adoption of a data mesh architecture, which many organizations view as an answer to their challenges around data ownership, quality, governance, and access. A data mesh offers a distributed model where domain experts have an ownership role over their data products.”
What is a “data product”? Often, it’s a curated data set, API or application that allows business domains to share and control their data. Vishal Singh, head of data products at Starburst Data, sees data products as taking off in 2023: “Data Products provide a self-service component that enables [successful organizations] to fill in the gaps between data creation and consumption. In the coming year, we’ll see more organizations laying these foundations of democratizing data access and/or strengthening their Data Mesh framework through Data Products.”
There are some cautionary predictions around data mesh, as well. Graupmann at Exasol believes “Data mesh initiatives [will] gain momentum, but misinformation threatens to slow adoption.” The misinformation Graupmann refers to includes the conflation of data mesh and data fabric, the flawed notion that a data mesh is something you can simply buy, and also trepidation that the data mesh approach can exacerbate data silos. To address these dangers, Graupmann says that “companies must take responsibility for educating themselves to strengthen their understanding of what a data mesh is and how it can optimize their data management strategy.”
Data Sharing, Beyond the Ad Hoc
In addition to data products, sharing of data, more broadly speaking, looks to be a growing priority in 2023. Haoyuan Li, founder and CEO at Alluxio, summarized one of his predicted trends for 2023 by saying that “demand for simplified data access and data sharing is on the rise.” Fermín J. Serna, Databricks’ Chief Security Officer, agrees and credits open source technology for the trend: “Organizations will adopt secure data sharing based on open standards.” Referring to the Databricks-developed but open source Delta Sharing technology, Serna continues: “advances in secure data-sharing standards based on open source change the equation because enterprises now have a way of securely sharing their data while avoiding the pitfalls of opacity and getting locked into a particular vendor.”
Privitar’s Weller echoes the sentiment, saying “privacy-enhancing technologies (PETs)… will take center stage in 2023… facilitating safe data sharing across organizations and borders.” Scott Gnau, vice president of data platforms at InterSystems, sees this phenomenon establishing itself in the context of the Internet of Things (IoT), too. Gnau says that “in 2023, we will see innovators take IoT technologies to the next level by progressing from traditional, unidirectional IoT models, toward bidirectional training and sharing of IoT data.” Finally, Manjusha Madabushi, chief technology officer and co-founder of Talentica Software, thinks blockchain figures into the data sharing proposition as well, sharing this prediction: “in 2023 we will see blockchain data sharing system architectures empower users to gain control over the ownership of their data.”
There you have it: open source, IoT and blockchain, all exciting arenas in and of themselves, tie into the data-sharing story, according to our panel of experts.
Data Quality Begets Data Observability
Focusing on the ability to monitor data quality, profiles, volumes and more, data observability is the subject of several predictions for data and analytics in 2023. Data observability began as an offshoot of data quality, and it’s gaining in the market. As a matter of fact, William McKnight of GigaOm, and McKnight Consulting Group, believes that “data quality [will be] subsumed into data observability” (emphasis mine).
Shadi Rostami, senior vice president of engineering at Amplitude, simply states that “data observability will become a critical industry.” Fivetran’s Lovell agrees, saying “observability will be an important trend.” Lovell further states that “with solid observability in place, organizations have fewer regulatory hoops to jump through, driving efficiencies and cost savings.”
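For readers who want the term made concrete, here is a minimal sketch, in Python, of the kind of check a data observability tool might run on each incoming batch of data: one test on volume and one on quality (the share of missing values per column). Everything in it is hypothetical and invented for illustration, including the `Thresholds`, `profile` and `check` names, the sample orders and the limits; none of it comes from the vendors quoted here, and real products track far more than this.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical limits; a real tool would likely learn these from history."""
    min_rows: int          # fewest rows a healthy batch should contain
    max_null_rate: float   # highest tolerated share of missing values per column


def profile(rows: list[dict], columns: list[str]) -> dict:
    """Compute the row count and the per-column null rate for one batch."""
    total = len(rows)
    null_rates = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        # An empty batch is treated as fully missing rather than dividing by zero.
        null_rates[col] = missing / total if total else 1.0
    return {"row_count": total, "null_rates": null_rates}


def check(stats: dict, limits: Thresholds) -> list[str]:
    """Return one human-readable alert per violated threshold."""
    alerts = []
    if stats["row_count"] < limits.min_rows:
        alerts.append(
            f"volume: {stats['row_count']} rows, expected at least {limits.min_rows}"
        )
    for col, rate in stats["null_rates"].items():
        if rate > limits.max_null_rate:
            alerts.append(
                f"quality: {col} null rate {rate:.0%} exceeds {limits.max_null_rate:.0%}"
            )
    return alerts


# Illustrative batch: too few rows, and most of the "amount" values are missing.
batch = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": None},
]
stats = profile(batch, ["order_id", "amount"])
alerts = check(stats, Thresholds(min_rows=5, max_null_rate=0.5))
for alert in alerts:
    print(alert)
```

The sketch keeps profiling separate from alerting, so the same statistics could be stored over time and compared against history rather than against fixed limits, which is closer to what the category promises.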
Bassam Khan, vice president of product and technical marketing engineering at Gigamon (not to be confused with GigaOm), sees a security angle here too, saying “deep observability will help detect rogue activities.” Providing more detail, Khan explains that “deep observability coupled with MELT (metrics, events, logs, traces) can help discover rogue activities coming from users, cloud app developers, or bad actors such as running P2P applications, cryptomining servers, etc.”
Data and Cloud, Tech’s Chicken and Egg
We’ve covered six categories of predictions already and, somehow, we haven’t touched on the cloud yet, so let’s fix that now. This year’s batch of predictions has a lot to say on how the cloud will change data and analytics, as well as how the latter will impact enterprise cloud strategy.
Starting with a simple, categorical assertion, Immuta’s Stalla-Bourdillon believes that “Cloud-based data analytics will become mainstream.” And Privitar’s du Preez predicts that data projects in the cloud won’t just be mainstream, but will be the very catalyst of cloud momentum, with his prediction that “pragmatic, data-focused programs [will] replace general cloud migration initiatives.”
David Meyer, SVP of Product Management at Databricks, takes it a step further, asserting that “the data platform will emerge as the central driver of multicloud strategy.” Meyer substantiates that claim with this more elaborate comment: “as enterprises consolidate all of their data-oriented use cases on a data lakehouse, they will prioritize their cloud vendor decisions based on data workload needs including ease of use, performance, regulatory compliance, and unified management across clouds.” I would add that this applies to data warehouse use cases as well.
Beyond the data platform generically, Randall Ward, CEO at Appfire, thinks geographical control of data will be a gating factor in cloud provider selection, opining that “data residency across many geographies will become a key differentiator when selecting cloud providers.” Steven Mih, co-founder and CEO of Ahana, thinks that the “Open Source SaaS Market will shift toward Open Source Managed Services.” Drilling down on this a bit, Mih says that “as data and analytics workloads proliferate in the public cloud accounts, and as IT departments demand more control of their own data and applications, we’ll see the adoption of more cloud native managed services instead of full SaaS solutions.”
Kubernetes on Auto Pilot?
It’s hard to talk about cloud and cloud native technology without at least thinking about container technology and Kubernetes (K8s), the industry standard open source technology for container orchestration. But after taking the world by storm, has K8s lost its sheen? Many of our prediction all-stars seem to be getting K8s-jaded. It’s not that they think K8s is going away but rather that customers are losing interest in managing it themselves.
For example, Alluxio’s Li thinks that “Containers provide many benefits, [but] the transition to containers is very complex. As a result, in 2023 the main bottleneck to container adoption will be the shortage of talent with the necessary skill set for tools like Kubernetes.” StreamSets’ Spivak is right there with Li: “I suspect that we’ll see less emphasis on self-hosting technology stacks like Kubernetes. While such services will continue to be extremely valuable, companies will likely want to get out of managing such complex systems and leave doing so to a smaller number of service providers.”
At GigaOm, where I am also an analyst, my late colleague Michael Delzer had this to say: “The battle between K8s and managed PaaS solutions will remind IT veterans of the Java vs .NET campaigns… vendors like SAP, Oracle, and Broadcom will be moving their apps to either PaaS or SaaS solutions.” Michael Delzer, unfortunately, passed away on Jan. 2, but he left us with those sage words in December. After a long and distinguished enterprise technology career, his words should not be taken lightly.
Altair’s chief scientist, Rosemary Francis, meanwhile, thinks that for AI workloads (those pertaining to deep learning, at least), K8s may be going away. Francis comments that “while initially most machine learning workloads were run on Kubernetes or other container orchestration frameworks, it’s become clear that these systems are designed for microservices, not for the bursty, computer-intensive machine workloads now required for deep learning.” Given that Altair makes the kind of high-performance computing iron on which Francis feels these workloads should run, the prediction may be self-serving. Then again, that’s true of almost all tech predictions.
Artificial Intelligence: of, by and for Data
And speaking of AI, the topic has been so hype-laden for so long, that I felt we should save it for the end, after we’ve considered other parts of the tech stack. AI is likely going much more mainstream this year as, with the emergence of GPT-3, ChatGPT, Megatron-Turing and other large language models (LLMs), AI has suddenly become more impressive, more impactful, and more accessible to people without data science skills.
But here’s a new angle: using these models not to generate plain natural language text or even conventional programming code, but instead to generate SQL. Dave Simmen, Ahana’s co-founder and CTO, puts it this way: “SQL workloads will explode as more NLP (Natural Language Processing) and other Machine Learning (ML) applications generate SQL.” We’ve had natural language query technology for a long time: ThoughtSpot is premised on it; Power BI has supported it since its inception, and Tableau added it years ago. But using an LLM to generate the SQL, with far more tolerant interpretation of the plain-English (or other-language) input presented to it? That could really make data culture far more attainable, for more companies, more quickly.
It would be more than a bit obtuse to look at AI predictions without surveying a few folks from Nvidia, the company that is commercializing so much of the AI coming out of research laboratories today. Below are predictions from Nvidia that span the worlds of LLMs, enterprise software, healthcare and retail.
Kari Briski, Nvidia’s vice president, AI and HPC Software, sees “the rise of LLM applications” as enabling them to span past their current use cases “across a multitude of diverse organizations by everyone from business executives to fine artists.” Briski also sees LLM expertise spreading “to languages and dialects far beyond English, as well as across business domains, from generating catalog descriptions to summarizing medical notes.”
Briski’s colleague, Manuvir Das, vice president of enterprise computing, agrees and sees this applying directly to business software as well. Under the headline “Generative AI Transforms Enterprise Applications,” Das says that “the foundations for true generative AI… [will]… transform large language models and recommender systems into production applications that… intelligently answer questions, create content and even spark discoveries.” Das believes that this will bring “massive advances in personalized customer service, drive new business models and pave the way for breakthroughs in healthcare.”
On that note, Kimberly Powell, Nvidia’s vice president of healthcare, predicts that 2023 is the year when “biology becomes information science.” She explains that “the capabilities of… new AI models [will] give drug discovery teams the ability to generate, represent and predict the properties and interactions of molecules and proteins — all in silicon.” The result is that “this will accelerate our ability to explore the essentially infinite space of potential therapies.” Who says AI has to be creepy and sinister?
Azita Martin, Nvidia’s vice president of AI for retail, consumer packaged goods and quick-service restaurants (and, once upon a time, my boss), believes “AI will enable more frequent and more accurate forecasting, ensuring the right product is at the right store at the right time. Also, retailers will embrace route optimization software and simulation technology to provide a more holistic view of opportunities and pitfalls.”
Hindsight on Foresight
I’ve been assembling annual predictions for the data and analytics world for several years now. As I do the work, my own reactions are usually a combination of skepticism, gripes, optimism and fascination. Some of the predictions are inevitably correct, while many others prove to be inaccurate. Even the latter are useful, though, because they help us understand what industry players are thinking. As such, they reveal the underpinnings of how they will approach the market, and prioritize features and functionality for their products and platforms.
Oftentimes, the folks making predictions do so in alignment with their companies’ business models and go-to-market strategies. Sometimes, their predictions are based on their opinions and certain self-fulfilling prophecies that validate them. But all of the predictions involve solving a difficult puzzle: identifying which market phenomena are fundamental and classic, and which conditions are temporary and likely to change in the next twelve months. Nothing about that is easy. So pay attention to all of the predictions here, because even those that don’t prove to be correct are nonetheless based on important insight.
Will “data observability” really be a thing in 2023? Even if it flames out, we can expect it to be an important factor. Will generative AI transform the way data-driven organizations implement analytics? We can at least make a reasonable bet that many organizations will try to make that so. Will Kubernetes recede to a behind-the-scenes infrastructural role instead of that of a prominently presented technology? We can at least start to demand that vendors treat it that way. Will the market cool to cloud-based everything, and a “cost be damned” attitude? We can certainly expect more nuanced implementations and a much more sober look toward cost management and budgeting.
Most technology disciplines and their markets tend to oscillate between waves of innovation and waves of rationalization. Practitioners switch their focus from exploring the possible to mastering what’s here and implementing it in a much more precise manner. When we move from focusing on the risks of not acting to the costs of participating, we also force the technology, and ourselves, to mature. The courage to invest must be followed by verification of the return. And when the pendulum swings back to investing again, we’ll hopefully have a sense of foresight that will help us to do so more wisely.