DataOps and the Problem with ‘Ops’ Terminology
Almost every trend in how IT operations are handled gets an “Ops” moniker: DevOps, DevSecOps, AIOps, MLOps, GitOps, NoOps, FinOps, etc. We wholeheartedly believe that many of these terms describe real phenomena. However, as is the case with DataOps, the rush to rebrand existing products obscures the degree to which these trends are actually gaining traction.
Although there are differences, at its core DataOps is the application of DevOps processes to data operations. Automation of data pipelines and collaboration between teams are two of the key characteristics of “modern” DataOps. Yet managing databases and other data platforms has been a core part of IT’s responsibilities for years. Without consensus about what DataOps means, market confusion will abound. Here are just a few studies that may be used to overstate the trend’s prominence.
We were shocked to read that 90% of enterprises are using DataOps, so we took a deeper look at the underlying 451 Research report, which was based on a survey sponsored by Delphix. The study actually says that 89% of respondents expected to increase spending, investment, or development on DataOps technologies. Moreover, the study provides a process-oriented definition of DataOps that allows any company to claim its product or service is a DataOps technology. The report surveyed 150 representatives of North American organizations with more than 1,000 employees and at least 2 PB of data under management, all of whom had a solid understanding of their organization’s data management strategy.
Based on that sample, 71% are already well along in their DataOps maturity. Just as with DevOps several years ago, it seems that most large enterprises believe they are doing DataOps, but their level of maturity is probably being dramatically overstated.
An overbroad definition of DataOps was also used in a 2018 study of “data professionals” commissioned by Nexla, which found that 73% of respondents’ companies plan to hire someone to help with data operations. Unfortunately, the study does not distinguish between database administrators, data engineers, storage professionals or a variety of other positions. Hiring more data administrators to support front-end users like business analysts seems to reflect a failure of organizations to automate and scale; it is proof of the need for DataOps rather than evidence of its use.
An Enterprise Management Associates survey found that quality control is the most important aspect of DataOps for operating data ecosystems that integrate multiple data platforms. How quality control is defined is anybody’s guess. Automation ranks second, followed by a long list of functions that people simply assume are related to DataOps. It is just speculation, but we believe that few, if any, of the IT and business analysts who filled out the survey were thinking of MLOps (operations for machine learning).
The New Stack and Lightbend’s study of data stream processing dealt with data pipelines. Although most of our respondents were developers or architects, 43 data engineers and data scientists did tell us about their experience using continuous integration and deployment (CI/CD) tools to deploy machine learning applications, models or systems. Although the sample is small, 68% of them said they were at least somewhat experienced.
Lightbend is a sponsor of The New Stack.
Feature image via Pixabay.