Beyond ELT: What Is a DataOps OS?
DevOps best practices are not commonly found in the world of data. The concept of DataOps — applying DevOps principles like continuous integration, continuous deployment, version control, code review, automated end-to-end testing, automatic rollbacks and cross-team collaboration to data workflows — has slowly been gaining ground. But it has not yet become ubiquitous in data teams and their tools in the way that DevOps has driven a complete paradigm shift in software development.
Here’s one quick example: Imagine how much more confident you and your data team would feel experimenting and moving quickly if anyone could propose a change to a data connector’s configuration or a transformation, and trust that the system would warn you if the change would break a dashboard or notebook downstream. Compare that to waiting to hear from a frustrated end user who ran into the issue in the middle of a presentation.
At Meltano, the startup where I serve as CEO, we are confident that data teams can benefit as much as software development teams do from applying DevOps best practices. The open source Meltano project got its start in 2018 as a DIY tool built by and for the GitLab data and analytics team. Last summer, Meltano spun out of GitLab, raised seed funding and now has its sights set on becoming the open source foundation of every team’s ideal data stack.
We believe that the current state of data tooling is still holding data professionals and teams back. Making DevOps best practices an integrated part of the data lifecycle in the form of DataOps is the key to enabling data teams and the organizations they serve to unlock the full potential of their data.
DataOps OS Instead of a DataOps Tool
Originally, our vision was to bring DataOps to the entire data lifecycle by becoming the single tool that does it all. However, as the data space has evolved, we’ve seen a huge shift toward horizontal integration. Multiple narrowly focused tools now compete in every step of the lifecycle. While we talk about the “modern data stack” as if it has a clear definition, every team’s actual ideal data stack will look different based on the tools they’ve chosen and exactly how they’re all hooked up.
With all the competition and rapid iteration, data teams have gained amazing abilities, and it’s become clear that no one-size-fits-all tool will be able to compete with the pick-and-choose approach that allows teams to use the best tool for the job at every stage and finely tune their stack to their own unique needs.
As a result, the way we see it now is that the modern data stack needs a new layer — a control plane or “operating system” that ties it all together, making the data stack better than the sum of its parts. Ideally, the DataOps OS gives data teams a single place to reason about and interact with their stack as a whole. It should provide unified configuration and deployment across components and let teams treat their entire data platform like a single software project.
Teams deserve a DataOps OS — a stable foundation to build their stack on that can stay with them for years to come and that lowers the barrier to try out new tools, swap out old ones or use alternatives side-by-side.
What Does a DataOps OS Look Like?
You might think of a DataOps OS as a package manager for data tools or “Terraform for data stacks” — different interpretations that focus on different qualities but that we otherwise see as equivalent to the DataOps OS framing.
From day one, Meltano has had a plugin-based architecture, offered package management functionality, and provided much-needed glue between best-in-class open source data tools and technologies.
We used this architecture in the first stage of our strategy, which was to address what we deemed a rather desperate need in the market for better ELT tooling that embraces software development best practices like DevOps and open source. ELT (Extract, Load, Transform) is the first stage of the modern data lifecycle: data is extracted from various source systems, loaded into a cloud native data lake or data warehouse and then transformed in whatever way is necessary for analysis. Today, Meltano is a great ELT solution because of the first set of plugins we’ve decided to support: Singer taps and targets for integration, dbt for transformation and Airflow for orchestration.
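Concretely, a Meltano project declares its entire plugin stack in a single `meltano.yml` file at the project root. A minimal sketch might look like the following — the plugin categories and names match the set described above, but the specific source and config values are illustrative assumptions, not a working project:

```yaml
# meltano.yml — declarative definition of the whole ELT stack,
# versioned in git alongside the rest of the project.
version: 1
plugins:
  extractors:
    # Singer tap that pulls data from a source system
    - name: tap-gitlab
      config:
        projects: meltano/meltano   # illustrative value
  loaders:
    # Singer target that loads the raw data into the warehouse
    - name: target-postgres
  transformers:
    # dbt transforms the loaded data into analysis-ready models
    - name: dbt
  orchestrators:
    # Airflow schedules and runs the pipelines
    - name: airflow
```

Because the whole stack lives in one version-controlled file, proposing a change to a connector or transformation becomes an ordinary merge request — exactly the review-driven DevOps workflow described earlier.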
Now we aim to extend this architecture to the entire data lifecycle, incrementally adding plugin support for all of the tools in the modern data stack that are compatible with DataOps best practices. Since we aim to streamline the configuration and deployment of the entire data stack and the integration between its components, our focus will primarily be on open source tools, but we are also exploring integrations with SaaS tools through API connections.
We want to make the barrier to getting started with DataOps as low as possible. To accomplish this, Meltano will act as a tour guide and offer a set of recommended plugins for each stage of the data lifecycle. People new to data will thus have in Meltano a “data stack in a box” and an easy way to get started. However, we recognize there is no such thing as a one-size-fits-all data stack and that every team’s ideal setup will look different, so we have no intention of locking users into any particular set of plugins. Offering choice and flexibility is the point.
The data that businesses generate or ingest from third parties continues to grow exponentially in volume, variety and velocity, yet our ability to derive actionable business insights from this data is too often frustrated by mundane and time-consuming manual operations and disjointed workflows. Data professionals are hungry for the benefits that DataOps — and other end-to-end functionality like observability, extensibility, governance and lineage — can provide.
Going forward, DataOps will be critical to how organizations build their high-performing data cultures, and a DataOps OS will be the foundation of every team’s ideal data stack.