DataOps: Tips and Tricks

(This month, The New Stack examines the management of data in cloud native systems, with a series of news posts, contributed essays and podcasts. Check back often on the site for new content.)
As we’ve established, DataOps is kind of like DevOps for data processes — but with a much greater focus on incorporating business logic into the engineering process, rather than simply bringing data engineering teams together with operations folks. So how are teams and companies supposed to go about adopting DataOps?
Here’s what DataOps experts say.
It’s All About the Business
It’s easy for technologists to get excited about new tools and technology, and there can be a tendency to build solutions with the shiniest, newest tech when older, more boring technology might actually be the best way to address the problem. The DataOps approach demands a relentless focus on the business reasons for a data product and on business outcomes, not tools or technologies.
“Understanding the business and then understanding how to apply these methodologies” is what successful DataOps practice requires, explained Patrick McFadin, vice president of developer relations at DataStax. “So there’s no formula, you mix in a little of this and a little of that and it’s magic. No, you have to understand the business.”
That’s because DataOps is ultimately about creating magical user experiences — often customer experiences, though DataOps can certainly be applied to data projects for internal use as well. McFadin sees DataOps as part of the digital transformation that has become both a buzzword and a key to survival for many companies.
McFadin compares many companies’ DataOps operations to the underpants gnomes on South Park: Step one, collect all the underpants. Step two, question mark. Step three, profit. “I did that in a couple of presentations, and people are like, yeah, that’s us,” he said. “Because, where’s your step two? If you’re collecting all the data, what’s the step two? Because there’s profit out there? Step one should be what value are we extracting from this data, and then you start looking at how you operationalize it. If you start with saying we’re just going to collect all this data, you just made a massive, very costly mistake.”
Planning for Production
In addition to considering the business rationale for data products at every step of the development lifecycle, it’s also important to think early about how the data product will be operationalized and how users will interact with it. This can be especially critical for internal applications. No one would ever expect a customer to use the command line to access an app on their iPhone, yet too many data engineers assume that, after a 10-minute training session, non-engineers in the organization will start using the command line to get information from a data product. “The command line simply is not going to scale,” explained Andrew Stevenson, chief technology officer at Lenses.io. “It’s a black box for a lot of people, especially those who have the business knowledge.”
As data products are designed, teams need to think proactively about how the product will be accessed, who its users will be and how it will scale. They must not lose sight of the product’s second, operational life while they are still designing and building it. If non-engineers are expected to use the product from the command line, the entire project will amount to a very expensive waste of time.
Don’t Ignore Security and Governance
In addition to thinking about the business reasons for creating the data product, you also have to consider the security, governance and compliance issues that the data might create. Are there data locality concerns? What jurisdictions is the application going to be running in? What type of data will you be collecting and analyzing?
Getting security or compliance wrong can doom a data product to failure or, worse, turn into an extremely costly and embarrassing episode for the company. Here, too, one-size-fits-all approaches won’t work. A healthcare company that operates in 100 countries will have very different concerns than an animal rescue organization that primarily collects data about stray pets. The key is to evaluate security and compliance issues at the beginning of the development process, just as you should the business rationale.
Be Thoughtful About Tools
“As much as I love technology, the most successful companies I’ve seen have people who actually understand their business,” Stevenson said. “Understand what you’re doing and why you’re building it and ultimately that will help with the correct technology choices.”
For example, if you need real-time data processing, Kafka is a good choice; if you’re processing a single message per day, it is a poor one. The same goes for Kubernetes. There has to be a compelling reason to adopt a tool or technology, especially because it often introduces complexity that can ultimately make the system less resilient.
Learn from Software Engineering
DevOps is a better-known approach, and there is a robust ecosystem of tools to support DevOps teams. Some of those tools can be used by data teams, and others can serve as inspiration for what data teams should have.
“Data teams should start asking themselves, ‘why don’t I have all the same things I’ve gotten used to from software engineering, like code review whenever someone makes a change, like continuous integration to verify that my changes don’t break things and continuous deployment to make sure my changes are brought to life once the team has signed off on them?’” explained Douwe Maan, project lead at Meltano, a data integration platform from GitLab.
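To make the continuous-integration idea more concrete, here is a minimal sketch of the kind of check a data team might run automatically on every proposed change. It assumes a Python pipeline and a test runner such as pytest that the CI system invokes on each merge request; the `clean_orders` function, its record fields and the expectations tested are all hypothetical, not part of Meltano or any other tool mentioned in this article.

```python
# Hypothetical transformation step plus a test that CI could run on every change.
# `clean_orders`, "order_id" and "amount" are illustrative names only.


def clean_orders(rows):
    """Drop records with no order_id and de-duplicate on order_id,
    keeping the first occurrence of each id."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = row.get("order_id")
        if order_id is None or order_id in seen:
            continue  # skip missing keys and repeated ids
        seen.add(order_id)
        cleaned.append(row)
    return cleaned


def test_clean_orders():
    sample = [
        {"order_id": 1, "amount": 10.0},
        {"order_id": 1, "amount": 10.0},    # duplicate record
        {"order_id": None, "amount": 5.0},  # missing key
        {"order_id": 2, "amount": 7.5},
    ]
    result = clean_orders(sample)
    # Expectations the team has agreed on; a change that breaks them fails the build.
    assert [r["order_id"] for r in result] == [1, 2]
    assert all(r["amount"] >= 0 for r in result)


if __name__ == "__main__":
    test_clean_orders()
    print("all checks passed")
```

The design choice here is simply to encode agreed-upon data expectations as ordinary tests, so the same review, CI and deployment gates that software engineers rely on also apply to changes in a data pipeline.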
The key to success with DataOps is not entirely different from the key to success in any other endeavor, whether you are building software or building a house. Before you start pounding nails, figure out why you want to build the house, what purpose it is going to serve, who will use it and what rules it has to follow. When you’re deciding which tools to use, don’t reach for the ones you like best or that are the most sophisticated, but rather those that are most appropriate for the job at hand.
DataStax and GitLab are sponsors of The New Stack.
Feature image: Susanne Jutzeler, suju-foto, from Pixabay