Building Reliable Data Pipelines: Q&A with LogDNA and McAfee
Recently, Tucker Callaway, CEO of LogDNA, and Anastasia Zamyshlyaeva, vice president of data engineering, analytics and data science at McAfee, got together to discuss the data challenges companies face and how to overcome them, with a focus on security.
The conversation below gives a brief history of data, discusses some of the trends they see today, and covers how to build reliable data products and a leader’s role in building them.
Tucker: How have you seen the volume and complexity of data that companies deal with change over the past five years?
Anastasia: The volume of data is growing exponentially for three reasons. First, because of the democratization of the cost for storage and computing through the cloud, organizations have started to instrument the collection of more data sources that potentially can be used in the future.
The second driver is the increased system requirements for availability, performance and release velocity. This requires a new level of monitoring and logging that allows organizations to identify when something is wrong with a system, predict some of the failures and address them in the shortest time frame.
The last reason is that companies are identifying new avenues for innovation or disruption when they place more value on data. For example, they are discovering new segments of customers they weren’t reaching before because they didn’t have a good understanding of them. Or they are identifying an unmet need for which they can launch new products. As they are launching new products, they use a “build, measure, learn” approach about customers to ensure they are building what customers are looking for and addressing their needs.
Tucker: We see that in our business at LogDNA. One of the key themes we often hear from customers is that so much has changed in the last five years that the systems can’t scale to both the volume and the application of value you mentioned. What trends do you see in data today?
Anastasia: I like to discuss data in terms of volume, velocity, variety, veracity and value. Volume is growing for sure, but what is becoming even more critical is the velocity. Real-time is a new norm for big data. Companies are using speed now as a competitive advantage to make fast decisions. They have additional data sources and more variety in data lake structures to derive insights. There is veracity, which equates to quality. Companies waste money attempting to get relevant insights from poor-quality data. Finding value in the data is at the top of this pyramid.
Organizations need to focus on how volume, velocity, variety, veracity and value can drive business outcomes. This requires new processes and changes in the technologies and tools. They need to break the monolithic data lakes into data domains and treat those business data domains as products. Data owners need to understand how they can drive the adoption of those data domains to bring value to the customers.
This approach enables organizations to make decisions based on data rather than a gut feeling. It requires a very different velocity, not only to the pipelines, but also to the outcomes. Organizations are adding a lot of automation to application development through CI/CD testing and production, along with the ability to roll back if something goes wrong. A new area of data observability tools is appearing in the data world. I’m excited about the opportunities in big data created by focusing on the value and the business domains.
Tucker: I think it’s exciting. There will be many opportunities with data for organizations, as well as some technical challenges in breaking up these monolithic data structures into the more domain-specific structures. As you think about that, what tips do you have for organizations? What are the key considerations for building reliable data products?
Anastasia: When we talk about reliability, companies and data products can fail both on the business and the technology sides. To remediate the business side, and this is something that I would suggest addressing first, companies need a strong structure and thought process. They need to start with their top initiatives. What are the KPIs they are trying to address? What factors are driving these initiatives and how can data help? After that, they need to build a data roadmap focusing on how they can help users and customers. They need to define what success will look like and what they need to measure. This may be the number of active users, business impact, revenue, cost reduction, customer satisfaction and any other relevant measurements. They need to assign product owners and make sure that they’re using a data management platform to accelerate this journey and using a governance model to ensure that all data products can play together and not be siloed.
When we talk about reliability for the technology, I would use the same approach as applications with an SLA model. Focus on reducing mean time to detect, mean time to resolve and mean time between failures by having testing automation for release management and easy rollbacks.
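To make those three measures concrete, here is a minimal sketch of how they might be computed from a list of incident records. The `Incident` class, its field names and the sample timestamps are hypothetical and not taken from the conversation. Definitions also vary between teams: this sketch assumes time to resolve is measured from detection to resolution, and time between failures is the uptime between one incident's resolution and the start of the next.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record for a data pipeline."""
    started: datetime   # when the failure actually began
    detected: datetime  # when monitoring or a person noticed it
    resolved: datetime  # when the pipeline was healthy again


def _mean_delta(deltas):
    """Average a non-empty list of timedeltas."""
    return timedelta(seconds=mean([d.total_seconds() for d in deltas]))


def mean_time_to_detect(incidents):
    # Assumption: detection delay is measured from the start of the failure.
    return _mean_delta([i.detected - i.started for i in incidents])


def mean_time_to_resolve(incidents):
    # Assumption: resolution time is measured from detection, not from start.
    return _mean_delta([i.resolved - i.detected for i in incidents])


def mean_time_between_failures(incidents):
    # Uptime between the end of one incident and the start of the next.
    # Needs at least two incidents to produce a gap.
    ordered = sorted(incidents, key=lambda i: i.started)
    gaps = [nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])]
    return _mean_delta(gaps)


# Illustrative data only.
incidents = [
    Incident(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 10), datetime(2024, 1, 1, 1, 10)),
    Incident(datetime(2024, 1, 3, 1, 10), datetime(2024, 1, 3, 1, 30), datetime(2024, 1, 3, 2, 0)),
]

print("MTTD:", mean_time_to_detect(incidents))
print("MTTR:", mean_time_to_resolve(incidents))
print("MTBF:", mean_time_between_failures(incidents))
```

Tracking these values per release is one way to see whether test automation and easy rollbacks are actually shortening detection and resolution times.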
So Tucker, as a leader of an organization, what do you think is the role of the leaders in building these reliable data products?
Tucker: It’s essential that we buy into it and believe in it. I’ve learned I need to balance emotion and data in decision-making. If the data validates my gut opinion, I can be confident that we’re making good decisions. When the data isn’t telling you what your gut is telling you, you have to have the courage not to make the decision. Relying on the data is very important, but it requires looking at common data sources that we all believe in and trust in. It comes down to trust, more than anything.
Anastasia: Trust is essential for data. Organizations can start by having a common, single source of truth aligned across different departments. This will lead to company-level decisions based on data. Also, they need to make sure that all the sensitive data is protected from the start. It’s impossible to gain trust and become data-driven organizations without showing value. By constantly delivering certain milestones and celebrating wins, people will gain trust in the data. As they start to use data more frequently, reliability becomes more important. Organizations need to define service-level agreements for data products, driven by each product’s criticality and business value. They need to make sure they define data availability, freshness and level of quality, then measure to make sure they’re achieving those goals.
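As an illustration of defining availability, freshness and quality targets and then measuring against them, the sketch below checks one data product against a service-level agreement. Everything in it is an assumption made for the example: the class names, the fields, the thresholds, and the choice to express availability as the share of successful scheduled runs and quality as the share of rows passing validation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataProductSla:
    """Hypothetical SLA targets for one data product."""
    min_availability: float   # minimum share of scheduled runs that succeed
    max_staleness: timedelta  # freshness: maximum age of the newest data
    min_quality: float        # minimum share of rows passing validation


@dataclass
class Measurement:
    """Hypothetical observed values for the same data product."""
    successful_runs: int
    scheduled_runs: int
    last_updated: datetime
    valid_rows: int
    total_rows: int


def check_sla(sla, m, now):
    """Return a pass/fail flag for each SLA dimension."""
    # Treat "nothing scheduled" or "no rows" as a failure rather than dividing by zero.
    availability = m.successful_runs / m.scheduled_runs if m.scheduled_runs else 0.0
    quality = m.valid_rows / m.total_rows if m.total_rows else 0.0
    staleness = now - m.last_updated
    return {
        "availability": availability >= sla.min_availability,
        "freshness": staleness <= sla.max_staleness,
        "quality": quality >= sla.min_quality,
    }


# Illustrative values only.
sla = DataProductSla(min_availability=0.99, max_staleness=timedelta(hours=1), min_quality=0.98)
now = datetime(2024, 1, 1, 12, 0)
measurement = Measurement(
    successful_runs=995,
    scheduled_runs=1000,
    last_updated=now - timedelta(hours=2),  # two hours old, so the freshness target is missed
    valid_rows=990,
    total_rows=1000,
)

result = check_sla(sla, measurement, now)
print(result)
```

A more critical data product would simply get stricter thresholds in its `DataProductSla`, which keeps the targets tied to business value rather than applied uniformly.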
Tucker: In summary, there are a few top takeaways for creating reliable data pipelines. One is that you must think of data as a product. For those focused on products, applying that thinking to data, and not thinking of data as an output of a product but an actual product on its own, is essential. Also, it is important to use data to drive business outcomes. Do not make decisions only on emotion; go with what the data tells you. Finally, focus on trust. Applying that specifically to data is foundational to creating reliable data pipelines.