Is Your Data Lineage Helping You Make Smart Decisions and Set Priorities?
Today everyone is talking about data lineage, and every possible facet of it: lineage for your code, your values, your run-time processes, even lineage for your laundry (just kidding). Then there is lineage for regulatory compliance, for trust, and for problem resolution or impact analysis. It goes on and on, with the need for lineage across all your data pipelines: lineage for your on-premises solutions, your legacy tools, your shiny new cloud solutions, and the things you are building in-house.
There are methodologies and best practices, as well as automated and descriptive hybrid methods for accomplishing all of it. But I want to talk about how well you are using the lineage you are achieving.
Lineage displays come in lots of flavors. Pretty graphics are common (though beauty is always in the eye of the beholder), with various shapes and colors galore. Lineage diagrams can become incredibly complex. Like a plate of spaghetti, even perfectly prepared lineage can be difficult to understand and trace. Here at Manta, we concentrate on making lineage easier to consume, with options for colors, filtering, and levels of lineage rendering, as well as practical alerting and highlighting.
Imagine a well-documented lineage scenario that, if drawn on a whiteboard, or zoomed to a high level, would cover an entire wall or hallway! Where to begin as you review it? Perhaps you started this lineage journey at a source table, view, or stored procedure, to perform downstream analysis. There are a myriad of flows, hundreds of column mappings, and numerous interlacing pathways. What are you looking for? Are there any guidelines? Any road signs? Manta provides a dynamic highlighting methodology that we call “ActiveTags.”
ActiveTags are like bright red “sticky notes” (or choose your color) attached to the lineage that stretches along your wall or hallway. They point out specific characteristics that demand your attention, or let you zero in on one aspect of the lineage, helping you identify just one strand in that plate of spaghetti.
ActiveTags in Manta are available out of the box, or they can be added dynamically based on real-time discoveries from in-house or third-party solutions. One of these is Manta's ability to flag real or significant transformations. Invaluable for typical ETL flows, these ActiveTags highlight where serious business rules are being applied, as opposed to the simple movement of column information from one place to another.
Here we see an ActiveTag (the blue highlight in the context of lineage) that draws attention to a detailed function in SQL:
When looking for “needles in a haystack,” highlighting exactly where these gems of logic are located is a massive time saver. Sites that dynamically apply data quality information to their lineage pipelines can decide which problems need to be addressed NOW and which can be reviewed later.
Seeing a newly discovered data quality problem (on an ETL process as an ActiveTag) that leads to a daily list of potentially churning customers probably demands more attention than one in the same set of pipelines that leads to a quarterly revenue summary. This is not unlike deciding to take a detour when your GPS application indicates a major accident on the road ahead. Customers also utilize ActiveTags to pinpoint where privacy data is lurking in the context of their data flows.
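To make the churn-list versus quarterly-summary comparison concrete, here is a minimal Python sketch of that kind of triage. It is not Manta's API or data model: the graph, the node names, the `URGENCY` scores, and the `downstream` and `triage` helpers are all hypothetical, chosen only to illustrate ranking flagged processes by what they feed.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes it feeds.
LINEAGE = {
    "crm.customers": ["etl.clean_customers"],
    "etl.clean_customers": ["etl.score_churn", "etl.aggregate_revenue"],
    "etl.score_churn": ["report.daily_churn_list"],
    "etl.aggregate_revenue": ["report.quarterly_revenue_summary"],
}

# Hypothetical urgency scores for end consumers (higher = look sooner).
URGENCY = {
    "report.daily_churn_list": 3,
    "report.quarterly_revenue_summary": 1,
}


def downstream(graph, start):
    """Return every node reachable downstream of `start` (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def triage(graph, urgency, flagged):
    """Order flagged nodes by the most urgent consumer each one feeds."""
    def score(node):
        reachable = downstream(graph, node)
        return max((urgency.get(n, 0) for n in reachable), default=0)

    return sorted(flagged, key=score, reverse=True)


if __name__ == "__main__":
    # Two ETL steps carry a data quality flag; which one comes first?
    for node in triage(LINEAGE, URGENCY, ["etl.aggregate_revenue", "etl.score_churn"]):
        print(node)
```

In this toy graph the churn-scoring step sorts ahead of the revenue aggregation because it feeds the daily list. A real deployment would take its urgency scores from whatever the business actually cares about.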
ActiveTag details can be introduced to Manta after discovery by third-party profiling tools that crawl through data to identify those places where sensitive data lives. Manta then helps lead you to where it goes.
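The idea of following sensitive data "to where it goes" can be sketched as a simple walk over column-level lineage. The Python below is an illustration under stated assumptions, not Manta's implementation: the `EDGES` list, the column names, and the `trace_sensitive` function are hypothetical, and the list of sensitive columns stands in for whatever a profiling tool might report.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage edges: (source column, target column).
EDGES = [
    ("crm.customers.email", "staging.customers.email"),
    ("staging.customers.email", "mart.contacts.email_hash"),
    ("crm.customers.signup_date", "mart.contacts.tenure_days"),
]


def trace_sensitive(edges, sensitive_sources):
    """Map each downstream column to the sensitive columns that reach it."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    reached = defaultdict(set)
    for origin in sensitive_sources:
        visited = {origin}
        queue = deque([origin])
        while queue:
            col = queue.popleft()
            for nxt in children.get(col, []):
                if nxt not in visited:
                    visited.add(nxt)
                    reached[nxt].add(origin)
                    queue.append(nxt)
    return dict(reached)


if __name__ == "__main__":
    # Assume a profiling tool reported this one column as sensitive.
    tagged = trace_sensitive(EDGES, ["crm.customers.email"])
    for column in sorted(tagged):
        print(column, "<-", sorted(tagged[column]))
```

Recording the originating column alongside each downstream column keeps the answer to "why is this tagged?" one lookup away, which is the kind of question a highlighted lineage view is meant to answer at a glance.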
You and your teams work hard to achieve end-to-end lineage. Make sure you can find what you need as quickly and as easily as possible, to deliver answers to the demanding questions you receive about lineage from across the organization. Your efforts to document lineage details deserve nothing less!