Monte Carlo ‘Insights’ Ranks Data by Importance and Impact

1 Dec 2021 6:26am
Businesses are dealing with an onslaught of data today, due both to the rapid increase in the number of data sources and to the high volumes of data from each. As a result, it’s not sufficient for data-driven organizations to keep mere inventories of their data. They need to triage it and keep a prioritized taxonomy of it, too. To address this need, Monte Carlo, a pioneer in the data observability space, has built upon its core platform and introduced a new module, Monte Carlo Insights, that helps customers not just monitor data without regard to its importance but understand which assets within their data estates are the most important and impactful.

Data Observatory

Let’s back up a bit first so we can understand the platform underneath this product. Data observability is a relatively new data management discipline. While it may at first sound esoteric, a deeper look reveals it to be a straightforward and logical category. Imagine a technology that borrows concepts, in roughly equal parts, from data quality, ops monitoring, and performance management, and you’ll get a good understanding of what data observability is all about.

A data lineage visualization from the core Monte Carlo Data Observability Platform. Credit: Monte Carlo

Data observability offers a holistic view of data in production and of data reliability by combining data quality management with metadata, lineage and even certain principles of application performance management (APM). This combination helps users resolve data issues, understand their impact and communicate them effectively across different user groups. It also helps ensure the functionality and accuracy of dashboards and machine learning models built on the data being observed. With that in mind, the Monte Carlo Data Observability Platform is aimed at data engineers, data analysts, data scientists and the people who manage data teams.

So Meta

Moving “up the stack,” Monte Carlo Insights, which was announced on November 3, is designed to give organizations analytics and insights into their own data platforms. The offering essentially provides operational analytics on the data used to produce business analytics — perhaps we could call this meta-analytics. Insights does this by performing analyses of all tables in a data warehouse or data lake, then ranking them by importance to the business.

This ranking is the major innovation — it leverages machine learning and is based on criteria such as upstream and downstream dependencies, how frequently people query the tables and who is querying them, how widely the tables are used and what reports are driven by the tables. Insights tracks other data sets beyond the ranking of key assets. These include operational trends over time, to monitor service level agreement (SLA) compliance; impact level of addressed data quality issues; and cost trends, via dashboards. Insights also provides easily shared, high-level reporting; it can be used through its APIs and user interface, as well as through Snowflake secure data sharing.
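To make the idea of ranking tables by importance more concrete, here is a minimal sketch in Python. It is not Monte Carlo's method: the article says Insights uses machine learning, while this example swaps in a simple weighted sum over a few of the signals mentioned above. The `TableStats` fields, the `WEIGHTS` values, the `rank_tables` function and the sample table names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table usage metadata (field names are illustrative)."""
    name: str
    downstream_tables: int   # tables that depend on this one
    queries_per_week: int    # how often the table is queried
    distinct_users: int      # how many people query it
    reports_driven: int      # dashboards or reports built on it


# Assumed weights, picked arbitrarily for this sketch. They sum to 1.0,
# so the highest possible score is 1.0.
WEIGHTS = {
    "downstream_tables": 0.3,
    "queries_per_week": 0.3,
    "distinct_users": 0.2,
    "reports_driven": 0.2,
}


def rank_tables(tables):
    """Return (name, score) pairs sorted by descending importance score."""
    if not tables:
        return []
    # Normalize each signal by its maximum so that a large-scale metric
    # (such as query counts) does not dominate purely because of its units.
    maxima = {
        field: max(getattr(t, field) for t in tables) or 1
        for field in WEIGHTS
    }
    scored = []
    for t in tables:
        score = sum(
            weight * getattr(t, field) / maxima[field]
            for field, weight in WEIGHTS.items()
        )
        scored.append((t.name, round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, not real measurements.
    sample = [
        TableStats("orders", 12, 900, 40, 8),
        TableStats("web_events_raw", 3, 150, 5, 1),
        TableStats("tmp_backfill", 0, 2, 1, 0),
    ]
    for name, score in rank_tables(sample):
        print(name, score)
```

With this made-up sample, the heavily used `orders` table ranks first and the scratch table `tmp_backfill` ranks last. A learned model would presumably weigh such signals from observed behavior rather than from hand-picked constants.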

Stop the Bleeding

Insights addresses more than just data usage intelligence for its own sake, of course. The ranking it provides can help organizations focus their data quality and reliability initiatives and ongoing efforts. This, in turn, can mitigate the impact and reduce the costs associated with bad data, which a 2017 Gartner study says costs companies an average of $15M annually. Monte Carlo Insights thus sets out to provide a holistic view of data health to the people managing the data platform (data stewards, chief data officers and IT leaders, for example) and illuminate what should be looked into for reliability and cost management.

The goal of Insights is to minimize the time spent on finding reliable and important data while improving users’ understanding of their own platform. To that end, Monte Carlo’s co-founder, Lior Gavish, told The New Stack, “Data teams deliver analytics to various parts of the business such as marketing, sales, product and operations but, really, do they understand how their own platform works and performs? With Monte Carlo’s Insights, they are able to track their reliability, performance, cost, technical debt, and even the impact of their work.”

First, but Not Only?

Poor data quality results in poor decision-making, missed opportunities and lost revenue. Shining a light on data that is most important can help enterprise customers get past the “data-driven” cliché and build a substantive data strategy. We expect Monte Carlo Insights won’t be the lone offering of its type for long.

Feature image by Joshua Golde on Unsplash.