Why You’re Thinking about Data Prep All Wrong
Data preparation is a crucial component of pervasive and trusted data analytics. However, it is just one of many vital features that should be integrated into the entire data analytics workflow.
Viewing data prep as a standalone, disparate task is detrimental to analytics success. Deploying self-service data prep tools that are not fully baked into a larger end-to-end workflow encourages the belief that data cleansing is one stop in a linear workflow. That notion does not work given the amount and diversity of data in today’s organizations.
What’s Driving the Data Prep Market?
Analytics users still spend the majority of their time either prepping data for analysis or waiting for data to be prepped for them. To overcome this roadblock, more and more organizations are seeking low-friction access to data with faster time-to-value.
According to Gartner, by 2020, 85 percent of new BI platform spend will go to modern BI platforms that allow all users, including IT, to rapidly build analytics content that meets users’ demanding time-to-insight requirements and changing data needs.
The exponential growth of big data, coupled with demanding time-to-insight requirements, has given rise to data prep solutions, which target a part of the process that has historically been a roadblock to agile analytics. As a result, the global data prep market is estimated to grow from $1.46 billion in 2016 to $3.93 billion by 2021, according to a Research and Markets report from November 2016.
It’s a Feature, Not a Market
If you’re looking to take full advantage of big data and use it to discover insights that will translate into dollars saved or dollars earned, it’s important to look beyond the feature set of any single tool. Organizations must understand the role of data preparation in the larger analytic workflow. Here are some thoughts to consider:
- Data Prep Isn’t Just a Single Preparatory Step: Data preparation is integral to speeding up the use of data for actionable insights, yet it isn’t a one-and-done process or a feature to simply check off. In fact, the word “preparation” is somewhat of a misnomer, implying that the data prep process takes place as an early step. In big data discovery, data preparation may take place at any point in an iterative, ongoing process. Analysis may reveal flaws in the data or the need to add new data, which in turn creates new requirements for how the data should be shaped, interpreted, enhanced or cleansed. Because of that, data prep must be part of the greater platform that prepares you for the entire big data analytics journey.
- Standalone Tools Ignore Governance Concerns: Self-service data preparation must go well beyond the typical integration and cleansing of data. It is a continuous operational process in which business analysts are given datasets and also have the power to further blend and enrich the data as needed. The end goal is a collaborative process that is frictionless and operational, yet properly governed.
- Standalone Tools Have a Narrow Scope: The scope of functionality for most standalone data prep tools is narrow. They cover data quality and remediation, plus limited transformation functionality such as splitting columns and joining tables. Such tasks are important, and user interfaces that make shorter work of them are valuable. However, it’s important to understand that this functionality only tees users up to do more: heavy transformation work, analysis, visualization, and then further transformation work whose requirements are identified by that activity.
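To make the iterative point concrete, here is a minimal Python sketch of the loop described above: a narrow prep pass (splitting a column, joining two tables), an analysis step that exposes a flaw, and a second prep pass triggered by that analysis. The sample data, the function names, and the rule of replacing unparseable amounts with zero are all assumptions made for illustration; they do not come from any particular data prep product.

```python
# Hypothetical sample data; all names and values are illustrative only.
customers = [
    {"id": 1, "full_name": "Ada Lovelace"},
    {"id": 2, "full_name": "Alan Turing"},
]
orders = [
    {"customer_id": 1, "amount": "120.50"},
    {"customer_id": 2, "amount": "n/a"},  # flaw that only surfaces during analysis
    {"customer_id": 1, "amount": "30.00"},
]


def split_name(row):
    """Split one column into two (a typical narrow data prep task)."""
    first, _, last = row["full_name"].partition(" ")
    return {"id": row["id"], "first_name": first, "last_name": last}


def join_orders(customer_rows, order_rows):
    """Join the two tables on the customer id."""
    by_id = {c["id"]: c for c in customer_rows}
    return [{**by_id[o["customer_id"]], "amount": o["amount"]} for o in order_rows]


def total_by_customer(rows):
    """Analysis step: sum amounts per customer and collect rows that fail to parse."""
    totals, bad_rows = {}, []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            bad_rows.append(row)
            continue
        totals[row["last_name"]] = totals.get(row["last_name"], 0.0) + amount
    return totals, bad_rows


# Pass 1: prepare, then analyze.
prepared = join_orders([split_name(c) for c in customers], orders)
totals, bad_rows = total_by_customer(prepared)

# Pass 2: the analysis exposed a flaw, so prep runs again with a new rule.
# Replacing bad amounts with "0" is an assumed policy, chosen only for the sketch.
cleaned = [{**r, "amount": "0"} if r in bad_rows else r for r in prepared]
totals_after, bad_after = total_by_customer(cleaned)
```

The second pass exists only because the first analysis ran. With separate tools, that round trip means exporting, switching products, and re-importing each time a new requirement surfaces.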
Approach Data Prep as an Integrated Part of the Entire Process
Organizations need tools that will help them with all of this and, preferably, do as much of it as possible in a single product, avoiding the need to shift between disparate tools.
While the market is accepting self-service data preparation as a standalone tool for now, be wary as you plan your big data initiatives and investments. Data prep may be your biggest pain point at the moment, but making that part of the process a little easier is not enough: you ultimately need it to perform within a larger analytics workflow.