
Unstructured Data Will Be Key to Analytics in 2022

Dec 14th, 2021 10:00am

Kumar Goswami
Kumar Goswami is the CEO of Komprise. He has spent 23+ years delivering products that solve complex IT problems with simplicity and cost efficiency.

For decades, managing data essentially meant collecting, storing and occasionally accessing it. That has changed in recent years as businesses look for the critical information that can be pulled from the massive amounts of data being generated, accessed and stored in myriad locations, from corporate data centers to the cloud and the edge. As a result, data analytics, helped by modern technologies such as artificial intelligence (AI) and machine learning, has become a must-have capability, and its importance will only grow in 2022. Enterprises need to rapidly parse through data, much of it unstructured, to find the information that will drive business decisions. They also need to create a modern data environment in which to make that happen.

Below are a few trends in data management that will come to the fore in 2022.

Data managers will broaden their focus from structured data to unstructured data analytics

Traditionally, a lot of data science was focused on feeding structured data to data warehouses. But with 90% of the world’s data becoming unstructured and with the rise of machine learning, which relies on unstructured data, data scientists should broaden their skills to incorporate unstructured data analytics. They need to learn how to glean value from data that has no specific structure or schema and ranges across video files, genomics files, seismic images, IoT data, audio recordings and user data such as emails. Developing these skills, which involves staying current and experimenting with new unstructured data analytics capabilities in data lakes as well as learning unstructured data management techniques, will be paramount in 2022.

‘Right data’ analytics will surpass Big Data analytics as a key trend 

Big Data is almost too big and is creating data swamps that are hard to leverage. Finding precisely the right data in place, no matter where it was created, and ingesting only that data for analytics is a game-changer: it saves time and manual effort while delivering more relevant analysis. So instead of Big Data, a new trend will be the development of so-called “right data” analytics.
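To make the idea concrete, here is a minimal Python sketch of selecting the “right data” from a metadata index before anything is ingested. It is an illustration only: the `FileRecord` fields, the `select_right_data` function and the sample paths are all hypothetical, and a real index would come from whatever tool scans your storage.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata entry for one file, wherever it is stored."""
    path: str
    file_type: str
    project: str
    size_bytes: int
    last_modified: date


def select_right_data(index, file_type, project, modified_since):
    """Return only the records relevant to the analysis at hand."""
    return [
        r for r in index
        if r.file_type == file_type
        and r.project == project
        and r.last_modified >= modified_since
    ]


# Assumed sample index spanning a NAS share and an object store.
index = [
    FileRecord("/nas1/study-a/scan001.dcm", "dicom", "study-a", 52_000_000, date(2021, 11, 2)),
    FileRecord("/nas1/study-a/notes.txt", "text", "study-a", 4_000, date(2021, 11, 3)),
    FileRecord("s3://archive/study-b/scan777.dcm", "dicom", "study-b", 48_000_000, date(2020, 1, 15)),
    FileRecord("s3://archive/study-a/scan002.dcm", "dicom", "study-a", 50_000_000, date(2021, 12, 1)),
]

selected = select_right_data(index, "dicom", "study-a", date(2021, 1, 1))
for record in selected:
    print(record.path)
```

The point of the sketch is that the filter runs over metadata, so only the two matching files would ever need to be copied into an analytics platform.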

Storage-agnostic data management will become a critical component of the modern data fabric

A data fabric is an architecture that provides visibility of data and the ability to move, replicate and access data across hybrid storage and cloud resources. Through near real-time analytics, it puts data owners in control of where their data lives across clouds and storage so that data can reside in the right place at the right time. IT and storage managers will choose data fabric architectures to unlock data from storage and enable data-centric vs. storage-centric management. For example, instead of storing all medical images on the same NAS, storage pros can use analytics and user feedback to segment these files, such as by copying medical images for access by machine learning in a clinical study or moving critical data to immutable cloud storage to defend against ransomware.
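The medical imaging example can be sketched as a small placement policy in Python. This is an assumption-laden illustration, not any vendor's API: the `FileInfo` class, the tag names and the action labels are hypothetical, and the tags stand in for whatever analytics and user feedback produce.

```python
from dataclasses import dataclass, field


@dataclass
class FileInfo:
    """Hypothetical file entry with tags from analytics or user feedback."""
    path: str
    tags: set = field(default_factory=set)


def plan_action(f):
    """Decide where a file should go based on its tags, not its current storage."""
    if "critical" in f.tags:
        # Protect against ransomware by moving to immutable cloud storage.
        return "move_to_immutable_cloud"
    if "clinical-study" in f.tags:
        # Make a copy available to machine learning for the study.
        return "copy_to_ml_workspace"
    return "leave_in_place"


# Assumed sample files that all start out on the same NAS.
files = [
    FileInfo("/nas/imaging/mri_0001.dcm", {"clinical-study"}),
    FileInfo("/nas/imaging/mri_0002.dcm", {"critical", "clinical-study"}),
    FileInfo("/nas/imaging/mri_0003.dcm"),
]

plan = {f.path: plan_action(f) for f in files}
for path, action in plan.items():
    print(path, "->", action)
```

In this sketch the "critical" rule is checked first, so a file carrying both tags is protected before it is shared. A real policy might instead return both actions.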

Data fabrics will be a strategic enterprise IT trend in 2022 

Data fabric is still a vision. It recognizes that data lives in many places and that a fabric can bridge the silos and deliver greater portability, visibility and governance. Data fabric research has typically focused on semi-structured and structured data. But 90% of the world’s data is now unstructured (think videos, X-rays, genomics files, log files and sensor data), and this data has no defined schema. Data lakes and data analytics applications cannot readily access this dark data locked in files. Data fabric technologies need to bridge unstructured data storage (file storage and object storage) and data analytics platforms (including data lakes, machine learning, natural language processing and image analytics).

Analyzing unstructured data is becoming pivotal because machine learning relies on unstructured data. Data fabric technologies need to be open and standards-based and look across environments. In 2022, the data fabric should move from being a vision to a set of architectural principles of data management. Technology vendors need to incorporate unstructured data into their data fabric architectures given its rising relevance and sheer magnitude.

Multicloud will evolve with different data strategies

Many organizations today have a hybrid cloud environment in which the bulk of data is stored and backed up in private data centers across multiple vendor systems. As unstructured (file) data has grown exponentially, the cloud is being used as a secondary or tertiary storage tier. It can be difficult to see across the silos to manage costs, ensure performance and manage risk. As a result, IT leaders realize that extracting value from data across clouds and on-premises environments is a formidable challenge. Multicloud strategies work best when organizations use different clouds for different use cases and data sets. However, this raises another issue: Moving data from one cloud to another later, if you ever need to, is very expensive. A newer concept is to pull compute toward data that lives in one place. That central place could be a colocation center with direct links to cloud providers. Multicloud will evolve with different strategies: Sometimes compute comes to your data, and sometimes the data resides in multiple clouds.

Synthetic data and unstructured data will be needed to manage data growth

Data security and privacy are becoming more pressing, and synthetic data is an excellent way to avoid collecting user data. Synthetic data is also more portable, since fewer privacy laws apply to it. But while synthetic data reduces the footprint of customer data, customer data is still a small fraction of total unstructured data. The bulk of data is application-generated, not user data, so synthetic data coupled with unstructured data management is needed to manage data growth.

Enterprises continue to come under increasing pressure to adopt data management strategies that will enable them to derive useful information from the data tsunami to drive critical business decisions. Analytics will be central to this effort, as will creating open and standards-based data fabrics that enable organizations to bring all this data under control for analysis and action.

TNS owner Insight Partners is an investor in: Precisely.