Building machine learning and artificial intelligence models is no easy feat. Although the noise surrounding AI is deafening, the art and application of this technology does require a certain understanding of mathematics and the technical know-how to develop meaningful AI models and algorithms.
With growing volumes of raw data about people, places and things, plus increasing computing power and real-time processing speeds, immediate AI applicability and business benefits are a more viable reality. But before IT leaders attempt to successfully deploy or conquer an enterprise-wide AI strategy, they must have the capability to bring large datasets together from disparate and varied data sources into a secure, centralized and scalable governed data repository.
AI is only as intelligent as the data behind it. At its core, machine learning works by feeding a system information in an organized and structured manner, so the effectiveness of a machine learning platform depends on the quality of the initial data used to train it. If that data is flawed, the system will produce incorrect outputs and prove to be ineffective.
Take for example Microsoft’s Tay — a Twitter AI chatbot that was supposed to engage in casual and playful conversation with its followers. Instead, it tweeted inappropriate and racist comments. Why? The chatbot was given negative sentiments from Twitter trolls and preventative filtering was thrown out the window once it was launched. This is a clear example of how AI failed because of a bad data set.
We need to put a higher emphasis on the importance of data quality and governance. Your data is not perfect. If educated, talented data scientists struggle with disparate data, why would machines be any better? The solution: allow data governance and quality to pave the way to AI democratization.
Why Your “Crown Jewels” Need Data Quality and Governance
As more companies try to democratize AI on their own, they’re discovering it’s not the easiest thing to do. But companies are eager to speed up the process. In fact, according to a survey, 81 percent of IT leaders are currently investing in or plan to invest in AI, as CIOs have mandated that AI needs to be integrated into their entire technology stack.
But before businesses can get to an AI proof of concept or invest in operational AI applications, they need to have a data quality and governance strategy in place. Both of these frameworks work in tandem, and to put into perspective how data quality and data governance are truly symbiotic, it helps to think of your data as your crown jewels.
With data quality, you ensure that your jewels are cleansed and in perfect condition. Quality is not a one-and-done process, because data can come from everywhere. Data management should be continuous so that the quality and integrity of your data hold up over time and you can make smarter business decisions. Businesses that address data quality issues within the organization can gain a competitive edge. To count as high quality, though, data needs to be accurate, complete and consistent.
On the other hand, data governance requires a team armed with the responsibility and the right tools to manage the system that protects those sacred jewels. A well-planned data governance framework covers strategic, tactical, and operational roles and responsibilities. It defines who can take action, upon what data, in what situations, using what methods. For businesses, data governance initiatives seek to build a strong foundation for business intelligence and can be a pillar for strategic planning.
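To make the "who, what data, what situations, what methods" idea concrete, here is a minimal sketch of how such a rule set might be written down and checked in Python. The role names, dataset names, situations and the `is_allowed` helper are all hypothetical, invented for illustration. A real governance framework would cover far more than a lookup table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One governance rule (all field values below are hypothetical)."""
    role: str       # who can take action
    dataset: str    # upon what data
    situation: str  # in what situations
    method: str     # using what methods


# Illustrative policy only; these roles and datasets are assumptions.
POLICY = {
    Rule("data_steward", "customer_records", "quality_review", "update"),
    Rule("marketing_analyst", "clickstream", "campaign_analysis", "read"),
    Rule("finance_analyst", "vendor_billing", "monthly_close", "read"),
}


def is_allowed(role: str, dataset: str, situation: str, method: str) -> bool:
    """Return True only if the exact combination appears in the policy."""
    return Rule(role, dataset, situation, method) in POLICY


print(is_allowed("data_steward", "customer_records", "quality_review", "update"))
print(is_allowed("marketing_analyst", "customer_records", "campaign_analysis", "update"))
```

The first call matches a rule and the second does not, so anything not explicitly granted is denied by default.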
Now that the EU’s General Data Protection Regulation is fully in effect, there is a huge opportunity to put a data quality and governance plan in place. A sound data governance approach can and should involve more than one platform or project, and it should contain a set of rules and standards for data-related matters. Keep in mind that a data governance program can stretch across several areas of focus, drilling down to the enterprise or project level. One is (you guessed it) data quality, where finding, correcting and monitoring data quality issues in the enterprise is a top priority.
Data Democratization, Dirty Data, and the Data Champions in Charge of It All
However, with data governance comes great responsibility. It’s no surprise that companies are in a mad rush to become data-driven, and rightfully so, but that rush often leads to “dirty data”: data that is incomplete, inaccurate, or riddled with errors and missing values. Studies show that dirty data is the most common problem for workers in the data science field.
Therefore, you want to get a sense of how dirty the data is. Whether you need to update date formats, capitalization or punctuation, it’s important to get a quick understanding of what you’re dealing with. Systems infused with AI capabilities are smart, but they are still computer programs. As noted with Microsoft’s Tay chatbot, you can’t feed the system dirty data and expect to train a model or build a foolproof platform. You can’t train an AI model on the wrong type of data. As the saying goes, “garbage in, garbage out.”
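As a rough illustration of what "getting a sense of how dirty the data is" could look like, the sketch below counts a few of the issues mentioned above (missing values, inconsistent date formats, capitalization and stray punctuation) using only the Python standard library. The sample records, field names and checks are assumptions made up for this example, and the heuristics are deliberately crude. A real profiling pass would be tailored to the dataset.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"name": "Alice Smith", "signup_date": "2018-06-01", "city": "Boston"},
    {"name": "bob jones", "signup_date": "06/02/2018", "city": "boston"},
    {"name": "Carol White", "signup_date": "", "city": "Chicago."},
    {"name": None, "signup_date": "2018-06-04", "city": "Chicago"},
]

# Assumption: dates are expected in ISO form (YYYY-MM-DD).
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
TEXT_FIELDS = ("name", "city")


def profile(rows):
    """Count simple data quality issues by category."""
    issues = Counter()
    for row in rows:
        for field, value in row.items():
            # Treat None and blank strings as missing values.
            if value is None or value.strip() == "":
                issues["missing"] += 1
                continue
            if field == "signup_date" and not ISO_DATE.fullmatch(value):
                issues["non_iso_date"] += 1
            if field in TEXT_FIELDS:
                # Crude heuristic: expect Title Case in text fields.
                if value != value.title():
                    issues["inconsistent_capitalization"] += 1
                if value.rstrip()[-1] in ".,;:!?":
                    issues["trailing_punctuation"] += 1
    return issues


report = profile(records)
for issue, count in sorted(report.items()):
    print(f"{issue}: {count}")
```

Even a tally this simple shows which cleanup tasks deserve attention first, before any model sees the data.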
While data-literate professionals and scientists typically own the keys to the data kingdom, the proliferation of new data streams coming from sensors, social media, the cloud, IoT, and so on, is uncontrollable. That’s why we’re seeing new data-focused roles emerge within enterprises, whether it be a data analyst, a data scientist, or a data steward. These new roles are blurring the lines between enterprise data and consumers, and that presents a challenge around corporate data quality, reliability and trust that IT organizations must address.
But there is a solution. While we need data experts helping to maintain data integrity, it’s critical to democratize data in order to distribute information across all teams. Instead of having business units go through IT teams to get the data they need, we can empower all units (marketing, business analysts, IT, sales) to take action on business insights. For example, the marketing department can analyze click streams from the website, or finance teams can get vendor billing details. Business users can access data with confidence and feel engaged in an active role.
An AI Strategy Is a Data Strategy
It’s undeniable: data is the new oil of today’s fast-paced, digital society. However, when dealing with machines, the quality of the analysis and the outcomes that fall out of it depend on the quality of the data you feed into the algorithm. In fact, businesses can’t and shouldn’t even begin to think about creating and applying their own AI models or algorithms without the power of secure and clean democratized data that is integrated into mission-critical systems.
The results can be disastrous and cost millions.
Before thinking about how you can teach a machine to learn, have a vision for data governance in your company that evolves over time and provides value to your business. The right data strategies are key to implementing the right AI strategies, and we need those who understand data best to maintain the data quality and integrity necessary to fuel the types of automated, intelligent insights that AI can provide. Then, and only then, can organizations fulfill their AI dreams.
The New Stack is a wholly owned subsidiary of Insight Partners. TNS owner Insight Partners is an investor in the following companies: Real.