
Indico Applies AI and ML to Challenge of Unstructured Data

21 Feb 2022 3:00am, by

Data is often described as the new oil, the crucial resource that drives today’s companies and key to their ability to compete in the modern business world.

However, data also comes with its share of challenges, not the least of which is the massive amount that is being generated by enterprises. IDC analysts predict that in 2025, 175 zettabytes of data will be created, all of which needs to be collected, processed, stored and analyzed to quickly derive pertinent information that will drive business decisions.

This is further complicated by the fact that 80% to 90% of data being created is unstructured data, which includes everything from emails, texts and social media to images, videos and speech. Such data doesn’t fit into tables or other preset models and can’t be mapped into designated fields. It is a poor fit for traditional relational databases, which makes it more difficult than structured data to manage.

“One of the big challenges in unstructured is the heterogeneity of it,” Indico Data CEO Tom Wilde told The New Stack. “We think about it in terms of email, image [and] audio, and that stuff is in frame for unstructured that flows through any major enterprise today. It makes up a lot of critical front-, middle- and back-office workflows.”

Addressing Unstructured Data


In the past, IT used tools like regular expressions and Boolean logic, with relative success, to help machines better address unstructured data. The arrival of artificial intelligence (AI), machine learning (ML) and neural networks in recent years has changed the dynamics, according to Wilde.
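As a rough illustration of that older, instruction-based approach, here is a minimal Python sketch. The pattern, the sample emails and the function names are hypothetical and are not drawn from Indico or any of its customers; they only show how a hand-written rule handles the wording it anticipated and misses wording it did not.

```python
import re

# Hypothetical hand-written rule: an invoice number looks like
# "Invoice #12345" or "invoice no. 12345".
INVOICE_RULE = re.compile(r"invoice\s*(?:#|no\.?)\s*(\d+)", re.IGNORECASE)


def extract_invoice_number(text):
    """Return the first invoice number matched by the rule, or None."""
    match = INVOICE_RULE.search(text)
    return match.group(1) if match else None


def is_urgent_invoice(text):
    """Boolean logic over keywords: an invoice mention AND an urgency term."""
    lowered = text.lower()
    return "invoice" in lowered and ("overdue" in lowered or "urgent" in lowered)


# Invented sample emails; the third expresses the same intent in other words.
emails = [
    "Please pay Invoice #48213 by Friday, it is overdue.",
    "Attached is invoice no. 7731 for last month.",
    "Ref. bill 99120 from the vendor is still outstanding.",
]

results = [(extract_invoice_number(e), is_urgent_invoice(e)) for e in emails]
for email, result in zip(emails, results):
    print(result, "<-", email)
```

The third email falls through both rules because nobody wrote an instruction for "bill" or "outstanding," which is the kind of heterogeneity that makes rule-based handling of unstructured data hard to maintain.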

“For the first time we can actually teach the machine by example instead of by instruction,” he said. “That’s a really profound shift. Now if I show the machine enough examples of what I’m trying to solve, it can learn through observation to solve it as a human would. That’s the good news. I don’t need to be technical to show examples. I just need to find a way to put those examples of products on the neural networks.”

That said, there are issues even with those emerging technologies: While machines armed with AI are smart learners, they’re also slow. Users may need to show them tens of thousands of examples before they reach human proficiency, and “now we’re back outside of something a business user can take on a job,” Wilde said.

Indico’s Unstructured Data Platform

Boston-based Indico Data, which was founded in 2014, is looking to address the issue with its Unstructured Data Platform, a collection of tools that leverage transfer learning approaches, learning techniques and software agents to quickly create machine learning models that can be applied multiple times to pull value from unstructured data.

The company unveiled the platform in September 2021, delivering it as a service in a private cloud or hosted by Indico. It includes the company’s Indico Ignite services.

“What Indico has done here — and the real disruption is the founders who started the company … invented an approach using something called transfer learning, whereby a general approach to the problem could be repurposed over and over again as a bootstrap to solving the specific problem,” Wilde said. “Some people describe it as, we run the first 26 miles of the marathon and the customer runs the last .5 miles. We built this massive pre-trained understanding of language, of images, etc., as a giant generic bootstrap that allows the business user to create custom machine learning models sitting on top of that and train it how to do the specific task that they’re trying to solve. That’s the technical disruption that needs to be brought to the market.”
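The general idea of transfer learning that Wilde describes can be sketched in a few lines of Python. This is a toy under stated assumptions, not Indico's implementation: the `PRETRAINED` word vectors are invented stand-ins for a large pretrained model, and the small "head" trained on top is a simple nearest-centroid classifier chosen for brevity. Only the head is fitted to the customer's examples; the pretrained part stays frozen.

```python
import math

# Stand-in for a large pretrained model: fixed word vectors that are never
# updated during training. The values are invented for illustration only.
PRETRAINED = {
    "lease": (0.9, 0.1), "tenant": (0.8, 0.2), "rent": (0.85, 0.1),
    "invoice": (0.1, 0.9), "payment": (0.2, 0.8), "amount": (0.15, 0.85),
}


def embed(text):
    """Average the frozen pretrained vectors of the known words in the text."""
    vecs = [PRETRAINED[w] for w in text.lower().split() if w in PRETRAINED]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def train_head(examples):
    """Fit the only task-specific part: one centroid per label."""
    grouped = {}
    for text, label in examples:
        grouped.setdefault(label, []).append(embed(text))
    return {
        label: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
        for label, vecs in grouped.items()
    }


def predict(head, text):
    """Return the label whose centroid is closest to the text's embedding."""
    vec = embed(text)
    return min(head, key=lambda label: math.dist(vec, head[label]))


# Two labeled examples are enough here because the representation is reused.
head = train_head([("tenant lease", "lease"), ("invoice amount", "invoice")])
prediction = predict(head, "rent due from tenant")
print(prediction)
```

The word "rent" never appears in the two training examples, yet the text is still classified as a lease document because the frozen representation already places "rent" near "lease" and "tenant." That reuse is what lets a small amount of task-specific training go a long way.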

Custom Models Are Key

The CEO said the ability to create custom machine learning models with only a small amount of training is one of three key parts of the strategy. In addition, the application itself is essentially a point-and-click interface, which makes it easy for customers to do such tasks as creating the models and presenting and labeling the training data. Keeping humans in the loop is also a key part.

“Oftentimes, customers will say to us, ‘How accurate is your solution?’” he said. “That’s the wrong question. The right question is, ‘How difficult is it for me to get to 100 percent accuracy?’ It’s a subtle difference but an important one. The key distinction there is human knowledge. Getting to 100 percent accuracy is both a machine learning problem and a user experience problem.”

The application allows the user to see what has been processed by the custom models and understand where a model lacks confidence in its prediction. What matters then is how efficiently the user can find the problem and remediate it.
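One common way to keep humans in the loop is to route low-confidence predictions to a review queue. The Python sketch below assumes that pattern; the `Prediction` class, the `route` function, the 0.9 threshold and the sample batch are all hypothetical and do not describe how Indico's application works internally.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    document_id: str
    label: str
    confidence: float  # model's score between 0 and 1


def route(predictions, threshold=0.9):
    """Split predictions into auto-accepted ones and ones needing review.

    The review queue is sorted so the least confident items come first.
    """
    accepted = [p for p in predictions if p.confidence >= threshold]
    review = sorted(
        (p for p in predictions if p.confidence < threshold),
        key=lambda p: p.confidence,
    )
    return accepted, review


# Invented batch of model outputs.
batch = [
    Prediction("doc-1", "lease", 0.98),
    Prediction("doc-2", "invoice", 0.62),
    Prediction("doc-3", "lease", 0.91),
    Prediction("doc-4", "invoice", 0.35),
]

accepted, review = route(batch)
print("accepted:", [p.document_id for p in accepted])
print("review:", [p.document_id for p in review])
```

Sorting the queue by confidence puts the reviewer's attention on the predictions most likely to be wrong, which speaks to the user experience side of the accuracy question Wilde raises.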

“That all together is what solves this problem,” Wilde said. “That separates Indico: it’s all of those things combined. It’s not one of those things.”

A Growing Company


Indico has raised $36 million, including $22 million in December 2020, which enabled it to grow its workforce and expand its channel reach. It now has 15 enterprise customers in industries such as financial services, insurance and commercial real estate, among them Cushman & Wakefield. The Chicago-based global real estate services firm, with about 50,000 employees in 400 offices across 60 countries, was investing in automation technologies as part of its larger digital transformation initiative.

The company had been successful leveraging robotic process automation (RPA) for structured data and wanted to address unstructured data that included millions of documents, emails and other text-based information.

Brandi Corbello was vice president of transformation at Cushman & Wakefield at the time.

“Typically what you’ve seen in the automation space is the automation leaders typically sitting in a function or in a business unit,” Corbello, who in January became Indico’s vice president of business development, told The New Stack. “They’re really only exposed to the problems sitting in that particular business unit or that function. I was in the position where I got to see all of the problems across the organization. What we were finding in a lot of our unstructured data problems was there were a lot of niche solutions or siloed solutions, with one solution that could only look at leases or one solution that could only look at invoices.”

Cushman & Wakefield’s Challenges

The company needed a strategy to solve the unstructured data problem across the organization, she said. The Indico solution enabled business process experts to build models and modify them when needed without having to involve IT or data scientists. The analytics in the offering identified relevant terms within a document and across multiple documents, and the platform could be used for myriad use cases and document types in disparate business units.

“Within each use case, we created custom models,” Corbello said. “My team knew one tool, so we had one tool in the platform vs. two to three, [and] our end users were actually quite comfortable with that tool, so the change management from a business perspective was low and we had good user experience. We were able to go after a multitude of use cases and not just focus in on one problem in the organization.”

The ability to create custom models also makes it easier for enterprises to adopt the technology, Wilde said. Addressing the first use case generally takes between 90 and 120 days, but subsequent use cases take about four to six weeks.

“Once they’re up, once customers get the learning curve, they’re able to capture value at an accelerated rate,” he said.

Cushman & Wakefield used the Indico solution to address such use cases as lease analysis and procure-to-pay invoicing workflows, saving 16,000 hours on its deal management initiative and accelerating the turnaround time on each deal by 70%.

“What was key for us at Cushman was we were able to operationalize the tool,” Corbello said. “We were able to embed it in those processes and within those businesses. It’s a true part of their operations, rather than it being a silo technology.”
