“It’s not who has the most data who wins, it’s who is able to act most quickly,” said Jack Norris, Senior Vice President, Data and Applications at MapR. “It’s about the data agility, being able to see what the data is telling you and being able to act appropriately.”
In this episode of The New Stack Makers, Norris shares wisdom about how to get the most from big data. It’s not just about using AI to plumb historical or accumulated data, he said; there is also a performance aspect to how one injects AI into a business operation. The point of operationalizing AI is being able to understand the context of what is going on at a speed at which you can actually influence the event. For customer engagement, that means interacting with the customer on the web; for security, it means identifying a fraudulent transaction before it is completed.
Usually when talking about AI, people jump to the algorithm or the search for the perfect tool and leave the data logistics to the end. But any AI strategy should start with harnessing the data logistics.
Norris suggested reading Google’s white paper, “Machine Learning: The High-Interest Credit Card of Technical Debt.” Machine learning is a great addition to your toolkit, but until you understand data dependencies you will incur technical debt, he said.
To avoid this, he advised separating the data dependencies and data flow from the actual machine learning technique, and making sure that the data logistics are in place to support ongoing agility. Norris suggested the book “Machine Learning Logistics” by Ted Dunning and Ellen Friedman for a deeper dive and specific suggestions.
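As a rough illustration of that separation, here is a minimal Python sketch. It is not taken from the episode or from the book; every name in it (`Transaction`, `TransactionSource`, `InMemorySource`, `threshold_scorer`, `flag_suspicious`) is hypothetical, and the "model" is a trivial threshold rule standing in for a real machine learning technique. The point is only that the data source and the scoring logic can be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Protocol


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record type used only for this illustration."""
    account_id: str
    amount: float


class TransactionSource(Protocol):
    """Data logistics side: anything that can supply transactions."""

    def read(self) -> Iterable[Transaction]: ...


class InMemorySource:
    """Stand-in source; a file, stream, or table could replace it."""

    def __init__(self, rows: Iterable[Transaction]) -> None:
        self._rows = list(rows)

    def read(self) -> Iterable[Transaction]:
        return iter(self._rows)


# Modeling side: a scorer maps one transaction to a risk score in [0, 1].
Scorer = Callable[[Transaction], float]


def threshold_scorer(limit: float) -> Scorer:
    """Placeholder for a trained model: scores 1.0 above the limit, else 0.0."""

    def score(tx: Transaction) -> float:
        return 1.0 if tx.amount > limit else 0.0

    return score


def flag_suspicious(
    source: TransactionSource, scorer: Scorer, cutoff: float = 0.5
) -> List[Transaction]:
    """Glue code: depends on the two interfaces, not on any concrete choice."""
    return [tx for tx in source.read() if scorer(tx) >= cutoff]


source = InMemorySource([Transaction("a", 20.0), Transaction("b", 5000.0)])
flagged = flag_suspicious(source, threshold_scorer(1000.0))
```

Because `flag_suspicious` only sees the two interfaces, the threshold rule could later be replaced by a different scorer, or the in-memory list by a different source, without touching the other side.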
Solid data logistics not only avoid technical debt and save downstream resources, they also make the precious data science resources you have in-house more productive and effective, he said.
Often people structure plans around a set of myths that don’t provide any value, so it’s important to address them, Norris said.
The first myth is that AI is all about the right algorithm. As already noted, it’s better to start with data logistics and leave the algorithms for later.
The second myth is that going all cloud is the simplest path. A consistent data layer not only helps you navigate the cloud more easily, it also bridges cloud and non-cloud tech. But, he cautioned, “just because something has the same brand doesn’t mean that you don’t have separate data silos that need to be navigated and coordinated.”
The third myth is that containers are only good for simple, stateless apps. In fact, they are very versatile and work well with edge computing.
Listen in to hear Norris talk about dataware, using SQL in new and more useful ways, and how the oil industry is using ML, along with a deep dive into the three myths that are slowing down ML and advice for engineers and developers.
In this Edition:
0:55: How much scale are we talking about?
7:59: MapR, do you provide this as a service for your clients, or is this something that each company needs to come to on its own?
10:37: Discussing the new query language that Norris wrote.
14:41: ML workflows and details across industries.
16:17: Discussing the three myths of artificial intelligence.
24:43: What advice do you have for our listeners who are engineers who work with data at scale, about what they can do to keep their careers fresh?
Feature image from MapR.