Data lakes are an enticing idea: pour all of a company’s information — both structured and unstructured — into a single system, and then everyone can access and analyze it on demand. The concept is appealing, but in practice, getting that data back out of the lake and into the hands of the people who need it can be far more difficult than it sounds.
Kelly Stirman, the chief marketing officer and vice president of Dremio, has been working with data for decades. In a past life, he worked on strategy at MongoDB, and before that, MarkLogic. In his latest role, he’s tasked with helping companies find ways to solve their data lake problems. Chief among those problems, he said, is making that data usable by those who need it.
“It’s still pretty early. Companies have figured out how to get the data in, and now they’re working through the next set of projects to get value out of their initiatives,” said Stirman. “Then there’s the more important thing companies talk about, which is consolidating their workloads onto their data lake initiative. For example, how do we take the things that we’ve been doing for 20 years in Teradata and offload meaningful portions of that work into our new data lake? How do we make it so our data scientists can focus on building their models and spend less time on mining the data and getting it ready for model development? I think that’s where you see the excitement and strategy flowing out of many companies.”
The need for data lakes inside enterprises, said Stirman, originates in the demands of digital transformation: businesses need to produce more software, faster. Just as the cloud brought self-service infrastructure to developers, Stirman said, data lakes can bring a similar self-service access paradigm to data.
“The force that’s really driving these different kinds of changes is all about this really big change in application development that’s unfolded in the past 20-plus years. It used to be that when you looked at the total cost to bring an application from the back of a cocktail napkin to production, the dominant cost factor was infrastructure, and maybe less than five percent was the people involved in bringing that application to production. Now that’s completely inverted, and the overwhelming and dominant cost for any company is the people involved. The infrastructure costs are continuing to shrink. When you look at that, smart CIOs say, ‘How do I optimize for this particular problem?’” asked Stirman.
In this Edition:
0:40: How are enterprises doing with their transition to building data lakes internally?
1:50: Are businesses now focused on delivering value from data lakes on the lifecycle and operational side?
3:02: Is the appeal of a data lake the fact that so many things can get data out of it via different methods?
5:23: Can you tell us about the Apache Arrow Project?
8:01: Managing structured and unstructured data with ETL
12:37: What’s the next big wave of interesting technology in this space?
Feature image via Pixabay.