Bob Muglia, the former head of Microsoft’s server and tools business and a former Juniper executive, is finally talking about Snowflake, the startup he joined a few months ago as CEO. Snowflake is a data-warehouse-as-a-service offering that lets users combine structured and semi-structured data. Built from scratch with full support for standard SQL, Snowflake has some unique capabilities designed to make it a lot less expensive than other options and accessible to more than just data scientists.
The cost savings come in part from the way Snowflake cleverly decouples compute and storage. An analyst can come into work in the morning, start up a virtual warehouse and leave it running all day. Before leaving for the day, the analyst shuts the virtual warehouse down so that the company doesn’t have to pay for it overnight.
“Nothing’s lost because the data is on S3,” Muglia said. “It dynamically rebuilds itself when you start it back up.”
That’s a different model from the one most other data warehouse technologies use, he said. “If you look at Hadoop or others, the first thing you do is load all the data onto clusters. That takes hours to do. With us, you can have a virtual warehouse running in a couple of minutes,” he said.
The result is both fast startup and cost savings. Snowflake charges for use of the virtual warehouse and separately for storage, which costs users slightly more than Amazon charges for S3.
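To make the pricing split described above concrete, here is a minimal toy cost model. It assumes compute is billed only for the hours a virtual warehouse runs and storage is billed separately per gigabyte. The rates, function name and usage figures are hypothetical placeholders for illustration, not Snowflake’s or Amazon’s actual prices.

```python
# Toy model of decoupled compute and storage billing.
# ASSUMPTION: both rates below are hypothetical placeholders, not real prices.
HYPOTHETICAL_COMPUTE_RATE_PER_HOUR = 2.00       # cost of one running virtual warehouse
HYPOTHETICAL_STORAGE_RATE_PER_GB_MONTH = 0.03   # cost of keeping data in storage


def monthly_cost(hours_per_day: float, days: int, stored_gb: float) -> float:
    """Compute is charged only while the warehouse runs; storage is charged regardless."""
    compute = hours_per_day * days * HYPOTHETICAL_COMPUTE_RATE_PER_HOUR
    storage = stored_gb * HYPOTHETICAL_STORAGE_RATE_PER_GB_MONTH
    return compute + storage


# A warehouse left running around the clock vs. one shut down overnight.
always_on = monthly_cost(hours_per_day=24, days=30, stored_gb=1000)
workday_only = monthly_cost(hours_per_day=9, days=30, stored_gb=1000)

print(f"always on:    {always_on:.2f}")     # always on:    1470.00
print(f"workday only: {workday_only:.2f}")  # workday only: 570.00
```

Under these made-up numbers, the stored data costs the same in both cases; only the compute portion shrinks when the warehouse is suspended overnight, which is the point Muglia makes about nothing being lost while the warehouse is off.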
But cost savings weren’t really the main driver for Snowflake. The service is designed to let businesses combine structured and semi-structured data and to be easier to use.
Businesses have been collecting structured transactional business data, which Muglia describes as data about what happened, for years, using traditional data warehouses. In the last decade or so, many companies have begun collecting machine-generated data, typically from apps, using Hadoop. That data is semi-structured and answers the question “how,” he said.
“If you can put the what and the how together, you can get a level of understanding not possible otherwise,” Muglia said. But doing so is difficult. “Today, traditional data warehouses don’t handle machine data and Hadoop doesn’t handle structured data really well,” he said.
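The idea of putting “the what and the how together” can be illustrated with a small join in plain Python. This is a minimal sketch, not Snowflake’s SQL or storage format: the order rows, JSON event strings, field names and the function `join_what_and_how` are all hypothetical. It pairs structured transaction records with semi-structured event records whose fields may vary or be missing.

```python
import json

# Hypothetical structured "what" records, like rows from a transactional table.
orders = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
]

# Hypothetical semi-structured "how" records: JSON app events with varying fields.
raw_events = [
    '{"order_id": 1, "channel": "mobile", "steps": ["search", "cart", "checkout"]}',
    '{"order_id": 2, "channel": "web"}',
]


def join_what_and_how(orders, raw_events):
    """Attach parsed event fields to each order by order_id (a simple hash join)."""
    events_by_order = {}
    for line in raw_events:
        event = json.loads(line)
        events_by_order[event["order_id"]] = event

    joined = []
    for order in orders:
        event = events_by_order.get(order["order_id"], {})
        joined.append({
            **order,
            "channel": event.get("channel"),
            # Semi-structured fields may be absent; default to an empty list.
            "steps": event.get("steps", []),
        })
    return joined


result = join_what_and_how(orders, raw_events)
for row in result:
    print(row)
```

The hard part in practice, as Muglia describes it, is that these two kinds of data usually live in different systems; here they simply sit side by side in memory.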
In addition to combining structured and semi-structured data, Snowflake is designed to be easier to use than other data warehouse products, in part because it is delivered as a service. “If you’re running Hadoop on-premises or in the cloud, you’re running the environment yourself. You’re maintaining it, having to back it up and take care of it. We handle all that for you,” he said.
Amazon Web Services’ Redshift is probably the closest comparison to Snowflake in the cloud, but even it is harder to manage, Muglia argued. “AWS makes it easy to create a Redshift cluster, but once you create it, you manage it yourself,” he said. He describes Redshift as more of an infrastructure service, while Snowflake is designed to be software as a service.
The result is that Snowflake users may not have to compete for hard-to-find talent. For example, to run Hadoop, businesses typically need a data scientist and Hadoop operations people. “They are the people who every company we speak to says are almost impossible to find,” said Jon Bock, vice president of marketing at Snowflake. “In that case, you start to say, ‘given how difficult it is to find those people, I can’t do everything I want’.”
To get a database up and running, a business has to figure out database distribution and database indexing, make sure each queue has resources and rewrite things that aren’t working well. “All those are things we take care of,” Bock said.
The company built its service with full support for standard SQL. “We started with a C++ and Java compiler and wrote the code,” Muglia said.
Snowflake thinks its use of SQL will make it much easier for businesses to work with its service. Bock did a LinkedIn search to find out how many people mentioned Hadoop in their profiles. He found 50,000 people. A similar search for SQL came back with 1.5 million profiles.
Also, because Snowflake was built for SQL, it interfaces more easily with data visualization tools like Tableau. Such tools are often challenging to adapt to Hadoop, which doesn’t natively support SQL, he said.
Snowflake currently runs in AWS but is designed to be able to run in different public clouds. “That’s where we are now because that’s where our customer data is,” Muglia said.
For now, Snowflake works with Tableau, Informatica, MicroStrategy and Excel. The integration with such tools is “relatively straightforward,” Muglia said, but requires some work.
Snowflake isn’t totally alone in trying to make it easier for businesses to combine structured and unstructured or semi-structured data. ExtraHop has been talking about a similar idea.
Muglia revealed only in June that he’d joined Snowflake. He previously spent a couple of years at Juniper, leaving late last year just after the company named a new CEO. He’s better known, however, for having spent more than 20 years at Microsoft. Before leaving the software giant, Muglia headed up the server and tools business. When he left Microsoft, he handed the torch to Satya Nadella, who has since become CEO of Microsoft.
Snowflake is announcing today that the service is now available to beta users. Early customers, including Adobe, Accordant Media, White Ops and VoiceBase, have already been testing out the service.
Snowflake is also announcing $26 million in funding from Redpoint Ventures, Sutter Hill Ventures and Wing Ventures. The company said it plans to use the investment to fund product development and chase customers.