How Google Is Part of the Cloudification of Open Source

Picking “best of breed” open source databases to feature as managed services is a response to the on-going antagonism between open source and AWS that has practical and commercial advantages. But Google is also tapping into the shift to cloud that open source needs to make in order to deliver what enterprise customers need. The New Stack talked to leaders of the open source organizations that Google is working with to understand whether this is more than just co-selling.
As Google senior vice president for technical infrastructure Urs Hölzle told us at the Google Next event last month, open source has become a critical part of IT for many enterprises but it hasn’t kept up with cloud adoption. “The larger arc of open source software has been gaining acceptance, relevance and importance in the enterprise and we don’t see that ending in the cloud age. Today many, many very traditional enterprises are actually deeply invested in open source; it’s everywhere in their stack. If cloud is the future, then open source needs a future in the cloud.”
But open source has typically required more operational involvement than the managed services of cloud, he noted. “It’s a complex problem because cloud really changes the nature of open source consumption. It used to be ‘you download it, you install it, you run it yourself’; with cloud, that’s a strange thing to do. You want to consume it as a service.”
Open Source Without the Ops
“The two big trends for enterprises are open source and cloud and the story of how those two things come together hasn’t been that well worked out,” said Jay Kreps, the CEO and co-founder of Confluent, a streaming platform based on the open source Apache Kafka. That’s what Google’s approach is trying to improve.
“This is about taking open source platforms and reimagining them as cloud native system that is elastic, pay for what you use, that you get access to in seconds. I think that hasn’t been the experience for people for open source in cloud generally. These are the platforms that companies want to build around; they like the fact that there’s no lock-in, they like these open ecosystems around these projects, but this is about making them now true cloud systems,” Kreps said.
This approach is particularly well suited to modern NoSQL databases, Ofer Bengal, CEO and co-founder of Redis Labs, explained to us. “All these databases are built in a way that less on-going ops is needed, the on-going operations of the database are fully automated.” Enterprises are also using multiple databases where they might once have picked a single, commercial DBMS. “You find not only multiple databases within the same organization; you find multiple databases supporting a single application.”
But those same trends made life difficult for open source providers, including Redis. “Cloud providers are stronger than the open source initiatives; they pick them up and monetize them. Without an initiative like Google’s, open source may eventually die because the cloud provider consumes the entire market opportunity around open source and there is nothing left for the company that does the project.”
That’s less of a problem for smaller tools and plugins, but it’s particularly painful for open source infrastructure projects, Bengal suggests. “A database is a large-scale open source project; it’s typically hundreds of thousands of lines of code. This is something that can’t be built by a single developer, you need a large team because this is the lowest level in the stack and it connects with everything in the application stack, and it gets very expensive. You spend all this effort in developing something, hoping you will be able to somehow monetize it and then the cloud providers are stronger than open source initiatives, so they pick them up and monetize them leaving only very little for the developers of the open source.”
The open source version of Redis is available from all the cloud providers, he pointed out. “AWS has ElastiCache which is basically an automated open source fully-managed Redis service. Microsoft has Azure Cache and even Google Cloud has Cloud Memorystore.”
Offering commercial versions with more features than the open source release hasn’t been enough to differentiate from those cloud services, Bengal says. “We thought a product like Redis Enterprise that is much better in terms of functionality will solve the problem; apparently it does not!”
Cloud Scale
A hyperscale cloud like AWS, Azure or Google is naturally going to have more customers and more resources than a single open source company like Redis Labs, Elastic or MongoDB. As analyst Charles Fitzgerald pointed out in a slightly tongue-in-cheek post, managed database services on Google can take advantage of far more capex investment than the same service run by any single vendor and they can offer it alongside all the other services an organization needs, whether that’s IaaS, PaaS or other open source databases.
The consistency a cloud provider can offer makes life simpler for customers, Google’s Hölzle noted. “If a company is using 17 different open source packages, they don’t have 17 different sets of rules and ways to install and configure and manage them, but one set of rules.” And if you’re using the commercial version of an open source technology, not having to set up a new billing account for each new technology is far more convenient for enterprises. “If I use 17 different software packages, let me see it all in the same billing analytics so I don’t have to download an Excel file from them, and from them and from you…”
Evan Kaplan, CEO of time series database company InfluxData, agrees: “the aggregation of services [on a cloud platform] is actually a really useful thing for customers.”
MongoDB was designed to be cloud-ready, according to MongoDB board member Tom Killalea (and the company has been working with Google for two years already). As the Google Cloud Platform (GCP) adds regions, “you can have your data be wherever you want it to be and be jurisdictionally compliant with where you want to keep data,” he said.
There is another benefit when the data you want to use in a database is already stored on that cloud: “data has gravity,” pointed out Emil Eifrem, CEO and co-founder of graph database provider Neo4j.
The association with Google tools like TensorFlow is excellent branding for Neo4j, he noted. “This is something where a good strategy for the larger cloud vendor, Google, fits with the strategy of the open source community. What we’ve got is a 20-year decomposition of an Oracle, DB2, SQL database infrastructure into targeted, horses-for-courses siloed databases like graph databases, search databases, or time series. This feels like the right intervention at the right time.”
If the managed services approach pays off for the initial seven partners, whose commercial services appear at the same level as Google’s own, will Google expand the program further? Given that one of the advantages all the open source companies mentioned was standing out in the list of services rather than being hidden away in a marketplace with thousands of competitors, perhaps not. “I assume they would limit themselves to best of breed type of products like databases and not just take anyone; that would kill the whole concept,” Bengal told us. “I don’t foresee them having 50 or more products under this initiative.”
This level of integration may not stay exclusive to Google either, if it turns out to be the right model for bringing open source to the enterprise. “We would definitely like to cooperate with other cloud providers,” Bengal told us; “I think that’s the right way.”
As Billy Bosworth, CEO of enterprise Cassandra distribution DataStax, put it, “We’re going to keep working with Amazon and Azure for their best of breed technologies; that’s just the new world and everyone is going to have to be comfortable with that.”
InfluxData is a sponsor of The New Stack.
Feature Image by Vane Monte from Pixabay.