HashiCorp CEO Dave McJannet: In a Cloud Native World, Service Names are the New IP Numbers
HashiCorp sponsored this post.
Since Dave McJannet took on the role of CEO at then-emerging infrastructure software provider HashiCorp in 2016, both the company and the industry it serves have changed dramatically, as more organizations have started to adopt container-based cloud native computing and look to spread newly created, microservices-based workloads across multiple clouds.
Prior to taking the helm of HashiCorp, which Armon Dadgar and Mitchell Hashimoto started in 2012, he advised the company from his role as Executive in Residence at Greylock Partners, where he had been since 2015. Before that, he was Senior Director of Marketing at VMware, where he was instrumental in creating and managing both Cloud Foundry and Spring Framework. His work ultimately helped birth the company Pivotal itself. Afterward, he worked as Vice President of Marketing for Hadoop distribution provider Hortonworks.
We caught up with McJannet at the company’s HashiConf user conference, held this month in Seattle, where the company released its first hosted service for the Consul service mesh, HashiCorp Consul Service (HCS) on Azure, as well as additional collaboration tiers for its hosted infrastructure provisioning tool, Terraform Cloud.
When I think of HashiCorp, I think of a set of solidly engineered products: Nomad, Consul, Vault, Terraform. But how do they fit together? What’s the overarching vision for the company?
The way we think about it is in terms of market transitions. The infrastructure market is going through a somewhat predictable transition from people running largely dedicated servers to running cloud infrastructure. The way you have to decompose what that means for the participants in IT is by role: ops, security, networking and development.
[With] the infrastructural layer, the way we used to provide compute capacity is we would buy a server and then run vSphere on top of it and provision a VM on top of it. In the cloud world, you don’t own the infrastructure anymore, you give a set of instructions to Amazon. The ops challenge is solved through that, not by buying servers. That’s why we have Terraform.
At the security layer, you go from this high-trust network to this identity-based model for security, right? At the networking layer, you go from the IP-based networking world to service-name-based networking. Then at the runtime layer, where I think a lot of the heat and light around Kubernetes is, it’s “Oh, Kubernetes is an application platform that runs atop infrastructure, security and networking for delivering new applications.”
Those are the core elements of infrastructure that I think get redone essentially in the cloud model. Basically, it’s infrastructure, security, networking and the runtime platform for developers, and the fifth one is actually APM [application performance monitoring]. I think you’ve seen Datadog have great success in this sort of new [space], monitoring highly decomposed applications that are built in pieces.
I think those are the five elements of infrastructure, and our core focus is on the top four: Terraform for infrastructure provisioning, Vault for identity-based security, Consul for service networking and then Nomad as a scheduler. But we recognize that there’s heterogeneity at the runtime layer. Our vision for the company is to be an infrastructure provider of essentially the new stack for the cloud model.
Excellent, excellent. In your keynote, you had mentioned that Consul is actually the most widely used of HashiCorp’s software packages and in yesterday’s keynote you said that networking has gotten very hard in terms of this new cloud-based model. Can you talk a bit about that?
Yeah, so the history of Consul: Consul is actually the first major product that Armon and Mitchell built, because their view was that in this cloud model you need a common service registry that tells you where everything is. And everything automation-related is a derivative of knowing where everything is and what its status is. Consul ended up getting used widely. I mean, 90% of the cloud native companies we come across use open source Consul as the basis for how they do this, and you just sort of discover it. By virtue of the way it’s deployed, Consul is used on every compute node, so basically every server has a Consul client on it. The numbers are just enormous. Think about the fleets that people have that we have seen: as many as 35,000 nodes in a single cluster. These are Alexa top 50 companies that have huge infrastructure. They may have 100,000 nodes in their fleet. The scale of it is just enormous.
Now the history of Consul is really around service discovery and service registration. It allows you to create this common service registry that tells you where everything is. When I drop a new artifact into that environment, Consul discovers it and it says, “Oh, now you have two instances of the application server. Now you have three instances of the application server.” As the world progressed it was “Now you have three instances of the container.”
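To make the registry idea concrete, here is a minimal Python sketch of registration and discovery. It is an illustration only: the `ServiceRegistry` class, its method names and the addresses are hypothetical and are not Consul's actual API.

```python
from collections import defaultdict


class ServiceRegistry:
    """Toy in-memory registry (hypothetical; not Consul's API)."""

    def __init__(self):
        # service name -> set of "ip:port" addresses currently registered
        self._instances = defaultdict(set)

    def register(self, name, address):
        """Record a new instance and return how many instances now exist."""
        self._instances[name].add(address)
        return len(self._instances[name])

    def deregister(self, name, address):
        """Remove an instance that has gone away."""
        self._instances[name].discard(address)

    def instances(self, name):
        """Return the known addresses for a service, sorted for stable output."""
        return sorted(self._instances.get(name, ()))


registry = ServiceRegistry()
registry.register("app-server", "10.0.0.5:8080")
count = registry.register("app-server", "10.0.0.6:8080")
print(f"Now you have {count} instances of app-server")
```

The registry does not care whether an instance is a VM, a process or a container; it only tracks a name and where that name currently lives.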
It doesn’t really matter what the element is, it’s the idea of a common service registry. That’s why it’s our most widely deployed product: If you look at the GitHub stars history, you’ll see that, until very recently, Consul was by far the most popular. Terraform passed it maybe about six months ago, but it’s the most popular.
The real problem that Consul is going to solve for you is around where everything is and how do you route traffic to the things that are out there. Just imagine how hard it is for the networking team to update the firewall rules every time an application is deployed in that environment, when that application is coming in the form of 100 containers? Then someone says, “Oh, that application’s now under load, I need another 2,000 instances of that thing. Go and update the firewall rules to allow that to happen.”
It’s almost impossible for a human to do that, because these things are changing IP addresses all the time. This is how Consul gets used: Rather than doing networking based on IP addresses, it does networking based on service names. “Container” can talk to “database,” rather than 126.96.36.199 can talk to 188.8.131.52 … You let a machine maintain the mapping of where that thing happens to be at any point in time, so it knows what the IP address is …
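A rough Python sketch of that contrast follows. The rule is written once in terms of service names, while a separate, machine-maintained mapping tracks the current addresses. The service names, IP addresses and helper functions are all made up for illustration and do not reflect how Consul stores or enforces rules.

```python
# Hypothetical data: one rule expressed in service names, plus a
# machine-maintained mapping from names to whatever IPs are current.
ALLOWED = {("container", "database")}  # (source service, destination service)

locations = {
    "container": {"10.0.1.11", "10.0.1.12"},
    "database": {"10.0.2.20"},
}


def service_of(ip):
    """Find which service currently owns an IP address, if any."""
    for name, ips in locations.items():
        if ip in ips:
            return name
    return None


def is_allowed(src_ip, dst_ip):
    """Decide on names, not addresses, so the rule survives IP churn."""
    return (service_of(src_ip), service_of(dst_ip)) in ALLOWED


# Scaling up only changes the mapping; the rule itself is untouched.
locations["container"].add("10.0.1.13")
print(is_allowed("10.0.1.13", "10.0.2.20"))  # new container instance -> database
print(is_allowed("10.0.2.20", "10.0.1.11"))  # database -> container has no rule
```

Adding 2,000 more instances would mean 2,000 more entries in `locations`, with no human editing the rule.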
There’s so much hype around the construct of a service mesh, which is weighed in the ability to provide direct encrypted connections between things in my container world. You forget to step back. The problem you’re actually trying to solve is how do things connect together. Consul, you could argue, was a service mesh long before the idea of a service mesh became a thing.
Consul really has three use cases ultimately: Number one is, obviously, a common service registry. Now let me do health monitoring of all those services. Technologies like Datadog and others plug into Consul to inform the health of their applications.
The second use case is routing using Consul DNS. Just update the DNS registry that tells me where all of these things are and now actually I can really automate the routing process for when a new thing lands in the environment. It gets discovered straight away and Consul knows that it’s there and routes traffic to it.
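The combination of health monitoring and name-based routing can be sketched in a few lines of Python. This is a simplified stand-in, assuming a hypothetical in-memory `catalog`; in Consul itself a lookup like this would go through its DNS interface, using a name such as `web.service.consul`.

```python
# Hypothetical catalog: service name -> list of (address, passing_health_check)
catalog = {
    "web": [("10.0.0.5", True), ("10.0.0.6", False), ("10.0.0.7", True)],
}


def resolve(name):
    """Return only the healthy addresses for a service, like a DNS answer."""
    return [addr for addr, healthy in catalog.get(name, []) if healthy]


# A new instance lands in the environment and is routable on the next lookup.
catalog["web"].append(("10.0.0.8", True))
print(resolve("web"))
```

The unhealthy instance at `10.0.0.6` never appears in the answer, so callers that only know the name `web` are steered away from it automatically.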
Then, the third use case is really just a setting, truthfully. “Hey, I want to enable or enforce encryption every time these things talk to each other.” That’s the service mesh use case. We didn’t start there. We started with the broader problem. It’s not just about routing traffic between containers and encrypting it. It lets me think about how to network more broadly. The service mesh use case is that traffic is going over networks, so encrypt the connection between the two.
Nice, and the service mesh was kind of born from Kubernetes but it seems like this was a set of problems that were going to be generated anyway.
Actually I think the service mesh construct is born from containers. I used to deploy a monolithic application all within localhost, and all my calls within the application were inside a single address space. In that case, networking is actually quite easy. I can update a firewall rule to let this thing have traffic. With the advent of containers, now my applications come in a bunch of pieces. No longer are my calls within localhost, which is a much more secure way of doing it. Now it’s actually between things, and, oh, by the way, now it’s worse because when that thing scales up, you’re going to give me more containers and you end up with this proliferation. The decomposition of the application elements has required a different way of doing networking.
Now, Google, which really is the only company behind Istio, came up with the notion of having this capacity embedded into [Kubernetes]. That is certainly a valid use case, but what happens when that container wants to connect to a database that’s not running in Kubernetes?
The company introduced some new pricing tiers for Terraform Cloud, somewhere between “enterprise” and free. Are small- and midsized market businesses (SMBs) moving towards this idea of cloud-based applications of programmable infrastructure, not just the Web and mobile giants?
Actually, I think the history’s kind of curious. Terraform is just a command-line tool like Git. Git for developers, Terraform for ops. People run it on their desktop or laptop as a desktop client. Then they interface to Amazon or Azure direct [in larger teams]. We created this fully fledged platform called Terraform Enterprise, which provides a collaborative workflow around teams and provides governance and policy of people using the open source.
Our commercial customers came to us years ago and said you know what, our real issue is this collaboration around teams using that command-line tool. Kind of like going from Git to GitHub. What we learned is that actually that collaboration workflow needs to be part of the core of Terraform. Because people are inherently collaborating around infrastructure, the same way they collaborate around Git. Very few people use Git in isolation. Yes, you can, but the real value comes when you can collaborate on it. I think it was more learning for us that what we developed for the enterprise market was actually super applicable to the general workflow requirement for Terraform itself. The potential user base might include teams building infrastructure, but maybe not at the size of the Fortune 500? Now I think what we’ll see is a lot more of these somewhat smaller companies using the paid version of ours.
So a lot of smaller companies are also investigating cloud native computing?
Yeah, actually I think it’s super interesting. A lot of the smaller companies are already using cloud even more than the bigger ones. They’re just using Terraform on the client side; they’ve had no alternative for the collaboration capabilities outside of doing it themselves. I think some of the later-developing markets (some of the places in Asia come to mind) are actually skipping the old on-prem world altogether and going straight to cloud.
How does HashiCorp determine what features will be open source and what will be commercial products?
We are deeply committed to the open source model because it provides a facility for us to bring a huge group of people to collaborate with us around solving the core technical problems of how to provision infrastructure, how to do database security. The integration requirements are enormous on that, and then you know, how many things do you want to provision? We’ll provide for that. Open source works really well for solving the problems of the individual. Our rubric is pretty straightforward: We think about organizational complexity as the customer requirement where our commercial products fit in. They solve different problems and it is analogous to going from Git to GitHub. Think about the continuum if you go from an individual to a team to an organization.
As you go from an individual to a team, well, the problem that’s introduced is one of collaboration. “How do I not step on you when you provision something and you not step on me?” When you go to an organization, on top of the collaboration problem you have the issue of governance and policy. For example, no one can provision after 5:00 p.m. on a Friday, and nobody outside of this ACL group can change a networking rule, and then governance, which is basically an audit trail, searching for who did what. You lay it out that way and you start to recognize that those are completely different problems from the technical problem that’s getting solved. It is Git versus GitHub. That’s how we think about our commercial delineation.
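The two example policies and the audit trail can be sketched in Python as follows. This is only an illustration of the kind of check described, not how HashiCorp's commercial products implement policy; the user names, the `NETWORK_ADMINS` group and the `check_change` function are hypothetical.

```python
from datetime import datetime

# Hypothetical group; in practice this would come from an identity system.
NETWORK_ADMINS = {"alice", "bob"}
audit_log = []  # governance: a record of who attempted what, and the outcome


def check_change(user, resource_type, when):
    """Apply the two example policies and log the decision."""
    # weekday() == 4 is Friday; block changes from 5:00 p.m. onward.
    if when.weekday() == 4 and when.hour >= 17:
        decision = "denied: no provisioning after 5 p.m. on Friday"
    elif resource_type == "network_rule" and user not in NETWORK_ADMINS:
        decision = "denied: user is not in the network ACL group"
    else:
        decision = "allowed"
    audit_log.append((when.isoformat(), user, resource_type, decision))
    return decision


# September 13, 2019 was a Friday; September 12 was a Thursday.
print(check_change("carol", "server", datetime(2019, 9, 13, 17, 30)))
print(check_change("carol", "network_rule", datetime(2019, 9, 12, 10, 0)))
print(check_change("alice", "network_rule", datetime(2019, 9, 12, 10, 0)))
```

Every decision, allowed or denied, lands in `audit_log`, which is what makes the later "who did what" search possible.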
Earlier this year there was a lot of concern voiced about Amazon Web Services’ commercialization of open source packages. But I was surprised with the launch of the Azure-based Consul service how much Microsoft had actually lent a hand. How was it working with Microsoft?
The Consul service is very deeply integrated into Azure as a tier-one experience. I think that’s reflective of both the close relationship we have with Microsoft and also Microsoft’s outlook on this. I think Microsoft has always viewed itself as a platform. I think it has been very deliberate about enabling key partners to build businesses on top of the platform. And in this sense, as you see, the [Consul service] billing comes through your Azure account.
Which is good for Microsoft, and it’s good for us and it’s good for the customer. I think that’s what’s emerged as the market has matured, this idea of a cloud provider providing the core services. I think Microsoft is very clearly taking the view that it wants to be a platform, not an aggregator of services.