Part one of a two-part interview with Alex Polvi, co-founder of CoreOS, which incorporates containerization, clustering and fault tolerance into its auto-updating technology. The company calls its offering “OS as a service.”
Alex Williams: Alex, you have been developing CoreOS, and it has really been on a tear over the past several months. We’re going to talk a little bit about what you’re doing, but I also want to learn more about who Alex Polvi is. How did you get started in programming? Were you in grade school? Were you in middle school?
Alex Polvi: I’ve always been a pretty nerdy guy. Growing up, I was the one who set up dial-up for the first time in my family, and so on. I got into open source software in high school. There was a mentor who came in and said, “I want to teach some kids how to program,” and I was the only one who signed up. This guy turned out to be an old-school Unix guy who used to work at Bell Labs; his name is Jon Steinhart. He taught me how to program in C, and he introduced me to this world of open source and this thing called Red Hat at the time, and I started playing with Linux there. That led me to go into Computer Science as an undergrad.
I got really lucky in my freshman year and I got involved with a program that was just emerging at Oregon State University called the OSU Open Source Lab. The Open Source Lab ran all the biggest open source projects on the Internet, like apache.org, mozilla.org, and so on. Brandon, my co-founder, and I were the first two students hired there. Brandon was hired as a developer and I was hired as a sysadmin. From then on, we’ve been in the infrastructure open source world.
What did you find interesting about programming when you were that age?
Alex: It’s the ultimate LEGO bricks, I guess. I like building stuff. Software is cool because it’s perfectly efficient. The only thing you have to put into it is thought and time to be able to build something. There’s no material cost, outside of a computer. The combination of free software and being able to build things – I was just instantly drawn to it; I always have been.
What were some of the first things you built?
Alex: I remember the very first meaningful app I built was tic-tac-toe in C (laughs). I remember Jon saying, “Just think! Just think about it for a minute!” Learning to program is more about learning to think than anything else.
Did you grow up in Oregon?
Alex: I grew up in rural Oregon, on a Christmas tree farm — that’s why I go for the lumberjack vibe. Then I went to Oregon State University.
And your family’s still there in rural Oregon?
Alex: I grew up where my Dad grew up — where my Dad’s dad grew up — so everyone’s still out there.
Tell me about the Open Source Lab, how it started and how you got involved?
Alex: Originally, it was part of the net services at Oregon State – the folks that run the Wi-Fi networks and the mail servers for the university. Just as a way to give back, they started providing mirrors of open source software. They were mirroring Debian and Gentoo, essentially providing local access to open source software to folks at Oregon State. The mirrors became pretty popular because the network connection they were attached to was really fast. Oregon State always had some of the best mirrors.
Do you remember when SourceForge was the GitHub? There was a class of projects that outgrew SourceForge — namely, Apache, the Linux kernel, Drupal, and freenode.net, the IRC network. All of the big, open source things that outgrew SourceForge ended up at Oregon State, and it was like a managed hosting provider. Oregon State was providing servers for the open source projects. Then they provided some smart hands to help run those servers and patch them and keep them up to date, and that’s what I was doing. I was a student sysadmin. I learned how to use grep working on drupal.org and apache.org.
The Open Source Lab is still going. That’s where I learned to be a sysadmin, and really what got me involved with open source, and what led me to working at Mozilla as well, because mozilla.org was a customer of the Open Source Lab.
What were some of the complexities that you faced when you first started working at the Open Source Lab?
Alex: It was pre-cloud, so you still had to deal with physical machines. Oregon State ran all their own networks, so you had to do it all soup-to-nuts. You had to understand the whole stack and how it all worked. I remember one day we needed to do a data center migration of a server, and it was like, “Alright, pull the machine down and run it across campus.” It was a freenode server – that netsplit was me unplugging the freenode server and running it across campus and plugging it back in again. You know, that kind of stuff – that’s what it was like back then.
What were some of the tools you were using then?
Alex: It was all Linux. CFEngine was pretty popular for being such a sophisticated shop. Lots of IRC for communicating. I became pretty good at Apache by configuring the web server and making it go really fast. I’d say Apache, MySQL… Python was the system of choice, kind of still is.
That was the dawn of the modern open source era, wasn’t it?
Alex: It really was, and I was very fortunate to be in the middle of it. All of those projects I rattled off are some of the biggest open source projects out there now. At the time, it seemed like it was just a bunch of hackers getting together to make this stuff available to people.
I interviewed Mitchell Hashimoto, who was involved in open source when he was in high school. He started with developing videogame cheats. But he learned how to code and became involved in open source that way. To him, it was really about learning. I think it still is about learning. What were those open source communities about for you? Was it just about learning, or was there something more? Could you sense at the time what it might become?
Alex: Open source, in a lot of ways, is a social mission, not just a technical one. You should be able to open the hood of your car and tinker with it if you want to. It’s like an online civil liberties issue. At the end of the day, it’s about your freedom to run software the way you want, and to know what’s going on with your software. That was what appealed to me most about it early on, as well as my being just technically curious. I mean, I love learning stuff as well. That really resonates with me, what Mitchell said. There’s an endless amount to learn in software and, when the hood is open, you can learn at much more depth than you would if you were just being delivered products off the shelf.
It was still a very small community, wasn’t it? There were a lot of attacks on the open source community at that time — was that even relevant to you guys? Did it matter?
Alex: It’s so obvious that core infrastructure should be open source. We should share all the hard parts versus reinventing them over and over again. We knew Linux wasn’t going anywhere; Apache wasn’t going anywhere at the time. Are there new versions of stuff coming out? Always – that will always happen. But the core principles of an open source web server, an open source database, an open source kernel — all those things need to exist. I feel that the people who were working on that stuff were doing everybody in the world a service, because that core infrastructure just needs to exist.
When you look back on those early days at Oregon State… when was that?
Alex: This was all early 2000s.
That was the end of the dotcom era. In 2003 through 2007 we saw the adoption of RSS as a method for trading information. The advent of RESTful APIs brought in new perspectives on how to connect things, and a lot of Internet-scale companies have emerged since then. What are the milestones that you’ve seen in the open source community’s evolution? Now, we’re seeing OpenStack and the Cloud Foundry Foundation, different kinds of open source organizations that weren’t there when you were getting involved in the open source communities.
Alex: The hardest parts of infrastructure software go free first. The Linux kernel was the first one to go free in a big way — the hardest, lowest-level part. Below the kernel, you’re at the micro-controllers on the hardware itself. The hardest parts always go free, and we’ve seen that work up the stack over time. The next one was virtualization. Virtualization was pretty hard. VMware got it first, but then it went free over time. Next after virtualization was the cloud platform layer. Amazon got it first… but it’s free-ish — I mean, it’s pretty good at this point. OpenStack is pretty solid, although it has its issues. It just continues to work its way up the stack. The cloud platform is still going free, but it’s more like the platform level and not the running-virtualization level.
Why did Linux container technology not have the kind of adoption that virtualization from VMware had?
Alex: I think it’s just an evolutionary thing. At first, we had a server running on a single box that was running Red Hat or whatever. The next logical step is to take that same server, but package it up — run them like servers, but stamp them out virtually instead. We’re still thinking about things in terms of servers.
One thing to keep in mind is that, all along the way, a best practice has been to run one application per server. VMware tried to do this with virtual appliances, but because their stack was about helping with that step from single-host server to multi-host server with virtualization, it never really caught on, even though it was always there.
Containers are the next logical thing here. The main way Docker containers are being built is by taking Ubuntu or CentOS as a starting point, and building an application inside of that. Instead of having a carved-up server, I’m going to take my “server world,” but build a container that’s just for the one application, the thing we’ve always thought was good.
The reason it’s happening now is because the tooling got easy enough to allow us to do it. Any good software architect will tell you that this is the right way to build it, but the tooling was too hard to do that before.
The next step after this is building a runtime that’s specifically for running a particular application, so you don’t need Ubuntu or CentOS in there at all anymore. It might take a while to get there, where you just say, “here’s my Java app — put an application around it.” Right now, we’re at the phase where we’re transitioning from virtualization to an application-focused virtualization, which is a container.
Tell me about that tooling. What made it so complex? And what did virtualization advancements do with the tooling that was different than with container technology?
Alex: Virtualization didn’t make it all that much easier to actually build an OS, but it did make it easier to run multiple instances on one machine. One thing that Docker’s really good at is building this little image of an application — probably something VMware should have done a while ago — a really easy way to generate an OVF, the image format required to run inside of VMware. But nobody did it that way.
What made it easier to do virtualization then? What toolsets emerged with the virtualization environment that helped it to gain acceptance?
Alex: I think the big story there was, you could consolidate your infrastructure. That was the momentum. There were some ease-of-use things, but even still, the APIs on VMware haven’t really made things easier for people to deploy. I think the consolidation story is what really caused VMware to take off. There’s actually a TCO there for people that they can relatively quantify. That’s why VMware is so popular with enterprises: because that’s how people think about stuff in enterprises.
There are barely any APIs on VMware.
Alex: Right, because that isn’t the focus.
The focus wasn’t distributed infrastructure.
What was it that Docker did to change the mindset? Virtualization has become well-accepted inside cloud services — Amazon, for example — and has cemented itself inside these different cloud services. Was it becoming more difficult to manage those instances across virtualized environments?
Alex: Ultimately, you’re running a server and you want it to run an application. You buy the server from whomever and you wire it up, so when you power it on it’s running exactly the thing you need from the get-go. A hybrid of that is running things like Puppet or Chef that allow you to boot to a known state, and then turning the server into the thing you want. The net result is the same – your server is running the thing you want.
One best-practice way of doing that, from an architect’s point of view, is to set up the environment so that it deterministically runs the thing the same way, every single time. A really good way to do that is by creating an image of what you want it to run, and then just executing that image.
What Docker did was make it really easy to build that image and run it. Ease-of-use is the key there.
They made the theoretical easy; they made the theoretical practical. They made it easy to build an image of your application and also to share that image with various servers, to distribute it with those servers, and also to share it with other people, potentially. There is a re-use thing here too, which is nice.
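To make the “build an image, then execute that image” idea concrete, here is a minimal sketch in Python. It is not how Docker stores or runs images; the functions `build_image` and `run_image` and the manifest layout are hypothetical, invented only for illustration. The one thing it tries to show is that identical inputs produce an identical, shareable artifact.

```python
import hashlib
import json


def build_image(base, files, command):
    """Bundle an application and what it needs into one immutable record.

    Hypothetical format for illustration only, not Docker's image format.
    """
    manifest = {"base": base, "files": dict(files), "command": list(command)}
    # Serialize with sorted keys so identical inputs always yield identical bytes.
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    return {"digest": digest, "manifest": manifest}


def run_image(image):
    """Stand-in for execution: return the command the image would run."""
    return " ".join(image["manifest"]["command"])


files = {"app.py": "print('hello')", "requirements.txt": ""}
image_a = build_image("ubuntu", files, ["python", "app.py"])
image_b = build_image("ubuntu", files, ["python", "app.py"])

# Two builds from the same inputs are indistinguishable, so any server
# that receives the image runs the same thing.
print(image_a["digest"] == image_b["digest"])
print(run_image(image_a))
```

Because the digest depends only on the contents, the same image can be handed to many servers, or to other people, and each recipient can check that it got exactly what was built.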
What are some of the long-term impacts of that? Is it a wholesale shift to containerization? What is this new distributed environment?
Alex: I think this “distributed environment” piece is really interesting. Keep in mind: that has nothing to do with the packaging of a container. You could take your Puppet or Chef environment and run a container in that without any of the distributed systems. For whatever reason, containers have been coupled with this distributed systems approach for running stuff.
Why do you think it’s been coupled?
Alex: I think it’s a bunch of like-minded folks trying to clean up a number of problems all at the same time. There is an approach here. Again, I keep talking about this: an architect who understands how these systems work, regardless of who they are, will tell you what is the best way to run an environment. We all reach the same logical conclusion. And part of that is: no single server should be a single point of failure. That’s one thing most folks agree on. The only reason it’s not done is because it’s too difficult to do that. If it were trivially easy to write an application that could run across a hundred machines – kill any single one and it doesn’t matter, the application keeps working — everybody would want that. But it’s just too difficult.
As for the distributed systems part that’s combined with it, I’m so glad to see all the vendors unanimously doing this (Docker’s adding clustering, for example): thinking about things from a cluster point of view from the beginning.
I think we’re cleaning up multiple problems here: not just the way you package applications, but also how you run them in a highly available and resilient way.
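The “kill any single one and it doesn’t matter” property can be sketched in a few lines of Python. This is a toy, in-process simulation under stated assumptions: the `Replica` and `Cluster` classes are hypothetical, there is no real networking, and a dead machine is modeled as a flag.

```python
class Replica:
    """One machine running a copy of the application (simulated)."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class Cluster:
    """Routes a request to the first replica that can serve it."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def handle(self, request):
        # Try each replica in turn; one dead machine must not fail the request.
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy replicas left")


cluster = Cluster(Replica(f"node-{i}") for i in range(3))
print(cluster.handle("GET /"))

cluster.replicas[0].alive = False  # kill a single machine
print(cluster.handle("GET /"))     # the application keeps answering
```

A real system also has to deal with state, partial failures and discovery, which is where most of the difficulty described above comes from.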
What are some of the problems it’s helping to clean up?
Alex: High availability, or workload mobility. Actually, I never thought about this, but perhaps the reason clustering is so important for containers is because of workload mobility. What’s nice is some standardization around what a container is, so that different environments can run it in the same way.
Perhaps another nice feature is: I spin up some servers on Amazon and Google, and they run my application the same way, and the clustering is important, so I can shut down my Amazon servers and they migrate over to Google, or vice versa. Perhaps that’s why they’re coupled: because containers are about portability, and to have portability, you need more than just a packaging format. The way networking works, and the way clustering works, etc., need to be portable too.
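The “shut down my Amazon servers and they migrate over to Google” scenario can be sketched as a tiny scheduler. Everything here is an assumption made for illustration: the node names, the `provider/node` naming scheme, and the `schedule` and `drain` functions do not correspond to any real scheduler’s API.

```python
def least_loaded(placement, nodes):
    """Pick the node with the fewest containers, breaking ties by name."""
    return min(nodes, key=lambda n: (len(placement[n]), n))


def schedule(containers, nodes):
    """Spread containers across nodes, regardless of which provider hosts them."""
    if not nodes:
        raise ValueError("no nodes available")
    placement = {node: [] for node in nodes}
    for container in containers:
        placement[least_loaded(placement, nodes)].append(container)
    return placement


def drain(placement, provider):
    """Remove one provider's nodes and move their containers to the rest."""
    prefix = provider + "/"
    remaining = [n for n in placement if not n.startswith(prefix)]
    if not remaining:
        raise ValueError("draining would leave no nodes")
    moved = [c for n in placement if n.startswith(prefix) for c in placement[n]]
    new_placement = {n: list(placement[n]) for n in remaining}
    for container in moved:
        new_placement[least_loaded(new_placement, remaining)].append(container)
    return new_placement


nodes = ["amazon/a1", "amazon/a2", "google/g1", "google/g2"]
placement = schedule(["web-1", "web-2", "web-3", "db-1"], nodes)
after = drain(placement, "amazon")
print(after)
```

The sketch only works because every node is assumed to run a container the same way; that shared format, plus portable networking and clustering, is what makes the move possible.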
What new complexities are surfacing because of what these like-minded people are doing to clean up things?
Alex: Everything’s on the table again. We’re re-doing networking; we’re re-doing storage; we’re re-doing our databases. We’re fixing it all – that definitely adds a lot of complexity, because now you’re throwing out a lot. For example, MySQL isn’t well suited for this environment, so, “Yikes! Everything’s built on MySQL!”
So we’re re-doing network, re-doing storage, and throwing out a lot. That seems to be the real race right now, getting those things re-done, isn’t it?
Alex: What we all want in this — Docker, Rocket, CoreOS, Mesosphere, Red Hat — what we all want is for people to be successful with containers. We’re all trying to solve these problems in different ways so that a customer can be successful with containers.
We’re also seeing changes in the evolution of virtualization, such as lightweight VMs. VMware is talking about different types of approaches; there are different theories about how they see running containers. I’ve heard it described as containers running inside of a jail on top of a VM. How do you perceive the forces that are still fighting to build out this VM-based stack environment?
Alex: Instead of VMware, let’s use Amazon or Google as an example. You’re going to hit an API that says, “boot a container,” instead of booting a virtual machine. There will be some UI that has a bunch of lines on it – one for each container running, and so on. That’s where the current way of thinking will lead us. VMware is the same: where you boot a virtual machine right now, you’ll get a container running instead. Everything else (how the APIs are built, whether they’re running Linux or not) is implementation detail.
The point is: we’re now isolating things at an application or process level, instead of at a whole OS level. That’s how you think about deployment, and that’s the fundamental shift. That’s the difference between looking at that pane on the control panel versus the other one. You’ll have your application pane, and then you’ll have your OS pane, where you have different OSs running. That’s the shift that’s happening across all these guys, in various different ways. The way that they get there will be implementation details, whether that’s a Docker, or a Rocket, or whatever…