The New Stack Makers: Dreamhost Co-Founder Sage Weil on Ceph, The New Storage and Getting Back to Open Source

Aug 26th, 2014 6:50am

Sage Weil is a technologist who recently sold Inktank, a software-defined storage company, to Red Hat. In this interview, Weil talks about how he became a developer and shares insights into building an open source community.


Sage Weil Interview

Alex Williams: I’m here with Sage Weil, who is one of the co-founders of Inktank and one of the creators behind Ceph. I thought we’d talk a little bit about his life and what he’s doing now, and a little bit on the background of Ceph. Good to see you.

Sage: Good to be here. Thanks for having me.

We tried to do this interview once before, but maybe this time we’ll get it to work. Tell me a little bit about where you grew up and how you got interested in technology.

Sage: Sure. I grew up in Oregon, actually – southern Oregon and northern California. I first got involved in computers in school –

Where in southern Oregon?

Sage: Ashland, Oregon. My mother lived there and my father lived in northern California – Humboldt County – so I split my time between the two places. But it was in elementary school at some point that my babysitter started teaching me BASIC, and it all sort of spiraled out of control from there – Turbo Pascal, and then I eventually got a C compiler sometime in high school. Then sometime in the middle of high school, it was like ’94, there was a local ISP start-up in town and I got access to the Internet. I started working with them, writing Perl, which was the hotness of the time. It was very exciting.

…must have been one of the early providers.

Sage: Yeah, it was the first ISP in the area, so it was pretty exciting. That was when a fractional T-1 line was a big deal, 28.8 dial-up. My first big project was a site called WebRing (I guess this was in ’96) that let you link sites together into topical categories. That turned into one of those Internet things in the early days. It kept me pretty well occupied my first year of college just trying to keep all of the servers up.

What were the servers that you were keeping up?

Sage: It was originally running on FreeBSD and then at some point somebody talked me into getting a SPARC clone or something that we used – it was Linux. Eventually we had rack-mount servers in San Jose, in the building that used to have MAE-West when that was a thing, but eventually we sold that and went back to doing normal school stuff.

Where was that?

Sage: That was Harvey Mudd College. It’s a southern California tech college.

A lot of interesting engineers have come out of Harvey Mudd – some of the GitHub founders, at least one of them…

Sage: Interesting, I didn’t know that. It’s a really fun school. It’s a hard-core tech engineering school, but it’s nice because it’s co-located with the other Claremont Colleges, so you’re not just surrounded by scientists and computer people. That helped with the whole thing. During college I teamed up with a couple people and co-founded a web hosting company called DreamHost that we worked on through college and continued to work on afterwards. It’s still around today; the company’s doing pretty well. That was a slightly different experience in building a company that’s completely bootstrapped. We never took any external funding. It’s all owned by the founders and the employees; creating a different sort of tech company was a nice experience.

What did you find were your real interests when you got into college, in terms of where you see yourself today, that you draw a path back to?

Sage: I was always interested in building complicated distributed services that are on the Internet, pulling lots of machines together and having them build a higher-level service. A lot of my work with WebRing, of course – it’s how do you orchestrate all of these websites and build a service that lets you link them together. With DreamHost it was around building all of the back-end infrastructure that automates the deployment and provisioning of hundreds of hosting machines; it’s all fully-automated. That interest area in distributed systems in general led me to get involved in storage at UC Santa Cruz when I went to grad school in… I guess it was 2003 or 2004.

WebRing was an Internet-scale business.

Sage: At the time, yeah. I forget how many users it had, but it kept the servers pretty busy. The traffic levels that we saw then are nothing compared to what Internet-scale means today, but we definitely spent a lot of time just trying to make it work and go fast. That eventually became Yahoo!’s problem after they acquired all of the technology.

Then DreamHost became an extension of that interest in distributed systems.

Sage: Yeah, I had a lot of fun building infrastructure that drove that. Eventually there was too much coding in Perl and automating system administration tasks; I wanted to do something a little bit more exciting, and that led me to go back to grad school at UC Santa Cruz and get involved in their storage research group. They had a couple of grants from the Department of Energy – Los Alamos National Labs, Lawrence Livermore, Sandia – specifically to research petabyte-scale file systems for their big supercomputers: tens or hundreds of thousands of nodes in these big machines, all opening files in the same directory and shoveling data into the system, making that scale out, and looking at the object-based paradigm for storage devices at that time. Originally, I became involved in that project looking at the scalable metadata problem – how you deal with the metadata servers that keep track of the file system namespace – files and directories and inodes and creations and readdir and all of that stuff – and how you make that scale out across lots of nodes. At the time, the labs were deploying Lustre, which was the current HPC open source thing that everybody was using, but it still had a single metadata server and that was sort of the weak link for these big machines. Not being able to scale that particular part of the system was killing performance. I focused on figuring out how to do that.

Over the course of that project I worked with all of the other people in the research group who had great ideas on how to distribute data – hash-placement and building specialized file systems for managing individual disks – we pulled all of those ideas together into a coherent distributed file system that we named Ceph.

What is it about storage that you found interesting?

Sage: Originally, it didn’t sound interesting at all. I specifically remember this conversation with my adviser-to-be where he was trying to sell me on the program. I think what he said was really true: you have this interesting combination of a lot of different core areas of computer science. It’s reasoning about distributed systems – the intersection of networking, performance, optimization and data structures – and all of these core areas of computer science, which are all very interesting, intersect in this boring-sounding thing called “storage.” The reality is that to make something that actually works and scales and behaves, you really need to solve all of those problems well. I find it’s a very interesting field.

So, how did you envision Ceph as an architecture?

Sage: …be careful not to say that it was all my idea, because it certainly wasn’t…

How did you and your group…

Sage: There were a couple of critical ideas that came out of that group and led to the genesis of Ceph. One piece of it was how to distribute management of the namespace, the metadata part, and that was mine. Another part of it was how to use hash-based placement for distributing the objects in a way that lets you make sense of it and migrate things after the fact. That’s where the CRUSH algorithm came from. That was based on the RUSH stuff that R.J. Honicky worked on within the group.

What is that?

Sage: The idea there is basically that instead of having to have some sort of index or metadata server that is keeping track of where all the data is stored, you simply have a deterministic algorithm that lets you calculate where it’s stored. RUSH had this great idea where you could do hash-based placement, and as you added storage it would amend the placement policy and move the right amount of data. There were a couple of different variants that were proposed…

That’s storage orchestration…

Sage: Yeah, but it was all very constrained with specific growth patterns. We generalized that to be something called CRUSH that takes all of the little low-level hashing tricks and puts them together in a hierarchical framework that lets you write generic policies about how to distribute things. I added a part of that – one of those policies, called “straw placement” – that had the best of both worlds in the trade-off between computation and rebalancing. Wrapping that up in a nice bow, and having a general policy for figuring out how to store data, is sort of the key enabler that allows the system to scale really well. You can have thousands of nodes and the clients can have a small little bit of metadata. That’s enough to calculate where any object either is stored or would be stored when it’s created. That’s the key enabling technology. The other piece is the general strategy for doing what’s called “de-clustered replication.” Instead of having the data that’s on one disk mirrored on one other disk, it’s actually broken into chunks – subsets of data that we call “placement groups.” Those are replicated randomly across different disks, so if you lose one node in the system or one disk in the system, the additional copies of that data are actually spread across maybe a hundred other nodes, and so you have this massive parallel recovery that happens.
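
To make the placement idea concrete, here is a small, purely illustrative Python sketch of deterministic, lookup-free placement. It is not the real CRUSH implementation – it ignores the hierarchy, device weights and the straw placement Weil describes – and every name in it (the OSD list, the object name, the pool parameters) is hypothetical. What it does show is that any client holding the same cluster description computes the same answer, with no metadata server in the loop.

```python
import hashlib

def stable_hash(value):
    # Deterministic hash; Python's built-in hash() is salted per process,
    # so every client has to use something stable like MD5 instead.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def place(obj_name, osds, num_pgs=128, replicas=3):
    """Map an object to a placement group, then the PG to `replicas` distinct OSDs.

    The placement is calculated, not looked up: a client only needs the OSD
    list and the pool parameters to find (or choose) where an object lives.
    """
    pg = stable_hash(obj_name) % num_pgs
    # Rank every OSD by a hash of (pg, osd) and keep the top `replicas`.
    # This is a toy stand-in for CRUSH's hierarchy- and weight-aware rules;
    # because each PG ranks the OSDs differently, the copies held by one
    # failed disk end up scattered across many others, which is what makes
    # the "de-clustered" parallel recovery possible.
    ranked = sorted(osds, key=lambda osd: stable_hash("%d:%s" % (pg, osd)), reverse=True)
    return ranked[:replicas]

if __name__ == "__main__":
    osds = ["osd.%d" % i for i in range(12)]   # hypothetical 12-disk cluster
    print(place("my-object", osds))            # same result on every client
```

In a scheme like this, a disk added to the list wins the ranking for only some placement groups, so only that fraction of the data has to move – the controlled-rebalancing property that RUSH introduced and CRUSH generalizes.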

That’s become a real method in lots of different technologies.

Sage: A lot of it is the same bag of tricks that you see come up in a lot of distributed systems. Hash-based placement isn’t new – there’s something called “consistent hashing”…

You see newer things doing that. I mean, you guys were developing it in the earlier days and now it’s almost a common practice, it seems.

Sage: Yeah, definitely. It’s sort of the state of the art for how people build these distributed systems now. Now people call it “sharding,” and hash-based placement is hash-based placement, and so forth. The key thing with Ceph that we did over the next several years – which I’m sure other systems have replicated, though you don’t see it in the open source space much – is figuring out how to provide a sharded object store (that we called RADOS) where the data is stored across lots of different nodes; that gives you strong read-after-write consistency, which is required for doing file system semantics or block devices; that handles all of the complexity around what happens when nodes go down and come up and data migrates – you get these complicated state changes because it’s an inherently dynamic system, with switches going on and off and all the rest of it; and that manages to maintain those consistency guarantees without introducing scalability bottlenecks, where everybody has to go back to one server to ask where data is stored.
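
As a rough illustration of what building on top of RADOS looks like from an application’s point of view, here is a minimal sketch using the Python librados binding. The pool name, object name and configuration path are placeholders, and error handling is omitted; it simply shows a client writing an object into the cluster and reading it straight back, relying on that read-after-write consistency.

```python
import rados  # python-rados, the librados binding shipped with Ceph

# Assumed defaults: a reachable cluster, a local ceph.conf, and a pool named
# "demo-pool" that already exists. All of these are placeholders.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("demo-pool")
    try:
        ioctx.write_full("greeting", b"hello ceph")   # replace the whole object
        print(ioctx.read("greeting"))                 # read-after-write: b'hello ceph'
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```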

Once we had that base, scalable, elastic, low-level object storage service, we could build lots of stuff on top. There’s the distributed file system service that we started with. We later built a block device that’s used heavily in the cloud space now. There’s the RADOS gateway that gives you a high-level RESTful object service that’s compatible with Amazon S3 and OpenStack Swift to give you those sorts of semantics. We have customers who are building entirely new stuff. They’re building Dropbox-type stuff and they’re using that low-level object service to persist all of their data.
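
Because the RADOS gateway exposes an S3-compatible API, a stock S3 client can talk to a Ceph cluster with nothing changed but the endpoint and credentials. A minimal sketch with boto3 might look like the following; the gateway URL, keys and bucket name are placeholders.

```python
import boto3  # any S3-compatible client works; boto3 is just one example

s3 = boto3.client(
    "s3",
    endpoint_url="http://radosgw.example.com:7480",   # hypothetical RADOS gateway endpoint
    aws_access_key_id="ACCESS_KEY",                   # credentials issued by the gateway
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"stored in Ceph")
obj = s3.get_object(Bucket="demo-bucket", Key="hello.txt")
print(obj["Body"].read())   # b'stored in Ceph'
```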

Have you always been involved in open source projects? Or did open source projects not really have any kind of name associated with them when you were starting to develop technology?

Sage: I’ve always – well, not always – it’s been an ongoing evolution, my involvement with open source. In my early days I used FreeBSD and Linux and so forth, so I was always a big consumer of open source. I didn’t actually write very much of it until much more recently. In fact, Ceph is the first real open source project that I’ve done. That was one of the things that made it a really fulfilling project for me. The general issue that I had was, you know, coming out of grad school, you see all of your peers making all of these amazing projects and designing these really cool systems with totally unique and different designs, but the vast majority of them, when they graduate, need to go get a job. The only jobs out there are working for the likes of EMC and NetApp and Data Domain, and all of those guys. So, a lot of these great ideas exist long enough for the student to write a paper and publish some results, and maybe the code is sitting around in some repository and is “open,” but it sort of dies on the vine. All of the talent seems to be continuing to go into the proprietary vendors.

…who can pay a lot of money…

Sage: Yeah. They pay really well. I mean they’re great jobs, right? You’re building cool systems. The thing that frustrated me when we were designing Ceph – especially coming out of DreamHost, where we had our own infrastructure problems; we were a NetApp customer and it was just so expensive, and there were no open source alternatives for a scalable, reliable storage system that could actually back an enterprise-type operation – was that it seemed like a huge gap. I had the luxury, having DreamHost and WebRing before, of not needing a job right away, so I could continue working on Ceph – this lofty goal of making the first open source, enterprise-scalable distributed file system that was going to change the world. I naively thought that as soon as I posted it online and announced my plans, everybody would say, “yes, this is great!” and start sending patches, and it would be smooth sailing from there. It turns out that building community is actually a very long and difficult process.

So, we built a small team of engineers inside DreamHost that worked on Ceph. Over the next four or five years it grew to maybe ten to twelve people. We worked on things like the block device, the REST gateway, authentication, general robustness – taking it from a prototype that you could run in careful scenarios to something that was more real – building up QA processes and so forth. Then, around 2011, we had this realization that Ceph was sort of a big deal, but it was very hard for organizations to really use it. There was no company to support it. It was difficult for DreamHost to invest adequately in the engineering effort that was required to turn it into something robust and usable. At that point we decided to spin out a new company that would focus specifically on Ceph – building up the technology and engineering. Co-founder Bryan Bogensberger, who was at DreamHost at the time, put together the business plan and worked with Simon Anderson, the DreamHost CEO, to make this happen at the beginning of 2012. I guess the rest is history. Over the last couple of years we’ve built a team of fifty people, all of the professional services and support that we needed to support real customers, and the sales and marketing and branding to grow up into a real company – not just a project but an actual product.

Did the building of the community in its early days help you?

Sage: Yes. It was a long road, that phase when we were working as part of DreamHost. A lot of people heard about Ceph and thought it was really great and wanted to get engaged. We had some developer contributors and a lot of people who did some testing. One of the early milestones was getting a native kernel client implementation into the Linux kernel in 2010. That raised the profile a lot. I began engaging with the kernel community and started attending the Linux storage and file system workshops and working with those people. One of the things that gave Ceph credibility when we were doing Inktank was that we had a community already. This wasn’t a fresh project along the lines of “we have this great idea, we should build this software-defined storage system and we’re going to go write some code now.” The code was written and validated from an architectural and an academic perspective. It was really just about hardening it and turning it into a product, and then building the business relationships so that organizations could either adopt it or engage as partners to push the technology forward.

So, going forward, how do you view open source generally? Ceph’s remaining an open source project, but overall if you’re looking forward into the future, how do you see the general open source community evolving, in terms of the context that you have?

Sage: I see open source as a transformative force within the industry, within computing in general and even non-computing areas, as the same ideas could apply to other sorts of things. Storage in particular is ripe for disruption. Twenty years ago we saw Linux totally transform the server industry. We went from a time where you had SGI and DEC and Sun shipping their Unix variants that were all fully proprietary, along with proprietary hardware – so you’d sort of buy the full appliance, if you will, for servers. That’s completely changed now. We’ve commoditized the server market and completely opened up the industry, but the same thing hasn’t happened with storage yet. The goal for the Ceph team and other people who are working on software-defined storage in the open source space is to do the same thing with storage. There will always be people who sell turnkey solutions, but there needs to be a separation of the software that you want to run from the hardware that you run it on, so you can build your own systems and really open up that infrastructure.

…and make it more affordable, too.

Sage: Exactly, bring down the prices and move to a more efficient development model where you don’t have all of these siloed organizations building the same thing over and over again.

That way all of your friends can come back and design the cool things that they made at Harvey Mudd.

Sage: Yeah, yeah. One of the things I’d really like to see, in general and in the academic space, is making it easier for students, either undergrads or graduates, to move from building systems in the academic space into the open source world, and to bring those ideas forward as opposed to taking them to companies. One of the things to watch out for is that a lot of open source projects today are heavily dominated by corporations – corporations trying to control the projects, and all the marketing money that surrounds them. It’s good in the sense that you have a lot of resources, but it’s also dangerous in that the interests of the corporations, who are trying to use these projects to build and sell product, aren’t always in line with the freedoms of the user or the greater good of the industry. So, it’s something to watch out for.

Sage, thank you very much for taking some time to talk with us.

Sage: Thanks for having me. It’s been good.

Great. Thanks.
