Malte Schwarzkopf — currently finishing his PhD on “operating system support for warehouse-scale computing” at the University of Cambridge — has released a series of slides describing some of his research into large-scale, distributed data architectures.
Schwarzkopf and his team at Cambridge Systems at Scale are aiming to build the next generation of software systems for large-scale data centers. So it has been essential for him to understand how some of the current data giants are configuring their full stack at present, in order to build software for the next wave of businesses that grow with a need to work at a similar scale. Along the way, he has contributed to a number of open source projects including DIOS (a distributed operating system for warehouse-scale data centers that uses an API based on distributed objects); Firmament (a configurable cluster scheduler that looks to apply optimization analysis over a flow network); Musketeer (a workflow manager for big data analytics); and QJump (a network architecture that reduces network interference and provides latency messaging).
Schwarzkopf’s slide deck builds on his extensive bibliography into the Google stack.
His research finds that warehouse-scale computing (defined at 10,000-plus machines) requires a different software stack, all aiming to help increase the utilization of many-core machines, and allow fast, incremental stream processing and approximate analytics (like that offered by BlinkDB) on large datasets. (Many-core is a term meant to indicate a level of magnitude greater than multi-core.)
Schwarzkopf’s research spells out the three main characteristics that many of the largest data-driven companies like Microsoft, Twitter and Yahoo have in common with Google and Facebook:
- “Frontend serving systems and fast backends.
- Batch data processing systems.
- Multi-tier structured/unstructured storage hierarchy.
- Coordination system and cluster scheduler.”
In his presentation, “What does it take to make Google work at scale?” Schwarzkopf discusses the architecture behind those 139 microseconds between submitting a search request in the Google input bar, and the pages of ads-and-search results that are returned.
All of what happens, Schwarzkopf says, takes place in containers between customized Linux kernels on each data machine and the transparent layer of distributed systems.
He identifies 16 different software technologies that work in tandem to return the real-time, contextual, personalized search results that users expect from Google.
- GFS/Colossus: a bulk block data storage system.
- Big Table: a three dimensional key-value store that combines row and column keys with a timestamp.
- Spanner: Software that uses the GPS and atomic clocks within data centers to enable transactional consistency at a global scale.
- MapReduce: a parallel programming framework.
- Dremel: a column-oriented datastore useful for quick, interactive queries.
- Borg/Omega: the father of Kubernetes, a cluster manager and scheduler for large-scale, distributed data center architecture.
It’s unclear where Schwarzkopf may have presented this work so far: his bio page and Twitter feed don’t indicate that the slides were released in conjunction with any particular talk, it was just provided in a link from a tweet dated August 17. While high-level, the presentation slides are clear enough to provide useful insights into the infrastructure map needed to make distributed architecture work at scale, and there are enough links and resources mentioned that anyone working in the area has plenty of interesting rabbit holes to wander through in late-night research or post-lunch procrastination.