The History of the New Stack: Scale-Out Architecture

Scaling out, as opposed to up, means adding more power to an application by adding more machines, instead of upgrading a machine by adding a faster CPU or more memory. You can think of it as an argument for quantity over quality. Strictly speaking, scale-out computing would include things like pre-built clusters and supercomputers that include multiple system boards designed to work together. But when people talk about scale-out, they tend to mean building clusters of cheap, commodity hardware.
There are several advantages to this approach.
First, such clusters are redundant: an application won’t go offline just because one hard drive fails. Second, they can be scaled incrementally. That is, you can add additional servers as you need them. Third, they tend to be more cost effective, though there’s still some debate about exactly what scale you need to reach before scaling out becomes cheaper than scaling up. Scale-out architectures were popularized by Amazon and Google during the 2000s, but the idea actually goes back to the early days of commercial computing.
1955-1965: The First Cluster
DEC coined the term “cluster” in the early 1980s when it introduced its VAXCluster line. But Greg Pfister, author of In Search of Clusters, says clusters were most likely invented in the early 1960s or even the late 1950s. No one knows exactly who built the first clusters, Pfister says, but he has a pretty good idea of why: “They were invented by the first customer who looked at one of these expensive [mainframes] and thought: ‘If this thing goes down, my business crashes. We need a backup.’”
Those customers would have quickly realized that keeping the backup computer sitting around doing nothing was wasteful, and decided to combine the power of both machines. Those two-system clusters are a far cry from today’s thousand-node Hadoop clusters, but every idea has to start somewhere.
1983: The Cosmic Cube
Although there were experiments with parallel computing throughout the 1960s and ’70s, the first example of what we might today call scale-out may have been the Cosmic Cube, built by Caltech researchers Charles Seitz and Geoffrey C. Fox.
Seitz and Fox began building the Cosmic Cube in 1981 and completed it in 1983. It consisted of 64 separate x86 system boards and processors, the same components that IBM was using for desktop PCs in those days. By comparison, the Cray X-MP, one of the most powerful supercomputers of its day, had only four processors, according to Extreme Tech.
The Cosmic Cube was only about 10 percent as powerful as the single-processor Cray 1, according to an Engineering and Science article published in 1984. But at $80,000, the Cosmic Cube cost only about one percent of what a Cray 1 cost. “The Cosmic Cube provided a dramatic demonstration that multicomputers could be built quickly, cheaply, and reliably,” Geoffrey C. Fox, Roy D. Williams, and Paul C. Messina wrote in their book Parallel Computing Works.
“Its performance was low by today’s standards, but it was still between five and ten times the performance of a DEC VAX 11/780, which was the system of choice for academic computer departments and research groups in that time period,” they wrote. “The manufacturing cost of the system was $80,000, which at that time was about half the cost of a VAX with a modest configuration.”
The Cosmic Cube inspired many other cheap, bespoke supercomputers, such as Caltech’s Mark II and Mark III systems. But the concept was eclipsed by the supercomputing industry: companies like Cray and Fujitsu were building machines with thousands of processors, and companies like IBM were selling clusters such as the Parallel Sysplex. Still, the idea of scaling out commodity hardware wouldn’t stay forgotten for long.
1993: Beowulf Brings Open Source Clustering to the Masses
In 1993, a computer scientist and early Linux developer named Donald Becker realized that desktop computers actually offered better performance for the money than supercomputers.
“Previously, you would find the best price-performance [ratio] at the very high end, with the supercomputers,” he told GCN in 2005. “But that was becoming increasingly less true. Workstations had the lead for a while, but clearly PCs were starting to offer the best price-performance.”
Becker was working for the Institute for Defense Analyses’ Supercomputing Research Center, doing research for the National Security Agency at the time, but his employers weren’t interested in the idea of scale-out computing. Fortunately, Becker and his collaborator Thomas Sterling managed to find funding through NASA. There, they created a 16-node cluster of off-the-shelf PCs running Linux. Sterling named it Beowulf, after the protagonist of the epic poem, who was said to have had the strength of 1,000 men. Most importantly, they open sourced the software that united the PCs into a single system, enabling researchers all over the world to build their own cheap supercomputers.
2003: Google Reveals Its Scale-Out Secrets to the World
At first, commodity clusters were used mostly by university researchers and hobbyists. But somewhere along the way, engineers at companies like Amazon and Google realized that it made more sense to scale out their web server operations than to scale them up. Because Beowulf was never designed for serving web pages or managing databases, however, Amazon and Google had to build their own software. And though the companies didn’t open source their tools, both published white papers explaining their methods.
First came the Google File System paper in 2003, which detailed the workings of the company’s distributed file system. The company followed with the MapReduce paper in 2004, which explained its approach to breaking down large problems into smaller pieces, farming those pieces out to a cluster, and then compiling the results into a single answer. In 2006, it published a paper on its distributed database, BigTable.
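To make that pattern concrete, here is a minimal, single-machine sketch of the map/shuffle/reduce idea in Python, using the classic word-count example. It is only an illustration of the shape of the technique the paper describes, not Google’s or Hadoop’s actual API; the function names and the in-process shuffle are invented for this sketch.

```python
from collections import defaultdict

def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(mapped_pairs):
    """Shuffle step: group all emitted values by key, as the framework
    would do before handing each key to a reducer."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)

if __name__ == "__main__":
    documents = ["the quick brown fox", "the lazy dog", "the fox"]

    # In a real cluster the map tasks would run on many machines in
    # parallel; here they simply run one after another in one process.
    mapped = (pair for doc in documents for pair in map_words(doc))
    grouped = shuffle(mapped)
    results = dict(reduce_counts(k, v) for k, v in grouped.items())

    print(results)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

In a real MapReduce system the map and reduce calls are spread across the cluster and the shuffle moves intermediate data between machines; the sketch only shows how the three steps fit together.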
Amazon followed suit in 2007 with its Dynamo paper, which explained how it scaled data storage out across thousands of servers. Those papers led to a slew of imitators. In 2005, Doug Cutting and Mike Cafarella began building Hadoop, based on both the MapReduce and Google File System papers. Powerset built HBase, a BigTable clone, in 2007. In 2008, Facebook released Cassandra, a data store based on both the Dynamo paper and the BigTable paper. All of this activity culminated in the first NoSQL meetup in 2009, by which time the idea of scale-out data storage was cemented in the minds of forward-thinking developers.
Today
Today, scale-out is no longer an edgy idea. But not everyone thinks it’s the best approach. Last year Microsoft Research published a paper arguing that for all but the largest workloads, scaling up is actually more cost effective than scaling out, and that even companies like Facebook and Yahoo use clusters to solve problems that would be more effectively solved on a single server.
Meanwhile, a company called ProfitBricks has bucked the trend by offering customers the ability to both scale out and scale up.
“The founders of ProfitBricks have a pretty good idea of what people actually use dedicated machines for, what their CPU and memory requirements are,” says the company’s VP of marketing William Toll. “One of the challenges of vertical scaling in traditional databases, like MySQL, is that they can be very memory hungry.”
For those types of legacy applications, a single, powerful machine is usually better than several smaller systems. Nonetheless, scale-out remains a core part of today’s infrastructures. “I think without question modern applications are designed around cloud technologies,” he says. “Cloud technologies tend to mean access to a crazy number of virtual machines so it’s easy to scale-out.”
Image by seier+seier