
What’s Preventing Disaggregation in Networking?

28 Mar 2019 3:00am, by

Adam Casella
Adam Casella co-founded SnapRoute in August 2015, where he is responsible for setting technical direction and ensuring the operator’s perspective is always seen. Adam’s background of supporting networking products as a vendor, paired with his operational experience running large-scale data center networks, gives him a unique perspective on how to build reliable, resilient, and easy-to-use products. Adam is an authority on both disaggregated and traditional networking technologies, especially their use in hyperscale spine/leaf Clos designs and topologies. Prior to founding SnapRoute, Adam was responsible for designing and building hyperscale data center networks at Apple. Before Apple, Adam was a lead engineer in Cisco’s TAC on the LAN and Data Center Switching teams, giving him deep insight into silicon pipelines, hardware and software architecture, and a strong background in debugging and troubleshooting complex, multifaceted technical issues.

It’s a common adage that white box and disaggregated networking will never supplant traditional OEM vendors, who offer tightly coupled hardware and software. This folklore states that uptime, reliability, and SLAs can only be delivered with a network appliance and a tightly coupled hardware/software stack. It’s believed that disaggregation makes the system less reliable, more complex, and harder to manage, and that it will end up costing nearly the same as, or sometimes more than, traditional offerings.

But what is really at the root of these concerns? At its core, this is an expression of the trust that businesses place in the software and hardware that make up their infrastructure. Most network operators are risk averse, even when there are clear benefits to changing the way network hardware and software are procured.

However, there is a clear distinction between how cloud and service providers are benefiting from disaggregation in contrast to the rest of the market. Hyperscalers — companies such as Facebook, Google and Microsoft — have been reaping the benefits of disaggregation for years. As a result, traditional OEM vendors have been forced to respond by augmenting their product suite.

The benefit of disaggregation isn’t felt by everyone: these offerings are typically built at the behest of hyperscalers and aren’t meant for the general public. Others in the industry have been left to figure out disaggregation on their own, which makes it difficult to see a clear line from where their businesses are today to where they need to be. Add to this that any anticipated savings from white box hardware have been eaten up by network operating system (NOS) software and support, and the promise of cost savings via disaggregation has proven to be a fallacy. This has put operators in the tough position of choosing between keeping the lights on and trying to accommodate the needs of modern applications.

So How Do We Move Beyond this Impasse?

The first step is to move networking away from the legacy model (both technically and financially) that OEMs have been using for over 30 years. Disaggregation, thus far, has failed because white box NOS vendors have simply attempted to reimplement the legacy OEM model, but with a separation of hardware and software. This thinking is misguided and haphazard, similar to that of organizations that “forklift” applications from on-premises into the public cloud. That kind of move leads to skyrocketing costs, ill-suited management toolsets, and a lack of visibility into application performance, all common pitfalls seen when care is not taken during the shift to the cloud.

Analysts at Gartner have developed a well-defined strategy, called the five “R”s, that guides decision makers through the cloud migration process to prevent the baggage of legacy environments from encrusting the cloud. With so much thought and care put into these strategies, why has networking not learned from them? We know that “forklifting” an outdated architecture from one environment to another is rarely the best strategy, so why has disaggregated networking not applied these principles? The legacy model used to build network operating systems does not work for disaggregated networking and does not fit into the cloud native environments of today.

So What Is the Legacy Networking Model?

Proprietary, closed, monolithic binary blobs: that is the legacy networking model, one enjoyed by the networking industry for over 30 years. While applications have advanced from bare metal to VMs and finally to cloud native containers, networks have stagnated, with OEMs supplying systems and tools to build and manage infrastructure in an increasingly old-fashioned way. Network operators have grown accustomed to closed, proprietary networks that are managed manually and don’t automate Day 2 operations.

This has naturally resulted in application owners who are not able to work with the network they have and must work around it instead. We saw this in the early adoption of SDN, when overlays were run over the top of legacy network infrastructure. When that wasn’t enough, application teams bypassed on-premises infrastructure entirely and moved to the public cloud.

This outsourcing of infrastructure has become necessary for many businesses because the legacy model of networking does not supply the flexibility, uptime and reliability they require. We live in a global 24/7 marketplace where any loss in availability of an application can have long-term cascading effects on the business. We know that hyperscalers and public cloud providers have moved to a more agile method of managing their networks and, as such, are using disaggregated networking heavily. This is one of the many reasons why they can deliver the infrastructure modern applications need. Organizations don’t have to build this themselves; they can simply leverage the work already put into this effort by the public cloud providers. However, it comes at a very high price: just look at what Lyft is spending for AWS.

When you look at what hyperscalers and public clouds supply, it is clear they do not treat infrastructure as separate silos. They’ve created a platform that is designed to deliver faster time-to-service for applications, with a solution for layers two through seven. DevOps teams can now deploy apps without thinking about the underlying infrastructure beyond an API call. This is the same methodology used by the CNCF and others embracing cloud native.

With the apparent long-term adoption of multicloud and hybrid-cloud approaches, it is clear that cloud native methods are at the forefront. Operators now see the public cloud as an extension of their on-prem environment, not a replacement for it. For white box networking to gain traction, cloud native principles must be embraced rather than ignored. The cost equation must radically change as well. Industry players have long promised that white box/disaggregation adoption would generate significant cost savings. Unfortunately, this has not been the case, and it has been a major contributor to the lack of broader market adoption. Network operating system vendors have created this dynamic by absorbing most of the white box hardware savings via software licensing and support.

Operators can’t continue to manage their on-prem and cloud infrastructures differently and expect to provide the level of service that their customers demand. The competition is too fierce and the infrastructure is too large and complex. White box adoption will continue to flounder until it embraces the cloud native methodology and delivers significantly lower costs. The legacy networking model just won’t cut it.

The networking industry’s innovation and cost paradigm, in place for decades, needs to be broken. A cloud native approach to networking that marries network infrastructure and modern applications would allow on-prem deployments to be unified under Kubernetes orchestration.

Without this bridge between cloud native methods and the network, even considering the potential for up to 50 percent cost savings versus today’s legacy systems, we won’t see mass adoption of white box. The legacy networking model is dead, both technically and financially. Cloud native networking is the future, along with its improved and more efficient operating model. The only way to gain adoption of white box is to fundamentally change how the business and operations of the network are run.

CNCF is a sponsor of The New Stack.

Feature image by TanteTati from Pixabay.
