What if network operations were as agile as DevOps and could use CI/CD methods as easily?
Adam Casella and Glenn Sullivan, former network engineers for the Apple data centers that drive iCloud, iTunes and other customer-facing applications, have taken on that challenge. Their company, SnapRoute, has released the Cloud Native Network Operating System (CN-NOS).
It’s built on a containerized, microservices architecture to provide enhanced network agility and integrate networking natively into DevOps environments.
Traditional vendors have sold networking equipment (hardware and software) that comes as a monolithic blob that can’t be decoupled, Sullivan explained. As in the server world, there’s a trend toward disaggregated, or white-box, networking, which allows users to write their own software on top of the hardware.
Other people are now writing networking software, but it’s more a reimagining of what’s available from traditional vendors.
“No one has really done it from an operator’s perspective,” Sullivan said.
Hyperscalers — Google, Microsoft, Facebook — can write their own routing stack and protocols because they have thousands of developers to deal with the scale issues no one else has.
“[Others] can’t build network software themselves. So we’re building a network OS that has all this cloud native goodness built into it so they can use some cloud-native tools like Kubernetes natively,” he said.
He calls it an OS like any other, but with the operator experience built in through native support for cloud native tools.
“We’ve fully containerized the OS using Docker containers to create a system of microservices. The other systems out there are going to be monolithic blobs. It has all the features built into it. You can turn features on and off, but you get the whole kitchen sink. It’s daemonized: you can start and stop daemons and things like that, … but it doesn’t give you a mechanism for pulling out pieces of this network OS and upgrading them,” he said.
CN-NOS is API-driven, with the CLI itself an application that uses the APIs. This enables the integration of DevOps tools, in-house tooling or custom applications. There are three main integration points:
- REST API layer (Kubernetes API)
- CLI interface that won’t require network engineers to learn Kubernetes
- Streaming telemetry layer
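The idea of the CLI as just another API client can be sketched roughly as follows. This is a minimal illustration, not SnapRoute's code: the command grammar, the API group `network.example.invalid` and the resource paths are all invented for the example, since the article does not give the real ones.

```python
import shlex

# Hypothetical API prefix; the real CN-NOS API group and paths are not
# given in the article.
API_PREFIX = "/apis/network.example.invalid/v1"


def cli_to_request(command):
    """Translate one CLI line into an (HTTP method, path, body) tuple.

    The CLI holds no device logic of its own: every command becomes a
    request against the same API that other tools would call.
    """
    words = shlex.split(command)
    if len(words) == 3 and words[:2] == ["show", "interface"]:
        return ("GET", f"{API_PREFIX}/interfaces/{words[2]}", None)
    if len(words) == 5 and words[:2] == ["set", "interface"] and words[3] == "mtu":
        body = {"spec": {"mtu": int(words[4])}}
        return ("PATCH", f"{API_PREFIX}/interfaces/{words[2]}", body)
    raise ValueError(f"unsupported command: {command!r}")


if __name__ == "__main__":
    print(cli_to_request("show interface eth0"))
    print(cli_to_request("set interface eth0 mtu 9000"))
```

Because the translation is the only thing the CLI does, a script or CI job could issue the same requests directly, while a network engineer keeps a familiar command line.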
It’s built on Yocto, an open source project for building slim, custom Linux distributions for embedded systems. The Yocto Linux base handles the installation, launch and running of containers.
“We’ve kind of stayed out of the religious discussion of Debian vs. Red Hat,” Sullivan said.
“The problem with network devices is they have very specialized hardware. They’re not generic like servers. Network devices have hardware that allows them to forward packets at line rate. … It’s specialized hardware that’s very tightly controlled for initialization and running. …You can’t just take a version of Linux off the shelf and make it work. You really have to have a network operating system that understands that hardware, abstracts it away and gives you an interface to manage that hardware.”
It’s not just another Linux server with a bunch of interfaces, he said.
“Network devices are special because they have a lot of stuff downstream. … [If you want to update something], you can’t do that because you’ll take down all the downstream devices attached to it. You have to treat networking differently because the models and paradigms affect more than the device you’re on.
“We don’t have a separate plug-in or bolt-on to manage the devices to use the cloud-native tools. … You can use the Kubernetes language directly. … We have adapted all our APIs into industry-standard Kubernetes to use Kubernetes to control the network directly.”
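Using “the Kubernetes language directly” would presumably mean describing network state as Kubernetes-style resource objects. The sketch below builds one as plain data. The `apiVersion`, `kind`, `metadata` and `spec` fields are the standard shape of a Kubernetes object; the API group, the `Interface` kind and every field under `spec` are assumptions made for illustration, not CN-NOS's actual schema.

```python
import json


def interface_manifest(name, mtu, admin_up=True):
    """Build a Kubernetes-style object describing one switch port."""
    return {
        "apiVersion": "network.example.invalid/v1",  # hypothetical group/version
        "kind": "Interface",  # hypothetical kind
        "metadata": {"name": name},
        "spec": {  # hypothetical fields
            "mtu": mtu,
            "adminState": "up" if admin_up else "down",
        },
    }


if __name__ == "__main__":
    # Serialized, this is the kind of document an operator could keep in
    # version control and submit through the API.
    print(json.dumps(interface_manifest("eth0", 9000), indent=2))
```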
On top of the Yocto base, there are infrastructure resource controllers, such as the data plane abstraction (ASICRC), interface resource controller and adjacency manager. The host controller and process manager resource controller handle dependency management and the installation of services, and ensure coherence between services and the system. The platform resource controller handles the discovery, monitoring, and management of system peripherals for hardware on the switch.
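Dependency management of this kind usually comes down to starting services in an order that respects what each one needs. The sketch below shows the general technique with a topological sort from Python's standard library (`graphlib`, Python 3.9 or later). The service names echo the controllers named above, but the dependency relationships between them are assumed for the example; the article does not describe them.

```python
from graphlib import TopologicalSorter

# service -> set of services it depends on (assumed relationships)
DEPENDS_ON = {
    "platform-rc": set(),
    "asic-rc": {"platform-rc"},
    "interface-rc": {"asic-rc"},
    "adjacency-manager": {"interface-rc"},
}


def install_order(depends_on):
    """Return the services in an order where dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for service in install_order(DEPENDS_ON):
        print("start", service)
```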
The L2/L3 protocols and associated control-plane resource controllers are containerized. CN-NOS uses the Metaswitch L3 protocol suite, but it’s separated into multiple Kubernetes-native controllers that can be managed independently. The infrastructure and L2 functionalities were built in-house as Kubernetes controllers as well.
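Kubernetes controllers generally follow a reconcile pattern: compare the desired state with what is observed and act on the difference. A minimal sketch of that pattern, applied to a made-up table of BGP neighbors, might look like this. The data layout and action names are illustrative assumptions, not CN-NOS internals.

```python
def reconcile(desired, observed):
    """Return the actions needed to move observed state toward desired state."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    # Anything running that is no longer declared gets removed.
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {
        "10.0.0.1": {"remote_as": 65001},
        "10.0.0.2": {"remote_as": 65002},
    }
    observed = {
        "10.0.0.2": {"remote_as": 65999},
        "10.0.0.3": {"remote_as": 65003},
    }
    print(reconcile(desired, observed))
    # [('create', '10.0.0.1'), ('update', '10.0.0.2'), ('delete', '10.0.0.3')]
```

Splitting the protocol suite into separate controllers means each one runs a loop like this over its own slice of state, which is what makes independent management possible.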
The containerized services running on CN-NOS are portable and can be updated, tested, and replaced independently. Using tools and processes like Helm and CI/CD test pipelines, operators can verify a change before deployment and roll back changes as needed.
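The verify-then-roll-back flow can be sketched in a few lines. This is an illustration of the general idea only: the `upgrade_service` function, the version strings and the health check are hypothetical, and a real pipeline would run its checks against a test environment or the device itself.

```python
def upgrade_service(running, service, new_version, health_check):
    """Try one service upgrade; keep the old version if the check fails.

    `running` maps service name -> version. Only the named service
    changes; every other service is left untouched.
    """
    previous = running[service]
    candidate = dict(running)
    candidate[service] = new_version
    if health_check(candidate):
        return candidate, "upgraded"
    return dict(running), f"rolled back to {previous}"


if __name__ == "__main__":
    running = {"bgp": "1.4.2", "lldp": "2.0.0"}

    # Hypothetical check: pretend bgp 1.5.0 fails its pipeline tests.
    def check(state):
        return state["bgp"] != "1.5.0"

    print(upgrade_service(running, "bgp", "1.4.3", check))
    print(upgrade_service(running, "bgp", "1.5.0", check))
```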
“We believe that network disaggregation and the empowerment of developers are essential for our ‘new enterprise’ customers,” said Ihab Tarazi, Chief Technology Officer at Packet. “With a completely cloud native architecture, SnapRoute’s CN-NOS provides the truly open, DevOps-driven product-development approach to infrastructure that we require as we scale to hundreds of locations.”
Increasing Agility, Security
The company maintains CN-NOS will increase agility and security by enabling operators to:
- Add and upgrade features and fixes in real time without downtime, eliminating the need for scheduled maintenance windows.
- Remove, not just disable, unused services to reduce security exposure and the threat surface of the network operating system.
- Assure compliance at any time with the ability to surgically replace only vulnerable services in real time.
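The last two points can be illustrated together. Given an inventory of installed services and a security advisory, only the services that are both present and older than the fixed version need replacing; a service that has been removed is not exposed at all. The inventory, the advisory format and the simple dotted version scheme below are assumptions for the sketch.

```python
def parse_version(text):
    """Turn a dotted version such as '1.4.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def services_to_replace(installed, advisory):
    """Return {service: fixed_version} for installed, vulnerable services.

    `installed` maps service -> running version.
    `advisory` maps service -> first version containing the fix.
    """
    return {
        name: fixed
        for name, fixed in advisory.items()
        if name in installed
        and parse_version(installed[name]) < parse_version(fixed)
    }


if __name__ == "__main__":
    installed = {"bgp": "1.4.2", "lldp": "2.0.0", "ospf": "3.1.0"}
    advisory = {"bgp": "1.4.3", "lldp": "1.9.0", "stp": "1.0.1"}
    # stp was removed from this device, and lldp is already past the fix.
    print(services_to_replace(installed, advisory))  # {'bgp': '1.4.3'}
```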
For decades, network operators have been in a situation like when Facebook decided all WhatsApp users had to use Facebook Messenger, Sullivan said, or like getting a new phone that comes with every feature, whether you want it or not.
The difficulty of making changes has meant networking folks gained a reputation for saying no to anything that might affect uptime, he said. Uptime is the sole measure on which they are judged, not how quickly a new feature is deployed.
“We say that because the network OS is broken into these smaller pieces and you know what you’re getting every time you do a deployment, you can take these incremental steps and build this muscle of doing agile deployment, just like you would on the compute side,” he said.
For instance, DevOps teams can move quickly to address a vulnerability, he said. On the network side, you have to get a new version from the vendor, and because it’s a blob containing the whole kitchen sink, you have to spend time testing every single feature. Then you might have to start over because a new vulnerability was found. It doesn’t let you move quickly.
“We’re saying you take this piece and build this muscle of doing small changes. You do CI/CD for networking,” he said.