CI/CD for Networking: Adopting DevOps Principles for a More Robust Network

Since early 2017, the term CI/CD has steadily gained popularity as a name for the software development philosophy that uses test-driven infrastructure to constantly improve the code base, continually integrating enhancements based on feedback from both users and the system. This process of Continuous Integration and Continuous Deployment (or, if you prefer, Delivery) has become the de facto standard for software development. It has also been co-opted by a growing number of organizations as a way to implement “infrastructure as code”: using software development principles to manage their systems and compute environments.
This adoption of CI/CD by DevOps-minded organizations has led to more flexible, secure and agile infrastructure, with new features and functionality constantly being deployed and improved upon. With these architectures in place, the compute and storage pieces of the environment have benefited from the rapid deployment and constant improvement that CI/CD mandates. The piece that remains unaddressed is the network. Built upon legacy, monolithic architectures, the Network Operating System (NOS) as it exists today is not able to take advantage of CI/CD philosophies. What is needed is a new NOS architecture built upon the core cloud-native principles of microservices and containerization to enable CI/CD methods for networking.
Learning from DevOps
On the compute side of the house, operators have understood the importance of on-demand adaptability and evolution for quite some time. CI/CD has become a popular term with the rise of tools like Jenkins and Travis CI, and services such as Codefresh. These tools and services are driven by a common goal: give developers a way to quickly deploy safe and well-tested improvements to applications. For those on the outside looking in (i.e., network operators), it may not be immediately clear what the driving force behind these toolchains is: keeping the infrastructure current and fresh.
Application infrastructure is not treated in a “one and done” fashion. As with any infrastructure, technical or not, it’s important to develop a process for continuously building upon and improving it. Take, for instance, a well-traveled interstate highway: if the road is simply paved and not maintained, it will quickly fall into disrepair, with unreadable markings, overpass failures and dangerous potholes. When the time comes to repair these backlogged issues, the work is costly, complicated and potentially dangerous. The same is true of application infrastructure. Without a process in place to continuously and safely deploy improvements, these environments quickly devolve into a hodgepodge of deployed software versions, tools and security patches. The result is a fragile infrastructure that requires complicated maintenance windows, which often result in application downtime, whether on purpose or by mistake.
It Takes a Village
Embracing the necessary cultural changes is just as critical for an organization as clearing the technical hurdles of building a CI/CD pipeline. The biggest opponent that needs to be defeated, across the board, is the fear of change and the inertia it creates. This stagnation surrounding infrastructure change is understandable: most teams are rewarded for uptime and stability, not agility. Incentivizing teams in this way leads to a culture of “no,” as anything new is seen as something new to break (and something that might affect the precious uptime metric).
This culture covers the whole spectrum of restraint that teams show when managing a network, from a single engineer’s understandable hesitation to push a simple change to production, to the crushing paralysis an entire team experiences under the mystical aura that surrounds semi-annual maintenance windows and change freezes. Conquering the fear of change is the first step in developing a successful CI/CD structure, no matter which part of the infrastructure it is applied to. With CI/CD, things will (and must) change on a constant and regular basis, without hesitation over whether the halcyon status of the production environment will be disrupted. Things break. Figure out why quickly, fix it and move on. When teams embrace the cultural aspects of CI/CD, post-mortems are transformed from a witch hunt for whoever caused the outage into a set of action plans to improve the process and ensure that the root cause of the outage never occurs again. These action plans are then integrated directly into the CI/CD pipeline, ensuring that the failure case is tested and verified for all future deployments.
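One way to picture a post-mortem feeding the pipeline is a small regression suite that grows with every incident. The sketch below is illustrative only: the dictionary-based config format, the incident IDs (PM-001, PM-002), the specific rules and the `failed_checks` helper are all assumptions made for the example, not part of any particular NOS or CI tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# A candidate device configuration, modeled as a plain dict (hypothetical schema).
Config = Dict[str, Any]


@dataclass
class RegressionCheck:
    incident: str                     # post-mortem identifier (hypothetical)
    description: str                  # what the action plan requires
    passes: Callable[[Config], bool]  # returns True when the config is safe


# Each post-mortem adds one entry here, so the same outage cannot ship twice.
CHECKS: List[RegressionCheck] = [
    RegressionCheck(
        incident="PM-001",
        description="Uplink MTU must match the fabric MTU of 9216",
        passes=lambda cfg: cfg.get("uplink_mtu") == 9216,
    ),
    RegressionCheck(
        incident="PM-002",
        description="At least two NTP servers must be configured",
        passes=lambda cfg: len(cfg.get("ntp_servers", [])) >= 2,
    ),
]


def failed_checks(config: Config) -> List[str]:
    """Return a description of every post-mortem check the config violates."""
    return [
        f"{check.incident}: {check.description}"
        for check in CHECKS
        if not check.passes(config)
    ]
```

A pipeline stage could call `failed_checks` on every candidate configuration and block the deployment whenever the returned list is not empty.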
The Opportunity for the Network
Looking at how DevOps has applied the technical and cultural aspects of CI/CD yields some practical applications for operators managing the network. Yes, network engineers are limited by the capabilities of the appliance-like Network Operating Systems available today. In a DevOps-focused, cloud-native NOS, with all protocols and management functions built as microservices in immutable containers, network operators could natively leverage the capabilities of CI/CD. With the NOS built from containerized microservices, operators could deploy only the features they need (reducing complexity and the attack surface), upgrade components in production without fear of disrupting other services, and use cloud-native tools such as Kubernetes to manage these network services. Until a NOS embracing these architectural advantages is available, there are some steps that operators can take to improve the network and benefit from some of the core concepts of CI/CD for networking:
- Establish and document clear roles and responsibilities for each team — having a well-defined scope of ownership for each piece of infrastructure is key to building automation and CI/CD pipelines.
- Identify the network changes that are manual, repetitive and don’t require peer review to push into production (think VLAN changes, bringing up/down host interfaces, etc.) and allow the ops team to make those changes during the day, without a formal change request. These are the changes that will be automated away as you progress down your CI/CD journey.
- Move the typical maintenance window from the middle of the night to the middle of the day. Changes shouldn’t happen when key staff need to be paged in to respond to an issue. They should happen during normal business hours, when everyone is at their sharpest and co-located to facilitate collaboration and quick troubleshooting.
- Your network infrastructure isn’t sacred and shouldn’t be treated as such: changes and failures are going to happen. Put policies and procedures in place to ensure that changes occur as often as necessary to meet business needs. Force change to happen by rebooting key devices weeks before your busiest time of year. Stop embracing a device’s uptime as the ultimate measure of stability.
- Stay current with security patches and upgrades to resolve vulnerabilities. If you are on a once-a-year upgrade cycle (or longer), get it down to every three months or less. Compute teams don’t sweat upgrades because they perform them often, and they test and validate them before rolling them out, which builds a culture of change. Maintenance is like a muscle: the more you use it, the stronger it gets.
- Start small with constant rolling configuration changes and build upon the easy wins. If your compliance rules require quarterly local password changes, perform them monthly, on a continuous basis, instead of letting them stagnate, fall behind and build up more security tech debt. Establishing a cadence of constant rolling changes is crucial to establishing a good CI/CD pipeline.
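The idea of small, rolling changes from the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production tool: `rolling_change`, the `apply` and `healthy` callbacks and the device names are hypothetical, and in practice the callbacks would wrap whatever automation and monitoring you already use.

```python
from typing import Callable, Iterator, List, Sequence


def batches(devices: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` devices."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(devices), size):
        yield devices[start:start + size]


def rolling_change(
    devices: Sequence[str],
    size: int,
    apply: Callable[[str], None],    # pushes the change to one device (hypothetical)
    healthy: Callable[[str], bool],  # post-change health probe (hypothetical)
) -> List[str]:
    """Apply a change batch by batch, halting at the first unhealthy batch."""
    done: List[str] = []
    for batch in batches(devices, size):
        for device in batch:
            apply(device)
        unhealthy = [d for d in batch if not healthy(d)]
        if unhealthy:
            # Stop here so a bad change never reaches the remaining devices.
            raise RuntimeError(f"halting rollout, unhealthy devices: {unhealthy}")
        done.extend(batch)
    return done
```

Keeping the batch size small limits the blast radius of a bad change to a handful of devices, which is what makes a daytime change like a monthly password rotation low-risk enough to run routinely.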