The pioneers who formed the core of the SDN ecosystem came together in 2011 to establish the Open Networking Research Center (ONRC) and Open Networking Lab (ON.Lab). According to the web site, ONRC is part of Stanford University and ON.Lab develops, deploys and supports open source SDN tools and platforms.
The open source SDN controller that is the topic of this article, the Open Network Operating System (ONOS), is an outcome of this effort from ON.Lab. ONOS is a distributed system: an SDN controller platform designed specifically for scalability and high availability. With this design, ONOS projects itself as a network operating system, with separation of control and data planes for wide area network (WAN) and service provider networks. Though ONOS is a relatively new controller, the amount of documentation available at wiki.onosproject.org is truly commendable. However, we have had relatively little exposure to ONOS, and this article may be updated in the future as we spend more time working with it. Before we describe ONOS in detail, we first introduce three topics to help the reader appreciate its design principles:
- Intent-based networking
- Distributed controller architecture
- SDN and Service Providers
Intent-Based Networking
Oxford defines intent as “an aim or plan or purpose.” This term has crept into the domain of networking over the past several years. Dave Lenrow of HP puts it nicely by saying intent is all about “what,” and not about “how.” He begins his argument by posing an interesting question: “What if, instead of describing how we want a network to be configured, we could describe how distributed workloads behave? Instead of requiring detailed network configuration expertise and understanding network protocols and equipment interfaces, network managers would only need to specify the distributed workload’s behaviors and communications requirements.”
To understand this in detail, and to appreciate Lenrow’s question, let us first look at application-specific policies, which can be seen as statements (or descriptions) of application intents. They are usually associated with the network design for applications with varied requirements, e.g., a three-tier application comprising a set of Web application servers connected to the public Internet through an application delivery controller (ADC) and using a set of database servers as a back-end data store. Typically, policies are a defined set of rules that decide all aspects of the packet flow in the network; for example, the traffic between the Internet and the server must pass through a set of firewall rules. To enforce such policies, the physical network infrastructure, the endpoints, and the appliances all need to be configured, and that configuration can be very low-level and vendor-specific [1,2]. It depends on the concrete physical network topology and technology, and on the installed management tools and processes. Hence, the configuration (of endpoints and interfaces) directly affects connectivity. To provide both agility and simplicity, we need a mechanism (for example, a language) for describing the abstracted intent of connectivity. With this mechanism, the user doesn’t need significant networking knowledge to describe the requirements for connectivity. Additionally, this intent should be decoupled from network forwarding semantics, so that a change in policy need not affect forwarding behavior, and vice versa.
Hence, over the past few years, a trend has gained prominence that urges users to focus on their intents (“what they want,” rather than “how they want it implemented”) and lets the network layers figure out how to accomplish them. The goal of intent-driven systems is to shift the focus from networking details to the needs of the distributed network application. Some works that use ‘intent’ are listed below; readers can refer to them to dig deeper into intent-based networking:
- IBM’s Dove: Intent-based Approach to Networking Virtualization [1]
- Group-Based Policy (GBP): an intent-driven policy API for OpenStack or Cisco’s Application-Centric Policy Model using Group-Based Policy, or intent-based network automation through Puppet Enterprise.
In summary, the term intent-based refers to the property, whereby the management abstraction relates to the functionality of the network, allowing a person to express, formalize, and verify it.
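The split between “what” and “how” can be illustrated with a small sketch. It is written in Python purely for brevity, and every name in it (Intent, render_rules, the group and service names) is hypothetical rather than taken from any of the products mentioned above. The intent names two groups and a service that traffic must pass through; a separate function expands it into per-endpoint steering rules for whatever endpoints currently exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """What the operator wants: traffic between two groups, via a service."""
    src_group: str
    dst_group: str
    via: str  # logical service name, e.g. "firewall" (hypothetical)


def render_rules(intent, groups, services):
    """How it is realized: expand the intent into (from, to) steering rules."""
    hop = services[intent.via]
    rules = set()
    for src in groups[intent.src_group]:
        for dst in groups[intent.dst_group]:
            rules.add((src, hop))  # steer source traffic to the service
            rules.add((hop, dst))  # then from the service to the destination
    return sorted(rules)


intent = Intent("internet", "web", via="firewall")
groups = {"internet": {"gw1"}, "web": {"web1", "web2"}}
services = {"firewall": "fw1"}
rules_before = render_rules(intent, groups, services)

# Scaling out the web tier changes the rendered rules, not the intent.
groups["web"].add("web3")
rules_after = render_rules(intent, groups, services)
```

The intent object is never edited when the web tier grows; only the rendered rules change, which is the decoupling described above.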
Distributed SDN Controller Architecture
The number of controller projects, both open source and commercial, highlights the fact that the control platform is the crucial enabler of the SDN paradigm. Two of the most important challenges in building a production-quality control platform are scalability and reliability. Control plane scalability, where the network control plane is implemented as a distributed system, has been one of the hot research topics in SDN. To emphasize the importance of a distributed architecture, researchers have argued that how the network information is managed in the controller typically dictates the scalability and reliability properties of the system. For example, as the number of network elements (switches/interfaces) in the network increases, a network information base (NIB) that is not distributed could exhaust the system memory. In addition, the number of network events and corresponding handling processes could grow exponentially in some cases (such as a switch breakdown) and saturate the CPU of a single controller instance. To support such scalability and reliability requirements, there have been various proposals for distributed SDN controller architectures, in which multiple instances of the controller platform run to manage the network. The proposals differ in how the instances interact and how information is shared, and they use techniques such as partitioning/sharding, aggregation, and replication. Below we list different works on distributed SDN controller architectures, the details of which can be obtained from the included references:
- Onix: A Distributed Control Platform for Large-Scale Production Networks [3]. Onix is proposed as a platform on top of which a network control plane can be implemented as a distributed system.
- HyperFlow: A Distributed Control Plane for OpenFlow Networks [4]. Proposes a logically centralized and physically distributed event-based control plane for OpenFlow.
- Kandoo: Hierarchical Distribution of the Controllers [5]. Proposes two layers of controllers, where the bottom layer controllers run only local control applications, and have no interconnection or knowledge of the network-wide state, and the top layer is a logically centralized controller that maintains the network-wide state.
- DISCO: Distributed Multi-domain SDN Controllers [6]. Proposes an open and extensible distributed SDN control plane able to cope with the distributed and heterogeneous nature of modern overlay networks and wide area networks.
- ElastiCon: Towards Elastic Distributed Controller Architecture [7]. Proposes a controller pool, which dynamically grows or shrinks according to traffic conditions, and the workload is dynamically distributed among the controllers.
- Pratyaastha: An Efficient Elastic Distributed SDN Control Plane [8]. Proposes a novel approach for assigning SDN switches and partitions of SDN application state to distributed controller instances.
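To make the partitioning/sharding idea concrete, here is a minimal sketch of one way to spread switches across controller instances. It uses rendezvous (highest-random-weight) hashing, which is our own choice for illustration and is not claimed to be the scheme used by any of the systems listed above; the switch and controller names are also made up.

```python
import hashlib


def _score(controller, switch):
    """Deterministic pseudo-random weight for a (controller, switch) pair."""
    digest = hashlib.sha256(f"{controller}|{switch}".encode()).hexdigest()
    return int(digest, 16)


def assign(switches, controllers):
    """Map each switch to the controller with the highest weight for it."""
    return {sw: max(controllers, key=lambda c: _score(c, sw)) for sw in switches}


switches = [f"s{i}" for i in range(1, 9)]
before = assign(switches, ["c1", "c2", "c3"])
after = assign(switches, ["c1", "c3"])  # controller c2 has failed

# Only switches that were managed by c2 need a new controller.
moved = [sw for sw in switches if before[sw] != after[sw]]
```

With this scheme, a controller failure only reassigns the switches that the failed instance was managing; every other switch keeps its controller, which limits the state that has to move.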
SDN and Service Provider Networks
Over the past few years we have seen a rapid increase in the number of mobile devices, which has resulted in exponential growth of traffic, numerous and interesting over-the-top services, and adoption of cloud and cloud-based services by service providers. Service providers rely on rich, reliable and differentiated services to generate revenue and retain customers, and constant service innovation is the only way to win in the cloud battleground. Consequently, as described by the ONOS white paper [9,10], service providers are:
- Exploring possibilities to make their networks agile and efficient to meet the challenges of these exponential bandwidth demands.
- Looking to create revenue streams with innovative services and new business models.
- Looking to reduce CapEx and OpEx by taking advantage of the technology innovations.
These challenges have forced service providers to rethink their networks. Software defined networking has emerged as the paradigm that has the potential to transform these networks by delivering cloud-style agility and innovation and reinstating economic viability. Many service providers, such as NTT and AT&T, have embraced SDN and NFV to improve the value of their networks. “SNS research estimates that by 2020, SDN and NFV can enable service providers (both wireline and wireless) to save up to $32 billion in annual CapEx investments.”
ONOS: Open Network Operating System
The reason for introducing the above three concepts is that they form the core of the design principles of the ONOS controller platform. ONOS is a scalable and distributed controller platform that targets service provider networks and service providers’ requirements, such as policy-driven network programmability and being operator-friendly.
ONOS argues that building a control platform for service providers involves solving difficult distributed systems problems to address the requirements of high availability, scale-out and performance. Along with these three key attributes (availability, scalability, and high performance), ONOS also targets support for multiple protocols at the southbound interface, to communicate with diverse devices, and exposure of the right APIs at the northbound interface, to accommodate the needs of service provider use cases and application developers.
Figure 1 shows the architecture of ONOS. From the figure, we can see the presence of multiple instances of the ONOS platform, highlighting the “distributed” nature of the system. ONOS architecture can be seen as a three-tiered collection of multiple ‘subsystems’ (also referred to as services), where each subsystem realizes a service and is implemented as a combination of components present in three different layers — application, core and southbound protocol layers. Some of the example subsystems are Cluster, Device, Packet, FlowRule, Path, Link, Host and Intent. In the remaining part of this section, let us look at each layer of ONOS in detail.
Like other controllers, the ONOS platform is designed to support various application categories, such as control, configuration and management applications. The applications published by ON.Lab include Segment Routing, multi-layer SDN control, a topology viewer, path computation and SDN-IP peering [9,10,11,12].
As in other controllers, applications work on the information present at the core layer, by sending command requests and receiving responses, and by handling events. The ONOS core layer exposes two kinds of interfaces, AdminService and Service, which applications use to work on the information managed by the different service components in the core. Each application registers with CoreService using its name, and CoreService in turn provides the application with a unique ApplicationId. ONOS uses this identifier to keep track of the tasks and objectives, such as intents and flow rules, associated with an application.
Distributed Core Layer
As explained in the beginning, ONOS provides a policy-driven programmatic framework that lets users specify what they need without worrying about how it will be instantiated on the underlying network; in other words, it abstracts network complexity away from the higher layers.
The Intent Framework is a subsystem within the ONOS core. Intents are policy-based directives that get translated and compiled into specific instructions, which are then installed on network devices. An intent can be described in terms of network resources, constraints, criteria and instructions. In addition to translation and compilation, the framework also manages changes in network conditions and optimization across intents, including realizing complex functionality with combinations of intents. As shown in figure 1, this functionality is realized by four different sub-components: intent manager, installation worker, intent store and resource scout. The framework mainly includes intent compilers, which translate intents into installable intents that are more specific to the network environment, and coordinators, which determine how the network must be programmed, including the order of installation at a device/resource level.
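The compile step can be sketched in a few lines. This is a toy model in Python, not the ONOS Intent Framework API: the intent is just a (source host, destination host) pair, the “compiler” is a breadth-first search over the current links, and the output is a list of hypothetical (switch, match destination, next hop) rules. Recompiling the same intent after a link failure shows how the framework can react to changed network conditions without the user restating anything.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def compile_intent(intent, links):
    """Translate a (src_host, dst_host) intent into per-switch rules."""
    src, dst = intent
    path = shortest_path(links, src, dst)
    if path is None:
        return None  # the intent cannot be satisfied on this topology
    # Interior nodes of the path are switches; each forwards to the next node.
    # The last switch's next hop is the destination host itself.
    return [(path[i], dst, path[i + 1]) for i in range(1, len(path) - 1)]


links = {("h1", "s1"), ("s1", "s2"), ("s2", "h2"), ("s1", "s3"), ("s3", "s2")}
intent = ("h1", "h2")
rules = compile_intent(intent, links)

# The s1-s2 link fails: the same intent is recompiled onto the detour via s3.
rules_after_failure = compile_intent(intent, links - {("s1", "s2")})
```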
Topology Management and Global Network View
ONOS also provides the network graph, and the view of the entire network, as the northbound abstraction. This global network information is presented as logically centralized, even though it is physically distributed across multiple servers. The global network view is built out of network topology and state discovered by each ONOS instance, such as switch, port, link and host information. The network view data model was implemented by combinations of solutions, such as the Titan [13] graph database, the Cassandra [14] key-value store for distribution and persistence, and the Blueprints graph API, to expose the network state to applications.
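As a rough illustration of “logically centralized, physically distributed,” the sketch below merges the links discovered by two controller instances into one graph. It is plain Python with made-up instance and switch names, and it stands in for the idea only; it does not use Titan, Cassandra or Blueprints.

```python
def global_view(instance_views):
    """Union the links discovered by each instance into one adjacency map."""
    graph = {}
    for view in instance_views.values():
        for a, b in view:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


views = {
    "onos-1": {("s1", "s2"), ("s2", "s3")},
    "onos-2": {("s3", "s4"), ("s2", "s3")},  # s2-s3 is seen by both instances
}
graph = global_view(views)
```

An application querying the merged graph sees s3 connected to both s2 and s4, even though no single instance discovered the whole topology.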
ONOS Distributed Core: Scalability, HA and Performance
ONOS’s Distributed Core is designed to target scalability, HA and performance by including different state-of-the-art distributed-system techniques, as summarized below. A later section on ONOS clusters describes how some of these techniques are used.
- ONOS includes an anti-entropy protocol (a type of gossip-based protocol) to realize synchronization between multiple instances of the controller.
- Databases achieve high availability, reliable transactions, scalability and performance through approaches such as replication, strong consistency and partitioning. ONOS includes all these techniques, and also includes an eventual consistency model, which informally guarantees that if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
- Vector clocks, one of the approaches for generating a partial ordering of the events in a distributed system, are used in ONOS. The messages that are exchanged between processes contain the state of the sending process’s logical clock (or an array of logical clocks for multiple processes), which helps in generating the partial ordering.
- There are many techniques, such as distributed queuing and queue-sharing groups, that many researchers have used to increase the availability of the systems. In ONOS, HA execution is supported by distributed queues.
- ONOS uses Hazelcast, an open source in-memory data grid written in Java, for its cluster membership management.
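The vector clock item in the list above is easiest to follow with a small worked example. The following is a generic textbook-style sketch in Python, with hypothetical function names; it is not ONOS code. Each clock is a dictionary from node name to counter, a node increments its own entry on every local event, and a receiver takes the element-wise maximum before counting the receive event.

```python
def tick(clock, node):
    """Local event (including a send): advance this node's own counter."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(local, received, node):
    """Receive event: element-wise maximum, then count the receive itself."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return tick(merged, node)


def compare(a, b):
    """Return 'before', 'after', 'equal' or 'concurrent' for clocks a and b."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


a1 = tick({}, "A")        # A sends an update, stamped {A: 1}
b1 = tick({}, "B")        # B makes an unrelated local update, {B: 1}
b2 = merge(b1, a1, "B")   # B receives A's message, {A: 1, B: 2}
```

Here a1 is ordered before b2 because B had seen A’s message, while a1 and b1 are concurrent: neither clock dominates the other, which is exactly why the ordering is only partial.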
Other Managers: Device, Packet, FlowRule, Path, Link, Host Managers
Most of these managers implement a piece of functionality and interface with both southbound providers and northbound applications. For example, DeviceManager interfaces with multiple southbound providers via the DeviceProviderService and DeviceProviderRegistry interfaces, and with multiple northbound listeners via the DeviceService and DeviceAdminService interfaces, as shown in figure 2.
Pluggable Southbound: Providers and Protocols
ONOS supports multiple southbound protocols (OpenFlow, NETCONF, etc.) for communication with a variety of network devices. ONOS, similar to other controllers (e.g., ODL), uses the concept of providers, one for each southbound protocol, which hide protocol complexity (any protocol-specific behavior or requirement) from other components of the controller platform. These providers supply all the necessary ‘descriptions’ of network elements to the core layer. The term ‘pluggable’ highlights the fact that anybody can develop their own device- or protocol-specific provider and register it with the core. Following the registration, the provider and core communicate with each other by (a) the provider notifying the core of new events (device connected, packet_in) as descriptions (immutable, short-lived messages that contain a URI for the object being described); and (b) the core issuing commands to the elements under the provider’s control.
Similar to the application-identifier at the northbound interface, ONOS also uses the concept of ProviderID, which is assigned to every provider in the southbound interface. ProviderID serves the purpose of unique identification and proper mapping with the devices. Finally, from the subsystem perspective, multiple providers (typically designated as primary and ancillary) may be associated with a single subsystem. A device subsystem supports multiple providers.
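The provider pattern described in this section can be modelled in a few lines. The sketch below is a Python stand-in with hypothetical class and method names (DeviceCore, OpenFlowLikeProvider, and so on); the real ONOS interfaces are Java and differ in detail. It shows the two directions of communication: the provider pushes device descriptions up to the core, and the core routes commands down to whichever provider owns the device.

```python
class DeviceCore:
    """Toy core: registers providers, stores descriptions, notifies listeners."""

    def __init__(self):
        self.providers = {}  # provider id -> provider object
        self.devices = {}    # device URI -> (provider id, description)
        self.listeners = []  # northbound callbacks

    def register(self, provider):
        if provider.provider_id in self.providers:
            raise ValueError("provider id already registered")
        self.providers[provider.provider_id] = provider
        provider.core = self

    def device_connected(self, provider_id, uri, description):
        # Store a copy so the description behaves as an immutable message.
        self.devices[uri] = (provider_id, dict(description))
        for listener in self.listeners:
            listener(("DEVICE_ADDED", uri))

    def send_command(self, uri, command):
        provider_id, _ = self.devices[uri]
        return self.providers[provider_id].execute(uri, command)


class OpenFlowLikeProvider:
    """Hides protocol detail; the core only sees URIs and descriptions."""

    provider_id = "of:example"  # hypothetical identifier

    def __init__(self):
        self.core = None

    def on_switch_connect(self, dpid):
        self.core.device_connected(self.provider_id, f"of:{dpid}", {"type": "SWITCH"})

    def execute(self, uri, command):
        return f"{self.provider_id} -> {uri}: {command}"


core = DeviceCore()
events = []
core.listeners.append(events.append)

provider = OpenFlowLikeProvider()
core.register(provider)
provider.on_switch_connect("0000000000000001")
reply = core.send_command("of:0000000000000001", "port-down 3")
```

Because the core looks up the owning provider by identifier, a second provider for another protocol could be registered alongside this one without any change to the core or to the listeners.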
ONOS Clusters
In ONOS, unlike other controllers, support for a distributed architecture is one of the design principles, not an afterthought. ONOS is also similar to five or six of the distributed architectures described above in the Distributed SDN Controller Architecture section. That is, ONOS can be deployed as a collection of controller servers that coordinate with each other to achieve resiliency, fault tolerance, and better load management. As seen in the existing distributed architectures, there are various challenges, such as master selection and network state distribution and management, that distributed controller architectures must address. Let us take “Cluster Coordination” as a case study. Each instance is aware of a subset of the network state, and the instance that manages a subset shares it with the other members of the cluster as events (Figure 3). Hence, the stores of the different subsystems (services), which generate the events, include a distribution mechanism. For example, the cluster subsystem manages nodes joining and leaving the cluster, and is implemented using Hazelcast’s strongly consistent distributed structures. Similarly, link and host management use an optimistic replication technique and a gossip protocol to ensure eventual consistency (note that for eventual consistency, events are partially ordered with vector clocks).
In a clustered (multi-instance) environment, there can be various failures ranging from a complete failure (crash) of a node to a node in the cluster unable to receive updates from its peers. To address these issues, ONOS uses approaches such as the anti-entropy mechanism, which is based on gossip-protocol and periodic probing of nodes.
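A minimal model of anti-entropy helps show why a node that missed updates eventually catches up. The Python sketch below is an illustration under simplifying assumptions, not the ONOS implementation: each replica keeps (timestamp, value) pairs, reconciliation keeps the entry with the newer timestamp, and peers are visited in a fixed ring so the example is deterministic (gossip protocols usually pick peers at random). All names and keys are hypothetical.

```python
class Replica:
    """One controller instance's copy of a key-value store."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def put(self, key, value, timestamp):
        # Keep the newer version; on a tie the existing entry wins.
        # A real system would need a deterministic tie-breaker.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def sync_with(self, peer):
        """Push-pull exchange: both sides end up with the newer entries."""
        for key, (ts, value) in list(peer.store.items()):
            self.put(key, value, ts)
        for key, (ts, value) in list(self.store.items()):
            peer.put(key, value, ts)


def anti_entropy_round(replicas):
    """Each replica reconciles with its ring neighbor."""
    for i, replica in enumerate(replicas):
        replica.sync_with(replicas[(i + 1) % len(replicas)])


def converged(replicas):
    return all(r.store == replicas[0].store for r in replicas)


nodes = [Replica(f"onos-{i}") for i in range(1, 5)]
nodes[0].put("link:s1-s2", "ACTIVE", timestamp=1)
nodes[2].put("link:s1-s2", "INACTIVE", timestamp=2)  # newer update elsewhere
nodes[3].put("host:h1", "s4/1", timestamp=1)

rounds = 0
while not converged(nodes):
    anti_entropy_round(nodes)
    rounds += 1
```

After the loop, every replica holds the newer INACTIVE state for the link and the host entry, including the instance that started with an empty store. This is the informal eventual-consistency guarantee mentioned earlier: once updates stop, all copies end up returning the last updated value.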
Writing an Application in ONOS
Quite like a typical Maven project, ONOS is also a collection of Java class files bound together using the Project Object Model (pom.xml). ONOS is essentially an OSGi-compliant framework for binding together the jar files created after compilation of the Maven bundles, using Karaf as the framework implementation. The pom.xml file provides all the binding glue by holding information about the dependency of the bundles, which are satisfied by OSGi at the module loading time.
The ONOS Project team provides well-written, elaborate documentation, which includes a basic tutorial and detailed API listings. Thus, rather than duplicating that information, this article presents a bird’s-eye view of the overall process of writing an application.
Writing an application for ONOS can be understood through a flow diagram representing various important steps, as depicted below:
Steps for Creating an ONOS Application:
- As mentioned above, ONOS uses Karaf to implement the OSGi framework, breaking applications into bundles. The first step is to create the directory layout for such a bundle. This directory layout would contain the pom.xml file and various Java class files holding the core logic.
- The pom.xml file is placed in the root of this directory structure. It also contains reference to the ONOS root pom.xml as a parent POM. A skeleton directory layout for any typical project can be automated by using Component Templates.
- All the applications written in ONOS need to have their code glued to Karaf using annotations like @Activate or @Deactivate. These annotations act as hooks for Karaf to call the application’s Java code at events like bundle loading and unloading. These calls are the points at which to set up necessary variables, hook into other services and initiate the application logic. The next step, then, is to create entry and exit functions with these annotations so that Karaf can load your application.
- Within the startup method, register the application under a unique name, using the registerApplication() method of the org.onosproject.core.CoreService interface, so that it can be recognized.
- Within the startup and cleanup methods, request from Karaf all the services the application is expected to use. For example, if the application requires packets received by ONOS to be delivered to it, an implementation of the PacketProcessor interface needs to be added as a “Processor”.
- Once the skeleton is ready to be glued with Karaf, it is time to write your core logic, a.k.a the business logic. That part is done in the Java files, which would then be compiled to appropriate class files. Once the code is completed, build/compile the application using the Maven commands.
- Maven would take care of compiling all the application java files, linking all the dependency bundles, or downloading those bundles which are not locally available.
- Once the Maven process is completed, the Java archive (jar) would be placed at a location within the ONOS folder for Karaf to pick it up without developer intervention.
- Finally, start the application from the ONOS CLI using the “feature:install <application name>” command.
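The order of operations in the steps above (register on startup, attach a packet processor, detach it on cleanup) can be modelled without ONOS itself. The sketch below is a Python stand-in: a real ONOS application is a Java class whose startup and cleanup methods carry the @Activate and @Deactivate annotations, and FakeCoreService, FakePacketService, SampleApp and the application name are all hypothetical names used only for this illustration.

```python
class FakeCoreService:
    """Stand-in for the registration service: one stable id per app name."""

    def __init__(self):
        self._ids = {}

    def register_application(self, name):
        return self._ids.setdefault(name, len(self._ids) + 1)


class FakePacketService:
    """Stand-in for the service that hands received packets to processors."""

    def __init__(self):
        self.processors = []

    def add_processor(self, processor):
        self.processors.append(processor)

    def remove_processor(self, processor):
        self.processors.remove(processor)

    def deliver(self, packet):
        for processor in list(self.processors):
            processor(packet)


class SampleApp:
    def __init__(self, core, packets):
        self.core = core
        self.packets = packets
        self.app_id = None
        self.seen = []

    def activate(self):
        # Plays the role of the method annotated with @Activate.
        self.app_id = self.core.register_application("org.example.sample")
        self.packets.add_processor(self.process)

    def deactivate(self):
        # Plays the role of the method annotated with @Deactivate.
        self.packets.remove_processor(self.process)

    def process(self, packet):
        # The application's core logic goes here.
        self.seen.append(packet)


core = FakeCoreService()
packets = FakePacketService()
app = SampleApp(core, packets)

app.activate()
packets.deliver("pkt-1")   # delivered: the app is active
app.deactivate()
packets.deliver("pkt-2")   # not delivered: the processor was removed
```

The point of the cleanup step is visible in the last two lines: once the processor is removed, the application no longer receives packets, so nothing is left dangling after the bundle is unloaded.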
[1] Cohen, R.; IBM Res. Lab., Haifa, Israel; Barabash, K.; Rochwerger, B.; Schour, L., “An Intent-Based Approach for Network Virtualization,” 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 27-31 May 2013.
[2] Thomas Graf, “Intent Driven Networking with OVS and Vendor Neutral Hardware Offload,” OVS Conference 2014.
[3] T. Koponen et al., “Onix: a Distributed Control Platform for Large-Scale Production Networks,” in OSDI, 2010.
[4] A. Tootoonchian and Y. Ganjali, “Hyperflow: A Distributed Control Plane for Openflow,” in INM/WREN, 2010.
[5] S. H. Yeganeh and Y. Ganjali, “Kandoo: A Framework for Efficient and Scalable Offloading of Control Applications,” in HotSDN, 2012.
[6] K. Phemius, M. Bouet, and J. Leguay, “DISCO: Distributed Multi-Domain SDN Controllers,” CoRR, vol. arxiv.org/abs/1308.6138, 2013.
[7] Advait Dixit, Fang Hao, Sarit Mukherjee, T.V. Lakshman, and Ramana Kompella. 2013. Towards an Elastic Distributed SDN Controller. SIGCOMM Comput. Commun. Rev. 43, 4 (August 2013), 7-12.
[8] Anand Krishnamurthy, Shoban P. Chandrabose, and Aaron Gember-Jacobson. 2014. Pratyaastha: An Efficient Elastic Distributed SDN Control Plane. In “Proceedings of the Third Workshop on Hot Topics in Software Defined Networking” (HotSDN ’14). ACM, New York, NY, USA, 133-138.
[9] ON.Lab white paper, “Driving SDN Adoption in Service Provider Networks,” 2014.
[10] ON.Lab white paper, “Introducing ONOS: A SDN Network Operating System for Service Providers,” 2014.
[11] Prajakta Joshi, “Introducing ONOS,” Webinar, 2014. http://www.opennetsummit.org/ons-inspire-webinars-onlab-onos-nov11.php
[12] Pankaj Berde, Matteo Gerola, Jonathan Hart, Yuta Higuchi, Masayoshi Kobayashi, Toshio Koide, Bob Lantz, Brian O’Connor, Pavlin Radoslavov, William Snow, and Guru Parulkar. 2014. ONOS: Towards an Open, Distributed SDN OS. In “Proceedings of the Third Workshop on Hot Topics in Software Defined Networking” (HotSDN ’14). ACM, New York, NY, USA, 1-6.
[13] Titan Distributed Graph Database. http://thinkaurelius.github.io/titan/.
[14] A. Lakshman and P. Malik. Cassandra: A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review, 44(2), 2010.
Sridhar received his Ph.D. in computer science from the National University of Singapore in 2007; his M.Tech. degree in computer science from KREC, Surathkal, India, in 2000; and his B.E. degree in instrumentation and electronics from SIT, Tumkur, Bangalore University, India, in August 1997. He worked as a Research lead at SRM Research Institute, India; post-doctoral fellow at Microsoft Innovation Center, Politecnico Di Torino, Turin, Italy; and as a research fellow at the Institute for Infocomm Research (I2R) in Singapore. He has worked on various development and deployment projects involving ZigBee, WiFi and WiMax. Sridhar is currently working as the Group Technical Specialist with NEC Technologies India Limited. Sridhar’s research interests lie mainly in the domain of next-generation wired and wireless networking, such as OpenFlow, software defined networking, software defined radio-based systems for cognitive networks, Hotspot 2.0 and the Internet of Things.