With applications, services and now networking moving to the cloud, one key question is portability: the ability of a company, SaaS or otherwise, to migrate from one cloud operator to another. In fact, 451 Research’s Cloud Price Index reports that a “multi-cloud” strategy can save up to 74 percent.
Doron Samelson is co-founder and vice president of DevOps, responsible for customer success and Teridion deployments. He brings to the company a deep experience in software platform management and cloud computing companies.
Now that companies are comfortable with public and hybrid clouds, the next logical question is how to deploy across multiple operators to best leverage their footprints, capabilities and pricing. The key is to implement processes and architectures that permit this approach.
One example is Teridion’s Global Cloud Network (GCN), deployed across upwards of a dozen cloud operators without reinventing the wheel each time. It relies on a management system, an abstraction layer and an API that together introduce a common language for spinning up and tearing down services. For example, if requesting service “A” from operator “B” with parameters “C,” what is the equivalent vernacular in operator “D”? And what criteria can one use to decide which operator to choose? A brief review of the Teridion architecture will help before going into technical details.
Dave Ginsburg is chief marketing officer, bringing to Teridion 25+ years of experience spanning corporate and product marketing, product management, digital marketing, and marketing automation. Previous roles included Pluribus, Extreme, Riverstone, Nortel, and Cisco.
Our Teridion GCN comprises three primary components:
- Teridion Measurement Agents (TMAs) run on virtual machines within cloud operators and are used to determine latency across an operator’s backbone or to other cloud operators. These TMAs are the collection points for the performance data described here.
- Teridion Cloud Virtual Routers (TCRs) create overlay paths across the Internet based on throughput, latency or geography, to optimize a given customer’s dynamic content.
- The Teridion Management System (TMS) gathers data from TMAs to paint a live view of Internet performance, and maintains APIs to cloud operators to create or destroy TCRs on-demand and based on customer requirements.
The above diagram depicts the system. The TMS, located in the cloud, spins up TMAs in every region of every major cloud operator. These TMAs constantly gather performance data, including throughput via iPerf and latency via timed TCP connections, and send it to the TMS.
Any TCRs that already exist do the same via embedded TMAs. The TMS uses all of this data to build a picture of Internet performance, and the same data drives the deployment of TCRs, also in cloud operators. Customer traffic is sent through these TCRs along the optimal paths, and the TCRs are also used to generate link throughput data. There are two critical operations here:
- Determining which cloud operator to use for TCR deployment, and when and where.
- Efficiently interfacing with that cloud operator.
More detailed information on Teridion GCN architecture is here.
Cloud Operator Selection
As described above, the TMS develops a real-time view of cloud operator performance across the Internet. Our customers may select a virtual overlay network based on throughput or latency, or to prefer or avoid certain geographies. The TMS therefore uses this performance view, along with pricing, to select the best cloud operator(s) for a given customer, and the choice may differ by region and by date and time. For example, one of our customers has over 500 TCRs deployed, on average, across four different operators. After deploying the TCRs, the TMS establishes the overlay path from one to another, and may also create a standby path for resiliency.
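A selection of this kind can be sketched as a weighted score over per-operator measurements. The field names, weights, and scoring formula below are hypothetical, chosen only to illustrate how latency, throughput, pricing, and geography exclusions might combine; Teridion's actual policy engine is not described in detail in the source.

```python
from dataclasses import dataclass

@dataclass
class OperatorStats:
    name: str
    region: str
    latency_ms: float       # measured by TMAs
    throughput_mbps: float  # measured via iPerf-style tests
    price_per_hour: float   # operator pricing for the instance size

def select_operator(candidates, policy):
    """Pick an operator for a region under a customer policy.

    `policy` is a dict of illustrative weights; "avoid" lists
    geographies the customer has excluded. Lower score wins.
    """
    eligible = [c for c in candidates if c.region not in policy.get("avoid", [])]
    def score(c):
        return (policy.get("w_latency", 1.0) * c.latency_ms
                + policy.get("w_price", 1.0) * c.price_per_hour * 100
                - policy.get("w_throughput", 1.0) * c.throughput_mbps)
    return min(eligible, key=score)
```

A real system would recompute this continuously as measurements arrive, which is why the chosen operator can differ by region and by date and time.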
Cloud Operator Interface
The TMS accomplishes this via an API that contains the necessary hooks for each of the operators. It must be both flexible and capable of very high performance. One way to look at this abstraction between our control plane (the TMS) and our data forwarding plane (the TCRs) is the analogy of a data center switch or router: in most cases, a vendor’s network operating system is not limited to a single silicon family, and implements an abstraction layer. We have simply extended this to the cloud, where the TMS is the cloud networking operating system, and the TCRs, virtual routers within the cloud operators, are the switching elements.
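An abstraction layer of this shape is typically a common interface with one adapter per operator, each translating a generic request into that operator's vernacular. The sketch below illustrates the pattern under assumed names (`CloudOperatorAdapter`, `deploy_router`, the size map); none of these are Teridion's actual API.

```python
from abc import ABC, abstractmethod

class CloudOperatorAdapter(ABC):
    """Common interface a control plane programs against.

    Each concrete adapter maps generic parameters (regions, sizes)
    onto one operator's instance types and API calls.
    """

    @abstractmethod
    def deploy_router(self, region: str, size: str) -> str:
        """Spin up a virtual router; return its identifier."""

    @abstractmethod
    def destroy_router(self, router_id: str) -> None:
        """Tear down a previously deployed router."""

class ExampleOperatorAdapter(CloudOperatorAdapter):
    # Hypothetical mapping from generic sizes to this operator's types.
    SIZE_MAP = {"small": "t-small-1", "large": "c-large-8"}

    def __init__(self):
        self._next_id = 0
        self.running = {}  # stands in for calls to a real operator API

    def deploy_router(self, region, size):
        self._next_id += 1
        router_id = f"ex-{self._next_id}"
        self.running[router_id] = (region, self.SIZE_MAP[size])
        return router_id

    def destroy_router(self, router_id):
        del self.running[router_id]
```

The control plane then answers the "service A in operator B, equivalent vernacular in operator D" question by calling the same method on a different adapter.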
High performance is ensured by how we architect the TMS. For a given region and sub-region, the TMS maintains a list of potential cloud operators. Once it selects an operator, it issues a deployment order to spin up a TCR. Note that the same process may have occurred earlier, when creating a TMA. For either, we implement a Download Server that hosts the TMA/TCR image, and across the Teridion GCN, different TCRs may run different software releases.
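A deployment order of this kind can be modeled as a small record that names the node type, target operator and region, and the image release to fetch from the Download Server. The record fields and URL scheme below are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeploymentOrder:
    """A management-system order to spin up a node; fields are illustrative."""
    node_type: str      # "TMA" or "TCR"
    operator: str       # selected cloud operator
    region: str
    image_version: str  # different nodes may run different releases

def image_url(order, download_server="https://downloads.example.com"):
    # The node fetches its image from a Download Server at boot;
    # this URL layout is hypothetical.
    return f"{download_server}/{order.node_type.lower()}/{order.image_version}.img"
```

Keeping the release version in the order, rather than hard-coded in the node, is what allows different TCRs across the network to run different software releases.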
The TMAs and TCRs report back to the TMS queuing/logging engine, which is based on Kafka. This is the funnel for all data arriving from the network nodes. From there, data is shunted in two directions: the TMS builds a real-time view of the GCN via Neo4j, a graph database, while performance data is forwarded to Elasticsearch for analytics, Kibana for visualization, and ElasticHQ for cluster monitoring and management. Here is where we determine the Internet state and best paths, and Kibana may be queried in real time for this data.
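The two-direction fan-out can be sketched as below. In the real system the graph store is Neo4j and the analytics store is Elasticsearch, fed from Kafka; here plain in-memory stand-ins (a dict and a list) illustrate the split between a latest-state view and an append-only history.

```python
def fan_out(records, graph_store, analytics_store):
    """Shunt each measurement record in two directions.

    graph_store: dict keyed by link, holding only the latest value
                 (a stand-in for a live topology graph).
    analytics_store: list accumulating every record for historical
                 queries (a stand-in for an analytics index).
    """
    for rec in records:
        # Live view: keep only the most recent measurement per link.
        graph_store[(rec["src"], rec["dst"])] = rec["latency_ms"]
        # Analytics: append everything, so trends remain queryable.
        analytics_store.append(rec)
```

The design point is that the same funnel feeds both consumers, so the live path-selection view and the historical analytics can never disagree about what was measured.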
Summary data from our analytics and graph engines is rendered by the user portal, providing a per-customer view. We also implement a REST API over which a customer can design their own UI.
The above architecture may serve as a blueprint for any organization wishing to leverage multiple cloud operators for compute, storage, networking or any combination of the three. What is critical is not only documented APIs, but also a backend intelligence that can decide which operator to use, preferably in an automated way.
Teridion is a sponsor of The New Stack.