
The New Dimensions of Service Mesh at the Edge

14 Oct 2020 11:19am, by

Smart network interface controllers (SmartNICs) put the service mesh at center stage, where the network and the application layer meet. The new dimensions that come with integrating hardware and software are ushering in a new generation of capabilities, such as cryptographic operations and new approaches to resource utilization.

At VMworld last month, VMware featured SmartNICs as part of Project Monterey, an effort to deeply integrate Kubernetes into vSphere. It is evidence of VMware’s changing view of the data center and shows its work with Intel and other ecosystem providers to offer a platform that uses Kubernetes. Of particular interest is the telecommunications market that operates second- and third-tier data centers.

Resources once were seen as plentiful: just add more machines. With the increasing pressure on networks, utilization is now a core issue that will affect the developer experience and how we view networks and applications. Peak loads are a growing concern, making SmartNICs appealing to chip makers, hyperscale cloud services, second- and third-tier data center providers and software vendors.

Identity Matters

The combination of SmartNICs and service mesh opens ways to extend identity management to edge devices.

“While a service mesh can function without a cryptographic identity plane, weak forms of identity are inevitably created to permit service-to-service communication and discovery,” said Andrés Vega, a product line manager for VMware Tanzu Foundation Services and the former open source manager at Scytale, which was acquired by HPE. Vega is the product manager for the CNCF SPIFFE and SPIRE identity control plane projects.

“Many service mesh implementations have adopted partial implementations of the SPIFFE specification as a sound identity bedrock. In those cases, the use of SmartNICs allows for acceleration and offload encryption, calculation, and rotation of keys,” Vega said.

With more meaningful boundaries, there’s an opportunity for more specialized service meshes. “There is an emergence of specialized lightweight service meshes optimized for security like the Tanzu Service Mesh or the NGINX Service Mesh,” Vega said.

“SmartNICs can play a very important role in this area,” Vega said.

One of the challenges in hardware is bus encryption. A processor on the network card provides a way to offload those operations from the CPU, Vega said. Adding a strong cryptographic identity can play a role in negotiating and establishing the underlying connectivity on the fly for the service mesh, but only if the SmartNIC can be authenticated and the policy allows the connection.
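That gating condition, a known identity plus a policy that permits the connection, can be sketched in a few lines of Python. This is only an illustration, not how any particular mesh or SmartNIC implements it: the trust domain `example.org`, the workload paths and the `ALLOWED` table are hypothetical, and the sketch assumes the identity has already been cryptographically verified elsewhere (for example, from a SPIFFE-issued certificate).

```python
from urllib.parse import urlparse

# Hypothetical policy table: destination identity -> identities allowed to connect.
ALLOWED = {
    "spiffe://example.org/mesh/payments": {
        "spiffe://example.org/nic/smartnic-01",
    },
}


def parse_spiffe_id(spiffe_id):
    """Split a SPIFFE ID (spiffe://<trust-domain>/<path>) into its two parts."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parsed.netloc, parsed.path


def connection_allowed(source_id, dest_id, trusted_domain="example.org"):
    """Return True only if the source is in the trusted domain and policy permits it.

    Assumption: source_id was already authenticated; this function checks
    the identity's shape and the policy, not the cryptography.
    """
    try:
        domain, _path = parse_spiffe_id(source_id)
    except ValueError:
        return False
    if domain != trusted_domain:
        return False
    return source_id in ALLOWED.get(dest_id, set())
```

Denying by default, as `connection_allowed` does for any malformed or unlisted identity, mirrors the "only if" in Vega's description: connectivity is established when both checks pass and refused otherwise.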

SmartNIC Basics

For those unfamiliar with SmartNICs, a good example is how they are used by Microsoft.

Microsoft uses field-programmable gate arrays (FPGAs) in Azure Accelerated Networking (AccelNet), which serves as the company’s SmartNIC implementation in the public cloud. The custom FPGA-based Azure SmartNICs reduce the complexity of traditional networking stacks by offloading host networking to hardware.

The processing overhead is significant at NIC speeds of 25 Gbps, 50 Gbps and 100 Gbps (with 400 Gbps to come). Processors on the NIC are an obvious advantage for performing programmatic network processing in hardware.

What we are seeing today is a change in the way we think of networking and application technologies. A service mesh is the layer where the two meet.

In real terms, there’s a cost to custom silicon, writes Mary Branscombe in The New Stack about Azure’s use of FPGAs. CPU processing gets offloaded to the network interface card (NIC) that is programmable with FPGAs.

FPGAs have long been used. Branscombe’s article provides an overview of what makes FPGAs a bit different, though. The gates allow for massive parallelism customized to specific algorithms, resulting in lower power consumption than GPUs. As a note, NVIDIA is also partnering with VMware on Project Monterey.

VMware and Intel

There are three aspects to VMware’s partnership with Intel that show the importance of SmartNICs, said VMware Cloud Vice President and Chief Technology Officer Kit Colbert:

  • Performance: SmartNICs move the compute closer to the network IO traffic. Performance increases, latency drops and core CPU cycles get spared. There are more cycles for applications.
  • Operations Models: A lot of the traditional hypervisor functionality can be placed on the SmartNIC, things like storage virtualization and network virtualization security. It allows VMware to support bare-metal workloads for some use cases. “You start to essentially have a bit of a distributed system within a single physical host,” Colbert said. “It actually gives us a whole bunch of tools that we can use to simplify operations.”
  • Security: With a SmartNIC, “we now have sort of split the hypervisor,” Colbert said. “Running it across two different CPUs you can get a bit more defense in depth. You get this ability to put all these security functions directly on the SmartNIC, again, without impacting performance.” The network traffic doesn’t get slowed, and it’s not taking up core CPU cycles.

Enter gRPC

Tetrate offers a service mesh built on Envoy and Istio. Tetrate CEO and founder Varun Talwar is the co-creator of Istio. He was also the Google product manager for gRPC, a low-latency distributed services communication protocol. By Talwar’s estimate, it’s still a bit early in the development cycle, but he expects within two years there will be a deeper need and interest in SmartNICs.

There are three upsides to SmartNICs, Talwar said. In order, they are performance, security, and data loads that can be offloaded.

He cites gRPC as an example of a technology needing the excellent performance SmartNICs could provide. gRPC is used inside Google to run its services. A lot of time is spent marshaling services over the wire. SmartNICs could help with latency and performance issues, and that would lead to more adoption for technologies like service mesh, Talwar said.

Data loads will only increase and security can be managed in the hardware more efficiently, Talwar said. For example, cloud service providers are looking to decrease operations costs as the scale-out requirements for services continue to increase. Moving security into the hardware provides better resource utilization and allows the hardware to be the root of trust.

“Would the user be in a better state if this were done in hardware and the software knew how to cooperate with hardware? Yeah, for sure,” Talwar said.

There’s some work to do to get all the parties to the table: the software providers, the cloud services and the chip makers, all of whom have a stake not just in running the code in the hardware but in knowing what to return to the software.

But in the end, it’s the developer experience that matters. A SmartNIC may have lots of potential but the challenge is in providing the developer a great experience so they don’t even have to think about what’s underneath when they are using service mesh technologies, library-like approaches such as gRPC or proxies like Envoy with existing brownfield environments.

It will be up to hardware vendors such as Intel, software vendors and cloud service providers to build an open model for cooperation. The limits being reached in resource utilization are an emerging pain point that may hasten cooperation. Resources are getting maximized. The service mesh increases the demand on resources. In essence, the service mesh is adding another tax.

Savvy customers who use service mesh are now asking questions about how to reduce the costs for scale-out architectures.
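A back-of-the-envelope calculation shows why the question matters. The sketch below is purely illustrative: the core counts and the share of each machine's cores spent on mesh work (`mesh_tax`) are invented numbers, not measurements from any vendor or from the people quoted here.

```python
import math


def machines_needed(app_cores_required, cores_per_machine, mesh_tax):
    """Machines needed when a fraction `mesh_tax` of every machine's cores
    goes to proxying and crypto instead of application work."""
    if not 0.0 <= mesh_tax < 1.0:
        raise ValueError("mesh_tax must be in the range [0, 1)")
    usable_cores = cores_per_machine * (1.0 - mesh_tax)
    return math.ceil(app_cores_required / usable_cores)


# Hypothetical cluster: 1,000 cores of application demand on 32-core machines.
without_offload = machines_needed(1000, 32, mesh_tax=0.20)  # mesh work on host CPUs
with_offload = machines_needed(1000, 32, mesh_tax=0.05)     # most mesh work offloaded

print(without_offload, with_offload, without_offload - with_offload)
```

Under these assumed figures, cutting the overhead from 20% to 5% of host cores reduces the cluster from 40 machines to 33. The real saving depends entirely on the actual overhead and on how much of it a given SmartNIC can absorb.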

SmartNICs can be a critical technology to allow for new levels of performance, crypto operations and ways to approach resource utilization. Today, the answer is adding more machines to a cluster, but that approach is maxing out.

“Envoy is on track to be mostly everywhere,” said Matt Klein, a software engineer at Lyft and the creator of Envoy. “Optimally, people will not know they are using Envoy, nor Kubernetes, under the hood. The industry is already working on higher-level platform abstractions. SmartNICs and kernel offloads are a logical continuation of that effort down the line.”
