
How We Built an Open Source Drop-In Replacement for gRPC

26 May 2021 11:00am, by JT Olio
JT is the CTO at Storj. He oversees product development and led the re-architecture of Storj’s distributed cloud storage platform. He was previously Director of Engineering at Space Monkey, which was acquired by Vivint in 2014. JT has an MS in Computer Science from the University of Utah and a BS in Computer Science and Mathematics from the University of Minnesota.

Our team at Storj is building decentralized cloud object storage, and when we decided to build it in Go, we thought we'd also use gRPC for peer-to-peer remote procedure calls in client/server interactions. gRPC is designed for environments like the one we are building at Storj. It connects services together with easy code generation, robust wire protocols, and low overhead. gRPC also has some serious clout: it is open source and widely used by companies in the cloud, including Cockroach Labs, Docker, Dropbox, IBM, Netflix, Square, Wikipedia, and many more companies you've likely heard of.

However, we discovered that gRPC needed improvements, at least for us. How could this be? After all, gRPC was built by Google and is used by all the "cool" companies. But it wasn't a fit for our next-generation, decentralized architecture. So after spending too long trying to make gRPC work for us, we looked for an alternative.

Issues with gRPC

When we started looking at our object storage product to identify performance improvement opportunities, we found that we were spending a lot of time fighting gRPC. gRPC is monolithic and not very modular; it has lots of feature bloat. For example, instead of providing a modular mechanism to establish an underlying socket transport, gRPC wants to own the entire dialing system and has spawned a dizzying amount of complexity in connection state management. When it came to debugging at 2 a.m., we wanted a much simpler state machine.

It's not entirely gRPC's fault though: gRPC is based on HTTP/2, which has its own share of problems. In Go, supporting HTTP/2 easily adds a couple of megabytes to your libraries, while also carrying production issues like head-of-line blocking. Overall, gRPC and its dependencies are huge; in total they made up about one-fifth of our compiled binaries. We also found the amount of resources it consumed on certain parts of our network to be off the charts: 81% of the heap usage on our storage nodes.

We needed a streamlined RPC tool that would do slightly fewer things, but do them very well. So we built our own framework — DRPC — that was a mere 3,000 lines of code, required very few dependencies, and could greatly improve many important functions in our system.

Introducing DRPC: A Drop-in Replacement for gRPC

DRPC was our solution to gRPC's weaknesses. It is a drop-in replacement for gRPC (the "D" doesn't stand for anything in particular), so if you're currently using gRPC, getting up and running with DRPC is as easy as swapping out your protocol buffer code generation pipeline. We even put together an example of how you can migrate a live service to DRPC.
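As a rough sketch of what that pipeline swap looks like (assuming the protoc-gen-go-drpc plugin from the DRPC repository and a proto file named service.proto), you replace the gRPC code generator with the DRPC one:

```shell
# Before: generate Go stubs plus gRPC service code
protoc --go_out=. --go-grpc_out=. service.proto

# After: install the DRPC generator and use it instead
go install storj.io/drpc/cmd/protoc-gen-go-drpc@latest
protoc --go_out=. --go-drpc_out=. service.proto
```

The generated client and server interfaces mirror the shapes you're used to from gRPC, which is what makes the swap mechanical.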

DRPC retains many of the key pieces of functionality of gRPC. It supports unitary and bidirectional streaming requests, it has an HTTP/JSON gateway, it supports metadata for per-request side-channel information (like tracing), it reduces memory usage, and it supports layering and middleware. Read more about the functions supported by DRPC in the README or our documentation (which also includes some impressive performance benchmarks).

Migrating to DRPC from gRPC

When we built DRPC, we already had gRPC rolled out in our production service. To migrate with minimal downtime, we needed to support both gRPC and DRPC. This may seem like a daunting task, but ultimately we were able to migrate with zero downtime.

We observed that gRPC connections all begin with the same few protocol bytes. We extended our transport layer to watch for and demultiplex on those initial bytes over an open socket. If the first few bytes were "DRPC!!!1", we knew the incoming request was not gRPC over TLS, HTTP, or any other existing protocol; it had to be DRPC. Via this switching behavior, we were able to extend our servers to handle both the gRPC protocol and the DRPC protocol at the same time, on the same sockets, and on the same ports.

Once we were convinced that all of our servers spoke DRPC, we began upgrading all of our clients to DRPC. As soon as we were sure every client had been upgraded, we could remove gRPC entirely.

For Go programmers, we’ve included these same helpers for you to use in the DRPC package, and provided examples here.

At this point, we have years of production DRPC usage across tens of thousands of servers under our belt, so it’s ready for you to use too.

DRPC Improvements Since Launch

A couple of weeks ago, we published an article announcing the public launch of DRPC for others to use. We have been thrilled with the reception!

Already it is our second-highest-starred repository on GitHub, was the top-voted submission in a number of developer social media watering holes, and has driven a ton of great discussion and interest. We had a number of exciting discussions on Reddit, Lobste.rs, and even in our GitHub repository's Issues tab.

People seem excited by DRPC. We've had volunteers offer to help with documentation and additional language bindings. In the days since the launch, we figured out how to add Twirp compatibility and WebSocket compatibility, improved some ergonomics around JavaScript and browser interactions, optimized code to reduce memory allocations and improve speed even further (sometimes up to 90% fewer allocations and seven times faster in some micro-benchmarks), and shipped a bunch of other things. One contributor just succeeded in getting her NodeJS bindings working against Go processes and vice versa.

It’s early days for DRPC. We’ve had a number of people tell us they are planning to adopt it in their products or switch to it, and so now we get to excitedly wait and see (and help if asked!) what the community builds on top!

Lead image via Pixabay.
