
How Bolt Scaled Remote Work with Tailscale’s Zero Trust Mesh VPN

1 Jul 2022 4:00am

While he was still in high school, Bolt’s founder Ryan Breslow started building e-commerce sites for businesses of all sizes. It didn’t take long for him to realize that the world needed a better checkout platform: to start with, the user experience (UX) needed to be faster and simpler.

Fast forward past his days in Stanford University’s computer science program to the company’s founding in 2014, its launch of the payment step in 2016, and the official launch of its payments and checkout platform in 2018.

Today, Bolt has grown from its initial team of 10 — including technical members from Google, Facebook, Twitter and Airbnb — to more than 600 team members and hundreds of retailers around the globe.

Bolt’s checkout platform bundles all the tools retailers need for payment processing, payment gateways and fraud detection, as well as checkout processes like calculating coupons, tax and shipping costs. This, plus several other improvements, speeds up and simplifies UX. Once shoppers join the Bolt network, they can check out with only one click from participating retailers. Retailers and content creators can expand their storefronts to any digital surface, such as social media feeds, chat, and digital publications.

Traditional VPNs Don’t Scale Easily

In 2021, Bolt’s growth accelerated. But its virtual private network (VPN) at the time, a typical setup with a single Amazon Web Services EC2 instance running OpenVPN Access Server, could not easily scale to keep pace with that growth.

Since company employees are entirely remote, scalability was definitely a requirement.

“We wanted exponential improvement, not just marginal improvement,” said Roopak Venkatakrishnan, Bolt’s head of platforms and infrastructure. “So the question was how we could set up our network to do more, faster, and with a better user experience.”

The team also looked at alternatives, including AWS Client VPN, Nebula Client VPN, and extending the existing OpenVPN Access Server. “Initially, we just wanted to fix our VPN, but then we realized that a VPN was only the starting point, and we wanted to do more than that,” he said.

The current solution wouldn’t scale without making significant changes to Bolt’s network design, such as adding peering connections, or Transit Gateways between virtual private clouds (VPCs) within AWS, or even just relying on public networks for all of its cross-region or cross-cloud connectivity, said Jake Edgington, Bolt’s site reliability engineer.

“The Tailscale mesh network just took care of all of that complexity,” he said. “The bang for the buck was just nowhere near what we could get by moving to Tailscale.”

Mesh Topology Gives Higher Throughput, More Scalability

Venkatakrishnan originally suggested trying Tailscale’s VPN because he’d already used it at home for his own personal projects, such as connecting his servers and Raspberry Pi on his personal Tailscale network.

Tailscale was founded to give developers a secure, scalable alternative to traditional VPNs: small, trusted, human-scale networks to work in, where everyone on that network can access everyone else’s devices and applications, but access is denied to anyone outside the network.

The idea is simply to scale the systems, instead of coping with the overhead required for scaling those systems, or with hassles in the development environment. Tailscale’s service can also be used by enterprises and businesses to reduce the complexity of internal networks or to grant remote access to employees working from home.

Tailscale’s mesh topology creates a peer-to-peer network, which connects each device to every other device directly. This contrasts with the traditional, hub-and-spoke VPN network architectures that send network traffic through a central gateway. A peer-to-peer mesh network produces lower latency and higher throughput and maintains existing connections even when switching to a different network, such as from wired to WiFi.
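The difference between the two topologies can be illustrated with a small Python sketch. It is a simplified model, not Tailscale’s implementation: the device names and latency figures are hypothetical, and it only compares a path relayed through a central gateway with a direct peer-to-peer path.

```python
from itertools import combinations


def hub_and_spoke_latency(latency_to_hub, a, b):
    """Traffic between a and b is relayed through the central gateway,
    so the path cost is the sum of both legs."""
    return latency_to_hub[a] + latency_to_hub[b]


def mesh_latency(direct_latency, a, b):
    """Traffic travels straight from a to b over a direct link."""
    return direct_latency[frozenset((a, b))]


def full_mesh_link_count(n):
    """Number of direct links needed to connect n devices pairwise."""
    return n * (n - 1) // 2


# Hypothetical round-trip latencies in milliseconds (illustrative only).
latency_to_hub = {"laptop": 40, "ci-runner": 35, "db": 5}
direct_latency = {
    frozenset(("laptop", "ci-runner")): 20,
    frozenset(("laptop", "db")): 42,
    frozenset(("ci-runner", "db")): 36,
}

if __name__ == "__main__":
    for a, b in combinations(sorted(latency_to_hub), 2):
        via_hub = hub_and_spoke_latency(latency_to_hub, a, b)
        direct = mesh_latency(direct_latency, a, b)
        print(f"{a} <-> {b}: via hub {via_hub} ms, direct {direct} ms")
    print("links in a full mesh of 3 devices:", full_mesh_link_count(3))
```

The number of pairwise links grows quadratically with the number of devices, which is why a mesh of this kind is practical only when something other than the administrator sets up those links.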

“I’d describe the end goal of Tailscale as a complete inversion of how you normally design a network,” Edgington said. “It’s agnostic about where devices are located — things like different clouds or user locations — it just stops mattering because of the mesh network.”

For Bolt, Tailscale’s subnet routers made it easier to transition from a traditional VPN to a mesh network. “We were able to break up our network into smaller subnetworks and deploy subnet routers for each of them,” he said.

Better Security, Fewer Hassles

Traditional VPNs normally have either a single entry point or a small number of entry points into the network. Those entry points require very broad network access to all of the parts intended to be accessible to any of the network’s users.

As more VPCs, regions, or even multiple clouds are added to the global network, a lot of heavy lifting is needed to connect those private networks, Edgington said. “With Tailscale, I didn’t have to worry about any of that. I can just spin up a new Tailscale node on any network, in any VPC, any region, or in any cloud, and it’s connected to our private Tailscale network.”

The zero trust aspect of Tailscale’s network was also very important to Bolt’s decision-making process.

“Traditional VPN solutions typically require you to provide very broad network access to the VPN servers themselves, and then restrict user or group access at the point of VPN connection,” Edgington said.

But Tailscale’s mesh topology means each node on the network, that is, each device, can communicate directly with another only if the network access control policy permits it. Unlike a traditional VPN, there is no single trusted point at the network edge, which is better from a security standpoint. “If a device is compromised, it’s much harder to move laterally through the network, because you don’t have the same level of wide-open network access,” he said.
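As a rough illustration of a default-deny access policy, the Python sketch below allows a connection only when an explicit rule matches. It does not use Tailscale’s policy file syntax or API; the rule structure, group names, tags and ports are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src_group: str        # hypothetical group of users allowed to connect
    dst_tag: str          # hypothetical label on the destination device
    ports: frozenset      # destination ports the rule covers


def is_allowed(rules, user_groups, user, dst_tag, port):
    """Default deny: a connection is permitted only if some rule matches
    the user's group, the destination tag and the port."""
    groups = user_groups.get(user, set())
    return any(
        rule.src_group in groups
        and rule.dst_tag == dst_tag
        and port in rule.ports
        for rule in rules
    )


# Hypothetical policy and group membership (illustrative only).
rules = [
    Rule("sre", "prod-db", frozenset({5432})),
    Rule("engineering", "staging-web", frozenset({80, 443})),
]
user_groups = {
    "alice": {"sre", "engineering"},
    "bob": {"engineering"},
}

if __name__ == "__main__":
    print(is_allowed(rules, user_groups, "alice", "prod-db", 5432))
    print(is_allowed(rules, user_groups, "bob", "prod-db", 5432))
```

Because nothing is reachable unless a rule names it, a compromised device in this model can reach only the destinations its owner’s groups were explicitly granted.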

Tailscale also eases both development and deployment. “It’s easy to install Tailscale just about anywhere — EC2 instances, GitHub Actions, CircleCI runners, or containers in a Kubernetes cluster,” Venkatakrishnan said. “We were able to use the Tailscale GitHub Action to connect some of our existing GitHub Action workflows within minutes.”

Secure Connections to External SaaS Tools

Another thing that drew the Bolt team to using Tailscale was how much more seamlessly it works with software-as-a-service tools, “basically anything running on someone else’s network or hardware, which was not easily achievable with our existing VPN,” Edgington said.

Since “Tailscale makes it very easy to connect anything running on pretty much any cloud or any network securely to our private networks,” Bolt is also connecting SaaS tools like GitHub Actions to its Tailscale network, he said.

At first, Bolt began with a smaller offering and later expanded it. “We started with a single subnet router,” Venkatakrishnan said. “After that, we added multiple subnet routers in different availability zones and regions, including subnet router failover for high availability. Then we started using access control lists to control user access to specific subnetworks.”
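The rollout Venkatakrishnan describes, with several subnet routers covering the same subnetwork and one taking over if another fails, can be modeled with a short Python sketch. This is a simplified assumption about how such a selection could work, not a description of Tailscale’s failover logic; the router names, CIDR ranges and priority field are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class SubnetRouter:
    name: str
    advertised: str      # CIDR range this router forwards traffic into
    priority: int        # lower number is preferred (assumed convention)
    healthy: bool = True


def pick_router(routers, destination):
    """Return the healthy router to use for a destination IP, or None.
    Prefers the most specific advertised range, then the lowest priority."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        r for r in routers
        if r.healthy and dest in ipaddress.ip_network(r.advertised)
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.advertised).prefixlen, r.priority),
    )


# Hypothetical routers in two availability zones and a second region.
routers = [
    SubnetRouter("router-us-east-1a", "10.0.0.0/16", priority=1),
    SubnetRouter("router-us-east-1b", "10.0.0.0/16", priority=2),
    SubnetRouter("router-eu-west-1a", "10.1.0.0/16", priority=1),
]

if __name__ == "__main__":
    print(pick_router(routers, "10.0.1.15").name)
    routers[0].healthy = False   # simulate losing the primary router
    print(pick_router(routers, "10.0.1.15").name)
```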

As it would with a traditional VPN, Bolt uses Tailscale to give all of its employees secure access to internal network resources. One employee, after installing Tailscale, messaged that it had changed her life.

“Now that I’ve started using Tailscale, I can never go back,” she wrote. “The login is first action in the dropdown, and I’m done in 1 click! No SSO required. It loads so fast. I unreasonably love this, I can see all you wonderful [people] in the network.”

The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Tailscale.