Turborepo: Speedy Builds for JavaScript Monorepos

Mar 23rd, 2022 6:00am by
While many software shops appear to be migrating towards monorepos as a way to store code, the build management tools they are using — especially for JavaScript/TypeScript — appear to be too sluggish for these new environments.

But there is a downside to monorepos. The convenience of storing all code in a single repo is offset by the extra build time it requires to churn through this code whenever a new change is added.

This is the problem that developer Jared Palmer encountered when building his own app (a TypeScript runtime, TSDX). He was building this project in a monorepo, where all the code, including dependencies, was located in a single repository, and he wanted to structure TSDX so that it could be managed in a monorepo as well. When he vented his frustrations online, he found many others felt the same.

So he created Turborepo, an open source monorepo build tool that, according to Palmer, could boost build speeds by roughly 65% to 85%. In a few outlying instances, it has reduced a 30-minute build to 100 milliseconds, Palmer asserted.

“Turborepo is really good at what it does: Ridiculously fast builds,” enthused one engineer on Twitter.

In addition, Palmer geared the software to be highly intuitive for single developers and small teams. Turborepo has indeed garnered praise from reviewers in this regard when compared to Nx, a similar project created by former Google engineers.

Vercel was impressed enough with the software to acquire the technology, filling out its portfolio of web development technologies, which also includes the next-generation Svelte front-end framework and the Next.js library for augmenting the React framework with server-side rendering capabilities.

The Big Code Problem

The problem of managing large amounts of code in a uniform manner has been around for a while and has been exacerbated by the explosion of web development, which relies on a diversity of open source packages and a certain swiftness of delivery.

Why does one need a build system for JavaScript? Although JavaScript can be run directly in the browser, this is rarely done any longer, Palmer explained. Libraries such as React require multiple tools such as JSX that need to be compiled. But if more than one software team is using JSX, the organization as a whole quickly finds itself with multiple, and sometimes conflicting, instances of JSX, which is a logistical and security nightmare.

The answer that the IT giants have come to is to store everything in one giant repository (the “monorepo”). In addition to better managing the code itself, a monorepo sets the stage for uniform coding style and testing across the organization.

Google, Facebook and Uber have all gone this path, as have the keepers of React itself.

A 2015 talk from then-Google Engineer Rachel Potvin on why Google uses one gigantic monorepo to store all its code. (YouTube).

The general build tools haven’t kept up with this evolving environment, however. While web giants Google and Facebook have both developed internal toolsets to tackle the latency issue (open sourced as Bazel and Buck, respectively), these tools require extensive configuration and were designed for large, engineering-heavy organizations.

Palmer was more interested in building a tool that would be more easily used by smaller teams. Enter Turborepo.

Caching and Parallelization

The faster build times come from a couple of different techniques.

One is smart caching. For this, the software borrows a technique from Google’s Bazel, built around content-addressable storage.

Turbo looks at “the state of your codebase,” Palmer explained. It also logs the commands that are being run to build the software, making a fingerprint that serves to index the finished work. When the dev types the same sequence of commands, Turbo then can quickly deliver the cached version rather than repeat the work.
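To make the fingerprinting idea concrete, here is a minimal TypeScript sketch of a content-addressed cache. It is not Turborepo's implementation: the names `TaskInputs`, `fingerprint` and `runCached` are hypothetical, the choice of SHA-256 is an assumption, and an in-memory object stands in for the real cache store.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape: file paths mapped to contents, plus the command being run.
type TaskInputs = { files: Record<string, string>; command: string };

// Derive a fingerprint from the command and the file contents, never from timestamps.
function fingerprint(inputs: TaskInputs): string {
  const hash = createHash("sha256");
  hash.update(inputs.command);
  // Sort the paths so identical inputs always produce the same hash.
  for (const path of Object.keys(inputs.files).sort()) {
    hash.update("\0" + path + "\0" + inputs.files[path]);
  }
  return hash.digest("hex");
}

// In-memory stand-in for a content-addressable store (assumption for illustration).
const cache: Record<string, string> = {};

// Run the build only when no finished work is indexed under this fingerprint.
function runCached(
  inputs: TaskInputs,
  build: () => string
): { output: string; hit: boolean } {
  const key = fingerprint(inputs);
  const cached = cache[key];
  if (cached !== undefined) {
    return { output: cached, hit: true };
  }
  const output = build();
  cache[key] = output;
  return { output, hit: false };
}
```

Calling `runCached` twice with the same files and command runs the build function once; changing a single character in any file produces a new fingerprint and a fresh build.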

“Turbo constructs a dependency graph, both of the external dependencies from package registries, and also the internal dependencies within your codebase,” Palmer explained. The developer provides the dependency information in a turbo.json configuration file at the root of the project.

In collaborative environments, every developer’s cache is shared, so one dev can reuse the work of peers.

Compare this to the venerable make command, which only looks at the modification times of the files or folders specified, rather than the fingerprint of the actual artifact. Different computers will produce different timestamps for the exact same file, which will cause build systems to miss otherwise identical files.
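The difference between the two staleness checks can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not code from make or Turborepo: `FileSnapshot`, `staleByMtime`, `contentHash` and `staleByHash` are hypothetical names, and SHA-256 is an assumed hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical snapshot of one source file as seen on one machine.
type FileSnapshot = { content: string; mtimeMs: number };

// make-style check: rebuild when the source is newer than the artifact.
function staleByMtime(source: FileSnapshot, artifactBuiltAtMs: number): boolean {
  return source.mtimeMs > artifactBuiltAtMs;
}

// Hash only the content, so the result is the same on every machine.
function contentHash(file: FileSnapshot): string {
  return createHash("sha256").update(file.content).digest("hex");
}

// Fingerprint-style check: rebuild only when the content itself differs.
function staleByHash(source: FileSnapshot, builtFromHash: string): boolean {
  return contentHash(source) !== builtFromHash;
}
```

A fresh checkout on a second machine gives the same file a newer modification time, so `staleByMtime` reports it as stale even though `staleByHash` correctly reports it as unchanged.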

In addition to using cached work, Turborepo also looks for places to split the build into parallel operations.

The developer’s pipeline, or task graph, provides “a very concise way for developers to express the relationships between the scripts they need to run to build their codebase,” Palmer explained.

A comparison of the work pipelines between Turborepo and Lerna, one that shows more parallel execution by Turborepo. (Turborepo docs)

Turborepo uses this info to determine which operations can run in parallel, cutting build time by running multiple tasks at the same time whenever possible. This is something that can’t easily be done through traditional JavaScript build tools. “They run things only in dependency-first order,” Palmer said. They don’t have the additional info needed to understand how these tasks relate to each other.
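One way to picture how a task graph exposes parallelism is the TypeScript sketch below. It is a simplified illustration, not Turborepo's scheduler: the `Pipeline` type and `planWaves` function are hypothetical, the `dependsOn` field only loosely mirrors the idea of declaring task relationships, and the graph covers tasks in a single package rather than across a whole monorepo.

```typescript
// Hypothetical task graph: each task lists the tasks that must finish first.
type Pipeline = Record<string, { dependsOn: string[] }>;

// Group tasks into "waves". Every task in a wave has all of its dependencies
// satisfied by earlier waves, so the tasks within one wave can run concurrently.
function planWaves(pipeline: Pipeline): string[][] {
  const done: Record<string, boolean> = {};
  let remaining = Object.keys(pipeline).sort();
  const waves: string[][] = [];
  while (remaining.length > 0) {
    // A task is ready once every task it depends on has been scheduled.
    const ready = remaining.filter((task) =>
      pipeline[task].dependsOn.every((dep) => done[dep] === true)
    );
    if (ready.length === 0) {
      throw new Error("cycle or unknown dependency in pipeline");
    }
    for (const task of ready) {
      done[task] = true;
    }
    remaining = remaining.filter((task) => done[task] !== true);
    waves.push(ready);
  }
  return waves;
}

// Illustrative task names, chosen for this example.
const pipeline: Pipeline = {
  build: { dependsOn: [] },
  lint: { dependsOn: [] },
  test: { dependsOn: ["build"] },
  deploy: { dependsOn: ["build", "test", "lint"] },
};
```

For this graph, `planWaves(pipeline)` returns `[["build", "lint"], ["test"], ["deploy"]]`: `build` and `lint` have no relationship to each other, so they can run side by side, while a tool that only knows a linear order would run all four tasks one after another.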

Here is a sample json pipeline configuration file (from the docs):

The Turbo command-line interface is open source and operates from the repo. The end-user can host the remote cache index, or use Vercel’s managed service, which comes with additional features such as metrics-based visualizations.

Also unique to Turborepo is that it can be incrementally adopted. Other build systems can “make constraints on your codebase and how it works and how it needs to be shaped. And while those constraints may be great at certain scales, they can be very costly and expensive and risky to migrate to,” Palmer said. In contrast, Turborepo aims to “meet developers where they’re already at, with tools they are already using. And so it’s designed to be adopted, and in some ways deleted too.”

It is still early days for Turborepo, Palmer admits. (The latest version is 1.1.16). The setup is still complicated, and requires some polishing, according to at least one user.

“Turborepo is a really cool project. And it’s not just cool, it’s really necessary — there clearly was missing some tool like this as monorepos are more and more popular,” wrote frontend architect Štěpán Granát in a blog post, while adding that the software’s inconsistencies point to work still needing to be done for production usage. “I still better run our main release pipeline without any caching as I want to be sure that something is not getting cached when it shouldn’t be as that really could be a big problem.”
