
Universal Code Search: A New Search Tech for the Era of Big Code

9 Mar 2020 12:24pm, by Quinn Slack
Quinn Slack is CEO and co-founder of Sourcegraph. Prior to Sourcegraph, Quinn co-founded Blend Labs, an enterprise technology company with over 500 employees dedicated to improving home lending. At Palantir Technologies, he created a technology platform to help two of the top five U.S. banks recover from the housing crisis. He was the first employee and developer at Bleacher Report after graduating from high school. Quinn graduated with a BS in Computer Science from Stanford.

The way the world creates code has shifted. We’ve entered the era of Big Code.

Big Code is all about how code is growing in:

  • Volume: Exponential increases in the amount of code.
  • Variety: Way more complexity in the languages, tools, and processes for delivering software.
  • Velocity: Accelerated delivery cycles that mean code is changing faster and being shipped virtually every day.
  • Value: The reimagination of business models and practices through high-quality software.

This digital transformation has been great for everyone, leading to countless innovations that improve our lives. But it’s been hard on developers.

The problem is that in the face of this increased complexity, developers still need to efficiently write and make changes to their enterprise’s code to meet tight deadlines and stringent quality and security requirements.

Sub-optimal developer productivity in the era of Big Code is a losing proposition for any company. Development lags mean late releases, poor quality, frustrated teams, unhappy customers, and uncompetitive products.

Google and Facebook were among the first to solve this problem, investing hundreds of millions in their customized, proprietary code search infrastructure for internal use.

But what about everybody else?

The answer is Universal Code Search. Digital pioneers such as Uber, Lyft, Yelp, and Qualtrics have adopted this technology to enhance developer productivity in the era of Big Code.

Universal Code Search enables developers to traverse the complex universe of interdependent codebases — a plethora of different programming languages, code hosts, repositories, version control systems, services, and APIs — to find the code and other information they need to do their jobs in today’s collaborative, multi-dimensional development environment.

Traditional developer tools such as editors and IDEs were built for individual developers working in a single repository, not for teams working with large codebases at scale, and thus are severely limited for search. GitHub, in its effort to become a broader software development platform, is improving its code search capabilities, but a single code host inherently can’t be a universal, cross-repository solution.

Universal Code Search is different. It is a single, highly scalable way to explore, navigate, and analyze all of an organization’s massive stores of code, regardless of system, repository, or language, and it is uniquely suited to address Big Code’s four V’s:

  • Volume: Code exploration, navigation, intelligence, and change management across the whole codebase.
  • Variety: The same functionality is available for every programming language, code host, etc.
  • Velocity: Search across all branches, diffs, and commits to explore code history, or get alerts about incoming changes.
  • Value: Improved developer productivity that drives more value for the entire enterprise.

Instead of constantly searching in disparate codebases, developers using Universal Code Search can discover and re-use existing code across repositories rather than re-inventing wheels, understand and debug code, figure out the right library or service for a certain task, and share code links to help teammates with best practices.

API owners can see and monitor who’s using their code and how, upgrade API consumers’ call sites across all repositories, and deprecate old APIs.

For DevOps and security teams, Universal Code Search can pinpoint the source of an error or security hole, identify the code changes responsible for an incident, evaluate the performance of specific lines of code in production, and apply patches and upgrades across all repositories.

In the era of Big Code, code search needs to be not only how developers discover and explore the entire enterprise codebase but also how they quickly understand code in context. Universal Code Search enables better, more sophisticated code reviews and is the fundamental technology behind code change campaigns.

Universal Code Search’s core capabilities include:

  • Code search: Ability for developers to quickly find, understand, and change all of the code that they are responsible for, including the activities of code discovery, code intelligence, and code change management
  • Code navigation: Guided travel to find specific code via ad-hoc query
  • Code exploration: Find both known and unknown code, with code intelligence providing the contextual understanding
  • Code discovery: Navigate, explore, and understand the code you are looking for, even if you didn’t know it existed
  • Code intelligence: Display and share additional contextual information around code
  • Code change management: Perform large-scale code changes with campaigns

To be effective, search must be universal across several dimensions:

  • all repositories
  • all programming languages
  • all code changes (commits, branches, and forks)
  • all file formats
  • all other developer tools that generate metadata about code (such as for logging, tracing, and profiling)

Universal Code Search meets all the requirements, and all users have to do is search via a browser, shell, or right inside popular tools like GitHub, GitLab, Bitbucket Server, Phabricator, Perforce, and Subversion. The results are instant.

Among the companies benefiting from Universal Code Search: Lyft. In 2018, the ride-sharing giant embarked on the largest code refactoring project in the company’s history: a decomposition of its PHP monolith to microservices so that Lyft’s infrastructure could be nimbler to serve its 30 million riders.

Such a mammoth refactoring of large codebases is delicate work. For instance, the API of a shared library may need to be updated to support a new feature, but doing so may necessitate changes to dozens or even hundreds of downstream dependents. The number of places in code that must change as a result of updating one shared dependency can easily swell to thousands of points scattered across different components owned by different teams.
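To make the scale of that problem concrete, here is a minimal sketch of how one might list the call sites of a single function across several locally checked-out repositories. It is a naive text scan for illustration only, not how Universal Code Search itself is implemented; the function name `find_call_sites`, its parameters, and the example symbol `charge_card` are all hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path


def find_call_sites(repo_roots, symbol, suffixes=(".py",)):
    """Return {repo name: [(relative path, line number, line text), ...]}.

    Hypothetical helper: a plain regex scan over files on disk. It also
    matches the function's own definition line, since that contains
    "symbol(" as well.
    """
    # Match the symbol as a whole word followed by an opening parenthesis.
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\s*\(")
    hits = defaultdict(list)
    for root in map(Path, repo_roots):
        for path in sorted(root.rglob("*")):
            # Skip directories and files with other extensions.
            if not path.is_file() or path.suffix not in suffixes:
                continue
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    relative = str(path.relative_to(root))
                    hits[root.name].append((relative, lineno, line.strip()))
    return dict(hits)
```

Calling `find_call_sites(["/src/billing", "/src/rides"], "charge_card")` (paths assumed) groups every matching line by repository, which is the raw material for deciding which teams a shared API change will touch. A scan like this has to reread every file on each query, which is one reason it does not hold up across thousands of repositories.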

Thanks to Universal Code Search, Lyft was able to tie together a wealth of key information, from repositories on code hosts to dependency relationships among projects and application runtime information, and ensure production stability during deployment of the new microservices.

Online personal finance company SoFi (Social Finance Inc.) decided to switch its code host from Bitbucket to GitLab but then quickly realized it would need a more powerful tool to search through its hundreds of repositories. The fintech company’s fast growth was making it difficult to maintain a complete list of the published APIs showing the interdependencies of its services.

With Universal Code Search, SoFi can determine which microservice is referenced by another, safeguarding against breaking production with code changes and avoiding code duplications. The technology allows SoFi to fully understand the scope and breadth of how code changes impact other microservices.
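The question "which microservices reference this one?" amounts to inverting a map of outgoing references. The sketch below shows that idea on a small in-memory example; it assumes the outgoing references have already been collected, and the function name `build_reverse_index` and the service names are hypothetical, not taken from SoFi's systems.

```python
from collections import defaultdict


def build_reverse_index(references):
    """Invert {service: [services it calls]} into {service: sorted callers}."""
    callers = defaultdict(set)
    for service, targets in references.items():
        for target in targets:
            callers[target].add(service)
    # Sort caller names so the output is stable and easy to review.
    return {target: sorted(names) for target, names in callers.items()}


# Hypothetical services and the services each one calls.
references = {
    "loans": ["identity", "ledger"],
    "payments": ["ledger"],
    "identity": [],
}
impacted = build_reverse_index(references)
# impacted["ledger"] lists every service that would be affected by a
# breaking change to the ledger service.
```

With an index like this, a change to one service can be checked against its known callers before it ships.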

As Universal Code Search technology continues to develop, it will move beyond search, navigation, and analysis toward proactively recommending code fixes and other actions — similar to how Google, when you search for flight information, now lets you book a trip right from the search results page.

By solving some of the most pressing challenges in today’s fast-moving, intricately connected software organizations, Universal Code Search is the right technology for the era of Big Code.

