
Sourcegraph Aims to be the ‘Google for Code’

15 Jun 2016 9:00am, by

After graduating from Stanford, Quinn Slack and Beyang Liu took jobs at large companies and learned about the many frustrations that developers face. They decided there had to be a better way and set out to build a search engine for open source code.

“Sourcegraph is like the Google for code; it’s a public utility for all the developers out there,” Slack said of the company. The San Francisco-based company was founded in 2013 and has grown to a staff of 18.

The tool lets users search their own code base as well as open source code.

“In these big companies, 1,000 other people have already done the same thing in code. There’s just so much repetition and duplication of code,” he said. “People think that developers build all these complex systems, but the tools they use are actually much more primitive than the tools the finance team is using. You have social media analytics for content that’s more advanced than what programmers use.”

Sourcegraph shows how code is used within a company and around the world.

“Someone in Uzbekistan might write a piece of code that GE uses in its jet engines. Our software understands those relationships,” Slack said. “We can help developers search code much better and do other really interesting things, like when you change a piece of code inside a company, show you the ripple effect. What other systems is this going to take down? Who needs to approve it? Does this mean that suddenly people from a blacklist are able to open checking accounts?”

“It saves the company’s most precious resource, which is time, by not writing the same code over and over again. Not doing those manual tasks, having a human looking over all this code when a computer can do a better job of that.”

Simple as Spellcheck?

The Sourcegraph service is, at its heart, a massive graph database that sits atop developers’ existing workflow if they’re using something like GitHub or Bitbucket.
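One way to picture the “ripple effect” query Slack describes is as a walk over a dependency graph: invert the “A depends on B” edges, then collect everything that transitively depends on the changed code. The sketch below is purely illustrative. The module names and the `DEPENDS_ON` structure are hypothetical, and nothing here reflects Sourcegraph’s actual schema or query engine.

```python
from collections import deque

# Hypothetical dependency edges: each key depends on the modules listed.
DEPENDS_ON = {
    "billing-service": ["account-lib"],
    "account-lib": ["validation-utils"],
    "web-frontend": ["billing-service"],
    "report-job": ["validation-utils"],
}


def reverse_edges(depends_on):
    """Invert 'A depends on B' into 'B is used by A'."""
    used_by = {}
    for node, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(node)
    return used_by


def ripple_effect(changed, depends_on):
    """Return every node that transitively depends on `changed` (breadth-first)."""
    used_by = reverse_edges(depends_on)
    affected = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


print(sorted(ripple_effect("validation-utils", DEPENDS_ON)))
```

In this toy graph, a change to `validation-utils` reaches all four other modules, while a change to `web-frontend` affects nothing downstream. A real system would add ownership data on top of such a traversal to answer “who needs to approve it?”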

Sourcegraph is built on other open source projects, including the JavaScript code analysis engine Tern, the Ruby documentation tool YARD, and the Python autocompletion library Jedi.

The company itself created srclib, a hackable, multi-language code analysis library that is currently in beta. It supports features such as jump to definition, find usages, and documentation lookup.

You can search for code by repository, package, or function from the web browser, without having to configure an editor plugin. It seeks to make searching for answers as easy as using spellcheck: Just hover over a piece of code to find a wealth of information about it.

The srclib library handles tasks such as package detection, dependency resolution, and working with different version control systems.

It consists of language analysis toolchains for Go, Python, JavaScript, and Ruby with a common output format and developer tools using this format. It’s designed to be modular and extensible. New languages can be added with toolchains that output the srclib format.
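To illustrate why a shared output format matters, here is a minimal sketch assuming each language toolchain emits two kinds of records: definitions and references that point back at a definition key. The `Def`/`Ref` record shapes and the sample data are hypothetical and are not srclib’s real format. The point is that once every toolchain speaks the same format, features like jump to definition and find usages can be written once, independent of the source language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Def:
    key: str   # hypothetical unique id for a definition, e.g. "mathutil.Add"
    file: str
    line: int


@dataclass(frozen=True)
class Ref:
    def_key: str  # key of the definition this reference points to
    file: str
    line: int


# Hypothetical records, as if emitted by one language toolchain.
DEFS = [
    Def("mathutil.Add", "mathutil/add.go", 3),
    Def("main.main", "main.go", 5),
]
REFS = [
    Ref("mathutil.Add", "main.go", 6),
    Ref("mathutil.Add", "server/handler.go", 14),
]


def jump_to_definition(ref, defs):
    """Resolve a reference to the location of its definition, or None if unknown."""
    by_key = {d.key: d for d in defs}
    return by_key.get(ref.def_key)


def find_usages(def_key, refs):
    """List every reference that points at the given definition."""
    return [r for r in refs if r.def_key == def_key]


print(jump_to_definition(REFS[0], DEFS))
print(find_usages("mathutil.Add", REFS))
```

Neither function knows which language produced the records, which is the property that lets new languages plug in simply by emitting the common format.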

It’s available for Go, Java, Python, Scala and JavaScript with ten more languages in progress, including C#, PHP, Objective-C, C/C++ and Perl. Some of those are coming from outside developers who just want the Sourcegraph capabilities in their preferred language, Slack said.

Sourcegraph also lets you embed clickable, interactive code snippets directly in blog posts and forums. The company has been encouraging developers to enable their repositories so that others can search and browse the code on Sourcegraph.

“If you’re an open source author, you want nothing more than to make it easier for your users to use that code so your library can be used by more people,” Slack said.

Sourcegraph’s Chrome extension for GitHub, which supports only Go and Java, makes every identifier a jump-to-definition link, with documentation and type information displayed by hovering over identifiers. The keyboard shortcut Shift-T lets you search for functions, types, and other definitions.

Licensing criticism

The company has taken heat, however, for licensing Sourcegraph under Fair Source. The license lets everyone see the source code, and individuals and small groups can publish their own projects under it. Companies that license their products under it set a number of free users, beyond which customers must start paying. Sourcegraph set that number at 15 users, after which it becomes a paid service. The company concedes that all of a company’s users could sign up as individuals and get the service for free, but it’s counting on the developer community to “do the right thing.”

“A lot of the tools that developers use are totally closed-source, so we looked at what we could do that was more developer-friendly,” Slack said. “If we went totally open source, then we’d have to find some other business model … So the question is: How do we make the code totally public, but still have a sustainable business?”

They sought out attorney Heather Meeker, a specialist in open source licensing, who wrote a simple, easy-to-understand 22-sentence license. “We decided we wanted to be super transparent,” Slack said.

He pointed to a number of projects in which developers are trying to make a living from code, including developer workspace vendor Codenvy, which uses a derivative of the Sourcegraph license, but limits the number of free users to five. While GitLab considered Fair Source, it ultimately decided that introducing a new licensing model would be too confusing to users.

RedMonk analyst Stephen O’Grady recently wrote that it’s harder than ever to make money from software, period, whether open or closed, and that smart companies are looking for alternative revenue models.

Meanwhile, Matt Asay, in a post at TechRepublic, calls Fair Source “the worst of all worlds, because it introduces unnecessary complexity into the licensing process, none of which is intended to benefit developers.”

Feature Image: “Beach binoculars. Nice, France” by Kinolamp, licensed under CC BY-SA 2.0.
