This Week in Programming: GitHub Copilot Tests the Copyleft
Copilot, GitHub’s machine learning-assisted code completion feature, continues to generate controversy in some quarters of the open source community.
Late last month, Microsoft’s GitHub moved the Copilot service from beta into a paid offering, starting at US$10 a month (but still free to students and developers of large open source projects).
Because of this move to charge for the service, two open source advocacy groups, the Free Software Foundation and Software Freedom Conservancy, both recommended that developers who care about open source software cut their ties with GitHub altogether. This is a big ask, given the almost universal use of GitHub.
But for SFC, GitHub charging for a service built on Free and Open Source Software (FOSS) was the breaking point.
“We’ve learned from the many gratis offerings in Big Tech [that] if you aren’t the customer, you’re the product. The FOSS development methodology is GitHub’s product, which they’ve proprietarized and repackaged with our active (if often unwitting) help,” wrote SFC’s Denver Gingerich and Bradley M. Kuhn in a fiery blog post. “FOSS developers have been for too long the proverbial frog in slowly boiling water.”
Now a video demonstration https://t.co/Q8CApri50V of autocomplete vs intellisense vs Intellicode vs #Copilot https://t.co/DQfvKy5UCb pic.twitter.com/lEp4OX4ucU
— Scott Hanselman 🇺🇦 (@shanselman) July 1, 2022
Since its inception last year, Copilot has drawn critical attention. GitHub bills the service as “pair programming with AI,” with the aim of cutting out the part of the coding process where developers look for pre-existing solutions on Stack Overflow or Google. To build the service, GitHub partnered with OpenAI, an AI lab backed by Microsoft, to train models on the repositories hosted on GitHub, building up the knowledge base from which Copilot draws its suggestions.
Many developers love the service, though others wonder if GitHub and parent company Microsoft are too aggressively appropriating the work of others.
Many open source projects on GitHub carry a copyleft license, which requires that anything built with the code also be made available as open source. In Copilot’s case, however, the code isn’t used directly; it serves as an input for generating entirely new code. Does copyleft apply to this use of code as well? That’s the question on the table.
Even beyond the legalities, there is an ethical question to consider as well. Many see it as an aggressive land grab of open source intellectual property. In a contributed post to The New Stack, Sasha Medvedovsky, the CEO of source control management service provider Diversion, wrote:
If a developer doesn’t want their code to be used in commercial applications, they should be given a right to refuse. If they are ok with it, then there’s no problem. But companies (be it Microsoft, Google or Amazon Web Services) shouldn’t just assume that if they give something for free they can take something else in return.
This is not the first time the Microsoft-owned company has overreached, Medvedovsky noted. He pointed to an incident earlier this year when an open source programmer, who goes by the handle of Marak, intentionally broke the code of his open source Faker mock data generator, allegedly to protest the lack of funding for his popular projects, which are used by hundreds of companies.
GitHub’s response? The code repository giant reverted the malicious changes — presumably to protect users from running broken code — and denied Marak access to his own projects.
In discussions with SFC, Microsoft and GitHub executives claimed that use of this open source code falls under fair use, since this code is public anyway.
But SFC feels this is wrong. GitHub is using open source code to build a proprietary service that can be accessed only by way of a paid subscription. Also of note is that Microsoft did not provide any code from its own proprietary software offerings, notably Office and Windows, so it is clear, in SFC’s view, that the project did not want to use Microsoft’s own intellectual property. So why is it fair to use open source code, the organization asks.
What do you think? Does Copilot violate the spirit of open source? Or is it a natural evolution of programming that we all will soon enjoy?
In the future there will only be two data structures, DAGs and data frames and you will only be able to be either a DAG person or a DataFrame person and the hostility will become so intense that someone will do a talk on it with stills from West Side Story.
— Vicki (@vboykis) June 29, 2022
This Week in Programming
- Lazy Loading May Finally Come to Python: Python may finally chill whenever an import statement runs, according to an interview in a recent Talk Python to Me podcast. A proposal, PEP 690, is being floated that describes a way for a Python interpreter to load external modules only when they are needed, rather than loading them all up front. Pretty cool that the Talk Python crew spoke to all the principals behind the PEP for the inside scoop: Instagram engineer Carl Meyer, Meta Senior Software Engineer Germán Méndez Bravo, and LinkedIn Senior Staff Engineer Barry Warsaw all participated in the interview. Though at first glance Python’s import may look similar to C’s #include directive, they explained, it has a lot more power: It can call modules that call out to other modules, including those that can call out to the internet. This cascading of multiple modules can slow a server to a crawl, Meyer said, pointing to Instagram’s own experience.
- Who Let the Bots Out? I guess he can talk about it now, but software engineer Anders Conbere posted a harrowing tale of how close Etsy came to being taken down by rampaging chat bots. It started innocently enough. He worked for Etsy from 2012 to 2014, and just before one of the company’s legendary Hacker Weeks, he put together a Scala-based bot that would scan the chat logs of his friend Avi, who had recently departed the company. The bot would ingest everything Avi wrote into a Markov chain, and then when someone referenced Avi in an internal IRC chat, it would respond with an Avi-like statement. So, for Hacker Week, Conbere built off this work to create a bot that would assume an employee’s name and issue statements accordingly. This turned out to be a minor annoyance, as the bots replicated across IRC, so they were locked away in a #purgatory channel to be forgotten. It was only then that he realized that the bots could mimic not only user accounts, but also the company’s ChatOps bot, which was set up so that an engineer could issue a command (to spin up a server, for instance) and the bot would actually execute it. “The terrifying realization was that [they could] execute all the other chatops commands, the ones that tear down servers, the ones that kill processes,” Conbere wrote. The bots were killed before they went full Chaos Monkey on the online marketplace, and perhaps not a second too soon.
- People on the Move: Long-time TNS readers probably know of Chip Childers, who served as CTO and later executive director of the Cloud Foundry Foundation. Last year he decamped to become chief architect at Puppet (perhaps following colleague Abby Kearns, his predecessor as Cloud Foundry executive director). This month, however, Childers takes the role of chief open source officer at VMware. Childers has written eloquently about the open source ecosystem for TNS and we hope he continues this run from his new office (hint). Perhaps a more controversial (and quieter) hire has been Microsoft picking up Lennart Poettering, the once-divisive figure behind the creation of systemd, the Linux system and service manager (Poettering was formerly at Red Hat). The controversy came about largely due to how systemd broke with the POSIX system model that Unix relies on, a move he said will be increasingly necessary to speed Linux development.
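For readers curious what deferring an import looks like in practice, here is a minimal sketch related to the lazy loading item above. It does not implement PEP 690, which proposes changes inside the interpreter itself. Instead it uses importlib.util.LazyLoader, a tool already in the standard library, to show the same idea: a module's code does not run until something actually touches the module. The helper name lazy_import and the choice of colorsys as the example module are ours, for illustration only.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return module `name`, postponing execution of its code until first use.

    Hypothetical helper for illustration; not part of PEP 690.
    """
    if name in sys.modules:
        # Already imported elsewhere, so there is nothing left to defer.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    # LazyLoader wraps the real loader and postpones running the module body.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # prepares the lazy module; its code has not run yet
    return module


colorsys = lazy_import("colorsys")        # cheap: nothing in colorsys has executed
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # first attribute access triggers the real load
```

The cost of loading a module moves from program startup to the first point of use, which is the trade-off the PEP's authors describe for large codebases with deep import chains.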
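The Markov chain trick behind the Etsy chat bots described above is simple enough to sketch in a few lines. Conbere's bot was written in Scala and its actual code is not shown in his post, so the Python below is only our assumption of the general technique: record which word follows which in someone's chat lines, then walk those links at random to produce a plausible-sounding sentence. The function names and the sample chat lines are made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(lines):
    """Map each word to the list of words seen immediately after it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=12, rng=None):
    """Walk the chain from `start`, picking each next word at random."""
    rng = rng or random.Random()
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: nobody ever said anything after this word
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Hypothetical chat log standing in for a colleague's IRC history.
chat_log = ["deploy the server", "deploy the cache", "restart the server now"]
chain = build_chain(chat_log)
print(generate(chain, "deploy"))
```

Because the output is stitched together from real phrases, a bot like this can emit something that looks like a valid command, which is what made the ChatOps integration in the story so risky.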
When there’s competition in DevTools, developers win:
◆ Svelte is pushing React
◆ Remix is pushing Next.js
◆ Prisma is pushing ORMs
◆ Deno is pushing Node.js
◆ Supabase is pushing Firebase
◆ esbuild / SWC are pushing JS tooling
◆ Bun is pushing SWC
— Lee Robinson (@leeerob) November 30, 2021