Copilot Lawyers Checking Claims against Other AI Companies
The attorneys who filed the GitHub Copilot lawsuit say they’re getting messages every day from creators concerned about how their work is being used by artificial intelligence companies, according to lead attorney Matthew Butterick.
The New Stack asked whether the attorneys planned to add other names to the lawsuit against GitHub, Microsoft (as an owner of GitHub) and OpenAI for using open source code from GitHub to train OpenAI Codex, which powers Copilot, a code-generating tool for programmers.
“We’re investigating all these claims,” Butterick told The New Stack via email last week. “It’s appalling that AI investors and companies, in pursuit of profit, are already settling on a strategy of massive IP infringement. It’s not going to work. There are going to be a lot more cases challenging these practices.”
OpenAI plans to offer its Codex models through an API and maintains a private beta waiting list for companies that want to build other offerings on top of the tool. There are other AI-based completion tools as well, including AWS’ CodeWhisperer, whose FAQ noted that it uses “ML models trained on various data sources, including Amazon and open-source code.” Amazon did not return an email inquiry about CodeWhisperer. Likewise, Visual Studio’s AI-assisted development tool, Visual Studio IntelliCode, makes recommendations “based on thousands of open source projects on GitHub each with over 100 stars.”
We asked Butterick — who is also a programmer — whether there is a way to train AI without violating the open source licensing.
“Sure — read the license and do what it says!” Butterick stated. “AI companies can do this — they just prefer not to, because it reduces their profit margins. More broadly, AI companies will need to bring creators into the process to make it fair.”
An OpenAI blog post identifies other AI-based offerings not mentioned in the suit but which do leverage OpenAI Codex:
- Pygma utilizes Codex to turn Figma designs into different frontend frameworks and match the coding style and preferences of the developer. “Codex enables Pygma to help developers do tasks instantly that previously could have taken hours,” OpenAI wrote in its March 4 blog post.
- Replit uses Codex “to describe what a selection of code is doing in simple language so everyone can get quality explanation and learning tools,” including allowing users to highlight selections of code and get an explanation of their functionality. It also recently began to offer a code completion AI service called Ghostwriter, which it notes uses “large language models trained on publicly available code and tuned by Replit.” It’s not clear whether Ghostwriter uses OpenAI Codex.
- Warp uses Codex to allow users to run a natural language command to search directly from within the terminal for a terminal command.
- Machinet helps professional Java developers write quality code by using Codex to generate intelligent unit test templates.
Streaming Music Evolved
Butterick compared the situation behind the lawsuit, filed on Thursday, Nov. 3, to the evolution of streaming music services.
“With streaming music, we started with Napster, which was blatantly illegal, and then evolved toward licensed services like Spotify and Apple Music,” he said. “This evolution is going to happen with AI systems, too.”
We also asked whether Copilot could be saved if it did violate open source licensing by training on GitHub code.
“That’s up to the defendants,” Butterick said. “But we need to be a lot more concerned about the massive violation of creators’ rights than ‘saving’ some wealthy corporation’s AI product. Copilot is a parasite and an existential threat to open source. Though given Microsoft’s longstanding competitive antagonism toward open source, maybe we shouldn’t be surprised.”
Butterick reactivated his California bar membership in June to join class-action litigators Joseph Saveri, Cadio Zirpoli, and Travis Manfredi at the Joseph Saveri Law Firm on the federal lawsuit. The 52-page complaint, along with an appendix and exhibit, has been placed online by the attorneys and mentions two anonymous plaintiffs, one from California and the other from Illinois.
What Butterick Wants Developers to Know
Butterick wanted developers to know that the litigators are interested in hearing from “all open source stakeholders.”
“All of us working on the case are optimistic about the future of AI. But AI has to be fair and ethical for everyone,” he said. “Copilot is not.”
There’s a chance some open source stakeholders won’t agree with this point of view on what is fair play in open source — or any code. For instance, Florin Pop, a frontend developer, recently asked Twitter whether it was okay to copy code. The majority who responded made an ethical distinction, saying it was okay as long as developers used the code to learn how the code works, rather than simply cutting and pasting the code. Others chimed in that the license still mattered and should be considered when copying code.
“This Copilot lawsuit makes little sense in my opinion,” Sanchez stated in a tweet. “Code doesn’t have much value on its own. Good code is as boring as possible. What matters is the execution, the purpose of what you do.”
Ultimately, what matters may be up to a jury, not developers, to decide.
June 6, 2023: Updated to replace the deleted URL for the lawsuit with the new URL.