Meta’s Llama 2 Is Not Open Source and That’s OK
Last week’s release of the Large Language Model (LLM) Llama 2 as “open innovation” has caused a furor that those of us supporting its release have been expecting for some time. Computer Science Regius Professor Dame Wendy Hall said that it’s “like giving people a template to build a bomb.” With all due respect to the professor, that technological horse bolted decades ago, and rightly so.
Thankfully today the world has access to technology — as it should. Her comment sadly comes from a time that technology forgot.
A quick look at the internet, a technology built collaboratively and openly using open source software over 25 years ago, helps contextualize this. The internet has created a previously unimaginable world for our society, facilitating services and providing a digital environment for all. A search for Dame Wendy Hall’s words “bomb template” brings up a suite of results in any search engine, now including her ill-advised comment. That search, containing the word bomb, will also no doubt be seen by those monitoring the internet to protect our security.
The tech elite repeatedly show ignorance of open source software. Only last week, OpenUK’s latest economic analysis showed that 27% of the UK tech sector’s Gross Value Added originates from collaborative open source software businesses. It also explains why these globally collaborative businesses, often with companies in the US (their primary sales market) and an international workforce, are frequently overlooked by the UK, while demonstrating the huge value they bring and the potential they have.
These companies often create and enable the rather dull middleware that sits under technology, sometimes described as the plumbing and plumber’s tools of a digital world. It’s like the base of a technology pizza, with all of our technologies, the internet, cloud computing, blockchain, and indeed AI, sitting on top. When we talk tech we are generally interested in these toppings, but take the open source base away and what you are left with is a sloppy mess.
This ignorance of the value of open source software and what it has achieved is one thing, but a basic lack of understanding of what it is, is quite another.
Open source software is distributed freely under a license, with those who own the copyright in the code allowing others to use that software on the terms of that license. To be open source means that the human-readable version of the code, the source code, is open, and also that it is shared with others under a license approved by the Open Source Initiative (OSI) as meeting the 10 criteria of the Open Source Definition (OSD). As part of that definition, the code can be used by anyone for any purpose.
There can be no restriction on who uses the code or what it is used for, whether a good or bad purpose, unless specifically prohibited by a law, which of course overrides the license. The license makes no moral or ethical judgment, and this enables its free flow. Open source, by its nature, enables collaboration, transparency and trust.
Llama 2 and Open Source
Meta’s release of Llama 2 last week is not open source: the Llama Community License is not approved by the OSI, nor would it be, as it does not meet the requirements of the OSD.
The OSI itself has acknowledged that AI requires something different and is currently undertaking a consultation to find a new “Open Source AI” definition. That consultation continues through the year and will not produce an output before the end of this year at the earliest. Companies like Google DeepMind have, however, been open sourcing AI for years, as a quick GitHub search shows: DeepMind is the number two repository behind the small UK AI company Significant Gravitas.
Meanwhile, our governments are working on the restrictions that will accompany any future opening of AI.
On July 21, the White House announced that a number of key companies, including Amazon, Anthropic, Google, Inflection, Microsoft, Meta, and OpenAI, had signed up to eight voluntary measures as guardrails “to help move toward safe, secure, and transparent development of AI technology. Companies that are developing these emerging technologies have a responsibility to ensure their products are safe.”
These commitments have been made after consultation with 20 countries including the UK and Japan, and are:
- Publicly reporting their AI systems’ capabilities, limitations, and areas of appropriate and inappropriate use, covering both security and societal risks.
- Prioritizing research on the societal risks that AI systems can pose, including on avoiding harmful bias and discrimination, and protecting privacy.
- Developing robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system.
- Sharing information across the industry and with governments, civil society, and academia on managing AI risks.
- Conducting internal and external security testing of their AI systems before their release.
- Investing in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights — noting that it is vital that the model weights be released only when intended and when security risks are considered.
- Facilitating third-party discovery and reporting of vulnerabilities in their AI systems, so that risks remaining after an AI system is released can be found and fixed quickly through a robust reporting mechanism.
- Developing and deploying advanced AI systems (“frontier models”) to help address society’s greatest challenges.
Margrethe Vestager, executive vice president for digital at the European Commission, tweeted on June 19 (complete with a picture of herself meeting with Mark Zuckerberg that day) that “#AI code of conduct in motion. Today with Mark #Zuckerberg @Meta, the conversation focused on how to mitigate risks in #OpenSource environment.”
In light of the commitments being made in the US, that conversation with Vestager, and the many other requirements coming our way that have not been publicly shared to date, it is unsurprising that Llama 2 was released with an Acceptable Use Policy to guide users to “do no harm,” one which is highly likely to be along the lines of what the European Commission will call for all AI to have in future.
Much of the discussion of the licensing of Llama 2 and open source is effectively a red herring, as this Acceptable Use Policy (AUP) is the critical piece of the story. The AUP and these restrictions, as they meld into law, will mean that AI is not going to be open source software as we know it, but a new form of openness. Meta’s Llama 2 website attempts to characterize that form as “open innovation.”
This is why OpenUK, an organization that represents Open Technology (a broader remit than open source software), has been able to support its release as open innovation, signing up to this statement of support: “We support an open innovation approach to AI. Responsible and open innovation gives us all a stake in the AI development process, bringing visibility, scrutiny and trust to these technologies. Opening today’s Llama models will let everyone benefit from this technology.”
For some in the open source community, the restrictions in the Llama 2 release are frustrating. They are unhappy with the commercial and other restrictions in the license that could inhibit the free flow of work done and products developed on it, a free flow that is an essential part of open source. But it looks unlikely that open source as we know it can be the future of AI. We must nevertheless support an open approach and understand what that approach can best be in these evolving circumstances.
Meta’s commercial restrictions may be irritating to the open source community, but even without them the reality is that the AUP is not going to go away. If this is correct, such a release will never be open source software.
Our technology sector was allowed to grow in a closed way, creating proprietary walled gardens. This closed approach is how the giant tech companies evolved, with a moat to close off competition. It created an environment where the power in knowledge, and many of the tools our society needs across its digital lives, are owned and controlled behind closed doors by a few. This mistake must not be repeated. Opening the technology up corrects this balance by democratizing AI, and it will help us avoid repeating the past mistakes that left our technology in the hands of a few, at this most critical stage in the development of AI, one of the most important areas of technology of our time.
Access to an LLM allows for innovation by new market entrants, by small companies and by individuals with skills. By opening up AI we allow for its democratization, and its transparency will enable trust. Bad actors may continue to work in black boxes, but if they are innovating in the open and we do not spot them, that is on us. Open innovation should enhance our trust.
The social media posts of Mark Zuckerberg and Yann LeCun (chief AI scientist at Meta) are incorrect in calling Llama 2 “open source,” but I suspect this stems from a lazy understanding of open source rather than any actual malevolence or “open washing.”
The reality is that the Llama 2 release is open innovation, and this, together with the open data we hope to see soon, is likely the best “AI openness” we will get. There is no benefit, in the eyes of governments and regulators, in claiming that a release is open source software without any restriction when those regulators want to see codes of conduct, AUPs and similar restrictions; if anything, such a claim may be to a company’s detriment. And so I believe that rather than an attempt to benefit from open source goodness by “open washing,” this is a matter of understanding and a lack of accuracy, which is of course still important.