We’ve become accustomed to machines automatically correcting our spelling, completing our sentences and translating text into other languages, thanks to natural language processing (NLP), a technology that gives machines the ability to read, understand and extract meaning from human language. However, with the recent release of GPT-3, a massive NLP model created by the artificial intelligence lab OpenAI, it’s clear that machines will soon be capable of much more: writing coherent essays, fiction, tweets, poems, blog posts and technical manuals, answering reading comprehension questions and even producing code, much of it difficult to distinguish from text written by a human. From a human perspective, it’s a remarkable but also unsettling leap forward, given the potential implications of such a powerful tool.
GPT-3 is the newest iteration of OpenAI’s GPT family. The original, much smaller GPT model was released just over two years ago and was succeeded by GPT-2. These NLP models are based on a deep learning architecture known as the “transformer,” which is designed to handle sequential data and perform tasks like speech recognition, translation and text summarization. Most notably, transformer-based models do not have to process sequential data one element at a time, as earlier recurrent networks did: a mechanism called self-attention lets them relate every position in a sequence to every other position at once. This allows for much more parallelization than previous models, so they can be trained on much larger datasets in less time.
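The parallelism described above comes from self-attention. The following is a minimal pure-Python sketch of scaled dot-product self-attention, assuming toy two-dimensional token vectors that are used directly as queries, keys and values. It omits the learned projection matrices, multiple attention heads and the causal mask that GPT-style models apply (the mask restricts each position to earlier tokens, but all positions can still be computed at once during training). The function names are illustrative, not taken from any library.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(i: int, queries: List[Vector], keys: List[Vector], values: List[Vector]) -> Vector:
    """Output for position i: a weighted mix of every position's value vector."""
    d_k = len(keys[0])
    # Scaled dot-product scores between position i's query and every key.
    scores = [dot(queries[i], k) / math.sqrt(d_k) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def self_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    # Each output depends only on the inputs, never on another position's
    # output, so the positions can be computed in parallel or in any order.
    return [attend(i, queries, keys, values) for i in range(len(queries))]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
in_order = self_attention(tokens, tokens, tokens)
# Computing the last position first gives exactly the same outputs.
reversed_order = [attend(i, tokens, tokens, tokens) for i in reversed(range(len(tokens)))][::-1]
print(in_order == reversed_order)  # True
```

A recurrent network would instead need the hidden state from position i − 1 before it could compute position i, which is what forces sequential processing.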
Largest Model Yet
According to the research team’s paper, GPT-3 has 175 billion parameters and was trained largely on a filtered version of Common Crawl, an open repository of text scraped from the Internet. What’s noteworthy about GPT-3’s scaled-up system is that it far eclipses comparable models, such as Microsoft’s Turing NLG, which has around 17 billion parameters. GPT-3 can pick up a new task from only a few examples, without task-specific fine-tuning, and on some benchmarks this makes it nearly as accurate as previous state-of-the-art approaches that had to be fine-tuned on task-specific data. It’s also versatile: not only can it generate reams of human-sounding text, but it can also unscramble words and solve three-digit arithmetic problems.
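As a rough sanity check on the 175 billion figure, a common rule of thumb says each transformer layer holds about 12 × d_model² weights (four attention projections plus a feed-forward block four times wider than the model). Applying it to the layer count and width reported for the largest GPT-3 model (96 layers, d_model = 12,288, a 2,048-token context) gives an estimate. The 50,257-token vocabulary is an assumption carried over from GPT-2’s tokenizer, the helper function is hypothetical, and biases and layer-norm weights are ignored, so this is an approximation rather than an exact count.

```python
def approx_transformer_params(n_layers: int, d_model: int, vocab_size: int, n_ctx: int) -> int:
    # Per layer: Q, K, V and output projections (4 * d_model^2) plus a
    # feed-forward block with a 4x hidden width (2 * 4 * d_model^2).
    # Biases and layer-norm weights are ignored as negligible.
    per_layer = 12 * d_model ** 2
    token_embeddings = vocab_size * d_model
    position_embeddings = n_ctx * d_model
    return n_layers * per_layer + token_embeddings + position_embeddings


# Layer count, width and context length reported for the 175B model;
# the vocabulary size is assumed to match GPT-2's tokenizer.
gpt3 = approx_transformer_params(n_layers=96, d_model=12288, vocab_size=50257, n_ctx=2048)
print(f"{gpt3 / 1e9:.1f} billion")  # prints "174.6 billion"
```

The estimate lands within about half a percent of the reported 175 billion, and dividing 175 by Turing NLG’s 17 billion shows GPT-3 is roughly ten times larger.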
To demonstrate this, the researchers evaluated GPT-3 under three settings: few-shot learning (the prompt contains several examples of the task, typically 10 to 100), one-shot learning (the prompt contains a single example) and zero-shot learning (the prompt contains only a natural-language description of the task and no examples at all). In every setting, the examples are supplied as text when the model is queried, and the model’s weights are not updated. The team found that GPT-3 achieved promising results in the one-shot and zero-shot settings, and in some few-shot settings it outperformed other state-of-the-art NLP models. In particular, GPT-3 performed strongly on translation, question answering and sentence completion, and it fooled 88% of human evaluators into believing that its generated news articles were written by a human author.
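Because zero-, one- and few-shot “learning” happens entirely inside the prompt, the three settings differ only in how many demonstrations precede the query. The sketch below assembles such prompts as plain text; `build_prompt` and the `=>` separator are hypothetical choices, loosely modeled on the English-to-French illustration in the GPT-3 paper, and no model is actually called.

```python
from typing import List, Tuple


def build_prompt(task: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a zero-, one- or few-shot prompt as plain text.

    0 examples -> zero-shot, 1 -> one-shot, several -> few-shot.
    The model's weights are never touched; demonstrations live only in the text.
    """
    lines = [task]
    lines += [f"{source} => {target}" for source, target in examples]
    lines.append(f"{query} =>")  # the model is expected to continue from here
    return "\n".join(lines)


demos = [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")]
print(build_prompt("Translate English to French:", demos, "cheese"))
```

Passing an empty list for `examples` yields a zero-shot prompt, and a single pair yields a one-shot prompt, with no other change to the code path.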
“More remarkably, GPT-3 is showing hints of [artificial] general intelligence,” noted Australian philosopher David Chalmers, one of several experts who weighed in on the implications of such a powerful system on the philosophy blog Daily Nous. “Previous AI systems have performed well in specialized domains such as game-playing, but cross-domain general intelligence has seemed far off. GPT-3 shows impressive abilities across many domains. It can learn to perform tasks on-the-fly from a few examples, when nothing was explicitly programmed in. It can play chess and Go, albeit not especially well. Significantly, it can write its own computer programs given a few informal instructions. It can even design machine learning models. Thankfully they are not as powerful as GPT-3 itself (the singularity is not here yet).”
Scams, Disinformation and Algorithmic Bias
The team also points out that GPT-3’s ability to generate high-quality text could be put to nefarious uses, such as producing spam, running phishing scams to obtain private information, writing fraudulent academic essays or conducting disinformation campaigns.
To limit possible misuse, OpenAI has chosen to release the model through an API instead of open-sourcing it. “The API model allows us to more easily respond to misuse of the technology,” OpenAI explained in a blog post. “Since it is hard to predict the downstream use cases of our models, it feels inherently safer to release them via an API and broaden access over time, rather than release an open source model where access cannot be adjusted if it turns out to have harmful applications.”
Even more concerning were the racial, gender and religious biases that the researchers observed in the current model, as well as in GPT-2. For instance, 83% of the 388 occupations tested were more likely to be followed by a male identifier than a female one, especially occupations requiring higher levels of education (such as banker, legislator or professor emeritus) or physical labor (such as mason or sheriff). In contrast, professions like midwife, nurse, receptionist and housekeeper were overwhelmingly associated with women.
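The occupation test can be summarized with two numbers: the share of occupations whose prompt (for example, “The {occupation} was a”) is more likely to be continued with a male identifier than a female one, and an average log ratio of female to male probabilities, similar in spirit to the metric described in the paper. The sketch below assumes those probabilities have already been obtained from a model; the function name and every probability in it are hypothetical placeholders, not GPT-3 measurements.

```python
import math
from typing import Dict, Tuple


def occupation_bias(probs: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Summarize gender lean across occupations.

    probs maps an occupation to (P(male identifier), P(female identifier)).
    Returns the share of occupations leaning male and the mean of
    log(P_female / P_male), which is negative when completions skew male.
    """
    male_leaning = sum(1 for p_m, p_f in probs.values() if p_m > p_f)
    mean_log_ratio = sum(math.log(p_f / p_m) for p_m, p_f in probs.values()) / len(probs)
    return male_leaning / len(probs), mean_log_ratio


# Toy numbers for illustration only; these are NOT measurements from GPT-3.
toy = {
    "banker": (0.8, 0.2),
    "sheriff": (0.9, 0.1),
    "nurse": (0.1, 0.9),
    "legislator": (0.7, 0.3),
}
share, bias = occupation_bias(toy)
print(share)  # 0.75
```

Reporting both numbers is useful: the share says how many occupations lean one way, while the log ratio also captures how strongly they lean.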
To test for racial bias, the team gave the model sentence prompts containing one of the terms “Asian,” “Black,” “White,” “Latinx,” “Indian” or “Middle Eastern” and had it complete a description. The completions were then scored according to how positive or negative their wording was. Overall, completions for “Asian” consistently received high sentiment scores, while those for “Black” consistently received low ones. Similar biases appeared with religious terms: words like “violent,” “terrorism” and “terrorist” were associated with “Islam,” while “Atheism” was more likely to be linked with words like “defensive,” “complaining” and “arrogant.”
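The sentiment test follows a simple pattern: collect completions for prompts that mention each group, score each completion’s words against a sentiment lexicon, and average the scores per group. The sketch below is a hypothetical illustration of that pattern only. The tiny lexicon, its scores, the placeholder group labels “A” and “B”, and the helper names are all made up here; a real study would use a full published lexicon and many completions per group.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical mini-lexicon with made-up scores in [-1, 1].
TOY_LEXICON: Dict[str, float] = {
    "friendly": 0.6, "brilliant": 0.8, "hardworking": 0.5,
    "lazy": -0.6, "violent": -0.8, "arrogant": -0.5,
}


def completion_sentiment(completion: str, lexicon: Dict[str, float]) -> float:
    # Average the scores of words the lexicon knows; unknown words are skipped.
    words = [w.strip(".,!?").lower() for w in completion.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    return mean(scores) if scores else 0.0


def group_sentiment(completions: Dict[str, List[str]], lexicon: Dict[str, float]) -> Dict[str, float]:
    # One average score per group term, taken over all of its completions.
    return {group: mean(completion_sentiment(c, lexicon) for c in texts)
            for group, texts in completions.items()}


samples = {
    "A": ["The A person was very friendly and brilliant."],
    "B": ["The B person was very lazy."],
}
print(group_sentiment(samples, TOY_LEXICON))
```

Comparing the per-group averages, rather than individual completions, is what lets a consistent skew show up across many samples.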
It’s important to note that these biases don’t necessarily reflect reality; they reflect the data these systems are trained on. The greater worry is that such biases could have long-term consequences if these models are used in courts, hospitals and schools.
“Biases present in training data may lead models to generate stereotyped or prejudiced content,” explained the team. “This is concerning, since model bias could harm people in the relevant groups in different ways by entrenching existing stereotypes and producing demeaning portrayals amongst other potential harms. [...] Broadly, our analysis indicates that internet-trained models have internet-scale biases.”
To counter these algorithmic biases, the team argues that preventing and mitigating bias will require a broad, collective effort to develop tools for identifying such biases and intervening when they are found, such as “model cards,” documentation that describes an AI model’s possible biases and limitations.
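Model cards, proposed by Mitchell et al. in 2019, are structured documents that describe a model’s intended use, training data, evaluation and known limitations. The dataclass below is a hypothetical, minimal sketch covering only a few such sections, filled in with facts reported in this article; the class and field names are illustrative, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelCard:
    """A small subset of the sections a model card typically contains."""
    model_name: str
    intended_use: str
    training_data: str
    known_biases: List[str] = field(default_factory=list)
    limitations: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        # Render each section as a heading followed by its text or bullet items.
        parts = [f"# Model card: {self.model_name}",
                 "## Intended use", self.intended_use,
                 "## Training data", self.training_data,
                 "## Known biases", *[f"- {b}" for b in self.known_biases],
                 "## Limitations", *[f"- {item}" for item in self.limitations]]
        return "\n".join(parts)


card = ModelCard(
    model_name="GPT-3 (175B parameters)",
    intended_use="Text generation, accessed through OpenAI's API",
    training_data="Internet text, largely a filtered version of Common Crawl",
    known_biases=["83% of 388 tested occupations skewed toward male identifiers",
                  "Lower-sentiment completions for some racial and religious terms"],
    limitations=["Potential misuse for spam, phishing and disinformation"],
)
print(card.to_markdown())
```

Keeping the card as structured data rather than free text makes it easy to check that every release documents the same sections.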
Despite these flaws and the potential for misuse, GPT-3 represents a big leap forward in getting machines to understand and manipulate human language. While such systems have yet to match the general-purpose versatility of human intelligence, there are signs that they are closing the gap.
Images: Raphael Schaller via Unsplash; OpenAI.
The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: turing.