Beyond ChatGPT: Exploring the OpenAI Platform
Since the launch of ChatGPT in November 2022, OpenAI has attracted attention from knowledge workers, developers, and almost everyone who uses the web. But OpenAI has been around much longer than ChatGPT and offers many exciting services to developers. It’s one of the first platform companies to expose generative AI through simple REST API endpoints.
This is the first in a series of articles on OpenAI, in which we will explore the big picture of the platform and how it is structured. We will cover the foundations and the fundamental building blocks of the OpenAI platform.
OpenAI: Democratizing Generative AI
OpenAI was established in 2015 as a non-profit research organization by Ilya Sutskever, Greg Brockman, Trevor Blackwell, Vicki Cheung, Andrej Karpathy, Durk Kingma, John Schulman, Pamela Vagata, and Wojciech Zaremba. Sam Altman and Elon Musk were the initial board members.
OpenAI came into the limelight when Microsoft announced a $1 billion investment in 2019, followed by another round of $10 billion earlier this year. Infosys and Khosla Ventures are among the corporate investors, while Reid Hoffman, Peter Thiel, and Jessica Livingston are among the individual investors.
Though often criticized for flipping from a non-profit to a commercial AI company, OpenAI has been at the forefront of generative AI research. Thanks to the Microsoft partnership, it gained access to state-of-the-art infrastructure powered by Azure compute services.
Without spending too much time understanding the history and evolution of OpenAI, let’s look at the company’s current state.
Generative AI models trained on large datasets through unsupervised learning are called foundation models. At a high level, OpenAI has three key foundation models: GPT, DALL-E, and Whisper. GPT, one of the most popular models, is trained to deal with textual content. DALL-E can generate images based on natural language input. Finally, Whisper converts speech to text and can translate spoken audio from other languages into English.
All the use cases and generative AI scenarios supported by OpenAI revolve around these three foundation models. Of these, GPT has received the most attention due to the success of ChatGPT, which is powered by the most recent version of the model, GPT-4. There are multiple variations of the GPT model supporting scenarios such as text completion, interactive chat, editing, rephrasing, summarizing, and text classification. Similarly, DALL-E can be used for generating images, editing them, and creating variations of them. The Whisper model can be used for transcription and translation of audio files.
To make it easy for developers to infuse generative AI into their applications, OpenAI exposes multiple APIs aligned with these use cases. To integrate models such as GPT or DALL-E, developers obtain an API key and use it to access the OpenAI REST endpoints.
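To make the flow concrete, here is a minimal sketch of calling one of those REST endpoints directly with Python’s standard library. It assumes the completions endpoint and payload fields that were current when this article was written, and the model name is just an example:

```python
import json
import os
import urllib.request

# The v1 completions endpoint; the payload fields below (model, prompt,
# max_tokens) follow the API as documented at the time of writing.
API_URL = "https://api.openai.com/v1/completions"

def build_request(prompt: str, api_key: str,
                  model: str = "text-davinci-003",
                  max_tokens: int = 64) -> urllib.request.Request:
    """Assemble an authenticated HTTP request for the completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The secret API key issued from the OpenAI dashboard.
            "Authorization": f"Bearer {api_key}",
        },
    )

if __name__ == "__main__" and "OPENAI_API_KEY" in os.environ:
    req = build_request("Say this is a test.", os.environ["OPENAI_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["text"])
```

The only credential the developer manages is the bearer token in the `Authorization` header; everything else is plain JSON over HTTPS.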
OpenAI’s APIs democratized generative AI by making state-of-the-art language and vision models accessible through a simple REST interface. Any developer familiar with consuming APIs can infuse the power of generative AI into their applications. They need neither an understanding of the complex math behind neural networks nor access to powerful compute infrastructure built on high-end CPUs and GPUs.
It’s possible to fine-tune OpenAI’s foundation models with custom, private datasets. The fine-tuned model can then be used to perform inference on private data, which significantly enhances the value of generative AI. OpenAI has exposed fine-tuning as an API that accepts a variation of a foundation model and a custom dataset.
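The custom dataset is supplied as a JSONL file with one prompt/completion pair per line. The sketch below prepares such a file; the sentiment-classification records are invented purely for illustration:

```python
import json

# Hypothetical training records; the fine-tuning API expects a JSONL file
# containing one {"prompt": ..., "completion": ...} object per line.
examples = [
    {"prompt": "Classify the sentiment: Great product! ->", "completion": " positive"},
    {"prompt": "Classify the sentiment: Arrived broken. ->", "completion": " negative"},
]

def to_jsonl(records) -> str:
    """Serialize training records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

training_file = to_jsonl(examples)
```

The resulting file is then uploaded through the files endpoint and referenced when creating the fine-tuning job, after which the fine-tuned model can be called like any other model.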
The diagram below summarizes how the OpenAI platform is structured. The bottom-most layer consists of the foundation models, while the next layer has multiple flavors and variations of the models, each optimized for a specific use case. The topmost layer is the REST API which exposes the models through well-known endpoints.
Exploring the OpenAI Ecosystem
OpenAI has built tools, SDKs, and services targeting both developers and end users. ChatGPT is an example of a service aimed at end users. OpenAI uses ChatGPT predominantly to gather interactive feedback from users, which goes a long way toward improving the GPT models. It also analyzes user inputs and prompts to understand how people interact with the model.
For developers, OpenAI offers a playground that acts as an interactive interface to the REST API. It can be used to test how fine-tuned models respond to a given input or prompt, and to tweak the parameters, such as temperature, that influence the accuracy and creativity of the models.
The screenshot below shows invoking the completions API through cURL:
The same can be done through the official Python library maintained by OpenAI.
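As a sketch, the library call looks like the following. This assumes the 0.x interface of the `openai` package (`pip install openai`) that was current when this article was written, and the model name is an example; the live call is guarded so the snippet only runs when a key is configured:

```python
import os

def complete(prompt: str, model: str = "text-davinci-003",
             temperature: float = 0.7, max_tokens: int = 32) -> str:
    """Request a completion via the official openai package (0.x interface)."""
    import openai  # imported lazily so the sketch can be read without the package
    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,  # higher values yield more varied output
    )
    return response["choices"][0]["text"]

if __name__ == "__main__" and "OPENAI_API_KEY" in os.environ:
    print(complete("Write a haiku about APIs."))
```

Note that the library mirrors the REST payload one-to-one: the keyword arguments to `Completion.create` are the same fields you would put in the JSON body of a cURL request.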
OpenAI has also published tools and libraries that convert words to tokens, the fundamental input unit of large language models such as GPT. These tools help developers estimate the cost of consuming OpenAI’s API. When you install the Python library through pip, you also get a handy CLI tool to test the API.
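OpenAI’s open source tokenizer library, tiktoken, performs exact tokenization. For a back-of-the-envelope estimate, the widely cited rule of thumb of roughly four characters per token for English text is often good enough; the sketch below uses that heuristic, and the per-token price is an illustrative placeholder, not a current rate:

```python
# Rough, stdlib-only cost estimation using the ~4 characters-per-token
# heuristic for English text. For exact counts, use OpenAI's tiktoken library.

def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, price_per_1k_tokens: float) -> float:
    """Estimate the cost of sending `text` as a prompt at a given price."""
    return estimate_tokens(text) / 1000 * price_per_1k_tokens

prompt = "Summarize the following article in three bullet points."
tokens = estimate_tokens(prompt)  # 14 with this heuristic (55 characters)
```

Since billing is metered per token for both the prompt and the generated completion, even a rough estimator like this helps catch unexpectedly expensive prompts before they are sent.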
If you are a Microsoft Azure developer, you can sign up for the Azure OpenAI Service, which is tightly integrated with Microsoft cloud services such as Active Directory, virtual networks, role-based access control, and more.
In the next part of this series, we will take a closer look at prompt engineering and its importance in dealing with GPT. Stay tuned.