
GPT-3 101: a brief introduction

source link: https://mc.ai/gpt-3-101-a-brief-introduction/

It has been almost impossible to avoid the GPT-3 hype over the last few weeks. This article offers a quick introduction to its architecture, the use cases already available, and some thoughts on its ethical and green IT implications.

Introduction

Let’s start with the basics. GPT-3 stands for Generative Pre-trained Transformer version 3, and it is a sequence transduction model. Simply put, sequence transduction is a technique that transforms an input sequence into an output sequence.

GPT-3 is a language model, which means that, using sequence transduction, it can predict the likelihood of an output sequence given an input sequence. This can be used, for instance, to predict which word makes the most sense given a text sequence.

A very simple example of how these models work is shown below:

INPUT: It is a sunny and hot summer day, so I am planning to go to the…

PREDICTED OUTPUT: It is a sunny and hot summer day, so I am planning to go to the beach.
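
GPT-3 itself is only reachable through OpenAI’s restricted API, but the same next-word prediction idea can be sketched with the publicly available GPT-2 model and the Hugging Face transformers library. This is a minimal illustration under those assumptions, not the article’s own code:

```python
# Minimal next-word prediction sketch using GPT-2 (GPT-3 is API-only).
# Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "It is a sunny and hot summer day, so I am planning to go to the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# The logits at the last position score every candidate next token.
probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = probs.topk(5)
for p, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")
```

The top-ranked tokens should include plausible continuations such as “beach”.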

GPT-3 is based on a specific neural network architecture type called the Transformer that, simply put, is more effective than other architectures like RNNs (Recurrent Neural Networks). This article nicely explains different architectures and how sequence transduction can greatly benefit from the Transformer architecture GPT-3 uses.
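
The key ingredient of the Transformer is the attention mechanism. As a rough illustration in plain NumPy, following the standard formulation from the “Attention Is All You Need” paper rather than OpenAI’s actual implementation, scaled dot-product attention looks like this:

```python
# Scaled dot-product attention, the core building block of the Transformer:
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# Plain NumPy illustration, not OpenAI's actual implementation.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # weighted sum of the values

# Toy example: 4 tokens, embedding dimension 8
rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```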

Transformer architectures are not really new; they became popular two years ago when Google used them for another very well-known language model, BERT. They were also used in previous versions of OpenAI’s GPT. So, what is new about GPT-3? Its size. It is a really big model. As OpenAI discloses in this paper, GPT-3 uses 175 billion parameters. Just as a reference, GPT-2 “only” used 1.5 billion parameters. If scale were the only requirement for achieving human-like intelligence (spoiler: it is not), then GPT-3 is only about 1000x too small.

Using this massive architecture, GPT-3 has been trained on several datasets, including the Common Crawl dataset and the English-language Wikipedia, matching state-of-the-art performance on “closed-book” question-answering tasks and setting a new record for the LAMBADA language modeling task.

Use cases

What really sets GPT-3 apart from previous language models like BERT is that, thanks to its architecture and massive training, it can excel at task-agnostic performance without fine-tuning. And here is where the magic comes in. Since it was released, GPT-3 has been applied to a broad range of scenarios, and some developers have come up with really amazing use case applications. Some of them are even sharing the best ones on GitHub for everyone to try.

A non-exhaustive list of applications based on GPT-3 is shown below (a prompt sketch for the natural language to SQL case follows the list):

Text summarization

Regular Expressions

Natural language to SQL

Natural language to LaTeX equations

Creative writing

Interface design and coding

Automatic mail answering
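
As an illustration of how such applications are typically built, the sketch below shows a hypothetical natural language to SQL call against OpenAI’s beta Completion API. The prompt, table schema, and expected output are made-up assumptions, and real access requires an approved beta API key:

```python
# Hypothetical natural-language-to-SQL sketch against OpenAI's beta
# Completion API (circa 2020). Prompt, schema, and expected output are
# illustrative assumptions; real access requires an approved API key.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = (
    "Translate natural language to SQL.\n"
    "Table: employees(id, name, department, salary)\n\n"
    "Question: Who are the five highest paid employees?\n"
    "SQL:"
)

response = openai.Completion.create(
    engine="davinci",  # GPT-3 engine available in the beta
    prompt=prompt,
    max_tokens=64,
    temperature=0,     # deterministic output for code generation
    stop=["\n\n"],     # stop at the end of the generated query
)
print(response["choices"][0]["text"].strip())
# e.g. SELECT name FROM employees ORDER BY salary DESC LIMIT 5;
```

Setting temperature to 0 is a common choice for code generation, where deterministic output matters more than creativity.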

Time to panic?

The first question that comes to mind for someone working in the IT services market when seeing all these incredible GPT-3-based applications is clear: will software developers run out of jobs due to AI improvements like these? The first thing that comes to my mind here is that developing software is not the same as writing code. Developing software is a much more profound task that involves problem solving, creativity, and, yes, writing the code that actually solves the problem. That being said, I do really think that this will have an impact on the way we solve problems through software.

Just as humans need priming to recognize something we have never noticed before, GPT-3 does too. The concept of priming will be key to making this technology useful: providing the model with a partial block of code, a good question about the problem we want to solve, etc. Some authors are already writing about the concept of “prompt engineering” as a new way to approach problem solving through AI in the style of GPT-3. Again, an engineering process still requires much more than what is currently solved by GPT-3, but it will definitely change the way we approach coding as part of it.
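
A minimal sketch of what priming looks like in practice: a few worked examples are packed into the prompt before the new input (few-shot priming), which is how most of the demos above obtain task-specific behavior without any fine-tuning. The translation examples here are made up for illustration:

```python
# Few-shot priming: worked examples embedded in the prompt steer the model
# toward a task without any fine-tuning. Examples are illustrative only.
primer = """English: Good morning
French: Bonjour

English: Thank you very much
French: Merci beaucoup

English: See you tomorrow
French:"""

# Sent as the prompt to a completion endpoint, the model is expected to
# continue the pattern, e.g. " À demain".
print(primer)
```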

GPT-3 has not been available for long (and, in fact, access to its API is very restricted for the moment), but it is clearly amazing what developers’ creativity can achieve using this model’s capabilities. Which brings us to the next question: should GPT-3 be generally available? What if it is used for the wrong reasons?

Not so long ago, OpenAI wrote this when presenting its previous GPT-2 model:

“Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights.”

As of today, OpenAI still acknowledges these potential implications but is opening access to its GPT-3 model through a beta program. Their thoughts on this strategy can be found in this Twitter thread.

It is good to see that they clearly understand that misuse of generative models like GPT-3 is a very complex problem that should be addressed by the whole industry:

Despite having shared API guidelines with the creators already using the API, and claiming that applications using GPT-3 are subject to review by OpenAI before going live, they acknowledge that this is a very complex issue that won’t be solved by technology alone. Even the Head of AI at Facebook entered the conversation with a few examples of how, when prompted to write tweets from just one word (jews, black, women, etc.), GPT-3 can show harmful biases.

And this is not the only threat. Advanced language models can be used to manipulate public opinion, and GPT-3 models and their future evolutions could pose huge risks to democracy. Rachel Thomas gave an excellent talk on the topic, which you can find here.

Data bias is not the only problem with language models. As I mentioned in one of my previous articles, the political design of AI systems is key. In the case of GPT-3, this might have huge implications for the future of work and for the lives of already marginalized groups.

As a funny (or maybe scary) note, even GPT-3 thinks GPT-3 should be banned!

How big is big enough?

Going back to the architecture of GPT-3, training a model with 175 billion parameters is not exactly cheap in terms of computational resources. GPT-3 alone is estimated to have a memory requirement exceeding 350GB and training costs exceeding $12 million.
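
The 350GB figure is easy to sanity-check with back-of-the-envelope arithmetic, assuming the weights are stored as 16-bit floats (the precision is my assumption, not something the article states):

```python
# Back-of-the-envelope check of the ~350GB memory figure, assuming each of
# the 175 billion parameters is stored as a 16-bit (2-byte) float.
params = 175e9        # 175 billion parameters
bytes_per_param = 2   # fp16; the precision is an assumption
memory_gb = params * bytes_per_param / 1e9
print(f"{memory_gb:.0f} GB")  # -> 350 GB
```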

It is clear that the results are amazing but, at what cost? Is the future of AI sustainable in terms of the compute power needed? Let me finish this article with some sentences that I wrote for my “Is Deep Learning too big to fail?” article:

Let’s not forget that more data does not necessarily mean better data. We need quality data: unbiased and diverse data which can actually help AI benefit the many communities that are far from getting access to state-of-the-art compute power like the one needed to play AlphaStar.

Only when we use efficient algorithms (therefore accessible to the vast majority of citizens) trained with unbiased and diverse data will Deep Learning be too big to fail. And it will be too big because it will then serve those who are too big to be failed: the people.

