Building a no-code toxicity classifier by talking to GitHub Copilot

Last year, GitHub and OpenAI released Copilot, an AI-powered coding assistant built on top of GPT-3. It's incredible – even the Harvard and Google engineers on our team find it speeds up a lot of their work.

But as helpful as it is for coders, what if it enabled non-engineers to program too – by merely talking to an AI about their goals? Let’s walk through an example.

Building a Toxicity Classifier

So imagine you want to build a machine learning model to classify toxic speech. You understand ML concepts at a high level, but you’re not a Python expert. We’ll show how Copilot can help!

In this example, we’re using the Copilot extension for Visual Studio Code, and a free toxicity dataset that we built; you can follow along by downloading the Jupyter notebook here.

Each time we want to issue Copilot a command, we’ll write our directive as a comment (in green); Copilot's actions are the subsequent lines of code.

So let’s create a Jupyter notebook and start by telling Copilot to import all the programming libraries we need to train a toxicity classifier. We write this comment in green:

Its first response is to "import numpy as np"!

We let it continue until it starts repeating itself, leading to the following:

All 28 imports were generated by Copilot. Next, we ask it to read in our dataset.

Remember: our only contribution will be the green #comments – Copilot created the toxicity = pd.read_csv("toxicity.csv") line itself.

What does this dataset look like? We ask it to display the file...

...and it successfully responds by showing the first 5 rows.

How many examples are in each class?

Copilot responds, unfortunately, with a line of code that doesn’t quite work – there’s no 'target' column or variable in our dataset.

It would be easy to fix this by replacing 'target' with 'is_toxic', but where’s the fun in that? Does it help if we make our request more explicit?

It does!

We can even ask it to create a bar chart:

Great, but it’s visually bland. Can Copilot pretty it up?

Let’s ask it to flip the axes, so that the bars are horizontal.

That didn’t work, but let’s modify our comment and try again:

Can we also change the color of the bars?

Let’s give the bar chart a title too.

Who needs highly paid data scientists now?

Again, Copilot created all of this code simply through a “conversation” with it! As someone who doesn’t generate many data visualizations in Python, I couldn't have easily done this myself, without a lot of Googling and reading documentation.

Imagine, in the future, Copilot creating this bar chart (and other insights) from a higher-level command: “analyze the dataset and produce useful insights”. In the same way that GPT-3 can understand important aspects of text, what if Copilot could automatically detect the important columns in a data frame, and predict that we might want to see the class balance or a precision-recall curve?

In our work with research labs developing large language models, we build advanced “labeling” teams of programmers and mathematicians who train their AI systems to do exactly this. In other words, teams of high-skill, STEM “labelers” on our platform are given commands (or come up with commands themselves) and generate Python programs that solve them, serving as training demonstrations.

Moving on, let’s now ask Copilot to build a toxicity ML model using the dataset.

It produced the following as a result – both the subsequent comments ("# using a Naive Bayes classifier" onwards) as well as the pipeline code!

I don’t know what a Pipeline is, but let’s ask Copilot to use it to classify regardless.

When I don’t know what to do next (what do I use a pipeline for?), and Copilot doesn’t automatically create additional continuations, I simply create a new cell, write #, and let Copilot respond – letting it predict what I might want to ask!

In this case, it completes the comment with “predict the toxicity of a new text”...

…which it then continues with the following.

Unfortunately for paradoxes, our classifier predicts it correctly!

Can we also get Copilot to create some test cases for us to try?

Unfortunately, it doesn’t do a very good job…

…so we’ll give it some examples. When generating the array, it even creates the ideal variable name and escapes the quotations.

Let’s test it out. Even though I had a typo in my instructions, it gets them all right!

Let’s test it on non-toxic comments too.

Of course, we should measure how the model performs on a holdout test set, so we’ll ask Copilot to split the original dataset into training and test.

It continues with the following (after we create new cells and begin them with an empty comment)!

Let’s ask Copilot how well the model performs.

It even automatically suggests printing a classification report next!

Can we ask Copilot to plot a precision-recall curve?

Unfortunately, its first attempt fails:

One interesting way around this: OpenAI and Google have been developing research systems where they give AI models access to search engines so that they can gather information by browsing the web. What if Copilot could search for the error string, discover a relevant StackOverflow solution, and use that to fix itself?!

In the meantime, it seems like the error has something to do with the fact that y_test and y_prod are arrays of strings (“Toxic”, “Not Toxic”), so let’s try binarizing them.

Our first command leads to buggy code:

But once again, after we make our instruction more detailed, it gets much better!

Let’s also regenerate our predictions:

And ask Copilot to create a precision-recall curve.

Once again, let’s try prettying it up…

Tufte couldn't have done it better himself.

Building a Toxicity Classifier

Recommend

RocketMQ -- 消息消费队列与索引文件

手把手带你走进Babel的编译世界

英方软件科创板IPO，拟募资5.74亿元，依赖金融行业客户营收

历时 37 年，Windows 1.0 复活节彩蛋终于曝光：主角竟是“G 胖”！

疫情蔓延，物流难通，多家电商推出抗疫措施以求突围

5 Myths About Nail Care That You Should Totally Not Believe In

数据库与缓存双写一致性

Three.js 火焰效果实现艾尔登法环动态logo 🔥

On node-ipc and the importance of trusting trust

Python Basics: Code Your First Python Program

About Joyk