Building a no-code toxicity classifier by talking to GitHub Copilot
source link: https://www.surgehq.ai/blog/building-a-no-code-toxicity-classifier-by-talking-to-copilot
Last year, GitHub and OpenAI released Copilot, an AI-powered coding assistant built on OpenAI Codex, a descendant of GPT-3. It's incredible – even the Harvard and Google engineers on our team find it speeds up a lot of their work.
But as helpful as it is for coders, what if it enabled non-engineers to program too – by merely talking to an AI about their goals? Let’s walk through an example.
Building a Toxicity Classifier
So imagine you want to build a machine learning model to classify toxic speech. You understand ML concepts at a high level, but you’re not a Python expert. We’ll show how Copilot can help!
Each time we want to issue Copilot a command, we’ll write our directive as a comment (in green); Copilot's actions are the subsequent lines of code.
So let’s create a Jupyter notebook and start by telling Copilot to import all the programming libraries we need to train a toxicity classifier. We write this comment in green:
Its first response is to "import numpy as np"!
We let it continue until it starts repeating itself, leading to the following:
All 28 imports were generated by Copilot. Next, we ask it to read in our dataset.
Remember: our only contribution will be the green #comments – Copilot created the toxicity = pd.read_csv("toxicity.csv") line itself.
What does this dataset look like? We ask it to display the file...
...and it successfully responds by showing the first 5 rows.
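The notebook screenshots aren't reproduced here, so the following is only a sketch of what these two steps might look like, not Copilot's actual output. It uses a small in-memory stand-in for toxicity.csv; the rows are made up, and the `text` column name is an assumption (only the `is_toxic` column is named in this post).

```python
import io

import pandas as pd

# Made-up stand-in for toxicity.csv; the "text" column name is an assumption.
fake_csv = io.StringIO(
    "text,is_toxic\n"
    "You are an idiot,Toxic\n"
    "Have a great day,Not Toxic\n"
    "Nobody likes you,Toxic\n"
    "Thanks for the help,Not Toxic\n"
    "What a lovely photo,Not Toxic\n"
    "See you tomorrow,Not Toxic\n"
)

# read in the dataset (the real notebook calls pd.read_csv("toxicity.csv"))
toxicity = pd.read_csv(fake_csv)

# display the first 5 rows
first_rows = toxicity.head()
print(first_rows)
```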
How many examples are in each class?
Copilot responds, unfortunately, with a line of code that doesn’t quite work – there’s no 'target' column or variable in our dataset.
It would be easy to fix this by replacing 'target' with 'is_toxic', but where’s the fun in that? Does it help if we make our request more explicit?
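A more explicit comment that names the column could plausibly produce something like the sketch below. This is our own illustration rather than the code from the notebook, and the tiny data frame (including the `text` column name) is invented so the example runs on its own.

```python
import pandas as pd

# Made-up rows standing in for the real dataset; "text" is an assumed column name.
toxicity = pd.DataFrame(
    {
        "text": [
            "You are an idiot",
            "Have a great day",
            "Nobody likes you",
            "Thanks for the help",
            "What a lovely photo",
        ],
        "is_toxic": ["Toxic", "Not Toxic", "Toxic", "Not Toxic", "Not Toxic"],
    }
)

# count the number of examples in each class of the is_toxic column
class_counts = toxicity["is_toxic"].value_counts()
print(class_counts)
```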
We can even ask it to create a bar chart:
Great, but it’s visually bland. Can Copilot pretty it up?
Let’s ask it to flip the axes, so that the bars are horizontal.
That didn’t work, but let’s modify our comment and try again:
Can we also change the color of the bars?
Let’s give the bar chart a title too.
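For readers without the screenshots, here is a rough sketch of where a conversation like this might end up: a horizontal bar chart with a custom color and a title. The counts, the color, and the title text are all placeholders we picked for illustration, and the `Agg` backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed inside a Jupyter notebook

import pandas as pd

# Placeholder class counts, not the real dataset's numbers.
class_counts = pd.Series({"Not Toxic": 3, "Toxic": 2})

# plot a horizontal bar chart, change the bar color, and add a title
ax = class_counts.plot(
    kind="barh",
    color="purple",
    title="Number of examples per class",
)
ax.set_xlabel("Count")
```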
Who needs highly paid data scientists now?
Again, Copilot created all of this code simply through a “conversation” with it! As someone who doesn’t generate many data visualizations in Python, I couldn't easily have done this myself without a lot of Googling and reading documentation.
Imagine, in the future, Copilot creating this bar chart (and other insights) from a higher-level command: “analyze the dataset and produce useful insights”. In the same way that GPT-3 can understand important aspects of text, what if Copilot could automatically detect the important columns in a data frame, and predict that we might want to see the class balance or a precision-recall curve?
In our work with research labs developing large language models, we build advanced “labeling” teams of programmers and mathematicians who train their AI systems to do exactly this. In other words, teams of highly skilled STEM “labelers” on our platform are given commands (or come up with commands themselves) and write Python programs that solve them, which serve as training demonstrations.
Moving on, let’s now ask Copilot to build a toxicity ML model using the dataset.
It produced the following as a result – both the subsequent comments ("# using a Naive Bayes classifier" onwards) as well as the pipeline code!
I don’t know what a Pipeline is, but let’s ask Copilot to use it to classify regardless.
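For anyone equally unsure what a Pipeline is: in scikit-learn it chains preprocessing steps and a final model into one object, so a single `fit` or `predict` call runs the whole sequence. The sketch below is our guess at the general shape of a Naive Bayes text pipeline, not the code Copilot generated; the choice of `CountVectorizer`, the step names, and the training sentences are all assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up training examples, for illustration only.
texts = [
    "you are an idiot",
    "what a stupid idea",
    "I hate you, idiot",
    "have a great day",
    "thanks for your help",
    "what a lovely photo",
]
labels = ["Toxic", "Toxic", "Toxic", "Not Toxic", "Not Toxic", "Not Toxic"]

# Step 1 turns raw text into word counts; step 2 is the Naive Bayes classifier.
pipeline = Pipeline(
    [
        ("vectorizer", CountVectorizer()),
        ("classifier", MultinomialNB()),
    ]
)

# Fitting the pipeline fits every step in order.
pipeline.fit(texts, labels)

# Predicting sends new text through the same steps.
prediction = pipeline.predict(["you are a stupid idiot"])
print(prediction[0])
```

Bundling the vectorizer with the model means new text is always transformed the same way the training text was.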
When I don’t know what to do next (what do I use a pipeline for?), and Copilot doesn’t automatically create additional continuations, I simply create a new cell, write #, and let Copilot respond – letting it predict what I might want to ask!
In this case, it completes the comment with “predict the toxicity of a new text”...
…which it then continues with the following.
Unfortunately for paradoxes, our classifier predicts it correctly!
Can we also get Copilot to create some test cases for us to try?
Unfortunately, it doesn’t do a very good job…
…so we’ll give it some examples. When generating the array, it even creates the ideal variable name and escapes the quotations.
Let’s test it out. Even though I had a typo in my instructions, it gets them all right!
Let’s test it on non-toxic comments too.
Of course, we should measure how the model performs on a holdout test set, so we’ll ask Copilot to split the original dataset into training and test.
It continues with the following (after we create new cells and begin them with an empty comment)!
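The continuation itself was a screenshot, so as a stand-in, here is a minimal sketch of a train/test split followed by retraining and predicting on the held-out rows. The data, the 25% test size, the random seed, and the pipeline steps are assumptions chosen to keep the example small.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up examples, four per class.
texts = [
    "you are an idiot",
    "what a stupid idea",
    "I hate you",
    "nobody likes you",
    "have a great day",
    "thanks for your help",
    "what a lovely photo",
    "see you tomorrow",
]
labels = ["Toxic"] * 4 + ["Not Toxic"] * 4

# split the dataset into training and test sets, keeping the class balance
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# train on the training set only, then predict on the held-out test set
pipeline = Pipeline(
    [("vectorizer", CountVectorizer()), ("classifier", MultinomialNB())]
)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
```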
Let’s ask Copilot how well the model performs.
It even automatically suggests printing a classification report next!
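As a rough illustration of this step (again, not the notebook's actual cells), accuracy and a classification report can be computed from the true and predicted labels with scikit-learn's metrics module. The label lists below are invented.

```python
from sklearn.metrics import accuracy_score, classification_report

# Invented true and predicted labels; one toxic example is missed.
y_test = ["Toxic", "Not Toxic", "Toxic", "Not Toxic", "Toxic"]
y_pred = ["Toxic", "Not Toxic", "Not Toxic", "Not Toxic", "Toxic"]

# fraction of predictions that match the true label
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)

# per-class precision, recall, and F1
report = classification_report(y_test, y_pred)
print(report)
```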
Can we ask Copilot to plot a precision-recall curve?
Unfortunately, its first attempt fails:
One interesting way around this: OpenAI and Google have been developing research systems where they give AI models access to search engines so that they can gather information by browsing the web. What if Copilot could search for the error string, discover a relevant Stack Overflow solution, and use that to fix itself?!
In the meantime, the error seems to come from y_test and y_pred being arrays of strings (“Toxic”, “Not Toxic”), so let’s try binarizing them.
Our first command leads to buggy code:
But once again, after we make our instruction more detailed, it gets much better!
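One plausible form the working version could take is sketched below: map the string labels to 1 and 0, treating “Toxic” as the positive class. Whether the notebook's labels were a pandas Series or a NumPy array isn't visible here, so the sketch shows both; the values are made up.

```python
import numpy as np
import pandas as pd

# Made-up labels: y_test as a pandas Series, y_pred as a NumPy array.
y_test = pd.Series(["Toxic", "Not Toxic", "Toxic", "Not Toxic"])
y_pred = np.array(["Toxic", "Not Toxic", "Not Toxic", "Not Toxic"])

# convert "Toxic" to 1 and "Not Toxic" to 0
y_test_binary = y_test.map({"Toxic": 1, "Not Toxic": 0})
y_pred_binary = (y_pred == "Toxic").astype(int)

print(y_test_binary.tolist())
print(y_pred_binary.tolist())
```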
Let’s also regenerate our predictions:
And ask Copilot to create a precision-recall curve.
Once again, let’s try prettying it up…
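The final chart was an image, so here is a hedged sketch of the general recipe instead. scikit-learn's `precision_recall_curve` takes binary true labels plus a score per example (for instance, predicted probabilities of the positive class) and returns the points of the curve, which matplotlib can then plot. The labels, scores, color, and title below are placeholders, and the `Agg` backend line is only needed outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed inside a Jupyter notebook

import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# Placeholder binary labels (1 = toxic) and model scores for the positive class.
y_true = [1, 0, 1, 1, 0, 0]
y_scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7]

# compute precision and recall at every score threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# plot the curve with axis labels and a title
fig, ax = plt.subplots()
ax.plot(recall, precision, color="purple")
ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
ax.set_title("Precision-Recall Curve")
```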
Tufte couldn't have done it better himself.