Visualizing Toxicity in Twitter Conversations

In late June, Deb Roy approached me to ask if I would be interested in doing some visualization work for a presentation he’d be giving at Twitter’s all-hands event with Bridgit Mendler. The talk was to be about toxicity in Twitter conversations, and would showcase the conversation analysis work being done in the lab, particularly around classifying replies as toxic or not.

At this point, any time Deb asks if I want to visua... I stop him right there and say yes. Yes please. When can I get started? (Check out some of his previous visualizations in his fantastic TED talk.)

The project started with an initial design discussion in which we all agreed it would be cool to somehow visualize Twitter conversations as natural-looking trees, where replies form branches and the more toxic the reply, the more withered the branch would look. At this point, I had no idea how I’d even approach rendering a withered tree, but it sounded like a fun experiment, so I said I’d look into it and do my best.

One of the benefits of working at Cortico in the MIT Media Lab is that a ton of interesting people are always coming through. As luck would have it, a few days after our meeting I was showing seasoned visual FX artist Eugénie von Tunzelmann around the lab. She was intrigued by the idea of the conversation trees and thought I might enjoy using Houdini to try and model it, suggesting its procedural nature would appeal to my engineering background. A day after our meeting she was even so kind as to send over a sample file for an approach that might work.

Eugénie’s sample render of a partially withered tree

It looked pretty sweet and I was convinced it would be fun to learn Houdini for this project. My naive past self was exceptionally optimistic about this despite the deadline for the project being just a few weeks away. Thankfully MIT offers access to Lynda.com courses and there were a couple on Houdini that I completed to get a crude, basic understanding of the application.

With Eugénie’s sample file, Lynda.com’s courses, and the online Houdini help pages at the ready, I began my journey modeling the toxicity of conversations on Twitter. I broke it down into three steps to ease my anxiety:

Figure out how to lay out the tree
Figure out how to render a tree in Houdini based on real data
Create a video showing multiple conversation trees growing

Lay out the Conversation Tree

At Cortico and the Lab for Social Machines (our group in the Media Lab), it’s pretty common to spend time thinking about, looking at, or creating network graphs. Every time we’ve rendered graphs in 3D, however, we’ve used a force-directed layout. Given that this data had a bit more structure (it’s a tree), I thought I’d look for other interesting 3D layouts to try.

The first thing I found was a paper from 1995 by an old professor of mine, Tamara Munzner! Sweet. But it was about hyperbolic spaces, which was a bit beyond what I was interested in. However, there was a great figure demonstrating a 3D cone layout that looked very promising. Thanks, Tamara!

Tree Cone Layout (figure taken from Tamara’s paper)

Next I had to figure out how to apply the layout to our data. I found a very extensive Python graphing library called Tulip that had algorithms for laying out graphs in dozens of ways, including the cone layout. Jackpot. With a little finesse I was able to take our conversation tree data and output JSON files that included the nodes, links, 3D positions, and toxicity scores. With these pre-computed files, all I’d need to do was get Houdini to render objects in their positions as specified in the data.
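To make the shape of those pre-computed files concrete, here is a minimal sketch in plain Python. It does not use Tulip: the `cone_layout` function is a deliberately simplified stand-in that places each tweet's replies on a circle one level above it, and the JSON field names (`nodes`, `links`, `id`, `position`, `toxicity`, `source`, `target`) and the sample scores are assumptions for illustration, not the format actually used in the project.

```python
import json
import math


def cone_layout(children, root, radius=1.0, level_height=1.0, shrink=0.5):
    """Return {node_id: (x, y, z)} for a simplified cone-style tree layout.

    Each node's replies sit on a circle of the given radius, one level
    above the node; the radius shrinks at every level.
    """
    positions = {root: (0.0, 0.0, 0.0)}
    stack = [(root, radius)]
    while stack:
        parent, r = stack.pop()
        px, py, pz = positions[parent]
        kids = children.get(parent, [])
        for i, kid in enumerate(kids):
            angle = 2.0 * math.pi * i / len(kids)
            positions[kid] = (
                px + r * math.cos(angle),
                py + level_height,
                pz + r * math.sin(angle),
            )
            stack.append((kid, r * shrink))
    return positions


def conversation_to_dict(replies, toxicity, root):
    """Bundle nodes, links, 3D positions, and toxicity scores together."""
    children = {}
    for parent, child in replies:
        children.setdefault(parent, []).append(child)
    positions = cone_layout(children, root)
    return {
        "nodes": [
            {"id": n, "position": list(positions[n]), "toxicity": toxicity[n]}
            for n in positions
        ],
        "links": [{"source": p, "target": c} for p, c in replies],
    }


# Made-up conversation: (parent tweet, reply) pairs and toxicity scores.
replies = [("t0", "t1"), ("t0", "t2"), ("t1", "t3")]
toxicity = {"t0": 0.05, "t1": 0.10, "t2": 0.85, "t3": 0.40}
tree = conversation_to_dict(replies, toxicity, "t0")
print(json.dumps(tree, indent=2))
```

A real layout library does much more work than this (for example, sizing each cone so that subtrees do not overlap), but the output has the same flavor: one record per tweet with a position and a score, plus a list of reply links.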

My first approach was to bang my head on the wall several times, but it turned out that a more effective means of moving forward was reading documentation and trying things in Houdini. Houdini has great Python integration, so I was able to write a bit of code that generated geometry (points and lines) based on the data. Thanks to Eugénie’s example, I was able to figure out how to use the toxicity parameter in the data to color the nodes. I really owe her a beer… or five hundred.
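For a sense of what that kind of code can look like, here is a rough sketch rather than the code from the project. It assumes a hypothetical JSON layout with `nodes` (each with `id`, `position`, and `toxicity`) and `links` (each with `source` and `target`), and it assumes the geometry object passed in behaves like a `hou.Geometry` that already has a float point attribute named `toxicity`.

```python
def add_tree_geometry(geo, data):
    """Add one point per tweet and one open polyline per reply link.

    `geo` is expected to be a hou.Geometry with a float point attribute
    named "toxicity" already added. `data` follows the hypothetical JSON
    layout described above.
    """
    points = {}
    for node in data["nodes"]:
        point = geo.createPoint()
        point.setPosition(node["position"])  # sequence of 3 floats
        point.setAttribValue("toxicity", node["toxicity"])
        points[node["id"]] = point

    for link in data["links"]:
        line = geo.createPolygon()
        line.setIsClosed(False)  # an open line, not a filled face
        line.addVertex(points[link["source"]])
        line.addVertex(points[link["target"]])
    return points


# Inside a Python SOP, the call might look like this (the path is made up):
#
#   import json
#   geo = hou.pwd().geometry()
#   geo.addAttrib(hou.attribType.Point, "toxicity", 0.0)
#   with open("/path/to/conversation.json") as f:
#       add_tree_geometry(geo, json.load(f))
```

Storing toxicity as a point attribute keeps the score attached to the geometry, so downstream nodes can read it when deciding how to color or deform each tweet.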

I didn’t want to settle on the cone layout without first trying a few others, so I generated several different JSON files with the layouts and began to explore how they’d look. Here green corresponded to “healthy” and red to “toxic”. Each tweet was represented as a sphere, and a line was drawn between tweets to indicate that one had replied to the other.
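The green-to-red encoding can be sketched as a simple blend between the two colors. This is an assumption about the mapping, not the exact ramp used in the renders: it treats toxicity as a score between 0 and 1 and interpolates linearly, and the function name `toxicity_to_rgb` is made up for illustration.

```python
def toxicity_to_rgb(toxicity):
    """Map a toxicity score in [0, 1] to an (r, g, b) color.

    0.0 gives pure green ("healthy"), 1.0 gives pure red ("toxic"),
    and values outside the range are clamped.
    """
    t = min(max(toxicity, 0.0), 1.0)
    return (t, 1.0 - t, 0.0)


for score in (0.0, 0.25, 1.0):
    print(score, toxicity_to_rgb(score))
```

In Houdini, a tuple like this would typically be written to the point color attribute `Cd` on each sphere.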
