I have finished what is, by any objective measure, a completely ridiculous side project.
First things first: here is the result, and here's the GitHub repo that's running it. (Although that repo doesn't include some of the code, for reasons I'll explain below.)
It does produce some interesting ideas. For example:

- Questionable jumping advice
- Modern geometry is so important
- Relatable content
- A realistic simulation of commenters on the internet
Why did I do this?
I was listening to an episode of the Pinkbike podcast (Pinkbike is a popular mountain biking news website and community) about bike marketing failures. During a discussion of broader marketing flops, Microsoft's racist "Tay" AI chatbot came up, and the hosts joked about what an AI might learn from Pinkbike commenters.
As internet forums go, Pinkbike is actually quite tame, but I thought: that’s an interesting challenge! I was also curious to see whether a text-generation model trained on Pinkbike comments would recreate some of the common jokes about Pinkbike commenters: that they always claim to be incredibly skilled riders, that every bike “looks like a Session,” etc.
Mostly, though, I thought it would be an interesting technical challenge, involving quite a few things I’ve never done before.
What did I do?
First, I scraped a lot of comments from Pinkbike. Hundreds of thousands of them.
(That's the code that's not in my GitHub repo. Although I used Python's time.sleep() function to add a lengthy pause between page loads, and caching to ensure each request was only made once even if I needed to rewrite some of the code, it would be very easy to strip those safeguards out and put a lot of stress on Pinkbike's servers.)
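For illustration, the throttle-and-cache pattern looks something like this. This is a minimal sketch of the safeguards, not the actual scraper; the URL handling, cache, and delay are placeholders:

```python
import time
import requests

cache = {}  # URL -> HTML, so rewritten code never re-fetches a page

def fetch(url):
    """Fetch a page at most once, with a long pause after each real request."""
    if url not in cache:
        cache[url] = requests.get(url).text
        time.sleep(10)  # lengthy pause between pageloads to go easy on the server
    return cache[url]
```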
Once I had collected the comments, I filtered them to make the end results more interesting. Since Pinkbike comments are pretty similar to social media posts, I used the vaderSentiment library (which is optimized for social media text) to score each comment for emotion, then kept the most high-emotion comments, both positive and negative. (Specifically, I kept comments with a Vader compound score above 0.5 or below -0.5.)
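In code, the filtering step looks roughly like this (a sketch with made-up sample comments; the 0.5 / -0.5 thresholds are the ones I actually used):

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

comments = [
    "This bike absolutely rips, best purchase I've ever made!!!",
    "Meh, it's a bike.",
    "Worst customer service I have ever experienced. Never again.",
]

def is_high_emotion(text):
    # Vader's compound score runs from -1 (most negative) to +1 (most positive)
    compound = analyzer.polarity_scores(text)["compound"]
    return compound > 0.5 or compound < -0.5

high_emotion = [c for c in comments if is_high_emotion(c)]
```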
With my dataset of “high emotion” Pinkbike comments, I then needed to train a model to generate text. GPT-2 seemed like the best available option (the newer GPT-3 is much better, but not available to the public).
I used Max Woolf’s Google Colab notebook to train the model and to generate comments. This method was both the easiest and the fastest, since Max has already written almost all of the necessary code, and since running it through a Colab notebook allows the model training to happen on super-fast Google GPUs. Doing it locally would have been much slower.
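For anyone curious what that notebook is doing under the hood, the gpt-2-simple workflow boils down to a few calls like these (a sketch; the dataset filename and step count are illustrative, not necessarily the settings I used):

```python
import gpt_2_simple as gpt2

# Download the smallest (124M-parameter) GPT-2 model
gpt2.download_gpt2(model_name="124M")

sess = gpt2.start_tf_sess()

# Fine-tune on the filtered comments (a plain text file, one comment per line)
gpt2.finetune(sess,
              dataset="high_emotion_comments.txt",  # illustrative filename
              model_name="124M",
              steps=1000)  # illustrative step count

# Generate a batch of candidate comments to screen by hand
gpt2.generate(sess, nsamples=20)
```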
Once the comments are generated, I've been manually picking the best ones and adding them to a CSV. This is the only non-automated step in the process, but it's necessary. For one thing, most of the comments GPT-2 comes up with aren't interesting. And like any model trained on internet comments, it occasionally produces things that are racist, sexist, or otherwise horrifying, and I have no interest in putting any of that out into the world.
The final step was to write the actual "bot", which in this case is just a script that picks three of the model-generated comments, checks that they haven't already been posted to Twitter, and then posts them to the @ExcitedPinkbike account with a 45-minute delay between each.
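The core of that script is pretty small. Here's a stripped-down sketch using Tweepy; the credentials, filenames, and uniqueness bookkeeping are placeholders rather than my actual code:

```python
import csv
import random
import time

import tweepy

# Authenticate with the Twitter API (placeholder credentials)
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

# Hand-picked comments, one per row in a CSV
with open("approved_comments.csv") as f:
    candidates = [row[0] for row in csv.reader(f) if row]

# Simple record of everything that's already been tweeted
with open("already_posted.txt") as f:
    posted = set(f.read().splitlines())

# Pick three comments that haven't been posted before
fresh = [c for c in candidates if c not in posted]
for comment in random.sample(fresh, 3):
    api.update_status(comment)  # Tweepy's v1.1 "post a tweet" call
    with open("already_posted.txt", "a") as f:
        f.write(comment + "\n")
    time.sleep(45 * 60)  # 45-minute gap between tweets
```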
I've also been version-tracking the entire project with Git, pushing commits to GitHub from the command line, and running the script from the command line too. This is really baseline, everyone-does-that type stuff, but since I started learning Python for data analysis, most of my previous work has lived in Jupyter Notebook files that I just uploaded via GitHub's web interface. This project was a nice opportunity to practice what I learned about the command line and Git from Dataquest's courses.
At the moment, I'm essentially just running this script manually, whenever I feel like it. At some point, I may automate that element too. However, since the comment-picking has to be done by a human (to ensure that what gets posted is at least potentially interesting, is not racist, etc.), there's no way to fully automate this process.
What did I learn?
Number one: none of this stuff is that hard. What I’ve done sounds very complicated and cool, with things like “sentiment analysis” and “AI.” But in reality, it’s not nearly as complicated or difficult as it sounds. There are Python libraries that make every step of the process much easier, and of course StackOverflow and Github exist to offer solutions to most problems.
Number two: “AI” is neither “A” nor “I”. Admittedly, I already knew this (here’s a good read on it), but building a very simple “AI” project like this showed me firsthand just how much of what gets presented as “AI” is influenced by humans at every step of the process.
In this project, for example: humans wrote the initial text corpus used to create GPT-2, and also all of the Pinkbike comments I trained my model with. Humans wrote the vaderSentiment algorithm that decided which comments were emotional and which were not. A human (me) filters the model-generated comments to pick only the interesting ones for actual publication. With so many people in the process, it can hardly be called “artificial,” and it’s certainly not an “intelligence.”
Number three: any “AI” that’s trained with human-generated data is going to reflect human biases. Humans are racist, so if you feed a machine tens of thousands of comments and tell it to imitate them, which is essentially what I did, you’re going to get some racist ones.
In this particular context, it's essentially harmless, since nothing bad that my model comes up with will ever be published or seen by anyone other than me. But watching the "Pinkbike commenter" model generate the occasional racist comment has given me firsthand experience with something I was already quite worried about: algorithmic bias. (If that's not a topic you're familiar with, I'd suggest reading up on it; I believe there's also a good Netflix documentary about it. Then keep it in mind every time you see headlines about "AI".)
Basically, algorithms are driven by data, and data is chosen and influenced by humans. The output of an algorithm will thus reflect (and sometimes magnify) human biases, even if it’s presented as “unbiased” since the algorithm itself is not a human.
Number four: Dataquest is right. Since I don’t work there anymore, I can now share my unbiased opinion, which is that Dataquest is the best platform for learning programming skills, and that what they preach about learning through personal projects to keep yourself motivated actually works. I would never have finished this project if it hadn’t been my own idea, and hadn’t been related to my mountain-bike obsession. But because I did, I:
- Learned how to use gpt-2-simple
- Learned how to create a Twitter bot using Tweepy
- Got good practice using Git and GitHub for a real project
- Got good practice using the command line
- Got good practice using BeautifulSoup, vaderSentiment, etc.
- Got good practice using Atom and various script files to write and run Python code (as opposed to a Jupyter notebook)
- And more!
What’s next?
Well, probably whatever I need to learn for my new job, which starts in less than a week! But on this project, I may add a feature that lets the script @-mention some of Pinkbike's most famous folks when they come up in the comments the model generates (which happens pretty frequently).