Investigating the Impact of Twitter Bots on the 2020 U.S. Election’s Political Discourse
A non-technical example of using Natural Language Processing techniques to group unstructured text documents into categories
By Elliot Wilens
The 2020 Presidential Election took place during a delicate time for most Americans. In the absence of social gatherings, many joined Twitter to share their opinions (an annual increase of 23 million users¹). As a result of this separation, and in conjunction with social unrest over racial inequality (#blacklivesmatter) and several other factors, the United States became as polarized as ever. Well, not ever, but you get the point.
Misinformation campaigns exist. Hell, today, even fake people exist (at least in the Twitter world). Those are called Twitter Bots, and I wanted to see just how much, if at all, they impacted the political discourse on Twitter.
Analyzing the Election discourse on Twitter
I first took a large sample of about 1 million tweets containing the words ‘Trump’ or ‘Biden’ from October 1 to November 2, 2020 (the election was on November 3rd). After filtering out some duplicates and shorter tweets (under 100 characters), I wound up with 363,000 Tweets.
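The deduplication and length filter can be sketched in a few lines of pandas. The toy DataFrame below stands in for the real sample (which had roughly 1 million rows); the column name and example strings are invented for illustration:

```python
import pandas as pd

# Toy stand-in for the raw sample; the real dataset had roughly 1 million rows
tweets = pd.DataFrame({
    "text": [
        "Short tweet about the election",
        "A much longer tweet about the candidates and the polls, " * 3,
        "A much longer tweet about the candidates and the polls, " * 3,  # exact duplicate
    ]
})

# Drop exact duplicates, then drop tweets under 100 characters
deduped = tweets.drop_duplicates(subset="text")
filtered = deduped[deduped["text"].str.len() >= 100].reset_index(drop=True)

print(len(filtered))  # 1
```

The same two steps, applied to the full sample, are what took the data from about 1 million tweets down to 363,000.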
I used several Natural Language Processing techniques to cluster Tweets into categories. Essentially, this was done based on the words used in each Tweet relative to the words used in all Tweets. I’ll refer to these categories as topics, but you can think of them as ‘themes’. This was a pretty involved task, but I’ll keep it non-technical in this post. If you’re interested, check out the Python code on my GitHub page. There, I also explain the techniques used to discover the topics using methods like Lemmatization, n-grams, TF-IDF Vectorization, and Non-Negative Matrix Factorization.
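That idea of weighting “the words used in each Tweet relative to the words used in all Tweets” is TF-IDF in a nutshell. Here is a minimal standard-library sketch of the weighting on a few toy tweets; the real pipeline on GitHub also applies lemmatization, n-grams, and NMF on top of this:

```python
import math
from collections import Counter

# Toy corpus standing in for the tweet sample
docs = [
    "biden leads trump in the polls",
    "trump rally crowd ignores covid risk",
    "covid is a hoax",
]

tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many tweets does each word appear?
df = Counter(w for doc in tokenized for w in set(doc))

def tfidf(doc):
    """Weight each word by its frequency in this tweet (TF) times how
    rare it is across all tweets (IDF)."""
    tf = Counter(doc)
    return {w: (tf[w] / len(doc)) * math.log(n_docs / df[w]) for w in tf}

weights = tfidf(tokenized[1])
# 'rally' (unique to this tweet) outweighs 'covid' (appears in two tweets)
print(weights["rally"] > weights["covid"])  # True
```

Words that appear in nearly every tweet (‘Trump’, ‘Biden’) get pushed toward zero, while words distinctive to a tweet dominate, which is what lets a factorization method like NMF pull coherent topics out of the matrix.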
My model wasn’t perfect (no model is), and I manually named the topics based on the words and Tweets that were most associated with each grouping. The names are meant to generalize the overarching concepts behind the tweets in each category. Below is a visualization of each of the 30 topics discovered. The sizing of each rectangle indicates the number of tweets representing the topic.
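The manual naming step amounts to reading off the highest-weighted words for each topic. A small sketch, using invented topic-term weights of the kind a factorization like NMF produces (the words and numbers here are hypothetical):

```python
# Hypothetical topic-term weights, like rows of the matrix NMF produces
# (words and values invented for illustration)
topic_terms = {
    "topic_0": {"covid": 0.9, "hoax": 0.8, "trump": 0.3, "polls": 0.05},
    "topic_1": {"biden": 0.7, "polls": 0.6, "swing": 0.5, "covid": 0.1},
}

def top_words(weights, n=3):
    """The n highest-weighted words for a topic -- what a human reads
    when choosing a label like 'Covid is a hoax'."""
    return [w for w, _ in sorted(weights.items(), key=lambda kv: -kv[1])[:n]]

for topic, weights in topic_terms.items():
    print(topic, top_words(weights))
# topic_0 ['covid', 'hoax', 'trump']
# topic_1 ['biden', 'polls', 'swing']
```

Pairing these top words with the tweets most strongly assigned to each topic is what made the hand-labeling tractable for all 30 topics.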
Of all the topics, the most popular (in terms of number of Tweets) was negative sentiment towards Trump supporters. Here are the top five topics, in terms of Tweet frequency, along with some Tweets that are most associated with each topic (according to my model). I’ll keep the authors of the Tweets below anonymous (note: none of these are my words):
“What’s that word? It starts with a C? Where everything you do is controlled? Like old Russia? Oh, yeah, communist. That’s it. Yeah, Trump supporters are so dumb cause they say they hate that but Trump does everything like a communist and they just love him soooo much”
Covid is a hoax
“@shootsfromhip @charliekirk11 Don’t you get it? Trump is showing you the virus is b/s. He’s pulling down the veil before your eyes and you still can’t see it.”
Trump Rally COVID infections
“@darklight60d @CityNews @meldug Biden doesn’t hold rallies, because he doesn’t want his supporters to risk COVID. Trump doesn’t give a shit.”
“Election 2020: Biden leads Trump in polls as swing states tighten Is Biden the Atlanta Falcons? 👀 https://t.co/Wfo1nuxpic”
“While censoring Hunter Biden story, Twitter allows China, Iran state media https://t.co/tEuph8Cm7l #FoxNews THEY are paid off by Democrats”
The model did a relatively good job separating Tweets into topics. It did an especially good job identifying some rather obscure topics, such as Negative Campaign Ads. Apparently, the Trump campaign released a series of negative advertisements in the days leading up to the election. This topic captured tweets from Biden supporters urging him to respond to the alleged false claims.
I was able to use Chris Doenlen’s ‘Twitter Bot Detection’² model from a past project of his to classify each Twitter user in my sample as a bot (or not).
Before we continue, let’s define “Twitter bot”. I began without such a definition, which allowed me to build up unrealistic expectations for the project’s outcome. I originally associated the term with, say, a radicalized person who spams online users with redundant, agreeable ‘opinions’, echoed by the thousands of fake accounts they’ve coded up for Twitter in an attempt to impose their beliefs on others.
Maybe ‘The Social Dilemma’ on Netflix got me worked up. I mean, combine several of these people with similar opinions. What do you get? The amplification of social media’s separation of its users into contained social bubbles, and a polarized network. It’s a scary thought: one person posing as thousands of different individuals to spread some form of (mis)information.
The results of this project alleviated much of this concern, for a couple of reasons. First, there were fewer Twitter bots in the discussion than I initially expected: they made up only 6 percent of the discourse.
Second, exploring the suspected bot accounts showed that they are often:
- Established news outlets promoting new content, such as automatically tweeting when a new article is posted (#BreakingNews)
- Service companies posting outage notifications (#ServiceOutage)
- Emergency alert systems (#AmberAlert)
- and much more!
With this, we can concisely define a Twitter bot as follows:
An account whose Tweets are generated dynamically by computer programs for a clear purpose.
Thanks for reading! If you’re interested in seeing how I made this work, feel free to take a look at the GitHub repository here.
¹ (Q3 Monthly Active Users) Twitter Revenue and Usage Statistics (2020)