After doing this week’s readings by Nate Silver, I went to FiveThirtyEight to check out some of the articles. I came across an interesting one by Dan Hopkins titled “Political Twitter Is No Place For Moderates.” Four researchers from the University of Pennsylvania, one of whom is Dan Hopkins, investigated who discusses politics on Twitter. You can find their study, “Beyond Binary Labels: Political Ideology Prediction of Twitter Users” here.
In order to find out who talks about politics on Twitter, the researchers downloaded 4.8 million tweets from 3,938 Twitter users. They asked each user to rate themselves on a 7-point scale: Very Conservative, Conservative, Moderately Conservative, Moderate, Moderately Liberal, Liberal, or Very Liberal. Then they took the 12,000 most common words across all the tweets and coded each one as either political or non-political. Examples of political words, as defined by these researchers, include “president,” “racism,” and “Romney.”
After sorting words into political and non-political, the researchers went back through the tweets (with an algorithm, not by hand) to determine which ideological groups used political words most frequently. The chart below, which comes from the FiveThirtyEight article, shows their findings.
You’ll notice that they broke the political words down into further categories: media/pundit names, politician names, and common political words. Visually, the “common political words” data is what jumps out at you. There is a definite “C” pattern, which seems to strongly suggest that people with more extreme political views are much more likely to talk about politics on Twitter. However, when you look closer, you start to notice that the scale of the graph, and not the actual data, might be doing most of the work. Moderates’ sample tweets included political words 0.36% of the time, while those of very conservative and very liberal users included political words 0.76% of the time, a difference of only 0.40 percentage points. That difference is small, but it looks pretty large when the X-axis only goes from 0.00% to 1.00% in 0.25% increments.
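To make the arithmetic concrete, here is a quick Python sketch using the two rates quoted from the FiveThirtyEight chart (0.36% for moderates, 0.76% for the two extreme groups). It shows that the same gap can be described as an absolute difference in percentage points or as a relative ratio, which is part of why the way a chart frames it matters. The variable names are my own, and the numbers are only the ones read off the chart.

```python
# Rates quoted from the FiveThirtyEight chart, as a percentage of tweeted words.
moderate_rate = 0.36  # moderates
extreme_rate = 0.76   # very conservative and very liberal users

# Absolute gap, measured in percentage points (rounded to hide float noise).
gap_points = round(extreme_rate - moderate_rate, 2)

# Relative gap: how many times as often the extreme groups use political words.
ratio = extreme_rate / moderate_rate

print(f"Absolute difference: {gap_points:.2f} percentage points")
print(f"Relative difference: {ratio:.2f}x")
```

Running it prints an absolute difference of 0.40 percentage points and a relative difference of about 2.11x.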
What is interesting to me about this graphic is that it does not appear in the researchers’ published study. Instead, the study uses this graph:
What you’ll notice right away is the extra data point on each end: D2: Con. and D2: Lib. These data points were collected from a completely different set of Twitter users with “overt political orientation.” To fall into these “D2” categories, users had to meet specific criteria. Essentially, the D2 users are extremely liberal or conservative, and they are included for comparison. With their data included, the axis runs from 0.00% to 3.00% in 0.50% increments, and the “C” looks much more defined. However, if you ignore the two end data points and look only at the groups that appear on the first graphic, the “C” has a lot less curve. On this graph, most of the curve comes from the two extreme points at the ends.
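A rough way to see how much the axis range changes the picture is to compute what fraction of the axis width the gap between moderates (0.36%) and the extreme groups (0.76%) takes up on each chart. This sketch assumes both charts use a linear axis starting at 0.00%, uses the axis maximums described above (1.00% and 3.00%), and assumes the study figure plots the same two rates for the self-reported groups. The helper `axis_share` is a hypothetical name of my own.

```python
def axis_share(low, high, axis_min, axis_max):
    """Fraction of a linear axis's width that separates two values."""
    return (high - low) / (axis_max - axis_min)


moderate_rate = 0.36  # percent of words, moderates
extreme_rate = 0.76   # percent of words, very conservative / very liberal

# FiveThirtyEight chart: axis from 0.00% to 1.00% (assumed linear).
share_538 = axis_share(moderate_rate, extreme_rate, 0.0, 1.0)

# Study figure with the D2 points: axis from 0.00% to 3.00% (assumed linear).
share_study = axis_share(moderate_rate, extreme_rate, 0.0, 3.0)

print(f"FiveThirtyEight axis: gap spans {share_538:.0%} of the width")
print(f"Study axis: gap spans {share_study:.0%} of the width")
```

Under these assumptions, the same 0.40-point gap fills about 40% of the FiveThirtyEight axis but only about 13% of the study’s axis.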
These graphics interest me because I wonder why Dan Hopkins changed the graph for the FiveThirtyEight article. Did he think it would be too complicated to explain the D2 data? I also wonder if the graph is misleading. I think it is, mostly because Hopkins just plopped it into his article with no explanation. There is no analysis of the data points and no discussion of whether a difference of 0.40 percentage points is significant. For me, this was a good reminder that I have to be diligent even on sites like FiveThirtyEight that I think are reputable and responsible.