Are Americans Less Neighborly?

This week, I was reading through The Washington Post when I found an article called “It’s not just Rand Paul’s street: Americans are a lot less neighborly than they used to be.” Christopher Ingraham wrote the article after U.S. Senator Rand Paul was attacked by one of his neighbors. Ingraham argues that this type of behavior isn’t isolated, but part of a trend in America—people are becoming less neighborly.

The writer’s main support for this argument is data from the General Social Survey (GSS). The GSS is a nationwide sociological survey which has been administered since 1972 by the National Opinion Research Center at the University of Chicago. The data from this survey is often used by journalists, politicians, and policymakers.

The article says that “In 2016, the share of Americans who say they ‘never’ socialize with their neighbors hit an all-time high of 34 percent, according to the General Social Survey.” Below this statement was the following graph:

WP Graph

This is the claim that I am specifically interested in—whether or not America has actually become less neighborly. The article includes no link to the GSS data, so I had to find the GSS website by myself (you can find it here). The GSS has a search function where you can enter a search word and find all the data connected to that word. For example, if you type in the word neighbor, the GSS will show you all the surveys which include the word neighbor. You can then click on each question and find the response breakdown for each year that the question was included in the survey.

I went through this process, and I was a little confused because none of the questions asked “How often do you socialize with your neighbor,” or some close variant (here are my search results). I thought that maybe I just didn’t know how to use the search function properly, but then I stumbled across the following question: “How often do you spend a social evening with someone who lives in your neighborhood?” Out of all the questions in the database, this was really the only one dealing with neighbors and social behavior. However, I didn’t think this could be the question that Ingraham was referring to. After all, “How often do you spend a social evening with your neighbor?” and “How often do you socialize with your neighbor?” are different questions. Someone might rarely, or even never, spend a social evening with their neighbor. They might, however, socialize with their neighbor at church or PTA meetings, or play pickleball with them on a Saturday morning.

I thought that The Washington Post article couldn’t possibly be based on this GSS question, until I looked at the data. The graph above shows that 21% of people in 1974 never socialized with their neighbors. This closely matches the breakdown for the question I found on the GSS website: of the 1,484 people surveyed in 1974, 322 said they never spend a social evening with someone who lives in their neighborhood, or about 21.7%. The breakdown of the question from 1974 is below.

GSS 1974

So what about the 2016 data for this question? Of the 2,867 surveyed, 611 said they never spend a social evening with someone who lives in their neighborhood, or, 21.3%. Below is a breakdown of the question for 2016 (you’ll have to ignore all the years in between), as well as the percentages provided by the GSS codebook.1

GSS 2016 / GSS Codebook

21.3% is far from the 34% of Americans that The Washington Post claimed weren’t socializing with neighbors. However, I also noticed that if you take all those who responded “Not Applicable”2 out of the 2016 sample, you get 611 out of 1,888 who never spend a social evening with their neighbor—which ends up being about 32.4%. While The Washington Post graph claimed 34% in 2016, I still think this is a solid piece of evidence that Ingraham was looking at this question and just didn’t include the full sample.
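The arithmetic here is simple enough to check yourself. Here is a minimal Python sketch using the counts from the GSS tables above (the 1,888 figure is just the 2016 total minus the “Not Applicable” responses):

```python
# Sanity-checking the percentages using the raw counts from the GSS tables.
never_1974, total_1974 = 322, 1484
never_2016, total_2016 = 611, 2867
not_applicable_2016 = total_2016 - 1888  # 979 "Not Applicable" responses in 2016

print(f"1974, full sample: {never_1974 / total_1974:.1%}")            # ~21.7%
print(f"2016, full sample: {never_2016 / total_2016:.1%}")            # ~21.3%
print(f"2016, excluding 'Not Applicable': "
      f"{never_2016 / (total_2016 - not_applicable_2016):.1%}")       # ~32.4%
```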

So is The Washington Post’s claim completely bogus? I can’t say for certain, but I think so. Because Ingraham didn’t provide links and I couldn’t follow his path, I suppose it is possible that he was looking at data I couldn’t find. However, I think it is probable that Ingraham took the question about spending a social evening with a neighbor and generalized it to all socialization with neighbors.

In response to my title— “Are Americans less neighborly?”—I think I would say, “I don’t know.” The Washington Post article wasn’t all that helpful, and I don’t think that the GSS question I found is that representative of “neighborliness.” Also, this blog doesn’t address issues of survey methodology such as sample size, the wording of questions, and changes in the survey over the years. I guess what I learned from writing this blog post is to be careful with data collected from surveys, and even more careful with articles based on survey data.

1 There is no codebook for 1974, or I would have provided that information as well.

2 The “Not Applicable” option was not offered in the 1974 GSS.

Could the UDOT-Strava partnership create a WMD?

This week, I was scrolling through Facebook when I saw an article from The Salt Lake Tribune called “Utah is using your social-media data to make roads safer for bikers and pedestrians.” Essentially, the article is about how the Utah Department of Transportation (UDOT) is planning to use data from an app called Strava to improve transportation for cyclists and walkers.

Since I had never heard of Strava before, I looked into the app and its functions (you can find their website here). Strava is an app used by cyclists, hikers, mountain bikers, and runners to track their activity. However, Strava also markets itself as a social media site because users can upload photos, share workouts, and see who is using the app around them. Strava made an agreement to give UDOT the data on its Utah users’ workouts. The data includes the type of workout (e.g., cycling or walking), the user’s route, the speed they traveled, and how long they had to wait at different intersections.

This data from Strava could help provide information that city planners don’t currently have—where, when, and how non-driving commuters are traveling. With this data, UDOT believes it can determine which paths should be prioritized for improvement and upkeep. UDOT also plans to use the data to determine whether some places need separate walking and bike trails. However, the data is not representative of everyone in Salt Lake who walks and bikes. The data only represents those who use the Strava app—people who tend to be younger, wealthier, and into fitness. In many ways, this reminds me of the article that we read called “Big data: are we making a big mistake?” That article talks about Street Bump, Boston’s pothole-reporting app, which had a lot of the same problems that UDOT’s use of Strava might have.
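To make the worry concrete, here is a toy Python simulation (every number in it is invented, not from UDOT or Strava) of how a biased sample can flip which route looks like the priority:

```python
import random

random.seed(0)

# Two hypothetical routes with similar real traffic. Route A runs through a
# neighborhood where Strava use is common; route B does not.
routes = {
    "A": {"actual_riders": 1000, "strava_adoption": 0.20},
    "B": {"actual_riders": 1100, "strava_adoption": 0.02},
}

for name, r in routes.items():
    # Each rider shows up in the Strava data only if they happen to use the app.
    logged = sum(random.random() < r["strava_adoption"]
                 for _ in range(r["actual_riders"]))
    print(f"Route {name}: {r['actual_riders']} actual riders, "
          f"~{logged} visible in the Strava data")

# Route B is actually busier, but route A dominates the data UDOT would see,
# because the people the app doesn't capture simply don't count.
```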

Based on some of the problems that this UDOT-Strava partnership could have, I started to wonder if this could be a weapon of math destruction. In order to determine this, I used Cathy O’Neil’s taxonomy, which looks at opacity, damage, and scale.

First, is the use of the data opaque? This one is a little hard to determine because UDOT hasn’t actually implemented anything yet. In fact, information about the Strava data is hard to come by. In both the Trib article and statements put out by UDOT (like this one), there isn’t any information on how many Utahns use Strava or how much data UDOT will have access to. For example, the Trib article references a study in Salt Lake that found only 2 percent of cyclists use Strava. However, there was no link or further information about this study, and I couldn’t find it anywhere. In this sense, there is some opacity. Additionally, the situation could become more opaque if UDOT isn’t clear about how much the Strava data influences its decisions. However, you could say that UDOT is being transparent because it has informed the public that it intends to use the data.

The next test is whether or not UDOT’s use of this data causes damage. In Cathy O’Neil’s words, “Does the model work against the subject’s interests?” (29). Here, I would argue that the damage runs in the opposite direction: the data serves the people it measures and harms the people it leaves out. Because this data only gives UDOT information about Strava users, UDOT might prioritize the improvement and creation of paths that benefit the people who use Strava. Commuters from lower-income areas who don’t use the Strava app might desperately need improvements to their routes, but the Strava data can’t show that. The damage, then, is done to those whose activity is not captured by the Strava data. I think this still counts as damage because a population is still facing negative impacts.

Finally, what is the scale of the damage? While several states have entered partnerships with Strava, this type of pairing is still not widespread. However, I think the use of this data in Utah could cause a lot of problems within communities. How widespread the damage is, I think, depends on how much UDOT relies on this data. If the Strava data is more of a supplemental tool, then maybe it doesn’t have that much potential for wide-scale damage.

So is UDOT creating a WMD by using Strava data? Until UDOT actually starts implementing the use of Strava data, it will be hard to tell. In particular, it is hard to make a prediction because I couldn’t find any specific information about the data. But I do think the situation could go either way. If UDOT is fairly transparent about the data and how it uses it, I don’t think there is a weapon of math destruction. However, if UDOT doesn’t share any of the data it uses, and it relies heavily on that data to make decisions, I think we would be looking at a WMD.

*The featured image for this post is a screengrab from the Strava Global Heatmap. The Heatmap plots all Strava users’ workouts (unless they have adjusted their privacy settings) on a single map. On the map, you can easily pick out rural areas because they have much lower activity. However, lower-income areas like South L.A. and Chicago’s South Side also have significantly fewer data points. You can find the heatmap here.

Rating the “Truthfulness” of Movies

For this blog post, I was looking around on a data visualization website called Information is Beautiful. The website was started by David McCandless, an author who designs infographics. The purpose of the website is to “distil the world’s data, information and knowledge into beautiful and useful graphics & diagrams.” McCandless also boldly states that his “pet-hate is pie charts,” so I figured he at least knows a little bit about presenting data in a truthful way.

As I was scrolling the home page, a post titled “Based on a *True* True Story” caught my eye. The researcher, Stephanie Smith, went through 17 different “Based on a True Story” movies scene by scene to calculate how truthful they are. Each scene was rated on a “truthfulness” scale – True, True-ish, False-ish, False, and Unknown – with a color to represent each level. After coding each movie, Smith created a data visualization for it, like the ones below.

Movie Infographics

On the website, you can click on each individual bar, which represents a scene, and a box will pop up, explaining why it was labeled the way it was. Another interesting feature of the data set is that there are three different pedantry levels— “Flexible – c’mon, it’s movies!” “Can bear some dramatic license,” and “Only the absolute truth.” The movies have a different rating for each level. For example, under the “flexible” and “some dramatic license” levels, Selma was given 100%. However, on the “absolute truth” level, Selma is only given an 81.4% truth rating.

Overall, I thought this visualization was really informative and helpful. One thing that I found interesting was that there was no explanation or analysis of the data—it is just placed on the page for the reader to draw their own conclusions. On the one hand, I think this is good because they are just giving you the facts, not interpreting them for you. On the other hand, though, it would be helpful if there were a written explanation of the methodology. While they do provide a Google Sheet that breaks the information down scene by scene for each movie, there still isn’t a summary of the method. An explanation of the methodology would be helpful because I don’t quite understand how Smith assessed the different pedantry levels—how did she decide that a scene was true under one level, but false under another?
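Since the methodology isn’t spelled out, here is only a guess at how the pedantry levels could work, sketched in Python: treat each level as a cutoff on the scene ratings and report the share of rated scenes that clear it. The scene labels below are made up for illustration, and the cutoffs are my assumption, not Smith’s documented method.

```python
# A guessed-at scoring scheme (NOT the documented method): each pedantry level
# accepts a different set of scene ratings as "true enough."
scenes = ["True", "True", "True-ish", "True-ish", "False-ish", "False", "Unknown"]

levels = {
    "Only the absolute truth": {"True"},
    "Can bear some dramatic license": {"True", "True-ish"},
    "Flexible - c'mon, it's movies!": {"True", "True-ish", "False-ish"},
}

rated = [s for s in scenes if s != "Unknown"]  # assume "Unknown" scenes are left out
for level, accepted in levels.items():
    share = sum(s in accepted for s in rated) / len(rated)
    print(f"{level}: {share:.1%} true")
```

If something like this is going on, a movie’s percentage depends as much on where the cutoffs sit as on the scenes themselves, which is exactly why a written explanation would help.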

Another concern I had was that Smith treats every “false” as equal. In other words, it doesn’t matter how serious a falsehood is; they all count the same. Take The Big Short and Bridge of Spies, for example. One scene in The Big Short depicts Geller and Shipley wandering around the empty Lehman offices after the big crash. This didn’t actually happen, so the scene is labeled “false.” In Bridge of Spies, the antagonist is depicted as spending the night in jail, which is also “false.” While both falsehoods are treated the same, I don’t personally think they are equal. Showing two people walking around an empty office is different from showing someone in jail—I think the latter is more severe. I do wonder, then, if you can really compare the truth percentages of movies against each other. One movie might have a lot of falsehoods that are not all that crucial, while another might have fewer falsehoods that are more serious.

Despite some of my uncertainty about the methods, I still think these visualizations are really helpful if you want to know more about how “true” a movie is. I know that when I see “Based on a True Story” I sometimes take that statement at face value. Sure, I understand that there is some truth-stretching and creative additions, but unless I am really interested in the topic, I usually don’t research more. I understand that the movie isn’t totally accurate, but I don’t know what and how much isn’t true. This post definitely got me thinking about how I need to be a more responsible consumer of movies, especially when they claim to be based on a true story. I also wonder if writers and producers need to be more accountable when they create these movies. Perhaps there should be more specificity than “Based on a True Story?” I don’t think that movies should have to say “Warning: only 74% true,” but I do think that there needs to be either more accurate storytelling or more transparency about what “based on” means. I think there needs to be more accountability because there might be a lot of people out there who have misconceptions about historical events because of something they saw in a movie (maybe an interesting topic for my next post?).

Twitter, Politics, and a Misleading Graph

After doing this week’s readings by Nate Silver, I went to FiveThirtyEight to check out some of the articles. I came across an interesting one by Dan Hopkins titled “Political Twitter Is No Place For Moderates.” Four researchers from the University of Pennsylvania, one of whom is Dan Hopkins, investigated who discusses politics on Twitter. You can find their study, “Beyond Binary Labels: Political Ideology Prediction of Twitter Users” here.

In order to find out who talks about politics on Twitter, the researchers downloaded 4.8 million tweets from 3,938 Twitter users. They asked each user to rate themselves on a 7-point scale as Very Conservative, Conservative, Moderately Conservative, Moderate, Moderately Liberal, Liberal, or Very Liberal. Then, they selected the 12,000 most common words from all the tweets and coded them as either political or non-political. Examples of political words, as defined by these researchers, include “president,” “racism,” and “Romney.”

After breaking down words into political and non-political, the researchers went back through the tweets (with an algorithm, not by hand) to determine which political categories use these types of words most frequently. The chart below, which comes from the FiveThirtyEight article, demonstrates their findings.

FiveThirtyEight

You’ll notice that they broke the words down into further categories: media/pundit names, politician names, and common political words. Visually, the “common political words” data is what jumps out at you. There is a definite “C” pattern which seems to strongly suggest that people who have more extreme political views are much more likely to talk about politics on Twitter. However, when you look closer, you start to notice that the scale of the graph might be doing most of the work, not the actual data. If we look at moderates, their sample tweets included political words 0.36% of the time, while those who are very conservative and very liberal used political words 0.76% of the time—a difference of only 0.40 percentage points. While that difference is really small, it looks pretty large when the X-axis only goes from 0.00% to 1.00% at 0.25% increments.
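To see how much work the scale is doing, here is a rough matplotlib sketch. Only the 0.36% and 0.76% figures come from the article; the intermediate values are placeholders I made up, and I’ve put the percentages on the vertical axis for simplicity even though the original chart runs horizontally.

```python
import matplotlib.pyplot as plt

ideology = ["Very lib.", "Lib.", "Mod. lib.", "Moderate",
            "Mod. con.", "Con.", "Very con."]
# 0.36 and 0.76 are from the article; the rest are invented placeholders.
political_word_pct = [0.76, 0.62, 0.48, 0.36, 0.48, 0.62, 0.76]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, top, title in [(axes[0], 1.0, "Axis capped at 1%"),
                       (axes[1], 3.0, "Axis capped at 3%")]:
    ax.plot(ideology, political_word_pct, marker="o")
    ax.set_ylim(0, top)
    ax.set_ylabel("% of words that are political")
    ax.set_title(title)
    ax.tick_params(axis="x", rotation=45)

plt.tight_layout()
plt.show()
# The same 0.40-percentage-point gap between moderates and the extremes looks
# dramatic on the left and fairly modest on the right.
```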

What is interesting to me about this graphic is that it does not appear in the researchers’ published study. Instead, the study uses this graph:

Dan Hopkins

What you’ll notice right away is the extra data point on each end—D2: Con. and D2: Lib. These data points were collected from a completely different set of Twitter users with “overt political orientation.” In order to fall into these “D2” categories, users had to meet specific criteria. Essentially, the D2 groups are made up of extremely Liberal or Conservative users, and they are included for comparison. If you include their data, the range goes from 0.00% to 3.00% at 0.50% increments, and the “C” looks much more defined. However, if you ignore the two end data points, so that you are looking at just the data that was included in the first graphic, the “C” has a lot less curve. On this graph, most of the curve comes from the two extreme points at the end.

I found these graphics interesting because I wonder why Dan Hopkins changed the graph for the FiveThirtyEight article. Did he think it would just be too complicated to explain the D2 data? I also wonder if the graph is misleading. I think it is, mostly because Hopkins just plopped it into his article with no explanation. There is no analysis of the data points or of whether a difference of 0.40 percentage points is meaningful. For me, this was a good reminder that I have to be diligent even when I go to sites, like FiveThirtyEight, that I think are reputable and responsible.

Is the Myers-Briggs Type Indicator Test Valid and Reliable?

Last class, a few of us were talking about the MBTI1 during the break. In particular, we were wondering if the test is really able to accurately determine personality type. I was curious about the answer to our question, so I decided to research the MBTI for my blog post. I found an article by David Pittenger in the Review of Educational Research (a peer-reviewed journal published by the American Educational Research Association). Like me, Pittenger is interested in answering the question: Is the MBTI valid and reliable? Or, in other words, does the MBTI consistently measure what it says it measures? To be even more specific, does the MBTI consistently measure an individual’s personality? Pittenger answers this question by reviewing numerous studies that measure the validity and reliability of the MBTI. For this blog post, I’ll review some of the conclusions and studies that he cites.

First, Pittenger explains that the data from a sample of tests should look bimodal because the categories are mutually exclusive—for example, you are coded as either thinking or feeling, but not both. However, researchers have found that the test does not produce bimodal data. For example, Stricker and Ross (1962), Hicks (1984), and McCrae and Costa (1989) graphed data about introversion/extroversion from a collection of MBTI results. They found that the distribution of scores is actually normal. This indicates that most people’s answers put them somewhere in between extrovert and introvert, but the test categorizes them as one or the other based on whichever way they lean, however slight the lean. If, in reality, personality types really are mutually exclusive, then “there should be separate distributions of scores representing extroverts and introverts, and each distribution should have an independent mean and standard deviation” (471). Essentially, the researchers found that people exist somewhere along the extroversion/introversion scale, rather than falling squarely into one category or the other as the test suggests.
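A quick simulation makes the point clearer. Below is a toy NumPy sketch (all numbers invented) where the underlying extroversion/introversion scores come from a single normal distribution centered on the cutoff, yet every person still gets a hard E or I label:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical E/I scores drawn from one normal distribution centered on the cutoff.
scores = rng.normal(loc=0.0, scale=1.0, size=10_000)

labels = np.where(scores >= 0, "E", "I")   # the forced either/or assignment
near_cutoff = np.abs(scores) < 0.25        # within a quarter SD of the dividing line

print(f"Labeled E: {np.mean(labels == 'E'):.0%}, labeled I: {np.mean(labels == 'I'):.0%}")
print(f"Sitting within 0.25 SD of the cutoff: {near_cutoff.mean():.0%}")
```

With a normal distribution, roughly a fifth of people land within a quarter of a standard deviation of the dividing line, which is the opposite of what you’d expect if extroverts and introverts formed two separate populations.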

Pittenger also argues that the test-retest reliability for the MBTI is suspect. He cites several studies, all of which indicate a high test-retest reliability for individuals. However, Pittenger is cautious about these reliability numbers because they don’t consider the fact that even changing just one letter is a serious change. For example, say someone takes the test and their results say they are an ISTP, but a few months later they retest and are now an ESTP. This may appear pretty reliable because not much has changed— just one letter. The raw data might also show that the numbers are very close. However, the test is essentially saying that someone has gone from an introvert to an extrovert. That is a pretty dramatic personality change, especially considering that one tenet of Jung’s work, and the Myers-Briggs test, is that personality is fairly fixed throughout one’s lifetime. Additionally, Pittenger cites a study that “examined the stability of the type assignment across a 5-week interval and found that 50% of the subjects were reclassified on one or more of the four scales. This finding suggests that the four-letter type code is not a stable personality characteristic” (472).
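To see how a high test-retest correlation can coexist with frequent letter changes, here is another toy simulation (numbers invented): the retest score is the original score plus a little noise, yet a noticeable share of people near the cutoff still flip from I to E or back.

```python
import numpy as np

rng = np.random.default_rng(0)

first = rng.normal(0, 1, 10_000)              # hypothetical first-sitting E/I scores
second = first + rng.normal(0, 0.4, 10_000)   # retest = same score plus a little noise

correlation = np.corrcoef(first, second)[0, 1]
flipped = np.mean((first >= 0) != (second >= 0))   # did the E/I letter change?

print(f"Test-retest correlation of raw scores: {correlation:.2f}")
print(f"Share whose E/I letter flipped: {flipped:.0%}")
```

With these made-up numbers, the raw scores correlate at about 0.93, yet roughly one person in eight changes letters on this one dimension; across four dimensions, a change on at least one letter becomes far more common, which is at least consistent with the 50% reclassification figure Pittenger cites.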

There are also ethical concerns about using the MBTI test that Pittenger raises. For example, say a career adviser uses the MBTI to help people determine what jobs might be good for them. If someone’s test labels them an ISFJ (someone who is caring, sympathetic, and organized), the adviser might be more likely to steer them towards nursing or accounting rather than police work or business. Even if these assumptions are accurate for a handful of people, they are still stereotypes about what kinds of work people enjoy or do well. As Pittenger says, “left unchallenged by empirical investigations, these stereotypes become persistent in the culture” (482).

Another ethical issue that Pittenger raises is that of self-fulfilling prophecy. Someone, after taking the MBTI, might read their results and find them to be very accurate. This could be because the person expects the results to be accurate, so they ignore inaccurate feedback and remember accurate feedback. Also, the information might be so vague that people can easily read themselves into the description without feeling uncomfortable. This second phenomenon is called the “Barnum effect” and it is often used to explain why some people are so convinced by horoscopes.

Pittenger concludes his investigation with this damning statement:

There is insufficient evidence to justify the specific claims made about the MBTI. Although the test does appear to measure several common personality traits, the patterns of data do not suggest that there is reason to believe that there are 16 unique types of personality. Furthermore, there is no convincing evidence to justify that knowledge of type is a reliable or valid predictor of important behavioral conditions. Taken as a whole, the MBTI makes few unique practical or theoretical contributions to the understanding of behaviors (483).

After reading the evidence, I agree with Pittenger. However, most people who administer and use the MBTI would probably argue that although the MBTI isn’t entirely valid or reliable, it is still a useful tool. I agree with this— to some extent. I understand that thinking about different personality types can be helpful in considering communication and work styles. However, I also think that someone administering the test might have to issue too many disclaimers or caveats for the test to be useful. I do wonder if a more practical approach to the MBTI would be to show people where they fall on the scale of each category, rather than assigning them a single letter. I think this could alleviate some of the concerns highlighted above.

1For those of you who are unfamiliar with the MBTI, it is a personality test based on the work of psychologist Carl Jung and developed by the mother-daughter duo Katharine Briggs and Isabel Briggs Myers. When someone takes the MBTI, they answer a series of questions that force them to choose between two behavioral choices/characteristics. For example, a question could read: “I would describe myself as (A) adventurous or (B) consistent.” Test results are counted and used to indicate which category someone belongs in for the following four dimensions: Extroversion-Introversion (EI), Sensing-Intuition (SN), Thinking-Feeling (TF), and Judgment-Perception (JP). Based on someone’s responses, they are given a dominant type for each category, which creates their four-letter personality type. For example, I am an INTJ, so my dominant types are introversion, intuition, thinking, and judgment.

Do Drug Testing Policies Actually Deter Student Drug Use?

Earlier this week, I was doing research for another class when I found a Supreme Court case that I was not familiar with. The case, Vernonia School District 47J v. Acton, was decided in 1995 and has to do with drug testing in high schools. Essentially, the Supreme Court found that mandatory, random drug testing of student athletes does not violate the Constitution, in part because of the importance of deterring drug use. I found the court’s argument persuasive, but I was left wondering: do drug testing policies actually deter students from using drugs?

In order to answer this question, I looked at some articles on the subject. I found one article from The New York Times and another from The Washington Post, both of which conclude that school drug testing programs are costly, uninformative, and not that likely to deter drug use. I want to focus specifically on a study cited by The Washington Post article in order to investigate whether or not drug testing programs are successful.

The cited study, which you can find here, was conducted in 2010 by the National Center for Education Evaluation and Regional Assistance (NCEE). The NCEE is part of the U.S. Department of Education. The purpose of NCEE is to evaluate federal programs and provide objective information, but not to recommend policy. I did not read the whole study because it is over 300 pages long, but I did read the initial 50 pages which explain the study design and findings.

NCEE conducted the study by surveying random samples of students from 36 different high schools about their drug use and their perceptions of the consequences of drug use. After the surveys were complete, the researchers randomly assigned each high school to either the control or the treatment group. The schools in the treatment group were to immediately begin mandatory randomized student drug testing. The schools in the control group, however, were not allowed to conduct drug testing and were instructed not to even bring up the topic of drug testing with students. One year later, the exact same survey was distributed at the schools, and the responses from the treatment and control groups were compared.

The researchers note two main findings. First, students in the treatment group reported less substance use than students in the control group. Second, there was not a statistically significant difference in how students in the treatment group felt about the consequences of drug use compared to students in the control group. In essence, students who attend schools with mandatory drug testing programs might be less likely to report drug use than students at schools without such programs, but their opinions toward drug use are not significantly different.
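For readers who, like me, wanted to see what “statistically significant” means in practice here, below is a small Python sketch of the standard two-proportion z-test. The counts are hypothetical, not the study’s actual data, and I’m not claiming this is the exact test NCEE ran.

```python
from math import sqrt, erfc

def two_proportion_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value from the normal distribution
    return p_a, p_b, z, p_value

# Hypothetical counts: students reporting recent substance use in treatment
# vs. control schools (made up for illustration only).
p_a, p_b, z, p = two_proportion_test(160, 1000, 200, 1000)
print(f"Treatment: {p_a:.1%}, Control: {p_b:.1%}, z = {z:.2f}, p = {p:.3f}")
# With these invented numbers, p is about 0.02, so the gap in reported use
# would count as significant at the usual 0.05 threshold, while a smaller gap
# in attitudes could easily fail to clear it.
```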

So does mandatory randomized student drug testing deter drug use? The data doesn’t suggest a strong “yes.” Even though students in the treatment group reported less use, they still showed the same opinions toward drug use. If drug testing programs don’t actually help students understand the consequences of drug use, I’m not sure that I would consider them successful deterrents— especially not in the long run. For example, the researchers noted that “Students in treatment schools were as likely as students in control schools to report that they ‘definitely will’ or ‘probably will’ use substances in the next 12 months.” So perhaps, in answer to my question, drug testing deters some students in the short term, but not in the long run? If this is the case, it seems like schools need to rely on more effective deterrence policies.

Is Attorney General Sessions Right About Crime in the US?

On February 28, 2017, Attorney General Jeff Sessions spoke at the National Association of Attorneys General annual winter meeting. You can watch the speech here. During his remarks, Attorney General Sessions covered a wide variety of topics, from drugs and gangs to law enforcement, immigration, and guns. However, underlying his message on all of these topics was his belief that crime in the US is on the rise. For example, Attorney General Sessions said, “Now we are at a time, it seems to me, crime is going back up again. Overall crime rate increased last year 3 1/2 percent. One of the bigger increases, I think, since 1991.” After relaying this data, AG Sessions stated, “I’m afraid it represents the beginning of a trend.” So is crime in the US on the rise?

The FBI’s crime statistics for the year 2016, which came out this month, are available here. On some level, the data does support what AG Sessions said—violent crime and murder in the US did increase from 2015 to 2016. But does an increase in one year prove that there is a trend? No, the data just demonstrates an increase from one year to the next. For example, this Vox article by German Lopez provides a handy chart showing the US murder rate per 100,000 people from 1960 to 2016. Looking at the graph, you can see that the murder rate—which most criminologists agree is the most reliable statistic for comparing crime over time—increased from 2015 to 2016. But, overall, the murder rate is still on a long-term decline.

Another important point is that national crime and murder rates don’t provide meaningful data about what is happening at the state and local levels. Just because the national crime and murder rates increase does not mean that the increase is happening uniformly across the country. As the Vox article points out, “Chicago alone contributed to about 22 percent of the increase in murders.” This could be a case where extreme values are pulling the national rate higher.
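A quick back-of-the-envelope calculation shows why one city can matter so much. The national increase below is a round number I picked for illustration; only the 22 percent share comes from the Vox article.

```python
# Back-of-the-envelope illustration (the 1,500 is a made-up round number).
national_increase = 1500        # suppose murders rose by about 1,500 nationwide
chicago_share = 0.22            # the ~22% figure cited in the Vox article

chicago_increase = national_increase * chicago_share
everywhere_else = national_increase - chicago_increase
print(f"Chicago: +{chicago_increase:.0f} murders, "
      f"rest of the country combined: +{everywhere_else:.0f}")
# One city accounting for more than a fifth of the national increase is
# exactly the kind of extreme value that can drag a nationwide rate upward.
```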

Additionally, increased crime rates at the state and city levels are not always connected. In the Vox article, the writer describes what is known as the “Ferguson effect.” Some criminologists and journalists believe that the events in Ferguson, and specifically the Black Lives Matter movement, have made law enforcement officers too scared to proactively enforce the law and therefore prevent violent crime. Another offshoot of this theory argues that community members no longer trust police officers, which leads to violent crime going unaddressed. While someone might find these theories intriguing, there is no substantial evidence to back them up. In reality, it is entirely possible that there is no single or common cause connecting each city or state’s crime rate. Different locations could see an increase in violent crime and/or murder for localized reasons that don’t carry over to another area. This makes me think of what boyd and Crawford refer to as apophenia, which is “seeing patterns where none actually exist.” When researchers or journalists look for a single explanation for rising crime rates in different areas, they might just be apophenic.

I think it is also important to remember that national crime rates only count behaviors that are legally defined as crimes. This means that what gets included in crime statistics can vary from year to year, depending on what is labeled a crime. While the legal status of different behaviors may not change much from one year to the next, there could be a significant difference between 1970 and 2016. Additionally, how effectively laws are enforced, and how consistently criminal behavior is reported, could have a significant impact on the data.

So was Attorney General Sessions right in his assessment of crime? I don’t think so. While the data might, at first glance, seem to support his theory, I don’t think a critical analysis of it substantiates his claim.