Does the death penalty deter violent crime?


Proponents of the death penalty say it deters people from committing crimes, especially violent crimes like murder. In other words, when execution is a potential consequence, rates of violent crime decrease. Opponents of the death penalty, however, argue that executions “brutalize” society by officially diminishing respect for life, which in turn increases the rate of violent crime. In this blog post, I want to explore the data surrounding this debate and try to determine whether or not the death penalty deters crime.


Let’s begin by looking at the context surrounding the issue. Currently, the U.S. is one of 57 countries worldwide that retain the practice of capital punishment, a.k.a. the death penalty.

[Figure: 57 countries that retain capital punishment]

However, in the U.S., only 31 states, the federal government, and the military still use capital punishment as a legal penalty.

[Figure: 31 states that use capital punishment]

According to a 2013 Pew Research survey, 55 percent of American adults surveyed said they favored the death penalty for persons convicted of murder, compared to only 37 percent who said they opposed the practice.


Across demographic groups, the percentage of those in favor of the death penalty is also higher than those opposed.

[Figure: support for the death penalty by demographic group]


Despite extensive research on this topic, criminologists, statisticians, and politicians have been unable to show that the death penalty has any effect on crime rates ([Washington Post] [Dartmouth College] [University of Pennsylvania]). In 2012, the National Research Council released a report reviewing three decades of research. The committee in charge of the report stated:

“The committee concludes that research to date on the effect of capital punishment on homicide is not informative about whether capital punishment decreases, increases, or has no effect on homicide rates. Therefore, the committee recommends that these studies not be used to inform deliberations requiring judgments about the effect of the death penalty on homicide. Consequently, claims that research demonstrates that capital punishment decreases or increases the homicide rate by a specified amount or has no effect on the homicide rate should not influence policy judgments about capital punishment.”

In the report, the National Research Council outlined three “fundamental flaws” of the existing studies on deterrence:

  • The studies do not factor in the effects of non-capital punishments that may also be imposed.
  • The studies use incomplete or implausible models of potential murderers’ perceptions of and response to the use of capital punishment.
  • Estimates of the effect of capital punishment are based on statistical models that make assumptions that are not credible.

Other common problems when reviewing deterrence studies include correlation vs. causation, confounding variables (the studies are observational, not controlled experiments), declining overall rates of executions, and the differences between deterring petty crime and violent crime. As the Department of Criminology at the University of Pennsylvania says, “One cannot study the impact of executions when they are hardly ever imposed, and it is difficult to separate any impact of the death penalty from the large number of other factors that affect the amount and kinds of crime.” At the end of the day, neither side of the debate can claim to have empirical support.
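To make the confounding problem concrete, here is a minimal Python sketch using made-up numbers (not real execution or homicide statistics). Both hypothetical series follow a shared downward time trend and are never computed from each other, yet they come out strongly correlated. The `pearson` helper is written by hand so the sketch needs only the standard library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

years = range(10)
# Hypothetical year-to-year wobbles, picked by hand (NOT real data).
exec_noise = [1, -1, 2, 0, -2, 1, 0, -1, 2, -2]
crime_noise = [0.2, -0.1, 0.0, 0.3, -0.2, 0.1, -0.3, 0.0, 0.2, -0.1]

# Each series depends only on the shared time trend (the confounder),
# never on the other series.
executions = [50 - 3 * t + e for t, e in zip(years, exec_noise)]
homicide_rate = [10 - 0.5 * t + c for t, c in zip(years, crime_noise)]

r = pearson(executions, homicide_rate)
print(f"correlation = {r:.2f}")
```

Because the year is the only link between the two series, the high correlation says nothing about whether executions affect homicide; untangling shared trends like this from any real effect is one of the problems the deterrence studies face.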


If this were a digipo article, the short answer to my question would be: can’t tell. Essentially, research cannot tell whether the death penalty influences crime at all, let alone whether it deters violent crime. This answer was equal parts expected and surprising. Growing up, I was told there was no evidence that the death penalty was an effective deterrent. However, I had no idea how inconclusive the whole field of research was. Yet, despite the ambiguous research, I think the death penalty is falling out of favor. Even in states where capital punishment is legal, the rate of executions has been steadily decreasing over the last decade. I think it’s likely the U.S. will abolish capital punishment before any conclusion is reached from the data.


Big data and baking

This blog post was inspired by my younger brother, who turned 17 over the weekend. It’s a family tradition that I make his birthday cake (although we’ve branched out from cakes into other desserts the last couple of years), and it was in the middle of making a banana and caramel cheesecake that I remembered I still had to write this blog post. This got me thinking: is there a way to apply big data tools to baking?

Apparently, many corporate bakeries are data-driven. And why not? Data can improve efficiency, which increases profits, something any business is interested in. The example I’m going to analyze is from the company blog of Factora, an organization that plans, designs, and implements “Manufacturing Execution Systems (MES) and Manufacturing Operations Management (MOM) solutions.” In the post, author Michael Chang uses the story of a bakery to demonstrate the evolution of smart manufacturing with the introduction of big data.

The baking company, which was never named in the post, hired Factora to eliminate a “waste issue”: on the production line, a certain type of cake was rising too high, too often. From my research, I learned that variability is a common issue in food production. Products that are too small or too large cannot be packaged and sold, thus wasting company resources. Chang claims a variability problem is “one which smart manufacturing was born to remedy.” This does not mean, however, that creating the solution was easy. As Chang explains:

“For those of you with no experience in food production, imagine vats of flour, hundreds of gallons of water flowing through industrial hoses, bags of sugar – in all, 300 meters of mixing and baking machine. Is it the percentage of flour? The heat of the oven? The amount of water? The size of the eggs? Or a combination?

“And did we mention that every line baker, over time, has developed their own way of managing the settings and producing the cakes? That the whole process is regarded as a type of expert black magic, known only to a skillful handful?”

Factora’s solution had two steps. First, data scientists used trend analysis (collecting historical production data, looking for patterns, and using those past results to predict future outcomes) along with test batches to develop “a new set of norms for production.” Second, Factora mounted overhead electronic displays to alert line bakers whenever a monitored production variable (a “KSF” in Chang’s post) was out of spec; in other words, when a batch needed adjustment.
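The post doesn’t say how Factora turned its analysis into “norms,” so here is a hedged sketch of one common, simple approach, swapped in for illustration: take readings from past batches that came out right, set limits at the mean plus or minus k standard deviations, and flag any new batch variable outside those limits (roughly what an overhead display would alert on). The variable names, readings, and k = 3 are all hypothetical.

```python
from statistics import mean, stdev

def build_norms(history: dict[str, list[float]], k: float = 3.0) -> dict[str, tuple[float, float]]:
    """Turn readings from past good batches into (low, high) limits: mean +/- k standard deviations."""
    norms = {}
    for variable, readings in history.items():
        m, s = mean(readings), stdev(readings)
        norms[variable] = (m - k * s, m + k * s)
    return norms

def out_of_spec(batch: dict[str, float], norms: dict[str, tuple[float, float]]) -> list[str]:
    """List the variables in a new batch that fall outside their limits."""
    return [var for var, value in batch.items()
            if not norms[var][0] <= value <= norms[var][1]]

# Hypothetical readings from past batches that rose to the right height.
history = {
    "oven_temp_c": [175, 176, 174, 175, 177, 175, 174, 176],
    "water_liters": [100, 101, 99, 100, 100, 102, 98, 100],
}
norms = build_norms(history)
print(out_of_spec({"oven_temp_c": 183, "water_liters": 100}, norms))  # ['oven_temp_c']
```

A fixed rule like this is the “static test criteria” approach Chang contrasts with machine-generated predictions: the limits only change if someone recomputes them from new data.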

What I found interesting about this post was that it explores how big data tools are changing the smart manufacturing industry. Chang explains that if Factora were to tackle the cake problem today, they would use an algorithm to optimize the cake baking process. “Rather than a data expert developing a set of static test criteria from a data set, the predictions would be machine-generated, becoming ever more sophisticated over time, backed by ever more data.” According to Chang, the program Factora uses to generate algorithms (Analytics, by ThingWorx) can take years of data and create an algorithm that offers 70 percent accuracy. (Which seems low to me, but is apparently more than acceptable in the industry.)

Essentially, even baking can create a data set to be dumped into a program and used to produce an algorithm. I do, however, see a difference between these hypothetical baking algorithms and those labeled Weapons of Math Destruction by Cathy O’Neil. For one, these baking algorithms can receive feedback and be adjusted accordingly. It’s very easy to track which cake batches succeed or fail and compare them with the predictions of the algorithm. These baking algorithms are also transparent because bakeries need to know exactly what goes into a successful batch.
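The feedback loop described above is easy to picture in code. This minimal sketch, using a made-up log of ten batches, compares an algorithm’s predictions with what actually came off the line, reports the accuracy (the kind of figure Chang’s 70 percent refers to), and lists the batches it got wrong, which is the feedback used to adjust it. The function names and data are hypothetical.

```python
def accuracy(predicted: list[bool], actual: list[bool]) -> float:
    """Fraction of batches where the prediction matched the real outcome."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def misses(predicted: list[bool], actual: list[bool]) -> list[int]:
    """Indices of batches the model got wrong: the feedback for adjusting it."""
    return [i for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]

# Hypothetical log: True means "batch within spec" (predicted vs. actual).
predicted = [True, True, False, True, False, True, True, True, False, True]
actual = [True, False, False, True, False, True, True, True, True, True]
print(accuracy(predicted, actual), misses(predicted, actual))  # 0.8 [1, 8]
```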

Honestly, I was pleasantly surprised by this blog post. Who knew there was a useful and non-harmful way to use big data to improve baking? That being said, I wonder if this only applies to corporate bakeries or if mom-and-pop shops could benefit as well.

Simpson’s Paradox helps us understand US median wage trends

Although we discussed Simpson’s Paradox in class, it still amazes me that a trend can appear in several groups of data but disappear (or reverse) when these groups are combined. I especially could not believe how difficult it was to make our own example of Simpson’s Paradox. In fact, Cheyli and I went through several attempts before we made a functioning table for the in-class activity. My bewilderment, however, made me curious. So I asked myself, “What are some real-world scenarios influenced by Simpson’s Paradox?”
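For anyone trying to build their own table, here is a small Python checker (a sketch with hypothetical counts, not the table from class). It compares success rates for two treatments, X and Y, within each group and then after pooling, and reports whether the winner flips. Exact `Fraction` arithmetic avoids rounding surprises.

```python
from fractions import Fraction

def pooled_rate(table: dict, arm: str) -> Fraction:
    """Success rate for one arm after adding all groups together."""
    successes = sum(group[arm][0] for group in table.values())
    total = sum(group[arm][1] for group in table.values())
    return Fraction(successes, total)

def is_simpsons_paradox(table: dict, a: str = "X", b: str = "Y") -> bool:
    """True if one arm wins inside every group but loses once groups are pooled."""
    diffs = [Fraction(*group[a]) - Fraction(*group[b]) for group in table.values()]
    pooled = pooled_rate(table, a) - pooled_rate(table, b)
    return ((all(d > 0 for d in diffs) and pooled < 0)
            or (all(d < 0 for d in diffs) and pooled > 0))

# Hypothetical counts: (successes, attempts) for each treatment in each group.
table = {
    "easy cases": {"X": (8, 10), "Y": (70, 100)},   # X 80% vs. Y 70%
    "hard cases": {"X": (30, 100), "Y": (2, 10)},   # X 30% vs. Y 20%
}
print(is_simpsons_paradox(table))  # True: pooled, X is 38/110 and Y is 72/110
```

The trick, and likely why a working example is hard to build by trial and error, is the lopsided group sizes: X is mostly tried on hard cases and Y mostly on easy ones, so pooling mixes very different kinds of cases.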

One of the most frequent examples I found while researching was the case of the US median wage. As outlined in this New York Times article, the median US wage has risen about 1 percent (adjusted for inflation) since 2000. However, over the same period, the median wages for high school dropouts, high school graduates with no college education, people with some college education, and people with a bachelor’s degree or higher have all fallen.

This blog post from Revolution Analytics does a nice job of explaining how this is possible. According to the author, David Smith, part of the explanation is that the education profile of the workforce has changed over the last two decades. There are now more college graduates than there were in 2000, and wages for this group have declined at a slower rate than for the other groups mentioned above (high school dropouts, high school graduates only, and some college). As of 2013, wages for college graduates were down 1.2 percent, while wages for high school dropouts had fallen 7.9 percent. Essentially, growth in the proportion of college graduates outweighs the wage decline within each group: the overall median holds up (in fact, it rose slightly) because more people now belong to the group with the smallest decline in median wage.
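Here is a toy version of that composition effect in Python. The numbers are hypothetical and deliberately exaggerated (not the NYT data), there are only two education groups instead of four, and everyone in a group earns the same wage so each group’s median is easy to read. Both groups’ wages fall, yet the overall median rises because the workforce shifts toward the higher-paid group.

```python
from statistics import median

def workforce(groups: dict[str, tuple[int, float]]) -> list[float]:
    """Expand {group: (headcount, hourly wage)} into one wage per worker."""
    return [wage for count, wage in groups.values() for _ in range(count)]

# Hypothetical workforce of 100 people; everyone in a group earns the same wage.
year_2000 = {"no_degree": (70, 30.0), "college": (30, 60.0)}
year_2013 = {"no_degree": (40, 28.0), "college": (60, 59.0)}

for group in year_2000:
    print(group, year_2000[group][1], "->", year_2013[group][1])  # every group falls
print("overall median:", median(workforce(year_2000)), "->", median(workforce(year_2013)))
```

In 2000 the middle worker sits in the lower-paid group; by 2013 the middle worker sits in the college group, so the overall median jumps even though no individual group got a raise.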

(On a quick side note, most of the information and articles I found on this topic were over four years old. While I do not think this affects our understanding of the scenario, it is important to note that this data may be out of date.)

An interesting takeaway from my research is that Simpson’s Paradox reflects the fact that your perception of events can change depending on your viewpoint. It shows that while a headline might read “Wages are increasing,” that is not the reality for many people. At the end of the day, there can be a large discrepancy between what is reported by economists and journalists and what is being experienced by everyday citizens. Food for thought.


Do male characters dominate film?


This blog is a review of “The Largest Analysis of Film Dialogue by Gender, Ever,” an article and data visualization from The Pudding—a weekly journal of visual essays. The authors of the study, Hanah Anderson and Matt Daniels, analyzed the gender breakdown in approximately 2,000 screenplays to see if more film dialogue is spoken by male characters.

In short, Anderson and Daniels’s study did find that male characters have more dialogue than female characters. However, rather than take this at face value, let’s explore the methodology and results of the study.



This study analyzed the gender breakdown of dialogue in roughly 2,000 publicly accessible screenplays. Anderson and Daniels write, “We didn’t set out trying to prove anything, but rather compile real data. We framed it as a census rather than a study.”

With each screenplay, Anderson and Daniels then “mapped characters with at least 100 words of dialogue to a person’s IMDB page.” I wasn’t entirely sure what this meant, so I did a little more digging. Luckily, at the end of the article there was a link to an FAQ that addresses concerns about the methodology and data of the study. What I found was that Anderson and Daniels only included characters with at least 100 words of attributed dialogue in their analysis. Then they tried to identify or confirm the remaining characters’ genders from IMDb pages, which list the actress or actor who played each role. If an IMDb page was unavailable, they relied on the pronouns used in the screenplay. It is important to note that the authors eliminated screenplays that were too inconsistent with the film cast listed on IMDb.


There are some obvious problems when using screenplays to measure the gender divide in film dialogue. For one, films can change significantly from script to screen. Creators rewrite lines, cut lines, add characters, or cast a different gender for a character than was indicated in the screenplay.

Additionally, Anderson and Daniels’s analysis only included characters with at least 100 words of dialogue, which cuts out minor characters. Schindler’s List, for example, features a few minor female characters who speak fewer than 100 words. Thus, while the film is listed as having “100% of Words are Male,” the measurement would actually be closer to 99.5 percent male dialogue.
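The article doesn’t publish its code, so this is only a hypothetical Python sketch of the two steps described above: drop characters under 100 words, then attribute gender from an IMDb match, falling back to screenplay pronouns. The `Character` class, its fields, and the word counts are all made up. Running it with and without the threshold shows how a film can be scored as 100 percent male even when a minor female character speaks.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Character:
    name: str
    words: int
    imdb_gender: Optional[str]     # from the matched actor/actress on IMDb, if any
    pronoun_gender: Optional[str]  # inferred from pronouns in the screenplay, if any

def attributed_gender(c: Character) -> Optional[str]:
    """Prefer the IMDb match; fall back to screenplay pronouns."""
    return c.imdb_gender if c.imdb_gender is not None else c.pronoun_gender

def male_share(characters: list[Character], min_words: int = 100) -> float:
    """Share of attributed dialogue spoken by male characters, ignoring anyone under min_words."""
    kept = [c for c in characters if c.words >= min_words and attributed_gender(c)]
    total = sum(c.words for c in kept)
    if total == 0:
        raise ValueError("no characters left after filtering")
    return sum(c.words for c in kept if attributed_gender(c) == "male") / total

# Hypothetical screenplay: two male leads and one minor female character.
cast = [
    Character("Lead", 5000, "male", None),
    Character("Rival", 3000, None, "male"),   # no IMDb match, so pronouns decide
    Character("Clerk", 40, "female", None),
]
print(male_share(cast), round(male_share(cast, min_words=0), 3))  # 1.0 0.995
```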

Anderson and Daniels acknowledge these limitations freely in their article. In fact, the Schindler’s List example is theirs. However, despite admitting some possible errors with individual films, they maintain that their results are still “directionally accurate.” Can they reasonably make this claim?

Another concern is that Anderson and Daniels used publicly available screenplays, which could skew the results of the study. While unlikely, it is possible that publicly available screenplays overrepresent male-dialogue-driven films, or that they carry some other bias or confounding variable we are unaware of.



The final results of the study showed that almost 76 percent of screenplays (1,513 out of 2,000) had 60 percent or more of their dialogue spoken by male characters.
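Note that the study’s cutoff is 60 percent, not a simple majority. This short Python sketch applies that threshold to a few hypothetical screenplays and then checks the study’s own ratio; the function names and word counts are illustrative only.

```python
def is_male_dominated(male_words: int, total_words: int, threshold: float = 0.60) -> bool:
    """A screenplay counts as male-dominated when men speak at least `threshold` of its words."""
    return male_words / total_words >= threshold

def share_male_dominated(screenplays: list[tuple[int, int]], threshold: float = 0.60) -> float:
    """Fraction of (male_words, total_words) pairs that meet the threshold."""
    return sum(is_male_dominated(m, t, threshold) for m, t in screenplays) / len(screenplays)

# Hypothetical word counts for four screenplays.
print(share_male_dominated([(70, 100), (50, 100), (60, 100), (90, 100)]))  # 0.75
print(1513 / 2000)  # the study's counts: 0.7565, i.e. almost 76 percent
```

A 55 percent male screenplay would count as “majority male” but not as male-dominated under this rule, so the 76 percent figure is a stricter measure than a plain majority would be.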

[Table: dialogue by gender across all screenplays]


Anderson and Daniels also looked specifically at Disney and Pixar animated movies, which have been called out before for lacking gender parity, and found that 22 of 30 films have male-majority dialogue.


Shockingly, in the movie Mulan, Mulan’s protector dragon Mushu, voiced by Eddie Murphy, has 50 percent more dialogue than Mulan herself. I assumed that as the lead of the movie, Mulan would have the most lines, but apparently more dialogue is spoken to or about Mulan than by her.



This study does not prove that Hollywood is sexist. Then again, the authors never claimed that their study “proves” anything. Daniels points out in the FAQ that this study only sheds light on one part of film representation: dialogue. While dialogue is an important piece, we cannot make any definitive claims from this data alone. For one, Anderson and Daniels did not factor screen time or context (the ways in which characters are portrayed) into their analysis. Additionally, the article does not include any statistical tests for significance. In short, while the information is interesting, we cannot extrapolate or make large generalizations from it. We have to take it for what it is: one slice of a larger pie.

Are pop songs getting more repetitive?


I think we have all experienced having a song stuck in our head. In my experience, it is usually a song I am not particularly fond of but that is catchy, so my mind keeps returning to it.

This weekend, I found a wonderful data visualization from The Pudding, a weekly journal of visual essays, that asks the question: Are pop lyrics getting more repetitive? This blog post explores the process, results, and limitations of the author Colin Morris’s analysis.


First off, how do you measure repetitiveness in song lyrics? Can you quantify it by dividing the number of unique words by the total number of words in the song? The short answer is no; it is a little more complicated. Here is an example to show why.

[Figure: two example choruses]

According to the percent uniqueness metric, both of the choruses above would be equally repetitive: each is 52 words long and uses the same 23-word vocabulary. Obviously, this is not true, which makes percent uniqueness a poor method for measuring song repetitiveness. In reality, the chorus on the left is much more repetitive because it not only repeats words but also arranges them in a predictable order.
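The weakness of percent uniqueness is easy to reproduce. This minimal Python sketch uses two tiny hypothetical choruses (not the ones from Morris’s article) built from the same nine words, one in a predictable repeating order and one scrambled. The metric can’t tell them apart because it only counts words and ignores their order. It also lowercases and splits on whitespace, without stripping punctuation, which is a simplifying assumption.

```python
def percent_unique(lyrics: str) -> float:
    """Unique words divided by total words, after lowercasing and splitting on whitespace."""
    words = lyrics.lower().split()
    return len(set(words)) / len(words)

# Hypothetical choruses: same nine words, predictable vs. scrambled order.
predictable = "la di da la di da la di da"
scrambled = "da la di di la da di da la"
print(percent_unique(predictable), percent_unique(scrambled))  # identical scores
```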

Thus, Morris turns to the Lempel-Ziv algorithm to measure repetitiveness. The Lempel-Ziv algorithm is a lossless compression algorithm (think of a zip file) that “works by exploiting repeated sequences.” In short, the algorithm finds repeated sequences of characters, whether whole duplicated lines of lyrics or shared fragments of words (like “ills” in “bills” and “thrills”), and replaces the later copies with markers pointing back to the first. In the case of Sia’s Cheap Thrills, the chorus shrinks from 247 characters to 133 characters when compressed, a reduction of 46.2 percent.

[Figure: the Cheap Thrills chorus compressed with Lempel-Ziv]

In contrast, Morris’s original composition achieves only a 22.9 percent reduction. This shows why compression is a more effective way than percent uniqueness to measure song repetitiveness.
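To see how compression rewards predictable ordering, here is a naive, greedy LZ77-style sketch in Python. It is not Morris’s implementation: his exact matching rules and how he counts a marker’s size aren’t described here, so this version assumes every literal character costs 1, every back-reference to an earlier repeat of at least 3 characters costs 1, and the numbers won’t match his 46.2 and 22.9 percent. The two choruses are the same hypothetical nine-word examples used for percent uniqueness.

```python
def lz_size(text: str, min_match: int = 3, marker_cost: int = 1) -> int:
    """Size after a greedy LZ77-style pass: each literal character costs 1,
    each back-reference to an earlier repeat of >= min_match characters costs marker_cost."""
    size, i, n = 0, 0, len(text)
    while i < n:
        best = 0
        for j in range(i):  # try every earlier starting point
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        if best >= min_match:
            size += marker_cost  # replace the repeat with one marker
            i += best
        else:
            size += 1            # keep the character as a literal
            i += 1
    return size

def percent_reduction(text: str) -> float:
    return 100 * (1 - lz_size(text) / len(text))

predictable = "la di da la di da la di da"
scrambled = "da la di di la da di da la"
print(round(percent_reduction(predictable), 1), round(percent_reduction(scrambled), 1))  # 61.5 53.8
```

Unlike percent uniqueness, the predictable chorus now scores as more repetitive: one long marker covers its last two repeats, while the scrambled version needs several shorter ones.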

(If anyone is curious, here is a link to a list of several full songs run through the algorithm. The website shows you a visual representation of how the algorithm compresses the songs and arrives at their percent reduction size.)

After running 15,000 songs through this algorithm, the distribution looks like this:

[Figure: distribution of percent reduction across 15,000 songs]


To answer the question posed at the beginning of this post, Morris compares the average percent reduction of songs by year, from 1960 to 2015. When he plots the average of all songs in each year, there is a positive trend on the graph (blue line). In other words, the percent reduction has gone up in recent songs. The same is true when looking at the top 10 songs for each year: although there is much more variability year to year, the overall trend is still positive (yellow line). So yes, according to this data, pop songs are becoming more repetitive.
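Here is a sketch of that year-by-year comparison in Python, with made-up songs and scores. It averages percent reduction within each year and then, as a swapped-in summary that Morris doesn’t necessarily use, fits an ordinary least-squares line through the yearly averages: a positive slope means songs are compressing more over time.

```python
from collections import defaultdict

def yearly_means(songs: list[tuple[int, float]]) -> dict[int, float]:
    """songs: (year, percent_reduction) pairs. Returns {year: average reduction}."""
    buckets = defaultdict(list)
    for year, reduction in songs:
        buckets[year].append(reduction)
    return {year: sum(v) / len(v) for year, v in sorted(buckets.items())}

def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Slope of the least-squares line through (x, y) points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical scores for six songs across three years.
songs = [(1960, 40.0), (1960, 44.0), (1985, 45.0), (1985, 47.0), (2015, 50.0), (2015, 54.0)]
means = yearly_means(songs)  # {1960: 42.0, 1985: 46.0, 2015: 52.0}
slope = ols_slope(list(means), list(means.values()))
print(f"trend: {slope:+.3f} percentage points per year")
```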

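[Figure: average percent reduction by year, 1960 to 2015, for all songs (blue) and the top 10 songs (yellow)]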


I can think of several limitations for this analysis. One, I do not know where the data (songs) in Morris’s database came from. They could be the top 100 songs per year from the Billboard charts or something completely different. Two, the sample of pop songs studied does not appear to be a random sample. Morris makes no mention of randomly selecting n songs from each year, but on the flip side, it would be impossible to collect every pop song produced in the United States in any given year. Three, as Morris himself writes, “it’s easier to find lyrics for recent songs,” which could mean new songs are overrepresented in the data set. In the end, I would be careful about making any general claims from this analysis, although it is an entertaining project.
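Randomly selecting n songs from each year, the fix suggested above, is a stratified random sample. This hypothetical Python sketch shows the idea with a made-up catalogue: drawing the same number of songs per year keeps recent years from dominating just because more of their lyrics are online. A fixed seed makes the draw reproducible.

```python
import random

def sample_per_year(songs_by_year: dict[int, list[str]], n: int, seed: int = 0) -> dict[int, list[str]]:
    """Draw up to n songs at random from each year so every year gets similar weight."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return {year: rng.sample(songs, min(n, len(songs)))
            for year, songs in songs_by_year.items()}

# Hypothetical catalogue: far more lyrics available for recent years.
catalogue = {1965: ["a", "b"], 2015: [f"song{i}" for i in range(50)]}
picked = sample_per_year(catalogue, n=5)
print({year: len(songs) for year, songs in picked.items()})  # {1965: 2, 2015: 5}
```

Equal sampling fixes the weighting problem but not the availability problem: a year with only two songs on record still contributes only two, which is exactly the gap Morris points to for older lyrics.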

FUN FACT: Rihanna was the most repetitive artist in the database.


Marriage rate trends in the U.S.


Recently, I had a conversation with a classmate about being invited to weddings without our parents. A little background: I had just attended the wedding of another Westminster student and found it to be a surreal experience. Being invited to the weddings of people my age is something I am not used to. This made me curious about overall marriage trends in the United States (U.S.).

According to the U.S. Census Bureau, at least 45 percent of U.S. adults (ages 18 and older) were unmarried in 2015. (The Pew Research Center puts the number at around 50 percent.) This percentage has remained consistent over the past five years, but it marks a significant change from 1960, when 72 percent of U.S. adults were married.


Quick side note: I want to be very clear that the actual number of married people in the U.S. has increased since 1960, because the U.S. population has grown. Using total population figures as a rough proxy, 72 percent of 179.3 million (about 129 million) is still less than a conservative 45 percent of 326 million (about 147 million). Married people as a percentage of the adult population, however, has decreased.
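The side note’s comparison is simple arithmetic, and the short sketch below checks it. It applies marriage shares to the total population figures from the post (179.3 million and 326 million), so it is a rough proxy rather than an exact count of married adults; the 50 percent line reflects the Pew estimate mentioned above.

```python
POP_1960_M = 179.3   # total U.S. population in 1960, millions (figure from the post)
POP_TODAY_M = 326.0  # total U.S. population today, millions (figure from the post)

def married_millions(share: float, population_m: float) -> float:
    """Rough count of married people: share of adults married times total population."""
    return share * population_m

then_ = married_millions(0.72, POP_1960_M)
now_low = married_millions(0.45, POP_TODAY_M)
now_pew = married_millions(0.50, POP_TODAY_M)
print(f"1960: {then_:.1f}M  today at 45%: {now_low:.1f}M  today at 50%: {now_pew:.1f}M")
```

Even the low estimate for today comes out larger than the 1960 figure, which is the point of the side note: the share fell while the count rose.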

Analysts from the Pew Research Center suggest that this negative trend is partially the result of Americans getting married later in life. In 2016, the median age at first marriage was 27.4 for women and 29.5 for men. That is roughly seven years older than the median ages in 1960, which were 20.3 for women and 22.8 for men.

However, delayed marriage does not entirely explain the decline in overall marriage rates. The percentage of American adults who have never been married has also risen steadily over the same period. According to a recent Pew Research Center survey of 4,971 American adults, 58 percent of never-married adults report that they want to get married someday, 27 percent are unsure, and 14 percent say they do not want to get married. This means roughly one in seven never-married American adults say they do not want to get married.

(You can review the survey’s methodology here.)

The survey also asked never-married respondents who said they were open to the possibility of marriage why they were not currently married. Possible reasons included not having found the right person, financial instability, and not being ready to settle down. Respondents were then asked whether each of these three reasons was a major or minor reason why they were not currently married.

Results showed that 59 percent of never-married adults who are open to marriage said not having found the right person was a major reason they were not married, and 13 percent said it was a minor reason.


The data also show that (perceived) financial instability was a major reason many respondents were not married. Lower-income, never-married adults were more likely to say financial instability was a major reason they were not married: 47 percent of those with incomes less than $30,000 and 40 percent of those with incomes of $30,000 to $74,999, compared with 21 percent of those with incomes of $75,000 or higher. Additionally, nonwhite adults who had never been married were more likely than their white counterparts to cite financial instability as a major reason they were not married.


After reading this survey, I wondered whether the results said anything about the American adult population at large. I quickly realized these results could not be generalized. For one, the Pew Research Center gave their survey to a sample of adults, and any given sample will deviate from the population. Furthermore, when analyzing the reasons American adults were not married, the Pew Research Center article only reviewed the responses of never-married adults who wanted to get married someday. The opinions of this very specific group are likely to differ from those of the total population of unmarried American adults, which includes people who have been married before (perhaps more than once) but are no longer married, as well as those who have never been married. At the end of the day, while these results are interesting, I could see how they would be misinterpreted and misreported.
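How much a sample deviates from the population can be estimated. This minimal Python sketch uses the textbook 95 percent margin of error for a proportion, which assumes a simple random sample; Pew weights its data, so its published margins are somewhat larger than this formula gives. The never-married subgroup’s size isn’t stated in the post, so the 1,000 below is purely hypothetical.

```python
from math import sqrt

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion (simple random sample)."""
    return z * sqrt(p * (1 - p) / n)

print(round(margin_of_error(0.5, 4971), 3))   # full sample of 4,971: about 0.014
print(round(margin_of_error(0.14, 1000), 3))  # hypothetical subgroup of 1,000
```

Smaller subgroups mean wider margins, which is one more reason to be cautious when a finding rests on a narrow slice of respondents like the never-married adults who want to marry.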

Ethics of using Nazi death-camp data: just because it’s accessible doesn’t make it ethical

[Photo: the Auschwitz gate]

Kate Crawford and danah boyd pose an important question of autonomy in section five of their article, “Critical Questions for Big Data.” After all, what does informed consent look like in a world flooded by Big Data? But when I think of unethical research and controversial data, my mind immediately jumps to the Nazi death-camp experiments. During WWII, the Nazis performed medical and other experiments on non-consenting subjects, many of whom died as a result. The question since 1947 has been: what do we do with the data the Nazis collected? Do we throw out potentially lifesaving information because of how it was gathered?

In the camps, Nazi doctors conducted research on many topics, including endurance, eugenics, vaccines, sterilization, and transplantation. However, it surprised me to discover that the results from many of these experiments are either useless or scientifically unsound. For instance, Nazi doctors ran several studies on human endurance in extreme cold. To mimic German pilots who were ejected into the North Sea, prisoners were submerged in ice water for hours on end while their heart rate and core temperature were monitored. These experiments did not establish an absolute human tolerance for cold, however, because they failed to simulate real-world conditions. (The Nazis’ subjects were emaciated prisoners of war who had already been subjected to months and months of heinous treatment.) Nevertheless, this extreme cold research has played a minor role in the development of survival suits and techniques for treating hypothermia patients. So where do we draw the line?

The scientific community is still grappling with this question. According to a 2010 Slate article, “Most journals do not maintain blanket prohibitions on citations of Nazi data, but researchers have seen their articles rejected for referencing [Nazi] studies.” In short, some say Nazi data should never be published, while others argue that the data are useful and could never be reproduced.

As part of the series Holocaust on Trial, Nova created an interactive website to engage people in this debate:

“You will be asked the following question eight times: ‘Based on what you now know, do you think doctors and scientists should be able to use data from Nazi death-camp experiments?’ Each time, you must answer Yes or No to that question, and each time you will get a different counterargument meant to challenge your decision.”

The game does a wonderful job explaining the complexities of this issue. Interestingly, I also found that several of Nova’s questions are applicable to Big Data. For example, is it ethical to use the data in question if…

  • the data might help save lives?
  • the accuracy of the data collection is questionable?
  • using this data could set a dangerous precedent?

Whether it be Nazi death-camp data or Big Data sets of people’s private information, is it ethical to use valid scientific data that was obtained unethically? Honestly, I do not think there is a clear-cut answer; there is too much variation from case to case. In the end, I agree with Crawford and boyd that, “In order to act ethically, it is important that researchers reflect on the importance of accountability: both to the field of research and to the research subjects.” In my opinion, the ends do not justify the means. If the purpose of research is to help people, then it is our duty as researchers to ensure our methods are not harming those same people. And so the discussion continues.