Last class, a few of us were talking about the MBTI¹ during the break. In particular, we were wondering if the test is really able to accurately determine personality type. I was curious about the answer to our question, so I decided to research the MBTI for my blog post. I found an article by David Pittenger in the Review of Educational Research (a peer-reviewed journal that is published by the American Educational Research Association). Like me, Pittenger is interested in answering the question: Is the MBTI valid and reliable? Or, in other words, does the MBTI consistently measure what it says it measures? To be even more specific, does the MBTI consistently measure an individual’s personality? Pittenger answers this question by reviewing numerous studies that measure the validity and reliability of the MBTI. For this blog post, I’ll review some of the conclusions and studies that he cites.
First, Pittenger explains that the data from a sample of tests should look bimodal because the categories are mutually exclusive: for example, you are coded as either thinking or feeling, but not both. However, researchers have found that the test does not produce bimodal data. For example, Stricker and Ross (1962), Hicks (1984), and McCrae and Costa (1989) graphed data about introversion/extroversion from a collection of MBTI results. They found that the distribution of scores is actually normal. This indicates that most people’s answers put them somewhere in between extrovert and introvert, but the test categorizes them as either extrovert or introvert based on whichever side they lean toward, however slight the lean. If, in reality, personality types really are mutually exclusive, then “there should be separate distributions of scores representing extroverts and introverts, and each distribution should have an independent mean and standard deviation” (471). Essentially, the researchers found that people exist somewhere along the extroversion/introversion scale, and not squarely in one or the other as the test suggests.
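To make the distribution point more concrete, here is a small simulation sketch. Everything in it is my own assumption rather than anything from Pittenger's article: I made up a preference scale where negative scores lean introvert and positive scores lean extrovert, and I assumed scores are normally distributed around the midpoint with a standard deviation of 15. The function names are hypothetical too. The sketch simply counts how many simulated people land within a few points of the cutoff and still get a firm "E" or "I" label.

```python
import random


def simulate_scores(n, mean=0.0, sd=15.0, seed=0):
    """Draw n hypothetical E/I preference scores from a normal distribution.

    Assumed scale: negative leans introvert, positive leans extrovert.
    The mean and sd are illustrative values, not real MBTI data.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


def label(score, cutoff=0.0):
    """Force a continuous score into one of two categories."""
    return "E" if score >= cutoff else "I"


def share_near_cutoff(scores, band=5.0, cutoff=0.0):
    """Fraction of scores that fall within `band` points of the cutoff."""
    near = sum(1 for s in scores if abs(s - cutoff) <= band)
    return near / len(scores)


scores = simulate_scores(10_000)
labels = [label(s) for s in scores]
near = share_near_cutoff(scores)

print(f"Labeled E: {labels.count('E') / len(labels):.0%}")
print(f"Within 5 points of the cutoff: {near:.0%}")
```

Under these made-up parameters, roughly a quarter of the simulated people sit within five points of the midpoint, yet every one of them is handed a single letter. If the types were truly separate groups, the scores would instead pile up in two clusters with a gap near the cutoff.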
Pittenger also argues that the test-retest reliability for the MBTI is suspect. He cites several studies, all of which indicate a high test-retest reliability for individuals. However, Pittenger is cautious about these reliability numbers because they don’t consider the fact that even changing just one letter is a serious change. For example, say someone takes the test and their results say they are an ISTP, but a few months later they retest and are now an ESTP. This may appear pretty reliable because not much has changed— just one letter. The raw data might also show that the numbers are very close. However, the test is essentially saying that someone has gone from an introvert to an extrovert. That is a pretty dramatic personality change, especially considering that one tenet of Jung’s work, and the Myers-Briggs test, is that personality is fairly fixed throughout one’s lifetime. Additionally, Pittenger cites a study that “examined the stability of the type assignment across a 5-week interval and found that 50% of the subjects were reclassified on one or more of the four scales. This finding suggests that the four-letter type code is not a stable personality characteristic” (472).
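Here is a rough sketch of how a high test-retest correlation and frequent reclassification can coexist. This is a toy model of my own, not the method of any study Pittenger cites: I assume each person has a stable "true" score on each of the four scales, that each sitting of the test adds some random measurement noise, and that the four scales are independent. I picked the noise level purely for illustration, and all of the names and parameters are hypothetical.

```python
import random

# Letter pairs for the four scales; a score >= 0 gets the first letter.
SCALES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def type_code(scores):
    """Turn four continuous scores into a four-letter type."""
    return "".join(a if s >= 0 else b for (a, b), s in zip(SCALES, scores))


def simulate(n_people=5000, true_sd=15.0, noise_sd=6.0, seed=1):
    """Simulate two sittings: stable true scores plus independent noise."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_people):
        true = [rng.gauss(0, true_sd) for _ in SCALES]
        first.append([t + rng.gauss(0, noise_sd) for t in true])
        second.append([t + rng.gauss(0, noise_sd) for t in true])
    return first, second


first, second = simulate()

# Test-retest correlation on the raw E/I scores.
ei_r = pearson([p[0] for p in first], [p[0] for p in second])

# Share of people whose four-letter type differs between sittings.
changed = sum(
    type_code(a) != type_code(b) for a, b in zip(first, second)
) / len(first)

print(f"E/I test-retest correlation: {ei_r:.2f}")
print(f"Changed at least one letter: {changed:.0%}")
```

With these assumed values, the raw-score correlation comes out high (in the neighborhood of 0.86), while roughly half of the simulated people end up with a different four-letter code on the retest. The model is far too simple to say anything about the real test, but it shows why a reassuring reliability number on the raw scores does not guarantee a stable type.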
Pittenger also raises ethical concerns about using the MBTI. For example, say a career adviser uses the MBTI to help people determine what jobs might be good for them. If someone’s test labels them an ISFJ (someone who is caring, sympathetic, and organized), the adviser might be more likely to steer them towards nursing or accounting rather than police work or business. Even if these assumptions are accurate for a handful of people, they are still stereotypes about what kinds of work people enjoy or do well. As Pittenger says, “left unchallenged by empirical investigations, these stereotypes become persistent in the culture” (482).
Another ethical issue that Pittenger raises is that of self-fulfilling prophecy. Someone, after taking the MBTI, might read their results and find them to be very accurate. This could be because the person expects the results to be accurate, so they ignore inaccurate feedback and remember accurate feedback. Also, the information might be so vague that people can easily read themselves into the description without feeling uncomfortable. This second phenomenon is called the “Barnum effect” and it is often used to explain why some people are so convinced by horoscopes.
Pittenger concludes his investigation with this damning statement:
There is insufficient evidence to justify the specific claims made about the MBTI. Although the test does appear to measure several common personality traits, the patterns of data do not suggest that there is reason to believe that there are 16 unique types of personality. Furthermore, there is no convincing evidence to justify that knowledge of type is a reliable or valid predictor of important behavioral conditions. Taken as a whole, the MBTI makes few unique practical or theoretical contributions to the understanding of behaviors (483).
After reading the evidence, I agree with Pittenger. However, most people who administer and use the MBTI would probably argue that although the MBTI isn’t entirely valid or reliable, it is still a useful tool. I agree with this— to some extent. I understand that thinking about different personality types can be helpful in considering communication and work styles. However, I also think that someone administering the test might have to issue too many disclaimers or caveats for the test to be useful. I do wonder if a more practical approach to the MBTI would be to show people where they fall on the scale of each category, rather than assigning them a single letter. I think this could alleviate some of the concerns highlighted above.
¹ For those of you who are unfamiliar with the MBTI, it is a personality test based on the work of psychologist Carl Jung and developed by mother-daughter duo Katharine Briggs and Isabel Briggs Myers. When someone takes the MBTI, they answer a series of questions that force them to choose between two behavioral choices/characteristics. For example, a question could read: “I would describe myself as (A) adventurous or (B) consistent.” Test results are counted and used to indicate which category someone belongs in for the following four dimensions: Extroversion-Introversion (EI), Sensing-Intuition (SN), Thinking-Feeling (TF), and Judgment-Perception (JP). Based on someone’s responses, they are given a dominant type for each category, which creates their four-letter personality. For example, I am an INTJ, so my dominant types are introversion, intuition, thinking, and judgment.