Title | Fall 2020 1 - Martin Evans |
---|---|
Author | Spare Man |
Course | Economic Statistics |
Institution | Georgetown University |
How To Lie With Statistics (VS Chapter 10)
Martin Evans

• Statistics are famous for being easily manipulated to serve the purposes of the person who reports them.
• People sometimes report statistics that are complete fabrications.
• People sometimes make factually correct claims, but present them in a plainly deceptive way.
• People sometimes report data accurately but fail to recognize simple explanations for the patterns they find.
Outline
Part 1
1. Overlooked Simple Explanations
2. Variation
3. Polls
4. Sampling
Part 2
5. Causal Inference and Extrapolation
6. Spurious Correlation and Data Mining
7. Interpolation
Overlooked Simple Explanations
Example: Righties live much longer than lefties! (“Handedness and life span”, New England Journal of Medicine, 1991.)
The researchers obtained death certificates from two southern California counties and asked the families which hand the deceased had favored.
Lifespans of women: righties, 77 years; lefties, 71 years.
Lifespans of men: righties, 75 years; lefties, 69 years.
As explanations, the authors suggest “implied pathological factors and environmental interactions” and “covert neuropathologic features or immune system dysfunction”.
But until the mid-20th century, teachers typically forced everyone to write with their right hands. Self-reported left-handers: 2.2% in 1932; 11.2% in 1972. In 1991, older left-handed people would have been conspicuously absent from lists of recent deaths. . .
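A small simulation can make this cohort effect concrete. It is a sketch under hypothetical assumptions, not the NEJM data. The function `lefty_share` and its rise from 2% to 11% by birth year are invented for illustration, and every age from 60 to 90 gets the same number of deaths, so handedness has no effect on lifespan in this model. The lefties who die in 1991 are nonetheless younger on average, simply because younger cohorts contain more self-reported lefties.

```python
def lefty_share(birth_year):
    """HYPOTHETICAL share of a birth cohort that reports being left-handed.

    Rises linearly from 2% (born 1900 or earlier) to 11% (born 1940 or later),
    standing in for the decline of forced right-handed writing.
    """
    frac = min(max((birth_year - 1900) / 40, 0.0), 1.0)
    return 0.02 + 0.09 * frac


def mean_age(ages, counts):
    """Average age at death, weighting each age by its number of deaths."""
    return sum(a * c for a, c in zip(ages, counts)) / sum(counts)


DEATH_YEAR = 1991
ages = list(range(60, 91))  # ages at death recorded on 1991 death certificates
deaths_per_age = 1000       # HYPOTHETICAL: same number of deaths at every age

# Within each age group, lefties make up exactly their birth cohort's share,
# so in this model handedness has no effect on how long anyone lives.
lefty_deaths = [deaths_per_age * lefty_share(DEATH_YEAR - a) for a in ages]
righty_deaths = [deaths_per_age - n for n in lefty_deaths]

righty_mean = mean_age(ages, righty_deaths)
lefty_mean = mean_age(ages, lefty_deaths)
print(f"mean age at death: righties {righty_mean:.1f}, lefties {lefty_mean:.1f}")
```

In this toy model the lefties' average age at death comes out roughly three years lower, with no pathology involved.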
People sometimes report statistics that are complete fabrications. People sometimes make factually correct claims, but present them in a plainly deceptive way. People sometimes report data accurately but fail to recognize simple explanations for the patterns they find. Sometimes the reasons behind incorrect interpretations are more subtle.
10.2 Variation
Variation in data can be a source of subtle statistical problems.
Example: In 1991, America West Airlines’ on-time arrival rate was superior to that of its West Coast rival, Alaska Airlines. For flights into 5 of the 30 busiest U.S. airports, 89.1% of America West’s flights were on time, but only 86.7% of Alaska’s were. Should we choose America West for timely travel?
destination | Alaska % on time | Alaska # arrivals | America West % on time | America West # arrivals
---|---|---|---|---
Los Angeles | 88.9 | 559 | 85.6 | 811
Phoenix | 94.8 | 233 | 92.1 | 5,255
San Diego | 91.4 | 232 | 85.5 | 448
San Francisco | 83.1 | 605 | 71.3 | 449
Seattle | 85.8 | 2,146 | 76.7 | 262
all five | 86.7 | 3,775 | 89.1 | 7,225

At all five airports, Alaska has a higher on-time percentage than America West! How could America West perform better overall?

Alaska’s hub is Seattle, where on-time rates are relatively low for both airlines: 2,146 of Alaska’s 3,775 arrivals were in Seattle. America West’s hub is Phoenix, where on-time rates are the highest for both airlines: 5,255 of America West’s 7,225 arrivals were in Phoenix. Such reversals of comparisons using aggregated and disaggregated data are known as Simpson’s paradox.
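The reversal can be checked directly from the per-airport figures on the slide. The sketch below uses an illustrative dictionary layout and a hypothetical helper, `overall_on_time`, neither of which comes from the original. It rebuilds each airline's overall rate as an arrivals-weighted average of its airport rates, then reports what share of each airline's arrivals land at its hub.

```python
# Per-airport (arrivals, % on time) from the slide, 1991.
flights = {
    "Alaska": {
        "Los Angeles": (559, 88.9),
        "Phoenix": (233, 94.8),
        "San Diego": (232, 91.4),
        "San Francisco": (605, 83.1),
        "Seattle": (2146, 85.8),
    },
    "America West": {
        "Los Angeles": (811, 85.6),
        "Phoenix": (5255, 92.1),
        "San Diego": (448, 85.5),
        "San Francisco": (449, 71.3),
        "Seattle": (262, 76.7),
    },
}


def overall_on_time(by_airport):
    """Aggregate on-time rate (%): the per-airport rates weighted by arrivals."""
    total = sum(n for n, _ in by_airport.values())
    on_time = sum(n * pct / 100 for n, pct in by_airport.values())
    return 100 * on_time / total


for airline, by_airport in flights.items():
    print(f"{airline}: {overall_on_time(by_airport):.1f}% on time overall")

# Alaska has the higher rate at every single airport...
alaska_wins_everywhere = all(
    flights["Alaska"][a][1] > flights["America West"][a][1] for a in flights["Alaska"]
)
print("Alaska better at every airport:", alaska_wins_everywhere)

# ...but the weights differ: each airline's arrivals are concentrated at its hub.
for airline, hub in [("Alaska", "Seattle"), ("America West", "Phoenix")]:
    total = sum(n for n, _ in flights[airline].values())
    print(f"{airline}: {flights[airline][hub][0] / total:.0%} of arrivals at {hub}")
```

Most of Alaska's arrivals are in Seattle, which drags its overall rate down. Most of America West's arrivals are in Phoenix, which lifts its overall rate. The weights, not the per-airport performance, drive the aggregate comparison.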
Example: The (im)precision of the Nielsen ratings.
From the front page of the New York Times, March 2007: NBC’s dominance in television’s evening news race is undergoing its most serious challenge in a decade as “World News” on ABC scored its second ratings victory in the last three weeks.
According to the Nielsen ratings, ABC’s “World News” garnered 9.69 million nightly viewers, compared to 9.65 million viewers for NBC’s “Nightly News”.
Nielsen ratings are based on random samples, and so are subject to random variation. In this case, the “sampling error bound” was 280,000 viewers, seven times the 40,000-viewer gap between the two programs!
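Here is a rough check, assuming that the 280,000-viewer bound applies symmetrically to each network's estimate: widen each figure by the bound and see whether the resulting ranges overlap. This is only a sketch. Overlapping ranges are a crude screen rather than a formal test of the difference between the two shows, and the helper names `interval` and `intervals_overlap` are hypothetical.

```python
def interval(estimate, bound):
    """Range implied by an estimate plus or minus its error bound."""
    return (estimate - bound, estimate + bound)


def intervals_overlap(a, b):
    """True if the closed ranges a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


abc = 9_690_000   # ABC "World News" nightly viewers
nbc = 9_650_000   # NBC "Nightly News" nightly viewers
bound = 280_000   # quoted sampling error bound (assumed to apply to each)

abc_iv = interval(abc, bound)
nbc_iv = interval(nbc, bound)
gap = abc - nbc

print(f"gap = {gap:,} viewers; bound = {bound:,}")
print("ABC range:", abc_iv, "NBC range:", nbc_iv)
print("ranges overlap:", intervals_overlap(abc_iv, nbc_iv))
```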
After complaints, the Times’s Public Editor wrote in April 2007: “Delving into the complaint, I found that The Times has kept readers in the dark for years about the real-world significance of the Nielsen television audience data it publishes regularly. . . . . . two recent newsroom-wide initiatives calling for publication of the margin of error on sample-based surveys haven’t yet produced any change.” (The Times’s reporting has improved somewhat since then.)
10.3 Polls and Sampling
Opinion polls can provide a great deal of information about the public’s views. But because poll results can in turn influence people’s views, polls can also be used to manipulate public opinion. One obvious way to manipulate poll results is through the wording of questions. A slightly less obvious way is through priming.
Example: Psychologists showed subjects film clips of a car accident and then asked how fast the cars had been going at the time of the accident. Only the verb in the question varied.

question wording | mean response
---|---
“About how fast were the cars going when they smashed?” | 40.8 mph
“About how fast were the cars going when they collided?” | 39.3 mph
“About how fast were the cars going when they bumped?” | 38.1 mph
“About how fast were the cars going when they hit?” | 34.0 mph
“About how fast were the cars going when they contacted?” | 31.8 mph
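question | response
---|---
How long was the movie? | 130 minutes
How short was the movie? | 100 minutes
Do you think the U.S. should allow public speeches against democracy? | 62% don’t allow
Do you think the U.S. should forbid public speeches against democracy? | 46% forbid

Not allowing such speeches and forbidding them amount to the same policy, yet the two wordings produce very different answers.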
Example: In March 2007, Congress passed a bill demanding a timetable for the withdrawal of U.S. troops from Iraq. The Newsweek, CBS News, USA Today/Gallup, and Pew Research polls showed public support for the bill ranging from 57% to 60%. The Fox News poll showed support for the bill at 44%.
The CBS News poll asked:
Do you think the United States should or should not set a timetable for the withdrawal of U.S. troops from Iraq that would have most troops out by September 2008?

The Fox News poll asked:
Who do you trust more to decide when U.S. troops should leave Iraq: U.S. military commanders or members of Congress?
Last week the U.S. House voted to remove U.S. troops from Iraq by no later than September 2008. Would you describe this as a correct and good decision or a dangerous and bad decision?
10.4 Endogenous Sampling Biases
A sampling procedure produces an endogenous sampling bias when the way in which observations are generated creates systematic discrepancies between the sample and the population.
Example: Mediocrity in business. In 1933, Northwestern University statistics professor Horace Secrist wrote The Triumph of Mediocrity in Business. His thesis: “[m]ediocrity tends to prevail in the conduct of competitive business”.
The evidence: Secrist divided firms from various industries into groups according to their average profits in 1920-21. Average profits in the groups with the highest 1920-21 profits tended to fall over time, and they fell fastest in the groups that had started highest. Average profits in the groups with below-average 1920-21 profits tended to rise over time.
To show that this phenomenon was not a statistical artifact, Secrist performed the same exercise with average July temperatures in U.S. cities. In the cities with the highest 1921 temperatures, the 1922 temperatures did not show a tendency to decrease.
Secrist’s conclusion: the tendency of group averages to move toward a central value is a special feature of human economic activity.
Secrist’s data on firms is not evidence of a tendency toward mediocrity. It is a textbook example of regression to the mean.
The idea: A firm’s profits in a given year are partly due to quality and partly due to chance. Some firms in the top group in 1920-21 had a lucky year. Such firms will tend to have lower profits in subsequent years. Thus the average profits of firms in the top group from 1920-21 are almost guaranteed to move toward the population average.
This logic does not apply to average temperatures in different cities. A city’s average July temperature is determined almost entirely by location. There is no large, city-specific random component that varies from year to year. Phoenix will not have an off year in which it is cooler than San Francisco.
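Secrist's two exercises can be mimicked with a toy model. All parameters here are hypothetical, not Secrist's data, and `top_group_drift` is an illustrative helper. Each unit's yearly score is a persistent component plus independent year-specific noise. Units are ranked on year 1, and the top 10% is followed into year 2. For firms, luck is assumed to be as large as quality; for cities, year-to-year noise is assumed to be small relative to location. The population mean is zero in both cases.

```python
import random
import statistics


def top_group_drift(n, persistent_sd, noise_sd, top_frac=0.1, seed=0):
    """Average score of the year-1 top group, in year 1 and in year 2.

    Each unit's score = a persistent component (firm quality, or a city's
    location) + independent year-specific noise (luck, or weather).
    Units are ranked on year-1 scores only.
    """
    rng = random.Random(seed)
    persistent = [rng.gauss(0, persistent_sd) for _ in range(n)]
    year1 = [p + rng.gauss(0, noise_sd) for p in persistent]
    year2 = [p + rng.gauss(0, noise_sd) for p in persistent]
    k = int(n * top_frac)
    top = sorted(range(n), key=lambda i: year1[i], reverse=True)[:k]
    return (statistics.mean(year1[i] for i in top),
            statistics.mean(year2[i] for i in top))


# HYPOTHETICAL parameters: for firms, luck is as large as quality;
# for cities, year-to-year noise is small relative to location.
firm_y1, firm_y2 = top_group_drift(10_000, persistent_sd=1.0, noise_sd=1.0)
city_y1, city_y2 = top_group_drift(10_000, persistent_sd=1.0, noise_sd=0.1)
print(f"firms:  top group {firm_y1:.2f} -> {firm_y2:.2f}")
print(f"cities: top group {city_y1:.2f} -> {city_y2:.2f}")
```

With these settings, the firms' top group should lose roughly half of its year-1 advantage, because about half of that advantage was luck. The cities' top group barely moves. No special feature of economic activity is needed: the only difference between the two cases is the size of the random component.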