Fall 2020 1 - Martin Evans

Title	Fall 2020 1 - Martin Evans
Author	Spare Man
Course	Economic Statistics
Institution	Georgetown University
Pages	24

How To Lie With Statistics (VS Chapter 10)

• Statistics are famous for being easily manipulated to serve the purposes of the person who reports them.
• People sometimes report statistics that are complete fabrications.
• People sometimes make factually correct claims but present them in a plainly deceptive way.
• People sometimes report data accurately but fail to recognize simple explanations for the patterns they find.

Martin Evans

Outline

Part 1
1. Overlooked Simple Explanations
2. Variation
3. Polls
4. Sampling

Part 2
5. Causal Inference and Extrapolation
6. Spurious Correlation and Data Mining
7. Interpolation

Overlooked Simple Explanations

Example: Righties live much longer than lefties! “Handedness and life span”, New England Journal of Medicine, 1991. The researchers obtained death certificates from two southern California counties and asked the families which hand the deceased had favored.

Lifespan    Righties    Lefties
Women       77 years    71 years
Men         75 years    69 years

As explanations, the authors suggest “implied pathological factors and environmental interactions” and “covert neuropathologic features or immune system dysfunction”.

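But until the mid-20th century, teachers typically forced left-handed children to write with their right hands. The share of people reporting being left-handed rose from 2.2% in 1932 to 11.2% in 1972. So in 1991, older left-handed people would have been conspicuously absent from lists of recent deaths: most people who die are old, and many older natural left-handers had been trained to use their right hands. The deceased recorded as left-handed were therefore disproportionately young, which pulls down the lefties' average lifespan even if handedness has no effect on how long anyone lives.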
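A small simulation can make the cohort argument concrete. It is a sketch under hypothetical assumptions, not a model of the actual study: ages at death are drawn from one distribution for everybody, so handedness has no effect on lifespan, and the share of people recorded as left-handed (the hypothetical `p_left`) rises linearly with birth year. The function name `simulate_1991_deaths` and all parameter values are illustrative.

```python
import random
from statistics import mean


def simulate_1991_deaths(n=100_000, seed=0):
    """Toy model of a 1991 death-certificate sample (all parameters hypothetical).

    Handedness has NO effect on lifespan here: every deceased person's age at
    death comes from the same distribution. Cohorts differ only in how likely
    a person is to be recorded as left-handed.
    """
    rng = random.Random(seed)
    righty_ages, lefty_ages = [], []
    for _ in range(n):
        age = rng.uniform(40, 100)   # same age-at-death distribution for everyone
        birth_year = 1991 - age      # older decedents were born earlier
        # Hypothetical: share recorded as left-handed rises linearly from
        # 2% for people born in 1891 to 12% for people born in 1951.
        p_left = 0.02 + 0.10 * (birth_year - 1891) / 60
        if rng.random() < p_left:
            lefty_ages.append(age)
        else:
            righty_ages.append(age)
    return mean(righty_ages), mean(lefty_ages)


righty, lefty = simulate_1991_deaths()
print(f"righties: {righty:.1f} years, lefties: {lefty:.1f} years")
```

Even though nobody in this model dies earlier because of handedness, the lefties' average age at death comes out several years below the righties', because the people recorded as left-handed are concentrated in younger cohorts.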

Sometimes the reasons behind incorrect interpretations are more subtle.

10.2 Variation

Variation in data can be a source of subtle statistical problems.

Example: In 1991, America West Airlines’ on-time arrival rate was superior to that of its West Coast rival, Alaska Airlines. For flights into 5 of the 30 busiest U.S. airports, 89.1% of America West’s flights were on time, compared with only 86.7% of Alaska’s. Should we choose America West for timely travel?

                 Alaska                    America West
Destination      % on time   # arrivals    % on time   # arrivals
Los Angeles      88.9        559           85.6        811
Phoenix          94.8        233           92.1        5,255
San Diego        91.4        232           85.5        448
San Francisco    83.1        605           71.3        449
Seattle          85.8        2,146         76.7        262

At all five airports, Alaska has a higher on-time percentage than America West! How could America West perform better overall?

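Alaska’s hub is Seattle. America West’s hub is Phoenix. Most of America West’s arrivals are in Phoenix, where both airlines have high on-time rates, while most of Alaska’s arrivals are in Seattle, where on-time rates are lower. Each airline’s overall rate is a weighted average of its airport rates, so the difference in destination mix, not better performance, gives America West the higher overall figure. Such reversals of comparisons between aggregated and disaggregated data are known as Simpson’s paradox.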
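The reversal can be checked directly from the per-airport figures on the slides. The sketch below (function and variable names are illustrative) computes each airline's overall on-time rate as an arrivals-weighted average of its airport rates.

```python
# Per-airport figures from the slides: (% on time, number of arrivals)
ALASKA = {
    "Los Angeles": (88.9, 559),
    "Phoenix": (94.8, 233),
    "San Diego": (91.4, 232),
    "San Francisco": (83.1, 605),
    "Seattle": (85.8, 2146),
}
AMERICA_WEST = {
    "Los Angeles": (85.6, 811),
    "Phoenix": (92.1, 5255),
    "San Diego": (85.5, 448),
    "San Francisco": (71.3, 449),
    "Seattle": (76.7, 262),
}


def overall_on_time(airline):
    """Overall % on time = arrivals-weighted average of the airport percentages."""
    on_time = sum(pct / 100 * n for pct, n in airline.values())
    total = sum(n for _, n in airline.values())
    return 100 * on_time / total


# Alaska has the higher rate at every single airport...
for city in ALASKA:
    assert ALASKA[city][0] > AMERICA_WEST[city][0]

# ...yet the weighted overall rates reverse the comparison.
print(f"Alaska overall:       {overall_on_time(ALASKA):.1f}%")        # 86.7%
print(f"America West overall: {overall_on_time(AMERICA_WEST):.1f}%")  # 89.1%
```

The weights drive the result: Phoenix accounts for about 73% of America West’s arrivals (5,255 of 7,225), while Seattle accounts for about 57% of Alaska’s (2,146 of 3,775).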

Example: The (im)precision of the Nielsen ratings. From the front page of the New York Times, March 2007: NBC’s dominance in television’s evening news race is undergoing its most serious challenge in a decade as “World News” on ABC scored its second ratings victory in the last three weeks. According to the Nielsen ratings, ABC’s “World News” garnered 9.69 million nightly viewers, compared to 9.65 million viewers for NBC’s “Nightly News”.

Nielsen ratings are based on random samples, and so are subject to random variation. In this case, the “sampling error bound” was 280,000 viewers, seven times the 40,000-viewer gap between the two programs, so the ratings could not say which newscast really had more viewers.
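The arithmetic behind that conclusion is short. This sketch treats the 280,000-viewer bound as applying to each estimate separately, a simplifying assumption; if the two estimates were independent, the bound on their difference would be wider still, which only strengthens the point. The helper `interval` is illustrative.

```python
ABC_WORLD_NEWS = 9_690_000
NBC_NIGHTLY_NEWS = 9_650_000
ERROR_BOUND = 280_000  # "sampling error bound" reported for these ratings


def interval(estimate, bound):
    """Range of values consistent with an estimate and its error bound."""
    return (estimate - bound, estimate + bound)


abc_lo, abc_hi = interval(ABC_WORLD_NEWS, ERROR_BOUND)
nbc_lo, nbc_hi = interval(NBC_NIGHTLY_NEWS, ERROR_BOUND)

gap = ABC_WORLD_NEWS - NBC_NIGHTLY_NEWS
overlap = abc_lo <= nbc_hi and nbc_lo <= abc_hi

print(f"gap = {gap:,} viewers; bound = {ERROR_BOUND:,}")  # gap = 40,000 viewers; bound = 280,000
print(f"ABC: {abc_lo:,} to {abc_hi:,}")                    # ABC: 9,410,000 to 9,970,000
print(f"NBC: {nbc_lo:,} to {nbc_hi:,}")                    # NBC: 9,370,000 to 9,930,000
print("Intervals overlap:", overlap)                       # Intervals overlap: True
```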

After complaints, the Times’s Public Editor wrote in April 2007: “Delving into the complaint, I found that The Times has kept readers in the dark for years about the real-world significance of the Nielsen television audience data it publishes regularly. . . . . . two recent newsroom-wide initiatives calling for publication of the margin of error on sample-based surveys haven’t yet produced any change.” (The paper’s practice has improved somewhat since then.)

10.3 Polls and Sampling

Opinion polls can provide a great deal of information about the public’s views. But because poll results can also influence people’s views, polls can be used to manipulate public opinion. One obvious way to manipulate poll results is through the wording of questions. A slightly less obvious way is through priming: asking earlier questions that shape how respondents answer later ones.

Example: Psychologists showed subjects film clips of a car accident. They then asked how fast the cars had been moving at the time of the accident, varying only the verb in the question.

Question wording                                               Mean response
“About how fast were the cars going when they smashed?”       40.8 mph
“About how fast were the cars going when they collided?”      39.3 mph
“About how fast were the cars going when they bumped?”        38.1 mph
“About how fast were the cars going when they hit?”           34.0 mph
“About how fast were the cars going when they contacted?”     31.8 mph

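How long was the movie?     130 minutes
How short was the movie?    100 minutes

Do you think the U.S. should allow public speeches against democracy?     62% don’t allow
Do you think the U.S. should forbid public speeches against democracy?    46% forbid

Asking how short rather than how long lowered the answer by 30 minutes. Not allowing such speeches and forbidding them amount to the same policy, yet the change of verb moves support for banning them from 46% to 62%.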

Example: In March 2007, Congress passed a bill demanding a timetable for the withdrawal of troops from Iraq. The Newsweek, CBS News, USA Today/Gallup, and Pew Research polls showed public support for the bill ranging from 57% to 60%. The Fox News poll showed support for the bill at 44%.

The CBS News poll asked: Do you think the United States should or should not set a timetable for the withdrawal of U.S. troops from Iraq that would have most troops out by September 2008?

The Fox News poll asked: Who do you trust more to decide when U.S. troops should leave Iraq: U.S. military commanders or members of Congress? It then asked: Last week the U.S. House voted to remove U.S. troops from Iraq by no later than September 2008. Would you describe this as a correct and good decision or a dangerous and bad decision?

The first Fox question primes respondents to see the deadline as Congress overruling military commanders, and the second offers loaded answer choices (“correct and good” versus “dangerous and bad”).
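One way to see that the gap between 57% and 44% reflects wording and priming rather than chance is to compare it with ordinary sampling error. The slides do not give the polls' sample sizes, so this sketch assumes a hypothetical 1,000 respondents per poll and uses the standard normal-approximation 95% margin of error for a proportion, z·sqrt(p(1 − p)/n). Adding the two margins, as below, is a deliberately conservative check.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


N = 1_000              # hypothetical sample size; not reported on the slides
other_polls_low = 0.57  # low end of the 57%-60% range from the other polls
fox = 0.44

moe_other = margin_of_error(other_polls_low, N)
moe_fox = margin_of_error(fox, N)
gap = other_polls_low - fox

print(f"margins: ±{moe_other:.3f}, ±{moe_fox:.3f}; gap: {gap:.2f}")
print("Gap exceeds combined sampling error:", gap > moe_other + moe_fox)  # True
```

With roughly 3-point margins on each poll, a 13-point gap is far too large to blame on random sampling.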

10.4 Endogenous Sampling Biases

A sampling procedure produces an endogenous sampling bias when the way in which observations are generated creates systematic discrepancies between the sample and the population.

Example: Mediocrity in business. In 1933, Northwestern University statistics professor Horace Secrist wrote The Triumph of Mediocrity in Business. His thesis: “[m]ediocrity tends to prevail in the conduct of competitive business”.

The evidence: Secrist divided firms from various industries into groups according to their average profits in 1920-21. Average profits in the groups with the highest 1920-21 profits tended to fall over time, falling fastest in the groups that had started highest. Average profits in the groups with below-average 1920-21 profits tended to rise over time.

To show that this phenomenon was not a statistical artifact, Secrist performed the same exercise with average July temperatures in U.S. cities. In the cities with the highest 1921 temperatures, the 1922 temperatures did not show a tendency to decrease.

Secrist’s conclusion: The tendency of group averages to move toward a central value is a special feature of human economic activity.

Secrist’s data on firms is not evidence of a tendency toward mediocrity. It is a textbook example of regression to the mean.

The idea: A firm’s profits in a given year are partly due to quality and partly due to chance. Some firms in the top group in 1920-21 had a lucky year. Such firms will tend to have lower profits in subsequent years, when their luck is unlikely to repeat. Thus the average profits of firms in the top group from 1920-21 are almost guaranteed to move toward the population average. By the same logic, some firms in the bottom group had an unlucky year, so that group’s average profits tend to rise.

This logic does not apply to average temperatures in different cities. A city’s average July temperature is determined almost entirely by location. There is not a large, city-specific random component. Phoenix will not have an off year in which it is cooler than San Francisco.
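A simulation shows why the same ranking procedure produces "mediocrity" for firms but not for cities. In this sketch (a toy model with hypothetical parameters, not Secrist's data), each unit's outcome in a year is a lasting component plus a fresh chance component. The top 10% by year-1 outcome are selected, and their year-1 and year-2 averages are compared. Firms are modeled with chance as large as quality; cities with almost no chance component. The function name `top_group_averages` is illustrative.

```python
import random
from statistics import mean


def top_group_averages(n_units, lasting_sd, chance_sd, top_frac=0.1, seed=0):
    """Toy model: each year's outcome = lasting component + fresh chance component.

    Ranks units by their year-1 outcome, takes the top group, and returns that
    group's average outcome in year 1 and in year 2. The population average is
    0 in both years, so any year-2 drop is pure regression to the mean.
    """
    rng = random.Random(seed)
    lasting = [rng.gauss(0, lasting_sd) for _ in range(n_units)]
    year1 = [q + rng.gauss(0, chance_sd) for q in lasting]
    year2 = [q + rng.gauss(0, chance_sd) for q in lasting]
    k = int(n_units * top_frac)
    top = sorted(range(n_units), key=lambda i: year1[i], reverse=True)[:k]
    return mean(year1[i] for i in top), mean(year2[i] for i in top)


# "Firms": chance matters as much as quality (hypothetical parameters).
firm_y1, firm_y2 = top_group_averages(10_000, lasting_sd=1.0, chance_sd=1.0)
# "Cities": temperature set almost entirely by location (hypothetical parameters).
city_y1, city_y2 = top_group_averages(10_000, lasting_sd=1.0, chance_sd=0.05)

print(f"top firms:  {firm_y1:.2f} -> {firm_y2:.2f}")
print(f"top cities: {city_y1:.2f} -> {city_y2:.2f}")
```

With these parameters the top firms' average falls to roughly half its year-1 level, even though no firm's underlying quality changed, while the top cities barely move. Nothing about economic activity is needed to produce the pattern; only the size of the chance component matters.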
