Chapter 03 - Elementary Statistics

Title Chapter 03 - Elementary Statistics
Author USER COMPANY
Course Elementary Statistics
Institution Old Dominion University


3-1 Measures of Center
3-2 Measures of Variation
3-3 Measures of Relative Standing and Boxplots

3 DESCRIBING, EXPLORING, AND COMPARING DATA

CHAPTER PROBLEM

Which carrier has the fastest smartphone data speed at airports?

Data Set 32 “Airport Data Speeds” in Appendix B lists data speeds measured by RootMetrics at 50 different U.S. airports using the four major smartphone carriers of Verizon, Sprint, AT&T, and T-Mobile. The speeds are all in units of megabits (or 1 million bits) per second, denoted as Mbps. Because the original data speeds listed in Data Set 32 include decimal numbers such as 38.5 Mbps, unmodified dotplots of the data would be somewhat messy and not so helpful, but if we round all of the original data speeds, we get the dotplot shown in Figure 3-1. (Examination of the horizontal scale in Figure 3-1 reveals that the original data speeds have been rounded to the nearest even integer by the software used to create the dotplots.) By using the same horizontal scale and stacking the four dotplots, comparisons become much easier.

Examination of Figure 3-1 suggests that Verizon is the overall best performer with data speeds that tend to be higher than the data speeds of the other three carriers. But instead of relying solely on subjective interpretations of a graph like Figure 3-1, this chapter introduces measures that are essential to any study of statistics. This chapter introduces the mean, median, standard deviation, and variance, which are among the most important statistics presented in this book and in the study of statistics. We will use these statistics for describing, exploring, and comparing the measured data speeds from Verizon, Sprint, AT&T, and T-Mobile as listed in Data Set 32.

FIGURE 3-1 Dotplot of Smartphone Data Speeds

CHAPTER OBJECTIVES

Critical Thinking and Interpretation: Going Beyond Formulas and Arithmetic

In this modern statistics course, it isn’t so important to memorize formulas or manually do messy arithmetic. We can get results with a calculator or software so that we can focus on making practical sense of results through critical thinking. Although this chapter includes detailed steps for important procedures, it isn’t always necessary to master those steps. It is, however, generally helpful to perform a few manual calculations before using technology, so that understanding is enhanced.

The methods and tools presented in this chapter are often called methods of descriptive statistics, because they summarize or describe relevant characteristics of data. In later chapters we use inferential statistics to make inferences, or generalizations, about populations. Here are the chapter objectives:

3-1 Measures of Center
... mode, and midrange.

3-2 Measures of Variation
... the range, variance, and standard deviation.
... range rule of thumb to determine whether a particular value is significantly low or significantly high.

3-3 Measures of Relative Standing and Boxplots
... given value x is significantly low or significantly high.


CHAPTER 3 Describing, Exploring, and Comparing Data

3-1 Measures of Center

Key Concept The focus of this section is to obtain a value that measures the center of a data set. In particular, we present measures of center, including mean and median. Our objective here is not only to find the value of each measure of center, but also to interpret those values. Part 1 of this section includes core concepts that should be understood before considering Part 2.

PART 1

Basic Concepts of Measures of Center

In Part 1 of this section, we introduce the mean, median, mode, and midrange as different measures of center. Measures of center are widely used to provide representative values that “summarize” data sets.

DEFINITION A measure of center is a value at the center or middle of a data set.

Go Figure $3.19: Mean amount left by the tooth fairy, based on a survey by Visa. An unlucky 10% of kids get nothing.

There are different approaches for measuring the center, so we have different definitions for those different approaches. We begin with the mean.

Mean

The mean (or arithmetic mean) is generally the most important of all numerical measurements used to describe data, and it is what most people call an average.

DEFINITION The mean (or arithmetic mean) of a set of data is the measure of center found by adding all of the data values and dividing the total by the number of data values.

Important Properties of the Mean

■ Sample means drawn from the same population tend to vary less than other measures of center.
■ The mean of a data set uses every data value.
■ A disadvantage of the mean is that just one extreme value (outlier) can change the value of the mean substantially. (Using the following definition, we say that the mean is not resistant.)

DEFINITION A statistic is resistant if the presence of extreme values (outliers) does not cause it to change very much.
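To see why the mean is not resistant, the short Python sketch below takes the first five Verizon data speeds from Data Set 32 and replaces the largest value with a made-up outlier of 556.0 Mbps. The outlier and the variable names are assumptions chosen only for illustration; the median, shown for contrast, is defined later in this section.

```python
from statistics import mean, median

# First five Verizon data speeds (Mbps) from Data Set 32.
speeds = [38.5, 55.6, 22.4, 14.1, 23.1]

# Hypothetical outlier (not in the data set): 55.6 replaced by 556.0.
speeds_with_outlier = [38.5, 556.0, 22.4, 14.1, 23.1]

# Compare each statistic before and after the outlier is introduced.
print(f"mean:   {mean(speeds):.2f} -> {mean(speeds_with_outlier):.2f}")
print(f"median: {median(speeds):.2f} -> {median(speeds_with_outlier):.2f}")
```

Under this assumption the mean moves from 30.74 Mbps to 130.82 Mbps, while the median stays at 23.10 Mbps.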


Calculation and Notation of the Mean

The definition of the mean can be expressed as Formula 3-1, in which the Greek letter Σ (uppercase sigma) indicates that the data values should be added, so Σx represents the sum of all data values. The symbol n denotes the sample size, which is the number of data values.

FORMULA 3-1

$$\text{Mean} = \frac{\sum x}{n} = \frac{\text{sum of all data values}}{\text{number of data values}}$$

If the data are a sample from a population, the mean is denoted by $\bar{x}$ (pronounced “x-bar”); if the data are the entire population, the mean is denoted by $\mu$ (lowercase Greek mu).

NOTATION

Hint: Sample statistics are usually represented by English letters, such as $\bar{x}$, and population parameters are usually represented by Greek letters, such as $\mu$.

Σ denotes the sum of a set of data values.
x is the variable usually used to represent the individual data values.
n represents the number of data values in a sample.
N represents the number of data values in a population.
$\bar{x} = \frac{\sum x}{n}$ is the mean of a set of sample values.
$\mu = \frac{\sum x}{N}$ is the mean of all values in a population.

Class Size Paradox There are at least two ways to obtain the mean class size, and they can have very different results. At one college, if we take the numbers of students in 737 classes, we get a mean of 40 students. But if we were to compile a list of the class sizes for each student and use this list, we would get a mean class size of 147. This large discrepancy is because there are many students in large classes, while there are few students in small classes. Without changing the number of classes or faculty, we could reduce the mean class size experienced by students by making all classes about the same size. This would also improve attendance, which is better in smaller classes.

EXAMPLE 1 Mean

Data Set 32 “Airport Data Speeds” in Appendix B includes measures of data speeds of smartphones from four different carriers. Find the mean of the first five data speeds for Verizon: 38.5, 55.6, 22.4, 14.1, and 23.1 (all in megabits per second, or Mbps).

SOLUTION

The mean is computed by using Formula 3-1. First add the data values, then divide by the number of data values:

$$\bar{x} = \frac{\sum x}{n} = \frac{38.5 + 55.6 + 22.4 + 14.1 + 23.1}{5} = \frac{153.7}{5} = 30.74 \text{ Mbps}$$

The mean of the first five Verizon data speeds is 30.74 Mbps.

YOUR TURN Find the mean in Exercise 5 “Football Player Numbers.”

CAUTION Never use the term average when referring to a measure of center. The word average is often used for the mean, but it is sometimes used for other measures of center. The term average is not used by statisticians, the statistics community, or professional journals, and it will not be used throughout the remainder of this book when referring to a specific measure of center.
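The calculation in Example 1 can also be checked with a few lines of Python. This is only a sketch of Formula 3-1; the function name `sample_mean` and the variable names are assumptions for illustration, not something the text prescribes.

```python
def sample_mean(values):
    """Formula 3-1: the sum of all data values divided by their count."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n


# First five Verizon data speeds (Mbps) from Example 1.
verizon_first_five = [38.5, 55.6, 22.4, 14.1, 23.1]

print(f"{sample_mean(verizon_first_five):.2f} Mbps")  # 30.74 Mbps
```

The result is formatted to two decimal places only when it is displayed, so no intermediate value is rounded.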


Median

The median can be thought of loosely as a “middle value” in the sense that about half of the values in a data set are less than the median and half are greater than the median. The following definition is more precise.

DEFINITION The median of a data set is the measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.

Important Properties of the Median

■ The median does not change by large amounts when we include just a few extreme values, so the median is a resistant measure of center.
■ The median does not directly use every data value. (For example, if the largest value is changed to a much larger value, the median does not change.)

What the Median Is Not Harvard biologist Stephen Jay Gould wrote, “The Median Isn’t the Message.” In it, he describes how he learned that he had abdominal mesothelioma, a form of cancer. He went to the library to learn more, and he was shocked to find that mesothelioma was incurable, with a median survival time of only eight months after it was discovered. Gould wrote this: “I suspect that most people, without training in statistics, would read such a statement as ‘I will probably be dead in eight months,’ the very conclusion that must be avoided, since it isn’t so, and since attitude (in fighting the cancer) matters so much.” Gould went on to carefully interpret the value of the median. He knew that his chance of living longer than the median was good because he was young, his cancer was diagnosed early, and he would get the best medical treatment. He also reasoned that some could live much longer than eight months, and he saw no reason why he could not be in that group. Armed with this thoughtful interpretation of the median and a strong positive attitude, Gould lived for 20 years after his diagnosis. He died of another cancer not related to the mesothelioma.

Calculation and Notation of the Median

The median of a sample is sometimes denoted by $\tilde{x}$ (pronounced “x-tilde”) or M or Med; there isn’t a commonly accepted notation and there isn’t a special symbol for the median of a population. To find the median, first sort the values (arrange them in order) and then follow one of these two procedures:

1. If the number of data values is odd, the median is the number located in the exact middle of the sorted list.
2. If the number of data values is even, the median is found by computing the mean of the two middle numbers in the sorted list.

EXAMPLE 2 Median with an Odd Number of Data Values

Find the median of the first five data speeds for Verizon: 38.5, 55.6, 22.4, 14.1, and 23.1 (all in megabits per second, or Mbps).

SOLUTION

First sort the data values by arranging them in ascending order, as shown below:

14.1 22.4 23.1 38.5 55.6

Because the number of data values is an odd number (5), the median is the number located in the exact middle of the sorted list, which is 23.1 Mbps. The median is therefore 23.10 Mbps. Note that the median of 23.10 Mbps is different from the mean of 30.74 Mbps found in Example 1. Note also that the result of 23.10 Mbps follows the round-off rule provided later in this section.

YOUR TURN Find the median in Exercise 5 “Football Player Numbers.”

EXAMPLE 3 Median with an Even Number of Data Values

Repeat Example 2 after including the sixth data speed of 24.5 Mbps. That is, find the median of these data speeds: 38.5, 55.6, 22.4, 14.1, 23.1, 24.5 (all in Mbps).

SOLUTION

First arrange the values in ascending order:

14.1 22.4 23.1 24.5 38.5 55.6

Because the number of data values is an even number (6), the median is found by computing the mean of the two middle numbers, which are 23.1 and 24.5.

$$\text{Median} = \frac{23.1 + 24.5}{2} = \frac{47.6}{2} = 23.80 \text{ Mbps}$$

The median is 23.80 Mbps.

YOUR TURN Find the median in Exercise 7 “Celebrity Net Worth.”
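The two-case procedure for the median (odd versus even number of data values) can be sketched in Python as follows. The function name `median_of` is an assumption for illustration; the data are the Verizon speeds used in Examples 2 and 3.

```python
def median_of(values):
    """Sort the values, then take the middle one (odd count)
    or the mean of the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the value in the exact middle of the sorted list.
        return ordered[middle]
    # Even count: the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


five_speeds = [38.5, 55.6, 22.4, 14.1, 23.1]   # Example 2
six_speeds = five_speeds + [24.5]              # Example 3

print(f"{median_of(five_speeds):.2f} Mbps")  # 23.10 Mbps
print(f"{median_of(six_speeds):.2f} Mbps")   # 23.80 Mbps
```

Using `sorted` leaves the original list unchanged, so the data stay in their original order for any later calculation.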

Mode

The mode isn’t used much with quantitative data, but it’s the only measure of center that can be used with qualitative data (consisting of names, labels, or categories only).

Go Figure Mohammed: The most common name in the world.

DEFINITION The mode of a data set is the value(s) that occur(s) with the greatest frequency.

Important Properties of the Mode

■ The mode can be found with qualitative data.
■ A data set can have no mode or one mode or multiple modes.

Finding the Mode: A data set can have one mode, more than one mode, or no mode.

■ When two data values occur with the same greatest frequency, each one is a mode and the data set is said to be bimodal.
■ When more than two data values occur with the same greatest frequency, each is a mode and the data set is said to be multimodal.
■ When no data value is repeated, we say that there is no mode.
■ When you have ice cream with your pie, it is “à la mode.”

EXAMPLE 4 Mode

Find the mode of these Sprint data speeds (in Mbps):

0.2 0.3 0.3 0.3 0.6 0.6 1.2

SOLUTION

The mode is 0.3 Mbps, because it is the data speed occurring most often (three times).

YOUR TURN Find the mode in Exercise 7 “Celebrity Net Worth.”

In Example 4, the mode is a single value. Here are other possible circumstances:

Two modes: The data speeds (Mbps) of 0.3, 0.3, 0.6, 4.0, and 4.0 have two modes: 0.3 Mbps and 4.0 Mbps.

No mode: The data speeds (Mbps) of 0.3, 1.1, 2.4, 4.0, and 5.0 have no mode because no value is repeated.
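The three cases above (one mode, two modes, no mode) can be reproduced with a small Python sketch. The function name `modes` is an assumption for illustration, and returning an empty list for "no mode" is one possible convention rather than a standard one.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency.

    An empty list means no mode: no data value is repeated."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated, so there is no mode
    return sorted(v for v, count in counts.items() if count == highest)


print(modes([0.2, 0.3, 0.3, 0.3, 0.6, 0.6, 1.2]))  # [0.3]
print(modes([0.3, 0.3, 0.6, 4.0, 4.0]))            # [0.3, 4.0]
print(modes([0.3, 1.1, 2.4, 4.0, 5.0]))            # []
```

Because the function only counts occurrences, it also works with qualitative data such as a list of carrier names.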


Midrange

Another measure of center is the midrange.

DEFINITION The midrange of a data set is the measure of center that is the value midway between the maximum and minimum values in the original data set. It is found by adding the maximum data value to the minimum data value and then dividing the sum by 2, as in the following formula:

$$\text{Midrange} = \frac{\text{maximum data value} + \text{minimum data value}}{2}$$

Important Properties of the Midrange

■ Because the midrange uses only the maximum and minimum values, it is very sensitive to those extremes, so the midrange is not resistant.
■ In practice, the midrange is rarely used, but it has three redeeming features:
1. The midrange is very easy to compute.
2. The midrange helps reinforce the very important point that there are several different ways to define the center of a data set.
3. The value of the midrange is sometimes used incorrectly for the median, so confusion can be reduced by clearly defining the midrange along with the median.

EXAMPLE 5 Midrange

Find the midrange of these Verizon data speeds from Example 1: 38.5, 55.6, 22.4, 14.1, and 23.1 (all in Mbps).

SOLUTION

The midrange is found as follows:

$$\text{Midrange} = \frac{\text{maximum data value} + \text{minimum data value}}{2} = \frac{55.6 + 14.1}{2} = 34.85 \text{ Mbps}$$

The midrange is 34.85 Mbps.

YOUR TURN Find the midrange in Exercise 5 “Football Player Numbers.”
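As a quick check of Example 5, the midrange formula translates directly into Python. The function name `midrange` is an assumption for illustration.

```python
def midrange(values):
    """Return the value midway between the maximum and minimum data values."""
    if not values:
        raise ValueError("the midrange of an empty data set is undefined")
    return (max(values) + min(values)) / 2


# Verizon data speeds (Mbps) from Example 1.
verizon_first_five = [38.5, 55.6, 22.4, 14.1, 23.1]

print(f"{midrange(verizon_first_five):.2f} Mbps")  # 34.85 Mbps
```

Only the largest and smallest values enter the calculation, which is why a single extreme value changes the result.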

Rounding Measures of Center

When calculating measures of center, we often need to round the result. We use the following rule.

Round-Off Rules for Measures of Center:

■ For the mean, median, and midrange, carry one more decimal place than is present in the original set of values.
■ For the mode, leave the value as is without rounding (because values of the mode are the same as some of the original data values).

When applying any rounding rules, round only the final answer, not intermediate values that occur during calculations. For example, the mean of 2, 3, and 5 is 3.333333 . . . , which is rounded to 3.3, which has one more decimal place than the original values of 2, 3, and 5. As another example, the mean of 80.4 and 80.6 is 80.50 (one more decimal place than was used for the original values). Because the mode is one or more of the original data values, we do not round values of the mode; we simply use the same original values that are modes.
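The two rounding examples in this passage (the mean of 2, 3, and 5, and the mean of 80.4 and 80.6) can be sketched in Python with the standard `decimal` module. The helper names are assumptions for illustration. Two choices below are also assumptions, because the text does not specify them: the values are passed as strings so that the number of decimal places as written is preserved, and ties are rounded half up.

```python
from decimal import Decimal, ROUND_HALF_UP


def decimal_places(value):
    """Count the decimal places in a value written as a string."""
    exponent = Decimal(value).as_tuple().exponent
    return max(0, -exponent)


def rounded_mean(values):
    """Mean carried to one more decimal place than the original values.

    Only the final answer is rounded, never an intermediate value."""
    numbers = [Decimal(v) for v in values]
    places = max(decimal_places(v) for v in values) + 1
    mean = sum(numbers) / len(numbers)
    quantum = Decimal(1).scaleb(-places)  # e.g. 0.1 for one decimal place
    # Assumption: ties are rounded half up.
    return mean.quantize(quantum, rounding=ROUND_HALF_UP)


print(rounded_mean(["2", "3", "5"]))     # 3.3
print(rounded_mean(["80.4", "80.6"]))    # 80.50
```

Keeping the values as `Decimal` objects preserves the trailing zero in 80.50, which an ordinary floating-point number would not display.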

Rounding Error Changes World Record Rounding errors can often have disastrous results. Justin Gatlin was elated when he set the world record as the person to run 100 meters in the fastest time of 9.76 seconds. His record time lasted only 5 days, when it was revised to 9.77 seconds, so Gatlin then tied the world record instead of breaking it. His actual time was 9.766 seconds, and it ...

Critical Thinking

We can always calculate measures of center from a sample of numbers, but we should always think about whether it makes sense to do that. In Section 1-2 we noted that it makes no sense to do numerical calculations with data at the nominal level of measurement because those data consist of names, labels, or categories only, so statistics such as the mean and median are meaningless. We should also think about the sampling method used to collect the data. If the sampling method is not sound, the statistics we obtain may be very misleading.

EXAMPLE 6 Critical Thinking and Measures of Center

Each of the following illustrates a situation in which the mean and median are not meaningful statistics.

a. Zip codes of the Gateway Arch in St. Louis, White House, Air Force division of the Pentagon, Empire State Building, and Statue of Liberty: 63102, 20500, 20330, 10118, 10004. (The zip codes don’t measure or count anything. The numbers are just labels for geographic locations.)
b. Ranks of selected national universities of Harvard, Yale, Duke, Dartmouth, and Brown (from U.S. News & World Report): 2, 3, 7, 10, 14. (The ranks reflect an ordering, but they don’t measure or count anything.)
c. Numbers on the jerseys of the starting defense for the Seattle Seahawks when they won Super Bowl XLVIII: 31, 28, 41, 56, 25, 54, 69, 50, 91, 72, 29. (The numbers on the football jerseys don’t measure or count anything; they are just substitutes for names.)
d. Top 5 incomes of chief executive officers (in millions of dollars): 131.2, 66.7, 64.4, 53.3, 51.5. (Such “top 5” or “top 10” lists include data that are not at all representative of the larger population.)
e. The 50 mean ages computed from the means in each of the 50 states. (If you calculate the mean of those 50 values, the result is not the mean age of people in the entire United States. The population sizes of the 50 different states must be taken into account, as described in the weighted mean introduced in Part 2 of this section.)

YOUR TURN For Exercise 5 “Football Player Numbers,” determine why the mean and median are not meaningful.
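Item (e) of Example 6 can be made concrete with a small Python sketch. The two regions, their mean ages, and their population sizes below are entirely hypothetical numbers chosen to keep the arithmetic simple; they are not taken from any data set in the book.

```python
# Hypothetical (mean age, population size) pairs for two regions; not real data.
regions = [(30.0, 1_000_000), (40.0, 9_000_000)]

# Mean of the regional means, ignoring population size.
mean_of_means = sum(age for age, _ in regions) / len(regions)

# Mean age of all people, with each regional mean weighted by its population.
total_population = sum(population for _, population in regions)
population_mean = sum(age * population for age, population in regions) / total_population

print(mean_of_means)    # 35.0
print(population_mean)  # 39.0
```

With these made-up numbers the mean of the regional means is 35.0, but the mean age of all the people is 39.0, because nine out of ten people live in the older region.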

In the spirit of describing, exploring, and comparing data, we provide Table 3-1 on the next page, which summarizes the different measures of center for the smartphone data speeds referenced in the Chapter Problem. The data are listed in Data Set 32 “Airport Data Speeds” in Appendix B. Figure 3-1 o...

