Chapter 2-2 - Lecture notes 1,2 PDF

Title Chapter 2-2 - Lecture notes 1,2
Author David Attallah
Course Mathematical Explorations
Institution California State University Northridge
Pages 10

Summary

math 227...


Description

2.1 Frequency Distributions for Organizing and Summarizing Data

A frequency distribution shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them. We construct frequency distributions to:
• summarize large data sets,
• see the distribution and identify outliers,
• have a basis for constructing graphs.

McDonald’s Lunch Drive-Through Service Times

Time (seconds)   Frequency
75-124           11
125-174          24
175-224          10
225-274           3
275-324           2

Lower and upper class limits, class boundaries, class midpoints, class width

Lower class limits: 75, 125, 175, 225, 275
Upper class limits: 124, 174, 224, 274, 324
Class boundaries: 74.5, 124.5, 174.5, 224.5, 274.5, 324.5
Class midpoints:

$$\text{class midpoint} = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$

Class width = the difference between two consecutive lower class limits

Procedure for Constructing a Frequency Distribution
1. Select the number of classes (usually between 5 and 20).
2. Calculate the class width:

$$\text{class width} \approx \frac{(\text{maximum data value}) - (\text{minimum data value})}{\text{number of classes}}$$

3. Choose a value for the first lower class limit (the minimum data value or a convenient value below it).
4. Calculate and list vertically the rest of the lower class limits.
5. List the corresponding upper class limits.
6. Take each individual data value and put a tally mark in the appropriate class.
7. Add the tally marks to find the total frequency for each class.
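The class limits, boundaries, midpoints, and width for the McDonald’s table can be computed mechanically. Below is a minimal Python sketch. It assumes whole-number data, so consecutive classes are separated by a gap of 1 and each boundary falls halfway across that gap. It also assumes equal-width classes. The function name `class_table` is only illustrative.

```python
# Class limits, boundaries, midpoints, and width for equal-width classes.
# Assumes whole-number data, so consecutive classes are separated by a gap of 1
# (e.g. 124 and 125) and each class boundary falls halfway across that gap.


def class_table(first_lower: int, width: int, n_classes: int):
    """Return (lower_limits, upper_limits, boundaries, midpoints)."""
    lower = [first_lower + i * width for i in range(n_classes)]
    upper = [lo + width - 1 for lo in lower]  # largest whole number in each class
    # Boundaries: 0.5 below each lower limit, plus 0.5 above the last upper limit.
    boundaries = [lo - 0.5 for lo in lower] + [upper[-1] + 0.5]
    midpoints = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    return lower, upper, boundaries, midpoints


# McDonald's lunch service times: first lower class limit 75, width 50, 5 classes.
lower, upper, boundaries, midpoints = class_table(75, 50, 5)
width = lower[1] - lower[0]  # difference between consecutive lower class limits

print("Lower class limits:", lower)
print("Upper class limits:", upper)
print("Class boundaries:  ", boundaries)
print("Class midpoints:   ", midpoints)
print("Class width:       ", width)
```

The same function can be applied to Example 1 if service times are recorded in whole seconds. For instance, `class_table(30, 40, k)` would start the lower limits at 30, 70, 110, and so on, once the number of classes k has been chosen from the data.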

Example 1 Construct a frequency distribution for the following dataset, which is the drive-through service times for Burger King dinners. Begin with a lower class limit of 30 seconds and use a class width of 40 seconds.

Time   Frequency

Lower class limits:

Upper class limits:

Class boundaries:

Class midpoints:

Exercise Weights of respondents were recorded as part of the California Health Interview Survey. The last digits of weights from 50 randomly selected respondents are listed below. Construct a frequency distribution with 10 classes.

5 5 4 3 0 0 5 0 1 5
0 5 0 6 0 0 2 0 4 0
0 0 0 0 5 0 0 5 0 0
0 8 5 0 0 0 0 0 3 8
8 8 5 0 5 5 9 0 0 5

Category   Frequency
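Steps 6 and 7 of the procedure (tally each value, then total the tallies) amount to counting. The short sketch below applies Python's `collections.Counter` to the 50 last digits above. It assumes that, with 10 classes for last digits, each class is a single digit from 0 through 9.

```python
from collections import Counter

# The 50 last digits listed in the exercise above (order does not matter here).
digits = [
    5, 5, 4, 3, 0, 0, 5, 0, 1, 5,
    0, 5, 0, 6, 0, 0, 2, 0, 4, 0,
    0, 0, 0, 0, 5, 0, 0, 5, 0, 0,
    0, 8, 5, 0, 0, 0, 0, 0, 3, 8,
    8, 8, 5, 0, 5, 5, 9, 0, 0, 5,
]

counts = Counter(digits)  # tally: value -> number of occurrences
freq = {d: counts[d] for d in range(10)}  # Counter returns 0 for digits never seen

for d, f in freq.items():
    print(f"{d}  {'|' * f}  {f}")  # tally marks, then the class frequency
print("Total:", sum(freq.values()))
```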

Frequency Distribution of Categorical Data

Example 2 The table below lists data for the seven leading sources of injuries resulting in a visit to a hospital emergency room (ER) in a recent year. The activity names are categorical data at the nominal level of measurement, but we can still create the frequency distribution shown.

Activity              Frequency
Bicycling             26,212
Football              25,376
Playground            16,706
Basketball            13,987
Soccer                10,436
Baseball               9,634
All-terrain vehicle    6,337

Relative Frequency Distribution

$$\text{relative frequency for a class} = \frac{\text{frequency for a class}}{\text{sum of all frequencies}}$$

$$\text{percentage for a class} = \frac{\text{frequency for a class}}{\text{sum of all frequencies}} \times 100\%$$

The sum of the percentages in a relative frequency distribution must be very close to 100% (with a little wiggle room for rounding errors).

Example

Time (seconds)   Frequency   Relative Frequency
75-124           11
125-174          24
175-224          10
225-274           3
275-324           2
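The two formulas above can be applied to the McDonald’s table in a few lines. This is a plain Python sketch that reports each class as both a proportion and a percentage and checks that the percentages total 100%.

```python
# Relative frequency of each class, as a proportion and as a percentage.
classes = ["75-124", "125-174", "175-224", "225-274", "275-324"]
frequencies = [11, 24, 10, 3, 2]

total = sum(frequencies)  # sum of all frequencies
proportions = [f / total for f in frequencies]
percentages = [100 * f / total for f in frequencies]

for label, f, p, pct in zip(classes, frequencies, proportions, percentages):
    print(f"{label:>8}  {f:>3}  {p:.2f}  {pct:.0f}%")

print("Sum of percentages:", sum(percentages))
```

These percentages happen to be exact. With other data, rounding each percentage can make the total land slightly off 100%, which is the "wiggle room" mentioned above.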

Cumulative Frequency Distribution

In a cumulative frequency distribution, the frequency for each class is the sum of the frequencies for that class and all previous classes.

Example

Time (seconds)    Cumulative Frequency
less than 125
less than 175
less than 225
less than 275
less than 325
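A cumulative frequency is a running total, so Python's `itertools.accumulate` computes it directly. This sketch uses the McDonald’s frequencies 11, 24, 10, 3, 2 from the earlier table.

```python
from itertools import accumulate

# "Less than" values from the table above (each is the next class's lower limit).
less_than = [125, 175, 225, 275, 325]
frequencies = [11, 24, 10, 3, 2]  # McDonald's service-time frequencies

# Running total: each class's frequency plus those of all previous classes.
cumulative = list(accumulate(frequencies))

for bound, c in zip(less_than, cumulative):
    print(f"less than {bound}: {c}")
```

The last cumulative frequency always equals the total number of data values.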

Normal Distributions
1. The frequencies start low, then increase to one or two high frequencies, and then decrease to a low frequency.
2. The distribution is approximately symmetric: frequencies preceding the maximum frequency should be roughly a mirror image of those that follow the maximum frequency.

Exercise Using a loose interpretation of the criteria for determining whether a frequency distribution is approximately a normal distribution, determine whether each of the given frequency distributions is approximately a normal distribution.

Age (yr) of Best Actor When Oscar Was Won   Frequency
20 – 29                                       1
30 – 39                                      28
40 – 49                                      36
50 – 59                                      15
60 – 69                                       6
70 – 79                                       1

Blood Platelet Count of Females   Frequency
100 – 199                          25
200 – 299                          92
300 – 399                          28
400 – 499                           0
500 – 599                           2

Analysis of Last Digits

Frequencies of last digits sometimes reveal how the data were collected or measured.

Last digit of pulse rate   Frequency
0                          455
1                            0
2                          461
3                            0
4                          479
5                            0
6                          425
7                            0
8                          399
9                            0

• Pulse rates from 2219 adults.
• The last digits of the recorded pulse rates are identified.
• All of the last digits are even numbers.
• If the pulse rates were counted for 1 full minute, there would surely be a large number of them ending with an odd digit.
• One reasonable explanation is that, even though the pulse rates are the number of heartbeats in 1 minute, they were likely counted for 30 seconds and the number of beats was doubled.
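A quick consistency check of the table follows: the frequencies should add up to the 2219 adults, and none should fall on an odd digit. Doubling any whole-number 30-second count always gives an even number, which is why the 30-second explanation fits. The Python sketch below checks these facts.

```python
# Frequencies of the last digit of 2219 recorded pulse rates (from the table above).
last_digit_freq = {0: 455, 1: 0, 2: 461, 3: 0, 4: 479,
                   5: 0, 6: 425, 7: 0, 8: 399, 9: 0}

total = sum(last_digit_freq.values())
odd_total = sum(f for digit, f in last_digit_freq.items() if digit % 2 == 1)
digits_present = sorted(d for d, f in last_digit_freq.items() if f > 0)

print("Total pulse rates:", total)
print("Ending in an odd digit:", odd_total)
print("Last digits that occur:", digits_present)
```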

Exercise Based on the frequency distribution from the weights exercise in Section 2.1, do the weights appear to be reported or actually measured? What does this suggest about the accuracy of the results?

Exercise Use the given categorical data to construct the relative frequency distribution. Natural births randomly selected from four hospitals in New York State occurred on the days of the week (in the order of Monday through Sunday) with these frequencies: 52, 66, 72, 57, 57, 43, 53. Does it appear that such births occur on the days of the week with equal frequency?

Category   Frequency
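The sketch below computes the relative frequency (as a percentage) for each day of the births data. It also places each value next to the 100%/7 ≈ 14.29% share that every day would have if births were spread evenly over the week. That comparison column is an illustrative addition. The code only produces the numbers and does not decide the equal-frequency question.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
births = [52, 66, 72, 57, 57, 43, 53]

total = sum(births)
percentages = [100 * f / total for f in births]  # percentage for each category
equal_share = 100 / len(days)  # percentage per day if all days were equally frequent

print(f"Total births: {total}")
for day, f, pct in zip(days, births, percentages):
    print(f"{day:<10} {f:>3}  {pct:6.2f}%   (equal share {equal_share:.2f}%)")
```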

2.2 Histograms

A histogram is a graph consisting of bars of equal width drawn adjacent to each other (unless there are gaps in the data). The horizontal scale represents classes of quantitative data values, and the vertical scale represents frequencies. The heights of the bars correspond to the frequency values.

Important Uses of a Histogram
• Shows the distribution of the data

• Shows the spread of the data

• Shows the location of the center of the data

• Identifies outliers

A relative frequency histogram has the same shape and horizontal scale as a histogram, but the vertical scale uses relative frequencies (as percentages or proportions) instead of actual frequencies.
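To see the shape of a histogram without plotting software, a text version is enough. This is a rough Python sketch. It draws the McDonald’s frequencies as bars that run sideways (a histogram rotated 90 degrees), which is easier in plain text than upright bars. The function name `text_histogram` is illustrative.

```python
def text_histogram(labels, freqs, symbol="#"):
    """One text line per class; the length of each bar equals the class frequency.

    The bars run sideways (a histogram rotated 90 degrees), which is easier in plain text.
    """
    label_width = max(len(label) for label in labels)
    return [f"{label:>{label_width}} | {symbol * f} {f}" for label, f in zip(labels, freqs)]


# McDonald's lunch service-time classes (seconds) and their frequencies.
labels = ["75-124", "125-174", "175-224", "225-274", "275-324"]
freqs = [11, 24, 10, 3, 2]

for line in text_histogram(labels, freqs):
    print(line)
```

Passing whole-number percentages (22, 48, 20, 6, 4) instead of frequencies would give a rough relative frequency histogram with the same shape, as described above.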

Common Distribution Shapes

When graphed as a histogram, a normal distribution has a "bell" shape. Many statistical methods require that sample data come from a population having a distribution that is approximately normal. A histogram can be used to judge whether this requirement is satisfied. Another method is to use a normal quantile plot, which can be interpreted using the following criteria:
• Normal distribution: the points are reasonably close to a straight line.
• Not a normal distribution: the points are not reasonably close to a straight line, or they show some systematic pattern that is not a straight-line pattern.

Examples
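The sketch below produces the points of a normal quantile plot. It sorts the data, gives the i-th sorted value the cumulative left area (2i − 1)/(2n), and converts that area to a z score with Python's `statistics.NormalDist.inv_cdf`. This is one common textbook recipe; other software uses slightly different plotting positions, so the z values may differ a little. The data are the diastolic blood pressures from the dotplot exercise in Section 2.3.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (x, z) pairs for a normal quantile plot.

    Uses cumulative left areas 1/(2n), 3/(2n), 5/(2n), ... for the sorted values;
    other software may use slightly different plotting positions.
    """
    xs = sorted(data)
    n = len(xs)
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    zs = [standard_normal.inv_cdf((2 * i + 1) / (2 * n)) for i in range(n)]
    return list(zip(xs, zs))


# Diastolic blood pressures (mm Hg) from the Section 2.3 exercise.
bp = [62, 70, 70, 76, 72, 90, 88, 86, 70, 60, 66, 78,
      68, 82, 70, 78, 82, 84, 74, 76, 90, 60, 62, 64]

for x, z in normal_quantile_points(bp):
    print(f"{x:>3}  {z:6.3f}")
```

Plotting these (x, z) pairs with any graphing tool and checking whether they lie close to a straight line applies the criteria above.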

2.3 Graphs That Enlighten and Graphs That Deceive

Graphs That Enlighten

• Dotplot
  – Consists of a graph in which each data value is plotted as a point (or dot) along a scale of values.
  – Dots representing equal values are stacked.
  – Displays the shape of the distribution of data.
  – It is usually possible to recreate the original list of data values.

Example 1

Exercise Construct the dotplot. Listed below are diastolic blood pressure measurements (mm Hg) of females selected from Data Set 1 "Body Data" in Appendix B. All of the values are even numbers. Are there any outliers? If so, identify their values.

62 70 70 76 72 90 88 86 70 60 66 78
68 82 70 78 82 84 74 76 90 60 62 64
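A text version of a dotplot stacks one mark per data value above a scale. This Python sketch does that for the blood pressure data. It assumes a scale step of 2 because every value is even, so values that never occur (such as 80) still get a position on the scale. The function name `text_dotplot` is illustrative.

```python
from collections import Counter


def text_dotplot(data, step=1):
    """Stack one '*' per occurrence above each value on a scale from min to max.

    `step` is the spacing of the scale; the BP values are all even, so step=2.
    """
    counts = Counter(data)
    scale = range(min(data), max(data) + 1, step)
    cell = max(len(str(v)) for v in scale) + 1  # width of one column
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):  # top row first
        row = "".join(("*" if counts[v] >= level else "").rjust(cell) for v in scale)
        rows.append(row.rstrip())
    rows.append("".join(str(v).rjust(cell) for v in scale))  # the scale itself
    return rows


bp = [62, 70, 70, 76, 72, 90, 88, 86, 70, 60, 66, 78,
      68, 82, 70, 78, 82, 84, 74, 76, 90, 60, 62, 64]

for row in text_dotplot(bp, step=2):
    print(row)
```

Spotting outliers is still a visual judgment: look for dots far from the rest along the scale.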

• Stemplot
  – Represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).
  – Shows the shape of the distribution of the data.
  – Retains the original data values.
  – The sample data are sorted (arranged in order).

Example

Exercise Construct the stemplot.

62 70 70 76 72 90 88 86 70 60 66 78
68 82 70 78 82 84 74 76 90 60 62 64
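The stem/leaf split described above is integer division and remainder by 10. This Python sketch assumes non-negative whole numbers and uses the tens digit as the stem and the last digit as the leaf. It keeps empty stems so that gaps in the data remain visible. The function name `stemplot` is illustrative.

```python
from collections import defaultdict


def stemplot(data):
    """Stem = value // 10, leaf = last digit; assumes non-negative whole numbers."""
    leaves = defaultdict(list)
    for value in sorted(data):  # sorting first puts the leaves in order
        leaves[value // 10].append(value % 10)
    stems = range(min(leaves), max(leaves) + 1)  # keep empty stems so gaps show
    return [(f"{s} | " + " ".join(str(leaf) for leaf in leaves.get(s, []))).rstrip()
            for s in stems]


bp = [62, 70, 70, 76, 72, 90, 88, 86, 70, 60, 66, 78,
      68, 82, 70, 78, 82, 84, 74, 76, 90, 60, 62, 64]

for line in stemplot(bp):
    print(line)
```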

• Time-Series Graph
  – A time-series graph is a graph of time-series data, which are quantitative data that have been collected at different points in time, such as monthly or yearly.
  – Reveals information about trends over time.

Example

• Bar graph
  – Uses bars of equal width to show frequencies of categories of categorical (or qualitative) data.
  – The bars may or may not be separated by small gaps.
  – Shows the relative distribution of categorical data so that it is easier to compare the different categories.
• Pareto chart
  – A bar graph for categorical data, with the added stipulation that the bars are arranged in descending order according to frequencies.
  – Shows the relative distribution of categorical data so that it is easier to compare the different categories.
  – Draws attention to the more important categories.
• Pie chart
  – Depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category.
  – Very common, but not as effective as Pareto charts.
  – Shows the distribution of categorical data in a commonly used format.
• Frequency polygon
  – Uses line segments connected to points located directly above class midpoint values.
  – A variation of the basic frequency polygon is the relative frequency polygon, which uses relative frequencies.
  – An advantage of relative frequency polygons is that two or more of them can be combined on a single graph for easy comparison.

Graphs That Deceive

• Nonzero Axis

• Pictographs
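Two of the graphs in this section need a small computation before anything is drawn. A Pareto chart needs the categories sorted by frequency, largest first. A frequency polygon needs one point above each class midpoint. The Python sketch below shows both. It uses the births-by-day data from Section 2.1 and the McDonald’s classes, whose midpoints are (lower limit + upper limit)/2. The text bars are only a stand-in for a real chart.

```python
# Pareto chart order: categories sorted by frequency, largest first.
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
births = [52, 66, 72, 57, 57, 43, 53]
pareto = sorted(zip(days, births), key=lambda pair: pair[1], reverse=True)  # ties keep original order
for day, f in pareto:
    print(f"{day:<10} {'#' * (f // 2)} {f}")  # one '#' per 2 births, to keep bars short

# Frequency polygon: one point directly above each class midpoint.
midpoints = [99.5, 149.5, 199.5, 249.5, 299.5]  # McDonald's service-time classes
frequencies = [11, 24, 10, 3, 2]
polygon_points = list(zip(midpoints, frequencies))
print(polygon_points)
```

Connecting `polygon_points` with line segments gives the frequency polygon. Dividing each frequency by 50 (the total) would give the points of the relative frequency polygon.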

2.4 Scatterplots, Correlation, and Regression

A correlation exists between two variables when the values of one variable are somehow associated with the values of the other variable.

A linear correlation exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.

A scatterplot (or scatter diagram) is a plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis. The horizontal axis is used for the first variable (x), and the vertical axis is used for the second variable (y).

Example

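The linear correlation coefficient is denoted by r, and it measures the strength of the linear association between two variables. The computed value of the linear correlation coefficient is always between −1 and 1. If r is close to −1 or close to 1, there appears to be a correlation, but if r is close to 0, there does not appear to be a linear correlation.

Critical Values of the Linear Correlation Coefficient r

Number of Pairs of Data n   Critical Value of r
4                           0.950
5                           0.878
6                           0.811
7                           0.754
8                           0.707
9                           0.666
10                          0.632
11                          0.602
12                          0.576

Number line from −1 to 1, with the critical values marked at −r and r and 0 in the middle: the region from −1 to −r is labeled "Correlation," the region between −r and r is labeled "No correlation," and the region from r to 1 is labeled "Correlation." In other words, if the computed r is at or beyond the critical value in either direction, conclude there is a linear correlation; if it lies between the negative and positive critical values, conclude there is no linear correlation.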

Example Consider the data below. The linear correlation coefficient is found to be r = 0.591. Determine whether there is a linear correlation.

Shoe Print Length (cm)   Height (cm)
29.7                     175.3
29.7                     177.8
31.4                     185.4
31.8                     175.3
27.6                     172.7
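Computing r by hand is tedious. The Python sketch below uses the usual definition of r: the sum of the cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. It then compares |r| with the critical value for n pairs. The dictionary `CRITICAL_R` is an illustrative name that simply copies the critical-value table above.

```python
from math import sqrt


def linear_correlation(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Critical values of r from the table above, keyed by number of pairs n.
CRITICAL_R = {4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707,
              9: 0.666, 10: 0.632, 11: 0.602, 12: 0.576}

shoe_print = [29.7, 29.7, 31.4, 31.8, 27.6]     # cm
height = [175.3, 177.8, 185.4, 175.3, 172.7]    # cm

r = linear_correlation(shoe_print, height)
critical = CRITICAL_R[len(shoe_print)]
print(f"r = {r:.3f}, critical value = {critical}")
print("linear correlation" if abs(r) >= critical else "no linear correlation")
```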

Exercise Use the data below, for which the linear correlation coefficient is r = 0.980, to determine whether there is a linear correlation.

Exercise Use the data below, for which the linear correlation coefficient is r = −0.017, to determine whether there is a linear correlation.

Given a collection of paired sample data, the regression line (or line of best fit, or least-squares line) is the straight line that "best" fits the scatterplot of the data....
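The notes break off here. For reference, the standard least-squares line ŷ = b0 + b1x has slope b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept b0 = ȳ − b1x̄, so the line passes through the point of means (x̄, ȳ). These formulas are standard rather than taken from the notes. This Python sketch applies them to the shoe print data from the earlier example, chosen only because those data are at hand.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


shoe_print = [29.7, 29.7, 31.4, 31.8, 27.6]     # x, cm
height = [175.3, 177.8, 185.4, 175.3, 172.7]    # y, cm

b0, b1 = least_squares_line(shoe_print, height)
print(f"y-hat = {b0:.1f} + {b1:.2f}x")
```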

