Chapter 5 - Lecture notes 5 PDF

Title: Chapter 5 - Lecture notes 5
Course: Quantitative Reasoning And Problem Solving
Institution: University of Wisconsin-Madison

Summary

Professor: Aaron Goldsmith...


Description

Individuals: entities about which information (data) is collected
- May be people, but can also be groups, animals, or things

Variable: a characteristic or trait that can take on different values for different individuals
- A particular variable may be either quantitative or qualitative

EDA: Exploratory data analysis
1. Begin by examining each variable by itself. Start with one or more graphs, and then add numerical summaries of specific aspects of the data
2. Explore possible relationships among variables, using graphical displays and then numerical summaries

Distribution: gives information (as a table, graph, or formula) about how often the variable takes certain values or intervals of values
- Frequency distribution: classifies data on a single variable into non-overlapping classes or intervals (class intervals) and records how many data values fall in each class
- Relative frequency distribution: classifies data on a single variable into non-overlapping classes or intervals and records what fraction (or percentage) of the data values fall in each class
  - Gives the fraction of the time that each value (or class) occurs
  - Example: if there are seven data values, obtain the relative frequencies by dividing the frequencies by seven

Class intervals: non-overlapping consecutive intervals
- Provide a means of summarizing the data and often reveal patterns that are obscured when too much data is visible

Constructing a frequency distribution
1. Choose the classes
   - Determine an interval that is wide enough to contain all of the data. Subdivide this interval into a reasonable number of class intervals of equal width. Be sure to specify the classes precisely so that each individual falls into exactly one class
2. Set up the table
   - Set up a table with three columns: class interval, tally, and frequency
3. Complete the table
   - Determine the frequency with which data values fall into each class interval
   - If desired, add a fourth column for relative frequency

Histogram: a graphical representation of a frequency distribution for a single numerical variable
- Bars are drawn over each class interval on a number line
- The area of each bar is proportional to the frequency with which data fall into that class interval
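The three construction steps can be sketched in Python as below. This is a minimal illustration, not a method from the notes: the function name `frequency_table` and its parameters are hypothetical, the classes are assumed to be equal-width and half-open (left endpoint included, right endpoint excluded) so each value lands in exactly one class, and the tally column is left out. The sample data are the seven values from Example 2 later in these notes, so each relative frequency is the frequency divided by seven.

```python
def frequency_table(data, start, width, num_classes):
    """Build a frequency / relative frequency table with equal-width classes.

    Class k is the half-open interval [start + k*width, start + (k+1)*width),
    so each value falls into exactly one class. Raises ValueError if a value
    lies outside every class (i.e. the chosen classes do not cover the data).
    """
    counts = [0] * num_classes
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if not 0 <= k < num_classes:
            raise ValueError(f"{x} is not covered by the chosen classes")
        counts[k] += 1
    n = len(data)
    # Each row: (lower bound, upper bound, frequency, relative frequency)
    return [(start + k * width, start + (k + 1) * width, c, c / n)
            for k, c in enumerate(counts)]


if __name__ == "__main__":
    data = [3, 3, 5, 6, 9, 10, 14]  # seven values: relative frequency = frequency / 7
    print("class      freq  rel. freq")
    for low, high, freq, rel in frequency_table(data, start=0, width=5, num_classes=3):
        print(f"[{low}, {high})  {freq:>4}  {rel:.3f}")
```

Raising an error for uncovered values mirrors step 1: if a value falls outside every class, the chosen interval was not wide enough.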

Making a histogram
1. Draw a set of axes. On the horizontal axis, mark the boundaries of the class intervals.
2. Label the vertical axis "frequency" (or "relative frequency"). Label the horizontal axis with the name of the variable being measured and its units.
3. Over each class interval, draw a rectangle with the interval as its base. The height of the rectangle should match the frequency (or relative frequency) of data contained in that interval.
** looks like a bar graph, but the bars touch because the classes are consecutive intervals on a number line **

Pages 188-194

Stemplot (stem-and-leaf plot): a display of the distribution of a variable that attaches the final digit of each observation as a leaf on a stem made up of all but the final digit

Making a stemplot
1. Separate each observation into a stem, which consists of all but the rightmost digit, and a leaf, which is the final digit
2. Write the stems in a vertical column, with the smallest at the top, and draw a vertical line to the right of this column. Include all stems, even if they are not used
3. Write each leaf in the row to the right of its stem. Arrange the leaves from smallest to largest

Mean: the average (add up the values and divide by how many there are)
Median: the number in the middle when the values are sorted (the average of the two middle values when there is an even number of values)
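The stemplot steps can be sketched in Python as follows. This assumes the observations are non-negative whole numbers (decimals would first need to be rescaled), and the function name `stemplot` is hypothetical. The stem is `value // 10` (all but the rightmost digit) and the leaf is `value % 10` (the rightmost digit).

```python
from collections import defaultdict


def stemplot(values):
    """Return the rows of a stemplot for a non-empty list of non-negative integers.

    stem = all but the rightmost digit (value // 10)
    leaf = the rightmost digit (value % 10)
    Stems run from smallest (top) to largest, every stem in that range is
    listed even if it has no leaves, and leaves are sorted smallest to largest.
    """
    leaves = defaultdict(list)
    for v in values:
        leaves[v // 10].append(v % 10)
    lo, hi = min(leaves), max(leaves)
    width = len(str(hi))  # right-align stems so the vertical bars line up
    rows = []
    for stem in range(lo, hi + 1):
        leaf_digits = "".join(str(d) for d in sorted(leaves.get(stem, [])))
        rows.append(f"{stem:>{width}} | {leaf_digits}".rstrip())
    return rows


if __name__ == "__main__":
    for row in stemplot([3, 3, 5, 6, 9, 10, 14, 14, 16]):
        print(row)
```

Iterating over `range(lo, hi + 1)` rather than only the stems that occur is what implements step 2's "include all stems, even if they are not used."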

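Quartiles and extremes together form the 5-number summary:
- Min: 0th percentile
- Q1 (1st quartile): 25th percentile
- Median: 50th percentile
- Q3 (3rd quartile): 75th percentile
- Max: 100th percentile

All five together form a box-and-whisker plot
- Interquartile range (IQR) = Q3 - Q1: the spread of the middle 50% of the data (25th to 75th percentile)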

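Example 1: 3, 3, 5, 6, 9, 10, 14, 14, 16
Min: 3, Q1: 5, Q2: 9 (median), Q3: 14, Max: 16
**(see written notes for the box-and-whisker plot of this)

Example 2: 3, 3, 5, 6, 9, 10, 14
Min: 3, Q1: 4, Q2: 6 (median), Q3: 9.5, Max: 14
**(see written notes for the box-and-whisker plot of this)

In both examples, Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the median counted in both halves.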
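The two examples can be checked with this Python sketch of the five-number summary. The helper names `median` and `five_number_summary` are hypothetical. Q1 and Q3 are computed as the medians of the lower and upper halves, counting the median in both halves when n is odd, because that is the convention that reproduces both examples.

```python
def median(sorted_vals):
    """Middle value of an already-sorted list (mean of the two middle values if n is even)."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted data;
    when n is odd, the median itself is included in both halves.
    """
    s = sorted(data)
    n = len(s)
    half = (n + 1) // 2          # size of each half (includes the median if n is odd)
    lower, upper = s[:half], s[n - half:]
    return (s[0], median(lower), median(s), median(upper), s[-1])


if __name__ == "__main__":
    for data in ([3, 3, 5, 6, 9, 10, 14, 14, 16], [3, 3, 5, 6, 9, 10, 14]):
        mn, q1, med, q3, mx = five_number_summary(data)
        print(f"min={mn} Q1={q1} median={med} Q3={q3} max={mx} IQR={q3 - q1}")
```

Other quartile conventions give slightly different answers. For example, leaving the median out of both halves gives Q1 = 4 for Example 1 instead of 5, so it is worth checking which rule a textbook or calculator uses.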

Normal distribution
- Symmetric and bell-shaped, centered at the mean

Sigma (σ): the standard deviation, which measures how spread out the distribution is
- Larger σ = more spread out (a wider, flatter curve)
- Roughly, σ is a typical distance of the values from the mean
- σ^2 (the variance) = average squared distance from the mean
  - Also tells you how spread out the distribution is, but in squared units

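Example: 3, 4, 7, 10
1. Find the mean: (3 + 4 + 7 + 10) / 4 = 6
2. Find sigma (SD) squared, dividing by the number of values minus 1:
$$\sigma^2 = \frac{(3-6)^2 + (4-6)^2 + (7-6)^2 + (10-6)^2}{4-1} = \frac{9 + 4 + 1 + 16}{3} = \frac{30}{3} = 10$$
(Dividing by n - 1 gives the sample variance, often written s^2; dividing by n gives the average squared distance itself.)

**Normal distributions are determined by two numbers: M and σ (mean and sigma)
- For the sum of two independent normal distributions, the mean M of the sum is the sum of the two means
- The variance σ^2 of the sum is the sum of the two variances (the standard deviations themselves do not add)

68-95-99.7 rule
- Within one standard deviation of the mean: about 68% of the data (the curve's inflection points are one standard deviation on either side of the mean)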

- Within two standard deviations of the mean: about 95%
- Within three standard deviations of the mean: about 99.7%

Example: Suppose the daily H2O consumption of indoor types is normally distributed with a mean of 2 cups and a standard deviation of ½ cup. How much water does the 84th percentile drink? Hint: use the 68-95-99.7 rule
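A worked version of the water example, assuming "the 84%" means the 84th percentile. Half of the data lie below the mean, and half of the central 68% (34%) lie between the mean and one standard deviation above it, so 50% + 34% = 84% lie below mean + 1 SD. The sketch compares that rule-of-thumb answer with the exact percentile from Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

mean, sd = 2.0, 0.5  # cups per day, from the example

# 68-95-99.7 rule: 50% below the mean + 34% between the mean and mean + 1 SD = 84%
rule_answer = mean + 1 * sd

# Exact 84th percentile of the same normal model, for comparison
exact_answer = NormalDist(mu=mean, sigma=sd).inv_cdf(0.84)

print(f"rule of thumb: {rule_answer} cups")
print(f"exact 84th percentile: {exact_answer:.3f} cups")
```

The two answers agree to within a few thousandths of a cup, which is why the rule works well for quick estimates at exactly 1, 2, or 3 standard deviations.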

r (the correlation coefficient) = how strong the linear relationship between two variables is, and in which direction; r is always between -1 and 1
- If r is negative, one variable tends to decrease as the other increases
- If r is positive, the two variables tend to increase together
- If r is close to 1 or -1, the two variables are strongly correlated
- r^2 measures strength without direction (it is the fraction of the variation in y explained by the regression line)

Correlation does not imply causation
- There could be a third factor (or more) behind both variables

Example: you have a lemonade stand
- You notice that people buy more lemonade when it is warm outside
- Being warm outside also makes people sweat
- If you plotted how much people sweat against how much lemonade they buy, you'd see a positive correlation
- Still, you can't say that sweating causes lemonade sales, or that lemonade sales cause sweating
- It's the third factor, warm summer weather, that produces this positive correlation

Measuring the same is not being the same

To calculate r
- Start with a list of points (x1, y1), (x2, y2), ..., (xn, yn), each with an x coordinate and a y coordinate
- Take the sum of all of the products x_i y_i divided by n, minus the product of the two sums divided by n^2
- Divide the result by σx σy (both standard deviations computed by dividing by n):
$$r = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \frac{1}{n^2}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sigma_x \sigma_y}$$
***(r = 1 means all the points lie exactly on a line with positive slope)
See notebook for equation formulas
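The computation of r can be sketched in Python as below, assuming the population form throughout (both the numerator and σx, σy divide by n; r comes out the same with n - 1 as long as it is used consistently). The function name `correlation` is hypothetical; it follows the "sum of products over n minus product of sums over n^2" recipe for the numerator.

```python
import math


def correlation(points):
    """Pearson correlation r from a list of (x, y) pairs (population form).

    Numerator: (sum of x*y)/n - (sum of x)(sum of y)/n^2, i.e. the covariance.
    Denominator: sigma_x * sigma_y, each computed by dividing by n.
    """
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in points)
    cov = sum_xy / n - (sum_x * sum_y) / n**2
    sigma_x = math.sqrt(sum(x * x for x in xs) / n - (sum_x / n) ** 2)
    sigma_y = math.sqrt(sum(y * y for y in ys) / n - (sum_y / n) ** 2)
    return cov / (sigma_x * sigma_y)


if __name__ == "__main__":
    print(correlation([(1, 3), (2, 5), (3, 7), (4, 9)]))  # points on y = 2x + 1
    print(correlation([(1, 2), (2, 1), (3, 4), (4, 3)]))  # a looser upward trend
```

This shortcut formula is convenient by hand but can lose precision on a computer when the values are large; subtracting the means first is the numerically safer choice. It is undefined when every x (or every y) is the same, since σ is then 0.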

Line of best fit (least-squares regression line): given a list of points, the line through the region that minimizes the sum of the squared errors
- Error: how far each point is from the line, measured vertically
- e_i is the difference between y_i and the point on the line at x_i

Finding the line of regression
1. Find Mx, My, σx, σy, r
2. Find the slope and the y-intercept
   a. Slope-intercept form: y = mx + b
      i. Slope: m = r (σy / σx)
      ii. Intercept: b = My - m Mx (so the line passes through the point (Mx, My))
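The two regression steps can be sketched in Python with the standard-library `statistics` module. The function name `regression_line` is hypothetical; r is computed here as the average product of deviations divided by σx σy (population form), and then m = r σy / σx and b = My - m Mx as in step 2.

```python
from statistics import fmean, pstdev


def regression_line(xs, ys):
    """Least-squares line y = m*x + b from step 1's summaries (Mx, My, sigma_x, sigma_y, r)."""
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)      # Mx, My
    sx, sy = pstdev(xs), pstdev(ys)    # sigma_x, sigma_y (divide by n)
    # r: average product of the deviations, divided by sigma_x * sigma_y
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)
    m = r * sy / sx                    # slope
    b = my - m * mx                    # intercept: line passes through (Mx, My)
    return m, b


if __name__ == "__main__":
    m, b = regression_line([1, 2, 3, 4], [2, 1, 4, 3])
    print(f"y = {m:.2f}x + {b:.2f}")
```

A quick sanity check on any implementation: points that already lie on a line, such as (1, 3), (2, 5), (3, 7), (4, 9), should give back exactly that line, here y = 2x + 1.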

Equations:
- Finding standard deviation: the standard deviation σ of n observations x1, x2, ..., xn is found as follows
1. Find the mean
2. Find each value's difference from the mean
3. Find the variance
   a. Take each difference, square it, and then average the results
4. The standard deviation is the square root of the variance

You measured the heights of five dogs: 600 mm, 470 mm, 170 mm, 430 mm, and 300 mm.

1. Find the mean
$$\text{Mean} = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394$$

2. Find each value's difference from the mean
600 - 394 = 206
470 - 394 = 76
170 - 394 = -224
430 - 394 = 36
300 - 394 = -94

3. Find the variance: take each difference, square it, and then average the results
$$\sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{42436 + 5776 + 50176 + 1296 + 8836}{5} = \frac{108520}{5} = 21704$$

4. The standard deviation is the square root of the variance
$$\sigma = \sqrt{21704} = 147.32\ldots \approx 147 \text{ mm (to the nearest mm)}$$
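The four standard-deviation steps translate directly into Python, sketched below with hypothetical helper names `variance` and `std_dev`. The `sample` flag is an illustrative addition covering both conventions used in these notes: dividing by n (the dog heights) and dividing by n - 1 (the 3, 4, 7, 10 example).

```python
import math


def variance(values, sample=False):
    """Average squared difference from the mean.

    sample=False divides by n (as in the dog-height example);
    sample=True divides by n - 1 (as in the 3, 4, 7, 10 example).
    """
    n = len(values)
    mean = sum(values) / n                           # step 1
    squared_diffs = [(x - mean) ** 2 for x in values]  # steps 2-3: difference, then square
    return sum(squared_diffs) / (n - 1 if sample else n)


def std_dev(values, sample=False):
    """Step 4: square root of the variance."""
    return math.sqrt(variance(values, sample))


heights_mm = [600, 470, 170, 430, 300]
print(variance(heights_mm))                   # 21704.0
print(round(std_dev(heights_mm)))             # 147
print(variance([3, 4, 7, 10], sample=True))   # 10.0
```

Python's own `statistics.pvariance` and `statistics.variance` follow the same two conventions (divide by n and by n - 1, respectively) and can be used to double-check hand calculations.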

y = mx + b
m = r (σy / σx)
b = My - m Mx
Quartiles of a normal distribution: about 0.67 standard deviations below and above the mean
68-95-99.7 Rule
- 68% lie within 1 SD of the mean
- 95% lie within 2 SD of the mean
- 99.7% lie within 3 SD of the mean...
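Both the 68-95-99.7 rule and the 0.67 quartile figure can be checked against the standard normal curve (mean 0, SD 1) using `statistics.NormalDist` from Python's standard library. The sketch computes the exact area within 1, 2, and 3 SD of the mean and the z-value with 75% of the area below it (Q3; Q1 is its negative).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Fraction of the area within k standard deviations of the mean
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, frac in coverage.items():
    print(f"within {k} SD: {frac:.4f}")

# Q3 of the standard normal: 75% of the area lies below it (Q1 = -Q3 by symmetry)
q3 = z.inv_cdf(0.75)
print(f"Q3 = {q3:.4f} SD above the mean, Q1 = {-q3:.4f}")
```

The exact coverages are about 0.683, 0.954, and 0.997, and Q3 is about 0.674, so the rounded figures in the notes hold for any normal distribution once values are measured in standard deviations from the mean.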

