Title | Res-Econ TBL 5 Notes - Chebyshev's Theorem - Normal Distribution, Empirical Rule, Z-Scores, Measures |
---|---|
Course | Introductory Statistics for the Social Sciences |
Institution | University of Massachusetts Amherst |
Pages | 3 |
Chebyshev's Theorem - Normal Distribution, Empirical Rule, Z-Scores, Measures of Association (Covariance, Correlation Coefficient) along with examples, formulas, and steps.
Professor: Wayne Roy Gayle...
Resource Economics 212
October 6, 2016
Textbook Notes TBL 5

Chebyshev’s Theorem
- For any population with mean µ and standard deviation σ (or any sample with mean x̄ and standard deviation s), the percentage of observations that lie within k standard deviations of the mean, for k > 1, must be:
  o At least 100[1 - 1/k^2]%
- EG: Assume k = 2 standard deviations; then 100[1 - 1/2^2] = 75%. So at least 75% of the observations lie within 2 standard deviations of the mean.

Normal Distribution
- A bell-shaped curve that is symmetric
  o Not all bell curves are normal
- Completely characterized by its mean and standard deviation
- Importance of the Normal Distribution
  o One of the most important distributions in the natural and social sciences, because:
    - Many variables closely follow the normal distribution. Examples:
      o Height of people
      o Errors in measurement due to imperfect instruments/observers
      o Test grades
    - The average of many observations drawn independently from the same distribution is approximately normally distributed when the sample size is large enough. This is a crude statement of the Central Limit Theorem.

The Empirical Rule
- If the data are normally distributed, the Empirical Rule states that we expect the interval [µ - kσ, µ + kσ] to contain a known percentage of the data:
  o k = 1: about 68.27% will lie within µ ± 1σ
  o k = 2: about 95.45% will lie within µ ± 2σ
  o k = 3: about 99.73% will lie within µ ± 3σ
- In rounded form, for normally distributed data we expect:
  o ±1 standard deviation from the mean to contain about 68% of the data
  o ±2 standard deviations from the mean to contain about 95% of the data
  o ±3 standard deviations from the mean to contain about 99.7% of the data

Chebyshev’s Theorem vs. Empirical Rule
o Chebyshev’s Theorem gives lower-bound percentages for any distribution. EG: 75% or more of the data falls within 2 standard deviations of the mean.
o If the data are normal, the Empirical Rule gives the approximate actual percentages rather than just a lower bound. EG: about 95% of the data falls within 2 standard deviations of the mean.

Z-Scores – Standardizing Data
- Definition
  o The number of standard deviations a value is from the mean.
  o Z = deviation from the mean (x - mean) / sample standard deviation (s)
  o Z-scores tell us the position of any observation relative to the mean.
  o Z-scores are unit-free.
- Inverse Z-Score
  o There is a one-to-one relationship between the z-score and the data value.
  o Mathematically: $z = (x - \bar{x})/s \iff x = z \times s + \bar{x}$
- Important Z-Scores
  o Z = -3: Three standard deviations below the mean
  o Z = -2: Two standard deviations below the mean
  o Z = -1: One standard deviation below the mean
  o Z = 0: Exactly at the mean
  o Z = +1: One standard deviation above the mean
  o Z = +2: Two standard deviations above the mean
  o Z = +3: Three standard deviations above the mean

Visual Display – Bivariate Data
- A bivariate data set is a data set consisting of two variables.
- It can be used to analyze the association between two variables visually or numerically.
- Examples:
  o Multiple Bar Chart
  o Multiple Line Graph
  o Multiple Scatter Chart

Measures of Association – Covariance
- Definition
  o Measure of linear association between two variables.
  o Measures how two variables change together.
  o Positive covariance: implies the variables are directly associated, or move together.
  o Negative covariance: implies the two variables are inversely associated, or move in opposite directions.
- Calculating the Covariance
  o Covariance of a Population = Summation of (Xi - X Mean)(Yi - Y Mean) / N
  o Covariance of a Sample = Summation of (Xi - X Mean)(Yi - Y Mean) / (n - 1)

Measures of Association – Correlation Coefficient
- Definition
  o Measure of the degree of linear association between two variables.
  o Gives more information than the covariance: how closely related X and Y are.
    - A unit-free number with range [-1, 1]
    - A value of -1 means perfect negative linear association.
    - A value of +1 means perfect positive linear association.
- Formula
  o Population Correlation Coefficient = (Covariance of X and Y) / (Standard Deviation of X × Standard Deviation of Y): $\rho_{X,Y} = \sigma_{X,Y} / (\sigma_X \sigma_Y)$
  o Sample Correlation Coefficient = (Sample Covariance of X and Y) / (Sample Standard Deviation of X × Sample Standard Deviation of Y): $r_{X,Y} = s_{X,Y} / (s_X s_Y)$...
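
The formulas in these notes can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the textbook's method: the function names and the `hours`/`scores` data are made up for the example, and the normal percentages are computed from the error function `math.erf` rather than read from a z-table.

```python
import math


def chebyshev_lower_bound(k):
    """Minimum percentage within k standard deviations, any distribution, k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 100 * (1 - 1 / k**2)


def normal_percent_within(k):
    """Percentage of a normal distribution within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))


def mean(values):
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Summation of (xi - x_mean)(yi - y_mean), divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def sample_std(values):
    """Sample standard deviation: square root of the covariance of a variable with itself."""
    return math.sqrt(sample_covariance(values, values))


def z_scores(values):
    """z = (x - mean) / s for every observation."""
    x_bar, s = mean(values), sample_std(values)
    return [(x - x_bar) / s for x in values]


def sample_correlation(xs, ys):
    """r = s_xy / (s_x * s_y)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical data for illustration only (not from the notes).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

print(chebyshev_lower_bound(2))                      # 75.0
print(round(normal_percent_within(2), 2))            # 95.45
print(sample_covariance(hours, scores))              # 15.0
print(round(sample_correlation(hours, scores), 3))   # 0.981
print([round(z, 2) for z in z_scores(hours)])
```

For k = 2 the two rules can be compared directly: Chebyshev guarantees at least 75% for any distribution, while a normal distribution actually places about 95.45% of its values in the same interval. The covariance of 15.0 depends on the units of the two variables, whereas the correlation of about 0.981 is unit-free and close to +1, indicating a strong positive linear association in this made-up data.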