Title | Stats Exam 2 Study Guide |
---|---|
Course | Introduction to Statistics and Data Analysis |
Institution | University of Michigan |
Pages | 12 |
STATS 250: Exam 2
CHAPTER 5: Population Proportion
Sample Proportion Distribution
● Population proportion: $p$
● Sample proportion: $\hat{p}$
  ○ Will vary from sample to sample
  ○ Found by computing $\hat{p} = X/n$ ($X$ is the count, or number of successes)
  ○ If we are given a confidence interval, $\hat{p}$ is just its midpoint
● Sampling distribution of $\hat{p}$
  ○ $N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$
  ○ When sample size increases, the standard deviation gets smaller
● Standard deviation of $\hat{p}$ = s.d.($\hat{p}$) = $\sqrt{\frac{p(1-p)}{n}}$
  ○ Roughly how far a sample proportion would be from the true population proportion
  ○ BASED ON THE POPULATION (uses $p$)
● Standard error of $\hat{p}$ = s.e.($\hat{p}$) = $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
  ○ Estimate of the standard deviation of $\hat{p}$
  ○ BASED ON THE SAMPLE (uses $\hat{p}$)!
  ○ Use this to create a range of values we are confident will contain the true proportion ($\hat{p}$ +/- a few standard errors)
  ○ “We would estimate the average distance between the possible sample proportion values (from repeated samples) and the population proportion to be about s.e.($\hat{p}$)”
● Margin of error: the standard error multiplied by the confidence-level multiplier $z^*$
  ○ Ex: 95% confident means a multiplier of 1.96 → multiply the standard error by 1.96 → MOE
  ○ Larger sample size → smaller margin of error
  ○ MOE = $z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
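The standard error and margin of error described above can be sketched in a few lines of Python. The counts (`x = 120`, `n = 400`) and the function names are made up for illustration; they are not from the course.

```python
import math

def standard_error(p_hat, n):
    """s.e.(p-hat) = sqrt(p-hat * (1 - p-hat) / n), computed from the sample."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def margin_of_error(p_hat, n, z_star=1.96):
    """Margin of error = multiplier z* times the standard error."""
    return z_star * standard_error(p_hat, n)

# Hypothetical sample: 120 successes out of 400
x, n = 120, 400
p_hat = x / n
se = standard_error(p_hat, n)
moe = margin_of_error(p_hat, n)  # 95% confidence -> z* = 1.96
print(f"p-hat = {p_hat:.3f}, s.e. = {se:.4f}, MOE = {moe:.4f}")
```

Rerunning `margin_of_error` with a larger `n` and the same `p_hat` should give a smaller value, which matches the "larger sample size → smaller margin of error" bullet.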
Estimating Proportions with Confidence
● Confidence level: how confident we are in the procedure; the percentage of the time we expect the procedure to produce an interval that contains the population parameter
  ○ MORE CONFIDENT → WIDER INTERVAL
  ○ “Before you look”, not after!
  ○ “If we repeated this procedure over and over, we would expect 95% of the resulting intervals to contain the population proportion of all ____.”
    ■ Don’t specify an interval
    ■ “Represented in the data” means nothing
● Confidence interval = $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, used to estimate the value of a population parameter
  ○ Sample estimate +/- a few standard errors
    ■ “Few” depends on how confident we want to be
  ○ “Based on this sample, with 95% confidence, we would estimate that somewhere between ____ and ____ of all American teenagers think they....”
  ○ P( ____ ≤ $p$ ≤ ____) = 0 or 1! It is NOT the confidence level
  ○ THE INTERVAL AND $p$ ARE FIXED!!
  ○ Assumptions/conditions:
    ■ Sample is randomly selected
    ■ Sample size is large enough to pass the ≥ 10 tests: $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$
  ○ Values outside of the interval can be rejected as reasonable values for $p$!
● Conservative confidence interval: gives the largest confidence interval, and will be wider than needed
  ○ Margin of error: $\frac{z^*}{2\sqrt{n}}$
  ○ Interval: $\hat{p} \pm \frac{z^*}{2\sqrt{n}}$
● Desired sample size: $n = \left(\frac{z^*}{2m}\right)^2$, where $m$ = desired margin of error
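As a rough sketch of the interval formulas in this section, the snippet below computes the usual interval, the conservative interval, and the desired sample size. It assumes the standard forms (conservative margin of error $z^*/(2\sqrt{n})$, sample size $(z^*/(2m))^2$ rounded up); the function names and example numbers are illustrative only.

```python
import math
from statistics import NormalDist

def z_multiplier(confidence):
    """z* such that the middle `confidence` area of N(0,1) lies within +/- z*."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_ci(p_hat, n, confidence=0.95):
    """p-hat +/- z* * sqrt(p-hat(1 - p-hat)/n)."""
    moe = z_multiplier(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def conservative_ci(p_hat, n, confidence=0.95):
    """p-hat +/- z* / (2 sqrt(n)); never narrower than the usual interval."""
    moe = z_multiplier(confidence) / (2 * math.sqrt(n))
    return p_hat - moe, p_hat + moe

def required_sample_size(m, confidence=0.95):
    """Smallest whole n with conservative margin of error at most m."""
    return math.ceil((z_multiplier(confidence) / (2 * m)) ** 2)

# Hypothetical example: p-hat = 0.30 from n = 400
print(proportion_ci(0.30, 400))
print(conservative_ci(0.30, 400))
print(required_sample_size(0.03))  # desired margin of error of 3 points
```

Rounding up in `required_sample_size` is a design choice: rounding down would give a margin of error slightly larger than the one requested.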
Testing about a Population Proportion
● Null hypothesis, $H_0: p = \#$: the status quo, nothing changes
● Alternative hypothesis, $H_a: p > \#$ (or $p < \#$, or $p \ne \#$): states that there is a relationship or difference, that something changed or is happening
● Both are about the population parameter, not the sample
● Data are summarized using a test statistic:
  $\text{test statistic} = \frac{\text{sample statistic} - \text{null value}}{\text{(null) standard error}}$
  ○ Standardized statistic that measures the distance between the sample statistic and the null value in standard error units
● P-value: the probability of getting a test statistic as extreme or more extreme than the observed test statistic value, assuming $H_0$ is true
  ○ Indicates degree of significance
  ○ Must be between 0 and 1
  ○ Smaller → stronger evidence against $H_0$
● Significance level $\alpha$
  ○ P-value low, $H_0$ will have to go!!!
  ○ P ≤ $\alpha$ → reject $H_0$ because there is sufficient support for $H_a$
  ○ P > $\alpha$ → fail to reject $H_0$ because there isn’t sufficient support for $H_a$
● Under $H_0$ we use $N\left(p_0, \sqrt{\frac{p_0(1-p_0)}{n}}\right)$ for $\hat{p}$
● Observed test statistic: use $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
  ○ $p_0$ is the null (hypothesized) value
  ○ Distribution will be $N(0,1)$
● If it says “majority” or “minority”, the hypotheses are about $p$ with null value .50
● STEPS
  1. Determine appropriate null and alternative hypotheses
    ■ $H_0: p = .10$, $H_a: p > .10$
    ■ Where the parameter, $p$, represents the population proportion of all _____
    ■ Note: direction of extreme is one-sided to the right/left, or two-sided
    ■ Significance level
  2. Check assumptions
    ■ Random sample
    ■ $np_0 \ge 10$ and $n(1-p_0) \ge 10$
  3. Calculate the test statistic and determine the p-value
    ■ Observed test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
    ■ P-value: assume the null is true, so use $N(0,1)$
    ■ Draw the bell curve, shade in the direction of $H_a$, and use the table to find, e.g., $P(Z \ge .82)$
  4. Evaluate and report conclusion
    ■ “There is not sufficient evidence to say the population proportion of all _____ is greater than 10%.”
● IF THE SAMPLE IS TOO SMALL (fails the ≥ 10 checks) → WE MUST GO BACK TO THE BINOMIAL DISTRIBUTION
  ○ $X \sim \text{Bin}(n, p_0)$
  ○ The observed test statistic is now the COUNT, or number of successes (ex: 9/10 showed improvement → $X = 9$)
  ○ P-value is $P(X \le \text{or} \ge \text{observed count of successes})$, in the direction of $H_a$ → ex: $P(X \le 9)$ for a left-sided $H_a$
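The test steps above, plus the binomial fallback for small samples, might look like this in Python. All numbers here (24 successes in 200 with $p_0 = .10$, and 9 successes in 10 with an assumed $p_0 = .50$) are invented for illustration, as are the function names.

```python
import math
from statistics import NormalDist

def z_test_proportion(x, n, p0, alternative="greater"):
    """Large-sample z test for one proportion; returns (z, p_value)."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # null standard error
    std = NormalDist()  # N(0, 1)
    if alternative == "greater":
        p_value = 1 - std.cdf(z)
    elif alternative == "less":
        p_value = std.cdf(z)
    else:  # two-sided
        p_value = 2 * (1 - std.cdf(abs(z)))
    return z, p_value

def binomial_p_value(x, n, p0, alternative="greater"):
    """Exact one-sided p-value from Bin(n, p0); the test statistic is the count x."""
    def pmf(k):
        return math.comb(n, k) * p0**k * (1 - p0) ** (n - k)
    if alternative == "greater":
        return sum(pmf(k) for k in range(x, n + 1))
    if alternative == "less":
        return sum(pmf(k) for k in range(0, x + 1))
    raise ValueError("this sketch only handles one-sided alternatives")

# Hypothetical large sample: n*p0 = 20 and n*(1 - p0) = 180 both pass the check
z, p = z_test_proportion(24, 200, 0.10, "greater")
print(f"z = {z:.2f}, p-value = {p:.4f}")

# Hypothetical small sample: 9 of 10 successes, H0: p = .50, Ha: p > .50
print(binomial_p_value(9, 10, 0.50, "greater"))
```

Compare each printed p-value to the chosen $\alpha$ to decide between "reject" and "fail to reject".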
Two Types of Errors
● Type 1: rejecting $H_0$ when $H_0$ is true
  ○ SIGNIFICANCE LEVEL!!!!!!!
  ○ $\alpha$ = P(Type 1 error)
  ○ $H_0$ IS TRUE!
● Type 2: failing to reject $H_0$ when $H_a$ is true
  ○ $\beta$ = P(Type 2 error)
  ○ $H_a$ IS TRUE!
● Power of the test: what is the probability of correctly rejecting $H_0$?
  ○ Power = 1 - $\beta$
    ■ = P(rejecting $H_0$ when $H_a$ is true) = 1 - P(failing to reject $H_0$ when $H_a$ is true)
    ■ = 1 - P(Type 2 error) = 1 - $\beta$
  ○ Researchers want a test with higher power
  ○ “Probability of advocating the new theory given it’s true”
  ○ Best way to increase power is to increase sample size!!!
● $\alpha$ and $\beta$ have trade-offs
● Larger sample size → higher power
● Larger significance level → higher power
● True parameter value that falls further from the null value in the direction of $H_a$ → higher power
● Probability that a Type 1 or Type 2 mistake was made in a given test: EITHER ONE OR ZERO!
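To see the three "higher power" bullets numerically, here is a small sketch of power for a right-sided one-proportion z test under the normal approximation. The calculation and the function name `power_right_sided` are an illustration under those assumptions, not a formula taken from the guide.

```python
import math
from statistics import NormalDist

def power_right_sided(p0, p_true, n, alpha=0.05):
    """Approximate P(reject H0: p = p0 in favor of Ha: p > p0) when p = p_true."""
    std = NormalDist()
    # Reject when p-hat exceeds this cutoff (built from the null standard error)
    cutoff = p0 + std.inv_cdf(1 - alpha) * math.sqrt(p0 * (1 - p0) / n)
    # Under the true p, p-hat is roughly N(p_true, sqrt(p_true(1 - p_true)/n))
    sd_true = math.sqrt(p_true * (1 - p_true) / n)
    return 1 - std.cdf((cutoff - p_true) / sd_true)

base = power_right_sided(0.10, 0.15, 100)
print(f"baseline power:         {base:.3f}")
print(f"larger sample (n=400):  {power_right_sided(0.10, 0.15, 400):.3f}")
print(f"larger alpha (0.10):    {power_right_sided(0.10, 0.15, 100, alpha=0.10):.3f}")
print(f"truth further from H0:  {power_right_sided(0.10, 0.20, 100):.3f}")
```

When `p_true` equals `p0`, the function returns `alpha`, which matches $\alpha$ = P(Type 1 error).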
CHAPTER 6: Difference in Population Proportion (CATEGORICAL)
Distribution: Difference in Sample Proportions
● Independent samples: measurements in one sample are not related to the measurements in the other; generated with
  ○ Random samples taken separately from two populations, same response variable
  ○ One sample split by categories (old vs. young)
  ○ Participants randomly assigned to one of two treatments
● Mean of difference = difference in means
● Variance of difference = sum of variances
  ○ Variance of each sample proportion: $\frac{p(1-p)}{n}$
● Sampling distribution of the difference in two independent sample proportions
  ○ $N\left(p_1 - p_2, \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)$
  ○ $n_1p_1$, $n_1(1-p_1)$, $n_2p_2$, $n_2(1-p_2)$ must all be ≥ 10
● Standard error of the difference in sample proportions
  ○ s.e.($\hat{p}_1 - \hat{p}_2$) = $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Confidence Interval: Difference in Population Proportions
● Confidence interval: $(\hat{p}_1 - \hat{p}_2) \pm z^* \cdot \text{s.e.}(\hat{p}_1 - \hat{p}_2)$
● Assumptions:
  ○ Independent, random samples
  ○ All four quantities (with $p_1$ and $p_2$) are ≥ 10
    ■ STATE (in terms of $p$): $n_1p_1 \ge 10$, $n_1(1-p_1) \ge 10$, $n_2p_2 \ge 10$, $n_2(1-p_2) \ge 10$
    ■ CHECK (with the SPECIFIC $\hat{p}$ values): $n_1\hat{p}_1 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, $n_2(1-\hat{p}_2) \ge 10$
● With x% confidence, there is a significant difference if 0 is not in the interval
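A short sketch of the two-proportion interval and its ≥ 10 checks, assuming the unpooled standard error described above. The counts (60 of 200 vs. 40 of 200) and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def conditions_ok(x1, n1, x2, n2):
    """CHECK step: n*p-hat and n*(1 - p-hat) are just the success/failure counts."""
    return all(count >= 10 for count in (x1, n1 - x1, x2, n2 - x2))

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """(p1-hat - p2-hat) +/- z* * s.e.(p1-hat - p2-hat)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical data: 60/200 successes in group 1, 40/200 in group 2
assert conditions_ok(60, 200, 40, 200)
low, high = two_proportion_ci(60, 200, 40, 200)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
print("0 in interval?", low <= 0 <= high)
```

If 0 falls outside the printed interval, the data suggest a significant difference at the matching confidence level.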
Hypothesis Testing: Difference in Population Proportions
● Null hypothesis, $H_0: p_1 = p_2$
● Alternate hypothesis, $H_a$: $p_1$ (>, <, ≠) …
… 25) → check w/ histogram of sample
● Pooled approach
  ○ Independent, random samples
  ○ From normal populations
  ○ Equal population variances (similar sample variances)
    ■ Pool the sample variances for an overall estimate:
    ■ Standard deviation: $s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$
    ■ Standard error: pooled s.e.($\bar{x}_1 - \bar{x}_2$) = $s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
CHECKING EQUAL VARIANCE
● Compare standard deviations
  ○ Similar → assuming a common population variance is reasonable → pooled
  ○ Larger > 2x smaller → unpooled
● Use side-by-side boxplots → if IQR lengths are similar → pooled
● Levene’s Test ($H_0: \sigma_1^2 = \sigma_2^2$)
  ○ Big p-value → pooled (fail to reject)
  ○ Small p-value → unpooled (reject)
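The pooled approach can be sketched as follows, assuming the standard pooled-variance formula (degrees of freedom $n_1 + n_2 - 2$). The helper names and the sample summaries are hypothetical, and the 2x comparison is only the rule of thumb from the list above, not a formal test.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """s_p = sqrt(((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2))."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def pooled_se(s1, n1, s2, n2):
    """Pooled s.e.(x1-bar - x2-bar) = s_p * sqrt(1/n1 + 1/n2)."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)

def sds_look_similar(s1, s2):
    """Rule-of-thumb check: the larger s.d. is at most twice the smaller."""
    return max(s1, s2) <= 2 * min(s1, s2)

# Hypothetical summaries: s1 = 4 (n1 = 10), s2 = 5 (n2 = 12)
s1, n1, s2, n2 = 4.0, 10, 5.0, 12
if sds_look_similar(s1, s2):
    print(f"s_p = {pooled_sd(s1, n1, s2, n2):.4f}")
    print(f"pooled s.e. = {pooled_se(s1, n1, s2, n2):.4f}")
else:
    print("Standard deviations differ too much; use the unpooled approach.")
```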
Hypothesis Testing: Difference in Two Population Means
● H0: µ1= µ2 ● Ha: µ1 (>≠...