Stats Exam 2 Study Guide PDF

Title	Stats Exam 2 Study Guide
Course	Introduction to Statistics and Data Analysis
Institution	University of Michigan
Pages	12
File Size	282 KB
File Type	PDF


Summary

Stats 250 Exam 2 Study Guide...

Description

STATS 250: Exam 2
CHAPTER 5: Population Proportion
Sample Proportion Distribution
● Population Proportion: $p$
● Sample Proportion: $\hat{p}$
  ○ Will vary from sample to sample
  ○ Found by doing $\hat{p} = X/n$ ($X$ is the count, or number of successes)
  ○ If we have a confidence interval, the sample proportion $\hat{p}$ is just its midpoint
● Sampling distribution
  ○ $\hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$
  ○ When sample size increases, standard deviation gets smaller
● Standard Deviation of $\hat{p}$: $\text{s.d.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$
  ○ Roughly how far a sample proportion would typically be from the true population proportion
  ○ ON THE POPULATION (uses the true $p$)
● Standard Error of $\hat{p}$: $\text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
  ○ Estimate of the standard deviation of $\hat{p}$
  ○ ON THE SAMPLE! (uses $\hat{p}$)
  ○ Use this to create a range of values we are confident will contain the true proportion ($\hat{p}$ +/- a few standard errors)
  ○ “We would estimate the average distance between the possible sample proportion values (from repeated samples) and the population proportion to be about s.e.($\hat{p}$).”
● Margin of Error: the standard error multiplied by the confidence level multiplier
  ○ Ex: 95% confidence means a multiplier of 1.96 → multiply the standard error by 1.96 → MOE
  ○ Larger sample size → smaller margin of error
  ○ $\text{MOE} = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
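A minimal Python sketch of the standard error and margin of error formulas above. The helper names `standard_error` and `margin_of_error` and the sample counts (520 successes out of 1000) are hypothetical, chosen only for illustration; 1.96 is the 95% multiplier from the notes.

```python
import math


def standard_error(p_hat: float, n: int) -> float:
    """Estimated standard deviation of p-hat: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def margin_of_error(p_hat: float, n: int, z_star: float = 1.96) -> float:
    """Margin of error: confidence-level multiplier z* times the standard error."""
    return z_star * standard_error(p_hat, n)


# Hypothetical sample: 520 successes out of n = 1000
x, n = 520, 1000
p_hat = x / n  # sample proportion = count of successes / sample size

print(f"p_hat = {p_hat:.3f}")
print(f"s.e.  = {standard_error(p_hat, n):.4f}")
print(f"MOE (95%) = {margin_of_error(p_hat, n):.4f}")
# Quadrupling the sample size cuts the margin of error in half
print(f"MOE (95%, n = 4000) = {margin_of_error(p_hat, 4 * n):.4f}")
```

The last line illustrates "larger sample size → smaller margin of error": because $n$ sits under a square root, four times the sample size gives half the margin.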

Estimating Proportions with Confidence
● Confidence level: how confident we are in the procedure, i.e., the percentage of the time we expect the procedure to produce an interval that contains the population parameter
  ○ MORE CONFIDENT → WIDER INTERVAL
  ○ “Before you look”, not after!
  ○ “If we repeated this procedure over and over, we should expect 95% of the resulting intervals to contain the population proportion of all ____.”
    ■ Don’t specify an interval
    ■ “Represented in the data” means nothing
● Confidence Interval = $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, used to estimate the value of a population parameter
  ○ Sample estimate +/- a few standard errors
    ■ “Few” depends on how confident we want to be
  ○ “Based on this sample, with 95% confidence, we would estimate that somewhere between ____ and ____ of all American teenagers think they....”
  ○ P(____ ≤ p ≤ ____) = 0 or 1! It is NOT 0.95 (the confidence level)
  ○ THE INTERVAL AND p ARE FIXED!! (once computed, the interval either contains p or it doesn’t)
  ○ Assumptions/conditions:
    ■ Sample is randomly selected
    ■ Sample size is large enough to pass the ≥ 10 tests: $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$
  ○ Values outside of the interval can be rejected as reasonable values!
● Conservative Confidence Interval: uses $p^* = 0.5$ (the value that makes $p(1-p)$ largest), so it gives the largest interval and will be wider than needed
  ○ Margin of error: $\text{MOE} = \frac{z^*}{2\sqrt{n}}$
  ○ Interval: $\hat{p} \pm \frac{z^*}{2\sqrt{n}}$
● Desired sample size: $n = \left(\frac{z^*}{2m}\right)^2$, where $m$ = desired margin of error (round up to the next whole number)
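The interval, conservative interval, and sample-size formulas can be sketched in Python as follows. This is a minimal illustration, not the course's own code; the helper names (`large_sample_ok`, `confidence_interval`, `conservative_interval`, `required_sample_size`) and the example values are hypothetical.

```python
import math


def large_sample_ok(p_hat, n):
    """The '>= 10 tests': n * p_hat >= 10 and n * (1 - p_hat) >= 10."""
    return n * p_hat >= 10 and n * (1 - p_hat) >= 10


def confidence_interval(p_hat, n, z_star=1.96):
    """p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe


def conservative_interval(p_hat, n, z_star=1.96):
    """Conservative interval: plugs in p* = 0.5, so MOE = z* / (2 * sqrt(n))."""
    moe = z_star / (2 * math.sqrt(n))
    return p_hat - moe, p_hat + moe


def required_sample_size(m, z_star=1.96):
    """Smallest n whose conservative MOE is at most m: (z* / (2m))^2, rounded up."""
    return math.ceil((z_star / (2 * m)) ** 2)


# Hypothetical sample: p_hat = 0.52 from n = 1000
p_hat, n = 0.52, 1000
if large_sample_ok(p_hat, n):
    print("95% CI:          ", confidence_interval(p_hat, n))
    print("Conservative CI: ", conservative_interval(p_hat, n))
# Sample size needed for a 3-percentage-point margin at 95% confidence
print("n needed for m = 0.03:", required_sample_size(0.03))
```

Rounding up with `math.ceil` guarantees the margin is at most $m$; rounding down would give a slightly larger margin than desired.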

Testing about a Population Proportion
● Null Hypothesis or $H_0: p = p_0$ (some number), which is the status quo: nothing changes
● Alternative Hypothesis or $H_a: p > p_0$ (or $p < p_0$, or $p \ne p_0$), which states that there is a relationship or difference, that something changed or is happening
● Both are about the population parameter, not the sample
● Data is summarized using a test statistic:
  ○ $\text{test statistic} = \frac{\text{sample statistic} - \text{null value}}{\text{(null) standard error}}$
  ○ Standardized statistic that measures the distance between the sample statistic and the null value in standard error units
● P-value: the probability, assuming $H_0$ is true, of getting a test statistic as extreme as or more extreme than the observed test statistic value
  ○ Indicates degree of significance
  ○ Must be between 0 and 1
  ○ Smaller → stronger evidence against $H_0$
● Significance level $\alpha$
  ○ P-value low, $H_0$ will have to go!!!
  ○ p-value ≤ α → reject $H_0$ bc there is sufficient support for $H_a$
  ○ p-value > α → fail to reject $H_0$ bc there isn’t sufficient support for $H_a$
● Under $H_0$ we use $\hat{p} \sim N\left(p_0, \sqrt{\frac{p_0(1-p_0)}{n}}\right)$
● Observed test statistic: use $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
  ○ $p_0$ is the null value (the proportion stated in the null hypothesis)
  ○ Distribution will be N(0, 1) if $H_0$ is true
● If it says “majority” or “minority”, we use $p$ and 0.50 (null value $p_0 = 0.50$)
● STEPS
  1. Determine appropriate null and alternative hypotheses
    ■ $H_0: p = 0.10$ vs. $H_a: p > 0.10$
    ■ Where the parameter, $p$, represents the population proportion of all _____
    ■ Note: direction of extreme is one-sided to the right/left, or two-sided
    ■ State the significance level $\alpha$
  2. Check assumptions
    ■ Random sample
    ■ $np_0 \ge 10$ and $n(1-p_0) \ge 10$
  3. Calculate the test statistic and determine the p-value
    ■ Observed test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
    ■ P-value: assume the null is true, so use N(0, 1)
    ■ Draw a bell curve, shade in the direction of $H_a$, and use the table to find $P(Z \ge 0.82)$
  4. Evaluate and report the conclusion
    ■ “There is not sufficient support to say the population proportion of all _____ is greater than 10%.”
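The four steps can be sketched as a large-sample z test in Python. This is an illustrative sketch, not the course's procedure verbatim: the function name `one_proportion_z_test`, the `alternative` strings, and the data (24 of 200) are hypothetical. It uses the standard library's `statistics.NormalDist` in place of a printed z table.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x, n, p0, alternative="greater"):
    """Large-sample z test of H0: p = p0. Returns (z, p_value)."""
    # Step 2: check the large-sample conditions with the null value p0
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("n*p0 or n*(1-p0) < 10: use the binomial distribution instead")
    p_hat = x / n
    null_se = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / null_se
    std_normal = NormalDist()  # N(0, 1): distribution of z when H0 is true
    if alternative == "greater":      # Ha: p > p0, shade the right tail
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: p < p0, shade the left tail
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # Ha: p != p0, shade both tails
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Hypothetical data: 24 of 200 sampled have the trait; H0: p = 0.10 vs Ha: p > 0.10
alpha = 0.05
z, p_value = one_proportion_z_test(24, 200, 0.10, alternative="greater")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

With these made-up counts the p-value is well above 0.05, so the sketch reaches the same kind of "not sufficient support" conclusion as step 4.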

● IF THE ≥ 10 CONDITIONS FAIL ($np_0 < 10$ or $n(1-p_0) < 10$) → WE MUST GO BACK TO THE BINOMIAL DISTRIBUTION
  ○ $X \sim \text{Bin}(n, p_0)$
  ○ The observed test statistic is now the COUNT, or number of successes (ex: 9/10 showed improvement → X = 9)
  ○ P-value would be P(X ≤ observed count) or P(X ≥ observed count), in the direction of $H_a$ → ex: for $H_a: p > p_0$, P(X ≥ 9)
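A small Python sketch of the exact binomial p-value, summing $P(X = k)$ under $\text{Bin}(n, p_0)$ in the direction of $H_a$. The function name `binomial_p_value` and the null value $p_0 = 0.5$ for the 9-of-10 example are assumptions for illustration; only one-sided alternatives are handled here.

```python
from math import comb


def binomial_p_value(x, n, p0, alternative="greater"):
    """Exact p-value for H0: p = p0, using X ~ Bin(n, p0) and the observed count x."""
    def pmf(k):
        # P(X = k) under the null distribution Bin(n, p0)
        return comb(n, k) * p0**k * (1 - p0) ** (n - k)

    if alternative == "greater":  # Ha: p > p0 -> P(X >= x)
        return sum(pmf(k) for k in range(x, n + 1))
    if alternative == "less":     # Ha: p < p0 -> P(X <= x)
        return sum(pmf(k) for k in range(0, x + 1))
    raise ValueError("alternative must be 'greater' or 'less'")


# 9 of 10 showed improvement; hypothetical H0: p = 0.5 vs Ha: p > 0.5
# n * p0 = 5 < 10, so the normal approximation is not appropriate here
print(f"P(X >= 9) = {binomial_p_value(9, 10, 0.5):.4f}")
```

Here $P(X \ge 9) = (10 + 1)/1024 \approx 0.0107$, small enough to reject $H_0$ at $\alpha = 0.05$.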

Two Types of Errors
● Type 1: rejecting $H_0$ when $H_0$ is true
  ○ SIGNIFICANCE LEVEL!!!!!!!
  ○ α = P(Type 1 error)
  ○ $H_0$ IS TRUE!
● Type 2: failing to reject $H_0$ when $H_a$ is true
  ○ β = P(Type 2 error)
  ○ $H_a$ IS TRUE!
● Power of the test: what is the probability of correctly rejecting $H_0$?
  ○ Power = 1 − β
    ■ = P(rejecting $H_0$ when $H_a$ is true) = 1 − P(failing to reject $H_0$ when $H_a$ is true)
    ■ = 1 − P(Type 2 error) = 1 − β
  ○ Researchers want a test with higher power
  ○ “Probability of advocating the new theory given that it’s true”
  ○ Best way to increase power is to increase sample size!!!
● α and β have trade-offs: for a fixed sample size, lowering α raises β
● Larger sample size → higher power
● Larger significance level → higher power
● True parameter value that falls further from the null value in the direction of $H_a$ → higher power
● Probability that a Type 1 or Type 2 mistake was made (once the decision is made): EITHER ONE OR ZERO!
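The three "→ higher power" facts can be checked numerically with a normal-approximation power calculation for the one-sided test $H_0: p = p_0$ vs. $H_a: p > p_0$. This is a sketch under stated assumptions: the function `power_one_sided` is hypothetical, and it assumes $\hat{p}$ is approximately normal both under $H_0$ and under the true value $p_a$.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)


def power_one_sided(p0, pa, n, alpha=0.05):
    """Approximate power of the z test of H0: p = p0 vs Ha: p > p0
    when the true proportion is pa (normal approximation)."""
    # Reject H0 when z >= z_alpha, i.e. when p_hat >= cutoff
    z_alpha = Z.inv_cdf(1 - alpha)
    cutoff = p0 + z_alpha * math.sqrt(p0 * (1 - p0) / n)
    # If the true proportion is pa, p_hat is approximately N(pa, sqrt(pa(1 - pa)/n))
    sd_true = math.sqrt(pa * (1 - pa) / n)
    return 1 - Z.cdf((cutoff - pa) / sd_true)


# Hypothetical setting: H0: p = 0.10, true p = 0.15
for n in (100, 200, 400):
    power = power_one_sided(0.10, 0.15, n)
    print(f"n = {n}: power = {power:.3f}, beta = {1 - power:.3f}")
```

When the true value equals $p_0$, the "power" reduces to α itself, which matches the idea that α is the probability of rejecting a true $H_0$.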

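CHAPTER 6: Difference in Population Proportions (CATEGORICAL)
Distribution: Difference in Sample Proportions
● Independent samples: measurements in one sample are not related to the measurements in the other; generated by
  ○ Random samples taken separately from two populations, same response variable
  ○ One random sample split into two groups by a categorical variable (old vs. young)
  ○ Participants randomly assigned to one of two treatments
● Mean of difference = difference in means
● Variance of difference = sum of variances
  ○ Each sample proportion has variance $\frac{p(1-p)}{n}$
● Sampling distribution of the difference in two independent sample proportions
  ○ $\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2, \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)$
  ○ All four quantities $n_1p_1$, $n_1(1-p_1)$, $n_2p_2$, $n_2(1-p_2)$ must be ≥ 10
● Standard error of the difference in sample proportions
  ○ $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$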
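"Variance of difference = sum of variances" can be checked with a quick simulation. The Python sketch below draws repeated independent samples and compares the spread of $\hat{p}_1 - \hat{p}_2$ with the formula. The helper names (`sd_diff`, `se_diff`, `simulate_diffs`) and the population values ($p_1 = 0.40$, $p_2 = 0.30$, 100 per sample) are hypothetical.

```python
import math
import random
import statistics


def sd_diff(p1, n1, p2, n2):
    """s.d. of p1_hat - p2_hat for independent samples: the variances add."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def se_diff(x1, n1, x2, n2):
    """Standard error: the same formula with the sample proportions plugged in."""
    return sd_diff(x1 / n1, n1, x2 / n2, n2)


def simulate_diffs(p1, n1, p2, n2, reps=4000, seed=1):
    """Draw repeated independent samples and record p1_hat - p2_hat each time."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical populations: p1 = 0.40, p2 = 0.30, samples of 100 each
diffs = simulate_diffs(0.40, 100, 0.30, 100)
print(f"mean of simulated differences: {statistics.mean(diffs):.4f} (p1 - p2 = 0.10)")
print(f"s.d. of simulated differences: {statistics.stdev(diffs):.4f}")
print(f"formula s.d.:                  {sd_diff(0.40, 100, 0.30, 100):.4f}")
```

The simulated standard deviation should land close to the formula value; the standard deviations themselves do not add, only the variances do.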

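Confidence Interval: Difference in Population Proportions
● Confidence Interval: $(\hat{p}_1 - \hat{p}_2) \pm z^* \, \text{s.e.}(\hat{p}_1 - \hat{p}_2)$
● Assumptions:
  ○ Independent, random samples
  ○ All quantities (with $\hat{p}_1$, $\hat{p}_2$) are ≥ 10
● With x% confidence, there is a significant difference if 0 is not in the interval
● STATE (with the population parameters $p_1$, $p_2$): $n_1p_1 \ge 10$, $n_1(1-p_1) \ge 10$, $n_2p_2 \ge 10$, $n_2(1-p_2) \ge 10$
● CHECK (with the specific sample values $\hat{p}_1$, $\hat{p}_2$): $n_1\hat{p}_1 \ge 10$, $n_2\hat{p}_2 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2(1-\hat{p}_2) \ge 10$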

Hypothesis Testing: Difference in Population Proportions
● Null hypothesis, $H_0: p_1 = p_2$
● Alternate Hypothesis, $H_a: p_1$ (>, <, ≠) … 25) → check w/ histogram of sample
● Pooled approach
  ○ Independent, random samples
  ○ From normal populations
  ○ Equal population variances (similar sample variances)
    ■ Pool sample variances for an overall estimate
    ■ Standard deviation: $s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$
    ■ Standard error: pooled $\text{s.e.}(\bar{x}_1 - \bar{x}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
CHECKING EQUAL VARIANCE
● Compare standard deviations
  ○ Similar → assuming a common population variance is reasonable → pooled
  ○ Larger > 2× smaller → unpooled
● Use side-by-side boxplots → if IQR lengths are similar → pooled
● Levene’s Test ($H_0: \sigma_1^2 = \sigma_2^2$)
  ○ Big p-value → pooled (fail to reject)
  ○ Small p-value → unpooled (reject)
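The pooled standard deviation and standard error, plus the "larger > 2× smaller" check, can be sketched in Python. The helper names and the measurements are hypothetical; the unpooled standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ is the standard formula, included here for comparison and not taken from the notes. Levene's test is not implemented; this sketch uses only the s.d. rule of thumb.

```python
import math
import statistics


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation s_p: a weighted average of the two sample variances."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_se(s1, n1, s2, n2):
    """Pooled s.e.(x1_bar - x2_bar) = s_p * sqrt(1/n1 + 1/n2)."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)


def unpooled_se(s1, n1, s2, n2):
    """Unpooled s.e.(x1_bar - x2_bar) = sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def pooling_seems_reasonable(s1, s2):
    """Rule of thumb: pool unless the larger s.d. is more than 2x the smaller."""
    return max(s1, s2) <= 2 * min(s1, s2)


# Hypothetical measurements from two independent samples
group1 = [12.1, 14.3, 13.8, 11.9, 15.2, 13.0, 12.7]
group2 = [10.4, 11.8, 12.9, 10.1, 11.5, 12.2]
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
n1, n2 = len(group1), len(group2)

if pooling_seems_reasonable(s1, s2):
    print(f"pooled:   s_p = {pooled_sd(s1, n1, s2, n2):.3f}, "
          f"s.e. = {pooled_se(s1, n1, s2, n2):.3f}")
else:
    print(f"unpooled: s.e. = {unpooled_se(s1, n1, s2, n2):.3f}")
```

Because $s_p^2$ is a weighted average of $s_1^2$ and $s_2^2$, $s_p$ always falls between the two sample standard deviations; with equal sample sizes the pooled and unpooled standard errors coincide.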

Hypothesis Testing: Difference in Two Population Means

● $H_0: \mu_1 = \mu_2$
● $H_a: \mu_1$ (>, <, ≠)...