HW05-Sol - Chapter 5 Homework Solutions. Professor Su. PDF

Title HW05-Sol - Chapter 5 Homework Solutions. Professor Su.
Author Cynthia Moreno
Course Prob & Applied Statistics
Institution University of Texas at El Paso
Pages 6
File Size 118.4 KB
File Type PDF


STAT 3325

Probability & Applied Statistics

Solutions for Homework 05: Inference for Numerical Data

The following problems are taken from OpenIntro Statistics, the Third Edition: 5.1, 5.2, 5.4, 5.8, 5.12, 5.14, 5.15, 5.16, 5.18, 5.20, 5.22, 5.26, 5.28, 5.30, 5.32, 5.35

1. Problem 5.1:

(a) n = 6, CL = 90%, df = 6 − 1 = 5, $t^{(5)}_{0.95} = 2.02$;

(b) n = 21, CL = 98%, df = 21 − 1 = 20, $t^{(20)}_{0.99} = 2.53$;

(c) n = 29, CL = 95%, df = 29 − 1 = 28, $t^{(28)}_{0.975} = 2.05$;

(d) n = 12, CL = 99%, df = 12 − 1 = 11, $t^{(11)}_{0.995} = 3.11$.

2. Problem 5.2: The dotted line is the t-distribution with 1 degree of freedom, the dashed line is the t-distribution with 5 degrees of freedom, and the solid line is the standard normal distribution. As the degrees of freedom increase, the t-distribution approaches the normal distribution. Another valid justification is that the lower the degrees of freedom, the thicker the tails.

3. Problem 5.4:

(a) $H_a: \mu > 0.5$. We have n = 26, observed t = 2.485, df = 26 − 1 = 25. Since 0.01 < p-value < 0.025, we reject $H_0$ at significance level α = 0.05 (we would fail to reject at α = 0.01).

(b) $H_a: \mu < 3$. We have n = 18, observed t = 0.5. With df = 18 − 1 = 17, it can be found that p-value > 0.1. Thus we fail to reject $H_0$ at significance level α = 0.05.

4. Problem 5.8:

(a) This may not be reasonable since the sample may not be random. The drivers who volunteer to submit their gas mileage on fueleconomy.gov might be those who are getting much lower or much higher gas mileage than the EPA estimate.

(b) $H_0: \mu = 50$ vs. $H_a: \mu \neq 50$. Before calculating the test statistic, we should evaluate the conditions for the test:

• Independence: Our sample is a convenience sample, which is a red flag regarding the independence of observations (even when limiting our population to those who participate on the fueleconomy.gov website). When reporting these results to others, we should volunteer this information and note that our results rely on the assumption that the observations are independent.

• Normality: The distribution is approximately symmetric and there is no evidence that it is not nearly normal, though checking this condition is difficult for such a small sample.

The observed test statistic can be calculated as follows:

$$t_{obs} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{53.3 - 50}{5.2/\sqrt{14}} = \frac{3.3}{1.39} = 2.37.$$

With df = n − 1 = 14 − 1 = 13, the p-value can be found as

$$\text{p-value} = \Pr\left(|t^{(13)}| > 2.37\right) \;\Rightarrow\; 0.02 < \text{p-value} < 0.05.$$

Since p-value < 0.05, reject $H_0$. The data provide strong evidence against the EPA claim of 50 MPG.

(c) For confidence level 95%, the 97.5th percentile of the t-distribution with 13 degrees of freedom is $t^{(13)}_{0.975} = 2.16$. Thus the 95% confidence interval (CI) is

$$\bar{x} \pm t^{(13)}_{0.975} \frac{s}{\sqrt{n}} = 53.3 \pm 2.16 \times \frac{5.2}{\sqrt{14}} = 53.3 \pm 3 = (50.3, 56.3).$$

We are 95% confident that a 2012 Prius gets on average 50.3 to 56.3 MPG.

5. Problem 5.12:

(a) $H_0: \mu = 35$ vs. $H_a: \mu > 35$.

(b) Two conditions must be checked:

• Independence: 52 police officers are less than 10% of all police officers, and if we can assume that these 52 officers represent a random sample, we can assume that the blood lead concentration of one officer in the sample is independent of another.

• Normality: We don’t have a plot of the distribution that we can use to check this condition. However, given that the sample average is more than three times as large as the standard deviation, it is conceivable that the distribution is approximately normal (the 68-95-99.7% rule could apply here). There is also no reason to suspect extreme skew in the distribution of blood lead concentration.

(c) The observed test statistic is

$$t_{obs} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{124.32 - 35}{37.74/\sqrt{52}} \approx 17.07.$$

With df = 52 − 1 = 51, it can be found that p-value = $\Pr(t^{(51)} > 17.07) < 0.005$. The hypothesis test yields a very small p-value, so we reject $H_0$. This indicates that the data provide very convincing evidence that the police officers have been exposed to a higher concentration of lead than individuals living in a suburban area.

(d) Given that the one-sided p-value is less than 0.005, a two-sided hypothesis test at α = 0.01 would also reject $H_0$ with these data. Since such a test is equivalent to a 99% confidence interval, we would not expect this interval to include the null value of 35 µg/l.
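The one-sample calculations in Problems 5.8 and 5.12 follow the same pattern, so they can be cross-checked with a few lines of Python. This is only an illustrative sketch: the helper names `one_sample_t` and `t_interval` are hypothetical, and the critical value 2.16 is taken from the solution to Problem 5.8(c) rather than computed.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    se = s / math.sqrt(n)          # standard error of the sample mean
    return (xbar - mu0) / se, n - 1

def t_interval(xbar, s, n, t_star):
    """Interval xbar +/- t_star * s/sqrt(n); t_star is read from a t-table."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Problem 5.8: Prius mileage, null value 50 MPG
t_prius, df_prius = one_sample_t(53.3, 50, 5.2, 14)
ci_prius = t_interval(53.3, 5.2, 14, 2.16)   # 2.16 assumed from the t-table

# Problem 5.12: blood lead concentration, null value 35
t_lead, df_lead = one_sample_t(124.32, 35, 37.74, 52)

print(f"5.8:  t = {t_prius:.2f}, df = {df_prius}, "
      f"95% CI = ({ci_prius[0]:.1f}, {ci_prius[1]:.1f})")
print(f"5.12: t = {t_lead:.2f}, df = {df_lead}")
```

The p-values themselves still have to be looked up in a t-table (or computed with statistical software) from the printed statistic and degrees of freedom.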

6. Problem 5.14:

(a) Using the inequality

$$25 \ge 1.65 \times \frac{250}{\sqrt{n}} \;\Rightarrow\; n \ge \left(\frac{1.65 \times 250}{25}\right)^2 = 272.25.$$

Raina should collect a sample of at least 273 students.

(b) If Luke had the same sample size as Raina but used a higher confidence level, he would end up with a wider interval. To keep the width of his confidence interval the same as Raina’s, Luke will need a larger sample size.

(c) Based on the inequality

$$25 \ge 2.58 \times \frac{250}{\sqrt{n}} \;\Rightarrow\; n \ge \left(\frac{2.58 \times 250}{25}\right)^2 = 665.64.$$

Luke should collect a sample of at least 666 students.

7. Problem 5.15:

(a) Two-sided, we are evaluating a difference, not in a particular direction.

(b) Paired, data are recorded in the same cities at two different time points. The temperature in a city at one point is not independent of the temperature in the same city at another time point.

(c) t-test, sample is small and population standard deviation is unknown.

8. Problem 5.16:

(a) TRUE

(b) TRUE

(c) TRUE

(d) FALSE. We find the difference of each pair of observations, and then we do inference on these differences.

9. Problem 5.18:

(a) Paired, on the same day the stock prices may be dependent on external factors that affect the price of both stocks.

(b) Paired, the prices are for the same items.

(c) Not paired, these are two independent random samples, individual students are not matched.

10. Problem 5.20:

(a) The median writing score is slightly higher, but it’s difficult to tell if the average scores on the two tests are different or not.


(b) No, the score of one student on the reading test is not independent of their score on the writing test.

(c) Let d denote the difference between read and write. Then the hypotheses can be stated as $H_0: \mu_d = 0$ vs. $H_a: \mu_d \neq 0$.

(d) The conditions for the sampling distribution of $\bar{x}_d$ to be nearly normal and the estimate of the standard error to be sufficiently accurate are as follows:

• Independence: Students are randomly sampled and 200 < 10% of all students who take this survey, therefore we can assume that the reading and writing scores of one student are independent of another.

• Normality: The distribution of the differences appears fairly symmetric, so we can assume that the sampling distribution of average differences will be approximately normal.

(e) The observed test statistic is

$$t_{obs} = \frac{\bar{x}_d - \mu_0}{s_d/\sqrt{n}} = \frac{-0.545 - 0}{8.887/\sqrt{200}} = -0.87.$$

With df = 200 − 1 = 199, p-value = $\Pr(|t^{(199)}| > 0.87) > 0.20$. Since the p-value > 0.05, fail to reject $H_0$. The data do not provide convincing evidence of a difference between the average reading and writing scores.

(f) We may have made a Type 2 error, i.e. we may have incorrectly failed to reject $H_0$. In this context a Type 2 error means deciding that the data do not provide convincing evidence of a difference between the average reading and writing scores of students when in reality there is a difference.

(g) Since we failed to reject $H_0$, which claimed the average difference is equal to 0, we would expect a confidence interval to include this value.

11. Problem 5.22:

(a) A 95% confidence interval can be calculated as follows:

$$\bar{x}_d \pm t^{(df)}_{0.975} \frac{s_d}{\sqrt{n}} = -0.545 \pm 1.98 \times \frac{8.887}{\sqrt{200}} = (-1.79, 0.70).$$

(b) We are 95% confident that on the reading test students score, on average, 1.79 points lower to 0.70 points higher than they do on the writing test.

(c) No, since 0 is included in the interval.

12. Problem 5.26: These data come from a population (all Oscar winners), not a random sample, so there is no need for hypothesis testing in this situation. We can see that the average age for the population of best actress winners is lower than the average age for the best actor winners.

13. Problem 5.28: The hypotheses are $H_0: \mu_{0.99} = \mu_1$ vs. $H_a: \mu_{0.99} \neq \mu_1$. The conditions that need to be satisfied for the sampling distribution of $(\bar{x}_{0.99} - \bar{x}_1)$ to be nearly normal and the estimate of the standard error to be sufficiently accurate are:


(a) Independence: Both samples are random and represent less than 10% of their respective populations. Also, we have no reason to think that the 0.99 carat diamonds are not independent of the 1 carat diamonds since they are both sampled randomly.

(b) Normality: The distributions are not extremely skewed, hence we can assume that the distribution of the average differences will be nearly normal as well.

The test statistic is

$$t_{obs} = \frac{(\bar{x}_{0.99} - \bar{x}_1) - (\mu_{0.99} - \mu_1)}{\sqrt{\dfrac{s_{0.99}^2}{n_{0.99}} + \dfrac{s_1^2}{n_1}}} = \frac{(44.51 - 56.81) - 0}{\sqrt{\dfrac{13.32^2}{23} + \dfrac{16.13^2}{23}}} = \frac{-12.3}{4.36} = -2.82.$$

With df = 23 − 1 = 22, it can be found that p-value = $\Pr(|t^{(22)}| > 2.82) \approx 0.01$. Since p-value < 0.05, reject $H_0$. The data provide convincing evidence that the average standardized prices of 0.99 carat and 1 carat diamonds are different.

14. Problem 5.30:

(a) The 95% confidence interval can be calculated as follows:

$$(\bar{x}_{0.99} - \bar{x}_1) \pm t^{(23-1)}_{0.975} \sqrt{\frac{s_{0.99}^2}{n_{0.99}} + \frac{s_1^2}{n_1}} = (44.51 - 56.81) \pm 2.07 \times \sqrt{\frac{13.32^2}{23} + \frac{16.13^2}{23}} = -12.3 \pm 2.07 \times 4.36 = (-21.33, -3.27).$$

We are 95% confident that the average standardized price of a 0.99 carat diamond is $3.27 to $21.33 lower than the average standardized price of a 1 carat diamond.

15. Problem 5.32:

First state the hypotheses: $H_0: \mu_{A,c} = \mu_{M,c}$ vs. $H_a: \mu_{A,c} \neq \mu_{M,c}$.

We are told to assume that conditions for inference are satisfied. Then, the test statistic can be calculated as follows:

$$t_{obs} = \frac{(\bar{x}_{A,c} - \bar{x}_{M,c}) - (\mu_{A,c} - \mu_{M,c})}{\sqrt{\dfrac{s_{A,c}^2}{n_{A,c}} + \dfrac{s_{M,c}^2}{n_{M,c}}}} = \frac{(16.12 - 19.85) - 0}{\sqrt{\dfrac{3.58^2}{26} + \dfrac{4.51^2}{26}}} = \frac{-3.73}{1.13} = -3.3.$$

With df = $\min(n_{M,c} - 1, n_{A,c} - 1) = \min(26 - 1, 26 - 1) = 25$, the corresponding p-value is p-value = $\Pr(|t^{(25)}| > 3.3) < 0.01$. Since p-value < 0.05, reject $H_0$. The data provide strong evidence that there is a difference in the average city mileage between cars with automatic and manual transmissions.
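The two-sample statistics in Problems 5.28, 5.30 and 5.32 all use the same unpooled standard error, so the arithmetic can be re-checked with a short Python sketch. The function name `two_sample_t` is hypothetical, the degrees of freedom follow the conservative min(n1 − 1, n2 − 1) rule used in these solutions, and the critical value 2.07 is taken from the solution to Problem 5.30 rather than computed.

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled two-sample t statistic for H0: mu1 - mu2 = 0.

    Returns (t statistic, standard error, conservative df).
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # SE of xbar1 - xbar2
    t_obs = (xbar1 - xbar2) / se
    df = min(n1 - 1, n2 - 1)
    return t_obs, se, df

# Problem 5.28: 0.99 carat vs. 1 carat diamonds
t_diam, se_diam, df_diam = two_sample_t(44.51, 13.32, 23, 56.81, 16.13, 23)

# Problem 5.30: 95% CI for the same difference, t* = 2.07 assumed from a t-table
diff = 44.51 - 56.81
ci_diam = (diff - 2.07 * se_diam, diff + 2.07 * se_diam)

# Problem 5.32: automatic vs. manual city mileage
t_cars, se_cars, df_cars = two_sample_t(16.12, 3.58, 26, 19.85, 4.51, 26)

print(f"5.28: t = {t_diam:.2f}, SE = {se_diam:.2f}, df = {df_diam}")
print(f"5.30: 95% CI = ({ci_diam[0]:.2f}, {ci_diam[1]:.2f})")
print(f"5.32: t = {t_cars:.2f}, SE = {se_cars:.2f}, df = {df_cars}")
```

Statistical software often reports a larger, fractional df from the Welch approximation; the min rule used here is the simpler hand-calculation version and gives slightly more conservative p-values.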


16. Problem 5.35: The hypotheses are $H_0: \mu_T = \mu_C$ and $H_a: \mu_T \neq \mu_C$. We are told to assume that conditions for inference are satisfied. The observed statistic can be found as

$$t_{obs} = \frac{(\bar{x}_T - \bar{x}_C) - (\mu_T - \mu_C)}{\sqrt{\dfrac{s_T^2}{n_T} + \dfrac{s_C^2}{n_C}}} = \frac{(57.1 - 27.1) - 0}{\sqrt{\dfrac{45.1^2}{22} + \dfrac{26.4^2}{22}}} = \frac{30}{11.14} = 2.69.$$

With df = $\min(n_T - 1, n_C - 1) = \min(22 - 1, 22 - 1) = 21$, the p-value can be found as p-value = $\Pr(|t^{(21)}| > 2.69) \Rightarrow 0.01 < \text{p-value} < 0.02$. Since p-value < 0.05, we reject $H_0$ at the significance level α = 0.05. The data provide convincing evidence that the average food consumption of patients in the treatment group differs from that of patients in the control group. Furthermore, the data indicate that patients in the distracted eating (treatment) group consume more food than patients in the control group.
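The p-value ranges quoted in these solutions are read from a t-table. As a rough cross-check that needs no table, the two-sided tail area can be approximated by numerically integrating the t density. The sketch below uses only the Python standard library; the function names `t_pdf` and `two_sided_p` and the Simpson's-rule step count are illustrative choices, not part of the original solutions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t_obs, df, steps=2000):
    """Approximate Pr(|T| > |t_obs|) with Simpson's rule on [0, |t_obs|].

    steps must be even.
    """
    b = abs(t_obs)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3        # Pr(0 < T < |t_obs|)
    return 1.0 - 2.0 * central     # both tails, by symmetry

p_prius = two_sided_p(2.37, 13)    # Problem 5.8
p_read = two_sided_p(-0.87, 199)   # Problem 5.20
p_cars = two_sided_p(-3.3, 25)     # Problem 5.32
p_food = two_sided_p(2.69, 21)     # Problem 5.35

for label, p in [("5.8", p_prius), ("5.20", p_read),
                 ("5.32", p_cars), ("5.35", p_food)]:
    print(f"Problem {label}: two-sided p-value = {p:.4f}")
```

For a one-sided alternative such as the one in Problem 5.12, half of the two-sided value applies when the statistic falls on the side named by the alternative hypothesis.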
