Exam 6 September 2016, questions PDF

Title Exam 6 September 2016, questions
Course Statistical Practice I
Institution The University of Adelaide
Pages 18


Description

Student ID:
Family name:
Other names:
Desk number:
Date:
Signature:

Examination in the School of Mathematical Sciences
Semester 2, 2016

5543    STATS 1000    Statistical Practice I
102232  STATS 1004BR  Statistical Practice I — Life Sciences — Bradford
102232  STATS 1004    Statistical Practice I — Life Sciences
104271  STATS 1504    Statistical Practice I — Life Sciences — Veterinary Bioscience

Time for completing booklet 120 mins (plus 10 mins reading time).

Question    Marks
1           /8
2           /24
3           /13
4           /10
5           /3
6           /12
Total       /70

Instructions to candidates
• Attempt all questions and write your answers in the space provided below that question.
• If there is insufficient space below a question, then use the space to the right of that question, indicating clearly which question you are answering.
• Only work written in this question and answer booklet will be marked.
• Examination materials must not be removed from the examination room.

Materials
• Calculators without remote communications capability are allowed.
• A single A4 sheet of notes (double-sided, typed or handwritten) is allowed.
• English and foreign-language dictionaries may be used.

Do not commence writing until instructed to do so.

Question 1. (8 marks total) To test the effect of two new treatments, denoted Drug A and Drug B, on asthma, 300 asthma sufferers were randomly allocated to receive either Drug A, Drug B, or a control medication. After two weeks on the treatment, the change in lung capacity, as measured by the forced vital capacity (FVC) in litres, was recorded.

1(a) Is this an observational study or a designed experiment? /1 mark

Solution This is a designed experiment. [Core: 1]

1(b) What are the subjects? How many are there? /2 marks

Solution The subjects are the asthma sufferers. [Core: 1] There are 300 subjects. [Core: 1]

1(c) What is the response variable? What type of variable is it (quantitative discrete, etc.)? /2 marks

Solution The response variable is the change in FVC. [Core: 1] [Marking comments: must mention change.] It is a quantitative continuous variable. [Core: 1]

1(d) What is the predictor variable? What are the levels of the predictor variable? /2 marks

Solution The predictor variable is the treatment, i.e. the asthma medication. [Core: 1] It has three levels: [Core: 1] (i) Drug A. (ii) Drug B. (iii) Control medication.

1(e) Give the name of an appropriate hypothesis test procedure that could be used to test for an effect of the predictor variable on the response variable. /1 mark

Solution One-way ANOVA test. [Core: 1] [Marking comments: accept if only say ANOVA.]

Question 2. (24 marks total) Rotten Tomatoes is a movie review website. For every reviewed movie, Rotten Tomatoes gives two scores. The first score is called the Tomatometer. This is the percentage of all film critics that gave a positive review. The second score is called the Audience Score. This is the percentage of all Rotten Tomatoes users who have given the movie a score of 3.5 stars or higher. Is it possible to use the film critic reviews to predict the users’ scores?

2(a) Figure 1 is a scatterplot of Audience Score against Tomatometer for 146 randomly chosen movies on the Rotten Tomatoes website. Describe the relationship. /3 marks

Solution Moderate positive linear relationship. Also two potential outliers at (20, 80). [Core: 3] [Marking comments: 1 for positive, 1 for linear, 1 for moderate. If a piece of information is missing but the outliers are mentioned, give full marks.]

2(b) A simple linear regression of the Audience Score on the Tomatometer was performed in SPSS and the output, given in Figure 2, was obtained.

(i) State the value of the estimated intercept, b0. Interpret it in context. /2 marks

Solution The value of the intercept estimate is 32.217. [Core: 1] If the Tomatometer score was zero, then on average the Audience Score would be 32.217. [Core: 1]

(ii) State the value of the estimated slope, b1. Interpret it in context. /2 marks

Solution The slope estimate is 0.521. [Core: 1] If the Tomatometer score increases by one then, on average, we expect the Audience Score to increase by 0.521. [Core: 1]
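The two estimates in 2(b) define a fitted line that can be evaluated at any Tomatometer score. The following is a minimal Python sketch, assuming only the coefficients 32.217 and 0.521 quoted in the solutions; the helper name `predict_audience_score` is hypothetical and not part of the exam.

```python
# Estimated coefficients quoted in the solutions to 2(b).
B0 = 32.217  # intercept: fitted Audience Score when the Tomatometer is 0
B1 = 0.521   # slope: change in fitted Audience Score per one-point Tomatometer increase


def predict_audience_score(tomatometer: float) -> float:
    """Return the fitted Audience Score for a Tomatometer score (hypothetical helper)."""
    return B0 + B1 * tomatometer


# Fitted values at a few Tomatometer scores, including the intercept case.
for score in (0, 50, 80):
    print(score, round(predict_audience_score(score), 3))
```

Increasing the input by one always changes the fitted value by the slope, which is the interpretation given in the solution to 2(b)(ii).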


Figure 1: Scatterplot of Audience Score against Tomatometer for 146 movies on the Rotten Tomatoes website.

Figure 2: Output for the simple linear regression on the Rotten Tomatoes dataset.


2(c) Determine whether there is a statistically significant relationship between Audience Score and Tomatometer using the output in Figure 2 by undertaking the following steps. (i) State the appropriate null and alternative hypotheses. Define any parameter used. /2 marks

Solution H0: β1 = 0, Ha: β1 ≠ 0, where β1 is the slope of the linear relationship between Audience Score and Tomatometer. [Core: 2] [Marking comments: 1 for null and alternative, 1 for definition of β1.]

(ii) State the observed value of the test statistic. /1 mark

Solution The observed value of the test statistic is 14.953. [Core: 1]

(iii) State the P-value. /1 mark

Solution The P-value is reported as 0.000, i.e. it is less than 0.0001. [Core: 1]

(iv) Do you reject or retain the null hypothesis at the 5% significance level? Why? /2 marks

Solution Reject the null hypothesis as the P-value is < 0.05. [Core: 2] [Marking comments: 1 for reject, 1 for reason]


2(d) State whether each of the following linear regression assumptions is reasonable using Figures 3 and 4. In each case, state which figure is used, explain what you would expect to see if the assumption is valid, and state your conclusion. (i) Linearity of the relationship between Audience Score and Tomatometer. /3 marks

Solution • Look at the residual versus fitted plot (Figure 3). [Core: 1] • Expect random scatter around the zero line. [Core: 1] • Looks reasonable. [Core: 1]

(ii) Constant spread of the residuals. /3 marks

Solution • Look at the residual versus fitted plot (Figure 3). [Core: 1] • Expect equal spread around the zero line. [Core: 1] • Looks reasonable. [Core: 1]

(iii) Normality of the residuals. /3 marks

Solution • Look at the normal QQ-plot of the residuals (Figure 4). [Core: 1] • Expect the points to lie close to the line. [Core: 1] • Looks reasonable. [Core: 1]

2(e) A new movie “Tai-chi Rabbit IV” has received a Tomatometer score of 80 and an Audience Score of 90. Is this an unusually large Audience Score for a movie with a Tomatometer score of 80? Justify your answer with reference to Table 1. /2 marks

Interval type    Lower bound    Upper bound
Confidence       71.38          76.24
Prediction       48.91          98.71

Table 1: 95% confidence interval and prediction interval for movies with a Tomatometer score of 80.


Figure 3: Scatterplot of the residuals against the fitted values for the simple linear regression on the Rotten Tomatoes dataset.

Figure 4: Normal QQ-plot of the residuals for the simple linear regression on the Rotten Tomatoes dataset.


Solution No [Core: 1], it is not unusually large, as the value 90 lies within the 95% prediction interval (48.91, 98.71). [Core: 1]


Question 3. (13 marks total) An exam consists of two parts: Part A and Part B. The time, X, for students to complete Part A is assumed to be normally distributed with a mean of 15 minutes and a standard deviation of 5 minutes. The Excel commands and results given at the end of this question may be used in your calculations (Figure 5).

3(a) Calculate the probability that the time for a student to complete Part A is less than 5 minutes. /2 marks

Solution P(X ≤ 5) = NORM.DIST(5, 15, 5, TRUE) = 0.02275013. [Core: 2] [Marking comments: 1 for answer, 1 for working]

3(b) Calculate the probability that the time for a student to complete Part A is between 5 and 15 minutes. /2 marks

Solution
P(5 ≤ X ≤ 15) = P(X ≤ 15) − P(X ≤ 5)
              = NORM.DIST(15, 15, 5, TRUE) − NORM.DIST(5, 15, 5, TRUE)
              = 0.4772499. [Core: 2]

[Marking comments: 1 for answer, 1 for working]

3(c) After how many minutes, would we expect 90% of the students to have completed Part A? /2 marks

Solution We need to find the time c such that P(X ≤ c) = 0.9. Using NORM.INV(0.9, 15, 5) gives c = 21.40776 minutes. [Core: 2] [Marking comments: 1 for answer, 1 for working]
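The Excel calculations in 3(a) to 3(c) can be cross-checked outside a spreadsheet. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for NORM.DIST and NORM.INV; the variable names are illustrative and do not come from the exam.

```python
from statistics import NormalDist

# Time to complete Part A: normal with mean 15 minutes and standard deviation 5 minutes.
part_a = NormalDist(mu=15, sigma=5)

# 3(a): P(X <= 5), the counterpart of NORM.DIST(5, 15, 5, TRUE).
p_less_than_5 = part_a.cdf(5)

# 3(b): P(5 <= X <= 15) as a difference of two cumulative probabilities.
p_between_5_and_15 = part_a.cdf(15) - part_a.cdf(5)

# 3(c): the time c with P(X <= c) = 0.9, the counterpart of NORM.INV(0.9, 15, 5).
time_90_percent = part_a.inv_cdf(0.9)

print(round(p_less_than_5, 5))
print(round(p_between_5_and_15, 5))
print(round(time_90_percent, 2))
```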


3(d) The time, Y, for students to complete Part B is assumed to be normally distributed with a mean of 30 minutes and a standard deviation of 8 minutes. The total time to complete Part A and Part B of the exam is denoted by T, i.e., T = X + Y. It is assumed that X and Y are independent.

(i) Calculate the mean of T. /2 marks

Solution µT = µX + µY = 15 + 30 = 45. [Core: 2] [Marking comments: 1 for answer, 1 for working]

(ii) Calculate the standard deviation of T. /2 marks

Solution σT = √(σX² + σY²) = √(5² + 8²) = 9.433981. [Adv: 2] [Marking comments: 1 for answer, 1 for working]

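(iii) Hence, what is the distribution of T? /1 mark

Solution T is normally distributed, with mean 45 minutes and standard deviation 9.433981 minutes, because the sum of independent normal random variables is itself normal. [Adv: 1]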
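The results of 3(d) can be reproduced with a short Python sketch. It relies on the fact that `statistics.NormalDist` objects can be added, which adds the means and the variances; this is only valid under the independence assumption stated in the question. The variable names are illustrative.

```python
from statistics import NormalDist

part_a = NormalDist(mu=15, sigma=5)  # X: time for Part A
part_b = NormalDist(mu=30, sigma=8)  # Y: time for Part B

# Adding two NormalDist objects sums the means and the variances,
# which matches T = X + Y only when X and Y are independent.
total = part_a + part_b

print(total.mean)             # mean of T
print(round(total.stdev, 6))  # standard deviation of T
```

If X and Y were negatively correlated, as discussed in 3(e), the variance of T would be smaller than the sum of the two variances and this shortcut would overstate the spread.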

3(e) Is the assumption of independence of X and Y reasonable given the context? If not, would you expect X and Y to be positively or negatively correlated. Why? /2 marks

Solution No, the assumption of independence is not reasonable. Given the fixed total time for the exam, a student who takes longer on one part has less time available for the other part. Hence we expect X and Y to be negatively correlated. [Adv: 2] [Marking comments: for any reasonable discussion]

NORM.DIST(0, 15, 5, TRUE) = 0.00135
NORM.DIST(5, 15, 5, TRUE) = 0.02275
NORM.DIST(15, 15, 5, TRUE) = 0.5
NORM.INV(0.9, 15, 5) = 21.41
NORM.INV(0.1, 15, 5) = 8.59

Figure 5: Excel commands and results that may be used in this question.

Question 4. (10 marks total) Researchers are interested in the effect of oral contraceptives on systolic blood pressure. Ten women are randomly allocated to two groups, each containing five women. The first group are given an oral contraceptive for two weeks and then take no oral contraceptive for two weeks. The second group take no oral contraceptive for two weeks, and then take an oral contraceptive for two weeks. At the end of each two-week period, the women’s systolic blood pressure is measured. Let X be the systolic blood pressure measurement for a subject while on an oral contraceptive and Y be the systolic blood pressure measurement for a subject while not on an oral contraceptive. For each subject, the difference in systolic blood pressure, D, was calculated by D = X − Y. The following summary statistics were obtained: d̄ = 4.8; sD = 4.57.

4(a) The analysis of the dataset is performed using a matched-pairs t-test rather than a two-sample t-test. Why is this the appropriate analysis? /1 mark

Solution There are two measurements on each subject. [Core: 1]

4(b) Calculate a 95% confidence interval for the population mean of the differences, µD. You may assume that t∗ = 2.26. /3 marks

Solution d̄ ± t∗ × sD/√n = 4.8 ± 2.26 × 4.57/√10 = (1.5339, 8.0661). [Adv: 3] [Marking comments: 1 for each bound and 1 for working]
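The interval in 4(b) is the mean difference plus or minus t∗ times the standard error of the mean difference. A minimal Python sketch of that arithmetic follows, using the summary statistics and the value of t∗ given in the question; the function name `paired_t_interval` is hypothetical.

```python
import math


def paired_t_interval(mean_diff, sd_diff, n, t_star):
    """Confidence interval for the mean difference: mean_diff +/- t_star * sd_diff / sqrt(n)."""
    margin = t_star * sd_diff / math.sqrt(n)
    return mean_diff - margin, mean_diff + margin


# Summary statistics and critical value as given in Question 4.
lower, upper = paired_t_interval(mean_diff=4.8, sd_diff=4.57, n=10, t_star=2.26)
print(round(lower, 4), round(upper, 4))
```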

4(c) State the degrees of freedom used to calculate t∗ . /1 mark

Solution The degrees of freedom are n − 1 = 10 − 1 = 9. [Core: 1]

4(d) Based on the confidence interval for µD , is the effect of oral contraception on systolic blood pressure statistically significant? /2 marks

Solution This is equivalent to testing the null hypothesis µD = 0, where µD is the population mean of the differences. We can test this at the 5% significance level by using the 95% confidence interval for µD. Since the null value 0 is not contained within the 95% confidence interval, we reject the null hypothesis at the 5% significance level, so the effect is statistically significant. [Core: 2] [Marking comments: 1 for reject, 1 for reason.]

4(e) Would a 99% confidence interval for µD be wider or narrower than the 95% confidence interval? Why? /2 marks

Solution It would be wider. To see this, consider the formula for the confidence interval. The only part that would change for a 99% confidence interval is the value of t∗. Consider Figure 6, which illustrates how to obtain the value of t∗ for a 95% confidence interval. For the case of a 99% confidence interval, we would have a larger blue area, and hence −t∗ and t∗ are larger in magnitude.

[Figure 6 plot: density f(t) against t, with a central area of 0.95 between −2.26 and 2.26.]

Figure 6: Illustrative figure of how to obtain t∗ for a 95% confidence interval. [Adv: 2] [Marking comments: 1 for wider, 1 for good discussion.]
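The effect of the confidence level on the interval width can also be checked numerically. The sketch below assumes a 99% critical value of t∗ = 3.25 for 9 degrees of freedom, taken from a standard t-table; that value is not given in the exam, and the helper name `half_width` is hypothetical.

```python
import math


def half_width(t_star, sd_diff, n):
    """Half-width (margin of error) of a t-interval for a mean difference."""
    return t_star * sd_diff / math.sqrt(n)


T_STAR_95 = 2.26  # given in Question 4(b)
T_STAR_99 = 3.25  # assumption: standard t-table value for 9 degrees of freedom, not from the exam

margin_95 = half_width(T_STAR_95, sd_diff=4.57, n=10)
margin_99 = half_width(T_STAR_99, sd_diff=4.57, n=10)

# Only t* changes between the two intervals, so the larger t* gives the wider interval.
print(round(margin_95, 4), round(margin_99, 4), margin_99 > margin_95)
```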

4(f) Recall that the women in the study are randomly allocated to two groups, one that takes oral contraception, then no oral contraception, while the second group takes no oral contraception, then oral contraception. Why is this good experimental design? /1 mark

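Solution Because of the possibility of a carry-over effect: randomising the order in which the two conditions are received stops any time or order effect from being confounded with the treatment effect. [Adv: 1] [Marking comments: Any good discussion of time effect.]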

Question 5. (3 marks total) Tick one box for each of the following questions.

5(a) A 95% confidence interval for a population mean is calculated to be (4.5, 9.5). Which of the following statements is correct? /1 mark

[ ] 95% of the observations lie between 4.5 and 9.5.
[ ] The probability that the true population mean lies between 4.5 and 9.5 is 0.95.
[ ] We are 95% confident that the true population mean lies between 4.5 and 9.5.
[ ] If the process was repeated 100 times, we would expect the sample mean to be within the 95% confidence interval approximately 95 times.

Solution
[ ] 95% of the observations lie between 4.5 and 9.5.
[ ] The probability that the true population mean lies between 4.5 and 9.5 is 0.95.
[x] We are 95% confident that the true population mean lies between 4.5 and 9.5.
[ ] If the process was repeated 100 times, we would expect the sample mean to be within the 95% confidence interval approximately 95 times.

[Adv: 1]

5(b) If a simple random sample is taken from a population with finite mean and variance and the sample size is large enough, then the sample mean is approximately normally distributed. Which of the following explains this phenomenon? /1 mark

[ ] The central limit theorem.
[ ] The law of large numbers.
[ ] It’s just one of those things.
[ ] Common response.

Solution
[x] The central limit theorem.
[ ] The law of large numbers.
[ ] It’s just one of those things.
[ ] Common response.

[Adv: 1]

5(c) Which of the following is not an assumption of the binomial distribution? /1 mark

[ ] Binary outcome.
[ ] Independence of the trials.
[ ] Normality of the variable.
[ ] Success probability is constant.

Solution
[ ] Binary outcome.
[ ] Independence of the trials.
[x] Normality of the variable.
[ ] Success probability is constant.

[Core: 1]

Question 6. (12 marks total) A movie is said to have passed the Bechdel test if it has at least two named women in it that talk to each other about something other than a man. Of a random sample of 253 English language movies since 2000, 114 passed the Bechdel test.

6(a) Calculate the sample proportion. /2 marks

Solution p̂ = 114/253 = 0.4505929. [Core: 2] [Marking comments: 1 for answer, 1 for working]

6(b) Is the sample proportion a statistic or a parameter? Why? /2 marks

Solution The sample proportion is a statistic as it is a numerical characteristic of a sample. [Core: 2] [Marking comments: 1 for statistic, 1 for reason.]

6(c) The proportion of all movies released in 1980 that passed the Bechdel test was 0.29. Test the null hypothesis that the proportion of movies released since 2000 that pass the Bechdel test is still 0.29. Perform a test of this hypothesis with the following steps: (i) State the appropriate null and alternative hypotheses. Define any parameter used. /2 marks

Solution H0: p = 0.29, Ha: p ≠ 0.29, where p is the proportion of movies since 2000 that pass the Bechdel test. [Adv: 2] [Marking comments: 1 for hypotheses, 1 for definition of p.]

(ii) Calculate the observed value of the appropriate test statistic. /3 marks

Solution z = (p̂ − 0.29) / √(0.29(1 − 0.29)/253) = (0.4506 − 0.29) / 0.02853 = 5.63.

[Adv: 3] [Marking comments: 1 for answer, 2 for working]
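The test statistic in 6(c)(ii) divides the gap between the sample proportion and the null value by the standard error computed under the null hypothesis. A minimal Python sketch follows, using the counts from the question; the function name `one_proportion_z` is hypothetical.

```python
import math


def one_proportion_z(successes, n, p0):
    """z statistic for testing H0: p = p0, with the standard error computed under H0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# 114 of 253 sampled movies passed the Bechdel test; the null value is 0.29.
z = one_proportion_z(successes=114, n=253, p0=0.29)
print(round(z, 2))
```

The sketch keeps the sample proportion unrounded; rounding it to 0.45 before dividing would give a slightly smaller statistic.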

(iii) Using the Excel command: NORM.DIST(-5.63, 0, 1, TRUE) = 9.04 × 10^−9, calculate the P-value. /1 mark

Solution P-value is 2 × 9.04 × 10^−9 = 1.808 × 10^−8. [Adv: 1]
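The doubling in 6(c)(iii) reflects the two-sided alternative: the P-value is the standard normal probability in both tails beyond the observed statistic. The sketch below computes it with `math.erfc`, using the identity 2 × P(Z ≤ −|z|) = erfc(|z|/√2); the function name `two_sided_p_value` is hypothetical. Because it works from the rounded statistic 5.63, the result may differ from the solution's figure in the later digits.

```python
import math


def two_sided_p_value(z):
    """Two-sided standard normal P-value, 2 * P(Z <= -|z|), computed via erfc."""
    return math.erfc(abs(z) / math.sqrt(2))


p_value = two_sided_p_value(5.63)
print(p_value)
print(p_value < 0.05)  # decision rule at the 5% significance level
```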

(iv) Do you reject or retain the null hypothesis at the 5% significance level? Why? /2 marks

Solution Reject the null hypothesis as the P-value is less than 0.05. [Core: 2] [Marking comments: 1 for reject, 1 for reason.]

Core marks total: 51
Adv marks total: 19
Total: 70

End of examination questions....

