Final exam Summer 2017, questions and answers PDF

Title Final exam Summer 2017, questions and answers
Course Introduction to Biostatistics
Institution University College Cork
Pages 13

Summary

Questions and solutions to the Summer exam in st2001 2017...


Description

OLLSCOIL NA hÉIREANN, CORCAIGH
THE NATIONAL UNIVERSITY OF IRELAND, CORK
COLAISTE NA hOLLSCOILE, CORCAIGH
UNIVERSITY COLLEGE, CORK

Summer 2017 - Semester 2
ST2001 – Introduction to Biostatistics

EXAMINERS
Dr. P. Ansell
Prof. F. O'Sullivan
Dr. Michael Cronin
Ms. Paula Harrison
Dr. Eric Wolsztynski

DURATION OF PAPER 1½ Hours

FULL MARKS FOR THREE COMPLETE ANSWERS

There are two sections in this paper. You must answer question one in section A and two questions from section B.

A list of statistical formulae and selected statistical tables are provided at the end of the examination paper. Graph Paper is available. A calculator may be used provided that it does not contain any information stored by any person prior to the examination. Fifteen minutes of reading time are permitted prior to this examination.

PLEASE DO NOT TURN THIS PAGE UNTIL INSTRUCTED TO DO SO THEN ENSURE THAT YOU HAVE THE CORRECT EXAM PAPER

Page 1 of 13

SECTION A Question 1 A study carried out in 2010 found that mean daily caffeine intake among Irish women was 142.8 mg. A nutritionist is interested in verifying a claim that the average level of caffeine intake in this subpopulation has increased since this last study.

(a)

Based on a sample of 121 Irish women surveyed in 2015, the nutritionist measured an average daily caffeine intake of 145.1 mg. Explain whether or not this result is sufficient to validate the nutritionist’s claim. [5 marks] Question unseen before. Insufficient: we need to assess whether the observed result is due to chance or whether it likely describes a feature of the population. Statistical significance of a result allows the information collected from the sample to be generalized to the population. [2 marks for a vague definition] 5 marks

(b)

Suppose the nutritionist in (a) also calculated that the standard error associated with the sample mean was 1.2 mg. What is the standard deviation of the sample of daily caffeine uptake for this experiment? [10 marks] Question unseen before – indirect way of testing understanding of definition. The standard deviation is sqrt(n)*std.error= 13.2 mg [0 or full marks] 10 marks

(c)

Calculate the width of the 95% confidence interval associated with this measure of the mean found in (a). [5 marks] Standard question. Width of the confidence interval is 2* 1.96 * std.error = 4.70 mg [No marks for CI; 3 if wrong calc; 3 for half width] 5 marks

(d)

Describe how increasing the confidence level in (c) would affect the width of this confidence interval. [5 marks] Standard question. Confidence interval would become wider. 5 marks

(e)

Based on the confidence interval in (c), would you say that the nutritionist’s claim that the average level of caffeine intake in this subpopulation has increased from 2010 to 2015 is well founded? Justify your answer. [5 marks] Standard question. 95% CI: 145.10 +/- 2.35 = (142.75,147.45) contains the old average of 142.8 mg, so at the 95% confidence level we cannot support that claim. 5 marks
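The arithmetic in parts (b), (c) and (e) can be reproduced with a short script. This is a minimal sketch in Python, which is not part of the exam paper; the variable names are illustrative, and 1.96 is the normal critical value for 95% confidence used in the marking scheme.

```python
import math

# Values quoted in Question 1
n = 121                 # sample size (2015 survey)
sample_mean = 145.1     # mg
std_error = 1.2         # mg, standard error of the sample mean
reference_mean = 142.8  # mg, mean reported by the 2010 study
z_95 = 1.96             # normal critical value for 95% confidence

# (b) standard error = sd / sqrt(n), so sd = sqrt(n) * standard error
sample_sd = math.sqrt(n) * std_error

# (c) full width of the 95% confidence interval
half_width = z_95 * std_error
ci_width = 2 * half_width

# (e) check whether the interval contains the 2010 mean
ci_low = sample_mean - half_width
ci_high = sample_mean + half_width
contains_reference = ci_low <= reference_mean <= ci_high

print(f"sd = {sample_sd:.1f} mg")
print(f"CI width = {ci_width:.2f} mg")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f}) mg")
print(f"contains 2010 mean: {contains_reference}")
```

Because the interval only just contains 142.8 mg, the conclusion in (e) is sensitive to rounding of the interval limits.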


(f)

In order to test the nutritionist’s claim, the nutritionist is considering a one sample t-test of the following hypotheses: H0: µ=145.1 versus HA: µ>145.1 Explain whether the nutritionist is setting the t-test correctly. [10 marks] Question unseen before. Wrong setup: the directions of the hypotheses are correct, but the nutritionist is using the observed mean of the 2015 sample as the hypothesised value. To test the claim, the nutritionist should have tested against the 2010 average daily caffeine intake, i.e. H0: µ=142.8 versus HA: µ>142.8. 10 marks

(g)

The nutritionist was able to compare the results of the analysis on Irish women with another analysis based on a sample of British women. The two samples are of comparable sizes. The nutritionist first carries out Levene’s test on this two-sample dataset. Formulate the null and alternative hypotheses for this test. [5 marks] Standard question. H0: the population variances in daily caffeine intake are equal for both Irish and British populations. HA: the population variances in daily caffeine intake differ between Irish and British populations. [2 marks only if population is not mentioned] 5 marks

(h)

The 95% confidence interval for the difference in mean daily caffeine uptake was found to be (-3.33 mg, -2.78 mg). Interpret this result, in terms of statistical significance. [5 marks] Standard question. As zero is not contained in the 95% CI, we can conclude that the means are not equal. The result is statistically significant. The statement alone does not allow us to decide which population has the lower average uptake, because it does not say which mean was subtracted from which. 5 marks


SECTION B Question 2 Data unseen before. All question items were seen before. The weight gains of beef steers were measured over a 140-day test period. The average daily gains (lb/day) of 19 steers on the same diet were as follows:

3.89 3.27 3.63 3.51 3.13 3.98 3.97
3.76 3.12 3.31 3.79 3.21 3.72 3.36
3.42 3.67 3.15 3.24 3.88

(a)

Calculate the mean and median. [10 marks]
$\bar{x} = \frac{67.01}{19} = 3.53$
$n = 19$, so the median is in the $\frac{n+1}{2}$th position = 10th position of the ordered data = 3.51
5, 5 marks

(b)

Calculate the quartiles and interpret these values. [10 marks]
Q1 is in position (19 + 1)/4 = 5th position; 5th value: 3.24. 5 marks
Q3 is in position 3(19 + 1)/4 = 15th position; 15th value: 3.79. 5 marks

One in four beef steers have a weight gain of 3.24 lb/day or less. Three in four beef steers have a weight gain of 3.79 lb/day or less. 2x1 marks

(c)

Construct a box-plot showing all calculations. [16 marks]
IQR = Q3 - Q1 = 3.79 - 3.24 = 0.55 lb/day
1.5 × IQR = 0.825
Upper fence: Q3 + 1.5 × IQR = 4.615, so the upper adjacent value (UAV) = 3.98
Lower fence: Q1 - 1.5 × IQR = 2.415, so the lower adjacent value (LAV) = 3.12
No outliers. 12 marks
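The summary statistics used in parts (a) to (c) can be checked with the following sketch. It is an illustration in Python rather than part of the marking scheme, and it assumes the positional quartile rule used above, positions (n + 1)/4 and 3(n + 1)/4, which happen to be whole numbers for n = 19. Statistical packages may use other quartile conventions and return slightly different values.

```python
import statistics

# Average daily gains (lb/day) of the 19 steers in Question 2
gains = [3.89, 3.27, 3.63, 3.51, 3.13, 3.98, 3.97, 3.76, 3.12, 3.31,
         3.79, 3.21, 3.72, 3.36, 3.42, 3.67, 3.15, 3.24, 3.88]

ordered = sorted(gains)
n = len(ordered)

mean = sum(ordered) / n
median = statistics.median(ordered)   # 10th ordered value when n = 19

# Positional quartiles (1-based positions converted to 0-based indices)
q1 = ordered[(n + 1) // 4 - 1]        # 5th ordered value
q3 = ordered[3 * (n + 1) // 4 - 1]    # 15th ordered value

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Adjacent values: most extreme observations still inside the fences
inside = [x for x in ordered if lower_fence <= x <= upper_fence]
lav, uav = min(inside), max(inside)
outliers = [x for x in ordered if x < lower_fence or x > upper_fence]

print(f"mean = {mean:.2f}, median = {median}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr:.2f}")
print(f"fences = ({lower_fence:.3f}, {upper_fence:.3f})")
print(f"LAV = {lav}, UAV = {uav}, outliers = {outliers}")
```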

4 marks

(d)

Based on the box-plot in (c), provide suitable measures of centrality and spread. Include an explanation for your choice. (Note: you do not need to do any calculations.) [8 marks] Centrality: The mean, as the distribution is symmetric. The mean and the median are about the same. Spread: The standard deviation, as this is the appropriate measure of spread when the data are not noticeably skewed and have no outliers. 3, 3, 2 marks

(e)

Suppose an additional beef steer with a weight gain of 5.2 lb/day was added to the dataset. What effect would this have on the mean and median? [4 marks] Both the mean and the median would increase. The effect on the median would be less than the effect on the mean. 4 marks

(f)

What other method of presentation would be appropriate for this data? (Note: You do not need to prepare this). [2 marks] Histogram 2 marks
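The effect described in part (e) can be illustrated numerically, although the question itself does not require a calculation. The sketch below, in Python with illustrative variable names, recomputes the mean and median after appending the hypothetical 5.2 lb/day steer to the 19 original values.

```python
import statistics

# Original 19 average daily gains (lb/day) from Question 2
gains = [3.89, 3.27, 3.63, 3.51, 3.13, 3.98, 3.97, 3.76, 3.12, 3.31,
         3.79, 3.21, 3.72, 3.36, 3.42, 3.67, 3.15, 3.24, 3.88]

# Dataset with the additional steer from part (e)
extended = gains + [5.2]

mean_before = statistics.mean(gains)
mean_after = statistics.mean(extended)
median_before = statistics.median(gains)
median_after = statistics.median(extended)  # average of 10th and 11th values

mean_shift = mean_after - mean_before
median_shift = median_after - median_before

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f} (shift {mean_shift:.3f})")
print(f"median: {median_before:.2f} -> {median_after:.2f} (shift {median_shift:.3f})")
```

Both statistics rise, and the median moves less than the mean, which is why the median is described as more resistant to extreme values.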


Question 3 Data unseen before. All question items were seen before.

(a)

A study showed that 45% of people in a certain population smoke. It also showed that 30% of these smokers suffered from chronic bronchitis, and that 20% of the non-smokers did not suffer from chronic bronchitis. For this question, you may use a hypothetical distribution of 1000 people from that population.

(i)

Produce a contingency table. [10 marks]

                                   Smoker   Non-smoker   Total
Suffers from bronchitis               135          440     575
Does not suffer from bronchitis       315          110     425
Total                                 450          550    1000

10 marks

(ii)

For a randomly selected person, what is the probability s/he is a non-smoker (NS)? [5 marks] P(NS) = 550/1000 = 0.55 or 55.0% 5 marks

(iii)

Calculate the probability that a randomly selected person is a smoker (S) and does not suffer from chronic bronchitis (NCB). [5 marks] P(S and NCB) = 315/1000 = 0.315 or 31.5% 5 marks

(iv)

For a randomly selected person who is a non-smoker (NS), what is the probability that the person suffers from chronic bronchitis (CB)? [5 marks] P(CB | NS) = 440/550 = 0.8 or 80% 5 marks
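The contingency table and the three probabilities in part (a) follow directly from the stated percentages. The sketch below, in Python with illustrative variable names, rebuilds the counts for the hypothetical population of 1000 people using integer arithmetic and then computes the marginal, joint and conditional probabilities.

```python
total = 1000

# Counts derived from the percentages given in the question
smokers = total * 45 // 100               # 45% smoke
non_smokers = total - smokers

smokers_cb = smokers * 30 // 100          # 30% of smokers have bronchitis
smokers_ncb = smokers - smokers_cb

non_smokers_ncb = non_smokers * 20 // 100  # 20% of non-smokers do not
non_smokers_cb = non_smokers - non_smokers_ncb

total_cb = smokers_cb + non_smokers_cb
total_ncb = smokers_ncb + non_smokers_ncb

# (ii) marginal probability of being a non-smoker
p_ns = non_smokers / total

# (iii) joint probability: smoker AND no chronic bronchitis
p_s_and_ncb = smokers_ncb / total

# (iv) conditional probability: bronchitis GIVEN non-smoker
p_cb_given_ns = non_smokers_cb / non_smokers

print(f"P(NS) = {p_ns:.3f}")
print(f"P(S and NCB) = {p_s_and_ncb:.3f}")
print(f"P(CB | NS) = {p_cb_given_ns:.3f}")
```

Note that the conditional probability divides by the non-smoker total (550), not by the whole population.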

(b)

(i)

List the three properties of a binomial experiment. [3 marks] A binomial experiment has the following properties:
1. a sequence of n Bernoulli trials
2. each trial is independent
3. the probability of success in a single trial is constant.

3 marks

(ii)

The mean number of pet attacks per year in County Dublin is 4. Assuming the number of pet attacks follows a Poisson distribution, what is the probability that in the current year there will be at most one pet attack? [5 marks] P(at most 1) = P(0) + P(1) = 0.0183 + 0.0733 = 0.0916 5 marks
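The two Poisson terms can be verified from the probability mass function P(X = k) = e^(-λ) λ^k / k! with λ = 4. The sketch below is a Python illustration; the helper name `poisson_pmf` is hypothetical and simply implements that formula with the standard library.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events for a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean_attacks = 4  # mean number of pet attacks per year

p0 = poisson_pmf(0, mean_attacks)
p1 = poisson_pmf(1, mean_attacks)
p_at_most_one = p0 + p1

print(f"P(0) = {p0:.4f}")
print(f"P(1) = {p1:.4f}")
print(f"P(at most 1) = {p_at_most_one:.4f}")
```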

(c)

In a large Drosophila population, 30% of the flies are black and 70% are grey. You captured 7 flies at random.

(i)

How likely is it that exactly 5 of the flies are black? [6 marks]
$P(r \text{ flies are black}) = {}^{n}C_{r}\, p^{r} (1 - p)^{n - r}$
$P(5 \text{ flies are black}) = {}^{7}C_{5} \times 0.30^{5} \times 0.70^{7-5} = 0.025$
6 marks

(ii)

How likely is it that at least 5 of the flies are not grey? [8 marks]
Every fly is either black or grey, so "at least 5 of the 7 flies are not grey" is the same event as "at most 2 flies are grey".
P(at least 5 flies are not grey) = P(at most 2 flies are grey)
P(at most 2 flies are grey) = P(0) + P(1) + P(2)
$P(0) = {}^{7}C_{0} \times 0.70^{0} \times 0.30^{7} = 0.0002187$
$P(1) = {}^{7}C_{1} \times 0.70^{1} \times 0.30^{6} = 0.0035721$
$P(2) = {}^{7}C_{2} \times 0.70^{2} \times 0.30^{5} = 0.025$
P(at most 2 flies are grey) = 0.0002187 + 0.0035721 + 0.025 = 0.02879
P(at least 5 flies are not grey) = 0.02879 = 2.88%
8 marks

(iii)

What is the standard deviation of the number of grey flies? [3 marks] Standard deviation = $\sqrt{np(1-p)} = \sqrt{7 \times 0.70 \times (1 - 0.70)} = \sqrt{1.47} = 1.21$ 3 marks
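The binomial calculations in part (c) can be checked with the sketch below. It is a Python illustration, not part of the exam paper; the helper name `binomial_pmf` is hypothetical, and `math.comb` requires Python 3.8 or later. Since each fly is either black or grey, "at least 5 not grey" is computed two equivalent ways: as at least 5 black, and as at most 2 grey. The standard deviation is the square root of the variance np(1 - p) = 1.47.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 7            # flies captured
p_black = 0.30   # proportion of black flies
p_grey = 0.70    # proportion of grey flies

# (i) exactly 5 black flies
p_exactly_5_black = binomial_pmf(5, n, p_black)

# (ii) at least 5 not grey, computed two equivalent ways
p_at_least_5_black = sum(binomial_pmf(k, n, p_black) for k in range(5, n + 1))
p_at_most_2_grey = sum(binomial_pmf(k, n, p_grey) for k in range(0, 3))

# (iii) standard deviation of the number of grey flies
variance_grey = n * p_grey * (1 - p_grey)
sd_grey = math.sqrt(variance_grey)

print(f"P(exactly 5 black) = {p_exactly_5_black:.4f}")
print(f"P(at least 5 black) = {p_at_least_5_black:.4f}")
print(f"P(at most 2 grey) = {p_at_most_2_grey:.4f}")
print(f"variance = {variance_grey:.2f}, sd = {sd_grey:.2f}")
```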


Question 4

(a)

The heights (cm) of a sample of 40 plants 14 days after a particular treatment X was administered were recorded. The mean height was 23 cm and the standard deviation was 1.2 cm.

(i)

Calculate a 95% confidence interval for the mean plant height. Interpret this confidence interval. [14 marks]
$\bar{x} \pm Z \frac{s}{\sqrt{n}} = 23 \pm 1.96 \times \frac{1.2}{\sqrt{40}} = 23 \pm 0.37 = (22.6, 23.4)$ cm
Question is seen before. We can be 95% confident that the mean plant height lies between 22.6 cm and 23.4 cm. 10, 4 marks

(ii)

Suppose the question in part (i) had asked you to construct a 90% confidence interval rather than a 95% confidence interval. Without doing any further calculations, how would you expect the confidence interval to change? [4 marks] Question is seen before. The higher the level of confidence, the wider the interval. Therefore, if you decreased the level of confidence from 95% to 90%, the width of the interval would decrease. The more confident you want to be of where the true mean lies, the wider the interval (and vice versa). 4 marks

(iii)

You want to estimate the mean plant height to within 0.2 cm with 95% confidence. How many plants would you need to include in your study? [8 marks] Question is seen before.
Z = 1.96, s = 1.2, AE = 0.2
$n = \frac{Z^2 \sigma^2}{AE^2} = \frac{1.96^2 \times 1.2^2}{0.2^2} = 138.3$
n = 139
8 marks [no rounding or rounding down: -2 marks]
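The confidence interval in part (i) and the sample size in part (iii) use the same ingredients, so they can be checked together. The sketch below is a Python illustration with hypothetical variable names; it uses the sample standard deviation s = 1.2 cm in place of σ, as the marking scheme does, and rounds the sample size up because rounding down would not achieve the required precision.

```python
import math

n = 40            # plants in the sample
mean_height = 23  # cm
s = 1.2           # cm, sample standard deviation
z_95 = 1.96       # normal critical value for 95% confidence

# (i) 95% confidence interval for the mean
half_width = z_95 * s / math.sqrt(n)
ci = (mean_height - half_width, mean_height + half_width)

# (iii) sample size needed for an allowable error of 0.2 cm
allowable_error = 0.2
n_exact = (z_95 * s / allowable_error) ** 2
n_required = math.ceil(n_exact)  # always round up

print(f"half-width = {half_width:.2f} cm")
print(f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f}) cm")
print(f"n = {n_exact:.1f}, rounded up to {n_required}")
```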

(b)

Paul is a scientist interested in the learning experience of undergraduate biology students in Ireland. Paul is considering the four sampling strategies described below. Name each of them and comment on the quality of the sample each would produce, and its practical feasibility.

(i)

Recruit 100 undergraduate biology students from UCC at random for the survey. [6 marks]

(ii)

Audit all second-year biology students in two randomly selected Irish universities. [6 marks]

(iii)

Randomly select 10 biology students of each year (from first to fourth year) separately, each time across all Irish universities. [6 marks]

(iv)

Randomly select 16 undergraduate biology courses (over all four undergraduate years) from the pool of all Irish undergraduate programmes, and audit all students in each selected course. [6 marks]

New question on standard (theory) course material. 6 marks for each item: 3 (name) + 3 (comment).
(i) Simple random sampling; will produce a biased sample since only Cork students are included.
(ii) Cluster sampling; will produce a biased sample since only second-year students would be audited.
(iii) Stratified sampling; should produce a reliable sample but requires a labour-intensive process.
(iv) Cluster sampling; should produce a reliable sample and should be less labour-intensive than (iii).


Question 5

(a)

A student is analysing the depths at which significant archaeological discoveries were made for four different excavation sites.

(i)

Suppose the student was analysing all measurements for all four sites together as one sample, without knowledge of which site each measurement was taken from. She was told that the mean typical excavation depth in this area is 70 cm but she believes it may in fact be deeper. State the null and alternative hypotheses for a one-sample, one-sided t-test based on the student’s belief. [6 marks] Standard question on new course material. The null and alternative hypotheses for the 1-sided test are: H0: µ ≤ 70cm versus HA: µ > 70cm [3 for H0 + 3 for HA] 6 marks

(ii)

The student obtained a P-value of 0.144 from the test, and thinks she was right to think the typical excavation depth for a significant discovery was greater than 70 cm. Comment on the student’s inference. [6 marks] Standard question on new course material. The student’s inference is wrong: since p > 0.05, there is no reason to reject H0 based on the data available. 6 marks

(iii)

The student then carried out a one-sample, two-sided t-test on the same data to challenge the result of the first test. State the null and alternative hypotheses for a one-sample, two-sided t-test based on the assumption that the mean typical excavation depth in this area is 70 cm. [6 marks] Standard question on new course material. The null and alternative hypotheses for the 2-sided test are: H0: µ = 70cm versus HA: µ ≠ 70cm 6 marks

(b)

We consider an experiment where genetically engineered mice were given different doses of an experimental treatment and their time to recovery was recorded, in weeks. Time-to-recovery data were obtained for the following four groups: a control group, a group with low dosage, a group with medium dosage, and a group with high dosage. An ANOVA was carried out across the four groups using SPSS®, and produced the following results:

ANOVA
                   Sum of Squares   df   Mean Square       F   Sig.
Between Groups           4051.960    3      1350.653   3.550   .024
Within Groups           12937.434   34       380.513
Total                   16989.395   37

Use the information in the above table to answer the following:

(i)

State the assumptions for the one-way analysis of variance. [9 marks] A one-way ANOVA requires the following assumptions to be satisfied:
1. There are k simple random samples from k populations
2. The k samples are independent of each other (units in one group cannot be related to units in another group)
3. The units are independent within each group
4. The populations are Normally distributed
5. The populations have the same variance
[3 each, for any 3] 9 marks

(ii)

Formulate the null and alternative hypotheses for this analysis of variance. [6 marks] Standard question on new course material. H0: µ1 = µ2 = µ3 = µ4 HA: “At least one of the population mean times to recovery differs from the others” [3+3] 6 marks

(iii)

What is the P-value for the test for equality of means? What is your conclusion? [5 marks] Standard question on new course material. The computed p-value is 0.024. As this p-value is lower than the 5% significance level (P...
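The entries of the ANOVA table are linked by simple identities: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the between-groups to the within-groups mean square. The sketch below is a Python illustration with hypothetical variable names. It recomputes those quantities from the table and compares the reported Sig. value with a 5% significance level; it does not recompute the P-value, which is taken as given from the SPSS output.

```python
# Values copied from the ANOVA table in Question 5(b)
ss_between, df_between = 4051.960, 3
ss_within, df_within = 12937.434, 34
reported_p = 0.024   # "Sig." column, taken as given
alpha = 0.05         # significance level

# Mean square = sum of squares / degrees of freedom
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F statistic = between-groups mean square / within-groups mean square
f_stat = ms_between / ms_within

# Totals should match the "Total" row up to rounding
ss_total = ss_between + ss_within
df_total = df_between + df_within

reject_h0 = reported_p < alpha

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"F = {f_stat:.3f}")
print(f"SS total = {ss_total:.3f}, df total = {df_total}")
print(f"reject H0 at the 5% level: {reject_h0}")
```

With four groups, df between = 4 - 1 = 3, and df total = 37 implies 38 mice in total.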

