Title Docx copy - hw
Author tomielie huge
Course Statistical Methods For Biology
Institution Purdue University

Lichti/STAT 503 – Statistical Methods for Biologists

Modified: 2018-10-29

STAT 503 – Statistical Methods for Biology

Homework 10 (OPTIONAL)

25 points. Due at 11:59 PM on Tuesday, December 11, 2018.

This homework includes material from Tutorial 9. You may use R to answer the questions, but you are not required to do any coding (there is a probability lookup).

Optional practice: Whitlock and Schluter: Chapter 15, questions 1-15 (pp. 486-492), Chapter 16, questions 1-14 (pp. 524-529), Chapter 17, questions 1-18 (pp. 576-582).

1. [2 points] In honor of Finals Week: Caffeine, a defensive alkaloid that occurs in many plant tissues and helps deter herbivory, also occurs at low concentrations in flower nectar, where it appears to improve honey bees' memory of floral rewards and to strengthen their association of rewards with flower scents, improving pollination success (Wright et al. 2013, Science 339:1202-1204). To see whether or not bees preferred caffeinated nectar, Singaravelan et al. (2005, Journal of Chemical Ecology 31:2791-2804) presented bees with feeding stations that offered a choice between a 20% sucrose solution and a mixture of 20% sucrose with caffeine in one of 4 concentrations: 50, 100, 150, or 200 ppm. For each station, they calculated the difference in grams between the amount of the caffeinated sucrose solution that bees consumed and the amount of the pure sucrose solution that they consumed. If bees consumed more of the caffeinated solution at a station, then this response variable was positive. If they consumed less of the caffeinated solution, it was negative. There are 5 data points per treatment. The data appear below:

Caffeine concentration    Difference in consumption (g)
50 ppm                    -0.40    0.34    0.19    0.05   -0.14
100 ppm                    0.01   -0.39   -0.08   -0.09   -0.31
150 ppm                    0.65    0.53    0.39   -0.15    0.46
200 ppm                    0.24    0.44    0.13    1.03    0.05

a. Calculate the sample means and standard deviation for each group. Let group 1 be 50 ppm, group 2 be 100 ppm, group 3 be 150 ppm, and group 4 be 200 ppm.

The sample mean for group $j$ is $\bar{x}_j = \sum_i x_{ij} / n_j$:

\[ \bar{x}_1 = \frac{-0.40 + 0.34 + 0.19 + 0.05 - 0.14}{5} = \frac{0.04}{5} = 0.008 \]

\[ \bar{x}_2 = \frac{0.01 - 0.39 - 0.08 - 0.09 - 0.31}{5} = \frac{-0.86}{5} = -0.172 \]

\[ \bar{x}_3 = \frac{0.65 + 0.53 + 0.39 - 0.15 + 0.46}{5} = \frac{1.88}{5} = 0.376 \]

\[ \bar{x}_4 = \frac{0.24 + 0.44 + 0.13 + 1.03 + 0.05}{5} = \frac{1.89}{5} = 0.378 \]

The sample standard deviation for group $j$ is

\[ s_j = \left[ \frac{1}{n_j - 1} \sum_i (x_{ij} - \bar{x}_j)^2 \right]^{1/2} : \]

\[ s_1 = \left( \frac{(-0.40 - 0.008)^2 + (0.34 - 0.008)^2 + (0.19 - 0.008)^2 + (0.05 - 0.008)^2 + (-0.14 - 0.008)^2}{4} \right)^{0.5} = \left( \frac{0.333}{4} \right)^{0.5} = 0.289 \]

\[ s_2 = \left( \frac{(0.01 + 0.172)^2 + (-0.39 + 0.172)^2 + (-0.08 + 0.172)^2 + (-0.09 + 0.172)^2 + (-0.31 + 0.172)^2}{4} \right)^{0.5} = \left( \frac{0.115}{4} \right)^{0.5} = 0.169 \]

\[ s_3 = \left( \frac{(0.65 - 0.376)^2 + (0.53 - 0.376)^2 + (0.39 - 0.376)^2 + (-0.15 - 0.376)^2 + (0.46 - 0.376)^2}{4} \right)^{0.5} = \left( \frac{0.383}{4} \right)^{0.5} = 0.309 \]

\[ s_4 = \left( \frac{(0.24 - 0.378)^2 + (0.44 - 0.378)^2 + (0.13 - 0.378)^2 + (1.03 - 0.378)^2 + (0.05 - 0.378)^2}{4} \right)^{0.5} = \left( \frac{0.617}{4} \right)^{0.5} = 0.393 \]
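The hand calculations in part (a) can be cross-checked in R. The following is a minimal sketch, not part of the original key; the object names `ppm`, `delta`, `group_means`, and `group_sds` are chosen here for illustration.

```r
# Consumption differences (g) from the data table, five stations per treatment.
ppm   <- factor(rep(c(50, 100, 150, 200), each = 5))
delta <- c(-0.40,  0.34,  0.19,  0.05, -0.14,   # 50 ppm
            0.01, -0.39, -0.08, -0.09, -0.31,   # 100 ppm
            0.65,  0.53,  0.39, -0.15,  0.46,   # 150 ppm
            0.24,  0.44,  0.13,  1.03,  0.05)   # 200 ppm

# Sample mean and sample standard deviation (n - 1 denominator) per group.
group_means <- tapply(delta, ppm, mean)
group_sds   <- tapply(delta, ppm, sd)

print(round(cbind(mean = group_means, sd = group_sds), 3))
```

The printed values should agree with the rounded results worked out by hand above.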

b. We would like to analyze these data using analysis of variance. Based on the design of the study, would this be an appropriate analysis? Why or why not?

Yes, ANOVA would be appropriate. The explanatory variable is caffeine concentration, which is quantitative. However, it has been applied in a small set of discrete treatment levels (50, 100, 150, and 200 ppm). As a result, we have four clearly defined groups or populations of stations that we are comparing. In addition, the response variable is a continuous number, and since it is a difference between two random variables, we have a reasonable theoretical expectation that it will be approximately normal in its distribution. As long as the assumptions are met, ANOVA is the appropriate test to use to compare the mean value of a continuous response among a set of three or more distinct groups.

c. Please state the null and alternative hypothesis for the ANOVA.

$H_0$: The honey bees' average relative use of caffeinated versus uncaffeinated sucrose solution does not differ among the caffeine concentration treatments ($\mu_1 = \mu_2 = \mu_3 = \mu_4$, where $\mu_j$ is the mean difference, caffeinated minus uncaffeinated, in the amount of solution consumed per station in treatment $j$).

$H_a$: The honey bees' average relative use of caffeinated versus uncaffeinated sucrose solution differs among the caffeine concentration treatments (at least one $\mu_j$ differs from the others).

d. Please state the assumptions of the ANOVA.

1. The data represent a random sample from each population or group.
2. The sample units are independent of each other.
3. The response variable is normally distributed within each group.
4. The response variable has the same standard deviation in each group.

Assumptions 2-4 can be combined as: The response variable can be described as the sum of a group mean plus a random error term, where the error is i.i.d. normal with mean 0 and standard deviation $\sigma$ that does not depend on the group.

Mathematically, this combined assumption can be stated as

\[ y_{ij} = \mu_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma) \]

for individuals $i = 1, \ldots, n_j$ in groups $j = 1, \ldots, k$.

e. Do the following diagnostic plots give you any reason to be concerned regarding the assumptions?

[Figure: normal quantile plot of the residuals (sample vs. theoretical quantiles) and histogram of the residuals (count vs. resid(cbm))]

Not particularly. There appears to be some small asymmetry in the distribution of the residuals. However, it is not severe, and the sample size (20) is moderately large, so the central limit theorem should help ensure an approximately normal sampling distribution.
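Plots of this kind can be produced from the fitted model's residuals. The sketch below uses base R graphics and assumes the model was fit with `aov()`; the original call is not shown in this key, so the formula, the data vectors, and the helper name `plot_diagnostics` are illustrative. Only the object name `cbm` comes from the plot axis label.

```r
# Consumption differences (g) from the data table, five stations per treatment.
ppm   <- factor(rep(c(50, 100, 150, 200), each = 5))
delta <- c(-0.40,  0.34,  0.19,  0.05, -0.14,
            0.01, -0.39, -0.08, -0.09, -0.31,
            0.65,  0.53,  0.39, -0.15,  0.46,
            0.24,  0.44,  0.13,  1.03,  0.05)

# One-way ANOVA fit (assumed form of the model behind resid(cbm)).
cbm <- aov(delta ~ ppm)
res <- resid(cbm)

# Hypothetical helper: normal Q-Q plot and histogram of the residuals, side by side.
plot_diagnostics <- function(res) {
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))            # restore the previous graphics layout
  qqnorm(res)
  qqline(res)
  hist(res, main = "Histogram of residuals", xlab = "resid(cbm)")
}
```

Calling `plot_diagnostics(res)` draws both panels on the active graphics device.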

f. Are there any additional checks of assumptions that you should conduct prior to running the analysis? If not, say "No." If yes, please conduct them and state whether you are satisfied that ANOVA is reasonable given your results.


Yes. We need to make sure that the variances are equal. The easiest way to do this is to check the rule of thumb that $s_{max}/s_{min} \le 2$. From part (a), we have 0.393/0.169 = 2.32. This is higher than we would like to see, but below the "serious problem" threshold of 3. By using ANOVA with this data set, we are operating in a gray area. It would be wise to confirm our results using permutation or the Kruskal-Wallis test.

For reference, the Kruskal-Wallis test does agree with the results of the ANOVA $F$-test ($\chi^2_3 = 8.764$, $P = 0.0326$).
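Both checks can be reproduced in R. This is a sketch under the assumption that the data are entered as below; `sd_ratio` and `kw` are names introduced here, and `kruskal.test()` is the base R (stats package) implementation of the Kruskal-Wallis test.

```r
# Consumption differences (g) from the data table, five stations per treatment.
ppm   <- factor(rep(c(50, 100, 150, 200), each = 5))
delta <- c(-0.40,  0.34,  0.19,  0.05, -0.14,
            0.01, -0.39, -0.08, -0.09, -0.31,
            0.65,  0.53,  0.39, -0.15,  0.46,
            0.24,  0.44,  0.13,  1.03,  0.05)

# Rule of thumb: ratio of the largest to the smallest group standard deviation.
group_sds <- tapply(delta, ppm, sd)
sd_ratio  <- max(group_sds) / min(group_sds)
print(round(sd_ratio, 2))

# Rank-based alternative to the one-way ANOVA F-test.
kw <- kruskal.test(delta ~ ppm)
print(kw)
```

The ratio falls between the thresholds of 2 and 3 discussed above, and the Kruskal-Wallis statistic and P-value should match the values quoted for reference.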

g. Regardless of your answers in (e) and (f), please calculate the ANOVA table. You can do this by hand if you would like the practice, or in R. Use the pf() function in R to obtain a P-value.

Work for each answer is shown below. Here's a clean version of the table:

Source      df    SS       MS       F        P
Group (1)    3    1.134    0.378    4.154    0.0235
Error (2)   16    1.448    0.091
Total       19    2.582

(1) Depending on who you ask, this may also be called Factor, Model, Among, or Between.
(2) R refers to this as Residuals.

Degrees of freedom (df):

In a one-way ANOVA, the group degrees of freedom is always one less than the number of groups: $df_G = k - 1 = 4 - 1 = 3$.

The error degrees of freedom are equal to the total sample size minus the number of other parameter estimates that must be calculated to estimate the errors. In a one-way ANOVA, that is always $k$ group means, so $df_E = n - k = 20 - 4 = 16$.

The total degrees of freedom is the degrees of freedom for the overall sample variance of the response variable, $n - 1$. Thus, $df_T = n - 1 = 20 - 1 = 19$.

Sums-of-squares:

The overall mean of the response is

\[ \bar{y} = \frac{0.008 - 0.172 + 0.376 + 0.378}{4} = 0.1475 . \]

See the discussion at the end of this answer key if you are unsure why $\bar{y}$ was calculated this way. Given the grand mean, the group sum of squares is:

\[ SS_G = \sum_{j=1}^{k} n_j (\bar{y}_j - \bar{y})^2 \]

\[ = 5(0.008 - 0.1475)^2 + 5(-0.172 - 0.1475)^2 + 5(0.376 - 0.1475)^2 + 5(0.378 - 0.1475)^2 = 1.134 \]

where $\bar{y}_j$ is the sample mean of group $j$ from part (a).

From the work done in part (a), we have the sum of squared errors for each group ($SSE_j$) already calculated. The full sum of squared errors is the sum over the groups:

\[ SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2 = \sum_{j=1}^{k} SSE_j = 0.333 + 0.115 + 0.383 + 0.617 = 1.448 \]

The total sum-of-squares is equal to the sum of $SS_G$ and $SSE$, so

\[ SS_T = SS_G + SSE = 1.134 + 1.448 = 2.582 \]

Mean-squares: In general,

$MS = SS / df$. Therefore,

\[ MS_G = \frac{SS_G}{df_G} = \frac{1.134}{3} = 0.378 \]

and

\[ MSE = \frac{SSE}{df_E} = \frac{1.448}{16} = 0.091 \]

F:

\[ F = \frac{MS_G}{MSE} = \frac{0.378}{0.091} = 4.1538 \]

Due to rounding error, my $F$ statistic is slightly off from the one reported by R (= 4.178).

P-value: From R,

    > pf(4.1538, 3, 16, lower.tail=FALSE)
    [1] 0.02351997
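The whole table can also be built in R without intermediate rounding, which shows where the small discrepancy in $F$ comes from. The following is a sketch, with variable names chosen here for illustration; the cross-check at the end assumes the model is fit with `aov()`.

```r
# Consumption differences (g) from the data table, five stations per treatment.
ppm   <- factor(rep(c(50, 100, 150, 200), each = 5))
delta <- c(-0.40,  0.34,  0.19,  0.05, -0.14,
            0.01, -0.39, -0.08, -0.09, -0.31,
            0.65,  0.53,  0.39, -0.15,  0.46,
            0.24,  0.44,  0.13,  1.03,  0.05)

k <- nlevels(ppm)     # number of groups
n <- length(delta)    # total sample size

grand_mean  <- mean(delta)
group_means <- tapply(delta, ppm, mean)
n_j         <- tapply(delta, ppm, length)

# Sums of squares: among groups, within groups (error), and total.
ss_group <- sum(n_j * (group_means - grand_mean)^2)
ss_error <- sum((delta - ave(delta, ppm))^2)   # ave() returns each observation's group mean
ss_total <- ss_group + ss_error

# Mean squares, F statistic, and upper-tail P-value.
ms_group <- ss_group / (k - 1)
ms_error <- ss_error / (n - k)
f_stat   <- ms_group / ms_error
p_value  <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)

print(round(c(SSG = ss_group, SSE = ss_error, SST = ss_total,
              MSG = ms_group, MSE = ms_error, F = f_stat, P = p_value), 4))

# Cross-check against R's own ANOVA table.
f_aov <- summary(aov(delta ~ ppm))[[1]][["F value"]][1]
```

Because the mean squares are not rounded before dividing, `f_stat` matches the value R reports rather than the hand-calculated 4.1538.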

h. Based on your results, do you reject or not reject the null hypothesis?

Based on these results, the data are moderately inconsistent with the null hypothesis. Using $\alpha = 0.05$, I would reject the null hypothesis and conclude that in at least one of these groups, the bees' average preference for caffeinated sugar water over uncaffeinated sugar water was different than in the other treatments.


Please refer to the R output below for the next questions

    > cb % TukeyHSD()
      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = cbm)

    $ppmCaffeine
              diff          lwr        upr      p adj
    100-50  -0.180 -0.724376219  0.3643762  0.7809455
    150-50   0.368 -0.176376219  0.9123762  0.2534122
    200-50   0.370 -0.174376219  0.9143762  0.2493712
    150-100  0.548  0.003623781  1.0923762  0.0482052
    200-100  0.550  0.005623781  1.0943762  0.0472406
    200-150  0.002 -0.542376219  0.5463762  0.9999996
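The commands that produced this output are only partly legible here. A minimal sketch that should yield the same Tukey comparisons, assuming a one-way `aov()` fit on the data from question 1, is given below; the names `ppm`, `delta`, `fit`, and `tk` are illustrative and not taken from the original script.

```r
# Consumption differences (g) from the data table, five stations per treatment.
ppm   <- factor(rep(c(50, 100, 150, 200), each = 5))
delta <- c(-0.40,  0.34,  0.19,  0.05, -0.14,
            0.01, -0.39, -0.08, -0.09, -0.31,
            0.65,  0.53,  0.39, -0.15,  0.46,
            0.24,  0.44,  0.13,  1.03,  0.05)

# One-way ANOVA followed by Tukey's honest significant differences.
fit <- aov(delta ~ ppm)
tk  <- TukeyHSD(fit, conf.level = 0.95)$ppm   # matrix: diff, lwr, upr, p adj

print(tk)
```

Rows are labelled by the pair of levels being compared (for example `150-100`), so individual comparisons can be extracted with `tk["150-100", ]`.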

i. If you were to conduct a planned comparison of the 50 ppm treatment and the 100 ppm treatment, what would be the point estimate and confidence interval for the difference in the group means?

The point estimate and confidence interval for a planned contrast are given by:

\[ (\bar{y}_b - \bar{y}_a) \pm t_{1-\alpha/2,\, df_E} \sqrt{MSE \left( \frac{1}{n_a} + \frac{1}{n_b} \right)} = -0.180 \pm t_{0.975,16}\,(0.1903) \]

where the values $-0.180$ for the point estimate and $0.1903$ for the standard error come from line 2 of the Coefficients summary. From R, I get $t_{0.975,16} = 2.1199$; therefore, the 95% confidence interval works out to:

\[ -0.180 \pm (2.1199)(0.1903) \rightarrow (-0.5834,\ 0.2234) \]

So, at a confidence level of 0.95, we estimate that the difference in mean relative consumption of caffeinated sugar water between the 50 ppm treatment and the 100 ppm treatment is -0.180 (-0.583, 0.223) grams.

If you did this by hand using $MSE = 0.091$, you would get a slightly different answer, with a confidence interval of (-0.584, 0.224), due to the rounding error.
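The interval can be recomputed from the raw data, which avoids the rounding in $MSE$. This is a sketch only; the variable names are introduced here, and the model is assumed to be the one-way `aov()` fit used for the ANOVA table.

```r
# Consumption differences (g) from the data table, five stations per treatment.
ppm   <- factor(rep(c(50, 100, 150, 200), each = 5))
delta <- c(-0.40,  0.34,  0.19,  0.05, -0.14,
            0.01, -0.39, -0.08, -0.09, -0.31,
            0.65,  0.53,  0.39, -0.15,  0.46,
            0.24,  0.44,  0.13,  1.03,  0.05)

fit  <- aov(delta ~ ppm)
df_e <- df.residual(fit)              # error degrees of freedom (16)
mse  <- sum(resid(fit)^2) / df_e      # unrounded mean squared error

# Planned contrast: 100 ppm mean minus 50 ppm mean.
n_a <- sum(ppm == "50")
n_b <- sum(ppm == "100")
est <- mean(delta[ppm == "100"]) - mean(delta[ppm == "50"])
se  <- sqrt(mse * (1 / n_a + 1 / n_b))

t_crit <- qt(0.975, df_e)             # two-sided 95% critical value
ci     <- est + c(-1, 1) * t_crit * se

print(round(c(estimate = est, se = se, lower = ci[1], upper = ci[2]), 4))
```

Unlike the Tukey interval for the same pair, this interval uses the ordinary $t$ critical value because the comparison was planned in advance, so it is narrower.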

j. None of the $t$-test $P$-values in the coefficients table appear to be significant. Why does this result not contradict the result of the $F$-test you did in (g)?

The coefficients reported by R represent the mean for the 50 ppm group and pairwise contrasts between the 50 ppm group and each of the 100, 150, and 200 ppm groups. None of the $t$-tests on these contrasts indicate a significant difference from zero, so we cannot conclude that any of the latter three groups differs significantly from the 50 ppm group. However, the $F$-test looks for any differences among the group means, not just differences from the reference group whose mean is described by the intercept term in the model. The results for the Tukey-Kramer pairwise comparisons show that the 150 and 200 ppm groups both differ significantly (at $\alpha = 0.05$) from the 100 ppm group.
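The role of the reference group can be illustrated by refitting the model with a different baseline level. The sketch below is not from the original key: it assumes the coefficients came from a linear model with 50 ppm as the reference level, and it uses `relevel()` to make 100 ppm the reference instead. The resulting $t$-tests are unadjusted for multiple comparisons, so they are shown only to make the point about reference levels, not as a substitute for the Tukey-Kramer results.

```r
# Consumption differences (g) from the data table, five stations per treatment.
ppm   <- factor(rep(c(50, 100, 150, 200), each = 5))
delta <- c(-0.40,  0.34,  0.19,  0.05, -0.14,
            0.01, -0.39, -0.08, -0.09, -0.31,
            0.65,  0.53,  0.39, -0.15,  0.46,
            0.24,  0.44,  0.13,  1.03,  0.05)

# Default coding: the intercept is the 50 ppm mean; other rows are contrasts with 50 ppm.
fit_ref50 <- lm(delta ~ ppm)
p_ref50   <- summary(fit_ref50)$coefficients[, "Pr(>|t|)"]

# Same model, but with 100 ppm as the reference level.
ppm100     <- relevel(ppm, ref = "100")
fit_ref100 <- lm(delta ~ ppm100)
p_ref100   <- summary(fit_ref100)$coefficients[, "Pr(>|t|)"]

print(round(p_ref50, 4))
print(round(p_ref100, 4))
```

Both fits have identical fitted values and the same overall $F$ statistic; only the set of contrasts that the coefficient table happens to display changes.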

k. In the table below, please list the four treatment groups in increasing order of their value for the response variable (i.e., the treatment with the lowest mean response value first, and so on), and use letter codes to indicate which groups are significantly different from each other. Remember that sets of groups that are not different should share a letter. Groups that fall between two sets and overlap both should have two letters.

Treatment    Letter code
100 ppm      a
50 ppm       ab
150 ppm      b
200 ppm      b


2. The tables and graphs below show results and diagnostic plots from a simple linear regression examining the relationship between an index of body leanness (m²/kg) and the rate of heat loss (°C/min) among 12 boys who spent 40 minutes swimming in water at 20.3 °C (68.54 °F) (data from Sloan and Keatinge 1973, Journal of Applied Physiology 35:371-375).

Table 1: Linear regression results for the effect of leanness on the rate of heat loss.

Source            df    SS        MS         F
Leanness index     1    0.0148    0.01481    68.729
Residuals         10    0.0022    0.00022

                  Estimate    SE        t         P
Intercept         -0.0269     0.0100    -2.687    0.0228...
Leanness index     0.0190     0.0023     8.290
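The reported values in Table 1 can be checked for internal consistency. In a simple linear regression, the $F$ statistic equals the square of the slope's $t$ statistic, and each coefficient's $P$-value is a two-sided tail probability from a $t$ distribution with the residual degrees of freedom. The sketch below uses only numbers printed in the table; the variable names are introduced here.

```r
# Values as reported in Table 1.
df_resid    <- 10
f_reported  <- 68.729
t_slope     <- 8.290
t_intercept <- -2.687

# With one predictor, F should equal the squared slope t statistic (up to rounding).
t_slope_squared <- t_slope^2
print(c(t_squared = t_slope_squared, F = f_reported))

# Two-sided P-value for the intercept's t statistic on 10 residual df.
p_intercept <- 2 * pt(-abs(t_intercept), df = df_resid)
print(round(p_intercept, 4))
```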

