Biostatistics - Lecture notes - 22 - 28
Course: Biostatistics, University of Calgary


Power of ANOVA & Assumptions

We have now established how ANOVA determines whether means differ significantly. Now we are going to discuss some experimental design factors that affect the power of ANOVA, and the assumptions of ANOVA.

Objectives: By the end of this lecture you will be able to:
1. List factors that affect the power of ANOVA, and describe how to increase the power of ANOVA.
2. List the assumptions of ANOVA, and appropriately test each assumption.
3. Describe how you would deal with an outlier in your own dataset.

• Earlier, I said ANOVA is a method for comparing the means of two or more groups, while a two-sample t-test is a method for comparing the means of two groups.
• Both ANOVA and a two-sample t-test can be used when you have two groups. BUT WHICH METHOD SHOULD YOU USE? You can prove that mathematically they are the same when either test is applied to 2 means. Work through the example in the supplementary materials to prove this to yourself.
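The equivalence can also be checked numerically: for two groups, the ANOVA F-statistic equals the squared pooled t-statistic, with an identical p-value. A minimal sketch using scipy (the group data are made up for illustration):

```python
# For two groups, a one-way ANOVA and a pooled two-sample t-test are
# mathematically equivalent: F = t^2, with identical p-values.
from scipy import stats

group1 = [5.1, 4.8, 5.6, 5.0, 4.9]
group2 = [6.2, 5.9, 6.5, 6.1, 5.8]

t, p_t = stats.ttest_ind(group1, group2)   # pooled-variance t-test
F, p_F = stats.f_oneway(group1, group2)    # one-way ANOVA on the same data

print(abs(t**2 - F) < 1e-8)    # True: F equals t squared
print(abs(p_t - p_F) < 1e-10)  # True: identical p-values
```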


I. ANOVA POWER

Recall: POWER ≡

TOP HAT MONOCLE
When H0 true:

When H0 false:

→ Power of an ANOVA depends on several factors, some of which are the same as for a 2-sample t-test, some of which are new:

F_s = MSB / MSW = (SS_B / df_B) / (SS_W / df_W)
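The pieces of this formula can be computed directly from raw data; a minimal sketch with three made-up treatment groups:

```python
# Assembling F_s = MSB/MSW from sums of squares (made-up data, 3 groups).
import numpy as np

groups = [np.array([4.0, 5.0, 6.0]),
          np.array([6.0, 7.0, 8.0]),
          np.array([8.0, 9.0, 10.0])]
k = len(groups)
N = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

SSB = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)  # between-group SS
SSW = sum(((g - g.mean()) ** 2).sum() for g in groups)       # within-group SS
MSB = SSB / (k - 1)   # df_B = k - 1
MSW = SSW / (N - k)   # df_W = N - k
Fs = MSB / MSW
print(Fs)   # 12.0 for these data
```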

• Power:
  o EXAMPLE: testing the effect of Nutrient on algae abundance, and the use of EXTREME treatments.


• Power:

But these factors are generally not under your control as an experimenter; they're properties of the populations you're studying. However, two other factors that affect power are under your control:

• Power: df_between equals k-1 (number of treatment groups minus one).

• Power:

Example: Which is the better experimental design? TOP HAT MONOCLE
a) Five treatment levels with 4 replicates in each treatment.
b) Four treatment levels with 5 replicates in each treatment.
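One way to compare the two designs is through the degrees of freedom they give the F-test; a sketch (both designs use N = 20 units) computing the df and the resulting critical F values with scipy:

```python
# Degrees of freedom and critical F for the two candidate designs above.
from scipy.stats import f

def anova_dfs(k, n):
    """Return (df_between, df_within) for k groups of n replicates each."""
    return k - 1, k * n - k

df_a = anova_dfs(5, 4)   # design a) 5 treatments x 4 reps
df_b = anova_dfs(4, 5)   # design b) 4 treatments x 5 reps

crit_a = f.ppf(0.95, *df_a)   # critical F at alpha = 0.05
crit_b = f.ppf(0.95, *df_b)
print(df_a, round(crit_a, 3))
print(df_b, round(crit_b, 3))
```

Note that the critical value alone does not settle which design is more powerful; power also depends on the spread of the treatment means and on MSW.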


How else could we increase the performance of the test? Think about the birds…

II. ANOVA assumptions

Next I'd like to return to the assumptions of ANOVA. As a reminder, in ANOVA, and in a two-sample t-test (which of course is equivalent to an ANOVA on two groups), the general linear model for the observations is:

Y_ij = μ_++ + (μ_i+ - μ_++) + e_ij

and the H0 we wish to test is:

H0: μ_1+ = μ_2+ = … = μ_k+ = μ_++

We test this null hypothesis using:
• A one-tailed F-test comparing the variance among groups to the variance within groups.

→ In order to determine the expected value of F under the null hypothesis, we have to make some assumptions about the nature of sampling error, because if the null hypothesis is true (μ_i+ = μ_++), then any differences among our sample means are just due to sampling error.


So we have to make some assumptions about the e_ij values:
1.
2.
3.

III. HOW DO YOU CHECK THESE ASSUMPTIONS, AND WHAT DO YOU DO IF THEY ARE VIOLATED?

FIRST - WHY RESIDUALS?
→ With ANOVA we are sampling from multiple treatment populations, so we assume that the deviations from group means (the residuals, ε_ij) have a normal distribution; we can then do one test for all of the data at once. By the central limit theorem, if the data are normally distributed, then the means will be normally distributed, even with small sample sizes.
• Recall: The residuals are the deviations of individual observations from group means, and we want to examine them → they should have the same properties as the data, but we can examine the residuals without the treatment effect. Look at all treatments pooled together.


1. Normality of residuals

To check for normally distributed ε_ij:

[Q-Q plot: estimated Z vs. observed values]

→
→
→
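In software, the Q-Q plot coordinates and a formal normality test take only a few lines; a sketch using scipy, where `residuals` is a made-up stand-in for your observation-minus-group-mean values:

```python
# Checking residual normality: Shapiro-Wilk test plus Q-Q plot coordinates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(loc=0.0, scale=1.0, size=30)  # stand-in residuals

# Shapiro-Wilk: H0 = residuals come from a normal distribution
w, p = stats.shapiro(residuals)
print("Shapiro-Wilk p =", round(p, 3))  # large p: no evidence against normality

# Q-Q plot coordinates: plot osm (theoretical quantiles) vs osr (ordered data);
# a straight line supports the normality assumption
(osm, osr), (slope, intercept, r) = stats.probplot(residuals)
```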

What if the normality assumption is violated?


WHAT ABOUT OUTLIERS? WHAT ARE THEY?

• outlier ≡

Because outliers are very different from all other observations in a group, they have a disproportionately large influence on both the group mean and the error variation within the group, and so could have a disproportionately large influence on the outcome of the analysis. It's always a cause for concern when the entire outcome of an analysis is strongly influenced by any one data point.

OUTLIER EXAMPLE:
Data: 3, 2, 4, 3, 4, 2, 5, AND 20
Mean without outlier: 23/7 = 3.29; variance: 1.24
Mean with outlier: 43/8 = 5.38; variance: 35.98
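The arithmetic above can be verified directly with Python's statistics module:

```python
# Reproducing the outlier example: one extreme point shifts the mean
# and inflates the variance dramatically.
import statistics

data = [3, 2, 4, 3, 4, 2, 5]
with_outlier = data + [20]

print(round(statistics.mean(data), 2))              # 3.29
print(round(statistics.mean(with_outlier), 2))      # 5.38
print(round(statistics.variance(data), 2))          # 1.24
print(round(statistics.variance(with_outlier), 2))  # 35.98
```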


So what do you do if you think you have an outlier? DISCUSSION

2. Homogeneity of Variance ≡

ANOVA is also fairly robust to heteroscedasticity, as long as the number of observations per group is about equal. There are TWO WAYS TO TEST FOR HOMOSCEDASTICITY:


1. Probably the simplest is to PLOT YOUR RESIDUALS for each treatment.

• In this example, treatment i = 2 has a greater variance than the other treatments (i.e., a wider spread of residuals). SO, PLOTTING YOUR RESIDUALS IS ONE WAY TO TEST FOR HOMOSCEDASTICITY. That's an informal approach.

2. The FORMAL APPROACH is to use a statistical test called Bartlett's Test. This is just a test of the ASSUMPTIONS of ANOVA, not the ANOVA itself! Now that we have more than two samples, we need a test that can determine whether variances from MORE than two samples differ from each other. This test asks whether the k sample variances estimate the same population variance.

H0: σ_1² = σ_2² = … = σ_k²
Ha: the k population variances are not all the same


• pooled estimate of within-group variance
• non-pooled estimates of within-group variances
• sampling error (many obs./group, few groups = low error)

If H0 is true, the numerator and denominator differ only due to sampling error. If false, they differ more than that.
• Compare the test statistic to a chi-squared distribution with k-1 df
• One-tailed test
• A sufficiently large value of the test statistic (expected 5% of the time or less if H0 is true) → reject H0
• Caution: Bartlett's test assumes all observations were randomly sampled from normal distributions, and is sensitive to violations of normality
• Other tests are robust to non-normality, but less powerful than Bartlett's test when normality holds. We won't cover these.
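In practice, Bartlett's test is a one-liner; a sketch using scipy with three made-up groups of (deliberately) equal variance:

```python
# Bartlett's test of homoscedasticity across k groups.
# These made-up groups have equal variances, so we expect a large p-value.
from scipy import stats

g1 = [4.1, 4.5, 4.2, 4.4, 4.3]
g2 = [5.0, 5.2, 4.9, 5.1, 5.3]
g3 = [6.1, 5.9, 6.2, 6.0, 5.8]

stat, p = stats.bartlett(g1, g2, g3)  # chi-squared statistic with k-1 = 2 df
if p < 0.05:
    print("Reject H0: the variances differ")
else:
    print("Fail to reject H0: no evidence the variances differ")
```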

3. Independence of error terms


EXAMPLES:
• Height:

• Problems can also arise with temporal & spatial data collection patterns.

• RANDOM SAMPLING:

We are not going to discuss remedies for non-independent data in this class. BUT, what do you do if your data are non-normal and/or heteroscedastic? Two options:
1.


2.

Summary: Next class we will talk about what to do given violations of these assumptions. So what are some things you should consider when designing an ANOVA experiment?


Tukey's HSD test

Last class we started talking about the assumptions of ANOVA:
• ε_ij normally distributed
• ε_ij homoscedastic (have the same variance for each treatment group)
• ε_ij independent, meaning the value of any particular e_ij does not depend on the value of any other e_ij (the e_ij should be random)

How do we tell WHICH means differ if we get a significant ANOVA?

Objectives: By the end of this lecture you will be able to:
1. Conduct a post-hoc Tukey's test to determine where the differences in the means of an ANOVA lie.
2. Correctly interpret the output of a Tukey's test.
3. Explain how Tukey's test works and how the q-distribution is generated.
4. Use lettering notation to distinguish which groups differ from each other.

I. Post-hoc multiple comparisons

So, let's say you do an ANOVA and you get a significant F-test. You reject H0, and your statistical conclusion is that not all the group means are equal. That conclusion is fairly vague: it doesn't tell you which means differ from which others.

• e.g., Consider again the feeding rates of 3 size classes of finches; both of the following datasets would result in a significant F-test:

Dataset #1


Dataset #2

• First dataset:
• Second dataset:

The test we will go over next:
•
•

There are many ways to follow up a significant F-test in an ANOVA to figure out exactly why H0 was rejected, i.e., to determine which means differ from which others, while still controlling your experimentwise error rate.
→ All of these methods are collectively termed:

In this course you will learn one multiple comparison method, probably the most popular one:

A. TUKEY'S Test


• Uses the studentized range distribution of q, where q is our test statistic.
• You can think of the studentized range distribution of q as a corrected form of the t-distribution, corrected for the fact that we're conducting multiple comparisons of pairs of means, not just one. This correction ensures that the experimentwise type I error rate for the entire set of comparisons will be held at our chosen level of 0.05.

Here's how Tukey's test works. Consider our finch experiment, which compares the mean feeding rates of 3 groups of finches. If we repeated that experiment many times, each time we'd expect to get slightly different results, just because we randomly sampled different finches. So each time we did the experiment, we'd expect the sample means for each of our 3 groups of finches to change due to chance. You can therefore think of each sample mean as an independent observation drawn from a population of means. By the central limit theorem, each population of sample means should be normally distributed.

The null hypothesis of our ANOVA is that the true means of all the populations from which our sample means were drawn are identical: μ_1+ = μ_2+ = μ_3+ = μ_++. We will use this to generate our expectations for whether the differences between our sample means are due to random sampling or are statistically significant.

a) The q-distribution generator


Now, if we're doing a Tukey's test, we've already rejected the H0 that all of the means are equal. We now want to use Tukey's test to infer which of these population means differ from which others.

b) Comparing means using the q distribution

*** We are asking:

So let's say you're comparing the sample means of groups A and B. The H0 for Tukey's test is that the population means for these two groups are equal: μ_A = μ_B. The alternative is that they're unequal. To compare these two means, we calculate the following sample statistic, called THE STUDENTIZED RANGE:

q_s = (Ȳ_A - Ȳ_B) / SE

where mean A > mean B, and SE is the standard error of the sample mean.

Notice that this looks sort of like the t-statistic for a two-sample t-test.


• In fact, the only difference is that the SE in the denominator is the SE of the mean, not the SE of the difference between two means.

** The really important difference between Tukey's test and a 2-sample t-test is NOT the difference in how q_s and t_s are calculated. THE IMPORTANT DIFFERENCE IS IN THE DISTRIBUTIONS TO WHICH THE SAMPLE STATISTIC IS COMPARED.

For a Tukey test, in order to decide whether to reject the null hypothesis that population means A and B are equal, we compare our sample statistic q_s to the critical value of the studentized range distribution. The way you get the studentized range distribution is to ask: if I drew k sample means, not just two, from a single normally-distributed population of means, then calculated the range of those k means (largest minus smallest), and then "studentized" the range by dividing by the SE, what would the probability distribution of the studentized range of the means be?

• The critical value of q is the value that demarcates the upper 5% of this distribution. If your sample statistic q_s exceeds the critical value, that implies there would be less than a 5% chance that the largest and smallest means in your entire sample of k means would differ as much as the two means you're comparing if H0 were true. Any two means that differ by more than this critical value of q are therefore significantly different at α_e = 0.05. It's by considering the expected range of a sample of k means under H0 that Tukey's test controls the experiment-wise type I error rate.

The standard deviation of the population of means can be estimated by the standard error of a sample from the population:

SE = √(s² / n)

where s² is the sample variance and n is the number of observations in the sample.


For an analysis of variance, MSW is an estimator of s2. So, if the number of observations in each group in our ANOVA is equal to n, we have:

SE = √(MSW / n)
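The thought experiment behind the q-distribution (draw k sample means from one population under H0, studentize the range) can be simulated directly; a Monte Carlo sketch with assumed parameters k = 3 and n = 5, checked against scipy's studentized range quantile:

```python
# Monte Carlo sketch of how the studentized range distribution arises:
# repeatedly draw k sample means from ONE normal population (H0 true),
# studentize the range, and look at the upper 5% point.
import numpy as np
from scipy.stats import studentized_range

rng = np.random.default_rng(0)
k, n = 3, 5          # assumed: 3 groups, 5 observations each
reps = 20000

q_vals = []
for _ in range(reps):
    groups = rng.normal(0.0, 1.0, size=(k, n))  # all from the same population
    means = groups.mean(axis=1)
    s2 = groups.var(axis=1, ddof=1).mean()      # pooled within-group variance
    se = np.sqrt(s2 / n)                        # SE of a group mean
    q_vals.append((means.max() - means.min()) / se)

q_sim = float(np.quantile(q_vals, 0.95))
q_table = float(studentized_range.ppf(0.95, k, k * (n - 1)))  # df_W = N - k
print(round(q_sim, 2), round(q_table, 2))  # the two should be close
```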

Of course, we don't always have equal numbers of observations in each group. Technically, Tukey's test assumes that the number of observations per group is equal. A statistician named Kramer suggested a modified version of the standard error:

SE = √( (MSW / 2) · (1/n_A + 1/n_B) )

(divide by 2, because 'n' above is for EACH group)

DEGREES OF FREEDOM IN TUKEY-KRAMER TEST

The degrees of freedom for this test are:
• "k",
• "v",
• The df are the same for all of the comparisons you make, even if the sample size n_i varies among groups.


Look at the table: For example:

If you inspect the statistical table of critical values of q, you'll find that the critical value of q increases with k. Why is this?
•

Let's walk through our finch feeding rate data.

Recall: H0: μ_1+ = μ_2+ = μ_3+ = μ_++ ; this was rejected in the ANOVA. Now do multiple comparisons.

v = df_W = 7 (n1 = 3, n2 = 3, n3 = 4; N - k = 10 - 3); and k = 3 means.
From the table, q_0.05,7,3 = 4.165; if q_s is greater than this, conclude that the means being compared are significantly different.
MSW = 0.214

For comparisons of small and medium finches with large finches (n3 = 4):

SE = √( (0.214 / 2) · (1/3 + 1/4) ) = 0.2498


For comparisons of small and medium finches (n1 and n2 = 3):

SE = √( (0.214 / 2) · (1/3 + 1/3) ) = 0.2671

Now, arrange the groups according to the group means and compare them in sequence:

Group:  Small   Medium   Large
Mean:   5.5     6.0      7.0

Comparison   Ȳ_A - Ȳ_B   SE       q_s = (Ȳ_A - Ȳ_B)/SE   p
L-S          1.5         0.2498   6.0048                 <0.05
L-M          1.0         0.2498   4.0032                 >0.05
M-S          0.5         0.2671   1.8720                 >0.05

SUMMARY:

Biol 315 Winter 2015
Worksheet Activity #2

Most plants have mycorrhizal fungi that provide minerals and antibiotics to the plant in exchange for sugars. These fungi extend through the soil and connect to other plants, even to other plant species. Simard et al. (1997) tested whether the flow of carbon from birch to Douglas fir seedlings depends on the degree of shading. They expected that shaded trees may draw more carbon via the mycorrhizae than trees in full sun. Using stable isotopes, they measured the carbon transfer from birch to Douglas fir in deep shade, partial shade, and no shade treatments. Write the null and alternative hypotheses for this experiment. The table below shows the ANOVA results.

1. Based on this table, assuming a balanced design, how many replicates were in each of the treatment levels?
a. 15
b. 3
c. 5
d. 6
e. Unable to determine with the information given

2. What do you conclude about the null hypothesis based on this table?
a. Fail to reject H0
b. Reject H0
c. Unable to determine with the information given

Below is the result of a Tukey's test conducted on the data. Complete the missing values.

3. What do you conclude from the Tukey's test?
a. All the means are significantly different from each other because the range is greater than the critical value
b. There is a significant difference between deep shade and no shade, and between deep shade and partial shade, but no significant difference between partial shade and no shade treatments
c. There are no significant differences between any of the treatment means
d. Partial shade and no shade treatments are significantly different from each other, but no other means are significantly different from each other

4. Below is a figure of the data, including the results of the Tukey's test. Which of the following is matched to the correct treatment means?
a. 1 = Deep shade, 2 = Partial shade, 3 = No shade
b. 1 = No shade, 2 = Partial shade, 3 = Deep shade
c. 1 = Partial shade, 2 = Deep shade, 3 = No shade
d. 1 = No shade, 2 = Deep shade, 3 = Partial shade

[Figure: bar chart of carbon transfer with bars labelled 1, 2, 3]

5. Describe how the q-distribution differs from the t-distribution.

6. Describe how ANOVA uses VARIANCES to test for significant differences between MEANS.

7. When a significant difference between two means is detected by a Tukey’s test, we assume that these means were drawn from different populations. Given this, discuss how the result of a Tukey’s test below is possible.


Transformations

Objectives: By the end of this class you will be able to:
1. Decide when to ignore violations of assumptions
2. Decide when a transformation is an appropriate approach
3. Explain how/why a transformation can solve assumption violations
4. Determine which transformation is appropriate given the data
5. Report the results of a statistical analysis on transformed data

I. When to ignore violations of assumptions

How does the central limit theorem relate to the assumptions of a statistical test?

What does this mean in terms of when to ignore a violation?

II. Transformations

This example shows a common form of heteroscedasticity you'll run into: the variance of the residuals increases with the mean.

Why is this funnel shape common? BECAUSE:

Check assumptions of normality and homoscedasticity.

Homoscedasticity → Bartlett's test.

Try a transformation, but which one?

1. Most common transformation in biology: LOG transformation. Generally useful for:
•
•
•
•

2. Arcsine transformation:

3. Square-root transformation:


Log transform the data → looks like it might be better. Homoscedasticity:

The assumptions are met… now do the ANOVA.


**If you do transform your data, you must interpret the results in terms of the transformed data!! I.e., if your data are the abundances of an organism and you log-transform the data, you must discuss differences among groups in mean log-abundance, not mean untransformed abundance.
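A sketch of the whole workflow on made-up abundance-style data, where the variance grows with the mean on the raw scale but stabilizes after a log transformation (and any ANOVA conclusions then concern mean log-abundance):

```python
# Variance-stabilizing log transformation before ANOVA (made-up data).
# Note: if zeros can occur, log(x + 1) would be needed instead of log10(x).
import numpy as np
from scipy import stats

low = np.array([3.0, 5.0, 4.0, 6.0])
mid = np.array([30.0, 50.0, 40.0, 60.0])
high = np.array([300.0, 500.0, 400.0, 600.0])

raw_vars = [g.var(ddof=1) for g in (low, mid, high)]
print([round(v, 1) for v in raw_vars])   # variance grows with the mean

logged = [np.log10(g) for g in (low, mid, high)]
log_vars = [g.var(ddof=1) for g in logged]
print([round(v, 3) for v in log_vars])   # variances now essentially equal

F, p = stats.f_oneway(*logged)           # ANOVA on the TRANSFORMED data
# Any significant result here is about mean log-abundance, not raw abundance.
```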

Summary: How will you know when to try a transformation?

How does this fix violations of assumptions?


Randomization/Permutation/Monte Carlo & Bootstrapping

Objectives: By the end of this lecture you should be able to:
1) Define permutation tests
2) Appropriately conduct and interpret permutation tests
3) Explain the theory of how a permutation test work...
4)
5)
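As a preview, the core of a permutation test on a difference of two group means fits in a few lines (made-up data; group labels are shuffled, since under H0 they are exchangeable):

```python
# Minimal permutation test for a difference between two group means.
import numpy as np

rng = np.random.default_rng(42)
a = np.array([5.1, 4.8, 5.6, 5.0, 4.9])
b = np.array([6.2, 5.9, 6.5, 6.1, 5.8])

observed = b.mean() - a.mean()
pooled = np.concatenate([a, b])

reps = 10000
count = 0
for _ in range(reps):
    perm = rng.permutation(pooled)              # shuffle the group labels
    diff = perm[len(a):].mean() - perm[:len(a)].mean()
    if abs(diff) >= abs(observed):              # two-tailed
        count += 1

p_value = (count + 1) / (reps + 1)              # add-one correction
print("permutation p =", round(p_value, 4))
```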

