
Title Ecological Data Analysis Midterm 2 Study Guide
Author Bella Goñi
Course Introduction To Ecological Data Analysis
Institution University of California, Berkeley

Summary

Midterm 2 study guide containing information from lectures and the textbook, organized around the posted study guide; also includes helpful pictures of graphs and equations.


Description

CHAPTER 8:
- Steps of a chi-squared goodness-of-fit test when the null hypothesis involves estimating a parameter for the binomial distribution or Poisson distribution
- The chi-squared goodness-of-fit test compares frequency data to a probability model stated by the null hypothesis
  - Use proportions to calculate expected values
  - The sum of the expected values should be the same as the sum of the observed values
  - Calculate the chi-squared test statistic

  - Chi-squared test statistic: chi-squared = sum over categories of (Observed − Expected)^2 / Expected
  - Calculate degrees of freedom: df = (number of categories) − 1 − (number of parameters estimated from the data)

- Calculating the P-value
  - For a chi-squared test, the P-value is the probability of getting a chi-squared value greater than the observed chi-squared value calculated from the data
  - Critical value: the value of a test statistic that marks the boundary of a specified area in the tail (or tails) of the sampling distribution under H0
  - If the observed chi-squared value is greater than the critical value for alpha = 0.05, then the P-value is less than 0.05 and you reject the null
- Assumptions of the chi-squared test: none of the categories should have an expected frequency less than one, and no more than 20% of the categories should have expected frequencies less than five
- When there are only two categories, the binomial test is the best option when n is small and the expected frequencies are too low to meet the assumptions of the chi-squared goodness-of-fit test
- When testing the fit to a binomial distribution, we are fitting the results of multiple sets of trials and comparing the frequencies of sets having different numbers of successes to the expectation of the binomial distribution
  - If the null doesn't specify the probability, then it must be estimated (and subtracted in the df)
  - If P is much less than alpha: the frequency distribution does not match the binomial distribution, so one assumption is not met (the probability varies, or trials are not independent)
- Define, interpret, and use the Poisson distribution
  - The Poisson distribution describes the number of successes in blocks of time or space, when successes happen independently of each other and occur with equal probability at every instant in time or point in space (if the data fit, then events are random and independent)

  - The alternative to the Poisson distribution is that successes are distributed in some nonrandom way in time or space
    - Clumped: successes occur closer together than expected by chance
      - When the presence of one success increases the probability of other successes occurring nearby
    - Dispersed: successes are spread out more evenly than expected by chance

  - Equation: Pr[X successes] = e^(−mu) × mu^X / X!
    - Mu: mean number of independent successes in time or space; X: number of successes
  - The null is that the data fit a Poisson distribution (so are random and independent); the alternative is that the data do not fit a Poisson distribution (so are not random and independent)
  - First estimate mu using the sample mean from the data
  - Use the equation to get each expected probability, then multiply by the total number of data points to get the expected frequency
  - If an expected frequency is less than one, then group categories and calculate again
  - Calculate the chi-squared test statistic
  - Calculate degrees of freedom (subtract 1 for the parameter, mu, that was estimated)
  - Property of the Poisson distribution: the variance in the number of successes per block (of time or space) is equal to the mean
    - If the variance is greater than the mean: the distribution is clumped
    - If the variance is less than the mean: dispersed
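The goodness-of-fit steps above can be sketched end to end. The counts below are made up for illustration, and the sketch uses Python's scipy rather than the R the course uses:

```python
import numpy as np
from scipy import stats

# Hypothetical data: successes counted in 50 plots, tallied as observed
# frequencies for 0, 1, 2, 3, and 4-or-more successes per plot
counts = np.array([0, 1, 2, 3, 4])
observed = np.array([12, 18, 11, 6, 3])   # sums to 50 plots
n = observed.sum()

# Step 1: estimate mu with the sample mean (one parameter estimated from data)
mu_hat = (counts * observed).sum() / n

# Step 2: expected probabilities from Pr[X = x] = e^-mu mu^x / x!,
# lumping the last category as "4 or more" so the probabilities sum to 1
probs = stats.poisson.pmf(counts[:-1], mu_hat)
probs = np.append(probs, 1 - probs.sum())
expected = n * probs                       # expected frequencies

# Step 3: chi-squared statistic, summing (O - E)^2 / E over categories
chi2 = ((observed - expected) ** 2 / expected).sum()

# Step 4: df = (number of categories) - 1 - (parameters estimated) = 5 - 1 - 1
df = len(observed) - 1 - 1
p_value = 1 - stats.chi2.cdf(chi2, df)
print(chi2, df, p_value)
```

If any expected frequency came out below one, you would pool adjacent categories (reducing the category count, and hence df) before computing the statistic.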

CHAPTER 9:
- Calculate and interpret population and/or sample values of:
  - Odds
    - The odds of success are the probability of success divided by the probability of failure: O = p / (1 − p)
    - Estimate of the odds calculated from a random sample of trials: O-hat = p-hat / (1 − p-hat)
      - p-hat is the estimated proportion
  - Odds ratio
    - The odds of success in one group divided by the odds of success in a second group: OR = O1 / O2
    - Standard error of the log odds ratio: SE[ln(OR)] = sqrt(1/a + 1/b + 1/c + 1/d), where a, b, c, d are the four cell frequencies of the 2x2 table
    - Confidence interval: ln(OR) ± Z × SE[ln(OR)]; then take e^ of all numbers to convert back to the normal scale and get rid of the ln
    - Z is 1.96 for a 95% confidence interval
  - Relative risk
    - The probability of an undesired outcome in the treatment group divided by the probability of the same outcome in a control group
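The odds-ratio confidence-interval recipe (work on the log scale, then exponentiate back) can be sketched with a hypothetical 2x2 table; the cell counts here are invented for illustration:

```python
import math

# Hypothetical 2x2 table: successes/failures in treatment and control groups
a, b = 30, 20   # treatment: successes, failures
c, d = 15, 35   # control:   successes, failures

# Odds of success = p-hat / (1 - p-hat) in each group
odds_treatment = (a / (a + b)) / (1 - a / (a + b))
odds_control = (c / (c + d)) / (1 - c / (c + d))
OR = odds_treatment / odds_control

# SE of the log odds ratio uses the four cell frequencies
se_ln_or = math.sqrt(1/a + 1/b + 1/c + 1/d)

# 95% CI on the log scale, then e^ everything to get back to the odds-ratio scale
z = 1.96
lower = math.exp(math.log(OR) - z * se_ln_or)
upper = math.exp(math.log(OR) + z * se_ln_or)
print(OR, lower, upper)
```

Note the interval is symmetric around ln(OR), not around OR itself, which is why the back-transformed bounds sit asymmetrically around the odds ratio.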

- Go through all steps of a chi-squared contingency test
  - Contingency analysis estimates and tests for an association between two or more categorical variables
  - If two variables are independent, then the state of one variable tells us nothing about the probability of the different values of the other variable (association implies that the variables are not independent)
  - The null hypothesis is that the variables are independent; the alternative is that they are not independent
  - Calculate expected frequencies under the null hypothesis of independence

    - Use the multiplication rule to calculate the probability of each combination of events, then multiply by the total number of observations to get each expected frequency
  - Calculate the chi-squared test statistic: chi-squared = sum over cells of (Observed − Expected)^2 / Expected
  - Degrees of freedom: df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns
  - Shortcut for calculating expected frequencies
    - Expected cell value for a given row and column: Expected = (row total × column total) / grand total
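The row-total-times-column-total shortcut and the contingency chi-squared can be sketched on an invented 2x2 table, cross-checked against scipy's built-in test:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table (rows and columns are two categorical variables)
table = np.array([[10, 30],
                  [25, 15]])

# Shortcut: expected cell value = row total * column total / grand total
row_totals = table.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = table.sum(axis=0, keepdims=True)   # shape (1, 2)
grand = table.sum()
expected = row_totals * col_totals / grand      # broadcasts to shape (2, 2)

# Chi-squared statistic and df = (r - 1)(c - 1)
chi2 = ((table - expected) ** 2 / expected).sum()
df = (table.shape[0] - 1) * (table.shape[1] - 1)
p = 1 - stats.chi2.cdf(chi2, df)

# scipy's built-in version (without Yates' correction) should agree
chi2_scipy, p_scipy, df_scipy, exp_scipy = stats.chi2_contingency(table, correction=False)
print(chi2, df, p)
```

The same expected-frequency assumptions from the goodness-of-fit test apply here before trusting the P-value.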

CHAPTER 10:
- Write and interpret the probability density function for the normal distribution (don't need to calculate values from it)
  - f(x) = (1 / (sigma × sqrt(2 pi))) × e^(−(x − mu)^2 / (2 sigma^2))
  - Gives the probability density for a value x, which can be any real number; mu is the mean of the distribution, sigma is the standard deviation
  - The normal distribution is a continuous probability distribution describing a bell-shaped curve; a good approximation to the frequency distributions of many biological variables
  - Properties: a continuous distribution, so probability is measured by area under the curve; symmetrical around the mean; has a single mode; probability density is highest exactly at the mean
  - About ⅔ of the area under a normal curve lies within one standard deviation of the mean
  - 95% of the probability of a normal distribution lies within about two (really 1.96) standard deviations of the mean
  - The further values are from the mean, the lower the probability density of observations
  - Two parameters describe its location and spread: the mean and the standard deviation
  - If mu stays the same with different sigma values: the curves are all centered on the same spot, but some are wider or narrower
  - If sigma stays the same with different mu values: the curves all look identical, just shifted to be over different mu values
- Know how to use the dnorm and pnorm functions in R (first three arguments only)
- Define and use the standard normal distribution
  - Shift and scale any normal distribution to the standard normal distribution, which has a mean of 0 and a standard deviation of 1 (subtract the mean, divide by the standard deviation): Z = (x − mu) / sigma

- Use statistical table B
  - Gives the probability that a random draw from the standard normal distribution is above a given cutoff value
  - Find the row for the first two digits of the cutoff value, then go over to the column for the second digit after the decimal; the value in that cell is the probability that Z > cutoff value
  - To get the area under the curve to the right in R: 1 − pnorm(_, _, _)
  - For a negative cutoff number: Pr[Z > −z] = Pr[Z < z] = 1 − Pr[Z > z]
  - For bounds: Pr[a < Z < b] = Pr[Z > a] − Pr[Z > b]
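The dnorm/pnorm rules above (right-tail areas, the negative-cutoff symmetry trick, and bounds) can be checked numerically; this sketch uses scipy's norm.pdf and norm.cdf, which mirror R's dnorm and pnorm:

```python
from scipy.stats import norm

# R's dnorm(x, mean, sd) is the density; pnorm(q, mean, sd) is the area to the LEFT
mu, sigma = 0, 1

density_at_0 = norm.pdf(0, mu, sigma)      # dnorm(0, 0, 1)
left_of_196 = norm.cdf(1.96, mu, sigma)    # pnorm(1.96, 0, 1)

# Area to the RIGHT of a cutoff: 1 - pnorm(...)
right_of_196 = 1 - norm.cdf(1.96, mu, sigma)

# Negative cutoff, by symmetry: Pr[Z > -z] = Pr[Z < z] = 1 - Pr[Z > z]
assert abs((1 - norm.cdf(-1.0)) - norm.cdf(1.0)) < 1e-12

# Bounds: Pr[a < Z < b] = Pr[Z > a] - Pr[Z > b]
between = (1 - norm.cdf(-1.96)) - (1 - norm.cdf(1.96))
print(density_at_0, right_of_196, between)
```

The last value recovers the familiar fact that about 95% of the standard normal lies between −1.96 and +1.96.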

- Define and apply the normal distribution of sample means
  - Understand the population parameters and the sample estimates, and how they are related to quantities of the normal distribution from which the sample is drawn
  - The sampling distribution of an estimate lists all the values that we might obtain when we sample a population and describes their probabilities of occurrence
  - If x has a normal distribution in the population, then the distribution of sample means x-bar is also normal
- Explain and interpret the central limit theorem
  - The sum or mean of a large number of measurements randomly sampled from a non-normal population is approximately normally distributed
  - The sampling distribution of sample means x-bar is approximately normal even if the distribution of individual data points is not normal, provided the sample size is large enough
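The central limit theorem can be seen in a small simulation: draw many samples from a clearly non-normal population and look at the distribution of their means. The population and sample sizes here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: exponential (strongly right-skewed, definitely not normal);
# for an exponential with scale 2, the mean and SD are both 2
n, n_samples = 50, 10_000
means = rng.exponential(scale=2.0, size=(n_samples, n)).mean(axis=1)

# CLT prediction: sample means are approximately normal with mean mu = 2
# and standard deviation sigma / sqrt(n) = 2 / sqrt(50)
print(means.mean(), means.std())
```

Despite the skewed population, the simulated means cluster symmetrically around 2 with a spread close to sigma/sqrt(n), which is what lets t-based procedures work on non-normal data at large n.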

CHAPTER 11:
- Explain and use Student's t-distribution
- Use the estimated standard error of the mean: SE of x-bar = s / sqrt(n), where s is the sample standard deviation

  - Fatter in the tails than the standard normal distribution; the larger range of values of t compared to Z comes from the uncertainty about the true value of the standard error; gets closer to normal with a higher sample size/n value
    - As df gets large, the t(df) distribution looks more and more like N(0,1), aka the standard normal
- Use statistical table C to look up either a P-value (within the precision available) or a critical value
  - Every t-distribution has its own critical 5% t-value, depending on the number of degrees of freedom (95% of the area under the curve lies between − and + the critical value)
  - First find the row in the table corresponding to the degrees of freedom, then find the column that corresponds to alpha(2) = 0.05 (2: the 5% area is divided between the two tails of the distribution; 1: the area is under only one tail of the distribution, for one-sided hypothesis tests); the corresponding cell contains the critical value
- Determine a confidence interval for the mean of a normal distribution with a chosen coverage probability (80%, 90%, 95%, 99%)

  - Confidence interval for the mean: x-bar ± t(alpha(2), df) × SE of x-bar
  - The 95% confidence interval for the mean will capture the population mean in 95% of random samples
  - A 99% confidence interval is broader than a 95% interval because we have to include more possibilities to achieve the higher probability of covering the true mean
  - The 95% confidence interval for mu is all values that, if true, could plausibly produce x-bar
- All steps of a one-sample t-test
  - The one-sample t-test compares the mean of a random sample from a normal population with the population mean proposed in a null hypothesis
  - State the null and alternative hypotheses
    - The null is that the true mean is equal to a specific value mu-naught (mu0)
    - The alternative is that the true mean does not equal mu0
  - Calculate the appropriate test statistic: t = (y-bar − mu0) / SE of y-bar
    - y-bar is the same as x-bar (just book vs. notes notation)

    - y-bar is the sample mean, mu0 is the population mean proposed by the null hypothesis, SE of y-bar is the sample standard error of the mean
  - Determine the null distribution, including degrees of freedom
    - The t-distribution has n − 1 degrees of freedom
  - Calculate the P-value or critical value to within the precision available in the table provided
    - The P-value can be computed by comparing the observed t with Student's t-distribution
    - The P-value is the probability of obtaining a result as extreme as or more extreme than the observed t, assuming that the null hypothesis is true (if two-tailed: both tails of the t-distribution are included in the probability)
    - The critical value of the test statistic marks off the point or points in the distribution that have a certain probability alpha in the tails of the distribution
    - Values of the test statistic that are further into the tails have P-values lower than alpha (if the value of t is closer to 0 than the critical value, we cannot reject the null hypothesis)
  - Explain the P-value in the context of a specific example

    - If the P-value is greater than alpha (the significance level), we do not reject the null hypothesis
  - Reach a conclusion for a given significance level
  - Interpret the concepts of chapter 6 specifically in relation to this test
  - Give the main assumptions and identify situations that do not meet those assumptions
    - Assumptions of the one-sample t-test:
      - The data are a random sample from the population
      - The variable is normally distributed in the population
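The one-sample t-test steps can be sketched with made-up measurements (the data and null value mu0 below are invented), cross-checked against scipy's built-in test:

```python
import math
import numpy as np
from scipy import stats

# Hypothetical sample; null hypothesis: the true mean equals mu0
y = np.array([36.5, 37.2, 36.8, 37.0, 36.4, 36.9, 37.1, 36.6])
mu0 = 37.0

n = len(y)
y_bar = y.mean()
se = y.std(ddof=1) / math.sqrt(n)       # estimated standard error of the mean

t = (y_bar - mu0) / se                  # test statistic
df = n - 1                              # t-distribution has n - 1 df
p = 2 * (1 - stats.t.cdf(abs(t), df))   # two-tailed P-value

# 95% CI for the mean: y_bar +/- critical t * SE
t_crit = stats.t.ppf(0.975, df)
ci = (y_bar - t_crit * se, y_bar + t_crit * se)

# Cross-check against scipy's built-in one-sample t-test
t_scipy, p_scipy = stats.ttest_1samp(y, mu0)
print(t, df, p, ci)
```

Note the duality: mu0 falls outside the 95% CI exactly when the two-tailed P-value is below 0.05.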

CHAPTER 12:
- Explain, interpret, and calculate methods for comparing the means of two groups: all assume normality of the underlying population
  - Two groups: control and treatment (where we have done something to the variable and want to measure the effect)
- Paired t-test (when and how you use it)
  - Both treatments are applied to each sampling unit
  - Measurements within a sampling unit are not independent
  - Controls for other variation in sampling units that might have nothing to do with the treatment
  - Reduce paired measurements down to a single measurement by taking the difference between them (n = number of sampling units), then use a one-sample t-test on those differences

  - Assumptions
    - Pairs are chosen at random (sampling units are randomly sampled from the population)
    - The differences are normally distributed (not necessarily the individual values); the paired differences have a normal distribution in the population
- Two-sample t-test (unpaired), including the pooled sample variance and relevant assumptions
  - Treatments are applied to separate, different sampling units
  - Each sampling unit is independent
  - Useful when a paired sampling design is not feasible
  - Pooled sample variance: the average of the variances of the samples weighted by their degrees of freedom: s_p^2 = (df1 × s1^2 + df2 × s2^2) / (df1 + df2)
    - Assumes the two groups have the same variance
    - Estimates the variance jointly

  - Assumptions
    - Random sampling (each of the two samples is a random sample from its population)
    - The numerical variable is normally distributed in each population
    - The SD and variance of the numerical variable are the same in both populations
- Understand when you would use a Welch's t-test, but don't need to know the calculations
  - Compares the means of two groups and can be used when the variances are not equal
- Using correct sampling units
  - If repeated measurements are taken on each sampling unit, then they are not independent, so you must summarize the data for each unit with a single measurement
- Fallacy of indirect comparison
  - Comparisons between two groups should always be made directly, not indirectly by comparing both to the same null hypothesized value
- Interpreting overlap of confidence intervals
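The paired-vs-unpaired distinction, and the pooled-variance calculation, can be sketched with invented measurements; scipy's `equal_var` flag is what switches between the pooled two-sample test and Welch's test:

```python
import math
import numpy as np
from scipy import stats

# Paired design: both treatments measured on the SAME sampling units,
# so reduce to differences and run a one-sample t-test against 0
before = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7])
after = np.array([5.6, 5.0, 6.4, 5.9, 5.1, 6.2])
t_paired, p_paired = stats.ttest_1samp(after - before, 0.0)

# Unpaired design: separate, independent sampling units in each group.
# Pooled sample variance = sample variances weighted by their df
x1 = np.array([12.1, 11.4, 13.0, 12.6, 11.9])
x2 = np.array([10.2, 11.0, 10.8, 9.9, 10.5, 10.1])
df1, df2 = len(x1) - 1, len(x2) - 1
sp2 = (df1 * x1.var(ddof=1) + df2 * x2.var(ddof=1)) / (df1 + df2)
se = math.sqrt(sp2 * (1 / len(x1) + 1 / len(x2)))
t2 = (x1.mean() - x2.mean()) / se

# equal_var=True is the pooled-variance two-sample t-test;
# equal_var=False would give Welch's test for unequal variances
t_scipy, p_scipy = stats.ttest_ind(x1, x2, equal_var=True)
print(t_paired, t2)
```

Treating paired data as two independent samples wastes the pairing and mixes between-unit variation into the comparison, which is why the reduction to differences matters.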

CHAPTER 13:
- Explain and interpret (detecting deviations from normality):
- Histograms
  - Useful for seeing deviations from a normal distribution
  - Not normal: skewed right or left, very skewed right or left, extreme outlier

- Normal quantile plots, aka normal quantile-quantile (QQ) plots
  - Compare each observation in the sample with the corresponding quantile expected from the standard normal distribution
  - Points should fall roughly along a straight line if the data come from a normal distribution

- Shapiro-Wilk test (don't need to know how to calculate it)
  - Evaluates the goodness of fit of a normal distribution to a set of data randomly sampled from a population
  - First estimates the mean and standard deviation of the population using the sample data, then tests the goodness of fit to the data of a normal distribution having this same mean and standard deviation
- Explain what it means for a hypothesis test procedure to be incorrect, in terms of the type 1 error rate, due to violations of assumptions
  - For the two-sample t-test, we assume the variances are equal; the two-sample t-test is robust to fairly large differences in standard deviations between the two populations being sampled
- Explain and interpret the concept of statistical robustness
  - A procedure that gives approximately valid results when assumptions are violated is robust
  - If a hypothesis test or confidence interval is robust, we can trust it even if assumptions are moderately violated
  - A lot of procedures for normal means (one- and two-sample t-tests and CIs) are robust to violations of normality because of the central limit theorem
  - The two-sample t-test assumes equal standard deviations but is robust even when one standard deviation is up to 3 times the other
- Understand when and how to use transformations
  - Can try different transformations until you find one that makes the data fit the assumptions, but cannot keep trying transformations until p < 0.05; must transform each individual the same way
  - Log transformation
    - Can only be applied to data when all values are greater than zero
    - Can add one to all data if the data include zero

    - Useful when: measurements are ratios or products, the distribution is skewed right, the group with the larger mean has the larger SD, or the data span several orders of magnitude
  - Arcsine transformation
    - Used for data that are proportions
    - The inverse of the sine function applied to the square root of the proportion: arcsin(sqrt(p))
  - Square-root transformation
    - Useful when data are counts
    - For Poisson data, gives approximately equal variances
- Understand why assessing and dealing with potential violations of assumptions is often a subjective process
  - If you have a small sample size and you don't reject, you can't conclude much
  - If you have a large sample size and you do reject, the violation may be small and not matter much
  - Tests of assumptions can themselves be less robust than the primary test(s) you want to use
- Understand when each of these non-parametric tests would be used (don't need to calculate non-parametric tests): they assume less about the underlying distributions
  - Sign test
    - Can be used in place of a one-sample or paired t-test when the data are not normal
    - Compares the data to a hypothesized constant for the median (null: ½ of values above and ½ below)
    - A binomial test of the number above and below; has very low power
  - Wilcoxon signed-rank test
    - Uses signed ranks instead of just signs
    - Tests whether the median equals some null hypothesis value
    - Assumes symmetry
  - Mann-Whitney U-test
    - Replaces the two-sample t-test when the normal distribution assumption is not met
    - Uses the ranks of all measurements
    - Null: the two distributions are the same
- Understand the pros and cons of parametric vs. non-parametric tests in terms of type 1 error rate and statistical power
  - Nonparametric tests
    - Have lower statistical power (lower probability of rejecting a false null hypothesis)
    - But good because they assume less about the distribution
  - Parametric tests...
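The workflow this chapter describes, checking normality, trying a transformation, and falling back to a rank-based test, can be sketched on simulated skewed data; the distributions and sample sizes below are arbitrary choices for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Right-skewed data spanning orders of magnitude: a natural candidate for a
# log transformation; Shapiro-Wilk checks the fit before and after
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=40)
w_raw, p_raw = stats.shapiro(skewed)           # likely rejects normality
w_log, p_log = stats.shapiro(np.log(skewed))   # log of lognormal data is normal

# Nonparametric alternatives when no transformation helps:
group1 = rng.exponential(1.0, size=20)
group2 = rng.exponential(2.0, size=20)

# Mann-Whitney U replaces the two-sample t-test (ranks of all measurements)
u, p_u = stats.mannwhitneyu(group1, group2)

# Wilcoxon signed-rank replaces the paired t-test (signed ranks of differences,
# here treating the two groups as if they were paired, purely to show the call)
w_stat, p_w = stats.wilcoxon(group1 - group2)
print(p_raw, p_log, p_u, p_w)
```

The trade-off noted above shows up in practice: the rank-based tests stay valid under skew, at the cost of some power relative to a t-test whose assumptions hold.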

