Stats exam 1 review PDF

Title	Stats exam 1 review
Author	Sajil Ismail
Course	Biostatistics
Institution	University of Texas at Austin

Summary

review...

Description

Sefia Khan
SDS 328M Exam 1 Review

DATA:
Descriptive Statistics: summarizing and displaying data (median, histogram)
Inferential Statistics: using estimates from a sample to draw conclusions about a population (hypothesis testing, correlation coefficient, etc.)
Population Data: entire collection of individuals of interest
- Use parameters to describe population characteristics → mean = µ, sd = σ
Sample Data: subset of the population
- Use statistics to describe sample characteristics → mean = x̄, sd = s
Properties of Good Samples: little or no bias and low variance
Sampling Error (variance):
- Difference between an estimate and the population parameter being estimated, caused by chance
- The lower the sampling error, the higher the precision
- Precise (low variance) → values obtained are tightly grouped and repeatable

Bias:
- Discrepancy between the true population parameter and the estimates we would obtain, on average, if we could sample the population repeatedly
- Creates an inaccurate estimate of the population parameter
- Accurate (no bias) → estimates are centered around the bull’s eye, i.e. the true population value
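The difference between sampling error and bias can be sketched with a small simulation. This is only an illustration under stated assumptions: the population is made up (normally distributed values), and the "convenience sample" is a hypothetical one that only reaches the smallest values. Repeated random samples should scatter around the true mean (sampling error, smaller for larger n), while the convenience sample should be systematically off (bias).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population (made-up numbers, not data from the notes).
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # the population parameter


def random_sample_mean(n):
    """Mean of a simple random sample of size n (drawn without replacement)."""
    return statistics.mean(random.sample(population, n))


def convenience_sample_mean(n):
    """Mean of a nonrandom sample: only the n smallest values are 'easy to reach'."""
    return statistics.mean(sorted(population)[:n])


def summarize(estimates):
    """Return (bias, spread) of repeated estimates around the parameter mu."""
    return statistics.mean(estimates) - mu, statistics.stdev(estimates)


# Repeat the sampling many times for two sample sizes.
small = [random_sample_mean(10) for _ in range(1000)]
large = [random_sample_mean(100) for _ in range(1000)]

bias_small, spread_small = summarize(small)
bias_large, spread_large = summarize(large)
convenience_bias = convenience_sample_mean(100) - mu

print(f"random, n=10 : bias {bias_small:+.2f}, spread {spread_small:.2f}")
print(f"random, n=100: bias {bias_large:+.2f}, spread {spread_large:.2f}")
print(f"convenience, n=100: bias {convenience_bias:+.2f}")
```

In this sketch the spread of the estimates plays the role of sampling error (precision), and the average distance from mu plays the role of bias (accuracy).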

Random Sampling: (external validity)
- Minimizes bias and allows sampling error to be quantified
  1. Every unit must have an equal chance of being included in the sample.
  2. Selection of units must be independent.
Nonrandom Sampling:
- Convenience Sampling – selection based on being easily available to the researcher
- Volunteer Sampling – selection based on subjects’ desire to participate (may result in behavioral differences)

VARIABLES:
Numeric Variables: quantitative measurements
a. Discrete → values are whole numbers with no intermediate values
   o Ex – shoe size, number of children
b. Continuous → any value in a range, no gaps in between
   o Ex – height, weight, distance, time

Categorical Variables: qualitative characteristics
a. Nominal → unordered categories
   o Ex – gender, pet preference
b. Ordinal → ordered categories
   o Ex – grade on exam, survey responses, classification in school

EXPERIMENTS:
Experiment: treatment is assigned randomly to individuals → can reveal cause and effect
Observational Study: treatment is not assigned randomly → can reveal associations only
True Experiments: use random assignment to support the claim X → Y
- X and Y are correlated
- X preceded Y in time
- No other explanations for X → Y

Minimize Bias:
1. Randomization
   o Creates two or more groups of units that are similar to each other
   o No systematic preexisting differences
   o Allows researchers to tease apart the effects of the explanatory variable from those of confounding variables
   o Random assignment – making sure treatment and control groups are not systematically different prior to treatment (internal validity)
   Confounding variable: a variable that masks or distorts the relationship between variables, biasing the estimate
   - Influences both the explanatory and response variables so that we cannot discern the actual effect
   - Non-manipulable events: natural disaster, childhood trauma
   - Non-manipulable attributes: age, height, being a smoker
2. Control
   o Group of subjects who do not receive the treatment
   o Accounts for the placebo effect
   o Counterfactual – way to simulate what would have happened if the treatment group had not been treated
3. Blinding
   o Concealing information about who actually got treated from participants and researchers
   o Single blind → participants don’t know if they got treated
      - Prevents subjects from behaving differently
      - Requires indistinguishable treatments
   o Double blind → participants and experimenters are unaware of who got treated
      - Prevents researchers from behaving differently towards subjects

Minimize Sampling Error:
1. Replication
2. Balance
3. Blocking/Stratifying
   o Stratified random sampling: the population is divided into subpopulations (strata) and a random sample is drawn from each stratum
   o Ensures the sample ends up with the desired distribution
   o Reduces sampling error → higher precision

Selection Bias: happens when the average person receiving one condition (treatment) differs from the average person receiving another condition (control).
- Occurs when individuals select themselves into treatments, when there is no randomization, etc.
- Random assignment eliminates selection bias
- Matching: create a control group that ‘looks like’ the treatment group in terms of possible confounding variables. Match every individual who has the characteristic of interest with someone who does not.

DISPLAYING DATA:
Univariate: graphs/stats with a single variable
Bivariate: graphs/stats with two variables; deals with relationships/associations → use scatterplot, grouped boxplot/histogram/bar chart
Frequency Distribution: counts for each value (or range of values) in the dataset

Type of Variable    Categorical                     Numeric
Graph               Bar chart, contingency table    Histogram

Histogram → splits the range of the data into equal-sized bins and shows how many data points fall into each

DESCRIBING DATA:
Measures of Center:
- Mean
- Median
  o Resistant to outliers
Measures of Spread:
- Standard deviation
- Range (max – min)
- Interquartile range (IQR = Q3 – Q1)
  o Q1 – 25th percentile = median of lower half
  o Q3 – 75th percentile = median of upper half
  o To identify outliers, compute 1.5 × IQR
     - Values above Q3 + 1.5 × IQR are outliers
     - Values below Q1 – 1.5 × IQR are outliers
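The 1.5 × IQR rule above can be sketched in a few lines. This is a minimal illustration, not the course's required method: the data are made up, the helper names are hypothetical, and the quartiles use the "median of each half" definition from these notes (for an odd number of values, the middle value is left out of both halves, which is one common convention among several).

```python
import statistics


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]    # lower half
    upper = data[-half:]   # upper half (skips the middle value when n is odd)
    return statistics.median(lower), statistics.median(upper)


def outlier_fences(values):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, q3 = quartiles(values)
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


def find_outliers(values):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(values)
    return [v for v in values if v < low or v > high]


scores = [2, 4, 5, 6, 7, 8, 9, 10, 11, 30]  # made-up data
print(quartiles(scores))       # (5, 10)
print(outlier_fences(scores))  # (-2.5, 17.5)
print(find_outliers(scores))   # [30]
```

Software packages often interpolate percentiles differently, so quartiles computed elsewhere may differ slightly from this hand method.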

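Skewed Distribution → report median and IQR
Symmetric Distribution → report mean and standard deviation
The mean is pulled in the direction of the tail: with a right (positive) skew the mean is typically greater than the median, and with a left (negative) skew it is typically less.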
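A tiny made-up example may help show why the median is preferred for skewed data. The two lists below are hypothetical; they differ only in the largest value, which creates a right tail.

```python
import statistics

symmetric = [3, 4, 5, 6, 7]      # made-up, symmetric around 5
right_skewed = [3, 4, 5, 6, 42]  # same data with one large value in the right tail

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```

The single large value moves the mean from 5 to 12 while the median stays at 5, which is the sense in which the median is resistant to outliers.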

PROBABILITY:
Probability of an event is its ‘long run’ relative frequency

Conditional Probability: P(A | B) = P(A and B) / P(B), equivalently P(A | B) × P(B) = P(A and B)
- The probability of event A given that B has happened
- Ex: “what is the probability of surviving, given that you were in first class?” P(alive | first class) = P(alive and first class) / P(first class)

Independent: occurrence of one event does not inform us about the probability of another
- Events are independent if any of these are true (each one implies the others):
  o P(A) × P(B) = P(A and B) → multiplication rule
  o P(A) = P(A | B)
  o P(B) = P(B | A)
Dependent: events for which these conditions do not hold

Addition Rule: P(A or B) = P(A) + P(B) – P(A and B)
- Probability of events A or B (or both) occurring
- If A and B are mutually exclusive, then P(A and B) = 0, so P(A or B) = P(A) + P(B)

Probability Trees:
- Roots are marginals: P(A)
- Branches are conditionals: P(B | A)
- Leaves are joints: P(A and B)
- Multiply root by branch to get leaf: P(A) × P(B | A) = P(A and B)

Law of Total Probability: P(B) = P(B | A) × P(A) + P(B | not A) × P(not A)

Bayes’ Theorem: P(A | B) = P(B | A) × P(A) / P(B)
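The probability rules in this section can be checked against a small table of counts. The sketch below uses hypothetical passenger counts (made up for illustration, not real survival data) and exact fractions so that the identities hold without rounding. The helper `p` is an assumed name, not part of any library.

```python
from fractions import Fraction

# Hypothetical counts by (class, status); made-up numbers for illustration only.
counts = {
    ("first", "alive"): 120,
    ("first", "dead"): 80,
    ("other", "alive"): 180,
    ("other", "dead"): 620,
}
total = sum(counts.values())


def p(cls=None, status=None):
    """Probability of the event described by class and/or status (None = any)."""
    n = sum(c for (k, s), c in counts.items()
            if cls in (None, k) and status in (None, s))
    return Fraction(n, total)


p_first = p(cls="first")
p_other = p(cls="other")
p_alive = p(status="alive")
p_alive_and_first = p("first", "alive")

# Conditional probability: P(alive | first) = P(alive and first) / P(first)
p_alive_given_first = p_alive_and_first / p_first
p_alive_given_other = p("other", "alive") / p_other

# Independence check: does P(alive) x P(first) equal P(alive and first)?
independent = p_alive * p_first == p_alive_and_first

# Addition rule: P(alive or first)
p_alive_or_first = p_alive + p_first - p_alive_and_first

# Law of total probability: rebuild P(alive) from the two class branches
total_prob = p_alive_given_first * p_first + p_alive_given_other * p_other

# Bayes' theorem: P(first | alive)
p_first_given_alive = p_alive_given_first * p_first / p_alive

print("P(alive | first) =", p_alive_given_first)
print("independent?", independent)
print("P(alive or first) =", p_alive_or_first)
print("P(alive) via total probability =", total_prob)
print("P(first | alive) =", p_first_given_alive)
```

With these made-up counts, survival and class come out dependent because P(alive | first) differs from P(alive).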

DISTRIBUTION:
Normal Distribution: defined by two parameters: mean (µ) and standard deviation (σ)
- If variable X is normally distributed: X ~ N(µ, σ)
- Most of the data are found close to the mean; less at the extremes
Properties:
1. Unimodal
2. Symmetrical
3. Mean = median = mode
4. Bell shaped
5. Height and width determined by the standard deviation
Standard Normal Distribution:
- Z-scores of a normally distributed variable follow a standard normal distribution, Z ~ N(0, 1)

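- For a population: z = (x – µ) / σ
- For a sample: z = (x – x̄) / s
- Z-scores in the range from −1.96 to 1.96 are usual or typical (about 95% of values)

Empirical Rule: for a normal distribution, about 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3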
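The z-score formula and the percentages quoted for the standard normal distribution can be checked with Python's standard library. This is a minimal sketch: the height example (mean 170, sd 10) is a made-up assumption, and `z_score` and `prob_within` are hypothetical helper names.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


standard_normal = NormalDist(mu=0, sigma=1)


def prob_within(k):
    """P(-k < Z < k) for a standard normal Z."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Hypothetical example: heights with mean 170 and sd 10; how unusual is 185?
z = z_score(185, 170, 10)
print("z =", z)

for k in (1, 1.96, 2, 3):
    print(f"P(|Z| < {k}) = {prob_within(k):.4f}")
```

The printed probabilities should land close to the 68%, 95%, and 99.7% figures; note that exactly 95% corresponds to ±1.96, while ±2 gives slightly more.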

Sampling Distributions:
- Mean of the sampling distribution of x̄ is the same as the population mean µ
- Standard deviation of the sampling distribution (the standard error) is σ_x̄ = σ/√n
- The standard error is smaller than the population standard deviation σ and shrinks as n grows

HYPOTHESIS TESTING:
Steps:
1. Hypothesize: state a claim and counterclaim about a population parameter (usually the mean)
   o Null Hypothesis (H0) → statement of no effect / no difference
      - Ex: µ = 0
   o Alternative Hypothesis → statement of difference
      - Ex: µ ≠ 0, µ < 0, or µ > 0
2. Test Statistic: measure how far the sample estimate is from the null value, using the sampling distribution x̄ ~ N(µ, σ/√n)
   a. Compute the z-score z = (x̄ – µ) / (σ/√n) and find the p-value from it
3. P-value: probability of getting an estimate at least as extreme as the one we got, in the direction of the alternative hypothesis, if the null hypothesis is true
4. Conclusions:
   o If p is small (p < α) → reject the null: our result is unlikely if the null hypothesis is true, so the data support the alternative hypothesis
   o If p is large (p ≥ α) → fail to reject the null: our result is not unlikely if the null hypothesis is true, so we cannot reject the null (this does not prove the null is true)
   o ‘How small’ is set by the significance level α (cut off)
   o Commonly α = .05
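The four steps above can be sketched as a one-sample z-test. This is an illustration under stated assumptions: the population standard deviation σ is treated as known, the numbers in the example are made up, and `z_test` is a hypothetical helper, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, null_mean, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for a one-sample z-test with known sigma."""
    standard_error = sigma / sqrt(n)              # sd of the sampling distribution
    z = (sample_mean - null_mean) / standard_error
    std = NormalDist()                            # standard normal, N(0, 1)
    if alternative == "greater":                  # alternative: mu > null_mean
        p_value = 1 - std.cdf(z)
    elif alternative == "less":                   # alternative: mu < null_mean
        p_value = std.cdf(z)
    elif alternative == "two-sided":              # alternative: mu != null_mean
        p_value = 2 * (1 - std.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


# Made-up example: H0: mu = 0, sigma = 2, n = 25, observed sample mean = 1.
ALPHA = 0.05
z, p_value = z_test(sample_mean=1.0, null_mean=0.0, sigma=2.0, n=25)
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f} -> {decision}")
```

Here the standard error is 2/√25 = 0.4, so the sample mean sits 2.5 standard errors from the null value, and the two-sided p-value falls below .05.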

Errors in Hypothesis Testing:

Conclusion    Reject H0       Fail to reject H0
H0 true       Type I error    Correct
H0 false      Correct         Type II error
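The two error types in the table can be estimated by simulation. The sketch below is an assumption-laden illustration: it draws made-up normal samples with known σ, runs a two-sided z-test at α = .05 on each, and counts how often the test rejects. The true mean of 0.5 used for the "H0 false" case is an arbitrary choice.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the sketch is repeatable

ALPHA = 0.05
critical_z = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96 for a two-sided test


def rejects_null(true_mean, null_mean=0.0, sigma=1.0, n=25):
    """Draw one random sample and report whether the z-test rejects H0."""
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    z = (mean(sample) - null_mean) / (sigma / sqrt(n))
    return abs(z) > critical_z


trials = 2000

# H0 true (true mean equals the null value): every rejection is a Type I error.
type1_rate = sum(rejects_null(true_mean=0.0) for _ in range(trials)) / trials

# H0 false (true mean is 0.5): every failure to reject is a Type II error.
type2_rate = sum(not rejects_null(true_mean=0.5) for _ in range(trials)) / trials

print(f"Type I error rate  ~ {type1_rate:.3f} (should be near alpha)")
print(f"Type II error rate ~ {type2_rate:.3f}")
```

The Type I error rate is controlled by the choice of α, whereas the Type II error rate depends on the sample size, the true effect, and the population variability.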

Power: the probability that a random sample will lead to rejection of a false null hypothesis
- A study has more power:
  o If the sample size is large
  o If the true discrepancy from the null hypothesis is large
  o If variability in the population is low...
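The three factors listed for power can be illustrated with the power of a two-sided one-sample z-test. This is a sketch under assumptions not stated in the notes: σ is treated as known, the test is two-sided at α = .05, and `z_test_power` is a hypothetical helper name. The input values are made up.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Probability that a two-sided z-test rejects H0 when the true mean
    differs from the null value by `effect`."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = abs(effect) * sqrt(n) / sigma  # effect measured in standard errors
    # Reject when the z statistic lands beyond either critical value.
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)


baseline = z_test_power(effect=0.5, sigma=1.0, n=25)
print(f"baseline power       : {baseline:.3f}")
print(f"larger sample (n=100): {z_test_power(0.5, 1.0, 100):.3f}")
print(f"larger effect (1.0)  : {z_test_power(1.0, 1.0, 25):.3f}")
print(f"more variability (2) : {z_test_power(0.5, 2.0, 25):.3f}")
```

Each change moves power in the direction the list describes: a larger sample or a larger true discrepancy raises it, and more population variability lowers it. When the effect is zero the function returns α, since any rejection is then a Type I error.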