Canvas notes stat PDF

Title Canvas notes stat
Course Elementary Statistics
Institution The Pennsylvania State University
Pages 8
File Size 218.3 KB
File Type PDF

Summary

Summary notes of entire course taken from textbook resources - Professor Lock Morgan - virtual course...


Description

https://online.stat.psu.edu/stat200/lesson/2/2.3

Chapter 1

Cases: individuals from which the data is collected
● Aka participants or subjects
Variables: characteristics that are being measured
● Something that can vary
Categorical variable
● Named or labeled groups
● Ex. ice cream flavors, birth location, level of education, online courses taken
Quantitative variable
● Numerical values
● Ex. weight, height, GPA, number of children, running distance
Explanatory variable
● Independent variable
● The variable that is manipulated by the researcher
Response variable
● Dependent or outcome variable
● The outcome that’s measured following manipulation
Population - entire set of possible cases
Sample - subset of the population from which data is collected
Parameter - a measure concerning a population
Statistic - a measure concerning a sample
Sampling bias
● If a sample is not selected from the population randomly, it may be prone to bias
● Use simple random sampling to avoid this
Experimental Research
● A study in which the researcher manipulates the treatments (ex. level of the explanatory variable) received by subjects and collects data
● Random assignment controls for confounding variables by balancing them across the groups
● By randomly assigning cases to different levels of the explanatory variable, a causal conclusion can be made
Observational Research
● A study in which the researcher collects data without performing any manipulations
● Cannot conclude causality since cases are not randomly assigned to groups
● Ex. giving people surveys
Independent groups - cases in each group are unrelated to one another

Paired groups - cases in each group are meaningfully matched with one another
Control group - a level of the explanatory variable that does not receive an active treatment; they may receive no treatment or a placebo
Placebo group - a group that receives what, to them, appears to be a treatment but actually is neutral and does not contain any active treatment
Blinding - used to avoid bias; the participants and/or researchers don’t know which treatment each case is receiving
● Single blind: participants don’t know the treatment group that they’ve been assigned to
● Double blind: participants and researchers don’t know which cases have been assigned to which treatment groups

Chapter 2

One categorical variable
● Proportion = number in category / total number
○ Sample: p hat
○ Population: p
○ Aka risk
● Odds = number with outcome / number without outcome
○ Compares the likelihood of an event happening to the likelihood that it does not happen
● Visual representations
○ Frequency table
○ Pie chart
○ Bar chart
Two categorical variables
● Difference in proportions
○ p1 - p2
● Visual representations
○ Two way table
○ Stacked bar chart
One quantitative variable
● Single mean (t)
○ The numerical average
○ Population: mu
○ Sample: x bar
● Median
○ The middle of the distribution
● Visual representation
○ Dot plot
○ Histogram
● Distribution (shapes)
○ Symmetrical - similar on both sides
○ Normal - bell shaped
○ Right skewed - the tail extends to the right; most values are on the left
■ Mean > median
○ Left skewed - the tail extends to the left; most values are on the right
■ Mean < median
● Standard deviation - a measure of spread
● Z scores - describe an observation in relation to the distribution of all observations
○ By converting to z scores, we can compare observations from different distributions
○ z = (observation - mean) / standard deviation
○ Observation, mean, and SD are all from the original distribution
● 5 number summary
○ Minimum = smallest value
○ Q1 = 25th percentile
○ Median = middle value (50th percentile)
○ Q3 = 75th percentile
○ Maximum = largest value
○ Range = max - min
○ IQR = Q3 - Q1
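
As a rough illustration of the Chapter 2 formulas above (proportion, odds, z score, and 5 number summary), here is a minimal Python sketch. The counts, scores, and data values are made up for the example, and the function names are only illustrative. The quartiles use one common interpolation rule (`method="inclusive"`), which is an assumption; textbooks and software may use a slightly different rule and give slightly different Q1 and Q3 values.

```python
import statistics


def proportion(count_in_category, total):
    """Proportion = number in category / total number."""
    return count_in_category / total


def odds(count_with, count_without):
    """Odds = number with outcome / number without outcome."""
    return count_with / count_without


def z_score(observation, mean, sd):
    """z = (observation - mean) / standard deviation."""
    return (observation - mean) / sd


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # Quartile conventions differ between tools; "inclusive" is an assumption.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical counts: 30 of 120 cases fall in the category of interest.
p_hat = proportion(30, 120)
odds_value = odds(30, 120 - 30)

# Two observations from two different (made-up) distributions,
# put on the same scale so they can be compared.
z_a = z_score(130, mean=100, sd=15)
z_b = z_score(650, mean=500, sd=100)

data = [2, 4, 4, 5, 7, 9, 11]
minimum, q1, median, q3, maximum = five_number_summary(data)
data_range = maximum - minimum
iqr = q3 - q1

print("p hat:", p_hat, "odds:", odds_value)
print("z scores:", z_a, z_b)
print("5 number summary:", minimum, q1, median, q3, maximum)
print("range:", data_range, "IQR:", iqr)
```

Here the first observation has the larger z score, so it sits further above its own mean (in standard deviation units) than the second one does, even though the raw numbers are on different scales.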

Chapter 3 Box Plots

One quantitative and one categorical variable
● Difference in means or paired t
○ Conditions to use t distribution
■ Sample size is at least 30, or the distribution is approximately symmetric (normal)
● Visual representations
○ Side by side box plots
○ Dot plots with groups
○ Histograms with groups
Two quantitative variables
● Correlation
○ Population = ρ (rho)
○ Sample = r
○ Measures the direction and strength of the linear relationship
○ Number determines the strength and sign determines the direction
■ Positive association: r > 0
■ Negative association: r < 0
■ No linear association: r = 0
■ The closer r is to 0, the weaker the relationship
■ The closer r is to +1 or -1, the stronger the relationship
○ Correlation does not equal causation
● Simple linear regression uses one quantitative variable to predict a second quantitative variable
○ Explanatory variable - variable used to make the prediction (x)
○ Response variable - outcome variable (y)
○ Slope = measures how steep the line is
■ Predicted change in Y for every one unit change in X (keeping other X variables constant if more than one X)
○ Intercept = location on y axis
■ Predicted Y when X value is 0
○ Residual = the difference between an observed y value and predicted y value
■ y - y hat
■ Observed - predicted
● Predicted (y hat) is the value you get from plugging the x value into the equation
● Observed is the actual value of the response variable
○ R^2 = proportion of variability in Y explained by Xs
■ The sign of r is determined by whether the relationship is positive or negative (look at slope)
■ Higher r^2 = stronger correlation
■ Lower r^2 = weaker correlation
● Visual representation
○ Scatter plot - explanatory variable on x axis and response variable on y axis
○ Bubble plot (multiple variables)
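
To make the correlation and regression terms above concrete, here is a small Python sketch that computes r, the slope, the intercept, the residuals, and R^2 for a made-up data set. It assumes the usual least squares line for simple linear regression (the notes do not name the fitting method), and the function names are only illustrative.

```python
import math


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (slope, intercept, r) for paired data using least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx                   # predicted change in y per unit of x
    intercept = y_bar - slope * x_bar   # predicted y when x is 0
    r = sxy / math.sqrt(sxx * syy)      # sample correlation
    return slope, intercept, r


# Hypothetical data: x is the explanatory variable, y is the response.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept, r = fit_line(x, y)

# Predicted (y hat) comes from plugging each x into the equation.
predicted = [intercept + slope * xi for xi in x]

# Residual = observed - predicted.
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

# With a single x variable, R^2 is the square of r.
r_squared = r ** 2

print("slope:", slope, "intercept:", intercept)
print("r:", r, "R^2:", r_squared)
print("residuals:", residuals)
```

The sign of r matches the sign of the slope, and the residuals of a least squares line with an intercept add up to (approximately) zero.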

Chapter 4 Confidence Intervals
● Answers a question using numbers
● Uses data collected from a sample to estimate a population parameter
● A range of numbers that is stated with a level of confidence
● When sample size increases, the confidence interval becomes narrower
● 95% confidence interval
○ Sample statistic +/- 2(SE)
○ Sample statistic +/- margin of error
● Interpretation: “we are 95% confident that the population parameter is between a and b”
○ Ex. is there evidence of a positive correlation between height and weight? (.410, .55)
■ Yes, because the entire interval is greater than 0
○ Ex. is there evidence that the proportion of females who always wear their seatbelt is different from .65? (.612, .668)
■ .65 is contained in the confidence interval so we cannot conclude that the population proportion is different from .65
○ Ex. is there evidence that the mean IQ score at this school is different from the national average of 100? (96.6, 106.4)
■ 100 is within the confidence interval so there is not evidence that the mean is different from 100
Sampling distributions
● Distribution of sample statistics with a mean approximately equal to the mean in the original distribution and a standard deviation known as the standard error
● StatKey:
○ For sampling distributions of quantitative variables → click mean
○ For sampling distributions of categorical variables → click proportion
○ Ex. proportion
■ Pick dataset or edit proportion
■ Change sample size if needed
■ Generate 5000 samples
■ Box inside graph: displays the number of samples and the mean/SE of the SAMPLE
● As sample size increases, the variability of the sampling distribution decreases
○ So SE decreases
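
The sampling distribution idea above can be sketched in Python by simulating many samples and taking the standard deviation of the sample statistics as the SE. This is only a rough stand-in for the StatKey steps, not how StatKey itself is implemented; the population proportion of 0.5, the sample sizes, the seed, and the function names are all assumptions chosen for the example. The last lines reuse the three confidence intervals from the examples above.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples=5000, seed=0):
    """Draw many samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


def standard_error(sample_stats):
    """SE = standard deviation of the sampling distribution."""
    return statistics.stdev(sample_stats)


def contains(interval, value):
    """True if value lies inside the confidence interval (low, high)."""
    low, high = interval
    return low <= value <= high


# Same (hypothetical) population proportion, two different sample sizes.
se_small_n = standard_error(simulate_sample_proportions(0.5, n=25))
se_large_n = standard_error(simulate_sample_proportions(0.5, n=400))
print("SE with n=25:", se_small_n)
print("SE with n=400:", se_large_n)

# Intervals from the examples in the notes.
print(contains((0.410, 0.55), 0))      # 0 is outside: evidence of a positive correlation
print(contains((0.612, 0.668), 0.65))  # .65 is inside: cannot conclude a difference
print(contains((96.6, 106.4), 100))    # 100 is inside: no evidence of a difference
```

The larger sample size gives the smaller SE, which is why the confidence interval (statistic +/- 2(SE)) gets narrower as the sample size grows.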

Bootstrapping
● A resampling procedure for constructing a sampling distribution using data from a sample (population values are not known)
● Each bootstrap sample must have the same sample size as the original and is drawn with replacement from the values in the original sample
● Bootstrap confidence interval
○ 95% CI: statistic +/- 2(SE)
○ Use the original sample statistic and the standard error from the bootstrap distribution
● Bootstrap confidence interval in StatKey:
○ Click bootstrap CI
○ Select dataset/input number
○ (One dot represents one bootstrap sample)
○ Generate 5000 samples
○ Click two tail → input confidence level

Chapter 5 Hypothesis tests - Answers a yes/no question
P values
● Used to evaluate statistical significance
● P value is calculated assuming the null is true
● The smallest possible p value is 0
● P value < alpha
○ P value is smaller
○ Reject the null hypothesis; there is enough evidence (there’s a difference in the population)
○ Results are statistically significant
● P value > alpha
○ P value is larger
○ Fail to reject the null; there is not enough evidence
○ Results are not statistically significant
Null Hypothesis
● The statement that there is not a difference in the population
● Ho
● Always =
● Use population parameters
Alternative Hypothesis
● The statement that there’s some difference in the population
● Ha
● Use population parameters (mu, p, rho)
● Not equal, <, or >
○ Not equal is two tailed
○ < is left tailed
○ > is right tailed
Randomization Procedures
● Centered around the null
○ N (null value, SE)
● StatKey
○ Click randomization test
○ Input data and edit null hypothesis if needed
○ Generate 5000 samples
○ Click two tail, left tail, or right tail
○ Edit the bottom number to match the original sample statistic
○ Two tails
■ Add the two top values together to get the p value

Chapter 6 Errors
● Type I error
○ Rejecting Ho thinking it’s false when Ho is actually true
○ False positive
○ The probability of a Type I error is equal to alpha, so it does not depend on sample size
● Type II error
○ Not rejecting Ho because you think it’s true but Ho is actually false
○ False negative
○ If sample size increases, you’re less likely to make a Type II error
■ This is because SE decreases, so when Ho is false the observed statistic tends to be further away from the null (in SE units) and the p value tends to be smaller → more likely to reject the null

Chapter 7 Standard Normal Distribution
● Bell shaped and symmetric
● Mean of 0 and standard deviation of 1
● Aka z distribution
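
The two resampling procedures described above (the bootstrap confidence interval from Chapter 4 and the randomization test from Chapter 5) can be sketched in Python as follows. This is a simplified sketch, not StatKey's actual implementation: the data values, the counts (36 out of 50), the null proportion of 0.5, alpha = 0.05, the seed, and the function names are all assumptions made up for the example.

```python
import random
import statistics


def bootstrap_ci_mean(sample, num_samples=5000, seed=0):
    """Rough 95% bootstrap CI for a mean: statistic +/- 2(SE)."""
    rng = random.Random(seed)
    n = len(sample)
    boot_means = []
    for _ in range(num_samples):
        # Same size as the original sample, drawn with replacement.
        resample = rng.choices(sample, k=n)
        boot_means.append(sum(resample) / n)
    se = statistics.stdev(boot_means)   # SE from the bootstrap distribution
    stat = sum(sample) / n              # original sample statistic
    return stat - 2 * se, stat + 2 * se, se


def randomization_p_value(observed_count, n, null_p, num_samples=5000, seed=0):
    """Two-tailed randomization p value for a single proportion."""
    rng = random.Random(seed)
    observed_distance = abs(observed_count / n - null_p)
    extreme = 0
    for _ in range(num_samples):
        # Simulate one sample assuming the null hypothesis is true.
        simulated = sum(rng.random() < null_p for _ in range(n)) / n
        # Count samples at least as far from the null as the observed one
        # (tiny tolerance guards against floating point rounding).
        if abs(simulated - null_p) >= observed_distance - 1e-12:
            extreme += 1
    return extreme / num_samples


# Hypothetical sample of 20 quantitative measurements.
sample = [12, 15, 9, 20, 14, 17, 11, 13, 16, 18,
          10, 14, 15, 19, 12, 13, 17, 16, 14, 15]
ci_low, ci_high, boot_se = bootstrap_ci_mean(sample)
print("95% bootstrap CI for the mean:", ci_low, ci_high)

# Hypothetical test of Ho: p = 0.5 vs Ha: p != 0.5 with 36 successes in 50.
alpha = 0.05
p_value = randomization_p_value(36, 50, null_p=0.5)
print("p value:", p_value)
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```

Counting both tails in one pass plays the same role as adding the two tail values together in StatKey for a two-tailed test.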

Z scores
● Distance between an individual score and the mean in standard deviation units
● z = (score - mean) / SD
● Score, mean, and SD are all from the original distribution
Central Limit Theorem
Test statistic = (sample statistic - null parameter) / standard error
● Or slope / SE (for a regression slope when the null value is 0)
Confidence Interval
● Sample statistic +/- z* (SE)
● The z* multiplier can be found by constructing a z distribution in Minitab
● Or use StatKey

○...
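
As an alternative to looking up values in Minitab or StatKey, the Chapter 7 formulas above can be sketched with Python's standard library. The sample statistic, null value, and SE below are made-up numbers, the function names are only illustrative, and the p value calculation assumes the test statistic follows the z distribution.

```python
from statistics import NormalDist

# Standard normal (z) distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def z_multiplier(confidence_level):
    """z* such that the middle `confidence_level` area lies within +/- z*."""
    tail_area = (1 - confidence_level) / 2
    return standard_normal.inv_cdf(1 - tail_area)


def compute_test_statistic(sample_stat, null_value, se):
    """Test statistic = (sample statistic - null parameter) / standard error."""
    return (sample_stat - null_value) / se


def two_tailed_p_value(z):
    """Area in both tails beyond |z| under the z distribution."""
    return 2 * (1 - standard_normal.cdf(abs(z)))


# Hypothetical values for illustration only.
sample_stat, null_value, se = 0.72, 0.50, 0.07

z = compute_test_statistic(sample_stat, null_value, se)
p_value = two_tailed_p_value(z)

z_star = z_multiplier(0.95)  # close to the "2" used in statistic +/- 2(SE)
ci = (sample_stat - z_star * se, sample_stat + z_star * se)

print("test statistic:", z)
print("two-tailed p value:", p_value)
print("z* for 95%:", z_star)
print("95% CI:", ci)
```

The z* value for 95% confidence is slightly below 2, which is why statistic +/- 2(SE) works as a quick approximation of the 95% interval.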

