Stats 250 Study Guide Test 1 PDF

Title Stats 250 Study Guide Test 1
Author John Bergeson
Course Introduction to Statistics and Data Analysis
Institution University of Michigan
Pages 5
File Size 183.3 KB
File Type PDF



Description

Statistics 250 Exam 1 Review Sheet

Topics Covered:
1. Understanding Data
2. Sampling
3. Probability
4. Random Variables
5. Confidence Intervals & Hypothesis Testing

Understanding Data

BASIC DEFINITIONS
• Statistics: A collection of procedures and principles for gathering data and analyzing information in order to help people make decisions
• Variable: A characteristic that differs from one individual to the next
  o Categorical: Places an individual into one of several groups or categories
    ▪ Pie Graph
    ▪ Bar Chart
    ▪ Frequency Tables
  o Quantitative: Takes numerical values for which arithmetic operations make sense
    ▪ Histogram
    ▪ Q-Q Plots
    ▪ Time Plots
    ▪ Box Plots
    ▪ Scatter Plots
• A numerical summary of a population is referred to as a parameter (μ, σ)
• A numerical summary of a sample is referred to as a statistic (x̄, s)

HOW TO INTERPRET DISTRIBUTIONS
• Shape (Approximately Symmetric, Skewed, Bell-Shaped, Uniform)
• Location (Center, Average) = Mean, Median
• Spread (Variability) = Range, Standard Deviation
• Outliers = Data points not consistent with the bulk of the data

BOX PLOTS & IQR
• Box Plot: A graphical representation of the five-number summary
  o Five-Number Summary: Min, Q1, Median, Q3, Max
• IQR: Q3 - Q1
• Outliers: Any value in the data set that is below the lower bound or above the upper bound. In a Modified Box Plot, the outliers are drawn as circles, separate from the box plot
  o Lower Bound: Q1 - (1.5)(IQR)
  o Upper Bound: Q3 + (1.5)(IQR)
• Pros of Box Plot: Good for comparison and shows outliers
• Cons of Box Plot: Can NOT confirm shape
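The five-number summary and the 1.5 × IQR outlier bounds can be sketched in a few lines of Python. This is only an illustration: the data set `scores` is made up, and the quartiles use the "inclusive" convention of Python's `statistics.quantiles`, which is an assumption here. Textbooks and calculators may compute Q1 and Q3 slightly differently.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    values = sorted(data)
    # n=4 asks for the three quartile cut points: Q1, median, Q3.
    # method="inclusive" is one common convention (an assumption here).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]

def outlier_bounds(q1, q3):
    """Lower and upper bounds: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical exam scores, used only for illustration.
scores = [52, 61, 64, 68, 70, 71, 73, 75, 78, 99]

summary = five_number_summary(scores)
lower, upper = outlier_bounds(summary[1], summary[3])
outliers = [x for x in scores if x < lower or x > upper]

print("Five-number summary:", summary)
print("Bounds:", lower, upper)
print("Outliers:", outliers)
```

With this data only the score of 99 falls outside the bounds, so a modified box plot would draw it as a separate circle.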

BELL SHAPED DISTRIBUTION
• Empirical Rule: 68 – 95 – 99.7
  o About 68% of values fall within 1 standard deviation of the mean
  o About 95% of values fall within 2 standard deviations of the mean
  o About 99.7% of values fall within 3 standard deviations of the mean
• Z-Score: The distance between the observed value and the mean, measured in number of standard deviations
  o Z = (observed value - mean) / standard deviation
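As a quick check of the z-score formula and the empirical rule, the sketch below uses Python's `statistics.NormalDist` to compute the share of a normal distribution within 1, 2, and 3 standard deviations of the mean. The observed value, mean, and standard deviation in the z-score example are made-up numbers.

```python
from statistics import NormalDist

def z_score(observed, mean, sd):
    """Distance from the mean, measured in standard deviations."""
    return (observed - mean) / sd

def within_k_sd(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Hypothetical example: a score of 85 when the mean is 70 and the SD is 10.
print("z =", z_score(85, 70, 10))

for k in (1, 2, 3):
    print(f"Within {k} SD: {within_k_sd(k):.4f}")
```

The three printed proportions round to the 68%, 95%, and 99.7% quoted in the empirical rule.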

Sampling

BASIC DEFINITIONS
• Descriptive Statistics: “To Describe” – Describing data using numerical summaries and graphical summaries
• Inferential Statistics: “To Decide” – Using sample information to make conclusions about a larger group of items/individuals than just those in the sample
• Population: The entire group of items/individuals that we want information about, and about which inferences are made
• Sample: The smaller group, the part of the population we actually examine in order to gather information

FUNDAMENTAL RULE FOR USING DATA FOR INFERENCE
• Available data can be used to make inferences about a much larger group if the data can be considered to be representative with regard to the question of interest
• Random Selection

BIAS: HOW SURVEYS CAN GO WRONG
• Bias: Systematic deviation from the truth, producing values too high or too low
  o Selection Bias: Occurs if the method for selecting the participants produces a sample that does not represent the population of interest
  o Nonresponse Bias: Occurs when a representative sample is chosen for a survey, but a subset cannot be contacted or does not respond
  o Response Bias: Occurs when participants respond differently from how they truly feel. The way questions are worded, the way the interviewer behaves, and many other factors might lead an individual to provide false information

OBSERVATIONAL OR EXPERIMENTAL
• Observational Studies: The researchers simply observe or measure the participants (about opinions, behaviors, or outcomes) and do not assign any treatments or conditions. Participants are not asked to do anything differently.
• Experiments: The researchers manipulate something and measure the effect of the manipulation on some outcome of interest. Participants are randomly assigned to the conditions or treatments. The goal is to measure the effect of the explanatory variable on the response variable.
• Confounding Variable: A variable that affects the response variable and is also related to the explanatory variable. Its effect on the response cannot be separated from the effect of the explanatory variable.

Probability

RULES OF PROBABILITY
• Complement Rule: P(A^C) = 1 - P(A)
• Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
• Multiplication Rule: P(A and B) = P(A) x P(B | A)
• Conditional Probability: P(A | B) = P(A and B) / P(B)

DEFINITIONS
• Mutually Exclusive (or Disjoint): Two events are mutually exclusive if they do not contain any of the same outcomes; no intersection.
  o P(A or B) = P(A) + P(B)
• Independent: Two events are independent if knowing that one occurs does not change the probability that the other occurs.
  o P(A | B) = P(A)
  o P(A and B) = P(A) x P(B)

SAMPLING AND REPLACEMENT IN TERMS OF PROBABILITY
• If a sample is drawn from a very large population, the distinction between sampling with and without replacement becomes unimportant
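The rules above can be checked on a small example. The sketch below assumes one roll of a fair six-sided die, with A = "even number" and B = "greater than 3"; the events are chosen only for illustration. Exact fractions avoid rounding issues.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (an illustrative assumption).
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}  # even number
B = {4, 5, 6}  # greater than 3

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(A & B)

p_not_a = 1 - p_a                 # complement rule
p_a_or_b = p_a + p_b - p_a_and_b  # addition rule
p_a_given_b = p_a_and_b / p_b     # conditional probability
p_b_given_a = p_a_and_b / p_a

# Independence would require P(A | B) = P(A).
independent = p_a_given_b == p_a

print("P(not A) =", p_not_a)
print("P(A or B) =", p_a_or_b)
print("P(A | B) =", p_a_given_b)
print("P(A) x P(B | A) =", p_a * p_b_given_a, "and P(A and B) =", p_a_and_b)
print("Independent?", independent)
```

Here P(A | B) = 2/3 differs from P(A) = 1/2, so A and B are not independent, and they are not mutually exclusive either because they share the outcomes 4 and 6.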

Random Variables

DEFINITIONS
• Random Variable: Assigns a number to each outcome of a random circumstance, or equivalently, assigns a number to each unit in a population
  o Discrete Random Variable: Has a finite or countable number of possible outcomes
    ▪ Binomial: Number of successes that occur in a sample
  o Continuous Random Variable: Can take any value in an interval or collection of intervals
    ▪ Uniform
    ▪ Normal
  o Rules for Means:
    ▪ Mean (X+Y) = Mean (X) + Mean (Y)
    ▪ Mean (X-Y) = Mean (X) - Mean (Y)
  o Rules for Variances (special case: X and Y independent):
    ▪ Variance (X+Y) = Variance (X) + Variance (Y)
    ▪ Variance (X-Y) = Variance (X) + Variance (Y)

DISCRETE RANDOM VARIABLES
• Conditions: Must always apply to the probabilities for Discrete Random Variables
  1. The sum of all of the individual probabilities must equal 1
  2. The individual probabilities must be between 0 and 1
• Probability Distribution Function (PDF): A table or rule that assigns probabilities to the possible values of X
• Cumulative Distribution Function (CDF): A table or rule that provides the probabilities P(X ≤ k) for any real number k
• Graphical Display of Data: Stick Graph (Bar graph look-alike)
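To make the PDF and CDF definitions concrete, here is a minimal sketch for a made-up discrete random variable: X = number of heads in two flips of a fair coin. The table `pdf` and the helper names are illustrative, not part of the course material.

```python
from fractions import Fraction

# PDF as a table: possible value -> probability (two fair coin flips).
pdf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def is_valid_pdf(table):
    """Check both conditions: each probability in [0, 1], total equal to 1."""
    each_ok = all(0 <= p <= 1 for p in table.values())
    return each_ok and sum(table.values()) == 1

def cdf(table, k):
    """P(X <= k) for any real number k."""
    return sum(p for x, p in table.items() if x <= k)

print("Valid PDF?", is_valid_pdf(pdf))
for k in (-1, 0, 1, 1.5, 2):
    print(f"P(X <= {k}) =", cdf(pdf, k))
```

The CDF is defined for every real k, not just the possible values of X, which is why P(X ≤ 1.5) equals P(X ≤ 1).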

EXPECTATIONS FOR DISCRETE RANDOM VARIABLES
• Expected Value: E(X) is the mean value that would be obtained from an infinite number of observations on the random variable
  o μ = E(X) = Σ(x_i p_i)
• Standard Deviation: σ = √(Σ(x_i - μ)^2 p_i)

BINOMIAL RANDOM VARIABLES
• Conditions:
  1. There are n “trials” where n is determined in advance and is not a random value
  2. There are two possible outcomes on each trial, called “success” (S) and “failure” (F)
  3. The outcomes are independent from one trial to the next
  4. The probability of a “success” remains the same from one trial to the next, and this probability is denoted by p. The probability of a failure is 1 - p for every trial
• Probability of exactly k successes in n trials:
  o P(X = k) = (n choose k) p^k (1 - p)^(n - k)
    ▪ n choose k = (n!)/(k!(n-k)!)
    ▪ Calculator Trick: # (n) → Math → Prob → nCr → # (k)
  o The normal approximation can be used instead when both np and n(1-p) are at least 10
• Mean: μ = E(X) = np
• Standard Deviation: σ = √(np(1 - p))

GENERAL CONTINUOUS RANDOM VARIABLE
• A continuous random variable is described by a density curve
  o Key Idea: Area under a density curve over a range of values corresponds to the probability that the random variable X takes on a value in that range
• Conditions: A curve is called a probability density curve if
  1. It lies on or above the horizontal axis
  2. Total area under the curve is equal to 1
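The binomial formulas can be tried out with a short Python sketch. The values n = 10 and p = 0.5 are arbitrary choices for illustration. The code also recomputes the mean and standard deviation from the general definitions Σ(x_i p_i) and √(Σ(x_i - μ)^2 p_i) to show that they agree with np and √(np(1 - p)).

```python
import math

def binomial_prob(k, n, p):
    """P(X = k) = (n choose k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mean_sd(n, p):
    """Shortcut formulas: mean np and SD sqrt(np(1 - p))."""
    return n * p, math.sqrt(n * p * (1 - p))

def normal_approx_ok(n, p):
    """Rule of thumb: both np and n(1 - p) are at least 10."""
    return n * p >= 10 and n * (1 - p) >= 10

n, p = 10, 0.5  # illustrative values
prob_3 = binomial_prob(3, n, p)  # 120 / 1024
mean, sd = binomial_mean_sd(n, p)

# Same quantities from the general discrete definitions.
values = range(n + 1)
mean_def = sum(k * binomial_prob(k, n, p) for k in values)
sd_def = math.sqrt(sum((k - mean_def) ** 2 * binomial_prob(k, n, p) for k in values))

print("P(X = 3) =", prob_3)
print("mean, sd (shortcut):", mean, sd)
print("mean, sd (definition):", mean_def, sd_def)
print("Normal approximation OK for n=10?", normal_approx_ok(10, 0.5))
print("Normal approximation OK for n=100?", normal_approx_ok(100, 0.5))
```

With n = 10 and p = 0.5, np = 5 is below 10, so the normal approximation condition fails; with n = 100 it holds.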

Confidence Intervals and Hypothesis Testing

DEFINITIONS
• Confidence Interval: A range of values that the researcher is fairly confident will cover the true, unknown value of the population parameter.
• Hypothesis Testing: Uses sample data to attempt to reject a hypothesis about the population. Hypothesis testing proceeds by obtaining a sample, computing a sample statistic, and assessing how unlikely the sample statistic would be if the null parameter value were correct. Researchers are trying to show that the null value is not correct.

CONFIDENCE INTERVALS
• General Format: Sample Estimate ± (Z*)(Standard Error)
  o How “confident” we want to be changes Z*
  o Standard Error = standard deviation with p-hat rather than p, since it is an estimate
• Common confidence levels are 90%, 95%, 98%, 99%
  o 95% is most common
• Conditions:
  1. The sample is a randomly selected sample from the population. Available data can be used to make inferences about a much larger group if the data can be considered to be representative with regard to the question of interest.
  2. n is large enough so the normal approximation holds true (np and n(1-p) ≥ 10)
• Formulas (on Yellow Card) that you should be aware of:
  o Conservative Margin of Error: m = (Z*) / (2√n)
  o Sample Size: n = [(Z*) / (2m)]^2
• Important Notes: If 50% is included in the interval, you cannot say one is more likely than the other.
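A minimal sketch of these formulas for a population proportion is given below. It assumes a made-up poll in which 520 of 1000 respondents say yes, and it gets Z* from the standard normal distribution through Python's `statistics.NormalDist` rather than from a table. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Multiplier Z* that leaves (1 - confidence)/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_interval(successes, n, confidence=0.95):
    """Sample Estimate +/- (Z*)(Standard Error), with p-hat in the SE."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star(confidence) * standard_error
    return p_hat - margin, p_hat + margin

def conservative_margin(n, confidence=0.95):
    """m = Z* / (2 * sqrt(n))."""
    return z_star(confidence) / (2 * math.sqrt(n))

def required_sample_size(margin, confidence=0.95):
    """n = [Z* / (2m)]^2, rounded up to a whole number."""
    return math.ceil((z_star(confidence) / (2 * margin)) ** 2)

# Hypothetical poll: 520 "yes" answers out of 1000.
low, high = proportion_interval(520, 1000)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"Conservative margin for n=1000: {conservative_margin(1000):.4f}")
print("Sample size for a 3% margin:", required_sample_size(0.03))
```

In this made-up poll the interval contains 0.50, so by the note above one could not claim that "yes" is the more likely answer in the population.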

HYPOTHESIS TESTING
• Basic Steps:
  1. Determine the null and alternative hypotheses (Ho & Ha)
  2. Verify the necessary data conditions and, if met, summarize the data into an appropriate test statistic
  3. Assuming the null is true, find the p-value
  4. Decide whether or not the result is “statistically significant” based on the p-value
     ▪ If p-value ≤ alpha, then the result IS statistically significant
       o Reject the null if it is statistically significant
     ▪ “Since our p-value of ___ is smaller/larger than ___, we reject / fail to reject Ho, so it is / isn’t statistically significant”
  5. Report the conclusion in the context of the situation
• Data is summarized in a test statistic, a standardized statistic that measures the distance between the sample statistic and the null value in standard error units
  o Test Statistic = (Sample Statistic - Null Value) / Null Standard Error

TWO TYPES OF ERRORS
• Type 1 Error: Rejecting Ho when Ho is true (α)
• Type 2 Error: Failing to reject Ho when Ha is true (β)
  o Power = 1 - β
• Conditions that affect power:
  o Sample Size: a larger sample size leads to higher power
  o Significance Level: a larger α leads to higher power
  o Actual Parameter Value: a true value that falls further from the null value (in the direction of the alternative hypothesis) leads to higher power (however, this is not something that the researcher can control or change)
• We are most concerned with type 1...
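The basic steps can be sketched for one specific case: a one-sided z-test for a population proportion. The choice of test, the made-up data (520 successes in 1000 trials), the null value 0.5, and α = 0.05 are all assumptions for illustration; the review sheet itself only gives the general test statistic formula.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, null_value, alpha=0.05):
    """One-sided test of Ho: p = null_value against Ha: p > null_value."""
    p_hat = successes / n
    # The null standard error uses the null value, not p-hat.
    null_se = math.sqrt(null_value * (1 - null_value) / n)
    z = (p_hat - null_value) / null_se
    # p-value: chance of a statistic at least this large if Ho is true.
    p_value = 1 - NormalDist().cdf(z)
    decision = "reject Ho" if p_value <= alpha else "fail to reject Ho"
    return z, p_value, decision

# Hypothetical data: 520 successes in 1000 trials, Ho: p = 0.5.
z, p_value, decision = proportion_z_test(520, 1000, 0.5)
print(f"Test statistic z = {z:.3f}")
print(f"p-value = {p_value:.3f}")
print("Decision at alpha = 0.05:", decision)
```

Here the p-value is larger than 0.05, so the result is not statistically significant and Ho is not rejected. Step 5 would then state that conclusion in the context of the poll.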

