Stats Cheat Sheet 2 - Lecture notes all

Author Chloe Battelle
Course Introduction To Probability And Statistics
Institution University of California, Berkeley

Summary

Cheat Sheet for final...


Description

Some Valid Rules of Reasoning
• A or not A. (Law of the Excluded Middle)
• Not (A and not A).
• A. Therefore, A or B.
• A. B. Therefore, A and B.
• A and B. Therefore, A.
• Not A. Therefore, not (A and B).
• A or B. Not A. Therefore, B. (Denying the Disjunct)
• Not (A and B). Therefore, (not A) or (not B). (de Morgan)
• Not (A or B). Therefore, (not A) and (not B). (de Morgan)
• If A then B. A. Therefore, B. (Affirming the Precedent)
• If A then B. Not B. Therefore, not A. (Denying the Consequent)

Common Formal Fallacies
• A or B. Therefore, A. (If B is true, A or B is true; it could be B.)
• A or B. A. Therefore, not B. (Affirming the Disjunct)
• Not both A and B are true. Not A. Therefore, B. (Denying the Conjunct; both could be false)
• If A then B. B. Therefore, A. (Affirming the Consequent)
• If A then B. Not A. Therefore, not B. (Denying the Antecedent)
• If A then B. C. Therefore, B. (Nonsequitur of Evidence; C sounds like A)
• If A then B. Not C. Therefore, not A. (Nonsequitur of Relevance; B sounds like C)
• If A then B. A. Therefore, C. (Nonsequitur of Relevance; C sounds like B)
• If A then B. Not B. Therefore, not C. (Nonsequitur of Relevance)

• Valid: deductive reasoning that is mathematically correct; when the premises are true, the conclusion must be true
• Fallacious: deductive reasoning that is incorrect
• Sound: reasoning that is valid and based on true premises (valid & unsound = factually incorrect because one of the premises is false)

2 Types of Reasoning
1. Inductive: inherently uncertain; generalizes from experience (requires correct deductive reasoning)
2. Deductive (AKA "logic"): thinking mathematically

Common Fallacies of Relevance
• Positively relevant: adds weight to an assertion (definition, not a fallacy)
• Ad Hominem (personal attack): attack the person rather than the reasoning
• Bad Motive: address the motives of a person in order to attack them
• Tu Quoque (look who's talking): a person is wrong because they are a hypocrite
• Two Wrongs Make a Right: it is fine to do something because someone else did
• Ad Misericordiam (appeal to pity): pleading with extenuating circumstances
• Ad Populum (bandwagon): it is moral because it is common; not everyone can be wrong
• Straw Man: attack a more vulnerable claim as if that refutes the original
• Red Herring: distraction from the real topic
• Ad Baculum (appeal to force; a nonsequitur of evidence): if you do/don't do something, something bad will happen

Common Fallacies of Evidence
• Inappropriate appeal to authority: All animals with rabies go crazy. Jessie says my cat has rabies. Thus, my cat will go crazy. (If A then B. C. Therefore, B.)
• Appeal to ignorance: lack of evidence that a statement is false is not evidence that the statement is true
• False dichotomy: starts with a premise that is an artificial "either-or" when it is possible to do both
• Loaded question: "Did you know that the sun goes around the Earth?" — a statement that presupposes something
• Questionable cause: post hoc ergo propter hoc (after this, therefore because of this); giving coincidences special significance
• Slippery slope: If A then B. If B then C. If C then D, etc. Eventually, Z. So you must prevent A.
• Equivocation: uses the fact that a word can have more than one meaning
• Hasty generalization: Some x are (sometimes) A. Therefore, most x are (always) A. The sample could be biased.
• Weak analogy: X is similar to Y in some regards. Therefore, everything that is true for X is true for Y.
• Inconsistency: "Nobody goes there anymore. That place is too crowded." (A. Not A.)

Informal Fallacies (errors in reasoning)
• Non sequitur of relevance: He says X is true. He does Y. Anyone who does Y is a bad person. Therefore, X is false. (If A then B. A. Therefore C. — needs "If B, then C")
• Non sequitur of evidence: All Ys are Zs. Mary says X is a Y. Therefore X is a Z. (Needs the added premise: if Mary says X is a Y, then X is a Y.)

Chi-Square Test: χ² = Σ (X − E(X))²/E(X) — compute (observed − expected)²/expected for each term, then add them all together.

Types of data: qualitative = ordinal (hot, warm, cold); quantitative = discrete (countable, e.g., annual number of sunny days) or continuous (no minimum spacing between values, e.g., temperature); categorical (gender, zip code, type of climate).

Histograms (base = class interval; area = fraction of data)
• Area of bin = fraction of data in the class interval = (# observations in class interval) / (total # of observations)
• Height of bin = (relative frequency) / (width of class interval) = (fraction of data in the class interval) / (width of class interval)

Frequency Tables: list the frequency (number) or relative frequency (fraction) of observations that fall in various class intervals, based on a chosen endpoint convention (usually include the left boundary and exclude the right).

Skewness and modes
• Skew right: more data to the left of center → mean larger than median
• Skew left: more data to the right of center → median larger than mean
• Unimodal: consists of only one "bump" (data are usually multimodal/bimodal)

Estimating percentiles from histograms
• 25th percentile = smallest number that is at least as large as 25% of the data
• 50th = smallest number that is at least as large as half the data
• 75th = smallest number that is at least as large as 75% of the data
• In general: choose the smallest number at least as big as the given percentage of the data
• pth percentile: the approximate point on the horizontal axis such that the area under the histogram to the left of that point is p%

Probability of winning by switching (general Monty Hall with n doors): (n − 1)/(n(n − 2)) > 1/n, the chance of winning by staying.

Median (data in order): at least half the data are equal to or smaller than the median, and at least half are equal to or larger than it. ***Choose the left of the two middle numbers when there is an even amount of data.***
• Histogram: the median is where the area is split evenly in half
• Harder to skew: you must corrupt half the data to make the median arbitrarily large or small
• Ex: whether a country is affluent; typical salary at a job

Mode = the highest bump of the histogram; the most common value (if every value occurs once, all the numbers are modes).

Mean = where the histogram would balance; (sum of data) / (# of data); has the smallest RMS difference from the data.
• Changing one datum can make the mean arbitrarily large or small
• Ex: how much a family can afford to spend on housing

IQR = upper quartile (75th percentile) − lower quartile (25th percentile) = spread of the middle 50%; resistant/insensitive to extremes; IQR = 0 if at least half the numbers in the list are equal.

Root mean square (rms) = sqrt((sum of the squares of the entries) / (number of entries))
Standard deviation = sqrt((Σ(xi − mean)²) / n)

Affine Transformations (new list = a × (original) + b)
• mean = 0 → SD = RMS
• Mode/Median/Mean = a × (mode/median/mean of original) + b
• Range/SD = |a| × (range/SD of original) ← not affected by b
• IQR = a × (IQR of original) if a is positive
• SD = 0 → all numbers in the list are equal, so IQR = 0 and range = 0

Markov's Inequality (for lists)
• If the mean of a list of numbers is M, and the list contains no negative number, then [the fraction of numbers in the list that are greater than or equal to x] ≤ M/x (multiply by n for the actual count).

Chebychev's Inequality (for lists)
• If the mean of a list of numbers is M and the standard deviation of the list is SD, then for every positive number k, [the fraction of numbers in the list that are k×SD or further from M] ≤ 1/k².
• Inside the range is at least (1 − 1/k²); outside the range is at most 1/k².
• Use whichever inequality produces the smaller number (the more restrictive bound).
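As a quick sanity check of the two inequalities above, here is a minimal Python sketch (not part of the original notes; the data list is made up for illustration):

```python
# Check the list versions of Markov's and Chebychev's inequalities.
import math

data = [0, 1, 1, 2, 3, 5, 8, 13]   # any list of non-negative numbers

n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((v - mean) ** 2 for v in data) / n)   # population SD (divide by n)

# Markov (needs no negative numbers): fraction >= x is at most mean / x
x = 10
frac_ge_x = sum(1 for v in data if v >= x) / n
assert frac_ge_x <= mean / x

# Chebychev: fraction k*SD or further from the mean is at most 1/k^2
k = 2
frac_far = sum(1 for v in data if abs(v - mean) >= k * sd) / n
assert frac_far <= 1 / k ** 2

print(f"mean={mean:.3f} SD={sd:.3f} Markov bound={mean/x:.3f} actual={frac_ge_x:.3f} "
      f"Chebychev bound={1/k**2:.3f} actual={frac_far:.3f}")
```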
Point of averages (red square) = a measure of the center of a scatterplot: (mean(x), mean(y)).
Linearity (the plot clusters around a straight line) vs. nonlinearity (the plot is curved).
Homoscedasticity (equal scatter in every vertical slice) vs. heteroscedasticity (the scatter differs depending on where you take the slice).

Outliers = points that do not fit the rest of the data, many SDs away.
Association: a property of 2 or more variables (not the same as causation); when variables are associated, the scatter of Y in a vertical slice is smaller than the overall SD of Y.
• Negative association: larger-than-average values of one variable go with smaller-than-average values of the other.
Correlation coefficient: r = (X1·Y1 + X2·Y2 + … + Xn·Yn)/n, where X and Y are in standard units.
• Measures how nearly the data fall on a straight line. A nonlinear curve is badly summarized by r: even if the association is strong, if it is nonlinear, r can be small or 0. If all the points lie on a straight line with positive slope, r = 1. Even if two variables are perfectly correlated, that does not mean there is a causal connection.
• −1 ≤ r ≤ 1 always (r < 0 for a negatively sloped line, r > 0 for a positively sloped line); r = 0 if the data do not cluster along a straight line.
Standard units: SU = (original value − mean)/SD, or (X − E(X))/SE(X) for random variables. A list converted to standard units has mean 0 and SD 1. Original value = (value in SU)×SD + mean.
• To find a normal approximation: convert to standard units and find the area under the curve between the two points.
• Secular trend: a linear association (trend) with time.
• r is not a good summary if (1) the association is nonlinear, (2) the data are heteroscedastic, or (3) there are outliers.
Football-shaped scatterplots work well with r and are summarized well by mean(x), mean(y), SD(x), SD(y).
The SD Line
• Goes through the point of averages (a single point)
• Has slope SDy/SDx if the correlation coefficient r is greater than or equal to zero; −SDy/SDx if r is negative
• When r > 0, most values of Y are above the SD line to the left and below the SD line to the right; when r < 0, the reverse holds
The Regression Line
• Vertical residual = difference between the value of Y and the height of the regression line: (measured Y) − (estimated Y)
• Equation of the regression line: y = r·(SDy/SDx)·x + [mean(Y) − r·(SDy/SDx)·mean(X)]
• If the regression line was computed correctly, the point of averages of the residual plot will be on the x-axis and the residuals will have no trend (a horizontal line is good): the correlation coefficient between the residuals and X will be zero. If the residuals have a trend and their average is not zero, then the slope of the regression line was computed incorrectly. A residual plot shows heteroscedasticity, nonlinear association, or outliers iff the original scatterplot does.
• Special cases: r = 0 → the line is horizontal and the slope is 0; r = 1 → all points fall on a line with positive slope and the regression line equals the SD line
• Extrapolation = estimating the value of Y at an x bigger or smaller than any observed; interpolation = estimating within the actual range
Residual Plots: (x1, e1), (x2, e2), …, (xn, en)
• Tell us (1) whether it is appropriate to use a linear regression and (2) whether the regression was computed correctly
• Vertical residual: e1 = y1 − (a·x1 + b), where y1 is the measurement and (a·x1 + b) is the estimated y value on the regression line
• Points above the regression line are > 0 on the residual plot
• The e's (AKA residuals) should average to zero if the regression was done correctly, and should not have a trend
• Easier to see (1) heteroscedasticity, (2) nonlinear association (residuals positive in some ranges, negative in others), and (3) outliers on a residual plot than on a scatterplot (but only if the original had them to begin with)
RMS error of the residuals of Y against X = √(1 − r²) × SD(Y) (same units as the original, not unitless; value between 0 and SD(Y))
• Basically an SD: it is the RMS of the vertical residuals from the regression line
• Regression effect: the second score is less extreme than the first (e.g., students landing planes)
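The correlation and regression formulas above can be verified with a short Python sketch (not from the original notes; the data are invented for illustration):

```python
# Correlation and regression from summary statistics, with two checks:
# residuals average to zero, and rms error = sqrt(1 - r^2) * SD(y).
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 3.6, 4.8, 5.1]   # made-up data

n = len(xs)
mean = lambda v: sum(v) / len(v)
sd = lambda v: math.sqrt(sum((t - mean(v)) ** 2 for t in v) / len(v))

mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)

# r = average of the products of the standard units
r = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n

slope = r * sy / sx              # regression slope
intercept = my - slope * mx      # line passes through the point of averages

residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
rms_error = math.sqrt(sum(e ** 2 for e in residuals) / n)

assert abs(sum(residuals)) < 1e-9                              # average ~ 0
assert abs(rms_error - math.sqrt(1 - r ** 2) * sy) < 1e-9      # rms identity
print(f"r={r:.3f} slope={slope:.3f} intercept={intercept:.3f} rms={rms_error:.3f}")
```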
Counting
• Permutations: nPk = n!/(n − k)!
• Combinations: nCk = n!/(k!(n − k)!)
• Strategies for counting: (1) divide into smaller, non-overlapping subsets; (2) divide by 2 for double counting; (3) make a tree. (See the sketch after the set theory lists below.)

Theories of probability
• Equally likely outcomes: probability assignments depend on the assertion that no particular outcome is preferred over any other by Nature. The probability of each outcome is 100%/(number of possible outcomes). Relies on natural symmetries.
• Frequency theory: probability is the limit of the relative frequency with which an event occurs in repeated trials; repeat enough times under ideal conditions and the percentage of occurrences converges to a limit.
• The Subjective Theory: probability measures the speaker's degree of belief that the event will occur; good for one-time events. Ex: "I believe that A will happen twice as strongly as I believe that A will not happen."

Logic symbols: ! = negation ("not"); | or v = disjunction ("or"); & or ^ = conjunction ("and"); → = implies ("if p then q"); ↔ = "if and only if"; Ɐ = every or all; ᴲ = there exists (some, at least one)

Truth tables
p q   p→q   p|q   p↔q   p&q
T T    T     T     T     T
T F    F     T     F     F
F T    T     T     F     F
F F    T     F     T     F

Useful identities
• p → q = (!q → !p) = !p | q
• p ↔ q = (p & q) | (!p & !q)
• (p & p) = (p | p) = p
• p | !p = T; p & !p = F
de Morgan's Rules: !(p | q) = !p & !q; !(p & q) = !p | !q

Set symbols: ∈ = is an element of; ∩ = intersection; U = union; ⊂ = is a subset of

Set Theory Facts
• Sᶜ = {} and {}ᶜ = S; S∩A = A; SUA = S; {}∩A = {}; {}UA = A; {} ⊂ A
• Every set is a subset of itself; A∩B ⊂ A; A∩B ⊂ B; (A∩B) ⊂ A ⊂ (AUB); B ⊂ AUB
• If A⊂B and B⊂A, then A = B; if A⊂B and B⊂C, then A⊂C
• If A⊂B, then Bᶜ⊂Aᶜ; if Aᶜ = Bᶜ, then A = B
• If A⊂B, then AUB = B; A = AUB iff B⊂A; A∩B = A iff A⊂B; if AUB = A, then B⊂A and A∩B = B
• If A⊂Aᶜ, then A = {}; if AUB = {}, then B = {}
• If AUB = C and A∩C = {}, then A = {}
• If Aᶜ⊂B and B⊂C, then Cᶜ⊂A
• A∩(B∩C) = (A∩B)∩C = A∩B∩C; AU(BUC) = (AUB)UC = AUBUC (∩ and U are commutative/associative within themselves, but not with each other)
• A∩(BUC) = (A∩B)U(A∩C); AU(B∩C) = (AUB)∩(AUC)
• de Morgan: (A∩B)ᶜ = AᶜUBᶜ; (AUB)ᶜ = Aᶜ∩Bᶜ
• If x is in A∩B and x is in C, then x is in A∩C; if x is in A∩B or A∩C, then x is in A; if x is in A∩B or B∩C, then x is in B; if x is in A∩B or x is in C, then x is in AUC; if x is in A or in B∩C, then x is in AUB and in AUC; if x is in A and x is in BUC, then x is in A∩B or A∩C
• If x is not in AUB, then x is not in A and not in B; if x is not in A∩B, then x is not in A or not in B
• If every A is a B, then every non-B is a non-A; if every non-A is a non-B, then every B is an A

False Set Theory Statements (common traps)
• (AUB) ⊂ B; A ⊂ (A∩B); A ∈ A
• If A⊂B, then AUB = A; if A⊂B, then Aᶜ⊂Bᶜ
• If A∩B = {} and B∩C = {}, then A∩C = {}
• If x is not in A∩B, then x is not in A and x is not in B
• If x is not in A or x is not in B, then x is not in AUB
• If x is in A∩B or B∩C, then x is in A∩C; if x is in A∩B or A∩C, then x is in B∩C
• If x is in A or x is in BUC, then x is in AUB and AUC; if x is in AUB or x is in C, then x is in AUC and in BUC
• If every A is a B, then every non-A is a non-B; if every non-A is a non-B, then every A is a B
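Here is a minimal Python sketch (not from the notes; values chosen arbitrarily) of the counting formulas above, plus a brute-force truth-table check of de Morgan's rules:

```python
# Counting formulas and an exhaustive check of de Morgan's rules.
from itertools import product
from math import factorial

def nPk(n, k):          # permutations: order matters
    return factorial(n) // factorial(n - k)

def nCk(n, k):          # combinations: order does not matter
    return factorial(n) // (factorial(k) * factorial(n - k))

print(nPk(5, 2), nCk(5, 2))   # 20, 10

# de Morgan: !(p | q) == !p & !q  and  !(p & q) == !p | !q, for every row
for p, q in product([True, False], repeat=2):
    assert (not (p or q)) == ((not p) and (not q))
    assert (not (p and q)) == ((not p) or (not q))
```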

Exhaustive: a collection is exhaustive of A if every element of A is in at least one of the sets. Partition: break up a complicated set without double counting (the pieces are disjoint and exhaustive). Cardinality of a set = the number of elements it contains.
Axioms of Probability: (1) Chances are always at least zero: for any event A, P(A) ≥ 0. (2) The chance that something happens is 100%: P(S) = 100%. (3) If two events cannot both occur at the same time (they are DISJOINT or MUTUALLY EXCLUSIVE), the chance that either one occurs is the sum of the chances that each occurs: if AB = {}, then P(AUB) = P(A) + P(B).
Mutually exclusive: the two events cannot both occur in the same trial. The probability of their intersection is zero; the probability of their union is the sum of their probabilities; the occurrence of one is incompatible with the occurrence of the other. Ex: the probability of "A or B" is largest when A and B are mutually exclusive.
Independent: the two events can occur in the same trial. The probability of their intersection is the product of their probabilities; the probability of their union is less than the sum of their probabilities, unless at least one of the events has probability zero. A and B are independent if P(AB) = P(A)×P(B). *The only way two events can be both mutually exclusive and independent is if at least one of them has probability equal to zero.*
• If A is a subset of B, P(AB) = P(A) and P(AUB) = P(B)
• P(AB) ≤ P(A)
• P(AUB) = P(A) + P(B) − P(AB)
• 0 ≤ P(AB) ≤ P(A) ≤ P(AUB) ≤ P(A) + P(B) (with equality on the right iff disjoint)
• P(not A) = 100% − P(A)
Conditional probability of A given B: P(A|B) = P(AB)/P(B) = (P(ABC) + P(ABCᶜ))/P(B). By Bayes' rule, P(A|B) = P(B|A)×P(A) / (P(B|A)×P(A) + P(B|Aᶜ)×P(Aᶜ)).
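A small numeric sketch of Bayes' rule as stated above (the probabilities are hypothetical, chosen only for illustration):

```python
# Bayes' rule: P(A|B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|A^c)P(A^c)).
p_A = 0.01           # prior probability of A (assumed for illustration)
p_B_given_A = 0.95   # P(B|A)
p_B_given_Ac = 0.05  # P(B|A^c)

p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)   # total probability of B
p_A_given_B = p_B_given_A * p_A / p_B                # Bayes' rule

print(f"P(A|B) = {p_A_given_B:.4f}")   # about 0.1610
```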

Binomial distribution (n draws with replacement; the chance of success p must be the same in every trial)
• EV = np
• SE = √(np(1 − p))
• P(exactly k successes) = nCk × pᵏ(1 − p)ⁿ⁻ᵏ
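A minimal sketch of these binomial facts (n and p are made-up values):

```python
from math import comb, sqrt

n, p = 20, 0.3
EV = n * p                     # expected number of successes
SE = sqrt(n * p * (1 - p))     # standard error of the number of successes

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

assert abs(sum(binom_pmf(k) for k in range(n + 1)) - 1) < 1e-12  # pmf sums to 1
print(f"EV={EV} SE={SE:.3f} P(X=6)={binom_pmf(6):.4f}")
```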

The Hypergeometric Distribution: describes a random sample without replacement of size n from a population of N units. It gives, for each k, the chance that the sample sum of the labels on the tickets equals k, for a simple random sample of size n from a box of N tickets of which G are labeled "1" and the rest are labeled "0".
• EV = n(G/N)
• SE = √((N − n)/(N − 1)) × √n × √((G/N)(1 − G/N))
• P(sample sum = k) = (GCk × N−GCn−k) / NCn
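A sketch of the hypergeometric formulas above (N, G, n are made-up values):

```python
from math import comb, sqrt

N, G, n = 50, 20, 10              # N tickets, G labeled "1", sample size n

EV = n * G / N
fpc = sqrt((N - n) / (N - 1))     # finite population correction
SE = fpc * sqrt(n) * sqrt((G / N) * (1 - G / N))

def hyper_pmf(k):
    return comb(G, k) * comb(N - G, n - k) / comb(N, n)

# Probabilities over all possible sample sums add to 1
assert abs(sum(hyper_pmf(k) for k in range(0, min(n, G) + 1)) - 1) < 1e-12
print(f"EV={EV} SE={SE:.3f} P(sum=4)={hyper_pmf(4):.4f}")
```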

Geometric Distribution: the number of random draws with replacement from a 0-1 box until the first time a ticket labeled "1" is drawn is a random variable with a geometric distribution with parameter p = G/N, where G is the number of tickets labeled "1" in the box and N is the total number of tickets in the box.
• EV = 1/p
• SE = √(1 − p)/p
• P(X = x) = (1 − p)ˣ⁻¹ × p
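A quick sketch of the geometric facts above (p is an arbitrary illustration value):

```python
from math import sqrt

p = 0.25
EV = 1 / p                      # expected number of draws until the first "1"
SE = sqrt(1 - p) / p

def geom_pmf(x):                # P(first "1" appears on draw x)
    return (1 - p) ** (x - 1) * p

# P(X <= 4) two ways: summing the pmf, and 1 - P(no success in 4 draws)
assert abs(sum(geom_pmf(x) for x in range(1, 5)) - (1 - (1 - p) ** 4)) < 1e-12
print(f"EV={EV} SE={SE:.3f}")
```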

Negative Binomial Distribution: the chance that it takes k draws to get a ticket labeled "1" for the rth time.
• EV = r/p
• SE = √(r(1 − p))/p
• P(X = k) = k−1Cr−1 × pʳ⁻¹ × (1 − p)ᵏ⁻ʳ × p = k−1Cr−1 × pʳ(1 − p)ᵏ⁻ʳ

Expected Value (as n↑, the chance that the number of successes falls within a fixed range of the EV decreases, while the chance that the percentage of successes falls within a fixed range of the expected percentage increases)
• E(X) = x1×P(X = x1) + x2×P(X = x2) + x3×P(X = x3) + …
• EV of the sample sum of n random draws, with or without replacement, from a box of labeled tickets = n × (average of the labels on all the tickets in the box). (If the draws are without replacement, the number of draws cannot exceed the number of tickets in the box.)
• If the random variables X and Y are independent, then E(X×Y) = E(X)×E(Y)
• E(C) = C; E(X + C) = E(X) + C; E(CX) = C×E(X); E(X + Y) = E(X) + E(Y)

Standard Error
• Affine transformation: the SE does not depend on the additive constant b → if Y = aX + b, then SE(Y) = |a|×SE(X)
• SE(X) = (E[(X − E(X))²])^(1/2) → make a chart with columns x, P(X = x), (x − E(X))², and (x − E(X))²×P(X = x); sum the last column and take the square root
• SE(X1 + X2 + X3 + … + Xn) = (SE²(X1) + SE²(X2) + SE²(X3) + … + SE²(Xn))^(1/2) for independent X's
• f = finite population correction = √((N − n)/(N − 1))
• SE of the SAMPLE SUM of n INDEPENDENT draws = √n × SD(box) (square root law)
• SE of the SAMPLE MEAN of n INDEPENDENT draws = SD(box)/√n (square root law)
• SE of the SAMPLE SUM of a SIMPLE RANDOM SAMPLE of size n = f × √n × SD(box)
• SE of the SAMPLE MEAN of a SIMPLE RANDOM SAMPLE of size n = f × SD(box)/√n
• SE of the SAMPLE PERCENTAGE of a SIMPLE RANDOM SAMPLE of size n from a box of tickets, each labeled "0" or "1" = f × √(p(1 − p)/n), which is at most f × (50%/√n); with replacement it is √(p(1 − p))/√n

Normal Approximation: approximates a probability by the area under part of a special curve, the normal curve (in standard units). The binomial probability histogram's area (in bins) is closest to the area under the normal curve when p is close to 50% and far from 0% and 100%.
• The larger the sample size, the more accurate the normal approximation. The mean and SD of the ticket numbers do not influence how large a sample is needed; skewness does.
Central Limit Theorem: asserts that the normal approximations to the probability distributions of the sample sum and the sample mean improve as the number of draws grows, no matter what numbers are on the tickets. The normal curve approximates well if the sample size is large and p is 50% or not too close to 0% or 100%. Accuracy does not depend on the number of tickets or on their mean or SD.
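A sketch of the square root law and the finite population correction above (the box contents are made up for illustration):

```python
from math import sqrt

box = [0, 0, 1, 2, 9]    # labels on the tickets
N = len(box)
n = 3                    # sample size

avg = sum(box) / N
sd_box = sqrt(sum((t - avg) ** 2 for t in box) / N)

# With replacement (independent draws): square root law
se_sum = sqrt(n) * sd_box
se_mean = sd_box / sqrt(n)

# Simple random sample (without replacement): multiply by the fpc
f = sqrt((N - n) / (N - 1))
se_sum_srs = f * se_sum
se_mean_srs = f * se_mean

print(f"EV(sum)={n*avg} SE(sum)={se_sum:.3f} SE(sum, SRS)={se_sum_srs:.3f}")
```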
Chebychev's Inequality for Random Variables: limits the probability that a random variable differs from its EV by multiples of its SE.
• P(|X − E(X)| ≥ k×SE(X)) ≤ 1/k²
Markov's Inequality for Random Variables: limits the probability that a nonnegative random variable exceeds any multiple of its EV.
• For a > 0, P(X ≥ a) ≤ E(X)/a
Standard units for random variables: (X − E(X))/SE(X)
Continuity correction: the area under a single point is 0, so look at 0.5 below and 0.5 above the range.
The sample mean and the sample percentage (φ) are unbiased estimators.
Sample SD (divides by n − 1 before taking the square root): for a 0-1 box, s = √(φ(1 − φ)×n/(n − 1))
s* = standard deviation of the sample; relationship between s and s*: s = s*×√(n/(n − 1)), so s > s*, but they are closest when n is large.
Variance = SD². s = SD when drawing with replacement; s² > SD² when drawing without replacement.

Sample with replacement
• SE(φ) ≤ 50%/√n
• estimate of SE(φ) = s*/√n = √(φ(1 − φ))/√n
• estimate of SE(sample mean) = s/√n = √(φ(1 − φ)×n/(n − 1))/√n

Sample without replacement (f = finite population correction)
• SE(φ) ≤ f × 50%/√n
• estimate of SE(φ) = f×s*/√n = f×√(φ(1 − φ))/√n
• estimate of SE(sample mean) = f×s/√n

Confidence Level: the long-run fraction of intervals that contain the parameter in repeated sampling; a higher level gives a longer interval, a lower level a shorter one. It is the coverage probability of the procedure before the data are collected. Common levels are 68%, 90%, 95%, 99%.
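A sketch of an approximate confidence interval for a population percentage, using the estimated SE of the sample percentage from above (the sample numbers are made up):

```python
from math import sqrt

n = 400                 # sample size (with replacement / large population)
successes = 140
phi = successes / n     # sample percentage

s_star = sqrt(phi * (1 - phi))   # SD of the sample (0-1 box)
se_est = s_star / sqrt(n)        # estimate of SE(phi)

z = 1.96                         # normal curve factor for ~95% coverage
lo, hi = phi - z * se_est, phi + z * se_est
print(f"phi={phi:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```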

