Title | MEC 471 Exam 1 Review Guide |
---|---|
Course | Empirical Techniques for Industry Analysis |
Institution | Washington University in St. Louis |
MEC 471 with Hickman review guide for exam 1...
Empirical Techniques for Industry Analysis
- Experiment: any process with random outcomes
- Sample space: set of all possible outcomes from an experiment
- Random variable: a function mapping a sample space S to the real line (S → R)
  - Ex. S = {yes, no}; X = 1 if yes, 0 if no
- Discrete RV: finite (or countable) number of possibilities; S is a countable set
- Continuous RV: can take on a continuum of values and has no mass points
  - Takes on each individual value with probability 0
- Mass point: a single outcome that occurs with positive probability
  - Ex. the probability of studying 0 hours
- Notation: X = the random variable; x = one specific outcome (a number)
- Pr(A) = (# times A is observed) / (# times the experiment is run)
- Probability mass function (PMF)
  - Only for discrete RVs
  - Each probability >= 0; the probabilities sum to 1
  - P(x) = 0 for any x outside the support (e.g. P(X = 0.5) = 0 for an integer-valued X)
- Probability density function (PDF)
  - For continuous RVs; not itself a probability, since there are no mass points
  - f(x) >= 0; the total area underneath is 1 (the integral)
- A function of a RV is itself a RV
  - Ex. for RV X, Y = g(X) is also a RV
- Cumulative distribution function (CDF)
  - Limit as x → −∞ is 0; limit as x → +∞ is 1
  - Right continuous, non-decreasing, with an asymptote at 1
  - Same properties for discrete and continuous RVs
- Moments: a number that summarizes some aspect of a distribution, usually derived as an expectation of some function of X
  - A constant: one number summarizing one aspect of a distribution
- Raw expectation / average / mean: central tendency
  - Discrete: E[X] = Σ x · P(X = x), a constant
  - E[X + Y] = E[X] + E[Y]: the expectation of a sum is the sum of expectations
  - E[X²] = Σ x² · P(X = x)
  - Jensen's inequality: if g is concave, g(E[X]) >= E[g(X)] (biased downward)
  - If g is linear: E[a + bX] = a + bE[X]
- Bernoulli distribution
  - q = success probability; E[X] = q
  - E[X²] = 0²·(1 − q) + 1²·q = q
- Variance: dispersion
  - Average squared distance of an outcome from the mean
  - VAR[X] = E[(X − E[X])²] = Σ (x − mean)² · P(X = x)
    - The mean is a fixed constant
    - Must square, or else positive and negative deviations will cancel
  - Variance of a constant = 0
  - VAR[a + bX] = b²·VAR[X]
    - Adding a constant has no impact; multiplying does (makes distances larger/smaller)
    - Proof: let Y = a + bX and use the E[X] rules
  - Standard deviation = sqrt(VAR); has the same units as X
  - VAR[X | Y = 8] = E[(X − E[X | Y = 8])² | Y = 8]
  - VAR[X] = E[X²] − E[X]²
- More moments
  - Centralized (r-th central moment): E[(X − E[X])^r]
  - Standardized moment: central moment divided by (standard deviation)^r
- Skewness: more mass to the left or right of the mean; standardized moment with r = 3
- Kurtosis: fatness of the tails, i.e. how likely extreme events are; r = 4
- Variance: r = 2
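A minimal numerical check of the Bernoulli formulas above (E[X] = q, E[X²] = q, Var[X] = E[X²] − E[X]² = q(1 − q)); the value q = 0.3 is assumed for illustration:

```python
# Bernoulli(q): E[X] = q, E[X^2] = q, Var[X] = q(1 - q).
# q = 0.3 is an assumed value for this sketch.
q = 0.3
pmf = {0: 1 - q, 1: q}          # probability mass function

mean = sum(x * p for x, p in pmf.items())              # E[X]
second_moment = sum(x**2 * p for x, p in pmf.items())  # E[X^2]
variance = second_moment - mean**2                     # Var[X] = E[X^2] - E[X]^2
# variance should equal q*(1-q) = 0.21
```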
- 2D RV: PMF over X and Y
  - Matrix of probabilities that X and Y take on different value pairs
  - Joint PMF value: e.g. P(X = 2 AND Y = 0)
- Joint distribution: a random vector capturing both each variable's individual randomness and the link between their randomness
- Marginal distributions
  - For each outcome of one variable, sum the joint probabilities over the other variable
- Conditional distribution
  - P(Y | X) = P(X, Y) / P(X)
  - Conditional expectation: E[Y | X] = Σ y · P(Y = y | X)
- Multiplication rule (holds even when X and Y are not independent)
  - P(X, Y) = P(Y | X) · P(X)
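The joint/marginal/conditional definitions above can be checked on a small example. The 2×2 joint PMF below is made up for illustration:

```python
# Hypothetical joint PMF over (X, Y), each taking values in {0, 1}.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

# Marginals: sum the joint probabilities over the other variable.
p_x = {x: sum(p for (xi, y), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (x, yi), p in joint.items() if yi == y) for y in (0, 1)}

# Conditional distribution: P(Y = y | X = x) = P(X = x, Y = y) / P(X = x).
p_y_given_x1 = {y: joint[(1, y)] / p_x[1] for y in (0, 1)}

# Conditional expectation: E[Y | X = 1] = sum_y y * P(Y = y | X = 1).
e_y_given_x1 = sum(y * p for y, p in p_y_given_x1.items())

# Multiplication rule: P(X = x, Y = y) = P(Y = y | X = x) * P(X = x).
check = p_y_given_x1[1] * p_x[1]   # should recover joint[(1, 1)]
```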
- Law of iterated expectations: E[E[Y | X]] = E[Y]
- Conditional variance: VAR[Y | X] = E[(Y − E[Y | X])² | X]
- Independence: one variable is totally uninformative about the likely values of the other
  - P(Y | X) = P(Y)
  - If X and Y are independent: E[Y | X] = E[Y]
    - The expected value of Y does not depend on X
    - When X changes, the mean, variance, skewness, etc. of Y do not change
    - VAR[Y | X] = VAR[Y]
    - E[XY] = E[X]·E[Y]
- Mean independence (weaker)
  - As X changes, the mean of Y does not change, but the variance, skewness, etc. of Y might
  - E[Y | X] = E[Y] = c
  - E[XY] = E[X]·E[Y]
  - Implied by independence
- Covariance
  - Amount of linear dependence between two random variables
  - Positive covariance: the two RVs move in the same direction
  - Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
  - Cov(X, Y) = E[XY] − E[X]·E[Y]
  - Cov(X, X) = VAR(X)
  - Cov(a + bX, Y) = b·Cov(X, Y): adding a constant has no impact, multiplying does
  - Cov(aX + bZ, Y) = a·Cov(X, Y) + b·Cov(Z, Y)
  - If the variables are independent (or mean independent), covariance = 0
    - E[Y | X] = E[Y] implies Cov(X, Y) = 0
    - Independence implies covariance 0, but covariance 0 does NOT imply independence
- Note: VAR[X + Y] = VAR[X] + VAR[Y] + 2·Cov(X, Y)
  - If independent: VAR[X + Y] = VAR[X] + VAR[Y]
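A small exact check of the covariance identities and the variance-of-a-sum formula, using a made-up joint PMF (everything is computed from the PMF directly, so the identities hold up to rounding error):

```python
# Check Cov(X, Y) = E[XY] - E[X]E[Y] and
# Var[X + Y] = Var[X] + Var[Y] + 2*Cov(X, Y) on a hypothetical joint PMF.
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.5}

def E(f):
    """Expectation of f(X, Y) under the joint PMF."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

ex, ey = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: x * y) - ex * ey                  # E[XY] - E[X]E[Y]
var_x = E(lambda x, y: x**2) - ex**2
var_y = E(lambda x, y: y**2) - ey**2
var_sum = E(lambda x, y: (x + y)**2) - (ex + ey)**2    # Var[X + Y] directly
# var_sum should equal var_x + var_y + 2*cov
```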
- Correlation
  - Strength of the linear relationship between two variables: Corr(X, Y) = Cov(X, Y) / (σ_X · σ_Y)
  - Always between −1 and 1
    - 1: always move together in the same direction
  - Unit free
  - Independence (or mean independence) implies Corr(X, Y) = 0
  - Corr = 0 whenever covariance = 0
Statistics Review
- Random sample: collection of independent and identically distributed (IID) observations
  - Independent: collecting the 1st observation doesn't affect the 2nd
    - Ex. not independent if you pick a card and don't replace it
- θ: parameter
  - A fixed natural constant that governs the randomness of X
  - Never directly observed and never changes; we only see the samples generated under θ
- Estimator for θ
  - A mathematical rule mapping a random sample into a best guess at the true value of the parameter θ
  - A function of random data, hence itself a RV
- 3 important estimators, differing in how they formalize the "best guess" notion:
  1. Least squares: the best guess minimizes the distance between the model and the data over candidate parameter values
     a. Minimizes predictive error for the empirical version of the underlying population model: the best-fit line through the data
     b. Min over c: Σ (x_i − c)², then take the derivative; the minimizer is X̄
        i. SSR = Σ û_i²
     c. Minimize the sum of squared deviations, i.e. minimize the role of the û's
     d. Requires the fewest assumptions; most undisciplined
     e. Residual: û_i = y_i − ŷ_i
  2. Method of moments (analogy principle): start by writing down a theoretical equation involving population moments and the parameter of interest θ, then substitute the analogous sample moments for the population moments
     a. h(E[X]) = g(θ), so set h(X̄) = g(θ̂) and solve for θ̂
     b. Replace each population moment with its sample counterpart
     c. Freedom to study only parts of the model rather than the full distribution
  3. Maximum likelihood (likelihood principle): choose the value of the estimated θ that maximizes the probability of observing our actual data set
     a. Pick the value of θ that makes the likelihood of the observed data the largest
     b. Usually consistent, sometimes unbiased
     c. Can be the minimum variance unbiased estimator
     d. Take the log of the likelihood function (the log-likelihood) and then the derivative
        i. ln(a·b) = ln(a) + ln(b)
        ii. ln(a^b) = b·ln(a)
     e. Must fully specify the entire form of the underlying random law before beginning; needs strong assumptions; the opposite extreme of OLS
- When taking the derivative / FOC: switch from parameters to estimators
  - Σ_{i=1}^N E[X_i] = E[X] + E[X] + ... = N·E[X]
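The maximum-likelihood recipe above (write down the log-likelihood, then pick the θ that makes it largest) can be sketched for a Bernoulli sample. The data and the grid of candidate q values are made up; the FOC gives the closed-form MLE q̂ = X̄, which the grid search should reproduce:

```python
import math

# MLE for Bernoulli(q): log-likelihood is
#   l(q) = sum_i [ x_i*ln(q) + (1 - x_i)*ln(1 - q) ],
# and the first-order condition gives q_hat = sample mean.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # assumed sample: 7 successes / 10

def log_lik(q):
    return sum(x * math.log(q) + (1 - x) * math.log(1 - q) for x in data)

grid = [i / 100 for i in range(1, 100)]   # candidate q in {0.01, ..., 0.99}
q_hat = max(grid, key=log_lik)            # grid-search maximizer

xbar = sum(data) / len(data)              # closed-form MLE for comparison
```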
Properties of Estimators
- Note: sums and constants can be pulled out of E and VAR; Σ_{i=1}^N 1 = N
- Bias
  - Unbiased: E(θ̂) = θ
    - The probability distribution of the estimator has an expected value equal to the parameter it is supposed to estimate
    - If we could draw infinitely many samples and average the estimates, we would get θ
  - Bias = E(θ̂) − θ
  - On average, do we get the answer right? If too high, biased upward
  - Ex. on average, does X̄ = μ?
  - The sample mean and sample variance are unbiased
    - For the variance, dividing by n − 1 is unbiased; dividing by n is biased downward (E < parameter)
- Efficiency
  - Among unbiased estimators, prefer the one with smaller variance: more efficient
  - Concentrated around the truth
  - "Precisely wrong vs. vaguely right"
  - Either achieve maximum precision, or another estimator is closer to the truth on average
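The n vs. n − 1 claim above can be verified exactly, with no simulation, by enumerating every possible sample and weighting by its probability. The Bernoulli(0.5), n = 2 setup is an assumed toy case:

```python
from itertools import product

# Unbiasedness of the sample variance: divide by n-1, not n.
# Enumerate every Bernoulli(q) sample of size n exactly.
q, n = 0.5, 2
sigma2 = q * (1 - q)           # true variance = 0.25

e_biased = 0.0                 # E of the divide-by-n estimator
e_unbiased = 0.0               # E of the divide-by-(n-1) estimator
for sample in product((0, 1), repeat=n):
    prob = q**sum(sample) * (1 - q)**(n - sum(sample))
    xbar = sum(sample) / n
    ss = sum((x - xbar)**2 for x in sample)   # sum of squared deviations
    e_biased += prob * (ss / n)
    e_unbiased += prob * (ss / (n - 1))
# e_unbiased should equal sigma2; e_biased should fall short of it.
```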
- Mean square error (MSE)
  - How far, on average, the estimator is from θ
  - Allows for the possibility of a biased estimator
    - For two biased estimators: lower MSE = more efficient
  - MSE(θ̂) = E[(θ̂ − θ)²]
  - MSE(θ̂) = Var(θ̂) + (Bias(θ̂))²
- Consistency
  - Does the estimator eventually pin down the true θ as N → ∞?
  - As N increases, fewer and fewer values fall outside the tolerance zone (ε)
  - As N → ∞, the estimator becomes more and more centered around the true θ
  - Convergence in probability: A_n →p a, or plim_{n→∞} A_n = a
  - Consistency: θ̂ →p θ
  - Don't want the margin of error > tolerance level
  - If unbiased and Var → 0 as n → ∞, the estimator is consistent
  - A minimum requirement
  - All of the mass collapses to a single point
    - The sequence loses all randomness in the limit, with all probability mass collapsing on top of a single number a
- Sampling distribution
  - What is the shape of the density of our estimator?
  - If the X_n's are normally distributed, X̄ is normal
    - Only need to know the mean and variance to know the whole distribution
    - As n increases, the distributions align
  - Lindeberg-Levy CLT
    - Convergence in distribution: the randomness settles down to a stable limit
    - The sequence retains its randomness in the limit, but that randomness settles into a stable form
    - The distribution of the sample mean X̄ is approximately normal as long as the underlying RV X has positive and finite variance
  - Slutsky's theorem
    - If A_n →d A and B_n →p b, then
    - A_n + B_n →d A + b and A_n · B_n →d A·b
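The Lindeberg-Levy CLT statement above can be illustrated by simulation: standardize the sample mean of a decidedly non-normal RV (Uniform(0, 1), so μ = 0.5 and σ² = 1/12) and check that roughly 95% of draws land within ±1.96. The sample size, number of replications, and seed are all assumptions of this sketch:

```python
import random

# CLT sketch: the standardized sample mean of Uniform(0, 1) draws is
# approximately N(0, 1). Assumed sizes: n = 200 draws per sample,
# 2000 replications.
random.seed(0)
mu, sigma = 0.5, (1 / 12) ** 0.5
n, reps = 200, 2000

z_values = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z = (xbar - mu) / (sigma / n**0.5)   # standardized sample mean
    z_values.append(z)

# For a standard normal, about 95% of draws land in [-1.96, 1.96].
share_within = sum(abs(z) <= 1.96 for z in z_values) / reps
```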
- Central limit theorem: the average from a random sample from any population, when standardized, has approximately a standard normal distribution
- Weak law of large numbers: for any random variable X, if Var(X) < ∞ (i.e. E[X²] is finite), then X̄ →p E[X] = μ
  - The sample mean is consistent
  - Can get close to the parameter by choosing a large N
- Continuous mapping theorem (CMT)
  - If two RVs converge in probability to a and b, and g is a continuous function, then C_n = g(A_n, B_n) →p g(a, b)
  - Continuous functions preserve convergence in probability
- CMT + WLLN
  - Individual pieces converge, so the overall function converges
  - Ex. standard deviation, correlation
- Path to inference
  1. Derive an estimator (3 methods)
  2. Investigate the estimator's properties: how good is it (bias, variance, consistency, sampling distribution)
  3. Inference / hypothesis testing

Hypothesis Testing
1. H0 = null hypothesis (uses =)
2. Ha = alternative hypothesis (uses >, <, or ≠)
3. Compute the t statistic
4. Choose a significance level α
5. t table: find the critical value c corresponding to α
   a. n − 1 degrees of freedom
6. Reject: t > c or t < −c
   a. Two-tailed: reject if |t| > c
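The six hypothesis-testing steps above can be sketched in code for H0: μ = 5 vs. Ha: μ ≠ 5 (two-tailed). The data set and the 5% critical value (2.262 for n − 1 = 9 degrees of freedom, read from a standard t table) are assumptions for illustration:

```python
# One-sample t-test sketch for H0: mu = 5 vs Ha: mu != 5 (two-tailed).
data = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7, 5.4, 5.0]  # assumed sample
mu0, c = 5.0, 2.262   # null value; 5% two-tailed critical value, 9 df

n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar)**2 for x in data) / (n - 1)   # unbiased sample variance
se = (s2 / n) ** 0.5                              # standard error of the mean
t = (xbar - mu0) / se                             # step 3: t statistic

reject = abs(t) > c   # step 6: two-tailed decision rule
```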
- Errors
  - The critical value is the cutoff for when a difference is too large to attribute to random variation
  - Type 1: reject H0 when it is true
  - Type 2: fail to reject H0 when it is false
- P-value
  - The largest significance level at which we would still fail to reject; the point of indifference
  - Multiply by 2 for a two-sided test
  - p < significance level: reject H0
  - Find the t statistic within the table for the given df and read off the corresponding significance level
  - Graphically: the area in the tail beyond t
  - If the null is true, there is a p% chance of observing a value as large as T; a small p-value is evidence against the null
- Confidence interval
  - Critical values: 90% = 1.64, 95% = 1.96, 99% = 2.58
  - Over repeated samples, 95% of the confidence intervals calculated this way contain the true population mean
  - CI: X̄ ± (critical value)·SE
  - Hypothesized value outside the interval: reject H0
  - One-sided: one bound is infinite
- Normal distribution
  - X ~ N(μ, σ²)
  - Bell curve: symmetric about the mean
  - Standardize: Z = (X − μ) / σ
  - As n → ∞, the standardized mean of any well-behaved random variable looks more like the standard normal
- t distribution
  - n − 1 degrees of freedom
  - More variance and fatter tails compared to the normal
  - Converges in distribution to the normal as n → ∞
- F distribution
  - m and n degrees of freedom
  - As n → ∞, becomes chi-squared (scaled by m)
- Chi-squared: χ² with m degrees of freedom

Econometrics
- Statistics: random relationships; machine learning / prediction / correlations; applied mathematics; needs observational data only
- Econometrics: causal relationships; deterministic (if X, then Y must follow); needs observational data AND a behavioral theory of choice to tease out confounds

Simple Linear Regression
- Population regression function (PRF): Y_n = β0 + β1·x_n + U_n
  - Y: dependent variable, outcome
  - X: independent / explanatory variable, regressor
  - U: noise, unobserved
    - Injects noise into the link between X and Y; this averages out as n → ∞
    - But if X and U are correlated (endogeneity), a large N cannot fix it
  - n: observation index
  - β0: intercept (unobserved population parameter)
  - β1: slope (unobserved population parameter)
- What is the meaning behind the PRF?
  - Predictive-only (not causal) interpretation
    - E[Y | X] = β0 + β1·x_n
    - U is a mechanical residual: U = Y − E[Y | X]
    - ∂E[Y | X]/∂x = β1
    - U has no life of its own
    - The average value of Y changes by β1 when X increases one unit
    - Cov(X, U) = 0, so the model is always identified
  - Causal interpretation
    - U does have a meaning apart from X
    - U = unobserved factors causing shifts in Y
    - dy/dx = β1: a deterministic causal link
- Model identification
  - The data contain enough information to formulate a unique educated guess at the parameter values β0 and β1
- Conditions in the SLRM for identification
  - A0: model is linear in parameters
  - A1: 0 < VAR(X) < ∞
  - A2: exogenous regressors: E[U] = 0 and E[XU] = 0 (mean independence)
    - Cov(X, U) = 0: regressors uncorrelated with the error term
    - Can always force E[U] = 0 to be true
    - Predictive model: using U = Y − E[Y | X], Cov(X, U) = 0 automatically
    - Causal model: possibly Cov(X, U) ≠ 0, e.g. X and U (ex. work ethic) move in tandem
  - A2': identified
    - β0 = E[Y] − β1·E[X]
    - β1 = Cov(X, Y) / Var(X)
    - If X is endogenous with Cov(X, U) = m ≠ 0: β1 = (Cov(X, Y) − m) / Var(X)
- Method of moments estimators
  - β̂1 = Cov̂(X, Y) / Var̂(X) = Corr̂(X, Y) · (s_Y / s_X)
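The identification formulas above translate directly into method-of-moments estimators: replace Cov(X, Y) and Var(X) with their sample counterparts. A sketch on noiseless made-up data (y = 2 + 3x), so the sample moments recover the parameters exactly:

```python
# Method-of-moments OLS for the SLRM:
#   b1_hat = sample Cov(X, Y) / sample Var(X),  b0_hat = ybar - b1_hat*xbar.
# Hypothetical data with no noise: y = 2 + 3x, so b0 = 2, b1 = 3 exactly.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * xi for xi in x]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
cov_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
var_x = sum((xi - xbar)**2 for xi in x) / n

b1 = cov_xy / var_x       # slope: Cov(X, Y) / Var(X)
b0 = ybar - b1 * xbar     # intercept: E[Y] - b1*E[X]
```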
- Residual: û_i = y_i − ŷ_i
  - Ordinary least squares minimizes the sum of squared residuals
- "Regress the dependent variable on the independent variable(s)"
- Changing units: rescaling the dependent variable rescales all coefficients; rescaling an independent variable rescales only that variable's β1

Ch. 9 (Stock & Watson): How to diagnose endogeneity problems
- External validity: inferences and conclusions can be generalized from the population and setting studied to a population of interest
  - Differences in populations: e.g. extrapolating from rats to humans in medicine
  - Differences in settings: e.g. drinking in the US vs. India, because of differences in legal punishment
- Internal validity: statistical inferences about causal effects are valid for the population being studied
  - The estimator of the causal effect (OLS) is unbiased and consistent
  - Standard errors yield confidence intervals with the desired confidence level
    - Hypothesis tests have the desired significance level and confidence intervals the desired confidence level
    - Ex. a 95% confidence interval should contain the true population slope with probability 95% over repeated samples
- Forms of endogeneity (2-5 are special cases of 1)
  - E[U | X] ≠ 0: not mean independent
  - Cov(X, U) ≠ 0
  - Makes it impossible to pin down β1, β0 using the data
  1. Omitted variable bias
     - A variable that determines Y and is correlated with one or more regressors is omitted from the regression
     - Persists even in large samples
     - Suppose the true model is y = β0 + β1·x1 + β2·x2 + v, but x2 is omitted, so U = β2·x2 + v
       - Cov(X1, V) = 0
       - Cov(X1, U) = β2·Cov(X1, X2)
     - Solutions
       - Include the variable if you have the data
       - Include a control variable
         - Caution: adding a variable that doesn't belong (true coefficient = 0) reduces the precision of the other coefficients
         - Tradeoff: bias vs. variance of the coefficient of interest
       - Use data where each observational unit is observed at different points in time
         - Panel data: e.g. collect in 1995 and 2000
         - Can control for unobserved omitted variables, as long as they don't change over time
       - Use instrumental variables regression: relies on a new variable (an instrument)
       - Run a randomized controlled experiment: X will be distributed randomly, independent of U
  2. Misspecification of functional form
     - The true population regression function is nonlinear but the estimated regression is linear
     - A type of omitted variable bias
     - Detect by plotting the data; fix by using a different functional form
     - Solutions differ for continuous vs. discrete dependent variables
  3. Measurement error and errors-in-variables bias
     - Error in the measurement of the independent variable
     - Occurs even in large samples
     - Sources: respondent gives an incorrect answer, data entry error, ambiguous question
     - Errors in variables can result in correlation between the regressor X and the error term
     - Derivation
       - X̃_i = mismeasured value of X = X_i + w_i; X_i = true value of X
       - True model Y = β0 + β1·X_i + u_i; rewrite as Y = β0 + β1·X̃_i + v_i
       - v_i = β1·(X_i − X̃_i) + u_i, where X_i − X̃_i is the measurement error
       - With measurement error, X̃_i is typically correlated with v_i, so β̂1 is biased
       - The size of the bias depends on the measurement model (attenuation bias)
     - Classical measurement error model: β̂1 is biased toward 0 even in large samples
       - w is pure noise; assumes the noise is uncorrelated with the true X_i
       - If pure noise is added to X_i, then β̂1 is biased toward zero (closer to zero)
     - Best-guess model: use the best-guess estimate X̃_i = E(X_i | W_i)
       - β̂1 is consistent and unbiased, but its variance is larger than it would be without measurement error; the other extreme case
     - If Y is systematically misreported by a fixed percentage, β̂1 is scaled by the same proportional factor
     - Measurement error in Y
       - Classical measurement error in Y increases the variance of the regression and of β̂1, but does not induce bias in β̂1
         - Ex. random grading errors on an exam
       - Errors in Y don't produce bias as long as Cov(error, X) = 0
     - Solutions
       - Get an accurate measure of X through instrumental variables regression: use another variable correlated with the actual value of X but uncorrelated with the measurement error
       - Develop a model of the measurement error
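Attenuation bias from classical measurement error can be illustrated by simulation. Under the classical model, the OLS slope converges to β1·Var(X)/(Var(X) + Var(w)); the DGP below (β1 = 2, Var(X) = Var(w) = 1, so the plim is 1) and the seed are assumptions of this sketch:

```python
import random

# Attenuation bias: regress y on a noisily measured x_tilde = x + w.
# With Var(X) = Var(w) = 1 and b1 = 2, the slope's plim is 2 * 1/(1+1) = 1.
random.seed(1)
n, b1 = 20000, 2.0

x = [random.gauss(0, 1) for _ in range(n)]          # true regressor
y = [b1 * xi + random.gauss(0, 1) for xi in x]      # outcome with noise u
x_tilde = [xi + random.gauss(0, 1) for xi in x]     # mismeasured regressor

def ols_slope(x, y):
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    var = sum((a - xbar)**2 for a in x)
    return cov / var

slope_true = ols_slope(x, y)          # approx 2: no measurement error
slope_noisy = ols_slope(x_tilde, y)   # approx 1: biased toward zero
```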
  4. Missing data and sample selection: the selection process influences the availability of data
     - Data missing at random
       - Reduces the sample size but does not introduce bias; makes SEs larger
       - Ex. randomly lose half the data
     - Data missing based on X
       - Reduces the sample size but does not introduce bias; makes SEs larger
       - Ex. only use data where age > 18
     - Data missing because the selection process is related to the dependent variable (missing based on values of Y or U)
       - Creates correlation between the regressors and the error term: bias and inconsistency
       - Sample selection bias: individuals sampled in a way that is related to the outcome Y
       - Ex. heights of people entering a basketball locker room
       - Ex. mutual funds in existence for 10 years: funds with poor returns were eliminated
     - Solution: randomized controlled experiment
  5. Simultaneous causality
     - Y causes X: biased, inconsistent
     - Ex. student-teacher ratio and test scores: the government hires more teachers where test scores are low, so low test scores lead to low student-teacher ratios
     - X_i = γ0 + γ1·Y_i + v_i
     - Large U, so large Y, so large X: correlation between X and the error term
     - Solutions
       - Instrumental variables regression
       - Randomized controlled experiment
- Incorrect standard errors: a threat to internal validity
  - Even if the OLS estimator is consistent and n is large, inconsistent SEs will produce hypothesis tests whose size differs from the desired significance level
  - Heteroskedasticity: SEs are not a reliable basis for hypothesis testing if the errors are heteroskedastic but the SEs are computed with the homoskedasticity formula
  - Correlation of the error term across observations
    - Variables not independent across observations
    - Could be an issue if random sampling was not used
    - Serial correlation when using panel data / time series; also geographic clustering
- Regression models can produce reliable forecasts even if their coefficients are not causal
  - To obtain estimates of causal effects: need to address the threats to internal validity
  - To forecast: need an externally valid model with explanatory power and precise coefficients
    - Care about R² and fit, not interpretations of coefficients
    - Omitted variable bias doesn't matter
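The omitted-variable bias formula from the endogeneity section (Cov(X1, U) = β2·Cov(X1, X2), so the short-regression slope converges to β1 + β2·Cov(X1, X2)/Var(X1)) can be checked by simulation. The DGP (β1 = 1, β2 = 2, x2 = 0.5·x1 + e, so the plim is 2) and the seed are assumptions:

```python
import random

# Omitted-variable bias: y depends on x1 and x2, x2 is correlated with x1,
# and x2 is omitted from the regression of y on x1.
# Plim of the short slope = b1 + b2 * Cov(x1, x2)/Var(x1) = 1 + 2*0.5 = 2.
random.seed(2)
n, b1, b2 = 20000, 1.0, 2.0

x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]           # correlated with x1
y = [b1 * a + b2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

xbar, ybar = sum(x1) / n, sum(y) / n
slope_short = (sum((a - xbar) * (b - ybar) for a, b in zip(x1, y))
               / sum((a - xbar)**2 for a in x1))
# slope_short is biased upward, away from the true b1 = 1.
```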
- Reading a regression table
  - Slope, with (SE) underneath
    - t = slope / SE
  - Confidence interval for the effect of a change
    - Estimated effect = change × slope
    - Standard error of the effect = change × (SE)
    - CI = estimated effect ± CV × SE
  - Columns: adding additional variables
  - Look at R² to find the best model
  - To compare variables in different units: standardize them
  - Changing a variable
    - Compare the regression prediction before and after using the relevant variables
    - Coefficient × previous value − coefficient × new value

Multiple Linear Regression
- Population regression function: Y_n = β0 + β1·X1n + β2·X2n + ... + βK·XKn + U_n
  - K explanatory variables
  - No matter how many K, there is always a U
- Vectors and matrices
  - X_n = [1, x1n, x2n, ..., xKn]: a (K+1) × 1 vector (or its transpose)
  - β = [β0, β1, ..., βK]: a 1 × (K+1) vector
  - Same stacking for Y and U
  - Can't divide by a matrix or multiply by 1/A; instead multiply by the inverse
  - I = the identity matrix, the matrix analogue of multiplying by 1
    - A⁻¹ · A = I
    - A · I = A
- MLRM assumptions
  - A0) No column of X is redundant: no perfect multicollinearity
    - No independent variable is constant; no exact linear relationship among the independent variables
    - Multicollinearity: high correlation between two independent variables
    - The columns form a linearly independent set
    - Can't use 2 variables to construct the third
  - A1) Linear in parameters
    - A linear function of the parameters β
  - A2) E[U_n | X_n] = 0: mean independence
    - A2') E[U] = 0 and E[XU] = 0
  - A3) Homoskedasticity
    - VAR[U_n | X_n] = σ²
    - The variance of U does not depend on X
    - U has the same variance given any value of the explanatory variables
  - A4) No autocorrelation
    - Cov(U_n, U_m | X) = 0
    - Errors independent across observations
    - Correlated across time - if high v...
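The matrix setup above (stack the rows X_n into a design matrix X and solve the normal equations, β̂ = (X′X)⁻¹X′y) can be sketched in pure Python for one regressor plus an intercept. The data are made up and noiseless (y = 4 + 2x), so the coefficients come out exactly:

```python
# MLRM in matrix form: solve (X'X) b = X'y, i.e. b_hat = (X'X)^{-1} X'y.
# Minimal sketch with K = 1 regressor, so X is n x 2 and X'X is 2 x 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [4.0 + 2.0 * xi for xi in x]          # assumed noiseless DGP
X = [[1.0, xi] for xi in x]               # design matrix with intercept column

# X'X (2x2) and X'y (2x1)
XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]

# Invert the 2x2 matrix X'X directly via the adjugate formula.
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
inv = [[XtX[1][1] / det, -XtX[0][1] / det],
       [-XtX[1][0] / det, XtX[0][0] / det]]

b_hat = [sum(inv[i][j] * Xty[j] for j in range(2)) for i in range(2)]
# b_hat recovers [intercept, slope] = [4, 2]
```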