Title | MEC 471 Exam 1 Review Guide |
---|---|
Course | Empirical Techniques for Industry Analysis |
Institution | Washington University in St. Louis |
MEC 471 with Hickman review guide for exam 1...
Empirical Techniques for Industry Analysis
- Experiment: any process with random outcomes
- Sample space: set of all possible outcomes from an experiment
- Random variable: a function mapping a sample space S to the real line (S → R)
  - Ex. S = {yes, no}; X = 1 if yes, 0 if no
- Discrete RV: finite (or countable) number of possibilities; S is a countable set
- Continuous RV: can take on a continuum of values and has no mass points
  - Takes on each individual value with probability 0
- Mass point: a single outcome that occurs with positive probability
  - Ex. the probability of studying 0 hours
- Notation: X = the random variable; x = one specific outcome (a number)
- Pr(A) = (# times A is observed) / (# times the experiment is run)
- Probability mass function (PMF)
  - Only for discrete RVs
  - Each probability >= 0; the probabilities sum to 1
  - P(x) = 0 for any x outside the support (e.g. P(X = 0.5) = 0 for an integer-valued X)
- Probability density function (PDF)
  - For continuous RVs; not itself a probability, since there are no mass points
  - f(x) >= 0; the total area underneath is 1 (the integral)
- A function of a RV is itself a RV
  - Ex. for RV X, Y = g(X) is also a RV
- Cumulative distribution function (CDF)
  - Limit as x → −∞ is 0; limit as x → +∞ is 1
  - Right continuous, non-decreasing, with an asymptote at 1
  - Same properties for discrete and continuous RVs
- Moments: a number that summarizes some aspect of a distribution, usually derived as an expectation of some function of X
  - A constant: one number summarizing one aspect of a distribution
- Raw expectation / average / mean: central tendency
  - Discrete: E[X] = Σ x · P(X = x), a constant
  - E[X + Y] = E[X] + E[Y]: the expectation of a sum is the sum of expectations
  - E[X²] = Σ x² · P(X = x)
  - Jensen's inequality: if g is concave, g(E[X]) >= E[g(X)] (biased downward)
  - If g is linear: E[a + bX] = a + bE[X]
- Bernoulli distribution
  - q = success probability; E[X] = q
  - E[X²] = 0²·(1 − q) + 1²·q = q
- Variance: dispersion
  - Average squared distance of an outcome from the mean
  - VAR[X] = E[(X − E[X])²] = Σ (x − mean)² · P(X = x)
    - The mean is a fixed constant
    - Must square, or else positive and negative deviations will cancel
  - Variance of a constant = 0
  - VAR[a + bX] = b²·VAR[X]
    - Adding a constant has no impact; multiplying does (makes distances larger/smaller)
    - Proof: let Y = a + bX and use the E[X] rules
  - Standard deviation = sqrt(VAR); has the same units as X
  - VAR[X | Y = 8] = E[(X − E[X | Y = 8])² | Y = 8]
  - VAR[X] = E[X²] − E[X]²
- More moments
  - Centralized (r-th central moment): E[(X − E[X])^r]
  - Standardized moment: central moment divided by (standard deviation)^r
- Skewness: more mass to the left or right of the mean; standardized moment with r = 3
- Kurtosis: fatness of the tails, i.e. how likely extreme events are; r = 4
- Variance: r = 2
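A minimal numerical check of the Bernoulli formulas above (E[X] = q, E[X²] = q, Var[X] = E[X²] − E[X]² = q(1 − q)); the value q = 0.3 is assumed for illustration:

```python
# Bernoulli(q): E[X] = q, E[X^2] = q, Var[X] = q(1 - q).
# q = 0.3 is an assumed value for this sketch.
q = 0.3
pmf = {0: 1 - q, 1: q}          # probability mass function

mean = sum(x * p for x, p in pmf.items())              # E[X]
second_moment = sum(x**2 * p for x, p in pmf.items())  # E[X^2]
variance = second_moment - mean**2                     # Var[X] = E[X^2] - E[X]^2
# variance should equal q*(1-q) = 0.21
```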
- 2D RV: PMF over X and Y
  - Matrix of probabilities that X and Y take on different value pairs
  - Joint PMF value: e.g. P(X = 2 AND Y = 0)
- Joint distribution: a random vector capturing both each variable's individual randomness and the link between their randomness
- Marginal distributions
  - For each outcome of one variable, sum the joint probabilities over the other variable
- Conditional distribution
  - P(Y | X) = P(X, Y) / P(X)
  - Conditional expectation: E[Y | X] = Σ y · P(Y = y | X)
- Multiplication rule (holds even when X and Y are not independent)
  - P(X, Y) = P(Y | X) · P(X)
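The joint/marginal/conditional definitions above can be checked on a small example. The 2×2 joint PMF below is made up for illustration:

```python
# Hypothetical joint PMF over (X, Y), each taking values in {0, 1}.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

# Marginals: sum the joint probabilities over the other variable.
p_x = {x: sum(p for (xi, y), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (x, yi), p in joint.items() if yi == y) for y in (0, 1)}

# Conditional distribution: P(Y = y | X = x) = P(X = x, Y = y) / P(X = x).
p_y_given_x1 = {y: joint[(1, y)] / p_x[1] for y in (0, 1)}

# Conditional expectation: E[Y | X = 1] = sum_y y * P(Y = y | X = 1).
e_y_given_x1 = sum(y * p for y, p in p_y_given_x1.items())

# Multiplication rule: P(X = x, Y = y) = P(Y = y | X = x) * P(X = x).
check = p_y_given_x1[1] * p_x[1]   # should recover joint[(1, 1)]
```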
- Law of iterated expectations: E[E[Y | X]] = E[Y]
- Conditional variance: VAR[Y | X] = E[(Y − E[Y | X])² | X]
- Independence: one variable is totally uninformative about the likely values of the other
  - P(Y | X) = P(Y)
  - If X and Y are independent: E[Y | X] = E[Y]
    - The expected value of Y does not depend on X
    - When X changes, the mean, variance, skewness, etc. of Y do not change
    - VAR[Y | X] = VAR[Y]
    - E[XY] = E[X]·E[Y]
- Mean independence (weaker)
  - As X changes, the mean of Y does not change, but the variance, skewness, etc. of Y might
  - E[Y | X] = E[Y] = c
  - E[XY] = E[X]·E[Y]
  - Implied by independence
- Covariance
  - Amount of linear dependence between two random variables
  - Positive covariance: the two RVs move in the same direction
  - Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
  - Cov(X, Y) = E[XY] − E[X]·E[Y]
  - Cov(X, X) = VAR(X)
  - Cov(a + bX, Y) = b·Cov(X, Y): adding a constant has no impact, multiplying does
  - Cov(aX + bZ, Y) = a·Cov(X, Y) + b·Cov(Z, Y)
  - If the variables are independent (or mean independent), covariance = 0
    - E[Y | X] = E[Y] implies Cov(X, Y) = 0
    - Independence implies covariance 0, but covariance 0 does NOT imply independence
- Note: VAR[X + Y] = VAR[X] + VAR[Y] + 2·Cov(X, Y)
  - If independent: VAR[X + Y] = VAR[X] + VAR[Y]
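A small exact check of the covariance identities and the variance-of-a-sum formula, using a made-up joint PMF (everything is computed from the PMF directly, so the identities hold up to rounding error):

```python
# Check Cov(X, Y) = E[XY] - E[X]E[Y] and
# Var[X + Y] = Var[X] + Var[Y] + 2*Cov(X, Y) on a hypothetical joint PMF.
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.5}

def E(f):
    """Expectation of f(X, Y) under the joint PMF."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

ex, ey = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: x * y) - ex * ey                  # E[XY] - E[X]E[Y]
var_x = E(lambda x, y: x**2) - ex**2
var_y = E(lambda x, y: y**2) - ey**2
var_sum = E(lambda x, y: (x + y)**2) - (ex + ey)**2    # Var[X + Y] directly
# var_sum should equal var_x + var_y + 2*cov
```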
- Correlation
  - Strength of the linear relationship between two variables: Corr(X, Y) = Cov(X, Y) / (σ_X · σ_Y)
  - Always between −1 and 1
    - 1: always move together in the same direction
  - Unit free
  - Independence (or mean independence) implies Corr(X, Y) = 0
  - Corr = 0 whenever covariance = 0
Statistics Review
- Random sample: collection of independent and identically distributed (IID) observations
  - Independent: collecting the 1st observation doesn't affect the 2nd
    - Ex. not independent if you pick a card and don't replace it
- θ: parameter
  - A fixed natural constant that governs the randomness of X
  - Never directly observed and never changes; we only see the samples generated under θ
- Estimator for θ
  - A mathematical rule mapping a random sample into a best guess at the true value of the parameter θ
  - A function of random data, hence itself a RV
- 3 important estimators, differing in how they formalize the "best guess" notion:
  1. Least squares: the best guess minimizes the distance between the model and the data over candidate parameter values
     a. Minimizes predictive error for the empirical version of the underlying population model: the best-fit line through the data
     b. Min over c: Σ (x_i − c)², then take the derivative; the minimizer is X̄
        i. SSR = Σ û_i²
     c. Minimize the sum of squared deviations, i.e. minimize the role of the û's
     d. Requires the fewest assumptions; most undisciplined
     e. Residual: û_i = y_i − ŷ_i
  2. Method of moments (analogy principle): start by writing down a theoretical equation involving population moments and the parameter of interest θ, then substitute the analogous sample moments for the population moments
     a. h(E[X]) = g(θ), so set h(X̄) = g(θ̂) and solve for θ̂
     b. Replace each population moment with its sample counterpart
     c. Freedom to study only parts of the model rather than the full distribution
  3. Maximum likelihood (likelihood principle): choose the value of the estimated θ that maximizes the probability of observing our actual data set
     a. Pick the value of θ that makes the likelihood of the observed data the largest
     b. Usually consistent, sometimes unbiased
     c. Can be the minimum variance unbiased estimator
     d. Take the log of the likelihood function (the log-likelihood) and then the derivative
        i. ln(a·b) = ln(a) + ln(b)
        ii. ln(a^b) = b·ln(a)
     e. Must fully specify the entire form of the underlying random law before beginning; needs strong assumptions; the opposite extreme of OLS
- When taking the derivative / FOC: switch from parameters to estimators
  - Σ_{i=1}^N E[X_i] = E[X] + E[X] + ... = N·E[X]
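The maximum-likelihood recipe above (write down the log-likelihood, then pick the θ that makes it largest) can be sketched for a Bernoulli sample. The data and the grid of candidate q values are made up; the FOC gives the closed-form MLE q̂ = X̄, which the grid search should reproduce:

```python
import math

# MLE for Bernoulli(q): log-likelihood is
#   l(q) = sum_i [ x_i*ln(q) + (1 - x_i)*ln(1 - q) ],
# and the first-order condition gives q_hat = sample mean.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # assumed sample: 7 successes / 10

def log_lik(q):
    return sum(x * math.log(q) + (1 - x) * math.log(1 - q) for x in data)

grid = [i / 100 for i in range(1, 100)]   # candidate q in {0.01, ..., 0.99}
q_hat = max(grid, key=log_lik)            # grid-search maximizer

xbar = sum(data) / len(data)              # closed-form MLE for comparison
```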
Properties of Estimators
- Note: sums and constants can be pulled out of E and VAR; Σ_{i=1}^N 1 = N
- Bias
  - Unbiased: E(θ̂) = θ
    - The probability distribution of the estimator has an expected value equal to the parameter it is supposed to estimate
    - If we could draw infinitely many samples and average the estimates, we would get θ
  - Bias = E(θ̂) − θ
  - On average, do we get the answer right? If too high, biased upward
  - Ex. on average, does X̄ = μ?
  - The sample mean and sample variance are unbiased
    - For the variance, dividing by n − 1 is unbiased; dividing by n is biased downward (E < parameter)
- Efficiency
  - Among unbiased estimators, prefer the one with smaller variance: more efficient
  - Concentrated around the truth
  - "Precisely wrong vs. vaguely right"
  - Either achieve maximum precision, or another estimator is closer to the truth on average
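The n vs. n − 1 claim above can be verified exactly, with no simulation, by enumerating every possible sample and weighting by its probability. The Bernoulli(0.5), n = 2 setup is an assumed toy case:

```python
from itertools import product

# Unbiasedness of the sample variance: divide by n-1, not n.
# Enumerate every Bernoulli(q) sample of size n exactly.
q, n = 0.5, 2
sigma2 = q * (1 - q)           # true variance = 0.25

e_biased = 0.0                 # E of the divide-by-n estimator
e_unbiased = 0.0               # E of the divide-by-(n-1) estimator
for sample in product((0, 1), repeat=n):
    prob = q**sum(sample) * (1 - q)**(n - sum(sample))
    xbar = sum(sample) / n
    ss = sum((x - xbar)**2 for x in sample)   # sum of squared deviations
    e_biased += prob * (ss / n)
    e_unbiased += prob * (ss / (n - 1))
# e_unbiased should equal sigma2; e_biased should fall short of it.
```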
- Mean square error (MSE)
  - How far, on average, the estimator is from θ
  - Allows for the possibility of a biased estimator
    - For two biased estimators: lower MSE = more efficient
  - MSE(θ̂) = E[(θ̂ − θ)²]
  - MSE(θ̂) = Var(θ̂) + (Bias(θ̂))²
- Consistency
  - Does the estimator eventually pin down the true θ as N → ∞?
  - As N increases, fewer and fewer values fall outside the tolerance zone (ε)
  - As N → ∞, the estimator becomes more and more centered around the true θ
  - Convergence in probability: A_n →p a, or plim_{n→∞} A_n = a
  - Consistency: θ̂ →p θ
  - Don't want the margin of error > tolerance level
  - If unbiased and Var → 0 as n → ∞, the estimator is consistent
  - A minimum requirement
  - All of the mass collapses to a single point
    - The sequence loses all randomness in the limit, with all probability mass collapsing on top of a single number a
- Sampling distribution
  - What is the shape of the density of our estimator?
  - If the X_n's are normally distributed, X̄ is normal
    - Only need to know the mean and variance to know the whole distribution
    - As n increases, the distributions align
  - Lindeberg-Levy CLT
    - Convergence in distribution: the randomness settles down to a stable limit
    - The sequence retains its randomness in the limit, but that randomness settles into a stable form
    - The distribution of the sample mean X̄ is approximately normal as long as the underlying RV X has positive and finite variance
  - Slutsky's theorem
    - If A_n →d A and B_n →p b, then
    - A_n + B_n →d A + b and A_n · B_n →d A·b
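The Lindeberg-Levy CLT statement above can be illustrated by simulation: standardize the sample mean of a decidedly non-normal RV (Uniform(0, 1), so μ = 0.5 and σ² = 1/12) and check that roughly 95% of draws land within ±1.96. The sample size, number of replications, and seed are all assumptions of this sketch:

```python
import random

# CLT sketch: the standardized sample mean of Uniform(0, 1) draws is
# approximately N(0, 1). Assumed sizes: n = 200 draws per sample,
# 2000 replications.
random.seed(0)
mu, sigma = 0.5, (1 / 12) ** 0.5
n, reps = 200, 2000

z_values = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z = (xbar - mu) / (sigma / n**0.5)   # standardized sample mean
    z_values.append(z)

# For a standard normal, about 95% of draws land in [-1.96, 1.96].
share_within = sum(abs(z) <= 1.96 for z in z_values) / reps
```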
- Central limit theorem: the average from a random sample from any population, when standardized, has approximately a standard normal distribution
- Weak law of large numbers: for any random variable X, if Var(X) < ∞ (i.e. E[X²] is finite), then X̄ →p E[X] = μ
  - The sample mean is consistent
  - Can get close to the parameter by choosing a large N
- Continuous mapping theorem (CMT)
  - If two RVs converge in probability to a and b, and g is a continuous function, then C_n = g(A_n, B_n) →p g(a, b)
  - Continuous functions preserve convergence in probability
- CMT + WLLN
  - Individual pieces converge, so the overall function converges
  - Ex. standard deviation, correlation
- Path to inference
  1. Derive an estimator (3 methods)
  2. Investigate the estimator's properties: how good is it (bias, variance, consistency, sampling distribution)
  3. Inference / hypothesis testing

Hypothesis Testing
1. H0 = null hypothesis (uses =)
2. Ha = alternative hypothesis (uses >, <, or ≠)
3. Compute the t statistic
4. Choose a significance level α
5. t table: find the critical value c corresponding to α
   a. n − 1 degrees of freedom
6. Reject: t > c or t < −c
   a. Two-tailed: reject if |t| > c
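The six hypothesis-testing steps above can be sketched in code for H0: μ = 5 vs. Ha: μ ≠ 5 (two-tailed). The data set and the 5% critical value (2.262 for n − 1 = 9 degrees of freedom, read from a standard t table) are assumptions for illustration:

```python
# One-sample t-test sketch for H0: mu = 5 vs Ha: mu != 5 (two-tailed).
data = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7, 5.4, 5.0]  # assumed sample
mu0, c = 5.0, 2.262   # null value; 5% two-tailed critical value, 9 df

n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar)**2 for x in data) / (n - 1)   # unbiased sample variance
se = (s2 / n) ** 0.5                              # standard error of the mean
t = (xbar - mu0) / se                             # step 3: t statistic

reject = abs(t) > c   # step 6: two-tailed decision rule
```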
- Errors
  - The critical value is the cutoff for when a difference is too large to attribute to random variation
  - Type 1: reject H0 when it is true
  - Type 2: fail to reject H0 when it is false
- P-value
  - The largest significance level at which we would still fail to reject; the point of indifference
  - Multiply by 2 for a two-sided test
  - p < significance level: reject H0
  - Find the t statistic within the table for the given df and read off the corresponding significance level
  - Graphically: the area in the tail beyond t
  - If the null is true, there is a p% chance of observing a value as large as T; a small p-value is evidence against the null
- Confidence interval
  - Critical values: 90% = 1.64, 95% = 1.96, 99% = 2.58
  - Over repeated samples, 95% of the confidence intervals calculated this way contain the true population mean
  - CI: X̄ ± (critical value)·SE
  - Hypothesized value outside the interval: reject H0
  - One-sided: one bound is infinite
- Normal distribution
  - X ~ N(μ, σ²)
  - Bell curve: symmetric about the mean
  - Standardize: Z = (X − μ) / σ
  - As n → ∞, the standardized mean of any well-behaved random variable looks more like the standard normal
- t distribution
  - n − 1 degrees of freedom
  - More variance and fatter tails compared to the normal
  - Converges in distribution to the normal as n → ∞
- F distribution
  - m and n degrees of freedom
  - As n → ∞, becomes chi-squared (scaled by m)
- Chi-squared: χ² with m degrees of freedom

Econometrics
- Statistics: random relationships; machine learning / prediction / correlations; applied mathematics; needs observational data only
- Econometrics: causal relationships; deterministic (if X, then Y must follow); needs observational data AND a behavioral theory of choice to tease out confounds

Simple Linear Regression
- Population regression function (PRF): Y_n = β0 + β1·x_n + U_n
  - Y: dependent variable, outcome
  - X: independent / explanatory variable, regressor
  - U: noise, unobserved
    - Injects noise into the link between X and Y; this averages out as n → ∞
    - But if X and U are correlated (endogeneity), a large N cannot fix it
  - n: observation index
  - β0: intercept (unobserved population parameter)
  - β1: slope (unobserved population parameter)
- What is the meaning behind the PRF?
  - Predictive-only (not causal) interpretation
    - E[Y | X] = β0 + β1·x_n
    - U is a mechanical residual: U = Y − E[Y | X]
    - ∂E[Y | X]/∂x = β1
    - U has no life of its own
    - The average value of Y changes by β1 when X increases one unit
    - Cov(X, U) = 0, so the model is always identified
  - Causal interpretation
    - U does have a meaning apart from X
    - U = unobserved factors causing shifts in Y
    - dy/dx = β1: a deterministic causal link
- Model identification
  - The data contain enough information to formulate a unique educated guess at the parameter values β0 and β1
- Conditions in the SLRM for identification
  - A0: model is linear in parameters
  - A1: 0 < VAR(X) < ∞
  - A2: exogenous regressors: E[U] = 0 and E[XU] = 0 (mean independence)
    - Cov(X, U) = 0: regressors uncorrelated with the error term
    - Can always force E[U] = 0 to be true
    - Predictive model: using U = Y − E[Y | X], Cov(X, U) = 0 automatically
    - Causal model: possibly Cov(X, U) ≠ 0, e.g. X and U (ex. work ethic) move in tandem
  - A2': identified
    - β0 = E[Y] − β1·E[X]
    - β1 = Cov(X, Y) / Var(X)
    - If X is endogenous with Cov(X, U) = m ≠ 0: β1 = (Cov(X, Y) − m) / Var(X)
- Method of moments estimators
  - β̂1 = Cov̂(X, Y) / Var̂(X) = Corr̂(X, Y) · (s_Y / s_X)
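The identification formulas above translate directly into method-of-moments estimators: replace Cov(X, Y) and Var(X) with their sample counterparts. A sketch on noiseless made-up data (y = 2 + 3x), so the sample moments recover the parameters exactly:

```python
# Method-of-moments OLS for the SLRM:
#   b1_hat = sample Cov(X, Y) / sample Var(X),  b0_hat = ybar - b1_hat*xbar.
# Hypothetical data with no noise: y = 2 + 3x, so b0 = 2, b1 = 3 exactly.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * xi for xi in x]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
cov_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
var_x = sum((xi - xbar)**2 for xi in x) / n

b1 = cov_xy / var_x       # slope: Cov(X, Y) / Var(X)
b0 = ybar - b1 * xbar     # intercept: E[Y] - b1*E[X]
```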
- Residual: û_i = y_i − ŷ_i
  - Ordinary least squares minimizes the sum of squared residuals
- "Regress the dependent variable on the independent variable(s)"
- Changing units: rescaling the dependent variable rescales all coefficients; rescaling an independent variable rescales only that variable's β1

Ch. 9 (Stock & Watson): How to diagnose endogeneity problems
- External validity: inferences and conclusions can be generalized from the population and setting studied to a population of interest
  - Differences in populations: e.g. extrapolating from rats to humans in medicine
  - Differences in settings: e.g. drinking in the US vs. India, because of differences in legal punishment
- Internal validity: statistical inferences about causal effects are valid for the population being studied
  - The estimator of the causal effect (OLS) is unbiased and consistent
  - Standard errors yield confidence intervals with the desired confidence level
    - Hypothesis tests have the desired significance level and confidence intervals the desired confidence level
    - Ex. a 95% confidence interval should contain the true population slope with probability 95% over repeated samples
- Forms of endogeneity (2-5 are special cases of 1)
  - E[U | X] ≠ 0: not mean independent
  - Cov(X, U) ≠ 0
  - Makes it impossible to pin down β1, β0 using the data
  1. Omitted variable bias
     - A variable that determines Y and is correlated with one or more regressors is omitted from the regression
     - Persists even in large samples
     - Suppose the true model is y = β0 + β1·x1 + β2·x2 + v, but x2 is omitted, so U = β2·x2 + v
       - Cov(X1, V) = 0
       - Cov(X1, U) = β2·Cov(X1, X2)
     - Solutions
       - Include the variable if you have the data
       - Include a control variable
         - Caution: adding a variable that doesn't belong (true coefficient = 0) reduces the precision of the other coefficients
         - Tradeoff: bias vs. variance of the coefficient of interest
       - Use data where each observational unit is observed at different points in time
         - Panel data: e.g. collect in 1995 and 2000
         - Can control for unobserved omitted variables, as long as they don't change over time
       - Use instrumental variables regression: relies on a new variable (an instrument)
       - Run a randomized controlled experiment: X will be distributed randomly, independent of U
  2. Misspecification of functional form
     - The true population regression function is nonlinear but the estimated regression is linear
     - A type of omitted variable bias
     - Detect by plotting the data; fix by using a different functional form
     - Solutions differ for continuous vs. discrete dependent variables
  3. Measurement error and errors-in-variables bias
     - Error in the measurement of the independent variable
     - Occurs even in large samples
     - Sources: respondent gives an incorrect answer, data entry error, ambiguous question
     - Errors in variables can result in correlation between the regressor X and the error term
     - Derivation
       - X̃_i = mismeasured value of X = X_i + w_i; X_i = true value of X
       - True model Y = β0 + β1·X_i + u_i; rewrite as Y = β0 + β1·X̃_i + v_i
       - v_i = β1·(X_i − X̃_i) + u_i, where X_i − X̃_i is the measurement error
       - With measurement error, X̃_i is typically correlated with v_i, so β̂1 is biased
       - The size of the bias depends on the measurement model (attenuation bias)
     - Classical measurement error model: β̂1 is biased toward 0 even in large samples
       - w is pure noise; assumes the noise is uncorrelated with the true X_i
       - If pure noise is added to X_i, then β̂1 is biased toward zero (closer to zero)
     - Best-guess model: use the best-guess estimate X̃_i = E(X_i | W_i)
       - β̂1 is consistent and unbiased, but its variance is larger than it would be without measurement error; the other extreme case
     - If Y is systematically misreported by a fixed percentage, β̂1 is scaled by the same proportional factor
     - Measurement error in Y
       - Classical measurement error in Y increases the variance of the regression and of β̂1, but does not induce bias in β̂1
         - Ex. random grading errors on an exam
       - Errors in Y don't produce bias as long as Cov(error, X) = 0
     - Solutions
       - Get an accurate measure of X through instrumental variables regression: use another variable correlated with the actual value of X but uncorrelated with the measurement error
       - Develop a model of the measurement error
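Attenuation bias from classical measurement error can be illustrated by simulation. Under the classical model, the OLS slope converges to β1·Var(X)/(Var(X) + Var(w)); the DGP below (β1 = 2, Var(X) = Var(w) = 1, so the plim is 1) and the seed are assumptions of this sketch:

```python
import random

# Attenuation bias: regress y on a noisily measured x_tilde = x + w.
# With Var(X) = Var(w) = 1 and b1 = 2, the slope's plim is 2 * 1/(1+1) = 1.
random.seed(1)
n, b1 = 20000, 2.0

x = [random.gauss(0, 1) for _ in range(n)]          # true regressor
y = [b1 * xi + random.gauss(0, 1) for xi in x]      # outcome with noise u
x_tilde = [xi + random.gauss(0, 1) for xi in x]     # mismeasured regressor

def ols_slope(x, y):
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    var = sum((a - xbar)**2 for a in x)
    return cov / var

slope_true = ols_slope(x, y)          # approx 2: no measurement error
slope_noisy = ols_slope(x_tilde, y)   # approx 1: biased toward zero
```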
  4. Missing data and sample selection: the selection process influences the availability of data
     - Data missing at random
       - Reduces the sample size but does not introduce bias; makes SEs larger
       - Ex. randomly lose half the data
     - Data missing based on X
       - Reduces the sample size but does not introduce bias; makes SEs larger
       - Ex. only use data where age > 18
     - Data missing because the selection process is related to the dependent variable (missing based on values of Y or U)
       - Creates correlation between the regressors and the error term: bias and inconsistency
       - Sample selection bias: individuals sampled in a way that is related to the outcome Y
       - Ex. heights of people entering a basketball locker room
       - Ex. mutual funds in existence for 10 years: funds with poor returns were eliminated
     - Solution: randomized controlled experiment
  5. Simultaneous causality
     - Y causes X: biased, inconsistent
     - Ex. student-teacher ratio and test scores: the government hires more teachers where test scores are low, so low test scores lead to low student-teacher ratios
     - X_i = γ0 + γ1·Y_i + v_i
     - Large U, so large Y, so large X: correlation between X and the error term
     - Solutions
       - Instrumental variables regression
       - Randomized controlled experiment
- Incorrect standard errors: a threat to internal validity
  - Even if the OLS estimator is consistent and n is large, inconsistent SEs will produce hypothesis tests whose size differs from the desired significance level
  - Heteroskedasticity: SEs are not a reliable basis for hypothesis testing if the errors are heteroskedastic but the SEs are computed with the homoskedasticity formula
  - Correlation of the error term across observations
    - Variables not independent across observations
    - Could be an issue if random sampling was not used
    - Serial correlation when using panel data / time series; also geographic clustering
- Regression models can produce reliable forecasts even if their coefficients are not causal
  - To obtain estimates of causal effects: need to address the threats to internal validity
  - To forecast: need an externally valid model with explanatory power and precise coefficients
    - Care about R² and fit, not interpretations of coefficients
    - Omitted variable bias doesn't matter
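The omitted-variable bias formula from the endogeneity section (Cov(X1, U) = β2·Cov(X1, X2), so the short-regression slope converges to β1 + β2·Cov(X1, X2)/Var(X1)) can be checked by simulation. The DGP (β1 = 1, β2 = 2, x2 = 0.5·x1 + e, so the plim is 2) and the seed are assumptions:

```python
import random

# Omitted-variable bias: y depends on x1 and x2, x2 is correlated with x1,
# and x2 is omitted from the regression of y on x1.
# Plim of the short slope = b1 + b2 * Cov(x1, x2)/Var(x1) = 1 + 2*0.5 = 2.
random.seed(2)
n, b1, b2 = 20000, 1.0, 2.0

x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]           # correlated with x1
y = [b1 * a + b2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

xbar, ybar = sum(x1) / n, sum(y) / n
slope_short = (sum((a - xbar) * (b - ybar) for a, b in zip(x1, y))
               / sum((a - xbar)**2 for a in x1))
# slope_short is biased upward, away from the true b1 = 1.
```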
- Reading a regression table
  - Slope, with (SE) underneath
    - t = slope / SE
  - Confidence interval for the effect of a change
    - Estimated effect = change × slope
    - Standard error of the effect = change × (SE)
    - CI = estimated effect ± CV × SE
  - Columns: adding additional variables
  - Look at R² to find the best model
  - To compare variables in different units: standardize them
  - Changing a variable
    - Compare the regression prediction before and after using the relevant variables
    - Coefficient × previous value − coefficient × new value

Multiple Linear Regression
- Population regression function: Y_n = β0 + β1·X1n + β2·X2n + ... + βK·XKn + U_n
  - K explanatory variables
  - No matter how many K, there is always a U
- Vectors and matrices
  - X_n = [1, x1n, x2n, ..., xKn]: a (K+1) × 1 vector (or its transpose)
  - β = [β0, β1, ..., βK]: a 1 × (K+1) vector
  - Same stacking for Y and U
  - Can't divide by a matrix or multiply by 1/A; instead multiply by the inverse
  - I = the identity matrix, the matrix analogue of multiplying by 1
    - A⁻¹ · A = I
    - A · I = A
- MLRM assumptions
  - A0) No column of X is redundant: no perfect multicollinearity
    - No independent variable is constant; no exact linear relationship among the independent variables
    - Multicollinearity: high correlation between two independent variables
    - The columns form a linearly independent set
    - Can't use 2 variables to construct the third
  - A1) Linear in parameters
    - A linear function of the parameters β
  - A2) E[U_n | X_n] = 0: mean independence
    - A2') E[U] = 0 and E[XU] = 0
  - A3) Homoskedasticity
    - VAR[U_n | X_n] = σ²
    - The variance of U does not depend on X
    - U has the same variance given any value of the explanatory variables
  - A4) No autocorrelation
    - Cov(U_n, U_m | X) = 0
    - Errors independent across observations
    - Correlated across time - if high v...
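The matrix setup above (stack the rows X_n into a design matrix X and solve the normal equations, β̂ = (X′X)⁻¹X′y) can be sketched in pure Python for one regressor plus an intercept. The data are made up and noiseless (y = 4 + 2x), so the coefficients come out exactly:

```python
# MLRM in matrix form: solve (X'X) b = X'y, i.e. b_hat = (X'X)^{-1} X'y.
# Minimal sketch with K = 1 regressor, so X is n x 2 and X'X is 2 x 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [4.0 + 2.0 * xi for xi in x]          # assumed noiseless DGP
X = [[1.0, xi] for xi in x]               # design matrix with intercept column

# X'X (2x2) and X'y (2x1)
XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]

# Invert the 2x2 matrix X'X directly via the adjugate formula.
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
inv = [[XtX[1][1] / det, -XtX[0][1] / det],
       [-XtX[1][0] / det, XtX[0][0] / det]]

b_hat = [sum(inv[i][j] * Xty[j] for j in range(2)) for i in range(2)]
# b_hat recovers [intercept, slope] = [4, 2]
```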