Econ 203 Notes PDF

Title Econ 203 Notes
Author Tanvi Sharan
Course Economic Statistics II
Institution University of Illinois at Urbana-Champaign
Pages 19

Summary

Professor Armendariz, Econ 203 Winter OL notes ...


Description

12/24/19

Types of data
● Quantitative data → values are real numbers
  ○ Arithmetic calculations are valid
● Qualitative data → values are arbitrary names of possible categories
  ○ Categorical data
  ○ Calculations involve how many observations are in each category
    ■ Calculate the proportion of data that falls into each category
● Time series data → data collected across different points of time
● Cross-sectional data → data collected at a certain point in time
● Proportion → Use countif( )

Statistics Intro
● Descriptive Statistics → calculating summary characteristics of data
  ○ Summarize data from a population or a sample of it
● Inferential Statistics → using sample summary measures to estimate population characteristics
  ○ Data on the population is not available
  ○ Take a sample and use it to estimate the unknown population characteristics
● Statistical Inference
  ○ Estimation
    ■ Point estimator → draws inference about a population by estimating the value of an unknown parameter using a single point
    ■ Interval estimator → draws inference about a population by estimating the value of an unknown parameter using an interval
      ● Use intervals to state the degree of certainty about how close the sample statistic is to the population parameter
  ○ Hypothesis testing
    ■ Testing a specific belief about the value of the parameter

4 steps for Hypothesis Testing
● Set up alternative and null hypotheses
  ○ Purpose: to determine if there is enough statistical evidence in favor of a certain belief about a population parameter
  ○ Alternative hypothesis is most important (H1)
    ■ Use > or < → one tailed
● Calculate the test statistic → standardization formulas
  ○ Population mean with sigma known
    ■ z=(x(bar)-mu)/(sigma/sqrt(n))
  ○ Population mean with sigma unknown
    ■ t=(x(bar)-mu)/(s/sqrt(n))
  ○ Population proportion
    ■ z=(p(bar)-p)/sqrt(p(1-p)/n)
● Find critical values (rejection region method) or find the p-value (p-value method)
  ○ Rejection region → when doing it by hand
  ○ Critical value is usually given
    ■ Find using excel (normsinv,tinv)
  ○ P value → usually when using software
    ■ Find using excel (normdist,tdist)
    ■ P-value measures the amount of evidence in favor of the alternative hypothesis
      ● The smaller the p-value, the more evidence in favor of the alternative
      ● Reject H0 if the p-value is less than or equal to the significance level
    ■ General guidelines
      ● Overwhelming evidence
        ○ p ...

... 2
■ Possibilities to investigate
  ● Error in recording the value
  ● Point does not belong in the sample
  ● Observation is valid
■ Solutions
  ● Delete it and re-estimate the model
  ● Or keep it and recognize the impact it is having on the model
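The test-statistic and p-value steps from the hypothesis-testing list earlier in these notes can be sketched in Python. This is only an illustration, not the course's Excel workflow: the function names, the sample numbers, and the two-tailed branch are assumptions added here, and `NormalDist` from the standard library stands in for Excel's normal-distribution functions.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, tail="right"):
    """z statistic and p-value for a population mean with sigma known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)  # standard normal P(Z <= z)
    if tail == "right":        # H1: mu > mu0
        p_value = 1 - cdf
    elif tail == "left":       # H1: mu < mu0
        p_value = cdf
    else:                      # H1: mu != mu0 (two tailed, added for completeness)
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


def z_stat_proportion(p_bar, p0, n):
    """z statistic for a population proportion: (p_bar - p0) / sqrt(p0(1-p0)/n)."""
    return (p_bar - p0) / sqrt(p0 * (1 - p0) / n)


# Made-up numbers for illustration only.
z, p_value = z_test_mean(x_bar=52.0, mu0=50.0, sigma=8.0, n=64)
alpha = 0.05                  # significance level
reject_h0 = p_value <= alpha  # reject H0 if p-value <= significance level
print(z, p_value, reject_h0)
```

The sigma-unknown case would use the t formula instead and needs a t distribution, which the Python standard library does not provide.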



● No serious multicollinearity
  ■ Multicollinearity → linear association between independent variables used in a regression

01/09/20

Applying the regression equation
● Y(hat) → predicted value of Y
  ○ Plug the given value into x and solve the equation
● Prediction interval and confidence interval
  ○ Prediction interval → for a particular value of y
  ○ Confidence interval → for the expected value of y
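As a rough sketch of "plug the given value into x and solve", the snippet below fits a simple least-squares line and computes Y(hat) for a new x. The data and the helper name `fit_line` are made up for illustration. It gives only the point prediction; the prediction and confidence intervals additionally need a t critical value and are not attempted here.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical sample data, not taken from the course.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_line(xs, ys)
x_given = 6
y_hat = b0 + b1 * x_given  # predicted value of Y at the given x
print(b0, b1, y_hat)
```

For the same x, the prediction interval is always wider than the confidence interval, because it covers a single value of y and not just its expected value.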

● Se → sample standard deviation
  ○ Found once you calculate your residuals

Basic Multiple Regression Model



● The multiple regression model allows for more than one independent variable
● Graphing a model
  ○ Quadratic, simple → curve
  ○ Quadratic, multiple → parabolic
  ○ Simple → line
  ○ Multiple → plane
● Regression Diagnostics
  ○ Assumptions
    ■ Error term is properly distributed, which means
      ● Probability distribution of E is normal, with a mean of zero
      ● Standard deviation is constant for all values of X
      ● Errors associated with different values of y are all independent
    ■ Other assumptions, when violated, can threaten the usefulness of results
      ● No unnecessary outliers
      ● No serious multicollinearity
  ○ Remedying Violations
    ■ Non-normality or heteroscedasticity
      ● Transformations of the y variable [dependent variable]
        ○ y’ = ln(y) for (y>0)
          ■ Use when the Se increases with y
          ■ Or when the error distribution is positively skewed
        ○ y’ = y^2
          ■ Use when Se is proportional to E(y)
          ■ Or when the error distribution is negatively skewed
        ○ y’ = y^½ for (y>0)
          ■ Use when S^2e is proportional to E(Y)
        ○ y’ = 1/y
          ■ Use when S^2e increases significantly when y increases beyond some value
    ■ Non-independence of Errors
      ● Common with time series data
        ○ Autocorrelation
      ● Durbin-Watson Test
        ○ Detects first order auto-correlation between consecutive residuals in a time series



○ Two tailed test for first order auto-correlation
○ If d<dL or d>4-dL → first order autocorrelation exists
○ If d falls between dL and dU or between 4-dU and 4-dL → test is inconclusive
○ If d falls between dU and 4-dU → there is no evidence for first order autocorrelation
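A minimal sketch of the Durbin-Watson statistic and the two-tailed decision rule above, assuming the usual definition d = sum((e(t) - e(t-1))^2) / sum(e(t)^2) over the residuals in time order. The residuals and the dL and dU values below are placeholders; real critical values come from a Durbin-Watson table for the given n and k.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_two_tailed(d, d_l, d_u):
    """Two-tailed decision rule for first order autocorrelation."""
    if d < d_l or d > 4 - d_l:
        return "first order autocorrelation exists"
    if d_u < d < 4 - d_u:
        return "no evidence of first order autocorrelation"
    return "inconclusive"


# Placeholder residuals (time ordered) and placeholder critical values.
residuals = [2, 3, 1, 2, -2, -1, -3, -2]
d = durbin_watson(residuals)
decision = dw_two_tailed(d, d_l=1.0, d_u=1.5)
print(d, decision)
```

Values of d near 2 fall in the "no evidence" band, while values near 0 or near 4 point to positive or negative first order autocorrelation respectively.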

Multicollinearity
○ Nearly always exists
○ Consider it serious if the correlation coefficient between any pair of independent variables exceeds .80

The Regression Modeling Process
● Process Steps
  ○ Develop a model that has a sound basis
    ■ Theoretical and practical inputs into model formation
      ● Working group of experts brainstorming
      ● Literature review on factors influencing the variable of interest
  ○ Gather data for the variables in the model
    ■ Gather data for dependent and independent variables
    ■ Use a proxy if data can’t be found for exact variables
  ○ Draw the scatter diagram to determine whether a linear model (or other forms) appears to be appropriate
  ○ Estimate the model coefficients and stats using excel
  ○ Assess the model fit and usefulness using the model statistics
    ■ Use the three step process
    ■ Do the variables make sense?
  ○ Diagnose violations of required conditions. Try to remedy identified problems
  ○ Assess the model fit and usefulness using the model statistics
  ○ If the model passes the assessment
    ■ Predict the value of the dependent variable
    ■ Provide interval estimates for these predictions
    ■ Provide insight into the impact of each independent variable on the dependent variable
● Coefficient of determination
  ○ R^2 → coefficient of determination
  ○ R^2=1-(SSE/SST)
  ○ Adjusted R^2
● Standard Error of the Estimate
  ○ Se → standard deviation of the data points around the regression line
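The notes give R^2=1-(SSE/SST) but the formulas for adjusted R^2 and Se are not written out. Assuming the standard textbook definitions, adjusted R^2 = 1 - (SSE/(n-k-1))/(SST/(n-1)) and Se = sqrt(SSE/(n-k-1)), with k the number of independent variables, the three fit statistics could be computed as follows. The data and the name `fit_statistics` are illustrative only.

```python
from math import sqrt


def fit_statistics(y, y_hat, k):
    """Return (R^2, adjusted R^2, Se) for observed y and fitted y_hat.

    k is the number of independent variables in the model.
    """
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation in y
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    se = sqrt(sse / (n - k - 1))
    return r2, adj_r2, se


# Hypothetical observed and fitted values from a one-variable model (k = 1).
y = [2.1, 3.9, 6.2, 7.8, 10.0]
y_hat = [2.06, 4.03, 6.00, 7.97, 9.94]
r2, adj_r2, se = fit_statistics(y, y_hat, k=1)
print(r2, adj_r2, se)
```

Adjusted R^2 is never larger than R^2, since it penalizes each additional independent variable through the n-k-1 term.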

  ○ K → number of independent variables (used in the adjusted R^2 and Se formulas)

F-Test
○ Question → is there at least one independent variable linearly related to the dependent variable?
○ Answer
  ■ H0: B1=B2=....=Bk=0
  ■ H1: At least one Bi is not equal to zero
    ● If at least one Bi is not equal to zero the model is valid
○ Test stat
  ■ Formula
    ● t=(bi-Bi)/Sbi
      ○ Bi = 0 from the null hypothesis
      ○ Sbi → standard error of the coefficient bi
    ● d.f.=n-k-1
○ Partial F test
  ■ Consider individual t-test results
  ■ Hypothesis
    ● H0: B1=B2=....=Bi=0
      ○ Bis refer only to those variables which were eliminated from the original regression
    ● H1: At least one Bi is not equal to zero
    ● Asking if you should keep all the eliminated variables outside of the model
  ■ Partial F Stat
    ● Test is always one-sided upper tail test
    ● SSRf → from the full equation
    ● SSRr → from the reduced equation
    ● MSEf → full equation
    ● Kd → number of variables eliminated
    ● Reject H0
      ○ Conclude that some coefficients from the variables you eliminated are non-zero and use the “full model”

Intro to other modeling topics
● Curvilinear relationships
  ○ Three ways to identify a curvilinear relationship
    ■ Theoretical basis or practical experience
    ■ Curvature in data from scatter plot
    ■ Curvature in data from a residual plot
  ○ Polynomial model
    ■ Quadratic and cubic models
    ■ Write the model by including the lowest order term first
      ● If you reject H0 in the first regression, run a new regression that adds the ^2 term to the first regression, then one that adds the ^3 term to the first two, and again with ^4, until you do not reject [p-value is high]
        ○ Keep the model that goes up to one order less
● Interaction terms
  ○ Neither the slope nor the intercept between any of the x variables included in the interaction term and the y variable is constant
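The partial F test earlier in these notes lists the inputs SSRf, SSRr, MSEf, and Kd, but the statistic itself is not written out. Assuming the usual textbook form F = ((SSRf - SSRr)/Kd)/MSEf, a short sketch follows. All numbers, including the critical value, are placeholders; a real critical value comes from an F table with Kd numerator degrees of freedom and the full model's n-k-1 denominator degrees of freedom.

```python
def partial_f(ssr_full, ssr_reduced, mse_full, k_dropped):
    """Partial F statistic: extra explained variation per dropped variable,
    relative to the full model's mean squared error."""
    return ((ssr_full - ssr_reduced) / k_dropped) / mse_full


# Placeholder inputs for illustration only.
f_stat = partial_f(ssr_full=120.0, ssr_reduced=100.0, mse_full=2.5, k_dropped=2)

f_critical = 3.0  # placeholder, look up the real value in an F table
use_full_model = f_stat > f_critical  # one-sided upper tail test
print(f_stat, use_full_model)
```

Rejecting H0 here means at least one eliminated variable has a non-zero coefficient, so the full model is kept.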



Qualitative Independent Variables
○ Indicator variable
  ■ Two values → 0 or 1
○ To represent a qualitative variable that has m possible categories we must create m-1 indicator variables

1/16/20

Time Series Analysis and Forecasting
● Time Series → variable measured over time in sequential order
  ○ Detect patterns to help forecast future values
● Components of a time series
  ○ Long term trend
    ■ Trend → a long term relatively smooth pattern or direction, that persists usually for more than one year
    ■ Graph
      ● Solid Line → Long term trend
      ● Jagged line → Actual data
  ○ Cyclical Effect
    ■ Cycle → wavelike pattern describing long term behavior
      ● For more than one year
    ■ Seldom regular
    ■ Appear in combination with other components
    ■ Graph
      ● Oscillations that go up and down
      ● Each oscillation is a cycle
  ○ Seasonal effect
    ■ Exhibits short term (less than one year) calendar repetitive behavior
    ■ Graphs
      ● Peaks and Troughs
  ○ Random Variation
    ■ Irregular, unpredictable changes in the time series
      ● Tends to hide the other components
● Time Series Models
  ○ Additive Model
    ■ Y(t) = T(t) + C(t) + S(t) + R(t)
  ○ Multiplicative Model
    ■ Y(t) = T(t)*C(t)*S(t)*R(t)
● Smoothing Techniques
  ○ Moving Average
    ■ Calculated using an odd number of periods [K]
    ■ The moving average value is designated for the middle time period t
    ■ Example
      ● 3-period moving average (for period t)
        ○ (y(t+1) + y(t) + y(t-1))/3
      ● 5-period moving average
        ○ (y(t+2) + y(t+1) + y(t) + y(t-1) + y(t-2))/5
        ○ Removes more variation than the 3-period moving average
    ■ Centered moving average → two consecutive moving averages are centered by taking their average and placing it in the middle (used when the number of periods is even)
  ○ Exponential Smoothing
    ■ Formula → S(t) = W*Y(t) + (1-W)*S(t-1)
      ● S(t) = exponentially smoothed time series at time t
      ● Y(t) = time series at time t
      ● S(t-1) = exponentially smoothed time series at time t-1
      ● W = smoothing constant





○ Where 0...
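The two smoothing techniques above can be sketched in a few lines of Python. The series is made up, the function names are illustrative, and starting the exponentially smoothed series at the first observation (S(1) = Y(1)) is an assumption, since the notes do not state the starting value.

```python
def moving_average(y, k=3):
    """Centered k-period moving average for odd k.

    The average is placed at the middle period t; periods without a full
    window on both sides are left as None.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    half = k // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        out[t] = sum(y[t - half:t + half + 1]) / k
    return out


def exponential_smoothing(y, w):
    """S(t) = w*Y(t) + (1-w)*S(t-1), with S(1) = Y(1) assumed as the start."""
    s = [y[0]]
    for t in range(1, len(y)):
        s.append(w * y[t] + (1 - w) * s[-1])
    return s


# Made-up time series for illustration.
y = [10, 12, 14, 13, 15]
ma3 = moving_average(y, k=3)
smoothed = exponential_smoothing(y, w=0.5)
print(ma3)
print(smoothed)
```

A larger k in the moving average, or a smaller w in exponential smoothing, removes more of the random variation but also reacts more slowly to real changes in the series.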

