Statalecture - stata PDF

Title	Statalecture - stata
Course	Basic Econometrics
Institution	Royal Melbourne Institute of Technology
Pages	26

Basic Econometrics: Introduction to Stata

Launch “MyDesktop” from within myRMIT:

Click on ‘Desktops’

Then on ‘RMIT Desktops’

Your desktop will look something like this, as if you were running a Windows PC in your browser.

From within the MyDesktop system, launch Chrome, navigate to Canvas (rmit.instructure.com) and to the Basic Econometrics Canvas site. Go to ‘Modules’, and at the bottom, you should find the ‘Datasets’ module. Click on ‘WAGE2.DTA’.

Click on ‘Download WAGE2.DTA’

Then save it somewhere easy to remember, for example in a folder called “Datasets” on the Desktop.

Next, you can close Chrome and return to the MyDesktop start screen.

Click on the “Apps” icon in the top ribbon of MyDesktop and type “stata” in the search section on the top right.

The Stata icon should appear at the top left.

Click on this and you will have launched Stata.

This is what the Stata application looks like. Don’t worry if it looks a bit complicated to begin with.

Before anything else, start by opening a dataset (all Stata datasets are .dta files).

The command to open the dataset has now appeared in the Review pane on the left, and in the Output pane in the middle of the Stata window.
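The same step can also be typed directly into the Command window. This is only a sketch: the folder path is hypothetical (it assumes the file was saved in a “Datasets” folder on the Desktop, as suggested earlier), so substitute the location you actually used.

```stata
* Hypothetical folder path: replace it with wherever you saved WAGE2.DTA.
cd "C:\Users\yourname\Desktop\Datasets"

* Load the dataset; "clear" discards any data already in memory.
use "WAGE2.DTA", clear
```

Typing the command and using the menu do the same thing; the menu simply writes the command for you.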

The Variables pane on the top right shows a list of variables. For each variable there is a name (which has to be one word with no spaces) and a label, which can be several words describing the content of the variable. For example, the variable ‘wage’ is labelled ‘monthly earnings’. The Properties pane in the bottom right has some information about the dataset, including the number of observations (sample size), 935, and the number of variables, 17.

Learning about your data

Use the “Data” menu to run commands so you can learn more about your dataset.

First, use “Data – Describe data – Describe data in memory or in a file”

This opens the following window. Here you can choose specific variables using the “Variables” drop-down list, or you can leave everything blank to select all variables. Then click ‘OK’.

We have now run the command ‘describe’, which appears in the Review pane and the Output pane.

The output from the ‘describe’ command appears in the output window. It just confirms the number of observations and variables, the variable names and labels, as well as some other bits of info.

Now run the ‘codebook’ command, which is at ‘Data – Describe data – Describe data contents (codebook)’.

This time, select one of the variables (e.g. ‘wage’), as the output from this command would be very long if we chose every variable. Click on the variable(s) you want to describe, then click ‘OK’.

Again, the command (here ‘codebook wage’) appears in the Review pane and in the Output pane. In the Output pane, we can see some more detailed information about the ‘wage’ variable. For example, the ‘range’ gives us the minimum and maximum values, and we also have the mean (957.945) and standard deviation (404.361) as well as the percentiles. Recall that the 50th percentile is the median (905).
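The two menu steps so far correspond to the following typed commands. This sketch assumes WAGE2.DTA is already open in memory.

```stata
* Describe every variable in the dataset (names, storage types, labels).
describe

* Detailed information for one variable only, to keep the output short.
codebook wage
```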

Next we will use ‘Data – Describe data – Summary statistics’. Summary statistics is another name for descriptive statistics, including the mean, standard deviation, etc.

This time select a few of the variables by clicking on them in the drop-down list, then click ‘OK’.

The output from the ‘summarize’ command is a neat table of summary statistics for each of the variables that were selected. Each of the variables has 935 observations, which means there are no missing values for these variables. The table shows the mean, standard deviation and range for each variable.
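As a typed equivalent, something like the following should work. The particular variables listed are just an illustrative choice, not necessarily the ones selected in the dialog, and the sketch assumes WAGE2.DTA is open.

```stata
* Summary statistics (obs, mean, std. dev., min, max) for a few variables.
summarize wage hours educ

* The "detail" option adds percentiles (including the median) for one variable.
summarize wage, detail
```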

Next, look at the ‘Data Editor (Browse)’, which can be found in ‘Data – Data Editor’; there is also a shortcut icon for this in the toolbar.

This command opens the Data Editor window, where we can look at the raw data. The first observation represents a person who earns $769 per month, works 40 hours per week, has an IQ of 93, etc. The tick-boxes by the variables list on the right-hand side allow you to control which variables are shown (the default is all). Looking at the raw data is a quick and important way to understand the features of the variables. For example, we can immediately see that many people report exactly 40 hours per week, and that ‘married’ and ‘black’ are dummy (binary) variables.
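The Data Editor can also be opened from the Command window, and the “many people report exactly 40 hours” observation can be checked with a count. This is a sketch that assumes WAGE2.DTA is open; the choice of variables to browse is only an example.

```stata
* Open the Data Editor in browse (read-only) mode for selected variables.
browse wage hours married black

* Count how many people report exactly 40 hours per week.
count if hours == 40

* Dummy variables take only the values 0 and 1; tabulate confirms this.
tabulate married
```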

Now, let’s quickly learn how to draw a simple scatterplot. Use the ‘Graphics – Twoway graph’ command.

Choose the ‘Create’ option.

Leave the ‘plot category’ and ‘type’ as they are and choose a ‘Y-variable’ and ‘X-variable’ from the drop-down lists. Choose your dependent variable (here it is ‘wage’) as the Y-variable, and one of the independent variables as the X-variable (here we choose ‘hours’). Then click ‘Accept’.

Here is the output (which opens in a new ‘Graph’ window). We can see that the relationship between monthly earnings and hours worked is not a simple positive correlation; many other factors influence earnings.
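For reference, a typed version of this plot would look roughly as follows (assuming WAGE2.DTA is open). The correlation command is an optional extra, not part of the menu steps above, for putting a number on what the scatterplot shows.

```stata
* Scatterplot with wage on the vertical axis and hours on the horizontal axis.
twoway (scatter wage hours)

* Optional: the sample correlation between the two variables.
correlate wage hours
```

Note the order: the Y-variable comes first, then the X-variable.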

Next, it is time to run a regression. Go to ‘Statistics – Linear models and related – Linear regression’.

Choose the ‘Dependent variable’ from the drop-down menu and one or more ‘Independent variables’ from the adjacent drop-down menu. Here I will choose ‘wage’ as the dependent variable and ‘hours’ as the independent variable. Then click ‘OK’.

Here is the output. We have estimated the following equation. Note that the constant (intercept) term is automatically included in the regression equation.

[Equation: estimated regression of ‘wage’ on ‘hours’, with an intercept]
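The dialog above should correspond to a single typed command. The sketch below, which assumes WAGE2.DTA is open, also shows one way to recover the t-statistic on ‘hours’ by hand from the stored coefficient and standard error.

```stata
* Dependent variable first, then the independent variable(s).
* The intercept is included automatically.
regress wage hours

* t-statistic = estimated coefficient divided by its standard error.
display "t-statistic on hours = " _b[hours]/_se[hours]
```

The value displayed should match the ‘t’ column of the regression table.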

Is this a regression model that explains ‘wage’ well? We can see from the t-statistic and p-value that the estimated coefficient on ‘hours’ is not statistically significantly different from zero. Looking at the R-squared, we can see that the

Let’s run a different regression. Go to ‘Statistics – Linear models and related – Linear regression’ again, and select ‘lwage’ from the drop-down list as the dependent variable and ‘educ’, ‘exper’ and ‘tenure’ as independent variables from the second drop-down list. To include a list of several independent variables like this, just click once on each variable and they will appear in the command in a line with spaces between them: ‘educ exper tenure’. DO NOT include commas or other punctuation between the variables.

Here is the output. We have estimated the following model.

[Equation: estimated regression of ‘lwage’ on ‘educ’, ‘exper’ and ‘tenure’, with an intercept]

In this model we have a much higher R-squared, and all three slope coefficient estimates are statistically significant at the 1% level.
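Typed out, the multiple regression is a variable list separated by spaces only. This sketch assumes WAGE2.DTA is open.

```stata
* lwage is the dependent variable; the regressors follow, separated by spaces.
* No commas between variable names: in Stata a comma introduces options.
regress lwage educ exper tenure
```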

Now we will create a new variable ‘lhourlywage’ which will be the log of hourly earnings, then use this in a regression. We will use the ‘generate’ command to generate a series of new variables leading up to ‘lhourlywage’. First, use the ‘Data – Create or change data – Create new variable’ menu.

Type the new variable name (here ‘yearlywage’) in the ‘Variable name’ box. Next, type a simple mathematical expression in the box ‘Specify a value or an expression’. Use the names of existing variables where appropriate. Here I want to create a variable ‘yearlywage’ for yearly earnings, using the variable ‘wage’ which measures monthly earnings. So I have written ‘wage*12’ which means “multiply the variable ‘wage’ by 12”. After this, click OK to run the command.

The output will show ‘generate yearlywage = wage*12’

Next, we run the same command again (from ‘Data – Create or change data – Create new variable’), but this time create ‘weeklywage’ from ‘yearlywage’ by dividing by 52.

And then create ‘hourlywage’ from ‘weeklywage’ by dividing ‘weeklywage’ by ‘hours’ (which measures hours worked per week).

The output window now has three ‘generate’ commands listed.

Finally, we run ‘Data – Create or change data – Create new variable’ one more time to create ‘lhourlywage’, using the ‘ln(.)’ expression to take the natural log of the newly created variable ‘hourlywage’.
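Put together, the four menu steps amount to four ‘generate’ commands, which could be typed one after another. The first line is the one shown in the output above; the other three are written out from the descriptions and assume WAGE2.DTA is open.

```stata
* Monthly earnings -> yearly earnings.
generate yearlywage = wage*12

* Yearly earnings -> weekly earnings (52 weeks per year).
generate weeklywage = yearlywage/52

* Weekly earnings -> hourly wage (hours = hours worked per week).
generate hourlywage = weeklywage/hours

* Natural log of the hourly wage.
generate lhourlywage = ln(hourlywage)
```

Each new variable is added to the end of the variable list.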

Next, I use the ‘summarize’ command (‘Data – Describe data – Summary statistics’) to get descriptive statistics for all of the different wage/earnings variables.

Here is the output. This helps verify our calculations. Mean yearly earnings are about $11,500 and mean hourly wages are just over $5. Note that these data are from the US in the 1980s.
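A typed version of this check might look like the sketch below, assuming the four new variables have already been generated. The last line redoes the yearly figure by hand from the mean of ‘wage’ reported by ‘codebook’ earlier (957.945); the result should be close to the mean of ‘yearlywage’ in the table.

```stata
* Summary statistics for the original and the newly created wage variables.
summarize wage yearlywage weeklywage hourlywage lhourlywage

* Hand check: mean monthly wage times 12 should be roughly $11,500.
display 957.945*12
```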

Finally we run a regression (using ‘Statistics – Linear models and related – Linear regression’) using our newly created variable ‘lhourlywage’ as the dependent variable. Note it is at the end of the variable list.

Choose ‘educ’, ‘exper’ and ‘tenure’ as the independent variables again, then click ‘OK’.

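Here is the output. We have estimated the following model.

[Equation: estimated regression of ‘lhourlywage’ on ‘educ’, ‘exper’ and ‘tenure’; surviving values: .121, (0.0072), (0.0037), (0.003)]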

Again, all of the estimated slope coefficients are statistically significant at the 1% level. The R-squared is 0.121, indicating that 12.1% of the variation in log(hourlywage) is explained by education, experience and job tenure.
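As a sketch of the typed equivalent, assuming ‘lhourlywage’ has been created as described above, the regression and its R-squared can be obtained like this:

```stata
* Same regressors as before, with the new dependent variable.
regress lhourlywage educ exper tenure

* The R-squared is stored after estimation and can be displayed on its own.
display "R-squared = " e(r2)
```

The displayed value should agree with the R-squared shown in the top half of the regression table.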

To copy the table of results into Excel we need to highlight the top half of the table and then use the “copy table” option in the ‘right-click’ menu (or from the Edit menu). Copy just the *top half* of the table first.

Then paste into Excel. Note that you will need to have opened Excel *inside* the MyDesktop system to do this.

Then use the ‘copy table’ option from the right-click menu (or from the Edit menu) again to copy the bottom half of the table. (Note that if you try to copy and paste the top half and bottom half together, they won’t paste into Excel correctly.)

Then paste the bottom half below the top half in Excel.

You can then format the table in Excel to look more professional: format cells so that all numbers with decimals have 3 decimal places, bold the text, left-justify the text and numbers, remove the gridlines, add horizontal border lines, and so on.