Workshop 1- statistics CBS PDF

Title	Workshop 1- statistics CBS
Author	ha ngo
Course	International Business
Institution	Copenhagen Business School


COPENHAGEN BUSINESS SCHOOL
DEPARTMENT OF FINANCE
CENTER FOR STATISTICS

Statistics
Søren Feodor Nielsen
August 29, 2019

First workshop

The first problem focuses on basic JMP use with graphs and summaries. The second problem recreates the test we did in the first lecture. Please note that even though the second problem refers to one of the “Web apps”, it does not ask you to use the applet.

Forbes data set

1. Start by reading the data into JMP:
   (a) Open JMP, click File → Open... and find the data set where you have placed it.
   (b) Select forbes.txt and choose Open as: Data using best guess before pressing return.
   When done, the data table should look exactly like figure 1: numbers are flushed right, categories flushed left. Note also the icons in the Columns section on the left. If you haven’t managed to read in the data correctly, and you are sure that you didn’t make a mistake, then try the forbes.comma.txt file instead. Here you have to select Open as: Data with preview, change End Of Field from Comma to Space, press Next and then Import.

   Figure 1: Forbes data

2. Construct a histogram of the market values of the companies in the data set (Analyze → Distribution, click on Market value, then on Y, Columns and OK). Turn the histogram so that it is horizontal rather than vertical by right-clicking on (or clicking on the ▾ in) the Distributions header and choosing Stack (there are at least three ways of turning a histogram).

3. The distribution of the market values is highly skewed to the right, which makes it very difficult to get any information out of the histogram. The usual way of handling this is to take logs. To do so, construct a new variable:
   (a) Go back to the data table, choose Cols → New Column... and give it a new (sensible!) name.
   (b) Next go to the Column Properties and select Formula. Find Log (natural logarithm, or “ln”) in the Transcendental section, and then click on Market value. Press OK. If you prefer, you can use Log10 (base-10 logarithm) instead, but then some of the results will not match those in the Selected solutions.

4. (a) Now make a histogram of the log-transformed market values.
   (b) Have a look at the summary statistics and compare them to the “empirical rule” (a quick calculation in your head will be quite sufficient).

5. Make a barplot for the sector variable.
   (a) Start by making a “histogram” for Sector. Then right-click the Sector header to turn it into a barplot: Histogram Options → Separate Bars.
   (b) Add a count axis (Histogram Options) and click on Show Percent.
   (c) Then change your mind: change the count axis to a percentage axis (Prob Axis) and replace the percentages by counts.
   Consider which is better, and when you have made up your mind:
   (a) Select the graph: in the menu at the top (which may be hidden, in which case you should let your mouse hover over it), click on the “fat cross” and use it to select the graph.
   (b) Right-click on the selected output and choose Copy, then open (e.g.) Word and paste it.

6. It may be better to have the bars in your barplot ordered according to height (i.e. a Pareto plot). To do this:
   (a) Go back to the data table, right-click on the sector variable heading and pick Column Properties → Value Ordering.
   (b) Use the graph you have saved to decide on a suitable ordering (Finance is the largest sector, so it should be moved up to the top of the list, etc.).
   (c) Then make a new barplot. The easy way to do this is to go back to the barplot you have made, right-click on the Distribution header and choose Script → Redo Analysis.

7. Go back to the data table:

   (a) Right-click on the sector variable heading and choose Sort → Ascending to get the data sorted according to the sector variable. JMP may complain and insist on opening a new data table window; if so, use the new window.
   (b) Now exclude all observations except those in the four largest sectors (largest in this data set):
       • Scroll down in the data table until you get to Hi-Tech (the fifth largest sector) and then mark all rows from here all the way down.*
       • Right-click on a selected row number and choose Exclude/Unexclude (not Hide and Exclude!).

8. Now make histograms and obtain summaries for market value stratified by sector:
   (a) Analyze → Distribution, let Market value be Y and put Sector into the By box.
   (b) Compare the median market values of the 4 sectors.

9. Also make a contingency table and a mosaic plot to see how the grouped number of employees depends on sector:
   (a) Analyze → Fit Y by X with grouped number of employees as Y, Response and sector as X, Factor.
   (b) In the resulting contingency table, right-click in the upper left corner and unselect Total % and Col %.

10. Go back to the data table and “unexclude” the excluded observations (right-click on an excluded row and choose Unexclude).

11. Construct a new variable, log-transformed sales (as in 3).

12. Select the Graph Builder in the Graph menu:
    (a) Put the log-transformed market values on the vertical axis and the log-transformed sales on the horizontal axis. You should now have a graph showing you how the logarithm of market value depends on the logarithm of sales.
    (b) Right-click on the graph, choose Graph → Marker size and change this to 2. If the graph does not change, choose another value for Marker size. Click Done when done.

13. In the Graph menu select Legacy and then Overlay Plot.
    (a) Let the log-transformed market values be Y and the log-transformed sales be X. Let grouped employees be the grouping variable. Unselect Sort X. Press OK.
    (b) Right-click on the Overlay Plot header and choose Arrange Plots → 3 Per Row.
    (c) Right-click on the Overlay Plot header and choose Overlay Plots → Overlay Groups to get one graph of how the logarithm of market value depends on the logarithm of sales, with different plotting symbols depending on the size of the company. Based on this graph, do you think the relationship between market values and sales depends on the size of the company?

14. Fit a regression of log of market value on log of sales:
    (a) Choose Analyze → Fit Y by X, choose log of market value as Y, Response and log of sales as X, Factor. Press OK.
    (b) Right-click on the Bivariate... header and choose Fit Line. Note the equation of the line.

* This will only work as intended if you have managed to order the values of the sector variable in the previous problem; JMP sorts according to the ordering of the variable.
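For readers who want to see what Fit Line computes in step 14, the following is a minimal Python sketch of an ordinary least-squares fit of the log of market value on the log of sales. It is not part of the JMP workflow. The function name fit_loglog and the toy numbers are hypothetical; the toy data are not the Forbes data and are constructed so that the true line is known.

```python
import math


def fit_loglog(sales, market_values):
    """Least-squares fit of ln(market value) on ln(sales).

    Returns (intercept, slope) of the fitted line.
    """
    x = [math.log(s) for s in sales]
    y = [math.log(m) for m in market_values]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data generated from market value = e * sales^0.5,
# i.e. ln(market value) = 1 + 0.5 * ln(sales) exactly.
toy_sales = [100.0, 400.0, 900.0, 1600.0]
toy_values = [math.e * math.sqrt(s) for s in toy_sales]
intercept, slope = fit_loglog(toy_sales, toy_values)
```

On the log-log scale the slope is read as an elasticity: a 1% increase in sales goes with roughly a slope-% increase in market value.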

Simulating coin tosses

There is a set of “Web apps” for the book available online. One of these applets simulates coin tosses: Chapter 5 → Random Numbers, choose “Coin Flips”. Wanting to ensure that it works as intended, the lecturer tried it out one Saturday afternoon and found that after 700 tosses, the coin had ended up heads 374 times and tails 326 times. Not wanting to ask his students to use this app unless he was convinced it actually worked, he decided to make a statistical test of the hypothesis that the probability of getting Heads is 50%.

1. Open a new data table in JMP (File → New → Data table). In the first column, write “Heads” and “Tails”. Make a new column (New Column in the Cols menu) and type in the observed data. It should look similar to the data table in figure 2 when done.

   Figure 2: A small data table

2. To test the hypothesis that there is a 50-50 chance of each possible outcome (Heads or Tails):
   (a) Choose Analyze → Distribution; here the first variable you created should be Y, whereas the second should be Freq. Press OK.
   (b) Turn the histogram and make it into a barplot (if you have the time).
   (c) Right-click on the header and select Test Probabilities; write in the hypothesis value (0.5†) and press Done to get the result.
   The test statistic you get out is not the same as the one used in the first lecture (which is the one used in the book), but the test is the same: same p-value, same conclusion. After having seen the p-value from the test, and remembering that small p-values are evidence that the hypothesis is not true, do you think the applet works?

3. Also find a confidence interval for the proportion of Heads: right-click on the header and choose Confidence Interval.

† Use the appropriate decimal symbol, i.e. a comma if your computer “speaks” Danish.
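The remark that two different test statistics give the same p-value can be checked by hand. The sketch below assumes that the lecture's statistic is the usual z statistic for a proportion and that the statistic reported by JMP is a Pearson chi-square; the handout states neither, so treat both as assumptions. Only the Python standard library is used.

```python
import math

heads, tails = 374, 326          # observed counts from the handout
n = heads + tails
p0 = 0.5                         # hypothesised probability of Heads

# Assumed lecture version: z statistic for a single proportion
p_hat = heads / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
# Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
p_value_z = math.erfc(abs(z) / math.sqrt(2))

# Assumed JMP version: Pearson chi-square with 1 degree of freedom
expected = n * p0
chi2 = (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected
# Upper tail of chi-square(1): P(X > c) = erfc(sqrt(c / 2))
p_value_chi2 = math.erfc(math.sqrt(chi2 / 2))
```

Here chi2 equals z squared, which is why the two p-values coincide even though the statistics themselves differ.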

Selected solutions

Forbes data set

4. The empirical rule says that
   • 68% of the observations lie between 5.75 and 8.21
   • 95% of the observations lie between 4.52 and 9.44
   • almost all of the observations lie between 3.29 and 10.67
   Compared to the quantiles, the intervals contain something like 80%, 88-90% and 98% of the observations; not a very impressive performance for the empirical rule.

8. The median market values of the four largest sectors (in terms of number of observations in the data set) are

   Sector          Median market value
   Finance         606
   Energy          779
   Manufacturing   1093.5
   Retail          1001.5

9. The counts for the finance sector are 2, 8, 6, 1, 0 (ordered by number of employees); row percentages are 11.76, 47.06, 35.29, 5.88 and 0.

14. The regression equation is log market value = 1.339 + 0.743 ⋅ log sales
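The empirical-rule intervals in solution 4 are just mean ± 1, 2 and 3 standard deviations. The sketch below reproduces them in Python. The mean 6.98 and standard deviation 1.23 are not quoted in the handout; they are inferred here from the midpoint and half-width of the 68% interval, so treat them as assumptions.

```python
# Inferred from the 68% interval (5.75, 8.21): midpoint and half-width.
mean_log = 6.98   # assumed mean of the log market values
sd_log = 1.23     # assumed standard deviation of the log market values

# Empirical rule: about 68%, 95% and almost all observations lie
# within 1, 2 and 3 standard deviations of the mean.
intervals = {
    k: (round(mean_log - k * sd_log, 2), round(mean_log + k * sd_log, 2))
    for k in (1, 2, 3)
}
```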

Does the Web App work?

2. The p-value is 0.0696.

3. The 95% confidence interval for “Tails” is ]0.429; 0.503[
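The interval for “Tails” can be checked with a short Python sketch. The handout does not say which method JMP uses, so the Wald (normal approximation) interval below is an assumption; it agrees with the reported interval to three decimals. The function name wald_interval is hypothetical.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Approximate 95% Wald confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


tails_ci = wald_interval(326, 700)   # proportion of Tails
heads_ci = wald_interval(374, 700)   # proportion of Heads
```

Since the proportions of Heads and Tails sum to 1, the Heads interval is the mirror image of the Tails interval: its limits are 1 minus the Tails limits, taken in reverse order.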
