Title | Exam 2018, questions and answers |
---|---|
Course | Data Analysis I |
Institution | Loughborough University |
ECA003 DATA ANALYSIS
Specimen Examination Paper
Attempt Question 1 (worth 50%) and TWO others.
Time Allowed: 2 hours
1.
You are provided with the following statistical information concerning the annual rate of U.K. unemployment (in %) between 1855 and 2004.
[Figure: distribution of the annual U.K. unemployment rate (%), 1855 to 2004, with boxplot.]

Observations | Mean | Median | Maximum | Minimum | Std. Dev. | Skewness |
---|---|---|---|---|---|---|
150 | 4.9 | 3.9 | 15.6 | 0.4 | 3.4 | 0.8 |
Produce a short report discussing the general features of the distribution of unemployment rates.

The following points might be made:
(i) The range of values stretches from 0.4% to 15.6%. Note that there are 3 outliers of 14% or more; all other values are below 12%. The bulk of the values lie between approximately 2% and 7% (see boxplot).
(ii) The mean (approx. 5), being larger than the median (approx. 4), together with the positive skewness coefficient, implies that the distribution is skewed to the right (not surprising, as the variable is bounded below by zero).
You are now provided with a scatterplot of unemployment (Y) and inflation (X) for this period, along with the following statistical information:

$$\bar{Y} = 4.93, \quad \bar{X} = 3.06, \quad s_Y^2 = 11.22, \quad s_X^2 = 34.47, \quad s_{XY} = -6.71$$

[Figure: scatterplot of UNEMP (Y) against INFL (X).]
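Compute the sample correlation coefficient between unemployment and inflation and construct and interpret a 95% confidence interval for the population correlation coefficient.

$$r_{XY} = \frac{s_{XY}}{s_X s_Y} = \frac{-6.71}{\sqrt{34.47 \times 11.22}} = -0.341$$

To construct a confidence interval for $\rho_{XY}$ we must first calculate

$$f = \tfrac{1}{2} \log\left(\frac{1 - 0.341}{1 + 0.341}\right) = -0.355$$

and then the interval

$$-0.355 \pm 1.96 \times \frac{1}{\sqrt{150 - 3}} = -0.355 \pm 0.162$$

i.e.

$$f_U = -0.193, \qquad f_L = -0.517$$

Transforming back to correlations:

$$\rho_U = \frac{\exp(2 \times (-0.193)) - 1}{\exp(2 \times (-0.193)) + 1} = \frac{-0.320}{1.680} = -0.191$$

$$\rho_L = \frac{\exp(2 \times (-0.517)) - 1}{\exp(2 \times (-0.517)) + 1} = \frac{-0.644}{1.356} = -0.475$$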
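As a numerical check on the steps above, here is a minimal Python sketch of the Fisher-transformation interval. It uses only the summary statistics quoted in the question; the variable and function names are illustrative, and 1.96 is the usual 97.5% point of the standard normal distribution.

```python
import math

# Summary statistics from the question (the covariance is negative)
s_xy, var_x, var_y, n = -6.71, 34.47, 11.22, 150

# Sample correlation coefficient
r = s_xy / math.sqrt(var_x * var_y)

# Fisher transformation f = 0.5 * log((1 + r) / (1 - r))
f = 0.5 * math.log((1 + r) / (1 - r))

# 95% interval on the transformed scale, standard error 1 / sqrt(n - 3)
half_width = 1.96 / math.sqrt(n - 3)
f_lower, f_upper = f - half_width, f + half_width


def back_transform(z):
    """Invert the Fisher transformation: (exp(2z) - 1) / (exp(2z) + 1)."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)


rho_lower, rho_upper = back_transform(f_lower), back_transform(f_upper)
print(f"r = {r:.3f}, 95% c.i. = ({rho_lower:.3f}, {rho_upper:.3f})")
```

Because the code carries full precision through every step, its final digits may differ very slightly from a hand calculation that rounds at each stage.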
Thus the 95% c.i. is $-0.475 < \rho_{XY} < -0.191$. Since this does not contain 0 we conclude that Y and X are significantly negatively correlated.

Compute the intercept and slope coefficients in the regression of Y on X. If the standard error of the slope coefficient is 0.044, test the hypothesis that the true value of the slope is zero, using the 5% level of significance. Explain how the result of this test is related to the confidence interval constructed above.

The coefficients in the regression $Y = a + bX$ are calculated as

$$b = \frac{s_{XY}}{s_X^2} = \frac{-6.71}{34.47} = -0.195$$

$$a = \bar{Y} - b\bar{X} = 4.93 + 0.195 \times 3.06 = 5.53$$

To test the hypothesis that the slope is zero, construct

$$t = \frac{-0.195}{0.044} = -4.43 \sim t(148)$$
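The slope, intercept, and test statistic can be checked with a short Python sketch built from the same summary statistics. The names are illustrative, and the critical value 1.96 is the large-sample approximation used in this answer rather than an exact t(148) quantile.

```python
# Summary statistics from the question; se_b is the stated standard error of the slope
mean_y, mean_x = 4.93, 3.06
s_xy, var_x = -6.71, 34.47
se_b = 0.044

b = s_xy / var_x           # slope of the regression of Y on X
a = mean_y - b * mean_x    # intercept
t_stat = b / se_b          # test statistic for H0: slope = 0

# Two-sided test at the 5% level, using the approximate critical value 1.96
reject_null = abs(t_stat) > 1.96
print(f"b = {b:.3f}, a = {a:.2f}, t = {t_stat:.2f}, reject H0: {reject_null}")
```

With the unrounded slope the statistic comes out at about -4.42 rather than -4.43; the difference is rounding only and does not affect the decision.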
Since $|t| = 4.43 > t_{0.025}(148) \approx 1.96$ we can reject the null in favour of the slope being nonzero. This is consistent with finding that zero is excluded from the c.i. above.

Can this regression model be used to establish that changes in inflation cause changes in unemployment? No, because correlation does not imply causation.

2.
The prices of different U.K. fuels for 2000-2002 are shown below:

Fuel | 2000 | 2001 | 2002 |
---|---|---|---|
Coal | 128 | 134 | 141 |
Gas | 104 | 107 | 114 |
Electricity | 106 | 105 | 105 |
Petroleum | 195 | 185 | 179 |
The quantities consumed of each fuel in 2000 were: coal 3.5; gas 57.3; electricity 28.3; and petroleum 66.2. Calculate the Laspeyres energy price index for 2001 and 2002, based on 2000 = 100.

The Laspeyres indices for 2001 and 2002 are

$$P^L_{2001} = 100 \times \frac{134 \times 3.5 + 107 \times 57.3 + 105 \times 28.3 + 185 \times 66.2}{128 \times 3.5 + 104 \times 57.3 + 106 \times 28.3 + 195 \times 66.2} = 97.77$$

$$P^L_{2002} = 100 \times \frac{141 \times 3.5 + 114 \times 57.3 + 105 \times 28.3 + 179 \times 66.2}{128 \times 3.5 + 104 \times 57.3 + 106 \times 28.3 + 195 \times 66.2} = 97.90$$
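The Laspeyres calculation weights every year's prices by the base-year (2000) quantities. A minimal Python sketch of that arithmetic follows; the dictionary layout and the function name `laspeyres` are illustrative choices, and the figures are those given in the question.

```python
# Fuel prices by year and base-year (2000) quantities, as given in the question
prices = {
    2000: {"coal": 128, "gas": 104, "electricity": 106, "petroleum": 195},
    2001: {"coal": 134, "gas": 107, "electricity": 105, "petroleum": 185},
    2002: {"coal": 141, "gas": 114, "electricity": 105, "petroleum": 179},
}
base_quantities = {"coal": 3.5, "gas": 57.3, "electricity": 28.3, "petroleum": 66.2}


def laspeyres(year, base_year=2000):
    """Price index with base-year quantities as weights (base year = 100)."""
    current_cost = sum(prices[year][fuel] * q for fuel, q in base_quantities.items())
    base_cost = sum(prices[base_year][fuel] * q for fuel, q in base_quantities.items())
    return 100 * current_cost / base_cost


for year in (2001, 2002):
    print(year, round(laspeyres(year), 2))
```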
Calculate the Paasche price index based on the following quantities consumed and compare and contrast the two sets of indices.

Fuel | 2001 | 2002 |
---|---|---|
Coal | 4.3 | 3.3 |
Gas | 58.0 | 55.3 |
Electricity | 28.6 | 28.6 |
Petroleum | 67.5 | 66.3 |
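The Paasche indices for 2001 and 2002 are

$$P^P_{2001} = 100 \times \frac{134 \times 4.3 + 107 \times 58.0 + 105 \times 28.6 + 185 \times 67.5}{128 \times 4.3 + 104 \times 58.0 + 106 \times 28.6 + 195 \times 67.5} = 97.79$$

$$P^P_{2002} = 100 \times \frac{141 \times 3.3 + 114 \times 55.3 + 105 \times 28.6 + 179 \times 66.3}{128 \times 3.3 + 104 \times 55.3 + 106 \times 28.6 + 195 \times 66.3} = 97.77$$

Both show falls in prices since 2000 but, because relative prices and quantities have remained fairly stable, little difference between the two indices is found.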
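The Paasche index differs from the Laspeyres index only in its weights: each year's prices and the base-year prices are both weighted by that year's own quantities. A minimal Python sketch, with an illustrative data layout and function name, is:

```python
# Fuel prices by year, and current-year quantities for 2001 and 2002, from the question
prices = {
    2000: {"coal": 128, "gas": 104, "electricity": 106, "petroleum": 195},
    2001: {"coal": 134, "gas": 107, "electricity": 105, "petroleum": 185},
    2002: {"coal": 141, "gas": 114, "electricity": 105, "petroleum": 179},
}
quantities = {
    2001: {"coal": 4.3, "gas": 58.0, "electricity": 28.6, "petroleum": 67.5},
    2002: {"coal": 3.3, "gas": 55.3, "electricity": 28.6, "petroleum": 66.3},
}


def paasche(year, base_year=2000):
    """Price index with current-year quantities as weights (base year = 100)."""
    weights = quantities[year]
    current_cost = sum(prices[year][fuel] * q for fuel, q in weights.items())
    base_cost = sum(prices[base_year][fuel] * q for fuel, q in weights.items())
    return 100 * current_cost / base_cost


for year in (2001, 2002):
    print(year, round(paasche(year), 2))
```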
3.
(i) Experience has shown that 30% of all persons affected by a certain illness recover. A drug company has developed a new vaccine. Ten people with the illness were selected at random and injected with the vaccine; nine recovered shortly afterwards. Suppose that the vaccine was completely useless. What is the probability that at least nine out of ten injected with the vaccine will recover?

Let Y denote the number of people who recover. If the vaccine is worthless, the probability that a single ill person will recover is $p = 0.3$. With $n = 10$ trials, Y is binomially distributed and

$$P(Y = 9) = {}^{10}C_9 \times 0.3^9 \times 0.7 = 0.000138$$

$$P(Y = 10) = 0.3^{10} = 0.000006$$

$$P(Y \ge 9) = 0.000138 + 0.000006 = 0.000144$$
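The binomial tail probability can be reproduced with the Python standard library. This is a small sketch; `binom_pmf` is an illustrative helper name, not a library function.

```python
import math


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3   # ten patients, 30% recovery rate if the vaccine is useless

p_nine = binom_pmf(9, n, p)
p_ten = binom_pmf(10, n, p)
p_at_least_nine = p_nine + p_ten
print(f"P(Y=9) = {p_nine:.6f}, P(Y=10) = {p_ten:.6f}, P(Y>=9) = {p_at_least_nine:.6f}")
```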
This probability is so small that either we have observed a very rare event or the vaccine is indeed very useful in curing the illness: we would adhere to the latter point of view.

(ii) Suppose that a policeman visits a given location on his beat randomly $X = 0, 1, 2, \ldots$ times per half-hour and that, on average, he visits each location once per half-hour. Calculate the probability that the policeman will miss a given location during a half-hour period. What is the probability that he will visit it at least twice?

X will be Poisson distributed with a mean of 1,

$$P(X = r) = \frac{1^r e^{-1}}{r!}$$

Thus,

$$P(X = 0) = e^{-1} = 0.368$$

$$P(X = 1) = e^{-1} = 0.368$$

$$P(X > 1) = 1 - P(X = 0) - P(X = 1) = 1 - 0.368 - 0.368 = 0.264$$
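These Poisson probabilities are easy to confirm numerically. The sketch below uses only the standard library; `poisson_pmf` is an illustrative helper name.

```python
import math


def poisson_pmf(r, mean):
    """P(X = r) for X ~ Poisson(mean)."""
    return mean**r * math.exp(-mean) / math.factorial(r)


mean_visits = 1.0   # one visit per half-hour on average

p_miss = poisson_pmf(0, mean_visits)                    # no visit in the half-hour
p_once = poisson_pmf(1, mean_visits)                    # exactly one visit
p_at_least_twice = 1 - p_miss - p_once                  # two or more visits
print(f"{p_miss:.3f} {p_once:.3f} {p_at_least_twice:.3f}")
```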
(iii) The marks on a maths exam are normally distributed with mean 60 and standard deviation 10. To get a first on the exam a mark of at least 70 must be obtained, while to pass the exam a mark of at least 40 must be obtained. What is the proportion of students that obtain a first and what is the proportion that fail?

The exam mark is $X \sim N(60, 10^2)$. Thus

$$P(X \ge 70) = P\left(Z \ge \frac{70 - 60}{10}\right) = P(Z \ge 1) = 0.1587$$

$$P(X < 40) = P\left(Z < \frac{40 - 60}{10}\right) = P(Z < -2) = P(Z > 2) = 0.0228$$
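Instead of reading the two tail areas from normal tables, they can be computed with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The variable names in this sketch are illustrative.

```python
from statistics import NormalDist

# Exam marks: normal with mean 60 and standard deviation 10
marks = NormalDist(mu=60, sigma=10)

p_first = 1 - marks.cdf(70)   # proportion with a mark of at least 70
p_fail = marks.cdf(40)        # proportion with a mark below 40
print(f"first: {p_first:.4f}, fail: {p_fail:.4f}")
```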
4.
A sample of 16 observations has been taken from a normally distributed population and a sample mean of 50 and a sample standard deviation of 10 have been obtained. Construct 95% confidence intervals for the population mean and variance and test the null hypotheses that the population mean is 52 and the population variance is 250 using the 5% level of significance.

A confidence interval for the population mean is calculated using the result that

$$t = \frac{\sqrt{n}\,(\bar{X} - \mu)}{s} \sim t(n - 1)$$

A 95% c.i. for $\mu$ is given by

$$\bar{X} \pm t_{0.025}(15) \frac{s}{\sqrt{n}} = 50 \pm 2.131 \times \frac{10}{\sqrt{16}} = 50 \pm 2.131 \times 2.5 = 50 \pm 5.33$$

The distribution of the sample variance is

$$s^2 \sim \frac{\sigma^2}{n - 1} \chi^2(n - 1)$$

Thus a 95% c.i. for $\sigma^2$ is

$$\frac{15 \times 100}{\chi^2_{0.025}(15)} < \sigma^2 < \frac{15 \times 100}{\chi^2_{0.975}(15)}$$

With $\chi^2_{0.975}(15) = 6.262$ and $\chi^2_{0.025}(15) = 27.49$,

$$54.6 < \sigma^2 < 239.5$$

To test $H_0: \mu = 52$ against the alternative $H_A: \mu \ne 52$, we calculate

$$|t_0| = \frac{|50 - 52|}{2.5} = 0.8 < t_{0.025}(15) = 2.131$$

so that we do not reject $H_0$.

To test $H_0: \sigma^2 = 250$ against the alternative $H_A: \sigma^2 \ne 250$, we calculate

$$\chi^2_0 = \frac{15 \times 100}{250} = 6$$

Now, since $\chi^2_0 < \chi^2_{0.975}(15) = 6.262$, we may reject $H_0$.
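The intervals and both tests can be reproduced with a short Python sketch. To stay within the standard library, the critical values for 15 degrees of freedom (2.131, 6.262 and 27.49) are hard-coded from statistical tables, as quoted in this answer, rather than computed; the variable names are illustrative.

```python
import math

n, sample_mean, sample_sd = 16, 50.0, 10.0
df = n - 1

# Table values for 15 degrees of freedom (assumed, not computed here)
t_crit = 2.131          # t_{0.025}(15)
chi2_low_crit = 6.262   # chi-square_{0.975}(15), lower critical value
chi2_high_crit = 27.49  # chi-square_{0.025}(15), upper critical value

# 95% confidence interval for the population mean
std_error = sample_sd / math.sqrt(n)
mean_ci = (sample_mean - t_crit * std_error, sample_mean + t_crit * std_error)

# 95% confidence interval for the population variance
sum_sq = df * sample_sd**2
var_ci = (sum_sq / chi2_high_crit, sum_sq / chi2_low_crit)

# Two-sided test of H0: mean = 52
t0 = (sample_mean - 52) / std_error
reject_mean = abs(t0) > t_crit

# Two-sided test of H0: variance = 250
chi2_0 = sum_sq / 250
reject_var = chi2_0 < chi2_low_crit or chi2_0 > chi2_high_crit

print(mean_ci, var_ci, t0, reject_mean, chi2_0, reject_var)
```

The two decisions agree with the intervals: 52 lies inside the interval for the mean, while 250 lies just outside the interval for the variance.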
5.
Write notes on THREE out of FIVE topics, a list of which might include:
(i) Measures of location and dispersion.
(ii) Decomposing a time series.
(iii) Seasonal adjustment.
(iv) Alternative types of index numbers.
(v) Alternative measures of inflation.
(vi) Real and nominal variables.
(vii) The Lorenz Curve and the Gini Coefficient.
(viii) Sampling distributions and alternative methods of sampling.
(ix) Properties of estimators.
(x) Relationships between distributions.