Exam of 20 October 2017, questions + answers

Title Exam of 20 October 2017, questions + answers
Author Allergy pc
Course Statistica / Statistics
Institution Università Commerciale Luigi Bocconi


FIRST PARTIAL EXAM OF STATISTICS 30001 (or 6045/5047)
October 20, 2017 - A

SOLUTIONS

PROBLEM 1 (11 points)
The following table contains data gathered at a hospital, regarding a sample of 10 newborn babies:

CM | KG | GENDER | CM² | KG² | CM·KG
45 | 2.8 | F | 2025 | 7.84 | 126
53 | 3.3 | F | 2809 | 10.89 | 174.9
50.5 | 3.2 | F | 2550.25 | 10.24 | 161.6
51 | 3.1 | F | 2601 | 9.61 | 158.1
52 | 3.2 | M | 2704 | 10.24 | 166.4
47 | 2.9 | F | 2209 | 8.41 | 136.3
52.5 | 3.5 | M | 2756.25 | 12.25 | 183.75
51 | 3.2 | M | 2601 | 10.24 | 163.2
50 | 2.9 | M | 2500 | 8.41 | 145
54 | 3.6 | M | 2916 | 12.96 | 194.4
Sum: 506 | 31.7 | | 25671.5 | 101.09 | 1609.65

CM: height at birth, in centimetres
KG: weight at birth, in kilograms
GENDER: gender of newborn

a) Indicate the types of the variables CM and GENDER.
b) Is there a linear relation between the variables CM and KG? Respond by calculating an adequate index and commenting on the results obtained.
c) Draw a histogram of the variable KG with 3 classes of equal width (suggestion: round the class width to one decimal place).
d) Provide the definition of frequency density. In order to construct the histogram required in point (c), is it necessary to use the frequency density? Why?

Remember: if you attempt to solve this problem, any points you may have obtained in the online assignment will not be considered.

e) Knowing that the sample variance of the weight in kilograms of male newborns is 0.077, determine whether the weight variability observed in newborns is larger for males or for females. Justify your answer through adequate numerical calculations.

a) CM is a continuous numerical variable on a ratio scale of measurement. GENDER is a categorical (qualitative) nominal variable.

b) Denote the variable CM by X and the variable KG by Y. Then:

\[ s_{XY} = \frac{n}{n-1}\left(\frac{\sum_{i=1}^{n} x_i y_i}{n} - \bar{x}\,\bar{y}\right) = \frac{10}{10-1}\left(\frac{1609.65}{10} - 50.6 \cdot 3.17\right) = 0.6255 \]

\[ s_X = \sqrt{\frac{n}{n-1}\left(\frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2\right)} = \sqrt{\frac{10}{10-1}\left(\frac{25671.5}{10} - 50.6^2\right)} = \sqrt{7.544} = 2.7467 \]

\[ s_Y = \sqrt{\frac{n}{n-1}\left(\frac{\sum_{i=1}^{n} y_i^2}{n} - \bar{y}^2\right)} = \sqrt{\frac{10}{10-1}\left(\frac{101.09}{10} - 3.17^2\right)} = \sqrt{0.0667} = 0.2584 \]

\[ r = \frac{s_{XY}}{s_X s_Y} = \frac{0.6255}{2.7467 \cdot 0.2584} = 0.8813 \]
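The value of r can be checked numerically. The following is a minimal sketch that recomputes the sample covariance, the two standard deviations and the correlation coefficient from the raw data in the table; the helper name `sample_cov` is introduced here for illustration only and is not part of the exam. Because the script does not round intermediate results, the last decimal place may differ from the hand calculation.

```python
import math

# Data from the table in Problem 1
cm = [45, 53, 50.5, 51, 52, 47, 52.5, 51, 50, 54]
kg = [2.8, 3.3, 3.2, 3.1, 3.2, 2.9, 3.5, 3.2, 2.9, 3.6]


def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


s_xy = sample_cov(cm, kg)
s_x = math.sqrt(sample_cov(cm, cm))  # the covariance of X with itself is its variance
s_y = math.sqrt(sample_cov(kg, kg))
r = s_xy / (s_x * s_y)

print(f"s_XY = {s_xy:.4f}, s_X = {s_x:.4f}, s_Y = {s_y:.4f}, r = {r:.4f}")
```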

The linear correlation coefficient indicates a strong positive linear relation between the two variables.

c) We wish to find k = 3 classes of equal width. In order to determine the class width, we first calculate the range of the data:

Range = Max - Min = 3.6 - 2.8 = 0.8

\[ w_i = \frac{\text{Range}}{k} = \frac{0.8}{3} = 0.2667 \cong 0.3 \]

Class | Absolute frequency (f_i) | Relative frequency (p_i) | Frequency density (c_i)
[2.8; 3.1) | 3 | 0.3 | 1.000
[3.1; 3.4) | 5 | 0.5 | 1.667
[3.4; 3.7) | 2 | 0.2 | 0.667

[Figure: histogram of KG. Horizontal axis: KG, with class limits 2.8, 3.1, 3.4 and 3.7. Vertical axis: frequency density c_i.]
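The class table can be reproduced with a short script. This is only a sketch under the choices made in the solution (left-closed classes starting at 2.8 with width 0.3); the weights are converted to integer tenths of a kilogram, an implementation detail assumed here to avoid floating-point problems at the class limits.

```python
# Weights (KG) from the table in Problem 1
kg = [2.8, 3.3, 3.2, 3.1, 3.2, 2.9, 3.5, 3.2, 2.9, 3.6]

k = 3              # number of classes
lower_tenths = 28  # lower limit 2.8 kg, expressed in tenths of a kg
width_tenths = 3   # class width 0.3 kg, expressed in tenths of a kg

counts = [0] * k
for weight in kg:
    tenths = round(weight * 10)                      # e.g. 3.1 -> 31
    index = (tenths - lower_tenths) // width_tenths  # class [a, b) containing the value
    counts[index] += 1

n = len(kg)
rel_freq = [c / n for c in counts]
density = [p / (width_tenths / 10) for p in rel_freq]  # relative frequency / class width

for i in range(k):
    a = (lower_tenths + i * width_tenths) / 10
    b = (lower_tenths + (i + 1) * width_tenths) / 10
    print(f"[{a:.1f}; {b:.1f})  f={counts[i]}  p={rel_freq[i]:.1f}  c={density[i]:.3f}")
```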

d) The frequency density of a class is its relative frequency divided by the class width, that is, the proportion of observations per unit of interval width. In order to draw the histogram required in point (c) it is NOT necessary to use the frequency density: because the classes have equal width, it is also possible to use the absolute or relative frequencies, since the height (and therefore the area) of the bars in the histogram would maintain the same proportion.

e)

\[ \bar{x}_M = \frac{16.4}{5} = 3.28 \qquad \bar{x}_F = \frac{15.3}{5} = 3.06 \]

\[ s_M = \sqrt{0.077} = 0.2775 \]

\[ s_F = \sqrt{\frac{5}{5-1}\left(\frac{46.99}{5} - 3.06^2\right)} = \sqrt{\frac{5}{4}\,(9.398 - 9.3636)} = \sqrt{0.043} = 0.2074 \]

\[ CV_M = \frac{0.2775}{3.28} = 0.0846 \qquad CV_F = \frac{0.2074}{3.06} = 0.0677 \]

The observed weight variability is somewhat larger for male newborns than for females, as shown by the difference in the coefficients of variation of the two groups.
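As a cross-check, the comparison can be reproduced from the raw weights using Python's standard `statistics` module. This is a minimal sketch; the helper `coeff_var` is a name assumed here for illustration, and the weights are split by gender as in the table of Problem 1.

```python
import statistics

# Weights (KG) split by GENDER, from the table in Problem 1
males = [3.2, 3.5, 3.2, 2.9, 3.6]
females = [2.8, 3.3, 3.2, 3.1, 2.9]


def coeff_var(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


var_m = statistics.variance(males)    # sample variance, n - 1 denominator
var_f = statistics.variance(females)
cv_m = coeff_var(males)
cv_f = coeff_var(females)

print(f"Males:   variance = {var_m:.3f}, CV = {cv_m:.4f}")
print(f"Females: variance = {var_f:.3f}, CV = {cv_f:.4f}")
print("Larger relative variability:", "males" if cv_m > cv_f else "females")
```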

PROBLEM 2 (5 points)
A given hospital has 2 birthing rooms. Based on data recorded last year, there is a 0.05 probability that both rooms are unavailable at any given time.

a) What is the probability that, out of 10 women in the hospital who need a birthing room, at least one finds both rooms unavailable?
b) Find the acceptance interval for the sample proportion of women who found both birthing rooms unavailable when they needed one (use z = 3). In the past month, out of 215 women in the hospital who needed a birthing room, 20 found both rooms unavailable. Based on these results, has the proportion of women in the hospital who found both birthing rooms unavailable when they needed one changed over the last month, with respect to the past year? Justify your answer.

a) The variable X = "Number of women, out of 10, who find both birthing rooms unavailable" can be described using a Binomial model with parameters: success probability, p = 0.05; number of independent trials, n = 10. In other words,

\[ X \sim \mathrm{Bin}(10,\ 0.05) \]

So,

\[ P(X > 0) = 1 - P(X = 0) = 1 - \binom{10}{0}\, 0.05^{0}\, 0.95^{10} = 1 - 0.95^{10} = 1 - 0.5987 = 0.4013 \]

b) First, we observe that p = 0.05 and np(1 - p) = 215 · 0.05 · 0.95 = 10.2125 > 9, so we can say that

\[ \hat{P} \approx N\!\left(p,\ \frac{p(1-p)}{n}\right) \]

So, the acceptance interval for the sample proportion can be obtained through the following formula:

\[ A.I._{1-\alpha}(\hat{P}) = \left(p \pm z_{\alpha/2}\, \sigma_{\hat{P}}\right) \]

Using z = 3 as indicated in the question and knowing that

\[ \sigma_{\hat{P}} = \sqrt{\frac{p(1-p)}{n}} = 0.0149 \]

we get

\[ A.I.(\hat{P}) = (0.05 \pm 3 \cdot 0.0149) = (0.0054,\ 0.0946) \]

From the data of the past month, we have

\[ \hat{p} = \frac{20}{215} = 0.093 \]

Since this value is contained in the acceptance interval, we can conclude that the proportion of women in the hospital who found both birthing rooms unavailable when they needed one has NOT changed with respect to the previous year.
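Both parts of Problem 2 can be verified with a few lines of Python. The sketch below uses only the standard library; `binom_pmf` is a helper name assumed here for illustration. It evaluates the binomial probability of part (a) and the acceptance interval of part (b) without rounding the standard error first.

```python
import math

p = 0.05  # probability that both rooms are unavailable (last year's data)


def binom_pmf(k, n, prob):
    """P(X = k) for X ~ Bin(n, prob) (illustrative helper)."""
    return math.comb(n, k) * prob**k * (1 - prob) ** (n - k)


# Part (a): at least one woman out of 10 finds both rooms unavailable
p_at_least_one = 1 - binom_pmf(0, 10, p)

# Part (b): acceptance interval for the sample proportion with n = 215 and z = 3
n, z = 215, 3
sigma = math.sqrt(p * (1 - p) / n)
lower, upper = p - z * sigma, p + z * sigma
p_hat = 20 / 215
inside = lower <= p_hat <= upper

print(f"P(X > 0) = {p_at_least_one:.4f}")
print(f"Acceptance interval = ({lower:.4f}, {upper:.4f})")
print(f"p_hat = {p_hat:.3f}, inside interval: {inside}")
```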

PROBLEM 3 (6 points)
We wish to predict the weight (y), in kg, of a newborn using a linear regression model, based on the height (x), in cm, registered at the last echography. For a sample of 30 newborn babies, some partial results regarding such weight and height were recorded and are given below:

\[ \sum_{i=1}^{30} x_i = 1531; \quad \sum_{i=1}^{30} y_i = 95.7; \quad \sum_{i=1}^{30} x_i y_i = 4904.05; \quad \sum_{i=1}^{30} x_i^2 = 78333.5; \quad \sum_{i=1}^{30} y_i^2 = 307.93; \quad x_{\min} = 44; \quad x_{\max} = 56 \]

a) Calculate the coefficients of the regression line that explains the weight of a newborn as a function of the height registered at the last echography.
b) Provide the interpretation of each of the two coefficients calculated above.
c) What would be the weight of a newborn with a recorded height of 45 cm at the last echography? Is this forecast reliable? Why?

a) Y = weight of a newborn (in kg), X = height recorded at the last echography (in cm)

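\[ \bar{x} = \frac{1531}{30} = 51.0333 \qquad \bar{y} = \frac{95.7}{30} = 3.19 \]

\[ s_x^2 = \frac{n}{n-1}\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2\right) = \frac{30}{29}\left(\frac{78333.5}{30} - 51.0333^2\right) = 6.9507 \]

\[ s_{xy} = \frac{30}{29}\left(\frac{4904.05}{30} - 51.0333 \cdot 3.19\right) = 0.6953 \]

\[ b_1 = \frac{s_{xy}}{s_x^2} = \frac{0.6953}{6.9507} = 0.1 \]

\[ b_0 = \bar{y} - b_1 \bar{x} = 3.19 - 0.1 \cdot 51.0333 = -1.9133 \]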

b) The slope coefficient of 0.1 indicates that newborns whose height at the last echography is 1 cm greater are expected to be, on average, 0.1 kg heavier at birth. The intercept, which corresponds to the expected weight at birth when the height at the last echography is 0, has no real meaning, since 0 is not within the observed range of values of X.

c) If the height registered at the last echography is 45 cm, the forecasted weight of the newborn would be:

\[ \hat{y}_{45} = b_0 + b_1 \cdot 45 = -1.9133 + 0.1 \cdot 45 = 2.5867 \]

Since the value x = 45 cm is within the range of observed values (44 to 56 cm), the forecast can be considered reliable.
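The regression coefficients and the forecast can be recomputed directly from the summary sums. The sketch below is an illustration only: the function name `predict` is assumed here, and the script works with unrounded means, so its intermediate values differ slightly from the hand calculation above, which rounds the mean of x to 51.0333 and the slope to 0.1.

```python
# Summary statistics given in Problem 3
n = 30
sum_x, sum_y = 1531, 95.7
sum_xy, sum_x2 = 4904.05, 78333.5
x_min, x_max = 44, 56

x_bar = sum_x / n
y_bar = sum_y / n

# Sample covariance and variance with the n - 1 denominator
s_xy = (sum_xy - n * x_bar * y_bar) / (n - 1)
s_x2 = (sum_x2 - n * x_bar**2) / (n - 1)

b1 = s_xy / s_x2          # slope
b0 = y_bar - b1 * x_bar   # intercept


def predict(height_cm):
    """Forecast weight (kg) for a given height (cm) using the fitted line."""
    return b0 + b1 * height_cm


forecast = predict(45)
within_range = x_min <= 45 <= x_max  # forecast is an interpolation if True

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"Forecast at 45 cm = {forecast:.4f} kg, within observed range: {within_range}")
```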

PROBLEM 4 (6 points)
We wish to study the relationship between the hospital and the childbirth method by analysing a sample of 146 new mothers in the course of the II and III trimesters of 2017. The following two-way table reports the observed joint absolute frequencies:

II + III trimesters 2017 (columns: childbirth method)
Hospital | Natural | C-section | Water birth
Hospital A | 67 | 10 | 0
Hospital B | 15 | 4 | 9
Hospital C | 34 | 5 | 2

a) What is the definition of "statistical independence"? Using an appropriate graph, determine if the two variables Hospital and Childbirth method are statistically independent.
b) Which hospital had the highest proportion of C-section births? Calculate this percentage for each of the three hospitals.
c) What was the percentage of non-natural births (C-section and/or water birth)?

a) Statistical independence (see Newbold): two variables are statistically independent if there is no relationship between them. Stacked or side-by-side bar charts of the conditional frequencies are useful to assess the presence or absence of association between the variables. Generally, when the conditional frequencies are similar across the values of one of the variables, we say there is an absence of association (or absence of relationship) between the two variables being analyzed; in other words, there is evidence of statistical independence.

In order to draw the graph, we first calculate the conditional frequencies. In this case, since the question gives no indication that one variable should be considered as a function of the other, either conditioning may be correct if properly justified. We present the distribution of the childbirth method given the hospital.

II + III trimesters 2017 (columns: childbirth method, conditional on hospital)
Hospital | Natural | C-section | Water birth
Hospital A | 87.0% | 13.0% | 0.0%
Hospital B | 53.6% | 14.3% | 32.1%
Hospital C | 82.9% | 12.2% | 4.9%

We present the stacked bar chart (a side-by-side bar chart would also be correct).

[Figure: stacked bar chart of the conditional distribution of childbirth method (Natural, C-section, Water birth) for Hospital A, Hospital B and Hospital C. Vertical axis: 0% to 100%. Horizontal axis: Hospital.]

The plot indicates an association between the two variables: a change in the Hospital corresponds to a change in the distribution of the childbirth method.

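b)

\[ \mathrm{Freq}(\text{C-section} \mid \text{Hospital A}) = \frac{10}{77} = 0.129870 \rightarrow 12.9870\% \]

\[ \mathrm{Freq}(\text{C-section} \mid \text{Hospital B}) = \frac{4}{28} = 0.142857 \rightarrow 14.2857\% \]

\[ \mathrm{Freq}(\text{C-section} \mid \text{Hospital C}) = \frac{5}{41} = 0.121951 \rightarrow 12.1951\% \]

The hospital with the highest proportion of C-sections was Hospital B, where 14.2857% of births were by C-section.

c)

\[ \mathrm{Freq}(\text{Non-natural birth}) = \frac{10 + 4 + 5 + 0 + 9 + 2}{146} = 0.205479 \rightarrow 20.5479\% \]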
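The conditional frequencies used in parts (a) to (c) can be derived from the two-way table with a short script. This is a sketch only; the dictionary layout and variable names are assumptions made for illustration.

```python
# Joint absolute frequencies from the two-way table in Problem 4
counts = {
    "Hospital A": {"Natural": 67, "C-section": 10, "Water birth": 0},
    "Hospital B": {"Natural": 15, "C-section": 4, "Water birth": 9},
    "Hospital C": {"Natural": 34, "C-section": 5, "Water birth": 2},
}

# Distribution of childbirth method conditional on hospital (row proportions)
conditional = {
    hospital: {method: c / sum(row.values()) for method, c in row.items()}
    for hospital, row in counts.items()
}

# Part (b): hospital with the highest conditional frequency of C-sections
c_section = {hospital: freqs["C-section"] for hospital, freqs in conditional.items()}
highest = max(c_section, key=c_section.get)

# Part (c): overall share of non-natural births
total = sum(sum(row.values()) for row in counts.values())
non_natural = sum(
    c for row in counts.values() for method, c in row.items() if method != "Natural"
) / total

for hospital, freqs in conditional.items():
    print(hospital, {method: f"{100 * f:.1f}%" for method, f in freqs.items()})
print(f"Highest C-section proportion: {highest} ({100 * c_section[highest]:.4f}%)")
print(f"Non-natural births: {100 * non_natural:.4f}%")
```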

PROBLEM 5 (3 points)
A more detailed analysis of the data in problem (4) led to the construction of the following two-way tables, in which the original sample has been divided according to the trimester in which the birth took place:

II trimester 2017 (columns: childbirth method)
Hospital | Natural | C-section | Water birth
Hospital A | 58 | 7 | 0
Hospital B | 12 | 2 | 5
Hospital C | 32 | 4 | 1

III trimester 2017 (columns: childbirth method)
Hospital | Natural | C-section | Water birth
Hospital A | 9 | 3 | 0
Hospital B | 3 | 2 | 4
Hospital C | 2 | 1 | 1

Which hospital had the highest proportion of C-section births in each of the two trimesters? Based on these results, what comments can you make regarding your answer to question (4-b)? Provide an adequate explanation.

The conditional frequencies calculated before become, in this case:

II TRIMESTER:
Freq(C-section | Hospital A) = 7/65 = 0.107692 → 10.7692%
Freq(C-section | Hospital B) = 2/19 = 0.105263 → 10.5263%
Freq(C-section | Hospital C) = 4/37 = 0.108108 → 10.8108%

III TRIMESTER:
Freq(C-section | Hospital A) = 3/12 = 0.25 → 25%
Freq(C-section | Hospital B) = 2/9 = 0.222222 → 22.2222%
Freq(C-section | Hospital C) = 1/4 = 0.25 → 25%

In the II trimester, the hospital with the highest proportion of C-sections was Hospital C, where 10.8108% of births were by C-section, although the proportions are quite similar for the three hospitals. In the III trimester, the hospitals with the highest proportion of C-sections were Hospitals A and C, with 25% each.

In question (4-b), Hospital B was the one with the highest percentage of C-sections, while now it has the lowest percentage in each of the two trimesters. This effect is known as Simpson's paradox (see Additional Material Document): the analysis of the aggregate data can lead to erroneous conclusions when other variables that are relevant to the analysis are ignored (the trimester, in this case)...
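The reversal described above can be checked numerically. The following sketch (data structure and names are assumptions for illustration) computes the C-section proportion per hospital within each trimester and again after pooling the two trimesters, and reports which hospital ranks lowest or highest in each case.

```python
# (C-section births, total births) per hospital, from the tables in Problem 5
data = {
    "II": {"Hospital A": (7, 65), "Hospital B": (2, 19), "Hospital C": (4, 37)},
    "III": {"Hospital A": (3, 12), "Hospital B": (2, 9), "Hospital C": (1, 4)},
}


def rates(table):
    """C-section proportion for each hospital in one table."""
    return {hospital: c / n for hospital, (c, n) in table.items()}


per_trimester = {trimester: rates(table) for trimester, table in data.items()}

# Pool the two trimesters by summing numerators and denominators per hospital
pooled_counts = {}
for table in data.values():
    for hospital, (c, n) in table.items():
        prev_c, prev_n = pooled_counts.get(hospital, (0, 0))
        pooled_counts[hospital] = (prev_c + c, prev_n + n)
pooled = rates(pooled_counts)

lowest_ii = min(per_trimester["II"], key=per_trimester["II"].get)
lowest_iii = min(per_trimester["III"], key=per_trimester["III"].get)
highest_pooled = max(pooled, key=pooled.get)

print("Lowest C-section rate, II trimester: ", lowest_ii)
print("Lowest C-section rate, III trimester:", lowest_iii)
print("Highest C-section rate, pooled data: ", highest_pooled)
```

Hospital B handles a much larger share of its births in the III trimester, when C-section rates are higher everywhere, which is why its pooled rate is the highest even though its rate is the lowest within each trimester.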

