Tutorial 3 - (with Answers) PDF

Title Tutorial 3 - (with Answers)
Author Nuzhat Khan
Course Business Data Analytics
Institution McMaster University



Description

Tutorial 3

Question 1
In order to build a model between annual grocery expenses and income, we have the following data table on 5 individuals:

Income    Grocery Expenses
7000      5000
25000     9600
27000     10300
30000     15000
50000     25000

The model we are building will estimate expenses based on income.
X: Income
Y: Expenses
The average values for income and expenses are $27,800 and $12,980, respectively. Also, the standard deviation values for income and expenses are $15,319.92 and $7,596.84, respectively.

a) Assuming that the correlation coefficient between the two variables is 0.96423, use a linear regression equation to model the relationship between income and expenses.
b) What is the interpretation of the intercept and slope in this context?
c) Estimate the expenses for an income of 20000.
d) If we know that an individual earns an income that is 0.8 standard deviations below the average income, how many standard deviations do you expect their predicted expense to be from the average expense?
e) What are the corresponding income and predicted expense values discussed in part (d)?

Answer:
a) \( \widehat{Expenses} = b_0 + b_1 \cdot Income \)

\( b_1 = r \dfrac{S_y}{S_x} = 0.96423 \times \dfrac{7596.84}{15319.92} = 0.478142 \)

\( b_0 = \bar{y} - b_1 \bar{x} = 12980 - 0.478142 \times 27800 = -312.356 \) (computed with the unrounded slope)

\( \widehat{Expenses} = -312.356 + 0.478142 \cdot Income \)

b) \( b_0 \) is the intercept: the value of the line when the x variable is zero. It can be used as the prediction when x = 0, but it is not meaningful in all cases. For example, in the present instance, when income = 0 the predicted expense is negative. \( b_1 \) is the slope: the expected change in y (the response variable) for every one-unit change in x. For every additional dollar of income, we expect expenses to grow by about $0.48.

c) \( \widehat{Expenses} = -312.356 + 0.478142 \times 20000 = 9250.484 \)

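d) In standardized form, the regression equation is \( \hat{z}_y = r z_x \). We know from the problem that \( r = 0.96423 \) and \( z_x = -0.8 \). From the equation we have

\( \hat{z}_y = 0.96423 \times (-0.8) = -0.77 \)

We expect the predicted expense to be 0.77 standard deviations below the average expense.

e) The summary statistics are:

      Mean     Standard deviation
x     27800    15319.92
y     12980    7596.841

The corresponding x value (income) for \( z_x = -0.8 \) is

\( x = z_x \times s_x + \bar{x} = (-0.8) \times 15319.92 + 27800 = 15544.064 \)

Similarly, for the y value (expense) we have

\( \hat{y} = \hat{z}_y \times s_y + \bar{y} = (-0.77) \times 7596.841 + 12980 = 7130.43243 \)

This figure uses \( \hat{z}_y \) rounded to two decimals. With the unrounded value \( \hat{z}_y = -0.771384 \), the predicted expense is about 7119.92, which agrees with the regression line from part (a) evaluated at x = 15544.064.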
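The Question 1 calculations can be checked numerically. The following is a minimal Python sketch, assuming the five data points from the table. The names `predict`, `z_y_hat`, `x_at_z` and `y_hat_at_z` are illustrative and not part of the tutorial. It recomputes every quantity from the raw data instead of the rounded summary statistics, so the last digits may differ slightly from the printed answers.

```python
import statistics

# Data from the Question 1 table
income = [7000, 25000, 27000, 30000, 50000]
expenses = [5000, 9600, 10300, 15000, 25000]

n = len(income)
x_bar = statistics.mean(income)
y_bar = statistics.mean(expenses)
s_x = statistics.stdev(income)    # sample standard deviation (divides by n - 1)
s_y = statistics.stdev(expenses)

# Sample correlation coefficient from its definition
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, expenses)) / (n - 1)
r = s_xy / (s_x * s_y)

# Parts (a) to (c): slope, intercept, prediction
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar


def predict(income_value):
    """Predicted expenses for a given income under the fitted line."""
    return b0 + b1 * income_value


# Parts (d) and (e): work in standardized units, then convert back
z_x = -0.8
z_y_hat = r * z_x
x_at_z = x_bar + z_x * s_x
y_hat_at_z = y_bar + z_y_hat * s_y

print(f"r = {r:.5f}, b1 = {b1:.6f}, b0 = {b0:.3f}")
print(f"predicted expenses at income 20000: {predict(20000):.2f}")
print(f"z_y_hat = {z_y_hat:.4f}, income = {x_at_z:.2f}, expenses = {y_hat_at_z:.2f}")
```

Converting the standardized prediction back to dollars gives the same value as plugging the income into the fitted line, which is a useful consistency check on parts (d) and (e).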

Question 2
Consider the following contingency table (frequencies):

        A1    A2
B1      10    20
B2      20    40

Calculate the following probabilities.

a) \( P(A_1) \)
b) \( P(B_1) \)
c) \( P(A_2) \)
d) \( P(A_1 \text{ and } B_1) \)
e) \( P(A_2 \text{ and } B_2) \)
f) \( P(A_1 \text{ or } B_2) \)
g) \( P(A_1 \mid B_1) \)
h) \( P(A_1 \mid B_2) \)
i) \( P(A_2 \mid B_2) \)
j) Are the two events \( A_1 \) and \( B_1 \) independent?

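Answers:

a) \( P(A_1) = \dfrac{10+20}{10+20+20+40} = \dfrac{30}{90} \cong 0.33 \)

b) \( P(B_1) = \dfrac{10+20}{10+20+20+40} = \dfrac{30}{90} \cong 0.33 \)

c) \( P(A_2) = 1 - P(A_1) = 1 - 0.33 = 0.67 \) or \( P(A_2) = \dfrac{20+40}{10+20+20+40} = \dfrac{60}{90} \cong 0.67 \)

d) \( P(A_1 \text{ and } B_1) = \dfrac{10}{10+20+20+40} = \dfrac{10}{90} \cong 0.11 \)

e) \( P(A_2 \text{ and } B_2) = \dfrac{40}{10+20+20+40} = \dfrac{40}{90} \cong 0.44 \)

f) \( P(A_1 \text{ or } B_2) = P(A_1) + P(B_2) - P(A_1 \text{ and } B_2) = \dfrac{30}{90} + \dfrac{60}{90} - \dfrac{20}{90} = \dfrac{70}{90} \cong 0.78 \)

g) \( P(A_1 \mid B_1) = \dfrac{10}{10+20} \cong 0.33 \) or \( P(A_1 \mid B_1) = \dfrac{P(A_1 \text{ and } B_1)}{P(B_1)} = \dfrac{0.11}{0.33} \cong 0.33 \)

h) \( P(A_1 \mid B_2) = \dfrac{20}{20+40} \cong 0.33 \) or \( P(A_1 \mid B_2) = \dfrac{P(A_1 \text{ and } B_2)}{P(B_2)} = \dfrac{0.22}{0.67} \cong 0.33 \)

i) \( P(A_2 \mid B_2) = \dfrac{40}{20+40} \cong 0.67 \) or \( P(A_2 \mid B_2) = \dfrac{P(A_2 \text{ and } B_2)}{P(B_2)} = \dfrac{0.44}{0.67} \cong 0.67 \)

j) In order to investigate that, we need to confirm whether

\( P(A_1 \text{ and } B_1) = P(A_1) P(B_1) \)

We already have

\( P(A_1 \text{ and } B_1) = \dfrac{10}{90}, \quad P(A_1) = \dfrac{30}{90}, \quad P(B_1) = \dfrac{30}{90} \)

so \( P(A_1) P(B_1) = \dfrac{30}{90} \times \dfrac{30}{90} = \dfrac{10}{90} \). It is confirmed that \( P(A_1 \text{ and } B_1) = P(A_1) P(B_1) \), and therefore the two events \( A_1 \) and \( B_1 \) are independent.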
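The Question 2 probabilities can be reproduced exactly with rational arithmetic. The sketch below is one possible way to do it, assuming the frequencies from the contingency table. The helper names (`p_joint`, `p_a`, `p_b`, `p_a_given_b`) are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

# Frequencies from the Question 2 contingency table, keyed by (A level, B level)
table = {
    ("A1", "B1"): 10, ("A2", "B1"): 20,
    ("A1", "B2"): 20, ("A2", "B2"): 40,
}
total = sum(table.values())


def p_joint(a, b):
    """P(a and b): cell count over the grand total."""
    return Fraction(table[(a, b)], total)


def p_a(a):
    """Marginal P(a): sum the joint probabilities over both B levels."""
    return sum(p_joint(a, b) for b in ("B1", "B2"))


def p_b(b):
    """Marginal P(b): sum the joint probabilities over both A levels."""
    return sum(p_joint(a, b) for a in ("A1", "A2"))


def p_a_given_b(a, b):
    """Conditional P(a | b) = P(a and b) / P(b)."""
    return p_joint(a, b) / p_b(b)


# Part (f): addition rule
p_a1_or_b2 = p_a("A1") + p_b("B2") - p_joint("A1", "B2")

# Part (j): independence holds when the joint equals the product of marginals
independent = p_joint("A1", "B1") == p_a("A1") * p_b("B1")

print("P(A1) =", p_a("A1"), " P(A1 or B2) =", p_a1_or_b2)
print("P(A2 | B2) =", p_a_given_b("A2", "B2"), " independent:", independent)
```

Using `Fraction` avoids the rounding that appears when the conditional probabilities are computed from two-decimal values such as 0.44 and 0.67.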

Question 3
From a survey conducted in Seattle, WA, only 20% of homes had A/C and 40% of homes had Internet access. Of all the homes, 55% didn't have access to either. Create a contingency table (with proportions/probabilities) to answer the following questions:

                       A/C
                 Yes     No      Total
Internet  Yes    0.15    0.25    0.4
          No     0.05    0.55    0.6
          Total  0.2     0.8     1

a) What is the probability that a home had both A/C and Internet access?
b) What is the probability that a home which had Internet access did not have A/C?

Answer:
a) \( P(\text{has A/C AND has Internet}) = 0.15 \)
b) \( P(\text{does not have A/C} \mid \text{has Internet}) = \dfrac{0.25}{0.4} = 0.625 \)
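The cells of the Question 3 table follow from the three figures given in the problem by subtraction. A short Python sketch of that fill-in order is shown here. The variable names are illustrative, and the order of the steps is one reasonable choice, not necessarily the one used in the tutorial.

```python
# Given in the problem statement
p_ac = 0.20        # P(A/C)
p_net = 0.40       # P(Internet)
p_neither = 0.55   # P(no A/C and no Internet)

# Complements give the remaining marginal totals
p_no_ac = 1 - p_ac
p_no_net = 1 - p_net

# Fill the interior cells by subtracting from row and column totals
p_no_ac_and_net = p_no_ac - p_neither   # no A/C, has Internet
p_ac_and_net = p_net - p_no_ac_and_net  # has A/C, has Internet
p_ac_and_no_net = p_ac - p_ac_and_net   # has A/C, no Internet

# Part (b): condition on having Internet access
p_no_ac_given_net = p_no_ac_and_net / p_net

print(f"P(A/C and Internet) = {p_ac_and_net:.2f}")
print(f"P(no A/C | Internet) = {p_no_ac_given_net:.3f}")
```

Because the inputs are floats, the results are only approximately equal to 0.15 and 0.625, so they are formatted to a fixed number of decimals when printed.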

Question 4
A linear regression of the price of external hard drives against their capacity in terabytes had a correlation coefficient of 0.994. What is the value of \( R^2 \) for this regression and how do you interpret its meaning?

Answer:
\( R^2 = 98.8\% \). \( R^2 \) is the squared correlation. Since \( r = 0.994 \), we have \( R^2 = 0.994^2 = 0.988 \). It means that about 98.8% of the variance in the price of disk drives can be accounted for by the regression model of Price on Capacity.
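As a quick arithmetic check for Question 4, the squared correlation can be computed directly. This small sketch assumes a simple linear regression with a single predictor, the case in which R-squared equals the square of the correlation coefficient. The variable names are illustrative.

```python
r = 0.994                 # correlation between price and capacity
r_squared = r ** 2        # R^2 for a one-predictor linear regression
percent_explained = round(100 * r_squared, 1)

print(f"R^2 = {r_squared:.3f} ({percent_explained}% of variance in price)")
```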

