
Title ECON1193-Assignment-3A HN-G01 Group-5-1
Author Dẹo Lê
Course Business Statistics
Institution Royal Melbourne Institute of Technology University Vietnam


RMIT University Vietnam [ECON1193] Business Statistics

CASE STUDY

FACTORS AFFECTING NUMBER OF DEATHS DUE TO COVID-19

Lecturer: Pham Thi Minh Thuy

Students and Work Allocation:

Student Name | Student ID | Parts Contributed | Contribution % | Signature
Ngo Phuong Thao | s3758242 | Part 1, 2, 3, 4, 5 | 25% | Thao
Luu Thi Bao Tram | s3818235 | Part 1, 2, 3, 5, 7 | 25% | Tram
Tran Ha Le | s3818001 | Part 1, 2, 4, 6, 7 | 25% | Le
Truong Thuy Linh | s3818033 | Part 1, 3, 5, 6, 7 | 25% | Linh

TABLE OF CONTENTS

I. Data Collection
II. Descriptive Statistics
   1. Measures of Central Tendency
   2. Measures of Variation
   3. Measures of Shape - Box and Whisker Plot
III. Multiple Regression
   A. EU Countries
   B. Middle East Countries
IV. Team Regression Conclusion
V. Time Series
   1. Significant Models
   2. Recommendation on the Best Model
VI. Time Series Conclusion
VII. Overall Team Conclusion
APPENDIX
REFERENCES


I. Data Collection

The data are collected and shown in the Excel file.

II. Descriptive Statistics

1. Measures of Central Tendency

Table 1: Measures of Central Tendency for the numbers of COVID-19 deaths by April 23rd, 2020, distributed by region

As can be seen from Table 1, the mean number of COVID-19 deaths in the EU (3,347.519 deaths) is higher than that in the Middle East (ME) (421.634 deaths). On average, therefore, EU countries have more death cases than ME countries.

The median is the middle value of the data set when the data are arranged in ascending order. Half of the EU countries have more than 225 deaths, whereas half of the ME countries have more than just 13.

As for the mode, the EU data set has no mode, while the ME mode is 7 deaths. This means that 7 is the most frequent number of deaths among ME countries.

However, outliers are present in both groups (see Figure 1). Because the mean is pulled by outliers, it is not suitable for comparing the two groups. The mode is not suitable either, since most values in each group differ from one another. The median is therefore the best measure. The EU's median is roughly 17 times that of the ME (225 vs. 13), indicating that COVID-19 has had a more severe impact on the population of EU countries than on ME countries.

2. Measures of Variation

Table 2: Measures of Variation for the numbers of COVID-19 deaths by April 23rd, 2020, distributed by region


Measures of variation indicate how data are spread out. In this case, they show how much the number of COVID-19 deaths varies among the countries in each region.

The range is the gap between the minimum and maximum values. For the EU it is around 25,000 deaths, while that of the Middle East is only slightly more than one fifth of the EU's. This means the EU countries' death counts are spread very widely, much more so than the already wide spread in the Middle East.

The sample variance, standard deviation and coefficient of variation also measure spread: the larger their values, the more widely the data are scattered around the mean. The standard deviation is the most commonly used of the three. Both the EU's and the Middle East's standard deviations are large, and the EU's is substantially higher than the Middle East's, which confirms what the ranges show.

However, the COVID-19 death data contain outliers (see Figure 1), and all of these measures are affected by outliers. They may therefore not describe the variation adequately. The interquartile range (IQR), which is not affected by outliers, should be the primary measure instead. The IQR is the range of the middle 50% of an ordered data set. The EU's IQR is still high, at nearly 2,000 deaths, whereas the Middle East's is only 76 deaths. In other words, the EU countries' COVID-19 death counts very likely differ considerably from one another, in contrast with the fairly similar values among Middle East countries.
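Tables 1 and 2 are not reproduced in this text. The sketch below therefore uses a small hypothetical sample (made-up numbers, not the assignment's data) to show how each measure above can be computed with Python's standard `statistics` module. It also shows why one extreme country moves the mean far more than the median or the IQR. Quartiles come from `statistics.quantiles` with its default method. Spreadsheet quartile functions may use a different interpolation rule and give slightly different values.

```python
import statistics

# Hypothetical death counts for one region (NOT the assignment's data):
# most countries are small and one is extreme, as in Figure 1.
deaths = [3, 7, 7, 12, 20, 45, 90, 150, 5400]


def summarize(data):
    """Central tendency, variation and a simple shape check for a sample."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "median": median,
        "modes": statistics.multimode(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),
        "stdev": sd,
        "cv_percent": sd / mean * 100,
        "iqr": q3 - q1,
        # Mean above median is a quick sign of right skew.
        "right_skewed": mean > median,
        "share_below_mean": sum(x < mean for x in data) / len(data),
    }


full = summarize(deaths)
without_max = summarize(sorted(deaths)[:-1])  # drop the single extreme value

for key in ("mean", "median", "iqr"):
    print(f"{key:>6}: with outlier {full[key]:9.2f} | without {without_max[key]:9.2f}")
```

With the extreme value included, the mean is about 637 while the median is 20. Dropping that single value pulls the mean down to 41.75, but the median only falls to 16 and the IQR from 113 to 71.75. This is the reason the report prefers the median and the IQR for these data.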
The wide dispersion among EU countries might result from a lack of solidarity and from political games among EU member states, which prevented them from making joint efforts to deal with the coronavirus pandemic (Szucs 2020). Middle East countries, in contrast, have maintained good communication and risk management between states (Pietromarchi 2020).

3. Measures of Shape

Figure 1: Box and Whisker Plots for the numbers of COVID-19 deaths by April 23rd, 2020, distributed by region

Figure 1 illustrates the distributions of the total number of COVID-19 deaths among countries in the European Union and the Middle East. Both distributions are highly right-skewed because a few countries have extremely high death tolls. This means that in both groups most countries have fewer deaths than their region's average, which is positive news for both regions.

Looking at the position of the mean relative to the boxes and whiskers, the EU's mean death toll lies in the fourth quartile. This shows that more than 75% of EU countries have total deaths below the region's average. The Middle East's situation is less severe: its mean falls outside the box-and-whisker plot, so only one country has total deaths above the region's average.

Moreover, the entire box-and-whisker plot of the Middle East lies below the EU's median (second quartile). This indicates that 13 out of 14 Middle East countries, all except Iran, have fewer COVID-19 deaths than over 50% of EU countries. It again shows a much more serious COVID-19 situation in the EU than in the Middle East.

III. Multiple Regression

All cases in the data set are used:
● Dependent variable (DV): Total number of deaths due to COVID-19 between January 22 and April 23, 2020
● Independent variables (IV):
- Average rainfall (in mm)
- Average temperature (in Celsius)
- Hospital beds (per 10,000 people)
- Population of the country (in 1000s)
- Medical doctors (per 10,000 people)

Since the significance level is α = 0.05:
- When p-value > 0.05, the variable is insignificant.
- When p-value < 0.05, the variable is significant.

According to Upton and Cook (2014), backward elimination, the opposite of forward selection, is a search procedure in which the initial model contains all variables and ineffective variables are removed one at a time. At each step, the variable with the highest p-value is removed from the model. The procedure continues until all remaining variables have p-values below a given threshold, which is 0.05 in this case. For the given case, we start the backward elimination process with the full model, which consists of all predictors: average rainfall, average temperature, hospital beds, country population, and medical doctors.
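In practice, each step means re-running the regression (here, in Excel) on the reduced model. To isolate the decision rule, the sketch below does not refit anything. It replays the p-values reported for the EU models in this section. `fit_pvalues` is a hypothetical stand-in for whatever tool refits the model and returns each predictor's p-value.

```python
from typing import Dict, FrozenSet

# p-values reported in the EU regression outputs of this section,
# keyed by the set of predictors included in each model.
EU_RUNS: Dict[FrozenSet[str], Dict[str, float]] = {
    frozenset({"Population", "Hospital beds", "Average rainfall",
               "Medical doctors", "Average temperature"}): {
        "Population": 0.0000029, "Hospital beds": 0.01206,
        "Average rainfall": 0.643, "Medical doctors": 0.627,
        "Average temperature": 0.474},
    frozenset({"Population", "Hospital beds", "Medical doctors",
               "Average temperature"}): {
        "Population": 0.0000014, "Hospital beds": 0.01005,
        "Medical doctors": 0.591, "Average temperature": 0.422},
    frozenset({"Population", "Hospital beds", "Average temperature"}): {
        "Population": 0.000001, "Hospital beds": 0.01001,
        "Average temperature": 0.508},
    frozenset({"Population", "Hospital beds"}): {
        "Population": 0.000001, "Hospital beds": 0.0102},
}


def backward_elimination(predictors, fit_pvalues, alpha=0.05):
    """Drop the predictor with the largest p-value until all are below alpha."""
    remaining = set(predictors)
    removed = []
    while remaining:
        pvalues = fit_pvalues(frozenset(remaining))
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break  # every remaining predictor is significant
        remaining.remove(worst)
        removed.append(worst)
    return sorted(remaining), removed


all_predictors = ["Average rainfall", "Average temperature", "Hospital beds",
                  "Population", "Medical doctors"]
kept, dropped = backward_elimination(all_predictors, EU_RUNS.__getitem__)
print("Dropped in order:", dropped)
print("Final model:", kept)
```

Passing the refitting step in as a function keeps the stopping rule separate from the software that estimates the regressions.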


A. EU Countries

1. Regression Output 1 - All variables

● Significant variables: Population (p-value: 0.0000029) and Hospital beds (p-value: 0.01206)
● Insignificant variables: Average rainfall (highest p-value: 0.643), Medical doctors (p-value: 0.627), Average temperature (p-value: 0.474)

2. Regression Output 2 - Exclude Average rainfall


● Significant variables: Population (p-value: 0.0000014) and Hospital beds (p-value: 0.01005)
● Insignificant variables: Medical doctors (highest p-value: 0.591), Average temperature (p-value: 0.422)

3. Regression Output 3 - Exclude Medical doctors

● Significant variables: Population (p-value: 0.000001) and Hospital beds (p-value: 0.01001)
● Insignificant variables: Average temperature (p-value: 0.508)

4. Regression FINAL Output - Exclude Average temperature


● Significant variables: Population (p-value: 0.000001) and Hospital beds (p-value: 0.0102)
● Insignificant variables: None

5. Regression Equation

Total deaths = 6925.744 + 0.000263137 × (Population) - 151.453 × (Hospital beds)

6. Interpretation

- b_population = 0.000263137 is the coefficient of population for EU countries. Holding hospital beds constant, the total number of COVID-19 deaths in an EU country increases by 1 when its population increases by approximately 3,800 (= 1/0.000263137) people.

- b_hospital beds = -151.453 is the coefficient of the number of hospital beds (per 10,000 people). Holding population constant, each extra hospital bed per 10,000 people is associated with approximately 151 fewer COVID-19 deaths.

- R² = 0.665 is the coefficient of determination: 66.5% of the variation in total COVID-19 deaths among EU countries is explained by the countries' population and number of hospital beds.
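The two coefficient interpretations can be checked numerically. This sketch evaluates the final EU equation. Like the interpretation above, it assumes that population enters the equation as a head count of people. The country with 10 million people and 50 beds per 10,000 people is purely hypothetical.

```python
# Coefficients of the final EU regression equation reported above.
B0 = 6925.744               # intercept
B_POPULATION = 0.000263137  # per person (assumed head count)
B_BEDS = -151.453           # per extra hospital bed per 10,000 people


def predicted_eu_deaths(population, beds_per_10k):
    """Point prediction from the fitted EU equation."""
    return B0 + B_POPULATION * population + B_BEDS * beds_per_10k


# Population increase associated with one extra predicted death (beds fixed).
people_per_death = 1 / B_POPULATION

# Effect of one extra bed per 10,000 people (population fixed).
bed_effect = predicted_eu_deaths(10_000_000, 51) - predicted_eu_deaths(10_000_000, 50)

print(f"People per extra death: {people_per_death:,.0f}")
print(f"Change per extra bed:   {bed_effect:.3f}")
print(f"Hypothetical country:   {predicted_eu_deaths(10_000_000, 50):,.1f}")
```

Because the model is linear, the bed effect is the same at any population level. That is what "assuming that other variables are constant" means here.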

B. Middle East Countries

1. Regression Output 1 - All variables

● Significant variables: Population (p-value: 0.0093)
● Insignificant variables: Average temperature (p-value: 0.951), Average rainfall (highest p-value: 0.959), Hospital beds (p-value: 0.859), and Medical doctors (p-value: 0.635)


2. Regression Output 2 - Exclude Rainfall

● Significant variables: Population (p-value: 0.0037)
● Insignificant variables: Average temperature (highest p-value: 0.971), Hospital beds (p-value: 0.855), and Medical doctors (p-value: 0.568)

3. Regression Output 3 - Exclude Temperature

● Significant variables: Population (p-value: 0.001)
● Insignificant variables: Hospital beds (highest p-value: 0.804) and Medical doctors (p-value: 0.540)


4. Regression Output 4 - Exclude Hospital Beds

● Significant variables: Population (p-value: 0.00032)
● Insignificant variables: Medical doctors (highest p-value: 0.316)

5. Regression FINAL Output - Exclude Medical Doctors

● Significant variables: Population (p-value: 0.00027)
● Insignificant variables: None

6. Regression Equation

Total deaths = -613.6408143 + 0.00005597 × (Population)
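A similar check can be run for the Middle East equation, again assuming population is a head count of people. Besides the 1/b calculation, the sketch finds the population at which the fitted line crosses zero. Because the intercept is negative, the equation predicts a negative number of deaths for countries below that size. Like any regression line, it is only meaningful within the range of the observed data.

```python
# Final Middle East equation reported above; population assumed to be
# a head count of people, as in the interpretation.
B0_ME = -613.6408143
B_POP_ME = 0.00005597


def predicted_me_deaths(population):
    """Point prediction from the fitted Middle East equation."""
    return B0_ME + B_POP_ME * population


# Extra people associated with one extra predicted death.
people_per_death_me = 1 / B_POP_ME

# Population at which the fitted line crosses zero; below it the
# equation predicts a negative number of deaths.
zero_crossing_population = -B0_ME / B_POP_ME

print(f"People per extra death: {people_per_death_me:,.0f}")
print(f"Zero-crossing population: {zero_crossing_population:,.0f}")
```

The zero crossing falls at just under 11 million people.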


7. Interpretation

- b_population = 0.00005597 indicates that the total number of deaths due to COVID-19 increases by 1 when a Middle East country's population increases by approximately 17,867 (= 1/0.00005597) people.

- R² = 71.5% (= 0.715). This means that 71.5% of the variation in the total number of deaths due to COVID-19 is explained by the population of the Middle East countries. The relationship between total deaths and population in the Middle East is therefore relatively strong.

IV. Team Regression Conclusion

From the previous section, at the 5% level of significance, total COVID-19 deaths in the EU are significantly related to two independent variables: population and hospital beds (per 10,000 people). In the Middle East, they are related only to total population. Moreover, the Middle East's regression model fits better, with a higher coefficient of determination (R² = 0.744) than the EU's (R² = 0.671). This indicates that a larger share of the variation in total deaths is explained by the independent variables in the Middle East (74%) than in the EU (67%).

Based on the regression analysis, the EU region appears to be more affected by this pandemic. Both regions are partly influenced by total population. However, the EU's population slope (0.000263) is higher than the Middle East's (0.000056), indicating that total deaths in EU countries respond more strongly to differences in population. This is understandable: the larger a country's population, the more widely the virus can spread, resulting in more infected people and, in turn, more deaths.

In addition, the EU's total COVID-19 deaths are related to another variable, hospital beds (per 10,000 people). Middle East countries have had fewer COVID-19 infections and deaths, so they may have been able to care for their COVID-19 patients with their existing hospital beds. This might not be the case for EU members, which have been burdened by a lack of hospital beds due to the large number of infected and death cases (Furlong & Hirsch 2020). As a result, the number of hospital beds may matter to the EU but not much to the Middle East. This second variable also makes the EU more sensitive to the impacts of COVID-19, because its death toll is affected by more factors than that of the Middle East.
Therefore, to reduce the number of COVID-19 deaths, especially in these regions, it is advisable for them to focus on limiting the virus's spread in the community and on increasing the number of hospital beds.


V. Time Series

1. Significant Models

Based on the significance of the three trend models for each region, as shown below, all three models (linear, quadratic and exponential trends) are significant for both regions, since every model has a p-value < 0.05.

a. EU Countries

● Linear Model

Formula: $y = -445.068 + 46.804 \times (\text{number of days since 1st death})$

● Quadratic Model


Formula: $y = -975.628 + 87.617 \times (\text{number of days since 1st death}) - 0.530 \times (\text{number of days since 1st death})^2$

● Exponential Model

Formula: $y = 0.002 \times 1.291^{(\text{number of days since 1st death})}$
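The report does not say how the exponential trends were estimated. The sketch assumes a common textbook approach: regress log10(y) on the day index and back-transform the intercept and slope to get b0 and b1 in y = b0 × b1^t. It recovers known coefficients from a hypothetical series generated exactly as y = 2 × 1.1^t.

```python
import math


def fit_exponential_trend(t, y):
    """Fit y = b0 * b1**t by least squares on log10(y); needs y > 0."""
    if any(value <= 0 for value in y):
        raise ValueError("log-linear fit needs strictly positive observations")
    logs = [math.log10(value) for value in y]
    n = len(t)
    t_bar = sum(t) / n
    log_bar = sum(logs) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (li - log_bar) for ti, li in zip(t, logs))
    slope = sxy / sxx                    # estimate of log10(b1)
    intercept = log_bar - slope * t_bar  # estimate of log10(b0)
    return 10 ** intercept, 10 ** slope


# Hypothetical series generated exactly from y = 2 * 1.1**t (days 1..20).
days = list(range(1, 21))
observed = [2 * 1.1 ** d for d in days]
b0, b1 = fit_exponential_trend(days, observed)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, daily growth = {(b1 - 1) * 100:.1f}%")
```

Under this form, b1 is a daily growth factor. The EU estimate of 1.291 corresponds to a fitted trend growing about 29.1% per day, and the Middle East's 1.07 to about 7% per day. Days with zero recorded deaths cannot be logged, so they would have to be excluded or adjusted before fitting.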

b. Middle East Countries

● Linear Model

Formula: $y = 24.745 + 1.897 \times (\text{number of days since 1st death})$

● Quadratic Model

Formula: $y = -54.540 + 8.414 \times (\text{number of days since 1st death}) - 0.091 \times (\text{number of days since 1st death})^2$

● Exponential Model

Formula: $y = 2.82 \times 1.07^{(\text{number of days since 1st death})}$

2. Recommendation on the Best Model

To determine the best forecasting model, we calculated two error measures: SSE and MAD.

The sum of squared errors (SSE) measures how widely the observations are scattered around the fitted trend line. In other words, it captures the errors made when the trend line is used to predict the observations. The more widely the observations scatter around the line, the higher the SSE, and vice versa. However, because the errors are squared, SSE is strongly affected by outliers (observations that differ greatly from the rest).

The mean absolute deviation (MAD) measures the average absolute difference between the observations and the values fitted by the trend model. Because the errors are not squared, MAD is less sensitive than SSE to extreme observations.
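The two error measures can be sketched in a few lines with NumPy. The daily series below is hypothetical (not the assignment's data). `numpy.polyfit` fits the linear and quadratic trends by least squares, and the sketch compares their SSE and MAD.

```python
import numpy as np


def sse(observed, fitted):
    """Sum of squared errors between observations and trend values."""
    errors = np.asarray(observed, float) - np.asarray(fitted, float)
    return float(np.sum(errors ** 2))


def mad(observed, fitted):
    """Mean absolute deviation of the errors (forecasting sense)."""
    errors = np.asarray(observed, float) - np.asarray(fitted, float)
    return float(np.mean(np.abs(errors)))


# Hypothetical daily deaths for days 1..12 (NOT the assignment's data).
days = np.arange(1, 13)
deaths = np.array([1, 3, 6, 12, 20, 26, 30, 28, 25, 19, 14, 10], dtype=float)

results = {}
for name, degree in (("linear", 1), ("quadratic", 2)):
    coeffs = np.polyfit(days, deaths, degree)  # least-squares polynomial trend
    fitted = np.polyval(coeffs, days)
    results[name] = (sse(deaths, fitted), mad(deaths, fitted))

for name, (s, m) in results.items():
    print(f"{name:>9}: SSE = {s:8.2f}, MAD = {m:6.2f}")
```

The linear trend is a special case of the quadratic one (b2 = 0), so least squares guarantees that the quadratic SSE is never larger. MAD carries no such guarantee, which is why the two criteria can disagree, as they do for the EU in Table 3.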

Table 3: SSE and MAD Values of each Time Model for each Region

The trend model with the smallest errors is considered the most suitable trend model. For the EU, the MAD of the quadratic model is slightly higher than that of the linear model, but the quadratic model's SSE is considerably smaller than the linear model's. Because the differences between adjacent daily death counts in the EU are not very large, outliers distort the SSE only moderately, so the SSE comparison can be relied on. For the Middle East, both the SSE and the MAD of the quadratic model are the smallest among the three models. Therefore, the quadratic model is chosen as the best trend model for both the EU and the Middle East to predict the number of deaths due to COVID-19 (NCD).

a. EU Countries

The daily number of COVID-19 deaths on a particular day in the EU can then be calculated as:

$\widehat{NCD} = -975.628 + 87.617 \times (\text{number of days since 1st death}) - 0.530 \times (\text{number of days since 1st death})^2$

● Predict the number of deaths due to COVID-19 in the EU on May 29, May 30 and May 31, respectively

- On May 29th, day number 105 since the first confirmed death in the EU:
$\widehat{NCD} = -975.628 + 87.617 \times 105 - 0.530 \times 105^2 = 2{,}380.907 \approx 2{,}381$ (people)

- On May 30th, day number 106 since the first confirmed death in the EU:
$\widehat{NCD} = -975.628 + 87.617 \times 106 - 0.530 \times 106^2 = 2{,}356.694 \approx 2{,}357$ (people)

- On May 31st, day number 107 since the first confirmed death in the EU:
$\widehat{NCD} = -975.628 + 87.617 \times 107 - 0.530 \times 107^2 = 2{,}331.421 \approx 2{,}331$ (people)

Based on the formula above, the numbers of deaths due to COVID-19 on May 29th, 30th and 31st are predicted to be approximately 2,381, 2,357 and 2,331 people, respectively.
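The forecasts come from evaluating the quadratic trend at the stated day numbers. The sketch below does this for both regions, using the coefficients listed in the Significant Models section. The Middle East forecasts are discussed next in the text. The sketch also adds two properties of a downward-opening parabola: the day on which the trend peaks, and the later day after which it turns negative. Flooring negative values at zero mirrors the report's reading of negative forecasts as "no new deaths". The helper names are illustrative.

```python
import math

# Quadratic trend y = b0 + b1*t + b2*t**2, where t = number of days since
# the 1st death; coefficients as listed in the Significant Models section.
MODELS = {
    "EU": (-975.628, 87.617, -0.530),
    "Middle East": (-54.540, 8.414, -0.091),
}
# Day numbers of May 29, 30 and 31 in each region, as stated in the text.
FORECAST_DAYS = {"EU": (105, 106, 107), "Middle East": (100, 101, 102)}


def quadratic_trend(coeffs, t):
    """Value of the fitted quadratic trend on day t."""
    b0, b1, b2 = coeffs
    return b0 + b1 * t + b2 * t ** 2


def peak_day(coeffs):
    """Day at which a downward-opening (b2 < 0) trend reaches its maximum."""
    _, b1, b2 = coeffs
    return -b1 / (2 * b2)


def last_zero_day(coeffs):
    """Later root of the trend (b2 < 0): after this day it predicts < 0."""
    b0, b1, b2 = coeffs
    disc = b1 ** 2 - 4 * b2 * b0
    return (-b1 - math.sqrt(disc)) / (2 * b2)


for region, coeffs in MODELS.items():
    raw = [quadratic_trend(coeffs, t) for t in FORECAST_DAYS[region]]
    # Daily deaths cannot be negative, so floor the forecasts at zero.
    floored = [max(0.0, value) for value in raw]
    print(f"{region}: raw {[round(v, 3) for v in raw]}, floored {floored}")
    print(f"  peak ~ day {peak_day(coeffs):.1f}, "
          f"turns negative after ~ day {last_zero_day(coeffs):.1f}")
```

The EU trend peaks at about day 83 (87.617 / 1.06 ≈ 82.7), so by day 105 it is already declining. The Middle East trend crosses zero at about day 85, which is why all three of its May forecasts are negative.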


The EU forecasts therefore suggest that daily COVID-19 deaths in the EU would follow a downward trend.

b. Middle East Countries

The daily number of COVID-19 deaths on a particular day in the Middle East can then be calculated as:

$\widehat{NCD} = -54.540 + 8.414 \times (\text{number of days since 1st death}) - 0.091 \times (\text{number of days since 1st death})^2$

● Predict the number of deaths due to COVID-19 in the Middle East on May 29, May 30 and May 31

- On May 29th, day number 100 since the first confirmed death in the Middle East:
$\widehat{NCD} = -54.540 + 8.414 \times 100 - 0.091 \times 100^2 = -123.14 \approx -123$ (people)

- On May 30th, day number 101 since the first confirmed death in the Middle East:
$\widehat{NCD} = -54.540 + 8.414 \times 101 - 0.091 \times 101^2 = -133.017 \approx -133$ (people)

- On May 31st, day number 102 since the first confirmed death in the Middle East:
$\widehat{NCD} = -54.540 + 8.414 \times 102 - 0.091 \times 102^2 = -143.076 \approx -143$ (people)

After applying the quadratic formula, the predicted new confirmed deaths in the Middle East on the last three days of May are -123, -133 and -143, respectively. However, these figures are negative, which does not make sense because daily confirmed deaths cannot be below 0. Hence, the results may suggest that the Middle East would have no new COVID-19 deaths on May 29, May 30 and May 31.

VI. Time Series Conclusion

Figure 2: Daily Recorded Number of COVID-19 Deaths in 2 Regions since the 1st recorded death in each Region

Figure 2 provides an overview of the daily recorded number of COVID-19 deaths in the EU and the Middle East. These line charts make the size of the errors easy to see. The Middle East's data set contains two extreme values, but the remaining values differ little from their adjacent ones. The data therefore form a fairly clear trend with minor volatility, so it should be relatively easy to find a suitable trend model for the number of deaths in this region. In contrast, the EU's series becomes erratic from around day 49 onwards, with frequent large errors. This makes it more challenging to determine a trend model for the EU data set than for the Middle East's. Nevertheless, by running regression analysis, both the EU and the Middle East are believed to have each region's number of COVID-19 deaths following a...

