Title | ECON1193-Assignment-3A HN-G01 Group-5-1 |
---|---|
Author | Dẹo Lê |
Course | Business Statistics |
Institution | Royal Melbourne Institute of Technology University Vietnam |
Pages | 23 |
RMIT University Vietnam [ECON1193] Business Statistics
CASE STUDY
FACTORS AFFECTING NUMBER OF DEATHS DUE TO COVID-19
Lecturer: Pham Thi Minh Thuy
Students and Work Allocation:
Student Name | Student ID | Parts Contributed | Contribution % | Signature |
---|---|---|---|---|
Ngo Phuong Thao | s3758242 | Part 1, 2, 3, 4, 5 | 25% | Thao |
Luu Thi Bao Tram | s3818235 | Part 1, 2, 3, 5, 7 | 25% | Tram |
Tran Ha Le | s3818001 | Part 1, 2, 4, 6, 7 | 25% | Le |
Truong Thuy Linh | s3818033 | Part 1, 3, 5, 6, 7 | 25% | Linh |
TABLE OF CONTENTS
I. Data Collection
II. Descriptive Statistics
1. Measures of Central Tendency
2. Measures of Variation
3. Measures of Shape - Box and Whisker Plot
III. Multiple Regression
A. EU Countries
B. Middle East Countries
IV. Team Regression Conclusion
V. Time Series
1. Significant Models
2. Recommendation on the Best Model
VI. Time Series Conclusion
VII. Overall Team Conclusion
APPENDIX
REFERENCES
I. Data Collection
The data are collected and shown in the Excel file.
II. Descriptive Statistics
1. Measures of Central Tendency
Table 1: Measures of Central Tendency for the numbers of COVID-19 deaths by April 23rd, 2020, distributed by region
As can be seen from Table 1, the mean number of COVID-19 deaths in the EU, 3347.519, is higher than the figure for the Middle East (ME), 421.634. Hence, on average there are more deaths in EU countries than in ME countries. Regarding the median, which is the middle value of the data set when the data are put in ascending order, half of the EU countries have more than 225 deaths, whereas half of the ME countries have more than only 13. In terms of the mode, there is no mode in the EU, while in the ME the mode is 7 deaths, meaning that 7 is the most frequent number of deaths among ME countries.
Outliers are present in both regions (see Figure 1). Therefore, the mean, which is affected by outliers, is not a reliable basis for comparing them. The mode is not suitable either, because most values in each group differ from one another. Thus, the median is the best measure. The EU's median is about 17 times that of the ME, indicating that COVID-19 has had a more severe impact on the population of EU countries than on that of ME countries.
2. Measures of Variation
Table 2: Measures of Variation for the numbers of COVID-19 deaths by April 23rd, 2020, distributed by region
Measures of variation indicate how data are spread or, specifically in this case, how the numbers of COVID-19 deaths among countries in each region vary from one another. While the range of the EU, that is, the gap between the minimum and maximum values of the data set, is around 25 thousand deaths, that of the Middle East is only slightly more than one fifth of the EU's. That means the numbers of deaths in EU countries are spread very widely, even more so than the already wide range of the Middle East. Apart from that, sample variance, standard deviation and coefficient of variation all grow as the data spread more widely around the mean, and standard deviation is the most commonly used of the three. Again, both the EU's and the Middle East's standard deviations are quite large, and the EU's is substantially higher than the Middle East's, which tells the same story as the ranges.
However, as the data on the number of COVID-19 deaths contain outliers (see Figure 1), the above measures might not describe the variation well, because they are all affected by outliers. The interquartile range, which is not affected by outliers, should therefore be the primary measure. The interquartile range is the range of the middle 50% of values in an ordered data set. Although the interquartile range of the EU's data is still high at nearly 2000, that of the Middle East is small, at 76 deaths. In other words, COVID-19 death counts are likely to differ considerably from one EU country to another, in contrast with the moderately comparable values of the Middle East's countries.
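The effect of an outlier on these measures can be illustrated with a short Python sketch. The two lists below are hypothetical death counts invented for the illustration, not the report's data, and `spread_summary` is a hypothetical helper name; quartiles come from the standard library's `statistics.quantiles`, whose default method may give slightly different quartiles from Excel's.

```python
import statistics

def spread_summary(values):
    """Return a few measures of centre and spread for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
    }

# Hypothetical data (NOT from the report): identical except for one extreme value.
without_outlier = [2, 4, 6, 8, 10, 12, 14]
with_outlier = [2, 4, 6, 8, 10, 12, 1400]

a = spread_summary(without_outlier)
b = spread_summary(with_outlier)

for name in ("mean", "median", "range", "stdev", "iqr"):
    print(f"{name:>6}: {a[name]:>10.2f} -> {b[name]:>10.2f}")
```

In this toy example the mean, range and standard deviation change sharply once the extreme value is introduced, while the median and interquartile range stay the same, which is the reasoning behind preferring the median and the interquartile range here.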
This difference in spread between the two regions might result from a lack of solidarity and from political games among EU countries, which prevent them from making joint efforts to deal with the coronavirus pandemic (Szucs 2020), while the Middle East countries maintain good communication and risk management between states (Pietromarchi 2020).
3. Measures of Shape
Figure 1: Box and Whisker Plots for the numbers of COVID-19 deaths by April 23rd, 2020, distributed by region
Figure 1 illustrates the distributions of the total number of deaths due to COVID-19 in countries of the European Union and the Middle East. Both distributions are highly right-skewed because a few countries have extremely high death totals. This indicates that, in both groups, most countries have fewer deaths than the regional average, which is positive news for both regions. Considering the position of the mean relative to the boxes and whiskers, the EU's mean death toll lies above the third quartile, meaning that more than 75% of EU countries have total deaths below the region's average. Meanwhile, the Middle East has a less severe situation: the mean falls outside its box plot, so only one country has total deaths higher than the region's average. Moreover, the whole box-and-whisker plot of the Middle East lies below the second quartile (median) of the EU, which indicates that 13 out of 14 Middle East countries, all except Iran, have lower COVID-19 death tolls than over 50% of the EU countries. This also shows a much more serious COVID-19 situation in the EU than in the Middle East.
III. Multiple Regression
All dataset cases are applied:
● Dependent variable (DV): Total number of deaths due to COVID-19 between January 22 and April 23, 2020
● Independent variables (IV):
- Average rainfall (in mm)
- Average temperature (in Celsius)
- Hospital beds (per 10,000 people)
- Population of the country (in 1000s)
- Medical doctors (per 10,000)
Since the significance level is α = 0.05:
- When p-value > 0.05, the variable is insignificant.
- When p-value < 0.05, the variable is significant.
According to Upton and Cook (2014), backward elimination, the opposite of forward selection, is a search procedure in which the initial model contains all variables and ineffective variables are removed one by one. At each step, the variable with the highest p-value is removed from the model and the regression is re-run; the procedure continues until all remaining variables have p-values below a given threshold, which is 0.05 in this case. We therefore start the backward elimination process with the full model, which consists of all five predictors: average rainfall, average temperature, hospital beds, country population, and medical doctors.
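The elimination loop itself can be sketched in a few lines of Python. This is only an illustration of the procedure, not the tool used for the report: the regression re-fit is replaced by a lookup (`reported_fit`, a hypothetical stand-in) of the p-values reported for the EU outputs in section A, and the short variable labels are made up for the example.

```python
ALPHA = 0.05  # significance level used in the report

# Illustrative labels for the five predictors (not names from the Excel file).
FULL = ("population", "hospital_beds", "rainfall", "doctors", "temperature")

# p-values as reported for the EU regression outputs, keyed by model contents.
REPORTED_P_VALUES = {
    frozenset(FULL): {
        "population": 0.0000029, "hospital_beds": 0.01206,
        "rainfall": 0.643, "doctors": 0.627, "temperature": 0.474,
    },
    frozenset(("population", "hospital_beds", "doctors", "temperature")): {
        "population": 0.0000014, "hospital_beds": 0.01005,
        "doctors": 0.591, "temperature": 0.422,
    },
    frozenset(("population", "hospital_beds", "temperature")): {
        "population": 0.000001, "hospital_beds": 0.01001, "temperature": 0.508,
    },
    frozenset(("population", "hospital_beds")): {
        "population": 0.000001, "hospital_beds": 0.0102,
    },
}

def reported_fit(predictors):
    """Stand-in for re-running the regression: look up the reported p-values."""
    return REPORTED_P_VALUES[frozenset(predictors)]

def backward_elimination(predictors, fit, alpha=ALPHA):
    """Drop the least significant predictor until all p-values are below alpha."""
    remaining = list(predictors)
    removed = []
    while remaining:
        p_values = fit(remaining)
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining predictor is significant
        remaining.remove(worst)
        removed.append(worst)
    return remaining, removed

kept, removed = backward_elimination(FULL, reported_fit)
print("kept:", kept)
print("removed, in order:", removed)
```

With a real data set, `reported_fit` would be replaced by a function that fits the regression on the listed predictors and returns their p-values.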
A. EU Countries
1. Regression output 1 - All variables
● Significant variables: Population (p-value: 0.0000029) and Hospital beds (p-value: 0.01206)
● Insignificant variables: Average rainfall (highest p-value: 0.643), Medical doctors (p-value: 0.627), Average temperature (p-value: 0.474).
2. Regression output 2 - Exclude Average rainfall
● Significant variables: Population (p-value: 0.0000014) and Hospital beds (p-value: 0.01005)
● Insignificant variables: Medical doctors (highest p-value: 0.591), Average temperature (p-value: 0.422).
3. Regression output 3 - Exclude Medical doctors
● Significant variables: Population (p-value: 0.000001) and Hospital beds (p-value: 0.01001)
● Insignificant variables: Average temperature (p-value: 0.508).
4. Regression FINAL Output - Exclude Average temperature
● Significant variables: Population (p-value: 0.000001) and Hospital beds (p-value: 0.0102)
● Insignificant variables: None
5. Regression Equation
Total deaths = 6925.744 + 0.000263137 × (Population) - 151.453 × (Hospital beds)
6. Interpretation
- $b_{\text{population}} = 0.000263137$ is the coefficient of the population of EU countries. Holding hospital beds constant, the total number of COVID-19 deaths in an EU country increases by 1 when its population increases by approximately 3,800 (= 1/0.000263137) people.
- $b_{\text{hospital beds}} = -151.453$ is the coefficient of the number of hospital beds (per 10,000 people). For 1 extra hospital bed per 10,000 people, the number of COVID-19 deaths decreases by approximately 151, assuming that population is held constant.
- $R^2 = 0.665$ is the coefficient of determination: 66.5% of the variation in total COVID-19 deaths in EU countries is explained by the countries' population and number of hospital beds.
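The arithmetic behind these interpretations can be checked with a small Python sketch. The coefficients are those of the EU regression equation above; the function name and the example country (10,000,000 people, 50 beds per 10,000) are hypothetical, and population is assumed to be entered as a head count, as the interpretation of the coefficient implies.

```python
# Coefficients of the final EU regression equation reported above.
INTERCEPT = 6925.744
B_POPULATION = 0.000263137
B_HOSPITAL_BEDS = -151.453

def predict_eu_deaths(population, beds_per_10k):
    """Predicted total COVID-19 deaths for one EU country."""
    return INTERCEPT + B_POPULATION * population + B_HOSPITAL_BEDS * beds_per_10k

# Extra population associated with one additional death, beds held constant.
people_per_extra_death = 1 / B_POPULATION

# Hypothetical country, not taken from the data set.
baseline = predict_eu_deaths(10_000_000, 50)
one_more_bed = predict_eu_deaths(10_000_000, 51)

print(round(people_per_extra_death))      # about 3,800 people
print(round(baseline, 3))                 # prediction for the example country
print(round(one_more_bed - baseline, 3))  # effect of one extra bed per 10,000
```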
B. Middle East Countries
1. Regression Output 1 - All variables
● Significant variables: Population (p-value: 0.0093)
● Insignificant variables: Average temperature (p-value: 0.951), Average rainfall (highest p-value: 0.959), Hospital beds (p-value: 0.859), and Medical doctors (p-value: 0.635).
2. Regression Output 2 - Exclude Rainfall
● Significant variables: Population (p-value: 0.0037)
● Insignificant variables: Average temperature (highest p-value: 0.971), Hospital beds (p-value: 0.855), and Medical doctors (p-value: 0.568).
3. Regression Output 3 - Exclude Temperature
● Significant variables: Population (p-value: 0.001)
● Insignificant variables: Hospital beds (highest p-value: 0.804), and Medical doctors (p-value: 0.540).
4. Regression Output 4 - Exclude Hospital Beds
● Significant variables: Population (p-value: 0.00032)
● Insignificant variables: Medical doctors (highest p-value: 0.316).
5. Regression FINAL Output - Exclude Medical Doctors
● Significant variables: Population (p-value: 0.00027)
● Insignificant variables: None.
6. Regression Equation
Total deaths = -613.6408143 + 0.00005597 × (Population)
7. Interpretation
- $b_{\text{population}} = 0.00005597$ indicates that the total number of deaths due to COVID-19 increases by 1 when the population of a Middle East country increases by approximately 17,867 (= 1/0.00005597) people.
- $R^2 = 0.715$ (71.5%). This means that 71.5% of the variation in the total number of deaths due to COVID-19 is explained by the population of the countries in the Middle East. The relationship between the total number of deaths and the population of Middle East countries is therefore relatively strong.
IV. Team Regression Conclusion
From the previous section, at the 5% significance level, total COVID-19 deaths in the EU region are significantly related to two independent variables, population and hospital beds (per 10,000 people), while those in the Middle East region are related only to total population. Moreover, the regression model of the Middle East gives a better fit, with a higher coefficient of determination ($R^2 = 0.744$) than that of the EU ($R^2 = 0.671$), which indicates that more of the variation in total deaths is explained by the independent variables in the Middle East (74%) than in the EU (67%).
Based on the regression analysis, the EU region appears to be more affected by this pandemic. Both regions are partly influenced by total population; however, the EU's population slope (about 0.00026) is higher than that of the Middle East (about 0.00006), indicating that EU countries will experience a larger change in total deaths when total population changes. This is understandable, since the bigger a country's population is, the more widely the virus can spread, resulting in more infected people and, in turn, more deaths.
On the other hand, the EU's total deaths from COVID-19 are also related to another variable, hospital beds (per 10,000 people). The Middle East countries have fewer COVID-19 infections and deaths, so they might manage to care for their COVID-19 patients with the hospital beds they already have. Meanwhile, the situation might be different for EU members, which have been burdened by a lack of hospital beds due to the large number of infections and deaths (Furlong & Hirsch 2020). As a result, the number of hospital beds might matter to the EU while mattering little to the Middle East. This second variable also makes the EU more sensitive to COVID-19's impacts, because its death toll is likely to be affected by more factors than that of the Middle East.
Therefore, in order to reduce the number of COVID-19 deaths, especially in these regions, it is advisable that they focus on limiting the virus's ability to spread in the community, as well as increasing the number of hospital beds.
V. Time Series
1. Significant Models
Judging by the significance of the three trend models for each region, shown below, we conclude that the Linear, Quadratic and Exponential trend models are all significant in both regions, since every model has p-value < 0.05. In the formulas below, $y$ is the daily number of COVID-19 deaths and $t$ is the number of days since the 1st death.
a. EU Countries
● Linear Model
Formula: $y = -445.068 + 46.804\,t$
● Quadratic Model
Formula: $y = -975.628 + 81.617\,t - 0.530\,t^2$
● Exponential Model
Formula: $y = 0.002 \times 1.291^{t}$
b. Middle East Countries
● Linear Model
Formula: $y = 24.745 + 1.897\,t$
● Quadratic Model
Formula: $y = -54.540 + 8.414\,t - 0.091\,t^2$
● Exponential Model
Formula: $y = 2.82 \times 1.07^{t}$, where $t$ is the number of days since the 1st death
2. Recommendation on the Best Model
To determine the best forecasting model, we measured the forecast errors of each model using SSE and MAD. The sum of squared errors (SSE) measures how far the observations are scattered from the fitted trend line or, in other words, the size of the errors we make when predicting the observations with that line. When the observations are scattered widely around the line, the SSE is high, and vice versa. However, because the errors are squared, SSE is strongly affected by outliers (observations that lie far from the fitted line). The mean absolute deviation (MAD) is the average absolute difference between each observation and its fitted value. Unlike SSE, it is less sensitive to extreme observations.
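Both error measures are simple to compute. The Python sketch below uses hypothetical observed and fitted values (not the report's series) chosen so that the two measures disagree: one model has the lower MAD while the other has the lower SSE, because a single large error dominates once it is squared. The names `sse`, `mad`, `model_a` and `model_b` are illustrative.

```python
def sse(actual, fitted):
    """Sum of squared errors between observed and fitted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted))

def mad(actual, fitted):
    """Mean absolute deviation between observed and fitted values."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)

# Hypothetical series, invented for the illustration.
actual = [10, 12, 15, 40]
model_a = [10, 12, 15, 28]  # perfect except for one large miss
model_b = [14, 8, 19, 36]   # small misses on every observation

for name, fitted in (("model_a", model_a), ("model_b", model_b)):
    print(name, "SSE =", sse(actual, fitted), "MAD =", mad(actual, fitted))
```

Here `model_a` has the lower MAD but the higher SSE, so the choice between models depends on how much weight large individual errors should carry.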
Table 3: SSE and MAD Values of each Time Model for each Region
The trend model with the smallest errors is taken to be the best trend model. Regarding the EU, although the MAD of the quadratic model is slightly higher than that of the linear model, the quadratic model's SSE is significantly smaller than the linear one's, and because adjacent daily death counts in the EU do not differ too much, outliers distort the errors only moderately. Apart from that, both the SSE and the MAD of the quadratic model in the Middle East are the smallest among the 3 models. Therefore, the quadratic model is chosen as the best trend model for both the EU and the Middle East to predict the number of deaths due to COVID-19 (NCD).
a. EU Countries
The daily number of COVID-19 deaths on a particular day in the EU can then be calculated as:
$\widehat{NCD} = -975.628 + 81.617\,t - 0.530\,t^2$, where $t$ is the number of days since the 1st death.
● Predict the number of deaths due to COVID-19 in the EU on May 29, May 30 and May 31
- On May 29th, day number 105 since the first confirmed death in the EU: $\widehat{NCD} = -975.628 + 81.617 \times 105 - 0.530 \times 105^2 = 1{,}750.907 \approx 1{,}751$ (people)
- On May 30th, day number 106 since the first confirmed death in the EU: $\widehat{NCD} = -975.628 + 81.617 \times 106 - 0.530 \times 106^2 = 1{,}720.694 \approx 1{,}721$ (people)
- On May 31st, day number 107 since the first confirmed death in the EU: $\widehat{NCD} = -975.628 + 81.617 \times 107 - 0.530 \times 107^2 = 1{,}689.421 \approx 1{,}689$ (people)
Based on the formula above, the numbers of deaths due to COVID-19 on May 29th, 30th and 31st are predicted to be approximately 1,751, 1,721 and 1,689 people respectively.
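These three forecasts can be reproduced with a short Python sketch that uses the coefficients from the EU prediction formula. The function name is hypothetical, and the peak-day calculation (the vertex of the parabola) is an addition for illustration: it shows where the fitted curve stops rising.

```python
# Coefficients of the EU quadratic trend used in the predictions above.
A, B, C = -975.628, 81.617, -0.530

def forecast_eu(day):
    """Predicted daily deaths on a given day since the first EU death."""
    return A + B * day + C * day ** 2

forecasts = {day: forecast_eu(day) for day in (105, 106, 107)}
for day, value in forecasts.items():
    print(f"day {day}: {value:.3f} -> about {round(value)} people")

# Vertex of the parabola: the day on which the fitted curve peaks.
peak_day = -B / (2 * C)
print(f"fitted curve peaks around day {peak_day:.1f}")
```

Since days 105 to 107 lie well after the fitted peak, each successive forecast is lower than the one before.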
Hence, these forecasts suggest that the daily number of deaths due to COVID-19 in the EU will follow a downward trend in the future.
b. Middle East Countries
The daily number of COVID-19 deaths on a particular day in the Middle East can then be calculated as:
$\widehat{NCD} = -54.540 + 8.414\,t - 0.091\,t^2$, where $t$ is the number of days since the 1st death.
● Predict the number of deaths due to COVID-19 in the Middle East on May 29, May 30 and May 31
- On May 29th, day number 100 since the first confirmed death in the Middle East: $\widehat{NCD} = -54.540 + 8.414 \times 100 - 0.091 \times 100^2 = -123.14 \approx -123$ (people)
- On May 30th, day number 101 since the first confirmed death in the Middle East: $\widehat{NCD} = -54.540 + 8.414 \times 101 - 0.091 \times 101^2 = -133.017 \approx -133$ (people)
- On May 31st, day number 102 since the first confirmed death in the Middle East: $\widehat{NCD} = -54.540 + 8.414 \times 102 - 0.091 \times 102^2 = -143.076 \approx -143$ (people)
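The Middle East calculations can be checked the same way. In the sketch below, the raw forecast applies the quadratic formula as given, and `forecast_me_non_negative` is a hypothetical helper that floors the result at zero, one simple way to keep a forecast of daily deaths within its valid range. The flooring is an assumption for illustration, not a step taken in the report.

```python
# Coefficients of the Middle East quadratic trend used above.
A, B, C = -54.540, 8.414, -0.091

def forecast_me(day):
    """Raw quadratic forecast of daily deaths on a given day since the first death."""
    return A + B * day + C * day ** 2

def forecast_me_non_negative(day):
    """Same forecast, floored at zero because a death count cannot be negative."""
    return max(0.0, forecast_me(day))

for day in (100, 101, 102):
    raw = forecast_me(day)
    print(f"day {day}: raw {raw:.3f}, floored {forecast_me_non_negative(day):.1f}")
```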
After applying the quadratic model, the predicted numbers of new confirmed deaths in the Middle East on the last three days of May are -123, -133 and -143 respectively. However, these figures are negative, which does not make sense because the daily number of confirmed deaths must be greater than or equal to 0. Hence, the results may suggest that the Middle East will not have any new COVID-19 deaths on May 29, May 30, and May 31.
VI. Time Series Conclusion
Figure 2: Daily Recorded Number of COVID-19 Deaths in 2 Regions since the 1st recorded death in each Region
Figure 2 provides an overview of the daily recorded number of COVID-19 deaths in the EU and the Middle East. These line charts make the level of error easy to see. Although there are two extreme values in the Middle East data set, the remaining values do not differ much from their adjacent ones, so the data form a fairly clear trend with minor volatility. Thus, it might be easy to find a suitable trend model for the number of deaths in this region. In contrast, the EU series behaves erratically from around day 49, with frequent large errors, making it more challenging to determine a trend model for this data set than for that of the Middle East. Nevertheless, by running regression analysis, both the EU and the Middle East are believed to have each region’s number of COVID-19 deaths following a...