ECON1993 Final-asm 2020A PDF

Title ECON1993 Final-asm 2020A
Course Business Statistics
Institution Royal Melbourne Institute of Technology University Vietnam
Pages 27
File Size 1.8 MB
File Type PDF

Summary

ASSESSMENT TASK 3A TEAM REPORT TABLE OF CONTENT I. DATA COLLECTION II. DESCRIPTIVE STATISTICS Check for outlier Measure of central tendency Measure of variation Box and Whisker plots Conclusion III. MULTIPLE REGRESSION Building Regression model (applying Backward Elimination) FINAL Regression Model of ...


Description

RMIT University 2020

ASSESSMENT TASK 3A TEAM REPORT


TABLE OF CONTENT

I. DATA COLLECTION
II. DESCRIPTIVE STATISTICS
   1. Check for outlier
   2. Measure of central tendency
   3. Measure of variation
   4. Box and Whisker plots
   5. Conclusion
III. MULTIPLE REGRESSION
   1. Building Regression model (applying Backward Elimination)
   2. FINAL Regression Model of each region
IV. TEAM REGRESSION CONCLUSION
V. TIME SERIES
   1. Regression output and formula of the significant trend model
   2. Recommend trend model
   3. Prediction on number of deaths on May 29, May 30, May 31
VI. TIME SERIES CONCLUSION
VII. OVERALL TEAM CONCLUSION
   1. The main factors that impact the number of deaths due to COVID-19
   2. Covid 19 deaths prediction
   3. Two other variables
   4. Recommendation
VIII. REFERENCES
IX. APPENDIX

Covid 19 first broke out in China at the end of 2019 and is now a severe pandemic with global effects on many fields, including work and daily life, the economy, food security and health care (The European Quality Assurance Register for Higher Education 2020; United Nations 2020; Food and Agriculture Organization of the United Nations 2020). Spreading rapidly, the disease now appears on every continent except Antarctica and has caused over 311,000 deaths (Worldometer 2020). In the West, the disease appeared later than in the East, but the situation there is more complex and has not shown any sign of easing (French Institute of International Relations 2020). This paper examines two specific Western regions, Europe and the European Union (EU), by analyzing the number of Covid 19 deaths in each region and its relationship with related factors, examining the trend model of Covid 19 deaths in both regions, and giving some predictions.

I. DATA COLLECTION

This section provides the data needed for the analysis in parts II and III. According to Romos (2018), Europe contains 54 nations and territories while the EU consists of 27 countries. However, because information is lacking for some countries, our datasets consist of all 27 EU countries but only 41 European countries. The datasets contain six variables: total number of Covid 19 deaths in each country for the period 22 January to 24 April, population, medical doctors per 10,000 people, hospital beds per 10,000 people, average temperature of the first four months and average rainfall of the first four months. For details, please see Appendix 1 and Appendix 2.

II. DESCRIPTIVE STATISTICS

Region           Mean      Mode   Median   Range   IQR    Variance      SD        CV
Europe           2837.05   14     169      25082   952    44603267.90   6678.57   235%
European Union   3347.41   14     225      25082   1889   52330253.48   7233.97   216%

Table 1. Europe and European Union dataset descriptive statistics

1. Check for outlier

For the Europe dataset, no observation is smaller than the lower bound (Q1 - 1.5 × IQR) but seven observations are bigger than the upper bound (Q3 + 1.5 × IQR). Thus, there are seven outliers in this dataset. For the European Union dataset, no observation is smaller than the lower bound (Q1 - 1.5 × IQR) but five observations are bigger than the upper bound (Q3 + 1.5 × IQR). Thus, there are six outliers in this dataset.

2. Measure of central tendency

Since both datasets have outliers, the Median, a measure of central tendency that is not affected by outliers, is the most suitable measurement in this case. The Medians of the Europe and EU datasets are 169 and 225, respectively. For Europe, this implies that 50% of countries in the region have fewer than 169 Covid 19 deaths (period 22 January to 23 April) and 50% have more than 169 deaths. Taking the Median as the average, a typical country in the Europe dataset has 169 Covid 19 deaths. Similar conclusions can be made for the European Union: 50% of countries in this region have fewer than 225 Covid 19 deaths (period 22 January to 23 April) and 50% have more than 225 deaths, so a typical European Union country has 225 Covid 19 deaths.
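The outlier check and the choice of the Median described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the death counts are made-up values, not the report's data, and the quartile rule used (`method="inclusive"`) is an assumption, so other software may give slightly different quartiles.

```python
import statistics

def tukey_fences(values):
    """Return (lower, upper) bounds: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """Return the observations that fall outside the two bounds."""
    lower, upper = tukey_fences(values)
    return [v for v in values if v < lower or v > upper]

# Hypothetical death counts per country (NOT the report's dataset).
deaths = [3, 14, 14, 40, 90, 169, 250, 600, 1000, 5000, 25000]

lower, upper = tukey_fences(deaths)
print("bounds:", lower, upper)
print("outliers:", find_outliers(deaths))
# The mean is pulled up by the extreme values, while the median is not.
print("mean:", statistics.mean(deaths), "median:", statistics.median(deaths))
```

With these made-up values the two largest observations lie above the upper bound, and the mean is far above the median, which is the pattern that motivates using the Median and IQR in this section.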

The Europe dataset's Median is lower than the European Union dataset's, so in general the number of Covid 19 deaths in European Union countries is higher than in European countries.

3. Measure of variation

In this context, the Interquartile Range (IQR) is the most suitable measurement, as it is not affected by outliers and measures how much the middle 50% of observations spread out around the Median, the chosen average in this case. The interquartile ranges of the Europe and European Union datasets are 952 and 1889, respectively. The smaller the IQR, the more consistent the middle 50% of observations are. Thus, the European Union dataset is less consistent in terms of Covid 19 deaths per country. In other words, the difference in the number of Covid 19 fatalities per nation is bigger in the European Union than in Europe.

4. Box and Whisker plots

According to the box and whisker plots, both datasets are right-skewed because their right part is longer than their left part. This implies that for each dataset, the majority of the data are located on the low side of the graph, with a long tail towards high values. In other words, most countries (in both the European Union and Europe) have a relatively low number of Covid 19 deaths, while a few countries have very high numbers. The graph also indicates that there are outliers in both datasets, meaning both contain extreme values that distort sensitive measurements such as the Mean or the Range. Thus, those sensitive measurements are not recommended for assessing the two datasets. The box plot shows that Europe's Median (quartile 2) is smaller than the European Union's Median, meaning that in general the number of Covid 19 deaths in EU countries is higher than in European countries.

Comparing the two boxes, the European Union's box is bigger than Europe's, meaning that the numbers of deaths of the middle 50% of European Union countries spread out more widely from the Median than those of the middle 50% of European countries. Since the middle 50% of observations are generally considered the most concentrated part of a dataset, it is possible that the number of Covid 19 deaths varies more between European Union countries than between European countries.

Figure 1. Europe and EU dataset box and whisker

5. Conclusion

Having analyzed the descriptive statistics of the European Union and Europe datasets, it can be concluded that in general, from January 22 to April 23, European Union countries have more Covid 19 deaths than European countries, because the Europe Median is smaller than the EU Median. Since both datasets contain outliers, sensitive measurements such as the Mean or the Range are likely to be unreliable and should not be used in assessing these two datasets. Moreover, the IQR results reveal the variation in the number of Covid 19 deaths in the two regions. The European Union's IQR is larger than Europe's IQR, so the number of Covid 19 deaths among European Union countries is considered less consistent than in Europe, meaning the difference in the number of Covid 19 fatalities among European Union countries is bigger than the difference among European countries.

III. MULTIPLE REGRESSION

In this part, we use the data of two regions, Europe and the European Union (note that all countries within the European Union are also included in Europe). The data for the two regions cover 6 variables:
● Total number of deaths due to COVID 19
● Average temperature (in Celsius)
● Average rainfall (in mm)
● Medical doctors (per 10,000 people)
● Hospital beds (per 10,000 people)
● Population (in thousands)

Among the 6 variables above, the total number of deaths due to COVID 19 is the only dependent variable. The other 5 variables, namely average rainfall (in mm), average temperature (in Celsius), hospital beds (per 10,000 people), population of the country in 2018 (in thousands), and medical doctors (per 10,000 people), are all treated as independent variables. Based on these 5 independent variables, we build multiple regression models for the Europe region and the European Union region to predict the number of deaths due to COVID 19. For each region's dataset, we apply backward elimination to reach a final model containing only the variables that are significant at the 5% level of significance.

1. Building Regression model of Europe and European Union (applying Backward Elimination)

* Europe

Step 1: Regression output for Europe (1)

There are 4 independent variables that are insignificant at the 0.05 significance level. We first eliminate Average temperature (in Celsius), since this non-significant independent variable has the highest p-value.

Figure 2. Regression Output for Europe (1)


Step 2: Regression output for Europe (2)

After eliminating Average temperature (in Celsius) from the independent variables, we run the regression analysis again and obtain the summary output below:

From the regression output, there are still 3 non-significant independent variables, since their p-values are larger than 0.05. We remove Medical doctors (per 10,000 people), since this non-significant independent variable has the highest p-value.

Figure 3. Regression Output for Europe (2)

Step 3: Regression output for Europe (3)

After eliminating Medical doctors (per 10,000 people) from the independent variables, we run the regression analysis again and obtain the summary output below:

As the summary output shows, there is still 1 non-significant independent variable, Average rainfall (in mm), since its p-value is larger than 0.05. Thus, we remove it from the independent variables.

Figure 4. Regression Output for Europe (3)


Step 4: Regression output for Europe (4) (FINAL Model)

After removing the non-significant independent variables, namely Average rainfall (in mm), Average temperature (in Celsius), and Medical doctors (per 10,000 people), we reach the FINAL regression model, in which the two remaining independent variables, Hospital bed (per 10,000 people) and Population (in thousands), are significant at the 5% significance level (p-value < 0.05). This indicates that these two independent variables have a statistically significant relationship with the number of deaths due to COVID 19 at the 5% level of significance.

Figure 5. Regression Output for Europe (4) (Final Model)

* European Union

Step 1: Regression output for European Union (1)

There are 3 independent variables that are insignificant at the 0.05 significance level. We first eliminate Medical doctors (per 10,000 people), since this non-significant independent variable has the highest p-value.

Figure 6. Regression Output for European Union (1)

Step 2: Regression output for European Union (2)

After eliminating Medical doctors (per 10,000 people) from the independent variables, we run the regression analysis again and obtain the summary output below:

From the regression output, there are still 2 non-significant independent variables, since their p-values are larger than 0.05. We remove Average temperature (in Celsius), since this non-significant independent variable has the highest p-value.

Figure 7. Regression Output for European Union (2)

Step 3: Regression output for European Union (3)

After eliminating Average temperature (in Celsius) from the independent variables, we run the regression analysis again and obtain the summary output below:

As the summary output shows, there is still 1 non-significant independent variable, Average rainfall (in mm), since its p-value is larger than 0.05. Thus, we remove it from the independent variables.

Figure 8. Regression Output for European Union (3)

Step 4: Regression output for European Union (4) (FINAL Model)

After eliminating the 3 non-significant independent variables, namely Average temperature (in Celsius), Average rainfall (in mm) and Medical doctors (per 10,000 people), we reach the FINAL regression model, in which the two remaining independent variables, Hospital bed (per 10,000 people) and Population (in thousands), are significant at the 5% significance level (p-value < 0.05). This indicates that these two independent variables have a statistically significant relationship with the number of deaths due to COVID 19 at the 5% level of significance.

Figure 9. Regression Output for European Union (4) (Final Model)
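The backward elimination steps above follow a simple loop, which can be sketched in Python. This is an illustration only, not the tool the team used. The function name `backward_eliminate`, the short variable names, and every p-value in `HYPOTHETICAL_P_VALUES` are assumptions invented for the example; the real p-values come from the regression outputs in the figures and are not reproduced here.

```python
def backward_eliminate(predictors, get_p_values, alpha=0.05):
    """Repeatedly drop the predictor with the highest p-value
    until every remaining predictor has a p-value below alpha."""
    remaining = list(predictors)
    removed = []
    while remaining:
        p_values = get_p_values(remaining)  # dict: predictor -> p-value
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # all remaining predictors are significant
        remaining.remove(worst)
        removed.append(worst)
    return remaining, removed

# Made-up p-values for each refit (NOT taken from the report's figures).
HYPOTHETICAL_P_VALUES = {
    frozenset({"temperature", "rainfall", "doctors", "beds", "population"}):
        {"temperature": 0.90, "rainfall": 0.30, "doctors": 0.70,
         "beds": 0.08, "population": 0.001},
    frozenset({"rainfall", "doctors", "beds", "population"}):
        {"rainfall": 0.25, "doctors": 0.60, "beds": 0.07, "population": 0.001},
    frozenset({"rainfall", "beds", "population"}):
        {"rainfall": 0.20, "beds": 0.03, "population": 0.001},
    frozenset({"beds", "population"}):
        {"beds": 0.02, "population": 0.001},
}

def lookup_p_values(remaining):
    """Stand-in for refitting the regression with the remaining predictors."""
    return HYPOTHETICAL_P_VALUES[frozenset(remaining)]

final, dropped = backward_eliminate(
    ["temperature", "rainfall", "doctors", "beds", "population"],
    lookup_p_values,
)
print("kept:", final)
print("dropped in order:", dropped)
```

In a real analysis, `lookup_p_values` would be replaced by a function that refits the regression on the remaining predictors and returns their p-values.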

2. FINAL Regression Model of each region

* Europe's FINAL Regression model

a) Regression Output

Figure 10. Regression Output for Europe (Final Model)

b) Regression Equation

Predicted number of deaths due to COVID 19 = 5717.342 - 102.152 × Hospital bed (per 10,000 people) + 0.118 × Population (in thousands)

c) Interpret the regression coefficient of the significant independent variables:
● The coefficient of Hospital bed (per 10,000 people) (b1 = -102.152) denotes a negative relationship between the two variables: for every additional hospital bed per 10,000 people, the number of deaths due to COVID 19 is estimated to decrease by 102.152 deaths, holding the population constant.
● The coefficient of Population (in thousands) (b2 = 0.118) indicates a positive relationship between the two variables: because population is measured in thousands, for every increase of 1,000 people the predicted number of deaths due to COVID 19 increases by 0.118 deaths (about 118 deaths per additional million people), holding hospital beds constant.

d) Interpret the coefficient of determination

The Coefficient of Determination (R2 = 0.320) shows that 32% of the variation in the number of deaths due to COVID 19 can be explained by the variation in hospital beds (per 10,000 people) and population (in thousands). The remaining 68% of the variation in the number of deaths due to COVID 19 is explained by other factors.
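The regression equation and the coefficient interpretations above can be checked numerically with a short Python sketch. The coefficients are copied from the final Europe model. The function name `predict_deaths_europe` and the example country (50 beds per 10,000 people, 10 million inhabitants) are hypothetical and chosen only for illustration.

```python
# Coefficients of the final Europe model, as reported in the equation above.
INTERCEPT = 5717.342
B_BEDS = -102.152       # per hospital bed per 10,000 people
B_POPULATION = 0.118    # per 1,000 people (population is in thousands)

def predict_deaths_europe(beds_per_10k, population_thousands):
    """Predicted COVID 19 deaths from the final Europe regression equation."""
    return INTERCEPT + B_BEDS * beds_per_10k + B_POPULATION * population_thousands

# Hypothetical country: 50 beds per 10,000 people, 10,000 thousand inhabitants.
base = predict_deaths_europe(50, 10_000)

# Change in the prediction when one input changes and the other is held constant.
one_more_bed = predict_deaths_europe(51, 10_000) - base
one_more_thousand_people = predict_deaths_europe(50, 10_001) - base
one_more_million_people = predict_deaths_europe(50, 11_000) - base

print(round(base, 3))
print(round(one_more_bed, 3))
print(round(one_more_thousand_people, 3))
print(round(one_more_million_people, 3))
```

Because the model is linear, each difference equals the corresponding coefficient times the size of the change, regardless of the starting values. The same equation can return negative predictions for countries with many beds and small populations, which is one reason to read it together with the R2 of 0.320.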

* European Union's FINAL Regression model

a) Regression output

Figure 11. Regression Output for European Union (Final Model)

b) Regression Equation

Predicted number of deaths due to COVID 19 = 6508.490 - 146.198 × Hospital bed (per 10,000 people) + 0.265 × Population (in thousands)

c) Interpret the regression coefficient of the significant independent variables
● The coefficient of Hospital bed (per 10,000 people) (b1 = -146.198) denotes a negative relationship between the two variables: for every additional hospital bed per 10,000 people, the number of deaths due to COVID 19 is expected to decrease by 146.198 deaths, holding the population constant.
● The coefficient of Population (in thousands) (b2 = 0.265) indicates a positive relationship between the two variables: because population is measured in thousands, for every increase of 1,000 people the average number of deaths due to COVID 19 is estimated to increase by 0.265 deaths (about 265 deaths per additional million people), holding hospital beds constant.

d) Interpret the Coefficient of Determination

The Coefficient of Determination (R2 = 0.663) shows that 66.3% of the variation in the number of deaths due to COVID 19 can be explained by the variation in hospital beds (per 10,000 people) and population (in thousands). The remaining 33.7% of the variation in the number of deaths due to COVID 19 is explained by other factors.

IV. TEAM REGRESSION CONCLUSION

Region           Significant independent variable    Coefficient   Coefficient of determination (R2)
Europe           Hospital bed (per 10,000 people)    -102.152      0.320
                 Population (in thousands)           0.118
European Union   Hospital bed (per 10,000 people)    -146.198      0.663
                 Population (in thousands)           0.265

Table 2: Regression Conclusion for Europe and European Union datasets

According to the FINAL regression models of each region, both regions (European Union and Europe) have the same significant independent variables: Hospital bed (per 10,000 people) and Population (in thousands). We analyze the coefficients of these two variables to assess which region is more affected by the pandemic.

With regard to the coefficient of hospital beds (per 10,000 people), both regions show a negative relationship between the number of deaths due to COVID 19 and hospital beds (per 10,000 people). However, the absolute value of b1 in the European Union's model is higher than in Europe's model (146.198 > 102.152), which indicates a larger impact of hospital beds on the number of deaths due to COVID 19 in the European Union's model. Specifically, in the European Union, the number of deaths is estimated to decrease by 146.198 for every additional hospital bed per 10,000 people, holding the population constant, while in Europe the number of deaths is estimated to decrease by 102.152 for every additional hospital bed per 10,000 people, holding the population constant. One of the substantial challenges faced by many hospitals is the lack of intensive care unit (ICU) beds for COVID 19 patients, especially when handling a big wave of coronavirus cases. Adding hospital beds is important in saving patients' lives. Without enough ICU beds, patients may not be hospitalized and receive the required treatment from doctors, which may result in a rise in the number of deaths due to COVID 19 (Mangan & Schoen 2020). Therefore, an increase in ICU beds is likely to reduce the number of deaths caused by COVID 19.

In terms of the coefficient of population (in thousands), both regions show a positive relationship between the number of deaths due to COVID 19 and population (in thousands). However, the value of b2 in the European Union's model is higher than in Europe's model (0.265 > 0.118), which shows that population has a greater effect on the number of deaths due to COVID 19 in the European Union. In the European Union's model, the number of deaths resulting from COVID 19 is estimated to increase by 0.265 for every increase of 1,000 people in the population (about 265 deaths per additional million people), holding hospital beds constant. An increase in population density in an area may make it difficult to implement social-distancing measures. According to Dr. Steven Goodman, an epidemiologist at Stanford University, “density is really an enemy in a situation like this” since the virus tends to spread faster in larger population centers where people interact more with each other, which may increase coronavirus cases and deaths. As reported by public health experts, density is also the main reason for the increasing number of coronavirus cases in New York (Rosenthal 2020). Therefore, the higher the population density, the higher the number of deaths resulting from COVID 19 is likely to be.

Since the coefficients of hospital beds (per 10,000 people) and population (in thousands) are higher in the European Union's regression model, the hospital bed (per 10,000 people) and the population (in tho...

