ECON1193 Group Assignment PDF

Title	ECON1193 Group Assignment
Course	Business Statistics
Institution	Royal Melbourne Institute of Technology University Vietnam
Pages	22

RMIT International University Vietnam ECON1193B – Business Statistics 1

Subject code

Econ1193B

Subject Name

Business Statistics

Campus

RMIT Hanoi Campus

Student Names

Ngo Nguyen Phu Dang, Dinh Nguyen Vu, Duong Tri Dung

Student Numbers

S3877298, S3819269, S3877892

Teacher

Pham Tran Minh Trang

Word count

3954


Table of Contents

PART 1: DATA COLLECTION
PART 2: DESCRIPTIVE STATISTICS
  1. Check for outliers
  2. Measure of Central Tendency
  3. Measure of Variation
  4. Box and Whisker Plot
  5. Conclusion
PART 3: MULTIPLE REGRESSION
  1. Region A
  2. Region B
PART 4: MULTIPLE REGRESSION CONCLUSION
PART 5: TIME SERIES
  5.1. Regression Output
  5.2. Recommendation for the most suitable trend model for both regions
  5.3. Prediction
PART 6: TIME SERIES CONCLUSION
  6.1. Line chart
  6.2. Analysis
PART 7: OVERALL CONCLUSION
  7.1. Main factors that impact the number of COVID-19 deaths
  7.2. Predict the number of deaths in the world on Oct 31, 2021
  7.3. Analysis on whether global deaths will be reduced by the end of 2021
  7.4. Two variables that might impact the number of Covid deaths in the world

Since the end of 2019, when Covid-19 first appeared in China, the pandemic has had a serious impact on many areas, including people's lifestyles, the economy and healthcare, and has had many negative effects on global health goals (WHO 2021). Covid-19 spreads remarkably fast, and it has reached almost every corner of the world, including the two regions discussed below, especially Africa, which has recorded over 7,075,119 cases and given rise to a new variant of the virus (Saifaddin G 2021). This report takes a closer look at two specific regions, Middle East & Oceania and Africa, by investigating the relationship between the number of Covid-19 deaths in each region and related factors, analyzing trend models and making some predictions.

PART 1: DATA COLLECTION

This section describes the data used in the analysis in Parts 2 and 3. Region A, Africa, has 54 nations (Andrew W 2021), while Region B (the Middle East and Oceania) has only 31. However, because data could not be collected for some countries, the datasets consist of only 25 African countries and 10 Middle East and Oceania countries. Six variables are measured: population, average rainfall over the first four months, average temperature over the first four months, hospital beds per 10,000 people, medical doctors per 10,000 people, and the total number of Covid-19 deaths in each country over the period 22 January to 24 April.

PART 2: DESCRIPTIVE STATISTICS

Region                 Mean     Median  Range   IQR      Variance   SD         CV
Africa                 31.5     16.75   318.38  22.81    4147.17    64.40      204.47
Middle East & Oceania  138.492  74.73   491.9   214.725  18930.454  137.58799  0.9425

Table 1. Africa and Middle East & Oceania dataset descriptive statistics
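As a quick consistency check on Table 1, the standard deviation should be the square root of the variance, and the coefficient of variation (CV) is the standard deviation divided by the mean, usually quoted as a percentage. The minimal Python sketch below applies these two textbook definitions to the reported values; the dictionary layout and the function name are illustrative choices, not part of the report. With the reported mean and SD it gives a CV of about 204% for Africa and about 99% for the Middle East & Oceania (the table lists 0.9425 for that cell).

```python
import math

# Summary statistics as reported in Table 1 (Covid-19 deaths per million, per country)
REPORTED = {
    "Africa": {"mean": 31.5, "variance": 4147.17, "sd": 64.40},
    "Middle East & Oceania": {"mean": 138.492, "variance": 18930.454, "sd": 137.58799},
}


def coefficient_of_variation(sd: float, mean: float) -> float:
    """CV as a percentage: (standard deviation / mean) * 100."""
    return sd / mean * 100


for region, stats in REPORTED.items():
    sd_from_variance = math.sqrt(stats["variance"])  # SD is the square root of the variance
    cv = coefficient_of_variation(stats["sd"], stats["mean"])
    print(f"{region}: sqrt(variance) = {sd_from_variance:.2f}, CV = {cv:.1f}%")
```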

1. Check for outliers

There are no observations below the lower bound (Q1 - 1.5IQR) in the Africa dataset, but two observations lie above the upper bound (Q3 + 1.5IQR), so this dataset contains two outliers. In the Middle East & Oceania dataset there are no observations below the lower bound and none above the upper bound, so this dataset has no outliers.
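The outlier rule used above (Tukey's fences at Q1 - 1.5IQR and Q3 + 1.5IQR) can be sketched in Python as follows. This is an illustration under stated assumptions: the sample values are hypothetical rather than the report's data, and quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which interpolates in the same way as Excel's QUARTILE.INC. The report does not say which quartile method was used, and other methods can shift the fences slightly.

```python
import statistics


def iqr_outliers(data: list[float]) -> list[float]:
    """Return the observations outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # "inclusive" uses linear interpolation at position (n - 1) * p, like QUARTILE.INC
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]


# Hypothetical deaths-per-million values, NOT the report's dataset
sample = [2.1, 5.4, 8.0, 12.5, 16.8, 20.3, 25.0, 31.7, 180.2, 320.0]
print(iqr_outliers(sample))
```

For this sample the quartiles are 9.125 and 30.025, so the upper fence is about 61.4 and the two largest values are flagged.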

2. Measure of central tendency

Measure  Africa  Middle East & Oceania
Mean     31.5    138.492
Median   16.75   74.73
Mode     #N/A    #N/A

Table 2: Measure of Central Tendency in Africa and Middle East & Oceania

The mean is used as the measure of central tendency here, since region B has no outliers and region A has only two. The means of the Africa and the Middle East & Oceania datasets are 31.5 and 138.492, respectively (Table 2). In other words, the average Covid-19 death rate per country in the African sample is 31.5 per million, while in region B, the Middle East and Oceania, it is 138.492 per million. Because the African mean is substantially lower, Covid-19 deaths per million were considerably higher in the Middle East and Oceania than in Africa. However, the mean is sensitive to extreme values, so the two outliers in Africa could make it unreliable for region A; the African median (16.75) is only about half of the mean, which is consistent with the outliers pulling the mean upward.

3. Measure of variation

Measure                   Africa   Middle East & Oceania
Range                     318.38   491.9
IQR                       22.81    214.725
Standard Deviation        64.40    137.58799
Variance                  4147.17  18930.454
Coefficient of Variation  204.47   0.9425

Table 3: Measure of Variation in Africa and Middle East & Oceania

The interquartile range (IQR) is the most appropriate measure of spread in this situation, since it is not affected by outliers and measures how widely the middle 50% of observations are spread. The Africa and Middle East & Oceania datasets have interquartile ranges of 22.81 and 214.725, respectively (Table 3). The lower the IQR, the more consistent the middle 50% of the data. It can therefore be concluded that Covid-19 deaths per country are less consistent in the Middle East and Oceania; in other words, the disparity in the number of Covid-19 deaths between countries is greater there than in Africa.
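The reason for preferring the IQR over the range can be seen on a small hypothetical sample (not the report's data): replacing the largest value with an extreme one changes the range dramatically but leaves the IQR untouched. The helper name `spread` is illustrative, and quartiles again use the inclusive (QUARTILE.INC-style) method as an assumption.

```python
import statistics


def spread(data: list[float]) -> dict[str, float]:
    """Range and interquartile range of a sample."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"range": max(data) - min(data), "iqr": q3 - q1}


base = [4.0, 6.0, 9.0, 12.0, 15.0, 18.0, 22.0, 25.0]  # hypothetical values
with_outlier = base[:-1] + [300.0]  # swap the largest value for an extreme one

print(spread(base))          # range 21.0, IQR 10.75
print(spread(with_outlier))  # range 296.0, IQR still 10.75
```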

4. Box and Whisker plots


Figure 1: Africa and Middle East & Oceania box plot

According to the box and whisker plots, both datasets are right-skewed, since the right-hand part of each plot is longer than the left. This means that most countries in each dataset are concentrated at the lower end of the scale, while a few countries have much higher death rates than the rest. The plot also shows that one region, Africa, has outliers, that is, extreme values that can distort sensitive measures such as the mean or the range and so reduce their reliability. Africa's box is clearly smaller than that of the Middle East & Oceania, and its median is lower (16.75 compared with 74.73), which is consistent with Covid-19 deaths per million being lower in African countries. Because the Middle East & Oceania box is larger (Figure 1), the middle 50% of its countries are spread more widely around the median than the middle 50% of African countries. Since the middle half of a dataset is usually its most concentrated part, this suggests that the number of Covid-19 deaths varies more across the Middle East and Oceania than across African countries.

5. Conclusion

After examining the descriptive statistics of the two datasets, it can be inferred that from April 1st to July 31st the Middle East and Oceania countries had a higher average number of Covid-19 deaths per million than Africa, because Africa's mean is lower than the other region's mean. Only Africa's dataset contains outliers, so the mean was used to compare the two regions, although it is less reliable for Africa. For the same reason, the IQR is more suitable than the range for analyzing the spread of the data. The IQRs of Africa and of the Middle East and Oceania point to a difference in the number of Covid-19 deaths between the two regions. Because the IQR of African countries is much lower than that of the Middle East and Oceania, the number of Covid-19 deaths in Africa is considered more consistent, implying that the differences in Covid-19 deaths between countries in the Middle East and Oceania are greater than those between African countries.


PART 3: MULTIPLE REGRESSION

Region A (Africa)

Through backward elimination (Appendix), the final regression model for the African region includes only one independent variable, which is significant at the 5% level of significance.

a. The regression analysis output of the final model is presented below:

b. From this data output, we obtain the regression equation: Total Deaths from 1/4 to 31/7 (per million) = 7.450 x Hospital Beds (per 10,000) - 33.062

c. Interpretation of the significant independent variable's coefficient in the context of the research topic: According to the regression equation, each additional hospital bed per 10,000 people is associated with an increase of 7.45 total deaths per million people over the period from April 1st to July 31st, 2021, and each bed fewer with a decrease of 7.45. The p-value of the independent variable is 0.0002, so it is statistically significant at the 5%, 2% and 1% levels (95%, 98% and 99% confidence). This is very strong evidence that hospital beds per 10,000 people are associated with the total number of deaths. This is a surprising finding. Since the number of hospital beds per 10,000 people indicates a country's capacity to provide medical care, one would expect it to have a negative relationship with total deaths. One possible explanation is that healthcare systems across Africa responded to the rising Covid-19 death toll by increasing the number of hospital beds available. However, hospital bed data for African countries are not updated regularly, and many figures were recorded before the start of the pandemic in 2019. This explanation therefore cannot be confirmed, and further research or additional independent variables are needed to draw a solid conclusion. A promising candidate is a variable measuring the strength of a nation's preventive and reactive measures against Covid-19.

d. Interpretation of the coefficient of determination: The coefficient of determination (R-squared) of this regression model is 46.1%, meaning that 46.1% of the variation in total deaths is explained by the independent variable. In terms of goodness of fit, this linear regression does not fit the sample data very well, which makes its estimates less reliable. Our study may need more independent variables to obtain better estimation models.
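R-squared is defined as 1 - SSE/SST, the share of the total variation in the dependent variable that the fitted line explains. Since both final models have a single predictor, the sketch below fits a one-variable least-squares line and computes R-squared from scratch. The four data points are hypothetical and chosen only to keep the arithmetic easy to follow, so the output says nothing about the African data; the function names are illustrative.

```python
def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(x: list[float], y: list[float], b0: float, b1: float) -> float:
    """R^2 = 1 - SSE / SST: share of the variation in y explained by the line."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical numbers for illustration only (not the report's data)
beds = [1.0, 2.0, 3.0, 4.0]
deaths = [2.0, 4.0, 5.0, 4.0]
b0, b1 = fit_simple_ols(beds, deaths)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, R^2 = {r_squared(beds, deaths, b0, b1):.3f}")
```

Here SSE = 2.3 and SST = 4.75, so about 51.6% of the variation is explained, a fit of roughly the same quality as the two final models in this report.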

Region B (Middle East & Oceania)

Through backward elimination (Appendix B), the final regression model for the Middle East and Oceania region includes only one independent variable, which is significant at the 5% level of significance.

a. The regression analysis output of the final model is presented below:

b. From this output, we obtain the regression equation: Total Deaths from 1/4 to 31/7 (per million) = 246.671 - 7.856 x Population (millions)

c. Interpretation of the significant independent variable's coefficient in the context of the research topic: According to the regression equation, each additional million people in the population is associated with a decrease of 7.856 total deaths per million people over the period from April 1st to July 31st, 2021, and each million fewer with an increase of 7.856. The p-value of the independent variable is 0.023. We can therefore be 95% confident that there is a significant relationship between the independent and dependent variables, but not 98% or 99% confident. This is also a surprising result, because a larger population would be expected to have more infections and consequently more deaths. One possible explanation is that nations responded to the pandemic differently in terms of social distancing and travel, which significantly affects how quickly the virus can spread through a population. As with region A, additional data on variables related to pandemic control could provide a better regression model for predicting Covid-19 deaths.

d. Interpretation of the coefficient of determination: The R-squared of this model is 49.6%, meaning that 49.6% of the variation in total deaths is explained by population. As with region A, this linear regression does not fit the sample data very well, which makes its estimates less reliable, and more independent variables need to be identified to improve future models.
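For a regression with one predictor, the slope's t statistic can be recovered from R-squared through t^2 = R^2 (n - 2) / (1 - R^2), which gives a rough cross-check of the significance claims for both regions. The sketch below assumes the regressions used all 25 African and 10 Middle East & Oceania countries listed in Part 1 (if fewer countries entered a model, the degrees of freedom change) and compares |t| with two-tailed critical values from standard t tables. Under those assumptions it gives |t| of about 4.4 for region A, beyond the 1% critical value, and about 2.8 for region B, which clears the 5% critical value (2.306) but not the 2% value (2.896), in line with the reported p-values of 0.0002 and 0.023.

```python
import math


def slope_t_statistic(r_squared: float, n: int) -> float:
    """|t| for the slope of a one-predictor regression: t^2 = R^2 (n - 2) / (1 - R^2)."""
    return math.sqrt(r_squared * (n - 2) / (1 - r_squared))


# Two-tailed critical values of Student's t from standard tables,
# keyed by degrees of freedom, then by significance level alpha.
CRITICAL_T = {
    23: {0.05: 2.069, 0.02: 2.500, 0.01: 2.807},
    8: {0.05: 2.306, 0.02: 2.896, 0.01: 3.355},
}

# Sample sizes ASSUMED from Part 1: 25 African and 10 Middle East & Oceania countries
MODELS = [("Region A", 0.461, 25), ("Region B", 0.496, 10)]

for name, r2, n in MODELS:
    t = slope_t_statistic(r2, n)
    significant_at = [alpha for alpha, crit in CRITICAL_T[n - 2].items() if t > crit]
    print(f"{name}: |t| = {t:.2f}, significant at alpha = {significant_at}")
```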

PART 4: MULTIPLE REGRESSION CONCLUSION

Regression comparison: The regressions for region A and region B returned different significant independent variables. Hospital beds per 10,000 people is significant for region A, while population is the significant independent variable for region B. The corresponding p-values are 0.0002 and 0.023, so both models are statistically significant at the 5% level. The coefficients of determination are similar for the two regions, 46.1% and 49.6%, which indicates that both regression models could be improved by including new variables. In the final regressions, the coefficients of the independent variables are 7.450 for region A and -7.856 for region B. Per one-unit change, hospital beds per 10,000 people therefore have a slightly smaller absolute impact on total deaths than population in millions, although the difference is small. On the other hand, the results from Part 2 show that the average total deaths for region B is much higher than for region A (138.492 versus 31.5 per million). A one-unit change in each independent variable therefore corresponds to a larger relative change in the total deaths of region A than of region B. In conclusion, while the per-unit impacts of the two independent variables are roughly the same in absolute terms, hospital beds per 10,000 produce a larger percentage change in region A than population in millions does in region B.
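The relative comparison can be made concrete by expressing each coefficient as a percentage of its region's mean. The short sketch below does this arithmetic with the coefficients from Part 3 and the means from Part 2; the function name is illustrative. A one-unit change corresponds to roughly 23.7% of the African mean but only about 5.7% of the Middle East & Oceania mean.

```python
def percent_of_mean(coefficient: float, mean: float) -> float:
    """Size of a one-unit effect relative to the region's average, in percent."""
    return abs(coefficient) / mean * 100


# Coefficients from Part 3 and means from Part 2 (deaths per million)
region_a = percent_of_mean(7.450, 31.5)      # hospital beds per 10,000
region_b = percent_of_mean(-7.856, 138.492)  # population in millions
print(f"Region A: {region_a:.1f}% of the mean; Region B: {region_b:.1f}% of the mean")
```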

PART 5: TIME SERIES

We collected this dataset from 24 countries in region A (Africa) and 9 countries in region B (Middle East and Oceania). This part analyzes trend models of daily deaths due to Covid-19 in regions A and B from April 1st to July 31st, 2021, and presents the regression output and formula of each trend model. It then recommends the most suitable trend model for predicting the number of deaths due to Covid-19 in each region. Lastly, it predicts the number of deaths due to Covid-19 in both regions on September 28, September 29 and September 30, 2021.

5.1. Regression output

Region A

Linear trend model:

Figure 2: Linear trend model for region A

Formula: Y^ = 85.08 + 3.74T, where Y^ is the predicted number of deaths due to Covid-19 and T is the trend (time, in days). B0 = 85.084 is the estimated number of deaths at T = 0, that is, about 85 deaths on March 31. B1 = 3.74 indicates that daily deaths increased by about 3.74 each day from April 1 to July 31.

Quadratic trend model:

Figure 3: Quadratic trend model for region A


Formula: Y^ = 228.46 - 3.18T + 0.05T^2, where Y^ is the predicted number of deaths due to Covid-19 and T is the trend (time). B0 = 228.46 means that there were an estimated 228.46 deaths on day 0 (March 31). B2 = 0.05 is positive, which indicates that the trend curve is convex (U-shaped): the daily change in deaths increases over time, so deaths rise at an accelerating rate in the later part of the period from April 1 to July 31.

Exponential trend model:

Figure 4: Exponential trend model for region A

Formula:
Linear (log) format: log(Y^) = 2.15 + 0.004T
Non-linear format: Y^ = 142.202 * 1.01^T
where Y^ is the predicted number of deaths due to Covid-19 and T is the trend (time). B0 = 142.2 means that the estimated number of deaths on day 0 (March 31) was about 142.2. (B1 - 1) * 100% = 1% indicates that daily deaths due to Covid-19 increased by about 1% each day from April 1 to July 31.

Region B

Linear trend model:


Figure 5: Linear trend model for region B

Formula: Y^ = 10.87 - 0.11T, where Y^ is the predicted number of deaths due to Covid-19 and T is the trend (time). B0 = 10.87 means that there were an estimated 10.87 deaths on day 0 (March 31). B1 = -0.11 indicates that daily deaths decreased by about 0.11 each day from April 1 to July 31.

Quadratic trend model:

Figure 6: Quadratic trend model for region B

Formula: Y^ = 101.81 - 1.48T + 0.008T^2, where Y^ is the predicted number of deaths due to Covid-19 and T is the trend (time). B0 = 101.81 means that there were an estimated 101.81 deaths on day 0 (March 31). B2 = 0.008 is positive, which indicates that the trend curve is convex: the daily change in deaths increases over time during the period from April 1 to July 31.

Exponential trend model:

Figure 7: Exponential trend model for region B

Formula:
Linear (log) format: log(Y^) = 1.904 - 0.003T
Non-linear format: Y^ = 80.26 * 0.99^T
where Y^ is the predicted number of deaths due to Covid-19 and T is the trend (time). B0 = 80.26 means that there were an estimated 80.26 deaths on day 0 (March 31). (B1 - 1) * 100% = -1% indicates that daily deaths due to Covid-19 decreased by about 1% each day from April 1 to July 31.

5.2. Recommendation for the most suitable trend model for both regions

a. Region A

Each trend model is fitted by the least squares method, which chooses the coefficients that minimize the sum of squared errors. To choose between the three models, we compare their Mean Absolute Deviation (MAD) and Sum of Squared Errors (SSE) on new data: using the actual numbers of deaths on August 1 and August 2 (days 123 and 124), we predict the number of deaths on those two days with each model and compare the predictions with the actual values (error = actual - predicted).

Model        Error day 123     Error day 124     Sum of absolute errors  MAD              SSE
Linear       400 - 545 = -145  469 - 548 = -79   145 + 79 = 224          224 / 2 = 112    27266
Quadratic    400 - 593 = -193  469 - 603 = -134  193 + 134 = 327         327 / 2 = 163.5  55205
Exponential  400 - 483 = -83   469 - 588 = -119  83 + 119 = 202          202 / 2 = 101    21050

Table 4: MAD and SSE calculation of region A

The exponential trend model is the most suitable because it has the smallest MAD and SSE of the three.
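The MAD and SSE columns of Table 4 follow directly from the two forecast errors of each model. The minimal sketch below recomputes them from the errors listed in the table (actual minus forecast) and picks the model with the smallest values; the function and variable names are illustrative.

```python
def mad(errors: list[float]) -> float:
    """Mean Absolute Deviation: average of the absolute forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)


def sse(errors: list[float]) -> float:
    """Sum of Squared Errors."""
    return sum(e * e for e in errors)


# Forecast errors (actual - forecast) for days 123 and 124, as listed in Table 4
REGION_A_ERRORS = {
    "Linear": [-145, -79],
    "Quadratic": [-193, -134],
    "Exponential": [-83, -119],
}

scores = {model: (mad(errs), sse(errs)) for model, errs in REGION_A_ERRORS.items()}
for model, (m, s) in scores.items():
    print(f"{model}: MAD = {m}, SSE = {s}")

# min() compares the (MAD, SSE) tuples, so MAD decides first and SSE breaks ties
best = min(scores, key=lambda model: scores[model])
print("Most suitable:", best)
```

The same two functions apply unchanged to the region B errors in Table 5.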

b. Region B

Model        Error day 123   Error day 124   Sum of absolute errors  MAD            SSE
Linear       33 - (-2) = 35  34 - (-3) = 37  35 + 37 = 72            72 / 2 = 36    2594
Quadratic    33 - 40 = -7    34 - 41 = -7    7 + 7 = 14              14 / 2 = 7     98
Exponential  33 - 23 = 10    34 - 23 = 11    10 + 11 = 21            21 / 2 = 10.5  221

Table 5: MAD and SSE calculation of region B

The quadratic trend model is the best model for prediction because it has the smallest MAD and SSE of the three.

5.3. Prediction

In this part, we predict the number of deaths due to Covid-19 on September 28, September 29 and September 30, 2021, which are days 181, 182 and 183 respectively, using the exponential model for region A and the quadratic model for region B.
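The predictions can be reproduced by evaluating the recommended models, the exponential trend for region A and the quadratic trend for region B, at the day numbers above. The sketch below uses the rounded coefficients reported in Section 5.1 and assumes, as the day numbering implies, that T = 1 corresponds to April 1; the function names are illustrative. It gives about 861.1 and 869.8 for region A and about 96.0 and 97.4 for region B on September 28 and 29, close to the 861, 869, 96 and 97 in the prediction table.

```python
def region_a_exponential(t: int) -> float:
    """Region A exponential trend, non-linear form: Y^ = 142.202 * 1.01^T."""
    return 142.202 * 1.01 ** t


def region_b_quadratic(t: int) -> float:
    """Region B quadratic trend: Y^ = 101.81 - 1.48T + 0.008T^2."""
    return 101.81 - 1.48 * t + 0.008 * t ** 2


# Day numbering ASSUMED from the text: T = 1 is April 1, so T = 181 is September 28
for day, label in [(181, "September 28"), (182, "September 29"), (183, "September 30")]:
    a = region_a_exponential(day)
    b = region_b_quadratic(day)
    print(f"{label}, 2021 (T = {day}): region A = {a:.1f}, region B = {b:.1f}")
```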

Date                Region A  Region B
September 28, 2021  861       96
September 29, 2021  869       97