Title | S3743984 - Vu Quang Hung - Asm 2 Bstat |
---|---|
Author | Quang Hung Vu |
Course | Business Statistics |
Institution | Royal Melbourne Institute of Technology University Vietnam |
Subject’s Code: ECON1193
Subject’s Name: Business Statistics
Location & Campus: RMIT Vietnam, Hanoi
Student’s Name & Number: Vu Quang Hung - s3743984
Lecturer’s Name: Pham Thi Minh Thuy
This report contains 5 parts:
1. Summary descriptive statistics
2. Confidence intervals
3. Hypothesis testing
4. Regression analysis
5. Overall conclusion
Part 1: Summary descriptive statistics
The results of Assignment 1 indicate that the crude death rate (CDR) and national income are related. The evidence from that analysis is summarized below.
Firstly, Iceland has a lower death rate than Uganda. Iceland is a more developed country (GNI of 49,960 in 2015), whereas Uganda is a lower-income, still-developing country (GNI of 670 in 2015).
(CDR from 1980 to 2016: Iceland & Uganda)
In the larger sample of 35 countries, the contingency table suggests that high- and middle-income countries have a higher probability of having a low crude death rate.
(CDR Contingency Table)
Moreover, as Table 1 below shows, CDR and income are not independent, which means that a country's death rate may be associated with its national income and vice versa.
(Table 1)
Taken together, this evidence suggests that there is a relationship between the income and the death rate of a country.
Part 2: Confidence Intervals
Death rate, crude (per 1,000 people)
Firstly, I choose a 5% level of significance, so the confidence level is 95%: $\alpha = 0.05$, $\alpha/2 = 0.025$.
The population standard deviation ($\sigma$) is unknown, so I substitute the sample standard deviation $S$ for $\sigma$ and use the t-table to calculate the confidence interval.
Sample size $n = 35$, degrees of freedom d.f. $= n - 1 = 35 - 1 = 34$, so $t_{34,\,0.025} = 2.0322$.
(Distribution plot)
(Table 2)
Point estimate: $\bar{X} = 8.22$ (mean value in Table 2); $S = 2.26$.
Confidence interval estimate:
$\mu = \bar{X} \pm t_{34,\,0.025} \cdot \frac{S}{\sqrt{n}} = 8.22 \pm 0.7766$, i.e. $7.444 < \mu < 8.997$.
This means that I can be 95% confident that the true mean crude death rate falls between 7.444 and 8.997 (per 1,000 people).
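The interval for the crude death rate can be reproduced with a short Python sketch. It is an illustration, not the report's actual workflow: the critical value 2.0322 is hard-coded from the t-table, the inputs are the rounded summary statistics quoted in this part, and the function name `t_confidence_interval` is hypothetical. Because the inputs are rounded, the computed margin comes out marginally below the reported 0.7766.

```python
import math

def t_confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper, margin) for mean ± t_crit * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)   # estimated standard error of the mean
    margin = t_crit * standard_error    # half-width of the interval
    return mean - margin, mean + margin, margin

# Values quoted in the report; t_crit is t(34, 0.025) read from a t-table.
lower, upper, margin = t_confidence_interval(mean=8.22, s=2.26, n=35, t_crit=2.0322)
print(f"margin = {margin:.4f}, interval = ({lower:.3f}, {upper:.3f})")
```

The same function could be applied to GNI per capita and health expenditure if their sample means and standard deviations were supplied.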
Gross National Income (GNI) per capita
Applying the same method:
+) Level of significance: 5% ($\alpha = 0.05$)
+) Confidence interval estimate: $11408.137 < \mu < 27545.577$
+) I am 95% confident that the average Gross National Income per capita is between 11408.137 and 27545.577 (US dollars)
Domestic general government health expenditure per capita, PPP
+) Level of significance: 5% ($\alpha = 0.05$)
+) Confidence interval estimate: $854.93 < \mu < 2065.128$
+) I can be 95% confident that the average domestic general government health expenditure per capita is between 854.93 and 2065.128 (current international dollars)
Assumptions:
No assumption about the shape of the population distribution is required in this case: since the sample size ($n = 35$) is greater than 30, the Central Limit Theorem applies and the sampling distribution of the mean is approximately normal, so the t-table can be used.
Suppose the number of countries doubles:
o Firstly, the confidence interval width will decrease if the number of countries is doubled.
According to the confidence interval formula, the margin of error is inversely proportional to the square root of the sample size ($n$). Therefore, when the number of countries doubles from 35 to 70, the standard error $S/\sqrt{n}$ decreases and the confidence interval becomes narrower.
Overall, the confidence interval narrows around the true mean, which increases the precision of the estimate. Moreover, increasing the number of countries in the sample makes the result more representative of all 195 countries.
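To put a number on the narrowing, the sketch below compares the margin of error at n = 35 and n = 70. It rests on two simplifying assumptions that are not in the report: the sample standard deviation stays at 2.26, and the critical value is held at 2.0322. In practice the critical value would also shrink slightly with more degrees of freedom, so the real interval would be a little narrower still. The function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(s, n, t_crit):
    """Half-width of a t-based confidence interval for the mean."""
    return t_crit * s / math.sqrt(n)

s = 2.26          # sample standard deviation, assumed unchanged at n = 70
t_crit = 2.0322   # assumption: held fixed for both sample sizes

m35 = margin_of_error(s, 35, t_crit)
m70 = margin_of_error(s, 70, t_crit)
ratio = m70 / m35  # equals 1 / sqrt(2) under the assumptions above

print(f"n=35: ±{m35:.4f}, n=70: ±{m70:.4f}, ratio = {ratio:.4f}")
```

Under these assumptions, doubling the sample size shrinks the interval width by a factor of about 0.71 rather than halving it.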
Part 3: Hypothesis Testing
a. Prediction: As calculated in Part 2, I am 95% confident that the world average crude death rate is between 7.444 and 8.997. The sample mean $\bar{X}$ is 8.2 in 2015, whereas the population mean in 2013 was 7.7, so the world average crude death rate appears to have increased to some extent. However, for a sounder judgement, a hypothesis test is required to determine whether the crude death rate per thousand people has increased.
b. Hypothesis testing:
Level of significance: $\alpha = 0.05$
Sample size $n = 35 > 30$, so the Central Limit Theorem applies and the sampling distribution of the mean is approximately normal. Since the population standard deviation is unknown, I will use the t-value for the test.
Null hypothesis and alternative hypothesis:
$H_0: \mu \le 7.7$ (the average CDR is less than or equal to 7.7)
$H_1: \mu > 7.7$ (the average CDR is more than 7.7)
This is a one-tailed (upper-tailed) test, since the alternative hypothesis $H_1$ concerns the upper tail, above the mean of 7.7.
With $\alpha = 0.05$ and d.f. $= 34$, the upper-tailed critical value is $t_{34,\,0.05} = 1.6909$.
Compute the test statistic (sample mean $\bar{X} = 8.22$, $\mu = 7.7$, sample standard deviation $S = 2.26$, $n = 35$):
$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{8.22 - 7.7}{2.26/\sqrt{35}} = 1.36$
Since $t = 1.36 < 1.6909$, the test statistic falls into the non-rejection region, therefore we do not reject the null hypothesis $H_0$.
As $H_0$ is not rejected, at the 5% level of significance there is insufficient evidence to conclude that the average crude death rate has increased above 7.7.
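The test statistic and decision can be checked with a minimal Python sketch. The critical value 1.6909 is taken from the t-table as quoted in this part, and the function name `one_sample_t` is hypothetical.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for a one-sample test of the mean with unknown sigma."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

t_stat = one_sample_t(sample_mean=8.22, mu0=7.7, s=2.26, n=35)
t_crit = 1.6909              # t(34, 0.05) from the t-table, upper-tailed test
reject_h0 = t_stat > t_crit  # upper-tailed rule: reject only for large positive t

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```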
By not rejecting $H_0$, I might have committed a Type II error (failing to reject a false null hypothesis).
This means there is some probability that the average world crude death rate has in fact increased ($H_0$ false) while I still conclude that it has not (do not reject $H_0$).
The probability of a Type II error can be reduced by using a larger sample size ($n$). Increasing the sample size makes the hypothesis test more sensitive, which means that it is more likely to reject the null hypothesis when it is in fact false. Another option is to increase the level of significance $\alpha$, which enlarges the rejection region and so makes it less likely that a false null hypothesis goes unrejected, at the cost of a higher probability of a Type I error.
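The effect of sample size and significance level on the Type II error can be illustrated with a rough power calculation. This sketch is not part of the original analysis: it uses the normal approximation instead of the t distribution, treats the sample standard deviation 2.26 as if it were the population value, and assumes a hypothetical true mean of 8.22. The function name `approx_power` is also hypothetical. The Type II error probability is one minus the power.

```python
import math
from statistics import NormalDist

def approx_power(mu0, mu_true, sigma, n, alpha):
    """Approximate power of an upper-tailed test of the mean (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)                    # upper-tail critical value
    shift = (mu_true - mu0) * math.sqrt(n) / sigma   # standardized true effect
    return 1 - z.cdf(z_crit - shift)

# Hypothetical scenario: the true mean really is 8.22 and sigma is 2.26.
base = approx_power(7.7, 8.22, 2.26, n=35, alpha=0.05)
bigger_n = approx_power(7.7, 8.22, 2.26, n=70, alpha=0.05)
bigger_alpha = approx_power(7.7, 8.22, 2.26, n=35, alpha=0.10)

for label, power in [("n=35, alpha=0.05", base),
                     ("n=70, alpha=0.05", bigger_n),
                     ("n=35, alpha=0.10", bigger_alpha)]:
    print(f"{label}: power = {power:.3f}, Type II error = {1 - power:.3f}")
```

Under these assumptions, both a larger sample and a larger significance level raise the power, which matches the two remedies described in this part.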
Part 4: Regression analysis
a. Dependent variable and independent variables:
Dependent variable: Death rate, crude (per 1,000 people)
Independent variables:
- GNI per capita, Atlas method (current US$)
- Domestic general government health expenditure per capita, PPP (current international $)
- Immunization, measles (% of children ages 12-23 months)
- Smoking prevalence, total (age 15+)
b. Regression analysis:
1. CDR & GNI
As I concluded in the previous report, I expect the relationship between CDR and GNI to be negative and linear: higher-income countries tend to have a lower crude death rate.
Scatter plot of CDR and GNI:
(Figure: CDR & GNI; x-axis: GNI per capita (current US$), 0 to 100,000; y-axis: CDR (per 1,000 people), 0 to 14)
(Regression model)
Comment on scatter plot:
- According to the graph, there appears to be no linear relationship between these two variables: many countries in this plot have similar incomes but very different CDRs.
Regression output (from Excel):
- Based on the summary output, the simple linear regression equation is $Y = 7.958 + 0.00001X_1$.
- Regression coefficient (slope): $b_1 = 0.00001$ is the estimated average increase in the crude death rate (per 1,000 people) when gross national income per capita grows by 1 US dollar.
- Coefficient of determination: $R^2 = 2\%$. This means that 2% of the variance of the crude death rate is explained by country income.
- Test the significance of the independent variable:
o Using the t-value method (two-tailed test):
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \ne 0$ (a linear relationship exists)
$\alpha = 0.05$, so $\alpha/2 = 0.025$; d.f. $= n - 2 = 35 - 2 = 33$
Critical value (from t-table): $t_{33,\,0.025} = \pm 2.0345$
Test statistic: $t = 0.813$
The t statistic is in the non-rejection region, so do not reject $H_0$.
o Using the p-value method:
$p_{b_1} = 0.422$; $\alpha = 0.05$
$p_{b_1} > \alpha$, so do not reject $H_0$.
There is insufficient evidence to conclude that a linear relationship exists between crude death rate and gross national income.
2. CDR & Domestic health expenditure
Expected result: negative linear relationship. Countries that spend more money on domestic health programs should have fewer people dying from health problems and thus a lower CDR.
Scatter plot of CDR & Domestic general government health expenditure:
(Figure: CDR & Domestic health expenditure; x-axis: domestic health expenditure (current international $), 0 to 6,000; y-axis: CDR (per 1,000 people), 0 to 14)
Comment on the scatter plot: This scatter plot shows that CDR and domestic health expenditure do not have a linear relationship. Almost half of the 35 countries spend approximately the same budget on health (less than 500 international dollars per capita), yet these countries have very different death rates, which suggests that money spent on health programs has little linear association with CDR.
Regression output:
- Based on the regression statistics, the linear equation is $Y = 7.880 + 0.00023X_2$.
- Regression slope $b_1 = 0.00023$: for every additional dollar per capita the government spends on healthcare, the death rate is estimated to increase by 0.00023 (per 1,000 people) on average.
- Coefficient of determination: $R^2 = 0.033$, so 3.3% of the variation in CDR is explained by domestic healthcare expenditure.
- Test the significance of the independent variable:
o Using the t-value method:
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \ne 0$ (a linear relationship exists)
$\alpha = 0.05$, so $\alpha/2 = 0.025$
Critical value (from t-table): $t = \pm 2.0345$
Test statistic: $t = 1.0623$
The t statistic is in the non-rejection region, so do not reject $H_0$.
o Using the p-value method:
$p_{b_1} = 0.2958$; $\alpha = 0.05$
$p_{b_1} > \alpha$, so do not reject $H_0$.
There is insufficient evidence to conclude that a linear relationship exists between crude death rate and domestic general government health expenditure.
3. CDR & Immunization, measles (%)
Expected result: negative linear relationship. Children aged 12-23 months who are immunized are less likely to contract measles in the future, which may reduce the number of people dying from this disease. Therefore, the overall death rate may decrease.
Scatter plot of CDR and Immunization, measles (%):
(Figure: CDR & Immunization, measles; x-axis: Immunization, measles (% of children ages 12-23 months), 30 to 110; y-axis: CDR (per 1,000 people), 0 to 14)
Comment on the scatter plot: This scatter plot shows that CDR and measles immunization do have a linear relationship. The higher the immunization rate among children, the lower the death rate, hence the relationship is negative. However, since the actual values lie quite far from the trend line, this is a weak relationship.
Regression output:
- Based on the regression statistics, the linear equation is $Y = 14.273 - 0.06871X_3$.
- Regression slope $b_1 = -0.06871$: the death rate is estimated to decrease by 0.06871 (per 1,000 people) on average when the proportion of children aged 12-23 months immunized against measles increases by 1 percentage point.
- Coefficient of determination: $R^2 = 0.126$, so 12.6% of the variation in CDR is explained by the measles immunization percentage, while the remaining 87.4% is due to other factors.
- Test the significance of the independent variable:
o Using the t-value method:
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \ne 0$ (a linear relationship exists)
$\alpha = 0.05$, so $\alpha/2 = 0.025$
Critical value (from t-table): $t = \pm 2.0345$
Test statistic: $t = -2.182$
Since $-2.182 < -2.0345$, the t statistic is in the rejection region, so reject $H_0$.
o Using the p-value method:
$p_{b_1} = 0.036$; $\alpha = 0.05$
$p_{b_1} < \alpha$, so reject $H_0$.
There is sufficient evidence to conclude that a linear relationship exists between crude death rate and the proportion of children aged 12-23 months who have received measles immunization.
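Because this test statistic is negative, a two-tailed test compares its absolute value with the critical value. The sketch below applies both decision rules to the immunization figures quoted in this section and shows that they agree. The function names are hypothetical, and the numbers are taken as reported in the regression output.

```python
def reject_by_critical_value(t_stat, t_crit):
    """Two-tailed rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_crit

def reject_by_p_value(p_value, alpha=0.05):
    """p-value rule: reject H0 when the p-value is below alpha."""
    return p_value < alpha

# Immunization regression, values as reported above.
t_stat, p_value, t_crit = -2.182, 0.036, 2.0345

by_t = reject_by_critical_value(t_stat, t_crit)
by_p = reject_by_p_value(p_value)
print(f"reject by t-value: {by_t}, reject by p-value: {by_p}")
```

Here |−2.182| = 2.182 exceeds 2.0345, so the statistic lies in the rejection region and both rules lead to rejecting the null hypothesis.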
4. CDR & Smoking prevalence
Expected result: positive linear relationship. Smoking causes numerous deadly diseases. Therefore, the death rate is expected to rise if smoking prevalence among people aged 15 and over rises.
Scatter plot of CDR & Smoking prevalence:
(Figure: CDR & Smoking prevalence; x-axis: smoking prevalence (%), 0 to 50; y-axis: CDR (per 1,000 people), 0 to 14)
Comment on the scatter plot: This scatter plot shows that CDR and smoking prevalence (SP) have no clear linear relationship. The data points scatter randomly around the trend line.
Regression output:
- Based on the regression statistics, the linear equation is $Y = 7.214 + 0.0467X_4$.
- Regression slope $b_1 = 0.0467$: a 1 percentage point increase in smoking prevalence is associated with an estimated rise of 0.0467 in the crude death rate (per 1,000 people).
- Coefficient of determination: $R^2 = 0.040$, so 4% of the variation in CDR is explained by smoking prevalence.
- Test the significance of the independent variable:
o Using the t-value method:
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \ne 0$ (a linear relationship exists)
$\alpha = 0.05$
Critical value (from t-table): $t = \pm 2.0345$
Test statistic: $t = 1.177$
The t statistic is in the non-rejection region, so do not reject $H_0$.
o Using the p-value method:
$p_{b_1} = 0.248$; $\alpha = 0.05$
$p_{b_1} > \alpha$, so do not reject $H_0$.
There is insufficient evidence to conclude that a linear relationship exists between crude death rate and smoking prevalence.
c. Variable recommended for further research on crude death rate: After considering all 4 independent variables above, I recommend immunization, measles (% of children ages 12-23 months) for further research on crude death rate. The main reason is that, among the 4 given variables, it has the strongest correlation with CDR ($R^2 = 0.126$), although the relationship is weak. It is also the only variable for which the regression analysis supports a linear relationship with CDR.
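As a consistency check on the four simple regressions, the slope's t statistic can be recovered from the coefficient of determination using the identity $t^2 = R^2 (n - 2) / (1 - R^2)$, which holds for a regression with one independent variable. The sketch below applies it to the reported values. The dictionary layout and the function name `t_from_r_squared` are illustrative choices, and small gaps from the reported t statistics are expected because the $R^2$ values are rounded.

```python
import math

def t_from_r_squared(r_squared, n, slope_sign=1):
    """Slope t statistic implied by R^2 in a simple linear regression."""
    return slope_sign * math.sqrt(r_squared * (n - 2) / (1 - r_squared))

n = 35
t_crit = 2.0345  # t(33, 0.025) from the t-table

# (R^2, reported t, sign of the slope) as quoted in Part 4.
reported = {
    "GNI per capita": (0.02, 0.813, 1),
    "Health expenditure": (0.033, 1.0623, 1),
    "Immunization, measles": (0.126, -2.182, -1),
    "Smoking prevalence": (0.040, 1.177, 1),
}

results = {}
for name, (r2, t_reported, sign) in reported.items():
    t_implied = t_from_r_squared(r2, n, sign)
    significant = abs(t_implied) > t_crit
    results[name] = (t_implied, significant)
    print(f"{name}: implied t = {t_implied:.3f}, "
          f"reported t = {t_reported}, significant = {significant}")
```

Only measles immunization clears the critical value, in line with the recommendation above.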
Part 5: Overall conclusion
Based on the findings of this report and the results from the previous report, I have drawn some conclusions about the world average crude death rate. From the confidence interval analysis, I can be 95% confident that the world average CDR is between 7.444 and 8.997 (per 1,000 people). At the 5% level of significance, the hypothesis test found insufficient evidence that the average CDR increased from the 2013 mean of 7.7 per 1,000, even though the 2015 sample mean is 8.2. In Part 4 (Regression analysis), I found that only independent variable $X_3$ (immunization, measles) has a statistically significant linear relationship with the dependent variable CDR (at the 5% level of significance), and it accounts for only 12.6% of the total variation in CDR, while income, expenditure on healthcare and smoking prevalence appear to have no linear relationship with CDR. Therefore, beyond these four variables, there are likely other important factors that better explain CDR and are worth researching but have not been covered in this report. To sum up, the crude death rate gives us a better measure of how well we safeguard human well-being (Soares 2007). Our mission is to expand our studies to find other factors affecting the death rate, with the ultimate aim of reducing the world average CDR.
Reference list:
1. Soares, R. R. 2007, 'On the Determinants of Mortality Reductions in the Developing World', Population and Development Review, vol. 33, pp. 247-287, viewed 23 December 2018, Wiley Online Library database....