Nguyen Phuong Thao - s3891607 - asm 2 bstat 1 PDF

Title	Nguyen Phuong Thao - s3891607 - asm 2 bstat 1
Author	Thao Nguyen
Course	Business Statistics
Institution	Royal Melbourne Institute of Technology University Vietnam
Pages	13
File Size	378.8 KB
File Type	PDF

ASSIGNMENT COVER PAGE

Subject Code: ECON1193B
Subject Name: Business Statistics 1
Location of campus: Sai Gon South Campus
Title of Assignment: Individual case study – Inferential Statistics
Dataset: Internet usage – dataset 3
Student Name: Nguyen Phuong Thao
Student Number: s3891607
Lecturer: Ha Thanh Nguyen
Assignment Due Date: August 22nd, 2021
Date of Submission: August 22nd, 2021
Number of pages (including this one): 12 pages
Word Count: 2960

Part 1: Introduction

Nowadays, the internet plays an essential role in people's lives. Connecting billions of people across the world, it is a core part of today's information society. The worldwide penetration rate grew from almost 17% in 2005 to over 53% in 2019. However, because certain parts of the globe have reached saturation levels, worldwide growth rates have slowed in recent years (ITU Telecommunication Development Bureau 2019). As an information distribution system, the internet can deliver education and knowledge to everyone. It also creates substantial new economic prospects as well as the potential for more environmentally friendly choices in the marketplace. Furthermore, the internet can help enterprises in developing countries join the development mainstream, and it holds great promise for easing the delivery of essential services such as health and education, which are now unevenly distributed (United Nations Department of Economic and Social Affairs 2007). Access remains uneven, however: around 87% of individuals in developed countries are online, whereas in the least developed countries (LDCs) just 19% of individuals had an internet connection in 2019. Europe has the highest internet usage rates, while Africa has the lowest (ITU Telecommunication Development Bureau 2019). In addition, the United Nations has created the 2030 Agenda with 17 Sustainable Development Goals (SDGs), one of which, goal number 8, is “promoting sustained, inclusive, and sustainable economic growth, full” (United Nations Department of Economic and Social Affairs, n.d). For these reasons, to accomplish the United Nations' SDG 8, it is critical to keep track of who is using the internet. According to Chong, Liew and Suhaimi (2012), there is a significant long-run and short-run connection between gross national income and internet usage rate.
More specifically, investing in Information and Communications Technology (ICT) infrastructure, particularly encouraging increased internet usage, is advantageous to raising gross national income per capita. Therefore, increasing internet usage should be included as one of the critical components of the New Economy Model so that the policy's vision and purpose can be realized in the future.

Part 2: Descriptive Statistics and Probability

- 33 nations are divided into three groups according to their gross national income (GNI):
  + Low-Income countries (LI): countries with a per capita GNI of less than $1,000.
  + Middle-Income countries (MI): countries with a per capita GNI of between $1,000 and $12,500.
  + High-Income countries (HI): countries with a per capita GNI of more than $12,500.

- After being divided into three categories, these nations are split into two groups: “low usage of internet” countries (L), which have individuals using the internet (percentage of population) of no more than 40%, and “high usage of internet” countries (H), which have individuals using the internet of more than 40%.

A. Probability

                              Low-Income (LI)   Middle-Income (MI)   High-Income (HI)   Total
Low usage of internet (L)            4                  6                   0             10
High usage of internet (H)           0                 13                  10             23
Total                                4                 19                  10             33

Table 1: Contingency table of internet usage statistics for each nation category

a. To see whether income and internet usage are statistically independent events, we compare the conditional probability of low internet usage given a low-income nation, P (L | LI), with the probability of low internet usage across all nations, P (L).

$$P(L) = \frac{10}{33} = 0.303$$

$$P(L \mid LI) = \frac{P(L \cap LI)}{P(LI)} = \frac{4/33}{4/33} = 1$$

⇒ P (L | LI) ≠ P (L)

Because the probability of low internet usage given a low-income nation, P (L | LI), differs from the probability of low internet usage overall, P (L), income and internet usage are statistically dependent events. It demonstrates that these two events affect each other. The same holds for the other two country groups. As a result, the gross national income of a country and its individuals' use of the internet are dependent on each other.

b. To determine which country category has higher internet usage, we calculate the conditional probability of high internet usage for each of the three country categories and compare them.

$$P(H \mid LI) = \frac{0/33}{4/33} = 0$$

$$P(H \mid MI) = \frac{13/33}{19/33} = \frac{13}{19} = 0.684$$

$$P(H \mid HI) = \frac{10/33}{10/33} = 1$$

⇒ P (H | LI) < P (H | MI) < P (H | HI)

- As a result, the chance of a low-income nation having high internet usage is 0%. In contrast, the probability of high internet usage is 100% in high-income countries, compared to 68.4% in middle-income countries. Countries with a higher internet usage rate therefore tend to have a higher GNI, so governments should push the percentage of citizens using the internet to improve the GNI and boost the economy.
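The two comparisons above can be reproduced directly from the counts in Table 1. The following is a minimal sketch only: the dictionary layout and the function names `marginal_usage` and `conditional` are assumptions introduced here for illustration and are not part of the original analysis.

```python
from fractions import Fraction

# Counts from Table 1, stored as counts[usage level][income group]
counts = {
    "L": {"LI": 4, "MI": 6, "HI": 0},
    "H": {"LI": 0, "MI": 13, "HI": 10},
}
total = sum(sum(row.values()) for row in counts.values())  # 33 nations


def marginal_usage(usage):
    """P(usage): share of all nations in the given usage group."""
    return Fraction(sum(counts[usage].values()), total)


def conditional(usage, income):
    """P(usage | income) = P(usage and income) / P(income)."""
    joint = Fraction(counts[usage][income], total)
    p_income = Fraction(sum(row[income] for row in counts.values()), total)
    return joint / p_income


p_l = marginal_usage("L")
p_l_given_li = conditional("L", "LI")
# Independence would require P(L | LI) to equal P(L)
independent = p_l_given_li == p_l
p_h_given = {group: conditional("H", group) for group in ("LI", "MI", "HI")}

print(f"P(L) = {float(p_l):.3f}, P(L | LI) = {float(p_l_given_li):.3f}")
print("independent:", independent)
for group, p in p_h_given.items():
    print(f"P(H | {group}) = {float(p):.3f}")
```

Exact fractions are used so that the equality check for independence is not affected by floating-point rounding.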

B. Descriptive statistics

        Min          Lower bound     Max          Upper bound     Result
LI      4.339    >   -1.78           18.618   <   18.859          No outliers
MI      8.478    >   -4.502          71.391   <   95.86           No outliers
HI      67.096   >   65.996          97.099   <   101.1127        No outliers

Table 2: Measures of identifying outliers of each country on usage of internet

To obtain an accurate descriptive analysis, the data set must first be examined for outliers. Table 2 shows that there are no extreme values in any of the three country categories: each minimum lies above its lower bound and each maximum lies below its upper bound.
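The check summarized in Table 2 can be sketched as follows. The report does not state how the bounds were obtained; the helper `iqr_fences` assumes the common 1.5 × IQR rule and is included only to show how such bounds are typically derived, while `has_outliers` (also a name introduced here) simply compares the minimum and maximum of each group with the bounds listed in Table 2.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) bounds under the k * IQR rule (an assumption here)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def has_outliers(minimum, maximum, lower, upper):
    """True if any value can fall outside the bounds, judged by min and max."""
    return minimum < lower or maximum > upper


# (min, max, lower bound, upper bound) for each group, copied from Table 2
table2 = {
    "LI": (4.339, 18.618, -1.78, 18.859),
    "MI": (8.478, 71.391, -4.502, 95.86),
    "HI": (67.096, 97.099, 65.996, 101.1127),
}

results = {group: has_outliers(*values) for group, values in table2.items()}
for group, flag in results.items():
    print(group, "outliers" if flag else "no outliers")
```

Checking only the minimum and maximum is enough, because every other observation lies between them.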

a. Measurement of Central Tendency

Central Tendency    Low-Income    Middle-Income    High-Income
Mean                9.519         45.949           82.647
Median              7.56          49.966           81.535
Mode                N/A           N/A              N/A

Table 3: Measurements of Central Tendency of each country on usage of internet in 2017

The mean is the most suitable measure of central tendency here. It is calculated from all the values in the data and can be treated further mathematically. Moreover, it is simple for non-technical audiences to comprehend (Gholba 2012). Furthermore, although the mean is sensitive to outliers, there are no extreme values in the dataset (table 2). Table 3 shows that low-income nations have a lower mean than middle-income countries, and middle-income countries have a lower mean than high-income ones. It indicates that internet usage is related to GNI: countries with higher income have a higher mean percentage of individuals using the internet.

b. Measurement of Variation

Measure                         Low-Income    Middle-Income    High-Income
Range                           14.279        62.913           30.003
Interquartile Range             5.16          25.09            8.78
Variance                        39.848        365.984          75.841
Standard Deviation              6.313         19.131           8.709
Coefficient of Variation (%)    66.313        41.635           10.537

Table 4: Measures of Variation of each country on usage of internet in 2017

The most suitable measure of variation is the standard deviation (S), which is also the most commonly used one. It shows the variation around the mean and takes all the values in the dataset into account. Furthermore, its units are the same as those of the original data (Descriptive Statistics 2021). Although S is sensitive to extreme values, there are no outliers in this dataset (table 2). Table 4 shows that the S of middle-income countries is the highest (19.131%), followed by high-income and low-income countries at 8.709% and 6.313%, respectively. In other words, the values for middle-income countries are more spread out from the mean than those of the other two categories, so internet usage in an MI country may lie further (higher or lower) from the group mean. By contrast, HI countries have a smaller S than MI countries and the lowest coefficient of variation, so they are more likely to have a high usage rate (the mean of HI countries is 82.647%, with a low spread around the mean of 8.709%). Consequently, countries should promote internet usage to stimulate GNI and the economy.
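The coefficient of variation in Table 4 relates the standard deviation to the mean, which makes the spread of groups with very different means comparable. A small sketch, using the rounded means and standard deviations from Tables 3 and 4 (so the last digit may differ slightly from the table); the function name is an assumption made for this example:

```python
# (mean, standard deviation) per group, taken from Tables 3 and 4
groups = {
    "LI": (9.519, 6.313),
    "MI": (45.949, 19.131),
    "HI": (82.647, 8.709),
}


def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent: spread relative to the mean."""
    return sd / mean * 100


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in groups.items()}
for name, value in cv.items():
    print(f"{name}: CV = {value:.2f}%")
```

Relative to its mean, the low-income group is the most variable even though its standard deviation is the smallest in absolute terms.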

Part 3: Confidence Intervals

a. Calculating the confidence interval for the worldwide average of individuals using the Internet (percent of population)

- The confidence level in this part is assumed to be 95%, since it is the most often used confidence level (Hazra 2017).

Quantity                         Symbol             Value
Population standard deviation    σ                  unknown
Sample standard deviation        S                  27.773
Sample mean                      X̄                  52.654
Sample size                      n                  33
Significance level               α                  0.05
Confidence level                 (1 − α) × 100%     95%
Degrees of freedom               d.f                32
t-critical value                 t                  ± 2.0369

Table 5: Summary of data regarding the global average of people who use the internet

Confidence interval:

$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} = 52.654 \pm 2.0369 \times \frac{27.773}{\sqrt{33}}$$

⇒ 42.806 ≤ μ ≤ 62.502

⇒ We are 95% confident that the world average of internet users is between 42.806 and 62.502 percent of the population.

b. Assumption

Since the sample size is large enough (n = 33 > 30), the Central Limit Theorem (CLT) applies: whether or not the population is normally distributed, the sampling distribution of the sample mean can be approximated by a normal distribution.

⇒ No further assumptions are required.

c. Assume we know the worldwide standard deviation of Internet users.

In this case the world standard deviation of internet users, that is, the population standard deviation σ, is provided, so we use the z-table instead of the t-table. In other words, the t-critical value (part 3a) is replaced by the z-critical value, and the new formula is $\bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$. Critical z-values are smaller than critical t-values for any given degree of confidence, and confidence intervals are narrower when critical values are smaller. A broader interval, on the other hand, is a more cautious interval (McEvoy 2018). Furthermore, the width of a confidence interval is determined by its margin of error, so when the width of the confidence interval decreases, the margin of error decreases too, which leads to more precise results (Simundic 2008).

⇒ If σ, the world standard deviation of individuals using the internet, is known and is close to S, the confidence interval will be narrower and more precise.
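To illustrate the comparison between the two critical values, the sketch below computes the two-tailed 95% z-critical value with Python's standard library and compares the resulting margins of error. The report gives no value for σ, so setting it equal to S is purely a hypothetical assumption made so that only the critical value differs.

```python
import math
from statistics import NormalDist

n = 33
s = 27.773        # sample standard deviation (Table 5)
t_crit = 2.0369   # t-critical value for d.f = 32 (Table 5)
sigma = s         # ASSUMPTION: hypothetical known sigma, set equal to S

# Two-tailed 95% z-critical value from the standard normal distribution
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

margin_t = t_crit * s / math.sqrt(n)
margin_z = z_crit * sigma / math.sqrt(n)

print(f"z-critical = {z_crit:.4f}, t-critical = {t_crit}")
print(f"margin with t = {margin_t:.3f}, margin with z = {margin_z:.3f}")
```

If the true σ were noticeably larger than S, the z-based interval could end up wider despite the smaller critical value.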

Part 4: Hypothesis Testing

a. Hypothesis Testing (CV approach)

In 2016, the population mean for internet users was 44.7% (percentage of population), according to a World Health Organization survey. Based on the calculations in part 3, we are 95 percent confident that the global average of internet users is between 42.806% and 62.502%. The 2016 figure also lies in this interval, so it is unclear whether individuals' use of the internet will increase, decrease, or remain unchanged in the upcoming years. We therefore carry out a two-tailed test first to test whether internet usage will change or remain unchanged.

Quantity                         Symbol             Value
Population mean                  μ                  44.7
Population standard deviation    σ                  unknown
Sample mean                      X̄                  52.654
Sample standard deviation        S                  27.773
Sample size                      n                  33
Confidence level                 (1 − α) × 100%     95%
Significance level               α                  0.05
Degrees of freedom               d.f                32

Table 6: Summary of data regarding the global average of people who use the internet

- Step 1: Check for CLT

33 countries are included in this case, so the sample size is n = 33, which is higher than 30. Therefore, the CLT is applicable and the sampling distribution of the mean is approximately normal.

- Step 2: Determine the null hypothesis H0 and the alternative hypothesis H1

H0: μ = 44.7
H1: μ ≠ 44.7

- Step 3: Determine the kind of test

From the hypotheses in step 2, it is a two-tailed test.

- Step 4: Choose which table to use

The t-table is used because the population standard deviation is unknown.

- Step 5: Determine critical values (CV)

In this case, α = 0.05, the degrees of freedom are 32 and the test is two-tailed, so the t-critical value = ± 2.0369.

- Step 6: Calculate the test statistic

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{52.654 - 44.7}{27.773/\sqrt{33}} = 1.645$$

- Step 7: Make statistical decision

[Figure: t-distribution curve showing the rejection and non-rejection regions]

As we can see from the curve above, the t value is in the non-rejection area (-2.0369 < 1.645 < 2.0369). Consequently, we do not reject the null hypothesis H0; there is not enough evidence to support the alternative hypothesis H1.

- Step 8: Make managerial decision

As H0 is not rejected, at the 5% significance level there is not enough evidence to conclude that the average global percentage of individuals using the Internet differs from 44.7 percent.

- Step 9: Discuss the possible error

Since H0 is not rejected, a Type II error might have been committed. In this context the error means: it is concluded that the world average of individuals using the Internet (percentage of population) is 44.7%, but in reality the world average might not be 44.7% in the upcoming years.

⇒ The average world number of people utilizing the internet may change (drop or rise) in the future.
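Steps 5 to 7 can be condensed into a short calculation. A minimal sketch, assuming the critical value is read from the t-table as in step 5 rather than computed; the variable names are chosen here for illustration.

```python
import math

mu_0 = 44.7       # hypothesized population mean (H0)
x_bar = 52.654    # sample mean
s = 27.773        # sample standard deviation
n = 33            # sample size
t_crit = 2.0369   # two-tailed critical value, alpha = 0.05, d.f = 32

# Step 6: test statistic t = (x_bar - mu_0) / (s / sqrt(n))
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Step 7: reject H0 only if t falls outside [-t_crit, +t_crit]
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```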

b. Consider the influence of doubling the number of nations in the dataset on hypothesis testing findings.

- The sample size (n) will double if the number of nations in the dataset is doubled, and the degrees of freedom will be affected directly.

- In hypothesis testing, the standard error, which is determined by the sample size, is used to calculate the width of the sampling distribution (The University of Texas, n.d). In other words, the standard error represents the distribution's dispersion. The dispersion of the sampling distribution decreases as the sample size increases, and the mean of the distribution is near the population mean (Central Limit Theorem). As a result, the standard error is inversely related to the sample size (Zijing Zhu 2020). So, when the sample size increases, the t-distribution has a narrower curve, and the t-critical values are pushed closer to the center, which narrows the non-rejection region. At the same time, the sample mean approaches the actual population mean, and the data distribution becomes less variable, which may result in a lower standard deviation S. With a lower S and a higher n, the test statistic takes a new value, $t' = \frac{\bar{X} - \mu}{S/\sqrt{n}}$, calculated from the new sample. The critical value t and the test statistic t' then move toward each other. However, in this case the test statistic is far from the critical values, so it is difficult for it to fall in the rejection area even with these adjustments. As a result, it is reasonable to state that the statistical conclusion will not change.

- Furthermore, statistical power and sample size are positively related. Increasing the sample size enhances power by lowering the standard error, which raises the test statistic value (Introduction to Hypothesis Testing n.d). In other words, the sample will be more representative of the general population if the standard error is lower: the larger the sample size, the smaller the standard error, as the statistic approaches the real value. The standard error is an inferential statistic that indicates the standard deviation of the sample mean, so it measures the spread of the sampling distribution. The estimate is more exact if the dispersion is smaller (Kenton and Mansa 2020). Consequently, the result will be more accurate.

- The power of the test is 1 − β, where β is the probability of a Type II error, that is, of failing to reject the null hypothesis when it is false. With higher power, we are less likely to commit a Type II error: P (Reject H0 | H0 is false) = 1 − P (Fail to reject H0 | H0 is false) = 1 − β. Decreasing β by increasing the sample size increases the power of the test, 1 − β. In other words, the lower β is, the higher the statistical power (Zijing Zhu 2020).

⇒ For the reasons above, when the sample size n increases, the statistical decision will remain unchanged and the results of hypothesis testing will be more accurate.
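The effect of the sample size on the standard error can be sketched numerically. This example holds S fixed at its current value when n doubles, which is an assumption made purely for illustration; in a real enlarged sample both the sample mean and S would change, so the sketch says nothing about the new test statistic itself.

```python
import math

s = 27.773  # sample standard deviation; ASSUMED unchanged when n doubles


def standard_error(sd, size):
    """Standard error of the mean: sd / sqrt(size)."""
    return sd / math.sqrt(size)


se_33 = standard_error(s, 33)   # current sample of 33 nations
se_66 = standard_error(s, 66)   # hypothetical sample of 66 nations
ratio = se_33 / se_66

print(f"SE (n=33) = {se_33:.3f}, SE (n=66) = {se_66:.3f}, ratio = {ratio:.3f}")
```

Under this assumption, doubling n divides the standard error by the square root of 2 rather than by 2.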

Part 5: Conclusion

Key findings from the analysis and calculation of individuals using the internet at three different income levels in 33 countries are listed below.

- In part 1, there is an upward trend in internet use across the whole world, and there is a link between internet usage and gross national income: in high- and middle-income countries, the share of individuals using the internet is higher than in low-income countries. GNI indicates the state of a country's economy. So, governments should encourage citizens to access more education and pay attention to developing the internet (for example, high-speed internet, 5G, and so on) to push the economy.

- To strengthen this relationship, in part 2 it is concluded that GNI and internet usage are dependent events, since the conditional probability of low internet usage given a low-income country is not equal to the probability of low internet usage (P (L|LI) ≠ P (L)). In addition, the probability of high internet usage is 100% in high-income nations, compared to 68.4% and 0% in middle- and low-income nations, respectively (P (H | LI) < P (H | MI) < P (H | HI)). The descriptive statistics also show the link between GNI and internet usage. In measuring the central tendency for low-, middle-, and high-income countries, the mean of individuals using the internet (% of the population) is 9.519%, 45.949%, and 82.647%, respectively. It demonstrates the significant difference in internet usage between the three major groups of countries.

- Besides, in part 3, we are 95% confident that the world average of an individual using the i...