group project sta108 PDF

Title group project sta108
Course Statistics & Probability
Institution Universiti Teknologi MARA
Pages 55



Description

PROJECT STA 108 NUMBER OF REPORTED CASES AND TOTAL DEATH CAUSED BY CHOLERA IN MALAYSIA FROM YEAR 1971 TO 2000

NAME : 1) NUR DANIA BINTI AMAN SHAH (2018881976) 2) JASMIN SYAFIKAH BINTI JAMAL ASRI (2018406686) 3) NOORFATIHAH BINTI HANIPIAH (2018202014)

GROUP : AS1204_M

DISTRIBUTED TO : SIR ZULKIFLI BIN MOHD GHAZALI


Table of Contents

CHAPTER 1: INTRODUCTION
1.1 Background of Study
1.2 Objectives of Study
1.3 Significance of Study
CHAPTER 2: METHODOLOGY
2.1 Data Description
2.1.1 Population
2.2 Graphical Technique
2.3 Numerical Technique
2.3.1 Measure Of Central Tendency
2.3.2 Mean
2.3.3 Median
2.3.4 Mode
2.4 Measure Of Location
2.4.1 The first and third quartiles
2.5 MEASURES OF DISPERSION
2.5.6 SAMPLE STANDARD DEVIATION
2.6 MEASURE OF SKEWNESS
2.7 BOX-and-WHISKER PLOT
2.8 PEARSON COEFFICIENT OF SKEWNESS
2.9 CORRELATION
2.9.1 Characteristics of the correlation coefficient
Strength of the Correlation Coefficient
2.9.2 Regression
2.9.3 Fitting a Straight Line
2.9.4 Coefficient of Determination
CHAPTER 3: RESULTS AND INTERPRETATION
3.1 Data Representation
3.2.1 Histogram
3.3.1 Scatter Plot
CHAPTER 4: CONCLUSION
4.1 Report Summary
REFERENCES
APPENDIX

CHAPTER 1: INTRODUCTION

1.1 Background of Study

Cholera is an illness caused by infection of the intestine with the toxigenic bacterium Vibrio cholerae. The deadly effects of the disease result from a toxin that the bacteria produce in the small intestine. The toxin causes the body to secrete an enormous amount of water, leading to diarrhea and a rapid loss of fluids and salts. In Malaysia, 21535 cases were reported from 1971 to 2000, but only 388 deaths caused by cholera were recorded over the same period. This study was undertaken to analyse the relationship between the number of reported cases and the total deaths caused by cholera in Malaysia.

In this study, the number of reported cases is the independent (manipulated) variable, while the total deaths caused by cholera in Malaysia is the dependent (responding) variable, because the total deaths depend on the number of reported cases. The data show a positive correlation with a coefficient of 0.7432. This value suggests a moderate relationship between the number of reported cases and the total deaths caused by cholera in Malaysia from 1971 to 2000: the higher the number of reported cases, the higher the total deaths.


1.2 Objectives of Study

The objectives of this study are:

1) To determine the relationship between the number of reported cases and the total deaths caused by cholera in Malaysia.
2) To identify the types of graph that are suitable for the data.
3) To find the values of the mean, standard deviation and interquartile range.
4) To determine the correlation and regression of the data.

1.3 Significance of Study

The data for this study are easy to access since they are already available on the World Health Organisation (WHO) website. Using secondary data also saves time and money, since we do not need to collect the data on our own, which makes it much cheaper than primary data. The values can be obtained rapidly, and the data are considered reliable since they were compiled by expert researchers from other countries.

1.4 Limitation of Study

The limitation of this study is that we cannot question the original researchers to verify the accuracy of the data, since the data were taken as published on the World Health Organisation (WHO) website. In addition, the purpose for which the data were originally collected may differ slightly from our objectives, because the data were gathered by other researchers.


CHAPTER 2: METHODOLOGY

2.1 Data Description

2.1.1 Population

The population in this study is the number of reported cases and total deaths caused by cholera from 1971 to 2000 in all countries of the world.

2.1.2 Samples

The sample in this study is the number of reported cases and total deaths caused by cholera in Malaysia from 1971 to 2000.

2.1.3 Data collection method

No data collection method was used in this study, because the data are secondary data that were already available.

2.1.3 Sampling Technique

No sampling technique was used in this study, because the data are secondary data that were already available.

2.1.4 Variables

The variables in this study are the number of reported cases and the total deaths caused by cholera in Malaysia from 1971 to 2000, with 30 observations for each variable. In statistics, quantitative variables are either discrete or continuous. A continuous variable takes values that are obtained by measuring, so it is not the appropriate type here. Both variables in this study are discrete, because the data are numerical counts, and a discrete variable is one whose values are countable.

2.1.5 Measurement scale

Statistics uses four measurement scales: nominal, ordinal, interval and ratio. The scale used in this study is ratio, an ordered scale in which the differences between measurements are meaningful and which has a true zero point. In our data, the zero reported cases in 1974, 1994, 1996 and 1999 mean that no cholera cases were reported in those years. The interval scale is like the ratio scale except that it has no true zero point. The nominal scale was not chosen because it applies to qualitative data, whereas our data are quantitative. The ordinal scale was not chosen either, because it requires data that can be ranked, such as survey responses, and no survey was carried out for this secondary data.

2.2 Graphical Technique

Because the data in this study were arranged as a grouped frequency distribution, the histogram was chosen. As shown in Figure 1, the height of each bar represents the frequency of the class. The histogram uses the class frequency as the y-axis and the class boundaries as the x-axis.

Figure 1


Figure 2

Figure 3 below shows the scatter diagram. A scatter diagram shows the nature of the relationship between two quantitative variables, the dependent variable and the independent variable. It can also be used to describe the type of correlation and to identify how close the relationship between the two variables is. The possible types are positive correlation, negative correlation, no correlation, curvilinear correlation and perfect positive correlation. A positive correlation is identified when the dependent variable (y-axis) and the independent variable (x-axis) change in the same direction: as the values on the x-axis increase, the values on the y-axis also increase. A negative correlation is identified when the two variables change in opposite directions: as the independent variable (x-axis) increases, the dependent variable (y-axis) decreases.


Figure 3

Based on Figure 3 above, the scatter diagram shows a positive correlation: there is a positive relationship between the two variables, so when the independent variable on the x-axis (number of reported cases) increases, the dependent variable on the y-axis (total deaths) also increases.
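The direction and strength of the relationship seen in a scatter diagram can be summarised with a correlation coefficient. The sketch below uses the standard computational formula for Pearson's r, which is an assumption here because the report's own formula is not shown in this section. The function name `pearson_r` and the numbers are made up for illustration and are not the cholera data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient from the usual sums of x, y, xy, x^2, y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical values only (assumption): x plays the role of reported cases,
# y the role of total deaths.
cases = [10, 20, 30, 40, 50]
deaths = [1, 3, 2, 5, 4]
r = pearson_r(cases, deaths)
print(round(r, 4))  # 0.8, a positive correlation
```

A value of r above zero corresponds to the upward pattern described for the scatter diagram, and a value below zero to a downward pattern.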


2.3 Numerical Technique

2.3.1 Measure Of Central Tendency

A measure of central tendency is what is most commonly called an average in statistics. It is a single value located at the centre of a data set that can be taken as a summary value for that data set. Three types of average are often used as measures of central tendency: the mean, the median and the mode. Each can be calculated for either grouped or ungrouped data. Ungrouped data are data that are not given in the form of a frequency table or frequency distribution, while grouped data are data that are tabulated in a frequency table or frequency distribution.

2.3.2 Mean

The mean is the average of the data. It is the total of all the observations divided by the number of observations. It can be calculated for both grouped and ungrouped data.

Ungrouped data: $\bar{x} = \frac{\sum x}{n}$

Grouped data: $\bar{x} = \frac{\sum fx}{n}$
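The two mean formulas can be sketched in Python as follows. For grouped data the sketch takes x to be the class midpoint and n to be the sum of the frequencies, which is the usual convention but is not stated in the text. The function names and numbers are illustrative assumptions, not the cholera data.

```python
def mean_ungrouped(values):
    """Mean of raw observations: sum of x divided by n."""
    return sum(values) / len(values)


def mean_grouped(midpoints, frequencies):
    """Mean of a frequency table: sum of f*x divided by n, where n = sum of f."""
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / n


# Made-up illustrative values (assumption).
print(mean_ungrouped([2, 4, 6, 8]))          # 5.0
print(mean_grouped([5, 15, 25], [2, 5, 3]))  # 16.0
```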

2.3.3 Median

The median is the middle value of the data once they have been arranged in ascending order. It is interpreted as follows: 50% of the observations have a value less than the median, while the other 50% have a value more than the median.


Ungrouped Data

Steps to calculate it:

i. Arrange the data in ascending order.
ii. Find the position of the median.
iii. Find the value of the median.

For the special case where the data are given in a frequency table:

1. Prepare a proper table that includes the cumulative frequency.
2. Find the position of the median: $\frac{n+1}{2}$
3. Locate this position in the cumulative frequency column.
4. The value of the median is in column x.
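The steps for ungrouped data can be sketched as below. When the position (n + 1)/2 falls between two observations, the sketch averages the two neighbouring values. That is the common convention and is an assumption here, since the text does not say how to handle a fractional position. The function name is illustrative.

```python
def median_ungrouped(values):
    """Median of raw data using the 1-based position (n + 1) / 2."""
    ordered = sorted(values)             # step i: ascending order
    n = len(ordered)
    position = (n + 1) / 2               # step ii: position of the median
    lower = ordered[int(position) - 1]   # value at the whole-number position
    if position.is_integer():
        return lower                     # step iii: odd n, a single middle value
    upper = ordered[int(position)]
    return (lower + upper) / 2           # even n: average of the two middle values


# Made-up illustrative values (assumption).
print(median_ungrouped([7, 1, 5]))     # 5
print(median_ungrouped([4, 1, 3, 2]))  # 2.5
```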

Grouped Data

Steps to calculate it:

i. Prepare a proper table that includes the cumulative frequency, class boundaries and position.
ii. Find the position of the median: $\frac{n+1}{2}$
iii. Locate this position in the cumulative frequency column to find the median class.
iv. Use the formula:

$\tilde{x} = L_m + \left[ \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right] \cdot c$

where
$n$ = sample size
$L_m$ = lower boundary of the median class
$\sum f_{m-1}$ = cumulative frequency before the median class
$f_m$ = frequency of the median class
$c$ = median class size
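The grouped-data steps can be sketched in Python as follows. The sketch assumes equal class sizes and a table given as parallel lists of lower class boundaries and frequencies. The function name and the table are hypothetical and are not taken from the report.

```python
def grouped_median(lower_boundaries, frequencies, class_size):
    """Median of a grouped frequency table with equal class sizes."""
    total = sum(frequencies)       # sum of f, equal to n
    position = (total + 1) / 2     # position of the median
    cumulative = 0                 # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= position:
            # This is the median class: apply L_m + [(sum f / 2 - sum f_(m-1)) / f_m] * c
            return lower + (total / 2 - cumulative) / f * class_size
        cumulative += f
    raise ValueError("frequency table is empty")


# Hypothetical table (assumption): classes 0.5-10.5, 10.5-20.5, 20.5-30.5.
result = grouped_median([0.5, 10.5, 20.5], [3, 5, 2], 10)
print(result)
```

For this table the median class is the second one, so the formula gives 10.5 + [(5 - 3) / 5] × 10 = 14.5.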

2.3.4 Mode

The mode is the value that occurs most frequently in the data. It can be found for both ungrouped and grouped data.

For ungrouped data:

i. Arrange the data in ascending order.
ii. Find the mode, which is the value that occurs most frequently in the data set.
iii. For categorical data, the mode is the category with the highest frequency.
iv. For quantitative data, the mode can be read from the histogram as the class interval with the highest frequency.

For the special case:

i. Find the highest frequency.
ii. Find the mode in column x.

For grouped data:

Steps to calculate it:

i. Prepare a proper table that includes the cumulative frequency and class boundaries.
ii. Find the highest frequency to identify the modal class.
iii. Use the formula:

$\hat{x} = L_{mo} + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] \cdot c$

where
$L_{mo}$ = lower boundary of the modal class
$\Delta_1$ = (modal class frequency - frequency of the class before the modal class)
$\Delta_2$ = (modal class frequency - frequency of the class after the modal class)
$c$ = modal class size
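A minimal sketch of the grouped mode formula is given below. It assumes equal class sizes, treats a missing neighbouring class as having frequency zero, and picks the first class when several share the highest frequency. These choices are assumptions, since the text does not cover those cases. The function name and the table are hypothetical.

```python
def grouped_mode(lower_boundaries, frequencies, class_size):
    """Mode of a grouped frequency table with equal class sizes."""
    highest = max(frequencies)
    if highest == 0:
        raise ValueError("all class frequencies are zero")
    m = frequencies.index(highest)  # modal class (first one if tied)
    before = frequencies[m - 1] if m > 0 else 0
    after = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    delta1 = highest - before       # modal frequency minus the class before
    delta2 = highest - after        # modal frequency minus the class after
    return lower_boundaries[m] + delta1 / (delta1 + delta2) * class_size


# Hypothetical table (assumption): classes 0.5-10.5, 10.5-20.5, 20.5-30.5.
mode_value = grouped_mode([0.5, 10.5, 20.5], [4, 8, 4], 10)
print(mode_value)  # 15.5
```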

2.3.5 Relationship between mean, median and mode

If mode > median > mean (or simply mean < median or mean < mode), the data distribution is skewed to the left.

If mode < median < mean (or simply mean > median or mean > mode), the data distribution is skewed to the right.

If mode = median = mean, the data distribution is symmetrical or normal.
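The simplified rule that compares only the mean and the median can be sketched as below, using Python's standard `statistics` module. The function name `skew_direction` and the sample values are illustrative assumptions.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify the shape of a distribution by comparing its mean and median."""
    m, med = mean(values), median(values)
    if m < med:
        return "skewed to the left"
    if m > med:
        return "skewed to the right"
    return "symmetrical"


# Made-up illustrative values (assumption).
print(skew_direction([1, 2, 3, 4, 20]))     # skewed to the right
print(skew_direction([1, 17, 18, 19, 20]))  # skewed to the left
print(skew_direction([1, 2, 3, 4, 5]))      # symmetrical
```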

2.4 Measure Of Location

Measures of location include the quartiles, which are calculated differently for ungrouped and grouped data. Quartiles are used to represent the position of a value within a large set of numerical data. They are an extension of the median and are the most commonly used measures of non-central location. The quartiles divide the region under the frequency curve into four equal areas.

Ungrouped Data

There are three quartile positions:

First Quartile / Lower Quartile ($Q_1$): 25% of the data is less than the first quartile value and 75% of the data is more than the first quartile value.

$Q_1 = \left( \frac{n+1}{4} \right)^{\text{th}}$ position

Second Quartile / Median ($Q_2$): 50% of the data is less than the second quartile value and 50% of the data is more than the second quartile value.

$Q_2 = \left( \frac{2(n+1)}{4} \right)^{\text{th}}$ position

Third Quartile / Upper Quartile ($Q_3$): 75% of the data is less than the third quartile value and 25% of the data is more than the third quartile value.

$Q_3 = \left( \frac{3(n+1)}{4} \right)^{\text{th}}$ position
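The quartile positions for ungrouped data can be sketched as follows. When a position is not a whole number, the sketch interpolates linearly between the two neighbouring observations. That is one common convention and is an assumption here, because the text gives only the position formulas. The function name and values are illustrative.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) of raw data at the 1-based position k(n + 1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4
    whole = int(position)          # whole-number part of the position
    fraction = position - whole    # fractional part, used for interpolation
    if whole < 1:
        return ordered[0]          # position falls before the first observation
    if whole >= n:
        return ordered[-1]         # position falls at or after the last observation
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


# Made-up illustrative values (assumption).
data = [1, 2, 3, 4, 5, 6, 7]
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))  # 2.0 4.0 6.0
print(quartile([10, 20, 30, 40], 1))                            # 12.5
```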

Grouped Data

For grouped data, the first and third quartiles, $Q_1$ and $Q_3$, can be calculated from the frequency distribution table or read from the ogive.

2.4.1 The first and third quartiles

Method 1: Using Formula

Step 1: Obtain the cumulative frequencies and the positions of the data.
Step 2: Obtain the locations of the first and third quartiles using the formulas $\frac{n}{4}$ and $\frac{3n}{4}$, then refer to the cumulative frequency column to identify the classes in which these locations lie. These are the first and third quartile classes, within which the values of $Q_1$ and $Q_3$ can be determined.
Step 3: Find the first and third quartiles as follows:


$Q_1 = L_{Q_1} + \left[ \frac{\frac{n}{4} - \sum f_{Q_1 - 1}}{f_{Q_1}} \right] \times C_{Q_1}$

where
$n$ = number of observations
$L_{Q_1}$ = lower boundary of the first quartile class
$\sum f_{Q_1 - 1}$ = cumulative frequency before the first quartile class
$f_{Q_1}$ = frequency of the first quartile class
$C_{Q_1}$ = first quartile class size

$Q_3 = L_{Q_3} + \left[ \frac{\frac{3n}{4} - \sum f_{Q_3 - 1}}{f_{Q_3}} \right] \times C_{Q_3}$

where
$n$ = number of observations
$L_{Q_3}$ = lower boundary of the third quartile class
$\sum f_{Q_3 - 1}$ = cumulative frequency before the third quartile class
$f_{Q_3}$ = frequency of the third quartile class
$C_{Q_3}$ = third quartile class size

2.5 Measure Of Dispersion

Measures of dispersion are used to understand the spread or variability of a set of data about the mean. They give additional information for judging the reliability of the measure of central tendency and help in comparing the dispersion present in various samples. The measures of dispersion discussed in this topic are the range, variance and standard deviation.


2.5.1 Range

In statistics, the simplest measure of dispersion is the range, which is the difference between the largest and the smallest values of the data. With these two values, the range of the data distribution can be obtained.

For ungrouped data: Range = Largest data value - Smallest data value

For grouped data: Range = Upper class boundary of the last class - Lower class boundary of the first class
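Both range formulas can be sketched in a few lines of Python. The grouped version assumes the classes are supplied in ascending order as (lower boundary, upper boundary) pairs. The function names and values are hypothetical and are not the cholera data.

```python
def range_ungrouped(values):
    """Range of raw data: largest value minus smallest value."""
    return max(values) - min(values)


def range_grouped(class_boundaries):
    """Range of grouped data: upper boundary of the last class minus
    lower boundary of the first class."""
    first_lower = class_boundaries[0][0]
    last_upper = class_boundaries[-1][1]
    return last_upper - first_lower


# Made-up illustrative values (assumption).
print(range_ungrouped([3, 9, 4]))                                 # 6
print(range_grouped([(0.5, 10.5), (10.5, 20.5), (20.5, 30.5)]))   # 30.0
```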

2.5.2 Variance And Standard Deviation

The variance is the sum of the squared differences between each data value and the mean, divided by the sample size minus one. The standard deviation is the square root of the variance. Where the standard deviation is a set of values of the amount of variation or dispersion that we want to measur...

