Statistics - Mathematics in the Modern World

Course BS Accountancy
Institution Central Mindanao University
Pages 16

Mathematics in The Modern World

GEC 14 Teachers

CHAPTER 4
Statistics

Topics:
4.1 Measures of Central Tendency/Location
4.2 Measures of Dispersion/Variation
4.3 Linear Correlation and Simple Linear Regression

The word Statistics has two major definitions, one in a plural sense and one in a singular sense. Statistics, in the plural sense, refers to the data itself or to numerical computations derived from a set of data that are systematically collected and analyzed. In the singular sense, Statistics refers to the scientific discipline consisting of the theory and methods for processing collections of quantitative and qualitative data, useful when making decisions in the face of uncertainty. Below are the objectives and some key definitions to consider as you go through this module.

Objectives:
(1) Calculate the mean, median, and mode of a set of data and identify the conditions under which each is most appropriate;
(2) Calculate the range, variance, and standard deviation;
(3) Plot a scatter diagram, and measure and interpret the relationship between two variables; and
(4) Predict or estimate values of a dependent variable from known values of independent variables.

Key Definitions:
Population – a collection of all units from which data is to be collected.
Sample – a subset of the population.
Variables – the characteristics or properties measured from objects, persons, or things on every unit of the population.
Outlier – an observation that does not fit the rest of the data. It is sometimes called an extreme value.

First Semester

CMU Mathematics Department

4.1 MEASURES OF CENTRAL TENDENCY/LOCATION

A measure of central tendency (or location) is a numerical value that summarizes a set of observations into a single value, and that value may be used to represent the entire population. There are three measures of central tendency, namely the arithmetic mean, the median, and the mode.

4.1.1. Mean

The mean (often called the average) is the most popular measure of central tendency. It is the sum of a set of observations divided by the number of observations in the set. This measure is appropriate for data in interval or ratio scale. The computing formulas of the mean are as follows:

➢ Population Mean

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

➢ Sample Mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

➢ Weighted Mean

$$\bar{X} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$

where
$\mu$ – population mean
$\bar{x}$ – sample mean
$\bar{X}$ – weighted mean
$N$ – population size or total number of observations
$n$ – sample size or total number of observations
$x_i$ – set of data or observations
$w_i$ – the weight of each of the $k$ distinct observations

Example 1. The number of hours spent by 12 students in studying their Statistics lesson before an exam were recorded as follows: 9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17. Find the arithmetic mean.

Solution: Since it was not mentioned that the data are random samples, we assume, for the purpose of illustration, that this is population data. Thus

$$\begin{aligned}
\mu &= \frac{1}{12}\sum_{i=1}^{12} x_i = \frac{1}{12}(x_1 + x_2 + \dots + x_{12}) \\
&= \frac{1}{12}(9 + 11 + 16 + 11 + 15 + 12 + 10 + 16 + 13 + 11 + 11 + 17) \\
&= \frac{1}{12}(152) = \frac{152}{12} \approx 12.67
\end{aligned}$$

This result shows that, on average, the 12 students spent 12.67 hours studying their Statistics lesson.

Example 2. The CMUCAT scores of a sample of 5 students who joined the university during the first semester of SY 2020-2021 were found to be 78, 90, 89, 95, and 88. Compute the mean CMUCAT score.

Solution: This is sample data, hence

$$\begin{aligned}
\bar{x} &= \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{1}{5}(x_1 + x_2 + x_3 + x_4 + x_5) \\
&= \frac{1}{5}(78 + 90 + 89 + 95 + 88) = \frac{1}{5}(440) = \frac{440}{5} = 88
\end{aligned}$$

This result shows that the 5 students have an average CMUCAT score of 88.
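
The population and sample mean formulas share the same arithmetic, so Examples 1 and 2 can be checked with a few lines of code. The following is a minimal Python sketch; the function names `mean` and `weighted_mean` are illustrative and not part of the module. As an extra check on the weighted mean formula, it treats the frequency of each distinct value in Example 1 as its weight, which should reproduce the ordinary mean.

```python
def mean(data):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(data) / len(data)


def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Example 1: hours spent studying by 12 students (treated as a population)
hours = [9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17]
# Example 2: CMUCAT scores of a sample of 5 students
scores = [78, 90, 89, 95, 88]

print(round(mean(hours), 2))   # 12.67
print(mean(scores))            # 88.0

# Distinct values of Example 1, with their frequencies used as weights
distinct = [9, 10, 11, 12, 13, 15, 16, 17]
freq = [1, 1, 4, 1, 1, 1, 2, 1]
print(round(weighted_mean(distinct, freq), 2))  # 12.67
```

The same `weighted_mean` function applies to any set of values and weights, such as grades weighted by credit units.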

Example 3. The student’s final grades in Math 51, Math 43, GEE 12, GEC 19, PE31 and NSTP 1 are 2.5, 2.75, 1.25, 1.75, 1.25 and 1.75, respectively. If the respective credits for these subjects are 3, 4, 3, 3, 2, and 3 units, determine the student’s GPA or weighted average grade.

Solution:

$$\begin{aligned}
\bar{X} &= \frac{\sum_{i=1}^{6} w_i x_i}{\sum_{i=1}^{6} w_i} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6}{w_1 + w_2 + w_3 + w_4 + w_5 + w_6} \\
&= \frac{3(2.5) + 4(2.75) + 3(1.25) + 3(1.75) + 2(1.25) + 3(1.75)}{3 + 4 + 3 + 3 + 2 + 3} \\
&= \frac{35.25}{18} \approx 1.96
\end{aligned}$$

This result shows that the GPA of this student is 1.96.

4.1.2. Median

The median, denoted by $\tilde{x}$, is the middle value of a set of observations arranged in increasing or decreasing order of magnitude. It is a positional value and, unlike the arithmetic mean, it is not affected by the presence of extreme values. When abnormal values or outliers are present, it is preferable to use the median rather than the mean as a measure of central location. It is an appropriate measure for data that are at least in the ordinal scale. In the formulas below, $x_{(j)}$ denotes the $j$-th observation of the ordered data.

❖ Population Median

➢ If N is odd, then the median is computed using

$$\tilde{X} = x_{\left(\frac{N+1}{2}\right)}$$

➢ If N is even, then the median is computed using

$$\tilde{X} = \frac{x_{\left(\frac{N}{2}\right)} + x_{\left(\frac{N}{2}+1\right)}}{2}$$

❖ Sample Median

➢ If n is odd, then the median is computed using

$$\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$$

➢ If n is even, then the median is computed using

$$\tilde{x} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$$

Example 4. The ages of 8 CMU students enrolled in the GEC 14 subject are: 18, 17, 23, 20, 19, 18, 21, and 22. Find the median of the ages.

Solution: Arrange the ages in ascending order: 17, 18, 18, 19, 20, 21, 22, 23. This means that $x_{(1)} = 17$, $x_{(2)} = 18$, $x_{(3)} = 18$, $x_{(4)} = 19$, $x_{(5)} = 20$, $x_{(6)} = 21$, $x_{(7)} = 22$, $x_{(8)} = 23$. Since it was not mentioned that the data are random samples, we assume, for the purpose of illustration, that this is population data. Also, since N = 8 is an even number, the median is

$$\tilde{X} = \frac{x_{\left(\frac{N}{2}\right)} + x_{\left(\frac{N}{2}+1\right)}}{2} = \frac{x_{\left(\frac{8}{2}\right)} + x_{\left(\frac{8}{2}+1\right)}}{2} = \frac{x_{(4)} + x_{(5)}}{2}$$

Now, $x_{(4)} = 19$ and $x_{(5)} = 20$, then

$$\tilde{X} = \frac{19 + 20}{2} = \frac{39}{2} = 19.5$$

Thus, the median age of the 8 CMU students enrolled in the GEC 14 subject is 19.5.

Example 5. The CMUCAT scores of a sample of 5 students who joined the university during the first semester of SY 2020-2021 were found to be 78, 90, 89, 95, and 88. Determine the median CMUCAT score.

Solution: Arrange the CMUCAT scores in ascending order: 78, 88, 89, 90, 95. This means that $x_{(1)} = 78$, $x_{(2)} = 88$, $x_{(3)} = 89$, $x_{(4)} = 90$, $x_{(5)} = 95$. Since n = 5 is an odd number, the median is

$$\tilde{x} = x_{\left(\frac{n+1}{2}\right)} = x_{\left(\frac{5+1}{2}\right)} = x_{\left(\frac{6}{2}\right)} = x_{(3)} = 89$$

Thus, the median is 89, which is the 3rd observation of the ordered data.

4.1.3. Mode

The mode is defined as the value which occurs the greatest number of times, or the value with the greatest frequency. It is an appropriate measure for nominal or categorical data.

Note: If all observations occur with equal frequency, then there is no modal value for the data set.

Example 6. The CMUCAT scores of a sample of 5 students who joined the university during the first semester of SY 2020-2021 were found to be 78, 90, 89, 95, and 88. Find the mode of the CMUCAT scores.

Solution: Since the observations occur with equal frequency, there is no modal value for the data set.

Example 7. The number of hours spent by 12 students in studying their Statistics lesson before an exam were recorded as follows: 9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17. Find the mode.

Solution: The mode is 11 hours since it occurs four times, while the other observations occur only once or twice.
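
The positional rule for the median and the frequency rule for the mode can be sketched in code. The following minimal Python sketch checks Examples 4 to 7; the function names `median` and `modes` are illustrative. Returning a list from `modes` (empty when every value is equally frequent) is an assumption made here so that the "no modal value" case can be represented.

```python
from collections import Counter


def median(data):
    """Middle value of the ordered data (mean of the two middle values if the count is even)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # position (n + 1) / 2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2   # positions n/2 and n/2 + 1


def modes(data):
    """Values with the greatest frequency; empty list if all values are equally frequent."""
    counts = Counter(data)
    highest = max(counts.values())
    if all(c == highest for c in counts.values()):
        return []                                  # equal frequencies: no modal value
    return sorted(v for v, c in counts.items() if c == highest)


ages = [18, 17, 23, 20, 19, 18, 21, 22]                   # Example 4
scores = [78, 90, 89, 95, 88]                             # Examples 5 and 6
hours = [9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17]   # Example 7

print(median(ages))     # 19.5
print(median(scores))   # 89
print(modes(scores))    # []
print(modes(hours))     # [11]
```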

4.2. MEASURES OF DISPERSION/VARIATION

A measure of dispersion is a numerical value, computed from the given observations, that measures how the data spread from the central location. It is often used in comparing two sets of data. The smaller the measure, the closer the values of the observations are to the central value. The common measures of dispersion/variation are the range, variance, and standard deviation.

4.2.1. Range

The range is the difference between the highest value (HV) and the lowest value (LV):

$$R = HV - LV$$

Example 8. The CMUCAT scores of a sample of 5 students who joined the university during the first semester of SY 2020-2021 were found to be 78, 90, 89, 95, and 88. Find the range of the CMUCAT scores.

Solution: The highest CMUCAT score is 95 and the lowest CMUCAT score is 78; hence the range is 17, that is, $R = 95 - 78 = 17$.

Example 9. The number of hours spent by 12 students in studying their Statistics lesson before an exam were recorded as follows: 9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17. Find the range of the number of hours spent by the 12 students.

Solution: The highest value is 17 and the lowest value is 9; hence the range is 8, that is, $R = 17 - 9 = 8$.
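
The range formula translates directly into code. This short Python sketch checks Examples 8 and 9; the name `data_range` is illustrative.

```python
def data_range(data):
    """Range: highest value minus lowest value (R = HV - LV)."""
    return max(data) - min(data)


scores = [78, 90, 89, 95, 88]                             # Example 8
hours = [9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17]   # Example 9

print(data_range(scores))  # 17
print(data_range(hours))   # 8
```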

4.2.2. Variance

Variance is another measure of variation which can be used instead of the range. The variance considers the deviation of each observation from the mean. The computing formulas are defined below.

➢ Population Variance

$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} \quad\text{or}\quad \sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N}$$

where
$\sigma^2$ – population variance
$\mu$ – population mean
$N$ – population size or total number of observations
$x_i$ – set of data or observations

➢ Sample Variance

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} \quad\text{or}\quad s^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$$

where
$s^2$ – sample variance
$\bar{x}$ – sample mean
$n$ – sample size or total number of observations
$x_i$ – set of data or observations

Example 10. Refer to Example 1 and compute the variance.

Solution: The computed mean ($\mu$) was 12.67 and the number of observations is $N = 12$. Since the data in Example 1 were treated as population data, we use the formula for the population variance, that is,

$$\begin{aligned}
\sigma^2 &= \frac{\sum_{i=1}^{12}(x_i - \mu)^2}{12} = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \dots + (x_{12} - \mu)^2}{12} \\
&= \frac{(9 - 12.67)^2 + (11 - 12.67)^2 + (16 - 12.67)^2 + \dots + (17 - 12.67)^2}{12} \\
&= \frac{(-3.67)^2 + (-1.67)^2 + (3.33)^2 + \dots + (4.33)^2}{12} \approx 6.56
\end{aligned}$$

The population variance ($\sigma^2$) is 6.56.

Example 11. Refer to Example 2 and compute the sample variance.

Solution: From Example 2, $\bar{x} = 88$ and $n = 5$; hence the sample variance is

$$\begin{aligned}
s^2 &= \frac{\sum_{i=1}^{5}(x_i - \bar{x})^2}{5 - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + (x_4 - \bar{x})^2 + (x_5 - \bar{x})^2}{5 - 1} \\
&= \frac{(78 - 88)^2 + (90 - 88)^2 + (89 - 88)^2 + (95 - 88)^2 + (88 - 88)^2}{5 - 1} \\
&= \frac{(-10)^2 + (2)^2 + (1)^2 + (7)^2 + (0)^2}{4} = \frac{100 + 4 + 1 + 49 + 0}{4} = \frac{154}{4} = 38.5
\end{aligned}$$

The sample variance ($s^2$) is 38.5.
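
The two variance formulas differ only in the divisor (N for a population, n − 1 for a sample). The following minimal Python sketch checks Examples 10 and 11 and also evaluates the computational ("or") form of the sample variance; all function names are illustrative.

```python
def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Computational form: [n * sum(x^2) - (sum x)^2] / [n (n - 1)]."""
    n = len(data)
    return (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))


hours = [9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17]   # Example 10
scores = [78, 90, 89, 95, 88]                             # Example 11

print(round(population_variance(hours), 2))  # 6.56
print(sample_variance(scores))               # 38.5
print(sample_variance_shortcut(scores))      # 38.5
```

Both forms of the sample variance give the same value; the computational form avoids computing the mean first.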

4.2.3. Standard Deviation

The standard deviation, $\sigma$ for a population or $s$ for a sample, is the positive square root of the variance.

➢ Population Standard Deviation

$$\sigma = +\sqrt{\sigma^2}$$

➢ Sample Standard Deviation

$$s = +\sqrt{s^2}$$

Note: A smaller standard deviation indicates that the values in the data set tend to be closer to the mean.

Example 12. Refer to Example 10 and compute the standard deviation.

Solution: Given that $\sigma^2 = 6.56$, hence $\sigma = \sqrt{6.56} \approx 2.56$. The population standard deviation ($\sigma$) is 2.56.

Example 13. Refer to Example 11 and compute the standard deviation.

Solution: Given that $s^2 = 38.5$, hence $s = \sqrt{38.5} \approx 6.2$. The sample standard deviation ($s$) is 6.2.
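
Since the standard deviation is just the positive square root of the variance, Examples 12 and 13 can be checked by extending the variance computation. A minimal Python sketch follows; the function names are illustrative, and the variances are recomputed from the raw data rather than from the rounded values.

```python
import math


def population_std(data):
    """Positive square root of the population variance (divisor N)."""
    n = len(data)
    mu = sum(data) / n
    return math.sqrt(sum((x - mu) ** 2 for x in data) / n)


def sample_std(data):
    """Positive square root of the sample variance (divisor n - 1)."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


hours = [9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17]   # Example 12
scores = [78, 90, 89, 95, 88]                             # Example 13

print(round(population_std(hours), 2))  # 2.56
print(round(sample_std(scores), 2))     # 6.2
```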

4.3. LINEAR CORRELATION AND SIMPLE LINEAR REGRESSION

4.3.1. Linear Correlation

Correlation analysis attempts to measure the strength of the relationship between two random variables by means of a single number called the correlation coefficient. It is concerned only with the strength of the relationship, and no causal effect is implied. The estimated sample correlation coefficient, denoted by $r$, is given by:

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$

where $n$ is the sample size.

The Sample Pearson Correlation Coefficient can be interpreted in the following manner:
1. The value of r ranges from -1 to +1. If r = +1 or r = -1, there is a perfect linear relationship and all points lie on a straight line.
2. An r close to +1 indicates a high positive linear relationship between the two variables X and Y, that is, if the value of X increases then the value of Y also increases.
3. An r close to -1 indicates a high negative linear relationship between the two variables, that is, the value of X decreases as the value of Y increases.
4. An r near 0 means that there is a lack of linearity between the two variables, or there is no linear relationship between them. This does not mean they are not associated at all, because the relationship may be nonlinear.

A scatter diagram is a graphical presentation of the independent variable (plotted on the horizontal axis) and the dependent variable (plotted on the vertical axis). This diagram is the easiest way to determine whether a relationship exists between the two variables.
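
The correlation formula and its interpretation rules can be illustrated with a short computation. The following is a minimal Python sketch of the raw-sum formula for r; the function name `pearson_r` and the three small data sets are hypothetical, chosen only to show r = +1, r = -1, and r = 0.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient computed from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, used only to illustrate the interpretation rules
x = [1, 2, 3, 4]
r_rising = pearson_r(x, [2, 4, 6, 8])    # points on a rising straight line
r_falling = pearson_r(x, [8, 6, 4, 2])   # points on a falling straight line
r_curved = pearson_r(x, [1, 3, 3, 1])    # symmetric curve, no linear trend

print(r_rising, r_falling, r_curved)
```

The third data set follows a clear curved pattern, yet its linear correlation is zero, which matches interpretation rule 4.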

The figures below are scatter diagrams showing the different types of linear relationships.

Figure 1. Direct Linear Relationship

Figure 2. Inverse Linear Relationship

Note: The correlation coefficient remains high in absolute value ($r \approx \pm 1$) when the points cluster fairly closely around a straight line (Figure 1 and Figure 2).

Figure 3. No Linear Relationship

Figure 4. No Linear Relationship

Note:
• In Figure 3, the coefficient r becomes smaller as the points cluster less closely around a line, and it becomes virtually zero when the distribution shows randomness.
• Figure 4 shows a neat curvilinear relationship between the variables, and it can be verified that its linear correlation coefficient will be low or near 0.

The Sample Coefficient of Determination, $r^2$, is a number that gives the proportion of the total variation in the values of the variable Y that can be accounted for or explained by the linear relationship with the values of the variable X. It is usually expressed as a percentage. For example, if the correlation coefficient, r, is 0.60, then $r^2 = (0.60)^2 = 0.36 = 36\%$. This means that 36% of the total variation of Y can be explained by its linear relationship with X.

4.3.2. Simple Linear Regression

Regression analysis is a statistical method which makes use of the relationship between two or more quantitative variables so that one variable, called the dependent variable or response variable, can be predicted with the knowledge of the values of the other variable, called the independent variable or explanatory variable. A mathematical equation that allows us to predict values of one dependent variable from known values of one or more independent variables is called a regression equation.

$$Y = a + bX$$

Regression analysis deals with finding estimates of the constants a and b so that, once estimates of the constants are found, a value of Y can be predicted from a known value of X through the regression equation

$$\hat{Y} = \hat{a} + \hat{b}X$$

where
$\hat{Y}$ – is the predicted value of the dependent variable;
$X$ – is the independent variable;
$\hat{a}$ – is the least squares estimate of the parameter $a$; and
$\hat{b}$ – is the least squares estimate of the parameter $b$.

Assumptions on Regression Analysis
i. The values of the independent variable X may be “fixed”, that is, selected in advance by the researcher, in which case X is not a random variable; or they may be obtained without the imposition of any restriction.
ii. The values of X are measured without error.
iii. The dependent variable Y, given different values of the independent variable X, is normally distributed.
iv. The variances of the dependent variable Y, given different values of the independent variable X, are equal.
Note: The condition in iv is known as homoscedasticity.

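Estimation of Parameters

Given the sample $\{(x_i, y_i),\ i = 1, 2, 3, \dots, n\}$, the least squares estimates of the parameters in the regression line are:

$$\hat{b} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \quad\text{and}\quad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$

where $\hat{b}$ is the regression coefficient or the slope of the regression line and $\hat{a}$ is the constant of regression or the y-intercept of the regression line. Moreover,

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad\text{and}\quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

are the means of the sample values of Y and X, respectively.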

Example 14. A person’s muscle mass is expected to decrease with age. To explore this relationship, a researcher randomly selected 10 persons aged 40 to 79 years and measured their muscle mass (unit). The results are as follows:

X (age)          71  64  43  67  56  73  68  56  76  65
Y (muscle mass)  82  91 100  68  87  73  78  80  65  84

Based on the given data, do the following:
a. Plot the scatter diagram of the given data.
b. Find the sample coefficient of determination, $r^2$, and interpret the result.
c. Obtain the regression line equation.
d. Estimate the muscle mass when the age of the person is 60 years.

Solution:
a. The scatter diagram of the given data.

[Scatter diagram: Muscle Mass (vertical axis) versus Age of a Person (horizontal axis)]

A decreasing trend is observed, indicating a negative relationship between X and Y.

b. To solve for $r^2$, we have the following given values and computations:

$n = 10$;
$x_1 = 71,\ x_2 = 64,\ x_3 = 43,\ x_4 = 67,\ x_5 = 56,\ x_6 = 73,\ x_7 = 68,\ x_8 = 56,\ x_9 = 76,\ x_{10} = 65$;
$y_1 = 82,\ y_2 = 91,\ y_3 = 100,\ y_4 = 68,\ y_5 = 87,\ y_6 = 73,\ y_7 = 78,\ y_8 = 80,\ y_9 = 65,\ y_{10} = 84$;

$$\sum_{i=1}^{10} x_i = x_1 + x_2 + \dots + x_{10} = 71 + 64 + \dots + 65 = 639;$$

$$\sum_{i=1}^{10} y_i = y_1 + y_2 + \dots + y_{10} = 82 + 91 + \dots + 84 = 808;$$

$$\sum_{i=1}^{10} x_i y_i = x_1 y_1 + x_2 y_2 + \dots + x_{10} y_{10} = 71(82) + 64(91) + \dots + 65(84) = 50887;$$
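
The sums above, and the quantities asked for in parts b to d, can be checked numerically. The following is a minimal Python sketch, assuming the raw-sum formula for r from Section 4.3.1 and the usual least-squares estimates (slope = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²], intercept = ȳ − slope · x̄). The variable names are illustrative.

```python
import math

age = [71, 64, 43, 67, 56, 73, 68, 56, 76, 65]       # X
muscle = [82, 91, 100, 68, 87, 73, 78, 80, 65, 84]   # Y

n = len(age)
sum_x, sum_y = sum(age), sum(muscle)                 # 639 and 808
sum_xy = sum(x * y for x, y in zip(age, muscle))     # 50887
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in muscle)

# Part b: correlation coefficient and coefficient of determination
sxy = n * sum_xy - sum_x * sum_y
sxx = n * sum_x2 - sum_x ** 2
syy = n * sum_y2 - sum_y ** 2
r = sxy / math.sqrt(sxx * syy)
r2 = r ** 2

# Part c: least-squares slope and intercept (standard formulas, assumed here)
b_hat = sxy / sxx
a_hat = sum_y / n - b_hat * sum_x / n

# Part d: predicted muscle mass at age 60
y_hat_60 = a_hat + b_hat * 60

print(f"r = {r:.4f}, r^2 = {r2:.4f}")
print(f"Y-hat = {a_hat:.4f} + ({b_hat:.4f}) X")
print(f"Predicted muscle mass at age 60: {y_hat_60:.2f}")
```

A negative slope here is consistent with the decreasing trend seen in the scatter diagram.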

