COR10 - Probability AND Statistics Module 4 PDF

Title COR10 - Probability AND Statistics Module 4
Author Mamaguila
Course Probability and statistics module 4
Institution Malayan Colleges Mindanao
Pages 30
File Size 1.3 MB
File Type PDF


REMOTE LEARNING MODULE
COR 10 – PROBABILITY AND STATISTICS
MODULE 4

STUDENT NAME: ____________________________________________________
ADDRESS: _________________________________________________________
CONTACT NUMBER: _____________________ EMAIL: __________________
GRADE LEVEL: __________________ STRAND: ________________
SUBJECT TEACHER: ________________________________________________
CLASS ADVISER: ____________________________________________________
SCHOOL YEAR: ______________________ SEMESTER: _____________

Contents

BENCHMARK TEST
Your Journey
Your Objectives
1 Scatterplots
   1.1 Scatterplots
2 Pearson Correlation Coefficient
   2.1 Pearson Correlation Coefficient
3 Simple Linear Regression
   3.1 Estimating the Parameters a and b
   3.2 Interpreting the Regression Equation
   3.3 Prediction using Simple Linear Regression
Summative Test

BENCHMARK TEST

Multiple Choice

Instructions: Answer the following questions. Write your answer on the space provided before the number.

1. Based on the scatterplot given below, what do you think is the level of relationship between the two variables?
   a) Weak Negative Correlation
   b) Weak Positive Correlation
   c) Very Strong Positive Correlation
   d) Very Strong Negative Correlation

2. Based on the scatterplot given below, what do you think is the level of relationship between the two variables?
   a) Weak Negative Correlation
   b) Weak Positive Correlation
   c) Very Strong Positive Correlation
   d) Very Strong Negative Correlation

3. Based on the scatterplot given below, what do you think is the level of relationship between the two variables?
   a) Weak Negative Correlation
   b) Weak Positive Correlation
   c) Very Strong Positive Correlation
   d) Very Strong Negative Correlation

4. Based on the scatterplot given below, what do you think is the level of relationship between the two variables?
   a) around −0.85
   b) around −0.50
   c) around 0.50
   d) around 0.85

5. A researcher wants to predict the value of foreign direct investment based on the political stability of a country. Which of the statements below are true?
   I. Foreign direct investment is the dependent variable, whereas the political stability of the country is the independent variable.
   II. The scatterplot should be checked first before doing a regression analysis.
   a) I only
   b) II only
   c) Both are true.
   d) Both are false.

6. A teacher wants to use the result of the pre-test examination (PTE) he administered to his students to predict their grade at the end of the quarter (QG) using the regression equation
   QG = 60 + 2.8 PTE
   Which of the following statements is true?
   a) For every one-point increase in the grade of the student at the end of the quarter, the pre-test examination score will increase by 2.8 points.
   b) The correlation between the two variables is negative.
   c) Interpreting the intercept will not make sense.
   d) If the score of the student in the pre-test examination is 12, then his quarterly grade is about 94, on average.

7. A marketing specialist would like to predict the revenue of their company in millions (Revenue) based on the budget allotted for advertising in millions (Ads) using the regression equation
   Revenue = 125 + 1.2 Ads
   Which of the following statements is true?
   a) The dependent variable is revenue, whereas the independent variable is the budget allotted for advertising.
   b) For every 1 increase in the budget allotted for advertising, the revenue will increase by 1.2.
   c) Interpreting the intercept will not make sense.
   d) The correlation between the two variables is a perfect positive linear correlation.

8. A social scientist formulated the regression equation below to help her predict one's self-esteem score (ranging from 1 to 5) using social media usage (SocMed) in hours/day.
   Self-esteem = 3.32 − 0.75 SocMed
   Which of the following statements is true?
   a) If you want to increase your self-esteem, you may reduce your social media usage per day.
   b) The more you use social media, the more likely it is that you have a higher self-esteem score.
   c) Social media usage causes one's self-esteem to go down.
   d) If a person does not use social media, his/her self-esteem score is 0.75 points.

9. Which among the statements is true?
   a) Perfect positive correlation means that we are certain that the change in y was caused by the change in x.
   b) A negative correlation means that if one variable increases, the other will surely decrease.
   c) A positive correlation means that if one variable increases, the other will surely increase.
   d) A correlation coefficient of 0.25 does not indicate a linear relationship five times as strong as that of 0.05.

10. Which among the statements is true?
   a) When we use a regression equation to predict a variable, the prediction is expected to be accurate.
   b) If the scatterplot shows a decreasing pattern, then we are to expect a negative slope.
   c) If the scatterplot shows an increasing pattern, then we are to expect a positive intercept.
   d) If the intercept and the slope of a regression equation are zero, then the Pearson correlation coefficient is expected to be −1.

Your Journey

In this module, you will learn how to analyze bivariate data, interpret the relationship it shows, and create a regression equation from it. Lesson 1 focuses on the initial analysis: checking the scatterplot of a bivariate dataset. Lesson 2 discusses how to calculate the Pearson correlation coefficient, a standardized measure of the linear relationship between two continuous variables, using summation notation. Finally, Lesson 3 discusses how to create a simple linear regression equation from a bivariate dataset. These lessons will be very useful when you reach APP5 - Practical Research 2, for which this subject is a prerequisite.


Your Objectives

Performance Standards: The learner is able to perform correlation and regression analyses on real-life problems in different disciplines.

Content Standards: The learner demonstrates understanding of key concepts of correlation and regression analyses.

Learning Competencies: At the end of the module, the learner should be able to:
• illustrate the nature of bivariate data;
• construct a scatterplot;
• describe shape (form), trend (direction), and variation (strength) based on a scatterplot;
• calculate the Pearson sample correlation coefficient;
• solve problems involving correlation analysis;
• identify the independent and dependent variables;
• calculate the slope and y-intercept of the regression line;
• interpret the calculated slope and y-intercept of the regression line;
• solve problems involving regression analysis; and
• predict the value of the dependent variable given the value of the independent variable.


Lesson 1: Scatterplots

Learning Outcomes

At the end of the lesson, the students will be able to:
• illustrate the nature of bivariate data;
• construct a scatterplot; and
• describe shape (form), trend (direction), and variation (strength) based on a scatterplot.

Introduction

Many applications of inferential statistics are much more complex than the methods presented in Module 3. Often, you will want to use sample data to investigate the relationships among variables and, ultimately, to create a model for some variable (e.g., IQ, General Weighted Average, or income) that can be used to predict its value in the future. The process of finding a mathematical model (an equation) that best fits the data is part of a statistical technique known as regression analysis.

Lesson Proper

1.1 Scatterplots

In statistics, bivariate data are data on two variables, where each value of one variable is paired with a value of the other variable. Typically, it is of interest to investigate the possible association between the two variables. For two quantitative variables (interval or ratio in level of measurement), a scatterplot can be used to display the association, and a correlation coefficient or regression model can be used to quantify it.

Graphs condense large amounts of information into easy-to-understand formats that clearly and effectively communicate important points. For this reason, when we want to investigate the relationship between two variables, the first thing we must do is inspect the behavior of the variables using a scatterplot. Scatterplots use dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axes indicates the values for an individual data point. Scatterplots are used to observe relationships between variables.

Generating a scatterplot is just like plotting coordinates in a Cartesian plane. We usually place one variable (the independent variable) on the x-axis and the other variable (the dependent variable) on the y-axis. We use the value of the independent variable as the x-coordinate and the value of the dependent variable as the y-coordinate.

The following plots summarize the possible patterns that you may observe when looking at scatterplots. Scatterplot (a) describes a highly positive linear relationship, which means both variables move in the same direction: as x increases, y tends to increase as well. Scatterplot (b) describes no correlation, which means the change in x has nothing to do with the change in y: if x increases, y may move independently. Scatterplot (c) describes a highly negative linear relationship, which means the variables move in opposite directions: as x increases, y tends to decrease. Finally, scatterplot (d) describes a nonlinear relationship.

Figure 1.1: Possible Patterns of Scatterplots, panels (a) to (d)

Example 1.1.1 For instance, suppose a teacher wants to see the relationship between the grades of her 5 students in Mathematics and Physics. The dataset is given below.

Table 1.1: Grades of the Students in Mathematics and Physics

Student   Score in Math   Score in Physics
1         84              85
2         89              90
3         81              79
4         79              75
5         92              95

The graph of the dataset is given below:

Figure 1.2: Scatterplot of the Grades of the Students (horizontal axis: Mathematics; vertical axis: Physics; each dot is labeled with the student number, 1 to 5)

The numbers in the dots, each of which represents a student, are not displayed in an actual scatterplot. However, for the sake of discussion, we add these labels. Moreover, labeling the axes (Mathematics and Physics) is very important when generating a scatterplot. Finally, adding the title of the scatterplot is also important. Looking at the scatterplot, we can see a linear relationship between the two variables. In addition, we can see that students with higher grades in Mathematics tend to have higher grades in Physics too, whereas students with relatively lower grades in Mathematics tend to have lower grades in Physics too. This suggests that the two variables have a positive (a higher grade in Mathematics corresponds to a higher grade in Physics) linear (we can draw a straight line, as seen in the plot, to estimate the pattern) relationship.
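The reading of Example 1.1.1 can be checked with a short Python sketch. It is not part of the module: the variable names and the helper `plot_scatter` are illustrative, and the plotting helper assumes the third-party matplotlib package is installed. The first part only pairs the grades from Table 1.1 and checks whether Physics grades rise as Mathematics grades rise.

```python
# Grades from Table 1.1 (Mathematics is x, Physics is y).
math_scores = [84, 89, 81, 79, 92]
physics_scores = [85, 90, 79, 75, 95]

# Each student contributes one (x, y) point to the scatterplot.
points = list(zip(math_scores, physics_scores))

# Sort the points by the x-value and read off the y-values in that order.
points_by_math = sorted(points)
physics_in_math_order = [y for _, y in points_by_math]

# If the y-values also rise from left to right, the trend is increasing.
is_increasing = all(
    earlier < later
    for earlier, later in zip(physics_in_math_order, physics_in_math_order[1:])
)


def plot_scatter(x, y, x_label, y_label, title):
    """Draw a labeled scatterplot (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt  # third-party; imported only when plotting

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel(x_label)   # label both axes, as the lesson recommends
    ax.set_ylabel(y_label)
    ax.set_title(title)      # and give the plot a title
    plt.show()


print(points_by_math)
print(is_increasing)
```

If matplotlib is available, calling `plot_scatter(math_scores, physics_scores, "Mathematics", "Physics", "Grades of the Students")` should draw a plot similar to Figure 1.2, without the student labels.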


Performance Check 1

Instruction: Draw the scatterplot and describe the shape, trend, and variation (if it exists) of the given dataset.

1. Let x be a random variable that represents the percentage of successful free throws a professional basketball player makes in a season. Let y be a random variable that represents the percentage of successful field goals a professional basketball player makes in a season.

x   67   65   75   86   73   73
y   44   42   48   51   44   51

Scatterplot: (10 points)

[Blank axes for the scatterplot: x from 65 to 90 on the horizontal axis, y from 42 to 52 on the vertical axis]

Description of the Scatterplot: (5 points)

2. The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day. Here are their figures for the last 12 days:

Day         1      2      3      4      5      6      7      8      9      10     11     12
Temp (°C)   14.2   16.4   11.9   15.2   18.5   22.1   19.4   25.1   23.4   18.1   22.6   17.2
Sales ($)   215    325    185    332    406    522    412    614    544    421    445    408

Scatterplot: (10 points)

[Blank axes for the scatterplot: Temp from 10 to 25 on the horizontal axis, Sales from 200 to 600 on the vertical axis]

Description of the Scatterplot: (5 points)

Lesson 2: Pearson Correlation Coefficient

Learning Outcomes

At the end of the lesson, the students will be able to:
• calculate the Pearson sample correlation coefficient; and
• solve problems involving correlation analysis.

Introduction

In the previous lesson, we discussed how to analyze and describe the relationship in bivariate data using scatterplots. In this lesson, we will discuss how to measure this relationship using the Pearson correlation coefficient, also known as Pearson r. This statistic is one of a large number of coefficients used to describe a bivariate relationship, and it is the most frequently encountered one.

Lesson Proper

2.1 Pearson Correlation Coefficient

Most correlation coefficients tell us two things. First, we have an indication of the magnitude of the relationship. It is worth noting at this point that a correlation of −0.88 has the same magnitude as one of +0.88; the sign has nothing to do with the magnitude of the relationship, but it does tell us the second thing, the direction of the relationship. When two variables are positively related, as one increases, the other tends to increase as well. For instance, intelligence test scores are positively related to academic grades: in general, the higher the intelligence test scores, the higher the grades received in school. Other variables are inversely related, by which we mean that as one increases, the other tends to decrease. As an example, think for a moment of the relationship between the speed of a car in high gear and the number of miles obtained from a gallon of gasoline: the faster one drives, the worse the gasoline mileage. The absence of any relationship between variables is denoted by a correlation coefficient of 0.00 or thereabouts.

Definition 2.1.1 Pearson Correlation Coefficient. The measure of linear relationship between two variables X and Y is estimated by the sample correlation coefficient r, where

$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$
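The formula in Definition 2.1.1 translates almost term by term into code. The following is a minimal Python sketch, not part of the module: the function name `pearson_r` and the sample points are made up for illustration, and the function simply refuses to compute r when the denominator is zero.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r via the summation formula."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Numerator: n * sum(xy) - sum(x) * sum(y)
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: square root of the product of the two bracketed terms
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


# Made-up points lying exactly on the line y = 2x, for illustration only.
r_line = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r_line)
```

Because the made-up points lie exactly on an increasing straight line, the sketch should report a perfect positive correlation.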

The value of r lies between −1 and 1, inclusive. Therefore, we can make a standardized interpretation of the value of r. The table below shows how to interpret the Pearson correlation coefficient. All you need to do is look for the interval in which your correlation coefficient falls.

Table 2.1: Table of Interpretation for Pearson r

Correlation Coefficient   Degree of Correlation
r = 1.00                  Perfect Positive Correlation
0.80 ≤ r < 1.00           Very Strong Positive Correlation
0.60 ≤ r < 0.80           Strong Positive Correlation
0.40 ≤ r < 0.60           Moderate Positive Correlation
0.20 ≤ r < 0.40           Weak Positive Correlation
0.00 < r < 0.20           Very Weak Positive Correlation
r = 0                     No Correlation
−0.20 < r < 0.00          Very Weak Negative Correlation
−0.40 < r ≤ −0.20         Weak Negative Correlation
−0.60 < r ≤ −0.40         Moderate Negative Correlation
−0.80 < r ≤ −0.60         Strong Negative Correlation
−1.00 < r ≤ −0.80         Very Strong Negative Correlation
r = −1.00                 Perfect Negative Correlation
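The lookup in Table 2.1 can be sketched as a small Python function. This is an illustration only, not part of the module: the name `describe_correlation` is hypothetical, and the sketch assumes the negative rows mirror the positive ones, so that the sign of r gives the direction and its absolute value gives the strength.

```python
def describe_correlation(r):
    """Return the Table 2.1 label for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1, inclusive")
    if r == 0:
        return "No Correlation"

    direction = "Positive" if r > 0 else "Negative"
    size = abs(r)  # the sign gives direction; the magnitude gives strength
    if size == 1:
        strength = "Perfect"
    elif size >= 0.80:
        strength = "Very Strong"
    elif size >= 0.60:
        strength = "Strong"
    elif size >= 0.40:
        strength = "Moderate"
    elif size >= 0.20:
        strength = "Weak"
    else:
        strength = "Very Weak"
    return f"{strength} {direction} Correlation"


print(describe_correlation(0.85))
print(describe_correlation(-0.5))
```

Working with the absolute value keeps the positive and negative halves of the table consistent, so a value such as −0.5 is labeled with the same strength as +0.5.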

The correlation coefficient r must be interpreted cautiously, since a positive correlation coefficient is not proof of a cause-and-effect relationship. It is possible that another variable, called a confounding variable, is affecting both X and Y in a similar manner, or that the correlation is a spurious one. (For example, the number of astronauts dying in spacecraft is directly correlated with seatbelt use in cars: use your seatbelt and save an astronaut's life!) Moreover, values of r equal to 0.3 and 0.6 only mean that we have two positive correlations, one somewhat stronger than the other. It is wrong to conclude that r = 0.6 indicates a linear relationship twice as strong as that indicated by r = 0.3.


Example 2.1.1 A researcher wants to measure the correlation between the height (cm) and weight (g) of his rats. The following data were gathered from 6 rats available in his laboratory.

x (height)   12   10   14   11   12   9
y (weight)   18   17   23   19   20   15

Solution

Step 1: To compute the correlation coefficient, we need to create the following table to identify the following information: n = 6 (because we have six pairs of observations), ∑x, ∑y, ∑x², ∑y², and ∑xy.

x    y    x²   y²   xy
12   18
10   17
14   23
11   19
12   20
9    15

Step 2: Next, we fill in every cell in the third column by squaring the values of x. That is,

x    y    x²            y²   xy
12   18   (12)² = 144
10   17   (10)² = 100
14   23   (14)² = 196
11   19   (11)² = 121
12   20   (12)² = 144
9    15   (9)² = 81

Step 3: Then, we fill in every cell in the fourth column by squaring the values of y. That is,

x    y    x²    y²            xy
12   18   144   (18)² = 324
10   17   100   (17)² = 289
14   23   196   (23)² = 529
11   19   121   (19)² = 361
12   20   144   (20)² = 400
9    15   81    (15)² = 225

Step 4: Then, we fill in every cell in the fifth column by multiplying the values of x and y. That is,

x    y    x²    y²    xy
12   18   144   324   12 × 18 = 216
10   17   100   289   10 × 17 = 170
14   23   196   529   14 × 23 = 322
11   19   121   361   11 × 19 = 209
12   20   144   400   12 × 20 = 240
9    15   81    225   9 × 15 = 135
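Steps 2 to 4 can be reproduced in a few lines of Python as a check on the hand computation. This is a sketch for illustration; the variable names are not from the module.

```python
# Data from Example 2.1.1.
heights = [12, 10, 14, 11, 12, 9]    # x
weights = [18, 17, 23, 19, 20, 15]   # y

# One table row per rat: (x, y, x squared, y squared, x times y).
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(heights, weights)]

# Print the rows in the same column order as the table above.
for row in rows:
    print(*row, sep="\t")
```

Each printed row should match the corresponding row of the completed table in Step 4.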

Step 5: Using the table, we now get the sum of each column. That is,

x    y    x²    y²    xy
12   18   144   324   216
10   17   100   289   170
14   23   196   529...
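The column sums that Step 5 asks for, and the value of r they lead to, can be checked with a short Python sketch. It is an illustration under stated assumptions rather than the module's own worked solution: it takes the data from Example 2.1.1, applies the formula in Definition 2.1.1, and uses variable names chosen here for readability.

```python
from math import sqrt

# Data from Example 2.1.1.
heights = [12, 10, 14, 11, 12, 9]    # x
weights = [18, 17, 23, 19, 20, 15]   # y

n = len(heights)
sum_x = sum(heights)
sum_y = sum(weights)
sum_x2 = sum(x * x for x in heights)
sum_y2 = sum(y * y for y in weights)
sum_xy = sum(x * y for x, y in zip(heights, weights))

# Substitute the sums into the formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 4))
```

Comparing the printed sums with the hand-computed column totals is a quick way to catch arithmetic slips before substituting into the formula.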

