Lin Reg TTest - Lecture notes N/a PDF

Title	Lin Reg TTest - Lecture notes N/a
Author	Victoria Rendeiro
Course	Introductory Statistics
Institution	Western Kentucky University
Pages	4
File Size	824.9 KB
File Type	PDF

Dr. Neal, WKU

MATH 183

Linear Regression T-Test

Suppose we have a random sample of paired data: {{x1, y1}, {x2, y2}, . . ., {xn, yn}}. We wish to test the null hypothesis that there is no correlation between the measurements. To perform this test, we study the sample correlation coefficient r, defined by

\[ r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sqrt{\overline{x^2} - (\bar{x})^2}\,\sqrt{\overline{y^2} - (\bar{y})^2}}, \]

where a bar denotes a sample mean. The sample correlation r estimates the true correlation ρ between the measurements, which is defined by

\[ \rho = \frac{\mu_{XY} - \mu_X \mu_Y}{\sigma_X \sigma_Y}. \]

Recall: If the measurements are independent, then ρ will equal 0. If ρ ≠ 0, then the measurements will not be independent; there will be some type of dependence. So if r is not sufficiently close to 0, then we will reject the claim that ρ equals 0, and therefore conclude that there is some dependence between the measurements.

Hypothesis Test

We shall test the hypothesis H0: ρ = 0 against a one-sided alternative. To do so, we use the test statistic

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \]

which follows a t-distribution with n − 2 degrees of freedom: t(n − 2). The alternatives could be Ha: ρ > 0 or Ha: ρ < 0. We will reject H0 in favor of one of the alternatives if the corresponding right-tail or left-tail probability is too small.

Heights versus GPA. The heights and grade point averages from a random sample of 20 students gave a sample correlation of r = −0.059468. Is there significant evidence to reject the hypothesis that the true population correlation ρ is equal to 0 against the one-sided alternative ρ < 0?

Solution. We test H0: ρ = 0 vs. Ha: ρ < 0. The test statistic is

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.059468 \times \sqrt{18}}{\sqrt{1-(-0.059468)^2}} \approx -0.252749. \]

Because the alternative is ρ < 0, the rejection region is on the left side; thus the P-value for this test will be the left-tail probability from the t(n − 2) = t(18) distribution.
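For readers without a TI calculator, the same computation can be sketched in Python. This is only an illustration and not part of the original notes: the function names `t_pdf`, `t_cdf`, and `corr_t_statistic` are made up for this example, and the left-tail probability is approximated with Simpson's rule on the t density rather than with the calculator's built-in routine.

```python
import math

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) using Simpson's rule on [0, |t|] plus symmetry.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def corr_t_statistic(r, n):
    """Test statistic r * sqrt(n - 2) / sqrt(1 - r^2) for H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Heights versus GPA example: r = -0.059468 from n = 20 students.
r, n = -0.059468, 20
t = corr_t_statistic(r, n)
p_left = t_cdf(t, n - 2)  # left-tail P-value for Ha: rho < 0

print("t statistic:", round(t, 6))
print("left-tail P-value:", round(p_left, 4))
```

For the alternative Ha: ρ > 0, the right-tail probability would be `1 - t_cdf(t, n - 2)` instead.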


Using the tcdf( command from the DISTR menu, we compute the P-value with the command tcdf(-1E99, -0.252749, 18) to obtain approximately 0.4017. With such a large P-value, we do not have significant evidence to reject that ρ = 0. If ρ = 0 were true, then there would be about a 40.17% chance of obtaining a sample correlation of r = −0.059468 or lower with a sample of size 20.

Beer and blood alcohol. Sixteen volunteers at Ohio State University drank a randomly assigned number of beers. Thirty minutes later, a police officer measured their blood alcohol content (BAC). Here are the data:

Student   1     2     3     4     5     6      7     8
Beers     5     2     9     8     3     7      3     5
BAC       0.10  0.03  0.19  0.12  0.04  0.095  0.07  0.06

Student   9     10    11    12    13     14    15    16
Beers     3     5     4     6     5      7     1     4
BAC       0.02  0.05  0.07  0.10  0.085  0.09  0.01  0.05

Test the hypothesis that the number of beers has no effect on blood alcohol versus a one-sided alternative that more beers increase the BAC.

Solution. To perform the test on the TI-83/84, we first enter the data into lists. We will use list L4 for the number of beers (to be plotted on the x-axis), and use list L5 for the BAC (to be plotted on the y-axis). After entering the data, we adjust STAT PLOT settings, and press ZOOM 9.

From the scatterplot, we observe the general tendency of an increase in BAC as the number of beers increases. So we have some visual evidence that ρ > 0, which we will use as our alternative hypothesis. We now apply the LinRegTTest feature on lists L4 and L5 with the alternative set to >0 and the RegEq set to function Y1. (To set Y1, press VARS, Y-VARS, Function, Y1.) Upon doing so, we obtain the least-squares regression line as y ≈ −0.0127 + 0.01796376x. (This feature puts the line in the form y = a + bx.) The correlation is found to be r ≈ 0.894338, which yields r² ≈ 0.79984.
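The quantities reported by LinRegTTest can be cross-checked by hand from the beer and BAC data. The following Python sketch is an illustration only, not the calculator's algorithm: it applies the mean-based formula for r given at the start of these notes, and the helper names `mean` and `linreg_summary` are hypothetical.

```python
import math

# Beer and BAC data for the 16 students, in student order.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

def mean(values):
    return sum(values) / len(values)

def linreg_summary(x, y):
    """Return (a, b, r, t) for the line y = a + b*x and the test of rho = 0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    xy_bar = mean([xi * yi for xi, yi in zip(x, y)])
    x2_bar = mean([xi * xi for xi in x])
    y2_bar = mean([yi * yi for yi in y])
    cov = xy_bar - x_bar * y_bar      # numerator of r
    var_x = x2_bar - x_bar ** 2
    var_y = y2_bar - y_bar ** 2
    r = cov / math.sqrt(var_x * var_y)
    b = cov / var_x                   # least-squares slope
    a = y_bar - b * x_bar             # least-squares intercept
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return a, b, r, t

a, b, r, t = linreg_summary(beers, bac)
print("line: y =", round(a, 4), "+", round(b, 8), "x")
print("r =", round(r, 6), " r^2 =", round(r * r, 5), " t =", round(t, 2))
```

The value of t computed here is the test statistic on n − 2 = 14 degrees of freedom that LinRegTTest uses for the hypothesis test.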


We now test the hypotheses H0: ρ = 0, Ha: ρ > 0. From the results of the LinRegTTest, we obtain a test statistic of about 7.48 with a P-value of about 1.4847 × 10^−6. This very low P-value gives us evidence to reject H0 in favor of the alternative. If ρ = 0 were true, then there would be almost no chance of obtaining an r of 0.894338 or higher from a sample of size 16. Because we conclude ρ > 0, we also can conclude that there is dependence between the number of beers consumed and BAC.

Some Formal Details

1. The hypothesis test formally assumes that one of the sets of measurements X or Y is normally distributed. In this case, the test statistic truly follows a t(n − 2) distribution. However, even if neither population is normally distributed, the test is relatively robust for large samples. That is, it still gives experimentally accurate results.

2. If ρ = 0, can we infer that the two measurements are independent of each other? In general, the answer is “No.” It is possible that the correlation is 0 while the measurements display some dependence other than linear dependence. However, if the paired measurements are bivariate normal, then ρ = 0 does imply independence. Bivariate normal means:
(i) Each set of measurements is normally distributed.
(ii) For each fixed X measurement, the Y measurements are still normally distributed.
(iii) For each fixed Y measurement, the X measurements are still normally distributed.

For example, with heights and grade point averages, both measurements are usually normally distributed. Moreover, for students with any fixed GPA, say 3.25, their heights are still normally distributed; for students of a fixed height, say 5'10", their GPAs are still normally distributed. So if we accept ρ = 0 in this case, then we also may accept that height and GPA are independent.

Exercise

The data below give the high school GPA and Verbal SAT score of a random sample of students:

GPA   2.6   3.1   3.4   3.1   3.4   3.8
SAT   460   450   510   400   420   500

GPA   3.7   3.25  3.4   2.7   3.5   3.8
SAT   500   560   540   480   530   480

GPA   3.2   3.0   3.5   3.2   3.2   3.1
SAT   450   500   520   620   460   420

GPA   3.6   2.5   3.5   3.1   3.0   3.2
SAT   480   510   500   550   560   520

GPA   2.5   2.6   3.5   2.6   3.5   3.7
SAT   510   410   330   470   540   550

GPA   3.7   2.7   3.3   3.5   3.7   2.7
SAT   490   450   570   450   600   410

(a) Find the sample correlation and the linear regression line for Verbal SAT as a function of GPA.
(b) Test H0: ρ = 0 versus Ha: ρ > 0 at the 5% level of significance.
(c) If the data is bivariate normal, what can we conclude about independence?


Solution. To perform the test on the TI-84, first enter the data into lists. We will use list L1 for the high school GPA (to be plotted on the x-axis), and use list L2 for Verbal SAT (to be plotted on the y-axis). After entering the data, adjust STAT PLOT settings, and press ZOOM 9.

From the scatterplot, there does not seem to be any observable relationship between High School GPA and Verbal SAT score. Next we compute the linear regression line, store it in Y1 and graph.

We see that y ≈ 34.44x + 380.83474 and that r ≈ 0.2256 with r² ≈ 0.05. The low r² shows that the line is not at all a good fit of the data. The low r shows that there is not a strong linear relationship between High School GPA and Verbal SAT score. We now apply the LinRegTTest feature on lists L1 and L2 to test H0: ρ = 0 versus Ha: ρ > 0 at the 0.05 level of significance. (Upon doing so, we also obtain the least-squares regression line as well as r and r².)

We obtain a test statistic of about 1.35 with a P-value of about 0.0929, which is not less than our stated level of significance of 0.05. Therefore we do not have evidence to reject H0, and we can accept that ρ = 0. If the data is bivariate normal and ρ = 0, then we also can conclude that High School GPA and Verbal SAT scores are independent of each other.
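As a cross-check of the exercise, the whole test can be sketched in Python. This is an illustration under stated assumptions, not the calculator's method: each SAT score is assumed to be paired with the GPA listed in the same position of the data, the right-tail probability is approximated with Simpson's rule, and the helper name `right_tail` is hypothetical.

```python
import math

# Exercise data, listed block by block.
# Assumption: each SAT score is paired with the GPA in the same position.
gpa = [2.6, 3.1, 3.4, 3.1, 3.4, 3.8, 3.7, 3.25, 3.4, 2.7, 3.5, 3.8,
       3.2, 3.0, 3.5, 3.2, 3.2, 3.1, 3.6, 2.5, 3.5, 3.1, 3.0, 3.2,
       2.5, 2.6, 3.5, 2.6, 3.5, 3.7, 3.7, 2.7, 3.3, 3.5, 3.7, 2.7]
sat = [460, 450, 510, 400, 420, 500, 500, 560, 540, 480, 530, 480,
       450, 500, 520, 620, 460, 420, 480, 510, 500, 550, 560, 520,
       510, 410, 330, 470, 540, 550, 490, 450, 570, 450, 600, 410]

def right_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 on t(df) with Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3  # half the total area minus P(0 <= T <= t)

n = len(gpa)
x_bar, y_bar = sum(gpa) / n, sum(sat) / n
sxx = sum((x - x_bar) ** 2 for x in gpa)
syy = sum((y - y_bar) ** 2 for y in sat)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gpa, sat))

b = sxy / sxx                      # slope of SAT on GPA
a = y_bar - b * x_bar              # intercept
r = sxy / math.sqrt(sxx * syy)     # sample correlation
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
p_right = right_tail(t, n - 2)     # P-value for Ha: rho > 0

alpha = 0.05
print("line: y =", round(b, 2), "x +", round(a, 2))
print("r =", round(r, 4), " t =", round(t, 2), " P-value =", round(p_right, 4))
print("reject H0" if p_right < alpha else "do not reject H0")
```

The deviation-based sums used here are algebraically equivalent to the mean-based formula for r; they are chosen only because they are less sensitive to rounding when the SAT values are large.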