WEEK 09 Workshop Areesha Fatima PDF

Title WEEK 09 Workshop Areesha Fatima
Course Introduction To Statistical Reasoning
Institution Monash University

SCI1020: Introduction to Statistical Reasoning

WEEK 9: INFERENCE TECHNIQUES
COMPARING TWO MEANS: two independent sample problems

Student's Name: Areesha Fatima

Tutorial Day/Time: Tuesday 8 a.m.

PRELIMINARY READING: Moore et al, “Basic Practice of Statistics”, Ch. 21.

On completion of this workshop you should be able to:

1. Use Table C in Moore to determine critical t-values and P-values (probabilities) associated with the Student’s t-distribution,

2. Write a confidence interval (at a given level of confidence) for the difference between 2 INDEPENDENT means, σ unknown, using t-procedures,

3. Perform the necessary steps in a test of significance (called a two-sample t-test) on the difference between 2 INDEPENDENT means, σ unknown, using t-procedures.

PRELIMINARY QUESTIONS:

Q.1 State in your own words OR by a math equation what is meant by each of the terms:

Term: Independent samples
Meaning: Samples that are selected randomly so that the observations in one sample do not depend on the values of the observations in the other sample.

Term: Usual null value for the difference between two populations
Meaning: The hypotheses for a difference in two population means are like those for a difference in two population proportions. The null hypothesis, H0, is again a statement of “no effect” or “no difference”, so the usual null value is 0.

Q2 Do Qs 21.1, 21.3 & 21.4 from Moore et al, p.480 (7th ed, p.486). Read the question carefully! Inference about a mean or means: circle the correct data design in each case and explain your choice.

.1 single sample OR matched pairs OR two independent samples
Explanation: Similar individuals are paired from the same random sample, and two different measurements are taken for each individual.

.3 single sample OR matched pairs OR two independent samples
Explanation: A sample mean is compared with a known mean.

.4 single sample OR matched pairs OR two independent samples
Explanation: There are two independent random samples from two distinct groups.

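Q3 What is the formula for the standard error on the difference between two independent sample means? Define all terms. Read Moore p.483 (7th ed, p.489).

Standard error:

$$SE = \sqrt{\frac{s_B^2}{n_B} + \frac{s_A^2}{n_A}}, \qquad t = \frac{\bar{x}_B - \bar{x}_A}{SE}$$

where $\bar{x}_A$ and $\bar{x}_B$ are the sample means of the two independent samples, $s_A$ and $s_B$ are the standard deviations of the two independent samples, and $n_A$ and $n_B$ are the sample sizes.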

Copyright 2019: Monash University


TWO INDEPENDENT MEANS: t-procedures

When the population SD, σ, is NOT known, use the sample standard deviation, s.

CONFIDENCE INTERVAL = SAMPLE VALUE ± (MULTIPLIER × STANDARD ERROR)

For the difference between the means of two INDEPENDENT populations, μB − μA, use the UNPOOLED approach only:

$$(\bar{x}_B - \bar{x}_A) \pm t^* \sqrt{\frac{s_B^2}{n_B} + \frac{s_A^2}{n_A}}$$

where each $\bar{x}$ is the sample estimate of the corresponding population mean,

$t^*$ is the critical t-value (or t-multiplier) for a given level of confidence AND sample size as described by the degrees of freedom, df, conservatively approximated by (smaller n − 1). (The full df calculation is only done if using software, see Moore p.480.) $s_A$ and $s_B$ are the sample standard deviations, and $n_A$ and $n_B$ are the respective sample sizes.

TEST OF SIGNIFICANCE

Remember the 4 steps in any test of significance. The general structure of the formula for the test statistic is

TEST STATISTIC = (SAMPLE VALUE − NULL VALUE) / STANDARD ERROR

Use the UNPOOLED case only:

$$t = \frac{(\bar{x}_B - \bar{x}_A) - 0}{\sqrt{\dfrac{s_B^2}{n_B} + \dfrac{s_A^2}{n_A}}}$$

where $s_A$ and $s_B$ are the sample standard deviations, $n_A$ and $n_B$ are the respective sample sizes, and df is conservatively approximated by the smaller (n − 1).

NOTE that the null value is usually zero (no difference) unless specified otherwise. Use the Student’s t-distribution (Table C) to determine the area associated with this statistic and its degrees of freedom.

For how to read the t-distribution table given as Table C of Moore et al, refer back to the Week 10 Workshop sheet, p. 4.
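The unpooled formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the summary statistics in the example call are hypothetical, chosen to exercise the formulas, and are not taken from Moore or from the workshop data.

```python
import math


def standard_error(s_a, n_a, s_b, n_b):
    """Unpooled standard error of the difference between two sample means."""
    return math.sqrt(s_b**2 / n_b + s_a**2 / n_a)


def conservative_df(n_a, n_b):
    """Conservative degrees of freedom: the smaller sample size minus 1."""
    return min(n_a, n_b) - 1


def t_statistic(mean_a, s_a, n_a, mean_b, s_b, n_b, null_value=0.0):
    """(sample value - null value) / standard error, for the difference B - A."""
    se = standard_error(s_a, n_a, s_b, n_b)
    return ((mean_b - mean_a) - null_value) / se


# Hypothetical summary statistics (not from the workshop questions).
t = t_statistic(mean_a=10.0, s_a=2.0, n_a=16, mean_b=12.0, s_b=3.0, n_b=9)
df = conservative_df(16, 9)
print(f"t = {t:.3f} on df = {df}")
```

The resulting t and df are then looked up in Table C by hand, which is why no probability function appears in the sketch.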


WORKSHOP PROBLEMS:

Q.4 Do Q21.52 from Moore et al, p.505 (7th ed, p.514). The Excel file “Flower lengths” is on Moodle site.

Data:
                                H. caribaea (red)    H. caribaea (yellow)
Sample mean, x̄                  39.71                36.18
Sample standard deviation, s    1.80                 0.98
Sample size, n                  23                   15

Difference between sample means (Red − Yellow) = (x̄R − x̄Y) = 39.71 − 36.18 = 3.53

(conservative) df = 14, so (95%) t* = 2.145

95% confidence interval for the difference between the mean lengths of all red and yellow H. caribaea flowers is:

$$(\bar{x}_R - \bar{x}_Y) \pm t^* \sqrt{\frac{s_R^2}{n_R} + \frac{s_Y^2}{n_Y}}$$

(39.71 − 36.18) ± 2.145 × √(1.80²/23 + 0.98²/15)
= 3.53 ± (2.145)(0.453)
= 3.53 ± 0.972
= (2.558, 4.502)

Is zero a possible value for the difference between the two population means? No, because zero is not in the confidence interval.

Conclusion: Zero is not a plausible value for the difference between the two population means because it is not included in the confidence interval above. A value of zero would indicate that there is no difference between the population means. Furthermore, we are 95% confident that the mean length of the red flowers exceeds the mean length of the yellow flowers by between 2.558 mm and 4.502 mm.
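As a quick arithmetic check of the Q.4 interval, the calculation can be repeated in Python using the summary statistics from the table and the Table C value t* = 2.145 for df = 14. The variable names are illustrative only; the small difference in the third decimal place arises because the worksheet rounds the standard error to 0.453 before multiplying.

```python
import math

# Summary statistics from the Q.4 data table.
mean_red, s_red, n_red = 39.71, 1.80, 23
mean_yellow, s_yellow, n_yellow = 36.18, 0.98, 15

t_star = 2.145  # Table C, 95% confidence, conservative df = 15 - 1 = 14

diff = mean_red - mean_yellow
se = math.sqrt(s_red**2 / n_red + s_yellow**2 / n_yellow)
margin = t_star * se
lower, upper = diff - margin, diff + margin

print(f"difference = {diff:.2f}, SE = {se:.3f}, margin = {margin:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("zero inside interval:", lower <= 0 <= upper)
```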

Q.5 Do Q21.8 from Moore et al, p.490 (7th ed, p.496). Download the Excel file “LifeExp” available on Moodle. The question is “Do men and women differ in their perceptions of life expectancy?” Perform a test of significance fully at a 5% level of significance.

                                Men         Women
Sample mean, x̄                  -19.50      -12.71
Sample standard deviation, s    5.6125      5.5891
Sample size, n                  6           7

Step 1. Statement of null and alternative hypotheses

H0: μM − μW = 0 vs Ha: μM − μW ≠ 0

where µ is the mean perceived life expectancy.

Step 2a. Use appropriate plots to check the conditions that are needed to carry out this test.

Both populations are approximately normal, and both samples are random.

Step 2b. Calculation of appropriate Test Statistic:

$$t = \frac{(\bar{x}_M - \bar{x}_W) - 0}{\sqrt{\dfrac{s_M^2}{n_M} + \dfrac{s_W^2}{n_W}}} = \frac{-19.50 - (-12.71)}{\sqrt{\dfrac{5.6125^2}{6} + \dfrac{5.5891^2}{7}}} = -2.18$$

Step 3. Determination of p-value: (use Table C) Sketch the t-distribution and position of sample:

Degrees of freedom, df = 5
p-value range is: between 0.05 and 0.10

Step 4. Statistical decision and Conclusion:

Statistical decision: The p-value is more than α = 0.05, so there is not enough evidence to claim at the 5% significance level that the mean perceived life expectancy is different for the two populations (men and women).

In context of question: There is not enough evidence to conclude that men and women differ in their perceptions of life expectancy.
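The Q.5 test statistic and the Table C bracketing of the p-value can be checked with a short sketch. Two assumptions are made here: the sample means are taken as -19.50 (men) and -12.71 (women), which are the values that reproduce the worksheet's t = -2.18, and the critical values listed are the standard Table C entries for df = 5. Variable names are illustrative.

```python
import math

# Summary statistics (the negative signs on the means are an assumption
# that matches the worksheet's reported t = -2.18).
mean_m, s_m, n_m = -19.50, 5.6125, 6   # men
mean_w, s_w, n_w = -12.71, 5.5891, 7   # women

se = math.sqrt(s_m**2 / n_m + s_w**2 / n_w)
t = ((mean_m - mean_w) - 0) / se
df = min(n_m, n_w) - 1  # conservative df

# Table C row for df = 5: (upper-tail probability, critical value t*).
table_c_df5 = [(0.10, 1.476), (0.05, 2.015), (0.025, 2.571), (0.01, 3.365)]

# The one-tail area exceeds every p whose t* is larger than |t|,
# and is at most every p whose t* is not larger than |t|.
one_tail_low = max(p for p, crit in table_c_df5 if crit > abs(t))
one_tail_high = min(p for p, crit in table_c_df5 if crit <= abs(t))

# Two-sided alternative, so both bounds are doubled.
p_low, p_high = 2 * one_tail_low, 2 * one_tail_high

print(f"t = {t:.2f}, df = {df}, {p_low:.2f} < p < {p_high:.2f}")
```

Because the lower bound of the bracket is already 0.05, the decision at α = 0.05 does not depend on where in the range the exact p-value falls.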

EXTRA: The supplied answer in the question used software, and so used the full df calculation (10.68) and gave an exact p-value. The p-value was stated as 0.05281.

Did it matter in this case? No, in this case the result is still not significant at the 5% level.
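The "full df calculation" that software performs is, presumably, the Welch-Satterthwaite approximation; the worksheet does not name it, so treat that as an assumption. A minimal sketch using the Q.5 standard deviations and sample sizes gives a value of about 10.68, which lies between the conservative df of 5 and the upper limit of n1 + n2 − 2 = 11. The function name `welch_df` is hypothetical.

```python
def welch_df(s_a, n_a, s_b, n_b):
    """Welch-Satterthwaite approximate degrees of freedom for two samples."""
    v_a = s_a**2 / n_a  # estimated variance of the first sample mean
    v_b = s_b**2 / n_b  # estimated variance of the second sample mean
    return (v_a + v_b) ** 2 / (v_a**2 / (n_a - 1) + v_b**2 / (n_b - 1))


# Q.5 summary statistics: men (s = 5.6125, n = 6), women (s = 5.5891, n = 7).
df_full = welch_df(5.6125, 6, 5.5891, 7)
df_conservative = min(6, 7) - 1

print(f"full df = {df_full:.2f}, conservative df = {df_conservative}")
```

A larger df gives a slightly smaller p-value for the same t, which is why the exact p-value sits near the bottom of the range read from Table C.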

Q.6 An Excel file “Soil Penetrability” is on Moodle. (Based on Q 21.10 from 7th ed, p.499). Identify the outlier in the “intermediate soil” sample:

Here the sample size of the intermediate soil sample is n1 = 20.

On checking that measurement, the researchers found that the soil in this unit was not really of the intermediate type (but loose, sandy soil), and so that datum value can be removed from the data. What are the summary statistics of the remaining data (i.e. without the outlier in this case)? Keep 2 d.p. in the average and 4 d.p. in s for the calculations.

                                C, Compressed Soils    I, Intermediate Soils
Sample mean, x̄                  2.91                   _____
Sample standard deviation, s    0.1390                 0.2397
Sample size, n                  20                     19 (n2 = 19)

Perform a test of significance at α = 0.05 to answer the question: Is the penetrability of compressed soil less than that of “intermediate soil”? Show all steps in the test and your working.

Step 1: determine null and alternative hypotheses

H0: μ1 − μ2 = 0 vs Ha: μ1 − μ2 > 0

where µ is the mean number of words spoken per day.

Step 2a: verify conditions
The sum of the sample sizes is 47, which is more than 40, so the condition is satisfied.

Step 2b: test statistic

t = (16496 − 12867) / 2408.3 = 1.51

Step 3: find p-value
df = smaller n − 1 = 20 − 1 = 19
One-tailed P-value is between 0.05 and 0.10

Step 4: Statistical decision and conclusion
The p-value is less than α = 0.10, so there is sufficient evidence to claim at the 10% significance level that the mean number of words spoken per day differs between the two populations (women and men).

EXTRA: Use a 90% confidence interval to determine the difference between men and women in the number of words spoken per day:

df = 20 − 1 = 19, so (90%) t* = 1.729

90% confidence interval on the difference is:

(16496 − 12867) ± 1.729 × (standard error)
= 3629 ± (1.729)(2408.3)
= 3629 ± 4164
= (−535, 7793)

Conclusion: we are 90% confident that the difference between men and women in the mean number of words spoken per day is between −535 and 7793. This interval includes zero, whereas the one-sided test at α = 0.10 found a significant difference.

IMPORTANT NOTE: What do you notice about the conclusion from these two approaches? NB Confidence intervals are always 2-sided procedures, so a 90% CI and a test at α = 0.10 are guaranteed to agree only IF the test of significance is also 2-sided. If the test is ONE-sided at α = 0.10 as above, then it may give a different conclusion (especially if borderline!) and really corresponds to an 80% (0.10 in each tail) confidence interval.
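The point of this note can be illustrated numerically with the difference (3629) and standard error (2408.3) from the working above. The sketch below assumes the standard Table C critical values for df = 19 (1.328 for an upper-tail area of 0.10 and 1.729 for 0.05); the variable names are illustrative.

```python
diff = 16496 - 12867   # difference between the two sample means
se = 2408.3            # standard error used in the worksheet's working

t = diff / se

# Table C, df = 19: upper-tail 0.10 and upper-tail 0.05 critical values.
t_star_tail_10 = 1.328   # one-sided test at alpha = 0.10, or an 80% CI
t_star_tail_05 = 1.729   # two-sided test at alpha = 0.10, or a 90% CI

reject_one_sided = t > t_star_tail_10
reject_two_sided = abs(t) > t_star_tail_05

ci_90 = (diff - t_star_tail_05 * se, diff + t_star_tail_05 * se)
ci_80 = (diff - t_star_tail_10 * se, diff + t_star_tail_10 * se)

print(f"t = {t:.3f}")
print("one-sided test at 0.10 rejects H0:", reject_one_sided)
print("two-sided test at 0.10 rejects H0:", reject_two_sided)
print(f"90% CI: ({ci_90[0]:.0f}, {ci_90[1]:.0f})")
print(f"80% CI: ({ci_80[0]:.0f}, {ci_80[1]:.0f})")
```

The 90% interval contains zero and agrees with the two-sided test, while the 80% interval excludes zero and agrees with the one-sided test, which is the correspondence the note describes.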

MARK:      /10

