STAT 252 Chi Squared Solutions PDF

Title	STAT 252 Chi Squared Solutions
Course	Statistical Inference for Management
Institution	California Polytechnic State University San Luis Obispo

Stat252 spring 2017

24 Chi-square tests- (Chapter #14)

Chi-Square Goodness-of-Fit Test

Today we turn our attention to the analysis of a categorical variable with more than two possible categories.

Goodness-of-Fit Chi-square Test (TODAY)
- Does the distribution of outcomes match our theory or not?
- E.g.: Is this die fair?

Example 1: Cereal Distribution (Goodness-of-Fit Chi-square Test)

Source: A.C. Nielsen Financial Services, 2016. Data are based on "unit sales" for the 52 weeks ended June 8, 2014. A survey of 635 shoppers selected randomly from grocery stores in California asked for their favorite brand of cereal. From the following data, determine whether the cereal companies have the same market share in California as nationwide.

Requirement: at least one categorical variable that can have two or more categories.

Chi-square for each category: $\chi^2_i = \frac{(\text{Obs}_i - \text{Exp}_i)^2}{\text{Exp}_i}$

Percent = percent in population; Observed = observed frequency (observed cell count); Expected = expected frequency Exp_i (expected cell count).

Category              Percent   Observed   Expected               Chi-square for each category
Kellogg's             34%       220        (.34*635) = 215.9      (220-215.9)^2/215.9 = 0.078
General Mills         31%       165        (.31*635) = 196.85     (165-196.85)^2/196.85 = 5.153
Kraft                 14%       100        (.14*635) = 88.9       (100-88.9)^2/88.9 = 1.386
Private Labels        10%       80         (.1*635) = 63.5        (80-63.5)^2/63.5 = 4.287
Quaker Oats           7%        40         (.07*635) = 44.45      (40-44.45)^2/44.45 = 0.446
Malt-O-Meal & Other   4%        30         (.04*635) = 25.4       (30-25.4)^2/25.4 = 0.833
Total                 100%      635        (1*635) = 635          12.183

a) Identify the observational units and variable.
Observational units: n = 635 shoppers from California.
Variable: What is your favorite cereal?
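The table arithmetic can be checked with a short script. This is a minimal sketch, not part of the original handout: the variable names (`shares`, `observed`, `expected`, `contribution`) are chosen here for illustration, and the numbers are the ones given in the table.

```python
# Hypothesized nationwide market shares and the observed counts from the
# 635 California shoppers, copied from the table in the handout.
shares = {
    "Kellogg's": 0.34,
    "General Mills": 0.31,
    "Kraft": 0.14,
    "Private Labels": 0.10,
    "Quaker Oats": 0.07,
    "Malt-O-Meal & Other": 0.04,
}
observed = {
    "Kellogg's": 220,
    "General Mills": 165,
    "Kraft": 100,
    "Private Labels": 80,
    "Quaker Oats": 40,
    "Malt-O-Meal & Other": 30,
}

n = sum(observed.values())  # total sample size

# Expected count under H0: hypothesized proportion times the sample size.
expected = {brand: p * n for brand, p in shares.items()}

# Contribution of each category: (Obs - Exp)^2 / Exp.
contribution = {
    brand: (observed[brand] - expected[brand]) ** 2 / expected[brand]
    for brand in shares
}

# The test statistic is the sum of the per-category contributions.
chi_square = sum(contribution.values())

for brand in shares:
    print(f"{brand:20s} E = {expected[brand]:7.2f}  "
          f"contribution = {contribution[brand]:.3f}")
print(f"Total chi-square = {chi_square:.3f}")
```

Each contribution uses that category's own observed count, so the General Mills row is computed from 165, not 220. The total should agree with the 12.18 used in the rest of the example.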

b) What type of variable is this: categorical or quantitative?
Type: categorical, with k = 6 categories.

c) Are the percentages mentioned above (34%, 31%, 14%, 10%, 7% and 4%) parameters or statistics? Explain how you know.
Parameters: they describe the whole population (the full 2016 nationwide cereal market share), so they are fixed numbers rather than values computed from our sample.

d) Do those alleged percentages seem to correspond to a null hypothesis or an alternative hypothesis? State the null and alternative hypotheses for this problem.
They correspond to the null hypothesis.
H0: p_Kellogg's = .34 (the true proportion of California shoppers who prefer Kellogg's is 34%), p_GM = .31, p_Kraft = .14, p_PL = .10, p_Quaker = .07, p_Other = .04
Ha: at least one brand's proportion differs from the given value (p_i ≠ given value for at least one i)

How a Goodness-of-Fit Test Works:
The goodness-of-fit test is based on a comparison of the observed frequencies (the actual data from your sample) with the expected frequencies when H0 is true. That is, we compare what we actually see with what we would expect to see if H0 were true. If the difference between the observed and expected frequencies is large, we reject H0. As usual, it comes down to how large a difference counts as large. The hypothesis test we conduct to answer this question relies on the χ2 distribution.

The basic idea is to
- Calculate the expected counts given the (null) hypothesis that the claim is correct
- Compare the observed counts to these expected counts
- Construct a test statistic that measures the discrepancy between them
- Determine the probability of getting such an extreme discrepancy if the claim were correct (p-value)
- Reject the claim if this p-value is small

e) Calculate the expected counts for this study, under the hypothesis that the theory is correct. Record these in the "Expected frequency" column of the table above.

The test statistic is denoted by χ2 and is calculated as:

$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Obs}_i - \text{Exp}_i)^2}{\text{Exp}_i}$

f) Calculate the value of the test statistic for these data using the table above (using the column "Chi-square for each category").
Total chi-square: χ2 = 12.18

Decision Rule: The chi-square test is one-tailed (upper tail). The chi-square distribution takes no negative values (Mean = df, Mode = df - 2 for df ≥ 2). Larger values of χ2 indicate stronger evidence against H0.

Reject H0 if χ2 > critical value of χ2 with df = k - 1, or if p-value < α (where k = # of categories).

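g) Determine the p-value of this test, as accurately as possible from the Chi-square table.
df = 6 - 1 = 5
Critical value at α = 0.05 is 11.07.
Reject H0 if χ2 > 11.07; here 12.18 > 11.07.
In the table row for df = 5, 12.18 falls between 11.07 (α = 0.05) and 12.83 (α = 0.025), so 0.025 < p-value < 0.05.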
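A table only brackets the p-value. The sketch below computes the upper-tail area directly with the Python standard library. The helper `chi2_upper_tail` is a hypothetical name introduced here, not something from the handout or from JMP. It assumes a positive integer df and uses the standard recurrence for the regularized upper incomplete gamma function.

```python
import math


def chi2_upper_tail(x, df):
    """Return P(X > x) for a chi-square variable with integer df >= 1.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    starting from Q(1/2, z) = erfc(sqrt(z)) or Q(1, z) = exp(-z),
    where z = x / 2 and the target is a = df / 2.
    """
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    z = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Step a up by 1 until it reaches df / 2.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


chi_square = 12.183  # test statistic from the cereal example
df = 6 - 1           # k - 1 with k = 6 categories

p_value = chi2_upper_tail(chi_square, df)
print(f"p-value = {p_value:.4f}")
```

The result should land inside the interval read from the table (between 0.025 and 0.05), at roughly 0.03.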

h) Would you reject the null hypothesis at the α = .05 level?
Yes, we reject the null hypothesis at α = 0.05.

This procedure is called a chi-square test of goodness-of-fit. It is valid as long as:
- The data have been randomly selected.
- The sample data consist of frequency counts for each of the different categories.
- For each category, the expected frequency is at least 5.

i) Check conditions for our study.

Assumptions:
- Randomly selected: the data were randomly collected from shoppers in California.
- Categorical data: counts of favorite cereal brands.
- Expected counts are high: every expected count is at least 5 (the smallest is 25.4).
High chi-square: reject H0.

j) Summarize your conclusion about whether the sample data support or refute the claim, and explain how your conclusion follows from this test (using alpha = 0.05).

At the .05 significance level, we have enough evidence to say that the California cereal market share differs from the nationwide market share for at least one brand.

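k) Summarize your conclusion about whether the sample data support or refute the claim, and explain how your conclusion follows from this test (using alpha = 0.01).

At the .01 significance level, we fail to reject H0: the test statistic 12.18 is below the critical value 15.09 (df = 5), so the p-value is greater than 0.01. The largest contribution to the chi-square comes from General Mills (5.15): California shoppers chose General Mills less often than the nationwide share predicts (165 observed vs. 196.85 expected).

l) Steps to perform this test in JMP.

DATA: one column for Categories and one column for Observed Frequency.

Analyze > Distribution: Y = Categories and Freq = Observed Frequency. Click the "hot spot" next to Categories, select "Test Probabilities", and enter the hypothesized probabilities.

OUTPUT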

SUMMARY of Goodness of Fit: Goodness-of-Fit Test (one of several "Chi-Square" tests)
- Null Hypothesis: the population rates all match the specified levels (e.g., either the chance of each outcome is the same, or it equals a prespecified distribution)
- Alternative Hypothesis: at least one population rate does not match the specified level
- Test Statistic: $\chi^2 = \sum_i \frac{(o_i - e_i)^2}{e_i}$
- P-value from the Chi-Square table with df = #cells - 1 = k - 1
- Assumptions: 1) a random sample or random assignment, 2) frequency counts, 3) the expected count is at least five for each category (Exp_i ≥ 5)

Column expected frequencyformula: % in population * 635

(ON YOUR OWN) Example 2: Birthdays of the week (Goodness-of-Fit Chi-square Test)

The following table reports the birthdays of 147 "noted writers of the present" listed in The 2000 World Almanac and Book of Facts:

Day         Distribution (pop. proportion)   Observed counts (O)   Expected counts (E)   Chi-square
Monday                                       17
Tuesday                                      26
Wednesday                                    22
Thursday                                     23
Friday                                       19
Saturday                                     15
Sunday                                       25
Total       100%                             147

Test whether these sample data provide evidence that people are not equally likely to be born on the seven days of the week. Summarize and explain your conclusions.
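As a starting point for this exercise, the calculation can be sketched as follows. Assumptions not stated in the handout: the null hypothesis gives each day the proportion 1/7, the significance level is taken as 0.05, and the critical value 12.592 is the standard chi-square table entry for df = 6 at that level.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
observed = [17, 26, 22, 23, 19, 15, 25]

n = sum(observed)   # total number of writers
k = len(observed)   # number of categories (days)

# Under H0 every day is equally likely, so each expected count is n / k.
expected = n / k

# Per-day contributions (O - E)^2 / E and their sum.
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(contributions)
df = k - 1

# Assumed for illustration: alpha = 0.05, table value for df = 6.
critical_value = 12.592
reject_h0 = chi_square > critical_value

for day, o, c in zip(days, observed, contributions):
    print(f"{day:10s} O = {o:3d}  E = {expected:5.1f}  contribution = {c:.3f}")
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

Because all seven expected counts are equal (147 / 7 = 21) and well above 5, the expected-count condition holds for every day.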

Example 1: A highway department executive claims that the number of fatal accidents which occur in CA does not vary from month to month. A survey of 195 fatal accidents produced the following results.

Month       # of accidents   Expected   Chi-square
January     18
February    16
March       17
April       10
May         8
June        22
July        15
August      18
September   15
October     11
November    20
December    25
Total       N=195

N     DF   Chi-Sq    P-Value
195   11   16.5077   0.123

a) (4pts) Is the executive's claim refuted by the data at the 0.1 level? State the null and alternative hypotheses. Use the table above to do the calculations and state the conclusion of the test.

b) (3pts) What assumptions were made in the test?
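The reported output can be cross-checked against the monthly counts. This is a hedged sketch with illustrative variable names. It assumes the null hypothesis of equal proportions across the 12 months (ignoring differences in month length) and takes the p-value from the reported output rather than recomputing it.

```python
accidents = [18, 16, 17, 10, 8, 22, 15, 18, 15, 11, 20, 25]  # Jan..Dec

n = sum(accidents)   # total number of accidents in the table
k = len(accidents)   # 12 months

# Under H0 (no month-to-month variation) each month expects n / k accidents.
expected = n / k

chi_square = sum((o - expected) ** 2 / expected for o in accidents)
df = k - 1

reported_p_value = 0.123  # P-Value from the output shown in the handout
alpha = 0.10
reject_h0 = reported_p_value < alpha

print(f"N = {n}, expected per month = {expected:.2f}")
print(f"chi-square = {chi_square:.4f}, df = {df}")
print(f"reject H0 at alpha = {alpha}: {reject_h0}")
```

The monthly counts add up to 195, which matches the N in the output, and the statistic reproduces the reported 16.5077.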

Example 2: Just to practice: A statistics professor conducted an attitude survey of 120 students (taking Stat252). The students were asked to pick the one category which most accurately described their attitude toward statistics in general. It is known that 25% of students have an optimistic attitude toward statistics, 30% are slightly optimistic, 30% are slightly pessimistic, and 15% have a pessimistic attitude toward statistics. She would like to know whether the attitude toward statistics of the students in her class follows the same pattern (distribution) or a different one. The results of the survey were as follows:

Attitude toward statistics   Observed cell counts = number of students   Percent given   Expected cell counts   Chi-square
Optimistic                   50                                          0.25
Slightly Optimistic          30                                          0.3
Slightly Pessimistic         30                                          0.3
Pessimistic                  10                                          0.15
Total                        N=120

N     DF   Chi-Sq    P-Value
120   3    18.8889   0.000

a) Can the statistics professor conclude that the attitude toward statistics in her class is different from the given distribution at the 0.01 level? State the null hypothesis and the conclusion of the test.
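With unequal hypothesized proportions, each category has its own expected count. The sketch below fills in the empty table columns for this practice problem. The variable names are illustrative, and the critical value 11.345 is the standard chi-square table entry for df = 3 at α = 0.01, which is not quoted in the handout.

```python
attitudes = ["Optimistic", "Slightly Optimistic",
             "Slightly Pessimistic", "Pessimistic"]
observed = [50, 30, 30, 10]
proportions = [0.25, 0.30, 0.30, 0.15]  # "Percent given" column

n = sum(observed)  # 120 students

# Expected cell count = hypothesized proportion * sample size.
expected = [p * n for p in proportions]

# Chi-square contribution of each cell, then the total.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
df = len(observed) - 1

# Assumed table value: df = 3, alpha = 0.01.
critical_value = 11.345
reject_h0 = chi_square > critical_value

for name, o, e, c in zip(attitudes, observed, expected, contributions):
    print(f"{name:22s} O = {o:3d}  E = {e:5.1f}  contribution = {c:.4f}")
print(f"chi-square = {chi_square:.4f}, df = {df}, reject H0: {reject_h0}")
```

The total should reproduce the Chi-Sq value of 18.8889 in the reported output.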

Example 1: A highway department executive claims that the number of fatal accidents which occur in CA does not vary from month to month. A survey of 175 fatal accidents produced the following results.

Month       # of accidents   Expected   Chi-square
January     18
February    16
March       27
April       10
May         8
June        12
July        15
August      8
September   15
October     11
November    20
December    15
Total       N=175

N     DF   Chi-Sq   P-Value
175   11   22.280   0.0223

a) Is the executive's claim refuted by the data at the 0.05 level? State the null and alternative hypotheses. (Show all your work in the table above.) State the conclusion of the test.

b) What assumptions were made in the test?

Example 2: Just to practice: A statistics professor conducted an attitude survey of 120 students (taking Stat252). The students were asked to pick the one category which most accurately described their attitude toward statistics in general. It is known that 40% of students have an optimistic attitude toward statistics, 20% are slightly optimistic, 20% are slightly pessimistic, and 20% have a pessimistic attitude toward statistics. She would like to know whether the attitude toward statistics of the students in her class follows the same pattern (distribution) or a different one. The results of the survey were as follows:

Attitude toward statistics   Observed cell counts = number of students   Percent given   Expected cell counts   Chi-square
Optimistic                   50                                          0.40
Slightly Optimistic          30                                          0.20
Slightly Pessimistic         30                                          0.20
Pessimistic                  10                                          0.20
Total                        N=120

N     DF   Chi-Sq   P-Value
120   3    11.25    0.0104

a) Can the statistics professor conclude that the attitude toward statistics in her class is different from the given distribution at the 0.01 level? State the null hypothesis and the conclusion of the test....