Title | STAT 252 Chi Squared Solutions |
---|---|
Course | Statistical Inference for Management |
Institution | California Polytechnic State University San Luis Obispo |
Pages | 6 |
Stat252 spring 2017
24 Chi-square tests- (Chapter #14)
Chi-Square Goodness-of-Fit Test
Today we turn our attention to the analysis of a categorical variable with more than two possible categories.
Goodness of Fit/Chi-square Test (TODAY): Is the distribution of outcomes matching our theory or not? E.g., is this die fair?
Example 1: Cereal Distribution (Goodness of Fit Chi-square Test)
Source: A.C. Nielsen Financial Services, 2016. Data are based on "unit sales" for the 52 weeks ended June 8, 2014. A survey of 635 shoppers selected randomly from grocery stores in California asked for their favorite brand of cereal. From the following data, determine whether the cereal companies have the same market share in California as nationwide.
Requirement for this test: at least 1 categorical variable that can have 2+ categories.
Categories | Percent in population | Observed frequency (observed cell count) | Expected frequency $Exp_i$ (expected cell count) | Chi-square for each category, $\chi^2_i = (Obs_i - Exp_i)^2 / Exp_i$ |
---|---|---|---|---|
Kellogg’s | 34% | 220 | 0.34 × 635 = 215.9 | (220 - 215.9)² / 215.9 = 0.078 |
General Mills | 31% | 165 | 0.31 × 635 = 196.85 | (165 - 196.85)² / 196.85 = 5.153 |
Kraft | 14% | 100 | 0.14 × 635 = 88.9 | (100 - 88.9)² / 88.9 = 1.385 |
Private Labels | 10% | 80 | 0.10 × 635 = 63.5 | (80 - 63.5)² / 63.5 = 4.287 |
Quaker Oats | 7% | 40 | 0.07 × 635 = 44.45 | (40 - 44.45)² / 44.45 = 0.445 |
Malt-O-Meal & Other | 4% | 30 | 0.04 × 635 = 25.4 | (30 - 25.4)² / 25.4 = 0.833 |
Total | 100% | 635 | 1 × 635 = 635 | Sum = 12.183 |

a) Identify the observational units and variable.
Observational units: n = 635 shoppers from California. Variable: favorite brand of cereal.
b) What type of variable is this: categorical or quantitative?
Type: categorical, with k = 6 categories.
c) Are the percentages mentioned above (34%, 31%, 14%, 10%, 7% and 4%) parameters or statistics? Explain how you know.
Parameters: they describe the population (the full nationwide cereal market share reported for 2016), so they are fixed numbers rather than values computed from our sample.
d) Do those alleged percentages seem to correspond to a null hypothesis or an alternative hypothesis? State the null and alternative hypotheses for this problem.
The percentages correspond to the null hypothesis.
H0: $p_{K} = 0.34$ (the true proportion of California shoppers who favor Kellogg’s is 34%), $p_{GM} = 0.31$, $p_{Kraft} = 0.14$, $p_{PL} = 0.10$, $p_{Q} = 0.07$, $p_{other} = 0.04$
Ha: at least one brand's proportion differs from its given value ($p_i \ne$ the given value for at least one $i$).
How a Goodness-of-Fit Test Works: The goodness-of-fit test is based on a comparison of the observed frequencies (the actual data from your sample) with the expected frequencies when H0 is true. That is, we compare what we actually see with what we would expect to see if H0 were true. If the difference between the observed and expected frequencies is large, we reject H0. As usual, it comes down to how large a difference counts as large. The hypothesis test we conduct to answer this question relies on the $\chi^2$ distribution.
The basic idea is to:
1. Calculate the expected counts given the (null) hypothesis that the claim is correct.
2. Compare the observed counts to these expected counts.
3. Construct a test statistic that measures the discrepancy between them.
4. Determine the probability of getting such an extreme discrepancy if the claim were correct (p-value).
5. Reject the claim if this p-value is small.
e) Calculate the expected counts for this study, under the hypothesis that the theory is correct. Record these in the "Expected frequency" column of the table above.
The test statistic is denoted by $\chi^2$ and is calculated as $\chi^2 = \sum_{i} \frac{(Obs_i - Exp_i)^2}{Exp_i}$, summed over all cells $i$.
f) Calculate the value of the test statistic for these data using the table above (using the column "Chi-square for each category").
Total chi-square: $\chi^2 = 12.183$
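The arithmetic in parts e) and f) can be reproduced with a short Python sketch. The counts and proportions are the ones in the cereal table; the variable names are illustrative only and are not part of the worksheet.

```python
# Cereal data from Example 1 (observed counts and nationwide shares).
brands = ["Kellogg's", "General Mills", "Kraft", "Private Labels",
          "Quaker Oats", "Malt-O-Meal & Other"]
observed = [220, 165, 100, 80, 40, 30]
null_props = [0.34, 0.31, 0.14, 0.10, 0.07, 0.04]

n = sum(observed)  # 635 shoppers

# Expected count under H0: hypothesized proportion times sample size.
expected = [p * n for p in null_props]

# Per-cell contribution (Obs - Exp)^2 / Exp, then the total statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

for brand, e, c in zip(brands, expected, contributions):
    print(f"{brand:20s} expected={e:7.2f}  contribution={c:.3f}")
print(f"chi-square = {chi_square:.3f}, df = {len(observed) - 1}")
```

The printed total should agree with the table (about 12.183), and the General Mills row shows the largest single contribution.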
Decision Rule: The chi-square test is one-tailed (right-tailed); negative values of $\chi^2$ are impossible. (Mean = df, Mode = df - 2 for df ≥ 2.) Larger values of $\chi^2$ indicate stronger evidence against H0.
Reject H0 if $\chi^2$ > c.v. $\chi^2_{\alpha}$ with df = k - 1, or if p-value < α (where k = # of categories).
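g) Determine the p-value of this test, as accurately as possible from the Chi-square table.
df = 6 - 1 = 5. The critical value at α = 0.05 is 11.07. Reject H0 if $\chi^2$ > 11.07; here 12.18 > 11.07, so the p-value is below 0.05. The table value at α = 0.025 is 12.83, and 12.18 < 12.83, so 0.025 < p-value < 0.05.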
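The table only brackets the p-value. For a closer figure, the upper-tail probability can be computed directly. The sketch below is not how the worksheet or JMP does it: `chi2_sf` is a hypothetical helper that uses the standard finite-series form of the chi-square tail for integer degrees of freedom, so it needs only the Python standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: a normal-tail part plus a finite series in sqrt(x).
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

# Cereal example: chi-square = 12.183 with df = 5.
p_value = chi2_sf(12.183, 5)
print(f"p-value = {p_value:.4f}")
```

The result should fall between 0.025 and 0.05, consistent with the table lookup. The same helper can be used to check the P-Value entries printed with the later examples.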
h) Would you reject the null hypothesis at the α = .05 level?
Yes, we reject the null hypothesis at α = 0.05.
This procedure is called a chi-square test of goodness-of-fit. It is valid as long as:
- The data have been randomly selected.
- The sample data consist of frequency counts for each of the different categories.
- For each category, the expected frequency is at least 5.
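The expected-frequency condition is the one that can be checked numerically. A minimal sketch for the cereal study, assuming the hypothesized proportions and sample size from Example 1 (the variable names are illustrative):

```python
# Hypothesized nationwide shares and sample size from Example 1.
null_props = [0.34, 0.31, 0.14, 0.10, 0.07, 0.04]
n = 635

# The hypothesized proportions should cover every category (sum to 1).
props_sum_to_one = abs(sum(null_props) - 1.0) < 1e-9

# Expected count per category under H0, and the "at least 5" rule.
expected = [p * n for p in null_props]
smallest_expected = min(expected)
counts_large_enough = smallest_expected >= 5

print(f"proportions sum to 1: {props_sum_to_one}")
print(f"smallest expected count: {smallest_expected:.1f}")
print(f"all expected counts >= 5: {counts_large_enough}")
```

Random selection and the use of frequency counts are properties of the study design, so they have to be judged from how the data were collected rather than computed.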
i) Check conditions for our study.
Assumptions:
- Randomly selected: the shoppers were selected randomly from grocery stores in California.
- Categorical data (frequency counts): cereal brands.
- Expected counts are large enough: every expected count is at least 5 (the smallest is 25.4).
High chi-square: reject H0.
j) Summarize your conclusion about whether the sample data support or refute the claim, and explain how your conclusion follows from this test (using alpha = 0.05).
At the 0.05 significance level, we have enough evidence to say that at least one brand's California market share differs from its nationwide market share.
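k) Summarize your conclusion about whether the sample data support or refute the claim, and explain how your conclusion follows from this test (using alpha = 0.01).
At α = 0.01 the critical value with df = 5 is 15.09; since 12.18 < 15.09, we fail to reject H0 at this stricter level. The largest contribution to the statistic comes from General Mills (5.153): California shoppers chose General Mills less often than the nationwide share predicts (165 observed vs. 196.85 expected).
l) Steps to perform this test in JMP.
DATA
Analyze > Distribution: set Y = Categories and Freq = Observed Frequency. Click the "hot spot" next to Categories, select "Test Probabilities", and enter the hypothesized probabilities.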
OUTPUT
SUMMARY of Goodness of Fit: Goodness of Fit Test (One of several "Chi-Square" Tests)
Null Hypothesis: the population rates all match specified levels (e.g., either the chance of each outcome is the same or it equals a prespecified distribution).
Alternative Hypothesis: at least one population rate does not match its specified level.
Test Statistic: $\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$, where $o_i = Obs_i$ and $e_i = Exp_i$.
P-value: from the Chi-Square table with df = #cells - 1 = k - 1.
Assumptions: 1) a random sample or random assignment, 2) frequency counts, 3) the expected count is at least five for each category ($Exp_i$ is at least 5).
Column "Expected frequency" formula: % in population × 635
(ON YOUR OWN) Example 2: Birthdays of the week (Goodness of Fit Chi-square Test)
The following table reports the birthdays of 147 “noted writers of the present” listed in The 2000 World Almanac and Book of Facts:

Days | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday | Total |
---|---|---|---|---|---|---|---|---|
Distribution (pop. proportion) | | | | | | | | 100% |
Observed counts (O) | 17 | 26 | 22 | 23 | 19 | 15 | 25 | 147 |
Expected counts (E) | | | | | | | | |
Chi-square | | | | | | | | |

Test whether these sample data provide evidence that people are not equally likely to be born on the seven days of the week. Summarize and explain your conclusions.
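As a way to check your own work on this exercise, here is a minimal Python sketch. It assumes the counts are listed in Monday-to-Sunday order and uses α = 0.05, which the problem does not specify. The critical value 12.592 is the standard table entry for df = 6 at α = 0.05.

```python
# Observed birthdays, assumed to be in Monday..Sunday order.
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
observed = [17, 26, 22, 23, 19, 15, 25]

n = sum(observed)   # 147 writers
k = len(observed)   # 7 categories

# H0: each day is equally likely, so every expected count is n / k.
expected = n / k

chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = k - 1

# Assumption: alpha = 0.05; 12.592 is the chi-square table value for df = 6.
critical_value = 12.592
reject_h0 = chi_square > critical_value

print(f"expected per day = {expected:.0f}")
print(f"chi-square = {chi_square:.3f}, df = {df}")
print(f"reject H0 at alpha = 0.05: {reject_h0}")
```

Under these assumptions the statistic is well below the critical value, so the sample would not provide evidence against equally likely birthdays.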
Example 1: A highway department executive claims that the number of fatal accidents which occur in CA does not vary from month to month. A survey of 195 fatal accidents produced the following results.

Month | January | February | March | April | May | June | July | August | September | October | November | December | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
# of accidents | 18 | 16 | 17 | 10 | 8 | 22 | 15 | 18 | 15 | 11 | 20 | 25 | N=195 |
Expected | | | | | | | | | | | | | |
Chi-square | | | | | | | | | | | | | |

N | DF | Chi-Sq | P-Value |
---|---|---|---|
195 | 11 | 16.5077 | 0.123 |
a) (4pts) Is the executive’s claim refuted by the data at the 0.1 level? State the null and alternative hypotheses. Use the table above to do the calculations and state the conclusion of the test.
b) (3pts) What assumptions were made in the test?
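The printed output can be checked against the monthly counts. The sketch below recomputes the statistic under H0 (equal expected counts in every month) and applies the p-value decision rule with the reported P-Value. The twelve counts listed sum to 195, which matches the N shown in the output. Variable names are illustrative.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
observed = [18, 16, 17, 10, 8, 22, 15, 18, 15, 11, 20, 25]

n = sum(observed)              # total accidents in the table
expected = n / len(observed)   # H0: same expected count each month

contributions = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(contributions)
df = len(observed) - 1

# Month contributing most to the statistic.
largest_month = months[contributions.index(max(contributions))]

# Decision using the P-Value printed in the worksheet output.
reported_p_value = 0.123
alpha = 0.10
reject_h0 = reported_p_value < alpha

print(f"N = {n}, expected per month = {expected:.2f}")
print(f"chi-square = {chi_square:.4f}, df = {df}")
print(f"largest contribution: {largest_month}")
print(f"reject H0 at alpha = {alpha}: {reject_h0}")
```

Since 0.123 is not below 0.10, this check suggests the data do not refute the executive's claim at the 0.1 level.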
Example 2: Just to practice: A statistics professor conducted an attitude survey of 120 students (taking Stat252- ). The students were asked to pick the one category which most accurately described their attitude toward statistics in general. It is known that 25% of students have an optimistic attitude toward statistics, 30% are slightly optimistic, 30% are slightly pessimistic, and 15% have a pessimistic attitude toward statistics. She would like to know whether the attitude toward statistics of the students in her class follows the same pattern (distribution) or a different one. The results of the survey were as follows:

Attitude toward statistics | Observed cell counts = Number of students | Percent given | Expected cell counts | Chi-square |
---|---|---|---|---|
Optimistic | 50 | 0.25 | | |
Slightly Optimistic | 30 | 0.3 | | |
Slightly Pessimistic | 30 | 0.3 | | |
Pessimistic | 10 | 0.15 | | |
Total | N=120 | | | |

N | DF | Chi-Sq | P-Value |
---|---|---|---|
120 | 3 | 18.8889 | 0.000 |

a) Can the statistics professor conclude that the attitude toward statistics in her class is different from the given distribution at the 0.01 level? State the null hypothesis and the conclusion of the test.
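With unequal hypothesized proportions, each expected count is its own proportion times N. The following sketch fills in the table's blank columns and compares the statistic with a critical value. `gof_statistic` is a hypothetical helper name, and 11.345 is the standard chi-square table value for df = 3 at α = 0.01 (an assumption not printed in the worksheet).

```python
def gof_statistic(observed, null_props):
    """Return (expected counts, chi-square statistic) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [p * n for p in null_props]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic

# Optimistic, slightly optimistic, slightly pessimistic, pessimistic.
observed = [50, 30, 30, 10]
null_props = [0.25, 0.30, 0.30, 0.15]

expected, chi_square = gof_statistic(observed, null_props)
df = len(observed) - 1

# Assumption: table critical value for df = 3 at alpha = 0.01.
critical_value = 11.345
reject_h0 = chi_square > critical_value

print("expected counts:", [round(e, 1) for e in expected])
print(f"chi-square = {chi_square:.4f}, df = {df}")
print(f"reject H0 at alpha = 0.01: {reject_h0}")
```

The recomputed statistic should match the Chi-Sq value in the output, and it exceeds the critical value, which is consistent with the reported P-Value of 0.000.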
Example 1: A highway department executive claims that the number of fatal accidents which occur in CA does not vary from month to month. A survey of 175 fatal accidents produced the following results.

Month | January | February | March | April | May | June | July | August | September | October | November | December | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
# of accidents | 18 | 16 | 27 | 10 | 8 | 12 | 15 | 8 | 15 | 11 | 20 | 15 | N=175 |
Expected | | | | | | | | | | | | | |
Chi-square | | | | | | | | | | | | | |

N | DF | Chi-Sq | P-Value |
---|---|---|---|
175 | 11 | 22.280 | 0.0223 |

a) Is the executive’s claim refuted by the data at the 0.05 level? State the null and alternative hypotheses. (Show all your work in the table above.) State the conclusion of the test.
b) What assumptions were made in the test?
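One way to check the reported Chi-Sq value by hand, offered as a sketch rather than the worksheet's own solution: when every category has the same expected count $Exp = N/k$, the statistic $\sum_i (Obs_i - Exp)^2 / Exp$ simplifies algebraically to $\sum_i Obs_i^2 / Exp - N$.

$$
\begin{aligned}
Exp &= \frac{175}{12} \approx 14.583 \\
\sum_i Obs_i^2 &= 18^2 + 16^2 + 27^2 + 10^2 + 8^2 + 12^2 + 15^2 + 8^2 + 15^2 + 11^2 + 20^2 + 15^2 = 2877 \\
\chi^2 &= \frac{2877}{175/12} - 175 = 197.28 - 175 = 22.28
\end{aligned}
$$

This agrees with the output (Chi-Sq = 22.280, DF = 12 - 1 = 11). Because the reported P-Value of 0.0223 is below 0.05, the decision rule leads to rejecting H0 at the 0.05 level, which would refute the claim that fatal accidents do not vary from month to month.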
Example 2: Just to practice: A statistics professor conducted an attitude survey of 120 students (taking Stat252- ). The students were asked to pick the one category which most accurately described their attitude toward statistics in general. It is known that 40% of students have an optimistic attitude toward statistics, 20% are slightly optimistic, 20% are slightly pessimistic, and 20% have a pessimistic attitude toward statistics. She would like to know whether the attitude toward statistics of the students in her class follows the same pattern (distribution) or a different one. The results of the survey were as follows:

Attitude toward statistics | Observed cell counts = Number of students | Percent given | Expected cell counts | Chi-square |
---|---|---|---|---|
Optimistic | 50 | 0.40 | | |
Slightly Optimistic | 30 | 0.20 | | |
Slightly Pessimistic | 30 | 0.20 | | |
Pessimistic | 10 | 0.20 | | |
Total | N=120 | | | |

N | DF | Chi-Sq | P-Value |
---|---|---|---|
120 | 3 | 11.25 | 0.0104 |

a) Can the statistics professor conclude that the attitude toward statistics in her class is different from the given distribution at the 0.01 level? State the null hypothesis and the conclusion of the test....
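A worked version of the calculation behind the output, given as a sketch under the hypothesized proportions 0.40, 0.20, 0.20, 0.20 with N = 120:

$$
\begin{aligned}
Exp_1 &= 0.40 \times 120 = 48, \qquad Exp_2 = Exp_3 = Exp_4 = 0.20 \times 120 = 24 \\
\chi^2 &= \frac{(50-48)^2}{48} + \frac{(30-24)^2}{24} + \frac{(30-24)^2}{24} + \frac{(10-24)^2}{24} \\
&\approx 0.083 + 1.5 + 1.5 + 8.167 = 11.25
\end{aligned}
$$

This matches the reported Chi-Sq of 11.25 with DF = 4 - 1 = 3. The reported P-Value of 0.0104 is slightly above 0.01, so by the decision rule we would fail to reject H0 at the 0.01 level, even though the same data would lead to rejection at 0.05.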