Title | Statistic Group Assignment 01 |
---|---|
Author | sithmi aloka |
Course | Business statistics |
Institution | Victory University |
Pages | 32 |
GROUP ASSIGNMENT 01 BUSINESS STATISTICS
BEO2255 APPLIED STATISTICS FOR BUSINESS
GROUP ASSIGNMENT 1
GROUP MEMBERS (NAME, STUDENT ID NUMBERS)
B.D.S. ALOKA, 4624803/10020410
AYESHA D. LIYANAGE, 4610235/10020597
CONTENTS
1. Introduction
2. Part A
3. Part B
INTRODUCTION
This report uses SPSS output and its interpretation to present descriptive statistics and a one-way ANOVA for two sets of data, and to carry out the Wilcoxon Rank Sum Test, the Wilcoxon Signed Rank Test and the Kruskal-Wallis test.
Assignment 1 Part A
1. Use descriptive statistics to summarize the data and develop a 95% confidence interval estimate of the mean income of the respondents.
Table 1 - Case processing summary
Variable | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
---|---|---|---|---|---|---|
INCOME | 160 | 100.0% | 0 | 0.0% | 160 | 100.0% |
The case processing summary shows that all 160 observations are valid and included in the analysis; there are no missing values.
Table 2 - Descriptive statistics
Descriptives: MONTHLY INCOME (SLR)
Measure | Statistic | Std. Error |
---|---|---|
Mean | 36256.25 | 961.330 |
95% Confidence Interval for Mean, Lower Bound | 34357.63 | |
95% Confidence Interval for Mean, Upper Bound | 38154.87 | |
5% Trimmed Mean | 36097.22 | |
Median | 39000.00 | |
Variance | 147864740.566 | |
Std. Deviation | 12159.965 | |
Minimum | 18000 | |
Maximum | 58000 | |
Range | 40000 | |
Interquartile Range | 24000 | |
Skewness | .009 | .192 |
Kurtosis | -1.403 | .381 |
As shown in the descriptive statistics table, there are 160 observations. The mean monthly income of the respondents is 36,256.25, with a standard deviation of 12,159.965. Income ranges from a minimum of 18,000 to a maximum of 58,000, a range of 40,000 (given as 26.47% in percentage terms). The coefficient of variation is 33.54% (CV = Std. Deviation / Mean * 100). Skewness is 0.009 and kurtosis is -1.403. From this information, we can be 95% confident that the mean income of the population lies between the lower bound of 34,357.63 and the upper bound of 38,154.87, an interval width of 3,797.24.
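The standard error, coefficient of variation and confidence interval quoted above can be checked from the summary figures alone. The sketch below is only an illustration: it assumes a normal (z) critical value, whereas SPSS uses the t distribution, so the bounds it prints are close to, but not identical with, the SPSS bounds. The variable names are chosen here and do not come from the SPSS output.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics copied from the descriptives table for monthly income.
n = 160
mean = 36256.25
std_dev = 12159.965

# Standard error of the mean and coefficient of variation (in percent).
std_error = std_dev / sqrt(n)
cv_percent = std_dev / mean * 100

# Assumption: a normal critical value stands in for the t critical value
# that SPSS applies, so the interval is slightly narrower than in the table.
z = NormalDist().inv_cdf(0.975)
lower = mean - z * std_error
upper = mean + z * std_error

print(f"Std. error: {std_error:.3f}")
print(f"CV: {cv_percent:.2f}%")
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

With 160 observations the z and t critical values are very close, which is why the approximation lands near the reported bounds.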
2. Use descriptive statistics to summarize the data and develop a 95% confidence interval estimate of the mean Age of (a) Male and (b) Female respondents in the target population.
Table 1 - Case Processing Summary
Age in years, by GENDER | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
---|---|---|---|---|---|---|
Male | 91 | 100.0% | 0 | 0.0% | 91 | 100.0% |
Female | 69 | 100.0% | 0 | 0.0% | 69 | 100.0% |
As shown in the case processing summary, 91 of the respondents are males and the other 69 are females, and there are no missing values. There are 22 more males than females (91 - 69).
Table 2 - Descriptive Statistics
Descriptives: AGE by GENDER
Measure | Male Statistic | Male Std. Error | Female Statistic | Female Std. Error |
---|---|---|---|---|
Mean | 38.55 | 1.092 | 40.30 | 1.286 |
95% Confidence Interval for Mean, Lower Bound | 36.38 | | 37.74 | |
95% Confidence Interval for Mean, Upper Bound | 40.72 | | 42.87 | |
5% Trimmed Mean | 38.38 | | 40.25 | |
Median | 36.00 | | 38.00 | |
Variance | 108.584 | | 114.068 | |
Std. Deviation | 10.420 | | 10.680 | |
Minimum | 20 | | 23 | |
Maximum | 59 | | 59 | |
Range | 39 | | 36 | |
Interquartile Range | 18 | | 16 | |
Skewness | .353 | .253 | .289 | .289 |
Kurtosis | -.966 | .500 | -1.047 | .570 |
As mentioned above, of the 160 individuals, 91 are males and 69 are females. For the 91 males, the descriptive statistics table shows a mean age of 38.55 years and a median age of 36 years. Ages range from a minimum of 20 to a maximum of 59, a range of 39 years. The standard deviation of the age of males is 10.420 years and the coefficient of variation is 27.03%. Therefore, we can be 95% confident that the mean age of males in the population lies between the lower bound of 36.38 and the upper bound of 40.72 years, an interval width of 4.34 years.
For the 69 females, the mean age is 40.30 years and the median age is 38 years. Ages range from a minimum of 23 to a maximum of 59, a range of 36 years. The standard deviation of the age of females is 10.680 years and the coefficient of variation is 26.50%. Hence, we can be 95% confident that the mean age of females in the population lies between the lower bound of 37.74 and the upper bound of 42.87 years, an interval width of 5.13 years.
3. Create a new variable called POSITIVE where POSITIVE = 1 if the response to Attitude was 4 or 5, OTHERWISE = 0. Attach value labels and variable labels to POSITIVE. a) Find the proportion of Male respondents in the sample that had a positive response to the attitudinal (positive) question.
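Case Processing Summary
Variables | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
---|---|---|---|---|---|---|
POSITIVE * GENDER | 160 | 100.0% | 0 | 0.0% | 160 | 100.0% |
Gender * Positive Crosstabulation
POSITIVE | Measure | Male | Female | Total |
---|---|---|---|---|
Otherwise | Count | 59 | 49 | 108 |
Otherwise | % of Total | 36.9% | 30.6% | 67.5% |
Positive | Count | 32 | 20 | 52 |
Positive | % of Total | 20.0% | 12.5% | 32.5% |
Total | Count | 91 | 69 | 160 |
Total | % of Total | 56.9% | 43.1% | 100.0% |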
From the Gender * Positive crosstabulation, 108 of the 160 respondents (67.5%) strongly disagreed, disagreed or had no view either way on the statement “The campaign has influenced my smoking habit”. The remaining 52 respondents (32.5%) either agreed or strongly agreed with the statement.
Of the 91 males, 59 (36.9% of the whole sample) did not give a positive response on this new variable, while the remaining 32 (20.0% of the whole sample) gave a positive response, saying that the campaign has influenced their smoking habits. Of the 69 females, 49 (30.6% of the whole sample) strongly disagreed, disagreed or had no view either way, and the remaining 20 (12.5% of the whole sample) agreed or strongly agreed that the campaign has influenced their smoking habits. Therefore, according to the crosstabulation, the proportion of male respondents in the sample with a positive response to the attitudinal (positive) question is 32 / 91 = 35.2% of the males, which corresponds to 20.0% of the whole sample.
b. Construct a 95 percent confidence interval on the proportion of Male customers in the population with a positive response. (Remember that, given the way POSITIVE is coded, the mean is the same as the proportion that had a positive response to Attitude.)
Descriptives: POSITIVE by GENDER
Measure | Male Statistic | Male Std. Error | Female Statistic | Female Std. Error |
---|---|---|---|---|
Mean | .35 | .050 | .29 | .055 |
95% Confidence Interval for Mean, Lower Bound | .25 | | .18 | |
95% Confidence Interval for Mean, Upper Bound | .45 | | .40 | |
5% Trimmed Mean | .34 | | .27 | |
Median | .00 | | .00 | |
Variance | .231 | | .209 | |
Std. Deviation | .480 | | .457 | |
Minimum | 0 | | 0 | |
Maximum | 1 | | 1 | |
Range | 1 | | 1 | |
Interquartile Range | 1 | | 1 | |
Skewness | .632 | .253 | .947 | .289 |
Kurtosis | -1.637 | .500 | -1.137 | .570 |
As shown in the descriptives table above, the proportion of male customers with a positive response is 0.35, with a standard deviation of 0.480. The coefficient of variation for male customers is about 137% (0.480 / 0.35 ≈ 1.37). Therefore, we can be 95% confident that the proportion of males in the population with a positive response lies between the lower bound of 0.25 and the upper bound of 0.45, an interval width of 0.20.
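Because POSITIVE is coded 0/1, the mean, standard deviation and standard error for males can be rebuilt from the crosstabulation counts (32 positive responses among 91 males). The sketch below is an illustration under one stated assumption: it uses a normal critical value rather than the t value SPSS uses, so it should agree with the table only to about two decimal places.

```python
from math import sqrt
from statistics import NormalDist

# Counts taken from the crosstabulation: positive responses among males.
n_males = 91
positive_males = 32

# For a 0/1 variable the mean is the sample proportion.
proportion = positive_males / n_males

# Sample standard deviation of a 0/1 variable (n - 1 in the denominator).
std_dev = sqrt(proportion * (1 - proportion) * n_males / (n_males - 1))
std_error = std_dev / sqrt(n_males)

# Assumption: normal critical value instead of the t value used by SPSS.
z = NormalDist().inv_cdf(0.975)
lower = proportion - z * std_error
upper = proportion + z * std_error

print(f"Proportion: {proportion:.2f}, Std. deviation: {std_dev:.3f}")
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f}")
```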
Question No. 04 Testing the normality of spending data
Descriptives: BEFORE (SLR) and AFTER (SLR)
Measure | BEFORE Statistic | BEFORE Std. Error | AFTER Statistic | AFTER Std. Error |
---|---|---|---|---|
Mean | 478.38 | 12.513 | 516.63 | 22.564 |
95% Confidence Interval for Mean, Lower Bound | 453.66 | | 472.06 | |
95% Confidence Interval for Mean, Upper Bound | 503.09 | | 561.19 | |
5% Trimmed Mean | 478.89 | | 495.83 | |
Median | 465.00 | | 500.00 | |
Variance | 25050.173 | | 81458.978 | |
Std. Deviation | 158.272 | | 285.410 | |
Minimum | 150 | | 40 | |
Maximum | 770 | | 2100 | |
Range | 620 | | 2060 | |
Interquartile Range | 268 | | 370 | |
Skewness | .142 | .192 | 1.603 | .192 |
Kurtosis | -1.097 | .381 | 5.562 | .381 |
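As a rough complement to the descriptives above, skewness and kurtosis can each be divided by their standard error. Treating ratios beyond about ±1.96 as a sign of departure from normality is a common rule of thumb; it is an assumption added here for illustration and is not part of the SPSS output or the assignment brief.

```python
# Skewness and kurtosis with their standard errors, from the descriptives table.
shape_stats = {
    "BEFORE": {"skewness": (0.142, 0.192), "kurtosis": (-1.097, 0.381)},
    "AFTER": {"skewness": (1.603, 0.192), "kurtosis": (5.562, 0.381)},
}

# Assumed rule-of-thumb cut-off (two-sided 5% normal critical value).
THRESHOLD = 1.96

ratios = {}
for variable, measures in shape_stats.items():
    for name, (value, std_error) in measures.items():
        ratio = value / std_error
        ratios[(variable, name)] = ratio
        flag = "outside" if abs(ratio) > THRESHOLD else "within"
        print(f"{variable} {name}: ratio {ratio:.2f} ({flag} +/-{THRESHOLD})")
```

On these figures only the skewness of BEFORE stays within the cut-off; its kurtosis, and both measures for AFTER, fall outside it.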
Examining the histogram and the descriptives table above, spending on the product before the campaign (SLR) has a mean of 478.38 and a median of 465.00, which are not equal, and a standard deviation of 158.272. The coefficient of variation before the campaign is 33.08%. Moreover, skewness is 0.142 and kurtosis is -1.097; neither equals zero. Furthermore, the histogram of spending before the campaign (SLR) has many peaks and valleys, so the curve is not bell-shaped, which again suggests that spending before the campaign is not normally distributed.
Turning to the table and the histogram for spending after the campaign (SLR), the mean is 516.63 and the median is 500.00, which are not equal, and the standard deviation is 285.410. The coefficient of variation after the campaign is 55.24%. Moreover, skewness is 1.603 and kurtosis is 5.562; neither equals zero. Furthermore, the histogram of spending after the campaign (SLR) has many peaks and valleys, so the curve is not bell-shaped, which again suggests that spending after the campaign is not normally distributed.
a. Perform a Wilcoxon Rank Sum test of the null hypothesis that the distributions of spending before being exposed to the advertising campaign are the same for Full-time workers and Part-time workers in the population of respondents. Perform the hypothesis test at 5% level of significance.
As the data in the table and histograms above are not normal, we continue with a non-parametric test, the Wilcoxon Rank Sum Test.
H0: The distributions of spending before the advertising campaign are the same for full-time employees and part-time employees in the population of respondents.
H1: The distributions of spending before the advertising campaign are not the same for full-time employees and part-time employees in the population of respondents.
Testing the difference between Full-Time Service and Part-Time Service.
Ranks: BEFORE by JOB
JOB | N | Mean Rank | Sum of Ranks |
---|---|---|---|
part time employment | 71 | 91.94 | 6527.50 |
full time employment | 89 | 71.38 | 6352.50 |
Total | 160 | | |
The table shows that, for spending before the campaign, there are 71 part-time employees with a mean rank of 91.94 and a sum of ranks of 6527.50, and 89 full-time employees with a mean rank of 71.38 and a sum of ranks of 6352.50. The mean ranks and sums of ranks therefore differ between part-time and full-time employees. However, this descriptive difference alone is not sufficient evidence, so we continue with a non-parametric test, the Wilcoxon Rank Sum Test, to check whether the difference is statistically significant.
Test Statistics (a)
Statistic | BEFORE |
---|---|
Mann-Whitney U | 2347.500 |
Wilcoxon W | 6352.500 |
Z | -2.791 |
Asymp. Sig. (2-tailed) | .005 |
a. Grouping Variable: JOB
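The Mann-Whitney U and Z values in the test statistics can be approximately reproduced from the group sizes and the full-time rank sum in the ranks table. The sketch below uses the large-sample normal approximation and, as a stated assumption, applies no correction for tied observations; SPSS does correct for ties, so its Z of -2.791 differs slightly in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Group sizes and the rank sum for full-time employees, from the ranks table.
n_part, n_full = 71, 89
rank_sum_full = 6352.5

# U for the full-time group: rank sum minus the smallest possible rank sum.
u = rank_sum_full - n_full * (n_full + 1) / 2

# Large-sample normal approximation (assumption: no tie correction).
mean_u = n_part * n_full / 2
sd_u = sqrt(n_part * n_full * (n_part + n_full + 1) / 12)
z = (u - mean_u) / sd_u
p_two_tailed = 2 * NormalDist().cdf(-abs(z))

print(f"U = {u}, Z = {z:.3f}, two-tailed p = {p_two_tailed:.3f}")
```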
Decision Rule
If the significance value is less than 0.05, we reject the null hypothesis (sig < 0.05). Here the significance value is 0.005, which is less than 0.05, so we reject the null hypothesis. Therefore, at the 5% level of significance, we have enough evidence to conclude that, before exposure to the advertising campaign, the distributions of spending are not the same for full-time employees and part-time employees in the population of respondents.
b. Perform a Wilcoxon Signed Rank Test of the null hypothesis that spending before the advertising campaign exposure is the same as after for the population of customers.
H0: There is no difference between spending before and after the advertising campaign.
H1: There is a difference between spending before and after the advertising campaign.
TABLE 01 Ranks (AFTER - BEFORE)
AFTER - BEFORE | N | Mean Rank | Sum of Ranks |
---|---|---|---|
Negative Ranks | 95 (a) | 62.57 | 5944.00 |
Positive Ranks | 60 (b) | 102.43 | 6146.00 |
Ties | 5 (c) | | |
Total | 160 | | |
a. AFTER < BEFORE; b. AFTER > BEFORE; c. AFTER = BEFORE
The ranks table shows that, comparing spending after and before the advertising campaign, the 95 negative ranks have a mean rank of 62.57 and a sum of ranks of 5944.00, while the 60 positive ranks have a mean rank of 102.43 and a sum of ranks of 6146.00; there are 5 ties, for a total of 160 observations. The mean ranks and sums of ranks differ, which suggests a difference between spending after and before the campaign. However, this alone is not sufficient evidence, so we continue with the test.
TABLE 02 Test Statistics (a)
Statistic | AFTER - BEFORE |
---|---|
Z | -.181 (b) |
Asymp. Sig. (2-tailed) | .857 |
a. Wilcoxon Signed Ranks Test; b. Based on negative ranks.
Decision Rule
If the significance value in Table 02 is less than 0.05, we reject the null hypothesis. Here the significance value is 0.857, which is greater than 0.05, so we cannot reject the null hypothesis. Therefore, at the 5% level of significance, we do not have enough evidence to conclude that spending differs before and after the advertising campaign. We fail to reject the null hypothesis.
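The Z value for the Wilcoxon Signed Rank Test can be approximated from the ranks table in the same way. The sketch below is an illustration that assumes the large-sample normal approximation with no correction for tied ranks, based on the sum of negative ranks and the 155 non-tied pairs; the variable names are chosen here.

```python
from math import sqrt
from statistics import NormalDist

# Non-tied pairs and the sum of negative ranks, from the ranks table.
n_pairs = 95 + 60
sum_negative_ranks = 5944.0

# Mean and standard deviation of the rank sum under the null hypothesis
# (assumption: no adjustment for tied ranks).
mean_t = n_pairs * (n_pairs + 1) / 4
sd_t = sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24)

z = (sum_negative_ranks - mean_t) / sd_t
p_two_tailed = 2 * NormalDist().cdf(-abs(z))

print(f"Z = {z:.3f}, two-tailed p = {p_two_tailed:.3f}")
```

The small Z and large p-value agree with the decision not to reject the null hypothesis.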
Question No. 05
a. Perform a chi-square goodness of fit test of the null hypothesis that positive, neutral and negative responses are equally likely amongst the population.
H0: The proportions of the three attitude types (positive, neutral and negative) are the same.
H1: The proportion of at least one attitude type (positive, neutral or negative) is different.
TABLE 01 Chi-Square for the proportion (attitude_1)
attitude_1 | Observed N | Expected N | Residual |
---|---|---|---|
Positive | 52 | 53.3 | -1.3 |
Neutral | 36 | 53.3 | -17.3 |
Negative | 72 | 53.3 | 18.7 |
Total | 160 | | |
As shown in the attitude_1 table, of the 160 observations, 52 people have a positive attitude, 36 have a neutral attitude and 72 have a negative attitude. The expected number is 53.3 for each of the three categories, but the observed counts are not the same as the expected counts, which suggests a difference between the three groups. However, this alone is not sufficient evidence, so we continue with the Chi-square test.
Test Statistics
Statistic | attitude_1 |
---|---|
Chi-Square | 12.200 (a) |
df | 2 |
Asymp. Sig. | .002 |
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 53.3.
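The chi-square statistic above can be recomputed from the observed counts, with equal expected frequencies under the null hypothesis. The sketch below is an illustration only; it relies on the fact that, for exactly 2 degrees of freedom, the upper-tail chi-square probability equals exp(-x / 2), so no statistics library is needed.

```python
from math import exp

# Observed counts from the attitude_1 table.
observed = {"Positive": 52, "Neutral": 36, "Negative": 72}

# Under H0 every category has the same expected frequency.
total = sum(observed.values())
expected = total / len(observed)

# Goodness-of-fit statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1

# Valid only because df == 2: the upper-tail probability is exp(-x / 2).
p_value = exp(-chi_square / 2)

print(f"Chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
```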
Decision Rule According to the above test statistic table, if the significance value is less than 0.05, we can reject the null hypothesis. Hence, according to the above table the significance value is 0.002...