Statistic Group Assignment 01 PDF

Title	Statistic Group Assignment 01
Author	sithmi aloka
Course	Business statistics
Institution	Victory University


GROUP ASSIGNMENT 01 BUSINESS STATISTICS

BEO2255 APPLIED STATISTICS FOR BUSINESS
GROUP ASSIGNMENT 1

GROUP MEMBERS' NAMES: B.D.S. ALOKA, AYESHA D. LIYANAGE
STUDENT ID NUMBERS: 4624803/10020410, 4610235/10020597

CONTENTS 1. Introduction 2. Part A 3. Part B


INTRODUCTION This report presents SPSS output and its interpretation for two sets of data. It covers descriptive statistics and one-way ANOVA, together with the Wilcoxon Rank Sum Test, the Wilcoxon Signed Rank Test and the Kruskal-Wallis test.

Assignment 1 Part A

1. Use descriptive statistics to summarize the data and develop a 95% confidence interval estimate of the mean income of the respondents.

Table 1 - Case processing summary

Case Processing Summary
            Valid              Missing            Total
            N      Percent     N      Percent     N      Percent
INCOME      160    100.0%      0      0.0%        160    100.0%

The case processing summary shows that all 160 values are included in the analysis and none are missing, so the total is also 160.

Table 2 - Descriptive statistics

Descriptives
MONTHLY INCOME (SLR)                  Statistic        Std. Error
Mean                                  36256.25         961.330
95% Confidence Interval for Mean
  Lower Bound                         34357.63
  Upper Bound                         38154.87
5% Trimmed Mean                       36097.22
Median                                39000.00
Variance                              147864740.566
Std. Deviation                        12159.965
Minimum                               18000
Maximum                               58000
Range                                 40000
Interquartile Range                   24000
Skewness                              .009             .192
Kurtosis                              -1.403           .381

As shown in the descriptive statistics table, there are 160 observations, and the mean monthly income of the respondents is 36,256.25 with a standard deviation of 12,159.965. Incomes lie between a minimum of 18,000 and a maximum of 58,000, a range of 40,000 (given as 26.47% when expressed as a percentage). The coefficient of variation is 33.54% (CV = Std. Deviation / Mean * 100). Skewness is 0.009 and kurtosis is -1.403. From this information we can say, at the 95% confidence level, that the mean income lies between the lower bound of 34,357.63 and the upper bound of 38,154.87, an interval width of 3,797.24.
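The figures quoted above can be checked from the summary statistics alone. The following is a minimal sketch, not the SPSS procedure itself: it assumes a normal approximation for the interval (SPSS uses the t distribution with 159 degrees of freedom, so its bounds are slightly wider), and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary figures copied from the Descriptives table for monthly income.
n = 160
mean = 36256.25
std_dev = 12159.965

# Standard error of the mean and coefficient of variation.
std_error = std_dev / math.sqrt(n)
cv_percent = std_dev / mean * 100

# Assumption: normal critical value (about 1.96) instead of the t value
# SPSS uses, so the bounds differ slightly from the table.
z = NormalDist().inv_cdf(0.975)
lower = mean - z * std_error
upper = mean + z * std_error

print(f"Std. error: {std_error:.3f}")
print(f"CV: {cv_percent:.2f}%")
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

The standard error and coefficient of variation should agree with the table and the text; the approximate bounds fall a little inside the SPSS bounds because of the normal approximation.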

2. Use descriptive statistics to summarize the data and develop a 95% confidence interval estimate of the mean Age of (a) Male and (b) Female respondents in the target population.

Table 1 - Case Processing Summary

Case Processing Summary
                          Valid              Missing            Total
               GENDER     N      Percent     N      Percent     N      Percent
Age in years   Male       91     100.0%      0      0.0%        91     100.0%
               Female     69     100.0%      0      0.0%        69     100.0%

As shown in the case processing summary, 91 of the respondents are male and the other 69 are female, a difference of 22 (91 - 69). There are no missing values in either group.

Table 2 - Descriptive Statistics

Descriptives
AGE by GENDER                         Male                        Female
                                      Statistic    Std. Error     Statistic    Std. Error
Mean                                  38.55        1.092          40.30        1.286
95% Confidence Interval for Mean
  Lower Bound                         36.38                       37.74
  Upper Bound                         40.72                       42.87
5% Trimmed Mean                       38.38                       40.25
Median                                36.00                       38.00
Variance                              108.584                     114.068
Std. Deviation                        10.420                      10.680
Minimum                               20                          23
Maximum                               59                          59
Range                                 39                          36
Interquartile Range                   18                          16
Skewness                              .353         .253           .289         .289
Kurtosis                              -.966        .500           -1.047       .570

As mentioned above, of the 160 individuals, 91 are male and 69 are female. For the 91 males, the descriptive statistics table shows a mean age of 38.55 years and a median age of 36 years. Ages range from a minimum of 20 to a maximum of 59, a range of 39 years. The standard deviation of male age is 10.420 years, and the coefficient of variation is 27.03%. Therefore, at the 95% confidence level, the mean age of males lies between the lower bound of 36.38 and the upper bound of 40.72 years, an interval width of 4.34 years.

For the 69 females, the mean age is 40.30 years and the median age is 38 years. Ages range from a minimum of 23 to a maximum of 59, a range of 36 years. The standard deviation of female age is 10.680 years, and the coefficient of variation is 26.50%. Hence, at the 95% confidence level, the mean age of females lies between the lower bound of 37.74 and the upper bound of 42.87 years, an interval width of 5.13 years.

3. Create a new variable called POSITIVE where POSITIVE = 1 if the response to Attitude was 4 or 5, OTHERWISE = 0. Attach value labels and variable labels to POSITIVE. a) Find the proportion of Male respondents in the sample that had a positive response to the attitudinal (positive) question.

Case Processing Summary
                      Valid              Missing            Total
                      N      Percent     N      Percent     N      Percent
POSITIVE * GENDER     160    100.0%      0      0.0%        160    100.0%

Gender * Positive Crosstabulation
                                Gender
                                Male       Female     Total
Otherwise    Count              59         49         108
             % of Total         36.9%      30.6%      67.5%
Positive     Count              32         20         52
             % of Total         20.0%      12.5%      32.5%
Total        Count              91         69         160
             % of Total         56.9%      43.1%      100.0%

In the Gender * Positive crosstabulation, 108 of the 160 respondents (67.5%) strongly disagreed, disagreed or had no view either way on the statement “The campaign has influenced my smoking habit”. The remaining 52 respondents (32.5%) either agreed or strongly agreed with the statement.

Of the 91 males, 59 (36.9% of the whole sample) did not give a positive response, while the other 32 (20.0% of the whole sample) did, saying that the campaign had influenced their smoking habits. Of the 69 females, 49 (30.6% of the whole sample) strongly disagreed, disagreed or had no view either way, and the other 20 (12.5% of the whole sample) agreed or strongly agreed that the campaign had influenced their smoking habits.

Therefore, according to the crosstabulation, 32 of the 91 male respondents had a positive response to the attitudinal question. This is 35.2% of the males in the sample, or 20.0% of all respondents.
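The recoding rule and the two ways of expressing the male proportion can be sketched as follows. This is an illustrative sketch only: the function name and the sample attitude scores are hypothetical, and the counts are copied from the crosstabulation rather than computed from the raw data, which is not available here.

```python
def recode_positive(attitude: int) -> int:
    """Return 1 for an Attitude score of 4 or 5, otherwise 0."""
    return 1 if attitude in (4, 5) else 0

# Hypothetical attitude scores, used only to illustrate the recoding rule.
sample_attitudes = [1, 2, 3, 4, 5]
sample_positive = [recode_positive(a) for a in sample_attitudes]

# Counts copied from the Gender * Positive crosstabulation.
male_positive, male_total = 32, 91
female_positive, female_total = 20, 69
grand_total = male_total + female_total

# Proportion of males with a positive response (denominator: males only).
male_share_of_males = male_positive / male_total
# Positive males as a share of the whole sample (the "% of Total" cell).
male_share_of_sample = male_positive / grand_total

print(sample_positive)
print(f"Positive males / all males:       {male_share_of_males:.3f}")
print(f"Positive males / all respondents: {male_share_of_sample:.3f}")
```

The distinction matters because the question asks for the proportion among male respondents, so the denominator is 91 rather than 160.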

b. Construct a 95 percent confidence interval on the proportion of Male customers in the population with a positive response. (Remember that, given the way POSITIVE is coded, the mean is the same as the proportion that had a positive response to Attitude.)

Descriptives
POSITIVE by GENDER                    Male                        Female
                                      Statistic    Std. Error     Statistic    Std. Error
Mean                                  .35          .050           .29          .055
95% Confidence Interval for Mean
  Lower Bound                         .25                         .18
  Upper Bound                         .45                         .40
5% Trimmed Mean                       .34                         .27
Median                                .00                         .00
Variance                              .231                        .209
Std. Deviation                        .480                        .457
Minimum                               0                           0
Maximum                               1                           1
Range                                 1                           1
Interquartile Range                   1                           1
Skewness                              .632         .253           .947         .289
Kurtosis                              -1.637       .500           -1.137       .570

As shown in the descriptives table above, the proportion of male respondents with a positive response is 0.35, with a standard deviation of 0.480. The coefficient of variation for males is therefore 137% (0.480 / 0.35 = 1.37). At the 95% confidence level, the proportion of males in the population with a positive response lies between the lower bound of 0.25 and the upper bound of 0.45, an interval width of 0.20.
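Because POSITIVE is coded 0/1, the interval above is a confidence interval for a proportion and can be rebuilt from the counts. The sketch below assumes the male counts from the crosstabulation (32 positive out of 91) and a normal critical value in place of the t value SPSS uses; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Male counts copied from the crosstabulation.
positive, n = 32, 91
p = positive / n

# Sample standard deviation of a 0/1 variable (n - 1 in the denominator,
# which is what the Descriptives table reports) and its standard error.
sd = math.sqrt(p * (1 - p) * n / (n - 1))
se = sd / math.sqrt(n)

# Assumption: normal critical value rather than the t value with 90 df.
z = NormalDist().inv_cdf(0.975)
lower, upper = p - z * se, p + z * se

print(f"Proportion: {p:.2f}, Std. deviation: {sd:.3f}, Std. error: {se:.3f}")
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

To two decimal places this agrees with the bounds in the table, which suggests the SPSS interval is simply the mean-based interval applied to the 0/1 variable.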

Question No. 04 Testing the normality of spending data

Descriptives
                                      BEFORE (SLR)                AFTER (SLR)
                                      Statistic    Std. Error     Statistic    Std. Error
Mean                                  478.38       12.513         516.63       22.564
95% Confidence Interval for Mean
  Lower Bound                         453.66                      472.06
  Upper Bound                         503.09                      561.19
5% Trimmed Mean                       478.89                      495.83
Median                                465.00                      500.00
Variance                              25050.173                   81458.978
Std. Deviation                        158.272                     285.410
Minimum                               150                         40
Maximum                               770                         2100
Range                                 620                         2060
Interquartile Range                   268                         370
Skewness                              .142         .192           1.603        .192
Kurtosis                              -1.097       .381           5.562        .381

For spending on the product before the campaign (SLR), the descriptives table and the histogram show a mean of 478.38 and a median of 465.00, which are not equal, and a standard deviation of 158.272. The coefficient of variation before the campaign is 33.08%. Skewness is 0.142 and kurtosis is -1.097, and neither is equal to zero. The histogram of spending before the campaign also has many peaks and valleys, so the curve is not bell-shaped, which again indicates that the data are not normally distributed.

For spending after the campaign (SLR), the mean is 516.63 and the median is 500.00, which are not equal, and the standard deviation is 285.410. The coefficient of variation after the campaign is 55.24%. Skewness is 1.603 and kurtosis is 5.562, and neither is equal to zero. The histogram of spending after the campaign also has many peaks and valleys, so the curve is not bell-shaped, which again indicates that the data are not normally distributed.

a. Perform a Wilcoxon Rank Sum test of the null hypothesis that the distributions of spending before being exposed to the advertising campaign are the same for Full-time workers and Part-time workers in the population of respondents. Perform the hypothesis test at 5% level of significance.

As the data in the table and histograms above are not normal, we should continue with a non-parametric test, the Wilcoxon Rank Sum Test.

H0: The distributions of spending before the advertising campaign are the same for full-time employees and part-time employees in the population of respondents.
H1: The distributions of spending before the advertising campaign are not the same for full-time employees and part-time employees in the population of respondents.

Testing the difference between Full-Time Service and Part-Time Service.

Ranks
           JOB                       N      Mean Rank    Sum of Ranks
BEFORE     part time employment      71     91.94        6527.50
           full time employment      89     71.38        6352.50
           Total                     160

The table above shows that, for spending on the product before the campaign, 71 employees are in part-time service, with a mean rank of 91.94 and a sum of ranks of 6527.50, and 89 employees are in full-time service, with a mean rank of 71.38 and a sum of ranks of 6352.50. The mean ranks and sums of ranks are therefore not the same for part-time and full-time employees, which suggests a difference between the two groups. A difference in the sample ranks alone is not sufficient evidence, however, so we continue with the non-parametric Wilcoxon Rank Sum test statistic to test whether the difference is significant.

Test Statistics (a)
                             BEFORE
Mann-Whitney U               2347.500
Wilcoxon W                   6352.500
Z                            -2.791
Asymp. Sig. (2-tailed)       .005

a. Grouping Variable: JOB
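The values in the test statistics table can be approximately reproduced from the rank sums in the Ranks table. The sketch below uses the standard large-sample normal approximation without a tie correction, which is an assumption: SPSS corrects for ties, so its Z of -2.791 differs slightly in the third decimal place. Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Group sizes and rank sums copied from the Ranks table.
n_part, rank_sum_part = 71, 6527.5
n_full, rank_sum_full = 89, 6352.5

mean_rank_part = rank_sum_part / n_part
mean_rank_full = rank_sum_full / n_full

# Mann-Whitney U for each group: rank sum minus its minimum possible value.
u_part = rank_sum_part - n_part * (n_part + 1) / 2
u_full = rank_sum_full - n_full * (n_full + 1) / 2
u = min(u_part, u_full)

# Normal approximation to the distribution of U under H0 (no tie correction).
total = n_part + n_full
mu_u = n_part * n_full / 2
sigma_u = math.sqrt(n_part * n_full * (total + 1) / 12)
z = (u - mu_u) / sigma_u
p_two_tailed = 2 * NormalDist().cdf(-abs(z))

print(f"Mean ranks: part time {mean_rank_part:.2f}, full time {mean_rank_full:.2f}")
print(f"U = {u}, Z = {z:.3f}, p = {p_two_tailed:.3f}")
print("Reject H0 at the 5% level" if p_two_tailed < 0.05 else "Do not reject H0")
```

The smaller U corresponds to the full-time group, which is why the Wilcoxon W reported in the table equals the full-time sum of ranks.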

Decision Rule: If the significance value is less than 0.05, we reject the null hypothesis (sig < 0.05). Here the significance value is 0.005, which is less than 0.05, so we reject the null hypothesis. Therefore, at the 5% level of significance, we have enough evidence to conclude that, before exposure to the advertising campaign, the distributions of spending are not the same for full-time employees and part-time employees in the population of respondents.

b. Perform a Wilcoxon Signed Rank Test of the null hypothesis that spending before the advertising campaign exposure is the same as after for the population of customers.

H0: There is no difference between spending before and after the advertising campaign.
H1: There is a difference between spending before and after the advertising campaign.

TABLE 01 Ranks
                                     N        Mean Rank    Sum of Ranks
AFTER - BEFORE    Negative Ranks     95 (a)   62.57        5944.00
                  Positive Ranks     60 (b)   102.43       6146.00
                  Ties               5 (c)
                  Total              160

a. AFTER < BEFORE  b. AFTER > BEFORE  c. AFTER = BEFORE

The ranks table compares spending before and after the advertising campaign across all 160 observations. The negative ranks have a mean rank of 62.57 and a sum of ranks of 5944.00, while the positive ranks have a mean rank of 102.43 and a sum of ranks of 6146.00. The mean ranks and sums of ranks differ, which suggests a difference between spending after the campaign and spending before it. As this alone is not sufficient evidence, we continue with the test statistic.

TABLE 02 Test Statistics (a)
                             AFTER - BEFORE
Z                            -.181 (b)
Asymp. Sig. (2-tailed)       .857

a. Wilcoxon Signed Ranks Test  b. Based on negative ranks.
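The Z value and significance in Table 02 can be approximately recovered from the rank sums in Table 01. The sketch below uses the usual large-sample normal approximation for the signed rank statistic, drops the 5 ties, and applies no tie correction; these are assumptions, so the result matches the SPSS figure only to about two decimal places. Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Counts and rank sums copied from TABLE 01 (ties are excluded from ranking).
n_negative, sum_negative = 95, 5944.0
n_positive, sum_positive = 60, 6146.0
n = n_negative + n_positive  # 155 non-tied pairs

# Under H0 the smaller rank sum has this mean and standard deviation.
t_stat = min(sum_negative, sum_positive)
mu_t = n * (n + 1) / 4
sigma_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)

z = (t_stat - mu_t) / sigma_t
p_two_tailed = 2 * NormalDist().cdf(-abs(z))

print(f"Z = {z:.3f}, p = {p_two_tailed:.3f}")
print("Reject H0 at the 5% level" if p_two_tailed < 0.05 else "Do not reject H0")
```

The two rank sums are close to their common expected value, which is why Z is near zero even though the mean ranks look quite different.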

Decision Rule: If the significance value in Table 02 is less than 0.05, we reject the null hypothesis. Here the significance value is 0.857, which is greater than 0.05, so we cannot reject the null hypothesis. Therefore, at the 5% level of significance, there is not enough evidence to conclude that spending differs before and after the advertising campaign, and we retain the null hypothesis.

Question No. 05

a. Perform a chi-square goodness of fit test of the null hypothesis that positive, neutral and negative responses are equally likely amongst the population.

H0: The proportions of the three attitude types (positive, neutral and negative) are the same.
H1: The proportion of at least one attitude type (positive, neutral or negative) is different.

TABLE 01 Chi-Square for the proportion

attitude_1
              Observed N    Expected N    Residual
Positive      52            53.3          -1.3
Neutral       36            53.3          -17.3
Negative      72            53.3          18.7
Total         160

As shown in the attitude_1 table, of the 160 observations, 52 people have a positive attitude, 36 have a neutral attitude and 72 have a negative attitude. The expected number is 53.3 in each of the three categories, but the observed counts are not the same, which suggests a difference between the three groups. As this alone is not sufficient evidence, we continue with the chi-square test.

Test Statistics
                  attitude_1
Chi-Square        12.200 (a)
df                2
Asymp. Sig.       .002

a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 53.3.
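The chi-square statistic and its significance can be recomputed from the observed counts. This is a minimal sketch using only the standard library; it relies on the fact that, for 2 degrees of freedom, the chi-square upper-tail probability is exactly exp(-x / 2), so it would need a different formula for any other number of categories. Variable names are illustrative.

```python
import math

# Observed counts copied from the attitude_1 table.
observed = {"Positive": 52, "Neutral": 36, "Negative": 72}

# Under H0 the three responses are equally likely.
total = sum(observed.values())
expected = total / len(observed)

residuals = {name: count - expected for name, count in observed.items()}
chi_square = sum(res ** 2 / expected for res in residuals.values())
df = len(observed) - 1

# Upper-tail probability; this closed form is valid only for df = 2.
p_value = math.exp(-chi_square / 2)

print(f"Expected N per category: {expected:.1f}")
print(f"Chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
print("Reject H0 at the 5% level" if p_value < 0.05 else "Do not reject H0")
```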

Decision Rule According to the above test statistic table, if the significance value is less than 0.05, we can reject the null hypothesis. Hence, according to the above table the significance value is 0.002...