BSB123 Assignment PDF

Title BSB123 Assignment
Course Data Analysis
Institution Queensland University of Technology
Pages 11

Summary

BSB123 Research Assignment - Grade: 7. ...



BSB123 DATA ANALYSIS Assessment Item 2 – Research Report (2018 S1)

Jessica Seddon N9816933

Task 1: Boxplots and T-tests

1. (a) Construct separate boxplots of salaries for male and female academics, and compare their distributions (central location, spread and skewness).

Figure 1.1: Boxplot of Male Academic Salaries per Year

Figure 1.2: Boxplot of Female Academic Salaries per Year

Figure 1.3: Boxplot Comparison for Male and Female Academic Salaries per Year

CENTRAL LOCATION OF DATA

Measures of central tendency, such as the mean, median and mode, indicate the centre or middle of a dataset.

Mean

The arithmetic mean is the most frequently used measure of central location. It is the sum of all observations divided by the number of observations. The Excel function =AVERAGE was used to calculate the mean annual salary for both female and male academics. The following values were produced:

Mean of Female Salaries = $81,665.45
Mean of Male Salaries = $101,135.99

The higher mean for male salaries compared with female salaries ($101,135.99 and $81,665.45 respectively) indicates that, on average, male academics earn a higher annual salary than female academics.
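The report does not reproduce the raw salary data, so the following sketch uses a small, made-up list purely to show what =AVERAGE computes: the sum of the observations divided by their count. The function name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations,
    the same quantity Excel's =AVERAGE returns for a range of numbers."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Illustrative values only; these are not the assignment's salary data.
example = [70_000, 80_000, 90_000, 120_000]
print(arithmetic_mean(example))  # 90000.0
```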

Median

Once all the observations of a dataset are sorted in ascending order, the median is the middle value (or the average of the two middle values when the number of observations is even). The Excel function =MEDIAN was used to calculate the median annual salary for both female and male academics. The following values were produced:

Median of Female Salaries = $78,472
Median of Male Salaries = $99,262

The median separates the higher half of a dataset from the lower half. Thus, the higher median for male academic salaries ($99,262) compared with female academic salaries ($78,472) indicates that a typical male academic earns a higher annual salary than a typical female academic.
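The even/odd rule matters here because the female sample has 66 observations (even) and the male sample has 133 (odd). The following is a minimal sketch of that rule, again on illustrative values rather than the assignment data. The name `median` is illustrative.

```python
def median(values):
    """Middle value of the sorted data; for an even count, the average of
    the two middle values (consistent with Excel's =MEDIAN)."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the pair


print(median([90_000, 70_000, 120_000]))          # 90000
print(median([90_000, 70_000, 120_000, 80_000]))  # 85000.0
```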

SPREAD OF DATA

Spread, or dispersion, describes how widely the observations vary around their centre. The variance and standard deviation are common measures of it. The coefficient of variation (CV) is a relative measure of variation (dispersion). It is calculated with the following formula, where $s$ is the standard deviation and $\bar{x}$ is the mean of the selected dataset:

$$CV = \frac{s}{\bar{x}}$$

The standard deviation was calculated in Excel using the sample standard deviation function (=STDEV.S) and the mean was calculated using the =AVERAGE function.

CV for Female Academics:

$$CV_{\text{Female}} = \frac{s}{\bar{x}} = \frac{15{,}010.24}{81{,}665.45} = 0.184$$

CV for Male Academics:

$$CV_{\text{Male}} = \frac{s}{\bar{x}} = \frac{25{,}215.49}{101{,}135.99} = 0.249$$

Male salaries therefore have a higher CV (0.249) than female salaries (0.184), indicating greater relative dispersion in male academic salaries.
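The CV arithmetic can be checked directly from the summary statistics reported in this section. This is only a sketch. The function name is illustrative, and the inputs are the rounded standard deviations and means given above.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = s / x-bar: the standard deviation expressed relative to the mean."""
    return std_dev / mean


# Summary statistics reported in this section.
cv_female = coefficient_of_variation(15010.24, 81665.45)
cv_male = coefficient_of_variation(25215.49, 101135.99)
print(round(cv_female, 3), round(cv_male, 3))  # 0.184 0.249
```

Because the CV is unit-free, it compares variability across two groups whose means differ, which a raw standard deviation does not.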

SKEWNESS OF DISTRIBUTION

For female academics, the mean ($81,665.45) is greater than the median ($78,472). The same applies to male academics, whose mean ($101,135.99) is greater than the median ($99,262). When a distribution has a long right tail of unusually large values, those values pull the mean to the right of the median, producing a positive (right) skew. Both salary distributions are therefore positively skewed, as displayed by Figure 1.4.

Figure 1.4: Positive Skew Representative for Both Female and Male Academics
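The mean-versus-median comparison above can be written as a small check. As an extra measure of how strong the skew is, the sketch also computes Pearson's second (median) skewness coefficient, 3(mean − median)/s. The report itself does not use this coefficient, so treat it as an added illustration. The inputs are the means, medians and standard deviations reported in this section.

```python
def skew_direction(mean, median):
    """Classify skew from the mean-median comparison."""
    if mean > median:
        return "positive (right) skew"
    if mean < median:
        return "negative (left) skew"
    return "approximately symmetric"


def pearson_median_skewness(mean, median, std_dev):
    """Pearson's second skewness coefficient: 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev


groups = [
    ("Female", 81665.45, 78472, 15010.24),
    ("Male", 101135.99, 99262, 25215.49),
]
for label, m, md, s in groups:
    print(label, skew_direction(m, md), round(pearson_median_skewness(m, md, s), 2))
# Female positive (right) skew 0.64
# Male positive (right) skew 0.22
```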

1. (b) Test whether male academics, on average, earn more than their female counterparts at the 1% significance level.

Let $\mu_1$ be the population mean salary of female academics and $\mu_2$ the population mean salary of male academics.

$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 < \mu_2$$

 = 1%  p-value <   reject H0 p-value >   accept H0

Females: $\bar{x}_1 = 81{,}665.455$, $s_1 = 15{,}010.238$, $n_1 = 66$

Males: $\bar{x}_2 = 101{,}135.985$, $s_2 = 25{,}215.48778$, $n_2 = 133$

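If $X_1$ and $X_2$ are normally distributed, then $\bar{x}_1$ and $\bar{x}_2$ are normally distributed, and so is $\bar{x}_1 - \bar{x}_2$, whose mean is $\mu_1 - \mu_2 = 0$ under $H_0$.

The standard error $s_{\bar{x}_1 - \bar{x}_2}$ depends on the assumption made about the population variances $\sigma_1^2$ and $\sigma_2^2$.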
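Section 1(b) breaks off before the test results. The following is therefore only a sketch. It assumes the standard textbook forms of the two-sample t statistic: the pooled-variance version (equal variances assumed) and the Welch version (unequal variances, with Welch-Satterthwaite degrees of freedom). It plugs in the summary statistics listed above. The function names `pooled_t` and `welch_t` are illustrative.

```python
import math


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance (pooled) two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))           # standard error of x-bar1 - x-bar2
    return (mean1 - mean2) / se, df


def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Unequal-variance (Welch) t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, df


female = (81665.455, 15010.238, 66)    # x-bar1, s1, n1
male = (101135.985, 25215.48778, 133)  # x-bar2, s2, n2

t_eq, df_eq = pooled_t(*female, *male)
t_uneq, df_uneq = welch_t(*female, *male)
print(f"pooled: t = {t_eq:.3f}, df = {df_eq}")
print(f"Welch:  t = {t_uneq:.3f}, df = {df_uneq:.1f}")
```

Because $H_1$ is $\mu_1 < \mu_2$, the test is left-tailed. A p-value could be obtained in Excel with `T.DIST(t, df, TRUE)`, or the statistic could be compared with the left-tail critical value `T.INV(0.01, df)`.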

Rule of thumb (ratio of the sample variances):

$$\frac{s_1^2}{s_2^2}$$

provides table) 2(a) assume equal variances; 2(b) assume equal variances; 2(c) assume unequal variances.
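This section does not state the cut-off used for the variance rule of thumb. The following sketch therefore only computes the ratio from the reported standard deviations, both as female over male variance and as larger over smaller variance. The threshold it is compared against should come from the course material.

```python
s_female = 15010.238   # s1
s_male = 25215.48778   # s2

ratio_f_over_m = s_female**2 / s_male**2
ratio_larger_over_smaller = max(s_female, s_male) ** 2 / min(s_female, s_male) ** 2
print(round(ratio_f_over_m, 3), round(ratio_larger_over_smaller, 2))  # 0.354 2.82
```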

Task 2 (Regression Analysis)

3. Before you conduct any regression analysis, use Excel to construct a correlation matrix of all the quantitative variables in the dataset. Based on the correlation matrix, comment briefly on the associations between Salary and the other quantitative variables.

Figure 2.1: Correlation Matrix of Quantitative Variables in Data Set (Male and Female)

Both Age and Years of Service have a positive relationship with Salary ($/year). The correlation coefficient of 0.407 indicates a moderate positive linear association between Salary ($/year) and Age. The correlation matrix also reveals a slightly stronger association between Salary ($/year) and Years of Service, with a coefficient of 0.427.
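Each entry in the correlation matrix is a Pearson correlation coefficient, which is what Excel's CORREL function and the Data Analysis "Correlation" tool compute. The following is a minimal stdlib sketch of that coefficient. The data are made up for illustration because the raw dataset is not reproduced here, and `pearson_r` is an illustrative name.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only (not the assignment data).
years = [1, 3, 5, 8]
salary = [70_000, 78_000, 83_000, 97_000]
print(round(pearson_r(years, salary), 3))
```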

4. Conduct a stepwise regression according to the following procedure:

Step 1: Gender only
Step 2: Gender and School
Step 3: Gender, School and Rank
Step 4: Gender, School, Rank and Years of Service
Step 5: Gender, School, Rank, Years of Service and Age

Reference variables: Health and ASSO. Categorical variables: School and Rank.

Figure 4.1: Stepwise Regression
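Categorical predictors such as School and Rank enter the regression as 0/1 dummy variables, with one level of each left out as the reference (Health and ASSO here). The following sketch shows that coding and the five predictor sets. The labels "Business", "Law", "LECT" and "PROF" are hypothetical placeholders, not the dataset's actual categories, and `dummy_code` is an illustrative name.

```python
def dummy_code(values, reference):
    """Return {level: [0/1 indicators]} for every level except the reference."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Predictor blocks entered at each step, as listed in the task.
STEPS = [
    ["Gender"],
    ["Gender", "School"],
    ["Gender", "School", "Rank"],
    ["Gender", "School", "Rank", "Years of Service"],
    ["Gender", "School", "Rank", "Years of Service", "Age"],
]

# Hypothetical labels; only the reference levels "Health" and "ASSO"
# come from the assignment.
schools = ["Health", "Business", "Law", "Health"]
ranks = ["ASSO", "LECT", "PROF", "ASSO"]

print(dummy_code(schools, reference="Health"))
# {'Business': [0, 1, 0, 0], 'Law': [0, 0, 1, 0]}
print(dummy_code(ranks, reference="ASSO"))
# {'LECT': [0, 1, 0, 0], 'PROF': [0, 0, 1, 0]}
```

Omitting the reference level avoids perfect collinearity with the intercept. Each remaining coefficient is then read as the difference from the reference category.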

5. Based on the regression output obtained in Step 4, answer the following:

a) Which summary measure in the regression output is used to assess the overall adequacy of the model? Comment on the overall adequacy of the model obtained in Step 4.

b) For each of the four independent variables, fully interpret the regression coefficients and comment on their statistical significance. (In discussing the statistical significance of a regression coefficient, you have to justify your choice of a one- or two-tail test.)

6. Based on the correlation between Age and Salary, did you expect Age to have a statistically significant effect on Salary? In Step 5, is the statistical significance of the regression coefficient of Age as expected? Discuss fully....

