
Title Analysis of Covariance SPSS example
Author Laura Andrews
Course Educational Statistics II
Institution Kent State University



Description

Analysis of Covariance: SPSS Procedures and Results

This file covers how to run Analysis of Covariance (ANCOVA) using SPSS. The data file we will be using is titled “hourlywagedata.sav” and contains data from nurses. The file includes four variables; we will focus on three of them: one independent variable, “position”; one dependent variable, “hourwage”; and one covariate, “yrsscale”. We will run an F test to determine whether nurses in different positions (office vs. hospital) earned different pay, after controlling for years of experience*.

*Technically, years of experience in this example is ordinal, since the participants have been grouped into year ranges. You should only use covariates that are measured at an interval or ratio level of measurement. Years of experience would normally be a ratio-level variable if it had not been placed into year groupings. For this example, however, we will treat the variable as if it were interval/ratio.

ANCOVA has additional assumptions that need to be tested. ANCOVA does require the assumptions of independence, homogeneity of variance, and normality (which we’ve already covered, so they will not be covered here), but there are three additional assumptions that need to be met, and violations of these assumptions are much more serious than violations of the first three. The first new assumption for ANCOVA is linearity between the covariate and the dependent variable. Because scores on the dependent variable are adjusted based on the covariate in ANCOVA, if the relationship is nonlinear (i.e., curvilinear), then the adjustments will not be correct.

You can test this assumption using multiple methods. I personally like the “Curve Estimation” function in SPSS under the “Regression” heading: 1) Select “Analyze” from the list of menu options at the top of the screen. Then select “Regression” and “Curve Estimation”.

2) After the “Curve Estimation” window opens, place “Hourly Salary” in the “Dependent(s)” field and “Years Experience” in the “Independent” field. I would suggest then selecting at least “Quadratic” and “Cubic” relationships to be tested. You may select others (if you are aware of what types of curves are being tested), but I’ve generally found those two to be sufficient in most cases.

3) Select “OK”. In the output file, you should see the following table (among others):

Model Summary and Parameter Estimates
Dependent Variable: Hourly Salary

                     Model Summary                 Parameter Estimates
Equation    R Square  F       df1  df2  Sig.   Constant  b1     b2     b3
Linear      .068      70.557  1    968  .000   17.457    .762
Quadratic   .068      35.279  2    967  .000   17.291    .874   -.016
Cubic       .069      23.814  3    966  .000   16.194    2.138  -.425  .03

The independent variable is Years Experience.

4) There are a couple of things to examine here. First, which relationships are statistically significant? Here, all three (linear, quadratic, and cubic) are statistically significant, which would suggest that any of the three types of curves fits the data. So, the next thing to look at is the “R Square” value. This value will tend to be larger as you go from linear to quadratic to cubic regardless of which relationship best fits the data (this is because the quadratic and cubic equations have more regression coefficients, as can be seen under “Parameter Estimates”; this generally leads to more variance explained, even if only slightly). However, you want to consider the size of the change. Here, the R square value for the linear relationship is .068. It is approximately the same for the quadratic relationship, and improves only slightly to .069 for the cubic relationship. This is a very small change overall, so you can assume that a linear relationship is appropriate here, as the fit of the quadratic and cubic relationships does not improve much over linear. As a side note, determining linearity is not as clear and straightforward as other statistical tests; it is possible for a linear and a nonlinear relationship to both appear to fit the data reasonably well. When this happens, it is up to the judgment of the researcher to make the call concerning whether or not this assumption has been met.
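For intuition about what that linear “R Square” represents: it is simply the squared Pearson correlation between the covariate and the dependent variable. A minimal pure-Python sketch of the computation, using made-up wage and experience values rather than the actual nurse data:

```python
def r_squared_linear(x, y):
    """Squared Pearson correlation: the R^2 of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical years-of-experience and hourly-wage values (illustration only):
years = [1, 2, 3, 4, 5, 6]
wage = [16.5, 17.0, 18.2, 18.9, 20.1, 20.4]
r2 = r_squared_linear(years, wage)  # close to 1: a strongly linear relationship
```

In the nurse data the linear R Square is only .068, meaning experience explains about 7% of the variance in wages, but the relationship that does exist is essentially linear.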

The second assumption to be tested in ANCOVA is homogeneity of regression coefficients. Essentially what this means is that the correlation between the covariate and dependent variable is consistent across the treatment groups or levels of the independent variable (i.e., the regression equations are parallel across groups). To conduct the test of the assumption of homogeneity of regression coefficients, follow the steps below: 1) Select “Analyze,” then “General Linear Model,” and then “Univariate”.

2) When the “Univariate” window pops up, you want to select “Hourly Salary” as the dependent variable, “Nurse Type” as the fixed factor, and “Years Experience” as the Covariate.

3) The next thing you have to do is select “Model”. Select “Custom” and place both the “position” and “yrsscale” variables in the “Model” field. Then, highlight both “position” and “yrsscale” simultaneously (you will have to use “Control” on a PC or “Command” on a Mac) and move the pair over to the “Model” field. This places the interaction between the “position” and “yrsscale” variables (“position*yrsscale”) in the analysis. This is the effect you want to test for statistical significance (however, you do not want it to be statistically significant).

4) After selecting “Continue” and “OK,” you should obtain the following table in the output file:

Tests of Between-Subjects Effects
Dependent Variable: Hourly Salary

Source               Type III Sum of Squares  df   Mean Square  F         Sig.
Corrected Model      1768.410a                3    589.470      44.663    .000
Intercept            26941.048               1    26941.048    2041.278  .000
position             191.503                 1    191.503      14.510    .000
yrsscale             1010.989                1    1010.989     76.601    .000
position * yrsscale  22.492                  1    22.492       1.704     .192
Error                12749.394               966  13.198
Total                408679.974              970
Corrected Total      14517.804               969

a. R Squared = .122 (Adjusted R Squared = .119)

5) The only F test we are concerned with here is the test related to “position*yrsscale”. In this case, F (1, 966) = 1.704, p = .192, so the interaction is not statistically significant. This indicates that the assumption of homogeneity of regression slopes has been met. If this interaction had been statistically significant, then the relationship between “Years Experience” and “Hourly Salary” would have varied across “Nurse Type”. In other words, the correlations between the covariate and the dependent variable would not be consistent across levels of the independent variable.
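As a quick sanity check, the interaction F can be reproduced directly from the table: an F ratio is the effect’s mean square divided by the error mean square. A short sketch using the values above:

```python
# Values taken from the Tests of Between-Subjects Effects table above
ss_interaction = 22.492   # position * yrsscale sum of squares
df_interaction = 1
ms_interaction = ss_interaction / df_interaction
ms_error = 13.198         # Error mean square (12749.394 / 966)

f_ratio = ms_interaction / ms_error
print(round(f_ratio, 3))  # 1.704, matching the table
```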

The final assumption to test is the assumption of independence of the independent variable and covariate. Basically, all this means is that the levels of the independent variable do not differ on the covariate. You can test this by running a simple one-way ANOVA, with the covariate as the dependent variable: 1) Select “Analyze,” then “General Linear Model,” and then “Univariate”. 2) In the “Univariate” window, select “Years Experience” as the dependent variable, and “Nurse Type” as the fixed factor.

3) Select “OK.” In the output file, examine the following table:

Tests of Between-Subjects Effects
Dependent Variable: Years Experience

Source           Type III Sum of Squares  df    Mean Square  F         Sig.
Corrected Model  4.319a                   1     4.319        2.478     .116
Intercept        10989.867                1     10989.867    6305.643  .000
position         4.319                    1     4.319        2.478     .116
Error            1739.377                 998   1.743
Total            14332.000                1000
Corrected Total  1743.696                 999

a. R Squared = .002 (Adjusted R Squared = .001)

4) We are concerned with the F value associated with the “position” variable. Here, it is not statistically significant as our p value of .116 is greater than our alpha level of .05. This indicates that the groups do not differ on the covariate, and the assumption has been met.

Now that we have tested all of the assumptions, we can run Analysis of Covariance. To run ANCOVA, follow the steps below: 1) Select “Analyze” from the list of menu options at the top of the screen. 2) Then, scroll down to “General Linear Model” and select “Univariate”.

3) When the “Univariate” window pops up, you want to select “Hourly Salary” as the dependent variable, “Nurse Type” as the fixed factor, and “Years Experience” as the Covariate.

4) Before selecting “OK” to run the analysis, you will typically want some options, such as descriptive statistics, so select “Options”.

5) In the “Options” window, select “Descriptive statistics” and “Homogeneity tests” at minimum. You may also want “Estimates of effect size” and “Observed power”. Also, move “position” into “Display Means for:” under the “Estimated Marginal Means”. This will provide you with the means after they have been adjusted based on scores on the covariate. I also checked “Compare main effects” and changed “Confidence interval adjustment” to “Sidak.” This will run post-hoc tests on the adjusted means on the dependent variable. While this is not technically needed here, as there are only two levels of the independent variable, I’ve included it for the sake of illustration. If you have 3 or more levels of the independent variable, and the overall F test is statistically significant, then you will want to conduct these tests (Do not use the default option of LSD tests since that test does not adjust alpha for multiple comparisons).

6) Next, select “Continue” and then “OK”. A results file will then be created with the following information (the asterisks indicate my commentary).
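For reference, the Sidak adjustment selected above controls the familywise error rate by shrinking the per-comparison alpha (or, equivalently, by inflating each p value). A sketch of both forms, under the simplifying assumption of m independent comparisons:

```python
def sidak_alpha(alpha, m):
    """Per-comparison alpha that keeps the familywise error rate at `alpha`
    across m comparisons."""
    return 1 - (1 - alpha) ** (1 / m)

def sidak_p(p, m):
    """Sidak-adjusted p value for one of m comparisons."""
    return 1 - (1 - p) ** m

# With 3 groups there are 3 pairwise comparisons:
alpha_per = sidak_alpha(0.05, 3)   # about .0170, stricter than .05
p_adj = sidak_p(0.02, 3)           # about .0588: no longer significant at .05
```

This is why an unadjusted (LSD) comparison can look significant while the Sidak-adjusted one does not; the default LSD option makes no such correction.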

Univariate Analysis of Variance

Between-Subjects Factors
               Value Label  N
Nurse Type  0  Hospital     670
            1  Office       300

*The table below provides the observed means (i.e., unadjusted for the covariate) and the standard deviations.

Descriptive Statistics
Dependent Variable: Hourly Salary

Nurse Type  Mean     Std. Deviation  N
Hospital    20.7154  3.43873         670
Office      18.9137  4.45501         300
Total       20.1582  3.87069         970

*The table below provides the test for homogeneity of variance. As you can see here, we’ve violated that assumption, as the Sig. value (p value) is less than .05. So, we should use a more conservative alpha level, such as .025.

Levene’s Test of Equality of Error Variancesa
Dependent Variable: Hourly Salary

F       df1  df2  Sig.
30.179  1    968  .000

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + yrsscale + position
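Levene’s test is essentially a one-way ANOVA run on the absolute deviations of each score from its group mean, so larger F values indicate more unequal spread across groups. A self-contained sketch (run here on tiny made-up groups, not the nurse data):

```python
def levene_f(groups):
    """Levene's test statistic: a one-way ANOVA F computed on |score - group mean|."""
    # Absolute deviations from each group's own mean
    devs = []
    for g in groups:
        m = sum(g) / len(g)
        devs.append([abs(x - m) for x in g])

    all_devs = [d for g in devs for d in g]
    grand = sum(all_devs) / len(all_devs)
    k = len(groups)
    n = len(all_devs)

    # Between-group and within-group sums of squares on the deviations
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in devs)
    ss_within = sum((d - sum(g) / len(g)) ** 2 for g in devs for d in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# One group with no spread, one with a lot of spread:
f = levene_f([[10, 10, 10, 10], [0, 5, 10, 15, 20]])
print(round(f, 1))  # 8.0
```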

*The table below provides the ANCOVA results. From this, we can see that the covariate, “yrsscale,” is significantly related to the dependent variable of “Hourly Salary”. While this is certainly a good thing, it’s not the focus of the research, since covariates are generally chosen because they are known to be related to the dependent variable. More importantly, the “position” variable is statistically significant, F (1, 967) = 57.513, p < .001, indicating that there is a difference, at the population level, between the two nurse types (office vs. hospital) after controlling for the covariate.

Tests of Between-Subjects Effects
Dependent Variable: Hourly Salary

Source           Type III Sum of Squares  df   Mean Square  F         Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powerb
Corrected Model  1745.917a                2    872.959      66.094    .000  .120                 132.189             1.000
Intercept        32948.182                1    32948.182    2494.611  .000  .721                 2494.611            1.000
yrsscale         1073.231                 1    1073.231     81.258    .000  .078                 81.258              1.000
position         759.610                  1    759.610      57.513    .000  .056                 57.513              1.000
Error            12771.887                967  13.208
Total            408679.974               970
Corrected Total  14517.804                969

a. R Squared = .120 (Adjusted R Squared = .118)
b. Computed using alpha =
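The “Partial Eta Squared” column can be reproduced from the sums of squares: for each effect it is SS_effect / (SS_effect + SS_error). A quick sketch using the values from the table above:

```python
def partial_eta_squared(ss_effect, ss_error):
    """Proportion of DV variance attributable to the effect, partialling out
    the variance explained by the other effects in the model."""
    return ss_effect / (ss_effect + ss_error)

# Sums of squares from the Tests of Between-Subjects Effects table above
ss_error = 12771.887
eta_position = partial_eta_squared(759.610, ss_error)   # about .056
eta_yrsscale = partial_eta_squared(1073.231, ss_error)  # about .078
```

So even though “position” is highly significant, it accounts for a fairly modest share (about 5.6%) of the remaining variance in hourly salary.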

Estimated Marginal Means

Nurse Type

*The table below provides the means of the two nurse groups adjusted for the covariate. If you compare this to the observed means above, you will note that the “Hospital” group had their group mean adjusted upward slightly, while the “Office” group had their group mean adjusted downward slightly. This would likely be the case if the “Office” group outscored the “Hospital” group on the covariate. The amount of adjustment that occurs depends on two factors: 1) the size of the difference between the two groups on the covariate (if there is no difference, then there is no adjustment), and 2) the correlation between the covariate and dependent variable (the larger the correlation, the larger the adjustment, provided the groups do differ on the covariate).

Estimates
Dependent Variable: Hourly Salary

                                 95% Confidence Interval
Nurse Type  Mean     Std. Error  Lower Bound  Upper Bound
Hospital    20.751a  .140        20.475       21.027
Office      18.834a  .210        18.422       19.246

a. Covariates appearing in the model are evaluated at the following values: Years Experience = 3.54.
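The adjustment logic can be sketched as a formula: each group’s mean is shifted along the pooled regression slope to where it would fall if that group sat at the grand mean of the covariate. The observed means and the grand covariate mean (3.54) below come from the output above; the slope and the group covariate means are hypothetical values chosen only to illustrate the direction of the adjustment.

```python
def adjusted_mean(observed_mean, slope, group_cov_mean, grand_cov_mean):
    """ANCOVA-adjusted group mean: shift the observed mean along the pooled
    within-group regression slope to the grand mean of the covariate."""
    return observed_mean - slope * (group_cov_mean - grand_cov_mean)

slope = 0.76       # hypothetical pooled within-group slope
grand_cov = 3.54   # grand covariate mean, from the footnote above

# Hypothetical group covariate means: Office slightly above the grand mean,
# Hospital slightly below, consistent with the direction of adjustment seen above.
adj_hospital = adjusted_mean(20.7154, slope, 3.49, grand_cov)  # adjusted upward
adj_office = adjusted_mean(18.9137, slope, 3.65, grand_cov)    # adjusted downward
```

With these illustrative numbers, the hospital mean moves up and the office mean moves down, mirroring the pattern in the Estimates table.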

* The table below provides the post-hoc tests, or pairwise comparisons, between the levels of the independent variable based on the adjusted means. Here, we only have two levels, so we did not really need to run these, since the results will mirror the overall F test above. However, with 3 or more groups and a statistically significant F value, you would need to run this test to determine which groups significantly differ from each other.

Pairwise Comparisons
Dependent Variable: Hourly Salary

                                Mean                                95% Confidence Interval for Differencea
(I) Nurse Type  (J) Nurse Type  Difference (I-J)  Std. Error  Sig.a  Lower Bound  Upper Bound
Hospital        Office          1.917*            .253        .000   1.421        2.413
Office          Hospital        -1.917*           .253        .000   -2.413       -1.421

Based on estimated marginal means
*. The mean difference is significant at the
a. Adjustment for multiple comparisons: Sidak.

Univariate Tests
Dependent Variable: Hourly Salary

Source    Sum of Squares  df   Mean Square  F       Sig.  Partial Eta Squared  Noncent. Parameter  Observed Powera
Contrast  759.610         1    759.610      57.513  .000  .056                 57.513              1.00
Error     12771.887       967  13.208

The F tests the effect of Nurse Type. This test is based on the linearly independent pairwise comparisons among the estimated marginal means.
a. Computed using alpha =...

