
Title: SPSS Daten auswerten
Author: Paulina Gronbach
Course: Marketing Methods & Analysis
Institution: Otto-von-Guericke-Universität Magdeburg

Summary

Important tables from the SPSS output and how to interpret the data.



Cluster Analysis – Hierarchical

Optional: check the sample size. For equal clusters: 10 × number of clustering variables; in general: 70 × number of clustering variables.

1. Look at the correlation matrix (only above or below the 1's). If a value is > 0.9, kick out the variable.

2. Look at:
   a) Agglomeration Schedule
   b) Scree plot: choose the elbow (not the one below!)
   c) Dendrogram: find out how many clusters to use (for k-means) → look for a long distance where nothing happens, then count the horizontal lines on the left (see the sketch below for a non-SPSS reproduction)
   d) Cluster Membership for different cluster solutions

3. Look at the Descriptive Statistics → shows the mean values of the clustering variables within each cluster → use this to determine an umbrella term for each cluster.

[SPSS Descriptive Statistics output: mean values of the clustering variables (the e's) for Cluster 1, Cluster 2 and Cluster 3; missing values can be ignored.]
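A minimal sketch of steps 1–3 outside SPSS (Python with scipy, made-up standardized clustering variables e1–e3; file names and column names are assumptions): Ward linkage produces the agglomeration schedule, the dendrogram helps pick the number of clusters, and the group means mirror the Descriptive Statistics table.

```python
# Sketch (Python/scipy, not SPSS): hierarchical clustering on hypothetical data.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(60, 3)), columns=["e1", "e2", "e3"])  # stand-ins for the e's

Z = linkage(X, method="ward")   # agglomeration schedule (merge distances)
dendrogram(Z)                   # look for the longest stretch where nothing merges
plt.show()

labels = fcluster(Z, t=3, criterion="maxclust")   # e.g. a 3-cluster solution
print(X.groupby(labels).mean())                   # mean of each clustering variable per cluster
```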

4. Look at the initial and final cluster centers → compare the centers: are they similar?
   Identical = "the initial partitioning of the objects in the first step of the k-means procedure was retained during the analysis".

5. Look at the Crosstabulation of the two cluster solutions:
   - How many objects are now in Cluster 2?
   - How many fall into the same cluster as before?

   overlap = (same in Cluster 1 + same in Cluster 2 + same in Cluster 3) / total number of objects
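A small sketch of the overlap computation (Python/pandas, hypothetical membership variables; it assumes the cluster numbers of the two solutions correspond, so the matching cases sit on the diagonal of the crosstab):

```python
# Sketch (Python/pandas, not SPSS): overlap between two cluster memberships.
import numpy as np
import pandas as pd

hier_labels = pd.Series([1, 1, 2, 2, 3, 3, 1, 2])     # hypothetical hierarchical solution
kmeans_labels = pd.Series([1, 1, 2, 3, 3, 3, 1, 2])   # hypothetical k-means solution

ct = pd.crosstab(hier_labels, kmeans_labels)                     # crosstabulation
overlap = np.diag(ct.to_numpy()).sum() / ct.to_numpy().sum()     # matching / total
print(ct, f"overlap = {overlap:.0%}", sep="\n")
```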

6. Interpret: look at the ANOVA (or t-test) table.
   - Sig. < 0.05 = good = all clustering variables' means differ significantly across at least two of the three segments.

7. Decide on the number of clusters.
   Compute the VRC for each candidate cluster solution:
      VRC_k = (SSB / (k − 1)) / (SSW / (n − k))
   with k = number of clusters, SSB = between-cluster variation, SSW = within-cluster variation, n = number of objects.
   Look for the minimal difference value ω_k = (VRC_{k+1} − VRC_k) − (VRC_k − VRC_{k−1}).
   BUT: when that solution contains only a few observations in a cluster, use another solution.
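A sketch of the VRC comparison (Python/scikit-learn, not SPSS, on made-up data): the Calinski-Harabasz index is the VRC formula above, computed here for several k-means solutions.

```python
# Sketch: VRC (Calinski-Harabasz) for k = 2..6 and the omega difference criterion.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))          # stand-in for the standardized clustering variables

vrc = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    vrc[k] = calinski_harabasz_score(X, labels)   # (SSB/(k-1)) / (SSW/(n-k))

omega = {k: (vrc[k + 1] - vrc[k]) - (vrc[k] - vrc[k - 1]) for k in range(3, 6)}
print(vrc, "k with minimal omega:", min(omega, key=omega.get))
```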

Validation
Stability:
 - Use different clustering procedures on the same dataset → they should yield similar results
 - Split the dataset into two halves, analyse each half and compare
Validity:
 - Assess face and expert validity
 - Criteria: accessible, actionable, parsimonious, familiar, relevant

Interpretation
 - Examine the cluster centroids
 - Compare differences with t-tests or ANOVA
 - Find a meaningful name/label for each cluster

Similarity measures for binary and nominal variables:
 - Simple Matching Coefficient – for symmetric variables (equal degree of information, e.g. gender)
 - Jaccard Coefficient
 - Russel and Rao Coefficient
→ The higher the value, the more similar the objects are.
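A sketch of the three coefficients for two hypothetical binary profiles (Python/scipy, not SPSS; scipy returns dissimilarities, so similarity = 1 − distance):

```python
# Sketch: binary similarity coefficients for two objects A and B.
from scipy.spatial.distance import hamming, jaccard, russellrao

a = [1, 0, 1, 1, 0, 0]        # hypothetical binary variables for object A
b = [1, 1, 1, 0, 0, 0]        # ... and object B

simple_matching = 1 - hamming(a, b)     # (1-1 and 0-0 matches) / all variables
jaccard_sim     = 1 - jaccard(a, b)     # 1-1 matches / variables with at least one 1
russell_rao     = 1 - russellrao(a, b)  # 1-1 matches / all variables
print(simple_matching, jaccard_sim, russell_rao)
```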

Linkage algorithms
 - Single linkage (nearest neighbour) = distance between clusters corresponds to the shortest distance between any two members of the two clusters
    - tends to form one large cluster (one big segment)
    - good for identifying outliers
 - Complete linkage (furthest neighbour) = longest distance
    - strongly affected by outliers
    - tight and compact clusters
 - Average linkage (between-groups linkage) = average distance between all pairs of objects in the two clusters
    - produces low within-cluster variance
    - similar-sized clusters
    - popular in marketing
 - Centroid linkage = distance between the centroids (geometric centre = mean of the raw data of all objects in a cluster)
    - produces low within-cluster variance
    - similar-sized clusters
    - popular in marketing
 - Ward's linkage = goal: smallest possible within-cluster variance
    - similar size and tightness of clusters

Factor Analysis and PCA

1. Check requirements
   - Measurement scales: interval, ratio and equidistant ordinal scales (with >5 response categories). Does not work with binary data.
   - Sample size: at least 10 times the number of items/variables.
   - Independence of observations: only ask the same person once.
   - Correlation between items/variables:
      - Check the significance of the correlations in the correlation matrix: Sig. < α = good ("the correlation coefficients differ significantly from zero")
      - Look at the Anti-Image Matrix
      - Look at the Kaiser-Meyer-Olkin (KMO) / Measure of Sampling Adequacy (MSA) value: at least >0.5 = good, better >0.7
      - Bartlett's test: Sig. < α = good (reject H0 that the variables are uncorrelated)
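A sketch of the KMO/MSA and Bartlett checks outside SPSS (Python, assuming the third-party factor_analyzer package and made-up item data; install with `pip install factor_analyzer`):

```python
# Sketch: KMO and Bartlett's test of sphericity for hypothetical survey items.
import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

rng = np.random.default_rng(0)
items = pd.DataFrame(rng.normal(size=(250, 6)),
                     columns=[f"s{i}" for i in range(1, 7)])  # hypothetical items

chi2, p_value = calculate_bartlett_sphericity(items)   # Sig. < alpha = good
kmo_per_item, kmo_total = calculate_kmo(items)         # want KMO > 0.5 (better > 0.7)
print(f"Bartlett p = {p_value:.3f}, overall KMO = {kmo_total:.2f}")
```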

2. Decide on the number of factors
   - Kaiser criterion: keep factors with an eigenvalue > 1.0, since such a factor explains more variance than a single variable
   - Parallel analysis: keep factors whose eigenvalue exceeds the corresponding percentile value (Prcntyle < eigenvalue)
   - Scree plot: plots each factor's eigenvalue against its factor number; choose the number one below the elbow
   - Percentage of variance explained: at least 50%, but 75% is desirable

3. Look at Total Variance Explained
   - After keeping only factors with an eigenvalue > 1: total variance explained by the factors >50% = good, >75% = better (e.g. "the factors account for 65% of the overall variance").
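A sketch of the eigenvalue criteria in steps 2–3 (Python/numpy, not SPSS, using the same hypothetical item data): eigenvalues of the correlation matrix give the Kaiser criterion and the cumulative percentage of variance explained.

```python
# Sketch: Kaiser criterion and % of variance explained from the correlation matrix.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
items = pd.DataFrame(rng.normal(size=(250, 6)),
                     columns=[f"s{i}" for i in range(1, 7)])

eigenvalues = np.linalg.eigvalsh(items.corr().to_numpy())[::-1]  # descending order
kept = eigenvalues[eigenvalues > 1.0]                 # Kaiser: eigenvalue > 1
explained = eigenvalues / eigenvalues.sum() * 100     # % of variance per factor
print(kept, explained.cumsum())                       # check: cumulative >= 50% (75% better)?
```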

4. Allocate variables to factors
   - Look at the Rotated Component Matrix
   - Allocate each variable to the factor with the highest absolute loading. If a variable cannot be allocated, ignore the variable and re-run.

5. Evaluate goodness of fit
   Using residuals:
    - Look at the Reproduced Correlations (residuals)
    - Absolute residuals > 0.05 = bad
    - But: if the percentage of high residuals is less than 50% = ok
    - If the result is bad, remove items and re-run
   Using communalities:
    - Look at the Communalities: >0.5 = good = "the factor solution accounts for more than 50% of the variable's variance"
   or
    - Look at the Reproduced Correlations, e.g. r_repr = 0.832² + (−0.353)² (sum of the squared factor loadings)
    - Reproduced variance = sum of the diagonal elements of the reproduced correlation matrix (i.e. the communalities)
    - Proportion of total variance explained by the factors = reproduced variance / number of diagonal elements × 100%
   (example on p. 229)

6. Compute factor scores (optional)
   - FAC_ below 0 = the respondent is below average on that factor; FAC_ above 0 = above average
   - e.g. "the first observation is 1.91 standard deviations below average on factor 1"
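A sketch covering steps 4–6 (Python, again assuming the third-party factor_analyzer package and made-up items): rotated loadings, communalities and factor scores per respondent.

```python
# Sketch: varimax-rotated factor analysis with loadings, communalities and scores.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
items = pd.DataFrame(rng.normal(size=(250, 6)),
                     columns=[f"s{i}" for i in range(1, 7)])  # hypothetical items

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(items)
loadings = pd.DataFrame(fa.loadings_, index=items.columns)  # rotated loading matrix
communalities = fa.get_communalities()                      # want > 0.5
scores = fa.transform(items)                                # factor scores per respondent
print(loadings.round(2), communalities.round(2), scores[0], sep="\n")
```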

Hypothesis Testing

1. Check for normal distribution → look at the Tests of Normality table
   - For n < 50: Shapiro-Wilk; for n ≥ 50: Kolmogorov-Smirnov
   - Sig. < α = no normality; Sig. > α = good = normality is given

2. Check for equality of variances → look at the Independent Samples Test table
   - Levene's test: Sig. < α = variances are not equal; Sig. > α = good = variances are equal

3. Interpret the results
   - Sig. (2-tailed) > α = do not reject H0 = "the means do not differ significantly in the two populations"
   - Sig. (2-tailed) < α = reject H0
   - For a one-tailed test: halve the reported Sig. (2-tailed) before comparing it with α
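A sketch of the three steps above for two independent samples (Python/scipy, not SPSS, on made-up data): Shapiro-Wilk, Levene's test, then a t-test whose equal_var argument follows Levene's result.

```python
# Sketch: normality check, Levene's test and independent-samples t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(5.0, 1.0, size=30)
group_b = rng.normal(5.5, 1.2, size=30)

print(stats.shapiro(group_a), stats.shapiro(group_b))   # Sig. > alpha = normality given
lev = stats.levene(group_a, group_b)                    # Sig. > alpha = equal variances
t = stats.ttest_ind(group_a, group_b, equal_var=lev.pvalue > 0.05)
print(lev, t)                                           # Sig. < alpha = reject H0
```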

ANOVA

1. Check: is the dependent variable measured on an equidistant ordinal, interval or ratio scale?
2. Check the sample size per group: min. 20, >30 = good.

3. Check for normal distribution → look at the Tests of Normality table
   - Shapiro-Wilk (n < 50) / Kolmogorov-Smirnov (n ≥ 50): Sig. < α = no normality; Sig. > α = good = normality is given
   - Or: if not normally distributed but the sample size is more than 30 per group = okay, proceed (central limit theorem)

4. Check for equality of variances → look at the Test of Homogeneity of Variances table
   - Levene's test: Sig. > α = good = variances are equal
   - Sig. < α = variances are not equal → run a Kruskal-Wallis rank test instead and look at its Test Statistics table (Asymp. Sig.)

5. Make the test decision → look at the ANOVA table
   - Sig. < α = reject H0 = "at least two groups differ significantly"; Sig. > α = do not reject H0
   - Or use the F-table: df1 = number of groups − 1 (df Between Groups in the ANOVA output); df2 = number of all observations − number of groups (df Within Groups in the ANOVA output)
   - If F-value > critical F-value from the table = reject H0 = "at least two groups differ significantly"; if F-value < critical F-value = do not reject H0
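A sketch of steps 4–5 (Python/scipy, not SPSS, three made-up groups): Levene's test decides between the one-way ANOVA and the Kruskal-Wallis fallback.

```python
# Sketch: one-way ANOVA with a Kruskal-Wallis fallback when variances are unequal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
g1 = rng.normal(10, 2, 40)
g2 = rng.normal(11, 2, 40)
g3 = rng.normal(13, 2, 40)

if stats.levene(g1, g2, g3).pvalue > 0.05:      # variances equal -> ANOVA
    print(stats.f_oneway(g1, g2, g3))           # Sig. < alpha: at least two groups differ
else:                                           # variances unequal -> rank test
    print(stats.kruskal(g1, g2, g3))
```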

6. Post-hoc tests
   - GT2 and Games-Howell: look at Sig. Sig. < α = significant; Sig. > α = not significant. A group that is significant against all others is significantly different from the other groups.
   - REGWQ: groups that appear in different subsets differ with p < 0.05.

Effect size:
   η² = SSB / SST
   η² × 100% = "differences in the [grouping variable] explain xyz% of the total variation in the [dependent variable]"
   For SSB and SST, look at the ANOVA table (Sum of Squares Between Groups and Total).

α-Inflation: making a large number of comparisons induces α-inflation. The more tests you conduct, the higher the overall probability of a type 1 error (claiming there is a significant effect when there is actually none), and you cannot tell in which of the tests the error occurred.

   p (number of pairwise comparisons) = k · (k − 1) / 2, with k = number of groups to compare
   probability of a type 1 error in at least some of the tests = (1 − (1 − α)^p) · 100%

→ "There is a [percent] probability of erroneously rejecting the null hypothesis in at least some of the [p] tests – far more than [α]."
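A worked example of the α-inflation formula (Python, assumed values k = 5 and α = 0.05):

```python
# Sketch: familywise type 1 error probability for all pairwise comparisons.
k, alpha = 5, 0.05
p = k * (k - 1) // 2                       # number of pairwise comparisons (here 10)
familywise = 1 - (1 - alpha) ** p          # P(type 1 error in at least one test)
print(p, f"{familywise:.1%}")              # ~40.1%, far more than 5%
```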

Two-Way ANOVA

1. Check for significance → look at the Tests of Between-Subjects Effects table
   - If Sig. < α = significant = "[genders] differ significantly in [sales]; females buy more" (look at the Descriptives table for the direction)

   F-value = (SS of source / df of source) : (error SS / error df) = (error df × SS of source) / (df of source × error SS)

2. Check the observed power = the probability of detecting the effect, if it exists in the population, derived from the sample
   - >0.8 = good, >0.9 = better
   - If significant but low power: "more samples are needed to show the effect in the population"

3. Post-hoc tests: as in the one-way ANOVA.

4. Look at the Profile Plots
   - If the lines cross or their slopes are very different = interaction effect
   - If the lines are similar = no interaction effect
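A sketch of a two-way ANOVA with interaction (Python/statsmodels, not SPSS, on a made-up dataset; the factor names `gender` and `promotion` and the outcome `sales` are hypothetical):

```python
# Sketch: two-way ANOVA with interaction term on made-up data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "gender": rng.choice(["f", "m"], size=120),
    "promotion": rng.choice(["yes", "no"], size=120),
})
df["sales"] = 50 + 5 * (df["gender"] == "f") + rng.normal(0, 10, size=120)

model = smf.ols("sales ~ C(gender) * C(promotion)", data=df).fit()
print(anova_lm(model, typ=2))     # PR(>F) < alpha = significant main/interaction effect
```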

Missing Values

1. Do Little's MCAR test (H0: the values are missing completely at random)
   - If Sig. > α = do not reject H0 = missing completely at random
   - If Sig. < α = reject H0
2. Choose a replacement method, e.g. mean substitution = replace missing values with the variable's mean.
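A minimal sketch of mean substitution (Python/pandas, made-up values):

```python
# Sketch: replace missing values with the series mean.
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 5.0, 4.0, np.nan])
print(s.fillna(s.mean()))      # missing entries replaced by the mean (4.0)
```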

Regression Analysis

Test data requirements
1. Check the sample size (k = number of independent variables)
   - n ≥ 50 + 8·k (to test the overall relationship)
   - n ≥ 104 + k (to test individual parameter effects)
   - better: power analysis with G*Power

2. Check the scale type: dependent variable = interval or ratio = good.

3. Check for collinearity → look at the Coefficients table
   - VIF below 10 = good = no collinearity (conservative threshold: 5)
   - Tolerance > 0.1 = good = no collinearity
   - Multicollinearity can lead to biased regression coefficients and inflated standard errors.
   - Condition index: indicates the numerical stability of the matrix. If it is numerically unstable, small changes in X have a large effect on the inverse matrix.
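A sketch of the VIF/tolerance check (Python/statsmodels, not SPSS, hypothetical predictors x1–x3; a constant is added so each VIF reflects a regression with an intercept):

```python
# Sketch: VIF and tolerance for hypothetical predictors.
import numpy as np
import pandas as pd
from statsmodels.tools.tools import add_constant
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
X["x3"] = X["x1"] * 0.9 + rng.normal(0, 0.3, 200)   # induce some collinearity

Xc = add_constant(X)
for i, name in enumerate(Xc.columns):
    if name != "const":
        vif = variance_inflation_factor(Xc.values, i)
        print(name, round(vif, 2), "tolerance:", round(1 / vif, 3))  # want VIF < 10, tolerance > 0.1
```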

Check Gauss-Markov model assumptions
1. Check linearity
   - No variable transformation needed = linearity given = good
   - Or use Ramsey's RESET test → look at the Model Summary table: Sig. F Change > 0.05 = good = relationships are linear

2. Check that the expected mean error is zero: not checkable, just assume it.

3. Check that the error variance is constant (homoscedasticity) → look at the residual scatterplot
   - No funnel shape = good = error variance is constant
   - Funnel shape: use weighted least squares or generalized least squares regression models

4. Check for no autocorrelation
   - If the data have a time component: Durbin-Watson test. Rule of thumb: DW-value < 2 = positive autocorrelation, DW-value > 2 = negative autocorrelation
   - If there is autocorrelation: use panel or time-series models

5. Check for normally distributed errors → look at the histogram of the residuals, or:
   - For n < 50: Shapiro-Wilk test: Sig. > 0.05 = good = errors are normally distributed
   - For n ≥ 50: Kolmogorov-Smirnov: Sig. > α = good = errors are normally distributed
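A sketch of the assumption checks above (Python with statsmodels and scipy, not SPSS, on made-up data): fit an OLS model, then run Durbin-Watson and Shapiro-Wilk on the residuals.

```python
# Sketch: residual diagnostics for a small OLS model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

rng = np.random.default_rng(6)
df = pd.DataFrame({"x1": rng.normal(size=40), "x2": rng.normal(size=40)})
df["y"] = 1 + 0.5 * df["x1"] - 0.3 * df["x2"] + rng.normal(0, 1, 40)

model = smf.ols("y ~ x1 + x2", data=df).fit()
resid = model.resid
print("Durbin-Watson:", durbin_watson(resid))    # ~2 = no autocorrelation
print("Shapiro-Wilk:", stats.shapiro(resid))     # Sig. > 0.05 = normal errors (n < 50)
# homoscedasticity: plot model.fittedvalues against resid and look for a funnel shape
```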

Interpret the model
1. Look at the ANOVA table
   - F-test: Sig. < α = good = the model is significant = "at least one β differs significantly from zero"

2. Check R² → look at the Model Summary table
   - R² ≈ 0.5 = substantial, ≈ 0.3 = moderate, ≈ 0.1 = weak
   - "For complex models the adjusted R² is better, since it is not biased by added variables. The higher R²_adj, the better." (k = number of independent variables, n = number of observations)

3. Look at the individual coefficients → look at the Coefficients table
   1. Check for significant variables: Sig. < α = significant = "the independent variable relates significantly to the dependent variable" (ignore the constant)
   2. Check the unstandardized coefficients (B): plus = positive relationship, minus = negative relationship. "When variable s19 goes up by one unit, the dependent variable goes up by 0.012 units."
   3. Check the standardized coefficients (Beta): sort by impact on the dependent variable; the highest absolute value = highest impact (managerial importance)


4. Validate. Options:
   - Split-validation: re-estimate the model on 30% and 70% of the sample. If signs and ordering are identical = good
   - Cross-validation: estimate on a new data set and compare
   - Include more variables. If signs and ordering are identical = good
5. Calculate scenarios:
   y = a + β1·x1 + β2·x2 + e
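A sketch of the interpretation and scenario steps (Python/statsmodels, not SPSS; variable names and data are made up):

```python
# Sketch: OLS fit, coefficient/R² output and a simple scenario prediction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({"x1": rng.normal(size=120), "x2": rng.normal(size=120)})
df["y"] = 2 + 0.8 * df["x1"] - 0.4 * df["x2"] + rng.normal(0, 1, 120)

model = smf.ols("y ~ x1 + x2", data=df).fit()
print(model.summary())                      # F-test, R², adj. R², coefficients, Sig.
scenario = pd.DataFrame({"x1": [1.0], "x2": [0.5]})
print(model.predict(scenario))              # y = a + b1*x1 + b2*x2 for one scenario
```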

Cross Tabs
= test for an association between two categorical variables

1. Check significance
   a) Set α
   b) Look up the critical value in the χ²-table: df = (k − 1) · (l − 1), with k and l = numbers of rows and columns
   c) Calculate χ² = Σ (observed frequency − expected frequency)² / expected frequency, summed over all cells
      - Marginal frequency = row or column total
      - Expected frequency = column total × row total / total number of observations
      - The more the observed and expected frequencies differ, the stronger the suspected dependency between X and Y.
   d) If χ² > critical value = the observed values differ significantly from the expected ones; or: if Sig. < α = reject H0 = there is an association between the variables.
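A sketch of the significance check (Python/scipy, not SPSS, on a made-up crosstab):

```python
# Sketch: chi-square test of independence for a hypothetical 2x3 crosstab.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 20, 10],
                     [15, 25, 20]])                 # made-up cell counts
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, Sig. = {p:.3f}")   # Sig. < alpha = association
print(expected)                                            # expected frequencies per cell
```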

2. Check the strength of the effect

Phi coefficient φ (for 2×2 tables): φ = √(χ² / n)
 - The higher, the stronger the association
 - Problem: not comparable across analyses with different numbers of table elements. For 2×2 tables φ lies between 0 and 1; for larger tables it can exceed 1.

Contingency coefficient CC: C = √(χ² / (χ² + n))
 - Conservative measure, similar to the phi coefficient, between 0 and 1
 - Hardly ever reaches 1, so compare C to its maximum value: C_max = √((R − 1) / R), where R = the number of rows or columns, whichever is less

Cramér's V (for tables larger than 2×2): V = √(χ² / (n · (R − 1)))
 - Modified version of the phi coefficient, between 0 and 1
 - When R = 2, V and the phi coefficient are the same
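A sketch of the three effect-size measures, computed from the χ² value with the formulas above (Python, same hypothetical crosstab as before):

```python
# Sketch: phi, contingency coefficient (with its maximum) and Cramer's V.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 20, 10],
                     [15, 25, 20]])
chi2, p, dof, expected = chi2_contingency(observed)
n = observed.sum()
R = min(observed.shape)                     # smaller of #rows and #columns

phi = np.sqrt(chi2 / n)
C = np.sqrt(chi2 / (chi2 + n))
C_max = np.sqrt((R - 1) / R)
cramers_v = np.sqrt(chi2 / (n * (R - 1)))
print(round(phi, 3), round(C, 3), round(C / C_max, 3), round(cramers_v, 3))
```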

