3. Measurement - Topic summaries

Author: Sophie Hunter
Course: Advanced Research Methods in
Institution: Victoria University of Wellington
Pages: 7


Measurement in psychology

A statistician's joke: "There was a poor statistician who drowned crossing a river… it was 3 feet deep on average."

Measurement

Read chapter 3: Measurement and validity.

Key points:
- Reliability and validity of measurements
- Types of variables: nominal, ordinal, interval, ratio

What is the point of measurement?

In science, we need to make 'good' measurements of our phenomena, whether they are the size of stairs, the extent of a chemical reaction, or psychological variables such as hope, depression, and reading ability.

Where do we begin?

From my philosophy of science lecture, you know about:
- Theory/predictions
- Observations

Theory is made up of constructs, whereas observations are based on data. Constructs are hypothetical because we can't measure them directly. We understand constructs by capturing data that represent them, and this representation is formally called operationalization. Let's consider an example: does savoring increase wellbeing?

Operationalization

Does savoring predict wellbeing?

But are these 'good' variables? Measures of constructs (the variables we actually record) can be better or worse. We want to conduct research based on variables that are reliable and valid. Why? Because then we can be confident that they fairly represent the construct and not something else (another construct, or error).

Reliability: a measurement tool is reliable if it consistently generates similar empirical estimates.

Let's consider a thought experiment. We have three self-report measures and we want to determine their reliability over 3 months. Which of these three will be the most reliable?
- Mood ("How happy do you feel right now?")
- Gender (select among these: male, female, other)
- Optimism (tell us: "Do you feel that things will work out for you?")

Test-retest reliability

Hopefully you chose gender as the most reliable.
- In general, stable demographic variables are the most reliable.
- Psychological variables rooted in personality, like optimism, are intermediate in stability.
- Quickly changing and highly variable variables, such as mood, are the lowest in reliability.

How do we assess reliability?
- Most measures of test-retest reliability are simply correlations between scores for the same individuals at two or more points in time.
- So what's a good test-retest correlation? .70? .50? It depends on the measure; no clear cut-offs have been proposed.

Here is a good answer I found:
- "The value will depend on the time between test and retest, the length of the test, what is being measured, and the characteristics of the sample. Some traits are very stable. Others may show some change over time. Thus, there is no absolute value. It depends on the situation."
- I would actually NOT want my mood measure to yield a high correlation over time (probably .20), whereas I would want gender to be very high (maybe .98) and optimism to be intermediate (maybe .50 over 3 months, .70 over 1 month).

So it depends.
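Because test-retest reliability is simply a correlation between the same people's scores at two time points, it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the scores are invented (hypothetical optimism scores for five people, measured three months apart), and `pearson_r` is a hand-written helper, not a function from any particular statistics package.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y):
        raise ValueError("each person needs a score at both time points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance term and sums of squares around each mean.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical optimism scores (1-5) for the same five people, 3 months apart.
# Position i in both lists must belong to the same person.
time1 = [3.2, 4.1, 2.5, 3.8, 4.6]
time2 = [3.0, 4.3, 2.9, 3.5, 4.4]

retest_r = pearson_r(time1, time2)
print(f"test-retest r = {retest_r:.2f}")
```

Whether the resulting correlation counts as 'good' still depends on the construct and the time interval, as discussed above: the same value would be suspiciously high for mood and rather low for gender.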

Point of confusion

If someone asks you "What is the reliability of your scale?", what do you say? I would say: "What type of reliability?" There are two:
- Test-retest reliability: the correlation over time for the same individuals
- Internal reliability (e.g., Cronbach's alpha)

Measures of internal reliability capture the average level of intercorrelation among all of the items. Let's consider an example:
- 6 items for a subscale of grit (persistence of effort)
- Example item: "Setbacks don’t discourage me. I don’t give up easily."

Where to go in SPSS

Is this okay?

I have seen measures with an alpha between .60 and .70 used, but in those cases you should warn the reader that the measure might not be sufficiently internally reliable. Our measure of grit was 'acceptable' (alpha = .74).

Basis for a good Cronbach's alpha

Notice that all of the inter-item correlations are positive: the lowest is .197 and the highest is .661. There is an algebraic equation (which you don't need to know) that combines the number of items, the average item variance, and the average inter-item covariance to produce the final value. More items boost alpha, and so does a higher average correlation among the items. You want to remove poor items. Any poor items here?
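The equation mentioned above, and the item-deletion check used to answer this question, can be sketched in Python. This is a minimal illustration, not SPSS's implementation: the scores are toy data invented for the example (not the grit data), the function names are hypothetical, and raw alpha is computed with the standard variance-based formula, which is algebraically equivalent to the average-variance/average-covariance form noted in the docstring.

```python
from math import sqrt

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def cronbach_alpha(items):
    """Raw Cronbach's alpha; `items` holds one list of scores per item.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals),
    which equals k * c / (v + (k - 1) * c) for average item variance v
    and average inter-item covariance c.
    """
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))

def item_total_stats(items):
    """For each item: (corrected item-total r, alpha if that item is deleted)."""
    results = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_totals = [sum(person) for person in zip(*rest)]
        results.append((pearson_r(col, rest_totals), cronbach_alpha(rest)))
    return results

def standardized_alpha(k, mean_r):
    """Alpha from the number of items k and the average inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Toy data (hypothetical): 4 items x 6 respondents; item 4 is deliberately noisy.
items = [
    [1, 2, 3, 4, 5, 5],
    [1, 2, 3, 4, 4, 5],
    [2, 2, 3, 4, 5, 5],
    [3, 5, 1, 4, 2, 3],
]

alpha = cronbach_alpha(items)
stats = item_total_stats(items)
print(f"alpha = {alpha:.2f}")
for n, (r, a) in enumerate(stats, start=1):
    print(f"item {n}: corrected item-total r = {r:.2f}, alpha if deleted = {a:.2f}")

# Same average correlation, more items: alpha goes up.
print(f"{standardized_alpha(6, 0.30):.2f} vs {standardized_alpha(12, 0.30):.2f}")
```

In this toy example, item 4 has a negative corrected item-total correlation and deleting it raises alpha sharply, which is the situation where removing an item is worth considering. The last line illustrates why more items boost alpha: with an assumed average inter-item correlation of .30, six items give .72 and twelve give about .84.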

No. If you look at the right-most column, deleting any one item wouldn't improve alpha over its initial value of .74. However, notice that the second item correlates only weakly with the rest of the scale, and removing it would leave alpha unchanged. Occasionally you will find that removing an item does improve the overall alpha; in that case you may wish to drop it, which both shortens the scale and improves its internal reliability.

Okay, to recap…

Determining 'reliability' is confusing because there are two types. Test-retest reliability tells you whether the scale yields similar numerical values for the same individuals over time. Low reliability here can indicate that the scale is psychometrically poor, OR it might indicate that your phenomenon is just inherently unstable.

Internal reliability tells you how internally consistent the items of the measure are. A high Cronbach's alpha indicates that the items on the scale tend to correlate highly with each other. A good scale will show reasonable stability over time, and it will be internally consistent (alpha above .70).

Let's turn to validity now

We want our scale to measure what we intend it to measure, and this is called validity. There are several types of validity:
- Content validity: do the items in the scale relate to, or tap, the overall construct? Does the following item assess grit: "Setbacks don’t discourage me. I don’t give up easily."? How about "Is it easy to get out of bed in the morning?"
- Criterion validity: to what extent does the scale predict expected outcomes? For the grit scale, would it predict success in a job or at school?
- Construct validity: to what extent does the scale measure the intended hypothetical construct? Duckworth defines the hypothetical construct of "grit" as "perseverance of effort and passion to achieve long-term goals". Consequently, the scale should measure "perseverance" and "motivation to achieve goals".
- Convergent validity: the extent to which the scale in question correlates with scales that assess something similar. In other words, scores from the Grit Scale should correlate with scores from persistence scales, and research indicates that it seems to do so. Grit also correlates with hardiness, resilience, ambition, and self-control.
- Discriminant validity: the extent to which a scale does NOT correlate with scales that are expected to be unrelated. We're looking for a correlation near zero (typically non-significant), not a negative correlation. A scale measuring laziness would be negatively related to grit, and that would be an example of convergent validity. One would expect IQ to be unrelated to grit, and a number of studies show exactly this; if this is supported by future research, it would indicate discriminant validity.

Why are reliability and validity good?

We want 'good scales', and these are defined as scales possessing reliability and validity.
- We want our scales to reliably produce a similar score for the same individuals, both for attributes that don't change much (e.g., religious affiliation) and for those that change moderately (e.g., optimism).
- We also want our scales to measure what they are intended to measure, and nothing else. If scales demonstrate good validity (all five types), then we can be confident that they measure what we intended.

In practice, what does this mean? If you are using a pre-existing scale, then you need to be assured that:
- The internal reliability is at least acceptable
- The test-retest reliability is good, relative to how stable the construct is expected to be
- The items of the scale seem to capture the intended construct (content validity)
- It has been shown to predict expected outcomes (criterion validity)
- It has been shown to correlate with similar scales (convergent validity)
- It does not correlate with dissimilar scales (discriminant validity, sometimes called divergent validity)

What about construct validity?

Construct validity

A scale demonstrates good construct validity if numerous studies provide evidence for all of the above types of validity. In other words, construct validity is the highest-order, most abstract type of validity, and it can only be established through repeated demonstrations that the scale represents the intended construct in numerous and varied contexts. Concretely: does the grit scale predict successful achievement of goals for individuals living in the real world? We will talk about meta-analyses later in the course; a measure that performs well when numerous studies are aggregated is likely to have good construct validity.

Shift gears: types of variables

In your textbook these are called 'scales of measurement', and there are four basic types that you need to know. I hope this information is mostly review.

Nominal variables are composed of numerical values that indicate membership in a particular group. Gender is a classic nominal (also called categorical) variable because an individual typically falls into one of three groups: 0 (male), 1 (female), and 2 (other). Gender is no longer assessed as binary, but it is still considered nominal: the numbers are labels, not quantities.

Ordinal variables are based on rankings. If you are taste-testing four beers, most people would rank them 1st to 4th. Ranking is only feasible with relatively small numbers of items to compare (e.g., fewer than 5), but you can rank on any attribute.

Interval (continuous) variables have equal spacing between adjacent values and take numerous numerical values between the minimum and maximum. For example, resilience scores vary quite a lot over the 1-5 range if you have multiple Likert-type items.
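To see why averaging several Likert-type items produces a near-continuous score, this sketch lists every mean score that is possible for a scale of n items each scored 1 to 5. The six-item length is an assumption for illustration (the number of resilience items is not given here), `possible_scale_means` is a hypothetical helper, and treating the mean as interval-level is a convention the code does not test.

```python
def possible_scale_means(n_items, low=1, high=5):
    """Every distinct mean obtainable from n_items Likert items scored low..high.

    Item totals can take every integer from n_items * low to n_items * high
    (raise one item by 1 at a time), so the possible means are those totals
    divided by n_items.
    """
    return [total / n_items for total in range(n_items * low, n_items * high + 1)]

single_item = possible_scale_means(1)
six_items = possible_scale_means(6)
print(len(single_item), single_item)
print(len(six_items), [round(m, 2) for m in six_items[:4]])
```

With one item, a person can only score 1, 2, 3, 4 or 5; with six items averaged there are 25 possible scores between 1 and 5, which is why multi-item scale scores are usually analyzed as interval variables.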

A ratio variable is like an interval variable, but it has a true zero point. Height is on a ratio scale. The textbook gives the example of errors made by a rat running a maze: the rat can conceivably commit zero errors. Ratio variables are usually treated as identical to interval variables, but zero has a special meaning (absence of the quantity), so statements such as "twice as many errors" are meaningful.

Practical concerns for types of variables

In psychology, the most common type of variable is continuous/interval. Why? Many statistical tests (t-test, ANOVA, regression, etc.) rely on assumptions of equal spacing between points on a scale and of normal distributions. Interval and ratio data are more likely to approximate normality than other types; for nominal and ordinal data it is impossible. Thus, other types of analyses must be used if your outcome variable is nominal or ordinal, e.g., nonparametric tests such as the Wilcoxon signed-rank, Friedman, and Kruskal-Wallis tests for ordinal outcomes, or a chi-square test for nominal outcomes. Parametric tests are used for ratio and interval outcomes. One can use nominal variables in parametric tests, but only as predictors or IVs (independent variables). So it pays to know what type of variable you have.
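Using a nominal variable as a predictor is usually handled by dummy coding: a nominal variable with k categories becomes k - 1 indicator (0/1) variables, with one category left out as the reference group. The sketch below applies this to the 0/1/2 gender coding described earlier. `dummy_code` is a hypothetical helper and the responses are invented; in practice, statistics software typically creates these variables for you or offers a recode tool.

```python
def dummy_code(values, reference):
    """Turn a nominal variable into 0/1 indicator columns.

    One column per category except `reference`, which is represented by
    zeros in every column. Entering the raw codes 0/1/2 instead would wrongly
    treat "other" (2) as twice as far from "male" (0) as "female" (1) is.
    """
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical responses for five participants.
gender = ["male", "female", "other", "female", "male"]
dummies = dummy_code(gender, reference="male")
for name, column in dummies.items():
    print(name, column)
```

The two resulting columns can then be entered as predictors in a regression; with an intercept in the model, each coefficient compares that group with the reference group (male here).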

