Introduction to psychological assessment PYC4807

Title Introduction to psychological assessment PYC4807
Course Psychological Assessment
Institution University of South Africa (Unisa)



Description

Introduction to psychological assessment PYC4807. Chapter 1 – An overview of assessment: Definition and scope.

Tools – tests, measures, assessment measures, instruments, scales, procedures and techniques – are available to make it possible for us to assess human behaviour. Psychometrics is the systematic and scientific way in which psychological measures are developed, and the technical measurement standards required of such measures, to ensure that measures are valid and reliable. Psychological assessment is a process-orientated activity aimed at gathering a wide array of information from many different sources. Testing involves the measurement of behaviour. Assessment measures and tests – the main characteristics of assessment measures:
- Assessment measures include many different procedures that can be used in psychological, occupational and educational assessment, and can be administered to individuals, groups and organisations.
- Specific domains of functioning are sampled by assessment measures. From these samples, inferences can be made about both normal and abnormal functioning.
- Assessment measures are administered under carefully controlled conditions.
- Systematic methods are applied to score and evaluate assessment protocols.
- Guidelines are available to understand and interpret the results of an assessment measure. Such guidelines may make provision for the comparison of an individual's performance to that of an appropriate norm group or to a criterion.
- Assessment measures should be supported by evidence that they are valid and reliable for the intended purpose. This evidence is usually provided in the form of a technical test manual.
- The appropriateness of an assessment measure cannot be assumed without an investigation into possible test bias.

Assessment measures vary in terms of:
- How they are administered.
- Whether time limits are imposed. Speed measures consist of a number of fairly easy items of a similar level of difficulty that must be completed within a time limit, with the result that almost no one completes all the items. Power measures impose no time limits, all test takers may complete all items, and the items get progressively more difficult.
- How they are scored.
- How they are normed.
- The nature of their items.
- The response required from the test taker.

Important issues:
- Test results represent only one source of information in the assessment process.
- Assessment measures, because they offer objective measurement, often take on magical proportions for assessment practitioners, who begin to value them above their professional judgement or opinion. Tests should be used as an extension of clinical observation, not as a substitute for it.
- Results always need to be bracketed by a band of uncertainty, because errors of measurement creep in during administration, scoring and interpretation.
- The social, economic, educational and cultural background of an individual can influence his/her performance on a measure to the extent that the results present a distorted picture of the individual.
- The assessment process is multidimensional in nature (see the table on page 7).

Chapter 2 – Psychological assessment: A brief retrospective overview
The development of objective measurement perhaps made the biggest contribution to the development of psychology as a science. Assessment measures are administered under highly standardised conditions.

Chapter 3 – Basic measurement and statistical concepts
Measurement scales. Different scaling options:
- Category scales – the response categories are defined categories, usually ordered along some ascending dimension. This type of scale generates categorical or ordinal data.
- Likert-type scales (summated-rating method) – respondents have to indicate to what extent they agree or disagree with a carefully phrased statement or question. These items generate categorical data (a short scoring sketch follows these lists).
- Semantic differential scales – provide a series of semantic differentials or opposites. The respondent rates a person, attribute or subject in terms of bipolar descriptions. Only the extreme poles are described or anchored; in this format equal intervals are assumed.
- Intensity scales – several questions are posed; the extreme poles of the response scale are anchored and there are no descriptors for the categories between these two anchors. The number of response categories depends on the sophistication level of the respondent group.
- Constant-sum scales – the respondent is requested to allocate a fixed percentage or proportion of marks between the different available options, indicating the relative value or weight of each option.
- Paired-comparison scales – only compare two characteristics or attributes at a time. Respondents divide 100 marks between (or allocate 100 points to) two attributes. The procedure is then repeated so that the same attribute is systematically compared with all the other attributes.
- Graphic rating scales – the rating scale response is represented in the form of a visual display. Smiling or non-smiling faces may mean different things in different cultures, so the obtained ratings may be culturally biased.

- Forced-choice scales – used where intra-individual differences are assessed: the individual chooses the option that describes him/her best. Can generate nominal or ordinal data. Also used for ability-based measures, where one answer is coded as correct, producing categorical and dichotomous data.
- Ipsative scales (and scoring) – force respondents to compare two or more item options. Used to compare a test taker's score on one scale within the test with another scale in the test; each characteristic is compared within the person, rather than with other people.
- Guttman scales – usually used in social psychology, or in attitude and prejudice research, to test whether items belong to the same attitude dimension. The number of items the respondent answers in the affirmative reflects his/her score on that attitude.

Considerations in deciding on a scale format:
- Single-dimensional versus composite scale – determined by the underlying theoretical definition of the construct. Scales should be operationalised in a manner similar to the dimensions of the construct.
- Question- versus statement-item formats – some research suggests that statement items generate extreme responses, resulting in response distributions that are approximately bimodal, whereas question items result in more normally distributed responses.
- Types of response labels versus unlabelled response categories.
- Single-attribute versus comparative rating formats.
- Even-numbered versus odd-numbered options.
- Ipsative versus normative scale options.
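A minimal sketch of how a Likert-type (summated-rating) item set is typically scored, in Python. The five-point categories, the item values and the reverse-keyed items are assumptions for illustration, not taken from any specific questionnaire.

```python
# Summated (Likert-type) scoring: each response category is given an integer,
# negatively worded items are reverse-keyed, and the item scores are summed.
SCALE_MAX = 5  # 1 = strongly disagree ... 5 = strongly agree (assumed)

def score_likert(responses, reverse_keyed=frozenset()):
    """Return the summated score for one respondent."""
    total = 0
    for i, r in enumerate(responses):
        if i in reverse_keyed:
            r = (SCALE_MAX + 1) - r  # e.g. 5 -> 1, 4 -> 2
        total += r
    return total

# One respondent's answers to a hypothetical six-item scale;
# items 2 and 5 (0-based) are negatively worded and reverse-keyed.
answers = [4, 5, 2, 3, 1, 4]
print(score_likert(answers, reverse_keyed={2, 5}))  # prints 19
```

Summing the (reverse-keyed) item values in this way is what makes the format a summated-rating method.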

Norms: Psychological assessment is applied in various contexts. In each context, the purpose of testing needs to be specified before an assessment battery (i.e. the combination of psychological assessment measures that will be used) can be assembled. A measure is evaluated in terms of a number of characteristics before it is included in the battery:
- First, one should consider what attribute, characteristic or construct it measures.
- Second, its appropriateness for an individual, group or organisation should be determined. This implies that one should be familiar with the characteristics of the group or context that the measure was developed for; the group should be representative of your client.
- Third, one should determine whether the measure is psychometrically sound. In this regard, the Employment Equity Act refers to the properties of validity, reliability and equivalence.

Norm-referenced and criterion-referenced measures: Many psychological assessment measures are norm-referenced. This implies that an individual's score on the measure is interpreted by comparing it to the performance of people similar to himself or herself. A norm is defined as "a measurement against which the individual's raw score is evaluated so that the individual's position relative to that of the normative sample can be determined". The normative sample refers to the group of people on whom the test was initially standardised during the test development process. It therefore stands to reason that a test should be standardised on a group that is representative of the population for which the test is intended (e.g. secondary school learners, apprentices or first-year students). If you administer a psychological test to a test taker, you should be sure that there is an appropriate norm group with which the test taker's score can be compared. There is a need for local standardisations of psychological tests developed in other countries.
- Norm-referenced measures – each test taker's performance is interpreted with reference to a relevant standardisation sample or norm group.
- Criterion-referenced measures – compare the test taker's performance to the attainment of a defined skill or content. An individual's score can also be interpreted by comparing performance to an external criterion (rather than to the performance of a norm group). The examination you will have to write for this module is an example of a criterion-referenced measure.

An expectancy table is sometimes used to set a cut-off score once enough data have been gathered on the typical performance of test takers in a particular setting. Such a table gives an indication of the relation between performance on a test and success on a criterion.

The standard normal distribution: The standard normal distribution has a mean of 0 and a standard deviation of 1. Raw scores have little or no meaning and must be converted to norm scores through statistical transformation to make them meaningful (a short conversion sketch follows below). A norm is a measurement against which an individual's raw score is evaluated so that the individual's position relative to that of the normative sample can be determined.

Establishing norm groups: Norm groups must be representative of both the applicant pool and the incumbent population, as well as appropriate for the position for which the assessment is conducted.

Co-norming measures: Entails the process whereby two or more related but different measures are administered and standardised as a unit on the same norm group. This practice can address issues such as test-order effects, learning and memory effects, gender, age and education effects, and variations in scaling-format effects.

Types of test norms – pages 52-57.
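The conversion from raw scores to norm scores can be illustrated with a short sketch. The norm-group mean and standard deviation below are assumed values; in practice they come from the standardisation sample reported in a measure's technical manual.

```python
# Transform a raw score to a z score (standard normal: mean 0, SD 1)
# and to a derived standard score such as a T score (mean 50, SD 10).
NORM_MEAN = 100.0  # assumed norm-group mean
NORM_SD = 15.0     # assumed norm-group standard deviation

def z_score(raw, mean=NORM_MEAN, sd=NORM_SD):
    """Position of the raw score relative to the normative sample, in SD units."""
    return (raw - mean) / sd

def t_score(raw, mean=NORM_MEAN, sd=NORM_SD):
    """Derived standard score with mean 50 and SD 10."""
    return 50 + 10 * z_score(raw, mean, sd)

raw = 118
print(z_score(raw))  # 1.2 -> the test taker is 1.2 SDs above the norm-group mean
print(t_score(raw))  # 62.0
```

The same z score can then be expressed as any of the test norm types referred to on pages 52-57, since they are all transformations of the test taker's position in the normative sample.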

Chapter 4 – Reliability: Basic concepts and measures
The measurement process entails a clear conception of three things:
- what the entity is that we want to measure (weight/height)
- what exactly the nature of the measure is that we want to use (scale/ruler)
- the application of the rules on how to measure the object (with/without clothing)

The reliability of a measure refers to the consistency with which it measures whatever it measures. The Employment Equity Act (No. 55 of 1998) specifies that psychological measures need to meet two technical criteria: reliability and validity. Systematic or chance factors may be present, so there is always room for error (random or systematic error). When administering a measure, you should be aware that the score an individual obtains is not a perfectly accurate reflection of that individual's standing on the construct being measured. You should understand what is meant by a person's observed score, true score and error score in the context of reliability. The standard error of measurement is an indication of the probable fluctuations in a person's observed score due to the imperfect reliability of the test (a short computational sketch follows the summary below).

Types of reliability coefficients:
- Test-retest reliability*: one form, two sessions; coefficient of stability; error variance from time sampling.
- Alternate-form reliability (immediate): two forms, one session; coefficient of equivalence; error variance from content sampling.
- Alternate-form reliability (delayed): two forms, two sessions; coefficient of stability and equivalence; error variance from time sampling and content sampling.
- Split-half reliability: one form, one session; coefficient of internal consistency; error variance from content sampling.
- Inter-item consistency (Kuder-Richardson / coefficient alpha): one form, one session; coefficient of internal consistency; error variance from content sampling and content heterogeneity.
- Scorer reliability: error variance from scorer differences.

* The interval between tests should rarely exceed six months.
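As a concrete illustration of the standard error of measurement mentioned above, the sketch below computes it from a test's standard deviation and its reliability coefficient. The SD and reliability values are assumed purely for illustration.

```python
# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
import math

def standard_error_of_measurement(sd, reliability):
    """Probable fluctuation of an observed score due to imperfect reliability."""
    return sd * math.sqrt(1 - reliability)

sem = standard_error_of_measurement(sd=15, reliability=0.91)
print(round(sem, 2))  # 4.5 -> an observed score of 110 lies roughly between
                      # 105.5 and 114.5 (plus or minus 1 SEM) about 68% of the time
```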

Do not have to know formulas for exams.
Test-retest reliability – administering the same test twice to the same group of test takers. The reliability coefficient is the correlation between the scores obtained at the first administration (T1) and the second (T2). Drawbacks: test circumstances may vary in terms of the test takers (emotional factors, illness, fatigue, worry), the physical environment (different venue, weather, noise) or even the testing conditions (the venue, the instructions, the instructors), which may contribute to systematic error variance. Transfer effects such as practice and memory may also play a role in the second testing.

Alternate-form reliability – two equivalent forms of the measure are administered to the same group on two different occasions. The forms must be truly equivalent: the same number of items, the same scoring procedure, and uniform in respect of content and item difficulty level. Expensive and time-consuming.
Split-half reliability – splitting the measure into two equivalent halves after a single administration of the test and computing the correlation coefficient between the two sets of scores. A common method is to separate the odd and even items (see the sketch below).
Inter-item consistency – based on the consistency of responses to all the items, obtained through the Kuder-Richardson method (or coefficient alpha).
Inter-scorer reliability – examiner variance is a possible source of error variance. It is determined by having test takers' test protocols scored by two assessment practitioners, and reflects the consistency of ratings between raters.
Intra-scorer reliability – the consistency of ratings for a single rater.
Factors affecting reliability – pages 64-67.
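A minimal computational sketch of the split-half and inter-item consistency procedures described above, using only the Python standard library. The item scores are invented for illustration; the Spearman-Brown correction applied to the half-test correlation is the standard formula for estimating full-length reliability.

```python
from statistics import mean, pvariance

# Hypothetical scores of 6 test takers on a 4-item measure
# (rows = persons, columns = items).
scores = [
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [1, 2, 1, 2],
    [4, 3, 4, 4],
    [3, 3, 2, 3],
]

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Split-half reliability: correlate odd-item and even-item totals,
# then apply the Spearman-Brown correction for the full-length test.
odd_totals = [sum(row[0::2]) for row in scores]
even_totals = [sum(row[1::2]) for row in scores]
r_half = pearson(odd_totals, even_totals)
split_half = 2 * r_half / (1 + r_half)

# Coefficient alpha (inter-item consistency): based on the item variances
# and the variance of the total scores.
k = len(scores[0])
item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
total_var = pvariance([sum(row) for row in scores])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print(round(split_half, 3), round(alpha, 3))
```

Both coefficients estimate internal consistency from a single administration, which is why only content sampling (and, for alpha, content heterogeneity) contributes to their error variance.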

Chapter 5 – Validity: Basic concepts and measures
The validity of a measure concerns what the test measures and how well it does so; it reflects the accuracy of a measure. Validity refers to the appropriateness of the inferences made from test scores or, in the more traditional definition, whether a "test measures what it is supposed to measure". A test is valid for a specific purpose, and therefore there are different validation procedures that need to be considered, namely content-description, construct-identification and criterion-prediction procedures. Note that a variety of statistical methods can be used to establish construct validity. You also need to be familiar with the following concepts: the validity coefficient, the standard error of estimate and the prediction of the criterion.

Validation procedures (evidence of validity, subcategories, and examples of tests and relevant criteria):
- Content-description procedures – face validity, content validity. Example: an achievement test to determine mastery of knowledge and skills in high school mathematics.
- Construct-identification procedures – example: an intelligence test subjected to factor analysis to determine the underlying structure.
- Criterion-prediction procedures – concurrent validity*, predictive validity. Examples: a personality questionnaire to diagnose the presence of depression (concurrent); an aptitude test to predict future performance in an engineering course (predictive).

* Concurrent validity can be employed as a substitute for predictive validity if the criterion data are already available (e.g. current job success of the individuals tested).

Types of validity: There are three types of validity or validation procedures: content-description, construct-identification and criterion-prediction procedures. These types of validity work in a logical sequence to establish a larger validity picture – content validation, then construct-orientated strategies, and finally criterion prediction.

Content-description procedures: There are two important aspects when considering the content validity of a measure.
- Face validity – a type of validity described in non-psychometric or non-statistical terms. It does not refer to what a test actually measures, but to what it appears to measure – whether it looks valid to test takers, which is a desirable characteristic for a measure. It may be achieved by using appropriate phrasing. Testees or a panel of subject experts are used to assess face validity. Once the test developer is satisfied with this facet, s/he can proceed to the content-validation procedure.
- Content validity – involves determining whether the content of a measure covers a representative sample of the behaviour domain/aspect to be measured. Only partially a non-statistical type of validity: a panel of subject experts evaluates the items during the test-construction phase. Content validity is relevant for evaluating achievement, educational and occupational measures, and is a basic requirement for domain-referenced/criterion-referenced and job-sample measures. It is, however, not the most appropriate type of validity to establish for aptitude and personality measures.

Construct-identification procedures:
- Construct validity – the extent to which a measure measures the theoretical construct or trait it is supposed to measure. A construct is a theoretical abstraction that serves as a label for sets of behaviours that appear to go together in nature. Establishing construct validity normally involves a quantitative, statistical analysis procedure. Statistical methods used include factorial validity and correlation with other tests (to establish convergent and discriminant validity).
- Factorial validity – factor analysis is a statistical technique used for analysing the interrelationships of variables. The aim is to determine the underlying structure or dimensions of a set of variables: by identifying the common variance between them, it is possible to reduce a large number of variables to a relatively small number of factors or dimensions. Factorial validity refers to the underlying dimensions tapped by the measure. This procedure is usually used when a new measure is developed or when an existing measure is applied in a context different from the one in which it was originally validated (see the sketch below).
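Factorial validity is established with factor analysis software rather than by hand. The sketch below uses scikit-learn's FactorAnalysis on randomly generated placeholder data simply to show the shape of such an analysis; the data, the two-factor choice and the library are assumptions for illustration, not a procedure prescribed in the text.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_people, n_items = 200, 8
X = rng.normal(size=(n_people, n_items))  # placeholder item/subtest score matrix

fa = FactorAnalysis(n_components=2)       # hypothesised two underlying dimensions
fa.fit(X)

# The loading matrix shows how strongly each item relates to each extracted
# factor; in a validation study these loadings are compared with the
# theoretically expected structure of the measure.
print(fa.components_.shape)               # (2, 8): factors x items
```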

Correlation with other tests (convergent and discriminant validity) – a measure demonstrates construct validity when it correlates highly with other variables with which it should theoretically correlate (convergent validity) and when it correlates minimally with variables from which it should differ (discriminant validity). This method is usually used to determine whether a new construct is sufficiently isolated from other, dissimilar instruments or constructs.

Criterion-prediction procedures: Criterion-prediction validity (criterion-related validity) involves the calculation of a correlation coefficient between one or more predictors and a criterion, and is determined by means of a quantitative/statistical procedure. There are two different types: concurrent and predictive validity. Concurrent validity involves the accuracy with which a measure can identify or diagnose the current behaviour or status of an individual regarding specific skills or characteristics. Predictive validity refers to the accuracy with which a measure can predict the future behaviour or status of an individual. Read pages 75-78.

Coefficient of determination (r²): the square of the validity coefficient. It indicates the proportion of variance in the criterion variable that is accounted for by variance in the predictor score, or the amount of variance shared by both variables. Know this!
Standard error of estimate and predicting the criterion – pages 78-79 (a short computational sketch follows below).
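A brief sketch (with invented predictor and criterion scores) of how the validity coefficient, the coefficient of determination and the standard error of estimate relate to one another. The data and variable names are assumptions for illustration only.

```python
from statistics import mean, pstdev

# Hypothetical test scores (predictor) and later job-performance ratings
# (criterion) for the same eight people.
test = [55, 62, 48, 70, 66, 58, 73, 60]
criterion = [3.1, 3.8, 2.9, 4.2, 4.0, 3.3, 4.5, 3.6]

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

r = pearson(test, criterion)      # validity coefficient
r_squared = r ** 2                # proportion of criterion variance explained
# Standard error of estimate: expected error when predicting the criterion
# from the test score, given the strength of the relationship.
se_est = pstdev(criterion) * (1 - r_squared) ** 0.5

print(round(r, 3), round(r_squared, 3), round(se_est, 3))
```

The higher the validity coefficient, the larger the proportion of shared variance and the smaller the standard error of estimate when predicting the criterion.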

Chapter 6 – Developing a psychological measure: It takes three to five years to develop a measure. A psychological measure needs to be planned carefully, items need to be written, and the initial version of the measure needs to be administered so that the effectiveness of the items can be determined (a small item-analysis sketch follows below). The final items are then chosen and the measure is administered to a representative group so that the measure's validity, reliability and norms can be established. ...
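Determining the effectiveness of the items in the trial administration typically involves item analysis. The sketch below, using invented dichotomous responses, computes two common indices: the item difficulty value (proportion of test takers answering correctly) and a simple item-total correlation as a discrimination index. The data and the simple (uncorrected) item-total correlation are assumptions for illustration.

```python
from statistics import mean

# Hypothetical dichotomous responses (1 = correct, 0 = incorrect) of
# 8 test takers on 5 trial items; rows = persons, columns = items.
responses = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if one variable has no variance."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

totals = [sum(row) for row in responses]
for i in range(len(responses[0])):
    item = [row[i] for row in responses]
    difficulty = mean(item)                 # p value: proportion answering correctly
    discrimination = pearson(item, totals)  # how well the item separates high and low scorers
    print(f"item {i + 1}: p = {difficulty:.2f}, item-total r = {discrimination:.2f}")
```

Items that almost everyone passes or fails, or that correlate poorly with the total score, are candidates for revision before the final items are chosen.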

