Summary psychological testing and assessment ronald jay cohen mark e swerdlik PDF

Title Summary psychological testing and assessment ronald jay cohen mark e swerdlik
Author Yezil Rhose Angelei Lusaya
Course BS Psychology
Institution University of Negros Occidental - Recoletos


Summary Psychological Testing and Assessment, Ronald Jay Cohen, Mark E. Swerdlik Testtheorie en inleiding psychodiagnostiek (Universiteit Twente)

StuDocu is not sponsored or endorsed by any college or university Downloaded by Yezil Rhose Angelei Lusaya ([email protected])

Chapter 4 – Of Tests and Testing

Some Assumptions about Psychological Testing and Assessment

Assumption 1: Psychological Traits and States Exist
- Trait: any distinguishable, relatively enduring way in which one individual varies from another
- State: also distinguishes one person from another but is relatively less enduring
- Both are inferred from a sample of behavior → obtained through direct observation, analysis of self-report statements, or pencil-and-paper test answers
- The term psychological trait covers a wide range of possible characteristics
- How do they exist? (according to the book) A psychological trait exists only as a construct: an informed, scientific concept developed or constructed to describe or explain behavior → its existence can be inferred from overt behavior: an observable action or the product of an observable action, including test- or assessment-related responses
- A trait is not expected to be manifested in behavior 100% of the time → whether it is manifested depends on the strength of the trait in the individual and on the nature of the situation
- The context within which behavior occurs also plays a role in helping select appropriate trait terms for observed behavior
- Trait and state both refer to a way in which one individual varies from another → assessors make such comparisons with respect to the hypothetical average person, and also among people who, because of their membership in some group, are decidedly not average → the reference group can influence one’s conclusions or judgments

Assumption 2: Psychological Traits and States Can Be Quantified and Measured
- The specific traits and states to be measured and quantified need to be carefully defined
- People in general have many different ways of looking at and defining the same phenomenon
- Once the trait, state, or other construct has been defined, the test developer considers the types of item content that would provide insight into it → items are drawn from a universe of behaviors presumed to be indicative of the targeted trait → the world of possible items that can be written to gauge the strength of that trait in testtakers
- Weighing the comparative value of a test’s items comes about as the result of a complex interplay among many factors, including technical considerations, the way a construct has been defined for the purposes of the test, and the value society attaches to the behaviors evaluated
- The test score is presumed to represent the strength of the targeted ability, trait, or state and is frequently based on cumulative scoring: the more the testtaker responds in a particular direction, as keyed by the test manual as correct or consistent with a particular trait, the higher that testtaker is presumed to be on the targeted ability or trait

Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
- Patterns of answers to true-false questions on one widely used test of personality are used in decision making regarding mental disorders
- Tasks in some tests mimic the actual behaviors

- The obtained sample of behavior is typically used to make predictions about future behavior; it is also possible to postdict behavior → to aid understanding of behavior that has already taken place
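
The cumulative scoring idea from Assumption 2 can be illustrated with a minimal Python sketch. The five-item true/false scale, its key, and the function name `cumulative_score` are all invented for illustration; real tests define their keys in the test manual.

```python
def cumulative_score(responses, key):
    """Count the responses that match the keyed direction for each item."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for given, keyed in zip(responses, key) if given == keyed)


# Hypothetical 5-item true/false scale; the key is an assumption for this sketch.
key = [True, False, True, True, False]
testtaker_a = [True, False, True, False, False]
testtaker_b = [False, True, True, False, False]

print(cumulative_score(testtaker_a, key))  # 4 responses in the keyed direction
print(cumulative_score(testtaker_b, key))  # 2 responses in the keyed direction
```

Under cumulative scoring, testtaker A would be presumed to be higher on the targeted trait than testtaker B, because more of A's responses fall in the keyed direction.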

Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses
- Competent test users understand a great deal about the tests they use, and they understand and appreciate the limitations of those tests as well as how those limitations might be compensated for by data from other sources

Assumption 5: Various Sources of Error Are Part of the Assessment Process
- Error: in everyday use, something that is more than expected; in assessment it is actually a component of the measurement process → refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test
- Error variance: the component of a test score attributable to sources other than the trait or ability measured
- Assessees, assessors, and the measuring instruments themselves are sources of error variance
- Classical or true score theory of measurement: the assumption is made that each testtaker has a true score on a test that would be obtained but for the random action of measurement error

Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
- Today, all major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual
- One source of fairness-related problems is the test user who attempts to use a particular test with people whose background and experience are different from the background and experience of the people for whom the test was intended
- Tests are tools → just like other tools, they can be used properly or improperly

Assumption 7: Testing and Assessment Benefit Society
- Considering the many critical decisions that are based on testing and assessment procedures, we can readily appreciate the need for tests, especially good tests
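
The true score idea under Assumption 5 can be sketched numerically. The sketch below assumes the usual classical model, observed score = true score + error, with errors uncorrelated with true scores (an assumption of the model that these notes do not spell out). All numbers are invented so that the arithmetic is exact.

```python
from statistics import pvariance

# Hypothetical true scores and measurement errors for four testtakers.
# The errors are chosen so that they are uncorrelated with the true scores.
true_scores = [40, 40, 60, 60]
errors = [-2, 2, -2, 2]

# Observed score = true score + error (classical true score model).
observed = [t + e for t, e in zip(true_scores, errors)]

true_var = pvariance(true_scores)    # variance due to the trait measured
error_var = pvariance(errors)        # error variance
observed_var = pvariance(observed)   # total variance of the test scores

print(observed)                          # [38, 42, 58, 62]
print(true_var, error_var, observed_var)
```

With uncorrelated errors, the observed-score variance splits into a true-score part and an error part, which is what "error variance" refers to in the list above.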

What is a “Good Test”?
- Psychometric soundness consists of 1. reliability and 2. validity

Reliability
- A good measuring tool or procedure is reliable
- The criterion of reliability involves the consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements
- We want to be reasonably certain that the measuring tool or test we are using is consistent → we want to know that it yields the same numerical measurement every time it measures the same thing under the same conditions
- Reliability is a necessary but not sufficient element of a good test

Validity
- A test is valid if it does, in fact, measure what it purports to measure
- Questions regarding a test’s validity may focus on the items that collectively make up the test
- Individual items will also come under scrutiny in an investigation of a test’s validity

- Validity may also be questioned on grounds related to the interpretation of resulting test scores
- Questions concerning the validity of a particular test may be raised at every stage in the life of a test

Other Considerations
- Trained examiners can administer, score, and interpret a good test with a minimum of difficulty
- Good test = useful test, one that yields actionable results that will ultimately benefit individual testtakers or society at large

Everyday Psychometrics – Putting Tests to the Test

Why Use This Particular Instrument or Method?
- Answers can be found in published sources of information (test catalogues, test manuals, published test reviews) as well as unpublished sources (correspondence with test developers and publishers and with colleagues who have used the same or similar tests)

Are There Any Published Guidelines for the Use of This Test?
- Sometimes a published guideline for the use of a particular test will list other measurement tools that should also be used along with it
- Published guidelines and research may also provide useful information regarding how likely the use of a particular test or measurement technique is to meet the Daubert or other standards set by courts

Is This Instrument Reliable?
- Answered by careful reading of the test’s manual and of published research on the test, test reviews, and related sources

Is This Instrument Valid?
- Starts with careful reading of the test’s manual as well as published research on the test, test reviews, and related sources
- The need for multiple sources of data on which to base an opinion stems not only from the ethical mandates published in the form of guidelines from professional associations but also from the practical demands of meeting a burden of proof in court

Is This Instrument Cost-Effective?
- E.g., in World War I and World War II, large numbers of soldiers had to be screened → group tests had greater utility than individual tests

What Inferences May Reasonably Be Made from This Test Score, and How Generalizable Are the Findings?
- In evaluating a test, it is critical to consider the inferences that may reasonably be made as a result of administering that test
- The people used to help develop a test have a great effect on the generalizability of findings
- Another issue regarding the generalizability of findings concerns how a test was administered

Norms

- Norm-referenced testing and assessment: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker’s score and comparing it to the scores of a group of testtakers
- An individual test score is understood relative to other scores on the same test
- Norms: test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores
- Normative sample: the group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers → members of the normative sample will all be typical with respect to some characteristic(s) of the people for whom the particular test was designed
- Norming: the process of deriving norms
- Race norming: the controversial practice of norming on the basis of race or ethnic background → members of one cultural group would have to attain one score to be hired, whereas members of another cultural group would have to attain a different score
- Norming is expensive → an alternative is user norms / program norms: these consist of descriptive statistics based on a group of testtakers in a given period of time rather than norms obtained by formal sampling methods

Sampling to Develop Norms
- Standardization / test standardization: the process of administering a test to a representative sample of testtakers for the purpose of establishing norms
- Population: the complete set of individuals with at least one common, observable characteristic → this characteristic could be just about anything
- Sample of a population: a portion of the universe of people deemed to be representative of the whole population
- Sampling: the process of selecting the portion of the universe deemed to be representative of the whole population
- Subgroups within a defined population may differ with respect to some characteristics, and it is sometimes essential to have these differences proportionately represented in the sample
- Stratified sampling: the process of developing a sample based on specific subgroups of a population
- Stratified-random sampling: the process of developing a sample based on specific subgroups of a population in which every member has the same chance of being included in the sample
- Purposive sample: a sample that is arbitrarily selected because it is believed to be representative of the population
- It is important to distinguish what is ideal from what is practical in sampling
- Incidental sample / convenience sample: a researcher may sometimes employ a sample that is not necessarily the most appropriate but is rather the most convenient or available for use, because of budgetary limitations or other constraints
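
Stratified-random sampling as described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population, the "urban"/"rural" strata, the 10% sampling fraction, and the function name `stratified_random_sample` are all hypothetical.

```python
import random


def stratified_random_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum of the population."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        # Within a stratum, every member has the same chance of selection.
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 urban and 40 rural testtakers.
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_random_sample(population, lambda p: p[0], 0.1)

print(len(sample))                                   # 10
print(sum(1 for p in sample if p[0] == "urban"))     # 6
```

Sampling the same fraction from each stratum keeps the subgroups proportionately represented, which is the point of stratifying before drawing at random.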

Many samples are exclusive, in a sense, since they contain many exclusionary criteria.

Developing norms for a standardized test:

- Having obtained a sample → administer the test according to the standard set of instructions that will be used with the test → the test developer also describes the recommended setting for giving the test
- Establishing a standard set of instructions and conditions under which the test is given makes the test scores of the normative sample more comparable with the scores of future testtakers
- The normative sample takes the test under a standard set of conditions, which are then replicated on each occasion the test is administered
- The test data collected are analyzed → summarized using descriptive statistics (measures of central tendency and variability) → these provide a precise description of the standardization sample itself
- Test developers should describe the population(s) represented by any norms or comparison group(s), the dates the data were gathered, and the process used to select the sample of testtakers → in practice, descriptions vary in detail → developers want to present their test in the most favorable light possible
- When the people in the normative sample are the same people on whom the test was standardized → the phrases normative sample and standardization sample are used interchangeably → this no longer holds when new norms for specific groups of testtakers are developed some time after the original standardization (for example, for groups of people underrepresented in the original standardization sample)

Close-Up – How “Standard” is Standard in Measurement?
- Noun standard: that which others are compared to or evaluated against, as in the Standards (the Standards for Educational and Psychological Testing)
- Adjective standard: what is usual, generally accepted, or commonly employed → in some areas of psychology there has been a need to create a new standard unit of measurement in the interest of better understanding or quantifying particular phenomena (e.g., the standard drink)
- Verb to standardize: making or transforming something into something that can serve as a basis of comparison or judgment, e.g., to standardize a definition
- Standardized test: a test that has clearly specified procedures for administration, scoring, and interpretation in addition to norms. Such tests also come with manuals that are as much a part of the test package as the test’s items
- Standardization: the process employed to introduce objectivity and uniformity into test administration, scoring, and interpretation → the term test standardization has been used interchangeably with the term test norming

Types of Norms

Percentiles
- Percentiles divide a distribution into 100 equal parts → 100 percentiles
- In such a distribution, the xth percentile is equal to the score at or below which x% of scores fall
- Percentile: an expression of the percentage of people whose score on a test or measure falls below a particular raw score
- Percentage correct (not to be confused with a percentile): refers to the distribution of raw scores, more specifically the number of items that were answered correctly multiplied by 100 and divided by the total number of items
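
The difference between a percentile and percentage correct is easy to see in code. The sketch below uses the "falls below" definition of a percentile from the list above; the normative scores, the 40-item test length, and the function names are invented for illustration.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of scores in the normative sample that fall below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)


def percentage_correct(items_correct, total_items):
    """Number of items answered correctly, times 100, divided by total items."""
    return 100 * items_correct / total_items


# Hypothetical raw scores of a 10-person normative sample on a 40-item test.
norm_scores = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]

print(percentile_rank(25, norm_scores))   # 60.0: six of ten scores fall below 25
print(percentage_correct(25, 40))         # 62.5: 25 of 40 items correct
```

The same raw score of 25 yields two different numbers because the percentile compares the testtaker with other people, whereas percentage correct compares the testtaker with the number of items on the test.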

- Problem with percentile norms for normally distributed scores → real differences between raw scores may be minimized near the ends of the distribution and exaggerated in the middle of the distribution

Age norms
- Age-equivalent scores / age norms: indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered
- Age norm tables for physical characteristics (height) are noncontroversial and accepted → age norm tables for psychological characteristics (intelligence) are controversial → the idea of identifying the “mental age” of testtakers has had great intuitive appeal → the problem is that mental age as a way to report test results is too broad and too inappropriately generalized

Grade norms
- Designed to indicate the average test performance of testtakers in a given school grade
- Developed by administering the test to representative samples of children over a range of consecutive grade levels
- Next, the mean or median score for children at each grade level is calculated
- Do not provide information as to the content or type of items that a student could or could not answer correctly
- Drawback: useful only with respect to years and months of schooling completed (not for children who are not yet in school or are out of school, and not for adults who have returned to school)
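
The grade-norm procedure above (test samples at consecutive grade levels, then take the mean or median per grade) can be sketched as follows. The scores are invented, and the nearest-median lookup in `closest_grade` is only one simple way to read a raw score against such a table; it is an assumption of this sketch, not a method taken from the book.

```python
from statistics import median


def grade_norms(scores_by_grade):
    """Median raw score for each grade level in the normative sample."""
    return {grade: median(scores)
            for grade, scores in sorted(scores_by_grade.items())}


def closest_grade(raw_score, norms):
    """Grade whose median score lies nearest to the raw score (illustrative)."""
    return min(norms, key=lambda grade: abs(norms[grade] - raw_score))


# Hypothetical raw scores from samples of children in grades 3 to 5.
scores_by_grade = {
    3: [18, 20, 22, 25, 21],
    4: [26, 28, 30, 27, 33],
    5: [35, 38, 36, 40, 34],
}

norms = grade_norms(scores_by_grade)
print(norms)                     # {3: 21, 4: 28, 5: 36}
print(closest_grade(29, norms))  # 4
```

Note that the table says nothing about which items a child answered correctly, which matches the limitation listed above.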

Developmental norms
- Norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life (grade norms and age norms both fall under this umbrella term)

National norms
- Derived from a normative sample that was nationally representative of the population at the time the norming study was conducted
- May be obtained by testing large numbers of people representative of different variables of interest, such as age, gender, racial/ethnic background, socioeconomic strata, geographical location, and different types of communities within the various parts of the country
- The precise nature of the questions raised when developing national norms will depend on whom the test is designed for and what the test is designed to do
- Two important questions: “What are the differences between the tests I am considering for use in terms of their normative samples?” and “How comparable are these normative samples to the sample of testtakers with whom I will be using the test?”

National anchor norms
- Provide some stability to test scores by anchoring them to other test scores
- Equipercentile method: the equivalency of scores on different tests is calculated with reference to corresponding percentile scores
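
A rough sketch of the equipercentile idea follows. It assumes that the same ten people took both tests, and it matches scores by nearest percentile rank. The data, the function names, and the nearest-rank matching rule are assumptions made for this illustration; operational equating procedures are more elaborate.

```python
def percentile_rank(score, scores):
    """Percentage of scores in the sample that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


def equipercentile_equivalent(score_a, scores_a, scores_b):
    """Test-B score whose percentile rank is closest to that of score_a on test A."""
    target = percentile_rank(score_a, scores_a)
    candidates = sorted(set(scores_b))
    return min(candidates,
               key=lambda s: abs(percentile_rank(s, scores_b) - target))


# Hypothetical data: the same ten testtakers took both test A and test B.
test_a = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
test_b = [105, 110, 120, 125, 130, 140, 145, 150, 160, 170]

print(percentile_rank(20, test_a))                    # 50.0
print(equipercentile_equivalent(20, test_a, test_b))  # 140
```

A score of 20 on test A and a score of 140 on test B sit at the same percentile in this sample, so they would be treated as equivalent in an equivalency table.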

- Scores must be obtained on the same sample → each member of the sample took both tests, and the equivalency tables were then calculated on the basis of these data

Subgroup norms
- A normative sample can be segmented by any of the criteria initially used in selecting subjects for the sample
- The test manual or a supplement to it might report normative information by each of these subgroups

Local norms
- Provide normative information with respect to the local population’s performance on some test
- Developed by test users themselves
- Some test users use abbreviated forms of existing tests → these require new norms

Fixed Reference Group Scoring Systems
- Fixed reference group scoring system: another type of aid in providing a context for interpretation
- The distribution of scores obtained on the test from one group of testtakers (the fixed reference group) is used as the basis for the calculation of test scores for future administrations of the test
- Example: SAT → the 1990 fixed reference group of 2 million testtakers was immortalized as a standard to be used in the conversion of raw scores on future administrations of the test → if Max answered 50 items correctly in 1990 and Mia also answered 50 items correctly in 2008, they do not automatically receive the same score → raw scores are converted according to the difficulty of the test
- Fixed reference group scores are most typically interpreted by local decision-making bodies with respect to local norms

Norm-Referenced versus Criterion-Referenced Evaluation
- Criterion: a standard on which a judgment or decision may be based
- Criterion-referenced testing and assessment: a method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard
- The criterion in criterion-referenced assessments typically derives from the values or standards of an individual or organization (e.g., a practical driving test)
- Domain- or content-referenced testing and assessment: another term for criterion-referenced testing and assessment
- In norm-referenced interpretation of test data → the focus is on how an individual performed relative to other people who took the test
- In criterion-referenced interpretation of test data → the focus is on the testtaker’s performance → hence such tests are also called mastery tests
- The criterion-referenced approach is popular in computer-assisted education programs → mastery of segments of material is assessed before the program user can proceed to the next level
- Criticism: potentially important information about an individual’s performance relative to other testtakers is lost; although this approach may have value with respect to the assessment of mastery of basic knowledge, skills, or both, it has little or no meaningful application at the
