TEST Measurement & Evaluation PDF

Title TEST Measurement & Evaluation
Author Noorkalam Sekh
Course m.p.ed
Institution Visva Bharati University
Pages 20

Description

Meaning of Tests
A test is a procedure intended to establish the quality, performance or reliability of something, especially before it is taken into widespread use.

Meaning of Measurement
Measurement is the process of estimating the values of physical quantities such as time, temperature, weight and length. Each measured value is expressed in some standard unit, and the estimated values are compared against standard quantities of the same type. Measurement is the assignment of a number to a characteristic of an object or event, which can be compared with other objects or events. The scope and application of a measurement depend on the context and discipline.

Meaning of Evaluation
Evaluation is a broader term that refers to all of the methods used to find out what happens as a result of using a specific intervention or practice. Evaluation is the systematic assessment of the worth or merit of some object. It is the systematic acquisition and assessment of information to provide useful feedback about some object.

Tests have proved to be of great utility in many spheres:
• Selection for training
• Guidance
• Classification according to the level of intelligence
• Research
• Appointments
• Prediction
• Diagnosis

MEASUREMENT SCALES
The most widely used classification of measurement scales is: (a) nominal scale; (b) ordinal scale; (c) interval scale; and (d) ratio scale.

(a) Nominal scale: A nominal scale is simply a system of assigning number symbols to events in order to label them. The usual example is the assignment of numbers to basketball players in order to identify them. Such numbers cannot be considered to be associated with an ordered scale, for their order is of no consequence; the numbers are just convenient labels for the particular class of events and as such have no quantitative value. Nominal scales provide convenient ways of keeping track of people, objects and events.
One cannot do much with the numbers involved. Accordingly, we are restricted to using the mode as the measure of central tendency. There is no generally used measure of dispersion for nominal scales. The chi-square test is the most common test of statistical significance that can be used, and as a measure of correlation the contingency coefficient can be worked out.
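The statistics named above for nominal data can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names, the playing-position labels and the 2x2 table of counts are all hypothetical, and the contingency coefficient is assumed to be Pearson's form, C = sqrt(chi2 / (chi2 + N)).

```python
from collections import Counter
from math import sqrt


def mode_of(labels):
    """Most frequent category: the measure of central tendency for nominal data."""
    return Counter(labels).most_common(1)[0][0]


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def contingency_coefficient(table):
    """Pearson's contingency coefficient, C = sqrt(chi2 / (chi2 + N))."""
    n = sum(sum(row) for row in table)
    chi2 = chi_square(table)
    return sqrt(chi2 / (chi2 + n))


# Hypothetical nominal data: playing positions and a 2x2 table of counts.
positions = ["guard", "forward", "guard", "center", "guard"]
counts = [[20, 10], [10, 20]]

print(mode_of(positions))
print(chi_square(counts))
print(contingency_coefficient(counts))
```

The chi-square value would still have to be compared with a critical value for the table's degrees of freedom before any claim of significance is made; that step is left out of the sketch.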

The nominal scale is the least powerful level of measurement. It indicates no order or distance relationship and has no arithmetic origin. A nominal scale simply describes differences between things by assigning them to categories. Nominal data are, thus, counted data. The scale wastes any information that we may have about varying degrees of attitude, skills, understandings, etc. In spite of all this, nominal scales are still very useful and are widely used in surveys and other ex-post-facto research when data are being classified by major sub-groups of the population.

(b) Ordinal scale: The lowest level of the ordered scale that is commonly used is the ordinal scale. The ordinal scale places events in order, but there is no attempt to make the intervals of the scale equal in terms of some rule. Rank orders represent ordinal scales and are frequently used in research relating to qualitative phenomena. Thus, the use of an ordinal scale implies a statement of ‘greater than’ or ‘less than’ (an equality statement is also acceptable) without our being able to state how much greater or less. The real difference between ranks 1 and 2 may be more or less than the difference between ranks 5 and 6. Since the numbers of this scale have only a rank meaning, the appropriate measure of central tendency is the median. A percentile or quartile measure is used for measuring dispersion. Correlations are restricted to various rank order methods. Measures of statistical significance are restricted to the non-parametric methods.

(c) Interval scale: In the case of the interval scale, the intervals are adjusted in terms of some rule that has been established as a basis for making the units equal. The units are equal only in so far as one accepts the assumptions on which the rule is based. Interval scales can have an arbitrary zero, but it is not possible to determine for them what may be called an absolute zero or the unique origin.
The primary limitation of the interval scale is the lack of a true zero; it does not have the capacity to measure the complete absence of a trait or characteristic. Interval scales provide more powerful measurement than ordinal scales, for the interval scale also incorporates the concept of equality of interval. As such, more powerful statistical measures can be used with interval scales. The mean is the appropriate measure of central tendency, while the standard deviation is the most widely used measure of dispersion. Product moment correlation techniques are appropriate, and the generally used tests for statistical significance are the ‘t’ test and the ‘F’ test.

(d) Ratio scale: Ratio scales have an absolute or true zero of measurement. The term ‘absolute zero’ is not as precise as it was once believed to be. We can conceive of an absolute zero of length and similarly we can conceive of an absolute zero of time. The ratio scale represents the actual amounts of variables. Measures of physical dimensions such as weight, height, distance, etc. are examples. Generally, all statistical techniques are usable with ratio scales, and all manipulations that one can carry out with real numbers can also be carried out with ratio scale values. Multiplication and division can be used with this scale but not with the other scales mentioned above. Geometric and harmonic means can be used as measures of central tendency, and coefficients of variation may also be calculated.

Sources of Error in Measurement

Measurement should be precise and unambiguous in an ideal research study. This objective, however, is often not met in its entirety. As such, the researcher must be aware of the sources of error in measurement. The following are the possible sources of error in measurement.

(a) Respondent: At times the respondent may be reluctant to express strong negative feelings, or it is just possible that he may have very little knowledge but may not admit his ignorance. All this reluctance is likely to result in an interview of ‘guesses.’ Transient factors like fatigue, boredom, anxiety, etc. may limit the ability of the respondent to respond accurately and fully.

(b) Situation: Situational factors may also come in the way of correct measurement. Any condition which places a strain on the interview can have serious effects on the interviewer-respondent rapport. For instance, if someone else is present, he can distort responses by joining in or merely by being present. If the respondent feels that anonymity is not assured, he may be reluctant to express certain feelings.

(c) Measurer: The interviewer can distort responses by rewording or reordering questions. His behaviour, style and looks may encourage or discourage certain replies from respondents. Careless mechanical processing may distort the findings. Errors may also creep in because of incorrect coding, faulty tabulation and/or statistical calculations, particularly in the data-analysis stage.

(d) Instrument: Error may arise because of a defective measuring instrument. The use of complex words beyond the comprehension of the respondent, ambiguous meanings, poor printing, inadequate space for replies, response choice omissions, etc. are a few things that make the measuring instrument defective and may result in measurement errors. Another type of instrument deficiency is the poor sampling of the universe of items of concern.
Reliability and Validity of the Test

Reliability
A test constructor has to check the accuracy and precision of the measurement procedure as well as the extent to which the test measures what it intends to measure. The term reliability refers to the stability or consistency, as well as the precision, that enter into the measurement procedure.

Methods of determining Reliability: Various methods of estimating test reliability are in vogue. The following methods are commonly used to gauge the reliability or self-correlation of a test.


The Test-Retest Method: The simplest way to measure the reliability of a test is to apply it again to the same group after an interval of time. The results of the two trials are then correlated, and the coefficient of correlation denotes the reliability of the test.

Split-Half Method: In this method, the test is divided into two equivalent halves and the correlation between the scores of the two halves is calculated, which gives the half-test reliability. From the half-test reliability, the self-correlation of the whole test is calculated by the Spearman-Brown formula.
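The two methods above can be illustrated with a short Python sketch. It is offered as one possible reading, not as the notes' own procedure: the function names and sample scores are hypothetical, the correlation is assumed to be Pearson's product-moment coefficient, the halves are assumed to be formed by an odd-even item split, and the Spearman-Brown step uses the usual two-half form r_whole = 2 r_half / (1 + r_half).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the whole test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split (an assumption): correlate the two half scores, then step up."""
    first_half = [sum(person[0::2]) for person in item_scores]
    second_half = [sum(person[1::2]) for person in item_scores]
    return spearman_brown(pearson_r(first_half, second_half))


# Hypothetical test-retest data: the same five pupils measured on two occasions.
trial_1 = [12, 15, 9, 20, 17]
trial_2 = [13, 14, 10, 19, 18]
print(pearson_r(trial_1, trial_2))

# Hypothetical item-level data: four pupils, four items scored 1 (right) or 0 (wrong).
items = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 0, 0], [1, 1, 0, 1]]
print(split_half_reliability(items))
```

The sample sizes here are far too small for a real reliability study; they only keep the arithmetic easy to follow.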

Alternate or Parallel Forms Method: Instead of giving the same test again, the alternate or parallel form of the test is used in the second trial. If the alternate form is used after a fairly long interval, the practice and confidence effects are, to a large extent, eliminated. This method gives more reliable results than the Test-Retest method.

Rational Equivalence Method: This is the fourth method of estimating test reliability. This method involves too many calculations and therefore can be applied with ease only to tests having a few items. To determine the reliability by this method, the Kuder-Richardson formula is used, so this method is also known as the Kuder-Richardson method.
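The notes do not say which Kuder-Richardson formula is meant. The sketch below assumes formula 20 (KR-20) for items scored 1 or 0, r = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores), and uses the population variance of the total scores, which is one of two common conventions. The function name and the response matrix are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 reliability for dichotomous items.

    responses: one list per examinee, each holding 1 (right) or 0 (wrong) per item.
    """
    k = len(responses[0])                # number of items
    n = len(responses)                   # number of examinees
    totals = [sum(person) for person in responses]
    total_variance = pvariance(totals)   # population variance (assumed convention)
    if k < 2 or total_variance == 0:
        raise ValueError("KR-20 needs at least two items and non-zero score variance")
    # p = proportion answering an item correctly, q = 1 - p
    sum_pq = 0.0
    for item in zip(*responses):
        p = sum(item) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: four examinees, three items.
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(answers))
```

Because every item contributes its own p and q, the amount of hand calculation grows with test length, which is consistent with the remark that the method is easy to apply only to short tests.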


Validity
According to John W. Best (1995), "The validity of a test may be defined as the accuracy with which it measures that which it is intended to measure or as the degree to which it approaches infallibility in measuring what it purports to measure." The evaluation of a test does not end with the estimation of the stability and precision of its measurement. It only begins there. A highly reliable test may not measure what it intends to measure. Besides, it is necessary to know how much of what is intended is measured, as well as to be sure that nothing else is measured. The question is fundamental with “Assessment” tests but not with the predictor tests. Such tests are more concerned with what is termed “Concept” or “Construct” validity.

Methods of Validity: Thorndike and Hagen have categorized three types of validity, namely (1) congruent validity, (2) concurrent validity and (3) predictive validity. Anastasi discusses face validity and factorial validity in addition to content validity and various types of empirical validity. Ross and Stanley mention curricular validity. Gulliksen has discussed intrinsic validity, and Mosier differentiated four types of face validity:
1. Validity by assumption
2. Validity by definition
3. The appearance of validity
4. Validity by hypothesis

• Concurrent Validity: It is concerned with the relation of test scores to an accepted contemporary criterion of performance on the variable which the test is intended to measure.
• Construct Validity: It is concerned with “what qualities a test measures” and is evaluated “by demonstrating that certain explanatory constructs account to some degree for performance on the test.”
• Content Validity: It is concerned with the adequacy of sampling of a specified universe of content.
• Curricular Validity: It is determined by examining the content of the test itself and judging the degree to which it is a true measure of the important objectives of the course, or truly a representative sampling of the essential materials of instruction.
• Empirical Validity: It refers to the relation between test scores and a direct measure of that which the test is designed to predict.
• Face Validity: It refers not to what a test necessarily measures but to what it appears to measure.
• Factorial Validity: It is the correlation between the test and the factors common to a group of tests or other measures of behavior. Such validity is based on factor analysis.
• Intrinsic Validity: It involves the use of experimental techniques other than correlation with a criterion to provide objective, quantitative evidence that the test is measuring what it ought to measure.


• Predictive Validity: It is concerned with the relation of test scores to measures on a criterion based on performance at some later time.

These types of validity are not all distinctly different from each other. In fact, one or two of them are practically identical with one or two others. But enough major differences appear to justify grouping them into two major categories: those concerned with primary or direct validity and those concerned with secondary or derived validity.

Physical Fitness
Harrison Clarke defined physical fitness as ‘one's ability to carry out daily tasks with vigor and alertness, without undue fatigue, and with ample energy to enjoy leisure time pursuits and to meet the above average physical stresses encountered in emergency situations.’ Physical fitness is considered a measure of the body’s ability to function efficiently and effectively in work and leisure activities, to be healthy, to resist hypokinetic diseases, and to meet emergency situations in life.

PHYSICAL FITNESS TEST

The AAHPER Youth Fitness Test (1958)
The AAHPER Test was evolved in 1958 by a Committee of the Research Council of the American Association for Health, Physical Education, and Recreation. Its validity was based on a criterion of critical thinking and selection of the test items by the experts on the Research Council. The test items were designed to measure the following aspects of physical fitness: arm and shoulder strength, abdominal strength and muscular endurance, speed and agility, arm and shoulder co-ordination and cardio-vascular fitness. The battery of tests consists of pull-ups, sit-ups, 40-yard shuttle run, 50-yard dash, 600-yard run-walk, standing broad jump and a softball throw for distance. Two sets of percentile norms are available. One is based on age, and the other on the Neilson-Cozens Classification Index.

California Physical Fitness Test (1961)
A new revision of the California Physical Performance Test for boys and girls from ten to eighteen years of age was announced in 1961. A single battery of six test items was adopted, consisting of standing long jump, knee-bent sit-ups in one minute, side-step, pull-ups, chair push-ups and six-minute jog-walk. A unique test in this battery is the side-step. For this test, three parallel lines, each 5 feet long, are marked on the floor or ground; the distance from the middle of the center line to the outer borders of each of the other lines is 4 feet.
The pupil takes a standing position astride the center line. On the signal to start, he or she sidesteps to the left until the left foot completely crosses the left line, then sidesteps to the right across the center line and touches outside the right line with the right foot, and continues as rapidly as possible for ten seconds. Scoring: one point each time a line is crossed, left, center or right. Subsequently, the flexed-arm hang was allowed as an alternate test for girls who could not perform a single pull-up. The norms consist of separate percentile tables for boys and girls at each age for each of the six tests.

Fleishman Battery of Basic Fitness Tests (1964)
This battery of ten tests was evolved by Fleishman to measure nine basic fitness factors: Extent Flexibility, Dynamic Flexibility, Explosive Strength, Static Strength, Dynamic Strength, Trunk Strength, Co-ordination, Equilibrium and Stamina. In this respect the battery is all-embracing, and as it employs little in the way of apparatus, it is highly recommended for use in schools.

Vermont School Fitness Test
The Vermont Governor’s Council on Physical Fitness has provided a motor fitness test battery from kindergarten through grade twelve for use by the schools in that state. Since the AAHPERD Youth Fitness Test does not provide norms for ages under ten years, modified test items were developed for younger ages. To keep school levels intact and to utilize the AAHPERD battery when applicable, the modified tests were recommended for the elementary school and the AAHPERD tests for the secondary school. The modified test items are optional for secondary school boys and girls in order for them to enter achievements for special Vermont Fitness awards. Four items compose the modified test battery: standing long jump, bent-knee sit-ups, desk pull-ups and run.

a) Standing Broad Jump: This test is the same as for the AAHPERD Revision, except for an administration aid: the lower back edges of the jumper’s shoes are rubbed with soft carpenter’s chalk, and the resultant marks on the floor from the jump provide the location of the heels for exact measurement of the jump.
b) Figure-8 Run: Two persons, who serve as counters, sit cross-legged on the floor with their backs toward each other; one hand of each is placed flat on the floor with the ends of the fingers at the outside edge of tapes stuck to the floor. The tapes are 11 feet apart. The runner starts with the left hand on the back of a counter’s hand and runs a figure-8 pattern around the two counters, touching the back of each counter’s hand with the inside hand when passing by. Each counter announces the number of touches each time the runner goes by. At the end of two minutes, the touches of the two counters are added to obtain the runner’s score. The timer announces the time remaining at one minute, thirty seconds and fifteen seconds to go.

c) Desk Pull-Up: This test replaces the AAHPERD pull-ups for boys and flexed-arm hang for girls. A broomstick (or bar) is placed on top of two desks 30 inches apart and is held in place by a person at each desk; rubber bands are joined together to a length of 20 inches, to be stretched to 30 inches and tied to the legs of the desks at two-thirds the height of the bar and directly under it. The subject takes hold of the bar with the backs of the arms just touching the elastic band and with palms toward the feet; the arms and body are straight, with back and seat clear of the floor and only the heels touching the floor. The subject slowly pulls up to hook the chin over the elastic band and returns to the starting position; this is repeated as many times as possible. The pull-ups must be continuous; no resting is allowed; only pull-ups done slowly and correctly are counted. The height of the bar may be adjusted for longer arms by placing books under it. Chairs with arm rests, or a bar held at the proper height by two helpers, may be substituted for the desks. The correlation between this test and pull-ups for boys in grades eight through twelve was .77.

The construction and standardization of physical fitness test batteries started around 1940, and such studies continue to date.

MOTOR FITNESS
The terms physical fitness and motor fitness are often used interchangeably. The term motor fitness was developed to describe a broader concept than physical fitness. This extensive term means the ability to perform basic motor skills efficiently and effectively. Motor fitness is an important component for an athlete in order to obtain optimal performance in sports. The level of the motor ability components is of prime importance for the learning of various activities and the perfection of different skills. Traditionally, motor abilities have been viewed as a combination of factors that are basic to all movements. All the factors of motor ability are chiefly concerned with the ability of the player and his capacity for action. The level of motor ability is of prime importance for learning various general activities and perfecting different skills in various sports and physical activities. Motor ability is sometimes used to mean achievement of basic motor skills. It also indicates present athletic ability.
MOTOR FITNESS TEST

Oregon Motor Fitness Test
This test was developed through the co-operative effort of the Oregon State Committee, Oregon State College and the University of Oregon. The test was constructed to measure arm and shoulder girdle strength and endurance, abdominal strength and endurance, muscular power, running speed and endurance, agility, and trunk flexibility. It is a three-item test battery which includes Pull-ups, Jump and Reach, and Potato Race (160 yards) for Junior and Senior High School boys. T-score scales and the ...

