Chapter 4 Measures of Variability PDF

Title Chapter 4 Measures of Variability
Course Social Statistics
Institution Sam Houston State University


Chapter 4 Measures of Variability     

Although measures of central tendency can be very helpful, they tell only part of the story and may mislead instead of inform
One way to summarize a distribution of data is to select a single number that describes how much variation and diversity there is in the distribution
Measures of variability: numbers that describe diversity or variability in the distribution of a variable
Researchers often use measures of central tendency along with measures of variability to describe their data
Five measures of variability
  1. The index of qualitative variation
  2. The range
  3. The interquartile range
  4. The standard deviation
  5. The variance

The importance of measuring variability

The importance of looking at variation and diversity can be illustrated by thinking about the differences in the experiences of U.S. women
Are women united by their similarities or divided by their differences?
  o The answer is both
One form of stereotyping is treating a group as if it were totally characterized by its central value, ignoring the diversity within the group
The concept of variability has implications not only for describing the diversity of social groups such as Asian American women but also for issues that are important in your everyday life
One of the most important issues facing the academic community is how to reconstruct the curriculum to make it more responsive to the needs of students
Information about how scores are spread from the center of a distribution is as important as information about its central tendency

The index of qualitative variation   

   

The U.S. is undergoing a demographic shift
A formerly European-dominated country is now characterized by racial, ethnic, and cultural diversity
These changes challenge us to rethink every conceptualization of society based solely on the experience of European populations and force us to ask questions that focus on the experiences of different racial/ethnic groups
Index of qualitative variation (IQV): a measure of variability for nominal variables such as race and ethnicity
The index can vary from 0.00 to 1.00
When all the cases in the distribution are in one category, there is no variation and the IQV is 0.00
When the cases are distributed evenly across the categories, there is maximum variation and the IQV is 1.00
Formula for the IQV
  o IQV = K(100² - ∑Pct²) / (100²(K - 1))
    - K = the number of categories
    - ∑Pct² = the sum of all squared percentages in the distribution
  o It is important to remember that the IQV is partially a function of the number of categories
  o Steps to calculate the IQV
    - Construct a percentage distribution
    - Square the percentage for each category
    - Sum the squared percentages
    - Calculate the IQV using the formula
    - IQV = K(100² - ∑Pct²) / (100²(K - 1))
Expressing the IQV as a percentage
  o The IQV can be expressed as a percentage rather than a proportion
  o Simply multiply the IQV by 100
  o The IQV would then reflect the percentage of racial/ethnic differences relative to the maximum possible differences in each distribution
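
The steps for calculating the IQV can be sketched in a few lines of Python. This is only an illustration: the function name `iqv` and the percentage distributions are made up for the example and do not come from the textbook.

```python
def iqv(percentages):
    """Index of qualitative variation from a percentage distribution.

    `percentages` holds one percentage per category and should sum to 100.
    """
    k = len(percentages)  # K = the number of categories
    if k < 2:
        raise ValueError("the IQV needs at least two categories")
    sum_sq = sum(p ** 2 for p in percentages)  # sum of squared percentages
    return k * (100 ** 2 - sum_sq) / (100 ** 2 * (k - 1))


# Hypothetical percentage distributions, invented for illustration.
all_in_one = [100, 0, 0, 0]      # every case falls in one category
evenly_split = [25, 25, 25, 25]  # cases spread evenly over the categories
mixed = [50, 30, 20]

print(iqv(all_in_one))       # 0.0 (no variation)
print(iqv(evenly_split))     # 1.0 (maximum variation)
print(round(iqv(mixed), 2))  # 0.93
```

Multiplying any of these results by 100 gives the IQV as a percentage of the maximum possible variation.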

Statistics in practice: diversity in U.S. society

 

By the middle of this century the U.S. will no longer be a predominantly white society
By 2044 the U.S. will be a minority-majority nation
There will be no racial or ethnic majority; group sizes will be close to equal
Population shifts began in 1990
Chain migration: migrants use social capital and knowledge of the migration process
2010: about 74% of all foreign-born residents lived in 10 states
States with the highest percentages of foreign-born residents
  o California = 27%
  o New York = 22%
  o New Jersey = 21%
Diversity is a characteristic of a population that many of us can sense intuitively
We use the IQV to measure the amount of diversity in different regions

The Range      

The simplest and most straightforward measure of variation is the range
Range: a measure of variation for interval-ratio variables
It is the difference between the highest and the lowest scores in the distribution
  o Range = highest score - lowest score
The range can also be calculated on percentages
To find the range in a distribution, simply pick out the highest and lowest scores and subtract
The range is simple and quick to calculate
However, the two scores might be extreme or atypical, which can make the range a misleading indicator of the variation in the distribution
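
A small sketch shows both how the range is computed and how one atypical score can distort it. The scores and the function name `value_range` are hypothetical, chosen only for illustration.

```python
def value_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)


# Hypothetical scores; the value 95 is an atypical extreme.
typical = [12, 15, 15, 17, 18, 20]
with_outlier = typical + [95]

print(value_range(typical))       # 8
print(value_range(with_outlier))  # 83
```

Adding a single extreme score raises the range from 8 to 83 even though the other scores are unchanged.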

The interquartile range 

To remedy the limitation of the range, we can employ an alternative: the interquartile range (IQR)



     

Interquartile range (IQR): a measure of variation for interval-ratio and ordinal variables; it is the width of the middle 50% of the distribution, the difference between the lower and upper quartiles (Q₁ and Q₃)
  o IQR = Q₃ - Q₁
The first quartile (Q₁) is the 25th percentile, the point at which 25% of the cases fall below it and 75% above it
The third quartile (Q₃) is the 75th percentile, the point at which 75% of the cases fall below it and 25% above it
The IQR defines variation for the middle 50% of the cases
Like the range, the IQR is based on only 2 scores
However, it is based on intermediate scores rather than on extreme scores in the distribution, and so avoids some of the instability associated with the range
Steps for calculating the IQR
  o To find Q₁ and Q₃, order the scores in the distribution from highest to lowest or vice versa
  o Identify the position of the first 25% of cases, or Q₁
    - Position of Q₁ = N * 0.25
  o Then identify the position of the first 75% of cases, or Q₃
    - Position of Q₃ = N * 0.75
  o We are now ready to find the IQR by subtracting the score at Q₁ from the score at Q₃
  o It may be more useful to report the full interquartile range (the scores at Q₁ and Q₃) rather than the single value
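
The steps for calculating the IQR can be sketched as follows. The sketch locates each quartile at position N * 0.25 or N * 0.75 in the ordered scores; rounding a fractional position up to the next whole case is an assumption made here, and other quartile conventions give slightly different answers. The function names and scores are hypothetical.

```python
import math


def quartile_score(ordered_scores, fraction):
    """Score at position N * fraction (counting from 1) in ordered scores.

    Assumption: a fractional position is rounded up to the next whole case.
    """
    n = len(ordered_scores)
    position = math.ceil(n * fraction)
    return ordered_scores[position - 1]


def iqr(scores):
    """Return (Q1, Q3, IQR) for a list of scores."""
    if not scores:
        raise ValueError("the IQR needs at least one score")
    ordered = sorted(scores)  # order from lowest to highest
    q1 = quartile_score(ordered, 0.25)
    q3 = quartile_score(ordered, 0.75)
    return q1, q3, q3 - q1


# Twelve hypothetical scores; 40 is an extreme value.
scores = [40, 2, 18, 4, 15, 5, 13, 7, 12, 8, 10, 9]
print(iqr(scores))  # (5, 13, 8)
```

With N = 12, Q₁ is the 3rd ordered score (5) and Q₃ is the 9th (13), so the IQR is 8, while the range of the same scores is 38 because of the extreme value.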

The box plots   



  

A graphic device called the box plot can visually present the range, the IQR, the median, the lowest score, and the highest score
The box plot provides us with a way to visually examine the center, the variation, and the shape of distributions of interval-ratio variables
We can easily draw a box plot by hand following these instructions
  1. Draw a box between the lower and upper quartiles
  2. Draw a solid line within the box to mark the median
  3. Draw vertical lines outside the box, extending to the lowest and highest values
What can we learn from creating a box plot?
  o We can obtain a visual impression of the following properties
    - The center of the distribution is easily identified by the solid line inside the box
    - Since the box is drawn between the lower and upper quartiles, the IQR is reflected in the height of the box
    - The length of the vertical lines drawn outside the box (together with the box itself) represents the range of the distribution
Both the IQR and the range give us a visual impression of the spread in the distribution
The relative position of the box and the position of the median within the box tell us whether the distribution is symmetrical or skewed
A perfectly symmetrical distribution would have the box at the center of the range as well as the median in the center of the box





When the distribution departs from symmetry, the box and/or the median will not be centered; it will be closer to the lower quartile when there are more cases with lower scores or to the upper quartile when there are more cases with higher scores
Box plots are particularly useful for comparing distributions
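
The values a box plot displays, and the check on whether the median sits in the center of the box, can be sketched in Python. This is a minimal illustration, not the textbook's procedure: it uses the standard library's `statistics.quantiles` with `method="inclusive"`, which is one of several quartile conventions, and the function names and scores are hypothetical.

```python
import statistics


def five_number_summary(scores):
    """Lowest score, Q1, median, Q3, and highest score."""
    ordered = sorted(scores)
    # Assumption: the "inclusive" method is used to locate the quartiles.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "lowest": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "highest": ordered[-1],
    }


def median_position(summary):
    """Describe where the median line falls inside the box."""
    lower_gap = summary["median"] - summary["q1"]
    upper_gap = summary["q3"] - summary["median"]
    if lower_gap < upper_gap:
        return "closer to the lower quartile"
    if lower_gap > upper_gap:
        return "closer to the upper quartile"
    return "centered in the box"


# Hypothetical scores: one symmetrical set, one with more low scores.
symmetrical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
more_low_scores = [1, 1, 2, 2, 3, 4, 6, 9, 15]

print(median_position(five_number_summary(symmetrical)))
print(median_position(five_number_summary(more_low_scores)))
```

For the symmetrical scores the median is centered in the box; for the second set, where most cases have low scores, it falls closer to the lower quartile.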

The variance and the standard deviation    



  

   

2010: the elderly population in the U.S. was 13 times as large as in 1900, and it is projected to increase
The pace and distribution of these demographic changes will create compelling social, economic, and ethical choices for individuals, families, and governments
Although it is important to know the average projected percentage increase for the nation as a whole, you may also want to know whether regional increases might differ from the national average
If the regional projected increases are close to the national average, the figures will cluster around the mean; if the regional increases deviate much from the national average, they will be widely dispersed around the mean
How large are these deviations on average?
  o We want a measure that will give us information about the overall variation among all regions in the U.S. and that, unlike the range or the IQR, will not be based on only 2 scores
Such a measure will reflect how much, on average, each score in the distribution deviates from some central point such as the mean
We use the mean as the reference point rather than other kinds of averages (the mode or the median) because the mean is based on all the scores in the distribution
Another reason for using the mean as a reference point is that more advanced measures of variation require algebraic properties that can be assumed only by using the arithmetic mean
The variance and the standard deviation are two closely related measures of variation that increase or decrease based on how closely the scores cluster around the mean
Variance: a measure of variation for interval-ratio and ordinal variables; the average of the squared deviations from the center (mean) of the distribution
Standard deviation: a measure of variation for interval-ratio and ordinal variables; it is equal to the square root of the variance
Calculating the deviation from the mean
  o The difference between a score and the mean, called a deviation from the mean, is symbolized as (Y - Ȳ)
  o The sum of these deviations can be symbolized as ∑(Y - Ȳ)
  o Note that each region has either a positive or a negative deviation score
  o The deviation is positive when the percentage change in the elderly population is above the mean
  o It is negative when the percentage change is below the mean
  o We might try to calculate the average of these deviations by simply adding them up and dividing by the number of scores
  o Unfortunately, we cannot: the sum of the deviations of scores from the mean is always 0, or algebraically ∑(Y - Ȳ) = 0
  o This is always true because the mean is the center of gravity of the distribution
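
The claim that deviations from the mean always sum to 0 can be checked numerically. The scores below are hypothetical and were chosen so that the arithmetic is exact; with other data, floating-point rounding may leave a sum that is extremely close to 0 rather than exactly 0.

```python
def deviations_from_mean(scores):
    """Return the deviation (Y - mean) for every score Y."""
    mean = sum(scores) / len(scores)
    return [y - mean for y in scores]


# Hypothetical scores with a mean of 6.
scores = [2, 4, 6, 8, 10]
deviations = deviations_from_mean(scores)

print(deviations)       # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0
```

The negative deviations (scores below the mean) exactly cancel the positive ones (scores above the mean), which is why the plain average of the deviations cannot serve as a measure of variation.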

We can overcome this problem either by ignoring the plus and minus signs, using instead the absolute values of the deviations, or by squaring the deviations (multiplying each deviation by itself) to get rid of the negative signs
  o Since absolute values are difficult to work with mathematically, the latter method is used to compensate for the problem
  o The sum of the squared deviations is symbolized as ∑(Y - Ȳ)²
    - By squaring the deviations, we end up with a sum representing the deviation from the mean, which is positive
Calculating the variance and the standard deviation
  o The average of the squared deviations from the mean is known as the variance
  o The variance is symbolized as s²
  o We are interested in the average of the squared deviations from the mean
  o We would therefore divide the sum of the squared deviations by the number of scores (N) in the distribution; however, we will use N - 1 rather than N in the denominator
    - s² = ∑(Y - Ȳ)² / (N - 1)
  o where
    - s² = the variance
    - (Y - Ȳ) = the deviation from the mean
    - ∑(Y - Ȳ)² = the sum of the squared deviations from the mean
    - N = the number of scores
  o This formula means that the variance is equal to the average of the squared deviations from the mean
  o Steps to calculate the variance
    - Calculate the mean, Ȳ = ∑Y / N
    - Subtract the mean from each score to find the deviation, Y - Ȳ
    - Square each deviation, (Y - Ȳ)²
    - Sum the squared deviations, ∑(Y - Ȳ)²
    - Divide the sum by N - 1, ∑(Y - Ȳ)² / (N - 1)
    - The answer is the variance
  o One problem with the variance is that it is based on squared deviations and therefore is no longer expressed in the original units of measurement
  o We often take the square root of the variance and interpret it instead
    - s = √s²
  o The formula for the standard deviation uses the same symbols as the formula for the variance
    - s = √[∑(Y - Ȳ)² / (N - 1)]
  o As we interpret the formula, we can say that the standard deviation is equal to the square root of the average of the squared deviations from the mean
  o The advantage of the standard deviation is that, unlike the variance, it is measured in the same units as the original data
  o Because the original data were expressed in percentages, this number is expressed as a percentage as well
  o In a distribution where all the scores are identical, the standard deviation is 0




  o 0 is the lowest possible value for the standard deviation; in an identical distribution all the points would be the same, with the same mean, mode, and median
  o The more the standard deviation departs from 0, the more variation there is in the distribution
  o There is no upper limit to the value of the standard deviation
  o The standard deviation can be considered a standard against which we can evaluate the positioning of scores relative to the mean and to other scores in the distribution
  o In most distributions, unless they are highly skewed, about 34% of all scores fall between the mean and 1 standard deviation above the mean
  o Another 34% of scores fall between the mean and 1 standard deviation below it
  o Thus we would expect the majority of scores (68%) to fall within 1 standard deviation of the mean
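
The steps for calculating the variance and the standard deviation can be sketched in Python. The function names `sample_variance` and `sample_std` and the scores are hypothetical; the sketch divides by N - 1, as in the notes.

```python
import math


def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("the variance needs at least two scores")
    mean = sum(scores) / n                              # calculate the mean
    squared_devs = [(y - mean) ** 2 for y in scores]    # square each deviation
    return sum(squared_devs) / (n - 1)                  # divide the sum by N - 1


def sample_std(scores):
    """Standard deviation = square root of the variance."""
    return math.sqrt(sample_variance(scores))


# Hypothetical scores with a mean of 6.
scores = [2, 4, 6, 8, 10]
print(sample_variance(scores))       # 10.0
print(round(sample_std(scores), 2))  # 3.16

# When all the scores are identical, there is no variation.
print(sample_std([5, 5, 5, 5]))      # 0.0
```

The squared deviations here are 16, 4, 0, 4, and 16, which sum to 40; dividing by N - 1 = 4 gives a variance of 10, and its square root, about 3.16, is in the same units as the original scores. Python's standard `statistics.variance` and `statistics.stdev` functions also use the N - 1 denominator.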

Considerations for choosing a measure of variation        



Each measure can represent the degree of variability in a distribution
But which one should we use?
  o There is no simple answer to this question
We tend to use only one measure of variation, and the choice of the appropriate one involves a number of considerations
These considerations and how they affect our choice of the appropriate measure are presented in the form of a decision tree
As in choosing a measure of central tendency, one of the most basic considerations in choosing a measure of variability is the variable's level of measurement
Valid use of any of the measures requires that the data be measured at the level appropriate for that measure or higher
Nominal level: with nominal variables, your choice is restricted to the IQV as a measure of variability
Ordinal level: the choice of a measure of variation for ordinal variables is more problematic
  o The IQV can be used to reflect variability in the distribution of ordinal variables, but because the IQV is not sensitive to the rank ordering of values implied in ordinal variables, it loses some information
  o Another possibility is to use the IQR, interpreting it as the range of rank-ordered values that includes the middle 50% of the observations
  o In most instances, social science researchers treat ordinal variables as interval-ratio measures, preferring to calculate the variance and standard deviation
Interval-ratio level: for interval-ratio variables, you can choose the variance, the standard deviation, the range, or the IQR
  o Because the range, and to a lesser extent the IQR, is based on only 2 scores in the distribution, the variance and/or standard deviation is usually preferred
  o If a distribution is extremely skewed so that the mean is no longer representative of the central tendency in the distribution, the range and the IQR can be used
  o The range and the IQR will also be useful when you are reading tables or quickly scanning data to get a rough idea of the extent of dispersion in the distribution...
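
The considerations above can be summarized as a small lookup, sketched here in Python. This is a simplified reading of the notes, not the textbook's decision tree: the function name `suggest_measures`, the level labels, and the `extremely_skewed` flag are all assumptions made for illustration.

```python
def suggest_measures(level, extremely_skewed=False):
    """Suggest measures of variability for a level of measurement.

    `level` is one of "nominal", "ordinal", or "interval-ratio".
    """
    level = level.lower()
    if level == "nominal":
        return ["IQV"]
    if level == "ordinal":
        # Researchers often treat ordinal variables as interval-ratio
        # instead and report the variance and standard deviation.
        return ["IQV", "IQR"]
    if level == "interval-ratio":
        if extremely_skewed:
            # The mean no longer represents the center well.
            return ["range", "IQR"]
        return ["variance", "standard deviation"]
    raise ValueError(f"unknown level of measurement: {level!r}")


print(suggest_measures("nominal"))
print(suggest_measures("interval-ratio", extremely_skewed=True))
```

The lookup only narrows the options; the final choice still depends on the purpose of the analysis, such as quickly scanning a table versus describing overall dispersion.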

