Measures of Central Tendency PDF

Title Measures of Central Tendency
Author The Contributor
Course Business in Information Technology
Institution Strathmore University
Pages 14
File Size 317.4 KB
File Type PDF

Summary

Business Statistics...


Description

MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a number that locates the approximate center of a distribution of data. The purpose of a measure of central tendency is to locate the “average” or “typical” case in a distribution of cases. The term ‘average’ in statistics is defined as that value of a distribution which is considered as the most representative or typical value for a group. The most commonly used measures of central tendency or averages are the mean, mode and median. Other types of averages include the weighted arithmetic mean, trimmed mean, geometric mean and harmonic mean.

There are two main objectives of averaging:
1. To get a single value that describes the characteristics of the entire group.
2. To facilitate comparison between groups.

Properties of a Good Average (Measure of Central Tendency)
1. It should be easy to understand. It should be readily understood, otherwise its use will be limited.
2. It should be simple to compute. It is important to note, though, that ease of computation should not be at the expense of other advantages.
3. It should be based on all items. It should depend on each and every item of the data set, so that if any item is dropped its value is altered.
4. It should not be unduly affected by extreme items. If one or two very small or very large items unduly affect the average then it cannot be typical of the entire data set. Extremes may distort its value and reduce its usefulness.
5. It should be rigidly defined. An average should be properly defined, preferably by an algebraic formula, so that it has only one interpretation. Different people computing it from the same figures should get the same answer.
6. It should be capable of further algebraic treatment. It can then be used for further statistical computations, which enhances its usefulness.
7. It should have sampling stability. If we pick different samples from a population and compute the average for each of them we should expect to get approximately the same value.

MEAN

The mean is computed by summing up all the scores in a distribution and then dividing by the number of scores. It should not be used for ordinal or nominal data.

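For a sample:

$$\bar{x} = \frac{\sum x}{n}$$

where $\bar{x}$ is the sample mean and $n$ is the total number of items in the sample.

For a population:

$$\mu = \frac{\sum x}{N}$$

where $\mu$ is the population mean and $N$ is the total number of items in the population.

Where some of the values occur more than once, each value $x$ is multiplied by its frequency $f$:

$$\bar{x} = \frac{\sum fx}{\sum f} \quad \text{or} \quad \bar{x} = \frac{\sum fx}{n}$$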

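Example
Find the mean of the following: 4, 4, 5, 5, 6, 6, 6, 7, 7, 7

| x | f | fx |
|---|---|---|
| 4 | 2 | 8 |
| 5 | 2 | 10 |
| 6 | 3 | 18 |
| 7 | 3 | 21 |
| Total | 10 | 57 |

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{57}{10} = 5.7$$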
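The frequency-table calculation for the data set 4, 4, 5, 5, 6, 6, 6, 7, 7, 7 can be sketched in Python as follows. The function name `mean_from_frequencies` is an illustrative choice, not part of the original notes.

```python
from collections import Counter

def mean_from_frequencies(values):
    """Return the mean computed as sum(f * x) / sum(f)."""
    freq = Counter(values)                      # maps each distinct x to its frequency f
    total_fx = sum(x * f for x, f in freq.items())
    total_f = sum(freq.values())
    return total_fx / total_f

scores = [4, 4, 5, 5, 6, 6, 6, 7, 7, 7]
result = mean_from_frequencies(scores)
print(result)
```

Grouping by frequency gives the same answer as summing the ten scores directly and dividing by 10.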

For a grouped distribution the midpoint of each interval is used to represent all the scores within that interval. It is assumed that the scores in the interval are evenly distributed.

Example
The data in the table is for the number of kilometers run during one week for a sample of 20 runners.

| No of km | No of runners (f) | Midpoint (x) | fx |
|---|---|---|---|
| 5.5-10.5 | 1 | 8 | 8 |
| 10.5-15.5 | 2 | 13 | 26 |
| 15.5-20.5 | 3 | 18 | 54 |
| 20.5-25.5 | 5 | 23 | 115 |
| 25.5-30.5 | 4 | 28 | 112 |
| 30.5-35.5 | 3 | 33 | 99 |
| 35.5-40.5 | 2 | 38 | 76 |
| Total | 20 | | 490 |

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{490}{20} = 24.5$$
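A minimal Python sketch of the grouped-mean calculation for the runners data is shown here. The function name `grouped_mean` and the `(lower, upper, frequency)` tuple layout are assumptions made for illustration.

```python
def grouped_mean(classes):
    """Mean of a grouped distribution, using each class midpoint as x.

    classes: list of (lower_boundary, upper_boundary, frequency) tuples.
    """
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

runners = [
    (5.5, 10.5, 1), (10.5, 15.5, 2), (15.5, 20.5, 3), (20.5, 25.5, 5),
    (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2),
]
km_mean = grouped_mean(runners)
print(km_mean)
```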

Properties of the mean
1. It is the point in a distribution of measurements or scores about which the sum of the deviations is equal to zero:

$$\sum (x - \bar{x}) = 0$$

e.g. the mean of 2, 3, 5, 7 and 8 is 5, and

$$(2 - 5) + (3 - 5) + (5 - 5) + (7 - 5) + (8 - 5) = 0$$

The mean is therefore characterized as a point of balance. It is the value that balances all scores on either side of it.
2. The sum of the squares of the deviations of the items from the arithmetic mean is a minimum; that is, it is less than the sum of squared deviations of the items from any other value. This property is of immense use in regression analysis, which is a topic to be covered later. e.g. the table below gives the squared deviations of each item in the data set 2, 3, 5, 7 and 8 from each of the data values in turn, including the mean $\bar{x} = 5$.

| x | (x − 2)² | (x − 3)² | (x − 5)² | (x − 7)² | (x − 8)² |
|---|---|---|---|---|---|
| 2 | 0 | 1 | 9 | 25 | 36 |
| 3 | 1 | 0 | 4 | 16 | 25 |
| 5 | 9 | 4 | 0 | 4 | 9 |
| 7 | 25 | 16 | 4 | 0 | 1 |
| 8 | 36 | 25 | 9 | 1 | 0 |
| Total | 71 | 46 | 26 | 46 | 71 |

The total is smallest (26) when the deviations are taken from the mean, 5.
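The first two properties can be checked numerically for the data set 2, 3, 5, 7 and 8. This is a small verification sketch; the helper name `sum_squared_deviations` is illustrative only.

```python
data = [2, 3, 5, 7, 8]
mean = sum(data) / len(data)

# Property 1: deviations from the mean sum to zero.
sum_of_deviations = sum(x - mean for x in data)

def sum_squared_deviations(values, centre):
    """Sum of (x - centre)^2 over all values."""
    return sum((x - centre) ** 2 for x in values)

# Property 2: try each data value as the centre and compare the totals.
totals = {centre: sum_squared_deviations(data, centre) for centre in data}
best_centre = min(totals, key=totals.get)

print(sum_of_deviations)
print(totals)
print(best_centre)
```

Only the data values themselves are tried as centres here, matching the table; the property holds against any other value as well.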

3. If each item in a series is replaced by the mean, then the sum of these substitutions will be equal to the sum of the individual items. E.g. for the data set 2, 3, 5, 7 and 8, whose mean is 5:

$$5 + 5 + 5 + 5 + 5 = 2 + 3 + 5 + 7 + 8 = 25$$

Therefore

$$N\bar{x} = \sum x$$

If the average wage in a company with 1500 workers is 25,000, then the total wage bill is $1500 \times 25{,}000 = 37{,}500{,}000$.
4. Using the arithmetic mean and number of items of two or more related groups we can compute the combined mean. If the average wage in the Nairobi branch of a company is 17,500 and the branch has 150 workers, while the Mombasa branch has 200 workers with an average of 15,000, the combined average for the two branches can be computed as follows:

$$\bar{x}_{12} = \frac{(150 \times 17{,}500) + (200 \times 15{,}000)}{150 + 200} = 16{,}071.43$$

In general,

$$\bar{x}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2}$$

where
$\bar{x}_{12}$ is the combined mean of the two groups,
$\bar{x}_1$ is the mean of the first group,
$N_1$ is the number of items in the first group,
$\bar{x}_2$ is the mean of the second group,
$N_2$ is the number of items in the second group.
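The combined mean for the Nairobi and Mombasa branches can be reproduced with a short Python sketch. The function name `combined_mean` and the `(size, mean)` pair layout are assumptions for illustration; the same function accepts more than two groups.

```python
def combined_mean(groups):
    """Combined mean of several groups.

    groups: list of (number_of_items, group_mean) pairs.
    """
    total_n = sum(n for n, _ in groups)
    weighted_total = sum(n * m for n, m in groups)   # each n * mean recovers that group's total
    return weighted_total / total_n

branches = [(150, 17_500), (200, 15_000)]   # (workers, average wage)
combined = combined_mean(branches)
print(round(combined, 2))
```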

Merits of the Arithmetic mean
1. It is the simplest average to understand and the easiest to compute.
2. Its computation is based on all items of the series.
3. It is rigidly defined. Everyone computing the arithmetic mean for the same data set will get the same answer.
4. It lends itself to further algebraic treatment.
5. It is relatively reliable in the sense that it does not vary too much when repeated samples are taken from one and the same population.

Limitations of the Arithmetic mean
1. The mean is very sensitive to extreme values when these are not evenly dispersed on both sides of it. E.g. compare the two data sets 2, 3, 5, 7, 8 and 2, 3, 5, 7, 33. The mean of the first is 5, while that of the second is 10. The single large score of 33 doubles the mean of the second set. When a distribution is markedly skewed the mean provides a misleading measure of central tendency. The mean provides an appropriate ‘average’ only when the distribution of a variable is reasonably normal (bell shaped). Income is a commonly studied variable for which the median is preferred over the mean, since the distribution is distinctly skewed in the direction of high incomes.
2. The mean cannot be computed in a distribution with open ended classes without making assumptions regarding the size of the class interval of the open ended classes, which may lead to substantial errors.
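The effect of the extreme score 33 can be seen by computing the mean and the median of both data sets. This sketch uses Python's standard `statistics` module.

```python
from statistics import mean, median

first = [2, 3, 5, 7, 8]
second = [2, 3, 5, 7, 33]   # same data with one extreme score

# The mean doubles, but the median does not move.
print(mean(first), median(first))
print(mean(second), median(second))
```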

THE MEDIAN
The median is the halfway point in a data set. It is the point at which 50% of the values in the data set are the size of the median value or smaller and 50% of the values are the size of the median value or larger. The median is a positional average and, unlike the mean, its computation is not based on all items.

Calculation of the median
To find the median the data values must be arranged in order. The median is the size of the $\frac{N+1}{2}$th item. e.g. the incomes of five CEOs in USD are 15000, 23285, 27000, 28564 and 30000. The median here will be the income of the $\frac{5+1}{2} = 3$rd CEO when incomes are arranged in ascending order. Median = 27000 USD
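The positional rule can be sketched in Python as follows. The function name `median_by_position` is illustrative. For an even number of items the position (N + 1)/2 falls between two items; averaging those two is the usual convention and is an assumption here, since the notes only show an odd-sized example.

```python
def median_by_position(values):
    """Median as the value at 1-based position (N + 1) / 2 of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2
    if position.is_integer():
        return ordered[int(position) - 1]        # odd N: a single middle item
    lower = ordered[int(position) - 1]           # even N: the two middle items
    upper = ordered[int(position)]
    return (lower + upper) / 2                   # assumption: average the two

incomes = [23285, 30000, 15000, 28564, 27000]    # CEO incomes in USD, unsorted
ceo_median = median_by_position(incomes)
print(ceo_median)
```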

Median for grouped data
Calculation of the median for grouped data involves interpolation, which gives an estimate of the median. It involves a simplifying assumption that the scores are evenly distributed throughout the interval in question. We use $\frac{N}{2}$ to locate the position of the median instead of $\frac{N+1}{2}$. This is because in grouped data all the frequencies lose their individuality and the effort here is to find that one value that will have 50% of the values below it and 50% above it.

e.g. for the following number of absences per year in different branches of a company:

| No of absences | No of branches | Cumulative frequency |
|---|---|---|
| 0-4 | 16 | 16 |
| 5-9 | 21 | 37 |
| 10-14 | 12 | 49 |
| 15-19 | 11 | 60 |
| 20-24 | 10 | 70 |
| 25-29 | 8 | 78 |
| 30-34 | 2 | 80 |

The median number of absences is the $\frac{80}{2} = 40$th value. This is in the class 10-14. We shall assume that the absences are evenly distributed in this class, i.e. the difference between the absences of one branch and the next is the class interval divided by the number of branches in the class, $5/12$.

The median is the 40th value overall, which is the $40 - 37 = 3$rd value in the class.

$$\text{Median} = 9.5 + \frac{5}{12} \times 3 = 10.75$$

In general,

$$\text{Median} = L + \frac{\frac{N}{2} - \mathit{cf}}{f} \times i$$

where
$L$ is the lower boundary of the median class,
$\mathit{cf}$ is the cumulative frequency of the class preceding the median class,
$f$ is the frequency of the median class,
$i$ is the class interval of the median class.
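The interpolation for the absences data can be sketched in Python. The function name `grouped_median` and the tuple layout are illustrative assumptions; the class boundaries (for example 9.5 to 14.5 for the class 10-14) follow the lower boundary of 9.5 used in the worked example.

```python
def grouped_median(classes):
    """Interpolated median: L + ((N/2 - cf) / f) * i.

    classes: list of (lower_boundary, upper_boundary, frequency) in ascending order.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    target = n / 2                     # position of the median
    cumulative = 0                     # cf of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= target:   # the median falls in this class
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("median class not found")

absences = [
    (-0.5, 4.5, 16), (4.5, 9.5, 21), (9.5, 14.5, 12), (14.5, 19.5, 11),
    (19.5, 24.5, 10), (24.5, 29.5, 8), (29.5, 34.5, 2),
]
absence_median = grouped_median(absences)
print(absence_median)
```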

Properties of the median
It is insensitive to extreme scores.

Merits
1. It is useful in the case of open ended classes.
2. It is not influenced by extreme scores and is therefore preferred to the arithmetic mean in skewed distributions such as income.
3. It is most appropriate when dealing with qualitative data, i.e. where ranks are given or there are other types of items that are not counted or measured but are scored.
4. The median can be determined graphically.

Limitations
1. Its computation does not involve all items in the series.
2. It is not capable of further algebraic treatment, e.g. we cannot find the combined median of two data sets.
3. Its value is affected more by sampling fluctuations than the value of the arithmetic mean.

MODE
This is the most frequently occurring score in a distribution. It is the most typical or fashionable value of a distribution. It is the value around which the items tend to be most heavily concentrated. Distributions that have only one mode are referred to as unimodal. Those with two, three or more modes are referred to as bimodal, trimodal or multimodal respectively.

Calculation of mode
For individual observations, the mode is determined by counting the number of times the various values repeat themselves, e.g. for the data set 10, 27, 24, 10, 27, 15, 30, 15, 27, 18, 27 the mode is 27.

For grouped data, i.e. a continuous series, the mode is calculated by applying the formula

$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

where
$L$ is the lower boundary of the modal class,
$\Delta_1$ is the difference between the frequency of the modal class and the pre modal class,
$\Delta_2$ is the difference between the frequency of the modal class and the post modal class, ignoring the signs,
$i$ is the class interval of the modal class.

It is necessary to ensure the class intervals are uniform throughout. If they are not equal they should be made equal, on the assumption that the frequencies are equally distributed throughout the class, to avoid getting misleading results.

Example 1: Find the mode for the following data

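| No of absences | No of branches |
|---|---|
| 0-4 | 16 |
| 5-9 | 21 |
| 10-14 | 12 |
| 15-19 | 11 |
| 20-24 | 10 |
| 25-29 | 8 |
| 30-34 | 2 |

The modal class is 5-9.

$$\text{Mode} = 4.5 + \frac{5}{5 + 9} \times 5 \approx 6.3$$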

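Example 2: Find the mode for the following data

| No of km | No of runners (f) |
|---|---|
| 5.5-10.5 | 1 |
| 10.5-15.5 | 2 |
| 15.5-20.5 | 3 |
| 20.5-25.5 | 5 |
| 25.5-30.5 | 4 |
| 30.5-35.5 | 3 |
| 35.5-40.5 | 2 |

The modal class is 20.5-25.5, with a class interval of 5.

$$\text{Mode} = 20.5 + \frac{2}{2 + 1} \times 5 \approx 23.8$$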
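The grouped-mode formula, lower boundary plus the fraction Δ1/(Δ1 + Δ2) of the class interval, can be sketched in Python and applied to both data sets. The function name `grouped_mode` is illustrative. Two choices here are assumptions rather than part of the notes: a missing neighbouring class is treated as having frequency 0, and the first class with the highest frequency is taken as the modal class.

```python
def grouped_mode(classes):
    """Mode of a grouped distribution: L + d1 / (d1 + d2) * i.

    classes: list of (lower_boundary, upper_boundary, frequency) with equal widths.
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                    # first class with the highest frequency
    lower, upper, f_modal = classes[k]
    f_before = freqs[k - 1] if k > 0 else 0        # assumption: no class means frequency 0
    f_after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = abs(f_modal - f_before)
    d2 = abs(f_modal - f_after)
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return lower + d1 / (d1 + d2) * (upper - lower)

absences = [
    (-0.5, 4.5, 16), (4.5, 9.5, 21), (9.5, 14.5, 12), (14.5, 19.5, 11),
    (19.5, 24.5, 10), (24.5, 29.5, 8), (29.5, 34.5, 2),
]
runners = [
    (5.5, 10.5, 1), (10.5, 15.5, 2), (15.5, 20.5, 3), (20.5, 25.5, 5),
    (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2),
]
mode_absences = grouped_mode(absences)
mode_runners = grouped_mode(runners)
print(round(mode_absences, 1), round(mode_runners, 1))
```

With a class interval of 5, the runners data gives 20.5 + (2/3) × 5, which is about 23.8.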

Locating mode graphically
The mode can be obtained graphically using the following steps:
1. Draw a histogram.
2. Draw two lines diagonally on the inside of the modal class bar, starting from each upper corner of the bar to the upper corner of the adjacent bar.
3. Draw a perpendicular line from the intersection of the two diagonal lines to the horizontal axis, which gives us the modal value.

[Figure: histogram in which the reading at P on the horizontal axis gives the modal value]

Merits:
1. It is not affected by extreme values. It is the most meaningful measure of central tendency in the case of highly skewed or non normal distributions.
2. It can be determined for open ended distributions.
3. It can be determined graphically, unlike the mean.
4. It can be used to describe qualitative data, particularly nominal data. It is useful in the comparison of consumer preferences, e.g. ice cream flavours, soap etc.
5. It is the most typical or representative value of a distribution. If the modal wage in a company is Ksh 30,000 then more workers in that company receive Ksh 30,000 than any other wage.

Limitations:
1. The value of the mode cannot always be determined. The distribution can be bimodal, trimodal or multimodal.
2. It does not lend itself to algebraic manipulation.
3. It is not based on each and every item of the series.
4. It is not rigidly defined. There are several formulae for calculating the mode, all of which usually give somewhat different answers.

RELATIONSHIP AMONG MEAN, MODE AND MEDIAN
In a symmetrical distribution the values of mean, mode and median coincide.

If the mean, mode and median are not equal the distribution is known as asymmetrical or skewed. In a distribution that is skewed to the right, or positively skewed, the mean is higher than the median.

In a distribution that is negatively skewed the mean is lower than the median.

The mean is the balance point of the distribution. Because points further away from the balance point change the center of balance, the mean is pulled in the direction the distribution is skewed. For example, if the distribution is positively skewed, the mean would be pulled in the direction of the skewness, or be pulled toward larger numbers. One way to remember the order of the mean, median, and mode in a skewed distribution is to remember that the mean is pulled in the direction of the extreme scores. In a positively skewed distribution, the extreme scores are larger, thus the mean is larger than the median.

Summary of when to use mean, median and mode

| Type of Variable | Best measure of central tendency |
|---|---|
| Nominal | Mode |
| Ordinal | Median |
| Interval/Ratio (not skewed) | Mean |
| Interval/Ratio (skewed) | Median |

OTHER TYPES OF ‘AVERAGES’

1. THE WEIGHTED ARITHMETIC MEAN
Used in situations where the relative importance of the different items is not the same.

$$\bar{x}_w = \frac{\sum wx}{\sum w}$$

where $\bar{x}_w$ represents the weighted arithmetic mean, $w$ the weights and $x$ the variable.

Example:
1. A lecturer grades CATS 20%, term paper 30% and final exam 50%. A student has 83, 72 and 90 respectively for the CATS, term paper and final exam. Find the student's final average. (Ans 83.2)
2. Another lecturer gives 4 one hour exams and one final exam which counts as two one hour exams. Find a student's grade if she received 62, 83, 97 and 90 on the one hour exams and 82 on the final exam. (Ans 82.7)
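Both exercises can be checked with a short Python sketch of the weighted mean. The function name `weighted_mean` is illustrative. The weights only need to be in the right proportions, so percentages (20, 30, 50) and counts (1, 1, 1, 1, 2) both work.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Exercise 1: CATS 20%, term paper 30%, final exam 50%.
grade_1 = weighted_mean([83, 72, 90], [20, 30, 50])

# Exercise 2: four one hour exams, plus a final that counts as two of them.
grade_2 = weighted_mean([62, 83, 97, 90, 82], [1, 1, 1, 1, 2])

print(round(grade_1, 1), round(grade_2, 1))
```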

2. TRIMMED MEAN
A trimmed mean is calculated by discarding a certain percentage of the lowest and the highest scores and then computing the mean of the remaining scores. For example, a mean trimmed 50% is computed by discarding the lower and higher 25% of the scores and taking the mean of the remaining scores. The median is the mean trimmed 100% and the arithmetic mean is the mean trimmed 0%. Trimmed means are often used in Olympic scoring to minimize the effects of extreme ratings possibly caused by biased judges.
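A minimal Python sketch of a trimmed mean follows. The function name `trimmed_mean`, the rule of rounding the number of discarded scores down, and the eight scores are all hypothetical choices made for illustration; they do not come from the notes.

```python
def trimmed_mean(values, total_trim):
    """Mean after discarding total_trim / 2 of the scores from each end.

    total_trim: fraction in [0, 1), e.g. 0.5 for a mean trimmed 50%.
    """
    if not 0 <= total_trim < 1:
        raise ValueError("total_trim must be in [0, 1)")
    ordered = sorted(values)
    k = int(len(ordered) * total_trim / 2)       # scores dropped per end, rounded down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

scores = [1, 4, 5, 5, 6, 6, 7, 20]               # hypothetical ratings with two extremes
trimmed_50 = trimmed_mean(scores, 0.5)           # drops 1, 4 and 7, 20
untrimmed = trimmed_mean(scores, 0.0)            # same as the arithmetic mean
print(trimmed_50, untrimmed)
```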

3. THE GEOMETRIC MEAN
This is the nth root of the product of n items or values.

$$GM = \sqrt[n]{x_1 \times x_2 \times x_3 \times \dots \times x_n}$$

where $x_1, x_2, x_3$, etc. refer to the various items of the series.

To simplify calculations when the number of items is three or more, logarithms are used:

$$\log GM = \frac{\log x_1 + \log x_2 + \log x_3 + \dots + \log x_n}{n} = \frac{\sum \log x}{n}$$

$$GM = \operatorname{antilog}\left(\frac{\sum \log x}{n}\right)$$
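The logarithm route to the geometric mean can be sketched in Python with the standard `math` module. The function name `geometric_mean` is illustrative; base-10 logarithms are used, so the antilog is a power of 10. The series 2, 4, 8 is used as the check.

```python
import math

def geometric_mean(values):
    """Geometric mean computed as antilog(sum(log x) / n)."""
    if any(x <= 0 for x in values):
        raise ValueError("all values must be positive")
    log_sum = sum(math.log10(x) for x in values)
    return 10 ** (log_sum / len(values))         # antilog of the mean logarithm

gm = geometric_mean([2, 4, 8])
print(round(gm, 6))
```

Because the calculation goes through floating-point logarithms, the result is rounded for display.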

Properties of the Geometric mean
1. The product of the values of the series remains unchanged when the value of the GM is substituted for each individual value. e.g. the GM for the series 2, 4, 8 is 4:

$$2 \times 4 \times 8 = 64 = 4 \times 4 \times 4$$

2. The sum of the deviations of the logarithms of the original observations above or below the logarithm of the GM is equal, i.e. in the series above:

$$(\log 2 - \log 4) + (\log 4 - \log 4) + (\log 8 - \log 4) = -\log 2 + 0 + \log 2 = 0$$

The value of the GM is such as to balance the ratio of the deviations of the observations from it:

$$\frac{4}{2} \times \frac{4}{4} = \frac{8}{4} = 2$$

Because of this property the GM is useful in finding the average of percentages, ratios, indexes or growth rates.

Uses of GM
1. Finding the average percent increase in sales, production, population or other economic or business series.
2. Construction of index numbers.
3. It is the most suitable average when large weights have to be given to small items and small weights to large items.

Merits of GM
1. It is based on all items of the series.
2. It is rigidly defined.
3. It is useful in averaging ratios and percentages and in determining rates of increase and decrease.
4. It gives less weight to large items and more to small ones than does the arithmetic mean.
5. It is capable of algebraic manipulation. A combined GM of two or more series can be obtained using the formula

$$GM_{12} = \operatorname{antilog}\left(\frac{N_1 \log GM_1 + N_2 \log GM_2}{N_1 + N_2}\right)$$

Limitations of GM
1. It is difficult to understand.
2. It is difficult to compute and interpret.
3. It cannot be computed when there are both negative and positive values in a series or when one or more of the values is zero. Because of this it has a restricted application.

4. HARMONIC MEAN
It is the reciprocal of the arithmetic mean of the reciprocals of the individual observations.

$$HM = \frac{N}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \dots + \dfrac{1}{x_N}}$$

For discrete series, with $N = \sum f$:

$$HM = \frac{N}{\sum \left(f \times \dfrac{1}{x}\right)}$$

For continuous series, with $m$ the midpoint of each class:

$$HM = \frac{N}{\sum \left(f \times \dfrac{1}{m}\right)}$$

Uses
It is useful in situations where the average of rates is required, e.g. finding average speed. The weighted harmonic mean is used if the items do not have the same weight, e.g. when calculating average speed where the distances covered at the different speeds differ, e.g. if a car covers the first 30km at a speed of 40km/h, the next 45 at 65 km/h and the las...
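A minimal Python sketch of the harmonic mean, with optional weights, is given here. The function name `harmonic_mean` and both sets of figures (a trip in two legs at 40 km/h and 60 km/h) are hypothetical and chosen only to illustrate the calculation. For average speed the natural weights are the distances covered at each speed.

```python
def harmonic_mean(values, weights=None):
    """Harmonic mean: sum(w) / sum(w / x). With no weights, every w is 1."""
    if weights is None:
        weights = [1] * len(values)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(x <= 0 for x in values):
        raise ValueError("all values must be positive")
    return sum(weights) / sum(w / x for x, w in zip(values, weights))

# Hypothetical trip: equal distances at 40 km/h and 60 km/h.
equal_legs = harmonic_mean([40, 60])

# Hypothetical trip: 20 km at 40 km/h, then 60 km at 60 km/h (weights are distances).
unequal_legs = harmonic_mean([40, 60], weights=[20, 60])

print(round(equal_legs, 2), round(unequal_legs, 2))
```

In the second case the result equals total distance divided by total time: 80 km in 1.5 hours.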

