2.1 - Measures of Central Tendency and Skewness STAT 500 - Applied Statistics PDF

Title 2.1 - Measures of Central Tendency and Skewness STAT 500 - Applied Statistics
Course Applied Statistics
Institution The Pennsylvania State University
Pages 6

STAT 500 Applied Statistics

2.1 - Measures of Central Tendency and Skewness

Unit Summary
Measures of Central Tendency
    Mean
    Median
    Mode
    Trimmed Mean
Skewness
Adding and Multiplying Constants

Reading Assignment An Introduction to Statistical Methods and Data Analysis, (See your course schedule.)

Measures of Central Tendency

Three of the many ways to measure central tendency are:

1. Mean

the average of the data

2. Median

the middle value of the ordered data

3. Mode

the value that occurs most often in the data

Let's take a closer look at what this diagram implies with Dr. Wiesner.

Descriptive measures of a population are parameters. Descriptive measures of a sample are statistics. For example, a sample mean is a statistic and a population mean is a parameter. The sample mean is usually denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \dots + y_n}{n} = \frac{\sum_{i=1}^{n} y_i}{n}$$

What if we used $x_i$ for our measurements instead of $y_i$? Is this a problem? No. The formula would simply look like this:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

The formulas are exactly the same. The letters that you select to denote the measurements are arbitrary. For instance, many textbooks use x instead of y to denote the measurements. The point is to understand how the calculation expressed in the formula works. In this case, the formula calculates the mean by summing all of the observations and dividing by the number of observations.

There is some notation that you will come to see as standard; for example, n will always equal the sample size. We will make a point of letting you know what these are. However, when it comes to the variables, these can (and do) vary. For example, in one study x may be used to denote weight and y may be used to denote height (or the reverse may be used!), but n will always be used to denote sample size in each case.
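As a small illustration of the formula above, the sketch below sums the observations and divides by n. The function name `sample_mean` is chosen here for illustration, and the data set 1, 1, 2, 3, 13 is the one used later in this lesson. The same function is applied to a list named `y` and a list named `x` to show that the choice of letter does not matter.

```python
def sample_mean(values):
    """Sum the observations and divide by the number of observations (n)."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n


# The same measurements, stored under two different names.
y = [1, 1, 2, 3, 13]
x = list(y)

mean_y = sample_mean(y)
mean_x = sample_mean(x)

print(mean_y)  # 4.0
print(mean_x)  # 4.0
```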

Note that for the data set: 1, 1, 2, 3, 13
mean = 4, median = 2, mode = 1

Steps to finding the median for a set of data:

1. Arrange the data in increasing order.
2. Find the location of the median in the ordered data by (n + 1) / 2.
3. The value that represents the location found in Step 2 is the median.

NOTE: If the sample size is an odd number, then the location point will produce a median that is an observed value, as in the example above. If the sample size is an even number, then the location will require one to take the mean of two numbers to calculate the median. The result may or may not be an observed value, as the example below illustrates.

Mean, median and mode are usually not equal. When the data is symmetric, the mean is equal to the median.

4. Trimmed Mean

One shortcoming of the mean is that means are easily affected by extreme values.
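The three median steps listed earlier, and the way a single extreme value pulls the mean, can be sketched roughly as follows. The function names are chosen here for illustration, and the four-value data set is made up to show the even-n case. The five-value data set is the one from the lesson.

```python
from collections import Counter


def median_by_location(values):
    """Find the median using the (n + 1) / 2 location rule."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # Step 1: arrange in increasing order
    n = len(ordered)
    location = (n + 1) / 2            # Step 2: 1-based location of the median
    if location.is_integer():         # odd n: the location is an observed value
        return ordered[int(location) - 1]
    # Even n: the location falls between two ordered values, so average them.
    lower = ordered[int(location) - 1]
    upper = ordered[int(location)]
    return (lower + upper) / 2        # Step 3


def most_common_value(values):
    """Return the value that occurs most often (the mode)."""
    return Counter(values).most_common(1)[0][0]


data = [1, 1, 2, 3, 13]
data_mean = sum(data) / len(data)
data_median = median_by_location(data)
data_mode = most_common_value(data)
print(data_mean, data_median, data_mode)  # 4.0 2 1

# Made-up even-sized data set: the median need not be an observed value.
even_median = median_by_location([1, 1, 2, 3])
print(even_median)  # 1.5
```

Here the value 13 pulls the mean up to 4, while the median stays at 2.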

On the other hand, let us see the effect of the mistake on the median value.

The original data set in increasing order is:
69, 76, 76, 78, 80, 82, 86, 88, 91, 95

With n = 10, the median position is found by (10 + 1) / 2 = 5.5. Thus, the median is the average of the fifth (80) and sixth (82) ordered values, and the median = 81.

The data set (with 91 coded as 9) in increasing order is:
9, 69, 76, 76, 78, 80, 82, 86, 88, 95
where the median = 79.

The medians of the two sets are not that different. Therefore the median is not that affected by the extreme value 9. Measures that are not that affected by extreme values are called resistant.

A variation of the mean is the trimmed mean. A 10% trimmed mean drops the highest 10% and the lowest 10% of the values and averages the remaining ones. Let's calculate the trimmed mean for the data we were looking at above:

(69), 76, 76, 78, 80, 82, 86, 88, 91, (95)
The 10% trimmed mean = 82.13

(9), 69, 76, 76, 78, 80, 82, 86, 88, (95)
The 10% trimmed mean = 79.38

The 10% trimmed means of the two sets are not that different. The trimmed mean is not as affected by the extreme value 9 as the mean.

After reading this lesson you should know that there are quite a few options when one wants to measure central tendency, for example, mean, median, mode and trimmed mean. In future lessons, we talk about the mean. However, we need to be aware of one of its shortcomings, which is that it is easily affected by extreme values. One remedy is to use the trimmed mean to estimate the central tendency. Remember, this is very different from saying that one can trim data. Unless data points are known mistakes, one should not remove them from the data set! One should keep the extreme points and use more resistant measures. For example, use the sample median to estimate the population median. Or, use the sample trimmed mean to estimate the population trimmed mean. Again, this is very different from saying that it is OK to trim the data set.
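The comparison above can be checked with a short sketch. The helper `trimmed_mean` and its `percent` parameter are names chosen here for illustration, not a library API. It drops the stated percentage of ordered values from each end and averages the rest, using the two versions of the aptitude scores from the lesson.

```python
from statistics import mean, median


def trimmed_mean(values, percent=10):
    """Drop the lowest and highest `percent` of the ordered values, then average."""
    ordered = sorted(values)
    n = len(ordered)
    k = n * percent // 100            # how many values to drop from each end
    kept = ordered[k:n - k]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)


original = [69, 76, 76, 78, 80, 82, 86, 88, 91, 95]
miscoded = [9, 69, 76, 76, 78, 80, 82, 86, 88, 95]   # 91 coded as 9

for label, data in (("original", original), ("miscoded", miscoded)):
    print(
        label,
        "mean =", round(mean(data), 2),
        "median =", median(data),
        "10% trimmed mean =", round(trimmed_mean(data), 2),
    )
```

The mean moves from 82.1 to 73.9 when 91 is miscoded as 9, while the median (81 to 79) and the 10% trimmed mean (82.13 to 79.38) move much less.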

Skewness

The above distribution is symmetric.

2. Skewed Left

Mean to the left of the median, long tail on the left.

The above distribution is skewed to the left.

3. Skewed Right

Mean to the right of the median, long tail on the right.

Salary distributions are almost always positively skewed, with a few people who make the most money. To illustrate this, consider your favorite sports team or even the company for which you work. There will be one or two players or personnel that earn the “big bucks”, followed by others who earn less. This will produce a shape that is skewed to the right.

Knowing this can be a useful aid in negotiating a higher salary. When one interviews for a position and the discussion gets around to compensation, it is common that the interviewer states an offer that is “typical for someone in your position”. That is, they are offering you the average salary for someone with your particular skill set (e.g. little experience). But is this average the mode, median, or mean? The company (for whom business is business!) will want to pay you the least they can, while you would prefer to earn the most you can. Since salaries tend to be skewed to the right, the offer will most likely reflect the mode or median. You simply need to ask to which “average” the offer refers and what the mean is, since the mean would be the highest of the three values. Once you have these averages, you can begin to negotiate toward the highest number.
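To make the ordering of the three “averages” concrete, here is a minimal sketch with an invented set of salaries (in thousands). The figures are hypothetical and chosen only so that one high earner creates a long right tail.

```python
from statistics import mean, median, mode

# Hypothetical salaries in thousands: most people earn 40-60, one earns 150.
salaries = [40, 40, 40, 45, 50, 55, 60, 150]

salary_mode = mode(salaries)
salary_median = median(salaries)
salary_mean = mean(salaries)

print("mode   =", salary_mode)     # 40
print("median =", salary_median)   # 47.5
print("mean   =", salary_mean)     # 60

# In this right-skewed example the mean lies to the right of the median.
print(salary_mode < salary_median < salary_mean)  # True
```

In this made-up example an offer quoted as the “average” could mean 40, 47.5 or 60, depending on which measure is used.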

Adding and Multiplying Constants

What happens to the mean and median if we add or multiply each observation in a data set by a constant? Consider, for example, an instructor who curves an exam by adding five points to each student’s score. What effect does this have on the mean and the median? The result of adding a constant to each value has the effect of altering the mean and median by the constant. For example, in the above example where we have 10 aptitude scores, if 5 was added to each score, the mean of this new data set would be 87.1 (the original mean of 82.1 plus 5) and the new median would be 86 (the original median of 81 plus 5).

Similarly, if each observed data value was multiplied by a constant, the new mean and median would change by a factor of this constant. Returning to the 10 aptitude scores, if all of the original scores were doubled, the new mean and new median would be double the original mean and median. As we will learn, the effect is not the same on the variance!

Why would you want to know this? One reason, especially for those moving onward to more applied statistics (e.g. Regression, ANOVA), is transforming data. For many applied statistical methods, a required assumption is that the data is normal, or very near bell-shaped. When the data is not normal, statisticians will transform the data using numerous techniques, e.g. logarithmic transformation. But the log cannot be taken of every number; for instance, the log of 0 is undefined. However, if we add a constant to all the data values, making them all greater than zero, then a log can be taken without risk. We just need to remember the original data was...
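The effect of adding or multiplying by a constant can be verified on the 10 aptitude scores with a short sketch. The variable names are chosen here for illustration. The last few lines use a small made-up data set containing a zero, with an assumed shift of 1, to show the log transformation idea.

```python
import math
from statistics import mean, median

scores = [69, 76, 76, 78, 80, 82, 86, 88, 91, 95]
curved = [s + 5 for s in scores]      # add a constant to every score
doubled = [2 * s for s in scores]     # multiply every score by a constant

print(mean(scores), median(scores))    # 82.1 81.0
print(mean(curved), median(curved))    # 87.1 86.0
print(mean(doubled), median(doubled))  # 164.2 162.0

# Hypothetical data containing a zero: math.log(0) would raise ValueError,
# so a constant is added first to make every value greater than zero.
raw = [0, 3, 8]
shift = 1                              # assumed shift, chosen for illustration
logged = [math.log(v + shift) for v in raw]
print(logged[0])                       # 0.0, since log(0 + 1) = 0
```

Any result computed on `logged` refers to the shifted values, so the shift has to be kept in mind when interpreting it.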

