Introduction to Statistics for Psychology

Descriptive Statistics: Used to summarize and make understandable – to describe – a group of numbers from a research study, e.g. the mean, median, and mode.

Inferential Statistics: Used to draw conclusions and make inferences based on the numbers from a research study (e.g. the mean, median, and mode). However, inferential statistics goes beyond these numbers to generalize from the sample to a larger population.

Basic Concepts:

▪ Variable: Any measurable conditions, events, characteristics, or behaviors that are controlled or observed in a study.

▪ Independent variable (IV): The variable that is manipulated.

▪ Dependent variable (DV): The measured variable.

▪ Discrete variable: A variable that has specific values and cannot have values in between, e.g. a person is classified as either male or female; there is no in-between value.

▪ Continuous variable: A variable for which, in theory, there are an infinite number of values between any two values, e.g. between 1 and 2 there exist 1.1, 1.2, 1.3, …, 1.8, 1.9, and so on.

▪ Value: A number or category, e.g. female, Catholic, 123.

▪ Score: A particular person’s value on a variable.

▪ Population: The larger collection of animals or people from which a sample is drawn and that researchers want to generalize about.

▪ Sample: Scores of a particular group of people studied; usually considered to be representative of the scores in some larger population.

▪ Random Sample: In statistical terms, a random sample is a set of items drawn from a population in such a way that, each time an item was selected, every item in the population had an equal opportunity to appear in the sample. In practical terms, it is not so easy to draw a random sample: the only factor operating when a given item is selected must be chance (see the sketch after this list).

▪ Random Assignment: The constitution of groups in a study such that all subjects have an equal chance of being assigned to any group or condition.

▪ Validity: A research procedure, or an interpretation of results obtained from a research study, is considered valid if it can be justified on reasoned grounds.

▪ External validity: The extent to which the conclusions of an empirical investigation remain true when different research methods and research participants/subjects are used.

▪ Internal validity: The extent to which the conclusions of an empirical investigation are true within the limits of the research methods and subjects/participants used.

▪ Quantitative data: Observations measured on a numerical scale; in other words, data that represent the quantity or amount of something.

▪ Categorical/qualitative data: Observations that fall into categories rather than quantities, e.g. a categorical variable such as gender (male/female).
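Both random sampling and random assignment reduce to letting chance alone decide, as noted above. Here is a minimal Python sketch of the two ideas; the participant IDs, sample size, and group labels are hypothetical, invented purely for illustration:

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

# Hypothetical population of 20 participant IDs.
population = [f"P{i:02d}" for i in range(1, 21)]

# Random sample: every member has an equal chance of selection.
sample = random.sample(population, k=8)

# Random assignment: shuffle the sampled participants, then split them
# into two conditions so each person has an equal chance of landing in
# either group.
random.shuffle(sample)
treatment, control = sample[:4], sample[4:]

print("Sample:   ", sample)
print("Treatment:", treatment)
print("Control:  ", control)
```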

Plotting Data:

- Frequency distribution: A table/graph showing classes into which the data have been grouped, together with their corresponding frequencies, that is, the number of scores falling into each class.

- Histogram: In descriptive statistics, a graph in which the values or scores of a quantitative variable are plotted, usually on the horizontal (x) axis, and their frequencies are represented by the heights of the bars on the vertical (y) axis. N.B. see p. 11 of text.

- Bar chart: In descriptive statistics, a graph resembling a histogram but whose base (usually the horizontal (x)) axis represents values of a qualitative variable, e.g. gender or religion.

- Pie chart: A circular graph divided into segments resembling slices of pie, proportional to the sizes of the various categories represented (see the sketch below).
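A quick sketch of these plot types using matplotlib; the exam scores, religion categories, and counts below are made up for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical data invented for illustration.
exam_scores = [52, 61, 64, 67, 70, 71, 73, 75, 78, 80, 83, 88, 91]  # quantitative
religions = ["Catholic", "Protestant", "Other", "None"]             # qualitative
counts = [34, 28, 12, 26]                                           # frequencies

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Histogram: quantitative values on x, frequency as bar height on y.
axes[0].hist(exam_scores, bins=5)
axes[0].set_title("Histogram")

# Bar chart: like a histogram, but the x axis holds categories.
axes[1].bar(religions, counts)
axes[1].set_title("Bar chart")

# Pie chart: segment sizes proportional to category frequencies.
axes[2].pie(counts, labels=religions)
axes[2].set_title("Pie chart")

plt.tight_layout()
plt.show()
```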

Describing Distributions:

1. Normal: In many natural processes, random variation conforms to a particular probability distribution known as the normal distribution, which is the most commonly observed probability distribution. The shape of a normal distribution resembles that of a bell, so it is sometimes referred to as a bell curve. Bell curves have certain characteristics: they are symmetric, unimodal, and extend to +/− infinity. The normal distribution can be described by two parameters: the mean and the standard deviation. If the mean and standard deviation are known, one essentially knows as much as if one had access to every point in the data set.

2. Bimodal: In statistics, a bimodal distribution is a distribution with two different peaks, that is, two distinct values that measurements tend to center around. Unlike other distributions such as the normal distribution, there is no precise definition of a bimodal distribution. A good example is the height of a person: the heights of males form a roughly normal distribution, as do those of females, but when added together we obtain a bimodal distribution with values clustering around both averages.

[Figure 1: A simple bimodal distribution, in this case the sum of two normal distributions.]

Typically, observing a bimodal distribution indicates, as in this example, that the distribution is in fact the sum of two different distributions, each with a single notable peak. It can be difficult, however, to find the differentiating factor between those samples in one distribution and those in the other. Bimodal distributions are a commonly used example of how deceptive summary statistics such as the mean, median, and standard deviation can be when used on an arbitrary distribution. For example, in the distribution in Figure 1, the mean and median would be about zero, but most values are not concentrated near zero. The standard deviation is also very large, even though the deviation of each underlying normal distribution is relatively small (see the sketch after this list).

3. Positively/negatively skewed (skewness): If a distribution is asymmetric, it is either positively or negatively skewed. A distribution is said to be positively skewed if the scores tend to cluster toward the lower end of the scale (the smaller numbers), with increasingly fewer scores at the upper end of the scale (the larger numbers).

A negatively skewed distribution is exactly the opposite: most of the scores occur toward the upper end of the scale, while increasingly fewer scores occur toward the lower end.

4. Kurtosis: Another descriptive statistic that can be derived to describe a distribution is called kurtosis. It refers to the relative concentration of scores in the center, the upper and lower ends (tails), and the shoulders of a distribution (see Howell, p. 29). In general, kurtosis is not very important for an understanding of statistics, and we will not be using it again. However, it is worth knowing the main terms: a distribution is platykurtic if it is flatter than the corresponding normal curve and leptokurtic if it is more peaked than the normal curve.
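The deceptive-summary-statistics point from the bimodal item above can be checked numerically. This is a minimal sketch, assuming two made-up normal distributions with means of −5 and +5; pooled together, their mean and median sit near 0 even though almost no scores lie there:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical normal distributions with well-separated means,
# loosely analogous to two subgroups such as female/male heights.
group_a = rng.normal(loc=-5.0, scale=1.0, size=10_000)
group_b = rng.normal(loc=5.0, scale=1.0, size=10_000)
combined = np.concatenate([group_a, group_b])  # bimodal when pooled

# Summary statistics of the pooled data.
print("mean:  ", round(combined.mean(), 2))       # near 0, yet few scores lie near 0
print("median:", round(np.median(combined), 2))   # also near 0
print("std:   ", round(combined.std(ddof=1), 2))  # ~5.1, far larger than either group's SD of 1
```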

Measures of Central Tendency: Measures of central tendency, or "location", attempt to quantify what we mean by the "typical" or "average" score in a data set. The concept is extremely important, and we encounter it frequently in daily life. Statistics geared toward measuring central tendency all focus on this concept of "typical" or "average". As we will see, questions in psychological science often revolve around how groups differ from each other "on average"; answers to such questions tell us a lot about the phenomenon or process we are studying.

Mode: By far the simplest, but also the least widely used, measure of central tendency is the mode. The mode in a distribution of data is simply the score that occurs most frequently. Recall that one way of describing a distribution is in terms of the number of modes in the data. A unimodal distribution has one mode; a bimodal distribution has two. We would accept that a distribution is bimodal if more than one score or value "stands out" as occurring especially frequently in comparison to other values. But when the data are quantitative in nature, we also want to make sure that the two more frequently occurring scores are not too close to each other in value before we accept the distribution as bimodal. So there is some subjectivity in the decision as to whether a distribution is best characterized as unimodal, bimodal, or multimodal.

Median: Technically, the median of a distribution is the value that cuts the distribution exactly in half, such that as many scores are larger than that value as are smaller than it. The median is by definition the 50th percentile. This is an ideal definition: often distributions cannot be cut exactly in half in this way, but we can still define the median. Distributions of qualitative data do not have a median. The median is most easily computed by sorting the data set from smallest to largest; the median is then the "middle" score in the distribution.

Mean: The mean, or "average", is the most widely used measure of central tendency. The mean is defined technically as the sum of all the data scores divided by n (the number of scores in the distribution). In a sample, we often symbolize the mean with a letter with a line over it; if the letter is "X", the mean is symbolized as X̄, pronounced "X-bar". If we use the letter X to represent the variable being measured, then symbolically the mean is defined as

X̄ = ΣX / n
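All three measures can be computed with Python's standard statistics module. A small sketch with an invented data set:

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 6, 7]  # hypothetical data set, n = 9

print("mode:  ", statistics.mode(scores))    # 5, the most frequent score
print("median:", statistics.median(scores))  # 5, the middle score when sorted
print("mean:  ", statistics.mean(scores))    # sum / n = 40 / 9 ≈ 4.44
```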

Choosing a measure of central tendency: With three seemingly sensible measures of central tendency, how do you know which one to use? Not surprisingly, the answer depends a lot on the data you have and what you are trying to communicate. While the mean is the most frequently used measure of central tendency, it suffers from one major drawback: unlike the other measures of central tendency, the mean can be influenced profoundly by one extreme data point (referred to as an "outlier"). The median and mode tend not to be affected by outliers. There are certainly occasions where the mode or median might be more appropriate; it depends on what you are trying to communicate.
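The outlier effect described above is easy to demonstrate: adding one extreme score to an invented data set shifts the mean substantially while barely moving the median:

```python
import statistics

scores = [4, 5, 5, 6, 7]
with_outlier = scores + [60]  # one extreme data point

print(statistics.mean(scores), statistics.median(scores))              # 5.4, 5
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 14.5, 5.5
```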

Measures of Variability: The average score in a distribution is important in many research contexts. So too is another set of statistics that quantify how variable (or "how dispersed") the scores tend to be. Do the scores vary a lot, or do they tend to be very similar or near each other in value? Sometimes variability in scores is the central issue in a research question.

Range: = highest score − lowest score.

Interquartile Range (IQR): A measure of the degree of dispersion, variability, or scatter in a group of scores, defined as the difference between the 25th and 75th percentiles.

Trimmed Statistics: Produced when a certain percentage of scores (e.g. 10%) is removed from both the lower end and the higher end of the distribution. The new set of scores is called a trimmed sample, and statistics calculated on such samples are called trimmed statistics (e.g. the trimmed mean or trimmed range).

Deviation: How scores are dispersed around the mean. To quantify deviations, use the mean absolute deviation, Σ|X − X̄| / n; this shows how much scores deviate from the mean without regard to whether they fall above or below it.
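A sketch of these dispersion measures with numpy and scipy, using an invented data set; scipy's trim_mean implements the trimming described above:

```python
import numpy as np
from scipy import stats

scores = np.array([2, 4, 5, 5, 6, 7, 8, 9, 12, 40])  # hypothetical, with one extreme score

# Range: highest score minus lowest score.
print("range:", scores.max() - scores.min())

# IQR: difference between the 75th and 25th percentiles.
q25, q75 = np.percentile(scores, [25, 75])
print("IQR:", q75 - q25)

# Trimmed mean: drop 10% of scores from each end (here, 2 and 40), then average.
print("trimmed mean:", stats.trim_mean(scores, proportiontocut=0.10))  # 7.0

# Mean absolute deviation: average distance from the mean, sign ignored.
mad = np.mean(np.abs(scores - scores.mean()))
print("mean absolute deviation:", round(mad, 2))
```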

Variance: A measure of the degree of dispersion, variability, or scatter of a set of scores. Sample variance is defined as (definitional formula):

s² = Σ(X − X̄)² / (n − 1)

Standard Deviation: The standard deviation is defined as the average amount by which scores in a distribution differ from the mean, ignoring the sign of the difference. Sometimes the standard deviation is described as the average distance between any score in a distribution and the mean of the distribution, i.e.

s = √( Σ(X − X̄)² / (n − 1) )

The above formula is the definition of a sample standard deviation. To calculate the standard deviation of a population, N is used in the denominator instead of n − 1, i.e.

σ = √( Σ(X − μ)² / N )
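Python's statistics module mirrors the sample/population distinction directly: variance and stdev divide by n − 1, while pvariance and pstdev divide by N. A sketch with an invented data set whose deviations are easy to check by hand:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; mean = 5, Σ(X − X̄)² = 32

# Sample formulas: divide the sum of squared deviations by n − 1.
print("sample variance:", statistics.variance(scores))  # 32 / 7 ≈ 4.57
print("sample SD:      ", statistics.stdev(scores))

# Population formulas: divide by N instead.
print("population variance:", statistics.pvariance(scores))  # 32 / 8 = 4.0
print("population SD:      ", statistics.pstdev(scores))     # 2.0
```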

The Coefficient of Variation: To compare the variation (dispersion) of two different series, a relative measure of the standard deviation must be calculated. This is known as the coefficient of variation (CV). It is given as a percentage and is used to compare the consistency or variability of two or more series. The higher the CV, the higher the variability; the lower the CV, the greater the consistency of the data. The coefficient of variation of a data set is the ratio of its standard deviation to its mean:

CV = (s / X̄) × 100%
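A minimal sketch comparing the consistency of two invented series via the coefficient of variation:

```python
import statistics

series_a = [48, 50, 52, 50, 50]  # hypothetical, tightly clustered (mean 50)
series_b = [20, 80, 50, 35, 65]  # hypothetical, widely spread (mean 50)

for name, series in [("A", series_a), ("B", series_b)]:
    cv = statistics.stdev(series) / statistics.mean(series) * 100
    print(f"series {name}: CV = {cv:.1f}%")  # higher CV = more variability
```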

The Normal Distribution: Normal distributions are a family of distributions that have the same general shape. They are symmetric, with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped. The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. Normal distributions can be transformed to standard normal distributions by the formula:

z = (X − μ) / σ

Z-scores: Another useful transformation in statistics is standardisation. Sometimes called "converting to Z-scores" or "taking Z-scores", it transforms the original distribution into one in which the mean becomes 0 and the standard deviation becomes 1. A Z-score expresses the original score in terms of the number of standard deviations that score is from the mean of the distribution. The formula for converting an original or "raw" score to a Z-score is:

z = (X − X̄) / s
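Standardising a set of scores takes only a few lines; afterwards the transformed scores have a mean of 0 and an SD of 1. The raw scores here are invented:

```python
import numpy as np

scores = np.array([55.0, 60.0, 65.0, 70.0, 75.0, 80.0, 85.0])  # hypothetical raw scores

# z = (X − mean) / SD, applied to every score at once.
z_scores = (scores - scores.mean()) / scores.std(ddof=1)

print(np.round(z_scores, 2))
print("mean:", round(z_scores.mean(), 2), "SD:", round(z_scores.std(ddof=1), 2))  # 0.0 and 1.0
```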

Hypothesis Testing: Setting up and testing hypotheses is an essential part of statistical inference. In order to formulate such a test, usually some theory has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but it has not been proved. Null Hypothesis: The null hypothesis is used because we often cannot prove something to be true, but we can prove something to be false. A null hypothesis is a hypothesis set up to be nullified or refuted in order to support an alternative hypothesis....
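As a concrete, simplified illustration of a null hypothesis, the sketch below uses scipy's one-sample t-test; the sample data, the hypothesized mean of 100, and the 0.05 cutoff are all invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical sample of 30 test scores; the true population mean is 104.
sample = rng.normal(loc=104, scale=10, size=30)

# H0: population mean = 100;  H1: population mean != 100.
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:  # conventional alpha level
    print("Reject H0: the data are unlikely if the true mean were 100.")
else:
    print("Fail to reject H0.")
```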

