Psych 10B Lecture Notes
Course: Statistical Methods in Psychology
Institution: University of California, Santa Barbara
I took this class with Professor Daniel Conroy-Beam.



Description

Statistics
● 1. Facts and figures (ex: average annual snowfall in Denver)
  ○ Informative and time-saving
● 2. Reference to a general field of mathematics (ex: using this book for a statistics course)
  ○ Organizes and summarizes information to communicate results to others
  ○ Helps researchers answer the questions that initiated the research
  ○ Ensures that information/observations are presented and interpreted in an accurate and informative way
  ○ Provides a set of standardized techniques

Why have statistics?
● Form follows function
● Unaided reflection on data will never tell you anything (if you are given a bunch of raw data, you won't know what it represents or what to conclude)
● Ex: Are men taller than women? → How can you say men are taller than women if everyone is a different height? → How do you know sex/gender is causing the difference and not something else? → All data varies
  ○ Different people have different heights because of nutrition, genetic factors, exposure to different hormones, etc.
  ○ We care only about the effect of sex/gender on height, even though many other variables can affect someone's height
  ○ Sex/gender = signal; other variables = noise
    ■ Signal: the source of variability we are interested in at the moment (the process we are trying to see in our data)
    ■ Noise: everything else that causes variability in our data
  ○ Statistics is a set of mathematical techniques for separating signal from noise (which components come from noise, and do any components come from signal?)
    ■ *Statistics reduces the noise, but it doesn't get rid of it

How do we do statistics?
● 1. Identify your variables (a variable is anything that differs across people/observations)
  ○ Outcome/dependent variable: the variable we are trying to explain; the thing you believe is being affected
  ○ Predictor/independent variable: the variables we are using to explain; the thing that is doing the affecting
● 2. Measure your variables (measurement is a system for assigning numerical values to observations)
  ○ Ex: you can look around the room, get a rough estimate of how tall everyone is, and estimate who is taller than whom, but numerical values are more accurate
  ○ Picking a measurement system
    ■ Discrete: observations come in separate categories; no values can exist between categories (things are black and white)
    ■ Continuous: observations are divisible into an infinite number of fractional parts; unlimited possible values between categories (shades of grey)





  ○ Picking a scale of measurement
    ■ Nominal: a set of discrete, labeled categories that are unordered; they just represent differences in kind
      ● Ex: gender, race
    ■ Ordinal: a set of discrete, labeled categories that are ordered, representing differences in rank; can be ranked from low to high
      ● Ex: shirt size (S, M, L), competition ranks (1st, 2nd, 3rd)
    ■ Interval: ordered like an ordinal scale but continuous (can have intermediate values)
      ● Differences between scores are equal (the difference between 1 and 2 is the same as between 2 and 3)
      ● No true zero point
      ● *100 degrees is hotter than 50 degrees, but 100 degrees isn't twice as hot as 50 degrees
    ■ Ratio: numerical, ordered, equal-size intervals, but with a true zero point (a score of zero means you have none)
    ■ The same variable can be measured using different scales of measurement
● Data: measurements or observations
  ○ Datum/raw score/score: a single measurement or observation
  ○ Data set: a collection of measurements or observations
● Parameter: a value, usually numerical, that describes a population (ex: the average score of the population)
● Statistic: a value, usually numerical, that describes a sample

Descriptive statistics: statistical procedures used to summarize, organize, and simplify data → used to visualize your data

Inferential statistics: methods that use sample data to make general statements about a population
● Problem: a sample provides only limited information about the population → sampling error

Proportion/relative frequency: the fraction of the total group associated with each score (p = f/N)

Grouped Frequency Distribution Table: presents groups of scores rather than individual values (the groups/intervals are called class intervals)
● Should have about 10 class intervals
  ○ More than 10 becomes too cumbersome; fewer than 10 loses information about the distribution of scores
● The width of each interval should be a relatively simple number
● The bottom score in each class interval should be a multiple of the width
  ○ If you are using a width of 10 points, the intervals should start at 10, 20, 30, 40...
● All intervals should be the same width
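The proportion formula p = f/N and the class-interval rules above can be sketched in plain Python; the scores below are made up for illustration:

```python
from collections import Counter

def relative_frequencies(scores):
    """Proportion for each score: p = f / N."""
    n = len(scores)
    return {score: count / n for score, count in Counter(scores).items()}

def grouped_frequency_table(scores, width):
    """Group scores into class intervals of equal width.

    Each interval's bottom score is a multiple of the width,
    following the rules listed above."""
    bottom = (min(scores) // width) * width
    top = (max(scores) // width) * width
    table = {}
    for start in range(int(bottom), int(top) + width, width):
        # class interval runs from its bottom score up to bottom + width - 1
        table[(start, start + width - 1)] = sum(
            1 for x in scores if start <= x < start + width
        )
    return table

scores = [27, 31, 34, 38, 42, 45, 45, 51, 56, 60]
print(relative_frequencies(scores)[45])        # 2 occurrences / 10 scores = 0.2
print(grouped_frequency_table(scores, 10))
```

With a width of 10, the intervals come out as (20, 29), (30, 39), and so on, each starting at a multiple of the width.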

Rank/Percentile Rank: the percentage of individuals in the distribution with scores at or below the particular value ● Percentile is the score identified by its percentile rank Frequency Distribution Graphs: a picture of the information available in a frequency distribution table ● X = score, f = frequency ● Frequency Table: a way to summarize data in two columns ○ 1. All of the scores in your data in order of magnitude ○ 2. How frequently those scores appear in your data ○ Limitations ■ Require a lot of space ■ Still just numbers, not a lot better than just looking at the data ● X axis: abscissa, Y axis: ordinate ● 1. Histogram: a graphical version of a frequency table ○ Presents the same information of a frequency table but more compact and able to see pattern more easily ○ Bins: groups of scores ○ The height of the bar corresponds to the frequency for that category ○ For continuous variables, the width of the bar extends to the real limits of the category ○ For discrete variables, each bar extends exactly half the distance to the adjacent category on each side ○ Modified Histogram: modification consists of stacks of blocks, each block represents one individual ○ Allows you to see distribution information - How is the data shaped? ○ Normal Distribution ■ Bell Shaped - Most of the data is in the middle of the distribution, and gradually less data as you move out to the ends ■ Symmetrical - same on each side ■ Most variables in psychology are normally distributed ○ Bimodal Distribution: two clear peaks ■ Often indicates two subgroups in the data ● Ex: men and women plotted together ○ Skew - is the data lopsided? ○ Skewed Distribution: the scores tend to pile up toward one end of the scale and taper off gradually at the other end ■ Positively skewed (right skew): tail on the right side, scores piling on the left side ■ Negatively skewed (left skew): tail on the left side, scores piling on the right side



  ○ Symmetrical distribution
    ■ The mean, median, and mode would be the same value
  ○ Tail of the distribution: the section where the scores taper off toward one end of a distribution
  ○ A histogram provides a nice qualitative summary of the data, but we need a quantitative summary to do mathematical analysis → descriptive statistics
● 2. Bar graphs: same as a histogram, but spaces are left between adjacent bars
  ○ Used when scores are measured on a nominal or ordinal scale
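The percentile-rank definition and the frequency-table-to-histogram idea above can be sketched in Python; the scores and the text-based bars are illustrative only:

```python
from collections import Counter

def percentile_rank(scores, value):
    """Percentage of scores at or below the given value."""
    return 100 * sum(1 for x in scores if x <= value) / len(scores)

def text_histogram(scores):
    """Minimal text histogram: one row per score, bar length = frequency."""
    counts = Counter(scores)
    return "\n".join(f"{score:>3} | {'#' * counts[score]}"
                     for score in sorted(counts))

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
print(percentile_rank(scores, 4))  # 6 of 10 scores are <= 4 -> 60.0
print(text_histogram(scores))
```

Sorting the scores first reproduces the frequency table's "in order of magnitude" column; printing a bar per row is the histogram's compact view of the same information.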

Research Methods
● Correlational method: two different variables are observed to determine whether there is a relationship between them
  ○ Limits: demonstrates the existence of a relationship between two variables, but does not provide an explanation for the relationship → cannot demonstrate a cause-and-effect relationship
● Experimental method: demonstrates a cause-and-effect relationship between two variables. One variable is manipulated while another is observed and measured, controlling all other variables
  ○ 1. Manipulation
  ○ 2. Control

Variables that researchers must consider
● 1. Participant variables: characteristics such as age, gender, and intelligence that vary from one individual to another
  ○ These variables cannot differ from one group to another
  ○ If you are testing the effect of violent games on aggressive behavior, groups 1 and 2 should have equal numbers of males and females to avoid confounding variables
● 2. Environmental variables: characteristics of the environment such as lighting, time of day, and weather conditions
  ○ The researcher must ensure that the individuals in treatment A are tested in the same environment as the individuals in treatment B
● 3. Controlling variables: random assignment, matching to ensure equivalent groups and environments, holding variables constant




Assumption: there is a height associated with being a human woman
  ○ Other factors make some women randomly taller or shorter than that height

→ Central tendency: an average or representative score that defines the center of a distribution
  ○ Mode: the most frequent score
    ■ Limits: highly sensitive to extreme (unusual) scores
    ■ You can have more than one mode value
    ■ Useful when
      ● 1. Using nominal scales
      ● 2. Your variables are discrete
      ● 3. You want an indication of the shape of the distribution
  ○ Median: the middlemost score in the data
    ■ The 50th percentile score
    ■ Limit: only represents the middle of the data and ignores the outer edges of the data
    ■ Useful when you have
      ● 1. Extreme scores or skewed distributions
      ● 2. Undetermined values
      ● 3. Open-ended distributions (0, 1, 2, 3, 4, 5 or more)
      ● 4. Ordinal scales
    ■ For discrete variables, the median cannot be a value in between the other values
    ■ Precise median
      ● Ex: we needed 1 out of the 4 boxes in the interval, so the fraction is 1/4
      ● **Only appropriate for continuous variables
  ○ Mean: the sum of the scores divided by the number of scores
    ■ Uses all the data
    ■ The sum of all the deviations from the mean is equal to zero
    ■ A good representative of the signal because it uses all the data and is closely related to variance and standard deviation, but it does not tell us how different the scores are from one another - how much noise is in the data? → we need measures of variability
    ■ Because the mean serves as a balance point, it will always be located somewhere between the highest and lowest scores → if you calculate a value outside that range, you made an error
    ■ To change the unit of measurement, just multiply the mean by the conversion factor, just as you would when changing individual values

Variability: provides a quantitative measure of the differences between scores in a distribution, describing the degree to which the scores are spread out or clustered together
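The three central tendency measures (mode, median, mean) can be computed with Python's standard `statistics` module; the scores below are illustrative:

```python
import statistics

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]

mode = statistics.mode(scores)      # the most frequent score
median = statistics.median(scores)  # the middlemost score (50th percentile)
mean = statistics.mean(scores)      # sum of scores / number of scores

# The sum of all deviations from the mean is zero (the balance-point property):
deviation_sum = sum(x - mean for x in scores)
print(mode, median, mean)
```

With an even number of scores, `statistics.median` averages the two middle values, which is why the median here can fall between observed scores for continuous data.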







  ○ Measures how well an individual score represents the entire distribution
  ○ Low variability: existing patterns can be seen clearly
  ○ High variability: obscures any pattern that might exist
  ○ Range: the largest score minus the smallest score
    ■ Only pays attention to the tails of the data and ignores the middle (the opposite of the median) → does not give an accurate description of the variability of the entire distribution → the range is an unreliable measure of variability
  ○ Interquartile range (IQR): the scores that cut off the middle 50% of the data
    ■ Tells you how far most of the data spreads
    ■ 1. First quartile: the score that cuts off the bottom 25% of the data
    ■ 2. Third quartile: the score that cuts off the top 25% of the data
    ■ *The second quartile is the median
  ○ Standard deviation: represents the average difference between a score and the mean (deviation = X - mean)
    ■ Measures variability and listens to all of the data
    ■ The sign of a deviation tells the direction from the mean
    ■ Variance: the mean of the squared deviations; the average squared distance from the mean
    ■ → standard deviation = sqrt(variance) = sqrt(sum of (X - mean)^2 / N)
  ○ Sum of Squares (SS)
    ■ 1. Find each deviation score
    ■ 2. Square each deviation score
    ■ 3. Add them all up

● Problems with sample variability
  ○ Assumption: samples should be representative of the population from which they come → problematic because samples consistently tend to be less variable than their population → creates bias
  ○ Correct the bias by making an adjustment in the equation
    ■ Sample variance = s^2 = SS/(n-1)
    ■ Sample standard deviation = s = sqrt(SS/(n-1))
● Biased statistic: the average value of the statistic either underestimates or overestimates the corresponding population parameter
● Unbiased statistic: the average value of the statistic is equal to the population parameter
● Transformations of scale
  ○ 1. Adding a constant to each score does not change the standard deviation
  ○ 2. Multiplying each score by a constant causes the standard deviation to be multiplied by the same constant

Population: every observation relevant to your question
  ○ The ideal scenario is to collect data from the population
  ○ But the population is usually too large
  ○ Ex: "Do the people in this class like statistics?" → population = the people in this class
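The variability measures (range, SS, variance, standard deviation), the n−1 bias correction, and the transformation-of-scale rules can all be checked in Python with an illustrative data set:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = statistics.mean(scores)             # 5.0

data_range = max(scores) - min(scores)     # largest - smallest = 7
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations = 32

pop_variance = ss / n                      # sigma^2 = SS / N
pop_sd = pop_variance ** 0.5               # sigma = sqrt(SS / N) = 2.0
sample_variance = ss / (n - 1)             # s^2 = SS / (n - 1), corrects the bias
sample_sd = sample_variance ** 0.5

# The statistics module implements the same two formulas:
assert abs(pop_sd - statistics.pstdev(scores)) < 1e-12
assert abs(sample_sd - statistics.stdev(scores)) < 1e-12

# Transformations of scale:
shifted = [x + 10 for x in scores]  # adding a constant leaves the SD unchanged
scaled = [x * 3 for x in scores]    # multiplying scales the SD by that constant
assert abs(statistics.pstdev(shifted) - pop_sd) < 1e-12
assert abs(statistics.pstdev(scaled) - 3 * pop_sd) < 1e-12
```

Note that `statistics.pstdev` divides by N (population formula) while `statistics.stdev` divides by n−1 (sample formula), mirroring the distinction above.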



→ use a sample: a set of individuals selected from a population, intended to represent the population ○ For a sample to be useful ■ 1. All people in the population have a chance t o get into the sample ■ 2. The chance of getting into the sample is equal for all people ● Heights that are common in the population will be common in the sample ● Heights that are rare in the population will be rare in the sample ○ Random samples wind up being representative of the population ■ Samples look similar to the total population → we can use analysis of the sample as being representative of the population ○ Problems ■ 1. Samples sometimes require different statistical analyses than populations Parameter vs Statistics ● Parameter ○ Population mean (μ) ○ Population standard deviation (σ) ○ ***Ex: the average amount of money each shopper has ever spent. ● Statistics ○ Sample mean (M) ○ Sample standard deviation (s) ■ Unlike the mean, the formula changes ■ When calculating the standard deviation of a sample ● **We calculate SS using M rather than μ Degrees of Freedom: the number of unique pieces of information in your data ● How many data points do you need? → not all, just n-1 ● Used for solving s (sample standard deviation) Sampling Error: difference between statistics and its corresponding parameter ● There is always some degree of difference between sample and population; similar but not identical ● Your statistic will never equal your parameter because you don’t have the entire population ● Solution: ○ 1. Take a lot of samples ○ 2. Calculate the statistic (the mean) of each sample and then get the mean of all the statistics → better estimate of your parameter of interest ■ Sampling distribution: the set of statistics from all possible samples of size n ● The mean of the sampling distribution will equal the parameter ● The pile of sample means tend to form a normal distribution ● Larger sample is better than smaller sample ● Not feasible because it is labor intensive



→ Central Limit Theorem
  ○ If your sample size is large enough (n > 30), the sampling distribution of your statistic will
    ■ Be normally distributed (shape)
    ■ Have a standard deviation equal to the population standard deviation divided by the square root of the sample size (variability)
      ● Standard error: the standard deviation of a sampling distribution
        ○ Provides a measure of how much difference is expected from one sample to another
          ■ Large standard error = sample means are scattered over a wide range, with big differences from one sample to another
          ■ Small standard error = all the sample means are close together and have similar values
    ■ Have a mean equal to the parameter mean (central tendency)
  ○ Although we will never know the true population mean, we can at least know how close a given sample mean is likely to be to the truth, because if the standard deviation of the sampling distribution is small, then a random sample mean gives a good guess about the population mean
  ○ What makes the standard deviation of M small?
    ■ A larger sample (bigger n)
    ■ Lower standard deviations (smaller s)
    ■ We can make s smaller by eliminating noise in our study
    ■ We can make n larger by recruiting more participants

Law of Large Numbers: the larger the sample size, the more probable it is that the sample mean will be close to the population mean
● Sample size and standard error are inversely related
● When the sample consists of a single score (n = 1), the standard error is the same as the standard deviation

What makes a score "extreme"?
● Unusual, rare, atypical
● Can be unusually high or low
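The standard error formula from the Central Limit Theorem, SE = σ/√n, can be sketched in a few lines (the σ = 15 value is assumed for illustration):

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

sigma = 15  # illustrative population SD (e.g., IQ-style scores)
print(standard_error(sigma, 1))    # n = 1: the SE equals the SD -> 15.0
print(standard_error(sigma, 25))   # a larger n shrinks the SE -> 3.0
print(standard_error(sigma, 100))  # -> 1.5
```

The three calls show the inverse relationship between sample size and standard error, including the n = 1 special case noted above.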



● Ex: Do brain training games improve cognition? Or: do people who play brain training games have unusually high scores compared to the rest of the population?
  ○ If the mean is 75 and a specific student got an IQ score of 100, we know that he is smarter than average, but we don't know whether his score is extreme without knowing the distribution/standard deviation

→ Z-score: combines information on the mean and standard deviation, and converts a "raw" score into a z-score that reflects how extreme that score is
  ○ z = (X - μ) / σ
  ○ The sign tells whether the location is above (+) or below (-) the mean



  ○ Magnitude: the number tells the distance between the score's location and the mean in terms of the number of standard deviations - how extreme the score is
  ○ 1. Transform a distribution
    ■ If you z-score all of the scores in a distribution...
      ● 1. The mean of the z-scores will always equal 0
      ● 2. The standard deviation of the z-scores will always equal 1
    ■ If you took measurements in different units, you can use z-scores to compare the values
    ■ You can use z-scores to convert one distribution into another
      ● Look at slide
  ○ 2. Determine the relative probability of a score - in a random sample, how likely are we to see some scores rather than others?
    ■ Z-score table
      ● If your z-score is positive → gives the proportion of scores larger than your score
      ● If your z-score is negative → gives the proportion of scores smaller than your score

Normal distribution
● 68% of the data is within 1 standard deviation of the mean
● 95% of the data is within 2 standard deviations
● 99.7% of the data is within 3 standard deviations
● → Problem with the Pavel case: it's not experimental, and he is only one person

When is a sample extreme enough?
● We will always see some difference between a sample statistic and the population parameter
● At some point, sampling error becomes so unlikely as an explanation that we reject it
  ○ We assume something else caused our data
  ○ This happens when our sample statistic is particularly extreme compared to the population parameter

Factors that influence a hypothesis test
● Mean difference - a big mean difference between the sample and the population indicates that the treated sample is noticeably different from the untreated population, and usually supports a conclusion that the treatment effect is significant
● Variability of the scores - the larger the variability, the lower the likelihood of finding a significant treatment effect
● The number of scores in the sample - increasing the sample size produces a smaller standard error and a larger value for the z-score. The larger the sample size, the greater the likelihood of finding a significant treatment effect

Assumptions for hypothesis tests with z-scores
● Random sampling - the sample must be representative of the population from which it has been drawn
● Independent observations - two events are independent if the occurrence of the first event has no effect on the probability of the second event
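The z-score formula z = (X − μ)/σ and the transform-a-distribution property (z-scored data always has mean 0 and SD 1) can be sketched in Python; the raw scores and the μ = 75, σ = 10 values are illustrative:

```python
import statistics

def z_score(x, mu, sigma):
    """z = (X - mu) / sigma: distance from the mean in standard deviations."""
    return (x - mu) / sigma

def z_scores(scores):
    """Z-score an entire distribution using its own mean and population SD."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(x - mean) / sd for x in scores]

print(z_score(100, 75, 10))  # +2.5: two and a half SDs above the mean

zs = z_scores([55, 60, 70, 75, 90])
print(round(statistics.mean(zs), 6))    # the mean of the z-scores is 0
print(round(statistics.pstdev(zs), 6))  # the SD of the z-scores is 1
```

Because every distribution z-scores to mean 0 and SD 1, measurements taken in different units become directly comparable, as the notes point out.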



● The value of σ is unchanged by the treatment - we assume that the standard deviation for the unknown population (after treatment) is the same as it was for the population before treatment
  ○ This is a theoretical idea; actual experiments do not show a perfect and consistent additive effect
● Normal sampling distribution - the critical region table can only be used for a normal distribution

Null Hypothesis Significance Test (z-test)
● 1. State the null hypothesis - state a hypothesis about a population that concerns the value of a population parameter
  ○ Null hypothesis (H0): states that in the general population there is no change, no difference, or no relationship. The independent variable has no effect on the dependent variable for the population
    ■ Pretend brain training had no effect and the only thing that determined our sample mean was sampling error. There is no signal in the world and your data is just noise...
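The z-test statistic compares a sample mean against the null-hypothesis population using the standard error. A minimal sketch, with sample numbers assumed for illustration (not from the lecture):

```python
import math

def z_test(sample_mean, mu, sigma, n):
    """z = (M - mu) / SE, where SE = sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    return (sample_mean - mu) / se

# Illustrative brain-training sample: n = 25, M = 106,
# tested against a population with mu = 100 and sigma = 15
z = z_test(106, 100, 15, 25)
print(z)  # (106 - 100) / (15 / 5) = 2.0
# |z| > 1.96 falls in the critical region for a two-tailed test at alpha = .05,
# so sampling error alone is an unlikely explanation for this sample mean
```

Notice how the three factors listed above show up directly in the formula: a bigger mean difference raises the numerator, while larger variability raises, and a larger n lowers, the denominator.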

