Formula Sheet for Statistics Test 1 PDF

Title Formula Sheet for Statistics Test 1
Course Critical Appr. of Stat. in HS
Institution University of Ontario Institute of Technology
Pages 6
File Size 449.7 KB
File Type PDF



Description

Formula Sheet for Statistics Test 1:

Notation:
• X = the scores in the data set of a variable
• n = number of scores in a sample; N for a population
• ∑ = “sigma”, which tells us to sum (add up) what follows
• ∑X means that we are summing X1 + X2 + X3 + . . . + Xn

Data, Observations and Variables:
• A data value (or datum) = one measurement. Jan’s age of 47 is a data value.
• An observation = all of the information about one participant. One row represents an observation.
• A variable = a set of data values for a single measurement.

Frequency Distribution - Ungrouped Data:
• f = frequency: count how many times each value occurred
• P = proportion (f / n)
• % = proportion x 100

Creating a Distribution Table:
1. Determine the range of scores: largest score - smallest score
2. Find a good interval width: range / 5 or range / 10 (aim for a width of 5 or 10 and a total of about 10 intervals)
3. Identify the intervals: check the lowest score (e.g., if the lowest score is 53, then start at 50). The bottom score of each interval should be a multiple of the width.

Frequency Distributions and Graphs:
• cf (cumulative frequency) = the number of people who are in or below each category

Percentiles:
• C% = (cf / n) x 100
• This gives the percentile rank of a score: the percentage of scores in the data set that fall at or below it.
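The frequency-table columns described above (f, proportion, %, cf and C%) can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` is made up for this sketch, and the scores are borrowed from the mean example later in this sheet.

```python
from collections import Counter

def frequency_table(scores):
    """Return rows of (value, f, proportion, percent, cf, c_percent), lowest value first.

    Hypothetical helper for illustration; column names follow the sheet's notation.
    """
    n = len(scores)
    counts = Counter(scores)          # f for each distinct value
    rows = []
    cf = 0                            # running cumulative frequency
    for value in sorted(counts):
        f = counts[value]
        cf += f                       # people in or below this category
        rows.append((value, f, f / n, 100 * f / n, cf, 100 * cf / n))
    return rows

scores = [7, 8, 8, 6, 3, 2, 6, 9, 3, 8]   # n = 10
print("value   f   prop      %   cf     C%")
for value, f, prop, pct, cf, c_pct in frequency_table(scores):
    print(f"{value:>5} {f:>3} {prop:>6.2f} {pct:>6.1f} {cf:>4} {c_pct:>6.1f}")
```

The last row always has cf = n and C% = 100, which is a quick check that no score was missed.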

Frequency Distributions and Graphs:
Histograms:
• A visual depiction of a grouped-data frequency table for interval- or ratio-scale data
• The intervals are listed on the horizontal axis, and the height of each bar shows the frequency of scores in that interval on the vertical axis
• Nominal- or ordinal-scale data need a bar graph
• Continuous data need a histogram

To make a histogram:

1. List the midpoints of the intervals on the horizontal axis, starting from zero if that’s reasonable. The midpoint is the average of the lowest and highest scores in the interval; e.g., the interval 55-59 has a midpoint of (55 + 59) / 2 = 57.
2. List the possible frequencies on the vertical axis, from zero to the highest frequency in any interval
3. Draw a bar for each interval, the height of which corresponds to the frequency of scores for that interval
4. The width of each bar extends to the real limits of the category/interval
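As a rough sketch of the interval and midpoint rules, the Python below builds the intervals for a grouped table. The function name `grouped_intervals` and the sample scores are hypothetical, and the code assumes whole-number scores.

```python
def grouped_intervals(scores, width):
    """Build (low, high, midpoint, frequency) rows for whole-number scores.

    Hypothetical helper: the first interval starts at the largest multiple
    of `width` that is at or below the lowest score.
    """
    low = (min(scores) // width) * width      # e.g. lowest score 53, width 5 -> 50
    top = max(scores)
    rows = []
    while low <= top:
        high = low + width - 1                # interval 55-59 has width 5
        midpoint = (low + high) / 2           # average of lowest and highest score
        f = sum(low <= x <= high for x in scores)
        rows.append((low, high, midpoint, f))
        low += width
    return rows

scores = [53, 57, 58, 61, 64, 66, 67, 72]     # made-up data
for low, high, midpoint, f in grouped_intervals(scores, width=5):
    print(f"{low}-{high}  midpoint={midpoint}  f={f}")
```

Each midpoint is what goes on the horizontal axis of the histogram, and each f is the height of the bar.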

Describing Frequency Distributions and Graphs:
We can describe the shape of the distribution shown in a histogram.
Look at the number of peaks:
• 1 peak is unimodal
• 2 peaks is bimodal
• More than two peaks is multimodal
• Uniform: there are no clear peaks
Symmetrical vs. skewed:
• Negative skew (left) or positive skew (right)
• The skew is named for the side with the lower frequencies (the tail)

Interpretation of Frequency Histogram:

Interpreting Histogram (change between two time periods):
• Calculating percentage change: Increase = New Number - Original Number
• % Increase = (Increase / Original Number) x 100
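The percentage-change calculation is short enough to check in code. The function name `percent_increase` and the numbers 40 and 50 are made up for illustration.

```python
def percent_increase(original, new):
    """Percentage change from `original` to `new` (negative means a decrease)."""
    increase = new - original
    return increase / original * 100

# Hypothetical example: a frequency of 40 in the first period and 50 in the second
print(percent_increase(40, 50))   # 10 / 40 x 100 = 25.0
```

Note that the division is always by the original number, not the new one.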

Measures of Central Tendency:
This includes the mode, median and mean.

Mode:
• The value that appears most often.
• If there are two modes, the distribution is bimodal; if there are more than two, it is multimodal.
• If the two most frequently occurring values are adjacent to one another, report the average of those two values (e.g., 9, 8, 8, 8, 7, 7, 7, 6, 6, 3, 3, 1: Mode = (7 + 8) / 2 = 7.5)
• A data set can also have no mode

Median:
• The measure that divides the total number of scores in half = 50th percentile.
• Rank the scores from highest to lowest and count down to the middle score, or calculate the median location as (N + 1) / 2

Mean:
• Add up the scores and divide by how many of them there are
• Example: 7, 8, 8, 6, 3, 2, 6, 9, 3, 8 (n = 10)
• Mean = ∑X / n = 60 / 10 = 6

Mean Notation:
• X̄ (X-bar) = ∑X / n = sample mean
• mu (µ) for populations, µ = ∑X / N
• ∑ = sigma, "sum of"
• X = the scores in the data set of variable X
• ∑X means that we are summing X1 + X2 + X3 + . . . + XN
• / means division
• n = number of scores in sample data set
• N = number of scores in population data set

Central Tendency Notes:
• A single extreme value can change the mean dramatically but has little or no effect on the mode and median

Measures of Variability:
• Variability describes how the data points are distributed: whether they are clustered together or spread out over a large distance.

Range = total spread in the data
• (highest score - lowest score)
• e.g., for 4, 5, 7, 9: 9 - 4 = 5, so the range is 5

Interquartile range (IQR):
• The difference between the upper quartile (Q3) and the lower quartile (Q1); it comprises the middle 50% of the data
• The IQR is a more useful measure than the range because it deals only with the middle 50% of the data, so extreme scores do not affect it
• E.g., ages in years: 10, 18, 18, 19, 19, 20, 21, 22, 25, 29, 32
• Median = X6 = 20 years
• The median divides the data into two equal sections: a lower half and an upper half
• Next step is to get the median of the lower half (Q1) = X3 = 18 years
• Now get the median of the upper half (Q3) = X9 = 25 years
• IQR = Q3 - Q1 = 25 - 18 = 7 years

Box-and-Whisker Plot:
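A box-and-whisker plot is drawn from five values: the minimum, Q1, the median, Q3 and the maximum. The sketch below computes them for the age example using the median-of-halves method shown there. The function names are hypothetical, and leaving the median out of both halves when n is odd is an assumption that matches this example; textbooks and software use several slightly different quartile conventions.

```python
def median(sorted_values):
    """Median of an already sorted list (average of the two middle scores if n is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves method (needs n >= 2)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                   # scores below the median position
    upper = data[half + 1:] if n % 2 == 1 else data[half:]  # scores above it
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }

ages = [10, 18, 18, 19, 19, 20, 21, 22, 25, 29, 32]
summary = five_number_summary(ages)
iqr = summary["q3"] - summary["q1"]
print(summary)          # q1 = 18, median = 20, q3 = 25
print("IQR =", iqr)     # 25 - 18 = 7
```

The box spans Q1 to Q3 (so its length is the IQR), the line inside the box is the median, and the whiskers extend toward the minimum and maximum.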

• We can compare the top half of the box to the bottom half, which shows the spread of the data points; here the top half has more of a spread of data points
• We can compare the quarters by looking at the whiskers, which indicate how wide a spread the data points have

Variance (s²) and Standard Deviation (s):
• The most commonly used and most important measures of variability
• Provide a measure of the standard, or average, distance from the mean
• Describe whether the scores are clustered closely together around the mean or are widely scattered

Deviation scores:
• The first step is to determine the deviation, or distance from the mean
• How much does Dave’s score differ/deviate from the mean?
• Obtain each person’s deviation score by calculating the difference between the original score and the mean
• Deviation = X - X̄, for X1, X2, X3 … Xn

Mean Deviation:
• The mean of the deviation scores is not useful: the negative and positive deviations cancel each other out, so we use squared deviations instead (the sum of squares)

Sum of Squares:
• We square all the deviation scores to get rid of the negative values, and adding them up gives the sum of squares (SS)
• SS = ∑X² - (∑X)² / n

An important distinction:
• ∑X² vs. (∑X)²
• “sum of X-squared” vs. “sum of X, quantity squared”
• Remember Order of Operations!
• BEDMAS: Brackets first, then Exponents, Division, Multiplication, Addition, and Subtraction

Population Variance:
• Divide the sum of squared deviations (SS) by the number of scores (N)
• σ² = SS / N

Sample Variance:
• Instead of dividing the sum of squares by N, we need to make a small adjustment to the denominator to calculate the sample variance (s²)
• s² = SS / (n - 1)
• n - 1 corrects the bias in the estimation of the population variance

Sample/Population Standard Deviation:
• Sample standard deviation, s = √(SS / (n - 1)), where SS = ∑X² - (∑X)² / n
• Population standard deviation, σ = √(SS / N)
• Note: s is never negative!

Rules of Rounding:
• It is important to follow the rules of rounding in order to obtain the correct answer
1. Carry as many decimal places as possible through every calculation. If unrealistic to do so, round intermediate steps to four decimal places
2. Final answers should be rounded to one more decimal place than in the original data (see previous example re: men’s SCL)
3. Round consistently; e.g., if you round the mean to one decimal place, you should also round the SD to one decimal place
• When a decimal is followed by a number greater than 5, round up (e.g., 3.26 is 3.3); when it is followed by a number less than 5, round down (e.g., 3.21 is 3.2).
• When an odd decimal is followed by a 5, round up (e.g., 2.75 is 2.8).
• When an even decimal is followed by a 5, round down (e.g., 2.65 is 2.6).
• Decimals generally are rounded off to the 10th place (e.g., 2.56 is 2.6), 100th place (e.g., 2.563 is 2.56), or 1000th place (e.g., 2.5638 is 2.564).
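The sum of squares, standard deviation and rounding rules can be tied together in one short Python sketch. It assumes the computational formula SS = ∑X² - (∑X)² / n, reuses the ten scores from the mean example, and treats the odd/even rule for a trailing 5 as round-half-to-even, done with the `decimal` module so that values like 2.65 are not distorted by binary floating point. All function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from math import sqrt

def sum_of_squares(scores):
    """SS = sum of X squared, minus (sum of X), quantity squared, over n."""
    n = len(scores)
    sum_x = sum(scores)                          # ∑X
    sum_x_squared = sum(x * x for x in scores)   # ∑X² (square first, then add)
    return sum_x_squared - sum_x ** 2 / n        # (∑X)² is squared after adding

def sample_sd(scores):
    """s = sqrt(SS / (n - 1))"""
    return sqrt(sum_of_squares(scores) / (len(scores) - 1))

def population_sd(scores):
    """sigma = sqrt(SS / N)"""
    return sqrt(sum_of_squares(scores) / len(scores))

def round_half_even(value, places):
    """Round so that a trailing 5 goes to the even digit (2.75 -> 2.8, 2.65 -> 2.6)."""
    quantum = Decimal(10) ** -places             # e.g. places=1 -> 0.1
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)

scores = [7, 8, 8, 6, 3, 2, 6, 9, 3, 8]          # ∑X = 60, ∑X² = 416
print("SS =", sum_of_squares(scores))            # 416 - 3600/10 = 56.0
# Whole-number data, so final answers get one decimal place
print("s  =", round_half_even(sample_sd(scores), 1))
print("σ  =", round_half_even(population_sd(scores), 1))
```

Rounding happens only at the very end, on the final answers, which follows rule 1 above about carrying full precision through the intermediate steps.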



Probability:
The probability (p) for any specific outcome or event is expressed as a ratio:
• Probability of A occurring = (# ways A can occur) / (total # of outcomes)
• P(A) = #A / #O

Binomial Equation and Notation:
• P(x “successes”) = [n! / (x! (n - x)!)] × p^x × (1 - p)^(n - x)
• n = number of observations or number of times the process is repeated
• p = probability of success or occurrence of the outcome of interest
• x = number of successes or events of interest occurring during n observations
• The binomial equation also uses factorials
• The factorial of a non-negative integer k is denoted by k!, which is the product of all positive integers ≤ k
• Examples: 4! = 4 x 3 x 2 x 1 = 24; 2! = 2 x 1 = 2; 1! = 1
• One special case: 0! = 1

Z-score Distribution:
• To convert raw scores (X) to standardized Z scores we use the formula Z = (X - µ) / σ

Sampling Distribution of Means:
Characteristics of the sampling distribution of means:
1. Sample means should be relatively close to the population mean, µ
2. Sample means tend to form a normal-shaped distribution
3. The larger the samples, the closer the sample means should be to the population mean, µ (sometimes referred to as the “law of large numbers”, where means from larger samples are closer to the true population mean)
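The binomial and Z-score calculations from earlier in this section can be sketched in Python as follows. The sketch assumes the standard binomial formula, P(x) = [n! / (x!(n - x)!)] × p^x × (1 - p)^(n - x); the function names and the example numbers (coin flips, a score of 70 with µ = 60 and σ = 5) are made up for illustration.

```python
from math import factorial

def binomial_probability(x, n, p):
    """Probability of exactly x successes in n observations, success probability p."""
    # Number of ways to get x successes: n! / (x! (n - x)!)
    ways = factorial(n) // (factorial(x) * factorial(n - x))
    return ways * p ** x * (1 - p) ** (n - x)

def z_score(x, mu, sigma):
    """Z = (X - mu) / sigma"""
    return (x - mu) / sigma

# Hypothetical example: exactly 2 heads in 4 flips of a fair coin
print(binomial_probability(2, 4, 0.5))   # 6 x 0.25 x 0.25 = 0.375

# Hypothetical example: a score of 70 when mu = 60 and sigma = 5
print(z_score(70, 60, 5))                # (70 - 60) / 5 = 2.0
```

Because 0! = 1, the same function also handles the edge cases x = 0 and x = n without any special treatment.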

Standard Error of the Mean:
SEM = SD / √N
• When SEM is small, all of the sample means are close together and have similar values
• When SEM is large, sample means are scattered over a wide range and there are big differences from one sample to another

Confidence Intervals:...

