EC 285 mid term review WLU PDF

Title EC 285 mid term review WLU
Author James Williams
Course Introductory Statistics
Institution Wilfrid Laurier University
Pages 6
File Size 286.2 KB
File Type PDF



Description

Unit 1

Types of data
1. Categorical/qualitative data
● Ordinal - specific order to the data; ex: grades A, B, C, D
● Nominal - no order to the data; ex: colours, yes or no answers
2. Quantitative/interval data
● Measures a particular quantity; ex: height, price

Parameter - Measure of a characteristic across the population (rarely known, as we usually do not survey the entire population)
Statistic - Measure of a characteristic within a sample

Sampling Techniques
1. Random sampling
2. Non-random sampling - Some members of the population are more likely to be picked; not representative of the population
3. Stratified random sampling - Divide the population into groups by characteristic and take a random sample from each; a balanced sample occurs when the proportion of respondents in each group matches the proportion of members in the population
4. Cluster sample - Population is divided into clusters, some clusters are selected at random, and respondents are picked randomly from the selected clusters

Sampling Biases
1. Response bias - Question is designed in a leading way
2. Non-response bias - Some types of people may be less likely to respond
3. Voluntary response bias
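The balanced (proportional) stratified sample described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `stratified_sample`, the student population, and the rounding rule for group sizes are all hypothetical and not from the course notes.

```python
import random

def stratified_sample(population, key, sample_size, seed=0):
    """Draw a proportional stratified random sample (illustrative helper)."""
    rng = random.Random(seed)
    # Divide the population into groups by the chosen characteristic.
    strata = {}
    for record in population:
        strata.setdefault(key(record), []).append(record)
    sample = []
    for members in strata.values():
        # Balanced sample: each group's share of the sample matches
        # its share of the population (rounded to a whole number).
        k = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 first-year and 40 second-year students.
population = [("first", i) for i in range(60)] + [("second", i) for i in range(40)]
sample = stratified_sample(population, key=lambda r: r[0], sample_size=10)
counts = {g: sum(1 for r in sample if r[0] == g) for g in ("first", "second")}
print(counts)  # {'first': 6, 'second': 4}
```

The sample keeps the 60/40 split of the population, which is what makes it balanced.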

Errors
1. Measurement error - Respondent provides an inaccurate answer
2. Sampling error - There is always a chance that the sample we drew does not represent the population

Unit 2

Categorical data

Frequency - number of observations in a category

Proportion = Frequency / total number of observations
Contingency table - Examines two categorical variables at the same time
[Figures: example contingency table and frequency table]

Simpson's Paradox - When averages are taken within different groups and these group averages appear to contradict the overall average
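A small numeric sketch can make Simpson's Paradox concrete. The counts below are invented purely for illustration (they are not from the course): method A has the higher proportion within each group, yet method B has the higher proportion once the groups are pooled, because the two methods were observed mostly in different groups.

```python
# Hypothetical counts of (successes, observations) for two methods,
# split by group. The numbers are invented for illustration.
counts = {
    "A": {"group 1": (9, 10), "group 2": (30, 100)},
    "B": {"group 1": (80, 100), "group 2": (2, 10)},
}

def proportion(frequency, n_observations):
    # Proportion = frequency / number of observations
    return frequency / n_observations

within = {}   # proportion for each method within each group
overall = {}  # proportion for each method with the groups pooled
for method, groups in counts.items():
    within[method] = {g: proportion(f, n) for g, (f, n) in groups.items()}
    total_f = sum(f for f, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    overall[method] = proportion(total_f, total_n)

print(within)   # A is higher than B in both groups
print(overall)  # B is higher than A overall
```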

Quantitative Data

Data distribution can be described by examining its:
1. Shape
2. Symmetry
3. Center
4. Spread

Shape
● A mode is a value or narrow range of values that occurs more often in the data than other values
● One main hump = Unimodal
● Two humps = Bimodal
● More than 2 humps = Multimodal
● No clear mode = Uniform

Symmetry
● If the chart is cut in half, will both sides look similar?
● Each end of the distribution is called a tail; if one tail is longer than the other, the distribution is skewed in the direction of the longer tail (long right tail = skewed right)

Center
● Measures of central tendency: Mean and Median

Spread
● How far away from the center the data extends
Measures:
1. Range
2. Interquartile range
3. Variance - average squared distance of the data points from the mean
4. Standard deviation - the deviations get squared in the variance formula; the SD is the square root of the variance, which converts it back into the original units

Standardized results
● Coefficient of variation: divides the SD by the mean
● Has no units and is directly comparable across data sets
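The measures of spread listed above can be computed with Python's standard library. This is a minimal sketch on a made-up data set; it assumes the sample variance (dividing by n - 1), since the notes do not say which formula the course uses, and quartile conventions vary between textbooks and software.

```python
import math
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
data_range = max(data) - min(data)

# Quartiles: conventions differ, so the exact IQR may not match
# a hand calculation done with the course's method.
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample variance (divides by n - 1); this choice is an assumption.
variance = statistics.variance(data)
sd = math.sqrt(variance)   # square root returns to the original units
cv = sd / mean             # SD divided by the mean, has no units

print(mean, data_range, iqr, round(variance, 3), round(sd, 3), round(cv, 3))
```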

Geometric mean
● The geometric mean is a special calculation that finds the compound average growth rate

Outliers
● Identified using the interquartile range

Box plot
1. Plot all data values along a straight line
2. Identify the median and interquartile range and draw boxes around the two middle quartiles
3. Multiply the interquartile range by 1.5 and extend that distance out from the first and third quartiles; mark the ends of this range with two hash lines
4. Any observations beyond this range are considered outliers
● The percentile in a box plot is the proportion of the data that lies below a particular value

Unit 3

Bivariate data
● Two variables that provide information about each observation

Scatter plots
● A variable that causes changes in another is called an explanatory/predictor variable (X-AXIS)
● A variable whose outcome is caused by another variable is called an outcome/response variable (Y-AXIS)

Covariance
● COV > 0 positive relationship
● COV < 0 negative relationship
● COV = 0 no linear relationship
● The units of the covariance do not make any logical sense by themselves; the covariance only tells us whether the relationship is positive, negative or non-existent. This is why we use correlation

Correlation
● The correlation coefficient (r) is a measure of linear association (tells us how closely the scatter plot points match up to a straight line)
● The closer r is to 0, the further the points are from the line; the closer r is to -1 or +1, the closer the points are to the line
● The sign of r denotes a positive or negative relationship, and its closeness to -1 or +1 denotes the strength of the relationship

Before calculating the correlation coefficient:
1. Are both variables quantitative?
2. Is the relationship linear? (if the scatter plot shows the relationship is obviously non-linear, then r will mean nothing)
3. Are there outliers?

Determining line of best fit

B1 = Cov(x, y) / Var(x)
B0 = Mean of y - (B1)(Mean of x)
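As a worked sketch of the slope and intercept formulas, the snippet below computes B1, B0, the correlation coefficient and its square for a small made-up data set. The data are hypothetical, the covariance is taken between x and y and the variance is that of x, and the standard formula r = Cov(x, y) / (SD of x × SD of y) is used for the correlation, which the notes do not write out.

```python
import math

# Hypothetical paired observations (x = explanatory, y = response).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance and variances. Dividing by n - 1 is an assumption;
# the slope and r come out the same with n if used consistently.
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)

b1 = cov_xy / var_x                     # slope: B1 = Cov / Var
b0 = mean_y - b1 * mean_x               # intercept: B0 = mean of y - B1 * mean of x
r = cov_xy / math.sqrt(var_x * var_y)   # correlation coefficient
r_squared = r ** 2                      # coefficient of determination

print(round(b0, 3), round(b1, 3), round(r, 3), round(r_squared, 3))
```

For these numbers the fitted line is roughly y = 2.2 + 0.6x, with r near 0.775.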

Coefficient of determination = (Coefficient of correlation)^2

Errors???

Unit 4

Experiments
● Each possible result is denoted as an Outcome (Oi)
● The set of all basic outcomes is denoted as the sample space (S)
● A set of one or more basic outcomes is denoted as an Event

Events
● The probability of an Event occurring is the sum of the probabilities of the event's individual outcomes; ex: P(A) = p(o1) + p(o2) + p(o3)

2 rules of all probability distributions
1. Likelihood can never be negative
2. There is a 100% likelihood that something will happen (we just don't know what)

Intersection of events
● Set of outcomes in which both events occur simultaneously
● Denoted as (A n B)

Union of events

● Combined set of outcomes of the two events, meaning either event or both events could occur
● Denoted as (A U B)

Exclusivity
● If the probability of the intersection is zero then we say that the events are mutually exclusive
● For mutually exclusive events: P(A U B) = P(A) + P(B)

Complement of the event
● Contains all of the possible outcomes that are not a part of the event
● P(Ac) = 100% - P(A)

Conditional probabilities
● For events A and B, the conditional probability is the likelihood of event B occurring given that event A has already occurred
● P(B|A) = P(A n B) / P(A)...
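The event rules above can be checked on a small example. This sketch assumes a hypothetical experiment, one roll of a fair six-sided die, with events A, B and C chosen only for illustration; Python sets stand in for events and exact fractions for probabilities.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    # Equally likely outcomes: each outcome in the event contributes 1/6.
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3
C = {1}         # roll is 1 (shares no outcomes with A)

p_intersection = prob(A & B)            # P(A n B)
p_union = prob(A | B)                   # P(A U B)
p_complement = 1 - prob(A)              # P(Ac) = 100% - P(A)
p_B_given_A = prob(A & B) / prob(A)     # P(B|A) = P(A n B) / P(A)

# A and C are mutually exclusive, so their probabilities simply add.
assert prob(A & C) == 0
assert prob(A | C) == prob(A) + prob(C)

print(p_intersection, p_union, p_complement, p_B_given_A)  # 1/3 2/3 1/2 2/3
```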

