
Title: F18 comm87 notes
Author: Ka Ching Chuk
Course: Statistical Analysis for Communication
Institution: University of California Santa Barbara


10/1 Statistics and research

1. Role of statistics in research
A. Empirical research: systematically gathering information from factual observations to answer questions
e.g., are kids who play video games more violent?
B. Qualitative vs. quantitative research
- Qualitative: nonnumerical (no numbers involved), interpretive
- Quantitative: numerical (statistics involved), based on measurements (variables measure the phenomenon)
Measurement: the assignment of numbers to reflect characteristics of a variable
Variable: a varying, observable characteristic of an object or event
Data: a collection of measurements
C. Statistics: mathematical tools used to make sense of our data

2. Descriptive vs. inferential statistics
A. Descriptive: describe characteristics of data
e.g., describe what percentage of students agree that UCSB should have a football team
B. Inferential (sampling): use data from a sample to infer characteristics of a population
Population: the total class of whatever is observed (e.g., all Americans, all blue-eyed students)
Sample: a portion of the population, selected to represent the population

3. The research plan
A. Problem: the purpose of the research
- Research question
e.g., what is the effect of watching violent movies on children's aggressive behavior?
- Hypothesis: an educated guess that comes from deep thought and observation
e.g., my prediction is that children who watch violent movies are more aggressive than children who don't
B. Method: how the research was conducted
1. Content analysis: analyzing the content of communication
e.g., the extent of violence in movies seen by kids
2. Survey: give questionnaires to a representative sample; ask about their behaviors and background
e.g., how often do you watch violent movies? Do you become aggressive after watching?
Random sample: every member of the population has an equal chance of being included in the sample
e.g., effect of exposure to movie violence on children's behavior
3. Experiment: variables are manipulated by the researcher to study the effect of the manipulation
Independent variable: manipulated by the researcher

e.g., exposure to violent movies
Dependent variable: the phenomenon that is affected by the manipulation (the effect)
e.g., aggressive behaviors
Experimental vs. control groups
Random assignment: every member of the sample has an equal chance of being assigned to either the control or the experimental group
C. Results: statistical results are reported
e.g., the average number of children's aggressive acts in the experimental group was 74, whereas the control group's average was 36
D. Discussion: statistical results are interpreted in words
e.g., watching violent movies increases aggressive behavior in children

10/3 Measurement
1. Definition: the assignment of numbers to specify characteristics of a variable
2. Levels of measurement
A. Nominal: numbers represent different categories; categories have no order
e.g., gender (male = 1, female = 2); political party ID (Republican = 1, Democrat = 2)
- No information about the relationship between categories
- Categories are mutually exclusive
B. Ordinal: numbers represent categories; categories are ordered
e.g., college football team rankings (who is 1st, 2nd, 3rd)
- Information about the relationship between categories (one category is greater than another), but not how much greater or lesser
- Categories are mutually exclusive
*Variables measured at the nominal or ordinal level are "categorical variables"
C. Interval: numbers represent known and equal intervals between ordered categories, with an arbitrary zero point
e.g., temperature on the Celsius or Fahrenheit scale; most attitude measures
- Information about the relationship between categories
- Information about how much one category is greater or lesser than another
D. Ratio: same as interval, but with an absolute zero point
e.g., height in inches, temperature on the Kelvin scale, time
- Information about the relationship between categories
- Information about how much one category is greater or lesser than another
*Variables measured at the interval or ratio level are "continuous variables"
3. Measurement accuracy
A. Validity: the degree to which a measurement actually measures what it claims to measure
B. Reliability: the degree to which a measurement is consistent
C. Relationship of validity and reliability

1. Can have reliability without validity
2. Cannot have validity without reliability
D. Improving measurement
1. Validity: use multiple measures
2. Reliability: use multiple judges and train them to use your measures consistently

10/5 Distributions - describing data
1. Distribution: a collection of measurements; the frequency with which observations are assigned to each category or point on a measurement scale
2. Tabular and graphic descriptions of data
A. Frequency table (nominal data)

Political party    n
Republican         10
Independent         5
Democrat           20
Total              35
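A frequency table like the one above can be tabulated in a few lines of Python. This is only a sketch: the party labels and counts below are the illustrative values from the table, and `Counter` simply counts how often each category label appears.

```python
from collections import Counter

# Hypothetical nominal data: party ID for 35 respondents (illustration only)
parties = ["Republican"] * 10 + ["Independent"] * 5 + ["Democrat"] * 20

freq = Counter(parties)      # frequency of each category
n = sum(freq.values())       # total number of observations

print("Political party   n")
for party, count in freq.most_common():
    print(f"{party:<17} {count}")
print(f"{'Total':<17} {n}")
```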

B. Histogram (bar graph)
C. Frequency polygon (line graph)
D. Shapes of distributions

1. Bell-shaped (normal curve)

Sides are equally balanced; symmetrical about the middle
e.g., height, weight, IQ, income
2. Peaked (leptokurtic) and flat (platykurtic)
Symmetrical
e.g., leptokurtic: college students' ages, with 17-25 as the peak
Platykurtic: scores evenly spread out
3. Skewed
Positive (right) skew
e.g., a hard exam: most of the scores are on the lower end (left)
Negative (left) skew
e.g., an easy exam: most of the scores are on the higher end (right)

4. Measures of central tendency
A. Mode (Mo): the most frequently occurring score in a distribution
B. Median (Mdn): the midpoint or middlemost score in a distribution
Requires at least ordinal-level data; nominal data has no median because the categories have no order
C. Mean (M): the average of all scores in the distribution, calculated by summing the scores and dividing by the number of scores
Requires at least interval-level data; does not work for ordinal or nominal data because those categories are not numbers that can be added
Note on symbols: sample mean = X̄ (or M); population mean = the Greek letter mu (μ)
D. Which measure of central tendency is most appropriate?
Depends on the level of measurement and the shape of the distribution
Mode: nominal data
Median: ordinal or interval data
Mean: interval/ratio data, but in a skewed distribution the mean is sensitive to extreme scores; the long tail pulls the mean toward it, so the mean can be misleading for skewed distributions
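A quick sketch of mode, median, and mean using Python's standard statistics module. The exam scores are made up; the single extreme score gives the distribution a positive skew and pulls the mean above the median, as described above.

```python
import statistics

# Hypothetical exam scores with a positive (right) skew: one extreme high score
scores = [55, 60, 60, 62, 65, 68, 70, 98]

print("mode:  ", statistics.mode(scores))    # most frequent score -> 60
print("median:", statistics.median(scores))  # middle score -> 63.5
print("mean:  ", statistics.mean(scores))    # pulled toward the long tail -> 67.25
```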

10/8 IV. Measures of dispersion (variability)
A. Range: highest score minus lowest score
B. Standard deviation (SD or s for a sample, σ for a population): how much the scores in a distribution deviate from the mean, on average
1. Small SD = scores cluster tightly around the mean (homogeneous)
2. Large SD = scores spread widely around the mean (heterogeneous)
3. Note on symbols: SD (or s) for a sample; σ (sigma) for a population

C. Variance (SD², s², or σ²): the square of the standard deviation
1. Similar to SD in principle
2. Note on symbols: SD² (or s²) for a sample; σ² for a population
D. Both SD and variance are based on the sum of squares (SS)
Logic of the sum of squares: SS = the sum of squared deviations from the mean
SD and variance take the average of the SS
E. Formulas:
Variance: s² = SS / N
Standard deviation: s = √(SS / N)
F. Different measures of variability are useful for different statistics
e.g., ANOVA, standard scores
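The SS, variance, and SD formulas in section E translate directly into code. A minimal sketch with made-up scores; note that, following the formulas above, the denominator is N (not the N − 1 used for some sample estimates).

```python
import math

# Hypothetical scores (illustration only)
scores = [4, 6, 8, 10, 12]
N = len(scores)
mean = sum(scores) / N

# SS: sum of squared deviations from the mean
SS = sum((x - mean) ** 2 for x in scores)

variance = SS / N        # s^2 = SS / N
sd = math.sqrt(SS / N)   # s   = sqrt(SS / N)

print(f"mean = {mean}, SS = {SS}, variance = {variance}, SD = {sd:.2f}")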

Using distributions to estimate parameters
1. Some terms to understand:
Statistic: a measured characteristic of a sample (X̄, SD)
Parameter: a measured characteristic of a population (μ, σ)
2. Types of distributions
A. Sample distributions
- Plot the raw scores of a sample
- Can calculate sample statistics (e.g., X̄ = mean, SD = standard deviation)
B. Population distributions
- Plot the raw scores of a population
- Can calculate population parameters (e.g., μ = mean, σ = standard deviation)
C. Sampling distributions
- Used to estimate parameters
- Plot of sample statistics (e.g., means) from multiple samples
- There are different types of sampling distributions (e.g., the "sampling distribution of sample means" or the "sampling distribution of sample mean differences")

10/10 Example: sampling distribution of sample means
- Created by using "sampling with replacement"
- Variation between sample means is due to "sampling error"
- Take the mean of each sample and plot the means
- Can calculate a mean and a standard deviation of this distribution:
The mean of all the sample means is always equal to μ (theoretically, everyone in the population has been measured via the infinite samples); any single sample mean will most likely fall near μ, but may not equal it exactly
The standard deviation of the sampling distribution of sample means is the standard error of the mean
When we work with a sampling distribution, standard error = its standard deviation

III. Properties of normal distributions
68-95-99 rule:
p = .68 within 1 standard deviation of the mean
p = .95 within 2 standard deviations
p = .99 within 3 standard deviations
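A simulation sketch of the sampling distribution of sample means, using a made-up population and only the standard library. It illustrates three points from the notes: the mean of the sample means lands near μ, the standard deviation of the sampling distribution is the standard error, and the 68-95-99 rule roughly holds.

```python
import random
import statistics

random.seed(1)

# Hypothetical population (illustration only)
population = [random.gauss(100, 15) for _ in range(100_000)]

# Sampling distribution of sample means, built by sampling with replacement
sample_size = 50
sample_means = []
for _ in range(5_000):
    sample = random.choices(population, k=sample_size)   # with replacement
    sample_means.append(statistics.mean(sample))

mu = statistics.mean(population)
se = statistics.pstdev(sample_means)   # SD of the sampling distribution = standard error

print(f"population mean (mu)          ~ {mu:.2f}")
print(f"mean of the sample means      ~ {statistics.mean(sample_means):.2f}")  # close to mu
print(f"standard error of the mean    ~ {se:.2f}")   # roughly 15 / sqrt(50)

# 68-95-99 rule: share of sample means within 1, 2, 3 standard errors of mu
for k in (1, 2, 3):
    share = sum(abs(m - mu) <= k * se for m in sample_means) / len(sample_means)
    print(f"within {k} standard error(s): {share:.2f}")
```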

Examples: using the normal distribution to find probabilities

Sample distribution (statistics): X̄ = mean of a sample = 48; SD = standard deviation of a sample = 4
Population distribution

10/12 IV. Estimating μ (the population mean) based on sample data
- Take a random sample from a population (N > 30) and compute X̄ (the sample mean)
- Best estimate of μ (population mean) = X̄ (sample mean) +/- some sampling error
- To determine how much +/-, you need to know:
1. σx̄, the standard error of the mean (this will be given to you)
2. The confidence level (usually 95% or 99%)
(Recall: the standard error is the standard deviation of the sampling distribution)
- Apply the numbers to the sampling distribution to get a range (the confidence interval), or use the formula:
μ = X̄ +/- (σx̄, the standard error of the mean)(number of standard errors needed for the confidence level)
Example: X̄ = 43, σx̄ = 2; estimate μ with 95% confidence
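A minimal sketch of the 10/12 estimation example above (X̄ = 43, σx̄ = 2, 95% confidence). It assumes the usual z values of 1.96 standard errors for 95% confidence and 2.58 for 99%; the function name is just for illustration.

```python
# mu ~ X-bar +/- (standard error)(number of standard errors for the confidence level)
def confidence_interval(x_bar, std_error, z):
    return x_bar - z * std_error, x_bar + z * std_error

low, high = confidence_interval(x_bar=43, std_error=2, z=1.96)
print(f"95% CI for mu: {low:.2f} to {high:.2f}")   # 39.08 to 46.92
```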

(critical value = 5.99)
7. State conclusion in words
Example: The proportion of freshmen differs significantly from the proportion of sophomores who choose each comm course as the most difficult.
III. Log-linear analysis
Purpose: similar to the chi-square test of association, but it can also test interaction effects when there are three or more nominal or ordinal variables
Example: a researcher is interested in whether gender (M, F) interacts with class standing (fresh, soph, jun, sen) to impact which comm course (Comm 1, 87, 88, 89) is the most difficult
IV. Nonparametric tests and power
- Nonparametric tests are less powerful than parametric tests

STARTING EXAM 3

11/19 Correlation
I. Purpose: to test the extent to which variables are related to one another
II. Simple ("bivariate") correlation: used to assess the relationship between two variables
To use bivariate correlation:
- The same sample must be measured on two variables
- Both variables must be measured with interval- or ratio-level data
III. Pearson r product-moment correlation coefficient (r for a sample, ρ for a population)
A. Direction (type of relationship)
1. Positive (direct): as X increases, Y increases
2. Negative (inverse): as X increases, Y decreases
(Not on the exam, only for homework)
IV. Testing for statistical significance: test the Pearson r to see the likelihood that the relationship truly exists in the population
A. Hypotheses:
Ha: ρXY =/= 0 (a relationship likely exists in the population)
H0: ρXY = 0 (no relationship likely exists in the population; the relationship seen in the sample is due to sampling error)
Note: can have directional hypotheses too
Ha: ρXY > 0 or Ha: ρXY < 0
H0: ρXY ≤ 0 (or ρXY ≥ 0)
B. Compute r and df and compare with the critical value in a table if working by hand, or look at the significance value if working by computer
V. Example study: is watching crime dramas related to fear of crime in the real world?
A. Measures
1. Hours of crime drama watched
2. Fear of crime
B. Pearson r = +.54
C. Degrees of freedom: df = N - 2 = 82 - 2 = 80
D. Reject H0?
VI. Interpreting Pearson r
A. A significant r tells us that the relationship is not likely due to sampling error, but it does not tell us the meaning of the relationship
B. Direction (look at the sign, + or -)
C. Magnitude (look at the absolute value of r)
The strength of the relationship is interpreted subjectively; guidelines, e.g., .90 or above = very strong
D. Coefficient of determination (r²): tells us about shared variance
- The proportion of variance in one variable that can be accounted for (or explained) by variance in another variable
Example study: r = .54, so r² ≈ .29
E. Caution: cannot conclude a causal relationship from a correlation
- Third-variable problem
- Directionality problem (A -> B or B -> A)
VII. Beyond bivariate correlation
A. Multiple correlation: used to assess the relationship between one variable and two or more other variables
Example: correlate consumption of crime dramas with fear of crime and neighborhood
1. Multiple correlation coefficient (R)
R z.xy = the multiple correlation of z with x and y
R x.zy = the multiple correlation of x with z and y
Hypotheses: can be directional or nondirectional
Range of values: 0 to 1 (shows magnitude only)
Example: Ha: R z.xy =/= 0; H0: R z.xy = 0
2. Coefficient of multiple determination (R²)
R² z.xy = the proportion of variance x and y share with z (the proportion of variance in z that can be explained by x and y)
B. Partial correlation: the correlation of one variable with another variable, while removing any correlation coming from a third variable(s) from both of the original two variables (e.g., the correlation of x and z, removing the correlation of y with both x and z)
1. Partial correlation coefficient
r xz.y = the correlation of x and z, controlling for y
2. Coefficient of partial determination
r² xz.y = the shared variance of x and z, controlling for y
C. Semipartial (or part) correlation: the correlation of one variable with another variable, while removing any correlation coming from a third variable(s) from only one of the original two variables (e.g., the correlation of x and y, removing the correlation with z from y only)
Used in multiple regression
VIII. Correlation when variables are not interval or ratio
A. For ordinal variables: "Spearman's rho" (ρ) is a nonparametric correlation that compares ranks on the variables rather than scores
It has the same range of values and is interpreted the same way as Pearson r
B. For nominal variables: "point-biserial correlation" (r pb)
Used when there is one nominal variable and one interval/ratio variable
Uses "dummy coding" to code the categories of the nominal variable (e.g., 0, 1) and then correlates that with the interval/ratio variable
Example: dummy code gender as male = 0, female = 1, and measure TV viewing
r pb = +.46 or r pb = -.62
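A sketch of a Pearson r test in code, with made-up data standing in for the crime-drama example (N = 82). It uses scipy's pearsonr, which reports the significance directly, so no table lookup is needed; df and r² are computed as described above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up data standing in for the crime-drama example (N = 82)
hours_crime_drama = rng.normal(10, 3, size=82)
fear_of_crime = 0.5 * hours_crime_drama + rng.normal(0, 2, size=82)

r, p_value = stats.pearsonr(hours_crime_drama, fear_of_crime)
df = len(hours_crime_drama) - 2   # df = N - 2 = 80
r_squared = r ** 2                # coefficient of determination (shared variance)

print(f"r = {r:+.2f}, df = {df}, p = {p_value:.4f}, r^2 = {r_squared:.2f}")
```

For the non-interval cases mentioned above, scipy.stats also provides spearmanr (ordinal ranks) and pointbiserialr (one dummy-coded nominal variable with one interval/ratio variable).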

11/26 Correlation (review)
Correlation tells us the relationship between variables
Symbols: ρ for a population, r for a sample
Pearson r product-moment correlation coefficient: shows direction (+ or -) and magnitude (look at the absolute value)
Significance test: calculate r and df (N - 2), or do it on the computer
- Can have nondirectional or directional hypotheses
Nondirectional: Ha: ρ =/= 0; H0: ρ = 0
Directional: Ha: ρ > 0 or Ha: ρ < 0; H0: ρ ≤ 0 (or ρ ≥ 0)

Bivariate linear regression
I. Purpose: to predict values of one variable from values of another variable
A. Prediction is based on correlation: we can use one variable to predict another if the two variables are correlated
II. Regression line: the "line of best fit," drawn to minimize the distance between the line and all data points
A. Equation for the line: Y = bX + a
Y: the variable being predicted, the "criterion" variable (DV)
X: the variable used to predict, the "predictor" variable (IV)
b: the slope of the line, the "regression coefficient" (how much Y changes for a one-unit change in X)
a: the Y-intercept, the "constant"
We won't have to calculate a and b; they will be given. Instead, find Y.
B. So, we can put a value of X into the equation to find the value of Y
b = 0.4, a = 2, X = 10
Y = 0.4X + 2 = 0.4(10) + 2 = 6
III. Error
A. The stronger the correlation, the better the prediction
B. Standard error of the estimate (SEE): a measure of how much error is in our prediction (i.e., how accurate the prediction is)
r and SEE are inversely related
IV. Testing for statistical significance: does the regression equation (or "model") predict the criterion variable better than chance?
A. Test by:
1. Seeing whether the slope is equal to zero, with a t-test
Ha: β =/= 0; H0: β = 0
2. Comparing the amount of variance predicted in the criterion variable (by the predictor variable) to the amount unpredicted, with an F-test
F = variance predicted / variance not predicted = MS regression / MS residual ("error")
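A sketch of bivariate regression in code. The data are made up so that the fitted line roughly matches the notes' example (b = 0.4, a = 2); scipy.stats.linregress is one standard way to get the slope, intercept, and the significance test of the slope.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up predictor (X) and criterion (Y) data (illustration only)
x = rng.normal(50, 10, size=40)
y = 0.4 * x + 2 + rng.normal(0, 3, size=40)   # roughly Y = 0.4X + 2, plus error

result = stats.linregress(x, y)               # fits the line of best fit
b, a = result.slope, result.intercept

# Predict Y for a new X value, as in the notes' example (X = 10)
y_hat = b * 10 + a
print(f"b = {b:.2f}, a = {a:.2f}, predicted Y at X = 10: {y_hat:.2f}")

# result.pvalue tests whether the slope differs from zero (H0: slope = 0)
print(f"p-value for the slope: {result.pvalue:.4g}")
```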

V. Other issues with regression
A. Regression assumes all variables are measured at the interval/ratio level
B. Dummy coding can be used in regression for predictor variables that are nominal, but the criterion variable must always be at the interval/ratio level

12/3 Factor analysis (FA)
I. Factors in FA refer to the common underlying dimensions of a set of variables
II. Example study: what are the essential characteristics of good football players?
III. Aim and logic: to determine if a large number of variables can be combined into fewer, more basic, underlying variables called "factors," "dimensions," or "components"
Highly correlated variables probably measure the same thing
IV. Steps in FA

1. Compute correlations
Correlation matrix: shows the correlation of every variable with every other variable
2. Extract factors and compute the factor matrix
Factor matrix: shows the correlation of every variable with every underlying factor
Factor loading: the correlation of each variable with each factor
3. Interpret factors
A. Look for high-magnitude loadings in each column to see which variables are correlated with which factors, in order to name the factors
Look at the magnitude of the loadings (i.e., ignore the signs)
Factor 1 = "speed," Factor 2 = "size," Factor 3 = "intelligence"
B. Throw out variables with a primary loading < .65 and those with a secondary loading > .45
C. The total "cumulative %" of variance explained tells how well all the factors describe the original variables, and the "% of variance" tells how important each factor is
V. Most common uses of FA:
1. Understand the important dimensions of complex concepts
2. Build measurement scales (find which set of variables should be used to measure each dimension of the concept)

12/5 Factor analysis
Aim: to determine if a large number of variables can be combined into fewer, more basic, underlying variables called factors, dimensions, or components
Steps in FA:
1. Compute correlations
2. Extract factors and compute the factor matrix
3. Interpret factors
A. Look to see which variables load on which factors
B. Look to see if any variables do not load cleanly on one factor
C. Look at how much variance is explained by the factors
IV. Factor scores
1. After building a measurement scale from the FA results, compute a factor score by adding the raw scores on the variables that load on each factor (e.g., add up the scores on the variables that load on the "speed" factor)
2. Factor scores can then be used as variables in subsequent analyses
Example: do male/female or older/younger job recruiters differ in what they think are the most important communication factors in job interviews?
DVs = factor scores; IVs = recruiters' gender and age

Advanced statistics
1. Cronbach's alpha: a mathematical measure of scale reliability (a minimal computation sketch appears after this list)
- Based on the correlations between the variables that make up the scale
- Tells whether people answer similar questions in a consistent manner

2. ANCOVA: same as ANOVA, but controls for the effects of unwanted variables
- Based on F-tests and the logic of partial correlation
- Example: are 3 different message sources equally effective at persuading people to expand the local dump, after controlling for people's attitudes about the environment?
3. Causal modeling: used to test hypotheses about causal relationships between variables
- Based on the logic of correlation, regression, and factor analysis...
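Cronbach's alpha (item 1 in the list above) has a simple closed form: α = (k / (k − 1)) · (1 − Σ item variances / variance of the total score). A minimal sketch with made-up questionnaire responses; the function name is just for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores on one scale."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Made-up responses: 6 people answering 3 similar questions (illustration only)
responses = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```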

