Title | Lecture notes, lecture Sep 6 |
---|---|
Author | EMILY LEAH ARSHONSKY |
Course | Research And Data Analysis In Psychology |
Institution | University of California, Berkeley |
9/6 Lecture Notes

Measures continued
● Reliability → repeatable (test/retest) → consistent results?
● Validity → measuring what we want to measure

Measurement error: problems in the way we measured our variable
● Measuring how someone likes Facebook
● Amount of time spent on it (# of hours)
● Active vs. passive use of app (way they use Facebook)
● In comparison to other social media sites
● Watch people's facial expression as they use it (using camera)
● Neurotransmitters released when using it

Reducing measurement error
● Operationalize our variables
● Draw from different sources of data → various measurements, see how much they merge

Observational data
● Ratings made by others
  ○ Good → external, multiple sources, real-world
  ○ Bad → difficult to collect and measure, observers biased, multiple interpretations
● Hawthorne effect: observation changes behavior
  ○ Laboratory environment (to control surrounding features for consistency)
  ○ Awareness of being observed alters behavior
  ○ How representative is this of real life?
● Habituation: people get used to observation
● Jane Goodall's chimpanzee observational technique
  ○ Biased when assigning names to chimps (connotations to names)
    ■ Agenda? Humanizing the animals?
  ○ Human in chimpanzee environment
  ○ Overall good technique

Self data
● What a person says about herself/himself (self-report surveys, responses to interviews)
  ○ Good → Structured and easy, people do have knowledge about themselves, access to private states (values)
  ○ Bad → People are biased (self-enhancement), people might have lack of knowledge about their behavior, people may lie about sensitive issues

Final project → definitely self-data, extra credit for observational data

Recap:
● Who do we sample from?
  ○ Population vs. sample
  ○ Sampling biases
● What do we measure?
  ○ Quality of data (reliability, validity)
  ○ Sources of data (observation, self)

Defining our Data
● Data are information
● Quantitative data: concepts reduced to numbers
  ○ Categorical → values of the variable represent groups (data fall into categories)
  ○ Continuous → values of the variable represent an "infinite" range of possibilities (data on a spectrum)
● Operationalization
  ○ Dependent variable must be CONTINUOUS
  ○ Independent variable can be any type of quantitative variable
● Qualitative data: not reduced to numbers

Identity Spectrum
● Gender/sex = spectrum
● Should measure sample age, # of participants, maybe gender/sex

Likert Scales → way to assess continuous variables
● Aggregation: alone, each question is categorical
● Combine related questions together to create a continuous distribution
● Features of a good Likert scale
  ○ Multiple face-valid questions (items)
  ○ Some reverse-scored items (same variable, but opposite end of spectrum)
  ○ Response scale → odd #, balanced, labeled
    ■ Not at all → all the time
    ■ Strongly disagree → strongly agree

KEY IDEA: don't reinvent the wheel
● Utilize other studies
● http://ipip.ori.org/newIndexofScaleLabels.htm
  ○ Evaluate the items → is this measuring what I want to measure?
  ○ Adapt/edit as necessary → just explain what you did

Data organization
● Vector = one-dimensional set of data of the same type
  ○ Stores data as a variable
  ○ Index [i]
● data.frame = two-dimensional collection of vectors
  ○ Stores related variables (dataset)
  ○ Index [i (row), j (column)]
  ○ Row, column → RC

Example: data[1, 4] RC (1st row, 4th column) → will give you 1

person | hours | way using facebook | happiness |
---|---|---|---|
1 | 2 | A | 1 |
2 | 3 | P | 3 |
3 | 6 | P | 5 |
4 | 1 | A | 6 |
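
The table and its [row, column] indexing can be tried out directly in R. The following is a minimal sketch, not the course's own code: the object name `data` follows the notes, the column name `way` is a shortened stand-in for "way using facebook", and reading A/P as active/passive use is an assumption based on the measures listed earlier.

```r
# Rebuild the small example table as a data.frame.
# Column names are chosen for illustration; `way` abbreviates
# "way using facebook". A = active, P = passive is an assumed coding.
data <- data.frame(
  person    = c(1, 2, 3, 4),
  hours     = c(2, 3, 6, 1),
  way       = factor(c("A", "P", "P", "A")),
  happiness = c(1, 3, 5, 6)
)

# Index with [row, column] ("RC").
data[1, 4]   # 1st row, 4th column: happiness of person 1
data[3, 1]   # 3rd row, 1st column: the person id in row 3

# Leaving one index blank selects a whole row or column.
data[1, ]    # entire first row (one participant)
data[, 2]    # entire second column (hours) as a plain vector
```

Keeping one participant per row and one variable per column means a single [row, column] pair always identifies one measurement for one person.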
data[3, 1] → 3

Navigating Data Sets in R
● Structure
  ○ Each row is a participant
  ○ Each column is a variable
● Navigation
  ○ head(data) → prints the first few rows
  ○ data[1] OR data[,1] → prints first column
  ○ data[1,] → prints first row
  ○ data$variable → prints that variable
    ■ $ = object within another object

Data Cleaning
● Your data will often need some "cleaning" before they can be analyzed
  ○ TutoRial2 → module 3
    ■ Rename a variable
    ■ Remove an obviously wrong entry
    ■ Edit an obviously wrong entry
    ■ Change type of data

Describing our Data → a language to talk about data
● Continuous vs. categorical distributions
● Centrality = "the three M's"
  ○ Mean
  ○ Median
  ○ Mode
● Complexity = "the three S's"
  ○ Shape
  ○ Spread
  ○ Standard Deviation
● Distribution
  ○ Histogram - a visualization for one variable
    ■ Categorical → plot(data$CatVar)
    ■ Continuous → hist(data$ContVar)
  ○ Three things to look for:
    ■ Centrality - how are data clustered?
    ■ Complexity - how are data spread out?
    ■ Clarity - do the data make sense?
  ○ Do not just accept data as truth
    ■ Is this correct or incorrect?

Theoretical distribution:
● What we expect the distribution to look like given true random probability of occurrence
● The shape of the distribution depends on the type of data

Familiar concepts → expected distribution
● Symmetry → usually a normal distribution
● Large percentage of scores at the center of the distribution
● Small percentage of scores expected to be at the extreme ends of the distribution
● Normal distribution → defines the probability of a range of scores
  ○ Hypothetical distribution that occurs when there are multiple explanations for complexity, and when these explanations occur randomly
  ○ An assumption...
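
The descriptive tools listed in these notes (the three M's, spread, and the one-variable plots) can be tried out with a short R sketch. The values below are made up for illustration and do not come from the lecture, and `most_common` is a hypothetical helper name: base R has no built-in function for the statistical mode, and its `mode()` function reports an object's storage type instead.

```r
# Hypothetical example values (not from the lecture): happiness ratings
# and Facebook use style for eight imagined participants.
happiness <- c(1, 3, 5, 6, 3, 4, 3, 5)
way <- factor(c("A", "P", "P", "A", "P", "A", "P", "P"))

# Centrality: the three M's.
mean(happiness)     # 3.75
median(happiness)   # 3.5

# Helper for the statistical mode: the value(s) that occur most often.
# table() counts each value; the result is returned as character labels.
most_common <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}
most_common(happiness)   # "3"

# Spread.
sd(happiness)      # sample standard deviation
range(happiness)   # smallest and largest value

# One-variable plots.
hist(happiness)    # continuous variable: histogram
plot(way)          # categorical variable: plot() on a factor draws a bar chart
```

Note that `plot()` only draws the bar chart when the categorical variable is stored as a factor, which is one reason a "change type of data" cleaning step can be needed before plotting.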