Stats Notes - Statistics in the Modern World with Professor Ruth Mihalyi

Course	Statistics In The Modern World
Institution	University of Pittsburgh

STAT 0800: Statistics in the Modern World

Chapter 1: Benefits and Risks of Statistics [Mon. 8/27]

Course Divided into 4 Parts:
1. Finding data in life (scrutinizing origin of data)
2. Finding life in data (summarizing…)
3. Understanding uncertainty in life (probability theory)
4. Making judgements from surveys and experiments (statistical inference)

Definitions: (ways to gather data)

Variable – a characteristic that varies from one individual to another o Ex. Height, hair color, etc.



Statistics – the science of principles and procedures for gaining and processing data o Summaries of data



Anecdotal Evidence – personal accounts, usually by a few individuals selected haphazardly or by convenience



Observational Study – Researchers observe what happens naturally in terms of variables of interest



Experiment – Researchers take control of values of one variable to see how it affects values of another variable



Factors – “confounding variables”



Placebo Effect – when subjects respond to the idea of treatment, not the treatment itself o Subject shouldn’t know what treatment they’re getting o Control Group – the group receiving a placebo or no treatment



Blind Subject – unaware of which treatment he/she is receiving



Experimenter Effect – biased assessment of (or attempt to influence) response, due to knowledge of treatment assignment o Blind Experimenter – experimenter is unaware of which treatment a subject has received o Double-Blind Study – when both experimenter and subject are unaware of which treatments the subjects are receiving

Important to have a random sample to avoid unconscious or conscious biases

Be careful with the word “causes” – researchers have to study something for years and be positive before claiming it (“smoking causes cancer,” not “sugar causes hyperactivity”)

Chapter 2: Reading the News [Wed. 8/29]

Definitions:

Data – pieces of information about variables that have number or category values o Quantitative – data with numerical values for which you can calculate a meaningful average o Categorical – data that belongs in categories



Survey – particular type of observational study in which data values tend to be self-reported, as in a questionnaire or opinion poll o Opinion polls overwhelmingly represent negative opinions – people who are pissed are more likely to participate in these surveys

Observational studies… 7 Critical Components:
1. Source of research and funding
2. Researchers who had contact with participants
3. Individuals studied, how they were selected
4. Variables studied [measurements, questions]
   a. How much is a “good size portion of oatmeal”?
5. Setting (time, place)
   a. If you conduct a study/survey at a certain place, you may automatically eliminate a portion of the population
6. Confounding variables [differences besides factor of interest] if causal relationship is claimed
   a. Getting in a fight with Mom before the SAT and then not caring about the test – SAT testers don’t know that and then make assumptions about your score based on facts they have (parents’ IQs, etc.)
7. Extent or size of claimed effects/differences

Most articles/reports mention several of the components adequately…

Chapter 3: Measurements, Mistakes, Misunderstandings [Fri. 8/31]

Definitions:

Categorical Variable – one whose values are qualitative (like gender)



Quantitative (Measurement) Variable – one that takes number values with arithmetical meaning (like height) o Discrete Quantitative Variable – one with distinct possible values like the counting numbers 

Ex. If asked how many pets you have had, you wouldn’t say 2.5; you’d say 2 or another whole number



Ex. Numbers on a number line

o Continuous Quantitative Variable – one whose possible values fall over a continuous range 

Ex. Values that can take on fractions [fractional numbers between whole numbers on number line]



Valid Measure – measures what it’s supposed to; “on target”; accurate o If an experiment is done wrong, the results are always invalid



Reliable Measure – gives consistent results



Biased Measurement – systematically underestimates or overestimates (can prevent it from being valid) o “Off the mark”; value that is not accurate o Ex. Having a scale that is weighted so you appear 5 lbs. under or over o Ex. Hanging a height chart a ½ in. above the floor – all results are biased (systematically underestimates height)
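The difference between a biased measure and an unreliable one can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true weight of 150 lbs, the noise levels, and the number of readings are all hypothetical, chosen to mirror the "scale that reads 5 lbs under" example.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is repeatable
TRUE_WEIGHT = 150.0      # hypothetical true weight in lbs

# Biased but reliable scale: reads about 5 lbs under, with very little scatter.
biased = [TRUE_WEIGHT - 5 + rng.gauss(0, 0.2) for _ in range(1000)]

# Unbiased but unreliable scale: right on average, but readings jump around.
unreliable = [TRUE_WEIGHT + rng.gauss(0, 5) for _ in range(1000)]

biased_error = statistics.mean(biased) - TRUE_WEIGHT
unreliable_error = statistics.mean(unreliable) - TRUE_WEIGHT
biased_spread = statistics.stdev(biased)
unreliable_spread = statistics.stdev(unreliable)

print(f"biased scale:     average error {biased_error:+.2f}, spread {biased_spread:.2f}")
print(f"unreliable scale: average error {unreliable_error:+.2f}, spread {unreliable_spread:.2f}")
```

The biased scale gives consistent readings that are all off in the same direction, while the unreliable scale is roughly right on average but inconsistent from one reading to the next.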



Variability of Measurements – may result from measurement error and be associated with unreliability o Natural variability – how individuals are inherently different from one another 

Ex. Blood pressure rising at the doctor’s office

o Bias is to be avoided (caused variation)

Pitfalls/Issues in Survey Questions:
1. Deliberate Bias
   a. Common in news reporting [Ex. Elections]
2. Unintentional Bias
3. Desire to Please
   a. Purposely asking questions so as not to offend, or to make the interviewee feel better about themselves
4. Asking the Uninformed
5. Unnecessary Complexity
   a. Asking multiple things in one question, so one answer isn’t sufficient for all inquiries
6. Ordering of Questions
7. Confidentiality/Anonymity
   a. Confidentiality is easier with online surveys
   b. Anonymity – people say whatever they want online because they can hide behind a screen
      i. Ex. Teenagers can write swear words in online surveys
8. Open vs. Closed Questions
   a. Open – any response is allowed
   b. Closed – better for surveys; you give the full set of possible values that you want

Chapter 4: How to Get a Good Sample [Wed. 9/5]

What does it mean to have something be “random”?

Ex. Pick 3 states out of a list of 3 columns o If the brain were truly random, 1/3 would choose all 3 from one column o Most chose one from each column rather than the top 3, because the brain naturally doesn’t think that’s random enough

Ways to Gather Data: 

Sample Surveys (this chapter)



Observational Studies (covered next)



Experiments (covered next)



Meta Analysis (covered in Ch. 25)



Case Studies, census (not covered in depth because our statistical methods won’t apply) o Censuses make up data for those they can’t account for

Ex. Indigenous people in Appalachia

Definitions:

Unit – single individual or subject studied o The thing that is being studied – individual, tree, animal, etc.



Population – entire collection of units about which we’d like information o Entire set of people that we’re interested in



Sample – collection of units actually studied o Studies done to learn about population, but can only be practically done by sampling 

Ex. Can’t survey the entire US population, so take a sample for a study

o Sampling Frame – list of units from which sample was chosen (should match population) 

Ex. Surveying everyone in Allegheny County about a sales tax that was imposed to build new sports stadiums but is no longer needed – everyone has the possibility of being included in the survey



Census – survey that includes entire population



Margin of Error – approximates how close our estimate (from the sample) is to the true value (for the entire population) o Range of how far off our estimate may be o Ex. 50% of people are going to vote for Hillary Clinton with a Margin of Error of 3%: 

True support is likely between 47% and 53%

One end of the range = win, the other = loss

o Because it’s almost never possible to survey the entire population, we typically use info from a sample to estimate what’s true for the entire population o Less than 5% of the time, our estimate differs from the true value by more than a Margin of Error 

95% of the time we’re sure or “confident” the population value is within the Margin of Error 

If we take a random sample of size n of categorical values, we are 95% sure that the true proportion is within about 1 divided by the square root of n (1/√n) of the sample proportion

o 1/√n = approximate Margin of Error for a proportion 

Thus, 1/√n is a common margin of error for surveys

Example: Alcohol Data Shows Little Change: 

Type of Study: Sample Survey



Units: Individual students



Population: All UW-Madison students



Sample: 400 Students surveyed



Sampling Frame: All UW-Madison students? o We hope they were all given the chance to participate, but we don’t know where they were surveyed, etc.



Approx. Margin of Error: 1/√n = 1/√400 = 1/20 = 0.05 (5%) o 66% of respondents said they had engaged in binge drinking 

Range within Margin of Error: 61% - 71% 

Why not statistically significant? We can’t be as sure as we’d like about whether kids are drinking more or less now – 66% is within a Margin of Error (5%) of 62%
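The arithmetic in this example can be checked in a few lines. This is a sketch only: the variable names are made up, and 62% is treated simply as the comparison figure mentioned in the notes.

```python
import math

n = 400                    # students surveyed
sample_proportion = 0.66   # share who reported binge drinking
comparison = 0.62          # the other figure mentioned in the notes

margin = 1 / math.sqrt(n)  # rough 95% margin of error
low = sample_proportion - margin
high = sample_proportion + margin

# If the comparison value falls inside the range, the change could be sampling noise.
within_margin = low <= comparison <= high

print(f"margin of error: {margin:.0%}")
print(f"range: {low:.0%} to {high:.0%}")
print("62% within the margin of error:", within_margin)
```

Because 62% falls inside the 61% to 71% range, the survey alone cannot distinguish a real change in drinking from ordinary sample-to-sample variation.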

Sampling Methods: 

Systematic Sampling Plan – uses methodical but non-random approach, like picking individuals at regularly spaced intervals on a list o Every nth object is selected at regular timed intervals o Ex. Assembly line process



Probability Sampling Plan – makes planned use of chance/randomness in selections



Simple Random Sample (SRS) – (simplest probability sampling plan) selections made at random without replacement, like picking names from a hat o Units are ordered or numbered first o Sample is randomly selected using a random number table or a random number generator



Stratified Random Sample – takes separate random samples from groups of similar individuals (strata) within the population o Taking a representative sample from people that are similar – chosen randomly


Ex. Randomly sampling Pitt students within each class year (1st, 2nd, 3rd… year students)

Cluster Sample – selects small groups (clusters) at random from within the population (all units in each cluster included) o Sampling entire set of units in the group 



Ex. Counties in Pennsylvania

Multistage Sample – stratifies in stages, randomly sampling from groups that are successively more specific
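The sampling plans above can be compared side by side in code. This is a rough sketch on a made-up sampling frame of 1,000 students; the field names (`year`, `dorm`), the helper function names, and the sample sizes are all assumptions for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 1,000 students, each with a class year
# (a stratum) and a dorm number (a cluster). All values are made up.
frame = [{"id": i, "year": i % 4 + 1, "dorm": i // 50} for i in range(1000)]


def systematic_sample(units, step, start=0):
    """Every `step`-th unit on the list, beginning at index `start`."""
    return units[start::step]


def simple_random_sample(units, k):
    """k units chosen at random without replacement (names from a hat)."""
    return rng.sample(units, k)


def stratified_sample(units, key, k_per_stratum):
    """A separate simple random sample of k units from each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, k_per_stratum))
    return chosen


def cluster_sample(units, key, n_clusters):
    """Pick whole clusters at random and keep every unit inside them."""
    cluster_ids = sorted({unit[key] for unit in units})
    picked = set(rng.sample(cluster_ids, n_clusters))
    return [unit for unit in units if unit[key] in picked]


systematic = systematic_sample(frame, step=25)
srs = simple_random_sample(frame, 40)
stratified = stratified_sample(frame, "year", 10)
clustered = cluster_sample(frame, "dorm", 2)

for name, sample in [("systematic", systematic), ("SRS", srs),
                     ("stratified", stratified), ("cluster", clustered)]:
    print(f"{name:>10}: {len(sample)} units")
```

A multistage sample could be sketched by chaining these helpers, for example picking dorms at random first and then drawing a random sample of students within each chosen dorm.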

Chapter 5: Experiments [Fri. 9/7]

Roles of Variables

Most statistical studies of relationships attempt to establish evidence of causation – changes in values of one variable actually cause changes in values of the other o Ex. Smell of coffee helps people on exams o The more variables you can take into account, the more likely you will be able to show a relationship between them, or that one is causing the other

Definitions: 

Explanatory Variable – the variable that is thought to explain or cause changes in the other variable in a relationship o Ex. Diet drug o Comes first o Called the independent variable in psych.



Response Variable – the variable that is thought to be impacted by another variable in a relationship o Ex. Someone’s weight o Comes second o Called the dependent variable in psych.



Confounding Variable – one that clouds the issue of causation because its values are tied in with those of the so-called explanatory variable, and also plays a role in the so-called response variable’s values o Influences or causes something in the response variable

o Ex. Does smoking cause cancer? The connection is so strong that researchers claim smoking causes it, despite the other factors (Confounding Variables) at play, such as lifestyle, genetics, environment, how much you smoke, ethnicity, how long you’ve smoked, etc. o A.K.A. “Lurking Variable” o More common and problematic, especially in observational studies 

Ex. Didn’t know as much about genetics when first studying smoking’s effects



Interacting Variable – one whose presence or absence enables or disables explanatory variable’s impact on response (like a trigger) or one that influences degree of causation



Treatments – values of explanatory variable imposed by researchers in an experiment o Have to be present in an experiment – no treatment, no experiment o Ex. Levels of alcohol in a drunk-driving experiment o Levels of a Treatment: Drunk-Driving Experiment 

Treatment: Alcohol 

Group 1 (Control) – No alcohol



Group 2 – 2 drinks



Group 3 – 4 drinks



Group 4 – 6 drinks o “Groups” = levels – variations that you have in a treatment are levels



Control Group – individuals for whom the imposed explanatory value is at baseline or a neutral value, for comparison purposes o A subject can be their own control



Ex. Before and after studies – taking blood pressure during and after finals

The Placebo Effect – when subjects respond to the idea of treatment, not the treatment itself o Placebo – a “dummy” treatment



A blind subject is unaware of which treatment he/she is receiving



The Experimenter Effect – biased assessment of (or attempt to influence) response due to knowledge of treatment assignment

o Subconscious altering of effects of experiment because they know who is getting what treatment o A blind experimenter is unaware of which treatment a subject has received

*** Random assignment to treatments (or to treatment vs. control) is the key to preventing confounding variables from entering into the relationship between explanatory and response *** 

Still going to have confounding variables, but this minimizes them – why studies are repeated



It’s okay if subjects are volunteers, not randomly chosen – it’s not okay to skip random assignment o Subjects can volunteer to be in the study, but can’t choose which group they participate in
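Random assignment of volunteers to treatment levels can be sketched as a shuffle followed by dealing subjects out to groups. The function name `randomly_assign`, the volunteer labels, and the group size of five are hypothetical; the four levels come from the drunk-driving example earlier in this chapter.

```python
import random


def randomly_assign(subjects, groups, seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy so the original list is untouched
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for position, subject in enumerate(shuffled):
        assignment[groups[position % len(groups)]].append(subject)
    return assignment


# Volunteers sign up for the study, but chance decides their group.
volunteers = [f"volunteer_{i}" for i in range(1, 21)]
levels = ["no alcohol (control)", "2 drinks", "4 drinks", "6 drinks"]
assignment = randomly_assign(volunteers, levels, seed=1)

for level, members in assignment.items():
    print(f"{level}: {len(members)} subjects")
```

Because chance alone decides who lands in which group, traits that might otherwise act as confounding variables tend to be spread roughly evenly across the groups.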

Example: How to Avoid Confounding 

Experiments give more control over who gets what – manipulating variables as they exist

Common Pitfalls in Experiments 

Confounding Variables (rare)



Interacting Variables (rarer)



Placebo Effect



Hawthorne Effect – people’s performance can improve simply due to their awareness that they are being observed



Lack of Realism (lack of ecological validity) o Ex. Teachers in an active-shooter drill, but knew it was all fake so just sat in classrooms doing paperwork

Modifications to Complete Randomization 

Block Design – first divide subjects into groups of individuals that are similar with respect to an important variable, then randomly assign to treatments within each group o Ex. Blocking into male and female groups before random assignment



Paired Design – randomly assign 2 treatments (or treatment vs. control) within each pair of similar subjects o Before and after example in previous slide – person is their own control o Ex. Husband and wife studies, twins, etc. 

Numbers dependent on each other
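A block design can be sketched by running the random assignment separately inside each block. The subject records, the `block_randomize` helper, and the two treatment labels below are assumptions for illustration, not part of the course material.

```python
import random


def block_randomize(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within the block
        for position, subject in enumerate(members):
            # Deal treatments out in turn so each block is split evenly.
            treatment = treatments[position % len(treatments)]
            assignment[subject["name"]] = treatment
    return assignment


# Hypothetical subjects, blocked by sex before random assignment.
subjects = [{"name": f"S{i}", "sex": "F" if i % 2 else "M"} for i in range(12)]
assignment = block_randomize(
    subjects, lambda s: s["sex"], ["treatment", "control"], seed=7
)

for subject in subjects:
    print(subject["name"], subject["sex"], assignment[subject["name"]])
```

A paired design can be viewed as the special case where every block holds exactly two similar subjects (twins, spouses, or the same person before and after), so one member of each pair gets each treatment.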


Chapter 6: Observational Studies & Review [Mon. 9/10]

Definitions:

Retrospective Observational Study – researchers record variables’ values backward in time, about the past o Ex. Looking at the suicide rates in middle schoolers in 2010 to compare to today



Prospective Observational Study – researchers record variables’ values forward in time from the present o A person recording themselves and things that are happening o Or researchers observing and recording everything o Ex. Therapy patient keeping a journal



Case-Control Study – individuals (cases) with the investigated response are compared to those without (controls), to identify the explanatory value responsible o Often used with illnesses

Why can’t we always use an experiment? 

Not always practical – expenses, behavior, etc.



Not ethical – testing lifespans, etc. o Ex. Testing cigarette smoking to see if it causes cancer



Costs – may be impossible to pay for it



Unrealistic o Ex. Testing eating junk food and watching tv on a set – inaccurate environment can change behavior

Pros/Cons of Observational Study 

Con: Lack of realism – sometimes tied into the Hawthorne Effect o Researchers watching/talking to young drivers while testing their reflexes to braking



Pro: Person is own control – drove simulation trips while talking and also silent o Gives the ability to compare – lets researchers know if the drivers just have slow reflexes without being distracted

Common Problems with Observational Studies 

Confounding variables (should always be considered first)


Extending the results inappropriately (sample doesn’t truly represent population of interest) o Ex. Can’t do a study at Pitt on drinking and then apply it to every college in America – we’re a big, urban campus, etc.



Using the past as a source of data (time enters in as confounding variable) o Ex. Suicide rates in middle schoolers are high now, looking back 20 years on stats doesn’t really help with the issue now

Chapter 6: Getting the Big Picture [Wed. 9/12]

7 Guidelines for Systematic Evaluation:
1. Determine if study was sample survey, experiment, observational study, census, or anecdotes
2. Consider 7 Critical Components (details)
3. Check for “Difficulties and Disasters”
   a. (sampling p. 69, experiments p. 90, observational studies p. 96)
   b. Biggest problem with observational studies is confounding variables
4. Is information complete? If not, find original study?
   a. News sources paraphrase or summarize studies – click on link or find original study source to get all of the information
5. Do results make sense?
   a. In coffee study, do people really like others better when they hold the warm beverage?
6. Are alternative explanations possible?
   a. What if the people who tested favorably in the study above just really like coffee?
7. Do results affect your attitude/lifestyle?
   a. Iced coffee is popular right now – are people going to drink more hot coffee to try to emulate the results of this study?

Chapter 7: Summarizing Data: Measurement Data [Fri. 9/14]

Course Divided into 4 Parts (Review):
1. Finding Data in Life (completed): scrutinizing origin of data
2. Finding Life in Data (now): summarizing data yourself or assessing another’s summary
3. Understanding Uncertainty in Life: probability theory
4. Making Judgments from Surveys…

Definitions: (Review)

Variable – a characteristic that varies from one individual to another



Statistics – 2 definitions: 1. science of principles... 2. measures or qualities about a sample



Parameters – measures or qualities about a population



Center – measure of what is typical in the distribution of a quantitative variable 

Measures of Center: o Mean = average = sum of values/number of values 

Mean of a sample in statistics = “x bar” (x with line over it)



Affected by outliers

o Median: 

Middle for odd number of values – numbers have to be in order 



Ex. 2 3 3 3 5 5 7 8 21 (median = 5, the 5th of the 9 ordered values)

Average of middle two for even number of values 

Ex. 23 24 24 25 30...
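The two measures of center can be computed with Python's standard `statistics` module. This sketch uses the odd-count example from these notes; dropping the value 21 to get an even count is an assumption made here purely for illustration.

```python
import statistics

values = [2, 3, 3, 3, 5, 5, 7, 8, 21]   # odd-count example from the notes
without_outlier = values[:-1]            # drop 21, leaving an even count of 8

mean_all = statistics.mean(values)               # sum of values / number of values
median_all = statistics.median(values)           # middle value of the ordered list

mean_trimmed = statistics.mean(without_outlier)
median_trimmed = statistics.median(without_outlier)  # average of the middle two

print(f"with 21:    mean = {mean_all:.2f}, median = {median_all}")
print(f"without 21: mean = {mean_trimmed:.2f}, median = {median_trimmed}")
```

The single large value 21 pulls the mean up from 4.5 to about 6.33, while the median only moves from 4 to 5, which is what "affected by outliers" means for the mean.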