Title | Stats Notes - Statistics in the Modern World with professor Ruth Mihalyi |
---|---|
Course | Statistics In The Modern World |
Institution | University of Pittsburgh |
STAT 0800: Statistics in the Modern World
Chapter 1: Benefits and Risks of Statistics [Mon. 8/27]
Course Divided into 4 Parts: 1. Finding data in life (scrutinizing origin of data) 2. Finding life in data (summarizing…) 3. Understanding uncertainty in life (probability theory) 4. Making judgements from surveys and experiments (statistical inference)
Definitions: (ways to gather data)
Variable – a characteristic that varies from one individual to another o Ex. Height, hair color, etc.
Statistics – the science of principles and procedures for gaining and processing data o Summaries of data
Anecdotal Evidence – personal accounts, usually by a few individuals selected haphazardly or by convenience
Observational Study – Researchers observe what happens naturally in terms of variables of interest
Experiment – Researchers take control of values of one variable to see how it affects values of another variable
Factors – “confounding variables”
Placebo Effect – when subjects respond to the idea of treatment, not the treatment itself o Subject shouldn’t know what treatment they’re getting o Control Group – the group receiving a placebo or no treatment
Blind Subject – unaware of which treatment he/she is receiving
Experimenter Effect – biased assessment of (or attempt to influence) response, due to knowledge of treatment assignment o Blind Experimenter – unaware of which treatment a subject has received o Double-Blind Study – when both experimenter and subject are unaware of which treatments the subjects are receiving
Important to have a random sample to avoid unconscious or conscious biases
Be careful with the word “causes” – a causal claim has to be studied for years before you can be positive (“smoking causes cancer”, not “sugar causes hyperactivity”)
Chapter 2: Reading the News [Wed. 8/29]
Definitions:
Data – pieces of information about variables that have number or category values o Quantitative – numerical values for data if you can calculate a meaningful average o Categorical – data that belongs in categories
Survey – particular type of observational study in which data values tend to be self-reported, as in a questionnaire or opinion poll o Opinion polls overwhelmingly represent negative opinions – people who are angry are more likely to participate in these surveys
Observational studies…
7 Critical Components:
1. Source of research and funding
2. Researchers who had contact with participants
3. Individuals studied, how they were selected
4. Variables studied [measurements, questions]
a. How much is a “good size portion of oatmeal”?
5. Setting (time, place)
a. If you conduct a study/survey at a certain place, you may automatically eliminate a portion of the population
6. Confounding variables [differences besides factor of interest] if causal relationship is claimed
a. Getting in a fight with Mom before the SAT and then not caring about the test – SAT testers don’t know that and then make assumptions about your score based on facts they have (parents’ IQs, etc.)
7. Extent or size of claimed effects/differences
Most articles/reports mention several of the components adequately…
Chapter 3: Measurements, Mistakes, Misunderstandings [Fri. 8/31]
Definitions:
Categorical Variable – one whose values are qualitative (like gender)
Quantitative (Measurement) Variable – one that takes number values with arithmetical meaning (like height) o Discrete Quantitative Variable – one with distinct possible values like the counting numbers
Ex. If you ask how many pets you had, you wouldn’t say 2.5, you’d say 2 or a “whole” number
Ex. Numbers on a number line
o Continuous Quantitative Variable – one whose possible values fall over a continuous range
Ex. Values that can take on fractions [fractional numbers between whole numbers on number line]
Valid Measure – measures what it’s supposed to; “on target”; accurate o If an experiment is done wrong, the results are always invalid
Reliable Measure – gives consistent results
Biased Measurement – systematically underestimates or overestimates (can prevent it from being valid) o “Off the mark”; value that is not accurate o Ex. Having a scale that is weighted so you appear 5 lbs. under or over o Ex. Hanging a height chart a ½ in. above the floor – all results are biased (systematically underestimates height)
Variability of Measurements – may result from measurement error and be associated with unreliability o Natural variability – how individuals are inherently different from one another
Ex. Blood pressure rising at the doctor’s office
o Bias is to be avoided (caused variation)
Pitfalls/Issues in Survey Questions:
1. Deliberate Bias
a. Common in News reporting [Ex. Elections]
2. Unintentional Bias
3. Desire to Please
a. Purposely asking questions so as not to offend, or to make interviewee feel better about themselves
4. Asking the Uninformed
5. Unnecessary Complexity
a. Asking multiple things in one question, so answer isn’t sufficient for all inquiries
6. Ordering of Questions
7. Confidentiality/Anonymity
a. Confidentiality is easier with online surveys
b. Anonymity – people saying whatever they want online because they can hide behind a screen
i. Ex. Teenagers can write swear words in online surveys
8. Open vs. Closed Questions
a. Open – unlimited range of responses allowed
b. Closed – better for surveys; respondents choose from the set of possible values that you want
Chapter 4: How to Get a Good Sample [Wed. 9/5]
What does it mean to have something be “random”?
Ex. Pick 3 states out of a list of 3 columns o If brain is truly random, 1/3 would choose all 3 from one column o Most chose one from each column and not the top 3 because brain doesn’t think that’s random enough – natural
Ways to Gather Data:
Sample Surveys (this chapter)
Observational Studies (covered next)
Experiments (covered next)
Meta Analysis (covered in Ch. 25)
Case Studies, census (not covered in depth because our statistical methods won’t apply) o Censuses make up data for those they can’t account for
Ex. Indigenous people in Appalachia
Definitions:
Unit – single individual or subject studied o The thing that is being studied – individual, tree, animal, etc.
Population – entire collection of units about which we’d like information o Entire set of people that we’re interested in
Sample – collection of units actually studied o Studies done to learn about population, but can only be practically done by sampling
Ex. Can’t sample the entire US population, take a sample for a study
o Sampling Frame – list of units from which sample was chosen (should match population)
Ex. Surveying everyone in Allegheny County about sales tax that was imposed to build new sports stadiums, but is no longer needed – everyone has the possibility of being included in the survey
Census – survey that includes entire population
Margin of Error – approximates how close our estimate (from the sample) is to the true value (for the entire population) o Range of how off our estimate is o Ex. 50% of people are going to vote for Hillary Clinton with a Margin of Error of 3%:
47% - 53%
one way = win, another = loss
o Because it’s almost never possible to survey the entire population, we typically use info from a sample to estimate what’s true for the entire population o Less than 5% of the time, our estimate differs from the true value by more than a Margin of Error
95% of the time we’re sure or “confident” the population value is within the Margin of Error
If we take a random sample of size n of categorical values, we are 95% sure that the true proportion is within about 1 divided by the square root of n (1/√n) of the sample proportion
o 1/√n = approximate Margin of Error, expressed as a proportion
Thus, 1/√n is a common margin of error for surveys
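The 95% claim behind the 1/√n rule can be checked with a small simulation. This is only a sketch: the true proportion (0.5), the sample size (400), the number of trials, and the function names are all assumptions chosen for illustration, not values from the course.

```python
import math
import random

def conservative_margin(n: int) -> float:
    """Approximate 95% margin of error for a sample proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def coverage(true_p: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated samples whose proportion is within 1/sqrt(n) of true_p."""
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    margin = conservative_margin(n)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of n yes/no answers and count the "yes" answers.
        successes = sum(rng.random() < true_p for _ in range(n))
        if abs(successes / n - true_p) <= margin:
            hits += 1
    return hits / trials

# Assumed scenario: half the population says "yes", 400 people per survey.
rate = coverage(true_p=0.5, n=400, trials=2000)
print(f"margin = {conservative_margin(400):.3f}, share of surveys within margin = {rate:.3f}")
```

The printed share should come out close to 95%. For true proportions far from 0.5 the rule is conservative, so the share is typically higher.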
Example: Alcohol Data Shows Little Change:
Type of Study: Sample Survey
Units: Individual students
Population: All UW-Madison students
Sample: 400 Students surveyed
Sampling Frame: All UW-Madison students? o We hope they were all given the chance to participate, but we don’t know where they were surveyed, etc.
Approx. Margin of Error: 1/√n = 1/√400 = 1/20 = 0.05 (5%) o 66% of respondents said they had engaged in binge drinking
Range within the Margin of Error: 61% - 71%
Why not statistically significant? Not really as sure as you’d like to be if kids are drinking more or less now – 66% is within a Margin of Error (5%) of 62%
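The arithmetic in this example can be reproduced in a few lines. The sketch below uses the figures from the notes (400 students, 66%, and the 62% comparison value); the function and variable names are made up for illustration.

```python
import math

def margin_of_error(n: int) -> float:
    """Approximate 95% margin of error for a proportion from a sample of size n."""
    return 1 / math.sqrt(n)

def interval(p_hat: float, n: int) -> tuple[float, float]:
    """Range of population values within one margin of error of the sample value."""
    m = margin_of_error(n)
    return (p_hat - m, p_hat + m)

n = 400            # students surveyed
p_hat = 0.66       # share reporting binge drinking
comparison = 0.62  # the other figure the notes compare against

low, high = interval(p_hat, n)
within_margin = low <= comparison <= high

print(f"margin of error: {margin_of_error(n):.2f}")
print(f"interval: {low:.2f} to {high:.2f}")
print(f"62% inside the interval: {within_margin}")
```

Because 62% falls inside the 61% to 71% range, the survey cannot distinguish the two figures, which is why the change is not called statistically significant.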
Sampling Methods:
Systematic Sampling Plan – uses methodical but non-random approach, like picking individuals at regularly spaced intervals on a list o Every nth object is selected at regular timed intervals o Ex. Assembly line process
Probability Sampling Plan – makes planned use of chance/randomness in selections
Simple Random Sample (SRS) – (simplest probability sampling plan) selections made at random without replacement, like picking names from a hat o Units are ordered or numbered first o Sample is randomly selected using a random number table or a random number generator
Stratified Random Sample – takes separate random samples from groups of similar individuals (strata) within the population o Taking a representative sample from people that are similar – chosen randomly
Ex. Randomly sampling Pitt students (1st, 2nd, 3rd… year students)
Cluster Sample – selects small groups (clusters) at random from within the population (all units in each cluster included) o Sampling entire set of units in the group
Ex. Counties in Pennsylvania
Multistage Sample – stratifies in stages, randomly sampling from groups that are successively more specific
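The sampling plans above can be sketched in code to make the differences concrete. Everything in this example is hypothetical: the 100 numbered students, the four class-year strata, the ten dorm-floor clusters, and the function names are assumptions for illustration only.

```python
import random

def simple_random_sample(units, k, rng):
    """SRS: k units drawn at random without replacement (names from a hat)."""
    return rng.sample(units, k)

def systematic_sample(units, step, start=0):
    """Systematic plan: every `step`-th unit on the list, beginning at `start`."""
    return units[start::step]

def stratified_sample(strata, k_per_stratum, rng):
    """Stratified: a separate simple random sample from each stratum."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster: choose whole clusters at random and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(42)         # fixed seed so reruns match
students = list(range(1, 101))  # hypothetical student ID numbers 1..100

# Hypothetical strata: 25 students in each class year.
strata = {f"year {y}": students[(y - 1) * 25 : y * 25] for y in range(1, 5)}
# Hypothetical clusters: 10 dorm floors of 10 students each.
clusters = {f"floor {f}": students[(f - 1) * 10 : f * 10] for f in range(1, 11)}

srs = simple_random_sample(students, 10, rng)
systematic = systematic_sample(students, step=10)
stratified = stratified_sample(strata, 3, rng)
clustered = cluster_sample(clusters, 2, rng)

print("SRS:", sorted(srs))
print("Systematic:", systematic)
print("Stratified:", stratified)
print("Cluster:", clustered)
```

Note that only the systematic plan involves no chance here: with a fixed starting point it returns the same units every time, which is what makes it non-random.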
Chapter 5: Experiments [Fri. 9/7]
Roles of Variables
Most statistical studies of relationships attempt to establish evidence of causation – changes in values of one variable actually cause changes in values of the other o Ex. Smell of coffee helps people on exams o More variables you can take into account, the more likely you will be able to show a relationship between them/one is causing the other
Definitions:
Explanatory Variable – the variable that is thought to explain or cause changes in the other variable in a relationship o Diet drug o First o Independent variable in psych.
Response Variable – the variable that is thought to be impacted by another variable in a relationship o Someone’s weight o Second o Dependent
Confounding Variable – one that clouds the issue of causation because its values are tied in with those of the so-called explanatory variable, and also play a role in the so-called response variable’s values o Influencing and causing something in response variable
o Ex. Does smoking cause cancer? Such a strong connection that they claim it causes it, despite the other factors (Confounding Variables) at play, such as lifestyle, genetics, environment, how much you smoke, ethnicity, how long you’ve smoked, etc. o A.K.A. “Lurking Variable” o More common and problematic, especially in observational studies
Ex. Didn’t know as much about genetics when first studying smoking’s effects
Interacting Variable – one whose presence or absence enables or disables explanatory variable’s impact on response (like a trigger) or one that influences degree of causation
Treatments – values of explanatory variable imposed by researchers in an experiment o Have to be present in an experiment – no treatment, no experiment o Ex. Levels of alcohol in a drunk-driving experiment o Levels of a Treatment: Drunk-Driving Experiment
Treatment: Alcohol
Group 1 (Control) – No alcohol
Group 2 – 2 drinks
Group 3 – 4 drinks
Group 4 – 6 drinks o “Groups” = levels – variations that you have in a treatment are levels
Control Group – individuals for which the imposed explanatory value is at baseline or a neutral value, for comparison purposes o A subject can be their own control
Ex. Before and after studies – taking blood pressure during and after finals
The Placebo Effect – when subjects respond to the idea of treatment, not the treatment itself o Placebo – a “dummy” treatment
A blind subject is unaware of which treatment he/she is receiving
The Experimenter Effect – biased assessment of (or attempt to influence) response due to knowledge of treatment assignment
o Subconscious altering of effects of experiment because they know who is getting what treatment o A blind experimenter is unaware of which treatment a subject has received
*** Random assignment to treatments (or to treatment vs. control) is the key to preventing confounding variables from entering into the relationship between explanatory and response ***
Still going to have confounding variables, but this minimizes it – why studies are repeated
It’s okay if subjects are volunteers, not randomly chosen – not okay to not have random assignment o Can volunteer to be in study, not which group they participate in
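Random assignment can be sketched as shuffling the volunteers and dealing them out to the treatment levels. The example below reuses the four alcohol levels from the drunk-driving example; the 20 volunteer labels, the seed, and the function name are hypothetical.

```python
import random

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

treatments = ["no alcohol (control)", "2 drinks", "4 drinks", "6 drinks"]
# Hypothetical volunteers: they chose to join the study, but chance picks their group.
volunteers = [f"volunteer_{i}" for i in range(1, 21)]

groups = randomly_assign(volunteers, treatments, seed=1)
for treatment, members in groups.items():
    print(treatment, "->", members)
```

Since chance alone decides who lands in which group, traits such as age or driving experience tend to balance out across groups instead of lining up with one treatment level.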
Example: How to Avoid Confounding
Experiments give more control on who gets what – manipulating variables as they exist
Common Pitfalls in Experiments
Confounding Variables (rare)
Interacting Variables (rarer)
Placebo Effect
Hawthorne Effect – people’s performance can improve simply due to their awareness that they are being observed
Lack of Realism (lack of ecological validity) o Ex. Teachers in an active-shooter drill, but knew it was all fake so just sat in classrooms doing paperwork
Modifications to Complete Randomization
Block Design – first divide subjects into groups of individuals that are similar with respect to an important variable, then randomly assign to treatments within each group o Ex. Blocking into male and female groups before random assignment
Paired Design – randomly assign 2 treatments (or treatment vs. control) within each pair of similar subjects o Before and after example in previous slide – person is their own control o Ex. Husband and wife studies, twins, etc.
Numbers dependent on each other
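Block and paired designs both restrict where the randomization happens. A minimal sketch, assuming made-up subjects, a two-level treatment ("drug" vs. "placebo"), and hypothetical function names:

```python
import random

def block_assign(subjects, block_of, treatments, rng):
    """Block design: group subjects by block, then randomize within each block."""
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        for i, subject in enumerate(members):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

def paired_assign(pairs, treatments, rng):
    """Paired design: within each pair, chance decides who gets which treatment."""
    assignment = {}
    for first, second in pairs:
        t_first, t_second = rng.sample(treatments, 2)
        assignment[first], assignment[second] = t_first, t_second
    return assignment

rng = random.Random(7)  # fixed seed so reruns match
treatments = ["drug", "placebo"]

# Hypothetical subjects as (name, sex) tuples, blocked on sex.
subjects = [(f"s{i}", "F" if i % 2 else "M") for i in range(1, 9)]
blocked = block_assign(subjects, lambda s: s[1], treatments, rng)

# Hypothetical pairs of similar subjects, such as twins.
pairs = [("twin_1a", "twin_1b"), ("twin_2a", "twin_2b"), ("twin_3a", "twin_3b")]
paired = paired_assign(pairs, treatments, rng)

print(blocked)
print(paired)
```

In the blocked result each sex is split evenly between the two treatments, and in the paired result the two members of a pair always receive different treatments.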
Chapter 6: Observational Studies & Review [Mon. 9/10]
Definitions:
Retrospective Observational Study – researchers record variables’ values backward in time, about the past o Ex. Looking at the suicide rates in middle schoolers in 2010 to compare to today
Prospective Observational Study – researchers record variables’ values forward in time from the present o A person recording themselves and things that are happening o Or researchers observing and recording everything o Ex. Therapy patient keeping a journal
Case-Control Study – individuals (cases) with the investigated response are compared to those without (controls), to identify the explanatory value responsible o Often used with illnesses
Why can’t we always use an experiment?
Not always practical – expenses, behavior, etc.
Not ethical – testing lifespans, etc. o Ex. Testing cigarette smoking to see if it causes cancer
Costs – impossible to be able to pay for it
Unrealistic o Ex. Testing eating junk food and watching tv on a set – inaccurate environment can change behavior
Pros/Cons of Observational Study
Con: Lack of realism – sometimes tied into the Hawthorne Effect o Researchers watching/talking to young drivers while testing their reflexes to braking
Pro: Person is own control – drove simulation trips while talking and also silent o Gives the ability to compare – lets researchers know if the drivers just have slow reflexes without being distracted
Common Problems with Observational Studies
Confounding variables (should always be considered first)
Extending the results inappropriately (sample doesn’t truly represent population of interest) o Ex. Can’t do a study at Pitt on drinking and then apply it to every college in America – we’re a big, urban campus, etc.
Using the past as a source of data (time enters in as confounding variable) o Ex. Suicide rates in middle schoolers are high now, looking back 20 years on stats doesn’t really help with the issue now
Chapter 6: Getting the Big Picture [Wed. 9/12]
7 Guidelines for Systematic Evaluation:
1. Determine if study was sample survey, experiment, observational study, census, or anecdotes
2. Consider 7 Critical Components (details)
3. Check for “Difficulties and Disasters”
a. (sampling p. 69, experiments p. 90, observational studies p. 96)
b. Biggest problem with observational studies is confounding variables
4. Is information complete? If not, find original study?
a. News sources paraphrasing or summarizing studies – click on link or find original study source to get all of information
5. Do results make sense?
a. In coffee study, do people really like others better when they hold the warm beverage?
6. Are alternative explanations possible?
a. What if the people who tested favorably in the study above just really like coffee?
7. Do results affect your attitude/lifestyle?
a. Iced coffee is popular right now – are people going to drink more hot coffee to try to emulate the results of this study?
Chapter 7: Summarizing Data: Measurement Data [Fri. 9/14]
Course Divided into 4 Parts (Review):
1. Finding Data in Life (completed): scrutinizing origin of data
2. Finding Life in Data (now): summarizing data yourself or assessing another’s summary
3. Understanding Uncertainty in Life: probability theory
4. Making Judgments from Surveys…
Definitions: (Review)
Variable – a characteristic that varies from one individual to another
Statistics – 2 definitions: 1. science of principles... 2. measures or qualities about a sample
Parameters – measure or qualities about a population
Center – measure of what is typical in the distribution of a quantitative variable
Measures of Center: o Mean = average = sum of values/number of values
Mean of a sample in statistics = “x bar” (x with line over it)
Affected by outliers
o Median:
Middle for odd number of values – numbers have to be in order
Ex. 2 3 3 3 5 5 7 8 21
Average of middle two for even number of values
Ex. 23 24 24 25 30...
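The two measures of center can be compared on the odd-length example from the notes (2 3 3 3 5 5 7 8 21). Since the even-length list in the notes is cut off, the sketch below instead drops the outlier 21 to get an even number of values; that substitution is an assumption made for illustration.

```python
from statistics import mean, median

values = [2, 3, 3, 3, 5, 5, 7, 8, 21]  # odd count: median is the middle value
without_outlier = values[:-1]          # even count: median averages the middle two

print(f"with 21:    mean = {mean(values):.2f}, median = {median(values)}")
print(f"without 21: mean = {mean(without_outlier):.2f}, median = {median(without_outlier)}")
```

With the outlier, the mean is 57/9 ≈ 6.33 and the median is 5. Without it, the mean drops to 36/8 = 4.5 while the median only moves to 4, which shows the mean being pulled by the outlier far more than the median.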