Gathering Data Part2 notes PDF

Title Gathering Data Part2 notes
Course Essentials of Data Science
Institution Saint Joseph's University
Pages 8

Summary

Dr. Regis...


Description

MAT 325 Essentials of Data Science: Gathering Data: Part 2

Hormone Replacement Therapy
• Example: Data from numerous observational studies have shown that women who were taking hormone replacement therapy after menopause also had a lower-than-average incidence of coronary heart disease (CHD).
• In particular, the data showed that the risk of a heart attack is about 35% to 50% lower for women who take hormones compared with women who do not take them. The risks of taking hormones appeared small compared with the benefits.
• Should women then take hormones after menopause to reduce the risk of CHD?
• Do these studies provide good evidence that women should take hormones after menopause to reduce the risk of CHD? 1. Yes 2. No
• In 1992, several major medical organizations said “Yes.”
• Unfortunately, these studies failed to account for one important lurking variable: the women’s backgrounds and lifestyles.
• Women who choose to take hormones are very different from those who do not:
o richer and better educated
o see doctors more often and do many things to maintain their health
• Hence, fewer heart attacks.
• By 2002, several controlled experiments with women of different ages agreed that hormone replacement does not reduce the risk of heart attacks.
• The National Institutes of Health, after reviewing the evidence, concluded that the previous conclusions based on the observational studies were wrong.
• The sample was actually biased because the women taking hormones were also doing other things to improve their health.

Music Programs
• In a 1981 study conducted at Mission Viejo High School, in California, researchers compared the scholastic performance of music students (those who play a musical instrument) with that of non-music students. The study found that music students had a much higher overall GPA than the non-music students, 3.59 to 2.91. Not only that, 16% of the music students had all A’s compared with only 5% of the non-music students.
• Does this study provide good evidence that expanded music programs will help students do well in school? 1. Yes 2. No (because this could be the outcome for any other club a student joins; the types of students who play a musical instrument probably share traits that make them more successful, not just because they play an instrument)
• Association doesn’t mean causation.
o Think of reading skill and shoe size (among children, both increase with age).

Observation versus experiment
• Observational study: Record data on individuals without attempting to influence the responses.
• Experiment: Deliberately impose a treatment on individuals and record their responses.
o Purpose is to determine whether the treatment causes a response.

Limitations of an observational study
• Observational studies are essential sources of information, but they typically cannot prove cause and effect.
• Why? Because association does not imply causation.
• To understand cause and effect, you need to perform well-designed experiments.

Confounding
• Two variables (explanatory variables or lurking variables) are confounded when their effects on a response variable cannot be distinguished from each other.
• Observational studies usually cannot prove cause and effect because the explanatory variable is confounded with lurking variables.

Binge Drinking
• A common definition of “binge drinking” is 5 or more drinks at one sitting for men, and 4 or more for women. A study found that students who binge have a lower average GPA than those who don’t. Based on this study, can we conclude that binge drinking causes lower GPA? 1. Yes 2. No (if students have a low GPA, it is probably because of other tendencies they have, such as not caring much about school)

Terminology
• Experimental units: individuals in an experiment
• If the experimental units are humans, we call them subjects.
• Factors: explanatory variables (or independent variables) in an experiment
• Treatment: any specific experimental condition applied to the subjects
• If an experiment has several factors, a treatment is a combination of specific values of each factor.

Example (Foster care versus orphanage)
• Do abandoned children placed in foster homes do better than similar children placed in an institution? The Bucharest Early Intervention Project found that the answer is a clear “Yes.” The subjects were 136 young children abandoned at birth and living in orphanages in Bucharest, Romania. Half of the children, chosen at random, were placed in foster homes. The other half remained in the orphanages. The experiment compared these two treatments.
o Subjects: The 136 abandoned children.
o Factor: The type of home.
o Treatments: Foster home or institutional care.
o Response variables (dependent variables): Measures of mental and physical development.

Example (Effects of TV advertising)
• What are the effects of repeated exposure to an advertising message? The answer may depend both on the length of the ad and on how often it is repeated. An experiment used undergraduate students as subjects. All subjects viewed a 40-minute TV program that included ads for a digital camera. Some students saw a 30-second commercial; others, a 90-second version. The same commercial was shown either 1, 3, or 5 times during the program. After viewing, all of the subjects answered questions about their recall of the ad, their attitude toward the camera, and their intention to purchase the camera. What are the subjects, the factors, the treatments, and the response variables in this experiment?
• What are the factor(s) in this experiment?
1. the length of the commercial
2. the number of times the commercial was shown
3. the TV program
4. all of the above
5. (1) and (2) only
• How many treatments are there in this experiment?
1. 2
2. 3
3. 5
4. 6
5. 9

Effects of TV advertising
• Subjects: undergraduate students.
• Factors (explanatory variables):
o Length of the commercial (30 seconds or 90 seconds)
o Repetitions (1, 3, or 5)
• Treatments: All 6 possible combinations of the two factors (2 lengths × 3 repetition counts).

• Response variables:
o Recall of the ad
o Attitude toward the camera
o Intention to purchase the camera

Example 9.4 GMAT Review
• A college regularly offers a review course to prepare candidates for the Graduate Management Admission Test (GMAT), which is required by most graduate business schools. This year, it offers only an online version of the course. The average GMAT score of students in the online course is 10% higher than the longtime average for those who took the classroom review course.
• Can we conclude that the online course is more effective? 1. Yes 2. No
• Is the online course more effective?
o We don’t know.
o Students who enrolled in the online review course might be different from students who took the classroom course in the past (e.g., the online students might be older and employed full time).
o The effect of online versus in-class instruction is confounded with the effect of lurking variables (students’ ages and backgrounds).
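The confounding described in the GMAT example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the function `simulate_gmat`, the score scale, the 50-point bonus, and the 80%/20% sign-up rates are all hypothetical and do not come from the notes. In the model, the course format has no effect at all, yet self-selected groups still differ because a lurking background trait drives both the choice of format and the score.

```python
import random
from statistics import mean

def simulate_gmat(n=10_000, seed=0):
    """Compare self-selected groups with randomly assigned groups.

    Hypothetical model: the course format has NO effect on the score.
    A lurking background trait adds 50 points to the score and also
    makes a student more likely to choose the online course.
    """
    rng = random.Random(seed)
    self_selected = {"online": [], "classroom": []}
    randomized = {"online": [], "classroom": []}
    for _ in range(n):
        lurking = rng.random() < 0.5                  # hypothetical background trait
        score = 500 + (50 if lurking else 0) + rng.gauss(0, 20)
        # Observational setting: students choose the format themselves.
        p_online = 0.8 if lurking else 0.2
        choice = "online" if rng.random() < p_online else "classroom"
        self_selected[choice].append(score)
        # Experiment: a coin flip decides, independent of the lurking trait.
        assigned = "online" if rng.random() < 0.5 else "classroom"
        randomized[assigned].append(score)
    obs_diff = mean(self_selected["online"]) - mean(self_selected["classroom"])
    exp_diff = mean(randomized["online"]) - mean(randomized["classroom"])
    return obs_diff, exp_diff

obs_diff, exp_diff = simulate_gmat()
print(f"self-selected groups: online minus classroom = {obs_diff:.1f}")
print(f"randomized groups:    online minus classroom = {exp_diff:.1f}")
```

Under these assumptions the self-selected comparison shows a sizable gap while the randomized comparison stays close to zero, which is the pattern the notes warn about.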

• Confounding: we can’t distinguish the effect of the treatment from the effects of lurking variables.

An advantage of experiments over observational studies is
1. An experiment can provide evidence of cause and effect. (this one, because it is an advantage only experiments have)
2. An experiment can compare two or more groups. (also found in observational studies)
3. An experiment can include explanatory and response variables. (also found in observational studies)
4. All of the above
5. None of the above

Which of the following does an observational study not incorporate?
1. Random assignment to treatments

How to experiment badly
• Subjects → Treatment → Measure response (i.e., all subjects receive the same treatment)
o In a controlled environment (e.g., a laboratory), this simple design can work well.
o Field experiments and experiments with human subjects are exposed to more variable conditions and deal with more variable subjects.
o The simple design usually yields worthless results because of confounding with lurking variables.

How to fix the GMAT review example
• 1. Do a comparative experiment:
o Some students are taught in the classroom.
o Other (similar) students take the course online.
• The first group is called a control group.
• 2. Ensure that the two groups (i.e., students in the classroom course and students in the online course) are somewhat similar.
o If we allow students to make the selection, students who are older or employed are more likely to sign up for the online course, and this will produce bias in the results.
o Hence, assign students to the groups at random.
• 3. Use enough subjects to reduce chance variation in the results.
o Effects of chance will average out, and there will be little difference between the two groups unless the treatments themselves cause a difference.

3 Principles of experimental design
• Control the effects of lurking variables on the response, most simply by comparing two or more treatments.
• Randomize: use chance to assign subjects to treatments.
• Replicate: use enough subjects in each group to reduce chance variation in the results.

What kind of experiment should be performed?
• Randomized comparative experiment:
o An experiment that uses both comparison of two or more treatments and chance assignment of subjects to treatments.
o Designed to give good evidence that differences in the treatments actually cause the differences in the response.

Completely Randomized Experimental Design
• In a completely randomized experimental design, all the subjects are allocated at random among all the treatments.

Example (Does ginkgo improve memory?)
• The law allows marketers of herbs and other natural substances to make health claims that are not supported by evidence. Brands of ginkgo extract claim to “improve memory and concentration.”
• A randomized comparative experiment found no evidence for such effects. The subjects were 230 healthy people over 60 years old. They were randomly assigned to ginkgo or a placebo pill (a dummy pill that looks and tastes the same). All the subjects took a battery of tests for learning and memory before treatment started and again after six weeks.
• Provide an outline of the design of this experiment.
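One way to carry out the random assignment in a completely randomized design such as the ginkgo study is sketched below. The helper `completely_randomized`, the numeric subject labels, and the seed are hypothetical, and the equal 115/115 split is an assumption: the notes give only the total of 230 subjects, not the group sizes.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Allocate all subjects at random among the treatments, in near-equal groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)                      # chance alone decides the order
    groups = {t: [] for t in treatments}
    # Deal the shuffled subjects out like cards, one treatment at a time.
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subject_ids = list(range(1, 231))              # 230 subjects, labeled 1..230
groups = completely_randomized(subject_ids, ["ginkgo", "placebo"], seed=325)
print({t: len(members) for t, members in groups.items()})
# prints {'ginkgo': 115, 'placebo': 115}
```

After the assignment, both groups would take the learning and memory tests before treatment and again after six weeks, and the changes in the two groups would be compared.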

Statistical Significance
• An observed effect is called statistically significant if it is so large that it would rarely occur by chance.
o Statistically significant differences among groups in a randomized comparative experiment provide good evidence that the treatments actually caused these differences.
• We need to make sure the outcome didn’t occur by chance or accident.

About the placebo effect
• “Placebo effect”: Improvement in health due not to any treatment but only to the patient’s belief that he or she will improve.
o Not understood, but it is believed to have therapeutic results in up to 35% of patients.
o Can sometimes ease the symptoms of a variety of ills, from asthma to pain to high blood pressure and even to heart attacks.
o An opposite, or “negative placebo effect,” has been observed when patients believe their health will get worse.
• Why use a placebo?
o In an earlier example, the subjects were randomly assigned to a placebo pill (control group) or a ginkgo pill.
o If the control group did not take any pills, the effect of ginkgo in the treatment group would be confounded with the placebo effect (i.e., the effect of simply taking the pills).
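Returning to the definition of statistical significance given earlier in this section, the phrase "would rarely occur by chance" can be made concrete with a permutation test. This technique is not named in the notes; it is brought in here only as an illustration. The function `permutation_p_value` and the scores are hypothetical and invented for the example. The idea is to reshuffle the group labels many times and count how often chance alone produces a difference in means at least as large as the observed one.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate how often reshuffled labels give a difference in means
    at least as large as the observed difference."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                    # pretend the labels were arbitrary
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:           # small tolerance for rounding
            at_least_as_large += 1
    return at_least_as_large / n_permutations

# Hypothetical scores, invented for illustration only.
treatment = [101, 102, 103, 104, 105]
control = [1, 2, 3, 4, 5]
p_separated = permutation_p_value(treatment, control)
p_identical = permutation_p_value([5, 5, 5], [5, 5, 5])
print(p_separated)   # a small proportion: such a gap rarely arises by chance
print(p_identical)   # 1.0, since every reshuffle matches a difference of zero
```

A small proportion suggests the observed difference would rarely occur by chance, which is what the notes mean by statistically significant.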

Double-Blind Experiments
• Double-blind experiment: neither the subjects nor the people who interact with them know which treatment each subject is receiving.
• Only the statistician who does the randomization knows.
• Example: In the ginkgo study, the subjects didn’t know whether they were taking ginkgo or a placebo. The investigators also didn’t know who was taking the ginkgo or the placebo pill.
• Why do we want double-blind experiments?
o To avoid unconscious bias by the subject or the person who administers the treatment.
o Example: A doctor may be convinced that a new medical treatment is better than the placebo, and this could unconsciously influence the doctor’s diagnosis.
o If the doctor knows which patient is getting the treatment, the doctor may act differently when administering it and interacting with the patient.

Which of the following principles of good experimentation does an observational study not incorporate?
1. Control or comparison
2. Random assignment to treatments
3. Replication

Completely randomized designs
• In a completely randomized experimental design, individuals are allocated at random among treatments.

Matched Pairs Designs
• Used when comparing two treatments.
• Choose pairs of subjects that are closely matched (e.g., same sex, height, weight, age, and race).
o Twins are really good for this.
o Within each pair, randomly assign who will receive which treatment.
• Can also use a single person and give the two treatments to this person over time in random order.
o In this case, the “matched pair” is just the same person at different points in time.
• Note: Order of treatment can influence the subject’s response, so we randomize the order for each subject.

Exercise (How long did I work?)
• A psychologist wants to know if the difficulty of a task influences our estimate of how long we spend working at it.
• She designs two sets of mazes that subjects can work through on a computer. One set has easy mazes and the other has hard mazes. Subjects work until told to stop (after 6 minutes, but subjects do not know this). They are then asked to estimate how long they worked.
• The psychologist has 30 students available to serve as subjects.
o (a) Outline the design of a completely randomized experiment to learn the effect of difficulty on estimated time.
o (b) Describe the design of a matched pairs experiment using the same 30 subjects.
• Each student does the activity twice, once with the easy mazes and once with the hard mazes.
• Randomly decide (for each student) which set of mazes is used first.
• If you give the easy mazes first, it may help them warm up and do better on the hard mazes.
• If you give the hard mazes first, they may be tired and not do as well on the easy mazes.
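The random ordering in part (b) of the maze exercise could be generated along the following lines. This is a sketch only: the helper `randomize_order`, the student labels, and the seed are hypothetical and not part of the exercise.

```python
import random

def randomize_order(subjects, treatments=("easy", "hard"), seed=None):
    """For each subject, pick at random which of the two treatments comes first."""
    rng = random.Random(seed)
    schedule = {}
    for subject in subjects:
        order = list(treatments)
        rng.shuffle(order)                     # acts as a coin flip between the two orders
        schedule[subject] = tuple(order)
    return schedule

students = [f"student_{i:02d}" for i in range(1, 31)]   # 30 hypothetical labels
schedule = randomize_order(students, seed=325)
for student in students[:3]:
    print(student, "->", schedule[student])
```

Because each student serves as their own matched pair, the analysis would then compare each student's two time estimates rather than two separate groups.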

Block Designs
• A block is a group of individuals that are known before the experiment to be similar in some way.
• In a block design, the random assignment of individuals to treatments is carried out separately within each block.
• Note: The assignment to blocks is not random.
• A matched pairs design is a special kind of block design where each pair forms a block.
• Example: blocking by gender (diagram not included).
• Why use a matched pairs or a block design?
o Purpose of the matching or blocking is to reduce the effect of variation among the subjects....
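A block design with gender as the blocking variable can be sketched as follows. The function `block_randomize`, the subject labels, the block sizes, and the treatment names are all hypothetical. The point it illustrates is the one made above: subjects are placed in blocks by a known characteristic (not at random), and the random assignment to treatments is then done separately inside each block.

```python
import random

def block_randomize(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for subject in subjects:
        # Forming blocks is NOT random: it uses a known characteristic.
        blocks.setdefault(block_of[subject], []).append(subject)
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)                  # randomize only inside the block
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects: 6 women (W1..W6) and 4 men (M1..M4).
block_of = {f"W{i}": "female" for i in range(1, 7)}
block_of.update({f"M{i}": "male" for i in range(1, 5)})
assignment = block_randomize(list(block_of), block_of, ["treatment", "control"], seed=1)
for subject in sorted(assignment):
    print(subject, block_of[subject], assignment[subject])
```

Each block ends up split evenly between the two treatments, so a difference between men and women cannot be mistaken for a treatment effect.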

