STA3024 Notes - Summer B 2020 PDF

Title STA3024 Notes - Summer B 2020
Author Nooar Ibrahim
Course Development Psychol
Institution University of Florida
Pages 116

STA 3024

Introduction to Statistics II

Summer B 2020

Instructor: Melanie Buhlmann
email: [email protected]
office hours: TBA, or by appointment

Lectures: Will be pre-recorded and uploaded MTWRF at around 1:30 pm. No live lectures! The course number is 14191 (section 7459).

Teaching Assistants:
Yichen Bai: email: [email protected], office hours: Tuesday 1:00pm-7:00pm
WooJung Bae: email: [email protected], office hours: Thursday 9:00am-3:00pm
Zhuochao Huang: email: [email protected], office hours: TBA

Course Description and Objectives

In this course, students learn how to summarize data, analyze it, and make appropriate decisions based on it. The sequence of courses STA 2023-3024 provides students with a firm foundation in the basics of applied statistical methods. The prerequisite for this course is STA 2023, which covered chapters 1-9 in the textbook (data collection, graphical and numerical summaries, probability and an introduction to statistical inference). Concepts from STA 2023 will be reviewed as needed. The course focuses on the following four topics:
1. Analysis of Variance to compare three or more population means.
2. Simple Linear Regression and Multiple Regression to predict a quantitative response.
3. Analysis of Two-Way Tables to study the relationship between two categorical variables.
4. Nonparametric Statistics that do not require a Normal distribution of the response variable.

Materials:
1. Recommended Textbook: Statistics: The Art and Science of Learning from Data, by Agresti, Franklin and Klingenberg, 4th edition, Prentice Hall.
2. Required Scientific Calculator (around $10 to $15) that has some basic statistical functions like mean and standard deviation. Graphing calculators are not allowed during the exams.

3. A shell of the lecture notes can be purchased at Target Copy (1412 W University Ave across from Library West). These will have the computer output for examples done in class and an outline of the lecture notes that we will then complete in class, so it will be your class notebook. If you prefer, an electronic version is posted in Canvas.

Course Website in CANVAS: https://lss.at.ufl.edu/
This is the portal for UF’s E-learning website. You must log on using your gatorlink username and password, and access the course webpages from there. Important information about the course will be posted here, including this syllabus, announcements, your grades throughout the semester and computer output to supplement the examples done in class. Online quizzes will be administered there.

Online Quizzes - There will be approximately four online quizzes, administered through E-learning. You will have three tries for each quiz (with questions randomly generated) over a period of several days. Each quiz will be worth 10 points, for a total of 40 points. Hopefully these quizzes will help improve your grade in the class, as well as be an important tool in learning the material for the course. Quiz details will be announced in class and on the course website; see the calendar on the last page of the syllabus for dates.

Suggested Homework Problems will appear on the website. They will help you master the material but will not be collected.

Projects - There will be one small data analysis project to be completed during the semester. The project will be worth 60 points. Project details will be given in class and on the course website.

Exams - There will be two exams given during the semester, each worth 100 points. Regular exams are in multiple choice format. Students are required to take the exam in the section they are registered for. All students must bring to the exam: their student ID number, picture ID, a nongraphing calculator, and pencils.
In case of conflict or illness, if a student is unable to take an exam at the scheduled time, they must get in touch with the instructor prior to the exam time for any arrangements to be made for a makeup. Each case will be reviewed individually. Valid and detailed documentation is a prerequisite under such extenuating circumstances. Makeup exams may not be multiple choice. A grade of zero is the minimum penalty for any type of dishonesty on an exam.

Exam 1 - Fr July 24 - online - Ch 10 and 14, Ch 12 - Comparing Groups and Regression
Exam 2 - Fr August 14 - online - Ch 13, Ch 11 and 15 - Regression, Chi Squared and Nonparametrics

Grade Structure
Exam 1 100 points
Exam 2 100 points
Projects 60 points
Quizzes 40 points
TOTAL 300 points

Grading Scale:
A 90% to 100%
A- 87% to 89%
B+ 84% to 86%
B 80% to 83%
B- 77% to 79%
C+ 74% to 76%
C 70% to 73%
C- 64% to 69%
D 60% to 63% (No D+ or D- given)
E 59% and below

UF Grading Policy: https://catalog.ufl.edu/ugrad/1617/regulations/info/grades.aspx

Course Policies:

Email to Instructor - will be answered within one working day in most cases. Please be aware that statistical questions should be answered in person (in class or during office hours) since they often require pictures and formulas that make it very hard to communicate through email.

Attendance - although not required, is very highly recommended. Please be aware that if there are other sections of this course with a different instructor, they may cover the material in a different order. Requirements for class attendance and make-up exams, assignments and other work in this course are consistent with university policies that can be found at https://catalog.ufl.edu/ugrad/1617/regulations/info/attendance.aspx.

Classroom Behavior - during class students should turn off their cellular phones and refrain from eating, drinking, reading newspapers, doing homework, listening to music, excessive talking and all other behaviors that are distracting and disrespectful to the instructor and their fellow students.

Privacy Policy - Student records are confidential. Only information designated “UF directory information” may be released without your written consent. This applies to parents or anyone else who contacts me about your grades.

University’s Honesty Policy: UF students are bound by The Honor Pledge which states, “We, the members of the University of Florida community, pledge to hold ourselves and our peers to the highest standards of honor and integrity by abiding by the Honor Code.” On all work submitted for credit by students at the University of Florida, the following pledge is either required or implied: “On my honor, I have neither given nor received unauthorized aid in doing this assignment.” The Honor Code (http://www.dso.ufl.edu/sccr/process/student-conduct-honor-code/) specifies a number of behaviors that are in violation of this code and the possible sanctions. Furthermore, you are obligated to report any condition that facilitates academic misconduct to appropriate personnel.

Grading - grades will be changed only when an error has been made. Negotiation is not appropriate. Incompletes are only assigned when extraordinary circumstances, arising after the date for dropping the course, prevent the student from completing the course requirements. Having a failing grade in the course is not a valid reason for requesting an Incomplete.

Students with Disabilities - Students with disabilities who experience learning barriers and would like to request academic accommodations should connect with the Disability Resource Center by visiting https://disability.ufl.edu/students/get-started/. It is important for students to share their accommodation letter with their instructor and discuss their access needs as early as possible in the semester.

Instructor / Course Evaluations: Students are expected to provide professional and respectful feedback on the quality of instruction in this course by completing course evaluations online via GatorEvals. Guidance on how to give feedback in a professional and respectful manner is available at gatorevals.aa.ufl.edu/students/ . Students will be notified when the evaluation period opens, and can complete evaluations through the email they receive from GatorEvals, in their Canvas course menu under GatorEvals, or via ufl.bluera.com/ufl/. Summaries of course evaluation results are available to students at gatorevals.aa.ufl.edu/public-results/.

Other University Services:

U Matter, We Care: Your well-being is important to the University of Florida. The U Matter, We Care initiative is committed to creating a culture of care on our campus by encouraging members of our community to look out for one another and to reach out for help if a member of our community is in need. If you or a friend is in distress, please contact [email protected] so that the U Matter, We Care Team can reach out to the student in distress. A nighttime and weekend crisis counselor is available by phone at 352-392-1575. The U Matter, We Care Team can help connect students to the many other helping resources available including, but not limited to, Victim Advocates, Housing staff, and the Counseling and Wellness Center. Please remember that asking for help is a sign of strength. In case of emergency, call 9-1-1.

Sexual Assault Recovery Services (SARS): Student Health Center, 392-1161
University Police Department: 392-1111 (or 9-1-1 for emergencies), http://www.police.ufl.edu

Basic Inference (Ch. 7-10 in textbook)

Variables, Parameters, and Statistics

We are often interested in quantitative variables and categorical variables in a population of interest. These variables can be summarized numerically by population parameters. Since parameters are typically unknown, we collect samples and compute sample statistics, which are estimators of parameters based on a sample taken from the population.


Data Distribution and Population Distribution

Normal Distribution


Sampling Distributions

If we collect many different samples from the population and make a dot plot or a histogram of the resulting sample statistics, one from each sample, we can see the sampling distribution of the sample statistic. The sampling distribution has a mean and a standard deviation; its standard deviation is called the standard error of the estimator. Under certain assumptions, the sampling distribution of the sample statistic will be approximately Normal.
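The idea of repeated sampling can be illustrated with a small simulation. The sketch below is not from the notes: the population (exponential with mean 10), the sample size n = 50, and the number of repeated samples are all hypothetical choices made for illustration. It compares the standard deviation of the simulated sample means with the usual formula σ/√n for the standard error of a sample mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical skewed population (exponential with mean 10); an assumption for illustration.
population = [random.expovariate(1 / 10) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 50               # size of each sample (assumed)
num_samples = 2000   # number of repeated samples (assumed)

# Draw many samples and record the sample mean of each one.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(num_samples)
]

# Center of the sampling distribution, to compare with the population mean.
mean_of_means = statistics.fmean(sample_means)
# Standard error: the standard deviation of the sample means.
empirical_se = statistics.stdev(sample_means)
# Formula for the standard error of a sample mean: sigma / sqrt(n).
theoretical_se = pop_sd / n ** 0.5

print(f"population mean        = {pop_mean:.3f}")
print(f"mean of sample means   = {mean_of_means:.3f}")
print(f"SD of sample means     = {empirical_se:.3f}")
print(f"sigma / sqrt(n)        = {theoretical_se:.3f}")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed, which is the "approximately Normal" behavior described above.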


Statistical Inference

We are often interested in making inferences about parameters based on statistics. We use two approaches: confidence intervals and significance tests.

Confidence Intervals

A confidence interval (CI) is an interval that gives a reasonable estimate of the unknown parameter. The general format is

estimator ± (critical value) × (standard error)

Assumptions for CI’s and Significance Tests


Significance Tests

A significance test contains a null hypothesis, H0 (what we want to disprove), and an alternative hypothesis, Ha (what we want to prove based on the sample).

• Hypotheses

• Test Statistic

• P-value

• Conclusions

NOTES
• p-value: Assuming that H0 is true, the p-value is the probability of observing a test statistic as extreme as or more extreme than the one computed from our sample. If the p-value is small, then data like ours would be unlikely if H0 were true, so we have evidence against H0 and in favor of Ha.
• significance level: α is the cutoff that determines whether we reject H0 or not. Typically, we choose α to be 0.01, 0.05, or 0.10. If p-value ≤ α, reject H0 in favor of Ha. If p-value > α, fail to reject H0: there is not enough evidence to reject H0.
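The decision rule in these notes can be sketched in a few lines of code. The example assumes a test statistic z that follows a standard Normal distribution under H0; the function names `p_value_from_z` and `decide` and the value z = 2.1 are hypothetical, chosen only to illustrate the rule.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative):
    """P-value for a z test statistic, assuming a standard Normal null distribution."""
    std_normal = NormalDist()
    if alternative == "less":        # Ha: parameter < null value
        return std_normal.cdf(z)
    if alternative == "greater":     # Ha: parameter > null value
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":   # Ha: parameter != null value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when p-value <= alpha, otherwise fail to reject."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


z = 2.1  # hypothetical test statistic
p = p_value_from_z(z, "two-sided")
print(f"two-sided p-value = {p:.4f}")
print("alpha = 0.05:", decide(p, alpha=0.05))
print("alpha = 0.01:", decide(p, alpha=0.01))
```

The same p-value can lead to different conclusions at different significance levels, which is why α should be chosen before looking at the data.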


Relationship Between Confidence Intervals and Significance Tests

A confidence interval leads to the same conclusion as a two-tailed significance test at the matching level (for example, a 95% CI and α = 0.05):
• If the confidence interval contains the parameter value under H0, then we should fail to reject H0.
• If the confidence interval does not contain the parameter value under H0, then we should reject H0.
• This is especially helpful in comparing two groups. If the confidence interval for the difference contains 0, then we should conclude that there is no significant difference between the two groups. We should fail to reject H0 that the difference is 0.
• If the confidence interval for the difference does not contain 0, then we should conclude that there is a significant difference between the two groups. We should reject H0 and conclude the difference is not 0.
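As a small sketch of this correspondence, the helper below (the name `two_sided_decision` is hypothetical) checks whether a confidence interval contains the null value. The two intervals used are taken from the practice table in these notes: the text-message comparison, (-14.89, 20.01), and the newegg.com versus Best Buy price comparison, (-62.6, -5.1).

```python
def two_sided_decision(ci, null_value=0.0):
    """Two-sided test decision implied by a confidence interval at the matching level."""
    lower, upper = ci
    if lower > upper:
        raise ValueError("interval must be given as (lower, upper)")
    if lower <= null_value <= upper:
        return "fail to reject H0"
    return "reject H0"


# 95% CIs for a difference of two means, from the practice table in the notes.
text_messages_ci = (-14.89, 20.01)   # reported p-value 0.770
prices_ci = (-62.6, -5.1)            # reported p-value 0.022

print("text messages:", two_sided_decision(text_messages_ci))
print("prices:", two_sided_decision(prices_ci))
```

Both decisions agree with comparing the reported p-values to α = 0.05, as the relationship above says they should.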

Summary of Common Statistical Inference Procedures


Example 1

Do pregnant women who use cocaine have babies with lower birth weight than women who do not use cocaine? Pregnant women were tested for cocaine/crack, and the birth weights of babies (in grams) were recorded and averaged for women who tested positive and those who tested negative separately.

            Negative Test    Positive Test
n           5974             134
mean        3118             2733
s           672              599
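One way to work through Example 1 numerically is sketched below. It treats the two groups as independent samples and uses the unpooled standard error for a difference of two means. Because both samples are large, the sketch uses the standard Normal distribution for the p-value and the critical value; that substitution is an assumption made here to stay within the Python standard library, and a t-based procedure from the lecture would give nearly the same answer.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from Example 1 (birth weights in grams).
n_neg, mean_neg, sd_neg = 5974, 3118, 672   # women who tested negative
n_pos, mean_pos, sd_pos = 134, 2733, 599    # women who tested positive

# Estimated difference in mean birth weight (negative minus positive).
diff = mean_neg - mean_pos

# Unpooled standard error of the difference between two independent sample means.
se = sqrt(sd_neg**2 / n_neg + sd_pos**2 / n_pos)

# Test statistic for H0: mu_neg - mu_pos = 0 against Ha: mu_neg - mu_pos > 0.
z = diff / se
p_value = 1 - NormalDist().cdf(z)   # Normal approximation (assumption, see lead-in)

# 95% confidence interval: estimator +/- critical value * standard error.
z_star = NormalDist().inv_cdf(0.975)
ci = (diff - z_star * se, diff + z_star * se)

print(f"difference = {diff} g, SE = {se:.2f}, z = {z:.2f}")
print(f"one-sided p-value = {p_value:.2g}")
print(f"95% CI for the difference: ({ci[0]:.1f}, {ci[1]:.1f})")
```

The interval lies entirely above 0, which matches the very small p-value: both point to lower average birth weight in the group that tested positive.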


Example 2

Many children are diagnosed each year with asthma. In an effort to educate these children about their condition, an educational video was developed. To test the effectiveness of this video, ten randomly selected children, of elementary school age, who had been recently diagnosed, were chosen to participate in a study. A nurse asked the children a series of questions about asthma, then showed them the video and asked the questions again. The children’s scores were as follows:

Child     1    2    3    4    5    6    7    8    9    10
Before    61   60   52   74   64   75   42   63   53   56
After     67   62   54   83   60   89   44   67   62   57
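Because each child is measured twice, Example 2 is a matched-pairs problem, and the analysis can be done on the differences. The sketch below makes two assumptions that are not spelled out in the notes: differences are taken as after minus before, and the alternative hypothesis is that the mean difference is greater than 0 (scores improve). The critical value 1.833 is the one-sided 5% value from a t table with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Scores from Example 2, listed in order of child 1 through child 10.
before = [61, 60, 52, 74, 64, 75, 42, 63, 53, 56]
after = [67, 62, 54, 83, 60, 89, 44, 67, 62, 57]

# Matched pairs: analyze the per-child differences (after - before, an assumed direction).
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

d_bar = mean(diffs)        # sample mean of the differences
s_d = stdev(diffs)         # sample standard deviation of the differences
se = s_d / sqrt(n)         # standard error of the mean difference

# Test statistic for H0: mu_d = 0 against Ha: mu_d > 0, with df = n - 1.
t_stat = d_bar / se
df = n - 1

# One-sided 5% critical value from a t table with 9 degrees of freedom.
T_CRIT_ONE_SIDED_05_DF9 = 1.833
reject_h0 = t_stat > T_CRIT_ONE_SIDED_05_DF9

print(f"differences = {diffs}")
print(f"mean difference = {d_bar}, s_d = {s_d:.3f}, SE = {se:.3f}")
print(f"t = {t_stat:.3f} with df = {df}")
print("reject H0 at alpha = 0.05" if reject_h0 else "fail to reject H0 at alpha = 0.05")
```

Comparing the test statistic with a table value gives the decision at a fixed α; software would report the exact p-value instead.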


Example 3

The College Alcohol Study at the Harvard School of Public Health interviews samples of students at 119 colleges periodically and asks questions about their drinking habits and behavior. One of the questions asked was whether they had ever engaged in unplanned sexual activities because of drinking alcohol. In 1993, 2440 out of 12708 students surveyed answered yes to this question, while in 2001, 1871 out of 8783 answered yes. Has there been a significant increase?
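Example 3 compares two proportions from independent samples. A minimal sketch of the large-sample z test follows, assuming the alternative hypothesis is that the 2001 proportion is larger than the 1993 proportion and using the pooled proportion in the standard error, as is usual when H0 says the two proportions are equal. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Counts from Example 3.
yes_1993, n_1993 = 2440, 12708
yes_2001, n_2001 = 1871, 8783

# Sample proportions for each year.
p_1993 = yes_1993 / n_1993
p_2001 = yes_2001 / n_2001

# Pooled proportion, used because H0 states the two population proportions are equal.
p_pooled = (yes_1993 + yes_2001) / (n_1993 + n_2001)
se0 = sqrt(p_pooled * (1 - p_pooled) * (1 / n_1993 + 1 / n_2001))

# Test statistic for H0: p_2001 - p_1993 = 0 against Ha: p_2001 - p_1993 > 0.
z = (p_2001 - p_1993) / se0
p_value = 1 - NormalDist().cdf(z)

print(f"p-hat 1993 = {p_1993:.4f}, p-hat 2001 = {p_2001:.4f}")
print(f"pooled p-hat = {p_pooled:.4f}, SE = {se0:.5f}")
print(f"z = {z:.2f}, one-sided p-value = {p_value:.5f}")
```

With samples this large, even a difference of about two percentage points produces a small p-value, so statistical significance here should be read alongside the size of the difference.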


Question / Which case? / Ha / 95% CI / p-value

1. Was the proportion of “Greeks” who voted for the Republican candidate greater than the proportion of non-“Greeks” at UF who voted for him?
   Which case?
   Ha:
   95% CI: (0.202, 0.534)
   p-value: 0.000

2. Compare the average number of text messages sent by male and female Hume residents.
   Which case?
   Ha:
   95% CI: (-14.89, 20.01)
   p-value: 0.770

3. Compare the average price of items at newegg.com and Best Buy.
   Which case?
   Ha:
   95% CI: (-62.6, -5.1)
   p-value: 0.022

4. Are prices more expensive, on average, at the campus convenience store than Publix?
   Which case?
   Ha:
   95% CI: (0.852, 1.616)
   p-value: 0.000

5. Compare the proportion of females and males at UF that report smoking marijuana at least 3 times per month.
   Which case?
   Ha:
   95% CI: (-0.28, 0.02)
   p-value: 0.084

6. Do female UF students own more than 15 pairs of shoes, on average?
   Which case?
   Ha:
   95% CI: (18.69, 26.88)
   p-value: 0.000

7. Are new textbooks less expensive, on average, online than at the UF bookstore?
   Which case?
   Ha:
   95% CI: (-11.98, -4.52)
   p-value: 0.000

8. Compare the average number of hours per week that male and female UF students use the gym/sports facilities on campus.
   Which case?
   Ha:
   95% CI: (-1.134, 2.081)
   p-value: 0.559

9. Do the majority of UF students use caffeine to stay awake while studying for finals? A random sample had 93 out of 223 saying yes.
   Which case?
   Ha:
   95% CI: (0.34, 0.47)
   p-value: 0.999


Analysis of Variance (Ch. 14 in textbook)

2.1 Design of Experiments

• Response variable: What we want to draw conclusions about
• Explanatory (predictor) variables: Do these variables influence the response? Can we use them to predict response?
• Treatments: the conditions “applied” to the experimental units
• Experimental units: people/objects/animals on which the study is conducted
• Replications: # of people per treatment

Types of Experiments

Motivating Example: Which diet is best for losing weight?
• Pick diets you want to compare:
  • Weight Watchers
  • Paleo
  • South Beach
  • Special K
• Find subjects: Assign them randomly to diets.
• Real experiment: Make sure the subjects follow the diets and record results.


In our example above, we have:

• Response variable:

• Explanatory variable:

• Treatments:

• Experimental units:

• Replications:

To analyze data: Compare average weight loss with _____ diets. ANOVA (analysis of variance) compares the means of 3 or more treatments.

What if:

• We added 2 more diets?

• We wanted to compare the 4 diets for males and females?


One-Way ANOVA

One-way ANOVA contains one factor with different levels. We refer to these levels as groups and we let g = # of groups. The predictor is categorical, but the response is quantitative. In particular, we are interested in comparing different group population means.

Example:

ANOVA determines if there is a significant difference in population means by comparing the variability BETWEEN groups to the variability WITHIN groups.

TEST STATISTIC: F = (variability BETWEEN groups) / (variability WITHIN groups)

• We find a significant difference if:

• We find no significant difference if:

• To determine if there is a significant difference, we look up the test statistic on the F table.
  - df1 = degrees of freedom in the numerator
  - df2 = degrees of freedom in the denominator


ANOVA Test:

• Assumptions:

• Hypothesis:

• Test Statistic:

• P-value from F table: (df_num, df_denom)

• Conclusions:


Symbols and Notation:
• g = number of groups
• y_ij = observation j in group i
• n_i = number of observations in group i
• N = n_1 + n_2 + ... + n_g = total number of observations
• ȳ_i = average of observations in group i
• s_i = standard deviation for group i
• s_i² = variance of group i
• ȳ = average of all observations

ANOVA Table - to compute the Test Statistic

Sum of Squares
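Using the symbols defined in the notation list (g, n_i, N, y_ij, ȳ_i, s_i, ȳ), the standard one-way ANOVA sums of squares and test statistic can be written as below. The labels "between" and "within" are a naming assumption; the table in the lecture may label the rows differently (for example "Group" and "Error").

```latex
\begin{aligned}
SS_{\text{between}} &= \sum_{i=1}^{g} n_i \,(\bar{y}_i - \bar{y})^2,
  & df_1 &= g - 1, \\
SS_{\text{within}} &= \sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2
  = \sum_{i=1}^{g} (n_i - 1)\, s_i^2,
  & df_2 &= N - g, \\
F &= \frac{SS_{\text{between}} / (g - 1)}{SS_{\text{within}} / (N - g)}.
\end{aligned}
```

Each mean square is a sum of squares divided by its degrees of freedom, so F is the ratio of the between-groups mean square to the within-groups mean square.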


Example: Compare average weight loss for 3 diets. We have data for weight loss in pounds after 6 months under three diets.

low FAT    low CAL    low CARB
22         24         28
18         21         27
21         26         30
25         27         32

Identify:

• Response variable:

• Experimental units:

• Factor:

• Treatments:

Symbols:

ANOVA Test:

• Assumptions:

• Hypotheses:

• ANOVA Table and Test Statistic:

• p-value:

• Conclusions:
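The arithmetic behind the ANOVA table for the three-diet data can be sketched as follows. This is a minimal illustration using the standard between-groups and within-groups sums of squares, not the lecture's worked solution; the variable names are assumptions, and the critical value 4.26 is the 5% value from an F table with (2, 9) degrees of freedom.

```python
from statistics import fmean

# Weight loss in pounds after 6 months, from the diet example.
groups = {
    "low FAT": [22, 18, 21, 25],
    "low CAL": [24, 21, 26, 27],
    "low CARB": [28, 27, 30, 32],
}

g = len(groups)                                  # number of groups
N = sum(len(y) for y in groups.values())         # total number of observations
all_obs = [obs for y in groups.values() for obs in y]
grand_mean = fmean(all_obs)                      # average of all observations

# Between-groups sum of squares: n_i * (group mean - grand mean)^2, summed over groups.
ss_between = sum(len(y) * (fmean(y) - grand_mean) ** 2 for y in groups.values())
# Within-groups sum of squares: squared deviations from each group's own mean.
ss_within = sum((obs - fmean(y)) ** 2 for y in groups.values() for obs in y)

df1, df2 = g - 1, N - g
ms_between = ss_between / df1
ms_within = ss_within / df2
f_stat = ms_between / ms_within

# 5% critical value from an F table with (2, 9) degrees of freedom.
F_CRIT_05_DF_2_9 = 4.26
reject_h0 = f_stat > F_CRIT_05_DF_2_9

for name, y in groups.items():
    print(f"{name}: mean = {fmean(y):.2f}")
print(f"SS between = {ss_between:.3f} (df = {df1}), SS within = {ss_within:.3f} (df = {df2})")
print(f"F = {f_stat:.3f}")
print("reject H0 at alpha = 0.05" if reject_h0 else "fail to reject H0 at alpha = 0.05")
```

Rejecting H0 here only says that at least two population means differ; it does not say which ones, which is the motivation for multiple comparisons.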


Multiple Comparison of Means

Suppose that we have found that there is a significant difference between at least two groups. How can we determine which groups have a significant difference?

1. Individual C.I. for each mean, µi:

• Interpretation:

• Issue with Multiple Comparisons: Each interval is made at 95% confidence (individual confidence level). But how much confidence do we have in the family of comparisons?
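To get a feel for how family confidence drops as the number of intervals grows, the sketch below computes two rough quantities for k intervals made at the same individual level. Both are illustrations rather than the method used in the notes: the product rule would be exact only if the intervals were independent (intervals built from the same data are not), while the Bonferroni bound 1 - k(1 - individual level) is a guaranteed lower limit regardless of dependence. Function names are hypothetical.

```python
from math import comb


def num_pairwise_comparisons(g):
    """Number of pairwise comparisons among g group means: g choose 2."""
    return comb(g, 2)


def family_confidence_if_independent(individual_level, k):
    """Chance that all k intervals cover, IF the intervals were independent (an assumption)."""
    return individual_level ** k


def bonferroni_lower_bound(individual_level, k):
    """Guaranteed lower bound on family confidence for k intervals (Bonferroni inequality)."""
    return max(0.0, 1 - k * (1 - individual_level))


for g in (3, 4, 6):
    k = num_pairwise_comparisons(g)
    approx = family_confidence_if_independent(0.95, k)
    bound = bonferroni_lower_bound(0.95, k)
    print(f"g = {g}: {k} comparisons, independent approx = {approx:.3f}, lower bound = {bound:.3f}")
```

With only three groups the family confidence is already noticeably below 95%, and it falls quickly as more groups are added.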


2. Tukey’s Honest Significant Difference:
• Family confidence is 95% (a formula is used so that the total confidence is 95%). The formula is complicated, and in practice one would use software rather than computing this by hand.
• Individual comparisons have a higher confidence level (see example on p. 27).
• We obtain pairwise confidence intervals for µi − µj (i.e., the difference in population means for groups i and j). How to interpret?

3. Fisher’s Least Significant Difference:
• Individual confidence is 95%.
• Family confidence is smaller (see example on p. 28), but not quite as ...
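A minimal sketch of Fisher-style pairwise intervals for the three-diet data is given below. It assumes the usual form of the interval, (ȳ_j − ȳ_i) ± t × sqrt(MS_within × (1/n_i + 1/n_j)), with the within-groups mean square from the ANOVA and N − g = 9 degrees of freedom; the value 2.262 is the two-sided 95% critical value from a t table with 9 degrees of freedom. This is an illustration, not the example from the notes referenced above.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean

# Weight loss in pounds after 6 months, from the diet example.
groups = {
    "low FAT": [22, 18, 21, 25],
    "low CAL": [24, 21, 26, 27],
    "low CARB": [28, 27, 30, 32],
}

g = len(groups)
N = sum(len(y) for y in groups.values())

# Within-groups mean square (pooled variance estimate) with N - g degrees of freedom.
ss_within = sum((obs - fmean(y)) ** 2 for y in groups.values() for obs in y)
ms_within = ss_within / (N - g)

# Two-sided 95% critical value from a t table with 9 degrees of freedom.
T_CRIT_025_DF9 = 2.262

# One 95% interval for each pair of groups (individual confidence level).
lsd_intervals = {}
for (name_i, y_i), (name_j, y_j) in combinations(groups.items(), 2):
    diff = fmean(y_j) - fmean(y_i)
    margin = T_CRIT_025_DF9 * sqrt(ms_within * (1 / len(y_i) + 1 / len(y_j)))
    lsd_intervals[(name_j, name_i)] = (diff - margin, diff + margin)

for (name_j, name_i), (lower, upper) in lsd_intervals.items():
    verdict = "no significant difference" if lower <= 0 <= upper else "significant difference"
    print(f"{name_j} - {name_i}: ({lower:.2f}, {upper:.2f}) -> {verdict}")
```

An interval that contains 0 indicates no significant difference for that pair at the individual 95% level; software output for the same data may differ slightly because it uses an unrounded critical value.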

