Lecture 18 - dasdad PDF

Course	International Trade Policy
Institution	University of Michigan
Pages	7


Lecture 18: Categorical Data Analysis II (Logistic Regression)

Logistic regression is used in a wide variety of applications, including business, finance, economics, genetics, and medicine. We will focus on logistic regression for categorical data, that is, data that can be represented in the form of a contingency table. The response variable Y is assumed to follow a Binomial distribution. The explanatory (predictor) variable x is treated as given and is categorical (e.g., Yes/No). We will see that certain parameters in this model can be interpreted as odds ratios on the log scale, so their estimates are estimated log odds ratios.

The Model (single predictor x)

The logistic regression model for a single predictor variable x is $y_i \sim \text{Binomial}(n_i, p_i)$ and

$$\log\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta x_i, \qquad i = 1, \ldots, N$$

where
• the $y_i$ are independent;
• $\alpha$ and $\beta$ are unknown regression parameters to be estimated by maximum likelihood;
• $x_i$ is a fixed predictor variable.

The MLEs

It can be shown that the derivatives of the log-likelihood $\ell$ with respect to $\alpha$ and $\beta$, set equal to zero, give the likelihood equations

$$\frac{\partial \ell}{\partial \alpha} = \sum_{i=1}^{N} (y_i - n_i p_i) = 0, \qquad \frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{N} x_i (y_i - n_i p_i) = 0,$$

where $p_i = e^{\alpha + \beta x_i} / (1 + e^{\alpha + \beta x_i})$. In general, these equations cannot be solved for $\alpha$ and $\beta$ by hand because they are nonlinear in $\alpha$ and $\beta$. If you have studied numerical methods, you know there are iterative techniques for solving systems of nonlinear equations (e.g., Newton-Raphson). R uses Fisher scoring, a modified version of Newton-Raphson that replaces the observed information with its expected value, and iterates until convergence to the MLEs.

Main point: we have to use R (or some other software) to get the MLEs.

Logistic Regression for the 2x2 Table

Example: Aspirin Use and Myocardial Infarction (continued from Lecture 17)
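Before looking at the table, it may help to see roughly what the software does. Below is a minimal sketch of the Fisher scoring iteration for the grouped logistic model. It is written directly in R and applied to the aspirin counts in the table that follows. Placebo is coded x = 1 and aspirin x = 0, the coding the lecture adopts below. The helper name `fisher_scoring_logit`, the zero starting values, the tolerance, and the iteration cap are illustrative assumptions. This is not R's internal `glm.fit` code.

```r
# Sketch of Fisher scoring for y_i ~ Binomial(n_i, p_i), logit(p_i) = alpha + beta * x_i.
# fisher_scoring_logit() is a hypothetical helper written for illustration only.
fisher_scoring_logit <- function(y, n, x, tol = 1e-10, max_iter = 50) {
  X <- cbind(1, x)        # design matrix: intercept column and predictor
  theta <- c(0, 0)        # assumed starting values for (alpha, beta)
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    eta <- drop(X %*% theta)                     # linear predictor alpha + beta * x_i
    p <- plogis(eta)                             # inverse logit gives p_i
    score <- crossprod(X, y - n * p)             # likelihood equations: X'(y - n p)
    info <- crossprod(X, X * (n * p * (1 - p)))  # expected (Fisher) information X'WX
    step <- drop(solve(info, score))             # scoring update
    theta <- theta + step
    if (max(abs(step)) < tol) {
      converged <- TRUE
      break
    }
  }
  if (!converged) warning("Fisher scoring did not converge")
  names(theta) <- c("alpha", "beta")
  theta
}

# Aspirin data: row 1 = placebo (x = 1), row 2 = aspirin (x = 0)
y <- c(189, 104)       # myocardial infarctions (Yes)
n <- c(11034, 11037)   # row totals
x <- c(1, 0)

fit <- fisher_scoring_logit(y, n, x)
print(fit)
```

With the logit link, the observed and expected information coincide. For this model, Fisher scoring and Newton-Raphson therefore produce the same iterates.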

                Myocardial Infarction
             Yes         No        Total
Placebo      189     10,845       11,034
Aspirin      104     10,933       11,037
Total        293     21,778       22,071

We can fit the logistic regression model

$$\log\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta x_i, \qquad i = 1, 2$$

to the myocardial infarction data. Note that i = 1, 2 corresponds to the two rows of the data table. The assumptions of the model are that

$$y_1 \sim \text{Binomial}(n_1, p_1) \quad \text{independent of} \quad y_2 \sim \text{Binomial}(n_2, p_2),$$

with the corresponding notation in the following table (row probabilities in parentheses):

                    Column (Y)
Row (X)       1           2              Total
1             y_1         n_1 − y_1      n_1
              (p_1)       (1 − p_1)      (1.0)
2             y_2         n_2 − y_2      n_2
              (p_2)       (1 − p_2)      (1.0)
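Here is a minimal sketch of fitting this two-row model with R's `glm()`. It assumes the grouped-binomial form, in which the response is a two-column matrix of successes (MI: Yes) and failures (MI: No). It also uses the coding x = 1 for placebo and x = 0 for aspirin that the lecture sets out in the next paragraph. The variable names are illustrative and are not the lecture's own R code.

```r
# Aspirin / myocardial infarction counts from the table above
# Row 1 = placebo, row 2 = aspirin; x coded 1 = placebo, 0 = aspirin
y <- c(189, 104)        # MI: Yes
n <- c(11034, 11037)    # row totals n_i
x <- c(1, 0)

# Grouped binomial response: cbind(successes, failures)
fit <- glm(cbind(y, n - y) ~ x, family = binomial)
print(summary(fit))

# Compare fitted probabilities p_i with the sample proportions y_i / n_i
print(cbind(fitted = fitted(fit), sample = y / n))
```

The model has one parameter per row, so it is saturated. The fitted probabilities therefore reproduce the sample proportions $y_i / n_i$, and the two printed columns should agree.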

In other words, each row i in the table corresponds to an independent Binomial observation with $y_i \sim \text{Binomial}(n_i, p_i)$. The fixed predictor $x_i$ indicates the treatment level (placebo or aspirin). It is a design variable chosen by the experimenter. We will set $x_1 = 1$ for placebo and $x_2 = 0$ for aspirin. The reason for this choice is explained next.

Interpretation of the Parameter β
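With the coding $x_1 = 1$ (placebo) and $x_2 = 0$ (aspirin), the two rows of the model give the short derivation below. Here $\theta$ denotes the odds ratio of myocardial infarction for placebo relative to aspirin. The numerical line is a worked check rather than software output. It uses the table counts and relies on the fact that a model with one parameter per row reproduces the sample proportions.

$$
\begin{aligned}
\log\frac{p_1}{1 - p_1} &= \alpha + \beta && \text{(placebo, } x_1 = 1) \\
\log\frac{p_2}{1 - p_2} &= \alpha && \text{(aspirin, } x_2 = 0) \\
\beta &= \log\frac{p_1}{1 - p_1} - \log\frac{p_2}{1 - p_2}
       = \log\left(\frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)}\right) = \log\theta \\
\hat\beta &= \log\left(\frac{189 / 10{,}845}{104 / 10{,}933}\right)
           = \log\left(\frac{189 \times 10{,}933}{104 \times 10{,}845}\right)
           \approx \log(1.832) \approx 0.605
\end{aligned}
$$

So $\beta$ is the log odds ratio, $e^{\beta} = \theta$ is the odds ratio itself, and $\alpha$ is the log odds of myocardial infarction in the aspirin group.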

The R Code: ydat...