
Applied Bayesian Statistics

Daniel Eisert

April 13, 2020

Lecture 17: Linear Regression with the Bayesian Approach

Class Business

• Homework IV is due on April 17 at 11:59 P.M.

We can use the same Bayesian formula for linear regression:

\[
p(\beta_0, \beta_1, \sigma^2 \mid y, x) \propto L(\beta_0, \beta_1, \sigma^2 \mid y, x) \times p(\beta_0, \beta_1, \sigma^2),
\]

where the centered regression model is still

\[
y_i = \beta_0 + \beta_1 (x_i - \bar{x}) + \epsilon_i,
\]

with ε_i normally distributed with mean 0 and variance σ². Consequently, y_i follows a normal distribution with

\[
E[y_i] = \beta_0 + \beta_1 (x_i - \bar{x}), \qquad \mathrm{Var}[y_i] = \sigma^2,
\]

so the data distribution can be written as

\[
y_i \mid x_i, \beta_0, \beta_1, \sigma^2 \sim N\big(\beta_0 + \beta_1 (x_i - \bar{x}),\ \sigma^2\big), \qquad i = 1, 2, \ldots, n.
\]

The likelihood function for β0, β1, σ² given the data y_i, x_i is built from these normal densities. The parameter of greatest interest in Bayesian simple linear regression is usually the slope β1, so we first find the joint posterior density of all three regression parameters and then integrate out β0 and σ² to get the marginal posterior density of β1.
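As a concrete illustration, here is a minimal Python sketch that simulates data from the centered model; the "true" parameter values and sample size are hypothetical choices, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, for illustration only.
beta0_true, beta1_true, sigma_true = 2.0, 0.5, 1.0

n = 30
x = rng.uniform(0.0, 10.0, size=n)   # covariate values
xc = x - x.mean()                    # centered covariate, x_i - xbar
y = beta0_true + beta1_true * xc + rng.normal(0.0, sigma_true, size=n)
```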

1 Likelihood of β0, β1, σ²

The likelihood of y_1, y_2, ..., y_n is the product of the individual likelihood contributions p(y_i | x_i, β0, β1, σ²):

\[
p(y \mid \beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} p(y_i \mid x_i, \beta_0, \beta_1, \sigma^2)
\]
\[
\propto \frac{1}{\sigma} \exp\left(-\frac{(y_1 - \beta_0 - \beta_1(x_1 - \bar{x}))^2}{2\sigma^2}\right) \times \cdots \times \frac{1}{\sigma} \exp\left(-\frac{(y_n - \beta_0 - \beta_1(x_n - \bar{x}))^2}{2\sigma^2}\right)
\]
\[
\propto \frac{1}{(\sigma^2)^{n/2}} \exp\left(-\frac{\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1(x_i - \bar{x}))^2}{2\sigma^2}\right)
\]
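To make the algebra tangible, the following sketch (reusing the simulated x and y from above; the function names are my own) evaluates the log-likelihood both as a sum of normal log-densities and via the closed-form kernel, which differ only by the additive constant −(n/2) log(2π).

```python
import numpy as np
from scipy import stats

def log_likelihood(beta0, beta1, sigma2, y, x):
    """Sum of normal log-densities for the centered regression model."""
    xc = x - x.mean()
    mu = beta0 + beta1 * xc
    return np.sum(stats.norm.logpdf(y, loc=mu, scale=np.sqrt(sigma2)))

def log_likelihood_kernel(beta0, beta1, sigma2, y, x):
    """Kernel only: -(n/2) log(sigma2) - sum(residual^2) / (2 sigma2)."""
    xc = x - x.mean()
    resid = y - beta0 - beta1 * xc
    n = len(y)
    return -0.5 * n * np.log(sigma2) - np.sum(resid**2) / (2.0 * sigma2)

# The two agree up to the constant -(n/2) * log(2 * pi), for example:
# log_likelihood(2.0, 0.5, 1.0, y, x) - log_likelihood_kernel(2.0, 0.5, 1.0, y, x)
```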

2 Prior Distribution of β0, β1, σ²

Use a flat prior for β0 and β1, and for σ² the improper prior obtained from an inverse gamma with both parameters going to 0, which is proportional to 1/σ²:

\[
p(\beta_0) \propto 1, \qquad p(\beta_1) \propto 1, \qquad p(\sigma^2) \propto \frac{1}{\sigma^2}.
\]

To find the joint prior distribution, multiply the flat priors (each proportional to a constant over the whole real line) on the two coefficients by the prior on σ²:

\[
p(\beta_0, \beta_1, \sigma^2) \propto p(\beta_0) \times p(\beta_1) \times p(\sigma^2) \propto \frac{1}{\sigma^2}, \qquad -\infty < \beta_0, \beta_1 < \infty, \quad 0 < \sigma^2 < \infty.
\]

This prior can be approximated in OpenBUGS with vague normal (or dflat()) priors on β0 and β1 and a very vague gamma prior on the precision parameter.
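The notes do not show the OpenBUGS code itself; as a rough analogue only, here is a minimal sketch in Python using PyMC, with flat priors on the coefficients and a vague gamma prior on the precision τ = 1/σ² (the data and model name are hypothetical).

```python
import numpy as np
import pymc as pm

# Hypothetical data; in practice x and y are the observed covariate and response.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=30)
y = 2.0 + 0.5 * (x - x.mean()) + rng.normal(0.0, 1.0, size=30)

with pm.Model() as centered_slr:
    beta0 = pm.Flat("beta0")                         # p(beta0) proportional to 1
    beta1 = pm.Flat("beta1")                         # p(beta1) proportional to 1
    tau = pm.Gamma("tau", alpha=0.001, beta=0.001)   # very vague prior on precision
    sigma2 = pm.Deterministic("sigma2", 1.0 / tau)
    mu = beta0 + beta1 * (x - x.mean())
    pm.Normal("y_obs", mu=mu, tau=tau, observed=y)   # normal likelihood
    trace = pm.sample(2000, tune=1000)
```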

3 Posterior Distribution of β0, β1, σ²

Apply Bayes’ rule to derive the joint posterior distribution after observing data. Recall that Bayes’ rule states that the joint posterior distribution of β0, β1, σ² is proportional to the product of the likelihood and the joint prior distribution:

\[
p(\beta_0, \beta_1, \sigma^2 \mid y, x) \propto L(\beta_0, \beta_1, \sigma^2 \mid y, x) \times p(\beta_0, \beta_1, \sigma^2)
\]
\[
\propto \frac{1}{(\sigma^2)^{n/2}} \exp\left(-\frac{\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1(x_i - \bar{x}))^2}{2\sigma^2}\right) \times \frac{1}{\sigma^2}
\]
\[
\propto \frac{1}{(\sigma^2)^{(n+2)/2}} \exp\left(-\frac{\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1(x_i - \bar{x}))^2}{2\sigma^2}\right)
\]
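A direct translation of this kernel into Python (a sketch; the function name is mine) is:

```python
import numpy as np

def log_unnorm_posterior(beta0, beta1, sigma2, y, x):
    """Log of the joint posterior kernel derived above:
    -((n + 2) / 2) log(sigma2) - sum((y_i - b0 - b1 (x_i - xbar))^2) / (2 sigma2)."""
    xc = x - x.mean()
    resid = y - beta0 - beta1 * xc
    n = len(y)
    return -0.5 * (n + 2) * np.log(sigma2) - np.sum(resid**2) / (2.0 * sigma2)
```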

3.1 Marginal Posterior Distributions of β0, β1, σ²

To obtain the marginal posterior distribution of β0, integrate β1 and σ² out of the joint posterior distribution:

\[
p(\beta_0 \mid y) = \int_0^{\infty} \int_{-\infty}^{\infty} p(\beta_0, \beta_1, \sigma^2 \mid y)\, d\beta_1\, d\sigma^2.
\]

It can be shown that the marginal posterior distribution of β0 follows a Student's t-distribution,

\[
\beta_0 \mid y \sim t\left(\hat{\beta}_0,\ \frac{s^2}{n},\ n - 2\right),
\]

that is, a t-distribution with mean β̂0, scale parameter s²/n, and n − 2 degrees of freedom.

Recall that

\[
SS_{Res} = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1(x_i - \bar{x}))^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad s^2 = \frac{SS_{Res}}{n - 2}.
\]
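For instance, a small sketch (the function name is hypothetical) that computes β̂0, β̂1, s², and a 95% equal-tail credible interval for β0 from this t marginal:

```python
import numpy as np
from scipy import stats

def beta0_marginal_summary(y, x, level=0.95):
    """Point estimates, s^2, and an equal-tail credible interval for beta0
    from the t(beta0_hat, s^2 / n, n - 2) marginal posterior."""
    xc = x - x.mean()
    n = len(y)
    beta1_hat = np.sum(xc * y) / np.sum(xc**2)
    beta0_hat = y.mean()                      # centered model: beta0_hat = ybar
    ss_res = np.sum((y - beta0_hat - beta1_hat * xc)**2)
    s2 = ss_res / (n - 2)
    half = stats.t.ppf(1.0 - (1.0 - level) / 2.0, df=n - 2) * np.sqrt(s2 / n)
    return beta0_hat, beta1_hat, s2, (beta0_hat - half, beta0_hat + half)
```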


Similarly, we can integrate to obtain the marginal posterior distribution of β1. Putting the messy calculations aside, we can show that p(β1 | y) follows a Student's t-distribution,

\[
\beta_1 \mid y \sim t\left(\hat{\beta}_1,\ \frac{s^2}{\sum (x_i - \bar{x})^2},\ n - 2\right).
\]

Similarly, the marginal posterior distribution of σ² is inverse gamma,

\[
\sigma^2 \mid y \sim IG\left(\alpha = \frac{n - 2}{2},\ \beta = \frac{SS_{Res}}{2}\right).
\]

The posterior mean of an inverse gamma distribution is β/(α − 1), so

\[
E[\sigma^2 \mid y] = \frac{\beta}{\alpha - 1} = \frac{SS_{Res}/2}{\frac{n - 2}{2} - 1} = \frac{SS_{Res}}{n - 4}.
\]

Based on these posterior distributions, estimation and credible sets for the intercept and slope will be the same as in the frequentist approach, but inference for the variance will NOT be the same. An equal-tail credible set for a parameter θ can also be used to assess

\[
H_0: \theta = 0 \quad \text{versus} \quad H_A: \theta \neq 0,
\]

for example by checking whether 0 falls inside the credible set.
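As a companion to the earlier sketch (again with a hypothetical function name), the 95% equal-tail credible interval for β1 and the posterior mean SS_Res/(n − 4) of σ² can be computed as:

```python
import numpy as np
from scipy import stats

def beta1_sigma2_summary(y, x, level=0.95):
    """Equal-tail credible interval for beta1 and posterior mean of sigma^2,
    using the t and inverse-gamma marginals above."""
    xc = x - x.mean()
    n = len(y)
    beta1_hat = np.sum(xc * y) / np.sum(xc**2)
    beta0_hat = y.mean()
    ss_res = np.sum((y - beta0_hat - beta1_hat * xc)**2)
    s2 = ss_res / (n - 2)
    scale_b1 = np.sqrt(s2 / np.sum(xc**2))
    half = stats.t.ppf(1.0 - (1.0 - level) / 2.0, df=n - 2) * scale_b1
    beta1_ci = (beta1_hat - half, beta1_hat + half)
    sigma2_post_mean = ss_res / (n - 4)   # beta/(alpha-1) = (SSRes/2)/((n-2)/2 - 1)
    return beta1_ci, sigma2_post_mean

# H0: beta1 = 0 is rejected at the 5% level if 0 lies outside beta1_ci.
```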

3.2 Proper Posterior Densities

When using an improper prior, one must verify that the data provide enough information to make the posterior density proper. For simple linear regression with the standard improper joint prior, the requirements on the data are that

• the sample size n is strictly greater than two, and
• not all covariate values are equal.
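A trivial sketch of this check (my own helper, not from the notes):

```python
import numpy as np

def posterior_is_proper(y, x):
    """Data conditions for a proper posterior under p(b0, b1, sigma^2) ∝ 1/sigma^2:
    n > 2 and a non-constant covariate."""
    x = np.asarray(x)
    return len(y) > 2 and np.any(x != x[0])
```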
