Lecture Notes, Lecture 2 - GLM 1 PDF

Title	Lecture Notes, Lecture 2 - GLM 1
Course	Statistical Modelling 501
Institution	Curtin University
Pages	12


GENERALIZED LINEAR MODELS (GLMs)

In the theory of general linear models (or classical multiple linear regression), the underlying assumptions are that the error terms are normally distributed, the error variances are constant and independent of the mean, and the effects are additive. In practice, however, these assumptions may not hold. Hence, in a GLM we allow the distribution to be non-Normal. For example, in Risk Theory:

Claim Size, X, also known as the Claim Severity, may follow distributions such as Normal($\mu, \sigma^2$), exp($\lambda$), Gamma($\alpha, \beta$), etc.

Number of Claims, N, also known as the Claim Frequency, may follow distributions such as Binomial($n, p$), Poisson($\lambda$), etc.

Expected values of interest: $E(X)$ and $E(N)$.

Many factors: age, sex, etc. What are factors? How do the factors influence the expected values?



In statistics, the generalised linear model (GLM) is a useful generalisation of ordinary least squares regression.



It relates the random distribution of the measured variable of the experiment (the distribution function) to the systematic (non-random) portion of the experiment (the linear predictor) through a function called the link function.



The subject of generalised linear models was formulated by John Nelder and Robert Wedderburn (1972, “Generalized Linear Models”, Journal of the Royal Statistical Society, Series A (General) 135: 370-384) as a way of unifying various other statistical models under one framework, allowing for one general method of efficiently performing maximum likelihood estimation for these models.

Example

The following study was carried out into how the amount of claims is related to the factor of age. The claim size and the age of the insured in a portfolio of 20 policies were recorded. Suppose that $Y_i$ represents the claim size of the ith insured and $x_i$ represents the logarithm (to the base 10) of the ith insured’s age (i = 1, 2, …, 20).

The response variables $Y_i$ are assumed to be exponentially distributed, exp($\lambda_i$), $\lambda_i > 0$. A possible specification for $E(Y_i) = 1/\lambda_i$ is $E(Y_i) = \exp(\alpha + \beta x_i)$. This will ensure that $E(Y_i)$ is non-negative for all values of $x_i$.
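The specification above can be illustrated with a minimal sketch. The parameter values and ages below are hypothetical, chosen only to show that exp(α + βx) stays positive whatever the sign of the linear predictor; x is taken to be log10(age), as in the fuller statement of this example later in the notes.

```python
import math

def mean_claim(x, alpha, beta):
    """E(Y) = exp(alpha + beta * x), the specification used in the example."""
    return math.exp(alpha + beta * x)

# Hypothetical parameter values, for illustration only.
alpha, beta = 2.0, -1.5

# Hypothetical ages; x is log10(age).
ages = [18, 30, 45, 60, 80]
means = [mean_claim(math.log10(age), alpha, beta) for age in ages]

# For an exponential response, the rate parameter is lambda = 1 / E(Y).
rates = [1.0 / m for m in means]

for age, m, lam in zip(ages, means, rates):
    print(f"age={age:2d}  E(Y)={m:.4f}  lambda={lam:.4f}")
```

Because the exponential function is positive everywhere, both the mean and the implied rate remain valid for any α, β and x.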

Components of Generalised Linear Models

There are 3 components of a generalised linear model (or GLM):

1. Random Component: identify the response variable (Y) and specify/assume a probability distribution for it.
2. Systematic Component: specify what the explanatory or predictor variables are (e.g. $X_1$, $X_2$, etc.). These variables enter in a linear manner: $\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$.
3. Link Function: specify the relationship between the mean or expected value of the random component (i.e. $E(Y)$) and the systematic component. In other words, the link function describes a function of the mean value which can be described linearly by the explanatory variables.

Model components: The GLM consists of three elements.

1. A distribution function f, from the exponential family.
2. A linear predictor $\eta = X\beta$.
3. A link function g such that $E(Y) = \mu = g^{-1}(\eta)$, i.e. $\eta = g(\mu)$, where g is a monotonic and differentiable function.

Note: In the classical linear model:

1) $g(\mu) = \mu$, called the identity link. In this case, $\eta = \mu$;
2) the random component follows a Normal distribution;
3) the systematic component is linear in the covariates.

Exponential family 

Definition: A response is modelled by a linear combination of the predictors related through a link function.



The response variable must be a member of the exponential family of distributions:

$$f(y \mid \theta, \phi) = \exp\left[\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right]$$

The $\theta$ parameter is called the canonical parameter (or the natural parameter) and is the measure of location. $\theta$ is a function of $E(Y) = \mu$ only.

The $\phi$ parameter is called the dispersion parameter (or the scale parameter) and is the measure of scale.



Notice the use of “a”, “b”, and “c”, which are functions.



Exponential family: Normal distribution

The Normal Distribution 

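Start with the standard expression:

$$f(y \mid \theta, \phi) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$$

and re-arrange to:

$$f(y \mid \theta, \phi) = \exp\left[\frac{y\mu - \mu^2/2}{\sigma^2} - \frac{1}{2}\left(\frac{y^2}{\sigma^2} + \log(2\pi\sigma^2)\right)\right]$$

To put this in the canonical form above, re-express as:

o $\theta = \mu$, thus the natural parameter is $\mu$
o $\phi = \sigma^2$, thus the scale parameter is $\sigma^2$
o $a(\phi) = \phi$
o $b(\theta) = \theta^2/2$
o $c(y, \phi) = -\left[\frac{y^2}{\phi} + \log(2\pi\phi)\right]/2$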
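As a numerical sanity check on the Normal decomposition, the sketch below evaluates the density once in its standard form and once through the canonical pieces a, b and c. The function names and test values are illustrative assumptions, not part of the notes.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Standard Normal(mu, sigma2) density."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def normal_canonical(y, mu, sigma2):
    """Same density via exp[(y*theta - b(theta)) / a(phi) + c(y, phi)]."""
    theta, phi = mu, sigma2                       # theta = mu, phi = sigma^2
    a = phi                                       # a(phi) = phi
    b = theta ** 2 / 2                            # b(theta) = theta^2 / 2
    c = -(y ** 2 / phi + math.log(2 * math.pi * phi)) / 2
    return math.exp((y * theta - b) / a + c)

# Hypothetical evaluation point.
print(normal_pdf(1.3, 0.5, 2.0), normal_canonical(1.3, 0.5, 2.0))
```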

Exponential family: Poisson distribution

Poisson Distribution 

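Start with the standard expression:

$$f(y \mid \theta, \phi) = \frac{e^{-\mu}\mu^{y}}{y!}$$

and re-arrange to:

$$f(y \mid \theta, \phi) = \exp\left(y\log\mu - \mu - \log y!\right).$$

To put this in the canonical form above, re-express as:

o $\theta = \log(\mu)$
o $\phi = 1$
o $a(\phi) = 1$
o $b(\theta) = \mu = \exp(\theta)$
o $c(y, \phi) = -\log y!$

so that

$$f(y \mid \theta, \phi) = \exp\left(y\theta - e^{\theta} - \log y!\right).$$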
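The Poisson identification can be checked the same way. This is a small sketch with hypothetical inputs; `math.lgamma(y + 1)` is used for log y!.

```python
import math

def poisson_pmf(y, mu):
    """Standard Poisson(mu) probability function."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def poisson_canonical(y, mu):
    """Same probability via theta = log(mu), a(phi) = 1, b(theta) = exp(theta)."""
    theta = math.log(mu)
    b = math.exp(theta)          # b(theta) = exp(theta) = mu
    c = -math.lgamma(y + 1)      # c(y, phi) = -log(y!)
    return math.exp(y * theta - b + c)

# Hypothetical evaluation point.
print(poisson_pmf(4, 2.5), poisson_canonical(4, 2.5))
```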

Exponential family: Binomial distribution

Binomial Distribution 



Start with the standard expression: $Z \sim \text{binomial}(n, \mu)$.

Since we require $\theta$ to be a function of $\mu$, divide the random variable by n so that the distribution will have a mean of $\mu$. If $Z \sim \text{binomial}(n, \mu)$, then let $Y = Z/n$, so $Z = nY$.

$$f(z \mid \theta, \phi) = \binom{n}{z}\mu^{z}(1-\mu)^{n-z}$$

then

$$f(y \mid \theta, \phi) = \binom{n}{ny}\mu^{ny}(1-\mu)^{n-ny}$$

and re-arrange to:

$$f(y \mid \theta, \phi) = \exp\left[n\left(y\log\mu + (1-y)\log(1-\mu)\right) + \log\binom{n}{ny}\right]$$

$$= \exp\left[n\left(y\log\frac{\mu}{1-\mu} + \log(1-\mu)\right) + \log\binom{n}{ny}\right]$$

To put this in the canonical form above, re-express as:

o $\theta = \log\frac{\mu}{1-\mu}$, called the “logit link”
o $\phi = n$
o $a(\phi) = 1/\phi$
o $b(\theta) = -\log(1-\mu) = \log[1 + \exp(\theta)]$
o $c(y, \phi) = \log\binom{n}{ny}$
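A short check of the Binomial case, working with the count z and converting to the proportion y = z/n inside the function. Names and values are hypothetical.

```python
import math

def binomial_pmf(z, n, mu):
    """Standard binomial(n, mu) probability of z successes."""
    return math.comb(n, z) * mu ** z * (1 - mu) ** (n - z)

def binomial_canonical(z, n, mu):
    """Same probability for Y = Z / n via the canonical pieces."""
    y = z / n
    theta = math.log(mu / (1 - mu))      # logit of the mean
    phi = n
    a = 1 / phi                          # a(phi) = 1 / phi
    b = math.log(1 + math.exp(theta))    # b(theta) = log(1 + exp(theta))
    c = math.log(math.comb(n, z))        # c(y, phi) = log C(n, ny)
    return math.exp((y * theta - b) / a + c)

# Hypothetical evaluation point.
print(binomial_pmf(3, 10, 0.4), binomial_canonical(3, 10, 0.4))
```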

Exponential family: Gamma distribution

Gamma Distribution 

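Start with the standard expression:

$$f(y \mid \theta, \phi) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha-1} e^{-\lambda y}$$

Then change parameters from $\alpha$ and $\lambda$ to $\alpha$ and $\mu = \alpha/\lambda$, i.e. $\lambda = \alpha/\mu$:

$$f(y \mid \theta, \phi) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha-1} e^{-\lambda y} = \frac{\alpha^{\alpha}}{\mu^{\alpha}\,\Gamma(\alpha)}\, y^{\alpha-1} e^{-y\alpha/\mu}$$

and re-arrange to:

$$f(y \mid \theta, \phi) = \exp\left[\left(-\frac{y}{\mu} - \log\mu\right)\alpha + (\alpha-1)\log y + \alpha\log\alpha - \log\Gamma(\alpha)\right]$$

To put this in the canonical form above, re-express as:

o $\theta = -\frac{1}{\mu}$
o $\phi = \alpha$
o $a(\phi) = \frac{1}{\phi}$
o $b(\theta) = -\log(-\theta)$
o $c(y, \phi) = (\phi-1)\log y + \phi\log\phi - \log\Gamma(\phi)$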
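The Gamma reparameterisation is the easiest one to get wrong by a sign, so a numerical check is useful. The sketch compares the (α, λ) density with the canonical form in (α, μ), using μ = α/λ; the values are hypothetical.

```python
import math

def gamma_pdf(y, alpha, lam):
    """Standard Gamma density with shape alpha and rate lam."""
    return lam ** alpha * y ** (alpha - 1) * math.exp(-lam * y) / math.gamma(alpha)

def gamma_canonical(y, alpha, mu):
    """Same density via theta = -1/mu, phi = alpha, a(phi) = 1/phi."""
    theta = -1 / mu
    phi = alpha
    a = 1 / phi
    b = -math.log(-theta)        # b(theta) = -log(-theta) = log(mu)
    c = (phi - 1) * math.log(y) + phi * math.log(phi) - math.lgamma(phi)
    return math.exp((y * theta - b) / a + c)

# Hypothetical shape and mean; the rate follows from lam = alpha / mu.
alpha, mu = 2.5, 4.0
lam = alpha / mu
print(gamma_pdf(3.0, alpha, lam), gamma_canonical(3.0, alpha, mu))
```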

Exponential family: Moments

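Moments of the Exponential Family Form

Mean and Variance:

$$E(Y) = \mu = b'(\theta)$$

$$\operatorname{Var}(Y) = b''(\theta)\,a(\phi)$$

Prove these using well known results from statistical theory:

$$E\left(\frac{\partial l}{\partial\theta}\right) = 0 \quad\text{and}\quad E\left(\frac{\partial^{2} l}{\partial\theta^{2}}\right) + E\left[\left(\frac{\partial l}{\partial\theta}\right)^{2}\right] = 0.$$

Solution: The log likelihood function for a member of an exponential family is:

$$l(\theta, \phi; y) = \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)$$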
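One way the argument can be completed is sketched below. It assumes only the canonical form of the density given earlier and the two score identities quoted above; it is not necessarily the route taken in the lecture.

```latex
\begin{align*}
l(\theta,\phi;y) &= \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi), \\
\frac{\partial l}{\partial\theta} &= \frac{y - b'(\theta)}{a(\phi)}, \qquad
\frac{\partial^{2} l}{\partial\theta^{2}} = -\frac{b''(\theta)}{a(\phi)}. \\[4pt]
E\!\left(\frac{\partial l}{\partial\theta}\right) = 0
  &\;\Longrightarrow\; \frac{E(Y) - b'(\theta)}{a(\phi)} = 0
  \;\Longrightarrow\; E(Y) = b'(\theta). \\[4pt]
E\!\left[\left(\frac{\partial l}{\partial\theta}\right)^{2}\right]
  &= \frac{E\!\left[(Y - b'(\theta))^{2}\right]}{a(\phi)^{2}}
   = \frac{\operatorname{Var}(Y)}{a(\phi)^{2}}, \\[4pt]
-\frac{b''(\theta)}{a(\phi)} + \frac{\operatorname{Var}(Y)}{a(\phi)^{2}} = 0
  &\;\Longrightarrow\; \operatorname{Var}(Y) = b''(\theta)\,a(\phi).
\end{align*}
```

As a quick check against the Poisson case, b(θ) = exp(θ) and a(ϕ) = 1 give E(Y) = exp(θ) = μ and Var(Y) = exp(θ) = μ.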

Examples

GLM: Normal linear models

Linear (in the parameters) model for a continuous/numerical response variable (Y) and continuous and/or discrete explanatory variables (X’s).

$$Y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + e_i$$

where $e_i \sim N(0, \sigma^2)$ and independent.

This linear model includes
• Multiple regression
• ANOVA
• ANCOVA

For the model $Y_i = \alpha + \beta x_i + \epsilon_i$:
• Random component: Y is the response variable and is normally distributed; generally we assume $\epsilon_i \sim N(0, \sigma^2)$.
• Systematic component: X, the explanatory variable, is linear in the parameters: $\alpha + \beta x_i$.
• Identity link: $g(E(Y_i)) = E(Y_i) = \alpha + \beta x_i$.

Closer look at each of these components…

GLM: Non-normal case

Random Component

Let N = sample size and suppose that we have $Y_1, Y_2, \dots, Y_N$ observations on our response variable and that the observations are all independent.

Y’s that are discrete variables, where Y is either:

Dichotomous (binary) with a fixed number of trials:
• Success/failure
• Correct/incorrect
• Agree/disagree
• Academic/non-academic program
These responses have a Binomial distribution.

Counts (including cells of a contingency table):
• Number of people who die from AIDS during a given time period.
• Number of times a child tries to take a toy away from another child.
• Number of patents generated by firms.
These responses have a Poisson distribution.

Thus the two distributions we will be primarily using are
• Binomial
• Poisson

Problem with the Traditional Approach 

• A transformation that produces constant variance may not yield a normally distributed response.
  o Counts that have a Poisson distribution, where $E(Y) = \mu$ and $\operatorname{Var}(Y) = \mu$.
  o Binomial distributed responses, where $E(Y) = n\pi$ and $\operatorname{Var}(Y) = n\pi(1-\pi)$.
• Linear models often fit discrete data very badly: they can yield predicted values of $\mu$ that are outside the range of possible values of Y.
  o Consider counts that have a Poisson distribution, where $Y \ge 0$.
  o Consider Binomial distributed responses, where $0 \le \pi \le 1$.
  o Linear models can yield negative predictions.
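The last point can be seen with a few lines of arithmetic. The sketch below fits an ordinary least squares line to a small set of made-up counts (the data are hypothetical, not from the notes) and shows that the fitted mean at the smallest x is negative, which is impossible for a count.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical count data: counts cannot be negative.
xs = [1, 2, 3, 4, 5]
ys = [0, 0, 1, 3, 6]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
print(intercept, slope)   # intercept -2.5, slope 1.5 for these data
print(fitted)             # the first fitted value is below zero
```

A log link avoids this, since the fitted mean exp(α + βx) is positive for every x.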

Linear Predictor and Link Function

The Link Function

“Left hand” side of an equation/model: the random component; that is

$$E(Y) = \mu$$

“Right hand” side of the equation: the systematic component; that is

$$\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$

We now need to “link” the two sides. How is $\mu = E(Y)$ related to $\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$?

We do this using a “Link Function” $g(\mu)$:

$$g(\mu) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$

More about the Link Function

Important things about g(.):
• This function g(.) is “monotone”: as the systematic part gets larger, $\mu$ gets larger (or smaller).
• The relationship between $E(Y)$ and the systematic part can be non-linear.

Some common links are
• Identity (ordinary regression, ANOVA, ANCOVA): $E(Y) = \alpha + \beta x$
• Log link, which is often used when Y is non-negative (i.e. $0 \le Y$): $\log(E(Y)) = \log(\mu) = \alpha + \beta x$. This yields a “loglinear” model.
• Logit link, which is often used when $0 \le \mu \le 1$ (e.g. when the response is dichotomous/binary and we’re interested in a probability): $\log\left(\frac{\mu}{1-\mu}\right) = \alpha + \beta x$

If $g(\mu) = \theta$, the canonical parameter for the distribution, we have a canonical link.
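The three common links and their inverses can be written down directly. This is a minimal sketch; the function names and the `LINKS` dictionary are illustrative and do not correspond to any particular library.

```python
import math

def identity(mu):
    return mu

def identity_inv(eta):
    return eta

def log_link(mu):
    return math.log(mu)

def log_inv(eta):
    return math.exp(eta)            # always positive

def logit(mu):
    return math.log(mu / (1 - mu))

def logit_inv(eta):
    return 1 / (1 + math.exp(-eta))  # always strictly between 0 and 1

# Each entry pairs a link g with its inverse, so that mu = g_inv(eta).
LINKS = {
    "identity": (identity, identity_inv),
    "log": (log_link, log_inv),
    "logit": (logit, logit_inv),
}

eta = -5.0  # a linear predictor value that would be an invalid mean for counts or probabilities
for name, (g, g_inv) in LINKS.items():
    print(name, g_inv(eta))
```

The inverse link is what keeps the fitted mean in range: the same negative linear predictor maps to a small positive mean under the log link and to a probability near zero under the logit link.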

Canonical Link Function Summary

| Family | Canonical Link | Variance Function |
|---|---|---|
| Normal | $\eta = \mu$ | $1$ |
| Poisson | $\eta = \log\mu$ | $\mu$ |
| Binomial | $\eta = \log\left(\frac{\mu}{1-\mu}\right)$ | $\mu(1-\mu)$ |
| Gamma | $\eta = \mu^{-1}$ | $\mu^{2}$ |
| Inverse Gaussian (IG) | $\eta = \mu^{-2}$ | $\mu^{3}$ |

Note:

$$f_{IG}(y \mid \mu, \lambda) = \left(\frac{\lambda}{2\pi y^{3}}\right)^{1/2} \exp\left[\frac{-\lambda (y-\mu)^{2}}{2\mu^{2} y}\right], \qquad y, \mu, \lambda > 0.$$

Earlier, we showed that $\theta = -\frac{1}{\mu}$ for the gamma distribution. The minus sign is dropped in the canonical link function. This does not affect anything, as the constants will be absorbed into the parameters in the linear predictor.

Several other link functions exist, but for actuarial applications the link functions above are often sufficient.



Linear Predictor

• $\eta = \sum_i \beta_i x_i$ is a function of the covariates, $x_i$ (also known as explanatory or predictor variables).

For example, we might expect the mean claim size to be a function of the age of the driver. In this case age, x, would be a covariate and

$$\eta = \beta_0 + \beta_1 x$$

Or the explanatory variable (factor) might be the sex of the policyholder:
• Age of policyholder: actual value
• Sex of policyholder: categorical value, i.e. 2 categories, $\alpha_1$ for male and $\alpha_2$ for female

A model with age and sex effects could have a linear predictor

$$\eta = \alpha_i + \beta x$$

where i = 1 for a male and i = 2 for a female. Notice that with this model, the effect of the age of the policyholder is the same whether the policyholder is male or female. In other words, there are no interaction effects and the lines would be parallel.

If there is interaction between the two covariates, age and sex, then we have:

$\eta = \alpha_i + \beta_i x$ (2 non-parallel straight lines)
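The difference between the parallel and non-parallel cases can be made concrete. In the sketch below all parameter values are hypothetical; the point is only that the male/female gap in the linear predictor is constant across ages without interaction and changes with age once the slope depends on sex.

```python
def eta_additive(sex, age, alpha, beta):
    """age + sex: eta = alpha_i + beta * x (common slope)."""
    return alpha[sex] + beta * age

def eta_interaction(sex, age, alpha, beta):
    """age with sex interaction: eta = alpha_i + beta_i * x (slope depends on sex)."""
    return alpha[sex] + beta[sex] * age

# Hypothetical parameter values, for illustration only.
alpha = {"male": 1.0, "female": 0.6}
beta_common = 0.02
beta_by_sex = {"male": 0.02, "female": 0.05}

def gap(eta_fn, beta, age):
    """Male minus female linear predictor at a given age."""
    return eta_fn("male", age, alpha, beta) - eta_fn("female", age, alpha, beta)

for age in (20, 60):
    print(age, gap(eta_additive, beta_common, age), gap(eta_interaction, beta_by_sex, age))
```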



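A model with the main effects for two factors and their interaction has a linear predictor:

$$\alpha_i + \beta_j + \gamma_{ij}$$

Notation for models

| model | linear predictor |
|---|---|
| age | $\beta_0 + \beta_1 x$ |
| sex | $\alpha_i$ |
| age + sex | $\alpha_i + \beta x$ |
| age + sex + age.sex | $\alpha_i + \beta_i x$ |
| age*sex | $\alpha_i + \beta_i x$ |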

The last two models are equivalent. These are shown separately to illustrate the notation.

Example

In a motor insurance business, vehicle-rating group is also used as a factor. Vehicles are divided into twenty categories numbered 1 to 20, with group 20 including those vehicles that are most expensive to repair. Suppose that we have a three-factor model specified as age*(sex + vehicle category). What would the linear predictor be for a model of this type?

Solution
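One possible answer is sketched below. It assumes the same notation convention as the table of models (a `*` expands to main effects plus interaction), with i indexing sex, j indexing vehicle category (j = 1, …, 20) and x the age; the symbols γ and δ are introduced here for illustration.

```latex
\begin{align*}
\text{age}*(\text{sex} + \text{vehicle})
  &= \text{age} + \text{sex} + \text{vehicle}
     + \text{age.sex} + \text{age.vehicle}, \\[4pt]
\eta &= \alpha_i + \beta_j + \gamma_i x + \delta_j x
      = \alpha_i + \beta_j + (\gamma_i + \delta_j)\,x .
\end{align*}
```

The separate main effect of age does not need its own term, since a common slope can be absorbed into the sex-specific slopes, in the same way that age + sex + age.sex reduces to $\alpha_i + \beta_i x$.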

Model Fitting

• Estimate GLM parameters using MLE.
• $l(y; \theta, \phi) = \log f(y; \theta, \phi)$ depends on the parameters in the linear predictor through the link function.
• Maximise $l$ with respect to the parameters in the linear predictor.

Example

The following study was carried out into how the amount of claims is related to the factor of age. The claim size and the age of the insured in a portfolio of 20 policies were recorded. Suppose that $Y_i$ represents the claim size of the ith insured and $x_i$ represents the logarithm (to the base 10) of the ith insured’s age (i = 1, 2, …, 20).

The response variable $Y_i$ is assumed to be exponentially distributed. A possible specification for $E(Y_i)$ is $E(Y_i) = \exp(\alpha + \beta x_i)$. This will ensure that $E(Y_i)$ is non-negative for all values of $x_i$.

(i) Write down the natural link function associated with the linear predictor $\eta_i = \alpha + \beta x_i$.

(ii) Use this link function and linear predictor to derive the equations that must be solved in order to obtain the maximum likelihood estimates of $\alpha$ and $\beta$.

Solution
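A sketch of how the equations can be derived, under one explicit assumption: that the link intended is the log link implied by the stated specification $E(Y_i) = \exp(\alpha + \beta x_i)$, i.e. $\log\mu_i = \eta_i$. Note that the canonical link for the exponential distribution (a gamma with $\alpha = 1$) in the summary table is the reciprocal, $\eta = \mu^{-1}$; if that link were intended instead, the equations below would change accordingly.

```latex
\begin{align*}
f(y_i) &= \lambda_i e^{-\lambda_i y_i}, \qquad
\lambda_i = \frac{1}{\mu_i} = e^{-(\alpha + \beta x_i)}, \\[4pt]
l(\alpha,\beta) &= \sum_{i=1}^{20}\left[\log\lambda_i - \lambda_i y_i\right]
  = \sum_{i=1}^{20}\left[-(\alpha + \beta x_i) - y_i e^{-(\alpha + \beta x_i)}\right], \\[4pt]
\frac{\partial l}{\partial\alpha} &= \sum_{i=1}^{20}\left[y_i e^{-(\alpha + \beta x_i)} - 1\right] = 0, \\[4pt]
\frac{\partial l}{\partial\beta} &= \sum_{i=1}^{20} x_i\left[y_i e^{-(\alpha + \beta x_i)} - 1\right] = 0 .
\end{align*}
```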

Example

You are given the following data for claims using the above model:

| Age $x_i$ (years) | Claim amount $y_i$ (\$) |
|---|---|
| 4 | 50 |
| 8 | 52 |
| 10 | 119 |
| 11 | 41 |

Write down the equations for the MLEs for $\alpha$ and $\beta$.

Solution

....
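For readers who want to see numbers, the sketch below solves the likelihood equations for these four observations numerically. It rests on two assumptions that the notes do not settle: the model is the log specification E(Y_i) = exp(α + βx_i) with an exponential response, so the score equations are Σ[y_i exp(-(α + βx_i)) - 1] = 0 and Σ x_i[y_i exp(-(α + βx_i)) - 1] = 0, and the tabulated values are used directly as x_i. The solver is Fisher scoring, which for this model reduces to repeated unweighted least squares on a working response; this is a swapped-in numerical technique, not a method stated in the notes.

```python
import math

# Data from the table; using the tabulated values directly as x_i is an assumption.
x = [4.0, 8.0, 10.0, 11.0]
y = [50.0, 52.0, 119.0, 41.0]

def score(alpha, beta):
    """Partial derivatives of the log-likelihood with respect to alpha and beta."""
    r = [yi * math.exp(-(alpha + beta * xi)) - 1.0 for xi, yi in zip(x, y)]
    return sum(r), sum(xi * ri for xi, ri in zip(x, r))

def ols(xs, zs):
    """Least squares intercept and slope for z = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(xs, zs))
    slope = sxz / sxx
    return z_bar - slope * x_bar, slope

def fit(max_iter=500, tol=1e-12):
    """Fisher scoring: regress the working response z = eta + y/mu - 1 on x."""
    alpha, beta = ols(x, [math.log(yi) for yi in y])  # starting values
    for _ in range(max_iter):
        eta = [alpha + beta * xi for xi in x]
        z = [e + yi * math.exp(-e) - 1.0 for e, yi in zip(eta, y)]
        new_alpha, new_beta = ols(x, z)
        step = abs(new_alpha - alpha) + abs(new_beta - beta)
        alpha, beta = new_alpha, new_beta
        if step < tol:
            break
    return alpha, beta

alpha_hat, beta_hat = fit()
print(alpha_hat, beta_hat)
print(score(alpha_hat, beta_hat))  # both components should be close to zero
```

The working weights are all equal to one here because, with a log link, (dμ/dη)² = μ² matches the variance function of the exponential distribution, which is why plain least squares suffices at each step.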