| Title | 04-LDA notes |
|---|---|
| Author | Usama Manzoor |
| Course | Stat Digit Sig Proc&Mod |
| Institution | Georgia Institute of Technology |
Notes for linear discriminant analysis...
Linear discriminant analysis

Linear discriminant analysis (LDA) is a common "plug-in" method for classification, which operates by estimating $\pi_k f_{X|Y}(x \mid k)$ for each class $k = 0, \ldots, K-1$ and then simply plugging these estimates into the formula for the Bayes classifier in order to make a decision. In LDA we make the (strong) assumption that the class-conditional pdfs are given by the multivariate normal distribution, but with differing means. Mathematically, this corresponds to the assumption that

$$ f_{X|Y}(x \mid k) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)} $$
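As a quick sanity check on this density, the following sketch evaluates the multivariate normal pdf directly from the formula above (the function name is mine, not from the notes):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate normal density with mean mu and covariance Sigma."""
    d = len(mu)
    diff = x - mu
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# standard normal in d = 2 evaluated at the origin: 1 / (2*pi)
val = gaussian_pdf(np.array([0.0, 0.0]), np.zeros(2), np.eye(2))
```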
for $k = 0, \ldots, K-1$. Note that under this assumption, each class has a distinct mean $\mu_k$, but all classes share the same covariance matrix $\Sigma$. In LDA, we assume that $\mu_0, \ldots, \mu_{K-1}$ and $\Sigma$, as well as the prior probabilities $\pi_0, \ldots, \pi_{K-1}$, are all unknown, but can be estimated from the data. In particular, we can use the estimates

$$ \hat{\pi}_k = \frac{|\{i : y_i = k\}|}{n}, \qquad \hat{\mu}_k = \frac{1}{|\{i : y_i = k\}|} \sum_{i : y_i = k} x_i, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{k=0}^{K-1} \sum_{i : y_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T. $$
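These three estimates are straightforward to compute from a labeled sample; a minimal NumPy sketch (function name and variable layout are mine) follows:

```python
import numpy as np

def lda_fit(X, y, K):
    """Estimate the priors pi_k, class means mu_k, and pooled covariance Sigma."""
    n, d = X.shape
    pis = np.array([np.mean(y == k) for k in range(K)])         # |{i : y_i = k}| / n
    mus = np.array([X[y == k].mean(axis=0) for k in range(K)])  # per-class sample means
    Sigma = np.zeros((d, d))
    for k in range(K):
        diffs = X[y == k] - mus[k]   # center each class by its own mean
        Sigma += diffs.T @ diffs     # accumulate outer products
    return pis, mus, Sigma / n       # pooled covariance: divide by n, not per class

# toy data: two classes of two points each along the first coordinate
X = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0], [6.0, 0.0]])
y = np.array([0, 0, 1, 1])
pis, mus, Sigma = lda_fit(X, y, 2)
```

Note that all classes contribute to the single pooled $\hat{\Sigma}$, matching the shared-covariance assumption above.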
The LDA classifier is then defined by

$$ \hat{h}(x) = \arg\max_k \; \hat{\pi}_k \cdot \frac{1}{(2\pi)^{d/2} |\hat{\Sigma}|^{1/2}} \, e^{-\frac{1}{2}(x - \hat{\mu}_k)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_k)}. $$
Georgia Tech ECE 6254 Spring 2017; Notes by M. A. Davenport and J. Romberg. Last updated 21:39, January 24, 2017
Since the log is a monotonic transformation (meaning that if $x > y$ then $\log(x) > \log(y)$), we can equivalently state the classifier as

$$ \hat{h}(x) = \arg\max_k \; \log \hat{\pi}_k + \log\!\left( \frac{1}{(2\pi)^{d/2} |\hat{\Sigma}|^{1/2}} \, e^{-\frac{1}{2}(x - \hat{\mu}_k)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_k)} \right) $$
$$ = \arg\max_k \; \log \hat{\pi}_k - \frac{1}{2}(x - \hat{\mu}_k)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_k) $$
$$ = \arg\min_k \; \frac{1}{2}(x - \hat{\mu}_k)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_k) - \log \hat{\pi}_k, $$

where the second equality above follows from the fact that

$$ \log\!\left( \frac{1}{(2\pi)^{d/2} |\hat{\Sigma}|^{1/2}} \right) $$

is constant across all $k$ and so does not affect which $k$ maximizes the expression. It is enlightening to consider what happens in the special case of $K = 2$ (i.e., binary classification). In this case, LDA results in a classifier such that $\hat{h}(x) = 1$ when

$$ (x - \hat{\mu}_0)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_0) - 2 \log \hat{\pi}_0 \;\ge\; (x - \hat{\mu}_1)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_1) - 2 \log \hat{\pi}_1. $$
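The $\arg\min$ form of the classifier translates directly into code. Below is a minimal sketch (function name is mine) that scores each class by its Mahalanobis-type distance minus the log prior:

```python
import numpy as np

def lda_predict(x, pis, mus, Sigma):
    """Pick the class minimizing (1/2)(x - mu_k)^T Sigma^{-1} (x - mu_k) - log pi_k."""
    Sinv = np.linalg.inv(Sigma)
    scores = [0.5 * (x - mu) @ Sinv @ (x - mu) - np.log(pi)
              for pi, mu in zip(pis, mus)]
    return int(np.argmin(scores))

# two equal-prior classes with identity covariance, means at 0 and 4 on the first axis
pis = [0.5, 0.5]
mus = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
Sigma = np.eye(2)
```

With equal priors this reduces to nearest-mean classification in the Mahalanobis metric, so points left of the midpoint go to class 0 and points right of it to class 1.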
We can rewrite this as

$$ (x - \hat{\mu}_0)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_0) - (x - \hat{\mu}_1)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_1) + 2 \log \frac{\hat{\pi}_1}{\hat{\pi}_0} \ge 0. $$
Using the fact that $\hat{\Sigma}$ is symmetric, which implies that $(\hat{\Sigma}^{-1})^T = \hat{\Sigma}^{-1}$, we can simplify this rule to

$$ 0 \le (x - \hat{\mu}_0)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_0) - (x - \hat{\mu}_1)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_1) + 2 \log \frac{\hat{\pi}_1}{\hat{\pi}_0} $$
$$ = x^T \hat{\Sigma}^{-1} x - 2 \hat{\mu}_0^T \hat{\Sigma}^{-1} x + \hat{\mu}_0^T \hat{\Sigma}^{-1} \hat{\mu}_0 - \left( x^T \hat{\Sigma}^{-1} x - 2 \hat{\mu}_1^T \hat{\Sigma}^{-1} x + \hat{\mu}_1^T \hat{\Sigma}^{-1} \hat{\mu}_1 \right) + 2 \log \frac{\hat{\pi}_1}{\hat{\pi}_0} $$
$$ = 2 (\hat{\mu}_1^T - \hat{\mu}_0^T) \hat{\Sigma}^{-1} x + \hat{\mu}_0^T \hat{\Sigma}^{-1} \hat{\mu}_0 - \hat{\mu}_1^T \hat{\Sigma}^{-1} \hat{\mu}_1 + 2 \log \frac{\hat{\pi}_1}{\hat{\pi}_0}. $$

Dividing both sides by 2 (which does not change the sign of the inequality) and using symmetry once more to write $(\hat{\mu}_1 - \hat{\mu}_0)^T \hat{\Sigma}^{-1} x = (\hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_0))^T x$, the rule becomes

$$ 0 \le (\hat{\Sigma}^{-1} (\hat{\mu}_1 - \hat{\mu}_0))^T x + \frac{1}{2} \hat{\mu}_0^T \hat{\Sigma}^{-1} \hat{\mu}_0 - \frac{1}{2} \hat{\mu}_1^T \hat{\Sigma}^{-1} \hat{\mu}_1 + \log \frac{\hat{\pi}_1}{\hat{\pi}_0}. $$
Thus, if

$$ w = \hat{\Sigma}^{-1} (\hat{\mu}_1 - \hat{\mu}_0) $$

and

$$ b = \frac{1}{2} \hat{\mu}_0^T \hat{\Sigma}^{-1} \hat{\mu}_0 - \frac{1}{2} \hat{\mu}_1^T \hat{\Sigma}^{-1} \hat{\mu}_1 + \log \frac{\hat{\pi}_1}{\hat{\pi}_0}, $$

we can rewrite this as $w^T x + b \ge 0$.
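The expressions for $w$ and $b$ can be computed directly; the sketch below (function name is mine) builds the binary rule and checks it on a symmetric example:

```python
import numpy as np

def lda_binary(pi0, pi1, mu0, mu1, Sigma):
    """Return (w, b) such that the binary LDA rule is: predict class 1 iff w @ x + b >= 0."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu0)
    b = 0.5 * mu0 @ Sinv @ mu0 - 0.5 * mu1 @ Sinv @ mu1 + np.log(pi1 / pi0)
    return w, b

# equal priors and identity covariance: the boundary is the
# perpendicular bisector of the segment between the two means
w, b = lda_binary(0.5, 0.5, np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.eye(2))
```

With equal priors the log term vanishes, and the decision boundary $w^T x + b = 0$ passes through the midpoint of the two means, as expected.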
This is the expression for a simple linear classifier, and thus LDA will always result in a linear decision boundary.