04-LDA notes PDF

Title 04-LDA notes
Author Usama Manzoor
Course Stat Digit Sig Proc&Mod
Institution Georgia Institute of Technology
Pages 3
File Size 75.3 KB
File Type PDF
Total Downloads 79
Total Views 138

Summary

Notes for linear discriminant analysis...


Description

Linear discriminant analysis Linear discriminant analysis (LDA) is a common “plug-in” method for classification which operates by estimating πk fX|Y (x|k ) for each class k = 0, . . . , K − 1 and then simply plugging these into the formula for the Bayes classifier in order to make a decision. In LDA we make the (strong) assumption that class conditional pdfs are given by the multivariate normal distribution, but with differing means. Mathematically, this corresponds to the assumption that fX|Y (x|k) =

−1 1 (x−µk ) − 21 (x−µk )T Σ e (2π)d/2 |Σ|1/2

for k = 0, . . . , K − 1. Note that under this assumption, each class has a distinct mean µk , but all classes share the same covariance matrix Σ. In LDA, we assume that µ0 , . . . , µK−1 and Σ, as well as the prior probabilities π0 , . . . , πK−1 are all unknown, but can be estimated from the data. In particular, we can use the estimates |{i : yi = k}| n X 1 b xi µk = |{i : yi = k}| i:y =k b πk =

k

X X 1 K−1 b b k )(xi − µ b k )T . Σ= (xi − µ

n

k=0 i:yi =k

The LDA classifier is then defined by b h(x) = arg max πbk · k

1 b −1 (x− µ − 21 (x− µ b k )T Σ b k) e . b 1/2 (2π)d/2 |Σ| 1

Georgia Tech ECE 6254 Spring 2017; Notes by M. A. Davenport and J. Romberg. Last updated 21:39, January 24, 2017

Since the log is a monotonic transformation (meaning that if x > y then log(x) > log(y)), we can equivalently state the classifier as b bk ) + log h(x) = arg max log (π k

1

1

b 1/2 (2π)d/2 |Σ|

e− 2 (x−µbk )

1 b −1 (x − µ bk ) b k )T Σ = arg max log (πbk ) − (x − µ 2 k 1 b −1 (x − µ bk ) b k ) − log (π b k )T Σ = arg min (x − µ 2 k

TΣ b

−1

(x−µ bk)

!

where the second equality above follows from the fact that log

1 b 1/2 (2π)d/2 | Σ|

!

is constant across all k and so does not affect which k maximizes the expression. It is enlightening to consider what happens in the special case of K = 2 (i.e., binary classification). In this case, LDA results in a b classifier such that h(x) = 1 when

b 0 )T Σ−1 (x − µ b0 ≥ (x − µ b1 )T Σ−1 (x − µ b1 . b 0 )− 2 log π b 1 )− 2 log π (x − µ

We can rewrite this as

b 0 )T Σ−1 (x − µ b 1 )T Σ−1 (x − µ b 0 ) − (x − µ b 1 ) + 2 log (x − µ

πb1 ≥ 0. b π0

Using the fact that Σ is symmetric, which implies that we have

2 Georgia Tech ECE 6254 Spring 2017; Notes by M. A. Davenport and J. Romberg. Last updated 21:39, January 24, 2017

(Σ−1 )T = Σ−1 , we can simplify this rule to b 0 )T Σ−1 (x − µ b 1 )T Σ−1 (x − µ b 0 ) − (x − µ b 1 ) + 2 log 0 ≤ (x − µ b 0T Σ−1 µ b0 b 0T Σ−1 x + µ = xT Σ−1 x − 2µ 

T

−1

− x Σ x−

b 1T Σ−1 x 2µ

+

b 1T Σ−1 µ b1 µ



+ 2 log

b1 π b π0

b 0T Σ−1µ b0 − µ b 1 + 2 log b 0T )Σ−1 x + µ b 1T − µ b 1T Σ−1 µ = 2(µ

πb1 b π0

πb1 b π0

b1 π 1 T −1 1 T −1 b1 Σ µ b0 Σ µ b1 − µ b 0 ))T x + µ b 1 + log . b0 − µ = (Σ−1 (µ b 2 2 π0

Thus, if and

b 0) b1 − µ w = Σ−1 (µ

πb1 1 T −1 1 T −1 b0 − µ b 1 + log , b0 Σ µ b1 Σ µ b= µ 2 2 πb0 we can re-write this as wT x + b ≥ 0.

This is the expression of a simple linear classifier, and thus LDA will always result in a linear classifier.

3 Georgia Tech ECE 6254 Spring 2017; Notes by M. A. Davenport and J. Romberg. Last updated 21:39, January 24, 2017...


Similar Free PDFs
Notes
  • 18 Pages
Notes
  • 12 Pages
Notes
  • 61 Pages
Notes
  • 35 Pages
Notes
  • 19 Pages
Notes
  • 70 Pages
Notes
  • 6 Pages
Notes
  • 35 Pages
Notes
  • 29 Pages
Notes
  • 70 Pages
Notes
  • 6 Pages
Notes
  • 19 Pages
Notes
  • 32 Pages
Notes
  • 28 Pages
Notes
  • 56 Pages