| Title | 04-LDA notes |
|---|---|
| Author | Usama Manzoor |
| Course | Stat Digit Sig Proc&Mod |
| Institution | Georgia Institute of Technology |
Notes for linear discriminant analysis...
Linear discriminant analysis

Linear discriminant analysis (LDA) is a common "plug-in" method for classification, which operates by estimating $\pi_k f_{X|Y}(x \mid k)$ for each class $k = 0, \ldots, K-1$ and then simply plugging these estimates into the formula for the Bayes classifier in order to make a decision. In LDA we make the (strong) assumption that the class-conditional pdfs are given by the multivariate normal distribution, but with differing means. Mathematically, this corresponds to the assumption that

$$ f_{X|Y}(x \mid k) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)} $$
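As a quick sanity check on this density, the following sketch evaluates the multivariate normal pdf directly from the formula above (the function name is mine, not from the notes):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate normal density with mean mu and covariance Sigma."""
    d = len(mu)
    diff = x - mu
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# standard normal in d = 2 evaluated at the origin: 1 / (2*pi)
val = gaussian_pdf(np.array([0.0, 0.0]), np.zeros(2), np.eye(2))
```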
for $k = 0, \ldots, K-1$. Note that under this assumption, each class has a distinct mean $\mu_k$, but all classes share the same covariance matrix $\Sigma$. In LDA, we assume that $\mu_0, \ldots, \mu_{K-1}$ and $\Sigma$, as well as the prior probabilities $\pi_0, \ldots, \pi_{K-1}$, are all unknown, but can be estimated from the data. In particular, we can use the estimates

$$ \hat{\pi}_k = \frac{|\{i : y_i = k\}|}{n}, \qquad \hat{\mu}_k = \frac{1}{|\{i : y_i = k\}|} \sum_{i : y_i = k} x_i, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{k=0}^{K-1} \sum_{i : y_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T. $$
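These three estimates are straightforward to compute from a labeled sample; a minimal NumPy sketch (function name and variable layout are mine) follows:

```python
import numpy as np

def lda_fit(X, y, K):
    """Estimate the priors pi_k, class means mu_k, and pooled covariance Sigma."""
    n, d = X.shape
    pis = np.array([np.mean(y == k) for k in range(K)])         # |{i : y_i = k}| / n
    mus = np.array([X[y == k].mean(axis=0) for k in range(K)])  # per-class sample means
    Sigma = np.zeros((d, d))
    for k in range(K):
        diffs = X[y == k] - mus[k]   # center each class by its own mean
        Sigma += diffs.T @ diffs     # accumulate outer products
    return pis, mus, Sigma / n       # pooled covariance: divide by n, not per class

# toy data: two classes of two points each along the first coordinate
X = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0], [6.0, 0.0]])
y = np.array([0, 0, 1, 1])
pis, mus, Sigma = lda_fit(X, y, 2)
```

Note that all classes contribute to the single pooled $\hat{\Sigma}$, matching the shared-covariance assumption above.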
The LDA classifier is then defined by

$$ \hat{h}(x) = \arg\max_k \; \hat{\pi}_k \cdot \frac{1}{(2\pi)^{d/2} |\hat{\Sigma}|^{1/2}} \, e^{-\frac{1}{2}(x - \hat{\mu}_k)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_k)}. $$
Georgia Tech ECE 6254 Spring 2017; Notes by M. A. Davenport and J. Romberg. Last updated 21:39, January 24, 2017
Since the log is a monotonic transformation (meaning that if $x > y$ then $\log(x) > \log(y)$), we can equivalently state the classifier as

$$ \hat{h}(x) = \arg\max_k \; \log \hat{\pi}_k + \log\!\left( \frac{1}{(2\pi)^{d/2} |\hat{\Sigma}|^{1/2}} \, e^{-\frac{1}{2}(x - \hat{\mu}_k)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_k)} \right) $$
$$ = \arg\max_k \; \log \hat{\pi}_k - \frac{1}{2}(x - \hat{\mu}_k)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_k) $$
$$ = \arg\min_k \; \frac{1}{2}(x - \hat{\mu}_k)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_k) - \log \hat{\pi}_k, $$

where the second equality above follows from the fact that

$$ \log\!\left( \frac{1}{(2\pi)^{d/2} |\hat{\Sigma}|^{1/2}} \right) $$

is constant across all $k$ and so does not affect which $k$ maximizes the expression. It is enlightening to consider what happens in the special case of $K = 2$ (i.e., binary classification). In this case, LDA results in a classifier such that $\hat{h}(x) = 1$ when

$$ (x - \hat{\mu}_0)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_0) - 2 \log \hat{\pi}_0 \;\ge\; (x - \hat{\mu}_1)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_1) - 2 \log \hat{\pi}_1. $$
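The $\arg\min$ form of the classifier translates directly into code. Below is a minimal sketch (function name is mine) that scores each class by its Mahalanobis-type distance minus the log prior:

```python
import numpy as np

def lda_predict(x, pis, mus, Sigma):
    """Pick the class minimizing (1/2)(x - mu_k)^T Sigma^{-1} (x - mu_k) - log pi_k."""
    Sinv = np.linalg.inv(Sigma)
    scores = [0.5 * (x - mu) @ Sinv @ (x - mu) - np.log(pi)
              for pi, mu in zip(pis, mus)]
    return int(np.argmin(scores))

# two equal-prior classes with identity covariance, means at 0 and 4 on the first axis
pis = [0.5, 0.5]
mus = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
Sigma = np.eye(2)
```

With equal priors this reduces to nearest-mean classification in the Mahalanobis metric, so points left of the midpoint go to class 0 and points right of it to class 1.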
We can rewrite this as

$$ (x - \hat{\mu}_0)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_0) - (x - \hat{\mu}_1)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_1) + 2 \log \frac{\hat{\pi}_1}{\hat{\pi}_0} \ge 0. $$
Using the fact that $\hat{\Sigma}$ is symmetric, which implies that $(\hat{\Sigma}^{-1})^T = \hat{\Sigma}^{-1}$, we can simplify this rule to

$$ 0 \le (x - \hat{\mu}_0)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_0) - (x - \hat{\mu}_1)^T \hat{\Sigma}^{-1} (x - \hat{\mu}_1) + 2 \log \frac{\hat{\pi}_1}{\hat{\pi}_0} $$
$$ = x^T \hat{\Sigma}^{-1} x - 2 \hat{\mu}_0^T \hat{\Sigma}^{-1} x + \hat{\mu}_0^T \hat{\Sigma}^{-1} \hat{\mu}_0 - \left( x^T \hat{\Sigma}^{-1} x - 2 \hat{\mu}_1^T \hat{\Sigma}^{-1} x + \hat{\mu}_1^T \hat{\Sigma}^{-1} \hat{\mu}_1 \right) + 2 \log \frac{\hat{\pi}_1}{\hat{\pi}_0} $$
$$ = 2 (\hat{\mu}_1^T - \hat{\mu}_0^T) \hat{\Sigma}^{-1} x + \hat{\mu}_0^T \hat{\Sigma}^{-1} \hat{\mu}_0 - \hat{\mu}_1^T \hat{\Sigma}^{-1} \hat{\mu}_1 + 2 \log \frac{\hat{\pi}_1}{\hat{\pi}_0}. $$

Dividing both sides by 2 (which does not change the sign of the inequality) and using symmetry once more to write $(\hat{\mu}_1 - \hat{\mu}_0)^T \hat{\Sigma}^{-1} x = (\hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_0))^T x$, the rule becomes

$$ 0 \le (\hat{\Sigma}^{-1} (\hat{\mu}_1 - \hat{\mu}_0))^T x + \frac{1}{2} \hat{\mu}_0^T \hat{\Sigma}^{-1} \hat{\mu}_0 - \frac{1}{2} \hat{\mu}_1^T \hat{\Sigma}^{-1} \hat{\mu}_1 + \log \frac{\hat{\pi}_1}{\hat{\pi}_0}. $$
Thus, if

$$ w = \hat{\Sigma}^{-1} (\hat{\mu}_1 - \hat{\mu}_0) $$

and

$$ b = \frac{1}{2} \hat{\mu}_0^T \hat{\Sigma}^{-1} \hat{\mu}_0 - \frac{1}{2} \hat{\mu}_1^T \hat{\Sigma}^{-1} \hat{\mu}_1 + \log \frac{\hat{\pi}_1}{\hat{\pi}_0}, $$

we can rewrite this as $w^T x + b \ge 0$.
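The expressions for $w$ and $b$ can be computed directly; the sketch below (function name is mine) builds the binary rule and checks it on a symmetric example:

```python
import numpy as np

def lda_binary(pi0, pi1, mu0, mu1, Sigma):
    """Return (w, b) such that the binary LDA rule is: predict class 1 iff w @ x + b >= 0."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu0)
    b = 0.5 * mu0 @ Sinv @ mu0 - 0.5 * mu1 @ Sinv @ mu1 + np.log(pi1 / pi0)
    return w, b

# equal priors and identity covariance: the boundary is the
# perpendicular bisector of the segment between the two means
w, b = lda_binary(0.5, 0.5, np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.eye(2))
```

With equal priors the log term vanishes, and the decision boundary $w^T x + b = 0$ passes through the midpoint of the two means, as expected.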
This is the expression for a simple linear classifier, and thus LDA will always result in a linear decision boundary.