
EE 278 Introduction to Statistical Signal Processing

Friday, October 26, 2018

Homework #4 Solutions

Please submit your assignment as a PDF to the class Gradescope page.

1. Deriving the Conditional pdf of a Gaussian Random Vector: Let Z be an (n + m)-dimensional Gaussian random column vector with mean vector µ_Z and covariance matrix Σ_Z. The random vector Z is partitioned into an n-vector X and an m-vector Y as shown below:

Z = [X; Y],   µ_Z = [µ_X; µ_Y],   Σ_Z = [Σ_X, Σ_XY; Σ_YX, Σ_Y].

We wish to find an expression for f_X|Y(x|y). We shall do this in three steps. Express all results in terms of µ_X, µ_Y, Σ_X, Σ_Y, and Σ_XY.

a. For any n × m matrix A, it is clear that the transformation from Z to Z′ given by

Z′ = [X′; Y] = [X − AY; Y]

is one-to-one; i.e., knowledge of X and Y is equivalent to knowledge of X′ = X − AY and Y. Find the matrix A which makes X′ and Y statistically independent random vectors.

b. Using the matrix A found in part (a), find f_X′(x′), the marginal density of X′. (Hint: write X′ as a linear transformation of Z.)

c. Use the transformation of part (a) to write f_X′,Y(x′, y). Now use the statistical independence of X′ and Y and the result of part (b) to find f_X|Y(x|y).

Solution

a. Since Z is Gaussian, X and Y are jointly Gaussian vectors, and Z′ is a linear transformation of Z, so Z′ is also Gaussian. For jointly Gaussian vectors, uncorrelatedness and independence are equivalent, so to make X′ and Y independent we only need to make Σ_X′Y = 0:

Σ_X′Y = E[X′Yᵀ] − µ_X′ µ_Yᵀ
      = E[(X − AY)Yᵀ] − (µ_X − Aµ_Y)µ_Yᵀ
      = Σ_XY − AΣ_Y = 0.

So Σ_XY = AΣ_Y, and therefore A = Σ_XY Σ_Y^{-1}.

b. Since X′ is Gaussian, we only need its mean and covariance to determine its density. Write

X′ = X − AY = [I  −A] [X; Y].

Let T = [I  −A]; then X′ = TZ, and

Σ_X′ = T Σ_Z Tᵀ = Σ_X − Σ_XY Aᵀ − AΣ_XYᵀ + AΣ_Y Aᵀ = Σ_X − Σ_XY Σ_Y^{-1} Σ_XYᵀ.

Therefore

f_X′(x′) = N(µ_X − Σ_XY Σ_Y^{-1} µ_Y, Σ_X − Σ_XY Σ_Y^{-1} Σ_XYᵀ).

c. X′ and Y are now independent, and hence f_X′,Y(x′, y) = f_X′(x′) f_Y(y). We have f_X,Y(x, y) = f_X′,Y(x − Ay, y), and therefore

f_X|Y(x|y) = f_X′|Y(x − Ay | Y = y) = f_X′(x − Ay)
           = N(µ_X − Σ_XY Σ_Y^{-1} µ_Y + Ay, Σ_X − Σ_XY Σ_Y^{-1} Σ_XYᵀ)
           = N(Σ_XY Σ_Y^{-1}(y − µ_Y) + µ_X, Σ_X − Σ_XY Σ_Y^{-1} Σ_XYᵀ).

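As a quick numerical sanity check of parts (a) and (c), here is a small MATLAB sketch; the block covariances Sx, Sy, Sxy below are arbitrary illustrative values (not part of the problem), chosen so that Σ_Z is positive definite.

% Hypothetical blocks: 2-dimensional X, scalar Y
Sx  = [2 0.5; 0.5 1];
Sy  = 3;
Sxy = [1; -0.4];

A = Sxy / Sy;                   % A = Sxy*inv(Sy), as in part (a)
crossCov = Sxy - A*Sy           % cross-covariance of X' = X - A*Y with Y: all zeros
condCov  = Sx - Sxy*(Sy\Sxy')   % conditional covariance Sx - Sxy*inv(Sy)*Sxy'

The zero cross-covariance confirms that A = Σ_XY Σ_Y^{-1} decorrelates X′ and Y, and condCov matches the conditional covariance Σ_X − Σ_XY Σ_Y^{-1} Σ_XYᵀ derived in part (c).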
2. Definition of Gaussian random vector. In lecture notes #3 we defined Gaussian random vectors via the joint pdf. There are other equivalent definitions, including the following very revealing definition. A random vector X with mean µ and covariance matrix Σ is a GRV if and only if Y = aᵀX is Gaussian for every real vector a ≠ 0. In the lecture notes (Property 2) we showed that any linear transformation of a GRV results in a GRV. Thus the definition given in the lecture notes implies this new definition. In this problem you will prove the converse, i.e., that the new definition implies that the joint pdf of X has the form given in the lecture notes. You will do this using the characteristic function as follows:

a. Write down the definition of the characteristic function for X.

b. Define Y = ωᵀX. Note that the characteristic function of X reduces to the characteristic function of Y evaluated at ω = 1.

c. By the new definition, Y is Gaussian. Use this fact to write the characteristic function of X in terms of the mean and variance of Y.

d. Write down the mean and variance of Y in terms of ω and the mean and covariance matrix of X, and substitute into the characteristic function of X.

e. Conclude that the joint pdf of X is Gaussian.

Solution

a. Just copying the definition from the lecture notes:

Φ_X(ω) = E[e^{i ωᵀX}].

b. Let Y = ωᵀX. Then Φ_X(ω) = E[e^{iY}] = Φ_Y(1).


c. Since Y is Gaussian by the new definition, its characteristic function is

Φ_Y(ω) = e^{−(1/2)ω²σ_Y² + iωµ_Y}.

Thus

Φ_X(ω) = Φ_Y(1) = e^{−(1/2)σ_Y² + iµ_Y}.

d. By linearity (lecture notes #3), µ_Y = ωᵀµ and σ_Y² = ωᵀΣω.

e. Combining the previous two steps,

Φ_X(ω) = e^{−(1/2)ωᵀΣω + iωᵀµ},

which is the characteristic function of a Gaussian pdf with mean µ and covariance matrix Σ. Therefore X is a Gaussian random vector.

Note: This proof shows the power of the characteristic function. Try to prove this without using the characteristic function!
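As an optional numerical illustration (not part of the required solution), the MATLAB sketch below estimates Φ_X(ω) by Monte Carlo for a hypothetical µ, Σ, and ω and compares it with the closed form e^{−(1/2)ωᵀΣω + iωᵀµ}; all numerical values are made up for the check.

% Hypothetical parameters
mu    = [1; -2; 0.5];
Sigma = [2 0.5 0; 0.5 1 0.3; 0 0.3 1.5];
w     = [0.4; -0.7; 0.2];
n     = 1e6;

L = chol(Sigma, 'lower');
X = repmat(mu, 1, n) + L*randn(3, n);            % samples of X ~ N(mu, Sigma)
phi_mc     = mean(exp(1i*(w'*X)))                % empirical characteristic function at w
phi_theory = exp(1i*(w'*mu) - 0.5*(w'*Sigma*w))  % Gaussian characteristic function at w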

3. Gaussian random vector. Suppose X ∼ N(µ, Σ) is a Gaussian random vector with

µ = [1; 5; 2]   and   Σ = [1 1 0; 1 4 0; 0 0 9].

a. Find the PDF of X1.

b. Find the PDF of X2 + X3.

c. Find the PDF of 2X1 + X2 + X3.

d. Find the PDF of X3 given (X1, X2).

e. Find the PDF of (X2, X3) given X1.

f. Find the PDF of X1 given (X2, X3).

g. Find P{2X1 + X2 + X3 < 0}.

h. Find the joint PDF of Y = AX, where A = [2 1 1; 1 −1 1].

Solution

a. The marginals of a jointly Gaussian PDF are Gaussian. Therefore X1 ∼ N(1, 1).

b. Since X2 and X3 are independent (σ23 = 0), the variance of the sum is the sum of the variances. Also, the sum of two jointly Gaussian random variables is Gaussian. Therefore X2 + X3 ∼ N(7, 13).

c. Since 2X1 + X2 + X3 is a linear transformation of the Gaussian random vector X,

2X1 + X2 + X3 = [2 1 1] [X1; X2; X3],

it is Gaussian with mean and variance

µ = [2 1 1] [1; 5; 2] = 9   and   σ² = [2 1 1] [1 1 0; 1 4 0; 0 0 9] [2; 1; 1] = 21.

Thus 2X1 + X2 + X3 ∼ N(9, 21).

d. Since σ13 = σ23 = 0, X3 is uncorrelated with the pair (X1, X2), and because they are jointly Gaussian, X3 is independent of (X1, X2). Therefore the conditional PDF of X3 given (X1, X2) is the same as the PDF of X3, which is N(2, 9).

e. We use the general formula for the conditional Gaussian PDF: if X is partitioned into blocks X1 and X2 with mean blocks µ1, µ2 and covariance blocks Σ11, Σ12, Σ21, Σ22, then

X2 | {X1 = x1} ∼ N(Σ21 Σ11^{-1}(x1 − µ1) + µ2, Σ22 − Σ21 Σ11^{-1} Σ12).

In the case of (X2, X3) | X1,

Σ11 = [1],   Σ21 = [1; 0],   Σ22 = [4 0; 0 9].

Therefore the mean and covariance of (X2, X3) given X1 = x1 are

µ_(X2,X3)|X1 = [1; 0] [1]^{-1} (x1 − 1) + [5; 2] = [x1 + 4; 2],

Σ_(X2,X3)|X1 = [4 0; 0 9] − [1; 0] [1]^{-1} [1 0] = [4 0; 0 9] − [1 0; 0 0] = [3 0; 0 9].

Thus X2 and X3 are conditionally independent given X1. The conditional densities are X2 | {X1 = x1} ∼ N(x1 + 4, 3) and X3 | {X1 = x1} ∼ N(2, 9).

f. In the case of X1 | (X2, X3),

Σ11 = [4 0; 0 9],   Σ21 = [1 0],   Σ22 = [1].

So the mean and variance of X1 | {X2 = x2, X3 = x3} are

µ_X1|X2,X3 = [1 0] [4 0; 0 9]^{-1} [x2 − 5; x3 − 2] + 1 = (x2 − 5)/4 + 1 = x2/4 − 1/4,

σ²_X1|X2,X3 = 1 − [1 0] [4 0; 0 9]^{-1} [1; 0] = 1 − 1/4 = 3/4.

The conditional mean does not depend on x3 since X1 and X3 are independent.

g. Let Y = 2X1 + X2 + X3. In part (c) we found that Y ∼ N(9, 21). Thus

P{Y < 0} = Φ((0 − µ_Y)/σ_Y) = Φ(−9/√21) = Φ(−1.96) = Q(1.96) ≈ 2.48 × 10^{-2}.
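The numbers in parts (c), (f), and (g) are easy to confirm in MATLAB; in the sketch below the Q-function is written via erfc so no toolbox is needed.

mu    = [1; 5; 2];
Sigma = [1 1 0; 1 4 0; 0 0 9];

% parts (c) and (g): 2*X1 + X2 + X3
a = [2 1 1];
m = a*mu                           % 9
v = a*Sigma*a'                     % 21
Q = @(x) 0.5*erfc(x/sqrt(2));      % Gaussian tail function
p = Q(m/sqrt(v))                   % P{2*X1 + X2 + X3 < 0}, approx 0.0248

% part (f): variance of X1 given (X2, X3)
S11 = Sigma(2:3, 2:3);  S21 = Sigma(1, 2:3);
condVar = Sigma(1,1) - S21*(S11\S21')   % 3/4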


h. In general, AX ∼ N(Aµ_X, AΣ_X Aᵀ). For this problem,

µ_Y = Aµ_X = [2 1 1; 1 −1 1] [1; 5; 2] = [9; −2],

Σ_Y = AΣ_X Aᵀ = [2 1 1; 1 −1 1] [1 1 0; 1 4 0; 0 0 9] [2 1; 1 −1; 1 1] = [21 6; 6 12].

Thus Y ∼ N([9; −2], [21 6; 6 12]).

4. Singular Covariance Matrix. In class we showed that the general optimal linear estimator of a random variable X given a random vector Y is

X̂_lin(Y) = Σ_XY Σ_Y^{-1}(Y − E Y) + E X.

However, this formula does not work if the covariance matrix Σ_Y is not invertible. In this problem, you are required to investigate this situation.

a. Prove that if the covariance matrix of a d-dimensional random vector Y is not invertible, then there must exist at least one component of Y which is expressible as a linear combination of the others.

b. Suppose X is a zero-mean scalar random variable, Y is a zero-mean random vector with covariance matrix

Σ_Y = [1 1 2; 1 2 3; 2 3 5],

and the covariance between X and Y is

Σ_XY = [3 4 7].

Derive the optimal linear estimator of X based on Y.

Solution

a. As shown in class, each covariance matrix has a unique Cholesky decomposition; that is, there exists a lower-triangular matrix L such that Σ_Y = LLᵀ. If Σ_Y is not invertible, there must be at least one zero entry on the diagonal of L. Indeed, if all of the diagonal entries of L are non-zero, then L is full rank, which implies Σ_Y is invertible.

Assume the i-th diagonal entry of L is zero (take i to be the smallest such index, so that L_[1:i−1] below has non-zero diagonal entries and is invertible). We now show that Yi can be expressed as a linear combination of Y1, Y2, ..., Y_{i−1}. Denote the first i − 1 components of Y by the (i − 1)-dimensional vector Y_[1:i−1], and the matrix comprising the first i − 1 rows and columns of L by L_[1:i−1]. The Cholesky decomposition shows that there exists a d-dimensional vector U with zero mean and identity covariance such that

Y_[1:i−1] = L_[1:i−1] U_[1:i−1].


Equivalently,

U_[1:i−1] = L_[1:i−1]^{-1} Y_[1:i−1].

Since we have assumed the i-th diagonal entry of L is zero, we know that

Yi = Σ_{j=1}^{i−1} L_ij U_j.

Equivalently,

Yi = [L_i1, L_i2, ..., L_i,i−1] U_[1:i−1] = [L_i1, L_i2, ..., L_i,i−1] L_[1:i−1]^{-1} Y_[1:i−1].

Thus, we have shown that Yi can be expressed as a linear combination of the other components of Y.

Conversely, if there exists a component Yi that can be expressed as a linear combination of the other components, then there must exist a non-zero d-dimensional vector a such that aᵀY ≡ 0. Then E[aᵀY] = 0 and

E[(aᵀY)²] = aᵀ E[YYᵀ] a = 0,

so aᵀ Σ_Y a = E[(aᵀY)²] − (E[aᵀY])² = 0. The existence of a non-zero vector a such that aᵀ Σ_Y a = 0 implies that Σ_Y is not invertible.

b. We compute the Cholesky decomposition of Σ_Y:

L = [1 0 0; 1 1 0; 2 1 0].

Since the third diagonal entry of L is zero (Σ_Y is singular), from part (a) we know that there must exist a component of Y which can be expressed as a linear combination of the other components. Using the argument in the solution of part (a): from L we have Y1 = U1, Y2 = U1 + U2, and Y3 = 2U1 + U2, so

Y3 = Y1 + Y2.

Thus, we discard Y3 and use Y′ = (Y1, Y2)ᵀ to construct the optimal linear estimator of X given Y. Since we have assumed E X = 0 and E Y = 0, we have

X̂_lin(Y) = X̂_lin(Y′) = Σ_XY′ Σ_Y′^{-1} Y′,

with

Σ_XY′ = [3 4]   and   Σ_Y′^{-1} = [1 1; 1 2]^{-1} = [2 −1; −1 1].

Hence

X̂_lin(Y) = [2 1] Y′ = 2Y1 + Y2.
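The part (b) computation can be checked in a few lines of MATLAB: the rank test confirms that Σ_Y is singular, the vector a = (1, 1, −1)ᵀ reproduces Y3 = Y1 + Y2, and the estimator coefficients come out as [2 1].

Sy  = [1 1 2; 1 2 3; 2 3 5];
Sxy = [3 4 7];

rank(Sy)                    % 2, so Sy is singular
a = [1; 1; -1];
Sy*a                        % all zeros: Var(Y1 + Y2 - Y3) = 0, so Y3 = Y1 + Y2
Syp  = Sy(1:2, 1:2);        % covariance of Y' = (Y1, Y2)'
Sxyp = Sxy(1:2);
w = Sxyp / Syp              % [2 1]  =>  Xhat_lin(Y) = 2*Y1 + Y2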

5. Transformation of Gaussians. Let

X ∼ N(0, [1 2 2; 2 4 4; 2 4 9]).

Find a deterministic transformation T such that

T(X) ∼ N(0, [2 1; 1 4]).

Solution

Note that X1 and X2 are completely correlated, namely X1 = X2/2. We can thus drop one of them, say X2, and work with the other one (X1) and X3. This selection corresponds to multiplication by the matrix

A1 = [1 0 0; 0 0 1].

From the pdf of X we have

A1 X ∼ N(0, Σ1),   where Σ1 = [1 2; 2 9].

Unlike the covariance matrix of X, this Σ1 is positive definite. To convert Σ1 to the desired

Σ2 = [2 1; 1 4],

we first find the whitening matrix corresponding to Σ1. We use this matrix as a second transformation step A2, resulting in A2 A1 X ∼ N(0, I). Then we find the coloring matrix for Σ2 and use it as the third transformation step A3. The final transformed vector A3 A2 A1 X will have the desired distribution. Using the Cholesky decomposition, it is easy to find

A2 = [1 0; 2 √5]^{-1} = [1 0; −2/√5 1/√5],
A3 = [√2 0; 1/√2 √(7/2)].

The overall transformation is thus

T(X) = A3 A2 A1 X = [√2 0 0; 1/√2 − 2√(7/10) 0 √(7/10)] X ≈ [1.414 0 0; −0.966 0 0.837] X.
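The whitening/coloring construction is easy to verify numerically; the MATLAB sketch below rebuilds A2 and A3 from Cholesky factors and checks that the overall T maps Σ_X to the target covariance Σ2.

Sx = [1 2 2; 2 4 4; 2 4 9];
S2 = [2 1; 1 4];

A1 = [1 0 0; 0 0 1];            % drop X2, keep (X1, X3)
S1 = A1*Sx*A1';                 % [1 2; 2 9]
A2 = inv(chol(S1, 'lower'));    % whitening matrix for S1
A3 = chol(S2, 'lower');         % coloring matrix for S2
T  = A3*A2*A1                   % approx [1.414 0 0; -0.966 0 0.837]
T*Sx*T'                         % recovers S2 = [2 1; 1 4]

At this size, inverting the triangular factor is harmless; for larger problems one would solve triangular systems instead.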


Instead of the matrix

[1.414 0 0; −0.966 0 0.837],

we could have used

[0 0.707 0; 0 −0.483 0.837],

or any combination λ·(first matrix) + (1 − λ)·(second matrix): since X2 = 2X1 with probability 1, the two matrices act identically on X.

6. Pollsters. This problem provides insight into combining expert opinion. The weighted vote interpretation of the solution gave one of the first indications that artificial neurons described by the input–output relation

y = sgn(Σ_{i=1}^m wi Xi − t)

were natural components of neural nets and pattern classifiers. Let the result of an election be the r.v. Θ, where

Θ = +1 if candidate A wins, and Θ = −1 if candidate B wins.

The prior probability of candidate A winning is P{Θ = 1} = η. Now m pollsters try to predict the outcome of Θ with various degrees of accuracy, perhaps determined by their past performance. The i-th pollster predicts Xi = Zi Θ, where

Zi = +1 with probability pi, and Zi = −1 with probability 1 − pi.

We assume that Z1, Z2, ..., Zm and Θ are all independent. Thus the prediction Xi can be viewed as a noisy measurement of Θ, where the probability of a correct prediction is pi. What is the best way to combine these pollings?

a. Show that the minimum probability of error decision rule Θ̂(x1, ..., xm) is to decide

Θ̂ = +1 if Σ_{i=1}^m wi xi ≥ t, and Θ̂ = −1 otherwise.

Hint: a function g(x) that takes the value p when x = +1 and q = 1 − p when x = −1 (0 ≤ p ≤ 1) can be expressed as g(x) = p^{(1+x)/2} q^{(1−x)/2}.

b. Find the weights w1, w2, ..., wm. Note that the weights do not depend on η, the prior probability.

c. Find the threshold t.

Solution


a. The minimum error probability decision rule is MAP:

Θ̂(x1, ..., xm) = +1 if p_Θ|X1,...,Xm(+1 | x1, ..., xm) ≥ p_Θ|X1,...,Xm(−1 | x1, ..., xm), and −1 otherwise.

By Bayes rule, the inequality becomes

p_X1,...,Xm|Θ(x1, ..., xm | +1) p_Θ(+1) / p_X1,...,Xm(x1, ..., xm) ≥ p_X1,...,Xm|Θ(x1, ..., xm | −1) p_Θ(−1) / p_X1,...,Xm(x1, ..., xm).

Canceling the denominators on both sides and using the conditional independence of the Xi's given Θ, we obtain

Π_{i=1}^m p_Xi|Θ(xi | +1) · η ≥ Π_{i=1}^m p_Xi|Θ(xi | −1) · (1 − η)   ⇔   Π_{i=1}^m [p_Xi|Θ(xi | +1) / p_Xi|Θ(xi | −1)] ≥ (1 − η)/η.

Take the logarithm of both sides to change the product into a sum:

Σ_{i=1}^m log[ p_Xi|Θ(xi | +1) / p_Xi|Θ(xi | −1) ] ≥ log[(1 − η)/η].

The next step is to use the hint to express the probabilities as functions of xi, pi, and qi = 1 − pi:

p_Xi|Θ(xi | +1) = pi if xi = +1, and qi if xi = −1.

Combining the two cases in one expression,

p_Xi|Θ(xi | +1) = pi^{(1+xi)/2} qi^{(1−xi)/2}.

Similarly,

p_Xi|Θ(xi | −1) = pi if xi = −1, and qi if xi = +1   ⇒   p_Xi|Θ(xi | −1) = pi^{(1−xi)/2} qi^{(1+xi)/2}.

Using these results, the left-hand side of the decision rule inequality is

Σ_{i=1}^m log[ pi^{(1+xi)/2} qi^{(1−xi)/2} / (pi^{(1−xi)/2} qi^{(1+xi)/2}) ] = Σ_{i=1}^m log(pi^{xi} qi^{−xi}) = Σ_{i=1}^m xi log(pi/(1 − pi)),

which has the form Σ_{i=1}^m wi xi. Applying this decision rule to the outcomes x1, ..., xm:

Θ̂(x1, ..., xm) = +1 if Σ_{i=1}^m log(pi/(1 − pi)) xi ≥ log((1 − η)/η), and −1 otherwise.

b. As promised, the weights wi = log(pi/(1 − pi)) do not depend on the prior probability η.

c. The decision threshold is t = log((1 − η)/η).
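To make the weighted vote concrete, here is a small MATLAB simulation; the pollster accuracies p, the prior η, and the sample size are assumed values chosen only for illustration.

p   = [0.9 0.75 0.6];          % assumed pollster accuracies
eta = 0.4;                     % assumed prior P{Theta = +1}
m   = numel(p);  n = 1e5;

w = log(p ./ (1 - p));         % weights from part (b)
t = log((1 - eta)/eta);        % threshold from part (c)

theta = 2*(rand(1, n) < eta) - 1;              % election outcomes
Z     = 2*(rand(m, n) < repmat(p', 1, n)) - 1; % pollster noise Z_i
X     = Z .* repmat(theta, m, 1);              % predictions X_i = Z_i*Theta
theta_hat = 2*((w*X) >= t) - 1;                % weighted-vote MAP rule
err = mean(theta_hat ~= theta)                 % estimated probability of error

With these numbers the most reliable pollster dominates the vote: its weight log(0.9/0.1) ≈ 2.2 exceeds the sum of the other two weights (≈ 1.1 + 0.4), which is exactly the weighted-vote behavior the problem alludes to.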

7. Heart-disease detection. The Stanford Hospital is interested in developing a system for automated heart-disease detection [1]. Your task is to develop a decision rule for deciding whether or not a patient has heart disease. You decide to model the problem as a signal detection problem, in which the signal is a random variable that indicates whether the patient has heart disease or not:

H = 0 if the patient does not have heart disease, and H = 1 if the patient has heart disease.

Data from 218 patients is collected in heart_dataset_training.mat [2], which records each patient's sex, the type of chest pain experienced by the patient, and the cholesterol of the patient. We model these quantities as the random variables S, C and X, respectively, where

S = 1 if the patient is female, 2 if the patient is male;

C = 1 if the pain is typical angina, 2 if the pain is atypical angina, 3 if the pain is non-anginal pain, 4 if asymptomatic;

X = x if the patient's serum cholesterol is x mg/dl,

where X is a continuous random variable.

a. Assume that S, C and X are conditionally independent given H. Formulate a MAP decision rule for declaring whether or not a patient has heart disease, i.e. Ĥ(s, c, x). Your rule should involve the marginal probability p_H(h) and the conditional probabilities p_S|H(s|h), p_C|H(c|h), and f_X|H(x|h). The template file heartDisease.m reads in the data from heart_dataset_training.mat and computes the marginal and conditional probabilities for you. Code up your MAP decision rule at the end of the template file, and use your code to compute whether or not the 50 patients in heart_dataset_test.mat suffer from heart disease. Include the code and the estimates in your solution.

b. Estimate the error probability of your detector by writing a line of code to compare your MAP estimates in (a) to the vector heart_disease_test in the file heart_dataset_test.mat, which indicates whether the patients suffer from heart disease or not.

[1] A patient is deemed to suffer from heart disease if at least one of his or her major blood vessels is 50% narrower than it should be.
[2] This data set is collected from five hospitals in Hungary, Switzerland and the United States, and is available at http://archive.ics.uci.edu/ml/datasets/Heart+Disease.

Solution

a. By Bayes rule for mixed random variables, the chain rule, and the conditional independence assumption, the a posteriori probability is

p_H|S,C,X(h|s, c, x) = p_H(h) p_S,C|H(s, c|h) f_X|H,S,C(x|h, s, c) / [f_X|S,C(x|s, c) p_S,C(s, c)]
                     = p_H(h) p_S|H(s|h) p_C|H(c|h) f_X|H(x|h) / [f_X|S,C(x|s, c) p_S,C(s, c)].


Since the denominator does not depend on h, the MAP rule reduces to

ĥ(s, c, x) = 0 if p_H(1) p_S|H(s|1) p_C|H(c|1) f_X|H(x|1) < p_H(0) p_S|H(s|0) p_C|H(c|0) f_X|H(x|0), and ĥ(s, c, x) = 1 otherwise.

The MATLAB code for the estimate is as follows:

MAP_estimates = P_H1 .* P_S_H1(sex_test) .* P_C_H1(chest_pain_test) .* f_X_H1(cholesterol_test) > ...
                P_H0 .* P_S_H0(sex_test) .* P_C_H0(chest_pain_test) .* f_X_H0(cholesterol_test);

b. The error rate code is as follows:

Error_rate = sum(abs(MAP_estimates - heart_disease_test)) ./ length(heart_disease_test)

which computes to 0.14.
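For completeness, the density handles f_X_H1 and f_X_H0 used above are presumably produced by the template from the training data; one plausible construction (an assumption here, not necessarily what heartDisease.m actually does, and the training variable names below are hypothetical) is a Gaussian fit per class:

% Hypothetical sketch: Gaussian class-conditional density for cholesterol.
% cholesterol_train and heart_disease_train are assumed training-set variables.
mu1 = mean(cholesterol_train(heart_disease_train == 1));
s1  = std(cholesterol_train(heart_disease_train == 1));
f_X_H1 = @(x) exp(-(x - mu1).^2 ./ (2*s1^2)) ./ (sqrt(2*pi)*s1);
% f_X_H0 is built the same way from the H = 0 patients.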
