ML Exercise 04 Solution
Course: Machine Learning
Institution: Technische Universität München

Exercise 4: Linear Regression
Group 116: Zhixiong Zhuang, Yuyin Lang, Bowen Zhang

Least squares regression

Problem 4: Let's assume we have a dataset where each data point $(x_i, y_i)$ is weighted by a scalar factor which we will call $t_i$. We will assume that $t_i > 0$ for all $i$. This makes the sum-of-squares error function look like the following:

$$E_{\text{weighted}}(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} t_i \left[ \mathbf{w}^T \boldsymbol{\phi}(x_i) - y_i \right]^2$$

Find the equation for the value of $\mathbf{w}$ that minimizes this error function. Furthermore, explain how this weighting factor $t_i$ can be interpreted in terms of 1) the variance of the noise on the data and 2) data points for which there are exact copies in the dataset.

Solution:

$$\mathbf{w}^* = \arg\min_{\mathbf{w}} E_{\text{weighted}}(\mathbf{w}) = \arg\min_{\mathbf{w}} \frac{1}{2} \sum_{i=1}^{N} t_i \left[ \mathbf{w}^T \boldsymbol{\phi}(x_i) - y_i \right]^2 = \arg\min_{\mathbf{w}} \frac{1}{2} (\boldsymbol{\Phi}\mathbf{w} - \mathbf{y})^T T (\boldsymbol{\Phi}\mathbf{w} - \mathbf{y})$$

in which

$$T = \begin{pmatrix} t_1 & 0 & \cdots & 0 \\ 0 & t_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & t_N \end{pmatrix}$$

Setting the gradient to zero:

$$\nabla_{\mathbf{w}} E_{\text{weighted}} = \nabla_{\mathbf{w}} \frac{1}{2} (\boldsymbol{\Phi}\mathbf{w} - \mathbf{y})^T T (\boldsymbol{\Phi}\mathbf{w} - \mathbf{y}) = \boldsymbol{\Phi}^T T \boldsymbol{\Phi} \mathbf{w} - \boldsymbol{\Phi}^T T \mathbf{y} = 0$$

$$\Rightarrow \mathbf{w}^* = \left( \boldsymbol{\Phi}^T T \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^T T \mathbf{y}$$
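As a quick sanity check, here is a minimal NumPy sketch of this closed-form solution. The design matrix Phi, weights t, and targets y below are placeholder values chosen only for illustration and are not part of the exercise:

import numpy as np

# Placeholder data: N = 5 points, M = 2 basis functions (illustrative only).
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])
t = np.array([1.0, 2.0, 0.5, 1.0, 3.0])    # per-point weights t_i > 0

T = np.diag(t)                             # T = diag(t_1, ..., t_N)

# w* = (Phi^T T Phi)^{-1} Phi^T T y, computed with solve() instead of an explicit inverse
w_star = np.linalg.solve(Phi.T @ T @ Phi, Phi.T @ T @ y)
print(w_star)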

1) $E_{\text{weighted}}(\mathbf{w})$ represents the weighted least squares error, in which the covariance matrix of the errors is allowed to differ from (a multiple of) the identity matrix.

$$y_i = f_{\mathbf{w}}(x_i) + \epsilon_i$$

where $\epsilon_i$ represents zero-mean Gaussian noise with a fixed precision $\beta = \sigma^{-2}$, i.e. $\epsilon_i \sim \mathcal{N}(0, \beta^{-1})$ and $y_i \sim \mathcal{N}\left( f_{\mathbf{w}}(x_i), \beta^{-1} \right)$.

As was defined in our class,



$$E_{ML}(\mathbf{w}) = \frac{\beta}{2} \sum_{i=1}^{N} \left( \mathbf{w}^T \boldsymbol{\phi}(x_i) - y_i \right)^2 + \text{const}$$
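For reference, this $E_{ML}$ arises as the negative log-likelihood of the Gaussian noise model above; the following step is a standard derivation added here for completeness (it is not part of the original solution text):

$$-\ln p(\mathbf{y} \mid \mathbf{w}, \beta) = -\sum_{i=1}^{N} \ln \mathcal{N}\left( y_i \mid \mathbf{w}^T \boldsymbol{\phi}(x_i), \beta^{-1} \right) = \frac{\beta}{2} \sum_{i=1}^{N} \left( \mathbf{w}^T \boldsymbol{\phi}(x_i) - y_i \right)^2 - \frac{N}{2} \ln \beta + \frac{N}{2} \ln 2\pi$$

so all terms that do not depend on $\mathbf{w}$ are collected into the constant.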

In this case,

$$E_{\text{weighted}}(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} t_i \left[ \mathbf{w}^T \boldsymbol{\phi}(x_i) - y_i \right]^2,$$

so the $t_i$ replace the fixed noise precision $\beta$, which means this error function allows the noise distribution at different observation points to have different variances. In other words, by introducing $t_i$, the heteroskedasticity of the noise can be accounted for.

2) For those points which have exact copies in the dataset, the way to estimate the weights is to set the weight of each data point equal to the reciprocal of the sample variance obtained from the set of replicate measurements to which the data point belongs. Mathematically this would be

$$w_{ij} = \frac{1}{\hat{\sigma}_i^2} = \left( \frac{\sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_i \right)^2}{n_i - 1} \right)^{-1}$$

where $i$ indexes the unique combinations of predictor variable values and $j$ indexes the replicates within each combination of predictor variable values.
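A small NumPy sketch of this rule, assuming the replicate groups are identified by an integer array group; the variable names and data below are illustrative and not part of the exercise:

import numpy as np

# Illustrative data: group[i] identifies which replicate group point i belongs to.
y = np.array([1.0, 1.2, 0.9, 3.1, 2.8, 5.0, 5.3])
group = np.array([0, 0, 0, 1, 1, 2, 2])

weights = np.empty_like(y)
for g in np.unique(group):
    mask = group == g
    # Sample variance of the replicates in this group (ddof=1 gives the n_i - 1 denominator).
    var_g = np.var(y[mask], ddof=1)
    weights[mask] = 1.0 / var_g          # w_ij = 1 / sigma_hat_i^2
print(weights)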

Ridge regression

Problem 5: Show that ridge regression on a design matrix $\boldsymbol{\Phi} \in \mathbb{R}^{N \times M}$ with regularization strength $\lambda$ is equivalent to ordinary least squares regression with an augmented design matrix and target vector

    y ˆ = and yˆ =      I   0M  M   Solution: Let’s write the form of LS regression with augmented design matrix and target vector

$$E(\mathbf{w}) = \frac{1}{2} \left( \hat{\boldsymbol{\Phi}} \mathbf{w} - \hat{\mathbf{y}} \right)^T \left( \hat{\boldsymbol{\Phi}} \mathbf{w} - \hat{\mathbf{y}} \right)$$

$$= \frac{1}{2} \begin{pmatrix} \boldsymbol{\Phi}\mathbf{w} - \mathbf{y} \\ \sqrt{\lambda}\, I_M \mathbf{w} \end{pmatrix}^T \begin{pmatrix} \boldsymbol{\Phi}\mathbf{w} - \mathbf{y} \\ \sqrt{\lambda}\, I_M \mathbf{w} \end{pmatrix}$$

$$= \frac{1}{2} (\boldsymbol{\Phi}\mathbf{w} - \mathbf{y})^T (\boldsymbol{\Phi}\mathbf{w} - \mathbf{y}) + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w}$$

$$= \frac{1}{2} \sum_{i=1}^{N} \left( \mathbf{w}^T \boldsymbol{\phi}(x_i) - y_i \right)^2 + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w}$$

which is exactly the same as ridge regression.
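A short NumPy check of this equivalence; the data here is synthetic and purely illustrative, and the closed-form ridge solution is the standard $(\boldsymbol{\Phi}^T\boldsymbol{\Phi} + \lambda I)^{-1}\boldsymbol{\Phi}^T\mathbf{y}$:

import numpy as np

rng = np.random.default_rng(0)
N, M, lam = 20, 3, 0.7
Phi = rng.normal(size=(N, M))
y = rng.normal(size=N)

# Ridge regression closed form: w = (Phi^T Phi + lambda I)^{-1} Phi^T y
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y)

# Ordinary least squares on the augmented design matrix and target vector
Phi_hat = np.vstack([Phi, np.sqrt(lam) * np.eye(M)])
y_hat = np.concatenate([y, np.zeros(M)])
w_ols_aug, *_ = np.linalg.lstsq(Phi_hat, y_hat, rcond=None)

print(np.allclose(w_ridge, w_ols_aug))    # True: the two solutions coincide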

Problem 6: John Doe is a data scientist, and he wants to fit a polynomial regression model to his data. For this, he needs to choose the degree of the polynomial that works best for his problem. Unfortunately, John hasn't attended IN2064, so he writes the following code for choosing the optimal degree of the polynomial:

X, y = load_data()
best_error = -1
best_degree = None
for degree in range(1, 50):
    w = fit_polynomial_regression(X, y, degree)
    y_predicted = predict_polynomial_regression(X, w, degree)
    error = compute_mean_squared_error(y, y_predicted)
    if (error...

