Title | Nonparametric Density Estimation |
---|---|
Author | Andrea Travaglia |
Course | Machine Learning in Robotics |
Institution | Technische Universität München |
Human-centered Assistive Robotics
Technische Universität München
Prof. Dr.-Ing. Dongheui Lee

MACHINE LEARNING IN ROBOTICS
Lecture 8: Non-parametric Density Estimation
Summer Semester 2021
Exercise 1
Consider a histogram-like density model in which the space x is divided into fixed regions, with the density p(x) taking the constant value h_k over the k-th region, whose volume is denoted Δ_k. Suppose we have a set of N observations of x such that n_k of these observations fall in region k. Using a Lagrange multiplier to enforce the normalization constraint on the density, derive an expression for the maximum likelihood estimator of the h_k.
Solution to Exercise 1
The value of the density p(x) at a data point x_n is given by h_{j(n)}, where the notation j(n) denotes that data point x_n falls within region j. Thus, the log-likelihood function takes the form

$$ \sum_{n=1}^{N} \ln p(x_n) = \sum_{n=1}^{N} \ln h_{j(n)} $$
We now need to take account of the constraint that p(x) must integrate to unity. Since p(x) has the constant value h_i over region i, which has volume Δ_i, the normalization constraint becomes Σ_i h_i Δ_i = 1: the probabilities over all regions must add up to 1. Introducing a Lagrange multiplier λ, we then maximize the function

$$ \sum_{n=1}^{N} \ln h_{j(n)} + \lambda \left( \sum_i h_i \Delta_i - 1 \right) $$
Taking the partial derivative with respect to h_k and setting it to zero gives

$$ \frac{\partial}{\partial h_k} \left[ \sum_{n=1}^{N} \ln h_{j(n)} + \lambda \left( \sum_i h_i \Delta_i - 1 \right) \right] = \frac{n_k}{h_k} + \lambda \Delta_k = 0 $$
where n_k denotes the total number of data points falling within region k. Multiplying through by h_k, summing over k, and making use of the normalization constraint,

$$ 0 = \sum_i n_i + \lambda \sum_i h_i \Delta_i = N + \lambda \quad \Rightarrow \quad \lambda = -N $$
Eliminating λ then gives our final result for the maximum likelihood solution for h_k in the form

$$ h_k = \frac{n_k}{N} \frac{1}{\Delta_k} $$
Note that, for equal-sized bins Δ_k = Δ, we obtain a bin height h_k that is proportional to the fraction of points falling within that bin, as expected.
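The estimator h_k = n_k / (N Δ_k) can be sanity-checked numerically. A minimal sketch, assuming NumPy; the data points and bin edges below are illustrative, not part of the exercise:

```python
import numpy as np

# Illustrative 1-D data and fixed region (bin) edges
data = np.array([0.5, 1.2, 1.7, 2.1, 2.3, 3.8])
edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

N = len(data)
counts, _ = np.histogram(data, bins=edges)  # n_k: points falling in region k
widths = np.diff(edges)                     # Delta_k: volume of region k
heights = counts / (N * widths)             # h_k = n_k / (N * Delta_k)

# The resulting density integrates to one over the binned range
print(np.sum(heights * widths))  # -> 1.0
```

Summing h_k Δ_k over all bins recovers the normalization constraint, confirming the role of the Lagrange multiplier.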
Exercise 2
Given a set of 5 data points {x(1) = 4, x(2) = 1.5, x(3) = 2.5, x(4) = 6, x(5) = 2}, find the pdf estimate at x = 3 using the Gaussian kernel function with σ = 1 as the window function.
Solution to Exercise 2
Let x̄ denote the query point x = 3. From slide 11 in the lecture notes, the Parzen window method is defined as

$$ p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^m} \, k\!\left( \frac{\bar{x} - x^{(i)}}{h} \right) $$

This equation suggests a general approach to density estimation: substitute the Parzen window function with another kernel window function, so the estimate becomes an average of window functions centred on the samples. For a Gaussian window function (same as slide 13 in the lecture notes), this gives

$$ p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,h} \exp\!\left( -\frac{(\bar{x} - x^{(i)})^2}{2h^2} \right) $$

In this problem we are given σ = 1, which plays the role of the bandwidth h, and n = 5 since we have 5 data points. Therefore:

$$ p_n(3) = \frac{1}{5} \cdot \frac{1}{\sqrt{2\pi}} \left( e^{-\frac{(3-4)^2}{2}} + e^{-\frac{(3-1.5)^2}{2}} + e^{-\frac{(3-2.5)^2}{2}} + e^{-\frac{(3-6)^2}{2}} + e^{-\frac{(3-2)^2}{2}} \right) $$
$$ = \frac{1}{5} \cdot \frac{1}{\sqrt{2\pi}} \left( 0.6065 + 0.3247 + 0.8825 + 0.0111 + 0.6065 \right) = 0.1940 $$
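The sum above can be reproduced in a few lines. A minimal sketch, assuming NumPy; the samples and σ come straight from the exercise:

```python
import numpy as np

samples = np.array([4.0, 1.5, 2.5, 6.0, 2.0])
sigma = 1.0   # bandwidth h, equal to sigma here
x = 3.0

# Average of Gaussian windows centred on each sample
p = np.mean(np.exp(-(x - samples) ** 2 / (2 * sigma ** 2))
            / (np.sqrt(2 * np.pi) * sigma))
print(round(p, 4))  # -> 0.194
```

The distant sample x(4) = 6 contributes almost nothing (0.0111), illustrating how the Gaussian window localizes the estimate.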
Exercise 3
The following table shows 6 training samples of responses gathered from a survey. Two attributes (x1, x2) have been selected to classify data samples as good or bad.

x1 | x2 | y (classification)
---|---|---
8 | 5 | Bad
7 | 4 | Bad
5 | 7 | Bad
4 | 3 | Good
2 | 6 | Good
1 | 4 | Good
Suppose we are presented with a new sample as (x1 = 3, x2 = 8). Classify the sample by using the k-nearest neighbours method (k-NN) for k = 1 and k = 4.
Solution to Exercise 3
For this problem, we first need to calculate the Euclidean distance between the new sample point and all training samples in the table:

distance from (3, 8) to (8, 5): √((8 − 3)² + (5 − 8)²) = √34
distance from (3, 8) to (7, 4): √((7 − 3)² + (4 − 8)²) = √32
distance from (3, 8) to (5, 7): √((5 − 3)² + (7 − 8)²) = √5
distance from (3, 8) to (4, 3): √((4 − 3)² + (3 − 8)²) = √26
distance from (3, 8) to (2, 6): √((2 − 3)² + (6 − 8)²) = √5
distance from (3, 8) to (1, 4): √((1 − 3)² + (4 − 8)²) = √20

Using these distances, we determine the nearest neighbours as the k closest points.
• For k = 1, the two closest samples, (5, 7) labeled 'Bad' and (2, 6) labeled 'Good', are tied at distance √5, so the classification is ambiguous: it comes down to a 50/50 choice between 'Good' and 'Bad'.
• For k = 4, the neighbours are (5, 7), (2, 6), (1, 4), and (4, 3): 3 points labeled 'Good' and 1 point labeled 'Bad'. Therefore, this sample is classified as 'Good'.
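The distance computation and majority vote above can be sketched as follows, assuming NumPy; the data are taken from the table:

```python
import numpy as np
from collections import Counter

# Training samples from the table and the new query point
X = np.array([[8, 5], [7, 4], [5, 7], [4, 3], [2, 6], [1, 4]], dtype=float)
y = ['Bad', 'Bad', 'Bad', 'Good', 'Good', 'Good']
query = np.array([3.0, 8.0])

dists = np.linalg.norm(X - query, axis=1)  # Euclidean distances
order = np.argsort(dists)                  # ascending; ties broken by index

def knn_vote(k):
    """Majority label among the k nearest training samples."""
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]

# k = 1 is a genuine tie between (5, 7) 'Bad' and (2, 6) 'Good', both at
# distance sqrt(5); argsort's index-order tie-break decides arbitrarily.
print(knn_vote(1))
print(knn_vote(4))  # -> Good
```

The code makes the k = 1 ambiguity concrete: the result for k = 1 depends entirely on the tie-breaking rule, whereas k = 4 gives a clear 3-to-1 majority.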