Title | Nonparametric Density Estimation |
---|---|
Author | Andrea Travaglia |
Course | Machine Learning in Robotics |
Institution | Technische Universität München |
Human-centered Assistive Robotics
Technische Universität München
Prof. Dr.-Ing. Dongheui Lee

MACHINE LEARNING IN ROBOTICS
Lecture 8: Non-parametric Density Estimation
Summer Semester 2021
Exercise 1
Consider a histogram-like density model in which the space x is divided into fixed regions, with the density p(x) taking the constant value h_k over the k-th region, whose volume is denoted Δ_k. Suppose we have a set of N observations of x such that n_k of these observations fall in region k. Using a Lagrange multiplier to enforce the normalization constraint on the density, derive an expression for the maximum likelihood estimator of the h_k.
Solution to Exercise 1
The value of the density p(x) at a data point x_n is given by h_{j(n)}, where the notation j(n) denotes that data point x_n falls within region j. Thus, the log-likelihood function takes the form

$$ \sum_{n=1}^{N} \ln p(x_n) = \sum_{n=1}^{N} \ln h_{j(n)} $$
We now need to take account of the constraint that p(x) must integrate to unity. Since p(x) has the constant value h_i over region i, which has volume Δ_i, the normalization constraint becomes Σ_i h_i Δ_i = 1: the probabilities over all regions must add up to 1. Introducing a Lagrange multiplier λ, we then maximize the function

$$ \sum_{n=1}^{N} \ln h_{j(n)} + \lambda \left( \sum_i h_i \Delta_i - 1 \right) $$
Taking the partial derivative with respect to h_k and setting it to zero gives

$$ \frac{\partial}{\partial h_k} \left[ \sum_{n=1}^{N} \ln h_{j(n)} + \lambda \left( \sum_i h_i \Delta_i - 1 \right) \right] = \frac{n_k}{h_k} + \lambda \Delta_k = 0 $$
where n_k denotes the total number of data points falling within region k. Multiplying through by h_k, summing over k, and making use of the normalization constraint,

$$ 0 = \sum_i n_i + \lambda \sum_i h_i \Delta_i = N + \lambda \quad \Rightarrow \quad \lambda = -N $$
Eliminating λ then gives our final result for the maximum likelihood solution for h_k in the form

$$ h_k = \frac{n_k}{N} \frac{1}{\Delta_k} $$
Note that, for equal-sized bins Δ_k = Δ, we obtain a bin height h_k that is proportional to the fraction of points falling within that bin, as expected.
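The estimator h_k = n_k / (N Δ_k) can be sanity-checked numerically. A minimal sketch, assuming NumPy; the data points and bin edges below are illustrative, not part of the exercise:

```python
import numpy as np

# Illustrative 1-D data and fixed region (bin) edges
data = np.array([0.5, 1.2, 1.7, 2.1, 2.3, 3.8])
edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

N = len(data)
counts, _ = np.histogram(data, bins=edges)  # n_k: points falling in region k
widths = np.diff(edges)                     # Delta_k: volume of region k
heights = counts / (N * widths)             # h_k = n_k / (N * Delta_k)

# The resulting density integrates to one over the binned range
print(np.sum(heights * widths))  # -> 1.0
```

Summing h_k Δ_k over all bins recovers the normalization constraint, confirming the role of the Lagrange multiplier.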
Exercise 2
Given a set of 5 data points {x(1) = 4, x(2) = 1.5, x(3) = 2.5, x(4) = 6, x(5) = 2}, find the pdf estimate at x = 3 using the Gaussian kernel function with σ = 1 as the window function.
Solution to Exercise 2
Let x̄ denote the query point x = 3. From slide 11 in the lecture notes, the Parzen window method is defined as

$$ p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^m} \, k\!\left( \frac{\bar{x} - x^{(i)}}{h} \right) $$

This equation suggests a general approach to density estimation: substitute the Parzen window function with another kernel window function, so the estimate becomes an average of window functions centred on the samples. For a Gaussian window function (same as slide 13 in the lecture notes), this gives

$$ p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,h} \exp\!\left( -\frac{(\bar{x} - x^{(i)})^2}{2h^2} \right) $$

In this problem we are given σ = 1, which plays the role of the bandwidth h, and n = 5 since we have 5 data points. Therefore:

$$ p_n(3) = \frac{1}{5} \cdot \frac{1}{\sqrt{2\pi}} \left( e^{-\frac{(3-4)^2}{2}} + e^{-\frac{(3-1.5)^2}{2}} + e^{-\frac{(3-2.5)^2}{2}} + e^{-\frac{(3-6)^2}{2}} + e^{-\frac{(3-2)^2}{2}} \right) $$
$$ = \frac{1}{5} \cdot \frac{1}{\sqrt{2\pi}} \left( 0.6065 + 0.3247 + 0.8825 + 0.0111 + 0.6065 \right) = 0.1940 $$
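The sum above can be reproduced in a few lines. A minimal sketch, assuming NumPy; the samples and σ come straight from the exercise:

```python
import numpy as np

samples = np.array([4.0, 1.5, 2.5, 6.0, 2.0])
sigma = 1.0   # bandwidth h, equal to sigma here
x = 3.0

# Average of Gaussian windows centred on each sample
p = np.mean(np.exp(-(x - samples) ** 2 / (2 * sigma ** 2))
            / (np.sqrt(2 * np.pi) * sigma))
print(round(p, 4))  # -> 0.194
```

The distant sample x(4) = 6 contributes almost nothing (0.0111), illustrating how the Gaussian window localizes the estimate.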
Exercise 3
The following table shows 6 training samples of responses gathered from a survey. Two attributes (x1, x2) have been selected to classify data samples as good or bad.

x1 | x2 | y (classification)
---|---|---
8 | 5 | Bad
7 | 4 | Bad
5 | 7 | Bad
4 | 3 | Good
2 | 6 | Good
1 | 4 | Good
Suppose we are presented with a new sample as (x1 = 3, x2 = 8). Classify the sample by using the k-nearest neighbours method (k-NN) for k = 1 and k = 4.
Solution to Exercise 3
For this problem, we first need to calculate the Euclidean distance between the new sample point and all training samples in the table:

distance from (3, 8) to (8, 5): √((8 − 3)² + (5 − 8)²) = √34
distance from (3, 8) to (7, 4): √((7 − 3)² + (4 − 8)²) = √32
distance from (3, 8) to (5, 7): √((5 − 3)² + (7 − 8)²) = √5
distance from (3, 8) to (4, 3): √((4 − 3)² + (3 − 8)²) = √26
distance from (3, 8) to (2, 6): √((2 − 3)² + (6 − 8)²) = √5
distance from (3, 8) to (1, 4): √((1 − 3)² + (4 − 8)²) = √20

Using these distances, we determine the nearest neighbours as the k closest points.
• For k = 1, the two closest samples, (5, 7) labeled 'Bad' and (2, 6) labeled 'Good', are tied at distance √5, so the classification is ambiguous: it comes down to a 50/50 choice between 'Good' and 'Bad'.
• For k = 4, the neighbours are (5, 7), (2, 6), (1, 4), and (4, 3): 3 points labeled 'Good' and 1 point labeled 'Bad'. Therefore, this sample is classified as 'Good'.
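The distance computation and majority vote above can be sketched as follows, assuming NumPy; the data are taken from the table:

```python
import numpy as np
from collections import Counter

# Training samples from the table and the new query point
X = np.array([[8, 5], [7, 4], [5, 7], [4, 3], [2, 6], [1, 4]], dtype=float)
y = ['Bad', 'Bad', 'Bad', 'Good', 'Good', 'Good']
query = np.array([3.0, 8.0])

dists = np.linalg.norm(X - query, axis=1)  # Euclidean distances
order = np.argsort(dists)                  # ascending; ties broken by index

def knn_vote(k):
    """Majority label among the k nearest training samples."""
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]

# k = 1 is a genuine tie between (5, 7) 'Bad' and (2, 6) 'Good', both at
# distance sqrt(5); argsort's index-order tie-break decides arbitrarily.
print(knn_vote(1))
print(knn_vote(4))  # -> Good
```

The code makes the k = 1 ambiguity concrete: the result for k = 1 depends entirely on the tie-breaking rule, whereas k = 4 gives a clear 3-to-1 majority.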