Estimation Notes

Author: Eel Tess
Course: Inferential Statistics
Institution: Universiti Teknologi Malaysia
CHAPTER 1: Estimation

1.1 Point Estimation

We may have several different choices for the point estimator of a parameter. For example, if we wish to estimate the mean of a population, we might consider the sample mean, the sample median, or even the average of the smallest and largest observations in the sample as point estimators. We therefore need to examine their statistical properties and develop criteria for comparing estimators.

1.2 Methods of Point Estimation

The methods of estimation discussed here are maximum likelihood estimation and estimation by the method of moments.

Some might ask: why do we need to estimate parameters? A brief answer is as follows:

Let X be a random variable with probability function $f(x, \theta)$, where $\theta$ is a parameter in a parameter space $\Omega$. If $\theta$ were known, the probability function would be fully specified, and consequently we would be able to calculate the probabilities related to X. Usually $\theta$ is unknown, and the objective is then to estimate $\theta$ on the basis of a random sample of size n, $X_1, X_2, \ldots, X_n$, from $f(x, \theta)$. Of course, we would like to get a "good estimate" of $\theta$.

1.2.1 Maximum Likelihood Estimation (MLE)

The most widely accepted principle is the principle of maximum likelihood. The idea is to choose the estimate of the parameter $\theta$ that maximizes the likelihood function

$$L(\theta \mid x) = \prod_{i=1}^{n} f(x_i, \theta),$$

which depends on the sample values (observed values). We will illustrate this method through examples.

Note:

- The method of maximum likelihood cannot be applied without knowledge of the underlying distribution.

- Joint pdf's and likelihood functions look the same, but the two are interpreted differently. A joint pdf defined for a set of n random variables is a multivariate function of those random variables. In contrast, $L$ is a function of $\theta$; the observed values $x_i$ are treated as fixed.

- There are a few situations where the equations $\dfrac{dL(\theta)}{d\theta} = 0$ or $\dfrac{d \ln L(\theta)}{d\theta} = 0$ are not meaningful and do not yield a solution for $\hat{\theta}$. In those cases, the MLE often turns out to be an order statistic, for reasons having to do with the range of the random variable.

Definition 1.1 Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x; \theta)$, where $\theta$ is an unknown parameter. The likelihood function, $L(\theta \mid x)$, is the product of the pdf $f(x; \theta)$ evaluated at the n data points. That is,

$$L(\theta \mid x) = \prod_{i=1}^{n} f(x_i, \theta)$$


Example 1.1 A random sample of size n, $X_1, X_2, \ldots, X_n$, is taken from a $B(1, p)$ distribution with observed values $x_1, x_2, \ldots, x_n$. Determine the maximum likelihood estimate of the parameter p.

Answer: $\hat{p} = \bar{x}$

Example 1.2 Consider a Poisson distribution with probability function

$$f(x, \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

Suppose that a random sample $x_1, x_2, \ldots, x_n$ is taken from the distribution. What is the maximum likelihood estimate of $\lambda$?

Answer: $\hat{\lambda} = \bar{x}$
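As a quick numerical check of Example 1.2 (a sketch added to these notes, not from the original text; the counts below are made-up data), the Poisson log-likelihood can be evaluated on a grid, and its maximizer agrees with the closed-form MLE $\hat{\lambda} = \bar{x}$:

```python
import numpy as np
from math import lgamma

# Hypothetical Poisson counts, used only to illustrate the method.
x = np.array([3, 1, 4, 2, 2, 5, 3, 0, 2, 3])

def poisson_log_likelihood(lam, data):
    """ln L(lambda | x) = sum_i [-lambda + x_i ln(lambda) - ln(x_i!)]."""
    return sum(-lam + xi * np.log(lam) - lgamma(xi + 1) for xi in data)

# Evaluate ln L on a fine grid and locate its maximum.
grid = np.linspace(0.1, 10.0, 2000)
loglik = [poisson_log_likelihood(l, x) for l in grid]
lam_hat = grid[int(np.argmax(loglik))]

print(f"grid maximizer: {lam_hat:.3f}")   # about 2.5
print(f"sample mean   : {x.mean():.3f}")  # closed-form MLE: lambda-hat = x-bar
```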

Example 1.3 It is known that the sample 8, 10.9, 11.5, 14.6, 7.9, 11.2, 13.3 comes from a population with probability function

$$f(x; \theta) = \begin{cases} \dfrac{\theta}{x^{\theta + 1}}, & x \ge 1 \\[4pt] 0, & \text{otherwise} \end{cases}$$

where $\theta > 0$. Find the maximum likelihood estimate of $\theta$.

Answer: $\hat{\theta} = \dfrac{n}{\sum_{i=1}^{n} \ln x_i} \approx 0.423$


Example 1.4 Based on the random sample $Y_1 = 16.83$, $Y_2 = 4.6$, $Y_3 = 3.2$, $Y_4 = 11.1$, $Y_5 = 23.2$, and $Y_6 = 14.5$, use the method of maximum likelihood to estimate the parameter $\theta$ in the uniform probability density function

$$f(y, \theta) = \frac{1}{\theta}, \qquad 0 \le y \le \theta$$

Answer: $\hat{\theta} = y_{\max} = 23.2$
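The sketch below illustrates the point made in the note above: for the uniform model of Example 1.4, $L(\theta) = \theta^{-n}$ when $\theta \ge y_{\max}$ and 0 otherwise, so setting a derivative to zero is useless and the MLE is an order statistic. (Illustrative code using the data of Example 1.4.)

```python
import numpy as np

y = np.array([16.83, 4.6, 3.2, 11.1, 23.2, 14.5])  # data from Example 1.4

def uniform_likelihood(theta, data):
    """L(theta) = theta^(-n) if all observations lie in [0, theta], else 0."""
    return theta ** (-len(data)) if data.max() <= theta else 0.0

# The likelihood is zero for theta < y_max and strictly decreasing above it,
# so it is maximized exactly at theta = y_max (an order statistic).
for theta in [20.0, 23.2, 25.0, 30.0]:
    print(f"theta = {theta:5.1f}  L(theta) = {uniform_likelihood(theta, y):.3e}")
```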

Example 1.5 A random sample of size n is taken from the probability density function

$$f(y, \theta) = 2\theta^2 y, \qquad 0 \le y \le \frac{1}{\theta}$$

Find an expression for $\hat{\theta}$, the maximum likelihood estimator for $\theta$.

Answer: $\hat{\theta} = \dfrac{1}{y_{\max}}$

Note: Finding MLEs When More Than One Parameter is Unknown

If a family of probability models is indexed by two or more unknown parameters, say $\theta_1, \theta_2, \ldots, \theta_k$, finding MLEs for the $\theta_i$'s requires the solution of a set of k simultaneous equations. If k = 2, for example, we would need to solve the system

$$\frac{\partial \ln L(\theta_1, \theta_2)}{\partial \theta_1} = 0, \qquad \frac{\partial \ln L(\theta_1, \theta_2)}{\partial \theta_2} = 0$$


Example 1.6 Suppose a random sample $x_1, x_2, \ldots, x_n$ is taken from a normal distribution $N(\mu, \sigma^2)$. Find the maximum likelihood estimators for $\mu$ and $\sigma^2$.

Answer: $\hat{\mu} = \bar{x}$, $\qquad \hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$
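A minimal sketch of the result in Example 1.6 (the sample values below are made up). Note that the MLE of $\sigma^2$ divides by n, not by n − 1 as the usual unbiased sample variance does:

```python
import numpy as np

x = np.array([4.9, 5.6, 5.1, 4.7, 5.3, 5.8, 5.0])  # hypothetical sample

mu_hat = x.mean()
# MLE of sigma^2 divides by n (ddof=0), not by n - 1.
sigma2_hat = x.var(ddof=0)   # same as ((x - mu_hat)**2).sum() / len(x)

print(f"mu-hat     = {mu_hat:.4f}")
print(f"sigma2-hat = {sigma2_hat:.4f}")
```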

Theorem 1.1 Let $\hat{\theta} = \hat{\theta}(x)$ be the MLE of $\theta$ on the basis of the observed values $x_1, x_2, \ldots, x_n$ of the random sample $X_1, X_2, \ldots, X_n$ from the probability function $f(x; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$. Also let $\theta^* = g(\theta)$ be a one-to-one function defined on $\Omega$ onto $\Omega^* \subseteq \mathbb{R}$. Then the MLE of $\theta^*$, $\hat{\theta}^*(x)$, is given by $\hat{\theta}^*(x) = g(\hat{\theta}(x))$.

Example 1.7 Given that the MLE of p is $\bar{x}$, what is the MLE of $p(1 - p)$?

Answer: $\bar{x}(1 - \bar{x})$
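Theorem 1.1 is what lets us answer Example 1.7 without any new maximization. A small illustration (the Bernoulli sample below is hypothetical):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])  # hypothetical Bernoulli sample

p_hat = x.mean()                 # MLE of p (Example 1.1)
var_hat = p_hat * (1 - p_hat)    # MLE of p(1-p) by the invariance property

print(f"p-hat      = {p_hat:.2f}")     # 0.70
print(f"p(1-p) MLE = {var_hat:.4f}")   # 0.2100
```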

1.2.2 Method of Moments

A second procedure for estimating parameters is the method of moments. The method of moments is often more tractable than the method of maximum likelihood in situations where the underlying probability model has multiple parameters.

The general idea behind the method of moments is to equate population moments, which are defined in terms of expected values, to the corresponding sample moments. The population moments will be functions of the unknown parameters. These equations are then solved to yield estimators of the unknown parameters.

Suppose that X is a continuous random variable whose pdf is a function of k unknown parameters, $\theta_1, \ldots, \theta_k$. The first k moments of X, if they exist, are given by the integrals

$$E(X^j) = \int_{-\infty}^{\infty} x^j f(x; \theta_1, \ldots, \theta_k)\, dx, \qquad j = 1, 2, \ldots, k$$

Corresponding to each population (or theoretical) moment $E(X^j)$ is a sample moment

$$\frac{1}{n} \sum_{i=1}^{n} x_i^j$$

Intuitively, the jth sample moment is an approximation to the jth population moment.
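A quick simulation (illustrative, not from the original notes) showing sample moments approximating the corresponding population moments, here for an exponential distribution with rate $\lambda$, where $E(X^j) = j!/\lambda^j$:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
lam = 2.0                                    # rate of an exponential distribution
x = rng.exponential(scale=1/lam, size=100_000)

# Compare the first three sample moments with E(X^j) = j! / lambda^j.
for j in (1, 2, 3):
    sample_moment = np.mean(x ** j)
    population_moment = factorial(j) / lam ** j
    print(f"j={j}: sample={sample_moment:.4f}  population={population_moment:.4f}")
```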

Definition 1.2 Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x; \theta_1, \ldots, \theta_k)$. The method of moments estimates $\hat{\theta}_1, \ldots, \hat{\theta}_k$ for the model's unknown parameters are the solutions of the k simultaneous equations

$$\int_{-\infty}^{\infty} x \, f_X(x; \theta_1, \ldots, \theta_k)\, dx = \frac{1}{n} \sum_{i=1}^{n} x_i$$

$$\int_{-\infty}^{\infty} x^2 f_X(x; \theta_1, \ldots, \theta_k)\, dx = \frac{1}{n} \sum_{i=1}^{n} x_i^2$$

$$\vdots$$

$$\int_{-\infty}^{\infty} x^k f_X(x; \theta_1, \ldots, \theta_k)\, dx = \frac{1}{n} \sum_{i=1}^{n} x_i^k$$


Example 1.8 Suppose that $Y_1 = 0.56$, $Y_2 = 0.30$, $Y_3 = 0.48$, $Y_4 = 0.89$, and $Y_5 = 0.33$ is a random sample of size 5 from the pdf

$$f(y; \theta) = \theta y^{\theta - 1}, \qquad 0 \le y \le 1$$

Find the method of moments estimate for $\theta$.

Answer: $\hat{\theta} = \dfrac{\bar{y}}{1 - \bar{y}}$
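A sketch of the computation in Example 1.8: matching $E(Y) = \theta/(\theta + 1)$ to $\bar{y}$ and solving gives $\hat{\theta} = \bar{y}/(1 - \bar{y})$.

```python
import numpy as np

y = np.array([0.56, 0.30, 0.48, 0.89, 0.33])  # data from Example 1.8

# E(Y) = theta/(theta+1) = y-bar  =>  theta-hat = y-bar / (1 - y-bar).
y_bar = y.mean()
theta_hat = y_bar / (1 - y_bar)

print(f"y-bar     = {y_bar:.3f}")      # 0.512
print(f"theta-hat = {theta_hat:.3f}")  # about 1.049
```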

Example 1.9 Let $x_1, x_2, \ldots, x_n$ be a random sample of size n from the exponential distribution with pdf

$$f(x, \theta) = \theta e^{-\theta x}, \qquad x > 0, \quad \theta \in \Omega = (0, \infty)$$

Find the method of moments estimate and the maximum likelihood estimate for $\theta$. Which is the better estimate of the parameter $\theta$?

Answer: $\hat{\theta} = \dfrac{1}{\bar{x}}$ (both methods give the same estimate)

Example 1.10 Suppose that $Y_1 = 4.6$, $Y_2 = 3.2$, $Y_3 = 8.1$, $Y_4 = 7.4$, and $Y_5 = 5.8$ is a random sample of size 5 from the two-parameter uniform pdf

$$f(y; \theta_1, \theta_2) = \frac{1}{2\theta_2}, \qquad \theta_1 - \theta_2 \le y \le \theta_1 + \theta_2$$

Use the method of moments to calculate $\hat{\theta}_1$ and $\hat{\theta}_2$.

Answer: $\hat{\theta}_1 = \bar{y} = 5.82$ and $\hat{\theta}_2 = \sqrt{3 \cdot \dfrac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2} \approx 3.103$
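The arithmetic of Example 1.10, as a sketch:

```python
import numpy as np

y = np.array([4.6, 3.2, 8.1, 7.4, 5.8])  # data from Example 1.10

# Matching the first two moments of U(theta1 - theta2, theta1 + theta2):
#   E(Y) = theta1,  Var(Y) = (2*theta2)^2 / 12 = theta2^2 / 3.
theta1_hat = y.mean()
theta2_hat = np.sqrt(3 * y.var(ddof=0))   # uses the second *central* sample moment

print(f"theta1-hat = {theta1_hat:.3f}")   # 5.820
print(f"theta2-hat = {theta2_hat:.3f}")   # about 3.103
```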


1.3 Measures of Quality of Estimators

1.3.1 Unbiased Estimators

Definition 1.3 Let $X_1, X_2, \ldots, X_n$ be a random sample with probability function $f(x; \theta)$, where $\theta \in \Omega \subseteq \mathbb{R}$, and let $U = U(X_1, \ldots, X_n)$ be an estimate of $\theta$. The estimate U is said to be unbiased if $E(U) = \theta$ for all $\theta \in \Omega$.

Note: Any good estimate should be "close" to the value it is estimating. Unbiasedness means essentially that the average value of the estimate will be close to the true parameter value. Although it is desirable for an estimate to be unbiased, there may be occasions when we would prefer a biased estimate.

Example 1.11 A random sample of size n, $X_1, X_2, \ldots, X_n$, is taken from a $B(1, p)$ distribution. Determine whether

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

is an unbiased estimate of p.

Answer: $\bar{X}$ is an unbiased estimate of p.


Example 1.12 Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda$. Determine whether

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

is an unbiased estimate of $\lambda$.

Answer: $\bar{X}$ is an unbiased estimate of $\lambda$.

Example 1.13 Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with pdf

$$f(x, \theta) = \frac{1}{\theta} e^{-x/\theta}, \qquad x > 0, \quad \theta \in \Omega = (0, \infty)$$

Determine whether $\bar{X}$ is an unbiased estimate of $\theta$.

Answer: $\bar{X}$ is an unbiased estimate of $\theta$.

Example 1.14 Let $X_1, X_2, \ldots, X_n$ be uniformly distributed on the interval 0 to a. Show that the moment estimator of a is $\hat{a} = 2\bar{X}$. Is this an unbiased estimator?

Answer: $\hat{a} = 2\bar{X}$ is an unbiased estimator, since $E(\hat{a}) = 2E(\bar{X}) = 2(a/2) = a$.
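A Monte Carlo check of Example 1.14 (illustrative; the endpoint a, sample size, and replication count are arbitrary choices): averaging $\hat{a} = 2\bar{X}$ over many samples lands near the true a, as unbiasedness predicts.

```python
import numpy as np

rng = np.random.default_rng(7)
a, n, reps = 10.0, 25, 50_000          # true endpoint, sample size, replications

# Draw many samples from U(0, a) and average the estimator a-hat = 2*X-bar.
estimates = 2 * rng.uniform(0, a, size=(reps, n)).mean(axis=1)

print(f"mean of a-hat over {reps} samples: {estimates.mean():.3f}")  # close to 10.0
```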

1.3.2 Variance of a Point Estimator

The variance of a random variable measures the variability of the random variable about its expected value. Hence it is intuitively appealing to require an estimate to be unbiased and to have small variance. If the variance is small, then the value of the random variable tends to be close to its mean, which for an unbiased estimate means close to the true value of the parameter. When there are two unbiased estimates of $\theta$, $\hat{\theta}_1$ and $\hat{\theta}_2$, where $\operatorname{Var}(\hat{\theta}_1) \le \operatorname{Var}(\hat{\theta}_2)$, we would prefer $\hat{\theta}_1$ to $\hat{\theta}_2$.

In the case of two estimates $\hat{\theta}_3$ and $\hat{\theta}_4$, where $\hat{\theta}_3$ is unbiased but $\hat{\theta}_4$ is not, and $\operatorname{Var}(\hat{\theta}_3) > \operatorname{Var}(\hat{\theta}_4)$, the decision is not clear. While on the average $\hat{\theta}_3$ will be close to $\theta$, its larger variance means that considerable deviations from $\theta$ would not be surprising. $\hat{\theta}_4$, on the other hand, will be off target on the average, and yet might tend to be closer to $\theta$ than $\hat{\theta}_3$.

Definition 1.4 The unbiased estimate $U = U(X_1, \ldots, X_n)$ of $\theta$ is said to be Uniformly Minimum Variance Unbiased (UMVU) if, for any other unbiased estimate $V = V(X_1, \ldots, X_n)$, it holds that $\operatorname{Var}(U) \le \operatorname{Var}(V)$ for all $\theta \in \Omega$.

The process of seeking a UMVU estimate is facilitated by the Cramer-Rao inequality in the next theorem.

Note: Suppose a random sample of size n is taken from a probability distribution $f(x; \theta)$, where $\theta$ is an unknown parameter. Associated with $f(x; \theta)$ is a theoretical limit below which the variance of any unbiased estimator for $\theta$ cannot fall. That limit is the Cramer-Rao lower bound. If the variance of a given $\hat{\theta}$ is equal to the Cramer-Rao lower bound, we know that that estimator is optimal in the sense that no other unbiased $\hat{\theta}$ can estimate $\theta$ with greater precision.


Theorem 1.2: Cramer-Rao Inequality

Let $X_1, X_2, \ldots, X_n$ be a random sample with pdf $f(x; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, having continuous first-order and second-order partial derivatives at all but a finite set of points. Suppose the set of x's for which $f(x; \theta) \ne 0$ does not depend on $\theta$. Let $\hat{\theta} = h(X_1, X_2, \ldots, X_n)$ be any unbiased estimator for $\theta$. Then

$$\operatorname{Var}(\hat{\theta}) \ge \left\{ n E\left[ \left( \frac{\partial \ln f(x, \theta)}{\partial \theta} \right)^{2} \right] \right\}^{-1} = \left\{ -n E\left[ \frac{\partial^2 \ln f(x, \theta)}{\partial \theta^2} \right] \right\}^{-1}$$

Example 1.15 Suppose the random variables $X_1, X_2, \ldots, X_n$ denote the number of successes in each of n independent trials, where p = P(success occurs at any given trial) is an unknown parameter. The probability function for this distribution is

$$f(k; p) = p^{k} (1 - p)^{1 - k}, \qquad k = 0, 1; \quad 0 < p < 1$$

Show that

$$\hat{p} = \frac{\sum_{i=1}^{n} X_i}{n}$$

is a UMVU estimator of p.

Answer: $\operatorname{Var}(\hat{p})$ = Cramer-Rao lower bound.
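A simulation sketch of Example 1.15 (the parameter values are arbitrary choices): the variance of $\hat{p}$ matches the Cramer-Rao bound, which for the Bernoulli model works out to $p(1-p)/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.3, 40, 100_000

# Simulated variance of p-hat = X-bar versus the Cramer-Rao lower bound.
p_hats = rng.binomial(1, p, size=(reps, n)).mean(axis=1)
cr_bound = p * (1 - p) / n     # 1 / (n * E[(d/dp ln f)^2]) for Bernoulli(p)

print(f"simulated Var(p-hat): {p_hats.var():.6f}")
print(f"Cramer-Rao bound    : {cr_bound:.6f}")   # both about 0.00525
```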


Example 1.16 Let $X_1, X_2, \ldots, X_n$ be a random sample from

$$f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}, \qquad x > 0$$

Compare the Cramer-Rao lower bound for $f(x; \theta)$ with the variance of the MLE for $\theta$, $\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}$. Is $\bar{X}$ the best estimator for $\theta$?

Answer: $\operatorname{Var}(\bar{X})$ = Cramer-Rao lower bound, so $\hat{\theta} = \bar{X}$ is the best (minimum-variance unbiased) estimator for $\theta$.

Example 1.17 If $X_1, X_2, \ldots, X_n$ is a random sample from the Poisson distribution $P(\lambda)$, show that the sample mean $\bar{X}$ is the UMVU estimate of $\lambda$.

Answer: $\operatorname{Var}(\bar{X})$ = Cramer-Rao lower bound, so $\bar{X}$ is the UMVU estimate of $\lambda$.

1.4 Interval Estimation

1.4.1 Introduction

Point estimates, no matter how they are obtained, do not provide any indication of their inherent precision. The usual way to quantify the amount of uncertainty in an estimator is to construct a confidence interval. In principle, confidence intervals are ranges of numbers that have a high probability of "containing" the unknown parameter as an interior point. By looking at the width of a confidence interval, we can get a good sense of the estimator's precision.

Definition Let $X_1, \ldots, X_n$ be a random sample with pdf $f(x; \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$. Then:

i) A random interval is an interval whose end-points are random variables.

ii) A confidence interval for $\theta$ with confidence coefficient $1 - \alpha$ (where $0 < \alpha < 1$) is a random interval whose end-points are statistics $L(X_1, \ldots, X_n)$ and $U(X_1, \ldots, X_n)$, such that $L(X_1, \ldots, X_n) \le U(X_1, \ldots, X_n)$ and

$$P\left[ L(X_1, \ldots, X_n) \le \theta \le U(X_1, \ldots, X_n) \right] = 1 - \alpha \quad \text{for all } \theta \in \Omega$$

This form of confidence interval is known as a two-sided confidence interval.

iii) The statistic $L(X_1, \ldots, X_n)$ is called a lower confidence limit for $\theta$ with confidence coefficient $1 - \alpha$ if the interval $[L(X_1, \ldots, X_n), \infty)$ is a confidence interval for $\theta$ with confidence coefficient $1 - \alpha$. Likewise, $U(X_1, \ldots, X_n)$ is called the upper confidence limit for $\theta$ with confidence coefficient $1 - \alpha$ if the interval $(-\infty, U(X_1, \ldots, X_n)]$ is a confidence interval for $\theta$ with confidence coefficient $1 - \alpha$. These are known as one-sided confidence intervals and can be summarized as follows:

- Lower confidence interval: $P\left[ L(X_1, \ldots, X_n) \le \theta \right] = 1 - \alpha$

- Upper confidence interval: $P\left[ \theta \le U(X_1, \ldots, X_n) \right] = 1 - \alpha$

The following outlines the construction of a confidence interval:

i) Think of a random variable that involves the parameter $\theta$ and the random variables $X_1, \ldots, X_n$, preferably through a sufficient statistic, and whose distribution is (exactly or at least approximately) known.

ii) Determine suitable points a < b such that the random variable in step (i) lies in [a, b] with probability $1 - \alpha$.

iii) In the expression from step (ii), rearrange the terms so that we get an interval whose end-points are statistics and which contains $\theta$.

Remark: On the basis of the observed values $x_1, \ldots, x_n$ of $X_1, \ldots, X_n$, construct the interval with end-points $L(x_1, \ldots, x_n)$ and $U(x_1, \ldots, x_n)$; denote it by $[L_1, U_1]$. Repeat the underlying random experiment independently another n times and form $[L_2, U_2]$. Repeat this process a large number of times N, independently each time, and let $[L_N, U_N]$ be the corresponding interval. If $[L(X_1, \ldots, X_n), U(X_1, \ldots, X_n)]$ is a confidence interval for $\theta$ with confidence coefficient $1 - \alpha$, then approximately $100(1 - \alpha)\%$ of the above N intervals will cover $\theta$, no matter what its value is.
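The remark above can be checked by simulation. The sketch below (illustrative; the normal model and all constants are arbitrary choices) builds N two-sided intervals $\bar{x} \pm z\sigma/\sqrt{n}$ and counts how many cover the true mean:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, N = 5.0, 2.0, 30, 10_000   # true mean, known sd, sample size, intervals
z = 1.96                                  # approximately 95% confidence

# Build N confidence intervals and count how many cover mu.
samples = rng.normal(mu, sigma, size=(N, n))
x_bars = samples.mean(axis=1)
half_width = z * sigma / np.sqrt(n)
covered = (x_bars - half_width <= mu) & (mu <= x_bars + half_width)

print(f"empirical coverage: {covered.mean():.3f}")  # close to 0.95
```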

1.4.2 Confidence Intervals for Means

Suppose we are willing to accept as a fact that the (numerical) outcome X of a random experiment is a random variable that has a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. To get some information about this distribution, we conduct the experiment (under identical conditions) independently n times. Let $X_1, \ldots, X_n$ be random variables denoting the outcomes of the n repetitions. Consider the MLE of $\mu$, namely $\hat{\mu} = \bar{X}$. We already know that

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

and therefore

$$P\left( -2 \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le 2 \right) = 0.954$$

However, the events

$$-2 \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le 2, \tag{1}$$

$$-\frac{2\sigma}{\sqrt{n}} \le \bar{X} - \mu \le \frac{2\sigma}{\sqrt{n}}, \tag{2}$$

$$\bar{X} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{2\sigma}{\sqrt{n}} \tag{3}$$

are equivalent. Thus these events have the same probability. That is,

$$P\left( \bar{X} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{2\sigma}{\sqrt{n}} \right) = 0.954$$

Since $\sigma$ is a known number, each of the random variables $\bar{X} - \frac{2\sigma}{\sqrt{n}}$ and $\bar{X} + \frac{2\sigma}{\sqrt{n}}$ is a statistic. The interval $\left( \bar{X} - \frac{2\sigma}{\sqrt{n}},\ \bar{X} + \frac{2\sigma}{\sqrt{n}} \right)$ is a random interval, and the probability is 0.954 that this random interval includes the unknown parameter $\mu$.

Now suppose the experiment yields $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$. Then the sample value of $\bar{X}$ is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, a known number. Since $\sigma$ is known, the interval $\left( \bar{x} - \frac{2\sigma}{\sqrt{n}},\ \bar{x} + \frac{2\sigma}{\sqrt{n}} \right)$ has known end points.

The number 0.954 is called the confidence coefficient. The confidence coefficient is equal to the probability that the random interval includes the parameter. Of course, we can also obtain an 80%, a 90%, or a 99% confidence interval for $\mu$ by replacing the constant 2 with 1.282, 1.645, or 2.576, respectively. A statistical inference of this sort is an example of interval estimation of a parameter.
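These constants are standard normal quantiles; a short check (assuming SciPy is available) reproduces them via $z = \Phi^{-1}\left( \frac{1 + c}{2} \right)$, where c is the confidence coefficient:

```python
from scipy.stats import norm

# Two-sided interval with confidence coefficient c uses z = Phi^{-1}((1 + c) / 2).
for c in (0.80, 0.90, 0.954, 0.99):
    print(f"c = {c:.3f} -> z = {norm.ppf((1 + c) / 2):.3f}")
# c = 0.800 -> z = 1.282;  0.900 -> 1.645;  0.954 -> 1.995 (about 2);  0.990 -> 2.576
```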

Note: The interval estimate of $\mu$ is found by taking a good (here the ML) estimate $\bar{x}$ of $\mu$ and adding and subtracting $k \sigma / \sqrt{n}$, where k depends on the confidence coefficient. Let us now construct confidence intervals in general, as follows:

Case 1: $\sigma^2$ known

Let $X_1, X_2, \ldots, X_n$ denote a random sample of size n from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. Consider $\bar{X}$, the MLE of $\mu$. Then $\bar{X} \sim N\left( \mu, \frac{\sigma^2}{n} \right)$ and

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

For the probability $1 - \alpha$, we can find a number $z_{\alpha/2}$ such that

$$P\left( -z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le z_{\alpha/2} \right) = 1 - \alpha$$

$$P\left( -z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha$$

$$P\left( \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha$$

A $(1 - \alpha) \cdot 100\%$ confidence interval estimate of $\mu$ is

$$\left( \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$

Example 1.18 Suppose $Y_1 = 6.5$, $Y_2 = 9.2$, $Y_3 = 9.9$, and $Y_4 = 12.4$ is a random sample of size 4 from a normal distribution with mean $\mu$ and variance 0.64. What values of $\mu$ are believable in light of the four data points?

Answer: (8.72, 10.28)
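A sketch reproducing the quoted interval (the confidence level is not stated in the example; the answer is consistent with $z \approx 1.96$, i.e. approximately a 95% interval, which is an inference rather than something stated in the notes):

```python
import numpy as np

y = np.array([6.5, 9.2, 9.9, 12.4])
sigma = np.sqrt(0.64)          # known standard deviation
z = 1.96                       # the quoted answer matches z close to 1.96

half_width = z * sigma / np.sqrt(len(y))
lo, hi = y.mean() - half_width, y.mean() + half_width
print(f"({lo:.2f}, {hi:.2f})")  # (8.72, 10.28)
```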


Theorems

(i) If $X \sim N(\mu, \sigma^2)$, $\sigma^2 > 0$, then $W = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.

(ii) If $X \sim N(\mu, \sigma^2)$, $\sigma^2 > 0$, then $V = \dfrac{(X - \mu)^2}{\sigma^2} \sim \chi^2(1)$.

t-distribution: Let $W \sim N(0, 1)$ and $V \sim \chi^2(r)$, with W and V independent. A new variable T is defined by...

