Title | Summary: Fundamentals of Mathematical Statistics
---|---
Course | Intro Prob Solv & Programming
Institution | University of Texas at Austin
Fundamentals of Mathematical Statistics: Definitions/Lemmas by Roman Böhringer
ETH Zürich, January 2, 2021

Probability Theory

Conditional Probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Bayes' Theorem: $P(B \mid A) = P(A \mid B)\,\frac{P(B)}{P(A)}$

Law of Total Probability: For a partition $\{B_j\}$ ($B_j \cap B_k = \emptyset$ for all $j \neq k$ and $P(\cup_j B_j) = 1$):
$$P(A) = \sum_j P(A \mid B_j)\,P(B_j)$$

Marginal Density:
$$f_X(\cdot) = \int f_{X,Y}(\cdot, y)\,dy, \qquad f_X(x) = \int f_X(x \mid y)\,f_Y(y)\,dy$$

Conditional Density:
$$f_X(x \mid y) := \frac{f_{X,Y}(x, y)}{f_Y(y)}, \qquad f_Y(y \mid x) = f_X(x \mid y)\,\frac{f_Y(y)}{f_X(x)}$$

Conditional Expectation:
$$E[g(X, Y) \mid Y = y] := \int f_X(x \mid y)\,g(x, y)\,dx$$

Iterated Expectations Lemma: $E[E[g(X, Y) \mid Y]] = E\,g(X, Y)$

Law of Total Variance: $\mathrm{var}(Y) = \mathrm{var}(E(Y \mid Z)) + E\,\mathrm{var}(Y \mid Z)$

Distributions

Multinomial Distribution:
$$P(N_1 = n_1, \ldots, N_k = n_k) = \binom{n}{n_1 \cdots n_k}\,p_1^{n_1} \cdots p_k^{n_k}, \qquad \binom{n}{n_1 \cdots n_k} := \frac{n!}{n_1! \cdots n_k!}$$

Poisson Distribution: $P(X = x) = e^{-\lambda}\,\frac{\lambda^x}{x!}$

Normal Distribution: density $\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$

Sum of Independent Normal / Poisson Variables: For $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ with $X$ and $Y$ independent and $Z = X + Y$, then $Z \sim \mathcal{N}(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$. Likewise $X \sim \mathcal{P}(\lambda)$, $Y \sim \mathcal{P}(\mu)$ $\Rightarrow$ $Z \sim \mathcal{P}(\lambda + \mu)$.

Chi-Square Distribution: Let $Z_1, \ldots, Z_p$ be i.i.d. $\mathcal{N}(0, 1)$-distributed and define the $p$-vector $Z := (Z_1, \ldots, Z_p)^T$. Then $Z$ is $\mathcal{N}(0, I)$-distributed, and the $\chi^2$-distribution with $p$ degrees of freedom is defined by
$$\|Z\|_2^2 := \sum_{j=1}^p Z_j^2 \sim \chi_p^2$$

Distribution of Maximum: $Z := \max\{X_1, X_2\}$ with $X_1$, $X_2$ independent, each having distribution function $F$ and density $f$:
$$f_Z(z) = 2\,F(z)\,f(z)$$

Exponential Families: A $k$-dimensional exponential family is a family of distributions with densities of the form
$$p_\theta(x) = \exp\Big[\sum_{j=1}^k c_j(\theta)\,T_j(x) - d(\theta)\Big]\,h(x)$$
The family is in canonical form if
$$p_\theta(x) = \exp\Big[\sum_{j=1}^k \theta_j\,T_j(x) - d(\theta)\Big]\,h(x), \qquad d(\theta) = \log \int \exp\Big(\sum_{j=1}^k \theta_j\,T_j(x)\Big)\,h(x)\,d\nu(x)$$

Estimation

Estimator: An estimator $T(X)$ is a function $T(\cdot)$ evaluated at the observations $X$. The function $T(\cdot)$ is not allowed to depend on unknown parameters.

Empirical Distribution Function:
$$\hat{F}_n(\cdot) := \frac{\#\{X_i \leq \cdot,\ 1 \leq i \leq n\}}{n}$$

Method of Moments: Given the first $p$ moments of $X$:
$$\mu_j(\theta) = E_\theta X^j = \int x^j\,dF_\theta(x), \qquad j = 1, \ldots, p$$
And the map $m$ with inverse $m^{-1}$:
$$m(\theta) = [\mu_1(\theta), \ldots, \mu_p(\theta)]$$
We calculate the empirical moments
$$\hat{\mu}_j := \frac{1}{n}\sum_{i=1}^n X_i^j = \int x^j\,d\hat{F}_n(x), \qquad j = 1, \ldots, p$$
and plug in:
$$\hat{\theta} := m^{-1}(\hat{\mu}_1, \ldots, \hat{\mu}_p)$$

Maximum Likelihood Estimator: Given the likelihood function
$$L_X(\vartheta) := \prod_{i=1}^n p_\vartheta(X_i), \qquad \vartheta \in \Theta$$
the maximum likelihood estimator is
$$\hat{\theta} = \arg\max_{\vartheta \in \Theta} \log L_X(\vartheta) = \arg\max_{\vartheta \in \Theta} \sum_{i=1}^n \log p_\vartheta(X_i)$$
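As a concrete illustration (not part of the original summary), here is a minimal pure-Python sketch of the method of moments and a grid-search MLE for i.i.d. Poisson data; for the Poisson family $\mu_1(\lambda) = \lambda$, so both estimators reduce to the sample mean. The true parameter, sample size, and grid are illustrative choices.

```python
import math
import random

random.seed(0)

def poisson_sample(lam):
    # Inversion sampling from the Poisson pmf P(X = x) = e^{-lam} lam^x / x!.
    u, x, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        x += 1
        p *= lam / x
        cdf += p
    return x

data = [poisson_sample(3.0) for _ in range(2000)]

# Method of moments: solve mu_1(theta) = sample mean for theta.
mom = sum(data) / len(data)

# MLE by grid search; the x! term is constant in lambda and dropped.
def loglik(lam):
    return sum(data) * math.log(lam) - len(data) * lam

mle = max((0.5 + 0.01 * k for k in range(500)), key=loglik)
print(mom, mle)  # both close to the true lambda = 3
```

Here the two estimators agree up to the grid resolution, as the Poisson likelihood is maximized exactly at the sample mean.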
Sufficiency

Sufficiency: A given map $S: \mathcal{X} \to \mathcal{Y}$ is called sufficient for $\theta \in \Theta$ if for all $\theta$ and all possible $s$, the following conditional distribution does not depend on $\theta$:
$$P_\theta(X \in \cdot \mid S(X) = s)$$

Factorization Theorem of Neyman (PR): Given densities $p_\theta$, $S$ is sufficient if and only if there are functions $g_\theta(\cdot) \geq 0$ and $h(\cdot) \geq 0$ such that we can write:
$$p_\theta(x) = g_\theta(S(x))\,h(x) \qquad \forall x, \theta$$

Sufficiency for Exponential Families: For a $k$-dimensional exponential family, the $k$-dimensional statistic $S(X) = (T_1(X), \ldots, T_k(X))$ is sufficient for $\theta$. For $n$ i.i.d. samples, the following statistic is sufficient:
$$S(X) = \Big(\frac{1}{n}\sum_{i=1}^n T_1(X_i), \ldots, \frac{1}{n}\sum_{i=1}^n T_k(X_i)\Big)$$

Minimal Sufficiency: Two likelihoods $L_x(\theta)$ and $L_{\tilde{x}}(\theta)$ are proportional at $(x, \tilde{x})$ if
$$L_x(\theta) = L_{\tilde{x}}(\theta)\,c(x, \tilde{x}) \qquad \forall \theta$$
for some constant $c(x, \tilde{x})$. A sufficient statistic $S$ is called minimal sufficient if $S(x) = S(\tilde{x})$ for all $x$ and $\tilde{x}$ where the likelihoods are proportional.

Completeness: A sufficient statistic $S$ is called complete if (where $h$ is a function not depending on $\theta$):
$$E_\theta h(S) = 0 \ \ \forall \theta \quad \Rightarrow \quad h(S) = 0, \ P_\theta\text{-a.s.} \ \forall \theta$$

Completeness for Exponential Families: Given a $k$-dimensional exponential family and
$$\mathcal{C} := \{(c_1(\theta), \ldots, c_k(\theta)) : \theta \in \Theta\} \subset \mathbb{R}^k$$
If $\mathcal{C}$ is truly $k$-dimensional, $S := (T_1, \ldots, T_k)$ is complete.
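A small exact check (illustrative, not from the summary) of why $S = \sum_i X_i$ is sufficient for i.i.d. Bernoulli data: the conditional law of $X$ given $S = s$ is uniform over the $\binom{n}{s}$ arrangements, whatever the success probability $p$ is. The sample size and outcome below are arbitrary choices.

```python
from fractions import Fraction
from itertools import product

n, s = 4, 2

def conditional_prob(x, p):
    # P_p(X = x | sum(X) = s) for i.i.d. Bernoulli(p) coordinates, exactly.
    num = p**sum(x) * (1 - p)**(n - sum(x))
    den = sum(p**sum(y) * (1 - p)**(n - sum(y))
              for y in product([0, 1], repeat=n) if sum(y) == s)
    return num / den

x = (1, 0, 1, 0)  # any outcome with sum(x) == s
probs = [conditional_prob(x, Fraction(p, 10)) for p in (2, 5, 9)]
print(probs)  # identical for every p: 1 / C(4, 2) = 1/6
```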
Fisher Information

Score Function:
$$s_\theta(x) := \frac{d}{d\theta} \log p_\theta(x) = \frac{\dot{p}_\theta(x)}{p_\theta(x)}, \qquad E_\theta s_\theta(X) = 0$$
For $n$ i.i.d. observations:
$$s_\theta(x) = \sum_{i=1}^n s_\theta(x_i)$$

Fisher Information:
$$I(\theta) := \mathrm{var}_\theta(s_\theta(X)), \qquad I(\theta) = -E_\theta\,\dot{s}_\theta(X)$$
For $n$ i.i.d. observations: $I_n(\theta) = n\,I(\theta)$.

Expectation/Covariance of Sufficient Statistic for Exponential Families: Given an exponential family in canonical form and
$$\dot{d}(\theta) := \frac{\partial}{\partial \theta} d(\theta), \qquad \ddot{d}(\theta) := \frac{\partial^2}{\partial \theta\,\partial \theta'} d(\theta) = \Big[\frac{\partial^2}{\partial \theta_{j_1}\,\partial \theta_{j_2}} d(\theta)\Big]$$
$$T(X) := \begin{pmatrix} T_1(X) \\ \vdots \\ T_k(X) \end{pmatrix}, \qquad E_\theta T(X) := \begin{pmatrix} E_\theta T_1(X) \\ \vdots \\ E_\theta T_k(X) \end{pmatrix}, \qquad \mathrm{Cov}_\theta(T(X)) := E_\theta T(X)T'(X) - E_\theta T(X)\,E_\theta T'(X)$$
We have (PR):
$$E_\theta T(X) = \dot{d}(\theta), \qquad \mathrm{Cov}_\theta(T(X)) = \ddot{d}(\theta)$$
If the family is not in canonical form:
$$E_\theta T(X) = \frac{\dot{d}(\theta)}{\dot{c}(\theta)}, \qquad \mathrm{var}_\theta(T(X)) = \frac{1}{[\dot{c}(\theta)]^2}\Big(\ddot{d}(\theta) - \frac{\dot{d}(\theta)}{\dot{c}(\theta)}\,\ddot{c}(\theta)\Big)$$

Fisher Information for Exponential Families:
$$I(\theta) = \ddot{d}(\theta) - \frac{\dot{d}(\theta)}{\dot{c}(\theta)}\,\ddot{c}(\theta)$$
And for $\gamma = c(\theta)$:
$$I_0(\gamma) = \ddot{d}_0(\gamma) = \frac{I(\theta)}{[\dot{c}(\theta)]^2}$$

Higher-Dimensional Extensions (Score Vector & Fisher Information Matrix):
$$s_\theta(\cdot) := \begin{pmatrix} \partial \log p_\theta / \partial \theta_1 \\ \vdots \\ \partial \log p_\theta / \partial \theta_k \end{pmatrix}, \qquad I(\theta) = E_\theta s_\theta(X)\,s_\theta'(X) = \mathrm{Cov}_\theta(s_\theta(X))$$
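For a concrete case (an illustration, not from the summary), the Bernoulli$(\theta)$ score is $s_\theta(x) = x/\theta - (1-x)/(1-\theta)$, and exact rational arithmetic verifies $E_\theta s_\theta(X) = 0$ and $I(\theta) = \mathrm{var}_\theta(s_\theta(X)) = 1/(\theta(1-\theta))$:

```python
from fractions import Fraction

# Score of Bernoulli(theta): d/dtheta [x log(theta) + (1-x) log(1-theta)].
def score(x, theta):
    return Fraction(x) / theta - Fraction(1 - x) / (1 - theta)

theta = Fraction(3, 10)
pmf = {0: 1 - theta, 1: theta}

mean_score = sum(pmf[x] * score(x, theta) for x in (0, 1))
var_score = sum(pmf[x] * score(x, theta) ** 2 for x in (0, 1)) - mean_score ** 2

print(mean_score)                               # 0, as the lemma requires
print(var_score == 1 / (theta * (1 - theta)))   # True: Fisher information
```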
Bias, Variance

Bias: $\mathrm{bias}_\theta(T) := E_\theta T - g(\theta)$. $T$ is unbiased if $\mathrm{bias}_\theta(T) = 0\ \forall \theta$.

Mean Square Error (PR):
$$\mathrm{MSE}_\theta(T) := E_\theta(T - g(\theta))^2 = \mathrm{bias}_\theta^2(T) + \mathrm{var}_\theta(T)$$

Uniform Minimum Variance Unbiased: An unbiased estimator $T^*$ is UMVU if for any other unbiased estimator $T$:
$$\mathrm{var}_\theta(T^*) \leq \mathrm{var}_\theta(T) \qquad \forall \theta$$

Conditioning on Sufficient Statistic: If $T$ is unbiased, $S$ sufficient, and $T^* := E(T \mid S)$:
$$E_\theta(T^*) = g(\theta), \qquad \mathrm{var}_\theta(T^*) \leq \mathrm{var}_\theta(T) \quad \forall \theta$$

Lehmann-Scheffé Lemma: If $T$ is an unbiased estimator of $g(\theta)$ with finite variance (for all $\theta$) and $S$ is sufficient and complete, then $T^* := E(T \mid S)$ is UMVU.

Cramér-Rao Lower Bound: If the support of $p_\theta$ does not depend on $\theta$ and $p_\theta$ is differentiable in $L_2$, then for an unbiased estimator $T$ of $g(\theta)$ (with derivative $\dot{g}(\theta)$) we have
$$\dot{g}(\theta) = \mathrm{cov}_\theta(T, s_\theta(X)), \qquad \mathrm{var}_\theta(T) \geq \frac{\dot{g}^2(\theta)}{I(\theta)} \quad \forall \theta$$

CRLB for Exponential Families: If $T$ is unbiased and reaches the CRLB, then there exist functions $c(\theta)$, $d(\theta)$, and $h(x)$ such that for all $\theta$:
$$p_\theta(x) = \exp[c(\theta)\,T(x) - d(\theta)]\,h(x), \quad x \in \mathcal{X}, \qquad g(\theta) = \dot{d}(\theta)/\dot{c}(\theta)$$

Higher-Dimensional CRLB: For an unbiased estimator $T$ of $g(\theta)$:
$$\mathrm{var}_\theta(T) \geq \dot{g}(\theta)'\,I(\theta)^{-1}\,\dot{g}(\theta)$$
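The MSE decomposition above can be checked exactly by enumeration. This sketch (an illustration, not from the summary) uses a deliberately biased shrunken-mean estimator $T = (X_1 + X_2)/3$ of $g(\theta) = \theta$ for two Bernoulli$(\theta)$ observations:

```python
from fractions import Fraction

theta = Fraction(1, 2)
outcomes = [(x1, x2) for x1 in (0, 1) for x2 in (0, 1)]

def prob(x):
    return (theta if x[0] else 1 - theta) * (theta if x[1] else 1 - theta)

def T(x):
    return Fraction(x[0] + x[1], 3)

ET = sum(prob(x) * T(x) for x in outcomes)
mse = sum(prob(x) * (T(x) - theta) ** 2 for x in outcomes)
bias = ET - theta
var = sum(prob(x) * (T(x) - ET) ** 2 for x in outcomes)

print(mse == bias**2 + var)  # True: MSE = bias^2 + variance, exactly
```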
Comparison

Risk: Given a loss function $L(\cdot, \cdot)$:
$$R(\theta, T) := E_\theta L(\theta, T(X))$$

Risk and Sufficiency: Let $S$ be sufficient for $\theta$ and $d: \mathcal{X} \to \mathcal{A}$ some decision. Then there is a randomized decision $\delta(S)$ such that
$$R(\theta, \delta(S)) = R(\theta, d) \qquad \forall \theta$$

Rao-Blackwell (PR): Let $S$ be sufficient for $\theta$, $\mathcal{A} \subset \mathbb{R}^p$ convex, and $a \mapsto L(\theta, a)$ convex for all $\theta$. For a decision $d: \mathcal{X} \to \mathcal{A}$ and $d'(s) := E(d(X) \mid S = s)$:
$$R(\theta, d') \leq R(\theta, d) \qquad \forall \theta$$

Sensitivity/Robustness: Influence function
$$l(x) := (n+1)\big(T_{n+1}(X_1, \ldots, X_n, x) - T_n(X_1, \ldots, X_n)\big), \qquad x \in \mathbb{R}$$
For $m \leq n$:
$$\epsilon(m) := \sup_{x_1^*, \ldots, x_m^*} |T(x_1^*, \ldots, x_m^*, X_{m+1}, \ldots, X_n)|$$

Breakdown Point: $\epsilon^* := \min\{m : \epsilon(m) = \infty\}/n$
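A hedged numerical sketch of the breakdown point (not from the summary): contaminate $m$ of $n$ observations with a huge value as a proxy for the supremum, and see when $|T|$ can be driven arbitrarily large. The sample mean breaks down with a single corrupted point; the median needs about half the sample.

```python
def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

def sup_after_contamination(T, m, big=1e12):
    # Replace the first m points by one huge value x* (proxy for the sup).
    return abs(T([big] * m + data[m:]))

print(sup_after_contamination(mean, 1) > 1e10)    # True: mean already unbounded
print(sup_after_contamination(median, 1) < 10)    # True: median still sane
print(sup_after_contamination(median, 4) > 1e10)  # True: median breaks near m = n/2
```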
Equivariant Statistics

Location Equivariant Statistic: For all constants $c \in \mathbb{R}$ and $x = (x_1, \ldots, x_n)$:
$$T(x_1 + c, \ldots, x_n + c) = T(x_1, \ldots, x_n) + c$$

Location Invariant Loss Function: For all constants $c \in \mathbb{R}$:
$$L(\theta + c, a + c) = L(\theta, a), \qquad (\theta, a) \in \mathbb{R}^2$$

Risk for Equivariant Statistics / Invariant Loss Functions:
$$R(\theta, T) = E_\theta L(0, T(X - \theta)) = R(0, T) = E\,L_0[T(\varepsilon)]$$

Uniform Minimum Risk Equivariant: $T$ is UMRE if
$$R(\theta, T) = \min_{d\ \mathrm{equivariant}} R(\theta, d) \quad \forall \theta, \qquad \text{equivalently} \quad R(0, T) = \min_{d\ \mathrm{equivariant}} R(0, d)$$

Maximal Invariant: A map $Y: \mathbb{R}^n \to \mathbb{R}^n$ is maximal invariant if
$$Y(x) = Y(x') \ \Leftrightarrow \ \exists c : x = x' + c$$

UMRE Estimator Construction: Let $d(X)$ be equivariant and $Y := X - d(X)$. Then
$$T^*(Y) := \arg\min_v E[L_0(v + d(\varepsilon)) \mid Y]$$
and $T^*(X) := T^*(Y) + d(X)$ is UMRE.

UMRE Estimator for Quadratic Loss: $T$ is UMRE $\Leftrightarrow$ $E_0(T(X) \mid X - T(X)) = 0$

Pitman Estimator: $T^*(X) = X_n - E(\varepsilon_n \mid Y)$
Tests and Confidence Intervals

Quantile Functions:
$$q_{\sup}^F(u) := \sup\{x : F(x) \leq u\}, \qquad q_{\inf}^F(u) := \inf\{x : F(x) \geq u\} =: F^{-1}(u)$$

Test: For $\gamma_0 \in \Gamma$ and $\alpha \in [0, 1]$, a test for $H_0: \gamma = \gamma_0$ is a statistic $\phi(X, \gamma_0) \in \{0, 1\}$ such that
$$P_\theta(\phi(X, \gamma_0) = 1) \leq \alpha \quad \text{for all } \theta \in \{\vartheta : g(\vartheta) = \gamma_0\}$$

Pivot: A function $Z(X, \gamma)$ such that for all $\theta \in \Theta$, the following distribution does not depend on $\theta$:
$$P_\theta(Z(X, g(\theta)) \leq \cdot) =: G(\cdot)$$
We can construct a test for $H_{\gamma_0}$ with
$$q_L := q_{\sup}^G\Big(\frac{\alpha}{2}\Big), \qquad q_R := q_{\inf}^G\Big(1 - \frac{\alpha}{2}\Big), \qquad \phi(X, \gamma_0) := \begin{cases} 1 & \text{if } Z(X, \gamma_0) \notin [q_L, q_R] \\ 0 & \text{else} \end{cases}$$

Basu's Lemma: Let $X$ have distribution $P_\theta$, suppose $T$ is sufficient and complete, and $Y = Y(X)$ has a distribution that does not depend on $\theta$. Then $T$ and $Y$ are independent under $P_\theta$ for all $\theta$.
Student's Test: Assume the data are normally distributed with the same variance. Then:
$$T := Z(X, Y, 0), \qquad Z(X, Y, \gamma) := \frac{\bar{Y} - \bar{X} - \gamma}{S}\sqrt{\frac{nm}{n+m}}$$
$$S^2 := \frac{1}{m+n-2}\Big[\sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{j=1}^m (Y_j - \bar{Y})^2\Big]$$
And a one-sided test at level $\alpha$ for $H_0: \gamma = 0$ against $H_1: \gamma < 0$ is:
$$\phi(X, Y) := \begin{cases} 1 & \text{if } T < -t_{n+m-2}(1 - \alpha) \\ 0 & \text{if } T \geq -t_{n+m-2}(1 - \alpha) \end{cases}$$
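The pooled statistic above is straightforward to compute directly. A minimal sketch (the two small samples are made-up illustrative numbers, not from the summary):

```python
import math

# Pooled two-sample t statistic T = (Ybar - Xbar)/S * sqrt(nm/(n+m)).
def pooled_t(xs, ys):
    n, m = len(xs), len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / m
    s2 = (sum((x - xbar) ** 2 for x in xs) +
          sum((y - ybar) ** 2 for y in ys)) / (m + n - 2)
    return (ybar - xbar) / math.sqrt(s2) * math.sqrt(n * m / (n + m))

xs = [5.1, 4.9, 5.3, 5.0, 4.7]
ys = [4.2, 4.5, 4.0, 4.4]
t = pooled_t(xs, ys)
print(round(t, 3))  # clearly negative: the Y sample has the smaller mean
```

For the one-sided test one would compare $t$ against $-t_{n+m-2}(1-\alpha)$ from a $t$ table.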
Wilcoxon's Test (PR): Let $Z = (Z_1, \ldots, Z_N) := (X_1, \ldots, X_n, Y_1, \ldots, Y_m)$ be the pooled sample ($N = n + m$) and $R_i := \mathrm{rank}(Z_i)$ among the pooled sample. Then:
$$T := \sum_{i=1}^n R_i = \#\{(i, j) : Y_j < X_i\} + \frac{n(n+1)}{2}$$
And (the distribution is often tabulated), with $r$ ranging over the $N!$ equally likely rank assignments:
$$P_{H_0}(T = t) = \frac{\#\{r : \sum_{i=1}^n r_i = t\}}{N!}$$
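The rank-sum identity above can be checked on a tiny pooled sample with distinct values (an illustrative sketch, not from the summary):

```python
# Verifies T = sum of X-ranks = #{(i,j): Y_j < X_i} + n(n+1)/2.
def rank_sum(xs, ys):
    pooled = sorted(xs + ys)
    return sum(pooled.index(x) + 1 for x in xs)  # ranks are 1-based

xs = [2.0, 7.0, 9.0]
ys = [1.0, 4.0, 5.0, 8.0]
n = len(xs)

T = rank_sum(xs, ys)
pairs = sum(1 for x in xs for y in ys if y < x)
print(T, pairs + n * (n + 1) // 2)  # equal by the identity above
```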
Uniformly Most Powerful Tests

Level: $\phi$ is a test at level $\alpha$ if
$$\sup_{\theta \in \Theta_0} E_\theta \phi(X) \leq \alpha$$
A test $\phi$ is UMP if it has level $\alpha$ and for all tests $\phi'$ with level $\alpha$:
$$E_\theta \phi'(X) \leq E_\theta \phi(X) \qquad \forall \theta \in \Theta_1$$

Neyman-Pearson Lemma (PR): $H_0: \theta = \theta_0$ and $H_1: \theta = \theta_1$.
$$R(\theta, \phi) := \begin{cases} E_\theta \phi(X), & \theta = \theta_0 \\ 1 - E_\theta \phi(X), & \theta = \theta_1 \end{cases} \qquad \phi_{\mathrm{NP}} := \begin{cases} 1 & \text{if } p_1/p_0 > c \\ q & \text{if } p_1/p_0 = c \\ 0 & \text{if } p_1/p_0 < c \end{cases}$$
$$R(\theta_1, \phi_{\mathrm{NP}}) - R(\theta_1, \phi) \leq c\,[R(\theta_0, \phi) - R(\theta_0, \phi_{\mathrm{NP}})]$$

One-Sided UMP Test (PR): Given $n$ i.i.d. copies of a Bernoulli random variable with success parameter $\theta$ and with $T := \sum_{i=1}^n X_i$ the number of successes, the following test is UMP for $H_0: \theta \geq c$, $H_1: \theta < c$ (and also for the weaker hypotheses $H_0: \theta = c$, $H_1: \theta = c'$ with $c' < c$, or $H_0: \theta = c$, $H_1: \theta < c$):
$$\phi(T) := \begin{cases} 1 & \text{if } T < t_0 \\ q & \text{if } T = t_0 \\ 0 & \text{if } T > t_0 \end{cases}$$
Where $t_0$ is chosen such that $P_{\theta_0}(T \leq t_0 - 1) \leq \alpha$, $P_{\theta_0}(T \leq t_0) > \alpha$, and $q$ such that $P_{\theta_0}(H_0 \text{ rejected}) = P_{\theta_0}(T \leq t_0 - 1) + q\,P_{\theta_0}(T = t_0) = \alpha$, i.e.:
$$q = \frac{\alpha - P_{\theta_0}(T \leq t_0 - 1)}{P_{\theta_0}(T = t_0)}$$

UMP Tests for Exponential Families: $H_0: \theta \leq \theta_0$, $H_1: \theta > \theta_0$, and $c(\theta)$ strictly increasing. Then a UMP test is:
$$\phi(T(x)) := \begin{cases} 1 & \text{if } T(x) > t_0 \\ q & \text{if } T(x) = t_0 \\ 0 & \text{if } T(x) < t_0 \end{cases}$$

Unbiased Tests: A test $\phi$ is unbiased if for all $\theta \in \Theta_0$, $\vartheta \in \Theta_1$:
$$E_\theta \phi(X) \leq E_\vartheta \phi(X)$$

Uniformly Most Powerful Unbiased: An unbiased test $\phi$ is UMPU if it has level $\alpha$ and for all unbiased tests $\phi'$ with level $\alpha$:
$$E_\theta \phi'(X) \leq E_\theta \phi(X) \qquad \forall \theta \in \Theta_1$$

UMPU for a One-Dimensional Exponential Family: Given a one-dimensional exponential family with $c(\theta)$ strictly increasing in $\theta$, a UMPU test is:
$$\phi(T(x)) := \begin{cases} 1 & \text{if } T(x) < t_L \text{ or } T(x) > t_R \\ q_L & \text{if } T(x) = t_L \\ q_R & \text{if } T(x) = t_R \\ 0 & \text{if } t_L < T(x) < t_R \end{cases}$$
With constants $t_R$, $t_L$, $q_R$, and $q_L$ such that:
$$E_{\theta_0} \phi(X) = \alpha, \qquad \frac{d}{d\theta} E_\theta \phi(X)\Big|_{\theta = \theta_0} = 0$$

Confidence Intervals

Confidence Set: A subset $I = I(X) \subset \Gamma$, depending only on the data, is a confidence set for $\gamma$ at level $1 - \alpha$ if:
$$P_\theta(\gamma \in I) \geq 1 - \alpha \qquad \forall \theta \in \Theta$$

Confidence Interval: $I := [\underline{\gamma}, \bar{\gamma}]$ with $\underline{\gamma} = \underline{\gamma}(X)$, $\bar{\gamma} = \bar{\gamma}(X)$.

Confidence Sets / Tests: Given for each $\gamma_0 \in \mathbb{R}$ a test at level $\alpha$ for $H_{\gamma_0}$, the following is a $(1 - \alpha)$-confidence set for $\gamma$:
$$I(X) := \{\gamma : \phi(X, \gamma) = 0\}$$
Conversely, given a $(1 - \alpha)$-confidence set for $\gamma$, the following is a test at level $\alpha$ of $H_{\gamma_0}: \gamma = \gamma_0$ for all $\gamma_0$:
$$\phi(X, \gamma_0) = \begin{cases} 1 & \text{if } \gamma_0 \notin I(X) \\ 0 & \text{else} \end{cases}$$

Decision Theory

Admissible Decision: A decision $d'$ is strictly better than $d$ if
$$R(\theta, d') \leq R(\theta, d) \ \ \forall \theta \qquad \text{and} \qquad \exists \theta : R(\theta, d') < R(\theta, d)$$
$d$ is called inadmissible when there exists a $d'$ that is strictly better than $d$, and admissible otherwise.

Admissibility for the Neyman-Pearson Test: A Neyman-Pearson test is admissible if and only if its power is strictly less than 1 or it has minimal level among all tests with power 1.

Admissible Estimators for the Normal Mean (PR): $X \sim \mathcal{N}(\theta, 1)$, $\Theta := \mathbb{R}$, and $R(\theta, T) := E_\theta(T - \theta)^2$. If we consider estimators of the form $T = aX + b$, $a > 0$, $b \in \mathbb{R}$, then $T$ is admissible if and only if one of the following cases holds:
1. $a < 1$
2. $a = 1$ and $b = 0$

Minimax Decisions: $d$ is minimax if
$$\sup_\theta R(\theta, d) = \inf_{d'} \sup_\theta R(\theta, d')$$

Minimax for the Neyman-Pearson Test: A Neyman-Pearson test is minimax if and only if $R(\theta_0, \phi_{\mathrm{NP}}) = R(\theta_1, \phi_{\mathrm{NP}})$.
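A hedged sketch of the Neyman-Pearson likelihood-ratio test for two simple hypotheses about $n$ i.i.d. Bernoulli trials; the parameters $\theta_0, \theta_1$, the sample $x$, and the cutoffs $c$ below are illustrative choices, not from the summary:

```python
from fractions import Fraction

n = 5
theta0, theta1 = Fraction(1, 2), Fraction(4, 5)

def likelihood(theta, x):
    s = sum(x)
    return theta ** s * (1 - theta) ** (n - s)

def phi_NP(x, c):
    # Reject (return 1) when p1/p0 > c; randomization at equality omitted.
    return 1 if likelihood(theta1, x) / likelihood(theta0, x) > c else 0

x = (1, 1, 1, 0, 1)  # 4 successes out of 5
ratio = likelihood(theta1, x) / likelihood(theta0, x)
print(ratio)                       # exactly 8192/3125, about 2.62
print(phi_NP(x, 2), phi_NP(x, 4))  # reject at c = 2, accept at c = 4
```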
Bayes Decisions

Bayes Risk: Given a probability measure $\Pi$ (prior distribution) on $\Theta$ with density $w := d\Pi/d\mu$:
$$r(\Pi, d) := \int_\Theta R(\vartheta, d)\,d\Pi(\vartheta) = \int_\Theta R(\vartheta, d)\,w(\vartheta)\,d\mu(\vartheta) =: r_w(d)$$

Bayes Decision: A decision $d$ is called Bayes if:
$$r(\Pi, d) = \inf_{d'} r(\Pi, d')$$

A Posteriori Density: Given $p_\theta(x) = p(x \mid \theta)$ and the marginal density
$$p(\cdot) := \int_\Theta p(\cdot \mid \vartheta)\,w(\vartheta)\,d\mu(\vartheta)$$
the a posteriori density of $\theta$ is:
$$w(\vartheta \mid x) = p(x \mid \vartheta)\,\frac{w(\vartheta)}{p(x)}, \qquad \vartheta \in \Theta,\ x \in \mathcal{X}$$

Bayes Decision Construction: Let
$$l(x, a) := E[L(\theta, a) \mid X = x] = \int_\Theta L(\vartheta, a)\,w(\vartheta \mid x)\,d\mu(\vartheta)$$
Then the Bayes decision is:
$$d^{\mathrm{Bayes}}(X) = \arg\min_{a \in \mathcal{A}} l(X, a) = \arg\min_{a \in \mathcal{A}} \int_\Theta L(\vartheta, a)\,g_\vartheta(S)\,w(\vartheta)\,d\mu(\vartheta)$$
(the second form uses the factorization $p_\vartheta(x) = g_\vartheta(S(x))\,h(x)$ for a sufficient statistic $S$).

Bayes Estimator for Quadratic Loss: For $L(\theta, a) := (\theta - a)^2$:
$$d^{\mathrm{Bayes}}(X) = E(\theta \mid X)$$
For $T = E(\theta \mid X)$, the Bayes risk of an estimator $T'$ is:
$$r_w(T') = E\,\mathrm{var}(\theta \mid X) + E(T - T')^2$$

Bayes Estimator/MAP/MLE: For $L(\theta, a) := 1\{|\theta - a| > c\}$ and $c$ small, the Bayes rule is approximately the maximum a posteriori (MAP) estimator, which is equivalent to the MLE for a uniform prior. With quadratic loss, the Bayes estimator is the expectation of the posterior, whereas the MAP is its maximum.

Credibility Interval: A $(1 - \alpha)$-credibility interval is
$$I := \big[\hat{\theta}_L(X), \hat{\theta}_R(X)\big], \qquad \int_{\hat{\theta}_L(X)}^{\hat{\theta}_R(X)} w(\vartheta \mid X)\,d\vartheta = 1 - \alpha$$

Bayes Test: Assume $H_0: \theta = \theta_0$, $H_1: \theta = \theta_1$, $L(\theta_0, a) := a$, $L(\theta_1, a) := 1 - a$, $w(\theta_0) =: w_0$, and $w(\theta_1) =: w_1 = 1 - w_0$. The Bayes test is then (for an arbitrary $q$):
$$\phi_{\mathrm{Bayes}} := \begin{cases} 1 & \text{if } p_1/p_0 > w_0/w_1 \\ q & \text{if } p_1/p_0 = w_0/w_1 \\ 0 & \text{if } p_1/p_0 < w_0/w_1 \end{cases}$$

Extended Bayes Decision: $T$ is called extended Bayes if there exists a sequence of prior densities $\{w_m\}_{m=1}^\infty$ such that $r_{w_m}(T) - \inf_{T'} r_{w_m}(T') \to 0$ as $m \to \infty$.

Minimaxity (PR): Suppose $T$ is a statistic with risk $R(\theta, T) = R(T)$ not depending on $\theta$. Then:
1. $T$ admissible $\Rightarrow$ $T$ minimax
2. $T$ Bayes $\Rightarrow$ $T$ minimax
3. $T$ extended Bayes $\Rightarrow$ $T$ minimax

Admissibility (PR): Suppose $T$ is Bayes for prior density $w$. Then either of the following conditions is sufficient for admissibility:
1. $T$ is unique Bayes ($r_w(T) = r_w(T')$ implies $T = T'$, $P_\theta$-almost surely, for all $\theta$)
2. For all $T'$, $R(\theta, T')$ is continuous in $\theta$, and for all open $U \subset \Theta$, the prior probability $\int_U w(\vartheta)\,d\mu(\vartheta)$ of $U$ is strictly positive.

Admissibility, Extended Bayes (PR): Suppose $T$ is extended Bayes and for all $T'$, $R(\theta, T')$ is continuous in $\theta$. Furthermore, with $\Pi_m(U) := \int_U w_m(\vartheta)\,d\mu_m(\vartheta)$ being the probability of $U$ under the prior $\Pi_m$, suppose (for every open $U \subset \Theta$):
$$\frac{r_{w_m}(T) - \inf_{T'} r_{w_m}(T')}{\Pi_m(U)} \to 0$$
Then $T$ is admissible.
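As a worked example of the Bayes estimator under quadratic loss (an illustration assuming a conjugate Beta prior, which the summary does not cover): with a Beta$(a, b)$ prior on a Bernoulli success probability $\theta$, the posterior is Beta$(a + s, b + n - s)$, so $E(\theta \mid X)$ has the closed form below.

```python
from fractions import Fraction

def posterior_mean(a, b, data):
    # Bayes estimator E(theta | X) = (a + s) / (a + b + n) for Beta(a, b) prior.
    s, n = sum(data), len(data)
    return Fraction(a + s, a + b + n)

data = [1, 0, 1, 1, 0, 1]          # s = 4 successes out of n = 6
est = posterior_mean(1, 1, data)   # uniform prior Beta(1, 1)
mle = Fraction(sum(data), len(data))

print(est)   # 5/8: shrinks the MLE 2/3 toward the prior mean 1/2
print(mle)
```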
The Linear Model

Least Squares Estimator: Given the (augmented) design matrix $X \in \mathbb{R}^{n \times p}$, the least squares estimator is the projection of $Y$ on $\{Xb : b \in \mathbb{R}^p\}$:
$$\hat{\beta} := \arg\min_{b \in \mathbb{R}^p} \|Y - Xb\|_2^2 = (X^T X)^{-1} X^T Y$$

Distribution of the Least Squares Estimator (PR): For $f = EY$, let $\beta^* := (X^T X)^{-1} X^T f$, so that $X\beta^*$ is the best linear approximation of $f$. For $E\epsilon\epsilon^T = \sigma^2 I$, $\epsilon := Y - f$:
1. $E\hat{\beta} = \beta^*$, $\mathrm{Cov}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$
2. $E\|X(\hat{\beta} - \beta^*)\|_2^2 = \sigma^2 p$
3. $E\|X\hat{\beta} - f\|_2^2 = \sigma^2 p + \|X\beta^* - f\|_2^2$

Least Squares Estimator under Normal Errors (PR): When $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, we have
$$\hat{\beta} - \beta^* \sim \mathcal{N}\big(0, \sigma^2 (X^T X)^{-1}\big) \qquad \text{and} \qquad \frac{\|X(\hat{\beta} - \beta^*)\|_2^2}{\sigma^2} \sim \chi_p^2$$
A test for $H_0: \beta = \beta_0$ is to reject $H_0$ when $\|X(\hat{\beta} - \beta_0)\|_2^2 / \sigma_0^2 > G_p^{-1}(1 - \alpha)$, where $G_p$ is the distribution function of a $\chi_p^2$-distributed random variable.

Testing a Linear Hypothesis: $Y = X\beta + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, and we want to test $H_0: B\beta = 0$. Under $H_0$, the following quantity is $\chi_q^2$-distributed:
$$\frac{\|Y - X\hat{\beta}_0\|_2^2 - \|Y - X\hat{\beta}\|_2^2}{\sigma^2}$$
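A minimal sketch of $\hat{\beta} = (X^T X)^{-1} X^T Y$ solved directly via the normal equations for a single-covariate model with intercept ($p = 2$), using the explicit 2x2 inverse; the noiseless toy data are an illustrative choice, not from the summary:

```python
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # columns: intercept, x
Y = [1.0, 3.0, 5.0, 7.0]                               # exactly 1 + 2x

def lstsq_2(X, Y):
    # Normal equations (X^T X) beta = X^T Y for p = 2, solved by Cramer's rule.
    a = sum(r[0] * r[0] for r in X); b = sum(r[0] * r[1] for r in X)
    c = b;                           d = sum(r[1] * r[1] for r in X)
    u = sum(r[0] * y for r, y in zip(X, Y))
    v = sum(r[1] * y for r, y in zip(X, Y))
    det = a * d - b * c
    return ((d * u - b * v) / det, (a * v - c * u) / det)

beta = lstsq_2(X, Y)
print(beta)  # (1.0, 2.0): recovers intercept and slope exactly
```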
Asymptotic Theory

We assume an estimator $T_n(X_1, \ldots, X_n)$ of $\gamma$ is defined for all $n$, i.e. we consider a sequence of estimators.

Markov's/Chebyshev's Inequality: For all increasing functions $\psi: [0, \infty) \to [0, \infty)$:
$$P(\|Z\| \geq \epsilon) \leq \frac{E\,\psi(\|Z\|)}{\psi(\epsilon)}$$

Almost Sure Convergence: $Z_n$ converges almost surely to $Z$ if
$$P\big(\lim_{n \to \infty} Z_n = Z\big) = 1$$

Convergence in Probability: $Z_n$ converges in probability to $Z$ ($Z_n \overset{P}{\longrightarrow} Z$) if for all $\epsilon > 0$:
$$\lim_{n \to \infty} P(\|Z_n - Z\| > \epsilon) = 0$$
Almost sure convergence implies convergence in probability, but not the other way around.

Convergence in Distribution: $Z_n$ converges in distribution to $Z$ ($Z_n \overset{D}{\longrightarrow} Z$) if for all continuous and bounded functions $f$:
$$\lim_{n \to \infty} E f(Z_n) = E f(Z)$$
Convergence in probability implies convergence in distribution, but not the other way around.

Stochastic Order Symbols: Let $r_n$ be strictly positive random variables. $Z_n = O_P(1)$ ($Z_n$ bounded in probability) if
$$\lim_{M \to \infty} \limsup_{n \to \infty} P(\|Z_n\| > M) = 0$$
$Z_n = O_P(r_n)$ if $Z_n / r_n = O_P(1)$. When $Z_n$ converges in distribution, $Z_n = O_P(1)$. If $Z_n$ converges in probability to zero, $Z_n = o_P(1)$, and $Z_n = o_P(r_n)$ if $Z_n / r_n = o_P(1)$.

Slutsky's Theorem (PR): Assume that $Z_n \overset{D}{\longrightarrow} Z$ and $A_n \overset{P}{\longrightarrow} a$. Then:
$$A_n^T Z_n \overset{D}{\longrightarrow} a^T Z$$

Central Limit Theorem: Let $X_1, X_2, \ldots$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Then:
$$\sqrt{n}\,(\bar{X}_n - \mu) \overset{D}{\longrightarrow} \mathcal{N}(0, \sigma^2)$$

Portmanteau Theorem: The following statements are equivalent:
1. $Z_n \overset{D}{\longrightarrow} Z$ (i.e., $E f(Z_n) \to E f(Z)$ for all $f$ bounded and continuous)
2. $E f(Z_n) \to E f(Z)$ for all $f$ bounded and Lipschitz ($f$ is Lipschitz if for a constant $C_L$, $|f(z) - f(\tilde{z})| \leq C_L \|z - \tilde{z}\|$)
3. $E f(Z_n) \to E f(Z)$ for all $f$ bounded and $Q$-a.s. continuous (where $Q$ is the distribution of $Z$)
4. $P(Z_n \leq z) \to G(z)$ for all $G$-continuity points $z$ (where $G = Q(Z \leq \cdot)$ is the distribution function of $Z$)

Cramér-Wold Device:
$$Z_n \overset{D}{\longrightarrow} Z \quad \Leftrightarrow \quad a^T Z_n \overset{D}{\longrightarrow} a^T Z \ \ \forall a \in \mathbb{R}^p$$

Consistent Estimators: A sequence of estimators $T_n$ is consistent if $T_n \overset{P_\theta}{\longrightarrow} \gamma$.

Asymptotically Normal Estimators: A sequence of estimators $T_n$ is asymptotically normal with covariance matrix $V_\theta$ if
$$\sqrt{n}\,(T_n - \gamma) \overset{D_\theta}{\longrightarrow} \mathcal{N}(0, V_\theta)$$

Asymptotically Linear Estimators: A sequence of estimators $T_n$ is asymptotically linear if, for an (influence) function $l_\theta: \mathcal{X} \to \mathbb{R}^p$ with $E_\theta l_\theta(X) = 0$ and $E_\theta l_\theta(X) l_\theta^T(X) =: V_\theta < \infty$:
$$T_n - \gamma = \frac{1}{n}\sum_{i=1}^n l_\theta(X_i) + o_{P_\theta}(1/\sqrt{n})$$

The δ-Technique: Let $h$ be differentiable at $c$ and suppose
$$(T_n - c)/r_n \overset{D}{\longrightarrow} Z$$
Then:
$$(h(T_n) - h(c))/r_n \overset{D}{\longrightarrow} \dot{h}(c)^T Z$$
using the expansion $h(T_n) = h(c) + \dot{h}(c)^T (T_n - c) + o_P(r_n)$.
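A hedged simulation sketch of the δ-technique (an illustration, not from the summary): for $X_i$ i.i.d. Uniform$(0, 1)$, $\sqrt{n}(\bar{X}_n - 1/2) \to \mathcal{N}(0, 1/12)$, and with $h(t) = t^2$ we have $\dot{h}(1/2) = 1$, so $\sqrt{n}(h(\bar{X}_n) - h(1/2))$ should also show variance close to $1/12$. The sample size and replication count are arbitrary choices.

```python
import math
import random

random.seed(1)
n, reps = 400, 2000

def h(t):
    return t * t

vals = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    vals.append(math.sqrt(n) * (h(xbar) - h(0.5)))

mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / reps
print(round(mean, 3), round(var, 3))  # mean near 0, variance near 1/12 ~ 0.083
```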