Statistics summary...


Fundamentals of Mathematical Statistics: Definitions/Lemmas by Roman Böhringer

ETH Zürich, January 2, 2021

Probability Theory

Conditional Probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Bayes Theorem: $P(B \mid A) = P(A \mid B)\,\frac{P(B)}{P(A)}$

Law of Total Probability: For a partition $\{B_j\}$ ($B_j \cap B_k = \emptyset$ for all $j \neq k$ and $P(\cup_j B_j) = 1$): $P(A) = \sum_j P(A \mid B_j) P(B_j)$

Marginal Density: $f_X(\cdot) = \int f_{X,Y}(\cdot, y)\,dy$ and $f_X(x) = \int f_X(x \mid y) f_Y(y)\,dy$

Conditional Density: $f_X(x \mid y) := \frac{f_{X,Y}(x, y)}{f_Y(y)}$ and $f_Y(y \mid x) = f_X(x \mid y)\,\frac{f_Y(y)}{f_X(x)}$

Conditional Expectation: $E[g(X, Y) \mid Y = y] := \int f_X(x \mid y)\, g(x, y)\,dx$

Iterated Expectations Lemma: $E[E[g(X, Y) \mid Y]] = E g(X, Y)$

Law of Total Variance: $\mathrm{var}(Y) = \mathrm{var}(E(Y \mid Z)) + E\,\mathrm{var}(Y \mid Z)$

Multinomial Distribution: $P(N_1 = n_1, \dots, N_k = n_k) = \binom{n}{n_1 \cdots n_k} p_1^{n_1} \cdots p_k^{n_k}$ with $\binom{n}{n_1 \cdots n_k} = \frac{n!}{n_1! \cdots n_k!}$

Poisson Distribution: $P(X = x) = e^{-\lambda} \frac{\lambda^x}{x!}$

Normal Distribution: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$

Sum of Independent Normal / Poisson Variables: For $X$ and $Y$ independent with $X \sim N(\mu_X, \sigma_X^2)$, $Y \sim N(\mu_Y, \sigma_Y^2)$ and $Z = X + Y$, we have $Z \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$. Likewise, $X \sim P(\lambda)$, $Y \sim P(\mu) \Rightarrow Z \sim P(\lambda + \mu)$.

Chi-Square Distribution: Let $Z_1, \dots, Z_p$ be i.i.d. $N(0, 1)$-distributed and define the p-vector $Z := (Z_1, \dots, Z_p)^T$. Then $Z$ is $N(0, I)$-distributed and the $\chi^2$-distribution with $p$ degrees of freedom is defined as the distribution of $\|Z\|_2^2 := \sum_{j=1}^p Z_j^2$, i.e. $\|Z\|_2^2 \sim \chi_p^2$.

Distribution of Maximum: $Z := \max\{X_1, X_2\}$ with $X_1$, $X_2$ independent having distribution $F$ and density $f$: $f_Z(z) = 2F(z)f(z)$

Exponential Families: A k-dimensional exponential family is a family of distributions with densities of the form: $p_\theta(x) = \exp\left[\sum_{j=1}^k c_j(\theta) T_j(x) - d(\theta)\right] h(x)$. The family is in canonical form if: $p_\theta(x) = \exp\left[\sum_{j=1}^k \theta_j T_j(x) - d(\theta)\right] h(x)$, where $d(\theta) = \log \int \exp\left[\sum_{j=1}^k \theta_j T_j(x)\right] h(x)\,d\nu(x)$.

Constructing Estimators

Estimator: An estimator $T(X)$ is a function $T(\cdot)$ evaluated at the observations $X$. The function $T(\cdot)$ is not allowed to depend on unknown parameters.

Empirical Distribution Function: $\hat F_n(\cdot) := \frac{1}{n}\,\#\{X_i \le \cdot,\ 1 \le i \le n\}$

Method of Moments: Given the first $p$ moments of $X$: $\mu_j(\theta) = E_\theta X^j = \int x^j\,dF_\theta(x)$, $j = 1, \dots, p$, and the map $m$ with inverse $m^{-1}$: $m(\theta) = [\mu_1(\theta), \dots, \mu_p(\theta)]$. We calculate: $\hat\mu_j := \frac{1}{n}\sum_{i=1}^n X_i^j = \int x^j\,d\hat F_n(x)$, $j = 1, \dots, p$, and plug in: $\hat\theta := m^{-1}(\hat\mu_1, \dots, \hat\mu_p)$

Maximum Likelihood Estimator: Given the likelihood function $L_X(\vartheta) := \prod_{i=1}^n p_\vartheta(X_i)$, $\vartheta \in \Theta$, we calculate: $\hat\theta = \arg\max_{\vartheta \in \Theta} \log L_X(\vartheta) = \arg\max_{\vartheta \in \Theta} \sum_{i=1}^n \log p_\vartheta(X_i)$
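As a small illustration of the method of moments, the sketch below treats the normal model with $\theta = (\mu, \sigma^2)$, where $m(\theta) = (\mu, \sigma^2 + \mu^2)$. The function names and the sample are hypothetical, chosen only for this example. For this particular model the result coincides with the maximum likelihood estimator.

```python
from typing import Sequence, Tuple


def empirical_moment(xs: Sequence[float], j: int) -> float:
    """j-th empirical moment: (1/n) * sum of x_i^j."""
    return sum(x ** j for x in xs) / len(xs)


def normal_method_of_moments(xs: Sequence[float]) -> Tuple[float, float]:
    """Invert m(mu, sigma2) = (mu, sigma2 + mu^2) at the empirical moments."""
    m1 = empirical_moment(xs, 1)
    m2 = empirical_moment(xs, 2)
    return m1, m2 - m1 ** 2


# Hypothetical sample, used only to show the call.
sample = [1.0, 2.0, 3.0, 4.0]
mu_hat, sigma2_hat = normal_method_of_moments(sample)
print(mu_hat, sigma2_hat)
```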

Sufficiency

Sufficiency: A map $S : \mathcal{X} \to \mathcal{Y}$ is called sufficient for $\theta \in \Theta$ if for all $\theta$ and all possible $s$, the following conditional distribution does not depend on $\theta$: $P_\theta(X \in \cdot \mid S(X) = s)$

Factorization Theorem of Neyman (PR): Given densities $p_\theta$, $S$ is sufficient if and only if there are functions $g_\theta(\cdot) \ge 0$ and $h(\cdot) \ge 0$ such that we can write: $p_\theta(x) = g_\theta(S(x))\,h(x)\ \forall x, \theta$

Sufficiency for Exponential Families: For a k-dimensional exponential family, the k-dimensional statistic $S(X) = (T_1(X), \dots, T_k(X))$ is sufficient for $\theta$. For $n$ i.i.d. samples, the following statistic is sufficient: $S(X) = \left(\frac{1}{n}\sum_{i=1}^n T_1(x_i), \dots, \frac{1}{n}\sum_{i=1}^n T_k(x_i)\right)$

Minimal Sufficiency: Two likelihoods $L_x(\theta)$ and $L_{\tilde x}(\theta)$ are proportional at $(x, \tilde x)$ if $L_x(\theta) = L_{\tilde x}(\theta)\,c(x, \tilde x)\ \forall\theta$ for some constant $c(x, \tilde x)$. A sufficient statistic $S$ is called minimal sufficient if $S(x) = S(\tilde x)$ for all $x$ and $\tilde x$ where the likelihoods are proportional.

Completeness: A sufficient statistic $S$ is called complete if (where $h$ is a function not depending on $\theta$): $E_\theta h(S) = 0\ \forall\theta \Rightarrow h(S) = 0$, $P_\theta$-a.s.

Completeness for Exponential Families: Given a k-dimensional exponential family and $\mathcal{C} := \{(c_1(\theta), \dots, c_k(\theta)) : \theta \in \Theta\} \subset \mathbb{R}^k$. If $\mathcal{C}$ is truly k-dimensional, $S := (T_1, \dots, T_k)$ is complete.

Expectation/Covariance of Sufficient Statistic for Exponential Families: Given an exponential family in canonical form and: $\dot d(\theta) := \frac{\partial}{\partial\theta} d(\theta)$, $\ddot d(\theta) := \frac{\partial^2}{\partial\theta\,\partial\theta'} d(\theta) = \left(\frac{\partial^2}{\partial\theta_{j_1}\partial\theta_{j_2}} d(\theta)\right)$, $T(X) := (T_1(X), \dots, T_k(X))'$, $E_\theta T(X) := (E_\theta T_1(X), \dots, E_\theta T_k(X))'$, $\mathrm{Cov}_\theta(T(X)) := E_\theta T(X)T'(X) - E_\theta T(X)\,E_\theta T'(X)$. We have: $E_\theta T(X) = \dot d(\theta)$, $\mathrm{Cov}_\theta(T(X)) = \ddot d(\theta)$. If the family is not in canonical form: $E_\theta T(X) = \frac{\dot d(\theta)}{\dot c(\theta)}$, $\mathrm{var}_\theta(T(X)) = \frac{1}{[\dot c(\theta)]^2}\left(\ddot d(\theta) - \frac{\dot d(\theta)}{\dot c(\theta)}\,\ddot c(\theta)\right)$

Fisher Information

Score Function: $s_\theta(x) := \frac{d}{d\theta}\log p_\theta(x) = \frac{\dot p_\theta(x)}{p_\theta(x)}$, with $E_\theta s_\theta(X) = 0$. For $n$ i.i.d. observations: $s_\theta(x) = \sum_{i=1}^n s_\theta(x_i)$

Fisher Information: $I(\theta) := \mathrm{var}_\theta(s_\theta(X))$ and $I(\theta) = -E_\theta \dot s_\theta(X)$. For $n$ i.i.d. observations: $I_n(\theta) = n I(\theta)$

Fisher Information for Exponential Families: $I(\theta) = \ddot d(\theta) - \frac{\dot d(\theta)}{\dot c(\theta)}\,\ddot c(\theta)$. And for $\gamma = c(\theta)$: $I_0(\gamma) = \ddot d_0(\gamma) = \frac{I(\theta)}{[\dot c(\theta)]^2}$

Higher-Dimensional Extensions (Score Vector & Fisher Information Matrix): $s_\theta(\cdot) := \left(\partial \log p_\theta / \partial\theta_1, \dots, \partial \log p_\theta / \partial\theta_k\right)'$ and $I(\theta) = E_\theta s_\theta(X) s_\theta'(X) = \mathrm{Cov}_\theta(s_\theta(X))$
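The score and Fisher information identities can be checked numerically on a concrete model. The sketch below assumes a Poisson model with parameter `lam` (not a model singled out by the text), for which the score is $x/\lambda - 1$ and $I(\lambda) = 1/\lambda$. The function names and the truncation point `x_max` are hypothetical choices for the illustration.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability e^{-lam} lam^x / x!, computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def score(x: int, lam: float) -> float:
    """Derivative of log p_lam(x) with respect to lam."""
    return x / lam - 1.0


def score_mean_and_variance(lam: float, x_max: int = 200):
    """Mean and variance of the score, summing the pmf over 0..x_max.

    The truncation at x_max is an approximation; the neglected tail is
    tiny for moderate lam.
    """
    xs = range(x_max + 1)
    mean = sum(poisson_pmf(x, lam) * score(x, lam) for x in xs)
    second = sum(poisson_pmf(x, lam) * score(x, lam) ** 2 for x in xs)
    return mean, second - mean ** 2


mean, fisher = score_mean_and_variance(3.0)
print(mean, fisher, 1 / 3.0)  # the score has mean ~0 and variance ~1/lam
```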

Bias, Variance

Bias: $\mathrm{bias}_\theta(T) := E_\theta T - g(\theta)$. $T$ is unbiased if $\mathrm{bias}_\theta(T) = 0\ \forall\theta$.

Mean Square Error (PR): $\mathrm{MSE}_\theta(T) := E_\theta(T - g(\theta))^2$ and $\mathrm{MSE}_\theta(T) = \mathrm{bias}_\theta^2(T) + \mathrm{var}_\theta(T)$

Uniform Minimum Variance Unbiased: An unbiased estimator $T^*$ is UMVU if for any other unbiased estimator $T$: $\mathrm{var}_\theta(T^*) \le \mathrm{var}_\theta(T)\ \forall\theta$

Conditioning on Sufficient Statistic: If $T$ is unbiased, $S$ sufficient, and $T^* := E(T \mid S)$: $E_\theta(T^*) = g(\theta)$ and $\mathrm{var}_\theta(T^*) \le \mathrm{var}_\theta(T)\ \forall\theta$

Lehmann-Scheffé Lemma: If $T$ is an unbiased estimator of $g(\theta)$ with finite variance (for all $\theta$) and $S$ is sufficient and complete, $T^* := E(T \mid S)$ is UMVU.

Cramér-Rao Lower Bound: If the support of $p_\theta$ does not depend on $\theta$ and $p_\theta$ is differentiable in $L_2$, for an unbiased estimator $T$ of $g(\theta)$ (with derivative $\dot g(\theta)$), we have: $\dot g(\theta) = \mathrm{cov}_\theta(T, s_\theta(X))$ and $\mathrm{var}_\theta(T) \ge \frac{\dot g^2(\theta)}{I(\theta)}$

CRLB for Exponential Families: If $T$ is unbiased and reaches the CRLB, then there exist functions $c(\theta)$, $d(\theta)$, and $h(x)$ such that for all $\theta$: $p_\theta(x) = \exp[c(\theta)T(x) - d(\theta)]\,h(x)$, $x \in \mathcal{X}$, and $g(\theta) = \dot d(\theta)/\dot c(\theta)$

Higher-Dimensional CRLB: For an unbiased estimator $T$ of $g(\theta)$: $\mathrm{var}_\theta(T) \ge \dot g(\theta)'\, I(\theta)^{-1}\, \dot g(\theta)$

Comparison

Risk: Given loss function $L(\cdot, \cdot)$: $R(\theta, T) := E_\theta(L(\theta, T(X)))$

Risk and sufficiency: $S$ sufficient for $\theta$ and $d : \mathcal{X} \to \mathcal{A}$ some decision. Then there is a randomized decision $\delta(S)$ such that: $R(\theta, \delta(S)) = R(\theta, d)\ \forall\theta$

Rao-Blackwell (PR): $S$ sufficient for $\theta$, $\mathcal{A} \subset \mathbb{R}^p$ convex and $a \mapsto L(\theta, a)$ convex for all $\theta$. For a decision $d : \mathcal{X} \to \mathcal{A}$ and $d'(s) := E(d(X) \mid S = s)$: $R(\theta, d') \le R(\theta, d)\ \forall\theta$

Sensitivity/Robustness: Influence function $l(x) := (n+1)\left(T_{n+1}(X_1, \dots, X_n, x) - T_n(X_1, \dots, X_n)\right)$, $x \in \mathbb{R}$. For $m \le n$: $\epsilon(m) := \sup_{x_1^*, \dots, x_m^*} |T(x_1^*, \dots, x_m^*, X_{m+1}, \dots, X_n)|$

Break down point: $\epsilon^* := \min\{m : \epsilon(m) = \infty\}/n$

Equivariant Statistics

Location equivariant statistic: For all constants $c \in \mathbb{R}$ and $x = (x_1, \dots, x_n)$: $T(x_1 + c, \dots, x_n + c) = T(x_1, \dots, x_n) + c$

Location invariant loss function: For all constants $c \in \mathbb{R}$: $L(\theta + c, a + c) = L(\theta, a)$, $(\theta, a) \in \mathbb{R}^2$

Risk for equivariant statistics/invariant loss functions: $R(\theta, T) = E_\theta L(0, T(X - \theta)) = E L_0[T(\varepsilon)]$, where $L_0[a] := L(0, a)$ and $\varepsilon := X - \theta$.

Uniform Minimum Risk Equivariant: An equivariant statistic $T$ is UMRE if $R(\theta, T) = \min_{d\ \text{equivariant}} R(\theta, d)\ \forall\theta$. Since the risk of an equivariant statistic does not depend on $\theta$, this is the same as $R(0, T) = \min_{d\ \text{equivariant}} R(0, d)$.

Maximal Invariant: A map $Y : \mathbb{R}^n \to \mathbb{R}^n$ is maximal invariant if: $Y(x) = Y(x') \Leftrightarrow \exists c : x = x' + c$

UMRE estimator construction: $d(X)$ equivariant, $Y := X - d(X)$. With $T^*(Y) := \arg\min_v E[L_0(v + d(\varepsilon)) \mid Y]$, the estimator $T^*(X) := T^*(Y) + d(X)$ is UMRE.

UMRE estimator for quadratic loss: $T$ is UMRE $\Leftrightarrow E_0(T(X) \mid X - T(X)) = 0$

Pitman estimator: $T^*(X) = X_n - E(\epsilon_n \mid Y)$
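Location equivariance and the break down point can be made concrete with the sample mean and the sample median. Both are location equivariant, but they react very differently when observations are replaced by an extreme value. The sketch below uses a hypothetical five-point sample, and the helper names `shift` and `contaminated` are introduced only for this illustration.

```python
from statistics import mean, median
from typing import Callable, List, Sequence


def shift(xs: Sequence[float], c: float) -> List[float]:
    """Add the constant c to every observation."""
    return [x + c for x in xs]


def contaminated(stat: Callable, xs: Sequence[float], m: int, value: float) -> float:
    """Value of the statistic after replacing the first m observations by `value`."""
    return stat([value] * m + list(xs[m:]))


xs = [2.0, -1.0, 0.5, 3.0, 1.5]  # hypothetical sample
c = 10.0

# Location equivariance: T(x + c) = T(x) + c for both statistics.
print(mean(shift(xs, c)) - mean(xs), median(shift(xs, c)) - median(xs))

# One extreme replacement already drives the mean away (epsilon(1) is unbounded),
# while the median only follows once a majority of the points is replaced.
for m in (1, 2, 3):
    print(m, contaminated(mean, xs, m, 1e9), contaminated(median, xs, m, 1e9))
```

For this sample of size 5 the mean breaks down at $m = 1$ and the median at $m = 3$, which corresponds to break down points $1/5$ and $3/5$ under the definition above.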

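Basu's lemma: Let $X$ have distribution $P_\theta$, suppose $T$ is sufficient and complete, and $Y = Y(X)$ has a distribution that does not depend on $\theta$. Then $T$ and $Y$ are independent under $P_\theta$ for all $\theta$.

Tests and Confidence Intervals

Quantile Functions: $q^F_{\sup}(u) := \sup\{x : F(x) \le u\}$ and $q^F_{\inf}(u) := \inf\{x : F(x) \ge u\} := F^{-1}(u)$

Test: For $\gamma_0 \in \Gamma$, $\alpha \in [0, 1]$, a test at level $\alpha$ for $H_0 : \gamma = \gamma_0$ is a statistic $\phi(X, \gamma_0) \in \{0, 1\}$ such that $P_\theta(\phi(X, \gamma_0) = 1) \le \alpha$ for all $\theta \in \{\vartheta : g(\vartheta) = \gamma_0\}$

Pivot: A function $Z(X, \gamma)$ such that for all $\theta \in \Theta$, this distribution does not depend on $\theta$: $P_\theta(Z(X, g(\theta)) \le \cdot) =: G(\cdot)$. We can construct a test for $H_{\gamma_0}$ with $q_L := q^G_{\sup}\left(\frac{\alpha}{2}\right)$, $q_R := q^G_{\inf}\left(1 - \frac{\alpha}{2}\right)$:
$\phi(X, \gamma_0) := \begin{cases} 1 & \text{if } Z(X, \gamma_0) \notin [q_L, q_R] \\ 0 & \text{else} \end{cases}$

Student's test: Assume the data are normally distributed with the same variance. Then: $Z(X, Y, \gamma) := \sqrt{\frac{nm}{n+m}}\,\frac{\bar Y - \bar X - \gamma}{S}$, $T := Z(X, Y, 0) = \sqrt{\frac{nm}{n+m}}\,\frac{\bar Y - \bar X}{S}$, with $S^2 := \frac{1}{m+n-2}\left\{\sum_{i=1}^n (X_i - \bar X)^2 + \sum_{j=1}^m (Y_j - \bar Y)^2\right\}$. The one-sided test at level $\alpha$ for $H_0 : \gamma = 0$ against $H_1 : \gamma < 0$ is:
$\phi(X, Y) := \begin{cases} 1 & \text{if } T < -t_{n+m-2}(1 - \alpha) \\ 0 & \text{if } T \ge -t_{n+m-2}(1 - \alpha) \end{cases}$

Wilcoxon's test: Let $R_i := \mathrm{rank}(X_i)$ among the pooled sample $Z = (X_1, \dots, X_n, Y_1, \dots, Y_m)$ of size $N := n + m$. Then: $T := \sum_{i=1}^n R_i = \#\{Y_j < X_i\} + \frac{n(n+1)}{2}$. And (the distribution is often tabulated): $P_{H_0}(T = t) = \frac{\#\{r : \sum_{i=1}^n r_i = t\}}{\binom{N}{n}}$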
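For small samples the null distribution of Wilcoxon's rank sum can be obtained by enumerating all rank subsets instead of using a table. The sketch below assumes no ties in the pooled sample. The function names and the data are hypothetical and serve only to illustrate the two formulas for $T$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
from typing import Dict, Sequence


def rank_sum(xs: Sequence[float], ys: Sequence[float]) -> int:
    """Sum of the ranks of the X-observations in the pooled sample (no ties assumed)."""
    pooled = sorted(list(xs) + list(ys))
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    return sum(rank[x] for x in xs)


def rank_sum_via_counts(xs: Sequence[float], ys: Sequence[float]) -> int:
    """Same statistic written as #{Y_j < X_i} + n(n+1)/2."""
    n = len(xs)
    return sum(1 for x in xs for y in ys if y < x) + n * (n + 1) // 2


def null_pmf(n: int, m: int) -> Dict[int, Fraction]:
    """P_H0(T = t): every size-n subset of the ranks 1..n+m is equally likely."""
    total = comb(n + m, n)
    counts: Dict[int, int] = {}
    for ranks in combinations(range(1, n + m + 1), n):
        t = sum(ranks)
        counts[t] = counts.get(t, 0) + 1
    return {t: Fraction(k, total) for t, k in sorted(counts.items())}


xs = [1.5, 3.2]        # hypothetical X-sample
ys = [0.7, 2.0, 4.1]   # hypothetical Y-sample
print(rank_sum(xs, ys), rank_sum_via_counts(xs, ys))
print(null_pmf(2, 3))
```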


Confidence Intervals

Confidence Set: A subset $I = I(X) \subset \Gamma$, depending only on the data, is a confidence set for $\gamma$ at level $1 - \alpha$ if: $P_\theta(\gamma \in I) \ge 1 - \alpha,\ \forall\theta \in \Theta$

Confidence Interval: $I := [\underline{\gamma}, \bar\gamma]$ with $\underline{\gamma} = \underline{\gamma}(X)$, $\bar\gamma = \bar\gamma(X)$.

Confidence Sets / Tests: Given for each $\gamma_0 \in \mathbb{R}$ a test at level $\alpha$ for $H_{\gamma_0}$, the following is a $(1 - \alpha)$-confidence set for $\gamma$: $I(X) := \{\gamma : \phi(X, \gamma) = 0\}$. Conversely, given a $(1 - \alpha)$-confidence set for $\gamma$, the following is a test at level $\alpha$ of $H_{\gamma_0} : \gamma = \gamma_0$ for all $\gamma_0$:
$\phi(X, \gamma_0) = \begin{cases} 1 & \text{if } \gamma_0 \notin I(X) \\ 0 & \text{else} \end{cases}$

Uniformly Most Powerful Tests

Level: $\phi$ is a test at level $\alpha$ if: $\sup_{\theta \in \Theta_0} E_\theta \phi(X) \le \alpha$

A test $\phi$ is UMP if it has level $\alpha$ and for all tests $\phi'$ with level $\alpha$: $E_\theta \phi'(X) \le E_\theta \phi(X)\ \forall\theta \in \Theta_1$

Neyman Pearson Lemma (PR): $H_0 : \theta = \theta_0$ and $H_1 : \theta = \theta_1$.
$R(\theta, \phi) := \begin{cases} E_\theta \phi(X), & \theta = \theta_0 \\ 1 - E_\theta \phi(X), & \theta = \theta_1 \end{cases}$
$\phi_{NP} := \begin{cases} 1 & \text{if } p_1/p_0 > c \\ q & \text{if } p_1/p_0 = c \\ 0 & \text{if } p_1/p_0 < c \end{cases}$
$R(\theta_1, \phi_{NP}) - R(\theta_1, \phi) \le c\,[R(\theta_0, \phi) - R(\theta_0, \phi_{NP})]$

One Sided UMP Test (PR): Given $n$ i.i.d. copies of a Bernoulli random variable with success parameter $\theta$ and with $T := \sum_{i=1}^n X_i$ as the number of successes, the following test is UMP for $H_0 : \theta \ge c$, $H_1 : \theta < c$ (and also for the weaker hypotheses $H_0 : \theta = c$, $H_1 : \theta = c -$ or $H_0 : \theta = c$, $H_1 : \theta < c$):
$\phi(T) := \begin{cases} 1 & \text{if } T < t_0 \\ q & \text{if } T = t_0 \\ 0 & \text{if } T > t_0 \end{cases}$
Where $t_0$ is chosen such that $P_{\theta_0}(T \le t_0 - 1) \le \alpha$, $P_{\theta_0}(T \le t_0) > \alpha$ and $q$ such that $P_{\theta_0}(H_0 \text{ rejected}) = P_{\theta_0}(T \le t_0 - 1) + q\,P_{\theta_0}(T = t_0) := \alpha$, i.e.: $q = \frac{\alpha - P_{\theta_0}(T \le t_0 - 1)}{P_{\theta_0}(T = t_0)}$

UMP tests for exponential families: $H_0 : \theta \le \theta_0$, $H_1 : \theta > \theta_0$, and $c(\theta)$ strictly increasing. Then a UMP test is:
$\phi(T(x)) := \begin{cases} 1 & \text{if } T(x) > t_0 \\ q & \text{if } T(x) = t_0 \\ 0 & \text{if } T(x) < t_0 \end{cases}$

Unbiased tests: A test $\phi$ is unbiased if for all $\theta \in \Theta_0$, $\vartheta \in \Theta_1$: $E_\theta \phi(X) \le E_\vartheta \phi(X)$

Uniformly Most Powerful Unbiased: An unbiased test $\phi$ is UMPU if it has level $\alpha$ and for all unbiased tests $\phi'$ with level $\alpha$: $E_\theta \phi'(X) \le E_\theta \phi(X)\ \forall\theta \in \Theta_1$

UMPU for a one-dimensional exponential family: $\mathcal{P}$ a one-dimensional exponential family with $c(\theta)$ strictly increasing in $\theta$. A UMPU test is then:
$\phi(T(x)) := \begin{cases} 1 & \text{if } T(x) < t_L \text{ or } T(x) > t_R \\ q_L & \text{if } T(x) = t_L \\ q_R & \text{if } T(x) = t_R \\ 0 & \text{if } t_L < T(x) < t_R \end{cases}$
With constants $t_R$, $t_L$, $q_R$, and $q_L$ such that: $E_{\theta_0} \phi(X) = \alpha$ and $\left.\frac{d}{d\theta} E_\theta \phi(X)\right|_{\theta = \theta_0} = 0$

Decision Theory

Admissible Decision: A decision $d'$ is strictly better than $d$ if: $R(\theta, d') \le R(\theta, d)\ \forall\theta$ and $\exists\theta : R(\theta, d') < R(\theta, d)$. $d$ is called inadmissible when there exists a $d'$ that is strictly better than $d$.

Admissibility for the Neyman Pearson Test: A Neyman Pearson test is admissible if and only if its power is strictly less than 1 or it has minimal level among all tests with power 1.

Admissible Estimators for the Normal Mean (PR): $X \sim N(\theta, 1)$, $\Theta := \mathbb{R}$ and $R(\theta, T) := E_\theta(T - \theta)^2$. If we consider estimators of the form $T = aX + b$, $a > 0$, $b \in \mathbb{R}$, $T$ is admissible if and only if one of the following cases holds:
1. $a < 1$
2. $a = 1$ and $b = 0$

Minimax Decisions: $d$ is minimax if $\sup_\theta R(\theta, d) = \inf_{d'} \sup_\theta R(\theta, d')$

Minimax for the Neyman Pearson Test: A Neyman Pearson test is minimax if and only if $R(\theta_0, \phi_{NP}) = R(\theta_1, \phi_{NP})$
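The constants $t_0$ and $q$ of the randomized one-sided Bernoulli test can be computed directly from the binomial distribution of $T$ under $\theta_0$. The sketch below follows the conditions $P_{\theta_0}(T \le t_0 - 1) \le \alpha < P_{\theta_0}(T \le t_0)$ stated above. The function names and the values $n = 10$, $\theta_0 = 0.5$, $\alpha = 0.05$ are hypothetical, and $0 < \theta_0 < 1$, $0 < \alpha < 1$ are assumed.

```python
from math import comb
from typing import Tuple


def binom_pmf(k: int, n: int, theta: float) -> float:
    """P(T = k) for T ~ Binomial(n, theta)."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)


def randomized_lower_test(n: int, theta0: float, alpha: float) -> Tuple[int, float]:
    """Return (t0, q) for the test that rejects if T < t0 and with probability q if T = t0."""
    cdf_below = 0.0  # running value of P(T <= t0 - 1)
    for t0 in range(n + 1):
        p_t0 = binom_pmf(t0, n, theta0)
        if cdf_below + p_t0 > alpha:
            return t0, (alpha - cdf_below) / p_t0
        cdf_below += p_t0
    raise ValueError("no valid t0 found; alpha should lie strictly between 0 and 1")


def rejection_probability(n: int, theta: float, t0: int, q: float) -> float:
    """E_theta phi(T) = P(T < t0) + q * P(T = t0)."""
    return sum(binom_pmf(k, n, theta) for k in range(t0)) + q * binom_pmf(t0, n, theta)


t0, q = randomized_lower_test(10, 0.5, 0.05)
print(t0, q, rejection_probability(10, 0.5, t0, q))
```

The randomization at $T = t_0$ is what lets the test attain the level exactly even though $T$ is discrete.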

Bayes Decisions

Bayes Risk: Given a probability measure $\Pi$ (prior distribution) on $\Theta$, and density $w := d\Pi/d\mu$: $r(\Pi, d) := \int_\Theta R(\vartheta, d)\,d\Pi(\vartheta)$ and $r(\Pi, d) = \int_\Theta R(\vartheta, d)\,w(\vartheta)\,d\mu(\vartheta) := r_w(d)$

Bayes Decision: A decision $d$ is called Bayes if: $r(\Pi, d) = \inf_{d'} r(\Pi, d')$

A posteriori density: Given $p_\theta(x) = p(x \mid \theta)$ and the marginal density $p(\cdot) := \int p(\cdot \mid \vartheta)\,w(\vartheta)\,d\mu(\vartheta)$, the a posteriori density of $\theta$ is: $w(\vartheta \mid x) = p(x \mid \vartheta)\,\frac{w(\vartheta)}{p(x)}$, $\vartheta \in \Theta$, $x \in \mathcal{X}$

Bayes Decision Construction: Let $l(x, a) := E[L(\theta, a) \mid X = x] = \int L(\vartheta, a)\,w(\vartheta \mid x)\,d\mu(\vartheta)$. Then the Bayes decision is: $d_{\text{Bayes}}(X) = \arg\min_{a \in \mathcal{A}} l(X, a)$. In terms of the factorization $g_\vartheta(S)$ of a sufficient statistic $S$: $d_{\text{Bayes}}(X) = \arg\min_{a \in \mathcal{A}} \int L(\vartheta, a)\,g_\vartheta(S)\,w(\vartheta)\,d\mu(\vartheta)$

Bayes Test: Assume $H_0 : \theta = \theta_0$, $H_1 : \theta = \theta_1$, $L(\theta_0, a) := a$, $L(\theta_1, a) := 1 - a$, $w(\theta_0) =: w_0$, and $w(\theta_1) =: w_1 = 1 - w_0$. The Bayes test is then (for an arbitrary $q$):
$\phi_{\text{Bayes}} = \begin{cases} 1 & \text{if } p_1/p_0 > w_0/w_1 \\ q & \text{if } p_1/p_0 = w_0/w_1 \\ 0 & \text{if } p_1/p_0 < w_0/w_1 \end{cases}$

Bayes Estimator for Quadratic Loss: $L(\theta, a) := (\theta - a)^2$, then: $d_{\text{Bayes}}(X) = E(\theta \mid X)$. For $T = E(\theta \mid X)$, the Bayes risk of an estimator $T'$ is: $r_w(T') = E\,\mathrm{var}(\theta \mid X) + E(T - T')^2$

Bayes Estimator/MAP/MLE: For $L(\theta, a) := 1\{|\theta - a| > c\}$ and $c$ small, the Bayes rule is approximately the maximum a posteriori (MAP) estimator, which is equivalent to the MLE for a uniform prior. With quadratic loss, the Bayes estimator is the expectation of the posterior, whereas the MAP is its maximum.

Credibility Interval: A $(1 - \alpha)$-credibility interval is: $I := [\hat\theta_L(X), \hat\theta_R(X)]$ with $\int_{\hat\theta_L(X)}^{\hat\theta_R(X)} w(\vartheta \mid X)\,d\vartheta = (1 - \alpha)$

Extended Bayes Decision: $T$ is called extended Bayes if there exists a sequence of prior densities $\{w_m\}_{m=1}^\infty$ such that $r_{w_m}(T) - \inf_{T'} r_{w_m}(T') \to 0$ as $m \to \infty$.

Minimaxity (PR): Suppose $T$ is a statistic with risk $R(\theta, T) = R(T)$ not depending on $\theta$. Then:
1. $T$ admissible $\Rightarrow$ $T$ minimax
2. $T$ Bayes $\Rightarrow$ $T$ minimax
3. $T$ extended Bayes $\Rightarrow$ $T$ minimax

Admissibility (PR): Suppose $T$ is Bayes for prior density $w$. Then each of 1. and 2. is sufficient for admissibility:
1. $T$ is unique Bayes ($r_w(T) = r_w(T')$ implies $\forall\theta$, $T = T'$, $P_\theta$-almost surely)
2. For all $T'$, $R(\theta, T')$ is continuous in $\theta$ and for all open $U \subset \Theta$, the prior probability $\int_U w(\vartheta)\,d\mu(\vartheta)$ of $U$ is strictly positive.

Admissibility, Extended Bayes (PR): Suppose $T$ is extended Bayes and for all $T'$, $R(\theta, T')$ is continuous in $\theta$. Furthermore, with $\Pi_m(U) := \int_U w_m(\vartheta)\,d\mu_m(\vartheta)$ being the probability of $U$ under the prior $\Pi_m$: $\frac{r_{w_m}(T) - \inf_{T'} r_{w_m}(T')}{\Pi_m(U)} \to 0$. Then $T$ is admissible.
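The relation between the Bayes estimator under quadratic loss, the MAP estimator and the MLE can be illustrated with a conjugate example. The sketch below assumes i.i.d. Bernoulli observations with a Beta(a, b) prior, a model chosen for this example and not taken from the text. In that model the posterior is Beta(a + s, b + n - s) for s successes in n trials. All function names are hypothetical.

```python
from typing import Tuple


def beta_posterior(a: float, b: float, successes: int, n: int) -> Tuple[float, float]:
    """Posterior Beta parameters after observing `successes` out of n Bernoulli trials."""
    return a + successes, b + n - successes


def posterior_mean(a_post: float, b_post: float) -> float:
    """Bayes estimator under quadratic loss: the posterior expectation."""
    return a_post / (a_post + b_post)


def posterior_mode(a_post: float, b_post: float) -> float:
    """MAP estimator: the posterior maximum (needs a_post > 1 and b_post > 1)."""
    return (a_post - 1) / (a_post + b_post - 2)


successes, n = 7, 10                                # hypothetical data
a_post, b_post = beta_posterior(1.0, 1.0, successes, n)  # uniform prior = Beta(1, 1)
print(posterior_mean(a_post, b_post))               # Bayes estimator, quadratic loss
print(posterior_mode(a_post, b_post), successes / n)  # MAP equals the MLE here
```

With the uniform prior the MAP estimate coincides with the MLE $s/n$, while the posterior mean is pulled slightly towards $1/2$.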

The Linear Model

Least Squares Estimator: Given the (augmented) design matrix $X \in \mathbb{R}^{n \times p}$, the least squares estimator is the projection of $Y$ on $\{Xb : b \in \mathbb{R}^p\}$: $\hat\beta := \arg\min_{b \in \mathbb{R}^p} \|Y - Xb\|_2^2$, with $\hat\beta = (X^T X)^{-1} X^T Y$

Distribution of the Least Squares Estimator (PR): For $f = EY$, let $\beta^* := (X^T X)^{-1} X^T f$ and $X\beta^*$ the best linear approximation of $f$. For $E\epsilon\epsilon^T = \sigma^2 I$, $\epsilon := Y - f$:
1. $E\hat\beta = \beta^*$, $\mathrm{Cov}(\hat\beta) = \sigma^2 (X^T X)^{-1}$
2. $E\|X(\hat\beta - \beta^*)\|_2^2 = \sigma^2 p$
3. $E\|X\hat\beta - f\|_2^2 = \sigma^2 p + \|X\beta^* - f\|_2^2$

Least Squares Estimator Expectation (PR): When $\epsilon \sim N(0, \sigma^2 I)$, we have $\hat\beta - \beta^* \sim N\left(0, \sigma^2 (X^T X)^{-1}\right)$ and $\frac{\|X(\hat\beta - \beta^*)\|_2^2}{\sigma^2} \sim \chi_p^2$. A test for $H_0 : \beta = \beta_0$ is to reject $H_0$ when $\|X(\hat\beta - \beta_0)\|_2^2 / \sigma_0^2 > G_p^{-1}(1 - \alpha)$, where $G_p$ is the distribution function of a $\chi_p^2$-distributed random variable.

Testing a Linear Hypothesis: $Y = X\beta + \epsilon$ with $\epsilon \sim N(0, \sigma^2 I)$ and we want to test $H_0 : B\beta = 0$. Under $H_0$, the following fraction is $\chi_q^2$-distributed: $\frac{\|Y - X\hat\beta_0\|_2^2 - \|Y - X\hat\beta\|_2^2}{\sigma^2}$
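The closed form $\hat\beta = (X^T X)^{-1} X^T Y$ can be evaluated by solving the normal equations $X^T X b = X^T Y$. The following is a minimal pure-Python sketch that assumes $X^T X$ is invertible. The helper names `solve` and `least_squares` and the toy data are hypothetical, and a numerical library would normally be used instead.

```python
from typing import List


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(X: List[List[float]], y: List[float]) -> List[float]:
    """Least squares estimator via the normal equations X^T X b = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Toy data on the line y = 1 + 2x; the first column of ones is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(least_squares(X, y))
```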

Asymptotic Theory

We assume an estimator $T_n(X_1, \dots, X_n)$ of $\gamma$ is defined for all $n$, i.e. we consider a sequence of estimators.

Markov's/Chebyshev's Inequality: For all increasing functions $\psi : [0, \infty) \to [0, \infty)$: $P(\|Z\| \ge \epsilon) \le \frac{E\psi(\|Z\|)}{\psi(\epsilon)}$

Almost Sure Convergence: $Z_n$ converges almost surely to $Z$ if $P(\lim_{n \to \infty} Z_n = Z) = 1$

Convergence in Probability: $Z_n$ converges in probability to $Z$ ($Z_n \xrightarrow{P} Z$) if for all $\epsilon > 0$: $\lim_{n \to \infty} P(\|Z_n - Z\| > \epsilon) = 0$. Almost sure convergence implies convergence in probability, but not the other way around.

Convergence in Distribution: $Z_n$ converges in distribution to $Z$ ($Z_n \xrightarrow{D} Z$) if for all continuous and bounded functions $f$: $\lim_{n \to \infty} Ef(Z_n) = Ef(Z)$. Convergence in probability implies convergence in distribution, but not the other way around.

Portmanteau Theorem: The following statements are equivalent:
1. $Z_n \xrightarrow{D} Z$ (i.e., $Ef(Z_n) \to Ef(Z)\ \forall f$ bounded and continuous)
2. $Ef(Z_n) \to Ef(Z)\ \forall f$ bounded and Lipschitz ($f$ is Lipschitz if for a constant $C_L$, $|f(z) - f(\tilde z)| \le C_L \|z - \tilde z\|$)
3. $Ef(Z_n) \to Ef(Z)\ \forall f$ bounded and $Q$-a.s. continuous (where $Q$ is the distribution of $Z$)
4. $P(Z_n \le z) \to G(z)$ for all $G$-continuity points $z$ (where $G = Q(Z \le \cdot)$ is the distribution function of $Z$)

Cramér-Wold Device: $Z_n \xrightarrow{D} Z \Leftrightarrow a^T Z_n \xrightarrow{D} a^T Z\ \forall a \in \mathbb{R}^p$

Slutsky's Theorem (PR): Assume that $Z_n \xrightarrow{D} Z$ and $A_n \xrightarrow{P} a$. Then: $A_n^T Z_n \xrightarrow{D} a^T Z$

Central Limit Theorem: Let $X_1, X_2, \dots$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Then: $\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{D} N(0, \sigma^2)$

Stochastic Order Symbols: Let $r_n$ be strictly positive random variables. $Z_n = O_P(1)$ ($Z_n$ bounded in probability) if: $\lim_{M \to \infty} \limsup_{n \to \infty} P(\|Z_n\| > M) = 0$. $Z_n = O_P(r_n)$ if $Z_n / r_n = O_P(1)$. When $Z_n$ converges in distribution, $Z_n = O_P(1)$. If $Z_n$ converges in probability to zero, $Z_n = o_P(1)$, and $Z_n = o_P(r_n)$ if $Z_n / r_n = o_P(1)$.

Consistent Estimators: A sequence of estimators $T_n$ is consistent if: $T_n \xrightarrow{P_\theta} \gamma$

Asymptotically Normal Estimators: A sequence of estimators $T_n$ is asymptotically normal with covariance matrix $V_\theta$ if: $\sqrt{n}\,(T_n - \gamma) \xrightarrow{D_\theta} N(0, V_\theta)$

Asymptotically Linear Estimators: A sequence of estimators $T_n$ is asymptotically linear if for an (influence) function $l_\theta : \mathcal{X} \to \mathbb{R}^p$ with $E_\theta l_\theta(X) = 0$ and $E_\theta l_\theta(X) l_\theta^T(X) =: V_\theta < \infty$: $T_n - \gamma = \frac{1}{n}\sum_{i=1}^n l_\theta(X_i) + o_{P_\theta}(1/\sqrt{n})$

The δ-Technique: Let $h$ be differentiable at $c$ and suppose: $(T_n - c)/r_n \xrightarrow{D} Z$. Then: $(h(T_n) - h(c))/r_n \xrightarrow{D} \dot h(c)^T Z$

˙ )T (Tn − c ) + oP (rn ) h (T...
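The δ-technique can be checked by simulation. The sketch below takes $T_n = \bar X_n$ for i.i.d. normal data, $c = \mu$, $r_n = 1/\sqrt{n}$ and $h(t) = t^2$, so that by the central limit theorem the limit of $\sqrt{n}\,(h(\bar X_n) - h(\mu))$ should be $N(0, (2\mu)^2\sigma^2)$. The choice of $h$, the parameter values, the seed and the function name are all assumptions made for this illustration, and a Monte Carlo estimate only approximates the limiting variance.

```python
import random
import statistics


def simulate_delta_method(mu: float, sigma: float, n: int, reps: int, seed: int = 0) -> float:
    """Empirical variance of sqrt(n) * (h(Xbar_n) - h(mu)) for h(t) = t^2."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        draws.append(n ** 0.5 * (xbar ** 2 - mu ** 2))
    return statistics.variance(draws)


mu, sigma = 2.0, 1.0
empirical = simulate_delta_method(mu, sigma, n=100, reps=2000)
theoretical = (2 * mu) ** 2 * sigma ** 2  # (h'(mu))^2 * sigma^2
print(empirical, theoretical)
```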

