Chap3 - The Geometry of Influence Functions
ϕ(Z, θ) to emphasize that this random function will vary according to the
value of θ in the model. Unless otherwise stated, it will be assumed that ϕ(Z)
is evaluated at the truth and expectations are taken with respect to the truth.
Therefore, E{ϕ(Z)} is shorthand for

Eθ0{ϕ(Z, θ0)} = ∫ ϕ(z, θ0) pZ(z, θ0) dν(z).
The random vector ϕ(Zi ) in (3.1) is referred to as the i-th influence func-
tion of the estimator β̂n or the influence function of the i-th observation of
the estimator β̂n . The term influence function comes from the robustness lit-
erature, where, to first order, ϕ(Zi ) is the influence of the i-th observation on
β̂n ; see Hampel (1974).
Since n1/2 (µ̂n −µ0 ) converges to a normal distribution and (µ̂n −µ0 ) converges
in probability to zero, this implies that n1/2 (µ̂n −µ0 )2 converges in probability
to zero (i.e., is op (1)). Consequently, we have demonstrated that σ̂n2 is an
asymptotically linear estimator for σ 2 whose i-th influence function is given
by ϕ(Zi) = {(Zi − µ0)^2 − σ0^2}. □
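The asymptotic linearity of σ̂n^2 can be checked numerically. Below is a minimal Monte Carlo sketch (the values of µ0, σ0, and n are illustrative choices, not from the text) comparing n^{1/2}(σ̂n^2 − σ0^2) with the linearized sum n^{−1/2} Σ_{i=1}^{n} ϕ(Zi); the remainder is exactly −n^{1/2}(Z̄n − µ0)^2, which is op(1):

```python
# Monte Carlo sketch: the variance estimator sigma2_hat = n^{-1} sum (Z_i - Zbar)^2
# is asymptotically linear with influence function phi(Z_i) = (Z_i - mu_0)^2 - sigma_0^2.
import random
import math

random.seed(12345)
mu0, sigma0 = 1.0, 2.0   # illustrative truth
n = 50_000

z = [random.gauss(mu0, sigma0) for _ in range(n)]
zbar = sum(z) / n
sigma2_hat = sum((zi - zbar) ** 2 for zi in z) / n

# Linearized representation: n^{1/2}(sigma2_hat - sigma0^2) should equal
# n^{-1/2} * sum_i phi(Z_i) up to an o_p(1) remainder.
lhs = math.sqrt(n) * (sigma2_hat - sigma0 ** 2)
rhs = sum((zi - mu0) ** 2 - sigma0 ** 2 for zi in z) / math.sqrt(n)
remainder = lhs - rhs  # algebraically equals -n^{1/2} (zbar - mu0)^2

print(abs(remainder))  # small relative to lhs and rhs for large n
```

The remainder identity is exact algebra, so the sketch also makes visible *why* the quadratic term is negligible: (Z̄n − µ0)^2 is Op(1/n), and multiplying by n^{1/2} still leaves Op(n^{−1/2}).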
Proof (by contradiction). Suppose not. Then there exists another influence function ϕ*(Z) such that E{ϕ*(Z)} = 0 and

n^{1/2}(β̂n − β0) = n^{−1/2} Σ_{i=1}^{n} ϕ*(Zi) + op(1).

Since n^{1/2}(β̂n − β0) is also equal to n^{−1/2} Σ_{i=1}^{n} ϕ(Zi) + op(1), this implies that

n^{−1/2} Σ_{i=1}^{n} {ϕ(Zi) − ϕ*(Zi)} = op(1).
3.1 Super-Efficiency
Example Due to Hodges
Let Z1 , . . . , Zn be iid N (µ, 1), µ ∈ R. For this simple model, we know that
the maximum likelihood estimator (MLE) of µ is given by the sample mean Z̄n = n^{−1} Σ_{i=1}^{n} Zi and that

n^{1/2}(Z̄n − µ) →^{D(µ)} N(0, 1).
Now, consider the estimator µ̂n given by Hodges in 1951 (see LeCam,
1953):

µ̂n = Z̄n if |Z̄n| > n^{−1/4},  and  µ̂n = 0 if |Z̄n| ≤ n^{−1/4}.
Some of the properties of this estimator are as follows.
If µ ≠ 0, then with increasing probability, the support of Z̄n moves away from
0 (see Figure 3.1).
[Figure 3.1: for µ ≠ 0, the density of Z̄n is centered at µ, outside the interval (−n^{−1/4}, n^{−1/4}).]

Therefore n^{1/2}(Z̄n − µ) = n^{1/2}(µ̂n − µ) + op(1) and n^{1/2}(µ̂n − µ) →^{D(µ)} N(0, 1).
If µ = 0, then the support of Z̄n will be concentrated in an O(n−1/2 )
neighborhood about the origin and hence, with increasing probability, will be
within ±n−1/4 (see Figure 3.2).
[Figure 3.2: for µ = 0, the density of Z̄n concentrates within the interval (−n^{−1/4}, n^{−1/4}).]
Therefore, this implies that P0(µ̂n = 0) → 1. Hence P0{n^{1/2}µ̂n = 0} → 1, and

n^{1/2}(µ̂n − 0) →^{P} 0, or equivalently n^{1/2}(µ̂n − 0) →^{D(0)} N(0, 0).

Consequently, the asymptotic variance of n^{1/2}(µ̂n − µ) is equal to 1 for all µ ≠ 0, as it is for the MLE Z̄n, but for µ = 0, the asymptotic variance of n^{1/2}(µ̂n − µ) equals 0, and thus µ̂n is super-efficient.
Although super-efficiency, at the surface, may seem like a good property
for an estimator to possess, upon further study we find that super-efficiency
is gained at the expense of poor estimation in a neighborhood of zero. To see this, consider the local sequence of parameter values µn = n^{−1/3}. Since n^{−1/3} < n^{−1/4} for large n, the distribution of Z̄n under µn concentrates within ±n^{−1/4} (see Figure 3.3), so µ̂n = 0 with probability tending to one. Therefore,

Pµn{n^{1/2}(µ̂n − µn) = −n^{1/2}µn} → 1,  where −n^{1/2}µn = −n^{1/6} → −∞,

so that the normalized error n^{1/2}(µ̂n − µn) diverges along this sequence.
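The two regimes above can be seen in a small simulation. The sketch below (sample size, replication count, and seed are our illustrative choices, not from the text) estimates the scaled risk E{n(µ̂n − µ)^2} of the Hodges estimator at µ = 0, where it is near zero, and at the local value µn = n^{−1/3}, where it grows like n^{1/3}; for comparison, the MLE Z̄n has scaled risk 1 at every µ:

```python
# Monte Carlo sketch of the Hodges estimator mu_hat = Zbar * 1{|Zbar| > n^{-1/4}}.
import random
import math

def hodges(zbar, n):
    return zbar if abs(zbar) > n ** (-0.25) else 0.0

def mse_scaled(mu, n, reps, rng):
    # Estimates E[ n * (mu_hat - mu)^2 ]; since Zbar ~ N(mu, 1/n),
    # we draw Zbar directly instead of averaging n observations.
    total = 0.0
    for _ in range(reps):
        zbar = rng.gauss(mu, 1.0 / math.sqrt(n))
        total += n * (hodges(zbar, n) - mu) ** 2
    return total / reps

rng = random.Random(7)
n, reps = 10_000, 4_000
risk_at_zero = mse_scaled(0.0, n, reps, rng)              # near 0: super-efficiency
risk_local = mse_scaled(n ** (-1.0 / 3.0), n, reps, rng)  # blows up, roughly n^{1/3}
print(risk_at_zero, risk_local)
```

At n = 10,000 the local risk is on the order of n^{1/3} ≈ 21, while the MLE's scaled risk would be 1; this is the sense in which super-efficiency at one point is paid for nearby.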
where

θn = (βnT, ηnT)T,  θ* = (β*T, η*T)T.

An estimator β̂n, more specifically β̂n(Z1n, . . . , Znn), is said to be regular if, for each θ*, n^{1/2}(β̂n − βn) has a limiting distribution that does not depend on the LDGP. That is, if

n^{1/2}{β̂n(Z1n, . . . , Znn) − β*} →^{D(θ*)} N(0, Σ*),  where Z1n, . . . , Znn are iid p(z, θ*) for all n,

then also

n^{1/2}{β̂n(Z1n, . . . , Znn) − βn} →^{D(θn)} N(0, Σ*),  where Z1n, . . . , Znn are iid p(z, θn)

and n^{1/2}(θn − θ*) → τ, where τ is any arbitrary (p × 1) constant vector.
It is easy to see that, in our previous example, the MLE Z̄n is a regular
estimator, whereas the super-efficient estimator µ̂n , given by Hodges, is not.
From now on, we will restrict ourselves to regular estimators; in fact,
we will only consider estimators that are regular and asymptotically linear
(RAL). Although most reasonable estimators are RAL, regular estimators do
exist that are not asymptotically linear. However, as a consequence of Hájek’s
(1970) representation theorem, it can be shown that the most efficient regular
estimator is asymptotically linear; hence, it is reasonable to restrict attention
to RAL estimators.
In Theorem 3.2 and its subsequent corollary, given below, we present a
very powerful result that allows us to describe the geometry of influence func-
tions for regular asymptotically linear (RAL) estimators. This will aid us in
defining and visualizing efficiency and will also help us generalize ideas to
semiparametric models.
First, we define the score vector for a single observation Z in a parametric model, where Z ∼ pZ(z, θ), θ = (βT, ηT)T, by Sθ(Z, θ0), where

Sθ(z, θ0) = ∂ log pZ(z, θ)/∂θ |_{θ=θ0},   (3.3)

which is partitioned into the q × 1 vector

Sβ(z, θ0) = ∂ log pZ(z, θ)/∂β |_{θ=θ0}

and the r × 1 vector

Sη(z, θ0) = ∂ log pZ(z, θ)/∂η |_{θ=θ0}.
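As a concrete instance of (3.3), the sketch below (an illustrative model and constants of our choosing, not from the text) takes Z ∼ N(µ, σ^2) with β = µ and η = σ^2, writes the two score components in closed form, and checks by Monte Carlo that the score has mean zero at the truth:

```python
# For Z ~ N(mu, sigma^2) with theta = (mu, sigma^2), the score components are
#   S_beta(z) = (z - mu) / sigma^2
#   S_eta(z)  = {(z - mu)^2 - sigma^2} / (2 sigma^4),
# and E{S_theta(Z, theta_0)} = 0 when evaluated at the truth.
import random

random.seed(2024)
mu0, s2 = 0.5, 1.5   # illustrative truth
n = 200_000
z = [random.gauss(mu0, s2 ** 0.5) for _ in range(n)]

s_beta = [(zi - mu0) / s2 for zi in z]
s_eta = [((zi - mu0) ** 2 - s2) / (2 * s2 ** 2) for zi in z]

mean_s_beta = sum(s_beta) / n
mean_s_eta = sum(s_eta) / n
print(mean_s_beta, mean_s_eta)  # both approximately 0
```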
Corollary 1.
(i) E{ϕ(Z)SβT(Z, θ0)} = I^{q×q}
and
(ii) E{ϕ(Z)SηT(Z, θ0)} = 0^{q×r},
where I^{q×q} denotes the q × q identity matrix and 0^{q×r} denotes the q × r matrix of zeros.
3.2 m-Estimators (Quick Review)
Theorem 3.2 follows from the definition of regularity together with sufficient smoothness conditions that make a local data generating process contiguous (to be defined shortly) to the sequence of distributions at the truth.
For completeness, we will give an outline of the proof. Before giving the gen-
eral proof of Theorem 3.2, which is complicated and can be skipped by the
reader not interested in all the technical details, we can gain some insight by
first showing how Corollary 1 could be proved for the special (and important)
case of the class of m-estimators.
Let m(Z, θ) denote a p-dimensional estimating function for which Eθ{mT(Z, θ)m(Z, θ)} < ∞, and Eθ{m(Z, θ)mT(Z, θ)} is positive definite for all θ ∈ Ω. Additional regularity conditions are also necessary and will be defined as we need them.

The m-estimator θ̂n is defined as the solution (assuming it exists) of

Σ_{i=1}^{n} m(Zi, θ̂n) = 0

from a sample Z1, . . . , Zn iid pZ(z, θ), θ ∈ Ω ⊂ Rp.
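A minimal worked m-estimator (the model and sample size are illustrative choices, not from the text): for Z ∼ Exponential(rate θ), taking m(z, θ) = 1/θ − z (the score for θ) gives an estimating equation whose root is the MLE 1/Z̄n. The sketch solves it by bisection, exploiting monotonicity in θ:

```python
# m-estimator sketch: Z ~ Exponential(rate theta), m(z, theta) = 1/theta - z.
import random

random.seed(99)
theta0 = 2.0   # illustrative truth
n = 100_000
z = [random.expovariate(theta0) for _ in range(n)]
total = sum(z)

def estimating_equation(theta):
    # sum_i m(Z_i, theta) = n/theta - sum_i Z_i
    return n / theta - total

# The equation is strictly decreasing in theta on (0, inf), so bisect.
lo, hi = 1e-6, 100.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if estimating_equation(mid) > 0:
        lo = mid
    else:
        hi = mid
theta_hat = 0.5 * (lo + hi)

print(theta_hat, n / total)  # the bisection root is exactly the MLE 1 / Zbar
```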
In particular, the MLE solves the m-estimating equation with m(z, θ) = Sθ(z, θ), where Sθ(z, θ) is the score vector (i.e., the derivative of the log-density) defined in (3.3). Since the score vector Sθ(Z, θ), under suitable regularity conditions, has the property that Eθ{Sθ(Z, θ)} = 0 (see, for example, equation (7.3.8) of Casella and Berger, 2002), this implies that the MLE is an example of an m-estimator.
In order to prove the consistency and asymptotic normality of m-estimators,
we need to assume certain regularity conditions. Some of the conditions that
are discussed in Chapter 36 of the Handbook of Econometrics by Newey and McFadden (1994) include that E{∂m(Z, θ0)/∂θT} be nonsingular, where ∂m(Zi, θ)/∂θT is defined as the p × p matrix of all partial derivatives of the elements of m(·) with respect to the elements of θ, and that

n^{−1} Σ_{i=1}^{n} ∂m(Zi, θ)/∂θT →^{P} Eθ0{∂m(Z, θ)/∂θT}

uniformly for θ in a neighborhood of θ0.
Therefore,

n^{1/2}(θ̂n − θ0) = −[n^{−1} Σ_{i=1}^{n} ∂m(Zi, θn*)/∂θT]^{−1} n^{−1/2} Σ_{i=1}^{n} m(Zi, θ0)
= −[E{∂m(Z, θ0)/∂θT}]^{−1} n^{−1/2} Σ_{i=1}^{n} m(Zi, θ0) + op(1),

where θn* is an intermediate value between θ̂n and θ0.
where

var{m(Z, θ0)} = E{m(Z, θ0)mT(Z, θ0)}.

Recall also the information matrix identity

I(θ0) = Eθ0{−Sθθ(Z, θ0)} = Eθ0{Sθ(Z, θ0)SθT(Z, θ0)}.   (3.11)
θ = (βT, ηT)T,

and, therefore,

∂/∂θT ∫ m(z, θ)p(z, θ)dν(z) = 0.
where I^{p×p} denotes the p × p identity matrix. Recall that the influence function for θ̂n, given by (3.6), is

ϕθ̂n(Zi) = −[E{∂m(Z, θ0)/∂θT}]^{−1} m(Zi, θ0)

and can be partitioned as (ϕβ̂nT(Zi), ϕη̂nT(Zi))T.

The covariance of the influence function ϕθ̂n(Zi) and the score vector Sθ(Zi, θ0) is

E{ϕθ̂n(Zi)SθT(Zi, θ0)} = −[E{∂m(Z, θ0)/∂θT}]^{−1} E{m(Z, θ0)SθT(Z, θ0)},   (3.14)
Consequently,

(i) E{ϕβ̂n(Zi)SβT(Zi, θ0)} = I^{q×q} (the q × q identity matrix)

and

(ii) E{ϕβ̂n(Zi)SηT(Zi, θ0)} = 0^{q×r}.
Thus, we have verified that the two conditions of Corollary 1 hold for influence
functions of m-estimators.
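These two conditions can also be verified numerically in a simple case. In the sketch below (an illustrative model of our choosing, not from the text), Z ∼ N(µ, σ^2) with β = µ and η = σ^2, and the sample mean has influence function ϕ(Z) = Z − µ; the Monte Carlo averages of ϕ·Sβ and ϕ·Sη should be approximately 1 and 0:

```python
# Check Corollary 1 for the sample mean in the N(mu, sigma^2) model:
# phi(Z) = Z - mu, E{phi * S_beta} = 1, E{phi * S_eta} = 0.
import random

random.seed(11)
mu0, s2 = -1.0, 4.0   # illustrative truth
n = 200_000
z = [random.gauss(mu0, s2 ** 0.5) for _ in range(n)]

phi = [zi - mu0 for zi in z]
s_beta = [(zi - mu0) / s2 for zi in z]                        # score for mu
s_eta = [((zi - mu0) ** 2 - s2) / (2 * s2 ** 2) for zi in z]  # score for sigma^2

cov_phi_sbeta = sum(p * s for p, s in zip(phi, s_beta)) / n   # ~ 1 (condition (i))
cov_phi_seta = sum(p * s for p, s in zip(phi, s_eta)) / n     # ~ 0 (condition (ii))
print(cov_phi_sbeta, cov_phi_seta)
```

Condition (ii) holds here because E{(Z − µ)^3} = 0 for the normal distribution; the simulation makes the orthogonality to the nuisance score concrete.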
Definition 2. Let Vn be a sequence of random vectors and let P1n and P0n be
sequences of probability measures with densities p1n (vn ) and p0n (vn ), respec-
tively. The sequence of probability measures P1n is contiguous to the sequence
of probability measures P0n if, for any sequence of events An defined with re-
spect to Vn , P0n (An ) → 0 as n → ∞ implies that P1n (An ) → 0 as n → ∞.
□
To illustrate that (3.15) holds for LDGPs under sufficient smoothness and
regularity conditions, we sketch out the following heuristic argument. Define
Ln(Vn) = p1n(Vn)/p0n(Vn) = Π_{i=1}^{n} {p(Zin, θn)/p(Zin, θ0)}.
(ii) Since θn* → θ0 and Sθθ(Zin, θ0), i = 1, . . . , n, are iid random matrices with mean −I(θ0), then, under sufficient smoothness conditions,

n^{−1} Σ_{i=1}^{n} {Sθθ(Zin, θn*) − Sθθ(Zin, θ0)} →^{P} 0,

hence

n^{−1} Σ_{i=1}^{n} Sθθ(Zin, θn*) →^{P} −I(θ0).
By assumption, n^{1/2}(θn − θ0) → τ. Therefore, (i), (ii), and Slutsky's theorem imply that

log{Ln(Vn)} →^{D(P0n)} N(−τTI(θ0)τ/2, τTI(θ0)τ).
Also, under P1n , [ϕ(Zin ) − Eθn {ϕ(Z)}], i = 1, . . . , n are iid mean-zero random
vectors with variance matrix Eθn (ϕϕT ) − Eθn (ϕ)Eθn (ϕT ). By the smoothness
assumption, Eθn (ϕϕT ) → Eθ0 (ϕϕT ) and Eθn (ϕ) → 0 as n → ∞. Hence, by
the CLT, we obtain
n^{−1/2} Σ_{i=1}^{n} [ϕ(Zin) − Eθn{ϕ(Z)}] →^{D(P1n)} N{0, Eθ0(ϕϕT)}.   (3.20)
By a simple Taylor series expansion, we deduce that β(θn ) ≈ β(θ0 )+Γ(θ0 )(θn −
θ0 ), where Γ(θ0 ) = ∂β(θ0 )/∂θT . Hence,
Finally,

n^{1/2}Eθn{ϕ(Z)} = n^{1/2} ∫ ϕ(z)p(z, θn)dν(z)
= n^{1/2} ∫ ϕ(z)p(z, θ0)dν(z) + n^{1/2} ∫ ϕ(z){∂p(z, θn*)/∂θ}T(θn − θ0)dν(z)
→ 0 + ∫ ϕ(z){∂p(z, θ0)/∂θ / p(z, θ0)}T p(z, θ0)dν(z) τ as n → ∞
= Eθ0{ϕ(Z)SθT(Z, θ0)}τ,   (3.22)
where θn∗ is some intermediate value between θn and θ0 . The only way that
(3.19) and (3.20) can hold is if the limit of (3.18), as n → ∞, is identically
equal to zero. By (3.21) and (3.22), this implies that
[Eθ0{ϕ(Z)SθT(Z, θ0)} − Γ(θ0)]τ = 0^{q×1}.

Since τ is arbitrary, this implies that Eθ0{ϕ(Z)SθT(Z, θ0)} = Γ(θ0), which is condition (3.4). Recall also that the tangent space T is the linear subspace of the Hilbert space spanned by the score vector; i.e., the set of elements B^{q×p}Sθ(Z, θ0) for all q × p matrices B^{q×p}.
Constructing Estimators
Let ϕ(Z) be a q-dimensional measurable function with zero mean and finite
variance that satisfies conditions (i) and (ii) of Corollary 1. Define the estimating function m(Z, β, η) = ϕ(Z, β, η); that is, ϕ(Z) viewed as a function of arbitrary parameter values (β, η).
Assume that we can find a root-n consistent estimator for the nuisance pa-
rameter η̂n (i.e., where n1/2 (η̂n −η0 ) is bounded in probability). In many cases
the estimator η̂n will be β-dependent (i.e., η̂n (β)). For example, we might use
the MLE for η, or the restricted MLE for η, fixing the value of β.
We will now argue that the solution to the equation
Σ_{i=1}^{n} m{Zi, β, η̂n(β)} = 0,   (3.24)

is an asymptotically linear estimator for β whose influence function is ϕ(Z).
or, equivalently,

∫ m(z, β0, η)p(z, β0, η)dν(z) = 0 for all η.
Consequently,

∂/∂ηT [∫ m(z, β0, η)p(z, β0, η)dν(z)] |_{η=η0} = 0,

or

∫ {∂m(z, β0, η0)/∂ηT}p(z, β0, η0)dν(z) + ∫ m(z, β0, η0)SηT(z, β0, η0)p(z, β0, η0)dν(z) = 0.   (3.25)
n^{1/2}(β̂n − β0)
= −[n^{−1} Σ_{i=1}^{n} ∂m{Zi, βn*, η̂n(β̂n)}/∂βT]^{−1} n^{−1/2} Σ_{i=1}^{n} m{Zi, β0, η̂n(β̂n)},   (3.29)

where βn* is an intermediate value between β̂n and β0, and the term in square brackets converges in probability to E{∂m(Z, β0, η0)/∂βT} = −I^{q×q} by (3.27).
Let us consider the second term of (3.29); namely, n^{−1/2} Σ_{i=1}^{n} m{Zi, β0, η̂n(β̂n)}. By expansion, this equals

n^{−1/2} Σ_{i=1}^{n} m(Zi, β0, η0) + [n^{−1} Σ_{i=1}^{n} ∂m(Zi, β0, ηn*)/∂ηT][n^{1/2}{η̂n(β̂n) − η0}],   (3.30)

where the first factor of the product converges in probability to E{∂m(Z, β0, η0)/∂ηT} = 0 by (3.26), and the second factor is bounded in probability. Hence the second term of (3.29) equals n^{−1/2} Σ_{i=1}^{n} m(Zi, β0, η0) + op(1),
which illustrates that ϕ(Zi ) is the influence function for the i-th observation
of the estimator β̂n above.
Remark 3. This argument was independent of the choice of the root-n consistent estimator for the nuisance parameter η. □
Remark 4. In the derivation above, the asymptotic distribution of the esti-
mator obtained by solving the estimating equation, which uses the estimating
function m(Z, β, η̂n ), is the same as the asymptotic distribution of the estima-
tor solving the estimating equation using the estimating function m(Z, β, η0 )
had the true value of the nuisance parameter η0 been known to us. This
fact follows from the orthogonality of the estimating function (evaluated at
the truth) to the nuisance tangent space. This type of robustness, where the
asymptotic distribution of an estimator is independent of whether the true
value of the nuisance parameter is known or whether (and how) the nuisance
parameter is estimated in an estimating equation, is one of the bonuses of
working with estimating equations with estimating functions that are orthogonal to the nuisance tangent space. □
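A deliberately contrived illustration of Remark 4 (the model, estimating function, and constants below are our own choices, not from the text): with Z ∼ N(0, 1), β0 = 0, and nuisance η0 = 1 (the variance), take m(z, β, η) = (z − β) + (z − β)^3/η. At the truth, E{∂m/∂η} = −E(Z − β0)^3/η0^2 = 0, which is the orthogonality condition (3.26), so solving the estimating equation with η estimated or with η0 known gives first-order equivalent estimators:

```python
# Robustness to nuisance estimation: compare the root of
# sum_i [(z_i - beta) + (z_i - beta)^3 / eta] = 0 for eta known vs. estimated.
import random

random.seed(17)
n = 20_000
z = [random.gauss(0.0, 1.0) for _ in range(n)]
zbar = sum(z) / n
eta_hat = sum((zi - zbar) ** 2 for zi in z) / n  # root-n consistent for eta_0 = 1

def solve_beta(eta):
    # sum_i m(z_i, beta, eta) is strictly decreasing in beta, so bisect.
    lo, hi = -5.0, 5.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        total = sum((zi - mid) + (zi - mid) ** 3 / eta for zi in z)
        if total > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta_known = solve_beta(1.0)          # nuisance eta_0 = 1 treated as known
beta_estimated = solve_beta(eta_hat)  # nuisance estimated from the data
print(beta_known, beta_estimated, abs(beta_known - beta_estimated))
```

The two roots differ by far less than the n^{−1/2} sampling scale of β̂n itself, which is exactly the first-order equivalence the remark describes.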
Remark 5. We want to make it clear that the estimator we just presented is
for theoretical purposes only and not of practical use. The starting point was
the choice of a function satisfying the conditions of Lemma 3.1. To find such
a function necessitates knowledge of the truth, which, of course, we don’t
have. Nonetheless, starting with some truth, say θ0 , and some function ϕ(Z)
satisfying the conditions of Corollary 1 (under the assumed true model), we
constructed an estimator whose influence function is ϕ(Z) when θ0 is the
truth. If, however, the data were generated, in truth, by some other value of
the parameter, say θ∗ , then the estimator constructed by solving (3.24) would
have some other influence function ϕ∗ (Z) satisfying the conditions of Lemma
3.1 at θ*. □
Thus, by Corollary 1, all RAL estimators have influence functions that belong
to the subspace of our Hilbert space satisfying
(i) E{ϕ(Z)SβT (Z, θ0 )} = I q×q
and
(ii) E{ϕ(Z)SηT (Z, θ0 )} = 0q×r ,
and, conversely, any element in the subspace above is the influence function
of some RAL estimator.
H = M ⊕ M⊥. □

⟨h − a0, a⟩ = 0 for all a ∈ Λ.
3.4 Efficient Influence Function
The tangent space can be written as the direct sum of the nuisance tangent space and the tangent space generated by the score vector with respect to the parameter of interest "β". That is, if we define Tβ as the space {B^{q×q}Sβ(Z, θ0) for all B^{q×q}}, then

T = Tβ ⊕ Λ.
if and only if
var {aT ϕ(1) (Z)} ≤ var {aT ϕ(2) (Z)}
for all q × 1 constant vectors a. Equivalently,
This means that, for such cases, the variance matrix of ℓ + h, for orthogonal q-dimensional elements ℓ and h, is larger (in the multidimensional sense defined above) than either the variance matrix of ℓ or the variance matrix of h.
In many of the arguments that follow, we will be decomposing elements
of the Hilbert space as the projection to a tangent space or a nuisance tan-
gent space plus the residual after the projection. For such problems, because
the tangent space or nuisance tangent space is a q-replicating linear space,
we now know that we can immediately apply the multivariate version of the
Pythagorean theorem where the variance matrix of any element is always
larger than the variance matrix of the projection or the variance matrix of the
residual after projection. Consequently, we don’t have to distinguish between
the Hilbert space of one-dimensional random functions and q-dimensional ran-
dom functions.
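The multivariate Pythagorean relation just described can be illustrated directly (a toy construction of our own, not from the text): for mean-zero, mutually uncorrelated random vectors ℓ and h, var(ℓ + h) − var(ℓ) = var(h), which is nonnegative definite:

```python
# Monte Carlo check of the multivariate Pythagorean theorem for q = 2.
import random

random.seed(5)
n = 100_000
data_l, data_sum = [], []
for _ in range(n):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    y1, y2 = random.gauss(0, 1), random.gauss(0, 1)
    l = (x1, x1 + x2)   # var(l) = [[1, 1], [1, 2]]; built from X only
    h = (y1 - y2, y2)   # var(h) = [[2, -1], [-1, 1]]; built from independent Y
    data_l.append(l)
    data_sum.append((l[0] + h[0], l[1] + h[1]))

def cov2(pairs):
    # 2x2 sample second-moment matrix (a, b; b, d) of mean-zero pairs
    m = len(pairs)
    a = sum(u * u for u, v in pairs) / m
    b = sum(u * v for u, v in pairs) / m
    d = sum(v * v for u, v in pairs) / m
    return a, b, d

a_l, b_l, d_l = cov2(data_l)
a_s, b_s, d_s = cov2(data_sum)
diff = (a_s - a_l, b_s - b_l, d_s - d_l)   # estimates var(h)

# A symmetric 2x2 matrix is nonnegative definite iff trace >= 0 and det >= 0.
trace = diff[0] + diff[2]
det = diff[0] * diff[2] - diff[1] ** 2
print(trace, det)  # approximately 3 and 1, both positive
```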
Before describing the geometry of influence functions, we first give the defini-
tion of a linear variety (sometimes also called an affine space).
Definition 7. A linear variety is the translation of a linear subspace away from the origin; i.e., a linear variety V can be written as V = x0 + M, where x0 ∈ H, x0 ∉ M, ∥x0∥ ≠ 0, and M is a linear subspace (see Figure 3.4). □
Theorem 3.4. The set of all influence functions, namely the elements of H
that satisfy condition (3.4) of Theorem 3.2, is the linear variety ϕ∗ (Z) + T ⊥ ,
where ϕ∗ (Z) is any influence function and T ⊥ is the space perpendicular to
the tangent space.
Proof. Any element l(Z) ∈ T ⊥ must satisfy
Therefore, if we take
The efficient influence function ϕeff(Z), if it exists, is the influence function with the smallest variance matrix; that is, for any influence function ϕ(Z) ≠ ϕeff(Z), var{ϕ(Z)} − var{ϕeff(Z)} is nonnegative definite. That an efficient influence function exists and is unique is now easy to see from the geometry of the problem.
Theorem 3.5. The efficient influence function is the (unique) influence function lying in the tangent space, ϕeff(Z, θ0) = BeffSθ(Z, θ0). By condition (3.4), E{BeffSθ(Z, θ0)SθT(Z, θ0)} = Γ(θ0), or

Beff E{Sθ(Z, θ0)SθT(Z, θ0)} = Γ(θ0),

which implies

Beff = Γ(θ0)I^{−1}(θ0),

where I(θ0) = E{Sθ(Z, θ0)SθT(Z, θ0)} is the information matrix. Consequently, the efficient influence function is given by

ϕeff(Z, θ0) = Γ(θ0)I^{−1}(θ0)Sθ(Z, θ0).
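As a worked check of this formula (the model is our illustrative choice, not from the text): for Z ∼ N(µ, σ^2) with β = µ and η = σ^2, Γ(θ0) = (1, 0), I(θ0) = diag{1/σ^2, 1/(2σ^4)}, so Γ(θ0)I^{−1}(θ0)Sθ(Z, θ0) collapses to Z − µ:

```python
# Verify that Gamma I^{-1} S_theta(Z) equals Z - mu in the N(mu, sigma^2) model.
import random

random.seed(3)
mu0, s2 = 2.0, 9.0   # illustrative truth

def s_theta(z):
    # score vector (S_mu, S_sigma2) for N(mu, sigma^2)
    return ((z - mu0) / s2, ((z - mu0) ** 2 - s2) / (2 * s2 ** 2))

def phi_eff(z):
    # Gamma = (1, 0); I^{-1} = diag(sigma^2, 2 sigma^4)
    s_mu, s_s2 = s_theta(z)
    return 1.0 * s2 * s_mu + 0.0 * (2 * s2 ** 2) * s_s2

zs = [random.gauss(mu0, s2 ** 0.5) for _ in range(1000)]
max_err = max(abs(phi_eff(z) - (z - mu0)) for z in zs)
print(max_err)  # identically 0 up to float rounding
```

This matches the fact that the sample mean is the efficient estimator of µ in the normal model: its influence function Z − µ is the efficient one.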
Definition 8. The efficient score is the residual of the score vector with re-
spect to the parameter of interest after projecting it onto the nuisance tangent
space; i.e.,
Seff (Z, θ0 ) = Sβ (Z, θ0 ) − Π(Sβ (Z, θ0 )|Λ).
Recall that
Π(Sβ(Z, θ0)|Λ) = E(SβSηT){E(SηSηT)}^{−1}Sη(Z, θ0). □
Therefore, if we define

ϕeff(Z, θ0) = [E{Seff(Z, θ0)SeffT(Z, θ0)}]^{−1}Seff(Z, θ0),

then

(i) E[ϕeff(Z, θ0)SβT(Z, θ0)] = I^{q×q}

and

(ii) E[ϕeff(Z, θ0)SηT(Z, θ0)] = 0^{q×r}.
β ∈ Rq, η ∈ Rr, θ = (βT, ηT)T, θ ∈ Rp, p = q + r.
• Linear subspaces
• Tangent space
• Efficient score
2. Show that solving the estimating equation Σ_{i=1}^{n} Seff{Zi, β, η̂n*(β)} = 0, for any root-n consistent estimator η̂n*(β), yields an estimator that is asymptotically linear with the efficient influence function.
3. Assume Y1, . . . , Yn are iid with distribution function F(y) = P(Y ≤ y), which is differentiable everywhere with density f(y) = dF(y)/dy. The median is defined as β = F^{−1}(1/2). The sample median is defined as

β̂n ≈ F̂n^{−1}(1/2),

where F̂n(y) = n^{−1} Σ_{i=1}^{n} I(Yi ≤ y) is the empirical distribution function. Equivalently, β̂n is the solution to the m-estimating equation

Σ_{i=1}^{n} {I(Yi ≤ β) − 1/2} ≈ 0.
(a) Find the influence function for the sample median β̂n .
Hint: You may assume the following to get your answer.
(i) β̂n is consistent; i.e., β̂n →^{P} β0 = F^{−1}(1/2).
(ii) Stochastic equicontinuity:

n^{1/2}{F̂n(β̂n) − F(β̂n)} − n^{1/2}{F̂n(β0) − F(β0)} →^{P} 0.
(b) Let Y1 , . . . , Yn be iid N (µ, σ 2 ), µ ∈ R, σ 2 > 0. Clearly, for this model, the
median β is equal to µ. Verify, by direct calculation, that the influence
function for the sample median satisfies the two conditions of Corollary
1.
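One candidate answer to part (a) is ϕ(Y) = {1/2 − I(Y ≤ β0)}/f(β0), which follows from the m-estimator formula applied to the hints above. The sketch below (model from part (b); the seed and sample size are our illustrative choices) checks the corresponding asymptotic linearity numerically for Y ∼ N(µ, σ^2), where β0 = µ:

```python
# Compare n^{1/2}(median - beta_0) with the linearized sum of influence functions.
import random
import math
import statistics

random.seed(42)
mu, sigma = 0.0, 1.0
n = 100_000
y = [random.gauss(mu, sigma) for _ in range(n)]

beta0 = mu                                        # the median of N(mu, sigma^2)
f_beta0 = 1.0 / (sigma * math.sqrt(2 * math.pi))  # density at the median

lhs = math.sqrt(n) * (statistics.median(y) - beta0)
rhs = sum((0.5 - (yi <= beta0)) / f_beta0 for yi in y) / math.sqrt(n)
print(lhs, rhs)  # approximately equal; both ~ N(0, 1/{4 f(beta0)^2})
```

The remaining gap between the two quantities is the (Bahadur) remainder term, which shrinks to zero slowly as n grows.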