In statistics, the Lehmann–Scheffé theorem ties together completeness, sufficiency, uniqueness, and best unbiased estimation. The theorem states that any estimator that is unbiased for a given unknown quantity and that depends on the data only through a complete, sufficient statistic is the unique best unbiased estimator of that quantity. The theorem is named after Erich Leo Lehmann and Henry Scheffé, who presented it in two early papers.

If $T$ is a complete sufficient statistic for $\theta$ and $\operatorname{E}[g(T)]=\tau(\theta)$, then $g(T)$ is the uniformly minimum-variance unbiased estimator (UMVUE) of $\tau(\theta)$.

Statement

Let $\vec{X}=X_1,X_2,\dots,X_n$ be a random sample from a distribution that has p.d.f. (or p.m.f. in the discrete case) $f(x:\theta)$, where $\theta\in\Omega$ is a parameter in the parameter space $\Omega$. Suppose $Y=u(\vec{X})$ is a sufficient statistic for $\theta$, and let $\{f_Y(y:\theta):\theta\in\Omega\}$ be a complete family. If $\varphi$ is a function such that $\operatorname{E}[\varphi(Y)]=\theta$ for all $\theta\in\Omega$, then $\varphi(Y)$ is the unique MVUE of $\theta$.
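As an illustration of how the statement is typically applied, the following worked equations use a Poisson sample. The Poisson model, the rate $\lambda$, and the target $e^{-\lambda}$ are assumptions chosen for this sketch and do not come from the statement itself. Here $Y=\sum_i X_i$ is a complete sufficient statistic, so any function of $Y$ that is unbiased for a target is its unique MVUE.

```latex
\begin{align*}
X_1,\dots,X_n &\overset{\text{iid}}{\sim} \operatorname{Poisson}(\lambda),
  \qquad Y=\sum_{i=1}^{n}X_i \sim \operatorname{Poisson}(n\lambda),\\
\operatorname{E}\!\left[\frac{Y}{n}\right] &= \lambda
  \;\Longrightarrow\; \varphi(Y)=\frac{Y}{n}\ \text{is the unique MVUE of } \lambda,\\
\operatorname{E}\!\left[\left(\frac{n-1}{n}\right)^{Y}\right]
  &= \sum_{y=0}^{\infty}\left(\frac{n-1}{n}\right)^{y} e^{-n\lambda}\,\frac{(n\lambda)^{y}}{y!}
   = e^{-n\lambda}\,e^{(n-1)\lambda} = e^{-\lambda}.
\end{align*}
```

The last line shows that $\left(\frac{n-1}{n}\right)^{Y}$ is unbiased for $e^{-\lambda}=\Pr(X_1=0)$ and depends on the data only through $Y$, so by the theorem it is the UMVUE of that quantity.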

Proof

By the Rao–Blackwell theorem, if $Z$ is an unbiased estimator of $\theta$, then $\varphi(Y):=\operatorname{E}[Z\mid Y]$ defines an unbiased estimator of $\theta$ whose variance is not greater than that of $Z$.

Now we show that this function is unique. Suppose $W$ is another candidate MVUE of $\theta$. Then again $\psi(Y):=\operatorname{E}[W\mid Y]$ defines an unbiased estimator of $\theta$ whose variance is not greater than that of $W$. Since $\varphi(Y)$ and $\psi(Y)$ are both unbiased for $\theta$,

$$\operatorname{E}[\varphi(Y)-\psi(Y)]=0,\quad \theta\in\Omega.$$

Since $\{f_Y(y:\theta):\theta\in\Omega\}$ is a complete family,

$$\operatorname{E}[\varphi(Y)-\psi(Y)]=0\implies \varphi(y)-\psi(y)=0,\quad \theta\in\Omega$$

and therefore the function $\varphi$ is the unique function of $Y$ with variance not greater than that of any other unbiased estimator. We conclude that $\varphi(Y)$ is the MVUE.
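The conditioning step used in the proof can be checked exactly on a small case. The sketch below assumes a Bernoulli model (not discussed in the text above) with the crude unbiased estimator $Z=X_1$ and the complete sufficient statistic $Y=\sum_i X_i$. The function name `conditional_mean_of_first` is hypothetical. It enumerates all outcomes with exact rational arithmetic and returns $\operatorname{E}[X_1\mid Y=t]$ for each $t$.

```python
from fractions import Fraction
from itertools import product


def conditional_mean_of_first(n, p):
    """Return {t: E[X_1 | Y = t]} for n i.i.d. Bernoulli(p) draws, Y = sum.

    p must be a Fraction strictly between 0 and 1 so every value of Y
    has positive probability and all arithmetic stays exact.
    """
    num = {t: Fraction(0) for t in range(n + 1)}  # E[X_1 * 1{Y = t}]
    den = {t: Fraction(0) for t in range(n + 1)}  # P(Y = t)
    for xs in product((0, 1), repeat=n):
        t = sum(xs)
        prob = p**t * (1 - p) ** (n - t)  # probability of this outcome
        num[t] += xs[0] * prob
        den[t] += prob
    return {t: num[t] / den[t] for t in range(n + 1)}


# The conditional mean should not depend on p (sufficiency) and should
# equal t / n, i.e. the sample mean written as a function of Y.
for p in (Fraction(1, 4), Fraction(2, 3)):
    print(p, conditional_mean_of_first(4, p))
```

In this assumed setting the result is $t/n$ for every value of $p$, so Rao–Blackwellizing $X_1$ yields the sample mean, which is a function of the complete sufficient statistic alone.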

Example when using a non-complete minimal sufficient statistic

An example of an improvable Rao–Blackwell improvement, when using a minimal sufficient statistic that is not complete, was provided by Galili and Meilijson in 2016. Let $X_1,\ldots,X_n$ be a random sample from a scale-uniform distribution $X\sim U((1-k)\theta,(1+k)\theta)$ with unknown mean $\operatorname{E}[X]=\theta$ and known design parameter $k\in(0,1)$. In the search for "best" possible unbiased estimators for $\theta$, it is natural to consider $X_1$ as an initial (crude) unbiased estimator for $\theta$ and then try to improve it. Since $X_1$ is not a function of $T=\left(X_{(1)},X_{(n)}\right)$, the minimal sufficient statistic for $\theta$ (where $X_{(1)}=\min_i X_i$ and $X_{(n)}=\max_i X_i$), it may be improved using the Rao–Blackwell theorem as follows:

$$\hat{\theta}_{RB}=\operatorname{E}_{\theta}[X_1\mid X_{(1)},X_{(n)}]=\frac{X_{(1)}+X_{(n)}}{2}.$$

However, the following unbiased estimator can be shown to have lower variance:

$$\hat{\theta}_{LV}=\frac{1}{k^{2}\frac{n-1}{n+1}+1}\cdot\frac{(1-k)X_{(1)}+(1+k)X_{(n)}}{2}.$$

In fact, it can be improved even further by using the following estimator:

$$\hat{\theta}_{\text{BAYES}}=\frac{n+1}{n}\left[1-\frac{\frac{X_{(1)}(1+k)}{X_{(n)}(1-k)}-1}{\left(\frac{X_{(1)}(1+k)}{X_{(n)}(1-k)}\right)^{n+1}-1}\right]\frac{X_{(n)}}{1+k}$$

The model is a scale model. Optimal equivariant estimators can then be derived for loss functions that are invariant.
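The comparison between the three estimators above can be explored numerically. The following is a minimal Monte Carlo sketch, not a reproduction of Galili and Meilijson's analysis: the values $\theta=2$, $k=0.8$, $n=10$, the replication count, the seed, and the function names `estimators` and `simulate` are all assumptions made for illustration. Because this sketch does not assume that $\hat{\theta}_{\text{BAYES}}$ is unbiased, it reports mean squared error alongside the mean and variance.

```python
import random
import statistics


def estimators(xs, k):
    """Return (theta_RB, theta_LV, theta_BAYES) computed from one sample."""
    n = len(xs)
    lo, hi = min(xs), max(xs)
    rb = (lo + hi) / 2
    c = 1.0 / (k**2 * (n - 1) / (n + 1) + 1.0)
    lv = c * ((1 - k) * lo + (1 + k) * hi) / 2
    r = (lo * (1 + k)) / (hi * (1 - k))  # ratio appearing in the Bayes formula
    if abs(r - 1.0) < 1e-12:
        frac = 1.0 / (n + 1)  # limit of (r - 1) / (r**(n + 1) - 1) as r -> 1
    else:
        frac = (r - 1.0) / (r ** (n + 1) - 1.0)
    bayes = (n + 1) / n * (1.0 - frac) * hi / (1 + k)
    return rb, lv, bayes


def simulate(theta=2.0, k=0.8, n=10, reps=20000, seed=0):
    """Estimate mean, variance and MSE of each estimator by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = {"RB": [], "LV": [], "BAYES": []}
    for _ in range(reps):
        xs = [rng.uniform((1 - k) * theta, (1 + k) * theta) for _ in range(n)]
        for name, value in zip(draws, estimators(xs, k)):
            draws[name].append(value)
    summary = {}
    for name, values in draws.items():
        mean = statistics.fmean(values)
        summary[name] = {
            "mean": mean,
            "var": statistics.pvariance(values, mu=mean),
            "mse": statistics.fmean([(v - theta) ** 2 for v in values]),
        }
    return summary


for name, stats in simulate().items():
    print(name, stats)
```

Under these assumed settings, the simulated means of $\hat{\theta}_{RB}$ and $\hat{\theta}_{LV}$ should both be close to $\theta$, and the simulated variance of $\hat{\theta}_{LV}$ should be visibly smaller than that of $\hat{\theta}_{RB}$. A short calculation with the moments of uniform order statistics suggests that the variance ratio is $1/\left(k^{2}\frac{n-1}{n+1}+1\right)$, roughly 0.66 here.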

See also