Scoring algorithm
The scoring algorithm, also known as Fisher's scoring, is a form of Newton's method used in statistics to solve maximum likelihood equations numerically. It is named after Ronald Fisher.
Sketch of derivation
Let $Y_1, \ldots, Y_n$ be independent and identically distributed random variables with twice-differentiable p.d.f. $f(y; \theta)$, and suppose we wish to calculate the maximum likelihood estimator (MLE) $\theta^*$ of $\theta$. First, suppose we have a starting point for our algorithm $\theta_0$, and consider a Taylor expansion of the score function, $V(\theta)$, about $\theta_0$:

$$V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),$$

where

$$\mathcal{J}(\theta_0) = -\sum_{i=1}^{n} \left. \nabla \nabla^{\top} \right|_{\theta = \theta_0} \log f(Y_i; \theta)$$

is the observed information matrix at $\theta_0$. Now, setting $\theta = \theta^*$, using that $V(\theta^*) = 0$ and rearranging gives us:

$$\theta^* \approx \theta_0 + \mathcal{J}^{-1}(\theta_0) V(\theta_0).$$

We therefore use the algorithm

$$\theta_{m+1} = \theta_m + \mathcal{J}^{-1}(\theta_m) V(\theta_m),$$

and under certain regularity conditions, it can be shown that $\theta_m \to \theta^*$.
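As a concrete sketch (not from the article), the iteration above can be written in Python for the location parameter of a standard Cauchy distribution, where the score and observed information have simple closed forms; the sample data and starting value below are illustrative assumptions:

```python
# Newton-type scoring iteration  theta_{m+1} = theta_m + J^{-1}(theta_m) V(theta_m)
# for the Cauchy location model  f(y; theta) = 1 / (pi * (1 + (y - theta)^2)).
# The data and starting point are illustrative assumptions, not from the article.

def score(theta, ys):
    # V(theta) = sum_i 2 (y_i - theta) / (1 + (y_i - theta)^2)
    return sum(2 * (y - theta) / (1 + (y - theta) ** 2) for y in ys)

def observed_info(theta, ys):
    # J(theta) = -sum_i d^2/dtheta^2 log f(y_i; theta)
    #          =  sum_i 2 (1 - (y_i - theta)^2) / (1 + (y_i - theta)^2)^2
    return sum(
        2 * (1 - (y - theta) ** 2) / (1 + (y - theta) ** 2) ** 2 for y in ys
    )

def newton_scoring(ys, theta0, steps=25):
    theta = theta0
    for _ in range(steps):
        theta = theta + score(theta, ys) / observed_info(theta, ys)
    return theta

ys = [-0.8, 0.1, 0.4, 0.9, 1.6]           # illustrative sample
theta_hat = newton_scoring(ys, theta0=0.5)  # converges to a root of the score
```

Note that in scalar models like this one, $\mathcal{J}$ is just a second derivative; the matrix form in the derivation covers the multiparameter case.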
Fisher scoring
In practice, $\mathcal{J}(\theta)$ is usually replaced by $\mathcal{I}(\theta) = \mathrm{E}[\mathcal{J}(\theta)]$, the Fisher information, giving the Fisher scoring algorithm:

$$\theta_{m+1} = \theta_m + \mathcal{I}^{-1}(\theta_m) V(\theta_m).$$
Under some regularity conditions, if $\theta_m$ is a consistent estimator, then $\theta_{m+1}$ (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum-likelihood estimate.
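To illustrate the substitution of the expected information for the observed one (again as a hedged sketch, not the article's own example), the Cauchy location model is convenient: its total Fisher information is the constant $\mathcal{I}(\theta) = n/2$, so Fisher scoring becomes a fixed rescaling of the score. Data and starting point are illustrative assumptions:

```python
# Fisher scoring  theta_{m+1} = theta_m + I^{-1}(theta_m) V(theta_m)
# for the Cauchy location model, where the Fisher information is I(theta) = n/2
# (a standard result for unit-scale Cauchy; constant in theta).
# Data and starting value are illustrative assumptions, not from the article.

def score(theta, ys):
    # V(theta) = sum_i 2 (y_i - theta) / (1 + (y_i - theta)^2)
    return sum(2 * (y - theta) / (1 + (y - theta) ** 2) for y in ys)

def fisher_scoring(ys, theta0, steps=100):
    info = len(ys) / 2.0   # expected (Fisher) information for the whole sample
    theta = theta0
    for _ in range(steps):
        theta = theta + score(theta, ys) / info
    return theta

ys = [-0.8, 0.1, 0.4, 0.9, 1.6]            # illustrative sample
theta_hat = fisher_scoring(ys, theta0=0.0)  # converges linearly to the MLE
```

Because $\mathcal{I}$ here does not depend on the data, each step costs only a score evaluation; the trade-off is linear rather than quadratic convergence near the optimum.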
Further reading
- Jennrich, R. I. & Sampson, P. F. (1976). Technometrics. 18 (1): 11–17.