The variance of the score is called the Fisher information:

\[\fI(\bt) = \Var \u(\bt|X).\]
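
For a simple illustration, suppose \(X \sim \text{Bernoulli}(\theta)\) with a scalar \(\theta \in (0, 1)\). The score is

\[\u(\theta|X) = \frac{X}{\theta} - \frac{1-X}{1-\theta} = \frac{X - \theta}{\theta(1-\theta)},\]

so

\[\fI(\theta) = \Var \u(\theta|X) = \frac{\Var X}{\theta^2(1-\theta)^2} = \frac{1}{\theta(1-\theta)}.\]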

On its surface, this would seem to have nothing to do with information. However, the connection between the variance of the score and the curvature of the log-likelihood is made clear in the following theorem.

Theorem: If the likelihood allows all second-order partial derivatives to be passed under the integral sign, then

\[\fI(\bts) = \Ex \oI(\bts|X).\]

Proof: Letting \(P\) denote the distribution function and \(p\) the density function,

\[\begin{alignat*}{2} \Ex \oI(\bts|X) &= - \int \nabla^2_{\bt} \log p(x|\bts) dP(x|\bts) \\ &= -\int \nabla_{\bt} \frac{\nabla_{\bt} p(x|\bts)}{p(x|\bts)} dP(x|\bts) &\hspace{2em}& \href{vector-calculus.html}{\text{CR}}; \nabla \log x = 1/x \\ &= -\int \frac{\nabla^2_{\bt} p(x|\bts) p(x|\bts) - \nabla_{\bt} p(x|\bts) \nabla_{\bt} p(x|\bts) \Tr}{p(x|\bts)^2} dP(x|\bts) && \href{vector-calculus.html}{\text{Chain rule}} \\ &= -\nabla^2_{\bt}\int dP + \int \left(\frac{\nabla_{\bt} p(x|\bts)}{p(x|\bts)}\right) \left(\frac{\nabla_{\bt} p(x|\bts)}{p(x|\bts)}\right) \Tr dP(x|\bts) && \text{If $\nabla^2_{\bt}$ can be passed} \\ &= -\nabla^2_{\bt} \int dP + \int \u \u \Tr dP && \href{score.html}{\text{Def score}} \\ &= \int \u \u \Tr dP && \int dP = 1 \\ &= \Var \u && \href{score-expectation.html}{\Ex \u = 0} \end{alignat*}\]

In the final step, note that if \(\nabla^2_{\bt}\) can be passed under the integral sign, then \(\nabla_{\bt}\) can be passed as well, which is what the result \(\Ex \u = 0\) requires.
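
Returning to the Bernoulli illustration above, the observed information is

\[\oI(\theta|X) = \frac{X}{\theta^2} + \frac{1-X}{(1-\theta)^2},\]

and taking expectations,

\[\Ex \oI(\theta|X) = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)} = \fI(\theta),\]

in agreement with the theorem.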

Multiple observations

The above definition and proof assume that we have a single observation. In cases where we have repeated observations \(X_1, \ldots, X_n\), the symbol \(\fI(\bt)\) is unchanged: it still represents the expected information from a single observation. To indicate the information for the entire sample, we use the symbols \(\oI_n(\bt)\) and \(\fI_n(\bt)\).

If the observations are iid, then each observation has the same Fisher information and \(\Ex \oI_n = n \fI = \fI_n\). If the observations are independent, but not necessarily identically distributed, then each observation has its own Fisher information \(\fI_i(\bt)\) and

\[\Ex \oI_n = \sum_{i=1}^n \fI_i = \fI_n.\]
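
For instance, in the iid Bernoulli illustration above, a sample of \(n\) observations has \(\fI_n(\theta) = n \fI(\theta) = \frac{n}{\theta(1-\theta)}\).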

If the observations are not independent, then the Fisher information is more difficult to calculate, as it involves the multivariate (joint) distribution of \(\x\).