The variance of the score is called the Fisher information:

$$\mathcal{I}(\theta) = \mathbb{V}u(\theta \mid X).$$
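
For example, suppose $X \sim \mathrm{Bernoulli}(\theta)$, so that $\log p(x \mid \theta) = x \log \theta + (1 - x)\log(1 - \theta)$. The score is then

$$u(\theta \mid X) = \frac{X}{\theta} - \frac{1 - X}{1 - \theta} = \frac{X - \theta}{\theta(1 - \theta)},$$

and since $\mathbb{V}X = \theta(1 - \theta)$,

$$\mathcal{I}(\theta) = \mathbb{V}u(\theta \mid X) = \frac{\mathbb{V}X}{\theta^2(1 - \theta)^2} = \frac{1}{\theta(1 - \theta)}.$$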

On its surface, this would seem to have nothing to do with information. However, the connection between the variance of the score and the curvature of the log-likelihood is made clear in the following theorem.

Theorem: If the likelihood allows all second-order partial derivatives to be passed under the integral sign, then

$$\mathcal{I}(\theta) = \mathbb{E}I(\theta \mid X).$$

Proof: Letting P denote the distribution function and p the density function,

$$
\begin{aligned}
\mathbb{E}I(\theta \mid X)
&= -\int \partial_\theta^2 \log p(x \mid \theta) \, dP(x \mid \theta) \\
&= -\int \partial_\theta \frac{\partial_\theta p(x \mid \theta)}{p(x \mid \theta)} \, dP(x \mid \theta)
&& \text{chain rule; } \tfrac{d}{dx}\log x = 1/x \\
&= -\int \left\{ \frac{\partial_\theta^2 p(x \mid \theta)}{p(x \mid \theta)} - \frac{\partial_\theta p(x \mid \theta) \, \partial_\theta p(x \mid \theta)}{p(x \mid \theta)^2} \right\} dP(x \mid \theta)
&& \text{chain rule} \\
&= -\partial_\theta^2 \int dP + \int \left( \frac{\partial_\theta p(x \mid \theta)}{p(x \mid \theta)} \right) \left( \frac{\partial_\theta p(x \mid \theta)}{p(x \mid \theta)} \right) dP(x \mid \theta)
&& \text{if } \partial_\theta^2 \text{ can be passed} \\
&= -\partial_\theta^2 \int dP + \int u \, u \, dP
&& \text{definition of the score} \\
&= \int u \, u \, dP
&& \textstyle\int dP = 1 \\
&= \mathbb{V}u
&& \mathbb{E}u = 0
\end{aligned}
$$

In the final step, note that if $\partial_\theta^2$ can be passed under the integral sign, then so can $\partial_\theta$, which is what justifies $\mathbb{E}u = 0$.
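
Returning to the Bernoulli example, the observed information is

$$I(\theta \mid X) = -\partial_\theta^2 \log p(X \mid \theta) = \frac{X}{\theta^2} + \frac{1 - X}{(1 - \theta)^2},$$

and since $\mathbb{E}X = \theta$,

$$\mathbb{E}I(\theta \mid X) = \frac{1}{\theta} + \frac{1}{1 - \theta} = \frac{1}{\theta(1 - \theta)} = \mathbb{V}u(\theta \mid X),$$

as the theorem requires.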

Multiple observations

The above definition and proof assume that we have a single observation. In cases where we have repeated observations $X_1, \ldots, X_n$, the symbol $\mathcal{I}(\theta)$ is unchanged: it still represents the expected information from a single observation. To indicate the observed and expected information for the entire sample, we use the symbols $I_n(\theta)$ and $\mathcal{I}_n(\theta)$, respectively.
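
Explicitly, letting $p(x_1, \ldots, x_n \mid \theta)$ denote the joint density of the sample, these sample-level quantities are

$$I_n(\theta) = -\partial_\theta^2 \log p(X_1, \ldots, X_n \mid \theta), \qquad \mathcal{I}_n(\theta) = \mathbb{E}I_n(\theta).$$

The theorem above applies equally to the joint likelihood, so $\mathcal{I}_n(\theta)$ is also the variance of the score of the whole sample.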

If observations are iid, then each observation has the same Fisher information and $\mathbb{E}I_n = n\mathcal{I} = \mathcal{I}_n$. If observations are independent, but not necessarily identically distributed, then $\mathcal{I}(\theta)$ requires subscripts and

$$\mathbb{E}I_n = \sum_{i=1}^{n} \mathcal{I}_i = \mathcal{I}_n.$$

If observations are not independent, then the Fisher information is more difficult to calculate, as it involves the multivariate (joint) distribution of $(X_1, \ldots, X_n)$.
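
For the iid case, the relation $\mathbb{E}I_n = n\mathcal{I} = \mathcal{I}_n$ is easy to check by simulation. Below is a minimal sketch using the Bernoulli example from earlier; the parameter value, sample size, and number of replications are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of E[I_n] = n * I(theta) for iid Bernoulli(theta) data.
rng = np.random.default_rng(0)
theta = 0.3
n, reps = 50, 100_000

x = rng.binomial(1, theta, size=(reps, n))        # `reps` samples, each of size n

# Per-sample score and observed information for the Bernoulli model:
#   u_n(theta) = sum_i [ x_i / theta - (1 - x_i) / (1 - theta) ]
#   I_n(theta) = sum_i [ x_i / theta**2 + (1 - x_i) / (1 - theta)**2 ]
score = (x / theta - (1 - x) / (1 - theta)).sum(axis=1)
obs_info = (x / theta**2 + (1 - x) / (1 - theta) ** 2).sum(axis=1)

fisher_one = 1.0 / (theta * (1 - theta))          # I(theta) for a single observation

print("V[u_n]       ~", score.var())              # variance of the sample score
print("E[I_n]       ~", obs_info.mean())          # average observed information
print("n * I(theta) =", n * fisher_one)           # theoretical sample information
```

All three printed values should be close to $n / (\theta(1 - \theta)) \approx 238$.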