The curvature (negative Hessian) of a regular log-likelihood $\ell(\bt)$ is known as the information:

\[\oI(\bt) = -\nabla^2 \ell(\bt).\]

This is known as the “information” since the larger this curvature is, the sharper (less flat) the likelihood is around its maximum, and the less uncertainty we have about $\bt$.
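As a small illustration of the link between curvature and uncertainty, the sketch below computes the observed information for a Poisson rate, both in closed form and by finite differences. The Poisson model, the counts in `xs`, and all function names are assumptions chosen for the example; they do not come from the text above.

```python
import math

def poisson_loglik(theta, xs):
    """Poisson log-likelihood for rate theta given counts xs."""
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in xs)

def observed_information(theta, xs):
    """Closed form of minus the second derivative: sum(xs) / theta^2."""
    return sum(xs) / theta**2

def numeric_observed_information(theta, xs, h=1e-3):
    """Central finite-difference approximation to minus the second derivative."""
    l = poisson_loglik
    return -(l(theta + h, xs) - 2 * l(theta, xs) + l(theta - h, xs)) / h**2

xs = [2, 0, 3, 1, 4]            # hypothetical counts, for illustration only
theta_hat = sum(xs) / len(xs)   # the Poisson MLE is the sample mean

info_small = observed_information(theta_hat, xs)
info_numeric = numeric_observed_information(theta_hat, xs)
# Same sample mean but ten times as much data: the curvature is ten times larger.
info_large = observed_information(theta_hat, xs * 10)
```

With ten times as much data at the same sample mean, the log-likelihood is ten times more sharply curved at its maximum, which is the sense in which more information means less uncertainty about $\bt$.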

Observed vs expected information

The quantity $\oI(\bt)$ depends on the data and is therefore random. To reflect this, it is called the observed information, as opposed to the expected information (also known as the Fisher information), in which the random elements have been integrated out by taking an expectation.

In situations where we have repeated samples $X_1, \ldots, X_n$, I will include a subscript: $\oI_i$ denotes the information contribution from observation $i$ alone, and $\oI_n$ denotes the information based on all $n$ observations. In the iid case, one typically drops the subscript for the Fisher information, $\fI(\bt)$, since it is the same for every observation, while the total Fisher information for the entire sample is denoted $\fI_n(\bt)$. In other words, if observations are iid, $\Ex \oI_n = n \fI = \fI_n$. Note that this convention is not universal; other authors may use different symbols for the total expected information.
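The identity $\Ex \oI_n = n \fI$ can be checked numerically. The sketch below uses a Bernoulli model, which is an assumption made for illustration, with hypothetical values for `p`, `n`, and `reps`. It verifies the single-observation expectation exactly by enumerating the two outcomes, then approximates the expectation of the total observed information by simulation.

```python
import random

def obs_info_i(p, x):
    """Observed information from one Bernoulli observation x in {0, 1}."""
    return x / p**2 + (1 - x) / (1 - p)**2

def fisher_info(p):
    """Fisher information for a single Bernoulli(p) observation."""
    return 1.0 / (p * (1 - p))

p, n, reps = 0.3, 20, 5000      # hypothetical settings
rng = random.Random(1)          # fixed seed so the run is reproducible

# Exact expectation over the two possible outcomes of a single observation.
exact_mean_single = p * obs_info_i(p, 1) + (1 - p) * obs_info_i(p, 0)

# Monte Carlo estimate of the expected total observed information.
totals = []
for _ in range(reps):
    sample = [1 if rng.random() < p else 0 for _ in range(n)]
    totals.append(sum(obs_info_i(p, x) for x in sample))
mc_mean_total = sum(totals) / reps

fisher_total = n * fisher_info(p)   # n times the per-observation information
```

Here `exact_mean_single` matches `fisher_info(p)` up to rounding, and `mc_mean_total` should land close to `fisher_total`, even though the observed information of any single simulated sample varies around that value.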

Notation

The following notation is by no means universal, but I use it on this site and in class to distinguish between related “I” matrices:

$\I$: The identity matrix
$\oI$: The observed information
$\fI$: The expected (Fisher) information

This same convention is used for the inverse of the information:

$\oV$: The inverse of the observed information, $\oV = \oI^{-1}$
$\fV$: The inverse of the Fisher information, $\fV = \fI^{-1}$

Note that in pdf documents such as class notes, $\oI$, $\fI$, $\oV$, and $\fV$ appear in bold when matrix-valued. Unfortunately, the bold version of these fonts is not available in html, so on this site you will have to figure out from context whether the information is scalar-valued or matrix-valued (typically not a problem).

As described above, typically $\fI$ and $\fV$ refer to the information and its inverse for a single observation. If we are referring to the whole sample, a subscript of $n$ is added:

\[\as{ \fV_n &= (\fI_n)^{-1} \\ &= \frac{1}{n} \fI^{-1} \quad \text{if iid} }\]
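To make the iid relationship concrete, here is a minimal sketch using the normal model with parameter $(\mu, \sigma^2)$, whose per-observation Fisher information is diagonal with entries $1/\sigma^2$ and $1/(2\sigma^4)$. The choice of model, the values of `sigma2` and `n`, and the helper names `inv2` and `scale` are assumptions made for illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def scale(m, k):
    """Multiply every entry of matrix m by the scalar k."""
    return [[k * v for v in row] for row in m]

sigma2, n = 4.0, 50   # hypothetical variance and sample size

# Per-observation Fisher information for (mu, sigma^2) in the normal model.
fisher = [[1 / sigma2, 0.0],
          [0.0, 1 / (2 * sigma2**2)]]

fisher_n = scale(fisher, n)                # total information: n times fisher
v_n_direct = inv2(fisher_n)                # invert the total information
v_n_scaled = scale(inv2(fisher), 1 / n)    # or invert once, then divide by n
```

Both routes give the same matrix, with diagonal entries $\sigma^2/n$ and $2\sigma^4/n$, the familiar large-sample variances of the two maximum likelihood estimates.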

Occasionally, we also need to refer to a partition of these matrices; in that case, the $n$ will be moved to a superscript. In other words, I will use $\fI^n_{11}$ to denote the upper left partition of $\fI_n$.