Theorem: Suppose $x_1, \dots, x_n$ are i.i.d. observations from a probability model $p(x|\bt)$ with true parameter value $\bt^*$, and write $L(\bt) = \prod_i L_i(\bt)$ for the likelihood, where $L_i(\bt) = p(x_i|\bt)$. Then for any parameter value $\bt$, as $n \to \infty$,

\[\frac{1}{n} \log \frac{L(\bt)}{L(\bt^*)} \inP -\KL(\bt^* \vert \bt),\]

Furthermore, the quantity on the RHS is strictly less than 0 unless $p(x|\bt) = p(x|\bt^*)$ almost everywhere; if the model is identifiable, that can only happen when $\bt = \bt^*$.
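As a concrete illustration, here is a minimal simulation sketch, assuming a Bernoulli model with NumPy; the parameter values `theta_star` and `theta` are purely illustrative. The normalized log-likelihood ratio settles near $-\KL(\bt^* \vert \bt)$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) parameter values for a Bernoulli model.
theta_star, theta = 0.6, 0.4  # true value theta* and an alternative theta

def log_lik(x, p):
    """Bernoulli log-likelihood of the sample x under success probability p."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# KL(theta* || theta) for Bernoulli distributions, in closed form.
kl = (theta_star * np.log(theta_star / theta)
      + (1 - theta_star) * np.log((1 - theta_star) / (1 - theta)))

for n in [100, 1_000, 10_000, 100_000]:
    x = rng.binomial(1, theta_star, size=n)  # i.i.d. draws from the true model
    ratio = (log_lik(x, theta) - log_lik(x, theta_star)) / n
    print(f"n={n:>6}  (1/n) log L(theta)/L(theta*) = {ratio:+.4f}   -KL = {-kl:+.4f}")
```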

Proof:

\[\begin{alignat*}{2} \frac{1}{n} \log \frac{L(\bt)}{L(\bt^*)} &= \frac{1}{n} \sum_i \log \frac{L_i(\bt)}{L_i(\bt^*)} &\hspace{4em}& L(\bt) = \textstyle\prod_i L_i(\bt),\ \text{log of a product is a sum} \\ &\inP -\KL(\bt^* \vert \bt) && \href{kullback-leibler.html}{\text{Def KL}}, \href{weak-law-of-large-numbers.html}{\text{WLLN}} \end{alignat*}\]
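To spell out the last step: the summands $\log \frac{L_i(\bt)}{L_i(\bt^*)}$ are i.i.d., so by the WLLN their average converges in probability to their common expectation under the true model (written here as $\mathbb{E}_{\bt^*}$; the integral becomes a sum for a discrete model):

\[\mathbb{E}_{\bt^*}\!\left[\log \frac{p(x|\bt)}{p(x|\bt^*)}\right] = -\int p(x|\bt^*) \log \frac{p(x|\bt^*)}{p(x|\bt)} \, dx = -\KL(\bt^* \vert \bt).\]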

Furthermore, by Gibbs' inequality $\KL(\bt^* \vert \bt) \ge 0$, with equality only if $p(x|\bt) = p(x|\bt^*)$ almost everywhere; so when the model is identifiable, the limit is strictly negative for every $\bt \ne \bt^*$.

Corollary: For any identifiable probability model,

\[\Pr\{L(\bt) < L(\bt^*)\} \to 1\]

as $n \to \infty$, for all $\bt \ne \bt^*$.
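As a rough numerical check of the corollary, here is a minimal Monte Carlo sketch under the same hypothetical Bernoulli setup as above, estimating $\Pr\{L(\bt) < L(\bt^*)\}$ for increasing $n$:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_star, theta = 0.6, 0.4  # illustrative true and alternative Bernoulli parameters

def log_lik(x, p):
    """Bernoulli log-likelihood of the sample x under success probability p."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

trials = 2_000
for n in [10, 50, 200, 1_000]:
    wins = 0
    for _ in range(trials):
        x = rng.binomial(1, theta_star, size=n)  # fresh i.i.d. sample from the true model
        wins += log_lik(x, theta) < log_lik(x, theta_star)
    print(f"n={n:>5}  estimated Pr{{L(theta) < L(theta*)}} = {wins / trials:.3f}")
```

The estimated probabilities should climb toward 1 as $n$ increases, in line with the corollary.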