Theorem: For any probability model $X_1, X_2, \ldots \overset{\text{iid}}{\sim} p(x \mid \theta^*)$ and any parameter value $\theta$, we have

\[
\frac{1}{n}\log\frac{L(\theta)}{L(\theta^*)} \;\xrightarrow{p}\; -\mathrm{KL}(\theta^* \,\|\, \theta),
\]

where $\theta^*$ is the true value. Furthermore, the quantity on the RHS is strictly less than 0 unless $p(x \mid \theta) = p(x \mid \theta^*)$ almost everywhere; if the model is identifiable, that can happen only when $\theta = \theta^*$.

Proof:

\begin{align*}
\frac{1}{n}\log\frac{L(\theta)}{L(\theta^*)}
&= \frac{1}{n}\sum_i \log\frac{L_i(\theta)}{L_i(\theta^*)}
&& \text{$\log(a/b) = \log(a) - \log(b)$} \\
&\xrightarrow{p} -\mathrm{KL}(\theta^* \,\|\, \theta)
&& \text{Def.\ KL, WLLN}
\end{align*}
where $L_i(\theta) = p(X_i \mid \theta)$ is the likelihood contribution of the $i$-th observation.

Furthermore, by Gibbs' inequality the limit $-\mathrm{KL}(\theta^* \,\|\, \theta)$ is strictly negative unless $p(x \mid \theta) = p(x \mid \theta^*)$ almost everywhere, which under identifiability occurs only when $\theta = \theta^*$.
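As a quick numerical check of the theorem (not part of the proof), consider an assumed concrete model: $X_i \sim N(\theta, 1)$ with true value $\theta^* = 0$ and alternative $\theta = 1$. Here $\mathrm{KL}(\theta^* \,\|\, \theta) = (\theta - \theta^*)^2 / 2 = 0.5$, so the scaled log-likelihood ratio should settle near $-0.5$:

```python
import numpy as np

# Assumed example model: X_i ~ N(theta, 1) with true theta* = 0, alternative theta = 1.
# For unit-variance Gaussians, KL(theta* || theta) = (theta - theta*)**2 / 2 = 0.5,
# so (1/n) log [L(theta)/L(theta*)] should converge in probability to -0.5.
rng = np.random.default_rng(0)
theta_star, theta, n = 0.0, 1.0, 200_000

x = rng.normal(theta_star, 1.0, size=n)

# log p(x|theta) - log p(x|theta*) for a unit-variance Gaussian (constants cancel)
log_ratio = -0.5 * (x - theta) ** 2 + 0.5 * (x - theta_star) ** 2
scaled = log_ratio.mean()  # equals (1/n) log [L(theta) / L(theta*)]

kl = (theta - theta_star) ** 2 / 2
print(f"(1/n) log LR = {scaled:.4f}, -KL = {-kl:.4f}")
```

With $n = 200{,}000$ draws the sample average sits within a few hundredths of $-\mathrm{KL}(\theta^* \,\|\, \theta)$, as the WLLN step in the proof predicts.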

Corollary: For any identifiable probability model,

\[
P\{L(\theta) < L(\theta^*)\} \to 1
\]

for all $\theta \neq \theta^*$.
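The corollary can likewise be illustrated by simulation under the same assumed Gaussian model: across many replications, the fraction of samples in which the alternative's likelihood falls below the truth's grows toward 1 as $n$ increases.

```python
import numpy as np

# Assumed example model: X_i ~ N(0, 1) is the truth (theta* = 0), theta = 1 the
# alternative. The corollary predicts P{L(theta) < L(theta*)} -> 1 as n grows.
rng = np.random.default_rng(1)
theta_star, theta, reps = 0.0, 1.0, 2000

fracs = []
for n in (1, 10, 100):
    x = rng.normal(theta_star, 1.0, size=(reps, n))
    # log L(theta) - log L(theta*): sum the per-observation log ratios in each replication
    log_ratio = (-0.5 * (x - theta) ** 2 + 0.5 * (x - theta_star) ** 2).sum(axis=1)
    fracs.append((log_ratio < 0).mean())  # fraction with L(theta) < L(theta*)

print(fracs)
```

Even at these modest sample sizes the fraction climbs quickly, since the scaled log ratio concentrates at the strictly negative value $-\mathrm{KL}(\theta^* \,\|\, \theta)$.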