The following are the classical conditions to ensure that the likelihood is “regular”, meaning that at least asymptotically, the likelihood resembles that of a normal distribution. Note: saying “the classical conditions” is perhaps misleading, as there is some flexibility with respect to how these conditions are stated. Some additional commentary on this point is given below.
Core conditions
(A) IID: The observations $x_1, \ldots, x_n$ are independent and identically distributed, each with density (or mass function) $f(x\,|\,\theta)$ for some $\theta$ in the parameter space $\Theta$.
(B) Interior point: There exists an open set $\Theta_0 \subseteq \Theta$ containing the true parameter value $\theta_0$.
(C) Smoothness: For all $x$, the log-density $\log f(x\,|\,\theta)$ is three times differentiable with respect to $\theta$ on $\Theta_0$, and:
- (i) Derivatives up to second order can be passed under the integral sign in $\int f(x\,|\,\theta)\,dx$.
- (ii) The Fisher information $\mathcal{I}(\theta_0)$ is positive definite.
- (iii) The third derivatives $\partial^3 \log f(x\,|\,\theta)/\partial\theta_j\,\partial\theta_k\,\partial\theta_l$ are bounded by $M(x)$ on $\Theta_0$: for all $j, k, l$ and all $\theta \in \Theta_0$,
$$\left| \frac{\partial^3}{\partial\theta_j\,\partial\theta_k\,\partial\theta_l} \log f(x\,|\,\theta) \right| \le M(x),$$
with $\mathbb{E}_{\theta_0} M(X) < \infty$.
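As a quick worked example (added here for illustration; it is not part of the classical statement), consider a Poisson($\theta$) model with $\Theta = (0, \infty)$, so that $\log f(x\,|\,\theta) = x \log\theta - \theta - \log x!$. Take $\Theta_0 = (a, b)$ with $0 < a < \theta_0 < b$ for condition (B). Then
$$\frac{\partial}{\partial\theta} \log f = \frac{x}{\theta} - 1, \qquad \frac{\partial^2}{\partial\theta^2} \log f = -\frac{x}{\theta^2}, \qquad \frac{\partial^3}{\partial\theta^3} \log f = \frac{2x}{\theta^3},$$
so the Fisher information $\mathcal{I}(\theta) = \mathbb{E}_\theta(X)/\theta^2 = 1/\theta$ is positive, satisfying C(ii), and on $\Theta_0$ the third derivative is bounded by $M(x) = 2x/a^3$ with $\mathbb{E}_{\theta_0} M(X) = 2\theta_0/a^3 < \infty$, satisfying C(iii). Differentiation under the integral (here, summation) sign, C(i), also holds, as it does for exponential families generally.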
Note #1: Condition (C) describes what happens for a single observation. What happens in a random sample of size $n$ then follows from the IID assumption (A).
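Concretely (writing $\mathcal{I}_n$ and $\mathcal{I}$ for the sample and per-observation Fisher information), the log-likelihood, score, and information for the sample are simply sums of the per-observation quantities:
$$\ell(\theta) = \sum_{i=1}^n \log f(x_i\,|\,\theta), \qquad \nabla\ell(\theta) = \sum_{i=1}^n \nabla \log f(x_i\,|\,\theta), \qquad \mathcal{I}_n(\theta) = n\,\mathcal{I}(\theta).$$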
Note #2: Note that C(ii) applies to the Fisher information, while C(iii) applies to the derivative of the observed information. This is important! The observed information might randomly fail to be positive definite, but we don’t need to worry about that (asymptotically). Meanwhile, we need a bound on the observed third derivatives, which are random quantities; the bound $M(x)$ is therefore allowed to depend on the data, so long as its expectation is finite.
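As a small illustration of the first point, here is a simulation sketch (an added example, assuming a Cauchy location model, which is not discussed above): the Fisher information is $1/2$, yet the average observed information at $\theta_0$ can easily come out negative in small samples.

```python
import numpy as np

# Sketch: Cauchy location model, log f(x | theta) = -log(pi) - log(1 + (x - theta)^2).
# The per-observation contribution to the observed information is
#   -d^2/dtheta^2 log f(x | theta) = 2*(1 - u^2) / (1 + u^2)^2,  with u = x - theta,
# which is negative whenever |x - theta| > 1.  The Fisher information is 1/2.

rng = np.random.default_rng(2)
theta0 = 0.0  # true location, chosen arbitrarily for the illustration

def avg_obs_info(x, theta):
    """Average observed information for the Cauchy location model."""
    u = x - theta
    return np.mean(2 * (1 - u**2) / (1 + u**2) ** 2)

for n in (5, 50, 5000):
    n_negative = sum(
        avg_obs_info(rng.standard_cauchy(n) + theta0, theta0) < 0
        for _ in range(1000)
    )
    print(f"n = {n:5d}: observed information at theta0 negative in "
          f"{n_negative}/1000 simulated samples")
```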
Note #3: Although not explicitly stated, the above conditions also ensure that both the observed information and Fisher information are continuous functions of $\theta$ on $\Theta_0$:
- All differentiable functions are continuous. Thus, by requiring the third derivative to exist, we require that the second derivative (the observed information) is continuous. Similarly, the score must be continuous as well.
- In fact, these conditions ensure that the observed information is uniformly continuous over $\Theta_0$. For any $\epsilon > 0$, choose $\delta = \epsilon/\bar{M}$, where $\bar{M} = n^{-1}\sum_i M(x_i)$. Then for any $\theta_1, \theta_2 \in \Theta_0$ satisfying $|\theta_1 - \theta_2| < \delta$ we have (for each observation)
$$\left| \frac{\partial^2}{\partial\theta^2} \log f(x_i\,|\,\theta_1) - \frac{\partial^2}{\partial\theta^2} \log f(x_i\,|\,\theta_2) \right| = \left| \frac{\partial^3}{\partial\theta^3} \log f(x_i\,|\,\tilde{\theta}) \right| \, |\theta_1 - \theta_2| \le M(x_i)\,|\theta_1 - \theta_2|,$$
where $\tilde{\theta}$ is on the line segment connecting $\theta_1$ and $\theta_2$ and therefore also in $\Theta_0$. We therefore have
$$\left| \frac{1}{n}\sum_{i=1}^n \frac{\partial^2}{\partial\theta^2} \log f(x_i\,|\,\theta_1) - \frac{1}{n}\sum_{i=1}^n \frac{\partial^2}{\partial\theta^2} \log f(x_i\,|\,\theta_2) \right| \le \bar{M}\,|\theta_1 - \theta_2| < \epsilon;$$
note that here we can choose a single value of $\delta$ that works for all $\theta_1, \theta_2 \in \Theta_0$. In the above, we assumed a single parameter for the sake of simplicity, but the argument is effectively the same in higher dimensions. Uniform continuity is important because it provides uniform convergence of the observed information to the Fisher information:
$$-\frac{1}{n}\,\frac{\partial^2}{\partial\theta^2}\,\ell(\theta_n^*) \;\xrightarrow{P}\; \mathcal{I}(\theta_0) \quad \text{for any sequence } \theta_n^* \xrightarrow{P} \theta_0$$
as $n \to \infty$. Note that this is more complex than ordinary convergence – we can’t simply use the law of large numbers or the continuous mapping theorem here because both the information and the point at which the information is being evaluated are changing simultaneously; a small simulation illustrating this convergence appears after this list.
- Similar arguments apply to the Fisher information. Just as the information (second derivative of the log-likelihood) is uniformly continuous over $\Theta_0$, the score (first derivative) is also uniformly continuous over $\Theta_0$, and the dominated convergence theorem applies:
$$\lim_{\theta \to \theta^*} \mathbb{E}_{\theta_0}\!\left[\frac{\partial^2}{\partial\theta^2} \log f(X\,|\,\theta)\right] = \mathbb{E}_{\theta_0}\!\left[\frac{\partial^2}{\partial\theta^2} \log f(X\,|\,\theta^*)\right]$$
for any $\theta^* \in \Theta_0$.
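Here is the simulation sketch referred to above (an added illustration, assuming a Poisson($\theta$) model; the true value $\theta_0 = 2$ and the particular sequence $\theta_n^*$ are arbitrary choices). It evaluates the average observed information at a sequence of points converging to $\theta_0$ and compares the result with $\mathcal{I}(\theta_0) = 1/\theta_0$.

```python
import numpy as np

# Illustration: for a Poisson(theta) model,
#   -d^2/dtheta^2 log f(x | theta) = x / theta^2,
# so the average observed information is J_n(theta) = xbar / theta^2, and the
# per-observation Fisher information is I(theta) = 1 / theta.  We evaluate
# J_n at a sequence theta_n* that converges (in probability) to theta0.

rng = np.random.default_rng(1)
theta0 = 2.0                                 # "true" value, chosen arbitrarily
print(f"target: I(theta0) = {1 / theta0:.4f}")

for n in (10, 100, 1_000, 10_000, 100_000):
    x = rng.poisson(theta0, size=n)
    theta_star = x.mean() + 1 / np.sqrt(n)   # a consistent sequence theta_n*
    J_n = x.mean() / theta_star**2           # average observed information
    print(f"n = {n:7d}   theta* = {theta_star:.4f}   J_n(theta*) = {J_n:.4f}")
```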
Log-concavity
The above conditions apply only locally (within a neighborhood $\Theta_0$ of $\theta_0$), and therefore only control the behavior of the likelihood near the true value. One further condition is useful for extending results beyond this neighborhood:
(D) Log-concavity: The Fisher information $\mathcal{I}(\theta)$ is positive definite for all $\theta \in \Theta$, not just at $\theta_0$.
Alternative statements
The above conditions are one way to state the conditions required for asymptotic normality of the MLE, but there is some flexibility. For example, the only reason condition C(i) is necessary is to ensure that the score has mean zero and variance equal to the Fisher information; any other set of assumptions that guarantees these identities would work just as well.
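To spell this out (a standard derivation, added here for completeness), differentiating the identity $\int f(x\,|\,\theta)\,dx = 1$ once and twice under the integral sign gives
$$\mathbb{E}_\theta\!\left[\frac{\partial}{\partial\theta} \log f(X\,|\,\theta)\right] = 0 \qquad \text{and} \qquad \mathrm{Var}_\theta\!\left[\frac{\partial}{\partial\theta} \log f(X\,|\,\theta)\right] = -\mathbb{E}_\theta\!\left[\frac{\partial^2}{\partial\theta^2} \log f(X\,|\,\theta)\right] = \mathcal{I}(\theta).$$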
Similarly, the purpose of C(iii) is to establish a uniform bound on the observed information over $\Theta_0$; any alternative condition that provides this uniform control (and hence the uniform convergence discussed above) could be used instead.
Finally, the IID assumption merely presents a basic, standard case in which likelihood theory holds. Certainly, likelihood estimates are asymptotically normal in all manner of non-IID settings (multiple groups, regression, etc.) as well – likelihood theory would not be terribly useful if this were not true. If one understands how the theory works in the IID case, it is typically relatively straightforward to extend theoretical results to other cases.