The following are the classical conditions to ensure that the likelihood is “regular”, meaning that at least asymptotically, the likelihood resembles that of a normal distribution. Note: saying “the classical conditions” is perhaps misleading, as there is some flexibility with respect to how these conditions are stated. Some additional commentary on this point is given below.

Core conditions

(A) IID: \(X_1, \ldots, X_n\) are iid with density \(p(x \vert \bt^*)\).

(B) Interior point: There exists an open set \(\bT^* \subset \bT \subset \real^d\) that contains \(\bt^*\).

(C) Smoothness: For all \(x\), \(p(x \vert \bt)\) is continuously differentiable with respect to \(\bt\) up to third order on \(\bT^*\), and satisfies the following conditions:

(i) The integral \(\int p(x \vert \bt) \, dx\) can be differentiated twice under the integral sign (i.e., the order of integration over \(x\) and differentiation with respect to \(\bt\) can be interchanged).

(ii) The Fisher information \(\fI(\bt^*)\) is positive definite.

(iii) There exists a function \(M(x)\) with \(\Ex M(X) < \infty\) such that all third-order partial derivatives of \(\log p(x \vert \bt)\) are bounded in absolute value by \(M(x)\) for all \(\bt \in \bT^*\).
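
As a simple illustration of these conditions (an added example, not part of the conditions themselves), consider the exponential model \(p(x \vert \theta) = \theta e^{-\theta x}\) for \(x > 0\), with scalar parameter \(\theta\) and, say, \(\bT^* = (a, b)\) where \(0 < a < \theta^* < b\). Here
\[
\frac{\partial}{\partial \theta} \log p(x \vert \theta) = \frac{1}{\theta} - x, \qquad
-\frac{\partial^2}{\partial \theta^2} \log p(x \vert \theta) = \frac{1}{\theta^2}, \qquad
\frac{\partial^3}{\partial \theta^3} \log p(x \vert \theta) = \frac{2}{\theta^3},
\]
so the score has mean zero at \(\theta^*\), the Fisher information \(\fI(\theta) = 1/\theta^2\) is positive, the third derivative is bounded on \(\bT^*\) by the constant \(M(x) = 2/a^3\), and differentiation under the integral sign is easily justified for this family.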

Note #1: Condition (C) describes what happens for a single observation. What happens in a random sample of \(n\) observations is governed by condition (A).
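
For instance, the log-likelihood of the sample is \(\ell(\bt) = \sum_{i=1}^n \log p(X_i \vert \bt)\), so the score and observed information of the sample are sums of \(n\) iid contributions; this is what allows the law of large numbers and the central limit theorem to be applied to them.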

Note #2: Note that C(ii) applies to the Fisher information, while C(iii) applies to the derivatives of the observed information. This is important! The observed information might randomly fail to be positive definite, but we don’t need to worry about that (asymptotically). Meanwhile, we need a bound on the observed derivatives, which may depend on \(x\); this means that our bound \(M(x)\) must be allowed to be random.
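
For example (a simple added illustration), in the Poisson model \(p(x \vert \theta) = \theta^x e^{-\theta}/x!\), the observed information for a single observation is \(x/\theta^2\), which is zero (and hence not positive definite) whenever \(x = 0\), even though the Fisher information \(\fI(\theta) = 1/\theta\) is always positive. Likewise, the third derivative is
\[
\frac{\partial^3}{\partial \theta^3} \log p(x \vert \theta) = \frac{2x}{\theta^3},
\]
so on \(\bT^* = (a, b)\) with \(a > 0\) the natural bound is \(M(x) = 2x/a^3\), which depends on \(x\) but has finite expectation.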

Note #3: Although not explicitly stated, the above conditions also ensure that both the observed information and Fisher information are continuous functions of \(\bt\).

Log-concavity

The above conditions apply only locally (within a neighborhood of \(\bt^*\)) and thus do not guarantee anything about the MLE, only about a local maximum near \(\bt^*\). To guarantee consistency and asymptotic normality of the MLE, the following stronger condition is needed, replacing C(ii).

(D) Log-concavity: The Fisher information \(\fI(\bt)\) is positive definite for all \(\bt \in \bT\), and \(\bT\) is a convex set.
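
For instance, condition (D) holds for canonical exponential families: if \(\log p(x \vert \bt) = \bt^T T(x) - A(\bt) + h(x)\) with sufficient statistic \(T(x)\) and log-partition function \(A(\bt)\), then the observed and Fisher information are both equal to \(\nabla^2 A(\bt)\), which does not depend on the data and is positive definite for all \(\bt\) in the (convex) natural parameter space whenever the representation is minimal. In that case the log-likelihood is strictly concave for every data set, so any interior stationary point is the unique global MLE.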

Alternative statements

The above conditions are one way to state the conditions required for asymptotic normality of the MLE, but there is some flexibility. For example, the only reason condition C(i) is needed is to ensure that \(\Ex \u(\bts) = \zero\) and \(\Ex \oI(\bts) = \fI(\bts)\). Thus, some authors simply state these identities for the expected score and information directly, rather than stating conditions that imply them.
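
To sketch where these identities come from (written here for a single observation): since \(\int p(x \vert \bt) \, dx = 1\) for every \(\bt\), differentiating under the integral sign (condition C(i)) gives
\[
\zero = \int \frac{\partial}{\partial \bt} p(x \vert \bt) \, dx
      = \int \frac{\partial \log p(x \vert \bt)}{\partial \bt} \, p(x \vert \bt) \, dx ,
\]
which, evaluated at \(\bt = \bts\), is \(\Ex \u(\bts) = \zero\). Differentiating a second time yields
\[
\zero = \int \left\{ \frac{\partial^2 \log p(x \vert \bt)}{\partial \bt \, \partial \bt^T}
      + \frac{\partial \log p(x \vert \bt)}{\partial \bt} \, \frac{\partial \log p(x \vert \bt)}{\partial \bt^T} \right\} p(x \vert \bt) \, dx ,
\]
which, at \(\bt = \bts\), rearranges to \(\Ex \oI(\bts) = \fI(\bts)\) once the Fisher information is identified with the variance of the score.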

Similarly, the purpose of C(iii) is to establish a uniform bound on the observed information over \(\bT^*\): the core idea is that a single bound \(M(x)\) works for all values of \(\bt \in \bT^*\), and bounding the third derivatives is one way of accomplishing that. It is possible to relax this condition and require a uniform bound only on the second derivatives (plus some other conditions), although the resulting proofs of consistency and asymptotic normality become more complicated. For this reason, the simpler regularity conditions presented above are more common in practice.
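
To sketch how the bound gets used (for a single observation, taking \(\bT^*\) to be a convex neighborhood of \(\bts\)): by the mean value theorem, for any \(\bt \in \bT^*\) and any components \(j, k\),
\[
\left| \frac{\partial^2 \log p(x \vert \bt)}{\partial \theta_j \, \partial \theta_k}
     - \frac{\partial^2 \log p(x \vert \bts)}{\partial \theta_j \, \partial \theta_k} \right|
\le M(x) \, \lVert \bt - \bts \rVert_1 ,
\]
since each third-order partial derivative along the segment from \(\bts\) to \(\bt\) is bounded by \(M(x)\). Summing over the \(n\) observations and applying the law of large numbers to \(n^{-1} \sum_i M(X_i)\), which converges to \(\Ex M(X) < \infty\), shows that the average observed information is uniformly close to its value at \(\bts\) over small neighborhoods; this is what controls the remainder term in the Taylor expansion of the score about \(\bts\).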

Finally, the IID assumption merely represents a basic, standard case in which likelihood theory holds. Certainly, maximum likelihood estimates are asymptotically normal in all manner of non-IID settings (multiple groups, regression, etc.) as well; likelihood theory would not be terribly useful if this were not true. If one understands how the theory works in the IID case, it is typically relatively straightforward to extend the theoretical results to other cases.