Theorem (Consistency of the MLE): Suppose assumptions (A)-(D) are met. Then the maximum likelihood estimator \(\bth\) is consistent:

\[\bth \inP \bt^*.\]
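Before the proof, a quick numerical illustration may help fix ideas (this is not part of the argument). Under a hypothetical Exponential model with rate \(\theta^* = 2\), the MLE has the closed form \(\bth = 1/\bar{x}\), and its average error shrinks as \(n\) grows, exactly as the theorem predicts. A minimal sketch:

```python
# Illustration only (not part of the proof): under a hypothetical
# Exponential model with rate theta* = 2, the MLE is 1/xbar, and its
# average error shrinks as n grows -- consistency in action.
import numpy as np

rng = np.random.default_rng(0)
theta_star = 2.0

def avg_mle_error(n, reps=200):
    # Average |theta_hat - theta*| over `reps` simulated samples of size n.
    x = rng.exponential(scale=1 / theta_star, size=(reps, n))
    theta_hat = 1 / x.mean(axis=1)
    return float(np.abs(theta_hat - theta_star).mean())

errors = [avg_mle_error(n) for n in (100, 1000, 10000)]
# errors should shrink roughly like 1/sqrt(n)
```

The exponential family is chosen purely for convenience; any model satisfying (A)-(D) would behave the same way.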

Proof: Let \(\eps > 0\), \(B = \tfrac{1}{6} d^3 \Ex M(X)\), and \(r^* = \tfrac{1}{4}\lam/(d+B)\), where \(\lam\) is the smallest eigenvalue of \(\fI(\bts)\); note that \(\lam>0\) by C(ii). The main logic of the proof is presented below, followed by supplemental remarks on three quantities, \(S_1\), \(S_2\), and \(S_3\), defined along the way. The idea is to construct a neighborhood of \(\bts\) on whose boundary \(S_1\) and \(S_3\) are bounded above and \(S_2\) is bounded below. Together these bounds imply that, with probability tending to 1, the likelihood at the center of the neighborhood exceeds the likelihood everywhere on its boundary, and therefore the MLE must lie inside the neighborhood.

\[\begin{alignat*}{2}
&\exists r \in (0, r^*): \forall i, \bt \in N_r(\bts), \\
&\quad \ell_i(\bt) = \ell_i(\bts) + \u_i(\bts)\Tr(\bt-\bts) - \tfrac{1}{2}(\bt-\bts)\Tr\oI_i(\bts)(\bt-\bts) &\hspace{4em}& \href{regularity-conditions.html}{\text{(B), (C)}} \text{ allow } \href{taylor-series.html}{\text{TSE}} \\
&\qquad\qquad + \tfrac{1}{6}\sum_{j=1}^d\sum_{k=1}^d\sum_{m=1}^d (\bt-\bts)_j(\bt-\bts)_k(\bt-\bts)_m \dddot\ell_{jkm}(\bar{\bt}_i) \\
&\quad \tfrac{1}{n}\ell(\bt) - \tfrac{1}{n}\ell(\bts) = S_1 - S_2 + S_3 && S_1, S_2, S_3 \text{ defined below} \\
&\exists N_1: n > N_1 \implies \Pr\{S_1 < dr^3 \,\forall\, \norm{\bt-\bts}=r\} > 1-\eps/3 && \href{regularity-conditions.html}{\text{(i)}}, (S1) \\
&\exists N_2: n > N_2 \implies \Pr\{S_2 > \tfrac{1}{4}\lam r^2 \,\forall\, \norm{\bt-\bts}=r\} > 1-\eps/3 && \href{regularity-conditions.html}{\text{(i), (ii)}}, (S2) \\
&\exists N_3: n > N_3 \implies \Pr\{S_3 < Br^3 \,\forall\, \norm{\bt-\bts}=r\} > 1-\eps/3 && \href{regularity-conditions.html}{\text{(iii)}}, (S3)
\end{alignat*}\]

Thus, for \(n > \max(N_1, N_2, N_3)\), we have

\[\Pr\left\{\sup_{\bt:\norm{\bt-\bts} = r} S_1 - S_2 + S_3 < dr^3 - \tfrac{1}{4}\lam r^2 + Br^3\right\} > 1-\eps\]

by the union bound. Furthermore, since \(r < r^* = \tfrac{1}{4}\lam/(d+B)\), we have \((d+B)r^3 < \tfrac{1}{4}\lam r^2\), so the bound on the right-hand side is negative and

\[\Pr\{\sup_{\bt:\norm{\bt-\bts} = r} \ell(\bt) < \ell(\bts)\} > 1-\eps.\]

Finally, since (D) implies a unimodal likelihood, \(\bth\) must lie in \(N_r(\bts)\) whenever \(\ell(\bts)\) exceeds the likelihood everywhere on the boundary; as \(\eps\) was arbitrary, we must have

\[\Pr\{\bth \in N_r(\bts)\} \to 1.\]

Now for the supplementary remarks about \(S_1, S_2, \text{ and } S_3\).

\[\as{ \tag{S1} S_1 &= (\bt-\bts)\Tr \tfrac{1}{n}\sum_{i=1}^n \u_i(\bts) \\ &\inP 0}\]

by the WLLN, since the score has expectation zero at the true parameter.
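This step is easy to check numerically. Under a hypothetical Exponential model with rate \(\theta^*\), the per-observation score is \(u_i(\theta) = 1/\theta - x_i\), which has mean zero at the true rate, so the average score vanishes in probability. A minimal sketch:

```python
# Sketch of (S1): the average score at theta* tends to 0 by the WLLN,
# since E u_i(theta*) = 0. Hypothetical Exponential(rate = theta*) model,
# where the per-observation score is u_i(t) = 1/t - x_i.
import numpy as np

rng = np.random.default_rng(1)
theta_star = 2.0

def abs_mean_score(n):
    x = rng.exponential(scale=1 / theta_star, size=n)
    return float(abs(np.mean(1 / theta_star - x)))  # |(1/n) sum_i u_i(theta*)|

small_n, large_n = abs_mean_score(100), abs_mean_score(1_000_000)
# large_n should sit very close to 0
```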

To streamline notation for \(S_2\), let us write \(\bt-\bts = r\u\), where \(\u\) is a \(d \times 1\) unit vector (not to be confused with the score; apologies for the brief abuse of notation). Furthermore, in the equation below, all information matrices are evaluated at \(\bts\).

\[\as{ \tag{S2} S_2 &= \tfrac{1}{2}(\bt-\bts)\Tr \left[\tfrac{1}{n}\sum_{i=1}^n\oI_i\right] (\bt-\bts) \\ &= \tfrac{1}{2}r^2\u\Tr\left[\tfrac{1}{n}\oI_n - \fI + \fI\right]\u \\ &= \tfrac{1}{2}r^2\u\Tr\fI\u + \tfrac{1}{2}r^2\u\Tr\left[\tfrac{1}{n}\oI_n - \fI\right]\u \\ &\ge \tfrac{1}{2}r^2\lam - o_p(1) }\]

since \(\u\Tr\fI\u \ge \lam\) for any unit vector \(\u\) (\(\lam\) being the minimum eigenvalue of \(\fI(\bts)\)) and \(\tfrac{1}{n}\oI_n \inP \fI\) by the Fisher information theorem.
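The convergence \(\tfrac{1}{n}\oI_n \inP \fI\) is most visible in a model where the observed information is genuinely random. A hypothetical Poisson(\(\theta^*\)) model works: there the per-observation observed information at \(\theta\) is \(x_i/\theta^2\), and the Fisher information at \(\theta^*\) is \(\Ex X/(\theta^*)^2 = 1/\theta^*\). A minimal sketch:

```python
# Sketch of the (S2) ingredient: (1/n) * observed information at theta*
# converges in probability to the Fisher information. Hypothetical
# Poisson(theta*) model: per-observation observed information at theta is
# x_i / theta^2, and the Fisher information at theta* is 1/theta*.
import numpy as np

rng = np.random.default_rng(2)
theta_star = 4.0
fisher_info = 1 / theta_star  # 0.25

def avg_obs_info(n):
    x = rng.poisson(theta_star, size=n)
    return float(np.mean(x / theta_star**2))

gap = abs(avg_obs_info(1_000_000) - fisher_info)
# gap should be tiny: the average observed information hugs 1/theta* = 0.25
```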

Finally,

\[\as{ \tag{S3} S_3 &= \tfrac{1}{n}\sum_{i=1}^n \tfrac{1}{6} \sum_{j=1}^d\sum_{k=1}^d\sum_{m=1}^d (\bt-\bts)_j(\bt-\bts)_k(\bt-\bts)_m \dddot\ell_{jkm}(\bar{\bt}_i) \\ &\le \tfrac{1}{n}\sum_{i=1}^n \tfrac{1}{6} \sum_{j=1}^d\sum_{k=1}^d\sum_{m=1}^d r^3 M(x_i) \\ &= \tfrac{1}{6} r^3 d^3 \left\{\tfrac{1}{n}\sum_{i=1}^n M(x_i)\right\} \\ &\inP \tfrac{1}{6} r^3 d^3 \Ex M(X) \\ &= Br^3 }\]

where the inequality uses \(|(\bt-\bts)_j| \le \norm{\bt-\bts} = r\) and the bound \(|\dddot\ell_{jkm}(\bar{\bt}_i)| \le M(x_i)\) from (iii).
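To close the loop, the conclusion of the proof can be simulated: on the boundary of a small ball around \(\bts\), the log-likelihood should fall below its value at the center once \(n\) is large. A sketch under a hypothetical one-parameter Poisson model, where the boundary of \(N_r(\bts)\) is simply the pair of points \(\theta^* \pm r\):

```python
# Illustration of the proof's conclusion: for large n, ell(theta*) exceeds
# ell(theta) on the boundary ||theta - theta*|| = r, so the (unimodal)
# likelihood is maximized inside N_r(theta*). Hypothetical Poisson model;
# in one dimension the boundary is just {theta* - r, theta* + r}.
import numpy as np

rng = np.random.default_rng(3)
theta_star, r, n = 4.0, 0.5, 10_000
x = rng.poisson(theta_star, size=n)

def loglik(theta):
    # Poisson log-likelihood, dropping the -log(x!) term, which does not
    # depend on theta and so cancels in every comparison below.
    return float(np.sum(x * np.log(theta) - theta))

boundary_max = max(loglik(theta_star - r), loglik(theta_star + r))
# with n = 10000, the center should beat the boundary decisively
```

In higher dimensions the same comparison holds uniformly over the sphere \(\norm{\bt-\bts}=r\), which is exactly what the three bounds on \(S_1\), \(S_2\), and \(S_3\) deliver.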