On the Asymptotics of the Distribution of the Maximum Likelihood Statistic
Corresponding Member of the Academy of Sciences of the USSR Yu. V. LINNIK, N. M. MITROFANOVA
Submitted 1963-01-01 | SovietRxiv: ru-196301.48109 | Translated from Russian

Abstract

The paper studies higher-order asymptotics for the distribution of the maximum likelihood estimator of a scalar shift parameter in densities of the form \(c\,e^{-F(x-\theta)}\). Under smoothness, integrability, growth, and regularity assumptions on \(F\), the authors derive an expansion for the distribution of the normalized estimator in powers of \(n^{-1/2}\), with explicitly constructible continuous correction terms and a leading normal law whose variance equals the Rao–Cramér lower bound. The proof is outlined in four steps: expansion of the likelihood equation, analysis of a root via a Newton polygon, truncation of the resulting series, and application of a multidimensional limit theorem. A related result establishes, under an additional positivity condition on a higher derivative of \(F\), that the estimator has finite variance and that the variance of its normalized form converges to the limiting variance at rate \(O(n^{-1/2})\).

Full Text

MATHEMATICS

Consider a random variable \(X\) with distribution density \(f(x,\theta)\). Let \(x_1, x_2,\ldots,x_n\) be \(n\) independent observations of \(X\). Then, as is known, the maximum-likelihood estimate of the parameter \(\theta\), i.e. a consistent solution of the equation

\[ \sum_{i=1}^{n}\frac{\partial \ln f(x_i,\theta)}{\partial \theta}=0, \]

under fairly general conditions on \(f(x,\theta)\), has a number of asymptotic properties, in particular asymptotic normality and asymptotic efficiency.

We shall consider functions \(f(x,\theta)\) of the form \(c e^{-F(x-\theta)}\), where \(\theta\) is a scalar shift parameter. Assuming that the corresponding conditions, to be specified below, are satisfied, we write the likelihood equation in the form

\[ \sum_{i=1}^{n} F'(x_i-\theta)=0. \tag{1} \]

Let the density \(f(x,\theta)\) satisfy the following conditions (see \((^{1})\), p. 544):

A. For every \(\theta\) from some nondegenerate interval \(\Theta\), for almost all \(x\) there exist the derivatives \(\partial \ln f/\partial \theta\), \(\partial^2 \ln f/\partial \theta^2\), and \(\partial^3 \ln f/\partial \theta^3\).

B. For every \(\theta\) from \(\Theta\),
\[ |\partial f/\partial \theta|<F_1(x),\quad |\partial^2 f/\partial \theta^2|<F_2(x),\quad |\partial^3 \ln f/\partial \theta^3|<H(x), \]
where \(F_1,F_2\) are integrable on \((-\infty,+\infty)\) and
\[ \int_{-\infty}^{+\infty} H(x) f(x,\theta)\,dx<M, \]
where \(M\) does not depend on \(\theta\).

C. For every \(\theta\) from \(\Theta\), the integral
\[ \int_{-\infty}^{+\infty}\left(\frac{\partial \ln f}{\partial \theta}\right)^2 f\,dx \]
is finite and positive.

Then equation (1) has a solution that converges in probability to the true value of the parameter \(\theta\). This solution will be an asymptotically normal and asymptotically efficient estimator for \(\theta\). Denote this solution by \(\hat{\theta}_n\).
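As an illustration of how the solution \(\hat{\theta}_n\) of equation (1) can be computed, here is a minimal sketch under an assumption that is not taken from the paper: the particular choice \(F(x)=x^4/4\), so that \(F'(x)=x^3\). For this choice the left-hand side of (1) is strictly decreasing in \(\theta\), so the root is unique and bisection suffices. The function names are hypothetical.

```python
def score_sum(xs, theta):
    """Left-hand side of equation (1) for the assumed F(x) = x**4 / 4, i.e. F'(x) = x**3."""
    return sum((x - theta) ** 3 for x in xs)


def solve_likelihood_equation(xs, tol=1e-12, max_iter=200):
    """Find the root of score_sum(xs, theta) = 0 by bisection.

    The sum is nonnegative at theta = min(xs), nonpositive at theta = max(xs),
    and strictly decreasing in theta, so the root lies in [min(xs), max(xs)].
    """
    lo, hi = min(xs), max(xs)
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if score_sum(xs, mid) > 0:
            lo = mid  # the root is to the right of mid
        else:
            hi = mid  # the root is at mid or to its left
    return 0.5 * (lo + hi)


# A symmetric sample gives the centre of symmetry as the estimate.
theta_hat = solve_likelihood_equation([-1.0, 0.0, 1.0])
```

For a general \(F\) the score need not be monotone, and one would look for the root nearest a preliminary consistent estimate instead.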

Theorem 1. Let the true value of the parameter \(\theta\) be equal to 0, and let the function \(F(x)\) satisfy the following conditions:

  1. \(F(x)\) has \(k+2\) derivatives \((k>0)\).

  2. \(|F^{(i)}(x)|<\exp\{(\ln(|x|+1))^{m_i}\}\) for some values \(m_i\), \(i=1,2,\ldots,k+2\).

  3. \(E e^{b_i |F^{(i)}(x)|}<\infty\) for some values \(b_i>0\), \(i=1,2,\ldots,k+1\).

  4. Conditions B and C are satisfied.

  5. \(\dfrac{x\ln|x|}{F(x)}\to 0\) as \(x\to \pm\infty\).

Then for \(|x| < A\) (\(A\) a positive constant) the following asymptotic expansion holds:

\[ P\left(\hat{\theta}_n\sqrt{n}<x\right) = \Phi\left(\frac{x}{\chi}\right) + \sum_{j=1}^{\left[\frac{k-1}{2}\right]} n^{-j/2}K_j\left(\frac{x}{\chi}\right) + o\left(\frac{\ln n}{\sqrt n}\right)^{\left[\frac{k}{2}\right]}; \]

\(K_j(x)\) are certain effectively constructible continuous functions, and \(\chi^2\) is the Rao–Cramér lower bound for the variance of an estimate of the parameter \(\theta\).

We note that the \(K_j(x)\) are not Hermite polynomials.
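To make the constant \(\chi^2\) concrete, the following sketch evaluates the Rao–Cramér bound numerically for one density. It assumes, purely for illustration, \(F(x)=x^4/4\); this choice and all names in the code are not from the paper. For a shift family the bound for \(\hat{\theta}_n\sqrt n\) is the reciprocal of \(E\,[F'(x)]^2\), which by integration by parts also equals \(E\,F''(x)\); for the assumed \(F\) both equal \(6\,\Gamma(3/4)/\Gamma(1/4)\).

```python
import math


def F(x):
    """Assumed exponent of the density: f(x) = c * exp(-F(x))."""
    return x ** 4 / 4


def simpson(g, a=-8.0, b=8.0, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3


# Normalizing constant c of the density c * exp(-F(x)).
c = 1.0 / simpson(lambda x: math.exp(-F(x)))

# Fisher information computed in two ways: E[F'(X)^2] and E[F''(X)].
info_from_score = c * simpson(lambda x: (x ** 3) ** 2 * math.exp(-F(x)))
info_from_curvature = c * simpson(lambda x: 3 * x ** 2 * math.exp(-F(x)))

# Closed form for this particular F, used as a cross-check.
info_closed_form = 6 * math.gamma(0.75) / math.gamma(0.25)

chi2 = 1.0 / info_from_score  # Rao-Cramer bound, about 0.493 for this F
```

The integration range \([-8, 8]\) is an arbitrary cut-off; the integrand is negligible beyond it for this \(F\).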

The following theorem is closely related to Theorem 1.

Theorem 2. Suppose that the function \(F(x)\) satisfies the conditions of Theorem 1 and that, for some \(i\) \((3 \le i \le k+2)\), \(F^{(i)}(x)>c_0>0\). Then the maximum likelihood estimate \(\hat{\theta}_n\) has a finite variance, and moreover

\[ D\left(\hat{\theta}_n\sqrt n\right) = \chi^2 + O\left(\frac{1}{\sqrt n}\right). \]

In the present particular case, Theorem 2 establishes the validity of H. Chernoff's conjecture \((^2)\) that the maximum likelihood estimate has a finite variance converging to the variance of the limiting distribution, and it also gives the rate of convergence.
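The statement of Theorem 2 can be probed by simulation. The sketch below is only an illustration under assumptions not made in the paper: it takes \(F(x)=x^4/4\) (for which \(F^{(4)}(x)=6>0\)), draws samples by rejection from a standard normal proposal, and compares the empirical variance of \(\hat{\theta}_n\sqrt n\) with \(\chi^2=\Gamma(1/4)/(6\,\Gamma(3/4))\). The sample size, number of replications, seed, and function names are arbitrary choices.

```python
import math
import random


def sample_quartic(rng):
    """Draw from the density proportional to exp(-x**4 / 4) by rejection sampling.

    Proposal: standard normal. The log of the target/proposal ratio, after
    normalization by its maximum, is -(x**2 - 1)**2 / 4, which is never positive.
    """
    while True:
        x = rng.gauss(0.0, 1.0)
        if rng.random() < math.exp(-((x * x - 1.0) ** 2) / 4.0):
            return x


def mle(xs, iterations=60):
    """Root of sum((x_i - theta)**3) = 0 by bisection (the sum decreases in theta)."""
    lo, hi = min(xs), max(xs)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum((x - mid) ** 3 for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normalized_variance(n, reps, seed=0):
    """Empirical variance of sqrt(n) * theta_hat over `reps` simulated samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [sample_quartic(rng) for _ in range(n)]
        values.append(math.sqrt(n) * mle(xs))
    mean = sum(values) / reps
    return sum((v - mean) ** 2 for v in values) / (reps - 1)


chi2 = math.gamma(0.25) / (6 * math.gamma(0.75))  # limiting variance for this F
estimate = normalized_variance(n=50, reps=400)
```

With a few hundred replications the Monte Carlo error is of the order of several percent, so such a run can only show that the empirical variance is close to \(\chi^2\); it cannot resolve the \(O(1/\sqrt n)\) rate itself.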

We indicate the main steps of the proof. Assumption 1 of Theorem 1 makes it possible to expand the left-hand side of equation (1) by Taylor's formula in powers of \(\theta\) and, after division by \(n\), to write the equation in the form

\[ \frac{1}{\sqrt n} \left( \xi_1-\xi_2\theta+\cdots+ \frac{(-1)^{k}}{k!}\xi_{k+1}\theta^k \right) - a_2\theta+\cdots+ \frac{(-1)^{k}}{k!}a_{k+1}\theta^k + \rho_{kn}(\theta) =0, \tag{2} \]

where

\[ \xi_r = \frac{\sum_{i=1}^n\left(F^{(r)}(x_i)-a_r\right)}{\sqrt n}, \qquad a_r=EF^{(r)}(x), \]

\[ \rho_{kn}(\theta) = \frac{(-1)^{k+1}}{(k+1)!}\, \theta^{k+1}\, \frac{\sum_{i=1}^n F^{(k+2)}(x_i-\beta\theta)}{n}, \qquad |\beta|<1. \]

No term \(a_1\) appears, since \(a_1=EF'(x)=0\) when the true value of \(\theta\) is 0.

We introduce two conditions: \(\mathfrak A_\varepsilon\), which requires that \(|\xi_r|<n^\varepsilon,\ r=1,\ldots,k+1\), where \(\varepsilon\) is an arbitrarily small positive number; and \(\mathfrak B\), which requires that \(|x_i|<\ln n,\ i=1,\ldots,n\). Consider the equation obtained from (2) by discarding the remainder \(\rho_{kn}(\theta)\):

\[ \frac{1}{\sqrt n} \left( \xi_1-\xi_2\theta+\cdots+ \frac{(-1)^{k}}{k!}\xi_{k+1}\theta^k \right) - a_2\theta+\cdots+ \frac{(-1)^{k}}{k!}a_{k+1}\theta^k =0. \tag{3} \]
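The passage from the likelihood equation to its polynomial form can be checked numerically in a case where the Taylor expansion terminates. The sketch below assumes \(F(x)=x^4/4\) (an illustrative choice, not the paper's), for which \(F^{(5)}\equiv 0\), so with \(k=3\) the remainder vanishes and the polynomial in \(\theta\) must coincide with \(\frac1n\sum F'(x_i-\theta)\). It uses the Taylor coefficients \((-\theta)^r/r!\) for the term in \(F^{(r+1)}\); the names `xi`, `polynomial_form`, and `normalized_score` are hypothetical.

```python
import math

# Derivatives F', F'', F''', F'''' of the assumed F(x) = x**4 / 4.
DERIVATIVES = {
    1: lambda x: x ** 3,
    2: lambda x: 3 * x ** 2,
    3: lambda x: 6 * x,
    4: lambda x: 6.0,
}

# a_r = E F^(r)(X) under the density c * exp(-x**4 / 4):
# a_1 = a_3 = 0 by symmetry, a_2 = 6 * Gamma(3/4) / Gamma(1/4), a_4 = 6.
A = {1: 0.0, 2: 6 * math.gamma(0.75) / math.gamma(0.25), 3: 0.0, 4: 6.0}


def xi(xs, r):
    """Normalized centred sum xi_r = sum(F^(r)(x_i) - a_r) / sqrt(n)."""
    return sum(DERIVATIVES[r](x) - A[r] for x in xs) / math.sqrt(len(xs))


def polynomial_form(xs, theta):
    """Sum over r = 0..3 of (-theta)^r / r! * (xi_{r+1} / sqrt(n) + a_{r+1})."""
    root_n = math.sqrt(len(xs))
    return sum(
        (-theta) ** r / math.factorial(r) * (xi(xs, r + 1) / root_n + A[r + 1])
        for r in range(4)
    )


def normalized_score(xs, theta):
    """(1/n) * sum F'(x_i - theta), the likelihood equation divided by n."""
    return sum((x - theta) ** 3 for x in xs) / len(xs)


sample = [0.3, -1.2, 0.7, 2.0]
gap = abs(polynomial_form(sample, 0.4) - normalized_score(sample, 0.4))
```

The agreement is an algebraic identity and does not depend on the numerical values of the \(a_r\); for a general \(F\) the two sides differ by the remainder term.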

With the aid of the Newton polygon of (3) we construct an expansion for a root of equation (3) in powers of \(\frac{1}{\sqrt n}\):

\[ \theta=\alpha_0+\alpha_1\frac{1}{\sqrt n}+\cdots \]

The root \(\theta_1\), for which \(\alpha_0=0\), has the form

\[ \theta_1 = \frac{1}{\sqrt n}\frac{\xi_1}{a_2} +\cdots \tag{4} \]
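The next coefficient of the expansion can be obtained by substituting \(\theta=\alpha_1 n^{-1/2}+\alpha_2 n^{-1}+\cdots\) into the equation and collecting powers of \(n^{-1/2}\). Assuming \(k\ge 2\) and the Taylor signs \(F'(x-\theta)=F'(x)-F''(x)\theta+\tfrac12F'''(x)\theta^2-\cdots\), the terms of order \(n^{-1}\) give \(-\xi_2\alpha_1-a_2\alpha_2+\tfrac12 a_3\alpha_1^2=0\), that is, \(\alpha_2=\dfrac{a_3\xi_1^2}{2a_2^3}-\dfrac{\xi_1\xi_2}{a_2^2}\). The sketch below checks this for \(k=2\) by solving the quadratic equation exactly and comparing with the two-term expansion; the numerical values of \(\xi_r\) and \(a_r\) are arbitrary, and the function names are hypothetical.

```python
import math


def root_near_zero(xi1, xi2, xi3, a2, a3, n):
    """Root of (xi1 - xi2*t + xi3*t**2/2)/sqrt(n) - a2*t + a3*t**2/2 = 0 that tends to 0.

    Written as A*t**2 + B*t + C = 0 and solved with the numerically stable
    formula for the small root, which also covers the case A = 0.
    Assumes a2 + xi2/sqrt(n) > 0 and a nonnegative discriminant.
    """
    s = math.sqrt(n)
    A = 0.5 * (a3 + xi3 / s)
    B = -(a2 + xi2 / s)
    C = xi1 / s
    disc = B * B - 4 * A * C
    return 2 * C / (-B + math.sqrt(disc))


def two_term_expansion(xi1, xi2, a2, a3, n):
    """alpha_1 / sqrt(n) + alpha_2 / n with the coefficients derived above."""
    alpha1 = xi1 / a2
    alpha2 = a3 * xi1 ** 2 / (2 * a2 ** 3) - xi1 * xi2 / a2 ** 2
    return alpha1 / math.sqrt(n) + alpha2 / n


def truncation_error(n, xi1=0.8, xi2=-0.5, xi3=0.3, a2=2.0, a3=1.5):
    """Distance between the exact small root and the two-term expansion."""
    exact = root_near_zero(xi1, xi2, xi3, a2, a3, n)
    return abs(exact - two_term_expansion(xi1, xi2, a2, a3, n))


err_small_n = truncation_error(100)
err_large_n = truncation_error(10_000)
```

Since the first neglected term is of order \(n^{-3/2}\), increasing \(n\) by a factor of 100 should reduce the error by roughly a factor of 1000.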

Under the condition \(\mathfrak A_\varepsilon\), expansion (4) converges in some neighborhood of zero. In addition, under the conditions \(\mathfrak A_\varepsilon\) and \(\mathfrak B\), \(\hat{\theta}_n\) and \(\theta_1\) are related by the inequality

\[ |\hat{\theta}_n-\theta_1| < \frac{C}{n^{\frac{k-1}{4}-\varphi}}, \]

where \(C\) does not depend on \(n\) and \(\varphi\) is a certain number that can be made arbitrarily small by the choice of \(\varepsilon\). Since the probability that the conditions \(\mathfrak{B}\) and \(\mathfrak{A}_\varepsilon\) fail is of order \(o\!\left(\dfrac{1}{n^K}\right)\) for any \(K\), one may consider the conditional probability \(P(\hat{\theta}_n\sqrt n < x \mid \mathfrak{A}_\varepsilon,\mathfrak{B})\) in place of the unconditional one. Next we truncate the series for \(\theta_1\) at the \(\left[\dfrac{k-1}{2}\right]\)-th term, and, since

\[ P(\hat{\theta}_n\sqrt n < x) = P\left( \alpha_1+\cdots+ \alpha_{\left[\frac{k-1}{2}\right]-1} \left(\frac{1}{\sqrt n}\right)^{\left[\frac{k-1}{2}\right]} < x \right) + o\left(\frac{1}{\sqrt n}\right)^{\left[\frac{k-1}{2}\right]}, \]

then, using the specific nature of the functions
\(\alpha_1,\ldots,\alpha_{\left[\frac{k-1}{2}\right]-1}\), we can apply the multidimensional limit theorem \({}^{(4)}\) to find this probability.

Concerning the proof of Theorem 2, we note that, as a root of equation (2), \(\hat{\theta}_n\) satisfies the inequality

\[ |\hat{\theta}_n| < \frac{ \displaystyle \max_{\tau=1,\ldots,k+1} \left| \frac{1}{(\tau-1)!}\sum_1^n F^{(\tau)}(x_i) \right| }{ \displaystyle \frac{1}{(k+1)!} \left| \sum_1^n F^{(k+2)}(x_i-\beta\hat{\theta}_n) \right| } +1 \le \frac{ \displaystyle \frac{1}{\sqrt n}\sum |\xi_i|\,\frac{1}{(i-1)!} +\sum \frac{|a_i|}{(i-1)!} }{ \displaystyle c_0\,\frac{1}{(k+1)!} }, \]

whence it follows that

\[ \int \hat{\theta}_n^2(x_1,\ldots,x_n)\, f(x_1,\ldots,x_n)\, dx < \infty. \]

Further, in the region determined by the conditions \(\mathfrak{A}_\varepsilon,\mathfrak{B}\), for \(\hat{\theta}_n\) we have the expansion (4), and since

\[ \int_{\overline{\mathfrak{A}}_\varepsilon \overline{\mathfrak{B}}} \hat{\theta}_n^2(x_1,\ldots,x_n)\, f(x_1,\ldots,x_n)\, dx = o\left(\frac{1}{n^K}\right) \]

for any positive \(K\), Theorem 2 follows immediately from this.

Received 24 XII 1962

REFERENCES

\({}^{1}\) H. Cramér, Mathematical Methods of Statistics, Moscow, 1948.
\({}^{2}\) H. Chernoff, Ann. of Math. Stat., 27, No. 1 (1956).
\({}^{3}\) N. G. Chebotarev, The Theory of Algebraic Functions, Moscow–Leningrad, 1948.
\({}^{4}\) R. Rao, Bull. Am. Math. Soc., 67, No. 4, 359 (1961).
