Abstract
This note develops an analogue of Fisher information for parametric families whose densities need not be differentiable with respect to the parameter. It first introduces a nonsymmetric divergence between two probability measures and records its basic properties, including separation, monotonicity under passage to a subalgebra, a relation to sufficiency, and behavior under product measures. Using this divergence, the paper defines an information quantity for one-parameter, multiparameter, and normed-space parameter settings, showing that it coincides with Fisher information for sufficiently smooth families and is additive for independent samples in homogeneous families. It also derives corresponding Rao-Cramér type lower bounds for the variance or covariance matrix of unbiased estimators.
Full Text
MATHEMATICS
A. M. KAGAN
ON THE THEORY OF FISHER INFORMATION QUANTITY
(Presented by Academician V. I. Smirnov, 5 February 1963)
In this note an analogue of Fisher’s information quantity \((^{1})\) is constructed for families specified, generally speaking, by densities not differentiable with respect to the parameter. The corresponding generalization of the Rao–Cramér inequality \((^{1,2})\) is also given.
§ 1. \(W\)-divergence between two distributions
Let probability measures \(P_1\) and \(P_2\) be given on an abstract space \(X\) of elements \(x\) with a distinguished \(\sigma\)-algebra \(\mathfrak A\) of subsets. One may always assume that they are given by densities \(p_1(x)=dP_1/d\mu,\ p_2(x)=dP_2/d\mu\) with respect to some measure \(\mu\) (as \(\mu\) one may take, for example, \(P_1+P_2\)).
Define the \(W\)-divergence between \(P_1\) and \(P_2\) as follows:
\[ W(P_1;P_2)= \int_{\{p_1(x)>0\}} \left[1-\frac{p_2(x)}{p_1(x)}\right]^2 p_1(x)\,d\mu(x) \tag{1} \]
and, analogously,
\[ W(P_2;P_1)= \int_{\{p_2(x)>0\}} \left[1-\frac{p_1(x)}{p_2(x)}\right]^2 p_2(x)\,d\mu(x). \tag{2} \]
Generally speaking, \(W(P_1;P_2)\ne W(P_2;P_1)\) (note that the Kullback–Leibler numbers \((^3)\) have this same property).
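The asymmetry can be seen on a finite space, where \(\mu\) is the counting measure and the integral in (1) becomes a sum. The following is only a minimal sketch: the function name `w_divergence` and the two-point distributions are hypothetical choices made for illustration and do not come from the note.

```python
def w_divergence(p1, p2):
    """W(P1; P2) from (1) for discrete distributions given as equal-length lists.

    The counting measure plays the role of mu; points with p1 == 0 are skipped,
    which matches the integration region {p1 > 0}.
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must have the same length")
    return sum((1.0 - b / a) ** 2 * a for a, b in zip(p1, p2) if a > 0)


# Hypothetical two-point distributions, chosen only for illustration.
p1 = [0.5, 0.5]
p2 = [0.9, 0.1]

w12 = w_divergence(p1, p2)  # (1 - 1.8)^2 * 0.5 + (1 - 0.2)^2 * 0.5
w21 = w_divergence(p2, p1)  # (1 - 5/9)^2 * 0.9 + (1 - 5)^2 * 0.1
print("W(P1;P2) =", w12)
print("W(P2;P1) =", w21)
```

Swapping the arguments swaps the roles of the two measures, so the second call evaluates \(W(P_2;P_1)\); for these inputs the two values differ.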
The introduced \(W\)-divergence has the following properties:
- \(W(P_1;P_2)=0\) only when \(P_1=P_2\).
- If \(W(P_1;P_2^{(n)})\to 0\) as \(n\to\infty\), then \(\operatorname{Var}|P_1-P_2^{(n)}|\to 0\). As examples show, the converse assertion is false.
- Let \(\mathfrak B\) be a \(\sigma\)-subalgebra of the algebra \(\mathfrak A\); let \(\widetilde P_1\) and \(\widetilde P_2\) be the restrictions of the measures \(P_1\) and \(P_2\), respectively, to the \(\sigma\)-algebra \(\mathfrak B\). Then \(W(\widetilde P_1;\widetilde P_2)\le W(P_1;P_2)\), with equality if and only if \(\mathfrak B\) is a sufficient subalgebra for the family \((P_1;P_2)\) \((^4)\).
- Suppose that \(P_1\) and \(P_2\) are mutually absolutely continuous. Let \(X^n=X\times\cdots\times X\), and let \(P_i^{(n)}\) be the direct product of the measure \(P_i\) with itself \(n\) times, \(i=1,2\). Then
\[ W(P_1^{(n)};P_2^{(n)})\ge n\,W(P_1;P_2). \]
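The monotonicity and product properties can be checked numerically on a finite space. The sketch below assumes strictly positive discrete distributions (so the measures are mutually absolutely continuous) and models a \(\sigma\)-subalgebra as a partition of the points into blocks. The helper names `product_measure` and `coarsen` and the numerical inputs are hypothetical. Under mutual absolute continuity, expanding the square in (1) gives \(1+W(P_1;P_2)=\int p_2^2/p_1\,d\mu\), and this integral factorizes over product measures, so \(1+W(P_1^{(n)};P_2^{(n)})=(1+W(P_1;P_2))^n\); the stated inequality then follows from Bernoulli's inequality.

```python
from itertools import product
from math import prod


def w_divergence(p1, p2):
    """W(P1; P2) from (1) for discrete distributions on the same finite set."""
    return sum((1.0 - b / a) ** 2 * a for a, b in zip(p1, p2) if a > 0)


def product_measure(p, n):
    """Probabilities of the n-fold direct product, one entry per n-tuple of points."""
    return [prod(combo) for combo in product(p, repeat=n)]


def coarsen(p, blocks):
    """Restriction of p to the sub-sigma-algebra generated by a partition."""
    return [sum(p[i] for i in block) for block in blocks]


# Hypothetical strictly positive distributions on a three-point space.
p1 = [0.2, 0.3, 0.5]
p2 = [0.4, 0.4, 0.2]
n = 3

w = w_divergence(p1, p2)
w_n = w_divergence(product_measure(p1, n), product_measure(p2, n))
print("W(P1^(n);P2^(n)) =", w_n, " n*W =", n * w, " (1+W)^n - 1 =", (1 + w) ** n - 1)

# Merge the first two points into one atom of the subalgebra.
blocks = [[0, 1], [2]]
w_coarse = w_divergence(coarsen(p1, blocks), coarsen(p2, blocks))
print("W on the subalgebra =", w_coarse, " W on the full algebra =", w)
```

Both tuple lists are enumerated in the same order, so the two product measures are compared point by point.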
§ 2. Parametric families
Let \(P=\{p(x|\theta);\ \theta\in\Theta\}\) be a family of distributions on \(\{X,\mathfrak A\}\), specified with respect to some measure \(\mu\) by densities \(p(x|\theta)\) depending on the parameter \(\theta\). The parameter set \(\Theta\) is assumed to be a finite or infinite interval of the line.
Put
\[ W(\theta;\theta+\Delta\theta)=\frac{1}{(\Delta\theta)^2} \int_{\{p(x|\theta)>0\}} \left[1-\frac{p(x|\theta+\Delta\theta)}{p(x|\theta)}\right]^2 p(x|\theta)\,d\mu, \tag{3} \]
\[ W(\theta)=\liminf_{\Delta\theta\to0} W(\theta;\theta+\Delta\theta). \tag{4} \]
\(W(\theta)\) serves as an analogue of Fisher’s information quantity \(I(\theta)\) in those cases where \(I(\theta)\) does not exist. For sufficiently smooth families \(W(\theta)=I(\theta)\). If the family \(P\) is assumed homogeneous (i.e., all distributions belonging to it are mutually absolutely continuous), then the following properties of \(W(\theta)\) can be established:
- If
\[ W^{(n)}(\theta;\theta+\Delta\theta) =\frac{1}{(\Delta\theta)^2} \int_X\cdots\int_X \left[ 1-\frac{p(x_1|\theta+\Delta\theta)\cdots p(x_n|\theta+\Delta\theta)} {p(x_1|\theta)\cdots p(x_n|\theta)} \right]^2 p(x_1|\theta)\cdots p(x_n|\theta)\,d\mu(x_1)\cdots d\mu(x_n), \tag{5} \]
\[ W^{(n)}(\theta)=\liminf_{\Delta\theta\to0} W^{(n)}(\theta;\theta+\Delta\theta), \tag{6} \]
then
\[ W^{(n)}(\theta)=nW(\theta). \tag{7} \]
- Let \(\varphi(x)\) be an unbiased estimate of the parameter \(\theta\). Then the following analogue of the Rao–Cramér inequality holds:
\[ E(\varphi(x)-\theta)^2\geq \frac{1}{W(\theta)}. \tag{8} \]
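As an illustration, consider the Laplace location family \(p(x|\theta)=\tfrac12 e^{-|x-\theta|}\) with respect to Lebesgue measure. It is homogeneous, but its density is not differentiable in \(\theta\) at \(\theta=x\). This family is a hypothetical example chosen here and is not taken from the note. The sketch below approximates (3) by a midpoint rule on a truncated interval and uses a small \(\Delta\theta\) as a stand-in for the lower limit in (4). The \(n\)-sample quantity (5) is obtained from the single-sample one through the factorization \(1+(\Delta\theta)^2W^{(n)}(\theta;\theta+\Delta\theta)=\bigl(1+(\Delta\theta)^2W(\theta;\theta+\Delta\theta)\bigr)^n\), which holds for homogeneous families. All function names and numerical settings are assumptions.

```python
from math import exp


def laplace_pdf(x, theta):
    """Density of the Laplace location family with unit scale."""
    return 0.5 * exp(-abs(x - theta))


def w_ratio(theta, delta, lo=-40.0, hi=40.0, steps=80_000):
    """Midpoint-rule approximation of W(theta; theta + delta) from (3)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = theta + lo + (k + 0.5) * h
        p = laplace_pdf(x, theta)
        q = laplace_pdf(x, theta + delta)
        total += (1.0 - q / p) ** 2 * p
    return total * h / delta ** 2


def w_ratio_n(theta, delta, n):
    """W^(n)(theta; theta + delta) from (5), via the product factorization."""
    one_plus = 1.0 + delta ** 2 * w_ratio(theta, delta)
    return (one_plus ** n - 1.0) / delta ** 2


theta = 0.0
for delta in (0.1, 0.01):
    print("delta =", delta,
          " W =", w_ratio(theta, delta),
          " W^(5) =", w_ratio_n(theta, delta, 5))

# A single observation x is an unbiased estimate of theta with variance 2,
# which can be compared with the lower bound 1 / W from (8).
variance_of_x = 2.0
print("variance =", variance_of_x, " 1/W ~", 1.0 / w_ratio(theta, 0.01))
```

In this sketch the single-sample values approach 1 as \(\Delta\theta\) shrinks, and the five-sample values approach 5, in line with (7).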
§ 3. Suppose now that the parameter set \(\Theta\) is an \(s\)-dimensional parallelepiped; \(\theta=(\theta_1,\ldots,\theta_s)\). Put
\[ W_{ij}(\theta)=\liminf_{|\Delta\theta|\to0} \frac{1}{\Delta\theta_i\,\Delta\theta_j} \int_X \left[1-\frac{p(x|\theta+\Delta\theta_i)}{p(x|\theta)}\right] \left[1-\frac{p(x|\theta+\Delta\theta_j)}{p(x|\theta)}\right] p(x|\theta)\,d\mu(x), \tag{9} \]
\[ W(\theta)=\|W_{ij}(\theta)\|_{i,j=1,\ldots,s}. \tag{10} \]
If \(B(\theta)\) is the covariance matrix of an unbiased estimate \(\varphi(x)\) of the parameter \(\theta\), and \(W^{-1}(\theta)\) exists, then, in the well-known sense (the difference is a nonnegative definite matrix),
\[ B(\theta)-W^{-1}(\theta)\geq 0. \tag{11} \]
The proof is carried out by the method of \((^2)\).
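A numerical sketch of the matrix (10) for a smooth two-parameter family may help fix the notation. It assumes that \(\theta+\Delta\theta_i\) in (9) denotes an increment of the \(i\)-th coordinate only, takes equal increments in both coordinates, and uses the normal family with \(\theta=(\text{mean},\ \text{standard deviation})\) as a hypothetical example. A small increment stands in for the lower limit, and the integral is approximated by a midpoint rule on a truncated interval. For this family the Fisher information matrix at mean 0 and standard deviation 1 is diagonal with entries 1 and 2, so the computed matrix should be close to it.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))


def w_matrix(mean, sd, delta, half_width=12.0, steps=24_000):
    """Approximate the 2x2 matrix (W_ij) of (9) for theta = (mean, sd)."""
    # Parameter points shifted by delta in one coordinate at a time.
    shifted = [(mean + delta, sd), (mean, sd + delta)]
    h = 2.0 * half_width * sd / steps
    sums = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        x = mean - half_width * sd + (k + 0.5) * h
        p = normal_pdf(x, mean, sd)
        dev = [1.0 - normal_pdf(x, m, s) / p for m, s in shifted]
        for i in range(2):
            for j in range(2):
                sums[i][j] += dev[i] * dev[j] * p
    return [[sums[i][j] * h / delta ** 2 for j in range(2)] for i in range(2)]


W = w_matrix(mean=0.0, sd=1.0, delta=0.01)
for row in W:
    print(row)
```

The off-diagonal entries are small but not exactly zero at a finite increment; they shrink as the increment decreases.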
- Let \(\Theta\) be an open subset of a normed space, and let the unbiased estimate \(\varphi(x)\) of the parameter \(\theta\) be Bochner-integrable \((^5)\) with respect to the measures \(p(x|\theta)\,d\mu\). Put
\[ W(\theta)=\liminf_{\|\Delta\theta\|\to0} \frac{1}{\|\Delta\theta\|^2} \int_X \left[1-\frac{p(x|\theta+\Delta\theta)}{p(x|\theta)}\right]^2 p(x|\theta)\,d\mu. \tag{12} \]
Then
\[ E\|\varphi-\theta\|^2\geq \frac{1}{W(\theta)}. \tag{13} \]
Received 2 February 1963
References Cited
1. H. Cramér, Mathematical Methods of Statistics, IL, 1948.
2. O. V. Shalaevsky, Theory of Probability and Its Applications, 6, 3 (1961).
3. S. Kullback, R. Leibler, Ann. Math. Statistics, 22, 1 (1951).
4. P. Halmos, L. Savage, Ann. Math. Statistics, 20, 1 (1949).
5. E. Hille, Functional Analysis and Semigroups, IL, 1951.