Maximum correlation coefficient (asymmetric case)
O. V. SARMANOV
Submitted 1958-01-01 | SovietRxiv: ru-195801.91874 | Translated from Russian

Abstract

This paper extends the notion of a maximal correlation coefficient to the nonsymmetric case of two random variables with a joint density on a rectangular, possibly infinite, domain. Using Hilbert-Schmidt theory, it associates the nonsymmetric density with two symmetric kernels having the same eigenvalue spectrum and defines maximal correlation through the leading nontrivial eigenvalue and corresponding eigenfunctions. The paper gives a successive approximation procedure for computing these eigenfunctions and the coefficient, proves that vanishing maximal correlation is equivalent to independence, and shows that in the rectilinear case it coincides with the ordinary correlation coefficient. An analogous construction and iterative computation are also formulated for discrete random variables given by a rectangular correlation table.

Full Text

Reports of the Academy of Sciences of the USSR
1958. Volume 121, No. 1

MATHEMATICS

O. V. SARMANOV

THE MAXIMAL CORRELATION COEFFICIENT

(THE NONSYMMETRIC CASE)

(Presented by Academician S. N. Bernstein on 21 II 1958)

1. Let \(F(x,y)\) be the density of a distribution defining the correlation between the random variables \(x\) and \(y\) in the rectangular domain
\[ \Omega=[a\leq x\leq b;\ a_1\leq y\leq b_1], \]
which may also be infinite. By
\[ p(x)=\int_{a_1}^{b_1} F(x,y)\,dy,\qquad P(y)=\int_a^b F(x,y)\,dx \]
we denote the a priori (marginal) densities of \(x\) and \(y\), respectively, and suppose that the square of the kernel
\[ K(x,y)=\frac{F(x,y)}{\sqrt{p(x)P(y)}} \]
is integrable in both variables.

The nonsymmetric density \(F(x,y)\) defines two symmetric densities
\[ F_1(x,y)=\int_{a_1}^{b_1}\frac{F(x,t)F(y,t)}{P(t)}\,dt;\qquad F_2(x,y)=\int_a^b\frac{F(t,x)F(t,y)}{p(t)}\,dt \tag{1} \]
and two symmetric kernels
\[ K_1(x,y)=\frac{F_1(x,y)}{\sqrt{p(x)p(y)}};\qquad K_2(x,y)=\frac{F_2(x,y)}{\sqrt{P(x)P(y)}}. \tag{2} \]

The kernels (2) are positive and have identical spectra of eigenvalues
\[ 1<\lambda_1^2\leq \lambda_2^2\leq \cdots \leq \lambda_k^2\leq \cdots \tag{3} \]
and, generally speaking, different spectra of eigenfunctions
\[ \{\varphi_i(x)\},\qquad \{\psi_i(x)\},\qquad i=1,2,\ldots \tag{4} \]
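As a brief check of the statement about the common spectrum (a sketch, assuming that the order of integration may be interchanged), substituting (1) into (2) expresses both symmetric kernels through \(K\):
\[ K_1(x,y)=\int_{a_1}^{b_1}K(x,t)K(y,t)\,dt,\qquad K_2(x,y)=\int_a^b K(t,x)K(t,y)\,dt. \]
Thus \(K_1\) and \(K_2\) are the two iterated kernels of \(K\), which is why their eigenvalues coincide. In the same way one sees that \(\sqrt{p(x)}\) is an eigenfunction of \(K_1\) belonging to the eigenvalue 1:
\[ \int_a^b K_1(x,y)\sqrt{p(y)}\,dy=\frac{1}{\sqrt{p(x)}}\int_{a_1}^{b_1}\frac{F(x,t)}{P(t)}\left[\int_a^b F(y,t)\,dy\right]dt=\frac{1}{\sqrt{p(x)}}\int_{a_1}^{b_1}F(x,t)\,dt=\sqrt{p(x)}. \]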

According to the Hilbert–Schmidt theory of eigenfunctions, the spectrum of the kernel \(K(x,y)\) has the form
\[ \sqrt{p(x)},\quad \sqrt{P(y)},\quad \sqrt{p(x)}\,\varphi_i(x),\quad \sqrt{P(y)}\,\psi_i(y),\qquad i=1,2,\ldots, \tag{5} \]
where the bilinear expansion
\[ K(x,y)\sim \sqrt{p(x)}\,\sqrt{P(y)} +\sum_{i=1}^{\infty} \frac{\varphi_i(x)\sqrt{p(x)}\,\psi_i(y)\sqrt{P(y)}}{\lambda_i} \tag{6} \]
converges in the mean to \(K(x,y)\) in the domain \(\Omega\).

2. Definition. We shall call \(R^*=\dfrac{1}{\lambda_1}\) the maximal (in absolute value) correlation coefficient corresponding to the density \(F(x,y)\). This definition is analogous to the concept of the maximal correlation coefficient for a symmetric density, introduced in \((^1)\).

In particular, \(R^{*2}=\dfrac{1}{\lambda_1^2}\) is the maximal correlation coefficient for the symmetric densities (1).

3. To compute the maximal correlation coefficient, one may recommend the following process of successive approximations, whose convergence is proved, for example, in \((^2)\).

As the “zero approximation” \(r_0(y)\) one may take any function having variance (without loss of generality, we shall assume that the mean value of \(r_0(y)\) is equal to zero). Put

\[ r_{2k+1}(x)=\int_{a_1}^{b_1} r_{2k}(y)\frac{F(x,y)}{p(x)}\,dy,\qquad r_{2k+2}(y)=\int_a^b r_{2k+1}(x)\frac{F(x,y)}{P(y)}\,dx,\qquad k=0,1,2,\ldots \tag{7} \]

Then, as follows from \((^2)\), the first pair of eigenfunctions, to within the normalizing factors \(e_1\) and \(g_1\), is determined by the equalities

\[ e_1\psi_1(y)=\lim_{k\to\infty} r_{2k}(y)\lambda_1^{2k},\qquad g_1\varphi_1(x)=\lim_{k\to\infty} r_{2k+1}(x)\lambda_1^{2k}. \tag{8} \]

If \(k\) is sufficiently large, then

\[ R^{*2}=\frac{1}{\lambda_1^2}\simeq \frac{r_{2k}(y)}{r_{2k-2}(y)}\simeq \frac{r_{2k+1}(x)}{r_{2k-1}(x)}. \tag{9} \]

The sign of the correlation coefficient \(R^{*}\) between the functions found, \(\varphi_1(x)\) and \(\psi_1(y)\), is easily determined by direct computation.
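The mechanism behind (8) can be sketched as follows; this is an illustration, not the proof of \((^2)\), and it assumes that the \(\psi_i\) are normalized by \(\int_{a_1}^{b_1}\psi_i(y)\psi_j(y)P(y)\,dy=\delta_{ij}\), that \(\lambda_1<\lambda_2\), and that the coefficient \(c_1\) below is nonzero. Multiplying (6) by \(\sqrt{p(x)P(y)}\) gives
\[ F(x,y)\sim p(x)P(y)\left[1+\sum_{i=1}^{\infty}\frac{\varphi_i(x)\psi_i(y)}{\lambda_i}\right]. \]
Writing \(c_i=\int_{a_1}^{b_1} r_0(y)\psi_i(y)P(y)\,dy\) and inserting this expansion into (7), one obtains for \(k\geq 1\)
\[ r_{2k}(y)=\sum_{i=1}^{\infty}\frac{c_i}{\lambda_i^{2k}}\,\psi_i(y),\qquad r_{2k+1}(x)=\sum_{i=1}^{\infty}\frac{c_i}{\lambda_i^{2k+1}}\,\varphi_i(x). \]
After multiplication by \(\lambda_1^{2k}\) every term with \(i>1\) tends to zero, so that \(e_1=c_1\) and \(g_1=c_1/\lambda_1\) in (8), and the ratio of two successive approximations of the same parity tends to \(1/\lambda_1^2\), the square of \(R^{*}=1/\lambda_1\).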

4. The introduction of the concept of the maximal correlation coefficient is justified by the following theorems.

Theorem 1. For the independence of random variables \(x,y\), it is necessary and sufficient that the maximal correlation coefficient vanish.

Proof. Necessity is obvious, since if \(x\) and \(y\) are independent, \(\varphi_1(x)\) and \(\psi_1(y)\) are also independent, and the correlation coefficient between them is equal to zero, i.e. \(R^{*}=0\).

Now let \(R^{*}=\dfrac{1}{\lambda_1}=0\); then, according to (5), the kernel \(K(x,y)\) has no eigenfunctions except \(1\cdot\sqrt{p(x)}\) and \(1\cdot\sqrt{P(y)}\), and from the mean-square convergence of the bilinear expansion (6) it follows that

\[ \int_a^b\int_{a_1}^{b_1} \left[ \frac{F(x,y)}{\sqrt{p(x)P(y)}}-\sqrt{p(x)P(y)} \right]^2 dx\,dy=0, \]

i.e.

\[ F(x,y)=p(x)P(y) \tag{10} \]

for almost all \(x\) and \(y\) in the domain \(\Omega\), as was to be proved.

Theorem 2. If the correlation is rectilinear, then the ordinary correlation coefficient \(R\) between \(x\) and \(y\) coincides with the maximal correlation coefficient \(R^{*}\).

Proof. In this case, in the spectrum (5) there is a pair of linear and, consequently, monotone eigenfunctions,

\[ \varphi_1(x)=\frac{x-c}{\sigma},\qquad \psi_1(y)=\frac{y-c_1}{\sigma_1}, \tag{11} \]

where \(c\) and \(c_1\) are the means, and \(\sigma^2\) and \(\sigma_1^2\) are the variances of \(x\) and \(y\), respectively.

It is known (see, for example, \((^3)\)) that if in the spectrum of an asymmetric stochastic kernel there is a pair of monotone functions, then they always belong to the first eigenvalue; therefore the correlation coefficient between \(x\) and \(y\), equal to the correlation coefficient between their linear functions \(\dfrac{x-c}{\sigma}\) and \(\dfrac{y-c_1}{\sigma_1}\), coincides with the maximal correlation coefficient, as was to be proved.
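That the functions (11) do form an eigen-pair can be checked directly. The following sketch assumes that "rectilinear" means that both regressions are linear, i.e. \(E(y\mid x)=c_1+R\,\dfrac{\sigma_1}{\sigma}(x-c)\) and \(E(x\mid y)=c+R\,\dfrac{\sigma}{\sigma_1}(y-c_1)\). Applying the two integral transformations of (7) to the functions (11) then gives
\[ \int_{a_1}^{b_1}\psi_1(y)\frac{F(x,y)}{p(x)}\,dy=\frac{E(y\mid x)-c_1}{\sigma_1}=R\,\varphi_1(x),\qquad \int_a^b\varphi_1(x)\frac{F(x,y)}{P(y)}\,dx=\frac{E(x\mid y)-c}{\sigma}=R\,\psi_1(y), \]
so that each transformation reproduces the other function of the pair multiplied by \(R\), and the corresponding eigenvalue satisfies \(1/\lambda=R\).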

5. For discrete random variables the maximal correlation coefficient is defined analogously.

Let the correlation dependence between discrete random variables be defined by the rectangular matrix

\[ \{p_{ij}\}, \qquad i=1,2,\ldots,n, \qquad j=1,2,\ldots,m, \tag{12} \]

where

\[ 0 \le p_{ij}=P\{x=x_i;\ y=y_j\}, \qquad \sum_{ij} p_{ij}=1, \]

\[ p_i=\sum_{j=1}^{m} p_{ij}=P\{x=x_i\}, \qquad P_j=\sum_{i=1}^{n} p_{ij}=P\{y=y_j\}. \tag{13} \]

With the aid of (12) we form two square symmetric matrices

\[ \left\{\frac{p_{ij}^{(1)}}{\sqrt{p_i p_j}}\right\}, \qquad i,j=1,2,\ldots,n; \qquad \left\{\frac{p_{ij}^{(2)}}{\sqrt{P_i P_j}}\right\}, \qquad i,j=1,2,\ldots,m, \tag{14} \]

where

\[ p_{ij}^{(1)}=\sum_{k=1}^{m}\frac{p_{ik}p_{jk}}{P_k}, \qquad p_{ij}^{(2)}=\sum_{k=1}^{n}\frac{p_{ki}p_{kj}}{p_k}. \tag{15} \]

Then the correlation coefficient between the first eigenvectors of the matrices (14) is called the maximal correlation coefficient \(R^{*}\) between random variables with correlation table (12).

Note that with a matrix (12) having \(n\) rows and \(m\) columns one can associate infinitely many pairs of vectors \(X\{x_1, x_2,\ldots,x_n\}\) and \(Y\{y_1, y_2,\ldots,y_m\}\), and for each pair one can compute the correlation coefficient. If, however, one takes the pair of first eigenvectors of the matrices (14), then the correlation coefficient between them is maximal in absolute value. The square of the maximal correlation coefficient is the first eigenvalue of both matrices (14).
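The construction (12)-(15) can be tried numerically. The sketch below is an illustration in Python with NumPy, not part of the original note: the function name `maximal_correlation` and the sample table are hypothetical, and all marginal probabilities are assumed positive. It uses the matrix convention for eigenvalues, in which both matrices (14) have the trivial eigenvalue 1 and the next largest eigenvalue equals \(R^{*2}=1/\lambda_1^2\). It also uses the fact that both matrices (14) are products of the single matrix \(p_{ij}/\sqrt{p_i P_j}\) with its transpose, so that \(R^{*}\) is the second singular value of that matrix.

```python
import numpy as np

def maximal_correlation(table):
    """Return (R*, mu1, mu2) for a correlation table {p_ij}.

    mu1 and mu2 are the second-largest eigenvalues (matrix convention)
    of the first and second matrix in (14); both should equal R*^2.
    Assumes every row sum p_i and column sum P_j is positive.
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                       # enforce sum_ij p_ij = 1
    p_row = p.sum(axis=1)                 # p_i, formula (13)
    p_col = p.sum(axis=0)                 # P_j, formula (13)
    q = p / np.sqrt(np.outer(p_row, p_col))
    a1 = q @ q.T                          # n x n matrix of (14)
    a2 = q.T @ q                          # m x m matrix of (14)
    mu1 = np.sort(np.linalg.eigvalsh(a1))[::-1][1]
    mu2 = np.sort(np.linalg.eigvalsh(a2))[::-1][1]
    r_star = np.linalg.svd(q, compute_uv=False)[1]
    return r_star, mu1, mu2

# Hypothetical 2 x 3 correlation table (entries sum to 1).
table = [[0.10, 0.20, 0.10],
         [0.30, 0.10, 0.20]]
r_star, mu1, mu2 = maximal_correlation(table)
print(r_star, mu1, mu2)

# For an independent table p_ij = p_i P_j the coefficient vanishes (Theorem 1).
independent = np.outer([0.3, 0.7], [0.2, 0.5, 0.3])
print(maximal_correlation(independent)[0])
```

The singular-value route is a convenience of the sketch; the definition in the text goes through the eigenvectors of the matrices (14).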

The process of successive approximations for finding \(R^{*}\) is analogous to that described in Sec. 3.

Let \(r_0(y)\) be an arbitrary vector with coordinates \(\{y_1^{(0)}, y_2^{(0)},\ldots,y_m^{(0)}\}\) such that

\[ \sum_{j=1}^{m} y_j^{(0)} P_j=0. \]

Put

\[ x_i^{(2k+1)}=\sum_{j=1}^{m}\frac{p_{ij}}{p_i}y_j^{(2k)}, \qquad i=1,2,\ldots,n, \qquad k=0,1,2,\ldots, \]

\[ y_j^{(2k+2)}=\sum_{i=1}^{n}\frac{p_{ij}}{P_j}x_i^{(2k+1)}, \qquad j=1,2,\ldots,m, \qquad k=0,1,2,\ldots. \]

If \(k\) is sufficiently large, then

\[ R^{*2}=\frac{1}{\lambda_1^2}\approx \frac{y_j^{(2k)}}{y_j^{(2k-2)}}\approx \frac{x_i^{(2k+1)}}{x_i^{(2k-1)}} \]

for all \(i\) and \(j\).

The coordinates of the first eigenvectors of the matrices (14) are determined from the conditions

\[ \begin{aligned} \xi_i &= \lim_{k \to \infty} x_i^{(2k+1)} \lambda_1^{2k}, \qquad i = 1,2,\ldots,n; \\ \eta_j &= \lim_{k \to \infty} y_j^{(2k)} \lambda_1^{2k}, \qquad j = 1,2,\ldots,m . \end{aligned} \tag{16} \]

V. A. Steklov Mathematical Institute
Academy of Sciences of the USSR

Received 20 II 1958

References

¹ O. V. Sarmanov, DAN, 120, No. 4 (1958).

² O. V. Sarmanov, DAN, 53, No. 9 (1946).

³ M. K. Nomokonov, DAN, 72, No. 6 (1950).
