Abstract Generated abstract
This paper analyzes parallel decoding for discrete stationary channels with independent noise as a way to reduce the computational burden of optimal decoding, whose complexity grows exponentially with block length. The method divides the received word into subwords, enumerates candidate input subwords simultaneously, and applies a sequence of threshold decisions based on likelihood-related statistics. Asymptotic estimates are derived for the mean number of computations, threshold values, subblock lengths, and admissible parameter ranges, with comparisons to ordinary block decoding and the single-stage case. The results indicate that, especially for high noise levels and rates below capacity, increasing the number of parallel parts can substantially lower the exponential order of decoding complexity while retaining a prescribed probability of correct decoding.
Full Text
CYBERNETICS AND CONTROL THEORY
B. S. FLEISHMAN
PARALLEL DECODING
(Presented by Academician V. M. Glushkov, 25 I 1965)
Consider a discrete stationary channel with independent noise, defined by the matrix of transition probabilities \(\mathbf p=\|p_i^j\|\) from input symbols \(i=1,\ldots,a\) to output symbols \(j=1,\ldots,b\). The practical implementation of the procedure of optimal, in the Shannon sense, coding and decoding for such a channel \((^{1-3})\) encounters difficulties connected with the exponential growth (with the length \(n\) of the input words) of the number \(N\) of computations in decoding (exponential difficulties).
Sequential decoding \((^{4-6})\), for transmission rate \(R\) substantially less than the channel capacity \(C\), \(0<R<R_{\mathrm{comp}}/2<C/2\), and for the special tree-like random coding case, leads to a power-law, as a function of \(n\), growth of the mean number of computations \(\bar N\).
In \((^7)\) a procedure was proposed for the simultaneous decoding of \(m>1\) parts of the received output word (parallel decoding). Under this procedure the growth of the mean number of computations \(\bar N_m\) is also exponential, but, for the same probability \(P\) of correct decoding, substantially slower than the growth of \(N\) in the interval \(0<R<C\), and substantially slower than the growth of \(\bar N\) in the interval \(R_{\mathrm{comp}}/2<R<C\) (with the exception of a small neighborhood of the point \(R_{\mathrm{comp}}/2\)).
In the present work a number of new estimates of the parameters of parallel decoding are given.
The procedure of parallel decoding is as follows. The received output word \(y=(j_1,\ldots,j_n)\) of length \(n\) is divided into \(m\) subwords \(y_1,\ldots,y_r,\ldots,y_m\) of lengths \(n_1^m,\ldots,n_r^m,\ldots,n_m^m\), respectively,
\[ \left(\sum_{r=1}^m n_r^m=n\right). \]
Analogously, the input word \(x=(i_1,\ldots,i_n)\) is divided into \(m\) subwords \(x_1,\ldots,x_r,\ldots,x_m\), each of which encodes one of \(N_r\approx \exp(n_r^m R)\) distinct highly probable and equiprobable source subwords of length \(n_r^m\ge n_0>5\) \((^2)\). Next, the procedure of simultaneous enumeration of the \(N_r\) possible values \(x_r\) \((r=1,\ldots,m)\) is carried out by the method indicated in \((^{3,8})\), in which the “composite” subwords \(x_r'=(x_1,\ldots,x_r)\) of length \(n_r'=n_1^m+\cdots+n_r^m\) are fed to each decision device. The decision to pass the composite subword \(x_r'\) on to the next decision device is made on the basis of comparing it with the output subword \(y_r'=(y_1,\ldots,y_r)\) and a one-threshold comparison (with threshold \(X_r^m\)) of the quantity
\[ l_r=\sum_{s=1}^r \sum_{ij}\frac{m_{ij}^s}{n_r'}\ln\frac{p_i^j}{q_j}, \]
where \(m_{ij}^s=m_{ij}(x_s,y_s)\) is the number of symbols \((i,j)\) in the corresponding positions of the subwords \(x_s\) and \(y_s\);
\[ q_j=\sum_i p_i p_i^j,\qquad p_i=\operatorname{Prob}(i),\qquad C=\max_{\{p_i\}} E_1, \]
\[ \sum_{ij}p_iq_j\ln\frac{p_i^j}{q_j}=E_0<X_r^m<E_1=\sum_{ij}p_ip_i^j\ln\frac{p_i^j}{q_j}\le -E_0. \]
Using the results of \((^{3,7})\), one can find, asymptotically for large values of \(n\), \(\bar N_m\) and the other parameters of the proposed procedure:
\[
\bar N_m=\frac12 m n_1^m \exp(n_1^m R), \qquad
n_1^m \approx \ln(1-P)^{-1}/k_1^1,
\]
\[
n \approx \ln(1-P)^{-1}/k_m^1;
\tag{1}
\]
\[
(k_1^0 k_2^0 \ldots k_m^0)^{1/m}
=
R(1+k_1^1/R)^{1/m},
\qquad
X_1^m \leq \ldots \leq X_m^m \leq X_1^1,
\tag{2}
\]
where \(k_r^0=k^0(\widetilde X_r^m-E_0)=k_{\mathscr P_0,D}(\widetilde X_r^m-E_0)\) and \(k_r^1=k^1(X_r^m-E_1)=k_{\mathscr P_1,D}(X_r^m-E_1)\) are the so-called \(k\)-functions \((^{2,3})\) with parameters
\[ \mathscr P_0=\|p_i q_j\|,\qquad \mathscr P_1=\|p_i p^j\|,\qquad D=\left\|\ln\frac{p_i^j}{q_j}\right\|, \]
\[ \widetilde X_1^m=X_1^m \quad\text{and}\quad \widetilde X_{r+1}^m = X_r^m+\frac{X_{r+1}^m-X_r^m}{k_r^1/k_{r+1}^1-1} \quad (r=1,\ldots,m-1). \]
Fig. 1
The threshold values \(X_r^m\) are uniquely determined by the system of recurrence relations:
\[ k_2^1=F_1(X_1^m)\equiv \frac{k_1^1}{1+k_1^0/R}, \]
\[ k_{r+1}^1=F_r(X_r^m)\equiv \frac{k_r^1}{1+k_r^0(1-k_r^1/k_{r-1}^1)/R} \tag{3} \]
\[ (r=2,\ldots,m-1), \]
\[ k_1^1=G(X_m^m)\equiv \frac{R}{(k_m^0/k_m^1)(1-k_m^1/k_{m-1}^1)-1}, \]
whose geometric interpretation is given in Fig. 1.
Next, from \(X_r^m\) and \(n\), the lengths \(n_r^m\) of the subblocks are determined from the relations
\[ n_1^m \approx n k_m^1\frac{1}{k_1^1}, \qquad n_{r+1}^m \approx n k_m^1\left(\frac{1}{k_{r+1}^1}-\frac{1}{k_r^1}\right) \quad (r=1,\ldots,m-1). \tag{4} \]
The quantities \(n\) and \(m\) satisfy the inequalities
\[ n \geq n_0\,\frac{k_1^1}{k_m^1}\,\frac{R}{k_1^0} > \frac{\ln(k_1^1/k_m^1)}{\ln(1+k_1^0/R)} \geq m-1 \tag{5} \]
for
\[ P \geq P_{\mathrm{cr}}=1-\exp(-n_0 k_1^1 R/k_1^0). \tag{6} \]
In the particular case \(m=1\), from (1) and (2) we obtain the known \((^3)\) relations
\[ \bar N_1=\frac12 n_1^1 \exp(n_1^1 R), \qquad n_1^1 \approx \ln(1-P)^{-1}/k_1^1; \tag{7} \]
\[ k_1^0=R+k_1^1. \tag{8} \]
Comparison of (1) and (2) with (7) and (8) shows that, for one and the same \(P\), the relations
\[ \ln \overline{N}_m / \ln \overline{N}_1 \approx n_1^m / n_1^1 \approx k_1^1 / k_1^m \leqslant 1. \tag{9} \]
hold.
Block decoding (1), based on the maximum-likelihood principle, leads to a number of computations \(N \approx \overline{N}_1\). In the range of high transmission rates \({}^{1}/{}_{2}R_{\text{comp}}<R<C\), it leads to a somewhat smaller, than \(n_1^1\), value of the word length \(n \approx \ln(1-P)^{-1}/k^1(X_R-E_1)\), where \(X_R<X_1^1\) satisfies the relation \(k^0(X_R-E_0)=R\) (see Fig. 1).
If \(R<C\) is fixed, then as \(m\) increases there quickly occurs a limiting decrease in the exponential order of growth of \(\overline{N}_{\infty}\):
\[ \ln \overline{N}_m / \ln \overline{N}_1 \approx k^1(X_1^1-C)/(-E_0)\leqslant k^1(X_1^1-C)/C, \tag{10} \]
which is practically attained already at \(m=10\).
If \(m\) is fixed, then as \(R\to C\), in the limiting case we have from relations (2) \(X_1^m=\ldots=X_m^m=X_1^1=C\) and \(\ln\overline{N}_m/\ln\overline{N}_1=1\), i.e., the effectiveness of parallel decoding is leveled out.
For the most difficult case of a high noise level, determined by an arbitrary matrix \(\mathbf p\), for which \(C\to0\), we have
\[ k_r^1\approx{}^{1}/{}_{4}C(1-\xi_r^m)^2,\qquad k_r^0\approx{}^{1}/{}_{4}C(1+\xi_r^m),\qquad \xi_1^1\approx\eta,\qquad -E_0\approx E_1=C, \]
where \(\xi_r^m=X_r^m/C\) and \(\eta=R/C\). In this case condition (6) is always satisfied, and it can be shown that, however small the number \(\varepsilon>0\), there will always be a sufficiently large number \(m<\dfrac{8}{\varepsilon^3}\ln\dfrac{2}{1-\eta}\) such that
\[ \overline{N}_m={}^{1}/{}_{2}mn_1^m\exp\bigl[(\eta+\varepsilon)\ln(1-P)^{-1}\bigr]\qquad (n_1^m\approx \ln(1-P)^{-1}/C), \tag{11} \]
whereas
\[ \overline{N}_1=\frac{1}{2}n_1^1\exp\left[\frac{4\eta}{(1-\eta)^2}\ln(1-P)^{-1}\right]\qquad \left(n_1^1\approx\frac{4\ln(1-P)^{-1}}{C(1-\eta)^2}\right). \tag{12} \]
Putting \(P=1-10^{-5/\eta}\), from (11) and (12) we obtain
\[ \overline{N}_m\approx{}^{1}/{}_{2}mn_1^m10^5,\qquad n_1^m\approx 5/C\eta,\qquad n\approx 20/C\eta(1-\xi_m^m)^2, \]
\[ \overline{N}_1\approx{}^{1}/{}_{2}n_1^1 10^{20/(1-\eta)^2},\qquad n_1^1\approx 20/C\eta(1-\eta)^2. \]
It follows from this that \(\overline{N}_1\) lies beyond the limits of feasibility, whereas \(\overline{N}_m\) is feasible with the aid of an electronic computer and is acceptable for a number of cases of receiving signals against a background of noise, such as, for example, when reception and decoding may take place at different times.
Preliminary experiments have shown that the total machine time necessary for carrying out parallel decoding on an electronic computer does not exceed realistic limits.
Institute of Radio Engineering and Electronics
Academy of Sciences of the USSR
Received
20 I 1965
CITED LITERATURE
- C. E. Shannon, Information and Control, 1, 1, 6 (1957).
- R. M. Fano, Transmission of Information, N. Y., 1961.
- B. S. Fleishman, Constructive Methods of Optimal Coding for Channels with Noise, Publishing House of the Academy of Sciences of the USSR, 1963.
- J. M. Wozencraft, B. Reiffen, Sequential Decoding, N. Y., 1961.
- I. Ziv, IEEE Trans. on Inform. Theory, 9, No. 2, 64 (1963).
- R. M. Fano, ibid., p. 64.
- B. S. Fleishman, Radiotekhnika i elektronika, 10, 5, 817 (1965).
- B. S. Fleishman, Izv. AN SSSR, Technical Cybernetics, No. 3, 28 (1963).