Some Problems on the Behavior of Finite Automata
================================
Submitted 1961-01-01 | SovietRxiv: ru-196101.86536 | Translated from Russian

Abstract Generated abstract

This paper studies finite automata operating in random environments that penalize actions with specified probabilities, formulating their behavior as Markov chains under ergodicity assumptions. It analyzes automata with a linear tactic, deriving the expected penalty and showing that in a stationary environment this expectation decreases with memory capacity and approaches the minimum attainable penalty probability. The paper then extends the model to a compound environment whose probabilistic state changes according to a two-state Markov chain, deriving transition probabilities and an explicit expression for the expected penalty in a symmetric case. In such time-varying environments, the expected penalty is shown to attain a minimum at a finite memory capacity, suggesting an optimal balance between insufficient memory and excessive averaging of changing environmental statistics.

Full Text

CYBERNETICS AND CONTROL THEORY

M. L. TSETLIN

SOME PROBLEMS ON THE BEHAVIOR OF FINITE AUTOMATA

(Presented by Academician I. G. Petrovskii, 22 III 1961)

The note is devoted to the study of the behavior of finite automata in environments that react to their actions in a random manner. Under the assumption that the environment “penalizes” the automaton with a probability determined for each of its actions, it will be shown that the functioning of the automaton is then described by a Markov chain. An example is given of an automaton (an automaton with a linear tactic) for which the mathematical expectation of the penalty decreases as the memory capacity increases and in the limit coincides with the smallest possible in the given environment. Next, the behavior of an automaton with a linear tactic is studied in an environment whose probabilistic properties depend on time as determined by a Markov chain. It is shown that in this case the mathematical expectation of the penalty attains a minimum at some fixed memory capacity and increases with any further increase of it.

  1. Suppose that the input variable \(s(t)\) \((t = 1, 2, \ldots)\) of a finite automaton \(\mathfrak A\) can take only two values, 0 and 1. We shall call the value \(s = 1\) a penalty, and the value \(s = 0\) a non-penalty. Suppose further that the automaton can have \(m\) states \(\varphi_1, \ldots, \varphi_m\). We shall call the number \(m\) the memory capacity of the automaton. The state of the automaton at time \(t\) will be denoted by \(\varphi(t)\).

The automaton is specified by a matrix \(A(s) = \|a_{ik}(s)\|\) \((i, k = 1, \ldots, m)\) of the states of the automaton. The matrix \(A(s)\) determines the transitions of states: if at time \(t\) the automaton \(\mathfrak A\) was in state \(\varphi_i\), then at time \(t + 1\) the automaton will pass into that state \(\varphi_k\) for which \(a_{ik}(s(t + 1)) = 1\). Each row of the matrix contains, for each fixed value of \(s\), exactly one element equal to one, while the remaining elements are zeros. Such matrices in \((^{1,2})\) were called simple.

From the matrix \(A(s)\) of the states of the automaton, for each \(s\) \((s = 0, 1)\) one can construct a state graph. In doing so, to each state \(\varphi_i\) of the automaton there is put in correspondence a vertex \(i\) of the graph, and to each nonzero matrix element \(a_{ik}(s)\), an arrow directed from vertex \(i\) to vertex \(k\). Owing to the simplicity of the matrix \(A(s)\), exactly one arrow issues from each vertex of the state graph. From the state graphs one can uniquely reconstruct the state matrix \(A(s)\).

Suppose further that the output variable \(f(t)\) of the automaton can take only two values, 0 and 1, depending on its state \(\varphi(t)\):

\[ f(t) = f[\varphi(t)]^*. \]

In this case we shall say that the automaton performs one of two actions, 0 and 1.

  1. Let us now turn to the study of the functioning of automata in environments that react to their actions in a random manner.

We shall say that an automaton functions in a stationary random environment \(C = C(p_0, p_1)\) if the values of the input variable and the actions of the automaton are related in such a way that the action \(f\) \((f = 0, 1)\), performed by the automaton at time \(t\), gives rise at time \(t + 1\) to the value \(s = 1\)

* This equality and the state matrix completely determine the automaton. In particular, from them one can reconstruct the canonical equations of the automaton (for more detail see \((^{1,2})\)).

(penalty) with probability \(p_f\), and the value \(s = 0\) (no penalty) with probability \(q_f = 1 - p_f\).

Let us set, on the basis of natural symmetry considerations, \(m = 2n\) and

\[ f(\varphi_j)=0 \quad \text{for } j=1,\ldots,n; \qquad f(\varphi_j)=1 \quad \text{for } j=n+1,\ldots,2n . \tag{1} \]

Then the probability \(p_{ik}\) of transition of the automaton from the \(i\)-th state to the \(k\)-th is determined by the formula

\[ p_{ik} = \begin{cases} p_0 a_{ik}(1) + q_0 a_{ik}(0), & \text{for } i=1,\ldots,n;\\ p_1 a_{ik}(1) + q_1 a_{ik}(0), & \text{for } i=n+1,\ldots,2n. \end{cases} \tag{2} \]

From the simplicity of the matrix \(A(s)\) it follows that the matrix \(P=\|p_{ik}\|\) is stochastic, so that the functioning of the automaton in a stationary random environment is described by a stationary Markov chain. Without any essential restriction of generality one may suppose that this chain is ergodic, i.e., that there exist limiting probabilities of the automaton states independent of its initial state.

Denote by \(r_j\) the limiting probability of the state \(\varphi_j\). Then the mathematical expectation \(M = M(\mathfrak A, C)\) of the penalty for the automaton \(\mathfrak A\) in the environment \(C\) is expressed by the formula

\[ M(\mathfrak A, C)=p_0 \sum_{j=1}^{n} r_j + p_1 \sum_{j=n+1}^{2n} r_j . \tag{3} \]

Let further \(M_{\max}=\max(p_0,p_1)\), \(M_0=\tfrac12(p_0+p_1)\), \(M_{\min}=\min(p_0,p_1)\). Then, obviously, \(M_{\min} \leq M \leq M_{\max}\). The expediency of the automaton’s behavior can be defined as the closeness of the quantity \(M\) to \(M_{\min}\); an automaton for which \(M=M_0\) may naturally be called an automaton not possessing expedient behavior.

Let us give examples of automata with expedient behavior. Let the automaton \(L_1\) have two states \(\varphi_1\) and \(\varphi_2\), \(f(\varphi_1)=0\), \(f(\varphi_2)=1\). The state matrix has the form

\[ A(1)= \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \qquad A(0)= \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}, \]

i.e., the states of the automaton change under a penalty and remain unchanged when there is no penalty. Then, in accordance with (2), we have

\[ P= \begin{pmatrix} q_0 & p_0\\ p_1 & q_1 \end{pmatrix}, \qquad r_1=\frac{p_1}{p_0+p_1}, \quad r_2=\frac{p_0}{p_0+p_1}, \]

and, according to (3),

\[ M(L_1,C)=\frac{2p_0p_1}{p_0+p_1}. \]

It is not difficult to verify that \(M \leq M_0\) (equality occurs only when \(p_0=p_1\)).

Let us now consider the automaton \(L_n\) (an automaton with a linear tactic*). This automaton has \(2n\) states; its state graph is shown in Fig. 1. It is associated with a state matrix whose elements \(a_{ik}(s)\) are determined in the following way:

\[ a_{ik}(0)=1 \quad \text{for } i=2,3,\ldots,n,n+2,\ldots,2n \text{ and } k=i-1; \]

\[ a_{11}(0)=a_{n+1,n+1}(0)=1; \]

\[ a_{ik}(1)=1 \quad \text{for } i=1,2,\ldots,n-1,n+1,\ldots,2n-1 \text{ and } k=i+1; \]

\[ a_{n,2n}(1)=a_{2n,n}(1)=1, \]

the remaining elements are zero. The actions of the automaton are specified by formulas (1).**

* This automaton is a natural generalization of \(L_1\). The term “linear” is connected with the form of the graph.

** Models of automata with a linear tactic were constructed and experimentally studied by V. L. Buĭlov.

From the matrix \(A(s)\) and formulas (2) it is not difficult to find the matrix \(P=\|p_{ik}\|\) of transition probabilities and to compute the limiting probabilities \(r_j\) \((j=1,\ldots,2n)\) of the states of the automaton. For the mathematical expectation of the penalty \(M(L_n,C)\) we obtain:

\[ M(L_n,C)= \frac{ p_0p_1^n \dfrac{p_0^n-q_0^n}{p_0-q_0} + p_1p_0^n \dfrac{p_1^n-q_1^n}{p_1-q_1} }{ p_1^n \dfrac{p_0^n-q_0^n}{p_0-q_0} + p_0^n \dfrac{p_1^n-q_1^n}{p_1-q_1} }. \tag{4} \]

It is important to note that \(M(L_n,C)\) is a decreasing function of the memory capacity \(n\) and that for \(M_{\min}\leq 1/2\)

\[ \lim_{n\to\infty} M(L_n,C)=M_{\min}. \tag{5} \]

One can construct an automaton analogous to the automaton with linear tactics, but having \(k\) actions \(f_1,\ldots,f_k\) and functioning in a stationary random environment \(C=C(p_1,\ldots,p_k)\), where \(p_i\) is the probability of a penalty for the action \(f_i\), and compute for it the mathematical expectation of the penalty, which also decreases as the memory capacity increases and in the limit coincides with \(M_{\min}\).

Fig. 1

  1. Let us now turn to the study of the behavior of automata in environments whose probabilistic properties are given by a Markov chain.

Consider the simplest “compound” environment \(K=K(C_1,C_2,\delta)\), specified by a Markov chain with two states \(C_1\) and \(C_2\) and with transition-probability matrix

\[ \Delta= \begin{pmatrix} 1-\delta & \delta\\ \delta & 1-\delta \end{pmatrix}, \quad \delta\leq 1/2. \]

We shall say that an automaton \(\mathfrak A\) interacts with the compound environment \(K\) if at each moment of time it functions in one of the environments \(C_1\) or \(C_2\), i.e., at each moment of time its actions and the values of the input variable are related as was described above for one of the environments. Moreover, if at moment \(t\) the automaton functioned in the environment \(C_\alpha=C(p_0^{(\alpha)},p_1^{(\alpha)})\), then at moment \(t+1\) it will, with probability \(1-\delta\), function in the same environment and, with probability \(\delta\), in the other one.

Fig. 2

We shall say that the system \(S\), automaton—compound environment, is at moment \(t\) in the state \(\psi_i^{(\alpha)}\) \((i=1,\ldots,2n;\ \alpha=1,2)\), if at this moment the automaton is in the state \(\varphi_i\), while the compound environment is in the state \(C_\alpha\). The probabilities of transitions of the states of the system are described by the block matrix \(\Pi=\|\pi_{ik}\|\), \(i,k=1,\ldots,2n\),

\[ \pi_{ik}= \begin{pmatrix} \bigl[a_{ik}(1)p_\beta^{(1)}+a_{ik}(0)q_\beta^{(1)}\bigr](1-\delta) & \bigl[a_{ik}(1)p_\beta^{(1)}+a_{ik}(0)q_\beta^{(1)}\bigr]\delta \\ \bigl[a_{ik}(1)p_\beta^{(2)}+a_{ik}(0)q_\beta^{(2)}\bigr]\delta & \bigl[a_{ik}(1)p_\beta^{(2)}+a_{ik}(0)q_\beta^{(2)}\bigr](1-\delta) \end{pmatrix} \tag{6} \]

where \(\beta=0\) for \(i=1,\ldots,n\) and \(\beta=1\) for \(i=n+1,\ldots,2n\).

Under the assumption that the system under consideration is ergodic, the limiting probabilities \(r_i^{(\alpha)}\) of the states \(\psi_i^{(\alpha)}\) of the system \(S\) do not depend on the initial-

states. Then the mathematical expectation of the penalty is expressed by the formula

\[ M = M(\mathfrak{A}, K)=p_0^{(1)}\sum_{i=1}^{n} r_i^{(1)} +p_1^{(1)}\sum_{i=n+1}^{2n} r_i^{(1)} +p_1^{(2)}\sum_{i=1}^{n} r_i^{(2)} +\bar p_1^{(2)}\sum_{i=n+1}^{2n} r_i^{(2)} . \tag{7} \]

For an automaton with the linear strategy described above, one can compute the limiting probabilities of the states of the system and determine the value \(M(L_n,K)\) of the mathematical expectation of the penalty. Such a computation (for the case \(p_0^{(1)}=p_1^{(2)}=p;\; p_1^{(1)}=p_1^{(2)}=1-p\)) leads to the formula

\[ M(L_n,K)=\frac12-\frac{(\alpha-1)^2}{2} -\frac{\operatorname{ch} ny-1} {\dfrac{2n\delta}{1-2\delta}(\alpha+1)^2\operatorname{ch} ny +(\alpha-1)^2\operatorname{cth}\dfrac{y}{2}\operatorname{sh} ny}, \tag{8} \]

where \(\alpha=\dfrac{p}{(1-p)}\); \(\operatorname{ch} y=\dfrac{(1+\alpha)^2}{2\alpha}\dfrac{(1-\delta)}{(1-2\delta)}-1\). Typical curves are presented in Fig. 2 (\(p=0.33\)).

It is not difficult to verify that \(M(L_n,K)\) attains a minimum \(m\) at some finite value \(n=n_0\) of the memory capacity; the cases \(n=0\) and \(n=\infty\) give no gain in comparison with an automaton that does not possess purposeful behavior.

The existence of this minimum can be explained by the fact that when the memory size is too small, information about the state of the component environment in which the automaton is located is used insufficiently; whereas with an excessive growth of \(n\) there occurs, so to speak, an averaging of the statistical properties of both states of the component environment. It is therefore natural that, as \(\delta\) decreases, the value \(n_0\) increases, while \(m\) decreases. As \(\delta\to0\), expression (8) passes into (4), so that \(n_0\to\infty\), \(m\to M_{\min}=\min(p,1-p)\).

Formula (8) makes it possible to construct automata with linear strategy that possess the most purposeful behavior in a given component environment. The difference \(d=1/2-m\) may serve as a measure of the distinguishability of random environments \(C_1\) and \(C_2\) for a finite automaton with linear strategy (in our case \(M_0=1/2\)). The values \(n_0\) and \(d\) for several values of \(p\) and \(\delta\) are given in Table 1 (in each cell there is written a pair of numbers, the first of which is \(n_0\), and the second \(d\)).

Table 1

\(p\) \ \(\delta\) 0.001 0.01 0.032 0.1 0.32 0.45
0.1 3; 0.396 2; 0.372 2; 0.336 1; 0.256 1; 0.115 1; 0.032
0.2 5; 0.294 3; 0.266 2; 0.223 2; 0.157 1; 0.065 1; 0.018
0.25 6; 0.244 4; 0.212 3; 0.172 2; 0.116 1; 0.045 1; 0.012
0.33 8; 0.158 5; 0.125 3; 0.091 2; 0.055 1; 0.020 1; 0.006
0.4 11; 0.089 6; 0.056 4; 0.037 2; 0.020 1; 0.007 1; 0.002
0.45 16; 0.036 7; 0.017 4; 0.010 2; 0.005 1; 0.002 1; 0.001

I take this opportunity to express my deep gratitude to I. M. Gelfand, D. S. Lebedev, and O. B. Lupanov for their interest in and attention to this work, and to B. D. Efremov for great assistance with the computations.

Steklov Mathematical Institute
Academy of Sciences of the USSR

Received
16 III 1961

References

  1. M. L. Tsetlin, DAN, 117, No. 6 (1957).
  2. M. L. Tsetlin, in: Problems of Cybernetics, vol. 1, Moscow, 1958.

Submission history

Some Problems on the Behavior of Finite Automata