Abstract
This paper studies a differential game of pursuit for two linear controlled motions with bounded controls, where the pursuer seeks coincidence of specified coordinates in minimal worst case time. It relates the feedback synthesis problem to programmed minimax problems and formulates an extremal aiming rule using the fundamental matrix and a minimal function that determines the absorption time. The paper shows that in nondegenerate cases this rule yields the optimal encounter time, while an oscillator example demonstrates complications caused by sliding, cyclic, or quasicyclic regimes. To handle these degeneracies, a regularized formulation with stepwise updating of an auxiliary encounter time is introduced, preserving the game information structure and recovering the same optimal value.
Full Text
UDC 517.934+62.50
CYBERNETICS AND CONTROL THEORY
Corresponding Member of the Academy of Sciences of the USSR N. N. KRASOVSKII
ON THE PROBLEM OF A GAME ENCOUNTER OF MOTIONS
Consider the problem \((^{1-5})\) of the minimax of the time \(T\) until the encounter of the pursuing \((y(t))\) and pursued \((z(t))\) motions
\[ \dot y=Ay+Bu,\qquad \dot z=Az+Bv, \tag{1} \]
where the controls \(u\) and \(v\) are constrained by
\[ \|u(t)\|\leq \mu,\qquad \|v(t)\|\leq \nu\qquad (\mu>\nu;\ \mu,\nu=\text{const}), \tag{2} \]
and the symbol \(\|q\|\) denotes the Euclidean norm of the vector \(q\). We shall treat the vectors under consideration as column vectors; the superscript \(*\) will denote transposition; the symbol \([Q]_m\) will denote the matrix composed of the first \(m\) rows of the matrix \(Q\). The objective of pursuit consists in the coincidence of the vectors \([y(t)]_m\) and \([z(t)]_m\). The control \(u\) must be formed according to the feedback principle at each instant \(t=\tau\), on the basis of the realized values \(y(\tau)\) and \(z(\tau)\), i.e. \(u[\tau]=u[y(\tau),z(\tau)]\). In order to distinguish the programmed controls \(u\) and \(v\), specified a priori as functions of the time \(t\), from the realizations of the controls \(u\) and \(v\) constructed according to the feedback principle,
\[ u=u[y,z],\qquad v=v[y,z], \tag{3} \]
but realized in concrete cases as functions of the time \(t\), we shall denote the former by the symbols \(u(t), v(t)\), and the latter by the symbols \(u[t]\) and \(v[t]\). We assume that the coordinates \(y_j(t)\) and \(z_j(t)\) \((j=1,\ldots,m)\) are controllable, i.e. for any \(y(\tau)\) and \(z(\tau)\) and for any \(\vartheta>\tau\) one can choose \(u(t)\) and \(v(t)\) \((\tau\leq t<\vartheta)\) ensuring the equalities \([y(\vartheta)]_m=0,\ [z(\vartheta)]_m=0\). Thus we have the problem: find
\[ T^0=\min_u \sup_v T\quad \text{for}\quad [y(\tau+T)]_m=[z(\tau+T)]_m=0, \tag{4} \]
where \(u\) is chosen among functions (3), and \(v\) among functions (3) and among controls \(v(t)\). We shall call the solution \(u^0[y(\tau),z(\tau)]\) of problem (4) the optimal control.
Problem (4), like other problems of optimal synthesis, is connected with suitable programmed problems. A rigorous justification of such an approach in the general case of a differential game, as is well known \((^2)\), encounters great difficulties. In the particular case under consideration, the desired minimum \(T^0\) of \(\max_v T\) is determined by the instant of absorption \((^{4,5})\) \(\vartheta^0\) of the process \(z(t)\) by the process \(y(t)\), and the construction of the control \(u^0\) can be connected with the rule of extremal aiming described in \((^{4,5})\). The purpose of the present article is to discuss this rule in the case (1), (2). The extremal control \(u[y(\tau),z(\tau)]^0\) has the form
\[ u[y(\tau),z(\tau)]^0 =\mu h^0_{x(\tau)}(\tau+0)/\|h^0_{x(\tau)}(\tau+0)\|. \tag{5} \]
Here \(x(\tau)=y(\tau)-z(\tau)\), and \(h^0_{x(\tau)}(t)=l^{0*}[F(\vartheta^0-t)B]_m\) \((\tau\leq t<\vartheta^0)\) is the minimal function; moreover,
\[ \int_{\tau}^{\vartheta^0}\left\|\,l^{0*}\left[F(\vartheta^0-t)B\right]_m\,\right\|\,dt = \min_l \int_{\tau}^{\vartheta^0}\left\|\,l^*\left[F(\vartheta^0-t)B\right]_m\,\right\|\,dt = \varkappa[x(\tau),\vartheta^0]=\frac{1}{\zeta} \tag{6} \]
\[ \left(l^*\left[F(\vartheta^0-\tau)x(\tau)\right]_m=-1\right), \]
where \(\zeta=\mu-\nu\), \(\vartheta^0\) is the smallest root of the equation \(\varkappa[x(\tau),\vartheta]=1/\zeta\), and \(F(t)\) is the fundamental matrix of the equation \(\dot x=Ax\).
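As a rough numerical illustration of (6), the sketch below evaluates \(\varkappa[x,\vartheta]\) and its smallest root for one concrete system: the harmonic oscillator that appears later as example (7), with \(B=(0,1)^*\) and \(m=2\), so that \(F(s)B=(\sin s,\cos s)^*\) and the norm under the integral is an absolute value. The function names `kappa` and `absorption_time`, the midpoint quadrature, the golden-section search over the one free parameter of \(l\), and the scan step are all assumptions made for this example, not part of the original text.

```python
import math

def kappa(x, T, n=400, iters=80):
    """Approximate kappa[x, theta] of (6) for the oscillator, T = theta - tau."""
    # c = F(T) x with F(t) = [[cos t, sin t], [-sin t, cos t]]
    c1 = math.cos(T) * x[0] + math.sin(T) * x[1]
    c2 = -math.sin(T) * x[0] + math.cos(T) * x[1]
    cc = c1 * c1 + c2 * c2
    if cc == 0.0:
        return math.inf  # the constraint l* F(T) x = -1 cannot be met
    h = T / n
    nodes = [(k + 0.5) * h for k in range(n)]  # midpoint rule on (0, T)
    sins = [math.sin(r) for r in nodes]
    coss = [math.cos(r) for r in nodes]

    def integral(s):
        # l = -c/|c|^2 + s * c_perp satisfies l . c = -1 for every s
        l1 = -c1 / cc - s * c2
        l2 = -c2 / cc + s * c1
        return h * sum(abs(l1 * a + l2 * b) for a, b in zip(sins, coss))

    # The integral is convex in s, so a golden-section search is enough.
    # The bracket width is a heuristic assumption of this sketch.
    lo, hi = -1e6 / cc, 1e6 / cc
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a = hi - g * (hi - lo)
        b = lo + g * (hi - lo)
        if integral(a) <= integral(b):
            hi = b
        else:
            lo = a
    return integral(0.5 * (lo + hi))

def absorption_time(x, zeta, T_max=10.0, dT=0.05):
    """Smallest T with kappa(x, T) = 1/zeta, found by scanning and bisection."""
    target = 1.0 / zeta
    T = dT
    while T <= T_max and kappa(x, T) < target:
        T += dT
    if T > T_max:
        return None  # no root found on the scanned interval
    lo, hi = T - dT, T
    for _ in range(25):
        mid = 0.5 * (lo + hi)
        if kappa(x, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi
```

For instance, the point \(x=(1-\cos 1,\,-\sin 1)\) is carried to the origin in unit time by the constant control \(w=1\), so with \(\zeta=1\) one expects `kappa(x, 1.0)` to be close to 1 and `absorption_time(x, 1.0)` to be close to 1. The scan assumes that \(\varkappa\) starts below \(1/\zeta\) for small \(T\) and that the step `dT` is fine enough not to jump over the first crossing.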
By choosing \(v=\nu u/\mu\) one verifies that, for any \(u=u[y(\tau),z(\tau)]\), the inequality \(\sup_v T \ge \vartheta^0-\tau\) holds, with equality possible (in essence) only when \(u=u[y(\tau),z(\tau)]^0\); that is, \(T^0\ge \vartheta^0-\tau\). The most convenient case is the nondegenerate one, in which \(\|h^0_{x(\tau)}(\tau)\|>0\) for all \(\tau\) (for each function \(h^0_{x(\tau)}(t)\)). In this case, for \(u=u[y(\tau),z(\tau)]^0\) and for the corresponding admissible \(v\), the first equation in (1) is well defined as an ordinary differential equation; problem (4) has the solution \(T^0=\vartheta^0-\tau\) and \(u^0[y(\tau),z(\tau)]=u[y(\tau),z(\tau)]^0\).
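The lower bound can be traced in one step; the following is a sketch of the reasoning, with the auxiliary symbol \(w\) introduced here only for illustration. If the pursued motion copies the pursuer's current control scaled into its own constraint, \(v=\nu u/\mu\) (so that \(\|v\|\le\nu\)), then subtracting the two equations in (1) gives, for \(x=y-z\),
\[ \dot x = Ax + B(u-v) = Ax + Bw,\qquad w=\frac{\mu-\nu}{\mu}\,u,\qquad \|w(t)\|\le \mu-\nu . \]
Against this \(v\) the pursuer therefore steers the difference \(x\) with an effective resource of at most \(\mu-\nu\). By the standard duality of the moment problem, \(1/\varkappa[x(\tau),\vartheta]\) is the least intensity \(\operatorname{vrai\,max}\|w(t)\|\) that brings \([x(\vartheta)]_m\) to zero, so \([x]_m\) cannot vanish before the smallest \(\vartheta\) at which \(\varkappa[x(\tau),\vartheta]\) reaches \(1/(\mu-\nu)\).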
Fig. 1
In the singular case, the simple example
\[ \dot y_1=y_2,\qquad \dot y_2=-y_1+u;\qquad \dot z_1=z_2,\qquad \dot z_2=-z_1+v, \tag{7} \]
where it is required to achieve an encounter in both coordinates, shows that the solution of problem (4) becomes considerably more complicated. In example (7), equality (5) means that the control \(u[y(\tau),z(\tau)]^0\) is determined by the phase vector \(x=y(\tau)-z(\tau)\) as follows (see Fig. 1). Above the curve \(L\) we have \(u[y(\tau),z(\tau)]^0=-\mu\), below the curve \(L\) we have \(u[y(\tau),z(\tau)]^0=\mu\), and on the curve \(L\): \(u[y(\tau),z(\tau)]^0=-\mu\) for \(x_1<0\) and \(u[y(\tau),z(\tau)]^0=\mu\) for \(x_1>0\). The function \(v[y(\tau),z(\tau)]\) can be chosen so that, for \(u=u[y(\tau),z(\tau)]^0\), a sliding mode arises on the curve \(L\).* This mode can be included among the admissible motions by introducing sequences \(\{\tau_k\}\) \((\tau_{k+1}-\tau_k=\delta>0)\), setting \(u[\tau]=u[y(\tau_k),z(\tau_k)]\) for \(\tau_k\le \tau<\tau_{k+1}\), and then letting \(\delta\) tend to zero. However, the parameters \(\mu\) and \(\nu\) and the function \(v\) can then be chosen so that, for \(u=u[y(\tau),z(\tau)]^0\) and for arbitrarily small \(\delta>0\), cyclic or quasicyclic motions arise in a neighborhood of the point \(A\). Therefore, as \(\delta\to0\), it is reasonable to treat this point as a rest point of the sliding mode, which consequently does not reach the point \(x=y-z=0\).
These circumstances lead to a modification of the problem. We introduce the sequences \(\{\tau_k\}\) described above and admit among the arguments of the function \(u\) one more quantity \(\vartheta\), which also changes only at the instants \(\tau=\tau_k\). The quantity \(\vartheta\) at the instant \(\tau_k\) must be determined only from the values \(y[\tau_k]\), \(z[\tau_k]\), and \(\vartheta[\tau_{k-1}]\). (If the pursuit began at \(\tau>\tau_{k-1}\), then \(\vartheta[\tau_k]\) is determined only from the values \(y[\tau_k]\), \(z[\tau_k]\).) We estimate the result of the pursuit by the quantity
\[ \gamma_u=\sup_{\varepsilon}\left\{\limsup_{\delta\to0}\left[\sup_v T^\varepsilon_{u,v}\right]\right\}, \]
where \(\varepsilon>0\), and \(t=\tau+T^\varepsilon_{u,v}\) is the first instant at which, for the chosen \(u\) and \(v\),
\[ \left\|[y(t)]_m-[z(t)]_m\right\|\le \varepsilon. \]
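The sampled scheme and the \(\varepsilon\)-encounter time can be tried out numerically. The sketch below is a minimal simulation for the oscillator (7): both controls are held constant on each step \([\tau_k,\tau_{k+1})\) of length \(\delta\), the state is advanced with the exact solution for a constant input, and the first sampling instant with \(\|y-z\|\le\varepsilon\) is returned. The names `step` and `first_eps_time`, the callables `u_law` and `v_law`, and the decisions to hold \(v\) constant on each step and to test the distance only at the sampling instants are assumptions of this example; the laws passed in are expected to respect the bounds (2) themselves.

```python
import math

def step(state, w, delta):
    """Exact update of (s1' = s2, s2' = -s1 + w) for constant w over time delta."""
    p1, p2 = state[0] - w, state[1]          # shift to the equilibrium (w, 0)
    c, s = math.cos(delta), math.sin(delta)
    return (w + c * p1 + s * p2, -s * p1 + c * p2)

def first_eps_time(y, z, u_law, v_law, delta, eps, t_max):
    """First sampling instant k*delta at which ||y - z|| <= eps, or None."""
    k = 0
    while k * delta <= t_max:
        if math.hypot(y[0] - z[0], y[1] - z[1]) <= eps:
            return k * delta
        # Controls are formed from the sampled states and then frozen for one step.
        u = u_law(y, z)
        v = v_law(y, z)
        y, z = step(y, u, delta), step(z, v, delta)
        k += 1
    return None

# Usage with placeholder laws (not the extremal rule): the pursuer applies u = 1
# and the pursued point stays at rest at the origin with v = 0.
y0 = (1.0 - math.cos(1.0), -math.sin(1.0))
T_eps = first_eps_time(y0, (0.0, 0.0),
                       lambda y, z: 1.0, lambda y, z: 0.0,
                       delta=0.01, eps=1e-3, t_max=5.0)
```

In this usage the constant control carries \(y\) along a circular arc into the origin in unit time, so the returned value should be about 1. Replacing `u_law` by a feedback rule and repeating the run for decreasing `delta` gives a crude way to observe the behaviour of \(T^\varepsilon_{u,v}\) as \(\delta\to0\) for one particular \(v\); it does not compute the supremum over \(v\).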
The problem consists in choosing a control \(u^0[y,z,\vartheta]\) that ensures the minimum
\[ T^0=\gamma_{u^0}=\min_u \gamma_u. \tag{8} \]
* The author’s attention was drawn to this circumstance, which served as the source of the present work, by Yu. M. Repin, who modeled similar modes on analog devices (see \((^4)\)).
The following assertion is valid:
Problem (8) has a solution, and \(T^0=\vartheta^0-\tau\). At the points \(x=y(\tau)-z(\tau)\) where \(\|h_x^0(\tau)\|>0\) (for each function \(h_x^0(t)\)), the limiting motions satisfy \(u^0[\tau]=u[y(\tau),z(\tau)]^0\) for every admissible \(v\). At the points \(x\) where \(\|h_x^0(\tau)\|=0\), the limiting values \(u^0[\tau]\) (defined conditionally, as a regularization of the sliding regime) depend, generally speaking, on \(v\).
The assertion is proved by constructing the control \(u^0[y[t],z[t],\vartheta[t]]\) in a form analogous to (5). In this case, by introducing the quantity (8), regularization is achieved by replacing, at each step \(\tau_k\), the irregular problems of minimizing and maximizing the time of encounter \(\vartheta\) by more regular problems of minimizing and maximizing the distance between the vectors \([y(\vartheta)]_m\) and \([z(\vartheta)]_m\), or the intensities of the controls \(\operatorname{vrai\,max}\|u(t)\|\), \(\operatorname{vrai\,max}\|v(t)\|\) \((\tau\leq t<\vartheta)\) for fixed time \(\vartheta\). We emphasize that the regularization of the problem associated with the introduction of \(\vartheta[\tau]\) and \(\gamma_u\) preserves the game condition: in computing \(u[\tau]\), information about the choice of \(v[t]\) or \(v(t)\) \((t\geq\tau)\) is not used.
Sverdlovsk Branch
of the V. A. Steklov Mathematical Institute
Academy of Sciences of the USSR
Received 28 XI 1966
REFERENCES
1. R. Bellman, Adaptive Control Processes, IL, 1964.
2. L. S. Pontryagin, Uspekhi Mat. Nauk, 21, no. 4 (130) (1966).
3. R. Isaacs, Differential Games, New York, London, Sydney, 1965.
4. N. N. Krasovskii, Yu. M. Fel’dman, V. E. Tretyakov, Izv. AN SSSR, Tekhnicheskaya kibernetika, no. 4 (1965).
5. N. N. Krasovskii, PMM, 30, no. 2 (1966).