Abstract Generated abstract
This note extends Kuhn’s theorem on the equivalence of mixed and behavior strategies in games in generalized form to a broader class of recall structures. It introduces auxiliary notions describing relevant preceding information sets, proves two general results about possible positions and relevance of strategies, and defines acyclic memory, ordering memory, reductions, and reduced strategies. The main result shows that if a player has ordering memory, then every mixed strategy of that player is equivalent to its corresponding reduced strategy. In the special case of perfect memory, these reductions coincide with behavior strategies, so the theorem recovers the sufficient part of Kuhn’s result as a special case.
Full Text
MATHEMATICS
N. N. VOROB'EV
REDUCED STRATEGIES FOR GAMES IN GENERALIZED FORM
(Presented by Academician A. N. Kolmogorov on 5 III 1957)
In Kuhn’s paper \((^{1})\) it was proved that if, in a game in generalized form, a certain player has perfect recall, then every mixed strategy of his is equivalent to the corresponding behavior strategy. Further properties of players’ recall for games in generalized form were considered in \((^{2-4})\).
In the present note an extension of Kuhn’s theorem on behavior strategies is given for a more general case of a player’s recall.
The notation and terms introduced below without definitions or explanations correspond to those in paper \((^{1})\).
In what follows, \(\upsilon_U\) will denote an arbitrary alternative of the information set \(U\).
If \(U\) is an information set in which the turn of move belongs to the \(i\)-th player, then \(\mathfrak{B}_i(U)\) denotes the family of all information sets \(V\) for which: a) \(V \in \mathfrak{U}_i\); b) \(V < U\); c) there does not exist such an alternative \(\upsilon_V\) that \(U \subset D(V,\upsilon_V)\).
If \(X\) is a position, \(U\) an information set, with \(U < X\), then \(\upsilon_U^{(X)}\) denotes such an alternative \(U\) that
\[ X \in D(U,\upsilon_U^{(X)}). \]
Theorem 1. If \(X, Y\) are positions of the game contained in the information set \(U\), and \(\pi_i\) is a strategy of the \(i\)-th player for which \(X \in \operatorname{Poss}\pi_i\), then there exists a strategy \(\pi_i^*\) of the \(i\)-th player, coinciding with \(\pi_i\) on all information sets from \(\mathfrak{U}_i \setminus \mathfrak{B}_i(U)\), for which \(Y \in \operatorname{Poss}\pi_i^*\).
References to this theorem (valid for arbitrary games in generalized form) can in a number of cases replace references to perfect recall of the \(i\)-th player (which, evidently, does not always hold).
Let \(X\) be a position. Denote by \(\mathfrak{D}_i(X)\) the family of all information sets from \(\mathfrak{U}_i\) preceding \(X\).
Let \(\mathfrak{B} \subset \mathfrak{U}_i\); \(\pi_i\) be a strategy of the \(i\)-th player; \(X\) a position. We shall write
\[ X \in \operatorname{Poss}\pi_i, \]
if for every \(U \in \mathfrak{B}\) one has \(\pi_i(U)=\upsilon_U^{(X)}\).
Theorem 2. Let \(X \in U \in \mathfrak{U}_i\) and \(X < Z\). Denote by \(Y\) the position nearest to \(X\) following \(X\) and preceding \(Z\) (if such exists), and by \(V\) the information set containing it.
Then, whatever the strategy \(\pi_i\) of the \(i\)-th player may be, in order that
\[ U \in \operatorname{Rel}\pi_i,\qquad \pi_i(U)=\upsilon_U^{(Z)},\qquad X \in \operatorname{Poss}_{\mathfrak{D}_i^*(X)\cap \mathfrak{B}_i(U)} \pi_i, \]
it is necessary and sufficient that
\[ V \in \operatorname{Rel}\pi_i,\qquad Y \in \operatorname{Poss}\pi_i. \]
\[ \mathfrak{D}_i(Y)\cap \mathfrak{B}_i(V) \]
We shall say that the \(i\)-th player has acyclic memory in the game \(\Gamma\) if in the family \(\mathfrak{U}_i\) of all his information sets one cannot indicate a sequence \(V_1,\ldots,V_k\) such that
\[ V_2<\cdots<V<V_1. \]
The family \(\mathfrak{U}_i=\{U_1,\ldots,U_r\}\) is called standardly ordered if from \(k<l\) it follows that \(U_l \not< U_k\). Obviously, for the possibility of a standard ordering of the family \(\mathfrak{U}_i\) it is necessary and sufficient that the \(i\)-th player have acyclic memory.
In what follows it will be assumed that the enumeration \(U_1,\ldots,U_r\) of the elements of \(\mathfrak{U}_i\) is their enumeration in the standard order.
A system of nonnegative numbers
\[ \rho_i\bigl[v_U,\ v_U\bigl(U\in \mathfrak{B}_i(U_k)\bigr)\bigr], \tag{1} \]
defined for all \(U_k\in\mathfrak{U}_i\), for arbitrary alternatives \(v_{U_k}\) and \(v_U\bigl(U\in\mathfrak{B}_i(U_k)\bigr)\), and satisfying the conditions
\[ \sum_{v_{U_k}}\rho_i\bigl[v_{U_k},\ v_U\bigl(U\in\mathfrak{B}_i(U_k)\bigr)\bigr]=1 \tag{2} \]
for any \(U_k\in\mathfrak{U}_i\) and \(v_U\bigl(U\in\mathfrak{B}_i(U_k)\bigr)\), will be called a reduction of the \(i\)-th player in the game \(\Gamma\).
It is readily established that if, for any set \(T\) of pure strategies of the \(i\)-th player, one puts
\[ \rho_i[T]=\sum_{\pi_i\in T}\prod_{k=1}^{r} \rho_i\bigl[\pi_i(U_k), \]
\[ \pi_i(U)\bigl(U\in\mathfrak{B}_i(U_k)\bigr)\bigr], \]
then \(\rho_i\) turns out to be a probability measure on the set of all pure strategies of the \(i\)-th player, i.e., a certain mixed strategy of his. This mixed strategy is called the mixed strategy corresponding to the reduction \(\rho_i\).
One of the ways of specifying reductions of the \(i\)-th player is the following. Let \(\mu_i\) be some mixed strategy of the \(i\)-th player. Put, for any \(U_k\in\mathfrak{U}_i\), \(v_{U_k}\), and \(v_U\bigl(U\in\mathfrak{B}_i(U_k)\bigr)\),
\[ \rho_{\mu_i}\bigl[v_{U_k},\ v_U\bigl(U\in\mathfrak{B}_i(U_k)\bigr)\bigr] = \frac{\displaystyle \sum_{\pi_i}' \mu_i[\pi_i]} {\displaystyle \sum_{\pi_i}'' \mu_i[\pi_i]}, \]
where the sum \(\sum'\) is taken over those \(\pi_i\) satisfying the conditions
\(\pi_i(U_k)=v_{U_k}\), \(\pi_i(U)=v_U\bigl(U\in\mathfrak{B}_i(U_k)\bigr)\), \(U_k\in\operatorname{Rel}\pi_i\), and the sum \(\sum''\) over those \(\pi_i\) satisfying the conditions
\(\pi_i(U)=v_U\bigl(U\in\mathfrak{B}_i(U_k)\bigr)\), \(U_k\in\operatorname{Rel}\pi_i\), if
the fraction on the right has meaning (i.e., if its denominator is different from zero), and
\[ \rho_{\mu_i}=[v_{U_k},\,v_U(U\in \mathfrak{B}_i(U_k))]= \sum_{\substack{\pi_i\\ \pi_i(U_k)=v_{U_k}}}\mu_i[\pi_i] \]
otherwise (here \(\mu_i[\pi_i]\) denotes the probability of the pure strategy \(\pi_i\) under the mixed strategy \(\mu_i\)).
The numbers of the form (1) thus defined are, obviously, nonnegative and satisfy the relations (2), so that \(\rho_{\mu_i}\) may be regarded as a reduction of the \(i\)-th player. This reduction is called the reduction corresponding to the mixed strategy \(\mu_i\).
If \(\mu_i'\) is a mixed strategy corresponding to the reduction \(\rho_{\mu_i}\), which in turn corresponds to the mixed strategy \(\mu_i\), then \(\mu_i'\) is called a reduced strategy corresponding to \(\mu_i\).
We shall say that in the game \(\Gamma\) the \(i\)-th player has an ordering memory if, whatever the information sets \(U\) and \(V\) from \(\mathfrak{U}_i\) may be, from \(V<U\) it follows that \(U\subset D(V)\).
Obviously, a player who possesses ordering memory in the game \(\Gamma\) also possesses acyclic memory in this game.
Theorem 3. If in the game \(\Gamma\) the \(i\)-th player possesses ordering memory, then every mixed strategy of this player is equivalent to the corresponding reduced strategy.
It is not difficult to verify that, in the case where the player has perfect memory, his reductions are nothing other than his behaviors, and the reduced strategies are behavior strategies. Therefore Theorem 3 generalizes Kuhn’s theorem on behavior strategies in its sufficient part.
Received
9 I 1957
REFERENCES
¹ H. W. Kuhn, Contributions to the Theory of Games, 2, Princeton, 1953, p. 193—216. ² N. Dabbkey, ibid., p. 217—243. ³ G. L. Thompson, ibid., p. 267—277. ⁴ B. J. Brich, Proc. Cambr. Phil. Soc., 51, No. 2, 275 (1955).