Monday, September 10, 2007

The Hunt for a New Solution Concept

The Hunt is On:

In 2006, Daskalakis, Goldberg, and Papadimitriou published a proof that computing Nash Equilibria in matrix games is complete for the complexity class PPAD. For the purposes of this blog entry, lets take that to mean that there is no efficient algorithm to find Nash equilibria in general games (although I'm not sure how strong the consensus is that PPAD is intractable). More recently, it was shown that computing equilibria even in infinitely repeated games (despite the seeming prevalence of such equilibria) is also PPAD complete.

What these results do is to cast doubt upon the Nash equilibrium as a plausible solution concept: Do we really expect a set of disorganized, computationally bounded, possibly ill-informed players to perform a massive intractable computation in some kind of distributed way? This is a call for theory building: As computer scientists, we have to find a solution concept that is as appealing as Nash's, but that is also feasible in a world with computational limitations.

What are we Looking For?

Fabrikant and Papadimitriou have just published a paper that outlines in its introduction three criteria we would like for a good solution concept. It's a good list. In summary:

  1. Realism: We are looking, after all, for a solution concept that will help us predict the behavior of selfish agents. The solutions that fall our of our models should be plausibly rational. Pure strategy Nash equilibria have this property (if we can find them): Since everyone is simultaneously playing a best response to everyone else, it is plausible that no one should want to deviate (although one might also wish to model players who are not perfectly rational, or who wish to deviate from a Nash so as to jolt the system into a more favorable Nash). Mixed Nash are sometimes less plausible: They might require that players randomize among equally good alternatives so as to induce the same behavior in other players -- so that players are optimizing not only their own payoff, but the stability of the system. (On the other hand, in large population games, you can imagine mixed Nash as modeling the behavior of a much larger, but deterministic set of players)

  2. Ubiquity: We would like that our solution concept always exist. Nash proved that mixed Nash always exist, but unfortunately pure Nash equilibria do not (Try to find the pure strategy equilibria in rock-paper-scissors).

  3. Tractability: This is the viewpoint that computer scientists bring to the field, but perhaps this requirement should really be the same as 1: We can't really consider a solution concept to be a realistic model of selfish behavior if it is computationally intractable to find. Unfortunately, neither pure nor mixed Nash equilibria are computationally tractable to find (modulo some big surprises).

We might add a fourth bullet:

4. Flexibility: Playing your part in a Nash equilibrium only gives a meaningful guarantee if everyone else is simultaneously playing their part. If someone deviates from the Nash profile, all bets are off. But do we really expect everyone in a large population game to behave how we predict they will? What if some are irrational, ill-informed, adversarial, mistake-prone, or just have a slightly different objective function than we think they do? It would be nice to be able to still make some predictions about behavior. Degradation in some global objective function due to selfishness is referred to as the Price of Anarchy -- it is odd that existing bounds based on Nash equilibria only hold if everyone is acting in a proscribed manner. Where is the anarchy?

What Have We Got?

There have been some recent attempts at defining new solution concepts. All of them make assumptions about the dynamics of the game, and so really are modeling repeated games. So far, none of them satisfy the full list of desiderata:

Sink Equlibria: A solution concept suggested by Goemans, Mirrokni, and Vetta, sink equilibria live in the better-response graph of a game. Each vertex in the graph corresponds to an action profile, assigning an action to each player. Edges correspond to better-response moves: There is an edge from u to v (with label i) if u and v differ only in the action of player i, and by changing her action from u_i to v_i, player i increases her utility. In the best-response graph (which is what GMV actually study), each player has at most one edge from each vertex, corresponding to the action that improves her payoff the most.

The best-response graph models a dynamic in which players move sequentially, each making a myopic best response (they care only about their immediate payoff). We may imagine play proceeding as a random walk on the graph. Pure strategy Nash equilibria correspond to sinks in the graph (since everyone is playing a best response already, the corresponding vertex has no outgoing edges). Such sinks do not necessarily exist (think rock paper scissors). However, if we imagine collapsing each maximal connected component (Those parts of the graph that we might enter, but can never leave) into a single vertex, the graph is now guaranteed to have sinks. These sinks are the sink equilibria of the game, and their value is the expected value of a random walk on the sink component. The idea is, a random walk in this best-response graph will eventually get caught in a sink, and the value of the game will be dominated by the expected value of the sink.

  1. Realism: This model assumes that players are myopic -- they strategize only to the extent that they greedily pick the best payoff in the next time step. This is limiting in that it sometimes leads to foolish play, and would seem to underestimate the rationality of the player. On the other hand, this might be an ok first order approximation, and in many games it at least allows the players to calculate their next move efficiently, which is an important criteria in realism.

  2. Ubiquity: By construction, sink equilibria always exist (although they may include every state in the game, and so be of limited value).

  3. Tractability: Unfortunately, the FP paper shows that in a large class of concicely representable games, even identifying whether a given game state is part of a sink is PSPACE complete, and so it seems we are no better off than when we started, in terms of computational efficiency.

  4. Flexibility: This model assumes that players play on a best-response graph, and sinks are defined with respect to the graph. But if players deviate -- perhaps they incorporate randomness into their play, fail to calculate their best-response move, or strategize more than one move ahead, the graph that the players are playing on changes, as do the sinks. As a result, this model is brittle to unmodeled behavior.

CUREs: The FP paper suggests another model for repeated games. Imagine that players commit to a strategy that is determined by a restricted class of finite automata, that FP call Unit Recall Strategies. The states of the URS for player i correspond to player i's actions, and the automata reads a tape that at time t tells it what action all the other players are playing at time t. The URS is defined by a set of transitions from action profiles to actions. (So the strategy that a player is committing to is limited by the fact that it must specify an action to play at time t+1 given only the state of the game at time t -- it may not remember anything else). If each player commits to an URS, then we have a completely specified set of game dynamics. Since each player's strategy is a finite automata, game play will eventually loop, and the payoff to player i is defined to be her average payoff of state sin this loop. A Unit Recall Equilibria (URE) is a set of URS such that no player can unilaterally improve her payoff by switching to another URS. A component-wise Unit Recall Equilibria (CURE) is a set of URS such that no player can unilaterally improve her payoff by switching only a single transition rule in her existing URS.

  1. Realism: The model assumes that players commit to a restricted class of finite automata strategies. In general, this is a reasonable assumption, although the restriction that player strategies cannot have memory seems a bit strong. CURE seem particularly restrictive in that they assume myopic play the same way the sink-equilibria do: Even if players may only switch one transition in their automata strategy per timestep, the CURE concept is assuming that they also cannot plan more than one time-step into the future.

  2. Ubiquity: FP show that unfortunately, the more realistic solution concept of the URE need not necessarily exist. The less realistic one, the CURE, however, always exists.

  3. Tractability: FP conjecture that finding an example of the more realistic solution concept, the URE, is in polynomial time, although prove only that it is in NP. It is easy, however, to find a CURE. Unfortunately, the CURE that FP show how to construct is a good example of why the CURE lacks realism as a solution concept -- it depends critically on the fact that players are not permitted to think beyond one step into the future.

  4. Flexibility: The model depends on all players using restricted finite automata strategies. If any player deviates, URE guarantees no longer apply, and CUREs are no longer equilibria if any player may change more than one transition at a time, or may anticipate more than one move into the future.

No Regret Play: Avrim Blum, MohammadTaghi Hajiaghayi, Katrina Ligett, and I have proposed the following solution concept: We imagine that players behave so that they have no regret. By that, we mean the following: In retrospect, after seeing (and fixing) how all of the other players played, no player should have been able to improve his payoff by playing the same fixed action at every turn. Except for this restriction, they may play arbitrarily. This definition is appealing for several reasons.

a) It is strictly a generalization of the one-shot-game Nash equilibrium, because if players repeatedly play a Nash equilibrium, they will satisfy the no-regret property.

b) Although we make no assumptions about the particulars of how players behave, there are simple and efficient algorithms that players could use to guarantee that they have no regret, regardless of how other players behave.

c) Our generalization is enough to get strong guarantees. In particular, in valid games, traffic routing games (both atomic and nonatomic), generalized Hotelling games, average-load-balancing, and presumably others, the "price of anarchy" for no regret play matches the price of anarchy for Nash equilibria.

d) Proving a no-regret price-of-anarchy upper bound implies a price-of-anarchy upper for pure and mixed Nash equilibria, and is often simpler than proving the result directly for mixed Nash.

  1. Realism: The model makes only the assumption that players play so as to have no regret. How realistic is this? Well, its strictly a generalization of the one-shot Nash equilibrium (both pure and mixed), so in this sense its strictly more realistic. In most large-player games, it is rational to desire no regret, in the sense that if you have regret to some fixed action a, you would have been better off pursuing some other strategy (such as playing a the whole time), and there are efficient algorithms that you could run to get this guarantee. On the other hand, the no-regret assumption implicitly assumes that your actions have a negligible affect on other players actions. For example, in a repeated prisoners dilemma game, the (lucrative) collaborative strategy of both players cooperating every round is not no-regret -- both players would have regret to defecting, not realizing that had they defected every round, they would have changed the behavior of their opponent, thus not truly obtaining a higher payoff. So it seems that the no-regret assumption is a realistic model of rational behavior, but only in games for which ones own actions have a minimal affect on the actions of others. (Perhaps formalizing what such games look like would be a fruitful area of research).
  2. Ubiquity: Every game has a no-regret history of play. The proof of this is simply the existence of algorithms that give each player the no-regret guarantee.
  3. Tractability: It is easy for players to guarantee that they have no regret (and thus find no-regret histories of play) in most classes of games. In particular, we can do this in polynomial time for any game that has only a polynomial number of actions for players, and also in some games that have an exponential number of actions per player (such as routing games, and many valid games).
  4. Flexibility: The no-regret property is flexible enough to allow occasional random behavior (A player could build up 'negative regret', and then occasionally spend it with 'irrational' behavior). Since we don't assume players use particular algorithms, players could also use arbitrary strategies that just happen to generate no-regret histories (even if they don't explicitly guarantee this). Perhaps most significantly, we can often prove "price-of-anarchy" style bounds even when only a subset of the players experience no-regret, and others behave arbitrarily (or adversarially). This flexibility seems essential when quantifying "anarchy".

A Common Thread

Each of these solution concepts postulate dynamic equilibria -- none of them require that play 'converge' in the sense that the same action profile be played round after round. This is a property that is probably necessary if we want to take into account "irrational" behavior.

Each solution restricts the set of allowable player strategies, to greater or lesser degree. While this might be necessary for tractable analysis, if we wish to model reality, we should aim for models that are minimally restrictive.

So the hunt is on. A successful model will probably have properties that lend themselves to easy mathematical analysis, but hopefully will also be a convincing description of selfish behavior. Perhaps some experimental work is in order? In large scale networks, do people appear to be playing sequentially in sink equilibria? Can their actions be modeled by simple finite automata? Do they have regret?

(Thanks to the SODA paper announcements here here and here that led me to discover the FP paper which instigated this post)

No comments: