Friday, November 28, 2008

Incentives, computation, and privacy

What do computer scientists bring to the study of game theory? One thing is simply a computer science style of analysis: worst-case inputs, competitive ratios, approximation factors, etc. Perhaps the more interesting thing, however, is the study of how incentive constraints and computation interact: since we want to actually implement (in principle) any mechanism we might dream up, it is subject not only to traditional game-theoretic constraints such as truthfulness, but also to standard computational limitations.

One fundamental question is: "What can be computed by algorithms that satisfy constraint X?" It is natural to substitute "incentive compatibility" for "constraint X". An interesting study along these lines appeared in the last FOCS: "On the Hardness of Being Truthful" by Papadimitriou, Schapira, and Singer. This paper gives an example of a natural NP-hard problem that can be well approximated efficiently, and can be implemented truthfully, but cannot be efficiently well approximated by a truthful implementation: that is, they show that for this problem, truthfulness and computational limitations together restrict what is algorithmically possible, although neither limitation is insurmountable on its own.

They call the problem the "Combinatorial Public Projects Problem" (CPPP), although it is really just submodular maximization. In the CPPP, there are n agents and m potential public goods; a mechanism for the CPP problem will select a subset of k of these goods. The agents have opinions over which subsets are best: each agent i has a submodular valuation function f_i over sets of goods. The mechanism wishes to maximize the social welfare: it wants to pick the set of k goods that maximizes the sum of the agents' valuation functions. We'll call this sum F. Notice that since each f_i is submodular, F is submodular as well, and so the problem the mechanism faces is simply submodular maximization; it is easy to show that the greedy algorithm (which selects at each stage the good that produces the largest marginal utility, until k goods are selected) achieves a (1-1/e) approximation. Not bad! But also not truthful -- agents may be incentivized to misreport their valuations. What's more, the classic VCG mechanism achieves optimal social welfare and enforces truthfulness -- but of course cannot be implemented efficiently, since it involves solving an NP-hard problem exactly. What Papadimitriou, Schapira, and Singer show is that no efficient truthful algorithm can achieve an approximation factor polynomially better than Sqrt(m) -- a huge gap from what is possible under either constraint alone.
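As a concrete sketch of the greedy rule (the function names and the coverage-style valuations below are mine, chosen just to have a runnable submodular example; the paper does not prescribe this code):

```python
def coverage(cover):
    # f(S) = number of distinct elements covered by the goods in S.
    # Coverage functions like this are monotone and submodular.
    return lambda S: len(set().union(*(cover[g] for g in S))) if S else 0

def greedy_select(goods, valuations, k):
    """Pick k goods greedily by marginal gain in total welfare
    F(S) = sum_i f_i(S); for monotone submodular f_i this is the
    classic (1 - 1/e)-approximation to the optimal social welfare."""
    chosen = set()
    for _ in range(k):
        # Marginal gain of each remaining good, summed over all agents.
        gains = [(sum(f(chosen | {g}) - f(chosen) for f in valuations), g)
                 for g in goods if g not in chosen]
        best_gain, best_good = max(gains)
        chosen.add(best_good)
    return chosen

# Toy instance: one agent who values goods by the elements they cover.
covers = {'a': {1, 2}, 'b': {2, 3}, 'c': {4}}
agents = [coverage(covers)]
picked = greedy_select(['a', 'b', 'c'], agents, k=2)
```

Nothing here is truthful, of course -- the agents simply hand over `f_i` and the mechanism trusts them; that is exactly the gap the paper is about.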

Now whenever you see a striking impossibility result like this, it's tempting to ask how resilient it is. Does it still hold if no player by himself constitutes a large fraction of the total social welfare? What if we relax truthfulness to approximate truthfulness, so that perhaps players can benefit by lying, but can't gain more than some small epsilon?

It turns out this particular problem is no longer so hard when we relax some of our constraints. Anupam Gupta, Katrina Ligett, Frank McSherry, Kunal Talwar, and I have a mostly unrelated paper, called "Differentially Private Approximation Algorithms". In this paper we also ask "What can be computed by algorithms that satisfy constraint X?", but here constraint X is differential privacy. We get mostly positive results: many NP-hard problems, including the CPPP, can be well approximated efficiently by privacy-preserving algorithms. So what does this have to do with truthfulness?

In their 2007 FOCS paper "Mechanism Design via Differential Privacy", McSherry and Talwar observed that mechanisms that satisfy differential privacy are also approximately truthful: privacy guarantees that any single player who changes his input does not change the probability of any output by more than a (1+epsilon) factor; therefore, he cannot increase his utility by more than a (1+epsilon) factor by deviating from a truthful input. In general, let's say that a mechanism satisfies gamma-truthfulness if no agent can benefit by more than an additive gamma by deviating from truthful behavior. So suppose in the CPP problem we scale every agent's valuation function so that her maximum valuation is 1. As a corollary of one of our approximation results, we show that for any gamma, there exists an efficient, gamma-truthful mechanism that achieves an additive O(k log m * log(1/gamma)/gamma) approximation (contrast this with the impossibility of anything better than a multiplicative Sqrt(m) factor for 0-truthful mechanisms). In particular, for instances in which OPT = Ω(k log m * log(1/gamma)/gamma), this is an approximately truthful constant-factor approximation algorithm.
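The engine behind the McSherry-Talwar observation is their exponential mechanism. Here is a minimal toy implementation of my own (not code from either paper; `score` plays the role of social welfare, and outcomes are assumed to be few enough to enumerate):

```python
import math
import random

def exponential_mechanism(outcomes, score, eps, sensitivity=1.0, rng=random):
    """McSherry-Talwar exponential mechanism: sample outcome r with
    probability proportional to exp(eps * score(r) / (2 * sensitivity)).
    This is eps-differentially private, so a single agent's misreport
    changes any outcome's probability by at most an e^eps factor --
    which is exactly the approximate-truthfulness guarantee."""
    weights = [math.exp(eps * score(r) / (2.0 * sensitivity)) for r in outcomes]
    x = rng.uniform(0, sum(weights))
    for r, w in zip(outcomes, weights):
        x -= w
        if x <= 0:
            return r
    return outcomes[-1]
```

For small eps the mechanism is nearly uniform (very private, weak welfare); for large eps it concentrates on the best outcome (weak privacy, near-optimal welfare) -- the gamma in gamma-truthfulness is the knob.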

I think that computer scientists are just starting to scratch the surface of the interplay between game theory and computation. It seems that there is a rich theory of incentive compatible computation waiting to be discovered, and that algorithmic game theory will prove to fit in nicely with the rest of theoretical computer science as a discipline that studies the fundamental abilities and limits of computation.


Friday, November 21, 2008

Computer science in the news

The New York Times Magazine has a feature this week on the Netflix challenge:

It actually puts computer science in a pretty nice light -- it makes it seem very exciting. In particular, it seems to take the position that complex machine learning systems have something approaching real intelligence -- or at least that they are complex enough that they aren't fully understood by their own creators:

There’s a sort of unsettling, alien quality to their computers’ results. When the teams examine the ways that singular value decomposition is slotting movies into categories, sometimes it makes sense to them — as when the computer highlights what appears to be some essence of nerdiness in a bunch of sci-fi movies. But many categorizations are now so obscure that they cannot see the reasoning behind them. Possibly the algorithms are finding connections so deep and subconscious that customers themselves wouldn’t even recognize them. At one point, Chabbert showed me a list of movies that his algorithm had discovered share some ineffable similarity; it includes a historical movie, “Joan of Arc,” a wrestling video, “W.W.E.: SummerSlam 2004,” the comedy “It Had to Be You” and a version of Charles Dickens’s “Bleak House.”  For the life of me, I can’t figure out what possible connection they have, but Chabbert assures me that this singular value decomposition scored 4 percent higher than Cinematch — so it must be doing something right.

The author is definitely giving the computer too much credit for these anomalous clusters (and holds up "singular value decomposition" throughout the article as some sort of magical technique), but the point made is the right one. The difference between real intelligence and "just an algorithm" does in large part seem to be whether or not you can anticipate what it's going to do. When you create a program that can come up with output that surprises you, that's pretty cool.

Saturday, November 08, 2008

Auctions with Unknown Supply

It's been a long time since my last post; I spent the summer at Microsoft Research in Silicon Valley, which was tons of fun, and since then I've been writing up some of the work I did there. Oh, and I've also been wasting an inordinate amount of time reading political blogs. But all of these things are now over, and it's time for a post. Today, I'll write about one of the cool projects I worked on at Microsoft, with Moshe Babaioff and Liad Blumrosen.

Lots of people have heard talks about truthful auctions by now: A seller has some object he wants to sell, and bidders each have some private value for the item. The seller wants to sell the item to the highest bidder. How should he do it? If he just asks for bids and sells to the highest bidder, charging the bidder his own bid, buyers will lie about how much they value the item, and the item may not actually go to the person who values it the most. A classic result of Vickrey says that the seller should solicit bids, give the item to the highest bidder, but charge him the second-highest bid. Then the bidders can't do any better for themselves by lying about their valuations -- they should all just tell the truth! Auctions that have this property are called truthful auctions. It's easy to generalize this to the case in which the seller has not just a single item, but instead is selling k identical items.
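A minimal sketch of this auction, including the k-item generalization where the k highest bidders each pay the (k+1)-st highest bid (the bidder names below are made up, and ties are broken arbitrarily):

```python
def vickrey_auction(bids, k=1):
    """Sealed-bid Vickrey auction for k identical items: the k highest
    bidders win and each pays the (k+1)-st highest bid (the classic
    second-price auction when k = 1). Bidding one's true value is a
    dominant strategy."""
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winners = [bidder for bidder, _ in ranked[:k]]
    # If every bidder wins, there is no (k+1)-st bid to charge.
    price = ranked[k][1] if len(ranked) > k else 0.0
    return winners, price
```

The key point is that a winner's payment doesn't depend on his own bid, only on the bids of others -- which is why shading your bid can never help you.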

This is all well and good -- the problem of selling items seems solved. And now that auctions are used to sell billions of advertisements on the internet, the theory of truthful auctions can be put to good use.

Unfortunately, there are some characteristics of these auctions that haven't been modeled. Even if we abstract away the details of particular ad systems (different slots worth different amounts, etc.), the items we are selling are page views, and page views are peculiar items indeed: first and foremost, we don't know how many items we will have available to sell! The items arrive online as users make clicks, and since we can't delay showing the user a page, we must decide on the allocation and price of each item as soon as it arrives, before we know how many future items will arrive. So the auctions we really want to study are online. And unlike previous work on online auctions, the auctioneer knows who the bidders are -- it's the items that arrive online.

The Vickrey auction maximizes social welfare -- the sum of the valuations of the bidders who receive items. A basic question is how well we can approximate the optimal social welfare when items arrive online. At first blush, it seems that perhaps we can achieve optimal social welfare: after all, if we knew the bidders' valuations, the greedy algorithm which simply allocates each item as it arrives to the unsatisfied bidder with the highest value would achieve this. The problem is designing a payment rule that incentivizes bidders to actually bid their true values.
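The full-information greedy benchmark is trivial to write down (a toy sketch for unit-demand bidders, with names of my own invention; the hard part -- the truthful payment rule -- is deliberately absent):

```python
def greedy_online_welfare(values, arriving_items):
    """With valuations known up front, give each arriving item to the
    highest-valued bidder not yet served. For unit-demand bidders this
    achieves optimal social welfare no matter how many items show up,
    since after t items the t highest-valued bidders are served."""
    remaining = sorted(values, reverse=True)
    welfare = 0.0
    for _ in range(arriving_items):
        if not remaining:
            break  # more items than bidders: the extras go unallocated
        welfare += remaining.pop(0)
    return welfare
```

Note the algorithm never needs to know `arriving_items` in advance -- it is the pricing, not the allocation, that makes unknown supply painful.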

But let's step back -- what do we mean, anyway, when we say that the supply is unknown to the mechanism? There are several things we might mean. Perhaps, as is standard in the analysis of online algorithms, we mean that the supply is adversarial -- items arrive one at a time, but may stop at any time, and we want good guarantees on social welfare regardless of how many items arrive. However, this might not be totally realistic. After all, if we are Microsoft (or, let's see... Google?), we're running lots of these auctions every day, and we have good distributional information about how many items arrive each day. So another thing we might mean is that the total number of items that will arrive in any given day is drawn from a distribution that is known to the auctioneer. In this case, we only want good welfare guarantees in expectation over this distribution.

And this gets to the heart of our result: distributional information is -really- important. What we show is that in the adversarial setting, it's impossible to get a constant approximation to social welfare. In fact, no deterministic truthful mechanism gets better than the trivial n-approximation in the worst case, and even randomized truthful mechanisms can't achieve better than an Ω(loglog(n)) approximation. On the other hand, in the stochastic setting, it is possible to get a constant approximation to social welfare: for any distribution that satisfies a technical condition known as monotone hazard rate (and many distributions do), there is a deterministic truthful mechanism which achieves roughly a 16-approximation.

So what is the take home message? Pay attention to distributional information, at least when it comes to your supply. Economists often make distributional assumptions about the values of the bidders, and our result shows that this is less important in our setting -- we don't need this to get a good approximation to welfare. This is good, because a search engine has direct access to the distribution over clicks and page views, but must do some guessing to figure out the true values of its bidders (as distinct from their bids). And this is how it should be -- if we have a model in which what we want to accomplish is impossible, we should fix it with the information that we have right under our noses.