Saturday, March 29, 2014

Lecture 9 -- Purchasing Private Data from Privacy-Sensitive Individuals

Yesterday in our privacy and mechanism design course, we were fortunate to have a guest lecture by David Xiao. David told us about his exciting recent paper, with Kobbi Nissim and Salil Vadhan, Redrawing the Boundaries on Purchasing Data from Privacy-Sensitive Individuals.

Consider the following scenario: an analyst wishes to conduct a medical study about an underlying population, but needs to obtain permission from each individual whose data he uses. On the one hand, he needs to buy data from a representative sample of the population so that his study is accurate. On the other hand, he needs to compensate individuals for their privacy costs, and would like a payment scheme that incentivizes them to report their true privacy costs, rather than inflating them for selfish gain. Finally, he wants the mechanism to be individually rational: no rational agent should obtain negative utility by interacting with the analyst.

Because individuals' costs for privacy are a function of the method by which their reports are used to compute the outcome of the mechanism, rather than just a function of the outcome itself, this takes us outside the standard mechanism design setting. What makes the problem tricky is that individuals' costs for privacy could quite plausibly be correlated with their private data. Suppose the analyst wishes to estimate the fraction of people in some population who have syphilis. It is reasonable to expect that syphilitics will on the whole want to be compensated more than healthy individuals for a loss of privacy. But this means that even computations that depend only on agents' reported costs for privacy (and are independent of the agents' supposedly private data) can lead to privacy loss for those agents, and so must be compensated.
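To make the leak concrete, here is a toy calculation of my own (the numbers are invented for illustration). Suppose costs are pinned down by disease status:

$$c_i = \begin{cases} \$10 & \text{if agent } i \text{ has syphilis,} \\ \$1 & \text{otherwise,} \end{cases}$$

and consider a mechanism that buys data only from agents who report a cost below a threshold of \$5. Then the observable fact of whether agent $i$ participates in the study reveals her disease status exactly, even though the mechanism never examined her medical record -- it only ever looked at her reported cost.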

Some years ago Arpita Ghosh and I studied this problem, and showed an impossibility result under some (unreasonably) strong assumptions. One might have hoped that our result could be circumvented with one of several tweaks to the model. But no. David and his coauthors extend this impossibility result to have much wider applicability, making fewer assumptions about what the analyst is able to observe, and far fewer assumptions about the form of the agents' privacy loss functions. Their result is quite robust: under extremely general circumstances, no truthful, individually rational mechanism that makes finite payments can distinguish between two populations, in one of which everyone has syphilis, and in the other of which nobody does. In other words, no mechanism can simultaneously enjoy truthfulness, individual rationality, and non-trivial accuracy, and so without drastically relaxing the model of how people value privacy, you must always give up on one of these.

They do propose one such relaxation, which seems to reduce to something like the assumption that contracting syphilis can only ever cause your costs for privacy to increase, never to decrease. But this is probably not the last word: I think convincing answers for what to do in the face of their impressive impossibility result have yet to be proposed, and finding them is a really interesting question.

Friday, March 21, 2014

Lecture 8 -- Implementing Correlated Equilibria Ex-Post in Incomplete Information Games

After a spring break hiatus, our class on privacy and mechanism design returns (with all of our students working on their course projects!).

In our third and final lecture on using mediators to implement equilibrium of complete information games in settings of incomplete information, we ask how far we can push the agenda of obtaining ex-post implementations via the technique of differentially private equilibrium computation. Recall that last lecture we saw how to do this in large congestion games. Can we get similar results in arbitrary large games?

One obstacle is that we do not know good algorithms for computing Nash equilibria in arbitrary games at all, let alone privately. However, we do know how to compute arbitrarily good approximations to correlated equilibria in arbitrary n-player games! In this lecture, we explore the game-theoretic implications of privately computing correlated equilibria, and then show how to do it.

The punchline is that you can still implement (correlated) equilibria of the complete information game as an ex-post Nash equilibrium of the incomplete information game, but you need a slightly stronger mediator (one that has the power to verify player types if they decide to use it).

To accomplish this goal, we introduce a couple of interesting tools: a powerful composition theorem in differential privacy due to Dwork, Rothblum, and Vadhan, and the multiplicative weights algorithm (also known as weighted majority, polynomial weights, hedge, ...), which is a natural learning dynamic that can be used to quickly compute (coarse) correlated equilibria in arbitrary games.
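For concreteness, here is a minimal non-private sketch of the learning dynamic, in Python (my own illustration, not the algorithm from the lecture, which would have to add noise for privacy): run Hedge independently for each player, and the empirical distribution of joint play converges to an approximate coarse correlated equilibrium. If I recall the Dwork-Rothblum-Vadhan statement correctly, their composition theorem then lets the private version pay only about eps*sqrt(2k ln(1/delta)) in privacy (plus lower-order terms) over k adaptive rounds, rather than the naive k*eps.

```python
import numpy as np

def hedge_coarse_ce(payoffs, T=5000, eta=0.05, seed=0):
    """Run Hedge (multiplicative weights) independently for each of two
    players; the empirical distribution of joint play is an approximate
    coarse correlated equilibrium for T large enough.

    payoffs[i][a0, a1] is player i's payoff (in [0, 1]) when player 0
    plays a0 and player 1 plays a1.
    """
    rng = np.random.default_rng(seed)
    n0, n1 = payoffs[0].shape
    w0, w1 = np.ones(n0), np.ones(n1)
    joint = np.zeros((n0, n1))
    for _ in range(T):
        a0 = rng.choice(n0, p=w0 / w0.sum())
        a1 = rng.choice(n1, p=w1 / w1.sum())
        joint[a0, a1] += 1
        # Each player reweights all of her actions by their payoff against
        # the opponent's realized action (full-information update).
        w0 *= np.exp(eta * payoffs[0][:, a1])
        w1 *= np.exp(eta * payoffs[1][a0, :])
        w0 /= w0.sum()  # renormalize to avoid overflow
        w1 /= w1.sum()
    return joint / T

# Matching pennies: the empirical joint distribution should be near uniform.
match = np.array([[1.0, 0.0], [0.0, 1.0]])
print(hedge_coarse_ce([match, 1.0 - match]))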

Next week we will have a guest lecture, combined with our theory seminar, by David Xiao, who will tell us about his exciting recent work on Redrawing the Boundaries on Purchasing Data from Privacy-Sensitive Individuals.

Saturday, March 01, 2014

Lecture 7 -- Privacy Preserving Public Information for Sequential Games

Our seventh lecture was given by Jamie Morgenstern, about her very interesting paper, which is joint work with Avrim Blum, Ankit Sharma, and Adam Smith.

The bird's-eye view is the following:

Suppose players (think financial institutions) take turns sequentially deciding which investments to make, from amongst a feasible set which can be different for each player. In general, the profit that a player gets from an investment is a decreasing function of how many players have previously made the same investment. (Perhaps these investments are structured like pyramid schemes, where there is a big advantage to getting in early, or perhaps the market is somehow destabilized if too much capital is invested in it.)

We can study this interaction in various information models. In the complete information setting, each player sees exactly how much has been invested in each resource at the time that she arrives, and the unique dominant strategy solution of the game is for each player to invest greedily. They show that this solution achieves a good constant factor approximation to the optimal welfare.
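Here is a toy sketch of this greedy dynamic (the instance and the numbers are my own inventions, just to illustrate the setup; this is not code from the paper):

```python
def greedy_play(feasible, profit):
    """Complete-information greedy play: each arriving player sees the exact
    investment counts so far, and picks the feasible resource whose current
    marginal profit is highest."""
    counts = {r: 0 for r in profit}
    welfare = 0.0
    for options in feasible:  # one feasible set per player, in arrival order
        choice = max(options, key=lambda r: profit[r](counts[r]))
        welfare += profit[choice](counts[choice])
        counts[choice] += 1
    return welfare, counts

# Toy instance: profit from a resource halves with each earlier investor.
profit = {r: (lambda k, base=b: base * 0.5 ** k) for r, b in [("A", 4.0), ("B", 3.0)]}
feasible = [{"A", "B"}, {"A"}, {"A", "B"}, {"B"}]
print(greedy_play(feasible, profit))  # -> (10.5, {'A': 2, 'B': 2})
```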

But if the players are financial institutions, then their investments might represent sensitive trade secrets, and they may not want to share this information with others -- in which case the complete information setting seems unrealistic. This could be very bad news, however: if players have no information at all about what has gone on before their arrival, it's not hard to cook up plausible-sounding behaviors for them which result in disastrous welfare outcomes.

So the paper asks: can we introduce a differentially private signal (one that necessarily reveals little actionable information about any single agent, and therefore one whose introduction the agents have little reason to object to) that nevertheless allows the market to achieve social welfare approximating OPT?

Skipping over some details, this paper shows that the answer is yes. Making public a differentially private count of how much has been invested in each resource as the game plays out is enough to guarantee that sequential play (studied either as the simple behavioral strategy in which players imagine that the noisy counts are exactly correct, or as any solution in which players play only undominated strategies) results in an outcome with a bounded competitive ratio with respect to OPT.
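To connect with the toy sketch above: the noisy-count variant of the simple behavioral strategy just replaces the exact counts each player sees with privately perturbed ones. (I am using plain Laplace noise per observation as a stand-in here; the paper's actual counter, discussed in the next paragraph, is more careful, since each player's investments must stay protected across the whole stream of published counts. This reuses `feasible` and `profit` from the sketch above.)

```python
import numpy as np

def noisy_greedy_play(feasible, profit, eps=1.0, seed=1):
    """Greedy play against noisy counts: each player treats the perturbed
    counts as if they were exact, but realized profits depend on the true
    counts."""
    rng = np.random.default_rng(seed)
    counts = {r: 0 for r in profit}
    welfare = 0.0
    for options in feasible:
        observed = {r: counts[r] + rng.laplace(0, 1.0 / eps) for r in options}
        choice = max(options, key=lambda r: profit[r](max(0, round(observed[r]))))
        welfare += profit[choice](counts[choice])  # true profit, true counts
        counts[choice] += 1
    return welfare, counts

print(noisy_greedy_play(feasible, profit))
```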

This paper also contains an interesting technique that will probably be useful in other contexts: the authors develop a new method for privately maintaining the running count of a stream of numbers that achieves better additive error than previous work, at the cost of introducing a small multiplicative error. For the counters needed in this application, this trade-off gives improved overall bounds.
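I won't try to reproduce their new counter here, but for reference, here is a minimal sketch of the standard tree-based private counter (in the style of Chan-Shi-Song and Dwork-Naor-Pitassi-Rothblum), which is the purely-additive-error baseline they improve on: every dyadic interval's partial sum is released with Laplace noise, and the running count at time t is assembled from the O(log t) completed intervals covering [1, t].

```python
import numpy as np

def private_running_counts(stream, eps, seed=0):
    """Tree-based private counter: releases a noisy running count after each
    element of the stream, with additive error polylogarithmic in the stream
    length."""
    rng = np.random.default_rng(seed)
    T = len(stream)
    levels = max(1, int(np.ceil(np.log2(T))) + 1)
    scale = levels / eps        # each element sits in one interval per level
    noisy = {}                  # (level, right_endpoint) -> noisy partial sum
    counts = []
    for t in range(1, T + 1):
        # Finalize every dyadic interval that ends exactly at time t.
        for lv in range(levels):
            size = 2 ** lv
            if t % size == 0:
                noisy[(lv, t)] = sum(stream[t - size:t]) + rng.laplace(0, scale)
        # Decompose [1, t] into completed dyadic intervals, largest first.
        total, pos = 0.0, 0
        for lv in reversed(range(levels)):
            size = 2 ** lv
            if pos + size <= t:
                total += noisy[(lv, pos + size)]
                pos += size
        counts.append(total)
    return counts

print(private_running_counts([1, 0, 1, 1, 0, 1, 1, 1], eps=1.0))
```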