Saturday, March 29, 2014

Lecture 9 -- Purchasing Private Data from Privacy Sensitive Individuals

Yesterday in our privacy and mechanism design course, we were fortunate to have a guest lecture by David Xiao. David told us his exciting recent paper, with Kobbi Nissim and Salil Vadhan, Redrawing the Boundaries on Purchasing Private Data from Privacy-Sensitive Individuals.

Consider the following scenario: An analyst wishes to conduct some medical study about an underlying population, but needs to obtain permission from each individual whose data he uses. On the one hand, he needs to buy data from a representative sample of the population so that his study is accurate. On the other hand, he needs to compensate individuals for their privacy costs, and would like to come up with a payment scheme that incentivizes them to report their true privacy costs, rather than inflating them for selfish gain. Finally, he wants the mechanism to be individually rational: that no rational agent should obtain negative utility by interacting with the analyst.

Because individual's costs for privacy are a function of the method by which their reports are used to compute the outcome of the mechanism, rather than just a function of the outcome itself, this takes us outside of a standard mechanism design setting. What makes the problem tricky is that individual's costs for privacy could quite plausibly be correlated with their private data. Suppose the analyst wishes to estimate the fraction of people in some population who have syphilis. It is reasonable to expect that syphilitics will on the whole want to be compensated more than healthy individuals for a loss of privacy. But this means that even computations on agents reported costs for privacy (and independent of agent's supposedly private data) can lead to privacy loss for those agents, and so must be compensated.

Some years ago Arpita Ghosh and I studied this problem, and showed an impossibility result when making some (unreasonably) strong assumptions. One might have hoped that our result could be circumvented with one of several tweaks to the model. But no. David and his coauthors extend this impossibility result to have much wider applicability, making fewer assumptions on what the analyst is able to observe, and far fewer assumptions about the form of the privacy loss function of the agents. Their result is quite robust: Under extremely general circumstances, no truthful individually rational mechanism which makes finite payments can distinguish between two populations, in one of which everyone has syphilis, and in the other of which nobody does. This result says that no mechanism can simultaneously enjoy truthfulness, individual rationality, and non-trivial accuracy properties, and so without drastically relaxing the model of how people might value privacy, you must always give up on one of these.

They do propose one such relaxation, which seems to reduce to something like the assumption that contracting syphilis can only ever cause your costs for privacy to increase, never to decrease. But this is probably not the last word. I think that convincing answers for what to do in the face of their impressive impossibility result are still to be proposed, and is a really interesting question.

No comments: