Tuesday, November 25, 2014

Differential Privacy Workshop in London

There will be a differential privacy workshop in London in April, accepting submissions soon. If you have something you are working on, consider submitting here. Some highlights:

-- Although the workshop will not have proceedings (so you can as usual submit a talk and still submit it to a conference of your choosing), exceptional submissions will be invited to a special issue of the Journal of Privacy and Confidentiality.

-- The PC is pretty inter-disciplinary. This is certainly not just a Theory-A workshop. Work on privacy in all areas is on topic.

-- You get to hear Jon Ullman give a (presumably excellent) talk. 

TPDP 2015
First workshop on the Theory and Practice of Differential Privacy
18th April 2015, London, UK
Affiliated to ETAPS

Differential privacy is a promising approach to the privacy-preserving
release of data: it offers a strong guaranteed bound on the increase
in harm that a user incurs as a result of participating in a
differentially private data analysis.

Researchers in differential privacy come from several areas of computer
science, such as algorithms, programming languages, security, databases,
and machine learning, as well as from several areas of statistics and data
analysis. The workshop is intended to be an occasion for researchers
from these different research areas to discuss the recent developments
in the theory and practice of differential privacy.


The overall goal of TPDP is to stimulate the discussion on the
relevance of differentially private data analyses in practice. For
this reason, we seek contributions from different research areas of
computer science and statistics.

Authors are invited to submit a short abstract (4-5 pages maximum) of
their work by January 23, 2015. Abstracts must be written in English
and be submitted as a single PDF file at the EasyChair page for TPDP:

Submissions will be judged on originality, relevance, interest and
clarity. Submissions should describe novel works or works that have
already appeared elsewhere but that can stimulate the discussion
between the different communities. Accepted abstracts will
be presented at the workshop.

The workshop will not have formal proceedings, but we plan to have a
special issue of the Journal of Privacy and Confidentiality devoted to
TPDP. Authors presenting valuable contributions at the workshop will
be invited to submit a journal version of their work right after the
workshop.

**Important Dates**

-January 23, 2015 - Abstract Submission
-February 10, 2015 - Notification
-February 14, 2015 - Deadline early registration ETAPS
-April 18, 2015 - Workshop

-May 15, 2015 - Deadline for journal special issue


Specific topics of interest for the workshop include (but are not limited to):
theory of differential privacy,
verification techniques for differential privacy,
programming languages for differential privacy,
models for differential privacy,
trade-offs between privacy protection and analytic utility,
differential privacy and surveys,
relaxations of the differential privacy definition,
differential privacy vs other privacy notions and methods,
differential privacy and accuracy,
practical differential privacy,
implementations for differential privacy,
differential privacy and security,
applications of differential privacy.

**Invited Speakers**

Jonathan Ullman - Simons Fellow at Columbia University

Another invited speaker joint with HotSpot'15 to be confirmed.

**Program Committee**

Gilles Barthe - IMDEA Software
Konstantinos Chatzikokolakis - CNRS and LIX, Ecole Polytechnique
Kamalika Chaudhuri - UC San Diego
Graham Cormode - University of Warwick
George Danezis - University College London
Marco Gaboardi - University of Dundee
Matteo Maffei - CISPA, Saarland University
Catuscia Palamidessi - INRIA and LIX, Ecole Polytechnique
Benjamin C. Pierce - University of Pennsylvania
Aaron Roth - University of Pennsylvania
David Sands - Chalmers University of Technology
Chris Skinner - London School of Economics
Adam Smith - Pennsylvania State University
Carmela Troncoso - Gradiant
Salil Vadhan - Harvard University

Friday, August 15, 2014

Differential Privacy Book is Here

After a much longer time than either of us thought it would take, my book with Cynthia Dwork, "The Algorithmic Foundations of Differential Privacy" is finally available.

You have 3 options for obtaining a copy! (I must admit to not quite understanding the pricing model of our publisher).

  1. Hard Copy: You can buy a hard copy directly from NOW for $99 (http://www.nowpublishers.com/articles/foundations-and-trends-in-theoretical-computer-science/TCS-042/book-details) or from Amazon for $101 and free shipping (http://www.amazon.com/Algorithmic-Foundations-Differential-Privacy/dp/1601988184/ )
  2. "ebook" format (which I believe is just a downloadable pdf): If you don't have room on your book shelf, and are happy with a PDF, the book can be yours from NOW for only $240. This is a bargain -- coming in at 281 pages, this is less than 86 cents per digital "page".   (http://www.nowpublishers.com/articles/foundations-and-trends-in-theoretical-computer-science/TCS-042/book-details)
  3. Free download: The PDF is also available for free on my web page: http://www.cis.upenn.edu/~aaroth/privacybook.html

Friday, April 18, 2014

Lecture 12 -- Privacy Yields an Anti-Folk Theorem in Repeated Games

Last week, Kobbi Nissim gave us an excellent guest lecture on differential privacy and machine learning. The semester has gone by fast -- this week is our last lecture in the privacy and mechanism design class. (But stop by next week to hear the students present their research projects!)

Today we'll talk about infinitely repeated games. In an infinitely repeated game, n players repeatedly, in an infinite number of stages, play actions and obtain payoffs based on some commonly known stage game. Since the game is infinitely repeated, in order to make sense of players' total payoff, we employ a discount factor delta that specifies how much less valuable a dollar is tomorrow compared to a dollar today. (delta is some number in [0, 1).) In games of perfect monitoring, players perfectly observe what actions each of their opponents has played in past rounds, but in large n player games, it is much more natural to think about games of imperfect monitoring, in which agents see only some noisy signal of what their opponents have played.
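To make the role of the discount factor concrete, here is a minimal sketch of how a stream of stage payoffs is aggregated. The function name and the choice to leave the sum un-normalized are my own assumptions for illustration, and the infinite game is approximated by a long finite horizon.

```python
def discounted_payoff(stage_payoffs, delta):
    """Sum of delta**t * payoff_t over stages t = 0, 1, 2, ... (un-normalized; an assumption)."""
    if not 0 <= delta < 1:
        raise ValueError("delta must lie in [0, 1)")
    return sum((delta ** t) * u for t, u in enumerate(stage_payoffs))

# A payoff of 1 in every stage is worth about 1 / (1 - delta) in total,
# so a patient player (delta close to 1) cares a lot about the future.
long_run_value = discounted_payoff([1.0] * 1000, 0.9)
```

The closer delta is to 1, the more a threat of future punishment can outweigh a one-time gain today, which is what folk theorems exploit.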

For example, one natural signal players might observe in an anonymous game is a noisy histogram estimating what fraction of the population has played each type of action. (This is the kind of signal you might get if you see a random subsample of what people play -- for example, you have an estimate of how many people drove on each road on the way to work today by looking at traffic reports). Alternately, there may be some low dimensional signal (like the market price of some good) that everyone observes that is computed as a randomized function of everyone's actions today (e.g. how much of the good each person produced).
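As a rough illustration of the subsampling signal, the sketch below builds a noisy histogram from a random subsample of the actions played. The function name, the road labels, and the sample size are all hypothetical choices of mine, not anything from the paper.

```python
import random
from collections import Counter

def subsample_histogram(actions, sample_size, rng):
    """Estimate the fraction of players choosing each action from a random subsample."""
    sample = rng.sample(actions, sample_size)  # sample without replacement
    counts = Counter(sample)
    return {a: counts[a] / sample_size for a in sorted(set(actions))}

rng = random.Random(0)
actions = ["road_A"] * 700 + ["road_B"] * 300  # hypothetical population of 1000 drivers
signal = subsample_histogram(actions, 100, rng)
```

Any single player is unlikely to land in the subsample when the population is large, so one player's deviation shifts the distribution of this signal only slightly.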

A common theme in repeated games of all sorts is the folk theorem. Informally, folk theorems state that in repeated games, we should expect a huge multiplicity of equilibria, well beyond the equilibria we would see in the corresponding one-shot stage game. This is because players observe each other's past behavior, and so can threaten each other to behave in prescribed ways or else face punishment. Whether a folk theorem is a positive result or a negative result depends on whether you want to design behavior, or predict behavior. If you are a mechanism designer, a folk theorem might be good news -- you can try to encourage equilibrium behavior that has higher welfare than any equilibrium of the stage game. However, if you want to predict behavior, it is bad news -- there are now generically a huge multiplicity of very different equilibria, and some of them have much worse welfare than any equilibrium of the stage game.

In this lecture (following a paper joint with Mallesh Pai and Jon Ullman) we argue that:

  1. In large games, many natural signaling structures produce signal distributions that are differentially private in the actions of the players, where the privacy parameter tends to 0 as the size of the game gets large, and
  2. In any such game, for any discount factor delta, as the size of the game gets large, the set of equilibria of the repeated game collapses to the set of equilibria of the stage game. In other words, there are no "folk theorem equilibria" -- only the equilibria that already existed in the one shot game.

This could be interpreted in a couple of ways. On the one hand, this means that in large games, it might be harder to sustain cooperation (which is a negative result). On the other hand, since it shrinks the set of equilibria, it means that adding noise to the signaling structure in a large game generically improves the price of anarchy over equilibria of the repeated game, which is a positive result.

Friday, April 04, 2014

Lecture 10 -- Running Ascending Price Auctions that Make Sincere Bidding an Ex-Post Dominant Strategy

In the 10th lecture in our privacy and mechanism design class, we consider the problem of running an ascending price auction. An ascending price auction is just a generalization of what you normally see as an "auction" on TV -- rather than submitting your valuation in some kind of one-shot protocol, the prices of the goods gradually rise, and you take turns with other bidders making bids on the goods as a function of the current prices.

Why would you want to run such an auction when the VCG mechanism already can provide welfare optimal outcomes for every social choice function, while making truthful reporting a dominant strategy? People quote a couple of reasons:

  1. It might be hard to actually report your full valuation: in principle, you need to figure out exactly your value for every bundle you might receive, and it's difficult to pin down a number. In an ascending price auction, all you need to do is be able to point to your favorite good (or bundle of goods) that you would buy if the current prices were the final prices, which is often an easier task. 
  2. An ascending price auction can end without you having to reveal your full type. For example, in a single item second price auction, the highest bidder never has to reveal (even to the auctioneer) his value for the good -- only that it is higher than that of the second highest bidder. Hence, people might prefer such auctions for "privacy" reasons. 
In an ascending price auction, "truthful" reporting doesn't make sense, since nobody ever asks you to report your type. But we can ask for "sincere bidding", in which at each round bidders bid on the item that is truly their favorite, given the current prices. But there is a problem: we typically can't implement sincere bidding as a dominant strategy, because of the problem of threats. Consider the following simple example:

Suppose we have two unit demand bidders 1 and 2, and two goods for sale a and b. We have v_{1,a} = 1, v_{1,b} = \epsilon and v_{2,a} = 1/2, v_{2,b} = 1/2 - \epsilon. Suppose moreover that bidder 2 takes the following strategy: "Bid on good a. If bidder 1 bids on good a, then outbid him on whatever he bids on until the price is > 1." Against this strategy, bidder 1 cannot obtain non-negative utility if he bids on his favorite good (a), and so his best response is to place an insincere bid on good b. Moreover, bidder 2 has a clear motivation to take this threatening position -- he obtains substantially higher payoff than if players followed sincere bidding, since he gets his most preferred good without any competition. As a result of instances like these, typically ascending price auctions can implement sincere bidding at best as an (ex-post) Nash equilibrium.

In this lecture, we talk about how to implement an ascending auction such that the prices are differentially private in the bidding strategies of the players (and the allocation in the end is jointly differentially private). This fixes two of the problems above:
  1. The privacy guaranteed by the ascending price auction is no longer hand-wavy and qualitative, but rather precise and quantitative. 
  2. We get sincere bidding as an asymptotic ex-post dominant strategy for all players.
To get this result, we need only a mild large-market assumption: that the "supply" of each good is modestly large compared to the number of different types of goods -- but crucially we need to assume nothing about how bidder preferences are generated. 

The intuition, which we will appeal to again later, is that by running the auction privately, we have eliminated the possibility that players can distort incentives by threatening each other.

Saturday, March 29, 2014

Lecture 9 -- Purchasing Private Data from Privacy Sensitive Individuals

Yesterday in our privacy and mechanism design course, we were fortunate to have a guest lecture by David Xiao. David told us about his exciting recent paper, with Kobbi Nissim and Salil Vadhan, Redrawing the Boundaries on Purchasing Private Data from Privacy-Sensitive Individuals.

Consider the following scenario: An analyst wishes to conduct some medical study about an underlying population, but needs to obtain permission from each individual whose data he uses. On the one hand, he needs to buy data from a representative sample of the population so that his study is accurate. On the other hand, he needs to compensate individuals for their privacy costs, and would like to come up with a payment scheme that incentivizes them to report their true privacy costs, rather than inflating them for selfish gain. Finally, he wants the mechanism to be individually rational: that no rational agent should obtain negative utility by interacting with the analyst.

Because individuals' costs for privacy are a function of the method by which their reports are used to compute the outcome of the mechanism, rather than just a function of the outcome itself, this takes us outside of a standard mechanism design setting. What makes the problem tricky is that individuals' costs for privacy could quite plausibly be correlated with their private data. Suppose the analyst wishes to estimate the fraction of people in some population who have syphilis. It is reasonable to expect that syphilitics will on the whole want to be compensated more than healthy individuals for a loss of privacy. But this means that even computations on agents' reported costs for privacy (and independent of agents' supposedly private data) can lead to privacy loss for those agents, and so must be compensated.

Some years ago Arpita Ghosh and I studied this problem, and showed an impossibility result when making some (unreasonably) strong assumptions. One might have hoped that our result could be circumvented with one of several tweaks to the model. But no. David and his coauthors extend this impossibility result to have much wider applicability, making fewer assumptions on what the analyst is able to observe, and far fewer assumptions about the form of the privacy loss function of the agents. Their result is quite robust: Under extremely general circumstances, no truthful individually rational mechanism which makes finite payments can distinguish between two populations, in one of which everyone has syphilis, and in the other of which nobody does. This result says that no mechanism can simultaneously enjoy truthfulness, individual rationality, and non-trivial accuracy properties, and so without drastically relaxing the model of how people might value privacy, you must always give up on one of these.

They do propose one such relaxation, which seems to reduce to something like the assumption that contracting syphilis can only ever cause your costs for privacy to increase, never to decrease. But this is probably not the last word. I think that convincing answers for what to do in the face of their impressive impossibility result are still to be proposed, and finding them is a really interesting question.

Friday, March 21, 2014

Lecture 8 -- Implementing Correlated Equilibria Ex-Post in Incomplete Information Games

After a spring break hiatus, our class on privacy and mechanism design returns (with all of our students working on their course projects!)

In our third and final lecture on using mediators to implement equilibria of complete information games in settings of incomplete information, we ask how far we can push the agenda of obtaining ex-post implementations via the technique of differentially private equilibrium computation. Recall that last lecture we saw how to do this in large congestion games. Can we get similar results in arbitrary large games?

One obstacle is that we do not know good algorithms for computing Nash equilibria in arbitrary games at all, let alone privately. However, we do know how to compute arbitrarily good approximations to correlated equilibria in arbitrary n player games! In this lecture, we explore the game theoretic implications of private computation of correlated equilibria, and then show how to do it.

The punchline is you can still implement (correlated) equilibria of the complete information game as an ex-post Nash equilibrium of the incomplete information game, but you need a slightly stronger mediator (which has the power to verify player types if they decide to use the mediator).

To accomplish this goal, we introduce a couple of interesting tools: A powerful composition theorem in differential privacy due to Dwork, Rothblum, and Vadhan, and the multiplicative weights (or weighted majority, or polynomial weights, or hedge, or...) algorithm, which is a natural learning dynamic and can be used to quickly compute (coarse) correlated equilibria in arbitrary games.
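For readers who have not seen it, here is a minimal sketch of the basic multiplicative weights (hedge) update for a single player choosing among a few actions. This is the plain, non-private textbook version; the function name, the learning rate eta, and the assumption that losses lie in [0, 1] are mine, and the private variant from the lecture is not shown.

```python
import math

def hedge(loss_rounds, eta):
    """Run multiplicative weights over a sequence of loss vectors (entries assumed in [0, 1]).

    Returns the probability distribution over actions played in each round.
    """
    num_actions = len(loss_rounds[0])
    weights = [1.0] * num_actions
    played = []
    for losses in loss_rounds:
        total = sum(weights)
        played.append([w / total for w in weights])
        # Shrink each action's weight exponentially in the loss it just suffered.
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    return played

# Action 0 never incurs loss and action 1 always does, so play should drift to action 0.
distributions = hedge([[0.0, 1.0]] * 50, eta=0.5)
```

When every player in a game updates this way against the losses induced by the others, the empirical distribution of play approaches the set of coarse correlated equilibria, which is the connection the lecture relies on.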

Next week we will have a guest lecture, combined with our theory seminar, by David Xiao, who will tell us about his exciting recent work on Redrawing the Boundaries on Purchasing Data from Privacy Sensitive Individuals. 

Saturday, March 01, 2014

Lecture 7 -- Privacy Preserving Public Information for Sequential Games

Our seventh lecture was given by Jamie Morgenstern, about her very interesting paper joint with Avrim Blum, Ankit Sharma, and Adam Smith.

The birds-eye view is the following:

Suppose players (think financial institutions) take turns sequentially deciding which investments to make, from amongst a feasible set, which can be different for each player. In general, the profit that a player gets from an investment is a decreasing function of how many players previously have made the same investment. (Perhaps these investments are structured like pyramid schemes, where there is a big advantage in getting in early, or perhaps the market is somehow destabilized if there is too much capital invested in it).

We can study this interaction in various information models. In the complete information setting, each player sees exactly how much has been invested in each resource at the time that she arrives, and the unique dominant strategy solution of the game is for each player to invest greedily. They show that this solution achieves a good constant factor approximation to the optimal welfare.
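Greedy play in the complete information setting can be sketched in a few lines. The function name, the resource labels, and the particular decreasing profit function below are hypothetical and chosen only to show the mechanics.

```python
def greedy_play(feasible_sets, profit):
    """Each arriving player picks the feasible resource with the highest current profit.

    profit(resource, count) is the payoff from investing in `resource`
    when `count` earlier players have already invested in it.
    """
    counts = {}
    choices = []
    for feasible in feasible_sets:
        best = max(feasible, key=lambda r: profit(r, counts.get(r, 0)))
        choices.append(best)
        counts[best] = counts.get(best, 0) + 1
    return choices, counts

# Hypothetical example: profit is a base value split among everyone who has invested so far.
base_value = {"x": 10.0, "y": 6.0}
choices, counts = greedy_play(
    [["x", "y"], ["x", "y"], ["x"]],
    lambda r, c: base_value[r] / (c + 1),
)
```

Here the second player avoids the already crowded resource x, which is exactly the kind of decision that depends on seeing the current investment counts.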

But if the players are financial institutions, then their investments might represent sensitive trade secrets, and they may not want to share this information with others -- in which case the complete information setting seems unrealistic. This could be very bad news however -- if players have no information at all about what has gone on before their arrival, it's not hard to cook up plausible sounding behaviors for them which result in disastrous welfare outcomes.

So the paper asks: can we introduce a differentially private signal (so one that necessarily reveals little actionable information about each agent, and therefore one whose introduction the agents have little reason to object to) that nevertheless allows the market to achieve social welfare that approximates OPT?

Skipping over some details, this paper shows that the answer is yes. Making public a differentially private count of how much has been invested in each resource as the game plays out is enough to guarantee that sequential play (studied either as the simple behavioral strategy in which players imagine that the noisy counts are exactly correct, or any solution in which players play only undominated strategies) results in an outcome that has a bounded competitive ratio with OPT.

This paper also contains an interesting technique that will probably be useful in other contexts: they develop a new method of privately maintaining the count of a set of numbers that achieves better additive error as compared to previous work, at the cost of introducing some small multiplicative error. In the application they need counters for in this paper, this modification gives improved overall bounds.
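To give a feel for what a differentially private count looks like, here is a sketch of the standard Laplace mechanism for releasing a single count once. This is not the counter from the paper: maintaining a count privately as it changes over time needs a more careful construction, and the function name and parameter values here are my own assumptions.

```python
import random

def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for one release of a count with sensitivity 1."""
    # The difference of two independent Exp(rate=epsilon) draws is Laplace with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise

rng = random.Random(0)
releases = [noisy_count(100, 1.0, rng) for _ in range(20000)]
average_release = sum(releases) / len(releases)
```

The noise has mean zero and purely additive error on the order of 1/epsilon, which is the baseline that the paper's counter trades against a small multiplicative error.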