Tuesday, May 24, 2016

Fairness in Learning

The very real problem of (un)fairness in algorithmic decision making in general, and machine learning in particular seems to have finally reached the forefront of public attention. Every day there is a new popular article about the topic. Just in the last few weeks, we have seen articles in the Times about built in bias in facebook, and an in-depth ProPublica study about racial bias in statistical models for predicting criminal recidivism. Earlier this month, the White House released a report on the challenges in promoting fairness in Big Data.

The tricky thing is saying something concrete and technical about this problem -- even defining what "fairness" is is delicate. There has been some good technical work in this area that I have long admired from afar -- see e.g. the "FATML" (Fairness and Transparency in Machine Learning) Workshop to get an idea of the range of work being done, and the folks doing it. People like Cynthia DworkMoritz HardtSolon BarocasSuresh VenkatasubramanianSorelle Friedler, Cathy O'Neil, and others have been doing important work thinking about these problems for quite some time. A particularly nice early paper that I recommend everyone interested in the area read is Fairness Through Awareness, by Dwork, Hardt, Pitassi, Reingold, and Zemel. It was first posted online in 2011(!), and in retrospect is quite prescient in its discussion of algorithmic fairness.

So I'm happy to finally have something interesting to say about the topic! My student Matthew JosephMichael KearnsJamie Morgenstern, and I just posted a new paper online that I'm excited about: Fairness in Learning: Classic and Contextual Bandits. I'll mostly let the paper speak for itself, but briefly, we write down a simple but (I think) compelling definition of fairness in a stylized general model of sequential decision making called the "contextual bandit setting". To keep a canonical problem in your mind, imagine the following: There are a bunch of different populations (say racial or socioeconomic groups), and you are a loan officer. Every day, an individual from each population applies for a loan. You get to see the loan application for each person (this is the "context"), and have to decide who to give the loan to. When you give out the loan, you observe some reward (e.g. you see if they paid back the loan), but you don't see what reward you -would- have gotten had you given the loan to someone else. Our fairness condition says roughly that an algorithm is "fair" if it never preferentially gives a loan to a less qualified applicant over a more qualified applicant -- where the quality of an applicant in our setting is precisely the probability that they pay back the loan. (It prohibits discriminating against qualified applicants on an individual basis -- even if  they happen to come from a population that is less credit-worthy on average, or from a population that the bank doesn't understand as well).

It might seem like this definition of fairness is entirely consistent with the profit motivation of a bank -- why would a bank ever want to give a loan to an applicant less likely to pay it back? Indeed, this would be true if the bank had nothing to learn -- i.e. if it already knew the optimal rule mapping loan applications to credit-worthiness. Said another way, implementing the optimal policy is entirely consistent with our fairness definition. Our main conceptual message is that fairness can nevertheless be an obstruction to learning the optimal policy.

What our results say is that "fairness" always has a cost in terms of the optimal learning rate achievable by algorithms in this setting. For some kinds of problems, the cost is mild in that the cost of fairness on the learning rate is only polynomial (e.g. when credit-worthiness is determined by a simple linear regression model on the features of a loan application). On the other hand, for other kinds of problems, the cost of fairness on the learning rate is severe, in that it can slow learning by an exponential factor (e.g. when credit-worthiness is determined by an AND of features in the loan application). Put another way, for the problems in which the cost of fairness is severe, if the bank were to use a fast learning algorithm (absent a fairness constraint), the algorithm might be "unfair" for a very long time, even if in the limit, once it learned the truly optimal policy, it would eventually be fair. One friction to fairness is that we don't live in the limit -- we are always in a state of learning.

Saturday, October 10, 2015

Penn is Hiring in Computer Science

Penn Engineering is embarking on a period of substantial growth, and the computer science department plans to hire lots of people over the next several years. (You can see a draft of the Penn Engineering plan for growth here. "Data Science and Computation" and Security are both priority areas).

This year, we are planning to hire for multiple positions, both junior and senior, with a focus on (among other things), machine learning. 

So if you are on the market (or thinking about it), send us your application -- apply here: https://facultysearches.provost.upenn.edu/postings/663 

(Unless you want to apply for an endowed chair in computer graphics -- then apply here: https://facultysearches.provost.upenn.edu/postings/664 )

Edit: I forgot to mention! Interest in machine learning is university wide -- both statistics and ESE also plan to hire in machine learning.
Here is the link for applying to the statistics position: https://statistics.wharton.upenn.edu/recruiting/facultypositions/
And here is the link for ESE: http://www.ese.upenn.edu/faculty-staff/index.php

Tuesday, June 09, 2015

An action packed Monday at FCRC

FCRC begins next week, and due to the wonders of co-location will be action packed. (Monday will have among other things EC workshops, as well as the first full day of STOC talks). I've found myself on the organizing committee of two of the three EC workshops, and Monday is also when our STOC paper is scheduled.

So here are three (partially) conflicting events you might be interested in:

NetEcon 2015: Highlights include excellent keynote speakers: Rakesh Vohra at  9:00 am, Eva Tardos at 2:00 pm, and R. Srikant at 4:00 pm

The Workshop on Algorithmic Game Theory and Data Science: This workshop is highlighting work in a very exciting area -- the intersection between "data science" and mechanism design -- which is at its very beginning, and I think ripe for lots of important work to be done. Its also attracted a great lineup of speakers (my student Steven will be giving the talk on our paper).

The STOC presentation of our paper, Preserving Statistical Validity in Adaptive Data Analysis: This is at 1:55pm, unfortunately conflicting with NetEcon -- but the attendees of the AGT + Data Science workshop will get a break in order to attend. Vitaly will be giving the talk, which I expect will be very good.

If you want to hear a longer (but probably less good) version of the talk, you can tune in tomorrow at 1pm eastern when I talk about this paper for TCS+

EDIT: The video of the talk is here. It went off with only one technical hitch.


Thursday, April 16, 2015

Algorithmic Game Theory and Data Science

With FOCS submissions sent off and EC rejections in hand, its time to think about presenting your work at a workshop, and chat with your colleagues doing similar things. 

If you are working on something at the intersection of algorithmic game theory and machine learning (this includes e.g. the sample complexity of auction design, or learning from revealed preferences, or learning from censored feedback), then you should consider the "Algorithmic Game Theory and Data Science" workshop that we'll be running during EC 2015. Conveniently, this is at FCRC, so if you were planning on attending EC or STOC (or SIGmetrics, or CCC, or...) you'll already be there in Portland. 
Deadline is in 10 days!

https://sites.google.com/site/agtanddatascienceworkshop2015/

---------------------------


Call for Papers

In conjunction with the Sixteenth ACM Conference on Economics and Computation (EC'15), we solicit submissions for the First Workshop on Algorithmic Game Theory and Data Science, to be held on June 15, 2015 in Portland, Oregon, USA.

Computer systems have become the primary mediator of social and economic interactions, enabling transactions at ever-increasing scale.  Mechanism design when done on a large scale needs to be a data-driven enterprise.  It seeks to optimize some objective with respect to a huge underlying population that the mechanism designer does not have direct access to.  Instead, the mechanism designer typically will have access to sampled behavior from that population (e.g. bid histories, or purchase decisions).  This means that, on the one hand, mechanism designers will need to bring to bear data-driven methodology from statistical learning theory, econometrics, and revealed preference theory.  On the other hand, strategic settings pose new challenges in data science, and approaches for learning and inference need to be adapted to account for strategization.  

The goal of this workshop is to frame the agenda for research at the interface of algorithms, game theory, and data science.  Papers from a rich set of experimental, empirical, and theoretical perspectives are invited.  Topics of interest include but are not limited to:
  • Can good mechanisms be learned by observing agent behavior in response to other mechanisms?  How hard is it to "learn'' a revenue maximizing auction given a sampled bid history?  How hard is it to learn a predictive model of customer purchase decisions, or better yet, a set of prices that will accurately maximize profit under these behavioral decisions? 
  • What is the sample complexity of mechanism design?  How much data is necessary to enable good mechanism design?
  • How does mechanism design affect inference?  Are outcomes of some mechanisms more informative than those of others from the viewpoint of inference?
  • How does inference affect mechanism design?  If participants know that their data is to be used for inference, how does this knowledge affect their behavior in a mechanism?
  • Can tools from computer science and game theory be used to contribute rigorous guarantees to interactive data analysis?  Strategic interactions between a mechanism and a user base are often interactive (e.g. in the case of an ascending price auction, or repeated interaction with a customer and an online retailer), which is a setting in which traditional methods for preventing data over-fitting are weak.
  • Is data an economic model? Can data be used to evaluate or replace existing economic models?  What is the consequence for game theory and economics for replacing the model with data.

Submission Instructions

Any submission format between abstracts and full papers will be considered.  Abstracts may be rejected if we cannot sufficiently evaluate their contribution.  Full papers will be evaluated after page 10 only at the discretion of the committee.
We solicit both new work and work recently published or soon to be published in another venue.  For submissions of the latter kind, authors must clearly state the venue of publication.  This workshop will have no published proceedings.  Papers appearing in published conference proceedings or journals subsequent to EC 2014 will be considered, though preference may be given to papers that have not yet appeared.  Papers that have appeared or are to appear at EC or affiliated workshops will not be considered.
Authors are encouraged to provide a link to an online version of the paper (such as on arXiv).  If accepted, such papers will be linked via an index to give an informal record of the workshop.
All submissions should be sent electronically to AGTDataScienceWorkshop15@gmail.com on or before April 26th, 2015.  Notification of acceptance will be on May 11, 2015.

Organizing Committee

Shuchi Chawla, U. of Wisconsin 
Hu Fu, Microsoft Research
Jason Hartline, Northwestern U.
Denis Nekipelov, U. of Virginia
Aaron Roth, U. of Pennsylvania
Kane Sweeney, eBay and Stubhub

Tuesday, April 07, 2015

Netecon deadline in two weeks

A reminder that the NetEcon workshop deadline is coming up in two weeks. If you plan to be at FCRC (for e.g. STOC, or EC, or Sigmetrics), this will be a great place to present your work and get it seen by both the EC and the SIGmetrics community. There's also a great lineup of invited talks (abstracts here: http://netecon.eurecom.fr/NetEcon2015/keynotes.html ) by R. Srikant, Rakesh Vohra, and Eva Tardos.

Here is the call: http://netecon.eurecom.fr/NetEcon2015/index.html

The submission deadline is April 22.

Sunday, February 22, 2015

STOC 2015 Call for Workshops and Tutorials

Sanjeev Khanna and Chandra Chekuri are co-chairing the STOC 2015 workshops and tutorials, and are looking for submissions. The call is below:

Call for STOC 2015 Workshop and Tutorial Proposals

  • Workshop and Tutorial Day: Sunday, June 14, 2015
  • Workshop and Tutorial Co-Chairs: Chandra Chekuri and Sanjeev Khanna
  • Submission deadline: March 20, 2015
  • Notification: March 30, 2015
On Sunday, June 14, immediately preceding the main conference, STOC 2015 will hold a workshop-and-tutorials day. We invite groups of interested researchers to submit workshop or tutorial proposals. The goal of a workshop is to provide an informal forum for researchers to discuss important research questions, directions, and challenges. Connections between theoretical computer science and other areas, topics that are not well represented at STOC, and open problems are encouraged as workshop topics. Organizers are completely free to choose their workshop formats (invited speakers, panel discussions, etc.). The program for June 14th may also involve tutorials, each consisting of 1-2 survey talks on a particular area, and we welcome tutorial proposals as well.
STOC does not have funds to pay travel expenses or honoraria to invited workshop and tutorial speakers. Workshop and tutorials attendance will be free, and there is no separate registration for attending them. STOC will support coffee breaks for the workshops/tutorials attendees but no lunch will be provided. Note that STOC registration neither includes FCRC registration for Sunday, June 14 nor covers lunch for Sunday, June 14. Workshop/tutorial attendees who wish to attend another FCRC conference on June 14, would need to register for that conference.

Proposal submission

Workshop and tutorial proposals should be no longer than 2 pages. Please include a list of names and email addresses of the organizers, a description of the topic and the goals of the workshop or tutorial, the proposed workshop format (invited talks, contributed talks, panel, etc.), and proposed or tentatively confirmed speakers if known. Please also indicate the preferred length of time for your workshop or tutorial, along with the minimum acceptable time. We anticipate a 4-5 hour block for each workshop and a 2-4 hour block for each tutorial. Please feel free to contact the Workshop and Tutorial Co-Chairs at the email addresses below if you have questions about workshop or tutorial proposals.

Submission deadline

Proposals should be submitted by March 20, 2015, via email to chekuri@illinois.edu and sanjeev@cis.upenn.edu. Proposers will be notified by March 30, 2015, about whether their proposals have been accepted.

http://acm-stoc.org/stoc2015/callforworkshops.html

Friday, February 13, 2015

Bringing Differential Privacy to the Masses <3

This Valentines Day, we'll be bringing differential privacy to the (scientific) masses, with a session at the AAAS annual meeting in San Jose.

If you happen to be attending, you should stop by: https://aaas.confex.com/aaas/2015/webprogram/Session9556.html

Wednesday, January 28, 2015

NetEcon 2015

Patrick Loiseau, Adam Wierman, and I are co-chairing the 2015 NetEcon workshop, to be held in conjunction with EC and Sigmetrics at this year's FCRC in Portland. You should stop by if you are attending any of the FCRC conferences, (and submit a paper). If you haven't been to a previous iteration, this workshop brings together people interested in game theory from both the networking and theory/AI communities, so is a great place to present work that might be of interest to both communities, or to pick up problems that are interesting from a different community than yours.

We've got a great set of invited speakers: R. Srikant, Ricky Vohra, and Eva Tardos.

So start thinking about what you want to submit -- you've only got about 3 months.
*******************************************************************
Call For Papers:

         NetEcon 2015: The 10th Workshop on the Economics of Networks, 
Systems and Computation
         At FCRC 2015, in conjunction with ACM EC and ACM SIGMETRICS
         Monday, June 15, 2015 (Portland, Oregon, USA)

http://netecon.eurecom.fr/NetEcon2015/


*******************************************************************
INVITED SPEAKERS

     * *R. Srikant*, University of Illinois at Urbana-Champaign
     * *Rakesh V. Vohra*, University of Pennsylvania
     * *Eva Tardos*, Cornell University


*******************************************************************
CALL FOR PAPERS

Today's communication networks and networked systems are highly complex 
and heterogeneous, and are often owned by multiple profit-making 
entities. For new technologies or infrastructure designs to be adopted, 
they must not be only based on sound engineering performance 
considerations but also present the right economic incentives. Recent 
changes in regulations of the telecommunication industry make such 
economic considerations even more urgent. For instance, concerns such as 
network neutrality have a significant impact on the evolution of 
communication networks.

At the same time, communication networks and networked systems support 
increasing economic activity based on applications and services such as 
cloud computing, social networks, and peer-to-peer networks. These 
applications pose new challenges such as the development of good pricing 
and incentive mechanisms to promote effective system-wide behavior. In 
relation to these applications, security and privacy also require 
consideration of economic aspects to be fully understood.

The aim of NetEcon is to foster discussions on the application of 
economic and game-theoretic models and principles to address challenges 
in the development of networks and network-based applications and 
services. NetEcon was established in 2006 (succeeding to the P2PECON, 
IBC and PINS workshops) and merged with the W-PIN workshop in 2013. We 
invite submission of extended abstracts describing original research on 
theoretical/methodological contributions or on applications to cases of 
interest. It is our hope that NetEcon will serve as a feeder workshop, 
i.e., that expanded, polished versions of extended abstracts will appear 
later in major conference proceedings and refereed journals of relevant 
research communities.

Topics of interest include (but are not limited to):

     * Pricing of resources in communication networks, grids, and cloud 
computing
     * Pricing of information goods and services; copyright issues, 
effect of network externalities (e.g., in social network)
     * Economic issues in universal broadband access; economics of 
interconnection and peering
     * Effects of market structure and regulations (e.g., network 
neutrality)
     * Economics of network security and privacy; valuation of personal data
     * Auctions with applications to networks: spectrum auctions, 
auction-based marketplaces for network and cloud resources
     * Incentive mechanisms for networks: peer-to-peer systems, clouds, 
wireless networks, spam prevention, security
     * Methods for engineering incentives and disincentives (e.g., 
reputation, trust, control, accountability, anonymity)
     * Empirical studies of strategic behavior (or the lack thereof) in 
existing, deployed systems
     * Design of incentive-aware network architectures and protocols
     * Game-theoretic models and techniques for network economics: large 
games, learning, mechanism design, interaction of game theory and 
information theory or queuing theory, information exchange, diffusion, 
dynamics of cooperation and network formation, trades in social and 
economic networks
     * Algorithmic mechanism design for network systems
     * Critiques of existing models and solution concepts, as well as 
proposals of better models and solution concepts
     * Studies of open collaboration, peer production, crowdsourcing, 
and human computation.


*******************************************************************
SUBMISSION FORMATTING GUIDELINES AND PROCEEDINGS

Submissions must be in the form of extended abstracts of 3-4 pages, 
including all figures, tables, references, appendices, etc. They must be 
formatted according to the standard alternate ACM PER double column 
format using letter paper. You are encouraged to use the ACM 
sig-alternate-per latex template 
(http://www.sigmetrics.org/sig-alternate-per.cls).

Accepted extended abstracts will be published in a special issue of ACM 
Performance Evaluation Review (PER) and will be available online through 
ACM portal digital library. Authors of accepted abstracts grant ACM 
permission to publish them in print and digital formats.

Note that authors retain the copyright of their work published in ACM 
PER, with freedom to submit it elsewhere. Yet, authors for whom 
publication of a 3-4 pages extended abstract in the NetEcon 2015 
proceedings would preclude later publication of an expanded version in 
the relevant venue may elect to contribute only a one-page abstract of 
their submitted extended abstract to the NetEcon 2015 proceedings. Such 
an abstract should include the URL of a working paper or preprint that 
contains the main results presented at the NetEcon workshop. Authors 
will make this decision after receiving a notice of acceptance.

If the number of excellent submissions is larger than we have space to 
allot presentations for, some authors will be offered the opportunity to 
present their work during a poster session.


*******************************************************************
COMMITTES

PC CHAIRS
     Patrick Loiseau (EURECOM, France)
     Aaron Roth (UPenn, USA)
     Adam Wierman (Caltech, USA)

WEBMASTER
     Michela Chessa (EURECOM, France)

TECHNICAL PROGRAM COMMITTEE
     Matthew Andrews (Alcatel-Lucent Bell Labs, USA)
     Itai Ashlagi (MIT, USA)
     Moshe Babaioff (Microsoft Research, Israel)
     Tamer BaÅŸar (University of Illinois, Urbana-Champaign, USA)
     Bobby Bhattarcharjee (University of Maryland, USA)
     Rainer Böhme (WWU Münster, Germany)
     Kostas Bimpikis (Stanford University, USA)
     Eilyan Bitar (Cornell University, USA)
     Augustin Chaintreau (Columbia University, USA)
     Michela Chessa (EURECOM, France)
     kc claffy (CAIDA / UC San Diego, USA)
     Costas Courcoubetis (SUTD, Singapore and AUEB, Greece)
     Amogh Dhamdhere (CAIDA / UC San Diego, USA)
     Constantine Dovrolis (GeorgiaTech, USA)
     Rachid Elazouzi (University of Avignon, France)
     Sergey Gorinsky (Institute IMDEA Networks, Spain)
     Jens Grossklags (The Pennsylvania State University, USA)
     Roch Guerin (Washington University in St. Louis, USA)
     Nidhi Hegde (Alcatel-Lucent Bell Labs, France)
     Ekram Hossain (University of Manitoba, Canada)
     Stratis Ioannidis (Yahoo! labs, USA)
     Krisnamurthy Iyer (Cornell University, USA)
     Rahul Jain (USC, USA)
     Ian Kash (Microsoft Research, UK)
     David Kempe (USC, USA)
     Peter Key (Microsoft Research, UK)
     Nikolaos Laoutaris (Telefonica Research, Spain)
     Dave Levin (University of Maryland, USA)
     Patrick Loiseau (EURECOM, France) -- co-chair
     Brendan Lucier (Microsoft Research, USA)
     John C. S. Lui (The Chinese University of Hong Kong, Hong Kong)
     Patrick Maillé (Institut Mines-Telecom / Telecom Bretagne, France)
     Jason Marden (University of Colorado, Boulder, USA)
     Ravi Mazumdar (University of Waterloo, Canada)
     Jeonghoon Mo (Yonsei University, South Korea)
     John Musacchio (UC Santa Cruz, USA)
     Andrew Odlyzko (University of Minnesota, Minneapolis, USA)
     Aaron Roth (UPenn, USA) -- co-chair
     Galina Schwartz (UC Berkeley, USA)
     Paul G. Spirakis (University of Liverpool, UK and CTI, Greece)
     Nicolás Stier Moses (Facebook Data Science, USA)
     Vijay Subramanian (University of Michigan, Ann Arbor, USA)
     John N. Tsitsiklis (MIT, USA)
     Bruno Tuffin (Inria, France)
     Adrian Vetta (McGill University, Canada)
     Steven Weber (Drexel University, USA)
     Adam Wierman (Caltech, USA) -- co-chair


*******************************************************************
IMPORTANT DATES

     * Wednesday April 22, 2015, 11:59pm PST: Submission deadline (firm)
     * Wednesday May 13, 2015: Notification to authors
     * Monday June 8, 2015: Final version for the workshop's website due
     * Monday June 15, 2015: Workshop in Portland
     * Monday July 13, 2015: Final version for the ACM PER proceedings due


*******************************************************************
ADDITIONAL INFORMATION

For more information, please contact the organizers or visit the 
workshop website: http://netecon.eurecom.fr/NetEcon2015/.

Tuesday, November 25, 2014

Differential Privacy Workshop in London

There will be a differential privacy workshop in London in April, accepting submissions soon. If you have something you are working on, consider submitting here. Some highlights:

-- Although the workshop will not have proceedings (so you can as usual submit a talk and still submit it to a conference of your choosing), exceptional submissions will be invited to a special issue of the Journal of Privacy and Confidentiality.

-- The PC is pretty inter-disciplinary. This is certainly not just a Theory-A workshop. Work on privacy in all areas is on topic.

-- You get to hear Jon Ullman give a (presumably excellent) talk. 


CALL FOR PAPERS
TPDP 2015
First workshop on the Theory and Practice of Differential Privacy
18th April 2015, London, UK
Affiliated to ETAPS
http://tpdp.computing.dundee.ac.uk

Differential privacy is a promising approach to the privacy-preserving
release of data: it offers a strong guaranteed bound on the increase
in harm that a user incurs as a result of participating in a
differentially private data analysis.

Researchers in differential privacy come from several area of computer
science as algorithms, programming languages, security, databases,
machine learning, as well as from several areas of statistics and data
analysis. The workshop is intended to be an occasion for researchers
from these different research areas to discuss the recent developments
in the theory and practice of differential privacy.

**Submissions**

The overall goal of TPDP is to stimulate the discussion on the
relevance of differentially private data analyses in practice. For
this reason, we seek contributions from different research areas of
computer science and statistics.

Authors are invited to submit a short abstract (4-5 pages maximum) of
their work by January 23, 2015. Abstracts must be written in English
and be submitted as a single PDF file at the EasyChair page for TPDP:
https://easychair.org/conferences/?conf=3Dtpdp2015

Submissions will be judged on originality, relevance, interest and
clarity. Submission should describe novel works or works that have
already appeared elsewhere but that can stimulate the discussion
between the different communities. Accepted abstracts will
be presented at the workshop.

The workshop will not have formal proceedings, but we plan to have a
special issue of the Journal of Privacy and Confidentiality devoted to
TPDP. Authors presenting valuable contributions at the workshop will
be invited to submit a journal version of their work right after the
workshop.


**Important Dates**

-January 23, 2015 - Abstract Submission
-February 10, 2015 - Notification
-February 14, 2015 - Deadline early registration ETAPS
-April 18, 2015 - Workshop

-May 15, 2015 Deadline for journal special issue


**Topics**

Specific topics of interest for the workshop include (but are not limited t=
o):
theory of differential privacy,
verification techniques for differential privacy,
programming languages for differential privacy,
models for differential privacy,
trade-offs between privacy protection and analytic utility,
differential privacy and surveys,
relaxations of the differential privacy definition,
differential privacy vs other privacy notions and methods,
differential privacy and accuracy,
practical differential privacy,
implementations for differential privacy,
differential privacy and security,
applications of differential privacy.


**Invited Speakers**

Jonathan Ullman - Simons Fellow at Columbia University,

Another invited speaker joint with HotSpot'15 to be confirmed.


**Program Committee**

Gilles Barthe - IMDEA Software
Konstantinos Chatzikokolakis - CNRS and LIX, Ecole Polytechnique
Kamalika Chaudhuri - UC San Diego
Graham Cormode - University of Warwick
George Danezis - University College London
Marco Gaboardi - University of Dundee
Matteo Maffei - CISPA, Saarland University
Catuscia Palamidessi - INRIA and LIX, Ecole Polytechnique
Benjamin C. Pierce - University of Pennsylvania
Aaron Roth - University of Pennsylvania
David Sands - Chalmers University of Technology
Chris Skinner - London School of Economics
Adam Smith - Pennsylvania State University
Carmela Troncoso - Gradiant
Salil Vadhan - Harvard University

Friday, August 15, 2014

Differential Privacy Book is Here

After a much longer time than either of us thought it would take, my book with Cynthia Dwork, "The Algorithmic Foundations of Differential Privacy" is finally available.

You have 3 options for obtaining a copy! (I must admit to not quite understanding the pricing model of our publisher).


  1. Hard Copy: You can buy a hard copy directly from NOW for $99 (http://www.nowpublishers.com/articles/foundations-and-trends-in-theoretical-computer-science/TCS-042/book-details) or from Amazon for $101 and free shipping (http://www.amazon.com/Algorithmic-Foundations-Differential-Privacy/dp/1601988184/ )
  2. "ebook" format (which I believe is just a downloadable pdf): If you don't have room on your book shelf, and are happy with a PDF, the book can be yours from NOW for only $240. This is a bargain -- coming in at 281 pages, this is less than 86 cents per digital "page".   (http://www.nowpublishers.com/articles/foundations-and-trends-in-theoretical-computer-science/TCS-042/book-details)
  3. Free download: The PDF is also available for free on my web page: http://www.cis.upenn.edu/~aaroth/privacybook.html

Friday, April 18, 2014

Lecture 12 -- Privacy Yields an Anti-Folk Theorem in Repeated Games

Last week, Kobbi Nissim gave us an excellent guest lecture on differential privacy and machine learning. The semester has gone by fast -- this week is our last lecture in the privacy and mechanism design class. (But stop by next week to hear the students present their research projects!)

Today we'll talk about infinitely repeated games. In an infinitely repeated game, n players repeatedly, in an infinite number of stages, play actions and obtain payoffs based on some commonly known stage game. Since the game is infinitely repeated, in order to make sense of players total payoff, we employ a discount factor delta that specifies how much less valuable a dollar is tomorrow compared to a dollar today. (delta is some number in [0, 1) ). In games of perfect monitoring, players perfectly observe what actions each of their opponents have played in past rounds, but in large n player games, it is much more natural to think about games of imperfect monitoring, in which agents see only some noisy signal of what their opponents have played.

For example, one natural signal players might observe in an anonymous game is a noisy histogram estimating what fraction of the population has played each type of action. (This is the kind of signal you might get if you see a random subsample of what people play -- for example, you have an estimate of how many people drove on each road on the way to work today by looking at traffic reports). Alternately, there may be some low dimensional signal (like the market price of some good) that everyone observes that is computed as a randomized function of everyone's actions today (e.g. how much of the good each person produced).

A common theme in repeated games of all sorts are folk theorems. Informally, these theorems state that in repeated games, we should expect a huge multiplicity of equilibria, well beyond the equilibria we would see in the corresponding one-shot stage game. This is because players observe each other's past behavior, and so can threaten each other to behave in prescribed ways or else face punishment. Whether or not a folk theorem is a positive result or a negative result depends on whether you want to design behavior, or predict behavior. If you are a mechanism designer, a folk theorem might be good news -- you can try and encourage equilibrium behavior that has higher welfare than any equilibrium of the stage game. However, if you want to predict behavior, it is bad news -- there are now generically a huge multiplicity of very different equilibria, and some of them have much worse welfare than any equilibrium of the stage game.

In this lecture (following a paper joint with Mallesh Pai and Jon Ullman) we argue that:

  1. In large games, many natural signaling structures produce signal distributions that are differentially private in the actions of the players, where the privacy parameters tends to 0 as the size of the game gets large, and
  2. In any such game, for any discount factor delta, as the size of the game gets large, the set of equilibria of the repeated game collapse to the set of equilibria of the stage game. In other words, there are no "folk theorem equilibria" -- only the equilibria that already existed in the one shot game. 
This could be interpreted in a couple of ways. On the one hand, this means that in large games, it might be harder to sustain cooperation (which is a negative result). On the other hand, since it shrinks the set of equilibria, it means that adding noise to the signaling structure in a large game generically improves the price of anarchy over equilibria of the repeated game, which is a positive result. 

Friday, April 04, 2014

Lecture 10 -- Running Ascending Price Auctions that Make Sincere Bidding an Ex-Post Dominant Strategy

In the 10th lecture in our privacy and mechanism design class, we consider the problem of running an ascending price auction. An ascending price auction is just a generalization of what you normally see as an "auction" on TV -- rather than submitting your valuation in some kind of one-shot protocol, the prices of the goods gradually rise, and you take turns with other bidders making bids on the goods as a function of the current prices.

Why would you want to run such an auction when the VCG mechanism already can provide welfare optimal outcomes for every social choice function, while making truthful reporting a dominant strategy? People quote a couple of reasons:

  1. It might be hard to actually report your full valuation: in principle, you need to figure out exactly your value for every bundle you might receive, and its difficult to pin down a number. In an ascending price auction, all you need to do is be able to point to your favorite good (or bundle of goods) that you would buy if the current prices were the final prices, which is often an easier task. 
  2. An ascending price auction can end without you having to reveal your full type. For example, in a single item second price auction, the highest bidder never has to reveal (even to the auctioneer) his value for the good -- only that it is higher than that of the second highest bidder. Hence, people might prefer such auctions for "privacy" reasons. 
In an ascending price auction, "truthful" reporting doesn't make sense, since nobody ever asks you to report your type. But we can ask for "sincere bidding", in which bidders truthfully bid on the item at each round that is their favorite, given the current prices. But there is a problem: we typically can't implement sincere bidding as a dominant strategy, because of the problem of threats. Consider the following simple example:

Suppose we have two unit demand bidders 1 and 2, and two goods for sale a and b. We have v_{1,a} = 1, v_{1,b} = epsilon and v_{2,a} = 1/2, v_{2, b} = 1/2 - \epsilon. Suppose moreover that bidder 2 takes the following strategy: "Bid on good a. If bidder 1 bids on good a, then outbid him on whatever he bids on until the price is > 1.'' Against this strategy, bidder 1 cannot obtain non-negative utility if he bids on his favorite good (a), and so his best response is to place an insincere bid on good 2. Moreover, bidder 2 has a clear motivation to take this threatening position -- he obtains substantially higher payoff than if players followed sincere bidding, since he gets his most preferred good without any competition. As a result of instances like these, typically ascending price auctions can implement sincere bidding at best as an (ex-post) Nash equilibirum. 

In this lecture, we talk about how to implement an ascending auction such that the prices are differentially private in the bidding strategies of the players (and the allocation in the end is jointly differentially private). This fixes two of the problems above:
  1. The privacy guaranteed by the ascending price auction is no longer hand-wavy and qualitative, but rather precise and quantitative. 
  2. We get sincere bidding as an asymptotic ex-post dominant strategy for all players.
To get this result, we need only a mild large-market assumption: that the "supply" of each good is modestly large compared to the number of different types of goods -- but crucially we need to assume nothing about how bidder preferences are generated. 

The intuition, which we will appeal to again later, is that by running the auction privately, we have eliminated the possibility that players can distort incentives by threatening each other.


Saturday, March 29, 2014

Lecture 9 -- Purchasing Private Data from Privacy Sensitive Individuals

Yesterday in our privacy and mechanism design course, we were fortunate to have a guest lecture by David Xiao. David told us his exciting recent paper, with Kobbi Nissim and Salil Vadhan, Redrawing the Boundaries on Purchasing Private Data from Privacy-Sensitive Individuals.

Consider the following scenario: An analyst wishes to conduct some medical study about an underlying population, but needs to obtain permission from each individual whose data he uses. On the one hand, he needs to buy data from a representative sample of the population so that his study is accurate. On the other hand, he needs to compensate individuals for their privacy costs, and would like to come up with a payment scheme that incentivizes them to report their true privacy costs, rather than inflating them for selfish gain. Finally, he wants the mechanism to be individually rational: that no rational agent should obtain negative utility by interacting with the analyst.

Because individual's costs for privacy are a function of the method by which their reports are used to compute the outcome of the mechanism, rather than just a function of the outcome itself, this takes us outside of a standard mechanism design setting. What makes the problem tricky is that individual's costs for privacy could quite plausibly be correlated with their private data. Suppose the analyst wishes to estimate the fraction of people in some population who have syphilis. It is reasonable to expect that syphilitics will on the whole want to be compensated more than healthy individuals for a loss of privacy. But this means that even computations on agents reported costs for privacy (and independent of agent's supposedly private data) can lead to privacy loss for those agents, and so must be compensated.

Some years ago Arpita Ghosh and I studied this problem, and showed an impossibility result when making some (unreasonably) strong assumptions. One might have hoped that our result could be circumvented with one of several tweaks to the model. But no. David and his coauthors extend this impossibility result to have much wider applicability, making fewer assumptions on what the analyst is able to observe, and far fewer assumptions about the form of the privacy loss function of the agents. Their result is quite robust: Under extremely general circumstances, no truthful individually rational mechanism which makes finite payments can distinguish between two populations, in one of which everyone has syphilis, and in the other of which nobody does. This result says that no mechanism can simultaneously enjoy truthfulness, individual rationality, and non-trivial accuracy properties, and so without drastically relaxing the model of how people might value privacy, you must always give up on one of these.

They do propose one such relaxation, which seems to reduce to something like the assumption that contracting syphilis can only ever cause your costs for privacy to increase, never to decrease. But this is probably not the last word. I think that convincing answers for what to do in the face of their impressive impossibility result are still to be proposed, and is a really interesting question.

Friday, March 21, 2014

Lecture 8 -- Implementing Correlated Equilibria Ex-Post in Incomplete Information Games

After a spring break hiatus, our class on privacy and mechanism design returns (with all of our students working on their course projects!)

In our third and final lecture on using mediators to implement equilibrium of complete information games in settings of incomplete information, we ask how far we can push the agenda of obtaining ex-post implementations via the technique of differentially private equilibrium computation. Recall that last lecture we saw how to do this in large congestion games. Can we get similar results in arbitrary large games?

One obstacle is that we do not know good algorithms for computing Nash equilibria in arbitrary games at all, let alone privately. However, we do know how to compute arbitrarily good approximations to correlated equilibria in arbitrary n player games! In this lecture, we explore the game theoretic implications of private computation of correlated equilibria, and then show how to do it.

The punchline is you can still implement (correlated) equilibria of the complete information game as an ex-post Nash equilibrium of the incomplete information game, but you need a slightly stronger mediator (which has the power to verify player types if they decide to use the mediator).

To accomplish this goal, we introduce a couple of interesting tools: A powerful composition theorem in differential privacy due to Dwork, Rothblum, and Vadhan, and the multiplicative weights (or weighted majority, or polynomial weights, or hedge, or...) algorithm, which is a natural learning dynamic and can be used to quickly compute (coarse) correlated equilibria in arbitrary games.

Next week we will have a guest lecture, combined with our theory seminar, by David Xiao, who will tell us about his exciting recent work on Redrawing the Boundaries on Purchasing Data from Privacy Sensitive Individuals. 

Saturday, March 01, 2014

Lecture 7 -- Privacy Preserving Public Information for Sequential Games

Our seventh lecture was given by Jamie Morgenstern, about her very interesting paper joint with Avrim Blum, Ankit Sharma, and Adam Smith.

The birds-eye view is the following:

Suppose players (think financial institutions) take turns sequentially deciding which investments to make, from amongst a feasible set, which can be different for each player. In general, the profit that a player gets from an investment is a decreasing function of how many players previously have made the same investment. (Perhaps these investments are structured like pyramid schemes, where there is a big advantage in getting in early, or perhaps the market is somehow destabilized if there is too much capital invested in it).

We can study this interaction in various information models. In the complete information setting, each player sees exactly how much has been invested in each resource at the time that she arrives, and the unique dominant strategy solution of the game is for each player to invest greedily. They show that this solution achieves a good constant factor approximation to the optimal welfare.

But if the players are financial institutions, then their investments might represent sensitive trade secrets, and they may not want to share this information with others -- in which case the complete information setting seems unrealistic. This could be very bad news however -- if players have no information at all about what has gone on before their arrival, its not hard to cook up plausible sounding behaviors for them which result in disastrous welfare outcomes.

So the paper asks: can we introduce a differentially private signal (so one that necessarily reveals little actionable information about each agent, and therefore one whose introduction the agents have little reason to object to) that nevertheless allows the market to achieve social welfare that approximates OPT.

Skipping over some details, this paper shows that the answer is yes. Making public a differentially private count of how much has been invested in each resource as the game plays out is enough to guarantee that sequential play (studied either as the simple behavioral strategy in which players imagine that the noisy counts are exactly correct, or any solution in which players play only undominated strategies) results in an outcome that has a bounded competitive ratio with OPT.

This paper also contains an interesting technique that will probably be useful in other contexts: they develop a new method of privately maintaining the count of a set of numbers that achieves better additive error as compared to previous work, at the cost of introducing some small multiplicative error. In the application they need counters for in this paper, this modification gives improved overall bounds.

Friday, February 21, 2014

Lecture 6 -- Implementing Nash equilibria ex-post in incomplete information games (Part 2)

In Lecture 6 of our Privacy and Mechanism Design course, we continue our study of the use of differential privacy to implement equilibrium of complete information games -ex post- in settings of incomplete information augmented with an extremely weak "mediator".

Last class, we proved that to accomplish this task, it would be sufficient to design an algorithm that:

  1. Satisfied (joint) differential privacy, and...
  2. Computed an (asymptotic) Nash equilibrium of the complete information game induced by reported types. 
We also took a brief detour into private algorithmic techniques, and learned how to privately maintain a count on a stream of numbers. 

This class, we partially implement this agenda. We define large games and congestion games, and give an algorithm for privately computing Nash equilibrium in large congestion games, which uses our newly learned ability to count. We conclude that in such games, with the help of a weak mediator, we can enjoy the nice properties of the pure strategy Nash equilibria of these games in the complete information setting, even in incomplete information settings when player types can be arbitrary (i.e. not necessarily drawn from any prior distribution). 

There remains one more lecture in this series -- we will show that if we strengthen the power of our weak mediators somewhat, by giving them the ability to verify reported player types, then we can extend these results to arbitrary large games (i.e. not necessarily congestion games) by privately computing correlated equilibria. 

This last part will have to wait a couple of weeks though. Next week, we will have Jamie Morgenstern giving us a guest lecture on her very interesting recent work joint with Avrim Blum, Ankit Sharma, and Adam Smith. 

Sunday, February 16, 2014

Lecture 4 -- The Empirical Implications of Privacy Aware Choice

Just barely out of order, the notes for Lecture 4 of our privacy and mechanism design class are now online.

In this lecture, Rachel Cummings told us about her very interesting work with Federico Echenique and Adam Wierman, studying what implications preferences for privacy might have in the revealed preferences setting.

The idea, in a nutshell, is this: suppose some entity is observing all of your purchase decisions online, and trying to deduce from them what your utility function over goods is. For example, if it observes you purchasing beer when you could have had milk for the same price, it will deduce that you strictly prefer beer to milk.

Now suppose that you understand that you are being observed, and you have preferences not only over the goods that you buy, but also over the set of deductions that the observer is going to make about you. Given a set of prices and a budget, you will choose which bundle of goods to buy to optimize this tradeoff between your utility for the bundle, and your cost for what is learned about you by the observer. As always, however, the observer only gets to see what you buy.

Can the observer tell whether or not you care about privacy? Can he deduce what your preferences over goods are? Can he prove that you are not buying goods totally at random?

One of the main results discussed in this lecture, is that for sufficiently general forms of the preferences that agents may hold, the answer to all of these questions is "no". Specifically, all sets of observations are rationalizable by some privacy-aware utility function.

Friday, February 14, 2014

Lecture 5 -- Implementing Nash equilibria ex-post in incomplete information games (Part 1)

Welcome back (after a brief hiatus) to the privacy and mechanism design course. Last week we had a guest lecture by Rachel Cummings about her work on the Empirical Implications of Privacy Aware Choice.  We will have notes and a blog post about that soon, but for now, welcome to Lecture 5!

In this lecture, we consider the following problem. Suppose we have an n player game. We might want to implement a Nash equilibrium of the complete information game (which often have nice properties), but naturally find ourselves in the common situation when the n players do not know each others types. Is there a modification of the game that can implement the complete-information game equilibrium as a simple ex-post Nash equilibrium, which does not require that players know anything about each others types, and which requires no coordination?

The modification of the game we consider is the introduction of a very weak "mediator", which essentially only has the power to listen and to give advice, but not to verify or to enforce. Players will have the option of reporting their type to the mediator, in which case it will suggest an action for them to play. However, they can:

  1. Completely ignore the mediator and choose an action to play on their own
  2. Lie to the mediator about their types, and/or
  3. Use the mediator's suggestion however they like, not necessarily following it (i.e. the mediator can not enforce its suggestion)
What we would like is that players should all follow "good behavior", which involves

  1. Truthfully reporting their type to the mediator, and then
  2. Faithfully following the mediator's suggestion.

We prove that if the mediator offers its suggestions by computing a Nash equilibrium of the complete information game defined by players reported types, and satisfies (a variant of) differential privacy, then the mediator makes "good behavior" an asymptotic ex-post Nash equilibrium of the incomplete information game, and that in this ex-post equilibrium, players end up playing a Nash equilibrium of the original game of complete information (defined by the realized player types).

This leaves us with the minor task of actually figuring out how to compute a Nash equilibrium privately. Here, we realize that thus far in the class, we have been skating by knowing nothing about private algorithm design except for the exponential mechanism. So we take a detour to learn some more basics. We get as far as learning how to count, which turns out to be enough. This will set us up for next lecture, when we can actually implement this agenda and design a mediator that satisfies the hypotheses we need.

(The idea of designing a mediator by privately computing an equilibrium comes from joint work with Michael Kearns, Mallesh Pai, and Jon Ullman, which shows how to privately compute correlated equilibria in arbitrary large games. We will cover this later. The implication in this lecture about mediators that privately compute Nash equilibria comes from joint work with Ryan Rogers, which shows how to privately compute Nash equilibria in large congestion games. )

Thursday, January 30, 2014

Lecture 3 -- Mechanism Design for arbitrary objective functions, and without money!

In the third lecture of our course on privacy and mechanism design, we go back to using differential privacy as a tool in mechanism design, and cover a remarkable result of Nissim, Smorodinsky, and Tennenholtz.

Traditionally in mechanism design, social welfare holds a special place in the pantheon of objective functions: this is because in any social choice setting, we can use the VCG mechanism to exactly optimize for social welfare while guaranteeing dominant strategy truthfulness. In general, it is not possible to truthfully optimize most other objective functions -- and even for social welfare, the VCG mechanism requires the ability to charge payments. It is the rare setting when we can achieve some objective truthfully without payments, and even rarer when that objective is not social welfare.

In todays lecture, we give a general mechanism that in any social choice setting (satisfying certain technical conditions) approximately optimizes any low-sensitivity objective function (social welfare satisfies this condition, but so do many others). Moreover, this mechanism is strictly dominant strategy truthful, and crucially does not require the use of payments. Of course, we can't get around classical impossibility results for free -- the tradeoff, as always, is that we are only optimizing our objective approximately up to some additive loss. However, in large economies, this loss will often become negligible as the size of the population grows.

After class, Rachel Cummings will give our theory seminar on her own work in privacy and mechanism design: The Empirical Implications of Privacy Aware Choice, which is joint work with Federico Echenique and Adam Wierman, both from Caltech. Be there or be square!

Thursday, January 23, 2014

Lecture 2: Privacy as a Desideratum in Mechanism Design -- Making the Exponential Mechanism Truthful

For those of you following the privacy and mechanism design course at home, tomorrow's Lecture 2 is now online.

This lecture explores a different aspect of privacy and mechanism design. Last week, we considered the use of differential privacy purely as a tool, to design a prior free asymptotically truthful and revenue optimal mechanism for digital goods auctions. In this lecture, we motivate differential privacy as a desideratum for its own sake, and argue that it gives mechanisms (often at very little cost) a natural notion of robustness against the possibility of extra unmodeled parts of agents utility functions.

Finally, we derive a private version of the VCG mechanism, that for general social choice problems is simultaneously:

  1. (Exactly) dominant strategy truthful
  2. Differentially private
  3. (Approximately) welfare optimal
This result is from Huang and Kannan 2012: "The Exponential Mechanism for Social Welfare: Private, Truthful, and Nearly Optimal". Although this is a recent paper, it is a basic, important, foundational result that "should" have been discovered earlier. It fits in very nicely at the beginning of the semester.

(And for those of you actually taking this class and looking for inspiration to do a course project, this paper started as Zhiyi Huang's course project in the 2011 privacy course I taught here at Penn)