Tuesday, May 24, 2016

Fairness in Learning

The very real problem of (un)fairness in algorithmic decision making in general, and machine learning in particular seems to have finally reached the forefront of public attention. Every day there is a new popular article about the topic. Just in the last few weeks, we have seen articles in the Times about built in bias in facebook, and an in-depth ProPublica study about racial bias in statistical models for predicting criminal recidivism. Earlier this month, the White House released a report on the challenges in promoting fairness in Big Data.

The tricky thing is saying something concrete and technical about this problem -- even defining what "fairness" is is delicate. There has been some good technical work in this area that I have long admired from afar -- see e.g. the "FATML" (Fairness and Transparency in Machine Learning) Workshop to get an idea of the range of work being done, and the folks doing it. People like Cynthia DworkMoritz HardtSolon BarocasSuresh VenkatasubramanianSorelle Friedler, Cathy O'Neil, and others have been doing important work thinking about these problems for quite some time. A particularly nice early paper that I recommend everyone interested in the area read is Fairness Through Awareness, by Dwork, Hardt, Pitassi, Reingold, and Zemel. It was first posted online in 2011(!), and in retrospect is quite prescient in its discussion of algorithmic fairness.

So I'm happy to finally have something interesting to say about the topic! My student Matthew JosephMichael KearnsJamie Morgenstern, and I just posted a new paper online that I'm excited about: Fairness in Learning: Classic and Contextual Bandits. I'll mostly let the paper speak for itself, but briefly, we write down a simple but (I think) compelling definition of fairness in a stylized general model of sequential decision making called the "contextual bandit setting". To keep a canonical problem in your mind, imagine the following: There are a bunch of different populations (say racial or socioeconomic groups), and you are a loan officer. Every day, an individual from each population applies for a loan. You get to see the loan application for each person (this is the "context"), and have to decide who to give the loan to. When you give out the loan, you observe some reward (e.g. you see if they paid back the loan), but you don't see what reward you -would- have gotten had you given the loan to someone else. Our fairness condition says roughly that an algorithm is "fair" if it never preferentially gives a loan to a less qualified applicant over a more qualified applicant -- where the quality of an applicant in our setting is precisely the probability that they pay back the loan. (It prohibits discriminating against qualified applicants on an individual basis -- even if  they happen to come from a population that is less credit-worthy on average, or from a population that the bank doesn't understand as well).

It might seem like this definition of fairness is entirely consistent with the profit motivation of a bank -- why would a bank ever want to give a loan to an applicant less likely to pay it back? Indeed, this would be true if the bank had nothing to learn -- i.e. if it already knew the optimal rule mapping loan applications to credit-worthiness. Said another way, implementing the optimal policy is entirely consistent with our fairness definition. Our main conceptual message is that fairness can nevertheless be an obstruction to learning the optimal policy.

What our results say is that "fairness" always has a cost in terms of the optimal learning rate achievable by algorithms in this setting. For some kinds of problems, the cost is mild in that the cost of fairness on the learning rate is only polynomial (e.g. when credit-worthiness is determined by a simple linear regression model on the features of a loan application). On the other hand, for other kinds of problems, the cost of fairness on the learning rate is severe, in that it can slow learning by an exponential factor (e.g. when credit-worthiness is determined by an AND of features in the loan application). Put another way, for the problems in which the cost of fairness is severe, if the bank were to use a fast learning algorithm (absent a fairness constraint), the algorithm might be "unfair" for a very long time, even if in the limit, once it learned the truly optimal policy, it would eventually be fair. One friction to fairness is that we don't live in the limit -- we are always in a state of learning.

Saturday, October 10, 2015

Penn is Hiring in Computer Science

Penn Engineering is embarking on a period of substantial growth, and the computer science department plans to hire lots of people over the next several years. (You can see a draft of the Penn Engineering plan for growth here. "Data Science and Computation" and Security are both priority areas).

This year, we are planning to hire for multiple positions, both junior and senior, with a focus on (among other things), machine learning. 

So if you are on the market (or thinking about it), send us your application -- apply here: https://facultysearches.provost.upenn.edu/postings/663 

(Unless you want to apply for an endowed chair in computer graphics -- then apply here: https://facultysearches.provost.upenn.edu/postings/664 )

Edit: I forgot to mention! Interest in machine learning is university wide -- both statistics and ESE also plan to hire in machine learning.
Here is the link for applying to the statistics position: https://statistics.wharton.upenn.edu/recruiting/facultypositions/
And here is the link for ESE: http://www.ese.upenn.edu/faculty-staff/index.php

Tuesday, June 09, 2015

An action packed Monday at FCRC

FCRC begins next week, and due to the wonders of co-location will be action packed. (Monday will have among other things EC workshops, as well as the first full day of STOC talks). I've found myself on the organizing committee of two of the three EC workshops, and Monday is also when our STOC paper is scheduled.

So here are three (partially) conflicting events you might be interested in:

NetEcon 2015: Highlights include excellent keynote speakers: Rakesh Vohra at  9:00 am, Eva Tardos at 2:00 pm, and R. Srikant at 4:00 pm

The Workshop on Algorithmic Game Theory and Data Science: This workshop is highlighting work in a very exciting area -- the intersection between "data science" and mechanism design -- which is at its very beginning, and I think ripe for lots of important work to be done. Its also attracted a great lineup of speakers (my student Steven will be giving the talk on our paper).

The STOC presentation of our paper, Preserving Statistical Validity in Adaptive Data Analysis: This is at 1:55pm, unfortunately conflicting with NetEcon -- but the attendees of the AGT + Data Science workshop will get a break in order to attend. Vitaly will be giving the talk, which I expect will be very good.

If you want to hear a longer (but probably less good) version of the talk, you can tune in tomorrow at 1pm eastern when I talk about this paper for TCS+

EDIT: The video of the talk is here. It went off with only one technical hitch.


Thursday, April 16, 2015

Algorithmic Game Theory and Data Science

With FOCS submissions sent off and EC rejections in hand, its time to think about presenting your work at a workshop, and chat with your colleagues doing similar things. 

If you are working on something at the intersection of algorithmic game theory and machine learning (this includes e.g. the sample complexity of auction design, or learning from revealed preferences, or learning from censored feedback), then you should consider the "Algorithmic Game Theory and Data Science" workshop that we'll be running during EC 2015. Conveniently, this is at FCRC, so if you were planning on attending EC or STOC (or SIGmetrics, or CCC, or...) you'll already be there in Portland. 
Deadline is in 10 days!

https://sites.google.com/site/agtanddatascienceworkshop2015/

---------------------------


Call for Papers

In conjunction with the Sixteenth ACM Conference on Economics and Computation (EC'15), we solicit submissions for the First Workshop on Algorithmic Game Theory and Data Science, to be held on June 15, 2015 in Portland, Oregon, USA.

Computer systems have become the primary mediator of social and economic interactions, enabling transactions at ever-increasing scale.  Mechanism design when done on a large scale needs to be a data-driven enterprise.  It seeks to optimize some objective with respect to a huge underlying population that the mechanism designer does not have direct access to.  Instead, the mechanism designer typically will have access to sampled behavior from that population (e.g. bid histories, or purchase decisions).  This means that, on the one hand, mechanism designers will need to bring to bear data-driven methodology from statistical learning theory, econometrics, and revealed preference theory.  On the other hand, strategic settings pose new challenges in data science, and approaches for learning and inference need to be adapted to account for strategization.  

The goal of this workshop is to frame the agenda for research at the interface of algorithms, game theory, and data science.  Papers from a rich set of experimental, empirical, and theoretical perspectives are invited.  Topics of interest include but are not limited to:
  • Can good mechanisms be learned by observing agent behavior in response to other mechanisms?  How hard is it to "learn'' a revenue maximizing auction given a sampled bid history?  How hard is it to learn a predictive model of customer purchase decisions, or better yet, a set of prices that will accurately maximize profit under these behavioral decisions? 
  • What is the sample complexity of mechanism design?  How much data is necessary to enable good mechanism design?
  • How does mechanism design affect inference?  Are outcomes of some mechanisms more informative than those of others from the viewpoint of inference?
  • How does inference affect mechanism design?  If participants know that their data is to be used for inference, how does this knowledge affect their behavior in a mechanism?
  • Can tools from computer science and game theory be used to contribute rigorous guarantees to interactive data analysis?  Strategic interactions between a mechanism and a user base are often interactive (e.g. in the case of an ascending price auction, or repeated interaction with a customer and an online retailer), which is a setting in which traditional methods for preventing data over-fitting are weak.
  • Is data an economic model? Can data be used to evaluate or replace existing economic models?  What is the consequence for game theory and economics for replacing the model with data.

Submission Instructions

Any submission format between abstracts and full papers will be considered.  Abstracts may be rejected if we cannot sufficiently evaluate their contribution.  Full papers will be evaluated after page 10 only at the discretion of the committee.
We solicit both new work and work recently published or soon to be published in another venue.  For submissions of the latter kind, authors must clearly state the venue of publication.  This workshop will have no published proceedings.  Papers appearing in published conference proceedings or journals subsequent to EC 2014 will be considered, though preference may be given to papers that have not yet appeared.  Papers that have appeared or are to appear at EC or affiliated workshops will not be considered.
Authors are encouraged to provide a link to an online version of the paper (such as on arXiv).  If accepted, such papers will be linked via an index to give an informal record of the workshop.
All submissions should be sent electronically to AGTDataScienceWorkshop15@gmail.com on or before April 26th, 2015.  Notification of acceptance will be on May 11, 2015.

Organizing Committee

Shuchi Chawla, U. of Wisconsin 
Hu Fu, Microsoft Research
Jason Hartline, Northwestern U.
Denis Nekipelov, U. of Virginia
Aaron Roth, U. of Pennsylvania
Kane Sweeney, eBay and Stubhub

Tuesday, April 07, 2015

Netecon deadline in two weeks

A reminder that the NetEcon workshop deadline is coming up in two weeks. If you plan to be at FCRC (for e.g. STOC, or EC, or Sigmetrics), this will be a great place to present your work and get it seen by both the EC and the SIGmetrics community. There's also a great lineup of invited talks (abstracts here: http://netecon.eurecom.fr/NetEcon2015/keynotes.html ) by R. Srikant, Rakesh Vohra, and Eva Tardos.

Here is the call: http://netecon.eurecom.fr/NetEcon2015/index.html

The submission deadline is April 22.

Sunday, February 22, 2015

STOC 2015 Call for Workshops and Tutorials

Sanjeev Khanna and Chandra Chekuri are co-chairing the STOC 2015 workshops and tutorials, and are looking for submissions. The call is below:

Call for STOC 2015 Workshop and Tutorial Proposals

  • Workshop and Tutorial Day: Sunday, June 14, 2015
  • Workshop and Tutorial Co-Chairs: Chandra Chekuri and Sanjeev Khanna
  • Submission deadline: March 20, 2015
  • Notification: March 30, 2015
On Sunday, June 14, immediately preceding the main conference, STOC 2015 will hold a workshop-and-tutorials day. We invite groups of interested researchers to submit workshop or tutorial proposals. The goal of a workshop is to provide an informal forum for researchers to discuss important research questions, directions, and challenges. Connections between theoretical computer science and other areas, topics that are not well represented at STOC, and open problems are encouraged as workshop topics. Organizers are completely free to choose their workshop formats (invited speakers, panel discussions, etc.). The program for June 14th may also involve tutorials, each consisting of 1-2 survey talks on a particular area, and we welcome tutorial proposals as well.
STOC does not have funds to pay travel expenses or honoraria to invited workshop and tutorial speakers. Workshop and tutorials attendance will be free, and there is no separate registration for attending them. STOC will support coffee breaks for the workshops/tutorials attendees but no lunch will be provided. Note that STOC registration neither includes FCRC registration for Sunday, June 14 nor covers lunch for Sunday, June 14. Workshop/tutorial attendees who wish to attend another FCRC conference on June 14, would need to register for that conference.

Proposal submission

Workshop and tutorial proposals should be no longer than 2 pages. Please include a list of names and email addresses of the organizers, a description of the topic and the goals of the workshop or tutorial, the proposed workshop format (invited talks, contributed talks, panel, etc.), and proposed or tentatively confirmed speakers if known. Please also indicate the preferred length of time for your workshop or tutorial, along with the minimum acceptable time. We anticipate a 4-5 hour block for each workshop and a 2-4 hour block for each tutorial. Please feel free to contact the Workshop and Tutorial Co-Chairs at the email addresses below if you have questions about workshop or tutorial proposals.

Submission deadline

Proposals should be submitted by March 20, 2015, via email to chekuri@illinois.edu and sanjeev@cis.upenn.edu. Proposers will be notified by March 30, 2015, about whether their proposals have been accepted.

http://acm-stoc.org/stoc2015/callforworkshops.html

Friday, February 13, 2015

Bringing Differential Privacy to the Masses <3

This Valentines Day, we'll be bringing differential privacy to the (scientific) masses, with a session at the AAAS annual meeting in San Jose.

If you happen to be attending, you should stop by: https://aaas.confex.com/aaas/2015/webprogram/Session9556.html