The very real problem of (un)fairness in algorithmic decision making in general, and machine learning in particular, seems to have finally reached the forefront of public attention. Every day there is a new popular article on the topic. Just in the last few weeks, we have seen articles in the Times about built-in bias in Facebook, and an in-depth ProPublica study of racial bias in statistical models for predicting criminal recidivism. Earlier this month, the White House released a report on the challenges of promoting fairness in Big Data.
The tricky thing is saying something concrete and technical about this problem -- even defining what "fairness" means is delicate. There has been good technical work in this area that I have long admired from afar -- see e.g. the FATML (Fairness, Accountability, and Transparency in Machine Learning) workshop to get an idea of the range of work being done, and the folks doing it. People like Cynthia Dwork, Moritz Hardt, Solon Barocas, Suresh Venkatasubramanian, Sorelle Friedler, Cathy O'Neil, and others have been doing important work thinking about these problems for quite some time. A particularly nice early paper that I recommend everyone interested in the area read is Fairness Through Awareness, by Dwork, Hardt, Pitassi, Reingold, and Zemel. It was first posted online in 2011(!), and in retrospect is quite prescient in its discussion of algorithmic fairness.
So I'm happy to finally have something interesting to say about the topic! My student Matthew Joseph, Michael Kearns, Jamie Morgenstern, and I just posted a new paper online that I'm excited about: Fairness in Learning: Classic and Contextual Bandits. I'll mostly let the paper speak for itself, but briefly: we write down a simple but (I think) compelling definition of fairness in a stylized general model of sequential decision making called the "contextual bandit setting". To keep a canonical problem in mind, imagine the following: there are a bunch of different populations (say racial or socioeconomic groups), and you are a loan officer. Every day, an individual from each population applies for a loan. You get to see the loan application for each person (this is the "context"), and have to decide whom to give the loan to. When you give out the loan, you observe some reward (e.g. you see whether they paid back the loan), but you don't see what reward you would have gotten had you given the loan to someone else. Our fairness condition says roughly that an algorithm is "fair" if it never preferentially gives a loan to a less qualified applicant over a more qualified applicant -- where the quality of an applicant in our setting is precisely the probability that they pay back the loan. (It prohibits discriminating against qualified applicants on an individual basis -- even if they happen to come from a population that is less credit-worthy on average, or from a population that the bank doesn't understand as well.)
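To make the setting concrete, here is a minimal simulation sketch in Python. It is my own illustration, not code from the paper: the group names, the Hoeffding-style interval, and the interval-chaining choice rule are all assumptions on my part, chosen because chaining is one natural way to avoid favoring any applicant the data cannot yet distinguish from the best.

```python
import math
import random

def interval(successes, trials):
    # Confidence interval for a group's repayment probability; the
    # log(200) failure-probability constant is an arbitrary choice.
    if trials == 0:
        return (0.0, 1.0)
    mean = successes / trials
    width = math.sqrt(math.log(200.0) / (2 * trials))
    return (max(0.0, mean - width), min(1.0, mean + width))

def fair_choice(intervals):
    """Pick uniformly among all groups chained, via (transitively)
    overlapping intervals, to the group with the highest upper bound,
    so no applicant is ever favored over one the data cannot yet
    statistically distinguish them from."""
    groups = list(intervals)
    top = max(groups, key=lambda g: intervals[g][1])
    chained, grew = {top}, True
    while grew:  # transitive closure of interval overlap
        grew = False
        for g in groups:
            if g not in chained and any(
                intervals[g][0] <= intervals[h][1]
                and intervals[h][0] <= intervals[g][1]
                for h in chained
            ):
                chained.add(g)
                grew = True
    return random.choice(sorted(chained))

# One simulated day per iteration: lend, then observe repayment only
# for the chosen group (bandit feedback).
true_quality = {"A": 0.8, "B": 0.5, "C": 0.45}  # unknown to the learner
stats = {g: [0, 0] for g in true_quality}        # [repayments, loans]
for _ in range(2000):
    chosen = fair_choice({g: interval(*stats[g]) for g in stats})
    stats[chosen][0] += random.random() < true_quality[chosen]
    stats[chosen][1] += 1
```

Note that early on, when the intervals all overlap, the rule is forced to treat every group identically; only once the data confidently separates groups does it begin to favor the more credit-worthy one.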
It might seem that this definition of fairness is entirely consistent with the profit motive of a bank -- why would a bank ever want to give a loan to an applicant less likely to pay it back? Indeed, this would be true if the bank had nothing to learn, i.e. if it already knew the optimal rule mapping loan applications to credit-worthiness. Said another way, implementing the optimal policy is entirely consistent with our fairness definition. Our main conceptual message is that fairness can nevertheless be an obstruction to learning the optimal policy.
What our results say is that "fairness" always has a cost in terms of the optimal learning rate achievable by algorithms in this setting. For some kinds of problems the cost is mild: fairness slows the learning rate by only a polynomial factor (e.g. when credit-worthiness is determined by a simple linear regression model on the features of a loan application). For other kinds of problems the cost is severe: fairness can slow learning by an exponential factor (e.g. when credit-worthiness is determined by an AND of features in the loan application). Put another way, for the problems in which the cost of fairness is severe, if the bank were to use a fast learning algorithm (absent a fairness constraint), the algorithm might be "unfair" for a very long time, even if in the limit, once it learned the truly optimal policy, it would eventually be fair. One source of friction is that we don't live in the limit -- we are always in a state of learning.
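To see this friction concretely in a toy (and purely illustrative) way, one can reuse the sketch above and compare the fair rule against an unconstrained greedy baseline. The baseline and the specific numbers are my own assumptions, not experiments from the paper.

```python
def greedy_choice(stats):
    # Unconstrained baseline: lend to the group with the best empirical
    # repayment rate, treating unseen groups optimistically. It happily
    # favors one of two statistically indistinguishable groups -- the
    # behavior the fairness condition forbids.
    def mean(g):
        s, t = stats[g]
        return s / t if t else 1.0
    best = max(mean(g) for g in stats)
    return random.choice([g for g in stats if mean(g) == best])

def loans_to_best(policy, rounds=2000):
    # Count loans given to the truly best group "A" under each policy.
    stats = {g: [0, 0] for g in true_quality}
    count = 0
    for _ in range(rounds):
        if policy == "fair":
            chosen = fair_choice({g: interval(*stats[g]) for g in stats})
        else:
            chosen = greedy_choice(stats)
        count += chosen == "A"
        stats[chosen][0] += random.random() < true_quality[chosen]
        stats[chosen][1] += 1
    return count

# The fair rule must split loans evenly among chained groups until
# their intervals separate, so it typically lends to the best group
# less often early on than the unconstrained baseline does.
print(loans_to_best("fair"), loans_to_best("greedy"))
```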
In conjunction with the Sixteenth ACM Conference on Economics and Computation (EC'15), we solicit submissions for the First Workshop on Algorithmic Game Theory and Data Science, to be held on June 15, 2015 in Portland, Oregon, USA.
Computer systems have become the primary mediator of social and economic interactions, enabling transactions at ever-increasing scale. Mechanism design at this scale needs to be a data-driven enterprise: it seeks to optimize some objective with respect to a huge underlying population that the mechanism designer cannot access directly. Instead, the designer typically has access to sampled behavior from that population (e.g. bid histories or purchase decisions). This means that, on the one hand, mechanism designers will need to bring to bear data-driven methodology from statistical learning theory, econometrics, and revealed preference theory. On the other hand, strategic settings pose new challenges in data science, and approaches for learning and inference need to be adapted to account for strategic behavior.
The goal of this workshop is to frame the agenda for research at the interface of algorithms, game theory, and data science. We invite papers from a rich set of experimental, empirical, and theoretical perspectives. Topics of interest include, but are not limited to:
Any submission format between an abstract and a full paper will be considered. Abstracts may be rejected if we cannot sufficiently evaluate their contribution. For full papers, material beyond the tenth page will be read only at the discretion of the committee.
We solicit both new work and work recently published or soon to be published in another venue. For submissions of the latter kind, authors must clearly state the venue of publication. This workshop will have no published proceedings. Papers appearing in published conference proceedings or journals subsequent to EC 2014 will be considered, though preference may be given to papers that have not yet appeared. Papers that have appeared or are to appear at EC or affiliated workshops will not be considered.
Authors are encouraged to provide a link to an online version of the paper (such as on arXiv). If accepted, such papers will be linked via an index to give an informal record of the workshop.
All submissions should be sent electronically to AGTDataScienceWorkshop15@gmail.com on or before April 26, 2015. Notifications of acceptance will be sent by May 11, 2015.