Individual Notions of Fairness You Can Use
Our group at Penn has been thinking for a while about when individual notions of fairness might be practically achievable, and we have two new approaches.
Background:
Statistical Fairness
I've written about this before, here. But briefly: there are two families of definitions in the fairness in machine learning literature. The first group of definitions, which I call statistical fairness notions, is far and away the most popular. If you want to come up with your own statistical fairness notion, you can follow this recipe:
- Partition the world into a small number of "protected sub-groups". You will probably be thinking along the lines of race or gender or something similar when you do this.
- Pick your favorite error/accuracy metric for a classifier. This might literally be classification error, or false positive or false negative rate, or positive predictive value, or something else. Lots of options here.
- Ask that this metric be approximately equalized across your protected groups.
- Finally, enjoy your new statistical fairness measure! Congratulations!
A large part of the reason these definitions are so popular (I think) is that they are immediately actionable. Because they are defined as conditions on a small number of expectations, you can easily check whether your classifier is "fair" according to these metrics, and (although there are some interesting computational challenges) go and try to learn classifiers subject to these constraints.
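To make the recipe concrete, here is a minimal sketch of an audit for one such notion (approximately equalized false positive rates), in Python with hypothetical `y_true`, `y_pred`, and `group` arrays and an arbitrary tolerance `tol`:

```python
import numpy as np

def false_positive_rate(y_true, y_pred):
    """Among truly-negative examples, how often does the classifier say positive?"""
    negatives = (y_true == 0)
    if negatives.sum() == 0:
        return 0.0
    return float(np.mean(y_pred[negatives] == 1))

def audit_equalized_fpr(y_true, y_pred, group, tol=0.05):
    """Check whether the FPR differs by more than `tol` across protected groups."""
    rates = {g: false_positive_rate(y_true[group == g], y_pred[group == g])
             for g in np.unique(group)}
    gap = max(rates.values()) - min(rates.values())
    return rates, gap, gap <= tol

# Hypothetical labels, predictions, and a protected attribute.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(audit_equalized_fpr(y_true, y_pred, group))
```

The same template works for any of the other error metrics in the recipe: swap out the rate being computed and keep the comparison across groups.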
Their major problem is related to the reason for their success: they are defined as conditions on a small number of expectations or averages over people, and so they don't promise much to particular individuals. I'll borrow an example from our fairness gerrymandering paper from a few years ago to put this in sharp relief. Imagine that we are building a system to decide who to incarcerate, and we want to be "fair" with respect to both gender (men and women) and race (green and blue people). We decide that in our scenario, it is the false positives who are harmed (innocent people sent to jail), and so to be fair, we decide we should equalize the false positive rate: across men and women, and across greens and blues. But one way to do this is to jail all green men and blue women. This does indeed equalize the false positive rate (at 50%) across all four of the groups we specified, but it is cold comfort if you happen to be a green man --- since then you will be jailed with certainty. The problem is that our fairness constraint was never a promise to an individual to begin with, just a promise about the average behavior of our classifier over a large group. And although this is a toy example constructed to make a point, things like this happen in real data too.
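To see the arithmetic behind that toy example, here is a small sketch (my own numerical illustration, not data from the paper) with equal numbers of innocent people in each race/gender cell:

```python
import numpy as np

# Toy population: one innocent person in each race/gender cell.
# The "classifier" jails every green man and every blue woman.
races   = np.array(["green", "green", "blue", "blue"])
genders = np.array(["man", "woman", "man", "woman"])
jailed  = np.array([1, 0, 0, 1])   # jail green men and blue women

def fpr(mask):
    """False positive rate among the innocent people selected by `mask`."""
    return float(jailed[mask].mean())

for name, mask in [("men", genders == "man"), ("women", genders == "woman"),
                   ("green", races == "green"), ("blue", races == "blue")]:
    print(name, fpr(mask))            # each of these prints 0.5

print("green men", fpr((races == "green") & (genders == "man")))  # prints 1.0
```

Each of the four marginal groups has a false positive rate of exactly 50%, even though every innocent green man is jailed.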
Individual Fairness
Individual notions of fairness, on the other hand, really do correspond to promises made to individuals. There are at least two kinds of individual fairness definitions that have been proposed: metric fairness and weakly meritocratic fairness. Metric fairness proposes that the learner will be handed a task-specific similarity metric, and requires that individuals who are close together in the metric have a similar probability of being classified as positive. Weakly meritocratic fairness, on the other hand, takes the (unknown) labels of individuals as a measure of merit, and requires that an individual who has a higher probability of truly having a positive label be classified as positive with at least as high a probability. This in particular implies that false positive and false negative rates should be equalized across individuals, where now the word "rate" averages only over the randomness of the classifier, not over people. What makes both of these notions individual is that they impose constraints that bind on every pair of individuals, not just on averages over people.
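To make the two definitions a bit more concrete, here is a rough sketch of how one might write them down as checks, with hypothetical classification probabilities `p`, a made-up similarity metric `d`, and assumed true-label probabilities; the exact formalizations in the literature differ in the details:

```python
import numpy as np
from itertools import combinations

def satisfies_metric_fairness(p, d, slack=0.0):
    """Metric fairness as a Lipschitz-style condition: individuals close together
    under the metric d get similar probabilities of a positive classification."""
    n = len(p)
    return all(abs(p[i] - p[j]) <= d(i, j) + slack
               for i, j in combinations(range(n), 2))

def satisfies_weak_meritocracy(p, true_label_prob, slack=0.0):
    """Weakly meritocratic fairness as a monotonicity condition: whoever is more
    likely to truly have a positive label is at least as likely to be classified positive."""
    n = len(p)
    return all(p[i] >= p[j] - slack
               for i in range(n) for j in range(n)
               if true_label_prob[i] > true_label_prob[j])

# Hypothetical probabilities of positive classification and of a true positive label.
p = np.array([0.9, 0.8, 0.3])
true_label_prob = np.array([0.95, 0.7, 0.2])
d = lambda i, j: abs(true_label_prob[i] - true_label_prob[j])  # a made-up metric
print(satisfies_metric_fairness(p, d), satisfies_weak_meritocracy(p, true_label_prob))
```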
Definitions like this have the advantage of strong individual-level semantics, which the statistical definitions don't have. But they also have big problems. For metric fairness, the obvious question is: where does the metric come from? Even granting that fairness should be some Lipschitz condition on a metric, it seems hard to pin down what the metric is, and different people will disagree: coming up with the metric seems to encapsulate a large part of the original problem of defining fairness. For weakly meritocratic fairness, the obvious problem is that we don't know what the labels are. It's possible to do non-trivial things if you make assumptions about the label generating process, but it's not at all clear that you can do any non-trivial learning subject to this constraint if you don't make strong assumptions.
Two New Approaches:
We have two new approaches, building off of metric fairness and weakly meritocratic fairness respectively. Both have the advantages of statistical notions of fairness in that they can be put into practice without making unrealistic assumptions about the data, and without needing to wait on someone to hand us a metric. But they continue to make meaningful promises to individuals.
Subjective Individual Fairness
Let's start with our variant of metric fairness, which we call subjective individual fairness. (This is joint work with Michael Kearns, our PhD students Chris Jung and Seth Neel, our former PhD student Steven Wu, and Steven's student (our grand-student!) Logan Stapleton.) The paper is here: https://arxiv.org/abs/1905.10660. We stick with the premise that "similar people should be treated similarly", and that whether or not it is correct/just/etc., it is at least fair to treat two people the same way, in the sense that we classify them as positive with the same probability. But we don't want to assume anything else.
Suppose I were to create a machine learning fairness panel: I could recruit "AI Ethics" experts, moral philosophers, hyped up consultants, people off the street, toddlers, etc. I would expect that there would be as many different conceptions of fairness as there were people on the panel, and that none of them could precisely quantify what they meant by fairness --- certainly not in the form of a "fairness metric". But I could still ask these people, in particular cases, if they thought it was fair that two particular individuals be treated differently or not.
Of course, I would have no reason to expect that the responses I got from the different panelists would be consistent with one another --- or possibly even internally consistent (we won't assume, e.g., that the responses satisfy any kind of triangle inequality). Nevertheless, once we fix a data distribution and a group of people who have opinions about fairness, we have a well-defined tradeoff we can hope to manage: any classifier we could choose will have both:
1. Some error rate, and
2. Some frequency with which it makes a pair of decisions that someone in the group finds unfair.
We can hope to find classifiers that optimally trade off 1 and 2: note that this is a coherent tradeoff even though we haven't forced the people to express their conceptions of fairness as some consistent metric. What we show is that you can do this.
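Here is a minimal sketch of the two quantities in that tradeoff, with hypothetical labels, predictions, positive-classification probabilities, elicited pairs `flagged_pairs`, and an arbitrary threshold `gamma` for what counts as treating a pair differently:

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Quantity 1: ordinary classification error."""
    return float(np.mean(y_true != y_pred))

def unfairness_frequency(probs, flagged_pairs, gamma=0.1):
    """Quantity 2: fraction of pairs someone in the group flagged that the classifier
    nonetheless treats very differently (positive-classification probabilities
    more than `gamma` apart)."""
    if not flagged_pairs:
        return 0.0
    violations = [abs(probs[i] - probs[j]) > gamma for i, j in flagged_pairs]
    return float(np.mean(violations))

# Hypothetical labels, hard predictions, classification probabilities,
# and pairs the panel said should be treated similarly.
y_true = np.array([1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 0, 0, 1])
probs  = np.array([0.9, 0.2, 0.4, 0.1, 0.8])
flagged_pairs = [(0, 2), (1, 3), (2, 4)]
print(error_rate(y_true, y_pred), unfairness_frequency(probs, flagged_pairs))
```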
Specifically, given a set of pairs that we have determined should be treated similarly, there is an oracle-efficient algorithm that finds the optimal classifier subject to the constraint that no pair of individuals specified as a constraint has a substantially different probability of positive classification. Oracle efficiency means that we can reduce the "fair learning" problem to a regular old learning problem, without fairness constraints: if we can solve the regular learning problem, we can also solve the fair learning problem. This kind of fairness constraint also generalizes in the standard way: if you ask your fairness panel about a reasonably small number of pairs, and then solve the in-sample problem subject to these constraints, the classifier you learn will also satisfy the fairness constraints out of sample. And it works: we implement the algorithm and try it out on the COMPAS data set, with fairness constraints that we elicited from 43 human (undergrad) subjects. The interesting thing is that once you have an algorithm like this, it isn't only a tool to create "fair" machine learning models: it's also a new instrument for investigating human conceptions of fairness. We already see quite a bit of variation among our 43 subjects in our preliminary experiments. We plan to pursue this direction further going forward.
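As a rough sketch of the reduction idea (this is my own simplification, not the algorithm from the paper): once you fix Lagrange multipliers on the elicited pairwise constraints, the constrained problem collapses into a single penalized objective, and minimizing an objective like that is the kind of thing an ordinary learner (the "oracle") can be asked to do; the full algorithm also has to adjust the multipliers. Reusing the hypothetical data from the sketch above:

```python
import numpy as np

def penalized_objective(probs, y_true, pairs, lambdas, gamma=0.1):
    """Lagrangian-style objective for a candidate randomized classifier, given its
    positive-classification probabilities `probs` on the sample.

    pairs[k] = (i, j) is the k-th elicited "treat these two similarly" constraint,
    and lambdas[k] is its current multiplier. A simplified sketch only."""
    error = np.mean(np.abs(probs - y_true))  # expected 0/1 error of the randomized classifier
    penalty = sum(lam * max(abs(probs[i] - probs[j]) - gamma, 0.0)
                  for lam, (i, j) in zip(lambdas, pairs))
    return error + penalty

# Hypothetical candidate classifier and constraints (same shape as the sketch above).
probs   = np.array([0.9, 0.2, 0.4, 0.1, 0.8])
y_true  = np.array([1, 0, 1, 0, 1])
pairs   = [(0, 2), (1, 3), (2, 4)]
lambdas = np.array([1.0, 1.0, 1.0])
print(penalized_objective(probs, y_true, pairs, lambdas))
```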
Average Individual Fairness
Next, our variant of weakly meritocratic fairness. This is joint work with Michael Kearns and our student Saeed Sharifi. The paper is here: https://arxiv.org/abs/1905.10607. In certain scenarios, it really does seem tempting to think about fairness in terms of false positive rates. Criminal justice is a great example, in the sense that it is clear which outcome everyone wants (to be released from jail), and so the people we are being unfair to really do seem to be the false positives: the people who should have been released from jail, but who were mistakenly incarcerated for longer. So in our "fairness gerrymandering" example above, maybe the problem with thinking about false positive rates wasn't a problem with false positives, but with rates: i.e., the problem was that the word "rate" averaged over many people, and so it didn't promise you anything. Our idea is to redefine the word "rate".
In some (but certainly not all) settings, people are subject to not just one, but many classification tasks. For example, consider online advertising: you might be shown thousands of targeted ads each month. Or applying for schools (a process that is centralized in cities like New York): you apply not just to one school, but to many. In situations like this, we can model the fact that we have not just a distribution over people, but also a distribution over (or collection of) problems.
Once we have a distribution over problems, we can define the error rate, or false positive rate, or any other rate you like, for individuals. It is now sensible to talk about Alice's false positive rate, or Bob's error rate, because "rate" has been redefined as an average over problems, for a particular individual. So we can now ask for individual fairness notions in the spirit of the statistical notions of fairness we discussed above! We no longer need to define protected groups: we can now ask that the false positive rates, or error rates, be equalized across all pairs of people.
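Here is a minimal sketch of what a per-individual rate looks like once there is a collection of problems, using hypothetical arrays indexed by (person, problem) filled with random data just to fix the shapes:

```python
import numpy as np

# Hypothetical data: rows are people, columns are classification problems.
# y_true[i, t] is person i's true label on problem t; y_pred[i, t] is the decision.
rng = np.random.default_rng(0)
n_people, n_problems = 5, 100
y_true = rng.integers(0, 2, size=(n_people, n_problems))
y_pred = rng.integers(0, 2, size=(n_people, n_problems))

def individual_false_positive_rates(y_true, y_pred):
    """For each person, average over *problems*: among problems where the person's
    true label is 0, how often were they classified as positive?"""
    negatives = (y_true == 0)
    fp = (y_pred == 1) & negatives
    return fp.sum(axis=1) / np.maximum(negatives.sum(axis=1), 1)

rates = individual_false_positive_rates(y_true, y_pred)
print(rates)                      # one false positive rate per person
print(rates.max() - rates.min())  # the gap the fairness constraint asks to be small
```

The fairness constraint in this section asks that the gap printed on the last line be small, for false positive rates or whatever other rate you choose.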
It turns out that given a reasonably sized sample of people, and a reasonably sized sample of problems, it is tractable to find the optimal classifier subject to constraints like this in sample, and that these guarantees generalize out of sample. The in-sample algorithm is again an oracle-efficient algorithm, or in other words, a reduction to standard, unconstrained learning. The generalization guarantee here is a little interesting, because now we are talking about simultaneous generalization in two different directions: to people we haven't seen before, and also to problems we haven't seen before. This requires thinking a little bit about what kind of object we are even trying to output: a mapping from new problems to classifiers. The details are in the paper (spoiler --- the mapping is defined by the optimal dual variables for the empirical risk minimization problem); here, I'll just point out that again, the algorithm is practical to implement, and we perform some simple experiments with it.