## Wednesday, November 15, 2017

### Between "statistical" and "individual" notions of fairness in Machine Learning

If you follow the fairness in machine learning literature, you might notice after awhile that there are two distinct families of fairness definitions:

1. Statistical definitions of fairness (sometimes also called group fairness), and
2. Individual notions of fairness.
The statistical definitions are far-and-away the most popular, but the individual notions of fairness offer substantially stronger semantic guarantees. In this post, I want to dive into this distinction a little bit. Along the way I'll link to some of my favorite recent fairness papers, and end with a description of a recent paper from our group which tries to get some of the best properties of both worlds.

Statistical Definitions of Fairness
At a high level, statistical definitions of fairness partition the world into some set of "protected groups", and then ask that some statistical property be approximately equalized across these groups. Typically, the groups are defined via some high level sensitive attribute, like race or gender. Then, statistical definitions of fairness ask that some relevant statistic about a classifier be equalized across those groups. For example, we could ask that the rate of positive classification be equal across the groups (this is sometimes called statistical parity). Or, we could ask that the false positive and false negative rates be equal across the groups (This is what Hardt, Price, and Srebro call "Equalized Odds"). Or, we could ask that the positive predictive value (also sometimes called calibration) be equalized across the groups.  These are all interesting definitions, and there are very interesting impossibility results that emerge if you think you might want to try and satisfy all of them simultaneously.

These definitions are attractive: first of all, you can easily check whether a classifier satisfies them, since verifying that these fairness constraints hold on a particular distribution simply involves estimating some expectations, one for each protected group. It can be computationally hard to find a classifier that satisfies them, while minimizing the usual convex surrogate losses --- but computational hardness pervades all of learning theory, and hasn't stopped the practice of machine learning. Already there are a number of practical algorithms for learning classifiers subject to these statistical fairness constraints --- this recent paper by Agarwal et al. is one of my favorites.

Semantics
But what are the semantics of these constraints, and why do they correspond to "fairness"? You can make a different argument for each one, but here is the case for equalizing false positive and negative rates across groups (equalized odds), which is one of my favorites:

You can argue (making the big assumption that the data is not itself already corrupted with bias --- see this paper by Friedler et al. for a deep dive into these assumptions...) that a classifier potentially does harm to an individual when it misclassifies them. If the (say) false positive rate is higher amongst black people than white people, then this means that if you perform the thought experiment of imagining yourself as a uniformly random black person with a negative label, the classifier is more likely to harm you than if you imagine yourself as a uniformly random white person with a negative label. On the other hand, if the classifier equalizes false positive and negative rates across populations, then behind the "veil of ignorance", choosing whether you want to be born as a uniformly random black or white person (conditioned on having a negative label), all else being equal, you should be indifferent!

Concerns
However, in this thought experiment, it is crucial that you are a uniformly random individual from the population. If you know some fact about yourself that makes you think you are not a uniformly random white person, the guarantee can lack bite. Here is a simple example:

Suppose we have a population of individuals, with two features and a label. Each individual has a race (either black or white) and a gender (either male or female). The features are uniformly random and independent. Each person also has a label (positive or negative) that is also uniformly random, and independent of the features. We can define four natural protected groups: "Black people', "White people", "Men", and "Women".

Now suppose we have a classifier that labels an individual as positive if and only if they are either a black man or a white woman. Across these four protected groups, this classifier appears to be fair: the false positive rate and false negative rate are all exactly 50% on all four of the protected groups, so it satisfies equalized odds. But this hides a serious problem: on two subgroups (black men and white women), the false positive rate is 100%! If being labelled positive is a bad thing, you would not want to be a random (negative) member of those sub-populations.

Of course, we could remedy this problem by simply adding more explicitly protected subgroups. But this doesn't scale well: If we have d protected binary features, there are 2^(2^d) such subgroups that we can define. If the features are real valued, things are even worse, and we quickly get into issues of overfitting (see the next section).

Individual Definitions of Fairness
At a high level, the problem with statistical definitions of fairness is that they are defined with respect to group averages, so if you are not an "average" member of your group, they don't promise you anything. Individual definitions of fairness attempt to remedy this problem by asking for a constraint that binds at the individual level. The first attempt at this was in the seminal paper "Fairness Through Awareness", which suggested that fairness should mean that "similar individuals should be treated similarly". Specifically, if the algorithm designer knows the correct metric by which individual similarity should be judged, the constraint requires that individuals who are close in that metric should have similar probabilities of any classification outcome. This is a semantically very strong definition. It hasn't gained much traction in the literature yet, because of the hard problem of specifying the similarity metric. But I expect that it will be an important definition going forward --- there are just some obstacles that have to be overcome. I've been thinking about it a lot myself.

Another definition of individual fairness is one that my colleagues and I at Penn introduced in this paper, which we have taken to calling "Weakly Meritocratic Fairness". I have previously blogged about it here. We define it in a more general context, and in an online setting, but it makes sense to think about what it means in the special case of batch binary classification as well. In that case, it reduces to three constraints:
1. For any two individuals with a negative label, the false positive rate on those individuals must be equal
2. For any two individuals with a positive label, the false negative rate on those individuals must be equal
3. For any pair of individuals A and B such that A has a negative label, and B has a positive label, the true positive rate on B can't be lower than the false positive rate on A. (Assuming that positive is the more desirable label --- otherwise this is reversed).
Again, we are talking about false positive rates --- but now there is no distribution --- i.e. nothing to take an average over. Instead, the constraints bind on every pair of individuals. As a result, the "rate" we are talking about is calculated only over the randomness of the classifier itself.

This constraint in particular implies that false positive rates are equal across any sub-population that you might define, even ex-post! Problem solved! Why don't we all just use this definition of fairness? Here is why:

Concerns:
Weakly meritocratic fairness is extremely similar (identical, except for the added condition 3) to asking for equalized odds fairness on the (potentially infinitely large) set of "protected groups" corresponding to singletons --- everyone forms their own protected group! As a result, you shouldn't hope to be able to satisfy a fairness definition like this without making some assumption on the data generating process. Even if on your sample of data, it looks like you are satisfying this definition of fairness, you are overfitting --- the complexity of this class of groups is too large, and your in-sample "fairness" won't generalize out of sample. If you are willing to make assumptions on the data generating process, then you can do it! In our original paper, we show how to do it if you assume the labels obey a linear relationship to the features. But assumptions of this sort generally won't hold exactly in practice, which makes this kind of approach difficult to implement in practical settings.

The "Fairness Through Awareness" definition at a high level suffers from the same kind of problem: you can only use it if you make a very strong assumption (namely that you have what everyone agrees is the "right" fairness metric). And this is the crux of the problem with definitions of individual fairness: they seem to require such strong assumptions, that we cannot actually realize them in practice.

A Middle Ground
Asking for statistical fairness across a small number of coarsely defined groups leaves us open to unfairness on structured subgroups we didn't explicitly designate as protected (what you might call "Fairness Gerrymandering").  On the other hand, asking for statistical fairness across every possible division of the data that can be defined ex-post leaves us with an impossible statistical problem. (Consider, that every imperfect classifier could be accused of being "unfair" to the subgroup that we define, ex-post, to be the set of individuals that the classifier misclassifies! But this is just overfitting...)

What if we instead ask for statistical fairness across exponentially many (or infinitely many) subgroups, defined over a set of features we think should be protected, but ask that this family of subgroups itself has bounded VC-dimension? This mitigates the statistical problem --- we can now in principle train "fair" classifiers according to this definition without worrying about overfitting, assuming we we have enough data (proportional to the VC-dimension of the set of groups we want to protect, and the set of classifiers we want to learn). And we can do this without making any assumptions about the data. It also mitigates the "Fairness Gerrymandering" problem --- at least now, we can explicitly protect an enormous number of detailed subgroups, not just coarsely defined ones.

This is what we (Michael Kearns, Seth Neel, Steven Wu, and I) propose in our new paper. In most of the paper, we investigate the computational challenges surrounding this kind of fairness constraint, when the statistical fairness notion we want is equality of false positive rates, false negative rates, or classification rates (statistical parity):
• First, it is no longer clear how to even check whether a fixed classifier satisfies a fairness constraint of this sort, without explicitly enumerating all of the protected groups (and recall, there might now be exponentially or even uncountably infinitely many such subgroups). We call this the Auditing problem. And indeed, in the worst case, this is a hard problem: we show that it is equivalent to weak agnostic learning, which brings with it a long list of computational hardness results from learning theory. It is hard to audit even for simple subgroup structures, definable by boolean conjunctions over features, or linear threshold functions. However, this connection also suggests an algorithmic approach to auditing. The fact that learning linear separators is NP hard in the worst case hasn't stopped us from doing this all the time, with simple heuristics like logistic regression and SVMs, as well as more sophisticated techniques. These same heuristics can be used to solve the auditing problem.
• Going further, lets suppose those heuristics work --- i.e. we have oracles which can optimally solve agnostic learning problems over some collection of classifiers C, and can optimally solve the auditing problem over some class of subgroups G. Then we give an algorithm that only has to maintain small state, and provably converges to the optimal distribution over classifiers in C  that equalizes false positive rates (or false negative rates, or classification rates...) over the groups G. Our algorithm draws inspiration from the Agarwal et al. paper: "A Reductions Approach to Fair Classification".
• And we can run these algorithms! We use a simple linear regression heuristic to implement both the agnostic learning and auditing "oracles", and run the algorithm to learn a distribution over linear threshold functions (defined over 122 attributes) that approximately equalizes false positive rates across every group definable by a linear threshold function over 18 real valued racially associated attributes, in the "Communities and Crime" dataset. It seems to work and do well!

In Conclusion...
I've long been bothered by the seemingly irreconcilable gap between individual and group notions of fairness. The individual notions of fairness have the right semantics, but they are unimplementable. The group notions of fairness promise something too weak. Going forward, I think these two schools of thought on fairness have to be reconciled. I think of our work as a small step in that direction.