## Tuesday, February 26, 2019

### Impossibility Results in Fairness as Bayesian Inference

One of the most striking results about fairness in machine learning is the impossibility result that Alexandra Chouldechova, and separately Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan discovered a few years ago. These papers say something very crisp. I'll focus here on the binary classification setting that Alex studies because it is much simpler. There are (at least) three reasonable properties you would want your "fair" classifiers to have. They are:
1. False Positive Rate Balance: The rate at which your classifier makes errors in the positive direction (i.e. labels negative examples positive) should be the same across groups.
2. False Negative Rate Balance:  The rate at which your classifier makes errors in the negative direction (i.e. labels positive examples negative) should be the same across groups.
3. Predictive Parity: The statistical "meaning" of a positive classification should be the same across groups (we'll be more specific about what this means in a moment)
What Chouldechova and KMR show is that if you want all three, you are out of luck --- unless you are in one of two very unlikely situations: Either you have a perfect classifier that never errs, or the base rate is exactly the same for both populations --- i.e. both populations have exactly the same frequency of positive examples. If you don't find yourself in one of these two unusual situations, then you have to give up on properties 1, 2, or 3.

This is discouraging, because there are good reasons to want each of properties 1, 2, and 3. And these aren't measures made up in order to formulate an impossibility result --- they have their root in the Propublica/COMPASS controversy. Roughly speaking, Propublica discovered that the COMPASS recidivism prediction algorithm violated false positive and negative rate balance, and they took the position that this made the classifier unfair. Northpointe (the creators of the COMPASS algorithm) responded by saying that their algorithm satisfied predictive parity, and took the position that this made the classifier fair. They were seemingly talking past each other by using two different definitions of what "fair" should mean. What the impossibility result says is that there is no way to satisfy both sides of this debate.

So why is this result true? The proof in Alex's paper can't be made simpler --- its already a one liner, following from an algebraic identity. But the first time I read it I didn't have a great intuition for why it held. Viewing the statement through the lens of Bayesian inference made the result very intuitive (at least for me). With this viewpoint, all the impossibility result is saying is: "If you have different priors about some event (say that a released inmate will go on to commit a crime) for two different populations, and you receive evidence of the same strength for both populations, then you will have different posteriors as well". This is now bordering on obvious --- because your posterior belief about an event is a combination of your prior belief and the new evidence you have received, weighted by the strength of that evidence.

Lets walk through this. Suppose we have two populations, call them $A$s and $B$s. Individuals $x$ from these populations have some true binary label $\ell(x) \in \{0,1\}$ which we are trying to predict. Individuals from the two populations are drawn from different distributions, which we'll call $D_A$ and $D_B$. We have some classifier that predicts labels $\hat\ell(x)$, and we would like it to satisfy the three fairness criteria defined above. First, lets define some terms:

The base rate for a population $i$ is just the frequency of positive labels:
$$p_i = \Pr_{x \sim D_i}[\ell(x) = 1].$$
The false positive and false negative rates of the classifier are:
$$FPR_i = \Pr_{x \sim D_i}[\hat\ell(x) = 1 | \ell(x) = 0] \ \ \ \ FNR_i = \Pr_{x \sim D_i}[\hat\ell(x) = 0 | \ell(x) = 1].$$
And the positive predictive value of the classifier is:
$$PPV_i = \Pr_{x \sim D_i}[\ell(x) = 1 | \hat\ell(x)=1].$$
Satisfying all three fairness constraints just means finding a classifier such that $FPR_A = FPR_B$, $FNR_A = FNR_B$, and $PPV_A = PPV_B$.

How should we prove that this is impossible? All three of these quantities are conditional probabilities, so we are essentially obligated to apply Bayes Rule:
$$PPV_i = \Pr_{x \sim D_i}[\ell(x) = 1 | \hat\ell(x)=1] = \frac{ \Pr_{x \sim D_i}[\hat\ell(x)=1 | \ell(x) = 1]\cdot \Pr_{x \sim D_i} [\ell(x) = 1]}{ \Pr_{x \sim D_i}[\hat \ell(x) = 1]}$$
But now these quantities on the right hand side are things we have names for. Substituting in, we get:
$$PPV_i = \frac{p_i(1-FNR_i)}{p_i(1-FNR_i) + (1-p_i)FPR_i}$$

And so now we see the problem. Suppose we have $FNR_A = FNR_B$ and $FPR_A = FPR_B$. Can we have $PPV_A = PPV_B$? There are only two ways. If $p_A = p_B$, then we are done, because the right hand side is the same for either $i \in \{A,B\}$. But if the base rates are different, then the only way to make these two quantities equal is if $FNR_i = FPR_i = 0$ --- i.e. if our classifier is perfect.

The piece of intuition here is that the base rate is our prior belief that $\ell(x) = 1$, before we see the output of the classifier. The positive predictive value is our posterior belief that $\ell(x) = 1$, after we see the output of the classifier. And all we need to know about the classifier in order to apply Bayes rule to derive our posterior from our prior is its false positive rate and its false negative rate --- these fully characterize the "strength of the evidence." Hence: "If our prior probabilities differ, and we see evidence of a positive label of the same strength, then our posterior probabilities will differ as well."

Once you realize this, then you can generalize the fairness impossibility result to other settings by making equally obvious statements about probability elsewhere. :-)

For example, suppose we generalize the labels to be real valued instead of binary --- so when making decisions, we can model individuals using shades of gray. (e.g. in college admissions, we don't have to model individuals as "qualified" or not, but rather can model talent as a real value.) Lets fix a model for concreteness, but the particulars are not important. (The model here is related to my paper with Sampath Kannan and Juba Ziani on the downstream effects of affirmative action)

Suppose that in population $i$, labels are distributed according to a Gaussian distribution with mean $\mu_i$: $\ell(x) \sim N(\mu_i, 1)$. For an individual from group $i$, we have a test that gives an unbiased estimator of their label, with some standard deviation $\sigma_i$: $\hat \ell(x) \sim N(\ell(x), \sigma_i)$.

In a model like this, we have analogues of our fairness desiderata in the binary case:

• Analogue of Error Rate Balance: We would like our test to be equally informative about both populations: $\sigma_A = \sigma_B$.
• Analogue of Predictive Parity: Any test score $t$ should induce the same posterior expectation on true labels across populations: $$E_{D_A}[\ell(x) | \hat \ell(x) = t] = E_{D_B}[\ell(x) | \hat \ell(x) = t]$$
Can we satisfy both of these conditions at the same time? Because the normal distribution is self conjugate (that's why we chose it!) Bayes Rule simplifies to have a nice closed form, and we can compute our posteriors as follows:
$$E_{D_i}[\ell(x) | \hat \ell(x) = t] = \frac{\sigma_i^2}{\sigma_i^2 + 1}\cdot \mu_i + \frac{1}{\sigma_i^2 + 1}\cdot t$$
So there are only two ways we can achieve both properties:
1. We can of course satisfy both conditions if the prior distributions are the same for both groups: $\mu_A = \mu_B$. Then we can set $\sigma_A = \sigma_B$ and observe that the right hand side of the above expression is identical for $i \in \{A, B\}$.
2. We can also satisfy both conditions if the prior means are different, but the signal is perfect: i.e. $\sigma_A = \sigma_B = 0$. (Then both posterior means are just $t$, independent of the prior means).
But we can see from inspection these are the only two cases. If $\sigma_A = \sigma_B$, but the prior means are different, then the posterior means will be different for every $t$. This is really the same impossibility result as in the binary case: all it is saying is that if I have different priors about different groups, but the evidence I receive has the same strength, then my posteriors will also be different.

So the mathematical fact is simple --- but its implications remain deep. It means we have to choose between equalizing a decision maker's posterior about the label of an individual, or providing an equally accurate signal about each individual, and that we cannot have both. Unfortunately, living without either one of these conditions can lead to real harm.