## Can Game Theory Help Us Choose Among Fairness Constraints?

*This blog post is about a new paper, joint with Christopher Jung, Sampath Kannan, Changhwa Lee, Mallesh M. Pai, and Rakesh Vohra.*

A lot of the recent boom in interest in fairness in machine learning can be traced back to the 2016 ProPublica article *Machine Bias*. To summarize what you will already know if you have interacted with the algorithmic fairness literature at all --- ProPublica discovered that the COMPAS recidivism prediction instrument (used to inform bail and parole decisions by predicting whether individuals would go on to commit violent crimes if released) made errors of different sorts on different populations. The false positive rate (i.e. the rate at which it incorrectly labeled people "high risk") was much higher on the African American population than on the white population, and the false negative rate (i.e. the rate at which it incorrectly labeled people "low risk") was much higher on the white population. Because being falsely labeled high risk is harmful (it decreases the chance you are released), this was widely and reasonably viewed as unfair.

But the story wasn't so simple. Northpointe, the company that produced COMPAS (they have since changed their name), responded by pointing out that their instrument satisfied predictive parity across the two populations --- i.e. that the *positive predictive value* of their instrument was roughly the same for both white and African American populations. This means that their predictions conveyed the same meaning across the two populations: the people that COMPAS predicted were high risk had roughly the same chance of recidivating, on average, whether they were black or white. This is also desirable, because if we use an instrument that produces predictions whose meanings differ according to an individual's demographic group, then we are explicitly incentivizing judges to make decisions based on race after they are shown the prediction of the instrument. Of course, we now know that simultaneously equalizing false positive rates, false negative rates, and positive predictive values across populations is generically impossible --- i.e. it is impossible except under very special conditions, such as when the underlying crime rate is exactly the same in both populations. This follows from thinking about Bayes' rule.
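The arithmetic behind this impossibility is easy to check directly. Here is a minimal sketch (the function and all numbers are our own illustration, not COMPAS figures): by Bayes' rule, the positive predictive value is pinned down by the base rate and the two error rates, so holding the error rates equal across groups with different base rates forces the predictive values apart.

```python
def ppv(base_rate, fpr, fnr):
    """Positive predictive value: P(guilty | labeled high risk), by Bayes' rule."""
    true_positives = base_rate * (1 - fnr)
    false_positives = (1 - base_rate) * fpr
    return true_positives / (true_positives + false_positives)

# Hypothetical numbers: identical FPR and FNR for two groups whose
# underlying base rates differ.
print(round(ppv(base_rate=0.3, fpr=0.2, fnr=0.2), 3))  # → 0.632
print(round(ppv(base_rate=0.5, fpr=0.2, fnr=0.2), 3))  # → 0.8
```

The predictive values necessarily differ; equalizing all three quantities at once would require the base rates themselves to coincide.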

Another sensible notion of fairness suggests that "similarly risky people should be treated similarly". This harkens back to notions of individual fairness, and suggests that we should do something like the following: we should gather as much information about an individual as we possibly can, and condition on all of it to find a (hopefully correct) posterior belief that they will go on to commit a crime. Then, we should make incarceration decisions by subjecting everyone to the same threshold on these posterior beliefs --- any individual who crosses some uniform threshold should be incarcerated; anyone who doesn't cross the threshold should not be. This is the approach that Corbett-Davies and Goel advocate for, and it seems to have a lot going for it. In addition to uniform thresholds feeling fair, it's also easy to see that doing this is the Bayes-optimal decision rule for optimizing any societal cost function that differently weights the costs of false positives and false negatives. But applying a uniform threshold on posterior distributions unfortunately will generally result in a decision rule that neither equalizes false positive and false negative rates, nor positive predictive value. Similarly, satisfying these other notions of fairness will generally result in a decision rule that is sub-optimal in terms of its predictive performance.
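As a quick sanity check on the Bayes-optimality claim, here is a sketch in our own notation (the costs and numbers are hypothetical): with a cost `c_fp` per false positive and `c_fn` per false negative, comparing the expected cost of the two actions for an individual with posterior `p` yields a uniform threshold of `c_fp / (c_fp + c_fn)`.

```python
def bayes_optimal_decision(p, c_fp, c_fn):
    """Incarcerate iff the expected cost of releasing exceeds that of incarcerating."""
    cost_incarcerate = (1 - p) * c_fp  # we pay c_fp if the person is innocent
    cost_release = p * c_fn            # we pay c_fn if the person is guilty
    return cost_incarcerate < cost_release  # equivalent to p > c_fp / (c_fp + c_fn)

def threshold(c_fp, c_fn):
    return c_fp / (c_fp + c_fn)

print(threshold(1.0, 3.0))                                # → 0.25
print(bayes_optimal_decision(0.3, c_fp=1.0, c_fn=3.0))    # → True
print(bayes_optimal_decision(0.2, c_fp=1.0, c_fn=3.0))    # → False
```

Note the threshold is the same for everyone regardless of group: all group information has already been absorbed into the posterior `p`.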

Unfortunately, this leaves us with little guidance --- should we aim to equalize false positive and negative rates (sometimes called *equalized odds* in this literature)? Should we aim to equalize positive predictive value? Or should we aim for uniform thresholds on posterior beliefs? Should we aim for something else entirely? More importantly, by what means should we make these decisions?

## A Game Theoretic Model

One way we can attempt to choose among different fairness "solution concepts" is to try to think about the larger societal effects that imposing a fairness constraint on a classifier will have. This is tricky, of course --- if we don't commit to some model of the world, then different fairness constraints can have either good or bad long term effects, which still doesn't give us much guidance. Of course making modeling assumptions has its own risks: inevitably the model won't match reality, and we should worry that the results that we derive in our stylized model will not tell us anything useful about the real world. Nevertheless, it is worth trying to proceed: all models are wrong, but some are useful. Our goal will be to come up with a clean, simple model, in which results are robust to modeling choices, and the necessary assumptions are clearly identified. Hopefully the result is some nugget of insight that applies outside of the model. This is what we try to do in our new paper with Chris Jung, Sampath Kannan, Changhwa Lee, Mallesh Pai, and Rakesh Vohra. We'll use the language of criminal justice here, but the model is simple enough that you could apply it to a number of other settings of interest in which we need to design binary classification rules.

In our model, individuals make rational choices about whether or not to commit crimes: that is, individuals have some "outside option" (their opportunity for legal employment, for example), some expected monetary benefit of crime, and some dis-utility for being incarcerated. In deciding whether or not to commit a crime, an individual will weigh their expected benefit of committing a crime, compared to taking their outside option --- and this calculation will involve their risk of being incarcerated if they commit a crime, and also if they do not (since inevitably any policy will both occasionally free the guilty as well as incarcerate the innocent). Different people might make different decisions because their benefits and costs of crime may differ --- for example, some people will have better opportunities for legal employment than others. And in our model, the only way two different populations differ is in their distributions of these benefits and costs. Each person draws, i.i.d. from a distribution corresponding to their group, a type which encodes this outside option value and cost for incarceration. So in our model, populations differ e.g. only in their access to legal employment opportunities, and this is what will underlie any difference in criminal base rates.
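The individual's calculation above can be sketched in a few lines (the variable names and numbers are our own illustration, not the paper's notation): an individual with outside option `w`, crime benefit `b`, and incarceration dis-utility `c` faces incarceration probability `q1` if they commit a crime and `q0` if they do not.

```python
def commits_crime(w, b, c, q0, q1):
    """A rational agent commits a crime iff it has higher expected utility."""
    utility_crime = b - q1 * c   # expected utility of committing the crime
    utility_legal = w - q0 * c   # expected utility of taking the outside option
    return utility_crime > utility_legal

# Same policy (q0, q1) and same crime benefit, different outside options:
print(commits_crime(w=5.0, b=6.0, c=10.0, q0=0.1, q1=0.6))  # → False
print(commits_crime(w=0.5, b=6.0, c=10.0, q0=0.1, q1=0.6))  # → True
```

Rearranging, the comparison only depends on `b - w` versus `(q1 - q0) * c`: the policy deters crime exactly through the gap between the incarceration risk of the guilty and that of the innocent.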

As a function of whether each person commits a crime or not, a "noisy signal" is generated. In general, think of higher signals as corresponding to increased evidence of guilt, and so if someone commits a crime, they will tend to draw higher signals than those who don't commit crimes --- but the signals are noisy, so there is no way to perfectly identify the guilty.

Incarceration decisions are made as a function of these noisy signals: society has a choice as to what incarceration rule to choose, and can potentially choose a different rule for different groups. Once an incarceration rule is chosen, this determines each person's incentive to commit crime, which in turn fixes a base rate of crime in each population. In general, base rates will be different across different groups (because outside option distributions differ), so the impossibility of e.g. equalizing false positive rates, false negative rates, and positive predictive value across groups will hold in our setting. Since crime rates in our setting are a function of the incarceration rule we choose, there is a natural objective to consider: finding the policy that *minimizes crime*.
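To make "the incarceration rule fixes the base rate" concrete, here is a toy simulation under assumptions entirely of our own choosing (Gaussian signals, a uniform type distribution, a rule that thresholds the posterior --- none of these specifics come from the paper). Because the posterior depends on the base rate, and the base rate depends on how individuals respond to the rule, the equilibrium crime rate is a fixed point, which damped iteration finds.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def signal_cutoff(pi, t):
    # Toy assumption: signal ~ N(1, 1) if guilty, N(0, 1) if innocent.
    # Then "posterior > t" given base rate pi is equivalent to the raw
    # signal exceeding this cutoff.
    return 0.5 + math.log(t * (1 - pi) / (pi * (1 - t)))

def crime_rate(pi, t, types):
    # Given believed base rate pi, the rule fixes the incarceration
    # probabilities, and each type best-responds.
    s = signal_cutoff(pi, t)
    q1 = 1 - norm_cdf(s - 1)   # incarceration probability if guilty
    q0 = 1 - norm_cdf(s)       # incarceration probability if innocent
    return sum(b - q1 * c > w - q0 * c for (w, b, c) in types) / len(types)

def equilibrium(t, types, pi=0.5, iters=200):
    # The base rate both feeds into and is produced by the rule, so the
    # equilibrium is a fixed point of the best-response map.
    for _ in range(iters):
        pi = 0.5 * pi + 0.5 * crime_rate(pi, t, types)  # damped iteration
    return pi

random.seed(0)
# Hypothetical types: outside option w, crime benefit b, incarceration cost c.
types = [(random.uniform(0, 4), random.uniform(0, 4), 2.0) for _ in range(5000)]
pi_star = equilibrium(t=0.5, types=types)
print(0.0 < pi_star < 1.0)  # → True
```

Changing the rule (the threshold `t`, or whether it is applied to posteriors or raw signals) shifts `q1 - q0`, and with it the equilibrium crime rate --- which is exactly the lever the crime-minimization objective operates on.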
Let's think about how we might implement different fairness notions in this setting. First, how should we think about posterior probabilities that an individual will commit a crime? Before we see an individual's noisy signal, but after we see his group membership, we can form our *prior* belief that he has committed a crime --- this is just the base crime rate in his population. After we observe his noisy signal, we can use Bayes' rule to calculate a posterior probability that he has committed a crime. So we could apply the "uniform posterior threshold" approach to fairness and use an incarceration rule that incarcerates an individual exactly when their posterior probability of having committed a crime exceeds some uniform threshold. But note that because crime rates (and hence prior probabilities of crime) will generically differ between populations (because outside option distributions differ), setting the *same* threshold on posterior probability of crime for both groups corresponds to setting *different* thresholds on the raw noisy signals. This makes sense --- a Bayesian doesn't need evidence as strong to be convinced that someone from a high-crime group has committed a crime as she would for someone from a low-crime group, because she started off with a higher prior belief about the person from the high-crime group. This (as we already know) results in a classification rule that has different false positive rates and false negative rates across groups.
On the other hand, if we want to equalize false positive and false negative rates across groups, we need an incarceration rule that sets the same threshold on raw noisy signals, independently of group. This will of course correspond to setting different thresholds on the posterior probability of crime (i.e. thresholding calibrated risk scores differently for different groups). And this will always be sub-optimal from the point of view of predicting crime --- the Bayes optimal predictor uniformly thresholds posterior probabilities.
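This contrast is easy to see numerically. A small sketch, assuming (purely for illustration; the paper does not commit to this) that signals are Gaussian: N(1, 1) for the guilty and N(0, 1) for the innocent.

```python
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def error_rates(cutoff):
    # Signal ~ N(1, 1) if guilty, N(0, 1) if innocent (toy assumption);
    # incarcerate whenever the signal exceeds the cutoff.
    fpr = 1 - norm_cdf(cutoff)   # innocent but incarcerated
    fnr = norm_cdf(cutoff - 1)   # guilty but released
    return fpr, fnr

def cutoff_for_posterior_threshold(pi, t):
    # The signal cutoff at which the posterior equals t, given base rate pi.
    return 0.5 + math.log(t * (1 - pi) / (pi * (1 - t)))

# A uniform posterior threshold applied to two hypothetical base rates:
# the implied signal cutoffs differ, so the error rates differ by group.
for pi in (0.2, 0.4):
    fpr, fnr = error_rates(cutoff_for_posterior_threshold(pi, t=0.5))
    print(pi, round(fpr, 2), round(fnr, 2))

# A uniform signal cutoff, by contrast, yields identical error rates for
# every group: error_rates does not depend on the base rate at all.
print(error_rates(1.0))
```

The higher-base-rate group faces a lower signal cutoff, hence a higher false positive rate and a lower false negative rate --- the pattern from the COMPAS debate, reproduced inside the model.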

## Which Notions of Fairness Lead to Desirable Outcomes?

But only one of these solutions is consistent with our social goal of minimizing crime. And it's not the Bayes optimal predictor. The crime-minimizing solution is the one that sets *different* thresholds on posterior probabilities (i.e. uniform thresholds on signals) so as to equalize false positive rates and false negative rates. In other words, to minimize crime, society should explicitly commit to *not* conditioning on group membership, even when group membership is statistically informative for the goal of predicting crime.
Why? It's because although using demographic information is statistically informative for the goal of predicting crime when base rates differ, it is not something that is under the control of individuals --- they can control their own choices, but not what group they were born into. And making decisions about individuals using information that is not under their control has the effect of distorting their dis-incentive to commit crime --- it ends up providing less of a dis-incentive to individuals from the higher crime group (since they are more likely to be wrongly incarcerated even if they don't commit a crime). And because in our model people are rational actors, minimizing crime is all about managing incentives.

This is our baseline model, and in the paper we introduce a number of extensions, generalizations, and elaborations on the model in order to stress-test it. The conclusions continue to hold in more elaborate and general settings, but at a high level, the key assumptions that are needed to reach them are that:

1. The underlying base rates are rationally responsive to the decision rule used by society;
2. Signals are observed at the same rates across populations; and
3. The signals are conditionally independent of an individual’s group, conditioned on the individual’s decision about whether or not to commit crime.

Here, conditions (2) and (3) are unlikely to hold precisely in most situations, but we show that they can be relaxed in various ways while still preserving the core conclusion.

But more generally, if we are in a setting in which we believe that individual decisions are rationally made in response to the deployed classifier, and yet the deployed classifier does not equalize false positive and negative rates, then this is an indication that *either* the deployed classifier is sub-optimal (for the purpose of minimizing crime rates), *or* one of conditions (2) and (3) fails to hold. Since in fairness-relevant settings the failure of conditions (2) and (3) is itself undesirable, this can serve as a diagnostic to highlight discriminatory conditions earlier in the pipeline than the final incarceration rule. In particular, if conditions (2) or (3) fail to hold, then imposing technical fairness constraints on a deployed classifier may be premature, and attention should instead be focused on structural differences in the observations being fed into the deployed classifier.