# The Emerging Theory of Algorithmic Fairness

>>Hey, welcome everyone. I’m delighted this

afternoon to introduce our AI distinguished

lecturer Cynthia Dwork, who many of you know. Cynthia is one of the most

accomplished technical women who I’ve ever been around. She has four fancy titles. Right now, she’s

the Gordon McKay Professor at Harvard in the Computer

Science department. She’s also the Radcliffe

Alumnae Professor at the Radcliffe Institute for Advanced Study, and she’s an Affiliate Faculty

at the Harvard Law School. In addition to that, she’s a distinguished Scientist

at Microsoft Research. She was one of

the inaugural class of distinguished scientists

here about a decade or so ago, maybe

a little bit more. Cynthia received her PhD at

Cornell with John Hopcroft. From there she did a couple-year postdoc with

Nancy Lynch at MIT, and then went on to

spend a fair amount of time at IBM Almaden, where she did a lot of foundational work that

I was not all that familiar with till earlier today. The most I knew about what she did there was something on

rank aggregation, which was very important

in web search. She also did work on the

cryptography foundations, as well as proof of work, which is one of the underlying

foundations of Bitcoin. At MSR, she is well known for

two other strands of work. One is on differential privacy, putting privacy on

firm theoretical ground, and it’s so incredibly

important today. What she’ll talk about

with us this afternoon is more nascent work on

algorithmic fairness. This work, and her lifelong

accomplishments have been acknowledged by more prizes than I can name, and the fancy titles I

mentioned at the outset. She’s obviously

a fellow of the ACM. She won the Dijkstra Prize. More than a decade ago, she won the Gödel Prize. She’s one of the very few

people who are members of both the National

Academy of Sciences, and the National Academy

of Engineering. She’s an amazingly

deep scientist, and one who picks problems that have real practical impact, which I think is

a really unique skill. So, with that introduction, I’d like to introduce Cynthia, who will tell us about some of her work on algorithmic fairness.>>Thank you very much for

that gracious introduction. Thank you for inviting me, and I’m really very

happy to be here. I have a couple of questions before I proceed with the talk. So, first of all,

how many people here are working in Machine

Learning and AI? Okay. How many people here have worked on

fairness particularly? All right, and any theoretical

computer scientists? Great. All right, good. That helps me figure out

what I should be saying. This is a research effort that actually began at Microsoft’s now-defunct Silicon Valley lab, starting in approximately 2010. I should mention that all

of the good slides in the talk are due to my student

Christina Ilvento. Here’s our problem. We’re interested in

algorithmic fairness, will be more specific in

a minute. We have a population. Our population is diverse

in many different ways, ethnic diversity,

religious diversity, geographic, medical, and so on. Well, whatever it

is we want to do, we want to be sure that somehow or other

we’re being fair. Typical examples would

be something like this. We have a bank that receives detailed user

information about you, before giving you

your online experience. You surf on over to some bank, and before they

show you anything, they consult with

tracking networks, and find out stuff about you. The concern, and this was

a concern that was raised in a newspaper article in

the Wall Street Journal, in a series that they ran in 2010 called “What They Know.” It was mostly about privacy, but it also raised

the following fairness concern. The concern was that of

the illegal practice of steering minorities to credit card offerings of

less desirable terms. The financial industry

is more heavily regulated than

many other industries, and so this problem

really was a legal one, not just “a fairness one.” Just to dispense with

some obvious things, one suggestion is

of course to hide the sensitive information

from the classifier. Whether you’re

a member of a minority group or not would be hidden. That falls victim to

a couple of things. First of all, there’s redlining. Who knows the term redlining? Great. We’ll see a picture

of it in a minute, and on the web

there’s weblining. This is basically when zip code is used as a redundant encoding of race. It’s illegal to

discriminate based on race, it’s not illegal to

discriminate based on zip code. We have this issue now in

general of redundant encoding. Sensitive attributes

can be embedded holographically in

your other attributes.

The other is a Birds of a Feather phenomenon. Has anybody seen

this picture before? This was an undergraduate

research project in a class at MIT. This is not a full-fledged study, but essentially what the students found was the following. If you’re male,

and something like five percent of your friends on Facebook are

self-described as gay, then you’re very

likely to be gay. You may not choose to release this information

about yourself, but from this Birds of a Feather flock

together phenomenon, it can be determined from

the attributes of your friends. Hiding the sensitive attribute

is a little dicey. Now, there’s another problem with hiding the

sensitive information. That is that in some sense culturally aware

algorithms can be more accurate than algorithms

that are not. If you have a culturally

aware algorithm, it can make very good use of this sensitive information

to be more accurate. In a typical example,

and by the way, let me introduce some populations that I’m going to be

using in the talk. We all like food or

most of us like food. Some people like to have their food flavored

with the herb sage, and the rest prefer to have their food flavored

with the herb thyme. So, my groups are going to be the thyme eaters and

the sage eaters. Generally, sage will

be the minority group, but not everything

will require that. I’m going to be

calling them S and T. Maybe you have a situation where hearing voices is a very common religious

experience among the sage eaters, but among the thyme eaters it’s a diagnostic criterion

for schizophrenia. This is a real medical example. So, knowing whether or not

one is a sage eater or a thyme eater permits better classification, more

accurate classification.>>Now, as you all know, in machine learning we

train on historical data. Historical data is often biased, and so we can’t just say train

on the data, because we’ll imbibe the biases with the

mother’s milk of training data. This brings up

a really important problem, which is that in general, there is no source of ground truth for the things

that we’re trying to do. So, we have to figure out how to extract fairness

when we possibly can, keeping this in mind. We can’t just go to

the training data. Now, who knows what

this is a picture of? Okay. All right. This is a picture from

the Sistine Chapel, it’s Michelangelo’s depiction

of the Last Judgment. So, we have Jesus in the middle, and on his left are the people who are being condemned to hell, and on the right are people who are going

to go to heaven. So, this is a binary

classification situation. You can have a ternary

classification situation. So, here, we have some slides

from brain tissue. Over on the left is cancer; on the right is healthy brain tissue; and in the middle we have cells that are not cancerous but they’re

not normal either, they’re affected by the tumor. So, we might want a ternary, a more complicated classification system for something like this. Here’s another

classification problem. I’m calling this a dependent

classification problem. So, let’s say that you’re

trying to hire people. Your job is not just to distinguish the qualified

from the unqualified, or even gradations among that, you have a more

complicated problem. Maybe there are hundreds of qualified people and

well-qualified people, but you only want to

interview 10 of them because, after all, you have

finite resources. So, you want to fairly select a cohort of

people to interview. So, that’s a classification

problem where you can’t just classify each person individually as to whether they’re

good enough to interview, you have to somehow constrain it. Another kind of classification

problem is scoring. This is something for

evaluating arrested children to determine whether or

not they should be eligible for

detention or release, whether they should be

detained or released. So, it’s a scoring system. You can see sorts of things like, there are points for

the various kinds of offenses that they committed

this particular time. Then, there are

other sorts of points for their prior offense history. There are negative points that come from

mitigating factors. These scores are then converted

into a binary decision, whether to release or to detain. Depending on the definition

that you’re using, fairness in scoring

will or will not result in fairness in the decision. I still haven’t defined anything. Now, we want to have a mathematically

rigorous approach to algorithmic fairness. Historically, there’s

a way of trying to take these big goals and translate them into

a research program that should lead to

some practicable outcome. So, the first item in the

basic paradigm is definitions. What are you trying to do? In our case, this is going to be, what do we mean by fair. So, one way of doing this is that you can define

fairness by thinking about what it is you’re

trying to prevent. So, in a good

cryptographic tradition, we imagine an adversary, and the adversary

is trying to force unfairness and you want

to defend against this. So, what are the various

behaviors or goals that the adversary may have that

you want to defend against? So, you define fairness. Then, what you want to do, start with a definition, then you want to create algorithms, and you want that

your algorithms will enforce whatever this thing

is that you’ve defined. So, once we have

the definition of fairness, then we want to build a

classification algorithm that will be fair according

to that definition. Finally, we need a

composition theorem. So, think about the systems that you might use

that classify people. Let’s say, you’re

determining whether they’re appropriate for

this ad or that ad. You don’t just

classify them once, you classify them over, and over, and over again for all sorts

of different kinds of ads, and many different times of day, and all sorts of

things like that. Similarly, if you’re doing a privacy preserving

data analysis, nobody just wants one privacy

preserving statistic, they want a whole

ton of statistics. Cryptography, you don’t just see one message or one does

digital signature, you’ve got digital signatures on tons of different messages, and encryptions of tons

of different messages. So, you need to prove some sort

of theorem that talks about how these things

combine in practice, and those are called

composition theorems. So, here, what we would want is we would want

to say that if you have a system made of fair pieces, then the system somehow

is fair in total. So, this is a general paradigm

that has been successful, particularly in cryptography

and in privacy. So, can we apply

this to fairness? The short answer is, “I think we probably can

but it’s way harder in my experience than in

the other scenarios.” Defining fairness is very, very difficult and very,

very controversial. I don’t believe it’s

going to be possible for the technical community to come up with

a single definition that the population as a whole

would say, “Yes that’s it.” I don’t think that’s

going to happen.>>Building algorithms

that’s our bread and butter once we have definitions. There’s been a lot

of work already on building algorithms, some of it very recent

and very exciting. So I’ll talk

a little bit about that, and composition turns

out to be very tricky and so we’ll see a little bit

about that as well. So let’s talk about

defining fairness. I said that we think about what it is

an adversary is trying to do. So think about for example,

digital signature schemes. What do you want from

a digital signature scheme? You’re all going to say

the same thing right away. Nobody should be able to forge, then you have to get into

some details about all of the right quantifiers

about what does it mean to say nobody

should be able to forge, but forgeries are what

you’re trying to prevent. So, what are we trying

to prevent here? So one of our starting scenarios was actually in the context

of advertising. So, advertisers might be biased and we want to

build a platform that will mitigate the biases of

the advertisers and be fair. So, we literally sat

around and said, “Okay, what are some of

the kinds of things, a really nasty ill-intentioned

advertiser might do?” So red lining we talked

about and that’s real. Red lining in loans

was absolutely real. Housing loans for example

in the United States. There’s a fabulous book

you should read by Richard Rothstein

called, The Color of Law, which talks about the effect of the US government

in creating these biases. There’s reverse tokenism. So let’s say for example, you turn me down

for a loan I say, “Oh you turned me down

because I have curly hair,” and they point

to Sergei and they say, “Obviously, he’s way more

qualified than you are. He has straight hair and

we turned him down too.” The thing is they’re using him as the example for all of

the curly haired people. So that’s a reverse tokenism. Then there’s what

I’m going to call deliberately targeting

the wrong subset of the sage eaters and

I’m going to actually postpone that example

for a slide or two. So we literally just

sat down and wrote a whole bunch of negative things that we wanted to try to prevent. Is this a reasonable approach? Well, I don’t know another one and one of

the advantages with this sort of approach is

let’s say that you build a system that protects against all these things

that we’ve listed. That’s nice. Then somebody

comes along and says, “You missed one,”

you say, “Great.” Now you redesign your system to protect against

the new one as well. In the meantime,

you’ve been protecting against these first

behaviors anyway, so something happened that

was better than nothing. When most people think about fairness the way

I’ve introduced it, they’ll immediately think

in terms of groups, and in fact the way I introduced it was suggestive of that. So what is group fairness? Group fairness properties

are basically statistical requirements about the treatment

of different groups. So statistical parity,

which is also known as demographic parity, says that, let’s say if we’re just doing binary classification, positive and negative, the demographics of the people with positive classification should be the same as the demographics of the general population. So the same proportions are being positively classified for sage eaters and thyme eaters as in the population as a whole, and also the same for the negative classifications.

Now, this is nice. When is it really meaningful? As we’ll see, it’s mostly meaningful in the breach. If you see some

incredible departure from statistical parity, that’s a red flag. Doesn’t mean that things have

been unfair necessarily, but it certainly is a nice place

to go looking to see, why did this happen? So, it can be meaningful in the breach but it

permits what we call, as I said targeting

the wrong subset of S. So I have a spa, it’s a luxurious spa and I really don’t want

those sage eaters at my spa, but I have to

advertise to them in the same proportion as their proportion in

the population as a whole. So what do I do?

How can I be biased? I advertise to the members of S who can’t afford to come to my spa. So I’m deliberately targeting the ones who can’t come, and therefore I’m ensuring that my spa isn’t gaining new sage eaters. But among the thyme eaters, I advertise to the wealthy, and they can come. So demographic parity, oops, it’s not enough, and you have to look at it much more carefully. There’s also this question of the intersectional cases, which leads to what’s colorfully called fairness gerrymandering. So, we have our population. I’ve divided it into the

sage eaters and the time eaters, but they’re also the tea drinkers and the coffee drinkers. What we maybe really want is that if we look at the four different

intersection categories, we should have

proportional representation for each of these categories. It would be very unfair

if we only showed the ad to sage eating

coffee drinkers. So you can fill in

demographic groups where you really clearly want this intersectional parity. Statistical parity by itself does not protect against this kind of fairness gerrymandering. So statistical parity was one definition of fairness for groups; there are other definitions of fairness for groups, and then we have to face the fact that some group fairness properties

are mutually incompatible. So how many of you

know this story? So many of you do but

most of you don’t. So, I have a classifier. So let’s say that we are classifying people

and we are predicting. So we’re classifying as to whether we think you’re

going to do something, we think you’re

going to recidivate versus we don’t think

you’re going to recidivate. So importantly here, there’s information that’s

going to be missing, like the future hasn’t happened yet and whether

someone released from jail recidivates may

depend on many things that happened to them during

their experiences. So one thing you might want reasonably is

that you would have equal false positive rates across your different demographic groups. Another might be that you would want equal false negative rates across the different

demographic groups. You might also want equal positive predictive

value which is among those who are

predicted to recidivate, how many of them actually do? That’s the positive

predictive value. I chose these three,

these aren’t my only three but these

are very famous, because it’s known that no imperfect classifier

can simultaneously ensure equal false positive

rates across groups and equal false negative rates and equal positive

predictive value, unless the base rate is

the same across groups. So unless the base rate of recidivating is the

same across groups. The reason for this, as Chouldechova points out in her paper, is that these quantities are related by an equation. We could write down, for the sage eaters, the false positive rate, the false negative rate, and the positive predictive value, and they would satisfy this equation, where p would be the base rate among the sage eaters. We would then write down the same equation for the thyme eaters.

Now, requiring equal false positive rates across the two groups means that the left-hand side would have to be the same in the two equations. Then, requiring equal positive predictive value and equal false negative rates means the remaining terms would have to be the same, and the only way this could all happen is if the base rate p is the same in the two groups. When the future is unknown, and therefore classification

would be imperfect, no algorithm can

simultaneously ensure these three very natural desiderata unless

the base rate is the same. Nobody cares how

that algorithm is implemented. So if you believe in

free choice or randomness, you get the same thing

whether the algorithm is a machine or whether

it’s some other form.>>So, it does not make sense to say let’s solve this problem by bringing a human

into the loop. There is nobody you could bring into the loop who would address this problem. It’s just the math. So as you can probably tell, I find group fairness notions dissatisfying for

a number of reasons. I’m sharing some of

these reasons with you. Now this brings us to

a very famous video. How many people have seen this? Not all, great.

I mean not even half. So, this is an experiment with Capuchin Monkeys,

they’re paired up. The monkey has to

do a task which is to hand you a rock and then

the monkey gets a reward. So, let’s see, sorry.>>So, she gives a rock

to us that’s the task. We give our piece of

cucumber and she eats it. The other one needs to give a rock to us, and that’s what she does. She gets a grape. If the other one sees that, she gives a rock to us now and gets cucumber again.>>Okay. So that brings us to the notion that I prefer

which is Individual Fairness. Individual Fairness

says intuitively that, “For a given classification task, if two people are

similar with respect to that classification task then they should be

treated similarly.” Now to make any sense

out of this, we need the right

notion of what it means to be similar

or dissimilar. We need some kind of

a distance notion. So, a classic example is in credit scoring where

the difference in credit scoring literally gives you

a metric for how dissimilar a pair of

people are considered to be for credit worthiness. Okay. Of course, again, I have to emphasize this

is a task-specific notion. So, Sergei and I might in fact be comparably qualified for a loan, but for recommending hair products we would be completely different. So we’re similar

for one purpose but totally different

for another purpose. So, it’s very important

that when we talk about these metrics they’re

task-specific. Now, thinking about

credit scores, maybe people in

general are sort of smushed uniformly from highly qualified to very unqualified. So, when you’re trying to decide whether to

give a loan based on whether somebody

crosses a threshold, you may have two people

who are very close and you give one of them the loan and you deny it to the other one, right? If the scores are in fact sort of densely packed

along this line, then indeed that has to happen. So, you sort of have to

draw the line somewhere, that’s the general worry. Then you would have

very similar people who are being treated

very dissimilarly. So, what do the theoreticians

say that you should do in this case? You randomize. So, instead of deciding yes definitely for you and

definitely not for you, you assign to people

a probability of getting the loan or

a probability of getting an A depending on

their creditworthiness. You arrange that if

two people are very similar, then the probability

distribution on outcomes that they see is

very similar. Yes?>>Two policemen were just

fired for tossing a coin to decide whether to arrest

someone for a speeding ticket. So, they may have

heard your lecture.>>No. What I’d say is there are circumstances under which it might be

the right thing to do. Police have huge amounts

of discretion, and if they treat similar people similarly

maybe that’s fine.>>But maybe instead

of randomizing, one could just scale the loans. So instead of having

a fixed amount, you could change the amount of the loan until it reaches zero.>>You could certainly

do something like that. I would have to think through

all of the ramifications. But, there are cases where you must make

a binary decision. So, we’re going to take this as our

definition of Fairness. So, we have a set of outcomes. In this case, it’s

yes or no that’s O. This is the probability

distributions on outcomes. So, we want to find

a classifier that maps individuals in our universe to probability distributions on outcomes with

the following condition. For some appropriate measure of the difference of

two probability distributions, for example total

variation distance, we want that the difference in these distributions will be bounded by the distance between these two individuals under the appropriate task-specific metric. So, it’s a Lipschitz constraint.

Now, of course, this immediately raises the question: where does the metric come from? How are you going to get

your hands on the metric. I don’t have

a great answer to that, but I also believe that if we actually had a classification system that

people thought was fair, then we could extract

from it a metric.>>So, how do we ensure

that the metric is fair?>>That’s exactly what

I just said, we don’t know where to get

our hands on the metrics. So, when we did

this work initially, we said that at the very

best it was going to be society’s best guess at the moment at

what we would need. I’m now understanding

more and more that these things are

going to be handled by political processes, that there’s no other way

of getting these things other than combining stakeholders and measurements

to get somewhere. But we don’t have

a straight mathematical answer for where we’re going

to get the metric. Yes?>>Is there a sort of tension between “who watches the watchmen” as an argument, or: the best person to decide if I’m going to offend you is you, not me. Doesn’t that mean the rule of fairness has to be something that society will dictate, and we’ll go back into group fairness? So, for individual fairness, the toughest nut to crack is that the source of the metric is the subject to which the decision will be applied. As in the case of the monkey, the monkey was the one who decided, “This is not fair.” So, any hints as to how to integrate societal and individual fairness, or separate them properly?>>So, I don’t have a

mathematical answer to that yet. What I’m starting to engage

in very heavily now is, what is the right way of

thinking about it politically. I don’t mean

political correctness, I mean political

theory and ethics. But, it’s an unknown, and work on it guys. It really makes sense, it’s an important problem. Maybe ask me more

about it at the end.>>Are we right people

to work on it?>>So, not by yourselves, unless you have

a whole very broad education. Inevitably, there’s going to be legislation and regulation. We have to be sure that

what gets legislated and regulated is not

technically stupid, that it’s not infeasible. People will say, I think there’s a real risk of very bad law

and regulation. So, I think that

the technologists have an important role to play in helping to

steer away from that. Also, I think that

technologists sort of hearing more about

the concerns and the issues, they have some of

the tools needed to try to move those and act

on those concerns. So, we are sort of, but not by ourselves. Yes?>>How do we know that

the metric is even computable? For example, if a brain is

a universal Turing machine, then they can exist. The computer will function

but not serving in extreme machine

[inaudible] machine to, a computer will number,

so it’s just a metric.>>So, I’m hearing

about two-thirds of your words, but you seem to be saying, how do we know that it’s

computable in the sense of how do we know that

an information is available? How do we get our hands

on the information, which would give us the metric?>>If you suppose that our brain is

a universal Turing machine, and it got eschewed

any art training machine, the Nell computer will

also can map from a universal Turing machine to a metric, computable metric.>>So, I’m not sure what decision problem you’re

trying to work in here like whether the brain stops on itself when

it gets itself, I don’t know exactly. Maybe ask me after, I’m sorry. I’m going to go on. Let’s talk a little bit about algorithms. So, first, as I said, these are challenges with which we are familiar. So, in the ad setting, we imagined the situation where we actually knew

our universe of individuals, and we even gave the advertiser the opportunity

to say for each individual and

each potential outcome how bad is it for this individual to be

mapped to that outcome. So, what’s the loss that’s incurred by mapping this individual to outcome o? It’s just something that we

are adding optionally to allow the vendors to

weigh in on things. Then, we assemble

the ingredients of the science fiction metric and the vendor’s loss functions. We want to minimize the vendor’s loss subject

to the fairness conditions. So, we’re viewing the loss functions as soft constraints and the fairness conditions as hard constraints. This is just a linear program, so in theory it can be solved.

In fact, there’s

an interesting observation, which is sort of evocative

of those theorems about what you can’t do when

the base rates are unequal. If you solve the fairness linear program, you get a classifier that gives you statistical parity if and only if, intuitively, the distributions are the same: the Earth mover distance between the uniform distributions on the sage eaters and the thyme eaters is zero.

So, that was something where we assumed that we knew the whole universe. Very recently, Rothblum

and Yona have shown how to relax this a little bit

to what they’re calling probably

approximate fairness, and then they get

generalizability. We didn’t generalize

in our initial result. So, that’s a 2018 result. Now, I mentioned

that there’s been a lot of progress

in recent years. So, for example, Hardt, Price, and Srebro took equal false-negative rate, and optionally also equal false-positive rate, as a fairness requirement, and they showed how to convert unfair predictors to fair predictors. Not for individual fairness; their notion of fairness is group fairness. Something that I think that

you guys might appreciate is, they did this all

in post-processing. So, as you know, if you’re trying to

get a product group to use one of your tools, it’s much better

if you don’t have to interfere in their pipeline, and you can stick something

at the beginning or the end. So, that’s something they did.

Then, in other work, Joseph, Kearns, Morgenstern, and Roth looked

at a different problem. They were looking at

a bandit setting. So, you have basically one arm for each demographic group. So, in our example, there

would be two groups, but you could be more general. They’re worried about the fact that you may have a lack of training data for

your minority group. So, for the sage eaters, you don’t have a lot

of training data. What they require is

that at each round, let’s say, think about

this in terms of loan. At each round, one person from each demographic group comes and you have to choose

one to give a loan to, or at most one to give a loan to. Maybe what you’ll do is

you’ll actually choose a probability

distribution and you will then choose somebody

from that distribution. The requirement is that

a worse applicant will never be favored in terms of

these probabilities over a better one

despite uncertainty.

So, they study how to minimize the regret and learn a fair policy. They look at

the difference between the regret that you get when you enforce this kind of

fairness condition, which basically costs you while you were learning about

the minority group. Once you know

the minority group well, you can predict with confidence, but when you don’t, you can’t. So, they show that there’s a gap, that it’s more expensive. You have higher regret

when you’re learning a fair policy versus regret when fairness is not a concern.

There was a question?>>Yes. So, what is the fairness

notion being used here?>>They’re using individual

fairness at each round. Among the people who have

shown up at that round.>>The probability that you choose usually shouldn’t differ.>>Right, shouldn’t be. Actually, no sorry.

There’s something a little bit weaker. Just that a worse applicant

is not favored over a better applicant. Yes, sorry. I misremembered. That’s the one for this paper. So, it was 2016, 2017 was

a really interesting year. So, two independent groups of researchers started looking at some of these similar questions. So, now, we’re getting into the intersectional case

that I mentioned before. So, imagine that you have, for the first paper, this is where the gerrymandering

nomenclature came from. Imagine that you have a couple

of sensitive bits for now. Let’s say, K sensitive bits. So, sage versus thyme, and coffee versus tea, and a couple of

other sensitive bits. This gives you two

to the K subsets. Suppose all of

those subsets are still large, they’re not tiny relative to

the population as a whole. So, this is sort of large

intersections of large set. All right, so, they’re studying the intersectional case and they want to learn a classifier

that will essentially ensure, well they looked at

a few different notions: statistical parity, and equalizing the false

positive or false negative rates across these intersectional sets. So, I described it as being

defined by say K bits, all of the subsets

to those K bits. But you can have other ways of describing your

collection of sets, like a set of all the sets that can be described

by small circuits. So, as long as the set is

these sets are large enough, they want to try to guarantee these notions of fairness

across the groups. They also showed that the general problem

of what they call fairness auditing is hard. So I give you a classifier and these descriptions, perhaps implicit, of these groups: how do you determine whether there is a group, one of these intersectional sets, to whom you’re being unfair? They showed hardness by reduction from agnostic learning. So, the reason they

can have an algorithm as well as these negative

results is as you all know machine learning often does interesting things even when

in theory they’re hard. Very similar questions were

looked at by Hébert-Johnson, Kim, Reingold, and Roth. Wait, I made a mistake: this is Rothblum. The other one is Roth; this one is Rothblum. They studied calibration. So suppose my predictor

is now giving a score, and so I’m going to rate people some number between zero and one. Think about this as

my prediction about your probability of

recidivating, for example. So, the system is calibrated if among those that are rated v, the fraction who truly

will end up with a positive label

is close to v. So, in other words, it’s requiring correctness on average for those rated v, for all v.

So, they got results that were very similar, but they’re using

this kind of notion of correctness as fairness through the calibration version. If you’re familiar

with the Kleinberg, Mullainathan, and

Raghavan result, this is the kind of

calibration notion there or related

calibration notion. They argue that this makes more sense than these other

group fairness notions but other than that they

get very similar results. So, for every set defined by a circuit in

a predetermined set C, they get this approximate

calibration simultaneously. Subsequent work by Michael Kim

and James Zou has shown that this can be used to improve some medical predictors. They’ve gotten some nice results. Then, how long is my talk?>>[inaudible].>>Okay, I’ll talk for

just a little more then. Okay, so, the next batch

of results deals with algorithmic work

that tries to bridge the gap between group fairness and

individual fairness by assuming that we have some sort of oracle that we can ask about certain collections

of edges and say, “What are the distances

on these edges?” And then it leverages that to do powerful things without requiring you to ask about all pairs. In the deep learning community, there’s a whole different

approach, which is adversarial learning, or learning fair representations, and an

adversarial version of that. So, this is a group

fairness notion. Oh shoot. Sorry, all right let

me skip this slide. Okay, so the general idea here is we start with

instances in some set X and we’re going to learn a new representation for people that in some sense retains as much information as

possible while suppressing some key attributes such

as whether somebody is or is not a member

of a group, whether they’re a sage eater or a thyme eater. And they’re using GANs, and so we’ve got an adversary that is trying to distinguish, given a z, whether it came from a sage eater x or a thyme eater x. The key point in this intuition is that the predictor is going to operate on the representation z. If the predictor is no more powerful than the adversary A, who is trying to distinguish sage eaters from thyme eaters and who has trained the encoder to make the representation not yield distinguishability, then in some sense its predictions can’t be biased.

This is really, really intriguing. It is a group notion

of fairness and I would love to see

individual fairness worked in here but this is another very interesting

direction of research. So, in the last five minutes I’m going to talk

about composition. Because this is really

interesting and this is new. So, our intuition is, if we have a system in

which we have a bunch of individually fair

pieces then when we look at the system that’s

made of these as a whole, it’s still at least

relatively fair. And the answer is,

it’s complicated. So, here’s a very simple example. Suppose let’s make the example really simple you’re looking at the New York Times

website and there is a single slot for a banner ad

at the top of the page. That’s going to be our scenario. As you know, there’s an auction

run for your attention. Advertisers are competing

to get that slot. So, let’s look at the case where we have two kinds of advertisers. One is advertising tech jobs and the other one is advertising

a grocery delivery service. So, the tech jobs advertiser and the grocery delivery

service advertiser are competing for your attention. Ideally if our classifier

for determining who is a good bet for the tech

jobs ad is fair, and if our classifier

for determining who’s a good bet for

the grocery ad is fair, then somehow the system

will be fair as a whole. But we want to require that

in this complex system, it’s still the case

that people who are similarly qualified for tech jobs should be similarly likely to see tech job ads and same

thing for groceries. So, suppose that

the grocery advertiser routinely outbids

the tech jobs advertiser. It’s an auction,

the outbidder wins. So we can even think of it as just the outbidder goes first. The grocery advertiser

is going first or it’s bidding more, all right? And so the tech company is now essentially being

reduced to the leftovers, the ones who weren’t grabbed

by the groceries advertiser. This is true again whether

it’s done temporally, first one then the other, or whether it’s just

done by the money bids. So, this is a major problem. It says that no matter how qualified you are

for a tech jobs ad, if you’re very qualified

for groceries, you’re not going to

see the tech jobs ads. Now who is qualified for

the grocery service, the grocery delivery service? New parents. Advertisers drool over the things they can start advertising to new parents and those tend not to be the tech

jobs advertisers, and in fact, the tech jobs

advertisers you might say shouldn’t touch

the parenthood bit with a 10-foot pole. So, even if they are really appropriately not looking

at parenthood status, you would think it was

appropriate not to be looking at parenthood status

in determining who should see the ad, their competitors for your attention are looking at that, and that’s affecting what happens to

the tech jobs advertiser. Okay and there are versions of this particular problem both for individual fairness and

for group fairness. So, this problem doesn’t go away. No time for these things. There is a way out here though

which I will mention which is we could fix a probability distribution

on the advertising tasks. Any probability

distribution we want. Then we pick a task according

to that distribution, and then we run

the classifier just for the chosen task, and the proof that this gives us individual fairness is very, very simple.

But it leaves money on the table, because there are some people to whom we are just not going to show ads. So, that raises a number of very interesting

economics questions for those who are interested in the intersection of

algorithms and economics. Because of time, I

will skip the rest.>>Final comments. First of all, Interpretability

and Causality. There are big literatures on interpretability and

causality for fairness. Why is this slide left blank? First of all, I have

tried very hard to make headway with

interpretability, and I have failed. I don’t understand

what’s being asked, and how it can be achieved. There’s a very interesting

article called The Mythos of Model Interpretability

by Zachary Lipton. He explains, he catalogs, these are some things

that people could want, and here are some of the issues. It’s very, very thoughtful, and I’m not done

thinking about this, but I’m stumped on this one. As far as causality, it’s problematic for a number

of reasons that I can see. So, first of all,

if you’re using, say Pearl’s theory

of causal reasoning, you need a generative model. Now, maybe you’re familiar with the saying that all

models are wrong, but some models are useful. So, if your definition of fairness ties the

classifier to the model, and you need those two together to determine

if something is fair, and your model is wrong, you may very well come up

with the wrong conclusion. So, that’s a glimpse into what some of

the difficulties I’ve encountered with causality are. Another brief comment as we’ve

talked about the metric, we’ve talked about the issues of where you get the metric from; there are still problems. So, here, suppose we have a news organization, Nice News. Your news organization

has an environment, there’s randomness

in the environment, you have an individual x, there’s randomness in x, but you can talk about, although you can’t

get your hands on, the probability that

this individual x will succeed at Nice News. It’s well-defined

once you specify the coins for this randomness and the coins for this randomness. Again, you can’t get

your hands on it, but at least it makes

sense mathematically. Now, what you might require is that if two people

are similar, meaning they have similar

probabilities of success. And success can be

very crisply defined, like you stay in the company

for at least three years, and you have

at least one promotion during that time, okay? You might say that if you

have two individuals who are so equally likely to succeed, then they should

have very similar probabilities of being hired. That seems to make some sense, so the metric then is capturing the

probability of success. But, what happens if

this is not Nice News, but it’s Nasty News, which has a history of being

very hostile to sage eaters. So, sage eaters can’t succeed no matter how

talented they are, or they’ll have

a real uphill struggle. Should the metric, in fact, be capturing whether people are equally likely to succeed? Should it be capturing whether

they’re equally talented? What’s the right

thing to do here? It’s not clear.

So, final remarks, truth is elusive,

and this problem is not remedied by having a computer. As I’ve argued, I really believe that metrics are really at the heart of everything, even if it may take us a long time to find them, but we always advocate

sunshine for the metric. The metric should

not be kept secret, it has to be open for debate and discussion

and refinement. Computers are not necessarily

worse than humans, they may be more accurate, especially on the easy cases, they may be more easily tested, they don’t get tired

when you keep throwing examples at them to see

what they have to do. That’s it. Thank you.>>There’s a question.>>Sure. Go ahead. Sorry, one behind, and then you.>>So, I’m really

[inaudible] a lot. I’m sure you thought

about this and might even mentioned it in

the previous slide, when you have curious

figure of two thing, so in considering

the individual there, if we’re looking at time eaters, time eaters always

teach their kids, and there is this terrible

sage brush infection, anyone who eats

sage get very ill, also there is this terrible

historical injustice right off the sage eaters. They just did weigh less

good, apparently everything, they’re so focused on

handling this illness. So, it just don’t happens via this historical coincidence, the sage eaters are almost never like a similarly

situated individual, they compete with

the time eaters. It seems like on

that individual fairness metric, we would just say,

okay no problem, and if we’re designing jobs, and it’s the time eaters

they get all of them, whereas we’re not

considering metric of group aspect and I think somewhere along [inaudible] societies

[inaudible] historical justice.>>That’s, of course,

an excellent point, and in the paper, we described what we call

Fair Affirmative Action, which is a method for

dealing with this, where we break certain of the Lipschitz constraints

for exactly this reason, but it’s a principled

approach to this thing. I should mention that it’s not so unlike what the state

of California does, and the state of Texas does

with respect to college. That if you’re in the top

10 percent of your class, then you’re admitted to one

of the state schools; in California, it’s one of the UCs,

and in Texas, it’s one of the state

colleges, I guess. The point is that schools in different neighborhoods

may have very, very different

performances on tests, and so on, and so forth, but

it’s the same basic idea. So, we add a bit to that; it’s not quite as coarse as

taking the top 10 percent. There’s a very interesting work

by John Roemer at Yale, who is very concerned with theories of

distributive justice, and he thinks a lot

about education. He thinks that you must

invest, and invest, and invest in children

in order to compensate for differences of situation, and that as you go through

the schooling process, you do less and less of

this compensating, because the children, having been given a good foundation, are, theoretically, I guess, more responsible for how they

make use of what they have, but in particular, I remember he said the following in the context of

college admissions. Stratify the students according to the education level

of the mother, and within each stratum, look at the cumulative

distribution function describing how many hours a week the students

spend on homework, and view people in

the various percentiles in that CDF from the different

strata, as being comparable. One of the points he makes, which I found extremely

compelling is that, if you’re brought up in a home

where nobody is educated, then you may not

even think that it’s possible to spend 10 or 15 hours

a week on homework. It just doesn’t cross

your mind to do it, or you might be working, and unable to do it. I think that these sorts of insights from the social

scientists are very important, and what we are doing

is in some sense analogous to that. It’s

a very good point.>>So, that whole conversation is basically related

to this Nasty News choosing-the-metric thing, and it comes down to questions of what to condition on,

mediating variables and stuff. So, basically, the rest of the talk didn’t address

those types of issues, like how to choose

the metric fairly. Is there any work that uses graphical models or other ideas

for causal inference to, because we have

this intuitive idea, oh, well we should condition on these things we want

to make a fair metric, but we definitely don’t want to condition on these other things. Is there any work that

tries to formalize that?>>There is work. It’s very interesting, and if you’re interested, I can point you to

a bunch of papers. The one difficulty with some

of this work is that you can have the same

observable distribution on data that’s consistent with very different

causal models, and so that just causes problems.>>[inaudible] models

is just really hard.>>So, I don’t know what

to say about that yet. I was really excited when I saw that there was

a counterfactual fairness paper. I thought, “Yes,

this is it, this is it,” but then we found two different models

that gave rise to the same distribution and the same classifier, and the

thing was fair under one, and not under the other.

So, what do you do?>>Yes.>>Suppose I want to get

a chess team for my school.>>You want to get what?>>I’m sorry, select

the chess team of three people to represent

my school at some competition. What I’d probably do is try to measure how good the people are and pick the best three by measurement, and see if they fit

your definition of individual fairness.>>Yes.>>The more accurately

I can measure people, the less randomness there is, and the less fair it is.>>You introduced the randomness.>>There is some randomness.>>You say that the algorithm

introduces the randomness.>>I mean, if you just take

what people typically do, they’re not going to introduce randomness on purpose, but there will be randomness, because some

people win, some people lose. There’s randomness

in how well you can measure people’s skill. It seemed like by

your definition, the more accurately we can

measure people’s skill, the less randomness there is, the less fair

the selection process.>>If you’re saying that you

can get very, very, very.>>Maybe, wait a minute, why is a completely random process more fair than a meritocratic one?>>Right. Okay. This is the kind of problem

that shows up, in fairness, all the time. I mean, from

this particular example, you’ve hit the nail on the head. In the definition of

individual fairness, buried in there is the idea

that if you simply flipped coins all the time,

that would be fair. If all you did was decide whether someone could be on the chess

team by flipping a coin, certainly similar people

would be treated similarly. What we’re not ensuring is that dissimilar people are

treated dissimilarly. This is where we want to throw in that we will minimize

the loss subject to the fairness conditions

and we want to incorporate some notion of

loss in the decision process. We want to stay fair but we also want to maximize utility.>>I think it’s not

fair to be [inaudible] a fairness constraint has to

be random. That’s simple.>>I know what you’re saying, and you can probably come up with even better examples of where randomness is unfair or where treating everybody

the same way is unfair. Often, it’s possible to reframe the problem so

that you’re pushing those concerns into the objective function rather than the fairness constraint. Yes. By the way, we thought that we

were so original with similar treated

similarly but of course, Aristotle says the same thing, and Aristotle says dissimilar should be treated dissimilarly. Since we were thinking

initially about advertising, and I was thinking

about how when you look at a physical newspaper, everybody gets the same ads. There’s no targeting. That seems pretty fair. In terms of exposing young people to ads

for luxury goods, does that help form

aspirations or not? Should you be saying

that poor people, that poor children, shouldn’t see nice things

or pictures of nice things? There are people who work on these kinds of questions from the psychological and

sociological point of view and they think that yes, the ads are very tied in with the formulation

of the self and that having a real difference in

what people see is harmful. So, I don’t know. Yes.>>When hearing the conversation of individual fairness metrics, I was thinking about

drug treatment effects and where you’re doing these randomized

controlled assignments. You’re thinking about,

here’s the population, let me find someone from this other population

that matches well, see the treatment effect when different treatments

are assigned. Then really, I can imagine a different sort of problem

here where you really want the classifier to behave

like a placebo for these matched individuals rather than as a drug with

some treatment effect. That actually boils down not to the individual fairness where you’re assigning this metric on every pair of individuals in the population but only

like across groups.>>Right.>>Then you do allow meritocracy

within a group but you want placeboness across groups. It’s good to address this.>>This is really interesting

and I don’t know. I mentioned that in

the causality literature, there’s been work sort

of in the Pearl model. In the Rubin model,

I don’t know of any. Maybe there, but I

don’t know of any, and I want to look at that. I think this is definitely

worth thinking through, so very good question.

Wait, someone, yes?>>I was wondering

about something. This title is about

going towards a theory of modeling fairness. I was thinking, have people tried to turn it around

and say that you’re going to accept that

things are going to be unfair and we want to actually quantify the degree of

how unfair things are. In other words, instead of treating them as

fairness constraints, try to put them into

maybe equivalence classes, the way people study problem classes, say P and NP. Do you have any thoughts

in that direction?>>I only know of one theoretical result

along those lines, which is what I was talking about when individual fairness implies statistical parity. In fact, there’s a tighter characterization of how unfair it is in terms of the Earth mover distance

between groups. I don’t know of, I guess, some of the very recent works

do look at what happens if you relax

things a little bit like that probably

approximately fair notion. You’re saying, “Okay,

we’ll introduce a little bit of slack, what can we get from it?” There is some of that, but I don’t know of

wholesale saying, “Okay. Well, we’re just going to

treat these people this way.” I don’t know.>>I was wondering, in

differential privacy, there’s this trade

off of, let’s say, the epsilon parameter you set to get somewhere on that curve. You give up privacy

to get more accuracy.>>Yes. If you think about it, the definition of

fairness is very similar in flavor to

the definition of differential privacy but it doesn’t behave the same way

under composition. Yes. There is

one result also about using a particular algorithm from differential privacy

to ensure fairness, where the trade-off

that you talk about would perhaps be relevant. Yes. Anyone else? Yes.>>I had a question about practice. This problem is complicated. You’ve given a very

beautiful summary of the related work and talking about

how it’s going to take a long time to

reach maybe consensus, maybe not, but

some legislation on this. At the same time,

computers are not necessarily worse than humans and people are trying

to put machine learning models into the world in different domains.>>Right.>>What is your prescription about what needs to

be done right now? Do we stop and wait until we really understand

these things better? Do we apply some

reasonable approach because something is

better than nothing? What is the practitioner’s guide of what needs to be done?>>I feel wildly unqualified to answer that question

but I’ll try anyway. Even if I thought, and I don’t, but even if I thought it

was appropriate to wait, that’s a nonstarter in

the business world. Facebook is not going to stop. You just cannot stop industry

where it is, I think. Just, the political forces

are way too strong. Now, what you can do, of course, is start creating test sets, test sets where there were at

least the people who create the sets believe they know

what the outcomes should be. For example, algorithmic assistance in bail determination. It’s happening in many states. I can’t remember the number, I’m sure it’s at

least 12, but I can’t remember; that number isn’t in my head. It’s happening in

a lot of states. One thing that we would want, I don’t know how useful

it is to see the code because code can be hard to

understand, but certainly, you would want that

somehow appropriate people and organizations should

be able to throw test cases at the algorithm

and see what it does. I would say really intensive

monitoring with the best golden sets

you can come up with and lots of people doing it, I think that’s where

we are for now. Yes, okay. Because

monitoring, you can do.>>Yeah.>>So, if you just do what people are doing and

trying to maximize your accuracy and release

these algorithms in the wild, are there any instances where it’s clear

that it went bad?>>Where it’s clear that

the algorithm has gone bad?>>Yes.>>Lots of cases.>>Like what?>>Speed bump. Is that

what it was called? Street bump? I forget. So, pothole detectors that will report to the

municipality about potholes. But this was an app on an iPhone, and so the only areas that got the reports of the potholes

were the wealthy areas.>>But it seems like,

that’s an issue where it’s not as accurate

as you would like. So, the question is->>It is certainly

an unfair impact->>Inaccurate. Then okay, that’s probably something you’d like to fix, but the people making

that app would like to be more accurate and that’s what

they’re trying to do.>>No. No, I don’t understand. If the app isn’t being brought

over the right potholes, if it’s not penetrating

into the poor areas of town, it’s just not giving you

the information you want. The system as a whole is unfair. So, you’re saying, is there

a classification algorithm?>>Loosely, that’s like

the minority group, in which it has less data

so it’s less accurate. Is roughly->>It had, I mean, I’m not sure that it was

inaccurate where it was; it just lacked coverage. There was a lack of coverage.>>Maybe you don’t

understand because like->>Okay. Do you want to go

ahead and provide an example?>>Yes. So COMPAS, so if you define ->>Wait, so you’re going to start with COMPAS. The problem with COMPAS is

that they were test fair.>>Right, for those criteria

but they weren’t for- or positive and false

positive, false negative.>>Yes. But they were test

fair and it’s impossible to simultaneously be

test fair and have equal false positive rates

and have equal false negative rates. So->>I’m looking for one

where it’s clear that it’s unfair, where if you talk to a random person on the street, it comes across as very clearly unfair.>>So->>College admissions?>>Huh?>>College admissions and wealth is probably

a really good example.>>So affirmative action

is controversial, right?>>This is part of the issue. When I began the talk by saying that there’s

consensus at least, there’s more consensus on privacy than there

is in fairness. So is affirmative action fair? What do you think? There will be a range of opinions here as to

whether it’s fair or not. Have there been

examples historically of classification

algorithms that were very biased? Yes. The answer is yes.>>When we talk about

fairness and classification, it seems like some

classification problems are less important than others. For example, if you’re trying to decide what ads to show, that sort of causes people problems, but if a car is trying to decide which person to kill, its classification->>So here’s what I think

we should do about cars. Is that your question or not?>>Well, I mean, what

are you going to say?>>Oh, thank you. All right. So for a while, I

thought “Oh my God, we can’t do anything

until we figure out what’s right to do

in all situations.” Then thinking about the cars, I thought well suppose

you have an algorithm where what you do is you learn what average people do. Maybe you use the Moral Machine for this. Maybe you have some other way of figuring out statistically what sort of

a random person would do. And then your car

driving algorithm should do the following, when it gets into one of

these really touchy situations, it should do what

a random person does, and the rest of

the time, it should just do its automated driving. So this would be much better than humans because in all the usual situations, it’ll be more accurate. In the really wacky ones, it’ll be no worse than humans. So altogether, it’s better.>>And you say, oh what

a random person would do, what does that really mean? I mean, I think one way, one intuition is that everyone is equal so there’s no way-.>>So what a random

person would do. So with some probability

distribution, and you just draw from that distribution, and you do what

that person does.>>Isn’t that shifting the problem, like when do you switch mode? Isn’t it really shifting the decision there?>>So I was thinking that it would not be difficult to know when to switch modes, that most of the time it was

totally clear what to do and then sometimes there

was a moral question, and then you go to your database of what people would do

and you consult that.>>It would also seem like it would just reinforce societal biases if you’re sampling from a biased sample, because people are biased. Even if you pick a

random, that’s going to be a biased decision.>>Okay.>>I have a question

about the- sorry.>>Last question. Go ahead.>>Okay. I have a question

about the data part. It seems like a lot

of this comes from the data sets, because these kinds of fairness problems reflect biases, like recidivism data sets, where people tend to be arrested at higher rates, you know, so the base rate really starts off being biased in a sense. So maybe, like

income example for instance, like maybe, you don’t see so many women CEOs and therefore, if you are trying to do a model where you’re predicting pay, there tend to be fewer highly

paid women historically, so you don’t even have the

data in the first place.>>Right.>>So it seems to me that there should be some kind of work, like just working

in the data space, trying to maybe hallucinate what fair training data

will look like, something.>>There are efforts that involve what I would call

massaging training data, exactly as you’re suggesting, I don’t know how to put them on a sort of

firm theoretical foundation, but people have thought about

those sorts of things, yes. As you say, they sort of, you could hallucinate some data to change your training set. I don’t know a lot

of work on doing that and how you would

do it, but I guess, I mean now that you mentioned it, Statisticians do this sort

of thing all the time when they weight

their samples. So something maybe

could be doable, but then you have to decide how are you going

to do that weighting.>>The last comment is like the Rubin counterfactual

fairness approach?>>What was it?>>Rubin’s style

counterfactual fairness. So not [inaudible]

, Rubin’s style. You would be trying to

weigh up the woman CEO because she’s a very rare sample compared to other CEOs. So when you actually train on it, you weigh up the woman CEO a little bit more, and you pass these courses

to find that way, yeah.>>Yeah. So as I said, I think it’s a really good idea

to try to pursue that direction and

see what it gives.>>There was a nice

similar paper about that.>>Cool. Awesome, yeah.>>Okay, good. All right.>>Thank you very much.>>Thank you all.
