How to Remove Gender Bias in Machine Learning Models: NLP and Word Embeddings

Written by jay-gupta | Published 2020/09/07
Tech Story Tags: natural-language-processing | deep-learning | gender-bias | word-embeddings | machine-learning | artificial-intelligence | neural-networks | hackernoon-top-story

TLDR: How to Remove Gender Bias in Machine Learning Models: NLP and Word Embeddings. This article reviews the arguments made by Bolukbasi et al. in the paper “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”. We’ll be using Word2Vec for all the examples. To remove the gender bias, we first need to identify the gender dimension, or subspace, in which the bias is captured (a word embedding consists of hundreds of dimensions).

Most word embeddings in use today are glaringly sexist. Let us look at some ways to de-bias such embeddings.
Note - This article provides a review of the arguments made by Bolukbasi et al. in the paper “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings” [1]. All graphical drawings are made using draw.io.
Word Embeddings are at the core of NLP applications, and they often end up being biased towards a gender due to the inherent stereotypes present in the large text corpora they are trained on. Such models, when deployed to production, can further widen gender inequality and have far-reaching consequences for our society as a whole.
"As an example, suppose the search query is “cmu computer science phd student” for a computer science Ph.D. student at Carnegie Mellon University. Now, the directory offers 127 nearly identical web pages for students — these pages differ only in the names of the students. …

However, word embeddings also rank terms related to computer science closer to male names than female names. The consequence is that, between two pages that differ only in the names Mary and John, the word embedding would influence the search engine to rank John’s web page higher than Mary."

So, what is a Word Embedding?

Word Embeddings are a form of vocabulary representation. They are vector representations of words, where spatial closeness captures the similarity or shared context between words.
For reference, here are four words represented by vectors. As expected, ‘Dog’ and ‘Cat’ are close to each other since both of them represent animals, whereas ‘Mango’ and ‘Apple’ are close to each other since they represent fruits. Conversely, the two groups are far apart from each other since they are not similar.
In this diagram, the vectors are two-dimensional for easier visualization; however, most Word Embedding models like Word2Vec, GloVe, etc. have several hundred dimensions. For this article, we’ll be using Word2Vec for all the examples.
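To get a feel for this, here is a minimal sketch (not part of the original article) using the gensim library and its pre-trained Google News Word2Vec vectors; the exact similarity values you see will depend on the model you load.

    # Minimal sketch, assuming gensim and its downloadable
    # 'word2vec-google-news-300' model (~1.6 GB) are available.
    import gensim.downloader as api

    model = api.load("word2vec-google-news-300")  # returns KeyedVectors

    # Cosine similarity: semantically related words score higher.
    print(model.similarity("dog", "cat"))      # relatively high
    print(model.similarity("mango", "apple"))  # relatively high
    print(model.similarity("dog", "mango"))    # noticeably lower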

Problem

In our vocabulary, there are words which are gender-neutral, for example ‘football’ and ‘receptionist’, whereas some words are gender-specific, like ‘brother’ and ‘father’. Various studies [1] have shown that the embeddings of gender-neutral words acquire stereotype and bias.

Direct & Indirect Bias

Words like ‘receptionist’ are much more closely related to females, while ‘football’ is closer to males. This is called direct bias. There are also numerous cases where the bias is not direct but instead surfaces through linked features, known as indirect bias. For example, ‘bookkeeper’ is much closer to ‘softball’ than to ‘football’, which may result indirectly from the female associations shared by ‘bookkeeper’, ‘receptionist’ and ‘softball’.
The vector distance between male and surgeon is the same as that between female and nurse, indicating direct gender bias
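As a rough illustration (this is an assumption-laden sketch, not the paper’s exact bias metric), we can project a few occupation and sport words onto a crude he-she direction and inspect the sign of the projection:

    import numpy as np
    import gensim.downloader as api

    model = api.load("word2vec-google-news-300")

    # Crude single-pair gender direction; the paper uses several definitional pairs.
    g = model["he"] - model["she"]
    g = g / np.linalg.norm(g)

    for word in ["receptionist", "nurse", "football", "softball", "programmer"]:
        e = model[word] / np.linalg.norm(model[word])
        print(word, float(np.dot(e, g)))  # > 0 leans towards 'he', < 0 towards 'she'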
Our primary goal is to reduce bias in gender-neutral words and preserve the gender factor in gender-specific words, all while conserving the useful properties of word embeddings.
Bolukbasi et al., “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”, NIPS 2016
In this figure, the x-axis captures the difference between the word embeddings of he and she, whereas the y-axis denotes gender neutrality: words above the horizontal axis are gender-neutral in nature, and words below it are gender-specific. Our aim is to collapse the words above the horizontal axis onto the vertical axis to remove the bias.

Solution

As proposed by Bolukbasi et al. [1], there are three steps: Identify Gender Subspace, Neutralize & Equalize.

(1) Identify Gender Subspace

A word embedding consists of hundreds of dimensions. To remove the gender bias, we first need to identify the dimension, or dimensions (there can be more than one, which is why it is also called a subspace), in which the bias is captured in the embedding.
Identifying Gender Subspace by taking the differences between sets of gender-specific words
We first take the differences between pairs of word embeddings (denoted by e) which define the concept of gender (e.g. ‘male’ & ‘female’, ‘he’ & ‘she’, etc.). Then, the bias subspace is obtained by taking the SVD of these differences. A simpler, more intuitive way is to take the average of the differences to broadly capture the gender direction.
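A simplified NumPy sketch of this step is shown below; the definitional pairs are illustrative, the averaging shortcut captures the intuition, and the SVD variant stands in for the paper’s full computation over pairwise differences.

    import numpy as np

    # Illustrative definitional pairs that capture the concept of gender.
    pairs = [("he", "she"), ("man", "woman"), ("male", "female"),
             ("boy", "girl"), ("father", "mother")]

    def gender_direction(emb, pairs):
        # Simple intuition: average the differences e_a - e_b and normalize.
        diffs = [emb[a] - emb[b] for a, b in pairs]
        b = np.mean(diffs, axis=0)
        return b / np.linalg.norm(b)

    def gender_subspace(emb, pairs, k=1):
        # Closer to the paper: SVD over the differences; the top-k right
        # singular vectors span the bias subspace.
        diffs = np.stack([emb[a] - emb[b] for a, b in pairs])
        _, _, vt = np.linalg.svd(diffs, full_matrices=False)
        return vt[:k]

Here emb is assumed to be a dict-like mapping from word to vector (a gensim KeyedVectors object works).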

(2) Neutralize

After obtaining the bias direction b, we remove the bias component from all gender-neutral words like receptionist and surgeon by subtracting each embedding’s projection onto the bias axis b (computed using the dot product of e and b).
Removing the bias component in gender-neutral words [1]
After neutralizing, the bias in a gender-neutral word is removed.
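In code, the neutralize step is essentially a one-liner, assuming b is a unit-length bias direction and e is the embedding of a gender-neutral word:

    import numpy as np

    def neutralize(e, b):
        # Subtract the projection of e onto the (unit) bias direction b:
        # e_debiased = e - (e . b) * b
        return e - np.dot(e, b) * b

    # e.g. emb["receptionist"] = neutralize(emb["receptionist"], b)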

(3) Equalize

For the final step, we take care of the gender-specific words. Words like boy and girl should differ by gender equally, i.e. the word ‘boy’ should not be any more masculine than the word ‘girl’ is feminine.
Equalizing Gender-Specific Words [1]
Intuition: for every pair of gender-specific words, we equalize their vector lengths so that the gender component is preserved with equal strength in both words of the pair. Furthermore, this enforces that all gender-neutral words are equidistant from the two words in each pair, e.g. receptionist becomes equidistant from boy and girl.
After equalizing, the gender component is equal in gender-specific words
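A sketch of the equalize step is given below, following the simplified formulation (it assumes the two embeddings are unit-normalized and that b is a unit-length bias direction):

    import numpy as np

    def equalize(e1, e2, b):
        # Equalize a gender-specific pair (e.g. 'boy' / 'girl') so that both
        # words share the same gender-neutral part and carry equal-strength,
        # opposite gender components along b.
        mu = (e1 + e2) / 2                  # midpoint of the pair
        mu_b = np.dot(mu, b) * b            # midpoint's component along b
        nu = mu - mu_b                      # shared gender-neutral part

        scale = np.sqrt(np.abs(1 - np.linalg.norm(nu) ** 2))
        e1_b = np.dot(e1, b) * b
        e2_b = np.dot(e2, b) * b
        new_e1 = nu + scale * (e1_b - mu_b) / np.linalg.norm(e1_b - mu_b)
        new_e2 = nu + scale * (e2_b - mu_b) / np.linalg.norm(e2_b - mu_b)
        return new_e1, new_e2

After this, a neutralized word like receptionist is equidistant from the new boy and girl vectors, since their only remaining difference lies along b.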
Through all these steps, we can remove the gender bias in word embeddings while preserving their useful properties. Some people argue that such word embeddings, since they are trained on large text corpora, capture statistical reality rather than bias and are therefore acceptable; e.g. more computer programmers are male, so it is okay for the word ‘programmer’ to be closer to male words than to female ones.
In the end, it all depends on the context in which these embeddings will be used. There are some scenarios, such as university applications, where it is important for word embeddings to be unbiased and gender-neutral, while other scenarios simply do not need the debiasing. What do you think?

References

[1] Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai, “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”, NIPS 2016.

[2] Andrew Ng et al., Sequence Models, Coursera Course.

Written by jay-gupta | CS Major at NTU, Singapore
Published by HackerNoon on 2020/09/07