Reproduction and Extension of "Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation"
This repository is setup to provide access to the code and data which was used in writing the paper Reproducing and extending “Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation”. The original paper Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation utilizes ParlAI for training, evaluation, and text generation. The model used in these papers is an 88M parameter transformer with 16 attention heads, 8 encoder layers, 8 decoder layers, and embedding size of 512. This model is pretrained on the large Reddit dataset with approximately 1.7 billion comments. In all experiments conducted in this paper, this pretrained model was fine-tuned on the LIGHT dataset with some form of data augmentation or training to mitigate the gender bias in the conversations in the game. In large corpora, there is potential for text to contain gender bias, which the models can learn from and generate gender biased text. The goal of these papers is to mitigate gender bias present in corpora and prevent models from generating gender biased text. In order to accomplish this task, the original paper created three bias mitigation techniques.
- Counterfactual Data Augmentation
This augmentation method uses a set of gendered words and replaces them with their opposite. For example, it swaps king for queen or priest for priestess. - Positively-Biased Data Collection
This method utilizes positively biased, crowd-sourced data to shift the gender bias to a more neutral level. Although the amount of data infusion is about 10% of the original dataset, it has a huge effect on shifting the bias present in the dataset. - Bias Controlled Training
This method puts each label into a bucket based on the type of bias present in the text. For example, if the text includes male gendered words but no female gendered words, it is assigned to the f0 m1 bucket, but if both female and male gendered words are present, it assigns it to the f1 m1 bucket. The bucket labels are added to the end or training features for the model to learn from when generating a response. - Neutral Generated Data
This method is an extension to the original paper. For this method, we originally train a model using counterfactual data augmentation and bias controlled training, and use it to generate responses for the entire training data. Next, we go through the generated text and 90% of the time pick the generated responses that are neutral, such as those belonging to the f0 m0 bucket or f1 m1 bucket with a relatively equal number of male and female gendered words. We use the neutral generated responses to reconstruct the conversations and then train a new model on these neutral conversations.
In the original paper, Dinan et al. use the ParlAI infrastructure to run the experiments. The genderation bias github repository gives the necessary information on the paper. The general Facebook research ParlAI GitHub repository gives information about the pretrained model and datasets used for various research projects, including the LIGHT dataset.
The entire source code is documented and sectioned for ease of access in an iPython notebook. Each section in the notebook includes a particular experiment's code and results, which are described in the paper. In addition, extension code and results are provided in the iPython notebook. The code is organized in such a way that to run the notebook for the very first time, you need to run the initial setup section and then the experiment code, starting with the general training subsection. However, all subsequent times, the regular setup can be used, followed by the desired experiment cell. This code is best suited for running on Google Drive with Google Colaboratory with access to a PyTorch compatible GPU, but with minor modifications it can be run on a regular or other virtual machines. More description on how to use the code is provided on the iPython notebook provided in the src folder.
The dependencies for this project are all met when running the iPython notebook on Google Drive with Google Colaboratory with access to a PyTorch compatible GPU, but if this is not the machine you are using, you need the following dependencies to be met:
- PyTorch with access to a compatible GPU
- ParlAI
- subword-nmt
- Numpy
- NLTK
Additionally, general python 3.7.10 libraries need to be installed, such as re, copy, os, json, pickle, random, sys.
To download the data, please install ParlAI using:
pip install parlaiOnce ParlAI is installed on the machine, use the command below from the paper's ParlAI page to download all the data from the original paper.
parlai display_data -t light_genderation_biasThis data is also available in the data folder of this repository for ease of access.
The pretrained model is automatically downloaded when using the code provided in the iPython notebook in the src folder, but the following command will also download the pretrained model trained on the Reddit dataset used in this paper.
parlai interactive -mf zoo:tutorial_transformer_generator/model All the code and commands for data preprocessing, training, evaluation, and paper extensions are provided in the iPython notebook in the src folder. In addition, ParlAI's documentation is quite helpful for commands not used in the notebook.
Below are the results from reproducing the experiments done by Dinan et al. and for extensions to these experiments. The experiment extensions are both aimed at addressing the high time and monetary cost of positively biased data collection, which requires crowdsourcing data. The first of these extensions is fine-tuning the pretrained Reddit model on the data generated from counterfactual data augmentation and using bias controlled training to determine the impact of excluding positively biased data collection. The second extension still fine-tunes the model using counterfactual data augmentation and bias controlled training, but also includes neutral data we generate (the process for generating this data is described in the "Neutral Generated Data" section above). The results below give the percent gendered words (number of gendered words out of the total number of words in the generated responses), percent male bias (number of male gendered words out of the gendered words), and F1 score for each model for four bins: F0M0, F+M0, F0M+, and F+M+. The test data is split into bins based on the presence of gendered words in the label (the next response in the conversation). F0M0 means there are no gendered words in the label. F+M0 means there is at least one female gendered word but no male gendered words in the label. F0M+ means there are no female gendered words but at least one male gendered word in the label. F+M+ means there is at least one female and one male gendered word in the label. Discussion of the results is included in our paper.
Figure 1 below shows how each bias mitigation technique is used to mitigate gender bias in the generated text. The plots separate the data into buckets used for bias controlled training to show how these techniques mitigate bias in the generated text. These plots also show how well bias controlled training gives control to the model when generating text by telling the model what type of data it must generate via passing the bucket as part of the features in the episode.
Figure 1: Results for Reproducing the Experiments in Original Paper by Dinan et al.
Figure 2 below shows the results from the original paper and the results from our extensions to the original paper. The two extensions are using counterfactual data augmentation and bias controlled training techniques without the positive-biased data augmentation, and counterfactual data augmentation and bias controlled training when adding our neutral, generated data for data augmentation. The results suggest that adding neutral generated utterances instead of the crowd sourced positively-biased data collection can yield similar or better results than using the "All" method (combining all 3 bias original bias mitigation techniques) from the original paper, and approximately the same or slightly higher F1 scores.
Figure 2: Results for Combining all 3 Bias Mitigation Techniques vs. Counterfactual Data Augmentation and Bias Controlled Training both with and without Neutral Generated Data.
In addition, using neutral generated utterances with counterfactual data augmentation and bias controlled training techniques result in producing more gender-neutral generated text, but maintains similar control on the level of bias in the generated text, as was evident from the similar F1 scores it achieved in Figure 2 for each bucket.
Figure 3: Percent of Generated Responses from each Model in each Bin.
The tables below show the results from our paper for each model, reproducing and extending the orginal paper by Dinan et al.
Results for Each Model for F0M0 Bin:
| Model | % Gendered Words | % Male Bias | F1 Score | % Generated Responses in This Bin |
|---|---|---|---|---|
| Baseline | 5.48 | 45.14 | 13.22 | 35.11 |
| CDA | 5.35 | 38.05 | 12.98 | 38.96 |
| Pos Data | 5.94 | 46.50 | 13.06 | 36.31 |
| Bias Ctrl Training | 0.69 | 56.85 | 13.59 | 41.30 |
| All | 0.32 | 43.53 | 13.75 | 39.41 |
| CDA + Bias Ctrl | 0.80 | 44.96 | 14.62 | 41.94 |
| CDA + Bias Ctrl + Our Gen. Data | 0.72 | 49.68 | 14.62 | 41.40 |
Results for Each Model for F+M0 Bin:
| Model | % Gendered Words | % Male Bias | F1 Score | % Generated Responses in This Bin |
|---|---|---|---|---|
| Baseline | 6.40 | 42.07 | 14.84 | 29.88 |
| CDA | 6.16 | 33.85 | 14.27 | 31.04 |
| Pos Data | 7.62 | 40.88 | 14.99 | 31.48 |
| Bias Ctrl Training | 8.76 | 4.70 | 15.4 | 34.26 |
| All | 8.25 | 1.95 | 15.92 | 35.02 |
| CDA + Bias Ctrl | 7.62 | 4.08 | 15.48 | 33.74 |
| CDA + Bias Ctrl + Our Gen. Data | 8.44 | 5.90 | 15.4 | 33.41 |
Results for Each Model for F0M+ Bin:
| Model | % Gendered Words | % Male Bias | F1 Score | % Generated Responses in This Bin |
|---|---|---|---|---|
| Baseline | 6.90 | 52.35 | 15.12 | 20.38 |
| CDA | 6.46 | 41.53 | 14.9 | 18.67 |
| Pos Data | 7.51 | 53.53 | 15.41 | 19.92 |
| Bias Ctrl Training | 7.36 | 94.37 | 15.4 | 14.82 |
| All | 7.89 | 97.13 | 17.31 | 13.41 |
| CDA + Bias Ctrl | 6.97 | 95.52 | 16.37 | 14.00 |
| CDA + Bias Ctrl + Our Gen. Data | 6.55 | 93.41 | 16.6 | 12.98 |
Results for Each Model for F+M+ Bin:
| Model | % Gendered Words | % Male Bias | F1 Score | % Generated Responses in This Bin |
|---|---|---|---|---|
| Baseline | 7.70 | 46.28 | 15.38 | 14.64 |
| CDA | 7.00 | 44.19 | 14.83 | 11.33 |
| Pos Data | 8.51 | 49.71 | 15.37 | 12.28 |
| Bias Ctrl Training | 11.40 | 36.41 | 15.56 | 9.62 |
| All | 12.55 | 43.01 | 16.73 | 12.15 |
| CDA + Bias Ctrl | 11.15 | 40.89 | 15.48 | 10.32 |
| CDA + Bias Ctrl + Our Gen. Data | 11.54 | 44.64 | 16.61 | 12.21 |
This repository is MIT licensed. See the LICENSE file for details.