Public replication repository for "Estimating Wage Disparities Using Foundation Models"
Data files are stored in the data directory of this repository, with the exception of the fine-tuned model predictions, which can be downloaded from this link. Download the archive, unzip it, and save its contents into the data directory at the repository root.
Note: Stanford users can also find the files on the Sherlock computing cluster under $OAK/career-wage-gaps/data_for_analysis. Download $OAK/career-wage-gaps/data_for_analysis/master_dataset_gender-1990-2019_1-16.csv.
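Before running anything, it can help to confirm that the downloaded files landed where the notebooks expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_data_files` is hypothetical, and it assumes the CSV named above is saved directly inside the data directory. Extend the list with any other files you need.

```python
from pathlib import Path

# Assumption: the file name comes from the Sherlock path above; that it sits
# directly inside data/ (rather than a subfolder) is a guess.
EXPECTED_FILES = ["master_dataset_gender-1990-2019_1-16.csv"]


def missing_data_files(data_dir, expected=EXPECTED_FILES):
    """Return the names in `expected` that are not regular files in `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]
```

Calling `missing_data_files("data")` from the repository root returns an empty list when every expected file is present.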
Navigate to the root directory. Create a conda virtual environment named career-wage-gaps-replication with the command conda create -n career-wage-gaps-replication python=3.10, then activate it with conda activate career-wage-gaps-replication. Then run the following command to install the package requirements.
pip install -r requirements.txt
Next, if you do not already have Jupyter installed, run the following command so that you can use JupyterLab or Jupyter Notebook:
pip install jupyter
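As a quick sanity check after setup, the sketch below reports whether the active interpreter is Python 3.10 and whether a `jupyter` executable is on the PATH. The function name `check_environment` and its parameters are hypothetical and not part of this repository. It only inspects the current environment and installs nothing.

```python
import shutil
import sys


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Report whether the interpreter is Python 3.10 and `jupyter` is on PATH.

    `version_info` and `which` are injectable so the check can be exercised
    without touching the real environment.
    """
    return {
        "python_3_10": tuple(version_info[:2]) == (3, 10),
        "jupyter_on_path": which("jupyter") is not None,
    }
```

Run `check_environment()` inside the activated career-wage-gaps-replication environment. Both values should be `True` if the steps above succeeded.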
The code generating all the figures and tables in this paper can be found in the code directory within the root repository. Navigate to that directory and execute the following five notebooks.
To create the figures and tables for the semi-synthetic experiments, execute the Jupyter notebook code/semi_synthetic_figs_and_tables.ipynb. Namely, the following figures and tables in the paper are generated by this file:
- Figure 1
- Figure S1
- Figure S2
- Table S6
To create the table of predictive accuracy metrics, execute the Jupyter notebook code/mse.ipynb. Namely, the following table in the paper is generated by this file:
- Table 1
To create the figures and tables for the wage gap analysis, execute the Jupyter notebook code/wage_gaps.ipynb. Namely, the following figures and tables in the paper are generated by this file:
- Table S5
- Table 2
- Table S4
- Figure S3
- Figure S4
- Figure S5
To create the figures and tables for the omitted variable bias analysis, execute the Jupyter notebook code/clustering.ipynb. Namely, the following figures and tables in the paper are generated by this file:
- Figure 2
- Table S2
- Table S3
To create the table comparing our sample to that of Blau and Kahn, execute the Jupyter notebook code/compare_to_blau_kahn_sample.ipynb. Namely, the following table in the paper is generated by this file:
- Table S1
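If you prefer not to open each notebook by hand, the five notebooks can also be executed from a script. The sketch below is one possible approach, not the authors' workflow. It shells out to `jupyter nbconvert --to notebook --execute --inplace`, which runs a notebook and writes the outputs back into the same file. The names `NOTEBOOK_OUTPUTS`, `nbconvert_command`, and `run_all` are hypothetical. It assumes the notebooks can run in the order listed here and that the last notebook's filename ends in a single `.ipynb`.

```python
import subprocess

# Notebook -> outputs it produces, as listed in this README.
# Assumption: the last filename ends in a single ".ipynb".
NOTEBOOK_OUTPUTS = {
    "semi_synthetic_figs_and_tables.ipynb": ["Figure 1", "Figure S1", "Figure S2", "Table S6"],
    "mse.ipynb": ["Table 1"],
    "wage_gaps.ipynb": ["Table S5", "Table 2", "Table S4", "Figure S3", "Figure S4", "Figure S5"],
    "clustering.ipynb": ["Figure 2", "Table S2", "Table S3"],
    "compare_to_blau_kahn_sample.ipynb": ["Table S1"],
}


def nbconvert_command(notebook):
    """Build the command that executes one notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(notebook)]


def run_all(code_dir="code"):
    """Execute every notebook from inside `code_dir`, stopping at the first failure."""
    for name, outputs in NOTEBOOK_OUTPUTS.items():
        print(f"Running {name} -> {', '.join(outputs)}")
        # check=True raises CalledProcessError if the notebook fails to execute.
        subprocess.run(nbconvert_command(name), cwd=code_dir, check=True)
```

Call `run_all()` from the repository root with the conda environment activated. Long-running notebooks may hit nbconvert's default per-cell timeout, in which case running them interactively is the safer option.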