Node scripts to gather and clean data for an article on the rise of double-barrelled last names in US professional sports.
Clone the repo and run npm i to install the dependencies.
Pulls down HTML pages of alphabetized WNBA players from Basketball Reference and saves them into the output/wnba folder as names-{letter}.html.
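Each download step fetches one index page per letter and saves the raw HTML. The NBA, NFL, MLB and NHL downloads below follow the same letter-by-letter pattern. Here is a minimal sketch of that pattern in Node. It assumes Node 18+ for the built-in fetch, sequential requests with a fixed pause between them, and a caller-supplied URL builder. letterJobs and downloadLetters are hypothetical names, not the repo's actual functions, and the 3-second delay is an assumed value rather than a documented site limit.

```javascript
const fs = require("fs/promises");
const path = require("path");

// Hypothetical helper: build one { url, file } job per letter for a league.
// `urlForLetter` is supplied by the caller because each Reference site (and the
// real scripts) may build index URLs differently.
function letterJobs(league, urlForLetter) {
  const letters = "abcdefghijklmnopqrstuvwxyz".split("");
  return letters.map((letter) => ({
    url: urlForLetter(letter),
    file: path.join("output", league, `names-${letter}.html`),
  }));
}

// Fetch each page one at a time and save the raw HTML as names-{letter}.html.
// Requests are sequential with a pause in between to keep the request rate low.
async function downloadLetters(league, urlForLetter, delayMs = 3000) {
  await fs.mkdir(path.join("output", league), { recursive: true });
  for (const { url, file } of letterJobs(league, urlForLetter)) {
    const res = await fetch(url); // global fetch: Node 18+
    if (!res.ok) {
      console.warn(`Skipping ${url}: HTTP ${res.status}`);
      continue;
    }
    await fs.writeFile(file, await res.text(), "utf8");
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}

// Example call (the URL pattern is an assumption, check it against the live site):
// downloadLetters("wnba", (l) => `https://www.basketball-reference.com/wnba/players/${l}/`);
```

Fetching sequentially rather than with Promise.all trades speed for politeness toward the source site.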
Compiles and formats all WNBA player names and saves them into output/wnba as names.csv.
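The compile steps turn the saved letter pages into a single CSV. The following is a minimal sketch, assuming each player appears as an &lt;a&gt; element whose href starts with a league prefix such as "/wnba/players/" and ends in ".html" (the real page markup may differ), and using regular expressions rather than an HTML parser to stay dependency-free. The function names and the first,last,suffix column layout are hypothetical. The name splitter takes the final token as the last name after stripping a generational suffix, so multi-word surnames such as "Van Exel" would be split incorrectly.

```javascript
const fs = require("fs");
const path = require("path");

// Decode the few HTML entities most likely to appear in names.
// &amp; is handled last so "&amp;#39;" is not decoded twice.
function decodeEntities(text) {
  return text
    .replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number(code)))
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Collect the text of player links; letter-index links (not ending in .html) are skipped.
function extractNames(html, hrefPrefix) {
  const names = [];
  const re = /<a\s+href="([^"]+)"[^>]*>([^<]+)<\/a>/g;
  for (const match of html.matchAll(re)) {
    const href = match[1];
    if (href.startsWith(hrefPrefix) && href.endsWith(".html")) {
      names.push(decodeEntities(match[2].trim()));
    }
  }
  return names;
}

const SUFFIXES = new Set(["Jr.", "Sr.", "II", "III", "IV", "V"]);

// Naive split: optional trailing suffix, last token = last name, the rest = first name.
function splitName(fullName) {
  const parts = fullName.trim().split(/\s+/);
  let suffix = "";
  if (parts.length > 2 && SUFFIXES.has(parts[parts.length - 1])) suffix = parts.pop();
  const last = parts.pop();
  return { first: parts.join(" "), last, suffix };
}

// Quote a CSV field only when it contains a quote, comma or newline.
function csvField(value) {
  return /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Read output/{league}/names-{letter}.html files and write output/{league}/names.csv.
function compileLeague(league, hrefPrefix) {
  const dir = path.join("output", league);
  const files = fs.readdirSync(dir).filter((f) => /^names-[a-z]\.html$/i.test(f)).sort();
  const rows = ["first,last,suffix"];
  for (const file of files) {
    const html = fs.readFileSync(path.join(dir, file), "utf8");
    for (const name of extractNames(html, hrefPrefix)) {
      const { first, last, suffix } = splitName(name);
      rows.push([first, last, suffix].map(csvField).join(","));
    }
  }
  fs.writeFileSync(path.join(dir, "names.csv"), rows.join("\n") + "\n");
}
```

A call such as compileLeague("wnba", "/wnba/players/") would then produce output/wnba/names.csv, provided the assumed link format holds.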
Pulls down HTML pages of alphabetized NBA players from Basketball Reference and saves them into the output/nba folder as names-{letter}.html.
Compiles and formats all NBA player names and saves them into output/nba as names.csv.
Pulls down HTML pages of alphabetized NFL players from Pro Football Reference and saves them into the output/nfl folder as names-{letter}.html.
Compiles and formats all NFL player names and saves them into output/nfl as names.csv.
Pulls down HTML pages of alphabetized MLB players from Baseball Reference and saves them into the output/mlb folder as names-{letter}.html.
Compiles and formats all MLB player names and saves them into output/mlb as names.csv.
Pulls down HTML pages of alphabetized NHL players from Hockey Reference and saves them into the output/nhl folder as names-{letter}.html.
Compiles and formats all NHL player names and saves them into output/nhl as names.csv.
Compiles previously downloaded MLS player names from output/mls/csvs and saves them into output/mls as names-no-years.csv.
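Compiling names-no-years.csv amounts to reading every CSV in a folder, pulling out the name column and dropping duplicates, so a player who appears in several seasons is listed once. The sketch below assumes each input file has a header row containing a "name" column. That column name, the helper names and the output header are hypothetical. The small CSV line parser honors quoted fields and doubled quotes but not newlines inside quoted fields.

```javascript
const fs = require("fs");
const path = require("path");

// Parse one CSV line, honoring double-quoted fields and "" escapes.
function parseCsvLine(line) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { field += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { fields.push(field); field = ""; }
    else field += ch;
  }
  fields.push(field);
  return fields;
}

// Collect unique, non-empty values of `column` across several CSV texts, sorted.
function uniqueNames(csvTexts, column = "name") {
  const seen = new Set();
  for (const text of csvTexts) {
    const lines = text.split(/\r?\n/).filter((l) => l.trim() !== "");
    if (lines.length === 0) continue;
    const idx = parseCsvLine(lines[0]).indexOf(column);
    if (idx === -1) continue; // file without the expected column
    for (const line of lines.slice(1)) {
      const name = (parseCsvLine(line)[idx] || "").trim();
      if (name) seen.add(name);
    }
  }
  return [...seen].sort((a, b) => a.localeCompare(b));
}

// Read output/mls/csvs/*.csv and write output/mls/names-no-years.csv.
function compileMls(csvDir = path.join("output", "mls", "csvs")) {
  const texts = fs.readdirSync(csvDir)
    .filter((f) => f.endsWith(".csv"))
    .map((f) => fs.readFileSync(path.join(csvDir, f), "utf8"));
  const quote = (v) => (/[",\n]/.test(v) ? `"${v.replace(/"/g, '""')}"` : v);
  const out = ["name", ...uniqueNames(texts).map(quote)];
  fs.writeFileSync(path.join("output", "mls", "names-no-years.csv"), out.join("\n") + "\n");
}
```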
Formats all MLS player names and saves them into output/mls as names.csv.
Pulls down HTML pages of alphabetized NWSL players from the NWSL website and saves them into the output/nwsl folder as season-{season}.html.
Compiles all NWSL player names and saves them into output/nwsl as names-no-years.csv.
Formats all NWSL player names and saves them into output/nwsl as names.csv.
Pulls down HTML pages of alphabetized US congressional members from congress.gov and saves them into the output/congress folder as names-{page}.html.
Compiles and formats all congressional names and saves them into output/congress as names.csv.
Compiles the names from all leagues and Congress and saves them into output as:
- allCombinedNames.csv, which includes every name, including those from Congress
- hyphensCombinedNames.csv, which includes only names whose last name contains a hyphen
- sportsCombinedNames.csv, which includes all sports names and excludes Congress
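The three combined files differ only in how the merged rows are filtered. Here is a minimal sketch of that split, assuming rows have already been loaded per source as { first, last } objects keyed by a source name such as "nba" or "congress". The row shape, the "congress" key, the column layout and the function names are all hypothetical. Only the last name is tested for a hyphen, so a hyphenated first name such as "Mary-Kate" is not counted.

```javascript
// Treat the ASCII hyphen-minus and the Unicode hyphen (U+2010) as hyphens.
function isHyphenated(lastName) {
  return /[-\u2010]/.test(lastName || "");
}

// Merge rows from every source, tagging each with its source, then derive the three sets.
function combine(rowsBySource) {
  const all = [];
  for (const [league, rows] of Object.entries(rowsBySource)) {
    for (const row of rows) all.push({ league, ...row });
  }
  return {
    all, // -> allCombinedNames.csv
    hyphens: all.filter((r) => isHyphenated(r.last)), // -> hyphensCombinedNames.csv
    sports: all.filter((r) => r.league !== "congress"), // -> sportsCombinedNames.csv
  };
}

// Serialize rows as CSV, quoting fields that contain quotes, commas or newlines.
function toCsv(rows) {
  const esc = (v) => {
    const s = String(v ?? "");
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = rows.map((r) => [r.league, r.first, r.last].map(esc).join(","));
  return ["league,first,last", ...lines].join("\n") + "\n";
}
```

A string-based check like isHyphenated cannot tell which part of a name is the family name, so a name written family-name-first with a hyphenated given name would be flagged. That kind of case is why manual review is still needed after this step.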
Korean names, in which the family name appears before the given name, were later manually untagged as hyphenated names. Players were grouped into decades by the season of their first professional game. When a season spanned two calendar years (e.g. 1979-1980), the later year was used to assign the decade. The reasons for hyphenation were researched manually and added to sportsCombinedNames_withReasons.csv.
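The decade rule above can be stated precisely in code: take the last year of the season and round it down to a multiple of ten. The sketch below also accepts the short form "1979-80" and the century rollover "1999-00", which are assumptions about how season labels might appear in the source data. seasonToDecade, countByDecade and the firstSeason field are hypothetical names.

```javascript
// Map "1985", "1979-1980" or "1979-80" to a decade, using the later year of a split season.
function seasonToDecade(season) {
  const m = String(season).trim().match(/^(\d{4})(?:\s*[-\u2013]\s*(\d{2}|\d{4}))?$/);
  if (!m) throw new Error(`Unrecognized season: ${season}`);
  const start = Number(m[1]);
  let end = start;
  if (m[2] && m[2].length === 4) {
    end = Number(m[2]);
  } else if (m[2]) {
    end = Math.floor(start / 100) * 100 + Number(m[2]);
    if (end < start) end += 100; // e.g. "1999-00" ends in 2000
  }
  return Math.floor(end / 10) * 10;
}

// Count players per decade of their first professional season.
function countByDecade(players) {
  const counts = new Map();
  for (const p of players) {
    const decade = seasonToDecade(p.firstSeason);
    counts.set(decade, (counts.get(decade) || 0) + 1);
  }
  return counts;
}
```

With this rule, seasonToDecade("1979-1980") returns 1980, so a player whose first season was 1979-1980 is counted in the 1980s.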