anishesg/easyprincetoncourses

PrincetonCourses Scraper

Playwright-based scraper for princetoncourses.com that:

  • launches with a persistent browser profile from ./profile
  • waits for Princeton login to complete before scraping
  • uses Playwright for headed login/session handling and then switches to PrincetonCourses' authenticated JSON API for speed
  • partitions discovery to get around the site's 150-result search cap
  • scrapes course metadata plus evaluation comments/reviews for every course instance
  • writes both SQLite and JSON outputs to ./output
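The partitioned-discovery bullet is the least obvious of these, so here is a hedged sketch of the idea, not the repo's actual code: `search`, the result shape, and the prefix-splitting strategy are all assumptions. When a query hits the 150-result cap, it is split into narrower sub-queries until every leaf query returns an uncapped result set.

```javascript
// Illustrative reconstruction of partitioned discovery (names are assumptions).
const RESULT_CAP = 150;

function discover(search, query, found = new Map()) {
  const results = search(query); // injected search function -> array of { id, ... }
  if (results.length < RESULT_CAP) {
    // Uncapped result set: safe to collect everything.
    for (const course of results) found.set(course.id, course);
    return found;
  }
  // Capped: narrow the query by appending each letter and recurse.
  for (const letter of 'abcdefghijklmnopqrstuvwxyz') {
    discover(search, query + letter, found);
  }
  return found;
}
```

Deduplicating into a `Map` keyed by course id keeps overlapping sub-queries from double-counting results.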

Install

npm install playwright
npx playwright install chromium

Run

npm run scrape

The browser opens in headed mode by default. If Princeton authentication is required, log in from that window; the scraper continues automatically once /api/semesters becomes available for the authenticated session.
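The "continues automatically" step amounts to polling the authenticated endpoint until it responds. A minimal sketch of that wait loop, with the helper name, retry defaults, and injected `checkAuth` all being assumptions rather than the repo's actual code:

```javascript
// Illustrative login wait: poll an authenticated endpoint (e.g. a check that
// GET /api/semesters returns 200) until it succeeds or we give up.
async function waitForLogin(checkAuth, { retries = 60, delayMs = 1000 } = {}) {
  for (let attempt = 0; attempt < retries; attempt++) {
    if (await checkAuth()) return true;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('Timed out waiting for Princeton login');
}
```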

Useful Flags

npm run inspect
npm run scrape -- --max-courses=25 --search-concurrency=3 --course-concurrency=6
npm run scrape -- --headless=true

  • --inspect-current validates authentication and writes an API inspection payload to ./inspections
  • --max-courses is useful for trial runs
  • --search-concurrency controls discovery parallelism
  • --course-concurrency controls detail-fetch parallelism
  • --min-delay-ms / --max-delay-ms add jitter between requests so the crawl stays fast without hammering the site
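The jitter flags can be pictured as a uniformly random pause drawn from the configured window before each request; a small sketch (function names are illustrative, not the repo's actual code):

```javascript
// Uniform random delay inside the [minDelayMs, maxDelayMs] window.
function jitterMs(minDelayMs, maxDelayMs) {
  return minDelayMs + Math.random() * (maxDelayMs - minDelayMs);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Before each request, something like:
//   await sleep(jitterMs(200, 800));
```

Randomizing the gap avoids a fixed request cadence while keeping the average throughput predictable.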

Outputs

  • output/princetoncourses.sqlite
  • output/princetoncourses-data.json
  • output/princetoncourses-discovery-log.json

The JSON payload groups semester instances under stable courseId values, preserves website-native tags, and includes all review comments pulled from the course API. The SQLite DB mirrors the same dataset in queryable tables so the later AI-tagging pass can build on top of it.
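The grouping of semester instances under a stable courseId can be pictured roughly as below; the field names are assumptions for illustration, not the repo's actual schema:

```javascript
// Illustrative grouping: collapse per-semester rows into one record per
// stable courseId, each carrying its list of semester instances.
function groupByCourseId(instances) {
  const courses = new Map();
  for (const inst of instances) {
    if (!courses.has(inst.courseId)) {
      courses.set(inst.courseId, { courseId: inst.courseId, instances: [] });
    }
    courses.get(inst.courseId).instances.push(inst);
  }
  return [...courses.values()];
}
```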

Bedrock Integration

Amazon Bedrock now supports API keys as Bearer tokens. This repo includes a direct HTTP Bedrock client in src/bedrock.js and a tagging scaffold in src/tag-courses.js.

The Bedrock key is not written anywhere in this repo. Use the official environment variable at runtime:

export AWS_BEARER_TOKEN_BEDROCK='...your key...'
npm run tag:bedrock -- --max-courses=20

By default the tagger targets amazon.nova-micro-v1:0, which is a relatively cheap Bedrock model for the later meta-tagging pass. Override with BEDROCK_MODEL_ID or --model-id if you want a different model.
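A bearer-token call to the Bedrock runtime can be sketched as follows. This assumes the documented AWS_BEARER_TOKEN_BEDROCK variable; the region default, helper name, and request-body shape are illustrative assumptions, not the actual contents of src/bedrock.js:

```javascript
// Build an invoke request for the Bedrock runtime HTTP API using an API key
// as a Bearer token (region and body shape are illustrative).
function buildInvokeRequest(modelId, body, { region = 'us-east-1' } = {}) {
  return {
    url: `https://bedrock-runtime.${region}.amazonaws.com/model/${encodeURIComponent(modelId)}/invoke`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.AWS_BEARER_TOKEN_BEDROCK}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage sketch:
//   const { url, options } = buildInvokeRequest('amazon.nova-micro-v1:0', payload);
//   const res = await fetch(url, options);
```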

About

AI-powered course discovery for Princeton students
