✨ feat(crawler): add headless_wait option for Chromium mode #1473

Merged
harehare merged 2 commits into main from feat/crawler-headless-wait on Mar 19, 2026
Conversation

@harehare
Owner

Adds a --headless-wait CLI argument that controls how long to wait after page load in headless mode. The wait duration is passed to the Chromium client, which sleeps after the load event. Useful for JS-heavy pages that need extra rendering time.

Copilot AI review requested due to automatic review settings March 19, 2026 12:30
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a --headless-wait CLI option to mq-crawl to optionally wait after Chromium’s page load event in headless mode, improving crawling results for JS-heavy pages.

Changes:

  • Add --headless-wait CLI argument (seconds) for headless Chromium mode.
  • Pass the wait duration into the Chromium HttpClient and sleep after page load before reading content.
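The second change can be illustrated with a minimal, std-only sketch. It is not the code from this PR: the `HttpClient` enum shape, the `wait` field, and the `post_load_wait` and `fetch` names are all hypothetical, and a blocking `std::thread::sleep` stands in for whatever sleep the real (likely async) client uses.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the crawler's client enum. The variant and
/// field names are assumptions, not the crate's actual definitions.
pub enum HttpClient {
    /// Plain HTTP fetch with no rendering step, so no extra wait.
    Plain,
    /// Headless Chromium, storing the extra wait applied after page load.
    Chromium { wait: Duration },
}

impl HttpClient {
    /// Extra time to wait after the page load event.
    pub fn post_load_wait(&self) -> Duration {
        match self {
            HttpClient::Plain => Duration::ZERO,
            HttpClient::Chromium { wait } => *wait,
        }
    }

    /// Load the page, wait, then read the content.
    /// `load` stands in for navigating and awaiting the load event;
    /// `read_content` stands in for capturing the rendered HTML.
    pub fn fetch<L, R>(&self, load: L, read_content: R) -> String
    where
        L: FnOnce(),
        R: FnOnce() -> String,
    {
        load();
        let wait = self.post_load_wait();
        if !wait.is_zero() {
            // Give client-side JavaScript time to finish rendering.
            std::thread::sleep(wait);
        }
        read_content()
    }
}
```

The point of the ordering is that the sleep sits between the load event and the content capture, so a zero wait leaves the previous behavior unchanged.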

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| crates/mq-crawler/src/main.rs | Adds CLI flag and passes wait duration to the Chromium client constructor. |
| crates/mq-crawler/src/http_client.rs | Extends the Chromium client variant to store a wait duration and sleeps before capturing page content. |

Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Copilot AI review requested due to automatic review settings March 19, 2026 12:45
Contributor

Copilot AI left a comment


Pull request overview

Adds a new --headless-wait option to mq-crawl so headless Chromium mode can pause for an additional duration after the page load event, improving extraction for JS-heavy pages.

Changes:

  • Add --headless-wait CLI flag (seconds) gated behind --headless.
  • Pass the wait duration into the Chromium HttpClient and sleep after new_page()/load before reading page.content().
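The gating and validation described above might look roughly like the following std-only sketch. It is an illustration under assumptions, not the PR's code: the real CLI presumably uses an argument-parsing crate, the `parse_headless_wait` function is hypothetical, and accepting fractional seconds is a guess (the PR only says the value is in seconds).

```rust
use std::time::Duration;

/// Hypothetical hand-rolled parser for the two flags discussed here.
/// Returns `Ok(None)` when no wait was requested, `Ok(Some(wait))` for a
/// valid wait, and `Err` for a missing, invalid, or ungated value.
pub fn parse_headless_wait(args: &[&str]) -> Result<Option<Duration>, String> {
    let headless = args.iter().any(|a| *a == "--headless");
    let mut wait = None;

    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if *arg == "--headless-wait" {
            let raw = iter
                .next()
                .ok_or_else(|| "--headless-wait requires a value in seconds".to_string())?;
            // Assumption: fractional seconds are accepted.
            let secs: f64 = raw
                .parse()
                .map_err(|_| format!("invalid --headless-wait value: {raw}"))?;
            // Rejects negative, NaN, infinite, and overflowing values.
            let parsed = Duration::try_from_secs_f64(secs)
                .map_err(|e| format!("invalid --headless-wait value {raw}: {e}"))?;
            wait = Some(parsed);
        }
    }

    match (headless, wait) {
        // The wait only makes sense when Chromium is actually in use.
        (false, Some(_)) => Err("--headless-wait requires --headless".to_string()),
        (_, w) => Ok(w),
    }
}
```

Rejecting the flag without --headless, rather than silently ignoring it, surfaces a likely configuration mistake early.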

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| crates/mq-crawler/src/main.rs | Adds CLI flag, validates the value, and passes it into Chromium client creation. |
| crates/mq-crawler/src/http_client.rs | Extends Chromium client variant to store a wait Duration and sleeps before capturing page content. |

@harehare harehare merged commit bfcbec9 into main Mar 19, 2026
8 checks passed
@harehare harehare deleted the feat/crawler-headless-wait branch March 19, 2026 12:59