✨ feat(crawler): add headless_wait option for Chromium mode #1473
Merged
Adds --headless-wait CLI argument to control wait time after page load in headless mode. Passes wait duration to Chromium client and applies sleep after load event. Useful for JS-heavy pages that require extra rendering time.
Contributor
Pull request overview
This PR adds a --headless-wait CLI option to mq-crawl to optionally wait after Chromium’s page load event in headless mode, improving crawling results for JS-heavy pages.
Changes:
- Add `--headless-wait` CLI argument (seconds) for headless Chromium mode.
- Pass the wait duration into the Chromium `HttpClient` and sleep after page load before reading content.
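The second change can be illustrated with a minimal, std-only sketch. It is not the code from this PR: `SketchClient`, `post_load_wait`, and `fetch` are hypothetical names, the two closures stand in for "open the page and wait for the load event" and "read the page content", and a blocking `thread::sleep` is used only to keep the example self-contained (async code would presumably use a non-blocking sleep instead).

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a client enum with a plain mode and a
/// headless mode. Only the headless variant stores a post-load wait.
enum SketchClient {
    Plain,
    Headless { wait: Duration },
}

impl SketchClient {
    /// Extra delay to apply between the load event and the content read.
    fn post_load_wait(&self) -> Duration {
        match self {
            SketchClient::Plain => Duration::ZERO,
            SketchClient::Headless { wait } => *wait,
        }
    }

    /// Open the page, pause for the configured wait, then read the content.
    /// `open_page` and `read_content` are placeholders for real browser calls.
    fn fetch<O, R>(&self, open_page: O, read_content: R) -> String
    where
        O: FnOnce(),
        R: FnOnce() -> String,
    {
        open_page();
        let wait = self.post_load_wait();
        if !wait.is_zero() {
            // Give client-side JavaScript time to finish rendering.
            thread::sleep(wait);
        }
        read_content()
    }
}
```

The point of the ordering is that the content read happens only after the pause, so markup injected by scripts after the load event has a chance to appear in the captured page.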
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| crates/mq-crawler/src/main.rs | Adds CLI flag and passes wait duration to the Chromium client constructor. |
| crates/mq-crawler/src/http_client.rs | Extends the Chromium client variant to store a wait duration and sleeps before capturing page content. |
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Contributor
Pull request overview
Adds a new --headless-wait option to mq-crawl so headless Chromium mode can pause for an additional duration after the page load event, improving extraction for JS-heavy pages.
Changes:
- Add `--headless-wait` CLI flag (seconds) gated behind `--headless`.
- Pass the wait duration into the Chromium `HttpClient` and sleep after `new_page()`/`load` before reading `page.content()`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/mq-crawler/src/main.rs | Adds CLI flag, validates the value, and passes it into Chromium client creation. |
| crates/mq-crawler/src/http_client.rs | Extends Chromium client variant to store a wait Duration and sleeps before capturing page content. |
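The review mentions that the flag is gated behind `--headless` and that its value is validated. The sketch below shows one way such checks could look using only the standard library. It is an assumption-laden illustration, not the PR's implementation: `parse_wait_seconds` and `resolve_wait` are hypothetical helpers, the value is assumed to be fractional seconds, and an absent flag is assumed to mean no extra wait.

```rust
use std::time::Duration;

/// Parse a seconds value into a `Duration`, rejecting input that is not a
/// number as well as negative, NaN, or out-of-range values.
fn parse_wait_seconds(raw: &str) -> Result<Duration, String> {
    let secs: f64 = raw
        .trim()
        .parse()
        .map_err(|_| format!("not a number: {raw:?}"))?;
    // `try_from_secs_f64` returns an error instead of panicking on bad input.
    Duration::try_from_secs_f64(secs).map_err(|e| format!("invalid wait {raw:?}: {e}"))
}

/// Combine the two options: a wait value is only accepted in headless mode.
/// Assumption: when the wait option is absent, no extra wait is applied.
fn resolve_wait(headless: bool, wait: Option<&str>) -> Result<Duration, String> {
    match (headless, wait) {
        (_, None) => Ok(Duration::ZERO),
        (false, Some(_)) => Err("--headless-wait requires --headless".to_string()),
        (true, Some(raw)) => parse_wait_seconds(raw),
    }
}
```

Under these assumptions, `resolve_wait(true, Some("2.5"))` yields a 2.5-second duration, while a wait value without headless mode, or a negative value, is reported as an error before any page is fetched.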