@EvanUp EvanUp commented Feb 10, 2025

Begin migrating library from requests to selenium. Added selenium calls (built on undetected-chromedriver) in _conduct_chromedriver_search and _send_chromedriver_request. Also added an additional step to launch chromedriver to facilitate debugging. test/selenium_test.py contains a functioning example call.

TODO: fix broken poetry command, include chromedriver update/path instructions, remove requests-specific headers

@gitronald gitronald changed the base branch from master to selenium March 9, 2025 17:07
@gitronald gitronald requested a review from Copilot March 9, 2025 17:08

Copilot AI left a comment

PR Overview

This PR migrates the scraping functionality from the requests-based implementation to a selenium/undetected-chromedriver setup for improved resilience and compatibility with modern search pages. Key changes include:

  • Introducing selenium calls and helper methods (_init_chromedriver, _send_chromedriver_request, _conduct_chromedriver_search) in the SearchEngine class.
  • Removing or commenting out requests-specific header and session code.
  • Updating documentation and dependency lists to reflect the switch to selenium.
  • Adding a basic selenium test and updating demo scripts and the README.

Reviewed Changes

| File | Description |
| --- | --- |
| WebSearcher/searchers.py | Added selenium integrations, removed unused requests session/headers. |
| tests/selenium_test.py | Introduced an example test for selenium-driven search. |
| pyproject.toml | Updated dependencies to include undetected-chromedriver and selenium. |
| README.md | Updated instructions and notes to reflect the new selenium-based approach. |
| scripts/demo_search.py | Updated to launch chromedriver before initiating search. |

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

WebSearcher/searchers.py:169

  • [nitpick] The variable name 'ai_button' is misleading since it holds a boolean rather than an element; consider renaming it to 'ai_expand_available' or similar for clarity.
            ai_button = self._check_ai_expand()

self.log.exception(f'SERP | Timeout error | {self.serp_id}')
except Exception:
self._send_chromedriver_request()
except:
Copilot AI Mar 9, 2025

Avoid a bare except, which also catches KeyboardInterrupt and SystemExit; catch a specific exception type (e.g. Exception) so that unexpected errors are not masked.

Suggested change:
- except:
+ except Exception:
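
To illustrate the difference the suggestion makes, here is a minimal, standalone sketch. It is not code from this PR: `run_with_fallback`, `flaky`, and `interrupted` are hypothetical names used only for the demonstration.

```python
def run_with_fallback(action, fallback):
    """Call action(); if it raises an ordinary error, call fallback() instead."""
    try:
        return action()
    except Exception:
        # Catches ordinary errors (ValueError, TimeoutError, ...) but lets
        # KeyboardInterrupt and SystemExit propagate, unlike a bare `except:`.
        return fallback()


def flaky():
    # Hypothetical stand-in for a request that fails
    raise ValueError("simulated request failure")


def interrupted():
    # Hypothetical stand-in for the user pressing Ctrl+C mid-request
    raise KeyboardInterrupt


# An ordinary error is handled by the fallback
result = run_with_fallback(flaky, lambda: "fallback")

# A KeyboardInterrupt is not swallowed; it reaches the caller
try:
    run_with_fallback(interrupted, lambda: "fallback")
    propagated = False
except KeyboardInterrupt:
    propagated = True
```

With a bare `except:` in place of `except Exception:`, the second call would return the fallback value and the interrupt would be silently lost.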

@gitronald gitronald merged commit 6ff6262 into gitronald:selenium Mar 9, 2025
@gitronald

Merge munging notes:

  • kept requests code for compatibility
  • dropped commented out examples
  • started making the collection method selectable through a method arg in se.search
  • todo:
    • make it possible to set chromedriver_path
    • update demos with method arg where needed
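
One way the method arg and the chromedriver_path todo could fit together is sketched below. This is an assumption-laden illustration, not the library's implementation: `SearchSketch`, its private helpers, the accepted method names, and the default value are all hypothetical, and the handlers return placeholder strings rather than issuing real requests or launching a browser.

```python
class SearchSketch:
    """Hypothetical stand-in for a search class; not the library's code."""

    def __init__(self, chromedriver_path=None):
        # Hypothetical option mirroring the todo item: an optional driver path
        self.chromedriver_path = chromedriver_path
        # Map each collection method name to the handler that implements it
        self._methods = {
            "requests": self._search_requests,
            "selenium": self._search_selenium,
        }

    def _search_requests(self, qry):
        # Placeholder: a real handler would send an HTTP request here
        return f"requests:{qry}"

    def _search_selenium(self, qry):
        # Placeholder: a real handler would drive a browser here
        return f"selenium:{qry}"

    def search(self, qry, method="selenium"):
        # The default shown here is arbitrary; the notes do not state one
        try:
            handler = self._methods[method]
        except KeyError:
            raise ValueError(f"unknown method: {method!r}") from None
        return handler(qry)


se = SearchSketch(chromedriver_path="/hypothetical/path/chromedriver")
via_requests = se.search("example query", method="requests")
via_selenium = se.search("example query", method="selenium")
```

Keeping both handlers behind one dictionary lookup lets the requests code stay available for compatibility while callers opt in to selenium per call.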
