Fix scraping multiple URLs #5677

Merged
WithoutPants merged 4 commits into stashapp:develop from WithoutPants:issues/5294
Feb 25, 2025

Conversation

@WithoutPants
Collaborator

The mapped scraper functionality is in sore need of a rework. #5294 has highlighted this with the new URLs field. For now, I've hacked in a solution that should correctly convert URLs to a list.

Notably, this does not work where URLs is used on a sub-object. For example, in a scene scrape it populates only a single URL in the URLs field of each performer, because there is currently no way to determine which performer a given URL value belongs to. It therefore follows the existing convention and assigns one URL to each performer result, so a performer within a scene scrape cannot have multiple URLs.
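
To make the limitation concrete, here is a minimal Python sketch of the two cases. It is not stash's actual implementation. The function names are hypothetical, and pairing names with URLs by position is my reading of "existing convention", so treat that part as an assumption.

```python
from typing import Dict, List


def top_level_performer(matches: List[str]) -> Dict[str, List[str]]:
    """Top-level scrape: every matched value goes into the URLs list."""
    return {"URLs": list(matches)}


def performers_in_scene(names: List[str], urls: List[str]) -> List[dict]:
    """Sub-object scrape: the matches arrive as one flat list, so there is
    no way to tell which performer a URL belongs to.

    Assumption: values are paired with performers by position, which gives
    each performer at most one URL and silently drops any extras.
    """
    performers = []
    for i, name in enumerate(names):
        performer = {"Name": name}
        if i < len(urls):
            performer["URLs"] = [urls[i]]  # always a single-element list
        performers.append(performer)
    return performers


if __name__ == "__main__":
    print(top_level_performer(["https://a.example", "https://b.example"]))
    print(performers_in_scene(["Alice", "Bob"], ["u1", "u2", "u3"]))
```

In the second call the third URL has no performer to attach to, which is the ambiguity described above.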

Resolves #5294

For testing, I used the following performer scraper for LinkTree:

name: LinkTree
performerByURL:
  - action: scrapeXPath
    url:
      - linktr.ee
    scraper: performerScraper

xPathScrapers:
  performerScraper:
    performer:
      Name: //div[@id='profile-title']/h1/text()
      URLs:
        selector: //div[@id='links-container']//a/@href
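
As a rough illustration of why this scraper exercises the list conversion, the Python sketch below runs equivalent queries against a made-up page. The sample markup and example.com links are assumptions, not LinkTree's real HTML. Python's `xml.etree.ElementTree` stands in for stash's XPath engine, and since it cannot select attributes with `/@href`, the sketch selects the `a` elements and reads `href` from each.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a profile page.
SAMPLE_PAGE = """
<html><body>
  <div id="profile-title"><h1>Example Performer</h1></div>
  <div id="links-container">
    <div><a href="https://example.com/one">One</a></div>
    <div><a href="https://example.com/two">Two</a></div>
  </div>
</body></html>
"""


def scrape_sample(page: str) -> dict:
    root = ET.fromstring(page.strip())
    # Name: a single text node under the profile title.
    name = root.find(".//div[@id='profile-title']/h1").text
    # URLs: the selector matches several nodes, so the result is a list.
    links = root.findall(".//div[@id='links-container']//a")
    return {"Name": name, "URLs": [a.get("href") for a in links]}


if __name__ == "__main__":
    print(scrape_sample(SAMPLE_PAGE))
```

The point is only that the URLs selector yields multiple values where Name yields one.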

@WithoutPants added the bug (Something isn't working) label Feb 24, 2025
@WithoutPants added this to the Version 0.28.0 milestone Feb 24, 2025
@DogmaDragon
Collaborator

Tested with several existing community scrapers (IAFD, The Nude, Babepedia, MFC) and some custom scrapers.
Tested across different actions (script, scrapeJson, scrapeXPath).

Everything appears to work in my testing so far.

If URLs is used, URL, Twitter and Instagram are ignored and URLs is processed correctly, which maintains compatibility with older scrapers.
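
The precedence observed here can be sketched in Python as follows. This is an illustration, not stash's code: `resolve_urls` is a hypothetical helper, and the fallback branch, which collects the legacy fields into a list when URLs is absent, is an assumption about how older scrapers keep working.

```python
from typing import Dict, List, Union

LEGACY_URL_FIELDS = ("URL", "Twitter", "Instagram")

Scraped = Dict[str, Union[str, List[str]]]


def resolve_urls(scraped: Scraped) -> List[str]:
    """Return the URL list for a scraped performer.

    If URLs is present it wins and the legacy fields are ignored.
    Assumption: without URLs, the legacy fields are gathered instead.
    """
    urls = scraped.get("URLs")
    if urls:
        return list(urls)
    return [scraped[k] for k in LEGACY_URL_FIELDS if scraped.get(k)]


if __name__ == "__main__":
    new_style = {"URLs": ["https://a.example"], "URL": "https://old.example"}
    old_style = {"URL": "https://old.example", "Twitter": "https://t.example"}
    print(resolve_urls(new_style))
    print(resolve_urls(old_style))
```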

@WithoutPants merged commit 1e05766 into stashapp:develop Feb 25, 2025
2 checks passed
feederbox826 added a commit to stashapp/CommunityScrapers that referenced this pull request Feb 26, 2025
XGFan pushed a commit to XGFan/stash that referenced this pull request Mar 27, 2025
* Hack fix for scraping URLs field
* Rewrite apply function using known value types
@DogmaDragon linked an issue Mar 28, 2025 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

[Bug Report] XPath scraper is missing string array support
[Feature] Add multiple urls to be returned by scrapers.