Scrape and store age rating data from ScreenScraper.fr #3089

Merged: gantoine merged 4 commits into master from copilot/scrape-age-rating-data on Mar 8, 2026

Copilot AI commented Mar 8, 2026

ScreenScraper returns classifications (PEGI, ESRB, BBFC, etc.) in its game data, but this was never extracted or stored, making age-based filtering impossible for SS-only libraries.

Changes

  • ss_handler.py: Added age_ratings: list[str] to SSMetadata TypedDict; added _get_age_ratings() inside extract_metadata_from_ss_rom() that maps classifications entries to formatted strings:

    # {"type": "PEGI", "text": "7"} → "PEGI 7"
    # {"type": "ESRB", "text": "E"} → "ESRB E"

    Empty type or text values are filtered out.

  • 0070_ss_age_ratings.py (new migration): Updates the roms_metadata view (PostgreSQL + MySQL/MariaDB) to include ss_metadata → 'age_ratings' in the age_ratings COALESCE, positioned after IGDB and before LaunchBox in the priority order.
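
As a rough illustration of the extraction step described in the first bullet, the sketch below formats classification entries into strings such as "PEGI 7" and drops entries with an empty type or text. The names `Classification` and `format_age_ratings` are hypothetical and are not taken from ss_handler.py; the entry shape is assumed from the two examples above.

```python
from typing import TypedDict


class Classification(TypedDict, total=False):
    """Assumed shape of one ScreenScraper classification entry."""

    type: str
    text: str


def format_age_ratings(classifications: list[Classification]) -> list[str]:
    """Turn classification entries into strings such as "PEGI 7"."""
    return [
        f"{entry['type']} {entry['text']}"
        for entry in classifications
        # Skip entries whose type or text is missing or empty.
        if entry.get("type") and entry.get("text")
    ]


sample: list[Classification] = [
    {"type": "PEGI", "text": "7"},
    {"type": "ESRB", "text": "E"},
    {"type": "", "text": "12"},  # empty type: dropped
    {"type": "BBFC"},  # missing text: dropped
]
print(format_age_ratings(sample))  # ['PEGI 7', 'ESRB E']
```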

Original prompt

This section details on the original issue you should resolve

<issue_title>[Feature] Scrape and Include Age Rating Data from ScreenScraper.fr</issue_title>
<issue_description>Is your feature request related to a problem? Please describe.
I want to use the age rating that screenscraper provides in order to be able to filter my library for games suitable for children. Currently no age rating data is pulled, thus I cannot filter the library.

Describe the solution you'd like
ScreenScraper has age ratings included in most of their games. These should be scraped and included in the library database as well.

Describe alternatives you've considered
I considered using IGDB as an alternate datasource, but wanted to stay off/away from amazon and twitch products for privacy reasons. I also already have a screenscraper account, and thus would prefer to use the resource that I already have available.

Additional context
No further context to add, will add more information if required.
</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Add age rating data scraping from ScreenScraper.fr" to "Scrape and store age rating data from ScreenScraper.fr" on Mar 8, 2026
gantoine marked this pull request as ready for review March 8, 2026 21:27
gantoine self-requested a review March 8, 2026 21:27

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR adds ScreenScraper age-rating scraping and display to RomM by extracting classifications from the SS API response, persisting them as structured SSAgeRating objects in ss_metadata, exposing them through an updated roms_metadata view, and rendering them in GameInfo.vue using IGDB-hosted rating icons (with a chip fallback for unsupported systems).

Key changes and findings:

  • ss_handler.py: _get_age_ratings() is correctly guarded and maps classification["text"] → rating and classification["type"] → category. Minor: subscript access is used after a get() truthiness guard; consider using .get() consistently for clarity.
  • Migration 0070: The COALESCE chain correctly returns NULL (not an empty array) from the IGDB and SS CASE branches, so fall-through to lower-priority sources works as intended. The downgrade() drops the view without recreating the previous version, which is the established pattern in this codebase.
  • GameInfo.vue: The ssByRating Map is keyed by bare rating text (e.g. "12"). If ScreenScraper returns two ratings that share the same numeric label from different systems (e.g. PEGI 12 and USK 12 — a realistic overlap at values 12, 16, 18), the Map will silently drop one entry. The surviving entry's icon will display twice while the dropped entry falls through to an unlabelled chip. This should be addressed before the feature is considered complete.
  • The filter-click path uses value.rating (e.g. "7") as the search term, which means filtering by a badge will match any rating system that uses that numeric label. This is arguably the desired behaviour for an age-suitability filter, but is worth documenting.
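
The COALESCE point in the migration finding above can be modelled in a few lines of Python. This is only a sketch of the SQL semantics, not code from the migration: `coalesce` is a hypothetical helper, `None` stands in for SQL NULL, and the source order (IGDB, then ScreenScraper, then LaunchBox) is taken from the PR description.

```python
from typing import Optional


def coalesce(*values: Optional[list[str]]) -> Optional[list[str]]:
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


launchbox = ["E"]

# IGDB and SS branches both yield NULL: the lower-priority source wins.
assert coalesce(None, None, launchbox) == ["E"]

# If the SS branch yielded an empty array instead of NULL, it would count
# as a value and block the fall-through to LaunchBox.
assert coalesce(None, [], launchbox) == []
```

This is why a CASE branch that returns NULL for a missing or empty source matters: an empty array is a perfectly valid non-NULL value as far as COALESCE is concerned.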

Confidence Score: 3/5

  • Mostly safe to merge for the primary SS-only use case, but the Map key collision in GameInfo.vue is a real logic bug that will silently misdisplay ratings for games with overlapping numeric rating codes across systems.
  • The backend extraction and SQL migration are correct and well-structured. The frontend rendering has one confirmed logic issue (ssByRating Map key collision on shared numeric codes like 12, 16, 18 across PEGI and USK) that would cause silent mis-attribution of rating category icons for affected games. The feature works correctly for the common single-system case (PEGI-only or ESRB-only), but the edge case is realistic enough to warrant a fix before shipping.
  • Pay close attention to frontend/src/components/Details/Info/GameInfo.vue — the ssByRating Map key collision is the primary issue requiring a fix.

Important Files Changed

| Filename | Overview |
|---|---|
| backend/alembic/versions/0070_ss_age_ratings.py | New migration that recreates the full roms_metadata view for both PostgreSQL and MySQL/MariaDB, correctly adding a SS-specific CASE that returns NULL (not an empty array) from the IGDB and SS branches so that COALESCE can fall through properly. The downgrade() only drops the view (consistent with prior migrations in this repo). No structural issues beyond the inherited flat-rating-value design. |
| backend/handler/metadata/ss_handler.py | Adds _get_age_ratings() that correctly maps ScreenScraper classifications entries to SSAgeRating objects; filter guards are sound. Removes the unused rating_cover_url field from SSAgeRating. Minor style note on mixed get()/subscript access. |
| frontend/src/components/Details/Info/GameInfo.vue | Adds SS rating lookup via a ssByRating Map and constructs IGDB-hosted icon URLs for SS ratings. Has a logic issue: the Map is keyed by bare rating text, so two SS ratings with the same numeric code but different categories (e.g. PEGI 12 and USK 12) will collide, and one will silently render as an unlabelled chip. |
| frontend/src/generated/models/SSAgeRating.ts | New auto-generated model with rating: string and category: string, correctly omitting rating_cover_url since SS does not supply it. |
| frontend/src/generated/models/RomSSMetadata.ts | Adds age_ratings?: Array<SSAgeRating> to the RomSSMetadata type, consistent with the backend SSMetadata TypedDict change. |
| frontend/src/generated/index.ts | Exports the new SSAgeRating type; straightforward barrel-export addition. |

Sequence Diagram

sequenceDiagram
    participant SS as ScreenScraper API
    participant Handler as ss_handler.py
    participant DB as roms table (ss_metadata)
    participant View as roms_metadata view
    participant FE as GameInfo.vue

    SS->>Handler: SSGame { classifications: [{type, text}] }
    Handler->>Handler: _get_age_ratings()<br/>maps text→rating, type→category
    Handler->>DB: SSMetadata { age_ratings: [{rating, category}] }

    FE->>View: SELECT age_ratings FROM roms_metadata
    View->>DB: CASE: jsonb_path_query_array(ss_metadata, '$.age_ratings[*].rating')
    View-->>FE: age_ratings = ["7", "M"] (flat string array)

    FE->>FE: Build ssByRating Map<br/>{ rating → SSAgeRating } from ss_metadata
    FE->>FE: For each entry in metadatum.age_ratings:<br/>1. Check manual (":") format<br/>2. Look up in igdbByRating<br/>3. Look up in ssByRating → construct icon URL<br/>4. Fallback chip
    FE->>FE: Render v-img (if cover URL) or v-chip (fallback)
    FE->>FE: onFilterClick('ageRatings', value.rating) → route to search

Last reviewed commit: c89753c

github-actions bot commented Mar 8, 2026

Test Results (postgresql)

942 tests  ±0   941 ✅ ±0   2m 20s ⏱️ +4s
  1 suites ±0     1 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit c89753c. ± Comparison against base commit 9dd155a.

github-actions bot commented Mar 8, 2026

Test Results (mariadb)

942 tests  ±0   941 ✅ ±0   2m 15s ⏱️ ±0s
  1 suites ±0     1 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit c89753c. ± Comparison against base commit 9dd155a.

github-actions bot commented Mar 8, 2026

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|---|---|---|---|---|
| 13808 | 9259 | 67% | 0% | 🟢 |

New Files

No new covered files...

Modified Files

| File | Coverage | Status |
|---|---|---|
| backend/handler/metadata/ss_handler.py | 31% | 🟢 |
| TOTAL | 31% | 🟢 |

updated for commit: c89753c by action🐍

gantoine merged commit 38b311d into master Mar 8, 2026
14 checks passed
gantoine deleted the copilot/scrape-age-rating-data branch March 8, 2026 22:44
Comment on lines +131 to 133

    const ssByRating = new Map(
      ssRatings.map((r) => [String(r.rating).trim(), r]),
    );

ssByRating Map drops duplicate rating codes across categories

ssByRating is keyed by r.rating (the raw text like "12", "16", etc.). When ScreenScraper returns ratings for two different systems that share the same numeric label — e.g. PEGI 12 and USK 12 — the Map constructor will silently overwrite the first with the second (last-write wins). The flattened view column will contain two "12" entries, but both lookup calls (ssByRating.get("12")) will resolve to the same object, so only the winner's icon is shown twice while the loser falls through to the bare-chip fallback with an empty category.

PEGI and USK share the values 12, 16, and 18, so this is a realistic scenario for many European releases.

A robust approach is to key the map by ${r.category}:${r.rating} and extract the category from the view as well (either by storing "PEGI:12" style strings, or by fetching the full objects from ss_metadata). Alternatively, look up the full object list directly from ss_metadata.age_ratings and render that instead of going through the flattened metadatum.age_ratings.
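
The collision and the composite-key idea can be sketched with a Python dict, which overwrites duplicate keys in the same last-write-wins way as the JavaScript Map constructor. This is a Python analogue for illustration only, not the frontend code; the sample ratings and the variable names are assumptions.

```python
ss_ratings = [
    {"rating": "12", "category": "PEGI"},
    {"rating": "12", "category": "USK"},
]

# Keyed by bare rating text: the second "12" overwrites the first.
by_rating = {r["rating"].strip(): r for r in ss_ratings}
assert len(by_rating) == 1
assert by_rating["12"]["category"] == "USK"

# Keyed by "category:rating": both entries survive.
by_category_and_rating = {
    f'{r["category"]}:{r["rating"]}'.strip(): r for r in ss_ratings
}
assert len(by_category_and_rating) == 2
assert by_category_and_rating["PEGI:12"]["category"] == "PEGI"
```

Note that the composite key only helps if the lookup side also knows the category, which the flattened view column does not carry.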

Suggested change

    - const ssByRating = new Map(
    -   ssRatings.map((r) => [String(r.rating).trim(), r]),
    - );
    + const ssByCategory = new Map(
    +   ssRatings.map((r) => [`${r.category}:${r.rating}`.trim(), r]),
    + );

Comment on lines +173 to +183

    // ScreenScraper age ratings need to have cover URLs constructed
    const ssMatch = ssByRating.get(entry.trim());
    if (ssMatch) {
      const slug = categorySlug[ssMatch.category];
      return {
        ...ssMatch,
        rating_cover_url: slug
          ? `https://www.igdb.com/icons/rating_icons/${slug}/${slug}_${normalizeRatingCode(ssMatch.rating)}.png`
          : undefined,
      };
    }

SS rating lookup does not account for shared numeric codes

Building on the map-collision issue above, the lookup ssByRating.get(entry.trim()) uses the bare rating text "7", "12", etc. as the key. Because the flattened view column also only carries the text value (not the category), there is no way to distinguish a PEGI 12 entry from a USK 12 entry at this point.

If the map key is updated to ${category}:${rating}, the lookup here must change accordingly — which requires knowing the category at this step. The cleanest fix is to skip the view-column lookup entirely for SS ratings and render directly from ss_metadata.age_ratings (the structured data is already available in ssRatings).
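
One way to picture the "render directly from the structured data" option is the Python sketch below. It is an analogue of the frontend logic, not the actual Vue code: `CATEGORY_SLUG` and `normalize_rating_code` are hypothetical stand-ins for `categorySlug` and `normalizeRatingCode`, whose real contents are not shown in this review, and only the URL pattern is taken from the snippet above.

```python
from typing import Optional, TypedDict


class SSAgeRating(TypedDict):
    rating: str
    category: str


# Hypothetical stand-in for the frontend's categorySlug table.
CATEGORY_SLUG = {"PEGI": "pegi", "USK": "usk"}


def normalize_rating_code(rating: str) -> str:
    """Hypothetical stand-in for normalizeRatingCode: trim and lowercase."""
    return rating.strip().lower()


def cover_url(rating: SSAgeRating) -> Optional[str]:
    """Build the icon URL, or return None so the caller can show a chip."""
    slug = CATEGORY_SLUG.get(rating["category"])
    if slug is None:
        return None
    code = normalize_rating_code(rating["rating"])
    return f"https://www.igdb.com/icons/rating_icons/{slug}/{slug}_{code}.png"


def render_items(ss_ratings: list[SSAgeRating]) -> list[dict]:
    """One display item per structured rating, with no lookup by bare text."""
    return [{**r, "rating_cover_url": cover_url(r)} for r in ss_ratings]


items = render_items(
    [
        {"rating": "12", "category": "PEGI"},
        {"rating": "12", "category": "USK"},
        {"rating": "15", "category": "BBFC"},  # no slug: chip fallback
    ]
)
```

Because each item is produced from its own structured entry, PEGI 12 and USK 12 keep their own categories and no map key is involved.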

Comment on lines +368 to +376

    def _get_age_ratings(game: SSGame) -> list[SSAgeRating]:
        return [
            SSAgeRating(
                rating=classification["text"],
                category=classification["type"],
            )
            for classification in game.get("classifications", [])
            if classification.get("type") and classification.get("text")
        ]

classifications items are accessed by subscript after a get() guard — inconsistent defensiveness

The guard classification.get("type") and classification.get("text") ensures both keys are present and truthy before the comprehension body runs, so the subscripts classification["text"] and classification["type"] are safe. However, the ScreenScraper API is loosely typed. If an entry lacks the "text" key entirely, e.g. {"type": "PEGI"}, classification.get("text") returns None and the entry is filtered, as intended. But if "text" is present with the value 0 (integer zero, unlikely but possible), the truthiness guard also filters it, which may not be desired. Using .get() consistently would make the intent clearer:

Suggested change

    - def _get_age_ratings(game: SSGame) -> list[SSAgeRating]:
    -     return [
    -         SSAgeRating(
    -             rating=classification["text"],
    -             category=classification["type"],
    -         )
    -         for classification in game.get("classifications", [])
    -         if classification.get("type") and classification.get("text")
    -     ]
    + def _get_age_ratings(game: SSGame) -> list[SSAgeRating]:
    +     return [
    +         SSAgeRating(
    +             rating=classification.get("text", ""),
    +             category=classification.get("type", ""),
    +         )
    +         for classification in game.get("classifications", [])
    +         if classification.get("type") and classification.get("text")
    +     ]
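
To make the truthiness point concrete, the sketch below compares the guard used in the PR with a stricter presence check. The sample entries are invented for illustration, and whether ScreenScraper ever returns an integer 0 is not established by anything in this review; the `explicit` variant is one possible alternative, not something the suggestion above proposes.

```python
entries = [
    {"type": "PEGI", "text": "7"},
    {"type": "PEGI"},  # no "text" key at all
    {"type": "PEGI", "text": 0},  # present but falsy
    {"type": "PEGI", "text": ""},  # present but empty
]

# Guard as written in the PR: any falsy text is dropped, including 0.
truthy = [e for e in entries if e.get("type") and e.get("text")]

# Stricter alternative: drop only missing (None) or empty-string text.
explicit = [
    e for e in entries if e.get("type") and e.get("text") not in (None, "")
]

assert len(truthy) == 1
assert len(explicit) == 2
assert explicit[1]["text"] == 0
```

Either variant leaves the subscript access in the comprehension body safe, since both guarantee that the "text" and "type" keys exist on the entries that pass.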

Development

Successfully merging this pull request may close these issues.

[Feature] Scrape and Include Age Rating Data from ScreenScraper.fr

2 participants