Don't show trap links to recognized bots #12413

Merged
mekarpeles merged 1 commit into internetarchive:master from cdrini:perf/bot-trap-links
Apr 21, 2026

Conversation

@cdrini (Collaborator) commented Apr 20, 2026

Closes #

Technical

Note: there is a pre-existing bug. We need to update our caching prethread to copy over both is_bot and is_recognized_bot. Currently we have a cache leak.

Testing

Confirmed locally that the trap link still shows. Deployed to prod and saw a drop in identifying bots hitting trap links:

sudo tac /1/var/log/nginx/access.log | obfi_grep_bots -v | grep 'show_page_status=1' | obfi_count_minute
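
For readers without the obfi_* helpers (they are not shown in this PR), the per-minute count can be approximated in plain Python. This is a rough sketch, assuming nginx's usual bracketed timestamp such as `[21/Apr/2026:17:03:12 +0000]`; the production log format may differ. The function name `count_trap_hits_per_minute` is hypothetical, and the sketch does not reproduce whatever filtering `obfi_grep_bots -v` performs.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the timestamp down to the minute, e.g. "21/Apr/2026:17:03".
# Assumes a "[dd/Mon/yyyy:HH:MM:SS +zzzz]" style timestamp in each log line.
MINUTE_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}):\d{2} ")


def count_trap_hits_per_minute(lines: Iterable[str]) -> Counter:
    """Count requests carrying show_page_status=1, grouped by minute."""
    counts: Counter = Counter()
    for line in lines:
        if "show_page_status=1" not in line:
            continue  # not a trap-link hit
        match = MINUTE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it an open log file (for example `count_trap_hits_per_minute(open(path))`) gives a minute-to-hits mapping that can be compared before and after the deploy.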

Screenshot

Stakeholders

@cdrini cdrini added the "Patch Deployed" (This PR has been deployed to production independently, outside of the regular deploy cycle.) and "On Testing" labels Apr 20, 2026
@cdrini cdrini marked this pull request as ready for review April 21, 2026 17:03
Copilot AI review requested due to automatic review settings April 21, 2026 17:03
Copilot AI (Contributor) left a comment

Pull request overview

This PR aims to reduce “trap link” exposure for known/recognized crawlers by gating the hidden show_page_status=1 link behind a recognized-bot check, leveraging the request-scoped context vars infrastructure.

Changes:

  • Conditionally render the hidden “Page Status” trap link only when not is_recognized_bot() in the global nav header.
  • Expose a new is_recognized_bot() helper in openlibrary.plugins.openlibrary.code for use in templates.
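
The two changes above can be sketched in plain Python. This is only an illustration: the fields on `RequestContextVars`, the default value of `req_context`, the simplified `TRAP_LINK` markup, and the `render_trap_link` function are assumptions standing in for the real code in code.py and nav_head.html.

```python
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContextVars:
    # Only the two flags discussed in this PR; the real class likely has more.
    is_bot: bool = False
    is_recognized_bot: bool = False


# Request-scoped state; the default stands in for "no request context set yet".
req_context: ContextVar[RequestContextVars] = ContextVar(
    "req_context", default=RequestContextVars()
)


def is_recognized_bot() -> bool:
    """Helper a template could call (the PR exposes the real one via @public)."""
    return req_context.get().is_recognized_bot


# Simplified stand-in for the hidden link in nav_head.html.
TRAP_LINK = '<a href="?show_page_status=1" tabindex="-1" aria-hidden="true">Page Status</a>'


def render_trap_link() -> str:
    """Plain-Python equivalent of `$if not is_recognized_bot():` in the template."""
    return "" if is_recognized_bot() else TRAP_LINK
```

With the flag unset the link is rendered; once a request context with `is_recognized_bot=True` is set, the function returns an empty string.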

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| openlibrary/templates/lib/nav_head.html | Suppresses the hidden trap link for recognized bots. |
| openlibrary/plugins/openlibrary/code.py | Adds a @public is_recognized_bot() helper backed by req_context. |

Comments suppressed due to low confidence (1)

openlibrary/plugins/openlibrary/code.py:1241

  • The comment says this reads a ContextVar set by set_context_from_fastapi(), but set_context_from_fastapi() currently never sets is_recognized_bot (it only computes is_bot). As a result, is_recognized_bot() will always return False in FastAPI contexts. Either update set_context_from_fastapi() to compute/populate is_recognized_bot as well, or adjust this comment so it doesn’t imply FastAPI support.
    # Reads from the request-scoped ContextVar set by set_context_from_legacy_web_py()
    # (web.py) or set_context_from_fastapi() — the web.py equivalent of web.ctx.
    return req_context.get().is_recognized_bot
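
To illustrate the gap described in this comment, here is a minimal sketch of two context setters: one that computes only is_bot and one that fills in both flags. The setter names, the User-Agent markers, and the shape of `RequestContextVars` are all hypothetical; they are not the real set_context_from_fastapi() or set_context_from_legacy_web_py().

```python
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContextVars:
    is_bot: bool = False
    is_recognized_bot: bool = False  # stays False unless a setter fills it in


req_context: ContextVar[RequestContextVars] = ContextVar("req_context")

# Hypothetical User-Agent markers, for illustration only.
RECOGNIZED_BOT_MARKERS = ("googlebot", "bingbot")
GENERIC_BOT_MARKERS = ("bot", "crawler", "spider")


def _matches(user_agent: str, markers: tuple[str, ...]) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in markers)


def set_context_is_bot_only(user_agent: str) -> None:
    """Mimics a setter that computes is_bot but never is_recognized_bot."""
    req_context.set(
        RequestContextVars(is_bot=_matches(user_agent, GENERIC_BOT_MARKERS))
    )


def set_context_both_flags(user_agent: str) -> None:
    """Mimics a setter that populates both flags."""
    req_context.set(
        RequestContextVars(
            is_bot=_matches(user_agent, GENERIC_BOT_MARKERS),
            is_recognized_bot=_matches(user_agent, RECOGNIZED_BOT_MARKERS),
        )
    )
```

After `set_context_is_bot_only(...)`, a helper reading `req_context.get().is_recognized_bot` sees the dataclass default of False regardless of the User-Agent, which is the behavior the comment warns about.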

Comment on lines +62 to +66
$if not is_recognized_bot():
$# detect-missing-i18n-skip-line
<a
href="$changequery(dict(show_page_status=1))"
style="color:transparent;position:absolute;pointer-events:none;"

Copilot AI Apr 21, 2026

is_recognized_bot() is derived from req_context. For responses computed in memcache async update threads (via caching_prethread() in openlibrary/plugins/openlibrary/home.py), the new thread currently recomputes request context from a dummy web.ctx.env, which makes is_recognized_bot false even for recognized crawlers. That means recognized bots can still receive cached pages containing this trap link. Consider updating caching_prethread() to propagate the original UA / is_recognized_bot (or directly copy RequestContextVars) into the background thread before template rendering.
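
One generic way to carry request-scoped values into a background thread is to snapshot the caller's context with `contextvars.copy_context()` and run the thread's work inside it. The sketch below shows that technique only; it is not the actual caching_prethread() code, and `is_recognized_bot_var`, `render_nav`, and `render_in_thread_with_request_context` are hypothetical stand-ins.

```python
import contextvars
import threading
from contextvars import ContextVar

# Stand-in for the request-scoped flag; the real code stores it on req_context.
is_recognized_bot_var: ContextVar[bool] = ContextVar(
    "is_recognized_bot", default=False
)


def render_nav() -> str:
    """Stand-in for template rendering that consults the flag."""
    return "" if is_recognized_bot_var.get() else "Page Status"


def render_in_thread_with_request_context() -> str:
    """Render in a background thread that sees the caller's context values."""
    # Snapshot the caller's context before the thread starts.
    ctx = contextvars.copy_context()
    results: list[str] = []

    def work() -> None:
        results.append(render_nav())

    # Context.run executes `work` with the snapshot active in the new thread.
    thread = threading.Thread(target=ctx.run, args=(work,))
    thread.start()
    thread.join()
    return results[0]
```

Because the snapshot is taken in the requesting thread, the background render sees the same flag value the original request had, rather than a value recomputed from a placeholder environment.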

Comment on lines +62 to +69
$if not is_recognized_bot():
$# detect-missing-i18n-skip-line
<a
href="$changequery(dict(show_page_status=1))"
style="color:transparent;position:absolute;pointer-events:none;"
tabindex="-1"
aria-hidden="true"
>Page Status</a>

Copilot AI Apr 21, 2026

This behavior change (suppressing the trap link for recognized bots) doesn’t appear to be covered by tests. There are existing template-rendering tests (e.g., openlibrary/plugins/openlibrary/tests/test_home.py) that could be extended to assert the "Page Status" link is absent when req_context.is_recognized_bot=True and present otherwise.

@mekarpeles mekarpeles merged commit 1c3c54c into internetarchive:master Apr 21, 2026
7 checks passed
@cdrini cdrini deleted the perf/bot-trap-links branch April 21, 2026 17:24
Labels

On Testing; Patch Deployed (This PR has been deployed to production independently, outside of the regular deploy cycle.)
