Fix URL encoding in llms.txt (em dashes to hyphens)#174
✅ Deploy Preview for docs-extensions-and-macros ready!
📝 Walkthrough
This pull request introduces a new Antora extension that converts LLM-related documentation to plain text files during the build pipeline. The extension hooks into three build events: playbookBuilt, contentClassified, and beforePublish.
Sequence Diagram
sequenceDiagram
participant BP as Build Pipeline
participant EXT as convert-llms-to-txt Extension
participant PP as Playbook/Catalog
participant CS as Content Staging
participant FS as File System
BP->>EXT: playbookBuilt event
EXT->>PP: Select site URL (PREVIEW flag)
Note over EXT: Store selected URL
BP->>EXT: contentClassified event
EXT->>CS: Inject site-url attribute<br/>into home component versions
BP->>EXT: beforePublish event
EXT->>CS: Locate llms.adoc page<br/>in home component
EXT->>EXT: Extract markdown content
EXT->>EXT: Clean content<br/>(remove HTML comments,<br/>fix em-dashes)
EXT->>PP: Record llms.html URL<br/>in unpublishedPages
EXT->>CS: Remove HTML page output
EXT->>FS: Write llms-full.txt<br/>(aggregated latest-version pages)
EXT->>FS: Write llms.txt<br/>(from llms page content)
Note over EXT: Log processing steps,<br/>warnings, and errors
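The event flow in the diagram can be sketched as a minimal Antora extension skeleton. This is an illustration only, not the code from this PR: the helper `selectSiteUrl`, the use of Netlify's `DEPLOY_PRIME_URL` variable for the preview URL, and the component name `home` are assumptions. The `beforePublish` work is left as a comment.

```javascript
'use strict'

// Hypothetical helper: prefer the deploy-preview URL when the PREVIEW flag is
// set, otherwise fall back to the site URL declared in the playbook.
function selectSiteUrl (playbook, env) {
  const productionUrl = playbook.site && playbook.site.url
  return env.PREVIEW && env.DEPLOY_PRIME_URL ? env.DEPLOY_PRIME_URL : productionUrl
}

// Antora calls register() with the generator context bound to `this`.
function register () {
  const state = { siteUrl: undefined }

  // 1. Remember which site URL to use for this build.
  this.on('playbookBuilt', ({ playbook }) => {
    state.siteUrl = selectSiteUrl(playbook, process.env)
  })

  // 2. Expose the URL as a site-url attribute on every version of the
  //    (assumed) "home" component.
  this.on('contentClassified', ({ contentCatalog }) => {
    const home = contentCatalog.getComponent('home')
    if (!home) return
    for (const version of home.versions) {
      version.asciidoc = version.asciidoc || {}
      version.asciidoc.attributes = {
        ...version.asciidoc.attributes,
        'site-url': state.siteUrl,
      }
    }
  })

  // 3. Locate the llms page, clean its markdown, and write llms.txt and
  //    llms-full.txt (omitted in this sketch).
  this.on('beforePublish', () => {})
}

module.exports = { register, selectSiteUrl }
```

Keeping the selected URL in a closure lets the later listeners reuse the decision made at `playbookBuilt` without re-reading the environment.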
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
The convert-to-markdown extension applies smart typography that converts double hyphens (--) to em dashes (—) in URLs. This breaks URLs. For example:
deploy-preview-159--redpanda-documentation.netlify.app
becomes:
deploy-preview-159—redpanda-documentation.netlify.app
Added regex replacement to convert em dashes back to double hyphens in all URLs within markdown content before generating llms.txt.
This fixes afdocs checks:
- llms-txt-links-resolve
- markdown-url-support
- content-negotiation
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
e11fdb0 to 373b745
Actionable comments posted: 1
🧹 Nitpick comments (1)
preview/home/modules/ROOT/pages/llms.adoc (1)
7-8: Add one explicit deploy-preview URL fixture with `--` for regression coverage.
This sample page is the easiest place to keep the failing hostname pattern visible to the converter.
Suggested fixture tweak
 - {site-url}/test-page[Test Page]: A test page link
 - {site-url}/another-page[Another Page]: Another test link
+- https://deploy-preview-159--redpanda-documentation.netlify.app/test-page[Deploy Preview Test Page]: Regression fixture for em-dash URL conversion
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@preview/home/modules/ROOT/pages/llms.adoc` around lines 7 - 8, Add one explicit deploy-preview URL fixture to preview/home/modules/ROOT/pages/llms.adoc so the converter sees a hostname containing `--`; specifically, alongside the existing link entries ({site-url}/test-page and {site-url}/another-page) add another link line that uses a deploy-preview-style hostname containing the double-hyphen pattern (deploy-preview-<number>--<host>), ensuring the fixture is an actual URL entry in the same list so the failing hostname regex is exercised during conversion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@extensions/convert-llms-to-txt.js`:
- Around line 76-82: The em-dash URL normalization is only applied to the local
variable content (used for llms.txt) but not to the raw page.markdownContents
written to llms-full.txt; update the code so both outputs use the same
normalized text—either normalize page.markdownContents before writing
llms-full.txt or replace usages of page.markdownContents with the
already-normalized content variable when writing llms-full.txt (ensure the same
regex replace(/\(https?:\/\/[^)]*—[^)]*\)/g, ...) is applied and keep the
logger.debug('Fixed em dashes in URLs') call to indicate normalization).
---
Nitpick comments:
In `@preview/home/modules/ROOT/pages/llms.adoc`:
- Around line 7-8: Add one explicit deploy-preview URL fixture to
preview/home/modules/ROOT/pages/llms.adoc so the converter sees a hostname
containing `--`; specifically, alongside the existing link entries
({site-url}/test-page and {site-url}/another-page) add another link line that
uses a deploy-preview-style hostname containing the double-hyphen pattern
(deploy-preview-<number>--<host>), ensuring the fixture is an actual URL entry
in the same list so the failing hostname regex is exercised during conversion.
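The inline comment on `extensions/convert-llms-to-txt.js` asks for the same normalization on both outputs. One way to do that, sketched under assumptions, is to route every write through a single helper. The regex is the one quoted in the comment and `page.markdownContents` is named there; the function names `normalizeUrls` and `buildOutputs`, the replacement callback, and the way pages are joined are hypothetical.

```javascript
'use strict'

// Regex quoted in the review comment: a markdown link target that contains
// an em dash.
const EM_DASH_URL = /\(https?:\/\/[^)]*—[^)]*\)/g

// Hypothetical helper: restore "--" inside matched link targets only.
function normalizeUrls (markdown) {
  return markdown.replace(EM_DASH_URL, (target) => target.replace(/—/g, '--'))
}

// Hypothetical assembly step: both outputs go through normalizeUrls, so
// llms-full.txt can no longer diverge from llms.txt.
function buildOutputs (llmsPage, pages) {
  const llmsTxt = normalizeUrls(String(llmsPage.markdownContents))
  const llmsFullTxt = pages
    .map((page) => normalizeUrls(String(page.markdownContents)))
    .join('\n\n')
  return { llmsTxt, llmsFullTxt }
}
```

`String(...)` is used so the sketch accepts either a string or a Buffer for `markdownContents`; which of the two the real extension stores is not stated in this thread.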
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (5)
- extensions/convert-llms-to-txt.js
- local-antora-playbook.yml
- package.json
- preview/home/antora.yml
- preview/home/modules/ROOT/pages/llms.adoc
Summary
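Fixes a URL encoding issue in llms.txt where double hyphens (`--`) in URLs were being converted to em dashes (—) by the markdown converter's smart typography feature.
Problem
The convert-to-markdown extension applies smart typography that converts:
deploy-preview-159--redpanda-documentation.netlify.app
to:
deploy-preview-159—redpanda-documentation.netlify.app
This breaks all Netlify deploy preview URLs and causes afdocs checks to fail:
- llms-txt-links-resolve
- markdown-url-support
- content-negotiation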
Solution
Added regex replacement in convert-llms-to-txt.js to convert em dashes back to double hyphens in all URLs within markdown content before generating llms.txt.
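As a rough illustration of the replacement described here, the sketch below uses the regex pattern quoted in the review thread. The function name `fixEmDashesInUrls` and the replacement callback are assumptions, since the thread elides that part of the actual code.

```javascript
'use strict'

// Replace em dashes with double hyphens, but only inside the URL part of a
// markdown link, that is, inside "(https://...)". Prose em dashes are kept.
function fixEmDashesInUrls (markdown) {
  return markdown.replace(
    /\(https?:\/\/[^)]*—[^)]*\)/g,
    (linkTarget) => linkTarget.replace(/—/g, '--')
  )
}

const broken =
  '[Home](https://deploy-preview-159—redpanda-documentation.netlify.app/home): Start here — overview'

console.log(fixEmDashesInUrls(broken))
// prints "[Home](https://deploy-preview-159--redpanda-documentation.netlify.app/home): Start here — overview"
```

Scoping the match to parenthesized `http(s)` link targets means em dashes in ordinary prose are left alone. A URL that appears outside a markdown link would not be caught by this pattern.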
Testing
After this fix, URLs in llms.txt are properly formatted and all afdocs checks should pass.
Changes
- extensions/convert-llms-to-txt.js - Added URL em dash fix
- package.json - Bumped version to 4.15.1
- package-lock.json - Updated for new version