
Fix URL encoding in llms.txt (em dashes to hyphens)#174

Merged
JakeSCahill merged 3 commits into main from add-llms-txt-converter
Mar 3, 2026

@JakeSCahill
Contributor

Summary

Fixes a URL encoding issue in llms.txt where double hyphens (--) in URLs were converted to em dashes (—) by the markdown converter's smart typography feature.

Problem

The convert-to-markdown extension applies smart typography that converts:

deploy-preview-159--redpanda-documentation.netlify.app

to:

deploy-preview-159—redpanda-documentation.netlify.app

This breaks all Netlify deploy preview URLs and causes afdocs checks to fail:

  • ✗ llms-txt-links-resolve: Only 1/30 links resolve (3%)
  • ✗ markdown-url-support: No pages support .md URLs
  • ✗ content-negotiation: Server ignores Accept header

Solution

Added regex replacement in convert-llms-to-txt.js to convert em dashes back to double hyphens in all URLs within markdown content before generating llms.txt.

// Fix URLs: convert em dashes back to double hyphens
content = content.replace(/\(https?:\/\/[^)]*—[^)]*\)/g, (match) => {
  return match.replace(/—/g, '--');
});
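
The replacement above can be tried in isolation with a minimal sketch like the following. The helper name `fixEmDashesInUrls` and the sample input are assumptions made for illustration and are not taken from the extension. The em dash is written as `\u2014` so the pattern survives copy and paste.

```javascript
// Hypothetical helper (the name is an assumption, not part of the extension).
// Restores "--" inside markdown link targets where smart typography
// replaced it with an em dash (U+2014).
function fixEmDashesInUrls(content) {
  // Match "(http://...)" or "(https://...)" link targets that contain an em dash.
  return content.replace(/\(https?:\/\/[^)]*\u2014[^)]*\)/g, (match) =>
    match.replace(/\u2014/g, '--')
  );
}

// Sample input (invented): one damaged link target followed by ordinary prose.
const broken =
  '- [Home](https://deploy-preview-159\u2014redpanda-documentation.netlify.app/home): Intro \u2014 start here';

console.log(fixEmDashesInUrls(broken));
```

Because the pattern only matches parenthesized link targets that start with `http://` or `https://`, em dashes in the surrounding prose are left untouched. Bare URLs outside markdown link syntax would not be repaired by this pattern.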

Testing

After this fix, URLs in llms.txt are properly formatted and all afdocs checks should pass.

Changes

  • extensions/convert-llms-to-txt.js - Added URL em dash fix
  • package.json - Bumped version to 4.15.1
  • package-lock.json - Updated for new version

@netlify

netlify bot commented Mar 3, 2026

Deploy Preview for docs-extensions-and-macros ready!

Name Link
🔨 Latest commit 373b745
🔍 Latest deploy log https://app.netlify.com/projects/docs-extensions-and-macros/deploys/69a6a81f367d45000721cd50
😎 Deploy Preview https://deploy-preview-174--docs-extensions-and-macros.netlify.app

@coderabbitai
Contributor

coderabbitai bot commented Mar 3, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This pull request introduces a new Antora extension that converts LLM-related documentation to plain text files during the build pipeline. The extension hooks into three build events: on playbookBuilt it selects the appropriate site URL, on contentClassified it injects site-url attributes into the home component, and on beforePublish it processes the llms.adoc page to extract and clean markdown content, unpublishes the HTML version, and generates both llms.txt and llms-full.txt files in the site root. The PR also includes configuration updates to the Antora playbook, a package.json version bump with the new extension export, and a sample preview/home component with test documentation.
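
The three-hook layout described in the walkthrough can be sketched roughly as follows. This is a simplified illustration, not the contents of convert-llms-to-txt.js: only the event names and the `home` component and `llms.adoc` page come from the walkthrough. The `PREVIEW` and `DEPLOY_PRIME_URL` environment variables, the comment-stripping regex, and the lookup logic are assumptions, and the llms-full.txt aggregation is omitted for brevity.

```javascript
// Sketch of an Antora extension with the three hooks named in the walkthrough.
// Everything beyond the event names is an assumption made for illustration.
function register() {
  let siteUrl; // chosen once in playbookBuilt, reused by the later hooks

  this.on('playbookBuilt', ({ playbook }) => {
    // Assumption: PREVIEW is an environment flag and DEPLOY_PRIME_URL holds the preview URL.
    siteUrl = (process.env.PREVIEW && process.env.DEPLOY_PRIME_URL) || playbook.site.url;
  });

  this.on('contentClassified', ({ contentCatalog }) => {
    const home = contentCatalog.getComponent('home');
    if (!home) return;
    for (const version of home.versions) {
      // Expose the selected URL to AsciiDoc pages as {site-url}.
      version.asciidoc = version.asciidoc || {};
      version.asciidoc.attributes = { ...version.asciidoc.attributes, 'site-url': siteUrl };
    }
  });

  this.on('beforePublish', ({ contentCatalog, siteCatalog }) => {
    const page = contentCatalog.getPages(
      (p) => p.src.component === 'home' && p.src.relative === 'llms.adoc'
    )[0];
    // markdownContents is assumed to be attached earlier by the markdown converter.
    if (!page || !page.markdownContents) return;
    // Strip HTML comments before writing the text file.
    const content = String(page.markdownContents).replace(/<!--[\s\S]*?-->/g, '').trim();
    delete page.out; // removing `out` keeps the HTML version from being published
    siteCatalog.addFile({ contents: Buffer.from(content + '\n'), out: { path: 'llms.txt' } });
  });
}

// Antora loads an extension through its exported register function.
module.exports = { register };
```

Keeping the selected URL in a closure variable lets the later hooks reuse the decision made in `playbookBuilt` without reading the environment again.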

Sequence Diagram

sequenceDiagram
    participant BP as Build Pipeline
    participant EXT as convert-llms-to-txt Extension
    participant PP as Playbook/Catalog
    participant CS as Content Staging
    participant FS as File System

    BP->>EXT: playbookBuilt event
    EXT->>PP: Select site URL (PREVIEW flag)
    Note over EXT: Store selected URL

    BP->>EXT: contentClassified event
    EXT->>CS: Inject site-url attribute<br/>into home component versions

    BP->>EXT: beforePublish event
    EXT->>CS: Locate llms.adoc page<br/>in home component
    EXT->>EXT: Extract markdown content
    EXT->>EXT: Clean content<br/>(remove HTML comments,<br/>fix em-dashes)
    EXT->>PP: Record llms.html URL<br/>in unpublishedPages
    EXT->>CS: Remove HTML page output
    EXT->>FS: Write llms-full.txt<br/>(aggregated latest-version pages)
    EXT->>FS: Write llms.txt<br/>(from llms page content)
    
    Note over EXT: Log processing steps,<br/>warnings, and errors

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • Feediver1
  • paulohtb6
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main fix: addressing URL encoding where em dashes are substituted for double hyphens in llms.txt URLs. |
| Description check | ✅ Passed | The description is directly related to the changeset, clearly explaining the problem (smart typography breaking URLs), the solution (regex replacement), and the expected outcome. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


JakeSCahill and others added 3 commits March 3, 2026 09:21
The convert-to-markdown extension applies smart typography that converts
double hyphens (--) to em dashes (—) in URLs. This breaks URLs like:
  deploy-preview-159--redpanda-documentation.netlify.app
becomes:
  deploy-preview-159—redpanda-documentation.netlify.app

Added regex replacement to convert em dashes back to double hyphens in
all URLs within markdown content before generating llms.txt.

This fixes afdocs checks:
- llms-txt-links-resolve
- markdown-url-support
- content-negotiation

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@JakeSCahill JakeSCahill force-pushed the add-llms-txt-converter branch from e11fdb0 to 373b745 Compare March 3, 2026 09:21
@JakeSCahill JakeSCahill requested a review from a team March 3, 2026 09:21
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
preview/home/modules/ROOT/pages/llms.adoc (1)

7-8: Add one explicit deploy-preview URL fixture with -- for regression coverage.

This sample page is the easiest place to keep the failing hostname pattern visible to the converter.

Suggested fixture tweak
 - {site-url}/test-page[Test Page]: A test page link
 - {site-url}/another-page[Another Page]: Another test link
+- https://deploy-preview-159--redpanda-documentation.netlify.app/test-page[Deploy Preview Test Page]: Regression fixture for em-dash URL conversion
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@preview/home/modules/ROOT/pages/llms.adoc` around lines 7 - 8, Add one
explicit deploy-preview URL fixture to preview/home/modules/ROOT/pages/llms.adoc
so the converter sees a hostname containing `--`; specifically, alongside the
existing link entries ({site-url}/test-page and {site-url}/another-page) add
another link line that uses a deploy-preview-style hostname containing the
double-hyphen pattern (deploy-preview-<number>--<host>), ensuring the fixture
is an actual URL entry in the same list so the failing hostname regex is
exercised during conversion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@extensions/convert-llms-to-txt.js`:
- Around line 76-82: The em-dash URL normalization is only applied to the local
variable content (used for llms.txt) but not to the raw page.markdownContents
written to llms-full.txt; update the code so both outputs use the same
normalized text—either normalize page.markdownContents before writing
llms-full.txt or replace usages of page.markdownContents with the
already-normalized content variable when writing llms-full.txt (ensure the same
regex replace(/\(https?:\/\/[^)]*—[^)]*\)/g, ...) is applied and keep the
logger.debug('Fixed em dashes in URLs') call to indicate normalization).
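
One way to address this comment is to route both outputs through a single normalization helper, sketched below. The function names `normalizeUrlDashes` and `buildOutputs`, the page objects, and the blank-line separator are hypothetical. Only the `markdownContents` property, the regex, and the two output files are taken from the review text.

```javascript
// Hypothetical helper: restores "--" in markdown link targets that contain an em dash.
function normalizeUrlDashes(markdown) {
  return markdown.replace(/\(https?:\/\/[^)]*\u2014[^)]*\)/g, (match) =>
    match.replace(/\u2014/g, '--')
  );
}

// Hypothetical builder: applies the same normalization to the text for
// llms.txt (from the llms page) and llms-full.txt (aggregated pages).
function buildOutputs(llmsPage, latestPages) {
  const llmsTxt = normalizeUrlDashes(String(llmsPage.markdownContents));
  const llmsFullTxt = latestPages
    .map((page) => normalizeUrlDashes(String(page.markdownContents)))
    .join('\n\n'); // separator is an assumption
  return { llmsTxt, llmsFullTxt };
}

// Invented sample pages that both carry the damaged hostname.
const host = 'deploy-preview-159\u2014redpanda-documentation.netlify.app';
const outputs = buildOutputs(
  { markdownContents: `[Index](https://${host}/)` },
  [{ markdownContents: `[Page A](https://${host}/a)` }]
);
console.log(outputs.llmsTxt);
console.log(outputs.llmsFullTxt);
```

With a shared helper, a later change to the pattern applies to both files at once, so the two outputs cannot drift apart.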


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e844adc and e11fdb0.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (5)
  • extensions/convert-llms-to-txt.js
  • local-antora-playbook.yml
  • package.json
  • preview/home/antora.yml
  • preview/home/modules/ROOT/pages/llms.adoc

@JakeSCahill JakeSCahill merged commit 7ea4622 into main Mar 3, 2026
18 checks passed
@JakeSCahill JakeSCahill deleted the add-llms-txt-converter branch March 3, 2026 13:27