AI optimization: frontmatter exports and component-specific full.txt files by JakeSCahill · Pull Request #178 · redpanda-data/docs-extensions-and-macros

JakeSCahill · 2026-03-21T19:50:11Z

Preview: https://deploy-preview-178--docs-extensions-and-macros.netlify.app/preview/test.md

This pull request introduces significant improvements to the documentation export process for AI consumption and adds new metadata to documentation pages. The main changes include adding Git commit dates as page attributes, enhancing markdown exports with YAML frontmatter, generating new component-specific documentation exports, and improving file naming for markdown outputs. These updates improve traceability, AI-friendliness, and usability of the documentation exports.

Documentation metadata and export enhancements:

Added a new extension add-git-dates.js that injects git-created-date and git-modified-date attributes into each documentation page by extracting commit dates from Git. This metadata is now available for use in templates and exports. [1] [2] [3]
Markdown exports now include YAML frontmatter generated from a curated allowlist of AsciiDoc attributes, including the new Git dates, improving the context for AI tools and downstream consumers. [1] [2]
Each markdown file now starts with a canonical source reference and an AI-specific usage note, with improved hints for aggregated documentation files.
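
A minimal sketch of allowlist-based frontmatter generation, to illustrate the idea described above. The allowlist contents and the function body are assumptions for illustration, not the PR's code. Per a later commit, the extension serializes with js-yaml; this sketch instead JSON-encodes each value (JSON strings and arrays are also valid YAML flow syntax) so that it runs without dependencies.

```javascript
// Hypothetical allowlist; the real list lives in convert-to-markdown.js.
const ALLOWED_ATTRIBUTES = [
  'title',
  'description',
  'categories',
  'git-created-date',
  'git-modified-date',
];

// Build a YAML frontmatter block from allowlisted attributes only.
// Anything not named in `allowed` is dropped, so new internal attributes
// stay private by default.
function generateFrontmatter(attributes, allowed = ALLOWED_ATTRIBUTES) {
  const lines = [];
  for (const key of allowed) {
    const value = attributes[key];
    if (value === undefined || value === null || value === '') continue;
    // JSON encoding quotes strings, which keeps characters such as
    // @, *, & and ! from being read as YAML syntax.
    lines.push(`${key}: ${JSON.stringify(value)}`);
  }
  return lines.length ? `---\n${lines.join('\n')}\n---\n\n` : '';
}

const frontmatter = generateFrontmatter({
  title: 'Rolling upgrade',
  categories: ['Upgrades', 'Management'],
  'git-modified-date': '2024-02-26',
  'page-internal-flag': 'do-not-export', // not allowlisted, so not exported
});
console.log(frontmatter);
```

The output order follows the allowlist rather than the attribute order on the page, which keeps exports stable between builds.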

AI-friendly export and file structure improvements:

The documentation export extension (convert-llms-to-txt.js) now generates not only llms.txt and llms-full.txt, but also component-specific *-full.txt files for each product/component, providing more focused exports for AI agents and users. [1] [2] [3]
Markdown file naming was improved: directory-style HTML outputs like /docs/page/index.html are now exported as /docs/page.md instead of /docs/page/index.md, making the exports more intuitive and compatible with AI tools. [1] [2] [3]
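
The filename mapping above could look roughly like the following. This is a sketch, not the PR's helper: the name toMarkdownUrl() and the root rule (/ -> /index.md) come from later commits in this thread, but the body is an assumption and ignores query strings and fragments.

```javascript
// Convert an HTML publish URL to the URL of its markdown export.
// Directory-style URLs map to a sibling .md file rather than to index.md.
function toMarkdownUrl(url) {
  let path = url;
  if (path.endsWith('/index.html')) {
    // Keep the trailing slash so the directory rules below apply.
    path = path.slice(0, -'index.html'.length);
  } else if (path.endsWith('.html')) {
    return path.slice(0, -'.html'.length) + '.md';
  }
  if (path === '' || path === '/') return '/index.md'; // site root edge case
  if (path.endsWith('/')) return path.slice(0, -1) + '.md';
  return path; // already a markdown URL or some other asset
}

for (const url of ['/', '/docs/page/index.html', '/docs/page/', '/docs/page.html']) {
  console.log(`${url} -> ${toMarkdownUrl(url)}`);
}
```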

Other improvements:

Updated the documentation and comments to clarify new features and export formats. [1] [2]
Bumped the package version to 4.15.3 to reflect these new capabilities.

These changes collectively make the documentation exports more traceable, structured, and AI-friendly, while providing users and downstream tools with richer metadata and more flexible export options.

…oper filenames

## Changes

### convert-to-markdown.js

- Add generateFrontmatter() function to convert AsciiDoc attributes to YAML frontmatter
- Include page metadata (title, description, categories, etc.) in markdown files
- Filter out internal Antora attributes not useful for AI consumption
- Fix markdown filename generation: /page.md instead of /page/index.md
- Update canonical URLs to match new filename format
- Update AI-friendly notes to reference component-specific exports

### convert-llms-to-txt.js

- Generate component-specific full.txt files (e.g., redpanda-full.txt, cloud-full.txt)
- Add AI-Friendly Documentation Formats section to llms-full.txt
- List all component-specific exports with descriptions
- Update individual page headers to mention component exports
- Update extension documentation header

## Benefits for AI Tools

- YAML frontmatter preserves page attributes for better context
- Proper filenames (no more index.md confusion) improve discoverability
- Component-specific exports enable focused queries
- Complete site export provides comprehensive access
- Individual pages have meaningful URLs

## New Extension

Create enhance-robots-txt.js to enhance Antora's default robots.txt with AI-friendly crawler permissions.

### How It Works

- Runs in beforePublish phase after Antora generates robots.txt
- Only enhances "allow" version (production builds)
- Leaves "disallow" version unchanged (preview builds)
- Adds explicit Allow directives for AI crawlers:
  - OpenAI (GPTBot, ChatGPT-User)
  - Anthropic (Claude-Web, anthropic-ai)
  - Perplexity (Perplexity, PerplexityBot)
  - Google AI (Google-Extended, GoogleOther)
  - Common Crawl (CCBot)
  - Additional platforms (cohere-ai, Omgilibot, FacebookBot)
- Includes sitemap reference
- Adds crawl-delay directive

### Benefits

- Explicit welcome for AI crawlers improves discoverability
- Better than Antora's basic "Allow: /" directive
- Maintains preview build protection (no changes to disallow)
- Single source of truth for AI crawler permissions

### Usage

Add to Antora playbook extensions list:

    - require: './extensions/enhance-robots-txt'

## Changes

Use Antora's built-in robots feature instead of custom extension:

- Add robots: | with custom multi-line content in site config
- Explicitly allow common AI crawlers:
  - OpenAI (GPTBot, ChatGPT-User)
  - Anthropic (Claude-Web, anthropic-ai)
  - Perplexity (Perplexity, PerplexityBot)
  - Google AI (Google-Extended, GoogleOther)
  - Common Crawl (CCBot)
  - Additional platforms (cohere-ai, Omgilibot, FacebookBot)
- Include sitemap reference (relative path works for all deployments)
- Add crawl-delay directive
- Remove enhance-robots-txt extension (not needed)

## Benefits

- Simpler solution using Antora built-in feature
- No custom extension required
- Works across all deployment environments
- Relative sitemap path adapts to any domain
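
For illustration, the robots.txt content described above can be generated with a small helper. This is a sketch under assumptions: buildRobotsTxt() is a hypothetical name, the agent list is a subset of the crawlers named in the commit message, and the PR itself writes the content by hand in the playbook. The sketch repeats Crawl-delay inside every group, because a robots.txt directive applies only to the User-agent group that contains it.

```javascript
// Subset of the crawlers named in the commit message above.
const AI_AGENTS = ['GPTBot', 'ChatGPT-User', 'Claude-Web', 'anthropic-ai', 'PerplexityBot', 'CCBot'];

// Emit one group per user agent (wildcard first). Each group carries its
// own Allow and Crawl-delay lines so the delay is scoped to that agent.
function buildRobotsTxt(agents, crawlDelay) {
  const groups = ['*', ...agents].map((agent) =>
    [`User-agent: ${agent}`, 'Allow: /', `Crawl-delay: ${crawlDelay}`].join('\n')
  );
  return groups.join('\n\n') + '\n';
}

console.log(buildRobotsTxt(AI_AGENTS, 1));
```

Crawl-delay is not part of the Robots Exclusion Protocol standard, so some crawlers may ignore it regardless of where it is placed.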

…tly)

netlify · 2026-03-21T19:50:17Z

✅ Deploy Preview for docs-extensions-and-macros ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `7d1cd02` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/docs-extensions-and-macros/deploys/69ca5ef9f862890008e55731 |
| 😎 Deploy Preview | https://deploy-preview-178--docs-extensions-and-macros.netlify.app |

coderabbitai · 2026-03-21T19:50:28Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 721088a2-8fe8-4ed0-92fe-c48156d3f61e


Walkthrough

The PR extends Antora documentation export functionality to better serve AI systems. The convert-llms-to-txt extension now generates per-component full text files alongside the aggregated output, includes AI documentation format listings, and appends build timestamps. The convert-to-markdown extension adds YAML frontmatter generation from page attributes, improves canonical URL path normalization for markdown compatibility, and references AI-friendly documentation indices. The playbook configuration adds robots.txt rules explicitly allowing AI crawler access with rate limiting.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Fix URL encoding in llms.txt (em dashes to hyphens) #174: Introduces the convert-llms-to-txt extension that this PR substantially extends with per-component file generation and timestamping logic.
DOC-1576 Convert docs to Markdown and publish alongside HTML #148: Modifies the same convert-to-markdown extension to handle HTML-to-Markdown conversion and path/link handling that this PR builds upon with frontmatter and AI-friendly hints.
Add convert-llms-to-txt extension for agent-friendly docs #173: Originally introduced the convert-llms-to-txt extension that this PR significantly expands with new output grouping and metadata appending features.

Suggested reviewers

paulohtb6
Feediver1

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main changes: YAML frontmatter additions and component-specific full.txt exports are the primary focus of the changeset. |
| Description check | ✅ Passed | The pull request description accurately describes the changes made: adding Git commit dates, enhancing markdown exports with YAML frontmatter, generating component-specific documentation exports, and improving AI-friendliness through robots.txt configuration. |


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

extensions/convert-to-markdown.js (1)
28-42: Prefer an allowlist for exported frontmatter fields.

This publishes every merged AsciiDoc attribute except a short skip list. That makes future build-only or internal Antora attributes public by default and is easy to regress as new keys appear. It would be safer to opt in only the metadata you want to expose, such as title, navtitle, description, and categories.

Also applies to: 45-62
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extensions/convert-to-markdown.js` around lines 28 - 42, Replace the current
skip-based export (skipAttributes array) with an explicit allowlist of
frontmatter fields to export: create an allowedAttributes array containing the
exact keys to publish (e.g., 'title', 'navtitle', 'description', 'categories')
and change the code that currently references skipAttributes to only include
keys present in allowedAttributes when building frontmatter. Update both the
block that defines skipAttributes and the other spot that merges attributes (the
second export/merge section that currently uses the same skip logic) so all
exported metadata is explicitly opted-in via allowedAttributes.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@extensions/convert-llms-to-txt.js`:
- Around line 243-248: The exported page URLs still use page.pub.url (HTML
paths); create and use a normalization helper (e.g., toMarkdownUrl) that
converts root '/', trailing slashes, '/index.html' and '.html' into the
corresponding '.md' paths, and replace usages of page.pub.url in
componentPages.forEach (the block that writes the componentContent URL line) and
the earlier llms-full.txt page block so the code writes normalized Markdown URLs
before emitting the "**URL**" line.

In `@extensions/convert-to-markdown.js`:
- Around line 427-435: When canonicalUrl exists the code currently prepends the
HTML comment markers before the YAML frontmatter so frontmatter-aware consumers
miss the metadata; update the assembly logic that builds markdown (the block
using canonicalUrl, componentName and urlHint) to place frontmatter first, then
the HTML comment lines (Source and urlHint) so that markdown becomes
`${frontmatter}${'<!-- Source: ... -->\n' + urlHint}\n\n${markdown}` — ensure
you still include the Source comment and urlHint but move frontmatter to the
very top before any HTML comments.

In `@local-antora-playbook.yml`:
- Line 51: The Crawl-delay: 1 line is currently placed after the FacebookBot
group so it only applies to that User-agent; to throttle other crawlers move the
Crawl-delay: 1 setting into each User-agent block you want to affect (e.g.,
under the FacebookBot block and under the wildcard User-agent "*" block) or
duplicate it inside each specific User-agent stanza instead of leaving it
outside the groups.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ac309e20-cb2f-43f9-8ace-a7be65002e6c

📥 Commits

Reviewing files that changed from the base of the PR and between 8484c74 and 1f59618.

📒 Files selected for processing (3)

extensions/convert-llms-to-txt.js
extensions/convert-to-markdown.js
local-antora-playbook.yml


- convert-llms-to-txt.js: Add toMarkdownUrl() helper and convert all page URLs from HTML paths to markdown paths (.md extension)
- convert-to-markdown.js: Move YAML frontmatter before HTML comments so frontmatter-aware parsers see metadata first
- convert-to-markdown.js: Replace skipAttributes with allowedAttributes allowlist for explicit opt-in to frontmatter fields
- local-antora-playbook.yml: Move Crawl-delay inside wildcard User-agent block for proper robots.txt syntax

…ured data

- Created add-git-dates extension to extract file creation and modification dates from Git history
- Uses git log with --follow to track file renames
- Adds git-created-date and git-modified-date attributes in YYYY-MM-DD format
- Only includes page-beta-text in frontmatter when page-beta is true
- Updated convert-to-markdown to include Git date attributes in allowlist
- Configured extension to run in pagesComposed event before markdown conversion

Performance: Adds ~8 seconds to build time for processing 4127 pages (3m 12s total)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Changed extension to listen to 'documentsConverted' instead of 'pagesComposed' to ensure Git dates are available before template rendering. This fixes the issue where structured data (JSON-LD) was showing today's date instead of actual Git commit dates. The UI Handlebars helpers query contentCatalog during template rendering to access page.asciidoc.attributes, so the extension must add these attributes before that phase. Also updated test page to document the Git dates feature. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Updated extension documentation to clarify that Git dates are used for:

- Structured data (JSON-LD datePublished/dateModified)
- Markdown frontmatter export

Removed experimental AsciiDoc extension approach as the dates don't need to be accessible as AsciiDoc attributes - the important use cases (SEO structured data and AI crawler exports) work correctly via Handlebars helpers querying contentCatalog.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

micheleRP

Overall this is a solid set of improvements. A few issues worth addressing before merge — one security concern and a couple of correctness/format bugs.

extensions/add-git-dates.js

extensions/convert-to-markdown.js

extensions/convert-llms-to-txt.js


Fixed all issues raised by Michele in PR #178:

1. **Security: Shell injection vulnerability** (add-git-dates.js)
   - Replaced execSync with execFileSync to avoid shell interpretation
   - Use argument arrays instead of string interpolation for git commands
   - Added --reverse flag to avoid need for shell piping
2. **Code quality: YAML serializer** (convert-to-markdown.js)
   - Replaced hand-rolled YAML serializer with js-yaml library
   - Proper escaping of special characters (@, *, &, !, etc.)
   - Correct handling of arrays and complex types
   - Removed duplicate 'doctitle' from allowlist (already set as 'title')
3. **Code quality: URL conversion** (convert-to-markdown.js, convert-llms-to-txt.js)
   - Extracted toMarkdownUrl() to shared utility (extension-utils/url-utils.js)
   - Consistent URL conversion logic across extensions
   - Handles root path edge case (/ -> /index.md)
4. **Code quality: Invalid HTML in plain text** (convert-llms-to-txt.js)
   - Removed HTML comment timestamp from llms.txt output
   - File contents already change per build, timestamp adds no value

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
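
A sketch of the execFileSync pattern from item 1, for readers unfamiliar with the difference. The function names and the exact git flags are assumptions, not the code in add-git-dates.js. The point is that execFileSync passes each array element to git as a single argument without invoking a shell, so a file path containing spaces, quotes, or `;` cannot be interpreted as a command.

```javascript
const { execFileSync } = require('node:child_process');

// Build the argument array for `git log`. The `--` separator makes git
// treat everything after it as a path, even if it starts with a dash.
function gitLogArgs(repoDir, filePath, { reverse = false } = {}) {
  const args = ['-C', repoDir, 'log', '--date=short', '--format=%cd'];
  if (reverse) args.push('--reverse'); // oldest commit first
  args.push('--', filePath);
  return args;
}

// Return the date (YYYY-MM-DD) of the oldest commit touching the file,
// or null if the file has no history. Reading the first line of the
// reversed log replaces a shell pipeline such as `| tail -1`.
function firstCommitDate(repoDir, filePath) {
  const out = execFileSync('git', gitLogArgs(repoDir, filePath, { reverse: true }), {
    encoding: 'utf8',
  });
  return out.split('\n')[0] || null;
}

// The hostile file name stays a single, inert argument.
console.log(gitLogArgs('/repo', 'modules/a; rm -rf b.adoc', { reverse: true }));
```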

micheleRP

thank you @JakeSCahill!

…ttributes

Enhancements:

- Fix version fields to show actual version (e.g., "24.3", "master") instead of boolean "true"
- Add user-friendly support-status field (supported/nearing end-of-life/past end-of-life)
- Add user-friendly release-status field for beta versions
- Add YAML comments explaining EOL (End-of-Life) and beta fields
- Add support for personas attribute
- Add support for learning-objective-* attributes (learning-objective-1, -2, -3, etc.)
- Change page-role to page-topic-type (correct attribute name)

These changes make the markdown exports more useful for AI consumption by:

- Providing actual version numbers instead of booleans
- Using human-readable lifecycle status instead of technical flags
- Supporting important content metadata (personas, learning objectives)
- Adding helpful inline documentation via YAML comments

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

…e naming

Performance improvements:

- Use parallel async execution with concurrency limit (20) - 4.5x faster
- Remove --follow flag which caused 36-52% failure rate
- Process both git log calls per file in parallel

Bug fixes:

- Add page- prefix to attributes so they appear in page.attributes for UI templates
- Update convert-to-markdown allowlist to use new attribute names

Benchmarks (500 files):

- Before: ~32s, 48-64% success rate
- After: ~7s, 100% success rate

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Include version attributes from antora.yml that are useful for AI agents:

- full-version: Redpanda version (e.g., 25.3.5) - ROOT component only
- latest-redpanda-tag: Latest Redpanda release tag
- latest-console-tag: Latest Console release tag
- latest-operator-version: Latest Kubernetes operator version
- latest-connect-version: Latest Redpanda Connect version

Added component exclusion logic to skip full-version for redpanda-connect since it uses latest-connect-version instead.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Remove full-version from allowlist (Connect shouldn't have it)
- Remove component exclusion logic (no longer needed)
- Keep latest-redpanda-tag which serves the same purpose

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Previously the extension only processed local repos with worktrees, skipping remote content sources (4122 pages) because Antora caches remote repos as bare Git repositories without worktrees.

Changes:

- Support both worktree (-C) and bare repo (--git-dir) modes
- Check for either origin.worktree or origin.gitdir
- Pass isBareRepo flag to getGitDates function
- Update docs to explain bare repo support

This fixes git dates for all remote content sources in the playbook. Now processes 3812+ pages instead of only 6 local pages.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add git-full-clone extension to enable full history for remote repos
- Optimize add-git-dates to walk log once per repo (~40x faster)
- Decode HTML entities in markdown export titles (What's New vs What’s New)
- Add production playbook with full clone configuration

Performance: ~42s for 4127 pages with full git history (vs 1.3s shallow but inaccurate dates)

Build time: 2:17 total with git dates enabled

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

- Add netlify.toml with prod playbook and cache configuration
- Add configure-cache-dir extension to use ANTORA_CACHE_DIR env var
- Update prod playbook to use remote UI bundle
- Configure Netlify to cache .cache/antora directory between builds

This enables Netlify's built-in caching to preserve full git clones, avoiding re-cloning repositories on each build and reducing build time from ~2:17 to potentially under 1 minute after first build.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

CRITICAL FIXES:

1. Compare trees between commits to find actual modifications (not just files that exist in tree)
2. Group pages by BOTH gitdir AND ref to handle multiple branches per repo correctly

Bugs fixed:

- Was setting ALL files to commit date when they existed in tree
- Was using first page's ref for all pages in same repo (mixing v/23.3, v/24.1, main dates)

Performance: 14.5s for 4128 pages across 12 branches

Accuracy: Now matches GitHub API exactly ✓

Verified:

- rolling-upgrade.adoc v/23.3: modified 2024-02-26 (matches GitHub)
- Local files: created 2023-07-06 (accurate)
- Remote files: per-branch dates (accurate)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
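
The second fix (grouping by both gitdir and ref) can be sketched as follows. This is an illustration rather than the extension's code: the function name is hypothetical, and reading the ref from page.src.origin.refname is an assumption about the page objects. origin.gitdir is the field named in the earlier bare-repo commit.

```javascript
// Group pages so that each (repository, ref) pair gets its own log walk.
// Keying on gitdir alone would mix dates from different branches of the
// same repository, which is the bug described above.
function groupPagesByRepoAndRef(pages) {
  const groups = new Map();
  for (const page of pages) {
    const origin = page.src && page.src.origin;
    if (!origin || !origin.gitdir) continue; // not backed by a git repo
    // NUL cannot appear in a path or ref name, so the key is unambiguous.
    const key = `${origin.gitdir}\u0000${origin.refname}`;
    if (!groups.has(key)) {
      groups.set(key, { gitdir: origin.gitdir, ref: origin.refname, pages: [] });
    }
    groups.get(key).pages.push(page);
  }
  return [...groups.values()];
}

const grouped = groupPagesByRepoAndRef([
  { src: { path: 'a.adoc', origin: { gitdir: '/cache/docs.git', refname: 'v/23.3' } } },
  { src: { path: 'a.adoc', origin: { gitdir: '/cache/docs.git', refname: 'main' } } },
  { src: { path: 'b.adoc', origin: { gitdir: '/cache/docs.git', refname: 'main' } } },
]);
console.log(grouped.map((g) => `${g.ref}: ${g.pages.length} page(s)`));
```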

Test configuration with:

- Local UI bundle for accurate testing
- Main branch only for faster builds
- All git dates extensions enabled

Useful for verifying git dates accuracy against GitHub API.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

FEATURES:

- Auto-extract Q&A from sections (writers only provide anchors)
- Manual override for custom question/answer text
- Mixed usage: combine auto and manual FAQs
- Zero content duplication

USAGE (simple - recommended):

    :page-faq-1-anchor: #installation
    :page-faq-2-anchor: #requirements

    [#installation]
    == How do I install Redpanda?
    Content here...

Extension extracts:

- Question: Heading text
- Answer: Section content
- URL: page URL + anchor

USAGE (manual override):

    :page-faq-1-question: Custom question
    :page-faq-1-answer: Custom answer
    :page-faq-1-anchor: #optional

GENERATED OUTPUT:

- schema.org FAQPage JSON-LD in <head>
- Google rich results compatible
- SEO optimized

FILES:

- extensions/add-faq-structured-data.js (new)
- extensions/README-FAQ.md (new)
- package.json (export added)
- test-git-dates-playbook.yml (extension enabled)

NOTE: Requires updated docs-ui with head-structured-data.hbs change

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Removed auto-extraction complexity - writers now provide question and answer directly as attributes with optional anchor for deep linking.

USAGE:

    :page-faq-1-question: How do I install Redpanda?
    :page-faq-1-answer: Download and run the installer. See installation guide.
    :page-faq-1-anchor: #installation

WHY SIMPLIFIED:

- Auto-extraction from sections was complex and fragile
- Different block types (headings, examples, sidebars) had edge cases
- Content extraction logic required cheerio parsing and tree comparison
- Manual entry is explicit, predictable, and flexible

BENEFITS:

- Simple: Just question + answer attributes
- Flexible: Writers can reference prose or write standalone FAQs
- Predictable: No magic extraction, what you write is what you get
- Deep linking: Optional anchors to relevant sections

UPDATED:

- extensions/add-faq-structured-data.js (simplified)
- extensions/README-FAQ.md (updated docs)
- extensions/REFERENCE.adoc (added FAQ + git dates docs)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
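
As an illustration of the simplified approach, the numbered attributes map onto a schema.org FAQPage object roughly like this. The function name, the stop-at-first-gap rule, and attaching the deep link as a `url` property are assumptions for the sketch; the attribute names are the ones shown in the usage above.

```javascript
// Collect page-faq-N-question / page-faq-N-answer attributes (N = 1, 2, ...)
// into a schema.org FAQPage object. Numbering is assumed to be contiguous,
// so collection stops at the first missing question or answer.
function buildFaqJsonLd(attributes, pageUrl) {
  const entries = [];
  for (let n = 1; ; n++) {
    const question = attributes[`page-faq-${n}-question`];
    const answer = attributes[`page-faq-${n}-answer`];
    if (!question || !answer) break;
    const entry = {
      '@type': 'Question',
      name: question,
      acceptedAnswer: { '@type': 'Answer', text: answer },
    };
    const anchor = attributes[`page-faq-${n}-anchor`];
    if (anchor) entry.url = pageUrl + anchor; // optional deep link
    entries.push(entry);
  }
  if (entries.length === 0) return null;
  return { '@context': 'https://schema.org', '@type': 'FAQPage', mainEntity: entries };
}

const faq = buildFaqJsonLd(
  {
    'page-faq-1-question': 'How do I install Redpanda?',
    'page-faq-1-answer': 'Download and run the installer. See installation guide.',
    'page-faq-1-anchor': '#installation',
  },
  'https://example.com/get-started/' // placeholder page URL
);
console.log(JSON.stringify(faq, null, 2));
```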

ADDED:

- llms.adoc: Comprehensive overview of Redpanda documentation
  - About documentation structure and components
  - AI-optimized documentation access methods
  - MCP server information (docs.redpanda.com/mcp)
  - Setup instructions for Claude Code integration
  - Static export formats (llms-full.txt, component-full.txt)
  - Key topics organized by component
  - Metadata standards and features
- sitemap.adoc: Complete documentation sitemap
  - All components (ROOT, cloud, redpanda-connect, labs, api, home)
  - Version structure and access patterns
  - Topic organization by user journey and role
  - Navigation aids and external resources
  - Documentation source repositories
- LLMS-TXT-SETUP.md: Setup and reference guide
  - How to configure llms.txt generation
  - MCP server tool descriptions
  - Extension flow explanation
  - Testing instructions
  - Template locations

MCP SERVER DETAILS:

- URL: https://docs.redpanda.com/mcp
- Setup: npx doc-tools setup-mcp
- Tools: Generate docs, check versions, query structure
- Integration: Works with Claude Code for documentation automation

USAGE:

These files power the AI-optimized documentation at:

- /llms.txt: Curated overview (this content)
- /llms-full.txt: Complete export
- /sitemap.md: Documentation structure
- /mcp: Interactive MCP server

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

- Creates human-readable markdown versions of sitemap.xml files
- Organizes URLs by component/path for easy browsing
- Includes page metadata (modified dates, priority)
- AI-friendly format for LLM consumption
- Runs automatically on beforePublish event

Dependencies:

- Added xml2js for XML parsing

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Improvements:

- Find all sitemap files (sitemap.xml, sitemap-0.xml, sitemap-1.xml, etc.)
- Generate individual markdown files for each sitemap
- Create master sitemap-all.md combining all pages from all sitemaps
- Sort sitemaps for consistent processing order

This handles Antora's typical multi-sitemap output where sites are split into multiple sitemap files (usually 1000 URLs per file) plus a sitemap index.

The master sitemap-all.md provides a single comprehensive view of all documentation pages, ideal for AI agents and documentation planning.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Fixes:

- Changed event from 'published' to 'sitePublished' (correct Antora 3.1 event)
- Updated regex to match ALL sitemap files (sitemap-*.xml)
  - Previously only matched sitemap-0.xml, sitemap-1.xml (numeric)
  - Now matches sitemap-ROOT.xml, sitemap-home.xml, etc. (all components)
- Added debug logging

Results:

- Generates 9 individual markdown files (one per XML sitemap)
- Creates master sitemap-all.md combining 4,134 pages
- Works with Antora's component-specific sitemap architecture

Tested with local build showing:

- sitemap-ROOT.md: 3,022 pages
- sitemap-redpanda-cloud.md: 661 pages
- sitemap-redpanda-connect.md: 400 pages
- sitemap-all.md: 4,134 total pages

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

In the sitemap index markdown (sitemap.md), the links now point to the markdown versions of sub-sitemaps instead of the XML files.

Before: [sitemap-home.xml](https://.../sitemap-home.xml)

After: [sitemap-home.xml](https://.../sitemap-home.md)

This provides a better user experience - clicking links in the sitemap index now takes you to the human-readable markdown versions.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Changes:

- Second-level headings now use sentence case:
  * "## Sitemap index" (was "Sitemap Index")
  * "## Source sitemaps" (was "Source Sitemaps")
- Removed (s) constructs:
  * "7 sub-sitemaps" (was "sub-sitemap(s)")
  * "8 sitemaps" (was "sitemap(s)")
  * Uses proper pluralization logic
- Added number formatting with commas:
  * "Total pages: 4,126" (was "4126")
  * "Total pages: 3,022" (was "3022")

This improves readability and follows documentation style standards.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
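
The pluralization and comma formatting described above can be done with a small helper along these lines. formatCount() is a hypothetical name for illustration, not a function from the extension.

```javascript
// Format a count with thousands separators and pick the singular or
// plural noun, avoiding "(s)" constructs in generated text.
function formatCount(n, singular, plural = `${singular}s`) {
  return `${n.toLocaleString('en-US')} ${n === 1 ? singular : plural}`;
}

console.log(`Total: ${formatCount(4126, 'page')}`);
console.log(formatCount(7, 'sub-sitemap'));
console.log(formatCount(1, 'sitemap'));
```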

Major refactoring to follow Antora best practices:

Changes:

- Use siteCatalog.getFiles() instead of fs to find sitemaps
- Use siteCatalog.addFile() instead of fs.writeFileSync()
- Read from sitemapFile.contents instead of filesystem
- Changed from sitePublished to beforePublish event

Benefits:

- Proper integration with Antora's publication lifecycle
- Files tracked in Antora's catalog system
- No direct filesystem operations
- Follows same pattern as convert-llms-to-txt extension

This is the correct Antora extension pattern for adding files during the build process.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
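
A stripped-down sketch of the catalog-based pattern described above. It is not the extension's implementation: the real extension parses XML with xml2js and produces richer markdown, whereas this sketch pulls `<loc>` values out with a regular expression to stay dependency-free. The calls it relies on (`this.on('beforePublish', ...)`, `siteCatalog.getFiles()`, `siteCatalog.addFile()`) are the ones named in the commit message.

```javascript
// Antora extension entry point. Antora calls register() with `this` bound
// to the generator context, so it must be a regular function.
function register() {
  this.on('beforePublish', ({ siteCatalog }) => {
    // Snapshot the sitemap files first, since files are added below.
    const sitemaps = siteCatalog
      .getFiles()
      .filter((file) => file.out && /^sitemap(-.+)?\.xml$/.test(file.out.path));

    for (const sitemap of sitemaps) {
      const xml = sitemap.contents.toString();
      // Simplification: regex extraction instead of a real XML parser.
      const urls = [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((m) => m[1]);
      const markdown = ['# Sitemap', '', ...urls.map((url) => `- ${url}`), ''].join('\n');
      // Adding to the catalog lets Antora publish the file with the site.
      siteCatalog.addFile({
        contents: Buffer.from(markdown),
        out: { path: sitemap.out.path.replace(/\.xml$/, '.md') },
      });
    }
  });
}

module.exports = { register };
```

Because the generated file goes through the catalog, it is written by whichever publish providers the playbook configures, with no direct filesystem access in the extension.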

Consolidation:

- Moved README-FAQ.md content into REFERENCE.adoc
- Moved README-SITEMAP-MARKDOWN.md content into REFERENCE.adoc
- Moved LLMS-TXT-SETUP.md content into REFERENCE.adoc
- Added comprehensive sections for convert-to-markdown, convert-llms-to-txt, and convert-sitemap-to-markdown extensions

Removed unnecessary files:

- prod-antora-playbook.yml (testing only, not needed in extensions repo)
- test-git-dates-playbook.yml (testing only)
- configure-cache-dir.js (superfluous, Antora has built-in cache)
- README-FAQ.md (consolidated into REFERENCE.adoc)
- README-SITEMAP-MARKDOWN.md (consolidated into REFERENCE.adoc)
- LLMS-TXT-SETUP.md (consolidated into REFERENCE.adoc)

Result: All extension documentation is now in a single REFERENCE.adoc file following the existing pattern. Production playbooks should be in docs-site repo, not here.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Implements Hakim's suggestion to add llms.txt to sitemaps:

1. Creates sitemap-llms.xml with all llms .txt exports:
   - llms.txt (curated overview)
   - llms-full.txt (complete export)
   - Component-specific exports (ROOT-full.txt, cloud-full.txt, etc.)
2. Adds sitemap-llms.xml reference to main sitemap.xml index
3. sitemap-llms.md automatically generated by convert-sitemap-to-markdown

Implementation:

- Generates sitemap-llms.xml in beforePublish after llms files created
- Finds all .txt files in siteCatalog ending with -full.txt or llms.txt
- Updates main sitemap index by editing XML to add new entry
- Avoids tying llms files to component-specific sitemaps

This makes all AI-optimized exports discoverable via sitemap.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

- Update sitemap-llms.xml to use actual git modified dates for each file
- Update component sitemaps to use git dates where available
- Add consistent <lastmod> to all sitemap entries in sitemap index
- Build map of URL -> git date from contentCatalog for efficient lookups

Each llms export now shows when its content was actually last modified:

- llms.txt: uses llms.adoc git modified date
- llms-full.txt: uses most recent date from all pages
- component-full.txt: uses most recent date from that component

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
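
The "most recent date" rule for the aggregated exports can be sketched as below. The input shape (`component`, `modified`) and both function names are assumptions for illustration. The comparison relies on the YYYY-MM-DD format mentioned earlier in this thread, which sorts correctly as plain strings.

```javascript
// Latest git modified date per component, for component-full.txt entries.
function latestModifiedByComponent(pages) {
  const latest = {};
  for (const { component, modified } of pages) {
    if (!modified) continue; // page without a git date
    if (!latest[component] || modified > latest[component]) {
      latest[component] = modified;
    }
  }
  return latest;
}

// Latest date across all components, for the llms-full.txt entry.
function latestOverall(latestByComponent) {
  const dates = Object.values(latestByComponent).sort();
  return dates.length ? dates[dates.length - 1] : null;
}

const perComponent = latestModifiedByComponent([
  { component: 'ROOT', modified: '2024-02-26' },
  { component: 'ROOT', modified: '2023-07-06' },
  { component: 'redpanda-cloud', modified: '2025-01-15' },
  { component: 'redpanda-cloud', modified: undefined },
]);
console.log(perComponent, latestOverall(perComponent));
```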

- Log which repos are skipped due to missing gitdir
- Log which repos are being processed successfully
- Will help identify why cloud-docs and rp-connect-docs aren't getting git dates

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Use INFO level instead of DEBUG so logs show up in build output. This will help diagnose why cloud-docs and rp-connect-docs aren't getting git dates.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Problem: Despite setting git.depth=0 in the playbook, Antora was still creating shallow clones for some repos (cloud-docs, rp-connect-docs), resulting in only 1 commit being available for git date extraction.

Solution: Implement a two-phase approach:

1. Phase 1: Set depth=0 in playbook (best effort)
2. Phase 2: After content aggregation, detect any repos with a shallow file and run 'git fetch --unshallow' to convert them to full clones

Results:

- cloud-docs: Now walking 511 commits (was 1)
- rp-connect-docs: Now walking 396 commits (was 1)
- All sitemaps now show accurate git dates instead of build timestamps
- Git dates processed for 4125 pages in 14.3s (3.5ms/page)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Improvements for production readiness:

- Add timeout protection (default 60s per repo, configurable via unshallowTimeout)
- Add skipUnshallow config option for air-gapped environments
- Add timing logs to monitor unshallow performance
- Better error messages distinguishing timeouts from other failures
- Document production considerations in code comments

Configuration example:

    antora:
      extensions:
        - require: '@redpanda-data/docs-extensions-and-macros/extensions/git-full-clone'
          skipUnshallow: false
          unshallowTimeout: 120000 # 2 minutes for very large repos

These safeguards ensure the extension won't hang or break builds even if:

- Repos grow to 50k+ commits
- Network is slow or intermittent
- Running in air-gapped CI/CD environment

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
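
For context, the detect-then-unshallow step with a timeout might look like the sketch below. The function names are hypothetical and this is not the code in git-full-clone. It relies on two facts about git: a shallow clone records its boundary commits in a file named `shallow` inside the git directory, and `git fetch --unshallow` converts such a clone to a full one. The 60-second default mirrors the value stated above.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// A repository is shallow when its git directory contains a `shallow` file.
function isShallowRepo(gitdir) {
  return fs.existsSync(path.join(gitdir, 'shallow'));
}

// Fetch the full history for a shallow repo. Returns false when nothing
// needed doing. execFileSync throws if git fails or the timeout elapses,
// so callers can catch the error and continue the build without git dates.
function unshallow(gitdir, timeoutMs = 60000) {
  if (!isShallowRepo(gitdir)) return false;
  execFileSync('git', ['--git-dir', gitdir, 'fetch', '--unshallow'], {
    timeout: timeoutMs,
    stdio: 'ignore',
  });
  return true;
}
```

Checking for the file before calling git avoids a network round trip for repositories that are already complete.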

Added comprehensive documentation for the git-full-clone extension:
- How it works (two-phase approach)
- Performance characteristics and scalability
- Configuration options (skipUnshallow, unshallowTimeout)
- Production considerations and best practices
- Error handling and timeout protection
- Optimization strategies for very large repos

Also added git-full-clone to the extensions list in README.adoc under a new "Git integration" category.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

JakeSCahill added 4 commits March 21, 2026 10:18

Remove Sitemap directive from robots.txt (Antora resolves it incorrectly)

1f59618

This was referenced Mar 21, 2026

Update llms.adoc for proper markdown filenames and AI crawler permissions redpanda-data/docs-site#161

Merged

Add AI-friendly meta tags and enhanced structured data redpanda-data/docs-ui#371

Merged

JakeSCahill and others added 2 commits March 21, 2026 19:53

Merge branch 'main' into ai-optimization-frontmatter-exports

3c04baf

Bump version to 4.15.3

488ba28

coderabbitai bot reviewed Mar 21, 2026

extensions/convert-llms-to-txt.js (resolved)

extensions/convert-to-markdown.js (outdated, resolved)

local-antora-playbook.yml (outdated, resolved)

JakeSCahill and others added 3 commits March 21, 2026 20:10

Update convert-to-markdown.js

bae9f9e

Update local-antora-playbook.yml

4113e71

JakeSCahill requested a review from a team March 22, 2026 08:23

JakeSCahill and others added 3 commits March 22, 2026 12:33

JakeSCahill requested a review from micheleRP March 23, 2026 18:36

micheleRP reviewed Mar 23, 2026

extensions/add-git-dates.js (outdated, resolved)

extensions/convert-to-markdown.js (resolved)

extensions/convert-llms-to-txt.js (resolved)

micheleRP reviewed Mar 23, 2026

extensions/add-git-dates.js (outdated, resolved)

extensions/convert-to-markdown.js (outdated, resolved)

extensions/convert-to-markdown.js (outdated, resolved)

JakeSCahill requested a review from micheleRP March 24, 2026 08:10

micheleRP approved these changes Mar 24, 2026

JakeSCahill and others added 6 commits March 24, 2026 16:36

Update local-antora-playbook.yml

9b78203

JakeSCahill and others added 23 commits March 28, 2026 08:08

Remove netlify.toml (belongs in docs-site, not extensions repo)

e7bcdd3

Remove sitemap.adoc - doesn't belong in extensions repo

1eae8d2

Change debug logging to INFO level for visibility

72e2131

JakeSCahill merged commit 82d683c into main Mar 30, 2026
18 checks passed

JakeSCahill deleted the ai-optimization-frontmatter-exports branch March 30, 2026 16:25

coderabbitai bot mentioned this pull request Mar 31, 2026

fix: place frontmatter after H1 heading in markdown exports #182

Merged

Conversation

JakeSCahill commented Mar 21, 2026 (edited)

netlify bot commented Mar 21, 2026 (edited)

✅ Deploy Preview for docs-extensions-and-macros ready!

coderabbitai bot commented Mar 21, 2026 (edited)

Review skipped

Walkthrough

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

coderabbitai bot left a comment

micheleRP left a comment

micheleRP left a comment

2 participants
