AI optimization: frontmatter exports and component-specific full.txt files #178
JakeSCahill merged 42 commits into main
…oper filenames

## Changes

### convert-to-markdown.js

- Add generateFrontmatter() function to convert AsciiDoc attributes to YAML frontmatter
- Include page metadata (title, description, categories, etc.) in markdown files
- Filter out internal Antora attributes not useful for AI consumption
- Fix markdown filename generation: /page.md instead of /page/index.md
- Update canonical URLs to match new filename format
- Update AI-friendly notes to reference component-specific exports

### convert-llms-to-txt.js

- Generate component-specific full.txt files (e.g., redpanda-full.txt, cloud-full.txt)
- Add AI-Friendly Documentation Formats section to llms-full.txt
- List all component-specific exports with descriptions
- Update individual page headers to mention component exports
- Update extension documentation header

## Benefits for AI Tools

- YAML frontmatter preserves page attributes for better context
- Proper filenames (no more index.md confusion) improve discoverability
- Component-specific exports enable focused queries
- Complete site export provides comprehensive access
- Individual pages have meaningful URLs
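To make the per-component export idea concrete, here is a minimal sketch of grouping converted pages into `<component>-full.txt` files. The page shape (`component`, `title`, `url`, `markdown`) and the `buildComponentExports` name are hypothetical simplifications, not the extension's actual data model; a real Antora extension would read these values from the content or site catalog.

```javascript
// Hypothetical page shape: { component, title, url, markdown }.
function buildComponentExports(pages) {
  // Bucket pages by component, preserving first-seen order.
  const byComponent = new Map();
  for (const page of pages) {
    if (!byComponent.has(page.component)) byComponent.set(page.component, []);
    byComponent.get(page.component).push(page);
  }

  // One file per component, named <component>-full.txt.
  const output = new Map();
  for (const [component, componentPages] of byComponent) {
    const body = componentPages
      .map((p) => `# ${p.title}\n\n**URL**: ${p.url}\n\n${p.markdown}\n`)
      .join('\n---\n\n');
    output.set(`${component}-full.txt`, body);
  }
  return output;
}

const files = buildComponentExports([
  { component: 'redpanda', title: 'Install', url: '/install.md', markdown: 'Steps...' },
  { component: 'cloud', title: 'Sign up', url: '/cloud/sign-up.md', markdown: 'Create an account.' },
  { component: 'redpanda', title: 'Upgrade', url: '/upgrade.md', markdown: 'Roll nodes.' },
]);
console.log([...files.keys()]); // [ 'redpanda-full.txt', 'cloud-full.txt' ]
```

Grouping before rendering keeps each component file self-contained, so an AI tool can load only the product it is asked about.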
## New Extension

Create enhance-robots-txt.js to enhance Antora's default robots.txt with AI-friendly crawler permissions.

### How It Works

- Runs in beforePublish phase after Antora generates robots.txt
- Only enhances "allow" version (production builds)
- Leaves "disallow" version unchanged (preview builds)
- Adds explicit Allow directives for AI crawlers:
  - OpenAI (GPTBot, ChatGPT-User)
  - Anthropic (Claude-Web, anthropic-ai)
  - Perplexity (Perplexity, PerplexityBot)
  - Google AI (Google-Extended, GoogleOther)
  - Common Crawl (CCBot)
  - Additional platforms (cohere-ai, Omgilibot, FacebookBot)
- Includes sitemap reference
- Adds crawl-delay directive

### Benefits

- Explicit welcome for AI crawlers improves discoverability
- Better than Antora's basic "Allow: /" directive
- Maintains preview build protection (no changes to disallow)
- Single source of truth for AI crawler permissions

### Usage

Add to Antora playbook extensions list:

```yaml
- require: './extensions/enhance-robots-txt'
```
## Changes

Use Antora's built-in robots feature instead of custom extension:

- Add `robots: |` with custom multi-line content in site config
- Explicitly allow common AI crawlers:
  - OpenAI (GPTBot, ChatGPT-User)
  - Anthropic (Claude-Web, anthropic-ai)
  - Perplexity (Perplexity, PerplexityBot)
  - Google AI (Google-Extended, GoogleOther)
  - Common Crawl (CCBot)
  - Additional platforms (cohere-ai, Omgilibot, FacebookBot)
- Include sitemap reference (relative path works for all deployments)
- Add crawl-delay directive
- Remove enhance-robots-txt extension (not needed)

## Benefits

- Simpler solution using Antora built-in feature
- No custom extension required
- Works across all deployment environments
- Relative sitemap path adapts to any domain
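For illustration, the sketch below generates the kind of text the `robots: |` block holds, so the group structure is explicit. The agent list is taken from the commit message; placing `Crawl-delay` inside the wildcard group matches the later review fix in this thread. `buildRobotsTxt` is a hypothetical helper: the PR itself stores this text as static YAML in the playbook.

```javascript
// AI crawler user agents listed in the commit message.
const AI_CRAWLERS = [
  'GPTBot', 'ChatGPT-User',         // OpenAI
  'Claude-Web', 'anthropic-ai',     // Anthropic
  'Perplexity', 'PerplexityBot',    // Perplexity
  'Google-Extended', 'GoogleOther', // Google AI
  'CCBot',                          // Common Crawl
  'cohere-ai', 'Omgilibot', 'FacebookBot',
];

// Each group below has one User-agent line followed by its rules. Rules apply
// only to their own group, so Crawl-delay lives inside the wildcard group.
function buildRobotsTxt(sitemapUrl, crawlDelay = 1) {
  const groups = AI_CRAWLERS.map((agent) => `User-agent: ${agent}\nAllow: /`);
  groups.push(`User-agent: *\nAllow: /\nCrawl-delay: ${crawlDelay}`);
  return `${groups.join('\n\n')}\n\nSitemap: ${sitemapUrl}\n`;
}

console.log(buildRobotsTxt('https://docs.example.com/sitemap.xml'));
```

The sitemaps.org protocol describes the `Sitemap:` value as a full URL, which is why this sketch takes an absolute URL. `Crawl-delay` is also a nonstandard directive that some crawlers ignore.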
✅ Deploy Preview for docs-extensions-and-macros ready!
📝 Walkthrough: The PR extends Antora documentation export functionality to better serve AI systems. Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes.
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (1)
extensions/convert-to-markdown.js (1)
28-42: Prefer an allowlist for exported frontmatter fields. This publishes every merged AsciiDoc attribute except a short skip list. That makes future build-only or internal Antora attributes public by default and is easy to regress as new keys appear. It would be safer to opt in only the metadata you want to expose, such as `title`, `navtitle`, `description`, and `categories`.

Also applies to: 45-62
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extensions/convert-to-markdown.js` around lines 28 - 42, Replace the current skip-based export (skipAttributes array) with an explicit allowlist of frontmatter fields to export: create an allowedAttributes array containing the exact keys to publish (e.g., 'title', 'navtitle', 'description', 'categories') and change the code that currently references skipAttributes to only include keys present in allowedAttributes when building frontmatter. Update both the block that defines skipAttributes and the other spot that merges attributes (the second export/merge section that currently uses the same skip logic) so all exported metadata is explicitly opted-in via allowedAttributes.
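A minimal sketch of the allowlist approach, assuming attributes arrive as a plain object mapping AsciiDoc attribute names to values. To stay dependency-free it serializes each value with `JSON.stringify`, which yields valid YAML 1.2 flow scalars and sequences; the PR itself later switched to the `js-yaml` library. The allowlist contents and the `buildFrontmatter` name are illustrative.

```javascript
// Only these keys are exported; anything else (including future internal
// Antora attributes) is dropped by default.
const ALLOWED_ATTRIBUTES = ['title', 'navtitle', 'description', 'categories'];

function buildFrontmatter(attributes) {
  const lines = [];
  for (const key of ALLOWED_ATTRIBUTES) {
    const value = attributes[key];
    if (value === undefined || value === null || value === '') continue;
    // JSON strings and arrays are valid YAML 1.2 flow values, so characters
    // such as ':', '@', and '*' are quoted and escaped safely.
    lines.push(`${key}: ${JSON.stringify(value)}`);
  }
  return lines.length ? `---\n${lines.join('\n')}\n---\n\n` : '';
}

const fm = buildFrontmatter({
  title: 'Configure TLS: step by step',
  description: 'Secure @brokers',
  categories: ['security', 'tls'],
  'page-component-name': 'ROOT', // not allowlisted, so it is skipped
});
console.log(fm);
```

With an allowlist, adding a new exported field is a deliberate one-line change, while new internal attributes stay private without anyone having to remember to skip them.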
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@extensions/convert-llms-to-txt.js`:
- Around line 243-248: The exported page URLs still use page.pub.url (HTML
paths); create and use a normalization helper (e.g., toMarkdownUrl) that
converts root '/', trailing slashes, '/index.html' and '.html' into the
corresponding '.md' paths, and replace usages of page.pub.url in
componentPages.forEach (the block that writes the componentContent URL line) and
the earlier llms-full.txt page block so the code writes normalized Markdown URLs
before emitting the "**URL**" line.
In `@extensions/convert-to-markdown.js`:
- Around line 427-435: When canonicalUrl exists the code currently prepends the
HTML comment markers before the YAML frontmatter so frontmatter-aware consumers
miss the metadata; update the assembly logic that builds markdown (the block
using canonicalUrl, componentName and urlHint) to place frontmatter first, then
the HTML comment lines (Source and urlHint) so that markdown becomes
`${frontmatter}${'<!-- Source: ... -->\n' + urlHint}\n\n${markdown}` — ensure
you still include the Source comment and urlHint but move frontmatter to the
very top before any HTML comments.
In `@local-antora-playbook.yml`:
- Line 51: The Crawl-delay: 1 line is currently placed after the FacebookBot
group so it only applies to that User-agent; to throttle other crawlers move the
Crawl-delay: 1 setting into each User-agent block you want to affect (e.g.,
under the FacebookBot block and under the wildcard User-agent "*" block) or
duplicate it inside each specific User-agent stanza instead of leaving it
outside the groups.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ac309e20-cb2f-43f9-8ace-a7be65002e6c
📒 Files selected for processing (3)
- extensions/convert-llms-to-txt.js
- extensions/convert-to-markdown.js
- local-antora-playbook.yml
- convert-llms-to-txt.js: Add toMarkdownUrl() helper and convert all page URLs from HTML paths to markdown paths (.md extension)
- convert-to-markdown.js: Move YAML frontmatter before HTML comments so frontmatter-aware parsers see metadata first
- convert-to-markdown.js: Replace skipAttributes with allowedAttributes allowlist for explicit opt-in to frontmatter fields
- local-antora-playbook.yml: Move Crawl-delay inside wildcard User-agent block for proper robots.txt syntax
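A minimal sketch of the HTML-to-Markdown URL normalization described in the review, covering the root path, trailing slashes, `/index.html`, and `.html`. This is an illustrative re-implementation, not the code that ended up in `extension-utils/url-utils.js`; query strings and fragments are assumed to be absent.

```javascript
// Map a published HTML path to the matching .md export path.
function toMarkdownUrl(url) {
  // Root page: there is no parent name to reuse, so keep index.md.
  if (url === '/' || url === '/index.html') return '/index.md';
  // /docs/page/index.html -> /docs/page.md
  if (url.endsWith('/index.html')) return url.slice(0, -'/index.html'.length) + '.md';
  // /docs/page.html -> /docs/page.md
  if (url.endsWith('.html')) return url.slice(0, -'.html'.length) + '.md';
  // /docs/page/ -> /docs/page.md
  if (url.endsWith('/')) return url.slice(0, -1) + '.md';
  // Already Markdown, or an extensionless path.
  return url.endsWith('.md') ? url : `${url}.md`;
}

for (const u of ['/', '/docs/page/', '/docs/page/index.html', '/docs/page.html']) {
  console.log(u, '->', toMarkdownUrl(u));
}
```

Keeping this in one shared helper means the canonical URLs in the Markdown files and the URLs listed in `llms-full.txt` cannot drift apart.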
…ured data

- Created add-git-dates extension to extract file creation and modification dates from Git history
- Uses git log with --follow to track file renames
- Adds git-created-date and git-modified-date attributes in YYYY-MM-DD format
- Only includes page-beta-text in frontmatter when page-beta is true
- Updated convert-to-markdown to include Git date attributes in allowlist
- Configured extension to run in pagesComposed event before markdown conversion

Performance: Adds ~8 seconds to build time for processing 4127 pages (3m 12s total)

Co-Authored-By: Claude Sonnet 4.5
Changed extension to listen to 'documentsConverted' instead of 'pagesComposed' to ensure Git dates are available before template rendering. This fixes the issue where structured data (JSON-LD) was showing today's date instead of actual Git commit dates.

The UI Handlebars helpers query contentCatalog during template rendering to access page.asciidoc.attributes, so the extension must add these attributes before that phase.

Also updated test page to document the Git dates feature.

Co-Authored-By: Claude Sonnet 4.5
Updated extension documentation to clarify that Git dates are used for:

- Structured data (JSON-LD datePublished/dateModified)
- Markdown frontmatter export

Removed experimental AsciiDoc extension approach, as the dates don't need to be accessible as AsciiDoc attributes: the important use cases (SEO structured data and AI crawler exports) work correctly via Handlebars helpers querying contentCatalog.

Co-Authored-By: Claude Sonnet 4.5
micheleRP left a comment
Overall this is a solid set of improvements. A few issues worth addressing before merge — one security concern and a couple of correctness/format bugs.
Fixed all issues raised by Michele in PR #178:

1. **Security: Shell injection vulnerability** (add-git-dates.js)
   - Replaced execSync with execFileSync to avoid shell interpretation
   - Use argument arrays instead of string interpolation for git commands
   - Added --reverse flag to avoid need for shell piping
2. **Code quality: YAML serializer** (convert-to-markdown.js)
   - Replaced hand-rolled YAML serializer with js-yaml library
   - Proper escaping of special characters (`@`, `*`, `&`, `!`, etc.)
   - Correct handling of arrays and complex types
   - Removed duplicate 'doctitle' from allowlist (already set as 'title')
3. **Code quality: URL conversion** (convert-to-markdown.js, convert-llms-to-txt.js)
   - Extracted toMarkdownUrl() to shared utility (extension-utils/url-utils.js)
   - Consistent URL conversion logic across extensions
   - Handles root path edge case (/ -> /index.md)
4. **Code quality: Invalid HTML in plain text** (convert-llms-to-txt.js)
   - Removed HTML comment timestamp from llms.txt output
   - File contents already change per build, so a timestamp adds no value

Co-Authored-By: Claude Sonnet 4.5
…ttributes

Enhancements:

- Fix version fields to show actual version (e.g., "24.3", "master") instead of boolean "true"
- Add user-friendly support-status field (supported/nearing end-of-life/past end-of-life)
- Add user-friendly release-status field for beta versions
- Add YAML comments explaining EOL (End-of-Life) and beta fields
- Add support for personas attribute
- Add support for `learning-objective-*` attributes (learning-objective-1, -2, -3, etc.)
- Change page-role to page-topic-type (correct attribute name)

These changes make the markdown exports more useful for AI consumption by:

- Providing actual version numbers instead of booleans
- Using human-readable lifecycle status instead of technical flags
- Supporting important content metadata (personas, learning objectives)
- Adding helpful inline documentation via YAML comments

Co-Authored-By: Claude Sonnet 4.5
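A sketch of mapping lifecycle flags to the human-readable fields described above. The input flag names (`nearingEol`, `pastEol`, `beta`) and the `'beta'` value for `release-status` are assumptions for illustration; the commit does not say which attributes carry these flags. Past end-of-life is checked first so it wins if both EOL flags are set.

```javascript
// Hypothetical flag names; the real attributes behind these flags may differ.
function lifecycleFields({ version, nearingEol = false, pastEol = false, beta = false }) {
  // Always export the version as a string, never as a boolean.
  const fields = { version: String(version) };
  if (pastEol) fields['support-status'] = 'past end-of-life';
  else if (nearingEol) fields['support-status'] = 'nearing end-of-life';
  else fields['support-status'] = 'supported';
  if (beta) fields['release-status'] = 'beta';
  return fields;
}

console.log(lifecycleFields({ version: '24.3', nearingEol: true }));
```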
…e naming

Performance improvements:

- Use parallel async execution with a concurrency limit of 20 (4.5x faster)
- Remove --follow flag, which caused a 36-52% failure rate
- Process both git log calls per file in parallel

Bug fixes:

- Add page- prefix to attributes so they appear in page.attributes for UI templates
- Update convert-to-markdown allowlist to use new attribute names

Benchmarks (500 files):

- Before: ~32s, 48-64% success rate
- After: ~7s, 100% success rate

Co-Authored-By: Claude Opus 4.5
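The concurrency limit can be implemented with a small worker pool, sketched below with the limit of 20 from the commit. `mapWithConcurrency` is a hypothetical helper name; each worker pulls the next index from a shared counter, which is safe because the counter update runs synchronously on JavaScript's single thread.

```javascript
// Run `task` for every item with at most `limit` tasks in flight.
// Results keep the input order regardless of completion order.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const index = next++; // claim an index before awaiting
      results[index] = await task(items[index], index);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Usage: 50 fake jobs, never more than 20 running at once.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
let inFlight = 0;
let maxInFlight = 0;
mapWithConcurrency([...Array(50).keys()], 20, async (n) => {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await sleep(5);
  inFlight--;
  return n * 2;
}).then((out) => console.log(out.length, maxInFlight)); // 50 20
```

In the extension, each task would run the per-file git calls; bounding the pool avoids spawning thousands of git processes at once.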
Include version attributes from antora.yml that are useful for AI agents:

- full-version: Redpanda version (e.g., 25.3.5), ROOT component only
- latest-redpanda-tag: Latest Redpanda release tag
- latest-console-tag: Latest Console release tag
- latest-operator-version: Latest Kubernetes operator version
- latest-connect-version: Latest Redpanda Connect version

Added component exclusion logic to skip full-version for redpanda-connect, since it uses latest-connect-version instead.

Co-Authored-By: Claude Opus 4.5
- Remove full-version from allowlist (Connect shouldn't have it)
- Remove component exclusion logic (no longer needed)
- Keep latest-redpanda-tag, which serves the same purpose

Co-Authored-By: Claude Opus 4.5
Previously the extension only processed local repos with worktrees, skipping remote content sources (4122 pages) because Antora caches remote repos as bare Git repositories without worktrees.

Changes:

- Support both worktree (-C) and bare repo (--git-dir) modes
- Check for either origin.worktree or origin.gitdir
- Pass isBareRepo flag to getGitDates function
- Update docs to explain bare repo support

This fixes git dates for all remote content sources in the playbook. Now processes 3812+ pages instead of only 6 local pages.

Co-Authored-By: Claude Opus 4.5
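A sketch of the worktree-versus-bare-repository switch, using `execFileSync` with an argument array so no shell ever parses the file path (the security fix described in the review response above). It assumes Antora's origin object exposes `worktree` and `gitdir` as the commit describes; the helper names and the `%cs` format (committer date as `YYYY-MM-DD`, available in reasonably recent Git releases) are choices made for this sketch.

```javascript
const { execFileSync } = require('node:child_process');

// Choose -C for a checked-out worktree or --git-dir for a bare repository
// from Antora's cache. Arguments stay in an array, so no shell is involved.
function gitArgs(origin, ...command) {
  return origin.worktree
    ? ['-C', origin.worktree, ...command]
    : ['--git-dir', origin.gitdir, ...command];
}

// Latest commit date (YYYY-MM-DD) that touched filePath on ref, or null.
function lastCommitDate(origin, ref, filePath) {
  const args = gitArgs(origin, 'log', '-1', '--format=%cs', ref, '--', filePath);
  const out = execFileSync('git', args, { encoding: 'utf8' }).trim();
  return out || null;
}

// A hostile-looking path is passed to git literally, never to a shell.
console.log(gitArgs({ gitdir: '/cache/docs.git' }, 'log', '--', 'a; rm -rf ~'));
```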
- Add git-full-clone extension to enable full history for remote repos
- Optimize add-git-dates to walk log once per repo (~40x faster)
- Decode HTML entities in markdown export titles (What's New vs What’s New)
- Add production playbook with full clone configuration

Performance: ~42s for 4127 pages with full git history (vs 1.3s with shallow clones but inaccurate dates)

Build time: 2:17 total with git dates enabled

Co-Authored-By: Claude Sonnet 4.5
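Asciidoctor's typographic replacements turn a word-internal apostrophe into `&#8217;`, and that entity can end up verbatim in exported titles. A minimal decoder for numeric references plus a few common named entities is sketched below; `decodeEntities` is a hypothetical name, and a production version might use a library with the full HTML5 entity table.

```javascript
// Small subset of named entities; anything else is left as-is.
const NAMED_ENTITIES = new Map([
  ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"], ['nbsp', '\u00a0'],
]);

function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return NAMED_ENTITIES.get(body.toLowerCase()) ?? match;
  });
}

console.log(decodeEntities('What&#8217;s New')); // What’s New
```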
- Add netlify.toml with prod playbook and cache configuration
- Add configure-cache-dir extension to use ANTORA_CACHE_DIR env var
- Update prod playbook to use remote UI bundle
- Configure Netlify to cache .cache/antora directory between builds

This enables Netlify's built-in caching to preserve full git clones, avoiding re-cloning repositories on each build and potentially reducing build time from ~2:17 to under 1 minute after the first build.

Co-Authored-By: Claude Sonnet 4.5
CRITICAL FIXES:

1. Compare trees between commits to find actual modifications (not just files that exist in tree)
2. Group pages by BOTH gitdir AND ref to handle multiple branches per repo correctly

Bugs fixed:

- Was setting ALL files to commit date when they existed in tree
- Was using first page's ref for all pages in same repo (mixing v/23.3, v/24.1, main dates)

Performance: 14.5s for 4128 pages across 12 branches

Accuracy: Now matches GitHub API exactly ✓

Verified:

- rolling-upgrade.adoc v/23.3: modified 2024-02-26 (matches GitHub)
- Local files: created 2023-07-06 (accurate)
- Remote files: per-branch dates (accurate)

Co-Authored-By: Claude Sonnet 4.5
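Walking the log once per (gitdir, ref) pair can be sketched as a parser over `git log --format=@%cs --name-only <ref>` output, which lists commits newest first together with the files each commit changed relative to its parent (merge commits list no files by default). The `@` date marker, the `datesFromLog` and `groupKey` names, and the use of `origin.refname` are assumptions for this sketch; paths beginning with `@` would need a different marker.

```javascript
// Obtain the input with something like:
// execFileSync('git', ['--git-dir', gitdir, 'log', '--format=@%cs', '--name-only', ref],
//   { encoding: 'utf8', maxBuffer: 64 * 1024 * 1024 })
//
// Files appear only in commits that changed them, so the first sighting is
// the latest modification and the last sighting is the creation date.
function datesFromLog(logOutput) {
  const dates = new Map();
  let currentDate = null;
  for (const line of logOutput.split('\n')) {
    if (line.startsWith('@')) {
      currentDate = line.slice(1);
    } else if (line.trim() !== '' && currentDate !== null) {
      const entry = dates.get(line);
      if (entry) {
        entry.created = currentDate; // older commit: move the creation date back
      } else {
        dates.set(line, { created: currentDate, modified: currentDate }); // newest change
      }
    }
  }
  return dates;
}

// One log walk per repository AND branch, since each ref has its own history.
function groupKey(origin) {
  return `${origin.gitdir}\u0000${origin.refname}`;
}
```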
Test configuration with:

- Local UI bundle for accurate testing
- Main branch only for faster builds
- All git dates extensions enabled

Useful for verifying git dates accuracy against GitHub API.

Co-Authored-By: Claude Sonnet 4.5
FEATURES:

- Auto-extract Q&A from sections (writers only provide anchors)
- Manual override for custom question/answer text
- Mixed usage: combine auto and manual FAQs
- Zero content duplication

USAGE (simple, recommended):

```asciidoc
:page-faq-1-anchor: #installation
:page-faq-2-anchor: #requirements

[#installation]
== How do I install Redpanda?

Content here...
```

Extension extracts:

- Question: Heading text
- Answer: Section content
- URL: page URL + anchor

USAGE (manual override):

```asciidoc
:page-faq-1-question: Custom question
:page-faq-1-answer: Custom answer
:page-faq-1-anchor: #optional
```

GENERATED OUTPUT:

- schema.org FAQPage JSON-LD in `<head>`
- Google rich results compatible
- SEO optimized

FILES:

- extensions/add-faq-structured-data.js (new)
- extensions/README-FAQ.md (new)
- package.json (export added)
- test-git-dates-playbook.yml (extension enabled)

NOTE: Requires updated docs-ui with head-structured-data.hbs change

Co-Authored-By: Claude Sonnet 4.5
Removed auto-extraction complexity: writers now provide question and answer directly as attributes, with an optional anchor for deep linking.

USAGE:

```asciidoc
:page-faq-1-question: How do I install Redpanda?
:page-faq-1-answer: Download and run the installer. See installation guide.
:page-faq-1-anchor: #installation
```

WHY SIMPLIFIED:

- Auto-extraction from sections was complex and fragile
- Different block types (headings, examples, sidebars) had edge cases
- Content extraction logic required cheerio parsing and tree comparison
- Manual entry is explicit, predictable, and flexible

BENEFITS:

- Simple: Just question + answer attributes
- Flexible: Writers can reference prose or write standalone FAQs
- Predictable: No magic extraction, what you write is what you get
- Deep linking: Optional anchors to relevant sections

UPDATED:

- extensions/add-faq-structured-data.js (simplified)
- extensions/README-FAQ.md (updated docs)
- extensions/REFERENCE.adoc (added FAQ + git dates docs)

Co-Authored-By: Claude Sonnet 4.5
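A sketch of turning the numbered FAQ attributes into a schema.org `FAQPage` object. It assumes the attributes are available under their full `page-faq-N-*` names and that numbering is consecutive from 1; `buildFaqJsonLd` and the example URL are hypothetical, and rendering the result into the page head is left to the UI.

```javascript
// Build a schema.org FAQPage object from :page-faq-N-*: attributes.
// Numbering stops at the first N without a question.
function buildFaqJsonLd(attributes, pageUrl) {
  const mainEntity = [];
  for (let n = 1; attributes[`page-faq-${n}-question`]; n++) {
    const name = attributes[`page-faq-${n}-question`];
    const text = attributes[`page-faq-${n}-answer`];
    if (!text) continue; // a question without an answer is skipped
    const question = { '@type': 'Question', name, acceptedAnswer: { '@type': 'Answer', text } };
    const anchor = attributes[`page-faq-${n}-anchor`];
    if (anchor) question.url = pageUrl + (anchor.startsWith('#') ? anchor : `#${anchor}`);
    mainEntity.push(question);
  }
  if (mainEntity.length === 0) return null;
  return { '@context': 'https://schema.org', '@type': 'FAQPage', mainEntity };
}

const jsonLd = buildFaqJsonLd(
  {
    'page-faq-1-question': 'How do I install Redpanda?',
    'page-faq-1-answer': 'Download and run the installer. See installation guide.',
    'page-faq-1-anchor': '#installation',
  },
  'https://docs.example.com/current/get-started/install'
);
// Print the object that would be embedded as application/ld+json.
console.log(JSON.stringify(jsonLd, null, 2));
```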
ADDED:

- llms.adoc: Comprehensive overview of Redpanda documentation
  - About documentation structure and components
  - AI-optimized documentation access methods
  - MCP server information (docs.redpanda.com/mcp)
  - Setup instructions for Claude Code integration
  - Static export formats (llms-full.txt, component-full.txt)
  - Key topics organized by component
  - Metadata standards and features
- sitemap.adoc: Complete documentation sitemap
  - All components (ROOT, cloud, redpanda-connect, labs, api, home)
  - Version structure and access patterns
  - Topic organization by user journey and role
  - Navigation aids and external resources
  - Documentation source repositories
- LLMS-TXT-SETUP.md: Setup and reference guide
  - How to configure llms.txt generation
  - MCP server tool descriptions
  - Extension flow explanation
  - Testing instructions
  - Template locations

MCP SERVER DETAILS:

- URL: https://docs.redpanda.com/mcp
- Setup: npx doc-tools setup-mcp
- Tools: Generate docs, check versions, query structure
- Integration: Works with Claude Code for documentation automation

USAGE:

These files power the AI-optimized documentation at:

- /llms.txt: Curated overview (this content)
- /llms-full.txt: Complete export
- /sitemap.md: Documentation structure
- /mcp: Interactive MCP server

Co-Authored-By: Claude Sonnet 4.5
- Creates human-readable markdown versions of sitemap.xml files
- Organizes URLs by component/path for easy browsing
- Includes page metadata (modified dates, priority)
- AI-friendly format for LLM consumption
- Runs automatically on beforePublish event

Dependencies:

- Added xml2js for XML parsing

Co-Authored-By: Claude Sonnet 4.5
Improvements:

- Find all sitemap files (sitemap.xml, sitemap-0.xml, sitemap-1.xml, etc.)
- Generate individual markdown files for each sitemap
- Create master sitemap-all.md combining all pages from all sitemaps
- Sort sitemaps for consistent processing order

This handles Antora's typical multi-sitemap output, where sites are split into multiple sitemap files (usually 1000 URLs per file) plus a sitemap index.

The master sitemap-all.md provides a single comprehensive view of all documentation pages, ideal for AI agents and documentation planning.

Co-Authored-By: Claude Sonnet 4.5
Fixes:

- Changed event from 'published' to 'sitePublished' (correct Antora 3.1 event)
- Updated regex to match ALL sitemap files (`sitemap-*.xml`)
  - Previously only matched sitemap-0.xml, sitemap-1.xml (numeric)
  - Now matches sitemap-ROOT.xml, sitemap-home.xml, etc. (all components)
- Added debug logging

Results:

- Generates 9 individual markdown files (one per XML sitemap)
- Creates master sitemap-all.md combining 4,134 pages
- Works with Antora's component-specific sitemap architecture

Tested with local build showing:

- sitemap-ROOT.md: 3,022 pages
- sitemap-redpanda-cloud.md: 661 pages
- sitemap-redpanda-connect.md: 400 pages
- sitemap-all.md: 4,134 total pages

Co-Authored-By: Claude Sonnet 4.5
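A sketch of the file-name filter and `<url>` extraction. The regex matches the index (`sitemap.xml`) plus both numbered and per-component files. The PR parses XML with `xml2js`; this sketch uses regular expressions to stay dependency-free, which is only reasonable for simple, machine-generated sitemaps (entities such as `&amp;` are not decoded here). `SITEMAP_FILE` and `parseUrlset` are hypothetical names.

```javascript
// sitemap.xml, sitemap-0.xml, sitemap-ROOT.xml, sitemap-redpanda-cloud.xml, ...
const SITEMAP_FILE = /^sitemap(?:-[^/]+)?\.xml$/;

// Extract <loc> and optional <lastmod> from each <url> entry of a <urlset>.
function parseUrlset(xml) {
  const urls = [];
  for (const match of xml.matchAll(/<url>([\s\S]*?)<\/url>/g)) {
    const block = match[1];
    const loc = /<loc>\s*([^<]+?)\s*<\/loc>/.exec(block);
    const lastmod = /<lastmod>\s*([^<]+?)\s*<\/lastmod>/.exec(block);
    if (loc) urls.push({ loc: loc[1], lastmod: lastmod ? lastmod[1] : null });
  }
  return urls;
}

const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://docs.example.com/current/home/</loc>
<lastmod>2024-02-26</lastmod>
</url>
<url>
<loc>https://docs.example.com/current/get-started/</loc>
</url>
</urlset>`;
console.log(parseUrlset(xml));
```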
In the sitemap index markdown (sitemap.md), the links now point to the markdown versions of sub-sitemaps instead of the XML files.

- Before: `[sitemap-home.xml](https://.../sitemap-home.xml)`
- After: `[sitemap-home.xml](https://.../sitemap-home.md)`

This provides a better user experience: clicking links in the sitemap index now takes you to the human-readable markdown versions.

Co-Authored-By: Claude Sonnet 4.5
Changes:

- Second-level headings now use sentence case:
  - "## Sitemap index" (was "Sitemap Index")
  - "## Source sitemaps" (was "Source Sitemaps")
- Removed (s) constructs:
  - "7 sub-sitemaps" (was "sub-sitemap(s)")
  - "8 sitemaps" (was "sitemap(s)")
  - Uses proper pluralization logic
- Added number formatting with commas:
  - "Total pages: 4,126" (was "4126")
  - "Total pages: 3,022" (was "3022")

This improves readability and follows documentation style standards.

Co-Authored-By: Claude Sonnet 4.5
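The pluralization and thousands-separator changes reduce to a small helper. `formatCount` is a hypothetical name, and `toLocaleString('en-US')` assumes Node's bundled ICU data, which current releases include by default.

```javascript
// Format a count with thousands separators and the right singular/plural noun.
function formatCount(n, singular, plural = `${singular}s`) {
  return `${n.toLocaleString('en-US')} ${n === 1 ? singular : plural}`;
}

console.log(formatCount(4126, 'page'));     // 4,126 pages
console.log(formatCount(1, 'sitemap'));     // 1 sitemap
console.log(formatCount(7, 'sub-sitemap')); // 7 sub-sitemaps
```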
Major refactoring to follow Antora best practices:

Changes:

- Use siteCatalog.getFiles() instead of fs to find sitemaps
- Use siteCatalog.addFile() instead of fs.writeFileSync()
- Read from sitemapFile.contents instead of filesystem
- Changed from sitePublished to beforePublish event

Benefits:

- Proper integration with Antora's publication lifecycle
- Files tracked in Antora's catalog system
- No direct filesystem operations
- Follows same pattern as convert-llms-to-txt extension

This is the correct Antora extension pattern for adding files during the build process.

Co-Authored-By: Claude Sonnet 4.5
Consolidation:

- Moved README-FAQ.md content into REFERENCE.adoc
- Moved README-SITEMAP-MARKDOWN.md content into REFERENCE.adoc
- Moved LLMS-TXT-SETUP.md content into REFERENCE.adoc
- Added comprehensive sections for convert-to-markdown, convert-llms-to-txt, and convert-sitemap-to-markdown extensions

Removed unnecessary files:

- prod-antora-playbook.yml (testing only, not needed in extensions repo)
- test-git-dates-playbook.yml (testing only)
- configure-cache-dir.js (superfluous, Antora has built-in cache)
- README-FAQ.md (consolidated into REFERENCE.adoc)
- README-SITEMAP-MARKDOWN.md (consolidated into REFERENCE.adoc)
- LLMS-TXT-SETUP.md (consolidated into REFERENCE.adoc)

Result: All extension documentation is now in a single REFERENCE.adoc file, following the existing pattern. Production playbooks belong in the docs-site repo, not here.

Co-Authored-By: Claude Sonnet 4.5
Implements Hakim's suggestion to add llms.txt to sitemaps:

1. Creates sitemap-llms.xml with all llms .txt exports:
   - llms.txt (curated overview)
   - llms-full.txt (complete export)
   - Component-specific exports (ROOT-full.txt, cloud-full.txt, etc.)
2. Adds sitemap-llms.xml reference to main sitemap.xml index
3. sitemap-llms.md is generated automatically by convert-sitemap-to-markdown

Implementation:

- Generates sitemap-llms.xml in beforePublish after llms files are created
- Finds all .txt files in siteCatalog ending with -full.txt or llms.txt
- Updates main sitemap index by editing XML to add new entry
- Avoids tying llms files to component-specific sitemaps

This makes all AI-optimized exports discoverable via sitemap.

Co-Authored-By: Claude Sonnet 4.5
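A sketch of building `sitemap-llms.xml` and registering it in the sitemap index by string editing, as the commit describes. The `{ path, lastmod }` file records and the helper names are hypothetical; the real extension reads files from `siteCatalog`. Values are not XML-escaped here, which assumes plain ASCII URLs.

```javascript
// Build a <urlset> for llms.txt and every *-full.txt export.
function buildLlmsSitemap(siteUrl, files) {
  const entries = files
    .filter((f) => f.path === 'llms.txt' || f.path.endsWith('-full.txt'))
    .map((f) => {
      const lastmod = f.lastmod ? `\n    <lastmod>${f.lastmod}</lastmod>` : '';
      return `  <url>\n    <loc>${siteUrl}/${f.path}</loc>${lastmod}\n  </url>`;
    });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
    '',
  ].join('\n');
}

// Add a <sitemap> entry just before </sitemapindex>; a no-op if already listed.
function addToSitemapIndex(indexXml, loc) {
  if (indexXml.includes(`<loc>${loc}</loc>`)) return indexXml;
  const entry = `<sitemap>\n<loc>${loc}</loc>\n</sitemap>\n`;
  return indexXml.replace('</sitemapindex>', `${entry}</sitemapindex>`);
}

console.log(buildLlmsSitemap('https://docs.example.com', [
  { path: 'llms.txt', lastmod: '2025-01-10' },
  { path: 'cloud-full.txt' },
]));
```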
- Update sitemap-llms.xml to use actual git modified dates for each file
- Update component sitemaps to use git dates where available
- Add consistent `<lastmod>` to all sitemap entries in sitemap index
- Build map of URL -> git date from contentCatalog for efficient lookups

Each llms export now shows when its content was actually last modified:

- llms.txt: uses llms.adoc git modified date
- llms-full.txt: uses most recent date from all pages
- component-full.txt: uses most recent date from that component

Co-Authored-By: Claude Sonnet 4.5
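Because `YYYY-MM-DD` strings sort in date order, the "most recent date" rules above reduce to a string maximum per export file. The page records (`component`, `modified`) and `lastmodByExport` are hypothetical stand-ins for the URL-to-date map built from `contentCatalog`.

```javascript
// Latest git modified date per export file, compared as YYYY-MM-DD strings.
function lastmodByExport(pages) {
  const byFile = new Map();
  const bump = (file, date) => {
    const current = byFile.get(file);
    if (date && (!current || date > current)) byFile.set(file, date);
  };
  for (const { component, modified } of pages) {
    bump('llms-full.txt', modified);        // whole-site export
    bump(`${component}-full.txt`, modified); // per-component export
  }
  return byFile;
}

console.log(lastmodByExport([
  { component: 'ROOT', modified: '2024-02-26' },
  { component: 'cloud', modified: '2024-11-30' },
]));
```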
- Log which repos are skipped due to missing gitdir
- Log which repos are being processed successfully
- Will help identify why cloud-docs and rp-connect-docs aren't getting git dates

Co-Authored-By: Claude Sonnet 4.5
Use INFO level instead of DEBUG so logs show up in build output. This will help diagnose why cloud-docs and rp-connect-docs aren't getting git dates.

Co-Authored-By: Claude Sonnet 4.5
Problem: Despite setting git.depth=0 in the playbook, Antora was still creating shallow clones for some repos (cloud-docs, rp-connect-docs), resulting in only 1 commit being available for git date extraction.

Solution: Implement a two-phase approach:

1. Phase 1: Set depth=0 in playbook (best effort)
2. Phase 2: After content aggregation, detect any repos with a shallow file and run 'git fetch --unshallow' to convert them to full clones

Results:

- cloud-docs: Now walking 511 commits (was 1)
- rp-connect-docs: Now walking 396 commits (was 1)
- All sitemaps now show accurate git dates instead of build timestamps
- Git dates processed for 4125 pages in 14.3s (3.5ms/page)

Co-Authored-By: Claude Sonnet 4.5
Improvements for production readiness:
- Add timeout protection (default 60s per repo, configurable via unshallowTimeout)
- Add skipUnshallow config option for air-gapped environments
- Add timing logs to monitor unshallow performance
- Better error messages distinguishing timeouts from other failures
- Document production considerations in code comments
Configuration example:

```yaml
antora:
  extensions:
    - require: '@redpanda-data/docs-extensions-and-macros/extensions/git-full-clone'
      skipUnshallow: false
      unshallowTimeout: 120000 # 2 minutes for very large repos
```
These safeguards ensure the extension won't hang or break builds even if:
- Repos grow to 50k+ commits
- Network is slow or intermittent
- Running in air-gapped CI/CD environment
Co-Authored-By: Claude Sonnet 4.5
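A sketch of the unshallow phase with the safeguards listed above: detect the `shallow` file Git writes into a shallow repository's git directory, skip when `skipUnshallow` is set, and bound `git fetch --unshallow` with a timeout. Function names and the logger interface are assumptions, and the sketch also assumes the cached bare repository has a fetchable default remote.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Git records the cut-off commits of a shallow clone in <gitdir>/shallow.
function isShallow(gitdir) {
  return fs.existsSync(path.join(gitdir, 'shallow'));
}

// Try to convert a shallow clone into a full one. Returns true on success.
// Failures are logged, never thrown, so a slow network cannot break the build.
function unshallow(gitdir, { timeoutMs = 60000, skipUnshallow = false, log = console } = {}) {
  if (skipUnshallow || !isShallow(gitdir)) return false;
  const started = Date.now();
  try {
    execFileSync('git', ['--git-dir', gitdir, 'fetch', '--unshallow'], {
      timeout: timeoutMs, // the child is killed if it runs longer than this
      stdio: 'ignore',
    });
    log.info(`Unshallowed ${gitdir} in ${Date.now() - started} ms`);
    return true;
  } catch (err) {
    // A timed-out child surfaces as an error with code ETIMEDOUT.
    const reason = err.code === 'ETIMEDOUT' ? `timed out after ${timeoutMs} ms` : err.message;
    log.warn(`Could not unshallow ${gitdir}: ${reason}`);
    return false;
  }
}
```

Failing soft matches the commit's goal: a repository that cannot be unshallowed keeps its shallow dates instead of stopping the whole site build.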
Added comprehensive documentation for the git-full-clone extension:

- How it works (two-phase approach)
- Performance characteristics and scalability
- Configuration options (skipUnshallow, unshallowTimeout)
- Production considerations and best practices
- Error handling and timeout protection
- Optimization strategies for very large repos

Also added git-full-clone to the extensions list in README.adoc under a new "Git integration" category.

Co-Authored-By: Claude Sonnet 4.5
Preview: https://deploy-preview-178--docs-extensions-and-macros.netlify.app/preview/test.md
This pull request introduces significant improvements to the documentation export process for AI consumption and adds new metadata to documentation pages. The main changes include adding Git commit dates as page attributes, enhancing markdown exports with YAML frontmatter, generating new component-specific documentation exports, and improving file naming for markdown outputs. These updates improve traceability, AI-friendliness, and usability of the documentation exports.
Documentation metadata and export enhancements:

- … `add-git-dates.js` that injects `git-created-date` and `git-modified-date` attributes into each documentation page by extracting commit dates from Git. This metadata is now available for use in templates and exports.

AI-friendly export and file structure improvements:

- … (`convert-llms-to-txt.js`) now generates not only `llms.txt` and `llms-full.txt`, but also component-specific `*-full.txt` files for each product/component, providing more focused exports for AI agents and users.
- … `/docs/page/index.html` are now exported as `/docs/page.md` instead of `/docs/page/index.md`, making the exports more intuitive and compatible with AI tools.

Other improvements:

- … `4.15.3` to reflect these new capabilities.

These changes collectively make the documentation exports more traceable, structured, and AI-friendly, while providing users and downstream tools with richer metadata and more flexible export options.