Changeset 3470692

Timestamp: 02/26/2026 09:19:31 PM (4 weeks ago)
Location: botkibble
Files: 6 edited, 6 copied

- tags/1.3.0 (copied from botkibble/trunk)
- tags/1.3.0/botkibble.php (copied from botkibble/trunk/botkibble.php; 2 diffs)
- tags/1.3.0/composer.json (copied from botkibble/trunk/composer.json)
- tags/1.3.0/includes (copied from botkibble/trunk/includes)
- tags/1.3.0/includes/converter.php (modified; 2 diffs)
- tags/1.3.0/includes/routing.php (modified; 1 diff)
- tags/1.3.0/readme.txt (copied from botkibble/trunk/readme.txt; 8 diffs)
- tags/1.3.0/vendor (copied from botkibble/trunk/vendor)
- trunk/botkibble.php (modified; 2 diffs)
- trunk/includes/converter.php (modified; 2 diffs)
- trunk/includes/routing.php (modified; 1 diff)
- trunk/readme.txt (modified; 8 diffs)
botkibble/tags/1.3.0/botkibble.php
--- r3470652
+++ r3470692
@@ -4,5 +4,5 @@
  * Plugin URI: https://github.com/greg-randall/botkibble
  * Description: Serve published posts and pages as clean Markdown for AI agents and crawlers.
- * Version: 1.2.1
+ * Version: 1.3.0
  * Requires at least: 6.0
  * Requires PHP: 8.2
@@ -17,5 +17,5 @@
 }
 
-define( 'BOTKIBBLE_VERSION', '1.2.1' );
+define( 'BOTKIBBLE_VERSION', '1.3.0' );
 define( 'BOTKIBBLE_PLUGIN_DIR', plugin_dir_path( __FILE__ ) );
 
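The version bump matters to code that builds on Botkibble, because the `botkibble_converter_remove_nodes` filter was added in 1.3.0. Here is a minimal sketch for an add-on that wants to detect that before hooking the filter. The helper name `my_botkibble_supports_remove_nodes` is hypothetical. Only `BOTKIBBLE_VERSION` and PHP's built-in `version_compare()` come from outside this example.

```php
<?php
/**
 * Hypothetical add-on helper: true when the installed Botkibble version is
 * at least 1.3.0, the release that added botkibble_converter_remove_nodes.
 * The version is passed in so the check can run without WordPress.
 */
function my_botkibble_supports_remove_nodes( ?string $installed_version ): bool {
	if ( null === $installed_version ) {
		return false; // Botkibble not loaded, so the constant is undefined.
	}
	return version_compare( $installed_version, '1.3.0', '>=' );
}

// Inside WordPress, read the constant defensively.
$installed = defined( 'BOTKIBBLE_VERSION' ) ? BOTKIBBLE_VERSION : null;
var_dump( my_botkibble_supports_remove_nodes( $installed ) ); // bool(false) when run outside WordPress

var_dump( my_botkibble_supports_remove_nodes( '1.2.1' ) ); // bool(false)
var_dump( my_botkibble_supports_remove_nodes( '1.3.0' ) ); // bool(true)
```

`version_compare()` compares dotted segments numerically, so a future `1.10.0` would also pass. A plain string comparison would not guarantee that.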
botkibble/tags/1.3.0/includes/converter.php
--- r3470652
+++ r3470692
@@ -312,8 +312,27 @@
 	static $converter = null;
 	if ( null === $converter ) {
-		$converter = new HtmlConverter([
+		$converter_options = [
 			'strip_tags' => true,
 			'hard_break' => true,
-		] );
+		];
+
+		/**
+		 * Optional: remove entire DOM node types before conversion.
+		 *
+		 * Keep this empty by default to preserve legacy behavior.
+		 * Example return values:
+		 * - array: [ 'script', 'style' ]
+		 * - string: 'script style'
+		 *
+		 * @param array<int, string>|string $remove_nodes Requested node names.
+		 * @param WP_Post $post Current post being rendered.
+		 */
+		$remove_nodes = apply_filters( 'botkibble_converter_remove_nodes', [], $post );
+		$remove_nodes = botkibble_normalize_remove_nodes( $remove_nodes );
+		if ( ! empty( $remove_nodes ) ) {
+			$converter_options['remove_nodes'] = implode( ' ', $remove_nodes );
+		}
+
+		$converter = new HtmlConverter( $converter_options );
 	}
 
@@ -322,4 +341,38 @@
 		'word_count' => $word_count,
 	];
+}
+
+/**
+ * Normalize a converter remove_nodes value into a clean list of tag names.
+ *
+ * Accepts either a string (space/comma-separated) or an array of values and
+ * returns unique lowercase tag names safe to pass to HtmlConverter.
+ *
+ * @param array<int, mixed>|string $nodes Raw filter value.
+ * @return array<int, string>
+ */
+function botkibble_normalize_remove_nodes( $nodes ): array {
+	if ( is_string( $nodes ) ) {
+		$nodes = preg_split( '/[\s,]+/', $nodes ) ?: [];
+	} elseif ( ! is_array( $nodes ) ) {
+		return [];
+	}
+
+	$out = [];
+	foreach ( $nodes as $node ) {
+		$name = strtolower( trim( (string) $node ) );
+		if ( '' === $name ) {
+			continue;
+		}
+
+		// Keep DOM-like node names only (e.g. script, style, iframe).
+		if ( ! preg_match( '/^[a-z][a-z0-9:_-]*$/', $name ) ) {
+			continue;
+		}
+
+		$out[ $name ] = true;
+	}
+
+	return array_keys( $out );
 }
 
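The new normalizer accepts several input forms. To show what it does with each, the function below is copied from the converter.php diff (formatting adjusted, comments added) and run on a few inputs outside WordPress. The sample inputs are illustrative and are not taken from the plugin's tests.

```php
<?php
// Copy of botkibble_normalize_remove_nodes() from includes/converter.php (1.3.0),
// reproduced here so it can run without WordPress.
function botkibble_normalize_remove_nodes( $nodes ): array {
	if ( is_string( $nodes ) ) {
		// Split on runs of whitespace and/or commas.
		$nodes = preg_split( '/[\s,]+/', $nodes ) ?: [];
	} elseif ( ! is_array( $nodes ) ) {
		return [];
	}

	$out = [];
	foreach ( $nodes as $node ) {
		$name = strtolower( trim( (string) $node ) );
		if ( '' === $name ) {
			continue;
		}
		// Keep DOM-like node names only (e.g. script, style, iframe).
		if ( ! preg_match( '/^[a-z][a-z0-9:_-]*$/', $name ) ) {
			continue;
		}
		$out[ $name ] = true; // Using names as keys removes duplicates.
	}

	return array_keys( $out );
}

// String value: mixed separators, mixed case, one duplicate.
print_r( botkibble_normalize_remove_nodes( 'Script, style  script' ) );
// Array value: "<iframe>", "ld+json" and empty strings are dropped.
print_r( botkibble_normalize_remove_nodes( [ 'iframe', '<iframe>', 'ld+json', '', ' NOSCRIPT ' ] ) );
// Any other type yields an empty list, so no remove_nodes option is set.
print_r( botkibble_normalize_remove_nodes( null ) );
```

The normalizer only cleans names. The list is then joined with spaces and passed to `HtmlConverter` as its `remove_nodes` option. Presumably this is league/html-to-markdown, whose `HtmlConverter` takes `strip_tags`, `hard_break`, and `remove_nodes` options, but the changeset does not name the library.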
botkibble/tags/1.3.0/includes/routing.php
--- r3470652
+++ r3470692
@@ -655,5 +655,5 @@
  */
 function botkibble_send_content_signal_header( ?WP_Post $post = null ): void {
-	$signal = apply_filters( 'botkibble_content_signal', 'ai-train=yes, search=yes, ai-input=yes', $post );
+	$signal = apply_filters( 'botkibble_content_signal', 'ai-train=no, search=yes, ai-input=yes', $post );
 	$signal = str_replace( [ "\r", "\n" ], '', $signal );
 	if ( $signal ) {
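The routing change only swaps the default string passed to the `botkibble_content_signal` filter. Below is a hedged client-side sketch of how an agent might read that header. `parse_content_signal()` is hypothetical. Treating the value as comma-separated `key=yes|no` pairs is an assumption drawn from the values in this changeset, not from the contentsignals.org specification.

```php
<?php
/**
 * Hypothetical client helper: parse a Content-Signal header value such as
 * "ai-train=no, search=yes, ai-input=yes" into [ 'ai-train' => false, ... ].
 * Assumes comma-separated key=value pairs whose values are yes or no.
 */
function parse_content_signal( string $header ): array {
	$signals = [];
	foreach ( explode( ',', $header ) as $pair ) {
		$parts = explode( '=', trim( $pair ), 2 );
		if ( count( $parts ) !== 2 || '' === $parts[0] ) {
			continue; // Skip empty or malformed entries.
		}
		$key   = strtolower( trim( $parts[0] ) );
		$value = strtolower( trim( $parts[1] ) );
		$signals[ $key ] = ( 'yes' === $value ); // Anything but "yes" counts as a refusal.
	}
	return $signals;
}

$signals = parse_content_signal( 'ai-train=no, search=yes, ai-input=yes' );
var_dump( $signals['ai-train'] ); // bool(false)
var_dump( $signals['search'] );   // bool(true)
```

Treating unknown values as "no" is a conservative choice for a consumer. A real client should follow whatever the published specification says.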
botkibble/tags/1.3.0/readme.txt
--- r3470652
+++ r3470692
@@ -5,13 +5,17 @@
 Tested up to: 6.9
 Requires PHP: 8.2
-Stable tag: 1.2.1
+Stable tag: 1.3.0
 License: GPL-2.0-only
 License URI: https://www.gnu.org/licenses/gpl-2.0.html
 
-Serve published posts and pages as clean Markdown with YAML frontmatter — built for AI agents and crawlers.
+Serves every published post and page as Markdown for AI agents and crawlers. No configuration, no API keys. Activate and it works.
 
 == Description ==
 
-Botkibble converts any published post or page on your WordPress site to Markdown. It caches the output and serves it with `text/markdown` headers.
+AI agents, LLMs, and crawlers have to wade through navigation bars, sidebars, ads, and comment forms to reach the content they want, and every element costs tokens. [Cloudflare measured](https://blog.cloudflare.com/markdown-for-agents/) an 80% reduction in token usage when converting a blog post from HTML to Markdown (16,180 tokens down to 3,150).
+
+Botkibble adds a Markdown endpoint to every published post and page.
+
+Cloudflare offers [Markdown for Agents](https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/) at the CDN edge on Pro, Business, and Enterprise plans. Botkibble does the same thing (for free) at the origin, so it works on any host.
 
 [GitHub Repository](https://github.com/greg-randall/botkibble)
@@ -19,7 +23,39 @@
 **Three ways to request Markdown:**
 
-* **`.md` suffix** — append `.md` to any post or page URL (e.g. `example.com/my-post.md`)
-* **Query parameter** — add `?format=markdown` to any post or page URL
-* **Content negotiation** — send `Accept: text/markdown` in the request header
+* **`.md` suffix**: append `.md` to any post or page URL (e.g. `example.com/my-post.md`)
+* **Query parameter**: add `?format=markdown` to any post or page URL
+* **Content negotiation**: send `Accept: text/markdown` in the request header
+
+**What's in every response:**
+
+* Structured metadata header with title, date, categories, tags, word count, character count, and estimated token count (in YAML frontmatter format, readable by any AI agent)
+* Clean Markdown converted from fully-rendered post HTML (shortcodes run, filters applied)
+* `Content-Type: text/markdown` and `Vary: Accept` response headers
+* `Content-Signal` header for AI signal declaration — defaults to `ai-train=no, search=yes, ai-input=yes` — see [contentsignals.org](https://contentsignals.org/)
+* `X-Markdown-Tokens` header with estimated token count
+* Discovery via `<link rel="alternate">` in the HTML head and `Link` HTTP header
+* Automatic cache invalidation when a post is updated or deleted
+
+**Performance:**
+
+Botkibble writes Markdown to disk on the first request, then serves it as a static file. A built-in Fast-Path serves cached files during WordPress's `init` hook, before the main database query runs. No extra configuration needed.
+
+Add a web server rewrite rule and Botkibble bypasses PHP entirely, serving `.md` files the same way a server would serve an image or CSS file:
+
+| Method | Avg. response time |
+|---|---|
+| Standard HTML | 0.97s |
+| Markdown (cold, first request) | 0.95s |
+| Markdown (cached, PHP Fast-Path) | 0.87s |
+| Markdown (Nginx/Apache direct) | 0.11s |
+
+Serving directly from disk is **88% faster** than a full WordPress page load. See the Performance section below for Nginx and Apache configuration.
+
+**Security:**
+
+* Drafts, private posts, and password-protected content return `403 Forbidden`
+* Rate limits cache-miss regenerations (20/min by default) to mitigate DoS abuse
+* `X-Robots-Tag: noindex` keeps Markdown versions out of search results
+* `Link: rel="canonical"` points search engines back to the HTML version
 
 **Cache variants (optional):**
@@ -30,19 +66,9 @@
     /wp-content/uploads/botkibble/_v/<variant>/<slug>.md
 
-**What you get:**
-
-* YAML frontmatter with title, date, categories, tags, `word_count`, `char_count`, and `tokens` (estimate)
-* Clean Markdown converted from the fully-rendered post HTML
-* `Content-Type: text/markdown` response header
-* `Content-Signal` header (`ai-train`, `search`, `ai-input`)
-* Discovery via `<link rel="alternate">` tag (body) and HTTP `Link` header
-* Static file offloading with automatic invalidation on post update
-* Rate limiting for cache-miss regenerations (20 per minute by default)
-
 **What it does NOT do:**
 
 * Expose drafts, private posts, or password-protected content
 * Serve non-post/page content types by default
-* Require any configuration — activate it and it works
+* Require any configuration. Activate and it works.
 
 == Why Markdown? ==
@@ -55,4 +81,6 @@
 
 If you use Cloudflare, both share the same `Accept: text/markdown` header, `Content-Signal` headers, and `X-Markdown-Tokens` response headers.
+
+Cloudflare currently defaults to `Content-Signal: ai-train=yes, search=yes, ai-input=yes` with no way to change it. Botkibble defaults to `ai-train=no` and lets you override the full signal per site via the `botkibble_content_signal` filter.
 
 == Performance & Static Offloading ==
@@ -117,5 +145,34 @@
     } );
 
-Be careful — only add post types that contain public content. Do not expose post types that may contain private or sensitive data (e.g. WooCommerce orders).
+Be careful. Only add post types that contain public content. Do not expose post types that may contain private or sensitive data (e.g. WooCommerce orders).
+
+= What does the YAML frontmatter include? =
+
+Every response starts with a YAML block containing:
+
+* `title` — the post title
+* `date` — publish date in ISO 8601 format
+* `type` — post type (e.g. `post`, `page`)
+* `word_count` — word count of the Markdown body
+* `char_count` — character count of the Markdown body
+* `tokens` — estimated token count (word_count × 1.3)
+* `categories` — array of category names (posts only)
+* `tags` — array of tag names (posts only, omitted if none)
+
+Example:
+
+    ---
+    title: My Post Title
+    date: '2025-06-01T12:00:00+00:00'
+    type: post
+    word_count: 842
+    char_count: 4981
+    tokens: 1095
+    categories:
+      - Technology
+    tags:
+      - wordpress
+      - markdown
+    ---
 
 = How do I add custom fields to the frontmatter? =
@@ -164,4 +221,15 @@
     } );
 
+= Can I strip script nodes during conversion? =
+
+Yes. Botkibble keeps converter node removal disabled by default (for backward compatibility), but you can opt in with `botkibble_converter_remove_nodes`:
+
+    add_filter( 'botkibble_converter_remove_nodes', function ( $nodes ) {
+        $nodes = is_array( $nodes ) ? $nodes : [];
+        $nodes[] = 'script';
+        return $nodes;
+    } );
+
+If you also need `application/ld+json`, extract it in `botkibble_clean_html` first, then let converter-level script removal clean up any remaining script tags.
 = How do I modify the body before metrics are calculated? =
 
@@ -212,5 +280,5 @@
 * `Link: <url>; rel="canonical"` — points search engines to the original HTML post
 * `Link: <url>; rel="alternate"` — advertises the Markdown version for discovery
-* `Content-Signal: ai-train=yes, search=yes, ai-input=yes` — see [contentsignals.org](https://contentsignals.org/)
+* `Content-Signal: ai-train=no, search=yes, ai-input=yes` — see [contentsignals.org](https://contentsignals.org/)
 
 == Credits ==
@@ -219,4 +287,8 @@
 
 == Changelog ==
+
+= 1.3.0 =
+* Changed default Content-Signal from ai-train=yes to ai-train=no (opt-out of AI training by default).
+* Added botkibble_converter_remove_nodes filter for opt-in HTML node stripping during conversion.
 
 = 1.2.1 =
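The new FAQ example for `botkibble_converter_remove_nodes` resets any non-array value to `[]`. The filter's docblock allows either an array or a space-separated string. As a result, a string such as `'style'` returned by an earlier callback on the same filter would be silently discarded. Below is a hedged alternative callback that keeps both forms. The function name is hypothetical, and the calls at the bottom only simulate values that `apply_filters()` might pass in.

```php
<?php
/**
 * Hypothetical callback for the botkibble_converter_remove_nodes filter.
 * Accepts either form the filter allows (array, or a space/comma-separated
 * string) and appends 'script' without discarding earlier values.
 */
function my_botkibble_remove_script_nodes( $nodes ): array {
	if ( is_string( $nodes ) ) {
		$nodes = preg_split( '/[\s,]+/', $nodes, -1, PREG_SPLIT_NO_EMPTY ) ?: [];
	} elseif ( ! is_array( $nodes ) ) {
		$nodes = [];
	}
	$nodes[] = 'script';
	// Botkibble normalizes and de-duplicates later; this just keeps the list tidy.
	return array_values( array_unique( $nodes ) );
}

// In WordPress:
// add_filter( 'botkibble_converter_remove_nodes', 'my_botkibble_remove_script_nodes' );

// Outside WordPress, simulate values an earlier callback might have returned:
print_r( my_botkibble_remove_script_nodes( 'style iframe' ) );
print_r( my_botkibble_remove_script_nodes( [ 'style' ] ) );
```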
botkibble/trunk/botkibble.php
--- r3470652
+++ r3470692
@@ -4,5 +4,5 @@
  * Plugin URI: https://github.com/greg-randall/botkibble
  * Description: Serve published posts and pages as clean Markdown for AI agents and crawlers.
- * Version: 1.2.1
+ * Version: 1.3.0
  * Requires at least: 6.0
  * Requires PHP: 8.2
@@ -17,5 +17,5 @@
 }
 
-define( 'BOTKIBBLE_VERSION', '1.2.1' );
+define( 'BOTKIBBLE_VERSION', '1.3.0' );
 define( 'BOTKIBBLE_PLUGIN_DIR', plugin_dir_path( __FILE__ ) );
 
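This release bumps the same version string in three places: the `Version:` plugin header, the `BOTKIBBLE_VERSION` constant, and the readme's `Stable tag:`. Below is a hypothetical pre-release check that extracts all three and reports whether they agree. It is not part of the plugin or its tooling. It works on strings so it can run without the files, and its regular expressions assume the exact line layouts shown in this changeset.

```php
<?php
/**
 * Hypothetical release check: collect the version from the plugin header,
 * the BOTKIBBLE_VERSION constant, and the readme Stable tag.
 * Patterns assume the line layouts shown in this changeset.
 */
function release_versions( string $plugin_php, string $readme_txt ): array {
	$found = [];
	if ( preg_match( '/^\s*\*\s*Version:\s*(\S+)/m', $plugin_php, $m ) ) {
		$found['header'] = $m[1];
	}
	if ( preg_match( "/define\(\s*'BOTKIBBLE_VERSION',\s*'([^']+)'\s*\)/", $plugin_php, $m ) ) {
		$found['constant'] = $m[1];
	}
	if ( preg_match( '/^Stable tag:\s*(\S+)/m', $readme_txt, $m ) ) {
		$found['stable_tag'] = $m[1];
	}
	return $found;
}

function release_versions_agree( array $found ): bool {
	// All three must be present and identical.
	return count( $found ) === 3 && count( array_unique( $found ) ) === 1;
}

$plugin = " * Version: 1.3.0\n * Requires at least: 6.0\ndefine( 'BOTKIBBLE_VERSION', '1.3.0' );\n";
$readme = "Requires PHP: 8.2\nStable tag: 1.3.0\n";

var_dump( release_versions_agree( release_versions( $plugin, $readme ) ) ); // bool(true)
```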
botkibble/trunk/includes/converter.php
--- r3470652
+++ r3470692
@@ -312,8 +312,27 @@
 	static $converter = null;
 	if ( null === $converter ) {
-		$converter = new HtmlConverter([
+		$converter_options = [
 			'strip_tags' => true,
 			'hard_break' => true,
-		] );
+		];
+
+		/**
+		 * Optional: remove entire DOM node types before conversion.
+		 *
+		 * Keep this empty by default to preserve legacy behavior.
+		 * Example return values:
+		 * - array: [ 'script', 'style' ]
+		 * - string: 'script style'
+		 *
+		 * @param array<int, string>|string $remove_nodes Requested node names.
+		 * @param WP_Post $post Current post being rendered.
+		 */
+		$remove_nodes = apply_filters( 'botkibble_converter_remove_nodes', [], $post );
+		$remove_nodes = botkibble_normalize_remove_nodes( $remove_nodes );
+		if ( ! empty( $remove_nodes ) ) {
+			$converter_options['remove_nodes'] = implode( ' ', $remove_nodes );
+		}
+
+		$converter = new HtmlConverter( $converter_options );
 	}
 
@@ -322,4 +341,38 @@
 		'word_count' => $word_count,
 	];
+}
+
+/**
+ * Normalize a converter remove_nodes value into a clean list of tag names.
+ *
+ * Accepts either a string (space/comma-separated) or an array of values and
+ * returns unique lowercase tag names safe to pass to HtmlConverter.
+ *
+ * @param array<int, mixed>|string $nodes Raw filter value.
+ * @return array<int, string>
+ */
+function botkibble_normalize_remove_nodes( $nodes ): array {
+	if ( is_string( $nodes ) ) {
+		$nodes = preg_split( '/[\s,]+/', $nodes ) ?: [];
+	} elseif ( ! is_array( $nodes ) ) {
+		return [];
+	}
+
+	$out = [];
+	foreach ( $nodes as $node ) {
+		$name = strtolower( trim( (string) $node ) );
+		if ( '' === $name ) {
+			continue;
+		}
+
+		// Keep DOM-like node names only (e.g. script, style, iframe).
+		if ( ! preg_match( '/^[a-z][a-z0-9:_-]*$/', $name ) ) {
+			continue;
+		}
+
+		$out[ $name ] = true;
+	}
+
+	return array_keys( $out );
 }
 
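In the converter.php excerpt, `botkibble_converter_remove_nodes` is applied only inside `if ( null === $converter )`, and `$converter` is a function-level `static`. Unless code outside the excerpt resets it, the node list chosen for the first post converted in a PHP request would be reused for every later post in that request, even though the filter receives `$post`. The sketch below reproduces only that caching pattern, using a hypothetical stand-in function with no WordPress and no HtmlConverter. It illustrates the effect and is not a confirmed bug report.

```php
<?php
/**
 * Hypothetical stand-in for the converter setup: options are built once,
 * from the first $post_type seen, and cached in a static variable the same
 * way the 1.3.0 excerpt caches $converter.
 */
function converter_options_for( string $post_type ): array {
	static $options = null;
	if ( null === $options ) {
		// Stand-in for a filter callback that strips scripts only on pages.
		$remove  = ( 'page' === $post_type ) ? [ 'script' ] : [];
		$options = [ 'strip_tags' => true, 'remove_nodes' => implode( ' ', $remove ) ];
	}
	return $options;
}

$first  = converter_options_for( 'page' ); // builds and caches the options
$second = converter_options_for( 'post' ); // returns the cached page options
var_dump( $second['remove_nodes'] );        // string(6) "script"
```

This matters only for callbacks that return different lists per post. A callback that returns the same list for every post, like the readme's FAQ example, behaves identically either way.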
botkibble/trunk/includes/routing.php
--- r3470652
+++ r3470692
@@ -655,5 +655,5 @@
  */
 function botkibble_send_content_signal_header( ?WP_Post $post = null ): void {
-	$signal = apply_filters( 'botkibble_content_signal', 'ai-train=yes, search=yes, ai-input=yes', $post );
+	$signal = apply_filters( 'botkibble_content_signal', 'ai-train=no, search=yes, ai-input=yes', $post );
 	$signal = str_replace( [ "\r", "\n" ], '', $signal );
 	if ( $signal ) {
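The unchanged context line strips `\r` and `\n` from whatever the filter returns before the header is sent. As far as we know, PHP's `header()` refuses values that contain a newline. Stripping them first therefore turns a sloppy filter value into a valid single-line header instead of a rejected one. Below is a minimal standalone sketch of that step. `content_signal_header_line()` is a hypothetical helper; only its `str_replace()` call and truthiness check mirror the plugin code.

```php
<?php
/**
 * Hypothetical helper mirroring the sanitization in
 * botkibble_send_content_signal_header(): drop CR and LF, then build the
 * header line only if something non-empty is left.
 */
function content_signal_header_line( string $signal ): ?string {
	$signal = str_replace( [ "\r", "\n" ], '', $signal );
	return $signal ? 'Content-Signal: ' . $signal : null;
}

echo content_signal_header_line( "ai-train=no, search=yes,\r\n ai-input=yes" ), "\n";
// Content-Signal: ai-train=no, search=yes, ai-input=yes
```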
botkibble/trunk/readme.txt
--- r3470652
+++ r3470692
@@ -5,13 +5,17 @@
 Tested up to: 6.9
 Requires PHP: 8.2
-Stable tag: 1.2.1
+Stable tag: 1.3.0
 License: GPL-2.0-only
 License URI: https://www.gnu.org/licenses/gpl-2.0.html
 
-Serve published posts and pages as clean Markdown with YAML frontmatter — built for AI agents and crawlers.
+Serves every published post and page as Markdown for AI agents and crawlers. No configuration, no API keys. Activate and it works.
 
 == Description ==
 
-Botkibble converts any published post or page on your WordPress site to Markdown. It caches the output and serves it with `text/markdown` headers.
+AI agents, LLMs, and crawlers have to wade through navigation bars, sidebars, ads, and comment forms to reach the content they want, and every element costs tokens. [Cloudflare measured](https://blog.cloudflare.com/markdown-for-agents/) an 80% reduction in token usage when converting a blog post from HTML to Markdown (16,180 tokens down to 3,150).
+
+Botkibble adds a Markdown endpoint to every published post and page.
+
+Cloudflare offers [Markdown for Agents](https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/) at the CDN edge on Pro, Business, and Enterprise plans. Botkibble does the same thing (for free) at the origin, so it works on any host.
 
 [GitHub Repository](https://github.com/greg-randall/botkibble)
@@ -19,7 +23,39 @@
 **Three ways to request Markdown:**
 
-* **`.md` suffix** — append `.md` to any post or page URL (e.g. `example.com/my-post.md`)
-* **Query parameter** — add `?format=markdown` to any post or page URL
-* **Content negotiation** — send `Accept: text/markdown` in the request header
+* **`.md` suffix**: append `.md` to any post or page URL (e.g. `example.com/my-post.md`)
+* **Query parameter**: add `?format=markdown` to any post or page URL
+* **Content negotiation**: send `Accept: text/markdown` in the request header
+
+**What's in every response:**
+
+* Structured metadata header with title, date, categories, tags, word count, character count, and estimated token count (in YAML frontmatter format, readable by any AI agent)
+* Clean Markdown converted from fully-rendered post HTML (shortcodes run, filters applied)
+* `Content-Type: text/markdown` and `Vary: Accept` response headers
+* `Content-Signal` header for AI signal declaration — defaults to `ai-train=no, search=yes, ai-input=yes` — see [contentsignals.org](https://contentsignals.org/)
+* `X-Markdown-Tokens` header with estimated token count
+* Discovery via `<link rel="alternate">` in the HTML head and `Link` HTTP header
+* Automatic cache invalidation when a post is updated or deleted
+
+**Performance:**
+
+Botkibble writes Markdown to disk on the first request, then serves it as a static file. A built-in Fast-Path serves cached files during WordPress's `init` hook, before the main database query runs. No extra configuration needed.
+
+Add a web server rewrite rule and Botkibble bypasses PHP entirely, serving `.md` files the same way a server would serve an image or CSS file:
+
+| Method | Avg. response time |
+|---|---|
+| Standard HTML | 0.97s |
+| Markdown (cold, first request) | 0.95s |
+| Markdown (cached, PHP Fast-Path) | 0.87s |
+| Markdown (Nginx/Apache direct) | 0.11s |
+
+Serving directly from disk is **88% faster** than a full WordPress page load. See the Performance section below for Nginx and Apache configuration.
+
+**Security:**
+
+* Drafts, private posts, and password-protected content return `403 Forbidden`
+* Rate limits cache-miss regenerations (20/min by default) to mitigate DoS abuse
+* `X-Robots-Tag: noindex` keeps Markdown versions out of search results
+* `Link: rel="canonical"` points search engines back to the HTML version
 
 **Cache variants (optional):**
@@ -30,19 +66,9 @@
     /wp-content/uploads/botkibble/_v/<variant>/<slug>.md
 
-**What you get:**
-
-* YAML frontmatter with title, date, categories, tags, `word_count`, `char_count`, and `tokens` (estimate)
-* Clean Markdown converted from the fully-rendered post HTML
-* `Content-Type: text/markdown` response header
-* `Content-Signal` header (`ai-train`, `search`, `ai-input`)
-* Discovery via `<link rel="alternate">` tag (body) and HTTP `Link` header
-* Static file offloading with automatic invalidation on post update
-* Rate limiting for cache-miss regenerations (20 per minute by default)
-
 **What it does NOT do:**
 
 * Expose drafts, private posts, or password-protected content
 * Serve non-post/page content types by default
-* Require any configuration — activate it and it works
+* Require any configuration. Activate and it works.
 
 == Why Markdown? ==
@@ -55,4 +81,6 @@
 
 If you use Cloudflare, both share the same `Accept: text/markdown` header, `Content-Signal` headers, and `X-Markdown-Tokens` response headers.
+
+Cloudflare currently defaults to `Content-Signal: ai-train=yes, search=yes, ai-input=yes` with no way to change it. Botkibble defaults to `ai-train=no` and lets you override the full signal per site via the `botkibble_content_signal` filter.
 
 == Performance & Static Offloading ==
@@ -117,5 +145,34 @@
     } );
 
-Be careful — only add post types that contain public content. Do not expose post types that may contain private or sensitive data (e.g. WooCommerce orders).
+Be careful. Only add post types that contain public content. Do not expose post types that may contain private or sensitive data (e.g. WooCommerce orders).
+
+= What does the YAML frontmatter include? =
+
+Every response starts with a YAML block containing:
+
+* `title` — the post title
+* `date` — publish date in ISO 8601 format
+* `type` — post type (e.g. `post`, `page`)
+* `word_count` — word count of the Markdown body
+* `char_count` — character count of the Markdown body
+* `tokens` — estimated token count (word_count × 1.3)
+* `categories` — array of category names (posts only)
+* `tags` — array of tag names (posts only, omitted if none)
+
+Example:
+
+    ---
+    title: My Post Title
+    date: '2025-06-01T12:00:00+00:00'
+    type: post
+    word_count: 842
+    char_count: 4981
+    tokens: 1095
+    categories:
+      - Technology
+    tags:
+      - wordpress
+      - markdown
+    ---
 
 = How do I add custom fields to the frontmatter? =
@@ -164,4 +221,15 @@
     } );
 
+= Can I strip script nodes during conversion? =
+
+Yes. Botkibble keeps converter node removal disabled by default (for backward compatibility), but you can opt in with `botkibble_converter_remove_nodes`:
+
+    add_filter( 'botkibble_converter_remove_nodes', function ( $nodes ) {
+        $nodes = is_array( $nodes ) ? $nodes : [];
+        $nodes[] = 'script';
+        return $nodes;
+    } );
+
+If you also need `application/ld+json`, extract it in `botkibble_clean_html` first, then let converter-level script removal clean up any remaining script tags.
 = How do I modify the body before metrics are calculated? =
 
@@ -212,5 +280,5 @@
 * `Link: <url>; rel="canonical"` — points search engines to the original HTML post
 * `Link: <url>; rel="alternate"` — advertises the Markdown version for discovery
-* `Content-Signal: ai-train=yes, search=yes, ai-input=yes` — see [contentsignals.org](https://contentsignals.org/)
+* `Content-Signal: ai-train=no, search=yes, ai-input=yes` — see [contentsignals.org](https://contentsignals.org/)
 
 == Credits ==
@@ -219,4 +287,8 @@
 
 == Changelog ==
+
+= 1.3.0 =
+* Changed default Content-Signal from ai-train=yes to ai-train=no (opt-out of AI training by default).
+* Added botkibble_converter_remove_nodes filter for opt-in HTML node stripping during conversion.
 
 = 1.2.1 =
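The new FAQ entry defines `tokens` as `word_count × 1.3`. The sketch below checks that formula against the example's `word_count: 842` and prints the numeric frontmatter fields for a short body. Several details are assumptions, not the plugin's implementation: rounding to the nearest integer (the readme does not say how fractions are handled, though `ceil()` would also give 1095 here), counting words with `str_word_count()`, and counting characters as bytes with `strlen()`. The helper names are hypothetical.

```php
<?php
/**
 * Hypothetical sketch of the numeric frontmatter fields described in the
 * 1.3.0 readme. Assumptions (not confirmed by the changeset):
 * - words are counted with PHP's str_word_count(),
 * - characters are counted as bytes with strlen(),
 * - tokens = word_count * 1.3, rounded to the nearest integer.
 */
function estimate_tokens( int $word_count ): int {
	return (int) round( $word_count * 1.3 );
}

function frontmatter_counts( string $markdown_body ): string {
	$words = str_word_count( $markdown_body );
	return sprintf(
		"word_count: %d\nchar_count: %d\ntokens: %d\n",
		$words,
		strlen( $markdown_body ),
		estimate_tokens( $words )
	);
}

// The readme example: 842 words -> 842 * 1.3 = 1094.6 -> 1095 tokens.
echo estimate_tokens( 842 ), "\n"; // 1095

echo frontmatter_counts( 'Hello Markdown world' );
// word_count: 3
// char_count: 20
// tokens: 4
```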