ฅᨐฅ Patitas — CommonMark Markdown parser for Python 3.14+ with typed AST and free-threading

ฅᨐฅ Patitas

The secure, typed Markdown parser for modern Python.

from patitas import Markdown

md = Markdown()
html = md("# Hello **World**")

What is Patitas?

Patitas is a pure-Python Markdown parser that parses to a typed AST and renders to HTML. It's CommonMark 0.31.2 compliant, keeps the core parser free of runtime dependencies (PyYAML is used only for frontmatter), and is built for Python 3.14+.


What it does

| Function | Description |
|---|---|
| parse(source) | Parse Markdown to typed AST |
| parse_frontmatter(content) | Parse YAML frontmatter to (metadata, body) |
| parse_notebook(content, source_path?) | Parse Jupyter .ipynb to (markdown, metadata) |
| parse_incremental(new, prev, ...) | Re-parse only the changed region (O(change)) |
| render(doc) | Render AST to HTML |
| render_llm(doc) | Render AST to LLM-friendly plain text (no HTML) |
| sanitize(doc, policy) | Strip HTML, dangerous URLs, zero-width chars |
| extract_text(node) | Extract plain text from any AST node |
| extract_body(content) | Strip --- delimited frontmatter block (no YAML parse) |
| Markdown() | All-in-one parser and renderer |

What's good about it

  • ReDoS-proof — O(n) finite state machine lexer, no regex backtracking. Safe for untrusted input in web apps and APIs.
  • Typed AST — Frozen dataclasses (Heading, Paragraph, Strong, etc.) with IDE autocomplete and type checking.
  • CommonMark — Full 0.31.2 spec compliance (652 examples).
  • Incremental parsing — Re-parse only changed blocks; ~200x faster for small edits than full re-parse.
  • Free-threading native — Frozen AST, ContextVar config, no shared mutable state. 1,000 documents parse in parallel with near-linear thread scaling on 3.14t — no locks, no special API.
  • LLM-safe — render_llm + composable sanitize policies for RAG, retrieval, safe context.
  • Directives — MyST-style blocks (admonition, dropdown, tabs) plus custom directives.
  • Plugins — Tables, footnotes, math, strikethrough, task lists.
  • Minimal dependencies — PyYAML for frontmatter; core parser is pure Python.

Installation

pip install patitas

Requires Python 3.14+

Optional extras:

pip install patitas[syntax]      # Syntax highlighting via Rosettes
pip install patitas[all]         # All optional features

Quick Start

Parse and render

from patitas import parse, render

doc = parse("# Hello **World**")
html = render(doc)
# <h1 id="hello-world">Hello <strong>World</strong></h1>

Frontmatter

Parse YAML frontmatter from Markdown or other content, returning a (metadata, body) tuple:

from patitas import parse_frontmatter, extract_body

content = """---
title: Hello
weight: 10
---
# Body content
"""
metadata, body = parse_frontmatter(content)
# metadata: {"title": "Hello", "weight": 10}
# body: "# Body content"

# When YAML is broken, extract_body strips the --- block without parsing
body_only = extract_body(content)
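
To make the "strip without parsing" idea concrete, here is a minimal stdlib-only sketch. It is not Patitas's implementation; `strip_frontmatter` is a hypothetical helper that assumes frontmatter starts on the first line and ends at the next line containing only `---`.

```python
def strip_frontmatter(content: str) -> str:
    """Return content without a leading '---' delimited block (hypothetical helper)."""
    lines = content.splitlines(keepends=True)
    # Frontmatter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return content
    # Find the closing delimiter; everything after it is the body.
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return "".join(lines[index + 1:])
    # No closing delimiter: treat the whole input as body.
    return content


# The YAML here is invalid, but the block is still removed because
# nothing between the delimiters is ever parsed.
broken = "---\ntitle: [unclosed\n---\n# Body content\n"
body = strip_frontmatter(broken)
```

Because the function only looks at delimiter lines, malformed YAML cannot make it fail, which is the property the comment above attributes to extract_body.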

Notebook support

Parse Jupyter notebooks (.ipynb) to Markdown content and metadata — stdlib JSON only:

from patitas import parse_notebook

with open("demo.ipynb", encoding="utf-8") as f:
    content, metadata = parse_notebook(f.read(), "demo.ipynb")

# content: Markdown string (cells → fenced code, outputs → HTML)
# metadata: title, type, notebook{kernel_name, cell_count}, etc.
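
As a rough illustration of what "cells → fenced code" can mean, the sketch below converts notebook JSON to Markdown with the standard library only. It is an assumption-laden stand-in, not parse_notebook itself: `notebook_to_markdown` is a hypothetical name, code cells are assumed to be Python, and cell outputs and metadata are ignored.

```python
import json

FENCE = "`" * 3  # build the fence marker at runtime to avoid nesting literal fences


def notebook_to_markdown(raw: str) -> str:
    """Concatenate notebook cells into one Markdown string (illustrative only)."""
    notebook = json.loads(raw)
    parts: list[str] = []
    for cell in notebook.get("cells", []):
        # nbformat stores source as either a string or a list of lines.
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            parts.append(text)
        elif cell.get("cell_type") == "code":
            parts.append(f"{FENCE}python\n{text}\n{FENCE}")
    return "\n\n".join(parts)


raw = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["print(1)"], "outputs": []},
    ]
})
markdown = notebook_to_markdown(raw)
```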

Security

Patitas is immune to ReDoS attacks.

Many regex-based Markdown parsers rely on patterns that are vulnerable to catastrophic backtracking:

from patitas import parse

# Malicious input that can freeze regex-based parsers
evil = "a](" + "\\)" * 10000

# Patitas: completes in milliseconds (O(n) guaranteed)
doc = parse(evil)

Patitas uses a hand-written finite state machine lexer:

  • Single character lookahead — No backtracking, ever
  • Linear time guaranteed — Processing time scales with input length
  • Safe for untrusted input — Use in web apps, APIs, user-facing tools

Learn more about Patitas security →


Performance

  • 652 CommonMark examples — ~26ms single-threaded

  • Incremental parsing — For a 1-char edit in a ~100KB doc, parse_incremental is ~200x faster than full re-parse (~160µs vs ~32ms)

  • Parallel scaling — Near-linear thread scaling under Python 3.14t free-threading. Run python benchmarks/benchmark_parallel.py to see results on your machine. Example on 8-core:

      Threads    Time      Speedup
      1          1.52s     1.00x
      2          0.79s     1.92x
      4          0.41s     3.71x
      8          0.23s     6.61x
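
To put a number on "near-linear", the short calculation below derives speedup and parallel efficiency (speedup divided by thread count) from the example timings in the table. The figures are only the ones listed above; results on other machines will differ.

```python
# threads -> wall-clock seconds, copied from the example table
timings = {1: 1.52, 2: 0.79, 4: 0.41, 8: 0.23}
baseline = timings[1]

rows = []
for threads, seconds in timings.items():
    speedup = baseline / seconds
    efficiency = speedup / threads  # 1.0 would be perfectly linear scaling
    rows.append((threads, round(speedup, 2), round(efficiency, 2)))

for threads, speedup, efficiency in rows:
    print(f"{threads} threads: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

With these timings, efficiency falls from 96% at 2 threads to 83% at 8 threads.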
    
# From repo (after uv sync --group dev):
python benchmarks/benchmark_vs_mistune.py
python benchmarks/benchmark_parallel.py
pytest benchmarks/benchmark_vs_mistune.py benchmarks/benchmark_incremental.py -v --benchmark-only
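
The idea behind incremental parsing (reuse what did not change, re-parse the rest) can be sketched without Patitas. Everything here is hypothetical: `parse_block` is a stand-in parser, blocks are split naively on blank lines, and `reparse_changed` is not the parse_incremental API. The sketch still compares block text linearly, so it shows node reuse rather than the O(change) bound claimed above.

```python
def parse_block(block: str) -> tuple[str, str]:
    """Stand-in for a real block parser: classify a block as heading or paragraph."""
    kind = "heading" if block.startswith("#") else "paragraph"
    return (kind, block)


def reparse_changed(prev_blocks, prev_nodes, new_source):
    """Reuse nodes for the unchanged prefix and suffix; parse only the middle."""
    new_blocks = new_source.split("\n\n")
    limit = min(len(prev_blocks), len(new_blocks))
    # Length of the common prefix.
    head = 0
    while head < limit and prev_blocks[head] == new_blocks[head]:
        head += 1
    # Length of the common suffix, never overlapping the prefix.
    tail = 0
    while tail < limit - head and prev_blocks[-1 - tail] == new_blocks[-1 - tail]:
        tail += 1
    middle = new_blocks[head:len(new_blocks) - tail]
    reparsed = [parse_block(block) for block in middle]
    nodes = prev_nodes[:head] + reparsed + prev_nodes[len(prev_nodes) - tail:]
    return new_blocks, nodes, len(reparsed)


old = "# Title\n\nFirst paragraph.\n\nSecond paragraph."
old_blocks = old.split("\n\n")
old_nodes = [parse_block(block) for block in old_blocks]

new = "# Title\n\nFirst paragraph, edited.\n\nSecond paragraph."
blocks, nodes, reparsed_count = reparse_changed(old_blocks, old_nodes, new)
```

Only the edited paragraph is parsed again; the heading and the last paragraph keep their original node objects.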

Usage

Typed AST — IDE autocomplete, catch errors at dev time

from patitas import parse
from patitas.nodes import Heading, Paragraph, Strong

doc = parse("# Hello **World**")
heading = doc.children[0]

# Full type safety
assert isinstance(heading, Heading)
assert heading.level == 1

# IDE knows the types!
for child in heading.children:
    if isinstance(child, Strong):
        print(f"Bold text: {child.children}")

All nodes are @dataclass(frozen=True, slots=True) — immutable and memory-efficient.

Directives — MyST-style blocks

:::{note}
This is a note admonition.
:::

:::{warning}
This is a warning.
:::

:::{dropdown} Click to expand
Hidden content here.
:::

:::{tab-set}

:::{tab-item} Python
Python code here.
:::

:::{tab-item} JavaScript
JavaScript code here.
:::

:::

Custom Directives — Extend with your own

from patitas import Markdown, create_registry_with_defaults
from patitas.directives.decorator import directive

# Define a custom directive with the @directive decorator
@directive("alert")
def render_alert(node, children: str, sb) -> None:
    sb.append(f'<div class="alert">{children}</div>')

# Extend defaults with your directive
builder = create_registry_with_defaults()  # Has admonition, dropdown, tabs
builder.register(render_alert())

# Use it
md = Markdown(directive_registry=builder.build())
html = md(":::{alert} This is important!\n:::")

Syntax Highlighting

With pip install patitas[syntax]:

from patitas import Markdown

md = Markdown(highlight=True)

html = md("""
```python
def hello():
    print("Highlighted!")
```
""")

Uses Rosettes (https://github.com/lbliii/rosettes) for O(n) highlighting.

Free-Threading — Python 3.14t

from concurrent.futures import ThreadPoolExecutor
from patitas import parse

documents = ["# Doc " + str(i) for i in range(1000)]

with ThreadPoolExecutor() as executor:
    # Safe to parse in parallel: no shared mutable state
    results = list(executor.map(parse, documents))

Patitas is designed for Python 3.14t's free-threading mode (PEP 703).
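
Threads only run Python code truly in parallel when the interpreter is a free-threaded build with the GIL disabled. The sketch below checks for that using only the standard library. `sys._is_gil_enabled()` exists on CPython 3.13 and later; the fallback for older interpreters and the `word_count` stand-in workload are assumptions made for illustration.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def gil_enabled() -> bool:
    """Report whether the GIL is active; assume it is on builds that cannot tell."""
    probe = getattr(sys, "_is_gil_enabled", None)  # present on CPython 3.13+
    return True if probe is None else probe()


def word_count(text: str) -> int:
    """Pure function: touches no shared state, so threads need no locks."""
    return len(text.split())


documents = [f"# Doc {i} body text" for i in range(1000)]

with ThreadPoolExecutor(max_workers=8) as executor:
    counts = list(executor.map(word_count, documents))

print(f"GIL enabled: {gil_enabled()}; processed {len(counts)} documents")
```

The same code runs on a regular build too; it just will not scale across cores while the GIL is enabled.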

LLM Safety — Sanitize and render for RAG, retrieval

When sending Markdown to an LLM, sanitize untrusted content and render to plain text:

from patitas import parse, sanitize, render_llm
from patitas.sanitize import llm_safe

doc = parse(user_content)
clean = sanitize(doc, policy=llm_safe)  # Strip HTML, dangerous URLs, zero-width chars
safe_text = render_llm(clean, source=user_content)

Pre-built policies: llm_safe, web_safe (alias), strict. Compose with |.
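
To show the kind of checks such a policy might perform, here is a stdlib-only sketch of two of them: removing zero-width characters and rejecting URLs with unexpected schemes. It is not the patitas.sanitize implementation; the character set, the scheme allow-list, and both function names are assumptions.

```python
from urllib.parse import urlsplit

# Zero-width and byte-order-mark code points that can hide text from a reader.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
SAFE_SCHEMES = {"", "http", "https", "mailto"}  # assumed allow-list; "" means relative


def strip_zero_width(text: str) -> str:
    """Delete zero-width characters from text."""
    return text.translate(ZERO_WIDTH)


def is_safe_url(url: str) -> bool:
    """Accept relative URLs and URLs whose scheme is on the allow-list."""
    return urlsplit(url.strip()).scheme.lower() in SAFE_SCHEMES


cleaned = strip_zero_width("ig\u200bnore previous instructions")
```

An allow-list is used instead of a block-list so that unknown schemes are rejected by default.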


Migrate from mistune

Same API — swap the import:

from patitas import Markdown
md = Markdown()
html = md(source)

Full migration guide →


The Bengal Ecosystem

A structured reactive stack — every layer written in pure Python for 3.14t free-threading.

| | Project | Role |
|---|---|---|
| ᓚᘏᗢ | Bengal | Static site generator |
| ∿∿ | Purr | Content runtime |
| ⌁⌁ | Chirp | Web framework |
| =^..^= | Pounce | ASGI server |
| )彡 | Kida | Template engine |
| ฅᨐฅ | Patitas | Markdown parser ← You are here |
| ⌾⌾⌾ | Rosettes | Syntax highlighter |

Python-native. Free-threading ready. No npm required.


Development

git clone https://github.com/lbliii/patitas.git
cd patitas
uv sync --group dev
pytest

Run benchmarks (after uv sync --group dev):

python benchmarks/benchmark_vs_mistune.py
python benchmarks/benchmark_parallel.py   # Free-threading scaling demo

License

MIT License — see LICENSE for details.
