markdown_it_yo

A high-performance Markdown parser written in the Yo programming language, built as a direct 1:1 port of the popular markdown-it JavaScript library.

See also: markdown_yo — a custom implementation inspired by markdown-it, md4c, and markdown-wasm, optimized for speed. markdown_it_yo faithfully mirrors the original JS architecture for easier maintenance, while markdown_yo is a ground-up rewrite for maximum performance.

Features

100% CommonMark compliance
Extensions: tables, strikethrough, typographer, smartquotes, linkify
Fast native compilation via Yo → C11 → clang
CLI tool for converting markdown files/stdin to HTML

Prerequisites

Yo compiler (npm install -g @shd101wyy/yo)
A C compiler (clang recommended)

Build

yo build          # Build executable + static library
yo build run      # Build and run (reads from stdin)

Usage

# Read from stdin
echo "# Hello **World**" | ./yo-out/$TARGET/bin/markdown_it_yo -

# Read from file
./yo-out/$TARGET/bin/markdown_it_yo README.md

# CLI options
./yo-out/$TARGET/bin/markdown_it_yo --html --typographer input.md
./yo-out/$TARGET/bin/markdown_it_yo --commonmark input.md
./yo-out/$TARGET/bin/markdown_it_yo --no-lang-prefix input.md

CLI Flags

| Flag | Description |
| --- | --- |
| `--html` | Enable HTML tags in source |
| `--typographer` | Enable typographic replacements |
| `--commonmark` | Use CommonMark preset |
| `--no-lang-prefix` | Disable language prefix on fenced code blocks |
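
Because this is a port of markdown-it, each flag presumably corresponds to a preset or option of the JS library. The sketch below shows one plausible mapping. The helper name `flagsToMarkdownItConfig` is hypothetical, and the assumption that `--no-lang-prefix` corresponds to an empty `langPrefix` is a guess, not something taken from the Yo source. The option names `html`, `typographer`, and `langPrefix` and the `commonmark` preset are markdown-it's own.

```javascript
// Hypothetical helper: translate markdown_it_yo CLI flags into the preset
// name and options object accepted by the markdown-it (JS) constructor.
function flagsToMarkdownItConfig(argv) {
  const config = { preset: "default", options: {}, inputs: [] };
  for (const arg of argv) {
    switch (arg) {
      case "--html":
        config.options.html = true; // allow raw HTML in the source
        break;
      case "--typographer":
        config.options.typographer = true; // typographic replacements
        break;
      case "--commonmark":
        config.preset = "commonmark"; // strict CommonMark preset
        break;
      case "--no-lang-prefix":
        config.options.langPrefix = ""; // assumed meaning of the flag
        break;
      default:
        config.inputs.push(arg); // anything else is treated as an input path
    }
  }
  return config;
}

const example = flagsToMarkdownItConfig(["--html", "--typographer", "input.md"]);
console.log(JSON.stringify(example));
```

With markdown-it installed, the result could be passed on as `require('markdown-it')(config.preset, config.options)` to render the same input with the reference implementation.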

Test

Unit Tests (Yo)

yo build test     # Run token, ruler, and other unit tests

Fixture Tests (vs markdown-it JS)

Compare output against the reference JavaScript implementation:

npm install                      # Install markdown-it JS dependency
node scripts/run_fixture_tests.js  # Run 826 fixture tests

Current results: 815 passed, 0 failed, 11 skipped across 9 test suites.

Benchmark

Running Benchmarks

# Generate benchmark input files (1MB and 5MB of repeated markdown)
node benchmark/generate_samples.js

# Run benchmark comparing JS and Yo
node benchmark/run.js

Results (Apple M4, macOS)

Parse time only: median of 10 timed runs after 3 warmup runs, with --repeat used to amortize process startup:

| Input | markdown-it (JS) | markdown_it_yo (Native) | Ratio (JS time / Yo time) |
| --- | --- | --- | --- |
| 1 MB | 13.8 ms | 115.3 ms | 0.12× |
| 5 MB | 69.4 ms | 594.9 ms | 0.12× |
| 20 MB | 334.0 ms | 2444.2 ms | 0.14× |

The 1:1 port is currently about 8× slower than the JS original, primarily due to reference-counting overhead in the regex engine: ~96% of heap allocations come from the regex engine, dominated by NfaThread objects created during regex matching. See markdown_yo for a custom implementation that is 2-2.5× faster than JS through a SAX architecture with value-type tokens.
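
As a quick check of the arithmetic, the ratio and slowdown figures can be recomputed from the timings in the table. This is only a sketch: the `summarize` helper and the field names are illustrative and are not part of benchmark/run.js.

```javascript
// Timings copied from the results table (milliseconds).
const results = [
  { input: "1 MB", jsMs: 13.8, yoMs: 115.3 },
  { input: "5 MB", jsMs: 69.4, yoMs: 594.9 },
  { input: "20 MB", jsMs: 334.0, yoMs: 2444.2 },
];

// Illustrative helper: ratio is JS time divided by Yo time (below 1 means
// the native port is slower); slowdown is the inverse.
function summarize(rows) {
  return rows.map((r) => ({
    input: r.input,
    ratio: (r.jsMs / r.yoMs).toFixed(2),
    slowdown: (r.yoMs / r.jsMs).toFixed(1),
  }));
}

for (const row of summarize(results)) {
  console.log(`${row.input}: ratio ${row.ratio}x, slowdown ${row.slowdown}x`);
}
```

The slowdown comes out between roughly 7× and 9× across the three input sizes, which is where the "about 8×" summary comes from.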

Note on wall-clock benchmarks: Single-run wall-clock timings (e.g., /usr/bin/time) can be misleading — Node.js startup adds ~60-150ms of overhead (VM init, JIT compilation, module loading) that amortizes away on repeated runs. The numbers above measure parse time only after JIT warmup using --repeat and process.hrtime.bigint() per iteration.
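
The measurement approach described in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in benchmark/run.js: the function names `medianParseTimeMs` and `median` and the stand-in workload are invented for the example, while `process.hrtime.bigint()` is the Node.js timer the note refers to.

```javascript
// Time only the parse call, after warmup iterations, and report the median.
function medianParseTimeMs(parse, input, { warmup = 3, runs = 10 } = {}) {
  for (let i = 0; i < warmup; i++) parse(input); // untimed: lets the JIT settle
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    parse(input);
    const end = process.hrtime.bigint();
    samples.push(Number(end - start) / 1e6); // nanoseconds to milliseconds
  }
  return median(samples);
}

// Median of a list of numbers; the input array is left unmodified.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Stand-in workload so the sketch runs without dependencies; with
// markdown-it installed, pass (text) => md.render(text) instead.
const fakeParse = (text) => text.split("\n").length;
const sample = "# Hello **World**\n".repeat(10000);
console.log(medianParseTimeMs(fakeParse, sample).toFixed(3), "ms");
```

Taking the median rather than the mean keeps a single slow iteration (for example, one interrupted by a GC pause) from skewing the reported number.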

Optimizations Applied

Despite being a faithful 1:1 port, several optimizations have been applied to reduce overhead:

Enum token types — Token type_name uses an enum instead of String, eliminating millions of string allocations and comparisons
Value-type token tags — Token tag uses str (16-byte value type, pointer+length) instead of String (RC heap object)
Buffer-pattern renderer — Renderer appends to a pre-allocated String buffer via push_str/push_string instead of string concatenation
Zero-allocation HTML escaping — escape_html_to() appends escaped content directly to the output buffer
push_str for literals — String literal appends use push_str("...") (str type) instead of push_string(`...`) (String type)
libc allocator — System malloc outperforms mimalloc for this allocation pattern (many small RC objects). Set via build.Allocator.Libc in build.yo
Pre-allocated buffers — Render buffer pre-allocated to 1.5× source size
Bulk memory operations — String.substring and String.trim use memcpy/extend_from_ptr instead of byte-by-byte copying
Pointer-based access — ArrayList.get_ptr returns pointers to elements without copying, avoiding RC overhead in hot loops
Regex caching — Compiled regex patterns cached as module-level variables
Regex VM buffer reuse — NfaVm hoisted outside search loops, seen/next_seen arrays pre-allocated as VM fields, swap+clear pattern for current/next thread lists (reduced NfaVm allocations by 81%)
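
To illustrate the swap+clear pattern named in the last item, here is a miniature matcher written in JavaScript. It is a stand-in sketch, not the project's NfaVm: the function `createMatcher`, the pattern syntax (literal characters plus `.` for any character), and the use of plain arrays as thread lists are all assumptions made for the example. The point is only that the two lists are allocated once and reused at every input position instead of being recreated.

```javascript
// Miniature NFA-style matcher: does `pattern` occur anywhere in `text`?
// Pattern syntax (assumed for this sketch): literals, "." matches any char.
function createMatcher(pattern) {
  // Both thread lists are allocated once and reused for every call and step.
  let current = [];
  let next = [];
  return function test(text) {
    current.length = 0;
    next.length = 0;
    if (pattern.length === 0) return true;
    for (let i = 0; i < text.length; i++) {
      const ch = text[i];
      current.push(0); // start a new thread at every input position
      for (const state of current) {
        if (pattern[state] === "." || pattern[state] === ch) {
          if (state + 1 === pattern.length) return true; // reached accept state
          next.push(state + 1);
        }
      }
      // Swap + clear: the surviving threads become `current`, and the old
      // `current` array is emptied and reused as the new `next`.
      const tmp = current;
      current = next;
      next = tmp;
      next.length = 0;
    }
    return false;
  };
}

const matches = createMatcher("a.c");
console.log(matches("xxabcx"), matches("xxabx"));
```

In a reference-counted runtime the benefit is larger than in JavaScript, since every list that is not created is also one fewer object to count and free.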

Performance Analysis

Profiling with instrumented C code reveals that ~96% of heap allocations come from the regex engine, not the parser itself:

| Allocator | Count (1 MB parse) | Notes |
| --- | --- | --- |
| NfaThread | 2,060,663 | Dominant bottleneck: each fork creates a new RC object plus slots |
| ArrayList(usize) | 2,004,328 | NfaThread capture slots (one per fork) |
| ArrayList(NfaThread) | 394,168 | Thread lists for NFA simulation |
| Token-related | ~50,000 | Actual parser allocations |

V8's generational GC handles these short-lived objects nearly for free (young generation collection), while reference counting incurs per-operation overhead on every increment/decrement. This is the fundamental cost of RC vs tracing GC for allocation-heavy workloads.

Verifying Correctness

The benchmark verifies that Yo and JS produce byte-identical HTML output:

# Quick correctness check
./bench_native input.md > /tmp/yo.html
node -e "const md = require('markdown-it')(); const fs = require('fs'); process.stdout.write(md.render(fs.readFileSync('input.md', 'utf8')))" > /tmp/js.html
diff /tmp/yo.html /tmp/js.html  # Should produce no output
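
When the two outputs do differ, `diff` on a large single-line HTML file can be hard to read. A small Node.js helper that reports the first differing byte offset may help narrow things down. This is an optional sketch; `firstDifference` is a hypothetical name and is not part of the repository's scripts.

```javascript
// Return the offset of the first differing byte, or -1 if the two buffers
// are byte-identical. A length mismatch counts as a difference at the end
// of the shorter buffer.
function firstDifference(a, b) {
  const limit = Math.min(a.length, b.length);
  for (let i = 0; i < limit; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : limit;
}

// Demo with in-memory buffers so the sketch runs without any files.
const yoHtml = Buffer.from("<h1>Hello <strong>World</strong></h1>\n");
const jsHtml = Buffer.from("<h1>Hello <strong>World</strong></h1>\n");
console.log(firstDifference(yoHtml, jsHtml) === -1 ? "identical" : "different");
```

For the files produced by the commands above, the buffers would come from `fs.readFileSync('/tmp/yo.html')` and `fs.readFileSync('/tmp/js.html')`, which return raw bytes when no encoding is given.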

Project Structure

markdown_it_yo/
├── build.yo              # Build system configuration
├── src/
│   ├── main.yo           # CLI entry point
│   ├── lib.yo            # Library entry point
│   ├── markdown_it.yo    # Core MarkdownIt class
│   ├── options.yo        # Configuration presets
│   ├── token.yo          # Token type definition
│   ├── ruler.yo          # Rule chain manager
│   ├── renderer.yo       # HTML renderer
│   ├── common/           # Shared utilities (re-exports from Yo std)
│   ├── helpers/          # Parse helpers
│   ├── parser/           # Parser implementations
│   │   ├── core.yo       # Core parser (normalize, blocks, inline, linkify)
│   │   ├── block.yo      # Block-level parser
│   │   └── inline.yo     # Inline-level parser
│   └── rules/            # Parsing rules
│       ├── core/         # Core rules
│       ├── block/        # Block rules
│       └── inline/       # Inline rules
├── tests/
│   ├── *.test.yo         # Yo unit tests
│   └── fixtures/         # markdown-it fixture files
├── scripts/
│   └── run_fixture_tests.js  # Fixture test runner
└── benchmark/
    └── run.js            # Benchmark script

License

ISC (same as markdown-it)

Acknowledgments

This project is a direct port of markdown-it by Vitaly Puzrin and Alex Kocharin. All credit for the parser design and algorithms goes to the original authors.
