Implementation of STATIC from the paper “Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators”.
The current implementation uses CPU parallelism; a GPU kernel would be a natural extension but is quite time-consuming to develop.
A Rust library for fast, production-friendly constrained decoding over a large, fixed candidate set (e.g., Semantic IDs / item IDs for generative retrieval). It flattens a prefix trie into a CSR transition matrix and uses a vectorized kernel (VNTK) plus a bit-packed dense mask for shallow layers to keep constraint enforcement accelerator-friendly.
This design is inspired by STATIC (“Sparse Transition Matrix‑Accelerated Trie Index for Constrained Decoding”), which recasts trie traversal as vectorized sparse-matrix operations and reports very large speedups and low per-step overhead in an industrial deployment.
Constrained decoding is what turns “LLM can generate an ID” into “LLM can generate only valid, in-policy IDs”:
- Hard business logic: enforce inventory/freshness/eligibility policies at generation-time (no post-filtering surprises).
- Lower latency overhead: move from pointer-chasing trie walks to contiguous CSR reads and vectorized scattering.
- Scale: supports large constraint sets, where binary-search-style methods can become an I/O bottleneck; STATIC frames the kernel as O(1) I/O with respect to constraint-set size.
- Deterministic outputs: the index is static and reproducible, which helps auditing and rollout safety.
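Conceptually, each decoding step reduces to masking the model's logits with the trie's validity bits before token selection. A minimal generic sketch (the function name and shapes are illustrative, not this crate's API):

```rust
/// Pick the highest-scoring token among those the constraint set allows.
/// Generic illustration of constrained selection, not the crate's API.
fn constrained_argmax(logits: &[f32], valid: &[bool]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for i in 0..logits.len() {
        // Skip tokens the trie forbids; keep the best-scoring valid one.
        if valid[i] && best.map_or(true, |b| logits[i] > logits[b]) {
            best = Some(i);
        }
    }
    best
}

fn main() {
    // Token 1 scores highest, but only tokens 0 and 2 are valid continuations.
    let logits = [0.1, 0.9, 0.5, 0.2];
    let valid = [true, false, true, false];
    assert_eq!(constrained_argmax(&logits, &valid), Some(2));
}
```

With the mask applied up front, post-filtering of invalid IDs is never needed; the model simply cannot emit them.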
Project layout:

```
src
├── trie.rs        # Prefix trie construction
├── transition.rs  # CSR transition matrix builder
├── dense_mask.rs  # Bit-packed dense mask for shallow layers
├── vntk.rs        # Vectorized Node Transition Kernel
├── decoder.rs     # Constrained decoding step (Algorithm 1)
└── types.rs       # TransitionMatrix, DecodingState, etc.
```

Key structures:
- `DenseMask`: bit-packed validity for the first `d` layers (typically `d = 2`).
- `TransitionMatrix`: CSR matrix with interleaved `(token_id, next_node_id)` entries.
- `StaticIndex`: combined dense + sparse index.
- `ConstrainedDecoder`: constrained beam step (mask → select → gather).
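As a rough sketch, the dense and sparse halves of the index might look like the following; field and method names here are hypothetical simplifications, not the crate's actual API:

```rust
/// Bit-packed validity for shallow layers: one bit per (node, token).
/// Illustrative shape only, not the crate's real `DenseMask`.
struct DenseMask {
    bits: Vec<u64>,       // rows of packed bits, one row per shallow node
    words_per_row: usize, // ceil(vocab_size / 64)
}

impl DenseMask {
    fn is_valid(&self, node: usize, token: u32) -> bool {
        let bit = node * self.words_per_row * 64 + token as usize;
        (self.bits[bit / 64] >> (bit % 64)) & 1 == 1
    }
}

/// CSR transition matrix: children of node n live in one contiguous slice.
/// Illustrative shape only, not the crate's real `TransitionMatrix`.
struct TransitionMatrix {
    row_ptr: Vec<usize>,      // row_ptr[n]..row_ptr[n+1] bounds node n's row
    entries: Vec<(u32, u32)>, // (token_id, next_node_id), sorted by token_id
}

impl TransitionMatrix {
    /// Follow one edge: a contiguous row read plus a binary search by token.
    fn step(&self, node: usize, token: u32) -> Option<u32> {
        let row = &self.entries[self.row_ptr[node]..self.row_ptr[node + 1]];
        row.binary_search_by_key(&token, |&(t, _)| t)
            .ok()
            .map(|i| row[i].1)
    }
}

fn main() {
    // Root (node 0) accepts token 1 -> node 1 and token 3 -> node 2.
    let t = TransitionMatrix {
        row_ptr: vec![0, 2, 2, 2],
        entries: vec![(1, 1), (3, 2)],
    };
    assert_eq!(t.step(0, 1), Some(1));
    assert_eq!(t.step(0, 2), None);

    // The same root row as a dense bitmap: bits 1 and 3 set (0b1010).
    let m = DenseMask { bits: vec![0b1010], words_per_row: 1 };
    assert!(m.is_valid(0, 1) && !m.is_valid(0, 2));
}
```

Because each CSR row is contiguous and sorted, a step is a single cache-friendly slice read rather than a pointer-chasing trie walk.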
Usage:

```rust
use constraint_decoding_trie::transition::build_static_index;

let constraints = vec![
    vec![1u32, 2, 1],
    vec![3u32, 1, 2],
    vec![3u32, 1, 3],
];

let index = build_static_index(
    &constraints,
    /* vocab_size = */ 4,
    /* sid_length = */ 3,
    /* dense_depth = */ 2,
);
```

Benchmarks live under `benches/` and use Criterion. Run:
```
cargo bench
```

Implementation notes:

- Children in each CSR row are stored sorted by token ID (deterministic index layout).
- Dense masking uses safe bit-range scans; boundary conditions are covered by tests.
- Sparse masking uses VNTK to produce per-beam masks and next-node slots.
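The per-beam masking step can be sketched as follows; `CsrTrie` and `vntk_step` are hypothetical names for illustration, not the crate's API:

```rust
/// Sketch of a VNTK-style masking step: for each beam, scan its node's CSR
/// row once and scatter the children into a dense per-beam validity mask
/// plus a table of next-node slots. Names are illustrative, not the crate's.
struct CsrTrie {
    row_ptr: Vec<usize>,
    entries: Vec<(u32, u32)>, // interleaved (token_id, next_node_id)
}

/// mask[b][t] == true means token t is a valid continuation for beam b;
/// next[b][t] is the trie node reached by emitting t (u32::MAX = invalid).
fn vntk_step(
    trie: &CsrTrie,
    beam_nodes: &[u32],
    vocab: usize,
) -> (Vec<Vec<bool>>, Vec<Vec<u32>>) {
    let mut mask = vec![vec![false; vocab]; beam_nodes.len()];
    let mut next = vec![vec![u32::MAX; vocab]; beam_nodes.len()];
    for (b, &node) in beam_nodes.iter().enumerate() {
        let (lo, hi) = (trie.row_ptr[node as usize], trie.row_ptr[node as usize + 1]);
        // One contiguous CSR row read; the scatter below is what an
        // accelerator kernel would parallelise across entries and beams.
        for &(tok, child) in &trie.entries[lo..hi] {
            mask[b][tok as usize] = true;
            next[b][tok as usize] = child;
        }
    }
    (mask, next)
}

fn main() {
    // Root node 0 with children token 1 -> node 1 and token 3 -> node 2,
    // matching the root of the constraint set in the usage example above.
    let trie = CsrTrie { row_ptr: vec![0, 2], entries: vec![(1, 1), (3, 2)] };
    let (mask, next) = vntk_step(&trie, &[0], 4);
    assert_eq!(mask[0], vec![false, true, false, true]);
    assert_eq!(next[0][3], 2);
}
```

Since the work per beam is a fixed-stride row scan plus a scatter, its I/O does not grow with the total number of constraints, only with the row's fan-out.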
Planned:

- Parquet persistence for `StaticIndex`
- SIMD/bitset masks for faster CPU-side gating
- GPU/TPU-friendly kernels (layout already designed for coalesced reads)
STATIC paper: “Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators” (2026).