Releases: withcatai/node-llama-cpp
v3.18.1
3.18.1 (2026-03-17)
Features
- customize `postinstall` behavior (#582) (57bea3d) (documentation: Customizing `postinstall` Behavior)
- experimental support for context KV cache type configurations (#582) (57bea3d) (documentation: `LlamaContextOptions["experimentalKvCacheKeyType"]`)
- support NVFP4 quants (#582) (57bea3d)
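A quantized KV cache type mainly trades precision for memory. As a rough illustration of why that matters (generic llama.cpp cache arithmetic with a hypothetical model shape, not node-llama-cpp internals), compare an f16 cache (2 bytes per element) against q8_0, which packs 32 elements into 34 bytes:

```typescript
// Back-of-the-envelope KV cache sizing (generic llama.cpp math, not
// node-llama-cpp's internals).
function kvCacheBytes(
    layers: number,
    contextLength: number,
    kvEmbeddingDim: number, // per-layer K (or V) width, in elements
    bytesPerElement: number
): number {
    // K and V caches, one pair per layer, one row per context position
    return 2 * layers * contextLength * kvEmbeddingDim * bytesPerElement;
}

// Hypothetical 8B-class model shape: 32 layers, 8 KV heads of 128 dims each
const f16Bytes = kvCacheBytes(32, 8192, 1024, 2);
const q8Bytes = kvCacheBytes(32, 8192, 1024, 34 / 32);
console.log((f16Bytes / 2 ** 30).toFixed(2), "GiB f16");  // 1.00 GiB
console.log((q8Bytes / 2 ** 30).toFixed(2), "GiB q8_0"); // 0.53 GiB
```

Whether the precision loss is acceptable is model- and workload-dependent, which is presumably why the option is marked experimental.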
Shipped with llama.cpp release b8390
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.18.0
3.18.0 (2026-03-15)
Features
- automatic checkpoints for models that need them (#573) (c641959)
- `QwenChatWrapper`: Qwen 3.5 support (#573) (c641959)
- `inspect gpu` command: detect and report missing prebuilt binary modules and custom npm registry (#573) (c641959)
Bug Fixes
- `resolveModelFile`: deduplicate concurrent downloads (#570) (cc105b9)
- correct Vulkan URL casing in documentation links (#568) (5a44506)
- Qwen 3.5 memory estimation (#573) (c641959)
- grammar use with `HarmonyChatWrapper` (#573) (c641959)
- add Mistral think segment detection (#573) (c641959)
- compress excessively long segments from the current response on context shift instead of throwing an error (#573) (c641959)
- default thinking budget to 75% of the context size to prevent low-quality responses (#573) (c641959)
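The thinking-budget default is simple arithmetic; a sketch of it (only the 75% figure comes from the note above — the helper name is hypothetical, not the library's API):

```typescript
// Sketch of the default-thinking-budget arithmetic described above.
// Capping "thinking" tokens at 75% of the context size leaves room for
// the final answer, preventing low-quality truncated responses.
function defaultThinkingBudget(contextSize: number): number {
    return Math.floor(contextSize * 0.75);
}

console.log(defaultThinkingBudget(8192)); // 6144
```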
Shipped with llama.cpp release b8352
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.17.1
v3.17.0
3.17.0 (2026-02-27)
Features
- `getLlama`: `build: "autoAttempt"` (#564) (dda5ade) (documentation: `LlamaOptions["build"]`)
- remove octokit dependency (#564) (dda5ade)
Bug Fixes
- CLI: disable Direct I/O by default (#564) (dda5ade)
- Bun segmentation fault on process exit with undisposed `Llama` instance (#564) (dda5ade)
- detect glibc inside Nix (#564) (dda5ade)
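One plausible reading of a `build: "autoAttempt"` mode is a try-build-then-fall-back strategy. The sketch below shows only that generic pattern — the function names and semantics are assumptions for illustration, not the library's actual behavior (see the linked `LlamaOptions["build"]` documentation for that):

```typescript
// Generic "attempt a local build, fall back on failure" pattern
// (hypothetical names; not node-llama-cpp's implementation).
function loadBinaryAutoAttempt(
    tryBuild: () => string,
    loadPrebuilt: () => string
): string {
    try {
        return tryBuild(); // prefer a binary built for this machine
    } catch {
        return loadPrebuilt(); // fall back if the local build fails
    }
}
```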
Shipped with llama.cpp release b8169
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.16.2
v3.16.1
v3.16.0
3.16.0 (2026-02-19)
Features
- Exclude Top Choices (XTC) (#553) (57e8c22) (documentation: `LLamaChatPromptOptions["xtc"]`)
- DRY (Don't Repeat Yourself) repeat penalty (#553) (57e8c22) (documentation: `LLamaChatPromptOptions["dryRepeatPenalty"]`)
- Tiny Aya support (#553) (57e8c22)
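For readers unfamiliar with XTC: when it triggers, it excludes the most probable candidate tokens (those above a probability threshold) and keeps only the least likely of them, pushing output away from the most predictable continuation. A minimal sketch of that filtering idea (pure illustration, not the library's sampler; the actual option shape is in the linked documentation):

```typescript
// Toy XTC-style filter: given token probabilities, return the indices
// that remain sampleable. Tokens at or above `threshold` are excluded,
// except the least likely of them, which is kept.
function xtcFilter(probs: number[], threshold: number): number[] {
    const above = probs
        .map((p, i) => ({p, i}))
        .filter(({p}) => p >= threshold);

    if (above.length < 2)
        return probs.map((_, i) => i); // nothing worth excluding

    const keepLeastLikely = above.reduce((a, b) => (b.p < a.p ? b : a)).i;
    return probs
        .map((_, i) => i)
        .filter((i) => probs[i] < threshold || i === keepLeastLikely);
}

console.log(xtcFilter([0.5, 0.3, 0.15, 0.05], 0.1)); // [ 2, 3 ]
```

In the real sampler this filter only applies with a configured probability per token, so ordinary sampling still happens most of the time.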
Bug Fixes
- adjust the default VRAM padding config to reserve enough memory for compute buffers (#553) (57e8c22)
- support function call syntax with optional whitespace prefix (#553) (57e8c22)
- change the default value of `useDirectIo` to `false` (#553) (57e8c22)
- Vulkan device dedupe (#553) (57e8c22)
Shipped with llama.cpp release b8095
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.15.1
v3.15.0
3.15.0 (2026-01-10)
Features
- `LlamaCompletion`: `stopOnAbortSignal` (#538) (734693d) (documentation: `LlamaCompletionGenerationOptions["stopOnAbortSignal"]`)
- `LlamaModel`: `useDirectIo` (#538) (734693d) (documentation: `LlamaModelOptions["useDirectIo"]`)
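The point of a `stopOnAbortSignal`-style option is that aborting can end generation gracefully, returning the partial response, instead of rejecting the call. A toy illustration of those two behaviors (not the library's code — the abort is modeled as a token index for testability):

```typescript
// Toy generator contrasting graceful-stop vs. reject-on-abort semantics.
function generateDemo(
    tokens: string[],
    abortAfter: number,
    stopOnAbortSignal: boolean
): string {
    let out = "";
    for (let i = 0; i < tokens.length; i++) {
        if (i >= abortAfter) {
            if (stopOnAbortSignal)
                return out; // graceful stop: keep the partial response
            throw new Error("AbortError"); // default: the call rejects
        }
        out += tokens[i];
    }
    return out;
}

console.log(generateDemo(["Hi", " ", "there"], 2, true)); // "Hi "
```

In the real API the trigger is an `AbortSignal` passed to the generation call rather than a token index.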
Bug Fixes
- support new CUDA 13.1 archs (#538) (734693d)
- build the prebuilt binaries with CUDA 13.1 instead of 13.0 (#538) (734693d)
Shipped with llama.cpp release b7698
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)