Releases: cactus-compute/cactus
v1.11
What's Changed
- Fix/issue#490 by @lennartvoelz in #491
- simplify and align sdks by @jakmro in #489
- remove models by @jakmro in #492
- Update model configurations and enhance workflow settings in publish_… by @jakmro in #495
- Update workflow to use macos-latest instead of macos-latest-xlarge by @jakmro in #496
- Add dynamic max_tokens estimation based on audio length in cactus_tra… by @jakmro in #499
- macOS: link clang_rt.osx to fix SME2 (_arm_tpidr2*) link failures under rustc by @yujonglee in #498
- Add FFI log control: cactus_log_set_level and cactus_log_set_callback by @yujonglee in #497
- Karen/qwen3p5 by @kar-m in #481
- CLI upgrades by @rshemet in #504
- feat(stt): custom vocabulary biasing for all speech models by @vyomshah05 in #451
- Add Gemma 3N (text-only) model support by @ncylich in #493
- fix: make FunctionGemma prompt formatting strict by @lennartvoelz in #502
- fix: apply logit bias before greedy sampling by @ncylich in #507
- remove redundant file linking for tie_word_embeddings by @jakmro in #506
- Port general engine improvements for TinyLlama by @ncylich in #513
- Speech-to-Text Timestamps by @jakmro in #515
New Contributors
- @lennartvoelz made their first contribution in #491
Full Changelog: v1.10...v1.11
v1.10
What's Changed
- Enhance model publishing workflow with detailed metadata and licenses by @jakmro in #459
- Added parakeet to publish to hf yaml by @ParkiratS in #464
- Update telemetry for supported platforms by @justinl66 in #465
- added back moe weight conversion by @kar-m in #468
- adjust manual workflow for model publish by @jakmro in #470
- Parakeet blog by @ammesatyajit in #467
- perf: add FP16 fast path for LayerNorm by @yujonglee in #433
- Issue #406: Bilinear + Depthwise Optimizations by @PiyawanChaiprasit2006 in #466
- ARM SME2: Accelerate MatMul FP16 by @aarav18 in #457
- build: add Objective-C ARC support for NPU sources by @jakmro in #475
- long transcription by @jakmro in #482
- Language detection by @ParkiratS in #471
- Parakeet tdt by @ParkiratS in #476
- kotlin: expose forceTools in CompletionOptions by @rshemet in #484
- Update model list in README and publish_to_hf.yml with new LiquidAI m… by @jakmro in #487
- test: updated rag test conditions by @nshejwalkar in #488
- optimize scale correction in cactus_attention_f16_h64 by @jakmro in #485
- fix greedy sampler ignoring logit suppression by @jakmro in #486
New Contributors
- @PiyawanChaiprasit2006 made their first contribution in #466
- @aarav18 made their first contribution in #457
Full Changelog: v1.9...v1.10
v1.9
What's New
- 50% faster int4
- Parakeet models
- LFM2-MOE models
- Bug fixes
- Hybrid Inference
PRs
- fix stt test and add cpp ci by @yujonglee in #413
- add IRFFT by @yujonglee in #425
- fixed lfm2 vlm lmhead issue that came in with hf 5.0.0 by @kar-m in #426
- raspberry pi numbers and linux fixes by @kar-m in #437
- Added parakeet model by @ParkiratS in #443
- Adding parakeet graph by @ParkiratS in #446
- Parakeet kernel by @ParkiratS in #445
- added cloud fallback and documentation+tests by @kar-m in #369
- Parakeet FFI by @ParkiratS in #447
- Parakeet convert and tests by @ParkiratS in #444
- Hybrid transcription blog post by @rshemet in #449
- Fixed missing engine changes by @ParkiratS in #453
- feat(python): add context manager support for safe resource cleanup by @yogyam in #412
- Completed ubuntu CICD pipeline by @ncylich in #455
- Tie-embed-conversion-fix by @ncylich in #454
- tiny graph fix and added benchmark by @kar-m in #456
Full Changelog: v1.8...v1.9
Breaking changes
Weights unfortunately need to be refreshed for this :(
v1.8
What's Changed
- Kernel optimisations by @HenryNdubuaku in #397
- Improve INT4 by @ncylich and @jrajala6 in #343
- add einops dependency to requirements by @jakmro in #371
- Add language parameter support for Whisper transcription by @rshemet in #384
- added moe support for lfm by @kar-m in #374
- Add raw FFI binding for Rust by @yujonglee in #382
- fix: handle spaces in paths when running shell commands by @adithya-n05 in #377
- fixing sentencepiece detection for transformers 5.0+ (still backwards compatible) by @ncylich in #373
- Improve Telemetry by @mhayes853 in #372
- proprietary commit by @HenryNdubuaku
- Update performance metrics for iPhone 13 Mini and Galaxy A56 by @jakmro in #386
- fix: improve version sorting and enhance model export tagging by @jakmro in #387
- Add Rust SDK and language parameter documentation by @rshemet in #389
- Basic addition of int4 functionality by @jrajala6 in #343
- add scalar log by @yujonglee in #390
- fix assertion and linux build in rust test by @yujonglee in #392
- Justin/api fixes by @justinl66 in #380
- Update telemetry by @justinl66 in #394
- docs: add compatibility guidelines for runtime and weights by @jakmro in #398
- add STFT_COMPLEX, derive stft_magnitude via graph composition by @yujonglee in #395
New Contributors
- @yujonglee made their first contribution in #382
- @adithya-n05 made their first contribution in #377
Full Changelog: v1.7...v1.8
Note:
This breaks the weights.
v1.7
What's Changed
- Brew setup by @HenryNdubuaku
- Cactus auth by @HenryNdubuaku
- Hybrid inference by the cactus team
- Karen/vlm fix by @kar-m in #311
- fixed moonshine state resetting and gemma3 4b layernorm loading by @kar-m in #317
- fix: LFM2 multiple tool calls by @mhayes853 in #316
- fix hf publish by @jakmro in #323
- update models list by @jakmro in #324
- Fixing pip command errors by @rshemet in #322
- Add instructions for installing Ruby version for xcodeproj gem by @jakmro in #327
- tests: remove duplicate vlm_multiturn test in runner by @AI-I224 in #332
- fix: replace NSLog with CACTUS_LOG for iOS NPU debuggability by @KayaanT in #328
- Kernel_attention optimization by @Ayan9074 in #319
- M4airbenchmarks by @Ayan9074 in #336
- docs: update cactus test command description for transcribe models (#297) by @AI-I224 in #339
- Accelerate FP16 matmul via cblas_sgemm for Apple AMX by @KayaanT in #340
- Fix hybrid attention sliding window for Gemma (#320) by @jrajala6 in #338
- bench: update README benchmark with M2 MacBook Air results by @vyomshah05 in #335
- docs: add iPad Pro (12.9") (6th Gen) benchmarks (#296) by @AI-I224 in #333
- removed unused graph i/o methods by @ncylich in #345
- feat: cpp-native telemetry by @justinl66 in #326
- Update CPP Telemetry to point to main DB by @justinl66 in #350
- update python bindings for stream transcribe by @jakmro in #351
- Update CPP Telemetry by @justinl66 in #352
- added only flag by @nshejwalkar in #347
- Added warmups and increased iterations for performance testing by @nshejwalkar in #355
- CMF Phone 2 Pro benchmarks by @jakmro in #356
- Vad by @jakmro in #353
- Cli reconvert by @jakmro in #357
- Asr cloud merging by @kar-m in #348
- Add optional cloud key prompt for transcribe by @rshemet in #359
- HF support multiple precision options by @jakmro in #361
- Add precision parameter to download_from_hf by @jakmro in #362
- revert silero download logic by @jakmro in #365
- Cactus clean now clears cache, Session metrics initialized properly for telemetry by @justinl66 in #363
- Curl prepack by @kar-m in #358
- Fix/f16 reduction accum by @vyomshah05 in #344
- Update telemetry by @justinl66 in #366
- Accelerate FP16 attention via cblas_sgemm for Apple AMX by @KayaanT in #346
New Contributors
- @AI-I224 made their first contribution in #332
- @jrajala6 made their first contribution in #338
- @vyomshah05 made their first contribution in #335
- @nshejwalkar made their first contribution in #347
Full Changelog: v1.6.0...v1.7
@mhayes853 API has breaking changes
v1.6
What's Changed
- Kernel Optimisations & advanced quantisation by @HenryNdubuaku
- Moonshine by @kar-m
- HF publish by @jakmro
- Streaming API by @jakmro
- Linux ARM support by @ncylich
- Stop generation on model end token by @Ayan9074
- i8MM runtime detection by @mhayes853
FFI Note: This breaks the API
v1.5
What's Changed
- Groupwise quantisation by @HenryNdubuaku
- Speech-To-Text streaming by @jakmro
- KV Quantisation by @HenryNdubuaku
- Evals by @justinl66 @ParkiratS
- INT4 support by @HenryNdubuaku
- Rust bindings by @mrsarac
Bindings: Please check Cactus FFIs again @jakmro @mrsarac @mhayes853
v1.4
What's Changed
- Cactus index by @jakmro
- Function Gemma by @HenryNdubuaku
- Perplexity eval to model by @ParkiratS
- Tool refactor by @rshemet
- F16 kernel updates by @KayaanT
- Multi-turn VLM conversation continuity by @HenryNdubuaku
- Bugfixes by @ammesatyajit
Instructions for bindings @mhayes853
- Should easily replace v1.3 without headaches
v1.3
What's Changed
- Apple NPU support by @HenryNdubuaku
- Optimized softmax to use Horner's method by @ParkiratS
- Tunix finetuning by @ncylich
- PCM input stream to whisper transcription by @devabhixda
- Cli mobile benchmarks by @jakmro
- Telemetry framework by @devabhixda
- Python bindings by @HenryNdubuaku
Bindings @devabhixda @jakmro @mhayes853
- The CLI has changed a bit; read the cactus_ffi.h files carefully.
- If you need keys for the Pro, please reach out to [email protected]
Full Changelog: v1.2...v1.3
v1.2
Aggressive memory optimisations by @HenryNdubuaku
Binding Instructions: