tiny graph fix and added benchmark #456

Merged
HenryNdubuaku merged 1 commit into main from karen/pi-more-benchmark
Feb 26, 2026

@kar-m
Collaborator

@kar-m kar-m commented Feb 26, 2026

@HenryNdubuaku Full results are below. These runs use weights loaded from Hugging Face, so I'm just making sure those are the weights at the correct precision. I can't easily run reconvert with these models because it causes OOM on the Raspberry Pi, but if HF doesn't have the INT4 versions I can work around that.

╔══════════════════════════════════════════╗
║ 1K CONTEXT TEST ║
╚══════════════════════════════════════════╝
Response: The data represents the circumference of circles as they get closer to a specific value (100.530880). The circumference grows by 2 units for each circle added until it reaches 125.66 units, which is the closest value before reaching the next multiple of 100. As each subsequent circle's circumference adds only 1 unit (compared to an average increase of approximately 2 per previous round), they gradually converge on this target without achieving it exactly, demonstrating how small adjustments can be made for [-> stopped]

[Results]
"success": true,
"error": null,
"cloud_handoff": false,
"response": "The data represents the circumference of circles as they get closer to a specific value (100.530880). The circumference grows by 2 units for each circle added until it reaches 125.66 units, which is the closest value before reaching the next multiple of 100. As each subsequent circle's circumference adds only 1 unit (compared to an average increase of approximately 2 per previous round), they gradually converge on this target without achieving it exactly, demonstrating how small adjustments can be made for",
"function_calls": [],
"confidence": 0.8480,
"time_to_first_token_ms": 13129.16,
"total_time_ms": 22427.45,
"prefill_tps": 69.24,
"decode_tps": 10.65,
"ram_usage_mb": 868.52,
"prefill_tokens": 909,
"decode_tokens": 100,
"total_tokens": 1009
└─ Status: PASSED ✓
✓ PASS │ 1k_context

╔══════════════════════════════════════════╗
║ TRANSCRIPTION ║
╚══════════════════════════════════════════╝
Transcript: hello hello hello just quickly testing out creating a wa file through voice memos the goal is to use this wite file to test out whisper hopefully this will transcribe properly that's all i can hope for all right here we go

[Results]
"success": true,
"error": null,
"cloud_handoff": false,
"response": "hello hello hello just quickly testing out creating a wa file through voice memos the goal is to use this wite file to test out whisper hopefully this will transcribe properly that's all i can hope for all right here we go",
"function_calls": [],
"confidence": 1.0000,
"time_to_first_token_ms": 4581.88,
"total_time_ms": 4582.28,
"prefill_tps": 0.00,
"decode_tps": 182044.89,
"ram_usage_mb": 1931.03,
"prefill_tokens": 0,
"decode_tokens": 74,
"total_tokens": 74
✓ PASS │ transcription

╔══════════════════════════════════════════╗
║ VLM MULTI-TURN TEST ║
╚══════════════════════════════════════════╝

[Turn 1]
User: Describe what is happening in this image in two sentences.
Assistant: A close-up image of a brown monkey with dark green eyes, giving an intense gaze into the camera. The animal's face features deep wrinkles around its mouth and nose area. The background consists of blurred leaves, suggesting an outdoor environment in daylight.

[Results - Turn 1]
"success": true,
"error": null,
"cloud_handoff": false,
"response": "A close-up image of a brown monkey with dark green eyes, giving an intense gaze into the camera. The animal's face features deep wrinkles around its mouth and nose area. The background consists of blurred leaves, suggesting an outdoor environment in daylight.",
"function_calls": [],
"confidence": 0.8523,
"time_to_first_token_ms": 8910.40,
"total_time_ms": 13364.53,
"prefill_tps": 9.76,
"decode_tps": 11.23,
"ram_usage_mb": 2107.34,
"prefill_tokens": 87,
"decode_tokens": 51,
"total_tokens": 138

[Turn 2]
User: Describe the image once again.
Assistant: Here we have an image of a brown monkey with dark green eyes, capturing its intense gaze into the camera with wrinkled face features and soft fur adorning its body. The blurred green background indicates a natural environment where this animal is situated indoors or in captivity, creating the illusion of it being outside but looking very close up inside.

[Results - Turn 2]
"success": true,
"error": null,
"cloud_handoff": false,
"response": "Here we have an image of a brown monkey with dark green eyes, capturing its intense gaze into the camera with wrinkled face features and soft fur adorning its body. The blurred green background indicates a natural environment where this animal is situated indoors or in captivity, creating the illusion of it being outside but looking very close up inside.",
"function_calls": [],
"confidence": 0.8612,
"time_to_first_token_ms": 9347.62,
"total_time_ms": 15411.22,
"prefill_tps": 16.58,
"decode_tps": 10.88,
"ram_usage_mb": 2320.09,
"prefill_tokens": 155,
"decode_tokens": 67,
"total_tokens": 222
✓ PASS │ vlm_multiturn
────────────────────────────────────────────────────────────────────────────────────────
✓ All 1 tests passed!

Copilot AI review requested due to automatic review settings February 26, 2026 04:53
Contributor

Copilot AI left a comment

Pull request overview

This pull request fixes the ordering of C++20 designated initializers and adds Raspberry Pi 5 benchmark results. The changes ensure that OpParams designated initializers follow the struct's field declaration order, as C++20 requires, and document performance metrics for IoT device deployment.

Changes:

  • Fixed designated initializer field ordering in batchnorm and conv1d functions to match OpParams struct declaration
  • Added Raspberry Pi 5 benchmark results to the performance table
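
As a minimal sketch of the ordering rule behind the first change: in C++20, designators in an aggregate initializer must appear in the same order as the members are declared, although members may be skipped. The struct below is a hypothetical stand-in, not the real OpParams; the field types, the `PrecisionSketch` enum, and the helper names are assumptions, and only the relative order (epsilon before axis, output_precision before stride) is taken from this review.

```cpp
#include <cstddef>

// Hypothetical precision enum, for illustration only.
enum class PrecisionSketch { INT8, FP16, FP32 };

// Hypothetical stand-in for OpParams; the real struct has its own fields.
struct OpParamsSketch {
    float epsilon = 0.0f;
    int axis = 0;
    PrecisionSketch output_precision = PrecisionSketch::FP32;
    std::size_t stride = 1;
};

OpParamsSketch make_batchnorm_params(float epsilon, int axis) {
    // Well-formed: epsilon is declared before axis, so it is designated first.
    return OpParamsSketch{.epsilon = epsilon, .axis = axis};
    // Ill-formed in C++20 (designators out of declaration order):
    //   OpParamsSketch{.axis = axis, .epsilon = epsilon};
}

OpParamsSketch make_conv1d_params(PrecisionSketch precision, std::size_t stride) {
    // Skipping epsilon and axis is allowed; they keep their default values.
    return OpParamsSketch{.output_precision = precision, .stride = stride};
}
```

Compilers differ in how strictly they enforce this rule, so code with out-of-order designators can build on one toolchain and fail on another, which is one plausible reason such a fix shows up alongside a new target platform.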

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| cactus/graph/graph_builder.cpp | Corrected OpParams designated initializer field ordering to comply with C++20 requirements (epsilon before axis, output_precision before stride) |
| README.md | Added Raspberry Pi 5 benchmark data showing 69 tps prefill / 11 tps decode performance |

@HenryNdubuaku HenryNdubuaku merged commit b36806e into main Feb 26, 2026
9 of 10 checks passed
cattermelon1234 pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 28, 2026