Signed-off-by: Karen Mosoyan <[email protected]>
…anscribe for handoff detection Signed-off-by: Karen Mosoyan <[email protected]>
Pull request overview
This PR introduces a “cloud handoff” path for streaming ASR, allowing locally-confirmed segments to be optionally refined by a cloud transcription service and then merged back into the live transcript display.
Changes:
- Add cloud handoff signaling from cactus_transcribe (entropy-based threshold) and propagate cloud job/result fields through the streaming FFI response.
- Implement optional libcurl-based cloud transcription in the stream processor and return asynchronous job results to the client.
- Update the tests/asr.cpp live UI to track “awaiting cloud” segments and merge cloud results back into confirmed text; adjust CMake curl wiring.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tests/asr.cpp | Adds client-side segment tracking/merging UI for cloud results and warns when API key is missing. |
| tests/CMakeLists.txt | Removes explicit CURL linking for tests/apps. |
| cactus/ffi/cactus_transcribe.cpp | Adds entropy-based cloud_handoff boolean into the JSON response. |
| cactus/ffi/cactus_stream.cpp | Adds optional CURL cloud transcription, async job tracking, and additional JSON response fields for cloud job/results. |
| cactus/engine/engine_model.cpp | Sets default cloud handoff thresholds per model type. |
| cactus/engine/engine.h | Adds default_cloud_handoff_threshold to engine config. |
| cactus/CMakeLists.txt | Makes CURL optional, links it when found, and defines CACTUS_USE_CURL. |
cactus/ffi/cactus_stream.cpp
```cpp
std::string pattern = "\"transcript\":\"";
size_t pos = response_body.find(pattern);
if (pos == std::string::npos) return original_text;
size_t start = pos + pattern.length();
size_t end = response_body.find('"', start);
if (end == std::string::npos) return original_text;
return response_body.substr(start, end - start);
```
Response JSON parsing for the cloud call is done via a simple substring search for "transcript":" and then the next quote. This will truncate or fail if the transcript contains escaped quotes/backslashes or if the JSON is formatted differently; prefer using a proper JSON extractor/parser (or extend the existing json_string helper to handle escapes) and unescape the result.
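One way to act on this, sketched under assumptions: a small hypothetical helper, extract_json_string, that finds the quoted key, tolerates whitespace around the colon, and decodes JSON escapes (including \uXXXX and surrogate pairs) into UTF-8. It is still not a full JSON parser, because it takes the first textual match of the quoted key, so a dedicated JSON library remains the more robust option.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Append a Unicode code point to `out` as UTF-8.
static void append_utf8(std::string& out, uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Parse exactly four hex digits starting at s[i]; nullopt on error.
static std::optional<uint32_t> parse_hex4(const std::string& s, size_t i) {
    if (i + 4 > s.size()) return std::nullopt;
    uint32_t v = 0;
    for (size_t k = i; k < i + 4; ++k) {
        char c = s[k];
        v <<= 4;
        if (c >= '0' && c <= '9') v |= static_cast<uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') v |= static_cast<uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') v |= static_cast<uint32_t>(c - 'A' + 10);
        else return std::nullopt;
    }
    return v;
}

// Hypothetical helper: locate "key" : "<value>" and return the unescaped
// value, or nullopt if the key is absent or the string is malformed.
std::optional<std::string> extract_json_string(const std::string& body, const std::string& key) {
    size_t pos = body.find("\"" + key + "\"");
    if (pos == std::string::npos) return std::nullopt;
    size_t i = pos + key.size() + 2;
    auto skip_ws = [&] {
        while (i < body.size() && (body[i] == ' ' || body[i] == '\t' || body[i] == '\n' || body[i] == '\r')) ++i;
    };
    skip_ws();
    if (i >= body.size() || body[i] != ':') return std::nullopt;
    ++i;
    skip_ws();
    if (i >= body.size() || body[i] != '"') return std::nullopt;
    ++i;
    std::string out;
    while (i < body.size()) {
        char c = body[i++];
        if (c == '"') return out;               // closing quote: done
        if (c != '\\') { out += c; continue; }  // ordinary character
        if (i >= body.size()) return std::nullopt;
        char e = body[i++];
        switch (e) {
            case '"': out += '"'; break;
            case '\\': out += '\\'; break;
            case '/': out += '/'; break;
            case 'b': out += '\b'; break;
            case 'f': out += '\f'; break;
            case 'n': out += '\n'; break;
            case 'r': out += '\r'; break;
            case 't': out += '\t'; break;
            case 'u': {
                auto cp = parse_hex4(body, i);
                if (!cp) return std::nullopt;
                i += 4;
                // Combine a UTF-16 surrogate pair into one code point.
                if (*cp >= 0xD800 && *cp <= 0xDBFF && i + 6 <= body.size() &&
                    body[i] == '\\' && body[i + 1] == 'u') {
                    auto lo = parse_hex4(body, i + 2);
                    if (lo && *lo >= 0xDC00 && *lo <= 0xDFFF) {
                        *cp = 0x10000 + ((*cp - 0xD800) << 10) + (*lo - 0xDC00);
                        i += 6;
                    }
                }
                append_utf8(out, *cp);
                break;
            }
            default: return std::nullopt;  // invalid escape sequence
        }
    }
    return std::nullopt;  // unterminated string
}
```

At the call site in cloud_transcribe, this could read `return extract_json_string(response_body, "transcript").value_or(original_text);`, which keeps the existing fallback to the local text.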
```cpp
if (handle->previous_cloud_handoff && !confirmed.empty()) {
    cloud_handoff_triggered = true;
#ifdef CACTUS_USE_CURL
    std::vector<uint8_t> confirmed_audio(
        handle->audio_buffer.begin(),
        handle->audio_buffer.begin() + handle->previous_audio_buffer_size
    );
    auto wav = build_wav(confirmed_audio.data(), confirmed_audio.size());
    std::string b64 = base64_encode(wav.data(), wav.size());
    cloud_job_id = handle->next_cloud_job_id++;
    handle->pending_cloud_jobs.push_back({
        cloud_job_id,
        std::async(std::launch::async, cloud_transcribe, b64, confirmed)
    });
#endif
}
```
cloud_handoff_triggered is set to true even when CACTUS_USE_CURL is not enabled (or when no job is actually queued), which can produce responses with cloud_handoff=true but cloud_job_id=0. Please only set cloud_handoff_triggered when a cloud job is successfully created (and consider skipping job creation entirely when CACTUS_CLOUD_API_KEY is empty).
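A minimal sketch of the ordering this comment asks for. It uses simplified, hypothetical stand-ins (StreamState, cloud_available_from_env, maybe_queue_cloud_job) instead of the PR's real handle type. A job ID is returned only when a job is actually queued, and the handoff flag is derived from that ID. CACTUS_USE_CURL is the PR's own compile definition.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins for the PR's stream-handle fields.
struct CloudJob {
    uint64_t id;
    std::future<std::string> result;
};

struct StreamState {
    bool cloud_available = false;    // true only if cloud requests can really be sent
    uint64_t next_cloud_job_id = 1;  // 0 is reserved to mean "no job"
    std::vector<CloudJob> pending_cloud_jobs;
};

// Cloud is usable only when built with libcurl AND an API key is set.
bool cloud_available_from_env(const char* api_key) {
#ifdef CACTUS_USE_CURL
    return api_key != nullptr && api_key[0] != '\0';
#else
    (void)api_key;
    return false;
#endif
}

// Queue a job only when everything it needs is present; return its ID,
// or 0 when nothing was queued.
uint64_t maybe_queue_cloud_job(StreamState& s, bool handoff_requested,
                               const std::string& confirmed,
                               const std::function<std::string(std::string)>& transcribe) {
    if (!handoff_requested || confirmed.empty() || !s.cloud_available) return 0;
    uint64_t id = s.next_cloud_job_id++;
    s.pending_cloud_jobs.push_back({id, std::async(std::launch::async, transcribe, confirmed)});
    return id;
}

// At the call site the flag follows the job, so cloud_handoff=true
// always comes with a non-zero cloud_job_id:
//   uint64_t cloud_job_id = maybe_queue_cloud_job(state, handoff, confirmed, fn);
//   bool cloud_handoff_triggered = (cloud_job_id != 0);
```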
```cpp
bool force_tools, include_stop_sequences, use_vad, telemetry_enabled;
float cloud_handoff_threshold = handle->model->get_config().default_cloud_handoff_threshold;
parse_options_json(
    options_json ? options_json : "", temperature,
    top_p, top_k, max_tokens, stop_sequences,
    force_tools, tool_rag_top_k, confidence_threshold,
    include_stop_sequences, use_vad, telemetry_enabled
);
(void)telemetry_enabled;
```
telemetry_enabled is parsed from options_json but then explicitly ignored. This makes the public options surface misleading (callers can't actually disable telemetry); either wire this through to guard telemetry::recordTranscription calls, or remove the option from parsing for cactus_transcribe to avoid a no-op setting.
```cpp
float first_token_entropy = 0.0f;
float total_entropy_sum = 0.0f;
size_t total_entropy_count = 0;
float max_token_entropy_norm = 0.0f;
```
Variable name max_token_entropy_norm is misleading because it stores the maximum raw token entropy (no normalization is performed in this function). Renaming it to reflect what it contains (or adding a normalization step if intended) will make the cloud_handoff threshold logic easier to reason about and tune.
Suggested change:

```diff
- float max_token_entropy_norm = 0.0f;
+ // Stores the maximum raw (unnormalized) token entropy observed during decoding.
+ float max_token_entropy_raw = 0.0f;
+ #define max_token_entropy_norm max_token_entropy_raw
```
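If a normalized value is what the threshold is meant to compare against, one common normalization divides the Shannon entropy of the softmax distribution, H = -Σ p_i ln p_i, by its maximum ln V for a vocabulary of size V, which maps it into [0, 1]. The sketch below tracks both the raw and the normalized maximum so the threshold's units are explicit. The function and struct names are hypothetical, and this is not the PR's entropy code, which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Shannon entropy H = -sum_i p_i * ln(p_i) of softmax(logits), in nats.
// Subtracting the max logit first keeps exp() from overflowing.
float token_entropy(const std::vector<float>& logits) {
    if (logits.empty()) return 0.0f;
    const double m = *std::max_element(logits.begin(), logits.end());
    double z = 0.0;
    for (float l : logits) z += std::exp(l - m);
    double h = 0.0;
    for (float l : logits) {
        const double p = std::exp(l - m) / z;
        if (p > 0.0) h -= p * std::log(p);
    }
    return static_cast<float>(h);
}

// H / ln(V): near 0 when one token dominates, 1 for a uniform distribution,
// and comparable across vocabularies of different sizes.
float normalized_entropy(const std::vector<float>& logits) {
    if (logits.size() < 2) return 0.0f;
    return token_entropy(logits) / static_cast<float>(std::log(static_cast<double>(logits.size())));
}

// Track both maxima over a decode so each threshold has clear units.
struct EntropyStats {
    float max_token_entropy_raw = 0.0f;   // nats; grows with vocabulary size
    float max_token_entropy_norm = 0.0f;  // unitless, in [0, 1]
    void observe(const std::vector<float>& logits) {
        max_token_entropy_raw = std::max(max_token_entropy_raw, token_entropy(logits));
        max_token_entropy_norm = std::max(max_token_entropy_norm, normalized_entropy(logits));
    }
};
```

With the normalized value, a single default_cloud_handoff_threshold could plausibly be shared across models with different vocabulary sizes, whereas a raw-entropy threshold would need per-model tuning.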
```cpp
if (api_key.empty()) {
    std::cout << colored("Warning: ", Color::YELLOW + Color::BOLD)
              << "CACTUS_CLOUD_API_KEY environment variable not set.\n";
    std::cout << colored(" Cloud handoff will be disabled (fallback to local transcription).\n", Color::YELLOW);
```
This warning says cloud handoff will be disabled when CACTUS_CLOUD_API_KEY is missing, but the streaming code can still mark cloud_handoff and even create a (no-op) cloud job ID before immediately falling back. Please align the message with actual behavior (e.g., "Cloud refinement unavailable" / "No cloud requests will be sent") and/or ensure the stream path truly skips job creation when the key is empty.
Suggested change:

```diff
- std::cout << colored(" Cloud handoff will be disabled (fallback to local transcription).\n", Color::YELLOW);
+ std::cout << colored(" Cloud refinement unavailable; no cloud requests will be sent (fallback to local transcription).\n", Color::YELLOW);
```
cactus/ffi/cactus_stream.cpp
```cpp
static const std::string CLOUD_API_URL = "http://104.198.76.3/api/v1/transcribe";
```
CLOUD_API_URL is hard-coded to a public IP and uses plain HTTP. This risks leaking audio/API keys in transit and makes deployments brittle; please make the endpoint configurable (e.g., env/config) and default to HTTPS with certificate verification.
Suggested change:

```diff
- static const std::string CLOUD_API_URL = "http://104.198.76.3/api/v1/transcribe";
+ static std::string get_cloud_api_url() {
+     const char* env_url = std::getenv("CACTUS_CLOUD_API_URL");
+     if (env_url && env_url[0] != '\0') {
+         return std::string(env_url);
+     }
+     // Default to HTTPS for secure transport if no environment override is provided.
+     return "https://104.198.76.3/api/v1/transcribe";
+ }
+ static const std::string CLOUD_API_URL = get_cloud_api_url();
```
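A sketch of a stricter variant, assuming the same CACTUS_CLOUD_API_URL variable: accept only an explicitly configured https:// URL and otherwise leave the cloud path disabled, instead of defaulting to a fixed address. resolve_cloud_api_url is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical policy: only an explicitly configured https:// endpoint is
// accepted. No value, or a non-HTTPS value, means cloud handoff stays off.
// Typical call: resolve_cloud_api_url(std::getenv("CACTUS_CLOUD_API_URL"))
std::optional<std::string> resolve_cloud_api_url(const char* configured) {
    if (configured == nullptr || configured[0] == '\0') return std::nullopt;
    std::string url(configured);
    const std::string scheme = "https://";
    if (url.compare(0, scheme.size(), scheme) != 0) return std::nullopt;
    return url;
}
```

libcurl verifies the server certificate by default (CURLOPT_SSL_VERIFYPEER and CURLOPT_SSL_VERIFYHOST). With an HTTPS default such as https://104.198.76.3/..., verification therefore only succeeds if the server's certificate lists that IP address. A hostname-based endpoint avoids that constraint.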
```cpp
CURL* curl = curl_easy_init();
if (!curl) return original_text;

std::string response_body;
struct curl_slist* headers = nullptr;
headers = curl_slist_append(headers, ("X-API-Key: " + api_key).c_str());
headers = curl_slist_append(headers, "Content-Type: application/json");

curl_easy_setopt(curl, CURLOPT_URL, CLOUD_API_URL.c_str());
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, payload.c_str());
curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, static_cast<long>(payload.size()));
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curl_write_cb);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response_body);
curl_easy_setopt(curl, CURLOPT_TIMEOUT, 15L);

CURLcode res = curl_easy_perform(curl);
```
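libcurl is used from async threads, but there is no curl_global_init() call anywhere in the codebase. When the global state has not been initialized, curl_easy_init() initializes it implicitly, and that implicit call is not thread-safe on every platform and libcurl version, so concurrent first calls from the async workers can race. Please add a one-time curl_global_init(CURL_GLOBAL_DEFAULT) (with a matching curl_global_cleanup() at shutdown) before any curl_easy_init() call.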
```cpp
std::string b64 = base64_encode(wav.data(), wav.size());
cloud_job_id = handle->next_cloud_job_id++;
handle->pending_cloud_jobs.push_back({
    cloud_job_id,
    std::async(std::launch::async, cloud_transcribe, b64, confirmed)
});
```
Each confirmed segment that triggers a handoff spawns a new std::async(std::launch::async, ...) without any concurrency/backpressure control. This can create unbounded threads and large in-flight base64 payloads under poor network conditions; consider a bounded queue + single worker thread (or a small thread pool) and drop/merge jobs when overloaded.
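A minimal sketch of the bounded-queue-plus-single-worker option. It assumes a hypothetical CloudWorker class that receives the transcription call as a callback. submit() rejects work when the queue is full, so only one extra thread exists and the number of queued base64 payloads is capped. Dropping instead of merging is just one possible overload policy.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// One worker thread drains a bounded FIFO of cloud requests.
// The callback is assumed not to throw.
class CloudWorker {
public:
    using Transcribe = std::function<std::string(const std::string&)>;

    CloudWorker(size_t capacity, Transcribe fn)
        : capacity_(capacity), fn_(std::move(fn)), thread_([this] { run(); }) {}

    ~CloudWorker() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            stop_ = true;
            queue_.clear();  // drop unsent jobs; only an in-flight call finishes
        }
        cv_.notify_all();
        thread_.join();
    }

    // Returns false (job dropped) when the queue is already at capacity.
    bool submit(uint64_t id, std::string payload) {
        std::lock_guard<std::mutex> lk(mu_);
        if (stop_ || queue_.size() >= capacity_) return false;
        queue_.emplace_back(id, std::move(payload));
        cv_.notify_one();
        return true;
    }

    // Non-blocking: move out whatever results are ready so far.
    std::vector<std::pair<uint64_t, std::string>> take_results() {
        std::lock_guard<std::mutex> lk(mu_);
        return std::exchange(results_, {});
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(mu_);
        for (;;) {
            cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
            if (stop_) return;
            auto job = std::move(queue_.front());
            queue_.pop_front();
            lk.unlock();
            std::string text = fn_(job.second);  // network call runs without the lock
            lk.lock();
            results_.emplace_back(job.first, std::move(text));
        }
    }

    size_t capacity_;
    Transcribe fn_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::pair<uint64_t, std::string>> queue_;
    std::vector<std::pair<uint64_t, std::string>> results_;
    bool stop_ = false;
    std::thread thread_;  // declared last so it starts after the other members exist
};
```

The stream path could call submit() where it currently calls std::async and treat a false return as "no cloud job" (cloud_job_id stays 0). take_results() would then replace polling individual futures.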
```cpp
struct CloudJob {
    uint64_t id;
    std::future<std::string> result;
};
std::vector<CloudJob> pending_cloud_jobs;
std::vector<std::pair<uint64_t, std::string>> completed_cloud_results;
```
pending_cloud_jobs stores std::future objects returned by std::async. If the stream handle is destroyed while jobs are still running, std::future destructors can block waiting for completion, causing cactus_stream_transcribe_stop() to hang unexpectedly. Please make shutdown behavior explicit (e.g., drain results before stop, use a dedicated worker thread with cancel flag, or store shared_futures and detach worker lifetime from the handle).
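If the std::async design is kept, one way to make shutdown explicit is sketched below. Finished futures are harvested without blocking on every stream step, and at stop time stragglers are waited on only up to a deadline. harvest_ready and drain_with_deadline are hypothetical names. This bounds how long stop() waits for results, but a std::async future's destructor can still block, so the task itself must also be bounded (for example by the existing CURLOPT_TIMEOUT).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <future>
#include <string>
#include <utility>
#include <vector>

struct CloudJob {
    uint64_t id;
    std::future<std::string> result;
};

// Move finished jobs into `completed` without blocking: wait_for(0s) only
// reports status, so an in-flight request never stalls the audio path.
void harvest_ready(std::vector<CloudJob>& pending,
                   std::vector<std::pair<uint64_t, std::string>>& completed) {
    for (auto it = pending.begin(); it != pending.end();) {
        if (it->result.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            completed.emplace_back(it->id, it->result.get());
            it = pending.erase(it);
        } else {
            ++it;
        }
    }
}

// At stop time, give in-flight jobs one shared grace period, collect what
// finished, and return how many are still running.
size_t drain_with_deadline(std::vector<CloudJob>& pending,
                           std::vector<std::pair<uint64_t, std::string>>& completed,
                           std::chrono::milliseconds budget) {
    const auto deadline = std::chrono::steady_clock::now() + budget;
    for (auto& job : pending) (void)job.result.wait_until(deadline);
    harvest_ready(pending, completed);
    return pending.size();
}
```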
```cpp
json_builder << "\"ram_usage_mb\":" << ram_usage_mb << ",";
json_builder << "\"prefill_tokens\":" << prefill_tokens << ",";
json_builder << "\"decode_tokens\":" << decode_tokens << ",";
json_builder << "\"total_tokens\":" << total_tokens;
```
New cloud fields (cloud_handoff/cloud_job_id/cloud_result_job_id/cloud_result/buffer_duration_ms) are added to the stream JSON response, but existing stream transcription tests don't assert any of this behavior. Please extend tests (e.g., tests/test_engine.cpp stream path) to validate the presence/shape of these fields and basic job lifecycle semantics (IDs are 0 when disabled, results map to job IDs, etc.).
Suggested change:

```diff
- json_builder << "\"total_tokens\":" << total_tokens;
+ json_builder << "\"total_tokens\":" << total_tokens << ",";
+ json_builder << "\"cloud_handoff\":" << (cloud_handoff ? "true" : "false") << ",";
+ json_builder << "\"buffer_duration_ms\":" << buffer_duration_ms << ",";
+ json_builder << "\"cloud_job_id\":" << cloud_job_id << ",";
+ json_builder << "\"cloud_result_job_id\":" << cloud_result_job_id << ",";
+ json_builder << "\"cloud_result\":\"" << escape_json(cloud_result) << "\"";
```
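A sketch of the checks such a test could make. It assumes the field names from the suggested change and a flat response string. json_uint_field and cloud_fields_disabled_ok are hypothetical test helpers, not existing utilities in tests/test_engine.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical test helper: read an unsigned integer field such as
// "cloud_job_id":42 from a flat JSON response. Not a general JSON parser.
std::optional<uint64_t> json_uint_field(const std::string& json, const std::string& key) {
    const std::string needle = "\"" + key + "\":";
    size_t pos = json.find(needle);
    if (pos == std::string::npos) return std::nullopt;
    size_t i = pos + needle.size();
    if (i >= json.size() || json[i] < '0' || json[i] > '9') return std::nullopt;
    uint64_t v = 0;
    while (i < json.size() && json[i] >= '0' && json[i] <= '9') {
        v = v * 10 + static_cast<uint64_t>(json[i] - '0');
        ++i;
    }
    return v;
}

// Invariants for a response produced with cloud disabled: both ID fields
// are present and zero, and cloud_handoff is false.
bool cloud_fields_disabled_ok(const std::string& json) {
    return json_uint_field(json, "cloud_job_id") == 0u &&
           json_uint_field(json, "cloud_result_job_id") == 0u &&
           json.find("\"cloud_handoff\":false") != std::string::npos;
}
```

For the job lifecycle, a test with cloud enabled could record each non-zero cloud_job_id and assert that every later non-zero cloud_result_job_id matches one of them exactly once.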
* should be working logic for asr and merging, dummy logic for cloud
* implemented cloud handoff in asr and entropy based logic in cactus_transcribe for handoff detection
* tiny cleanup
* changed latency wording a little
* moved logic back to cactus_stream
* addressed some copilot comments

---------

Signed-off-by: Karen Mosoyan <[email protected]>