Asr cloud merging #348

Merged
HenryNdubuaku merged 8 commits into cactus-compute:main from BruinAI:asr-cloud-merging
Feb 16, 2026

Conversation

kar-m (Collaborator) commented Feb 13, 2026

No description provided.

…anscribe for handoff detection

Signed-off-by: Karen Mosoyan <[email protected]>
Signed-off-by: Karen Mosoyan <[email protected]>
Signed-off-by: Karen Mosoyan <[email protected]>
Signed-off-by: Karen Mosoyan <[email protected]>
@kar-m kar-m marked this pull request as ready for review February 16, 2026 06:17
Copilot AI review requested due to automatic review settings February 16, 2026 06:17
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a “cloud handoff” path for streaming ASR, allowing locally-confirmed segments to be optionally refined by a cloud transcription service and then merged back into the live transcript display.

Changes:

  • Add cloud handoff signaling from cactus_transcribe (entropy-based threshold) and propagate cloud job/result fields through the streaming FFI response.
  • Implement optional libcurl-based cloud transcription in the stream processor and return asynchronous job results to the client.
  • Update the tests/asr.cpp live UI to track “awaiting cloud” segments and merge cloud results back into confirmed text; adjust CMake curl wiring.
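
To make the entropy-based handoff concrete, here is a minimal sketch of how such a signal could be computed. It is not the code from this PR: the function names `token_entropy` and `should_cloud_handoff` are hypothetical, and the rule "hand off when the maximum per-token entropy exceeds the threshold" is an assumption based on the review comments further down.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Shannon entropy (in nats) of one token's probability distribution.
// Entries with zero probability contribute nothing to the sum.
float token_entropy(const std::vector<float>& probs) {
    float h = 0.0f;
    for (float p : probs) {
        if (p > 0.0f) h -= p * std::log(p);
    }
    return h;
}

// Hypothetical decision rule (an assumption, not the PR's exact logic):
// request a cloud handoff when the most uncertain token exceeds `threshold`.
bool should_cloud_handoff(const std::vector<std::vector<float>>& per_token_probs,
                          float threshold) {
    float max_entropy = 0.0f;
    for (const auto& probs : per_token_probs) {
        max_entropy = std::max(max_entropy, token_entropy(probs));
    }
    return max_entropy > threshold;
}
```

A raw entropy like this grows with vocabulary size, so a per-model threshold (as the PR sets in engine_model.cpp) or a normalization step is needed for the comparison to be meaningful across models.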

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| tests/asr.cpp | Adds client-side segment tracking/merging UI for cloud results and warns when API key is missing. |
| tests/CMakeLists.txt | Removes explicit CURL linking for tests/apps. |
| cactus/ffi/cactus_transcribe.cpp | Adds entropy-based cloud_handoff boolean into the JSON response. |
| cactus/ffi/cactus_stream.cpp | Adds optional CURL cloud transcription, async job tracking, and additional JSON response fields for cloud job/results. |
| cactus/engine/engine_model.cpp | Sets default cloud handoff thresholds per model type. |
| cactus/engine/engine.h | Adds default_cloud_handoff_threshold to engine config. |
| cactus/CMakeLists.txt | Makes CURL optional, links it when found, and defines CACTUS_USE_CURL. |

Comment on lines +246 to +252
std::string pattern = "\"transcript\":\"";
size_t pos = response_body.find(pattern);
if (pos == std::string::npos) return original_text;
size_t start = pos + pattern.length();
size_t end = response_body.find('"', start);
if (end == std::string::npos) return original_text;
return response_body.substr(start, end - start);

Copilot AI Feb 16, 2026

Response JSON parsing for the cloud call is done via a simple substring search for "transcript":" and then the next quote. This will truncate or fail if the transcript contains escaped quotes/backslashes or if the JSON is formatted differently; prefer using a proper JSON extractor/parser (or extend the existing json_string helper to handle escapes) and unescape the result.
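
As an illustration of the escape-aware extraction suggested here, the following sketch scans the string value character by character instead of stopping at the first quote. `extract_json_string` is a hypothetical helper, not the project's existing json_string function, and it makes two simplifying assumptions: the key does not also appear inside an earlier string value, and `\uXXXX` sequences are passed through undecoded.

```cpp
#include <cstddef>
#include <string>

// Advances `pos` past JSON whitespace.
inline void skip_ws(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t' ||
                              s[pos] == '\n' || s[pos] == '\r')) {
        ++pos;
    }
}

// Writes the unescaped string value of `key` into `out`.
// Returns false if the key, the colon, or the closing quote is missing,
// so the caller can fall back to the local transcript.
bool extract_json_string(const std::string& body, const std::string& key,
                         std::string& out) {
    const std::string pattern = "\"" + key + "\"";
    std::size_t pos = body.find(pattern);
    if (pos == std::string::npos) return false;
    pos += pattern.size();
    skip_ws(body, pos);
    if (pos >= body.size() || body[pos] != ':') return false;
    ++pos;
    skip_ws(body, pos);
    if (pos >= body.size() || body[pos] != '"') return false;
    ++pos;

    std::string value;
    while (pos < body.size()) {
        char c = body[pos++];
        if (c == '"') {          // unescaped quote ends the string
            out = value;
            return true;
        }
        if (c != '\\') {
            value.push_back(c);
            continue;
        }
        if (pos >= body.size()) break;  // dangling backslash
        char e = body[pos++];
        switch (e) {
            case 'n': value.push_back('\n'); break;
            case 't': value.push_back('\t'); break;
            case 'r': value.push_back('\r'); break;
            case 'b': value.push_back('\b'); break;
            case 'f': value.push_back('\f'); break;
            case 'u': value += "\\u"; break;   // left undecoded in this sketch
            default:  value.push_back(e);      // covers \" \\ and \/
        }
    }
    return false;  // no closing quote found
}
```

With this shape, cloud_transcribe could return original_text whenever the helper returns false, which keeps the fallback behavior of the snippet under review.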

Comment on lines +518 to +533
if (handle->previous_cloud_handoff && !confirmed.empty()) {
    cloud_handoff_triggered = true;
#ifdef CACTUS_USE_CURL
    std::vector<uint8_t> confirmed_audio(
        handle->audio_buffer.begin(),
        handle->audio_buffer.begin() + handle->previous_audio_buffer_size
    );
    auto wav = build_wav(confirmed_audio.data(), confirmed_audio.size());
    std::string b64 = base64_encode(wav.data(), wav.size());
    cloud_job_id = handle->next_cloud_job_id++;
    handle->pending_cloud_jobs.push_back({
        cloud_job_id,
        std::async(std::launch::async, cloud_transcribe, b64, confirmed)
    });
#endif
}

Copilot AI Feb 16, 2026

cloud_handoff_triggered is set to true even when CACTUS_USE_CURL is not enabled (or when no job is actually queued), which can produce responses with cloud_handoff=true but cloud_job_id=0. Please only set cloud_handoff_triggered when a cloud job is successfully created (and consider skipping job creation entirely when CACTUS_CLOUD_API_KEY is empty).
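
One way to keep the flag and the job ID consistent is to derive both from a single decision, sketched below. Everything here is hypothetical: `HandoffDecision` and `decide_handoff` do not exist in the PR, the `curl_available` parameter stands in for the compile-time CACTUS_USE_CURL check, and the sketch assumes job IDs start at 1 so that 0 can mean "no job".

```cpp
#include <cstdint>
#include <string>

// Result of the handoff decision: `triggered` is true only when a job ID
// has actually been allocated, so the two fields cannot disagree.
struct HandoffDecision {
    bool triggered;
    uint64_t job_id;  // 0 means "no cloud job was created"
};

HandoffDecision decide_handoff(bool previous_cloud_handoff,
                               const std::string& confirmed,
                               bool curl_available,
                               const std::string& api_key,
                               uint64_t& next_job_id) {
    // Nothing to refine: no handoff signal or no confirmed text.
    if (!previous_cloud_handoff || confirmed.empty()) return {false, 0};
    // Cloud path unusable: do not report a handoff that cannot happen.
    if (!curl_available || api_key.empty()) return {false, 0};
    // Allocate the ID only on the path that really queues a job.
    return {true, next_job_id++};
}
```

The caller would queue the async request only when `triggered` is true, which also avoids building the WAV and base64 payload when no request will be sent.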

Comment on lines 103 to 112
bool force_tools, include_stop_sequences, use_vad, telemetry_enabled;
float cloud_handoff_threshold = handle->model->get_config().default_cloud_handoff_threshold;
parse_options_json(
    options_json ? options_json : "", temperature,
    top_p, top_k, max_tokens, stop_sequences,
    force_tools, tool_rag_top_k, confidence_threshold,
    include_stop_sequences, use_vad, telemetry_enabled
);
(void)telemetry_enabled;

Copilot AI Feb 16, 2026

telemetry_enabled is parsed from options_json but then explicitly ignored. This makes the public options surface misleading (callers can't actually disable telemetry); either wire this through to guard telemetry::recordTranscription calls, or remove the option from parsing for cactus_transcribe to avoid a no-op setting.

float first_token_entropy = 0.0f;
float total_entropy_sum = 0.0f;
size_t total_entropy_count = 0;
float max_token_entropy_norm = 0.0f;

Copilot AI Feb 16, 2026

Variable name max_token_entropy_norm is misleading because it stores the maximum raw token entropy (no normalization is performed in this function). Renaming it to reflect what it contains (or adding a normalization step if intended) will make the cloud_handoff threshold logic easier to reason about and tune.

Suggested change
- float max_token_entropy_norm = 0.0f;
+ // Stores the maximum raw (unnormalized) token entropy observed during decoding.
+ float max_token_entropy_raw = 0.0f;
+ #define max_token_entropy_norm max_token_entropy_raw

if (api_key.empty()) {
std::cout << colored("Warning: ", Color::YELLOW + Color::BOLD)
<< "CACTUS_CLOUD_API_KEY environment variable not set.\n";
std::cout << colored(" Cloud handoff will be disabled (fallback to local transcription).\n", Color::YELLOW);

Copilot AI Feb 16, 2026

This warning says cloud handoff will be disabled when CACTUS_CLOUD_API_KEY is missing, but the streaming code can still mark cloud_handoff and even create a (no-op) cloud job ID before immediately falling back. Please align the message with actual behavior (e.g., "Cloud refinement unavailable" / "No cloud requests will be sent") and/or ensure the stream path truly skips job creation when the key is empty.

Suggested change
- std::cout << colored(" Cloud handoff will be disabled (fallback to local transcription).\n", Color::YELLOW);
+ std::cout << colored(" Cloud refinement unavailable; no cloud requests will be sent (fallback to local transcription).\n", Color::YELLOW);

Comment on lines +149 to +150
static const std::string CLOUD_API_URL = "http://104.198.76.3/api/v1/transcribe";

Copilot AI Feb 16, 2026

CLOUD_API_URL is hard-coded to a public IP and uses plain HTTP. This risks leaking audio/API keys in transit and makes deployments brittle; please make the endpoint configurable (e.g., env/config) and default to HTTPS with certificate verification.

Suggested change
- static const std::string CLOUD_API_URL = "http://104.198.76.3/api/v1/transcribe";
+ static std::string get_cloud_api_url() {
+     const char* env_url = std::getenv("CACTUS_CLOUD_API_URL");
+     if (env_url && env_url[0] != '\0') {
+         return std::string(env_url);
+     }
+     // Default to HTTPS for secure transport if no environment override is provided.
+     return "https://104.198.76.3/api/v1/transcribe";
+ }
+ static const std::string CLOUD_API_URL = get_cloud_api_url();

Comment on lines +224 to +240
CURL* curl = curl_easy_init();
if (!curl) return original_text;

std::string response_body;
struct curl_slist* headers = nullptr;
headers = curl_slist_append(headers, ("X-API-Key: " + api_key).c_str());
headers = curl_slist_append(headers, "Content-Type: application/json");

curl_easy_setopt(curl, CURLOPT_URL, CLOUD_API_URL.c_str());
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, payload.c_str());
curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, static_cast<long>(payload.size()));
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, curl_write_cb);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response_body);
curl_easy_setopt(curl, CURLOPT_TIMEOUT, 15L);

CURLcode res = curl_easy_perform(curl);

Copilot AI Feb 16, 2026

libcurl is used from async threads, but there is no curl_global_init() call anywhere in the codebase. Please add a one-time curl_global_init(CURL_GLOBAL_DEFAULT) (and matching curl_global_cleanup at shutdown) before any curl_easy_init(), otherwise behavior is undefined on some platforms.
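
A common way to get the one-time initialization requested here is a function-local static, whose initialization C++11 guarantees to run exactly once even under concurrent calls. The sketch below is deliberately libcurl-free so it compiles on its own: `ensure_global_init` is a hypothetical helper, and the injected callable stands in for a check such as `curl_global_init(CURL_GLOBAL_DEFAULT) == CURLE_OK`.

```cpp
#include <functional>

// Runs `init` on the first call only and caches its result.
// Later calls (from any thread) return the cached value without re-running it.
// In the real code the callable would wrap curl_global_init(CURL_GLOBAL_DEFAULT);
// it is injected here only to keep the sketch independent of libcurl.
bool ensure_global_init(const std::function<bool()>& init) {
    static const bool ok = init();  // thread-safe one-time initialization
    return ok;
}
```

cloud_transcribe could call such a guard before curl_easy_init() and return original_text if it reports failure. The matching curl_global_cleanup() would still need an explicit home at library shutdown, which this sketch does not cover.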

Comment on lines +526 to +531
std::string b64 = base64_encode(wav.data(), wav.size());
cloud_job_id = handle->next_cloud_job_id++;
handle->pending_cloud_jobs.push_back({
    cloud_job_id,
    std::async(std::launch::async, cloud_transcribe, b64, confirmed)
});

Copilot AI Feb 16, 2026

Each confirmed segment that triggers a handoff spawns a new std::async(std::launch::async, ...) without any concurrency/backpressure control. This can create unbounded threads and large in-flight base64 payloads under poor network conditions; consider a bounded queue + single worker thread (or a small thread pool) and drop/merge jobs when overloaded.
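
The bounded-queue half of this suggestion can be sketched as follows. `CloudRequest` and `BoundedJobQueue` are hypothetical names, and the policy shown (reject new work when full) is only one of the options mentioned above; merging or dropping the oldest job would be equally valid choices. The worker thread itself is omitted: it would loop on `try_pop` and call cloud_transcribe for each request.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// One queued cloud refinement request (fields are illustrative).
struct CloudRequest {
    uint64_t id;
    std::string audio_b64;
    std::string fallback_text;
};

// Thread-safe FIFO with a hard capacity. Producers never block:
// when the queue is full the new request is rejected.
class BoundedJobQueue {
public:
    explicit BoundedJobQueue(std::size_t capacity) : capacity_(capacity) {}

    // Returns false (and drops `req`) when the queue is already full.
    bool try_push(CloudRequest req) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.size() >= capacity_) return false;
        queue_.push_back(std::move(req));
        return true;
    }

    // Moves the oldest request into `out`; returns false when empty.
    bool try_pop(CloudRequest& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::mutex mutex_;
    std::deque<CloudRequest> queue_;
};
```

When `try_push` returns false the stream path can simply keep the local transcript for that segment, which bounds both thread count and the memory held by in-flight base64 payloads.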

Comment on lines +271 to +276
struct CloudJob {
    uint64_t id;
    std::future<std::string> result;
};
std::vector<CloudJob> pending_cloud_jobs;
std::vector<std::pair<uint64_t, std::string>> completed_cloud_results;

Copilot AI Feb 16, 2026

pending_cloud_jobs stores std::future objects returned by std::async. If the stream handle is destroyed while jobs are still running, std::future destructors can block waiting for completion, causing cactus_stream_transcribe_stop() to hang unexpectedly. Please make shutdown behavior explicit (e.g., drain results before stop, use a dedicated worker thread with cancel flag, or store shared_futures and detach worker lifetime from the handle).
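
A non-blocking poll over the pending jobs is one building block for making shutdown explicit. The sketch below reuses the `CloudJob` shape from the snippet above; `harvest_ready` is a hypothetical helper, not a function from the PR. It collects finished results and leaves running jobs in place, so the stop path can decide whether to wait, time out, or hand the remaining futures to something that outlives the handle.

```cpp
#include <chrono>
#include <cstdint>
#include <future>
#include <string>
#include <utility>
#include <vector>

struct CloudJob {
    uint64_t id;
    std::future<std::string> result;
};

// Moves every finished job out of `pending` and returns (id, text) pairs.
// Jobs that are still running stay in `pending`; this call never blocks.
// Note: get() rethrows if the job stored an exception.
std::vector<std::pair<uint64_t, std::string>> harvest_ready(
        std::vector<CloudJob>& pending) {
    std::vector<std::pair<uint64_t, std::string>> done;
    for (auto it = pending.begin(); it != pending.end();) {
        if (it->result.wait_for(std::chrono::seconds(0)) ==
            std::future_status::ready) {
            done.emplace_back(it->id, it->result.get());
            it = pending.erase(it);
        } else {
            ++it;
        }
    }
    return done;
}
```

Polling alone does not remove the blocking destructor of futures returned by std::async; whatever is left in `pending` at stop time still needs a deliberate policy, such as a bounded wait or a worker that owns the jobs independently of the stream handle.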

json_builder << "\"ram_usage_mb\":" << ram_usage_mb << ",";
json_builder << "\"prefill_tokens\":" << prefill_tokens << ",";
json_builder << "\"decode_tokens\":" << decode_tokens << ",";
json_builder << "\"total_tokens\":" << total_tokens;

Copilot AI Feb 16, 2026

New cloud fields (cloud_handoff/cloud_job_id/cloud_result_job_id/cloud_result/buffer_duration_ms) are added to the stream JSON response, but existing stream transcription tests don't assert any of this behavior. Please extend tests (e.g., tests/test_engine.cpp stream path) to validate the presence/shape of these fields and basic job lifecycle semantics (IDs are 0 when disabled, results map to job IDs, etc.).

Suggested change
- json_builder << "\"total_tokens\":" << total_tokens;
+ json_builder << "\"total_tokens\":" << total_tokens << ",";
+ json_builder << "\"cloud_handoff\":" << (cloud_handoff ? "true" : "false") << ",";
+ json_builder << "\"buffer_duration_ms\":" << buffer_duration_ms << ",";
+ json_builder << "\"cloud_job_id\":" << cloud_job_id << ",";
+ json_builder << "\"cloud_result_job_id\":" << cloud_result_job_id << ",";
+ json_builder << "\"cloud_result\":\"" << escape_json(cloud_result) << "\"";

Signed-off-by: Karen Mosoyan <[email protected]>
@HenryNdubuaku HenryNdubuaku merged commit 3bdc39f into cactus-compute:main Feb 16, 2026
1 of 2 checks passed
ncylich pushed a commit that referenced this pull request Feb 24, 2026
* should be working logic for asr and merging, dummy logic for cloud

Signed-off-by: Karen Mosoyan <[email protected]>

* implemented cloud handoff in asr and entropy based logic in cactus_transcribe for handoff detection

Signed-off-by: Karen Mosoyan <[email protected]>

* tiny cleanup

Signed-off-by: Karen Mosoyan <[email protected]>

* changed latency wording a little

Signed-off-by: Karen Mosoyan <[email protected]>

* moved logic back to cactus_stream

Signed-off-by: Karen Mosoyan <[email protected]>

* addressed some copilot comments

Signed-off-by: Karen Mosoyan <[email protected]>

---------

Signed-off-by: Karen Mosoyan <[email protected]>
cattermelon1234 pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 28, 2026