Update telemetry for supported platforms#465
Conversation
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
This reverts commit 41fe9db. Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
This reverts commit a00c2dd. Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
Signed-off-by: justinl66 <[email protected]>
There was a problem hiding this comment.
Pull request overview
This pull request enhances the telemetry system across all supported platforms (Flutter, Swift, Kotlin/Android, iOS) with more accurate tracking, better auto-detection capabilities, and improved shutdown reliability.
Changes:
- Enhanced project_id generation to use deterministic UUIDs derived from git remote URLs (shared across team members) with fallback to device_id
- Added automatic app_id detection from platform-specific APIs (CoreFoundation on Apple, /proc/self/cmdline on Linux/Android) with manual override capability
- Extended telemetry APIs to include version parameter and RAM usage tracking for transcription events
- Improved session field handling to only populate for STREAM_TRANSCRIBE events, and moved framework tagging from JNI_OnLoad to Kotlin companion object initialization
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| flutter/cactus.dart | Updated FFI type definitions to add version parameter and added flush() calls in all dispose() methods |
| cactus/telemetry/telemetry.h | Added setAppId() function and extended setTelemetryEnvironment() with version parameter, updated recordTranscription() to include ram_usage_mb |
| cactus/telemetry/telemetry.cpp | Implemented git remote URL reading, app_id auto-detection, Android cache directory auto-derivation, session field logic, and reordered shutdown to flush before disabling |
| cactus/ffi/cactus_transcribe.cpp | Updated all recordTranscription() calls to pass ram_usage_mb parameter (0.0 for errors, get_ram_usage_mb() for success) |
| cactus/ffi/cactus_telemetry.cpp | Added cactus_set_app_id() FFI export and updated cactus_set_telemetry_environment() signature |
| cactus/ffi/cactus_ffi.h | Added cactus_set_app_id() export declaration and updated cactus_set_telemetry_environment() signature |
| cactus/CMakeLists.txt | Added logic to read CACTUS_VERSION file and inject as compile-time definition |
| apple/CmakeLists.txt | Added logic to read CACTUS_VERSION file and inject as compile-time definition |
| apple/Cactus.swift | Updated setTelemetryEnvironment() calls with version parameter and added app_id detection from bundle identifier |
| android/cactus_jni.cpp | Removed framework tagging from JNI_OnLoad, added nativeSetFramework JNI function, updated signature |
| android/Cactus.kt | Added nativeSetFramework() call in companion object init block to set framework tag |
| android/Cactus.ios.kt | Updated setTelemetryEnvironment() calls with version parameter |
| android/CMakeLists.txt | Added logic to read CACTUS_VERSION file and inject as compile-time definition |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if(NOT "${CACTUS_VERSION_STR}" STREQUAL "") | ||
| target_compile_definitions(${target_name} PRIVATE CACTUS_COMPILE_TIME_VERSION="${CACTUS_VERSION_STR}") | ||
| endif() |
There was a problem hiding this comment.
CoreFoundation framework is used in telemetry.cpp (CFBundleRef, CFStringRef, CFURLRef, etc.) but is not explicitly linked in the CMake configuration. While CoreFoundation is typically included via Foundation framework on macOS, it's better practice to explicitly link it when using its APIs directly. Add CoreFoundation to the find_library calls and target_link_libraries to ensure proper linkage across all macOS configurations.
| char buf[256]; | ||
| return CFStringGetCString(id, buf, sizeof(buf), kCFStringEncodingUTF8) ? buf : ""; |
There was a problem hiding this comment.
The buffer size of 256 bytes for CFBundleIdentifier may be insufficient. Bundle identifiers can be longer than 256 characters, especially for development or enterprise apps with long reverse-domain prefixes. If CFStringGetCString fails due to insufficient buffer space, the function returns an empty string, silently failing without logging. Consider using a larger buffer (e.g., 512 or 1024 bytes) or dynamically allocating based on CFStringGetLength.
| char buf[256]; | |
| return CFStringGetCString(id, buf, sizeof(buf), kCFStringEncodingUTF8) ? buf : ""; | |
| CFIndex length = CFStringGetLength(id); | |
| if (length == 0) return ""; | |
| CFIndex maxSize = CFStringGetMaximumSizeForEncoding(length, kCFStringEncodingUTF8) + 1; // +1 for null terminator | |
| if (maxSize <= 0) return ""; | |
| std::vector<char> buf(static_cast<size_t>(maxSize)); | |
| if (CFStringGetCString(id, buf.data(), buf.size(), kCFStringEncodingUTF8)) { | |
| return std::string(buf.data()); | |
| } | |
| return ""; |
| size_t pos = line.find("url = "); | ||
| if (pos != std::string::npos) return line.substr(pos + 6); |
There was a problem hiding this comment.
The git config parser assumes "url = " format with a space after the equals sign. However, Git config files can also have "url=" (no space) or "url = " (multiple spaces). This will cause the parser to fail silently and fall back to device_id instead of using the git remote URL for project_id. Consider using a more robust parser that handles whitespace variations around the equals sign.
| _cactusStreamTranscribeDestroy(_handle); | ||
| _cactusTelemetryFlush(); |
There was a problem hiding this comment.
Calling flush after destroying the handle may be problematic. The _cactusStreamTranscribeDestroy call on line 1200 may release resources that the telemetry system depends on. The flush should be called before destroying the handle to ensure telemetry events related to the destruction are properly sent. Consider swapping the order of these two calls.
| _cactusStreamTranscribeDestroy(_handle); | |
| _cactusTelemetryFlush(); | |
| _cactusTelemetryFlush(); | |
| _cactusStreamTranscribeDestroy(_handle); |
| _cactusIndexDestroy(_handle); | ||
| _cactusTelemetryFlush(); |
There was a problem hiding this comment.
Calling flush after destroying the handle may be problematic. The _cactusIndexDestroy call on line 1389 may release resources that the telemetry system depends on. The flush should be called before destroying the handle to ensure telemetry events related to the destruction are properly sent. Consider swapping the order of these two calls.
| _cactusIndexDestroy(_handle); | |
| _cactusTelemetryFlush(); | |
| _cactusTelemetryFlush(); | |
| _cactusIndexDestroy(_handle); |
| if (app_id.empty()) { | ||
| std::string pkg = extract_android_package_from_path(custom_cache_location); | ||
| if (!pkg.empty()) app_id = pkg; | ||
| } |
There was a problem hiding this comment.
The package extraction logic assumes that paths always use "/" as the separator. While this is correct for Android, the code doesn't validate that the input path is actually an Android path. If custom_cache_location is set to a non-Android path on other platforms (e.g., Windows paths with backslashes), this could produce incorrect results. Consider adding a platform check or path validation before attempting Android package extraction.
| static std::string uuid_from_string(const std::string& input) { | ||
| uint64_t a = fnv1a_64(input); | ||
| uint64_t b = fnv1a_64(input + "\xff"); | ||
| a = (a & 0xffffffffffff0fffULL) | 0x0000000000004000ULL; |
There was a problem hiding this comment.
The UUID generation uses version 4 bit pattern (0x4000) but the random data source is FNV-1a hash, which is deterministic rather than random. This is intentional for creating deterministic UUIDs from git remote URLs. However, using version 5 (name-based UUID using SHA-1) would be more semantically correct and standard-compliant for deterministic UUID generation from strings. Consider changing the version bits to 0x5000 and documenting this as a name-based UUID.
| static std::string uuid_from_string(const std::string& input) { | |
| uint64_t a = fnv1a_64(input); | |
| uint64_t b = fnv1a_64(input + "\xff"); | |
| a = (a & 0xffffffffffff0fffULL) | 0x0000000000004000ULL; | |
| // Deterministic, name-based UUID derived from input using FNV-1a. | |
| // The version field is set to 5 (name-based) for semantic correctness. | |
| static std::string uuid_from_string(const std::string& input) { | |
| uint64_t a = fnv1a_64(input); | |
| uint64_t b = fnv1a_64(input + "\xff"); | |
| a = (a & 0xffffffffffff0fffULL) | 0x0000000000005000ULL; |
| size_t colon = pkg.find(':'); | ||
| if (colon != std::string::npos) pkg = pkg.substr(0, colon); | ||
| if (!pkg.empty()) { | ||
| uid_t uid = getuid(); | ||
| int user_id = static_cast<int>((uid - 10000) / 100000); | ||
| std::string dir = "/data/user/" + std::to_string(user_id) + "/" + pkg + "/cache/cactus/telemetry"; | ||
| mkdir_p(dir); | ||
| struct stat st; | ||
| if (stat(dir.c_str(), &st) == 0) return dir; | ||
| dir = "/data/data/" + pkg + "/cache/cactus/telemetry"; | ||
| mkdir_p(dir); | ||
| if (stat(dir.c_str(), &st) == 0) return dir; | ||
| } |
There was a problem hiding this comment.
The cmdline parsing extracts the package name by splitting at ':'. However, if the package name itself contains a colon (which is technically valid though rare), this will incorrectly truncate it. Additionally, if cmdline reading fails silently, the code falls through to create directories that may not work. Consider adding error logging when cmdline parsing produces unexpected results, and validate that the extracted package name matches Android package naming conventions (e.g., contains at least one dot).
| size_t colon = pkg.find(':'); | |
| if (colon != std::string::npos) pkg = pkg.substr(0, colon); | |
| if (!pkg.empty()) { | |
| uid_t uid = getuid(); | |
| int user_id = static_cast<int>((uid - 10000) / 100000); | |
| std::string dir = "/data/user/" + std::to_string(user_id) + "/" + pkg + "/cache/cactus/telemetry"; | |
| mkdir_p(dir); | |
| struct stat st; | |
| if (stat(dir.c_str(), &st) == 0) return dir; | |
| dir = "/data/data/" + pkg + "/cache/cactus/telemetry"; | |
| mkdir_p(dir); | |
| if (stat(dir.c_str(), &st) == 0) return dir; | |
| } | |
| // Android cmdline is typically "package.name" or "package.name:process". | |
| // However, a colon can technically appear in the package name as well. | |
| // Heuristic: | |
| // - Look at the last colon. | |
| // - If the suffix after the colon has no dots (likely a process name), | |
| // and the prefix has at least one dot (likely a package name), | |
| // strip the suffix. Otherwise, keep the full string. | |
| size_t colon = pkg.rfind(':'); | |
| if (colon != std::string::npos && colon + 1 < pkg.size()) { | |
| const std::string prefix = pkg.substr(0, colon); | |
| const std::string suffix = pkg.substr(colon + 1); | |
| const bool prefix_looks_like_pkg = (prefix.find('.') != std::string::npos); | |
| const bool suffix_looks_like_pkg = (suffix.find('.') != std::string::npos); | |
| if (prefix_looks_like_pkg && !suffix_looks_like_pkg) { | |
| pkg = prefix; | |
| } | |
| } | |
| // Validate the extracted package name. | |
| if (pkg.empty() || pkg.find('.') == std::string::npos) { | |
| std::fprintf(stderr, | |
| "cactus telemetry: unexpected /proc/self/cmdline format, " | |
| "derived package name '%s' is invalid; falling back to generic cache directory.\n", | |
| pkg.c_str()); | |
| } else { | |
| uid_t uid = getuid(); | |
| int user_id = static_cast<int>((uid - 10000) / 100000); | |
| std::string dir = "/data/user/" + std::to_string(user_id) + "/" + pkg + "/cache/cactus/telemetry"; | |
| mkdir_p(dir); | |
| struct stat st; | |
| if (stat(dir.c_str(), &st) == 0) return dir; | |
| dir = "/data/data/" + pkg + "/cache/cactus/telemetry"; | |
| mkdir_p(dir); | |
| if (stat(dir.c_str(), &st) == 0) return dir; | |
| std::fprintf(stderr, | |
| "cactus telemetry: failed to access Android cache directories for package '%s'; " | |
| "falling back to generic cache directory.\n", | |
| pkg.c_str()); | |
| } | |
| } else { | |
| std::fprintf(stderr, | |
| "cactus telemetry: unable to open /proc/self/cmdline; " | |
| "falling back to generic cache directory.\n"); |
| const char* cwd_env = std::getenv("PWD"); | ||
| if (!cwd_env) return ""; | ||
| std::string dir = cwd_env; |
There was a problem hiding this comment.
Using PWD environment variable for git remote detection is unreliable. PWD may not be set in all execution contexts (e.g., daemons, GUI applications, or when launched via certain methods). Additionally, PWD can be manipulated and may not represent the actual current working directory. Consider using getcwd() or std::filesystem::current_path() instead for more reliable directory detection.
| _cactusDestroy(_handle); | ||
| _cactusTelemetryFlush(); |
There was a problem hiding this comment.
Calling flush after destroying the handle may be problematic. The _cactusDestroy call on line 1105 may release resources that the telemetry system depends on. The flush should be called before destroying the handle to ensure telemetry events related to the destruction are properly sent. Consider swapping the order of these two calls.
| _cactusDestroy(_handle); | |
| _cactusTelemetryFlush(); | |
| _cactusTelemetryFlush(); | |
| _cactusDestroy(_handle); |
Signed-off-by: HenryNdubuaku <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
project_id: Now a deterministic UUID derived from git remote URL (shared across devs on same repo), falling back to device_id if no git remote found.
app_id: Auto-detected from bundle ID (Apple via CoreFoundation, Android/Linux via /proc/self/cmdline). New cactus_set_app_id() FFI for manual override.
version parameter: cactus_set_telemetry_environment gains a third version arg. Version also baked in at compile time via CMake from CACTUS_VERSION file.
ram_usage_mb on transcription: recordTranscription now records actual RAM usage on success paths.
Session fields: session_ttft/tps/time_ms/tokens are now null for all event types except STREAM_TRANSCRIBE (were previously 0).
Android cache dir: Auto-derived from process UID + package name — no manual setTelemetryEnvironment() call needed.
Android INTERNET permission: Added to native test AndroidManifest.xml.
Android framework tag: Moved "kotlin" tagging out of JNI_OnLoad into Kotlin companion object init, so Flutter on Android is no longer mis-tagged as "kotlin".
Shutdown ordering: enabled = false now set after flush() completes, so the final event batch is sent to Supabase rather than cached.
Flutter dispose flush: All Flutter dispose() methods now call cactus_telemetry_flush() synchronously, ensuring pending events are sent before exit() on iOS.