-
-
Notifications
You must be signed in to change notification settings - Fork 200
feat(logs): batching #1338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(logs): batching #1338
Conversation
# Conflicts: # examples/example.c
supervacuus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a first review and I guess it will still need a couple of rounds. Maybe we can work on these things a bit closer next.
Current IssuesNon-IssuesDropping Logs
The time-to-flush seems higher than local testing (from an average of 600us to 1200us) which could explain the high number of dropped logs. In the 5/50 example below, we even have flushes taking 76000us, which definitely blocks the logs buffer long enough to drop 90% of incoming logs. Example runs: Potential SolutionWe can only handle 20k logs/s project-wide, before abuse quota hits. This equates to one log every 50us; our stress-test might therefore be stressing the system too much (source: internal notion page) FixedStuck flushing (Windows)In some runs we seemingly get stuck trying to flush logs. We see a trigger every 5s, even though the timer_task should have stopped SolutionWe had a leftover sleep in the code that was removed in 208952b ; this caused the Windows tests to run for 30 minutes before closing Threaded logs test getting stuck (Windows)Different from the Fixed Logs Flushing problem mentioned below, we noticed that on 32-bit Windows the threaded logs test gets 'stuck'. See an example run here. In the logs we can notice the following; an unexpected return on the logs flush trigger followed by a failed assert in sentry_value.c. SolutionFixed in #1362 , we had some overlooked ownership issues with collecting the attributes for the logs, which caused Windows to show an error window (causing the test to hang). This was not a Windows-specific issue, but CI luckily (accidentally) caught it. |
- too much variability in thread scheduling, so we can expect pretty much anything
|
The switch statement on the return value of Currently it expects: |
* add `before_send_log` callback * add `before_send_log` callback tests * (temporary) add debug for calling sentry_options_free * remove early return * add late return * cleanup
* let the producer thread sleep for 1ms between logs * fix two missing NULL checks in the json serializer * clean up logging and early exits in `enqueue_log_single()` * clean up ownerships in logs * eliminate clones (we expect that everything outlives the logs being sent except local construction) * use incref everywhere where we ref global state. this was the cause of the UAF, partially solved with the clones but a few were missing. no reason to clone if we do not want to disconnect for a particular object graph * clarify that add_attribute expects ownership * minimize scope_mut by moving os_context out * raise that log output in throughput tests add to variability (stdout logging should be turned off when running a limit) * log error in case we weren't able to start the log batching worker * fix clang-cl warning * ci: fix failing mingw build (#1361) * ci: fix failing mingw build * split `ASM_MASM_COMPILER` and `_FLAGS` * add `ASM_MASM_FLAGS` in `mingw` install step * specify the `CMAKE_ASM_MASM_COMPILER` as a `FILEPATH` * clean up CMAKE_DEFINES construction so it is easier to diff in the future * fix `LLVM_MINGW_INSTALL_PATH` to be referenced locally rather than $env (cherry picked from commit 519554f) * use UNREACHABLE macro to fix anal warnings
|
* first attempt at double buffered * remove the sleep from the windows thread func * clean up thread waiting in the example * adapt the double buffer to use retries, fix remaining issues, clean up and write inline docs * return early in example on sentry_init error. * fix formatting via shorter name for thread gate atomic * improve inline docs of log_buffer_t members * fetch os_context from scope * move scope/options data retrieval into separate function + add expected keys to test * update logs API to return status code * cleanup * add log-event trace connection test * remove duplicate test * specify macOS SDKROOT --------- Co-authored-by: JoshuaMoelans <[email protected]>
Implements a double buffer queue for logs batching.
#skip-changelog