WIP implement doubling backoff for WAL watcher timer#11950
Closed
Signed-off-by: Callum Styan <[email protected]>
cstyan pushed a commit to grafana/loki that referenced this pull request on May 17, 2023
**What this PR does / why we need it**:

This PR implements a new mechanism that lets the WAL Watcher in Promtail know when there are new records to read. It combines:

- prometheus/prometheus#11950
- prometheus/prometheus#11949

The primary mechanism is a notification channel between the `wal.Writer` and the `wal.Watcher`. The Watcher subscribes to the write events that the Writer publishes and is notified whenever the WAL has been written. The same subscription design is used for cleanup events. As a backup, the Watcher has a timer that implements an exponential backoff strategy, constrained by a minimum and a maximum that the user can configure.

The graph below shows the CPU difference between running main and this branch against the same scrape target.

<img width="2496" alt="image" src="https://user-images.githubusercontent.com/2617411/232099483-7e5c36fa-9360-4eb9-8240-687adf46e330.png">

The yellow line is the latest main build from where this branch started, and the green line is this branch. Both Promtail instances were tailing Docker logs, and CPU usage was taken from cAdvisor with the following query:

```
avg by (name) (rate(container_cpu_usage_seconds_total{job=~".+", instance=~".+", name=~"promtail-wal-test_promtail.+"}[$__rate_interval]))
```

**Which issue(s) this PR fixes**:
Part of #8197

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
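For readers who want a concrete picture of the design described in that commit message, here is a minimal sketch of a watcher loop that reads as soon as a write notification arrives and otherwise falls back to a timer with a bounded exponential backoff. It is not the Promtail code: `watch`, `nextBackoff`, the `read` callback, and the durations used in `main` are all hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// nextBackoff doubles d and clamps the result to [min, max].
// Hypothetical helper, not part of Promtail or Prometheus.
func nextBackoff(d, min, max time.Duration) time.Duration {
	d *= 2
	if d < min {
		return min
	}
	if d > max {
		return max
	}
	return d
}

// watch calls read whenever notify fires. As a backup it also calls read
// when a timer expires; the timer period doubles after each idle tick and
// resets to min after a notification or a read that returned new records.
// read reports how many new records it found.
func watch(notify, done <-chan struct{}, min, max time.Duration, read func() int) {
	wait := min
	timer := time.NewTimer(wait)
	defer timer.Stop()
	for {
		select {
		case <-done:
			return
		case <-notify:
			read()
			wait = min
		case <-timer.C:
			if read() > 0 {
				wait = min
			} else {
				wait = nextBackoff(wait, min, max)
			}
		}
		// Stop the timer and drain any pending tick before re-arming it.
		if !timer.Stop() {
			select {
			case <-timer.C:
			default:
			}
		}
		timer.Reset(wait)
	}
}

func main() {
	notify := make(chan struct{})
	done := make(chan struct{})
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		watch(notify, done, 10*time.Millisecond, 80*time.Millisecond, func() int {
			fmt.Println("read at", time.Now().Format("15:04:05.000"))
			return 0 // pretend nothing new was written
		})
	}()
	notify <- struct{}{}               // a write event triggers an immediate read
	time.Sleep(300 * time.Millisecond) // let the fallback timer fire a few times
	close(done)
	<-finished
}
```

In this sketch the notification path keeps latency low while the WAL is busy, and the timer only matters when a notification is missed or the writer is idle, which is why it can afford to back off.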
superseded by #11949
This PR introduces a doubling backoff within the WAL watcher's read loop. It is a naive approach, implemented via a sleep rather than something more intelligent such as a modified Timer implementation. If we read from the segment but don't actually read any new bytes (nothing has been written since the last read), the timeout before the next read increases. This cuts the WAL watcher's CPU usage by ~40%.

In this case, the green line is the Prometheus built from the main branch and the teal line is the one built from this branch.
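As a rough illustration of the sleep-based approach described above, the sketch below doubles the wait whenever a read returns no new bytes and resets it once data shows up. It is an assumption-laden sketch, not the code in this PR: `nextTimeout`, `readLoop`, `noSleep`, the simulated read sizes, and the `minTimeout`/`maxTimeout` values are hypothetical, and the upper cap is an addition for illustration that the description does not mention.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical bounds; the PR description does not state the values it uses.
const (
	minTimeout = 10 * time.Millisecond
	maxTimeout = 160 * time.Millisecond
)

// nextTimeout returns the wait before the next read. It resets to min when
// the last read returned new bytes; otherwise it doubles cur, capped at max.
func nextTimeout(cur time.Duration, bytesRead int, min, max time.Duration) time.Duration {
	if bytesRead > 0 {
		return min
	}
	if cur*2 > max {
		return max
	}
	return cur * 2
}

// noSleep stands in for time.Sleep so the demo finishes instantly.
func noSleep(time.Duration) {}

// readLoop simulates the watcher's read loop. Each element of reads is the
// number of new bytes returned by one segment read. After every read the
// loop sleeps for the current timeout and records it.
func readLoop(reads []int, min, max time.Duration, sleep func(time.Duration)) []time.Duration {
	timeout := min
	waits := make([]time.Duration, 0, len(reads))
	for _, n := range reads {
		timeout = nextTimeout(timeout, n, min, max)
		sleep(timeout)
		waits = append(waits, timeout)
	}
	return waits
}

func main() {
	// Simulated byte counts: mostly idle, with one write in the middle.
	reads := []int{0, 0, 0, 512, 0, 0, 0, 0, 0, 0}
	// Pass time.Sleep instead of noSleep to see the real pauses.
	fmt.Println(readLoop(reads, minTimeout, maxTimeout, noSleep))
}
```

Passing the sleep function as a parameter is only there to keep the sketch testable; a real loop would read from the segment and call `time.Sleep` directly.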
Signed-off-by: Callum Styan [email protected]