WIP implement doubling backoff for WAL watcher timer #11950

Closed
cstyan wants to merge 1 commit into main from callum-watcher-poll-backoff

cstyan commented Feb 8, 2023

This PR introduces a doubling backoff within the WAL watcher's read loop. The approach is naive: it uses a sleep rather than something more intelligent, such as a modified Timer implementation. If we read from the segment but don't actually read any new bytes (nothing has been written since the last read), the timeout before the next read increases. This cuts the WAL watcher's CPU usage by ~40%.
[Screenshot: 2023-02-06-165055_2437x900_scrot]
In this graph, the green line is the Prometheus built from the main branch and the teal line is the one built from this branch.

Signed-off-by: Callum Styan [email protected]

cstyan pushed a commit to grafana/loki that referenced this pull request May 17, 2023
**What this PR does / why we need it**:

This PR implements a new mechanism by which the WAL Watcher in Promtail
learns that there are new records to read. It combines the approaches from:
- prometheus/prometheus#11950
- prometheus/prometheus#11949

The main idea is that the primary mechanism is a notification channel
between the `wal.Writer` and `wal.Watcher`. The Watcher subscribes to
write events that the Writer publishes, and is notified whenever the WAL
has been written to. The same subscription design is used for cleanup events.
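
A subscription design like this could look roughly like the sketch below. The `WriteNotifier` type and its `Subscribe` and `NotifyWrite` methods are hypothetical names invented for illustration; they are not the actual `wal.Writer` or `wal.Watcher` API.

```go
package main

import (
	"fmt"
	"sync"
)

// WriteNotifier is a hypothetical stand-in for the writer side of the
// notification channel. It is not the real wal.Writer type.
type WriteNotifier struct {
	mu   sync.Mutex
	subs []chan struct{}
}

// Subscribe returns a channel that receives a signal after a write.
// The channel has a buffer of one so pending notifications coalesce.
func (n *WriteNotifier) Subscribe() <-chan struct{} {
	ch := make(chan struct{}, 1)
	n.mu.Lock()
	n.subs = append(n.subs, ch)
	n.mu.Unlock()
	return ch
}

// NotifyWrite signals every subscriber without ever blocking the writer.
func (n *WriteNotifier) NotifyWrite() {
	n.mu.Lock()
	defer n.mu.Unlock()
	for _, ch := range n.subs {
		select {
		case ch <- struct{}{}:
		default: // a notification is already pending, so drop this one
		}
	}
}

func main() {
	var n WriteNotifier
	events := n.Subscribe()

	n.NotifyWrite()
	n.NotifyWrite() // coalesced with the first notification

	<-events
	fmt.Println("watcher woke up; notifications still pending:", len(events))
}
```

Dropping a notification when one is already pending is a deliberate choice in this sketch: the watcher only needs to know that something was written, not how many times.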

As a backup, the watcher has a timer that implements an exponential
backoff strategy, which is constrained by a minimum and maximum that the
user can configure.

The graph below shows the difference in CPU usage between main and this
branch, both running against the same scrape target.

<img width="2496" alt="image"
src="https://user-images.githubusercontent.com/2617411/232099483-7e5c36fa-9360-4eb9-8240-687adf46e330.png">

The yellow line is the latest main build from where this branch started,
and the green line is this branch. Both Promtail instances are tailing
Docker logs, and CPU usage is taken from cAdvisor with the following query:
```
avg by (name) (rate(container_cpu_usage_seconds_total{job=~".+", instance=~".+", name=~"promtail-wal-test_promtail.+"}[$__rate_interval]))
```

**Which issue(s) this PR fixes**:
Part of #8197

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`

cstyan commented Sep 1, 2023

superseded by #11949

cstyan closed this Sep 1, 2023