os/bluestore: Add health warning for bluestore fragmentation#61214
src/os/bluestore/BlueStore.cc
Outdated
auto now = mono_clock::now();
timespan elapsed = now - last_fragmentation_check;
if (elapsed > make_timespan(period)) {
  double score = store->alloc->get_fragmentation_score();
Can we log at a fairly low level (0?) before and after this call, so that any related stalls can be identified?
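A minimal sketch of what logging around the call could look like. `timed_call` is a hypothetical helper, not part of BlueStore, and `std::cerr` stands in for Ceph's `dout(0)` logging, which this sketch does not use so that it stays self-contained.

```cpp
#include <chrono>
#include <iostream>
#include <utility>

// Hypothetical helper: logs before and after running `fn` and returns the
// result together with the elapsed time. std::cerr stands in for the
// debug-log macro that the real code would use.
template <typename Fn>
auto timed_call(const char* what, Fn&& fn) {
  using clock = std::chrono::steady_clock;
  std::cerr << what << " start" << std::endl;
  const auto t0 = clock::now();
  auto result = std::forward<Fn>(fn)();
  const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
      clock::now() - t0);
  std::cerr << what << " done in " << elapsed.count() << " us" << std::endl;
  return std::make_pair(result, elapsed);
}
```

With the two log lines bracketing the call, a long gap between "start" and "done" in the log would point at the score calculation as the source of a stall.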
8921afc to a98cef0
jenkins test make check
jenkins test api
src/os/bluestore/BlueStore.cc
Outdated
if (elapsed > make_timespan(period)) {
  last_fragmentation_check = now;
  double score = 0;
  if (store->alloc) {
Maybe do the above together with the "period != 0" check:
if (period != 0 && store->alloc)
...
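A sketch of how the combined guard might read once both conditions are checked up front. `FragCheckState` and `should_check` are hypothetical names used only for illustration; `has_allocator` stands in for `store->alloc` being non-null.

```cpp
#include <chrono>

// Hypothetical stand-in for the state that the reviewed code consults.
struct FragCheckState {
  bool has_allocator;                                // "store->alloc" is set
  std::chrono::seconds period;                       // 0 disables the check
  std::chrono::steady_clock::time_point last_check;  // time of previous check
};

// Returns true when a fragmentation check is due, and records the time.
bool should_check(FragCheckState& s,
                  std::chrono::steady_clock::time_point now) {
  // Both preconditions are tested together, as the comment suggests.
  if (s.period.count() != 0 && s.has_allocator) {
    if (now - s.last_check > s.period) {
      s.last_check = now;
      return true;
    }
  }
  return false;
}
```

Folding the allocator test into the outer condition removes one level of nesting and means the elapsed-time bookkeeping is skipped entirely when there is no allocator.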
jenkins test api
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
0c7c174 to 1861330
Changed "bluestore/fragmentation_micros" from a quick, imprecise score to a slower but more representative one. Introduced config "bluestore_warn_on_free_fragmentation", which controls when the free space fragmentation score becomes a health warning. Currently, calculating the fragmentation score might not be instant for severely fragmented disks, and it might induce stalls in write IO. Config value "bluestore_fragmentation_check_period" controls the score calculation period. In the future, the costly score calculation will be replaced with a method that continuously updates the score. Signed-off-by: Adam Kupczyk <[email protected]>
Using fmt::format requires libfmt for linking Signed-off-by: Adam Kupczyk <[email protected]>
1861330 to 3c5ae6c
jenkins test make check
[ FAILED ] TestLibRBD.TestPendingAio
jenkins test api
jenkins test make check
make check failure unrelated
jenkins test make check
jenkins test make check
jenkins test make check arm64
last_fragmentation_check = now;
double score;
score = store->alloc->get_fragmentation_score();
store->logger->set(l_bluestore_fragmentation, score * 1e6);
@aclamk Was it intentional to make the fragmentation score a thousand times larger than before?
Prior to the PR, the fragmentation score returned by the ceph_bluestore_fragmentation_micros metric was simply multiplied by 1000, and documented as such:
How fragmented bluestore free space is (free extents / max possible number of free extents) * 1000
But now it's being multiplied by 1e6, making it hard to create consistent dashboards and alerts for both Reef and Squid clusters.
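To illustrate the mismatch, here is a small sketch that converts a reported metric value back to the underlying score for either scale factor. The two constants come from the comment above (1000 before the PR, 1e6 after it); `normalize` is a hypothetical helper and not part of Ceph.

```cpp
// Scale factors as described in the comment: the metric was the score
// multiplied by 1000 before the PR and by 1e6 after it.
constexpr double kOldScale = 1000.0;
constexpr double kNewScale = 1e6;

// Hypothetical helper: recovers the unscaled fragmentation score from a
// reported metric value, given the scale factor that produced it.
constexpr double normalize(double metric, double scale) {
  return metric / scale;
}
```

A score of 0.25 would be reported as 250 under the old scale and as 250000 under the new one, so a dashboard covering both kinds of cluster would need to divide by the matching factor before comparing values.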
Changed "bluestore/fragmentation_micros" from a quick, imprecise score to a slower but more representative one.
Introduced config "bluestore_warn_on_free_fragmentation", which controls when the free space fragmentation score becomes a health warning.
Currently, calculating the fragmentation score might not be instant for severely fragmented disks, and it might induce stalls in write IO. Config value "bluestore_fragmentation_check_period" controls the score calculation period.
In the future, the costly score calculation will be replaced with a method that continuously updates the score.
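A rough sketch of how a threshold such as "bluestore_warn_on_free_fragmentation" could turn a score into a warning. The function name, the message text, and the use of a strict greater-than comparison are all assumptions made for illustration; the actual health check in the PR may differ.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: returns a warning message when the score exceeds
// the configured threshold, and nothing otherwise. Whether the real check
// uses > or >= is not stated in the description; > is assumed here.
std::optional<std::string> fragmentation_warning(double score,
                                                 double warn_threshold) {
  if (score > warn_threshold) {
    return "free space fragmentation score " + std::to_string(score) +
           " exceeds threshold " + std::to_string(warn_threshold);
  }
  return std::nullopt;
}
```

For example, with an illustrative threshold of 0.8, a score of 0.9 would produce a message while a score of 0.5 would not.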