Revert "Merge pull request #47560 from pdvian/wip-pool-app"#53336
Revert "Merge pull request #47560 from pdvian/wip-pool-app"#53336
Conversation
This reverts commit 93c326f, reversing changes made to d1a0d8d. This patchset caused wide-spread component failures. With the increased scrutiny, there is a desire to simplify this [1] by either: (a) Creating a config that is turned on for all suites to turn off this behavior. (b) Add an ignorelist entry globally in the ceph.py|cephadm.py|rook.py tasks. (c) Fix the warning to only be generated after the new pool has no application for some amount of time. Option (c) is the most robust. (b) is probably the most straightforward. [1] ceph#47560 (review) Signed-off-by: Patrick Donnelly <[email protected]>
|
I did not include a revert of 4293d9b because it had other changes that should perhaps be kept. |
| overrides: | ||
| ceph: | ||
| log-ignorelist: | ||
| - \(POOL_APP_NOT_ENABLED\) |
There was a problem hiding this comment.
Are you sure that this was added strictly to mop up the fallout and not because crimson, being a RADOS suite in nature, has grown a legitimate need?
There was a problem hiding this comment.
Given it was merged 5 days ago, I think it's fallout mop up.
I think the revert would be justified only if we go with (c), which would also cover regular users/operators. If the intention is to proceed with (a) or (b), I doubt the revert churn is worth it -- individual suites have already been adjusted for the most part. |
rzarzynski
left a comment
There was a problem hiding this comment.
Thanks @batrick for bringing the commits. I talked with @pdvian yesterday on approaches for the warning suppression. Removing the POOL_APP_NOT_ENABLED from all the yamls will be useful in any way.
The thing I don't know yet is whether this revert should be merged immediately or new commits can be appended and merged in one go, assuring uninterrupted presence of the fix.
The questions from #47560 (comment) are still not answered entirely. I would like to know how painful the fallout from the patch really is. Since the discussion there I see that many has been replaced with wide-spread. However, none of these terms has been quantified.
Another uncovered aspect is how much of effort is needed to discriminate the noise. IIUC failures due to POOL_APP_NOT_ENABLED are schematic, possible to filter-out at first glance.
My intention here is to weigh what's less evil: 1) leaving some (how many? ;-) tests to fail and treating this as an incentive to quickly clean the fallout or 2) reverting the imperfect-but-existing fix and potentially stepping into long discussion on how to warn in the proper way. Personally I'm afraid something is missed in the design (therefore using emptiness as a proxy) and there is no proper way without touching the interface.
…-ignore-pool-app" This reverts commit a089373, reversing changes made to 6dda3e2. Signed-off-by: Patrick Donnelly <[email protected]>
5a08f7a to
3a24db3
Compare
There are several tickets now indicating that virtually every component was affected by this PR.
The larger concern is that these warnings may be shadowing true failures.
The widespread proliferation of filtering a legitimate warning should be very concerning. My position has always been that ignoring a warning across an entire suite is a band-aid that will always come back to bite us. For example: The entire rgw suite is ignoring I personally like @idryomov 's option (c). The PR should go back and address this. Selective reverts of most of the change-set by removing the warnings after a legitimate fix is even messier than a revert + proper fix in my opinion. So yes, I think this should be merged as-is immediately. |
Well, are they severely impacted? I'm not arguing on the lack of fallout. I'm arguing that reverting existing fix without bringing an alternative may be more problematic than the fallout itself.
True. The tests must be fixed in reasonable time. However, this doesn't imply reverting the fix immediately.
This would only change the proxy, the heuristic for guessing-out the information we miss: from emptiness to time. |
Global suppression vs guessing out the missed info. Both are stinky and now we're stepping into long discussion ;-) which one is worse. This is precisely the reason why I'm afraid about reverting without an accepted alternative. |
|
Closing this because the urgency is gone and @pdvian has been asked to fix this in a followup PR. |
This reverts commit 93c326f, reversing changes made to d1a0d8d.
This patchset caused wide-spread component failures. With the increased scrutiny, there is a desire to simplify this [1] by either:
(a) Creating a config that is turned on for all suites to turn off this behavior.
(b) Add an ignorelist entry globally in the ceph.py|cephadm.py|rook.py tasks.
(c) Fix the warning to only be generated after the new pool has no application for some amount of time.
Option (c) is the most robust. (b) is probably the most straightforward.
[1] #47560 (review)
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows