osd: fix require_min_compat_client handling for msr rules#59474
osd: fix require_min_compat_client handling for msr rules#59474
Conversation
…pool OSDMap::get_features is used by OSDMonitor::validate_crush_against_features via OSDMap::get_min_compat_client() to check whether changes to the crushmap will require newer features than the existing require_min_compat_client field. Monitor commands which create rules from ec profiles may result in msr rules. While it might be harmless to allow msr rules to exist as long as there aren't any pools actually using the rule, it's probably simpler to disallow their creation in the first place until require_min_compat_client is updated. Signed-off-by: Samuel Just <[email protected]>
This was simply a mistake: uint64_t features = 0; features |= CEPH_FEATURE_CRUSH_MSR; ceph_assert(HAVE_FEATURE(features, CRUSH_MSR)); will fail the assert as CEPH_FEATURE_CRUSH_MSR lacks the mask. In basically all cases, CEPH_FEATUREMASK_* is the correct choice. Signed-off-by: Samuel Just <[email protected]>
Also adjusts validate_crush_against_features to use an ostream& rather than a stringstream&. Fixes: https://tracker.ceph.com/issues/67755 Signed-off-by: Samuel Just <[email protected]>
|
jenkins test api |
|
jenkins test make check |
There was a problem hiding this comment.
LGTM. I tested this using the following steps:
Start a vstart cluster with 7 OSDs:
$ OSD=7 ../src/vstart.sh --debug --new -x --localhost --bluestore
Create MSR ec profile:
$ ./bin/ceph osd erasure-code-profile set ec86-profile k=8 m=6 crush-failure-domain=host crush-osds-per-failure-domain=4 crush-num-failure-domains=4
Try to create an ec pool with it. It correctly fails since require-min-compat-client is luminous (< squid):
$ ./bin/ceph osd pool create ec86-pool erasure ec86-profile
Error EINVAL: new crush map requires client version squid but require_min_compat_client is luminous
Set require-min-compat-client to something else, still < squid:
$ ./bin/ceph osd set-require-min-compat-client reef
set require_min_compat_client to reef
Try to create an ec pool with the MSR ec profile. It still fails since reef < squid:
$ ./bin/ceph osd pool create ec86-pool erasure ec86-profile
Error EINVAL: new crush map requires client version squid but require_min_compat_client is reef
Try to change the crushmap manually with an old crush map with an MSR rule. It fails since crushmaps with MSR rules should not be accepted if require-min-compat-client < squid:
$ ./bin/ceph osd setcrushmap -i ~/bz_2302230/crush.map.bin
Error EINVAL: new crush map requires client version squid but require_min_compat_client is reef
Set require-min-compat-client to squid:
$ ./bin/ceph osd set-require-min-compat-client squid
set require_min_compat_client to squid
Try to create a pool with the MSR ec profile. It succeeds this time since require-min-compat-client = squid:
$ ./bin/ceph osd pool create ec86-pool erasure ec86-profile
pool 'ec86-pool' created
Try to set require-min-compat-client lower than squid. This fails since we have an MSR rule in use, which uses squid features:
$ ./bin/ceph osd set-require-min-compat-client reef
Error EPERM: osdmap current utilizes features that require squid; cannot set require_min_compat_client below that to reef
Extract the crushmap and try to reapply it via CLI. This succeeds, as require-min-compat-client = squid.
$ ./bin/ceph osd getcrushmap > crush.map.bin
16
$ ./bin/ceph osd setcrushmap -i crush.map.bin
17
|
@athanatos so far I found one type of failure related to this PR. I'm still reviewing the upgrade suite, so full results pending. Tested here: https://tracker.ceph.com/issues/67783 |
… profiles Signed-off-by: Samuel Just <[email protected]>
|
Pushed hopefully a fix for ^ and started retest of a few of those jobs http://pulpito.ceph.com/sjust-2024-08-30_00:56:59-rados-wip-yuri4-testing-2024-08-28-1359-distro-default-smithi/ |
|
@ljflores Looks like my commit above fixed that failure. |
|
jenkins test make check |
All green! |
|
jenkins test make check |
This PR fixes 3 problems:
See commit messages for details.
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e