Skip to content

Comments

dencoding: check struct_v against DECODE_START(v, ...) at compile-time#60159

Merged
rzarzynski merged 3 commits intoceph:mainfrom
rzarzynski:wip-denc-ctcheck-struct_v
Mar 25, 2025
Merged

dencoding: check struct_v against DECODE_START(v, ...) at compile-time#60159
rzarzynski merged 3 commits intoceph:mainfrom
rzarzynski:wip-denc-ctcheck-struct_v

Conversation

@rzarzynski
Copy link
Contributor

Typical problem found in recent reviews of dencoding changes is missed increment of the version number passed to DECODE_START() macro in a decoder.
It's easy to overlook and hard to find as the number is used for rarely happening explicit breaks of schema compatibility.

This commit brings compile-time checks for struct_v in decoders (and for denc also in encoders) to find a situation when it's compared against version higher than declared using the DECODE_START(v, ...) macro.

Additionally, the patch fixes the currently existing issues of this genre buried across the entire code base.

Signed-off-by: Radoslaw Zarzynski [email protected]


The goldbot snippet is available here.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@rzarzynski
Copy link
Contributor Author

Do we have a framework for ensuring compilation failures? This isn't something GTest is able to do.

CC: @gregsfortytwo.

@rzarzynski rzarzynski requested a review from a team October 7, 2024 12:09
Copy link
Contributor

@NitzanMordhai NitzanMordhai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rzarzynski please see my comment

}
decode(attrs_to_read, bl);
if (struct_v > 2 && struct_v > struct_compat) {
if (struct_v > 2 && struct_v.v > struct_compat) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please explain why you're adding the compare against struct_compat and v ? there are few other compares against compact that are still compared to struct_v and not struct_v.v

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is comparing struct_compat against the struct_v which is not a regular integer but an instance of the CheckingWrapper. This wrapper is able to compare only against consteval expression but struct_compat is not like that. This leads to the following FTBFS:

ECMsgTypes.cc:223:34: error: the value of ‘struct_compat’ is not usable in a constant expression

@cbodley
Copy link
Contributor

cbodley commented Oct 8, 2024

Do we have a framework for ensuring compilation failures? This isn't something GTest is able to do.

@rzarzynski copying a suggestion from the cpplang slack:

The way you do compile tests testable with ctest is by using an object library via add_library(whatever OBJECT ...) that is EXCLUDE_FROM_ALL then explicitly calling the build system on the target with add_test(NAME whatever COMMAND "${CMAKE_COMMAND}" --build "${CMAKE_BINARY_DIR}" --config "\$<CONFIG>" --target whatever). Define cmake-properties(7) as necessary on the test if you want to test for failure or something specific.

@rzarzynski
Copy link
Contributor Author

Thanks, @cbodley!

@github-actions
Copy link

github-actions bot commented Dec 7, 2024

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Dec 7, 2024
@rzarzynski rzarzynski removed the stale label Dec 10, 2024
@github-actions
Copy link

github-actions bot commented Feb 8, 2025

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Feb 8, 2025
@rzarzynski
Copy link
Contributor Author

Unstale, please.

@github-actions github-actions bot removed the stale label Feb 10, 2025
@rzarzynski
Copy link
Contributor Author

jenkins test make check

@rzarzynski rzarzynski force-pushed the wip-denc-ctcheck-struct_v branch from a734e99 to e92eeec Compare February 15, 2025 15:24
@rzarzynski rzarzynski force-pushed the wip-denc-ctcheck-struct_v branch 3 times, most recently from 6b80485 to 8a3ed71 Compare February 16, 2025 15:43
@rzarzynski
Copy link
Contributor Author

@NitzanMordhai, @cbodley: implemented the unit testing. Would you mind re-review?

Copy link
Contributor

@NitzanMordhai NitzanMordhai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@idryomov
Copy link
Contributor

idryomov commented Feb 24, 2025

Too bad this doesn't catch cases where struct_v is passed to a helper or similar gymnastics -- it caught EventEntry::decode(), but not ClientData::decode() in the same file. Something that would have grouped encode() and decode() in such a way that struct_v is provided by the developer only once would have done it, but would have probably been a much more invasive change.

Also, I'd fold "osd/osd_types: fix wrong DECODE_START version for pg_notify_t" commit into the commit which has all the other fixups. The fact that this one is recent (November 2024) and was caught after a rebase doesn't make it special.

Copy link
Member

@gregsfortytwo gregsfortytwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CephFS changes look good and I like the compile-time checking. More explanation of how that works would be nice (this whole part of the code is really under-documented, I know) — I got confused a few times.

@gregsfortytwo
Copy link
Member

@rzarzynski do we need to backport the DECODE_START version changes? If we bump the compat_v in a future version to a version between correct struct_v and the bad DECODE_START(v) value, we could unexpectedly break things...but I don't think we've ever set compat_v to a value other than a newly-bumped struct_v, or that we have structures where it would make sense to do so.

@rzarzynski
Copy link
Contributor Author

(...) in such a way that struct_v is provided by the developer only once would have done it, but would have probably been a much more invasive change.

I think such a thing already exists -- it's our denc framework, where a developer typically writes just a single DENC_START(v, ...). Indeed, switching would be far more invasive.

@rzarzynski
Copy link
Contributor Author

Too bad this doesn't catch cases where struct_v is passed to a helper or similar gymnastics -- it caught EventEntry::decode(), but not ClientData::decode() in the same file.

Hmm, perhaps knocking out the conversion operator could help in this aspect but at the price of increasing the code churn.

@rzarzynski
Copy link
Contributor Author

@rzarzynski do we need to backport the DECODE_START version changes? If we bump the compat_v in a future version to a version between correct struct_v and the bad DECODE_START(v) value, we could unexpectedly break things...but I don't think we've ever set compat_v to a value other than a newly-bumped struct_v, or that we have structures where it would make sense to do so.

Do you mean the fixes? My perception was that having them only since a major release makes thinking easier. Also, we tend to avoid schema changes between minors.

@rzarzynski rzarzynski force-pushed the wip-denc-ctcheck-struct_v branch from 1552c09 to 755ae72 Compare February 25, 2025 00:53
@rzarzynski
Copy link
Contributor Author

Just squashed the recent fix for pg_notify_t with other fixes. No other changes.

@gregsfortytwo
Copy link
Member

@rzarzynski do we need to backport the DECODE_START version changes? If we bump the compat_v in a future version to a version between correct struct_v and the bad DECODE_START(v) value, we could unexpectedly break things...but I don't think we've ever set compat_v to a value other than a newly-bumped struct_v, or that we have structures where it would make sense to do so.

Do you mean the fixes? My perception was that having them only since a major release makes thinking easier. Also, we tend to avoid schema changes between minors.

Just the random decoders that had the wrong value which you identified and fixed here.
I started out thinking we should put them everywhere but I think I've talked myself into it not mattering; just want to make sure somebody else considers the impact on upgrades. :)

1 similar comment
@gregsfortytwo
Copy link
Member

@rzarzynski do we need to backport the DECODE_START version changes? If we bump the compat_v in a future version to a version between correct struct_v and the bad DECODE_START(v) value, we could unexpectedly break things...but I don't think we've ever set compat_v to a value other than a newly-bumped struct_v, or that we have structures where it would make sense to do so.

Do you mean the fixes? My perception was that having them only since a major release makes thinking easier. Also, we tend to avoid schema changes between minors.

Just the random decoders that had the wrong value which you identified and fixed here.
I started out thinking we should put them everywhere but I think I've talked myself into it not mattering; just want to make sure somebody else considers the impact on upgrades. :)

Copy link
Contributor

@chardan chardan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know this code well, so I don't have a lot to add-- mostly looks fine to me. Please see note about using identifiers that start with underscores. I have to wonder, also, if the whole enchilada could be done with templates, for the most part, but that is well out of the change scope.

@ljflores
Copy link
Member

Hey all, apologies for the delay in testing- the QA tracker was mislabeled. Reviews are ongoing now here: https://tracker.ceph.com/issues/70205

@yuriw
Copy link
Contributor

yuriw commented Mar 16, 2025

@rzarzynski
Copy link
Contributor Author

@yuriw: the checker has found a legit bug:

/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/20.0.0-515-g8ff3a1a4/rpm/el9/BUILD/ceph-20.0.0-515-g8ff3a1a4/src/messages/MClientReply.h:232:23:   in ‘constexpr’ expansion of ‘StructVChecker<7>::CheckingWrapper(8)’

introduced 2 weeks ago (#60746).
I will rebase.

`throw` in C++20's `consteval` function results in an ill-formed
program. Useful as parameters of such functions aren't const-
expressions, and thus are unsuitable for `static_assert`.

Signed-off-by: Radoslaw Zarzynski <[email protected]>
@rzarzynski rzarzynski force-pushed the wip-denc-ctcheck-struct_v branch from 755ae72 to f0526be Compare March 18, 2025 11:34
Typical problem found in recent reviews of dencoding changes is missed
increment of the version number passed to `DECODE_START()` macro in
a decoder.
It's easy to overlook and hard to find as the number is used for rarely
happening explicit breaks of schema compatibility.

This commit brings compile-time checks for `struct_v` in decoders
(and for `denc` also in encoders) to find a situation when it's
compared against version higher than declared using the `DECODE_START(v, ...)`
macro.

Additionally, the patch fixes the currently existing issues of this
genre buried across the enitre code base.

Signed-off-by: Radoslaw Zarzynski <[email protected]>
@rzarzynski rzarzynski force-pushed the wip-denc-ctcheck-struct_v branch from f0526be to f983d75 Compare March 18, 2025 12:12
@rzarzynski
Copy link
Contributor Author

In RGW as well:

/home/rzarzynski/ceph2/src/rgw/rgw_zone_types.h:590:28:   in ‘constexpr’ expansion of ‘StructVChecker<2>::CheckingWrapper(3)’

We really need the compile-time checking.

@yuriw: this PR is particularly susceptible to catching undetected merge conflicts with other PRs. Let's get it in sooner than later.
I think it's ready for a retest.

@ljflores
Copy link
Member

@rzarzynski rzarzynski merged commit db3b965 into ceph:main Mar 25, 2025
11 checks passed
@rzarzynski rzarzynski deleted the wip-denc-ctcheck-struct_v branch March 25, 2025 22:05
rzarzynski added a commit to rzarzynski/ceph that referenced this pull request Mar 26, 2025
This commit fixes an undetected merge conflict between PRs ceph#61745
and ceph#60159. The dencoding problem has been introduced very recently,
it is straightforward and causes failures of the make check bot
everywhere, therefore -- if no objections -- I want to merge this
patch without the Teuthology testing.

Signed-off-by: Radoslaw Zarzynski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants