include/ceph_assert: disable with NDEBUG by MaxKellermann · Pull Request #60160 · ceph/ceph

MaxKellermann · 2024-10-07T12:18:24Z

This makes ceph_assert() behave like the standard C assert() macro: in release builds (with -DNDEBUG), assertions should be compiled out, leaving no overhead.

This reduces the code size of ceph-mds (static build including libceph-common) by 550 kB.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

MaxKellermann · 2024-10-07T16:31:36Z

/home/jenkins-build/build/workspace/ceph-api/src/librados/librados_asio.h:193:10: error: no matching function for call to 'async_initiate'
  return boost::asio::async_initiate<CompletionToken, Signature>(
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

"ceph API tests" failure apparently not related to this PR.

This makes ceph_assert() behave like the standard C assert() macro: in release builds (with `-DNDEBUG`), assertions should be compiled out, leaving no overhead. This reduces the code size of `ceph-mds` (static build including `libceph-common`) by 550 kB. Signed-off-by: Max Kellermann <[email protected]>

rzarzynski · 2024-10-13T22:09:17Z

src/include/ceph_assert.h


-#ifdef __SANITIZE_ADDRESS__
+#ifdef NDEBUG
+#define ceph_assert(expr) do {} while (false)


This is a huge change. Many years ago there was an idea to replace the important occurrences of ceph_assert() with ceph_assert_always() and compile-out everything else for production builds. The problem is of course in judging what's essential across the entire code-base.

If you want to resurrect the effort, I would propose to bring the topic to next Ceph Development Montly meeting.

How much of CPU overhead does this squeeze?

I havn't isolated this patch in my measurements, but I think it's a small piece. Most users can probably afford it, but we at IONOS have serious performance problems with the MDS, so we want to squeeze out every bit we can (and we will leave patches that are not accepted upstream in our private fork).
I do however not get the point of ceph_assert_always() - by design, assertions are only for detecting bugs in debug builds - is this for people not confident enough in their code, and they still think they need guaranteed assertions in release builds? Why not regular runtime checks? (I'm not a fan of redundant runtime checks either. Get the code right, verify it with fine-grained unit tests, and remove all the unnecessary checks at release-runtime.)

I havn't isolated this patch in my measurements, but I think it's a small piece.

I share this intuition on the CPU overhead.
I don't see benefit in optimizing the size of the executable as most of the asserts' code already is (IIRC; "should be" for sure) in cold sections, so they shouldn't inflict a cost even in terms of memory but just address space.

I do however not get the point of ceph_assert_always()

Yeah, the effort to differentiate the asserts is long dead. Perhaps we should say this explicitly and switch everybody to ceph_assert().

by design, assertions are only for detecting bugs in debug builds - is this for people not confident enough in their code, and they still think they need guaranteed assertions in release builds?

Hah, I don't think a programmer should be confident enough till having a mathematical proof of correctness.

Why not regular runtime checks? (I'm not a fan of redundant runtime checks either.

What do you mean by regular run-time checks? Sanitizers?

Get the code right, verify it with fine-grained unit tests, and remove all the unnecessary checks at release-runtime.)

These assertions saved me so much investigation time in the wild that I really appreciate their presence in productions builds. As there is no clear evidence of high cost, I'm for rejecting the idea. Yet, if you think you have strong enough arguments, I'm happy to discuss at CDM (see https://tracker.ceph.com/projects/ceph/wiki/Planning).

MaxKellermann · 2024-12-07T05:48:47Z

@rzarzynski, how's the discussion of this PR going? Yay or any? Merge or reject? Something else, like add another compile-time option?

rzarzynski · 2024-12-07T13:10:39Z

how's the discussion of this PR going? Yay or any? Merge or reject? Something else, like add another compile-time option?

Responded in comment above.

MaxKellermann · 2024-12-07T16:27:33Z

Okay then, I'll keep this one in our private fork.

MaxKellermann · 2024-12-09T17:37:32Z

How much of CPU overhead does this squeeze?

I was curious and measured this; a fio run without this change, measuring perf stat -p $(pidof ceph-mds):

          61226.41 msec task-clock                       #    1.508 CPUs utilized             
            842964      context-switches                 #   13.768 K/sec                     
               246      cpu-migrations                   #    4.018 /sec                      
            210994      page-faults                      #    3.446 K/sec                     
       86418062722      cycles                           #    1.411 GHz                       
       17704534501      stalled-cycles-frontend          #   20.49% frontend cycles idle      
       65247819356      instructions                     #    0.76  insn per cycle            
                                                  #    0.27  stalled cycles per insn   
       12910645552      branches                         #  210.867 M/sec                     
         523527782      branch-misses                    #    4.06% of all branches

With this change:

          57339.55 msec task-clock                       #    1.514 CPUs utilized             
            815840      context-switches                 #   14.228 K/sec                     
               306      cpu-migrations                   #    5.337 /sec                      
            199497      page-faults                      #    3.479 K/sec                     
       83150952383      cycles                           #    1.450 GHz                       
       16366759028      stalled-cycles-frontend          #   19.68% frontend cycles idle      
       62597488888      instructions                     #    0.75  insn per cycle            
                                                  #    0.26  stalled cycles per insn   
       12316935143      branches                         #  214.807 M/sec                     
         490099136      branch-misses                    #    3.98% of all branches

That's 6-7% less CPU usage (task-clock: 57339/61226 = 0.9365). It's much more than I thought.

I agree that having assertions enabled is often very useful (my own software runs with assertions enabled in production), but that's an explicit choice at compile time (by not specifying -DNDEBUG, as documented in the C++ standard). Ceph however deviates from the standard and does not grant users this choice.

MaxKellermann force-pushed the assert_ndebug branch from 42d46de to 912be7c Compare October 7, 2024 21:25

rzarzynski added pending-discussion DNM labels Oct 13, 2024

rzarzynski reviewed Oct 13, 2024

View reviewed changes

This was referenced Oct 31, 2024

msg: Optimize receive buffering #60224

Closed

common: add missing include, reduce header dependencies #60253

Merged

MaxKellermann closed this Dec 7, 2024

MaxKellermann deleted the assert_ndebug branch December 7, 2024 16:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

include/ceph_assert: disable with NDEBUG#60160

include/ceph_assert: disable with NDEBUG#60160
MaxKellermann wants to merge 1 commit intoceph:mainfrom
MaxKellermann:assert_ndebug

MaxKellermann commented Oct 7, 2024

Uh oh!

MaxKellermann commented Oct 7, 2024

Uh oh!

rzarzynski Oct 13, 2024

Uh oh!

MaxKellermann Oct 14, 2024

Uh oh!

rzarzynski Dec 7, 2024

Uh oh!

MaxKellermann commented Dec 7, 2024

Uh oh!

rzarzynski commented Dec 7, 2024

Uh oh!

MaxKellermann commented Dec 7, 2024

Uh oh!

MaxKellermann commented Dec 9, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

MaxKellermann commented Oct 7, 2024

Checklist

Uh oh!

MaxKellermann commented Oct 7, 2024

Uh oh!

rzarzynski Oct 13, 2024

Choose a reason for hiding this comment

Uh oh!

MaxKellermann Oct 14, 2024

Choose a reason for hiding this comment

Uh oh!

rzarzynski Dec 7, 2024

Choose a reason for hiding this comment

Uh oh!

MaxKellermann commented Dec 7, 2024

Uh oh!

rzarzynski commented Dec 7, 2024

Uh oh!

MaxKellermann commented Dec 7, 2024

Uh oh!

MaxKellermann commented Dec 9, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants