Skip to content

Comments

include/ceph_assert: disable with NDEBUG#60160

Closed
MaxKellermann wants to merge 1 commit intoceph:mainfrom
MaxKellermann:assert_ndebug
Closed

include/ceph_assert: disable with NDEBUG#60160
MaxKellermann wants to merge 1 commit intoceph:mainfrom
MaxKellermann:assert_ndebug

Conversation

@MaxKellermann
Copy link
Member

This makes ceph_assert() behave like the standard C assert() macro: in release builds (with -DNDEBUG), assertions should be compiled out, leaving no overhead.

This reduces the code size of ceph-mds (static build including libceph-common) by 550 kB.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@MaxKellermann
Copy link
Member Author

/home/jenkins-build/build/workspace/ceph-api/src/librados/librados_asio.h:193:10: error: no matching function for call to 'async_initiate'
  return boost::asio::async_initiate<CompletionToken, Signature>(
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

"ceph API tests" failure apparently not related to this PR.

This makes ceph_assert() behave like the standard C assert() macro: in
release builds (with `-DNDEBUG`), assertions should be compiled out,
leaving no overhead.

This reduces the code size of `ceph-mds` (static build including
`libceph-common`) by 550 kB.

Signed-off-by: Max Kellermann <[email protected]>

#ifdef __SANITIZE_ADDRESS__
#ifdef NDEBUG
#define ceph_assert(expr) do {} while (false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge change. Many years ago there was an idea to replace the important occurrences of ceph_assert() with ceph_assert_always() and compile-out everything else for production builds. The problem is of course in judging what's essential across the entire code-base.

If you want to resurrect the effort, I would propose to bring the topic to next Ceph Development Montly meeting.

How much of CPU overhead does this squeeze?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I havn't isolated this patch in my measurements, but I think it's a small piece. Most users can probably afford it, but we at IONOS have serious performance problems with the MDS, so we want to squeeze out every bit we can (and we will leave patches that are not accepted upstream in our private fork).
I do however not get the point of ceph_assert_always() - by design, assertions are only for detecting bugs in debug builds - is this for people not confident enough in their code, and they still think they need guaranteed assertions in release builds? Why not regular runtime checks? (I'm not a fan of redundant runtime checks either. Get the code right, verify it with fine-grained unit tests, and remove all the unnecessary checks at release-runtime.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I havn't isolated this patch in my measurements, but I think it's a small piece.

I share this intuition on the CPU overhead.
I don't see benefit in optimizing the size of the executable as most of the asserts' code already is (IIRC; "should be" for sure) in cold sections, so they shouldn't inflict a cost even in terms of memory but just address space.

I do however not get the point of ceph_assert_always()

Yeah, the effort to differentiate the asserts is long dead. Perhaps we should say this explicitly and switch everybody to ceph_assert().

by design, assertions are only for detecting bugs in debug builds - is this for people not confident enough in their code, and they still think they need guaranteed assertions in release builds?

Hah, I don't think a programmer should be confident enough till having a mathematical proof of correctness.

Why not regular runtime checks? (I'm not a fan of redundant runtime checks either.

What do you mean by regular run-time checks? Sanitizers?

Get the code right, verify it with fine-grained unit tests, and remove all the unnecessary checks at release-runtime.)

These assertions saved me so much investigation time in the wild that I really appreciate their presence in productions builds. As there is no clear evidence of high cost, I'm for rejecting the idea. Yet, if you think you have strong enough arguments, I'm happy to discuss at CDM (see https://tracker.ceph.com/projects/ceph/wiki/Planning).

@MaxKellermann
Copy link
Member Author

@rzarzynski, how's the discussion of this PR going? Yay or any? Merge or reject? Something else, like add another compile-time option?

@rzarzynski
Copy link
Contributor

how's the discussion of this PR going? Yay or any? Merge or reject? Something else, like add another compile-time option?

Responded in comment above.

@MaxKellermann
Copy link
Member Author

Okay then, I'll keep this one in our private fork.

@MaxKellermann MaxKellermann deleted the assert_ndebug branch December 7, 2024 16:27
@MaxKellermann
Copy link
Member Author

How much of CPU overhead does this squeeze?

I was curious and measured this; a fio run without this change, measuring perf stat -p $(pidof ceph-mds):

          61226.41 msec task-clock                       #    1.508 CPUs utilized             
            842964      context-switches                 #   13.768 K/sec                     
               246      cpu-migrations                   #    4.018 /sec                      
            210994      page-faults                      #    3.446 K/sec                     
       86418062722      cycles                           #    1.411 GHz                       
       17704534501      stalled-cycles-frontend          #   20.49% frontend cycles idle      
       65247819356      instructions                     #    0.76  insn per cycle            
                                                  #    0.27  stalled cycles per insn   
       12910645552      branches                         #  210.867 M/sec                     
         523527782      branch-misses                    #    4.06% of all branches           

With this change:

          57339.55 msec task-clock                       #    1.514 CPUs utilized             
            815840      context-switches                 #   14.228 K/sec                     
               306      cpu-migrations                   #    5.337 /sec                      
            199497      page-faults                      #    3.479 K/sec                     
       83150952383      cycles                           #    1.450 GHz                       
       16366759028      stalled-cycles-frontend          #   19.68% frontend cycles idle      
       62597488888      instructions                     #    0.75  insn per cycle            
                                                  #    0.26  stalled cycles per insn   
       12316935143      branches                         #  214.807 M/sec                     
         490099136      branch-misses                    #    3.98% of all branches           

That's 6-7% less CPU usage (task-clock: 57339/61226 = 0.9365). It's much more than I thought.

I agree that having assertions enabled is often very useful (my own software runs with assertions enabled in production), but that's an explicit choice at compile time (by not specifying -DNDEBUG, as documented in the C++ standard). Ceph however deviates from the standard and does not grant users this choice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants