Try fix flaky tests with transactions#37822
Conversation
This reverts commit 049d2a0.
|
@Mergifyio update |
✅ Branch has been successfully updated |
4868b96 to
1d5f43a
Compare
1d5f43a to
e4d4d0a
Compare
|
Integration tests - No space left on device But let's run tests again just in case |
|
@Mergifyio update |
✅ Branch has been successfully updated |
|
Integration tests (asan, actions) [1/3] - No space left on device Stateless tests (aarch64, actions) - 02232_dist_insert_send_logs_level_hung - #38219 Stateless tests flaky check (address, actions) - OK Stress test (debug, actions) and Stress test (thread, actions): Some inconsistency in docker images (race condition in CI, cc: @Felixoid) |
|
@Mergifyio update |
✅ Branch has been successfully updated |
| assert((expected) == uncaught || (expected) + 1 == uncaught); \ | ||
| if ((expected) < uncaught) \ | ||
| { \ | ||
| tryLogCurrentException("NOEXCEPT_SCOPE"); \ |
There was a problem hiding this comment.
This will not work, you cannot rethrow exception outside the catch block, this will simply terminate (via std::terminate) the program. So this NOEXCEPT_SCOPE macro will simply call std::terminate on exception and will lost original exception.
I was thinking about this a little, and the only thing that pops up in my mind is setting some flag here and check it in Exception ctor and if it set, call abort/std::terminate, that way you will not loose it, however it looks not great.
There was a problem hiding this comment.
You're right, it does not work as expected and seems like it's there's no way to get exception that is currently unwinding stack (even using __cxa_* functions or something from libunwind...). Not sure what options is better: just remove this like or add thread-local flag and check it when throwing an exception. Maybe it's ok to simply remove it for now and think about logging if we will face some issue that we will not be able to debug without information about unexpected exception...
There was a problem hiding this comment.
I have another idea on how NOEXCEPT_SCOPE can be implemented, by accepting the whole code that should be executed inside, like SCOPE_EXIT, and this will also make it more explicit then simply defining in some scope.
What do you think about this? In the meantime I will prepare a patch and see how it will looks like.
| } | ||
| } | ||
|
|
||
| NOEXCEPT_SCOPE; |
There was a problem hiding this comment.
Can you please explain why it is required? And looks like it is better to add a comment for users of NOEXCEPT_SCOPE, thoughts?
There was a problem hiding this comment.
See the comment:
ClickHouse/src/Common/noexcept_scope.h
Lines 30 to 33 in 39c0219
It's required in MergeTreeData::Transaction::commit because changes to precommitted parts and MergeTreeData state (like set of parts with their states, indexes and column sizes) are not undone on exceptions. Normally this code should not throw any exceptions (unless maybe MEMORY_LIMIT_EXCEEDED, but exceptions from MemoryTracker are blocked), but in case it does (due to some bug or whatever) I would prefer to terminate immediately.
You can think of this macro as kind of paranoid defensive programming :)
There was a problem hiding this comment.
Yep, I read that comment, and also came to conclusion that it is for counters accounting.
However I wasn't came to this PR randomly, I got some exception here, but due to NOEXCEPT_SCOPE the exception was not printed so I cannot know the exact reason, and zero information in logs. I've tried to get something from coredump but failed.
I've added a patch on top my build that will not loose the exception, however no exceptions occurred so far. I have one guess about exception there, that time zookeeper was under pressure, but this has been fixed before deploying the patch, and I'm not talking about some exception for zookeeper commands but instead some issues with parts maybe (some LOGICAL_ERROR).
|
|
||
| NOEXCEPT_SCOPE; | ||
| LockMemoryExceptionInThread lock_memory_tracker(VariableContext::Global); | ||
| NOEXCEPT_SCOPE_STRICT; |
There was a problem hiding this comment.
I see the difference between them, but can you explain why strict variant is used here?
There was a problem hiding this comment.
Because initially we had only strict variant and it was the only place where this macro was used. But strict variant is not generic enough to use it in other places. I decided to leave strict variant here because something is probably going wrong if some code tries to initialize transaction log during stack unwinding.
Changelog category (leave one):
Previous PR #36644 fixed tests, but it triggers a bug in bash...
Let's try to simply send SIGTERM without any tricks
Also fixes #37920