Skip to content

Fix abnormal server terminations due to write failures#20465

Merged
alexey-milovidov merged 2 commits intoClickHouse:masterfrom
azat:write-abnormal-server-termination-fixes
Feb 14, 2021
Merged

Fix abnormal server terminations due to write failures#20465
alexey-milovidov merged 2 commits intoClickHouse:masterfrom
azat:write-abnormal-server-termination-fixes

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Feb 13, 2021

Maybe there will be more patches in this PR...

Changelog category (leave one):

  • Not for changelog

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix abnormal server terminations due to write failures

Fixes: #19451 (please add the no-backport label, since it has not bee included into any release yet)

@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Feb 13, 2021
@azat azat force-pushed the write-abnormal-server-termination-fixes branch from 37060fd to 48791e2 Compare February 13, 2021 10:17
@azat azat force-pushed the write-abnormal-server-termination-fixes branch from 48791e2 to 06e8065 Compare February 13, 2021 10:20
@amosbird
Copy link
Copy Markdown
Collaborator

Perhaps we should just ignore all errors in dtor as the same way the stdlib does it. If the user cares about errors during close, they must call close() themselves. Otherwise, the dtor calls close and ignores all errors from the call.

@azat
Copy link
Copy Markdown
Member Author

azat commented Feb 13, 2021

Perhaps we should just ignore all errors in dtor as the same way the stdlib does it. If the user cares about errors during close, they must call close() themselves. Otherwise, the dtor calls close and ignores all errors from the call.

That was previous behavior, but it does not looks safe, since "user" (developer) can forget about calling one of finalize/sync/next and the error will be simply ignored (and not only forgotten but also if something will be thrown in between), w/o any error provided to user.

@amosbird
Copy link
Copy Markdown
Collaborator

Perhaps we should just ignore all errors in dtor as the same way the stdlib does it. If the user cares about errors during close, they must call close() themselves. Otherwise, the dtor calls close and ignores all errors from the call.

That was previous behavior, but it does not looks safe, since "user" (developer) can forget about calling one of finalize/sync/next and the error will be simply ignored (and not only forgotten but also if something will be thrown in between), w/o any error provided to user.

But we should not throw in dtor anyway.

@amosbird
Copy link
Copy Markdown
Collaborator

amosbird commented Feb 14, 2021

Destructors are noexcept by default. We can annotate it with noexcept(false) and check if there is any std::uncaught_exception on the fly. If yes, we log the exception. If no, we throw.

@azat
Copy link
Copy Markdown
Member Author

azat commented Feb 14, 2021

Destructors are noexcept by default.

Right

We can annotate it with noexcept(false) and check if there is any std::uncaught_exception on the fly. If yes, we log the exception. If no, we throw.

I tried to do this, but the problem is - there is too much classes that should be changed (including poco).

And actually to me it is better to terminate the server if the write failed instead of writing error the log, someone may miss this and the data will be lost.
My initial plan was to do this and make all the callers safe to avoid abnormal server terminations.

Also note, that this patches are still make sense by themselves (regardless #19451)

@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Feb 14, 2021

We should not make destructors noexcept(false).
std::terminate in destructors is Ok.

@alexey-milovidov
Copy link
Copy Markdown
Member

00992_system_parts_race_condition_zookeeper

Broken after merging NuRaft.

@alexey-milovidov
Copy link
Copy Markdown
Member

AST fuzzer (TSan) — Lost connection to server. See the logs.

OOM.

@alexey-milovidov alexey-milovidov self-assigned this Feb 14, 2021
@robot-clickhouse robot-clickhouse added pr-not-for-changelog This PR should not be mentioned in the changelog and removed pr-bugfix Pull request with bugfix, not backported by default labels Feb 14, 2021
Poco::File(new_tmp_part_path_str + "checksums.txt").setWriteable();
WriteBufferFromFile checksums_out(new_tmp_part_path_str + "checksums.txt", 4096);
checksums.write(checksums_out);
checksums.close();
Copy link
Copy Markdown
Contributor

@baibaichen baibaichen Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@azat

MergeTreeDataPartChecksums doesn't define close method, do you mean checksums_in.close() or checksums_out.close() ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, my bad, can you send a patch? (Interesting, there is no splitted check on CI, hence this wasn't captured, since ENABLE_UTILS is set only for splitted)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@azat azat deleted the write-abnormal-server-termination-fixes branch February 15, 2021 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-no-backport pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants