Skip to content

Write current batch for distributed send atomically (using .tmp + rename)#7600

Merged
alexey-milovidov merged 1 commit intoClickHouse:masterfrom
azat:DirectoryMonitor-current_batch.txt-corruption
Nov 5, 2019
Merged

Write current batch for distributed send atomically (using .tmp + rename)#7600
alexey-milovidov merged 1 commit intoClickHouse:masterfrom
azat:DirectoryMonitor-current_batch.txt-corruption

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Nov 3, 2019

Category (leave one):

  • Bug Fix

Short description (up to few sentences):
Write current batch for distributed send atomically (using .tmp + rename)

Detailed description (optional):

Otherwise the following can happen after reboot:

    2019.11.01 11:46:12.217143 [ 187 ] {} <Error> dist.Distributed.DirectoryMonitor: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected \n before: S\'^A\0^]\0\0<BE>4^A\0r<87>\0\0<A2><D7>^D^Y\0<F2>{^E<CD>\0\0Hy\0\0<F2>^_^C\0^_&\0\0<FF><D3>\0\0
    <8D><91>\0\0<C0>9\0\0<C0><B0>^A\0^G<AA>\0\0<B5><FE>^A\0<BF><A7>^A\0<9B><CB>^A\0I^R^A\0<B7><AB>^A\0<BC><8F>\0\0˲^B\0Zy\0\0<94><AA>\0\0<98>
    <8F>\0\0\f<A5>\0\0^QN\0\0<E3><C6>\0\0<B1>6^B\0ɳ\0\0W<99>\0\0<B9><A2>\0\0:<BB>\0\0)<B1>\0\0#<8B>\0\0aW\0\0<ED>#\0\0<F1>@\0\0ˀ^B\0<D7><FC>\0\0<DF>, Stack trace:

    0. 0x559e27222e60 StackTrace::StackTrace() /usr/bin/clickhouse
    1. 0x559e27222c45 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
    2. 0x559e26de4473 ? /usr/bin/clickhouse
    3. 0x559e272494b5 DB::assertString(char const*, DB::ReadBuffer&) /usr/bin/clickhouse
    4. 0x559e2a5dab45 DB::StorageDistributedDirectoryMonitor::processFilesWithBatching(std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) /usr/bin/clickhouse
    5. 0x559e2a5db5fa DB::StorageDistributedDirectoryMonitor::processFiles() /usr/bin/clickhouse
    6. 0x559e2a5dba78 DB::StorageDistributedDirectoryMonitor::run() /usr/bin/clickhouse
    7. 0x559e2a5ddbbc ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::StorageDistributedDirectoryMonitor::*)(), DB::StorageDistributedDirectoryMonitor*>(void (DB::StorageDistributedDirectoryMonitor::*&&)(), DB::StorageDistributedDirectoryMonitor*&&)::{lambda()#1}::operator()() const /usr/bin/clickhouse
    8. 0x559e2726b07c ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/bin/clickhouse
    9. 0x559e2bbc3640 ? /usr/bin/clickhouse
    10. 0x7fbd62b3cfb7 start_thread /lib/x86_64-linux-gnu/libpthread-2.29.so
    11. 0x7fbd62a692ef __clone /lib/x86_64-linux-gnu/libc-2.29.so
     (version 19.17.1.1)

@azat azat force-pushed the DirectoryMonitor-current_batch.txt-corruption branch from 8642371 to 500d968 Compare November 3, 2019 19:26
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if the file already exists? Need a comment that it will truncate the existing file.
Also need a comment about what will hapen with obsolete .tmp files (nothing and user should clean manually).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need a comment about what will hapen with obsolete .tmp files (nothing and user should clean manually).

Should user care about these .tmp files, if they will be simply overwritten?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should user care about these .tmp files, if they will be simply overwritten?

No. But as far as I remember, we use auto-increment, so the next file will have different name.

Copy link
Copy Markdown
Member Author

@azat azat Nov 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, don't see the auto-increment bit.

@alexey-milovidov
Copy link
Copy Markdown
Member

ClickHouse build check

Will be fixed after you merge with master.

Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@azat azat force-pushed the DirectoryMonitor-current_batch.txt-corruption branch from 500d968 to f9f0edf Compare November 4, 2019 20:50
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is a race condition. Truncate (the default) is better.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simply to put a message into the log

…ame)

Otherwise the following can happen after reboot:

    2019.11.01 11:46:12.217143 [ 187 ] {} <Error> dist.Distributed.DirectoryMonitor: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected \n before: S\'^A\0^]\0\0<BE>4^A\0r<87>\0\0<A2><D7>^D^Y\0<F2>{^E<CD>\0\0Hy\0\0<F2>^_^C\0^_&\0\0<FF><D3>\0\0
    <8D><91>\0\0<C0>9\0\0<C0><B0>^A\0^G<AA>\0\0<B5><FE>^A\0<BF><A7>^A\0<9B><CB>^A\0I^R^A\0<B7><AB>^A\0<BC><8F>\0\0˲^B\0Zy\0\0<94><AA>\0\0<98>
    <8F>\0\0\f<A5>\0\0^QN\0\0<E3><C6>\0\0<B1>6^B\0ɳ\0\0W<99>\0\0<B9><A2>\0\0:<BB>\0\0)<B1>\0\0#<8B>\0\0aW\0\0<ED>#\0\0<F1>@\0\0ˀ^B\0<D7><FC>\0\0<DF>, Stack trace:

    0. 0x559e27222e60 StackTrace::StackTrace() /usr/bin/clickhouse
    1. 0x559e27222c45 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
    2. 0x559e26de4473 ? /usr/bin/clickhouse
    3. 0x559e272494b5 DB::assertString(char const*, DB::ReadBuffer&) /usr/bin/clickhouse
    4. 0x559e2a5dab45 DB::StorageDistributedDirectoryMonitor::processFilesWithBatching(std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) /usr/bin/clickhouse
    5. 0x559e2a5db5fa DB::StorageDistributedDirectoryMonitor::processFiles() /usr/bin/clickhouse
    6. 0x559e2a5dba78 DB::StorageDistributedDirectoryMonitor::run() /usr/bin/clickhouse
    7. 0x559e2a5ddbbc ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::StorageDistributedDirectoryMonitor::*)(), DB::StorageDistributedDirectoryMonitor*>(void (DB::StorageDistributedDirectoryMonitor::*&&)(), DB::StorageDistributedDirectoryMonitor*&&)::{lambda()ClickHouse#1}::operator()() const /usr/bin/clickhouse
    8. 0x559e2726b07c ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/bin/clickhouse
    9. 0x559e2bbc3640 ? /usr/bin/clickhouse
    10. 0x7fbd62b3cfb7 start_thread /lib/x86_64-linux-gnu/libpthread-2.29.so
    11. 0x7fbd62a692ef __clone /lib/x86_64-linux-gnu/libc-2.29.so
     (version 19.17.1.1)

v2: remove fsync, to avoid possible stalls (ClickHouse#7600 (comment))
@azat azat force-pushed the DirectoryMonitor-current_batch.txt-corruption branch from f9f0edf to 4dfffdd Compare November 4, 2019 21:24
@alexey-milovidov alexey-milovidov merged commit 1bfade5 into ClickHouse:master Nov 5, 2019
alexey-milovidov added a commit that referenced this pull request Nov 5, 2019
@azat azat deleted the DirectoryMonitor-current_batch.txt-corruption branch November 5, 2019 20:56
@alesapin alesapin added pr-bugfix Pull request with bugfix, not backported by default v19.16 labels Nov 8, 2019
alesapin pushed a commit that referenced this pull request Nov 8, 2019
…ame)

Otherwise the following can happen after reboot:

    2019.11.01 11:46:12.217143 [ 187 ] {} <Error> dist.Distributed.DirectoryMonitor: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected \n before: S\'^A\0^]\0\0<BE>4^A\0r<87>\0\0<A2><D7>^D^Y\0<F2>{^E<CD>\0\0Hy\0\0<F2>^_^C\0^_&\0\0<FF><D3>\0\0
    <8D><91>\0\0<C0>9\0\0<C0><B0>^A\0^G<AA>\0\0<B5><FE>^A\0<BF><A7>^A\0<9B><CB>^A\0I^R^A\0<B7><AB>^A\0<BC><8F>\0\0˲^B\0Zy\0\0<94><AA>\0\0<98>
    <8F>\0\0\f<A5>\0\0^QN\0\0<E3><C6>\0\0<B1>6^B\0ɳ\0\0W<99>\0\0<B9><A2>\0\0:<BB>\0\0)<B1>\0\0#<8B>\0\0aW\0\0<ED>#\0\0<F1>@\0\0ˀ^B\0<D7><FC>\0\0<DF>, Stack trace:

    0. 0x559e27222e60 StackTrace::StackTrace() /usr/bin/clickhouse
    1. 0x559e27222c45 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
    2. 0x559e26de4473 ? /usr/bin/clickhouse
    3. 0x559e272494b5 DB::assertString(char const*, DB::ReadBuffer&) /usr/bin/clickhouse
    4. 0x559e2a5dab45 DB::StorageDistributedDirectoryMonitor::processFilesWithBatching(std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) /usr/bin/clickhouse
    5. 0x559e2a5db5fa DB::StorageDistributedDirectoryMonitor::processFiles() /usr/bin/clickhouse
    6. 0x559e2a5dba78 DB::StorageDistributedDirectoryMonitor::run() /usr/bin/clickhouse
    7. 0x559e2a5ddbbc ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::StorageDistributedDirectoryMonitor::*)(), DB::StorageDistributedDirectoryMonitor*>(void (DB::StorageDistributedDirectoryMonitor::*&&)(), DB::StorageDistributedDirectoryMonitor*&&)::{lambda()#1}::operator()() const /usr/bin/clickhouse
    8. 0x559e2726b07c ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/bin/clickhouse
    9. 0x559e2bbc3640 ? /usr/bin/clickhouse
    10. 0x7fbd62b3cfb7 start_thread /lib/x86_64-linux-gnu/libpthread-2.29.so
    11. 0x7fbd62a692ef __clone /lib/x86_64-linux-gnu/libc-2.29.so
     (version 19.17.1.1)

v2: remove fsync, to avoid possible stalls (#7600 (comment))
alesapin pushed a commit that referenced this pull request Nov 11, 2019
…ame)

Otherwise the following can happen after reboot:

    2019.11.01 11:46:12.217143 [ 187 ] {} <Error> dist.Distributed.DirectoryMonitor: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected \n before: S\'^A\0^]\0\0<BE>4^A\0r<87>\0\0<A2><D7>^D^Y\0<F2>{^E<CD>\0\0Hy\0\0<F2>^_^C\0^_&\0\0<FF><D3>\0\0
    <8D><91>\0\0<C0>9\0\0<C0><B0>^A\0^G<AA>\0\0<B5><FE>^A\0<BF><A7>^A\0<9B><CB>^A\0I^R^A\0<B7><AB>^A\0<BC><8F>\0\0˲^B\0Zy\0\0<94><AA>\0\0<98>
    <8F>\0\0\f<A5>\0\0^QN\0\0<E3><C6>\0\0<B1>6^B\0ɳ\0\0W<99>\0\0<B9><A2>\0\0:<BB>\0\0)<B1>\0\0#<8B>\0\0aW\0\0<ED>#\0\0<F1>@\0\0ˀ^B\0<D7><FC>\0\0<DF>, Stack trace:

    0. 0x559e27222e60 StackTrace::StackTrace() /usr/bin/clickhouse
    1. 0x559e27222c45 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
    2. 0x559e26de4473 ? /usr/bin/clickhouse
    3. 0x559e272494b5 DB::assertString(char const*, DB::ReadBuffer&) /usr/bin/clickhouse
    4. 0x559e2a5dab45 DB::StorageDistributedDirectoryMonitor::processFilesWithBatching(std::map<unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) /usr/bin/clickhouse
    5. 0x559e2a5db5fa DB::StorageDistributedDirectoryMonitor::processFiles() /usr/bin/clickhouse
    6. 0x559e2a5dba78 DB::StorageDistributedDirectoryMonitor::run() /usr/bin/clickhouse
    7. 0x559e2a5ddbbc ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::StorageDistributedDirectoryMonitor::*)(), DB::StorageDistributedDirectoryMonitor*>(void (DB::StorageDistributedDirectoryMonitor::*&&)(), DB::StorageDistributedDirectoryMonitor*&&)::{lambda()#1}::operator()() const /usr/bin/clickhouse
    8. 0x559e2726b07c ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/bin/clickhouse
    9. 0x559e2bbc3640 ? /usr/bin/clickhouse
    10. 0x7fbd62b3cfb7 start_thread /lib/x86_64-linux-gnu/libpthread-2.29.so
    11. 0x7fbd62a692ef __clone /lib/x86_64-linux-gnu/libc-2.29.so
     (version 19.17.1.1)

v2: remove fsync, to avoid possible stalls (#7600 (comment))
CurtizJ pushed a commit that referenced this pull request Nov 15, 2019
…-corruption

Write current batch for distributed send atomically (using .tmp + rename)

(cherry picked from commit 1bfade5)
@CurtizJ CurtizJ added the v19.14 label Nov 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants