
Fix use-after-free for Map combinator that leads to incorrect result #38748

Merged
alexey-milovidov merged 2 commits into ClickHouse:master from azat:Map-combinator-fix
Jul 3, 2022
Conversation

@azat (Member) commented Jul 3, 2022

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix use-after-free for Map combinator that leads to incorrect result

This use-after-free can be reproduced with distributed queries.

Also note that this is not sumMappedArray() and friends (which were previously called sumMap()) but the Map combinator (cc @alexey-milovidov).

You will find the ASan report in the details below.

Details
READ of size 8 at 0x62d00012d218 thread T186 (QueryPipelineEx)
2022.07.03 05:09:40.000234 [ 31956 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 1.23 GiB, peak 1.23 GiB, will set to 1.25 GiB (RSS), difference: 19.51 MiB
2022.07.03 05:09:41.000137 [ 31956 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 1.25 GiB, peak 1.25 GiB, will set to 1.26 GiB (RSS), difference: 3.76 MiB
    #0 0x1233a0d8 in DB::AggregateFunctionSumData<>::get() const build_docker/../src/AggregateFunctions/AggregateFunctionSum.h:245:16
    #1 0x1233a0d8 in DB::AggregateFunctionSum<>::insertResultInto(char*, DB::IColumn&, DB::Arena*) const build_docker/../src/AggregateFunctions/AggregateFunctionSum.h:536:70
    #2 0x1470f910 in DB::AggregateFunctionMap<char8_t>::insertResultInto() const build_docker/../src/AggregateFunctions/AggregateFunctionMap.h:236:26
    #3 0x147110ce in DB::IAggregateFunctionHelper<>::insertResultIntoBatch() const build_docker/../src/AggregateFunctions/IAggregateFunction.h:618:53
    #4 0x2c4269d7 in void DB::Aggregator::convertToBlockImplFinal<>() const build_docker/../src/Interpreters/Aggregator.cpp:1878:49
    #5 0x2c403b9f in void DB::Aggregator::convertToBlockImpl<>() const build_docker/../src/Interpreters/Aggregator.cpp:1714:13
    #6 0x2be09b53 in DB::Aggregator::prepareBlockAndFillSingleLevel() const::$_2::operator()() const build_docker/../src/Interpreters/Aggregator.cpp:2144:9
    #7 0x2be09b53 in DB::Block DB::Aggregator::prepareBlockAndFill<>() const build_docker/../src/Interpreters/Aggregator.cpp:2000:5
    #8 0x2be09b53 in DB::Aggregator::prepareBlockAndFillSingleLevel() const build_docker/../src/Interpreters/Aggregator.cpp:2150:12
    #9 0x2be37de3 in DB::Aggregator::mergeBlocks() build_docker/../src/Interpreters/Aggregator.cpp:3032:17
    #10 0x308c27f8 in DB::MergingAggregatedBucketTransform::transform() build_docker/../src/Processors/Transforms/MergingAggregatedMemoryEfficientTransform.cpp:360:37

0x62d00012d218 is located 3608 bytes inside of 32768-byte region [0x62d00012c400,0x62d000134400)
freed by thread T186 (QueryPipelineEx) here:
    #0 0xd701312 in free (/work1/azat/tmp/upstream/clickhouse-asan+0xd701312) (BuildId: b7977aef37e9f720)
    ...
    #8 0x2e3c22eb in DB::ColumnAggregateFunction::~ColumnAggregateFunction() build_docker/../src/Columns/ColumnAggregateFunction.cpp:89:1
    ...
    #18 0xd9fcdd4 in std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> >::~vector() build_docker/../contrib/libcxx/include/vector:401:9
    #19 0x2be373f4 in DB::Aggregator::mergeBlocks() build_docker/../contrib/libcxx/include/__memory/unique_ptr.h
    #20 0x308c27f8 in DB::MergingAggregatedBucketTransform::transform() build_docker/../src/Processors/Transforms/MergingAggregatedMemoryEfficientTransform.cpp:360:37

previously allocated by thread T186 (QueryPipelineEx) here:
    #0 0xd7015be in malloc (/work1/azat/tmp/upstream/clickhouse-asan+0xd7015be) (BuildId: b7977aef37e9f720)
    #1 0xd85190a in Allocator<false, false>::allocNoTrack(unsigned long, unsigned long) build_docker/../src/Common/Allocator.h:227:27
    #2 0xd988d45 in Allocator<false, false>::alloc(unsigned long, unsigned long) build_docker/../src/Common/Allocator.h:96:16
    #3 0xd988d45 in DB::Arena::MemoryChunk::MemoryChunk(unsigned long, DB::Arena::MemoryChunk*) build_docker/../src/Common/Arena.h:54:64
    #4 0xd98904b in DB::Arena::addMemoryChunk(unsigned long) build_docker/../src/Common/Arena.h:122:20
    #5 0xec9542c in DB::Arena::alignedAlloc(unsigned long, unsigned long) build_docker/../src/Common/Arena.h:171:13
    #6 0x1470f123 in DB::AggregateFunctionMap<char8_t>::deserialize() const build_docker/../src/AggregateFunctions/AggregateFunctionMap.h:205:35

P.S. Thanks to @den-crane for the reproducer.

Fixes: #35359 (cc @den-crane @dongxiao-yang)

Signed-off-by: Azat Khuzhin <[email protected]>
@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Jul 3, 2022
@azat (Member, Author) commented Jul 3, 2022

Stateless tests (release, s3 storage, actions) — fail: 1, passed: 4179, skipped: 56

Stateless tests (thread, actions) [3/3] — fail: 1, passed: 1387, skipped: 9

  • 01848_http_insert_segfault

Stress test (memory, actions) — Backward compatibility check: Error message in clickhouse-server.log (see bc_check_error_messages.txt)

2022.07.03 14:57:26.259459 [ 417370 ] {} <Error> test_3.tp_2 (a43efe11-bda9-4b90-9f34-5d43e50ea318): Cannot quickly remove directory /var/lib/clickhouse/store/a43/a43efe11-bda9-4b90-9f34-5d43e50ea318/delete_tmp_all_1_1_0_2 by removing files; fallback to recursive removal. Reason: Code: 458. DB::ErrnoException: Cannot unlink file /var/lib/clickhouse/store/a43/a43efe11-bda9-4b90-9f34-5d43e50ea318/delete_tmp_all_1_1_0_2/pp.proj, errno: 21, strerror: Is a directory. (CANNOT_UNLINK) (version 22.7.1.1)

This is a problem with projection removal, but these additional checks will go away in #38717.

Stress test (debug, actions) — Backward compatibility check: Error message in clickhouse-server.log (see bc_check_error_messages.txt)

2022.07.03 15:01:38.377587 [ 249660 ] {fab8ddb3-7bd9-4ff3-974a-c514b3f305bc} <Error> executeQuery: Code: 390. DB::Exception: Table `concurrent_alter_detach_2` doesn't exist. (CANNOT_GET_CREATE_TABLE_QUERY) (version 22.7.1.1) (from [::1]:41342) (comment: 01079_parallel_alter_detach_table_zookeeper.sh) (in query: ATTACH TABLE concurrent_alter_detach_2), Stack trace (when copying this message, always include the lines below):

Same here.

But I forgot to add the no-bc tag.

@azat (Member, Author) commented Jul 3, 2022

Stateless tests (thread, actions) [2/3] — fail: 1, passed: 1393, skipped: 19

  • 01301_aggregate_state_exception_memory_leak

The query just hung for more than 30 seconds (sadly, the query profiler cannot be used in TSan builds, so there is no additional info):

2022.07.03 07:39:14.998865 [ 1304 ] {2a4f99e4-2d18-4c29-8610-2cb18ecf7316} <Trace> Aggregator: Aggregation method: key8
2022.07.03 07:42:36.504913 [ 41919 ] {2a4f99e4-2d18-4c29-8610-2cb18ecf7316} <Error> executeQuery: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 996.90 MiB (attempt to allocate chunk of 6291456 bytes), maximum: 953.67 MiB. OvercommitTracker decision: Memory overcommit isn't used. OvercommitTracker isn't set..: While executing AggregatingTransform. (MEMORY_LIMIT_EXCEEDED) (version 22.7.1.1) (from [::1]:54522) (comment: 01301_aggregate_state_exception_memory_leak.sh) (in query: SELECT uniqExactState(number) FROM system.numbers_mt GROUP BY number % 10 ), Stack trace (when copying this message, always include the lines below):

@alexey-milovidov (Member) commented:

@azat Why is this issue not found automatically by the query fuzzer? Should we improve the query fuzzer then?

@alexey-milovidov (Member) commented:

00974_query_profiler is flaky: queue is full for system log OpenTelemetrySpanLog #38402

Then turn off OpenTelemetry for this test.

@alexey-milovidov alexey-milovidov added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Jul 3, 2022
@alexey-milovidov alexey-milovidov self-assigned this Jul 3, 2022
@alexey-milovidov alexey-milovidov merged commit 1ee752b into ClickHouse:master Jul 3, 2022
@azat (Member, Author) commented Jul 3, 2022

Then turn off OpenTelemetry for this test

Yes, I thought about this, then decided to leave it as-is, since this would only hide the problem.

@azat (Member, Author) commented Jul 3, 2022

@azat Why this issue is not found automatically by the query fuzzer?

It does not have a good reference (expected results) to compare against.

Should we improve query fuzzer then?

Indeed.

robot-clickhouse pushed a commit that referenced this pull request Jul 3, 2022
@azat azat deleted the Map-combinator-fix branch July 4, 2022 05:44
Avogar added a commit that referenced this pull request Jul 4, 2022
Backport #38748 to 22.3: Fix use-after-free for Map combinator that leads to incorrect result
Avogar added a commit that referenced this pull request Jul 4, 2022
Backport #38748 to 22.4: Fix use-after-free for Map combinator that leads to incorrect result
Avogar added a commit that referenced this pull request Jul 4, 2022
Backport #38748 to 22.6: Fix use-after-free for Map combinator that leads to incorrect result
Avogar added a commit that referenced this pull request Jul 4, 2022
Backport #38748 to 22.5: Fix use-after-free for Map combinator that leads to incorrect result
@Felixoid Felixoid added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Jul 14, 2022
Successfully merging this pull request may close these issues.

sumMap values return strangely huge number when query on distributed table with long data range