Error on JSON import #19719

@inkrement

Describe the bug
When importing a jsonlines file (gzip-compressed, around 2 GB), I always get an error message indicating a problem with memory allocation. The machine has 512 GB of RAM and most of it is unused, so this could be related to the configuration or simply a software bug. I am using the default config and have not changed it in about a year (has anything changed in the default values that could be relevant here?). Either way, I am not sure how to fix it. I am using the most recent version (client and server 21.1.2.15) and I can reproduce the error. However, due to copyright issues, I cannot share the original dataset.

I import the file as follows:

zcat dataset.jsonl.gz|clickhouse-client --input_format_skip_unknown_fields=1 --input_format_allow_errors_num=1000 -q "INSERT INTO mydataset.mytable FORMAT JSONEachRow"
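Since I cannot share the original dataset, here is a minimal sketch of how a synthetic file of roughly the same shape could be generated to check whether the error depends on the data itself; the field names and row count below are made up for illustration and are not my actual schema:

```python
# Sketch: write a gzip-compressed JSONEachRow file with made-up fields,
# to be piped into clickhouse-client the same way as the command above
# (against a table with matching columns).
import gzip
import json
import random
import string

with gzip.open("synthetic.jsonl.gz", "wt", encoding="utf-8") as f:
    for i in range(10_000_000):
        row = {
            "id": i,
            "text": "".join(random.choices(string.ascii_letters, k=200)),
            "value": random.random(),
        }
        f.write(json.dumps(row) + "\n")
```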

Error message and/or stacktrace
After a while, I always receive the following error message:

Code: 49, e.displayText() = DB::Exception: Too large size (18446744071562077831) passed to allocator. It indicates an error., Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception<unsigned long&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long&) @ 0x864dfc7 in /usr/bin/clickhouse
1. Allocator<false, false>::checkSize(unsigned long) @ 0x864dd7e in /usr/bin/clickhouse
2. Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x865c784 in /usr/bin/clickhouse
3. DB::loadAtPosition(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, char*&) @ 0x865bbfa in /usr/bin/clickhouse
4. DB::fileSegmentationEngineJSONEachRowImpl(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long) @ 0xf98a612 in /usr/bin/clickhouse
5. DB::ParallelParsingInputFormat::segmentatorThreadFunction(std::__1::shared_ptr<DB::ThreadGroupStatus>) @ 0xf9aed84 in /usr/bin/clickhouse
6. ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelParsingInputFormat::*)(std::__1::shared_ptr<DB::ThreadGroupStatus>), DB::ParallelParsingInputFormat*, std::__1::shared_ptr<DB::ThreadGroupStatus> >(void (DB::ParallelParsingInputFormat::*&&)(std::__1::shared_ptr<DB::ThreadGroupStatus>), DB::ParallelParsingInputFormat*&&, std::__1::shared_ptr<DB::ThreadGroupStatus>&&)::'lambda'()::operator()() @ 0xf8a9677 in /usr/bin/clickhouse
7. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x86415ed in /usr/bin/clickhouse
8. ? @ 0x86451a3 in /usr/bin/clickhouse
9. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
10. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
 (version 21.1.2.15 (official build))
Code: 49. DB::Exception: Too large size (18446744071562077831) passed to allocator. It indicates an error.: data for INSERT was parsed from stdin
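As an aside, the size reported in the exception looks like a negative value reinterpreted as an unsigned 64-bit integer, sitting just above -2^31. This is only an observation on the number itself, not a diagnosis:

```python
# Reinterpret the reported allocation size as a signed 64-bit value.
size = 18446744071562077831
print(size - 2**64)   # -2147473785
print(-(2**31))       # -2147483648, for comparison (just below the value above)
```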

Labels

bug: Confirmed user-visible misbehaviour in official releases
st-need-info: We need extra data to continue (waiting for response). Either some details or a repro of the issue.
