-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Error on JSON import #19719
Description
Describe the bug
When importing a jsonlines file (gzip-compressed around 2GB), I always get an error message indicating some issues regarding memory allocation. The machine has 512GB RAM and most of it is not used, therefore, it could be related with the configuration or just simply a software bug. I am using the default config, but did not change it since a year (has something changed in the default values that could be relevant here?). Either way, I am not sure how to fix it. I am using the most recent version (Client & Server 21.1.2.15) and I can reproduce the error. However, due to copyright issues, I cannot share the original dataset.
I import the file as follows
zcat dataset.jsonl.gz|clickhouse-client --input_format_skip_unknown_fields=1 --input_format_allow_errors_num=1000 -q "INSERT INTO mydataset.mytable FORMAT JSONEachRow"Error message and/or stacktrace
And after a while, I always receive the following error message:
Code: 49, e.displayText() = DB::Exception: Too large size (18446744071562077831) passed to allocator. It indicates an error., Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception<unsigned long&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long&) @ 0x864dfc7 in /usr/bin/clickhouse
1. Allocator<false, false>::checkSize(unsigned long) @ 0x864dd7e in /usr/bin/clickhouse
2. Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x865c784 in /usr/bin/clickhouse
3. DB::loadAtPosition(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, char*&) @ 0x865bbfa in /usr/bin/clickhouse
4. DB::fileSegmentationEngineJSONEachRowImpl(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long) @ 0xf98a612 in /usr/bin/clickhouse
5. DB::ParallelParsingInputFormat::segmentatorThreadFunction(std::__1::shared_ptr<DB::ThreadGroupStatus>) @ 0xf9aed84 in /usr/bin/clickhouse
6. ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelParsingInputFormat::*)(std::__1::shared_ptr<DB::ThreadGroupStatus>), DB::ParallelParsingInputFormat*, std::__1::shared_ptr<DB::ThreadGroupStatus> >(void (DB::ParallelParsingInputFormat::*&&)(std::__1::shared_ptr<DB::ThreadGroupStatus>), DB::ParallelParsingInputFormat*&&, std::__1::shared_ptr<DB::ThreadGroupStatus>&&)::'lambda'()::operator()() @ 0xf8a9677 in /usr/bin/clickhouse
7. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x86415ed in /usr/bin/clickhouse
8. ? @ 0x86451a3 in /usr/bin/clickhouse
9. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
10. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
(version 21.1.2.15 (official build))
Code: 49, e.displayText() = DB::Exception: Too large size (18446744071562077831) passed to allocator. It indicates an error., Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception<unsigned long&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long&) @ 0x864dfc7 in /usr/bin/clickhouse
1. Allocator<false, false>::checkSize(unsigned long) @ 0x864dd7e in /usr/bin/clickhouse
2. Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x865c784 in /usr/bin/clickhouse
3. DB::loadAtPosition(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, char*&) @ 0x865bbfa in /usr/bin/clickhouse
4. DB::fileSegmentationEngineJSONEachRowImpl(DB::ReadBuffer&, DB::Memory<Allocator<false, false> >&, unsigned long) @ 0xf98a612 in /usr/bin/clickhouse
5. DB::ParallelParsingInputFormat::segmentatorThreadFunction(std::__1::shared_ptr<DB::ThreadGroupStatus>) @ 0xf9aed84 in /usr/bin/clickhouse
6. ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelParsingInputFormat::*)(std::__1::shared_ptr<DB::ThreadGroupStatus>), DB::ParallelParsingInputFormat*, std::__1::shared_ptr<DB::ThreadGroupStatus> >(void (DB::ParallelParsingInputFormat::*&&)(std::__1::shared_ptr<DB::ThreadGroupStatus>), DB::ParallelParsingInputFormat*&&, std::__1::shared_ptr<DB::ThreadGroupStatus>&&)::'lambda'()::operator()() @ 0xf8a9677 in /usr/bin/clickhouse
7. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x86415ed in /usr/bin/clickhouse
8. ? @ 0x86451a3 in /usr/bin/clickhouse
9. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
10. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
(version 21.1.2.15 (official build))
Code: 49. DB::Exception: Too large size (18446744071562077831) passed to allocator. It indicates an error.: data for INSERT was parsed from stdin