-
Notifications
You must be signed in to change notification settings - Fork 8.3k
S3 table engine compression doesn't work. #13871
Copy link
Copy link
Closed
Labels
bugConfirmed user-visible misbehaviour in official releaseConfirmed user-visible misbehaviour in official releasecomp-object-storageObject storage connectivity (S3/GCS/Azure) including credentials, retries, multipart, etc.Object storage connectivity (S3/GCS/Azure) including credentials, retries, multipart, etc.
Description
Describe the bug
Clickhouse expect last parameter of S3 table engine to be data format, not compression type.
How to reproduce
Clickhouse version 20.5.4.40
CREATE TABLE test_s3(S_SUPPKEY UInt32, S_NAME String, S_ADDRESS String, S_CITY LowCardinality(String), S_NATION LowCardinality(String), S_REGION LowCardinality(String), S_PHONE String) ENGINE=S3('https://s3.amazonaws.com/{some_bucket_path}.csv.gz','CSV','gzip');
SELECT * FROM test_s3;
Received exception from server (version 20.5.4):
Code: 73. DB::Exception: Received from localhost:9000. DB::Exception: Unknown format gzip.
0 rows in set. Elapsed: 0.001 sec.
Error message and/or stacktrace
Code: 73, e.displayText() = DB::Exception: Unknown format gzip (version 20.5.4.40 (official build)) (from 127.0.0.1:56140) (in query: select * from test_s3;), Stack trace (when copying this message, always include the lines below):
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x11b9acc0 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x9f3e2cd in /usr/bin/clickhouse
2. ? @ 0xf36cd04 in /usr/bin/clickhouse
3. DB::FormatFactory::getInput(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::ReadBuffer&, DB::Block const&, DB::Context const&, unsigned long, std::__1::function<void ()>) const @ 0xf36b0c9 in /usr/bin/clickhouse
4. DB::StorageS3::read(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::SelectQueryInfo const&, DB::Context const&, DB::QueryProcessingStage::Enum, unsigned long, unsigned int) @ 0xefb7520 in /usr/bin/clickhouse
5. DB::ReadFromStorageStep::ReadFromStorageStep(DB::TableStructureReadLockHolder, DB::SelectQueryOptions, std::__1::shared_ptr<DB::IStorage>, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::SelectQueryInfo const&, std::__1::shared_ptr<DB::Context>, DB::QueryProcessingStage::Enum, unsigned long, unsigned long) @ 0xf68f3fd in /usr/bin/clickhouse
6. DB::InterpreterSelectQuery::executeFetchColumns(DB::QueryProcessingStage::Enum, DB::QueryPlan&, std::__1::shared_ptr<DB::PrewhereInfo> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) @ 0xea608c1 in /usr/bin/clickhouse
7. DB::InterpreterSelectQuery::executeImpl(DB::QueryPlan&, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>) @ 0xea647a2 in /usr/bin/clickhouse
8. DB::InterpreterSelectQuery::buildQueryPlan(DB::QueryPlan&) @ 0xea65d54 in /usr/bin/clickhouse
9. DB::InterpreterSelectWithUnionQuery::buildQueryPlan(DB::QueryPlan&) @ 0xebce0f4 in /usr/bin/clickhouse
10. DB::InterpreterSelectWithUnionQuery::execute() @ 0xebce44c in /usr/bin/clickhouse
11. ? @ 0xed3c7ed in /usr/bin/clickhouse
12. DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool) @ 0xed3fe2a in /usr/bin/clickhouse
13. DB::TCPHandler::runImpl() @ 0xf36443c in /usr/bin/clickhouse
14. DB::TCPHandler::run() @ 0xf365190 in /usr/bin/clickhouse
15. Poco::Net::TCPServerConnection::start() @ 0x11ab8aeb in /usr/bin/clickhouse
16. Poco::Net::TCPServerDispatcher::run() @ 0x11ab8f7b in /usr/bin/clickhouse
17. Poco::PooledThread::run() @ 0x11c37aa6 in /usr/bin/clickhouse
18. Poco::ThreadImpl::runnableEntry(void*) @ 0x11c32ea0 in /usr/bin/clickhouse
19. start_thread @ 0x76db in /lib/x86_64-linux-gnu/libpthread-2.27.so
20. clone @ 0x121a3f in /lib/x86_64-linux-gnu/libc-2.27.so
Additional context
Auto detect of compression type from file extension doesnt work either.
CREATE TABLE test_s3(S_SUPPKEY UInt32, S_NAME String, S_ADDRESS String, S_CITY LowCardinality(String), S_NATION LowCardinality(String), S_REGION LowCardinality(String), S_PHONE String) ENGINE=S3('https://s3.amazonaws.com/{some_bucket_path}.csv.gz','CSV');
Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected ',' before: '��\b\bvY!_\0�supplier.csv\0��G��J�����*��\\���R9笝r�Y��5���R���=U�;�3fh6�\b�\0�u���~�������q��i|�l��d}4=.�מ[(F���WS�V���~�����R�\\"���}���8�M}�s�����T�?��Ҽ�}��"}�': (at row 1)
Row 1:
Column 0, name: S_SUPPKEY, type: UInt32, ERROR: text "<0x1F>�<BACKSPACE><BACKSPACE>vY!_<ASCII NUL><0x03>" is not like UInt32
: While executing S3 (version 20.5.4.40 (official build)) (from 127.0.0.1:56140) (in query: select * from test_s3;), Stack trace (when copying this message, always include the lines below):
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x11b9acc0 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x9f3e2cd in /usr/bin/clickhouse
2. ? @ 0x9f808fd in /usr/bin/clickhouse
3. ? @ 0xf45bd5d in /usr/bin/clickhouse
4. DB::CSVRowInputFormat::readRow(std::__1::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension&) @ 0xf45d179 in /usr/bin/clickhouse
5. DB::IRowInputFormat::generate() @ 0xf42cab1 in /usr/bin/clickhouse
6. DB::ISource::work() @ 0xf3aa2bb in /usr/bin/clickhouse
7. DB::InputStreamFromInputFormat::readImpl() @ 0xf36fe8d in /usr/bin/clickhouse
8. DB::IBlockInputStream::read() @ 0xe65fb3d in /usr/bin/clickhouse
9. DB::ParallelParsingBlockInputStream::parserThreadFunction(std::__1::shared_ptr<DB::ThreadGroupStatus>, unsigned long) @ 0xf374738 in /usr/bin/clickhouse
10. ? @ 0xf375494 in /usr/bin/clickhouse
11. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x9f6ca37 in /usr/bin/clickhouse
12. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()::operator()() const @ 0x9f6d1aa in /usr/bin/clickhouse
13. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x9f6bf47 in /usr/bin/clickhouse
14. ? @ 0x9f6a433 in /usr/bin/clickhouse
15. start_thread @ 0x76db in /lib/x86_64-linux-gnu/libpthread-2.27.so
16. clone @ 0x121a3f in /lib/x86_64-linux-gnu/libc-2.27.so
Received exception from server (version 20.5.4):
Code: 27. DB::Exception: Received from localhost:9000. DB::Exception: Cannot parse input: expected ',' before: '��\b\bvY!_\0�supplier.csv\0��G��J�����*��\\���R9笝r�Y��5���R���=U�;�3fh6�\b�\0�u���~�������q��i|�l��d}4=.�מ[(F���WS�V���~�����R�\\"���}���8�M}�s�����T�?��Ҽ�}��"}�': (at row 1)
Row 1:
Column 0, name: S_SUPPKEY, type: UInt32, ERROR: text "<0x1F>�<BACKSPACE><BACKSPACE>vY!_<ASCII NUL><0x03>" is not like UInt32
: While executing S3.
0 rows in set. Elapsed: 0.702 sec.
If we in both CREATE table queries replace S3 with URL, SELECT queries will work fine.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugConfirmed user-visible misbehaviour in official releaseConfirmed user-visible misbehaviour in official releasecomp-object-storageObject storage connectivity (S3/GCS/Azure) including credentials, retries, multipart, etc.Object storage connectivity (S3/GCS/Azure) including credentials, retries, multipart, etc.