Skip to content

S3 table engine compression doesn't work. #13871

@UnamedRus

Description

@UnamedRus

Describe the bug
Clickhouse expect last parameter of S3 table engine to be data format, not compression type.

How to reproduce
Clickhouse version 20.5.4.40

CREATE TABLE test_s3(S_SUPPKEY       UInt32,         S_NAME          String,         S_ADDRESS       String,         S_CITY          LowCardinality(String),         S_NATION        LowCardinality(String),         S_REGION        LowCardinality(String),         S_PHONE         String) ENGINE=S3('https://s3.amazonaws.com/{some_bucket_path}.csv.gz','CSV','gzip');
SELECT * FROM test_s3;
Received exception from server (version 20.5.4):
Code: 73. DB::Exception: Received from localhost:9000. DB::Exception: Unknown format gzip.

0 rows in set. Elapsed: 0.001 sec.

Error message and/or stacktrace

 Code: 73, e.displayText() = DB::Exception: Unknown format gzip (version 20.5.4.40 (official build)) (from 127.0.0.1:56140) (in query: select * from test_s3;), Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x11b9acc0 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x9f3e2cd in /usr/bin/clickhouse
2. ? @ 0xf36cd04 in /usr/bin/clickhouse
3. DB::FormatFactory::getInput(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::ReadBuffer&, DB::Block const&, DB::Context const&, unsigned long, std::__1::function<void ()>) const @ 0xf36b0c9 in /usr/bin/clickhouse
4. DB::StorageS3::read(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::SelectQueryInfo const&, DB::Context const&, DB::QueryProcessingStage::Enum, unsigned long, unsigned int) @ 0xefb7520 in /usr/bin/clickhouse
5. DB::ReadFromStorageStep::ReadFromStorageStep(DB::TableStructureReadLockHolder, DB::SelectQueryOptions, std::__1::shared_ptr<DB::IStorage>, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::SelectQueryInfo const&, std::__1::shared_ptr<DB::Context>, DB::QueryProcessingStage::Enum, unsigned long, unsigned long) @ 0xf68f3fd in /usr/bin/clickhouse
6. DB::InterpreterSelectQuery::executeFetchColumns(DB::QueryProcessingStage::Enum, DB::QueryPlan&, std::__1::shared_ptr<DB::PrewhereInfo> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) @ 0xea608c1 in /usr/bin/clickhouse
7. DB::InterpreterSelectQuery::executeImpl(DB::QueryPlan&, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>) @ 0xea647a2 in /usr/bin/clickhouse
8. DB::InterpreterSelectQuery::buildQueryPlan(DB::QueryPlan&) @ 0xea65d54 in /usr/bin/clickhouse
9. DB::InterpreterSelectWithUnionQuery::buildQueryPlan(DB::QueryPlan&) @ 0xebce0f4 in /usr/bin/clickhouse
10. DB::InterpreterSelectWithUnionQuery::execute() @ 0xebce44c in /usr/bin/clickhouse
11. ? @ 0xed3c7ed in /usr/bin/clickhouse
12. DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool) @ 0xed3fe2a in /usr/bin/clickhouse
13. DB::TCPHandler::runImpl() @ 0xf36443c in /usr/bin/clickhouse
14. DB::TCPHandler::run() @ 0xf365190 in /usr/bin/clickhouse
15. Poco::Net::TCPServerConnection::start() @ 0x11ab8aeb in /usr/bin/clickhouse
16. Poco::Net::TCPServerDispatcher::run() @ 0x11ab8f7b in /usr/bin/clickhouse
17. Poco::PooledThread::run() @ 0x11c37aa6 in /usr/bin/clickhouse
18. Poco::ThreadImpl::runnableEntry(void*) @ 0x11c32ea0 in /usr/bin/clickhouse
19. start_thread @ 0x76db in /lib/x86_64-linux-gnu/libpthread-2.27.so
20. clone @ 0x121a3f in /lib/x86_64-linux-gnu/libc-2.27.so

Additional context
Auto detect of compression type from file extension doesnt work either.

CREATE TABLE test_s3(S_SUPPKEY       UInt32,         S_NAME          String,         S_ADDRESS       String,         S_CITY          LowCardinality(String),         S_NATION        LowCardinality(String),         S_REGION        LowCardinality(String),         S_PHONE         String) ENGINE=S3('https://s3.amazonaws.com/{some_bucket_path}.csv.gz','CSV');
Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected ',' before: '��\b\bvY!_\0�supplier.csv\0��G��J�����*��\\���R9笝r�Y��5���R���=U�;�3fh6�\b�\0�u���~�������q��i|�l��d}4=.�מ[(F���WS�V���~�����R�\\"���}���8�M}�s�����T�?��Ҽ�}��"}�': (at row 1)

Row 1:
Column 0,   name: S_SUPPKEY, type: UInt32,                 ERROR: text "<0x1F>�<BACKSPACE><BACKSPACE>vY!_<ASCII NUL><0x03>" is not like UInt32

: While executing S3 (version 20.5.4.40 (official build)) (from 127.0.0.1:56140) (in query: select * from test_s3;), Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x11b9acc0 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x9f3e2cd in /usr/bin/clickhouse
2. ? @ 0x9f808fd in /usr/bin/clickhouse
3. ? @ 0xf45bd5d in /usr/bin/clickhouse
4. DB::CSVRowInputFormat::readRow(std::__1::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension&) @ 0xf45d179 in /usr/bin/clickhouse
5. DB::IRowInputFormat::generate() @ 0xf42cab1 in /usr/bin/clickhouse
6. DB::ISource::work() @ 0xf3aa2bb in /usr/bin/clickhouse
7. DB::InputStreamFromInputFormat::readImpl() @ 0xf36fe8d in /usr/bin/clickhouse
8. DB::IBlockInputStream::read() @ 0xe65fb3d in /usr/bin/clickhouse
9. DB::ParallelParsingBlockInputStream::parserThreadFunction(std::__1::shared_ptr<DB::ThreadGroupStatus>, unsigned long) @ 0xf374738 in /usr/bin/clickhouse
10. ? @ 0xf375494 in /usr/bin/clickhouse
11. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x9f6ca37 in /usr/bin/clickhouse
12. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()::operator()() const @ 0x9f6d1aa in /usr/bin/clickhouse
13. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x9f6bf47 in /usr/bin/clickhouse
14. ? @ 0x9f6a433 in /usr/bin/clickhouse
15. start_thread @ 0x76db in /lib/x86_64-linux-gnu/libpthread-2.27.so
16. clone @ 0x121a3f in /lib/x86_64-linux-gnu/libc-2.27.so


Received exception from server (version 20.5.4):
Code: 27. DB::Exception: Received from localhost:9000. DB::Exception: Cannot parse input: expected ',' before: '��\b\bvY!_\0�supplier.csv\0��G��J�����*��\\���R9笝r�Y��5���R���=U�;�3fh6�\b�\0�u���~�������q��i|�l��d}4=.�מ[(F���WS�V���~�����R�\\"���}���8�M}�s�����T�?��Ҽ�}��"}�': (at row 1)

Row 1:
Column 0,   name: S_SUPPKEY, type: UInt32,                 ERROR: text "<0x1F>�<BACKSPACE><BACKSPACE>vY!_<ASCII NUL><0x03>" is not like UInt32

: While executing S3.

0 rows in set. Elapsed: 0.702 sec.

If we in both CREATE table queries replace S3 with URL, SELECT queries will work fine.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugConfirmed user-visible misbehaviour in official releasecomp-object-storageObject storage connectivity (S3/GCS/Azure) including credentials, retries, multipart, etc.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions