Merged
24 commits
c1abf5d
*added adapters' boilerplate for Lzma buffers, *added submodule to gi…
Oct 31, 2020
5982f9f
removed extra record from cmake
Nov 1, 2020
731e274
changed moment of stream object initialization
Nov 1, 2020
be2b002
fixed cmake for building, added test for compression method, added ba…
Nov 1, 2020
495cd47
fixed compressor testing, added base logic for compressor and decompr…
Nov 1, 2020
805bfd2
resolved git modules
Nov 2, 2020
2ad01c5
fixed codestyle, added record to changelog
Nov 2, 2020
ba6fa5d
fixed whitespaces, added hidden submodule file
Nov 2, 2020
8098f86
added record for fasttest
Nov 2, 2020
8ecf1d0
attempt to update fasttest version
Nov 3, 2020
f9cebbf
added newline for files
Nov 3, 2020
986d13d
replaced null with nullptr
Nov 4, 2020
53a064b
added eof initializing in constructor
Nov 4, 2020
87cc354
fix codestyle
Nov 4, 2020
ceda5cb
fix codestyle, resolved conflict
Nov 4, 2020
73e5d28
regenerated ya.make
Nov 7, 2020
268f289
resolved conflict
Nov 7, 2020
6286775
Merge branch 'master' of https://github.com/ClickHouse/ClickHouse int…
Nov 9, 2020
f999ea2
renamed files, added new library, changed error codes, added tests fo…
Nov 9, 2020
124ef2f
added and successfully passed tests for content encoding and file() f…
Nov 11, 2020
9479052
Merge branch 'master' of https://github.com/ClickHouse/ClickHouse int…
Nov 11, 2020
55d05c9
fixed style, xz check fasttest skipped, removed fast-lzma2
Nov 11, 2020
1b06fd9
regenerated ya.make
Nov 11, 2020
fe5800a
remove commented code
Nov 11, 2020
3 changes: 3 additions & 0 deletions .gitmodules
@@ -193,3 +193,6 @@
[submodule "contrib/miniselect"]
path = contrib/miniselect
url = https://github.com/danlark1/miniselect
[submodule "contrib/xz"]
path = contrib/xz
Member


Uses spaces for indentation, while all the other lines use tabs (since git submodule add uses them).

url = https://github.com/xz-mirror/xz
1 change: 1 addition & 0 deletions contrib/CMakeLists.txt
@@ -36,6 +36,7 @@ add_subdirectory (murmurhash)
add_subdirectory (replxx-cmake)
add_subdirectory (ryu-cmake)
add_subdirectory (unixodbc-cmake)
add_subdirectory (xz)
Member


Looks like you've forgotten to add the submodule.

Contributor Author


Member


Not really, this is just to know where to fetch the submodule from.

You need to run something like this:

git submodule add https://github.com/xz-mirror/xz contrib/xz

This will create a special file contrib/xz in the git index (in the filesystem you will see it as the contents of the submodule's repository), which records the HEAD commit of the submodule.

Member

@azat Nov 2, 2020


Also, to make fasttest update your submodule, you need to add contrib/xz to

SUBMODULES_TO_UPDATE=(contrib/boost contrib/zlib-ng contrib/libxml2 contrib/poco contrib/libunwind contrib/ryu contrib/fmtlib contrib/base64 contrib/cctz contrib/libcpuid contrib/double-conversion contrib/libcxx contrib/libcxxabi contrib/libc-headers contrib/lz4 contrib/zstd contrib/fastops contrib/rapidjson contrib/re2 contrib/sparsehash-c11 contrib/croaring)

Without fasttest, the other tests won't be run.

Member


Alternatively, it's possible to disable xz in fasttest.


add_subdirectory (poco-cmake)
add_subdirectory (croaring-cmake)
1 change: 1 addition & 0 deletions contrib/xz
Submodule xz added at 869b9d
2 changes: 1 addition & 1 deletion docker/test/fasttest/Dockerfile
@@ -1,5 +1,5 @@
# docker build -t yandex/clickhouse-fasttest .
FROM ubuntu:19.10
FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive LLVM_VERSION=10

1 change: 1 addition & 0 deletions docker/test/fasttest/run.sh
@@ -268,6 +268,7 @@ TESTS_TO_SKIP=(
protobuf
secure
sha256
xz

# Not sure why these two fail even in sequential mode. Disabled for now
# to make some progress.
7 changes: 7 additions & 0 deletions src/CMakeLists.txt
@@ -330,6 +330,13 @@ if (ZSTD_LIBRARY)
endif ()
endif()

set (LZMA_LIBRARY liblzma)
set (LZMA_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/xz/src/liblzma/api)
Comment on lines +333 to +334
Member


And I guess it would be better to move this into a separate contrib/xz-cmake/CMakeLists.txt and add an interface library.
A similar file is contrib/protobuf-cmake/CMakeLists.txt (protobuf has its own CMake rules, while ClickHouse has just a wrapper that sets some options).

Contributor Author


What should I do to pass the description check?
I've added a record to CHANGELOG.md.

Member


What should I do to pass the description check?

Update the PR description and include a changelog entry in the specified format.
There is a template for PRs: https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/.github/PULL_REQUEST_TEMPLATE.md

Just copy it and fill in the template, but keep the format.

Member


Yes, it's much better to provide our own CMakeLists.

Upstream CMake rules often cannot be reused safely (without building unneeded targets and polluting the build options).

if (LZMA_LIBRARY)
target_link_libraries (clickhouse_common_io PUBLIC ${LZMA_LIBRARY})
target_include_directories (clickhouse_common_io SYSTEM BEFORE PUBLIC ${LZMA_INCLUDE_DIR})
endif()

if (USE_ICU)
dbms_target_link_libraries (PRIVATE ${ICU_LIBRARIES})
dbms_target_include_directories (SYSTEM PRIVATE ${ICU_INCLUDE_DIRS})
23 changes: 12 additions & 11 deletions src/Common/ErrorCodes.cpp
@@ -519,34 +519,35 @@
M(550, CONDITIONAL_TREE_PARENT_NOT_FOUND) \
M(551, ILLEGAL_PROJECTION_MANIPULATOR) \
M(552, UNRECOGNIZED_ARGUMENTS) \
\
M(553, LZMA_STREAM_ENCODER_FAILED) \
M(554, LZMA_STREAM_DECODER_FAILED) \
\
M(999, KEEPER_EXCEPTION) \
M(1000, POCO_EXCEPTION) \
M(1001, STD_EXCEPTION) \
M(1002, UNKNOWN_EXCEPTION) \
M(1002, UNKNOWN_EXCEPTION)

/* See END */

namespace DB
{

namespace ErrorCodes
{
#define M(VALUE, NAME) extern const Value NAME = VALUE;
APPLY_FOR_ERROR_CODES(M)
#undef M
#define M(VALUE, NAME) extern const Value NAME = VALUE;
APPLY_FOR_ERROR_CODES(M)
#undef M

constexpr Value END = 3000;
std::atomic<Value> values[END + 1] {};
std::atomic<Value> values[END + 1]{};

struct ErrorCodesNames
{
std::string_view names[END + 1];
ErrorCodesNames()
{
#define M(VALUE, NAME) names[VALUE] = std::string_view(#NAME);
APPLY_FOR_ERROR_CODES(M)
#undef M
#define M(VALUE, NAME) names[VALUE] = std::string_view(#NAME);
APPLY_FOR_ERROR_CODES(M)
#undef M
}
} error_codes_names;

@@ -557,7 +558,7 @@ namespace ErrorCodes
return error_codes_names.names[error_code];
}

ErrorCode end() { return END+1; }
ErrorCode end() { return END + 1; }
}

}
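The ErrorCodes.cpp hunk relies on an X-macro: APPLY_FOR_ERROR_CODES(M) lists every (value, name) pair once, and each #define M / #undef M pair expands that single list into something different (the constants, then the name table). That is why registering LZMA_STREAM_ENCODER_FAILED and LZMA_STREAM_DECODER_FAILED takes one line each. Below is a minimal standalone sketch of the pattern; APPLY_FOR_SAMPLE_ERROR_CODES, the SampleErrorCodes namespace and getName() are hypothetical names, and the sketch uses constexpr constants where the real file uses extern const so that other translation units can declare them.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical, trimmed-down list in the same X-macro style as APPLY_FOR_ERROR_CODES:
// every entry is M(value, NAME) and the list is written exactly once.
#define APPLY_FOR_SAMPLE_ERROR_CODES(M) \
    M(552, UNRECOGNIZED_ARGUMENTS) \
    M(553, LZMA_STREAM_ENCODER_FAILED) \
    M(554, LZMA_STREAM_DECODER_FAILED)

namespace SampleErrorCodes
{
using Value = int;

// Expansion 1: one named constant per entry.
#define M(VALUE, NAME) constexpr Value NAME = VALUE;
APPLY_FOR_SAMPLE_ERROR_CODES(M)
#undef M

constexpr Value END = 3000;

// Expansion 2: a value -> name table built from the same list.
struct Names
{
    std::array<std::string_view, END + 1> names{};
    Names()
    {
#define M(VALUE, NAME) names[VALUE] = #NAME;
        APPLY_FOR_SAMPLE_ERROR_CODES(M)
#undef M
    }
};

// Returns an empty view for values that are out of range or unused.
inline std::string_view getName(Value code)
{
    static const Names table;
    return (code >= 0 && code <= END) ? table.names[code] : std::string_view{};
}
}
```

Because both expansions read the same list, a code cannot end up with a constant but no printable name, or the other way round.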
45 changes: 25 additions & 20 deletions src/IO/CompressionMethod.cpp
@@ -1,11 +1,13 @@
#include <IO/CompressionMethod.h>

#include <IO/BrotliReadBuffer.h>
#include <IO/BrotliWriteBuffer.h>
#include <IO/LZMADeflatingWriteBuffer.h>
#include <IO/LZMAInflatingReadBuffer.h>
#include <IO/ReadBuffer.h>
#include <IO/WriteBuffer.h>
#include <IO/ZlibInflatingReadBuffer.h>
#include <IO/ZlibDeflatingWriteBuffer.h>
#include <IO/BrotliReadBuffer.h>
#include <IO/BrotliWriteBuffer.h>
#include <IO/ZlibInflatingReadBuffer.h>

#if !defined(ARCADIA_BUILD)
# include <Common/config.h>
@@ -14,7 +16,6 @@

namespace DB
{

namespace ErrorCodes
{
extern const int NOT_IMPLEMENTED;
@@ -25,10 +26,16 @@ std::string toContentEncodingName(CompressionMethod method)
{
switch (method)
{
case CompressionMethod::Gzip: return "gzip";
case CompressionMethod::Zlib: return "deflate";
case CompressionMethod::Brotli: return "br";
case CompressionMethod::None: return "";
case CompressionMethod::Gzip:
return "gzip";
case CompressionMethod::Zlib:
return "deflate";
case CompressionMethod::Brotli:
return "br";
case CompressionMethod::Xz:
return "xz";
case CompressionMethod::None:
return "";
}
__builtin_unreachable();
}
@@ -52,27 +59,28 @@ CompressionMethod chooseCompressionMethod(const std::string & path, const std::s
return CompressionMethod::Zlib;
if (*method_str == "brotli" || *method_str == "br")
return CompressionMethod::Brotli;
if (*method_str == "LZMA" || *method_str == "xz")
Member


A small note: xz is actually LZMA2. But why do we need this name? Maybe just leave only xz.

Contributor Author


Is this the only place where LZMA should be removed?
Is it OK to keep the LZMA* prefix in all the buffer file names?

return CompressionMethod::Xz;
if (hint.empty() || hint == "auto" || hint == "none")
return CompressionMethod::None;

throw Exception("Unknown compression method " + hint + ". Only 'auto', 'none', 'gzip', 'br' are supported as compression methods",
throw Exception(
"Unknown compression method " + hint + ". Only 'auto', 'none', 'gzip', 'br', 'xz' are supported as compression methods",
ErrorCodes::NOT_IMPLEMENTED);
}


std::unique_ptr<ReadBuffer> wrapReadBufferWithCompressionMethod(
std::unique_ptr<ReadBuffer> nested,
CompressionMethod method,
size_t buf_size,
char * existing_memory,
size_t alignment)
std::unique_ptr<ReadBuffer> nested, CompressionMethod method, size_t buf_size, char * existing_memory, size_t alignment)
{
if (method == CompressionMethod::Gzip || method == CompressionMethod::Zlib)
return std::make_unique<ZlibInflatingReadBuffer>(std::move(nested), method, buf_size, existing_memory, alignment);
#if USE_BROTLI
if (method == CompressionMethod::Brotli)
return std::make_unique<BrotliReadBuffer>(std::move(nested), buf_size, existing_memory, alignment);
#endif
if (method == CompressionMethod::Xz)
return std::make_unique<LZMAInflatingReadBuffer>(std::move(nested), buf_size, existing_memory, alignment);

if (method == CompressionMethod::None)
return nested;
@@ -82,12 +90,7 @@ std::unique_ptr<ReadBuffer> wrapReadBufferWithCompressionMethod(


std::unique_ptr<WriteBuffer> wrapWriteBufferWithCompressionMethod(
std::unique_ptr<WriteBuffer> nested,
CompressionMethod method,
int level,
size_t buf_size,
char * existing_memory,
size_t alignment)
std::unique_ptr<WriteBuffer> nested, CompressionMethod method, int level, size_t buf_size, char * existing_memory, size_t alignment)
{
if (method == DB::CompressionMethod::Gzip || method == CompressionMethod::Zlib)
return std::make_unique<ZlibDeflatingWriteBuffer>(std::move(nested), method, level, buf_size, existing_memory, alignment);
@@ -96,6 +99,8 @@ std::unique_ptr<WriteBuffer> wrapWriteBufferWithCompressionMethod(
if (method == DB::CompressionMethod::Brotli)
return std::make_unique<BrotliWriteBuffer>(std::move(nested), level, buf_size, existing_memory, alignment);
#endif
if (method == CompressionMethod::Xz)
return std::make_unique<LZMADeflatingWriteBuffer>(std::move(nested), level, buf_size, existing_memory, alignment);

if (method == CompressionMethod::None)
return nested;
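chooseCompressionMethod() and toContentEncodingName() together map between user-facing strings and the CompressionMethod enum. The sketch below models that round trip in isolation and is hypothetical: chooseMethodSketch() and contentEncodingName() are illustrative names, the enum is a simplified copy, and the extension-based branch for "auto" as well as the "gz" and "deflate" spellings are assumptions, because that part of the real function is collapsed in the diff. It throws std::invalid_argument where the real code throws a DB::Exception with NOT_IMPLEMENTED, and, following the reviewer's suggestion, it accepts only "xz" and not "LZMA".

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Simplified copy of the CompressionMethod enum.
enum class CompressionMethod { None, Gzip, Zlib, Xz, Brotli };

// Hypothetical stand-in for chooseCompressionMethod(path, hint). An explicit hint
// wins; with "auto" the file extension is used instead. The extension logic and
// the "gz"/"deflate" spellings are assumptions: that part of the diff is collapsed.
CompressionMethod chooseMethodSketch(const std::string & path, const std::string & hint)
{
    std::string name = hint;
    if (hint == "auto")
    {
        auto dot = path.rfind('.');
        name = (dot == std::string::npos) ? "" : path.substr(dot + 1);
    }

    if (name == "gzip" || name == "gz")
        return CompressionMethod::Gzip;
    if (name == "deflate")
        return CompressionMethod::Zlib;
    if (name == "brotli" || name == "br")
        return CompressionMethod::Brotli;
    if (name == "xz") // only "xz", no "LZMA" alias
        return CompressionMethod::Xz;
    // Empty or "none" hint, or "auto" with an unrecognized extension: no compression.
    if (name.empty() || name == "none" || hint == "auto")
        return CompressionMethod::None;

    throw std::invalid_argument("Unknown compression method " + hint);
}

// Same mapping as toContentEncodingName(): enum -> HTTP Content-Encoding token.
std::string contentEncodingName(CompressionMethod method)
{
    switch (method)
    {
        case CompressionMethod::Gzip: return "gzip";
        case CompressionMethod::Zlib: return "deflate";
        case CompressionMethod::Brotli: return "br";
        case CompressionMethod::Xz: return "xz";
        case CompressionMethod::None: return "";
    }
    return ""; // not reached for valid enum values
}
```

For example, a file named data.tsv.xz with hint "auto" resolves to Xz, and contentEncodingName() then yields "xz" for the HTTP header.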
6 changes: 4 additions & 2 deletions src/IO/CompressionMethod.h
@@ -1,14 +1,13 @@
#pragma once

#include <string>
#include <memory>
#include <string>

#include <Core/Defines.h>


namespace DB
{

class ReadBuffer;
class WriteBuffer;

@@ -26,6 +25,9 @@ enum class CompressionMethod
/// DEFLATE compression with zlib header and Adler32 checksum.
/// This option corresponds to HTTP Content-Encoding: deflate.
Zlib,
/// LZMA2-based content compression
/// This option corresponds to HTTP Content-Encoding: xz
Xz,
Brotli
};

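wrapReadBufferWithCompressionMethod() and wrapWriteBufferWithCompressionMethod() in CompressionMethod.cpp dispatch on this enum using a decorator pattern: each takes ownership of the nested buffer and returns either a compressing wrapper around it or, for CompressionMethod::None, the nested buffer itself. The standalone sketch below shows only that ownership flow. Sink, StringSink, TaggingSink and wrapSink() are hypothetical stand-ins for WriteBuffer, a concrete output buffer, LZMADeflatingWriteBuffer and the real factory, and the wrapper merely prefixes the data instead of compressing it.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified enum; the real one also has Gzip, Zlib and Brotli.
enum class CompressionMethod { None, Xz };

// Hypothetical minimal output interface standing in for DB::WriteBuffer.
struct Sink
{
    virtual ~Sink() = default;
    virtual void write(const std::string & data) = 0;
};

// Terminal sink that just accumulates bytes (think of a file or HTTP buffer).
struct StringSink : Sink
{
    std::string contents;
    void write(const std::string & data) override { contents += data; }
};

// Hypothetical decorator standing in for LZMADeflatingWriteBuffer: it owns the
// nested sink and transforms data before forwarding it. A real codec would
// compress here; this one only prefixes a tag so the effect is visible.
struct TaggingSink : Sink
{
    explicit TaggingSink(std::unique_ptr<Sink> nested_) : nested(std::move(nested_)) {}
    void write(const std::string & data) override { nested->write("[xz]" + data); }

    std::unique_ptr<Sink> nested;
};

// Same shape as wrapWriteBufferWithCompressionMethod(): take ownership of the
// nested sink and return either a wrapper around it or the sink itself.
std::unique_ptr<Sink> wrapSink(std::unique_ptr<Sink> nested, CompressionMethod method)
{
    if (method == CompressionMethod::Xz)
        return std::make_unique<TaggingSink>(std::move(nested));
    if (method == CompressionMethod::None)
        return nested;
    throw std::invalid_argument("Unsupported compression method");
}
```

In the None case the same object is handed back, so a raw pointer taken before the call still refers to the sink that receives the data.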
125 changes: 125 additions & 0 deletions src/IO/LZMADeflatingWriteBuffer.cpp
@@ -0,0 +1,125 @@
#include <IO/LZMADeflatingWriteBuffer.h>


namespace DB
{
namespace ErrorCodes
{
extern const int LZMA_STREAM_ENCODER_FAILED;
}

LZMADeflatingWriteBuffer::LZMADeflatingWriteBuffer(
std::unique_ptr<WriteBuffer> out_, int compression_level, size_t buf_size, char * existing_memory, size_t alignment)
: BufferWithOwnMemory<WriteBuffer>(buf_size, existing_memory, alignment), out(std::move(out_))
{

lstr = LZMA_STREAM_INIT;
lstr.allocator = nullptr;
lstr.next_in = nullptr;
lstr.avail_in = 0;
lstr.next_out = nullptr;
lstr.avail_out = 0;

// options for further compression
lzma_options_lzma opt_lzma2;
if (lzma_lzma_preset(&opt_lzma2, compression_level))
throw Exception(ErrorCodes::LZMA_STREAM_ENCODER_FAILED, "lzma preset failed: lzma version: {}", LZMA_VERSION_STRING);


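// LZMA_FILTER_X86 - BCJ (branch/call/jump) filter that converts relative branch targets in x86 machine code to absolute ones; it is aimed at x86 executables
// LZMA2 - codec for *.xz file compression; LZMA (LZMA1) is not suitable for this purpose
// VLI - variable-length integer (most integers in *.xz are encoded as VLIs)
// LZMA_VLI_UNKNOWN (UINT64_MAX) - VLI value denoting that the value is unknown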
lzma_filter filters[] = {
{.id = LZMA_FILTER_X86, .options = nullptr},
{.id = LZMA_FILTER_LZMA2, .options = &opt_lzma2},
{.id = LZMA_VLI_UNKNOWN, .options = nullptr},
};
lzma_ret ret = lzma_stream_encoder(&lstr, filters, LZMA_CHECK_CRC64);

if (ret != LZMA_OK)
throw Exception(
ErrorCodes::LZMA_STREAM_ENCODER_FAILED,
"lzma stream encoder init failed: error code: {} lzma version: {}",
ret,
LZMA_VERSION_STRING);
}

LZMADeflatingWriteBuffer::~LZMADeflatingWriteBuffer()
{
try
{
finish();

lzma_end(&lstr);
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}

void LZMADeflatingWriteBuffer::nextImpl()
{
if (!offset())
return;

lstr.next_in = reinterpret_cast<unsigned char *>(working_buffer.begin());
lstr.avail_in = offset();

lzma_action action = LZMA_RUN;
do
{
out->nextIfAtEnd();
lstr.next_out = reinterpret_cast<unsigned char *>(out->position());
lstr.avail_out = out->buffer().end() - out->position();

lzma_ret ret = lzma_code(&lstr, action);
out->position() = out->buffer().end() - lstr.avail_out;

if (ret == LZMA_STREAM_END)
return;

if (ret != LZMA_OK)
throw Exception(
ErrorCodes::LZMA_STREAM_ENCODER_FAILED,
"lzma stream encoding failed: error code: {}; lzma_version: {}",
ret,
LZMA_VERSION_STRING);

} while (lstr.avail_in > 0 || lstr.avail_out == 0);
}


void LZMADeflatingWriteBuffer::finish()
{
if (finished)
return;

next();

do
{
out->nextIfAtEnd();
lstr.next_out = reinterpret_cast<unsigned char *>(out->position());
lstr.avail_out = out->buffer().end() - out->position();

lzma_ret ret = lzma_code(&lstr, LZMA_FINISH);
out->position() = out->buffer().end() - lstr.avail_out;

if (ret == LZMA_STREAM_END)
{
finished = true;
return;
}

if (ret != LZMA_OK)
throw Exception(
ErrorCodes::LZMA_STREAM_ENCODER_FAILED,
"lzma stream encoding failed: error code: {}; lzma version: {}",
ret,
LZMA_VERSION_STRING);

} while (lstr.avail_out == 0);
}
}
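The filter-chain comment above mentions VLIs. The .xz format stores many integers (for example sizes and filter IDs) as multibyte integers of up to 9 bytes: 7 payload bits per byte, least significant group first, with the high bit set on every byte except the last. This caps the range at UINT64_MAX / 2, which leaves UINT64_MAX free for LZMA_VLI_UNKNOWN, the value that terminates the filters[] array. The sketch below is a standalone illustration of that encoding, not liblzma's implementation (liblzma exposes lzma_vli_encode() and lzma_vli_decode() for this); encodeVli(), decodeVli() and the two constants are hypothetical names whose values mirror LZMA_VLI_MAX and LZMA_VLI_BYTES_MAX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Values mirror liblzma's LZMA_VLI_MAX and LZMA_VLI_BYTES_MAX.
constexpr uint64_t VLI_MAX = UINT64_MAX / 2;
constexpr size_t VLI_BYTES_MAX = 9;

// 7 payload bits per byte, least significant group first;
// the high bit (0x80) is set on every byte except the last.
std::vector<uint8_t> encodeVli(uint64_t value)
{
    assert(value <= VLI_MAX);
    std::vector<uint8_t> out;
    while (value >= 0x80)
    {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
    return out;
}

// Decodes a buffer that must contain exactly one VLI. Returns std::nullopt for
// truncated input, more than 9 bytes, or a non-minimal encoding (a 0x00 final byte).
std::optional<uint64_t> decodeVli(const std::vector<uint8_t> & in)
{
    uint64_t value = 0;
    for (size_t i = 0; i < in.size() && i < VLI_BYTES_MAX; ++i)
    {
        value |= static_cast<uint64_t>(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0)
        {
            if (i > 0 && in[i] == 0x00)
                return std::nullopt;
            return i + 1 == in.size() ? std::optional<uint64_t>(value) : std::nullopt;
        }
    }
    return std::nullopt;
}
```

For example, 300 encodes as the two bytes 0xAC 0x02, and the largest allowed value, VLI_MAX, needs all 9 bytes.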
32 changes: 32 additions & 0 deletions src/IO/LZMADeflatingWriteBuffer.h
@@ -0,0 +1,32 @@
#pragma once

#include <IO/BufferWithOwnMemory.h>
#include <IO/WriteBuffer.h>

#include <lzma.h>

namespace DB
{
/// Performs compression using lzma library and writes compressed data to out_ WriteBuffer.
class LZMADeflatingWriteBuffer : public BufferWithOwnMemory<WriteBuffer>
{
public:
LZMADeflatingWriteBuffer(
std::unique_ptr<WriteBuffer> out_,
int compression_level,
size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE,
char * existing_memory = nullptr,
size_t alignment = 0);

void finish();

~LZMADeflatingWriteBuffer() override;

private:
void nextImpl() override;

std::unique_ptr<WriteBuffer> out;
lzma_stream lstr;
bool finished = false;
};
}
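The subtle part of LZMADeflatingWriteBuffer is the pair of loops in nextImpl() and finish(). The encoder is called repeatedly against whatever space is left in out, and nextImpl() keeps looping while input remains or the output space was filled completely, because a full output buffer can mean the encoder still has pending bytes. finish() then keeps calling with LZMA_FINISH until LZMA_STREAM_END. The self-contained sketch below reproduces only that control flow. ToyStream is a hypothetical stand-in for lzma_stream plus lzma_code() that copies bytes and appends a one-byte '$' trailer on finish, and ToyDeflatingWriter uses a deliberately tiny 4-byte chunk so that the "output full" path is exercised.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy codec standing in for lzma_stream + lzma_code(): it "compresses"
// by copying bytes and, on Finish, appends a one-byte '$' trailer. Only the
// next_in/avail_in/next_out/avail_out bookkeeping is modeled.
enum class Action { Run, Finish };
enum class Ret { Ok, StreamEnd };

struct ToyStream
{
    const char * next_in = nullptr;
    size_t avail_in = 0;
    char * next_out = nullptr;
    size_t avail_out = 0;
    bool trailer_written = false;
    Ret code(Action action)
    {
        size_t n = std::min(avail_in, avail_out);
        std::copy(next_in, next_in + n, next_out);
        next_in += n; avail_in -= n;
        next_out += n; avail_out -= n;
        // The trailer goes out only after all input is consumed and there is room for it.
        if (action == Action::Finish && avail_in == 0 && avail_out > 0 && !trailer_written)
        {
            *next_out++ = '$';
            --avail_out;
            trailer_written = true;
        }
        return (action == Action::Finish && trailer_written) ? Ret::StreamEnd : Ret::Ok;
    }
};

// Hypothetical writer with the loop structure of nextImpl()/finish(). The 4-byte
// `chunk` plays the role of out's buffer; flushing it to `sink` plays out->next().
class ToyDeflatingWriter
{
public:
    explicit ToyDeflatingWriter(std::string & sink_) : sink(sink_) {}

    // Like nextImpl(): repeat while input remains or the output space was filled up.
    void write(const std::string & data)
    {
        stream.next_in = data.data();
        stream.avail_in = data.size();
        do
        {
            prepareOutput();
            stream.code(Action::Run);
            commitOutput();
        } while (stream.avail_in > 0 || stream.avail_out == 0);
    }

    // Like finish(): call with Finish until the codec reports StreamEnd; idempotent.
    void finish()
    {
        if (finished)
            return;
        Ret ret;
        do
        {
            prepareOutput();
            ret = stream.code(Action::Finish);
            commitOutput();
        } while (ret != Ret::StreamEnd);
        flushChunk();
        finished = true;
    }

    // Never throw from a destructor; a real wrapper would also free codec state (lzma_end()) after the try.
    ~ToyDeflatingWriter()
    {
        try { finish(); } catch (...) {}
    }

private:
    static constexpr size_t chunk_size = 4;
    void flushChunk() { sink.append(chunk, used); used = 0; }
    // Like out->nextIfAtEnd() followed by pointing next_out/avail_out at the free space.
    void prepareOutput()
    {
        if (used == chunk_size)
            flushChunk();
        stream.next_out = chunk + used;
        stream.avail_out = chunk_size - used;
    }
    // Like `out->position() = out->buffer().end() - lstr.avail_out`.
    void commitOutput() { used = chunk_size - stream.avail_out; }

    std::string & sink;
    ToyStream stream;
    char chunk[chunk_size] = {};
    size_t used = 0;
    bool finished = false;
};
```

One design point: the sketch's destructor would release codec state after the try block. In the destructor in this diff, lzma_end() sits inside the try after finish(), so if finish() throws, lzma_end() is skipped and the encoder state is not freed.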