Implement function byteSize #18579
src/Functions/byteSize.cpp
Outdated
    }

    String getName() const override { return name; }
    bool isDeterministic() const override { return false; }
src/Functions/byteSize.cpp
Outdated
    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
    {
        auto result_col = ColumnUInt64::create(input_rows_count, 0);
Maybe avoid zero initialization by filling with the size of the first argument?
src/Functions/byteSize.cpp
Outdated
    else if (byteSizeByColumn(data_type, column, vec_res))
        ;
    else
        LOG_WARNING(&Poco::Logger::get("FunctionByteSize"),
It should throw an exception.
            "byteSize for \"{}\" is not supported.", data_type->getName());
    }

    static bool byteSizeByDataType(const IDataType * data_type, UInt64 & byte_size)
There is IDataType::isValueUnambiguouslyRepresentedInFixedSizeContiguousMemoryRegion and IDataType::getSizeOfValueInMemory.
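To illustrate the suggestion, here is a minimal, self-contained sketch of a fixed-size fast path. MiniDataType and addFixedByteSize are hypothetical stand-ins invented for this example; only the two method names come from the comment above, and the real IDataType interface is not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature of a data type; only the two method names are taken
// from the review comment, the fields and behavior are illustrative.
struct MiniDataType
{
    bool fixed;         // true if every value occupies the same number of bytes
    size_t value_size;  // that number of bytes (ignored when fixed == false)

    bool isValueUnambiguouslyRepresentedInFixedSizeContiguousMemoryRegion() const { return fixed; }
    size_t getSizeOfValueInMemory() const { return value_size; }
};

// If the type has a fixed in-memory size, add it to every row and return true.
// Otherwise leave the result untouched and return false so that the caller can
// fall back to a per-column computation.
bool addFixedByteSize(const MiniDataType & type, std::vector<uint64_t> & res)
{
    if (!type.isValueUnambiguouslyRepresentedInFixedSizeContiguousMemoryRegion())
        return false;

    const uint64_t byte_size = type.getSizeOfValueInMemory();
    for (auto & value : res)
        value += byte_size;
    return true;
}
```

With a check like this, one generic branch could cover all fixed-size types instead of a hand-written table per type id.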
src/Functions/byteSize.cpp
Outdated
        return false;
    }

    static bool byteSizeByTypeId(TypeIndex type_id, UInt64 & byte_size)
It will allow removing this code...
src/Functions/byteSize.cpp
Outdated
    ColumnString::Offset prev_offset = 0;
    for (size_t i = 0; i < vec_size; ++i)
    {
        vec_res[i] += offsets[i] - prev_offset + sizeof(offsets[0]);
One minor issue with type aliasing.
As vec_res and offsets have the same type, the compiler cannot rely on the fact that they are not aliased (i.e. they are not __restrict). It will do an extra load of offsets[i] from memory.
To solve this, you can save offsets[i] to a temporary variable on the stack.
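A minimal sketch of that fix, using std::vector in place of ClickHouse's containers. The function name addStringByteSizes is hypothetical, and the sketch assumes (as the "string length + 9" figure in the changelog entry suggests) that each offset span includes one terminating byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the loop under review: adds the per-row byte size
// of a string column to `res`, given cumulative end offsets.
// Precondition: res.size() >= offsets.size().
void addStringByteSizes(const std::vector<uint64_t> & offsets, std::vector<uint64_t> & res)
{
    uint64_t prev_offset = 0;
    const size_t size = offsets.size();
    for (size_t i = 0; i < size; ++i)
    {
        // Read offsets[i] once into a local variable. A store to res[i] cannot
        // change a local, so the value does not need to be reloaded afterwards.
        const uint64_t current_offset = offsets[i];
        res[i] += current_offset - prev_offset + sizeof(uint64_t);
        prev_offset = current_offset;
    }
}
```

For offsets {6, 8, 9}, which would correspond to the strings "hello", "a" and "" under the assumption above, the loop adds 14, 10 and 9 to the result.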
src/Functions/byteSize.cpp
Outdated
        for (size_t i = 0; i < vec_size; ++i)
            vec_res[i] += byte_size;
    }
    else if (byteSizeByColumn(data_type, column, vec_res))
How are constant columns processed?
src/Functions/byteSize.cpp
Outdated
    static UInt64 byteSizeForNestedItem(const IColumn & column, size_t idx)
    {
        if (const ColumnString * col_str = checkAndGetColumn<ColumnString>(&column))
It's quite strange that only a few types are supported for array elements.
What about Array(Tuple(Nullable(FixedString(N)), ...))?
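One way to support arbitrarily nested types is to let every column report the size of a single row and have the composite columns recurse into their children. The sketch below is a hypothetical miniature hierarchy, not ClickHouse's IColumn; the method name byteSizeAt, the one-byte null flag and the eight-byte array offset are assumptions of this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical miniature column hierarchy; it is not ClickHouse's IColumn.
struct MiniColumn
{
    virtual ~MiniColumn() = default;
    // Estimated in-memory size of the value at row n, in bytes.
    virtual uint64_t byteSizeAt(size_t n) const = 0;
};

using MiniColumnPtr = std::shared_ptr<const MiniColumn>;

// Every value occupies the same number of bytes (a FixedString(N)-like column).
struct MiniFixedColumn : MiniColumn
{
    uint64_t value_size;
    explicit MiniFixedColumn(uint64_t value_size_) : value_size(value_size_) {}
    uint64_t byteSizeAt(size_t) const override { return value_size; }
};

// Assumption: one extra byte per row for the null flag.
struct MiniNullableColumn : MiniColumn
{
    MiniColumnPtr nested;
    explicit MiniNullableColumn(MiniColumnPtr nested_) : nested(std::move(nested_)) {}
    uint64_t byteSizeAt(size_t n) const override { return 1 + nested->byteSizeAt(n); }
};

// A tuple row is the sum of the same row in each element column.
struct MiniTupleColumn : MiniColumn
{
    std::vector<MiniColumnPtr> elements;
    explicit MiniTupleColumn(std::vector<MiniColumnPtr> elements_) : elements(std::move(elements_)) {}
    uint64_t byteSizeAt(size_t n) const override
    {
        uint64_t res = 0;
        for (const auto & element : elements)
            res += element->byteSizeAt(n);
        return res;
    }
};

// An array row is one offset plus the sizes of its elements in the nested column.
struct MiniArrayColumn : MiniColumn
{
    MiniColumnPtr nested;
    std::vector<uint64_t> offsets; // cumulative end offsets into `nested`
    MiniArrayColumn(MiniColumnPtr nested_, std::vector<uint64_t> offsets_)
        : nested(std::move(nested_)), offsets(std::move(offsets_)) {}
    uint64_t byteSizeAt(size_t n) const override
    {
        const uint64_t begin = (n == 0) ? 0 : offsets[n - 1];
        const uint64_t end = offsets[n];
        uint64_t res = sizeof(uint64_t);
        for (uint64_t i = begin; i < end; ++i)
            res += nested->byteSizeAt(i);
        return res;
    }
};
```

Because each composite type only delegates to its children, a combination such as Array(Tuple(Nullable(FixedString(N)), ...)) needs no special case.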
alexey-milovidov
left a comment
Some changes needed.
@alexey-milovidov
@pingyu I also decided to slightly change the implementation approach (adding a method to IColumn); I hope you will appreciate it.
@alexey-milovidov
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added function byteSize to estimate the uncompressed byte size of its arguments in memory. E.g. for a UInt32 argument it will return the constant 4, for a String argument - the string length + 9.
The function can take multiple arguments. The typical application is byteSize(*).
Close #17540