Skip to content

Implement function byteSize#18579

Merged
alexey-milovidov merged 14 commits intoClickHouse:masterfrom
pingyu:func_byte_size
Jan 3, 2021
Merged

Implement function byteSize#18579
alexey-milovidov merged 14 commits intoClickHouse:masterfrom
pingyu:func_byte_size

Conversation

@pingyu
Copy link
Copy Markdown
Contributor

@pingyu pingyu commented Dec 28, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added function byteSize to estimate of uncompressed byte size of its arguments in memory.
E.g. for UInt32 argument it will return constant 4, for String argument - the string length + 9.
The function can take multiple arguments. The typical application is byteSize(*).

Close #17540

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Dec 28, 2020
}

String getName() const override { return name; }
bool isDeterministic() const override { return false; }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?


ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
auto result_col = ColumnUInt64::create(input_rows_count, 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe avoid zero initialization by filling with the size of the first argument?

else if (byteSizeByColumn(data_type, column, vec_res))
;
else
LOG_WARNING(&Poco::Logger::get("FunctionByteSize"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should throw exception.

"byteSize for \"{}\" is not supported.", data_type->getName());
}

static bool byteSizeByDataType(const IDataType * data_type, UInt64 & byte_size)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is
IDataType::isValueUnambiguouslyRepresentedInFixedSizeContiguousMemoryRegion
and
IDataType::getSizeOfValueInMemory

return false;
}

static bool byteSizeByTypeId(TypeIndex type_id, UInt64 & byte_size)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will allow to remove this code...

ColumnString::Offset prev_offset = 0;
for (size_t i = 0; i < vec_size; ++i)
{
vec_res[i] += offsets[i] - prev_offset + sizeof(offsets[0]);
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov Dec 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor issue with type aliasing.
As vec_res and offsets have the same type, compiler cannot rely to the fact that they are not aliased (aka not __restrict). It will do extra load of offsets[i] from memory.

To solve, you can save offsets[i] to the temporary variable on stack.

for (size_t i = 0; i < vec_size; ++i)
vec_res[i] += byte_size;
}
else if (byteSizeByColumn(data_type, column, vec_res))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How constant columns are processed?


static UInt64 byteSizeForNestedItem(const IColumn & column, size_t idx)
{
if (const ColumnString * col_str = checkAndGetColumn<ColumnString>(&column))
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov Dec 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quite strange that only a few types are supported for array elements.
What about Array(Tuple(Nullable(FixedString(N)), ...))?

Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes needed.

@pingyu
Copy link
Copy Markdown
Contributor Author

pingyu commented Jan 1, 2021

Some changes needed.

@alexey-milovidov
All comments addressed. PTAL, thanks~

@alexey-milovidov alexey-milovidov merged commit 4cf7f0c into ClickHouse:master Jan 3, 2021
@alexey-milovidov
Copy link
Copy Markdown
Member

@pingyu I also decided to slightly change the way of implementation (add a method to IColumn), hope you will appreciate.

@pingyu
Copy link
Copy Markdown
Contributor Author

pingyu commented Jan 3, 2021

@pingyu I also decided to slightly change the way of implementation (add a method to IColumn), hope you will appreciate.

@alexey-milovidov
The byteSizeAt(size_t n)is very nice, much better than if-else(s).
I have learnt a lot in my first contribution. Thank you very much~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Function byteSize

3 participants