Skip to content

Fix race in data type Object#35409

Merged
CurtizJ merged 3 commits intoClickHouse:masterfrom
CurtizJ:dynamic-columns-3
Mar 19, 2022
Merged

Fix race in data type Object#35409
CurtizJ merged 3 commits intoClickHouse:masterfrom
CurtizJ:dynamic-columns-3

Conversation

@CurtizJ
Copy link
Copy Markdown
Member

@CurtizJ CurtizJ commented Mar 18, 2022

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Mar 18, 2022
@CurtizJ
Copy link
Copy Markdown
Member Author

CurtizJ commented Mar 18, 2022

The problem was that simdjson parser is not thread safe, so we cannot share ptr to serialization.

@CurtizJ CurtizJ added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Mar 18, 2022
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some general approach like:

  1. ISerialization is presumed to be not thread safe, so you cannot reuse it in multiple threads for any data types.
    Also you can highlight it explicitly by making its method non-const (not sure if it is possible)...

  2. Or we can state that it is thread safe, but for simdjson, specifically, we will maintain an ObjectPool inside. This is similar to what we do for re2_st.

@alexey-milovidov alexey-milovidov self-assigned this Mar 18, 2022
@alexey-milovidov
Copy link
Copy Markdown
Member

Also need to check why this problem was not covered by stress tests...
Maybe we need to add more randomization?

@CurtizJ
Copy link
Copy Markdown
Member Author

CurtizJ commented Mar 18, 2022

Maybe we can just recreate parser on each row, because it's a quite light object and we don't use any functionality like reserve or smth.

@alexey-milovidov
Copy link
Copy Markdown
Member

It's at least one extra virtual call and allocation (unique_ptr).

@CurtizJ
Copy link
Copy Markdown
Member Author

CurtizJ commented Mar 18, 2022

But locking a mutex to get an object from pool for each row is also bad.

@CurtizJ
Copy link
Copy Markdown
Member Author

CurtizJ commented Mar 18, 2022

Actually I want to keep ISerialization thread safe. The problem is that parsing is implemented by calling deserializeText... for each row and the shared objects are the ISerialization class itself and IColumn. Maybe we can put parser to ColumnObject, since it's already thread unsafe. But it kinda breaks the semantics of separation data types, serializations and columns.

@CurtizJ CurtizJ force-pushed the dynamic-columns-3 branch from 676789e to d26ba35 Compare March 19, 2022 00:26
@CurtizJ
Copy link
Copy Markdown
Member Author

CurtizJ commented Mar 19, 2022

@alexey-milovidov

I've added a pool of parsers and optimized performance of inserts a bit more.

if (!is_nullable && info.have_nulls)
field = applyVisitor(FieldVisitorReplaceNull(base_type->getDefault(), value_dim), std::move(field));

auto value_type = createArrayOfType(base_type, value_dim);
Copy link
Copy Markdown
Member Author

@CurtizJ CurtizJ Mar 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here was one of the performance issues, because many excessive DataTypePtr objects were created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants