Fix race in data type Object#35409
Conversation
|
The problem was that simdjson parser is not thread safe, so we cannot share ptr to serialization. |
There was a problem hiding this comment.
We need some general approach like:
-
ISerialization is presumed to be not thread safe, so you cannot reuse it in multiple threads for any data types.
Also you can highlight it explicitly by making its method non-const (not sure if it is possible)... -
Or we can state that it is thread safe, but for simdjson, specifically, we will maintain an ObjectPool inside. This is similar to what we do for
re2_st.
|
Also need to check why this problem was not covered by stress tests... |
|
Maybe we can just recreate parser on each row, because it's a quite light object and we don't use any functionality like |
|
It's at least one extra virtual call and allocation (unique_ptr). |
|
But locking a mutex to get an object from pool for each row is also bad. |
|
Actually I want to keep |
676789e to
d26ba35
Compare
|
I've added a pool of parsers and optimized performance of inserts a bit more. |
| if (!is_nullable && info.have_nulls) | ||
| field = applyVisitor(FieldVisitorReplaceNull(base_type->getDefault(), value_dim), std::move(field)); | ||
|
|
||
| auto value_type = createArrayOfType(base_type, value_dim); |
There was a problem hiding this comment.
Here was one of the performance issues, because many excessive DataTypePtr objects were created.
Backport #35409 to 22.3: Fix race in data type `Object`
Changelog category (leave one):