Add schema inference to MongoDB engine#54638
Add schema inference to MongoDB engine#54638evillique wants to merge 9 commits intoClickHouse:masterfrom
Conversation
|
This is an automated comment for commit 78bc5f4 with description of existing statuses. It's updated for the latest CI running ✅ Click here to open a full report in a separate page Successful checks
|
| Poco::MongoDB::Element::Ptr child = parent->get(static_cast<int>(idx)); | ||
| types.push_back(getDataTypeMongoDB(*child)); | ||
| } | ||
| return std::make_shared<DataTypeTuple>(types); |
There was a problem hiding this comment.
As I can see we don't support reading Array with different types as Tuple from Mongo. It's a good idea to implement it later, but now we should check that all nested types are the same or have common type (using getLeastSuperType) and throw an exception if there is no common type or create an Array(comon_type) otherwise.
There was a problem hiding this comment.
Maybe I miss somethingm but I still don't see support for reading Array with different types as Tuple from Mongo
| Field field = Array(); | ||
| Array & arr = field.get<Array &>(); | ||
| arr.reserve(size); | ||
|
|
||
| while (!arrays.empty()) | ||
| for (size_t child_idx = 0; child_idx < size; ++child_idx) | ||
| { | ||
| size_t dimension_idx = arrays.size() - 1; | ||
| Poco::MongoDB::Element::Ptr child = parent->get(static_cast<int>(child_idx)); | ||
| arr.push_back(getValueField(nested_type, *child, name)); | ||
| } | ||
|
|
||
| if (dimension_idx + 1 > expected_dimensions) | ||
| throw Exception(ErrorCodes::BAD_ARGUMENTS, "Got more dimensions than expected"); | ||
| assert_cast<ColumnArray &>(column).insert(field); | ||
| break; |
There was a problem hiding this comment.
Working with Fields during insert is really suboptimal. Why don't just use insertValue function recursively here on nested column and Mongo array elements?
|
|
||
| dimensions[dimension_idx].emplace_back(Array(dimensions[dimension_idx + 1].begin(), dimensions[dimension_idx + 1].end())); | ||
| dimensions[dimension_idx + 1].clear(); | ||
| for (const auto col_num : collections::range(0, column_tuple.tupleSize())) |
There was a problem hiding this comment.
Just curious why not just for (size_t i = 0; i != column_tuple.tupleSize(); ++i) ?
| void insertValue( | ||
| IColumn & column, | ||
| const IDataType & data_type, | ||
| const ValueType type, | ||
| const Poco::MongoDB::Element & value, | ||
| const std::string & name, |
There was a problem hiding this comment.
Really interesting why do we need this ValueType at all. I would just remove it completely and use only our IDataType
| arrays.emplace_back(child.get(), 0); | ||
| } | ||
| else if (child->type() == Poco::MongoDB::ElementTraits<Poco::MongoDB::NullValue>::TypeId) | ||
| if (nested_value.isNull() || nested_value->type() == Poco::MongoDB::ElementTraits<Poco::MongoDB::NullValue>::TypeId) |
There was a problem hiding this comment.
This logic with Null value and Nullable column below repeats logic from generate() method. Let's just do it in the begining of the insertValue function as we can call it recursively now
|
Could we merge this? This would make the clickhouse-mongodb connection a lot better |
|
@Apidcloud could you resolve conflicts? |
|
Would love if this PR could get across the finish line. It's a big limitation not being able to work with Object and Array fields. |
|
Dear @Avogar, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
|
Any updates on this? |
|
Dear @Avogar, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
|
Dear @Avogar, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
|
Good feature, let's continue! |
|
Obsolete. |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add schema inference to MongoDB engine. Closes #51933
Add support for nested structures as Tuples. Closes #10919, reference #51929
Documentation entry for user-facing changes