Add Simple Streaming implementation#57830
Add Simple Streaming implementation#57830Michicosun wants to merge 89 commits intoClickHouse:masterfrom
Conversation
…/42990/simple_streaming
…/42990/simple_streaming
|
This is an automated comment for commit 210d0b7 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
…/42990/simple_streaming
…/42990/simple_streaming
0b00b24 to
3622f90
Compare
…/42990/simple_streaming
…/42990/simple_streaming
…/42990/simple_streaming
…/42990/simple_streaming
al13n321
left a comment
There was a problem hiding this comment.
The changes look great! Sorry for the slow response. Please resolve the conflicts, update SettingsChangesHistory.h, optionally address the new nitpicks, and should be good to go.
| { | ||
| buffer << "stream: " << has_stream; | ||
|
|
||
| buffer << "final: " << has_final; |
There was a problem hiding this comment.
| buffer << "final: " << has_final; | |
| buffer << ", final: " << has_final; |
| buffer << "STREAM"; | ||
|
|
||
| if (has_final) | ||
| buffer << "FINAL"; |
There was a problem hiding this comment.
Need a space between STREAM and FINAL, and update the conditions for the two buffer << ' ' calls below (or extract it into a helper lambda add_space_if_needed() or something). (Though ... STREAM FINAL is not a valid query, it's not obvious that this method can't get called before the query is rejected.)
| }}, | ||
| {"24.3", {{"s3_connect_timeout_ms", 1000, 1000, "Introduce new dedicated setting for s3 connection timeout"}, | ||
| {"allow_experimental_shared_merge_tree", false, true, "The setting is obsolete"}, | ||
| {"allow_experimental_streaming", false, false, "Allow to use Streaming Queries"}, |
There was a problem hiding this comment.
Need to move it to the current month. (We'll probably merge this within a month, so it won't have to be moved again.)
|
|
||
| if (table_expression_modifiers && table_expression_modifiers->hasStream()) | ||
| throw Exception(ErrorCodes::ILLEGAL_PREWHERE, | ||
| "PREWHERE does not supported for Streaming Queries"); |
There was a problem hiding this comment.
| "PREWHERE does not supported for Streaming Queries"); | |
| "PREWHERE is not supported for Streaming Queries"); |
| storage->getStorageID().getNameForLogs()); | ||
| if (table_node->hasTableExpressionModifiers() && table_node->getTableExpressionModifiers()->hasStream()) | ||
| throw Exception(ErrorCodes::ILLEGAL_PREWHERE, | ||
| "PREWHERE does not supported for Streaming Queries"); |
There was a problem hiding this comment.
| "PREWHERE does not supported for Streaming Queries"); | |
| "PREWHERE is not supported for Streaming Queries"); |
| storage->getStorageID().getNameForLogs()); | ||
| if (table_function_node->hasTableExpressionModifiers() && table_function_node->getTableExpressionModifiers()->hasStream()) | ||
| throw Exception(ErrorCodes::ILLEGAL_PREWHERE, | ||
| "PREWHERE does not supported for Streaming Queries"); |
There was a problem hiding this comment.
| "PREWHERE does not supported for Streaming Queries"); | |
| "PREWHERE is not supported for Streaming Queries"); |
| getStorageID().getFullTableName(), data_path_in_backup); | ||
| } | ||
|
|
||
| StorageSnapshotPtr IStorage::getStorageSnapshot(const StorageMetadataPtr & metadata_snapshot, ContextPtr query_context) const |
There was a problem hiding this comment.
Can't these 3 methods just use default argument value instead of separate overloads?
virtual StorageSnapshotPtr getStorageSnapshot(const StorageMetadataPtr & metadata_snapshot, ContextPtr query_context, const StorageSnapshotSettings & additional_settings = {}) const;
There was a problem hiding this comment.
actually it is because of clang-tidy warning: Default arguments on virtual or override methods are prohibited
| /// if block is added: push current_block to all subscribers under the same lock as commit. | ||
| /// But there may be a race between insert and creating a new subscription, in which case | ||
| /// the block may already have been cleared, but this is normal, because this is concurrent operations. |
There was a problem hiding this comment.
I would've liked to know how this translates to actual guarantees provided by streaming queries? Is this correct?:
| /// if block is added: push current_block to all subscribers under the same lock as commit. | |
| /// But there may be a race between insert and creating a new subscription, in which case | |
| /// the block may already have been cleared, but this is normal, because this is concurrent operations. | |
| /// If block is added: push current_block to all subscribers under the same lock as commit. | |
| /// This ensures that streaming query receives each block at most once: either from the | |
| /// initial read or from stream. | |
| /// But there may be a race between insert and creating a new subscription, in which case | |
| /// the block may already have been cleared, but this is normal: it's ok for streaming query | |
| /// to miss a block that was inserted concurrently with starting the streaming query. |
There was a problem hiding this comment.
That is absolutely correct! The main idea that in case of concurrent insert (block P) and starting streaming query, it may not read the block P.
|
@al13n321 thank you for review!! But I'm not recommend you to merge this pr, because:
I see that issues are not critical, so maybe you can also check out final pr? 63312 I just merged this pr and queue mode into single and above that implemented cursors. Because commits are the same, I do not close previous prs, to make easier review. Sorry about this move, but I needed to provide the final implementation of streaming queries read mechanics by the end of May (At least in the form of a pull request). |
|
Dear @al13n321, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself. |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
add streaming mode for every table engine via 2 processors in pipeline
example select:
insert works as usual
Documentation entry for user-facing changes