Feat/issue 1093 python external stream#1095
Conversation
…and node specifications
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f975c04950
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3dd5602f0a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e0376d4a10
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Preserve row count for zero-column projections in PythonTableTransform. - Skip empty generator batches in PythonStreamingSource instead of terminating. - Add regression tests for both cases.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
proton/src/Interpreters/ApplyWithSubqueryVisitor.cpp
Lines 151 to 156 in ccd8b06
The new WITH-subquery rewrite only substitutes func.arguments->children[0], but python_table expects the source in its second argument (args[1] in TableFunctionPythonTable::parseArguments). As a result, a CTE alias used as the source (e.g., WITH src AS (...) SELECT * FROM python_table(py_stream, src, col1)) is left as an identifier and later resolved as a real table, producing UNKNOWN_TABLE errors. Consider extending the rewrite to handle the correct argument position for python_table (or making the position configurable per table function).
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
ee04b12 to
18a33c6
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 18a33c65f7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2bf00d9a4f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. Breezy! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5254928172
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const Py_ssize_t rows = PyList_Size(normalized.get()); | ||
| if (rows <= 0) | ||
| continue; | ||
|
|
||
| return Chunk(Columns{}, static_cast<UInt64>(rows)); |
There was a problem hiding this comment.
Validate row arity even when no columns are requested
When the output header has zero columns (e.g., SELECT count()), the streaming path returns a chunk based only on PyList_Size without checking that each row matches the declared tuple schema. If a Python generator yields rows with missing/extra elements, count() will still succeed and report a row count while SELECT * would throw, leading to inconsistent results and masking schema errors. Consider validating row shapes in this zero‑column path (or reusing the conversion routine) before emitting the count.
Useful? React with 👍 / 👎.
|
Addressed review r2735903551: even when output header has 0 columns (e.g. count-only), PythonStreamingSource now validates each row is tuple/list and its arity matches the declared tuple schema before emitting the row-count chunk. Added UT: CPythonTest.PythonStreamingSourceNoColumnsRejectsWrongArity. |
Summary
Add type='python' external streams with:
Result shape rules (must-follow)
Let the external stream schema have N columns:
Unsupported: coroutine / async generator results.
Examples
1) Basic read + write
2) Streaming generator (multi-column, yield tuple rows)
3) python_table() transform over an input source
Example to show real time hack-news posts
Compatibility / Notes