fix(sql): incorrect results produced by SAMPLE BY with mixed timestamp and aggregate expressions by puzpuzpuz · Pull Request #6254 · questdb/questdb

puzpuzpuz · 2025-10-09T14:25:54Z

SAMPLE BY rewrite fix 1

SAMPLE BY to parallel GROUP BY rewrite was not handling mixed timestamp and aggregate function expressions. This was leading to unordered and incorrect results returned from the query.

Example of a problematic query:

SELECT ts, count()::double / datediff('h', ts, dateadd('d', 1, ts, 'Europe/Copenhagen')) AS Coverage
FROM 'x'
SAMPLE BY 1d ALIGN TO CALENDAR TIME ZONE 'Europe/Copenhagen';

The above query was incorrectly rewritten into something like this (note that this is a simplified representation, without the actual GROUP BY details in the nested query):

SELECT ts, count()::double / datediff('h', ts, dateadd('d', 1, ts, 'Europe/Copenhagen')) AS Coverage
FROM (
  SELECT ts
  FROM 'x'
  SAMPLE BY 1d ALIGN TO CALENDAR TIME ZONE 'Europe/Copenhagen'
);

The fix disables the rewrite for such queries.

SAMPLE BY rewrite fix 2

The rewrite could throw away columns from the result set in case of certain expressions involving the designated timestamp column. Here is an example:

SELECT ts, ts2, ts - ts2 AS ts_diff, count() AS c
from x
SAMPLE BY 1d;

Before this patch, the last column (c) would not be returned from the query.

…p and aggregate expressions

coderabbitai · 2025-10-09T14:26:39Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Refactors SqlOptimiser temporary state to use new buffers (tempBoolList, tempIntList, tempColumns, tempColumns2) in place of prior lists, updating related methods and control paths. Adds two tests validating scenarios with timestamps mixed with aggregates: one for materialized views and one for SAMPLE BY rewrite and plans.

Changes

Cohort / File(s)	Summary
SqlOptimiser temp buffers refactor `core/src/main/java/io/questdb/griffin/SqlOptimiser.java`	Replaces groupByUsed, tempList, tmpCursorAliases, etc., with tempBoolList, tempIntList, tempColumns, tempColumns2, and tempCursorAliases. Updates clear(), addColumnToSelectModel, sample-by/group-by rewrite paths, and index/boolean tracking logic to use new buffers.
Materialized view timestamp+aggregate test `core/src/test/java/io/questdb/test/cairo/mv/MatViewTest.java`	Adds testQueryTimestampMixedWithAggregates to create a table with TIMESTAMP, build MV mixing timestamp with aggregate, and assert MV metadata/state and query results.
SAMPLE BY rewrite timestamp+aggregate test `core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java`	Adds testSampleByRewriteTimestampMixedWithAggregates to verify plan and results for SAMPLE BY with timestamp handling, including calendar/first-observation alignment, timezone/offset, and subquery timestamp usage.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant P as Parser
  participant O as SqlOptimiser
  participant E as Execution Engine

  C->>P: Submit SQL with SAMPLE BY / aggregates
  P->>O: AST
  rect rgba(230,245,255,0.6)
    note over O: Rewrite and planning (updated buffers)
    O->>O: Initialize tempBoolList / tempIntList / tempColumns / tempColumns2
    O->>O: Analyze group-by / sample-by predicates
    O->>O: Track timestamp and aggregate flags via tempBoolList
    O->>O: Manage column/index sets via tempColumns* and tempIntList
    O-->>P: Physical plan
  end
  P-->>E: Plan
  E-->>C: Result set (sampled/aggregated)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)

Check name	Status	Explanation	Resolution
Linked Issues Check	⚠️ Warning	The linked issue demands disabling the SAMPLE BY to GROUP BY rewrite when timestamp expressions are mixed with aggregates to preserve row order, but the code changes summarized focus exclusively on introducing new internal buffers and refactoring SqlOptimiser state without showing the actual guard logic that implements the disable.	Please include or highlight the added logic in SqlOptimiser that conditionally skips the parallel GROUP BY rewrite for mixed timestamp and aggregate queries, update the code summary to reflect this change, and ensure the new tests verify the disabled path produces correctly ordered results.
Out of Scope Changes Check	⚠️ Warning	The pull request introduces extensive refactoring of SqlOptimiser’s internal temporary buffers and state management which is not mentioned in the linked issue objectives and appears unrelated to simply disabling the rewrite for mixed timestamp and aggregate expressions.	Consider separating the buffer refactoring into its own pull request or clearly document how these changes are necessary to support the rewrite-disable fix so reviewers can focus on the bug resolution without unrelated modifications.
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.52% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title succinctly describes the primary change as a fix for incorrect results produced by SAMPLE BY when mixing timestamp and aggregate expressions, directly matching the linked issue and highlighting the core problem being addressed.
Description Check	✅ Passed	The pull request description clearly explains two distinct fixes for SAMPLE BY rewrite issues, references the linked issue #6239, and provides concrete SQL examples that match the changes summarized in the code diffs. It directly relates to the modifications in SqlOptimiser and the added tests, and it accurately conveys the intent and scope of the patch. The information provided is specific and on-topic rather than generic or unrelated.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…sample_by

bluestreak01 · 2025-10-09T15:52:49Z

@CodeRabbit review

coderabbitai · 2025-10-09T15:53:40Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

core/src/main/java/io/questdb/griffin/SqlOptimiser.java (1)
5368-5375: Prevent dropping existing projection columns

needRemoveColumns is incremented for every entry in missingDependedTokens, even when addBottomUpColumnIfNotExists does not add anything because the dependency column is already present. In those cases the wrapper constructed later subtracts too many columns and removes legitimate user-selected columns from the projection. Example: SELECT ts, value, ts + value FROM x SAMPLE BY ... ends up returning only ts because value gets “removed” despite never being reintroduced.

Only mark a dependency for later removal when we actually insert it:
-        for (int i = 0, size = missingDependedTokens.size(); i < size; i++) {
-            model.addBottomUpColumnIfNotExists(nextColumn(missingDependedTokens.get(i)));
-        }
-        needRemoveColumns += missingDependedTokens.size();
+        for (int i = 0, size = missingDependedTokens.size(); i < size; i++) {
+            final CharSequence dependency = missingDependedTokens.get(i);
+            if (model.getAliasToColumnMap().excludes(dependency)) {
+                model.addBottomUpColumn(nextColumn(dependency));
+                needRemoveColumns++;
+            }
+        }
This keeps existing projection columns intact while still cleaning up the temporary ones.

🧹 Nitpick comments (1)

core/src/test/java/io/questdb/test/cairo/mv/MatViewTest.java (1)

4314-4334: Don’t pre-sort the results; assert natural ordering.

Issue #6239 is about unordered rows. Passing "ts" to assertQueryNoLeakCheck sorts the actual rows in the harness, masking ordering bugs. Compare without sorting to validate the engine’s natural order.

Apply this diff:

-            assertQueryNoLeakCheck(
-                    """
-                            ts\tCoverage
-                            2009-12-31T23:00:00.000000Z\t0.08333333333333333
-                            2019-12-31T23:00:00.000000Z\t0.041666666666666664
-                            2029-12-31T23:00:00.000000Z\t0.041666666666666664
-                            """,
-                    "x_view",
-                    "ts",
-                    true,
-                    true
-            );
+            assertQueryNoLeakCheck(
+                    """
+                            ts\tCoverage
+                            2009-12-31T23:00:00.000000Z\t0.08333333333333333
+                            2019-12-31T23:00:00.000000Z\t0.041666666666666664
+                            2029-12-31T23:00:00.000000Z\t0.041666666666666664
+                            """,
+                    "x_view"
+            );

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dbbc7ba and 99445c3.

📒 Files selected for processing (3)

core/src/main/java/io/questdb/griffin/SqlOptimiser.java (20 hunks)
core/src/test/java/io/questdb/test/cairo/mv/MatViewTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)

GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
GitHub Check: New pull request (Hosted Running tests on windows-other)
GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
GitHub Check: New pull request (Hosted Running tests on windows-cairo)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
GitHub Check: New pull request (Hosted Running tests on windows-griffin)
GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
GitHub Check: New pull request (Hosted Running tests on mac-other)
GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
GitHub Check: New pull request (Hosted Running tests on mac-cairo)
GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
GitHub Check: New pull request (Hosted Running tests on mac-griffin)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
GitHub Check: build
GitHub Check: build
GitHub Check: New pull request (Check Changes Check changes)

🔇 Additional comments (2)

core/src/test/java/io/questdb/test/cairo/mv/MatViewTest.java (1)

4294-4336: Solid regression test for mixed ts + aggregates in SAMPLE BY.

Covers the reported case well and validates MV refresh/state. Looks good.

core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java (1)

6128-6162: Great regression coverage for mixed timestamp rewrite

Appreciate how this locks in both the ordered results and the expected plan shape when aggregates coexist with the sampled timestamp—exactly the scenario from #6239. Solid guard against the rewrite regressing again.

…sample_by

core/src/main/java/io/questdb/griffin/SqlOptimiser.java

puzpuzpuz · 2025-10-09T16:56:42Z

Some comments are outside the diff and can’t be posted inline due to platform limitations.

This comment was spot-on. The fix is in 4a97543

bluestreak01

while the result is correct, the execution plan is highly sub-optimal. this should do sub-query extraction instead. E.g. aggregate on ts, count() and do calculations on top. Right now the whole thing degrades in performance massively.

puzpuzpuz · 2025-10-09T17:31:50Z

Right now the whole thing degrades in performance massively.

This is not degradation as such queries were not supported by the rewrite. Better to have correct results rather than get some garbage fast. The extraction must be implemented optimally, i.e. with aggregate reuse, so I suggest merging this patch to get a couple critical bugs fixed while I'm going to start working on mixed ts + aggregate expressions support in a separate PR. WDYT?

puzpuzpuz · 2025-10-10T17:49:01Z

Failed test is #6266, unrelated with these changes. I can reproduce locally, so going to fix it in a separate branch.

glasstiger · 2025-10-10T18:16:19Z

[PR Coverage check]

😍 pass : 61 / 63 (96.83%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/griffin/SqlOptimiser.java	61	63	96.83%

fix(sql): incorrect results produced by SAMPLE BY with mixed timestam…

20725bb

…p and aggregate expressions

puzpuzpuz self-assigned this Oct 9, 2025

puzpuzpuz added Bug Incorrect or unexpected behavior SQL Issues or changes relating to SQL execution labels Oct 9, 2025

puzpuzpuz added 2 commits October 9, 2025 18:34

Merge remote-tracking branch 'upstream/master' into puzpuzpuz_broken_…

2abe1a3

…sample_by

Fix index usage

99445c3

coderabbitai bot reviewed Oct 9, 2025

View reviewed changes

Merge remote-tracking branch 'upstream/master' into puzpuzpuz_broken_…

e9a6adc

…sample_by

bluestreak01 reviewed Oct 9, 2025

View reviewed changes

core/src/main/java/io/questdb/griffin/SqlOptimiser.java Show resolved Hide resolved

bluestreak01 reviewed Oct 9, 2025

View reviewed changes

core/src/main/java/io/questdb/griffin/SqlOptimiser.java Show resolved Hide resolved

Fix bug found by coderabbit

4a97543

Address nit

eea9178

bluestreak01 requested changes Oct 9, 2025

View reviewed changes

puzpuzpuz and others added 3 commits October 10, 2025 16:03

Improve tests

e98217b

Remove redundant hash set

e89153b

Merge branch 'master' into puzpuzpuz_broken_sample_by

166ddf8

bluestreak01 approved these changes Oct 10, 2025

View reviewed changes

bluestreak01 merged commit 61278ee into master Oct 10, 2025
35 checks passed

bluestreak01 deleted the puzpuzpuz_broken_sample_by branch October 10, 2025 22:39

coderabbitai bot mentioned this pull request Oct 15, 2025

fix(sql): breaking change 💥- invalid column error returned from GROUP BY on joined tables #6275

Merged

coderabbitai bot mentioned this pull request Dec 18, 2025

fix(sql): allow aggregate functions inside CASE expressions #6550

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sql): incorrect results produced by SAMPLE BY with mixed timestamp and aggregate expressions#6254

fix(sql): incorrect results produced by SAMPLE BY with mixed timestamp and aggregate expressions#6254
bluestreak01 merged 9 commits intomasterfrom
puzpuzpuz_broken_sample_by

puzpuzpuz commented Oct 9, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Oct 9, 2025 •

edited

Loading

Review skipped

Uh oh!

bluestreak01 commented Oct 9, 2025

Uh oh!

coderabbitai bot commented Oct 9, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

puzpuzpuz commented Oct 9, 2025

Uh oh!

bluestreak01 left a comment

Uh oh!

puzpuzpuz commented Oct 9, 2025

Uh oh!

puzpuzpuz commented Oct 10, 2025

Uh oh!

glasstiger commented Oct 10, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

puzpuzpuz commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

SAMPLE BY rewrite fix 1

SAMPLE BY rewrite fix 2

Uh oh!

coderabbitai bot commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

bluestreak01 commented Oct 9, 2025

Uh oh!

coderabbitai bot commented Oct 9, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

puzpuzpuz commented Oct 9, 2025

Uh oh!

bluestreak01 left a comment

Choose a reason for hiding this comment

Uh oh!

puzpuzpuz commented Oct 9, 2025

Uh oh!

puzpuzpuz commented Oct 10, 2025

Uh oh!

glasstiger commented Oct 10, 2025

[PR Coverage check]

file detail

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

puzpuzpuz commented Oct 9, 2025 •

edited

Loading

coderabbitai bot commented Oct 9, 2025 •

edited

Loading