
feat(sql): use vectorized computation in more non-keyed GROUP BY queries #6805

Merged
bluestreak01 merged 17 commits into master from puzpuzpuz_vect_group_by
Mar 9, 2026

Conversation

@puzpuzpuz
Contributor

@puzpuzpuz puzpuzpuz commented Feb 23, 2026

Vectorized execution is now applied to more non-keyed GROUP BY queries like

SELECT first(price), last(price) FROM trades;

Internally, the GroupByFunction#computeBatch() API is now used for all queries without a filter. Previously, this API was used only in WINDOW JOIN.
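For illustration, the batch-oriented aggregation idea can be sketched as a minimal, self-contained analogue. The class and method names below are invented for the sketch and are not QuestDB's actual implementation:

```java
// Hypothetical sketch: a non-keyed SUM that can consume a whole column batch
// in one call instead of accumulating row by row.
public class BatchSumSketch {
    private double sum = Double.NaN; // NaN is the "no value seen yet" sentinel

    // Row-at-a-time path.
    public void computeNext(double value) {
        sum = Double.isNaN(sum) ? value : sum + value;
    }

    // Batch path: one call covers `count` contiguous values starting at `offset`.
    public void computeBatch(double[] values, int offset, int count) {
        double batchSum = 0;
        for (int i = 0; i < count; i++) {
            batchSum += values[offset + i];
        }
        sum = Double.isNaN(sum) ? batchSum : sum + batchSum;
    }

    public double sum() {
        return sum;
    }
}
```

With two sequential batches {1, 2, 3, 4} and {5, 6}, sum() returns 21.0. Handing the aggregate a contiguous range rather than one row at a time is what gives the engine room to apply vectorized kernels.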

@puzpuzpuz puzpuzpuz self-assigned this Feb 23, 2026
@puzpuzpuz puzpuzpuz added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Feb 23, 2026
@coderabbitai

coderabbitai bot commented Feb 23, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5cb68041-225e-4d0a-8d5f-a796350bbaca

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Extended GroupByFunction.computeBatch to accept a long startRowId, and updated many group-by implementations and call sites to support batch-aware, cumulative, row-id-ordered aggregation semantics. Added vectorized batch execution paths, and updated tests and factories to exercise accumulation across sequential batches.

Changes

Cohort / File(s) Summary
Core API
core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java
Changed computeBatch(MapValue, long ptr, int count) → computeBatch(MapValue, long ptr, int count, long startRowId) and updated docs.
Batch-aware group-by implementations
core/src/main/java/io/questdb/griffin/engine/functions/groupby/*
Many group-by classes updated to new computeBatch(..., long startRowId) signature. Changes include: conditional first/last selection based on row id, offset-based first-not-null/last-not-null logic, guarded min/max updates, and accumulative sum/avg/count semantics (switches from put->add or read/compare-before-write).
Async non-keyed group-by / vectorization
core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java, AsyncGroupByNotKeyedRecordCursorFactory.java
Added batchColumnIndexes, vectorized execution path, constructor/API changes to carry batch eligibility, and new aggregateVect flow for batch processing.
Join / window call sites
core/src/main/java/io/questdb/griffin/engine/join/AsyncWindowJoin*.java, WindowJoinFastRecordCursorFactory.java
Updated all calls to GroupByFunction.computeBatch to pass the new startRowId argument (often 0 at call sites).
Tests — batch behavior & vectorized plans
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/*BatchTest.java, GroupByTest.java, ParallelGroupByFuzzTest.java, ExplainPlanTest.java, etc.
Updated many tests to call new computeBatch signature, added numerous accumulation tests for First/Last/NotNull/Min/Max/Sum/Avg/Count variants, adjusted expected plans to include vectorized flag, and removed JIT gating in some tests.
Utilities / small fixes
assorted files
Minor import and NaN/LONG_NULL handling adjustments (e.g., Numbers import additions/removals), Unsafe usages for timestamp batches, and small refactors to support new semantics.
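To make the startRowId extension concrete, here is a hedged sketch (invented names, not QuestDB's actual classes) of how a first() aggregate can use the batch's global starting row id to stay order-correct even when batches arrive out of order:

```java
// Hypothetical sketch of a row-id-ordered first() aggregate over long values.
public class FirstLongSketch {
    private static final long ROW_ID_NULL = Long.MIN_VALUE; // "empty" marker
    private long firstRowId = ROW_ID_NULL;
    private long firstValue;

    // Each batch carries the global row id of its first element, so batches
    // processed in any order still agree on which value is "first".
    public void computeBatch(long[] values, int count, long startRowId) {
        if (count > 0 && (firstRowId == ROW_ID_NULL || startRowId < firstRowId)) {
            firstRowId = startRowId;
            firstValue = values[0];
        }
    }

    public long value() {
        return firstValue;
    }
}
```

Feeding the batch that starts at row id 100 before the one that starts at row id 0 still yields the value at row 0, which is the property the new parameter exists to guarantee.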

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • puzpuzpuz
  • bluestreak01


@glasstiger glasstiger self-requested a review March 4, 2026 10:34
@glasstiger
Contributor

glasstiger commented Mar 4, 2026

@puzpuzpuz, LGTM, but Claude is concerned about test coverage:

  - No explicit test for the column-top fallback path in aggregateVect — the code falls back to row-by-row for frames with column tops, but I don't see a dedicated test
  - No test for AvgDoubleGroupByFunction batch accumulation across frames — this is the most complex accumulation logic due to sumDoubleAcc's side effects
  - No concurrency/stress test specifically targeting the new aggregateVect path with multiple workers

The last one is probably ok, the fuzz test exercises that path.
I guess fuzz test covers the first two as well, but not sure.

@puzpuzpuz
Contributor Author

@puzpuzpuz, LGTM, but Claude is concerned about test coverage

@glasstiger all three concerns are already covered by existing tests:

  1. Column-top fallback path: testNonKeyedVectorizedColumnTops() in GroupByTest.java explicitly tests this. It creates a table, adds a new column via ALTER TABLE (which creates column tops on old partitions), inserts new data, and verifies correct results including NULLs from column tops.

  2. AvgDoubleGroupByFunction batch accumulation: testNonKeyedVectorizedAllBatchEligible() tests avg(v) in a vectorized non-keyed GROUP BY. Additionally, AvgDoubleGroupByFunctionFactoryTest covers avg with NULLs, empty inputs, and infinity values. The column-tops test also exercises avg across frames since the fallback mixes with batch processing.

  3. Concurrency/stress test: ParallelGroupByFuzzTest is a property-based fuzz test that generates random configurations with varying page frame sizes and worker counts, covering both keyed and non-keyed GROUP BY.

glasstiger previously approved these changes Mar 4, 2026
@bluestreak01
Member

@CodeRabbit can you critically review this PR and find bugs

@coderabbitai

coderabbitai bot commented Mar 4, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get PR diff
gh pr view 6805 --repo questdb/questdb --json files,title,body | jq -r '.files[].path' | head -80

Length of output: 7645


🏁 Script executed:

#!/bin/bash
gh pr diff 6805 --repo questdb/questdb 2>/dev/null | head -2000

Length of output: 107961


🏁 Script executed:

#!/bin/bash
gh pr diff 6805 --repo questdb/questdb 2>/dev/null | tail -n +2001 | head -2000

Length of output: 102697


@bluestreak01 I've done a thorough review of the diff. Here are the concrete bugs I found, from most to least severe:


🔴 Bug 1: AvgDoubleGroupByFunction.computeBatch — accumulated Infinity is silently discarded

File: AvgDoubleGroupByFunction.java

final double prevSum = mapValue.getDouble(valueIndex);
if (Numbers.isFinite(prevSum)) {
    mapValue.putDouble(valueIndex, prevSum + batchSum);
} else {
    mapValue.putDouble(valueIndex, batchSum);  // ← triggered when prevSum = Infinity
}

Numbers.isFinite() returns false for both NaN (the empty/null sentinel set by setEmpty) and ±Infinity (a legitimate result when summing many large doubles). If the running sum legitimately reaches +Infinity across earlier frames, the else branch fires and overwrites it with batchSum, silently discarding the accumulated state.

The check must distinguish "not yet initialised" from "overflowed to Infinity". The correct guard is:

if (!Numbers.isNull(prevSum)) {   // NaN is null; Infinity is a valid (if extreme) sum
    mapValue.putDouble(valueIndex, prevSum + batchSum);
} else {
    mapValue.putDouble(valueIndex, batchSum);
}
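The distinction this suggested guard relies on can be checked directly with the standard java.lang.Double predicates (a self-contained demonstration, independent of QuestDB's Numbers helper):

```java
public class SentinelCheck {
    public static void main(String[] args) {
        double nanSentinel = Double.NaN;              // the "empty" marker
        double overflowed = Double.POSITIVE_INFINITY; // a legitimate overflowed sum

        // isFinite conflates the two states: both report false.
        System.out.println(Double.isFinite(nanSentinel)); // false
        System.out.println(Double.isFinite(overflowed));  // false

        // isNaN separates them: only the sentinel reports true.
        System.out.println(Double.isNaN(nanSentinel));    // true
        System.out.println(Double.isNaN(overflowed));     // false
    }
}
```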

🔴 Bug 2: SumDoubleGroupByFunction.computeBatch — identical Infinity-erasure bug

File: SumDoubleGroupByFunction.java

final double existing = mapValue.getDouble(valueIndex);
if (Numbers.isFinite(existing)) {
    mapValue.putDouble(valueIndex, existing + batchSum);
} else {
    mapValue.putDouble(valueIndex, batchSum);  // ← fires when existing = Infinity
}

Exactly the same problem. Once the running sum spills to Infinity, subsequent batches replace it instead of accumulating into it. Same fix applies: Numbers.isNull(existing) to identify the empty sentinel.


🔴 Bug 3: SumFloatGroupByFunction.computeBatch — same issue for float

File: SumFloatGroupByFunction.java

final float existing = mapValue.getFloat(valueIndex);
if (Float.isFinite(existing)) {
    mapValue.putFloat(valueIndex, existing + acc);
} else {
    mapValue.putFloat(valueIndex, acc);  // ← fires when existing = Infinity
}

Float.isFinite returns false for Float.NaN and Float.POSITIVE_INFINITY. If the running float sum reaches +Infinity, the next batch replaces it. The initial empty value is Float.NaN (set by setEmpty); the guard should be Float.isNaN(existing).


🟠 Bug 4: All window-join callers pass startRowId = 0 — semantically broken for first()/last() across multi-batch groups

Files: AsyncWindowJoinRecordCursorFactory.java, AsyncWindowJoinFastRecordCursorFactory.java, WindowJoinFastRecordCursorFactory.java

The new computeBatch signature requires a meaningful startRowId for first()/last() to resolve ordering across multiple invocations. All existing window-join call sites hardcode startRowId = 0:

groupByFunctions.getQuick(i).computeBatch(value, ..., 0);  // ← every site

For LastXxx.computeBatch, the guard is lastRowId > existingRowId. With startRowId = 0 always, batch 1 sets existingRowId = count1 - 1. Batch 2 computes lastRowId = count2 - 1. If count2 > count1, batch 2 wins — a spurious ordering victory based on batch size rather than actual row position.

While window-join today calls computeBatch only once per group (making the bug dormant), the API contract is now violated and any future change to the calling pattern will silently produce wrong first()/last() results. The actual physical row ID (derivable from rowLo) should be passed here.
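The ordering hazard described here can be illustrated with a simplified stand-in (invented names, not the actual QuestDB code): when startRowId is hardcoded to 0, the "last" row id degenerates to count - 1, so whichever batch is larger wins regardless of actual row position.

```java
// Hypothetical sketch of a row-id-ordered last() aggregate over long values.
public class LastLongSketch {
    private static final long ROW_ID_NULL = Long.MIN_VALUE; // "empty" marker
    private long lastRowId = ROW_ID_NULL;
    private long lastValue;

    public void computeBatch(long[] values, int count, long startRowId) {
        long batchLastRowId = startRowId + count - 1;
        if (count > 0 && (lastRowId == ROW_ID_NULL || batchLastRowId > lastRowId)) {
            lastRowId = batchLastRowId;
            lastValue = values[count - 1];
        }
    }

    public long value() {
        return lastValue;
    }

    public static void main(String[] args) {
        // Batch 1 holds rows 0..2 (values 10,11,12); batch 2 holds rows 3..4.
        LastLongSketch broken = new LastLongSketch();
        broken.computeBatch(new long[]{10, 11, 12}, 3, 0);
        broken.computeBatch(new long[]{13, 14}, 2, 0); // startRowId wrongly 0
        System.out.println(broken.value()); // 12 — the larger first batch "wins"

        LastLongSketch fixed = new LastLongSketch();
        fixed.computeBatch(new long[]{10, 11, 12}, 3, 0);
        fixed.computeBatch(new long[]{13, 14}, 2, 3);  // true starting row id
        System.out.println(fixed.value()); // 14 — the actual last row
    }
}
```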


🟡 Bug 5: MaxIPv4GroupByFunction — null guard inconsistency (style/latent)

File: MaxIPv4GroupByFunction.java

final long existing = Numbers.ipv4ToLong(mapValue.getIPv4(valueIndex));
if (max > existing || existing == Numbers.IPv4_NULL) {

Numbers.IPv4_NULL is typed as int, so this promotes to long 0 for the comparison. MinIPv4GroupByFunction correctly precomputes:

private static final long IPv4_NULL_AS_LONG = Numbers.ipv4ToLong(Numbers.IPv4_NULL);

Accidentally works today (both evaluate to 0L), but the asymmetry is a maintenance hazard. MaxIPv4 should use the same IPv4_NULL_AS_LONG constant as MinIPv4.
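Why this accidentally works follows from Java's binary numeric promotion: in a mixed int/long comparison the int operand is widened to long first. A minimal stand-alone check, using literal 0 values as stand-ins for the two constants:

```java
public class PromotionDemo {
    public static void main(String[] args) {
        int ipv4NullAsInt = 0;   // stand-in for the int constant Numbers.IPv4_NULL
        long nullAsLong = 0L;    // stand-in for ipv4ToLong(IPv4_NULL), per the review
        long existing = 0L;      // a null IPv4 value read back as long

        // The int is widened to 0L before comparing, so both spellings agree.
        System.out.println(existing == ipv4NullAsInt); // true
        System.out.println(existing == nullAsLong);    // true
    }
}
```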


🟡 Design concern: FirstNotNull* triple-condition guard

All FirstNotNull* variants contain:

if (rowId < existingRowId || existingRowId == Numbers.LONG_NULL
        || mapValue.getXxx(valueIndex + 1) == NULL_SENTINEL) {

The third arm (mapValue.getXxx() == NULL_SENTINEL) re-reads a value slot independently of the row-ID slot. If for any reason the value slot is null while the row-ID slot is non-null (e.g. from a partial initialisation, a future refactoring), this silently triggers an update that bypasses the row-ordering logic. It would be safer to rely exclusively on existingRowId == LONG_NULL to detect "empty state", since existingRowId and the value are always written atomically.


🧠 Learnings used
Learnt from: jerrinot
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-04T11:18:36.492Z
Learning: For PRs by jerrinot, always use a thorough and critical review mode: question design decisions, challenge assumptions, scrutinize implementations for edge cases and potential issues, and avoid superficial validation. They prefer hard-core reviews that push back on the code.

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: For questdb/questdb, javier expects a rigorous, push-back style review; always include concrete line-cited findings and propose targeted tests/fixes when reviewing javier’s PRs.

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, for PRs requested by javier, provide line-cited, push-back style reviews with concrete follow-ups (tests, diffs, and performance considerations).

@glasstiger glasstiger dismissed their stale review March 4, 2026 17:48

coderabbit mentioned bugs

@glasstiger
Contributor

What Claude thinks about the coderabbit comments:

Bugs 1-3 (Infinity erasure): Valid but extremely unlikely in practice

The bug is real — we proved it with tests. The isFinite() check conflates two states: "uninitialized" (NaN sentinel) and "overflowed to Infinity". However:

  - Triggering requires the accumulated sum to overflow to ±Infinity across batches. For doubles this means values near Double.MAX_VALUE (~1.8×10³⁰⁸). Real-world time-series data essentially never reaches these magnitudes.
  - The same bug exists in computeNext and merge for SumDouble and SumFloat — CodeRabbit didn't catch those. Interestingly, AvgDouble's computeNext and merge are fine because they use the count field to track initialization instead of isFinite().
  - The outer check has the same issue too: Numbers.isFinite(batchSum) in SumDouble silently discards an entire batch if its sum overflows to Infinity. CodeRabbit missed this.

Verdict: Correct finding, trivial fix, near-zero practical impact. Worth fixing for correctness.

Bug 4 (startRowId = 0 in window-join): Valid but dormant

Confirmed: all window-join callers hardcode startRowId = 0. But the code always calls computeBatch exactly once per window with a fresh setEmpty() before each call. The startRowId parameter only matters when accumulating across multiple calls (for first()/last() ordering), which never happens in the current window-join paths.

Verdict: Technically an API contract violation but completely dormant. A defensive fix would be nice, but it's not a bug today.

Bug 5 (MaxIPv4 null guard): Not a bug

Numbers.IPv4_NULL is int 0. When compared with long existing, Java promotes it to 0L. Numbers.ipv4ToLong(IPv4_NULL) also returns 0L. So the comparison works correctly. MinIPv4 uses a precomputed IPv4_NULL_AS_LONG constant for clarity.

Verdict: Style inconsistency, not a bug. The behavior is identical.

Design concern (FirstNotNull triple-condition): Speculative

CodeRabbit worries about partial initialization where the value slot is null but the row-ID slot isn't. This can't happen with the current code — both slots are always written together atomically. The third condition is a harmless safety net.

Verdict: Not actionable. The code is correct.
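The overflow scenario behind bugs 1-3 is easy to reproduce with plain doubles. The guard below is an illustrative stand-in for the reviewed pattern, not QuestDB code:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // Two values near Double.MAX_VALUE overflow the running sum to Infinity.
        double sum = Double.MAX_VALUE + Double.MAX_VALUE;
        System.out.println(sum);                  // Infinity
        System.out.println(Double.isFinite(sum)); // false

        // An isFinite guard now treats the sum like the empty NaN sentinel,
        // so the next batch overwrites it instead of accumulating into it.
        double nextBatchSum = 1.0;
        double guarded = Double.isFinite(sum) ? sum + nextBatchSum : nextBatchSum;
        System.out.println(guarded);              // 1.0 — accumulated state lost

        // An isNaN guard keeps the overflowed sum.
        double correct = Double.isNaN(sum) ? nextBatchSum : sum + nextBatchSum;
        System.out.println(correct);              // Infinity
    }
}
```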

@glasstiger
Contributor

@CodeRabbit re-review the PR

@coderabbitai

coderabbitai bot commented Mar 4, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch key functions in SumDoubleGroupByFunction and SumFloatGroupByFunction
fd -e java 'SumDoubleGroupByFunction.java' --exec cat {}
echo "=========="
fd -e java 'SumFloatGroupByFunction.java' --exec cat {}
echo "=========="
fd -e java 'AvgDoubleGroupByFunction.java' --exec cat {}

Length of output: 34903


@glasstiger — Having read the actual source, I need to correct my previous review. I misread the code.


Corrections to my previous review

Bugs 1, 2, 3 were wrong. After reading the actual implementations:

  • AvgDoubleGroupByFunction.computeBatch uses !Numbers.isNull(batchSum) and !Double.isNaN(prevSum) — correctly distinguishes the NaN empty-sentinel from a legitimate Infinity.
  • SumDoubleGroupByFunction.computeBatch uses !Double.isNaN(existing) — same correct logic.
  • SumFloatGroupByFunction.computeBatch uses !Float.isNaN(existing) — same correct logic.

All three computeBatch implementations handle the Infinity vs. NaN distinction correctly. I reported the wrong guard (Numbers.isFinite) when the actual code uses isNaN. I apologize for the noise.


What is actually a real (minor) new issue

Claude caught something I missed: SumDoubleGroupByFunction.computeBatch has an outer guard:

final double batchSum = Vect.sumDouble(ptr, count);
if (Numbers.isFinite(batchSum)) {   // ← entire batch silently dropped if batchSum = ±Infinity
    ...
}

If a batch of values sums to ±Infinity, the whole batch is silently discarded — no update to existing at all. SumFloatGroupByFunction doesn't have this problem because its loop accumulates per-value into acc and only sets hasFinite = true on individual finite values, never discarding Infinity-producing partial sums as a whole unit. The pre-existing computeNext and merge paths in SumDouble have the same Infinity-erasure pattern, so this is consistent with existing behavior, but it's still semantically surprising. Near-zero real-world impact.


Standing findings (unchanged)

Bug 4 (dormant API contract): All window-join call sites pass startRowId = 0 to computeBatch. Currently dormant because each call site invokes computeBatch exactly once per group after setEmpty(). The ordering semantics of first()/last() would silently break if that calling pattern ever changes.

Bug 5 (MaxIPv4): Confirmed style inconsistency only, not a behavioral bug. Withdrawing the "bug" label.


✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullDoubleGroupByFunction.java (1)

41-68: ⚠️ Potential issue | 🟡 Minor

Inconsistent null-check predicate: computeBatch uses isNull, computeNext uses isFinite.

Line 47 uses !Numbers.isNull(value) while line 63 uses Numbers.isFinite(arg.getDouble(record)). This creates a behavioral discrepancy:

Path         | NaN     | ±Infinity
computeBatch | skipped | accepted
computeNext  | skipped | skipped

For last_not_null semantics, ±Infinity is arguably a valid non-null value and should be treated consistently. The computeNext predicate is stricter and may incorrectly skip infinite values.

🔧 Suggested fix: align computeNext with computeBatch
     @Override
     public void computeNext(MapValue mapValue, Record record, long rowId) {
-        if (Numbers.isFinite(arg.getDouble(record))) {
+        double val = arg.getDouble(record);
+        if (!Numbers.isNull(val)) {
             if (Numbers.isNull(mapValue.getDouble(valueIndex + 1)) || rowId > mapValue.getLong(valueIndex)) {
                 computeFirst(mapValue, record, rowId);
             }
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullDoubleGroupByFunction.java
around lines 41 - 68, computeNext currently uses
Numbers.isFinite(arg.getDouble(record)) which rejects ±Infinity while
computeBatch only rejects Numbers.isNull; update computeNext to use the same
null-check as computeBatch (i.e., treat any non-null double, including
±Infinity, as valid) by replacing the Numbers.isFinite predicate with a
!Numbers.isNull check on arg.getDouble(record), keeping the existing logic that
compares mapValue.getDouble(valueIndex + 1) and mapValue.getLong(valueIndex) and
calls computeFirst(mapValue, record, rowId) when appropriate.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinCharGroupByFunction.java (1)

122-128: ⚠️ Potential issue | 🟠 Major

Bug: merge uses putInt instead of putChar.

Line 126 calls destValue.putInt(valueIndex, srcMin) but the value type is ColumnType.CHAR (initialized in initValueTypes). This should be putChar to match the column type and be consistent with computeFirst, computeNext, and computeBatch which all use putChar.

🐛 Proposed fix
     @Override
     public void merge(MapValue destValue, MapValue srcValue) {
         char srcMin = srcValue.getChar(valueIndex);
         char destMin = destValue.getChar(valueIndex);
         if (srcMin != 0 && (srcMin < destMin || destMin == 0)) {
-            destValue.putInt(valueIndex, srcMin);
+            destValue.putChar(valueIndex, srcMin);
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinCharGroupByFunction.java
around lines 122 - 128, In MinCharGroupByFunction, the merge(MapValue destValue,
MapValue srcValue) implementation stores the min value with
destValue.putInt(...) while the column type is CHAR; change the write to
destValue.putChar(valueIndex, srcMin) so it matches initValueTypes and the other
methods (computeFirst, computeNext, computeBatch) that use putChar and uses the
same valueIndex and MapValue semantics.
🧹 Nitpick comments (14)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ShortGroupByFunctionBatchTest.java (1)

205-205: Use a non-zero startRowId on the second sum batch to harden API-contract coverage.

On Line 205, passing 0 again doesn’t validate cross-segment offset handling. Using 2 here makes this test consistent with the new segmented-batch contract and better guards future regressions.

Suggested change
-            function.computeBatch(value, ptr, 2, 0);
+            function.computeBatch(value, ptr, 2, 2);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ShortGroupByFunctionBatchTest.java
at line 205, In ShortGroupByFunctionBatchTest, the second computeBatch call uses
startRowId 0 which doesn't exercise cross-segment offsets; update the second
invocation of function.computeBatch(value, ptr, 2, 0) to use a non-zero
startRowId (e.g., 2) so it becomes function.computeBatch(value, ptr, 2, 2),
ensuring the computeBatch method is tested with a non-zero startRowId to
validate segmented-batch behavior.
core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java (1)

3132-3137: Update stale inline comment to match the new expectation

Line 3132 says aggregate expressions disable vectorization, but Line 3136 now expects vectorized: true. Please align the comment with the actual behavior to avoid confusion.

✏️ Suggested comment tweak
-    @Test // expressions in aggregates disable vectorized impl
+    @Test // aggregate expressions may still vectorize depending on function support
     public void testGroupByNotKeyed2() throws Exception {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java around lines
3132 - 3137, The inline comment before the test method testGroupByNotKeyed2 is
now misleading: it states "expressions in aggregates disable vectorized impl"
while the test expectation shows vectorized: true; update that comment to
reflect the current behavior (e.g., "expressions in aggregates do not disable
vectorized implementation" or similar) so the @Test comment matches the expected
output in the test asserting vectorized: true in testGroupByNotKeyed2.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionInt.java (1)

51-51: Skip zero-delta map writes in the batch count path.

Line 51 always performs a map mutation, even when nonNullCount is 0. In a tight vectorized path this adds avoidable write traffic.

♻️ Proposed tweak
-            mapValue.addLong(valueIndex, nonNullCount);
+            if (nonNullCount > 0) {
+                mapValue.addLong(valueIndex, nonNullCount);
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionInt.java
at line 51, In CountGeoHashGroupByFunctionInt update the batch count path to
avoid mutating the map when nonNullCount is zero: before calling
mapValue.addLong(valueIndex, nonNullCount) check if nonNullCount != 0 and only
perform the addLong when it's non-zero (leave existing behavior for non-zero
counts); this prevents unnecessary map writes in the tight vectorized path while
preserving current semantics for positive counts.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxDateGroupByFunction.java (1)

48-56: Missing null guard for batchMax – inconsistent with MinLongGroupByFunction.

Vect.maxLong can return Numbers.LONG_NULL when all batch values are null. The current condition batchMax > existing will evaluate to false in that case (since LONG_NULL is Long.MIN_VALUE), so it won't corrupt data. However, this differs from the pattern in MinLongGroupByFunction.computeBatch (lines 51-56) which explicitly guards:

if (batchMin != Numbers.LONG_NULL) {
    // ... update logic
}

Similarly, merge() at line 113 lacks a null guard unlike MinLongGroupByFunction.merge() (line 120).

Consider adding an explicit guard for consistency and clarity:

♻️ Suggested refactor for consistency
     public void computeBatch(MapValue mapValue, long ptr, int count, long startRowId) {
         if (count > 0) {
             final long batchMax = Vect.maxLong(ptr, count);
-            final long existing = mapValue.getDate(valueIndex);
-            if (batchMax > existing) {
-                mapValue.putDate(valueIndex, batchMax);
+            if (batchMax != Numbers.LONG_NULL) {
+                final long existing = mapValue.getDate(valueIndex);
+                if (batchMax > existing || existing == Numbers.LONG_NULL) {
+                    mapValue.putDate(valueIndex, batchMax);
+                }
             }
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxDateGroupByFunction.java
around lines 48 - 56, Add explicit null checks like MinLongGroupByFunction: in
MaxDateGroupByFunction.computeBatch, after computing batchMax =
Vect.maxLong(ptr, count) guard with "if (batchMax != Numbers.LONG_NULL)" before
comparing/updating mapValue (use mapValue.getDate(valueIndex) /
mapValue.putDate(valueIndex, batchMax)); likewise update
MaxDateGroupByFunction.merge() to first check merged value != Numbers.LONG_NULL
before merging into existing value to mirror MinLongGroupByFunction's null-guard
pattern for clarity and consistency.
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DateGroupByFunctionBatchTest.java (1)

322-347: Strengthen min/max accumulation tests with a non-zero second-batch startRowId.

min/max should be row-id agnostic. Reusing 0 in both calls makes it easier for accidental row-id gating bugs to slip through unnoticed.

🔧 Suggested test hardening
-            function.computeBatch(value, ptr, 2, 0);
+            function.computeBatch(value, ptr, 2, 2);
...
-            function.computeBatch(value, ptr, 2, 0);
+            function.computeBatch(value, ptr, 2, 2);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DateGroupByFunctionBatchTest.java
around lines 322 - 347, Update the two tests so the second computeBatch call
uses a non-zero startRowId to ensure row-id agnosticism: in
testMaxDateBatchAccumulates and testMinDateBatchAccumulates change the second
call to MaxDateGroupByFunction.computeBatch(...) and
MinDateGroupByFunction.computeBatch(...) respectively to pass a non-zero
startRowId (e.g., 2) instead of 0 while keeping the batch size (2) unchanged;
this ensures the second batch is treated as a subsequent block of rows rather
than restarting at row 0.
core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java (1)

2246-2255: Add plan/path assertions to the two new path-specific tests.

Both tests currently validate only final results. They can still pass after a fallback to non-vectorized/non-hybrid execution, which weakens regression detection for the intended execution paths.

Also applies to: 2323-2333

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java
around lines 2246 - 2255, The two new tests
(testParallelNonKeyedGroupByWithColumnTopsAndFirstLast and the similar test at
lines 2323-2333 that call testParallelNonKeyedGroupBy(...)) only validate
results and must also assert the execution path/plan to prevent silent
fallbacks; update each test to capture and assert the query execution plan or
runtime path (e.g., via existing helpers like assertPlanContains or
assertPathUsed) to verify the expected vectorized/hybrid operator is present
(reference the call to testParallelNonKeyedGroupBy and the SQL string passed)
and assert that no non-vectorized/non-hybrid fallback operator is used. Ensure
the assertion checks for the unique operator or plan fragment name your engine
uses for the intended path so the test fails if execution falls back.
core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java (1)

97-104: Avoid leaking mutable batch metadata state.

getBatchColumnIndexes() (Line 193) returns the internal array directly, while hasNonBatchFunctions is derived once in constructor (Line 97-104). External mutation can silently desync these two and affect execution-path decisions.

♻️ Suggested defensive copy
@@
 import java.io.Closeable;
+import java.util.Arrays;
@@
-            this.batchColumnIndexes = batchColumnIndexes;
+            this.batchColumnIndexes = Arrays.copyOf(batchColumnIndexes, batchColumnIndexes.length);
@@
     public int[] getBatchColumnIndexes() {
-        return batchColumnIndexes;
+        return Arrays.copyOf(batchColumnIndexes, batchColumnIndexes.length);
     }

Also applies to: 193-195

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java`
around lines 97 - 104, Batch metadata can be mutated externally because
batchColumnIndexes is stored directly and getBatchColumnIndexes() exposes the
internal array while hasNonBatchFunctions is computed once in the constructor;
to fix, defensively copy the array both when assigning it in the constructor and
when returning it from getBatchColumnIndexes(), and recompute
hasNonBatchFunctions based on the defensive copy (or compute it from the copied
array) so the internal boolean stays consistent with the stored array; reference
symbols: AsyncGroupByNotKeyedAtom, batchColumnIndexes, hasNonBatchFunctions,
getBatchColumnIndexes().
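The desync risk can be shown in plain Java. This is an illustrative sketch, not the QuestDB classes; LeakyAtom and its members are hypothetical stand-ins for AsyncGroupByNotKeyedAtom's batchColumnIndexes and hasNonBatchFunctions:

```java
import java.util.Arrays;

// LeakyAtom is a hypothetical stand-in for AsyncGroupByNotKeyedAtom.
public class DefensiveCopyDemo {
    public static final class LeakyAtom {
        private final int[] batchColumnIndexes;
        public final boolean hasNonBatchFunctions; // derived once; -1 marks a non-batch function

        public LeakyAtom(int[] indexes) {
            this.batchColumnIndexes = indexes; // stored directly, no copy
            boolean nonBatch = false;
            for (int i : indexes) {
                nonBatch |= (i == -1);
            }
            this.hasNonBatchFunctions = nonBatch;
        }

        public int[] getBatchColumnIndexes() {
            return batchColumnIndexes; // leaks the internal array
        }
    }

    public static void main(String[] args) {
        LeakyAtom atom = new LeakyAtom(new int[]{0, 1});
        atom.getBatchColumnIndexes()[1] = -1; // external mutation
        // the flag still claims every function is batch-capable: state is desynced
        System.out.println(atom.hasNonBatchFunctions + " vs " + Arrays.toString(atom.getBatchColumnIndexes()));
        // Arrays.copyOf both on assignment and on return closes this hole
    }
}
```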
core/src/main/java/io/questdb/griffin/engine/join/WindowJoinFastRecordCursorFactory.java (1)

889-892: Avoid hardcoding startRowId to 0 in vectorized batch calls.

Line 889 and Line 891 currently force zero-based row ids. Passing a logical window-relative start (e.g., derived from rowLo) makes this path safer if aggregation is ever split into multiple computeBatch() calls per function.

💡 Future-proof wiring
+                    final long batchStartRowId = rowLo;
                     for (int i = 0, n = groupByFunctions.size(); i < n; i++) {
                         int mapIndex = groupByFunctionToColumnIndex.getQuick(i);
                         final long ptr = slaveData.get(idx, mapIndex + 2);
                         if (ptr != 0) {
                             final long typeSize = ColumnType.sizeOfTag((short) groupByFuncTypes.getQuick(mapIndex));
                             // in case of include prevailing column values are shifted by one slot, hence `rowLo + 1`
                             final long efficientRowLo = includePrevailing ? rowLo + 1 : rowLo;
-                            groupByFunctions.getQuick(i).computeBatch(value, columnSink.of(ptr).startAddress() + typeSize * efficientRowLo, (int) (rowHi - rowLo), 0);
+                            groupByFunctions.getQuick(i).computeBatch(value, columnSink.of(ptr).startAddress() + typeSize * efficientRowLo, (int) (rowHi - rowLo), batchStartRowId);
                         } else { // no-arg function, e.g. count()
-                            groupByFunctions.getQuick(i).computeBatch(value, 0, (int) (rowHi - rowLo), 0);
+                            groupByFunctions.getQuick(i).computeBatch(value, 0, (int) (rowHi - rowLo), batchStartRowId);
                         }
                     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/join/WindowJoinFastRecordCursorFactory.java`
around lines 889 - 892, The batch calls to groupByFunctions currently hardcode
startRowId to 0 for no-arg functions, which breaks correctness if aggregation is
split; update the no-arg computeBatch call in WindowJoinFastRecordCursorFactory
so it uses the logical window-relative start (the same start used for the
data-backed path) instead of 0; e.g., pass the window start (derived from
rowLo / efficientRowLo) as the fourth computeBatch parameter, startRowId, when
calling groupByFunctions.getQuick(i).computeBatch(value, 0/...,
(int)(rowHi-rowLo), startRowId).
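Why startRowId matters can be sketched with a toy last()-style accumulator. This is illustrative Java only; LastAgg is a hypothetical stand-in for the GroupByFunction.computeBatch() contract, under which batch element i represents row id startRowId + i:

```java
// Illustrative sketch only, not QuestDB code: LastAgg mimics a last(value)-style
// aggregate that keeps the value with the highest row id.
public class StartRowIdDemo {
    public static final class LastAgg {
        public long storedRowId = -1;
        public double storedValue = Double.NaN;

        public void computeBatch(double[] batch, long startRowId) {
            // element i represents row id startRowId + i
            long lastRowId = startRowId + batch.length - 1;
            if (lastRowId > storedRowId) {
                storedRowId = lastRowId;
                storedValue = batch[batch.length - 1];
            }
        }
    }

    public static void main(String[] args) {
        // correct wiring: the second batch starts where the first left off
        LastAgg ok = new LastAgg();
        ok.computeBatch(new double[]{1.0, 2.0}, 0);
        ok.computeBatch(new double[]{3.0, 4.0}, 2);
        System.out.println(ok.storedValue); // 4.0, the true last value

        // broken wiring: both batches claim startRowId 0, so the second batch
        // ties at row id 1 and is silently ignored
        LastAgg broken = new LastAgg();
        broken.computeBatch(new double[]{1.0, 2.0}, 0);
        broken.computeBatch(new double[]{3.0, 4.0}, 0);
        System.out.println(broken.storedValue); // 2.0, wrong
    }
}
```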
core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java (2)

282-297: Reduce duplicated row-by-row fallback logic.

The column-top fallback loop in aggregateVect(...) repeats the same row iteration/update pattern already present in aggregate(...). A shared helper would reduce divergence risk between the two paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java`
around lines 282 - 297, The column-top fallback loop in aggregateVect(...)
duplicates the row-by-row iteration/update logic from aggregate(...); extract
that logic into a shared private helper on
AsyncGroupByNotKeyedRecordCursorFactory (e.g., applyRowByRowFallback or
processFrameRowByRow) that accepts the frameMemory, frameRowCount, slotId, atom,
value, and record (or the minimal pieces needed), moves the
record.init(frameMemory)/record.setRowIndex(0) setup, obtains
GroupByFunctionsUpdater via atom.getFunctionUpdater(slotId), and runs the
for-loop using value.isNew(), functionUpdater.updateNew/updateExisting and
correct rowId incrementing; then replace the duplicated loop in both
aggregateVect(...) and aggregate(...) with a call to this helper so both paths
share identical logic.

75-76: Rename boolean field to isVectorized for guideline compliance.

The new boolean field name vectorized should use an is... prefix for consistency with repository conventions.

♻️ Proposed rename
-    private final boolean vectorized;
+    private final boolean isVectorized;
@@
-            this.vectorized = (reducer == AGGREGATE_VECT);
+            this.isVectorized = (reducer == AGGREGATE_VECT);
@@
-        sink.attr("vectorized").val(vectorized);
+        sink.attr("vectorized").val(isVectorized);

As per coding guidelines "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate".

Also applies to: 136-137, 185-185

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java`
around lines 75 - 76, Rename the boolean field vectorized in
AsyncGroupByNotKeyedRecordCursorFactory to isVectorized and update all
references (constructor parameter, assignments, any getters or uses) to the new
name; specifically change the private field declaration, the constructor
parameter and its assignment, and any reads of vectorized at locations
referenced (around the areas previously at lines ~136-137 and ~185) to use
isVectorized so the class, its methods and callers compile and follow the is...
boolean naming guideline.
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/FloatGroupByFunctionBatchTest.java (1)

71-83: Wrap newly added native-memory tests in assertMemoryLeak().

These tests allocate off-heap buffers; adding assertMemoryLeak() would verify cleanup on all paths, including assertion-failure/error paths.

As per coding guidelines "Write all tests using assertMemoryLeak(). This isn't needed for narrow unit tests that don't allocate native memory."

Also applies to: 126-138, 193-205, 222-237, 273-287, 303-315, 350-362, 395-418, 420-432

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/FloatGroupByFunctionBatchTest.java`
around lines 71 - 83, The test allocates off-heap buffers (via allocateFloats
and CountFloatGroupByFunction.computeBatch) but is not wrapped with
assertMemoryLeak(), so native memory might not be verified/cleaned on failure;
update the test method testCountFloatBatchAccumulates (and the other similar
tests in this file that allocate native buffers) to wrap the test body inside
assertMemoryLeak(() -> { ... }), keeping the existing calls to
prepare(function), allocateFloats(...), function.computeBatch(...), and
function.getLong(value) unchanged, and do the same for other
CountFloatGroupByFunction-related test methods referenced in the review (the
blocks around the noted ranges) so all off-heap allocations are checked for
leaks.
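The value of the wrapper can be sketched with a toy harness. The helper name mirrors QuestDB's assertMemoryLeak(), but the counter below is a hypothetical stand-in for the real native-memory tracking; the point is that the check runs after the body on all paths, including assertion failures:

```java
// Toy harness; allocatedBytes is a hypothetical stand-in for the real
// native allocation counter that QuestDB's test harness inspects.
public class LeakHarnessDemo {
    public static long allocatedBytes;

    public static void assertMemoryLeak(Runnable testBody) {
        long before = allocatedBytes;
        try {
            testBody.run();
        } finally {
            // runs on success and on assertion failure alike
            if (allocatedBytes != before) {
                throw new AssertionError("leaked " + (allocatedBytes - before) + " bytes");
            }
        }
    }

    public static void main(String[] args) {
        // a balanced allocate/free pair passes
        assertMemoryLeak(() -> {
            allocatedBytes += 16;
            allocatedBytes -= 16;
        });
        // a forgotten free is reported even though the body itself succeeded
        try {
            assertMemoryLeak(() -> allocatedBytes += 16);
            throw new IllegalStateException("expected a leak report");
        } catch (AssertionError expected) {
            System.out.println(expected.getMessage()); // leaked 16 bytes
        }
    }
}
```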
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeoHashGroupByFunctionBatchTest.java (1)

82-97: Wrap newly added geohash batch tests with assertMemoryLeak().

These tests rely on native buffers, so leak checks should be enforced at test execution scope, not only in teardown.

As per coding guidelines "Write all tests using assertMemoryLeak(). This isn't needed for narrow unit tests that don't allocate native memory."

Also applies to: 114-130, 146-162, 178-194, 208-221, 246-259, 289-304, 366-381

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeoHashGroupByFunctionBatchTest.java`
around lines 82 - 97, Wrap the body of the new geohash batch test methods (e.g.,
testCountGeoHashByteBatchAccumulates) in assertMemoryLeak(() -> { ... }) to
ensure native buffer leak checks run during test execution; keep the existing
resource handling (try-with-resources on SimpleMapValue from prepare(...)) and
calls to allocateBytes(...) and function.computeBatch(...) inside the lambda so
they execute under the leak assertion. Apply the same wrapping to all other
added geohash batch tests referenced (the additional test methods in this file)
so each test runs under assertMemoryLeak.
core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByFunctionCaseTest.java (1)

245-256: Rename boolean to isAsyncVectorized for guideline compliance.

Line 245 introduces a boolean local without the required is.../has... prefix.

✏️ Proposed refactor
-        boolean asyncVectorized = !rosti && keys == null
+        boolean isAsyncVectorized = !rosti && keys == null
                 && ((t == CHAR && f >= 4)                    // min/max on char
                 || (t == FLOAT && (f == 2 || f >= 4)));      // sum/min/max on float
@@
-                planSink.put("  vectorized: ").put(asyncVectorized).put('\n');
+                planSink.put("  vectorized: ").put(isAsyncVectorized).put('\n');

As per coding guidelines, "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByFunctionCaseTest.java`
around lines 245 - 256, Rename the boolean local asyncVectorized to
isAsyncVectorized and update all its usages in this method: replace its
declaration "boolean asyncVectorized = ..." and the reference in the
planSink.put call so the variable name follows the is... prefix guideline;
ensure the boolean expression remains unchanged (still using rosti, keys, t, f)
and run tests to confirm no other references need updating.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/AvgDoubleGroupByFunction.java (1)

55-59: Inconsistent null checks: Numbers.isNull vs Double.isNaN.

Line 53 uses Numbers.isNull(batchSum) while Line 55 uses Double.isNaN(prevSum). For consistency and correctness, consider using the same null-checking approach for both. If Numbers.isNull is the correct check for batch results from Vect.sumDoubleAcc, it should also be used for prevSum.

♻️ Suggested alignment
             if (!Numbers.isNull(batchSum)) {
                 final double prevSum = mapValue.getDouble(valueIndex);
-                if (!Double.isNaN(prevSum)) {
+                if (!Numbers.isNull(prevSum)) {
                     mapValue.putDouble(valueIndex, prevSum + batchSum);
                 } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/AvgDoubleGroupByFunction.java`
around lines 55 - 59, The null-checks in AvgDoubleGroupByFunction are
inconsistent: replace the Double.isNaN(prevSum) check with
Numbers.isNull(prevSum) so both batchSum and prevSum use the same null-detection
(the values produced by Vect.sumDoubleAcc should be tested with Numbers.isNull);
update the branch that writes to mapValue (using valueIndex) to treat a null
prevSum as missing and only set prevSum+batchSum when prevSum is not
Numbers.isNull, otherwise store batchSum, and ensure any existing logic that
relied on Double.isNaN is preserved but using Numbers.isNull instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstCharGroupByFunction.java`:
- Around line 48-54: computeBatch in FirstCharGroupByFunction records the
first-seen row by comparing startRowId against existingRowId, but several
window/async join callers (WindowJoinFastRecordCursorFactory,
AsyncWindowJoinRecordCursorFactory, AsyncWindowJoinFastRecordCursorFactory)
always pass 0 for startRowId across batches, breaking correctness when later
batches contain lower row IDs; fix by changing those callers to compute and pass
the actual starting row ID for each batch (the first rowId represented by ptr
for that batch) instead of 0 so FirstCharGroupByFunction.computeBatch and its
use of mapValue/valueIndex continue to pick the true lowest row id, and add
regression tests exercising multi-batch group cases (e.g., mimic
BooleanGroupByFunctionBatchTest pattern) to ensure updates occur when later
batches contain smaller row IDs.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullIPv4GroupByFunctionFactory.java`:
- Around line 68-83: The computeBatch loop in
FirstNotNullIPv4GroupByFunctionFactory relies on a correct startRowId to
preserve first_not_null ordering, but the window join factories call
computeBatch(...) with hardcoded startRowId=0; update
WindowJoinFastRecordCursorFactory, AsyncWindowJoinRecordCursorFactory, and
AsyncWindowJoinFastRecordCursorFactory so they pass the actual starting slave
row id for the batch (derive startRowId from the first row in the batch range,
e.g., slaveRowIds.get(rowLo) or the equivalent index used for the batch) instead
of 0 when invoking computeBatch, ensuring computeBatch's rowId arithmetic
(startRowId + offset) reflects real row ordering for first_not_null.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullTimestampGroupByFunction.java`:
- Around line 49-52: In LastNotNullTimestampGroupByFunction ensure
computeBatch() uses the same null-value fallback as computeNext(): when deciding
to replace the stored pair, check not only (rowId > existingRowId ||
existingRowId == Numbers.LONG_NULL) but also whether the stored timestamp is
NULL (e.g., mapValue.getLong(valueIndex + 1) == Numbers.LONG_NULL); update the
conditional in computeBatch() to include that OR check and write both
mapValue.putLong(valueIndex, rowId) and mapValue.putLong(valueIndex + 1, value)
accordingly. Also add a regression test that initializes a group with a null
stored value state, then calls computeBatch() with a non-null candidate and
asserts the stored values match the candidate (and matches computeNext()
behavior).

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumFloatGroupByFunction.java`:
- Around line 59-66: In SumFloatGroupByFunction, make computeBatch's
overflow/null check consistent with computeNext by replacing the existing check
that uses !Float.isNaN(existing) with Float.isFinite(existing); in the
computeBatch path that reads mapValue.getFloat(valueIndex) into existing, only
add acc to existing when Float.isFinite(existing), otherwise overwrite with acc
(same behavior as computeNext which uses Float.isFinite(sum)), so adjust the
conditional around mapValue.putFloat accordingly to reference
Float.isFinite(existing) and keep the same write behavior to
mapValue.putFloat(valueIndex, acc) when not finite.
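The difference between the two guards is easy to verify with the standard library: they disagree exactly on ±Infinity, which is the overflowed-sum state the check is meant to handle.

```java
// Plain standard-library check: !Float.isNaN and Float.isFinite disagree on
// ±Infinity, so the two guards lead to different overwrite behavior.
public class FiniteVsNaNDemo {
    public static void main(String[] args) {
        float overflowed = Float.POSITIVE_INFINITY; // e.g. a sum that overflowed
        System.out.println(!Float.isNaN(overflowed));   // true: the NaN guard would keep accumulating
        System.out.println(Float.isFinite(overflowed)); // false: the finite guard overwrites instead

        float nullValue = Float.NaN; // QuestDB represents float NULL as NaN
        System.out.println(!Float.isNaN(nullValue));    // false
        System.out.println(Float.isFinite(nullValue));  // false
    }
}
```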

In
`@core/src/main/java/io/questdb/griffin/engine/join/AsyncWindowJoinRecordCursorFactory.java`:
- Around line 500-503: The computeBatch() calls in
AsyncWindowJoinRecordCursorFactory are passing a hardcoded startRowId of 0 which
breaks the GroupByFunction.computeBatch() contract; update every
groupByFunctions.getQuick(i).computeBatch(...) invocation (the ones that
currently pass 0) to pass the actual batch start row id (use rowLo) instead of
0 so the i-th record is treated as startRowId + i;
ensure this change is applied to all occurrences around the
columnSink/of(ptr)/typeSize usages and in the no-arg branches so order-sensitive
aggregates like FirstDoubleGroupByFunction get the correct startRowId.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/BooleanGroupByFunctionBatchTest.java`:
- Around line 66-78: The new native-memory test testFirstBooleanBatchAccumulates
(and the similar test around allocateBooleans at the other spot) must be wrapped
with the assertMemoryLeak() harness; update the test method(s) so the body that
calls allocateBooleans and function.computeBatch (for
FirstBooleanGroupByFunction and any other affected tests) is executed inside
assertMemoryLeak(() -> { ... }); thereby ensuring allocateBooleans usage is
covered by the leak checker while leaving the test assertions (e.g.,
Assert.assertTrue(function.getBool(value))) inside the lambda.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/CharGroupByFunctionBatchTest.java`:
- Around line 243-245: The test calls computeBatch with length 4, but
allocateChars('d','b','c') only allocates 3 chars, risking out-of-bounds reads;
update the call in testMinCharBatchSimple to pass the correct batch length (3)
or allocate a fourth character so the length matches; locate the
allocateChars(...) call and the function.computeBatch(value, ptr, 4, 0)
invocation and either change the third argument from 4 to 3 or add a fourth
char to allocateChars.
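A minimal on-heap analogy shows the hazard; this sketch uses a plain Java array rather than the test's native helpers, where the array at least fails fast while native memory would silently read garbage past the buffer:

```java
// On-heap analogy: passing a claimed batch length larger than the allocation.
public class LengthMismatchDemo {
    public static char minWithLength(char[] chars, int claimedLength) {
        char min = Character.MAX_VALUE;
        for (int i = 0; i < claimedLength; i++) {
            min = (char) Math.min(min, chars[i]);
        }
        return min;
    }

    public static void main(String[] args) {
        char[] chars = {'d', 'b', 'c'}; // 3 elements allocated
        System.out.println(minWithLength(chars, 3)); // b
        try {
            minWithLength(chars, 4); // claimed length 4, only 3 allocated
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out-of-bounds at index 3");
        }
    }
}
```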

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ShortGroupByFunctionBatchTest.java`:
- Around line 88-100: Wrap each test that uses native allocation in an
assertMemoryLeak(...) block: for the testFirstShortBatchAccumulates method (and
the other new tests around lines with FirstShortGroupByFunction usage at 161-175
and 197-209), place the existing test body that calls allocateShorts(...) and
invokes FirstShortGroupByFunction.computeBatch(...) and function.getShort(...)
inside assertMemoryLeak(() -> { ... }); so native memory allocated by
allocateShorts is always checked and released across all paths; keep the
existing try-with-resources (SimpleMapValue value = prepare(function)) inside
that lambda.

---

Outside diff comments:
In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullDoubleGroupByFunction.java`:
- Around line 41-68: computeNext currently uses
Numbers.isFinite(arg.getDouble(record)) which rejects ±Infinity while
computeBatch only rejects Numbers.isNull; update computeNext to use the same
null-check as computeBatch (i.e., treat any non-null double, including
±Infinity, as valid) by replacing the Numbers.isFinite predicate with a
!Numbers.isNull check on arg.getDouble(record), keeping the existing logic that
compares mapValue.getDouble(valueIndex + 1) and mapValue.getLong(valueIndex) and
calls computeFirst(mapValue, record, rowId) when appropriate.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinCharGroupByFunction.java`:
- Around line 122-128: In MinCharGroupByFunction, the merge(MapValue destValue,
MapValue srcValue) implementation stores the min value with
destValue.putInt(...) while the column type is CHAR; change the write to
destValue.putChar(valueIndex, srcMin) so it matches initValueTypes and the other
methods (computeFirst, computeNext, computeBatch) that use putChar and uses the
same valueIndex and MapValue semantics.

---

Nitpick comments:
In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/AvgDoubleGroupByFunction.java`:
- Around line 55-59: The null-checks in AvgDoubleGroupByFunction are
inconsistent: replace the Double.isNaN(prevSum) check with
Numbers.isNull(prevSum) so both batchSum and prevSum use the same null-detection
(the values produced by Vect.sumDoubleAcc should be tested with Numbers.isNull);
update the branch that writes to mapValue (using valueIndex) to treat a null
prevSum as missing and only set prevSum+batchSum when prevSum is not
Numbers.isNull, otherwise store batchSum, and ensure any existing logic that
relied on Double.isNaN is preserved but using Numbers.isNull instead.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionInt.java`:
- Line 51: In CountGeoHashGroupByFunctionInt update the batch count path to
avoid mutating the map when nonNullCount is zero: before calling
mapValue.addLong(valueIndex, nonNullCount) check if nonNullCount != 0 and only
perform the addLong when it's non-zero (leave existing behavior for non-zero
counts); this prevents unnecessary map writes in the tight vectorized path while
preserving current semantics for positive counts.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxDateGroupByFunction.java`:
- Around line 48-56: Add explicit null checks like MinLongGroupByFunction: in
MaxDateGroupByFunction.computeBatch, after computing batchMax =
Vect.maxLong(ptr, count) guard with "if (batchMax != Numbers.LONG_NULL)" before
comparing/updating mapValue (use mapValue.getDate(valueIndex) /
mapValue.putDate(valueIndex, batchMax)); likewise update
MaxDateGroupByFunction.merge() to first check merged value != Numbers.LONG_NULL
before merging into existing value to mirror MinLongGroupByFunction's null-guard
pattern for clarity and consistency.

In
`@core/src/main/java/io/questdb/griffin/engine/join/WindowJoinFastRecordCursorFactory.java`:
- Around line 889-892: The batch calls to groupByFunctions currently hardcode
startRowId to 0 for no-arg functions, which breaks correctness if aggregation is
split; update the no-arg computeBatch call in WindowJoinFastRecordCursorFactory
so it uses the logical window-relative start (the same start used for the
data-backed path) instead of 0; e.g., pass the window start (derived from
rowLo / efficientRowLo) as the fourth computeBatch parameter, startRowId, when
calling groupByFunctions.getQuick(i).computeBatch(value, 0/...,
(int)(rowHi-rowLo), startRowId).

In
`@core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java`:
- Around line 97-104: Batch metadata can be mutated externally because
batchColumnIndexes is stored directly and getBatchColumnIndexes() exposes the
internal array while hasNonBatchFunctions is computed once in the constructor;
to fix, defensively copy the array both when assigning it in the constructor and
when returning it from getBatchColumnIndexes(), and recompute
hasNonBatchFunctions based on the defensive copy (or compute it from the copied
array) so the internal boolean stays consistent with the stored array; reference
symbols: AsyncGroupByNotKeyedAtom, batchColumnIndexes, hasNonBatchFunctions,
getBatchColumnIndexes().

In
`@core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java`:
- Around line 282-297: The column-top fallback loop in aggregateVect(...)
duplicates the row-by-row iteration/update logic from aggregate(...); extract
that logic into a shared private helper on
AsyncGroupByNotKeyedRecordCursorFactory (e.g., applyRowByRowFallback or
processFrameRowByRow) that accepts the frameMemory, frameRowCount, slotId, atom,
value, and record (or the minimal pieces needed), moves the
record.init(frameMemory)/record.setRowIndex(0) setup, obtains
GroupByFunctionsUpdater via atom.getFunctionUpdater(slotId), and runs the
for-loop using value.isNew(), functionUpdater.updateNew/updateExisting and
correct rowId incrementing; then replace the duplicated loop in both
aggregateVect(...) and aggregate(...) with a call to this helper so both paths
share identical logic.
- Around line 75-76: Rename the boolean field vectorized in
AsyncGroupByNotKeyedRecordCursorFactory to isVectorized and update all
references (constructor parameter, assignments, any getters or uses) to the new
name; specifically change the private field declaration, the constructor
parameter and its assignment, and any reads of vectorized at locations
referenced (around the areas previously at lines ~136-137 and ~185) to use
isVectorized so the class, its methods and callers compile and follow the is...
boolean naming guideline.

In `@core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java`:
- Around line 2246-2255: The two new tests
(testParallelNonKeyedGroupByWithColumnTopsAndFirstLast and the similar test at
lines 2323-2333 that call testParallelNonKeyedGroupBy(...)) only validate
results and must also assert the execution path/plan to prevent silent
fallbacks; update each test to capture and assert the query execution plan or
runtime path (e.g., via existing helpers like assertPlanContains or
assertPathUsed) to verify the expected vectorized/hybrid operator is present
(reference the call to testParallelNonKeyedGroupBy and the SQL string passed)
and assert that no non-vectorized/non-hybrid fallback operator is used. Ensure
the assertion checks for the unique operator or plan fragment name your engine
uses for the intended path so the test fails if execution falls back.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DateGroupByFunctionBatchTest.java`:
- Around line 322-347: Update the two tests so the second computeBatch call uses
a non-zero startRowId to ensure row-id agnosticism: in
testMaxDateBatchAccumulates and testMinDateBatchAccumulates change the second
call to MaxDateGroupByFunction.computeBatch(...) and
MinDateGroupByFunction.computeBatch(...) respectively to pass a non-zero
startRowId (e.g., 2) instead of 0 while keeping the batch size (2) unchanged;
this ensures the second batch is treated as a subsequent block of rows rather
than restarting at row 0.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/FloatGroupByFunctionBatchTest.java`:
- Around line 71-83: The test allocates off-heap buffers (via allocateFloats and
CountFloatGroupByFunction.computeBatch) but is not wrapped with
assertMemoryLeak(), so native memory might not be verified/cleaned on failure;
update the test method testCountFloatBatchAccumulates (and the other similar
tests in this file that allocate native buffers) to wrap the test body inside
assertMemoryLeak(() -> { ... }), keeping the existing calls to
prepare(function), allocateFloats(...), function.computeBatch(...), and
function.getLong(value) unchanged, and do the same for other
CountFloatGroupByFunction-related test methods referenced in the review (the
blocks around the noted ranges) so all off-heap allocations are checked for
leaks.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeoHashGroupByFunctionBatchTest.java`:
- Around line 82-97: Wrap the body of the new geohash batch test methods (e.g.,
testCountGeoHashByteBatchAccumulates) in assertMemoryLeak(() -> { ... }) to
ensure native buffer leak checks run during test execution; keep the existing
resource handling (try-with-resources on SimpleMapValue from prepare(...)) and
calls to allocateBytes(...) and function.computeBatch(...) inside the lambda so
they execute under the leak assertion. Apply the same wrapping to all other
added geohash batch tests referenced (the additional test methods in this file)
so each test runs under assertMemoryLeak.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ShortGroupByFunctionBatchTest.java`:
- Line 205: In ShortGroupByFunctionBatchTest, the second computeBatch call uses
startRowId 0 which doesn't exercise cross-segment offsets; update the second
invocation of function.computeBatch(value, ptr, 2, 0) to use a non-zero
startRowId (e.g., 2) so it becomes function.computeBatch(value, ptr, 2, 2),
ensuring the computeBatch method is tested with a non-zero startRowId to
validate segmented-batch behavior.

In
`@core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByFunctionCaseTest.java`:
- Around line 245-256: Rename the boolean local asyncVectorized to
isAsyncVectorized and update all its usages in this method: replace its
declaration "boolean asyncVectorized = ..." and the reference in the
planSink.put call so the variable name follows the is... prefix guideline;
ensure the boolean expression remains unchanged (still using rosti, keys, t, f)
and run tests to confirm no other references need updating.

In `@core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java`:
- Around line 3132-3137: The inline comment before the test method
testGroupByNotKeyed2 is now misleading: it states "expressions in aggregates
disable vectorized impl" while the test expectation shows vectorized: true;
update that comment to reflect the current behavior (e.g., "expressions in
aggregates do not disable vectorized implementation" or similar) so the `@Test`
comment matches the expected output in the test asserting vectorized: true in
testGroupByNotKeyed2.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f0b9c3e-abe8-49b3-bea8-aad44e7d0f2a

📥 Commits

Reviewing files that changed from the base of the PR and between db3ec2c and fa588c6.

📒 Files selected for processing (103)
  • core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/AvgDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/AvgShortGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionByte.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionInt.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionLong.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionShort.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountIPv4GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountLongConstGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstBooleanGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstByteGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionByte.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionInt.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionLong.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionShort.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstIPv4GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullGeoHashGroupByFunctionFactory.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullIPv4GroupByFunctionFactory.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstShortGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastBooleanGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastByteGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastGeoHashGroupByFunctionFactory.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastIPv4GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullGeoHashGroupByFunctionFactory.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullIPv4GroupByFunctionFactory.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastShortGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxIPv4GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinIPv4GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumShortGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/join/AsyncWindowJoinFastRecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/engine/join/AsyncWindowJoinRecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/engine/join/WindowJoinFastRecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java
  • core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java
  • core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java
  • core/src/test/java/io/questdb/test/griffin/AggregateTest.java
  • core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java
  • core/src/test/java/io/questdb/test/griffin/GroupByTest.java
  • core/src/test/java/io/questdb/test/griffin/SqlOptimiserTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/finance/LevelTwoPriceFunctionFactoryTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/BooleanGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ByteGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/CharGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DateGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DoubleGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/FloatGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeoHashGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/IPv4GroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/IntGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/LongGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/MinCharGroupByFunctionFactoryTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ShortGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByFunctionCaseTest.java

@puzpuzpuz
Contributor Author

Bugs 1, 2, 3 were wrong

@glasstiger have you seen #6805 (comment)? Coderabbit frequently hallucinates, so it's better to double-check whether its comments make sense.

@puzpuzpuz
Contributor Author

puzpuzpuz commented Mar 4, 2026

I've reverted the fix for bugs 1-3 in fb08047

@glasstiger
Contributor

> Bugs 1, 2, 3 were wrong
>
> @glasstiger have you seen #6805 (comment)? Coderabbit frequently hallucinates, so it's better to double-check whether its comments make sense.

It did not hallucinate.
I pushed the fix, and when I asked it to check again, it did not realise that there was a fix applied.
It thought it was wrong.

@puzpuzpuz
Contributor Author

> I pushed the fix, and when I asked it to check again, it did not realise that there was a fix applied.

The fix (fa588c6) changed SumDoubleGroupByFunction#computeBatch from Numbers.isFinite(existing) to !Double.isNaN(existing), but computeNext (line 73) and merge (line 132) still use Numbers.isFinite(). This creates an inconsistency:

| Method | Guard | When accumulated sum = ±Infinity |
| --- | --- | --- |
| computeBatch (after fix) | `!Double.isNaN(existing)` | Preserves Infinity, accumulates into it |
| computeNext | `Numbers.isFinite(sum)` | Overwrites Infinity with next finite value |
| merge | `Numbers.isFinite(destSum)` | Overwrites Infinity with src value |

The same query could produce different results depending on whether QuestDB takes the batch or row-by-row code path (e.g. column tops force fallback to row-by-row). That's worse than the original behavior where all three methods were at least consistent.

Same applies to SumFloat.

As for AvgDouble, the revert is incorrect since the original patch version was inconsistent. I've reverted the fix for that function in bcd0a23

For the other two functions, the revert restores isFinite in computeBatch to match computeNext/merge. If we want to preserve Infinity instead, all three methods need to change together. WDYT?
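The divergence between the two guard styles can be sketched as a standalone demo. The helper names below are illustrative, not QuestDB code; only the guard expressions mirror the two code paths being compared (the `Numbers.isFinite`-style guard in computeNext/merge vs. the `!Double.isNaN`-style guard in the patched computeBatch):

```java
// Standalone sketch of the two null-guard styles; helper names are
// hypothetical, only the guard expressions come from the discussion above.
public class GuardDemo {

    // isFinite-style guard: an accumulated Infinity is treated as
    // "no value yet" and is overwritten by the next non-NaN value
    static double addWithIsFinite(double sum, double value) {
        if (Double.isNaN(value)) {
            return sum;
        }
        return Double.isFinite(sum) ? sum + value : value;
    }

    // !isNaN-style guard: an accumulated Infinity is preserved and the
    // next value is accumulated into it
    static double addWithIsNaN(double sum, double value) {
        if (Double.isNaN(value)) {
            return sum;
        }
        return !Double.isNaN(sum) ? sum + value : value;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(addWithIsFinite(inf, 1.0)); // 1.0 -> Infinity dropped
        System.out.println(addWithIsNaN(inf, 1.0));    // Infinity -> preserved
    }
}
```

Once the batch path and the row-by-row path use different guards, the same data can produce either result depending on which path the engine picks, which is the inconsistency discussed here.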

@glasstiger
Contributor

@CodeRabbit re-review the PR

@coderabbitai

coderabbitai bot commented Mar 5, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch current branch state of the key files after all commits
fd -e java 'SumDoubleGroupByFunction.java' --exec cat {}
echo "========== SumFloat =========="
fd -e java 'SumFloatGroupByFunction.java' --exec cat {}
echo "========== AvgDouble =========="
fd -e java 'AvgDoubleGroupByFunction.java' --exec cat {}

Length of output: 34846


@glasstiger — I've read all three files in their current state on the branch. Here's the full picture after all the fixes and reverts:


✅ All previous concerns resolved

SumDoubleGroupByFunction

All three code paths now use !Double.isNaN consistently:

| Method | Guard | Status |
| --- | --- | --- |
| computeBatch | `!Double.isNaN(batchSum)` / `!Double.isNaN(existing)` | ✅ |
| computeNext | `!Double.isNaN(value)` / `!Double.isNaN(sum)` | ✅ |
| merge | `!Double.isNaN(srcSum)` / `!Double.isNaN(destSum)` | ✅ |

The inconsistency @puzpuzpuz flagged (where the intermediate fix left computeNext/merge using Numbers.isFinite while computeBatch used !isNaN) is fully resolved. As a bonus, the previous behavior where a batch summing to ±Infinity would be silently dropped (Numbers.isFinite(batchSum) outer guard) is also fixed — Infinity-valued batches are now correctly accumulated.

SumFloatGroupByFunction

Same pattern: computeBatch, computeNext, and merge all use !Float.isNaN consistently. ✅

AvgDoubleGroupByFunction

The tricky sumDoubleAcc side-effect on countPtr is handled correctly:

  • prevCount is saved before the sumDoubleAcc call
  • If batchSum is not NaN, the batch count is written to countPtr by the native call, and prevCount is added back via addLong
  • If batchSum is NaN (all-NaN batch), prevCount is restored directly, preventing the native call from zeroing out the accumulated count ✅
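The save/restore pattern in these bullets can be sketched in plain Java. All names here are hypothetical; `sumWithCount` stands in for the native call (`sumDoubleAcc` in the PR) that overwrites the count holder as a side effect:

```java
// Sketch of the prevCount save/restore pattern described above.
// sumWithCount is a stand-in for a native sum that clobbers countHolder.
public class AvgBatchDemo {

    // simulates the native call: writes the batch's non-NaN count into
    // countHolder[0] and returns the batch sum (NaN if every value is NaN)
    static double sumWithCount(double[] values, long[] countHolder) {
        double sum = 0;
        long n = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                n++;
            }
        }
        countHolder[0] = n;
        return n > 0 ? sum : Double.NaN;
    }

    static void accumulate(double[] batch, double[] sumHolder, long[] countHolder) {
        long prevCount = countHolder[0];                 // save before the destructive call
        double batchSum = sumWithCount(batch, countHolder);
        if (!Double.isNaN(batchSum)) {
            sumHolder[0] = Double.isNaN(sumHolder[0]) ? batchSum : sumHolder[0] + batchSum;
            countHolder[0] += prevCount;                 // add the saved count back
        } else {
            countHolder[0] = prevCount;                  // all-NaN batch: restore the count
        }
    }
}
```

Without the restore in the all-NaN branch, the side-effecting call would zero out the count accumulated by earlier batches, skewing the average.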

🟠 Remaining dormant issue (unchanged from before)

Window-join startRowId = 0 — all three window-join callers (AsyncWindowJoinRecordCursorFactory, AsyncWindowJoinFastRecordCursorFactory, WindowJoinFastRecordCursorFactory) still hardcode startRowId = 0. This is dormant today because each caller invokes computeBatch exactly once per group after setEmpty(), so no multi-batch ordering comparison ever fires. Not a blocking issue, but worth a follow-up comment in the code or a TODO.


Overall

The PR is in a correct, internally consistent state. Coverage at 98.92% is excellent, with all modified production files at 100%. No blocking issues found. The startRowId = 0 window-join case is the only residual concern, and it's dormant by construction.


🧠 Learnings used
Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, for PRs requested by javier, provide line-cited, push-back style reviews with concrete follow-ups (tests, diffs, and performance considerations).

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-24T14:27:11.921Z
Learning: In questdb/questdb, QuestDB currently only supports DOUBLE[] arrays for array group-by functions (`first_array`, `last_array`, and the upcoming `array_agg`). There are no INT[] or LONG[] array group-by functions.

Learnt from: jerrinot
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-04T11:18:36.492Z
Learning: For PRs by jerrinot, always use a thorough and critical review mode: question design decisions, challenge assumptions, scrutinize implementations for edge cases and potential issues, and avoid superficial validation. They prefer hard-core reviews that push back on the code.

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: For questdb/questdb, javier expects a rigorous, push-back style review; always include concrete line-cited findings and propose targeted tests/fixes when reviewing javier’s PRs.
✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/MinCharGroupByFunctionFactoryTest.java (1)

62-90: ⚠️ Potential issue | 🟠 Major

Wrap this test with assertMemoryLeak() to enforce resource-safety checks.

testFirstNull() allocates writer/cursor resources and should follow the test harness leak-check pattern.

Proposed fix
 @Test
 public void testFirstNull() throws SqlException {
-    execute("create table tab (f char)");
-
-    final Rnd rnd = new Rnd();
-    char min = 0;
-    try (TableWriter w = getWriter("tab")) {
-        TableWriter.Row r = w.newRow();
-        r.append();
-        for (int i = 100; i > 10; i--) {
-            r = w.newRow();
-            char ch = rnd.nextChar();
-            r.putChar(0, ch);
-            r.append();
-            if (ch > 0 && (ch < min || min == 0)) {
-                min = ch;
-            }
-        }
-        w.commit();
-    }
-
-    try (RecordCursorFactory factory = select("select min(f) from tab")) {
-        try (RecordCursor cursor = factory.getCursor(sqlExecutionContext)) {
-            Record record = cursor.getRecord();
-            Assert.assertEquals(1, cursor.size());
-            Assert.assertTrue(cursor.hasNext());
-            Assert.assertEquals(min, record.getChar(0));
-        }
-    }
+    assertMemoryLeak(() -> {
+        execute("create table tab (f char)");
+
+        final Rnd rnd = new Rnd();
+        char min = 0;
+        try (TableWriter w = getWriter("tab")) {
+            TableWriter.Row r = w.newRow();
+            r.append();
+            for (int i = 100; i > 10; i--) {
+                r = w.newRow();
+                char ch = rnd.nextChar();
+                r.putChar(0, ch);
+                r.append();
+                if (ch > 0 && (ch < min || min == 0)) {
+                    min = ch;
+                }
+            }
+            w.commit();
+        }
+
+        try (RecordCursorFactory factory = select("select min(f) from tab")) {
+            try (RecordCursor cursor = factory.getCursor(sqlExecutionContext)) {
+                Record record = cursor.getRecord();
+                Assert.assertEquals(1, cursor.size());
+                Assert.assertTrue(cursor.hasNext());
+                Assert.assertEquals(min, record.getChar(0));
+            }
+        }
+    });
 }

As per coding guidelines: "**/test/**/*.java: Write all tests using assertMemoryLeak(). This isn't needed for narrow unit tests that don't allocate native memory."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/MinCharGroupByFunctionFactoryTest.java`
around lines 62 - 90, The test method testFirstNull allocates native resources
(TableWriter/RecordCursor) and must be wrapped with assertMemoryLeak to ensure
no leaks; update testFirstNull to call assertMemoryLeak(() -> { ... }) around
the existing test body (the parts that call getWriter("tab"), select("select
min(f) from tab"), and related TableWriter/RecordCursor usage) so that resource
allocation and cursor/writer lifecycles are executed inside the assertMemoryLeak
closure and existing try-with-resources blocks remain unchanged inside it.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinCharGroupByFunction.java (1)

122-127: ⚠️ Potential issue | 🟠 Major

Type mismatch: putInt used for char value.

Line 126 uses destValue.putInt(valueIndex, srcMin) to store a char value, but the column type is CHAR (see initValueTypes at line 108). This should use putChar for consistency with how the value is stored elsewhere (e.g., computeFirst at line 68, computeBatch at line 60).

🐛 Proposed fix
     public void merge(MapValue destValue, MapValue srcValue) {
         char srcMin = srcValue.getChar(valueIndex);
         char destMin = destValue.getChar(valueIndex);
         if (srcMin != 0 && (srcMin < destMin || destMin == 0)) {
-            destValue.putInt(valueIndex, srcMin);
+            destValue.putChar(valueIndex, srcMin);
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinCharGroupByFunction.java`
around lines 122 - 127, The merge method in MinCharGroupByFunction writes a char
using destValue.putInt(valueIndex, srcMin); change this to use the correct char
setter destValue.putChar(valueIndex, srcMin) so the stored type matches
initValueTypes and the other methods (computeFirst, computeBatch) that use char
accessors; ensure valueIndex and srcMin remain the same and no other logic is
changed.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullIPv4GroupByFunctionFactory.java (1)

82-86: ⚠️ Potential issue | 🟡 Minor

Add row-id guard to LastNotNullIPv4GroupByFunctionFactory.computeNext.

The IPv4 implementation is missing the row-id check present in other LastNotNull* functions (Double, Char, etc.). It unconditionally calls computeFirst when the value is non-null, which can overwrite a newer value with an older one if records are processed out of order.

Update line 84 to check both that the stored value is null (first assignment) and that the incoming rowId is newer:

if (Numbers.IPv4_NULL != arg.getIPv4(record)) {
    if (mapValue.getIPv4(valueIndex + 1) == Numbers.IPv4_NULL || rowId > mapValue.getLong(valueIndex)) {
        computeFirst(mapValue, record, rowId);
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullIPv4GroupByFunctionFactory.java`
around lines 82 - 86, In LastNotNullIPv4GroupByFunctionFactory.computeNext the
method currently calls computeFirst whenever the incoming IPv4 is non-null,
which can overwrite newer values; change computeNext to first verify the
incoming value is non-null and then only call computeFirst if the stored slot is
empty or the incoming rowId is newer (check mapValue.getIPv4(valueIndex + 1) ==
Numbers.IPv4_NULL || rowId > mapValue.getLong(valueIndex)) so that
computeFirst(mapValue, record, rowId) is invoked only when appropriate.
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/CharGroupByFunctionBatchTest.java (1)

61-249: ⚠️ Potential issue | 🟠 Major

Wrap these native-memory tests in assertMemoryLeak().

These tests allocate off-heap memory (Unsafe.malloc) but are not wrapped with assertMemoryLeak(). Please apply the same leak-check wrapper pattern across the newly added/updated batch tests in this PR (including Date/Byte/IPv4/Float/Int/GeoHash batch test files), not just this class.

As per coding guidelines, Write all tests using assertMemoryLeak(). This isn't needed for narrow unit tests that don't allocate native memory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/CharGroupByFunctionBatchTest.java`
around lines 61 - 249, These batch tests allocate native memory (via
allocateChars / Unsafe.malloc) and must be wrapped with assertMemoryLeak() to
catch leaks; update each affected test (e.g., testFirstCharBatch,
testFirstCharBatchAccumulates, testFirstNotNullCharBatch, testLastCharBatch,
testMaxCharBatch, testMinCharBatchSimple, etc.) to run their body inside
assertMemoryLeak(() -> { ... }), keeping the existing try-with-resources
prepare(...) and calls to allocateChars and computeBatch unchanged; apply the
same pattern to the other batch test classes mentioned
(Date/Byte/IPv4/Float/Int/GeoHash) so every test that calls allocateChars or
other native allocators is wrapped in assertMemoryLeak().
🧹 Nitpick comments (11)
core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java (1)

3132-3136: Update stale test intent comment in testGroupByNotKeyed2.

Line 3132 says expressions disable vectorization, but Line 3136 asserts vectorized: true. Please align the comment with the actual expected behavior to avoid misleading future debugging.

Suggested comment-only fix
-    @Test // expressions in aggregates disable vectorized impl
+    @Test // mixed aggregate args are expected to run with vectorized=true in this case
     public void testGroupByNotKeyed2() throws Exception {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java` around lines
3132 - 3136, The test comment for testGroupByNotKeyed2 is stale—update the
inline comment that currently reads "expressions in aggregates disable
vectorized impl" to reflect that vectorization is expected (the assertion shows
"vectorized: true"); modify the comment near the test method
testGroupByNotKeyed2 (method name: testGroupByNotKeyed2) so it accurately
describes that expressions in aggregates do not disable vectorized execution in
this case.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxTimestampGroupByFunction.java (1)

48-56: Minor style inconsistency with MinTimestampGroupByFunction.

The logic is correct—LONG_NULL (Long.MIN_VALUE) ensures any valid timestamp will compare as greater. However, MinTimestampGroupByFunction.computeBatch uses explicit null guards (batchMin != Numbers.LONG_NULL), whereas this relies on implicit sentinel comparison. Additionally, computeFirst and setNull use putTimestamp, but computeBatch uses putLong.

Consider aligning the pattern for consistency:

♻️ Suggested alignment (optional)
     public void computeBatch(MapValue mapValue, long ptr, int count, long startRowId) {
         if (count > 0) {
             final long batchMax = Vect.maxLong(ptr, count);
-            final long existing = mapValue.getLong(valueIndex);
-            if (batchMax > existing) {
-                mapValue.putLong(valueIndex, batchMax);
+            if (batchMax != Numbers.LONG_NULL) {
+                final long existing = mapValue.getTimestamp(valueIndex);
+                if (batchMax > existing || existing == Numbers.LONG_NULL) {
+                    mapValue.putTimestamp(valueIndex, batchMax);
+                }
             }
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxTimestampGroupByFunction.java`
around lines 48 - 56, MaxTimestampGroupByFunction.computeBatch currently relies
on implicit sentinel comparison and uses mapValue.putLong, which is inconsistent
with MinTimestampGroupByFunction and with this class's computeFirst/setNull that
use putTimestamp; update computeBatch in MaxTimestampGroupByFunction to mirror
MinTimestampGroupByFunction by explicitly guarding against Numbers.LONG_NULL for
batchMax (e.g., if (batchMax != Numbers.LONG_NULL) { ... }) and when updating
the map use mapValue.putTimestamp(valueIndex, batchMax) instead of putLong so
the null-sentinel handling and write method are consistent with computeFirst and
setNull.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionByte.java (1)

41-51: Skip zero-delta writes in hot batch path.

This is correct functionally, but writing addLong(..., 0) on null-only batches adds avoidable map churn. Consider guarding on nonNullCount > 0 (as in sibling count implementations).

♻️ Suggested micro-optimization
-            mapValue.addLong(valueIndex, nonNullCount);
+            if (nonNullCount > 0) {
+                mapValue.addLong(valueIndex, nonNullCount);
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionByte.java`
around lines 41 - 51, In computeBatch of CountGeoHashGroupByFunctionByte, avoid
calling mapValue.addLong(valueIndex, 0) for batches with no non-null bytes to
reduce map churn: after counting nonNullCount (the loop using
Unsafe.getUnsafe().getByte and GeoHashes.BYTE_NULL), only call
mapValue.addLong(valueIndex, nonNullCount) when nonNullCount > 0; leave the
counting loop and variable names (ptr, count, hi, nonNullCount, valueIndex)
unchanged.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullIPv4GroupByFunctionFactory.java (1)

70-73: Use Integer.BYTES instead of hardcoded 4L in pointer math.

This reduces magic numbers and keeps width semantics tied to type size.

♻️ Suggested cleanup
-                final long hi = ptr + count * 4L;
+                final long hi = ptr + count * (long) Integer.BYTES;
                 long offset = 0;
-                for (; ptr < hi; ptr += 4L) {
+                for (; ptr < hi; ptr += Integer.BYTES) {
                     int value = Unsafe.getUnsafe().getInt(ptr);
                     if (value != Numbers.IPv4_NULL) {
                         long rowId = startRowId + offset;
                         long existingRowId = mapValue.getLong(valueIndex);
                         if (rowId < existingRowId || existingRowId == Numbers.LONG_NULL || mapValue.getIPv4(valueIndex + 1) == Numbers.IPv4_NULL) {
                             mapValue.putLong(valueIndex, rowId);
                             mapValue.putInt(valueIndex + 1, value);
                         }
                         break;
                     }
                     offset++;
                 }

Also applies to: 83-83

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullIPv4GroupByFunctionFactory.java`
around lines 70 - 73, Replace the hardcoded 4L used for IPv4 pointer arithmetic
in FirstNotNullIPv4GroupByFunctionFactory with Integer.BYTES to avoid magic
numbers: compute hi as ptr + count * (long)Integer.BYTES, increment ptr by
(long)Integer.BYTES in the loop, and update the similar occurrence at the other
location (around the second loop at the noted lines); keep the getInt(ptr) usage
unchanged but ensure all pointer math consistently uses Integer.BYTES (cast to
long where needed).
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DoubleGroupByFunctionBatchTest.java (1)

577-621: Place new testAvg... methods with the other testAvg... members.

The methods added at Line 577+ are out of alphabetical member order relative to the existing testAvg... block. Please move them to preserve class member ordering consistency.

As per coding guidelines: **/*.java: "Group Java class members by kind (static vs. instance) and visibility, sorted alphabetically."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DoubleGroupByFunctionBatchTest.java`
around lines 577 - 621, The four new test methods
(testAvgDoubleComputeFirstInfinity, testAvgDoubleComputeFirstNaN,
testAvgDoubleComputeNextInfinity, testAvgDoubleComputeNextNaNSkipped) are placed
out of alphabetical order; move these methods inside the existing block of other
testAvg... members in DoubleGroupByFunctionBatchTest so they are grouped with
the other testAvg* methods and sorted alphabetically within that group, keeping
each method signature and body unchanged and preserving the try-with-resources
usage and assertions.
core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java (1)

58-75: Clarify startRowId monotonicity in the computeBatch contract.

The updated docs state multi-call accumulation, but they do not explicitly require increasing, non-overlapping startRowId ranges per group. Making that explicit will prevent incorrect first/last behavior if callers split work into multiple batch invocations.

✏️ Suggested Javadoc clarification
-     * This method may be called multiple times for the same group {@link MapValue} (e.g. once
-     * per page frame). Implementations must accumulate into the existing state set by
+     * This method may be called multiple times for the same group {@link MapValue} (e.g. once
+     * per page frame). Implementations must accumulate into the existing state set by
      * {@link #setEmpty(MapValue)}, not overwrite it.
+     * For a given group, repeated calls must use non-overlapping, increasing {@code startRowId}
+     * ranges so row-order-aware aggregates (e.g. first/last) remain correct.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java`
around lines 58 - 75, Update the Javadoc for GroupByFunction.computeBatch to
explicitly state that for a given MapValue successive invocations of
computeBatch will receive non-overlapping, strictly increasing row ranges: each
call’s startRowId must be greater than the previous call’s startRowId and the
ranges must not overlap, so implementations of computeBatch (which must
accumulate into state set by setEmpty and may be invoked without a preceding
computeFirst or followed by merge when supportsBatchComputation() is true) may
rely on monotonic, non-overlapping batches to implement first/last semantics
correctly.
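A minimal sketch of what the clarified contract buys an implementation (illustrative code, not QuestDB APIs): a row-id-aware "first" aggregate only needs to track one rowId/value pair, because under the contract the first element of a batch sits at `startRowId` and batches for a group never overlap:

```java
// Illustrative "first" accumulator that relies on the contract:
// batches for one group arrive with increasing, non-overlapping startRowId.
public class FirstBatchDemo {
    public long firstRowId = Long.MIN_VALUE; // Long.MIN_VALUE marks "empty"
    public long firstValue;

    public void computeBatch(long[] values, long startRowId) {
        if (values.length == 0) {
            return;
        }
        // values[0] is the row at startRowId; keep it only if it precedes
        // everything seen so far (or nothing has been seen yet)
        if (firstRowId == Long.MIN_VALUE || startRowId < firstRowId) {
            firstRowId = startRowId;
            firstValue = values[0];
        }
    }
}
```

The monotonicity guarantee also enables cheaper implementations: a "last" aggregate may simply overwrite its state with the final element of each batch, with no rowId comparison at all.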
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ShortGroupByFunctionBatchTest.java (2)

230-239: Strengthen accumulation coverage with non-zero second startRowId.

For a true “accumulates across batches” shape under the new API, make the second call use an offset (e.g., 2) rather than 0.

♻️ Suggested test tweak
- ptr = allocateShorts((short) 3, (short) 4);
- function.computeBatch(value, ptr, 2, 0);
+ ptr = allocateShorts((short) 3, (short) 4);
+ function.computeBatch(value, ptr, 2, 2);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ShortGroupByFunctionBatchTest.java`
around lines 230 - 239, The test currently calls function.computeBatch(value,
ptr, 2, 0) twice, which doesn't verify accumulation when the second batch has a
non-zero startRowId; update the second computeBatch invocation to use a non-zero
offset (e.g., 2) so it reads like function.computeBatch(value, ptr, 2, 2) to
exercise accumulation across batches for SumShortGroupByFunction; keep the rest
of the test (prepare, allocateShorts, SimpleMapValue) unchanged.

109-117: Rename *AllNull tests to match actual inputs.

Both methods currently include one non-null value, so names like ...LeadingNull / ...TrailingNull would be less misleading during failure triage.

Also applies to: 187-196

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ShortGroupByFunctionBatchTest.java`
around lines 109 - 117, The test method names (e.g., testFirstShortBatchAllNull)
are misleading because the test bodies include one non-null value; rename them
to reflect the actual input pattern (e.g., testFirstShortBatchLeadingNull or
testFirstShortBatchTrailingNull) and update the corresponding test at the other
occurrence (lines ~187-196) accordingly; ensure you change the method
declarations and any references to these methods (search for
testFirstShortBatchAllNull and similar) and keep the test logic using
FirstShortGroupByFunction, allocateShorts, and prepare unchanged.
core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java (1)

75-75: Rename boolean field to use is... prefix.

The field vectorized at line 75 violates the project convention for boolean naming. Rename to isVectorized to match the guideline: "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix."

♻️ Proposed rename
- private final boolean vectorized;
+ private final boolean isVectorized;
...
- this.vectorized = (reducer == AGGREGATE_VECT);
+ this.isVectorized = (reducer == AGGREGATE_VECT);
...
- sink.attr("vectorized").val(vectorized);
+ sink.attr("vectorized").val(isVectorized);

Applies to lines: 75, 136, 185.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java`
at line 75, Rename the boolean field 'vectorized' in
AsyncGroupByNotKeyedRecordCursorFactory to 'isVectorized' and update all
references (constructor parameter, assignments, and any usages such as in
methods or other classes) to the new name; specifically change the field
declaration 'vectorized' to 'isVectorized', update the constructor parameter and
assignment that set this flag, and replace all usages at the sites
referenced (previously at lines around 136 and 185) to use 'isVectorized' so
naming follows the is... boolean convention.
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeoHashGroupByFunctionBatchTest.java (1)

366-397: Add last_not_null accumulation tests for BYTE/SHORT/LONG geohash variants too.

This file now validates non-zero-offset accumulation for the INT variant, but the production change touched BYTE/SHORT/LONG as well. Add sibling two-batch tests (startRowId > 0 on the second batch) for those variants to catch type-specific row-id ordering regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeoHashGroupByFunctionBatchTest.java`
around lines 366 - 397, Add sibling two-batch unit tests for the BYTE and SHORT
geohash variants (and ensure the LONG variant has a two-batch accumulation test
with a non-zero second batch startRowId) similar to
testLastNotNullGeoHashBatchIntAccumulates: create tests (e.g.,
testLastNotNullGeoHashBatchByteAccumulates and
testLastNotNullGeoHashBatchShortAccumulates) that build a GroupByFunction via
newLastNotNullGeoHashFunction(GeoByteColumn.newInstance(...)) /
GeoShortColumn.newInstance(...), call function.setNull(value), then call
function.computeBatch(value, ptr, len, 0) for the first batch and
function.computeBatch(value, ptr2, len2, startRowId>0) for the second batch
using the appropriate allocateBytes/allocateShorts (and for LONG ensure a
second-batch startRowId>0 case if missing) with GeoHashes.*_NULL/values, and
finally assert the expected value with
function.getGeoByte/getGeoShort/getGeoLong and that
function.supportsBatchComputation() where appropriate.
core/src/test/java/io/questdb/test/griffin/GroupByTest.java (1)

2984-3006: Add plan-shape assertions to lock vectorized execution in these scenarios.

Lines 2995, 3025, 3043, and 3154 currently assert only final results. If planner routing regresses to a non-vectorized path, these tests can still pass and miss the performance regression this change targets.

Suggested pattern (example for one test)
 `@Test`
 public void testNonKeyedVectorizedAllNulls() throws Exception {
     assertMemoryLeak(() -> {
         execute("CREATE TABLE t (v INT, ts TIMESTAMP) TIMESTAMP(ts) PARTITION BY DAY");
         execute(
                 """
                         INSERT INTO t(ts) VALUES
                         ('2024-01-01T00:00:00.000000Z'),
                         ('2024-01-02T00:00:00.000000Z')"""
         );
+        String query = "SELECT sum(v), min(v), max(v), count(*), avg(v), first(v), last(v), first_not_null(v), last_not_null(v) FROM t";
+        assertPlanNoLeakCheck(
+                query,
+                """
+                        Async Group By workers: 1
+                          vectorized: true
+                          values: [sum(v),min(v),max(v),count(*),avg(v),first(v),last(v),first_not_null(v),last_not_null(v)]
+                          filter: null
+                            PageFrame
+                                Row forward scan
+                                Frame forward scan on: t
+                        """
+        );
         assertQueryNoLeakCheck(
                 """
                         sum\tmin\tmax\tcount\tavg\tfirst\tlast\tfirst_not_null\tlast_not_null
                         null\tnull\tnull\t2\tnull\tnull\tnull\tnull\tnull
                         """,
-                "SELECT sum(v), min(v), max(v), count(*), avg(v), first(v), last(v), first_not_null(v), last_not_null(v) FROM t",
+                query,
                 null,
                 false,
                 true
         );
     });
 }

Also applies to: 3008-3036, 3038-3054, 3136-3165

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/griffin/GroupByTest.java` around lines
2984 - 3006, The tests (e.g., GroupByTest.testNonKeyedVectorizedAllNulls)
currently assert only query results; add an assertion that the planner chooses
the vectorized execution path by running an EXPLAIN (or planner plan) for the
same query and asserting the plan text contains the vectorized operator
identifier used by the engine (e.g., "VectorizedGroupBy" or "vectorized") before
calling assertQueryNoLeakCheck; apply the same pattern to the other affected
tests (the tests around the shown ranges that use the same GROUP BY aggregate
SQL and assertQueryNoLeakCheck) so they fail if the planner regresses to a
non-vectorized plan.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/AvgDoubleGroupByFunction.java`:
- Around line 52-64: AvgDoubleGroupByFunction's computeBatch currently treats
Double.isNaN(batchSum) as an "empty" sentinel and discards the batch count,
which corrupts state when a batch produces NaN from non-empty data (e.g., +Inf
and -Inf); change the logic to use the actual batch count returned/written by
sumDoubleAcc (the countPtr/batchCount) to decide emptiness and always add the
batchCount to mapValue at valueIndex+1 when batchCount>0, while still merging
sums by reading prevSum and adding batchSum when applicable; update references
in computeBatch/sumDoubleAcc handling (valueIndex, countPtr, mapValue)
accordingly and add a regression test that runs computeBatch across two frames
where the first frame yields NaN from a non-empty set (e.g., +Infinity and
-Infinity) and asserts the aggregate remains NaN with the correct accumulated
count after a subsequent finite batch.
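
A count-driven sketch of that fix, under the assumption that NaN encodes SQL NULL in the input (class and method names are hypothetical, not QuestDB's implementation):

```java
// Emptiness is decided by the batch count, never by isNaN(batchSum):
// +Infinity + -Infinity produces NaN from a non-empty batch, and that NaN
// must be preserved in the running sum.
class BatchAvgDouble {
    private double sum = 0d;
    private long count = 0;

    void computeBatch(double[] values) {
        double batchSum = 0d;
        long batchCount = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) { // NaN encodes SQL NULL here
                batchSum += v;
                batchCount++;
            }
        }
        if (batchCount > 0) { // count, not isNaN, is the emptiness test
            sum += batchSum;
            count += batchCount;
        }
    }

    double get() {
        return count > 0 ? sum / count : Double.NaN;
    }
}
```
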

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumShortGroupByFunction.java`:
- Around line 48-58: computeBatch currently unconditionally accumulates batchSum
(from Vect.sumShort) which can be Numbers.LONG_NULL; update
SumShortGroupByFunction.computeBatch to treat Numbers.LONG_NULL like "no value"
similarly to merge(): obtain batchSum, if batchSum == Numbers.LONG_NULL then
return/do nothing; otherwise read existing = mapValue.getLong(valueIndex) and if
existing == Numbers.LONG_NULL put batchSum, else put existing + batchSum; this
preserves existing behavior and matches
SumIntGroupByFunction/SumLongGroupByFunction semantics.
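
The sentinel-aware accumulation can be sketched as follows; LONG_NULL here is a stand-in for Numbers.LONG_NULL, and the class is illustrative rather than the real SumShortGroupByFunction:

```java
// Mirrors the merge() semantics described above: a null batch sum leaves the
// state untouched, and the first real batch sum replaces the sentinel.
class BatchSumLong {
    static final long LONG_NULL = Long.MIN_VALUE; // stand-in for Numbers.LONG_NULL

    private long sum = LONG_NULL;

    void addBatchSum(long batchSum) {
        if (batchSum == LONG_NULL) {
            return; // "no value" batch: do nothing
        }
        sum = (sum == LONG_NULL) ? batchSum : sum + batchSum;
    }

    long get() {
        return sum;
    }
}
```
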

In
`@core/src/main/java/io/questdb/griffin/engine/join/AsyncWindowJoinFastRecordCursorFactory.java`:
- Around line 569-572: The vectorized computeBatch invocations in
AsyncWindowJoinFastRecordCursorFactory use a hardcoded startRowId of 0 for
no-arg and shifted-column paths which breaks ordering guarantees for
first/last-style aggregates if batches are split; update all calls to
groupByFunctions.getQuick(i).computeBatch(...) to pass the logical batch start
instead of 0 — use rowLo for normal paths and rowLo+1 for shifted-column
(pointer-shifted) paths, and preserve the existing address computation
(columnSink.of(ptr).startAddress() + typeSize * rowLo) and batch length
((int)(rowHi - rowLo)); apply the same change at the other three sites where
computeBatch is called (the shifted and non-shifted variants).

In
`@core/src/main/java/io/questdb/griffin/engine/join/WindowJoinFastRecordCursorFactory.java`:
- Around line 889-892: The vectorized computeBatch invocations for
groupByFunctions.getQuick(i).computeBatch currently pass a hardcoded startRowId
of 0 which breaks row-order-aware aggregates; change both calls to pass the
correct logical startRowId instead of 0—use efficientRowLo when computing
against columnSink.of(ptr).startAddress() + typeSize * efficientRowLo (to match
the address offset) and use rowLo (or efficientRowLo if appropriate for your
window semantics) for the no-arg computeBatch path so startRowId tracks the
actual batch position rather than always 0.
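
The call-site pattern both of these fixes ask for can be sketched generically: the address offset and the startRowId argument must move together. BatchFn, runBatch, and baseAddress are hypothetical stand-ins for the real groupByFunctions/columnSink machinery:

```java
// A batch that starts at rowLo passes rowLo (not 0) as startRowId, and offsets
// the read address by the same amount.
interface BatchFn {
    void computeBatch(long addr, int count, long startRowId);
}

class BatchDispatch {
    // baseAddress models columnSink.of(ptr).startAddress().
    static void runBatch(BatchFn fn, long baseAddress, int typeSize, long rowLo, long rowHi) {
        long addr = baseAddress + (long) typeSize * rowLo; // address tracks rowLo
        fn.computeBatch(addr, (int) (rowHi - rowLo), rowLo); // and so does startRowId
    }
}
```
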

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DoubleGroupByFunctionBatchTest.java`:
- Around line 61-83: Wrap the native-buffer-allocating test body in an
assertMemoryLeak() guard: in testAvgDoubleBatchAccumulatedInfinityIsPreserved(),
surround the code that calls allocateDoubles(...) and function.computeBatch(...)
(the block using AvgDoubleGroupByFunction, prepare(...), SimpleMapValue, and
allocateDoubles) with an assertMemoryLeak(() -> { ... }) lambda so the native
allocations are leak-checked; keep the test logic intact and only move its
contents into the assertMemoryLeak wrapper.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/LongGroupByFunctionBatchTest.java`:
- Around line 72-84: Wrap the native-memory-allocating test method
testCountLongBatchAccumulates with assertMemoryLeak(() -> { ... }) so the
allocateLongs (Unsafe.malloc) paths are leak-checked; specifically, keep the
existing instantiation of CountLongGroupByFunction, the prepare(...) usage, the
allocateLongs/computeBatch calls, and the Assert.assertEquals inside the lambda
passed to assertMemoryLeak to ensure error paths are covered by the leak
harness.
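
A toy model of why the wrapper matters: a leak check compares outstanding native allocations before and after the body, so any allocation path outside the closure escapes the check. This harness is purely illustrative; QuestDB's real assertMemoryLeak lives in its test infrastructure:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal leak-check harness: tracks net allocated bytes and fails the test
// if the body leaves anything outstanding.
class LeakCheck {
    static final AtomicLong allocated = new AtomicLong();

    static void malloc(long bytes) {
        allocated.addAndGet(bytes);
    }

    static void free(long bytes) {
        allocated.addAndGet(-bytes);
    }

    static void assertNoLeak(Runnable body) {
        long before = allocated.get();
        body.run();
        long leaked = allocated.get() - before;
        if (leaked != 0) {
            throw new AssertionError("leaked " + leaked + " bytes");
        }
    }
}
```
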

In
`@core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByFunctionCaseTest.java`:
- Around line 245-256: Rename the boolean local asyncVectorized to follow the
boolean naming convention (e.g., isAsyncVectorized): update its declaration
(currently "boolean asyncVectorized = ...") and all usages (the conditional
computing it and the planSink.put(asyncVectorized) call) to the new name so
compilation and behavior remain unchanged but the identifier uses the is...
prefix.
- Around line 195-206: In the GroupByFunctionCaseTest test SQL string (the
multiline SELECT that defines candle_st, venue, num_ticks, quote_volume, vwap
from 'spot_trades'), change the lowercase SQL keywords "like" and "in" to
uppercase "LIKE" and "IN" respectively so the query uses uppercase SQL keywords
per test style conventions; update the occurrences inside the WHERE clause of
that SQL literal accordingly.

---

Outside diff comments:
In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullIPv4GroupByFunctionFactory.java`:
- Around line 82-86: In LastNotNullIPv4GroupByFunctionFactory.computeNext the
method currently calls computeFirst whenever the incoming IPv4 is non-null,
which can overwrite newer values; change computeNext to first verify the
incoming value is non-null and then only call computeFirst if the stored slot is
empty or the incoming rowId is newer (check mapValue.getIPv4(valueIndex + 1) ==
Numbers.IPv4_NULL || rowId > mapValue.getLong(valueIndex)) so that
computeFirst(mapValue, record, rowId) is invoked only when appropriate.
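
The guarded update can be sketched with a generic int variant; INT_NULL is an assumed stand-in sentinel, not the real Numbers.IPv4_NULL, and the class is illustrative:

```java
// A newer non-null row may overwrite the stored value; an older or null row
// never does — exactly the guard the review asks for.
class LastNotNullInt {
    static final int INT_NULL = Integer.MIN_VALUE; // stand-in null sentinel

    private long storedRowId = -1;
    private int storedValue = INT_NULL;

    void computeNext(int value, long rowId) {
        // Only overwrite when the incoming value is non-null AND either
        // nothing is stored yet or the incoming row is newer.
        if (value != INT_NULL && (storedValue == INT_NULL || rowId > storedRowId)) {
            storedRowId = rowId;
            storedValue = value;
        }
    }

    int get() {
        return storedValue;
    }
}
```
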

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinCharGroupByFunction.java`:
- Around line 122-127: The merge method in MinCharGroupByFunction writes a char
using destValue.putInt(valueIndex, srcMin); change this to use the correct char
setter destValue.putChar(valueIndex, srcMin) so the stored type matches
initValueTypes and the other methods (computeFirst, computeBatch) that use char
accessors; ensure valueIndex and srcMin remain the same and no other logic is
changed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/CharGroupByFunctionBatchTest.java`:
- Around line 61-249: These batch tests allocate native memory (via
allocateChars / Unsafe.malloc) and must be wrapped with assertMemoryLeak() to
catch leaks; update each affected test (e.g., testFirstCharBatch,
testFirstCharBatchAccumulates, testFirstNotNullCharBatch, testLastCharBatch,
testMaxCharBatch, testMinCharBatchSimple, etc.) to run their body inside
assertMemoryLeak(() -> { ... }), keeping the existing try-with-resources
prepare(...) and calls to allocateChars and computeBatch unchanged; apply the
same pattern to the other batch test classes mentioned
(Date/Byte/IPv4/Float/Int/GeoHash) so every test that calls allocateChars or
other native allocators is wrapped in assertMemoryLeak().

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/MinCharGroupByFunctionFactoryTest.java`:
- Around line 62-90: The test method testFirstNull allocates native resources
(TableWriter/RecordCursor) and must be wrapped with assertMemoryLeak to ensure
no leaks; update testFirstNull to call assertMemoryLeak(() -> { ... }) around
the existing test body (the parts that call getWriter("tab"), select("select
min(f) from tab"), and related TableWriter/RecordCursor usage) so that resource
allocation and cursor/writer lifecycles are executed inside the assertMemoryLeak
closure and existing try-with-resources blocks remain unchanged inside it.

---

Nitpick comments:
In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionByte.java`:
- Around line 41-51: In computeBatch of CountGeoHashGroupByFunctionByte, avoid
calling mapValue.addLong(valueIndex, 0) for batches with no non-null bytes to
reduce map churn: after counting nonNullCount (the loop using
Unsafe.getUnsafe().getByte and GeoHashes.BYTE_NULL), only call
mapValue.addLong(valueIndex, nonNullCount) when nonNullCount > 0; leave the
counting loop and variable names (ptr, count, hi, nonNullCount, valueIndex)
unchanged.
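
The suggested early-out can be sketched as below; BYTE_NULL is an assumed stand-in for GeoHashes.BYTE_NULL, and the array input replaces the native pointer walk:

```java
// Count non-null bytes in a batch, and skip the map write entirely when the
// batch is all-null, avoiding needless map churn.
class BatchCountByte {
    static final byte BYTE_NULL = -1; // assumed stand-in for GeoHashes.BYTE_NULL

    private long count = 0;

    void computeBatch(byte[] values) {
        long nonNullCount = 0;
        for (byte b : values) {
            if (b != BYTE_NULL) {
                nonNullCount++;
            }
        }
        if (nonNullCount > 0) { // no write for all-null batches
            count += nonNullCount;
        }
    }

    long get() {
        return count;
    }
}
```
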

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullIPv4GroupByFunctionFactory.java`:
- Around line 70-73: Replace the hardcoded 4L used for IPv4 pointer arithmetic
in FirstNotNullIPv4GroupByFunctionFactory with Integer.BYTES to avoid magic
numbers: compute hi as ptr + count * (long)Integer.BYTES, increment ptr by
(long)Integer.BYTES in the loop, and update the similar occurrence at the other
location (around the second loop at the noted lines); keep the getInt(ptr) usage
unchanged but ensure all pointer math consistently uses Integer.BYTES (cast to
long where needed).
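
The pointer walk with Integer.BYTES can be sketched as below, with a LongToIntFunction standing in for Unsafe.getUnsafe().getInt and 0 assumed as the IPv4 null sentinel:

```java
import java.util.function.LongToIntFunction;

// Pointer arithmetic with Integer.BYTES instead of the literal 4L; getInt
// models a native memory read at the given address.
class FirstNonNullInt {
    static final int IPV4_NULL = 0; // assumed null sentinel

    static int firstNonNull(long ptr, int count, LongToIntFunction getInt) {
        long hi = ptr + (long) count * Integer.BYTES;
        for (long p = ptr; p < hi; p += Integer.BYTES) {
            int v = getInt.applyAsInt(p);
            if (v != IPV4_NULL) {
                return v;
            }
        }
        return IPV4_NULL;
    }
}
```
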

In
`@core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxTimestampGroupByFunction.java`:
- Around line 48-56: MaxTimestampGroupByFunction.computeBatch currently relies
on implicit sentinel comparison and uses mapValue.putLong, which is inconsistent
with MinTimestampGroupByFunction and with this class's computeFirst/setNull that
use putTimestamp; update computeBatch in MaxTimestampGroupByFunction to mirror
MinTimestampGroupByFunction by explicitly guarding against Numbers.LONG_NULL for
batchMax (e.g., if (batchMax != Numbers.LONG_NULL) { ... }) and when updating
the map use mapValue.putTimestamp(valueIndex, batchMax) instead of putLong so
the null-sentinel handling and write method are consistent with computeFirst and
setNull.

In `@core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java`:
- Around line 58-75: Update the Javadoc for GroupByFunction.computeBatch to
explicitly state that, for a given MapValue, successive invocations of
computeBatch receive non-overlapping, strictly increasing row ranges: each
call's startRowId must be greater than the previous call's startRowId, and the
ranges must not overlap. Implementations of computeBatch (which must accumulate
into state set by setEmpty, and may be invoked without a preceding computeFirst
or a following merge when supportsBatchComputation() is true) may therefore
rely on monotonic, non-overlapping batches to implement first/last semantics
correctly.

In
`@core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java`:
- Line 75: Rename the boolean field 'vectorized' in
AsyncGroupByNotKeyedRecordCursorFactory to 'isVectorized' and update all
references (constructor parameter, assignments, and any usages such as in
methods or other classes) to the new name; specifically change the field
declaration 'vectorized' to 'isVectorized', update the constructor parameter and
assignment that set this vector flag, and replace all usages at the sites
referenced (previously at lines around 136 and 185) to use 'isVectorized' so
naming follows the is... boolean convention.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DoubleGroupByFunctionBatchTest.java`:
- Around line 577-621: The four new test methods
(testAvgDoubleComputeFirstInfinity, testAvgDoubleComputeFirstNaN,
testAvgDoubleComputeNextInfinity, testAvgDoubleComputeNextNaNSkipped) are placed
out of alphabetical order; move these methods inside the existing block of other
testAvg... members in DoubleGroupByFunctionBatchTest so they are grouped with
the other testAvg* methods and sorted alphabetically within that group, keeping
each method signature and body unchanged and preserving the try-with-resources
usage and assertions.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeoHashGroupByFunctionBatchTest.java`:
- Around line 366-397: Add sibling two-batch unit tests for the BYTE and SHORT
geohash variants (and ensure the LONG variant has a two-batch accumulation test
with a non-zero second batch startRowId) similar to
testLastNotNullGeoHashBatchIntAccumulates: create tests (e.g.,
testLastNotNullGeoHashBatchByteAccumulates and
testLastNotNullGeoHashBatchShortAccumulates) that build a GroupByFunction via
newLastNotNullGeoHashFunction(GeoByteColumn.newInstance(...)) /
GeoShortColumn.newInstance(...), call function.setNull(value), then call
function.computeBatch(value, ptr, len, 0) for the first batch and
function.computeBatch(value, ptr2, len2, startRowId>0) for the second batch
using the appropriate allocateBytes/allocateShorts (and for LONG ensure a
second-batch startRowId>0 case if missing) with GeoHashes.*_NULL/values, and
finally assert the expected value with
function.getGeoByte/getGeoShort/getGeoLong and that
function.supportsBatchComputation() where appropriate.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ShortGroupByFunctionBatchTest.java`:
- Around line 230-239: The test currently calls function.computeBatch(value,
ptr, 2, 0) twice, which doesn't verify accumulation when the second batch has a
non-zero startRowId; update the second computeBatch invocation to use a non-zero
offset (e.g., 2) so it reads like function.computeBatch(value, ptr, 2, 2) to
exercise accumulation across batches for SumShortGroupByFunction; keep the rest
of the test (prepare, allocateShorts, SimpleMapValue) unchanged.
- Around line 109-117: The test method names (e.g., testFirstShortBatchAllNull)
are misleading because the test bodies include one non-null value; rename them
to reflect the actual input pattern (e.g., testFirstShortBatchLeadingNull or
testFirstShortBatchTrailingNull) and update the corresponding test at the other
occurrence (lines ~187-196) accordingly; ensure you change the method
declarations and any references to these methods (search for
testFirstShortBatchAllNull and similar) and keep the test logic using
FirstShortGroupByFunction, allocateShorts, and prepare unchanged.

In `@core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java`:
- Around line 3132-3136: The test comment for testGroupByNotKeyed2 is
stale—update the inline comment that currently reads "expressions in aggregates
disable vectorized impl" to reflect that vectorization is expected (the
assertion shows "vectorized: true"); modify the comment near the test method
testGroupByNotKeyed2 (method name: testGroupByNotKeyed2) so it accurately
describes that expressions in aggregates do not disable vectorized execution in
this case.

In `@core/src/test/java/io/questdb/test/griffin/GroupByTest.java`:
- Around line 2984-3006: The tests (e.g.,
GroupByTest.testNonKeyedVectorizedAllNulls) currently assert only query results;
add an assertion that the planner chooses the vectorized execution path by
running an EXPLAIN (or planner plan) for the same query and asserting the plan
text contains the vectorized operator identifier used by the engine (e.g.,
"VectorizedGroupBy" or "vectorized") before calling assertQueryNoLeakCheck;
apply the same pattern to the other affected tests (the tests around the shown
ranges that use the same GROUP BY aggregate SQL and assertQueryNoLeakCheck) so
they fail if the planner regresses to a non-vectorized plan.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c2a9071a-a2fa-4afc-a617-327ce274b29a

📥 Commits

Reviewing files that changed from the base of the PR and between db3ec2c and f1f85a9.

📒 Files selected for processing (103)
  • core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/AvgDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/AvgShortGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionByte.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionInt.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionLong.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionShort.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountIPv4GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountLongConstGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstBooleanGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstByteGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionByte.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionInt.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionLong.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionShort.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstIPv4GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullGeoHashGroupByFunctionFactory.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullIPv4GroupByFunctionFactory.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstShortGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastBooleanGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastByteGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastGeoHashGroupByFunctionFactory.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastIPv4GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullGeoHashGroupByFunctionFactory.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullIPv4GroupByFunctionFactory.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastShortGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxIPv4GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MaxTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinCharGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinDateGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinIPv4GroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/MinTimestampGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumFloatGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumIntGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumLongGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/SumShortGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/join/AsyncWindowJoinFastRecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/engine/join/AsyncWindowJoinRecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/engine/join/WindowJoinFastRecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java
  • core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java
  • core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java
  • core/src/test/java/io/questdb/test/griffin/AggregateTest.java
  • core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java
  • core/src/test/java/io/questdb/test/griffin/GroupByTest.java
  • core/src/test/java/io/questdb/test/griffin/SqlOptimiserTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/finance/LevelTwoPriceFunctionFactoryTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/BooleanGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ByteGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/CharGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DateGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/DoubleGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/FloatGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeoHashGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/IPv4GroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/IntGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/LongGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/MinCharGroupByFunctionFactoryTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ShortGroupByFunctionBatchTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByFunctionCaseTest.java
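
The changed `Sum*GroupByFunction` classes implement the batch-aware `computeBatch` path described above. The sketch below is a hypothetical, simplified illustration of that pattern — the interface and class names here are invented for clarity and do not match QuestDB's actual `GroupByFunction` API, which operates on native memory rather than Java arrays:

```java
// Hedged sketch of batch-aware aggregation across sequential batches.
// BatchAggregator and SumDouble are hypothetical names, not QuestDB classes.
public class SumBatchSketch {

    interface BatchAggregator {
        // startRowId lets order-sensitive functions (first/last) know where
        // this batch sits in row-id order; a plain sum can ignore it.
        void computeBatch(double[] values, long startRowId, int count);

        double value();
    }

    static final class SumDouble implements BatchAggregator {
        private double sum;
        private boolean seen;

        @Override
        public void computeBatch(double[] values, long startRowId, int count) {
            // Accumulate one contiguous batch, skipping nulls (NaN).
            for (int i = 0; i < count; i++) {
                double v = values[i];
                if (!Double.isNaN(v)) {
                    sum += v;
                    seen = true;
                }
            }
        }

        @Override
        public double value() {
            return seen ? sum : Double.NaN;
        }
    }

    public static void main(String[] args) {
        SumDouble agg = new SumDouble();
        // Two sequential batches accumulate into one result.
        agg.computeBatch(new double[]{1.5, 2.5, Double.NaN}, 0, 3);
        agg.computeBatch(new double[]{4.0}, 3, 1);
        System.out.println(agg.value()); // 8.0
    }
}
```

The key property being tested across the `*GroupByFunctionBatchTest` files listed above is exactly this: calling the batch method repeatedly over sequential row ranges must yield the same result as a single pass over all rows.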

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 561 / 569 (98.59%)

file detail

path | covered lines | new lines | coverage
🔵 io/questdb/griffin/engine/join/AsyncWindowJoinRecordCursorFactory.java 3 8 37.50%
🔵 io/questdb/griffin/engine/functions/groupby/AvgShortGroupByFunction.java 5 6 83.33%
🔵 io/questdb/griffin/engine/join/AsyncWindowJoinFastRecordCursorFactory.java 7 8 87.50%
🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java 12 13 92.31%
🔵 io/questdb/griffin/engine/functions/groupby/MaxDoubleGroupByFunction.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstIntGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstShortGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastDoubleGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstFloatGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstCharGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullTimestampGroupByFunction.java 13 13 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionByte.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxFloatGroupByFunction.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullFloatGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastCharGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullIPv4GroupByFunctionFactory.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionLong.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastFloatGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionInt.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/AvgDoubleGroupByFunction.java 13 13 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstBooleanGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastLongGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullDoubleGroupByFunction.java 8 8 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastDateGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinDateGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxIPv4GroupByFunction.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionShort.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullLongGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullIntGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxDateGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastIPv4GroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullCharGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullDateGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinLongGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountGeoHashGroupByFunctionLong.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastBooleanGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastByteGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionShort.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastShortGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullLongGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/join/WindowJoinFastRecordCursorFactory.java 2 2 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java 83 83 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/SumLongGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstDoubleGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastIntGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinTimestampGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullGeoHashGroupByFunctionFactory.java 28 28 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullDateGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinFloatGroupByFunction.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullTimestampGroupByFunction.java 13 13 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxCharGroupByFunction.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinIPv4GroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinDoubleGroupByFunction.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxLongGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDoubleGroupByFunction.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstIPv4GroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastTimestampGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxTimestampGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/SumShortGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullIntGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstLongGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstTimestampGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastGeoHashGroupByFunctionFactory.java 24 24 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinIntGroupByFunction.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullIPv4GroupByFunctionFactory.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/SumDoubleGroupByFunction.java 10 10 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullDoubleGroupByFunction.java 9 9 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MinCharGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullGeoHashGroupByFunctionFactory.java 28 28 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstByteGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/SumIntGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionByte.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountLongConstGroupByFunction.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/SumFloatGroupByFunction.java 12 12 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullCharGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullFloatGroupByFunction.java 7 7 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstDateGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/MaxIntGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountFloatGroupByFunction.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstGeoHashGroupByFunctionInt.java 4 4 100.00%

@bluestreak01 bluestreak01 merged commit bf956c6 into master Mar 9, 2026
50 checks passed
@bluestreak01 bluestreak01 deleted the puzpuzpuz_vect_group_by branch March 9, 2026 15:02
maciulis pushed a commit to maciulis/questdb that referenced this pull request Mar 16, 2026