Skip to content

feat(sql): add geomean() aggregate function#6656

Merged
bluestreak01 merged 1 commit intomasterfrom
vi_geomean
Jan 18, 2026
Merged

feat(sql): add geomean() aggregate function#6656
bluestreak01 merged 1 commit intomasterfrom
vi_geomean

Conversation

@bluestreak01
Copy link
Copy Markdown
Member

Summary

Adds geomean(double) aggregate function that computes the geometric mean of a set of positive numbers.

  • Formula: exp(avg(ln(x))) - uses logarithms to avoid overflow with large products
  • Signature: geomean(D) - accepts double (other numeric types implicitly convert)
  • Parallel execution: Fully supported via merge() for combining partial results across workers

Edge cases (following DuckDB semantics)

Input Result Rationale
Negative values null Geometric mean undefined for negatives (ln(negative) undefined)
Zero values null ln(0) is undefined
Null values Skipped Standard SQL aggregate behavior
Empty group null Standard SQL aggregate behavior

Constant folding optimization

When the argument is a constant, the factory returns DoubleConstant directly since geomean(c) = c for any positive constant c. This avoids aggregate machinery overhead.

Test plan

  • Basic functionality tests (simple, group by, single value, empty table)
  • Mathematical verification tests (geomean(2,8)=4, geomean(1,3,9)=3)
  • Null handling tests (all null, some null)
  • Invalid value tests (negative, zero, mixed valid/invalid)
  • Parallel execution tests (10K, 100K, 2M rows with 4 workers)
  • Constant argument tests
  • Formula verification test (geomean(x) = exp(avg(ln(x))))

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 16, 2026

Walkthrough

This PR introduces a new geometric mean aggregate function (geomean) for group-by queries. It includes the core function implementation, a factory for function instantiation, and comprehensive test coverage for edge cases, parallelism, and mathematical correctness.

Changes

Cohort / File(s) Summary
Geomean Implementation
core/src/main/java/io/questdb/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunction.java, GeomeanDoubleGroupByFunctionFactory.java
New classes implementing geometric mean aggregation over positive numeric values. Core function computes exp(sum(ln(x)) / count) with NaN propagation for nulls and non-positive values. Factory provides signature geomean(D) and optimizes constant inputs. Includes merge logic for parallel execution and thread-safety support.
Test Suite
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactoryTest.java
18 test methods covering basic operations, edge cases (nulls, zero, negative values), group-by scenarios, parallel execution paths, large datasets, NaN propagation, and constant argument handling. Validates mathematical correctness via formula equivalence and plan structure verification.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

SQL

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(sql): add geomean() aggregate function' directly and concisely describes the main change—adding a new geomean aggregate function to the SQL engine.
Description check ✅ Passed The description provides a detailed explanation of the geomean() function implementation, including formula, signature, edge case handling, and comprehensive test coverage that aligns with the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

🧹 Recent nitpick comments
core/src/main/java/io/questdb/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunction.java (1)

169-183: Verify setDouble and setNull semantics are consistent with computeFirst.

setDouble handles non-positive values by setting NaN but always sets count=1, which is correct. However, setNull sets NaN with count=0, while computeFirst for null input sets sumLn=0.0 with count=0. This inconsistency may not cause issues since both result in count=0 (which returns NaN in getDouble), but using consistent values would be cleaner.

Consider using consistent null representation
 `@Override`
 public void setNull(MapValue mapValue) {
-    mapValue.putDouble(valueIndex, Double.NaN);
+    mapValue.putDouble(valueIndex, 0.0);
     mapValue.putLong(valueIndex + 1, 0);
 }
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactoryTest.java (1)

255-263: Consider comparing parallel results against sequential execution.

assertSqlCursors(sql, sql) compares the query result to itself, which only verifies determinism across runs, not correctness. For stronger verification, consider comparing parallel execution results against a sequential baseline or the equivalent exp(avg(ln(value))) formula.

Suggested improvement for stronger verification
-                        // Run query and verify results are consistent
-                        TestUtils.assertSqlCursors(
-                                engine,
-                                sqlExecutionContext,
-                                sql,
-                                sql,
-                                LOG
-                        );
+                        // Run query and verify results match the formula
+                        String formulaSql = "select sym, exp(avg(ln(value))) from tab group by sym order by sym";
+                        TestUtils.assertSqlCursors(
+                                engine,
+                                sqlExecutionContext,
+                                sql,
+                                formulaSql,
+                                LOG
+                        );

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63ad191 and 10ce31a.

📒 Files selected for processing (3)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunction.java
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactory.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactoryTest.java
🧰 Additional context used
🧬 Code graph analysis (3)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactory.java (2)
core/src/main/java/io/questdb/griffin/engine/functions/constants/DoubleConstant.java (1)
  • DoubleConstant (33-73)
core/src/main/java/io/questdb/std/Numbers.java (1)
  • Numbers (44-3523)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunction.java (3)
core/src/main/java/io/questdb/cairo/ArrayColumnTypes.java (1)
  • ArrayColumnTypes (30-62)
core/src/main/java/io/questdb/griffin/engine/functions/DoubleFunction.java (1)
  • DoubleFunction (41-236)
core/src/main/java/io/questdb/std/Numbers.java (1)
  • Numbers (44-3523)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactoryTest.java (3)
core/src/main/java/io/questdb/griffin/SqlException.java (1)
  • SqlException (37-273)
core/src/main/java/io/questdb/mp/WorkerPool.java (1)
  • WorkerPool (40-221)
core/src/test/java/io/questdb/test/tools/TestUtils.java (1)
  • TestUtils (154-2668)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (34)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
  • GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
  • GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (16)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunction.java (5)

38-50: Well-documented edge case semantics.

The class-level Javadoc clearly documents all edge cases (negative, zero, null, empty groups) and the mathematical formula used. This aligns with the DuckDB semantics mentioned in the PR objectives.


59-75: LGTM: computeFirst correctly initializes accumulator state.

The three branches handle all cases:

  • Null: count=0 indicates no valid values yet
  • Non-positive: NaN marker with count=1 to track the row was seen
  • Positive: ln(d) with count=1

77-94: LGTM: computeNext correctly accumulates values.

Null values are skipped (standard SQL behavior), non-positive values mark the sum as NaN (which propagates through addition), and positive values add ln(d) to the running sum.


101-114: LGTM: getDouble correctly computes the final result.

Returns NaN for empty groups (count=0) or when invalid values were encountered (NaN sumLn), otherwise returns exp(sumLn/count) as expected.


148-167: LGTM: merge correctly combines partial results for parallel execution.

The logic properly handles:

  • srcCount=0: nothing to merge
  • destCount=0: copy from source
  • Both non-zero: combine sums and counts (NaN propagates naturally through addition)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactory.java (2)

39-47: LGTM: Factory signature and group-by flag.

Correctly declares geomean(D) signature accepting double input, and marks itself as a group-by function.


49-70: LGTM: Constant folding optimization is correctly implemented.

The optimization correctly handles:

  • Null constant → DoubleConstant.NULL
  • Non-positive constant → DoubleConstant.NULL
  • Positive constant → new DoubleConstant(value) (since geomean(c) = c)

This avoids aggregate overhead for constant arguments.

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactoryTest.java (9)

33-66: Excellent documentation of test scenarios.

The class-level Javadoc thoroughly documents all test scenarios covered, organized by category (basic functionality, parallel execution, edge cases). This makes it easy to understand the test coverage at a glance.


68-96: LGTM: Basic null and empty table tests.

Tests correctly verify that all-null values and empty tables return null, which aligns with standard SQL aggregate behavior.


98-143: LGTM: Mathematical verification with known formulas.

Tests verify geomean against hand-calculated values:

  • geomean(2, 8) = 4 ✓
  • geomean(1, 3, 9) = 3 ✓
  • geomean(4, 4, 4, 4) = 4 ✓

145-182: LGTM: Parallel execution plan verification.

Test correctly verifies that the query plan shows Async Group By workers: 4, confirming parallel execution is being used.


271-306: LGTM: Parallel execution with negative values test.

Correctly verifies that NaN propagates through parallel merge when any group contains negative values.


369-385: LGTM: Simple geomean test with exact expected value.

Good test verifying geomean(1, 2, 4) = 2.0 exactly (since 2^3 = 8 = 124).


425-453: LGTM: Mixed valid and invalid values per group.

Excellent test covering the scenario where some groups have valid results while others have invalid values (negative, zero), verifying independent group handling.


510-535: LGTM: Formula equivalence test.

Excellent test verifying geomean(value) = exp(avg(ln(value))) holds within tolerance for random positive values across multiple groups.


537-579: LGTM: Constant argument tests.

Comprehensive coverage of constant argument behavior including positive, negative, zero, and null constants, verifying the factory's constant folding optimization.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bluestreak01
Copy link
Copy Markdown
Member Author

@CodeRabbit review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 64 / 73 (87.67%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunction.java 51 60 85.00%
🔵 io/questdb/cutlass/http/processors/LineHttpTudCache.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactory.java 10 10 100.00%
🔵 io/questdb/cutlass/http/processors/LineHttpProcessorState.java 1 1 100.00%
🔵 io/questdb/cairo/wal/WalWriter.java 1 1 100.00%

@bluestreak01 bluestreak01 added SQL Issues or changes relating to SQL execution New feature Feature requests labels Jan 18, 2026
@bluestreak01 bluestreak01 merged commit 23df04d into master Jan 18, 2026
43 checks passed
@bluestreak01 bluestreak01 deleted the vi_geomean branch January 18, 2026 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

New feature Feature requests SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants