feat(sql): weighted average and standard deviation by mtopolnik · Pull Request #6457 · questdb/questdb

mtopolnik · 2025-11-26T12:41:12Z

https://github.com/questdb/questdb-enterprise/pull/791

Adds weighted_avg(), weighted_stddev_rel(), weighted_stddev_freq(). Adds alias weighted_stddev() to weighted_stddev_rel().

According to Wikipedia, there are two variants for the unbiased estimator of population variance based on a subset of it (a sample): one for frequency weights, and another for reliability weights.

A frequency weight represents the number of occurrences of the sample in the dataset. A reliability weight represents the "importance" or "trustworthiness" of the given sample.

We implement both functions, called weighted_stddev_rel for reliability weights, and weighted_stddev_freq for frequency weights. We also define the shorthand weighted_stddev for weighted_stddev_rel.

These two:

  SELECT weighted_stddev_rel(value, weight) FROM my_table;
  SELECT weighted_stddev(value, weight) FROM my_table;

calculate the equivalent of

    SELECT sqrt(
      (
        sum(weight * value * value)
        - (sum(weight * value) * sum(weight * value) / sum(weight))
      )
      / (sum(weight) - sum(weight * weight) / sum(weight))
    ) FROM my_table;

This one:

  SELECT weighted_stddev_freq(value, weight) FROM my_table;

calculates the equivalent of

    SELECT sqrt(
      (
        sum(weight * value * value)
        - (sum(weight * value) * sum(weight * value) / sum(weight))
      )
      / (sum(weight) - 1)
    ) FROM my_table;

Benchmark

We're benchmarking the markout curve query, which is the motivator for the new aggregation functions.

Data setup:

CREATE TABLE prices (
      ts TIMESTAMP,
      sym SYMBOL,
      price DOUBLE
  ) timestamp(ts) PARTITION BY DAY;
INSERT INTO prices
  SELECT
      generate_series as timestamp,
      rnd_symbol_zipf(5_000, 2.0),
      rnd_double() * 10.0 + 5.0
      FROM generate_series('2025-01-02', '2025-01-03', '300u');

prices has 288 million rows.

CREATE TABLE orders (
    id LONG,
    order_ts TIMESTAMP,
    sym1 SYMBOL CAPACITY 1024,
    sym2 SYMBOL CAPACITY 1024,
    unit_price DOUBLE,
    volume DOUBLE
) TIMESTAMP(order_ts) PARTITION BY DAY;

WITH order_basics AS (SELECT * FROM (
  SELECT
    (generate_series / 1_000_000 - '2025-01-02'::timestamp) AS id,
    generate_series as order_ts,
    rnd_symbol_zipf(5_000, 2.0) AS sym1,
    rnd_symbol_zipf(5_000, 2.0) AS sym2,
    rnd_double() * 20 + 10 AS volume
  FROM generate_series('2025-01-02', '2025-01-03', '1s'))
  TIMESTAMP(order_ts)),
  join1 AS (
    SELECT /*+ asof_dense(ob p) */ ob.*, p.price price1 FROM order_basics ob ASOF JOIN prices p ON (ob.sym1 = p.sym)
  ),
  join2 AS (
    SELECT /*+ asof_dense(j1 p) */ j1.*, p.price price2 FROM join1 j1 ASOF JOIN prices p ON (j1.sym2 = p.sym)
  )
INSERT INTO orders
SELECT id, order_ts, sym1, sym2, price1/price2 AS unit_price, volume FROM join2;

orders has 86,400 rows.

WITH
  offsets AS (
    SELECT sec_offs, 1_000_000 * sec_offs usec_offs 
    FROM (SELECT x-601 AS sec_offs FROM long_sequence(1201))
  ),
  points AS (SELECT /*+ markout_horizon(orders offsets) */ * FROM (
    SELECT id, order_ts, sym1, sym2, unit_price, volume, sec_offs, order_ts + usec_offs AS ts
    FROM orders CROSS JOIN offsets
    ORDER BY order_ts + usec_offs
  ) TIMESTAMP(ts)),
  join1 AS (
    SELECT /*+ asof_dense(t p) */ t.*, p.price AS sym1_price
    FROM points as t
    ASOF JOIN prices as p
    ON (t.sym1 = p.sym)
  ),
  join2 AS (
    SELECT /*+ asof_dense(t p) */ t.*, p.price AS sym2_price
    FROM join1 as t
    ASOF JOIN prices as p
    ON (t.sym2 = p.sym)
  ),
  markouts AS (
    SELECT
      sec_offs,
      volume,
      unit_price - (sym1_price/sym2_price) AS markout
    FROM join2
  )
SELECT sec_offs, weighted_avg(markout, volume), weighted_stddev(markout, volume)
FROM markouts;

The CROSS JOIN outputs 103,680,000 rows, then we perform two ASOF JOIN lookups per row, for a total of 207,360,000 ASOF JOIN lookups. Then we perform two aggregations (weighted_avg and weighted_stddev).

This takes 42 seconds on EC2, r7a.4xlarge.

coderabbitai · 2025-11-26T12:41:21Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The PR extends QuestDB's GROUP BY aggregation subsystem by introducing weighted average and weighted standard deviation functions with parallel-merge support. Key changes include adding merge logic to AbstractStdDevGroupByFunction, introducing new weighted aggregate implementations (AbstractWeightedStdDevGroupByFunction, WeightedAvgDoubleGroupByFunction), removing merge logic from Pop/Sample variance functions in favor of centralized implementations, and adding comprehensive test coverage.

Changes

Cohort / File(s)	Summary
CAST Expression Handling `core/src/main/java/io/questdb/griffin/ExpressionParser.java`	Tightens CAST keyword handling by disallowing its usage after dot-dereference or literals, with explicit error throwing for invalid contexts.
GroupByFunction Base Infrastructure `core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java`	Adds documentation comment indicating setEmpty method is used by generated code.
Standard Deviation Centralization `core/src/main/java/io/questdb/griffin/engine/functions/groupby/AbstractStdDevGroupByFunction.java`	Implements merge logic using Chan et al. approach for combining partial aggregates; changes supportsParallelism() to return true.
Weighted Aggregation Base `core/src/main/java/io/questdb/griffin/engine/functions/groupby/AbstractWeightedStdDevGroupByFunction.java`	New abstract class providing weighted standard deviation computation with stateful accumulators (w_sum, w_sum2, mean, S) and merge support for parallelizable aggregation.
Variance Functions Refactoring `core/src/main/java/io/questdb/griffin/engine/functions/groupby/StdDevPopGroupByFunctionFactory.java`, `StdDevSampleGroupByFunctionFactory.java`, `VarSampleGroupByFunctionFactory.java`	Removes duplicate merge() and supportsParallelism() method implementations; consolidates parallelism logic into base class.
Weighted Average Implementation `core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunction.java`	New group-by function computing weighted average as weighted_sum / total_weight with merge and parallelism support.
Weighted Average Factory `core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunctionFactory.java`	Factory for weighted_avg(DD) group-by function.
Weighted StdDev Factories `core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedStdDevFrequencyGroupByFunctionFactory.java`, `WeightedStdDevReliabilityGroupByFunctionFactory.java`, `WeightedStdDevGroupByFunctionFactory.java`	New factories implementing weighted_stddev_freq, weighted_stddev_rel, and weighted_stddev variants with frequency and reliability weight modes.
Avg Function Tests `core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/AvgDoubleGroupByFunctionFactoryTest.java`	Expands test coverage to include weighted_avg and weighted_stddev variants; adds NULL handling and edge-case validation.
StdDev-Related Tests `core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevGroupByFunctionFactoryTest.java`, `StdDevPopGroupByFunctionFactoryTest.java`, `StdDevSampleGroupByFunctionFactoryTest.java`, `VarGroupByFunctionFactoryTest.java`, `VarPopGroupByFunctionFactoryTest.java`, `VarSampleGroupByFunctionFactoryTest.java`	Refactors existing tests: renames methods, updates test data scaling, removes long256 test variants, adjusts expected values.
Weighted StdDev Tests `core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/WeightedStdDevFreqGroupByFunctionFactoryTest.java`, `WeightedStdDevRelGroupByFunctionFactoryTest.java`	New comprehensive test suites validating weighted_stddev_freq and weighted_stddev_rel across diverse scenarios (nulls, edge cases, numeric ranges).
Test Infrastructure `core/src/test/java/io/questdb/test/sqllogictest/SqlTest.java`, `core/src/test/resources/sqllogictest/test/sql/test_stddev.test`	Narrows test discovery to single file; extends test_stddev.test with weight column and adds test cases for weighted_stddev variants.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Dense statistical implementations requiring verification of merge correctness (Chan et al. variance-combining approach)
Multiple interdependent weighted function variants (frequency vs. reliability weights) demanding consistency checks
Significant refactoring with removal of duplicate merge logic across multiple files
Comprehensive new test coverage requiring alignment with expected outputs
Risk areas requiring careful attention:
- AbstractWeightedStdDevGroupByFunction.java — Merge logic correctness for weighted mean and variance combination
- WeightedStdDevReliabilityGroupByFunctionFactory.java — Variance calculation formula s / (wSum - wSum2 / wSum) numerical stability
- Cross-variant consistency — Ensuring weighted_stddev_freq and weighted_stddev_rel produce mathematically equivalent results under equivalent weight distributions
- Test expectations — Verifying test_stddev.test assertions align with statistical correctness

Possibly related PRs

perf(sql): speed up keyed parallel GROUP BY in case of high cardinality count_distinct() #6432: Both PRs modify GROUP BY aggregation subsystem with parallel-merge behavior; this PR enables merge/supportsParallelism for stddev functions while that PR adds parallel-sharding infrastructure for GroupByFunction implementations.

Suggested labels

SQL, Enhancement

Suggested reviewers

puzpuzpuz

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(sql): weighted average and standard deviation' accurately and concisely summarizes the main change—adding new SQL aggregation functions for weighted calculations.
Description check	✅ Passed	The pull request description clearly explains the addition of weighted aggregation functions (weighted_avg, weighted_stddev_rel, weighted_stddev_freq) with detailed formulas, implementation rationale, and performance benchmarks that align well with the changeset.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

mtopolnik · 2025-11-27T11:31:34Z

@coderabbitai review

coderabbitai · 2025-11-27T11:31:39Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

core/src/test/resources/sqllogictest/test/sql/test_stddev.test (1)
120-120: Potential formatting inconsistency: mixed tab and space separators.

Line 120 appears to use a tab character between NULL values (NULL NULL\tNULL) while other lines use spaces. This may cause test failures or maintenance confusion.

Verify and align the separator to match the expected format:
-3 NULL NULL	NULL
+3 NULL NULL NULL

♻️ Duplicate comments (2)

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/VarSampleGroupByFunctionFactoryTest.java (1)

133-133: Duplicate: Verify long256 test removal.

Similar to VarPopGroupByFunctionFactoryTest, the testVarSampleLong256Values() test was removed. This is covered by the verification script in the previous file's review comment.

The EOF newline addition is appropriate.

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/VarGroupByFunctionFactoryTest.java (1)

133-133: Duplicate: Verify long256 test removal.

The testVarLong256Values() test was removed, consistent with the pattern in VarPopGroupByFunctionFactoryTest and VarSampleGroupByFunctionFactoryTest. This is covered by the verification script in the first file's review.

The EOF newline addition is appropriate.

🧹 Nitpick comments (2)

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/WeightedStdDevRelGroupByFunctionFactoryTest.java (1)

30-206: Comprehensive test coverage for the new weighted stddev functions.

The test suite covers essential scenarios including null handling, edge cases (empty table, single row), type variations (double, float, int), and large values. Both weighted_stddev_rel and its alias weighted_stddev are validated in parallel.

Consider adding a test for negative weights to verify the function's behavior, as negative weights could arise from user data and the current implementation doesn't explicitly reject them.
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/AvgDoubleGroupByFunctionFactoryTest.java (1)
80-87: SQL column alias formatting could be clearer.

The column alias val appears on a separate line from the CASE...END expression. While this may be valid QuestDB syntax, it's unusual and could be mistaken for a syntax error.

Consider making the alias more explicit:
             execute("""
                     CREATE TABLE test2 AS (
                         SELECT CASE
                             WHEN rnd_double() > 0.6 then 1.0
                             ELSE 0.0
-                        END
-                        val FROM long_sequence(100)
+                        END AS val FROM long_sequence(100)
                     )""");

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da5ff81 and d2a6805.

📒 Files selected for processing (23)

core/src/main/java/io/questdb/griffin/ExpressionParser.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/AbstractStdDevGroupByFunction.java (2 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/AbstractWeightedStdDevGroupByFunction.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/StdDevPopGroupByFunctionFactory.java (0 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/StdDevSampleGroupByFunctionFactory.java (0 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/VarSampleGroupByFunctionFactory.java (0 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunction.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunctionFactory.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedStdDevFrequencyGroupByFunctionFactory.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedStdDevGroupByFunctionFactory.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedStdDevReliabilityGroupByFunctionFactory.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/AvgDoubleGroupByFunctionFactoryTest.java (2 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevPopGroupByFunctionFactoryTest.java (2 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevSampleGroupByFunctionFactoryTest.java (2 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/VarGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/VarPopGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/VarSampleGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/WeightedStdDevFreqGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/WeightedStdDevRelGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/sqllogictest/SqlTest.java (2 hunks)
core/src/test/resources/sqllogictest/test/sql/test_stddev.test (1 hunks)

💤 Files with no reviewable changes (3)

core/src/main/java/io/questdb/griffin/engine/functions/groupby/StdDevSampleGroupByFunctionFactory.java
core/src/main/java/io/questdb/griffin/engine/functions/groupby/StdDevPopGroupByFunctionFactory.java
core/src/main/java/io/questdb/griffin/engine/functions/groupby/VarSampleGroupByFunctionFactory.java

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-11-19T12:21:00.062Z

Learnt from: jerrinot
Repo: questdb/questdb PR: 6413
File: core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java:11982-12002
Timestamp: 2025-11-19T12:21:00.062Z
Learning: QuestDB Java tests use a deterministic random seed. The test utilities (e.g., io.questdb.test.tools.TestUtils and io.questdb.std.Rnd) produce reproducible sequences, so rnd_* functions (including rnd_uuid4) yield deterministic outputs across runs. Do not flag tests in core/src/test/** that assert against values produced by rnd_* as flaky due to randomness.

Applied to files:

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevGroupByFunctionFactoryTest.java
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevPopGroupByFunctionFactoryTest.java
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevSampleGroupByFunctionFactoryTest.java
core/src/test/java/io/questdb/test/sqllogictest/SqlTest.java

📚 Learning: 2025-08-04T09:16:27.366Z

Learnt from: jerrinot
Repo: questdb/questdb PR: 5996
File: core/src/test/java/io/questdb/test/cutlass/http/line/LineHttpSenderTest.java:350-356
Timestamp: 2025-08-04T09:16:27.366Z
Learning: In QuestDB ILP tests, maximum dimension values like `(1 << 28) - 1` are intentionally hard-coded instead of using constants like `ArrayView.DIM_MAX_LEN` to serve as "canaries" that will cause test failures if the API limits change, ensuring developers are aware of breaking changes to the ILP API.

Applied to files:

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevSampleGroupByFunctionFactoryTest.java

🧬 Code graph analysis (3)

core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunctionFactory.java (2)

core/src/main/java/io/questdb/std/IntList.java (1)

IntList (34-410)

core/src/main/java/io/questdb/std/ObjList.java (1)

ObjList (34-395)

core/src/main/java/io/questdb/griffin/engine/functions/groupby/AbstractWeightedStdDevGroupByFunction.java (3)

core/src/main/java/io/questdb/cairo/ArrayColumnTypes.java (1)

ArrayColumnTypes (30-62)

core/src/main/java/io/questdb/griffin/engine/functions/DoubleFunction.java (1)

DoubleFunction (41-236)

core/src/main/java/io/questdb/std/Numbers.java (1)

Numbers (44-3523)

core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunction.java (3)

core/src/main/java/io/questdb/cairo/ArrayColumnTypes.java (1)

ArrayColumnTypes (30-62)

core/src/main/java/io/questdb/griffin/engine/functions/DoubleFunction.java (1)

DoubleFunction (41-236)

core/src/main/java/io/questdb/std/Numbers.java (1)

Numbers (44-3523)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (33)

GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
GitHub Check: New pull request (Hosted Running tests on windows-other-2)
GitHub Check: New pull request (Hosted Running tests on windows-other-1)
GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
GitHub Check: New pull request (Hosted Running tests on mac-other)
GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
GitHub Check: New pull request (Hosted Running tests on mac-cairo)
GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
GitHub Check: New pull request (Hosted Running tests on mac-griffin)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
GitHub Check: New pull request (Check Changes Check changes)

🔇 Additional comments (24)

core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java (1)

193-196: Based on my verification, I found that:

The comment is accurate - setEmpty is indeed called by generated code. The GroupByFunctionsUpdaterFactory uses ASM (bytecode manipulation library) to dynamically generate bytecode that invokes setEmpty via asm.invokeInterface(setEmptyIndex, 1) (line 220).

However, the review's supporting claims are misleading - The review states similar comments exist at lines 173, 198, 203, 208, 215, but examination of the file shows those lines contain comments like "// used when doing interpolation" (for setByte, setDecimal128, setDouble, etc.), not "// used by generated code". The setEmpty comment is unique in documenting generated code usage.

Here's my rewritten review comment:

Comment accurately documents generated code usage.

The setEmpty method is invoked by generated bytecode created by GroupByFunctionsUpdaterFactory using ASM (see GroupByFunctionsUpdaterFactory.java line 220: asm.invokeInterface(setEmptyIndex, 1)). The comment correctly distinguishes this method's usage pattern from other interface methods that are called during interpolation.

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevPopGroupByFunctionFactoryTest.java (2)

82-88: Huge-values stddev_pop test looks correct and valuable

The construction 100_000_000 * x over long_sequence(1_000_000) and the asserted result are mathematically consistent with the closed-form population stddev for 1..n scaled by 1e8. This gives good coverage for large magnitudes and checks numerical stability without introducing any apparent overflow or runtime concerns.

92-99: Int-valued stddev_pop coverage is clear and consistent

Using cast(x as int) from long_sequence(100) with the existing expected stddev ensures consistent behavior across double/float/int types and keeps the test straightforward and deterministic. No issues here.

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/VarPopGroupByFunctionFactoryTest.java (1)

133-133: Long256 was never supported for variance functions—test removal is intentional and correct.

The var_pop function signature is var_pop(D) (double only), and long256 is not implemented in any variance function factory. The removal of testVarPopLong256Values() aligns with the actual supported type coverage and is consistent with similar removals in related variance test files. No further action needed.

The EOF newline addition is a standard formatting improvement.

core/src/main/java/io/questdb/griffin/ExpressionParser.java (1)

891-897: LGTM - tightening CAST context validation.

The change correctly rejects CAST after dot-dereference (e.g., x.cast(...)) or after a literal, which are both invalid syntactic contexts. The simplified flow then unconditionally handles valid CAST positioning.

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevGroupByFunctionFactoryTest.java (1)

133-133: Verify intentional removal of testStddevLong256Values test.

The AI summary indicates this test was removed. Please confirm this is intentional and that long256 type support for stddev() is either deprecated/unsupported or covered elsewhere.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedStdDevGroupByFunctionFactory.java (1)

27-31: Clean alias implementation for weighted_stddev() → weighted_stddev_rel().

The factory correctly inherits all behavior from the reliability-weighted variant while providing the alias signature.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/AbstractStdDevGroupByFunction.java (2)

100-125: Correct implementation of Chan et al. parallel merge algorithm.

The merge logic correctly implements the formula for combining partial Welford aggregates. Edge cases are handled properly:

When both counts are 0, division by zero produces NaN (acceptable for no-data state)

When only one partition is empty, the formula correctly degenerates to preserve the non-empty partition's statistics

The inline comment explaining the numerically stable weighted mean choice (lines 114-119) is appreciated.

141-143: Parallelism enabled consistently with merge implementation.

core/src/test/resources/sqllogictest/test/sql/test_stddev.test (1)

2-40: Good test coverage for the new weighted stddev functions.

The SQL logic tests properly validate both weighted_stddev_rel and weighted_stddev_freq functions with scalar, aggregate, and grouped scenarios, including proper NULL handling.

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/StdDevSampleGroupByFunctionFactoryTest.java (1)

82-99: Test refactoring looks correct.

The renamed test testStddevSampHugeValues properly validates the function with large values (100M scale over 1M rows), and the repurposed testStddevSampIntValues test now focuses on int type validation.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunctionFactory.java (1)

47-56: Factory implementation follows QuestDB patterns correctly.

The signature weighted_avg(DD) correctly indicates two double arguments, and the factory properly delegates to the implementation class. The use of getQuick is safe here as the function signature guarantees exactly two arguments.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/AbstractWeightedStdDevGroupByFunction.java (2)

38-91: Excellent documentation with formula references.

The class documentation clearly explains the two variance estimators (reliability vs frequency weights) with equivalent SQL expressions and Wikipedia references. This will help future maintainers understand the algorithm.

102-145: Incremental algorithm implementation looks correct.

The computeFirst and computeNext methods correctly implement the weighted incremental variance algorithm. The null/non-finite/zero-weight checks properly skip invalid records while maintaining consistent state.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedStdDevFrequencyGroupByFunctionFactory.java (1)

59-86: Frequency weight implementation is correct.

The getDouble method correctly implements the frequency weight formula with divisor wSum - 1 (Bessel's correction). The divisor <= 0.0 guard properly returns NaN when insufficient data exists for a valid calculation.

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/WeightedStdDevFreqGroupByFunctionFactoryTest.java (2)

132-141: Different single-row behavior between freq and rel variants is worth noting.

The testWeightedStddevFreqOneRow test expects 0.0 while the equivalent weighted_stddev_rel test expects null. This is mathematically correct because:

freq: divisor = wSum - 1 = 17.2... - 1 > 0, so returns sqrt(0/divisor) = 0

rel: divisor = wSum - wSum²/wSum = 0, so returns NaN

This behavioral difference may surprise users. Consider documenting this distinction.

30-154: Good test coverage for weighted_stddev_freq.

The test suite comprehensively covers various scenarios including type coercion, null handling, and edge cases, mirroring the coverage for the reliability weight variant.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedStdDevReliabilityGroupByFunctionFactory.java (2)

65-76: Reliability weight formula implementation is correct.

The getDouble method correctly implements the reliability weight divisor formula wSum - wSum²/wSum. When wSum=0 (all null data), the division produces NaN which propagates correctly to the final result.

37-87: Factory implementation follows established patterns.

The factory correctly declares the weighted_stddev_rel(DD) signature and properly instantiates the inner function class with sample and weight arguments.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunction.java (2)

71-74: Approve: Division-by-zero handling is correct.

When total_weight is zero (no valid samples), dividing by zero produces NaN, which correctly represents a null result for the weighted average. This is consistent with the setNull method behavior and other aggregate functions in the codebase.

123-131: Merge logic is correct for parallel aggregation.

The merge operation correctly sums both the weighted sums and total weights from source and destination, enabling proper parallel execution support.

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/AvgDoubleGroupByFunctionFactoryTest.java (3)

139-161: Excellent test coverage for weighted_avg correctness.

The test validates weighted_avg(x, x) against the equivalent formula sum(x*x)/sum(x), confirming the implementation is mathematically correct. The second assertion with random weights further validates the function handles varying weights properly.

163-193: Good edge case coverage for zero and negative weights.

The tests properly validate:

Zero weight for all samples returns null

Conditional weighting (only weight=1 for x=7) returns the single weighted value

Weights that sum to zero (negative + positive) returns null

This covers important edge cases for the weighted average implementation.

195-244: Strong formula validation for weighted standard deviation functions.

Testing weighted_stddev_rel and weighted_stddev_freq against their explicit mathematical formulas provides excellent confidence in the implementation correctness. The use of round(..., 4) appropriately handles floating-point precision for comparison.

.../java/io/questdb/griffin/engine/functions/groupby/AbstractWeightedStdDevGroupByFunction.java

.../main/java/io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunction.java

core/src/test/java/io/questdb/test/sqllogictest/SqlTest.java

glasstiger · 2025-11-28T21:17:04Z

[PR Coverage check]

😍 pass : 170 / 187 (90.91%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunction.java	40	46	86.96%
🔵	io/questdb/griffin/engine/functions/groupby/AbstractWeightedStdDevGroupByFunction.java	75	86	87.21%
🔵	io/questdb/griffin/engine/functions/groupby/WeightedAvgDoubleGroupByFunctionFactory.java	4	4	100.00%
🔵	io/questdb/griffin/engine/functions/groupby/WeightedStdDevGroupByFunctionFactory.java	2	2	100.00%
🔵	io/questdb/griffin/engine/functions/groupby/WeightedStdDevFrequencyGroupByFunctionFactory.java	14	14	100.00%
🔵	io/questdb/griffin/ExpressionParser.java	5	5	100.00%
🔵	io/questdb/griffin/engine/functions/groupby/WeightedStdDevReliabilityGroupByFunctionFactory.java	15	15	100.00%
🔵	io/questdb/griffin/engine/functions/groupby/AbstractStdDevGroupByFunction.java	15	15	100.00%

mtopolnik added 11 commits November 25, 2025 10:53

Extract identical merge method to superclass

acd3586

Extract supportsParallelism() to superclass

3738a43

Add comment that a method is used only from generated code

d79ccbd

Weighted avg, stddev

046a6c9

Properly handle single NULL sample

a7eb39b

Javadoc on weighted_stddev

f4466cc

Rename "sample" -> "reliability" stddev

a3b92bb

Add frequency weighted stddev

2ac52de

Add default weighted_stddev

dffb852

Tests

a10287b

Merge branch 'master' into mt_weighted-agg

5fc68dd

mtopolnik added 5 commits November 26, 2025 16:26

Tests

44581ad

Merge branch 'master' into mt_weighted-agg

499b4da

Use text blocks in SqlCodeGeneratorTest

cc2726e

Remove invalid tests with LONG256

0f020bf

Fix syntax bug involving CAST()

d2a6805

coderabbitai bot reviewed Nov 27, 2025

View reviewed changes

mtopolnik added 3 commits November 27, 2025 14:19

Improve null/empty value handling

82aff9b

Remove temporary code

f173d7f

Merge branch 'master' into mt_weighted-agg

adcab7e

puzpuzpuz self-requested a review November 27, 2025 14:17

bluestreak01 and others added 6 commits November 27, 2025 18:30

Merge remote-tracking branch 'origin/master' into mt_weighted-agg

7b0521b

Compare weighted stddev against unweighted

4452cdf

SQL formatting

269765b

Parellel test

b810434

delegate supportsParallelism to supertype

b6c4865

Assert rounded result

5c14754

mtopolnik and others added 11 commits November 28, 2025 11:04

More negative weight tests

85d776f

Add test for all negative weights

e8024b9

Merge branch 'master' into mt_weighted-agg

4b6fd74

Parallel test of weighted_avg

d82be32

Improve cast syntax validation

d74a428

Add @transient params

ab76edc

Implement isThreadSafe()

4c14f07

Merge branch 'master' into mt_weighted-agg

8a78f3a

Implement and test SAMPLE BY fill methods

25ad652

Merge branch 'master' into mt_weighted-agg

d054c0b

Merge branch 'master' into mt_weighted-agg

a4bdb0a

bluestreak01 approved these changes Nov 28, 2025

View reviewed changes

bluestreak01 merged commit 1fca1eb into master Nov 28, 2025
43 checks passed

bluestreak01 deleted the mt_weighted-agg branch November 28, 2025 21:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sql): weighted average and standard deviation#6457

feat(sql): weighted average and standard deviation#6457
bluestreak01 merged 36 commits intomasterfrom
mt_weighted-agg

mtopolnik commented Nov 26, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Nov 26, 2025 •

edited

Loading

Review skipped

Uh oh!

mtopolnik commented Nov 27, 2025

Uh oh!

coderabbitai bot commented Nov 27, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

glasstiger commented Nov 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

mtopolnik commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark

Uh oh!

coderabbitai bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

mtopolnik commented Nov 27, 2025

Uh oh!

coderabbitai bot commented Nov 27, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

glasstiger commented Nov 28, 2025

[PR Coverage check]

file detail

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mtopolnik commented Nov 26, 2025 •

edited

Loading

coderabbitai bot commented Nov 26, 2025 •

edited

Loading