feat(sql): implement mode(Ø/T/L/K/S/D) group by functions by nwoolmer · Pull Request #5623 · questdb/questdb

nwoolmer · 2025-04-24T11:56:39Z

mode() returns the most frequent appearance of a column value, for each group. Without mode(), you must first perform a count() operation. Then you must sort the result set and take the value corresponding to the highest count. This requires complex subqueries.

This has a variety of uses. For example, returning the most frequently traded symbol by a trader, or identifying the most frequent state for a sensor.

It is implemented using a map of maps, where the outer map stores pointers to inner maps, and each inner map aggregates a per-key count. These are merged into one set of top level maps, which are then scanned for the most frequent value.

todo:

Benchmark

For GroupByLongHashSet, GroupByLongLongHashMap, LongLongHashMap: (not isolated)

Details

Benchmark	(size)	Mode	Cnt	Score	Error	Units
GroupByLongLongHashMapBenchmark.testGroupByLongHashSet	5000	avgt	3	8.506	1.401	ns/op
GroupByLongLongHashMapBenchmark.testGroupByLongHashSet	50000	avgt	3	14.323	2.399	ns/op
GroupByLongLongHashMapBenchmark.testGroupByLongHashSet	500000	avgt	3	31.933	4.347	ns/op
GroupByLongLongHashMapBenchmark.testGroupByLongHashSet	5000000	avgt	3	52.546	16.526	ns/op
GroupByLongLongHashMapBenchmark.testGroupByLongLongHashMap	5000	avgt	3	14.065	0.740	ns/op
GroupByLongLongHashMapBenchmark.testGroupByLongLongHashMap	50000	avgt	3	22.420	3.034	ns/op
GroupByLongLongHashMapBenchmark.testGroupByLongLongHashMap	500000	avgt	3	56.049	12.629	ns/op
GroupByLongLongHashMapBenchmark.testGroupByLongLongHashMap	5000000	avgt	3	80.433	8.899	ns/op
GroupByLongLongHashMapBenchmark.testLongLongHashMap	5000	avgt	3	16.445	0.487	ns/op
GroupByLongLongHashMapBenchmark.testLongLongHashMap	50000	avgt	3	20.678	1.930	ns/op
GroupByLongLongHashMapBenchmark.testLongLongHashMap	500000	avgt	3	53.339	21.817	ns/op
GroupByLongLongHashMapBenchmark.testLongLongHashMap	5000000	avgt	3	71.944	6.283	ns/op

For GroupbyUtf8SequenceLongHashMap and Utf8SequenceLongHashMap: (not isolated)

Details

Benchmark (size) Mode Cnt Score Error Units GroupByUtf8SequenceLongHashMapBenchmark.testGroupByUtf8SequenceLongHashMap 10 avgt 3 1078.801 ± 90.283 ns/op GroupByUtf8SequenceLongHashMapBenchmark.testGroupByUtf8SequenceLongHashMap 50 avgt 3 2287.050 ± 213.875 ns/op GroupByUtf8SequenceLongHashMapBenchmark.testGroupByUtf8SequenceLongHashMap 200 avgt 3 7145.411 ± 86.704 ns/op GroupByUtf8SequenceLongHashMapBenchmark.testGroupByUtf8SequenceLongHashMap 1000 avgt 3 34169.157 ± 2721.831 ns/op GroupByUtf8SequenceLongHashMapBenchmark.testUtf8SequenceLongHashMap 10 avgt 3 778.603 ± 51.911 ns/op GroupByUtf8SequenceLongHashMapBenchmark.testUtf8SequenceLongHashMap 50 avgt 3 1972.678 ± 376.840 ns/op GroupByUtf8SequenceLongHashMapBenchmark.testUtf8SequenceLongHashMap 200 avgt 3 6501.634 ± 133.595 ns/op GroupByUtf8SequenceLongHashMapBenchmark.testUtf8SequenceLongHashMap 1000 avgt 3 29782.794 ± 2519.313 ns/op

For GroupbyCharSequenceLongHashMap and CharSequenceLongHashMap: (not isolated)

Details

Benchmark (size) Mode Cnt Score Error Units GroupByCharSequenceLongHashMapBenchmark.testCharSequenceLongHashMap 10 avgt 3 503.610 ± 415.333 ns/op GroupByCharSequenceLongHashMapBenchmark.testCharSequenceLongHashMap 50 avgt 3 756.659 ± 178.424 ns/op GroupByCharSequenceLongHashMapBenchmark.testCharSequenceLongHashMap 200 avgt 3 2039.515 ± 222.997 ns/op GroupByCharSequenceLongHashMapBenchmark.testCharSequenceLongHashMap 1000 avgt 3 8263.961 ± 811.661 ns/op GroupByCharSequenceLongHashMapBenchmark.testGroupByCharSequenceLongHashMap 10 avgt 3 583.044 ± 105.280 ns/op GroupByCharSequenceLongHashMapBenchmark.testGroupByCharSequenceLongHashMap 50 avgt 3 1159.440 ± 954.187 ns/op GroupByCharSequenceLongHashMapBenchmark.testGroupByCharSequenceLongHashMap 200 avgt 3 3150.716 ± 132.363 ns/op GroupByCharSequenceLongHashMapBenchmark.testGroupByCharSequenceLongHashMap 1000 avgt 3 11260.364 ± 256.576 ns/op

… off-heap hashmap, passable by pointer, for use in group by queries.

rmja · 2025-06-17T09:05:46Z

Is there any reason that mode cannot be computed on a double value?

My use case is the following:

I have a devices that sends timestamped values like this:
"volume was XXXm3 at the start of this month"

I.e. I dont have a definitive timestamp for the value. The problem with this is of cause that the time on the device may not be exact, so when I receive this type of data in the beginning/end of the month, it may be that the month has changed in the device but not when the receiver makes its timestamp.

My idea to handle this in questdb is to store all values like this:

device_time TIMESTAMP
received_time TIMESTAMP
value DOUBLE

The device time may be set to an incorrect month, however if I could do:

SELECT mode(value) SAMPLE BY 1M

Then that would give me the correct value for the month, as it solves the problem by selecting the most observed value in the month.

nwoolmer · 2025-06-18T12:03:00Z

Hi @rmja , thanks for the details.

No particular reason, just the original use case didn't require it. We can have more options. However, keeping cardinality in check is important, else you will exhaust RAM building hashmaps.

This is on small pause for the time being due to other priorities, but I will wrap it up when I can.

coderabbitai · 2025-09-24T10:37:25Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds mode() aggregate functions for boolean, double, long, string, symbol, and varchar; supporting factories; allocator-backed group-by hash maps for long/charsequence/utf8; benchmarks comparing group-by maps vs standard maps; API extensions in std maps; visibility change for LongLongHashMap constructor; and comprehensive unit/integration tests.

Changes

Cohort / File(s)	Summary of changes
Benchmarks `benchmarks/src/main/java/org/questdb/GroupByCharSequenceLongHashMapBenchmark.java`, `benchmarks/src/main/java/org/questdb/GroupByLongLongHashMapBenchmark.java`, `benchmarks/src/main/java/org/questdb/GroupByUtf8SequenceLongHashMapBenchmark.java`	New JMH benchmarks comparing GroupBy* maps with standard maps across sizes; per-iteration setup/reset; main runners and @benchmark methods.
GroupBy hash maps (allocator-backed) `core/src/main/java/io/questdb/griffin/engine/groupby/GroupByCharSequenceLongHashMap.java`, `core/src/main/java/io/questdb/griffin/engine/groupby/GroupByLongLongHashMap.java`, `core/src/main/java/io/questdb/griffin/engine/groupby/GroupByUtf8SequenceLongHashMap.java`	New flyweight hash maps with direct-memory layout, open addressing, pointer export/import, inc/put/get, mergeAdd, rehash, allocator integration.
Mode aggregators (functions) `core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeBooleanGroupByFunction.java`, `.../ModeDoubleGroupByFunction.java`, `.../ModeLongGroupByFunction.java`, `.../ModeStringGroupByFunction.java`, `.../ModeSymbolGroupByFunction.java`, `.../ModeVarcharGroupByFunction.java`	New GroupByFunction implementations computing mode per type; maintain per-group pointer to internal counting maps; implement computeFirst/Next, merge, getters, planning, allocator wiring.
Mode function factories `core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeBooleanGroupByFunctionFactory.java`, `.../ModeDoubleGroupByFunctionFactory.java`, `.../ModeLongGroupByFunctionFactory.java`, `.../ModeStringGroupByFunctionFactory.java`, `.../ModeSymbolGroupByFunctionFactory.java`, `.../ModeVarcharGroupByFunctionFactory.java`	New FunctionFactory classes exposing signatures (mode(T/D/L/S/K/Ø)), group-by flags, and newInstance wiring to corresponding functions.
Std map/API updates `core/src/main/java/io/questdb/std/CharSequenceLongHashMap.java`, `core/src/main/java/io/questdb/std/IntLongHashMap.java`, `core/src/main/java/io/questdb/std/LongLongHashMap.java`, `core/src/main/java/io/questdb/std/Utf8SequenceLongHashMap.java`	Added inc methods to CharSequenceLongHashMap and IntLongHashMap; made LongLongHashMap(int,double) constructor public; added new Utf8SequenceLongHashMap with put/get/inc/rehash APIs.
Tests — mode aggregators `core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeBooleanGroupByFunctionFactoryTest.java`, `.../ModeDoubleGroupByFunctionFactoryTest.java`, `.../ModeGroupByFunctionTest.java`, `.../ModeIntegrationTest.java`, `.../ModeLongGroupByFunctionFactoryTest.java`, `.../ModeStringGroupByFunctionFactoryTest.java`, `.../ModeSymbolGroupByFunctionFactoryTest.java`, `.../ModeVarcharGroupByFunctionFactoryTest.java`	New suites covering nulls, mixed values, grouping, sample by, ordering/limits, unicode/special strings, random data, and integration scenarios for mode() across types.
Tests — group-by maps `core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByCharSequenceLongHashMapTest.java`, `.../GroupByLongLongHashMapTest.java`, `.../GroupByUtf8SequenceLongHashMapTest.java`	New tests validating allocator-backed maps: insert/get/inc, collisions, rehash, mergeAdd, pointer reuse, unicode handling, and random workloads.
Tests — std map `core/src/test/java/io/questdb/test/std/Utf8SequenceLongHashMapTest.java`	New tests for Utf8SequenceLongHashMap: put/get/inc, remove, clear, putAll, index/value semantics, random scenarios.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant SQL as SQL Planner
  participant Exec as Execution Engine
  participant GB as GroupBy Operator
  participant Agg as mode() Aggregator
  participant Map as GroupBy* Map
  Client->>SQL: SELECT ... mode(col) FROM t [GROUP BY k] [SAMPLE BY ...]
  SQL->>Exec: Plan with mode() (per-type factory)
  Exec->>GB: Initialize groups and aggregators
  GB->>Agg: initValueTypes/setAllocator
  loop scan records
    GB->>Agg: computeFirst/computeNext(mapValue, record, rowId)
    Agg->>Map: inc(key) / put(key,1)
    Agg-->>GB: store/update map pointer in mapValue
  end
  par parallel workers (optional)
    GB->>Agg: merge(destValue, srcValue)
    Agg->>Map: mergeAdd(srcMap)
  end
  GB->>Agg: getX(record) for result materialization
  Agg->>Map: iterate to find max-count key
  Agg-->>GB: mode value
  GB-->>Exec: grouped rows with mode()
  Exec-->>Client: result set

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

Core, Performance

Suggested reviewers

puzpuzpuz
mtopolnik

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.23% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title is a concise, single-sentence description highlighting the primary feature: implementing the SQL mode aggregate for group-by operations. It follows conventional commit style and correctly references the supported data types using shorthand notation. A teammate scanning the history can infer that this PR adds the mode() function across various input types.
Linked Issues Check	✅ Passed	The linked issue #5580 requests a mode() aggregate function that returns the most frequent non-null value per group and supports both keyed and non-keyed queries. This PR implements mode() for boolean, long, double, varchar, string, and symbol types using off-heap group-by map implementations, integrates keyed and non-keyed execution, and includes comprehensive benchmarks and unit tests. These changes fully address the issue’s objectives by providing the requested functionality across all targeted data types.
Out of Scope Changes Check	✅ Passed	All modifications in this PR—including new group-by function classes, custom off-heap map implementations, benchmark suites, test additions, and helper methods added to existing map classes—directly support the implementation and evaluation of the mode() aggregate. No unrelated or extraneous code changes are present beyond what is necessary for this feature. Therefore, there are no out-of-scope modifications in this pull request.
Description Check	✅ Passed	The pull request description clearly outlines the addition of the `mode()` aggregate function, its motivation and use cases, the high-level implementation strategy using nested maps, and provides a completed checklist of related tasks and benchmark results that directly correspond to the changes in the diff.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

bluestreak01 · 2025-09-30T14:41:22Z

@CodeRabbit review

coderabbitai · 2025-09-30T14:41:29Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 16

♻️ Duplicate comments (5)

core/src/main/java/io/questdb/griffin/engine/groupby/GroupByCharSequenceLongHashMap.java (1)

34-34: Previous review comment not addressed.

The earlier review noted that this class owns its memory and thus isn't truly a flyweight. The javadoc still describes it as a "flyweight hash map," which remains misleading.

benchmarks/src/main/java/org/questdb/GroupByCharSequenceLongHashMapBenchmark.java (2)

53-59: Static fields with @State(Scope.Thread) may cause issues.

Same issue as in GroupByLongLongHashMapBenchmark: static fields are shared across threads if JMH runs multiple threads. Consider making these instance fields to align with Scope.Thread semantics.

74-81: Verify allocator reuse after close().

Same issue as in GroupByLongLongHashMapBenchmark: Line 76 calls allocator.close(), and line 77 calls setAllocator(allocator). Ensure the allocator can be reused after close().
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeStringGroupByFunction.java (1)
46-47: Make initialCapacity and loadFactor final constants.

These fields are initialized once in the class and never modified. They should be declared as final constants to prevent accidental modification and clearly express intent.

Apply this diff:
-    int initialCapacity = 4;
-    double loadFactor = 0.7d;
+    private static final int INITIAL_CAPACITY = 4;
+    private static final double LOAD_FACTOR = 0.7;
Then update the map initializations on lines 48-49:
-    GroupByCharSequenceLongHashMap mapA = new GroupByCharSequenceLongHashMap(initialCapacity, loadFactor, LONG_NULL, LONG_NULL);
-    GroupByCharSequenceLongHashMap mapB = new GroupByCharSequenceLongHashMap(initialCapacity, loadFactor, LONG_NULL, LONG_NULL);
+    GroupByCharSequenceLongHashMap mapA = new GroupByCharSequenceLongHashMap(INITIAL_CAPACITY, LOAD_FACTOR, LONG_NULL, LONG_NULL);
+    GroupByCharSequenceLongHashMap mapB = new GroupByCharSequenceLongHashMap(INITIAL_CAPACITY, LOAD_FACTOR, LONG_NULL, LONG_NULL);
Based on past review comments.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeVarcharGroupByFunction.java (1)

180-201: Verify determinism when multiple values share the maximum count.

Similar to ModeSymbolGroupByFunction, when multiple varchar values tie for the maximum count, the implementation returns the first one encountered during hash map iteration, which may be non-deterministic.

This is a duplicate concern across all Mode implementations. The verification script in the ModeSymbolGroupByFunction review will cover this as well.

🧹 Nitpick comments (23)

core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeVarcharGroupByFunctionFactory.java (1)
34-56: Factory implementation follows standard pattern.

The factory correctly:

Returns signature "mode(Ø)" for VARCHAR type

Indicates participation in group-by via isGroupBy() = true

Constructs ModeVarcharGroupByFunction with the first argument

This is consistent with the QuestDB function factory pattern.

Note: The implementation assumes the planner ensures args has at least one element. If you want defensive validation, consider adding:
 public Function newInstance(
         int position,
         ObjList<Function> args,
         IntList argPositions,
         CairoConfiguration configuration,
         SqlExecutionContext sqlExecutionContext
 ) {
+    if (args.size() != 1) {
+        throw SqlException.$(position, "mode() expects exactly 1 argument");
+    }
     return new ModeVarcharGroupByFunction(args.getQuick(0));
 }
However, this validation is typically done by the planner, so omitting it here is acceptable.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeLongGroupByFunction.java (2)

88-109: Mode-finding algorithm is correct but could be optimized.

The linear scan through all map slots to find the maximum count is correct but not optimal for large maps.

Consider tracking the mode key/count during updates (in computeFirst/computeNext) rather than scanning the entire map in getLong(). This would reduce the time complexity from O(capacity) to O(1) for retrieval, at the cost of slightly more work during updates.

However, if this is deferred optimization and current performance is acceptable, the current implementation is correct and safe.

46-49: Consider making initialCapacity and loadFactor configurable.

The hardcoded initialCapacity = 4 and loadFactor = 0.7d may not be optimal for all use cases. For high-cardinality columns (as warned in PR comments), this could lead to many rehashes.

Consider:

Making these configurable via constructor parameters or configuration

Adding documentation about memory implications for high-cardinality data

Exposing metrics for map size/rehash counts for observability

This aligns with the PR comment from nwoolmer warning about "cardinality and RAM usage for hashmaps."
core/src/main/java/io/questdb/griffin/engine/groupby/GroupByLongLongHashMap.java (3)
78-93: inc(key, delta) does not validate delta sign.

The method increments a value by delta without checking if delta is negative. While this might be intentional for flexibility, negative deltas could produce nonsensical counts in the context of mode aggregation (where counts represent frequencies and should be non-negative).

If negative deltas are not a valid use case, add validation:
 public void inc(long key, long delta) {
     if (key != noKeyValue) {
+        if (delta < 0) {
+            throw new IllegalArgumentException("delta must be non-negative");
+        }
         long index = keyIndex(key);
Alternatively, document that negative deltas are supported if there's a valid use case (e.g., decrements).

110-123: of(long ptr) allocates new map even if ptr is 0—clarify intent.

When ptr == 0, the method allocates a new map and initializes it. However, the method name of suggests it's restoring from an existing pointer. The behavior is correct but potentially confusing.

Consider renaming to ofOrAllocate(long ptr) or add a javadoc comment:
+    /**
+     * Restores the map from the given pointer, or allocates a new map if ptr is 0.
+     */
     public GroupByLongLongHashMap of(long ptr) {
182-209: Rehashing frees old buffer before verifying new map is valid.

Line 208 calls allocator.free(oldPtr, ...) immediately after rehashing. If there's a bug in the rehashing logic (e.g., incorrect keyIndex calculation), the old data is already lost, making debugging difficult.

For safer debugging, consider deferring the free until after a validation pass (in debug builds):
     }
+    // In debug mode, validate the rehashed map before freeing old buffer
+    assert validateRehash(oldPtr, oldCapacity);
     allocator.free(oldPtr, HEADER_SIZE + 16L * oldCapacity);
 }
+
+private boolean validateRehash(long oldPtr, int oldCapacity) {
+    // Check that all old entries are present in new map
+    for (long p = oldPtr + HEADER_SIZE, lim = oldPtr + HEADER_SIZE + 16L * oldCapacity; p < lim; p += 16L) {
+        long key = Unsafe.getUnsafe().getLong(p);
+        if (key != noKeyValue) {
+            long value = Unsafe.getUnsafe().getLong(p + 8L);
+            assert get(key) == value : "rehash validation failed";
+        }
+    }
+    return true;
+}
This is optional and primarily for debugging.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeDoubleGroupByFunction.java (1)

46-49: Extract initialCapacity and loadFactor into configurable parameters
All Mode*GroupByFunction implementations (String, Varchar, Symbol, Long, Double) currently hardcode initialCapacity = 4 and loadFactor = 0.7d; consider exposing these via constructor arguments or central configuration and tuning based on expected cardinality or query hints.
benchmarks/src/main/java/org/questdb/GroupByLongLongHashMapBenchmark.java (1)
54-62: Static fields with @State(Scope.Thread) may cause issues.

The class uses @State(Scope.Thread), but the allocator, maps, and RNG are declared as static fields. This means they are shared across all threads if JMH runs multiple threads. If the benchmark is intended to be single-threaded (forks=1 in main), this is fine, but the pattern is unconventional. Typically, fields should be instance fields when using Scope.Thread.

Consider making these instance fields to align with Scope.Thread semantics:
-    private static final GroupByAllocator allocator = new FastGroupByAllocator(128 * 1024, Numbers.SIZE_1GB);
+    private GroupByAllocator allocator = new FastGroupByAllocator(128 * 1024, Numbers.SIZE_1GB);
(Apply similar changes to other static fields, and adjust reset() to recreate the allocator instead of closing and reusing it.)
core/src/main/java/io/questdb/std/Utf8SequenceLongHashMap.java (1)
78-80: Consider validating the index parameter.

The inc(int index) method assumes the index is negative (occupied slot) and directly accesses values[-index - 1]. If called with a positive index, this could cause array bounds issues.

Consider adding a precondition check:
 public void inc(int index) {
+    assert index < 0 : "inc(int) expects a negative index from keyIndex()";
     values[-index - 1]++;
 }
Or document the precondition in a JavaDoc comment.
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeBooleanGroupByFunctionFactoryTest.java (1)

46-56: Document default false fallback in ModeBooleanGroupByFunction
The behaviour is intentional: when all inputs are NULL, both counters are set to LONG_NULL and getBool evaluates trueCount > falseCount as false. Add a Javadoc note in core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeBooleanGroupByFunction.java (and update the FunctionFactory contract) to explicitly document this default.
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeStringGroupByFunction.java (1)
179-200: Optimize mode extraction with early exit.

The current implementation scans the entire capacity even after finding the mode. While functionally correct, consider that for large maps this scans every slot including empty ones.

Consider tracking the maximum count during insertion/merge to avoid the full scan:
// Add field:
private long cachedModeKey = LONG_NULL;
private long cachedModeCount = -1;

// Update in computeNext/merge to track max
// Then getStr becomes:
CharSequence getStr(Record record, GroupByCharSink sink) {
    long mapPtr = record.getLong(valueIndex);
    if (mapPtr <= 0) {
        return null;
    }
    mapA.of(mapPtr);
    if (cachedModeKey != LONG_NULL) {
        sink.of(cachedModeKey);
        return sink;
    }
    // fallback to scan...
}
Note: This optimization adds complexity and would require careful handling during merge operations.
core/src/main/java/io/questdb/griffin/engine/groupby/GroupByUtf8SequenceLongHashMap.java (1)
258-273: Probe method detects corruption but lacks reprobing limit.

The probe method correctly implements linear probing and detects when the table is full by checking if it wraps around to the starting index. However, for a properly sized table this should rarely happen.

Consider whether the error message on line 272 should include diagnostic information (size, capacity, key) to aid debugging if corruption occurs in production.
-        throw CairoException.critical(0).put("corrupt utf8 sequence long long hash map");
+        throw CairoException.critical(0)
+            .put("corrupt utf8 sequence long hash map [size=").put(size())
+            .put(", capacity=").put(capacity())
+            .put(", key=").put(key)
+            .put(']');
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeSymbolGroupByFunction.java (4)

51-54: Hard-coded capacity and load factor may not suit all workloads.

The initialCapacity = 4 and loadFactor = 0.7d are hard-coded. For high-cardinality symbol columns, starting with capacity 4 may cause excessive resizing, while for low-cardinality columns it may waste memory. Consider making these configurable or adaptive based on symbol table characteristics.

51-54: Consider making capacity and load factor configurable or document the rationale.

The hard-coded initialCapacity = 4 and loadFactor = 0.7d may lead to early resizing in typical aggregation scenarios with moderate-to-high cardinality. If these values are based on empirical tuning or match conventions across other Mode implementations, consider adding a comment explaining the choice.

103-114: Capacity iteration is intentional—add clarifying comment
GroupByLongLongHashMap lacks an entry iterator; iterating over capacity with LONG_NULL checks is the correct approach. Add a comment above the loop explaining this intent for future maintainers.

51-54: Consider making initial capacity and load factor configurable.

The hardcoded initialCapacity = 4 and loadFactor = 0.7d may not be optimal for all use cases. For columns with high cardinality, a larger initial capacity could reduce resize operations. Consider exposing these as constructor parameters or configuration options, similar to other hash map implementations in the codebase.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeVarcharGroupByFunction.java (4)

47-52: Hard-coded capacity and load factor may not suit all workloads.

Similar to ModeSymbolGroupByFunction, the initialCapacity = 4 and loadFactor = 0.7d are hard-coded. For high-cardinality varchar columns (e.g., free-text fields), this may cause excessive resizing and memory churn.

47-52: Consider making capacity and load factor configurable or document the rationale.

As with ModeSymbolGroupByFunction, the hard-coded initialCapacity = 4 and loadFactor = 0.7d may cause early resizing for moderate cardinality. Document the reasoning or consider making these tunable.

189-198: No change needed for capacity() loop
GroupByUtf8SequenceLongHashMap does not expose an iterator or entry-only traversal API, so iterating over capacity() and filtering out empty slots is the correct—and only—way to find occupied entries. Optionally, add a brief code comment explaining this choice for future maintainers.

47-52: Consider making initial capacity and load factor configurable.

Similar to ModeSymbolGroupByFunction, the hardcoded initialCapacity = 4 and loadFactor = 0.7d may not be optimal for high-cardinality VARCHAR columns. Consider exposing these as configuration options.

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeDoubleGroupByFunctionFactoryTest.java (2)

30-270: Consider adding floating-point precision edge case tests.

While the test suite is comprehensive, it doesn't explicitly test scenarios where floating-point precision matters:

Very close but distinct values (e.g., 1.0 and 1.0 + Double.MIN_VALUE)

NaN and Infinity handling

Subnormal numbers

These edge cases could expose issues in the hash map key comparison logic for doubles.

30-271: Consider adding tests for NaN and Infinity.

The test suite is comprehensive, but consider adding edge cases for Double.NaN and Double.POSITIVE_INFINITY / Double.NEGATIVE_INFINITY to ensure they're handled correctly (e.g., are NaN values treated as distinct or ignored? What happens if all values are NaN?).

benchmarks/src/main/java/org/questdb/GroupByUtf8SequenceLongHashMapBenchmark.java (1)

86-93: Hoist ptr retrieval outside the loop

GroupByUtf8SequenceLongHashMap.ptr() never changes on put (no relocation), so updating mapPtr after each iteration is redundant—move mapPtr = …ptr() to before the loop.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dea713f and cc2a1bc.

📒 Files selected for processing (34)

benchmarks/src/main/java/org/questdb/GroupByCharSequenceLongHashMapBenchmark.java (1 hunks)
benchmarks/src/main/java/org/questdb/GroupByLongLongHashMapBenchmark.java (1 hunks)
benchmarks/src/main/java/org/questdb/GroupByUtf8SequenceLongHashMapBenchmark.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeBooleanGroupByFunction.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeBooleanGroupByFunctionFactory.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeDoubleGroupByFunction.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeDoubleGroupByFunctionFactory.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeLongGroupByFunction.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeLongGroupByFunctionFactory.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeStringGroupByFunction.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeStringGroupByFunctionFactory.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeSymbolGroupByFunction.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeSymbolGroupByFunctionFactory.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeVarcharGroupByFunction.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeVarcharGroupByFunctionFactory.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/groupby/GroupByCharSequenceLongHashMap.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/groupby/GroupByLongLongHashMap.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/groupby/GroupByUtf8SequenceLongHashMap.java (1 hunks)
core/src/main/java/io/questdb/std/CharSequenceLongHashMap.java (1 hunks)
core/src/main/java/io/questdb/std/IntLongHashMap.java (1 hunks)
core/src/main/java/io/questdb/std/LongLongHashMap.java (1 hunks)
core/src/main/java/io/questdb/std/Utf8SequenceLongHashMap.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeBooleanGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeDoubleGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeGroupByFunctionTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeIntegrationTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeLongGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeStringGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeSymbolGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeVarcharGroupByFunctionFactoryTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByCharSequenceLongHashMapTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByLongLongHashMapTest.java (1 hunks)
core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByUtf8SequenceLongHashMapTest.java (1 hunks)
core/src/test/java/io/questdb/test/std/Utf8SequenceLongHashMapTest.java (1 hunks)

👮 Files not reviewed due to content moderation or server errors (8)

core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeBooleanGroupByFunctionFactoryTest.java
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeIntegrationTest.java
benchmarks/src/main/java/org/questdb/GroupByUtf8SequenceLongHashMapBenchmark.java
core/src/main/java/io/questdb/std/Utf8SequenceLongHashMap.java
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeStringGroupByFunctionFactoryTest.java
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeSymbolGroupByFunction.java
core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeVarcharGroupByFunction.java
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ModeDoubleGroupByFunctionFactoryTest.java

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)

GitHub Check: New pull request (Coverage Report Coverage Report)
GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
GitHub Check: New pull request (Hosted Running tests on windows-other)
GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
GitHub Check: New pull request (Hosted Running tests on windows-cairo)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
GitHub Check: New pull request (Hosted Running tests on windows-griffin)
GitHub Check: New pull request (Hosted Running tests on mac-other)
GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
GitHub Check: New pull request (Hosted Running tests on mac-cairo)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
GitHub Check: New pull request (Hosted Running tests on mac-griffin)
GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
GitHub Check: New pull request (Check Changes Check changes)

…o try to avoid future confusion

glasstiger · 2025-10-01T14:57:46Z

[PR Coverage check]

😍 pass : 776 / 859 (90.34%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/std/CharSequenceLongHashMap.java	0	8	00.00%
🔵	io/questdb/std/IntLongHashMap.java	0	6	00.00%
🔵	io/questdb/griffin/engine/functions/groupby/ModeSymbolGroupByFunction.java	61	73	83.56%
🔵	io/questdb/griffin/engine/functions/groupby/ModeSymbolGroupByFunctionFactory.java	6	7	85.71%
🔵	io/questdb/griffin/engine/functions/groupby/ModeDoubleGroupByFunction.java	55	64	85.94%
🔵	io/questdb/griffin/engine/functions/groupby/ModeLongGroupByFunction.java	55	64	85.94%
🔵	io/questdb/griffin/engine/functions/groupby/ModeBooleanGroupByFunction.java	26	30	86.67%
🔵	io/questdb/griffin/engine/functions/groupby/ModeVarcharGroupByFunction.java	64	73	87.67%
🔵	io/questdb/griffin/engine/functions/groupby/ModeStringGroupByFunction.java	65	73	89.04%
🔵	io/questdb/std/Utf8SequenceLongHashMap.java	87	92	94.57%
🔵	io/questdb/griffin/engine/groupby/GroupByLongLongHashMap.java	107	113	94.69%
🔵	io/questdb/griffin/engine/groupby/GroupByUtf8SequenceLongHashMap.java	115	118	97.46%
🔵	io/questdb/griffin/engine/groupby/GroupByCharSequenceLongHashMap.java	115	118	97.46%
🔵	io/questdb/griffin/engine/functions/groupby/ModeDoubleGroupByFunctionFactory.java	4	4	100.00%
🔵	io/questdb/griffin/engine/functions/groupby/ModeVarcharGroupByFunctionFactory.java	4	4	100.00%
🔵	io/questdb/griffin/engine/functions/groupby/ModeBooleanGroupByFunctionFactory.java	4	4	100.00%
🔵	io/questdb/griffin/engine/functions/groupby/ModeLongGroupByFunctionFactory.java	4	4	100.00%
🔵	io/questdb/griffin/engine/functions/groupby/ModeStringGroupByFunctionFactory.java	4	4	100.00%

implement mode(Ø/T/L/K/S) group by function

4ec8820

nwoolmer added the SQL Issues or changes relating to SQL execution label Apr 24, 2025

nwoolmer changed the title ~~implement mode(Ø/T/L/K/S) group by function~~ feat(sql): implement mode(Ø/T/L/K/S) group by function Apr 24, 2025

nwoolmer added 2 commits April 25, 2025 15:50

Add new GroupByLongLongHashMap. This is a new primitive to support an…

1f8f399

… off-heap hashmap, passable by pointer, for use in group by queries.

parallel mode(L)

b3f7a45

nwoolmer changed the title ~~feat(sql): implement mode(Ø/T/L/K/S) group by function~~ feat(sql): implement mode(Ø/T/L/K/S) group by functions Apr 25, 2025

nwoolmer added 3 commits April 25, 2025 18:29

Merge branch 'master' into nw_mode

9936f2b

debris

71b055b

parallelise other mode functions

657fa05

nwoolmer added 2 commits June 21, 2025 16:15

Merge branch 'master' into nw_mode

9ea42fd

add mode(D)

b2cc6e2

nwoolmer changed the title ~~feat(sql): implement mode(Ø/T/L/K/S) group by functions~~ feat(sql): implement mode(Ø/T/L/K/S/D) group by functions Jun 21, 2025

Merge branch 'master' into nw_mode

00868f5

nwoolmer added 10 commits September 29, 2025 13:09

new map tests

c2eb138

mode tests

9171762

boolean

fc381e2

tests

0d10b6f

longs

4482a6c

missing partials

2a03200

double tests

b066d9c

str varchar

f7fbf92

symbol

58b6033

Merge branch 'master' into nw_mode

d021f1a

nwoolmer marked this pull request as ready for review September 29, 2025 16:31

nwoolmer added 3 commits September 29, 2025 17:32

symbol

1aa51ca

ci

c095487

Merge remote-tracking branch 'origin/nw_mode' into nw_mode

2914887

nwoolmer added 4 commits September 29, 2025 17:32

formatting

955509f

more map tests

ee463a3

fix test

5e4699d

additional tests

cc2a1bc

mtopolnik reviewed Sep 30, 2025

View reviewed changes

Comment thread core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeStringGroupByFunction.java Outdated

Comment thread core/src/main/java/io/questdb/griffin/engine/groupby/GroupByCharSequenceLongHashMap.java

bluestreak01 reviewed Sep 30, 2025

View reviewed changes

Comment thread core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeBooleanGroupByFunction.java Outdated

bluestreak01 reviewed Sep 30, 2025

View reviewed changes

Comment thread core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeBooleanGroupByFunction.java Outdated

mtopolnik reviewed Sep 30, 2025

View reviewed changes

Comment thread core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeBooleanGroupByFunction.java Outdated

bluestreak01 reviewed Sep 30, 2025

View reviewed changes

Comment thread core/src/main/java/io/questdb/griffin/engine/functions/groupby/ModeDoubleGroupByFunction.java

bluestreak01 reviewed Sep 30, 2025

View reviewed changes

Comment thread ...a/io/questdb/test/griffin/engine/functions/groupby/ModeDoubleGroupByFunctionFactoryTest.java

coderabbitai bot reviewed Sep 30, 2025

View reviewed changes

nwoolmer and others added 7 commits September 30, 2025 17:36

address comments

58f62f5

unroll long_sequence temp table in a comment and add header comment t…

4e3f53f

…o try to avoid future confusion

robot fuzz test

ed5979c

Merge branch 'master' into nw_mode

a23cc2f

Configurable loadFactor

ef4b5cf

Fix Java lint

fa46f85

Merge branch 'master' into nw_mode

6d7eb06

mtopolnik previously approved these changes Oct 1, 2025

View reviewed changes

bluestreak01 added 2 commits October 1, 2025 14:54

Merge remote-tracking branch 'origin/master' into nw_mode

fbd4c2b

cleanup

22621f8

bluestreak01 dismissed mtopolnik’s stale review via 22621f8 October 1, 2025 14:09

bluestreak01 approved these changes Oct 1, 2025

View reviewed changes

bluestreak01 merged commit ff18a8f into master Oct 1, 2025
35 checks passed

bluestreak01 deleted the nw_mode branch October 1, 2025 15:42

coderabbitai bot mentioned this pull request Oct 1, 2025

feat(sql): optimized ASOF JOIN on single symbol key where RHS symbol is low-frequency #6208

Merged

Conversation

nwoolmer commented Apr 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rmja commented Jun 17, 2025

Uh oh!

nwoolmer commented Jun 18, 2025

Uh oh!

coderabbitai bot commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

Uh oh!

bluestreak01 commented Sep 30, 2025

Uh oh!

coderabbitai bot commented Sep 30, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

glasstiger commented Oct 1, 2025

[PR Coverage check]

file detail

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

nwoolmer commented Apr 24, 2025 •

edited

Loading

coderabbitai bot commented Sep 24, 2025 •

edited

Loading