Skip to content

perf(sql): improve ORDER BY performance for fixed-width column types#6862

Merged
bluestreak01 merged 42 commits intomasterfrom
sort_opt
Mar 13, 2026
Merged

perf(sql): improve ORDER BY performance for fixed-width column types#6862
bluestreak01 merged 42 commits intomasterfrom
sort_opt

Conversation

@kafka1991
Copy link
Copy Markdown
Collaborator

@kafka1991 kafka1991 commented Mar 10, 2026

Summary

  • Introduces an encoded sort path for ORDER BY queries. SortKeyEncoder encodes column values into fixed-width, order-preserving binary keys (8/16/24/32 bytes), enabling comparisons via native uint64 operations instead of per-column type dispatch.
  • Adds a fast path (encodeFixed8) for single-column sorts on types ≤8 bytes, avoiding reverseBytes.
  • Supports all fixed-width column types, symbols (via rank maps), ASC/DESC directions, and NULL ordering.
  • The native sort (sortEncodedEntries) is a three-layer hybrid:
    • vergesort: Jump-sampling adaptive sort that detects natural runs. O(n) for pre-sorted time-series data. When multiple runs are detected, falls back to radix sort instead of std::inplace_merge to avoid heap allocation.
    • MSD radix sort (American Flag Sort): In-place byte-by-byte partitioning using cycle sort. For arrays >= 100K entries, the first radix pass runs single-threaded, then the resulting buckets are sorted in parallel across all available cores via atomic work-stealing.
    • pdqsort: Pattern-defeating quicksort for partitions < 128 elements, with heapsort fallback guaranteeing O(n log n) worst-case.
  • The entire native sort path is allocation-free: no malloc, no mmap, no std::inplace_merge buffer. All data structures (count arrays, bucket lists, thread handles) are stack-allocated.
  • Uses pdqsort instead of std::sort for the comparison-sort fallback, ensuring consistent performance across platforms regardless of the standard library implementation.
  • Adds EncodedSortRecordCursorFactory for the non-random-access path, replacing the R-B tree (RecordTreeChain) with flat array collect + radix sort + RecordChain offset lookup.
  • Replaces LongSortedLightRecordCursorFactory with EncodedSortLightRecordCursorFactory for single-column long/timestamp sorts.
  • Guarantees stable sort: when all sort key columns are equal, the comparator falls back to rowId ordering to preserve insertion order, matching the R-B tree's behavior.

tandem: https://github.com/questdb/questdb-enterprise/pull/942

Benchmark

5-column ORDER BY (symbol, ecn, counterparty, passive, horizon_sec) over a HORIZON JOIN with 3,085,929 result rows:

Branch Time
master 6.98s
this PR 1.26s

2-column ORDER BY (symbol, best_ask) over 1,000,000,000 rows, 10 symbols:

Branch Time
master >15 min
this PR (single-threaded) 90s
this PR (parallel bucket sort) 63.6s

Query:

SELECT *
FROM market_data
ORDER BY symbol, best_ask
5-column benchmark query
SELECT
    t.symbol,
    t.ecn,
    t.counterparty,
    t.passive,
    h.offset / 1000000000 AS horizon_sec,
    count() AS n,
    avg(
        CASE t.side
            WHEN 'buy'  THEN ((m.best_bid + m.best_ask) / 2 - t.price)
                             / t.price * 10000
            WHEN 'sell' THEN (t.price - (m.best_bid + m.best_ask) / 2)
                             / t.price * 10000
        END
    ) AS avg_markout_bps,
    sum(
        CASE t.side
            WHEN 'buy'  THEN ((m.best_bid + m.best_ask) / 2 - t.price)
                             * t.quantity
            WHEN 'sell' THEN (t.price - (m.best_bid + m.best_ask) / 2)
                             * t.quantity
        END
    ) AS total_pnl
FROM fx_trades t
HORIZON JOIN market_data m ON (symbol)
    RANGE FROM 0s TO 30s STEP 5s AS h
WHERE t.timestamp IN '$now-48h..$now'
GROUP BY t.symbol, t.ecn, t.counterparty, t.passive, horizon_sec
ORDER BY t.symbol, t.ecn, t.counterparty, t.passive, horizon_sec

Limitations

  • Key size is capped at 32 bytes. ORDER BY clauses exceeding this fall back to the existing R-B tree sort.
  • The parallel sort path creates OS threads per sort call rather than using a thread pool. Thread count is capped at min(hardware_concurrency, bucket_count).

@kafka1991 kafka1991 marked this pull request as draft March 10, 2026 05:37
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 10, 2026

Walkthrough

Replaces the ORDER BY radix-sort path with an encoded-sort implementation: adds native C/C++ encoded-key sorter and JNI bridge, introduces SortKeyEncoder/SORT key types and encoded-sort cursor factories, removes radix-sort configuration and legacy LongSortedLight cursor, and updates many tests and plan labels to "Encode sort" / "Encode sort light".

Changes

Cohort / File(s) Summary
C/C++ native sort & formatting
core/.clang-format, core/src/main/c/share/ooo.cpp, core/src/main/c/share/ooo.h
Added clang-format config, new clzll helper, and large encoded-key MSD radix/vergesort implementation with JNI entry for sorting encoded entries.
Java JNI binding
core/src/main/java/io/questdb/std/Vect.java
Added native method sortEncodedEntries(long addr, long count, int keyLongs).
Encoder core types
core/src/main/java/io/questdb/griffin/engine/orderby/SortKeyEncoder.java, .../SortKeyType.java
New SortKeyEncoder, symbol rank maps, encoding routines, and SortKeyType enum supporting fixed key lengths and entry sizing.
Encoded sort cursors & factories
core/src/main/java/io/questdb/griffin/engine/orderby/EncodedSortLightRecordCursor.java, EncodedSortLightRecordCursorFactory.java, EncodedSortRecordCursor.java, EncodedSortRecordCursorFactory.java
New light and full encoded-sort DelegatingRecordCursor implementations and factories with lazy build, memory management, and integration with SortKeyEncoder.
Codegen integration
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
Switched ORDER BY planning to prefer encoded-sort factories when supported and feature flag enabled.
Removed radix-light implementation
core/src/main/java/io/questdb/griffin/engine/orderby/LongSortedLightRecordCursor.java
Deleted legacy single-column radix-based LongSortedLightRecordCursor.
Configuration API changes
core/src/main/java/io/questdb/PropServerConfiguration.java, core/src/main/java/io/questdb/PropertyKey.java, core/src/main/java/io/questdb/cairo/CairoConfiguration.java, .../CairoConfigurationWrapper.java, .../DefaultCairoConfiguration.java
Removed sqlOrderByRadixSortThreshold, deprecated the PropertyKey constant, and eliminated associated getters/initialization.
Tests: new encoded-sort suites
core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortTest.java, .../OrderByEncodeSortFuzzTest.java
Added parameterized and fuzz tests exercising encoded-sort across types, directions, nulls, and multi-column keys.
Tests: removals & updates
core/src/test/java/io/questdb/test/griffin/OrderByRadixSortTest.java (deleted), many test files under core/src/test/java/... (30+ files), core/src/test/resources/server.conf
Deleted radix-sort test class, removed radix threshold from server.conf, and updated many EXPECTED explain/plan strings from "Sort"/"Radix sort light"/"Sort light" to "Encode sort"/"Encode sort light". Also adjusted related assertions and resource limits in some tests.
Misc test fixes
core/src/test/java/io/questdb/test/PropServerConfigurationTest.java, core/src/test/java/io/questdb/test/ServerMainTest.java, core/src/test/java/io/questdb/test/cairo/SecurityTest.java, core/src/test/java/io/questdb/test/griffin/engine/orderby/OrderBySortKeyMaterializationTest.java
Removed assertions for removed config, changed plan labels and expected error messages, adjusted resource parameters, and added test setup to toggle the encoded-sort flag where needed.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

Performance, SQL, Enhancement

Suggested reviewers

  • puzpuzpuz
  • bluestreak01
  • glasstiger
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The pull request description clearly outlines a substantial encoded-sort optimization for ORDER BY queries, including detailed technical explanations of the implementation, benchmark results showing significant performance improvements, and documentation of limitations.
Title check ✅ Passed The title accurately and concisely describes the primary change: improving ORDER BY performance for fixed-width column types through encoded sorting.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sort_opt
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kafka1991 kafka1991 changed the title Perf(sql): order by optimization perf(sql): add encoded sort for ORDER BY Mar 10, 2026
@kafka1991 kafka1991 marked this pull request as ready for review March 10, 2026 15:19
GitHub Actions - Rebuild Native Libraries and others added 2 commits March 10, 2026 15:23
@ideoma
Copy link
Copy Markdown
Collaborator

ideoma commented Mar 10, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (9)
core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactoryTest.java (1)

71-91: Normalize the new SQL to the test style guide.

The added statements use lowercase SQL keywords, and the longer CTAS/query in testGeomeanAgainstExpAvgLnFormula() would be clearer as text blocks. Please keep new test SQL in the repository’s standard format. As per coding guidelines, "In SQL statements within test code, use UPPERCASE for SQL keywords" and "Use multiline strings for longer statements (multiple INSERT rows, complex queries) and to assert multiline query results".

Also applies to: 116-152

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactoryTest.java`
around lines 71 - 91, The test uses lowercase inline SQL and a single-line
string for a longer CTAS/query; update the statements in
testGeomeanAgainstExpAvgLnFormula() to follow the project style by converting
SQL keywords to UPPERCASE and using multiline text blocks for the CTAS and the
SELECT used in assertSql; specifically replace the execute("create table tab as
(... )") call with a multiline string containing UPPERCASE keywords (e.g.,
CREATE TABLE AS SELECT FROM LONG_SEQUENCE) and similarly make the SELECT passed
to assertSql a text block with all SQL keywords uppercased, leaving the expected
result string unchanged, and ensure you apply the same changes to the other
occurrence noted (lines 116-152) for consistency.
core/src/test/java/io/questdb/test/griffin/WindowJoinTest.java (1)

120-132: Extract a helper for encoded-sort plan prefixes.

This rename touched a lot of assertions for a pure plan-node label change. Pulling the "Encode sort" / "Encode sort light" prefixes into a small helper would make future planner-label churn much less noisy in this file.

♻️ Example shape
+    private static String encodeSortPlan(String keys, String body) {
+        return "Encode sort\n" +
+                "  keys: [" + keys + "]\n" +
+                body;
+    }
+
+    private static String encodeSortLightPlan(String keys, String body) {
+        return "Encode sort light\n" +
+                "  keys: [" + keys + "]\n" +
+                body;
+    }

Also applies to: 4374-4429

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/griffin/WindowJoinTest.java` around lines
120 - 132, Refactor repeated plan-prefix strings by extracting a helper method
(e.g., encodeSortPrefix or getEncodeSortPrefix) that returns either "Encode
sort\n" or "Encode sort light\n" based on a parameter, then replace all
hard-coded occurrences in WindowJoinTest (including the multi-line expected
strings around the Async Window Fast Join blocks at the locations shown and the
similar block at lines ~4374-4429) to call this helper so assertions concatenate
the helper result with the rest of the expected plan text; update test helper
usages where the prefix is needed so future label renames are handled in one
place.
core/src/test/java/io/questdb/test/griffin/SecurityTest.java (2)

560-560: Avoid pinning these assertions to the EncodedSort implementation name.

These checks will start failing on harmless renames/refactors even if the memory-limit behavior is still correct. If the intent is to validate the failure mode, assert a stable fragment like memory exceeded or an error code, and keep plan/path assertions separate.

Also applies to: 809-809, 1092-1092, 1156-1156

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/griffin/SecurityTest.java` at line 560,
Replace assertions that match the full implementation name "memory exceeded in
EncodedSort" with a stable fragment such as "memory exceeded" (e.g. change the
Assert.assertTrue(...contains("memory exceeded in EncodedSort")) calls in
SecurityTest.java to assert on "memory exceeded"). Keep any plan/path or
implementation-name checks as separate assertions if you still need to validate
the execution plan; update the occurrences around the Assert.assertTrue calls at
the noted spots (lines with Assert.assertTrue(ex.toString().contains(...))) so
they check the generic error text or an error code rather than the concrete
class name.

1148-1156: This change no longer exercises the tree fallback path.

ORDER BY sym1, count() stays within the encoded-key path described in this PR, so this test now validates EncodedSort rather than the tree-resize scenario implied by testTreeResizesWithImplicitGroupBy. Please keep a separate case that forces the fallback sorter, or rename this test if that coverage moved elsewhere.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/griffin/SecurityTest.java` around lines
1148 - 1156, The current assertion in testTreeResizesWithImplicitGroupBy now
hits EncodedSort (query "select sym1, count() from tb1 order by sym1, count()")
and no longer triggers the tree-resize fallback; either add a separate assertion
that will force the fallback sorter (for example run the same aggregation but
order only by the aggregate like "select sym1, count() from tb1 order by
count()" or otherwise change the ORDER BY to one that does not use the
encoded-key path) or rename the test to reflect it now targets EncodedSort;
update the test body around the existing execute/query call and the
Assert.assertTrue checking for "memory exceeded in EncodedSort" (or add a new
try/catch block) so there is a distinct case that exercises the tree-resize
fallback path referenced by testTreeResizesWithImplicitGroupBy.
core/src/main/java/io/questdb/std/Vect.java (1)

419-419: Place sortEncodedEntries() in the alphabetical sort* section.

The new method was inserted before sortArrayColumn(), which breaks the required ordering for public static members in Java files.

As per coding guidelines, **/*.java: "Group Java class members by kind (static vs. instance) and visibility, sorted alphabetically."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/std/Vect.java` at line 419, The new native
method sortEncodedEntries(long addr, long count, int keyLongs) in class Vect
should be relocated into the alphabetical block of public static sort* methods:
move its declaration so it appears in the same section as and alphabetically
among other sort* methods (for example placing it after sortArrayColumn() or
wherever it fits alphabetically), preserving the native modifier and signature.
core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (1)

1559-1562: Move isSqlParquetRowGroupPruningEnabled() into the alphabetical isSqlP... block.

This delegation is fine, but its current placement comes after all isSqlParallel* members, so the class no longer follows the required member ordering for Java files.

As per coding guidelines, **/*.java: "Group Java class members by kind (static vs. instance) and visibility, sorted alphabetically."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java` around
lines 1559 - 1562, The method isSqlParquetRowGroupPruningEnabled() is misplaced
after the isSqlParallel* members, breaking the alphabetical grouping; move the
entire method (public boolean isSqlParquetRowGroupPruningEnabled()) into the
existing alphabetical isSqlP... block among the other isSqlParquet*/isSqlP*
methods so members remain grouped and alphabetized by name consistent with the
class's Java member ordering rules.
core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java (1)

6279-6326: Use UPPERCASE for SQL keywords in test SQL statements.

The SQL statements use lowercase keywords (select, from, sample by, fill), which does not follow the coding guidelines for test code.

♻️ Proposed fix
             assertSql(
                     """
                             ts\tavg
                             2017-12-30T00:00:00.000000Z\t72.5
                             2018-01-04T00:00:00.000000Z\t264.5
                             2018-01-09T00:00:00.000000Z\t432.5
                             """,
-                    "select ts, avg(x) from fromto\n" +
-                            "sample by 5d"
+                    "SELECT ts, avg(x) FROM fromto\n" +
+                            "SAMPLE BY 5d"
             );
             assertSql(
                     """
                             ts\tavg
                             2017-12-20T00:00:00.000000Z\tnull
                             2017-12-25T00:00:00.000000Z\tnull
                             2017-12-30T00:00:00.000000Z\t72.5
                             2018-01-04T00:00:00.000000Z\t264.5
                             2018-01-09T00:00:00.000000Z\t432.5
                             """,
-                    "select ts, avg(x) from fromto\n" +
-                            "sample by 5d from '2017-12-20' fill(null)"
+                    "SELECT ts, avg(x) FROM fromto\n" +
+                            "SAMPLE BY 5d FROM '2017-12-20' FILL(null)"
             );
             assertSql(
                     """
                             ts\tavg
                             2017-12-20T00:00:00.000000Z\tnull
                             2017-12-25T00:00:00.000000Z\tnull
                             2017-12-30T00:00:00.000000Z\t72.5
                             2018-01-04T00:00:00.000000Z\t264.5
                             2018-01-09T00:00:00.000000Z\t432.5
                             2018-01-14T00:00:00.000000Z\tnull
                             2018-01-19T00:00:00.000000Z\tnull
                             2018-01-24T00:00:00.000000Z\tnull
                             2018-01-29T00:00:00.000000Z\tnull
                             """,
-                    "select ts, avg(x) from fromto\n" +
-                            "sample by 5d from '2017-12-20' to '2018-01-31' fill(null)"
+                    "SELECT ts, avg(x) FROM fromto\n" +
+                            "SAMPLE BY 5d FROM '2017-12-20' TO '2018-01-31' FILL(null)"
             );

As per coding guidelines: "In SQL statements within test code, use UPPERCASE for SQL keywords (CREATE TABLE, INSERT, SELECT ... AS ... FROM, etc.)"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java`
around lines 6279 - 6326, In testSampleByFromToFillNull, update the SQL literals
passed to assertSql so all SQL keywords are UPPERCASE: change "select" to
"SELECT", "from" to "FROM", "sample by" to "SAMPLE BY", "from '...'"/ "to '...'"
to "FROM '...'" / "TO '...'", and "fill(null)" to "FILL(NULL)" in the three
assertSql calls (the queries that reference the fromto table); keep string
literals like dates and identifiers unchanged.
core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortTest.java (1)

89-102: Use UPPERCASE for SQL keywords.

Per coding guidelines, SQL keywords in test code should be UPPERCASE. The query and DDL statements use lowercase select, from, create table, case, when, then, else, end, as.

♻️ Example fix for one test
-                "select * from x order by a asc;",
-                "create table x as (" +
-                        "select" +
-                        " cast (case" +
-                        "     when x < 10 then x" +
-                        "     when x >= 10 and x < 15 then null" +
-                        "     else x - 25" +
-                        " end as date) as a" +
-                        " from long_sequence(25)" +
-                        ")",
+                "SELECT * FROM x ORDER BY a ASC",
+                "CREATE TABLE x AS (" +
+                        "SELECT" +
+                        " CAST(CASE" +
+                        "     WHEN x < 10 THEN x" +
+                        "     WHEN x >= 10 AND x < 15 THEN NULL" +
+                        "     ELSE x - 25" +
+                        " END AS DATE) AS a" +
+                        " FROM long_sequence(" + 25 + "))",

As per coding guidelines: "In SQL statements within test code, use UPPERCASE for SQL keywords".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortTest.java` around
lines 89 - 102, The SQL literals in OrderByEncodeSortTest.java use lowercase
keywords; update the two string literals (the query string "select * from x
order by a asc;" and the DDL passed as the second argument) to use UPPERCASE SQL
keywords (e.g., SELECT, FROM, ORDER BY, ASC, CREATE TABLE AS, CAST, CASE, WHEN,
THEN, ELSE, END, AS) while preserving identifiers, expressions and spacing so
the test behavior is unchanged.
core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortFuzzTest.java (1)

43-69: Use UPPERCASE for SQL keywords in test code.

Per coding guidelines, SQL keywords should be UPPERCASE in test code. Consider updating select, case, when, then, else, end, from to their uppercase equivalents.

♻️ Suggested change
             execute(
-                    "CREATE TABLE fuzz_sort AS (SELECT" +
-                            " rnd_boolean() col_bool," +
+                    "CREATE TABLE fuzz_sort AS (SELECT " +
+                            "rnd_boolean() col_bool," +
                             ...
-                            " FROM long_sequence(" + rowCount + ")) TIMESTAMP(ts)"
+                            " FROM long_sequence(" + rowCount + ")) TIMESTAMP(ts)"

The CREATE TABLE, AS, SELECT, FROM, and TIMESTAMP keywords should remain uppercase (they already are), but ensure consistency throughout.

As per coding guidelines: "In SQL statements within test code, use UPPERCASE for SQL keywords".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortFuzzTest.java`
around lines 43 - 69, The SQL in OrderByEncodeSortFuzzTest uses lowercase SQL
keywords (e.g., "select", "from", "timestamp") inconsistent with project
guidelines; update the CREATE TABLE statement inside the test so all SQL
keywords are UPPERCASE (SELECT, FROM, TIMESTAMP and any other keywords like
CASE/WHEN/THEN/ELSE/END if present) while leaving function names (rnd_*), column
names, and symbols unchanged; locate the SQL string in OrderByEncodeSortFuzzTest
(the execute(...) call that creates fuzz_sort) and replace the lowercase
keywords with their uppercase equivalents for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/main/c/share/ooo.cpp`:
- Around line 1468-1470: The code uses GCC/Clang-specific __builtin_clzll and
doesn't handle count==0; replace this with a cross-platform count-leading-zeros
helper and guard for zero. Implement a wrapper (e.g.,
count_leading_zeros64(uint64_t)) that uses std::countl_zero(count) from <bit>
when available, falls back to _BitScanReverse64 on MSVC via conditional
compilation, and provides a portable loop-based fallback; then compute lg = 63 -
count_leading_zeros64(count) but ensure you handle count==0 (return 0 or set
lg=1 as original intent) before using it to calculate unstable_limit = count /
lg. Ensure necessary headers and conditional `#includes` are added and no
undefined behavior occurs for count==0.

In `@core/src/main/java/io/questdb/griffin/engine/orderby/SortKeyEncoder.java`:
- Around line 156-162: The code undercounts needed bytes because ranks are
stored as (rank + 1) (see the use of "rank + 1"), so a symbol table of size 255
produces a max stored value of 256 and won't fit in 1 byte; update the threshold
logic in SortKeyEncoder where columnByteWidths is set (refer to symbolCount and
columnByteWidths[i]) to account for the +1 by either comparing symbolCount <
0xFF and symbolCount < 0xFFFF or by testing (symbolCount + 1) <= 0xFF and
(symbolCount + 1) <= 0xFFFF so that tables of exactly 256 or 65536 symbols pick
2 or 4 bytes respectively.

In `@core/src/main/java/io/questdb/griffin/engine/orderby/SortKeyType.java`:
- Around line 40-53: The method SortKeyType.fromKeyLength currently maps zero or
negative keyLength to FIXED_8; change it to reject non-positive lengths by
adding an early check in fromKeyLength (e.g., if keyLength <= 0 return
UNSUPPORTED or throw IllegalArgumentException) before the existing <=8/16/24/32
branches so invalid encoder states aren't treated as 8-byte keys.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactoryTest.java`:
- Around line 68-93: Wrap the test method testGeomeanAgainstExpAvgLnFormula (and
the similar test at lines 112-154) inside assertMemoryLeak(), and replace the
use of assertSql(...) with the leak-aware query helper
assertQueryNoLeakCheck(...) so the table creation and query assertions run under
the memory-leak guard; specifically, call assertMemoryLeak(() -> { create the
table via execute(...); then invoke assertQueryNoLeakCheck(expectedResultString,
queryString, "tab"); }) referencing the test method name
testGeomeanAgainstExpAvgLnFormula and the other affected test to locate where to
apply the change.

In
`@core/src/test/java/io/questdb/test/griffin/engine/orderby/FunctionComplexityTest.java`:
- Around line 58-68: The test FunctionComplexityTest currently expects a plan
without a materialization node even though its intent is to verify that a
non-SYMBOL computed sort key (a::DOUBLE) exceeding the complexity threshold
triggers sort-key materialization; update the test to either (A) rename the
test/comment to reflect the new behavior if materialization is intentionally
removed, or (B) restore/assert the materialization path by adjusting the
expected plan passed to assertPlanNoLeakCheck so it includes the materialization
node for the computed expression (a::double) — ensure the assertion specifically
targets non-SYMBOL computed keys and references the test harness method
assertPlanNoLeakCheck and the test class FunctionComplexityTest to prevent
regressions in complexity-based materialization.

In `@core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java`:
- Around line 10291-10294: The expected explain plan in ExplainPlanTest.java
incorrectly treats ORDER BY 3 as referring to the projected column "sum" instead
of the third projected column "a0"; update the expected plan string in the test
so the key list shows "a0" (or the correct alias used in the projection) instead
of "sum" for the ORDER BY 3 entry (look for the test method or expectation
string that contains the block starting "SelectedRecord\n    Encode sort\n     
keys: [column desc, sum, berlin_ts desc]" and replace "sum" with "a0").

In `@core/src/test/java/io/questdb/test/griffin/SecurityTest.java`:
- Line 1148: The SQL in the test contains lowercase keywords; update the query
string "select sym1, count() from tb1 order by sym1, count()" to use uppercase
SQL keywords (e.g., SELECT, FROM, ORDER BY) so it follows the test convention
and coding guidelines.

---

Nitpick comments:
In `@core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java`:
- Around line 1559-1562: The method isSqlParquetRowGroupPruningEnabled() is
misplaced after the isSqlParallel* members, breaking the alphabetical grouping;
move the entire method (public boolean isSqlParquetRowGroupPruningEnabled())
into the existing alphabetical isSqlP... block among the other
isSqlParquet*/isSqlP* methods so members remain grouped and alphabetized by name
consistent with the class's Java member ordering rules.

In `@core/src/main/java/io/questdb/std/Vect.java`:
- Line 419: The new native method sortEncodedEntries(long addr, long count, int
keyLongs) in class Vect should be relocated into the alphabetical block of
public static sort* methods: move its declaration so it appears in the same
section as and alphabetically among other sort* methods (for example placing it
after sortArrayColumn() or wherever it fits alphabetically), preserving the
native modifier and signature.

In
`@core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactoryTest.java`:
- Around line 71-91: The test uses lowercase inline SQL and a single-line string
for a longer CTAS/query; update the statements in
testGeomeanAgainstExpAvgLnFormula() to follow the project style by converting
SQL keywords to UPPERCASE and using multiline text blocks for the CTAS and the
SELECT used in assertSql; specifically replace the execute("create table tab as
(... )") call with a multiline string containing UPPERCASE keywords (e.g.,
CREATE TABLE AS SELECT FROM LONG_SEQUENCE) and similarly make the SELECT passed
to assertSql a text block with all SQL keywords uppercased, leaving the expected
result string unchanged, and ensure you apply the same changes to the other
occurrence noted (lines 116-152) for consistency.

In `@core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java`:
- Around line 6279-6326: In testSampleByFromToFillNull, update the SQL literals
passed to assertSql so all SQL keywords are UPPERCASE: change "select" to
"SELECT", "from" to "FROM", "sample by" to "SAMPLE BY", "from '...'"/ "to '...'"
to "FROM '...'" / "TO '...'", and "fill(null)" to "FILL(NULL)" in the three
assertSql calls (the queries that reference the fromto table); keep string
literals like dates and identifiers unchanged.

In `@core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortFuzzTest.java`:
- Around line 43-69: The SQL in OrderByEncodeSortFuzzTest uses lowercase SQL
keywords (e.g., "select", "from", "timestamp") inconsistent with project
guidelines; update the CREATE TABLE statement inside the test so all SQL
keywords are UPPERCASE (SELECT, FROM, TIMESTAMP and any other keywords like
CASE/WHEN/THEN/ELSE/END if present) while leaving function names (rnd_*), column
names, and symbols unchanged; locate the SQL string in OrderByEncodeSortFuzzTest
(the execute(...) call that creates fuzz_sort) and replace the lowercase
keywords with their uppercase equivalents for consistency.

In `@core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortTest.java`:
- Around line 89-102: The SQL literals in OrderByEncodeSortTest.java use
lowercase keywords; update the two string literals (the query string "select *
from x order by a asc;" and the DDL passed as the second argument) to use
UPPERCASE SQL keywords (e.g., SELECT, FROM, ORDER BY, ASC, CREATE TABLE AS,
CAST, CASE, WHEN, THEN, ELSE, END, AS) while preserving identifiers, expressions
and spacing so the test behavior is unchanged.

In `@core/src/test/java/io/questdb/test/griffin/SecurityTest.java`:
- Line 560: Replace assertions that match the full implementation name "memory
exceeded in EncodedSort" with a stable fragment such as "memory exceeded" (e.g.
change the Assert.assertTrue(...contains("memory exceeded in EncodedSort"))
calls in SecurityTest.java to assert on "memory exceeded"). Keep any plan/path
or implementation-name checks as separate assertions if you still need to
validate the execution plan; update the occurrences around the Assert.assertTrue
calls at the noted spots (lines with
Assert.assertTrue(ex.toString().contains(...))) so they check the generic error
text or an error code rather than the concrete class name.
- Around line 1148-1156: The current assertion in
testTreeResizesWithImplicitGroupBy now hits EncodedSort (query "select sym1,
count() from tb1 order by sym1, count()") and no longer triggers the tree-resize
fallback; either add a separate assertion that will force the fallback sorter
(for example run the same aggregation but order only by the aggregate like
"select sym1, count() from tb1 order by count()" or otherwise change the ORDER
BY to one that does not use the encoded-key path) or rename the test to reflect
it now targets EncodedSort; update the test body around the existing
execute/query call and the Assert.assertTrue checking for "memory exceeded in
EncodedSort" (or add a new try/catch block) so there is a distinct case that
exercises the tree-resize fallback path referenced by
testTreeResizesWithImplicitGroupBy.

In `@core/src/test/java/io/questdb/test/griffin/WindowJoinTest.java`:
- Around line 120-132: Refactor repeated plan-prefix strings by extracting a
helper method (e.g., encodeSortPrefix or getEncodeSortPrefix) that returns
either "Encode sort\n" or "Encode sort light\n" based on a parameter, then
replace all hard-coded occurrences in WindowJoinTest (including the multi-line
expected strings around the Async Window Fast Join blocks at the locations shown
and the similar block at lines ~4374-4429) to call this helper so assertions
concatenate the helper result with the rest of the expected plan text; update
test helper usages where the prefix is needed so future label renames are
handled in one place.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f3345ea-2a1b-4e6f-acd9-48378718fe93

📥 Commits

Reviewing files that changed from the base of the PR and between bf956c6 and 057de68.

⛔ Files ignored due to path filters (6)
  • core/src/main/bin/linux-aarch64/libjemalloc.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/darwin-aarch64/libquestdb.dylib is excluded by !**/*.dylib
  • core/src/main/resources/io/questdb/bin/darwin-x86-64/libquestdb.dylib is excluded by !**/*.dylib
  • core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdb.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdb.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/windows-x86-64/libquestdb.dll is excluded by !**/*.dll
📒 Files selected for processing (53)
  • core/.clang-format
  • core/src/main/c/share/ooo.cpp
  • core/src/main/java/io/questdb/PropServerConfiguration.java
  • core/src/main/java/io/questdb/PropertyKey.java
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java
  • core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java
  • core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java
  • core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
  • core/src/main/java/io/questdb/griffin/engine/orderby/EncodedSortLightRecordCursor.java
  • core/src/main/java/io/questdb/griffin/engine/orderby/EncodedSortLightRecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/engine/orderby/EncodedSortRecordCursor.java
  • core/src/main/java/io/questdb/griffin/engine/orderby/EncodedSortRecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/engine/orderby/LongSortedLightRecordCursor.java
  • core/src/main/java/io/questdb/griffin/engine/orderby/SortKeyEncoder.java
  • core/src/main/java/io/questdb/griffin/engine/orderby/SortKeyType.java
  • core/src/main/java/io/questdb/std/Vect.java
  • core/src/test/java/io/questdb/test/PropServerConfigurationTest.java
  • core/src/test/java/io/questdb/test/ServerMainQuerySmokeTest.java
  • core/src/test/java/io/questdb/test/ServerMainTest.java
  • core/src/test/java/io/questdb/test/cairo/ArrayTest.java
  • core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java
  • core/src/test/java/io/questdb/test/cairo/view/ViewsFunctionTest.java
  • core/src/test/java/io/questdb/test/griffin/AggregateTest.java
  • core/src/test/java/io/questdb/test/griffin/ClickBenchTest.java
  • core/src/test/java/io/questdb/test/griffin/DistinctTest.java
  • core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java
  • core/src/test/java/io/questdb/test/griffin/GroupByTest.java
  • core/src/test/java/io/questdb/test/griffin/HorizonJoinTest.java
  • core/src/test/java/io/questdb/test/griffin/IPv4Test.java
  • core/src/test/java/io/questdb/test/griffin/LimitTest.java
  • core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortFuzzTest.java
  • core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortTest.java
  • core/src/test/java/io/questdb/test/griffin/OrderByRadixSortTest.java
  • core/src/test/java/io/questdb/test/griffin/OrderByWithFilterTest.java
  • core/src/test/java/io/questdb/test/griffin/PivotTest.java
  • core/src/test/java/io/questdb/test/griffin/ProjectionReferenceTest.java
  • core/src/test/java/io/questdb/test/griffin/SecurityTest.java
  • core/src/test/java/io/questdb/test/griffin/SqlOptimiserTest.java
  • core/src/test/java/io/questdb/test/griffin/WindowJoinTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/SqlCodeGeneratorTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/array/DoubleArrayRoundFunctionFactoryTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/catalogue/PgAttributeFunctionFactoryTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ArgMaxDoubleDoubleGroupByFunctionFactoryTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ArgMaxDoubleTimestampGroupByFunctionFactoryTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/ArgMaxTimestampDoubleGroupByFunctionFactoryTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/functions/groupby/GeomeanDoubleGroupByFunctionFactoryTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/groupby/GroupByFunctionCaseTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByNanoTimestampTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/groupby/SampleByTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/orderby/FunctionComplexityTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/orderby/OrderBySortKeyMaterializationTest.java
  • core/src/test/java/io/questdb/test/griffin/engine/table/AsyncFilteredRecordCursorFactoryTest.java
  • core/src/test/resources/server.conf
💤 Files with no reviewable changes (7)
  • core/src/test/resources/server.conf
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java
  • core/src/test/java/io/questdb/test/griffin/OrderByRadixSortTest.java
  • core/src/test/java/io/questdb/test/ServerMainTest.java
  • core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java
  • core/src/test/java/io/questdb/test/PropServerConfigurationTest.java
  • core/src/main/java/io/questdb/griffin/engine/orderby/LongSortedLightRecordCursor.java

@bluestreak01 bluestreak01 changed the title perf(sql): add encoded sort for ORDER BY perf(sql): improve performance of order-by on TCA queries Mar 10, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortTest.java (1)

125-134: Please normalize the new SQL fixtures to the test conventions.

These setup/query strings are still using lowercase keywords and manual string concatenation. Converting them to multiline strings with UPPERCASE SQL will make the new cases much easier to scan and keep consistent with the rest of the suite. As per coding guidelines "In SQL statements within test code, use UPPERCASE for SQL keywords (CREATE TABLE, INSERT, SELECT ... AS ... FROM, etc.)" and "Use multiline strings for longer statements (multiple INSERT rows, complex queries) and to assert multiline query results".

Also applies to: 172-181, 322-331, 369-378, 416-425, 463-472, 702-713, 738-749, 787-796, 834-843, 894-903

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortTest.java` around
lines 125 - 134, Normalize the SQL fixtures in OrderByEncodeSortTest by
replacing the concatenated lowercase strings with Java text-block multiline
strings and using UPPERCASE for SQL keywords; for example replace the query
string "select * from x order by a asc;" with a multiline text block "SELECT *
FROM x ORDER BY a ASC;" and replace the long CREATE TABLE ... AS (...)
concatenation with a single multiline text block where keywords like CREATE
TABLE, AS, SELECT, CAST, CASE, WHEN, THEN, ELSE, END, FROM are uppercased and
the expression is formatted across lines for readability. Apply this same
conversion pattern to the other occurrences called out in the review (the blocks
around lines 172-181, 322-331, 369-378, 416-425, 463-472, 702-713, 738-749,
787-796, 834-843, 894-903) so all SQL fixtures in OrderByEncodeSortTest use
multiline text blocks and UPPERCASE keywords.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/main/c/share/ooo.cpp`:
- Around line 1563-1575: The JNI boundary function
Java_io_questdb_std_Vect_sortEncodedEntries currently uses assert in the default
switch branch which is disabled in release builds; replace that with explicit
validation: detect unsupported keyLongs in
Java_io_questdb_std_Vect_sortEncodedEntries (before or inside the switch
default), and call a JNI throwable (e.g., env->ThrowNew for
IllegalArgumentException or RuntimeException) with a clear message mentioning
keyLongs and/or call abort() if you want a hard fail, then return immediately;
ensure references to sort_encoded_impl<...> remain unchanged and that the
function does not fall through silently when keyLongs is invalid.

In `@core/src/main/c/share/ooo.h`:
- Around line 37-45: The header ooo.h must be made self-contained: add the
necessary includes before the inline clzll(uint64_t x) definition — specifically
include <stdint.h> (for uint64_t), <type_traits> (for std::integral_constant
usage elsewhere), and on MSVC include <intrin.h> (for _BitScanReverse64) guarded
by the MSVC detection macro; then keep the existing clzll, _BitScanReverse64
usage, and range_bits() changes intact so the header no longer relies on include
order.

In `@core/src/main/java/io/questdb/griffin/engine/orderby/SortKeyEncoder.java`:
- Around line 121-124: The init() method is freeing the encoder-owned field
tempList (calling Misc.free(tempList)), which leads to double-free/idempotency
issues since close() also frees tempList; remove the free of the tempList field
from init() and either (a) allocate any scratch list as a local variable inside
init() if it should be ephemeral, or (b) keep tempList alive and only free it in
close() alongside rankMaps (see tempList, init(), close(), and rankMaps usage);
apply the same correction for the other init/cleanup occurrences noted (around
the 135-168 region).

---

Nitpick comments:
In `@core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortTest.java`:
- Around line 125-134: Normalize the SQL fixtures in OrderByEncodeSortTest by
replacing the concatenated lowercase strings with Java text-block multiline
strings and using UPPERCASE for SQL keywords; for example replace the query
string "select * from x order by a asc;" with a multiline text block "SELECT *
FROM x ORDER BY a ASC;" and replace the long CREATE TABLE ... AS (...)
concatenation with a single multiline text block where keywords like CREATE
TABLE, AS, SELECT, CAST, CASE, WHEN, THEN, ELSE, END, FROM are uppercased and
the expression is formatted across lines for readability. Apply this same
conversion pattern to the other occurrences called out in the review (the blocks
around lines 172-181, 322-331, 369-378, 416-425, 463-472, 702-713, 738-749,
787-796, 834-843, 894-903) so all SQL fixtures in OrderByEncodeSortTest use
multiline text blocks and UPPERCASE keywords.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1554c663-855f-4b69-b238-7028bae1a2a2

📥 Commits

Reviewing files that changed from the base of the PR and between 057de68 and 10efb50.

⛔ Files ignored due to path filters (4)
  • core/src/main/resources/io/questdb/bin/darwin-aarch64/libquestdb.dylib is excluded by !**/*.dylib
  • core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdb.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdb.so is excluded by !**/*.so
  • core/src/main/resources/io/questdb/bin/windows-x86-64/libquestdb.dll is excluded by !**/*.dll
📒 Files selected for processing (7)
  • core/src/main/c/share/ooo.cpp
  • core/src/main/c/share/ooo.h
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java
  • core/src/main/java/io/questdb/griffin/engine/orderby/SortKeyEncoder.java
  • core/src/main/java/io/questdb/griffin/engine/orderby/SortKeyType.java
  • core/src/main/java/io/questdb/std/Vect.java
  • core/src/test/java/io/questdb/test/griffin/OrderByEncodeSortTest.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/src/main/java/io/questdb/griffin/engine/orderby/SortKeyType.java
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java

@bluestreak01 bluestreak01 added SQL Issues or changes relating to SQL execution Performance Performance improvements tandem labels Mar 11, 2026
kafka1991 and others added 9 commits March 12, 2026 15:51
# Conflicts:
#	core/src/main/resources/io/questdb/bin/linux-aarch64/libquestdb.so
#	core/src/main/resources/io/questdb/bin/windows-x86-64/libquestdb.dll
@bluestreak01 bluestreak01 changed the title perf(sql): improve performance of order-by on TCA queries perf(sql): improve ORDER BY performance for fixed-width column types Mar 12, 2026
@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 483 / 503 (96.02%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/engine/orderby/EncodedSortLightRecordCursorFactory.java 2 4 50.00%
🔵 io/questdb/griffin/engine/orderby/EncodedSortRecordCursorFactory.java 19 23 82.61%
🔵 io/questdb/griffin/engine/orderby/EncodedSortRecordCursor.java 89 93 95.70%
🔵 io/questdb/griffin/engine/orderby/SortKeyType.java 21 22 95.45%
🔵 io/questdb/griffin/engine/orderby/EncodedSortLightRecordCursor.java 85 88 96.59%
🔵 io/questdb/griffin/engine/orderby/SortKeyEncoder.java 256 262 97.71%
🔵 io/questdb/PropertyKey.java 1 1 100.00%
🔵 io/questdb/griffin/SqlCodeGenerator.java 8 8 100.00%
🔵 io/questdb/cairo/CairoConfigurationWrapper.java 1 1 100.00%
🔵 io/questdb/PropServerConfiguration.java 1 1 100.00%

@bluestreak01 bluestreak01 merged commit 68939d7 into master Mar 13, 2026
49 checks passed
@bluestreak01 bluestreak01 deleted the sort_opt branch March 13, 2026 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Performance improvements SQL Issues or changes relating to SQL execution tandem

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants