Skip to content

perf(core): scale symbol capacity automatically#6149

Merged
ideoma merged 69 commits intomasterfrom
vi_scale_sym_cap
Oct 2, 2025
Merged

perf(core): scale symbol capacity automatically#6149
ideoma merged 69 commits intomasterfrom
vi_scale_sym_cap

Conversation

@bluestreak01
Copy link
Copy Markdown
Member

@bluestreak01 bluestreak01 commented Sep 16, 2025

This PR introduces automatic capacity management for symbols.

Previously, symbol capacity had to be manually tracked, which led to performance degradation when the symbol map was undersized—especially in high-cardinality cases. With this change, capacity is resized dynamically as cardinality grows, ensuring consistent system performance without manual intervention.

Technical details

  • Min Java language level is 17 now. Unlocks string blocks (less verbose tests), extended "switch" statements, record classes etc.
  • symbol capacity is increased 2x every time symbol count crosses 80% symbol capacity threshold
  • symbol capacity is check if and updated on every commit, only if conditions are met. This is done as a part of "housekeeping"; same place as TTL etc
  • fixed bug in symbol capacity change logic, which could segfault if symbol table is over 4k in size
  • there is added risk that capacity could be changed frequently, which we have not done before. There is circuit breaker config flag, which disables this behaviour without having to restart the database
  • default Java language (in pom.xml) has been moved to 17. This is the version we use in CI and release, we lagged behind with Java 11. The new 17 unlocks some nice syntax perks
  • added a fuzz test to effectively stress adding high-cardinality symbol values and see what breaks
  • discovered that we were grinding the OS with syscalls due to tiny (4k) page size. When you resize to 40m symbols 4k page is being felt. Made a change to map the whole file and set the page to the "floor" power of 2 of the file size
  • introduced floorPow2 counterpart for ceilPow2 + tests
  • fixed now new bugs in parallel copy, which was not using column version API to open files. As symbol maps began to "move", copy fell its face
  • added a new virtual entry in the partition table for symbol table versions (per column) @ideoma's idea. This now allows us not to hardlink symbol columns in each partition when we change symbol capacity. So the partitions remain unchanged.

Risks

When auto-scaling is used and triggered (user is not in control when it is triggered, may be indirectly) it would make it impossible to rollback QuestDB to the previous versions. This is because symbol column file names will not be recognised by old QuestDB. Manual intervention will be required - basically renaming files back to what older QuestDB expects.

WIP:

  • fix tests that expect certain behaviour when file facade fails
  • fix "copy" SQL ignoring column versions and failing as a result
  • investigate incorrect "copy" command behaviour around column versions
  • clean up code (there are some temp code to make things work, extra allocations etc)
  • introduce configuration for the scaling logic
  • add tests to prevent regression

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 16, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The PR introduces symbol-table version partitioning and switches symbol metadata lookups to “symbol table name txn”. It adds symbol capacity auto-scaling and replaces symbol table append page size with min/max allocation settings. Multiple APIs are adjusted accordingly across readers/writers, CSV import, line TCP symbol cache, and tests. Java 17 is enabled.

Changes

Cohort / File(s) Summary
Build & housekeeping
core/pom.xml
Set Java target to 17; update Maven compiler plugin; adjust activation/profile ids.
Ignore patterns
.gitignore
Ignore additional build/test artifacts across Rust, Python, C#, Node.js, PHP, Go.
Symbol-table version partition + txn wiring
core/src/main/java/io/questdb/cairo/ColumnVersionReader.java, .../ColumnVersionWriter.java, .../DatabaseCheckpointAgent.java, .../TableReader.java, .../WalWriter.java, .../MetadataCache.java, .../TableWriter.java, .../O3PartitionJob.java, .../RebuildColumnBase.java, core/src/main/java/io/questdb/griffin/ConvertOperatorImpl.java
Add SYMBOL_TABLE_VERSION_PARTITION; add/get and use symbol-table name txn APIs; route symbol rebuild/reads/writes to symbol-table txn; expose columnVersionReader in writers; adjust truncate/upsert for symbol-table txn; update call sites.
Symbol map components
core/src/main/java/io/questdb/cairo/MapWriter.java, .../vm/NullMapWriter.java, .../SymbolMapWriter.java, .../SymbolMapReaderImpl.java, .../SymbolMapUtil.java
Add getColumnIndex to writers; extend SymbolMapWriter ctor and sizing via calculateExtendSegmentSize; switch SymbolMapReaderImpl to Utf8Sequence path; adjust open/rebuild and size derivation; add extend size helper; minor API additions.
Config: symbol capacity auto-scale + allocation sizing
core/src/main/java/io/questdb/cairo/CairoConfiguration.java, .../CairoConfigurationWrapper.java, .../DefaultCairoConfiguration.java, core/src/main/java/io/questdb/PropServerConfiguration.java, core/src/main/java/io/questdb/PropertyKey.java, core/src/main/resources/io/questdb/site/conf/server.conf
Add autoScaleSymbolCapacity and threshold; replace getSymbolTableAppendPageSize with min/max allocation getters; wire new properties and defaults; add new PropertyKey enums; config file entries updated.
Line TCP symbol cache + table update plumbing
core/src/main/java/io/questdb/cutlass/line/tcp/SymbolCache.java, .../TableUpdateDetails.java
Rework SymbolCache.of signature; track denseSymbolIndex, pathToTableDir, ColumnVersionReader; reload decision based on symbol-table name txn; ThreadLocalDetails manages ColumnVersionReader lifecycle.
CSV importer symbol merge/index lookups
core/src/main/java/io/questdb/cutlass/text/CopyTask.java, .../ParallelCsvFileImporter.java
Replace symbol index mapping with denseSymbolIndex + column index; pass symbol-table name txn from ColumnVersionReader; adapt Phase classes and public task methods; add TableStructureAdapter helpers.
VM/memory + utility tweaks
core/src/main/java/io/questdb/cairo/vm/MemoryCMARWImpl.java, .../Vm.java, core/src/main/java/io/questdb/cairo/TableUtils.java
Rename extendSegmentSize param to extendSegmentSizePow2; update forwarding; reduce MIN_INDEX_VALUE_BLOCK_SIZE.
Numbers/NumericException
core/src/main/java/io/questdb/std/Numbers.java, .../NumericException.java
Add floorPow2 (int/long); ceilPow2 overflow guard; enhance error messages; require positive isPow2; add NumericException.put(cs, lo, hi).
Misc minor changes
core/src/main/java/io/questdb/cutlass/http/processors/LineHttpProcessorState.java, core/src/main/java/io/questdb/cutlass/line/AbstractLineSender.java, core/src/main/java/io/questdb/cairo/IDGeneratorFactory.java
Make sink final; switch to isEmpty and enhanced switch; make uniqueIdMem volatile.
Tests adjusted for new APIs/config
core/src/test/java/io/questdb/test/AbstractCairoTest.java, .../cairo/ColumnVersionWriterTest.java, .../cairo/FullFwdPartitionFrameCursorTest.java, .../cairo/SymbolMapTest.java, .../cairo/TableNameRegistryTest.java, .../cairo/TableWriterTest.java, .../cairo/fuzz/DedupInsertFuzzTest.java, .../cairo/mv/MatViewIdenticalReplaceTest.java, .../cairo/wal/DedupWalWriterTest.java, .../cutlass/line/tcp/SymbolCacheTest.java, .../cutlass/line/tcp/LineTcpSenderTest.java, .../cutlass/text/ParallelCsvFileImporterTest.java, .../griffin/CopyTest.java, .../griffin/SqlParserTest.java, .../std/NumbersTest.java, .../PropServerConfigurationTest.java, .../ServerMainTest.java, .../cairo/fuzz/WalWriterFuzzTest.java
Update method signatures/constructors, expectations, and configs to new symbol-table txn, capacity auto-scale, min/max allocation settings; add new fuzz test; add Numbers floorPow2 coverage; various formatting/text-block migrations.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant TW as TableWriter
  participant CFG as CairoConfiguration
  participant SMW as SymbolMapWriter
  participant ST as SymbolTable Files

  Note over TW,CFG: Housekeeping: symbol capacity auto-scaling
  TW->>CFG: autoScaleSymbolCapacity(), autoScaleSymbolCapacityThreshold()
  alt Auto-scaling enabled and occupancy >= threshold
    TW->>SMW: getSymbolCount(), getCapacity()
    TW->>SMW: resizeCapacity(2x)
    SMW->>ST: extend files (min/max alloc sizing)
  else No action
    Note over TW: Keep current capacities
  end
Loading
sequenceDiagram
  autonumber
  participant Writer as TableWriter/WalWriter
  participant CVR as ColumnVersionReader
  participant SCR as SymbolCache (Line TCP)
  participant SMR as SymbolMapReaderImpl

  Note over Writer,CVR: Symbol-table name txn partition
  Writer->>CVR: getSymbolTableNameTxn(colIdx)
  Writer->>SMR: open(pathToTableDir, columnName, symbolTableNameTxn)

  Note over SCR,CVR: Reload decision on symbol work
  SCR->>CVR: getSymbolTableNameTxn(colIdx)
  alt txn changed
    SCR->>SMR: reopen with new txn
  else same txn
    SCR->>SMR: updateSymbolCount()
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

Performance, Core, Enhancement

Suggested reviewers

  • puzpuzpuz
  • jerrinot
  • mtopolnik

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.84% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title “perf(core): scale symbol capacity automatically” clearly summarizes the primary change of the pull request—implementing automatic scaling of symbol capacity—and uses a concise conventional commit style that makes the intent and scope immediately understandable to reviewers.
Description Check ✅ Passed The pull request description clearly outlines the purpose and scope of the changes, including automatic symbol capacity scaling, the Java version bump, bug fixes, and new tests, all of which correspond directly to the code modifications summarized in the PR; it is on-topic and adequately describes the changeset.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bluestreak01
Copy link
Copy Markdown
Member Author

@CodeRabbit review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

1 similar comment
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
core/src/test/java/io/questdb/test/cairo/TableNameRegistryTest.java (1)

287-289: WalPurgeJob thread stops early; keep it alive until done.

As written, the loop exits on first empty run even if done is false, reducing concurrent stress and risking missed purges during the test window.

Apply this diff:

-                    //noinspection StatementWithEmptyBody
-                    while (!done.get() && job.run(0)) {
-                        // run until empty
-                    }
+                    while (!done.get()) {
+                        // drain until empty
+                        while (job.run(0)) {
+                            // keep draining
+                        }
+                        Os.pause();
+                    }
core/src/test/java/io/questdb/test/griffin/SqlParserTest.java (1)

536-561: Test no longer validates time units; loop now pointless.

testACRangeFrameAcceptsTimeUnits iterates over unitsAndValues but the input SQL strings (2nd arg to assertQuery) no longer include the unit, so we don't actually exercise unit parsing. The .replace("#unit", expectedUnit) is applied to the expected string (which doesn't contain #unit) and is a no‑op. Net effect: coverage regression; the test would pass even if time units were rejected.

Fix by moving #unit into the input SQL and dropping the redundant .replace on the expected string.

Apply this diff:

-            String expectedUnit = unitsAndValues[i].replaceAll("s$", "");
-            assertQuery(
-                    ("select-window a, b, f(c) f over (partition by b order by ts range between 10 preceding and current row exclude no others) " +
-                            "from (select-choose [a, b, c, ts] a, b, c, ts from (select [a, b, c, ts] from xyz timestamp (ts)))")
-                            .replace("#unit", expectedUnit),
-                    "select a,b, f(c) over (partition by b order by ts range 10 preceding) from xyz",
+            String expectedUnit = unitsAndValues[i].replaceAll("s$", "");
+            assertQuery(
+                    "select-window a, b, f(c) f over (partition by b order by ts range between 10 preceding and current row exclude no others) " +
+                    "from (select-choose [a, b, c, ts] a, b, c, ts from (select [a, b, c, ts] from xyz timestamp (ts)))",
+                    ("select a,b, f(c) over (partition by b order by ts range 10 #unit preceding) from xyz")
+                            .replace("#unit", expectedUnit),
                     modelOf("xyz")
                         .col("a", ColumnType.INT)
                         .col("b", ColumnType.INT)
                         .col("c", ColumnType.INT)
                         .timestamp("ts")
             );
 
-            assertQuery(
-                    ("select-window a, b, f(c) f over (partition by b order by ts range between 10 preceding and 1 following exclude no others) " +
-                            "from (select-choose [a, b, c, ts] a, b, c, ts from (select [a, b, c, ts] from xyz timestamp (ts)))")
-                            .replace("#unit", expectedUnit),
-                    "select a,b, f(c) over (partition by b order by ts range between 10 preceding and 1 following) from xyz",
+            assertQuery(
+                    "select-window a, b, f(c) f over (partition by b order by ts range between 10 preceding and 1 following exclude no others) " +
+                    "from (select-choose [a, b, c, ts] a, b, c, ts from (select [a, b, c, ts] from xyz timestamp (ts)))",
+                    ("select a,b, f(c) over (partition by b order by ts range between 10 #unit preceding and 1 #unit following) from xyz")
+                            .replace("#unit", expectedUnit),
                     modelOf("xyz")
                         .col("a", ColumnType.INT)
                         .col("b", ColumnType.INT)
                         .col("c", ColumnType.INT)
                         .timestamp("ts")
             );
core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (2)

794-799: Use timestampColumnName in both bounds (not hard-coded "ts").

Prevents breakage if timestamp column name differs.

-            String insertSql = "insert into " + tableNameDedup +
-                    " select * from " + tableNameDedup +
-                    " where " + timestampColumnName + " >= " + fromTs + " and ts < " + toTs
-                    + " LIMIT 100000";
+            String insertSql = "insert into " + tableNameDedup +
+                    " select * from " + tableNameDedup +
+                    " where " + timestampColumnName + " >= " + fromTs + " and " + timestampColumnName + " < " + toTs
+                    + " LIMIT 100000";

908-913: Syntax error: extra ')' in ALTER TABLE statement.

The generated SQL has an unmatched closing parenthesis.

-                String alterStatement = String.format(
-                        "alter table %s dedup upsert keys(%s))",
+                String alterStatement = String.format(
+                        "alter table %s dedup upsert keys(%s)",
                         tableNameDedup,
                         comaSeparatedUpsertCols
                 );
core/src/main/java/io/questdb/cairo/TableWriter.java (1)

1152-1167: Use the updated symbol-table txn when rebuilding capacity

After hardLinkAndPurgeSymbolTableFiles() you upsert the symbol-table name txn. Rebuilding with a stale txn risks writing to/reading from the wrong file set if txn changes across operations. Use the current symbol-table txn from ColumnVersionWriter.

@@
-                oldSymbolWriter.rebuildCapacity(
-                        configuration,
-                        path,
-                        columnName,
-                        columnNameTxn,
-                        newSymbolCapacity,
-                        symbolCacheFlag
-                );
+                long newSymTableTxn = columnVersionWriter.getSymbolTableNameTxn(columnIndex);
+                oldSymbolWriter.rebuildCapacity(
+                        configuration,
+                        path,
+                        columnName,
+                        newSymTableTxn,
+                        newSymbolCapacity,
+                        symbolCacheFlag
+                );

Also applies to: 1182-1185

🧹 Nitpick comments (37)
.gitignore (1)

38-39: Reconsider ignoring Cargo.lock; keep for reproducible Rust builds.

For Rust binaries/tools, committing Cargo.lock is recommended to ensure deterministic builds. If parquet-tools is a binary crate (likely), remove the ignore; if it’s a library, keeping it ignored is fine—please confirm intent. Also, align trailing slash style with nearby entries for consistency.

Apply this diff if you decide to keep the lockfile and normalize the path style:

-core/rust/qdbr/parquet2/parquet-tools/target/
-core/rust/qdbr/parquet2/parquet-tools/Cargo.lock
+core/rust/qdbr/parquet2/parquet-tools/target
core/src/test/java/io/questdb/test/griffin/CopyTest.java (1)

1167-1168: Make result ordering explicit to avoid future flakiness.

Relying on implicit row order + negative LIMIT can become brittle if planner/storage details change. Consider ordering by the timestamp.

Apply if it matches current semantics:

-                "select line,ts,d,description from x limit -10"
+                "select line,ts,d,description from x order by ts limit -10"

Please run the test suite to verify negative LIMIT behavior with ORDER BY remains consistent in your QuestDB version.

core/src/test/java/io/questdb/test/cairo/TableNameRegistryTest.java (2)

135-139: Also tolerate "could not remove table" in testConcurrentCreateDrop to avoid flakiness.

The same transient condition can surface here during concurrent drops.

Apply this diff:

-                                    if (!Chars.contains(e.getFlyweightMessage(), "table does not exist")
-                                            && !Chars.contains(e.getFlyweightMessage(), "could not lock")
-                                            && !Chars.contains(e.getFlyweightMessage(), "table name is reserved")) {
+                                    if (!Chars.contains(e.getFlyweightMessage(), "table does not exist")
+                                            && !Chars.contains(e.getFlyweightMessage(), "could not lock")
+                                            && !Chars.contains(e.getFlyweightMessage(), "table name is reserved")
+                                            && !Chars.contains(e.getFlyweightMessage(), "could not remove table")) {
                                         throw e;
                                     }

210-210: Deterministic RNG seeds: consider hoisting to named constants.

Improves readability and reuse across tests.

Example outside this hunk:

private static final long SEED1 = 3095520132513L;
private static final long SEED2 = 1758309268883L;
...
Rnd rnd = TestUtils.generateRandom(LOG, SEED1, SEED2);
core/pom.xml (1)

35-35: Enforce JDK 17, upgrade compiler plugin, and switch to to avoid build/toolchain skew

3.8.1 can compile with JDK 17 (it delegates to javac), but upgrade to a recent maven-compiler-plugin (3.10.1+; 3.11.0 shown) and use while updating the Java profile to require JDK 17.

@@
-                <artifactId>maven-compiler-plugin</artifactId>
-                <version>3.8.1</version>
+                <artifactId>maven-compiler-plugin</artifactId>
+                <version>3.11.0</version>
@@
-                    <source>${javac.compile.source}</source>
-                    <target>${javac.compile.target}</target>
+                    <release>${javac.compile.source}</release>
@@
-        <profile>
-            <id>java11+</id>
+        <profile>
+            <id>java11+</id> <!-- keep id if referenced externally -->
             <properties>
-                <jdk.version>11</jdk.version>
-                <java.enforce.version>11</java.enforce.version>
+                <jdk.version>17</jdk.version>
+                <java.enforce.version>17</java.enforce.version>
@@
-            <activation>
-                <jdk>(1.8,)</jdk>
-            </activation>
+            <activation>
+                <jdk>(16,)</jdk> <!-- require 17+ -->
+            </activation>
core/src/test/java/io/questdb/test/cairo/mv/MatViewIdenticalReplaceTest.java (2)

174-176: Avoid regex in timestamp replacement

replaceAll(".000000Z", ".000000000Z") treats . as “any char”. Use a literal replacement to prevent accidental matches and avoid regex overhead.

Apply this diff:

-    return ColumnType.isTimestampMicro(timestampType.getDriver().getTimestampType()) ? expected : expected.replaceAll(".000000Z", ".000000000Z");
+    return ColumnType.isTimestampMicro(timestampType.getDriver().getTimestampType())
+            ? expected
+            : expected.replace(".000000Z", ".000000000Z");

169-172: Minor: use literal replace, not replaceAll, for #TIMESTAMP

No regex needed here; using replace avoids regex semantics and is a tad faster.

-        sqlText = sqlText.toString().replaceAll("#TIMESTAMP", timestampType.getTypeName());
+        sqlText = sqlText.toString().replace("#TIMESTAMP", timestampType.getTypeName());
core/src/test/java/io/questdb/test/cairo/FullFwdPartitionFrameCursorTest.java (2)

544-545: Parallel YEAR tests: same 1972.3 suffix — apply the same de‑brittling

Mirror the constant/API approach here to keep the suite consistent and reduce future churn.

Apply the same constant:

-        testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, "1972.3" + Files.SEPARATOR + "a.v", 2);
+        testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, YEAR_1972_DIR + Files.SEPARATOR + "a.v", 2);
@@
-        testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, "1972.3" + Files.SEPARATOR + "b.v", 2);
+        testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, YEAR_1972_DIR + Files.SEPARATOR + "b.v", 2);
@@
-        testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, "1972.3" + Files.SEPARATOR + "c.v", 2);
+        testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, YEAR_1972_DIR + Files.SEPARATOR + "c.v", 2);

Also applies to: 549-550, 554-555


234-235: Centralize YEAR partition dir "1972.3" — avoid brittle name‑txn literals

Multiple tests hard-code "1972.3" (core/src/test/java/io/questdb/test/cairo/FullFwdPartitionFrameCursorTest.java:234, 239, 244, 544, 549, 554). If ".3" is a name‑txn suffix (not derivable from the timestamp), replace these literals with a single constant (e.g. private static final String YEAR_1972_DIR = "1972.3") or, preferably, derive the partition directory via the partition-name API so tests don’t break when name‑txn sequencing changes.

core/src/test/java/io/questdb/test/cutlass/line/tcp/LineTcpSenderTest.java (1)

59-59: Drop redundant non-static JUnit import and use existing static assert.

You already have import static org.junit.Assert.*;. Remove the direct org.junit.Assert import and switch the call site to assertEquals.

Apply this diff:

-import org.junit.Assert;
...
-                Assert.assertEquals(0, errorCount.get());
+                assertEquals(0, errorCount.get());

Also applies to: 1023-1023

core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (3)

764-771: Switch expression looks good; add a diagnostic for unsupported types.

Throwing an informative error helps test triage.

Apply this diff:

-        return switch (colType) {
+        return switch (colType) {
             case ColumnType.SYMBOL -> rec.getSymA(2);
             case ColumnType.STRING -> rec.getStrA(2);
             case ColumnType.VARCHAR -> Utf8s.toString(rec.getVarcharA(2));
-            default -> throw new IllegalArgumentException();
+            default -> throw new IllegalArgumentException("Unsupported column type: " + ColumnType.nameOf(colType));
         };

641-645: Compute dupStep once per transaction, not per operation.

Reduces randomness noise and minor CPU churn inside the loop.

-                for (int op = 0; op < size; op++) {
-                    int dupStep = Math.max(2, transaction.operationList.size() / (1 + rnd.nextInt(10)));
+                int dupStep = Math.max(2, size / (1 + rnd.nextInt(10)));
+                for (int op = 0; op < size; op++) {

889-891: Typo: rename variable to commaSeparatedUpsertCols.

Improves readability.

-            String comaSeparatedUpsertCols;
+            String commaSeparatedUpsertCols;

And update its usages accordingly.

core/src/main/java/io/questdb/std/NumericException.java (1)

93-96: Clarify nullability for substring put()

Align with other overloads. If null is not allowed here, annotate parameter as @NotNull to prevent accidental NPEs downstream.

-    public NumericException put(CharSequence cs, int lo, int hi) {
+    public NumericException put(@NotNull CharSequence cs, int lo, int hi) {
         message.put(cs, lo, hi);
         return this;
     }
core/src/main/java/io/questdb/std/Numbers.java (3)

670-687: Simplify floorPow2 via Long.highestOneBit

Same semantics, clearer and branchless. JVM intrinsic on modern HotSpot.

-    public static long floorPow2(long value) {
-        if (value <= 0) {
-            return 0;
-        }
-        long v = value;
-        v |= v >>> 1;
-        v |= v >>> 2;
-        v |= v >>> 4;
-        v |= v >>> 8;
-        v |= v >>> 16;
-        v |= v >>> 32;
-        return v - (v >>> 1);
-    }
+    public static long floorPow2(long value) {
+        return value <= 0 ? 0 : Long.highestOneBit(value);
+    }

946-949: Empty hex substring check should use hi <= lo

The current check hi == 0 fails for non‑zero lo (e.g., lo==5, hi==5). Use hi <= lo to catch all empty slices.

-        if (hi == 0) {
+        if (hi <= lo) {
             throw NumericException.instance().put("empty hex string");
         }

Also applies to: 966-969, 982-985


1498-1525: Error messages in parseLongSize use “duration” instead of “size”

Nit but can mislead operators. Use consistent wording with int variant.

-                                throw NumericException.instance().put("duration overflow");
+                                throw NumericException.instance().put("size overflow");
@@
-                                throw NumericException.instance().put("duration overflow");
+                                throw NumericException.instance().put("size overflow");
@@
-                                throw NumericException.instance().put("duration overflow");
+                                throw NumericException.instance().put("size overflow");
@@
-                throw NumericException.instance().put("invalid duration format");
+                throw NumericException.instance().put("invalid size format");
core/src/test/java/io/questdb/test/std/NumbersTest.java (1)

331-364: Floor/Ceil relationship tests

These lock in the new ceilPow2 saturation at 2^62. Consider adding a small test clarifying behavior for negatives (e.g., value < 1), even if only to assert current behavior.

core/src/main/java/io/questdb/cutlass/text/ParallelCsvFileImporter.java (1)

1187-1204: Fail fast if symbol indices cannot be resolved

Both indices can be -1; add a quick guard to surface configuration mismatches early.

-                int tempTableDenseSymbolIndex = targetTableStructure.getDenseSymbolIndex(symbolColumnName);
-                int tempTableColumnIndex = targetTableStructure.getColumnIndex(symbolColumnName);
+                int tempTableDenseSymbolIndex = targetTableStructure.getDenseSymbolIndex(symbolColumnName);
+                int tempTableColumnIndex = targetTableStructure.getColumnIndex(symbolColumnName);
+                assert tempTableDenseSymbolIndex >= 0 && tempTableColumnIndex >= 0
+                        : "Unresolved symbol column in temp tables: " + symbolColumnName;
core/src/main/java/io/questdb/cutlass/text/CopyTask.java (1)

709-728: Use TableUtils helpers or COLUMN_NAME_TXN_NONE when appending txn suffix

In core/src/main/java/io/questdb/cutlass/text/CopyTask.java (PhaseUpdateSymbolKeys.run, ~lines 720–728) you do if (columnNameTxn != -1) { path.put('.').put(columnNameTxn); }. Use the same convention as TableUtils (either call TableUtils.dFile/iFile or check columnNameTxn > TableUtils.COLUMN_NAME_TXN_NONE) to keep suffix logic consistent and avoid edge-case mismatches.

core/src/test/java/io/questdb/test/cairo/SymbolMapTest.java (1)

106-108: Replace magic -1 with a named constant or helper to document semantics.

Every new SymbolMapWriter(...) call adds NOOP_COLLECTOR and a trailing -1. The sentinel is unclear and repeated across the file. Introduce a self-explanatory constant (e.g., SYMBOL_TABLE_TXN_NONE or COLUMN_INDEX_UNSET), or a local factory method newWriter(path, name, start) that hides these fixed args to improve readability and reduce copy/paste risk. Applies to all occurrences listed.

Also applies to: 128-130, 191-193, 265-267, 304-305, 328-329, 358-360, 395-397, 421-422, 444-445, 467-468, 492-494, 513-514, 536-537, 575-576, 598-599, 622-623, 694-695, 749-758, 820-822, 864-865, 925-926, 945-946, 971-972, 995-996, 1028-1029, 1072-1073, 1123-1124

core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java (1)

302-305: Fix misleading comment in upsertSymbolTableTxnName().

The comment talks about storing the added-partition timestamp in columnTop, but here columnTop is always 0 for the symbol-table partition. Update the comment to reflect actual behavior.

Apply this wording tweak:

-        // When table is partitioned, use columnTop place to store the timestamp of the partition where the column added
+        // Store the column name txn for the symbol-table partition; columnTop is unused here and set to 0.
core/src/main/java/io/questdb/cutlass/line/tcp/SymbolCache.java (2)

121-123: Clear temp sink before UTF‑8→UTF‑16 conversion.

Without clearing, values may concatenate depending on Utf8s.utf8ToUtf16Unchecked behavior.

Apply this diff:

-        Utf8s.utf8ToUtf16Unchecked(value, tempSink);
+        tempSink.clear();
+        Utf8s.utf8ToUtf16Unchecked(value, tempSink);

131-156: Avoid relying on external immutability of pathToTableDir.

The comment admits fragility. Store a defensive copy to prevent accidental mutation by owners.

Apply this diff:

-        this.pathToTableDir = pathToTableDir;
+        // Defensive copy to decouple from owner mutations
+        this.pathToTableDir = Utf8String.newInstance(pathToTableDir);
core/src/main/java/io/questdb/cairo/ColumnVersionReader.java (1)

221-235: API for symbol-table txn with sane fallback; fix Javadoc tag.

Method and fallback to default txn are correct. Minor: use {@link #getDefaultColumnNameTxn} instead of {@see ...} to avoid Javadoc warnings.

core/src/test/java/io/questdb/test/cutlass/text/ParallelCsvFileImporterTest.java (2)

1582-1590: Typo in test name: Typ → Type.

Nit: rename to testImportTimestampTypeFormatMismatch for consistency/searchability.

-    public void testImportTimestampTypFormatMismatch() {
+    public void testImportTimestampTypeFormatMismatch() {

3517-3519: Public nested test interface likely unnecessary.

TextImportRunnable needn’t be public inside this test class; package‑private keeps surface area smaller.

-    public interface TextImportRunnable {
+    interface TextImportRunnable {
core/src/test/java/io/questdb/test/cutlass/line/tcp/SymbolCacheTest.java (4)

148-156: Column index vs symbol index: avoid off‑by‑one with explicit lookup.

You pass col + 1 (table col index accounting for timestamp) and col (symbol index). This is subtle and brittle. Prefer resolving both from metadata to avoid off‑by‑one when schema changes.

-                            symbolCache.of(
-                                    engine.getConfiguration(),
-                                    "col" + col,
-                                    col + 1,
-                                    col,
-                                    path,
-                                    new TestTableWriterAPI(rdr.getColumnVersionReader()),
-                                    txReader
-                            );
+                            final int tableColIndex = rdr.getMetadata().getColumnIndex("col" + col);
+                            final int symbolColOrdinal = txReader.unsafeSymbolColumnIndexOf(tableColIndex); // or compute from metadata if available
+                            symbolCache.of(
+                                    engine.getConfiguration(),
+                                    "col" + col,
+                                    tableColIndex,
+                                    symbolColOrdinal,
+                                    path,
+                                    new TestTableWriterAPI(rdr.getColumnVersionReader()),
+                                    txReader
+                            );

If unsafeSymbolColumnIndexOf(...) isn’t available, compute the ordinal by iterating metadata and counting SYMBOL columns up to tableColIndex.


221-227: PartitionBy mismatch with created tables.

Tables are created with PartitionBy.HOUR but TxReader.ofRO(...) is passed PartitionBy.DAY. Align these for consistency to prevent subtle test flakiness if TxReader starts validating this.

-                                TableUtils.getTimestampType(model),
-                                PartitionBy.DAY
+                                TableUtils.getTimestampType(model),
+                                PartitionBy.HOUR

Also applies to: 424-428


605-613: Recompute indices after column removal.

After writer.removeColumn("symCol1"), using hardcoded 0 assumes reindexing; safer to recompute the index for "symCol2" from metadata.

-                            "symCol2",
-                            0,
-                            0,
+                            "symCol2",
+                            writer.getColumnIndex("symCol2"),
+                            txReader.unsafeSymbolColumnIndexOf(writer.getColumnIndex("symCol2")),

If unsafeSymbolColumnIndexOf is unavailable, derive the ordinal as noted earlier.


230-238: Reduce duplication: helper to init/reinit SymbolCache.

Consider a small helper to DRY repeated cache.of(...) sequences (path building + index resolution + API construction).

+    private static void initCache(SymbolCache cache,
+                                  CairoConfiguration cfg,
+                                  Path path,
+                                  TableToken token,
+                                  String colName,
+                                  int tableColIdx,
+                                  int symOrdinal,
+                                  TableWriterAPI twa,
+                                  TxReader txReader) {
+        cache.of(cfg, colName, tableColIdx, symOrdinal, path.of(cfg.getDbRoot()).concat(token), twa, txReader);
+    }

Also applies to: 252-259, 452-461, 522-526, 605-613, 683-691, 766-774

core/src/test/java/io/questdb/test/AbstractCairoTest.java (1)

335-335: Minor typographical error in comment.

The word "adcanced" should be spelled "advanced" on Line 335.

Apply this diff to correct the typo:

-                    // intentionally calling hasNext() twice: we want to adcanced the cursor position,
+                    // intentionally calling hasNext() twice: we want to advanced the cursor position,
core/src/main/java/io/questdb/cairo/TableUtils.java (1)

123-123: Lowered MIN_INDEX_VALUE_BLOCK_SIZE: align validation and avoid magic number

Good change; to keep invariants consistent, replace the hardcoded check "index value block capacity < 2" in validateMeta() with MIN_INDEX_VALUE_BLOCK_SIZE.

-                    if (getIndexBlockCapacity(metaMem, i) < 2) {
+                    if (getIndexBlockCapacity(metaMem, i) < MIN_INDEX_VALUE_BLOCK_SIZE) {
core/src/main/java/io/questdb/cairo/MetadataCache.java (1)

303-318: Avoid nested try-with-resources on the same ColumnVersionReader

Currently you open the same AutoCloseable twice; simplify to a single TWR.

-        try (columnVersionReader) {
-            LOG.debug().$("hydrating symbol metadata [table=").$(token).$(", column=").$safe(columnName).I$();
-            // get column version
-            path.trimTo(configuration.getDbRoot().length()).concat(token);
-            final int rootLen = path.size();
-            path.concat(TableUtils.COLUMN_VERSION_FILE_NAME);
-            final long columnNameTxn;
-            final FilesFacade ff = configuration.getFilesFacade();
-            try (columnVersionReader) {
-                columnVersionReader.ofRO(ff, path.$());
-                columnVersionReader.readUnsafe();
-                columnNameTxn = columnVersionReader.getSymbolTableNameTxn(writerIndex);
-            }
+        try (columnVersionReader) {
+            LOG.debug().$("hydrating symbol metadata [table=").$(token).$(", column=").$safe(columnName).I$();
+            path.trimTo(configuration.getDbRoot().length()).concat(token);
+            final int rootLen = path.size();
+            path.concat(TableUtils.COLUMN_VERSION_FILE_NAME);
+            final FilesFacade ff = configuration.getFilesFacade();
+            columnVersionReader.ofRO(ff, path.$());
+            columnVersionReader.readUnsafe();
+            final long columnNameTxn = columnVersionReader.getSymbolTableNameTxn(writerIndex);
core/src/main/java/io/questdb/cairo/TableWriter.java (3)

5484-5553: Don’t hard-link .k/.v in table root for symbol table files

At table root, symbol tables consist of .c/.o; bitmap index .k/.v live per-partition. Attempting to link .k/.v at root is unnecessary and may be confusing. Simplify to .c/.o only.

@@
-                    linkFile(
-                            ff,
-                            keyFileName(path.trimTo(pathSize), columnName, symbolTableNameTxn),
-                            keyFileName(other.trimTo(pathSize), newName, newColumnNameTxn)
-                    );
-                    linkFile(
-                            ff,
-                            valueFileName(path.trimTo(pathSize), columnName, symbolTableNameTxn),
-                            valueFileName(other.trimTo(pathSize), newName, newColumnNameTxn)
-                    );

5571-5615: Capacity rebuild path: copy .o, link .c — good; add small robustness

This path correctly copies offsets (.o) to update header while preserving .c. Recommend fsync root dir after linking/copying (non-Windows) to reduce risk on crash between file ops, mirroring patterns elsewhere.


3591-3619: Tighten attach pinning to known column file types

attachPartitionPinColumnVersions() currently trusts any “name.ext.txn” pattern. Guard by checking ext ∈ {d,i,k,v,c,o} to avoid pinning on unrelated files that match the pattern.

@@
-            if (firstDot != -1) {
+            if (firstDot != -1) {
                 utf16Sink.clear();
                 utf16Sink.put(tmpDirectUtf8StringZ, 0, firstDot);
                 // all our column files have .d (or such) extensions
                 int columnIndex = metadata.getColumnIndexQuiet(utf16Sink);
                 if (columnIndex != -1) {
                     // not a random file, we have column by this name
                     int lastDot = Utf8s.lastIndexOfAscii(tmpDirectUtf8StringZ, '.');
                     int len = Utf8s.length(tmpDirectUtf8StringZ);
                     if (lastDot > 1 && lastDot < len - 1) {
+                        // ensure known column file extension
+                        int extStart = firstDot + 1;
+                        int extEnd = lastDot;
+                        if (!Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "d")
+                                && !Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "i")
+                                && !Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "k")
+                                && !Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "v")
+                                && !Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "c")
+                                && !Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "o")) {
+                            return;
+                        }
                         // we are rejecting 'abc', '.abc' and 'abc.', but accepting 'a.xxx'
                         try {
                             long nameTxn = Numbers.parseLong(tmpDirectUtf8StringZ, lastDot + 1, len);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38ffea4 and dd0483c.

📒 Files selected for processing (41)
  • .gitignore (2 hunks)
  • core/pom.xml (1 hunks)
  • core/src/main/java/io/questdb/cairo/ColumnVersionReader.java (3 hunks)
  • core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java (2 hunks)
  • core/src/main/java/io/questdb/cairo/DatabaseCheckpointAgent.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/MapWriter.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/MetadataCache.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/SymbolMapReaderImpl.java (3 hunks)
  • core/src/main/java/io/questdb/cairo/SymbolMapWriter.java (9 hunks)
  • core/src/main/java/io/questdb/cairo/TableReader.java (3 hunks)
  • core/src/main/java/io/questdb/cairo/TableUtils.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/TableWriter.java (23 hunks)
  • core/src/main/java/io/questdb/cairo/TableWriterAPI.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/vm/MemoryCMARWImpl.java (3 hunks)
  • core/src/main/java/io/questdb/cairo/vm/NullMapWriter.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/vm/Vm.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/wal/WalWriter.java (2 hunks)
  • core/src/main/java/io/questdb/cutlass/http/processors/LineHttpProcessorState.java (1 hunks)
  • core/src/main/java/io/questdb/cutlass/line/tcp/SymbolCache.java (4 hunks)
  • core/src/main/java/io/questdb/cutlass/line/tcp/TableUpdateDetails.java (1 hunks)
  • core/src/main/java/io/questdb/cutlass/text/CopyTask.java (17 hunks)
  • core/src/main/java/io/questdb/cutlass/text/ParallelCsvFileImporter.java (6 hunks)
  • core/src/main/java/io/questdb/griffin/ConvertOperatorImpl.java (1 hunks)
  • core/src/main/java/io/questdb/std/Numbers.java (53 hunks)
  • core/src/main/java/io/questdb/std/NumericException.java (1 hunks)
  • core/src/test/java/io/questdb/test/AbstractCairoTest.java (8 hunks)
  • core/src/test/java/io/questdb/test/cairo/ColumnVersionWriterTest.java (3 hunks)
  • core/src/test/java/io/questdb/test/cairo/FullFwdPartitionFrameCursorTest.java (2 hunks)
  • core/src/test/java/io/questdb/test/cairo/SymbolMapTest.java (27 hunks)
  • core/src/test/java/io/questdb/test/cairo/TableNameRegistryTest.java (4 hunks)
  • core/src/test/java/io/questdb/test/cairo/TableWriterTest.java (25 hunks)
  • core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (5 hunks)
  • core/src/test/java/io/questdb/test/cairo/mv/MatViewIdenticalReplaceTest.java (5 hunks)
  • core/src/test/java/io/questdb/test/cairo/wal/DedupWalWriterTest.java (3 hunks)
  • core/src/test/java/io/questdb/test/cutlass/line/tcp/LineTcpSenderTest.java (3 hunks)
  • core/src/test/java/io/questdb/test/cutlass/line/tcp/SymbolCacheTest.java (13 hunks)
  • core/src/test/java/io/questdb/test/cutlass/text/ParallelCsvFileImporterTest.java (5 hunks)
  • core/src/test/java/io/questdb/test/griffin/CopyTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/griffin/SqlParserTest.java (3 hunks)
  • core/src/test/java/io/questdb/test/std/NumbersTest.java (47 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (Hosted Running tests on windows-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
  • GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
  • GitHub Check: New pull request (Check Changes Check changes)

@bluestreak01 bluestreak01 changed the title feat(core): scale symbol capacity automatically perf(core): scale symbol capacity automatically Sep 23, 2025
@bluestreak01
Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 1, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (1)

907-913: Extra closing parenthesis in ALTER statement causes SQL syntax error.

There’s an extra “)” at the end of the format string.

Apply:

-String alterStatement = String.format(
-        "alter table %s dedup upsert keys(%s))",
+String alterStatement = String.format(
+        "alter table %s dedup upsert keys(%s)",
         tableNameDedup,
         comaSeparatedUpsertCols
 );
core/src/main/java/io/questdb/cairo/SymbolMapUtil.java (1)

141-163: Handle negative length on char file before mmap

  • charFileLen = ff.length(...) may return –1 on error but only == 0 is checked, so open can receive –1 and trigger an assertion. Change if (charFileLen == 0) to if (charFileLen <= 0) (or explicitly throw on < 0) to cover negative lengths. [SymbolMapUtil.java:98-103]
♻️ Duplicate comments (1)
core/src/main/java/io/questdb/cutlass/line/tcp/SymbolCache.java (1)

156-161: Pass the copied path field, not the external parameter, to avoid mutable-path aliasing

You copy the table dir into this.pathToTableDir, but the initial open still uses the external pathToTableDir parameter. If that object is mutable, the reader may retain a reference to a path that later changes. Use the copied field consistently.

Apply this diff:

-        symbolMapReader.of(configuration, pathToTableDir, columnName, symbolTableNameTxn, symCount);
+        symbolMapReader.of(configuration, this.pathToTableDir, columnName, symbolTableNameTxn, symCount);
🧹 Nitpick comments (5)
core/pom.xml (1)

618-627: Rename the placeholder strings for clarity.
Both excludePattern1 values still end with java11plus, which now mismatches the profile ID (java17+). Consider renaming them to avoid future confusion.

core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (2)

765-770: Switch expression looks good; avoid magic column index 2.

Hard-coding column index 2 is brittle. Consider a named constant or look up the index by column name once.

Example:

-    return switch (colType) {
-        case ColumnType.SYMBOL -> rec.getSymA(2);
-        case ColumnType.STRING -> rec.getStrA(2);
-        case ColumnType.VARCHAR -> Utf8s.toString(rec.getVarcharA(2));
-        default -> throw new IllegalArgumentException();
-    };
+    final int KEY_COL = 2; // 's' column
+    return switch (colType) {
+        case ColumnType.SYMBOL -> rec.getSymA(KEY_COL);
+        case ColumnType.STRING -> rec.getStrA(KEY_COL);
+        case ColumnType.VARCHAR -> Utf8s.toString(rec.getVarcharA(KEY_COL));
+        default -> throw new IllegalArgumentException("Unexpected colType=" + colType);
+    };

779-779: Five-boolean call to readTxnToString is error-prone; add intent.

The expanded signature increases ordering mistakes risk. Add named locals or inline comments to document each flag, or wrap in a helper.

Example:

-String partitions = readTxnToString(tt, false, true, true, false);
+// booleans: <arg1>, <arg2>, <arg3>, <arg4>, <arg5>
+String partitions = readTxnToString(tt, /*arg1=*/false, /*arg2=*/true, /*arg3=*/true, /*arg4=*/false);

Please confirm the flag order matches the new API.

Also applies to: 805-805

core/src/main/java/io/questdb/cairo/TableReader.java (1)

1709-1719: Provide actual symbolCount on renew and update when reopen isn’t needed

Passing 0 defers count refresh until a later pass. Use current txFile value and update when no reopen is required. This avoids transient undercount and extra work.

     private void renewSymbolMapReader(SymbolMapReader reader, int columnIndex) {
         if (ColumnType.isSymbol(metadata.getColumnType(columnIndex))) {
             final int writerColumnIndex = metadata.getWriterIndex(columnIndex);
             final long symbolTableNameTxn = columnVersionReader.getSymbolTableNameTxn(writerColumnIndex);
             String columnName = metadata.getColumnName(columnIndex);
-            if (!(reader instanceof SymbolMapReaderImpl symbolMapReader)) {
-                reader = new SymbolMapReaderImpl(configuration, path, columnName, symbolTableNameTxn, 0);
+            final int denseIdx = metadata.getDenseSymbolIndex(columnIndex);
+            final int symbolCount = txFile.getSymbolValueCount(denseIdx);
+            if (!(reader instanceof SymbolMapReaderImpl symbolMapReader)) {
+                reader = new SymbolMapReaderImpl(configuration, path, columnName, symbolTableNameTxn, symbolCount);
             } else {
                 // Fully reopen the symbol map reader only when necessary
-                if (symbolMapReader.needsReopen(symbolTableNameTxn)) {
-                    symbolMapReader.of(configuration, path, columnName, symbolTableNameTxn, 0);
+                if (symbolMapReader.needsReopen(symbolTableNameTxn)) {
+                    symbolMapReader.of(configuration, path, columnName, symbolTableNameTxn, symbolCount);
+                } else {
+                    symbolMapReader.updateSymbolCount(symbolCount);
                 }
             }
         } else {
core/src/main/java/io/questdb/cutlass/line/tcp/TableUpdateDetails.java (1)

616-632: Avoid stateful symbolNameTemp; pass the name locally for clarity

resolveSymbolIndexAndName(...) sets and relies on a mutable field (symbolNameTemp). Make it return the name alongside the index (or derive the name locally) to reduce hidden state and cross-call coupling.

Minimal change (derive name locally in addSymbolCache):

-                final int symIndex = resolveSymbolIndexAndName(reader.getMetadata(), colWriterIndex);
-                if (symbolNameTemp == null || symIndex < 0) {
+                final int symIndex = resolveSymbolIndexAndName(reader.getMetadata(), colWriterIndex);
+                final String symbolName =
+                        symIndex < 0 ? null : reader.getMetadata().getColumnName(reader.getMetadata().getColumnIndexQuiet(symbolNameTemp));
+                if (symbolName == null || symIndex < 0) {
                     return NOT_FOUND_LOOKUP;
                 }
...
-                symCache.of(
+                symCache.of(
                         cairoConfiguration,
-                        symbolNameTemp,
+                        symbolName,
                         colWriterIndex,
                         symIndex,
                         path,
                         writerAPI,
                         txReader,
                         columnVersionReader
                 );

Alternatively, refactor resolveSymbolIndexAndName to return a small record (index, name) and drop the field entirely.

Also applies to: 531-541

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd0483c and 3bc0123.

📒 Files selected for processing (38)
  • .gitignore (2 hunks)
  • core/pom.xml (4 hunks)
  • core/src/main/java/io/questdb/PropServerConfiguration.java (8 hunks)
  • core/src/main/java/io/questdb/PropertyKey.java (2 hunks)
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java (2 hunks)
  • core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (2 hunks)
  • core/src/main/java/io/questdb/cairo/ColumnVersionReader.java (5 hunks)
  • core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java (2 hunks)
  • core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (2 hunks)
  • core/src/main/java/io/questdb/cairo/IDGeneratorFactory.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/MetadataCache.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/RebuildColumnBase.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/SymbolMapReaderImpl.java (5 hunks)
  • core/src/main/java/io/questdb/cairo/SymbolMapUtil.java (5 hunks)
  • core/src/main/java/io/questdb/cairo/SymbolMapWriter.java (8 hunks)
  • core/src/main/java/io/questdb/cairo/TableReader.java (3 hunks)
  • core/src/main/java/io/questdb/cairo/TableUtils.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/TableWriter.java (25 hunks)
  • core/src/main/java/io/questdb/cairo/vm/Vm.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/wal/WalWriter.java (2 hunks)
  • core/src/main/java/io/questdb/cutlass/http/processors/LineHttpProcessorState.java (1 hunks)
  • core/src/main/java/io/questdb/cutlass/line/AbstractLineSender.java (2 hunks)
  • core/src/main/java/io/questdb/cutlass/line/tcp/SymbolCache.java (6 hunks)
  • core/src/main/java/io/questdb/cutlass/line/tcp/TableUpdateDetails.java (5 hunks)
  • core/src/main/java/io/questdb/cutlass/text/ParallelCsvFileImporter.java (6 hunks)
  • core/src/main/java/io/questdb/griffin/ConvertOperatorImpl.java (1 hunks)
  • core/src/main/java/io/questdb/std/Numbers.java (53 hunks)
  • core/src/main/resources/io/questdb/site/conf/server.conf (2 hunks)
  • core/src/test/java/io/questdb/test/AbstractCairoTest.java (8 hunks)
  • core/src/test/java/io/questdb/test/PropServerConfigurationTest.java (3 hunks)
  • core/src/test/java/io/questdb/test/ServerMainTest.java (3 hunks)
  • core/src/test/java/io/questdb/test/cairo/FullFwdPartitionFrameCursorTest.java (5 hunks)
  • core/src/test/java/io/questdb/test/cairo/TableNameRegistryTest.java (4 hunks)
  • core/src/test/java/io/questdb/test/cairo/TableWriterTest.java (13 hunks)
  • core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (6 hunks)
  • core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/cairo/mv/MatViewIdenticalReplaceTest.java (6 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🚧 Files skipped from review as they are similar to previous changes (7)
  • core/src/main/java/io/questdb/cutlass/text/ParallelCsvFileImporter.java
  • core/src/main/java/io/questdb/cairo/TableUtils.java
  • core/src/test/java/io/questdb/test/cairo/mv/MatViewIdenticalReplaceTest.java
  • core/src/main/java/io/questdb/cutlass/http/processors/LineHttpProcessorState.java
  • core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java
  • core/src/test/java/io/questdb/test/AbstractCairoTest.java
  • core/src/main/java/io/questdb/cairo/vm/Vm.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: New pull request (Check Changes Check changes)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (36)
core/src/test/java/io/questdb/test/ServerMainTest.java (3)

36-36: LGTM! Import added for platform-specific page size.

The import is necessary to reference Files.PAGE_SIZE in the test expectations below, ensuring the test correctly validates the platform-specific minimum allocation page size.


361-362: LGTM! Symbol table configuration updated to min/max allocation pattern.

The replacement of the single append.page.size parameter with separate min/max allocation page sizes aligns with the PR's objective to improve symbol table memory management. Using Files.PAGE_SIZE for the minimum ensures platform compatibility.


692-694: LGTM! Auto-scaling parameters added with safe defaults.

The new auto-scaling configuration parameters align with the PR objectives:

  • cairo.auto.scale.symbol.capacity defaults to false (safe opt-in approach given rollback compatibility concerns)
  • cairo.auto.scale.symbol.capacity.threshold set to 0.8 matches the 80% threshold described in the PR
core/src/test/java/io/questdb/test/cairo/TableNameRegistryTest.java (4)

137-138: LGTM! Non-fatal error handling extended appropriately.

Adding "could not remove table" as a non-fatal error condition is appropriate for concurrent drop operations, especially with the symbol capacity scaling changes that may affect file operations.


189-194: LGTM! Consistent error handling across concurrent tests.

The same non-fatal error handling logic is correctly applied here with improved readability through multi-line formatting.


280-284: LGTM! Formatting improved.

Multi-line constructor call improves readability without changing functionality.


758-764: LGTM! Text block adoption improves test readability.

Switching to Java text blocks for multi-line SQL assertions is a clean improvement that aligns with the Java 17 update mentioned in the PR.

core/src/main/java/io/questdb/cairo/IDGeneratorFactory.java (1)

51-53: Good addition of volatile for thread-safe address visibility.

The volatile modifier correctly ensures that the memory-mapped address is visible across threads after initialization in open() or reset in close(). This prevents threads from seeing stale addresses.

Minor clarity note: The comment states "current ID is consumed by other threads," but the field stores the memory address where the ID resides, not the ID value itself. Consider clarifying: "the memory address is accessed by other threads, and we need to ensure the address is visible after mapping."

core/pom.xml (2)

35-35: Java 17 property bump looks good.
Aligning javac.target with the project-wide JDK 17 move keeps compilation flags consistent.


82-82: Compiler plugin upgrade approved.
maven-compiler-plugin 3.11.0 is required to compile with JDK 17, so this version bump is the right call.

core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (3)

644-662: Good adoption of instanceof pattern matching (Java 17).

Cleaner and safe cast; no issues.


814-817: Use of stream().distinct().toList() is fine on JDK 17.

The resulting list is unmodifiable; you don’t mutate it later, so this is correct.


1186-1188: Variable rename + bounds check logic LGTM.

Condition is correct and guards modulo; no regressions expected.

core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java (1)

123-127: Text block keeps the DDL unchanged.

The text block form matches the previous SQL content while improving readability. All good.

core/src/main/java/io/questdb/cairo/RebuildColumnBase.java (1)

126-126: Accessor switch aligns with the new API.

Using tableWriter.columnVersionReader() matches the updated TableWriter surface; behavior stays the same.

core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1)

2576-2576: Consistent use of the new columnVersionReader accessor.

The swap keeps the parquet index update logic intact while matching TableWriter’s updated API.

core/src/test/java/io/questdb/test/PropServerConfigurationTest.java (1)

428-428: LGTM! API migration correctly reflected.

The test correctly migrates from the removed getSymbolTableAppendPageSize() to the new getSymbolTableMinAllocationPageSize() API, consistent with the PR's shift to min/max allocation page sizing for symbol tables.

core/src/main/java/io/questdb/cairo/MetadataCache.java (1)

311-323: LGTM! Symbol table versioning correctly applied.

The change from getDefaultColumnNameTxn() to getSymbolTableNameTxn() correctly reflects the PR's introduction of separate symbol-table versioning. This ensures that symbol map files are located using the symbol-table-specific transaction ID rather than the default column transaction ID, which is consistent with the broader symbol capacity auto-scaling feature.

core/src/main/java/io/questdb/cairo/SymbolMapUtil.java (4)

32-36: LGTM! Improved import clarity.

Replacing wildcard imports with explicit imports improves code readability and reduces the risk of naming conflicts.


103-103: LGTM! Correct parameter passing.

The call correctly passes path.$() to match the LPSZ parameter type expected by the updated open() signature.


118-118: LGTM! More idiomatic empty check.

Using isEmpty() is more idiomatic and readable than length() == 0.


201-203: LGTM! Well-bounded memory allocation.

The calculateExtendSegmentSize() helper computes a power-of-two size bounded by min/max configuration values, which should provide optimal memory allocation for symbol table growth while respecting system limits.

core/src/main/resources/io/questdb/site/conf/server.conf (2)

493-505: LGTM! Auto-scaling properly documented and opt-in.

The new configuration properly documents the symbol capacity auto-scaling feature. The default of false makes this opt-in, which is appropriate for a new feature that changes file naming conventions (per the PR description's rollback warning). The 0.8 threshold is a sensible default for triggering capacity increases.


644-648: LGTM! Appropriate allocation bounds.

The replacement of the single append.page.size with separate min.allocation.page.size (4k) and max.allocation.page.size (8M) provides reasonable bounds for dynamic symbol table sizing. The 4k minimum aligns with typical page sizes, and the 8M maximum prevents excessive memory allocation.

core/src/main/java/io/questdb/cairo/wal/WalWriter.java (2)

388-388: LGTM: ColumnVersionReader lifecycle correctly managed.

The columnVersionReader is now properly freed in doClose, ensuring it remains available for the WalWriter's lifetime. This addresses the use-after-close risk flagged in previous reviews.


1247-1248: LGTM: Correct usage of symbol table name transaction.

The switch from getDefaultColumnNameTxn to getSymbolTableNameTxn aligns with the PR's introduction of symbol-table versioning. The variable naming (symbolTableNameTxn) clearly indicates this is the symbol table's name transaction, and it's correctly passed to configureSymbolMapWriter.

core/src/main/java/io/questdb/cairo/CairoConfiguration.java (2)

69-88: LGTM: Well-documented auto-scaling configuration API.

The new methods provide clear control over symbol capacity auto-scaling behavior:

  • autoScaleSymbolCapacity() acts as a feature flag
  • autoScaleSymbolCapacityThreshold() provides fine-grained control over when resizing occurs

Documentation clearly explains the percentage-based threshold and the rationale for having a disable option.


606-608: LGTM: Symbol table sizing API split improves flexibility.

Replacing the single getSymbolTableAppendPageSize() with separate min/max allocation page sizes allows better control over memory allocation behavior for symbol tables. This aligns with the auto-scaling feature where initial allocation can start small and grow as needed.

core/src/main/java/io/questdb/griffin/ConvertOperatorImpl.java (1)

208-210: LGTM: Correct symbol table name transaction usage.

The variable existingSymbolTableNameTxn is appropriately named and correctly retrieves the symbol table's name transaction via getSymbolTableNameTxn(). This is then properly passed to the SymbolMapReader constructor, aligning with the PR's symbol table versioning approach.

core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (2)

69-77: LGTM: Correct delegation for auto-scaling configuration.

The wrapper correctly delegates both autoScaleSymbolCapacity() and autoScaleSymbolCapacityThreshold() to the underlying configuration, maintaining consistency with the wrapper pattern used throughout this file.


1045-1052: LGTM: Correct delegation for symbol table sizing API.

Both getSymbolTableMaxAllocationPageSize() and getSymbolTableMinAllocationPageSize() are properly delegated to the underlying configuration, following the established wrapper pattern.

core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (2)

99-107: LGTM: Sensible defaults for auto-scaling.

Auto-scaling is enabled by default (true) with an 80% threshold, which provides a good balance between minimizing resize operations and avoiding excessive memory waste. This aligns with the PR's goal of automatic capacity management.


1070-1077: LGTM: Appropriate symbol table allocation page sizes.

The min/max allocation page sizes are well-chosen:

  • Min (Files.PAGE_SIZE): Platform-specific, minimal allocation - good starting point for small symbol tables
  • Max (8 MB): Reasonable upper bound that prevents excessive single allocations while allowing growth

These defaults support the auto-scaling feature by enabling efficient growth from small to large symbol tables.

core/src/main/java/io/questdb/cairo/TableWriter.java (3)

1250-1253: API exposure looks good
Returning ColumnVersionReader view over the writer is a sensible narrowing without extra allocations.


4281-4298: Symbol writer creation changes LGTM
Passing symbolIndexInTxWriter, txWriter, and columnIndex improves consistency with versioned filenames and value-count tracking.

Please confirm that txWriter.getSymbolValueCount(symbolIndexInTxWriter) is stable for the dense writer ordering at this point in initialization.


3530-3614: Attach validation for SYMBOL columns is solid
Bounds-checking symbol keys and validating index files before attach reduces corruption risk.

@puzpuzpuz puzpuzpuz self-requested a review October 2, 2025 11:31
@puzpuzpuz puzpuzpuz added Enhancement Enhance existing functionality Core Related to storage, data type, etc. labels Oct 2, 2025
@bluestreak01
Copy link
Copy Markdown
Member Author

removed few legacy tests, which test that TableWriter will recover from OS failures. These tests do not go thru WAL path and test limited functionality that remains intact after failures. The more robust method to recover from errors is to distress the writer and create a new one from disk state.

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 401 / 451 (88.91%)

file detail

path covered line new line coverage
🔵 io/questdb/cairo/vm/NullMapWriter.java 0 1 00.00%
🔵 io/questdb/cutlass/line/AbstractLineSender.java 3 7 42.86%
🔵 io/questdb/cairo/vm/MemoryCMARWImpl.java 4 5 80.00%
🔵 io/questdb/std/Numbers.java 117 145 80.69%
🔵 io/questdb/cutlass/text/ParallelCsvFileImporter.java 16 18 88.89%
🔵 io/questdb/cairo/TableWriter.java 89 98 90.82%
🔵 io/questdb/cutlass/line/tcp/SymbolCache.java 22 24 91.67%
🔵 io/questdb/PropServerConfiguration.java 17 18 94.44%
🔵 io/questdb/cairo/ColumnVersionReader.java 29 30 96.67%
🔵 io/questdb/cutlass/text/CopyTask.java 31 32 96.88%
🔵 io/questdb/cairo/MetadataCache.java 2 2 100.00%
🔵 io/questdb/cairo/DatabaseCheckpointAgent.java 1 1 100.00%
🔵 io/questdb/cairo/TableReader.java 8 8 100.00%
🔵 io/questdb/cutlass/http/processors/LineHttpProcessorState.java 1 1 100.00%
🔵 io/questdb/cairo/wal/WalWriter.java 3 3 100.00%
🔵 io/questdb/cutlass/line/tcp/TableUpdateDetails.java 4 4 100.00%
🔵 io/questdb/cairo/vm/Vm.java 1 1 100.00%
🔵 io/questdb/cairo/TableUtils.java 1 1 100.00%
🔵 io/questdb/cairo/SymbolMapReaderImpl.java 8 8 100.00%
🔵 io/questdb/cairo/DefaultCairoConfiguration.java 4 4 100.00%
🔵 io/questdb/cairo/O3PartitionJob.java 1 1 100.00%
🔵 io/questdb/std/NumericException.java 2 2 100.00%
🔵 io/questdb/cairo/SymbolMapWriter.java 16 16 100.00%
🔵 io/questdb/PropertyKey.java 4 4 100.00%
🔵 io/questdb/griffin/ConvertOperatorImpl.java 2 2 100.00%
🔵 io/questdb/cairo/SymbolMapUtil.java 5 5 100.00%
🔵 io/questdb/cairo/CairoConfigurationWrapper.java 4 4 100.00%
🔵 io/questdb/cairo/ColumnVersionWriter.java 4 4 100.00%
🔵 io/questdb/cairo/RebuildColumnBase.java 1 1 100.00%
🔵 io/questdb/cairo/IDGeneratorFactory.java 1 1 100.00%

@ideoma ideoma merged commit d05da25 into master Oct 2, 2025
35 checks passed
@ideoma ideoma deleted the vi_scale_sym_cap branch October 2, 2025 14:22
@tris0laris tris0laris moved this from Imminent Release to Shipped in QuestDB Public Roadmap (Legacy) Oct 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Core Related to storage, data type, etc. Enhancement Enhance existing functionality

Projects

Development

Successfully merging this pull request may close these issues.

5 participants