perf(core): scale symbol capacity automatically#6149
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe PR introduces symbol-table version partitioning and switches symbol metadata lookups to “symbol table name txn”. It adds symbol capacity auto-scaling and replaces symbol table append page size with min/max allocation settings. Multiple APIs are adjusted accordingly across readers/writers, CSV import, line TCP symbol cache, and tests. Java 17 is enabled. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant TW as TableWriter
participant CFG as CairoConfiguration
participant SMW as SymbolMapWriter
participant ST as SymbolTable Files
Note over TW,CFG: Housekeeping: symbol capacity auto-scaling
TW->>CFG: autoScaleSymbolCapacity(), autoScaleSymbolCapacityThreshold()
alt Auto-scaling enabled and occupancy >= threshold
TW->>SMW: getSymbolCount(), getCapacity()
TW->>SMW: resizeCapacity(2x)
SMW->>ST: extend files (min/max alloc sizing)
else No action
Note over TW: Keep current capacities
end
sequenceDiagram
autonumber
participant Writer as TableWriter/WalWriter
participant CVR as ColumnVersionReader
participant SCR as SymbolCache (Line TCP)
participant SMR as SymbolMapReaderImpl
Note over Writer,CVR: Symbol-table name txn partition
Writer->>CVR: getSymbolTableNameTxn(colIdx)
Writer->>SMR: open(pathToTableDir, columnName, symbolTableNameTxn)
Note over SCR,CVR: Reload decision on symbol work
SCR->>CVR: getSymbolTableNameTxn(colIdx)
alt txn changed
SCR->>SMR: reopen with new txn
else same txn
SCR->>SMR: updateSymbolCount()
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
1 similar comment
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
core/src/test/java/io/questdb/test/cairo/TableNameRegistryTest.java (1)
287-289: WalPurgeJob thread stops early; keep it alive untildone.As written, the loop exits on first empty run even if
doneis false, reducing concurrent stress and risking missed purges during the test window.Apply this diff:
- //noinspection StatementWithEmptyBody - while (!done.get() && job.run(0)) { - // run until empty - } + while (!done.get()) { + // drain until empty + while (job.run(0)) { + // keep draining + } + Os.pause(); + }core/src/test/java/io/questdb/test/griffin/SqlParserTest.java (1)
536-561: Test no longer validates time units; loop now pointless.
testACRangeFrameAcceptsTimeUnitsiterates overunitsAndValuesbut the input SQL strings (2nd arg toassertQuery) no longer include the unit, so we don't actually exercise unit parsing. The.replace("#unit", expectedUnit)is applied to the expected string (which doesn't contain#unit) and is a no‑op. Net effect: coverage regression; the test would pass even if time units were rejected.Fix by moving
#unitinto the input SQL and dropping the redundant.replaceon the expected string.Apply this diff:
- String expectedUnit = unitsAndValues[i].replaceAll("s$", ""); - assertQuery( - ("select-window a, b, f(c) f over (partition by b order by ts range between 10 preceding and current row exclude no others) " + - "from (select-choose [a, b, c, ts] a, b, c, ts from (select [a, b, c, ts] from xyz timestamp (ts)))") - .replace("#unit", expectedUnit), - "select a,b, f(c) over (partition by b order by ts range 10 preceding) from xyz", + String expectedUnit = unitsAndValues[i].replaceAll("s$", ""); + assertQuery( + "select-window a, b, f(c) f over (partition by b order by ts range between 10 preceding and current row exclude no others) " + + "from (select-choose [a, b, c, ts] a, b, c, ts from (select [a, b, c, ts] from xyz timestamp (ts)))", + ("select a,b, f(c) over (partition by b order by ts range 10 #unit preceding) from xyz") + .replace("#unit", expectedUnit), modelOf("xyz") .col("a", ColumnType.INT) .col("b", ColumnType.INT) .col("c", ColumnType.INT) .timestamp("ts") ); - assertQuery( - ("select-window a, b, f(c) f over (partition by b order by ts range between 10 preceding and 1 following exclude no others) " + - "from (select-choose [a, b, c, ts] a, b, c, ts from (select [a, b, c, ts] from xyz timestamp (ts)))") - .replace("#unit", expectedUnit), - "select a,b, f(c) over (partition by b order by ts range between 10 preceding and 1 following) from xyz", + assertQuery( + "select-window a, b, f(c) f over (partition by b order by ts range between 10 preceding and 1 following exclude no others) " + + "from (select-choose [a, b, c, ts] a, b, c, ts from (select [a, b, c, ts] from xyz timestamp (ts)))", + ("select a,b, f(c) over (partition by b order by ts range between 10 #unit preceding and 1 #unit following) from xyz") + .replace("#unit", expectedUnit), modelOf("xyz") .col("a", ColumnType.INT) .col("b", ColumnType.INT) .col("c", ColumnType.INT) .timestamp("ts") );core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (2)
794-799: Use timestampColumnName in both bounds (not hard-coded "ts").Prevents breakage if timestamp column name differs.
- String insertSql = "insert into " + tableNameDedup + - " select * from " + tableNameDedup + - " where " + timestampColumnName + " >= " + fromTs + " and ts < " + toTs - + " LIMIT 100000"; + String insertSql = "insert into " + tableNameDedup + + " select * from " + tableNameDedup + + " where " + timestampColumnName + " >= " + fromTs + " and " + timestampColumnName + " < " + toTs + + " LIMIT 100000";
908-913: Syntax error: extra ')' in ALTER TABLE statement.The generated SQL has an unmatched closing parenthesis.
- String alterStatement = String.format( - "alter table %s dedup upsert keys(%s))", + String alterStatement = String.format( + "alter table %s dedup upsert keys(%s)", tableNameDedup, comaSeparatedUpsertCols );core/src/main/java/io/questdb/cairo/TableWriter.java (1)
1152-1167: Use the updated symbol-table txn when rebuilding capacityAfter hardLinkAndPurgeSymbolTableFiles() you upsert the symbol-table name txn. Rebuilding with a stale txn risks writing to/reading from the wrong file set if txn changes across operations. Use the current symbol-table txn from ColumnVersionWriter.
@@ - oldSymbolWriter.rebuildCapacity( - configuration, - path, - columnName, - columnNameTxn, - newSymbolCapacity, - symbolCacheFlag - ); + long newSymTableTxn = columnVersionWriter.getSymbolTableNameTxn(columnIndex); + oldSymbolWriter.rebuildCapacity( + configuration, + path, + columnName, + newSymTableTxn, + newSymbolCapacity, + symbolCacheFlag + );Also applies to: 1182-1185
🧹 Nitpick comments (37)
.gitignore (1)
38-39: Reconsider ignoring Cargo.lock; keep for reproducible Rust builds.For Rust binaries/tools, committing Cargo.lock is recommended to ensure deterministic builds. If parquet-tools is a binary crate (likely), remove the ignore; if it’s a library, keeping it ignored is fine—please confirm intent. Also, align trailing slash style with nearby entries for consistency.
Apply this diff if you decide to keep the lockfile and normalize the path style:
-core/rust/qdbr/parquet2/parquet-tools/target/ -core/rust/qdbr/parquet2/parquet-tools/Cargo.lock +core/rust/qdbr/parquet2/parquet-tools/targetcore/src/test/java/io/questdb/test/griffin/CopyTest.java (1)
1167-1168: Make result ordering explicit to avoid future flakiness.Relying on implicit row order + negative LIMIT can become brittle if planner/storage details change. Consider ordering by the timestamp.
Apply if it matches current semantics:
- "select line,ts,d,description from x limit -10" + "select line,ts,d,description from x order by ts limit -10"Please run the test suite to verify negative LIMIT behavior with ORDER BY remains consistent in your QuestDB version.
core/src/test/java/io/questdb/test/cairo/TableNameRegistryTest.java (2)
135-139: Also tolerate "could not remove table" in testConcurrentCreateDrop to avoid flakiness.The same transient condition can surface here during concurrent drops.
Apply this diff:
- if (!Chars.contains(e.getFlyweightMessage(), "table does not exist") - && !Chars.contains(e.getFlyweightMessage(), "could not lock") - && !Chars.contains(e.getFlyweightMessage(), "table name is reserved")) { + if (!Chars.contains(e.getFlyweightMessage(), "table does not exist") + && !Chars.contains(e.getFlyweightMessage(), "could not lock") + && !Chars.contains(e.getFlyweightMessage(), "table name is reserved") + && !Chars.contains(e.getFlyweightMessage(), "could not remove table")) { throw e; }
210-210: Deterministic RNG seeds: consider hoisting to named constants.Improves readability and reuse across tests.
Example outside this hunk:
private static final long SEED1 = 3095520132513L; private static final long SEED2 = 1758309268883L; ... Rnd rnd = TestUtils.generateRandom(LOG, SEED1, SEED2);core/pom.xml (1)
35-35: Enforce JDK 17, upgrade compiler plugin, and switch to to avoid build/toolchain skew3.8.1 can compile with JDK 17 (it delegates to javac), but upgrade to a recent maven-compiler-plugin (3.10.1+; 3.11.0 shown) and use while updating the Java profile to require JDK 17.
@@ - <artifactId>maven-compiler-plugin</artifactId> - <version>3.8.1</version> + <artifactId>maven-compiler-plugin</artifactId> + <version>3.11.0</version> @@ - <source>${javac.compile.source}</source> - <target>${javac.compile.target}</target> + <release>${javac.compile.source}</release>@@ - <profile> - <id>java11+</id> + <profile> + <id>java11+</id> <!-- keep id if referenced externally --> <properties> - <jdk.version>11</jdk.version> - <java.enforce.version>11</java.enforce.version> + <jdk.version>17</jdk.version> + <java.enforce.version>17</java.enforce.version> @@ - <activation> - <jdk>(1.8,)</jdk> - </activation> + <activation> + <jdk>(16,)</jdk> <!-- require 17+ --> + </activation>core/src/test/java/io/questdb/test/cairo/mv/MatViewIdenticalReplaceTest.java (2)
174-176: Avoid regex in timestamp replacement
replaceAll(".000000Z", ".000000000Z")treats.as “any char”. Use a literal replacement to prevent accidental matches and avoid regex overhead.Apply this diff:
- return ColumnType.isTimestampMicro(timestampType.getDriver().getTimestampType()) ? expected : expected.replaceAll(".000000Z", ".000000000Z"); + return ColumnType.isTimestampMicro(timestampType.getDriver().getTimestampType()) + ? expected + : expected.replace(".000000Z", ".000000000Z");
169-172: Minor: use literal replace, not replaceAll, for #TIMESTAMPNo regex needed here; using
replaceavoids regex semantics and is a tad faster.- sqlText = sqlText.toString().replaceAll("#TIMESTAMP", timestampType.getTypeName()); + sqlText = sqlText.toString().replace("#TIMESTAMP", timestampType.getTypeName());core/src/test/java/io/questdb/test/cairo/FullFwdPartitionFrameCursorTest.java (2)
544-545: Parallel YEAR tests: same 1972.3 suffix — apply the same de‑brittlingMirror the constant/API approach here to keep the suite consistent and reduce future churn.
Apply the same constant:
- testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, "1972.3" + Files.SEPARATOR + "a.v", 2); + testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, YEAR_1972_DIR + Files.SEPARATOR + "a.v", 2); @@ - testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, "1972.3" + Files.SEPARATOR + "b.v", 2); + testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, YEAR_1972_DIR + Files.SEPARATOR + "b.v", 2); @@ - testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, "1972.3" + Files.SEPARATOR + "c.v", 2); + testParallelIndexFailureAtRuntime(PartitionBy.YEAR, 10000000L * 30 * 12, false, YEAR_1972_DIR + Files.SEPARATOR + "c.v", 2);Also applies to: 549-550, 554-555
234-235: Centralize YEAR partition dir "1972.3" — avoid brittle name‑txn literalsMultiple tests hard-code "1972.3" (core/src/test/java/io/questdb/test/cairo/FullFwdPartitionFrameCursorTest.java:234, 239, 244, 544, 549, 554). If ".3" is a name‑txn suffix (not derivable from the timestamp), replace these literals with a single constant (e.g. private static final String YEAR_1972_DIR = "1972.3") or, preferably, derive the partition directory via the partition-name API so tests don’t break when name‑txn sequencing changes.
core/src/test/java/io/questdb/test/cutlass/line/tcp/LineTcpSenderTest.java (1)
59-59: Drop redundant non-static JUnit import and use existing static assert.You already have
import static org.junit.Assert.*;. Remove the directorg.junit.Assertimport and switch the call site toassertEquals.Apply this diff:
-import org.junit.Assert; ... - Assert.assertEquals(0, errorCount.get()); + assertEquals(0, errorCount.get());Also applies to: 1023-1023
core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (3)
764-771: Switch expression looks good; add a diagnostic for unsupported types.Throwing an informative error helps test triage.
Apply this diff:
- return switch (colType) { + return switch (colType) { case ColumnType.SYMBOL -> rec.getSymA(2); case ColumnType.STRING -> rec.getStrA(2); case ColumnType.VARCHAR -> Utf8s.toString(rec.getVarcharA(2)); - default -> throw new IllegalArgumentException(); + default -> throw new IllegalArgumentException("Unsupported column type: " + ColumnType.nameOf(colType)); };
641-645: Compute dupStep once per transaction, not per operation.Reduces randomness noise and minor CPU churn inside the loop.
- for (int op = 0; op < size; op++) { - int dupStep = Math.max(2, transaction.operationList.size() / (1 + rnd.nextInt(10))); + int dupStep = Math.max(2, size / (1 + rnd.nextInt(10))); + for (int op = 0; op < size; op++) {
889-891: Typo: rename variable to commaSeparatedUpsertCols.Improves readability.
- String comaSeparatedUpsertCols; + String commaSeparatedUpsertCols;And update its usages accordingly.
core/src/main/java/io/questdb/std/NumericException.java (1)
93-96: Clarify nullability for substring put()Align with other overloads. If null is not allowed here, annotate parameter as @NotNull to prevent accidental NPEs downstream.
- public NumericException put(CharSequence cs, int lo, int hi) { + public NumericException put(@NotNull CharSequence cs, int lo, int hi) { message.put(cs, lo, hi); return this; }core/src/main/java/io/questdb/std/Numbers.java (3)
670-687: Simplify floorPow2 via Long.highestOneBitSame semantics, clearer and branchless. JVM intrinsic on modern HotSpot.
- public static long floorPow2(long value) { - if (value <= 0) { - return 0; - } - long v = value; - v |= v >>> 1; - v |= v >>> 2; - v |= v >>> 4; - v |= v >>> 8; - v |= v >>> 16; - v |= v >>> 32; - return v - (v >>> 1); - } + public static long floorPow2(long value) { + return value <= 0 ? 0 : Long.highestOneBit(value); + }
946-949: Empty hex substring check should use hi <= loThe current check
hi == 0fails for non‑zerolo(e.g., lo==5, hi==5). Usehi <= loto catch all empty slices.- if (hi == 0) { + if (hi <= lo) { throw NumericException.instance().put("empty hex string"); }Also applies to: 966-969, 982-985
1498-1525: Error messages in parseLongSize use “duration” instead of “size”Nit but can mislead operators. Use consistent wording with int variant.
- throw NumericException.instance().put("duration overflow"); + throw NumericException.instance().put("size overflow"); @@ - throw NumericException.instance().put("duration overflow"); + throw NumericException.instance().put("size overflow"); @@ - throw NumericException.instance().put("duration overflow"); + throw NumericException.instance().put("size overflow"); @@ - throw NumericException.instance().put("invalid duration format"); + throw NumericException.instance().put("invalid size format");core/src/test/java/io/questdb/test/std/NumbersTest.java (1)
331-364: Floor/Ceil relationship testsThese lock in the new ceilPow2 saturation at 2^62. Consider adding a small test clarifying behavior for negatives (e.g., value < 1), even if only to assert current behavior.
core/src/main/java/io/questdb/cutlass/text/ParallelCsvFileImporter.java (1)
1187-1204: Fail fast if symbol indices cannot be resolvedBoth indices can be -1; add a quick guard to surface configuration mismatches early.
- int tempTableDenseSymbolIndex = targetTableStructure.getDenseSymbolIndex(symbolColumnName); - int tempTableColumnIndex = targetTableStructure.getColumnIndex(symbolColumnName); + int tempTableDenseSymbolIndex = targetTableStructure.getDenseSymbolIndex(symbolColumnName); + int tempTableColumnIndex = targetTableStructure.getColumnIndex(symbolColumnName); + assert tempTableDenseSymbolIndex >= 0 && tempTableColumnIndex >= 0 + : "Unresolved symbol column in temp tables: " + symbolColumnName;core/src/main/java/io/questdb/cutlass/text/CopyTask.java (1)
709-728: Use TableUtils helpers or COLUMN_NAME_TXN_NONE when appending txn suffixIn core/src/main/java/io/questdb/cutlass/text/CopyTask.java (PhaseUpdateSymbolKeys.run, ~lines 720–728) you do
if (columnNameTxn != -1) { path.put('.').put(columnNameTxn); }. Use the same convention as TableUtils (either call TableUtils.dFile/iFile or checkcolumnNameTxn > TableUtils.COLUMN_NAME_TXN_NONE) to keep suffix logic consistent and avoid edge-case mismatches.core/src/test/java/io/questdb/test/cairo/SymbolMapTest.java (1)
106-108: Replace magic -1 with a named constant or helper to document semantics.Every new SymbolMapWriter(...) call adds NOOP_COLLECTOR and a trailing -1. The sentinel is unclear and repeated across the file. Introduce a self-explanatory constant (e.g., SYMBOL_TABLE_TXN_NONE or COLUMN_INDEX_UNSET), or a local factory method newWriter(path, name, start) that hides these fixed args to improve readability and reduce copy/paste risk. Applies to all occurrences listed.
Also applies to: 128-130, 191-193, 265-267, 304-305, 328-329, 358-360, 395-397, 421-422, 444-445, 467-468, 492-494, 513-514, 536-537, 575-576, 598-599, 622-623, 694-695, 749-758, 820-822, 864-865, 925-926, 945-946, 971-972, 995-996, 1028-1029, 1072-1073, 1123-1124
core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java (1)
302-305: Fix misleading comment in upsertSymbolTableTxnName().The comment talks about storing the added-partition timestamp in columnTop, but here columnTop is always 0 for the symbol-table partition. Update the comment to reflect actual behavior.
Apply this wording tweak:
- // When table is partitioned, use columnTop place to store the timestamp of the partition where the column added + // Store the column name txn for the symbol-table partition; columnTop is unused here and set to 0.core/src/main/java/io/questdb/cutlass/line/tcp/SymbolCache.java (2)
121-123: Clear temp sink before UTF‑8→UTF‑16 conversion.Without clearing, values may concatenate depending on
Utf8s.utf8ToUtf16Uncheckedbehavior.Apply this diff:
- Utf8s.utf8ToUtf16Unchecked(value, tempSink); + tempSink.clear(); + Utf8s.utf8ToUtf16Unchecked(value, tempSink);
131-156: Avoid relying on external immutability ofpathToTableDir.The comment admits fragility. Store a defensive copy to prevent accidental mutation by owners.
Apply this diff:
- this.pathToTableDir = pathToTableDir; + // Defensive copy to decouple from owner mutations + this.pathToTableDir = Utf8String.newInstance(pathToTableDir);core/src/main/java/io/questdb/cairo/ColumnVersionReader.java (1)
221-235: API for symbol-table txn with sane fallback; fix Javadoc tag.Method and fallback to default txn are correct. Minor: use
{@link #getDefaultColumnNameTxn}instead of{@see ...}to avoid Javadoc warnings.core/src/test/java/io/questdb/test/cutlass/text/ParallelCsvFileImporterTest.java (2)
1582-1590: Typo in test name: Typ → Type.Nit: rename to
testImportTimestampTypeFormatMismatchfor consistency/searchability.- public void testImportTimestampTypFormatMismatch() { + public void testImportTimestampTypeFormatMismatch() {
3517-3519: Public nested test interface likely unnecessary.
TextImportRunnableneedn’t bepublicinside this test class; package‑private keeps surface area smaller.- public interface TextImportRunnable { + interface TextImportRunnable {core/src/test/java/io/questdb/test/cutlass/line/tcp/SymbolCacheTest.java (4)
148-156: Column index vs symbol index: avoid off‑by‑one with explicit lookup.You pass
col + 1(table col index accounting for timestamp) andcol(symbol index). This is subtle and brittle. Prefer resolving both from metadata to avoid off‑by‑one when schema changes.- symbolCache.of( - engine.getConfiguration(), - "col" + col, - col + 1, - col, - path, - new TestTableWriterAPI(rdr.getColumnVersionReader()), - txReader - ); + final int tableColIndex = rdr.getMetadata().getColumnIndex("col" + col); + final int symbolColOrdinal = txReader.unsafeSymbolColumnIndexOf(tableColIndex); // or compute from metadata if available + symbolCache.of( + engine.getConfiguration(), + "col" + col, + tableColIndex, + symbolColOrdinal, + path, + new TestTableWriterAPI(rdr.getColumnVersionReader()), + txReader + );If
unsafeSymbolColumnIndexOf(...)isn’t available, compute the ordinal by iterating metadata and counting SYMBOL columns up totableColIndex.
221-227: PartitionBy mismatch with created tables.Tables are created with
PartitionBy.HOURbutTxReader.ofRO(...)is passedPartitionBy.DAY. Align these for consistency to prevent subtle test flakiness ifTxReaderstarts validating this.- TableUtils.getTimestampType(model), - PartitionBy.DAY + TableUtils.getTimestampType(model), + PartitionBy.HOURAlso applies to: 424-428
605-613: Recompute indices after column removal.After
writer.removeColumn("symCol1"), using hardcoded0assumes reindexing; safer to recompute the index for"symCol2"from metadata.- "symCol2", - 0, - 0, + "symCol2", + writer.getColumnIndex("symCol2"), + txReader.unsafeSymbolColumnIndexOf(writer.getColumnIndex("symCol2")),If
unsafeSymbolColumnIndexOfis unavailable, derive the ordinal as noted earlier.
230-238: Reduce duplication: helper to init/reinit SymbolCache.Consider a small helper to DRY repeated
cache.of(...)sequences (path building + index resolution + API construction).+ private static void initCache(SymbolCache cache, + CairoConfiguration cfg, + Path path, + TableToken token, + String colName, + int tableColIdx, + int symOrdinal, + TableWriterAPI twa, + TxReader txReader) { + cache.of(cfg, colName, tableColIdx, symOrdinal, path.of(cfg.getDbRoot()).concat(token), twa, txReader); + }Also applies to: 252-259, 452-461, 522-526, 605-613, 683-691, 766-774
core/src/test/java/io/questdb/test/AbstractCairoTest.java (1)
335-335: Minor typographical error in comment.The word "adcanced" should be spelled "advanced" on Line 335.
Apply this diff to correct the typo:
- // intentionally calling hasNext() twice: we want to adcanced the cursor position, + // intentionally calling hasNext() twice: we want to advanced the cursor position,core/src/main/java/io/questdb/cairo/TableUtils.java (1)
123-123: Lowered MIN_INDEX_VALUE_BLOCK_SIZE: align validation and avoid magic numberGood change; to keep invariants consistent, replace the hardcoded check "index value block capacity < 2" in validateMeta() with MIN_INDEX_VALUE_BLOCK_SIZE.
- if (getIndexBlockCapacity(metaMem, i) < 2) { + if (getIndexBlockCapacity(metaMem, i) < MIN_INDEX_VALUE_BLOCK_SIZE) {core/src/main/java/io/questdb/cairo/MetadataCache.java (1)
303-318: Avoid nested try-with-resources on the same ColumnVersionReaderCurrently you open the same AutoCloseable twice; simplify to a single TWR.
- try (columnVersionReader) { - LOG.debug().$("hydrating symbol metadata [table=").$(token).$(", column=").$safe(columnName).I$(); - // get column version - path.trimTo(configuration.getDbRoot().length()).concat(token); - final int rootLen = path.size(); - path.concat(TableUtils.COLUMN_VERSION_FILE_NAME); - final long columnNameTxn; - final FilesFacade ff = configuration.getFilesFacade(); - try (columnVersionReader) { - columnVersionReader.ofRO(ff, path.$()); - columnVersionReader.readUnsafe(); - columnNameTxn = columnVersionReader.getSymbolTableNameTxn(writerIndex); - } + try (columnVersionReader) { + LOG.debug().$("hydrating symbol metadata [table=").$(token).$(", column=").$safe(columnName).I$(); + path.trimTo(configuration.getDbRoot().length()).concat(token); + final int rootLen = path.size(); + path.concat(TableUtils.COLUMN_VERSION_FILE_NAME); + final FilesFacade ff = configuration.getFilesFacade(); + columnVersionReader.ofRO(ff, path.$()); + columnVersionReader.readUnsafe(); + final long columnNameTxn = columnVersionReader.getSymbolTableNameTxn(writerIndex);core/src/main/java/io/questdb/cairo/TableWriter.java (3)
5484-5553: Don’t hard-link .k/.v in table root for symbol table filesAt table root, symbol tables consist of .c/.o; bitmap index .k/.v live per-partition. Attempting to link .k/.v at root is unnecessary and may be confusing. Simplify to .c/.o only.
@@ - linkFile( - ff, - keyFileName(path.trimTo(pathSize), columnName, symbolTableNameTxn), - keyFileName(other.trimTo(pathSize), newName, newColumnNameTxn) - ); - linkFile( - ff, - valueFileName(path.trimTo(pathSize), columnName, symbolTableNameTxn), - valueFileName(other.trimTo(pathSize), newName, newColumnNameTxn) - );
5571-5615: Capacity rebuild path: copy .o, link .c — good; add small robustnessThis path correctly copies offsets (.o) to update header while preserving .c. Recommend fsync root dir after linking/copying (non-Windows) to reduce risk on crash between file ops, mirroring patterns elsewhere.
3591-3619: Tighten attach pinning to known column file typesattachPartitionPinColumnVersions() currently trusts any “name.ext.txn” pattern. Guard by checking ext ∈ {d,i,k,v,c,o} to avoid pinning on unrelated files that match the pattern.
@@ - if (firstDot != -1) { + if (firstDot != -1) { utf16Sink.clear(); utf16Sink.put(tmpDirectUtf8StringZ, 0, firstDot); // all our column files have .d (or such) extensions int columnIndex = metadata.getColumnIndexQuiet(utf16Sink); if (columnIndex != -1) { // not a random file, we have column by this name int lastDot = Utf8s.lastIndexOfAscii(tmpDirectUtf8StringZ, '.'); int len = Utf8s.length(tmpDirectUtf8StringZ); if (lastDot > 1 && lastDot < len - 1) { + // ensure known column file extension + int extStart = firstDot + 1; + int extEnd = lastDot; + if (!Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "d") + && !Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "i") + && !Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "k") + && !Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "v") + && !Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "c") + && !Utf8s.matchesAscii(tmpDirectUtf8StringZ, extStart, extEnd, "o")) { + return; + } // we are rejecting 'abc', '.abc' and 'abc.', but accepting 'a.xxx' try { long nameTxn = Numbers.parseLong(tmpDirectUtf8StringZ, lastDot + 1, len);
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (41)
.gitignore(2 hunks)core/pom.xml(1 hunks)core/src/main/java/io/questdb/cairo/ColumnVersionReader.java(3 hunks)core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java(2 hunks)core/src/main/java/io/questdb/cairo/DatabaseCheckpointAgent.java(1 hunks)core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java(1 hunks)core/src/main/java/io/questdb/cairo/MapWriter.java(1 hunks)core/src/main/java/io/questdb/cairo/MetadataCache.java(1 hunks)core/src/main/java/io/questdb/cairo/SymbolMapReaderImpl.java(3 hunks)core/src/main/java/io/questdb/cairo/SymbolMapWriter.java(9 hunks)core/src/main/java/io/questdb/cairo/TableReader.java(3 hunks)core/src/main/java/io/questdb/cairo/TableUtils.java(1 hunks)core/src/main/java/io/questdb/cairo/TableWriter.java(23 hunks)core/src/main/java/io/questdb/cairo/TableWriterAPI.java(1 hunks)core/src/main/java/io/questdb/cairo/vm/MemoryCMARWImpl.java(3 hunks)core/src/main/java/io/questdb/cairo/vm/NullMapWriter.java(1 hunks)core/src/main/java/io/questdb/cairo/vm/Vm.java(1 hunks)core/src/main/java/io/questdb/cairo/wal/WalWriter.java(2 hunks)core/src/main/java/io/questdb/cutlass/http/processors/LineHttpProcessorState.java(1 hunks)core/src/main/java/io/questdb/cutlass/line/tcp/SymbolCache.java(4 hunks)core/src/main/java/io/questdb/cutlass/line/tcp/TableUpdateDetails.java(1 hunks)core/src/main/java/io/questdb/cutlass/text/CopyTask.java(17 hunks)core/src/main/java/io/questdb/cutlass/text/ParallelCsvFileImporter.java(6 hunks)core/src/main/java/io/questdb/griffin/ConvertOperatorImpl.java(1 hunks)core/src/main/java/io/questdb/std/Numbers.java(53 hunks)core/src/main/java/io/questdb/std/NumericException.java(1 hunks)core/src/test/java/io/questdb/test/AbstractCairoTest.java(8 hunks)core/src/test/java/io/questdb/test/cairo/ColumnVersionWriterTest.java(3 hunks)core/src/test/java/io/questdb/test/cairo/FullFwdPartitionFrameCursorTest.java(2 hunks)core/src/test/java/io/questdb/test/cairo/SymbolMapTest.java(27 hunks)core/src/test/java/io/questdb/test/cairo/TableNameRegistryTest.java(4 hunks)core/src/test/java/io/questdb/test/cairo/TableWriterTest.java(25 hunks)core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java(5 hunks)core/src/test/java/io/questdb/test/cairo/mv/MatViewIdenticalReplaceTest.java(5 hunks)core/src/test/java/io/questdb/test/cairo/wal/DedupWalWriterTest.java(3 hunks)core/src/test/java/io/questdb/test/cutlass/line/tcp/LineTcpSenderTest.java(3 hunks)core/src/test/java/io/questdb/test/cutlass/line/tcp/SymbolCacheTest.java(13 hunks)core/src/test/java/io/questdb/test/cutlass/text/ParallelCsvFileImporterTest.java(5 hunks)core/src/test/java/io/questdb/test/griffin/CopyTest.java(1 hunks)core/src/test/java/io/questdb/test/griffin/SqlParserTest.java(3 hunks)core/src/test/java/io/questdb/test/std/NumbersTest.java(47 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (Hosted Running tests on windows-other)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
- GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
- GitHub Check: New pull request (Check Changes Check changes)
core/src/test/java/io/questdb/test/cutlass/line/tcp/LineTcpSenderTest.java
Show resolved
Hide resolved
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (1)
907-913: Extra closing parenthesis in ALTER statement causes SQL syntax error.There’s an extra “)” at the end of the format string.
Apply:
-String alterStatement = String.format( - "alter table %s dedup upsert keys(%s))", +String alterStatement = String.format( + "alter table %s dedup upsert keys(%s)", tableNameDedup, comaSeparatedUpsertCols );core/src/main/java/io/questdb/cairo/SymbolMapUtil.java (1)
141-163: Handle negative length on char file before mmap
charFileLen = ff.length(...)may return –1 on error but only== 0is checked, soopencan receive –1 and trigger an assertion. Changeif (charFileLen == 0)toif (charFileLen <= 0)(or explicitly throw on< 0) to cover negative lengths. [SymbolMapUtil.java:98-103]
♻️ Duplicate comments (1)
core/src/main/java/io/questdb/cutlass/line/tcp/SymbolCache.java (1)
156-161: Pass the copied path field, not the external parameter, to avoid mutable-path aliasingYou copy the table dir into
this.pathToTableDir, but the initial open still uses the externalpathToTableDirparameter. If that object is mutable, the reader may retain a reference to a path that later changes. Use the copied field consistently.Apply this diff:
- symbolMapReader.of(configuration, pathToTableDir, columnName, symbolTableNameTxn, symCount); + symbolMapReader.of(configuration, this.pathToTableDir, columnName, symbolTableNameTxn, symCount);
🧹 Nitpick comments (5)
core/pom.xml (1)
618-627: Rename the placeholder strings for clarity.
BothexcludePattern1values still end withjava11plus, which now mismatches the profile ID (java17+). Consider renaming them to avoid future confusion.core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (2)
765-770: Switch expression looks good; avoid magic column index 2.Hard-coding column index 2 is brittle. Consider a named constant or look up the index by column name once.
Example:
- return switch (colType) { - case ColumnType.SYMBOL -> rec.getSymA(2); - case ColumnType.STRING -> rec.getStrA(2); - case ColumnType.VARCHAR -> Utf8s.toString(rec.getVarcharA(2)); - default -> throw new IllegalArgumentException(); - }; + final int KEY_COL = 2; // 's' column + return switch (colType) { + case ColumnType.SYMBOL -> rec.getSymA(KEY_COL); + case ColumnType.STRING -> rec.getStrA(KEY_COL); + case ColumnType.VARCHAR -> Utf8s.toString(rec.getVarcharA(KEY_COL)); + default -> throw new IllegalArgumentException("Unexpected colType=" + colType); + };
779-779: Five-boolean call to readTxnToString is error-prone; add intent.The expanded signature increases ordering mistakes risk. Add named locals or inline comments to document each flag, or wrap in a helper.
Example:
-String partitions = readTxnToString(tt, false, true, true, false); +// booleans: <arg1>, <arg2>, <arg3>, <arg4>, <arg5> +String partitions = readTxnToString(tt, /*arg1=*/false, /*arg2=*/true, /*arg3=*/true, /*arg4=*/false);Please confirm the flag order matches the new API.
Also applies to: 805-805
core/src/main/java/io/questdb/cairo/TableReader.java (1)
1709-1719: Provide actual symbolCount on renew and update when reopen isn’t neededPassing
0defers count refresh until a later pass. Use currenttxFilevalue and update when no reopen is required. This avoids transient undercount and extra work.private void renewSymbolMapReader(SymbolMapReader reader, int columnIndex) { if (ColumnType.isSymbol(metadata.getColumnType(columnIndex))) { final int writerColumnIndex = metadata.getWriterIndex(columnIndex); final long symbolTableNameTxn = columnVersionReader.getSymbolTableNameTxn(writerColumnIndex); String columnName = metadata.getColumnName(columnIndex); - if (!(reader instanceof SymbolMapReaderImpl symbolMapReader)) { - reader = new SymbolMapReaderImpl(configuration, path, columnName, symbolTableNameTxn, 0); + final int denseIdx = metadata.getDenseSymbolIndex(columnIndex); + final int symbolCount = txFile.getSymbolValueCount(denseIdx); + if (!(reader instanceof SymbolMapReaderImpl symbolMapReader)) { + reader = new SymbolMapReaderImpl(configuration, path, columnName, symbolTableNameTxn, symbolCount); } else { // Fully reopen the symbol map reader only when necessary - if (symbolMapReader.needsReopen(symbolTableNameTxn)) { - symbolMapReader.of(configuration, path, columnName, symbolTableNameTxn, 0); + if (symbolMapReader.needsReopen(symbolTableNameTxn)) { + symbolMapReader.of(configuration, path, columnName, symbolTableNameTxn, symbolCount); + } else { + symbolMapReader.updateSymbolCount(symbolCount); } } } else {core/src/main/java/io/questdb/cutlass/line/tcp/TableUpdateDetails.java (1)
616-632: Avoid statefulsymbolNameTemp; pass the name locally for clarity
resolveSymbolIndexAndName(...)sets and relies on a mutable field (symbolNameTemp). Make it return the name alongside the index (or derive the name locally) to reduce hidden state and cross-call coupling.Minimal change (derive name locally in
addSymbolCache):- final int symIndex = resolveSymbolIndexAndName(reader.getMetadata(), colWriterIndex); - if (symbolNameTemp == null || symIndex < 0) { + final int symIndex = resolveSymbolIndexAndName(reader.getMetadata(), colWriterIndex); + final String symbolName = + symIndex < 0 ? null : reader.getMetadata().getColumnName(reader.getMetadata().getColumnIndexQuiet(symbolNameTemp)); + if (symbolName == null || symIndex < 0) { return NOT_FOUND_LOOKUP; } ... - symCache.of( + symCache.of( cairoConfiguration, - symbolNameTemp, + symbolName, colWriterIndex, symIndex, path, writerAPI, txReader, columnVersionReader );Alternatively, refactor
resolveSymbolIndexAndNameto return a small record(index, name)and drop the field entirely.Also applies to: 531-541
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (38)
.gitignore(2 hunks)core/pom.xml(4 hunks)core/src/main/java/io/questdb/PropServerConfiguration.java(8 hunks)core/src/main/java/io/questdb/PropertyKey.java(2 hunks)core/src/main/java/io/questdb/cairo/CairoConfiguration.java(2 hunks)core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java(2 hunks)core/src/main/java/io/questdb/cairo/ColumnVersionReader.java(5 hunks)core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java(2 hunks)core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java(2 hunks)core/src/main/java/io/questdb/cairo/IDGeneratorFactory.java(1 hunks)core/src/main/java/io/questdb/cairo/MetadataCache.java(1 hunks)core/src/main/java/io/questdb/cairo/O3PartitionJob.java(1 hunks)core/src/main/java/io/questdb/cairo/RebuildColumnBase.java(1 hunks)core/src/main/java/io/questdb/cairo/SymbolMapReaderImpl.java(5 hunks)core/src/main/java/io/questdb/cairo/SymbolMapUtil.java(5 hunks)core/src/main/java/io/questdb/cairo/SymbolMapWriter.java(8 hunks)core/src/main/java/io/questdb/cairo/TableReader.java(3 hunks)core/src/main/java/io/questdb/cairo/TableUtils.java(1 hunks)core/src/main/java/io/questdb/cairo/TableWriter.java(25 hunks)core/src/main/java/io/questdb/cairo/vm/Vm.java(1 hunks)core/src/main/java/io/questdb/cairo/wal/WalWriter.java(2 hunks)core/src/main/java/io/questdb/cutlass/http/processors/LineHttpProcessorState.java(1 hunks)core/src/main/java/io/questdb/cutlass/line/AbstractLineSender.java(2 hunks)core/src/main/java/io/questdb/cutlass/line/tcp/SymbolCache.java(6 hunks)core/src/main/java/io/questdb/cutlass/line/tcp/TableUpdateDetails.java(5 hunks)core/src/main/java/io/questdb/cutlass/text/ParallelCsvFileImporter.java(6 hunks)core/src/main/java/io/questdb/griffin/ConvertOperatorImpl.java(1 hunks)core/src/main/java/io/questdb/std/Numbers.java(53 hunks)core/src/main/resources/io/questdb/site/conf/server.conf(2 hunks)core/src/test/java/io/questdb/test/AbstractCairoTest.java(8 hunks)core/src/test/java/io/questdb/test/PropServerConfigurationTest.java(3 hunks)core/src/test/java/io/questdb/test/ServerMainTest.java(3 hunks)core/src/test/java/io/questdb/test/cairo/FullFwdPartitionFrameCursorTest.java(5 hunks)core/src/test/java/io/questdb/test/cairo/TableNameRegistryTest.java(4 hunks)core/src/test/java/io/questdb/test/cairo/TableWriterTest.java(13 hunks)core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java(6 hunks)core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java(1 hunks)core/src/test/java/io/questdb/test/cairo/mv/MatViewIdenticalReplaceTest.java(6 hunks)
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🚧 Files skipped from review as they are similar to previous changes (7)
- core/src/main/java/io/questdb/cutlass/text/ParallelCsvFileImporter.java
- core/src/main/java/io/questdb/cairo/TableUtils.java
- core/src/test/java/io/questdb/test/cairo/mv/MatViewIdenticalReplaceTest.java
- core/src/main/java/io/questdb/cutlass/http/processors/LineHttpProcessorState.java
- core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java
- core/src/test/java/io/questdb/test/AbstractCairoTest.java
- core/src/main/java/io/questdb/cairo/vm/Vm.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (36)
core/src/test/java/io/questdb/test/ServerMainTest.java (3)
36-36: LGTM! Import added for platform-specific page size.The import is necessary to reference
Files.PAGE_SIZEin the test expectations below, ensuring the test correctly validates the platform-specific minimum allocation page size.
361-362: LGTM! Symbol table configuration updated to min/max allocation pattern.The replacement of the single
append.page.sizeparameter with separate min/max allocation page sizes aligns with the PR's objective to improve symbol table memory management. UsingFiles.PAGE_SIZEfor the minimum ensures platform compatibility.
692-694: LGTM! Auto-scaling parameters added with safe defaults.The new auto-scaling configuration parameters align with the PR objectives:
cairo.auto.scale.symbol.capacitydefaults tofalse(safe opt-in approach given rollback compatibility concerns)cairo.auto.scale.symbol.capacity.thresholdset to0.8matches the 80% threshold described in the PRcore/src/test/java/io/questdb/test/cairo/TableNameRegistryTest.java (4)
137-138: LGTM! Non-fatal error handling extended appropriately.Adding "could not remove table" as a non-fatal error condition is appropriate for concurrent drop operations, especially with the symbol capacity scaling changes that may affect file operations.
189-194: LGTM! Consistent error handling across concurrent tests.The same non-fatal error handling logic is correctly applied here with improved readability through multi-line formatting.
280-284: LGTM! Formatting improved.Multi-line constructor call improves readability without changing functionality.
758-764: LGTM! Text block adoption improves test readability.Switching to Java text blocks for multi-line SQL assertions is a clean improvement that aligns with the Java 17 update mentioned in the PR.
core/src/main/java/io/questdb/cairo/IDGeneratorFactory.java (1)
51-53: Good addition of volatile for thread-safe address visibility.The volatile modifier correctly ensures that the memory-mapped address is visible across threads after initialization in
open()or reset inclose(). This prevents threads from seeing stale addresses.Minor clarity note: The comment states "current ID is consumed by other threads," but the field stores the memory address where the ID resides, not the ID value itself. Consider clarifying: "the memory address is accessed by other threads, and we need to ensure the address is visible after mapping."
core/pom.xml (2)
35-35: Java 17 property bump looks good.
Aligningjavac.targetwith the project-wide JDK 17 move keeps compilation flags consistent.
82-82: Compiler plugin upgrade approved.
maven-compiler-plugin 3.11.0is required to compile with JDK 17, so this version bump is the right call.core/src/test/java/io/questdb/test/cairo/fuzz/DedupInsertFuzzTest.java (3)
644-662: Good adoption of instanceof pattern matching (Java 17).Cleaner and safe cast; no issues.
814-817: Use of stream().distinct().toList() is fine on JDK 17.The resulting list is unmodifiable; you don’t mutate it later, so this is correct.
1186-1188: Variable rename + bounds check logic LGTM.Condition is correct and guards modulo; no regressions expected.
core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java (1)
123-127: Text block keeps the DDL unchanged.The text block form matches the previous SQL content while improving readability. All good.
core/src/main/java/io/questdb/cairo/RebuildColumnBase.java (1)
126-126: Accessor switch aligns with the new API.Using
tableWriter.columnVersionReader()matches the updated TableWriter surface; behavior stays the same.core/src/main/java/io/questdb/cairo/O3PartitionJob.java (1)
2576-2576: Consistent use of the new columnVersionReader accessor.The swap keeps the parquet index update logic intact while matching TableWriter’s updated API.
core/src/test/java/io/questdb/test/PropServerConfigurationTest.java (1)
428-428: LGTM! API migration correctly reflected.The test correctly migrates from the removed
getSymbolTableAppendPageSize()to the newgetSymbolTableMinAllocationPageSize()API, consistent with the PR's shift to min/max allocation page sizing for symbol tables.core/src/main/java/io/questdb/cairo/MetadataCache.java (1)
311-323: LGTM! Symbol table versioning correctly applied.The change from
getDefaultColumnNameTxn()togetSymbolTableNameTxn()correctly reflects the PR's introduction of separate symbol-table versioning. This ensures that symbol map files are located using the symbol-table-specific transaction ID rather than the default column transaction ID, which is consistent with the broader symbol capacity auto-scaling feature.core/src/main/java/io/questdb/cairo/SymbolMapUtil.java (4)
32-36: LGTM! Improved import clarity.Replacing wildcard imports with explicit imports improves code readability and reduces the risk of naming conflicts.
103-103: LGTM! Correct parameter passing.The call correctly passes
path.$()to match theLPSZparameter type expected by the updatedopen()signature.
118-118: LGTM! More idiomatic empty check.Using
isEmpty()is more idiomatic and readable thanlength() == 0.
201-203: LGTM! Well-bounded memory allocation.The
calculateExtendSegmentSize()helper computes a power-of-two size bounded by min/max configuration values, which should provide optimal memory allocation for symbol table growth while respecting system limits.core/src/main/resources/io/questdb/site/conf/server.conf (2)
493-505: LGTM! Auto-scaling properly documented and opt-in.The new configuration properly documents the symbol capacity auto-scaling feature. The default of
falsemakes this opt-in, which is appropriate for a new feature that changes file naming conventions (per the PR description's rollback warning). The 0.8 threshold is a sensible default for triggering capacity increases.
644-648: LGTM! Appropriate allocation bounds.The replacement of the single
append.page.sizewith separatemin.allocation.page.size(4k) andmax.allocation.page.size(8M) provides reasonable bounds for dynamic symbol table sizing. The 4k minimum aligns with typical page sizes, and the 8M maximum prevents excessive memory allocation.core/src/main/java/io/questdb/cairo/wal/WalWriter.java (2)
388-388: LGTM: ColumnVersionReader lifecycle correctly managed.The columnVersionReader is now properly freed in doClose, ensuring it remains available for the WalWriter's lifetime. This addresses the use-after-close risk flagged in previous reviews.
1247-1248: LGTM: Correct usage of symbol table name transaction.The switch from
getDefaultColumnNameTxntogetSymbolTableNameTxnaligns with the PR's introduction of symbol-table versioning. The variable naming (symbolTableNameTxn) clearly indicates this is the symbol table's name transaction, and it's correctly passed toconfigureSymbolMapWriter.core/src/main/java/io/questdb/cairo/CairoConfiguration.java (2)
69-88: LGTM: Well-documented auto-scaling configuration API.The new methods provide clear control over symbol capacity auto-scaling behavior:
autoScaleSymbolCapacity()acts as a feature flagautoScaleSymbolCapacityThreshold()provides fine-grained control over when resizing occursDocumentation clearly explains the percentage-based threshold and the rationale for having a disable option.
606-608: LGTM: Symbol table sizing API split improves flexibility.Replacing the single
getSymbolTableAppendPageSize()with separate min/max allocation page sizes allows better control over memory allocation behavior for symbol tables. This aligns with the auto-scaling feature where initial allocation can start small and grow as needed.core/src/main/java/io/questdb/griffin/ConvertOperatorImpl.java (1)
208-210: LGTM: Correct symbol table name transaction usage.The variable
existingSymbolTableNameTxnis appropriately named and correctly retrieves the symbol table's name transaction viagetSymbolTableNameTxn(). This is then properly passed to theSymbolMapReaderconstructor, aligning with the PR's symbol table versioning approach.core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (2)
69-77: LGTM: Correct delegation for auto-scaling configuration.The wrapper correctly delegates both
autoScaleSymbolCapacity()andautoScaleSymbolCapacityThreshold()to the underlying configuration, maintaining consistency with the wrapper pattern used throughout this file.
1045-1052: LGTM: Correct delegation for symbol table sizing API.Both
getSymbolTableMaxAllocationPageSize()andgetSymbolTableMinAllocationPageSize()are properly delegated to the underlying configuration, following the established wrapper pattern.core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (2)
99-107: LGTM: Sensible defaults for auto-scaling.Auto-scaling is enabled by default (
true) with an 80% threshold, which provides a good balance between minimizing resize operations and avoiding excessive memory waste. This aligns with the PR's goal of automatic capacity management.
1070-1077: LGTM: Appropriate symbol table allocation page sizes.The min/max allocation page sizes are well-chosen:
- Min (Files.PAGE_SIZE): Platform-specific, minimal allocation - good starting point for small symbol tables
- Max (8 MB): Reasonable upper bound that prevents excessive single allocations while allowing growth
These defaults support the auto-scaling feature by enabling efficient growth from small to large symbol tables.
core/src/main/java/io/questdb/cairo/TableWriter.java (3)
1250-1253: API exposure looks good
ReturningColumnVersionReaderview over the writer is a sensible narrowing without extra allocations.
4281-4298: Symbol writer creation changes LGTM
PassingsymbolIndexInTxWriter,txWriter, andcolumnIndeximproves consistency with versioned filenames and value-count tracking.Please confirm that
txWriter.getSymbolValueCount(symbolIndexInTxWriter)is stable for the dense writer ordering at this point in initialization.
3530-3614: Attach validation for SYMBOL columns is solid
Bounds-checking symbol keys and validating index files before attach reduces corruption risk.
|
removed few legacy tests, which test that TableWriter will recover from OS failures. These tests do not go thru WAL path and test limited functionality that remains intact after failures. The more robust method to recover from errors is to distress the writer and create a new one from disk state. |
[PR Coverage check]😍 pass : 401 / 451 (88.91%) file detail
|
This PR introduces automatic capacity management for symbols.
Previously, symbol capacity had to be manually tracked, which led to performance degradation when the symbol map was undersized—especially in high-cardinality cases. With this change, capacity is resized dynamically as cardinality grows, ensuring consistent system performance without manual intervention.
Technical details
floorPow2counterpart forceilPow2+ testsRisks
When auto-scaling is used and triggered (user is not in control when it is triggered, may be indirectly) it would make it impossible to rollback QuestDB to the previous versions. This is because symbol column file names will not be recognised by old QuestDB. Manual intervention will be required - basically renaming files back to what older QuestDB expects.
WIP: