fix(core): data corruption after range replace on column top partition #6519

Merged
bluestreak01 merged 5 commits into master from puzpuzpuz_data_corruption_range_replace
Dec 10, 2025
@puzpuzpuz puzpuzpuz commented Dec 10, 2025

The bug can be reproduced only when a range replace transaction runs on a partition with column tops, so no users are affected: the only place where we use range replace is mat views, and it's impossible to add a column to a mat view.

In case of a range replace transaction and a column top, the O3OpenColumnJob#merge*Column() methods could overwrite data in the source partition, corrupting it. For example, in the mergeVarColumn() method the auxRowCount variable was calculated after the srcDataMax and srcDataTop values had been adjusted, so auxRowCount could end up being set to zero while the source partition still had some non-coltop rows. Later on, columnTypeDriver.setPartAuxVectorNull() would overwrite the source aux file, leaving it with content like what's shown below.

# corrupted binary column (col top - 7, total rows - 8)
$ xxd /tmp/junit12784799641063025524/dbRoot/testBaseTableCanHaveColumnsAdded_0~2/2022-03-01.124/new_col_73.i.125
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000010: 0800 0000 0000 0000 1000 0000 0000 0000  ................
00000020: 1800 0000 0000 0000 2000 0000 0000 0000  ........ .......
00000030: 2800 0000 0000 0000 3000 0000 0000 0000  (.......0.......
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................

$ xxd /tmp/junit12784799641063025524/dbRoot/testBaseTableCanHaveColumnsAdded_0~2/2022-03-01.124/new_col_73.d.125
00000000: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000010: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000020: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................

# valid binary column (col top - 7, total rows - 8)
$ xxd /tmp/junit11744283241191190407/dbRoot/testBaseTableCanHaveColumnsAdded_0~2/2022-03-01.127/new_col_73.i.128 
00000000: 0000 0000 0000 0000 0800 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................

$ xxd /tmp/junit11744283241191190407/dbRoot/testBaseTableCanHaveColumnsAdded_0~2/2022-03-01.127/new_col_73.d.128 
00000000: ffff ffff ffff ffff 0000 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
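
To make the dumps above easier to compare, here is a minimal sketch that decodes the leading bytes of each .i file into 64-bit values. It assumes the aux file holds 8-byte little-endian offsets, which is how the xxd output reads; the class name AuxDumpDecoder and its helper are hypothetical and not part of QuestDB.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper for reading the xxd dumps; not QuestDB code.
final class AuxDumpDecoder {

    // Parses a hex string (whitespace ignored) into little-endian 64-bit values.
    // Assumption: the aux file stores one 8-byte little-endian long per entry.
    static long[] decodeLongs(String hex) {
        String clean = hex.replaceAll("\\s+", "");
        byte[] bytes = new byte[clean.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(clean.substring(2 * i, 2 * i + 2), 16);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long[] out = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getLong();
        }
        return out;
    }

    public static void main(String[] args) {
        // First 32 bytes of the corrupted new_col_73.i dump.
        String corrupted = "0000 0000 0000 0000 0000 0000 0000 0000"
                + " 0800 0000 0000 0000 1000 0000 0000 0000";
        // First 16 bytes of the valid new_col_73.i dump.
        String valid = "0000 0000 0000 0000 0800 0000 0000 0000";

        System.out.println("corrupted: " + Arrays.toString(decodeLongs(corrupted)));
        System.out.println("valid:     " + Arrays.toString(decodeLongs(valid)));
    }
}
```

Running it prints `[0, 0, 8, 16]` for the corrupted file and `[0, 8]` for the valid one, so the difference sits in the second entry.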

Notice that the second long value in the corrupted new_col_73.i file no longer holds 8, but instead is set to zero. As a result, MatViewFuzzTest#testBaseTableCanHaveColumnsAdded() was flaky:

2025-12-05T18:17:35.1492063Z 2025-12-05T18:17:35.148936Z C i.q.c.TableReader Invalid var len column size [column=new_col_73, size=0, path=/tmp/junit1364862788416964094/dbRoot/testBaseTableCanHaveColumnsAdded_0~2/2022-03-01.124/new_col_73.i.125]
2025-12-05T18:17:35.1501792Z 2025-12-05T18:17:35.149874Z E i.q.g.e.QueryProgress err [id=596, sql=`select min(c3), max(c3), ts from testBaseTableCanHaveColumnsAdded_0 sample by 1h`, principal=admin, cache=false, refreshMinTs=2022-03-01T15:00:00.000000Z, refreshMaxTs=2022-03-02T06:59:59.999999Z, jit=true, time=3426000, msg=Invalid column size [column=/tmp/junit1364862788416964094/dbRoot/testBaseTableCanHaveColumnsAdded_0~2/2022-03-01.124/new_col_73.i.125, size=0], errno=0, pos=0]
2025-12-05T18:17:35.1504652Z 2025-12-05T18:17:35.150017Z E i.q.c.m.MatViewRefreshJob could not refresh materialized view [view=testBaseTableCanHaveColumnsAdded_0_mv~3, ex=
2025-12-05T18:17:35.1505352Z io.questdb.cairo.CairoException: [0] Invalid column size [column=/tmp/junit1364862788416964094/dbRoot/testBaseTableCanHaveColumnsAdded_0~2/2022-03-01.124/new_col_73.i.125, size=0]
2025-12-05T18:17:35.1505945Z 	at io.questdb.cairo.CairoException.instance(CairoException.java:387)
2025-12-05T18:17:35.1506443Z 	at io.questdb.cairo.CairoException.critical(CairoException.java:81)
2025-12-05T18:17:35.1506921Z 	at io.questdb.cairo.TableReader.reloadColumnAt(TableReader.java:1546)
2025-12-05T18:17:35.1507581Z 	at io.questdb.cairo.TableReader.openPartitionColumns(TableReader.java:1233)
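
The ordering problem from the PR description can be illustrated with a simplified sketch. This is not the actual O3OpenColumnJob code: the class, the method names, and in particular the adjustedTop() step are invented for illustration, standing in for whatever adjustment the merge path applies to srcDataTop and srcDataMax. It only shows why deriving auxRowCount from already adjusted values can yield zero for a partition with 8 rows and a column top of 7.

```java
// Illustrative model only; names and the adjustment rule are assumptions.
final class AuxRowCountOrderingSketch {

    // Hypothetical adjustment: treats the whole source range as column top.
    // The real logic in O3OpenColumnJob is more involved.
    static long adjustedTop(long srcDataTop, long srcDataMax) {
        return Math.max(srcDataTop, srcDataMax);
    }

    // Problematic order: the count is derived from the adjusted value.
    static long auxRowCountAfterAdjust(long srcDataTop, long srcDataMax) {
        srcDataTop = adjustedTop(srcDataTop, srcDataMax);
        return srcDataMax - srcDataTop;
    }

    // Safer order: capture the count first, then adjust.
    static long auxRowCountBeforeAdjust(long srcDataTop, long srcDataMax) {
        long auxRowCount = srcDataMax - srcDataTop;
        srcDataTop = adjustedTop(srcDataTop, srcDataMax); // no longer affects auxRowCount
        return auxRowCount;
    }

    public static void main(String[] args) {
        long srcDataTop = 7; // col top from the dumps
        long srcDataMax = 8; // total rows from the dumps
        System.out.println("after adjust:  " + auxRowCountAfterAdjust(srcDataTop, srcDataMax));
        System.out.println("before adjust: " + auxRowCountBeforeAdjust(srcDataTop, srcDataMax));
    }
}
```

With these inputs the first variant reports 0 rows and the second reports 1, which matches the single non-coltop row in the partition.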

@puzpuzpuz puzpuzpuz self-assigned this Dec 10, 2025
@puzpuzpuz puzpuzpuz added the Bug (Incorrect or unexpected behavior) and Core (Related to storage, data type, etc.) labels Dec 10, 2025

@glasstiger

[PR Coverage check]

😍 pass : 14 / 14 (100.00%)

file detail

| path | covered line | new line | coverage |
|---|---|---|---|
| 🔵 io/questdb/std/FilesFacadeImpl.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/O3OpenColumnJob.java | 12 | 12 | 100.00% |
| 🔵 io/questdb/cairo/StringTypeDriver.java | 1 | 1 | 100.00% |

@bluestreak01 bluestreak01 merged commit ac482a9 into master Dec 10, 2025
41 checks passed
@bluestreak01 bluestreak01 deleted the puzpuzpuz_data_corruption_range_replace branch December 10, 2025 20:31