Feature/issue 48 sequential handle glob match folder#50

Merged
yokofly merged 2 commits into main from feature/issue-48-sequential-handle-glob-match-folder
Aug 5, 2025
@yokofly yokofly commented Aug 5, 2025

As titled: avoid parallel goroutines in MergeReaders.
Bonus: update the Go version.

Changelog

  • Change: Enforced strict, deterministic file processing order for file→DB runs.

    • --src-stream file://<folder> and --src-stream file://<folder>/* are now processed sequentially in ascending filename order.
    • Replaces prior async merging via goroutines that could interleave rows and produce non-deterministic ordering.
  • Impact

    • Ordering rule: lexical by filename after discovery and sort. Use zero-padded names (e.g., tablename_YYYYMMDD.csv, tablename_0001.csv) to match chronological order.
    • Scope: applies to Proton and all DB targets; also affects --stdout.
    • Performance: slight throughput trade-off vs previous async merging in exchange for correctness.
  • Verify

    1. Generate 50 files: tablename_20240101.csv through tablename_20240150.csv (each with header and several rows).
    2. On the old build, run with --src-stream file://<folder>/* targeting Proton.
      • Expected: rows written to Proton may be out of order/non-deterministic.
    3. On the new build, run the same.
      • Expected: rows are written in strict ascending filename order (1 → 50).

@yokofly yokofly linked an issue Aug 5, 2025 that may be closed by this pull request
@yokofly yokofly merged commit b470d74 into main Aug 5, 2025
@yokofly yokofly deleted the feature/issue-48-sequential-handle-glob-match-folder branch August 5, 2025 03:06
Development

Successfully merging this pull request may close these issues.

folder2db: avoid parallel merge file