
perf(proton): 3x faster bulk insert for CSV to Proton ingestion #64

Merged
yokofly merged 8 commits into main from perf/proton-bulk-insert-optimization
Mar 31, 2026
Conversation

@yokofly (Collaborator) commented Mar 30, 2026

Summary

Optimizes the Proton bulk insert path. The main win is removing g.Debug() calls from the hot loop, which caused a 3x slowdown when the -d flag is enabled (-d is recommended for customer debugging).

Benchmark Results (fresh Proton server, 1.66M rows, controlled A/B)

| Variant | With -d (customer debug mode) | Without -d |
| --- | --- | --- |
| Baseline (v1.2.20-timeplus.4) | 37.3s | 13.2s |
| This PR | 13.3s | 13.0s |
| Improvement | 2.8x faster | no regression |

The -d flag is recommended for customer debugging. With -d enabled, the baseline spends most of its CPU on g.Debug() → runtime.Caller() stack walks for every nil value in every row. This PR removes those calls, making -d mode as fast as non-debug mode.

Changes (7 commits)

Performance

  1. Enable LZ4 compression (a9affc1) — reduces bytes on WAN
  2. Remove g.Debug() nil calls in hot loop (60a5ddd) — the main win: 37s → 13s with -d
  3. Remove per-row mutex lock (18193a0) — 1.66M unnecessary lock/unlock cycles

Refactoring

  1. Hoist invariant work out of per-batch loop (1668a82) — extract colClassification struct, compute once per schema change instead of per batch; remove no-op Begin/Commit

Bug fixes (pre-existing, found by Codex review)

  1. Stable idempotent insert keys (in retry fix commit) — old code used time.Now() which changed on retry, causing duplicate inserts in proton-to-proton path. Now uses TaskExecution.ExecID (stable across retries, unique across imports)
  2. Retry-safe array/map converters (in retry fix commit) — converters now accept already-converted native types, so batch retry after transient Send failure no longer causes permanent "expected string, got []string" errors

Infrastructure

  1. Go 1.24.11 → 1.24.13 (d0c311b) — security fixes for crypto/tls, crypto/x509

What we tested and ruled out

  • Batch-level context lock: tested, caused 5-30s regression by blocking CSV reader pipeline during Send(). Dropped.
  • Go 1.24 → 1.26: zero measurable improvement, not worth dependency churn
  • batch_limit increase (50K → 200K): negligible improvement on localhost
  • Multi-file folder import (6 files): no new hotspots in our code; per-file connection lifecycle is architectural (separate PR)
  • 10M row stress test: throughput scales linearly, no memory/GC issues

Data Integrity Verification

  • Row count: 1,660,051 exact match
  • Integer aggregates (sum_id): exact match
  • 5 random row spot-checks (floats like 16954534081.07, Chinese chars, NULLs): all exact
  • Idempotent dedup verified: same import twice without truncate → no duplicates
  • Separate imports verified: two different sling invocations both accepted (not suppressed)

Test plan

  • Controlled A/B benchmark (fresh server, alternating runs)
  • Row count and data integrity verification
  • Idempotent dedup correctness (same import = deduped, different import = accepted)
  • Multi-file folder import (6 files)
  • 10M row stress test
  • Customer UAT on their environment

yokofly and others added 6 commits March 30, 2026 02:13
Enable LZ4 compression for the proton-go-driver native protocol to reduce
bytes on the wire. Minimal impact on localhost (~1s), but significant for
remote connections through LB/NodePort where bandwidth and latency matter.

Benchmark (localhost, 1.66M rows, batch_limit=50K):
  Before: 62s
  After:  61s (~0% on localhost, expected to help on WAN)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
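The driver-level switch looks roughly like this (a sketch assuming proton-go-driver exposes the same Options/Compression API as clickhouse-go v2, from which it is forked; the address and import path are placeholders, not verified against this repo):

```go
// Sketch: enable LZ4 at connection time so native-protocol blocks
// are compressed on the wire (assumed API, mirroring clickhouse-go v2).
conn, err := proton.Open(&proton.Options{
	Addr: []string{"localhost:8463"},
	Compression: &proton.Compression{
		Method: proton.CompressionLZ4,
	},
})
if err != nil {
	return err
}
defer conn.Close()
```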
…t loop

Remove 37 g.Debug() calls that fired for every nil value in every row.
g.Debug() calls runtime.Caller() for stack walking + string formatting +
zerolog event creation on each invocation. With -d flag enabled, 10 nullable
columns, and 1.66M rows, this generated millions of expensive debug calls
that provided zero diagnostic value (nil is normal for nullable columns).

Benchmark (localhost, 1.66M rows, batch_limit=50K, cumulative):
  Baseline:     62s (41s user, 77% CPU)
  + LZ4:        61s
  + this:       28s (10s user, 40% CPU) ← 54% faster, biggest single win

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The ds.Context.Lock()/Unlock() around batched.Append(row...) was unnecessary:
- batched is a local object created in the operation closure
- batchRows is a local snapshot fully drained from the channel
- No concurrent goroutine accesses either during the append loop

This saved 1.66M mutex lock/unlock cycles.

Benchmark (localhost, 1.66M rows, batch_limit=50K, cumulative):
  Baseline:          62s (41.0s user)
  + LZ4:             61s
  + g.Debug removal: 28s (10.5s user)
  + this:            27s ( 9.8s user) ← ~1s improvement

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…fake transactions

Three structural improvements:

1. Extract colClassification struct + classifyColumns() function:
   Column type classification (the big switch mapping DbType → column index
   lists) is now computed once in BulkImportStream and reused across all
   batches, instead of being rebuilt inside every batch's retry closure.

2. Hoist ValidateColumnNames + GenerateInsertStatement to BulkImportStream:
   These only change on schema change events, not per-batch. With
   batch_limit=1000 and 1660 batches, this avoids 1659 redundant
   recomputations per import.

3. Remove fake Begin/Commit per batch:
   Proton does not support transactions (line 168 comment says so).
   The driver's Begin() returns nil, Commit() returns nil. These were
   pure ceremony adding method-chain overhead per batch.

The retry loop now only contains: PrepareBatch → Append rows → Send.

Benchmark (localhost, 1.66M rows, cumulative):
  batch_limit=50K:  27s → 28s (no change, expected on localhost)
  batch_limit=1000: 36s → 36s (no change on localhost; expected to help
                     on WAN where per-batch CPU overhead compounds with
                     network round-trip latency)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Security fixes since 1.24.11:
- go1.24.12: security fixes in crypto/tls, archive/zip, net/url, go cmd
- go1.24.13: security fixes in crypto/tls, crypto/x509, go cmd

Relevant for fintech deployments connecting through TLS-terminated
load balancers.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
go mod tidy with Go 1.24.13 removed the local replace directives
for github.com/flarco/g and github.com/slingdata-io/sling.
These are required for development against our local forks.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
yokofly linked an issue on Mar 30, 2026 that may be closed by this pull request
yokofly force-pushed the perf/proton-bulk-insert-optimization branch 10 times, most recently from b94f5c9 to edb2fcc on March 31, 2026 06:02
… inserts

Fix two pre-existing bugs found by Codex review:

P1 - Idempotent ID unstable across whole-import retries:
  runProtonToProton retries by calling runFileToDB() from scratch,
  which creates a new connection and re-enters BulkImportStream().
  The old code generated idempotent_id from time.Now().UnixNano(),
  so retried batches got new IDs and were inserted again.
  Fix: use TaskExecution.ExecID (a KSUID generated once per task) as
  the session component of the idempotent key. ExecID is set in
  runFileToDB via conn.SetProp("exec_id", t.ExecID) and read in
  processBatch. The resulting key is: exec_id + table + batch_number.
  This is stable across retries (same ExecID) but different across
  separate imports (new ExecID per task).

P2 - In-place row mutation breaks retry for array/map types:
  batchRows is reused across retries. Type converters (e.g.,
  convertToArrayString) mutate row values in-place from string to
  []string. On retry, the converter sees []string instead of string
  and returns "expected string, got []string".
  Fix: add idempotent type-check at the top of all 22 converters.
  If the value is already the target type, return it as-is.

Both bugs are pre-existing (identical in v1.2.20-timeplus.4).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
yokofly force-pushed the perf/proton-bulk-insert-optimization branch from edb2fcc to 9d8ca07 on March 31, 2026 06:04
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
yokofly force-pushed the perf/proton-bulk-insert-optimization branch from 9d8ca07 to 0ef27eb on March 31, 2026 06:11
yokofly merged commit d20ae91 into main on Mar 31, 2026
yokofly deleted the perf/proton-bulk-insert-optimization branch on March 31, 2026 06:12

Linked issue (may be closed by this pull request): Ingestion slow