perf(js): Offload serialize to worker thread at flush time #2781
Merged
Jacob Lee (jacoblee93) merged 12 commits into main, Apr 24, 2026
When LANGSMITH_PERF_OPTIMIZATION=true, the flush-time
serializePayloadForTracing() call moves to a Node worker_threads worker,
removing main-thread event-loop blocking for large (base64-image-heavy)
payloads. postMessage transfers the payload via V8's structured clone,
which refcounts large strings across isolates -- about 3x cheaper than
stringifying on the main thread.
Falls back to synchronous serialize when: the flag is off, in
manualFlushMode, worker_threads is unavailable (browsers / edge /
Deno / etc.), or the payload contains non-cloneable values
(DataCloneError).
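A minimal sketch of the offload-with-fallback flow described above, not the code from this PR. `SerializeWorkerSketch`, `WORKER_SOURCE`, and the message shape are hypothetical names, and `JSON.stringify` stands in for `serializePayloadForTracing()`. It assumes Node with `worker_threads` available.

```typescript
import { Worker } from "node:worker_threads";

// Hypothetical worker body, inlined as a string so the sketch fits in one file.
// It receives a structured-cloned payload and replies with its JSON string.
const WORKER_SOURCE = `
  const { parentPort } = require("node:worker_threads");
  parentPort.on("message", ({ id, payload }) => {
    parentPort.postMessage({ id, json: JSON.stringify(payload) });
  });
`;

class SerializeWorkerSketch {
  private worker = new Worker(WORKER_SOURCE, { eval: true });
  private nextId = 0;
  private waiting = new Map<number, (json: string) => void>();

  constructor() {
    // Route each reply back to the promise that requested it.
    this.worker.on("message", (msg: { id: number; json: string }) => {
      this.waiting.get(msg.id)?.(msg.json);
      this.waiting.delete(msg.id);
    });
  }

  serialize(payload: unknown): Promise<string> {
    return new Promise((resolve) => {
      const id = this.nextId++;
      this.waiting.set(id, resolve);
      try {
        // postMessage structured-clones the payload and throws synchronously
        // (DataCloneError) for non-cloneable values such as functions.
        this.worker.postMessage({ id, payload });
      } catch {
        // Simplification: any postMessage failure falls back to serializing
        // on the calling thread.
        this.waiting.delete(id);
        resolve(JSON.stringify(payload));
      }
    });
  }

  close(): Promise<number> {
    return this.worker.terminate();
  }
}
```

A caller would create one instance, `await sketch.serialize(payload)` at flush time, and call `close()` on shutdown so the worker does not keep the process alive.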
Tracks in-flight drain promises so awaitPendingTraceBatches waits for
the async serialize chain to register with batchIngestCaller.queue
before checking idleness -- otherwise tests using the callback-based
flush timing can race and see 0 fetches.
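The in-flight tracking idea can be sketched roughly as follows. `PendingTracker`, `track`, and `awaitAll` are hypothetical names for illustration; the actual client wires this into `awaitPendingTraceBatches` and `batchIngestCaller.queue` in ways not shown here.

```typescript
// Keeps a set of in-flight promises so a caller can wait for all of them
// (including ones started while waiting) before checking for idleness.
class PendingTracker {
  private pending = new Set<Promise<unknown>>();

  track<T>(p: Promise<T>): Promise<T> {
    this.pending.add(p);
    const cleanup = () => {
      this.pending.delete(p);
    };
    // Remove the promise once it settles, whether it resolved or rejected.
    p.then(cleanup, cleanup);
    return p;
  }

  get size(): number {
    return this.pending.size;
  }

  async awaitAll(): Promise<void> {
    // Loop because settling one drain may have registered another.
    while (this.pending.size > 0) {
      await Promise.allSettled([...this.pending]);
    }
  }
}
```

Under this sketch, a flush would wrap its async serialize chain in `track(...)`, and the idleness check would run only after `awaitAll()` resolves.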
Benchmark (100 createRun, 2.5MB each, 500KB base64 image per message):
  DEFAULT mode:
    createRun p95     4.94ms
    wall time         1169ms
    event loop lag    824ms
  PERF mode (worker):
    createRun p95     0.51ms (9.7x)
    wall time         583ms (2.0x)
    event loop lag    161ms (5.1x)
Tests: 30 existing serialize tests pass; 5 new SerializeWorker tests
cover plain object, well-known types (Date/Map/Set/RegExp/bigint),
DataCloneError, large base64, and circular refs. batch_client.test.ts
failures unchanged (same 8 pre-existing hideMetadata failures on main).
The worker-thread serialization path only pays for itself on payloads dominated by large strings (base64 media, long documents), where V8 can refcount string storage across isolates instead of copying. For structure-heavy payloads -- many keys, deep nesting, lots of small strings -- the structuredClone walk plus thread-hop overhead exceeds the JSON.stringify cost and produced measurable regressions.

Add a short-circuiting, node-budgeted `hasLargeString` walk and gate the worker dispatch on it. Falls through to sync serialize when no large string is found.

Also extend `estimateSerializedSize` to return the longest-string length seen during its existing walk so future callers can use the signal without an extra scan.

End-to-end bench (100 createRun+updateRun, mean of 3 runs):
  base64 (2.5MB images):     2.1x wall / 10x createRun p95 -- preserved
  structure (165KB nested):  ~parity
  wide (192KB / 10k keys):   ~parity
  mixed (60KB LLM):          ~parity
  tiny (<1KB):               ~parity
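One way a short-circuiting, node-budgeted walk like `hasLargeString` could look is sketched below. The signature, the `minLength` and `nodeBudget` parameters, and the cycle guard are assumptions for illustration; the PR's actual thresholds and traversal order are not stated in this description.

```typescript
// Returns true as soon as a string of at least `minLength` characters is found.
// Gives up (returns false) after visiting `nodeBudget` nodes, so structure-heavy
// payloads cost a bounded number of visits before falling through to sync serialize.
function hasLargeString(
  value: unknown,
  minLength: number,
  nodeBudget: number
): boolean {
  const stack: unknown[] = [value];
  const seen = new Set<object>(); // guards against circular references
  let visited = 0;

  while (stack.length > 0) {
    if (visited++ >= nodeBudget) return false; // budget exhausted
    const current = stack.pop();

    if (typeof current === "string") {
      if (current.length >= minLength) return true; // short-circuit
      continue;
    }
    if (current === null || typeof current !== "object") continue;
    if (seen.has(current)) continue;
    seen.add(current);

    const children = Array.isArray(current) ? current : Object.values(current);
    for (const child of children) stack.push(child);
  }
  return false;
}
```

The worker dispatch would then be gated on something like `hasLargeString(payload, threshold, budget)`. Note that this sketch still enumerates all children of a visited container before the budget check applies to them, so a stricter version would cap that loop as well.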
Mirrors the pattern used by prompt_cache / utils/fs: a Node-only
`worker_threads.ts` module re-exports the bits we use (`Worker`,
`WORKER_THREADS_AVAILABLE`) and is swapped for `worker_threads.browser.ts`
(returning `null` / `false`) via the package.json `browser` field.
Replaces the previous runtime detection (typeof require check + dynamic
import("node:worker_threads")) which relied on bundlers to externalize
the `node:` specifier. The new pattern is static and bundler-friendly:
browser / edge builds pick the stub at bundle time and never traverse
the Node variant at all.
David Duong (dqbd)
approved these changes
Apr 24, 2026
JS perf benchmark
Lower is better. Noisy on shared runners; treat as a signal, not a gate.
Base64-heavy payload: Missing results: base=false, pr=true
Structural payload: Missing results: base=false, pr=true