
perf(js): Offload serialize to worker thread at flush time #2781

Merged
Jacob Lee (jacoblee93) merged 12 commits into main from jacob/structuredclone
Apr 24, 2026

Conversation

Collaborator

@jacoblee93 Jacob Lee (jacoblee93) commented Apr 23, 2026

When LANGSMITH_PERF_OPTIMIZATION=true, the flush-time serializePayloadForTracing() call moves to a Node worker_threads worker, removing main-thread event-loop blocking for large (base64-image-heavy) payloads. postMessage transfers the payload via V8's structured clone, which refcounts large strings across isolates -- about 3x cheaper than stringifying on the main thread.

Falls back to synchronous serialization when the flag is off, when running in manualFlushMode, when worker_threads is unavailable (browsers / edge / Deno / etc.), or when the payload contains non-cloneable values (DataCloneError).

Tracks in-flight drain promises so awaitPendingTraceBatches waits for the async serialize chain to register with batchIngestCaller.queue before checking idleness -- otherwise tests using the callback-based flush timing can race and see 0 fetches.
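A minimal sketch of the dispatch described above, assuming hypothetical helper names (`serializeInWorker`, `serializeForFlush`) rather than the SDK's real wiring; the inline worker stands in for the actual serialize worker module:

```typescript
import { Worker } from "node:worker_threads";

// Inline worker: receives the payload via structured clone (cheap for
// large strings, which V8 can share across isolates) and stringifies it
// off the main thread.
const workerSource = `
const { parentPort } = require("node:worker_threads");
parentPort.on("message", (payload) => {
  parentPort.postMessage(JSON.stringify(payload));
});
`;

function serializeInWorker(worker: Worker, payload: unknown): Promise<string> {
  return new Promise((resolve, reject) => {
    worker.once("message", resolve);
    try {
      // postMessage uses the structured clone algorithm; non-cloneable
      // values (e.g. functions) throw a DataCloneError synchronously.
      worker.postMessage(payload);
    } catch (err) {
      worker.removeAllListeners("message");
      reject(err);
    }
  });
}

async function serializeForFlush(worker: Worker, payload: unknown): Promise<string> {
  try {
    return await serializeInWorker(worker, payload);
  } catch {
    // DataCloneError (or any dispatch failure): fall back to the
    // synchronous main-thread serialize path.
    return JSON.stringify(payload);
  }
}

async function main() {
  const worker = new Worker(workerSource, { eval: true });
  console.log(await serializeForFlush(worker, { ok: true }));
  await worker.terminate();
}
main();
```

The key property is that the expensive stringify runs on the worker's event loop, while the main thread only pays the structured-clone cost of `postMessage`.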

    === Event loop benchmark ===
    Mode:                        WORKER OFF (main-thread serialize at flush)
    Runs traced:                 100
    Per-run create payload:      2511.2 KB
    Per-run update payload:      5.2 KB
    Wall time (incl. drain):     1140.5 ms
    
                     total       max       p50       p95       p99
    createRun      381.58      5.99      3.67      4.97      5.99
    updateRun        8.67      0.29      0.08      0.16      0.29
    loop lag       792.38     41.18      0.16      5.33     39.31
    (loop lag monitor: 1ms target, 324 samples > 0)
    === Event loop benchmark ===
    Mode:                        WORKER ON (worker-thread serialize at flush)
    Runs traced:                 100
    Per-run create payload:      2511.2 KB
    Per-run update payload:      5.2 KB
    Wall time (incl. drain):     769.2 ms
    
                     total       max       p50       p95       p99
    createRun      374.57      7.03      3.68      4.78      7.03
    updateRun        8.01      0.30      0.06      0.20      0.30
    loop lag       400.87     17.04      0.16      3.76      5.29
    (loop lag monitor: 1ms target, 333 samples > 0)



Benchmark (100 createRun, 2.5MB each, 500KB base64 image per message):
  DEFAULT mode:
    createRun p95       4.94ms
    wall time        1169ms
    event loop lag    824ms
  PERF mode (worker):
    createRun p95       0.51ms   (9.7x)
    wall time         583ms   (2.0x)
    event loop lag    161ms   (5.1x)

Tests: 30 existing serialize tests pass; 5 new SerializeWorker tests
cover plain object, well-known types (Date/Map/Set/RegExp/bigint),
DataCloneError, large base64, and circular refs. batch_client.test.ts
failures unchanged (same 8 pre-existing hideMetadata failures on main).

The worker-thread serialization path only pays for itself on payloads
dominated by large strings (base64 media, long documents) where V8 can
refcount string storage across isolates instead of copying. For
structure-heavy payloads -- many keys, deep nesting, lots of small
strings -- the structuredClone walk plus thread-hop overhead exceeds
the JSON.stringify cost and produced measurable regressions.

Add a short-circuiting, node-budgeted `hasLargeString` walk and gate
the worker dispatch on it. Falls through to sync serialize when no
large string is found. Also extend `estimateSerializedSize` to return
the longest-string length seen during its existing walk so future
callers can use the signal without an extra scan.
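The gate might look like the following sketch; the function name matches the PR, but the threshold, budget, and exact traversal are assumptions:

```typescript
const LARGE_STRING_CHARS = 100_000; // assumed threshold for "large"
const NODE_BUDGET = 5_000;          // assumed cap on nodes visited

// Short-circuiting, node-budgeted walk: returns true as soon as any
// string at or above the threshold is found. The budget bounds the cost
// on structure-heavy payloads (many keys, deep nesting) and also keeps
// the walk from looping forever on circular references.
function hasLargeString(value: unknown): boolean {
  let budget = NODE_BUDGET;
  const stack: unknown[] = [value];
  while (stack.length > 0 && budget-- > 0) {
    const current = stack.pop();
    if (typeof current === "string") {
      if (current.length >= LARGE_STRING_CHARS) return true; // short-circuit
    } else if (Array.isArray(current)) {
      for (const item of current) stack.push(item);
    } else if (current !== null && typeof current === "object") {
      for (const v of Object.values(current)) stack.push(v);
    }
  }
  // No large string found (or budget exhausted): take the sync path.
  return false;
}
```

A payload dominated by one base64 blob trips the gate on the first few nodes visited, while a wide 10k-key object exhausts the budget and falls through to synchronous serialize.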

End-to-end bench (100 createRun+updateRun, mean of 3 runs):
  base64 (2.5MB images): 2.1x wall / 10x createRun p95 -- preserved
  structure (165KB nested): ~parity
  wide (192KB / 10k keys): ~parity
  mixed (60KB LLM): ~parity
  tiny (<1KB): ~parity

Mirrors the pattern used by prompt_cache / utils/fs: a Node-only
`worker_threads.ts` module re-exports the bits we use (`Worker`,
`WORKER_THREADS_AVAILABLE`) and is swapped for `worker_threads.browser.ts`
(returning `null` / `false`) via the package.json `browser` field.

Replaces the previous runtime detection (typeof require check + dynamic
import("node:worker_threads")) which relied on bundlers to externalize
the `node:` specifier. The new pattern is static and bundler-friendly:
browser / edge builds pick the stub at bundle time and never traverse
the Node variant at all.
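In sketch form (file names from the PR; the package.json mapping shape is an assumption, not copied from the repo):

```typescript
// utils/worker_threads.browser.ts -- the stub picked by browser/edge
// bundles at bundle time:
export const Worker = null;
export const WORKER_THREADS_AVAILABLE = false;

// utils/worker_threads.ts (the Node variant) would instead contain:
//   export { Worker } from "node:worker_threads";
//   export const WORKER_THREADS_AVAILABLE = true;

// package.json (assumed shape of the mapping):
//   "browser": {
//     "./dist/utils/worker_threads.js": "./dist/utils/worker_threads.browser.js"
//   }
```

Callers check `WORKER_THREADS_AVAILABLE` before dispatching to the worker, so non-Node builds statically resolve to the stub and never reference the `node:` specifier.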
Base automatically changed from jacob/perf to main April 24, 2026 01:41
@github-actions

JS perf benchmark

Lower is better. Noisy on shared runners — treat as a signal, not a gate.

Base64-heavy payload

Missing results: base=false, pr=true

Structural payload

Missing results: base=false, pr=true

@jacoblee93 Jacob Lee (jacoblee93) merged commit 7557ce9 into main Apr 24, 2026
32 checks passed
@jacoblee93 Jacob Lee (jacoblee93) deleted the jacob/structuredclone branch April 24, 2026 18:05