
perf(js): Offload serialize to worker thread at flush time #2781

Merged
Jacob Lee (jacoblee93) merged 12 commits into main from jacob/structuredclone
Apr 24, 2026

Conversation

Collaborator

@jacoblee93 Jacob Lee (jacoblee93) commented Apr 23, 2026

When LANGSMITH_PERF_OPTIMIZATION=true, the flush-time serializePayloadForTracing() call moves to a Node worker_threads worker, removing main-thread event-loop blocking for large (base64-image-heavy) payloads. postMessage transfers the payload via V8's structured clone, which refcounts large strings across isolates -- about 3x cheaper than stringifying on the main thread.

Falls back to synchronous serialization when the flag is off, when running in manualFlushMode, when worker_threads is unavailable (browsers / edge / Deno / etc.), or when the payload contains non-cloneable values (DataCloneError).

Tracks in-flight drain promises so awaitPendingTraceBatches waits for the async serialize chain to register with batchIngestCaller.queue before checking idleness -- otherwise tests using the callback-based flush timing can race and see 0 fetches.
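A minimal sketch of the dispatch described above, assuming hypothetical helper names (`serializeInWorker`, `serializeForFlush`) rather than the SDK's real wiring; the inline worker stands in for the actual serialize worker module:

```typescript
import { Worker } from "node:worker_threads";

// Inline worker: receives the payload via structured clone (cheap for
// large strings, which V8 can share across isolates) and stringifies it
// off the main thread.
const workerSource = `
const { parentPort } = require("node:worker_threads");
parentPort.on("message", (payload) => {
  parentPort.postMessage(JSON.stringify(payload));
});
`;

function serializeInWorker(worker: Worker, payload: unknown): Promise<string> {
  return new Promise((resolve, reject) => {
    worker.once("message", resolve);
    try {
      // postMessage uses the structured clone algorithm; non-cloneable
      // values (e.g. functions) throw a DataCloneError synchronously.
      worker.postMessage(payload);
    } catch (err) {
      worker.removeAllListeners("message");
      reject(err);
    }
  });
}

async function serializeForFlush(worker: Worker, payload: unknown): Promise<string> {
  try {
    return await serializeInWorker(worker, payload);
  } catch {
    // DataCloneError (or any dispatch failure): fall back to the
    // synchronous main-thread serialize path.
    return JSON.stringify(payload);
  }
}

async function main() {
  const worker = new Worker(workerSource, { eval: true });
  console.log(await serializeForFlush(worker, { ok: true }));
  await worker.terminate();
}
main();
```

The key property is that the expensive stringify runs on the worker's event loop, while the main thread only pays the structured-clone cost of `postMessage`.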

    === Event loop benchmark ===
    Mode:                        WORKER OFF (main-thread serialize at flush)
    Runs traced:                 100
    Per-run create payload:      2511.2 KB
    Per-run update payload:      5.2 KB
    Wall time (incl. drain):     1140.5 ms
    
                     total       max       p50       p95       p99
    createRun      381.58      5.99      3.67      4.97      5.99
    updateRun        8.67      0.29      0.08      0.16      0.29
    loop lag       792.38     41.18      0.16      5.33     39.31
    (loop lag monitor: 1ms target, 324 samples > 0)
    === Event loop benchmark ===
    Mode:                        WORKER ON (worker-thread serialize at flush)
    Runs traced:                 100
    Per-run create payload:      2511.2 KB
    Per-run update payload:      5.2 KB
    Wall time (incl. drain):     769.2 ms
    
                     total       max       p50       p95       p99
    createRun      374.57      7.03      3.68      4.78      7.03
    updateRun        8.01      0.30      0.06      0.20      0.30
    loop lag       400.87     17.04      0.16      3.76      5.29
    (loop lag monitor: 1ms target, 333 samples > 0)



Benchmark (100 createRun, 2.5MB each, 500KB base64 image per message):
  DEFAULT mode:
    createRun p95       4.94ms
    wall time        1169ms
    event loop lag    824ms
  PERF mode (worker):
    createRun p95       0.51ms   (9.7x)
    wall time         583ms   (2.0x)
    event loop lag    161ms   (5.1x)

Tests: 30 existing serialize tests pass; 5 new SerializeWorker tests
cover plain object, well-known types (Date/Map/Set/RegExp/bigint),
DataCloneError, large base64, and circular refs. batch_client.test.ts
failures unchanged (same 8 pre-existing hideMetadata failures on main).

The worker-thread serialization path only pays for itself on payloads
dominated by large strings (base64 media, long documents) where V8 can
refcount string storage across isolates instead of copying. For
structure-heavy payloads -- many keys, deep nesting, lots of small
strings -- the structuredClone walk plus thread-hop overhead exceeds
the JSON.stringify cost and produced measurable regressions.

Add a short-circuiting, node-budgeted `hasLargeString` walk and gate
the worker dispatch on it. Falls through to sync serialize when no
large string is found. Also extend `estimateSerializedSize` to return
the longest-string length seen during its existing walk so future
callers can use the signal without an extra scan.
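The gate might look like the following sketch; the function name matches the PR, but the threshold, budget, and exact traversal are assumptions:

```typescript
const LARGE_STRING_CHARS = 100_000; // assumed threshold for "large"
const NODE_BUDGET = 5_000;          // assumed cap on nodes visited

// Short-circuiting, node-budgeted walk: returns true as soon as any
// string at or above the threshold is found. The budget bounds the cost
// on structure-heavy payloads (many keys, deep nesting) and also keeps
// the walk from looping forever on circular references.
function hasLargeString(value: unknown): boolean {
  let budget = NODE_BUDGET;
  const stack: unknown[] = [value];
  while (stack.length > 0 && budget-- > 0) {
    const current = stack.pop();
    if (typeof current === "string") {
      if (current.length >= LARGE_STRING_CHARS) return true; // short-circuit
    } else if (Array.isArray(current)) {
      for (const item of current) stack.push(item);
    } else if (current !== null && typeof current === "object") {
      for (const v of Object.values(current)) stack.push(v);
    }
  }
  // No large string found (or budget exhausted): take the sync path.
  return false;
}
```

A payload dominated by one base64 blob trips the gate on the first few nodes visited, while a wide 10k-key object exhausts the budget and falls through to synchronous serialize.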

End-to-end bench (100 createRun+updateRun, mean of 3 runs):
  base64 (2.5MB images): 2.1x wall / 10x createRun p95 -- preserved
  structure (165KB nested): ~parity
  wide (192KB / 10k keys): ~parity
  mixed (60KB LLM): ~parity
  tiny (<1KB): ~parity

Mirrors the pattern used by prompt_cache / utils/fs: a Node-only
`worker_threads.ts` module re-exports the bits we use (`Worker`,
`WORKER_THREADS_AVAILABLE`) and is swapped for `worker_threads.browser.ts`
(returning `null` / `false`) via the package.json `browser` field.

Replaces the previous runtime detection (typeof require check + dynamic
import("node:worker_threads")) which relied on bundlers to externalize
the `node:` specifier. The new pattern is static and bundler-friendly:
browser / edge builds pick the stub at bundle time and never traverse
the Node variant at all.
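In sketch form (file names from the PR; the package.json mapping shape is an assumption, not copied from the repo):

```typescript
// utils/worker_threads.browser.ts -- the stub picked by browser/edge
// bundles at bundle time:
export const Worker = null;
export const WORKER_THREADS_AVAILABLE = false;

// utils/worker_threads.ts (the Node variant) would instead contain:
//   export { Worker } from "node:worker_threads";
//   export const WORKER_THREADS_AVAILABLE = true;

// package.json (assumed shape of the mapping):
//   "browser": {
//     "./dist/utils/worker_threads.js": "./dist/utils/worker_threads.browser.js"
//   }
```

Callers check `WORKER_THREADS_AVAILABLE` before dispatching to the worker, so non-Node builds statically resolve to the stub and never reference the `node:` specifier.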
Base automatically changed from jacob/perf to main April 24, 2026 01:41
@github-actions

JS perf benchmark

Lower is better. Noisy on shared runners — treat as a signal, not a gate.

Base64-heavy payload

Missing results: base=false, pr=true

Structural payload

Missing results: base=false, pr=true

@jacoblee93 Jacob Lee (jacoblee93) merged commit 7557ce9 into main Apr 24, 2026
32 checks passed
@jacoblee93 Jacob Lee (jacoblee93) deleted the jacob/structuredclone branch April 24, 2026 18:05