Investigate alternatives to fixed 5ms pacing in the macOS multiline PTY workaround #300762
Description
Context
This is a follow-up issue for the workaround discussion raised in @Jarred-Sumner's post:
Related VS Code work:
- Original PR: fix: chunk multiline PTY writes on macOS to avoid 1024-byte buffer corruption #298993
- Existing bug: Terminal tool corrupts multiline commands exceeding 1024 bytes #296955
- Current PR: WIP: make macOS multiline PTY chunking byte-aware #300740
The original bug is real: on macOS, multiline PTY input can get corrupted or stuck when the shell is fed a large multiline payload through VS Code's terminal pipeline. The current PR fixes one concrete VS Code-side bug by making the macOS multiline chunking gate UTF-8 byte-aware instead of relying on JS string length.
This issue is about the remaining design question: whether the current fixed 5 ms pacing heuristic is acceptable, and what viable alternatives exist.
What We Investigated
We looked at two separate concerns raised in the discussion:
- Whether the current workaround's 5 ms sleep between chunks is too heuristic, non-deterministic, or slow.
- Whether short writes or retries in the lower-level write path might actually be the main issue.
1. UTF-8 predicate mismatch in VS Code
We confirmed a real VS Code-side bug in the current macOS multiline gate:
- The old gate used JS string length.
- The PTY limit that matters here is UTF-8 bytes.
- Multibyte payloads can stay under the UTF-16 gate while exceeding the relevant UTF-8 threshold.
That is now covered by a regression test in the PR and fixed locally by switching the gate and chunk splitting to UTF-8 byte semantics.
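To make the mismatch concrete, here is a minimal sketch. It is not the code from the PR: `utf8ByteLength` and `chunkByUtf8Bytes` are hypothetical names, and the 1024-byte limit is taken from the original bug title purely for illustration.

```typescript
// Hypothetical helpers for illustration; not VS Code's actual implementation.

/** UTF-8 byte length of a JS string (string.length counts UTF-16 code units). */
function utf8ByteLength(text: string): number {
  return Buffer.byteLength(text, 'utf8');
}

/**
 * Splits text into chunks of at most maxBytes UTF-8 bytes each,
 * without cutting through a surrogate pair.
 */
function chunkByUtf8Bytes(text: string, maxBytes: number): string[] {
  const chunks: string[] = [];
  let start = 0; // index where the current chunk begins
  let bytes = 0; // UTF-8 bytes accumulated in the current chunk
  let i = 0;
  while (i < text.length) {
    const code = text.charCodeAt(i);
    // A high surrogate followed by a low surrogate forms one code point.
    const isPair =
      code >= 0xd800 && code <= 0xdbff && i + 1 < text.length &&
      text.charCodeAt(i + 1) >= 0xdc00 && text.charCodeAt(i + 1) <= 0xdfff;
    const units = isPair ? 2 : 1;
    const size = utf8ByteLength(text.slice(i, i + units));
    if (bytes + size > maxBytes && i > start) {
      chunks.push(text.slice(start, i));
      start = i;
      bytes = 0;
    }
    bytes += size;
    i += units;
  }
  if (start < text.length) {
    chunks.push(text.slice(start));
  }
  return chunks;
}

// 600 two-byte characters: under 1024 by string length, over 1024 in UTF-8 bytes.
const sample = '\u00e9'.repeat(600);
console.log(sample.length, utf8ByteLength(sample)); // 600 1200
console.log(chunkByUtf8Bytes(sample, 1024).map(c => utf8ByteLength(c))); // [ 1024, 176 ]
```

A gate that compares `sample.length` against 1024 would skip chunking for this payload even though it is 1200 bytes on the wire.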
2. Lower-level partial write and retry handling
We re-read the node-pty Unix write path and did not find evidence that this specific repro is simply explained by missing short-write handling in VS Code.
At a high level, upstream node-pty already appears to:
- queue writes,
- handle partial writes with offsets,
- retry on EAGAIN.
So the short-write question still looks separate from both the VS Code-side UTF-16-vs-UTF-8 predicate bug and the canonical-mode or line-editor backpressure behavior seen on macOS.
That said, this area could still use a sharper audit from people who know the PTY stack better.
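For readers less familiar with that pattern, the sketch below shows the general shape of "queue writes, track an offset for partial writes, retry on EAGAIN". It is an illustration under stated assumptions, not node-pty's actual code: `WriteQueue`, `TryWrite`, and `makeFlakyWriter` are hypothetical names, and the file descriptor is simulated so the example runs without a PTY.

```typescript
// Hypothetical sketch of queue + offset + EAGAIN retry; not node-pty's real implementation.
type WriteResult = { written: number } | { error: 'EAGAIN' };
type TryWrite = (data: Buffer, offset: number) => WriteResult;

interface WriteTask {
  buffer: Buffer;
  offset: number; // bytes of this buffer already accepted by the descriptor
}

class WriteQueue {
  private queue: WriteTask[] = [];
  private tryWrite: TryWrite;

  constructor(tryWrite: TryWrite) {
    this.tryWrite = tryWrite;
  }

  enqueue(data: string): void {
    this.queue.push({ buffer: Buffer.from(data, 'utf8'), offset: 0 });
  }

  /** Makes one write attempt. Returns true while work remains. */
  step(): boolean {
    const task = this.queue[0];
    if (!task) {
      return false;
    }
    const result = this.tryWrite(task.buffer, task.offset);
    if ('error' in result) {
      return true; // EAGAIN: keep the offset and retry on a later step
    }
    task.offset += result.written; // partial write: advance, do not resend
    if (task.offset >= task.buffer.length) {
      this.queue.shift();
    }
    return this.queue.length > 0;
  }
}

// Simulated descriptor: accepts at most 4 bytes per call, EAGAIN on every other call.
function makeFlakyWriter(sink: number[]): TryWrite {
  let calls = 0;
  return (data, offset) => {
    calls++;
    if (calls % 2 === 0) {
      return { error: 'EAGAIN' };
    }
    const end = Math.min(offset + 4, data.length);
    for (let i = offset; i < end; i++) {
      sink.push(data[i]);
    }
    return { written: end - offset };
  };
}

const received: number[] = [];
const queue = new WriteQueue(makeFlakyWriter(received));
queue.enqueue('echo one\n');
queue.enqueue('echo two\n');
while (queue.step()) {
  // A real implementation would yield to the event loop between attempts.
}
console.log(Buffer.from(received).toString('utf8') === 'echo one\necho two\n');
```

The point of the sketch is that a write path of this shape delivers every byte in order despite short writes and EAGAIN, which is why missing short-write handling alone does not obviously explain the repro.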
What We Tried
Baseline fixed-delay workaround
Existing local findings from the macOS multiline workaround investigation:
- timeout(0) failed the medium and large multiline tests.
- timeout(1) passed the medium case but still failed larger cases.
- timeout(5) passed the existing 500-line local probe.
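The fixed-delay approach being compared here can be sketched roughly as follows. This is a simplified illustration, not the code in TerminalProcess.input(): `writePaced`, `Sleep`, and `minAddedLatencyMs` are hypothetical names, and the injected `sleep` parameter exists only so the pacing can be tested without real timers.

```typescript
// Hypothetical sketch of fixed-delay pacing; names are illustrative only.
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = ms => new Promise<void>(resolve => setTimeout(resolve, ms));

/** Writes chunks in order, sleeping delayMs between consecutive chunks. */
async function writePaced(
  chunks: string[],
  write: (chunk: string) => void,
  delayMs: number,
  sleep: Sleep = realSleep
): Promise<void> {
  for (let i = 0; i < chunks.length; i++) {
    write(chunks[i]);
    if (i < chunks.length - 1) {
      await sleep(delayMs); // no sleep needed after the final chunk
    }
  }
}

/** Lower bound on latency added by pacing: one delay per gap between chunks. */
function minAddedLatencyMs(chunkCount: number, delayMs: number): number {
  return Math.max(0, chunkCount - 1) * delayMs;
}

console.log(minAddedLatencyMs(100, 5)); // 495
```

Under this model a payload split into 100 chunks pays at least 495 ms of added latency at 5 ms per gap, which is the user-visible cost the pacing discussion is about.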
Event-driven pacing spike
We spiked a replacement for the fixed 5 ms delay in TerminalProcess.input():
- chunk multiline writes,
- wait for PTY output activity after each chunk,
- then wait for a short quiet period before sending the next chunk.
We also tried a few variants:
- wait for the first PTY activity event only,
- use a 1 ms quiet window after activity,
- use 0 ms,
- use a next-turn yield (setTimeout0) instead of 1 ms.
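The "wait for activity, then wait for quiet" idea can be sketched as below. This is a reconstruction of the general technique, not the spike itself: `SubscribeToOutput`, `waitForActivityThenQuiet`, and `writeEventPaced` are hypothetical names, and the `maxWaitMs` fallback is an assumption added here so the sketch cannot hang if the shell produces no output.

```typescript
// Hypothetical sketch of event-driven pacing; not the actual spike code.
type Unsubscribe = () => void;
type SubscribeToOutput = (listener: () => void) => Unsubscribe;

/**
 * Resolves once output activity has been seen and then stayed quiet for quietMs,
 * or once maxWaitMs has elapsed (assumed fallback), whichever comes first.
 */
function waitForActivityThenQuiet(
  subscribe: SubscribeToOutput,
  quietMs: number,
  maxWaitMs: number
): Promise<void> {
  return new Promise<void>(resolve => {
    let quietTimer: ReturnType<typeof setTimeout> | undefined;
    let unsubscribe: Unsubscribe = () => {};
    const finish = () => {
      clearTimeout(quietTimer);
      clearTimeout(deadline);
      unsubscribe();
      resolve();
    };
    const deadline = setTimeout(finish, maxWaitMs);
    unsubscribe = subscribe(() => {
      // Every output event restarts the quiet window.
      clearTimeout(quietTimer);
      quietTimer = setTimeout(finish, quietMs);
    });
  });
}

/** Writes each chunk, then waits for the PTY to react and settle before the next. */
async function writeEventPaced(
  chunks: string[],
  write: (chunk: string) => void,
  subscribe: SubscribeToOutput,
  quietMs: number,
  maxWaitMs: number
): Promise<void> {
  for (const chunk of chunks) {
    // Subscribe before writing so output triggered by this chunk is not missed.
    const settled = waitForActivityThenQuiet(subscribe, quietMs, maxWaitMs);
    write(chunk);
    await settled;
  }
}
```

Note that this design treats output activity as a proxy for "the shell consumed the chunk", which is an assumption rather than a guarantee.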
Results
Small regression coverage
The event-driven spike with a 1 ms quiet window looked promising against the current small regression coverage:
- medium ASCII multiline case passed repeatedly,
- multibyte UTF-8 regression case passed repeatedly.
Larger payload stress
The same spike did not hold up under larger standalone ASCII payload probes:
- 100 lines: 0/3 passes,
- 500 lines: 0/3 passes,
- 1000 lines: 0/3 passes,
- 1500 lines: 0/3 passes,
- 1660 lines: 0/3 passes.
Failure modes included:
- no output file produced at all,
- corrupted tail content rather than a clean stall.
The 0 ms and setTimeout0 variants were both too aggressive and regressed even the medium ASCII case.
Secondary issue surfaced by the spike
The event-driven spike also exposed an implementation problem in standalone Node-based probing:
- canceling the internal timeout(...).cancel() path generated repeated unhandled Canceled promise rejections in plain Node 23 runs,
- the existing Mocha tests did not surface this, but the standalone harness did.
So even setting aside the larger-payload regressions, that spike implementation would need cleanup before it could be considered further.
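The rejection problem can be reproduced in miniature. The sketch below assumes a cancelable timeout whose cancel() rejects the pending promise with a Canceled error, which is consistent with the rejections observed; `cancelableTimeout` and `ignoreCanceled` are hypothetical names and not the helpers used in the spike.

```typescript
// Hypothetical sketch; models a timeout whose cancel() rejects with "Canceled".
interface CancelableTimeout {
  promise: Promise<void>;
  cancel(): void;
}

function cancelableTimeout(ms: number): CancelableTimeout {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let rejectFn: (err: Error) => void = () => {};
  const promise = new Promise<void>((resolve, reject) => {
    rejectFn = reject;
    timer = setTimeout(resolve, ms);
  });
  return {
    promise,
    cancel() {
      clearTimeout(timer);
      const err = new Error('Canceled');
      err.name = 'Canceled';
      rejectFn(err); // unhandled unless some caller attaches a rejection handler
    }
  };
}

/** Swallows only the Canceled rejection; any other error is rethrown. */
function ignoreCanceled(promise: Promise<void>): Promise<void> {
  return promise.catch(err => {
    if (!err || err.name !== 'Canceled') {
      throw err;
    }
  });
}

const quietWindow = cancelableTimeout(1000);
const handled = ignoreCanceled(quietWindow.promise); // handler attached before cancel
quietWindow.cancel(); // new activity arrived, so the quiet window is restarted
handled.then(() => console.log('cancel handled without an unhandled rejection'));
```

If the `ignoreCanceled` wrapper is omitted and nothing else handles the promise, plain Node reports an unhandled rejection on each cancel, which may explain why a standalone harness surfaces the problem.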
Current Conclusion
What seems justified right now:
- Keep the UTF-8 byte-aware fix. That addresses a real bug.
- Do not assume the event-driven pacing spike is a drop-in replacement for the current fixed delay.
- Do not assume 0 ms, yield, or setTimeout0 is sufficient. We tried those and they regressed.
At the moment, the fixed-delay workaround still seems to be the only local approach that survives the larger payload probes we ran.
Open Questions
- Is there a better PTY-level signal than a fixed sleep for knowing when canonical-mode input is safe to continue writing on macOS?
- Is the remaining problem fundamentally shell line-editor backpressure, or is there still a lower-level write-path detail we are missing?
- Is there a better mitigation than a flat per-chunk delay, for example adaptive pacing based on observed output or buffer state?
- How much of the observed failure boundary is true PTY behavior versus scheduling noise on macOS?
- How should we think about the tradeoff between a narrow macOS-only workaround and the user-visible latency concerns Jarred raised?
If people familiar with macOS PTYs, shells in canonical mode, or node-pty internals want to weigh in, that would be useful.