Investigate alternatives to fixed 5ms pacing in the macOS multiline PTY workaround #300762

@jcansdale

Description

Context

This is a follow-up issue for the workaround discussion raised in @Jarred-Sumner's post:

Related VS Code work:

The original bug is real: on macOS, multiline PTY input can get corrupted or stuck when the shell is fed a large multiline payload through VS Code's terminal pipeline. The current PR fixes one concrete VS Code-side bug by making the macOS multiline chunking gate UTF-8 byte-aware instead of relying on JS string length.

This issue is about the remaining design question: whether the current fixed 5 ms pacing heuristic is acceptable, and what viable alternatives exist.

What We Investigated

We looked at two separate concerns raised in the discussion:

  1. Whether the current workaround's 5 ms sleep between chunks is too heuristic, non-deterministic, or slow.
  2. Whether short writes or retries in the lower-level write path might actually be the main issue.

1. UTF-8 predicate mismatch in VS Code

We confirmed a real VS Code-side bug in the current macOS multiline gate:

  • The old gate used JS string length.
  • The PTY limit that matters here is UTF-8 bytes.
  • Multibyte payloads can stay under the UTF-16 gate while exceeding the relevant UTF-8 threshold.

That is now covered by a regression test in the PR and fixed locally by switching the gate and chunk splitting to UTF-8 byte semantics.
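The fix pattern can be sketched as follows. This is an illustrative sketch, not VS Code's actual implementation: `CHUNK_LIMIT_BYTES`, `exceedsPtyLimit`, and `chunkByBytes` are hypothetical names, and the limit value is made up.

```typescript
// Hypothetical sketch: gate on UTF-8 bytes rather than UTF-16 code units.
// CHUNK_LIMIT_BYTES is an illustrative constant, not VS Code's real threshold.
const CHUNK_LIMIT_BYTES = 256;

function exceedsPtyLimit(data: string): boolean {
  // Buffer.byteLength counts UTF-8 bytes; data.length counts UTF-16 code units.
  return Buffer.byteLength(data, 'utf8') > CHUNK_LIMIT_BYTES;
}

// Split on code points so a multibyte character never straddles a chunk boundary.
function* chunkByBytes(data: string, limitBytes: number): Generator<string> {
  let current = '';
  let bytes = 0;
  for (const ch of data) { // for..of iterates by code point, not code unit
    const b = Buffer.byteLength(ch, 'utf8');
    if (bytes + b > limitBytes && current) {
      yield current;
      current = '';
      bytes = 0;
    }
    current += ch;
    bytes += b;
  }
  if (current) yield current;
}

// A multibyte payload can pass a length gate while failing the byte gate:
const payload = '€'.repeat(200);                  // '€' is 1 code unit but 3 UTF-8 bytes
console.log(payload.length);                      // 200
console.log(Buffer.byteLength(payload, 'utf8'));  // 600
console.log(exceedsPtyLimit(payload));            // true
```

The same payload would sail through a `data.length > 256` gate, which is exactly the mismatch the regression test covers.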

2. Lower-level partial write and retry handling

We re-read the node-pty Unix write path and did not find evidence that this specific repro is simply explained by missing short-write handling in VS Code.

At a high level, upstream node-pty already appears to:

  • queue writes,
  • handle partial writes with offsets,
  • retry on EAGAIN.
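
The pattern those three bullets describe looks roughly like the sketch below. This is not node-pty's actual code; `WriteFn`, `drainQueue`, and `scheduleRetry` are illustrative stand-ins for whatever the real write path uses.

```typescript
// Illustrative queue/offset/EAGAIN pattern, not node-pty's implementation.
// A WriteFn returns the number of bytes written and throws an error with
// code 'EAGAIN' when the kernel buffer is full.
type WriteFn = (buf: Buffer) => number;

function drainQueue(
  queue: Buffer[],
  write: WriteFn,
  scheduleRetry: (fn: () => void) => void
): void {
  while (queue.length > 0) {
    const buf = queue[0];
    let n: number;
    try {
      n = write(buf);
    } catch (err: any) {
      if (err.code === 'EAGAIN') {
        // Kernel buffer full: retry later without losing queued data.
        scheduleRetry(() => drainQueue(queue, write, scheduleRetry));
        return;
      }
      throw err;
    }
    if (n < buf.length) {
      // Short write: keep the unwritten tail at the head of the queue.
      queue[0] = buf.subarray(n);
    } else {
      queue.shift();
    }
  }
}
```

If upstream already does all of this, data loss from a missed short write is an unlikely explanation for the repro, which is why we lean toward the backpressure theory below.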

So this still looks like a separate issue from the VS Code-side UTF-16-vs-UTF-8 predicate bug and the canonical-mode or line-editor backpressure behavior seen on macOS.

That said, this area could still use a sharper audit from people who know the PTY stack better.

What We Tried

Baseline fixed-delay workaround

Existing local findings from the macOS multiline workaround investigation:

  • timeout(0) failed the medium and large multiline tests.
  • timeout(1) passed the medium case but still failed larger cases.
  • timeout(5) passed the existing 500-line local probe.
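
For reference, the fixed-delay workaround reduces to something like this sketch. `writeMultilinePaced` and `writeChunk` are hypothetical names; only the 5 ms figure comes from the findings above.

```typescript
// Minimal sketch of fixed-delay pacing, assuming per-line chunks.
// PACING_DELAY_MS mirrors the timeout(5) finding above; writeChunk is a
// stand-in for the real PTY write.
const PACING_DELAY_MS = 5;

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function writeMultilinePaced(
  lines: string[],
  writeChunk: (s: string) => void
): Promise<void> {
  for (const line of lines) {
    writeChunk(line + '\n');
    // Give the canonical-mode line editor time to drain before the next line.
    await sleep(PACING_DELAY_MS);
  }
}
```

The obvious cost is latency: at 5 ms per line, a 1000-line paste pays roughly 5 seconds of pacing, which is the user-visible concern driving this issue.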

Event-driven pacing spike

We spiked a replacement for the fixed 5 ms delay in TerminalProcess.input():

  • chunk multiline writes,
  • wait for PTY output activity after each chunk,
  • then wait for a short quiet period before sending the next chunk.

We also tried a few variants:

  • wait for the first PTY activity event only,
  • use a 1 ms quiet window after activity,
  • use 0 ms,
  • use a next-turn yield (setTimeout0) instead of 1 ms.

Results

Small regression coverage

The event-driven spike with a 1 ms quiet window looked promising against the current small regression coverage:

  • medium ASCII multiline case passed repeatedly,
  • multibyte UTF-8 regression case passed repeatedly.

Larger payload stress

The same spike did not hold up under larger standalone ASCII payload probes:

  • 100 lines: 0/3 passes,
  • 500 lines: 0/3 passes,
  • 1000 lines: 0/3 passes,
  • 1500 lines: 0/3 passes,
  • 1660 lines: 0/3 passes.

Failure modes included:

  • no output file produced at all,
  • corrupted tail content rather than a clean stall.

The 0 ms and setTimeout0 variants were both too aggressive and regressed even the medium ASCII case.

Secondary issue surfaced by the spike

The event-driven spike also exposed an implementation problem in standalone Node-based probing:

  • the spike's internal timeout(...).cancel() path generated repeated unhandled Canceled promise rejections when run under plain Node 23,
  • the existing Mocha tests did not surface this, but the standalone harness did.

So even setting aside the larger-payload regressions, that spike implementation would need cleanup before it could be considered further.
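
One cleanup direction is to make cancellation resolve the pending delay instead of rejecting it, so a cancelled wait is a normal control-flow outcome rather than an unhandled rejection. A hedged sketch, not VS Code's actual `timeout` utility:

```typescript
// A cancellable delay whose cancellation resolves (to 'cancelled') instead of
// rejecting, avoiding the unhandled Canceled rejections seen in the probe.
function cancellableDelay(ms: number): {
  promise: Promise<'done' | 'cancelled'>;
  cancel: () => void;
} {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let settle!: (v: 'done' | 'cancelled') => void;
  const promise = new Promise<'done' | 'cancelled'>(resolve => {
    settle = resolve;
    timer = setTimeout(() => resolve('done'), ms);
  });
  return {
    promise,
    cancel: () => {
      if (timer) clearTimeout(timer);
      settle('cancelled'); // resolve, not reject: callers branch on the value
    },
  };
}
```

Callers then check the resolved value (`'done'` vs `'cancelled'`) rather than wrapping every await in a try/catch for a cancellation error.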

Current Conclusion

What seems justified right now:

  • Keep the UTF-8 byte-aware fix. That addresses a real bug.
  • Do not assume the event-driven pacing spike is a drop-in replacement for the current fixed delay.
  • Do not assume 0 ms, yield, or setTimeout0 is sufficient. We tried those and they regressed.

At the moment, the fixed-delay workaround still seems to be the only local approach that survives the larger payload probes we ran.

Open Questions

  1. Is there a better PTY-level signal than a fixed sleep for knowing when canonical-mode input is safe to continue writing on macOS?
  2. Is the remaining problem fundamentally shell line-editor backpressure, or is there still a lower-level write-path detail we are missing?
  3. Is there a better mitigation than a flat per-chunk delay, for example adaptive pacing based on observed output or buffer state?
  4. How much of the observed failure boundary is true PTY behavior versus scheduling noise on macOS?
  5. How should we think about the tradeoff between a narrow macOS-only workaround and the user-visible latency concerns Jarred raised?

If people familiar with macOS PTYs, shells in canonical mode, or node-pty internals want to weigh in, that would be useful.

Metadata

Labels

insiders-released: Patch has been released in VS Code Insiders
