fix(renderer): YAML/JSON/diff code blocks lose newlines (#1618 / #1463)#1642
fix(renderer): YAML/JSON/diff code blocks lose newlines (#1618 / #1463)#16421 commit merged intomasterfrom
Conversation
Closes #1618 (reported by @Zixim) and corrects #1463's previous fix. Bug: YAML, JSON, and diff/patch fenced code blocks render flattened to a single line. Reporter noted the bug persisted v0.50.279 -> v0.50.291 -> v0.50.292 despite PR #1516's CSS-only "fix". Root cause: PR #484 (v0.50.237) added a JSON/YAML tree-viewer that routes those languages through <div class="code-tree-wrap">...<pre class="tree-raw-view"> instead of bare <pre>. Same release added the diff/patch coloring path that emits <pre class="diff-block">. The _pre_stash regex at static/ui.js:1914 matched only literal <pre> with no attributes: <pre>[\s\S]*?<\/pre> Both new shapes failed to match, fell through to the paragraph-wrap pass, and \n characters inside the code blocks got replaced with <br> tags inside <code>. By the time Prism ran, no newlines remained for the CSS rule (PR #1516, language-yaml .token { white-space: pre !important }) to preserve. Fix: relax the regex to accept any attribute on <pre>: <pre>[\s\S]*?<\/pre> -> <pre[^>]*>[\s\S]*?<\/pre> One regex character. Pulls JSON, YAML, and diff/patch blocks into the stash so paragraph-wrap can't mangle them. Bash, Python, Go, etc. were never affected because they emit bare <pre>. Tests: 9 new (2 source-string invariants + 7 behavioural via node-driver against the actual static/ui.js renderMd()). 6 of the 7 behavioural tests fail on master and pass with the fix; the 3 sanity checks (yml-alias, bash, mermaid) pass on both. Plus widened source-scan window in 3 pre-existing test_745 assertions from 400 to 1500 chars. The new comment block above the fixed regex pushed it past the previous scan window. Pure window-narrowness bug, not a behavior regression. 4245 -> 4254 passing.
nesquena
left a comment
There was a problem hiding this comment.
Review — end-to-end ✅ (clean approve, behavioural harness confirms regex match)
Self-built PR closing #1618 (reported by @Zixim) and superseding #1463's prior CSS-only attempt at #1516 (v0.50.279). YAML, JSON, and diff/patch fenced code blocks have been rendering flattened to a single line since v0.50.237 — the previous CSS-only fix targeted the wrong layer (Prism token white-space) when the bug actually lived earlier in the regex pipeline. One-character regex relax in static/ui.js.
What this ships
- static/ui.js:1914-1925 —
_pre_stashregex relaxed from<pre>to<pre[^>]*>. Pulls JSON's<pre class="tree-raw-view">and diff's<pre class="diff-block">(both introduced by #484 / v0.50.237) into the stash so the paragraph-wrap pass can't replace their\nwith<br>. Bash/Python/Go (bare<pre>) unaffected — same regex still matches them. - tests/test_issue1618_yaml_json_diff_newline_preserve.py — 9 new regression tests (2 source-string + 7 behavioral via node-driver against actual
static/ui.js renderMd()). - tests/test_745_code_block_newlines.py — three pre-existing window-scan assertions widened from 400 → 1500 chars (the new explanatory comment block pushed the regex past the previous scan window —
pytest-pitfalls§ D pattern).
Traced against upstream hermes-agent
Webui-only — renderMd() is the frontend's markdown renderer; the agent doesn't render markdown server-side. No cross-tool surface. ✅
Behavioural harness — regex match verification
Ran old vs new regex against the five <pre>/<div> shapes that flow through _pre_stash:
OLD regex / NEW regex matches:
✓ / ✓ bare <pre> (bash, python)
✗ / ✓ <pre class=tree-raw-view> (yaml/json) ← the bug
✗ / ✓ <pre class=diff-block> (diff/patch) ← the bug
✓ / ✓ <div class=pre-header><pre>
✓ / ✓ <div class=mermaid-block>
Confirms:
- Pre-fix: YAML/JSON tree-viewer (
<pre class="tree-raw-view">) and diff/patch coloring (<pre class="diff-block">) — both introduced by #484 — were silently slipping past the stash. The paragraph-wrap pass converted their\nto<br>inside<code>, leaving Prism with no newlines to highlight against. - Post-fix: all three
<pre>shapes (bare, tree-raw-view, diff-block) caught by the relaxed regex. Bash/Python/Go pass-through preserved. Mermaid/katex alternation untouched.
End-to-end trace
Pre-fix flow (Zixim's case):
- User pastes
```yaml\nfoo:\n bar: 1\n```. renderMd()runs Prism, which produces<div class="code-tree-wrap"><div class="pre-header">yaml</div><pre class="tree-raw-view"><code class="language-yaml">foo:\n bar: 1\n</code></pre></div>._pre_stashregex at line 1914 (old) only matches<pre>—<pre class="tree-raw-view">does NOT match → fall-through.- Paragraph-wrap pass runs on the un-stashed content →
\n→<br>inside<code>. - Prism re-tokenizes → builds spans from a single-line string → CSS
white-space: prerule has nothing to preserve. - User sees:
foo: bar: 1(single line). ❌
Post-fix flow:
1-2. Same.
3. _pre_stash regex (new, <pre[^>]*>) matches <pre class="tree-raw-view"> → stashed with token \x00E0\x00.
4. Paragraph-wrap pass operates on the un-stashed content but the <pre> block is now opaque (replaced by token).
5. Stash restoration replaces token with original <pre class="tree-raw-view"><code>foo:\n bar: 1\n</code></pre> — newlines intact.
6. Prism tokenizes the multi-line string → spans built per-line → CSS rule preserves them. User sees multi-line YAML. ✅
Security audit
- ✅ One-character regex change in pure markdown processing — no XSS surface added (the regex was always there to MATCH, the change just widens what it matches).
- ✅ No new endpoints, no auth changes, no
config.yamlwrites. - ✅ The
[^>]*between<preand>matches any char except>, which is the standard "anything inside the tag" idiom — won't match nested<or escape its boundary. - ✅ The match is greedy on
[\s\S]*?(non-greedy) up to</pre>— same lazy-match shape as before, no ReDoS surface introduced.
Race / state analysis
renderMd is a pure JS string transformation — no shared state, no async. Single-threaded JS execution. ✓
Edge-case trace
| Scenario | Pre-fix | Post-fix |
|---|---|---|
```yaml block with multi-line content |
collapsed to one line ❌ | multi-line preserved ✅ |
```json block with multi-line content |
collapsed | preserved ✅ |
```diff block with multi-line content |
collapsed | preserved ✅ |
```yml (Prism alias for yaml) |
preserved (different code path emitted bare <pre>) |
preserved (still works) ✅ |
```bash |
preserved | preserved ✅ |
```python |
preserved | preserved ✅ |
<div class="pre-header"><pre>...</pre> (existing path) |
preserved | preserved (alternation unchanged) ✅ |
<div class="mermaid-block">...</div> |
preserved | preserved (alternation unchanged) ✅ |
<div class="katex-block">...</div> |
preserved | preserved ✅ |
<pre><code class="language-bash">...</code></pre> (no attrs on pre) |
matched | matched (regex still matches no-attr case) ✅ |
<pre id="x" class="y"> (multiple attrs) |
not matched | matched ✅ |
Nested <pre> inside another <pre> |
malformed by source — n/a | malformed by source — n/a |
| Cross-tool: agent CLI doesn't render markdown | unaffected | unaffected ✅ |
Tests
tests/test_issue1618_yaml_json_diff_newline_preserve.py— 9/9 pass:- Source:
test_pre_stash_regex_matches_pre_with_attributes,test_pre_stash_still_captures_pre_header_and_optional_div - Behavioral (node-driver):
test_yaml_block_preserves_newlines,test_json_block_preserves_newlines,test_diff_block_preserves_newlines,test_yaml_block_renders_multiline_html_shape,test_yml_alias_already_worked_still_works,test_bash_block_unaffected_baseline,test_mermaid_block_unaffected_by_regex_relaxation
- Source:
tests/test_745_code_block_newlines.py— 9/9 pass with the widened scan window. Pure window adjustment, no behavior change.- PR description claims 6/9 behavioral tests fail on master pre-fix — the right kind of "the test would have caught the bug" verification.
- Full suite: 4199 passed, 57 skipped, 3 xpassed, 0 failed in 32.12s on the PR branch (PR description claims 4254; counting drift consistent with prior batches).
- CI: all 3 (3.11/3.12/3.13) green ✅.
Other audit — things that are correct already
- ✅ The 7-line explanatory comment above the relaxed regex correctly documents the bug class (every stash regex must match the union of HTML shapes the upstream pipeline emits, not just the original shape).
- ✅ The test driver pattern uses
nodeto spawn against the actualstatic/ui.jsrenderMd — that's the right level of behavioral test for a JS regex fix. Source-presence assertions alone (which is what #1516 had) are insufficient. - ✅ The PR description honestly acknowledges the previous diagnosis (maintainer reply on #1618 saying the fix had landed) was wrong, with apology to @Zixim. Good external-contributor handling.
- ✅ The CSS rule from #1516 is intentionally left in place as defense-in-depth. With the regex fix, newlines reach Prism intact and the rule's white-space preservation is now redundant-but-idempotent.
- ✅ Reporter (@Zixim) attribution preserved.
Minor observations (non-blocking)
-
The regex relax is a one-character net change but the diff is +9/-1 lines because the contributor added an explanatory comment block. That comment block is the right kind of documentation — anchors the fix to the reasoning so future contributors understand why the regex must accept attributes.
-
The 400 → 1500 widening in
test_745_code_block_newlines.pyis a pure scan-window fix, not a behavior change. Worth a future cleanup that anchors the test on actual symbols (e.g.re.search(...)for the regex pattern) rather than character offsets. Not blocking. -
<pre[^>]*>matches even malformed cases like<preen attribute>— but</pre>is required for the lazy match to terminate, and HTML parsers are extremely lenient about this. Practical concern is zero; flagging for completeness. -
PR description test count off by ~55 (4254 claimed vs 4199 local). Consistent with prior batches.
-
CHANGELOG provisionally stamped
[v0.50.295]which the PR description acknowledges as a release-agent boundary call. Not a review concern.
Recommendation
✅ Approved. Surgical one-character regex fix targeting the exact layer where the bug actually lived. Behavioral harness verifies the OLD regex misses YAML/JSON/diff while the NEW regex catches them — and bash/python/mermaid/katex shapes still match identically. The 6-of-9-tests-fail-on-master verification confirms the regression coverage actually catches the bug.
PR description honestly corrects the previous misdiagnosis on #1618. Test driver pattern (node-against-actual-renderMd) is the right level of behavioral test for a JS regex fix — source-presence alone (#1516's mistake) was insufficient.
Cross-tool safe (webui-only). 9/9 PR tests + full-suite green on all 3 Python versions. The CSS rule from #1516 stays in place as defense-in-depth.
Parked at approval — ready for the release agent's merge/tag pipeline.
…chaelyklam nesquena#1641) Adopt the UI media from @Michaelyklam's parallel-discovery PR nesquena#1641 which shipped the same one-character regex relax fix for nesquena#1618. PR nesquena#1641 is being closed as superseded by nesquena#1642 (which carries nesquena APPROVED + 322 LOC test suite); preserving Michael's UI evidence here so the visual proof of the fix lives in-tree alongside the canonical PR. Co-authored-by: Michael Lam <[email protected]>
nesquena#1618, nesquena#1463) by @nesquena-hermes — APPROVED, with media from @Michaelyklam
Constituent PRs: nesquena#1637 by @Michaelyklam — protect raw pre from glued-bold lift (closes nesquena#1451) nesquena#1639 by @bergeouss — macOS auto-scroll race + custom:* provider list (closes nesquena#1360, nesquena#1619) nesquena#1642 by @nesquena-hermes — YAML/JSON/diff code block newlines (closes nesquena#1618, nesquena#1463) Opus advisor SHIP verdict on stage-295. One observation absorbed: - api/config.py:2533 dead-code comment per Opus (defensive belt-and-braces for nesquena#1619 fallback; load-bearing fix is in routes.py /api/models/live) PR nesquena#1641 (Michaelyklam parallel-discovery duplicate of nesquena#1642) closed as superseded; UI media adopted with co-author trailer. 4245 → 4255 tests passing (+10).
Summary
Closes #1618 (reported by @Zixim) and corrects the previous fix attempt (PR #1516, v0.50.279).
TL;DR: YAML, JSON, and diff/patch fenced code blocks have been rendering flattened to a single line since v0.50.237 (PR #484). PR #1516's CSS-only fix was the wrong layer — it preserves newlines inside Prism's token spans, but the spans are built from a string that already had its newlines destroyed earlier in the pipeline. This PR fixes the actual layer with a one-character regex relax.
Verification of the previous diagnosis
I went back to verify whether @Zixim's "still broken in v0.50.291" report was correct or whether the maintainer reply (asserting the fix landed) was. Live-rendered both
```yaml ... ```and a control```yml ... ```block throughrenderMd()in the browser:@Zixim was right. The maintainer reply was wrong. Apologies posted on the issue.
Root cause
PR #484 (v0.50.237,
feat: collapsible JSON/YAML tree viewer) routeslang === 'json'andlang === 'yaml'through:Same release added diff/patch coloring with
<pre class="diff-block">. The_pre_stashregex atstatic/ui.js:1914matched only literal<pre>with no attributes:<pre class="tree-raw-view">and<pre class="diff-block">don't match → fall through to the paragraph wrap pass →\nbecomes<br>inside<code>. By the time Prism runs, there are no newlines left for the CSS rule from PR #1516 to highlight against.Bash, Python, Go, etc. were never affected because they emit bare
<pre>and matched the existing regex.Fix
One-character regex relax:
Pulls JSON, YAML, AND diff/patch blocks into the
_pre_stash, where they're protected from paragraph wrap. Newlines survive. CSS rule from PR #1516 stays in place as defense-in-depth (idempotent — works either way once newlines are preserved).Tests
4245 → 4254 passing (+9 regression tests). 0 regressions. Full suite ~115s.
Source-string tests (2)
test_pre_stash_regex_matches_pre_with_attributes— the regex must contain<pre[^>]*>and must NOT contain<pre>followed by[\s\S].test_pre_stash_still_captures_pre_header_and_optional_div— surrounding alternation (mermaid/katex blocks, optional<div class="pre-header">prefix) preserved.Behavioral tests via node-driver (7)
The driver pattern is borrowed from
tests/test_renderer_js_behaviour.py: spawnnodeon the actualstatic/ui.js, eval therenderMdfunction out of the file, and pipe markdown through stdin. Tests assert that the rendered<pre>inner content contains literal\ncharacters and zero<br>tags.test_yaml_block_preserves_newlines<pre>test_json_block_preserves_newlinestest_diff_block_preserves_newlines<pre class="diff-block">from PR #484's diff coloring path (also affected)test_yaml_block_renders_multiline_html_shapetest_yml_alias_already_worked_still_works```yml(Prism alias, bare<pre>) was never broken — must continue to worktest_bash_block_unaffected_baselinetest_mermaid_block_unaffected_by_regex_relaxation<div class="mermaid-block">alternation unaffectedBug-still-reproduces verification on master: with the test file added but
static/ui.jsreverted to master, 6 of the 9 tests fail. The 3 sanity-only tests pass on both. Strong regression coverage — a future drift back to the literal-<pre>shape would fail loudly.Pre-existing test that needed widening (1)
tests/test_745_code_block_newlines.pyhad threestash_block_idx + 400window-scan assertions that no longer reach the regex line (the new explanatory comment block pushed it past byte 400). Widened to 1500. Pure window-narrowness fix perpytest-pitfalls§ D — behavior preserved.Why CI never caught this
PR #1516 only added the CSS rule and a presence check (
assert 'language-yaml .token' in style.css). Source-presence is not behavior. No PR added a behavioral test that actually pushed YAML throughrenderMd()and inspected the rendered<code>for surviving\n. PR #484 (the upstream cause) didn't update the_pre_stashregex even though it changed the shape of<pre>blocks — same class as thewebui-rendermd-pipelineskill's documented bug pattern: every stash regex must match the union of HTML shapes any pass emits, not just the original shape from when the regex was written.Backward compat
test_745_code_block_newlines.pyadjusted to fit the wider source span — no behavior change.Risk
Very low. Single-character regex change in
static/ui.js. The change strictly widens what the regex matches; nothing that previously matched stops matching. End-to-end behavioral tests pin the actual rendered output, not just source presence.Pipeline state
fix/1618-yaml-json-newline-collapse@cbfc544origin/master@304a422(post v0.50.294 release)Verification checklist
node -c static/ui.js— cleanpython -m py_compile— N/A (no Python changes to product code)Per the standing release lane: requesting nesquena independent review, then ship via single-PR release lane.