Skip to content

fix(renderer): YAML/JSON/diff code blocks lose newlines (#1618 / #1463)#1642

Merged
1 commit merged intomasterfrom
fix/1618-yaml-json-newline-collapse
May 4, 2026
Merged

fix(renderer): YAML/JSON/diff code blocks lose newlines (#1618 / #1463)#1642
1 commit merged intomasterfrom
fix/1618-yaml-json-newline-collapse

Conversation

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Summary

Closes #1618 (reported by @Zixim) and corrects the previous fix attempt (PR #1516, v0.50.279).

TL;DR: YAML, JSON, and diff/patch fenced code blocks have been rendering flattened to a single line since v0.50.237 (PR #484). PR #1516's CSS-only fix was the wrong layer — it preserves newlines inside Prism's token spans, but the spans are built from a string that already had its newlines destroyed earlier in the pipeline. This PR fixes the actual layer with a one-character regex relax.

Verification of the previous diagnosis

I went back to verify whether @Zixim's "still broken in v0.50.291" report was correct or whether the maintainer reply (asserting the fix landed) was. Live-rendered both ```yaml ... ``` and a control ```yml ... ``` block through renderMd() in the browser:

yaml-tagged block:
  codeClass:        language-yaml
  textContent:      "foo:  bar: 1  baz:    - 2    - 3"   ← newlines GONE
  hasNewlines:      false
  rendered height:  15px (single line)

yml-tagged block:
  codeClass:        language-yml
  textContent:      "foo:\n  bar: 1\n  baz:\n    - 2\n    - 3"   ← preserved
  hasNewlines:      true
  rendered height:  102.1875px (multi-line, correct)

@Zixim was right. The maintainer reply was wrong. Apologies posted on the issue.

Root cause

PR #484 (v0.50.237, feat: collapsible JSON/YAML tree viewer) routes lang === 'json' and lang === 'yaml' through:

<div class="code-tree-wrap" data-raw="" data-lang="yaml" id="tree-…">
  <div class="pre-header">yaml</div>
  <pre class="tree-raw-view"><code class="language-yaml"></code></pre>
</div>

Same release added diff/patch coloring with <pre class="diff-block">. The _pre_stash regex at static/ui.js:1914 matched only literal <pre> with no attributes:

s=s.replace(/(<div class="pre-header">[\s\S]*?<\/div>)?<pre>[\s\S]*?<\/pre>|<div class="(mermaid-block|katex-block)"[\s\S]*?<\/div>/g, m=>{

<pre class="tree-raw-view"> and <pre class="diff-block"> don't match → fall through to the paragraph wrap pass → \n becomes <br> inside <code>. By the time Prism runs, there are no newlines left for the CSS rule from PR #1516 to highlight against.

Bash, Python, Go, etc. were never affected because they emit bare <pre> and matched the existing regex.

Fix

One-character regex relax:

- <pre>[\s\S]*?<\/pre>
+ <pre[^>]*>[\s\S]*?<\/pre>

Pulls JSON, YAML, AND diff/patch blocks into the _pre_stash, where they're protected from paragraph wrap. Newlines survive. CSS rule from PR #1516 stays in place as defense-in-depth (idempotent — works either way once newlines are preserved).

Tests

4245 → 4254 passing (+9 regression tests). 0 regressions. Full suite ~115s.

Source-string tests (2)

  • test_pre_stash_regex_matches_pre_with_attributes — the regex must contain <pre[^>]*> and must NOT contain <pre> followed by [\s\S].
  • test_pre_stash_still_captures_pre_header_and_optional_div — surrounding alternation (mermaid/katex blocks, optional <div class="pre-header"> prefix) preserved.

Behavioral tests via node-driver (7)

The driver pattern is borrowed from tests/test_renderer_js_behaviour.py: spawn node on the actual static/ui.js, eval the renderMd function out of the file, and pipe markdown through stdin. Tests assert that the rendered <pre> inner content contains literal \n characters and zero <br> tags.

Test What it pins
test_yaml_block_preserves_newlines The exact #1618 reproducer — YAML block with 5 lines must produce multi-line <pre>
test_json_block_preserves_newlines Same shape, JSON variant (also affected)
test_diff_block_preserves_newlines <pre class="diff-block"> from PR #484's diff coloring path (also affected)
test_yaml_block_renders_multiline_html_shape 5-line YAML must produce exactly 5 logical lines
test_yml_alias_already_worked_still_works Sanity: ```yml (Prism alias, bare <pre>) was never broken — must continue to work
test_bash_block_unaffected_baseline Sanity: bash was never affected
test_mermaid_block_unaffected_by_regex_relaxation Sanity: mermaid <div class="mermaid-block"> alternation unaffected

Bug-still-reproduces verification on master: with the test file added but static/ui.js reverted to master, 6 of the 9 tests fail. The 3 sanity-only tests pass on both. Strong regression coverage — a future drift back to the literal-<pre> shape would fail loudly.

Pre-existing test that needed widening (1)

tests/test_745_code_block_newlines.py had three stash_block_idx + 400 window-scan assertions that no longer reach the regex line (the new explanatory comment block pushed it past byte 400). Widened to 1500. Pure window-narrowness fix per pytest-pitfalls § D — behavior preserved.

Why CI never caught this

PR #1516 only added the CSS rule and a presence check (assert 'language-yaml .token' in style.css). Source-presence is not behavior. No PR added a behavioral test that actually pushed YAML through renderMd() and inspected the rendered <code> for surviving \n. PR #484 (the upstream cause) didn't update the _pre_stash regex even though it changed the shape of <pre> blocks — same class as the webui-rendermd-pipeline skill's documented bug pattern: every stash regex must match the union of HTML shapes any pass emits, not just the original shape from when the regex was written.

Backward compat

  • All other code-block languages unchanged (bash, python, go, json — wait, JSON is one of the affected ones; see above).
  • Mermaid and katex blocks unaffected — their alternation in the same regex was already shape-agnostic.
  • The CSS rule from PR fix: YAML code blocks collapse newlines due to Prism token white-space (#1463) #1516 stays in place. With the regex fix, newlines reach Prism intact, the rule does its (now-redundant but idempotent) job preserving them inside token spans.
  • Static-string tests in test_745_code_block_newlines.py adjusted to fit the wider source span — no behavior change.

Risk

Very low. Single-character regex change in static/ui.js. The change strictly widens what the regex matches; nothing that previously matched stops matching. End-to-end behavioral tests pin the actual rendered output, not just source presence.

Pipeline state

  • Branch: fix/1618-yaml-json-newline-collapse @ cbfc544
  • Base: origin/master @ 304a422 (post v0.50.294 release)
  • CHANGELOG: stamped v0.50.295 (provisional — release script will re-stamp if other PRs ship first)

Verification checklist

  • node -c static/ui.js — clean
  • python -m py_compile — N/A (no Python changes to product code)
  • 4254 / 4254 pass on full suite
  • 9 new regression tests cover every affected lang and every sanity baseline
  • Bug-still-reproduces verification: 6 tests FAIL on master pre-fix, all 9 PASS post-fix
  • Live browser render confirms multi-line YAML output post-fix

Per the standing release lane: requesting nesquena independent review, then ship via single-PR release lane.

Closes #1618 (reported by @Zixim) and corrects #1463's previous fix.

Bug: YAML, JSON, and diff/patch fenced code blocks render flattened to a
single line. Reporter noted the bug persisted v0.50.279 -> v0.50.291 ->
v0.50.292 despite PR #1516's CSS-only "fix".

Root cause: PR #484 (v0.50.237) added a JSON/YAML tree-viewer that routes
those languages through <div class="code-tree-wrap">...<pre class="tree-raw-view">
instead of bare <pre>. Same release added the diff/patch coloring path
that emits <pre class="diff-block">. The _pre_stash regex at
static/ui.js:1914 matched only literal <pre> with no attributes:

    <pre>[\s\S]*?<\/pre>

Both new shapes failed to match, fell through to the paragraph-wrap pass,
and \n characters inside the code blocks got replaced with <br> tags
inside <code>. By the time Prism ran, no newlines remained for the CSS
rule (PR #1516, language-yaml .token { white-space: pre !important }) to
preserve.

Fix: relax the regex to accept any attribute on <pre>:

    <pre>[\s\S]*?<\/pre>  ->  <pre[^>]*>[\s\S]*?<\/pre>

One regex character. Pulls JSON, YAML, and diff/patch blocks into the
stash so paragraph-wrap can't mangle them. Bash, Python, Go, etc. were
never affected because they emit bare <pre>.

Tests: 9 new (2 source-string invariants + 7 behavioural via node-driver
against the actual static/ui.js renderMd()). 6 of the 7 behavioural tests
fail on master and pass with the fix; the 3 sanity checks (yml-alias,
bash, mermaid) pass on both.

Plus widened source-scan window in 3 pre-existing test_745 assertions
from 400 to 1500 chars. The new comment block above the fixed regex
pushed it past the previous scan window. Pure window-narrowness bug,
not a behavior regression.

4245 -> 4254 passing.
@nesquena-hermes nesquena-hermes requested a review from nesquena May 4, 2026 18:13
@nesquena-hermes nesquena-hermes added bug Something isn't working renderer Issues related to the markdown/content renderer sprint-candidate Strong candidate for next sprint labels May 4, 2026
Copy link
Copy Markdown
Owner

@nesquena nesquena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — end-to-end ✅ (clean approve, behavioural harness confirms regex match)

Self-built PR closing #1618 (reported by @Zixim) and superseding #1463's prior CSS-only attempt at #1516 (v0.50.279). YAML, JSON, and diff/patch fenced code blocks have been rendering flattened to a single line since v0.50.237 — the previous CSS-only fix targeted the wrong layer (Prism token white-space) when the bug actually lived earlier in the regex pipeline. One-character regex relax in static/ui.js.

What this ships

  • static/ui.js:1914-1925_pre_stash regex relaxed from <pre> to <pre[^>]*>. Pulls JSON's <pre class="tree-raw-view"> and diff's <pre class="diff-block"> (both introduced by #484 / v0.50.237) into the stash so the paragraph-wrap pass can't replace their \n with <br>. Bash/Python/Go (bare <pre>) unaffected — same regex still matches them.
  • tests/test_issue1618_yaml_json_diff_newline_preserve.py — 9 new regression tests (2 source-string + 7 behavioral via node-driver against actual static/ui.js renderMd()).
  • tests/test_745_code_block_newlines.py — three pre-existing window-scan assertions widened from 400 → 1500 chars (the new explanatory comment block pushed the regex past the previous scan window — pytest-pitfalls § D pattern).

Traced against upstream hermes-agent

Webui-only — renderMd() is the frontend's markdown renderer; the agent doesn't render markdown server-side. No cross-tool surface. ✅

Behavioural harness — regex match verification

Ran old vs new regex against the five <pre>/<div> shapes that flow through _pre_stash:

OLD regex / NEW regex matches:
  ✓ / ✓  bare <pre> (bash, python)
  ✗ / ✓  <pre class=tree-raw-view> (yaml/json)        ← the bug
  ✗ / ✓  <pre class=diff-block> (diff/patch)          ← the bug
  ✓ / ✓  <div class=pre-header><pre>
  ✓ / ✓  <div class=mermaid-block>

Confirms:

  1. Pre-fix: YAML/JSON tree-viewer (<pre class="tree-raw-view">) and diff/patch coloring (<pre class="diff-block">) — both introduced by #484 — were silently slipping past the stash. The paragraph-wrap pass converted their \n to <br> inside <code>, leaving Prism with no newlines to highlight against.
  2. Post-fix: all three <pre> shapes (bare, tree-raw-view, diff-block) caught by the relaxed regex. Bash/Python/Go pass-through preserved. Mermaid/katex alternation untouched.

End-to-end trace

Pre-fix flow (Zixim's case):

  1. User pastes ```yaml\nfoo:\n bar: 1\n```.
  2. renderMd() runs Prism, which produces <div class="code-tree-wrap"><div class="pre-header">yaml</div><pre class="tree-raw-view"><code class="language-yaml">foo:\n bar: 1\n</code></pre></div>.
  3. _pre_stash regex at line 1914 (old) only matches <pre><pre class="tree-raw-view"> does NOT match → fall-through.
  4. Paragraph-wrap pass runs on the un-stashed content → \n<br> inside <code>.
  5. Prism re-tokenizes → builds spans from a single-line string → CSS white-space: pre rule has nothing to preserve.
  6. User sees: foo: bar: 1 (single line). ❌

Post-fix flow:
1-2. Same.
3. _pre_stash regex (new, <pre[^>]*>) matches <pre class="tree-raw-view"> → stashed with token \x00E0\x00.
4. Paragraph-wrap pass operates on the un-stashed content but the <pre> block is now opaque (replaced by token).
5. Stash restoration replaces token with original <pre class="tree-raw-view"><code>foo:\n bar: 1\n</code></pre> — newlines intact.
6. Prism tokenizes the multi-line string → spans built per-line → CSS rule preserves them. User sees multi-line YAML. ✅

Security audit

  • ✅ One-character regex change in pure markdown processing — no XSS surface added (the regex was always there to MATCH, the change just widens what it matches).
  • ✅ No new endpoints, no auth changes, no config.yaml writes.
  • ✅ The [^>]* between <pre and > matches any char except >, which is the standard "anything inside the tag" idiom — won't match nested < or escape its boundary.
  • ✅ The match is greedy on [\s\S]*? (non-greedy) up to </pre> — same lazy-match shape as before, no ReDoS surface introduced.

Race / state analysis

renderMd is a pure JS string transformation — no shared state, no async. Single-threaded JS execution. ✓

Edge-case trace

Scenario Pre-fix Post-fix
```yaml block with multi-line content collapsed to one line ❌ multi-line preserved ✅
```json block with multi-line content collapsed preserved ✅
```diff block with multi-line content collapsed preserved ✅
```yml (Prism alias for yaml) preserved (different code path emitted bare <pre>) preserved (still works) ✅
```bash preserved preserved ✅
```python preserved preserved ✅
<div class="pre-header"><pre>...</pre> (existing path) preserved preserved (alternation unchanged) ✅
<div class="mermaid-block">...</div> preserved preserved (alternation unchanged) ✅
<div class="katex-block">...</div> preserved preserved ✅
<pre><code class="language-bash">...</code></pre> (no attrs on pre) matched matched (regex still matches no-attr case) ✅
<pre id="x" class="y"> (multiple attrs) not matched matched ✅
Nested <pre> inside another <pre> malformed by source — n/a malformed by source — n/a
Cross-tool: agent CLI doesn't render markdown unaffected unaffected ✅

Tests

  • tests/test_issue1618_yaml_json_diff_newline_preserve.py — 9/9 pass:
    • Source: test_pre_stash_regex_matches_pre_with_attributes, test_pre_stash_still_captures_pre_header_and_optional_div
    • Behavioral (node-driver): test_yaml_block_preserves_newlines, test_json_block_preserves_newlines, test_diff_block_preserves_newlines, test_yaml_block_renders_multiline_html_shape, test_yml_alias_already_worked_still_works, test_bash_block_unaffected_baseline, test_mermaid_block_unaffected_by_regex_relaxation
  • tests/test_745_code_block_newlines.py — 9/9 pass with the widened scan window. Pure window adjustment, no behavior change.
  • PR description claims 6/9 behavioral tests fail on master pre-fix — the right kind of "the test would have caught the bug" verification.
  • Full suite: 4199 passed, 57 skipped, 3 xpassed, 0 failed in 32.12s on the PR branch (PR description claims 4254; counting drift consistent with prior batches).
  • CI: all 3 (3.11/3.12/3.13) green ✅.

Other audit — things that are correct already

  • ✅ The 7-line explanatory comment above the relaxed regex correctly documents the bug class (every stash regex must match the union of HTML shapes the upstream pipeline emits, not just the original shape).
  • ✅ The test driver pattern uses node to spawn against the actual static/ui.js renderMd — that's the right level of behavioral test for a JS regex fix. Source-presence assertions alone (which is what #1516 had) are insufficient.
  • ✅ The PR description honestly acknowledges the previous diagnosis (maintainer reply on #1618 saying the fix had landed) was wrong, with apology to @Zixim. Good external-contributor handling.
  • ✅ The CSS rule from #1516 is intentionally left in place as defense-in-depth. With the regex fix, newlines reach Prism intact and the rule's white-space preservation is now redundant-but-idempotent.
  • ✅ Reporter (@Zixim) attribution preserved.

Minor observations (non-blocking)

  • The regex relax is a one-character net change but the diff is +9/-1 lines because the contributor added an explanatory comment block. That comment block is the right kind of documentation — anchors the fix to the reasoning so future contributors understand why the regex must accept attributes.

  • The 400 → 1500 widening in test_745_code_block_newlines.py is a pure scan-window fix, not a behavior change. Worth a future cleanup that anchors the test on actual symbols (e.g. re.search(...) for the regex pattern) rather than character offsets. Not blocking.

  • <pre[^>]*> matches even malformed cases like <preen attribute> — but </pre> is required for the lazy match to terminate, and HTML parsers are extremely lenient about this. Practical concern is zero; flagging for completeness.

  • PR description test count off by ~55 (4254 claimed vs 4199 local). Consistent with prior batches.

  • CHANGELOG provisionally stamped [v0.50.295] which the PR description acknowledges as a release-agent boundary call. Not a review concern.

Recommendation

Approved. Surgical one-character regex fix targeting the exact layer where the bug actually lived. Behavioral harness verifies the OLD regex misses YAML/JSON/diff while the NEW regex catches them — and bash/python/mermaid/katex shapes still match identically. The 6-of-9-tests-fail-on-master verification confirms the regression coverage actually catches the bug.

PR description honestly corrects the previous misdiagnosis on #1618. Test driver pattern (node-against-actual-renderMd) is the right level of behavioral test for a JS regex fix — source-presence alone (#1516's mistake) was insufficient.

Cross-tool safe (webui-only). 9/9 PR tests + full-suite green on all 3 Python versions. The CSS rule from #1516 stays in place as defense-in-depth.

Parked at approval — ready for the release agent's merge/tag pipeline.

@nesquena-hermes nesquena-hermes closed this pull request by merging all changes into master in 4085a1f May 4, 2026
bergeouss pushed a commit to bergeouss/hermes-webui that referenced this pull request May 4, 2026
…chaelyklam nesquena#1641)

Adopt the UI media from @Michaelyklam's parallel-discovery PR nesquena#1641 which
shipped the same one-character regex relax fix for nesquena#1618. PR nesquena#1641 is
being closed as superseded by nesquena#1642 (which carries nesquena APPROVED +
322 LOC test suite); preserving Michael's UI evidence here so the visual
proof of the fix lives in-tree alongside the canonical PR.

Co-authored-by: Michael Lam <[email protected]>
bergeouss pushed a commit to bergeouss/hermes-webui that referenced this pull request May 4, 2026
bergeouss pushed a commit to bergeouss/hermes-webui that referenced this pull request May 4, 2026
Constituent PRs:
  nesquena#1637 by @Michaelyklam — protect raw pre from glued-bold lift (closes nesquena#1451)
  nesquena#1639 by @bergeouss — macOS auto-scroll race + custom:* provider list (closes nesquena#1360, nesquena#1619)
  nesquena#1642 by @nesquena-hermes — YAML/JSON/diff code block newlines (closes nesquena#1618, nesquena#1463)

Opus advisor SHIP verdict on stage-295. One observation absorbed:
- api/config.py:2533 dead-code comment per Opus (defensive belt-and-braces
  for nesquena#1619 fallback; load-bearing fix is in routes.py /api/models/live)

PR nesquena#1641 (Michaelyklam parallel-discovery duplicate of nesquena#1642) closed as
superseded; UI media adopted with co-author trailer.

4245 → 4255 tests passing (+10).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working renderer Issues related to the markdown/content renderer sprint-candidate Strong candidate for next sprint

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants