Skip to content

fix(renderer): preserve raw <pre> blocks during markdown rewrites (v0.50.228)#1172

Closed
nesquena-hermes wants to merge 3 commits intomasterfrom
integrate/pr1150
Closed

fix(renderer): preserve raw <pre> blocks during markdown rewrites (v0.50.228)#1172
nesquena-hermes wants to merge 3 commits intomasterfrom
integrate/pr1150

Conversation

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Attribution: Original work by @bsgdigital (PR #1150). Integrated here.

What this fixes

renderMd() in static/ui.js processes raw HTML from model output through a series of rewrites. One pass converts <code>text</code> to `text` (markdown inline code). That pass was running inside raw <pre> blocks, so multiline <pre><code> blocks from the model got their content rewritten to backtick strings — producing broken output like:

`line 1
line 2
`</pre>

How it's fixed

6 lines: stash all <pre>...</pre> blocks behind \x00R{n}\x00 tokens before the inline-code rewrite, restore them after. The token uses \x00R — no collision with existing \x00F (fence stash) or \x00M (math stash) tokens.

The regex (<pre\b[^>]*>[\s\S]*?<\/pre>) is non-greedy and case-insensitive, handles <pre class="..."> variants correctly.

Tests: 1 new test class (TestRawPreCodePreservation) with 1 test covering the exact regression. Full suite: 2635 passing.

bsgdigital and others added 3 commits April 27, 2026 21:07
Stash raw pre blocks before inline code-tag normalization so multiline code content is not rewritten into stray backticks that can corrupt subsequent code box rendering. Add a JS-renderer behavior test that guards balanced pre/code output across multiple raw code blocks.

Made-with: Cursor
Rename the raw pre stash variable so static substring checks for _pre_stash continue to target the intended code-block stash section.

Made-with: Cursor
Copy link
Copy Markdown
Owner

@nesquena nesquena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — end-to-end ✅ (clean approve)

What this ships

v0.50.228 — absorbing @bsgdigital's PR #1150. Fixes renderMd() corrupting raw <pre> blocks during the inline-<code>-to-backtick rewrite.

Six lines in static/ui.js:813-824:

const rawPreStash=[];
s=s.replace(/(<pre\b[^>]*>[\s\S]*?<\/pre>)/gi,m=>{rawPreStash.push(m);return `\x00R${rawPreStash.length-1}\x00`;});
// ... existing strong/b/em/i/code/br rewrites run here ...
s=s.replace(/\x00R(\d+)\x00/g,(_,i)=>rawPreStash[+i]);

Traced against upstream hermes-agent

Pulled fresh nousresearch/hermes-agent tarball. renderMd is purely WebUI-frontend — agent emits text/markdown, not raw <pre> HTML, so this change has zero cross-tool coupling. No agent-side files touched.

End-to-end trace

Token namespace check — no collision

Existing escape tokens in renderMd: \x00F (fence), \x00M (math), \x00Q (blockquote), \x00D (media), \x00C (code), \x00G (img), \x00L (link), \x00O (code outer), \x00A (anchor), \x00B (block-element-line). The new \x00R is the only unused letter near the top — no collision.

Stash/restore ordering

Lines 798-826 of static/ui.js:

# Op Why
1 Stash F (fences) First — protects `code` from $ math
2 Stash M (math) After fences
3 Stash R (raw <pre>) ← new Before HTML→md rewrites
4 <strong> <b> <em> <i> <code> <br> Rewrites — won't see content inside R
5 Restore R ← new Bring <pre> back verbatim
6 Restore F (fences) Bring fences back

Pre blocks are protected through the per-line passes by a separate _al_stash at ui.js:951 which also matches <pre>...</pre> — so restored pre content is re-stashed before per-line list/heading/blockquote work runs. ✅

Behavioural harness — 7/7 pass on PR, 5/7 on master

PASS [Multiline pre>code preserved]
PASS [Two pre blocks separated by paragraph]  
PASS [Pre with class attr]
PASS [Pre containing strong should NOT collapse to **]
PASS [Plain markdown still rewrites]
PASS [<code> outside <pre> still rewrites]
PASS [Empty <pre></pre> preserved]

On master, two failures confirm the bug:

  1. <pre><code>line 1\nline 2\nline 3</code></pre><pre> `line 1\nline 2\nline 3` </pre> (exactly the corruption the PR description shows)
  2. <pre><code>not <strong>bold</strong></code></pre><pre><code>not **bold**</code></pre> (bonus: <strong> inside pre also got collapsed pre-fix)

XSS audit — no new surface

The stash just stores the raw <pre>...</pre> substring and restores it verbatim. Whatever was renderable before is renderable now; the fix is purely about preventing rewrites from running inside the block, not about new sink behaviour. The pre block downstream goes through _al_stash again (line 951) which is content-shape-preserving. ✅ No CSP impact — no inline script generation, no new attribute write paths.

Edge-case trace

Scenario Expected Actual
Multiline <pre><code> from model preserved as-is ✅ harness 1
Two <pre> blocks separated by text both preserved, paragraph between rendered ✅ harness 2
<pre class="lang-py"> class attr preserved ✅ harness 3 ([^>]* in regex)
<strong> inside <pre> not collapsed to **bold** ✅ harness 4
<code> outside <pre> still rewritten ✅ harness 6
Plain **markdown** outside still rewritten to <strong> ✅ harness 5
<pre> inside fenced ``` fence wins (escaped) ✅ F-stash runs before R-stash
Empty <pre></pre> preserved ✅ harness 7
Unclosed <pre> (no </pre>) regex doesn't match → content rewrites ⚠️ acceptable (broken input → broken output)
Nested <pre> (invalid HTML) non-greedy stops at first </pre> ⚠️ acceptable (HTML spec disallows nesting)

Tests

  • PR's added test (TestRawPreCodePreservation::test_multiline_pre_code_blocks_do_not_degrade_to_backticks): passes. Validates two <pre><code> blocks remain balanced and content isn't backtick-corrupted.
  • All 25 renderer behaviour tests: pass.
  • Local full suite: 2587 passed, 47 skipped, 1 PR-unrelated pre-existing failure (test_sprint3.py::test_workspace_add_rejects_system_paths — macOS-only, fails on master too).
  • CI on PR: ✅ test (3.11), ✅ test (3.12), ✅ test (3.13).
  • Behavioural harness: 7/7 pass on PR, 5/7 on master (confirms bug + fix).

Other audit — confirmed correct

  • JS syntax: node --check passes.
  • Regex non-greediness: [\s\S]*? correctly stops at first </pre>. Combined with gi flags handles multiple blocks and case variants.
  • Token format: \x00R<n>\x00 — same shape as existing tokens; the restore regex /\x00R(\d+)\x00/g correctly captures the index.

Minor observations (non-blocking)

  • The fix doesn't address <pre> blocks that use <pre> without explicit </pre> (e.g., truncated streaming output mid-render). Acceptable — broken input to a markdown-aware renderer can yield broken output, and the streaming smd path takes a different code path entirely.
  • A future cleanup could batch all the HTML-tag stashes (R, A, B, etc.) into a single shared mechanism, but the current named tokens are easy to debug and the bug surface is small.

Recommendation

Approved. Tight 6-line fix that targets a real and reproducible regression. Behavioural harness confirms the bug exists on master (master 5/7, PR 7/7). Token namespace doesn't collide with existing escape tokens, stash/restore ordering is correct relative to the rewrites that needed gating. No new XSS surface, no cross-tool agent concern, CI green on 3.11/3.12/3.13. Parked at approval — ready for the release agent's merge/tag pipeline.

@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

This integrates @bsgdigital's original fix from #1150 into a clean standalone PR — good call given #1150 had scope issues that needed resolution.

The stash-and-restore pattern for protecting raw <pre> blocks is the correct fix for this class of renderer bug. The renderMd() inline-code rewrite pass (<code>text</code>`text`) should never run inside a <pre> block, and the stash/restore approach prevents that cleanly without fragile regex lookaheads.

v0.50.228 attribution note: the PR description credits @bsgdigital's original work — that's the right thing to do. The commit message and PR title should carry that attribution into the git log as well (and the description does include "Attribution: Original work by @bsgdigital (PR #1150)").

The fix directly addresses the rendering corruption reported in #1154 — stray </pre> in transcripts after large code-containing tool outputs.

Flagging for maintainer review.

nesquena-hermes added a commit that referenced this pull request Apr 27, 2026
…, .env (#1179)

Merged as v0.50.228. 2644 tests passing. Browser QA 21/21 (desktop 1440×900 + mobile iPhone 14). All 5 fix invariants verified live in browser.

**Fix verifications:**
- #1172 (`renderMd` pre-stash): `rawPreStash` present in function, `<pre>` blocks pass through without content rewrite ✅
- #1174 (model race guard): `syncTopbar()` contains `liveStillPending` guard ✅
- #1175 (tool card): `.tool-card-result pre` max-height=360px, `.tool-card.open .tool-card-detail` overflow=auto, cap=600px ✅  
- #1176 (empty session guard): double-click New Conversation on empty session → stays on same session, composer focused ✅
- #1178 (`.env` atomic write): `tempfile.mkstemp + os.replace` in `providers.py`, 9/9 env tests pass ✅

Thanks @bsgdigital (#1150) and @bergeouss (#1178)!
@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

Merged as v0.50.228 via #1179. Thank you @bsgdigital!

@nesquena-hermes nesquena-hermes deleted the integrate/pr1150 branch April 27, 2026 23:52
JKJameson pushed a commit to JKJameson/hermes-webui that referenced this pull request Apr 29, 2026
…, .env (nesquena#1179)

Merged as v0.50.228. 2644 tests passing. Browser QA 21/21 (desktop 1440×900 + mobile iPhone 14). All 5 fix invariants verified live in browser.

**Fix verifications:**
- nesquena#1172 (`renderMd` pre-stash): `rawPreStash` present in function, `<pre>` blocks pass through without content rewrite ✅
- nesquena#1174 (model race guard): `syncTopbar()` contains `liveStillPending` guard ✅
- nesquena#1175 (tool card): `.tool-card-result pre` max-height=360px, `.tool-card.open .tool-card-detail` overflow=auto, cap=600px ✅  
- nesquena#1176 (empty session guard): double-click New Conversation on empty session → stays on same session, composer focused ✅
- nesquena#1178 (`.env` atomic write): `tempfile.mkstemp + os.replace` in `providers.py`, 9/9 env tests pass ✅

Thanks @bsgdigital (nesquena#1150) and @bergeouss (nesquena#1178)!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants