fix(renderer): preserve raw <pre> blocks during markdown rewrites (v0.50.228)#1172
fix(renderer): preserve raw <pre> blocks during markdown rewrites (v0.50.228)#1172nesquena-hermes wants to merge 3 commits intomasterfrom
Conversation
Stash raw pre blocks before inline code-tag normalization so multiline code content is not rewritten into stray backticks that can corrupt subsequent code box rendering. Add a JS-renderer behavior test that guards balanced pre/code output across multiple raw code blocks. Made-with: Cursor
Rename the raw pre stash variable so static substring checks for _pre_stash continue to target the intended code-block stash section. Made-with: Cursor
nesquena
left a comment
There was a problem hiding this comment.
Review — end-to-end ✅ (clean approve)
What this ships
v0.50.228 — absorbing @bsgdigital's PR #1150. Fixes renderMd() corrupting raw <pre> blocks during the inline-<code>-to-backtick rewrite.
Six lines in static/ui.js:813-824:
const rawPreStash=[];
s=s.replace(/(<pre\b[^>]*>[\s\S]*?<\/pre>)/gi,m=>{rawPreStash.push(m);return `\x00R${rawPreStash.length-1}\x00`;});
// ... existing strong/b/em/i/code/br rewrites run here ...
s=s.replace(/\x00R(\d+)\x00/g,(_,i)=>rawPreStash[+i]);Traced against upstream hermes-agent
Pulled fresh nousresearch/hermes-agent tarball. renderMd is purely WebUI-frontend — agent emits text/markdown, not raw <pre> HTML, so this change has zero cross-tool coupling. No agent-side files touched.
End-to-end trace
Token namespace check — no collision
Existing escape tokens in renderMd: \x00F (fence), \x00M (math), \x00Q (blockquote), \x00D (media), \x00C (code), \x00G (img), \x00L (link), \x00O (code outer), \x00A (anchor), \x00B (block-element-line). The new \x00R is the only unused letter near the top — no collision.
Stash/restore ordering
Lines 798-826 of static/ui.js:
| # | Op | Why |
|---|---|---|
| 1 | Stash F (fences) | First — protects `code` from $ math |
| 2 | Stash M (math) | After fences |
| 3 | Stash R (raw <pre>) ← new |
Before HTML→md rewrites |
| 4 | <strong> <b> <em> <i> <code> <br> |
Rewrites — won't see content inside R |
| 5 | Restore R ← new | Bring <pre> back verbatim |
| 6 | Restore F (fences) | Bring fences back |
Pre blocks are protected through the per-line passes by a separate _al_stash at ui.js:951 which also matches <pre>...</pre> — so restored pre content is re-stashed before per-line list/heading/blockquote work runs. ✅
Behavioural harness — 7/7 pass on PR, 5/7 on master
PASS [Multiline pre>code preserved]
PASS [Two pre blocks separated by paragraph]
PASS [Pre with class attr]
PASS [Pre containing strong should NOT collapse to **]
PASS [Plain markdown still rewrites]
PASS [<code> outside <pre> still rewrites]
PASS [Empty <pre></pre> preserved]
On master, two failures confirm the bug:
<pre><code>line 1\nline 2\nline 3</code></pre>→<pre>`line 1\nline 2\nline 3`</pre>(exactly the corruption the PR description shows)<pre><code>not <strong>bold</strong></code></pre>→<pre><code>not **bold**</code></pre>(bonus:<strong>inside pre also got collapsed pre-fix)
XSS audit — no new surface
The stash just stores the raw <pre>...</pre> substring and restores it verbatim. Whatever was renderable before is renderable now; the fix is purely about preventing rewrites from running inside the block, not about new sink behaviour. The pre block downstream goes through _al_stash again (line 951) which is content-shape-preserving. ✅ No CSP impact — no inline script generation, no new attribute write paths.
Edge-case trace
| Scenario | Expected | Actual |
|---|---|---|
Multiline <pre><code> from model |
preserved as-is | ✅ harness 1 |
Two <pre> blocks separated by text |
both preserved, paragraph between rendered | ✅ harness 2 |
<pre class="lang-py"> |
class attr preserved | ✅ harness 3 ([^>]* in regex) |
<strong> inside <pre> |
not collapsed to **bold** |
✅ harness 4 |
<code> outside <pre> |
still rewritten | ✅ harness 6 |
Plain **markdown** outside |
still rewritten to <strong> |
✅ harness 5 |
<pre> inside fenced ``` |
fence wins (escaped) | ✅ F-stash runs before R-stash |
Empty <pre></pre> |
preserved | ✅ harness 7 |
Unclosed <pre> (no </pre>) |
regex doesn't match → content rewrites | |
Nested <pre> (invalid HTML) |
non-greedy stops at first </pre> |
Tests
- PR's added test (
TestRawPreCodePreservation::test_multiline_pre_code_blocks_do_not_degrade_to_backticks): passes. Validates two<pre><code>blocks remain balanced and content isn't backtick-corrupted. - All 25 renderer behaviour tests: pass.
- Local full suite: 2587 passed, 47 skipped, 1 PR-unrelated pre-existing failure (
test_sprint3.py::test_workspace_add_rejects_system_paths— macOS-only, fails on master too). - CI on PR: ✅ test (3.11), ✅ test (3.12), ✅ test (3.13).
- Behavioural harness: 7/7 pass on PR, 5/7 on master (confirms bug + fix).
Other audit — confirmed correct
- JS syntax:
node --checkpasses. - Regex non-greediness:
[\s\S]*?correctly stops at first</pre>. Combined withgiflags handles multiple blocks and case variants. - Token format:
\x00R<n>\x00— same shape as existing tokens; the restore regex/\x00R(\d+)\x00/gcorrectly captures the index.
Minor observations (non-blocking)
- The fix doesn't address
<pre>blocks that use<pre>without explicit</pre>(e.g., truncated streaming output mid-render). Acceptable — broken input to a markdown-aware renderer can yield broken output, and the streaming smd path takes a different code path entirely. - A future cleanup could batch all the HTML-tag stashes (R, A, B, etc.) into a single shared mechanism, but the current named tokens are easy to debug and the bug surface is small.
Recommendation
Approved. Tight 6-line fix that targets a real and reproducible regression. Behavioural harness confirms the bug exists on master (master 5/7, PR 7/7). Token namespace doesn't collide with existing escape tokens, stash/restore ordering is correct relative to the rewrites that needed gating. No new XSS surface, no cross-tool agent concern, CI green on 3.11/3.12/3.13. Parked at approval — ready for the release agent's merge/tag pipeline.
|
This integrates @bsgdigital's original fix from #1150 into a clean standalone PR — good call given #1150 had scope issues that needed resolution. The stash-and-restore pattern for protecting raw v0.50.228 attribution note: the PR description credits @bsgdigital's original work — that's the right thing to do. The commit message and PR title should carry that attribution into the git log as well (and the description does include "Attribution: Original work by @bsgdigital (PR #1150)"). The fix directly addresses the rendering corruption reported in #1154 — stray Flagging for maintainer review. |
…, .env (#1179) Merged as v0.50.228. 2644 tests passing. Browser QA 21/21 (desktop 1440×900 + mobile iPhone 14). All 5 fix invariants verified live in browser. **Fix verifications:** - #1172 (`renderMd` pre-stash): `rawPreStash` present in function, `<pre>` blocks pass through without content rewrite ✅ - #1174 (model race guard): `syncTopbar()` contains `liveStillPending` guard ✅ - #1175 (tool card): `.tool-card-result pre` max-height=360px, `.tool-card.open .tool-card-detail` overflow=auto, cap=600px ✅ - #1176 (empty session guard): double-click New Conversation on empty session → stays on same session, composer focused ✅ - #1178 (`.env` atomic write): `tempfile.mkstemp + os.replace` in `providers.py`, 9/9 env tests pass ✅ Thanks @bsgdigital (#1150) and @bergeouss (#1178)!
|
Merged as v0.50.228 via #1179. Thank you @bsgdigital! |
…, .env (nesquena#1179) Merged as v0.50.228. 2644 tests passing. Browser QA 21/21 (desktop 1440×900 + mobile iPhone 14). All 5 fix invariants verified live in browser. **Fix verifications:** - nesquena#1172 (`renderMd` pre-stash): `rawPreStash` present in function, `<pre>` blocks pass through without content rewrite ✅ - nesquena#1174 (model race guard): `syncTopbar()` contains `liveStillPending` guard ✅ - nesquena#1175 (tool card): `.tool-card-result pre` max-height=360px, `.tool-card.open .tool-card-detail` overflow=auto, cap=600px ✅ - nesquena#1176 (empty session guard): double-click New Conversation on empty session → stays on same session, composer focused ✅ - nesquena#1178 (`.env` atomic write): `tempfile.mkstemp + os.replace` in `providers.py`, 9/9 env tests pass ✅ Thanks @bsgdigital (nesquena#1150) and @bergeouss (nesquena#1178)!
What this fixes
renderMd()instatic/ui.jsprocesses raw HTML from model output through a series of rewrites. One pass converts<code>text</code>to`text`(markdown inline code). That pass was running inside raw<pre>blocks, so multiline<pre><code>blocks from the model got their content rewritten to backtick strings — producing broken output like:How it's fixed
6 lines: stash all
<pre>...</pre>blocks behind\x00R{n}\x00tokens before the inline-code rewrite, restore them after. The token uses\x00R— no collision with existing\x00F(fence stash) or\x00M(math stash) tokens.The regex
(<pre\b[^>]*>[\s\S]*?<\/pre>)is non-greedy and case-insensitive, handles<pre class="...">variants correctly.Tests: 1 new test class (
TestRawPreCodePreservation) with 1 test covering the exact regression. Full suite: 2635 passing.