fix(bluebubbles): preserve intra-word underscores in stripMarkdown #46292
sudabg wants to merge 1 commit into openclaw:main
The underscore-based italic regex was too aggressive and stripped underscores inside words like `here_is_a_message_with_underscores`. Fix: Require whitespace or start/end position around underscore delimiters, matching CommonMark spec for flanking delimiter runs. Only underscores that look like actual markdown italic formatting (`_text_` with proper boundaries) are now stripped. Fixes openclaw#46185
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7f8184f075
      result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1");
    - result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");
    + // Only match _text_ when underscores are at word boundaries (CommonMark-compliant)
    + result = result.replace(/(?<=\s|^)_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]])/g, "$1");
**Allow punctuation-flanked `_italic_` delimiters**
The new underscore-italic regex only opens on `(?<=\s|^)`, so valid markdown like `(_italic_)` or `"_italic_"` is no longer stripped and leaks literal underscores into plain-text output. In `stripMarkdown`, this is a regression because the function is intended to remove markdown markers generally, and CommonMark permits underscore emphasis when delimiters are adjacent to punctuation (not only whitespace/start).
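To make the reported regression concrete, the following minimal sketch applies only the underscore regex from the reviewed commit, in isolation. `stripUnderscoreItalic` is a hypothetical helper introduced for illustration; the real `stripMarkdown` runs other replacements as well, so its full output may differ.

```typescript
// Hypothetical helper: applies only the underscore-italic regex from the
// reviewed commit. The real stripMarkdown performs additional replacements.
function stripUnderscoreItalic(text: string): string {
  return text.replace(/(?<=\s|^)_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]])/g, "$1");
}

// Intra-word underscores survive, which is the goal of the PR.
console.log(stripUnderscoreItalic("snake_case_var")); // snake_case_var

// Whitespace-delimited emphasis is still stripped.
console.log(stripUnderscoreItalic("hello _world_ there")); // hello world there

// Emphasis opened right after punctuation is left untouched, because the
// lookbehind accepts only whitespace or start-of-string.
console.log(stripUnderscoreItalic("(_italic_)")); // (_italic_)
console.log(stripUnderscoreItalic('"_italic_"')); // "_italic_"
```

The last two calls show the underscores leaking into the plain-text output, which is the behavior this review comment flags.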
Greptile Summary
This PR fixes a bug in
Confidence Score: 3/5
This is a comment left during a code review.
Path: src/line/markdown-to-line.ts
Line: 354
Comment:
**Opening delimiter lookbehind too restrictive — regression for italic after punctuation**
The `(?<=\s|^)` lookbehind only allows whitespace or start-of-string before the opening `_`. This means valid italic markdown that immediately follows opening punctuation — such as `(`, `[`, `"`, or `'` — will **no longer be stripped**, leaving raw underscores in the plain-text output.
For example:
| Input | Before fix | After this fix |
|---|---|---|
| `(_italic text_)` | `(italic text)` ✅ | `(_italic text_)` ❌ |
| `"_italic_"` | `"italic"` ✅ | `"_italic_"` ❌ |
| `[_see this_]` | `[see this]` ✅ | `[_see this_]` ❌ |
A more complete fix would also allow opening punctuation characters in the lookbehind, matching closer to the CommonMark spec (which says a left-flanking delimiter run may be preceded by Unicode punctuation):
```suggestion
result = result.replace(/(?<=\s|^|[([{"'`])_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]}"'`])/g, "$1");
```
This extends both the lookbehind and the closing lookahead to cover matching punctuation pairs, preserving the snake_case fix while not regressing on italic-inside-punctuation cases.
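As a quick check of the suggestion above, this sketch runs the widened regex against the three punctuation-flanked inputs from the table plus two snake_case inputs. `stripUnderscoreItalicWidened` is a hypothetical name used only here, and the sketch covers just the underscore step, not the rest of `stripMarkdown`.

```typescript
// Hypothetical helper wrapping the suggested regex. The character classes list
// a handful of ASCII punctuation marks only, as in the suggestion itself.
function stripUnderscoreItalicWidened(text: string): string {
  return text.replace(
    /(?<=\s|^|[([{"'`])_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]}"'`])/g,
    "$1",
  );
}

// Punctuation-flanked emphasis is stripped again.
console.log(stripUnderscoreItalicWidened("(_italic text_)")); // (italic text)
console.log(stripUnderscoreItalicWidened('"_italic_"')); // "italic"
console.log(stripUnderscoreItalicWidened("[_see this_]")); // [see this]

// The snake_case fix is preserved: no underscore here follows whitespace,
// start-of-string, or one of the listed opening characters.
console.log(stripUnderscoreItalicWidened("snake_case_var")); // snake_case_var
console.log(stripUnderscoreItalicWidened("here_is_a_message")); // here_is_a_message
```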
Last reviewed commit: 7f8184f
Summary
Fixes #46185
The `stripMarkdown` function used in BlueBubbles outbound text processing strips single underscores inside words (e.g., `here_is_a_message_with_underscores` → `hereisamessagewithunderscores`).
Root Cause
The underscore-based italic regex `(?<!_)_(?!_)(.+?)(?<!_)_(?!_)` matches `_text_` anywhere, even when underscores are clearly word-internal separators rather than markdown italic delimiters.
Fix
Updated the regex to require whitespace or start/end position around underscore delimiters: `(?<=\s|^)_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]])`
This matches CommonMark spec for flanking delimiter runs: only `_text_` with proper word boundaries is treated as italic formatting. Underscores inside words (like snake_case identifiers) are preserved.
Test Cases
| Input | Before | After |
|---|---|---|
| `here_is_a_message` | `hereisamessage` | `here_is_a_message` ✅ |
| `hello _world_ there` | `hello world there` | `hello world there` ✅ |
| `_italic text_` | `italic text` | `italic text` ✅ |
| `snake_case_var` | `snakecasevar` | `snake_case_var` ✅ |
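The four test cases above can be written as a small table-driven check. This is a sketch under the assumption that only the underscore step matters for these inputs; `stripUnderscoreStep` and `cases` are hypothetical names, and the check covers only the post-fix result for each input.

```typescript
// Regex from this PR, applied on its own (not the full stripMarkdown pipeline).
const underscoreItalic = /(?<=\s|^)_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]])/g;

function stripUnderscoreStep(text: string): string {
  return text.replace(underscoreItalic, "$1");
}

// [input, expected output after the fix], taken from the test cases above.
const cases: Array<[string, string]> = [
  ["here_is_a_message", "here_is_a_message"],
  ["hello _world_ there", "hello world there"],
  ["_italic text_", "italic text"],
  ["snake_case_var", "snake_case_var"],
];

// Collect every case whose actual output differs from the expected one.
const failures = cases.filter(
  ([input, expected]) => stripUnderscoreStep(input) !== expected,
);

console.log(failures.length === 0 ? "all cases pass" : failures);
```

Adding punctuation-flanked inputs such as `(_italic text_)` to `cases` would be one way to cover the regression raised in review.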