fix(bluebubbles): preserve intra-word underscores in stripMarkdown #46292

Open
sudabg wants to merge 1 commit into openclaw:main from sudabg:fix/stripmarkdown-intra-word


sudabg commented Mar 14, 2026

Summary

Fixes #46185

The stripMarkdown function used in BlueBubbles outbound text processing strips single underscores inside words (e.g., here_is_a_message_with_underscores → hereisamessagewithunderscores).

Root Cause

The underscore-based italic regex (?<!_)_(?!_)(.+?)(?<!_)_(?!_) matches _text_ anywhere, even when underscores are clearly word-internal separators rather than markdown italic delimiters.

Fix

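Updated the regex so that the opening underscore must follow whitespace or the start of the string, and the closing underscore must precede whitespace, the end of the string, or closing punctuation: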

// Before
result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");

// After (CommonMark-compliant)
result = result.replace(/(?<=\s|^)_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]])/g, "$1");

This is meant to match the CommonMark rules for flanking delimiter runs: only _text_ with proper word boundaries is treated as italic formatting. Underscores inside words (like snake_case identifiers) are preserved.

Test Cases

| Input | Before | After |
|---|---|---|
| `here_is_a_message` | `hereisamessage` | `here_is_a_message` |
| `hello _world_ there` | `hello world there` | `hello world there` |
| `_italic text_` | `italic text` | `italic text` |
| `snake_case_var` | `snakecasevar` | `snake_case_var` |
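
The rows above can be checked against the two patterns in isolation. This is a minimal sketch: both regex literals are copied from the description, while the helper names `stripItalicOld` and `stripItalicNew` are hypothetical and the other replacements in `stripMarkdown` are not reproduced.

```typescript
// Regex literals copied from the PR description; helper names are hypothetical.
const OLD_ITALIC = /(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g;
const NEW_ITALIC = /(?<=\s|^)_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]])/g;

// Apply only the underscore-italic step, not the full stripMarkdown pipeline.
function stripItalicOld(text: string): string {
  return text.replace(OLD_ITALIC, "$1");
}

function stripItalicNew(text: string): string {
  return text.replace(NEW_ITALIC, "$1");
}

const tableInputs = [
  "here_is_a_message",
  "hello _world_ there",
  "_italic text_",
  "snake_case_var",
];

// Print input, old result, and new result side by side.
for (const input of tableInputs) {
  console.log(JSON.stringify(input), "->", stripItalicOld(input), "|", stripItalicNew(input));
}
```

Because the old pattern consumes underscores in pairs, an input with an odd number of them can keep one when this single step runs alone, so its output may not match the Before column exactly. The After column is unaffected: none of the intra-word underscores satisfy the new opening lookbehind.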

The underscore-based italic regex was too aggressive and stripped
underscores inside words like `here_is_a_message_with_underscores`.

Fix: Require whitespace or start/end position around underscore
delimiters, matching CommonMark spec for flanking delimiter runs.
Only underscores that look like actual markdown italic formatting
(`_text_` with proper boundaries) are now stripped.

Fixes openclaw#46185

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f8184f075

 result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1");
-result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");
+// Only match _text_ when underscores are at word boundaries (CommonMark-compliant)
+result = result.replace(/(?<=\s|^)_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]])/g, "$1");
P2: Allow punctuation-flanked _italic_ delimiters

The new underscore-italic regex only opens on (?<=\s|^), so valid markdown like (_italic_) or "_italic_" is no longer stripped and leaks literal underscores into plain-text output. In stripMarkdown, this is a regression because the function is intended to remove markdown markers generally, and CommonMark permits underscore emphasis when delimiters are adjacent to punctuation (not only whitespace/start).
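
To make the punctuation point concrete, here is a rough sketch of the CommonMark flanking rules for a single `_`. The function `underscoreRole` and its helpers are hypothetical and not part of this PR. Punctuation is approximated with Unicode general categories P and S, and characters outside the Basic Multilingual Plane are not handled.

```typescript
// Approximation of "Unicode punctuation" via general categories P and S (an assumption).
// Built with the RegExp constructor so the `u` flag is resolved at runtime.
const PUNCT_CHAR = new RegExp("^[\\p{P}\\p{S}]$", "u");
const SPACE_CHAR = /^\s$/;

// Start and end of input count as whitespace in the flanking definitions.
function isSpaceLike(ch: string | undefined): boolean {
  return ch === undefined || SPACE_CHAR.test(ch);
}

function isPunctLike(ch: string | undefined): boolean {
  return ch !== undefined && PUNCT_CHAR.test(ch);
}

// Decide whether the single `_` at index i may open or close emphasis.
function underscoreRole(text: string, i: number): { canOpen: boolean; canClose: boolean } {
  const before = i > 0 ? text[i - 1] : undefined;
  const after = i + 1 < text.length ? text[i + 1] : undefined;

  // Left-flanking: not followed by whitespace, and if followed by punctuation,
  // then preceded by whitespace or punctuation.
  const leftFlanking =
    !isSpaceLike(after) && (!isPunctLike(after) || isSpaceLike(before) || isPunctLike(before));

  // Right-flanking: the mirror image of the rule above.
  const rightFlanking =
    !isSpaceLike(before) && (!isPunctLike(before) || isSpaceLike(after) || isPunctLike(after));

  return {
    // `_` opens only if it is not also right-flanking, unless punctuation precedes it.
    canOpen: leftFlanking && (!rightFlanking || isPunctLike(before)),
    // `_` closes only if it is not also left-flanking, unless punctuation follows it.
    canClose: rightFlanking && (!leftFlanking || isPunctLike(after)),
  };
}

console.log(underscoreRole("(_italic_)", 1)); // opening `_` after "("
console.log(underscoreRole("(_italic_)", 8)); // closing `_` before ")"
console.log(underscoreRole("snake_case", 5)); // intra-word `_`
```

Under these rules the underscore after `(` can open emphasis and the one before `)` can close it, while the intra-word underscore in `snake_case` can do neither. That is the behavior the whitespace-only lookbehind does not capture.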

greptile-apps bot commented Mar 14, 2026

Greptile Summary

This PR fixes a bug in stripMarkdown where the underscore-italic regex was incorrectly stripping underscores inside snake_case identifiers (e.g. here_is_a_message → hereisamessage). The fix works for the reported case, but the updated regex introduces a regression for italic text that appears after opening punctuation characters.

  • Fix is correct for the reported case: snake_case identifiers like here_is_a_message are now preserved since none of their underscores are preceded by whitespace or start-of-string.
  • Regression for italic after punctuation: The (?<=\s|^) lookbehind is too strict — valid italic like (_italic_), "_italic_", or [_see this_] will no longer be stripped, leaving raw underscores in the plain-text output.
  • Closing lookahead is asymmetric: the closing _ requires (?=\s|$|[.,;:!?)/\]]) (specific punctuation), but opening punctuation characters ((, [, ", etc.) are not included in the lookbehind.
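
The asymmetry in the last bullet can be reproduced with the PR's pattern alone. This is a small sketch: the regex is copied from the PR's "After" line, and the helper name `stripWithPrPattern` is hypothetical.

```typescript
// Pattern copied from the PR's "After" line; the helper name is hypothetical.
const PR_ITALIC = /(?<=\s|^)_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]])/g;

function stripWithPrPattern(text: string): string {
  return text.replace(PR_ITALIC, "$1");
}

// Closing punctuation after the second underscore is accepted by the lookahead.
console.log(stripWithPrPattern("see _this_)")); // prints "see this)"

// Opening punctuation before the first underscore is rejected by the lookbehind.
console.log(stripWithPrPattern("(_this_ too")); // prints "(_this_ too"

// Fully wrapped italic therefore keeps both underscores.
console.log(stripWithPrPattern("(_italic_)")); // prints "(_italic_)"
```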

Confidence Score: 3/5

  • The fix resolves the reported snake_case bug but introduces a regression for italic text after opening punctuation characters.
  • The change correctly fixes the snake_case identifier stripping issue, but the (?<=\s|^) lookbehind is overly restrictive. Italic-marked text following (, [, ", or ' will no longer be stripped, leaving raw underscores in plain-text output where none were expected before.
  • src/line/markdown-to-line.ts — the updated italic regex needs a broader lookbehind to cover opening punctuation characters.

Last reviewed commit: 7f8184f

 result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1");
-result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");
+// Only match _text_ when underscores are at word boundaries (CommonMark-compliant)
+result = result.replace(/(?<=\s|^)_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]])/g, "$1");
Opening delimiter lookbehind too restrictive — regression for italic after punctuation

The (?<=\s|^) lookbehind only allows whitespace or start-of-string before the opening _. This means valid italic markdown that immediately follows opening punctuation — such as (, [, ", or ' — will no longer be stripped, leaving raw underscores in the plain-text output.

For example:

| Input | Before fix | After this fix |
|---|---|---|
| `(_italic text_)` | `(italic text)` | `(_italic text_)` |
| `"_italic_"` | `"italic"` | `"_italic_"` |
| `[_see this_]` | `[see this]` | `[_see this_]` |

A more complete fix would also allow opening punctuation characters in the lookbehind, matching closer to the CommonMark spec (which says a left-flanking delimiter run may be preceded by Unicode punctuation):

Suggested change
-result = result.replace(/(?<=\s|^)_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]])/g, "$1");
+result = result.replace(/(?<=\s|^|[([{"'`])_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]}"'`])/g, "$1");

This extends both the lookbehind and the closing lookahead to cover matching punctuation pairs, preserving the snake_case fix while not regressing on italic-inside-punctuation cases.
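
As a quick check of that claim, the sketch below runs the suggested pattern against the punctuation cases from the table and two snake_case inputs. The regex is copied from the suggestion. The helper name `stripWithSuggestedPattern` and the `suggestedCases` list are hypothetical.

```typescript
// Pattern copied from the review suggestion; the helper name is hypothetical.
const SUGGESTED_ITALIC = /(?<=\s|^|[([{"'`])_(?!_)(.+?)(?<!_)_(?=\s|$|[.,;:!?)/\]}"'`])/g;

function stripWithSuggestedPattern(text: string): string {
  return text.replace(SUGGESTED_ITALIC, "$1");
}

// Pairs of [input, expected output] taken from the comment's table plus snake_case inputs.
const suggestedCases: Array<[string, string]> = [
  ["(_italic text_)", "(italic text)"],
  ['"_italic_"', '"italic"'],
  ["[_see this_]", "[see this]"],
  ["snake_case_var", "snake_case_var"],
  ["here_is_a_message", "here_is_a_message"],
];

for (const [input, expected] of suggestedCases) {
  const actual = stripWithSuggestedPattern(input);
  const status = actual === expected ? "ok  " : "FAIL";
  console.log(status, JSON.stringify(input), "->", JSON.stringify(actual));
}
```

The character classes are fixed ASCII lists, so this remains an approximation: punctuation outside those lists (for example typographic quotes) would still block the match.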

Development

Successfully merging this pull request may close these issues.

BlueBubbles: stripMarkdown regex too aggressive — strips underscores inside words
