Skip to content

fix(line): preserve underscores inside words in stripMarkdown#47465

Merged
Takhoffman merged 4 commits intoopenclaw:mainfrom
jackjin1997:fix/line-strip-markdown-underscore
Mar 29, 2026
Merged

fix(line): preserve underscores inside words in stripMarkdown#47465
Takhoffman merged 4 commits intoopenclaw:mainfrom
jackjin1997:fix/line-strip-markdown-underscore

Conversation

@jackjin1997
Copy link
Copy Markdown
Contributor

Summary

  • The italic-underscore regex in stripMarkdown incorrectly stripped underscores inside snake_case words (e.g. here_is_a_messageherisamessage).
  • Tighten lookbehind/lookahead from (?<!_) / (?!_) to (?<!\w) / (?!\w) so only standalone _italic_ markers are removed.
  • Added 3 test cases covering underscore preservation and proper italic stripping.

Fixes #46185

Test plan

  • pnpm test src/line/markdown-to-line.test.ts — 18/18 pass
  • New tests verify snake_case_var, here_is_a_message preserved
  • Existing italic stripping tests still pass

Copy link
Copy Markdown

@xkonjin xkonjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — Preserve underscores inside words in stripMarkdown

Small, focused fix. The regex change is correct:

// Before: (?<!_)_(?!_)(.+?)(?<!_)_(?!_)
// After:  (?<!\w)_(?!_)(.+?)(?<!_)_(?!\w)

Changing the lookbehind/lookahead from _ to \w means snake_case_var no longer has its underscores stripped, while standalone _italic_ still works.

  1. Edge case — underscores at string boundaries: The tests cover _italic_ at start and end _italic_, which is good since \w lookbehind at position 0 correctly fails (no preceding word char).

  2. Potential false negative: (_italic_) — underscore italic wrapped in parens. The ( is not \w, so the lookbehind passes and this still strips correctly. Good.

  3. Missed edge case: What about _multi word italic_? The .+? is non-greedy, and there's no \w adjacent, so this should still strip. Confirmed by the existing "complex markdown" test.

  4. Another edge case worth testing: A string like foo_bar _italic_ baz_qux — the middle _italic_ should strip while the surrounding underscores in foo_bar and baz_qux stay. Might be worth adding this as a test case to lock down the behavior.

Clean fix, good tests. 👍

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 15, 2026

Greptile Summary

This PR fixes a regex bug in stripMarkdown where the underscore-italic pattern /(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g would incorrectly strip underscores inside snake_case identifiers (e.g. snake_case_varsnakevar). The fix tightens both the opening lookbehind (from (?<!_) to (?<!\w)) and the closing lookahead (from (?!_) to (?!\w)), so only underscores at true word boundaries are treated as italic markers.

  • Root cause: The old (?<!_) lookbehind only guarded against a preceding underscore; any letter or digit before _ would still permit an italic match, mangling words like snake_case_var.
  • Fix: Changing to (?<!\w) ensures any word character ([a-zA-Z0-9_]) before an opening _ disqualifies it as an italic opener — cleanly aligning with CommonMark's left-flanking delimiter rule.
  • Bonus fix not in the description: The old closing lookahead (?!_) would also incorrectly strip _italic_word (since w_); the new (?!\w) correctly preserves such patterns.
  • Tests: 6 new test cases cover snake_case preservation and correct italic stripping; all existing tests continue to pass.

Confidence Score: 5/5

  • This PR is safe to merge — it is a well-scoped regex fix with comprehensive tests and no breaking changes.
  • The regex change is minimal, logically correct, and well-aligned with CommonMark's word-boundary rules for underscore italic markers. The six new test cases directly validate both the preservation of snake_case words and the continued stripping of legitimate italic syntax. No other code paths are affected.
  • No files require special attention.

Last reviewed commit: b70adeb

@jackjin1997 jackjin1997 force-pushed the fix/line-strip-markdown-underscore branch from 0cb7338 to 41af5ee Compare March 15, 2026 16:33
@jackjin1997
Copy link
Copy Markdown
Contributor Author

@xkonjin thanks for the thorough review! Added the foo_bar _italic_ baz_qux edge case test — confirms italic between underscored words strips correctly while preserving the surrounding underscores. 21/21 pass.

@jackjin1997 jackjin1997 force-pushed the fix/line-strip-markdown-underscore branch 2 times, most recently from 16b70ec to 181cd2b Compare March 17, 2026 04:14
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a07dd6a728

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dacda07be4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@jackjin1997
Copy link
Copy Markdown
Contributor Author

Good catch on the Unicode boundary — switched from \w to [\p{L}\p{N}] with the u flag so non-Latin scripts (CJK, Cyrillic) adjacent to underscores are preserved. Added test cases for Russian, Japanese, and digit boundaries.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf98fd84f2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// Remove italic: *text* or _text_ (but not already processed)
result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1");
result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");
result = result.replace(/(?<![\p{L}\p{N}])_(?!_)(.+?)(?<!_)_(?![\p{L}\p{N}])/gu, "$1");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle underscore italics inside non-spaced scripts

The new (?<![\p{L}\p{N}]) / (?![\p{L}\p{N}]) guards fix snake_case, but they also stop _italic_ from being stripped when it is embedded in languages that normally do not use spaces. For example, stripMarkdown("これは_重要_です") or stripMarkdown("请看_这里_谢谢") now leaves the underscores in place because the delimiters touch CJK letters on both sides. Since stripMarkdown() is reused for LINE/TTS/plain-text send paths, localized markdown replies will now surface literal _ markers instead of emphasized text.

Useful? React with 👍 / 👎.

jackjin1997 and others added 4 commits March 28, 2026 21:22
Tighten italic-underscore regex lookbehind/lookahead from _ to \w
so snake_case identifiers are preserved while standalone _italic_
markers are still stripped.

Fixes openclaw#46185
Switch from ASCII-only \w to [\p{L}\p{N}] with the u flag so
non-Latin scripts (CJK, Cyrillic, etc.) adjacent to underscores
are not mistakenly treated as italic markers.
@Takhoffman Takhoffman force-pushed the fix/line-strip-markdown-underscore branch from 0664912 to 934442b Compare March 29, 2026 02:30
@openclaw-barnacle openclaw-barnacle bot added the channel: line Channel integration: line label Mar 29, 2026
@Takhoffman Takhoffman merged commit a0407c7 into openclaw:main Mar 29, 2026
10 checks passed
@Takhoffman
Copy link
Copy Markdown
Contributor

Autoland landed this after a manual freshness rebase onto current main.

What changed during prep:

  • moved the underscore fix onto the shared stripMarkdown helper introduced by the markdown refactor
  • kept LINE coverage and added shared TTS regression coverage for Latin, Cyrillic, and CJK underscore cases
  • added the required changelog entry

Verified:

  • pnpm install --frozen-lockfile
  • pnpm build
  • pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts

Note: pnpm check is currently failing on unchanged extensions/microsoft/speech-provider.test.ts typing at lines 108 and 139 on the rebased base, outside this PR diff.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 934442b9de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1");
result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");
result = result.replace(/(?<![\p{L}\p{N}])_(?!_)(.+?)(?<!_)_(?![\p{L}\p{N}])/gu, "$1");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve underscore italics stripping in non-spaced scripts

The new boundary guard blocks _..._ stripping whenever the delimiters touch Unicode letters, which regresses valid markdown-style emphasis in languages that typically omit spaces. With this regex, inputs like これは_重要_です and 请看_这里_谢谢 now keep literal underscores, whereas the previous implementation removed them; this affects shared stripMarkdown() consumers (LINE fallback text and TTS), so localized emphasized replies can now be rendered/spoken with raw _ markers instead of clean text.

Useful? React with 👍 / 👎.

alexjiang1 pushed a commit to alexjiang1/openclaw that referenced this pull request Mar 31, 2026
…aw#47465)

Fixes openclaw#46185.

Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts

Note: `pnpm check` currently fails on unchanged `extensions/microsoft/speech-provider.test.ts` lines 108 and 139 on the rebased base, outside this PR diff.
livingghost pushed a commit to livingghost/openclaw that referenced this pull request Mar 31, 2026
…aw#47465)

Fixes openclaw#46185.

Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts

Note: `pnpm check` currently fails on unchanged `extensions/microsoft/speech-provider.test.ts` lines 108 and 139 on the rebased base, outside this PR diff.
pgondhi987 pushed a commit to pgondhi987/openclaw that referenced this pull request Mar 31, 2026
…aw#47465)

Fixes openclaw#46185.

Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts

Note: `pnpm check` currently fails on unchanged `extensions/microsoft/speech-provider.test.ts` lines 108 and 139 on the rebased base, outside this PR diff.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: line Channel integration: line size: XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BlueBubbles: stripMarkdown regex too aggressive — strips underscores inside words

3 participants