fix(line): preserve underscores inside words in stripMarkdown#47465
fix(line): preserve underscores inside words in stripMarkdown#47465Takhoffman merged 4 commits intoopenclaw:mainfrom
Conversation
xkonjin
left a comment
There was a problem hiding this comment.
Code Review — Preserve underscores inside words in stripMarkdown
Small, focused fix. The regex change is correct:
// Before: (?<!_)_(?!_)(.+?)(?<!_)_(?!_)
// After: (?<!\w)_(?!_)(.+?)(?<!_)_(?!\w)
Changing the lookbehind/lookahead from _ to \w means snake_case_var no longer has its underscores stripped, while standalone _italic_ still works.
-
Edge case — underscores at string boundaries: The tests cover
_italic_ at startandend _italic_, which is good since\wlookbehind at position 0 correctly fails (no preceding word char). -
Potential false negative:
(_italic_)— underscore italic wrapped in parens. The(is not\w, so the lookbehind passes and this still strips correctly. Good. -
Missed edge case: What about
_multi word italic_? The.+?is non-greedy, and there's no\wadjacent, so this should still strip. Confirmed by the existing "complex markdown" test. -
Another edge case worth testing: A string like
foo_bar _italic_ baz_qux— the middle_italic_should strip while the surrounding underscores infoo_barandbaz_quxstay. Might be worth adding this as a test case to lock down the behavior.
Clean fix, good tests. 👍
Greptile SummaryThis PR fixes a regex bug in
Confidence Score: 5/5
Last reviewed commit: b70adeb |
0cb7338 to
41af5ee
Compare
|
@xkonjin thanks for the thorough review! Added the |
16b70ec to
181cd2b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a07dd6a728
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dacda07be4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Good catch on the Unicode boundary — switched from |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cf98fd84f2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/line/markdown-to-line.ts
Outdated
| // Remove italic: *text* or _text_ (but not already processed) | ||
| result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1"); | ||
| result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1"); | ||
| result = result.replace(/(?<![\p{L}\p{N}])_(?!_)(.+?)(?<!_)_(?![\p{L}\p{N}])/gu, "$1"); |
There was a problem hiding this comment.
Handle underscore italics inside non-spaced scripts
The new (?<![\p{L}\p{N}]) / (?![\p{L}\p{N}]) guards fix snake_case, but they also stop _italic_ from being stripped when it is embedded in languages that normally do not use spaces. For example, stripMarkdown("これは_重要_です") or stripMarkdown("请看_这里_谢谢") now leaves the underscores in place because the delimiters touch CJK letters on both sides. Since stripMarkdown() is reused for LINE/TTS/plain-text send paths, localized markdown replies will now surface literal _ markers instead of emphasized text.
Useful? React with 👍 / 👎.
Tighten italic-underscore regex lookbehind/lookahead from _ to \w so snake_case identifiers are preserved while standalone _italic_ markers are still stripped. Fixes openclaw#46185
Switch from ASCII-only \w to [\p{L}\p{N}] with the u flag so
non-Latin scripts (CJK, Cyrillic, etc.) adjacent to underscores
are not mistakenly treated as italic markers.
0664912 to
934442b
Compare
|
Autoland landed this after a manual freshness rebase onto current What changed during prep:
Verified:
Note: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 934442b9de
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1"); | ||
| result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1"); | ||
| result = result.replace(/(?<![\p{L}\p{N}])_(?!_)(.+?)(?<!_)_(?![\p{L}\p{N}])/gu, "$1"); |
There was a problem hiding this comment.
Preserve underscore italics stripping in non-spaced scripts
The new boundary guard blocks _..._ stripping whenever the delimiters touch Unicode letters, which regresses valid markdown-style emphasis in languages that typically omit spaces. With this regex, inputs like これは_重要_です and 请看_这里_谢谢 now keep literal underscores, whereas the previous implementation removed them; this affects shared stripMarkdown() consumers (LINE fallback text and TTS), so localized emphasized replies can now be rendered/spoken with raw _ markers instead of clean text.
Useful? React with 👍 / 👎.
…aw#47465) Fixes openclaw#46185. Verified: - pnpm install --frozen-lockfile - pnpm build - pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts Note: `pnpm check` currently fails on unchanged `extensions/microsoft/speech-provider.test.ts` lines 108 and 139 on the rebased base, outside this PR diff.
…aw#47465) Fixes openclaw#46185. Verified: - pnpm install --frozen-lockfile - pnpm build - pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts Note: `pnpm check` currently fails on unchanged `extensions/microsoft/speech-provider.test.ts` lines 108 and 139 on the rebased base, outside this PR diff.
…aw#47465) Fixes openclaw#46185. Verified: - pnpm install --frozen-lockfile - pnpm build - pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts Note: `pnpm check` currently fails on unchanged `extensions/microsoft/speech-provider.test.ts` lines 108 and 139 on the rebased base, outside this PR diff.
Summary
stripMarkdownincorrectly stripped underscores inside snake_case words (e.g.here_is_a_message→herisamessage).(?<!_)/(?!_)to(?<!\w)/(?!\w)so only standalone_italic_markers are removed.Fixes #46185
Test plan
pnpm test src/line/markdown-to-line.test.ts— 18/18 passsnake_case_var,here_is_a_messagepreserved