fix(line): preserve underscores inside words in stripMarkdown by jackjin1997 · Pull Request #47465 · openclaw/openclaw

jackjin1997 · 2026-03-15T16:17:13Z

Summary

The italic-underscore regex in stripMarkdown incorrectly stripped underscores inside snake_case words (e.g. here_is_a_message → herisamessage).
Tighten lookbehind/lookahead from (?<!_) / (?!_) to (?<!\w) / (?!\w) so only standalone _italic_ markers are removed.
Added 3 test cases covering underscore preservation and proper italic stripping.

Fixes #46185

Test plan

pnpm test src/line/markdown-to-line.test.ts — 18/18 pass
New tests verify snake_case_var, here_is_a_message preserved
Existing italic stripping tests still pass

xkonjin

Code Review — Preserve underscores inside words in stripMarkdown

Small, focused fix. The regex change is correct:

// Before: (?<!_)_(?!_)(.+?)(?<!_)_(?!_)
// After:  (?<!\w)_(?!_)(.+?)(?<!_)_(?!\w)

Changing the lookbehind/lookahead from _ to \w means snake_case_var no longer has its underscores stripped, while standalone _italic_ still works.

Edge case — underscores at string boundaries: The tests cover _italic_ at start and end _italic_, which is good since \w lookbehind at position 0 correctly fails (no preceding word char).
Potential false negative: (_italic_) — underscore italic wrapped in parens. The ( is not \w, so the lookbehind passes and this still strips correctly. Good.
Missed edge case: What about _multi word italic_? The .+? is non-greedy, and there's no \w adjacent, so this should still strip. Confirmed by the existing "complex markdown" test.
Another edge case worth testing: A string like foo_bar _italic_ baz_qux — the middle _italic_ should strip while the surrounding underscores in foo_bar and baz_qux stay. Might be worth adding this as a test case to lock down the behavior.

Clean fix, good tests. 👍

greptile-apps · 2026-03-15T16:22:19Z

Greptile Summary

This PR fixes a regex bug in stripMarkdown where the underscore-italic pattern /(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g would incorrectly strip underscores inside snake_case identifiers (e.g. snake_case_var → snakevar). The fix tightens both the opening lookbehind (from (?<!_) to (?<!\w)) and the closing lookahead (from (?!_) to (?!\w)), so only underscores at true word boundaries are treated as italic markers.

Root cause: The old (?<!_) lookbehind only guarded against a preceding underscore; any letter or digit before _ would still permit an italic match, mangling words like snake_case_var.
Fix: Changing to (?<!\w) ensures any word character ([a-zA-Z0-9_]) before an opening _ disqualifies it as an italic opener — cleanly aligning with CommonMark's left-flanking delimiter rule.
Bonus fix not in the description: The old closing lookahead (?!_) would also incorrectly strip _italic_word (since w ≠ _); the new (?!\w) correctly preserves such patterns.
Tests: 6 new test cases cover snake_case preservation and correct italic stripping; all existing tests continue to pass.

Confidence Score: 5/5

This PR is safe to merge — it is a well-scoped regex fix with comprehensive tests and no breaking changes.
The regex change is minimal, logically correct, and well-aligned with CommonMark's word-boundary rules for underscore italic markers. The six new test cases directly validate both the preservation of snake_case words and the continued stripping of legitimate italic syntax. No other code paths are affected.
No files require special attention.

_{Last reviewed commit: b70adeb}

jackjin1997 · 2026-03-15T16:57:45Z

@xkonjin thanks for the thorough review! Added the foo_bar _italic_ baz_qux edge case test — confirms italic between underscored words strips correctly while preserving the surrounding underscores. 21/21 pass.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a07dd6a728

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

src/line/markdown-to-line.ts

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dacda07be4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

src/line/markdown-to-line.ts

jackjin1997 · 2026-03-18T09:36:34Z

Good catch on the Unicode boundary — switched from \w to [\p{L}\p{N}] with the u flag so non-Latin scripts (CJK, Cyrillic) adjacent to underscores are preserved. Added test cases for Russian, Japanese, and digit boundaries.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf98fd84f2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-18T12:40:47Z

src/line/markdown-to-line.ts

  // Remove italic: *text* or _text_ (but not already processed)
  result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1");
-  result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");
+  result = result.replace(/(?<![\p{L}\p{N}])_(?!_)(.+?)(?<!_)_(?![\p{L}\p{N}])/gu, "$1");


Handle underscore italics inside non-spaced scripts

The new (?<![\p{L}\p{N}]) / (?![\p{L}\p{N}]) guards fix snake_case, but they also stop _italic_ from being stripped when it is embedded in languages that normally do not use spaces. For example, stripMarkdown("これは_重要_です") or stripMarkdown("请看_这里_谢谢") now leaves the underscores in place because the delimiters touch CJK letters on both sides. Since stripMarkdown() is reused for LINE/TTS/plain-text send paths, localized markdown replies will now surface literal _ markers instead of emphasized text.

Useful? React with 👍 / 👎.

Tighten italic-underscore regex lookbehind/lookahead from _ to \w so snake_case identifiers are preserved while standalone _italic_ markers are still stripped. Fixes openclaw#46185

Switch from ASCII-only \w to [\p{L}\p{N}] with the u flag so non-Latin scripts (CJK, Cyrillic, etc.) adjacent to underscores are not mistakenly treated as italic markers.

Takhoffman · 2026-03-29T02:31:20Z

Autoland landed this after a manual freshness rebase onto current main.

What changed during prep:

moved the underscore fix onto the shared stripMarkdown helper introduced by the markdown refactor
kept LINE coverage and added shared TTS regression coverage for Latin, Cyrillic, and CJK underscore cases
added the required changelog entry

Verified:

pnpm install --frozen-lockfile
pnpm build
pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts

Note: pnpm check is currently failing on unchanged extensions/microsoft/speech-provider.test.ts typing at lines 108 and 139 on the rebased base, outside this PR diff.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 934442b9de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-29T02:34:02Z

src/shared/text/strip-markdown.ts


  result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1");
-  result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");
+  result = result.replace(/(?<![\p{L}\p{N}])_(?!_)(.+?)(?<!_)_(?![\p{L}\p{N}])/gu, "$1");


Preserve underscore italics stripping in non-spaced scripts

The new boundary guard blocks _..._ stripping whenever the delimiters touch Unicode letters, which regresses valid markdown-style emphasis in languages that typically omit spaces. With this regex, inputs like これは_重要_です and 请看_这里_谢谢 now keep literal underscores, whereas the previous implementation removed them; this affects shared stripMarkdown() consumers (LINE fallback text and TTS), so localized emphasized replies can now be rendered/spoken with raw _ markers instead of clean text.

Useful? React with 👍 / 👎.

…aw#47465) Fixes openclaw#46185. Verified: - pnpm install --frozen-lockfile - pnpm build - pnpm test -- extensions/line/src/markdown-to-line.test.ts src/tts/prepare-text.test.ts Note: `pnpm check` currently fails on unchanged `extensions/microsoft/speech-provider.test.ts` lines 108 and 139 on the rebased base, outside this PR diff.

openclaw-barnacle bot added the size: XS label Mar 15, 2026

xkonjin reviewed Mar 15, 2026

View reviewed changes

jackjin1997 force-pushed the fix/line-strip-markdown-underscore branch from 0cb7338 to 41af5ee Compare March 15, 2026 16:33

jackjin1997 force-pushed the fix/line-strip-markdown-underscore branch 2 times, most recently from 16b70ec to 181cd2b Compare March 17, 2026 04:14

chatgpt-codex-connector bot reviewed Mar 18, 2026

View reviewed changes

src/line/markdown-to-line.ts Outdated Show resolved Hide resolved

chatgpt-codex-connector bot reviewed Mar 18, 2026

View reviewed changes

src/line/markdown-to-line.ts Outdated Show resolved Hide resolved

chatgpt-codex-connector bot reviewed Mar 18, 2026

View reviewed changes

jackjin1997 and others added 4 commits March 28, 2026 21:22

fix(line): preserve underscores inside words in stripMarkdown

76e82aa

Tighten italic-underscore regex lookbehind/lookahead from _ to \w so snake_case identifiers are preserved while standalone _italic_ markers are still stripped. Fixes openclaw#46185

fix: use Unicode-aware boundary check for underscore italic stripping

5886122

Switch from ASCII-only \w to [\p{L}\p{N}] with the u flag so non-Latin scripts (CJK, Cyrillic, etc.) adjacent to underscores are not mistakenly treated as italic markers.

test: add non-Latin standalone italic edge case

8812d38

Changelog: note LINE underscore preservation fix

934442b

Takhoffman force-pushed the fix/line-strip-markdown-underscore branch from 0664912 to 934442b Compare March 29, 2026 02:30

openclaw-barnacle bot added the channel: line Channel integration: line label Mar 29, 2026

Takhoffman merged commit a0407c7 into openclaw:main Mar 29, 2026
10 checks passed

chatgpt-codex-connector bot reviewed Mar 29, 2026

View reviewed changes

github-actions bot mentioned this pull request Mar 29, 2026

📡 Upstream Digest — 2026-03-29 04:31 UTC curtismercier/openclaw-mods#397

Open

Uh oh!

Conversation

jackjin1997 commented Mar 15, 2026

Summary

Test plan

Uh oh!

xkonjin left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Mar 15, 2026

Greptile Summary

Confidence Score: 5/5

Uh oh!

jackjin1997 commented Mar 15, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

jackjin1997 commented Mar 18, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Takhoffman commented Mar 29, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants