Auto-escape some markdown syntax in markdown labels #13887

Merged
lukasmasuch merged 2 commits into develop from
lukasmasuch/fix-markdown-dash
Feb 11, 2026

@lukasmasuch
Collaborator

@lukasmasuch lukasmasuch commented Feb 10, 2026

Describe your changes

This PR fixes issue #7359, where widget labels containing markdown syntax characters (-, +, *, #, >, 1.) rendered as empty labels: the markdown parser interpreted them as list markers, headings, or blockquotes, and those elements are stripped from labels.

The fix escapes these markdown syntax patterns when isLabel is true, converting them to literal text by adding backslash escapes before markdown is processed. The escaping only applies to patterns followed by whitespace or end of line (e.g., "- item" is escaped but "not-a-list" is not).
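
As a rough illustration of the escaping described above, here is a minimal sketch built on the two regex patterns quoted in the automated review of this PR. The helper name `escapeLabelMarkdown`, its signature, and the replacement strings are assumptions made for illustration; the actual code in StreamlitMarkdown.tsx may differ.

```typescript
// Hypothetical helper: name, signature, and replacement strings are
// illustrative assumptions, not copied from the PR diff.
function escapeLabelMarkdown(source: string, isLabel: boolean): string {
  if (!isLabel) {
    return source;
  }
  return (
    source
      // Unordered list markers (-, +, *), headings (#), and blockquotes (>)
      // at the start of a line get a backslash in front of the marker.
      .replace(/^(\s*)((?:[+\-*]|#+)(?=\s|$)|>)/gm, "$1\\$2")
      // Ordered list markers: only the "." or ")" is escaped, not the digits.
      .replace(/^(\s*)(\d+)([.)])(?=\s|$)/gm, "$1$2\\$3")
  );
}

console.log(escapeLabelMarkdown("- item", true)); // \- item
console.log(escapeLabelMarkdown("1. Something", true)); // 1\. Something
console.log(escapeLabelMarkdown("not-a-list", true)); // not-a-list
```

Inputs such as `#hashtag` or `1.5` pass through unchanged because the lookahead requires whitespace or end of line after the marker.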

GitHub Issue Link (if applicable)

Fixes #7359

Testing Plan

  • Unit Tests: 175 passing tests covering escaped patterns, non-escaped patterns, edge cases with pre-escaped text, and emphasis markdown
  • E2E Tests: Added test cases for "+" and "1. Something" labels to verify they display correctly in buttons
  • No additional manual testing needed beyond existing test coverage

Co-authored-by: sea-turt1e [email protected]


Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

Escape markdown syntax patterns in labels that would otherwise be stripped,
leaving empty content. The fix adds backslash escapes before markdown list
markers (-, +, *), headings (#), blockquotes (>), and ordered list markers
(such as "1." and "1)") when they appear at the start of a line followed by
whitespace or end of line.

Also includes comprehensive unit tests covering escaped patterns, non-escaped
patterns, and edge cases like pre-escaped text and emphasis markdown.
Copilot AI review requested due to automatic review settings February 10, 2026 16:34
@snyk-io
Contributor

snyk-io bot commented Feb 10, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scanner | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- | --- |
|  | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
|  | Licenses | 0 | 0 | 0 | 0 | 0 issues |

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

✅ PR preview is ready!

| Name | Link |
| --- | --- |
| 📦 Wheel file | https://core-previews.s3-us-west-2.amazonaws.com/pr-13887/streamlit-1.54.0-py3-none-any.whl |
| 📦 @streamlit/component-v2-lib | Download from artifacts |
| 🕹️ Preview app | pr-13887.streamlit.app (☁️ Deploy here if not accessible) |

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes issue #7359 where widget labels containing markdown syntax characters (-, +, *, #, >, 1.) would render as empty because the markdown parser interpreted them as list markers, headings, or blockquotes that are then stripped from labels.

Changes:

  • Added escaping logic in StreamlitMarkdown.tsx to escape markdown syntax patterns when isLabel is true
  • Added comprehensive unit tests covering escaped patterns, non-escaped patterns, edge cases, and emphasis markdown
  • Added E2E tests for button labels with "+" and "1. Something" to verify the fix works end-to-end

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| frontend/lib/src/components/shared/StreamlitMarkdown/StreamlitMarkdown.tsx | Implements markdown syntax escaping for labels using two regex patterns: one for unordered lists/headings/blockquotes, one for ordered lists |
| frontend/lib/src/components/shared/StreamlitMarkdown/StreamlitMarkdown.test.tsx | Adds 175 test cases covering escaped patterns (-, +, *, #, >, 1.), non-escaped patterns (mid-word hyphens, hashtags, decimals), pre-escaped text, and emphasis markdown; updates existing test expectations to reflect the new escaping behavior |
| e2e_playwright/st_button_test.py | Adds E2E test to verify markdown syntax characters are displayed literally in button labels |
| e2e_playwright/st_button.py | Adds test buttons with "+" and "1. Something" labels to test app |

@lukasmasuch lukasmasuch added the security-assessment-completed, change:feature (PR contains new feature or enhancement implementation), and impact:users (PR changes affect end users) labels Feb 10, 2026
@lukasmasuch lukasmasuch changed the title Fix markdown syntax characters in widget labels Auto-escape some markdown syntax characters in markdown labels Feb 10, 2026
@lukasmasuch lukasmasuch added the ai-review label (if applied to PR or issue, will run AI review workflow) Feb 10, 2026
@github-actions github-actions bot removed the ai-review label Feb 10, 2026
@github-actions
Contributor

Summary

This PR fixes issue #7359 where widget labels containing markdown syntax characters (-, +, *, #, >, 1., etc.) would render as empty or incorrect labels. The root cause was that the markdown parser interpreted these patterns as list markers, headings, or blockquotes, which were then stripped by the existing disallowedElements mechanism — leaving empty labels.

The fix adds regex-based escaping of markdown syntax patterns in processedSource when isLabel is true, converting them to literal text (via backslash escapes) before the markdown parser processes them. The escaping is carefully scoped to only match patterns at the start of a line followed by whitespace or end-of-line, avoiding false positives on text like not-a-list, #hashtag, or 1.5.

Changed files:

  • frontend/lib/src/components/shared/StreamlitMarkdown/StreamlitMarkdown.tsx — Core fix (regex escaping)
  • frontend/lib/src/components/shared/StreamlitMarkdown/StreamlitMarkdown.test.tsx — Unit tests
  • e2e_playwright/st_button.py — E2E test app script
  • e2e_playwright/st_button_test.py — E2E test assertions

Code Quality

The implementation is clean and well-structured:

  • Regex correctness: Both regex patterns are well-crafted:
    • /^(\s*)((?:[+\-*]|#+)(?=\s|$)|>)/gm handles unordered lists, headings, and blockquotes correctly.
    • /^(\s*)(\d+)([.)])(?=\s|$)/gm handles ordered lists, escaping only the punctuation (not the digits).
    • The gm flags ensure multi-line handling works correctly.
    • The ^ anchor prevents matching mid-line occurrences (e.g., not-a-list).
  • No double-escaping: Pre-escaped input like 1\. text or \- text does not match the regexes because \ is not in the matched character classes. This is correct.
  • Complement with existing mechanism: The LABEL_DISALLOWED_ELEMENTS list (lines 871-892) still serves its purpose for elements not handled by escaping (e.g., tables). The two mechanisms work well together.
  • Proper memoization: The processing is inside useMemo with the correct dependency array [source, isLabel] (line 1062).
  • Good inline comments: The regex patterns are well-documented with examples of what they escape and what they don't.
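
The multi-line and no-double-escaping points above can be checked with a small sketch. It reuses the two patterns quoted in this review; the helper name `escapeOnce` and the replacement strings are assumptions for illustration and are not taken from the PR diff.

```typescript
// Patterns as quoted in the review; the replacement strings are assumed.
const escapeOnce = (s: string): string =>
  s
    .replace(/^(\s*)((?:[+\-*]|#+)(?=\s|$)|>)/gm, "$1\\$2")
    .replace(/^(\s*)(\d+)([.)])(?=\s|$)/gm, "$1$2\\$3");

// With the "m" flag, "^" anchors at the start of every line, so each
// line of a multi-line label is escaped independently.
const multiLine = "- a\n+ b\n1. c";
const once = escapeOnce(multiLine);

// Escaped markers now start with "\" (or have "\" before the punctuation),
// so a second pass finds nothing left to escape.
const twice = escapeOnce(once);

console.log(once === "\\- a\n\\+ b\n1\\. c"); // true
console.log(once === twice); // true
console.log(escapeOnce("1\\. text") === "1\\. text"); // true
```

Under these assumptions the transformation is idempotent on its own output, which is what keeps pre-escaped user input from gaining a second backslash.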

Minor observations:

  1. invalidCases test description is slightly stale (StreamlitMarkdown.test.tsx line 630): The test name "does NOT render invalid markdown when isLabel is true" was accurate when the behavior was "strip disallowed elements." Now the behavior is "escape so markdown is never parsed in the first place." The test still passes correctly (escaped text renders as a <p>, not the disallowed tag), but the description could be updated to reflect the new escaping behavior for clarity.

  2. getBy* + toBeInTheDocument pattern (e.g., lines 633-634, 701-702): Per the frontend AGENTS.md, getBy* already throws if the element is not found, making toBeInTheDocument redundant — toBeVisible is preferred. However, this pattern is extensively used in the existing test file, so it's consistent with the surrounding code.

Test Coverage

Unit tests (113 lines added): Comprehensive and well-organized:

  • markdownEscapingCases (17 cases): Covers all escaped patterns — single characters (-, +, *, >, #), patterns with text, indented patterns, ordered lists with . and ), multi-digit ordered lists (99.), and multi-hash headings. Also verifies elements render as <p> (plain paragraph), not as special elements.
  • nonEscapingCases (5 cases): Important anti-regression tests ensuring no over-escaping: mid-word hyphens, hashtags without a following space (#hashtag), decimal numbers, and pre-escaped text.
  • Emphasis test: Verifies *italic label* still renders as <em>, confirming the regex doesn't break emphasis syntax.
  • Updated invalidCases: Expectations correctly updated to reflect the new literal-text behavior.

E2E tests (12 lines added): Appropriately lightweight:

  • Tests + and 1. Something labels on buttons, verifying they display literally.
  • Uses get_element_by_key per best practices.
  • TOTAL_BUTTONS count correctly updated from 30 to 32.
  • Minor: Per e2e AGENTS.md, adding a negative assertion would strengthen the test (e.g., assert no empty button text or that the button label is not empty). However, the to_contain_text assertions implicitly verify the text is present and non-empty.

Backwards Compatibility

This is a backwards-compatible bug fix. The behavioral change only affects labels that previously had their content incorrectly stripped:

  • Labels with plain text are unaffected.
  • Labels using inline markdown (bold, italic, code, links) are unaffected — the regex only matches line-start patterns with space/EOL lookahead.
  • Labels that were previously empty due to the bug now correctly display the intended text.

The only theoretical concern would be users intentionally relying on markdown stripping in labels (e.g., using # as a prefix that gets removed). This would be an extreme edge case exploiting buggy behavior, and the fix is clearly the correct behavior.

Security & Risk

  • Low risk: The change is a pre-processing text transformation (regex escaping) that runs before the existing markdown parser. It adds backslash characters, which is safe.
  • No XSS concern: The escaping only adds \ characters to prevent markdown parsing. It does not remove any existing security mechanisms (HTML sanitization, allowHTML controls, etc.).
  • No user-provided regex: The regex patterns are static constants, not derived from user input.

Accessibility

This change improves accessibility by ensuring widget labels display their intended text content. Previously, labels like "-" or "+" would render as empty, which is problematic for both visual and screen reader users. Now they render as visible, readable text.

No new interactive elements or ARIA attributes are introduced, so no additional accessibility concerns apply.

Recommendations

  1. (Optional) Update the invalidCases test description at StreamlitMarkdown.test.tsx line 630: Consider renaming from "does NOT render invalid markdown when isLabel is true - $tag" to something like "escapes markdown syntax in labels so it renders as plain text - $tag" to better reflect the new escaping behavior.

  2. (Optional) Add a negative assertion to the E2E test: Per e2e best practices, test_markdown_syntax_in_labels could assert that the button labels are not empty elements, e.g., checking that the button doesn't contain an empty <p> tag or verifying to_have_count(1) for the text locator.

  3. (Nit) Consider extracting regexes to module-level constants: Per the "Static Data Structures" guideline in frontend/AGENTS.md, the two regex patterns inside useMemo are static and could be extracted to module-level const variables. While useMemo already prevents re-creation on re-renders, module-level extraction would be slightly cleaner and make them reusable/testable independently. However, since they're only used in one place and useMemo handles the performance concern, this is a minor style preference.
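
For recommendation 3, a minimal sketch of what module-level extraction might look like. The constant and function names are hypothetical; only the patterns themselves come from this review, and the replacement strings are assumed.

```typescript
// Hypothetical names; created once at module load rather than inside useMemo.
const LABEL_BLOCK_MARKER_REGEX = /^(\s*)((?:[+\-*]|#+)(?=\s|$)|>)/gm;
const LABEL_ORDERED_LIST_REGEX = /^(\s*)(\d+)([.)])(?=\s|$)/gm;

// A pure function over the shared constants, testable without rendering
// a component.
function escapeLabelSyntax(source: string): string {
  return source
    .replace(LABEL_BLOCK_MARKER_REGEX, "$1\\$2")
    .replace(LABEL_ORDERED_LIST_REGEX, "$1$2\\$3");
}

// Repeated calls on the shared regexes give the same result.
console.log(escapeLabelSyntax("+")); // \+
console.log(escapeLabelSyntax("+")); // \+
```

One design caveat: regexes with the `g` flag keep a `lastIndex` between `.test()` and `.exec()` calls, so shared module-level instances are stateful when used that way. `String.prototype.replace` resets `lastIndex` before matching, so the sketch above is unaffected, but any future caller using `.test()` on these constants would need to account for it.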

Verdict

APPROVED: Clean, well-tested bug fix that correctly escapes markdown syntax characters in widget labels to prevent them from being stripped, resolving #7359 without breaking existing functionality.


This is an automated AI review by opus-4.6-thinking.


@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

@lukasmasuch lukasmasuch changed the title Auto-escape some markdown syntax characters in markdown labels Auto-escape some markdown syntax in markdown labels Feb 10, 2026
@lukasmasuch lukasmasuch merged commit 9661d92 into develop Feb 11, 2026
56 of 57 checks passed
@lukasmasuch lukasmasuch deleted the lukasmasuch/fix-markdown-dash branch February 11, 2026 17:11
Labels

change:feature (PR contains new feature or enhancement implementation)
impact:users (PR changes affect end users)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

st.button("+") or st.button("-") results in empty button

3 participants