Widgets: Decode HTML entities in widget RSS title before escaping by Infinite-Null · Pull Request #9042 · WordPress/wordpress-develop

Infinite-Null · 2025-06-22T17:48:33Z

What

RSS feeds sometimes contain HTML tags that have been escaped as HTML entities in their titles. The wp_widget_rss_output() function currently strips HTML tags but doesn't decode HTML entities first, so escaped tags are displayed as literal text instead of being removed.

Example:

RSS feed title: Oral administration of Lactiplantibacillus plantarum GKK1 ameliorates atopic dermatitis
Current display: Shows the escaped tags as visible text
Expected display: Clean title without HTML entities or tags

How

Add html_entity_decode() before strip_tags() in the title processing logic.
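The effect of the change can be sketched outside WordPress. This is a minimal illustration, not the patched core code: the feed title below is hypothetical (the <em> markup is assumed for the example), and sketch_esc_html() is an assumed stand-in for esc_html() built on htmlspecialchars() with double-encoding turned off.

```php
<?php
// Hypothetical feed title: the escaped <em> markup is assumed for illustration.
$raw_title = 'Oral administration of &lt;em&gt;Lactiplantibacillus plantarum&lt;/em&gt; GKK1';

// Assumed stand-in for esc_html(): escapes special characters without
// double-encoding entities that are already present.
function sketch_esc_html( string $text ): string {
	return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8', false );
}

// Before: strip_tags() finds no real tags, so the escaped markup survives
// and a browser renders it as literal "<em>" text.
$before = sketch_esc_html( trim( strip_tags( $raw_title ) ) );

// After: decoding first turns the entities into real tags,
// which strip_tags() can then remove.
$after = sketch_esc_html( trim( strip_tags( html_entity_decode( $raw_title ) ) ) );

echo $before, "\n";
echo $after, "\n";
```

In this sketch $before is identical to the raw title, while $after contains only the plain words of the title.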

Testing Instruction

Activate Twenty Thirteen theme
Use Classic Widgets plugin to disable block editor widget editor
Go to widgets and add RSS Widget
Paste this URL: https://pubmed.ncbi.nlm.nih.gov/rss/search/16cUU5Jcud0BSYRzHgbqJGm_F6kq07gr9atM8kZoogUmZdX5oj/

Screenshot:

Before	After

github-actions · 2025-06-22T17:48:44Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props ankitkumarshah, wildworks, mukesh27, n8finch.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

github-actions · 2025-06-22T18:01:58Z

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

The Plugin and Theme Directories cannot be accessed within Playground.
All changes will be lost when closing a tab with a Playground instance.
All changes will be lost when refreshing the page.
A fresh instance is created each time the link below is clicked.
Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance, it's possible that the most recent build failed or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

mukeshpanchal27 · 2025-06-23T08:05:10Z

Thanks @Infinite-Null, the changes look good to me.

n8finch · 2025-06-23T18:49:32Z

@Infinite-Null these changes look good, ~~can you also make the change to the same code in the wp-includes/blocks/rss.php file?~~ Nevermind, I see that's taken care of here: WordPress/gutenberg#70491

Tested well for me on WP 5.0:

Before:

After:

mukeshpanchal27 · 2025-06-24T03:54:34Z

Let's add unit tests that check the updated functionality.

t-hamano · 2025-06-24T04:37:14Z

@Infinite-Null Thanks for the PR.

Are the ENT_QUOTES and get_option( 'blog_charset' ) options necessary? Because the title will be escaped by esc_html at the end.
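The point about the final escaping step can be checked with a small experiment. The sketch below makes two assumptions: the title is a made-up example, and escape_without_double_encoding() is a hypothetical stand-in for esc_html(). It compares decoding with and without ENT_QUOTES and shows that, for this input, the escaped output is the same either way. It does not cover the charset argument.

```php
<?php
// Hypothetical stand-in for esc_html(): escapes quotes and special
// characters, but leaves existing entities untouched.
function escape_without_double_encoding( string $text ): string {
	return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8', false );
}

// Made-up title mixing an ampersand, an escaped apostrophe, and escaped tags.
$raw_title = 'Tom &amp; Jerry&#039;s &lt;b&gt;show&lt;/b&gt;';

// ENT_COMPAT leaves &#039; alone; ENT_QUOTES decodes it to an apostrophe.
$compat = html_entity_decode( $raw_title, ENT_COMPAT, 'UTF-8' );
$quotes = html_entity_decode( $raw_title, ENT_QUOTES, 'UTF-8' );

// The intermediate strings differ, but the final escaping re-encodes the
// apostrophe, so both pipelines end with the same output.
$out_compat = escape_without_double_encoding( trim( strip_tags( $compat ) ) );
$out_quotes = escape_without_double_encoding( trim( strip_tags( $quotes ) ) );

var_dump( $compat === $quotes );
var_dump( $out_compat === $out_quotes );
```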

Infinite-Null · 2025-06-24T07:42:05Z

@t-hamano Thanks for the feedback! I've updated the patch to use the simpler html_entity_decode( $item->get_title() ) without the additional parameters.

Please review at your convenience.

t-hamano · 2025-06-24T07:53:18Z

@Infinite-Null can you check the following feedback?

Let's add unit tests that check the updated functionality.

Infinite-Null · 2025-06-24T08:20:30Z

Sure @mukeshpanchal27 @t-hamano, I will start working on the unit test shortly.

Infinite-Null · 2025-06-27T08:42:27Z

Hi @mukeshpanchal27 and @t-hamano, I have completed the unit test for this PR. Can you please review the test at your convenience?

t-hamano · 2025-06-30T03:22:43Z

Thanks for the update, the unit tests look good to me.

t-hamano · 2025-11-04T07:11:03Z

I have merged the latest trunk branch into this branch. Once all CI checks pass, I will commit this pull request.

t-hamano · 2025-11-04T12:22:44Z

@dmsnell, I noticed your feedback regarding html_entity_decode() on a separate ticket: https://core.trac.wordpress.org/ticket/64177#comment:10

I was planning to commit this PR before beta3, but do you have any feedback regarding this PR? In this case, I'm wondering whether simply using html_entity_decode() is sufficient.

dmsnell

It’s good we think through these changes, and think very intentionally about what kind of data we have, since we have multiple layers of encoding and escaping.

The RSS 2.0 Spec is very clear that only description should contain encoded HTML, and the linked feed is probably generated improperly. That means this fix has the potential to break other feeds, but in practice will probably fix more of them.

If we are going to apply this change to title I think it would make sense to update all of the item elements in this function together so we don’t get into a situation even more confusing than it already is.

dmsnell · 2025-11-04T20:54:59Z

src/wp-includes/widgets.php

 		$link = esc_url( strip_tags( $link ) );

-		$title = esc_html( trim( strip_tags( $item->get_title() ) ) );
+		$title = esc_html( trim( strip_tags( html_entity_decode( $item->get_title() ) ) ) );


sigh. we have a hard time separating XML and HTML. RSS makes it even more complicated because it’s not required to be a valid XML file and they usually don’t indicate how we should be interpreting the characters inside the different elements.

here is what comes from the linked feed. it’s obvious they have sent encoded content inside the title element.

<dc:title>Oral administration of Lactiplantibacillus plantarum GKK1 ameliorates atopic dermatitis in a mouse model</dc:title>

this brings us to an odd spot. if we follow this logic we are missing a second html_entity_decode(), and the first one is wrong. the first level is decoding XML, where HTML named character references are not valid. that produces the HTML of the item’s title. then we want to decode the item’s title, revealing plaintext. but that title itself could have referenced something like drug A is < drug B, in which case its HTML would be drug A is &lt; drug B, and the XML wrapping should escape that a second time into drug A is &amp;lt; drug B.

we have decoded the XML into HTML and then removed tags, potentially leaving those same character references undecoded from the HTML side.
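The two layers can be made concrete with plain PHP. This is only a sketch of the encoding argument, using htmlspecialchars() and html_entity_decode() for both directions; it is not the HTML API approach recommended in this review, and the variable names are illustrative.

```php
<?php
$plaintext = 'drug A is < drug B';

// Layer 1: the plaintext serialized as HTML.
$html = htmlspecialchars( $plaintext, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

// Layer 2: that HTML serialized as XML character data inside the feed.
$xml = htmlspecialchars( $html, ENT_QUOTES | ENT_XML1, 'UTF-8' );

// Decoding once (the XML layer) only recovers the HTML,
// so a character reference is still present in the text.
$decoded_once = html_entity_decode( $xml, ENT_QUOTES | ENT_XML1, 'UTF-8' );

// Decoding a second time (the HTML layer) recovers the plaintext.
$decoded_twice = html_entity_decode( $decoded_once, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

echo $html, "\n";
echo $xml, "\n";
echo $decoded_once, "\n";
echo $decoded_twice, "\n";
```

A pipeline that decodes only once and then strips tags stops at $decoded_once, which is why a second, HTML-aware decoding step is needed to reach the plaintext.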

it may help to use additional variables, and I will recommend switching from html_entity_decode() on the XML side into the HTML API on the HTML side. why this function works for XML and breaks for HTML is beyond my imagination, but alas, that’s the way it is.

	$item_title_xml  = $item->get_title();
	$item_title_html = html_entity_decode( $item_title_xml, ENT_XML1 | ENT_SUBSTITUTE );
	$processor       = new WP_HTML_Tag_Processor( $item_title_html );
	$item_title      = '';
	while ( $processor->next_token() ) {
		if ( '#text' === $processor->get_token_name() ) {
			$item_title .= $processor->get_modifiable_text();
		}
	}

This function is old and could use a lot of love.

Note that it’s very likely we could encounter other encodings of the title attribute. Fixing this could break others, because RSS is not explicit about the content type of the contained values.

The above code could turn into some function explicitly indicating what it is assuming and used here. Should we decide to enhance the robustness later, we would have an easy way to assess the existing code and swap it out as appropriate.

	/**
	 * Returns the plaintext content of encoded HTML content serialized in XML.
	 *
	 * When an RSS tag contains “encoded content” then the decoded XML
	 * represents HTML. After decoding the XML into HTML, this returns
	 * the plaintext content of that decoded HTML.
	 *
	 * Example:
	 *
	 *     echo wp_rss_xml_to_html_to_text( '&amp;#x1f63c;' );
	 *     // 😼
	 *
	 *     echo wp_rss_xml_to_html_to_text( '&#x1f63c;' );
	 *     // 😼
	 */
	function wp_rss_xml_to_html_to_text( string $raw_xml ): string {
		// XML only defines five named entities: &amp; &gt; &lt; &apos; &quot;
		$html = html_entity_decode( $raw_xml, ENT_XML1 | ENT_SUBSTITUTE );

		$plaintext = '';
		$processor = new WP_HTML_Tag_Processor( $html );
		while ( $processor->next_token() ) {
			if ( '#text' === $processor->get_token_name() ) {
				$plaintext .= $processor->get_modifiable_text();
			}
		}

		return $plaintext;
	}

and then call it

$escaped_title = esc_html( trim( wp_rss_xml_to_html_to_text( $item->get_title() ) ) );

The same should likely apply to the description below as well. Why do we call esc_attr( esc_html( $desc ) ) 🤦‍♂️ 😭.

Fix: Decode HTML entities in widget RSS title before escaping

5363661

himanshupathak95 mentioned this pull request Jun 23, 2025

RSS Block: Decode HTML entities in feed titles before display WordPress/gutenberg#70491

Merged

mukeshpanchal27 approved these changes Jun 23, 2025

Infinite-Null added 2 commits June 24, 2025 12:52

fix: Decode HTML entities in RSS item titles before escaping

13b4924

fix: Remove unnecessary HTML entity decoding from RSS item titles

71c1ca0

test: Add unit test for HTML entities decoding in RSS title output

e469162

Infinite-Null requested a review from mukeshpanchal27 June 27, 2025 08:42

Merge branch 'trunk' into fix/rss-widget-html-entities

959c8aa

SirLouen mentioned this pull request Nov 4, 2025

RSS Widgets: HTML entities that are part of HTML tags should be removed. Refactoring Tests #9042 #10463

Open

dmsnell reviewed Nov 4, 2025
