Skip to content

Conversation

@sirreal
Copy link
Member

@sirreal sirreal commented Aug 21, 2025

Change the JSON encoding of block attributes to address problems with strings ending in the \ character.

The encoding is updated to replace the escaped \ character with its unicode escape \u005c. This simplifies the replacement of escaped " characters with its unicode escape \u0022. This addresses a bug where plain, syntactic JSON " characters following an escaped \ character were replaced, breaking the JSON syntax and causing block attributes to be ignored entirely.

By replacing escaped \ characters, subsequent replacements become simpler. Escaped " replacement (applied by wp_kses_stripslashes() and the motivation for WordPress/gutenberg#6619 that introduced the escaped " replacement) can be maintained as a simple replacement with no need for backtracking or other complex regular expression techniques.

This should also make the syntactic JSON " characters safe from wp_kses_stripslashes() because escaped slashes are been replaced.

This applies the same encoding as WordPress/gutenberg#71291.

This likely requires #9559.

See WordPress/gutenberg#16508 and WordPress/gutenberg#6619.

Trac ticket: https://core.trac.wordpress.org/ticket/63917


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@github-actions
Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

return strtr(
$encoded_attributes,
array(
'\\\\' => '\\u005c',
Copy link
Member Author

@sirreal sirreal Sep 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to align with a different approach in WordPress/gutenberg#71291.

By replacing escaped \ characters with their escape sequence (firststrtr operates on the longest key first and does not perform replacements in already replaced segments), it makes the replacement of escaped quotes \" much easier.

The return value is a string where all the occurrences of the array keys have been replaced by the corresponding values. The longest keys will be tried first. Once a substring has been replaced, its new value will not be searched again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It says the longest keys are replaced first, but '\\\\' is not the longest. It is the same length as '--' and '\\"'. It doesn't seem to indicate that the first array key is also replaced first.

If this is the case, should there be a separate strstr() call first specifically for replacing '\\\\'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a really good point. The two relevant character sequences for search are \\ and \" because they have some overlap.

In the character sequence \\" we don't want to process \" first to produce \\u0022, we want to process \\ first to produce \u005c".

The PR includes tests where it's clear it's working well. I believe the processing is from the start of the string towards the end.

var_dump( strtr(
    '\\\\"',
    array(
        '\\"' => '[dbl-quote]',
        '\\\\' => '[backslash]',
    )
) );
// string(12) "[backslash]""
var_dump( strtr(
    '\\"\\\\',
    array(
        '\\"' => '[dbl-quote]',
        '\\\\' => '[backslash]',
    )
) );
// string(22) "[dbl-quote][backslash]"

In both cases, we get the desired result and it seems to be the behavior for any reasonably recent PHP version (7.0+ and even earlier I believe).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can add a specific test for this.

Copy link
Member Author

@sirreal sirreal Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test for this. When run against trunk it fails like this:

_Blocks_Serialize
 ✘ Attribute encoding
   ┐
   ├ Failed asserting that two strings are identical.
   ┊ ---·Expected
   ┊ +++·Actual
   ┊ @@ @@
   ┊ -'<!--·wp:test·{"lt":"\u003c","gt":"\u003e","amp":"\u0026","bs":"\u005c","quot":"\u0022","bs-bs-quot":"\u005c\u005c\u0022"}·/-->'
   ┊ +'<!--·wp:test·{"lt":"\u003c","gt":"\u003e","amp":"\u0026","bs":"\\u0022,"quot":"\u0022","bs-bs-quot":"\\\\\u0022"}·/-->'
   │
   ╵ /Users/jonsurrell/a8c/wordpress-develop/trunk/tests/phpunit/tests/blocks/serialize.php:35
   ┴

You can see the bad escaping there: "bs":"\\u0022. In this test on the branch we can confirm that the character sequence \\\" is replaced as desired with \u005c\u005c\u0022.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That quote is BS.

@sirreal sirreal force-pushed the fix/gb-16508-backslash-in-attribute-string branch from 3f8fa52 to b041b24 Compare September 2, 2025 08:31
@sirreal sirreal marked this pull request as ready for review September 3, 2025 07:11
@github-actions
Copy link

github-actions bot commented Sep 3, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, shailu25, westonruter.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@sirreal sirreal requested a review from Mamaduka September 3, 2025 15:52
@sirreal sirreal requested a review from westonruter September 3, 2025 21:22
@westonruter
Copy link
Member

Experimenting with using Gemini CLI to help with code reviewing.

gh pr checkout https://github.com/WordPress/wordpress-develop/pull/9558;
git diff trunk... | gemini -m "gemini-2.5-pro" -p "Review these changes to WordPress core. Anything amiss?"

Output:

The changes look good. The switch from multiple preg_replace calls to a single strtr call in serialize_block_attributes is a nice performance improvement.

More importantly, the change to explicitly encode backslashes (\) as \u005c makes the serialization more robust and less ambiguous.

The updated and new tests in tests/phpunit/tests/blocks/serialize.php are thorough. I especially like the addition of test_older_serialization_is_compatible, which ensures that content with the old backslash escaping (\\) is correctly migrated to the new format (\u005c) upon re-serialization. This is great for backward compatibility.

Overall, this is a solid and well-tested improvement.

Copy link
Member

@westonruter westonruter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also looks good to me!

@sirreal sirreal requested a review from Copilot September 4, 2025 13:25

This comment was marked as off-topic.

sirreal added a commit to WordPress/gutenberg that referenced this pull request Sep 4, 2025
Fix an issue in the block editor where attributes that terminate in the \ character are
mis-encoded and cause block attributes to be lost.

This change encodes `\` characters with their Unicode escape sequence `\u005c`
instead of their escaped form `\\`. This makes the replacement of escaped double quotes
`\"` much simpler because the preceding `\` character must be the escape character for the
quote and an escaped character itself.

Escaping of `\"` was originally introduced in #6619
to address an issue where `wp_kses_stripslashes()` would replace escaped double quotes `\"`
with plain quotes `"` and break JSON syntax (`{"str":"\""}` becomes `{"str":"""}`).

There is a companion ticket for WordPress Core: https://core.trac.wordpress.org/ticket/63917
And an associated PR to apply the same updated JSON encoding:
WordPress/wordpress-develop#9558

See a related ticket about `wp_kses_stripslashes()` and its purpose today:
https://core.trac.wordpress.org/ticket/63881

---

Unlinked contributors: ehti, Alexius08.

Co-authored-by: sirreal <[email protected]>
Co-authored-by: dmsnell <[email protected]>
Co-authored-by: Mamaduka <[email protected]>
Co-authored-by: skorasaurus <[email protected]>
Co-authored-by: youknowriad <[email protected]>
pento pushed a commit that referenced this pull request Sep 4, 2025
…butes.

Corrects an issue with block attribute encoding where JSON strings ending in the `\` character would be misencoded and cause block attributes to be lost.

Client-side block serialization was updated with matching logic in WordPress/gutenberg@10453ab.

Developed in #9558.

Props jonsurrell, westonruter, mamaduka, dmsnell, shailu25.
Fixes #63917.


git-svn-id: https://develop.svn.wordpress.org/trunk@60708 602fd350-edb4-49c9-b593-d223f7449a82
@github-actions
Copy link

github-actions bot commented Sep 4, 2025

A commit was made that fixes the Trac ticket referenced in the description of this pull request.

SVN changeset: 60708
GitHub commit: 89ae2bd

This PR will be closed, but please confirm the accuracy of this and reopen if there is more work to be done.

@github-actions github-actions bot closed this Sep 4, 2025
@sirreal sirreal deleted the fix/gb-16508-backslash-in-attribute-string branch September 4, 2025 18:31
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Sep 4, 2025
…butes.

Corrects an issue with block attribute encoding where JSON strings ending in the `\` character would be misencoded and cause block attributes to be lost.

Client-side block serialization was updated with matching logic in WordPress/gutenberg@10453ab.

Developed in WordPress/wordpress-develop#9558.

Props jonsurrell, westonruter, mamaduka, dmsnell, shailu25.
Fixes #63917.

Built from https://develop.svn.wordpress.org/trunk@60708


git-svn-id: http://core.svn.wordpress.org/trunk@60044 1a063a9b-81f0-0310-95a4-ce76da25c4cd
github-actions bot pushed a commit to gilzow/wordpress-performance that referenced this pull request Sep 4, 2025
…butes.

Corrects an issue with block attribute encoding where JSON strings ending in the `\` character would be misencoded and cause block attributes to be lost.

Client-side block serialization was updated with matching logic in WordPress/gutenberg@10453ab.

Developed in WordPress/wordpress-develop#9558.

Props jonsurrell, westonruter, mamaduka, dmsnell, shailu25.
Fixes #63917.

Built from https://develop.svn.wordpress.org/trunk@60708


git-svn-id: https://core.svn.wordpress.org/trunk@60044 1a063a9b-81f0-0310-95a4-ce76da25c4cd
peterwilsoncc pushed a commit to peterwilsoncc/gutenberg-build that referenced this pull request Sep 4, 2025
Fix an issue in the block editor where attributes that terminate in the \ character are
mis-encoded and cause block attributes to be lost.

This change encodes `\` characters with their Unicode escape sequence `\u005c`
instead of their escaped form `\\`. This makes the replacement of escaped double quotes
`\"` much simpler because the preceding `\` character must be the escape character for the
quote and an escaped character itself.

Escaping of `\"` was originally introduced in WordPress/gutenberg#6619
to address an issue where `wp_kses_stripslashes()` would replace escaped double quotes `\"`
with plain quotes `"` and break JSON syntax (`{"str":"\""}` becomes `{"str":"""}`).

There is a companion ticket for WordPress Core: https://core.trac.wordpress.org/ticket/63917
And an associated PR to apply the same updated JSON encoding:
WordPress/wordpress-develop#9558

See a related ticket about `wp_kses_stripslashes()` and its purpose today:
https://core.trac.wordpress.org/ticket/63881

---

Unlinked contributors: ehti, Alexius08.

Co-authored-by: sirreal <[email protected]>
Co-authored-by: dmsnell <[email protected]>
Co-authored-by: Mamaduka <[email protected]>
Co-authored-by: skorasaurus <[email protected]>
Co-authored-by: youknowriad <[email protected]>

Source: WordPress/gutenberg@10453ab
jonnynews pushed a commit to spacedmonkey/wordpress-develop that referenced this pull request Sep 24, 2025
…butes.

Corrects an issue with block attribute encoding where JSON strings ending in the `\` character would be misencoded and cause block attributes to be lost.

Client-side block serialization was updated with matching logic in WordPress/gutenberg@10453ab.

Developed in WordPress#9558.

Props jonsurrell, westonruter, mamaduka, dmsnell, shailu25.
Fixes #63917.


git-svn-id: https://develop.svn.wordpress.org/trunk@60708 602fd350-edb4-49c9-b593-d223f7449a82
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants