Skip to content

Conversation

@ockham
Copy link
Contributor

@ockham ockham commented Aug 13, 2025

Currently, Block Bindings support for block attributes such as core/paragraph's content or core/button's text depends on hard-coded logic to locate and replace the respective attributes. Since that logic is not filterable, it means that extending support for additional Core or third-party blocks requires hand-writing similar code in the block's PHP. This is limiting the scalability of Block Bindings.

Thus, the existing block-specific custom logic should be replaced by more generic code that is able to locate and replace an attribute based on the selector definition in its block.json.

Ultimately, this will require a set_inner_html() method from the HTML API (which doesn't exist yet); but we can already provide a decent solution based on what's currently available from the HTML API.


This is a preparatory step for making it easier to have block bindings support more blocks and block attributes, see WordPress/gutenberg#70642 (comment).

For example, try applying the following patch, which enables Block Bindings support for the Image block's caption attribute, and test coverage to verify the latter. Check that tests pass (npm run test:php -- --group=block-bindings):

Patch
diff --git a/src/wp-includes/class-wp-block.php b/src/wp-includes/class-wp-block.php
index 173b0dbe80..2a72d09ddb 100644
--- a/src/wp-includes/class-wp-block.php
+++ b/src/wp-includes/class-wp-block.php
@@ -109,7 +109,7 @@ class WP_Block {
 	private const BLOCK_BINDINGS_SUPPORTED_ATTRIBUTES = array(
 		'core/paragraph' => array( 'content' ),
 		'core/heading'   => array( 'content' ),
-		'core/image'     => array( 'id', 'url', 'title', 'alt' ),
+		'core/image'     => array( 'id', 'url', 'title', 'alt', 'caption' ),
 		'core/button'    => array( 'url', 'text', 'linkTarget', 'rel' ),
 		'core/post-date' => array( 'datetime' ),
 	);
diff --git a/tests/phpunit/tests/block-bindings/render.php b/tests/phpunit/tests/block-bindings/render.php
index 60b970cf79..178c5c17ba 100644
--- a/tests/phpunit/tests/block-bindings/render.php
+++ b/tests/phpunit/tests/block-bindings/render.php
@@ -83,6 +83,16 @@ HTML
 				,
 				'<div class="wp-block-button"><a class="wp-block-button__link wp-element-button">test source value</a></div>',
 			),
+			'image block' => array(
+				'caption',
+				<<<HTML
+<!-- wp:image {"id":66,"sizeSlug":"large","linkDestination":"none"} -->
+<figure class="wp-block-image size-large"><img src="breakfast.jpg" alt="" class="wp-image-1"/><figcaption class="wp-element-caption">Breakfast at a <em>café</em> in Wrocław.</figcaption></figure>
+<!-- /wp:image -->
+HTML
+			,
+			'<figure class="wp-block-image size-large"><img src="breakfast.jpg" alt="" class="wp-image-1"/><figcaption class="wp-element-caption">test source value</figcaption></figure>'
+			)
 		);
 	}
 

Trac ticket: https://core.trac.wordpress.org/ticket/63840


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@ockham ockham self-assigned this Aug 13, 2025
@github-actions
Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@ockham ockham force-pushed the try/simplify-block-bindings-replace-html branch from e390302 to 83b3ae1 Compare August 13, 2025 13:29
@ockham ockham changed the title Block Bindings: Simplify replace_html() method Block Bindings: Allow more generic setting of block attributes Aug 19, 2025
@ockham ockham marked this pull request as ready for review August 20, 2025 09:18
@ockham ockham requested review from cbravobernal and gziolo August 20, 2025 09:18
@github-actions
Copy link

github-actions bot commented Aug 20, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props bernhard-reiter, jonsurrell, gziolo, cbravobernal, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@ockham
Copy link
Contributor Author

ockham commented Aug 20, 2025

Here's a little experiment that builds on this PR: ockham#5

@gziolo
Copy link
Member

gziolo commented Aug 20, 2025

This is awesome! Can we replicate the same logic in the Gutenberg plugin or does it require any code from HTML API in trunk?

I would be happy to explore how this could be integrated with WordPress/gutenberg#70975.

Copy link
Member

@sirreal sirreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some early feedback and questions.

* @param string $rich_text The rich text to replace the original content with.
* @return bool True on success.
*/
public function replace_rich_text( $rich_text ) {
Copy link
Member

@sirreal sirreal Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should rely on WP_HTML_Text_Replacement and lexical_updates, is there a reason it does not?

The algorithm should be something like:

  • Ensure the processor is stopped on an open tag that is not atomic (like SCRIPT), void (like BR), nor foreign content with a self-closing flag (like G in (<svg><g /></svg>).
  • It may be necessary to flush updates with get_updated_html, I'm not sure off the top of my head.
  • Find the open tag, use a bookmark to get its closing position.
  • Find the matching close tag, use a bookmark to find the byte length of the replacement.
  • Push a lexical update replacement like this:
$this->lexical_updates[] = new WP_HTML_Text_Replacement(
	$start,
	$length,
	$replacement_html
);

I'd like if the function mentioned that the replacement is not checked for proper HTML nesting, it's unsafe by nature. The method provides raw HTML replacement between tags. There's no guarantee that the HTML will nest as expected after the replacement is applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should rely on WP_HTML_Text_Replacement and lexical_updates, is there a reason it does not?

The algorithm should be something like: [...]

I think that would amount to a basic version of set_inner_html(), no? The idea for WP_Block_Bindings_Processor was that it would give weaker guarantees -- e.g. you can't currently replace_rich_text(), then seek() to an earlier position, and make another change. This simpler and less safe implementation should however be sufficient for the problem domain -- i.e. block bindings -- where we can make a few assumptions on what kind of operations are needed.

The current approach is based on a recommendation and an earlier experiment by @dmsnell, which is similar in spirit (with a string buffer that's totally separate from the one that's kept by WP_HTML_Processor).

That said, I can spin up another PR to try out the lexical_updates-based approach.


I'd like if the function mentioned that the replacement is not checked for proper HTML nesting, it's unsafe by nature. The method provides raw HTML replacement between tags. There's no guarantee that the HTML will nest as expected after the replacement is applied.

Good point. Something like that is mentioned in the PHPDoc of the class, but it's a good idea to also include it in the method's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I can spin up another PR to try out the lexical_updates-based approach.

ockham#6

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would amount to a basic version of set_inner_html(), no?

Yes, but an unsafe version of set_inner_html(). Similar to what's implemented here.

The idea for WP_Block_Bindings_Processor was that it would give weaker guarantees -- e.g. you can't currently replace_rich_text(), then seek() to an earlier position, and make another change. This simpler and less safe implementation should however be sufficient for the problem domain -- i.e. block bindings -- where we can make a few assumptions on what kind of operations are needed.

The current approach is based on a recommendation and an earlier experiment by @dmsnell, which is similar in spirit (with a string buffer that's totally separate from the one that's kept by WP_HTML_Processor).

It's unclear to me why this would be safer. The unsafe operation is intending to put HTML inside a tag that potentially "leaks" to change external structure. Once we do that here, it feels like the unsafe part has happened. Taking the result of this and putting it back into another processor easily circumvents the prevention on seeks and such.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for context I was using substr() because we controlled the entire execution environment and only moved forward after making any changes.

while I have no problem moving to using lexical updates, it may be valuable to wait until the end to do things with them given that the HTML Processor is already relatively slow.

what I’ve done in the past is keep an external array of updates, then at the end dump them into a new Tag Processor for batch replacement. but still, I think that moving forward only is important here.

$this->replacements[] = new WP_HTML_Text_Replacment(…);

…

$replacer = new class( $html ) extends WP_HTML_Tag_Processor {
	public function swap_replacements( $replacements ) {
		$this->lexical_updates = $replacements;
	}
}

$replacer->swap_replacements( $this->replacements );
return $replacer->get_updated_html();

@ockham and I did discuss safety and I was suggesting that this serialization builder is probably good enough for now, providing comparable security to what Dom\HtmlDocument provides. everything above that is even better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Ensure the processor is stopped on an open tag that is not atomic (like SCRIPT), void (like BR), nor foreign content with a self-closing flag (like G in (<svg><g /></svg>).

Looks like expects_closer() (negated) should cover those cases: 3410f40

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the way that the attribute replacing in block bindings currently works is as follows:

  • For each bound block attribute, we create a new processor for the block markup. The type of processor we create depends on the attribute source:
    • If the attribute source is rich-text, we create a block bindings processor; if the attribute source is an HTML attribute, we create an HTML Tag Processor.
  • We use the processor to locate the sourced attribute's content in the block markup, and replace it with its new value.
  • We then call get_updated_html() to get the resulting HTML.

This means that there's currently only ever one attribute replaced per processor created. We also only move forward within that processor, with one exception: If we have a comma-separated selector like e.g. the Button block's text attribute has, we iterate over those individual tag selectors, and reset the processor's cursor to the first tag (by seek()ing a bookmark that we set initially).

In practice, a comma-separated selector such as a,button is there because the user can choose whether they want their button block to use an a tag or a button. These choices are of course mutually exclusive -- one button block will either contain an a or a button, but never both. As a result, the seek will only happen before the block markup is modified, which AFAICT should be fine ✅


It is however arguable that instantiating a new processor for each attribute isn't great in terms of performance. If we were ever going to change that and instead use one processor to update all bound attributes, we'd have to deal with cases like the following -- an image block whose url and caption attributes need to be bound:

<!-- wp:image {"metadata":{"bindings":{"url":{"source":"test/source"}}}} -->
<figure class="wp-block-image">
<img src="fallback.jpg" alt=""/>
<figcaption class="wp-element-caption">Fallback caption</figcaption></figure>
<!-- /wp:image -->

I guess we'd change the algorithm to advance through the markup tag-by-tag, check if the current tag matches any of the sourced attribute selectors, and replace it if yes. So we'd probably still be able to ensure forward movement.


So what's the bottom line? I'd say that we can guarantee forward-only movement right now, meaning that the serialization builder would probably be good enough indeed. I'm also fine with the WP_HTML_Text_Replacment-based one.

@dmsnell As for the extra optimization of keeping a separate stack of WP_HTML_Text_Replacments, and feeding them to a tag processor at the end for better performance, I'm wondering if that makes a difference for our use case? We stop traversing the document and apply the (only) change via get_updated_html() once we've located the portion that needs replacing anyway.

Copy link
Contributor Author

@ockham ockham Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've applied Dennis' suggestion locally:

Patch
diff --git a/src/wp-includes/class-wp-block.php b/src/wp-includes/class-wp-block.php
index e2b75f6643..54f175bb38 100644
--- a/src/wp-includes/class-wp-block.php
+++ b/src/wp-includes/class-wp-block.php
@@ -435,7 +435,13 @@ class WP_Block {
 						// TODO: Use `WP_HTML_Processor::set_inner_html` method once it's available.
 						$block_reader->release_bookmark( 'iterate-selectors' );
 						$block_reader->replace_rich_text( wp_kses_post( $source_value ) );
-						return $block_reader->get_updated_html();
+						$replacer = new class( $block_content ) extends WP_HTML_Tag_Processor {
+							public function swap_replacements( $replacements ) {
+								$this->lexical_updates = $replacements;
+							}
+						};
+						$replacer->swap_replacements( $block_reader->my_lexical_updates );
+						return $replacer->get_updated_html();
 					} else {
 						$block_reader->seek( 'iterate-selectors' );
 					}
@@ -463,6 +469,8 @@ class WP_Block {
 
 	private static function get_block_bindings_processor( string $block_content ) {
 		$internal_processor_class = new class('', WP_HTML_Processor::CONSTRUCTOR_UNLOCK_CODE) extends WP_HTML_Processor {
+			public $my_lexical_updates = array();
+
 			/**
 			 * Replace the rich text content between a tag opener and matching closer.
 			 *
@@ -494,7 +502,7 @@ class WP_Block {
 				$end         = $tag_closer->start;
 				$this->release_bookmark( '_wp_block_bindings_tag_closer' );
 
-				$this->lexical_updates[] = new WP_HTML_Text_Replacement(
+				$this->my_lexical_updates[] = new WP_HTML_Text_Replacement(
 					$start,
 					$end - $start,
 					$rich_text

One thing I've noted based on the unit tests I wrote for this PR is that it becomes harder to mix attribute changes (that rely on methods that are native to the HTML (Tag) Processor) with rich text replacements that rely on the block bindings processor -- essentially requiring an extra get_updated_html() to apply the former before the latter:

Patch
diff --git a/tests/phpunit/tests/block-bindings/wp-block-get-block-bindings-processor.php b/tests/phpunit/tests/block-bindings/wp-block-get-block-bindings-processor.php
index b199e10b0e..1714734d5a 100644
--- a/tests/phpunit/tests/block-bindings/wp-block-get-block-bindings-processor.php
+++ b/tests/phpunit/tests/block-bindings/wp-block-get-block-bindings-processor.php
@@ -45,26 +52,35 @@ class Tests_Blocks_GetBlockBindingsProcessor extends WP_UnitTestCase {
 		$figure_opener = '<figure class="wp-block-image">';
 		$img           = '<img src="breakfast.jpg" alt="" class="wp-image-1"/>';
 		$figure_closer = '</figure>';
-		$processor     = self::$get_block_bindings_processor_method->invoke(
-			null,
-			$figure_opener .
+
+		$html = $figure_opener .
 			$img .
 			'<figcaption class="wp-element-caption">Breakfast at a <em>café</em> in Berlin</figcaption>' .
-			$figure_closer
-		);
+			$figure_closer;
+
+		$processor = self::$get_block_bindings_processor_method->invoke( null, $html );
 
 		$processor->next_tag( array( 'tag_name' => 'figure' ) );
 		$processor->add_class( 'size-large' );
+		$html = $processor->get_updated_html(); // This right here.
 
 		$processor->next_tag( array( 'tag_name' => 'figcaption' ) );
 
 		$this->assertTrue( $processor->replace_rich_text( '<strong>New</strong> image caption' ) );
+
+		$replacer = new class( $html ) extends WP_HTML_Tag_Processor {
+			public function swap_replacements( $replacements ) {
+				$this->lexical_updates = $replacements;
+			}
+		};
+		$replacer->swap_replacements( $processor->my_lexical_updates );
+
 		$this->assertEquals(
 			'<figure class="wp-block-image size-large">' .
 			$img .
 			'<figcaption class="wp-element-caption"><strong>New</strong> image caption</figcaption>' .
 			$figure_closer,
-			$processor->get_updated_html()
+			$replacer->get_updated_html()
 		);
 	}
 

It gets of course even messier if the attribute change is made after the rich text change, and at an earlier position than the rich text change.

@ockham
Copy link
Contributor Author

ockham commented Aug 20, 2025

This is awesome! Can we replicate the same logic in the Gutenberg plugin or does it require any code from HTML API in trunk?

I don't think it relies on any new additions to the HTML API that are only present in trunk, so it should be possible to backport this to GB. The main annoyance is probably that the block bindings logic is rather opaquely located in the WP_Block class, i.e. there are no "easy" filters to extend it for other blocks and attributes; so instead, we might have to duplicate a bunch of code in the block's render() method.

I would be happy to explore how this could be integrated with WordPress/gutenberg#70975.

👍

FWIW, non-conditional binding (i.e. replacement of an existing citation) works pretty much OOTB (with this PR), simply by adding the attribute to BLOCK_BINDINGS_SUPPORTED_ATTRIBUTES:

Patch
diff --git a/src/wp-includes/class-wp-block.php b/src/wp-includes/class-wp-block.php
index 173b0dbe80..902093fe3b 100644
--- a/src/wp-includes/class-wp-block.php
+++ b/src/wp-includes/class-wp-block.php
@@ -112,6 +112,7 @@ class WP_Block {
 		'core/image'     => array( 'id', 'url', 'title', 'alt' ),
 		'core/button'    => array( 'url', 'text', 'linkTarget', 'rel' ),
 		'core/post-date' => array( 'datetime' ),
+		'core/pullquote' => array( 'citation' ),
 	);
 
 	/**
diff --git a/tests/phpunit/tests/block-bindings/render.php b/tests/phpunit/tests/block-bindings/render.php
index 60b970cf79..d2c38b7951 100644
--- a/tests/phpunit/tests/block-bindings/render.php
+++ b/tests/phpunit/tests/block-bindings/render.php
@@ -83,6 +83,17 @@ HTML
 				,
 				'<div class="wp-block-button"><a class="wp-block-button__link wp-element-button">test source value</a></div>',
 			),
+			'pullquote block'   => array(
+				'citation',
+				<<<HTML
+<!-- wp:pullquote -->
+<figure class="wp-block-pullquote"><blockquote><p>This should not appear</p><cite>Quote McQuoteface</cite></blockquote></figure>
+<!-- /wp:pullquote -->
+HTML
+				,
+				'<figure class="wp-block-pullquote"><blockquote><p>This should not appear</p><cite>test source value</cite></blockquote></figure>',
+			)
+
 		);
 	}
 

@ockham
Copy link
Contributor Author

ockham commented Aug 21, 2025

Here's a minor enhancement to the tests that will currently break with the implementation of WP_Block_Bindings_Processor found in this PR:

diff --git a/tests/phpunit/tests/block-bindings/wpBlockBindingsProcessor.php b/tests/phpunit/tests/block-bindings/wpBlockBindingsProcessor.php
index 5a8cf11095..d7f20099c0 100644
--- a/tests/phpunit/tests/block-bindings/wpBlockBindingsProcessor.php
+++ b/tests/phpunit/tests/block-bindings/wpBlockBindingsProcessor.php
@@ -79,10 +79,11 @@ class Tests_Blocks_wpBlockBindingsProcessor extends WP_UnitTestCase {
                $this->assertTrue( $processor->replace_rich_text( '<strong>New</strong> image caption' ) );
 
                $processor->seek( 'image' );
+               $processor->add_class( 'extra-img-class' );
 
                $this->assertEquals(
                        $figure_opener .
-                       $img .
+                       '<img src="breakfast.jpg" alt="" class="wp-image-1 extra-img-class"/>' .
                        '<figcaption class="wp-element-caption"><strong>New</strong> image caption</figcaption>' .
                        $figure_closer,
                        $processor->build()

The test passes however with the alternative implementation that's based on Jon's suggestion.

@ockham ockham force-pushed the try/simplify-block-bindings-replace-html branch from be3e5c2 to 3a651d9 Compare August 25, 2025 13:03
@ockham
Copy link
Contributor Author

ockham commented Aug 25, 2025

I've now merged @sirreal's PR ockham#7 which makes the helper class anonymous (see discussion), and carried over the changes from ockham#6 to use WP_HTML_Text_Replacement (rather than a string buffer) to avoid manipulations of the markup that would break the output (see).

This should be ready for another round of review.


Since WP_Block is getting a bit bloated (especially due to block bindings specific code), I'm planning to do a follow-up to move at least replace_html() out of that class (tentatively into block-bindings.php, under the name replace_block_attribute_value()) . (I will keep the block bindings processor class anonymous inside of that function, though.)

@ockham
Copy link
Contributor Author

ockham commented Aug 25, 2025

Unit test failures in CI seem to be unrelated.

@ockham ockham force-pushed the try/simplify-block-bindings-replace-html branch from b558a5a to a18e435 Compare August 26, 2025 18:24
Copy link
Contributor

@cbravobernal cbravobernal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's ship it and iterate it later if needed.

@github-actions
Copy link

A commit was made that fixes the Trac ticket referenced in the description of this pull request.

SVN changeset: 60684
GitHub commit: 6c21850

This PR will be closed, but please confirm the accuracy of this and reopen if there is more work to be done.


// Find matching tag closer.
while ( $this->next_token() && $this->get_current_depth() >= $depth ) {
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reasonable follow-up idea: verify that the processor completed and didn’t pause at an incomplete token or reach unsupported markup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally addressing these notes over at #10029.

This look about good? 8d7590e
(I was at first thinking to check if state was STATE_MATCHED_TAG, but then I thought, the higher-level check should give me that anyway.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good; I left a comment on the new PR.

// The bookmark names are prefixed with `_` so the key below has an extra `_`.
$tag_opener = $this->bookmarks['__wp_block_bindings_tag_opener'];
$start = $tag_opener->start + $tag_opener->length;
$this->release_bookmark( '_wp_block_bindings_tag_opener' );
Copy link
Member

@dmsnell dmsnell Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is certainly welcome to do, to release the bookmarks, but performance-wise it’s not setting a better example than leaving them dangling.

given that this function contains all of the scanning, it’s even fine to trap the bookmark bounds and re-use one, such as set_bookmark( 'here' ). I’ve considered exposing something like ->current_token() but haven’t yet.

part of this is the fact that we expect isolation of bookmarks here and we will dismiss the processor when we’re done.

$start,
$end - $start,
$rich_text
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applying lexical updates that span token boundaries is not something we have particularly tested well. it’s probably working well enough and hopefully the tests here will catch them, but this is sort of uncharted territory within the HTML Processor. we should make sure that we don’t seek anywhere here to verify that we don’t cross indices.

this single issue is one of the main reasons I proposed the single linear pass of the serialization builder, because it structurally prevents the ability of jumping around or referring to indices after changing positions.

I’m not particularly confident I would know how this interacts with virtual tokens, for example, whose boundaries are coincident with bookmarks (vs. realized tokens which all have non-overlapping spans)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants