Make WordPress Core

Opened 12 years ago

Last modified 6 months ago

#25644 new enhancement

strip_shortcodes always removes text between shortcode tags, should be optional

Reported by: jonscaife Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.6.1
Component: Shortcodes Keywords: has-patch dev-feedback
Focuses: Cc:

Description

strip_shortcodes() always removes all of the content between shortcode tags. So, for example, if a shortcode wraps a link or a style around some text, that text is lost when the shortcode is removed.
Example:
Lorem ipsum [highlight]dolor[/highlight] sit amet, consectetur adipisicing elit
becomes
Lorem ipsum sit amet, consectetur adipisicing elit
It should become
Lorem ipsum dolor sit amet, consectetur adipisicing elit

Removing the content between shortcode tags may often be desirable, but there should be some way to retain it. The easiest way would be for strip_shortcodes() to take a second parameter that defaults to true, which removes the content; if it is false, the content between the tags is left in place.

Example change to wp-includes/shortcodes.php

Before

function strip_shortcodes( $content ) {
	global $shortcode_tags;

	if (empty($shortcode_tags) || !is_array($shortcode_tags))
		return $content;

	$pattern = get_shortcode_regex();

	return preg_replace_callback( "/$pattern/s", 'strip_shortcode_tag', $content );
}

function strip_shortcode_tag( $m ) {
	// allow [[foo]] syntax for escaping a tag
	if ( $m[1] == '[' && $m[6] == ']' ) {
		return substr($m[0], 1, -1);
	}

	return $m[1] . $m[6];
}

After

function strip_shortcodes( $content, $strip_between = true ) {
	global $shortcode_tags;

	if (empty($shortcode_tags) || !is_array($shortcode_tags))
		return $content;

	$pattern = get_shortcode_regex();

	// Choose the callback: drop the enclosed content (default) or keep it.
	$callback = $strip_between ? 'strip_shortcode_tag' : 'strip_shortcode_tag_notbetween';

	return preg_replace_callback( "/$pattern/s", $callback, $content );
}

function strip_shortcode_tag( $m ) {
	// allow [[foo]] syntax for escaping a tag
	if ( $m[1] == '[' && $m[6] == ']' ) {
		return substr($m[0], 1, -1);
	}

	return $m[1] . $m[6];
}

function strip_shortcode_tag_notbetween( $m ) {
	// allow [[foo]] syntax for escaping a tag
	if ( $m[1] == '[' && $m[6] == ']' ) {
		return substr($m[0], 1, -1);
	}

	return $m[1] . $m[5] . $m[6];
}

It's probably possible to do this with slicker code, but this is fairly simple and works.

An example of when this problem is encountered in the real world is with the RB internal links plugin being used in post content. When the post is displayed by the popular widget plugin, any text which was internally linked is lost, leaving a snippet of the post which makes no sense to a human reader. For an example on a live site, see the first entry under the Most popular (all time) section on the right-hand side of DIY Media Home.

Attachments (2)

strip_shortcodes.1.diff (566 bytes) - added by gMagicScott 12 years ago.
strip_shortcodes.2.diff (834 bytes) - added by gMagicScott 10 years ago.


Change History (10)

#1 @jonscaife
12 years ago

I should also say that I tried modifying a plugin to pass the extra argument without modifying shortcodes.php. This simulates a plugin that uses the proposed feature while running on an older WordPress version that does not support it. strip_shortcodes() behaved as expected and no errors occurred.
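This is consistent with how PHP treats user-defined functions: arguments passed beyond the declared parameters are ignored rather than raising an error. A minimal sketch, using a hypothetical stand-in function (old_style_strip() is not a WordPress function) in place of an older one-parameter strip_shortcodes():

```php
<?php
// Hypothetical stand-in for an older strip_shortcodes() that declares one parameter.
function old_style_strip( $content ) {
	// func_num_args() reports how many arguments the caller actually passed.
	return $content . ' (' . func_num_args() . ' args received)';
}

// The extra second argument is accepted silently by a user-defined function.
echo old_style_strip( 'Lorem ipsum', false ), "\n";
```

The extra argument is still visible through func_num_args(), but the function body never uses it, so the result is the same as a one-argument call.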

#2 @gMagicScott
12 years ago

I agree that many shortcodes leave content distorted when removed entirely, but I think how a shortcode is stripped is more the concern of the plugin or theme that provides the shortcode than of whoever calls strip_shortcodes().

A better solution may be to add a filter to the return value of strip_shortcode_tag(). Individual plugins and themes can then manage the output, knowing the desired result is to remove it.

Patch attached (strip_shortcodes.1.diff), or on GitHub

#3 follow-up: @jonscaife
12 years ago

Your solution sounds much smarter than mine! As long as it's possible to avoid the distortion of content it works for me. Thanks

#4 in reply to: ↑ 3 @gMagicScott
12 years ago

  • Cc scott@… added
  • Keywords dev-feedback added

Replying to jonscaife:

Your solution sounds much smarter than mine! As long as it's possible to avoid the distortion of content it works for me. Thanks

My proposed patch would pass three arguments to the filter function: an empty string, the shortcode attributes array, and the content. In your case, all the function would need to do is return the content:

function my_filter_strip_shortcode_tag_highlight( $value, $attr, $content = null ) {
    if ( null === $content ) {
        return $value;
    }

    return $content;
}

add_filter( 'strip_shortcode_tag_highlight', 'my_filter_strip_shortcode_tag_highlight', 10, 3 );

#5 @jdgrimes
12 years ago

  • Cc jdg@… added

#6 @chriscct7
10 years ago

  • Severity changed from trivial to normal

#7 @gMagicScott
10 years ago

I updated the patch to include docblocks for the new filter.

#8 @callumbw95
6 months ago

Hi All,

I have just looked into this, and the issue still occurs. I also cannot think of any reason why this behaviour would be intentional, so I have attached a bug report below to help gain some traction on this issue.

Bug Report

Description

strip_shortcodes() is still removing content within shortcodes in passed content.

Environment

  • WordPress: 6.9-alpha-60093-src
  • PHP: 8.4.7
  • Server: nginx/1.27.5
  • Database: mysqli (Server: 8.0.40 / Client: mysqlnd 8.4.7)
  • Browser: Chrome 137.0.0.0
  • OS: macOS
  • Theme: Twenty Seventeen 3.9
  • MU Plugins: None activated
  • Plugins:
    • Test Reports 1.2.0

Steps to Reproduce

  1. Pass a string to strip_shortcodes() including a shortcode. For example: Lorem ipsum [caption]dolor[/caption] sit amet, consectetur adipisicing elit.

Expected Results

  1. ✅ Output string should be: Lorem ipsum dolor sit amet, consectetur adipisicing elit

Actual Results

  1. ❌ Output string is actually: Lorem ipsum sit amet, consectetur adipisicing elit
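
The difference between the expected and actual results can be illustrated outside WordPress with a simplified sketch. Everything here is an assumption made for illustration: demo_strip_shortcodes() is a hypothetical function, and its pattern only matches plain [tag]content[/tag] pairs; it is not the pattern returned by get_shortcode_regex().

```php
<?php
/**
 * Simplified, hypothetical illustration only. Matches plain [tag]content[/tag]
 * pairs; this is NOT the pattern produced by get_shortcode_regex().
 */
function demo_strip_shortcodes( $content, array $tags, $strip_between = true ) {
	if ( empty( $tags ) ) {
		return $content;
	}

	// Build an alternation of the tag names, escaped for use in a regex.
	$names = implode( '|', array_map( function ( $tag ) {
		return preg_quote( $tag, '/' );
	}, $tags ) );

	// Group 1: tag name, group 2: enclosed content.
	$pattern = '/\[(' . $names . ')\](.*?)\[\/\1\]/s';

	return preg_replace_callback(
		$pattern,
		function ( $m ) use ( $strip_between ) {
			// Either drop the whole match or keep only the enclosed content.
			return $strip_between ? '' : $m[2];
		},
		$content
	);
}

$text = 'Lorem ipsum [caption]dolor[/caption] sit amet';

// Default: tags and enclosed text are both removed (the behaviour reported above).
echo demo_strip_shortcodes( $text, array( 'caption' ) ), "\n";

// With the flag set to false: only the tags are removed, "dolor" is kept.
echo demo_strip_shortcodes( $text, array( 'caption' ), false ), "\n";
```

Note that the default mode leaves the whitespace on both sides of the removed shortcode in place, so two spaces remain where the shortcode was.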