WordPress search, unexpected results due to Gutenberg serialization markup #3739

Closed
ghost opened this issue Nov 30, 2017 · 7 comments
Labels
[Type] WP Core Ticket Requires an upstream change from WordPress. Core Trac ticket should be linked.

Comments

@ghost

ghost commented Nov 30, 2017

As already mentioned in a comment in #2718, a simple WordPress search for "paragraph" or "core" or "image" (if an image was added) shows unexpected results:

example.com/?s=paragraph

Gutenberg serialization markup leads to unexpected search results for the keywords above and for many more keywords and keyword fragments, such as para, graph, text, but, butt, button, cat, ate, categories, code, over, cover, form, head, ding, html, late, latest, post, list, quote, tor, table, ...

WP 4.9.1, Gutenberg Plugin 1.8.1

@jasmussen
Contributor

Could #1422 be related? CC: @youknowriad

@youknowriad
Contributor

@jasmussen I don't think so. This is a separate issue: search runs against the raw value of post_content, which includes the block comments (wp:paragraph...), and those mess up the results.

This is not specific to Gutenberg, though. Gutenberg makes it more visible, but it is a Core bug that can be reproduced using shortcodes as well.
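
To make the cause concrete, here is a minimal sketch in plain PHP, outside WordPress. It assumes the search behaves like a case-insensitive substring match on the raw post_content; the function names and the sample content are hypothetical.

```php
<?php
/**
 * Sketch: why a substring search on raw post_content matches block names.
 * All function names here are hypothetical.
 */

// Roughly what a SQL LIKE '%term%' does: a case-insensitive substring match.
function sketch_raw_match( string $post_content, string $term ): bool {
	return stripos( $post_content, $term ) !== false;
}

// Same match, but with HTML comments (the block delimiters) removed first.
function sketch_visible_match( string $post_content, string $term ): bool {
	$visible = preg_replace( '/<!--.*?-->/s', '', $post_content );
	return stripos( $visible, $term ) !== false;
}

$post_content = "<!-- wp:paragraph -->\n<p>Hello world</p>\n<!-- /wp:paragraph -->";

var_dump( sketch_raw_match( $post_content, 'paragraph' ) );     // bool(true)
var_dump( sketch_visible_match( $post_content, 'paragraph' ) ); // bool(false)
var_dump( sketch_visible_match( $post_content, 'hello' ) );     // bool(true)
```

The post never shows the word "paragraph" to a reader, yet the raw match succeeds because of the comment delimiters.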

@StaggerLeee

One new roadblock?

@pento added the [Type] WP Core Ticket label Jan 5, 2018
@pento
Member

pento commented Jan 5, 2018

Unfortunately, this is a known issue in WordPress core - in a vanilla WordPress install, if you search for "table", you'll get results including <table> tags. MySQL's string searching isn't capable of dealing with this kind of contextual parsing.

If you require 100% accurate search results, the best option is to use a dedicated search engine, like Elasticsearch. There are also Elasticsearch services available within the WordPress world, if setting up a dedicated search server is not an option.

@pento closed this as completed Jan 5, 2018
@danielbachhuber
Member

Given Gutenberg blocks will add substantially more "hidden" strings, I wonder how much larger of a problem this will become. It'd be interesting to do some analysis comparing an English-language dictionary to partial string matches with Gutenberg blocks.
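
A rough sketch of what such an analysis could look like, in plain PHP. The block names and the word list are tiny illustrative samples (not a real block registry or dictionary), and sketch_hidden_matches() is a hypothetical helper.

```php
<?php
/**
 * Sketch: which dictionary words are partial matches of block names?
 * The block names and words below are small illustrative samples only.
 */
function sketch_hidden_matches( array $words, array $block_names ): array {
	$hits = array();
	foreach ( $words as $word ) {
		foreach ( $block_names as $name ) {
			// Case-insensitive substring test, similar to LIKE '%word%'.
			if ( stripos( $name, $word ) !== false ) {
				$hits[ $word ][] = $name;
			}
		}
	}
	return $hits;
}

$block_names = array( 'core/paragraph', 'core/button', 'core/latest-posts', 'core/cover-image' );
$words       = array( 'graph', 'butt', 'late', 'over', 'banana' );

// Maps each matching word to the block names that contain it.
print_r( sketch_hidden_matches( $words, $block_names ) );
```

Running this against a full word list and the complete set of registered block names would give a first estimate of how many ordinary search terms are affected.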

@Zodiac1978

@pento wrote: "MySQL's string searching isn't capable of dealing with this kind of contextual parsing."

I found a way to do this, but I'm not sure if fiddling with posts_where would be okay for core or if this is plugin territory ...

<?php
/**
 * Plugin Name: Ignore block name in search
 * Description: Updates the native search to ignore block editor comments
 * Version: 1.0
 * Author: Torsten Landsiedel
 * License: GPL2
 */

/*
Based on "Search Ignore HTML Tags" by Pramod Sivadas
wordpress.org/plugins/wp-search-ignore-html-tags/
*/

/**
 * Modify search query to ignore the search term in HTML comments.
 *
 * @param string   $where The WHERE clause of the query.
 * @param WP_Query $query The WP_Query instance (passed by reference).
 *
 * @return string The modified WHERE clause.
 */
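function tl_update_search_query( $where, $query ) {
	// Only touch the main search query.
	if ( ! $query->is_search() || ! $query->is_main_query() ) {
		return $where;
	}

	global $wpdb;

	// Quote regex metacharacters in the raw search term, then escape it for SQL.
	// (esc_like() only escapes LIKE wildcards, not REGEXP syntax.)
	$search_query = esc_sql( preg_quote( $query->get( 's' ) ) );

	// Exclude posts in which the term appears between "<!--" and "-->".
	$where .= " AND {$wpdb->posts}.post_content NOT REGEXP '<!--.*{$search_query}.*-->' ";

	return $where;
}
add_filter( 'posts_where', 'tl_update_search_query', 10, 2 );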

@pento Feedback on whether a Trac ticket makes sense would be very much appreciated. Thanks in advance!

@Zodiac1978

The code above does not work, but there seems to be a way on MariaDB 10.0.5+ and MySQL 8.0.4+, the versions in which REGEXP_REPLACE was introduced. More details here: https://core.trac.wordpress.org/ticket/56294/
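
As a rough illustration of the REGEXP_REPLACE idea (not necessarily what the Trac ticket proposes), the condition on post_content could strip HTML comments inside the database before the LIKE comparison. The helper below is hypothetical and only builds the SQL string. It assumes the caller has already escaped and quoted the LIKE pattern (in WordPress, for example, with $wpdb->esc_like() and $wpdb->prepare()) and that each comment sits on a single line.

```php
<?php
/**
 * Sketch only: build a condition that matches a term in a column
 * after HTML comments have been stripped by the database.
 *
 * sketch_content_condition() is a hypothetical helper. Both arguments are
 * assumed to be trusted or already escaped by the caller.
 */
function sketch_content_condition( string $column, string $quoted_like ): string {
	// The lazy ".*?" stops at the first "-->", so each comment is removed
	// separately. "." does not match newlines by default, so this assumes
	// single-line comments.
	return "REGEXP_REPLACE({$column}, '<!--.*?-->', '') LIKE {$quoted_like}";
}

echo sketch_content_condition( 'wp_posts.post_content', "'%paragraph%'" ), "\n";
// REGEXP_REPLACE(wp_posts.post_content, '<!--.*?-->', '') LIKE '%paragraph%'
```

Such a condition would stand in for the plain post_content LIKE comparison in the search clause, rather than being appended as an extra AND, so that matches in the title or excerpt are not lost. Running a regex replacement over every row is also likely to be slower than a plain LIKE.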

6 participants