What happened?
Description
In a multi-site Craft CMS Pro project with the "craftcms/redactor": "3.1.0" plugin, I am encountering an issue with content indexing. The issue seems related to invisible characters that affect search results in the admin panel and frontend.
Example Content
If I click the Source button in the Redactor field, the content is displayed as follows:
<p>More information on the projects submission can be found in the <strong>TOOLBOX</strong>.</p>
However, when I search for the word toolbox in the admin, this page does not appear in the results. Conversely, if I search for TOOL%C2%ADBOX, the page is found. The same behavior occurs in the frontend when using .search().
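For reference, %C2%AD is the percent-encoded UTF-8 form of U+00AD (soft hyphen). The sketch below is a standalone Python check, not part of the Craft project, that decodes the working search term and lists the non-ASCII code points it contains. The variable names are only for illustration.

```python
from urllib.parse import unquote
import unicodedata

# The search term that does match the entry, as it appears in the URL.
encoded = "TOOL%C2%ADBOX"

# unquote() percent-decodes and interprets the bytes as UTF-8 by default.
decoded = unquote(encoded)

# Collect every non-ASCII character with its code point and Unicode name.
hidden = [
    (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
    for ch in decoded
    if not ch.isascii()
]

print(repr(decoded))
print(hidden)
```

Because the soft hyphen sits between TOOL and BOX, the stored word is not the same string as toolbox, which would be consistent with the plain search not matching.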
Field Configuration
- Clean up HTML: Remove inline styles, Remove empty tags, Replace non-breaking spaces with regular spaces
- Purify HTML: Enabled
- HTML Purifier Config: Default
It seems that some invisible characters are being introduced and retained after saving. These characters interfere with the indexing process.
Steps to Reproduce
- Create a field using the Redactor plugin with the configuration described above.
- Add the following content in the field:
<p>More information on the projects submission can be found in the <strong>TOOLBOX</strong>.</p>
- Save the entry and search for the word toolbox in the admin or frontend.
Expected Behavior
The page containing the word TOOLBOX should appear in the search results without requiring the exact invisible character sequence (TOOL%C2%ADBOX).
Actual Behavior
The page only appears in the search results if the invisible character sequence is included in the search query. Regular searches for toolbox do not return the expected result.
Additional Questions
- Why are these invisible characters added and retained after saving the content?
- How can I prevent such characters from being saved in the first place?
- What is the recommended approach to cleaning all content encodings before re-indexing with --update-search-index?
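To illustrate the kind of cleanup I have in mind, here is a minimal sketch in Python rather than in Craft's own API. It assumes the only characters to drop are the soft hyphen, the zero-width space, and the byte order mark. The set INVISIBLE and the function strip_invisible are hypothetical names, and the list may need to be adjusted for content where other invisible characters are meaningful.

```python
# Hypothetical set of invisible characters to remove before indexing.
INVISIBLE = {
    "\u00ad",  # SOFT HYPHEN
    "\u200b",  # ZERO WIDTH SPACE
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def strip_invisible(text: str) -> str:
    """Return text with the characters listed in INVISIBLE removed."""
    return "".join(ch for ch in text if ch not in INVISIBLE)


html = "<p>found in the <strong>TOOL\u00adBOX</strong>.</p>"
cleaned = strip_invisible(html)
print(cleaned)
```

After this step the cleaned string contains TOOLBOX as a single uninterrupted word, so a case-insensitive match on toolbox would succeed.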
Craft CMS version
4.13.8
PHP version
No response
Operating system and version
No response
Database type and version
No response
Image driver and version
No response
Installed plugins and versions
- "craftcms/redactor": "3.1.0"