Conversation
|
I'm running this branch in a custom docker-compose build based on the I'd recommend being explicit about what type of regex patterns you support. The most common standard types are Posix (very limited), Extended, and Perl, though there are lots of variations like Java, RE2 (Golang), etc that slightly tweak the syntax, usually primarily only of more advanced features. Maybe if you add a link in the documentation to the regex engine's definition that would be helpful as well. Working:
Not working:
BugsPattern ParenthesisIt's common to use parentheses in regex patterns to form conditional groups. For example: The parenthesis currently require manual escaping however, they're being captured by the filter logic outside the regex. This also appears to be an existing bug in how the non-regex filters work as well, but it's much more significant for regex patterns because parentheses are used so frequently in them. For example, this fails to match anything I did verify that this same parenthesis bug affects quoted non-regex patterns though too. Doing Multi-line modeI'm not sure how to test multi-line mode exactly. It works fine if I put the The pattern I suspect you want |
|
Hello @mtalexan and thanks for testing 👍🏻
|
|
Excellent, thanks for the clarification. For reference, I'm testing with PostgreSQL.
It seems only parentheses have weird escaping rules. Pattern escape characters like The super strange behavior with parenthesis and escaping definitely need to be documented. No one will ever figure out how to use it without a lot of trial-and-error, and it's such a standard feature of regex patterns to use parentheses you can guarantee almost everyone will encounter the issue. I'd suggest dropping the mention in the documentation on filtering along with the links to the regex syntaxes supported by the different databases. Is there anything else you'd like me to test? It looks like the same code path is used for all cases of filters being used, so I assume there's not much point in re-testing with per-feed filters, label filters, or saved user filters. Correct? |
|
Oh, forgot to mention I'd tested negation of a regex filter in my original set of tests. It's working fine too. |
|
As I work on using the regexes more, I'm finding more and more cases where it's not clear if there was a regex error, or if the pattern is matching unexpected things. For example, if I use the word-boundary escape character ( This for example doesn't work: Related: it appears constraint escapes aren't working. Class-Shorthand escapes and Backreferences are working however. |
|
Thanks for the additional tests and feedback, @mtalexan 👍🏻 Indeed, we do not have any feedback to the user when the search expression is wrong, for instance with wrong parentheses, invalid regex, etc. But we should add that. Adding an error feedback in particular should not be too difficult, but should still be left to a follow-up PR. |
I'm definitely on board with splitting the work up. Having access to regex parsing at all is a major feature that shouldn't be held up by reduced usability. |
|
Did you mean to include the |
|
Added Example : |
I meant |
|
Tags fixed (not tested). Tests welcome |
|
And regex added for tags |
Check support for multiline mode in MySQL
Allow searching for instance `/<pre>/` Fix FreshRSS#6775 (comment)
|
|
||
| Example to search entries, which title starts with the *Lol* word, with any number of *o*: `intitle:/^Lo+l/i` | ||
|
|
||
| As opposed to normal searches, HTML special characters are not escaped in regex searches, to allow searching HTML code, like: `/Hello <span>world<\/span>/` |
There was a problem hiding this comment.
This also has the effect that I have to use the HTML encoding of any character literals I'm looking for in the body as well, right? So & and " to find the post-rendered characters & and " for example.
Might it be worth doing this only if you have an r character on the regex? Like /Hello <span>world<\/span>/r?
There was a problem hiding this comment.
Indeed, the search is supposed to match the HTML content verbatim. No need to extra encode anything though, just exactly at it appears in the original HTML document.
We could consider adding some modifiers, but let's see if there is a sufficient need first. Another modifier I am considering is an automated transformation from PCRE syntax (subset) to the other variants such as PostgreSQL.
There was a problem hiding this comment.
I think you might run into issues trying to translate between the regex versions. If you settle on the PCRE syntax there are a lot of advanced (?...) features that aren't present in Postgres. Postgres also has explicit start and end word boundary markers, while PCRE has only a singular word boundary marker. Then there's MySQL, which supports only a very small subset of regex features.
I could see maybe trying to do something with the constraint escapes for Postgres, but that's about the only thing I can see that could possibly be translated and it would probably just confuse the situation with the databases having different regex formats.
|
Hi @Alkarex, I was trying to find issues/discussions where the regex filters were mentioned, and I ended up here. I have an issue where I'm trying to assign labels automatically to incoming feed items by using intitle regex filters, but it has some erratic behaviour that it's difficult for me to identify, and I was hoping you could give me a hint of what's happening. So I tried the same as you did in your first comment for the FreshRSS releases feed: If I use the filter Any idea why is this happening in my instance? I'm using the default Docker deploy. |
What version of FreshRSS are you running? Try |
Version 1.24.3.
Awesome! With this one (1.25.0-dev), it works as expected: Problem solved then, thanks a lot for the quick reply :D EDIT: duh, I just realized from your first comment that regex search is only available from FreshRSS 1.25.0+... sorry for not checking that before. |
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [freshrss](https://github.com/FreshRSS/FreshRSS) | minor | `1.24.3` -> `1.26.0` | --- ### Release Notes <details> <summary>FreshRSS/FreshRSS (freshrss)</summary> ### [`v1.26.0`](https://github.com/FreshRSS/FreshRSS/blob/HEAD/CHANGELOG.md#2025-02-23-FreshRSS-1260) [Compare Source](FreshRSS/FreshRSS@1.25.0...1.26.0) - Features - Add order-by options to sort articles by received date (existing, default), publication date, title, link, random [#​7149](FreshRSS/FreshRSS#7149) - Allow searching in all feeds, also feeds only visible at category level with `&get=A`, and also those archived with `&get=Z` [#​7144](FreshRSS/FreshRSS#7144) - UI accessible from user-query view - Add search operator `intext:` [#​7228](FreshRSS/FreshRSS#7228) - New shortcuts for adding user labels to articles [#​7274](FreshRSS/FreshRSS#7274) - New *About* page with system information [#​7161](FreshRSS/FreshRSS#7161) - Bug fixing - Fix regression denying access to app manifest [#​7158](FreshRSS/FreshRSS#7158) - Fix unwanted feed description updates [#​7269](FreshRSS/FreshRSS#7269) - Ensure no PHP buffer for SQLite download (some setups would first put the file in memory) [#​7230](FreshRSS/FreshRSS#7230) - Fix XML encoding regression in HTML+XPath mode [#​7345](FreshRSS/FreshRSS#7345) - Improve cURL proxy options and fix some constants [#​7231](FreshRSS/FreshRSS#7231) - Fix UI of global view unread articles counter [#​7247](FreshRSS/FreshRSS#7247) - Hide base theme in carrousel [#​7234](FreshRSS/FreshRSS#7234) - Deployment - Reduce superfluous Docker builds [#​7137](FreshRSS/FreshRSS#7137) - Docker default image (Debian 12 Bookworm) updated to PHP 8.2.26 and Apache 2.4.62 - Docker alternative image (Alpine 3.21) updated to PHP 8.3.16 - UI - Add footer icons to reader view [#​7133](FreshRSS/FreshRSS#7133) - Remove local reference to font *Open Sans* to avoid bugs with some local versions [#​7215](FreshRSS/FreshRSS#7215) - Improve stats page layout [#​7243](FreshRSS/FreshRSS#7243) - Smaller *mark as read* button in mobile view [#​5220](FreshRSS/FreshRSS#5220) - Add CSS class to various types of notifications to allow custom styling [#​7287](FreshRSS/FreshRSS#7287) - Various UI and style improvements: [#​7162](FreshRSS/FreshRSS#7162), [#​7268](FreshRSS/FreshRSS#7268) Security - Better authorization label for OIDC in the UI [#​7264](FreshRSS/FreshRSS#7264) - Allow comments in `force-https.txt` [#​7259](FreshRSS/FreshRSS#7259) - I18n: - Improve German [#​7177](FreshRSS/FreshRSS#7177), [#​7275](FreshRSS/FreshRSS#7275), [#​7278](FreshRSS/FreshRSS#7278) - Improve Japanese [#​7187](FreshRSS/FreshRSS#7187), [#​7195](FreshRSS/FreshRSS#7195), [#​7332](FreshRSS/FreshRSS#7332) - Misc. - Improve PHP code [#​7191](FreshRSS/FreshRSS#7191), [#​7204](FreshRSS/FreshRSS#7204) - Upgrade to PHPStan 2 [#​7131](FreshRSS/FreshRSS#7131), [#​7164](FreshRSS/FreshRSS#7164), [#​7224](FreshRSS/FreshRSS#7224), [#​7270](FreshRSS/FreshRSS#7270), [#​7281](FreshRSS/FreshRSS#7281), [#​7282](FreshRSS/FreshRSS#7282) - Update to CssXPath 1.3.0 (no change) [#​7211](FreshRSS/FreshRSS#7211) - Update dev dependencies [#​7165](FreshRSS/FreshRSS#7165), [#​7166](FreshRSS/FreshRSS#7166), [#​7167](FreshRSS/FreshRSS#7167), [#​7279](FreshRSS/FreshRSS#7279), [#​7280](FreshRSS/FreshRSS#7280), [#​7283](FreshRSS/FreshRSS#7283), [#​7284](FreshRSS/FreshRSS#7284), [#​7285](FreshRSS/FreshRSS#7285), [#​7347](FreshRSS/FreshRSS#7347) - Update GitHub Actions to Ubuntu 24.04 [#​7207](FreshRSS/FreshRSS#7207) ### [`v1.25.0`](https://github.com/FreshRSS/FreshRSS/blob/HEAD/CHANGELOG.md#2024-12-23-FreshRSS-1250) [Compare Source](FreshRSS/FreshRSS@1.24.3...1.25.0) - Features - Add support for [regex search (regular expressions)](https://freshrss.github.io/FreshRSS/en/users/10\_filter.html#regex) [#​6706](FreshRSS/FreshRSS#6706), [#​6926](FreshRSS/FreshRSS#6926) -⚠️ Advanced regex syntax for searches depends on the database used (SQLite, PostgreSQL, MariaDB, MySQL), but FreshRSS filter actions such as auto-mark-as-read and auto-favourite always use [PHP PCRE2 syntax](https://php.net/regexp.introduction). - Allow dynamic search operator in user queries, like `search:UserQueryA date:P1d` [#​6851](FreshRSS/FreshRSS#6851) - New feed mode *HTML+XPath+JSON dot notation* (JSON in HTML) [#​6888](FreshRSS/FreshRSS#6888) - Better HTTP compliance with support for HTTP response headers `Cache-Control: max-age` and `Expires` [#​6812](FreshRSS/FreshRSS#6812), [FreshRSS/simplepie#26](FreshRSS/simplepie#26) - Support custom HTTP request headers per feed (e.g. for `Authorization`) [#​6820](FreshRSS/FreshRSS#6820) - New unicity policies and heuristic for feeds with bad article IDs [#​4487](FreshRSS/FreshRSS#4487), [#​6900](FreshRSS/FreshRSS#6900) - Fallback to GUID if article link is empty [#​7051](FreshRSS/FreshRSS#7051) - New option to automatically mark new articles as read if an identical title already exists in the same category [#​6922](FreshRSS/FreshRSS#6922) - New reading view option to display unread articles + favourites [#​7088](FreshRSS/FreshRSS#7088) - And corresponding new filter state `&state=96` (no UI button yet) - Add ability to remove content from articles with CSS selectors, also when not using full content [#​6786](FreshRSS/FreshRSS#6786), [#​6807](FreshRSS/FreshRSS#6807) - Update `phpgt/cssxpath` library with improved CSS selectors [#​6618](FreshRSS/FreshRSS#6618) - Support for `:last-child`, `:first-of-type`, `:last-of-type`, `^=`, `|=` - New condition option to selectively retrieve full content of articles [#​33fd07f6f26310d4806077cc87bcdf9b8b940e35](FreshRSS/FreshRSS@33fd07f), [#​7082](FreshRSS/FreshRSS#7082) - Allow parentheses in quoted search [#​7055](FreshRSS/FreshRSS#7055) - New UI feature to download a user’ SQLite database or a database SQLite export (to be produced by CLI) [#​6931](FreshRSS/FreshRSS#6931) - New button to delete errored feeds from a category [#​7030](FreshRSS/FreshRSS#7030) - Better import of Inoreader user labels [#​6791](FreshRSS/FreshRSS#6791) - Rebuild feed favicon on cache clear [#​6961](FreshRSS/FreshRSS#6961) - New sharing with Bluesky [#​7116](FreshRSS/FreshRSS#7116) - New sharing with Telegram [#​6838](FreshRSS/FreshRSS#6838) - Bug fixing - Fix searches with a parenthesis before an operator like `("a b")` or `(!c)` [#​6818](FreshRSS/FreshRSS#6818) - Fix auto-read tags [#​6790](FreshRSS/FreshRSS#6790) - Fix CSS selector for removing elements [#​7037](FreshRSS/FreshRSS#7037), [#​7073](FreshRSS/FreshRSS#7073), [#​7081](FreshRSS/FreshRSS#7081), [#​7091](FreshRSS/FreshRSS#7091), [#​7083](FreshRSS/FreshRSS#7083) - Fix redirection error after creating a new user [#​6995](FreshRSS/FreshRSS#6995) - Fix favicon error in case of wrong URL [#​6899](FreshRSS/FreshRSS#6899) - Use cURL to fetch extensions list (allows e.g. IPv6) [#​6767](FreshRSS/FreshRSS#6767) - Fix XML encoding in cURL options [#​6821](FreshRSS/FreshRSS#6821) - Fix initial UI scroll for some browsers [#​7059](FreshRSS/FreshRSS#7059) - Fix menu for article tags in some cases [#​6990](FreshRSS/FreshRSS#6990) - Fix share menu shortcut [#​6825](FreshRSS/FreshRSS#6825) - Fix HTML regex pattern during install for compatibility with `v` mode [#​7009](FreshRSS/FreshRSS#7009) - More robust creation of user data folder [#​7000](FreshRSS/FreshRSS#7000) - API - Fix API for categories and labels containing a `+` [#​7033](FreshRSS/FreshRSS#7033) - Compatibility with FocusReader - Supported by [Capy Reader](https://github.com/jocmp/capyreader) (Android, open source) [capyreader#492](jocmp/capyreader#492) - Improved UI for API [#​7048](FreshRSS/FreshRSS#7048) - Allow adding multiple feeds to a category via API [#​7017](FreshRSS/FreshRSS#7017) - API support edit multiple tags [#​7060](FreshRSS/FreshRSS#7060) - API return all categories also those without any feed [#​7020](FreshRSS/FreshRSS#7020) - Compatibility - Require PHP 8.1+ (drop PHP 7.4) [#​6711](FreshRSS/FreshRSS#6711) - Improved support of PHP 8.4+ [#​6618](FreshRSS/FreshRSS#6618), [phpgt/CssXPath#227](phpgt/CssXPath#227), [#​6781](FreshRSS/FreshRSS#6781), [#​4374](FreshRSS/FreshRSS#4374) - Require PostgreSQL 10+ (drop PostgreSQL 9.5) [#​6705](FreshRSS/FreshRSS#6705) - Require MariaDB 10.0.5+ (drop MariaDB 5.5) [#​6706](FreshRSS/FreshRSS#6706) - Require MySQL 8+ (drop MySQL 5.5.3) [#​6706](FreshRSS/FreshRSS#6706) - Deployment - Docker: dev image `freshrss/freshrss:oldest` updated to Alpine 3.16 with PHP 8.1.22 and Apache 2.4.59 [#​6711](FreshRSS/FreshRSS#6711) - Docker alternative image updated to Alpine 3.21 with PHP 8.3.14 and Apache 2.4.62 [#​5383](FreshRSS/FreshRSS#5383) - Update Dockerfiles to newer key-value format [#​6819](FreshRSS/FreshRSS#6819) - Docker minor improvement of entrypoint [#​6827](FreshRSS/FreshRSS#6827) - SimplePie - Refactor [our embedding](lib/README.md) of SimplePie [#​4374](FreshRSS/FreshRSS#4374) - Our fork is maintained in its [own repository](https://github.com/FreshRSS/simplepie/tree/freshrss). - Remove HTTP `Referer` [#​6822](FreshRSS/FreshRSS#6822), [FreshRSS/simplepie#27](FreshRSS/simplepie#27) - If some sites require it, add `Referer: https://example.net/` to the custom HTTP headers of the feed [#​6820](FreshRSS/FreshRSS#6820) - Upstream fixes [simplepie#878](simplepie/simplepie#878), [simplepie#883](simplepie/simplepie#883) - Sync upstream [#​6840](FreshRSS/FreshRSS#6840), [#​7067](FreshRSS/FreshRSS#7067) - Security - Apache protect more non-public folders and files [#​6881](FreshRSS/FreshRSS#6881), [#​6893](FreshRSS/FreshRSS#6893), [#​7008](FreshRSS/FreshRSS#7008) - Add privacy settings on extension list retrieval [#​4603](FreshRSS/FreshRSS#4603), [#​7132](FreshRSS/FreshRSS#7132) - Fix login in unsafe mode when using a password with special XML characters [#​6797](FreshRSS/FreshRSS#6797) - Fix login in e.g. Brave browser by avoiding synchronous XHR [#​7023](FreshRSS/FreshRSS#7023) - Fix invalid login message [#​7066](FreshRSS/FreshRSS#7066) - Modernise `windows.open noopener` (to avoid flash of white page in dark mode) [#​7077](FreshRSS/FreshRSS#7077), [#​7089](FreshRSS/FreshRSS#7089) - UI - Searchable *My Labels* field [#​6753](FreshRSS/FreshRSS#6753) - Add subscription management button to reading view [#​6946](FreshRSS/FreshRSS#6946) - New option for showing label menu in article row [#​6984](FreshRSS/FreshRSS#6984) - Move to next unread label on mark as read [#​6886](FreshRSS/FreshRSS#6886) - Improved article footer for small / mobile screens [#​7031](FreshRSS/FreshRSS#7031) - Improve Web accessibility: fix `aria-hidden` bug, and use HTML5 `hidden` [#​6910](FreshRSS/FreshRSS#6910) - Default styles for `<pre>` and `<code>` [#​6770](FreshRSS/FreshRSS#6770) - Refactor the sharing menu to use a `<template>` instead of duplicated HTML code [#​6751](FreshRSS/FreshRSS#6751), [#​7113](FreshRSS/FreshRSS#7113) - Refactor the label menu to use a `<template>` [#​6864](FreshRSS/FreshRSS#6864) - Rework UI for authors [#​7054](FreshRSS/FreshRSS#7054) - Avoid Unicode escape of authors in HTML UI [#​7056](FreshRSS/FreshRSS#7056) - Improved subscription management page [#​6816](FreshRSS/FreshRSS#6816) - Improve user query management page [#​7062](FreshRSS/FreshRSS#7062) - Restore JavaScript form validation compatibility with Web browsers using older engines (SeaMonkey) [#​6777](FreshRSS/FreshRSS#6777) - Reorganise some options [#​6920](FreshRSS/FreshRSS#6920) - New shortcut `?` to show shortcut page and help [#​6981](FreshRSS/FreshRSS#6981) - Use of consistent colours in statistics [#​7090](FreshRSS/FreshRSS#7090) - Various UI and style improvements [#​6959](FreshRSS/FreshRSS#6959) - Extensions - New extension hook `simplepie_after_init` [#​7007](FreshRSS/FreshRSS#7007) - I18n - Add Finnish [#​6954](FreshRSS/FreshRSS#6954) - Improve English [#​7049](FreshRSS/FreshRSS#7049), [#​7053](FreshRSS/FreshRSS#7053) - Improve German [#​6847](FreshRSS/FreshRSS#6847), [#​7068](FreshRSS/FreshRSS#7068), [#​7128](FreshRSS/FreshRSS#7128) - Improve Italian [#​6872](FreshRSS/FreshRSS#6872), [#​7069](FreshRSS/FreshRSS#7069), [#​7086](FreshRSS/FreshRSS#7086) - Improve Spanish [#​6894](FreshRSS/FreshRSS#6894), [#​6908](FreshRSS/FreshRSS#6908) - Improve Turkish [#​6960](FreshRSS/FreshRSS#6960) - Misc. - Better cache name for JSON feeds [#​6768](FreshRSS/FreshRSS#6768) - Fix inversed encoding logic in `Minz_Request::paramArray()` [#​6800](FreshRSS/FreshRSS#6800) - Pass PHPStan `booleansInConditions` [#​6793](FreshRSS/FreshRSS#6793) - Rename PHPStan configuration file to `phpstan.dist.neon` to allow custom configuration in `phpstan.neon` [#​6892](FreshRSS/FreshRSS#6892) - Code improvements [#​6800](FreshRSS/FreshRSS#6800), [#​6809](FreshRSS/FreshRSS#6809), [#​6983](FreshRSS/FreshRSS#6983) - Makefile improvements [#​6913](FreshRSS/FreshRSS#6913) - Fix PHPCS `ControlSignature` [#​6896](FreshRSS/FreshRSS#6896) - Update *PHPMailer* [#​6968](FreshRSS/FreshRSS#6968), [#​7046](FreshRSS/FreshRSS#7046) - Code updates to PHP 8.1 syntax [#​6748](FreshRSS/FreshRSS#6748) - Update dev dependencies [#​6780](FreshRSS/FreshRSS#6780), [#​6964](FreshRSS/FreshRSS#6964), , [#​6965](FreshRSS/FreshRSS#6965), [#​6966](FreshRSS/FreshRSS#6966), [#​6967](FreshRSS/FreshRSS#6967), [#​6970](FreshRSS/FreshRSS#6970), [#​7042](FreshRSS/FreshRSS#7042), [#​7043](FreshRSS/FreshRSS#7043), [#​7044](FreshRSS/FreshRSS#7044), [#​7045](FreshRSS/FreshRSS#7045), [#​7047](FreshRSS/FreshRSS#7047), [#​7052](FreshRSS/FreshRSS#7052) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS45MC4zIiwidXBkYXRlZEluVmVyIjoiMzkuMTc2LjQiLCJ0YXJnZXRCcmFuY2giOiJtYWluIiwibGFiZWxzIjpbXX0=--> Reviewed-on: https://gitea.sh4ke.rocks/lickler/freshrss/pulls/29 Co-authored-by: Michael Wittig <[email protected]> Co-committed-by: Michael Wittig <[email protected]>



fix #3549
Support regex filters and regex search (working with PostgreSQL and MySQL and SQLite).
Works like any regular expression. Must be enclosed in
/ /Currently supporting the prefix
intitle:or no prefix at all (will addauthor:etc. soon)Case sensitive by default, but can be made case-insensitive with
imodifier like/Alice/iSupports multiline mode with
mmodifier like/^Alice/mExample of filter to keep only the entries, which title ends with

.0from https://github.com/FreshRSS/FreshRSS/releases.atom , marking as read the entries not matching this filter:Requires PHP 8.0+ due to use of new function
str_contains, as planned requirement for FreshRSS 1.25.0+ #6711