SQL: Same updateCacheUnreads for all DBs #5648
Conversation
Use the same SQL update request for MySQL / MariaDB as the one we already used for PostgreSQL / SQLite (i.e. using a sub-query). Testing on a DB of 688 MB with 270k entries, 199 feeds, 19 categories, using MySQL 8.1.0: the new SQL update using a sub-query took on average 0.02 s, while the old SQL update using a join took on average 0.05 s. The SQL cache was properly invalidated between each run. The new SQL request is thus about twice as fast. Another advantage of the sub-query version is that it works identically in PostgreSQL, SQLite, MariaDB, and MySQL, so we no longer need different versions. Contributes to FreshRSS#5008 (comment)
For the new SQL request, using MySQL 8.1.0:

```sql
EXPLAIN UPDATE freshrss_feed
SET `cache_nbUnreads`=(
	SELECT COUNT(*) AS nbUnreads FROM freshrss_entry e
	WHERE e.id_feed=freshrss_feed.id AND e.is_read=0);
```
This was the old SQL request:

```sql
EXPLAIN UPDATE `freshrss_feed` f LEFT OUTER JOIN (
	SELECT e.id_feed, COUNT(*) AS nbUnreads
	FROM `freshrss_entry` e
	WHERE e.is_read = 0
	GROUP BY e.id_feed
) x ON x.id_feed = f.id
SET f.`cache_nbUnreads` = COALESCE(x.nbUnreads, 0);
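To illustrate the portability claim, here is a minimal sketch with simplified stand-in tables (hypothetical names, not the real FreshRSS schema): the same sub-query UPDATE runs unchanged on SQLite via Python's sqlite3.

```python
import sqlite3

# Simplified stand-ins for the FreshRSS feed/entry tables (hypothetical schema)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE feed (id INTEGER PRIMARY KEY, cache_nbUnreads INTEGER DEFAULT 0);
CREATE TABLE entry (id INTEGER PRIMARY KEY, id_feed INTEGER, is_read INTEGER);
INSERT INTO feed (id) VALUES (1), (2);
INSERT INTO entry (id, id_feed, is_read) VALUES
    (1, 1, 0), (2, 1, 0), (3, 1, 1), (4, 2, 0);
""")

# The sub-query form: one correlated COUNT per feed, no JOIN needed,
# so the statement is the same on SQLite/PostgreSQL/MySQL/MariaDB
cur.execute("""
UPDATE feed
SET cache_nbUnreads = (
    SELECT COUNT(*) FROM entry e
    WHERE e.id_feed = feed.id AND e.is_read = 0)
""")
con.commit()

rows = cur.execute("SELECT id, cache_nbUnreads FROM feed ORDER BY id").fetchall()
print(rows)  # feed 1 has two unread entries, feed 2 has one
```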
Ping @Emporea
Could you explain to me how to "break the cache", and why?
I have just tried with MariaDB 11.1, with results similar to my tests with MySQL. Both with Docker.
Just manually mark a few articles as read in the FreshRSS Web interface, from a few different feeds, to make sure no sub-query result is cached by the DB. Your EXPLAIN commands show that no index is used, which is probably the main reason why it is so slow, so this is what needs to be investigated.
@Emporea If your DB is not too private, you could export it with our CLI to an SQLite file and send it to me so I can try to reproduce. If that is not an option, we will try a few more things remotely. What kind of server are you operating on? In particular: OS, processor architecture, type of disk.
@Emporea Could you try this variant, in which we force the index choice to help MariaDB's optimizer?

```sql
UPDATE freshrss_empo_feed
SET `cache_nbUnreads`=(
	SELECT COUNT(*) AS nbUnreads FROM freshrss_empo_entry e
	USE INDEX (entry_feed_read_index)
	WHERE e.id_feed=freshrss_empo_feed.id AND e.is_read=0);
```
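For context, the reason such a hint can matter is that a composite (id_feed, is_read) index lets each correlated count be answered from the index alone, without scanning the table. A rough SQLite sketch (simplified table and index names, not the real schema) shows a planner picking such an index on its own:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, id_feed INTEGER, is_read INTEGER);
CREATE INDEX entry_feed_read_index ON entry (id_feed, is_read);
INSERT INTO entry (id, id_feed, is_read) VALUES (1, 1, 0), (2, 1, 1), (3, 2, 0);
""")

# The inner query of updateCacheUnreads for a single feed: with the
# composite index, the count is resolved from the index entries alone
plan = cur.execute("""
EXPLAIN QUERY PLAN
SELECT COUNT(*) FROM entry WHERE id_feed = 1 AND is_read = 0
""").fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)  # the plan mentions entry_feed_read_index
```

MariaDB's `USE INDEX` serves the same purpose when its optimizer makes a different choice; it is not needed where the planner already picks the index.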
This is faster. Do you think this is on the FreshRSS side? Or are my other databases also affected, and could I have much better performance overall? Only FreshRSS caught my eye, as I have been getting these 504s and locked databases in the logs a lot lately.
@Alkarex I am also fine with sharing my database. I don't actually use FreshRSS to read the news by hand; it's just to collect RSS data for a small data science project. I exported it with the CLI. How should I send this file to you?
Debian 12, x86_64 Intel Xeon, NVMe SSD.
Ok, thanks for the tests. That looks much better now :-) I believe this is a case of MariaDB's optimizer making an unfortunate decision for the query plan. Since we have a dedicated index, which is supposed to be used all the time, I have hardcoded a USE INDEX hint. Could you just try the current PR? If it works, then we should be good to go, and no need to share your DB. P.S. Pure speculation, but amusingly, it might be that the default plan would have run much faster on a slow hard disk than on your SSD, because the cost of scanning the whole DB would have been higher, thus favouring the use of the index :-P
How? Sorry I am not really a developer ^^. |
Here is an option: edit your docker-compose on the model of https://github.com/FreshRSS/FreshRSS/blob/edge/Docker/freshrss/docker-compose.yml with the following information:

```yaml
build:
  context: https://github.com/Alkarex/FreshRSS.git#updateCacheUnreads
  dockerfile: Docker/Dockerfile-Alpine
```

And then execute a rebuild. Let me know if that works for you; otherwise I will propose something else.
I hit "mark all as read" with this Docker config. It still gives me a 504, and the container has high CPU usage for several minutes.
Can you see which SQL request is running when it takes that long?
That's probably not the query we are looking for. I had 10,000 unread before; now it was only about 200, maybe. I didn't get a 504 and probably wasn't fast enough with the "show processlist" command.
Could you try these three variants?

```sql
EXPLAIN UPDATE freshrss_empo_entry e
SET e.is_read = 1
WHERE e.is_read <> 1 AND e.id <= 1694438602000000;

EXPLAIN UPDATE freshrss_empo_entry e
INNER JOIN freshrss_empo_feed f ON e.id_feed = f.id
SET e.is_read = 1
WHERE e.is_read <> 1 AND e.id <= 1694438602000000
	AND f.priority > 10;

EXPLAIN UPDATE freshrss_empo_entry e
SET e.is_read = 1
WHERE e.is_read <> 1 AND e.id <= 1694438602000000
	AND id_feed IN (SELECT f.id FROM freshrss_empo_feed f WHERE f.priority > 10);
```
If you need my database just tell me and I will send a link to your profile email. |
Yes, I think that would be good after all, so I can try to replicate. Evidently, MariaDB has some difficulties picking the right available indexes at the right time.
I have well received the DB copy, but all the SQL queries I am trying are relatively snappy and perform in at most 1.5 s, even with 100k+ unread articles. Could you try to see how long these three variants take on your side? They will mark all articles as read, so it is best if you try them just after you are done reading everything. For the test, you can click on the open envelope to show read articles, and use the search field.

```sql
UPDATE freshrss_empo_entry e
INNER JOIN freshrss_empo_feed f ON e.id_feed = f.id
SET e.is_read = 1
WHERE e.is_read <> 1 AND e.id <= 1694470099000000
	AND f.priority > 0;

UPDATE freshrss_empo_entry e
SET e.is_read = 1
WHERE e.is_read <> 1 AND e.id <= 1694470099000000
	AND id_feed IN (SELECT f.id FROM freshrss_empo_feed f WHERE f.priority > 0);

UPDATE freshrss_empo_entry e
USE INDEX (is_read)
SET e.is_read = 1
WHERE e.is_read <> 1 AND e.id <= 1694470099000000
	AND id_feed IN (SELECT f.id FROM freshrss_empo_feed f WHERE f.priority > 0);
```
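As a side note, the shape of the last variant (a sub-query instead of a multi-table UPDATE) is also the one that runs on SQLite, which does not support MySQL's `UPDATE ... JOIN` syntax. A small sqlite3 sketch with stand-in tables (hypothetical names and ids) of that sub-query filter:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE feed (id INTEGER PRIMARY KEY, priority INTEGER);
CREATE TABLE entry (id INTEGER PRIMARY KEY, id_feed INTEGER, is_read INTEGER);
INSERT INTO feed (id, priority) VALUES (1, 10), (2, 0);
INSERT INTO entry (id, id_feed, is_read) VALUES
    (100, 1, 0), (200, 1, 0), (300, 2, 0), (400, 1, 0);
""")

# Mark as read only entries up to a cutoff id, and only in feeds with priority > 0
# (the sub-query replaces the INNER JOIN used in the MySQL/MariaDB variants)
cur.execute("""
UPDATE entry SET is_read = 1
WHERE is_read <> 1 AND id <= 300
    AND id_feed IN (SELECT f.id FROM feed f WHERE f.priority > 0)
""")
con.commit()

rows = cur.execute("SELECT id, is_read FROM entry ORDER BY id").fetchall()
print(rows)  # entry 300 is in a priority-0 feed and 400 is past the cutoff: both stay unread
```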
Will do, but could you test with MariaDB 10.6 too? That is the one I am using. If I need to upgrade, then I will have a look at it.
A search with `date:P15d/` gave me about 50,000 messages. Marking all as read or unread via the Web UI is now fast and doesn't give me 504s. The first one takes the longest. Thanks again for your effort, btw.
I have re-tested with MariaDB, MySQL, and PostgreSQL, all in their latest versions, as well as with the built-in SQLite. I have not performed an exhaustive and systematic test, but on your dataset some clear trends seem to appear nevertheless: 🐇 PostgreSQL > SQLite >> MySQL > MariaDB 🐢. Here is some data from MySQL on my computer: good performance with 4k unread articles, bad performance with 10k unread articles. (Screenshots: MySQL, PostgreSQL.)
@Emporea Please change your docker-compose back to its previous configuration.
Update DB recommendations: favour PostgreSQL, based on tests such as FreshRSS#5648 (comment) and FreshRSS#5707. SQLite is generally very fine, although with the challenge that the DB model cannot easily be updated (https://www.sqlite.org/lang_altertable.html#why_alter_table_is_such_a_problem_for_sqlite); we should ship a command-line update mechanism. I have verified that FreshRSS still works with `mysql:5.5.62` https://hub.docker.com/layers/library/mysql/5.5.62/images/sha256-d404d78aa797c87c255e5ae2beb5d8d0e4d095f930b1f20dc208eaa957477b74?context=explore

Use the same SQL update request for MySQL / MariaDB as the one we already used for PostgreSQL / SQLite (i.e. using a sub-query).
Testing on a DB of 688MB with 270k entries, 199 feeds, 19 categories, using MySQL 8.1.0 and MariaDB 11.1.
The new SQL update using a sub-query took on average 0.02 s, while the old SQL update using a join took on average 0.05 s. The SQL cache was properly invalidated between each run. The new SQL request is thus about twice as fast on that test.
EXPLAIN also shows an intrinsic advantage, and it might lead to fewer locks. Another advantage of the sub-query version is that it works identically in PostgreSQL, SQLite, MariaDB, and MySQL, so we no longer need different versions.
Contributes to #5008 (comment)
Edit: but with a `USE INDEX` hint for MySQL/MariaDB.