Skip to content

Shuffle Sort#2463

Closed
mdemoss wants to merge 17 commits intoFreshRSS:edgefrom
mdemoss:shuffle
Closed

Shuffle Sort#2463
mdemoss wants to merge 17 commits intoFreshRSS:edgefrom
mdemoss:shuffle

Conversation

@mdemoss
Copy link
Copy Markdown
Contributor

@mdemoss mdemoss commented Jul 25, 2019

image

Problems with chronological sort

I subscribe to a lot of feeds, and feeds differ widely in terms of how often they update and how much time I need to spend on an entry before marking it read. Funny pictures of cats take a second or two to appreciate and crowd the top of the timeline while that two hour long video on the history of pineapple pizza tends to sink lower and lower in the feed, never to be seen again.

So I set about trying to make the less-frequently-updating, more-attention-demanding feeds more visible in a shuffle sort option for FreshRSS.

Description of the shuffle algorithm

Imagine each feed as a stack of cards, with the most recent entries on top. Draw 3 from each stack, shuffle those cards together, and deal. Then repeat the process as the user keeps scrolling. The "shuffle" changes every day because I throw the current date into the hash as well.

This isn't strictly-speaking random, but most user-facing shuffle features in mp3 players and the like aren't really random either.

Limitations

It only works on MySQL (or MariaDB)

I wrote this for MySQL and not all of features I'm using are available in SQLite and Postgres.

This is addressed for the moment just by conditionally hiding the shuffle button.

My guess is this method of shuffling is probably possible in Postgres using different functionality, but may not be possible with SQLite.

Lack of a when-marked-as-read column

When an entry gets marked as read, this alters the sort order of the other entries in that feed. Not ideal. We could stop elements from 'jumping around' if we could mark when an entry was marked read and check for that when building the list of entries from each feed.
I thought the lastSeen column could be used this way, but it can't. It relatates to the last time the entry was seen when updating the feed from its source.

Even without those checks this hasn't turned out to be too noticable. Marking down the date and time when entries are read could enable some more interesting statistics.

Code

It works now, and I've been using it almost every day.

The query when it's put together looks something like this:

SELECT 
    e.id, e.guid, e.title, e.author, 
    # UNCOMPRESS(content_bin) AS content, 
    e.link, 	e.date,	e.is_read, e.is_favorite, e.id_feed, e.tags,
    # FIND_IN_SET(e.id, mostRecentFromEach.grouped_entries) AS orderWithinOwnFeed,
    # f.name AS feedName,
    CAST(CONV(
        CONCAT(
            HEX( (FIND_IN_SET(e.id, mostRecentFromEach.grouped_entries)-1) DIV 3 ), /* 1 hex digit because of BETWEEN */
            LEFT((SHA1(CONCAT(e.id, CURDATE()))),15) /* leave room for 1 hex digit */
        ), 
    16, 10) AS UNSIGNED) shuffleOrderKey
FROM `matt_entry` e 
INNER JOIN `matt_feed` f ON e.id_feed = f.id
/* https://stackoverflow.com/questions/2129693/using-limit-within-group-by-to-get-n-results-per-group */
INNER JOIN (
    SELECT
        e.id_feed,
        GROUP_CONCAT(e.id ORDER BY e.id DESC) grouped_entries
    FROM `matt_entry` e
    WHERE e.is_read=0 OR e.lastSeen > UNIX_TIMESTAMP() - (24 * 60 * 60)
    GROUP BY id_feed
) mostRecentFromEach ON mostRecentFromEach.id_feed = e.id_feed AND FIND_IN_SET(e.id, mostRecentFromEach.grouped_entries) BETWEEN 1 and 48 /* 1 hex digit after DIV 3 */
WHERE f.priority > 0  AND e.is_read=0 
ORDER BY shuffleOrderKey
LIMIT 21;

@mdemoss mdemoss changed the base branch from master to dev July 25, 2019 22:54
@mdemoss
Copy link
Copy Markdown
Contributor Author

mdemoss commented Jul 25, 2019

Accidentally based this on master, switched it to dev. Will checks re-run?

@mdemoss mdemoss mentioned this pull request Jul 26, 2019
@Alkarex
Copy link
Copy Markdown
Member

Alkarex commented Aug 2, 2019

Hello,
Sorry for the delay - I am at the beach 🏖️
This is quite interesting. It would indeed require more work to make it work with all databases, and ensure that the performances are acceptable.
It could be a part of a more substantial rewrite allowing various sort orders, e.g. by likelihood of interest. Among other things, it requires replacing the "mark all as read" function by the new function "mark many as read by ID" recently introduced.

@Alkarex
Copy link
Copy Markdown
Member

Alkarex commented Aug 2, 2019

P.S.: Checks will re-run at the next commit in that PR.

@Frenzie Frenzie closed this Aug 2, 2019
@Frenzie Frenzie reopened this Aug 2, 2019
@Frenzie
Copy link
Copy Markdown
Member

Frenzie commented Aug 2, 2019

Closing/reopening retriggers Travis. I'm not sure if that's a bug or a feature but there you go. ;-)

@mdemoss
Copy link
Copy Markdown
Contributor Author

mdemoss commented Aug 2, 2019

I think PostgreSQL can probably do something different-but-equivalent to MySQL's FIND_IN_SET and GROUP_CONCAT with arrays. SQLite may not have equivalent functionality at all. It could require multiple queries, temporary tables, or work done outside of the DB to rank the unread entries in each category.

@marienfressinaud marienfressinaud changed the base branch from dev to master December 18, 2019 08:28
Base automatically changed from master to edge March 14, 2021 18:46
@math-GH
Copy link
Copy Markdown
Contributor

math-GH commented Feb 25, 2022

Nothing happened here in the last 1,5 year...
@mdemoss Do you have any plans to finish this PR?

@math-GH math-GH added the Vote to close Maybe it's time to close that issue! label Feb 25, 2022
@Alkarex
Copy link
Copy Markdown
Member

Alkarex commented Feb 27, 2022

@math-GH Sorting by another criteria than what we have at the moment requires quite some more implementation efforts to be robust with all the cases we need to support. I might work on that when I am done with some of my more important own wishes. This could be in relation with an an automatic weighting of articles based on guessed importance. See related issues / drafts about sorting order

@Alkarex Alkarex marked this pull request as draft February 27, 2022 22:15
@mdemoss
Copy link
Copy Markdown
Contributor Author

mdemoss commented Feb 28, 2022

I reimplemented this in a way where it ought to be able to work with the other databases. https://github.com/mdemoss/FreshRSS/tree/1.18.0-with-mdd-shuffle

Got stuck trying to figure out how to safely change the DB schema; I need to record when entries get marked as read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Vote to close Maybe it's time to close that issue!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants