Skip to content

commitNewEntries should ignore conflicting keys on migration from ent…#1613

Merged
Alkarex merged 3 commits intoFreshRSS:devfrom
Trim:fix-postgresql-commitnewentry-by-ignore
Aug 16, 2017
Merged

commitNewEntries should ignore conflicting keys on migration from ent…#1613
Alkarex merged 3 commits intoFreshRSS:devfrom
Trim:fix-postgresql-commitnewentry-by-ignore

Conversation

@Trim
Copy link
Copy Markdown
Contributor

@Trim Trim commented Aug 15, 2017

…rytmp to entry

Fixes #1610 by first ignoring all entries from entrytmp which exists
already inside the entry table.

…rytmp to entry

Fixes FreshRSS#1610 by first ignoring all entries from entrytmp which exists
already inside the entry table.
@Alkarex
Copy link
Copy Markdown
Member

Alkarex commented Aug 15, 2017

@Alkarex Alkarex added this to the 1.8.0 milestone Aug 15, 2017
(SELECT rank + row_number() OVER(ORDER BY date) AS id, guid, title, author, content, link, date, `lastSeen`, hash, is_read, is_favorite, id_feed, tags FROM `' . $this->prefix . 'entrytmp` ORDER BY date);
(SELECT rank + row_number() OVER(ORDER BY date) AS id, guid, title, author, content, link, date, `lastSeen`, hash, is_read, is_favorite, id_feed, tags
FROM `' . $this->prefix . 'entrytmp` AS entrytmp
WHERE NOT EXISTS (SELECT 1 FROM `' . $this->prefix . 'entry` AS entry WHERE entrytmp.id_feed = entry.id_feed AND entrytmp.guid = entry.guid)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the name of the aliases just like you did in 0d51b68

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@Alkarex
Copy link
Copy Markdown
Member

Alkarex commented Aug 15, 2017

P.S.: @Trim, putting your name in the Credits file was just a suggestion; no problem if you do not want :-)

@Trim
Copy link
Copy Markdown
Contributor Author

Trim commented Aug 15, 2017

Oh, I didn't see your first comment, I'll add a commit to this PR.

Note, that I tried to run this modification on my server and I run into a related issue: I have an RSS feed containing two articles with same URI: http://www.geany.org/Main/HomePage?action=blogrss contains two articles with URI http://www.geany.org/Main/20170717 (the latest two).

This issue continue to rise a PostgreSQL error, but it didn't block all the transaction:

2017-08-15 14:15:22 CEST [4159-2] freshrss@freshrss DÉTAIL:  La clé « (id_feed, guid)=(67, http://www.geany.org/Main/20170717) » existe déjà.
2017-08-15 14:15:22 CEST [4159-3] freshrss@freshrss INSTRUCTION :  INSERT INTO "adrien_entrytmp" (id, guid, title, author, content, link, date, "lastSeen", hash, is_read, is_favorite, id_feed, tags) VALUES($1, $2, $3, $4, $5, $6, $7, $8, decode($9, 'hex'), $10, $11, $12, $13)
2017-08-15 14:15:22 CEST [4159-4] freshrss@freshrss ERREUR:  la transaction est annulée, les commandes sont ignorées jusqu'à la fin du bloc
	de la transaction

Now, I don't know two thing:

  • why not all the block contained inside the BEGIN END block has been reverted (other feeds have been updated)
  • how FreshRSS should handle articles with same URI

@Alkarex
Copy link
Copy Markdown
Member

Alkarex commented Aug 15, 2017

Thanks for the other related issue. For the time being, I would suggest to add the same WHERE clause than the one you have made, for the initial insert into the temporary table.
It is not a problem to have entries with identical URL, IF they provide a GUID or equivalent (e.g. rdf:about). When they do not provide a GUID, then it is not trivial to make a good unique identifier (hashing is possible, but then it fails when articles are updated). So the behaviour right now should be to keep the newest article of a given URL when there is no GUID.

@Trim
Copy link
Copy Markdown
Contributor Author

Trim commented Aug 15, 2017

I've just understood the log message with your comment:
the error I see now is not inside the commitNewEntries, but inside the addEntry function.

So this pull request still make sense and I still understand transactions :)

PS: I updated changelog and contributors files

@Alkarex
Copy link
Copy Markdown
Member

Alkarex commented Aug 15, 2017

Indeed, during the discovery of new articles (that is when articles are added to the temporary table), there is one transaction by feed. Then, when committing to the main table, there is one single transaction for all new articles.

@Alkarex
Copy link
Copy Markdown
Member

Alkarex commented Aug 16, 2017

@Trim Could you please open a new issue for the related problem you reported above? #1613 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants