Skip to content

Postgresql: commitNewEntries() fail due to conflict on duplicate keys while inserting #1610

@Trim

Description

@Trim

Hello,

I'm using FreshRSS 1.7.0 (zip release download) with PostgreSQL 9.6.4-0+deb9u1 on Debian Stretch.

It appears, that after a while, FreshRSS don't import any more articles from RSS feeds.

I've found the issue: there's an SQL Error inside the PostgreSQL logs (sorry, they are French « speaking »):

2017-08-15 06:36:30 CEST [23983-600] freshrss@freshrss ERREUR:  la valeur d'une clé dupliquée rompt la contrainte unique « adrien_entry_id_feed_guid_key »
2017-08-15 06:36:30 CEST [23983-601] freshrss@freshrss DÉTAIL:  La clé « (id_feed, guid)=(9, http://www.minimachines.net/?p=53460) » existe déjà.
2017-08-15 06:36:30 CEST [23983-602] freshrss@freshrss CONTEXTE :  instruction SQL « INSERT INTO "adrien_entry" (id, guid, title, author, content, link, date, "lastSeen", hash, is_read, is_favorite, id_feed, tag
s)
                        (SELECT rank + row_number() OVER(ORDER BY date) AS id, guid, title, author, content, link, date, "lastSeen", hash, is_read, is_favorite, id_feed, tags FROM "adrien_entrytmp" ORDER BY date
) »
        fonction PL/pgsql inline_code_block, ligne 6 à instruction SQL
2017-08-15 06:36:30 CEST [23983-603] freshrss@freshrss INSTRUCTION :  DO $$
        DECLARE
        maxrank bigint := (SELECT MAX(id) FROM "adrien_entrytmp");
        rank bigint := (SELECT maxrank - COUNT(*) FROM "adrien_entrytmp");
        BEGIN
                INSERT INTO "adrien_entry" (id, guid, title, author, content, link, date, "lastSeen", hash, is_read, is_favorite, id_feed, tags)
                        (SELECT rank + row_number() OVER(ORDER BY date) AS id, guid, title, author, content, link, date, "lastSeen", hash, is_read, is_favorite, id_feed, tags FROM "adrien_entrytmp" ORDER BY date
);
                DELETE FROM "adrien_entrytmp" WHERE id <= maxrank;
        END $$;
2017-08-15 06:36:30 CEST [23983-604] freshrss@freshrss ERREUR:  la transaction est annulée, les commandes sont ignorées jusqu'à la fin du bloc
        de la transaction

Little bit explanation, as text is in French: first line explains that an error occurred due to a conflict on key constraint uniqueness (id_feed, guid). Next line explains that there's already an entry with values (9, http://www.minimachines.net/?p=53460) for this key.

Two next lines give the exact transaction sql query which failed and advertise that all the transaction has been canceled.

So, I've checked files EntryDAO.php and EntryDAOPGSQL.php to see the difference between MySQL and PostgreSQL statements.

The first one is:

	public function commitNewEntries() {
		$sql = 'SET @rank=(SELECT MAX(id) - COUNT(*) FROM `' . $this->prefix . 'entrytmp`); ' .	//MySQL-specific
			'INSERT IGNORE INTO `' . $this->prefix . 'entry` (id, guid, title, author, content_bin, link, date, `lastSeen`, hash, is_read, is_favorite, id_feed, tags) ' .
				'SELECT @rank:=@rank+1 AS id, guid, title, author, content_bin, link, date, `lastSeen`, hash, is_read, is_favorite, id_feed, tags FROM `' . $this->prefix . 'entrytmp` ORDER BY date; ' .
			'DELETE FROM `' . $this->prefix . 'entrytmp` WHERE id <= @rank;';
		$hadTransaction = $this->bd->inTransaction();
		if (!$hadTransaction) {
			$this->bd->beginTransaction();
		}
		$result = $this->bd->exec($sql) !== false;
		if (!$hadTransaction) {
			$this->bd->commit();
		}
		return $result;
	}

The second one is:

	public function commitNewEntries() {
		$sql = 'DO $$
DECLARE
maxrank bigint := (SELECT MAX(id) FROM `' . $this->prefix . 'entrytmp`);
rank bigint := (SELECT maxrank - COUNT(*) FROM `' . $this->prefix . 'entrytmp`);
BEGIN
	INSERT INTO `' . $this->prefix . 'entry` (id, guid, title, author, content, link, date, `lastSeen`, hash, is_read, is_favorite, id_feed, tags)
		(SELECT rank + row_number() OVER(ORDER BY date) AS id, guid, title, author, content, link, date, `lastSeen`, hash, is_read, is_favorite, id_feed, tags FROM `' . $this->prefix . 'entrytmp` ORDER BY date);
	DELETE FROM `' . $this->prefix . 'entrytmp` WHERE id <= maxrank;
END $$;';
		$hadTransaction = $this->bd->inTransaction();
		if (!$hadTransaction) {
			$this->bd->beginTransaction();
		}
		$result = $this->bd->exec($sql) !== false;
		if (!$hadTransaction) {
			$this->bd->commit();
		}
		return $result;
	}

As you can see, MySQL original code use INSERT IGNORE instruction and PostgreSQL use simply INSERT and so fails on duplicates.

To fix that, we have three solutions:

  1. Add a sentence inside the transaction to first DELETE duplicates between tables _entry and _entrytmp tables
  2. Modify the SELECT statement to exclude duplicates to the result set (i.e. modify the WHERE part)
  3. Use the ON CONFLICT DO NOTHING new PostgreSQL feature (>=9.5 according to online doc)

I've tried the third solution on my server and it worked well on next sync: all unread articles appeared and adrien_entrytmp has been well cleared.

I'm making two Pull Requests for first and third solutions which are simplest. If I've time, I'll do one for the second solution.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions