Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion src/SimplePie.php
Original file line number Diff line number Diff line change
Expand Up @@ -1900,7 +1900,7 @@ protected function fetch_data(&$cache)
$this->data = [];
}
// Check if the cache has been updated
elseif (isset($this->data['cache_expiration_time']) && $this->data['cache_expiration_time'] > time()) {
elseif (!isset($this->data['cache_expiration_time']) || $this->data['cache_expiration_time'] < time()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the explanations from both of you I think this PR is reasonable.

The last remaining thing not clear to me is how it can this patch fix anything without https://github.com/FreshRSS/simplepie/blob/5e92ce767597aa9f5cbb00d88c7591b2e604dc5f/src/Cache/BaseDataCache.php#L58-L62

If we use the same $this->cache_duration as TTL passed to $cache->set_data() and to $this->data['cache_expiration_time'], I would expect that if the cache expires, the following would return the default empty array, and the above line would never be executed, as it is within if (!empty($this->data)):

$this->data = $cache->get_data($cacheKey, []);

Copy link
Contributor Author

@Alkarex Alkarex Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how it can this patch fix anything

This patch will fix some cases when the internal TTL has not expired yet. I think this is needed in any case.

But I completely agree that to have a working 304 cache, more fixes of the logic are needed. This PR focusses on the straightforward bugs in the current logic, which are needed no matter what, and should not be controversial (hopefully).

P.S.: In other words, to properly test the behaviour of the current logic, this PR is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But how could the internal TTL not have expired when $this->data['cache_expiration_time'] did – are not they set to the same cache duration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the default cache implementation ("legacy file cache") is still broken, I believe passing another implementation might work (but I have not investigated in details)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, alright. We can set the TTL to null in all set_data calls in a follow up PR to actually fix it then.

// Want to know if we tried to send last-modified and/or etag headers
// when requesting this file. (Note that it's up to the file to
// support this, but we don't always send the headers either.)
Expand All @@ -1924,6 +1924,7 @@ protected function fetch_data(&$cache)
$this->status_code = 0;

if ($this->force_cache_fallback) {
$this->data['cache_expiration_time'] = $this->cache_duration + time();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also probably want to do the same for 304 response.

Copy link
Contributor Author

@Alkarex Alkarex Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was wondering why does not the in-cache TTL remove the entry when it expires, which would break the 304 and it does, as you found out in https://github.com/FreshRSS/simplepie/blob/5e92ce767597aa9f5cbb00d88c7591b2e604dc5f/src/Cache/BaseDataCache.php#L58-L62

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can happily send upstream more of the fixes / improvements, but best after some of the existing pending PRs are processed

$cache->set_data($cacheKey, $this->data, $this->cache_duration);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, this just extends the cache lifetime ($this->data remains the same). Not sure why bb38d74 made SimplePie store cache_expiration_time in the cache when set_data already takes TTL but it makes sense to update cache_expiration_time here too.

Probably should be a separate commit from the test fix, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are still related to cache, and were preventing this PR from passing the tests, but can be put in another PR

Copy link
Contributor

@Art4 Art4 Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtojnar $this->data['cache_expiration_time'] was introduced to allow mimic the functionality of mtime() and touch(), because that allowed us to use the cached data even if it was expired.

Copy link
Member

@jtojnar jtojnar Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe we should just pass TTL as null (unlimited) to Psr16 caches, and have another layer that would emulate the old mtime()/touch() API, if we want to have access to expired cache entries. I do not see how can we fix the cache fallback/304 otherwise.

In the long term, we can probably come up with a nicer API though. For example, we could have a class CacheWrapper that would handle expiration and invalidation transparently – it could store the cache entry as follows

/** @type CacheEntry array{mtime: int, extraKeys: array<string, mixed>, value: mixed} */

and be used something like this:

$cacheWrapper = new CacheWrapper(new Psr16($this->cache), $this->cache_duration);

$this->data = $cacheWrapper->get(
    $key,
    // Extra validation keys, since the allowed size of `$key` is limited.
    // If value of any of the keys does not match the one in the `extraKeys` array,
    // the value will be considered stale and one of the following actions will be performed.
    [
        // Different version of SimplePie might have unrecognizable structure of the cache data
        // so pass `null` as `$staleData` to the callback.
        ['key' => 'build', 'expected' => Misc::get_build(), 'action' => CacheWrapper::INVALID_ACTION_DISCARD],
        // A collision. Cause the value returned by the callback to be returned by `CacheWrapper::get()` but the cache would not be updated.
        // I guess this could also be handled transparently by having the `CacheWrapper` store it in `CacheEntry`.
        ['key' => 'url', 'expected' => $this->feed_url, 'action' => CacheWrapper::INVALID_ACTION_COLLISION],
        // …
    ],
    // This callback will be executed when the cache lacks the key, or it is invalid.
    // `$staleData` will be null if the key did not exist.
    // The cache will be updated with the returned value and the mtime will be updated.
    // Except maybe when exception occurs, then we might want to unset the cache and propagate the exception.
    function(?array $staleData) use (&$obtainedFromCache): array {
        $headers = [];
        if ($staleData !== null && isset($staleData['headers']['etag'])) {
            $headers['etag'] = $staleData['headers']['etag'];
        }

        $file = fetch($url, $headers);

        if ($this->status_code === 304) {
            $obtainedFromCache = true;
            return $staleData;
        }

        // …

        return  [
            'url' => $this->feed_url,
            'feed_url' => $file->get_final_requested_uri(),
            'build' => Misc::get_build(),
            'cache_expiration_time' => $this->cache_duration + time(),
        ];
    }
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to address this potential refactoring in another PR though :-)
The current PR fixes some relatively obvious bugs, so merging it would already be beneficial

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. Regarding the tests, the change is due to the fix of the inequality higher up, so they are directly related


return true;
Expand All @@ -1936,6 +1937,7 @@ protected function fetch_data(&$cache)
// Set raw_data to false here too, to signify that the cache
// is still valid.
$this->raw_data = false;
$this->data['cache_expiration_time'] = $this->cache_duration + time();
$cache->set_data($cacheKey, $this->data, $this->cache_duration);

return true;
Expand Down
6 changes: 3 additions & 3 deletions tests/Integration/CachingTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,7 @@ public function testInitWithDifferentCacheStateCallsCacheCorrectly(
*/
public static function provideSavedCacheData(): array
{
$defaultMtime = time();
$defaultMtime = time() - 1; // -1 to account for tests running within the same second
$defaultExpirationTime = $defaultMtime + 3600;

$expectDefaultDataWritten = [
Expand Down Expand Up @@ -246,10 +246,10 @@ public static function provideSavedCacheData(): array
[CacheInterface::class, $currentlyCachedDataWithNonFeedUrl, $expectDataWithNewFeedUrl, $defaultMtime],
// Check if the cache has been updated
[Base::class, $currentlyCachedDataIsUpdated, $expectDefaultDataWritten, $defaultMtime],
[CacheInterface::class, $currentlyCachedDataIsUpdated, $expectDefaultDataWritten, $defaultMtime],
[CacheInterface::class, $currentlyCachedDataIsUpdated, $expectNoDataWritten, $defaultMtime],
// If the cache is still valid, just return true
[Base::class, $currentlyCachedDataIsValid, $expectDefaultDataWritten, $defaultMtime],
[CacheInterface::class, $currentlyCachedDataIsValid, $expectNoDataWritten, $defaultMtime],
[CacheInterface::class, $currentlyCachedDataIsValid, $expectDefaultDataWritten, $defaultMtime],
];
}
}