Skip to content

Reduce memory when parsing large feeds#672

Merged
mblaney merged 1 commit intosimplepie:masterfrom
FreshRSS:xml_parse-large-files
Feb 20, 2021
Merged

Reduce memory when parsing large feeds#672
mblaney merged 1 commit intosimplepie:masterfrom
FreshRSS:xml_parse-large-files

Conversation

@Alkarex
Copy link
Copy Markdown
Contributor

@Alkarex Alkarex commented Feb 17, 2021

Upstream PR for FreshRSS/FreshRSS#3416 (use case is 12MB+ feed)

Use the approach recommended by https://php.net/xml-parse#example-5983 for parsing documents that can potentially be large, because parsing a whole document in one go takes a lot of memory.

No change in parsing approach compared to now for feeds up to 1MB (i.e. most feeds are unchanged - in my list of 173 test feeds, only one is larger than 1MB). Larger feeds will be parsed in more than one iteration (no functional difference).

Using the php://temp as defined in https://php.net/wrappers.php fully in memory for feeds up to 2MB (by default) then using system's temp directory https://php.net/sys-get-temp-dir

There is a test for badly configured systems with an unwritable temp directory for which we only use php://memory (only in-memory even if it does not fit)

Credits to @Kiblyn11 for the idea and the original PR.

Upstream PR for FreshRSS/FreshRSS#3416 (use case
is 12MB+ feed)

Use the approach recommended by https://php.net/xml-parse#example-5983
for parsing documents that can potentially be large, because parsing a
whole document in one go takes a lot of memory.

No change in parsing approach compared to now for feeds up to 1MB (i.e.
most feeds are unchanged - in my list of 173 test feeds, only one is
larger than 1MB). Larger feeds will be parsed in more than one iteration
(no functional difference).

Using the php://temp as defined in https://php.net/wrappers.php fully in
memory for feeds up to 2MB (by default) then using system's temp
directory https://php.net/sys-get-temp-dir

There is a test for badly configured systems with an unwriteable temp
directory for which we only use php://memory (only in-memory even if it
does not fit)

Credits to @Kiblyn11 for the idea and the original PR.

// Parse!
if (!xml_parse($xml, $data, true))
$wrapper = @is_writable(sys_get_temp_dir()) ? 'php://temp' : 'php://memory';
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be simplified by always using php://memory at the cost of not being able to parse just as large feeds

@mblaney mblaney merged commit 155cfcf into simplepie:master Feb 20, 2021
@mblaney
Copy link
Copy Markdown
Member

mblaney commented Feb 20, 2021

this looks great, thanks!

@Alkarex Alkarex deleted the xml_parse-large-files branch March 20, 2021 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants