Reduce memory when parsing large feeds by Alkarex · Pull Request #672 · simplepie/simplepie

Alkarex · 2021-02-17T22:30:06Z

Upstream PR for FreshRSS/FreshRSS#3416 (use case is 12MB+ feed)

Use the approach recommended by https://php.net/xml-parse#example-5983 for parsing documents that can potentially be large, because parsing a whole document in one go takes a lot of memory.

Feeds up to 1MB are parsed exactly as before (i.e. most feeds are unaffected: in my list of 173 test feeds, only one is larger than 1MB). Larger feeds are parsed in more than one iteration, with no functional difference.
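As a rough illustration of the chunked approach described above, here is a minimal sketch. The helper name `parse_xml_in_chunks` and its `$chunkSize` parameter are made up for this example and are not the code merged in this PR; only `xml_parser_create()`, `xml_parse()` and the stream functions are real PHP calls. Unlike the description above, this sketch always finishes with a separate empty final call, so even a small feed takes one extra `xml_parse()` call.

```php
<?php
// Hypothetical helper for illustration; not the code merged in this PR.
// Returns true when $data is well-formed XML, parsing it chunk by chunk.
// $chunkSize must be at least 1 (fread() rejects a non-positive length).
function parse_xml_in_chunks(string $data, int $chunkSize = 1048576): bool
{
    $parser = xml_parser_create();

    // Buffer the document in a stream so it can be read back piece by piece.
    $stream = fopen('php://memory', 'r+b');
    if ($stream === false) {
        return false;
    }
    fwrite($stream, $data);
    rewind($stream);

    $ok = true;
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        // Not the final chunk yet: the parser keeps its state between calls.
        if (xml_parse($parser, $chunk, false) !== 1) {
            $ok = false;
            break;
        }
    }
    fclose($stream);

    // An empty final call tells the parser that the document is complete.
    if ($ok && xml_parse($parser, '', true) !== 1) {
        $ok = false;
    }
    return $ok;
}
```

With the default 1MB chunk size, a feed smaller than 1MB is read in a single `fread()` call, so only larger feeds go through several iterations of the loop.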

The data is buffered through php://temp (see https://php.net/wrappers.php), which stays fully in memory for feeds up to 2MB (by default) and then switches to a file in the system's temp directory (https://php.net/sys-get-temp-dir).

There is a check for badly configured systems with an unwritable temp directory, for which we only use php://memory (in-memory only, even if the feed does not fit).
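The wrapper choice can be sketched as follows. The function `open_feed_buffer` and its `$maxMemoryBytes` parameter are hypothetical names used only for this example. The `php://temp/maxmemory:NN` form is the documented way to change the in-memory threshold, but the PR itself relies on the default.

```php
<?php
// Hypothetical helper for illustration; not part of SimplePie.
// Opens a read/write buffer that may spill to disk when the temp
// directory is writable, and stays purely in memory otherwise.
function open_feed_buffer(?int $maxMemoryBytes = null)
{
    if (@is_writable(sys_get_temp_dir())) {
        // php://temp keeps data in memory up to a threshold (2MB by default),
        // then transparently moves it to a temporary file.
        $uri = 'php://temp';
        if ($maxMemoryBytes !== null) {
            $uri .= '/maxmemory:' . $maxMemoryBytes;
        }
    } else {
        // No usable temp directory: keep everything in memory.
        $uri = 'php://memory';
    }
    return fopen($uri, 'r+b');
}

// Usage: write the feed body, rewind, then read it back in chunks.
$buffer = open_feed_buffer();
fwrite($buffer, '<rss></rss>');
rewind($buffer);
$body = stream_get_contents($buffer);
fclose($buffer);
```

Always using `php://memory` would remove the `is_writable()` check, but the whole feed would then have to fit in memory.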

Credits to @Kiblyn11 for the idea and the original PR.

Alkarex · 2021-02-17T22:32:04Z

library/SimplePie/Parser.php


 			// Parse!
-			if (!xml_parse($xml, $data, true))
+			$wrapper = @is_writable(sys_get_temp_dir()) ? 'php://temp' : 'php://memory';


This could be simplified by always using php://memory, at the cost of not being able to parse feeds that are quite as large.

mblaney · 2021-02-20T08:02:04Z

this looks great, thanks!

pull-request-size bot added the size/S label Feb 17, 2021

mblaney merged commit 155cfcf into simplepie:master Feb 20, 2021

Alkarex deleted the xml_parse-large-files branch March 20, 2021 21:53

mblaney merged 1 commit into simplepie:master from FreshRSS:xml_parse-large-files