Description
When using pyexcel to stream CSV files with Pyston, I get a memory leak that does not occur when running the same code with Python 3.8.
I created a minimal example that creates a random CSV with ~3M lines and reads it using pyexcel.iget_records. Repo with steps to reproduce: https://github.com/PLPeeters/pyston-pyexcel-leak.
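For readers who do not want to open the repo, the reproduction can be sketched roughly as below. This is an approximation, not the exact code from the linked repository: the helper names `write_random_csv` and `stream_with_pyexcel`, the file layout, and the 8-character random fields are assumptions made for illustration. Only `pyexcel.iget_records` and `pyexcel.free_resources` are real pyexcel calls.

```python
import csv
import random
import string


def write_random_csv(path, rows=3_000_000, fields=3, seed=0):
    """Write `rows` lines of `fields` random lowercase strings to `path`.

    Hypothetical helper: the field contents and sizes are illustrative only.
    """
    rng = random.Random(seed)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for _ in range(rows):
            writer.writerow(
                ["".join(rng.choices(string.ascii_lowercase, k=8)) for _ in range(fields)]
            )


def stream_with_pyexcel(path):
    """Lazily iterate over every record and return how many were seen."""
    # Third-party dependency, imported here so the generator above runs without it.
    import pyexcel

    count = 0
    # iget_records yields one dict per row; by default the first row is
    # used as the column names, so it is not counted as a record.
    for _record in pyexcel.iget_records(file_name=path):
        count += 1
    pyexcel.free_resources()  # close the file handle kept open by the lazy reader
    return count
```

Calling `write_random_csv("random.csv")` followed by `stream_with_pyexcel("random.csv")` under each interpreter, while watching the container's memory, should be enough to compare the two.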
Monitoring the container with docker stats, memory usage changes as follows from start to finish:
- Python 3.8: 32 MB -> 33 MB
- Pyston 2.3.1: 39 MB -> 723 MB
When changing the CSV delimiter to ;, which reduces the number of fields per line from 3 to 1, memory before exiting drops to 300 MB, so the number of fields pyexcel has to process seems to play a role. My guess is that pyexcel relies, directly or indirectly, on some mechanism that is broken in Pyston and causes it to leak memory (@chfw, do you have any insights?).
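To illustrate why the delimiter changes the field count, here is a small sketch using the standard library's `csv` module rather than pyexcel. The sample line and the `field_count` helper are made up for this example. A comma-separated line parsed with `;` as the delimiter contains no separators, so it comes back as a single field.

```python
import csv
import io

SAMPLE_LINE = "a,b,c\n"  # illustrative line with three comma-separated fields


def field_count(text, delimiter):
    """Return how many fields the first line of `text` has for `delimiter`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return len(next(reader))
```

Here `field_count(SAMPLE_LINE, ",")` gives 3 and `field_count(SAMPLE_LINE, ";")` gives 1, which matches the 3-to-1 reduction described above.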
As a side note, in my quest to get my example down to the bare minimum, I checked if using csv.reader would reproduce this bug as well; it doesn't.
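The csv.reader control can be sketched along these lines. Again, this is an assumed shape rather than the code I actually ran: `stream_with_csv_reader` and `peak_rss` are hypothetical helper names, and `resource` is Unix-only (on Linux `ru_maxrss` is reported in KiB, on macOS in bytes).

```python
import csv
import resource  # Unix-only standard library module


def peak_rss():
    """Return the peak resident set size of this process (KiB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def stream_with_csv_reader(path, delimiter=","):
    """Iterate over every row with the stdlib reader and return the row count."""
    count = 0
    with open(path, newline="") as handle:
        # Rows are consumed one at a time and dropped immediately,
        # so nothing should accumulate between iterations.
        for _row in csv.reader(handle, delimiter=delimiter):
            count += 1
    return count
```

Printing `peak_rss()` before and after `stream_with_csv_reader(...)` gives an in-process figure to set next to the docker stats numbers.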