Memory leak when using pyexcel.iget_records #149

@PLPeeters

When using pyexcel to stream CSV files with Pyston, I get a memory leak that does not occur when running the same code with Python 3.8.

I created a minimal example that creates a random CSV with ~3M lines and reads it using pyexcel.iget_records. Repo with steps to reproduce: https://github.com/PLPeeters/pyston-pyexcel-leak.
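For readers who don't want to open the repo, the shape of the reproduction can be sketched roughly as follows. This is not the exact script from the linked repository: the helper names `write_random_csv` and `stream_with_pyexcel`, the header row, and the 8-letter random values are assumptions made for illustration. Only `pyexcel.iget_records` and `pyexcel.free_resources` are real pyexcel calls.

```python
import csv
import random
import string


def write_random_csv(path, rows, fields=3, seed=0):
    """Write a header plus `rows` lines of `fields` random 8-letter strings."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Header row: iget_records uses the first row as the record keys.
        writer.writerow([f"col{i}" for i in range(fields)])
        for _ in range(rows):
            writer.writerow(
                ["".join(rng.choices(string.ascii_lowercase, k=8)) for _ in range(fields)]
            )


def stream_with_pyexcel(path):
    """Iterate lazily over every record and return how many were read."""
    import pyexcel  # third-party; imported here so the generator above works without it

    count = 0
    for _record in pyexcel.iget_records(file_name=path):
        count += 1
    pyexcel.free_resources()  # release file handles held by the i* functions
    return count
```

Calling `write_random_csv("random.csv", rows=3_000_000)` and then `stream_with_pyexcel("random.csv")` under each interpreter approximates the scenario described here.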

Monitoring the container with docker stats, memory usage changes as follows from start to finish:

  • Python 3.8: 32 MB -> 33 MB
  • Pyston 2.3.1: 39 MB -> 723 MB
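The figures above come from docker stats, which reports container-level usage. As a complementary check from inside the process, something like the sketch below could log peak resident memory while an iterator is drained. It uses the standard `resource` module (Unix only) rather than docker stats, so the numbers will not match exactly. The helper names `peak_rss_mb` and `consume_with_memory_log` are hypothetical.

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def consume_with_memory_log(iterable, every=500_000):
    """Drain `iterable`, printing peak RSS every `every` items; return the item count."""
    count = 0
    for count, _item in enumerate(iterable, start=1):
        if count % every == 0:
            print(f"{count} items: peak RSS {peak_rss_mb():.1f} MB")
    return count
```

Passing the iterator returned by `pyexcel.iget_records(file_name=...)` to `consume_with_memory_log` should show whether the peak keeps climbing as records are consumed.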

Changing the CSV delimiter to ;, which reduces the number of fields per line from 3 to 1, lowers memory usage just before exit to 300 MB, so the number of fields pyexcel has to process seems to play a role. My guess is that pyexcel relies, directly or indirectly, on some mechanism that is broken in Pyston and leaks memory (@chfw, do you have any insights?).

As a side note, while reducing my example to the bare minimum, I checked whether using csv.reader on its own reproduces the bug; it doesn't.
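A control along these lines, using only the standard library, is one way to run that comparison. It is a sketch, not the code from the repo, and `stream_with_csv_reader` is a hypothetical name. The `delimiter` parameter also makes the field-count effect visible: a comma-separated file read with `;` yields one field per row.

```python
import csv


def stream_with_csv_reader(path, delimiter=","):
    """Read `path` row by row; return (row_count, fields_in_last_row)."""
    rows = 0
    width = 0
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            rows += 1
            width = len(row)
    return rows, width
```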

Labels: bug (Something isn't working)
