Memory leaks in ujson #4
Closed
Labels
bug: Something isn't working
Description
We replaced json with srsly.ujson a while ago for the free performance boost, since we do a lot of JSON encoding/decoding and already have it installed as part of spaCy. We have now had to move back because of severe memory leaks:
import json
import random
import string
import psutil
# Comment out the next line to run the same loop with the stdlib json module.
from srsly import ujson as json

sample = lambda x: ''.join(
    random.choice(string.ascii_uppercase + string.digits) for _ in range(x))
process = psutil.Process()
for i in range(10):
    data = json.dumps({sample(99): sample(100000) for k in range(50)})
    json.loads(data)
    print(process.memory_info())

Output with ujson:
pmem(rss=24203264, vms=4400664576, pfaults=19049, pageins=0)
pmem(rss=29409280, vms=4414173184, pfaults=33898, pageins=0)
pmem(rss=34557952, vms=4419309568, pfaults=47960, pageins=0)
pmem(rss=39714816, vms=4424429568, pfaults=62855, pageins=0)
pmem(rss=44838912, vms=4429549568, pfaults=77571, pageins=0)
pmem(rss=50081792, vms=4434751488, pfaults=92307, pageins=0)
pmem(rss=55312384, vms=4439973888, pfaults=107288, pageins=0)
pmem(rss=60440576, vms=4445093888, pfaults=122711, pageins=0)
pmem(rss=65806336, vms=4451422208, pfaults=137578, pageins=0)
pmem(rss=70934528, vms=4456542208, pfaults=151382, pageins=0)
Output with stdlib json:
pmem(rss=17154048, vms=4385366016, pfaults=17692, pageins=0)
pmem(rss=17317888, vms=4403191808, pfaults=32047, pageins=0)
pmem(rss=17317888, vms=4403191808, pfaults=46541, pageins=0)
pmem(rss=17317888, vms=4403191808, pfaults=61035, pageins=0)
pmem(rss=17358848, vms=4403191808, pfaults=75539, pageins=0)
pmem(rss=17383424, vms=4403191808, pfaults=90039, pageins=0)
pmem(rss=17383424, vms=4403191808, pfaults=104533, pageins=0)
pmem(rss=17420288, vms=4403191808, pfaults=119036, pageins=0)
pmem(rss=17420288, vms=4403191808, pfaults=133530, pageins=0)
pmem(rss=17420288, vms=4403191808, pfaults=148024, pageins=0)
The benchmark was run on Python 3.7 on macOS, but the same leak exists on Debian with Python 2.7.
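For anyone who wants to check this on another platform without installing psutil, here is a rough stdlib-only sketch of the same loop. The names `run` and `peak_rss` are made up for illustration, and it reads peak RSS through `resource.getrusage` instead of `psutil`, so the numbers are not directly comparable with the output above. The `resource` module is Unix-only, and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import random
import resource
import string

ALPHABET = string.ascii_uppercase + string.digits


def sample(n):
    """Random string of length n (same alphabet as the script in this report)."""
    return ''.join(random.choices(ALPHABET, k=n))


def peak_rss():
    """Peak resident set size of this process (KB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run(dumps, loads, iterations=10, entries=50, value_len=100000):
    """Encode and decode a fresh random payload per iteration; return peak RSS readings."""
    readings = []
    for _ in range(iterations):
        payload = {sample(99): sample(value_len) for _k in range(entries)}
        loads(dumps(payload))
        readings.append(peak_rss())
    return readings
```

To compare, call `run(json.dumps, json.loads)` after `import json`, and `run(ujson.dumps, ujson.loads)` after `from srsly import ujson`, ideally in separate processes so one run does not affect the other's readings. Because this tracks peak RSS, a steady climb across iterations is the thing to look for.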
If you increase the range from 10, you will eventually run out of memory.
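To put a number on the growth, here is a small sketch that works out the average RSS increase per iteration from the two outputs above. The RSS values are copied from this report; the helper name `mean_growth` and the payload estimate (which ignores JSON punctuation) are my own additions for illustration.

```python
# RSS values in bytes, copied from the two outputs in this report.
ujson_rss = [24203264, 29409280, 34557952, 39714816, 44838912,
             50081792, 55312384, 60440576, 65806336, 70934528]
stdlib_rss = [17154048, 17317888, 17317888, 17317888, 17358848,
              17383424, 17383424, 17420288, 17420288, 17420288]


def mean_growth(rss):
    """Average RSS increase per loop iteration, in bytes."""
    return (rss[-1] - rss[0]) / (len(rss) - 1)


# Rough size of one encoded payload: 50 entries, each a 99-character key
# and a 100000-character value (quotes and separators ignored).
payload_bytes = 50 * (99 + 100000)

print("ujson growth per iteration: ", round(mean_growth(ujson_rss)))
print("stdlib growth per iteration:", round(mean_growth(stdlib_rss)))
print("approx. payload size:       ", payload_bytes)
```

With ujson the process grows by about 5.19 MB per iteration, which is close to the size of one encoded payload (about 5.00 MB), while stdlib json grows by about 0.03 MB per iteration. That is only an observation from these numbers, not a diagnosis of where the leak is.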
We spent days on this issue, because I would never have suspected the JSON library to be at fault, but it is. I don't know if it affects spaCy in any way.
Could be related: ultrajson/ultrajson#270