Memory leaks in ujson #4

@sadovnychyi

We replaced json with srsly.ujson a while ago for the free performance boost, since we do a lot of JSON encoding/decoding and already have it installed as part of spaCy. We have now had to move back because of some terrible memory leaks:

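import json
import random
import string
import psutil
# Comment out the next line to run the same test against stdlib json.
from srsly import ujson as json

# Random string of x uppercase letters and digits.
sample = lambda x: ''.join(
  random.choice(string.ascii_uppercase + string.digits) for _ in range(x))

process = psutil.Process()

for _ in range(10):
  # Encode and decode a dict of about 5 MB, then print process memory usage.
  data = json.dumps({sample(99): sample(100000) for _ in range(50)})
  json.loads(data)
  print(process.memory_info())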

Output with ujson:

pmem(rss=24203264, vms=4400664576, pfaults=19049, pageins=0)
pmem(rss=29409280, vms=4414173184, pfaults=33898, pageins=0)
pmem(rss=34557952, vms=4419309568, pfaults=47960, pageins=0)
pmem(rss=39714816, vms=4424429568, pfaults=62855, pageins=0)
pmem(rss=44838912, vms=4429549568, pfaults=77571, pageins=0)
pmem(rss=50081792, vms=4434751488, pfaults=92307, pageins=0)
pmem(rss=55312384, vms=4439973888, pfaults=107288, pageins=0)
pmem(rss=60440576, vms=4445093888, pfaults=122711, pageins=0)
pmem(rss=65806336, vms=4451422208, pfaults=137578, pageins=0)
pmem(rss=70934528, vms=4456542208, pfaults=151382, pageins=0)

Output with stdlib json:

pmem(rss=17154048, vms=4385366016, pfaults=17692, pageins=0)
pmem(rss=17317888, vms=4403191808, pfaults=32047, pageins=0)
pmem(rss=17317888, vms=4403191808, pfaults=46541, pageins=0)
pmem(rss=17317888, vms=4403191808, pfaults=61035, pageins=0)
pmem(rss=17358848, vms=4403191808, pfaults=75539, pageins=0)
pmem(rss=17383424, vms=4403191808, pfaults=90039, pageins=0)
pmem(rss=17383424, vms=4403191808, pfaults=104533, pageins=0)
pmem(rss=17420288, vms=4403191808, pfaults=119036, pageins=0)
pmem(rss=17420288, vms=4403191808, pfaults=133530, pageins=0)
pmem(rss=17420288, vms=4403191808, pfaults=148024, pageins=0)
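
To put a number on the difference, here is a small sketch that computes the average RSS growth per iteration from the figures above. The helper name `mean_growth` and the payload-size estimate are mine, added for illustration; the RSS values are copied from the two outputs.

```python
# RSS values in bytes, copied from the two outputs above.
ujson_rss = [24203264, 29409280, 34557952, 39714816, 44838912,
             50081792, 55312384, 60440576, 65806336, 70934528]
stdlib_rss = [17154048, 17317888, 17317888, 17317888, 17358848,
              17383424, 17383424, 17420288, 17420288, 17420288]


def mean_growth(rss):
    """Average RSS increase per iteration, in bytes (hypothetical helper)."""
    return (rss[-1] - rss[0]) / (len(rss) - 1)


# Rough character count of one payload: 50 keys of 99 chars plus
# 50 values of 100000 chars (ignores quotes and separators).
payload_bytes = 50 * (99 + 100000)

print("ujson growth per iteration: ", mean_growth(ujson_rss))
print("stdlib growth per iteration:", mean_growth(stdlib_rss))
print("approximate payload size:   ", payload_bytes)
```

With ujson the RSS grows by about 5.2 MB per iteration, which is close to the size of one payload (about 5.0 MB), while stdlib json grows by under 30 KB per iteration. That is at least consistent with roughly one payload's worth of memory not being released each round, but it is an inference from the numbers, not a diagnosis.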

The benchmark was run on Python 3.7 on macOS, but the same leak exists on Debian with Python 2.7.
If you increase the range from 10, you will eventually run out of memory.
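
To narrow down whether encoding or decoding is responsible, something like the following could be used. It is only a sketch: `rss_growth`, `compare`, and the `read_rss` parameter are hypothetical names introduced here, and it assumes psutil is available, as in the script above.

```python
import json


def rss_growth(step, rounds=10, read_rss=None):
    """Run step() `rounds` times and return the change in RSS, in bytes."""
    if read_rss is None:
        # Default to psutil, the same dependency the script above uses.
        import psutil
        process = psutil.Process()
        read_rss = lambda: process.memory_info().rss
    step()  # warm-up call, so one-time allocations are not counted
    before = read_rss()
    for _ in range(rounds):
        step()
    return read_rss() - before


def compare(codec, payload, rounds=10, read_rss=None):
    """Measure RSS growth separately for codec.dumps and codec.loads."""
    encoded = codec.dumps(payload)
    return {
        'dumps': rss_growth(lambda: codec.dumps(payload), rounds, read_rss),
        'loads': rss_growth(lambda: codec.loads(encoded), rounds, read_rss),
    }
```

Calling `compare(json, payload)` with stdlib json and then again with `from srsly import ujson` passed as `codec` should show which of the two calls accounts for the growth. RSS is a coarse measure, so small differences between runs are expected and only a steady increase across rounds is meaningful.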

Days were spent on this issue because I would never have suspected the JSON library to be at fault, but it is. I don't know whether it affects spaCy in any way.

Could be related: ultrajson/ultrajson#270
