
Conversation

@Fidget-Spinner (Member) commented Jan 5, 2021

This patch isolates the static caches in _json.c, and may also help decrease refleaks at finalization.

There are no measured performance slowdowns in json_loads and json_dumps in pyperformance with this patch. There is a measurable slowdown of around 8% in error raising, but that isn't usually very hot code.

https://bugs.python.org/issue42834

@vstinner (Member) left a comment


Did you consider using the _Py_IDENTIFIER() API? It is compatible with subinterpreters and it provides interned strings. _PyUnicode_FromId() can fail with a memory allocation error (at the first call, when the string is created). There are "Id" variants of many functions.
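
For illustration, a minimal sketch of how that API is typically used (not the exact code for this PR; decoder_module is a placeholder, and the error handling assumes a function returning PyObject *):

/* Declares a static identifier; the string is interned once, lazily. */
_Py_IDENTIFIER(JSONDecodeError);

/* _PyUnicode_FromId() returns a borrowed reference to the interned
   string; it can fail (returning NULL) only on the first call, when
   the string object is created. */
PyObject *name = _PyUnicode_FromId(&PyId_JSONDecodeError);
if (name == NULL) {
    return NULL;
}

/* Many functions have "Id" variants, e.g. _PyObject_GetAttrId()
   instead of PyObject_GetAttrString(). */
PyObject *exc = _PyObject_GetAttrId(decoder_module, &PyId_JSONDecodeError);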

@Fidget-Spinner (Member, Author) commented Jan 6, 2021

> Did you consider using the _Py_IDENTIFIER() API? It is compatible with subinterpreters and it provides interned strings. _PyUnicode_FromId() can fail with a memory allocation error (at the first call, when the string is created). There are "Id" variants of many functions.

No, that didn't cross my mind, due to my inexperience with the internal C API. Thank you for taking the time to review this and for suggesting it! Please give me a week to address your suggestions, as I'm currently quite busy. Thanks!

@Fidget-Spinner (Member, Author) commented:

Sorry, it seems I was wrong: my personal computer isn't producing consistent pyperformance results, so I can't see any real slowdown/speedup.

@Fidget-Spinner changed the title from "bpo-42834: Convert static cache variables in _json.c to heap variables" to "bpo-42834: Make static cache variables in _json.c compatible with subinterpreters" on Jan 10, 2021
@Fidget-Spinner (Member, Author) commented:

@vstinner I think this should be ready now. Pyperformance on Windows (this isn't really accurate, but it's something at least) says:

json_dumps: Mean +- std dev: [json-main2] 17.5 ms +- 0.2 ms -> [json-heap] 17.8 ms +- 0.1 ms: 1.01x slower (+1%)
Benchmark hidden because not significant (1): json_loads

If it gets merged, I'll monitor speed.python.org for a while and see if anything crops up. Thanks for your patience and help in all of this!

@Fidget-Spinner (Member, Author) commented Jan 29, 2021

@vstinner I was finally able to test it on a Linux machine (btw, json_dumps is slightly unstable on my machine and pyperformance sometimes complains; only json_loads is stable):
on master branch:

### json_dumps ###
Mean +- std dev: 47.5 ms +- 5.6 ms

### json_loads ###
Mean +- std dev: 81.1 us +- 10.5 us

on this branch:

### json_dumps ###
Mean +- std dev: 48.2 ms +- 4.4 ms

### json_loads ###
Mean +- std dev: 80.0 us +- 8.7 us
> pyperf compare_to master2.json json-heap2.json 
Benchmark hidden because not significant (2): json_dumps, json_loads

Geometric mean: 1.00x slower

Seems like performance isn't affected :). Please take a look when you have the time for it. Thank you!

Review thread on this hunk in Modules/_json.c:

/* Use JSONDecodeError exception to raise a nice looking ValueError subclass */
static PyObject *JSONDecodeError = NULL;
PyObject *exc;
_Py_static_string(PyId_decoder, "json.decoder");
@vstinner (Member) commented:

Do we really need the PyId_decoder and PyId_JSONDecodeError micro-optimization? Decoding an ASCII string is fast in Python. Why not keep PyImport_ImportModule & PyObject_GetAttrString?

I don't care much about the performance of the error path.
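
For comparison, that plain (non-Id) error path would look roughly like this (a sketch only, assuming a void-returning error helper like _json.c's raise_errmsg; a fresh import and attribute lookup happen on every raised error):

/* Plain path: import the module and fetch the attribute each time. */
PyObject *decoder = PyImport_ImportModule("json.decoder");
if (decoder == NULL) {
    return;
}
PyObject *exc = PyObject_GetAttrString(decoder, "JSONDecodeError");
Py_DECREF(decoder);
if (exc == NULL) {
    return;
}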

@Fidget-Spinner (Member, Author) commented Jan 29, 2021

I agree that performance on the error path is usually not important. However, I think json may be a little special: many people use it to load data from sources that may provide invalid JSON, so this pattern is pretty common:

try:
    val = json.loads(...)
except json.decoder.JSONDecodeError:
    pass
else:
    ...

So raising the error is quite a common operation. Using PyId here surprisingly seems to speed up error raising by around 20%.

@vstinner (Member) commented:

> So raising the error is quite a common operation. Using PyId here surprisingly seems to speed up error raising by around 20%.

What is your benchmark showing a 20% difference?

@Fidget-Spinner (Member, Author) commented Jan 30, 2021

> So raising the error is quite a common operation. Using PyId here surprisingly seems to speed up error raising by around 20%.
>
> What is your benchmark showing a 20% difference?

It's a microbenchmark, so I don't see it as 100% representative of a real-world scenario:

pyperf timeit "import json" "try: json.loads('{dkfjdkjf')
except: pass" 
# without pyid

Mean +- std dev: 9.89 us +- 0.45 us

# with pyid

Mean +- std dev: 8.36 us +- 0.54 us

# compare

Mean +- std dev: [json-heap-no-static] 9.89 us +- 0.45 us -> [json-heap-static-string] 8.36 us +- 0.54 us: 1.18x faster

Edit: Honestly, I'm quite surprised. I did not expect the static string interning to save that much time. I expected more of the call overhead to come from the internal non-string ops of PyImport_ImportModule and PyObject_GetAttrString, rather than almost 20% of the time being spent on strings.

Edit 2: Wow, master (where the exception is cached) is only slightly faster than the PyId version with no cached exception:

# no cached exception, no PyId
Mean +- std dev: [master-clean] 7.81 us +- 0.50 us -> [json-heap-no-static] 9.89 us +- 0.45 us: 1.27x slower

# no cached exception, PyId
Mean +- std dev: [master-clean] 7.81 us +- 0.50 us -> [json-heap-static-string] 8.36 us +- 0.54 us: 1.07x slower
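
For reference, master's speed makes sense: the exception is looked up once and cached in a process-wide static, so after the first raise there is no import or attribute lookup at all. Roughly (a sketch of the old static-cache pattern this PR removes, not the verbatim code):

static PyObject *JSONDecodeError = NULL;  /* process-wide cache */

if (JSONDecodeError == NULL) {
    PyObject *decoder = PyImport_ImportModule("json.decoder");
    if (decoder == NULL) {
        return;
    }
    JSONDecodeError = PyObject_GetAttrString(decoder, "JSONDecodeError");
    Py_DECREF(decoder);
    if (JSONDecodeError == NULL) {
        return;
    }
}
/* Cached across all (sub)interpreters - exactly what makes it unsafe
   for subinterpreters and leak-prone at finalization. */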

@vstinner (Member) commented Feb 1, 2021

Can you please compare using the current master branch as a reference?

  • bench 1: reference, master
  • bench 2: your PR without static strings (expected to be slower)
  • bench 3: your PR using static strings

> pyperf timeit "import json" "try: json.loads('{dkfjdkjf')

It seems like you import json at each iteration. Use -s "import json".
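
For clarity, the corrected invocation would look like this (a sketch mirroring the command above):

python3 -m pyperf timeit -s "import json" "try: json.loads('{dkfjdkjf')
except: pass"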

@Fidget-Spinner (Member, Author) commented Feb 1, 2021

> It seems like you import json at each iteration. Use -s "import json".

Wow! I didn't know that. I assumed that the first argument was automatically treated as setup code without having to specify -s. Thanks for correcting me.

> Can you please compare using the current master branch as a reference?
>
> • bench 1: reference, master
> • bench 2: your PR without static strings (expected to be slower)
> • bench 3: your PR using static strings

I did:

  • sudo pyperf system tune
  • Ensured that I only saw a 0.1 us difference between runs before starting the benchmark.

python3 -m pyperf timeit -s "import json" "try: json.loads('{')
except: pass"

bench 1: reference, master
Mean +- std dev: 8.00 us +- 0.19 us

bench 2: PR without static strings (expected to be slower)
Mean +- std dev: 10.3 us +- 0.2 us

bench 3: PR using static strings
Mean +- std dev: 8.52 us +- 0.26 us

# compare_to bench1, bench2, bench3
python3 -m pyperf compare_to master.json  json-heap-no-pyid.json json-heap-pyid.json 
Mean +- std dev: [master] 8.00 us +- 0.19 us -> [json-heap-no-pyid] 10.3 us +- 0.2 us: 1.28x slower
Mean +- std dev: [master] 8.00 us +- 0.19 us -> [json-heap-pyid] 8.52 us +- 0.26 us: 1.07x slower

Edit: 1us -> 0.1us

@vstinner merged commit b5931f1 into python:master on Feb 1, 2021
@vstinner (Member) commented Feb 1, 2021

Merged, thanks. Well, since the code already uses static strings for the "hot code" (true/false), it's OK to "optimize" the error path (raising an exception).

@Fidget-Spinner (Member, Author) commented:

Thanks for your patience and guidance on this PR, Victor! I have waaaay more respect for all the pyperf/pyperformance people now - benchmarks are pretty hard 😆.

@Fidget-Spinner deleted the json-heap branch February 1, 2021 16:33
@vstinner (Member) commented Feb 1, 2021

> Ensured that I only saw a 0.1 us difference between runs before starting the benchmark.

I don't understand what you mean. If you don't use CPU isolation, you should run benchmarks on an idle machine (don't run other programs in parallel).

You should not pick which numbers look better to you, but accumulate more data. Use the --append option rather than -o/--output when you create JSON files. Using pyperf, you can run a benchmark many times and accumulate more and more runs; pyperf computes the mean and std dev. Then you can have fun with the stats and hist commands ;-)

https://pyperf.readthedocs.io/en/latest/analyze.html
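
For example (a sketch; bench.json is just a placeholder filename):

# Each invocation appends its runs to the same file instead of overwriting it
python3 -m pyperf timeit -s "import json" "json.loads('{}')" --append bench.json
python3 -m pyperf timeit -s "import json" "json.loads('{}')" --append bench.json

# Inspect the accumulated runs
python3 -m pyperf stats bench.json
python3 -m pyperf hist bench.json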

@Fidget-Spinner (Member, Author) commented Feb 1, 2021

> You should not pick which numbers look better to you, but accumulate more data. Use the --append option rather than -o/--output when you create JSON files. Using pyperf, you can run a benchmark many times and accumulate more and more runs; pyperf computes the mean and std dev. Then you can have fun with the stats and hist commands ;-)
>
> https://pyperf.readthedocs.io/en/latest/analyze.html

Thanks, TIL something new again, about --append.

> I don't understand what you mean. If you don't use CPU isolation, you should run benchmarks on an idle machine (don't run other programs in parallel).

Sorry, I wasn't clear here. What I meant was: I did that to make sure all the tuning was working, not to cherry-pick data. On Windows, when I ran pyperf, there was actually still quite a lot of noise between runs (>1 us, because the tuning didn't really work!). So this just meant my machine was quite stable.

On the Linux benchmarking machine I didn't have any other user programs running (well, other than the desktop environment and terminal :p).

@Fidget-Spinner (Member, Author) commented:

Point taken about accumulating data, though - I never knew such a useful feature existed. If I had known, I'd definitely have used it way more. Thanks!

@vstinner (Member) commented Feb 1, 2021

Oh, I only run benchmarks on Linux. Someone should write documentation explaining how to run reproducible benchmarks on Windows. pyperf changes the process priority if it has the required dependency and the permissions, but I don't recall the details.

@Fidget-Spinner (Member, Author) commented Feb 1, 2021

> Oh, I only run benchmarks on Linux. Someone should write documentation explaining how to run reproducible benchmarks on Windows. pyperf changes the process priority if it has the required dependency and the permissions, but I don't recall the details.

Yep. After a while I gave up and just installed Linux 😅. I don't know if there's an easy way for pyperf to disable turbo boost properly on Windows, and some of the other tuning options that affect accuracy can't easily be changed either. So I just installed Linux on a real computer and ran all the benchmarks you see there.

Edit: Just to clarify: the 1st benchmark (1 month ago) was on Windows; everything else in the past week was on Linux.

adorilson pushed a commit to adorilson/cpython that referenced this pull request on Mar 13, 2021:

Make internal caches of the _json extension module compatible with subinterpreters.
