I'm noticing a roughly 3x performance penalty going from v3.30.3 to v3.30.4 (despite the number of examples being halved). Since I'm also taking a 4x hit from #879, this is keeping me on v3.28.3 for now.
An example is a test that uses this Hypothesis @given statement:
@given(ids=st.lists(st.text(alphabet=string.ascii_lowercase)))
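For completeness, a minimal self-contained version of such a test would look roughly like the sketch below; the body is only a placeholder, since the real assertions in test_get_node_devices_calls_device aren't shown here:

import string

from hypothesis import given, strategies as st

@given(ids=st.lists(st.text(alphabet=string.ascii_lowercase)))
def test_get_node_devices_calls_device(ids):
    # Placeholder body: any cheap assertion is enough to exercise the
    # example-generation overhead being measured here.
    assert all(set(i) <= set(string.ascii_lowercase) for i in ids)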
Running under v3.30.3 these are the run statistics:
=================================================================================== test session starts ===================================================================================
platform darwin -- Python 3.6.3, pytest-3.2.3, py-1.4.34, pluggy-0.4.0 -- /Users/dan/.pyenv/versions/3.6.3/envs/pstat/bin/python
cachedir: .cache
rootdir: /Users/dan/w/pstat, inifile:
plugins: xdist-1.20.1, mock-1.6.3, forked-0.2, cov-2.5.1, hypothesis-3.30.3
collected 28 items
test/test_board.py::test_get_node_devices_calls_device PASSED
================================================================================== Hypothesis Statistics ==================================================================================
test/test_board.py::test_get_node_devices_calls_device:
- 200 passing examples, 0 failing examples, 0 invalid examples
- Typical runtimes: 4-17 ms
- Stopped because settings.max_examples=200
=================================================================================== 27 tests deselected ===================================================================================
========================================================================= 1 passed, 27 deselected in 3.33 seconds =========================================================================
And when changing nothing but the Hypothesis version to v3.30.4:
platform darwin -- Python 3.6.3, pytest-3.2.3, py-1.4.34, pluggy-0.4.0 -- /Users/dan/.pyenv/versions/3.6.3/envs/pstat/bin/python
cachedir: .cache
rootdir: /Users/dan/w/pstat, inifile:
plugins: xdist-1.20.1, mock-1.6.3, forked-0.2, cov-2.5.1, hypothesis-3.30.4
collected 28 items
test/test_board.py::test_get_node_devices_calls_device PASSED
================================================================================== Hypothesis Statistics ==================================================================================
test/test_board.py::test_get_node_devices_calls_device:
- 100 passing examples, 0 failing examples, 0 invalid examples
- Typical runtimes: 4-14 ms
- Stopped because settings.max_examples=100
=================================================================================== 27 tests deselected ===================================================================================
======================================================================== 1 passed, 27 deselected in 10.20 seconds =========================================================================
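(As an aside, the two runs stop at different max_examples values, 200 vs. 100. If it helps to make the timings like-for-like, the count can be pinned explicitly with Hypothesis's settings decorator; this is standard API, sketched here against the same placeholder test.)

import string

from hypothesis import given, settings, strategies as st

# Pin the example count so both versions generate the same number of examples.
@settings(max_examples=200)
@given(ids=st.lists(st.text(alphabet=string.ascii_lowercase)))
def test_get_node_devices_calls_device(ids):
    assert isinstance(ids, list)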
Notice that even though the typical per-example runtime is roughly the same, the total execution time triples. Here is the profiling output for v3.30.3:
Profiling (from /Users/dan/w/pstat/prof/combined.prof):
Fri Oct 6 18:03:32 2017 /Users/dan/w/pstat/prof/combined.prof
10990129 function calls (10962229 primitive calls) in 5.074 seconds
Ordered by: cumulative time
List reduced from 741 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
4/1 0.000 0.000 5.077 5.077 pluggy.py:263(__init__)
5/1 0.000 0.000 5.077 5.077 pluggy.py:598(execute)
3/1 0.000 0.000 5.077 5.077 pluggy.py:238(_wrapped_call)
1 0.000 0.000 5.077 5.077 runner.py:109(pytest_runtest_call)
1 0.000 0.000 5.077 5.077 python.py:1167(runtest)
1 0.000 0.000 5.077 5.077 pluggy.py:743(__call__)
1 0.000 0.000 5.077 5.077 pluggy.py:336(_hookexec)
1 0.000 0.000 5.077 5.077 pluggy.py:333(<lambda>)
1 0.000 0.000 5.077 5.077 python.py:133(pytest_pyfunc_call)
1 0.017 0.017 5.077 5.077 core.py:931(wrapped_test)
1 0.000 0.000 5.020 5.020 core.py:747(run)
1 0.000 0.000 5.020 5.020 engine.py:389(run)
1 0.004 0.004 5.020 5.020 engine.py:654(_run)
200 0.191 0.001 5.004 0.025 engine.py:134(test_function)
200 0.159 0.001 2.605 0.013 core.py:680(evaluate_test_data)
212115 0.203 0.000 1.921 0.000 {method 'get' of 'dict' objects}
800081 0.936 0.000 1.834 0.000 _make.py:415(_attrs_to_tuple)
401321 0.202 0.000 1.259 0.000 _make.py:431(hash_)
199380 0.153 0.000 1.113 0.000 _make.py:454(eq)
398760 0.127 0.000 0.961 0.000 _make.py:448(attrs_to_tuple)
And the profiling output for v3.30.4:
Fri Oct 6 18:03:59 2017 /Users/dan/w/pstat/prof/combined.prof
49196866 function calls (48286483 primitive calls) in 18.292 seconds
Ordered by: cumulative time
List reduced from 803 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
4/1 0.000 0.000 18.294 18.294 pluggy.py:263(__init__)
5/1 0.000 0.000 18.294 18.294 pluggy.py:598(execute)
3/1 0.000 0.000 18.294 18.294 pluggy.py:238(_wrapped_call)
1 0.000 0.000 18.294 18.294 runner.py:109(pytest_runtest_call)
1 0.000 0.000 18.294 18.294 python.py:1167(runtest)
1 0.000 0.000 18.294 18.294 pluggy.py:743(__call__)
1 0.000 0.000 18.294 18.294 pluggy.py:336(_hookexec)
1 0.000 0.000 18.294 18.294 pluggy.py:333(<lambda>)
1 0.000 0.000 18.294 18.294 python.py:133(pytest_pyfunc_call)
1 0.004 0.004 18.294 18.294 core.py:931(wrapped_test)
1 0.000 0.000 18.221 18.221 core.py:747(run)
1 0.000 0.000 18.221 18.221 engine.py:379(run)
1 0.000 0.000 18.221 18.221 engine.py:751(_run)
1 0.003 0.003 17.494 17.494 engine.py:649(generate_new_examples)
100 0.155 0.002 12.739 0.127 engine.py:121(test_function)
204737 1.026 0.000 10.680 0.000 engine.py:1259(rescore)
5260988 4.936 0.000 10.255 0.000 _make.py:415(_attrs_to_tuple)
100 0.276 0.003 9.566 0.096 engine.py:1211(add)
2803366/2458931 1.392 0.000 7.510 0.000 _make.py:431(hash_)
1228811/1228810 0.933 0.000 6.352 0.000 _make.py:454(eq)
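(For reference, the tables above can be regenerated from the combined.prof dump, which I believe is written by pytest-profiling, using the standard-library pstats module; a minimal sketch:)

import pstats

# Load the saved profile and print the 20 most expensive calls by
# cumulative time, matching the view shown above.
stats = pstats.Stats("prof/combined.prof")
stats.sort_stats("cumulative").print_stats(20)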
Do you think this is a Hypothesis issue, or could it be something I'm doing that is causing the degradation?