
Significant performance degradation for v3.30.4 #919

@dangitall

Description


I am noticing a roughly 3x performance penalty going from v3.30.3 to v3.30.4, despite the number of examples being halved. Since I'm also taking a 4x hit from #879, this keeps me on v3.28.3 for now.

An example is a test that uses this Hypothesis statement:

@given(ids=st.lists(st.text(alphabet=string.ascii_lowercase)))
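For reference, a minimal self-contained version of a test with this shape looks like the sketch below. The test body here is a hypothetical stand-in (the real one isn't relevant to the slowdown), and max_examples is lowered so it runs quickly:

```python
import string

from hypothesis import given, settings, strategies as st


@given(ids=st.lists(st.text(alphabet=string.ascii_lowercase)))
@settings(max_examples=20)  # lowered from the session's 200/100 for a quick run
def test_get_node_devices_calls_device(ids):
    # Stand-in body: every generated id is drawn from the lowercase alphabet.
    assert all(set(i) <= set(string.ascii_lowercase) for i in ids)


if __name__ == "__main__":
    test_get_node_devices_calls_device()
```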

Running under v3.30.3, these are the run statistics:

=================================================================================== test session starts ===================================================================================
platform darwin -- Python 3.6.3, pytest-3.2.3, py-1.4.34, pluggy-0.4.0 -- /Users/dan/.pyenv/versions/3.6.3/envs/pstat/bin/python
cachedir: .cache
rootdir: /Users/dan/w/pstat, inifile:
plugins: xdist-1.20.1, mock-1.6.3, forked-0.2, cov-2.5.1, hypothesis-3.30.3
collected 28 items

test/test_board.py::test_get_node_devices_calls_device PASSED
================================================================================== Hypothesis Statistics ==================================================================================

test/test_board.py::test_get_node_devices_calls_device:

  - 200 passing examples, 0 failing examples, 0 invalid examples
  - Typical runtimes: 4-17 ms
  - Stopped because settings.max_examples=200

=================================================================================== 27 tests deselected ===================================================================================
========================================================================= 1 passed, 27 deselected in 3.33 seconds =========================================================================

And after changing only the Hypothesis version to v3.30.4:

platform darwin -- Python 3.6.3, pytest-3.2.3, py-1.4.34, pluggy-0.4.0 -- /Users/dan/.pyenv/versions/3.6.3/envs/pstat/bin/python
cachedir: .cache
rootdir: /Users/dan/w/pstat, inifile:
plugins: xdist-1.20.1, mock-1.6.3, forked-0.2, cov-2.5.1, hypothesis-3.30.4
collected 28 items

test/test_board.py::test_get_node_devices_calls_device PASSED
================================================================================== Hypothesis Statistics ==================================================================================

test/test_board.py::test_get_node_devices_calls_device:

  - 100 passing examples, 0 failing examples, 0 invalid examples
  - Typical runtimes: 4-14 ms
  - Stopped because settings.max_examples=100

=================================================================================== 27 tests deselected ===================================================================================
======================================================================== 1 passed, 27 deselected in 10.20 seconds =========================================================================

Notice that even though the typical runtime per example is roughly the same, the total execution time triples. Here is the profiling for v3.30.3:

Profiling (from /Users/dan/w/pstat/prof/combined.prof):
Fri Oct  6 18:03:32 2017    /Users/dan/w/pstat/prof/combined.prof

         10990129 function calls (10962229 primitive calls) in 5.074 seconds

   Ordered by: cumulative time
   List reduced from 741 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/1    0.000    0.000    5.077    5.077 pluggy.py:263(__init__)
      5/1    0.000    0.000    5.077    5.077 pluggy.py:598(execute)
      3/1    0.000    0.000    5.077    5.077 pluggy.py:238(_wrapped_call)
        1    0.000    0.000    5.077    5.077 runner.py:109(pytest_runtest_call)
        1    0.000    0.000    5.077    5.077 python.py:1167(runtest)
        1    0.000    0.000    5.077    5.077 pluggy.py:743(__call__)
        1    0.000    0.000    5.077    5.077 pluggy.py:336(_hookexec)
        1    0.000    0.000    5.077    5.077 pluggy.py:333(<lambda>)
        1    0.000    0.000    5.077    5.077 python.py:133(pytest_pyfunc_call)
        1    0.017    0.017    5.077    5.077 core.py:931(wrapped_test)
        1    0.000    0.000    5.020    5.020 core.py:747(run)
        1    0.000    0.000    5.020    5.020 engine.py:389(run)
        1    0.004    0.004    5.020    5.020 engine.py:654(_run)
      200    0.191    0.001    5.004    0.025 engine.py:134(test_function)
      200    0.159    0.001    2.605    0.013 core.py:680(evaluate_test_data)
   212115    0.203    0.000    1.921    0.000 {method 'get' of 'dict' objects}
   800081    0.936    0.000    1.834    0.000 _make.py:415(_attrs_to_tuple)
   401321    0.202    0.000    1.259    0.000 _make.py:431(hash_)
   199380    0.153    0.000    1.113    0.000 _make.py:454(eq)
   398760    0.127    0.000    0.961    0.000 _make.py:448(attrs_to_tuple)

And for v3.30.4:

Fri Oct  6 18:03:59 2017    /Users/dan/w/pstat/prof/combined.prof

         49196866 function calls (48286483 primitive calls) in 18.292 seconds

   Ordered by: cumulative time
   List reduced from 803 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/1    0.000    0.000   18.294   18.294 pluggy.py:263(__init__)
      5/1    0.000    0.000   18.294   18.294 pluggy.py:598(execute)
      3/1    0.000    0.000   18.294   18.294 pluggy.py:238(_wrapped_call)
        1    0.000    0.000   18.294   18.294 runner.py:109(pytest_runtest_call)
        1    0.000    0.000   18.294   18.294 python.py:1167(runtest)
        1    0.000    0.000   18.294   18.294 pluggy.py:743(__call__)
        1    0.000    0.000   18.294   18.294 pluggy.py:336(_hookexec)
        1    0.000    0.000   18.294   18.294 pluggy.py:333(<lambda>)
        1    0.000    0.000   18.294   18.294 python.py:133(pytest_pyfunc_call)
        1    0.004    0.004   18.294   18.294 core.py:931(wrapped_test)
        1    0.000    0.000   18.221   18.221 core.py:747(run)
        1    0.000    0.000   18.221   18.221 engine.py:379(run)
        1    0.000    0.000   18.221   18.221 engine.py:751(_run)
        1    0.003    0.003   17.494   17.494 engine.py:649(generate_new_examples)
      100    0.155    0.002   12.739    0.127 engine.py:121(test_function)
   204737    1.026    0.000   10.680    0.000 engine.py:1259(rescore)
  5260988    4.936    0.000   10.255    0.000 _make.py:415(_attrs_to_tuple)
      100    0.276    0.003    9.566    0.096 engine.py:1211(add)
2803366/2458931    1.392    0.000    7.510    0.000 _make.py:431(hash_)
1228811/1228810    0.933    0.000    6.352    0.000 _make.py:454(eq)
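Some rough arithmetic on the two sessions makes the overhead growth concrete. This is only a sketch: it assumes the lower end (4 ms) of each reported typical-runtime range is the cost of the test body itself, and attributes the remainder of the wall-clock time to per-example engine overhead:

```python
# Back-of-the-envelope estimate from the session numbers above, taking the
# lower end (4 ms) of each "typical runtimes" range as the per-example cost
# of the test body itself; the remainder is treated as engine overhead.
per_example_cost = 0.004  # seconds (lower bound of both reported ranges)

old_wall, old_examples = 3.33, 200   # v3.30.3 session
new_wall, new_examples = 10.20, 100  # v3.30.4 session

old_overhead = (old_wall - old_examples * per_example_cost) / old_examples
new_overhead = (new_wall - new_examples * per_example_cost) / new_examples

# old_overhead comes out to roughly 13 ms/example, new_overhead to roughly
# 98 ms/example, i.e. close to an 8x increase in per-example overhead,
# which matches the rescore/_attrs_to_tuple dominance in the second profile.
ratio = new_overhead / old_overhead
```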

Do you think this is a Hypothesis issue, or could it be something I'm doing that causes the degradation?

Labels: performance (go faster! use less memory!)