Problem: Validation code not optimized #2490
Solution: Memoize operations which generate same results
# to query the transactions for a transaction id, this field is unique
conn.conn[dbname]['transactions'].create_index('id',
                                               unique=True,
Good point.
Note for future PRs: feel free to open a separate PR for a specific issue like this, so we don't mix concerns 👏
 def create_blocks_secondary_index(conn, dbname):
     conn.conn[dbname]['blocks']\
-        .create_index([('height', DESCENDING)], name='height')
+        .create_index([('height', DESCENDING)], name='height', unique=True)
Same as the previous comment 🙂
bigchaindb/common/memoize.py
class HDict(dict):
    def __hash__(self):
        return int.from_bytes(codecs.decode(self['id'], 'hex'), 'big')
I had a similar problem recently. While your code converts the hex string representing the transaction.id to a number by way of bytes, a simpler approach is to just use int(self['id'], 16) (I was actually quite surprised by its simplicity when I found out).
In [1]: int('437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7', 16) == int.from_bytes(codecs.decode('437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7', 'hex'), 'big')
Out[1]: True

I was curious about your approach, so I compared the performance of the two:
In [2]: %timeit int('437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7', 16)
382 ns ± 3.24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [3]: %timeit int.from_bytes(codecs.decode('437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7', 'hex'), 'big')
1.72 µs ± 8.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
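The two conversions agree because reading the decoded bytes as a big-endian integer is the same positional base-16 value that `int(x, 16)` parses directly; the bytes route just adds an intermediate `bytes` object. A small standalone sketch that checks the equivalence and times both with the standard `timeit` module (the helper names `via_from_bytes` and `via_int_base16` are hypothetical, and timings will vary by machine):

```python
import codecs
import timeit

# Transaction id quoted in the review thread (64 hex characters).
TX_ID = '437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7'


def via_from_bytes(hex_id):
    # Original approach: hex string -> bytes -> big-endian integer.
    return int.from_bytes(codecs.decode(hex_id, 'hex'), 'big')


def via_int_base16(hex_id):
    # Suggested approach: parse the hex string directly as a base-16 integer.
    return int(hex_id, 16)


if __name__ == '__main__':
    assert via_from_bytes(TX_ID) == via_int_base16(TX_ID)
    assert via_from_bytes('0100') == via_int_base16('0100') == 256
    runs = 100_000
    for fn in (via_from_bytes, via_int_base16):
        seconds = timeit.timeit(lambda: fn(TX_ID), number=runs)
        print(f'{fn.__name__}: {seconds / runs * 1e9:.0f} ns per call')
```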
Switched to `hash()`; here are the stats:
In [1]: %timeit hash('437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7')
70.4 ns ± 0.625 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
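Hashing on `hash(self['id'])` stays correct even though the hash ignores every other field: `HDict` inherits `dict.__eq__`, so two entries with the same id but different contents still compare unequal and cannot be confused in a set or cache. A sketch, assuming (the thread does not show the final line) that the new `__hash__` is `hash(self['id'])`:

```python
TX_ID = '437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7'


class HDict(dict):
    """dict that hashes on its 'id' field so it can be a set member or cache key.

    Assumption: the final version hashes with hash(self['id']). Equality is
    inherited from dict, so two HDicts are equal only if all keys and values
    match; the hash is just a fast pre-filter. An HDict must not be mutated
    while it is in use as a key.
    """

    def __hash__(self):
        # str hashes are salted per interpreter process (PYTHONHASHSEED); fine
        # for an in-memory cache, but the values must never be persisted.
        return hash(self['id'])


if __name__ == '__main__':
    body = {'id': TX_ID, 'operation': 'CREATE'}
    try:
        hash(body)
    except TypeError:
        print('a plain dict is unhashable')
    same = HDict(body)
    also_same = HDict(body)
    other = HDict(body, operation='TRANSFER')  # same id, different contents
    print(len({same, also_same, other}))  # 2
```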
bigchaindb/common/memoize.py
@functools.wraps(func)
def memoized_func(*args, **kwargs):
    print(args)
    new_args = (args[0], HDict(args[1]), args[2])
That's difficult to understand. Can you please add some comments around this code (and the equivalent code in `memoize_to_dict`)?
Solution: use `hash()` function instead of `int.from_bytes()`
Solution: remove if condition
Solution: Clear from_dict cache
Solution: Enable memoization decorator
Codecov Report
@@            Coverage Diff             @@
##           master    #2490      +/-   ##
==========================================
+ Coverage   91.73%   91.87%   +0.14%
==========================================
  Files          41       42       +1
  Lines        2467     2511      +44
==========================================
+ Hits         2263     2307      +44
  Misses        204      204
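The report is internally consistent: hits plus misses equal lines in both columns, and the 44 new lines are all hits, so misses stay at 204. Rounding 2307/2511 to the nearest hundredth would give 91.88%, so the shown 91.87% suggests the percentages are truncated; the sketch below assumes Codecov's default of rounding down to two decimals (`coverage_pct` is a hypothetical helper):

```python
import math

# (hits, lines) from the Codecov report: master vs. this PR.
MASTER = (2263, 2467)
PR = (2307, 2511)
MISSES = 204  # unchanged between the two runs


def coverage_pct(hits, lines):
    # Assumption: percentages are rounded down (truncated) to two decimals.
    return math.floor(hits / lines * 10_000) / 100


if __name__ == '__main__':
    for hits, lines in (MASTER, PR):
        assert hits + MISSES == lines  # every line is either a hit or a miss
    before, after = coverage_pct(*MASTER), coverage_pct(*PR)
    print(before, after, round(after - before, 2))  # 91.73 91.87 0.14
```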
Solution: enable memoization and fix failing tests
Solution: Add tests for `to_dict` and `from_dict` memoization
Solution: clear cache in bdb fixture
Solution: Fix flake8 issue
I ran Speed with this patch: we are now 27% faster at transaction validation!
Solution: Use memoization for functions with static validation
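Only "static" validation, whose result depends solely on its inputs, is safe to memoize; checks that consult ledger state (such as whether an output is already spent) can change answer for the same input. A sketch of that distinction with `functools.lru_cache`, where the id scheme (SHA3-256 hex digest of a sorted-key JSON body) and the names `serialize`, `id_matches_body` and `not_yet_spent` are illustrative assumptions:

```python
import functools
import hashlib
import json


def serialize(body):
    # Hypothetical canonical serialization: sorted keys, no whitespace.
    return json.dumps(body, sort_keys=True, separators=(',', ':'))


@functools.lru_cache(maxsize=16384)
def id_matches_body(tx_id, serialized_body):
    """Static check: depends only on its arguments, so memoizing is safe.

    Assumption for illustration: the id is the SHA3-256 hex digest of the body.
    """
    return tx_id == hashlib.sha3_256(serialized_body.encode()).hexdigest()


def not_yet_spent(tx_id, spent_ids):
    # Stateful check: the same tx_id becomes invalid once it is spent, so it
    # must not be cached (and a set argument is unhashable for lru_cache anyway).
    return tx_id not in spent_ids


if __name__ == '__main__':
    payload = serialize({'operation': 'CREATE', 'asset': {'data': {'n': 1}}})
    tx_id = hashlib.sha3_256(payload.encode()).hexdigest()
    for _ in range(3):
        assert id_matches_body(tx_id, payload)
    print(id_matches_body.cache_info())  # 1 miss, 2 hits
    print(not_yet_spent(tx_id, set()), not_yet_spent(tx_id, {tx_id}))  # True False
```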