
Problem: Validation code not optimized#2490

Merged
vrde merged 16 commits into bigchaindb:master from kansi:feat/memoize
Sep 4, 2018

Conversation

@kansi
Contributor

@kansi kansi commented Aug 29, 2018

Solution: Use memoization for functions with static validation
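As an illustration of the idea (not the PR's exact code): a pure validation step whose result depends only on hashable inputs can be cached with `functools.lru_cache`, so re-validating the same transaction skips the expensive work. The function name and cache size below are hypothetical.

```python
import functools

@functools.lru_cache(maxsize=16384)
def validate_schema(tx_id, tx_body):
    # stand-in for the real schema/signature checks, which are
    # deterministic for a given transaction
    return len(tx_id) == 64

validate_schema('a' * 64, b'payload')  # computed on the first call
validate_schema('a' * 64, b'payload')  # served from the cache
```

`validate_schema.cache_info()` reports the hit/miss counts, which is handy when tuning `maxsize`.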

Contributor

@vrde vrde left a comment


Tests are missing 👮‍♂️


# to query the transactions for a transaction id, this field is unique
conn.conn[dbname]['transactions'].create_index('id',
                                               unique=True)
Contributor


Good point.

Note for other PRs: feel free to make a PR for this specific issue, so we don't mix concerns 👏

  def create_blocks_secondary_index(conn, dbname):
      conn.conn[dbname]['blocks']\
-         .create_index([('height', DESCENDING)], name='height')
+         .create_index([('height', DESCENDING)], name='height', unique=True)
Contributor


Same as the previous comment 🙂


class HDict(dict):
    def __hash__(self):
        return int.from_bytes(codecs.decode(self['id'], 'hex'), 'big')
Contributor


I had a similar problem recently. While your code converts the hex string representing the transaction.id to a number, a simpler approach is to just use int(self['id'], 16) (I was actually quite surprised by its simplicity when I found it out).

In [1]: int('437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7', 16) == int.from_bytes(codecs.decode('437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7', 'hex'), 'big')
Out[1]: True

I was curious about your approach, so I compared the performance of the two:

In [2]: %timeit int('437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7', 16)
382 ns ± 3.24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit int.from_bytes(codecs.decode('437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7', 'hex'), 'big')
1.72 µs ± 8.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Contributor Author


Switched to hash(); the stats follow:

In [1]: %timeit hash('437752a2c5c3cf2ab8ff6254ca8c0fb417a0951ab651c42481474dd9347971a7')
70.4 ns ± 0.625 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
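The hashable-dict trick the thread converged on can be sketched like this: a dict subclass whose hash is derived from its 'id' field, so transaction dicts become valid `lru_cache` keys. This is a sketch, assuming 'id' uniquely identifies the transaction body; the `validate` function is a hypothetical stand-in.

```python
import functools

class HDict(dict):
    """A dict made hashable via its 'id' field so it can serve as an
    lru_cache key. Assumes 'id' uniquely identifies the contents."""
    def __hash__(self):
        # hashing the hex string directly is far cheaper than decoding
        # it to bytes first, per the timings quoted above
        return hash(self['id'])

@functools.lru_cache(maxsize=None)
def validate(tx):
    # stand-in for the real validation logic
    return tx['id']

tx = HDict({'id': '437752a2', 'operation': 'CREATE'})
validate(tx)
validate(tx)  # second call is a cache hit: same hash, equal dict
```

Note that `lru_cache` uses both `__hash__` and `__eq__` on its keys; `HDict` inherits dict equality, so two equal transactions map to the same cache entry.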

@functools.wraps(func)
def memoized_func(*args, **kwargs):
    print(args)
    new_args = (args[0], HDict(args[1]), args[2])
Contributor


That's difficult to understand; can you please add some comments around this code (and the equivalent for memoize_to_dict)?
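For readers following along, the wrapper being discussed can be annotated roughly as below. This is a commented sketch, not the PR's final code: the argument layout `(cls, tx_dict, extra)` and the `from_dict` example are assumptions for illustration.

```python
import functools

def memoize(func):
    """Memoize func(cls, tx_dict, extra): the middle argument is an
    unhashable dict, so wrap it in HDict before caching (sketch)."""

    class HDict(dict):
        # hashable dict keyed on the transaction id
        def __hash__(self):
            return hash(self['id'])

    @functools.lru_cache(maxsize=None)
    def cached(*args, **kwargs):
        return func(*args, **kwargs)

    @functools.wraps(func)
    def memoized_func(*args, **kwargs):
        # args[1] is the transaction dict; replacing it with an HDict
        # makes the whole argument tuple a valid cache key
        new_args = (args[0], HDict(args[1]), args[2])
        return cached(*new_args, **kwargs)

    return memoized_func

@memoize
def from_dict(cls, tx, skip_schema_validation=True):
    # stand-in for the real deserialization/validation
    return tx['id']
```

The inner `cached` function does the actual caching; `memoized_func` only normalizes the arguments so that `lru_cache` can key on them.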

@codecov-io

codecov-io commented Sep 3, 2018

Codecov Report

Merging #2490 into master will increase coverage by 0.14%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2490      +/-   ##
==========================================
+ Coverage   91.73%   91.87%   +0.14%     
==========================================
  Files          41       42       +1     
  Lines        2467     2511      +44     
==========================================
+ Hits         2263     2307      +44     
  Misses        204      204

Solution: enable memoization and fix failing tests
Solution: Add tests for `to_dict` and `from_dict` memoization
@kansi kansi requested review from ldmberman and vrde on September 4, 2018 at 10:04
@vrde
Contributor

vrde commented Sep 4, 2018

I ran prof.py to check the improvement. This is the transaction validation speed on master:

 /tmp ➜ python prof.py
Create 1000 transactions
Start serial validation
  Total time: 4.285186
  Time per transaction: 0.004285

Speed with this patch:

 /tmp ➜ python prof.py
Create 1000 transactions
Start serial validation
  Total time: 3.363722
  Time per transaction: 0.003364

We are now 27% faster in transaction validation!
🏇 🏇 🏇
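prof.py itself is not included in the thread; a minimal harness in the same spirit (entirely hypothetical, matching the output format quoted above) might look like:

```python
import time

def benchmark(validate, txs):
    """Time serial validation of txs and print stats in the same
    format as the prof.py output quoted above (hypothetical)."""
    start = time.perf_counter()
    for tx in txs:
        validate(tx)
    total = time.perf_counter() - start
    print('Create %d transactions' % len(txs))
    print('Start serial validation')
    print('  Total time: %f' % total)
    print('  Time per transaction: %f' % (total / len(txs)))
    return total

# toy stand-in: 1000 fake transactions and a trivial validator
txs = [{'id': format(i, '064x')} for i in range(1000)]
benchmark(lambda tx: len(tx['id']) == 64, txs)
```

Running it once against master and once against the patched branch, as in the comment above, gives the per-transaction comparison.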

@vrde vrde merged commit cb22557 into bigchaindb:master Sep 4, 2018