Skip to content

Conversation

@Chillee
Copy link
Collaborator

@Chillee Chillee commented May 24, 2019

Overall Improvements

  1. Switched from using unordered_set to sparse bitset.
  2. Prevent some excessive memory allocations (thanks to @resistor )
  3. Take advantage of the sparse bitset operations
  4. Switch to flat_hash_map instead of unordered_map in some places.

Benchmarks (somewhat approximate, best of a couple runs)

  1. InceptionNet (load + one forward pass): 19.8->13.3
  2. GoogleNet(load + one forward pass): 10.0 -> 7.24
  3. DenseNet (only load): 7.3 -> 5.3

I use the sparse bitset taken from https://llvm.org/doxygen/SparseBitVector_8h_source.html. I had to make some modifications to use __builtin_popcountl and instructions like that instead of other transitive clang dependencies.

Some notes on our graph topologies

In general, our graphs are very sparse, and most of the components aren't connected. For GoogleNet, we have 200k nodes, we do 2k mayAlias queries, and the sum of magnitudes of sets at each node is 500k (ie: every node, on average, reaches 2.5 leaves).

PS: Holy crap macbooks throttle an insane amount with the default fan settings.

@Chillee Chillee requested review from jamesr66a, resistor and suo May 24, 2019 06:39
@pytorchbot pytorchbot added oncall: jit Add this issue/PR to JIT oncall triage queue module: internals Related to internal abstractions in c10 and ATen labels May 24, 2019
@Chillee Chillee force-pushed the optimizeAliasAnalysis branch 2 times, most recently from f19d577 to d33d47c Compare May 24, 2019 06:51
@Chillee Chillee changed the title [WIP] Optimize alias analysis [jit] Optimize alias analysis May 24, 2019
@suo
Copy link
Member

suo commented May 24, 2019

This looks good, but the PR is pretty big. Seems like a good opportunity to check out ghstack, our emulation of phabricator diff stacking. Can you put each of items 1-4 on their own stacked PR?

Copy link
Member

@suo suo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above comment. Also remember to clang-format :)

@Chillee Chillee force-pushed the optimizeAliasAnalysis branch from d33d47c to 380dc05 Compare May 24, 2019 17:37
@Chillee Chillee force-pushed the optimizeAliasAnalysis branch 3 times, most recently from a4582fc to c3bf65d Compare May 24, 2019 18:11
@Chillee Chillee force-pushed the optimizeAliasAnalysis branch from c3bf65d to 4be1efb Compare May 24, 2019 18:31
Copy link
Member

@suo suo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! Few comments inline.

@Chillee Chillee force-pushed the optimizeAliasAnalysis branch from 2547097 to f38ea25 Compare May 24, 2019 23:10
@Chillee
Copy link
Collaborator Author

Chillee commented May 27, 2019

@pytorchbot rebase this please

@Chillee
Copy link
Collaborator Author

Chillee commented May 28, 2019

I just realized - wouldn't this be a good use of bloom filters? The primary bottleneck is memory allocation, which bloom filters solve. Intersection/union can also be implemented as bit operations (which would be even faster than what we have now, since they could be sense).

The only issue is that bloom filters are only probabilistically correct for membership queries. If an element is in the set it's guaranteed to return true. If it's not, then with arbitrarily high probability (depending on memory usage) it'll return false.

Also, we never need to remove elements from the set. We've mentioned that we might want to remove edges, but regardless, we'd need to retraverse our DAG

@resistor
Copy link
Contributor

I just realized - wouldn't this be a good use of bloom filters?

It could be, but I'd prefer to bottom out on how well we can do without going probabilistic first.

@Chillee Chillee force-pushed the optimizeAliasAnalysis branch from 5f6f5c1 to 4549d42 Compare May 28, 2019 17:40
@Chillee Chillee force-pushed the optimizeAliasAnalysis branch from 7c3f3be to 1272912 Compare May 28, 2019 18:37
@Chillee Chillee force-pushed the optimizeAliasAnalysis branch from 5798c2d to 99674d1 Compare May 29, 2019 02:23
Copy link
Member

@suo suo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm except for some name bikeshedding

@Chillee Chillee force-pushed the optimizeAliasAnalysis branch 2 times, most recently from 010543f to 04484fc Compare May 30, 2019 00:40
@Chillee
Copy link
Collaborator Author

Chillee commented May 30, 2019

@pytorchbot rebase this please.

@Chillee Chillee force-pushed the optimizeAliasAnalysis branch from fc993fa to b044739 Compare May 30, 2019 01:03
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Chillee is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@Chillee merged this pull request in 4163576.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: internals Related to internal abstractions in c10 and ATen oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants