
Conversation

@apaszke (Contributor) commented Sep 13, 2018

  • Disable addmm fusion. The reason for this is explained in the comment.
  • Tiny change in stack.h that lets us avoid constructing an unnecessary temporary IValue on the (C++) stack (it will only get created on the interpreter stack directly).
  • Fixed a correctness issue in `requires_grad` propagation.
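For context, the addmm fusion disabled by this PR rewrites a matrix multiply followed by a bias add into a single call. A minimal NumPy sketch of the algebra involved (names and shapes are invented for illustration, not taken from the PyTorch source):

```python
import numpy as np

# Illustrative sketch: addmm computes beta * c + alpha * (a @ b) in one
# call, so a fusion pass can rewrite an mm followed by an add into one op.
def addmm(c, a, b, beta=1.0, alpha=1.0):
    return beta * c + alpha * (a @ b)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5))
b = rng.standard_normal((5, 3))
c = rng.standard_normal((4, 3))

unfused = a @ b + c        # two ops: matmul, then elementwise add
fused = addmm(c, a, b)     # one fused call
assert np.allclose(unfused, fused)
```

The fusion is mathematically a no-op; the question the rest of this thread turns on is whether the fused form dispatches to faster or slower backend kernels.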

@pytorchbot pytorchbot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Sep 13, 2018
@facebook-github-bot (Contributor) commented:

apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@zdevito (Contributor) commented:

Still looks good.

facebook-github-bot pushed a commit to facebookresearch/ReAgent that referenced this pull request Sep 25, 2018
Summary:
- Disable addmm fusion. The reason for this is explained in the comment.
- Tiny change in `stack.h` that lets us avoid constructing an unnecessary temporary `IValue` on the (C++) stack (it will only get created on the interpreter stack directly).
- Fixed a correctness issue in `requires_grad` propagation
Pull Request resolved: pytorch/pytorch#11654

Reviewed By: colesbury

Differential Revision: D9813739

Pulled By: apaszke

fbshipit-source-id: 23e83bc8605802f39bfecf447efad9239b9421c3
@jmp84 (Contributor) commented Oct 10, 2018

@apaszke, this is causing a ~200ms latency regression for NMT models. Here are the top lines from perf before/after this change:

Before:

  • 43.31% NmtBenchmark NmtBenchmark [.] mkl_blas_avx2_sgemm_kernel_nocopy_NN_b0
  • 8.67% NmtBenchmark NmtBenchmark [.] loop_inner866
  • 4.42% NmtBenchmark libc-2.23.so [.] __memcpy_avx_unaligned
  • 3.20% NmtBenchmark NmtBenchmark [.] caffe2::math::(anonymous namespace)::ReduceTensor<float, std::plus >
  • 3.04% NmtBenchmark NmtBenchmark [.] caffe2::math::utils::GetIndexFromDims
  • 2.84% NmtBenchmark NmtBenchmark [.] mkl_blas_avx2_xsgemv_t
  • 2.84% NmtBenchmark NmtBenchmark [.] fbgemm::PackedGemmMatrix<signed char, int, true>::addr_
  • 2.76% CaffeTaskThread NmtBenchmark [.] mkl_blas_avx2_xsgemv_t

After:

  • 29.04% NmtBenchmark NmtBenchmark [.] mkl_blas_avx2_sgemm_kernel_nocopy_NN_b0
  • 17.97% NmtBenchmark NmtBenchmark [.] mkl_blas_avx2_sgemm_kernel_nocopy_TN_b1
  • 8.02% NmtBenchmark NmtBenchmark [.] mkl_blas_avx2_sgemm_kernel_nocopy_TN_b0
  • 5.95% CaffeTaskThread NmtBenchmark [.] mkl_blas_avx2_xsgemv_t
  • 4.28% NmtBenchmark libc-2.23.so [.] __memcpy_avx_unaligned

Is there more info I can provide to help debug this? Also cc @jamesr66a who is familiar with these models and how to optimize them.

@apaszke (Contributor, Author) commented Oct 10, 2018

@jmp84 can you please provide some example tensor sizes that appear as inputs to your GEMMs? It looks like we have started triggering the transposed kernels in some cases, which was not the case previously (TN vs NN).
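For readers parsing the kernel names in the profiles above: in MKL's `sgemm_..._XY_*` naming, the letters encode the transpose flags of the two GEMM operands (N = not transposed, T = transposed). A hedged NumPy sketch of why the same logical product can land on either path (shapes are invented; exactly which operand a framework marks as transposed depends on how it lowers the call):

```python
import numpy as np

# Illustrative only: NN kernels multiply both operands as stored, while
# TN kernels treat one operand as transposed. A Linear-style product
# y = x @ W.T, with the weight stored as (out_features, in_features),
# is the kind of call that lowers to a transposed-operand GEMM; copying
# the weight into transposed layout turns it into a plain NN product.
x = np.random.randn(8, 16).astype(np.float32)   # activations
W = np.random.randn(32, 16).astype(np.float32)  # (out_features, in_features)

y_t = x @ W.T                         # transposed-operand GEMM (TN-style)
y_n = x @ np.ascontiguousarray(W.T)   # same math on a non-transposed layout
assert y_t.shape == (8, 32)
assert np.allclose(y_t, y_n)
```

The two paths are numerically equivalent but can have very different memory-access patterns, which is consistent with the latency shift jmp84 reports.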

@apaszke (Contributor, Author) commented Oct 10, 2018

Finally, how do you run those models? Is that via ONNX export or what?

@jamesr66a (Collaborator) commented:

@apaszke I think what's going on here is that we are not hitting the special quantized implementation of FC in the caffe2 backend, because the ONNX-caffe2 backend emits "MatMul" + "Add" instead of "FC". This can be seen from loop_inner866 being present before but not after this patch (that label is from an autogenerated GEMM kernel). I'm going to send in a patch that adds addmm fusion on the ONNX export path, but not the regular path.
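The export-path fix described above amounts to a pattern-matching graph rewrite: find an Add consuming a MatMul's output and collapse the pair into one FC node. A hypothetical sketch (the node representation and helper name are invented, not the real exporter API, and it ignores the weight-layout details — caffe2's FC expects a transposed weight):

```python
# Hypothetical sketch of a MatMul+Add -> FC rewrite on an exported graph,
# so the backend can hit its specialized (e.g. quantized) FC kernel.
# Nodes are plain dicts here: {"op": ..., "in": [...], "out": ...}.
def fuse_matmul_add_to_fc(nodes):
    fused = []
    i = 0
    while i < len(nodes):
        n = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (n["op"] == "MatMul" and nxt is not None and nxt["op"] == "Add"
                and n["out"] in nxt["in"]):
            # The Add input that is not the MatMul result is the bias.
            bias = [t for t in nxt["in"] if t != n["out"]][0]
            fused.append({"op": "FC", "in": n["in"] + [bias],
                          "out": nxt["out"]})
            i += 2  # consume both matched nodes
        else:
            fused.append(n)
            i += 1
    return fused
```

A real pass would also have to check that the MatMul output has no other consumers before fusing, which this sketch omits.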

@dzhulgakov (Collaborator) commented:

So for the addmm optimization: does "not helpful at all" mean neutral or negative? If neutral, why not keep it, since it matters for ONNX export?

@apaszke (Contributor, Author) commented Oct 11, 2018

It means negative in some RNN use cases I saw.


Labels

oncall: jit, open source


8 participants