
Conversation

ljk53 (Contributor) commented Nov 20, 2019

Stack from ghstack:

Summary:
Create a script to produce a libtorch that only contains the ops needed by specific
models. Developers can use this workflow to further optimize mobile build size.

We need to keep a dummy stub for unused (stripped) ops because some JIT-side
logic requires certain function schemas to exist in the JIT op registry.

Test Steps:

1. Build the "dump_operator_names" binary and use it to dump the root ops needed
by a specific model (see the Python sketch after these steps for one way to
produce the model file):
```
build/bin/dump_operator_names --model=mobilenetv2.pk --output=mobilenetv2.yaml
```

2. The MobileNetV2 model should use the following ops:
```
- aten::t
- aten::dropout
- aten::mean.dim
- aten::add.Tensor
- prim::ListConstruct
- aten::addmm
- aten::_convolution
- aten::batch_norm
- aten::hardtanh_
- aten::mm
```

NOTE that for some reason it outputs "aten::addmm" while the model actually uses "aten::mm".
You need to fix this manually for now.

3. Run the custom build script locally (using Android as an example):
```
SELECTED_OP_LIST=mobilenetv2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```

4. Check out the demo app that uses the locally built library instead of
downloading it from the jcenter repo:
```
git clone --single-branch --branch custom_build [email protected]:ljk53/android-demo-app.git
```

5. Copy the locally built libraries to the demo app folder:
```
find ${HOME}/src/pytorch/android -name '*.aar' -exec cp {} ${HOME}/src/android-demo-app/HelloWorldApp/app/libs/ \;
```

6. Build the demo app with the locally built libtorch:
```
cd ${HOME}/src/android-demo-app/HelloWorldApp
./gradlew clean && ./gradlew assembleDebug
```

7. Install and run the demo app.

In-APK arm-v7 libpytorch_jni.so build size reduced from 5.5M to 2.9M.
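
For readers reproducing step 1: a minimal sketch (not part of this PR) of one way to produce the mobilenetv2.pk model file, assuming torchvision is installed; the file name simply matches the --model flag above:
```
import torch
import torchvision

# Load a pretrained MobileNetV2 and put it in inference mode.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()

# Trace with a dummy input to get a serializable TorchScript module.
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Save the serialized model that dump_operator_names will read.
traced.save("mobilenetv2.pk")
```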

Differential Revision: D18612127

facebook-github-bot added the oncall: jit label Nov 20, 2019
ljk53 added a commit that referenced this pull request Nov 20, 2019
ghstack-source-id: ac089ad
Pull Request resolved: #30144
ljk53 (Contributor, Author) commented Nov 20, 2019

@iseeyuan do you know why MobileNetV2 contains the "aten::addmm" operator in the instruction list but ends up calling "aten::mm"? Do we need to first look up the JIT registry to "resolve" the operator binding before dumping?

iseeyuan (Contributor) commented

@ljk53 It may be caused by JIT optimization passes as well. Let me check if it's possible to dump the op list right before the interpreter running stage. It may be deeper in the graph executor.
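
For intuition (my illustration, not from the thread): with the default coefficients beta == alpha == 1, addmm(bias, x, w) is mathematically just x.mm(w) + bias, which is why an optimization pass can legally swap one for the other:
```
import torch

bias, x, w = torch.rand(4), torch.rand(2, 3), torch.rand(3, 4)

# With beta == alpha == 1 (the defaults), addmm reduces to mm plus an add,
# so a decompose-style pass may rewrite aten::addmm into aten::mm.
assert torch.allclose(torch.addmm(bias, x, w), x.mm(w) + bias)
```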

ljk53 (Contributor, Author) commented Nov 20, 2019

> @ljk53 It may be caused by JIT optimization passes as well. Let me check if it's possible to dump the op list right before the interpreter running stage. It may be deeper in the graph executor.

Is it possible that JIT optimization differs on mobile vs. on server (and might depend on the input as well)? If we dump the operator list on the server, how can we make sure it covers all possible optimization passes? cc: @zdevito

ljk53 (Contributor, Author) commented Nov 20, 2019

BTW, I think this script can be reviewed independently of the op-dump issue. We can fix the issue separately.

ezyang (Contributor) commented Nov 20, 2019

Can we put the docs in a place more durable than a PR description?

iseeyuan (Contributor) commented

> @ljk53 It may be caused by JIT optimization passes as well. Let me check if it's possible to dump the op list right before the interpreter running stage. It may be deeper in the graph executor.
>
> Is it possible that JIT optimization differs on mobile vs. on server (and might depend on the input as well)? If we dump the operator list on the server, how can we make sure it covers all possible optimization passes? cc: @zdevito

It may be different from the final bytecode in the full JIT. For the lite interpreter we dump the bytecode from the original module, without JIT optimization passes, so we have one set of bytecode independent of input-based optimizations. Performance should not be affected significantly (cc @zdevito); the actual performance difference has yet to be measured in detail.
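
A hedged sketch of this graph vs. executor distinction (my own example, not from the PR): .graph shows the ops as written, while graph_for(...) returns the post-optimization graph the executor actually runs for given inputs, where rewrites like addmm -> mm can appear:
```
import torch

def affine(bias, x, w):
    return torch.addmm(bias, x, w)

scripted = torch.jit.script(affine)
bias, x, w = torch.rand(4), torch.rand(2, 3), torch.rand(3, 4)
scripted(bias, x, w)  # run once so the graph executor builds its optimized graph

# The as-written graph contains aten::addmm...
print({n.kind() for n in scripted.graph.nodes()})
# ...while the executor's optimized graph may contain aten::mm instead.
print({n.kind() for n in scripted.graph_for(bias, x, w).nodes()})
```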

iseeyuan (Contributor) commented

Probably a stupid idea: it may not be hard to disable optimization passes on mobile (in general, not just for the lite interpreter). In that sense we wouldn't need to worry about input-dependent passes and the different ops introduced by each pass. Not sure if it could work as at least a short-term solution.


```
set(ONNX_NAMESPACE "onnx_torch" CACHE STRING "A namespace for ONNX; needed to build with other frameworks that share ONNX.")
set(SELECTED_OP_LIST "" CACHE STRING
  "Path to the yaml file that contains the list of operators to include for custom build. Include all operators by default.")
```
A reviewer (Contributor) commented on this snippet:

For doc discoverability, it would be good to have a link to the relevant docs here.

facebook-github-bot (Contributor) commented

@ljk53 merged this pull request in 43fb001.

ljk53 added a commit that referenced this pull request Nov 22, 2019
… custom build

Summary:
PR #30144 introduced a custom build script to tailor the build to specific
models. It requires a list of all potentially used ops at build time.

Some JIT optimization passes can transform the IR by replacing
operators, e.g. the decompose pass can replace aten::addmm with aten::mm if
the coefficients are 1s.

Disabling optimization passes ensures that the list of ops we dump from
the model is the list of ops that are needed.

Test Plan:
- rerun the test on PR #30144 to verify that the raw list without aten::mm works.

[ghstack-poisoned]
ljk53 added a commit that referenced this pull request Nov 22, 2019

ghstack-source-id: 28e4a40
Pull Request resolved: #30285
facebook-github-bot pushed a commit that referenced this pull request Nov 22, 2019

Pull Request resolved: #30285

Differential Revision: D18652777

Pulled By: ljk53

fbshipit-source-id: 084751cb9a9ee16d8df7e743e9e5782ffd8bc4e3
facebook-github-bot deleted the gh/ljk53/74/head branch November 24, 2019 15:16