Skip to content

Conversation

@m4drat
Copy link
Contributor

@m4drat m4drat commented Feb 7, 2023

Hi!

I've been fuzzing different pytorch modules, and found a crash inside one of them.

Specifically, I'm talking about a module that processes script_call rpc requests and a function ScriptCall::fromIValues(std::vector<at::IValue>& ivalues).

Running this test case causes a crash that occurs when ivalues.back() is called script_call.cpp:90. The crash occurs because the vector ivalues is empty.

All tests were performed on this pytorch version: abc54f93145830b502400faa92bec86e05422fbd

The provided patch checks if there are enough elements in the ivalues vector.

How to reproduce

  1. To reproduce the crash, use provided docker: Dockerfile

  2. Build the container: docker build -t oss-sydr-fuzz-pytorch-reproduce .

  3. Copy crash file to the current directory:

  4. Run the container: docker run --privileged --network host -v `pwd`:/homedir --rm -it oss-sydr-fuzz-pytorch-reproduce /bin/bash

  5. And execute the binary: /message_deserialize_fuzz /homedir/crash-9f76d4e37a2391136a4ce07d47269db1e063e4b4

After execution completes you will see this stacktrace:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==57==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x0000008e7b19 bp 0x7ffd2fdded70 sp 0x7ffd2fddec40 T0)
==57==The signal is caused by a READ memory access.
==57==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
    #0 0x8e7b19 in c10::IValue::isString() const /pytorch_fuzz/aten/src/ATen/core/ivalue.h:639:27
    #1 0x8e7b19 in c10::IValue::toStringRef[abi:cxx11]() const /pytorch_fuzz/aten/src/ATen/core/ivalue_inl.h:2179:3
    #2 0xe04fb58 in torch::distributed::rpc::ScriptCall::fromIValues(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/distributed/rpc/script_call.cpp:90:53
    #3 0xe0511f0 in torch::distributed::rpc::ScriptCall::fromMessage(torch::distributed::rpc::Message const&) /pytorch_fuzz/torch/csrc/distributed/rpc/script_call.cpp:133:10
    #4 0xe0ff71e in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch_fuzz/torch/csrc/distributed/rpc/utils.cpp:102:14
    #5 0x602a41 in LLVMFuzzerTestOneInput /message_deserialize_fuzz.cc:192:27
    #6 0x52ce61 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #7 0x516d7c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #8 0x51cacb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #9 0x546062 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #10 0x7f41e42a8082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #11 0x51169d in _start (/message_deserialize_fuzz+0x51169d)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /pytorch_fuzz/aten/src/ATen/core/ivalue.h:639:27 in c10::IValue::isString() const
==57==ABORTING

@pytorch-bot
Copy link

pytorch-bot bot commented Feb 7, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94297

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit be5fd1f:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Feb 8, 2023
@apach301
Copy link
Contributor

apach301 commented Apr 5, 2023

@soulitzer @mrshenli @rohan-varma
Hi! Could you please review these changes? PR tries to fix sigsegv in RPC-routine

@SweetVishnya
Copy link

@ezyang, can you, please, review the patch fixing SEGV crash in Distributed RPC?

@ezyang
Copy link
Contributor

ezyang commented Apr 28, 2023

Test in pt would be great!

@ezyang
Copy link
Contributor

ezyang commented Apr 28, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 28, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase and merge by leaving the following comment on this PR:
@pytorchbot merge -r
Or just rebase by leaving @pytorchbot rebase comment

Details for Dev Infra team Raised by workflow job

@SweetVishnya
Copy link

@pytorchbot rebase

@pytorch-bot
Copy link

pytorch-bot bot commented Apr 28, 2023

You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

@m4drat
Copy link
Contributor Author

m4drat commented Apr 28, 2023

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased rpc-check-size-before-calling-back onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rpc-check-size-before-calling-back && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the rpc-check-size-before-calling-back branch from 10a960d to be5fd1f Compare April 28, 2023 11:59
@ezyang
Copy link
Contributor

ezyang commented Apr 28, 2023

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@SweetVishnya
Copy link

@ezyang, build-docs-functorch-false seems to fail on main branch.

@ezyang
Copy link
Contributor

ezyang commented Apr 29, 2023

@pytorchbot merge -f "master failure only"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged merging open source release notes: distributed (rpc) release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants