[caffe2] Fix signal handler deleting siginfo_t in resulting Coredump by Jlalond · Pull Request #174247 · pytorch/pytorch

Jlalond · 2026-02-03T23:32:22Z

Summary:
This patch fixes the loss of signal information in core dumps produced when caffe2 apps crash.

The culprit is the signal handler's call to `raise` after unregistering itself. Under the hood, `raise` calls `tgkill`, which sends a brand-new signal: the kernel fills its `siginfo_t` with `si_code = SI_TKILL` and the pid and uid of the calling process, so the data from the original fault is lost. As a result, when the signal is re-raised and the process dumps core, the recorded reason is something like `SEGV sent by=your pid, your user`, with neither the faulting address nor the `si_code` of the original signal. We fix this by re-raising the signal directly with the original signal information.
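To see the mechanism in isolation, the following sketch installs an `SA_SIGINFO` handler for `SIGUSR1` (a harmless stand-in for `SIGSEGV`) and delivers the signal with `raise`. It assumes Linux, where glibc implements `raise` with `tgkill`; the helper names are hypothetical. The handler should see `si_code == SI_TKILL` and `si_pid == getpid()`: the `siginfo_t` describes the `raise` call itself, so nothing from an earlier fault can survive it.

```c
#define _GNU_SOURCE /* expose SI_TKILL and friends in glibc's <signal.h> */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fields recorded by the handler; read back after raise() returns. */
static volatile sig_atomic_t seen_code;
static volatile sig_atomic_t seen_pid;

static void record_sender(int signo, siginfo_t *info, void *uctx) {
    (void)signo;
    (void)uctx;
    seen_code = info->si_code;
    seen_pid = info->si_pid;
}

/* Hypothetical helper: deliver SIGUSR1 with raise() and report the si_code
 * and si_pid the handler saw. Returns 0 on success, -1 on failure. */
static int raise_and_capture(int *code_out, pid_t *pid_out) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_sender;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;
    int rc = raise(SIGUSR1); /* POSIX: the handler has run once this returns */
    sigaction(SIGUSR1, &old, NULL);
    if (rc != 0)
        return -1;
    *code_out = seen_code;
    *pid_out = (pid_t)seen_pid;
    return 0;
}

/* Expected on Linux: the siginfo describes the raise() call itself. */
static void check_raise_reports_tkill(void) {
    int code = 0;
    pid_t pid = 0;
    int rc = raise_and_capture(&code, &pid);
    assert(rc == 0);
    assert(code == SI_TKILL);
    assert(pid == getpid());
}
```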

This is a port of yfeldblum's change to the Folly signal handler (facebook/folly@79d7f8e) to caffe2.
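The summary does not show the exact call the patch makes. As an assumption, one Linux mechanism that fits the description is the `rt_tgsigqueueinfo` system call, which lets a thread send a signal to itself together with a caller-supplied `siginfo_t`. The sketch below uses a hypothetical `requeue_to_self` helper to deliver a `SIGSEGV` carrying `si_code = SEGV_MAPERR` and `si_addr = 0x1000` (the values from the test plan) to a temporary recording handler, so the preserved fields can be checked without crashing. In a real fatal-signal handler, the idea would be to restore the default disposition and pass along the `siginfo_t *` the handler received, so that the default action dumps core with the original fields.

```c
#define _GNU_SOURCE /* for syscall() and the SYS_* constants */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fields recorded by the test-only handler. */
static volatile sig_atomic_t got_code;
static void *volatile got_addr;

static void record_fault(int signo, siginfo_t *info, void *uctx) {
    (void)signo;
    (void)uctx;
    got_code = info->si_code;
    got_addr = info->si_addr;
}

/* Hypothetical helper: send `info` to the calling thread unchanged. Linux
 * only accepts a kernel-style si_code (>= 0, e.g. SEGV_MAPERR) from
 * userspace when the target is the calling thread itself. */
static int requeue_to_self(int signo, siginfo_t *info) {
    pid_t tid = (pid_t)syscall(SYS_gettid);
    return (int)syscall(SYS_rt_tgsigqueueinfo, getpid(), tid, signo, info);
}

/* Deliver a SIGSEGV shaped like the one in the test plan to a temporary
 * recording handler and report what the handler saw. */
static int send_fake_segv(int *code_out, void **addr_out) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    siginfo_t info;
    memset(&info, 0, sizeof info);
    info.si_signo = SIGSEGV;
    info.si_code = SEGV_MAPERR;               /* "address not mapped to object" */
    info.si_addr = (void *)(uintptr_t)0x1000;

    /* The signal targets this thread and is not blocked, so the handler runs
     * on the way back from the system call. */
    int rc = requeue_to_self(SIGSEGV, &info);
    sigaction(SIGSEGV, &old, NULL); /* don't leave a returning SEGV handler behind */
    if (rc != 0)
        return -1;
    *code_out = got_code;
    *addr_out = got_addr;
    return 0;
}

/* Expected: the handler sees the original si_code and si_addr. */
static void check_fields_survive(void) {
    int code = 0;
    void *addr = NULL;
    int rc = send_fake_segv(&code, &addr);
    assert(rc == 0);
    assert(code == SEGV_MAPERR);
    assert(addr == (void *)(uintptr_t)0x1000);
}
```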

Test Plan:
The diff above this one creates a small app that loads the caffe2 app and then triggers a SEGV. Inspecting the resulting core locally:

```
(lldb) thread siginfo
thread #1: tid = 1711969, 0x000000000024f76a, name = 'signal_handler_', stop reason = SIGSEGV: address not mapped to object (fault address=0x1000)

(__lldb_siginfo_t) __lldb_siginfo = {
  si_signo = 11
  si_errno = 0
  si_code = 1
  __pad0 = 0
  _sifields = {
    _kill = (si_pid = 4096, si_uid = 0)
    _timer = {
      si_tid = 4096
      si_overrun = 0
      si_sigval = (sival_int = 0, sival_ptr = 0x0000000000000000)
    }
    _rt = {
      si_pid = 4096
      si_uid = 0
      si_sigval = (sival_int = 0, sival_ptr = 0x0000000000000000)
    }
    _sigchld = (si_pid = 4096, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0)
    _sigfault = {
      si_addr = 0x0000000000001000
      si_addr_lsb = 0
      _bounds = {
        _addr_bnd = (_lower = 0x0000000000000000, _upper = 0x0000000000000000)
        _pkey = 0
      }
    }
    _sigpoll = (si_band = 4096, si_fd = 0)
    _sigsys = (_call_addr = 0x0000000000001000, _syscall = 0, _arch = 0)
  }
}
```

The siginfo now contains the address that triggered the original SEGV (`si_addr = 0x1000`), along with the original `si_code` (1, i.e. `SEGV_MAPERR`).
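The other members of `_sifields` in the dump (`_kill`, `_timer`, `_rt`, `_sigchld`, `_sigpoll`, `_sigsys`) carry no extra information: `_sifields` is a union, so lldb is showing the same bytes as `_sigfault.si_addr` through every view. That is why `si_pid` reads as 4096 (0x1000) and `si_uid` as 0. The sketch below reproduces this reading, assuming glibc on a little-endian 64-bit Linux target such as x86-64; `pid_view_of_fault_addr` is a hypothetical helper, and for a `SIGSEGV` only the `_sigfault` fields are meaningful.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: build a SIGSEGV siginfo with fault address `addr` and
 * read it back through the `_kill.si_pid` view of the same union. */
static pid_t pid_view_of_fault_addr(uintptr_t addr) {
    siginfo_t info;
    memset(&info, 0, sizeof info);
    info.si_signo = SIGSEGV;
    info.si_code = SEGV_MAPERR;
    info.si_addr = (void *)addr;
    /* si_pid and si_addr both start at offset 0 of `_sifields`, so on a
     * little-endian target this is the low 32 bits of the address. */
    return info.si_pid;
}

/* Mirrors the lldb dump in the test plan (Linux, little-endian, 64-bit). */
static void check_test_plan_values(void) {
    assert(SIGSEGV == 11);
    assert(SEGV_MAPERR == 1);
    assert(pid_view_of_fault_addr(0x1000) == 4096);
}
```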

Differential Revision: D92093984

pytorch-bot · 2026-02-03T23:32:26Z

This appears to be a diff that was exported from phabricator, but the PR author does not have sufficient permissions to run CI. @Jlalond, please do step 2 of internal wiki to get write access so you do not need to get CI approvals in the future. If you think this is a mistake, please contact the Pytorch Dev Infra team.

linux-foundation-easycla · 2026-02-03T23:32:29Z

The committers listed above are authorized under a signed CLA.

✅ login: Jlalond / name: Jacob Lalonde (dc95547)

pytorch-bot · 2026-02-03T23:32:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174247

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit dc95547 with merge base 91ee748:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot · 2026-02-03T23:32:32Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync · 2026-02-03T23:32:35Z

@Jlalond has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92093984.

meta-codesync · 2026-02-04T01:02:12Z

@Jlalond has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92093984.

Reviewed By: yfeldblum, qxy11

facebook-github-bot · 2026-02-08T01:11:53Z

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot · 2026-02-08T01:13:52Z

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


pytorch-bot · 2026-02-08T01:13:56Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

izaitsevfb · 2026-02-08T09:07:36Z

@pytorchbot merge

pytorchmergebot · 2026-02-08T09:09:36Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here.

Pull Request resolved: pytorch#174247
Approved by: https://github.com/Skylion007

meta-codesync bot added fb-exported meta-exported labels Feb 3, 2026

Jlalond requested a review from mrajpal February 4, 2026 00:53

Jlalond force-pushed the export-D92093984 branch from c07110e to d098629 February 4, 2026 01:02

Skylion007 approved these changes Feb 4, 2026

pytorch-bot bot added the ciflow/trunk label Feb 4, 2026

Jlalond force-pushed the export-D92093984 branch from d098629 to 3e1f9d9 February 4, 2026 23:27

Jlalond force-pushed the export-D92093984 branch from 3e1f9d9 to feed064 February 5, 2026 00:00

Jlalond force-pushed the export-D92093984 branch from feed064 to 058e8a0 February 5, 2026 23:18

Jlalond force-pushed the export-D92093984 branch from 058e8a0 to dc95547 February 6, 2026 19:25

pytorchmergebot added the merging label Feb 8, 2026

pytorchmergebot removed the merging label Feb 8, 2026

izaitsevfb added the topic: not user facing label Feb 8, 2026

pytorchmergebot added the merging label Feb 8, 2026

pytorchmergebot closed this in f7de554 Feb 8, 2026

pytorchmergebot added Merged and removed merging labels Feb 8, 2026

Jlalond wants to merge 1 commit into pytorch:main from Jlalond:export-D92093984


5 participants
