Use weak reference to break circular reference and memory leaks#3447

Merged
vfdev-5 merged 7 commits into pytorch:master from goanpeca:fix/oom
Sep 19, 2025
Conversation

@goanpeca
Collaborator

@goanpeca goanpeca commented Sep 4, 2025

Fixes #3438


@vfdev-5 I can confirm this PR fixes the issue.

[Image: Screenshot 2025-09-05 at 10 21 49 PM]

@github-actions github-actions bot added the module: engine Engine module label Sep 4, 2025
@github-actions github-actions bot added module: metrics Metrics module module: handlers Core Handlers module labels Sep 6, 2025
@goanpeca goanpeca force-pushed the fix/oom branch 3 times, most recently from 918bbd5 to c359e0c Compare September 6, 2025 03:20
@goanpeca goanpeca changed the title Fix oom in python 3.12+, add close method to engine and context manager Use weak reference to break circular reference and memory leaks Sep 6, 2025
@goanpeca goanpeca marked this pull request as ready for review September 6, 2025 14:28
Copilot AI review requested due to automatic review settings September 6, 2025 14:28

@goanpeca goanpeca requested a review from vfdev-5 September 6, 2025 15:41
@goanpeca goanpeca force-pushed the fix/oom branch 3 times, most recently from 68a568a to f3d6e85 Compare September 6, 2025 20:05
@vfdev-5
Collaborator

vfdev-5 commented Sep 6, 2025

Confirming from my side as well that there is no OOM anymore with this PR:

!!! 0 3163603968 7885418496
!!! 1 3163603968 7885418496
!!! 2 3163603968 7885418496
!!! 3 3163603968 7885418496
!!! 4 3163603968 7885418496
!!! 5 3163603968 7885418496
!!! 6 3163603968 7885418496
!!! 7 3163603968 7885418496
!!! 8 3163603968 7885418496
!!! 9 3163603968 7885418496
!!! 10 3163603968 7885418496
!!! 11 3163603968 7885418496
!!! 12 3163603968 7885418496
^CTraceback (most recent call last):
  File "/ignite/tmp/check_oom.py", line 47, in <module>
    main()
  File "/ignite/tmp/check_oom.py", line 41, in main
    model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 32, N), nn.ReLU(), nn.Linear(N, 1))
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 112, in __init__
    self.reset_parameters()
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 118, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/init.py", line 518, in kaiming_uniform_
    return tensor.uniform_(-bound, bound, generator=generator)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

And there is an OOM with Python 3.11:

python -VV
Python 3.11.11 | packaged by conda-forge | (main, Dec  5 2024, 14:17:24) [GCC 13.3.0]

pip list | grep torch
pytorch-ignite            0.6.0        /ignite
torch                     2.6.0+cu126

@goanpeca goanpeca force-pushed the fix/oom branch 5 times, most recently from 2c4a36d to 0a427c6 Compare September 7, 2025 18:53
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a memory leak issue in the Ignite Engine by replacing direct engine references with weak references in event handlers. The circular reference between the engine and its event handlers was preventing proper garbage collection.

  • Replaces direct engine references with weak references in event handler storage
  • Updates event firing logic to resolve weak references at runtime
  • Adds comprehensive test coverage to verify the memory leak fix

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| ignite/engine/engine.py | Updates event handler registration and firing to use weak references instead of direct engine references |
| tests/ignite/engine/test_memory_leaks.py | Adds new test class to verify that engines are properly garbage collected with and without event handlers |
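
For readers unfamiliar with the pattern, the weak-reference idea described in the overview above, together with the kind of garbage-collection check the new tests are described as performing, can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `TinyEngine`, `add_handler`, `fire`, `count`, and `freed_without_gc` are hypothetical names, and the check assumes CPython, where an object with no remaining strong references is freed immediately by reference counting.

```python
import gc
import weakref


class TinyEngine:
    """Hypothetical stand-in for an engine; not Ignite's actual class."""

    def __init__(self, use_weakref):
        self.use_weakref = use_weakref
        self.handlers = []
        self.calls = 0

    def add_handler(self, fn):
        # Storing `self` directly creates a cycle: engine -> handlers -> engine.
        # Storing weakref.ref(self) does not keep the engine alive.
        owner = weakref.ref(self) if self.use_weakref else self
        self.handlers.append((fn, owner))

    def fire(self):
        for fn, owner in self.handlers:
            # Resolve the weak reference at call time.
            engine = owner() if isinstance(owner, weakref.ref) else owner
            if engine is not None:
                fn(engine)


def count(engine):
    engine.calls += 1


def freed_without_gc(use_weakref):
    """Return True if the engine is freed by reference counting alone."""
    gc.disable()  # keep the cyclic collector from interfering with the check
    try:
        engine = TinyEngine(use_weakref)
        engine.add_handler(count)
        engine.fire()
        assert engine.calls == 1  # the handler still receives the engine
        probe = weakref.ref(engine)
        del engine
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # clean up the cycle left by the strong-reference case


if __name__ == "__main__":
    print("weak reference, freed immediately:", freed_without_gc(True))
    print("strong reference, freed immediately:", freed_without_gc(False))
```

In the strong-reference variant the engine survives `del` until the cyclic garbage collector runs, so anything it holds stays alive in the meantime. In the weak-reference variant it is released as soon as the last external reference goes away.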

@vfdev-5
Copy link
Copy Markdown
Collaborator

vfdev-5 commented Sep 15, 2025

@goanpeca can you please check why the CI is all red?

Collaborator

@vfdev-5 vfdev-5 left a comment

LGTM

@vfdev-5 vfdev-5 enabled auto-merge September 19, 2025 10:43
@vfdev-5 vfdev-5 added this pull request to the merge queue Sep 19, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Sep 19, 2025
@vfdev-5 vfdev-5 added this pull request to the merge queue Sep 19, 2025
@vfdev-5 vfdev-5 removed this pull request from the merge queue due to a manual request Sep 19, 2025
@vfdev-5 vfdev-5 merged commit dcac448 into pytorch:master Sep 19, 2025
26 checks passed
@goanpeca goanpeca deleted the fix/oom branch September 19, 2025 18:20
Labels

module: engine Engine module module: handlers Core Handlers module module: metrics Metrics module

Development

Successfully merging this pull request may close these issues.

OOM in Python 3.12, but not in 3.10

3 participants