@zhangyuqin1998
Contributor

@zhangyuqin1998 zhangyuqin1998 commented Sep 22, 2025

PR Category

Distributed Strategy

PR Types

Improvements

Description

Pass a local function to forward_backward_pipeline instead of the dataset itself. Passing the dataset as a direct argument to forward_backward_pipeline creates additional references to it that cannot be released, which leads to GPU memory leaks.
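
The reference-holding behaviour described above can be illustrated with a minimal pure-Python sketch. It is not the code from this PR: `FakeDataset`, `make_data_fn`, `run_with_dataset`, and `run_with_data_fn` are hypothetical stand-ins, and the sketch assumes CPython reference counting and a caller that drops its own handle on the dataset.

```python
import gc
import weakref


class FakeDataset:
    """Hypothetical stand-in for a dataset that owns large (e.g. GPU) buffers."""

    def __init__(self, n):
        self.batches = list(range(n))


def run_with_dataset(dataset, probe):
    """Stand-in pipeline that receives the dataset directly."""
    for _ in dataset.batches:
        pass
    # The `dataset` parameter (and the caller) still pin the object here.
    return probe() is not None


def make_data_fn(dataset):
    """Wrap the dataset in a local function that returns one micro-batch per call."""
    state = {"dataset": dataset, "index": 0}

    def next_micro_batch():
        ds = state["dataset"]
        if ds is None:
            raise StopIteration
        batch = ds.batches[state["index"]]
        state["index"] += 1
        if state["index"] >= len(ds.batches):
            # Assumed cleanup: drop the only remaining reference after the last batch.
            state["dataset"] = None
        return batch

    return next_micro_batch


def run_with_data_fn(data_fn, steps, probe):
    """Stand-in pipeline that only ever sees the callable."""
    for _ in range(steps):
        data_fn()
    return probe() is not None


# Direct argument: the dataset stays alive for the whole call.
direct = FakeDataset(4)
alive_direct = run_with_dataset(direct, weakref.ref(direct))

# Local function: the dataset can be released while the pipeline is still running.
wrapped = FakeDataset(4)
probe = weakref.ref(wrapped)
data_fn = make_data_fn(wrapped)
del wrapped  # the caller keeps only the callable
gc.collect()
alive_closure = run_with_data_fn(data_fn, 4, probe)
```

In this sketch the pipeline never holds the dataset itself, so the wrapper alone decides when the last reference goes away.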

pcard-76459

@paddle-bot

paddle-bot bot commented Sep 22, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@zhangyuqin1998
Contributor Author

/rerun all-failed

4 similar comments

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@1aad21e). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             develop    #75446   +/-   ##
===========================================
  Coverage           ?   100.00%           
===========================================
  Files              ?         1           
  Lines              ?         9           
  Branches           ?         0           
===========================================
  Hits               ?         9           
  Misses             ?         0           
  Partials           ?         0           

micro_batch_data = self._load_micro_batch(self._index)
self._index += 1

if self._index >= self._acc_steps:
Member

Doesn't the code above already raise StopIteration? Is there a problem with this check?
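
One possible reading of the question is sketched below. The attribute names `_index`, `_acc_steps`, and `_load_micro_batch` come from the snippet under review; the class name, the guard at the top of `__next__`, and the body of the second check are assumptions, since the page does not show them. In this assumed layout the two checks are not redundant: the guard fires on the call after the last micro-batch, while the post-increment check fires on the last valid call itself.

```python
class MicroBatchIterator:
    """Hypothetical iterator; method bodies are assumed, not taken from the PR."""

    def __init__(self, data, acc_steps):
        self._data = data
        self._acc_steps = acc_steps
        self._index = 0

    def __iter__(self):
        return self

    def _load_micro_batch(self, index):
        return self._data[index]

    def __next__(self):
        # Guard: rejects any call made after acc_steps micro-batches were served.
        if self._index >= self._acc_steps:
            raise StopIteration

        micro_batch_data = self._load_micro_batch(self._index)
        self._index += 1

        # Post-increment check: true exactly once, on the last valid call,
        # which the guard above lets through.
        if self._index >= self._acc_steps:
            self._data = None  # assumed cleanup so the data can be freed early

        return micro_batch_data


batches = list(MicroBatchIterator([10, 20, 30], acc_steps=3))

it = MicroBatchIterator([1, 2], acc_steps=2)
first, second = next(it), next(it)
released_after_last = it._data is None
```

Whether the check in the actual change is redundant depends on its body, which is not visible here.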

Member

@ForFishes ForFishes left a comment

LGTM

@ForFishes ForFishes merged commit f0523fe into PaddlePaddle:develop Sep 23, 2025
52 checks passed
@zhangyuqin1998 zhangyuqin1998 deleted the pr/pp_dataset_dev branch October 25, 2025 16:57
zhangyuqin1998 added a commit to zhangyuqin1998/Paddle that referenced this pull request Nov 6, 2025
sneaxiy pushed a commit that referenced this pull request Nov 8, 2025
…ipeline parallel (#76260)

* [Distributed] Add PipelineDatasetPreprocessor to aviod mem leaks in pipeline parallel (#75446)

* [Bug fix] Fix bugs for dualpipe when using PipelineDatasetPreprocessor (#76212)