【Inference Optimize】Calculate paddle_peak_increase using paddle_allocated_mem_after_run #4355
Calculate `paddle_peak_increase` using `paddle_allocated_mem_after_run`:

`paddle_peak_increase = paddle_allocated_mem_after_run - paddle_allocated_mem_before_run`

This way `paddle_peak_increase` is closer to the actual situation.
The `reserved` value reports the current memory size managed by the Allocator, while `allocated` reports the current memory size allocated to Tensors. The difference we are concerned with here is the memory allocated to live Tensors before and after profiling, so in theory `paddle_allocated_mem_after_run - paddle_allocated_mem_before_run` meets the requirement. Using `reserved` may make the calculated difference artificially high, leaving a smaller memory budget for the kv-cache.
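The effect described above can be sketched with hypothetical numbers. All values below are made up for illustration; the variable names follow the ones in this PR, but the snapshot values are assumptions, not real measurements:

```python
# Hypothetical memory snapshots (in bytes). `allocated` counts only memory
# held by live Tensors; `reserved` also includes cached blocks the Allocator
# keeps around. All numbers are invented for illustration.

paddle_allocated_mem_before_run = 2_000_000_000  # live Tensor memory before profiling
paddle_allocated_mem_after_run = 2_500_000_000   # live Tensor memory after profiling

paddle_reserved_mem_before_run = 2_200_000_000   # Allocator-managed memory before
paddle_reserved_mem_after_run = 3_100_000_000    # includes blocks cached during the run

# Proposed calculation: only the growth in memory actually held by Tensors.
paddle_peak_increase = paddle_allocated_mem_after_run - paddle_allocated_mem_before_run

# Reserved-based difference: inflated by Allocator caching.
reserved_increase = paddle_reserved_mem_after_run - paddle_reserved_mem_before_run

print(paddle_peak_increase)  # 500000000
print(reserved_increase)     # 900000000
```

In this sketch the reserved-based figure is 400 MB larger than the allocated-based one, so a sizing formula that subtracts the peak increase from total memory would reserve correspondingly less for the kv-cache.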