Force use_cache to be False in PyTorch #15385
Conversation
The documentation is not available anymore as the PR was closed or merged.
Hmm, I think we can leave those statements @ydshieh, no? They don't do any harm, but reduce the memory footprint slightly by not creating the past tensors. The reduced memory footprint might however be negligible - in that case, it's ok for me to remove them. For more context, I think the reason we've added...
Keen to hear @Rocketknight1's and @gante's opinions here.
I totally agree. My purpose is to make PT/TF return the same results, not to prescribe a specific way of using them. I can do it in another way: keep the TF code as it is, but add `use_cache=False` in the PyTorch models when `labels` is passed. I believe a thorough test will be beneficial to HF's transformers - I have used it to identify several other issues that require real fixes (unlike this one).
Given the mismatches that @ydshieh has found so far, I'm leaning towards approving the PR so we can do more thorough testing. Can we get some numbers (memory utilization and runtime, before and after the change), so we can make an informed decision? If the difference turns out to be small, then I believe it is a no-brainer :)
I could try to measure it. But I prefer to go another way: add `use_cache=False` for PyTorch models if `labels` is passed.
Hi, using one setting I calculated the number of float32 values saved by not creating the past tensors; I tried other settings as well, and the count scales linearly with them. I am using PT's BART for the experimentation.
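For reference, here is a minimal sketch of that kind of estimate (the helper name and the config numbers are assumptions, roughly matching bart-base; the actual measurement script is not shown in this PR):

```python
# Rough estimate of how many float32 values `past_key_values` holds for a
# BART-like seq2seq model. The config values below are assumptions
# (approximately bart-base), not numbers taken from the PR.

def past_key_values_float32_count(batch_size, tgt_len, src_len,
                                  num_decoder_layers=6, num_heads=12, head_dim=64):
    # Each decoder layer caches four tensors:
    #   self-attention key/value:  (batch, heads, tgt_len, head_dim)
    #   cross-attention key/value: (batch, heads, src_len, head_dim)
    per_layer = (2 * batch_size * num_heads * tgt_len * head_dim
                 + 2 * batch_size * num_heads * src_len * head_dim)
    return num_decoder_layers * per_layer

n = past_key_values_float32_count(batch_size=8, tgt_len=128, src_len=128)
print(f"{n:,} float32 values ≈ {n * 4 / 2**20:.1f} MiB")
```

The count grows linearly in batch size and in both sequence lengths, which matches the observation above.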
The savings could be helpful, especially on larger models. I'm in favour of setting `use_cache=False`.
The branch was force-pushed from d5111ab to 43c8452.
I reverted the changes done in the TF models, and added `use_cache=False` in the PyTorch models when `labels` is passed.
I think we should give a warning only if the user explicitly passes `use_cache=True`. Or, we shouldn't set `use_cache=False` in that case at all. Since the changes are in the PT models now, cc @LysandreJik @patrickvonplaten @sgugger
Yes, I very much agree, that's a nice insight!
I will change the title, but I haven't done anything yet for the warning (if users explicitly specify `use_cache=True`).
Think we can do it in this PR.
Added warning. (We can add the same warning for TF models in a follow-up PR.)
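For readers following along, here is a minimal generic sketch of the resulting pattern (simplified; in the PR the equivalent lines are added to each PyTorch `...ForConditionalGeneration.forward`, and the exact message text may differ):

```python
from transformers.utils import logging

logger = logging.get_logger(__name__)

def forward(self, input_ids=None, labels=None, use_cache=None, **kwargs):
    if labels is not None:
        if use_cache:
            # Only warn when the caller explicitly asked for the cache.
            logger.warning(
                "The `use_cache` argument is changed to `False` since `labels` is provided."
            )
        # Training with labels does not need the decoder cache, so skip building it.
        use_cache = False
    # ... rest of the forward pass (omitted)
```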
patil-suraj left a comment:
LGTM!
Great PR @ydshieh - thanks a lot!
* use_cache = False for PT models if labels is passed
* Fix for BigBirdPegasusForConditionalGeneration
* add warning if users specify use_cache=True
* Use logger.warning instead of warnings.warn

Co-authored-by: ydshieh <[email protected]>
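The last item swaps Python's `warnings` module for transformers' own logger; a small illustrative comparison (the message string here is just an example of the kind of warning involved):

```python
import warnings
from transformers.utils import logging

logger = logging.get_logger(__name__)

message = "The `use_cache` argument is changed to `False` since `labels` is provided."

# Before: routed through Python's warnings machinery.
warnings.warn(message)

# After: routed through the library logger, so it respects
# transformers.logging.set_verbosity_*() like other library warnings.
logger.warning(message)
```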
What does this PR do?

In `TFxxxForConditionalGeneration` models, `use_cache` is set to `False` when `labels` is passed, while there is no such change in the `xxxForConditionalGeneration` models. This would fail a more complete PT/TF equivalence test.

I understand why the TF models do this (it might be beneficial for efficiency).

We can do it in another way: add `use_cache=False` for PyTorch models if `labels` is passed. Let me know which way HF prefers.
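As a quick illustration of the behaviour this PR aims for (a hypothetical check, assuming the `facebook/bart-base` checkpoint is available; this snippet is not part of the PR's tests): once `use_cache` is forced to `False` whenever `labels` is passed, the PyTorch model stops returning cache tensors, matching the TF side.

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

inputs = tokenizer("Hello world", return_tensors="pt")
labels = tokenizer("Bonjour le monde", return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model(**inputs, labels=labels)

# With `labels` provided, `use_cache` is forced to False, so no past tensors are built.
print(outputs.past_key_values)  # expected: None
```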