do_lower_case not saved/loaded correctly for Tokenizers #8001
Closed
Description
Environment info
- transformers version: 3.4.0
- Platform: Linux-5.4.0-52-generic-x86_64-with-debian-buster-sid
- Python version: 3.7.6
- PyTorch version (GPU?): 1.5.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
Information
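The do_lower_case property of BertTokenizer is not correctly restored after a save_pretrained / from_pretrained round trip: a tokenizer loaded from bert-base-cased reports do_lower_case as False, but the copy reloaded from the saved directory reports True.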
To reproduce
from transformers import BertTokenizer
# Load the cased checkpoint and print its lowercasing flag
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
print(tokenizer.do_lower_case)
# Save to a local directory, reload from it, and print the flag again
tokenizer.save_pretrained("debug_tokenizer")
tokenizer_loaded = BertTokenizer.from_pretrained("debug_tokenizer")
print(tokenizer_loaded.do_lower_case)
returns
False
True
Expected behavior
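The tokenizer should have the same object attributes after saving and loading; in the example above, do_lower_case should still be False on the reloaded tokenizer.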
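To make the expected behavior checkable, here is a minimal sketch of a round-trip comparison. The helper name `diff_attributes` is hypothetical (it is not part of transformers), and the two `SimpleNamespace` objects are stand-ins that only mimic the values printed in the report, so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def diff_attributes(before, after, names):
    """Return {name: (before_value, after_value)} for attributes that differ.

    Hypothetical helper for illustration; missing attributes compare as None.
    """
    mismatches = {}
    for name in names:
        old = getattr(before, name, None)
        new = getattr(after, name, None)
        if old != new:
            mismatches[name] = (old, new)
    return mismatches


# Stand-ins for `tokenizer` and `tokenizer_loaded` from the reproduction,
# carrying the values the report observed (assumption: no other attributes).
original = SimpleNamespace(do_lower_case=False)
reloaded = SimpleNamespace(do_lower_case=True)

print(diff_attributes(original, reloaded, ["do_lower_case"]))
# prints {'do_lower_case': (False, True)}
```

With the real objects, one would pass `tokenizer` and `tokenizer_loaded` in place of the stand-ins; an empty dictionary would mean the listed attributes survived the save / load round trip.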