do_lower_case not saved/loaded correctly for Tokenizers #8001

@tholor

Environment info

  • transformers version: 3.4.0
  • Platform: Linux-5.4.0-52-generic-x86_64-with-debian-buster-sid
  • Python version: 3.7.6
  • PyTorch version (GPU?): 1.5.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

@mfuntowicz

Information

The do_lower_case property of BertTokenizer is not restored correctly after a save_pretrained / from_pretrained round trip.

To reproduce

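from transformers import BertTokenizer

# Load the cased model's tokenizer; it should not lowercase its input
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
print(tokenizer.do_lower_case)

# Save to a local directory, reload from it, and check the same property
tokenizer.save_pretrained("debug_tokenizer")
tokenizer_loaded = BertTokenizer.from_pretrained("debug_tokenizer")
print(tokenizer_loaded.do_lower_case)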

returns

False
True

Expected behavior

The loaded tokenizer should have the same object attributes as the one that was saved; here, do_lower_case should still be False after loading.
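
One way to check this expectation more generally is to compare a list of attributes before and after the round trip. The sketch below is only an illustration: diff_attributes is a hypothetical helper, not part of transformers, and the two SimpleNamespace objects are stand-ins that mirror the values printed above so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def diff_attributes(before, after, names):
    """Return {name: (before_value, after_value)} for attributes that differ."""
    # Sentinel so an attribute missing on only one side counts as a difference.
    missing = object()
    diffs = {}
    for name in names:
        old = getattr(before, name, missing)
        new = getattr(after, name, missing)
        if old != new:
            diffs[name] = (old, new)
    return diffs


# Stand-ins (assumption) mirroring the values reported in this issue.
original = SimpleNamespace(do_lower_case=False)
reloaded = SimpleNamespace(do_lower_case=True)

print(diff_attributes(original, reloaded, ["do_lower_case"]))
```

With transformers installed, tokenizer and tokenizer_loaded from the reproduction could be passed in place of the stand-ins; an empty dict would mean the listed attributes survived saving and loading.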
