This repository was archived by the owner on Apr 8, 2025. It is now read-only.

Bump transformers version to 3.4.0 #598

Closed
tholor wants to merge 3 commits into master from transformers_3.4.0

@tholor
Member

tholor commented Oct 21, 2020

No description provided.

tholor added this to the #3 milestone Oct 21, 2020
@lalitpagaria
Contributor

Thanks @tholor for the PR. We need the latest changes because of the many RAG-related bug fixes.
https://github.com/huggingface/transformers/releases/tag/v3.4.0

@tholor
Member Author

tholor commented Oct 23, 2020

There seems to be a bug in transformers when saving/loading the do_lower_case attribute of the tokenizers: huggingface/transformers#8001

We should not upgrade before we have a clear fix / workaround.
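A minimal sketch of how this regression could be checked automatically before an upgrade, assuming only that the tokenizer class exposes from_pretrained and save_pretrained the way BertTokenizer and BertTokenizerFast do. The helper name lowercase_survives_roundtrip is hypothetical and not part of FARM or transformers.

```python
import tempfile


def lowercase_survives_roundtrip(tokenizer_cls, model_name, **init_kwargs):
    """Return True if do_lower_case has the same value after save/load.

    Hypothetical helper: tokenizer_cls is any class with the
    from_pretrained / save_pretrained interface (e.g. BertTokenizer).
    """
    tokenizer = tokenizer_cls.from_pretrained(model_name, **init_kwargs)
    before = tokenizer.do_lower_case

    # Save to a throwaway directory and reload WITHOUT extra kwargs,
    # so the reloaded value can only come from what was written to disk.
    with tempfile.TemporaryDirectory() as tmp_dir:
        tokenizer.save_pretrained(tmp_dir)
        reloaded = tokenizer_cls.from_pretrained(tmp_dir)
        after = reloaded.do_lower_case

    return before == after
```

With transformers installed, calling it as lowercase_survives_roundtrip(BertTokenizer, "bert-base-cased", do_lower_case=True) would download the model and report whether the flag survives on the installed version; running it for both the slow and the fast class covers the two cases discussed in this thread.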

@lalitpagaria
Contributor

@tholor, could we use BertTokenizerFast instead? It keeps do_lower_case across save/load:

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
print(tokenizer.do_lower_case)

tokenizer.save_pretrained("debug_tokenizer")

tokenizer_loaded = BertTokenizerFast.from_pretrained("debug_tokenizer")
print(tokenizer_loaded.do_lower_case)

# Prints:
# False
# False

The same holds when do_lower_case=True is set explicitly:

from transformers import BertTokenizerFast

# Override the cased model's default and check that the override survives
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased", do_lower_case=True)
print(tokenizer.do_lower_case)

tokenizer.save_pretrained("debug_tokenizer")

tokenizer_loaded = BertTokenizerFast.from_pretrained("debug_tokenizer")
print(tokenizer_loaded.do_lower_case)

# Prints:
# True
# True

@tholor
Member Author

tholor commented Oct 23, 2020

Yeah, but the slow ones are failing! We can't release FARM with such a nasty bug. Many deployments still rely on the slow tokenizers, and not everybody can install the fast tokenizers due to Rust issues.
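Until an upstream fix is released, one possible workaround that also covers the slow tokenizers is to record do_lower_case separately and pass it back explicitly on load. This sketch assumes, as in transformers 3.x, that keyword arguments given to from_pretrained override values read from the saved tokenizer config; the sidecar file name and both helper functions are hypothetical, not FARM or transformers API.

```python
import json
import os

# Hypothetical sidecar file name; not part of transformers or FARM.
SIDECAR_NAME = "lowercase_setting.json"


def save_with_lowercase(tokenizer, save_dir):
    """Save a tokenizer and record its do_lower_case flag next to it."""
    os.makedirs(save_dir, exist_ok=True)
    tokenizer.save_pretrained(save_dir)
    with open(os.path.join(save_dir, SIDECAR_NAME), "w") as f:
        json.dump({"do_lower_case": tokenizer.do_lower_case}, f)


def load_with_lowercase(tokenizer_cls, save_dir, **kwargs):
    """Reload a tokenizer, passing the recorded flag as an explicit kwarg.

    An explicit do_lower_case from the caller takes priority; otherwise the
    value from the sidecar file is used, regardless of what the saved
    tokenizer config contains.
    """
    sidecar = os.path.join(save_dir, SIDECAR_NAME)
    if os.path.exists(sidecar) and "do_lower_case" not in kwargs:
        with open(sidecar) as f:
            kwargs["do_lower_case"] = json.load(f)["do_lower_case"]
    return tokenizer_cls.from_pretrained(save_dir, **kwargs)
```

Because the flag travels in its own file, the same helpers work for BertTokenizer and BertTokenizerFast and do not depend on what save_pretrained writes in a given transformers release.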

@lalitpagaria
Contributor

Agreed, I think it's better to wait for a fix or workaround.
Anyway, Haystack's RAG works with 3.3.1.

@tholor
Member Author

tholor commented Oct 27, 2020

It's now fixed upstream in transformers: huggingface/transformers#8001

@lalitpagaria
Contributor

What do you suggest for the RAG release?

  • Wait for a new transformers release (>3.4.0), then release a new FARM version
    OR
  • Release FARM with transformers 3.3.1, then release another FARM version later.

@tholor
Member Author

tholor commented Oct 28, 2020

We are currently preparing the release. Just a few more PRs we want to get in.
The plan is to release tomorrow or Friday. If transformers doesn't publish a release before then, we'll go with 3.3.1 for now.

@tholor
Member Author

tholor commented Nov 4, 2020

We'll skip transformers 3.4.0 and wait for the next release that includes a fix for the above-mentioned lowercase bug.
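Skipping a release can also be enforced at runtime. A minimal sketch, assuming the check is fed transformers.__version__; the helper names are hypothetical, and only 3.4.0 is blocked because it is the version discussed in this thread, not because later releases are known to contain the fix.

```python
import re

# Versions to refuse; 3.4.0 loses do_lower_case on save/load (see above).
BLOCKED_VERSIONS = {(3, 4, 0)}


def version_tuple(version):
    """Turn '3.4.0' into (3, 4, 0); suffixes like 'dev0' or 'rc1' are dropped."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits of each component, e.g. "0rc1" -> 0.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as "3.4" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_supported_transformers(version):
    """Return False for transformers versions we have decided to skip."""
    return version_tuple(version) not in BLOCKED_VERSIONS
```

In a pip requirements file, the equivalent constraint is a PEP 440 exclusion such as transformers!=3.4.0.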

tholor closed this Nov 4, 2020


2 participants