Squad processor verbose feature by kolk · Pull Request #470 · deepset-ai/FARM

kolk · 2020-07-22T10:59:43Z

Added verbose to SquadProcessor to remove max_seq_length tokenizer warning for MiniLM

Timoeller

Hey, this looks like a solid solution for suppressing the warnings. I just had a couple of ideas to improve on it, since I would rather not add another parameter to the processor. Furthermore, this parameter does not suppress FARM's own processing output, which a user would expect it to do.

So let's tackle the root cause, because we know that the input to tokenizer.encode_plus will be long. We can set "truncation_strategy" to "do_not_truncate" and avoid the warning. Could you test whether that is actually the case, @kolk?

Timoeller · 2020-07-22T15:01:01Z
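One way to check the suggestion above is to count the warnings a call emits with and without the setting. The sketch below is only an illustration: `toy_encode_plus`, the `toy_tokenizer` logger name, and `count_warnings` are hypothetical stand-ins, not the transformers or FARM implementation, and it assumes the warning is raised only on the truncation path.

```python
import logging

logger = logging.getLogger("toy_tokenizer")  # hypothetical logger name


def toy_encode_plus(tokens, max_length, truncation_strategy="longest_first"):
    """Toy stand-in for a tokenizer's encode_plus (an assumption, not the real API)."""
    if truncation_strategy == "do_not_truncate":
        # The caller deals with long inputs itself (e.g. by splitting them
        # into passages), so nothing is cut and nothing is reported.
        return list(tokens)
    if len(tokens) > max_length:
        logger.warning(
            "Input of %d tokens exceeds max_length=%d; truncating.",
            len(tokens), max_length,
        )
        return list(tokens[:max_length])
    return list(tokens)


class ListHandler(logging.Handler):
    """Collects log records so that a check can count them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def count_warnings(fn, *args, **kwargs):
    """Run fn and return (result, number of records at WARNING level or above)."""
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        result = fn(*args, **kwargs)
    finally:
        logger.removeHandler(handler)
    n = sum(1 for r in handler.records if r.levelno >= logging.WARNING)
    return result, n


long_doc = ["tok"] * 600  # longer than the assumed max_length of 384
_, with_default = count_warnings(toy_encode_plus, long_doc, 384)
_, with_no_truncation = count_warnings(
    toy_encode_plus, long_doc, 384, truncation_strategy="do_not_truncate"
)
```

The same counting approach could be pointed at the real tokenizer's logger to confirm whether the warning disappears, without adding a verbose flag to the processor.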

farm/modeling/language_model.py

                    language_model_class = 'Electra'
                elif "word2vec" in pretrained_model_name_or_path.lower() or "glove" in pretrained_model_name_or_path.lower():
                    language_model_class = 'WordEmbedding_LM'
+                elif "minilm" in pretrained_model_name_or_path.lower():


Timoeller · Jul 22, 2020

I don't understand why this is in here. Shouldn't MiniLM already be in master?

kolk · Jul 22, 2020

Yes, it is weird. I'll double-check this and fix it.

kolk · Jul 22, 2020

Regarding the tokenizer warnings, I'll test out the truncation_strategy argument.

kolk added 5 commits July 15, 2020 21:22

MiniLM support in tokenization and language_model

6ce374d

Revert "MiniLM support in tokenization and language_model"

4e0df95

This reverts commit 6ce374d.

MiniLM support for FARM

d86e30f

Merge branch 'master' of https://github.com/deepset-ai/FARM into minilm_support

089247c

verbose argument added to SquadProcessor

644f0b7

kolk requested a review from Timoeller July 22, 2020 10:59

Timoeller suggested changes Jul 22, 2020

truncation_strategy set to do_not_truncate

68a7357

kolk requested a review from Timoeller July 23, 2020 17:53

Timoeller added 3 commits July 27, 2020 10:37

remove verbose param

d02397c

Merge remote-tracking branch 'origin/master' into SquadProcessor_verbose_feature

9195680

remove verbose param again in tokenization

fc33a42

Timoeller merged commit 3cb8e4b into master Jul 27, 2020

Timoeller merged 9 commits into master from SquadProcessor_verbose_feature