Refactoring Processor for LM Finetuning (FastTokenizers) by tholor · Pull Request #659 · deepset-ai/FARM

tholor · 2020-12-18T10:29:48Z

Moving to fast tokenizers and new preprocessing stages ...

Breaking change: Slow tokenizers won't be supported any more for this type of task

Timoeller

Looks good.

Please also fix the LM finetuning tests (if they are failing).

We need to improve get_start_of_words in an upcoming release (Branden also has a version implemented as a function of a single processor).
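For context on what a start-of-word helper computes, here is a minimal sketch, not the code from this PR. It assumes the per-token word indices are already available as a list (for example from a Hugging Face fast tokenizer's `word_ids()`, where special tokens map to `None`). The function name `start_of_word_flags` is hypothetical and is not FARM's `get_start_of_words`.

```python
from typing import List, Optional


def start_of_word_flags(word_ids: List[Optional[int]]) -> List[int]:
    """Return 1 for each token that begins a new word, else 0.

    Assumption: `word_ids` holds one entry per token, with None for
    special tokens such as [CLS] and [SEP].
    """
    flags = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens never start a word.
            flags.append(0)
        else:
            # A token starts a word when its word index differs from
            # the index of the token immediately before it.
            flags.append(1 if word_id != previous else 0)
        previous = word_id
    return flags


# Hypothetical word indices: [CLS], word 0, word 1 split in two, word 2, [SEP]
example = [None, 0, 1, 1, 2, None]
print(start_of_word_flags(example))  # [0, 1, 1, 0, 1, 0]
```

Flags of this kind are typically used for whole-word masking, where every subword of a selected word is masked together.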

tholor added 6 commits December 14, 2020 09:19

WIP lm finetuning refactoring

cbe0e8e

WIP refactoring bert style lm

e82940b

first working version of bert_style_lm

1776162

optimize speed of mask_random_words

d85c2af

merge latest refactor branch

b3582a6

move get_start_of_words to tokenization module

8adfea7

tholor added the breaking change label Dec 18, 2020

tholor mentioned this pull request Dec 18, 2020

lm_finetuning.py is getting stuck on my machine #652

Closed

tholor added 3 commits December 18, 2020 13:50

Update docstrings. fix estimation

82180c7

add multithreading_rust arg

0f9ecce

fix import. fix vocab index out of range

20b7918

Timoeller approved these changes Dec 18, 2020

tholor added 3 commits December 18, 2020 20:15

fix empty sequence b

e24d40d

make bert-style to new default for lm finetuning. disable eval_report

30b29df

change evaluate_every to 1000

3f40691

tholor merged commit 803b41b into refactor_processor_qa Dec 21, 2020

tholor merged 12 commits into refactor_processor_qa from refactor_processor_lm_fine
