Hi guys,
I'm trying to further train XLMRoBERTa and I got the following error:
AssertionError: Vocab size of tokenizer 250002 doesn't match with model 250005. If you added a custom vocabulary to the tokenizer, make sure to supply 'n_added_tokens' to LanguageModel.load() and BertStyleLM.load()
So I looked into language_model.py and found:

if language_model_class == 'XLMRoberta':
    # TODO: for some reason, the pretrained XLMRoberta has different vocab size
    # in the tokenizer compared to the model; this is a hack to resolve that
    n_added_tokens = 3
That hack, on line 155, is unnecessary now.
If you try to load a standard XLMRoBERTa in FARM, it no longer works because of this line: a fix has been made in transformers, so the tokenizer and model vocab sizes now match and the hard-coded offset of 3 causes the assertion to fail. Can you update this please? Thanks a lot!
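For context, here is a minimal sketch of what I assume the consistency check is doing, and why the hard-coded offset now breaks loading. The function name and exact logic are my own simplification for illustration, not FARM's actual code:

```python
def vocab_sizes_match(tokenizer_size: int, model_size: int, n_added_tokens: int = 0) -> bool:
    """Simplified stand-in (an assumption, not FARM's exact code) for the
    tokenizer/model vocab consistency check behind the AssertionError."""
    return tokenizer_size + n_added_tokens == model_size

# Before the transformers fix: the model reported 250005, so the
# hard-coded n_added_tokens = 3 made the check pass.
assert vocab_sizes_match(250002, 250005, n_added_tokens=3)

# After the fix: both tokenizer and model report 250002, so the
# leftover offset of 3 makes the check fail for a standard XLMRoBERTa.
assert not vocab_sizes_match(250002, 250002, n_added_tokens=3)

# Dropping the hack (n_added_tokens = 0) makes it pass again.
assert vocab_sizes_match(250002, 250002, n_added_tokens=0)
```

So the fix on FARM's side should just be to remove (or default to 0) the XLMRoberta-specific `n_added_tokens = 3` branch.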