[BUG] Tokenizer loading error in the NER Analyzer #151
Closed
Labels
bug: Something isn't working
Description
Describe the bug
The tokenizer is not loaded when the NER Analyzer is initialized, which raises the exception below:
Exception: Impossible to guess which tokenizer to use. Please provide a PreTrainedTokenizer class or a path/identifier to a pretrained tokenizer.
I also looked at the NERAnalyzer class and may have a fix as well.
To Reproduce
I took this snippet from the documentation, under "Step 4: Configure Analyzer" >> "NER Analyzer":
from obsei.analyzer.ner_analyzer import NERAnalyzer
# NER analyzer does not need configuration settings
analyzer_config = None
# initialize ner analyzer
# For supported models refer https://huggingface.co/models?filter=token-classification
text_analyzer = NERAnalyzer(
    model_name_or_path="elastic/distilbert-base-cased-finetuned-conll03-english",
    device="auto"
)
Running it raises the exception stated above.
Environment: Google Colab
OS="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
Additional context
I think that when the NERAnalyzer class is initialized without a tokenizer_name, the tokenizer is never loaded and is set to None:
class NERAnalyzer(BaseAnalyzer):
    _pipeline: Pipeline = PrivateAttr()
    _max_length: int = PrivateAttr()
    TYPE: str = "NER"
    model_name_or_path: str
    tokenizer_name: Optional[str] = None
    grouped_entities: Optional[bool] = True

    def __init__(self, **data: Any):
        super().__init__(**data)
        model = AutoModelForTokenClassification.from_pretrained(self.model_name_or_path)
        if self.tokenizer_name:
            tokenizer = AutoTokenizer.from_pretrained(
                self.tokenizer_name, use_fast=True
            )
        else:
            tokenizer = None
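One possible fix, sketched below, is to fall back to the model identifier whenever tokenizer_name is not given, instead of leaving the tokenizer as None. This is only an illustration of the fallback logic, not the maintainers' actual patch: resolve_tokenizer_name is a hypothetical helper that does not exist in obsei, and it assumes the model repository also ships tokenizer files.

```python
from typing import Optional


def resolve_tokenizer_name(
    model_name_or_path: str, tokenizer_name: Optional[str] = None
) -> str:
    """Pick the identifier the tokenizer should be loaded from.

    Hypothetical helper (not part of obsei): use the explicit
    tokenizer_name when one is provided, otherwise fall back to the
    model identifier so the tokenizer is never left as None.
    """
    return tokenizer_name if tokenizer_name else model_name_or_path


model_id = "elastic/distilbert-base-cased-finetuned-conll03-english"

# No tokenizer_name given: fall back to the model identifier.
print(resolve_tokenizer_name(model_id))

# Explicit tokenizer_name given: it takes precedence.
print(resolve_tokenizer_name(model_id, "bert-base-cased"))

# Inside NERAnalyzer.__init__, the result could then replace the
# if/else branch above (assumption, untested against obsei):
#   tokenizer = AutoTokenizer.from_pretrained(
#       resolve_tokenizer_name(self.model_name_or_path, self.tokenizer_name),
#       use_fast=True,
#   )
```

Until something like this is in place, passing tokenizer_name explicitly to NERAnalyzer (for example, the same value as model_name_or_path) should take the first branch of the existing code and presumably avoid the exception.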