Tokenizer saving/loading #15

@n1t0

We need to provide a way to save and load tokenizers to/from files.
Things that need to be saved:

  • Each part (Normalizer, PreTokenizer, ..) and their options
  • Added tokens / special tokens
  • The model's vocabulary

We can approach this in multiple ways, but in the end we would like a single self-contained file that represents a tokenizer. We will probably also need scripts to convert existing models to this new format.
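To make the discussion concrete, here is a rough sketch of what a single self-contained file could look like, using plain JSON. This is only an illustration, not a proposal for the final format: the function names (`save_tokenizer`, `load_tokenizer`), the key names, and the `version` field are all assumptions made up for this example.

```python
import json
from pathlib import Path

# Hypothetical top-level keys, mirroring the list above:
# each pipeline part with its options, the added/special tokens, and the vocabulary.
REQUIRED_KEYS = {"version", "normalizer", "pre_tokenizer", "added_tokens", "model"}


def save_tokenizer(path, *, normalizer, pre_tokenizer, added_tokens, vocab):
    """Write every piece of tokenizer state into one JSON file."""
    payload = {
        "version": "1.0",                 # assumed field, to allow format changes later
        "normalizer": normalizer,         # e.g. {"type": "Lowercase"}
        "pre_tokenizer": pre_tokenizer,   # e.g. {"type": "Whitespace"}
        "added_tokens": added_tokens,     # e.g. [{"content": "[CLS]", "special": True}]
        "model": {"vocab": vocab},        # token -> id mapping
    }
    text = json.dumps(payload, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load_tokenizer(path):
    """Read the file back and check that no section is missing."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"tokenizer file is missing sections: {sorted(missing)}")
    return payload
```

A text format like this would keep the file easy to inspect and diff, and the conversion scripts for existing models would only need to emit the same structure. A binary format might be smaller for large vocabularies, so this choice is open.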
