[Feature] Use local tokenizer  #37

@gespispace

Description

FireCoder currently uses a tokenizer to determine the maximum prompt length for autocomplete. To do this, it sends the text to the llama.cpp tokenizer endpoint. This adds latency and does not work at all when the user runs against the cloud. Providing as much context as possible is important, but the current method has several issues.
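For reference, the current flow boils down to something like this (a minimal sketch: the /tokenize endpoint and its request/response shape are llama.cpp's, but the function name and server URL are illustrative):

```ts
// Sketch of the current remote approach: counting tokens via the
// llama.cpp server's /tokenize endpoint. countTokensRemote and the
// default URL are assumptions for illustration.
async function countTokensRemote(
  text: string,
  serverUrl = "http://localhost:8080",
): Promise<number> {
  const response = await fetch(`${serverUrl}/tokenize`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content: text }),
  });
  const { tokens } = (await response.json()) as { tokens: number[] };
  return tokens.length; // one full network round trip per measurement
}
```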

Firstly, to use the llama.cpp tokenizer, the user must download the server and a model, which is inconvenient for users who want to work with the cloud.
Secondly, preparing a prompt can take more than 2 seconds, which is far too slow for autocomplete.
Finally, FireCoder needs a complex algorithm to select the longest prompt that fits while making as few requests to llama.cpp as possible (see the sketch below).
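To make the cost concrete, the fitting logic has to run some search of this shape (purely illustrative, not FireCoder's actual code; it reuses the countTokensRemote sketch above):

```ts
// Illustrative only: binary-searching for the longest tail of the
// context that still fits the token budget. Every probe is a full
// round trip to the llama.cpp server.
async function fitPrompt(text: string, maxTokens: number): Promise<string> {
  let low = 0;
  let high = text.length;
  let best = "";
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    const candidate = text.slice(text.length - mid); // keep the nearest context
    if ((await countTokensRemote(candidate)) <= maxTokens) {
      best = candidate; // fits: remember it and try a longer slice
      low = mid + 1;
    } else {
      high = mid - 1; // too long: shrink
    }
  }
  return best;
}
```

Even with binary search this is O(log n) HTTP requests per completion, which is where the multi-second prompt preparation comes from.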

The solution is to use a local tokenizer that can be called directly from the extension. There are two possible options:

  1. Use tokenizers, but its Node.js bindings work poorly, so further investigation is needed.
  2. Use transformers.js, which should work well but still needs to be tested (a minimal sketch follows below).
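For option 2, token counting with Transformers.js would look roughly like this (the package and its AutoTokenizer API are real; the model id is just an example, since FireCoder would load the tokenizer matching its own model):

```ts
import { AutoTokenizer } from "@xenova/transformers";

// Load once and reuse; from_pretrained downloads and caches the
// tokenizer files on first use.
const tokenizerPromise = AutoTokenizer.from_pretrained("Xenova/gpt2");

async function countTokensLocal(text: string): Promise<number> {
  const tokenizer = await tokenizerPromise;
  // encode() runs entirely in-process: no server, no network round trip.
  return tokenizer.encode(text).length;
}
```

With a local tokenizer, each probe in the prompt-fitting loop becomes a cheap in-process call instead of an HTTP request, so the fitting algorithm can also be simplified.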

Metadata

Labels: enhancement (New feature or request)