[Feature] Use local tokenizer  #37

@gespispace

Description

FireCoder currently uses a tokenizer to determine the maximum prompt length for autocomplete. To do this, it sends the text to the llama.cpp tokenizer endpoint. This adds latency and does not work at all when the user runs against the cloud. Providing as much context as possible is important, but the current method has several issues.
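For reference, the current flow boils down to something like this (a minimal sketch: the /tokenize endpoint and its request/response shape are llama.cpp's, but the function name and server URL are illustrative):

```ts
// Sketch of the current remote approach: counting tokens via the
// llama.cpp server's /tokenize endpoint. countTokensRemote and the
// default URL are assumptions for illustration.
async function countTokensRemote(
  text: string,
  serverUrl = "http://localhost:8080",
): Promise<number> {
  const response = await fetch(`${serverUrl}/tokenize`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content: text }),
  });
  const { tokens } = (await response.json()) as { tokens: number[] };
  return tokens.length; // one full network round trip per measurement
}
```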

Firstly, to use the llama.cpp tokenizer, the user must download the server and a model, which is inconvenient for users who want to work with the cloud.
Secondly, preparing a prompt can take more than 2 seconds, which is far too slow for autocomplete.
Finally, FireCoder needs a complex algorithm to select the longest prompt that fits while making as few requests to llama.cpp as possible (see the sketch below).
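To make the cost concrete, the fitting logic has to run some search of this shape (purely illustrative, not FireCoder's actual code; it reuses the countTokensRemote sketch above):

```ts
// Illustrative only: binary-searching for the longest tail of the
// context that still fits the token budget. Every probe is a full
// round trip to the llama.cpp server.
async function fitPrompt(text: string, maxTokens: number): Promise<string> {
  let low = 0;
  let high = text.length;
  let best = "";
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    const candidate = text.slice(text.length - mid); // keep the nearest context
    if ((await countTokensRemote(candidate)) <= maxTokens) {
      best = candidate; // fits: remember it and try a longer slice
      low = mid + 1;
    } else {
      high = mid - 1; // too long: shrink
    }
  }
  return best;
}
```

Even with binary search this is O(log n) HTTP requests per completion, which is where the multi-second prompt preparation comes from.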

The solution is to use a local tokenizer that can be called directly from the extension. There are two possible options:

  1. Use tokenizers, but its Node.js bindings work poorly, so further investigation is needed.
  2. Use transformers.js, which should work well but still needs to be tested (a minimal sketch follows below).
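For option 2, token counting with Transformers.js would look roughly like this (the package and its AutoTokenizer API are real; the model id is just an example, since FireCoder would load the tokenizer matching its own model):

```ts
import { AutoTokenizer } from "@xenova/transformers";

// Load once and reuse; from_pretrained downloads and caches the
// tokenizer files on first use.
const tokenizerPromise = AutoTokenizer.from_pretrained("Xenova/gpt2");

async function countTokensLocal(text: string): Promise<number> {
  const tokenizer = await tokenizerPromise;
  // encode() runs entirely in-process: no server, no network round trip.
  return tokenizer.encode(text).length;
}
```

With a local tokenizer, each probe in the prompt-fitting loop becomes a cheap in-process call instead of an HTTP request, so the fitting algorithm can also be simplified.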

Metadata

Labels: enhancement (New feature or request)