Open
Member
This is an automated comment for commit 3e44d9f with a description of existing statuses. It is updated for the latest CI run.
Member
Note to myself: thesis.
Contributor
Dear @rschu1ze, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.
src/Client/AutocompleteModel.h
Outdated
KneserNey markov_identifiers = KneserNey(markov_order);
KneserNey markov_operators = KneserNey(markov_order);

GPTJModel transformer_model = GPTJModel("ggml-model-f32.bin");
Member
Why not fp16?
Make the file embedded into the binary.
size_t query_history_limit = 700;

const String history_query = fmt::format(
Member
The query shouldn't run if it is clickhouse-local.
c1f23ff to 2df3264
5f3ff67 to 6f839ae
…mer + add transformer with context size = 96
9348460 to 742c89b
Contributor
Dear @nikitamikhaylov, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.
72 tasks
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Autocomplete for the ClickHouse CLI based on a small transformer and several Markov models
Documentation entry for user-facing changes
I will add full documentation to this PR later. For now:
It is an autocomplete system that predicts the next token of user input in the ClickHouse CLI. At a high level, it works like this:
A transformer predicts the type of the next token: Literal, Operator, or Identifier, or a specific keyword such as SELECT. For each of the three types (Literal, Operator, Identifier), a dedicated Markov model then predicts the value of the token itself. There is also a Markov model that works on raw queries, without token preprocessing. If this raw-query model is highly confident about the next word (p > 0.8), its prediction is placed above the predictions of the rest of the pipeline.
This approach lets us predict words that the Markov model is very sure of. When it is not sure, we still avoid predicting nonsense, because the prediction is constrained to the specific token type.
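To make the ranking rule above concrete, here is a minimal C++17 sketch. It is not the code from this PR: `BigramModel` is a toy first-order count model without smoothing that stands in for the `KneserNey` class, `predicted_type` stands in for the transformer's output, and `TokenType`, `Prediction`, and `rankSuggestions` are hypothetical names introduced only for illustration. It also assumes, for simplicity, that both models condition on the same previous token and that a raw-query prediction below the threshold is simply dropped.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical token classes; the enum used in the PR may differ.
enum class TokenType { Literal, Operator, Identifier, Keyword };

// One candidate completion with its estimated probability.
struct Prediction
{
    std::string text;
    double probability;
};

// Toy first-order Markov model: counts how often `next` follows `prev`.
// No smoothing is applied; this is NOT the Kneser-Ney model from the PR.
class BigramModel
{
public:
    void observe(const std::string & prev, const std::string & next)
    {
        ++counts[prev][next];
        ++totals[prev];
    }

    // Most likely successor of `prev`, or nullopt if `prev` was never seen.
    std::optional<Prediction> predict(const std::string & prev) const
    {
        auto it = counts.find(prev);
        if (it == counts.end())
            return std::nullopt;

        const double total = static_cast<double>(totals.at(prev));
        Prediction best{"", 0.0};
        for (const auto & [next, count] : it->second)
        {
            const double p = count / total;
            if (p > best.probability)
                best = Prediction{next, p};
        }
        return best;
    }

private:
    // The inner std::map keeps iteration order deterministic for ties.
    std::unordered_map<std::string, std::map<std::string, std::size_t>> counts;
    std::unordered_map<std::string, std::size_t> totals;
};

// Builds the suggestion list: the typed model's prediction comes first,
// unless the raw-query model is confident enough to be placed on top.
std::vector<Prediction> rankSuggestions(
    TokenType predicted_type,
    const std::map<TokenType, BigramModel> & typed_models,
    const BigramModel & raw_model,
    const std::string & previous_token,
    double confidence_threshold = 0.8)
{
    std::vector<Prediction> result;

    // Only types with a dedicated Markov model are handled in this sketch.
    auto model = typed_models.find(predicted_type);
    if (model != typed_models.end())
        if (auto typed = model->second.predict(previous_token))
            result.push_back(*typed);

    // A confident raw-query prediction goes on top (skipping exact duplicates).
    if (auto raw = raw_model.predict(previous_token);
        raw && raw->probability > confidence_threshold)
    {
        if (result.empty() || result.front().text != raw->text)
            result.insert(result.begin(), *raw);
    }
    return result;
}
```

With a raw-query model that has seen `count` after `SELECT` in 9 of 10 cases (p = 0.9), `count` is ranked above the typed model's suggestion. With a 50/50 split after `FROM`, only the suggestion constrained to the predicted type is returned.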
Here is a short video (this is an old one, it works a bit better now).
autocomplete_speed_x5.mov
I will keep updating this PR description and the code, because the work is still ongoing. A much more detailed description of the system will follow a bit later, since this is part of my coursework and I need to write it anyway :).
Please have a look. If you have any suggestions or comments, I will be happy to update the code or explain my choices.
CI Settings (Only check the boxes if you know what you are doing):