I am on v0.2.22 now. RAM usage still spikes from 78GB used (model loaded) to 94GB used when I run a request to the LLM.
This is much higher than before, when usage only reached 88GB under load.
The problem occurs both with and without TurboQuant enabled.