I am on v0.2.22 now. RAM usage still spikes from 78GB used (model loaded) to 94GB used when I run a request to the LLM.
This is much higher than before, when usage only reached 88GB under load.
The problem occurs both with and without TurboQuant enabled.