A user Frank talked to mentioned that they primarily like the GPU-splitting feature when running model inference in LM Studio.
We should add that as well. This may involve work in transformerlab-inference. We could start by supporting it in fastchat_server and then propagate it to the other loader plugins.
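As a rough sketch of what "GPU splitting" could mean for a loader plugin, the snippet below apportions a model's transformer layers across GPUs according to user-supplied split ratios (similar in spirit to LM Studio's per-GPU split slider). All names here (`split_layers`, the ratio format) are hypothetical illustrations, not an existing transformerlab-inference or FastChat API:

```python
def split_layers(num_layers, gpu_ratios):
    """Assign transformer layers to GPUs proportionally to split ratios.

    Hypothetical helper: returns a list of (gpu_index, [layer_indices])
    pairs. Uses largest-remainder apportionment so every layer is
    assigned exactly once even when the ratios don't divide evenly.
    """
    total = sum(gpu_ratios)
    quotas = [num_layers * r / total for r in gpu_ratios]
    counts = [int(q) for q in quotas]
    # Hand out the leftover layers to the GPUs with the largest fractional parts.
    leftovers = sorted(range(len(quotas)),
                       key=lambda i: quotas[i] - counts[i],
                       reverse=True)
    for i in leftovers[: num_layers - sum(counts)]:
        counts[i] += 1
    assignment, layer = [], 0
    for gpu, n in enumerate(counts):
        assignment.append((gpu, list(range(layer, layer + n))))
        layer += n
    return assignment


# Example: a 32-layer model split 3:1 across two GPUs.
plan = split_layers(32, [3, 1])
```

A real implementation in fastchat_server would more likely translate such ratios into whatever the underlying loader already exposes (e.g. per-device memory limits or tensor-parallel flags) rather than placing layers by hand, but the apportionment logic is the same idea.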
Reference:
