A user Frank talked to mentioned that they primarily like the GPU-splitting feature when running model inference in LM Studio.
We should add that as well. This may involve work in transformerlab-inference. We could start by supporting it in fastchat_server and then propagate it to the other loader plugins.
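As a rough sketch of what "GPU splitting" could mean for a loader plugin, the snippet below apportions a model's transformer layers across GPUs according to user-supplied split ratios (similar in spirit to LM Studio's per-GPU split slider). All names here (`split_layers`, the ratio format) are hypothetical illustrations, not an existing transformerlab-inference or FastChat API:

```python
def split_layers(num_layers, gpu_ratios):
    """Assign transformer layers to GPUs proportionally to split ratios.

    Hypothetical helper: returns a list of (gpu_index, [layer_indices])
    pairs. Uses largest-remainder apportionment so every layer is
    assigned exactly once even when the ratios don't divide evenly.
    """
    total = sum(gpu_ratios)
    quotas = [num_layers * r / total for r in gpu_ratios]
    counts = [int(q) for q in quotas]
    # Hand out the leftover layers to the GPUs with the largest fractional parts.
    leftovers = sorted(range(len(quotas)),
                       key=lambda i: quotas[i] - counts[i],
                       reverse=True)
    for i in leftovers[: num_layers - sum(counts)]:
        counts[i] += 1
    assignment, layer = [], 0
    for gpu, n in enumerate(counts):
        assignment.append((gpu, list(range(layer, layer + n))))
        layer += n
    return assignment


# Example: a 32-layer model split 3:1 across two GPUs.
plan = split_layers(32, [3, 1])
```

A real implementation in fastchat_server would more likely translate such ratios into whatever the underlying loader already exposes (e.g. per-device memory limits or tensor-parallel flags) rather than placing layers by hand, but the apportionment logic is the same idea.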
Reference:
