Conversation

@monstruosoft
Contributor

Feedback needed!

In previous versions of FastSD, changing ControlNet settings triggered a full pipeline rebuild. With the recent addition of SDXL safetensors model support, and given that loading an SDXL model takes over a minute on my machine, it became obvious that rebuilding the pipeline on every settings change was not the best idea. This PR attempts to optimize pipeline creation by keeping up to four pipelines ready for generation at all times and selecting the appropriate one based on the user's generation settings, so that a full pipeline rebuild is only needed when the generation models change.
The code in this PR focuses on LCM-LoRA mode and, hopefully, shouldn't break anything in other generation modes; the new code is enclosed in a conditional statement to avoid affecting other modes.
This PR also fixes an issue in the LoRA code: it was saving a reference to the current pipeline, which prevented the garbage collector from releasing the memory used by the pipeline.

However, some issues remain. I have 16 GB of RAM, barely enough to run SDXL, but sometimes, when switching models, RAM usage climbs to 30 GB, effectively killing performance on my machine. I haven't been able to pin down the cause; it seems to happen randomly. Sometimes I can switch models a dozen times with RAM usage staying at a reasonable level, and other times switching models just a couple of times eats all available RAM and starts using swap space. I've tried many things, such as explicitly calling del on all pipelines, but every time I think I've found a solution, the issue randomly pops up again. The last thing I tried was calling unload_lora_weights() (at line 310 of lcm_text_to_image.py) before deleting the pipeline; that seemed to help somewhat but doesn't solve the issue. At this point, I suspect the OS is caching the safetensors files in RAM, forcing the pipeline to be created in swap space. So, I'd like some feedback on this subject.
Note that this issue occurs when switching models within FastSD in LCM-LoRA mode. If, instead of switching models, you close FastSD so the OS can reclaim the RAM and then launch FastSD again to select the new model, the issue shouldn't appear. This is not an ideal workaround, but for the time being it's the best I can do.
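The "set the appropriate pipeline depending on the user generation settings" step can be sketched roughly as a lookup over prebuilt pipelines. Everything below is illustrative (function and key names are hypothetical, not FastSD's actual code):

```python
# Hypothetical sketch: instead of rebuilding the pipeline on every
# settings change, keep the four variants prebuilt and pick one by key.
# Strings stand in for the real diffusers pipeline objects.

def select_pipeline(pipelines: dict, use_controlnet: bool, use_img2img: bool):
    """Return the already-built pipeline matching the user's settings."""
    return pipelines[(use_controlnet, use_img2img)]

# Built once, when the generation model is (re)loaded:
pipelines = {
    (False, False): "StableDiffusionPipeline",
    (False, True): "StableDiffusionImg2ImgPipeline",
    (True, False): "StableDiffusionControlNetPipeline",
    (True, True): "StableDiffusionControlNetImg2ImgPipeline",
}

print(select_pipeline(pipelines, True, False))
```

With this shape, changing ControlNet or img2img settings is a dictionary lookup; only a model change invalidates the dictionary.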

This commit attempts to optimize pipeline creation by reusing the
base txt2img pipeline to maintain four different pipelines ready
to use at all times, thus reducing the places in the code that
require rebuilding the generation pipeline.

The available generation pipelines are:
- StableDiffusionPipeline
- StableDiffusionImg2ImgPipeline
- StableDiffusionControlNetPipeline
- StableDiffusionControlNetImg2ImgPipeline

For SDXL the corresponding pipelines are created in a similar
fashion.
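The four pipelines can share the heavy components of the base txt2img pipeline rather than reloading the weights; diffusers supports this via patterns like StableDiffusionImg2ImgPipeline(**txt2img.components). A minimal stand-in sketch of that idea (plain classes instead of real diffusers pipelines):

```python
# Stand-in sketch of "four pipelines from one base": the derived
# pipelines are built from the base pipeline's components, so the
# heavy weights (UNet, VAE, ...) are shared, not loaded again.
# Plain classes and object() stand in for real diffusers types.

class BasePipeline:
    def __init__(self, unet, vae):
        self.unet, self.vae = unet, vae

    @property
    def components(self):
        return {"unet": self.unet, "vae": self.vae}

class Img2ImgPipeline(BasePipeline):
    pass

unet, vae = object(), object()            # stand-ins for model weights
txt2img = BasePipeline(unet, vae)
img2img = Img2ImgPipeline(**txt2img.components)

# Both pipelines point at the very same weight objects:
assert img2img.unet is txt2img.unet
```

ControlNet variants would additionally take a controlnet component on top of the shared base components.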

This commit also tries to ensure that the garbage collector can
release pipeline memory when rebuilding pipelines, by removing
all pipeline references on rebuild, including a reference in the
LoRA code that was preventing the garbage collector from
releasing the pipeline memory.
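The stray-reference problem this commit fixes can be demonstrated with a small, self-contained example; a weakref stands in for observing whether the pipeline's memory can actually be reclaimed:

```python
import gc
import weakref

class Pipeline:           # stand-in for a heavy diffusers pipeline
    pass

pipe = Pipeline()
stray = pipe              # e.g. a reference kept around by the LoRA code
probe = weakref.ref(pipe)

del pipe
gc.collect()
# The stray reference keeps the pipeline (and its memory) alive:
assert probe() is not None

del stray
gc.collect()
# Once every reference is removed, the object can finally be freed:
assert probe() is None
```

This is why deleting only the "main" pipeline variable is not enough; every reference, including ones hidden in helper code, has to go.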
This commit adapts the Qt GUI and the WebUI to the recent pipeline
changes; in particular, ControlNet pipelines are always built from
the base txt2img pipeline, and LoRA models are always loaded onto
the base txt2img pipeline.
This commit improves the code for manipulating the multiple
pipelines according to the user-selected generation options so
that the garbage collector correctly releases the pipeline
memory when switching models.

This commit also reverts the changes made in the previous commit
to use the default pipeline variable instead of the
txt2img_pipeline variable for pipeline manipulation (loading
LoRAs, for example); this ensures that the code continues to
work in generation modes other than LCM-LoRA.
This commit explicitly calls del on all pipelines to try and
make sure that the garbage collector releases all memory when
rebuilding pipelines.
This commit explicitly calls unload_lora_weights() when rebuilding
pipelines in LCM-LoRA mode.
This commit fixes pipeline creation in LCM mode.
Merge branch 'cli' from the github repo into cli
This commit makes some minor changes to apply the multiple
pipelines method to LCM mode in addition to the LCM-LoRA mode.
Minor changes
@monstruosoft
Contributor Author

OK, I removed the call to unload_lora_weights(); it's clear that it doesn't solve the RAM issue and that whatever is causing it is outside FastSD's control.
In the latest commit I also updated LCM mode to use the same multiple-pipelines approach as LCM-LoRA mode.
Also, note that with this approach of keeping multiple pipelines and switching between them based on the user's settings, it might be simple to add long-prompt support to FastSD.

@rupeshs
Owner

rupeshs commented Sep 20, 2025

@monstruosoft Do SDXL ControlNets work now? Which ControlNets did you test? Please let me know.

@monstruosoft
Contributor Author

Yes, ControlNet has worked since previous PRs. I've tried these 1, 2 ControlNets; use the 640 MB versions (the fp16 versions should also work, but for some reason they end up using a lot more RAM). I also tried this one, but RAM usage was so high that I literally used it only once, though I can say it worked that one time.

@rupeshs
Owner

rupeshs commented Oct 5, 2025

@monstruosoft Yes, ControlNet works on the latest master branch, but it is not respecting changes to the ControlNet conditioning scale value made from the WebUI. Is that addressed in this PR?

@rupeshs
Owner

rupeshs commented Oct 5, 2025

SDXL will take around 14 GB of system RAM to generate a 768x768 image.

@monstruosoft
Contributor Author

it is not respecting the controlnet conditioning scale value change from webui

I'll have to try it with the WebUI; I usually run FastSD in either the CLI or Qt GUI versions. I'll take a look at the WebUI as soon as I can.

SDXL will take around 14 GB system RAM to generate a 768x768 image

Yes, that's the typical amount of RAM used by SDXL. Weights have to be loaded in fp32 for CPU inference, which is a shame; even with quantized models, the weights have to be loaded into RAM in fp32.

@monstruosoft
Contributor Author

I just tried the code from this PR in the WebUI and ControlNet is working correctly. Can you please give more details on the issue you're having?

@rupeshs
Owner

rupeshs commented Oct 11, 2025

@monstruosoft I've added comments; could you please resolve them?

Owner

@rupeshs rupeshs left a comment

Please resolve.

@monstruosoft
Contributor Author

Oh, right. Will do ASAP.

Minor changes
@monstruosoft
Contributor Author

OK, it's done. I also fixed a minor issue where the WebUI passed int values for the ControlNet conditioning scale when it was set to 0 or 1.

@rupeshs rupeshs merged commit 0c930a1 into rupeshs:main Oct 18, 2025
@rupeshs
Owner

rupeshs commented Oct 18, 2025

@monstruosoft Thank you for the contribution
