Optimize pipeline creation in LCM-LoRA mode #328
This commit attempts to optimize pipeline creation by reusing the base txt2img pipeline to keep four different pipelines ready to use at all times, reducing the places in the code that require rebuilding the generation pipeline. The available generation pipelines are:
- StableDiffusionPipeline
- StableDiffusionImg2ImgPipeline
- StableDiffusionControlNetPipeline
- StableDiffusionControlNetImg2ImgPipeline

For SDXL, the corresponding pipelines are created in a similar fashion. This commit also tries to ensure that the garbage collector runs when rebuilding pipelines by removing all pipeline references first, including a pipeline reference in the LoRA code that was preventing the garbage collector from releasing the pipeline memory.
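The sharing described above can be sketched in plain Python (the class and component names below are illustrative stand-ins, not FastSD's actual code). In diffusers terms it corresponds to building a variant pipeline from an existing pipeline's components, e.g. `StableDiffusionImg2ImgPipeline(**base.components)`, so the heavy models are loaded from disk only once:

```python
class Component:
    """Stand-in for a heavy model component (UNet, VAE, text encoder, ...)."""
    def __init__(self, name):
        self.name = name

class Txt2ImgPipeline:
    """Stand-in for the base txt2img pipeline; loading this is the expensive step."""
    def __init__(self):
        self.components = {
            "unet": Component("unet"),
            "vae": Component("vae"),
            "text_encoder": Component("text_encoder"),
        }

class DerivedPipeline:
    """Img2img / ControlNet variants reuse the same component objects."""
    def __init__(self, components):
        self.components = components  # shared with the base pipeline, not copied

base = Txt2ImgPipeline()
img2img = DerivedPipeline(base.components)
controlnet = DerivedPipeline(base.components)

# All pipelines point at the *same* UNet object: no extra RAM, no reload.
assert img2img.components["unet"] is base.components["unet"]
assert controlnet.components["unet"] is base.components["unet"]
```

Because the components are shared, keeping all four pipelines alive costs roughly the same RAM as keeping one.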
This commit adapts the Qt GUI and the WebUI to the recent pipeline changes; in particular, ControlNet pipelines are always built from the base txt2img pipeline, and LoRA models are always loaded onto the base txt2img pipeline.
This commit improves the code that switches between the multiple pipelines according to the user-selected generation options, so that the garbage collector correctly releases pipeline memory when switching models. It also reverts the previous commit's change that used the default pipeline variable instead of the txt2img_pipeline variable for pipeline manipulation (loading LoRAs, for example); this ensures the code keeps working in generation modes other than LCM-LoRA.
This commit explicitly calls del on all pipelines to help ensure that the garbage collector releases all memory when rebuilding pipelines.
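A minimal sketch of the reference-dropping pattern (using a trivial stand-in class, since the point is Python's reference semantics, not diffusers itself):

```python
import gc
import weakref

class Pipeline:
    """Hypothetical stand-in for a loaded generation pipeline."""
    pass

txt2img = Pipeline()
img2img = Pipeline()
probe = weakref.ref(txt2img)  # observe the object without keeping it alive

# Rebuild: drop *every* reference before creating new pipelines,
# otherwise the old models stay resident while the new ones load.
del txt2img
del img2img
gc.collect()  # refcounting frees acyclic objects immediately; this also sweeps cycles

assert probe() is None  # the old pipeline's memory is reclaimable
```

If any other variable or attribute still points at the old pipeline, `probe()` would return the live object instead of `None`, which is exactly the failure mode this commit guards against.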
This commit explicitly calls unload_lora_weights() when rebuilding pipelines in LCM-LoRA mode.
This commit fixes pipeline creation in LCM mode.
Merge branch 'cli' from the github repo into cli
This commit makes some minor changes to apply the multiple pipelines method to LCM mode in addition to the LCM-LoRA mode.
Minor changes
OK, I removed the call to unload_lora_weights(); it's obvious that it doesn't solve the RAM issue, and whatever is causing it is not within FastSD's control.
@monstruosoft Do SDXL ControlNets work now? Which ControlNets did you test? Please let me know.
Yes, ControlNet has worked since previous PRs. I've tried these 1, 2 ControlNets; use the 640 MB versions (the fp16 versions should also work, but for some reason those end up using a lot more RAM). I also tried this one, but RAM usage was so high that I literally only used it once, though I can say it worked that one time.
@monstruosoft Yes, ControlNet works on the latest master branch, but it is not respecting changes to the ControlNet conditioning scale value from the WebUI. Is that handled in this PR?
SDXL will take around 14 GB of system RAM to generate a 768x768 image.
Gonna have to try it with the WebUI. I usually run FastSD either in the CLI or QtGUI versions. Will take a look at the WebUI as soon as I can.
Yes, that's the typical amount of RAM used by SDXL. Weights have to be loaded in fp32 mode for CPU inference, which is a shame; even when using quantized models, the weights have to be loaded into RAM in fp32 mode.
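The ~14 GB figure is consistent with a back-of-the-envelope estimate, assuming SDXL's components total roughly 3.5 billion parameters (an approximation, not an exact count) stored as 4-byte fp32 values:

```python
# Rough memory estimate for fp32 CPU inference (parameter count is approximate).
params = 3.5e9            # SDXL UNet + text encoders + VAE, roughly
bytes_per_param = 4       # fp32 weights
weights_gb = params * bytes_per_param / 1024**3

print(f"{weights_gb:.1f} GB")  # ~13 GB for the weights alone, before activations
```

Activations, the Python runtime, and any duplicated buffers during loading push the real peak above the weights-only figure, which matches the observed ~14 GB.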
I just tried the code from this PR in the WebUI and ControlNet is working correctly. Can you please give more details on the issues you're having?
@monstruosoft I've added comments; could you please resolve them?
rupeshs left a comment:
Please resolve.
Oh, right. Will do ASAP.
Minor changes
OK, it's done. I've also fixed a minor issue where the WebUI used int values for the ControlNet conditioning scale when the value was set to 0 or 1; it should be fixed now.
@monstruosoft Thank you for the contribution
Feedback needed!
In previous versions of FastSD, changing ControlNet settings would trigger a pipeline rebuild. With the recent addition of SDXL safetensors model support, and given that loading an SDXL model takes over a minute on my machine, it became obvious that rebuilding the pipeline on every settings change was not the best idea. This PR therefore attempts to optimize pipeline creation by keeping up to four pipelines ready for generation at all times and selecting the appropriate one based on the user's generation settings, effectively limiting full pipeline rebuilds to the case where the generation models themselves change.
The code for this PR focuses on LCM-LoRA mode and, hopefully, shouldn't break anything in other generation modes; the new code is enclosed in a conditional statement to avoid affecting other modes.
This PR also fixes an issue in the LoRA code that stored a reference to the current pipeline, preventing the garbage collector from releasing the memory used by the pipeline.
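The bug pattern can be demonstrated with `weakref` (the `LoraLoader` class below is a hypothetical stand-in for the LoRA code, not FastSD's actual implementation):

```python
import gc
import weakref

class Pipeline:
    """Hypothetical stand-in for a loaded generation pipeline."""
    pass

class LoraLoader:
    """Sketch of the bug: caching the pipeline keeps it alive indefinitely."""
    def __init__(self):
        self.pipeline = None
    def load(self, pipe):
        self.pipeline = pipe  # hidden strong reference

loader = LoraLoader()
pipe = Pipeline()
loader.load(pipe)
probe = weakref.ref(pipe)

del pipe
gc.collect()
assert probe() is not None  # still alive: the loader's reference pins it

loader.pipeline = None      # the fix: drop the cached reference
gc.collect()
assert probe() is None      # now the memory can be released
```

This is why deleting the pipeline variables alone was not enough: any long-lived object holding a stray reference silently defeats the cleanup.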
However, some issues are still present. I have 16 GB of RAM, barely enough to run SDXL, but sometimes, when switching models, RAM usage goes up to 30 GB, effectively killing performance on my machine. I've tried to pinpoint the cause without success; it seems to happen randomly. Sometimes I can switch models a dozen times with RAM usage remaining at a reasonable level, and other times switching models just a couple of times eats all available RAM and starts using swap space. I've tried a lot of things, like explicitly calling del on all pipelines, but every time I think I've found a solution, the issue randomly pops up again. The last thing I tried was calling unload_lora_weights() (in line 310 of lcm_text_to_image.py) before deleting the pipeline; that seemed to help somewhat but doesn't solve the issue. At this point, I suspect the OS is caching the safetensors files in RAM, forcing the pipeline to be created in swap space. So, I'd like some feedback on this subject.
Note that this issue happens when switching models within FastSD in LCM-LoRA mode. If, instead of switching models, you close FastSD so the OS can reclaim the RAM, then launch FastSD again and select the new model, the issue shouldn't appear. This is not an ideal solution, but for the time being it's the best I can do.