
LoRA (civitai format) with enable_model_cpu_offload #3958

@jpmerc

Description


Describe the bug

Loading a LoRA in civitai format does not work correctly when enable_model_cpu_offload is enabled on a ControlNet pipeline (I have not tested this with a basic Stable Diffusion pipeline). See the code and logs below; the linked notebook reproduces the problem.

Reproduction

Colab notebook available here: https://colab.research.google.com/drive/1j-MEPv6gJyg16QfyjJdL9cE80J7qfSES?usp=sharing

!pip install -q diffusers==0.17.1 transformers xformers git+https://github.com/huggingface/accelerate.git
!pip install -q opencv-contrib-python
!pip install -q controlnet_aux
# Load image
from diffusers import StableDiffusionControlNetPipeline
from diffusers.utils import load_image

image = load_image("https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png")
# Canny
import cv2
from PIL import Image
import numpy as np

image = np.array(image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler

# Load the canny ControlNet and the SD 1.5 pipeline in fp16
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# Enable model-level CPU offload and xformers memory-efficient attention
pipe.enable_model_cpu_offload()
pipe.enable_xformers_memory_efficient_attention()
# Download the LoRA (civitai format) and load it after offload has been enabled
!wget https://civitai.com/api/download/models/15603 -O light_and_shadow.safetensors
pipe.load_lora_weights(".", weight_name="light_and_shadow.safetensors")
prompt = "rihanna, best quality, extremely detailed"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"
generator = torch.Generator(device="cpu").manual_seed(2)

# Inference: this call raises the device-mismatch RuntimeError shown in the logs below
image = pipe(prompt, canny_image, negative_prompt=negative_prompt, generator=generator, num_inference_steps=20).images[0]
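
Not verified here, but one ordering worth trying as a workaround (a sketch only): call load_lora_weights before enable_model_cpu_offload, so the offload hooks are installed after the LoRA layers have been injected and every component ends up on the same device during the forward pass.

# Sketch of a possible workaround (untested): load the LoRA before enabling offload
pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights(".", weight_name="light_and_shadow.safetensors")
pipe.enable_model_cpu_offload()
pipe.enable_xformers_memory_efficient_attention()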

Logs

RuntimeError                              Traceback (most recent call last)
<ipython-input-14-0687f72ed3ff> in <cell line: 5>()
      3 generator = torch.Generator(device="cpu").manual_seed(2)
      4 
----> 5 image = pipe(prompt, canny_image, negative_prompt=negative_prompt,
      6     generator=generator, num_inference_steps=20).images[0]
      7 

19 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py in forward(self, input)
    112 
    113     def forward(self, input: Tensor) -> Tensor:
--> 114         return F.linear(input, self.weight, self.bias)
    115 
    116     def extra_repr(self) -> str:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
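
The traceback points at a plain nn.Linear whose input is already on cuda:0 while its weight is still on the CPU, which suggests the LoRA layers injected by load_lora_weights are not being moved by the cpu-offload hooks. As a rough check (a sketch; it assumes the LoRA parameters are registered in named_parameters() with "lora" in their name), the devices of the base weights can be compared with those of the LoRA weights right after loading:

from collections import defaultdict

# Group parameter devices into "lora" vs. "base" for the components that
# load_lora_weights can patch (assumes LoRA params have "lora" in their name).
for component_name in ("unet", "text_encoder"):
    component = getattr(pipe, component_name)
    devices = defaultdict(set)
    for name, param in component.named_parameters():
        key = "lora" if "lora" in name.lower() else "base"
        devices[key].add(str(param.device))
    print(component_name, dict(devices))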

System Info

  • diffusers version: 0.17.1
  • Platform: Linux-5.15.107+-x86_64-with-glibc2.31
  • Python version: 3.10.12
  • PyTorch version (GPU?): 2.0.1+cu118 (True)
  • Huggingface_hub version: 0.16.2
  • Transformers version: 4.30.2
  • Accelerate version: 0.21.0.dev0
  • xFormers version: 0.0.20

Who can help?

@williamberman, @patrickvonplaten, and @sayakpaul


Labels

bug (Something isn't working), stale (Issues that haven't received updates)
