Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
While I'm not an expert on the diffusers code base, as far as I can see from WAN, which also comes in multiple parameter counts, the different sizes are simply treated as different model types, e.g.:

```python
if checkpoint[target_key].shape[0] == 1536:
    model_type = "wan-t2v-1.3B"
elif checkpoint[target_key].shape[0] == 5120 and checkpoint[target_key].shape[1] == 16:
    model_type = "wan-t2v-14B"
else:
    model_type = "wan-i2v-14B"
```
@a-r-r-o-w I think we can just run a shape check on the params to determine which config to use. That should be sufficient to differentiate the variants; see the sketch below.
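To make that concrete, here's a minimal sketch of such a shape check for Cosmos, mirroring the WAN branch above. The `patch_embed.proj.weight` key and the sizes are assumptions (5120 matches the 14B hidden size seen in the GGUF traceback in this thread); the real dispatch would live alongside the WAN one in `single_file_utils.py`:

```python
# Hypothetical shape-based dispatch for Cosmos checkpoints, following the WAN
# pattern above. The key name and the non-14B branch are illustrative assumptions.
target_key = "patch_embed.proj.weight"
if checkpoint[target_key].shape[0] == 5120:
    model_type = "cosmos-14B"  # 5120 hidden size, per the 14B traceback in this thread
else:
    model_type = "cosmos-2B"   # assumed smaller hidden size for the 2B variant
```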
@Vargol Could you verify if the latest changes work for you?
The Cosmos 2B single file at https://huggingface.co/nvidia/Cosmos-Predict2-2B-Text2Image/resolve/main/model.pt loaded, ran successfully, and generated the expected image. I also tried a GGUF file for the 14B version and that didn't work; I'm not sure if that was in scope, though. If it was, here's the failure:

```
$ python cosmos_gguf_prmpts.py
Multiple distributions found for package optimum. Picked distribution: optimum-quanto
WARNING:torchao.kernel.intmm:Warning: Detected no triton, on systems without Triton certain kernels will not work
W0627 23:30:48.574000 85696 lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
The config attributes {'input_types': ['text'], 'model_size': '14b'} were passed to CosmosTransformer3DModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Traceback (most recent call last):
  File "/Volumes/SSD2TB/AI/Diffusers/cosmos_gguf_prmpts.py", line 12, in <module>
    transformer = CosmosTransformer3DModel.from_single_file(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/loaders/single_file_model.py", line 420, in from_single_file
    load_model_dict_into_meta(
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/models/model_loading_utils.py", line 285, in load_model_dict_into_meta
    hf_quantizer.check_quantized_param_shape(param_name, empty_state_dict[param_name], param)
  File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/quantizers/gguf/gguf_quantizer.py", line 84, in check_quantized_param_shape
    raise ValueError(
ValueError: patch_embed.proj.weight has an expected quantized shape of: (5120, 68), but received shape: torch.Size([5120, 136])
$
```
@Vargol Could you share a link to the GGUF checkpoint you're trying to load?
@DN6 Sorry if you get this multiple times; GitHub isn't showing any response when I hit the Comment button, and after reloading the page there's no sign of my reply. I got the version I tested from
* update
* update
* update docs
Possibly fixes #11798
We can run inference with the 7B Text-to-World model with the following code:
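The snippet below is a minimal sketch of that example, assuming the `CosmosTextToWorldPipeline` API and the `nvidia/Cosmos-1.0-Diffusion-7B-Text2World` checkpoint; the prompt and export settings are illustrative.

```python
import torch
from diffusers import CosmosTextToWorldPipeline
from diffusers.utils import export_to_video

# Assumed Hub id for the 7B Text-to-World model
model_id = "nvidia/Cosmos-1.0-Diffusion-7B-Text2World"
pipe = CosmosTextToWorldPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A robotic arm assembles components on a factory line, camera slowly panning right."
video = pipe(prompt=prompt).frames[0]
export_to_video(video, "output.mp4", fps=30)
```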
@DN6 I'm not sure I remember how to support different versions of the same model. With the current implementation, loading the 14B model would fail with a weight shape mismatch, most likely due to config-related issues. Could you share some insights?
For the Cosmos 1.0 text-to-world and video-to-world models (7B and 14B), I'll have to make a `cosmos-1.0` entry, and another entry, `cosmos-2.0`, for the Cosmos Predict2 models. But what's the normal process for models of the same family with different parameter sizes?
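For reference, WAN handles this with one default-checkpoint entry per inferred model type in `diffusers/loaders/single_file_utils.py`. A hypothetical sketch of what the corresponding Cosmos entries could look like (the dict keys and repo ids are illustrative assumptions):

```python
# Hypothetical additions to DIFFUSERS_DEFAULT_PIPELINE_PATHS in
# diffusers/loaders/single_file_utils.py, mirroring the existing WAN entries.
# Keys and repo ids are assumptions for illustration only.
DIFFUSERS_DEFAULT_PIPELINE_PATHS.update(
    {
        "cosmos-1.0-t2w-7B": {"pretrained_model_name_or_path": "nvidia/Cosmos-1.0-Diffusion-7B-Text2World"},
        "cosmos-1.0-t2w-14B": {"pretrained_model_name_or_path": "nvidia/Cosmos-1.0-Diffusion-14B-Text2World"},
        "cosmos-2.0-t2i-2B": {"pretrained_model_name_or_path": "nvidia/Cosmos-Predict2-2B-Text2Image"},
        "cosmos-2.0-t2i-14B": {"pretrained_model_name_or_path": "nvidia/Cosmos-Predict2-14B-Text2Image"},
    }
)
```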