
Add control noise refiner correctly#404

Merged
bubbliiiing merged 8 commits into main from add_control_noise_refiner_correctly on Dec 18, 2025

Conversation

@bubbliiiing
Collaborator

No description provided.

@elismasilva

elismasilva commented Dec 16, 2025

I think this implementation of control_noise_refiner doesn't work properly: the generated image doesn't follow the control image, and some artifacts appear in the output. It seems to me that it's dead weight in this model; perhaps you just need to continue using the control_layer as a refiner and, at the end of the forward pass, apply some dummy layers to bypass these layers.

With control noise refiner (not followed pose):
image (34)

Without control noise(followed pose):
image (35)

@bubbliiiing
Collaborator Author

I think this implementation of control_noise_refiner doesn't work properly, because the generated image doesn't follow the control image, and some artifacts appear in the output. It seems to me that it's dead weight in this model; perhaps you just need to continue using the control_layer as a refiner and, at the end of the forward pass, apply some dummy layers to bypass these layers.

With control noise refiner (not followed pose): image (34)

Without control noise (followed pose): image (35)

The old model cannot predict properly and needs to be retrained. I'm currently uploading the newly trained model.

@bubbliiiing
Collaborator Author

speed 2.1
image
speed 2.0
image

@bubbliiiing bubbliiiing requested a review from hkunzhe December 17, 2025 10:05
@elismasilva

elismasilva commented Dec 17, 2025

speed 2.1
image
speed 2.0
image

Thanks.

Since the models internally share the same key structure, I think it would be beneficial to have separate repositories with a config.json file for each. Within that config you could include a key like "version": "v1", where the value changes to "v2", and so on. Then in the model code you check that key to determine the flow; with this you can even discard the config.yaml and move its keys into the .json. This way you stay aligned with the diffusers standard and don't need to pass any more parameters via custom kwargs.

@bubbliiiing
Collaborator Author

Could you give a demo?

@bubbliiiing
Collaborator Author

The issue is mainly due to an incorrectly written layer, which caused an extra 2.1 that shouldn't exist. This has been quite frustrating.

@elismasilva

elismasilva commented Dec 17, 2025

Could you give a demo?

Like this:

{
  "_class_name": "ZImageControlTransformer2DModel",
  "_diffusers_version": "0.36.0.dev0",
  "all_f_patch_size": [
    1
  ],
  "all_patch_size": [
    2
  ],
  "axes_dims": [
    32,
    48,
    48
  ],
  "axes_lens": [
    1536,
    512,
    512
  ],
  "cap_feat_dim": 2560,
  "dim": 3840,
  "in_channels": 16,
  "n_heads": 30,
  "n_kv_heads": 30,
  "n_layers": 30,
  "n_refiner_layers": 2,
  "norm_eps": 1e-05,
  "qk_norm": true,
  "rope_theta": 256.0,
  "t_scale": 1000.0,
  "control_layers_places": [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28],
  "control_refiner_layers_places": [0, 1],
  "add_control_noise_refiner": true,  
  "control_in_dim": 33,
  "version": "v2.1"
}

So in your ZImageControlTransformer2DModel class `__init__` you can do something like this:

    @register_to_config
    def __init__(
        self,
        control_layers_places=None,
        control_refiner_layers_places=None,
        control_in_dim=None,
        add_control_noise_refiner=None,
        all_patch_size=(2,),
        all_f_patch_size=(1,),
        in_channels=16,
        dim=3840,
        n_layers=30,
        n_refiner_layers=2,
        n_heads=30,
        n_kv_heads=30,
        norm_eps=1e-5,
        qk_norm=True,
        cap_feat_dim=2560,
        rope_theta=256.0,
        t_scale=1000.0,
        axes_dims=[32, 48, 48],
        axes_lens=[1024, 512, 512],
        version="v1",
    ):
        super().__init__()
        self.all_patch_size = all_patch_size
        self.all_f_patch_size = all_f_patch_size
        self.in_channels = in_channels
        self.out_channels = in_channels
        self.dim = dim
        self.n_layers = n_layers
        self.n_refiner_layers = n_refiner_layers
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.norm_eps = norm_eps
        self.qk_norm = qk_norm
        self.cap_feat_dim = cap_feat_dim
        self.rope_theta = rope_theta
        self.t_scale = t_scale
        self.axes_dims = axes_dims
        self.axes_lens = axes_lens
        self.version = version
        self.gradient_checkpointing = False

        if version == "v1":
            (
                _control_in_dim_default,
                _layers_places_default,
                _refiner_places_default,
                _add_refiner_default,
            ) = (16, [0, 5, 10, 15, 20, 25], [], False)            
        else:
            (
                _control_in_dim_default,
                _layers_places_default,
                _refiner_places_default,
                _add_refiner_default,
            ) = (33, list(range(0, 30, 2)), [0, 1], True)

        self.control_in_dim = (
            control_in_dim if control_in_dim is not None else _control_in_dim_default
        )
        self.control_layers_places = (
            control_layers_places
            if control_layers_places is not None
            else _layers_places_default
        )
        self.control_refiner_layers_places = (
            control_refiner_layers_places
            if control_refiner_layers_places is not None
            else _refiner_places_default
        )
        self.add_control_noise_refiner = (
            add_control_noise_refiner
            if add_control_noise_refiner is not None
            else _add_refiner_default
        )

.....

If the file exists in the folder, whoever uses from_pretrained will pick up the values from config.json. If it doesn't exist, a safety check fills in the values automatically. This is also useful for those who use from_single_file with a GGUF, for example, loading it by specifying the repository in the config parameter.

transformer = ZImageControlTransformer2DModel.from_single_file(
    "https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.0/blob/main/Z-Image-Turbo-Fun-Controlnet-Union-2.0.gguf",
    torch_dtype=torch.bfloat16,
    config="https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.0",
)
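As a minimal, self-contained sketch of the fallback idea described above (all names here are illustrative, not the repository's actual API; `resolve_control_config` is a hypothetical helper), the version-keyed defaults could look like:

```python
# Sketch of version-keyed defaults: explicit config.json values win;
# otherwise the "version" field selects sensible defaults.
# Hypothetical helper, not part of the actual model code.

def resolve_control_config(version="v1", control_in_dim=None,
                           control_layers_places=None,
                           control_refiner_layers_places=None,
                           add_control_noise_refiner=None):
    if version == "v1":
        defaults = dict(
            control_in_dim=16,
            control_layers_places=[0, 5, 10, 15, 20, 25],
            control_refiner_layers_places=[],
            add_control_noise_refiner=False,
        )
    else:  # v2 and later
        defaults = dict(
            control_in_dim=33,
            control_layers_places=list(range(0, 30, 2)),
            control_refiner_layers_places=[0, 1],
            add_control_noise_refiner=True,
        )
    explicit = dict(
        control_in_dim=control_in_dim,
        control_layers_places=control_layers_places,
        control_refiner_layers_places=control_refiner_layers_places,
        add_control_noise_refiner=add_control_noise_refiner,
    )
    # Any explicitly provided value overrides the version default.
    return {k: (v if v is not None else defaults[k]) for k, v in explicit.items()}
```

With this, `from_single_file` without a config.json could still produce working defaults from the version string alone.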

@elismasilva

The issue is mainly due to an incorrectly written layer, which caused an extra 2.1 that shouldn't exist. This has been quite frustrating.

Well, if the corrected version works as expected, then perhaps there's no point in having a 2.1 version; a corrected 2.0 would just mean updating the repository and notifying people to download again or pull the repository to update the weights, thus avoiding a lot of extra code paths.

@hlky

hlky commented Dec 17, 2025

Hi @bubbliiiing, I'm the author of the Diffusers PR, apologies for any confusion, I don't think it is necessary to change anything here, I've already handled the different versions by uploading HF Hub repos for the configs (and from_pretrained support - but from_single_file works too for the original weights).

@bubbliiiing bubbliiiing merged commit ca4cc15 into main Dec 18, 2025
@bubbliiiing
Collaborator Author

Hi @bubbliiiing, I'm the author of the Diffusers PR, apologies for any confusion, I don't think it is necessary to change anything here, I've already handled the different versions by uploading HF Hub repos for the configs (and from_pretrained support - but from_single_file works too for the original weights).

I'll go ahead and merge this PR. Feel free to share any feedback or suggestions for changes. Thx.

@bubbliiiing
Collaborator Author

The issue is mainly due to an incorrectly written layer, which caused an extra 2.1 that shouldn't exist. This has been quite frustrating.

Well, if the corrected version works as expected, then perhaps there's no point in having a 2.1 version; a corrected 2.0 would just mean updating the repository and notifying people to download again or pull the repository to update the weights, thus avoiding a lot of extra code paths.

Since the weights have already been distributed, and the two sets of weights cannot be distinguished very precisely, I'm concerned this might affect performance in actual use.
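One common way to disambiguate checkpoints that share the same key naming scheme is to inspect the state dict itself, either for version-specific keys or for tensor shapes that differ between versions, before instantiating the model. A hedged sketch (the key names below are hypothetical placeholders, not the actual checkpoint keys):

```python
# Sketch: guess the checkpoint version from its state dict instead of
# relying on user-supplied kwargs. Key names are hypothetical and would
# need to be replaced with the real checkpoint keys.

def guess_version(state_dict):
    # Newer checkpoints add dedicated noise-refiner layers; their mere
    # presence identifies the newer format.
    if any(k.startswith("control_noise_refiner.") for k in state_dict):
        return "v2"
    # Fall back to a shape check: the newer format widens the control
    # input channels (e.g. 16 -> 33).
    w = state_dict.get("control_x_embedder.weight")
    if w is not None and w.shape[-1] >= 33:
        return "v2"
    return "v1"
```

If a reliable discriminating key or shape exists in the two released weight sets, a check like this could remove the ambiguity without requiring users to re-download anything.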

@bubbliiiing bubbliiiing deleted the add_control_noise_refiner_correctly branch December 19, 2025 07:06