fixed lfm2 vlm lmhead issue that came in with hf 5.0.0 #426

Merged
HenryNdubuaku merged 3 commits into main from karen/lfm2-vl-fix
Feb 24, 2026
Conversation

@kar-m (Collaborator) commented Feb 24, 2026

Had a problem with LFM2-VLM where the model would not output anything. The breaking change was introduced at commit addad5d, where the Transformers version was changed. To replicate, run:

deactivate
cactus clean
source setup
cactus test --reconvert

The fix was to manually hard-code the tying of the embeddings and the LM head for the LFM2-VLM model.

Also fixed a seemingly unrelated problem that somehow only now appeared: the LFM2.5-VLM doesn't have a layer norm for its projector, so the model wouldn't even load. Not sure why this wasn't a problem before.

Copilot AI review requested due to automatic review settings February 24, 2026 07:49

Copilot AI left a comment


Pull request overview

This PR updates the LFM2-VLM model loader/build to restore text generation after a Transformers/HF conversion change, and to tolerate a projector LayerNorm that is absent in some LFM2.5-VLM checkpoints.

Changes:

  • Make projector LayerNorm weights optional (skip layernorm op when weights are unavailable).
  • Change LM-head weight mapping to use token_embeddings.weights (manual tying) instead of output_weight.weights.


projector_weights_.linear_2_weight = gb->mmap_weights(resolve_weight("projector_linear_2.weights", "projector_linear2.weights"));
projector_weights_.linear_2_bias = gb->mmap_weights(resolve_weight("projector_linear_2.bias.weights", "projector_linear2.bias.weights"));
- output_weight_node_id_ = gb->mmap_weights(resolve_weight("output_weight.weights"));
+ output_weight_node_id_ = gb->mmap_weights(resolve_weight("token_embeddings.weights"));

Copilot AI Feb 24, 2026


output_weight_node_id_ is now always sourced from token_embeddings.weights, which ignores the possibility of an untied LM head (output_weight.weights differing from embeddings). To keep correctness across configs/converted checkpoints, prefer output_weight.weights when present and fall back to token_embeddings.weights only if the output weight file is missing. Also consider reusing the already-mapped embedding node (or mmap_embeddings) to avoid double-mmapping the same file into the graph.

Suggested change
- output_weight_node_id_ = gb->mmap_weights(resolve_weight("token_embeddings.weights"));
+ output_weight_node_id_ = gb->mmap_weights(resolve_weight("output_weight.weights", "token_embeddings.weights"));

Comment on lines +105 to +110
try { // these guys are not present in lfm2.5, so in lieu of adding a config option detecting that i just try catch them
projector_weights_.layer_norm_weight = gb->mmap_weights(resolve_weight("projector_layer_norm.weights"));
projector_weights_.layer_norm_bias = gb->mmap_weights(resolve_weight("projector_layer_norm.bias.weights"));
} catch (const std::exception&) {
projector_weights_.layer_norm_weight = 0;
projector_weights_.layer_norm_bias = 0;

Copilot AI Feb 24, 2026


The inline comment in the try block is informal and uses exception handling as control flow. Since “missing projector layernorm” is an expected variant, it would be clearer and cheaper to explicitly check std::filesystem::exists() for those files (and document why) rather than catching all std::exception (which can also mask real corruption/format errors).

Suggested change
- try { // these guys are not present in lfm2.5, so in lieu of adding a config option detecting that i just try catch them
-     projector_weights_.layer_norm_weight = gb->mmap_weights(resolve_weight("projector_layer_norm.weights"));
-     projector_weights_.layer_norm_bias = gb->mmap_weights(resolve_weight("projector_layer_norm.bias.weights"));
- } catch (const std::exception&) {
-     projector_weights_.layer_norm_weight = 0;
-     projector_weights_.layer_norm_bias = 0;
+ // Projector layer norm weights are optional: some model variants (e.g. lfm2.5) omit them.
+ // Explicitly check for their presence instead of relying on exceptions as control flow.
+ fs::path layer_norm_weight_path = base / "projector_layer_norm.weights";
+ fs::path layer_norm_bias_path = base / "projector_layer_norm.bias.weights";
+ if (fs::exists(layer_norm_weight_path) && fs::exists(layer_norm_bias_path)) {
+     projector_weights_.layer_norm_weight = gb->mmap_weights(layer_norm_weight_path.string());
+     projector_weights_.layer_norm_bias = gb->mmap_weights(layer_norm_bias_path.string());
+ } else {
+     projector_weights_.layer_norm_weight = 0;
+     projector_weights_.layer_norm_bias = 0;
Comment on lines +105 to +111
try { // these guys are not present in lfm2.5, so in lieu of adding a config option detecting that i just try catch them
projector_weights_.layer_norm_weight = gb->mmap_weights(resolve_weight("projector_layer_norm.weights"));
projector_weights_.layer_norm_bias = gb->mmap_weights(resolve_weight("projector_layer_norm.bias.weights"));
} catch (const std::exception&) {
projector_weights_.layer_norm_weight = 0;
projector_weights_.layer_norm_bias = 0;
}

Copilot AI Feb 24, 2026


Using 0 as a sentinel for “missing” layer-norm weights is unsafe because CactusGraph node IDs start at 0 (the first mmap_weights() call will typically return node_id=0). That means the layer norm can be incorrectly treated as absent even when it loaded successfully. Use a dedicated invalid value (e.g., std::numeric_limits<size_t>::max()), or store presence as a separate boolean/std::optional<size_t> and branch on that instead.

Comment on lines +149 to +153
size_t normalized = flattened;
if (projector_weights_.layer_norm_weight != 0 && projector_weights_.layer_norm_bias != 0) {
normalized = gb->layernorm(flattened, projector_weights_.layer_norm_weight,
projector_weights_.layer_norm_bias, config_.layer_norm_eps);
}

Copilot AI Feb 24, 2026


This check uses != 0 to decide whether to apply layernorm, but node_id 0 is a valid graph node ID. If the layer-norm weight happened to be loaded as node 0, normalization will be skipped unintentionally. Switch to an explicit “present” flag or compare against a dedicated invalid sentinel value instead of 0.

@HenryNdubuaku HenryNdubuaku merged commit 837169e into main Feb 24, 2026
5 of 6 checks passed
HenryNdubuaku added a commit to jrajala6/cactus that referenced this pull request Feb 26, 2026
…e#426)

* fixed lfm2 vlm lmhead issue that came in with hf 5.0.0

Signed-off-by: Karen Mosoyan <[email protected]>

* fixed vlm2.5 projector

Signed-off-by: Karen Mosoyan <[email protected]>

* refactor error handling in load_weights_to_graph for projector weights

Signed-off-by: HenryNdubuaku <[email protected]>

---------

Signed-off-by: Karen Mosoyan <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
Co-authored-by: HenryNdubuaku <[email protected]>
cattermelon1234 pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 28, 2026
…e#426)


3 participants