
Conversation

@tdakhran
Contributor

@tdakhran tdakhran commented Dec 2, 2025

LFM2-Audio-1.5B supports audio input and audio output.

This PR adds only ASR support. To perform ASR, invoke the CLI with:

bin/llama-mtmd-cli -m LFM2-Audio-1.5B-F32.gguf --mmproj mmproj-LFM2-Audio-1.5b-F32.gguf -n 30 --audio input.wav -sys "Perform ASR." -p "<__media__>"

Changes to existing code:

  • the model requires a system prompt; -sys is now enabled for llama-mtmd-cli
  • mel bin generation is reworked: the filter bank is now generated dynamically and supports different n_fft values (see the sketch after this list)
  • OP_SSM_CONV in the CUDA backend is extended to support kernel size 9
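
For illustration, a minimal sketch of generating a mel filter bank dynamically for an arbitrary n_fft (a generic HTK-style triangular-filter construction; an assumption for illustration, not the PR's actual implementation):

import numpy as np

def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int) -> np.ndarray:
    # HTK-style mel scale conversions
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # n_mels + 2 band edges, equally spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)

    # one triangular filter per mel bin over the n_fft // 2 + 1 FFT bins
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)    # rising edge
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)  # falling edge
    return fb

Because the filters are computed from n_fft and the sample rate at load time, no precomputed table has to be shipped for each supported n_fft.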

cc: @ngxson

@tdakhran
Contributor Author

tdakhran commented Dec 2, 2025

Tested that llama-server works as intended with the following input:

[
    {"role": "system", "content": "Perform ASR."},
    {
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "input_audio": {
                    "format": "wav",
                    "data": base64.b64encode(pathlib.Path("/data/playground/issue_400/10.wav").read_bytes()).decode("utf-8"),
                },
            },
        ],
    },
]
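
For reference, a minimal end-to-end sketch of sending that payload to a locally running llama-server via its OpenAI-compatible endpoint (host, port, and file path are assumptions):

import base64
import pathlib
import requests

audio_b64 = base64.b64encode(pathlib.Path("input.wav").read_bytes()).decode("utf-8")

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # default llama-server port assumed
    json={
        "messages": [
            {"role": "system", "content": "Perform ASR."},
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {"format": "wav", "data": audio_b64},
                    },
                ],
            },
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])  # the transcription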

@tdakhran tdakhran changed the title from "model : add LFM2-Audio-1.5B support" to "model : add ASR support for LFM2-Audio-1.5B" on Dec 2, 2025
@github-actions github-actions bot added labels on Dec 2, 2025: testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), examples, python (python script changes), ggml (changes relating to the ggml tensor library for machine learning)
@tdakhran
Contributor Author

The code is tested. I'll wait for #17978 to be merged, then rebase and mark the PR as "ready for review".

@tdakhran
Contributor Author

The code is ready for review and is tested with mtmd-cli and llama-server.

python convert_hf_to_gguf.py  /data/playground/checkpoints/LFM2-Audio-1.5B --outtype f32
python convert_hf_to_gguf.py  /data/playground/checkpoints/LFM2-Audio-1.5B --outtype f32 --mmproj

build/bin/llama-mtmd-cli -m /data/playground/checkpoints/LFM2-Audio-1.5B/LFM2-Audio-1.5B-F32.gguf --mmproj /data/playground/checkpoints/LFM2-Audio-1.5B/mmproj-LFM2-Audio-1.5b-F32.gguf -n 30 --audio /data/playground/issue_400/10.wav -sys "Perform ASR." -p "<__media__>" -v

produces valid results for the attached file 10.wav:

encoding audio slice...
audio slice encoded in 39 ms
decoding audio batch 1/1, n_tokens_batch = 33
audio decoded (batch 1/1) in 109 ms

I need more air. Can you increase the fan speed?

Comment on lines 114 to 126
Kcur = ggml_cont(ctx0, ggml_permute(ctx0, Kcur, 0, 2, 1, 3));
Q_bias_u = ggml_cont(ctx0, ggml_permute(ctx0, Q_bias_u, 0, 2, 1, 3));
ggml_tensor * matrix_ac = ggml_mul_mat(ctx0, Q_bias_u, Kcur);
matrix_ac = ggml_cont(ctx0, ggml_permute(ctx0, matrix_ac, 1, 0, 2, 3));
cb(matrix_ac, "conformer.layers.{}.self_attn.id3", il);

auto * p = ggml_mul_mat(ctx0, layer.linear_pos_w, pos_emb);
cb(p, "conformer.layers.{}.self_attn.linear_pos", il);
p = ggml_reshape_3d(ctx0, p, d_head, n_head, p->ne[1]);

Q_bias_v = ggml_cont(ctx0, ggml_permute(ctx0, Q_bias_v, 0, 2, 1, 3));
cb(Q_bias_v, "conformer.layers.{}.self_attn.id0", il);
p = ggml_cont(ctx0, ggml_permute(ctx0, p, 1, 2, 0, 3));
Collaborator

Do you think we could replace this with build_attn?

The advantage of build_attn is that it supports flash attention, which can significantly improve performance, but I'm not sure whether anything is currently missing to make it work in this case.

Contributor Author

I saw some extra pieces like the biases and the matrix_ac / matrix_bd terms, which scared me off, so I followed the Python implementation as-is. I'll give it a second look.
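
For context: if this follows the Transformer-XL-style relative attention used in Conformer encoders (an assumption based on the matrix_ac / matrix_bd naming, not confirmed by the source), the two terms decompose the attention score as

score = (Q + u) K^T + (Q + v) P^T

with matrix_ac = (Q + u) K^T and matrix_bd = (Q + v) P^T, where u and v are learned bias vectors and P is the projected relative positional embedding.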

Contributor Author

Looked into it: build_attn won't fit, there are too many customizations to the attention.

Comment on lines 141 to 148
matrix_bd = ggml_reshape_3d(ctx0, matrix_bd, q_len, pos_len + 1, h);
matrix_bd = ggml_cont(ctx0, ggml_view_3d(ctx0, matrix_bd,
                                         q_len, pos_len, h,
                                         matrix_bd->nb[1], matrix_bd->nb[2], matrix_bd->nb[0] * q_len));
matrix_bd = ggml_reshape_3d(ctx0, matrix_bd, pos_len, q_len, h);
}

matrix_bd = ggml_cont(ctx0, ggml_view_3d(ctx0, matrix_bd,
Collaborator

A bit strange that we have these 4 reshapes/views without any permutations. Can we collapse this into a single ggml_reshape_3d?

Contributor Author

If it were a plain view, the reshapes could be simplified, but there is a crop happening inside the ggml_view_3d.
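
To illustrate the idea, a small numpy sketch of the shift-and-crop this reshape/view sequence appears to implement (shapes are inferred from the quoted code, and ggml lists dimensions fastest-first, so the numpy shapes below are reversed; an illustration, not a verified equivalence):

import numpy as np

h, q_len, pos_len = 2, 3, 5
x = np.arange(h * (pos_len + 1) * q_len, dtype=float).reshape(h, pos_len + 1, q_len)

# Flatten each head, drop the first q_len elements, then refold. Each row
# of the result is realigned relative to its neighbors, which is the
# classic Transformer-XL "rel-shift" trick for relative positions.
y = x.reshape(h, -1)[:, q_len:].reshape(h, q_len, pos_len)

The crop (dropping q_len elements) is what prevents the four reshapes/views from collapsing into a single ggml_reshape_3d.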

Collaborator

hmm yeah interesting. not very important to optimize this, so I'll have a look later to see if there is another way

Comment on lines 209 to 211
x = ggml_cont(ctx0, ggml_transpose(ctx0, x));
x = ggml_add(ctx0, ggml_mul(ctx0, x, layer.conv_norm_w), layer.conv_norm_b);
x = ggml_cont(ctx0, ggml_transpose(ctx0, x));
Collaborator

We may be able to remove the transposes if conv_norm_b is already transposed upon conversion?
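
If we go that route, a hedged sketch of what the conversion-time change could look like in the model's convert_hf_to_gguf.py class, using the existing modify_tensors hook (the tensor-name check is hypothetical, and this only applies to a 2-D tensor):

def modify_tensors(self, data_torch, name, bid):
    # hypothetical: pre-transpose a 2-D weight once at conversion so the
    # compute graph can skip the runtime ggml_transpose pair
    if name.endswith("conv.norm.weight") and data_torch.dim() == 2:
        data_torch = data_torch.transpose(0, 1).contiguous()
    return [(self.map_tensor_name(name), data_torch)]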

Comment on lines 217 to 221
x = ggml_cont(ctx0, ggml_transpose(ctx0, x));
auto * conv_pw2_w = ggml_reshape_2d(ctx0, layer.conv_pw2_w, layer.conv_pw2_w->ne[1], layer.conv_pw2_w->ne[2]);
x = ggml_mul_mat(ctx0, conv_pw2_w, x);
x = ggml_add(ctx0, x, layer.conv_pw2_b);
x = ggml_cont(ctx0, ggml_transpose(ctx0, x));
Collaborator

(I'll look into this.) I suspect these two transposes can be removed too (or at worst, one can be a view).

Contributor Author

Many of the transposes here follow the Python code without optimization in mind; the objective was to get numerically close intermediates. I'll take a closer look to understand what can be optimized.

Contributor Author

Removed most of the transposes.

Comment on lines 251 to 258
cur = ggml_mul_mat(ctx0, model.mm_1_w, cur);
cur = ggml_add(ctx0, cur, model.mm_1_b);
cb(cur, "audio_adapter.model.{}", 1);
cur = ggml_gelu_erf(ctx0, cur);
cb(cur, "audio_adapter.model.{}", 2);
cur = ggml_mul_mat(ctx0, model.mm_3_w, cur);
cur = ggml_add(ctx0, cur, model.mm_3_b);
cb(cur, "audio_adapter.model.{}", 3);
Collaborator

this can be replaced with build_ffn
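
For context, the quoted block is a plain two-layer MLP:

y = mm_3_w @ gelu_erf(mm_1_w @ x + mm_1_b) + mm_3_b

which is the pattern build_ffn encapsulates, so the replacement should be mechanical.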

Contributor Author

Didn't recognize the pattern; will replace.

@tdakhran
Contributor Author

@ngxson , I addressed most of the feedback, added a comment explaining why build_attn cannot be used, removed unnecessary transposes, and simplified permutes. Applied the formatting as well.

PR requires #18061, otherwise rope_theta won't be set.

@tdakhran tdakhran force-pushed the tarek/feat/lfm2-asr-upstream branch from 8ba4562 to ba9e597 on December 15, 2025 21:14
@tdakhran
Contributor Author

Rebased to incorporate #18061; it now works as-is.

@ngxson
Collaborator

ngxson commented Dec 15, 2025

Thanks @tdakhran! I'll do a final review tomorrow and will push commits directly here if needed.

For now, my priority will be to make sure that the GGUF is ready for any possible optimizations in the future. We can then look deeper into these optimizations in a follow-up PR (so users won't have to re-generate the GGUF)

@ngxson

This comment was marked as outdated.

@ngxson
Collaborator

ngxson commented Dec 16, 2025

nevermind, I can do a follow-up PR

@ngxson
Collaborator

ngxson commented Dec 16, 2025

Huh? I have no idea why GitHub doesn't allow me to merge it 😂

I will copy the commit to another PR then


@ngxson
Collaborator

ngxson commented Dec 16, 2025

Superseded by #18106

@tdakhran
Contributor Author

@ngxson, my bad, I think I forgot to tick "allow edits" when creating the PR.
