mtmd: Add DeepSeekOCR Support #17400

sfallah · 2025-11-20T09:11:15Z

Feature Request: #16676

GGUF Models

sabafallah/DeepSeek-OCR-GGUF

deepseek-ocr-f32.gguf

mmproj-deepseek-ocr-f32.gguf

Running the Model

Build llama.cpp (Mac)

cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --config Release

Running llama-mtmd-cli

DeepSeekOCR Paper (First page)

build/bin/llama-mtmd-cli \
-m gguf_models/deepseek-ai/deepseek-ocr-f16.gguf \
--mmproj gguf_models/deepseek-ai/mmproj-deepseek-ocr-f16.gguf \
--image tmp/mtmd_test_data/Deepseek-OCR-2510.18234v1_page1.png \
-p "<|grounding|>Convert the document to markdown." \
--chat-template deepseek-ocr --temp 0

Hard Test (Old Newspaper Image)

build/bin/llama-mtmd-cli \
-m gguf_models/deepseek-ai/deepseek-ocr-f16.gguf \
--mmproj gguf_models/deepseek-ai/mmproj-deepseek-ocr-f16.gguf \
--image tools/mtmd/test-1.jpeg \
-p "<|grounding|>Convert the document to markdown." \
--chat-template deepseek-ocr --temp 0
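
The two invocations above differ only in the image path, so they can be wrapped in a small helper when scripting repeated runs. The following is a minimal sketch, not part of this PR: the function names `build_mtmd_command` and `run_ocr` are hypothetical, and only the flags shown in the commands above are used.

```python
import subprocess


def build_mtmd_command(model, mmproj, image, prompt,
                       cli="build/bin/llama-mtmd-cli"):
    """Assemble the argument list matching the example commands in this PR."""
    return [
        str(cli),
        "-m", str(model),                    # language model GGUF
        "--mmproj", str(mmproj),             # vision projector GGUF
        "--image", str(image),               # input image to OCR
        "-p", prompt,                        # e.g. the grounding/markdown prompt
        "--chat-template", "deepseek-ocr",
        "--temp", "0",                       # temperature 0, as in the examples
    ]


def run_ocr(model, mmproj, image, prompt):
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    cmd = build_mtmd_command(model, mmproj, image, prompt)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with the `<|grounding|>` prompt, which contains characters the shell would otherwise interpret.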

ngxson · 2025-12-12T15:06:09Z

Heads up: sorry for the breaking change, but there will be a refactoring (just moving stuff around) in #17965.

After finishing with this refactoring (and after you're done testing on your side), I'll go back to deepseek-ocr.

sfallah · 2025-12-13T16:49:49Z

@ngxson

> heads up, sorry for the breaking change but there will be a refactoring (just moving stuff around) in #17965
>
> after finishing with this refactoring (and after you're done testing on your side), I'll go back to deepseek-ocr

The merge with #17965 is done.
I have also added deepseek-ocr to tests.sh.
As far as my tests go, it works, but the Python test script is not done yet.
I will finish the test script tomorrow.
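
The test script mentioned here is described in the commit log as checking OCR output on the newspaper image against expected reference model output. The actual script is not shown in this conversation, so the following is only a sketch of how such a comparison could work, assuming a whitespace-normalized similarity ratio. The function names and the 0.9 threshold are hypothetical.

```python
import difflib


def normalize(text):
    """Collapse whitespace runs so line wrapping does not affect the score."""
    return " ".join(text.split())


def similarity(actual, expected):
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    matcher = difflib.SequenceMatcher(None, normalize(actual), normalize(expected))
    return matcher.ratio()


def check_ocr(actual, expected, threshold=0.9):
    """Return (passed, score); the threshold is an assumed value, not from the PR."""
    score = similarity(actual, expected)
    return score >= threshold, score
```

A ratio-based check tolerates small differences between backends or quantizations, where an exact string match against the reference output would fail.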

sfallah and others added 22 commits November 14, 2025 12:40

mtmd: llama.cpp DeepSeekOCR support

43a130b

init commit

loading sam tensors

b6b9f02

mtmd: fix vision model processing

85c7cda

Merge pull request #1 from bluebread/sf/deepseek-ocr

578c8d7

mtmd: fix vision model processing

deepseek-ocr clip-vit model impl

2aab52e

mtmd: add DeepSeek-OCR LM support with standard attention

eab28ed

mtmd: successfully runs DeepSeek-OCR LM in llama-cli

7630587

mtmd: Fix RoPE type for DeepSeek-OCR LM.

2de3436

Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…

e8b2610

…f/deepseek-ocr

loading LM

97e0907

testing Vision model loading

Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr

13dc6fb

Merge pull request #2 from bluebread/sf/deepseek-ocr

b32bb5e

mtmd: DeepseekOCR Implement DeepSeek3B-MoE-A570M (LM component)

sam warmup working

790bbb9

sam erroneous return corrected

cec9a5c

clip-vit: corrected cls_embd concat

8b3d319

clip-vit: model convert qkv_proj split

1e08157

corrected combining of image encoders' results

331cea8

fix: update callback for ffn_moe_weighted and add callback for attn_o…

6c0715b

…ut in deepseek2 model

Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…

a65ddf5

…f/deepseek-ocr

concat image_newline and image_seperator tokens

63a042f

visual_model warmup (technically) works

89afda8

window partitioning using standard ggml ops

88032f4

sfallah requested review from CISC, ggerganov and ngxson as code owners November 20, 2025 09:11

github-actions bot added the labels model (Model specific), examples, python (python script changes) Nov 20, 2025

sfallah marked this pull request as draft November 20, 2025 09:12

sfallah mentioned this pull request Nov 20, 2025

ggml : enhance rel-pos and window ops with CUDA support #17383

Open

ngxson mentioned this pull request Dec 12, 2025

clip: move model cgraphs into their own files #17965

Merged

sfallah added 5 commits December 13, 2025 10:59

Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr-me…

e0e69fd

…rge_#17965 # Conflicts: # src/llama-kv-cache.cpp # tools/mtmd/clip.cpp

quick and (potential) dirty merge with ggml-org#17909

f95a6fe

refactoring, one single builder function and static helpers

f7736f2

added deepseek-ocr test to tests.sh

fb3bb6a

Merge pull request #11 from sfallah/sf/deepseek-ocr-merge_#17965

1b38ccf

Merged with PR ggml-org#17965

bluebread mentioned this pull request Dec 14, 2025

mtmd: generalize image resizing in llava_uhd #18014

Merged

sfallah added 6 commits December 14, 2025 15:14

minor formatting fixes

6c36c03

check with fixed expected resutls

dc2066e

Merge pull request #10 from sfallah/sf/deepseek-ocr-test-script

3fc61d4

python test script for deepseek-ocr testing OCR on text-1.jpeg newspaper image checking against expected reference model output for Free-OCR and Markdown

minor formatting

7f8621c

Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr

b3bf8cb

# Conflicts: # convert_hf_to_gguf.py

editorconfig-check fix

8ad98ee

kmk142789 approved these changes Dec 15, 2025

sfallah requested review from CISC and ngxson December 16, 2025 06:20

sfallah added 5 commits December 16, 2025 09:09

Merge branch 'ggml-org:master' into sf/deepseek-ocr

4a4f829

Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr

51c3de6

# Conflicts: # gguf-py/gguf/constants.py # gguf-py/gguf/tensor_mapping.py # tools/mtmd/clip-impl.h # tools/mtmd/clip.cpp # tools/mtmd/models/models.h

merge with changes from ggml-org#18042

512b2c8

Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr

00d2357

# Conflicts: # src/llama-arch.cpp

minor

87e4a00

- added GLM-4.6V to big tests - added missing deps for python test

CISC reviewed Dec 16, 2025

convert_hf_to_gguf.py: 2 review comments (outdated, resolved)

convert: minor fix

f629d02

CISC reviewed Dec 17, 2025

convert_hf_to_gguf.py: 2 review comments (outdated, resolved)

bluebread and others added 4 commits December 17, 2025 03:26

mtmd: format code

5a741fd

convert: quick fix

616f009

convert: quick fix

e5d426b

minor python formatting

c739cf2


7 participants
