model : add PaddleOCR #16701
@ngxson thanks for the great work on this. I was really looking forward to benchmarking this model, until I saw its limitations. On your point here, "Model generate hallucinated text, likely because of the projector being incorrect": I don't think it's due to the projector. I cloned your branch to see why it's hallucinating, and it seems to be due to the missing input pre-processing that the "PP-DocLayoutV2" model performs. PaddleOCR-VL is not an end-to-end VLM; it relies on "PP-DocLayoutV2" for detection. It's basically a glorified version of LayoutLM.
@TalonBvV thanks for the info. Yes, I had almost come to the same conclusion too. The main issue is that PaddleOCR is not one monolithic model like Qwen or Deepseek-OCR; it's more like a pipeline of multiple models glued together. Therefore, I don't think we currently have the infrastructure to bring it into llama.cpp. I'll close this PR for now as it's not giving any meaningful results. For users who need to do OCR tasks, I would recommend having a look at the latest Qwen3-VL series, or LightOnOCR-1B.
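To make the "pipeline of multiple models" point concrete, here is a minimal sketch of a two-stage flow, assuming a layout detector that returns labelled boxes and a recognizer that runs once per cropped region. `detect_layout`, `recognize_region`, `Region`, and the box convention are hypothetical stand-ins, not the PaddleOCR or llama.cpp API; both stages are stubbed so the control flow can run on its own.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); hypothetical convention


@dataclass
class Region:
    label: str  # e.g. "text" or "table"; labels are illustrative only
    box: Box


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut a rectangular region out of a row-major 2D image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def run_pipeline(image, detect_layout, recognize_region) -> List[Tuple[str, str]]:
    """Stage 1 finds regions; stage 2 reads each crop separately."""
    results = []
    for region in detect_layout(image):
        patch = crop(image, region.box)
        results.append((region.label, recognize_region(patch, region.label)))
    return results


# Stub stages standing in for the layout model and the vision-language model.
def fake_detect(image):
    return [Region("text", (0, 0, 2, 1)), Region("table", (0, 1, 2, 2))]


def fake_recognize(patch, label):
    return f"{label}:{len(patch)}x{len(patch[0])}"


page = [[1, 2], [3, 4]]
output = run_pipeline(page, fake_detect, fake_recognize)
```

In this sketch the recognizer only ever sees cropped regions, so handing it a whole page skips stage 1 entirely.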
@TalonBvV Hi, I managed to convert the PP-DocLayoutV2 part of the pipeline into ONNX format by adding an ONNX conversion mapping for index_put to paddle2onnx. I've been looking into DeepSeek-OCR, but its accuracy is actually lower than PaddleOCR-VL's for real-world use, and PaddleOCR-VL is also only 0.9B, which makes it runnable on basically any device.
This is a very early WIP
Progress:
The language model is working. The vision encoder has been added, but is not yet numerically correct.
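One way to check "numerically correct" is to compare the encoder's output against a reference implementation on the same input. The sketch below is an assumption about how such a check could look, not the method used in this PR; the helper names and the tolerances (`atol`, `min_cos`) are illustrative and would need tuning per model and precision.

```python
import math
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flat tensors."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flat tensors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_close(a, b, atol: float = 1e-3, min_cos: float = 0.999) -> bool:
    """Pass only if both the worst-case error and the overall direction agree."""
    return max_abs_diff(a, b) <= atol and cosine_similarity(a, b) >= min_cos


# Hypothetical values: a reference embedding and a port's output for the same input.
reference = [0.10, -0.20, 0.30]
candidate = [0.1001, -0.2001, 0.2999]
matches = is_close(reference, candidate)
```

Using both metrics is a deliberate choice here: the maximum absolute difference catches isolated bad elements, while cosine similarity catches a systematically wrong output, such as permuted dimensions.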