PictOBI-20k: Unveiling Large Multimodal Models in Visual Decipherment for Pictographic Oracle Bone Characters 🔍
A pioneering benchmark bridging large multimodal models with the visual decipherment of ancient Chinese oracle bone scripts
†Equal contribution ★Corresponding authors
Overview of PictOBI-20k: We present PictOBI-20k, a large-scale dataset for evaluating LMMs on the visual decipherment of pictographic Oracle Bone Characters (OBCs). The dataset comprises 20k carefully curated OBC–object image pairs and over 15k multi-choice questions. To further assess visual reasoning, we provide subjective annotations examining the consistency of reference points between humans and LMMs. Experimental results suggest that while general LMMs exhibit preliminary visual decipherment ability, they often fail to effectively leverage visual information and remain constrained by language priors. We hope PictOBI-20k can serve as a foundation for advancing evaluation and optimization of visual attention in OBC-oriented LMMs.
Examples of reference point heatmaps:
- [2026/1/17] 🔥🔥🔥 PictOBI-20k has been accepted by ICASSP 2026!
- [2025/12/24] 🔥 PictOBI-20k has been included in the annual-benchmark list of OpenCompass!
- [2025/09/29] 🔥 PictOBI-20k and code have been released!
- [2025/09/09] 🔥 GitHub repository for PictOBI-20k is online.
You can download PictOBI-20k from Netdisk (extraction code: rtx1) or from Hugging Face.
- We recommend directly installing the environment of the model to be evaluated, e.g., Qwen2.5-VL, InternVL3, or the OpenAI SDK.
- Download PictOBI-20k and unzip it to the current path. You will obtain two JSON files (i.e., quiz_data.json for MCQ evaluation; quiz_reference_point.json for reference point exploration) and three image folders (OBC_image, object_image, and reference_point_quiz).
- Multi-choice question evaluation: We provide code examples for a proprietary LMM (GPT) and two open-source LMMs (Qwen25-vl-72B and InternVL3-78B).
Run them directly:
python gpt.py # add your api_key first
python qwen25-vl-72B.py
python internvl3-78B.py
- Reference point exploration: For each model, we also provide a counterpart script whose filename ends with -refpoint.py.
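Once an evaluation script has collected model replies, MCQ accuracy can be scored by extracting the chosen option letter. Below is a minimal, hypothetical sketch: the `extract_choice` and `accuracy` helpers are illustrative, not part of the released scripts, and assume the reply contains a standalone A–D letter.

```python
import re


def extract_choice(reply: str):
    """Pull the first standalone option letter (A-D) out of a model reply.

    Returns None when no standalone letter is found, so refusals or
    malformed replies are simply counted as wrong.
    """
    m = re.search(r"\b([A-D])\b", reply.strip().upper())
    return m.group(1) if m else None


def accuracy(replies, gold):
    """Fraction of replies whose extracted letter matches the gold answer."""
    hits = sum(extract_choice(r) == g for r, g in zip(replies, gold))
    return hits / len(gold)
```

The regex deliberately requires a word boundary on both sides, so the "A" inside a word like "ANSWER" is not picked up.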
We collect OBC and real object images from 12 sources, covering multiple font appearances and categories. Based on these, we construct 15,175 multi-choice questions for LMM evaluation. Meanwhile, we conduct human annotation to obtain reference points on OBC–object image pairs.
We collect OBC images from 3 OBC-centric ancient script websites, YinQiWenYuan, XiaoXueTang, and GuoXueDaShi, as well as 5 open-source OBC datasets, including Oracle-241, Oracle-50k, HUST-OBS, OBI125, and OBIdatasetIJDH. Corresponding real-object images (≈4.8k) are carefully collected from Freepik, Pexels, Pinterest, and the Academia Sinica Bronze Ware Database.
We evaluate 11 LMMs—including GPT-4o, Gemini 2.5 Pro, Claude 4 Sonnet, GLM-4.5V, the Qwen2.5-VL family, and the InternVL3 series—alongside three vision encoders (DINOv2-L/14, CLIP-L/14, InternViT-300M) to assess multimodal and visual-only performance on pictographic OBCs.
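The idea of a "direct readout" from a vision encoder can be sketched as zero-shot nearest-neighbor matching in embedding space: the OBC image embedding is compared against the embeddings of the candidate object images, with no trained head. The function name and the assumption of precomputed embeddings below are ours, not the paper's.

```python
import numpy as np


def direct_readout(obc_emb: np.ndarray, option_embs: np.ndarray) -> int:
    """Pick the candidate object most cosine-similar to the OBC embedding.

    obc_emb: (d,) embedding of the OBC image.
    option_embs: (k, d) embeddings of the k candidate object images.
    Returns the index of the best-matching candidate.
    """
    obc = obc_emb / np.linalg.norm(obc_emb)
    opts = option_embs / np.linalg.norm(option_embs, axis=1, keepdims=True)
    return int(np.argmax(opts @ obc))
```

Because both sides are L2-normalized, the dot product equals cosine similarity, so the readout is invariant to embedding scale.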
Results on the consistency task: the consistency (%) of visual reference points on 240 OBC–object pairs between humans and LMMs.
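Given matched reference-point selections from humans and an LMM, the consistency score reduces to an agreement percentage. A minimal sketch, assuming selections are comparable labels (the label format here is a hypothetical illustration, not the dataset's actual annotation schema):

```python
def consistency(human, lmm) -> float:
    """Percentage of OBC-object pairs where the LMM selects the same
    reference point as the human annotators."""
    assert len(human) == len(lmm), "one selection per pair on both sides"
    agree = sum(h == m for h, m in zip(human, lmm))
    return 100.0 * agree / len(human)
```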
Results on the vision encoder analysis: visualization of attention maps (left) and accuracy from direct readout of the visual encoders (right).
Please contact the authors for queries:
- Zijian Chen: [email protected]
If you find our work useful, please feel free to cite our paper:
@article{chen2025pictobi20kunveilinglargemultimodal,
title={PictOBI-20k: Unveiling Large Multimodal Models in Visual Decipherment for Pictographic Oracle Bone Characters},
author={Zijian Chen and Wenjie Hua and Jinhao Li and Lirong Deng and Fan Du and Tingzhu Chen and Guangtao Zhai},
journal={arXiv preprint arXiv:2509.05773},
year={2025},
}