PictOBI-20k: Unveiling Large Multimodal Models in Visual Decipherment for Pictographic Oracle Bone Characters 🔍
A pioneering benchmark bridging large multimodal models with the visual decipherment of ancient Chinese oracle bone scripts
†Equal contribution ★Corresponding authors
Overview of PictOBI-20k: We present PictOBI-20k, a large-scale dataset for evaluating LMMs on the visual decipherment of pictographic Oracle Bone Characters (OBCs). The dataset comprises 20k carefully curated OBC–object image pairs and over 15k multi-choice questions. To further assess visual reasoning, we provide subjective annotations examining the consistency of reference points between humans and LMMs. Experimental results suggest that while general LMMs exhibit preliminary visual decipherment ability, they often fail to effectively leverage visual information and remain constrained by language priors. We hope PictOBI-20k can serve as a foundation for advancing evaluation and optimization of visual attention in OBC-oriented LMMs.
Examples of reference point heatmaps:
- [2026/1/17] 🔥🔥🔥 PictOBI-20k has been accepted by ICASSP 2026!
- [2025/12/24] 🔥 PictOBI-20k has been included in the annual-benchmark list of OpenCompass!
- [2025/09/29] 🔥 PictOBI-20k and code have been released!
- [2025/09/09] 🔥 GitHub repository for PictOBI-20k is online.
You can download PictOBI-20k from Netdisk (extraction code: rtx1) or from Hugging Face.
- We recommend directly installing the environment of the model to be evaluated, e.g., Qwen2.5-VL, InternVL3, or the OpenAI SDK.
- Download PictOBI-20k and unzip it to the current path. You will obtain two JSON files (i.e., quiz_data.json for MCQ evaluation; quiz_reference_point.json for reference point exploration) and three image folders (OBC_image, object_image, and reference_point_quiz).
- Multi-choice question evaluation: We provide code examples for a proprietary LMM (GPT) and two open-source LMMs (Qwen25-vl-72B and InternVL3-78B).
Run them directly:
python gpt.py # add your api_key first
python qwen25-vl-72B.py
python internvl3-78B.py
- Reference point exploration: For each model, we also provide a counterpart script whose filename ends with -refpoint.py.
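Once an evaluation script has collected model replies, MCQ accuracy can be scored by extracting the chosen option letter. Below is a minimal, hypothetical sketch: the `extract_choice` and `accuracy` helpers are illustrative, not part of the released scripts, and assume the reply contains a standalone A–D letter.

```python
import re


def extract_choice(reply: str):
    """Pull the first standalone option letter (A-D) out of a model reply.

    Returns None when no standalone letter is found, so refusals or
    malformed replies are simply counted as wrong.
    """
    m = re.search(r"\b([A-D])\b", reply.strip().upper())
    return m.group(1) if m else None


def accuracy(replies, gold):
    """Fraction of replies whose extracted letter matches the gold answer."""
    hits = sum(extract_choice(r) == g for r, g in zip(replies, gold))
    return hits / len(gold)
```

The regex deliberately requires a word boundary on both sides, so the "A" inside a word like "ANSWER" is not picked up.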
We collect OBC and real object images from 12 sources, covering multiple font appearances and categories. Based on these, we construct 15,175 multi-choice questions for LMM evaluation. Meanwhile, we conduct human annotation to obtain reference points on OBC–object image pairs.
We collect OBC images from 3 OBC-centric ancient script websites, YinQiWenYuan, XiaoXueTang, and GuoXueDaShi, as well as 5 open-source OBC datasets, including Oracle-241, Oracle-50k, HUST-OBS, OBI125, and OBIdatasetIJDH. Corresponding real-object images (≈4.8k) are carefully collected from Freepik, Pexels, Pinterest, and the Academia Sinica Bronze Ware Database.
We evaluate 11 LMMs—including GPT-4o, Gemini 2.5 Pro, Claude 4 Sonnet, GLM-4.5V, the Qwen2.5-VL family, and the InternVL3 series—alongside three vision encoders (DINOv2-L/14, CLIP-L/14, InternViT-300M) to assess multimodal and visual-only performance on pictographic OBCs.
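The idea of a "direct readout" from a vision encoder can be sketched as zero-shot nearest-neighbor matching in embedding space: the OBC image embedding is compared against the embeddings of the candidate object images, with no trained head. The function name and the assumption of precomputed embeddings below are ours, not the paper's.

```python
import numpy as np


def direct_readout(obc_emb: np.ndarray, option_embs: np.ndarray) -> int:
    """Pick the candidate object most cosine-similar to the OBC embedding.

    obc_emb: (d,) embedding of the OBC image.
    option_embs: (k, d) embeddings of the k candidate object images.
    Returns the index of the best-matching candidate.
    """
    obc = obc_emb / np.linalg.norm(obc_emb)
    opts = option_embs / np.linalg.norm(option_embs, axis=1, keepdims=True)
    return int(np.argmax(opts @ obc))
```

Because both sides are L2-normalized, the dot product equals cosine similarity, so the readout is invariant to embedding scale.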
Results on the consistency task: the consistency (%) of visual reference points on 240 OBC–object pairs between humans and LMMs.
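Given matched reference-point selections from humans and an LMM, the consistency score reduces to an agreement percentage. A minimal sketch, assuming selections are comparable labels (the label format here is a hypothetical illustration, not the dataset's actual annotation schema):

```python
def consistency(human, lmm) -> float:
    """Percentage of OBC-object pairs where the LMM selects the same
    reference point as the human annotators."""
    assert len(human) == len(lmm), "one selection per pair on both sides"
    agree = sum(h == m for h, m in zip(human, lmm))
    return 100.0 * agree / len(human)
```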
Results on the vision encoder analysis: visualization of attention maps (left) and accuracy from direct readout of the visual encoders (right).
Please contact the authors for queries:
- Zijian Chen: [email protected]
If you find our work useful, please feel free to cite our paper:
@article{chen2025pictobi20kunveilinglargemultimodal,
title={PictOBI-20k: Unveiling Large Multimodal Models in Visual Decipherment for Pictographic Oracle Bone Characters},
author={Zijian Chen and Wenjie Hua and Jinhao Li and Lirong Deng and Fan Du and Tingzhu Chen and Guangtao Zhai},
journal={arXiv preprint arXiv:2509.05773},
year={2025},
}