
PictOBI-20k: Unveiling Large Multimodal Models in Visual Decipherment for Pictographic Oracle Bone Characters 🔍

A pioneering benchmark bridging large multimodal models with the visual decipherment of ancient Chinese oracle bone scripts

Zijian Chen1,2,†, Wenjie Hua3,†, Jinhao Li4, Lirong Deng5, Fan Du6, Tingzhu Chen1,★, Guangtao Zhai1,2,★
1Shanghai Jiao Tong University 2Shanghai AI Lab 3Wuhan University 4East China Normal University 5Macao Polytechnic University 6Southern University of Science and Technology

†Equal contribution    ★Corresponding authors

Chinese version (Zhihu)

Overview of PictOBI-20k: We present PictOBI-20k, a large-scale dataset for evaluating LMMs on the visual decipherment of pictographic Oracle Bone Characters (OBCs). The dataset comprises 20k carefully curated OBC–object image pairs and over 15k multi-choice questions. To further assess visual reasoning, we provide subjective annotations examining the consistency of reference points between humans and LMMs. Experimental results suggest that while general LMMs exhibit preliminary visual decipherment ability, they often fail to effectively leverage visual information and remain constrained by language priors. We hope PictOBI-20k can serve as a foundation for advancing evaluation and optimization of visual attention in OBC-oriented LMMs.

Examples of reference point heatmaps:

Release

  • [2026/1/17] 🔥🔥🔥 PictOBI-20k has been accepted by ICASSP 2026!
  • [2025/12/24] 🔥 PictOBI-20k has been included in the annual-benchmark list of OpenCompass!
  • [2025/09/29] 🔥 PictOBI-20k and the code have been released!
  • [2025/09/09] 🔥 The GitHub repository for PictOBI-20k is online.

Dataset 📦

You can download PictOBI-20k via Netdisk (extraction code: rtx1) or Hugging Face.

Implementation 💻

We recommend directly installing the environment for the model to be evaluated (e.g., Qwen2.5-VL, InternVL3, or the OpenAI API).

  1. Download PictOBI-20k and unzip it to the current path. You will have two JSON files (quiz_data.json for MCQ evaluation; quiz_reference_point.json for reference point exploration) and three image folders (OBC_image, object_image, and reference_point_quiz).

  2. Multi-choice question evaluation: we provide three pairs of code examples for a proprietary LMM (GPT) and open-source LMMs (Qwen2.5-VL-72B and InternVL3-78B). Run them directly:

python gpt.py  # add your api_key first
python qwen25-vl-72B.py
python internvl3-78B.py

  3. Reference point exploration: we also provide three code examples, each ending with -refpoint.py.
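Once the evaluation scripts have collected model replies, scoring reduces to extracting the chosen option letter from each free-form reply and comparing it with the gold answer. A minimal sketch, assuming each quiz entry stores its gold choice under an "answer" key (the field name and reply format are illustrative assumptions, not the repository's actual schema):

```python
import re

def extract_choice(reply: str):
    """Pull the first standalone option letter (A-D) out of a free-form LMM reply."""
    m = re.search(r"\b([A-D])\b", reply)
    return m.group(1) if m else None

def score(entries, replies):
    """Fraction of questions whose extracted letter matches the gold answer."""
    correct = sum(
        extract_choice(reply) == entry["answer"]  # "answer" key is an assumption
        for entry, reply in zip(entries, replies)
    )
    return correct / len(entries)

# Toy example with two (entry, reply) pairs
entries = [{"answer": "B"}, {"answer": "C"}]
replies = [
    "The pictograph depicts a vessel, so the answer is B.",
    "I would pick (A).",
]
print(score(entries, replies))  # 0.5
```

Regex-based letter extraction is deliberately simple; the repository's own scripts may constrain the model's output format instead.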

Benchmark Design

Focusing on the Visual-Decipherment Abilities of LMMs for OBCs

We collect OBC and real-object images from 12 sources, covering multiple font appearances and categories. Based on these, we construct 15,175 multi-choice questions for LMM evaluation. Meanwhile, we conduct human annotations to obtain reference points on OBC–object image pairs.
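Accuracy averaged over OBC classes, as reported in the results tables, can be computed by grouping per-question correctness by class and averaging within each group. A minimal sketch (the `(obc_class, is_correct)` record layout is an illustrative assumption):

```python
from collections import defaultdict

def accuracy_by_class(records):
    """Average per-question correctness within each OBC class.
    Each record is an (obc_class, is_correct) tuple -- an assumed layout."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for obc_class, is_correct in records:
        totals[obc_class] += 1
        hits[obc_class] += int(is_correct)
    return {c: hits[c] / totals[c] for c in totals}

# Toy example: two "animal" questions, one "vessel" question
records = [("animal", True), ("animal", False), ("vessel", True)]
print(accuracy_by_class(records))  # {'animal': 0.5, 'vessel': 1.0}
```

Averaging per class rather than per question prevents heavily populated classes from dominating the score.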

OBC and Real-Object Image Sources

We collect OBC images from 3 OBC-centric ancient script websites, YinQiWenYuan, XiaoXueTang, and GuoXueDaShi, as well as 5 open-source OBC datasets, including Oracle-241, Oracle-50k, HUST-OBS, OBI125, and OBIdatasetIJDH. Corresponding real-object images (≈4.8k) are carefully collected from Freepik, Pexels, Pinterest, and the Academia Sinica Bronze Ware Database.

Benchmark Candidates

We evaluate 11 LMMs—including GPT-4o, Gemini 2.5 Pro, Claude 4 Sonnet, GLM-4.5V, the Qwen2.5-VL family, and the InternVL3 series—alongside three vision encoders (DINOv2-L/14, CLIP-L/14, InternViT-300M) to assess multimodal and visual-only performance on pictographic OBCs.

Performance Benchmark on Pictographic OBC Tasks

Results on the classification tasks — Average accuracy in terms of OBC classes (click to expand)

Results on the consistency tasks — The consistency (%) of visual reference on 240 OBC–object pairs between humans and LMMs (click to expand)
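A human–LMM consistency score of this kind can be expressed as the percentage of OBC–object pairs whose model reference point lands near the human annotation. A minimal sketch, assuming normalized (x, y) reference points and an illustrative distance threshold (both are assumptions, not the paper's exact protocol):

```python
def consistency(human_points, model_points, tolerance=0.1):
    """Percentage of pairs where the model's reference point falls within
    `tolerance` (normalized Euclidean distance) of the human annotation.
    Point format and threshold are assumptions for illustration."""
    agree = sum(
        ((hx - mx) ** 2 + (hy - my) ** 2) ** 0.5 <= tolerance
        for (hx, hy), (mx, my) in zip(human_points, model_points)
    )
    return 100.0 * agree / len(human_points)

# Toy example: the first pair agrees, the second does not
human = [(0.2, 0.3), (0.8, 0.5)]
model = [(0.25, 0.33), (0.4, 0.5)]
print(consistency(human, model))  # 50.0
```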

Results on the vision encoder analysis — Visualization of attention maps (left) and accuracy from direct readout of visual encoders (right) (click to expand)

Contact 📧

Please contact the authors for queries.

Citation📎

If you find our work useful, please feel free to cite our paper:

@article{chen2025pictobi20kunveilinglargemultimodal,
      title={PictOBI-20k: Unveiling Large Multimodal Models in Visual Decipherment for Pictographic Oracle Bone Characters}, 
      author={Zijian Chen and Wenjie Hua and Jinhao Li and Lirong Deng and Fan Du and Tingzhu Chen and Guangtao Zhai},
      journal={arXiv preprint arXiv:2509.05773},
      year={2025},
}

About

[ICASSP'26] PictOBI-20k: A dataset designed to evaluate large multimodal models on the visual decipherment tasks of pictographic OBCs
