Repo for paper "Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors".

THUNLP-MT/HieroSA

HieroSA

[📖 Paper] [🤗 HieroSA (Chinese)]

Introduction

We propose HieroSA (Hieroglyph Stroke Analyzer) 🏺, a framework for capturing stroke-level structural representations of hieroglyphic and logographic scripts. It automatically converts characters into normalized stroke-segment representations ✍️, without relying on handcrafted rules or script-specific priors.

HieroSA supports both modern logographic scripts and ancient hieroglyphs 🌍, enabling cross-lingual structural generalization. Experimental results demonstrate that it effectively captures character-level structure and semantics 🧩, providing a solid foundation for downstream analysis and understanding of hieroglyphic writing systems.
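To make the idea of a normalized stroke-segment representation more concrete, here is a minimal Python sketch. It is not the HieroSA implementation: the `Segment` type, the choice of straight line segments given by two endpoints, and the unit-square normalization are all assumptions made for illustration only. The actual representation is defined in the paper and the repository code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stroke segment: a straight line between two endpoints."""
    x1: float
    y1: float
    x2: float
    y2: float


def normalize_segments(segments):
    """Translate and uniformly scale segments so the glyph fits the unit square.

    Assumption: normalization means removing position and size while keeping
    the aspect ratio, so the same character drawn at different scales maps
    to the same coordinates.
    """
    if not segments:
        return []
    xs = [v for s in segments for v in (s.x1, s.x2)]
    ys = [v for s in segments for v in (s.y1, s.y2)]
    min_x, min_y = min(xs), min(ys)
    # Use the larger side of the bounding box; fall back to 1.0 for a single point.
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [
        Segment(
            (s.x1 - min_x) / scale,
            (s.y1 - min_y) / scale,
            (s.x2 - min_x) / scale,
            (s.y2 - min_y) / scale,
        )
        for s in segments
    ]


# A cross-shaped glyph (similar to "十") in pixel coordinates:
# one horizontal stroke and one vertical stroke.
cross = [Segment(10, 50, 90, 50), Segment(50, 10, 50, 90)]
normalized = normalize_segments(cross)
```

Under these assumptions, drawing the same glyph at a different size or position yields the same normalized segments, which is the property the word "normalized" is meant to suggest here.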

Performance

Environment Setup

This project is built on the VERL framework. Follow the commands below to set up the environment:

git clone https://github.com/THUNLP-MT/HieroSA && cd HieroSA
conda create -n HieroSA python=3.12
conda activate HieroSA
./scripts/install.sh

Training

Prepare your image data in JPG or PNG format and place all images in a single directory. Run the following script to preprocess the data:

./scripts/prepare_data.sh
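Before running the preprocessing script, it can help to confirm that the image directory really is flat and contains only JPG or PNG files. The following Python sketch is one way to do that. The function name `split_image_dir` is hypothetical, treating `.jpeg` as equivalent to `.jpg` is an assumption, and nothing here reflects how `prepare_data.sh` itself locates or reads the images.

```python
from pathlib import Path

# Assumption: ".jpeg" is accepted alongside ".jpg"; the README only names JPG and PNG.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def split_image_dir(image_dir):
    """Return (images, others) for the entries directly inside image_dir.

    `images` holds regular files with a supported extension (case-insensitive).
    `others` holds everything else, including subdirectories, since the
    README asks for all images to sit in a single directory.
    """
    root = Path(image_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    images, others = [], []
    for entry in sorted(root.iterdir()):
        if entry.is_file() and entry.suffix.lower() in ALLOWED_SUFFIXES:
            images.append(entry)
        else:
            others.append(entry)
    return images, others
```

A non-empty `others` list points to files or subfolders worth moving or converting before preprocessing.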

Download Qwen3-VL-4B-Instruct to use as the base model, then start training with the following command:

./scripts/train.sh

Evaluation

Prepare your image data in JPG or PNG format and place all images in a single directory.

Download the pretrained HieroSA (Chinese) checkpoint, then run inference with the following command:

./scripts/infer.sh

Citation

If you find our work helpful for your research, please consider citing it.

@article{luo2026hierosa,
    title={Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors}, 
    author={Fuwen Luo and Zihao Wan and Ziyue Wang and Yaluo Liu and Pau Tong Lin Xu and Xuanjia Qiao and Xiaolong Wang and Peng Li and Yang Liu},
    journal={arXiv preprint arXiv:2601.05508},
    year={2026}
}
