[IJCAI-2024] The official code of Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
Download the datasets from the following links:
- LMDB archives for MJSynth, SynthText, IIIT5k, SVT, SVTP, IC13, IC15, CUTE80, ArT, RCTW17, ReCTS, LSVT, MLT19, COCO-Text, and Uber-Text.
- LMDB archives for TextOCR and OpenVINO.
- Union14M archives for Union14M
- Place your data in `data_path` and create the `output_path` directory (see the setup sketch after this list)
- Execute the pre-training script shown below
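The launch command references the `data_path` and `output_path` variables; depending on how `scripts/encoder-pretrain.sh` defines them, you may need to set them yourself. A minimal, hypothetical setup (adjust the paths to your environment):

```bash
# Hypothetical paths; point data_path at the directory holding the LMDB archives.
export data_path=/path/to/lmdb_datasets
export output_path=/path/to/ssm_pretrain_output
mkdir -p "${output_path}"   # create the output directory before launching
```

With those in place, start pre-training: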
```bash
cd pretrain
bash scripts/encoder-pretrain.sh
```

The shell script contains the following configuration:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --master_port 29060 sim_pretrain.py \
--lr 5e-4 \
--batch_size 144 \
--mode single \
--model flipae_sim_vit_small_str \
--epochs 20 \
--warmup_epochs 1 \
--mm 0.995 \
--mmschedule 'cosine' \
--output_dir ${output_path} \
--data_path ${data_path} \
--direction aug_pool --num_workers 10
```
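The released configuration assumes 8 GPUs. If fewer are available, the same launcher can be scaled down; a minimal single-GPU sketch (untested, with the flags copied from the command above, so the effective batch size drops from 8 × 144 to 144):

```bash
# Hypothetical single-GPU run; consider adjusting --batch_size or --lr
# to compensate for the smaller effective batch.
CUDA_VISIBLE_DEVICES=0 python3 -m torch.distributed.launch --nproc_per_node=1 --nnodes=1 \
    --master_port 29060 sim_pretrain.py \
    --lr 5e-4 --batch_size 144 --mode single --model flipae_sim_vit_small_str \
    --epochs 20 --warmup_epochs 1 --mm 0.995 --mmschedule 'cosine' \
    --output_dir ${output_path} --data_path ${data_path} \
    --direction aug_pool --num_workers 10
```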
For fine-tuning, switch to the `finetune` directory and launch training:

```bash
cd finetune
CUDA_VISIBLE_DEVICES=0,1,2,3 ./train.py +experiment=mdr-dec6-union
```

Pre-trained weights can be loaded for some experiments:

```bash
./train.py +experiment=parseq-tiny pretrained=parseq-tiny  # Not all experiments have pretrained weights
```

The base model configurations are in `configs/model/`, while variations are stored in `configs/experiment/`:
```bash
./train.py +experiment=parseq-tiny  # Some examples: abinet-sv, trbc
```

You can override any configuration from the command line:

```bash
./train.py charset=94_full  # Other options: 36_lowercase or 62_mixed-case. See configs/charset/
./train.py dataset=real     # Other option: synth. See configs/dataset/
./train.py model.img_size=[32,128] model.max_label_length=25 model.batch_size=384
./train.py data.root_dir=data data.num_workers=2 data.augment=true
./train.py trainer.max_epochs=20 trainer.accelerator=gpu trainer.devices=2
```

Note that you can pass any Trainer parameter; you just need to prefix it with `+` if it is not originally specified in `configs/main.yaml`.
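As an illustration of the `+` prefix (hedged: `gradient_clip_val` is a standard PyTorch Lightning Trainer argument, but whether it is pre-declared in `configs/main.yaml` is an assumption here):

```bash
# Add a Trainer argument that is not declared in configs/main.yaml (note the leading +).
./train.py +trainer.gradient_clip_val=1.0
# Override one that is already declared (no + needed), e.g. the epoch budget.
./train.py trainer.max_epochs=20
```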
To resume training from a checkpoint:

```bash
./train.py +experiment=<model_exp> ckpt_path=outputs/<model>/<timestamp>/checkpoints/<checkpoint>.ckpt
```

The test script, `test.py`, can be used to evaluate any model trained with this project. For more info, see `./test.py --help`.
PARSeq runtime parameters can be passed using the format `param:type=value`. For example, PARSeq NAR decoding can be invoked via `./test.py parseq.ckpt refine_iters:int=2 decode_ar:bool=false`.
Evaluate a trained checkpoint:

```bash
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt
```

Benchmark using different evaluation character sets:

```bash
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt                        # lowercase alphanumeric (36-character set)
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --cased                # mixed-case alphanumeric (62-character set)
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --cased --punctuation  # mixed-case alphanumeric + punctuation (94-character set)
```

Evaluate on the new benchmark datasets:

```bash
./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt --new
```

If you find this work useful, please cite:

```bibtex
@misc{gao2024selfsupervised,
title={Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition},
author={Zuan Gao and Yuxin Wang and Yadong Qu and Boqiang Zhang and Zixiao Wang and Jianjun Xu and Hongtao Xie},
year={2024},
eprint={2405.05841},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```