This repository provides the code and data for the ACL 2021 paper "PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction": https://aclanthology.org/2021.acl-long.233.pdf
We also have newer work on Chinese spelling correction (CSC), published at ACL 2022: https://github.com/liushulinle/CRASpell
Requirements:
- python3
- tensorflow1.14
- horovod
Instructions:
- Finetune:
  - Train and evaluation file format: original sentence \t golden sentence
  - step1: cd finetune_src
  - step2: download the pretrained PLOME model and corpus from https://drive.google.com/file/d/1aip_siFdXynxMz6-2iopWvJqr5jtUu3F/view?usp=sharing
  - step3: sh start.sh
- Pre-train:
  - step1: cd pre_train_src
  - step2: sh gen_train_tfrecords.sh
  - step3: sh start.sh
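
The fine-tuning file format above (original sentence, a tab, then the golden sentence) can be produced and checked with a few lines of Python. The following is a minimal sketch, not part of this repository: the helper names `write_pairs` and `read_pairs`, the file name `train_example.txt`, and the sample sentence are all hypothetical and only illustrate the layout, assuming one pair per line in UTF-8.

```python
import os
import tempfile
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (original sentence, golden sentence)


def write_pairs(path: str, pairs: Iterable[Pair]) -> None:
    """Write one 'original<TAB>golden' pair per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for original, golden in pairs:
            # A tab or newline inside a sentence would break the two-column layout.
            for text in (original, golden):
                if "\t" in text or "\n" in text:
                    raise ValueError("sentences must not contain tabs or newlines")
            f.write(f"{original}\t{golden}\n")


def read_pairs(path: str) -> List[Pair]:
    """Read pairs back and fail on any line without exactly two fields."""
    pairs: List[Pair] = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 tab-separated fields, got {len(fields)}"
                )
            pairs.append((fields[0], fields[1]))
    return pairs


# Illustrative data only: the original sentence has one misspelled character.
example = [("我今天很高心", "我今天很高兴")]

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "train_example.txt")  # hypothetical name
    write_pairs(example_path, example)
    loaded = read_pairs(example_path)
    print(loaded)
```

Running a check like `read_pairs` on your own train and evaluation files before `sh start.sh` may help catch malformed lines early; whether the training scripts themselves tolerate blank lines or extra columns is not stated here.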
Our Pretrained Models:
Download links for users in mainland China:
PLOME: https://share.weiyun.com/OREEY0H3
cBERT: https://share.weiyun.com/wXErg7gB