This is a sub-project of MNBVC that processes the DocLayNet dataset into the MNBVC format.
Steps:
- Download and unzip:
wget -c https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/DocLayNet_core.zip
unzip DocLayNet_core.zip
wget -c https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/DocLayNet_extra.zip
unzip DocLayNet_extra.zip
- Update data_process.py: set the local_path parameter of the load_dataset method to the parent directory into which the two zip files were extracted.
- Run:
python data_process.py
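The download, unzip and local_path steps above can also be done from Python, for example on a machine without wget or unzip. This is a minimal sketch and not part of this project: the helper names `fetch_and_extract`, `check_extracted` and `prepare` are hypothetical, and the expected top-level folders (COCO and PNG from DocLayNet_core.zip, JSON and PDF from DocLayNet_extra.zip) are an assumption based on DocLayNet's published description of the archives. Adjust them if your extracted layout differs.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# URLs copied from the wget commands in the steps above.
BASE_URL = "https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/"
ARCHIVES = ["DocLayNet_core.zip", "DocLayNet_extra.zip"]

# Assumption: top-level folders present after extracting both archives
# (core -> COCO, PNG; extra -> JSON, PDF). Verify against your download.
EXPECTED_DIRS = ("COCO", "PNG", "JSON", "PDF")


def fetch_and_extract(url: str, dest_dir: Path) -> Path:
    """Download url into dest_dir and unzip it there.

    The download is skipped if the archive already exists. Unlike wget -c,
    an interrupted download is restarted from scratch, not resumed.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():
        # Write to a temporary name so a failed download never looks complete.
        tmp = archive.with_name(archive.name + ".part")
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
        tmp.rename(archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
    return archive


def check_extracted(parent: Path, expected=EXPECTED_DIRS) -> list:
    """Return the expected top-level folders that are missing under parent."""
    return [name for name in expected if not (parent / name).is_dir()]


def prepare(parent: Path) -> list:
    """Fetch and extract both archives into parent; return missing folders."""
    for name in ARCHIVES:
        fetch_and_extract(BASE_URL + name, parent)
    return check_extracted(parent)
```

For example, `prepare(Path("doclaynet"))` (a hypothetical target directory) downloads and extracts both archives; if it returns an empty list, the absolute path of that directory is the value to give local_path in data_process.py. The archives are large, so the existence check avoids downloading them twice when the script is rerun.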