This repository contains the dataset published alongside the paper "Extracting Structured Scholarly Information from the Machine Translation Literature" at LREC 2016.
The preprocessed paper content is in XML format, inside the "paper_content" folder. sci.xml also contains citation information, which is not used in this project.
The original survey results are in CSV format, and the processed data is in text format.
We have included the two Python scripts that we used to process the paper XML and text formats, in case they are helpful.
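For a first look at the XML files without the included scripts, the following is a minimal sketch, not the authors' code. It assumes only that the "paper_content" folder holds well-formed .xml files; the function names count_tags and summarize_folder are hypothetical, and no particular tag names are assumed, so the sketch simply counts whichever element tags it finds.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_tags(xml_path):
    """Return a Counter of the element tag names found in one XML file."""
    tree = ET.parse(str(xml_path))
    return Counter(elem.tag for elem in tree.iter())


def summarize_folder(folder="paper_content"):
    """Map each .xml file name in `folder` to its tag counts.

    The folder name comes from this README; the schema is not assumed.
    """
    summary = {}
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            summary[path.name] = count_tags(path)
        except ET.ParseError as err:
            # Report and skip files that are not well-formed XML.
            print(f"skipping {path.name}: {err}")
    return summary


if __name__ == "__main__":
    # Print the five most frequent tags per file as a quick overview.
    for name, counts in summarize_folder().items():
        print(name, dict(counts.most_common(5)))
```

Running it from the repository root prints the most frequent tags in each file, which can help when deciding which elements to extract.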
If you have further questions, please contact the authors.
Eunsol Choi ([email protected]), Matic Horvat ([email protected]).