Papers by Dimitris Boumparis

This paper explores whether cross-linguistic authorship attribution and author gender identification are feasible when Machine Translation (MT) is used to bridge the language gap. We designed a series of computational stylistics experiments to test whether the stylometric signal survives the MT process. We compiled an extensive Greek blog corpus of 100 authors, balanced by gender, with 50 texts per author. We then used Google's Neural Machine Translation to automatically translate each text into English. We ran several classification experiments with the Random Forest algorithm on authorship attribution and gender profiling tasks, employing different feature groups in both the source-language (Greek) and machine-translated (English) corpora. Moreover, we trained models on the source language and used the target-language texts as the unseen test set to simulate cross-linguistic prediction. The results showed that cross-l...
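Training on Greek and testing on English only works if every text is described by features that are defined the same way in both languages. The abstract does not list the feature groups used, so the sketch below is purely illustrative: the function names (`style_features`, `to_vector`) and the five measurements are assumptions, not the authors' feature set. It shows how a Greek text and its English translation can be mapped to vectors of the same shape, which a classifier such as Random Forest could then consume.

```python
import re
from typing import Dict, List

# Hypothetical feature set: NOT the paper's features, just examples of
# measurements that are defined identically for Greek and English text.
FEATURE_ORDER = [
    "mean_word_len",
    "mean_sent_len",
    "type_token_ratio",
    "comma_rate",
    "exclaim_rate",
]


def style_features(text: str) -> Dict[str, float]:
    """Compute a few script-agnostic style measurements for one text."""
    # \w is Unicode-aware in Python 3, so it matches Greek and Latin letters.
    words = re.findall(r"\w+", text)
    # Crude sentence split on ., ! and ? (an assumption; real tokenizers differ).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # guard against empty input
    n_chars = max(len(text), 1)
    return {
        "mean_word_len": sum(len(w) for w in words) / n_words,
        "mean_sent_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "comma_rate": text.count(",") / n_chars,
        "exclaim_rate": text.count("!") / n_chars,
    }


def to_vector(text: str) -> List[float]:
    """Return the features in a fixed order, so vectors are comparable."""
    feats = style_features(text)
    return [feats[name] for name in FEATURE_ORDER]


# Made-up example pair: a Greek sentence and an English rendering of it.
greek = "Καλημέρα, φίλοι! Σήμερα γράφω ξανά."
english = "Good morning, friends! Today I am writing again."

greek_vec = to_vector(greek)
english_vec = to_vector(english)
```

Because both vectors share the same layout, a model fitted on the Greek vectors can be applied to the English ones without any change. Whether the prediction is still accurate is exactly the question the experiments address.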