The ACTRES Parallel Corpus (P-ACTRES 2.0)


The ACTRES Parallel Corpus (P-ACTRES 2.0) is a bidirectional English-Spanish corpus developed by the ACTRES research group. It contains over 4 million words across both translation directions: about 2.5 million words of original English texts and their Spanish translations (the former P-ACTRES 1.0; Izquierdo, Hofland & Reigem, 2008), and 1.5 million words of original Spanish texts and their English translations. P-ACTRES 2.0 includes five subcorpora: books-fiction, books-nonfiction, newspaper articles, magazine articles and miscellaneous.

P-ACTRES 2.0 allows users to carry out corpus-based contrastive linguistic and textual studies as well as Translation Studies projects, either independently or jointly. It has proved to be a useful tool for studies at both the lexico-grammatical and the rhetorical level.

On the technical side, P-ACTRES 2.0 was designed by Knut Hofland (University of Bergen). It employs Corpus Workbench (CWB) (Evert & Hardie, 2011) for corpus management and querying, and TreeTagger (Schmid, 1995) for POS annotation. Text pairs were aligned with TCA2 (Hofland & Johansson, 1998) and the alignments were subsequently checked manually. P-ACTRES 2.0, an upgrade of P-ACTRES 1.0, was developed by Hugo Sanjurjo-González (University of León) in collaboration with Knut Hofland to house new repositories. It is built with HTML5 ("HTML5", 2014) together with JavaScript and Perl scripts that display the results of corpus queries and related statistics. Further work on P-ACTRES 2.0 includes new annotation layers, support for n-gram queries and more sophisticated statistics based on the R programming language (R Core Team, 2013).
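To make the corpus design summarized above more concrete, the following is a minimal Python sketch of how sentence-aligned, POS-tagged pairs in a bidirectional corpus might be stored and queried. It is not the P-ACTRES implementation, which relies on CWB, TreeTagger and TCA2. It assumes TreeTagger-style tab-separated word/tag/lemma lines; the names `Token`, `AlignedPair`, `parse_tagged`, `concordance` and `pos_ngrams`, the direction labels, and the example sentences and tags are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


# Hypothetical token record mirroring a three-column tagger output:
# word form, part-of-speech tag, lemma.
@dataclass(frozen=True)
class Token:
    word: str
    pos: str
    lemma: str


# Hypothetical aligned unit: one source sentence paired with its translation.
# The direction label keeps the two halves of a bidirectional corpus apart.
@dataclass
class AlignedPair:
    source: List[Token]
    target: List[Token]
    direction: str  # assumed labels, e.g. "en-es" or "es-en"


def parse_tagged(block: str) -> List[Token]:
    """Parse tab-separated word/POS/lemma lines into Token objects."""
    tokens = []
    for line in block.strip().splitlines():
        word, pos, lemma = line.split("\t")
        tokens.append(Token(word, pos, lemma))
    return tokens


def concordance(pairs: Iterable[AlignedPair], lemma: str,
                direction: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence strings whose source side contains
    the given lemma, restricted to one translation direction."""
    for pair in pairs:
        if pair.direction != direction:
            continue
        if any(tok.lemma == lemma for tok in pair.source):
            yield (" ".join(tok.word for tok in pair.source),
                   " ".join(tok.word for tok in pair.target))


def pos_ngrams(tokens: List[Token], n: int) -> Counter:
    """Count contiguous n-grams of POS tags in a token sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tags = [tok.pos for tok in tokens]
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))
```

For instance, parsing the invented tagged lines for "The cat sleeps" and "El gato duerme", wrapping them in an `AlignedPair` with direction `"en-es"`, and calling `list(concordance(pairs, "sleep", "en-es"))` returns that single sentence pair, while the same query with `"es-en"` returns nothing. Counting POS-tag n-grams in this way is only one simple building block for the kind of n-gram queries listed as further work.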