Parallel Corpora
Most downloaded papers in Parallel Corpora
This article presents a corpus-based study of the metaphorical and metonymical use of the words "head" and "heart," together with their Norwegian correspondents "hode" and "hjerte." The continuum between metaphor and metonymy is explored,...
This paper proposes a new approach for collecting lexical and grammatical data: one that meets the need to control the features to be elicited, while ensuring a fair level of idiomaticity. The method, called conversational questionnaires,...
Contrastive methods have long been employed in lexicography, in particular in bi- and multilingual dictionary projects. The main rationale for this is the necessity to comprehensively study, i.e. compare and contrast, two or more...
We present HindEnCorp, a parallel corpus of Hindi and English, and HindMonoCorp, a monolingual corpus of Hindi in their release version 0.5. Both corpora were collected from web sources and preprocessed primarily for the training of...
In this thesis we describe and evaluate a tool for automatic generation of translations for multiword English terms into Spanish from a monolingual specialized Spanish corpus, compiled by means of web crawling. The resulting translations...
We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation...
This paper describes the first phase of the CEXI project at the University of Bologna in Forlì, involving the selection of the texts to be included in the corpus and decisions about the processing of these texts. The aim of the project is...
Automatic extraction of bilingual lexicons from parallel corpora has recently been exploited to overcome the knowledge acquisition bottleneck in a number of research areas in natural language processing, such as machine translation (MT)...
Feature selection or extraction is the most important task in Opinion Mining and Sentiment Analysis (OSMA) for calculating polarity scores. These scores are used to determine the positive, negative, and neutral polarity about...
The paper presents an attempt to propose an exact method for identifying the so-called "language-specific" lexicon, a controversial notion often reasonably questioned. An aligned bilingual parallel corpus is chosen as an instrument for...
The aim of this paper is to investigate Polish equivalents of English phrasal verbs as found in PHRAVERB, an English-Polish (E-P) parallel corpus. Given the semantic idiosyncrasy exhibited by phrasal verbs, it is assumed that the...
This paper concentrates on the verbal moods used after Spanish adverbs expressing potentiality (quizá(s), tal vez, probablemente, posiblemente). With the use of the corpus CREA, we sought to determine whether there is a preference for...
The sentences in the RNC are aligned sentence by sentence. The texts kindly offered for use in the RNC by Adrian Barentsen and included in the multilingual Amsterdam Slavic Parallel Aligned Corpus are already aligned...
(Draft with minor differences to published version)
The paper describes three studies concerned with inner-Slavic variation in the use of different functional categories, two of which involve verbal aspect and one of which involves reflexive coding. The leading interest behind this...
In this paper we present a method for term extraction that can be used in the classroom with translation students. The terms are extracted from a multilingual parallel corpus with the aid of a parallel concordancer, AntPConc. Our work is...
reading and commenting on a draft of this paper. There is no published account of this corpus; for an example of work with it, see ... See ...
This study investigates formal and functional variation of analytic causatives (ACs) in eighteen European languages from the Indo-European and Uralic language families. Employing the comparative concept approach to language comparison,...
We propose a method for discovering and compiling the translation norms of specialized concepts used in simple and complex terms attested in a bilingual parallel corpus. The translation norms uncovered...
Historical linguistics, on its way to becoming established as an autonomous discipline, has been unable, or unwilling, to distance itself from the related currents that move and evolve within a more general linguistics and...
The present study investigates the cross-linguistic differences in the use of so-called T/V forms (e.g. French tu and vous, German du and Sie, Russian ty and vy) in ten European languages from different language families and genera. These...
Multilingual corpora, containing the same documents in a variety of languages, are becoming an essential resource for natural language processing. Clustering multilingual corpora provides us with an insight into the differences between...
The merging of corpus linguistic methods and digital technology can provide new ways of representing medieval digital texts. In this paper, we introduce a multi-layered parallel Old Occitan-English corpus. We show how parallel alignment...
Accessing historical texts is often a challenge because readers either do not know the historical language, or they are challenged by the technological hurdle when such texts are available digitally. Merging corpus linguistic methods and...
Demand for Chinese-to-English translation has increased in recent years. Resources for training Chinese-to-English translators, however, remain scarce relative to those for English-to-Chinese, for example, although they are now increasing. Corpus-based...
Canonical question tags feature prominently in spoken English, where they display great versatility. At face value they are meant to elicit a response from a co-participant in the form of (dis)agreement with the proposition to which the...
This paper describes the characteristics of Brazilian Portuguese (BP) thetic sentences by means of a parallel corpus study consisting of the original dialogues of two Argentinean movies from 2004 and 2010 and the corresponding dubbing...
Generating source code API sequences from an English query using Machine Translation (MT) has gained much interest in recent years. For any kind of MT, the model needs to be trained on a parallel corpus. In this paper we clean...
Several parallel corpora built from European Union language resources are presented here. They were processed by state-of-the-art tools and made available to researchers in the Sketch Engine corpus management system. A completely...