Parallel Corpora
Recent papers in Parallel Corpora
The availability of partially overlapping parallel corpora for a language pair opens up opportunities for automatically comparing, evaluating and improving them. We compare and evaluate the alignment quality of two English-Estonian... more
Abstract: Text corpora are tools with a long tradition and numerous applications. Of all the existing types, this paper focuses on one in particular: the aligned parallel corpus. Taking as a starting point a corpus... more
Rule-Based Machine Translation (RBMT) and Statistical Machine Translation (SMT) take different approaches to the translation task. RBMT uses linguistic rules between two languages, which are generally built manually by humans, whereas... more
The aim of this paper is to investigate Polish equivalents of English phrasal verbs as found in an English-Polish (E-P) parallel corpus PHRAVERB. Given the semantic idiosyncrasy exhibited by phrasal verbs, it is assumed that the... more
In this paper we present a method for term extraction that can be used in the classroom with translation students. The terms are extracted from a multilingual parallel corpus with the aid of a parallel concordancer, AntPConc. Our work is... more
Rafael Guzmán Tirado, Irina A. Votyakova (ed.), Granada 2013. Tipología léxica. Any form of reproduction, distribution, public communication or transformation of this work may only be carried out with the authorization of its rights holders,... more
The paper discusses the main trends in the development of the parallel corpora within the RNC since 2015. The New languages section deals with seven new language pairs that emerged during this period, their architecture and tagging.... more
The article presents an analysis of etiquette formulas (forms of address, greetings and farewells) used between teachers of Russian as a foreign language and students studying Russian outside Russia. The survey was conducted among 100... more
"This article presents a corpus-based study of the metaphorical and metonymical use of the words "head" and "heart," together with the Norwegian correspondents "hode" and "hjerte." The continuum between metaphor and metonymy is explored,... more
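Abstract: In recent decades, corpus linguistics has become a pillar of modern applied linguistics. The aim of this paper is therefore to explore in greater depth some of the cognitive and operational steps of corpus construction, in order to promote the idea of corpora among a broad range of researchers. The paper is divided into three main parts: 1. Corpus construction: theory and practice 2. Levels of processing of corpus texts 3. Annotation of corpus format attributes... more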
Drawing on corpus material, the article examines the Russian construction of the type пошёл было in comparison with the Belarusian pluperfect (forms of the type пайшоў быў). Some features of the less-studied Belarusian form are identified, above all,... more
This paper presents a bilingual corpus-based study of the use of several nouns meaning ‘time’ or time units (‘hour’, ‘minute’, ‘moment’) in Bulgarian and Ukrainian. All matching instances of these words in a collection of parallel texts... more
In this thesis we describe and evaluate a tool for automatic generation of translations for multiword English terms into Spanish from a monolingual specialized Spanish corpus, compiled by means of web crawling. The resulting translations... more
The paper reports on a study based on the data drawn from such a corpus. The aim of the study was to find and examine the closest Polish translation equivalents of two semantically related verbs in Czech. The author starts with the... more
This paper concentrates on the verbal moods used after Spanish adverbs expressing potentiality (quizá(s), tal vez, probablemente, posiblemente). With the use of the corpus CREA, we sought to determine whether there is a preference for... more
We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation... more
We present HindEnCorp, a parallel corpus of Hindi and English, and HindMonoCorp, a monolingual corpus of Hindi in their release version 0.5. Both corpora were collected from web sources and preprocessed primarily for the training of... more
Historical linguistics, on its way to consolidation as an autonomous discipline, has been unable, or unwilling, to distance itself from the adjacent currents that circulate and evolve within a more general linguistics and... more
Canonical question tags feature prominently in spoken English, where they display great versatility. At face value they are meant to elicit a response from a co-participant in the form of (dis)agreement with the proposition to which the... more
Corpus linguistics is considered one of the foundational linguistic sciences that establish the concept of studying language in its natural environment, far from the logical linguistic analogy that prevailed in the field of linguistic studies for several centuries. Corpus linguistics, which was founded by the linguist... more
Accessing historical texts is often a challenge because readers either do not know the historical language, or they are challenged by the technological hurdle when such texts are available digitally. Merging corpus linguistic methods and... more
The texts in the RNC are aligned sentence by sentence. The texts kindly offered for use in the RNC by Adrian Barentsen and included in the multilingual Amsterdam Slavic Parallel Aligned Corpus are already aligned... more
Contrastive methods have long been employed in lexicography, in particular in bi- and multilingual dictionary projects. The main rationale for this is the necessity to comprehensively study, i.e. compare and contrast, two or more... more
The ACTRES Parallel Corpus (P-ACTRES 2.0) is a bidirectional English-Spanish corpus developed by the ACTRES research group. P-ACTRES 2.0 contains over 4 million words in both directions. From original English texts to their Spanish translations,... more
Automatic extraction of bilingual lexicons from parallel corpora has been recently exploited to overcome the knowledge acquisition bottleneck in a number of research areas in natural language processing, such as machine translation (MT)... more
This paper describes the first phase of the CEXI project at the University of Bologna in Forlì, involving the selection of the texts to be included in the corpus and decisions about the processing of these texts. The aim of the project is... more
The paper describes semantic properties of Perfect forms in European languages exemplified by a massive parallel corpus. A NeighbourNet distance graph for European Perfects is built. In a separate section, the English Perfect in the... more
This study will examine the prefixed derivatives of the verb of motion (VoM) ходить and analyse their translations into German by focusing on the problem of determining the correct meaning of individual forms and possible irregularities in... more
This paper presents a comparison between Russian prefixed verbs of memory and their Italian equivalents. In particular, analysing a Russian-Italian parallel corpus, we observed the strategies used for the translation of these verbs from... more
Research in the Humanities is predominantly text-based. For centuries scholars have studied documents such as historical manuscripts, literary works, legal contracts, diaries of important personalities, old tax records etc. Manual... more
In this paper we describe an alignment system that aligns English-Hindi texts at the sentence and word level in parallel corpora. We describe a simple sentence length approach to sentence alignment and a hybrid, multi-feature approach to... more
The present Ph.D. thesis deals with the so-called grey areas that can be found within the Spanish modal system. In these areas, two different types of modality (modal meanings) can occur. We study the relationship that can be found... more