Parallel Corpora Research Papers

This article presents a corpus-based study of the metaphorical and metonymical use of the words "head" and "heart," together with the Norwegian correspondents "hode" and "hjerte." The continuum between metaphor and metonymy is explored,... more

- by Susan Nacey
- 4
  Metaphor, Metonymy, Parallel Corpora, ENPC

This paper proposes a new approach for collecting lexical and grammatical data: one that meets the need to control the features to be elicited, while ensuring a fair level of idiomaticity. The method, called conversational questionnaires, consists in eliciting speech not at the level of words or of isolated sentences, but in the form of a chunk of dialogue. Ahead of fieldwork, a number of scripted conversations are written in the area’s lingua franca, each anchored in a plausible real-world situation – whether universal or culture-specific. Native speakers are then asked to come up with the most naturalistic utterances that would occur in each context, resulting in a plausible conversation in the target language. Experience shows that conversational questionnaires provide a number of advantages in linguistic fieldwork, compared to traditional elicitation methods. The anchoring in real-life situations lightens the cognitive burden on consultants, making the fieldwork experience easier for all. The method enables efficient coverage of various linguistic structures at once, from phonetic to pragmatic dimensions, from morphosyntax to phraseology. The tight-knit structure of each dialogue makes it an effective tool for cross-linguistic comparison, whether areal, historical or typological. Conversational questionnaires help the linguist make quick progress in language proficiency, which in turn facilitates further stages of data collection. Finally, these stories can serve as learning resources for language teaching and revitalization. Five dialogue samples are provided here as examples of such questionnaires. Every linguist is encouraged to write their own dialogues, adapted to a region’s linguistic and cultural profile. Ideally, a set of such texts could be developed and made standard among linguists, so as to create comparable or parallel corpora across languages – a mine of data for typological comparison.

François, Alexandre. 2019. A proposal for conversational questionnaires. In Aimée Lahaussois & Marine Vuillermet (eds.), Methodological Tools for Linguistic Description and Typology. Special issue of Language Documentation & Conservation 16, 155-196.

Contrastive methods have long been employed in lexicography, in particular in bi- and multilingual dictionary projects. The main rationale for this is the necessity to comprehensively study, i.e. compare and contrast, two or more... more

- by Marek Łukasik
- 31
  Lexicology, Vocabulary, Terminology, Conceptual Modelling

We present HindEnCorp, a parallel corpus of Hindi and English, and HindMonoCorp, a monolingual corpus of Hindi in their release version 0.5. Both corpora were collected from web sources and preprocessed primarily for the training of... more

- by Vojta Diatka
- 4
  Computational Linguistics, Machine Translation, Corpora, Parallel Corpora

This essay presents the ERC project ‘Transmission of Classical Scientific and Philosophical Literature from Greek into Syriac and Arabic’ (HUNAYNNET) based at the Institute for Medieval Research of the Austrian Academy of Sciences. The... more

- by Grigory Kessel and Yury Arzhanov
- 14
  Late Antique and Byzantine Studies, Translation Studies, Digital Humanities, History of Medicine

In this thesis we describe and evaluate a tool for automatic generation of translations for multiword English terms into Spanish from a monolingual specialized Spanish corpus, compiled by means of web crawling. The resulting translations... more

- by Olya Novikova
- 15
  Machine Translation, Terminology, Lexical Semantics, Lexicography

We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation... more

This paper describes the first phase of the CEXI project at the University of Bologna in Forlì, involving the selection of the texts to be included in the corpus and decisions about the processing of these texts. The aim of the project is... more

- by Federico Zanettin
- 3
  Translation, Parallel Corpora, Corpus-Based Translation

Automatic extraction of bilingual lexicons from parallel corpora has been recently exploited to overcome the knowledge acquisition bottleneck in a number of research areas in natural language processing, such as machine translation (MT)... more

Feature selection or extraction is the most important task in Opinion Mining and Sentiment Analysis (OSMA) for calculating the polarity score. These scores are used to determine the positive, negative, and neutral polarity about... more

The paper presents an attempt to propose an exact method for identifying the so-called "language-specific" lexicon, a controversial notion often reasonably questioned. An aligned bilingual parallel corpus is chosen as an instrument for... more

- by Dmitri Sitchinava
- 3
  Lexical Semantics, Parallel Corpora, Lexical Typology

The aim of this paper is to investigate Polish equivalents of English phrasal verbs as found in an English-Polish (E-P) parallel corpus PHRAVERB. Given the semantic idiosyncrasy exhibited by phrasal verbs, it is assumed that the... more

This paper concentrates on the verbal moods used after Spanish adverbs expressing potentiality (quizá(s), tal vez, probablemente, posiblemente). With the use of the corpus CREA, we sought to determine whether there is a preference for... more

- by Dana Kratochvílová
- 10
  Translation Studies, Spanish, Modality, Corpus Linguistics

Translation is a profession highly connected to technology, and for this reason, most of today's translators are in contact with a variety of tools, services and programs, such as word processors, e-mail, electronic dictionaries, among... more

In this paper, syntactic annotation is used to reveal linguistic properties of translations. We employ the Universal Dependencies framework to represent learner and professional translations of English mass-media texts into Russian (along... more

- by Maria Kunilovskaya and Andrey Kutuzov
- 5
  Translation Studies, Translation Quality, Parallel Corpora, Dependency Parsing

The sentences in the RNC are aligned sentence-by-sentence. The texts kindly offered for use in the RNC by Adrian Barentsen and included in the multilingual Amsterdam Slavic Parallel Aligned Corpus are already aligned... more

The present paper describes the Russian Learner Translator Corpus project, which is currently under development. The paper discusses the feasibility of such a corpus and existing analogues, describes the current status of corpus... more

- by Чепуркова Анна and Andrey Kutuzov
- 6
  Translation Studies, Corpus Linguistics, Learner corpora, Parallel Corpora

- by Andrejs Vasiļjevs
- 2
  Machine Translation, Parallel Corpora

(Draft with minor differences to published version)

The paper describes three studies concerned with inner-Slavic variation in the use of different functional categories, two of which involve verbal aspect and one of which involves reflexive coding. The leading interest behind this... more

In this paper we present a method for term extraction that can be used in the classroom with translation students. The terms are extracted from a multilingual parallel corpus with the aid of a parallel concordancer, AntPConc. Our work is... more

This study investigates formal and functional variation of analytic causatives (ACs) in eighteen European languages from the Indo-European and Uralic language families. Employing the comparative concept approach to language comparison,... more

- by Niraj Aswani
- 5
  Word alignment, Parallel Corpora, Hybrid Approach, Edit Distance

- by Dmitri Sitchinava
- 2
  Corpus Linguistics, Parallel Corpora

Nous proposons une méthode de découverte et de compilation des normes de traduction des concepts spécialisés employés dans des termes simples et complexes attestés dans un corpus parallèle bilingue. Les normes de traduction mises au jour par cette méthode ont la particularité d’être fondées sur l’usage et prennent appui sur des solutions de traduction éprouvées. Celles-ci sont essentielles à l’enseignement des compétences en traduction spécialisée telles que proposées par le groupe PACTE et généralement acceptées en traductologie. Notre méthode consiste à analyser la traduction des occurrences spécialisées de business en économie et en finance réunies dans un corpus bilingue constitué d’un échantillon d’occurrences aléatoires obtenues au moyen d’un concordancier en ligne. L’analyse repose sur trois catégories d’annotations et leurs corrélations : l’acception de business, la fonction de business dans le syntagme nominal et les modalités de traduction de business. Cette méthode d’analyse peut être facilement étendue à l’ensemble des concepts spécialisés de nature nominale qui sont des unités distinctives des textes de spécialité.

We propose a method for discovering and compiling translation norms for specialized concepts used in simple and complex terms attested in a bilingual parallel corpus. The norms uncovered by this method are usage-based and rest on proven translation solutions. These are essential to the teaching of specialized translation competences, as proposed by the PACTE group and generally accepted in translation studies. Our method consists in analysing the translation of specialized occurrences of business in economics and finance, collected in a bilingual corpus made up of a random sample of occurrences obtained with an online concordancer. The analysis relies on three categories of annotation and their correlations: the sense of business, the function of business in the noun phrase, and the translation mode of business. This method of analysis can easily be extended to all specialized nominal concepts, which are distinctive units of specialized texts.

- by Andrejs Vasiļjevs
- 2
  Machine Translation, Parallel Corpora

Historical linguistics, on its path toward establishing itself as an autonomous discipline, has been unable, or unwilling, to distance itself from the related currents that run and evolve within a more general linguistics and... more

Polysemy is a key issue in theoretical semantics and lexicography as well as in computational linguistics. When words have several senses, it is important to describe them properly in the dictionary (a lexicographic task) and to be able... more

- by Boris Iomdin, Konstantin Lopukhin and Grigory Nosyrev
- 11
  Semantics, English language, Word Sense Disambiguation, Lexical Semantics

The present study investigates the cross-linguistic differences in the use of so-called T/V forms (e.g. French tu and vous, German du and Sie, Russian ty and vy) in ten European languages from different language families and genera. These... more

In this study we examine the occurrences and correspondences of terms for blood kinship in a Bulgarian–Ukrainian parallel corpus of fiction. All instances of the terms selected for study, matching and non-matching, were located and... more

- by Ivan Derzhanski and Olena Siruk
- 11
  Translation Studies, Semantics, Slavic Languages, Corpus Linguistics

The Algerian Arabic dialects are under-resourced languages, which lack both corpora and Natural Language Processing (NLP) tools, although they are increasingly used in written form, especially on social media and forums. We aim through... more

- by Mourad Abbas, Karima Meftouh, kamel smaili and slm hrrt
- 3
  Statistical Machine Translation, Arabic Dialects, Parallel Corpora

Multilingual corpora, containing the same documents in a variety of languages, are becoming an essential resource for natural language processing. Clustering multilingual corpora provides us with an insight into the differences between... more

The merging of corpus linguistic methods and digital technology can provide new ways of representing medieval digital texts. In this paper, we introduce a multi-layered parallel Old Occitan-English corpus. We show how parallel alignment... more

Accessing historical texts is often a challenge because readers either do not know the historical language, or they are challenged by the technological hurdle when such texts are available digitally. Merging corpus linguistic methods and... more

The paper presents parallel corpora within the Russian National Corpus (RNC) featuring Circum-Baltic/Russian language pairs and describes the choice of texts, morphological annotation and possible applications. The following languages of... more

- by Natalia Perkova and Dmitri Sitchinava
- 3
  Corpus compilation and design, Parallel Corpora, Circum-Baltic languages

In this study we examine the metaphoric mentions of three wild animals considered to be most important in the Slavic popular tradition, namely the wolf, the bear and the hare, in a Bulgarian–Ukrainian parallel corpus. Our goal is to see... more

- by Ivan Derzhanski and Olena Siruk
- 13
  Translation Studies, Semantics, Slavic Languages, Corpus Linguistics

Demand for Chinese-to-English translation has increased over recent years. In contrast, resources for training Chinese-to-English translators remain few, although they are now increasing, relative to those for English-to-Chinese, for example. Corpus-based... more

Canonical question tags feature prominently in spoken English, where they display great versatility. At face value they are meant to elicit a response from a co-participant in the form of (dis)agreement with the proposition to which the... more

- by Lieven Buysse
- 4
  Pragmatics, Tag Questions, Parallel Corpora, Contrastive Linguistics

The connective because can express both highly objective and highly subjective causal relations. In this, it differs from its counterparts in other languages, e.g. Dutch, where two conjunctions omdat and want express more objective and... more

- by Natalia Levshina and Liesbeth Degand
- 4
  Pragmatics, Logistic Regression, Discourse Connectives, Parallel Corpora

EXPERT (EXPloiting Empirical appRoaches to Translation): http://expert-itn.eu

This paper describes the characteristics of Brazilian Portuguese (BP) thetic sentences by means of a parallel corpus study consisting of the original dialogues of two Argentinean movies from 2004 and 2010 and the corresponding dubbing... more

The paper reports on our ongoing work on the creation of a corpus of Bulgarian and Ukrainian parallel texts. We discuss some differences in the approaches and the interpretation of some concepts, as well as various problems associated... more

- by Ivan Derzhanski and Olena Siruk
- 6
  Translation Studies, Slavic Languages, Corpus Linguistics, Bulgarian Language

This paper presents a comparative bilingual corpus-based study of the use of several frequent temporal adverbs and adverbial expressions (‘always’, ‘sometimes’, ‘never’ and their synonyms) in Bulgarian and Ukrainian. The Ukrainian items... more

- by Ivan Derzhanski and Olena Siruk
- 11
  Translation Studies, Semantics, Slavic Languages, Corpus Linguistics

It is well known that word-aligned parallel corpora are valuable linguistic resources. Since many factors affect automatic alignment quality, manual post-editing may be required in some applications. While there are several... more

- by Olga Scrivner and Tim Gilmanov
- 4
  Machine Translation, Cross-language Processing, Word alignment, Parallel Corpora

Generating source code API sequences from an English query using Machine Translation (MT) has gained much interest in recent years. For any kind of MT, the model needs to be trained on a parallel corpus. In this paper we clean... more

- by Musfiqur Rahman
- 2
  Software Engineering, Parallel Corpora

Several parallel corpora built from European Union language resources are presented here. They were processed by state-of-the-art tools and made available for researchers in the Sketch Engine corpus management system. A completely... more

- by Miloš Jakubíček
- 3
  Corpus Linguistics, Parallel Corpora, Sketch Engine
