Papers by Andrejs Vasiļjevs

tuhat.halvi.helsinki.fi
The META-NORD project has contributed to an open infrastructure for language resources (data and tools) under the META-NET umbrella. This paper presents the key objectives of META-NORD and reports on the results achieved in the first year of the project. META-NORD has mapped and described the national language technology landscape in the Nordic and Baltic countries in terms of language use, language technology and resources, main actors in the academy, industry, government and society; identified and collected the first batch of language resources in the Nordic and Baltic countries; documented, processed, linked, and upgraded the identified language resources to agreed standards and guidelines. The three horizontal multilingual actions in META-NORD are overviewed in this paper: linking and validating Nordic and Baltic wordnets, the harmonisation of multilingual Nordic and Baltic treebanks, and consolidating multilingual terminology resources across European countries. This paper also touches upon intellectual property rights for the sharing of language resources.
aclweb.org
The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible solution is to exploit comparable corpora (non-parallel bi- or multilingual text resources), which are much more widely available than parallel translation data. Our presented toolkit deals with parallel content extraction from comparable corpora. It consists of tools bundled in two workflows: (1) alignment of comparable documents and extraction of parallel sentences and (2) extraction and bilingual mapping of terms and named entities. The toolkit pairs similar bilingual comparable documents and extracts parallel sentences and bilingual terminological and named entity dictionaries from comparable corpora. This demonstration focuses on the English, Latvian, Lithuanian, and Romanian languages.
… of the 2010 conference on Human …, Jan 1, 2010
This position paper presents the recently started European collaboration project LetsMT!. This project creates a platform that gathers public and user-provided MT training data and generates multiple MT systems by combining and prioritizing this data. The project extends the use of existing state-of-the-art SMT methods that are applied to data supplied by users to increase quality, scope and language coverage of machine translation. The paper describes the background and motivation for this work, key approaches, and the technologies used.
Proceedings of the 13th …, Jan 1, 2011
To fully exploit the huge potential of existing open SMT technologies and user-provided content, we have created an innovative online platform for data sharing and MT building. This platform is being developed in the EU collaboration project LetsMT!. This paper presents the motivation for developing this platform, its architecture, and its main features.
Multilingual, Jan 1, 2012

langtech.fub.it
Current translation practice demonstrates a lack of integration support between traditional desktop translation tools and the rich terminological data available on the internet. This article sets the background for development of a new layer of web-based translation tools for automated translation of multilingual terminology, bridging the gap between translation tools and environments and internet term banks. It analyses the experience gained during the EuroTermBank project, which proposes solutions to a number of challenges in integration of term banks with translation tools, such as the federation approach to interlinking term banks and the entry compounding approach for visual representation of multiple overlapping terminology entries. The article proposes a standards-based approach to ensure data compatibility, and identifies the requirement to support terminology sharing on an interoperable level.
Proceedings of the First …, Jan 1, 2008

Proceedings of LREC 2006, …, Jan 1, 2005
The new EU member countries face the problems of terminology resource fragmentation and lack of coordination in terminology development in general. The EuroTermBank project aims to improve the terminology infrastructure of the new EU countries, and the project will result in a centralized online terminology bank, interlinked with other terminology banks and resources, for languages of the new EU member countries. The main focus of this paper is on a description of how to identify best practice within terminology work seen from a broad perspective. Surveys of real-life terminology work have been conducted, and these surveys have resulted in identification of scenario-specific best practice descriptions of terminology work. Furthermore, this paper will present an outline of the specific criteria that have been used for selection of existing term resources to be included in the EuroTermBank database.
Uses and usage of language resource- …, Jan 1, 2008
sources (LRs) have been issued by different organizations (e.g., ISO, Oasis, LISA, W3C, TEI etc.). While some of them have been designed explicitly for the purpose of modeling language (e.g., LAF, OLAC, IMDI, etc.), other standards in use in the community are originally geared to the representation of properties of texts (e.g., TEI). Still others have been defined with broader, not inherently language or text-related purposes in mind. Recently, standards for resource management are also emerging.
… PROCEEDINGS SERIES VOL. …, Jan 1, 2011
Although Machine Translation is very popular for personal tasks, its use in localization and other business applications is still very limited. The paper presents an experiment on the evaluation of an English-Latvian SMT system integrated into SDL Trados which has been used in an actual localization assignment by a professional localization company. We show that such an integrated localization environment can increase the productivity of localization by 32.9% without a critical reduction in quality.
translingual-europe.eu
LetsMT! – Towards cloud-based service for MT generation. Andrejs Vasiļjevs, Tilde. Translingual Europe 2010, Berlin, 07.06.2010. Data challenge: Statistical methods provide breakthrough in cost-effective MT development ...
lrec-conf.org
This paper proposes statistical analysis methods for improvement of terminology entry compounding. Terminology entry compounding is a mechanism that identifies matching entries across multiple multilingual terminology collections. Bilingual or trilingual term entries are unified into a compounded multilingual entry. We suggest that corpus analysis can improve entry compounding results by analysing the contextual terms of a given term in the corpus data.
The 5th Workshop on Building and Using Comparable …, Jan 1, 2012
aclweb.org
To facilitate the creation and usage of custom SMT systems we have created a cloud-based platform for do-it-yourself MT. The platform is developed in the EU collaboration project LetsMT!. This system demonstration paper presents the motivation in developing the LetsMT! platform, its main features, architecture, and an evaluation in a practical use case.
flarenet.eu
The explosive growth of digital information on the web enables rapid development of data-driven techniques. Significant breakthroughs in many areas of language technology have been achieved. Statistical methods based on huge volumes of data have replaced the laborious human work that was required to encode linguistic knowledge. In the new paradigm, the more data you have, the better results you get.

Proceedings of the 21st international …, Jan 1, 2012
This paper presents European Union co-funded projects to advance the development and use of machine translation (MT) that will benefit from the possibilities provided by the Web. Current mass-market and online MT systems are of a general nature and perform poorly for smaller languages and domain-specific texts. The ICT-PSP Programme project LetsMT! develops a user-driven machine translation "factory in the cloud" enabling web users to get customized MT that better fits their needs. Harnessing the huge potential of the web together with open statistical machine translation (SMT) technologies, LetsMT! has created an innovative online collaborative platform for data sharing and building MT. Users can upload their parallel corpora to an online repository and generate user-tailored SMT systems based on user-selected data. FP7 Programme project ACCURAT researches new methods for accumulating more data from the Web to improve the quality of data-driven machine translation systems. ACCURAT has created techniques and tools to use comparable corpora such as news feeds and multinational web pages. Although the majority of these texts are not direct translations, they share a lot of common paragraphs, sentences, phrases, terms and named entities in different languages which are useful for machine translation.