1998
We investigate an automatic method for Cross Language Information Retrieval (CLIR) that utilizes the multilingual UMLS Metathesaurus to translate Spanish and French natural language queries into English. Two experiments are presented using OHSUMED, a subset of MEDLINE. Both experiments examine retrieval effectiveness of the translated queries. However, in the second experiment, the query translation procedure is augmented with digram-based vocabulary normalization procedures. In this comparative study of retrieval effectiveness the measures used are: 11-point-average precision score (11-AvgP); average interpolated precision at recall of 0.1; and noninterpolated (i.e., exact) precision after 10 retrieved documents. Our results indicate that for Spanish the UMLS Metathesaurus-based CLIR method appears equivalent to multilingual dictionary-based approaches investigated in the current literature. French yields less favorable results, and our analysis suggests that linguistic differences may have caused the performance differences.
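The three measures named in this abstract can be illustrated with a minimal sketch. The function names and the toy ranking are assumptions made here for illustration; they are not taken from the paper, and the paper's exact averaging over queries is not reproduced.

```python
def precision_at_k(ranked, relevant, k=10):
    """Exact (non-interpolated) precision after k retrieved documents."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def interpolated_precision(ranked, relevant, recall_level):
    """Highest precision at any rank where recall >= recall_level."""
    if not relevant:
        return 0.0
    hits, best = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            # Precision peaks only at ranks holding a relevant document,
            # so checking those ranks is sufficient.
            if hits / len(relevant) >= recall_level:
                best = max(best, hits / rank)
    return best


def eleven_point_average(ranked, relevant):
    """Mean interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    levels = [i / 10 for i in range(11)]
    return sum(interpolated_precision(ranked, relevant, r) for r in levels) / 11


# Hypothetical single-query ranking and relevance judgments.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d4"}
avg_11 = eleven_point_average(ranked, relevant)
p_at_recall_01 = interpolated_precision(ranked, relevant, 0.1)
p_at_10 = precision_at_k(ranked, relevant, k=10)
```

In an evaluation like the one described, these per-query values would typically be averaged over all test queries.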
2010
This article describes and evaluates various information retrieval models used to search document collections written in English through submitting queries written in various other languages, either members of the Indo-European family (English, French, German, and Spanish) or radically different language groups such as Chinese. This evaluation method involves searching a rather large number of topics (around 300) and using two commercial machine translation systems to translate across the language barriers. In this study, mean average precision is used to measure variances in retrieval effectiveness when a query language differs from the document language. Although performance differences are rather large for certain language pairs, this does not mean that bilingual search methods are not commercially viable. Causes of the difficulties incurred when searching or during translation are analyzed and the results of concrete examples are explained.
1998
Cross-language retrieval systems seek to use queries in one natural language to guide the retrieval of documents that might be written in another. Acquisition and representation of translation knowledge plays a central role in this process. This paper explores the utility of two sources of manually encoded translation knowledge, bilingual dictionaries and translation lexicons, for cross-language retrieval. We have implemented six query translation techniques that use bilingual dictionaries, one based on lexical-semantic analysis, and one based on direct use of the translation output from an existing machine translation system; these are compared with a document translation technique that uses output from the same existing translation system. Average precision measures on portions of the TREC collection suggest that arbitrarily selecting a single translation from a bilingual dictionary is typically no less effective than using every translation in the dictionary, that query translation using an existing machine translation system can achieve somewhat better effectiveness than simple dictionary-based techniques, and that performing document translation rather than query translation may result in further improvements in retrieval effectiveness under some conditions.
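The contrast between selecting a single dictionary translation and using every translation can be sketched as follows. This is only an illustration under stated assumptions: the toy Spanish-English dictionary, the function name `translate_query`, and the choice to pass untranslatable terms through unchanged are all hypothetical and are not the paper's six techniques.

```python
def translate_query(terms, dictionary, mode="all"):
    """Translate query terms with a bilingual dictionary.

    mode="first": keep one arbitrarily chosen translation per term
                  (here simply the first listed).
    mode="all":   keep every listed translation.
    Terms missing from the dictionary are kept unchanged (an assumption).
    """
    if mode not in ("first", "all"):
        raise ValueError("mode must be 'first' or 'all'")
    translated = []
    for term in terms:
        candidates = dictionary.get(term) or [term]
        if mode == "first":
            translated.append(candidates[0])
        else:
            translated.extend(candidates)
    return translated


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"banco": ["bank", "bench"], "rio": ["river"]}
query = ["banco", "rio", "xyz"]
single = translate_query(query, toy_dictionary, mode="first")
every = translate_query(query, toy_dictionary, mode="all")
```

Using every translation makes the query longer and admits unintended senses ("bench"), which is one plausible reason the two strategies can end up performing similarly.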
Information processing & …, 2000
In this paper, we present the system MULINEX, a fully implemented system which supports cross-lingual search of the WWW. Users can formulate, expand and disambiguate queries, filter the search results and read the retrieved documents by using only their native language. This multilingual functionality is achieved by the use of dictionary-based query translation, multilingual document categorisation and automatic translation of summaries and documents. The system supports French, German and English and has been installed and tested in the online services of two European internet content and service provider companies. This paper focuses on the techniques and algorithms used in the MULINEX system, explaining how each component works and how it contributes to the overall functionality of the integrated system. The primary system functionalities are outlined from the user perspective, followed by a description of the document database used in the system. The technologies and linguistic resources used in the various system components are then described in detail.
Journal of the American Society for Information Science and Technology, 2007
Information retrieval systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how corpus-based cross-language information retrieval (CLIR) manages in retrieving highly relevant documents. They created a Finnish-Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results, and three relevance criterion levels (liberal, regular, and stringent) were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary-based query translation program; the two translation methods were also combined. The results indicate that corpus-based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline on the highest relevance level. The performance of the different query translation methods was further analyzed by finding out reasons for poor rankings of highly relevant documents.
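The idea of weighting retrieved documents by relevance level can be sketched with one common formulation of generalized precision and recall. The grade-to-weight mapping, the function names, and the toy data below are assumptions for illustration; the abstract does not state which weights the authors used.

```python
def generalized_precision(retrieved, grades):
    """Sum of relevance weights of retrieved documents / number retrieved."""
    if not retrieved:
        return 0.0
    return sum(grades.get(doc, 0.0) for doc in retrieved) / len(retrieved)


def generalized_recall(retrieved, grades):
    """Sum of relevance weights retrieved / total weight in the collection."""
    total = sum(grades.values())
    if total == 0:
        return 0.0
    return sum(grades.get(doc, 0.0) for doc in retrieved) / total


# Hypothetical graded assessments scaled to [0, 1]; unjudged documents count as 0.
grades = {"d1": 1.0, "d2": 0.5, "d4": 1.0}
retrieved = ["d1", "d3", "d2"]
g_prec = generalized_precision(retrieved, grades)
g_rec = generalized_recall(retrieved, grades)
```

With binary weights (0 or 1) both functions reduce to ordinary precision and recall.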
Polibits, 2009
As Internet resources become accessible to more and more countries, there is a need to develop efficient methods for information retrieval across languages. In the present paper, we focus on query expansion techniques to improve the effectiveness of information retrieval. A combination of dictionary-based translation and statistical disambiguation is indispensable for overcoming translation ambiguity. We propose a model that uses multiple sources for query reformulation and expansion to select expansion terms and retrieve the information a user needs. Relevance feedback, thesaurus-based expansion, and a new feedback strategy based on extracting domain keywords to expand the user's query are introduced and evaluated. We tested the effectiveness of the proposed combined method by applying it to French-English information retrieval. Experiments using the CLEF data collection showed the proposed combined query expansion techniques to be highly effective.
… of the 19th annual international ACM …, 1996
As the worldwide network grows it is easier to access textual databases containing text in a variety of languages. Rather than reformulating an information request in each of the possible target languages, it would be interesting to have means of posing a query in one language in order to access documents written in another. We have run some large-scale experiments on such multilingual information retrieval, and the first results are presented here. We have found that given resources such as bilingual dictionaries, retrieval results improve when derivational morphology normalization is applied, when the bilingual dictionary is transformed into a transfer dictionary, and when multiword terminology is correctly recognized.
QUILT (Query User Interface with Light Translations) is a prototype implementation of a complete cross-language text retrieval system that takes English queries and produces English gloss translations of Spanish documents. The system indexes the Spanish documents in Spanish, but converts the English query into a Spanish equivalent set through a novel combination of lexical methods and parallel-corpus disambiguation. Similar methods are applied to the returned document to produce a simple translation that can be examined by non-Spanish speakers to gauge the relevance of the document to the original English query. The system integrates traditional, glossary-based machine translation technology with information retrieval approaches and demonstrates that relatively simple term substitution and disambiguation approaches can be viable for cross-language text retrieval. Components of QUILT have been used to build a CLTR interface to WWW-based search services.
As the number of non-English documents that are available on the World Wide Web and in corporate repositories increases, the ability to quickly and effectively search and view documents across language boundaries will continue to grow in importance. Cross-language information retrieval techniques allow searchers access to a wider range of material without requiring specialized knowledge of the content or the languages in the database. We present in this paper a cross-language information retrieval system (Arabic-English-French) based on a deep linguistic analysis of documents and queries and a statistical model which assigns a weight to each word in the database according to discriminating power. A comparison tool is used to evaluate all possible intersections between queries and documents and order documents by their relevance.
ProLISSA conference, 2002
As Internet resources become accessible to more and more countries, there is a need to develop efficient methods for information retrieval across languages. In the present paper, we focus on query expansion techniques to improve the effectiveness of information retrieval. A combination of dictionary-based translation and statistical disambiguation is indispensable for overcoming translation ambiguity. We propose a model that uses multiple sources for query reformulation and expansion to select expansion terms and retrieve the information a user needs. Relevance feedback, thesaurus-based expansion, and a new feedback strategy based on extracting domain keywords to expand the user's query are introduced and evaluated. We tested the effectiveness of the proposed combined method by applying it to French-English information retrieval. Experiments using the CLEF data collection showed the proposed combined query expansion techniques to be highly effective.
In Proceedings of the Sixth …, 1998
1999
The paper studies concept-based cross-language information retrieval (CLIR). The document collection was a subset of the TREC collection. The test requests were formed from TREC's health related topics. As translation dictionaries the study used a general dictionary and a domain-specific (=medical) dictionary. The effects of translation method, conjunction, and facet order on the effectiveness of concept-based cross-language queries were studied, and concept-based structuring of cross-language queries was compared to mechanical structuring based on the output of dictionaries. The performance of translated Finnish queries against English documents was compared to the performance of original English queries against the English documents, and the performance of different CLIR query types was compared with one another. No major difference was found between concept-based and mechanical structuring. The best translation method was a simultaneous look-up in the medical dictionary and the general dictionary, in which case cross-language queries performed as well as the original English queries. The results showed that especially at high exhaustivity (the number of mutually restrictive concepts in a request) levels cross-language queries perform well in relation to monolingual queries. This suggests that conjunction disambiguates cross-language queries. An extensive study was made of the relative importance of the concepts of requests. On the basis of the classification data of request concepts it was shown how the order of facets in a query affects cross-language as well as monolingual queries.
The rise in unmatched multilingual resources afforded by the exponential growth of the WWW demands the advancement of technologies that remove communication barriers among languages. Relevant information in collections and on the Web is not limited to the native language of the user, and the need to retrieve documents in other languages is growing, so that content which can be translated satisfies the information needs of the user. Information retrieval (IR) can be classified into different categories such as monolingual information retrieval, cross-lingual information retrieval (CLIR) and multilingual information retrieval (MLIR). At present, the diversity of information and language barriers are serious challenges for communication and cultural interchange across the globe. To overcome such communication barriers, CLIR systems are in strong demand. The goal of CLIR is to find relevant information written in a language different from the language of the query. CLIR can be used to improve the ability of users to search and retrieve documents in many languages. Diverse translation techniques can be used to achieve CLIR. In this paper, we review the techniques and approaches of CLIR research for query and document translation and their role in current research directions, which include new models and paradigms in the extensive area of IR. In addition, based on existing literature, a number of challenges and tools in CLIR have been identified and discussed. Finally, possible future research directions on semantic query-document translation for CLIR are discussed.
Proceedings of LREC, 1998
This situation has given rise to a new line of research called Cross-Language Information Retrieval (CLIR), treating the problem of finding a document written in one language via a query written in another language. One of the important resources needed for this problem is a set of bilingual dictionaries for producing queries in new languages. The two most important aspects of these bilingual dictionaries for CLIR are the coverage that the dictionary provides for domain-independent corpora, and the adequacy of the translations provided for finding relevant documents in the second language. In this paper, we present a number of evaluations of these aspects for a bilingual dictionary, available through the ELRA. These evaluations are run against large corpora used in the TREC information retrieval trials.
2001
This paper reviews literature on dictionary-based cross-language information retrieval (CLIR) and presents CLIR research done at the University of Tampere (UTA). The main problems associated with dictionary-based CLIR, as well as appropriate methods to deal with the problems are discussed. We will present the structured query model by Pirkola and report findings for four different language pairs concerning the effectiveness of query structuring.
This paper proposes a method of query translation for Cross Language Information Retrieval. The method uses a parallel bilingual corpus to produce word vectors and can readily be applied to monolingual vector-retrieval models. The Cross Language Information Retrieval system produced with the method showed 97.4% accuracy in our preliminary tests of finding the counterparts in a parallel corpus of English and Japanese documents.
2004
In this study the basic framework and performance analysis results are presented for the three-year-long development process of the dictionary-based UTACLIR system. The tests expand from bilingual CLIR for three language pairs (Swedish, Finnish and German to English) to six language pairs (from English to French, German, Spanish, Italian, Dutch and Finnish), and from bilingual to multilingual. In addition, transitive translation tests are reported.
Transactions on Engineering, Computing …, 2005
Classical Information Retrieval (IR) is the sifting out of the documents most relevant to a user's information requirement (expressed as a query), from a large electronic store of documents. A search engine performs IR by retrieving relevant web pages from the ...
With the explosive growth of international users, distributed information and the number of linguistic resources, accessible throughout the World Wide Web, information retrieval has become crucial for users to find, retrieve and understand relevant information, in any language and form. Cross-Language Information Retrieval (CLIR) is a subfield of Information Retrieval which provides a query in one language and searches document collections in one or many languages but it also has a specific meaning of cross-language information retrieval where a document collection is multilingual. In the present research, we focus on query translation, disambiguation of multiple translation candidates and query expansion with various combinations, in order to improve the effectiveness of retrieval. Extracting, selecting and adding terms that emphasize query concepts are performed using expansion techniques such as pseudo-relevance feedback, domain-based feedback and thesaurus-based expansion. A method for information retrieval for a query expressed in a native language is presented in this paper. It uses insights from data mining and intelligent search for formulating the query and parsing the results.
… in Information Science …, 2010
This paper is devoted to a new method that uses query expansion to improve multilingual information retrieval. The backbone is an Information Retrieval (IR) system based on a search engine and a multilingual module based on statistical machine translation of documents. To this system is added a Query Expansion (QE) module which mainly uses linguistic resources to perform the expansion. The aim is to use QE to overcome the limitations of machine translation, and to retrieve more relevant results. The authors demonstrate, with examples, the usefulness of such a system. They also validate it with several measures, which show a clear reduction of the silence for results.