2011
Non-English-speaking users, such as Arabic speakers, are not always able to express terminology in their native languages, especially in scientific domains. Such difficulty forces many Arabic authors and scholars to use English terms in order to explain precise concepts, resulting in mixed/multilingual queries with both English and Arabic terms. Current CLIR techniques are optimized for monolingual queries, even if they are translated, but neither mixed-language queries nor searches for mixed-language documents have yet been adequately studied. This paper attempts to address the problem of multilingual querying in CLIR. It shows experimentally that current search engines and IR systems are not language-aware and are not adequate for multilingual querying. The paper then presents the main ingredients that every language-aware solution should take care of.
Proceedings of the South African Institute of Computer Scientists and Information Technologists Conference on Knowledge, Innovation and Leadership in a Diverse, Multidisciplinary Environment - SAICSIT '11, 2011
Non-English-speaking users, such as Arabic speakers, are not always able to express terminology in their native languages, especially in scientific domains. Such difficulty forces many Arabic authors and scholars to use English terms in order to explain precise concepts, particularly when they address technical topics, resulting in mixed/multilingual queries with both English and Arabic terms. Cross Language Information Retrieval (CLIR) allows users to search documents that are written in a language different from the query. However, current algorithms are optimized for monolingual queries, even if they are translated. This paper attempts to address the problem of multilingual querying in CLIR. New techniques that are better suited to the unique characteristics of this problem, in terms of indexing and weighting, are proposed. A new multilingual and mixed test collection containing mixed-language (Arabic and English) computer science documents and mixed-language queries has been created. Experimentally, results show that current CLIR techniques were not designed for these types of multilingual queries and documents and are found to perform poorly whereas the proposed techniques are found to be promising.
As the number of non-English documents that are available on the World Wide Web and in corporate repositories increases, the ability to quickly and effectively search and view documents across language boundaries will continue to grow in importance. Cross-language information retrieval techniques allow searchers access to a wider range of material without requiring specialized knowledge of the content or the languages in the database. We present in this paper a cross-language information retrieval system (Arabic-English-French) based on a deep linguistic analysis of documents and queries and a statistical model which assigns a weight to each word in the database according to discriminating power. A comparison tool is used to evaluate all possible intersections between queries and documents and order documents by their relevance.
At a time of wide availability of communication technologies, language barriers are a serious obstacle to world communication and to economic and cultural exchanges. More comprehensive tools to overcome such barriers, such as machine translation and cross-lingual information applications, are nowadays in strong demand. In this paper, the research problems are considered within the context of Arabic/English Cross Language Information Retrieval (CLIR). Our proposed method consists of two main steps. First, using an Arabic analyzer, the query terms are analyzed and the senses of the ambiguous query terms are defined. Second, the correct senses of the ambiguous query terms are selected, based on co-occurrence statistical data. To compensate for the lack of training data for some test queries, we used the web to obtain the statistical co-occurrence data needed to disambiguate the ambiguous query terms.
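The second step described above, selecting a sense from co-occurrence statistics, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_senses`, the placeholder query terms, and the toy counts are all assumptions made up for illustration, and the scoring rule (sum of pairwise co-occurrence counts with the candidates of the other query terms) is only one simple possibility.

```python
def select_senses(candidates, cooccurrence):
    """Pick, for each query term, the candidate sense best supported by context.

    candidates:   dict mapping a source term to its candidate translations.
    cooccurrence: dict mapping frozenset({a, b}) to a co-occurrence count
                  (hypothetical data; in the abstract such counts come from
                  training data or the web).
    """
    chosen = {}
    for term, options in candidates.items():
        # Context = every candidate translation of the *other* query terms.
        context = [
            cand
            for other, opts in candidates.items()
            if other != term
            for cand in opts
        ]

        def support(option):
            # Total co-occurrence of this option with the context words.
            return sum(cooccurrence.get(frozenset((option, c)), 0) for c in context)

        chosen[term] = max(options, key=support)
    return chosen


# Toy example with placeholder source terms and invented counts.
toy_candidates = {
    "term_1": ["bank", "shore"],   # ambiguous term with two candidate senses
    "term_2": ["money"],           # unambiguous term
}
toy_counts = {
    frozenset(("bank", "money")): 120,
    frozenset(("shore", "money")): 3,
}
selected = select_senses(toy_candidates, toy_counts)
print(selected)
```

With these invented counts, the sense "bank" is preferred for `term_1` because it co-occurs with "money" far more often than "shore" does.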
2008
Cross-language information retrieval (CLIR) is a rapidly growing area of research in the information retrieval (IR) field. There is an increasing interest in the ability to enter queries in one language and retrieve documents in a different language. There are several existing approaches to cross-language information retrieval; in this research, however, one particular approach based on bilingual dictionaries of the Arabic and English languages is discussed, implemented, and evaluated. The retrieval performance of this approach has been studied in two situations: when the query consists of full words and when the query words are in base (root) form. The query is selected in the source language (English) to retrieve documents in the Arabic and English languages. The experimental results show that retrieval performance in both Arabic and English improves when querying is based on the roots of the query words.
The rise in unmatched multilingual resources afforded by the exponential growth of the WWW demands the advancement of technologies to remove the communication barriers among languages. Relevant information in collections and on the Web is not limited to the native language of the user, and the need to retrieve documents in other languages is growing, so that content which can be translated satisfies the information needs of the user. Information retrieval (IR) can be classified into different categories such as monolingual information retrieval, cross-lingual information retrieval (CLIR) and multilingual information retrieval (MLIR). At present, the diversity of information and language barriers are serious challenges for communication and cultural interchange across the globe. To overcome such communication barriers, CLIR systems are in strong demand. The goal of CLIR is to find relevant information written in a language different from the language of the query. CLIR can be used to improve the capabilities of users to search and retrieve documents in many languages. Diverse translation techniques can be used to achieve CLIR. In this paper, we review the techniques and approaches of CLIR research for query and document translation and their role in current research directions, which include new models and paradigms in the extensive area of IR. In addition, based on existing literature, a number of challenges and tools in CLIR have been identified and discussed. Finally, possible future research directions on semantic query-document translation for CLIR are discussed.
Information processing & …, 2000
In this paper, we present the system MULINEX, a fully implemented system which supports cross-lingual search of the WWW. Users can formulate, expand and disambiguate queries, filter the search results and read the retrieved documents by using only their native language. This multilingual functionality is achieved by the use of dictionary-based query translation, multilingual document categorisation and automatic translation of summaries and documents. The system supports French, German and English and has been installed and tested in the online services of two European internet content and service provider companies. This paper focuses on the techniques and algorithms used in the MULINEX system, explaining how each component works and how it contributes to the overall functionality of the integrated system. The primary system functionalities are outlined from the user perspective, followed by a description of the document database used in the system. The technologies and linguistic resources used in the various system components are then described in detail.
International Journal of Innovative Technology and Exploring Engineering
In the era of globalization, the internet, being accessible and affordable, has gained huge popularity and is widely used almost everywhere by governments, private organizations, companies, banks, etc., as well as by individuals. It has empowered its users to contribute to the creation of information on the web in their native languages, which has drastically increased the volume of web-accessible documents available in languages other than English. This exponential growth of information on the internet has also posed several challenges for information retrieval systems. Most present monolingual information retrieval systems can retrieve documents only in the language of the query, missing information in other languages that may be more relevant to the user. The need for information retrieval systems to become multilingual has given rise to research in Cross Language Information Retrieval (CLIR) which can cross the language barriers and ret...
International Journal for Research in Applied Science and Engineering Technology, 2019
The aim of this paper is to extract simple and efficient features to enhance multilingual document ranking (MLDR). Our approach is to extract monolingual and multilingual similarity features using a bilingual dictionary. To keep the approach extensible to other languages, language-specific tools are avoided. Ranking documents in various languages by their relevance to the query, irrespective of the query's language, is the ranking task of multilingual information retrieval (MLIR). Some approaches focus on merging the relevance scores of different retrieval settings but do not learn the ranking itself. We therefore treat web MLIR ranking within the learning-to-rank (L2R) framework. We create a ranking model that captures the relations among the documents and estimates their joint relevance probability. This method can improve relevance estimation for documents in all the languages.
I. INTRODUCTION
Multilingual Information Retrieval (MLIR) is essential and desirable because of the increase of information in various languages. MLIR involves the task of Cross Lingual Information Retrieval for each of the desired languages. As globalization advances and digital online information on the Internet grows, there is a great demand for MLIR. Producing a single result list from results in different languages requires a merging step, which ranks the multilingual results by their relevance. The problem of CLIR has been well studied in the past decade, especially with the help of the CLEF, NTCIR, TREC and FIRE forums. In the realm of CLIR, the problem of ranking multilingual result lists is a very challenging task. Identifying whether two documents in different languages talk about the same topic is itself very challenging.
There are a few early attempts at ranking multilingual documents (round-robin merging [1], raw-score merging [1]). These merging processes have to make some simplifying assumptions. For example, one may assume that the similarities calculated for different language result lists are comparable, so the result lists can be merged according to their raw similarity values [1]. One can also normalize the similarities first, but this approach implicitly assumes that the highly ranked documents in different languages are similar to the query at a comparable level. These assumptions do not hold. Until the recent past [2], [3], [4], [5], [6], there was little focus on merging multilingual result lists. The recent work concentrated more on extracting semantic information such as multilingual topics from documents. These methods are highly dependent upon language-specific tools such as named-entity recognizers and part-of-speech taggers; hence they cannot be extended to languages with fewer resources, i.e., they do not achieve high multilinguality. When a ranking approach must be applied across many languages, language-specific development poses a major challenge. When merging multilingual result lists, techniques that are suitable for one language may not be suitable for another. Some applications deal with a limited number of languages while others require many different languages. We try to implement language-independent approaches, which will benefit multilingual retrieval and the MLIR community. In paper [7], using multilingual documents and topics, we extract efficient features that are useful for enhancing the performance of multilingual document ranking using the similarities among candidate documents. After the result lists of the different languages are obtained along with their queries, we calculate similarity measures along with the various metrics.
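The early merging strategies mentioned above (round-robin merging, raw-score merging, and merging after score normalization) can be sketched as follows. This is a minimal illustration, not the code of the cited work: the function names, the `(doc_id, score)` list format, the choice of min-max normalization, and the toy scores are all assumptions.

```python
from itertools import zip_longest


def round_robin_merge(result_lists):
    """Interleave ranked lists: rank 1 of each list, then rank 2, and so on."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for item in tier:
            if item is None:          # this list is already exhausted
                continue
            doc_id, _score = item
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


def raw_score_merge(result_lists):
    """Pool all results and sort by raw score.

    Assumes scores from different lists are directly comparable,
    which is exactly the assumption the text calls into question.
    """
    pooled = [item for results in result_lists for item in results]
    pooled.sort(key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in pooled]


def min_max_normalize(results):
    """Rescale one list's scores to [0, 1] (one possible normalization)."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc_id, 1.0) for doc_id, _ in results]
    return [(doc_id, (score - lo) / (hi - lo)) for doc_id, score in results]


# Toy ranked lists (invented scores on deliberately different scales).
english = [("en1", 12.0), ("en2", 9.5), ("en3", 3.0)]
arabic = [("ar1", 0.9), ("ar2", 0.4)]

rr = round_robin_merge([english, arabic])
raw = raw_score_merge([english, arabic])
norm = raw_score_merge([min_max_normalize(english), min_max_normalize(arabic)])
print(rr, raw, norm, sep="\n")
```

In this toy case raw-score merging pushes every Arabic document below every English one purely because of the score scale, while normalization instead forces the top document of each list to tie at 1.0, which illustrates the implicit assumptions the paragraph describes.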
The translation tool produces a large number of acceptable translations for a given set of queries [8]. Tools are available for only a limited number of languages. For this reason, language-specific tools are excluded when measuring documents with similarity metrics. Therefore, to calculate the similarity of multilingual documents, we can use bilingual dictionaries and Wikipedia as knowledge sources. The approach can also be applied to other languages, provided that basic language resources are available. Experiments were carried out on the FIRE 2010 corpus using several ranking algorithms on various features; the results were combined and evaluated using NDCG as the metric and compared against the BM25 baseline ranking system [30].
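For readers unfamiliar with the NDCG metric named above, a minimal sketch follows. The source does not say which NDCG variant was used; this sketch assumes the common linear-gain form with a log2 rank discount, and it assumes the ideal ordering is computed from the same list of judged relevances. The function names are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over ranks i = 1, 2, ..."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking.

    relevances: graded relevance labels in the order the system ranked them.
    k:          optional cutoff; None means use the whole list.
    """
    ideal = sorted(relevances, reverse=True)
    if k is not None:
        relevances, ideal = relevances[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents at all
    return dcg(relevances) / ideal_dcg


perfect = ndcg([3, 2, 1])      # already in ideal order
swapped = ndcg([1, 3])         # best document ranked second
print(perfect, swapped)
```

A ranking that is already in ideal order scores 1.0, and any ranking that places a more relevant document below a less relevant one scores strictly less.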
Information Retrieval, 12(3):227-229, 2009. ISSN 1386-4564. DOI 10.1007/s10791-009-9091-2, 2009
IEEE Data(base) Engineering Bulletin, 2007
ACM Transactions on Asian and Low-Resource Language Information Processing, 2016
Indonesian Journal of Electrical Engineering and Computer Science
Information Retrieval, 12(3):230-250, 2009. ISSN 1386-4564. DOI 10.1007/s10791-009-9093-0, 2009
Proc. of ACM SIGIR 2007 Workshop on Improving Non-English Web Searching (iNEWS07), Amsterdam, The Netherlands, 2007. ISBN 978-84-690-6978-3 (78 pp.), 2007
Lecture Notes in Computer Science, 2015
iConference 2016 Proceedings
… of the 19th annual international ACM …, 1996
ACM SIGIR Forum, 41(2):72-76, 2007. ISSN 0163-5840., 2007
International Journal of Advanced Computer Science and Applications, 2013