Multilingual Querying

Mohammed Mustafa

2011

Non-English-speaking users, such as Arabic speakers, are not always able to express terminology in their native languages, especially in scientific domains. This difficulty forces many Arabic authors and scholars to use English terms to explain precise concepts, resulting in mixed (multilingual) queries that contain both English and Arabic terms. Current Cross-Language Information Retrieval (CLIR) techniques are optimized for monolingual queries, even when those queries are translated, and neither mixed-language queries nor searches for mixed-language documents have yet been adequately studied. This paper attempts to address the problem of multilingual querying in CLIR. It shows experimentally that current search engines and IR systems are not language-aware and are inadequate for multilingual querying. The paper then presents the main ingredients that every language-aware solution should address.
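To make the notion of a mixed Arabic and English query concrete, the following is a minimal sketch of how a query might be partitioned by script before any further processing. It is an illustration only, not the technique proposed in the paper: the function names `script_of` and `split_mixed_query` are hypothetical, and Latin script is used as a rough stand-in for English, which is an assumption that does not hold for other Latin-script languages.

```python
import unicodedata
from typing import Dict, List


def script_of(token: str) -> str:
    """Classify a token as 'arabic', 'latin', or 'other' by its letters.

    Hypothetical helper: relies on Unicode character names, so it detects
    the script of a token, not its language.
    """
    arabic = 0
    latin = 0
    for ch in token:
        if not ch.isalpha():
            continue  # ignore digits, punctuation, and symbols
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            arabic += 1
        elif name.startswith("LATIN"):
            latin += 1
    if arabic == 0 and latin == 0:
        return "other"
    # Ties are resolved in favour of Arabic; this is an arbitrary choice.
    return "arabic" if arabic >= latin else "latin"


def split_mixed_query(query: str) -> Dict[str, List[str]]:
    """Group whitespace-separated query tokens by detected script."""
    parts: Dict[str, List[str]] = {"arabic": [], "latin": [], "other": []}
    for token in query.split():
        parts[script_of(token)].append(token)
    return parts


if __name__ == "__main__":
    # A mixed query: an Arabic word followed by an English technical term.
    print(split_mixed_query("مفهوم deadlock"))
```

A system that treats such a query as a single monolingual string has no way to handle the two groups of terms differently, which is one way to read the abstract's claim that current engines are not language-aware.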

Mohamed Mustafa

Proceedings of the South African Institute of Computer Scientists and Information Technologists Conference on Knowledge, Innovation and Leadership in a Diverse, Multidisciplinary Environment - SAICSIT '11, 2011

Non-English-speaking users, such as Arabic speakers, are not always able to express terminology in their native languages, especially in scientific domains. This difficulty forces many Arabic authors and scholars to use English terms to explain precise concepts, particularly when they address technical topics, resulting in mixed (multilingual) queries that contain both English and Arabic terms. Cross-Language Information Retrieval (CLIR) allows users to search for documents written in a language different from that of the query. However, current algorithms are optimized for monolingual queries, even when those queries are translated. This paper attempts to address the problem of multilingual querying in CLIR. It proposes new indexing and weighting techniques that are better suited to the unique characteristics of this problem. A new multilingual test collection containing mixed-language (Arabic and English) computer science documents and mixed-language queries has been created. Experimental results show that current CLIR techniques, which were not designed for these types of multilingual queries and documents, perform poorly, whereas the proposed techniques appear promising.
