2017, IRAQI JOURNAL OF SCIENCE
Query expansion (QE) is an effective way to overcome weaknesses in information retrieval performance. QE requires finding appropriate synonyms for the query words, a process that can be carried out automatically without any user intervention. The candidate synonyms should be associated with the correct meaning (sense) of the original word. Arabic is rich in words with multiple meanings, which calls for word sense disambiguation (WSD), the task of discovering the correct sense of a word within its context. To disambiguate word senses, three traditional semantic measures are tested in this work: lch, wup, and path. The proposed system uses these measures together with an automatic synonym selection method to expand the query. It outperforms a traditional baseline system with no query expansion by 10% to 18% and reduces latency by approximately 0.232 to 0.283 seconds per query.
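The lch, wup, and path measures are usually defined over an is-a taxonomy such as WordNet. The following is a minimal sketch of one common formulation of each, not the paper's implementation. The toy taxonomy, the node names, and the function names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "entity": None,
    "organization": "entity",
    "natural_object": "entity",
    "financial_bank": "organization",
    "school": "organization",
    "river_bank": "natural_object",
}

def ancestors(node):
    """Return the chain from node up to the root, node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))

def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("no common ancestor")

def edge_distance(a, b):
    """Number of edges on the shortest path between a and b."""
    lcs = lowest_common_subsumer(a, b)
    return (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))

def path_similarity(a, b):
    # path: inverse of (shortest path length + 1)
    return 1.0 / (1 + edge_distance(a, b))

def wup_similarity(a, b):
    # wup (Wu-Palmer): depth of the common subsumer relative to both depths
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))

def lch_similarity(a, b):
    # lch (Leacock-Chodorow): path length scaled by the taxonomy depth
    max_depth = max(depth(n) for n in PARENT)
    return -math.log((edge_distance(a, b) + 1) / (2.0 * max_depth))
```

In this toy taxonomy all three measures rate `financial_bank` as closer to `school` than to `river_bank`, which is the kind of signal a WSD step could use to pick the sense whose synonyms are added to the query.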
Word sense ambiguity is widespread in all natural languages; a word may carry several distinct meanings. Humans can work out the suitable meaning from the context in which the word occurs. The Arabic language is highly polysemous, so in many situations it is necessary to disambiguate word senses. This paper studies and compares the performance of a search engine before and after expanding the query through interactive word sense disambiguation (WSD). We found that expanding polysemous query terms with more specific synonyms narrows the search to the targeted request and thus increases both precision and recall; on the other hand, expanding the query with a more general (polysemous) synonym broadens the search, which decreases precision.
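The effect described above can be illustrated on a tiny example. This is a sketch under stated assumptions: the four documents, the relevance judgments, the English stand-in terms, and the simple any-term matching rule are all hypothetical and are not the search engine or data used in the paper.

```python
# Hypothetical toy collection: document id -> set of index terms.
DOCS = {
    "d1": {"bank", "loan", "money"},
    "d2": {"lender", "loan"},
    "d3": {"bank", "river", "water"},
    "d4": {"side", "river"},
}
# Assumed relevance judgments for the financial sense of "bank".
RELEVANT = {"d1", "d2"}

def retrieve(query_terms, docs):
    """Return ids of documents sharing at least one term with the query."""
    terms = set(query_terms)
    return {doc_id for doc_id, words in docs.items() if terms & words}

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Original query, a sense-specific expansion, and a polysemous expansion.
baseline = precision_recall(retrieve(["bank"], DOCS), RELEVANT)
specific = precision_recall(retrieve(["bank", "lender"], DOCS), RELEVANT)
general = precision_recall(retrieve(["bank", "side"], DOCS), RELEVANT)
```

Here the sense-specific synonym raises both precision and recall over the baseline, while the polysemous synonym pulls in an extra irrelevant document and lowers precision.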
2013
Millions of users search the internet and other information stores every day by writing queries. Unfortunately, these queries may fail to retrieve what the users need; this failure is known as word mismatch. One way of handling word mismatch is to use a thesaurus, which shows the (usually semantic) relationships between terms. The main goal of this study is to design and build an automatic Arabic thesaurus using the Local Context Analysis technique. The technique can be used in any special field or domain to improve the expansion process and to retrieve more relevant documents for the user's query. Results of this study were compared with the classical information retrieval system. Two hundred and forty-two Arabic documents and 59 Arabic queries were used for building the requirements of the thesaurus, such as inverted Fil...
World Journal of English Language, 2023
This study aimed at assessing the performance and efficacy of the information retrieval (IR) systems implemented in three widely used search engines (Google, Bing, and Yahoo), specifically with regard to the challenge of word sense disambiguation in Arabic texts. Such a challenge has been confirmed to negatively influence the retrieval of the most relevant documents. Therefore, we extended the paradigm of using computational methods and natural language processing (NLP) tools, primarily tailored for processing English texts, to explore morphosyntactic as well as lexical issues disturbing the accuracy of Arabic IR systems. Findings revealed striking disparities in the efficacy of IR systems integrated into these search engines, which can be attributed to four principal challenges: (a) the intricate morphosyntactic structures inherent in Arabic; (b) the idiosyncratic orthographical system of the Arabic script; (c) the multifaceted semantic flexibility of certain lexical elements; and (d) the intriguing diglossic nature of Arabic, allowing for the coexistence of multiple linguistic varieties within a single discourse situation. Drawing from these findings, a series of solutions rooted in supervised machine learning techniques, including clustering models and adaptations based on geographic locations, are proposed. Moreover, the study advocates for the capacity of search engines to interpret queries across all Arabic varieties, encompassing vernacular dialects. Furthermore, the importance of search engines accommodating queries irrespective of the specific language adopted by users is underscored. While the research primarily centers on Arabic, its implications resonate beyond this language alone. By applying computational methodologies originally designed for English to Arabic, the study not only addresses the challenges specific to Arabic IR systems but also contributes valuable insights that transcend linguistic boundaries.
Through a comparative lens, issues like word sense disambiguation between Arabic and English are juxtaposed, extracting lessons that can inform advancements in information retrieval for both languages.
American Journal of Applied Sciences, 2013
The word mismatch problem is fundamental to information retrieval, and query expansion helps to overcome it. Using Arabic corpora, two kinds of query expansion technique (global and local) were compared to determine their effectiveness. The local method is represented by local context analysis, while the global method is represented by the association and similarity thesauri. These techniques can be used in any special field or domain to improve the expansion process and to retrieve more relevant documents for the user's query. This study compares these approaches and shows their effectiveness. Although local context analysis has some advantages over the similarity thesaurus, the association thesaurus, which is a global method, is generally the most effective one.
2006
In both writing and conversation, different people may use different terminology for the same concept. The same holds when issuing queries to search engines and digital libraries. Arabic Information Retrieval (IR) systems are still based on word-matching rather than word-sense approaches. This paper addresses some characteristics of Arabic text and its computer processing, and gives a general overview of the synonyms facility and the IT fields in which it is currently implemented. The study presents an implementation model for a new IR system that uses additional components, such as Arabic light stemmers and a word synonyms structure, to help overcome some of the limitations of today's Arabic IR systems. The study recommends the use of word stemming and wildcard search modules to solve the word script mismatch problem that arises with the word-matching approach. In addition, it uses the synonyms facility to expand queries in the word-sense approach.
IFIP Advances in Information and Communication Technology, 2012
This research suggests a method for query expansion in Arabic Information Retrieval using Expectation Maximization (EM). We employ the EM algorithm to select relevant terms for expanding the query and to weed out unrelated terms. We tested our algorithm on the INFILE test collection of CLEF 2009, and the experiments show that query expansion that considers the similarity of terms both improves precision and retrieves more relevant documents. The main finding of this research is that this method can increase recall while keeping precision at the same level.
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), 2014
Traditional keyword-based search has some limitations, such as word sense ambiguity and query intent ambiguity, which can hurt precision. Semantic search uses the contextual meaning of terms in addition to semantic matching techniques in order to overcome these limitations. This paper introduces a query expansion approach that uses an ontology built from Wikipedia pages, in addition to other thesauri, to improve search accuracy for the Arabic language. Our approach outperformed the traditional keyword-based approach in terms of both F-score and NDCG measures.
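For readers unfamiliar with the two measures named above, the following is a minimal sketch of their standard definitions. The function names and the graded relevance lists are hypothetical, and the sketch does not reproduce the paper's evaluation setup.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of graded relevance values."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

A ranking that already lists documents in descending order of relevance, such as `ndcg([3, 2, 1, 0])`, scores 1.0; any other ordering of the same documents scores lower.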
Business Intelligence: Concepts, Methodologies, Tools, and Applications, 2000
Research and experimentation using Arabic WordNet in the field of information retrieval are relatively new and limited compared with the research that has been done using Princeton WordNet. This work studies the impact of Arabic WordNet on the performance of Arabic information retrieval. The authors extend Lucene with Arabic WordNet to expand users' queries. The major contribution of this study is an interactive query expansion (IQE) methodology that uses each word's part of speech, according to the part it plays in a query. First, the user selects the appropriate part of speech for each term in the original query, and then selects the appropriate synonyms. Experimental results show that the IQE strategy produces a good Mean Average Precision (MAP): it improves MAP by 12.6%, whereas no variant of the automatic query expansion (AQE) strategies did. Nevertheless, the experiments allow the authors to conclude that an appropriate use of Arabic WordNet as a source of linguistic information for AQE can improve effectiveness for Arabic information retrieval.
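Mean Average Precision, the measure reported above, can be sketched as follows. This is the standard definition under the assumption of binary relevance; the function names and the toy rankings in the usage note are hypothetical and unrelated to the authors' Lucene experiments.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For instance, a ranking `["a", "x", "b"]` with relevant set `{"a", "b"}` has relevant hits at ranks 1 and 3, so its average precision is the mean of 1/1 and 2/3.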
One of the major problems of modern Information Retrieval (IR) systems is the vocabulary problem: the discrepancy between the terms used to describe documents and the terms researchers use to describe their information need. One way of handling the vocabulary problem is to use a thesaurus, which shows the (usually semantic) relationships between terms.
Journal of Big Data
Introduction
Information retrieval (IR) is an active research field that aims to extract the most relevant documents from large datasets. The user query plays an important role in this process. Numerous efforts have been made to retrieve relevant documents written in English. Nevertheless, Arabic has not received the effort it deserves, owing to some inherent difficulties with the language itself. In fact, Arabic is one of the richest human languages in its terms, varieties of sentence construction, and diversity of meaning [1]. A sentence in Arabic is made up of interconnected terms based on grammatical relations [2-4]. In most cases the user query is too short, and may be neither sufficient nor effective enough to express what the user needs [2]. Vocabulary mismatch, where the user and the indexer use different terms, is one of the most critical issues in IR [5, 6]. Consequently, IR systems may fail to retrieve the documents that match the user's needs. A well-known and effective strategy for resolving this issue is to perform query expansion (QE).