2012, Advances in Artificial Intelligence
We propose a new approach for determining the adequate sense of Arabic words. To this end, we propose an algorithm based on information retrieval measures to identify the context of use that is closest to the sentence containing the word to be disambiguated. The contexts of use represent sets of sentences that indicate a particular sense of the ambiguous word. These contexts are generated using the words that define the senses of the ambiguous word, the exact string-matching algorithm, and the corpus. We use measures employed in the domain of information retrieval (Harman, Croft, and Okapi), combined with the Lesk algorithm, to assign the correct sense from those proposed.
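As a rough, hedged illustration of the context-matching idea (the data, tokenization, and function names below are illustrative assumptions, not the authors' implementation), a Lesk-style overlap can score each candidate context of use against the sentence containing the ambiguous word:

```python
# A minimal sketch of Lesk-style context matching: each sense has a set of
# "contexts of use" (example sentences); the sense whose contexts share the
# most words with the target sentence wins. The sample data are illustrative.

def tokenize(text):
    """Lowercase whitespace tokenization; real Arabic text would need
    proper segmentation and diacritic normalization."""
    return set(text.lower().split())

def lesk_overlap(sentence, contexts_by_sense):
    """Return the sense whose contexts of use overlap most with `sentence`."""
    target = tokenize(sentence)
    best_sense, best_score = None, -1
    for sense, contexts in contexts_by_sense.items():
        score = sum(len(target & tokenize(ctx)) for ctx in contexts)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical example with English placeholders standing in for Arabic text.
contexts = {
    "bank/finance": ["deposit money at the bank", "the bank approved the loan"],
    "bank/river":   ["fishing on the river bank", "the bank of the stream"],
}
print(lesk_overlap("she opened an account at the bank", contexts))
```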
In this paper we propose a hybrid system for Arabic word sense disambiguation. To achieve this goal, we use methods employed in the domain of information retrieval (latent semantic analysis, Harman, Croft, and Okapi) combined with the Lesk algorithm. These methods are used to estimate the most relevant sense of the ambiguous word. This estimation is based on calculating the proximity between the current context (the context of the ambiguous word) and the different contexts of use of each meaning of the word. The Lesk algorithm is used to assign the correct sense from those proposed by LSA, Harman, Croft, and Okapi. The results found by the proposed system are satisfactory: we obtained a disambiguation rate of 76%.
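A minimal sketch of the LSA proximity step, assuming a TF-IDF plus truncated SVD pipeline (the paper does not specify an implementation; scikit-learn and the toy corpus below are stand-ins): project the current context and each sense's contexts of use into a latent space and compare by cosine similarity.

```python
# Sketch of LSA-based proximity between the ambiguous word's context and the
# contexts of use of each sense, using TF-IDF + truncated SVD. The corpus,
# dimensionality, and sense labels are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

contexts_of_use = [
    "deposit money at the bank",      # sense: finance
    "the bank approved the loan",     # sense: finance
    "fishing on the river bank",      # sense: river
]
senses = ["finance", "finance", "river"]
current_context = ["she opened an account at the bank"]

vec = TfidfVectorizer()
X = vec.fit_transform(contexts_of_use + current_context)
lsa = TruncatedSVD(n_components=2, random_state=0)  # tiny space for toy data
Z = lsa.fit_transform(X)

sims = cosine_similarity(Z[-1:], Z[:-1])[0]         # current vs. each context
best = max(range(len(senses)), key=lambda i: sims[i])
print(senses[best], sims[best])
```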
2009
In this paper we propose a hybrid system for Arabic word sense disambiguation. To achieve this goal, we use methods employed in the domain of information retrieval (latent semantic analysis, Harman, Croft, and Okapi) combined with the Lesk algorithm. These methods are used to estimate the most relevant sense of the ambiguous word. This estimation is based on calculating the proximity between the current context (the context of the ambiguous word) and the different contexts of use of each meaning of the word. The Lesk algorithm is used to assign the correct sense from those proposed by LSA, Harman, Croft, and Okapi. The results found by the proposed system are satisfactory: we obtained a disambiguation rate of 73%.
International Journal of Advanced Computer Science and Applications, 2016
Word Sense Disambiguation (WSD) consists of identifying the correct sense of an ambiguous word occurring in a given context. Most Arabic WSD systems are generally based on information extracted from the local context of the word to be disambiguated. This information is usually not sufficient for good disambiguation. To overcome this limit, we propose an approach that takes into consideration, in addition to the local context, the global context extracted from the full text. More particularly, the sense attributed to an ambiguous word is the one whose semantic proximity is closest to both its local and global context. The experiments show that the proposed system achieved an accuracy of 74%.
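One way to read the local-plus-global scoring is as a weighted combination of two proximities; the 0.6/0.4 weighting and the Jaccard overlap below are illustrative assumptions, not the paper's formula.

```python
# Sketch: score each sense by a weighted sum of its proximity to the local
# context (the sentence) and to the global context (the full text).
# The weights and the overlap-based proximity are assumptions.

def proximity(words_a, words_b):
    """Jaccard overlap as a stand-in for the semantic proximity measure."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def disambiguate(sense_glosses, local_ctx, global_ctx, alpha=0.6):
    scores = {
        sense: alpha * proximity(gloss, local_ctx)
               + (1 - alpha) * proximity(gloss, global_ctx)
        for sense, gloss in sense_glosses.items()
    }
    return max(scores, key=scores.get)

# Hypothetical glosses and contexts.
glosses = {
    "finance": ["money", "deposit", "loan", "account"],
    "river":   ["water", "shore", "stream", "fishing"],
}
local_ctx = ["opened", "account", "bank"]
global_ctx = ["money", "loan", "interest", "deposit", "branch"]
print(disambiguate(glosses, local_ctx, global_ctx))
```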
World Journal of English Language, 2023
This study aimed at assessing the performance and efficacy of the information retrieval (IR) systems implemented in three widely used search engines (Google, Bing, and Yahoo), specifically with regard to the challenge of word sense disambiguation in Arabic texts. Such a challenge has been confirmed to negatively influence the retrieval of the most relevant documents. Therefore, we extended the paradigm of using computational methods and natural language processing (NLP) tools, primarily tailored for processing English texts, to explore morphosyntactic as well as lexical issues disturbing the accuracy of Arabic IR systems. Findings revealed striking disparities in the efficacy of the IR systems integrated into these search engines, which can be attributed to four principal challenges: (a) the intricate morphosyntactic structures inherent in Arabic; (b) the idiosyncratic orthographical system of the Arabic script; (c) the multifaceted semantic flexibility of certain lexical elements; and (d) the diglossic nature of Arabic, allowing for the coexistence of multiple linguistic varieties within a single discourse situation. Drawing from these findings, a series of solutions rooted in supervised machine learning techniques, including clustering models and adaptations based on geographic locations, are proposed. Moreover, the study advocates for the capacity of search engines to interpret queries across all Arabic varieties, encompassing vernacular dialects. Furthermore, the importance of search engines accommodating queries irrespective of the specific language adopted by users is underscored. While the research primarily centers on Arabic, its implications resonate beyond this language alone. By applying computational methodologies originally designed for English to Arabic, the study not only addresses the challenges specific to Arabic IR systems but also contributes valuable insights that transcend linguistic boundaries. Through a comparative lens, issues like word sense disambiguation in Arabic and English are juxtaposed, extracting lessons that can inform advancements in information retrieval for both languages.
2010
We describe a model for the lexical analysis of Arabic text, using the lists of alternatives supplied by a broad-coverage morphological analyzer, SAMA, which include stable lemma IDs that correspond to combinations of broad word sense categories and POS tags. We break down each of the hundreds of thousands of possible lexical labels into its constituent elements, including lemma ID and part of speech.
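A minimal sketch of decomposing a lexical label into its constituent elements; the label format shown is hypothetical, and SAMA's actual lemma IDs and tag set are richer than this toy version.

```python
# Hypothetical decomposition of a lexical label of the form
# "<lemma>_<sense-index>:<POS>" into its constituent elements.
# The format is an assumption for illustration only.

def split_label(label):
    lemma_id, pos = label.split(":")
    lemma, sense_index = lemma_id.rsplit("_", 1)
    return {"lemma": lemma, "sense_index": int(sense_index), "pos": pos}

print(split_label("kitAb_1:NOUN"))
# {'lemma': 'kitAb', 'sense_index': 1, 'pos': 'NOUN'}
```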
Word sense disambiguation is a core problem in many tasks related to language processing and was recognized from the beginning of scientific interest in machine translation and artificial intelligence. In this paper, we introduce the possibilities of using the Support Vector Machine (SVM) classifier to solve the word sense disambiguation problem in a supervised manner, after using the Levenshtein distance algorithm to measure the matching distance between words, using lexical samples of five Arabic words. The performance of the proposed technique is compared to supervised and unsupervised machine learning algorithms, namely the Naïve Bayes Classifier (NBC) and Latent Semantic Analysis (LSA) with K-means clustering, representing the baseline and state-of-the-art algorithms for WSD.
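The Levenshtein distance mentioned above has a standard dynamic-programming form; a minimal implementation (not the authors' code) looks like this:

```python
# Classic dynamic-programming Levenshtein (edit) distance: the minimum
# number of insertions, deletions, and substitutions turning `a` into `b`.

def levenshtein(a, b):
    prev = list(range(len(b) + 1))           # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,      # deletion
                            curr[j - 1] + 1,  # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitab", "kutub"))  # 2; works on Arabic strings as well
```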
Polibits, 2012
In this paper we test some supervised algorithms that most existing works on word sense disambiguation have cited. Due to the lack of linguistic data for the Arabic language, we work on a non-annotated corpus; with the help of four annotators, we were able to annotate the different samples containing the ambiguous words. We then test the Naïve Bayes algorithm, decision lists, and the exemplar-based algorithm. During the experimental study, we test the influence on disambiguation quality of the window size, derivation, and the smoothing technique for (2n+1)-grams. In these tests the exemplar-based algorithm achieves the best precision rate.
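For illustration, a hedged sketch of the (2n+1)-word window and smoothing being tested, here with Naïve Bayes and add-one smoothing (the training counts, priors, and window size are toy assumptions, not the paper's setup):

```python
# Sketch: extract a (2n+1)-word window around the ambiguous word and score
# senses with Naive Bayes using Laplace (add-one) smoothing.
import math
from collections import Counter

def window(tokens, index, n):
    """Return the (2n+1)-token window centered on the ambiguous word."""
    return tokens[max(0, index - n): index + n + 1]

def nb_score(ctx, prior, word_counts, vocab_size):
    """log P(sense) + sum of log P(word | sense), with add-one smoothing."""
    total = sum(word_counts.values())
    return math.log(prior) + sum(
        math.log((word_counts[w] + 1) / (total + vocab_size)) for w in ctx
    )

# Hypothetical per-sense training counts.
counts = {
    "finance": Counter({"money": 5, "loan": 3, "account": 4}),
    "river":   Counter({"water": 6, "shore": 2, "fishing": 3}),
}
priors = {"finance": 0.5, "river": 0.5}
vocab = {w for c in counts.values() for w in c}

tokens = "she opened an account at the bank yesterday".split()
ctx = window(tokens, tokens.index("bank"), n=3)
best = max(counts, key=lambda s: nb_score(ctx, priors[s], counts[s], len(vocab)))
print(best)  # "finance" here: "account" in the window tips the score
```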
2013 International Conference on Electrical Engineering and Software Applications, 2013
In this paper we propose an unsupervised method for Arabic word sense disambiguation. Using the corpus and the glosses of the ambiguous word, we define a method to automatically generate the context of use for each sense. We then define a similarity measure based on collocation measures to find the context of use nearest to the sentence containing the ambiguous word. Since the similarity measure may give more than one sense, we define a novel supervised approach called the vote procedure. Our work was compared with other related works; we obtained a better disambiguation rate, averaging 79%.
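A hedged sketch of a vote procedure of the kind described (the measure names and tie handling below are assumptions): each similarity measure proposes a sense, and the most frequent proposal wins.

```python
# Sketch of a vote procedure: several similarity measures each propose a
# sense; the sense with the most votes wins. The proposals are hypothetical.
from collections import Counter

def vote(proposals):
    """proposals: mapping of measure name -> proposed sense."""
    sense, votes = Counter(proposals.values()).most_common(1)[0]
    return sense, votes

proposals = {"harman": "finance", "croft": "finance", "okapi": "river"}
print(vote(proposals))  # ('finance', 2)
```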
Speech and Language Technology, 2009
We present a word sense disambiguation method applied to the automatic translation of a query from Arabic into English. The developed machine learning approach is based on statistical models that can learn from parallel corpora by analysing the relations between the items included in these corpora, in order to use them in the word sense disambiguation task. The relations between items in these corpora are obtained by using and developing a purely statistical method, so as to avoid the use of structured linguistic resources such as ontologies, which are not yet available for Arabic in appropriate quality. The results of this analysis should provide us with useful semantic information that can help find the best translation equivalents of polysemous items.
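The parallel-corpus idea can be caricatured as choosing the translation equivalent that co-occurs most often with the other query words' translations; the counts below are hypothetical stand-ins for statistics estimated from aligned sentence pairs.

```python
# Sketch: pick the English translation of an ambiguous Arabic word by how
# often each candidate co-occurs, on the English side of a parallel corpus,
# with the translations of the other query words. Counts are hypothetical.

cooccurrence = {
    # candidate translation -> co-occurrence counts with companion words
    "bank (finance)": {"money": 40, "river": 1},
    "bank (river)":   {"money": 2,  "river": 35},
}

def best_translation(candidates, companions):
    def score(cand):
        return sum(cooccurrence[cand].get(w, 0) for w in companions)
    return max(candidates, key=score)

print(best_translation(list(cooccurrence), ["money"]))  # "bank (finance)"
```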
Int. Arab J. Inf. Technol., 2016
In this paper, we present two contributions for Arabic Word Sense Disambiguation. In the first, we propose to use two external resources: Arabic WordNet (AWN) and WordNet, based on a term-to-term Machine Translation System (MTS). The second contribution consists of choosing the nearest concept for the ambiguous terms, based on more relationships with different concepts in the same local context. To evaluate the accuracy of our proposed method, several experiments have been conducted using feature selection methods (Chi-Square and CHIR) and two machine learning techniques, Naïve Bayes (NB) and Support Vector Machine (SVM). The obtained results illustrate that using the proposed method greatly increases the performance of our Arabic Text Categorization system.
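As an illustration of the Chi-Square feature-selection step (a scikit-learn stand-in; the paper's own Chi-Square and CHIR implementations are not shown), chi-square scores can rank context words by how strongly they associate with each sense label:

```python
# Sketch: rank bag-of-words features by chi-square association with sense
# labels. The documents and labels are toy placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = [
    "deposit money at the bank",
    "the bank approved the loan",
    "fishing on the river bank",
    "the bank of the stream",
]
labels = ["finance", "finance", "river", "river"]

vec = CountVectorizer()
X = vec.fit_transform(docs)
selector = SelectKBest(chi2, k=4).fit(X, labels)
names = vec.get_feature_names_out()
print([names[i] for i in selector.get_support(indices=True)])
# the 4 words most associated with the sense labels
```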