2011, 2011 IEEE GCC Conference and Exhibition (GCC)
The amount of Arabic electronic information on the web is growing rapidly. Statistics show that the number of Internet users in the Middle East has increased enormously since 2000, owing to growing ICT awareness and its importance within Arab countries. This has raised the need for effective methods and techniques for locating and retrieving Arabic content from the web. This paper presents major Information Retrieval (IR) tools and techniques and highlights a few challenges in this regard.
EAI Endorsed Transactions on Internet of Things
Information retrieval is an important field that aims to return documents relevant to a user's information need, expressed through a query. Arabic is a challenging language that has recently gained much attention in the information retrieval domain. To overcome the problems related to its complexity, many studies and techniques have been presented, most of them addressing the stemming problem. This paper presents an overview of the Arabic information retrieval process, including various text processing techniques, ranking approaches, evaluation measures, and some important information retrieval models. Finally, the paper presents some recent related studies and approaches in different Arabic information retrieval fields.
… Journées d'Étude sur la Parole …, 2004
Arabic IR (Information Retrieval) has recently become a focus of research and commercial development. Very few standards for evaluation of such tools are known and available. A concrete evaluation for Arabic IR systems is necessary for the advancement of this field.
International Journal of Advanced Computer Science and Applications, 2016
The field of information retrieval has witnessed tangible progress over the past decades in response to the expanded usage of the internet and the dire need of users to search for massive amounts of digital information. Given the steady increase of Arabic e-content, excellent information retrieval systems must be devised to suit the nature and requirements of the Arabic language. This paper sheds light on the current progress in the field of Arabic information retrieval, identifies the challenges that hinder the progress of this science, and proposes suggestions for further research. This paper uses the descriptive analytical method to examine the reality of Arabic studies in the field of information retrieval and to study the problems that are being faced in this area. Specifically, the previous literature on information retrieval is reviewed by searching the related databases and websites.
Arabic words are typically derived through a robust system of Arabic roots. According to Sakhr Software Company, there are over 10,000 potential roots, but far fewer are used regularly. This robust root system has large implications for Arabic IR and may present information retrieval challenges. This literature review addresses two main issues that are consistently raised in the field of Arabic IR: stemming and stopwords. There are still no standardized methods of stemming or stopword elimination, which highlights the infancy of the field of Arabic information retrieval.
There are many active topics in information retrieval, and one of the most important is automatic text indexing, which aims to make online document retrieval easier for web searchers. In this paper we present a comprehensive study of indexing Arabic documents, since few works have dealt with it. The papers surveyed here address the problem from different angles: some deal with single-term indexing while others deal with phrase indexing, and other researchers compare different techniques and prefer one over another based on experimental results. Some papers propose new techniques or enhance existing ones, using either statistical or non-statistical methods. The remaining papers propose tools such as key-term extractors to be used in text indexing. So far no optimal solution to the indexing problem has been suggested that could be considered ...
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, 2014
The main issue that currently faces research in the information society is the flood of information, a problem exacerbated by the massive diversity of information on the World Wide Web. The Web has given researchers access to millions of references, articles, news items and services. Regardless of geographic location and language used, much of this information is unstructured data. There is a large body of research on mining unstructured Web data, but little effort has been devoted to Web pages authored in Arabic. This paper investigates Semantic Web (SW) support for handling documents that are authored and/or annotated in Arabic, and how to bridge the gap between the SW and Natural Language Processing (NLP). It also aims to improve the intelligent exploration of unstructured documents in the Arabic domain.
ACM Transactions on Asian and Low-Resource Language Information Processing, 2016
Cross-language information retrieval (CLIR) deals with retrieving relevant documents in one language using queries expressed in another language. As CLIR tools rely on translation techniques, they are challenged by the properties of highly derivational and flexional languages like Arabic. Much work has been done on CLIR for different languages including Arabic. In this article, we introduce the reader to the motivations for solving some problems related to Arabic CLIR approaches. The evaluation of these approaches is discussed starting from the 2001 and 2002 TREC Arabic CLIR tracks, which aim to objectively evaluate CLIR systems. We also study many other research works to highlight the unresolved problems or those that require further investigation. These works are discussed in the light of a deep study of the specificities and the tasks of Arabic information retrieval (IR). Particular attention is given to translation techniques and CLIR resources, which are key issues challenging ...
The aim of this paper is to summarize briefly the challenges that Arabic poses for Cross-Lingual Information Retrieval, to show the potential of MIMOR, a retrieval system that has proved successful in cross-lingual retrieval tasks, and to propose string matching techniques for feature unification instead of stemming techniques.
2004
Human Language Technology has played a big role in implementing Latin-based information retrieval systems. Two of the most cited techniques are stemming and truncation. Numerous studies have shown that the inflectional structure of words has a big impact on the retrieval accuracy of information retrieval systems (IRS) for Latin-based languages. Stemming or truncation is done for two principal reasons: the reduction in index storage required and the increase in performance due to the use of word variants. Several stemming algorithms have been proposed, such as the Porter stemmer for English.
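The two reasons given above for truncation (a smaller index and matching across word variants) can be illustrated with a minimal Python sketch. It is not taken from the paper: the fixed prefix length of 5, the toy English documents, and the helper names `truncate` and `build_index` are all assumptions made for illustration.

```python
from collections import defaultdict


def truncate(word, n=5):
    """Keep only the first n characters of a lowercased word (n=5 is an assumed cutoff)."""
    return word.lower()[:n]


def build_index(docs, normalize=str.lower):
    """Map each normalized term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[normalize(word)].add(doc_id)
    return dict(index)


# Hypothetical toy collection, used only to show the effect of truncation.
docs = {
    1: "retrieval of documents",
    2: "retrieving a document",
    3: "the system retrieves documents",
}

plain_index = build_index(docs)                          # one entry per distinct word form
truncated_index = build_index(docs, normalize=truncate)  # variants share one entry

print(len(plain_index), len(truncated_index))
print(sorted(truncated_index[truncate("retrieved")]))
```

Because "retrieval", "retrieving" and "retrieves" share the prefix "retri", they collapse into a single index entry, so a query for any variant reaches all three documents while the index holds fewer terms. A real stemmer such as Porter's applies suffix rules rather than a fixed cutoff.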
Lecture Notes in Computer Science, 2015
The main reason for adopting Semantic Web technology in information retrieval is to improve retrieval performance. A semantic search-based system is characterized by locating web content that is semantically related to the query's concepts rather than relying on exact matching with the keywords in queries. There is growing interest in Arabic web content worldwide due to its cultural, political, strategic, and economic importance. Arabic is linguistically rich at all levels, which makes the effective search of Arabic text a challenge. In the literature, studies that address searching Arabic web content using semantic web technology are still insufficient compared to Arabic's actual importance as a language. In this research, we propose an Arabic semantic search approach that is applied to Arabic web content. This approach is based on the Vector Space Model (VSM), which has proved its success and whose traditional version many studies have sought to improve. Our approach uses the Universal WordNet to build a rich concept-space index instead of the traditional term-space index. This index is used to enable Semantic VSM capabilities. Moreover, we introduce a new incidence measurement to calculate the semantic significance degree of a concept in a document, which fits our model better than the traditional term frequency. Furthermore, for the purpose of determining the semantic similarity of two vectors, we introduce a new formula for calculating the semantic weight of a concept. Because documents are indexed by their topics and classified semantically, we were able to search Arabic documents effectively. The experimental results in terms of Precision, Recall and F-measure showed improvement in performance from 77%, 56%, and 63% to 71%, 96%, and 81%, respectively.
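For orientation, the traditional term-space VSM that the approach above sets out to replace can be sketched in a few lines of Python. This is only the textbook baseline (raw term-frequency vectors compared by cosine similarity), not the authors' concept-space index, incidence measurement, or semantic weight formula, none of which are specified in the abstract. The toy documents and the names `tf_vector`, `cosine`, and `rank` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a text as raw term frequencies (the traditional term-space weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    q = tf_vector(query)
    scored = [(doc_id, cosine(q, tf_vector(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection, in English for readability.
docs = {
    "d1": "arabic information retrieval",
    "d2": "semantic web search",
    "d3": "arabic semantic search",
}

for doc_id, score in rank("arabic search", docs):
    print(doc_id, round(score, 3))
```

In this baseline a document scores only on terms it literally shares with the query, which is the exact-keyword limitation that a concept-space index is meant to address.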