Arabic Spelling Error Detection and Correction, 2015
A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
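The three components named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' system: the word list and counts in `WORD_COUNTS` are hypothetical toy data, the error model is a plain edit-distance-1 candidate generator (the paper uses a re-ranker), and the language model is reduced to unigram counts.

```python
from collections import Counter

# Dictionary + unigram "language model": toy words with made-up corpus counts.
WORD_COUNTS = Counter({"كتاب": 50, "كاتب": 20, "كتب": 30, "مكتب": 10})
ALPHABET = "ابتثجحخدذرزسشصضطظعغفقكلمنهويءآأؤإئةى"

def edits1(word):
    """Error model: every string one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def is_error(word):
    """Detection: a word is flagged when it is absent from the dictionary."""
    return word not in WORD_COUNTS

def suggest(word, n=3):
    """Correction: in-dictionary candidates, most frequent first."""
    if not is_error(word):
        return [word]
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return sorted(candidates, key=lambda w: -WORD_COUNTS[w])[:n]
```

Each function maps to one component, so any of the three can be swapped for a stronger version without touching the other two, which is the kind of cumulative improvement the abstract describes.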
Research in Computing Science
The objective of this work is to build a spell-checking tool that analyzes input text for possible misspellings. The tool suggests possible corrections for each misspelled word in the text. The work requires a reference dictionary of words in the Arabic language. These objectives were accomplished with suitable resources, methods, and approaches. First experimental results on real data are encouraging and provide evidence of the validity of the design choices. They also help to highlight the difficulty of the task, and suggest possible developments.
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), 2014
In this work, we address the problem of spelling correction in the Arabic language using the new corpus provided by the QALB (Qatar Arabic Language Bank) project, which is an annotated corpus of sentences with errors and their corrections. The corpus contains edit, add before, split, merge, add after, move and other error types. We are concerned with the first four error types, as they account for more than 90% of the spelling errors in the corpus. The proposed system uses a separate model for each error type and then integrates all the models into an efficient and robust system that achieves an overall recall of 0.59, precision of 0.58 and F1 score of 0.58 including all the error types on the development set. Our system participated in the QALB 2014 shared task "Automatic Arabic Error Correction" and achieved an F1 score of 0.6, earning sixth place out of nine participants.
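The reported development-set figures are consistent with the usual definition of F1 as the harmonic mean of precision and recall. A small check, assuming that standard definition (the function name `f1_score` is ours):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision 0.58 and recall 0.59, as reported for the development set.
print(round(f1_score(0.58, 0.59), 2))  # prints 0.58
```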
The International Conference on Informatics and Systems, 2010
Spellcheckers are widely used in many software products for identifying errors in users' writings. However, they are not designed to address spelling errors made by non-native learners of a language. As a matter of fact, spelling errors made by non-native learners are more than just misspellings. Non-native learners' errors require special handling in terms of detection and correction, especially when it comes to morphologically rich languages such as Arabic, which have few related resources. In this paper, we address common error patterns made by non-native Arabic learners and suggest a two-layer spell-checking approach, including spelling error detection and correction. The proposed error detection mechanism is applied on top of Buckwalter's Arabic morphological analyzer in order to demonstrate the capability of our approach in detecting possible spelling errors. The correction mechanism adopts a rule-based edit distance algorithm. Rules are designed in accordance with common spelling error patterns made by Arabic learners. Error correction uses a multiple filtering mechanism to propose final corrections. The approach utilizes semantic information given in exercising questions in order to achieve highly accurate detection and correction of spelling errors made by non-native Arabic learners. Finally, the proposed approach was evaluated using real test data and promising results were achieved.
International Journal of Computing and Digital Systems
Automatic spelling correction is an important task used in many Natural Language Processing (NLP) applications such as Optical Character Recognition (OCR) and information retrieval. There are many approaches for detecting and correcting misspelled words. These approaches can be divided into two main categories: contextual and context-free. In this paper, we propose a new contextual spelling correction method applied to the Arabic language, without loss of generality for other languages. The method is based on both the Viterbi algorithm and a probabilistic model built with a new estimate of n-gram language models combined with the edit distance. The probabilistic model is learned from an Arabic multipurpose corpus. The originality of our work lies in handling the global and simultaneous correction of several erroneous words within a sentence. The experiments carried out demonstrate the performance of our proposal, giving encouraging results for the correction of several spelling errors in a given context. The method achieves a correction accuracy of up to 93.6% when only the first correction suggestion is evaluated. It is able to take into account strong links between distant words carrying meaning in a given context. The high correction accuracy of our method allows for its integration into many applications.
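The idea of correcting all the words of a sentence jointly can be illustrated with a generic Viterbi decoder over per-word candidate lists. This is a sketch under assumptions, not the paper's model: the bigram table, the confusion set and the 0.9/0.1 channel probabilities are hypothetical toy values, English tokens are used only to keep the example easy to read, and the paper's new n-gram estimate is replaced by a plain lookup with a floor probability.

```python
import math

# Hypothetical toy probabilities; unseen bigrams get a small floor value.
BIGRAM = {("<s>", "i"): 0.5, ("i", "come"): 0.3, ("come", "from"): 0.5,
          ("come", "form"): 0.01, ("from", "home"): 0.4, ("form", "home"): 0.01}
CONFUSABLE = {"form": ["form", "from"]}

def candidates(obs):
    """Candidate corrections for an observed token (always includes the token)."""
    return CONFUSABLE.get(obs, [obs])

def log_bigram(prev, word):
    return math.log(BIGRAM.get((prev, word), 1e-4))

def log_channel(obs, cand):
    """Stand-in for an edit-distance score: keeping the token is more likely."""
    return math.log(0.9 if obs == cand else 0.1)

def viterbi_correct(observed):
    """Return the candidate sequence with the highest joint log-probability."""
    # best[word] = (score of the best path ending in word, that path)
    best = {"<s>": (0.0, [])}
    for obs in observed:
        nxt = {}
        for cand in candidates(obs):
            score, path = max(
                ((s + log_bigram(prev, cand), p) for prev, (s, p) in best.items()),
                key=lambda sp: sp[0],
            )
            nxt[cand] = (score + log_channel(obs, cand), path + [cand])
        best = nxt
    return max(best.values(), key=lambda sp: sp[0])[1]

print(viterbi_correct(["i", "come", "form", "home"]))
```

With these toy values the decoder replaces "form" with "from" because the surrounding bigrams outweigh the channel penalty, while the isolated input `["form"]` is left unchanged.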
The 4th Conference on Language Engineering, 2003
Arabic's rich morphology (word construction) and complex orthography (writing system) present unique challenges for automatic spell checking. An Arabic checker attempts to find a dictionary word that might be the correct spelling of the misspelled or misrecognized word. In this paper, we report our attempt at developing an Arabic spelling checker program for solving this problem. Our approach is heuristic and involves developing an Arabic morphological analyzer, techniques for spelling checking and spelling correction, and efficient methods for lexicon operations. The developed Arabic spell checker is able to recognize common spelling errors for standard Arabic and Egyptian dialects.
International Journal of Advanced Computer Science and Applications, 2014
In this paper, we propose a new approach for spell-checking errors committed in the Arabic language. This approach is almost independent of the dictionary used, because we introduce morphological analysis into the spell-checking process. Hence, our new system uses a reduced-size dictionary of stems rather than a large dictionary that still does not cover all Arabic words.
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), 2014
Automatic correction of misspelled words means offering a single proposal to correct a mistake, for example, switching two letters, omitting a letter or a key press. In Arabic, there are some typical common errors based on letter errors, such as confusion in the form of Hamza (همزة), confusion between Daad (ضاد) and Za (ظاء), and the omission of dots with Yeh (ياء) and Teh (تاء). In this paper we therefore describe a rule-based system for the automatic correction of common errors in Arabic, using two methods: a list of words and regular expressions.
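The two methods mentioned here, a word list and regular expressions, can be combined in a very small rule engine. The sketch below is illustrative only: the entries in `WORD_FIXES` and `REGEX_FIXES` are hypothetical examples of commonly cited misspellings, not rules taken from the paper.

```python
import re

# Word-list rules: whole-token replacements (hypothetical example entry).
WORD_FIXES = {"لاكن": "لكن"}

# Regex rules: (pattern, replacement) pairs (hypothetical example entry
# that restores the hamza and the word break in a common phrase).
REGEX_FIXES = [
    (re.compile(r"[اإ]نشاء الله"), "إن شاء الله"),
]

def autocorrect(text):
    """Apply the regex rules first, then the word-list rules token by token."""
    for pattern, replacement in REGEX_FIXES:
        text = pattern.sub(replacement, text)
    return " ".join(WORD_FIXES.get(token, token) for token in text.split(" "))
```

Because every rule maps an error to exactly one replacement, such a system can correct automatically without asking the user to choose among suggestions, at the cost of covering only the errors its rules anticipate.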
2010
We describe a new Arabic spelling correction system which is intended for use with electronic dictionary search by learners of Arabic. Unlike other spelling correction systems, this system does not depend on a corpus of attested student errors but on student- and teacher-generated ratings of confusable pairs of phonemes or letters. Separate error modules for keyboard mistypings, phonetic confusions, and dialectal confusions are combined to create a weighted finite-state transducer that calculates the likelihood that an input string could correspond to each citation form in a dictionary of Iraqi Arabic. Results are ranked by the estimated likelihood that a citation form could be misheard, mistyped, or mistranscribed for the input given by the user. To evaluate the system, we developed a noisy-channel model trained on students' speech errors and used it to perturb citation forms from a dictionary. We compare our system to a baseline based on Levenshtein distance and find that, when evaluated on single-error queries, our system performs 28% better than the baseline (overall MRR) and is twice as good at returning the correct dictionary form as the top-ranked result. We believe this to be the first spelling correction system designed for a spoken, colloquial dialect of Arabic.
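The evaluation described here, a Levenshtein baseline scored by mean reciprocal rank (MRR), can be sketched as follows. The romanized dictionary entries and queries are hypothetical toy data, and the alphabetical tie-break in `rank` is our assumption; the abstract does not say how the baseline orders equally distant forms.

```python
def levenshtein(a, b):
    """Unit-cost edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def rank(query, dictionary):
    """Baseline ranking: closest forms first, ties broken alphabetically."""
    return sorted(dictionary, key=lambda w: (levenshtein(query, w), w))

def mean_reciprocal_rank(queries, dictionary):
    """queries: (typed form, intended citation form) pairs."""
    total = 0.0
    for typed, intended in queries:
        ranking = rank(typed, dictionary)
        if intended in ranking:
            total += 1.0 / (ranking.index(intended) + 1)
    return total / len(queries)

toy_dictionary = ["kitab", "katib", "maktab"]
toy_queries = [("kitb", "kitab"), ("ktb", "kitab")]
print(mean_reciprocal_rank(toy_queries, toy_dictionary))  # prints 0.75
```

The second query shows why MRR is informative: the intended form is tied with another entry and lands at rank 2, contributing 0.5 rather than 1.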
International Journal of Electrical and Computer Engineering (IJECE), 2023
Digital environments for human learning have evolved considerably in recent years thanks to advances in information technologies. Computer assistance for text creation and editing tools represents a future market in which natural language processing (NLP) concepts will be used. This is particularly the case for the automatic correction of spelling mistakes, which is used daily by data operators. Unfortunately, these spellcheckers are only writing aids: they are unable to perform this task automatically without the user's assistance. In this paper, we suggest a filtered composition metric based on the weighting of two lexical similarity distances in order to achieve auto-correction. The approach has two phases. The first phase of correction combines two well-known distances: the edit distance, weighted by relative weights for key proximity on the Arabic keyboard and calligraphic similarity between letters of the Arabic alphabet, is combined with the Jaro-Winkler distance to better weight and filter solutions that have the same metric. The second phase acts as a booster of the first: it applies a probabilistic bigram language model to candidate solutions that may still have the same lexical similarity measure after the first correction phase. Evaluating our filtered composition measure on a dataset of errors, we achieved an auto-correction rate of 96%. This is an open access article under the CC BY-SA license.
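The two-phase idea can be sketched with a weighted edit distance followed by a bigram tie-break. This is a simplified illustration, not the paper's metric: the substitution weights in `CHEAP_SUBS` are hypothetical, the Jaro-Winkler component is omitted, and the bigram model is reduced to raw counts supplied by the caller.

```python
# Hypothetical reduced costs for letters that sit on adjacent keys
# or look alike; every other substitution costs 1.0.
CHEAP_SUBS = {frozenset("ضص"): 0.4, frozenset("حج"): 0.5}

def sub_cost(a, b):
    if a == b:
        return 0.0
    return CHEAP_SUBS.get(frozenset((a, b)), 1.0)

def weighted_distance(s, t):
    """Edit distance with unit insert/delete and weighted substitution."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, cs in enumerate(s, 1):
        cur = [float(i)]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1.0,
                           cur[j - 1] + 1.0,
                           prev[j - 1] + sub_cost(cs, ct)))
        prev = cur
    return prev[-1]

def best_correction(prev_word, typo, lexicon, bigram_counts):
    """Phase 1: lowest weighted distance. Phase 2: bigram count breaks ties."""
    return min(
        lexicon,
        key=lambda w: (weighted_distance(typo, w),
                       -bigram_counts.get((prev_word, w), 0)),
    )
```

For example, with the lexicon `["ضرب", "حرب"]` the input "صرب" resolves to "ضرب" in the first phase alone, whereas an input equally distant from both entries is decided by whichever bigram with the preceding word is more frequent.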
The international conference on Language Resources and Evaluation (LREC), 2012
The International Arab Journal of Information Technology
Proceedings of the Second Workshop on Arabic Natural Language Processing, 2015
Text as a Linguistic …, 2001
Natural Language Understanding and Cognitive Science, 2005