The main purpose of this paper is to develop a simple and flexible spell-checker for the Arabic language. The proposed spell-checker is based on N-gram scores. For this purpose, eleven matrices are built to represent the combinations of Arabic letters within a word, each matrix capturing the connection between a pair of letters (2-grams). Each cell in the generated matrix is assigned an integer value of 2, 1, or 0: a cell is assigned 2 if a word may end with that letter pair, 1 if the pair occurs but the word is not yet over, and 0 otherwise. The search process extracts each pair of letters in a word and examines the value for each pair. When the corresponding value is zero, the spell-checker considers the tested word wrong; if the value is 1, indicating a valid connection, checking continues until a value of 2 is reached, at which point the word is judged correct. The overall accuracy of the proposed spell-checker reaches 98.99%.
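The 0/1/2 cell scheme described above can be sketched as a lookup table keyed by letter pairs. The table below is a toy fragment over Latin letters, not the paper's eleven Arabic matrices; values follow the abstract's convention (2 = pair may end a word, 1 = pair occurs word-internally, 0 = pair never occurs).

```python
# Hypothetical 2-gram table; a real checker would build one per letter group.
BIGRAM = {
    ("k", "i"): 1,
    ("i", "t"): 2,   # a word may end after "it" (e.g. "kit")
    ("t", "t"): 1,
    ("t", "e"): 1,
    ("e", "n"): 2,   # a word may end after "en" (e.g. "kitten")
}

def check(word):
    """Accept a word only if every adjacent letter pair is known (value 1 or 2)
    and the final pair is marked as a valid word ending (value 2)."""
    pairs = list(zip(word, word[1:]))
    for i, pair in enumerate(pairs):
        value = BIGRAM.get(pair, 0)
        if value == 0:
            return False              # unseen pair: reject immediately
        if i == len(pairs) - 1:
            return value == 2         # word must stop on a value-2 pair
    return False
```

With this table, `check("kitten")` and `check("kit")` succeed, while `check("kitte")` fails because the final pair ("t", "e") carries value 1, meaning the word ended mid-connection.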
International Journal of Advanced Computer Science and Applications, 2014
In this paper, we propose a new approach for spell-checking errors committed in Arabic. This approach is almost independent of the dictionary used, owing to the fact that we introduce morphological analysis into the spell-checking process. Hence, our new system uses a stem dictionary of reduced size rather than a large dictionary that still cannot cover all Arabic words.
The 4th Conference on Language Engineering, 2003
Arabic's rich morphology (word construction) and complex orthography (writing system) present unique challenges for automatic spell checking. An Arabic checker attempts to find a dictionary word that might be the correct spelling of the misspelled or misrecognized word. In this paper, we report our attempt at developing an Arabic spelling checker program for solving this problem. Our approach is heuristic and involves developing an Arabic morphological analyzer, techniques of spelling checking and spelling correction, and efficient methods of lexicon operations. The developed Arabic spell checker is able to recognize common spelling errors for standard Arabic and Egyptian dialects.
Research in Computing Science
The objective of this work is to build a spell-check tool that analyzes entered text in search of possible misspellings. The tool suggests possible corrections for each misspelled word in the text. The work requires a reference dictionary of words in the Arabic language. These objectives were accomplished with effective resources, methods, and approaches. First experimental results on real data are encouraging and provide evidence of the validity of the design choices. They also help to highlight the difficulty of the task and suggest possible developments.
The main aim of this study is to develop a spell-checker system for the Arabic language by investigating the viability of the radix search tree approach. Several trees ("shrubs") representing Arabic characters are built by serially tracking the characters of each word added to the dictionary, with a special mark placed in the node that holds the last character of each word. During the search process, every word is tracked character by character along the appropriate path inside its tree; a word is recognized as correct if and only if the search reaches a marked node while traversing the tree. Otherwise, the word is considered incorrect.
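The character-by-character tracking with an end-of-word mark described above is essentially a trie lookup. A minimal sketch, using a toy word list in place of the study's Arabic dictionary:

```python
class Node:
    def __init__(self):
        self.children = {}      # character -> child node
        self.is_word = False    # the "special mark" on a word's last character

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.is_word = True

def lookup(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:        # path breaks off: not in the dictionary
            return False
    return node.is_word         # correct only if we land on a marked node

root = Node()
for w in ("كتب", "كتاب", "قلم"):   # illustrative Arabic entries
    insert(root, w)
```

With this structure, `lookup(root, "كتاب")` succeeds, while `lookup(root, "كتا")` fails: the path exists but ends on an unmarked node, so the word is only a prefix.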
A spelling error detection and correction application is based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system.
2014
The MS Arabic spell checker has been integrated by Coltec-Egypt into the MS-Office suite since 1997, and to this day many Arabic users find it worthless. In this study, we show why the MS spell checker fails to attract Arabic users. After spell-checking a document (10 pages, 3300 words in Arabic), the assessment procedure spotted 78 false-positive errors. These reveal the flaws of the lexical resource: an unsystematic lexical coverage of the feminine and the broken plural of nouns and adjectives, and an arbitrary coverage of verbs and nouns with prefixed or suffixed particles. This unsystematic and arbitrary lexical coverage pinpoints the absence of a clear definition of a lexical entry and an inadequate design of the related agglutination rules. Finally, this assessment reveals, in general, the failure of scientific and technological policies in big companies and in research institutions regarding Arabic.
2020
The importance of spell checking grows with the spread of technologies, the use of the Internet and local dialects, and limited linguistic awareness. This importance is even greater for the Arabic language, which has many complexities and specificities that differ from other languages. This paper explains these specificities, presents existing works organized by the categories of techniques they use, explores those techniques, and gives directions for future work.
Automatic Speech Recognition is defined as the process of converting a speech wave into text using a computer. Speech recognition is the easiest way to interact with computer applications, especially for people who cannot use their arms. This paper proposes an error correction method and algorithm for Arabic words and a popular dialect (Iraqi) in a speech recognition system. The proposed algorithm splits the input content (entered as a speech wave and converted to text by the speech recognition system) into word-tokens that are submitted as search queries to the system. The system offers to replace each erroneous word with a suggested correction using n-gram features, and saves the written words in a text file whose path the user chooses. Future research can improve upon the proposed system, for example by incorporating several correction algorithms and comparing them.
International Journal of Electrical and Computer Engineering (IJECE), 2023
Digital environments for human learning have evolved greatly in recent years thanks to advances in information technology. Computer assistance for text creation and editing tools represents a future market in which natural language processing (NLP) concepts will be used. This is particularly the case for the automatic correction of spelling mistakes, used daily by data operators. Unfortunately, spellcheckers are considered writing-aid tools; they are unable to perform this task automatically without the user's assistance. In this paper, we suggest a filtered composition metric based on the weighting of two lexical similarity distances in order to achieve auto-correction. The approach developed in this article involves two phases. The first correction phase combines two well-known distances: the edit distance, weighted by the relative proximity of keys on the Arabic keyboard and the calligraphic similarity between Arabic letters, is combined with the Jaro-Winkler distance to better weight and filter solutions that share the same metric. The second phase acts as a booster of the first: it uses a probabilistic bigram language model to decide among candidate corrections that still share the same lexical similarity measure after the first phase. Evaluation of the experimental results obtained by applying our filtered composition measure to a dataset of errors achieved a 96% auto-correction rate.
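The composition of a weighted edit distance with a second similarity used to break ties can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the proximity weights are made up, and `difflib.SequenceMatcher` stands in for the Jaro-Winkler distance, which is not in the Python standard library.

```python
from difflib import SequenceMatcher

# Hypothetical substitution weights for visually/physically close letter pairs;
# the paper derives these from the Arabic keyboard layout and letter shapes.
PROXIMITY = {("ب", "ت"): 0.4, ("ح", "ج"): 0.3}

def sub_cost(a, b):
    if a == b:
        return 0.0
    return PROXIMITY.get((a, b), PROXIMITY.get((b, a), 1.0))

def weighted_edit(s, t):
    """Edit distance with proximity-weighted substitutions (unit insert/delete)."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)
    for j in range(n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1.0,                       # deletion
                          d[i][j - 1] + 1.0,                       # insertion
                          d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return d[m][n]

def best_corrections(word, lexicon):
    """Keep the candidates with the lowest weighted distance, then rank the
    tied ones by a second string similarity (the 'filter' step)."""
    scored = [(weighted_edit(word, w), w) for w in lexicon]
    best = min(s for s, _ in scored)
    tied = [w for s, w in scored if s == best]
    return sorted(tied, key=lambda w: -SequenceMatcher(None, word, w).ratio())
```

A proximity-weighted substitution (ت mistyped as ب) costs 0.4 rather than 1.0, so the intended word wins over candidates at full edit distance 1.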
Arabic Spelling Error Detection and Correction, 2015
A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
With advancements in industry and information technology, large volumes of electronic documents such as newspapers, emails, weblogs, and theses are produced daily. Producing electronic documents has considerable benefits such as easy organization and data management. Therefore, automatic systems such as spell and grammar checkers/correctors can help to improve their quality. In this article, the development of an automatic spelling, grammatical and real-word error checker for the Persian (Farsi) language, named Vafa Spell-Checker, is explained. Errors in a text can be categorized into spelling, grammatical, and real-word errors. Vafa Spell-Checker is a hybrid system in which both rule-based and statistical approaches are used to detect and correct all types of errors. The detection and correction phases for spelling and real-word errors are fully statistical, while for the grammar checker a rule-based approach is proposed. Vafa Spell-Checker attempts to process these error types in an integrated system for the Persian language. The results on a real-world collected test set indicate that continuing work on the grammar checker requires statistical approaches. Evaluation results with respect to the F0.5 measure for the spell checker, grammar checker, and real-word error checker are about 0.908, 0.452, and 0.187, respectively. Moreover, several freely usable language resources for Persian that were generated during this project are demonstrated in this article. These resources could be used in further research on the Persian language.
2012
Abstract Arabic is a language known for its rich and complex morphology. Although many research projects have focused on the problem of Arabic morphological analysis using different techniques and approaches, very few have addressed the issue of generation of fully inflected words for the purpose of text authoring. Available open-source spell checking resources for Arabic are too small and inadequate.
Indian Journal of Science and Technology, 2015
A spellchecker is a software tool that identifies and corrects spelling mistakes in a text document. Designing a spell checker for the Punjabi language is a challenging task. Punjabi can be written in two scripts: Gurmukhi (a left-to-right script based on Devanagari) and the Perso-Arabic script (a right-to-left script), also referred to as Shahmukhi. Gurmukhi follows a 'one sound, one symbol' principle, whereas Shahmukhi follows a 'one sound, multiple symbols' principle, which complicates the design of a spell checker for Shahmukhi text. Text written in Shahmukhi normally omits short vowels and diacritic marks, so missing some diacritic marks should not be considered a mistake. For holy books such as the Quran, however, missing diacritic marks are considered mistakes. The spell checker is therefore designed so that it can check spelling with or without mandatory diacritic marks, depending on the user's selection. In addition, Shahmukhi text has complex grammatical rules and phonetic properties, and thus needs different algorithms and techniques to achieve the expected efficiency. This paper presents the complete design and implementation of a spell checker for Shahmukhi text.
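The optional diacritic handling described above can be sketched as a lookup that, in non-strict mode, compares words with combining marks stripped. This is a minimal illustration with Arabic-script examples, not the paper's Shahmukhi implementation.

```python
import unicodedata

def strip_marks(word):
    """Drop combining characters (short vowels / diacritic marks)."""
    return "".join(ch for ch in word if unicodedata.combining(ch) == 0)

def is_correct(word, dictionary, strict=False):
    """Strict mode (e.g. Quranic text): diacritics must match exactly.
    Non-strict mode: missing diacritics are not treated as mistakes."""
    if strict:
        return word in dictionary
    stripped = {strip_marks(w) for w in dictionary}
    return strip_marks(word) in stripped
```

Under non-strict checking, an undiacritized form matches its fully vocalized dictionary entry; under strict checking it does not.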
International Journal of Scientific & Technology Research, 2014
Entering text into word processing tools may result in spelling errors; hence, text processing applications include spell checkers. Integrating a spell checker into a word processor reduces the time and energy spent finding and correcting misspelled words. However, such tools are not available for Afaan Oromo, a language of the Cushitic family spoken in Ethiopia. In this paper, we describe the design and implementation of a non-word Afaan Oromo spell checker. The system is based on dictionary look-up with morphological analysis (i.e., a morphology-based spell checker). Developing a morphology-based spell checker requires knowledge of the language's morphology; accordingly, the morphological properties of Afaan Oromo have been studied. To the best of our knowledge, this work is the first of its kind for Afaan Oromo. The methodology delineated in the paper can be replicated for other languages with morphology similar to Afaan Oromo.
International journal of engineering research and technology, 2020
Spell checking means detecting and correcting errors, a well-known task in natural language processing. Spelling error detection and correction is the process of checking the spelling of words in a document and, on finding an error, listing correct spellings as suggestions. Spell-checker tools use a dictionary as a database: every word in the text is looked up in the dictionary, and a word not present in the dictionary is treated as an error. To handle the error, the spell checker searches the dictionary for the words that most resemble the erroneous word; these are listed as suggestions, from which the user chooses the best word. In this work, a dictionary of Odia words is created, and an edit distance algorithm is implemented for the design of an Odia spell checker with prediction.
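The dictionary look-up with edit-distance suggestions described above can be sketched as follows; the word list is an illustrative stand-in for the Odia dictionary, and the distance is the classic Levenshtein measure.

```python
def edit_distance(s, t):
    """Levenshtein distance via a single rolling row of the DP table."""
    n = len(t)
    d = list(range(n + 1))
    for i in range(1, len(s) + 1):
        prev, d[0] = d[0], i
        for j in range(1, n + 1):
            prev, d[j] = d[j], min(d[j] + 1,                      # deletion
                                   d[j - 1] + 1,                  # insertion
                                   prev + (s[i - 1] != t[j - 1])) # substitution
    return d[n]

DICTIONARY = {"apple", "apply", "ample", "maple"}   # toy word list

def suggest(word, max_dist=2):
    """Return [] for dictionary words; otherwise list nearby words,
    closest first, as correction suggestions."""
    if word in DICTIONARY:
        return []
    return sorted((w for w in DICTIONARY if edit_distance(word, w) <= max_dist),
                  key=lambda w: edit_distance(word, w))
```

For example, the non-word "aple" yields "apple", "ample", and "maple" (all at distance 1) before "apply" (distance 2), mirroring the resemblance-based ordering of suggestions described in the abstract.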
The International Conference on Informatics and Systems, 2010
Spellcheckers are widely used in many software products for identifying errors in users' writings. However, they are not designed to address spelling errors made by non-native learners of a language. As a matter of fact, spelling errors made by non-native learners are more than just misspellings. Non-native learners' errors require special handling in terms of detection and correction, especially when it comes to morphologically rich languages such as Arabic, which have few related resources. In this paper, we address common error patterns made by non-native Arabic learners and suggest a two-layer spell-checking approach, including spelling error detection and correction. The proposed error detection mechanism is applied on top of Buckwalter's Arabic morphological analyzer in order to demonstrate the capability of our approach in detecting possible spelling errors. The correction mechanism adopts a rule-based edit distance algorithm. Rules are designed in accordance with common spelling error patterns made by Arabic learners. Error correction uses a multiple filtering mechanism to propose final corrections. The approach utilizes semantic information given in exercising questions in order to achieve highly accurate detection and correction of spelling errors made by non-native Arabic learners. Finally, the proposed approach was evaluated using real test data and promising results were achieved.