Improved Spelling Error Detection and Correction for Arabic

Mohammed Attia

Improved Spelling Error Detection and Correction for Arabic

Mohammed Attia

visibility

…

description

10 pages

link

1 file

ABSTRACT A spelling error detection and correction application is based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system.

Sign up for access to the world's latest research

checkGet notified about relevant papers

checkSave papers to use in your research

checkJoin the discussion with peers

checkTrack your impact

Khaled Shaalan

Arabic Spelling Error Detection and Correction, 2015

A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.

Log In

Improved Spelling Error Detection and Correction for Arabic

Sign up for access to the world's latest research

Related papers

Related papers