Arabic Document Classification by Deep Learning

Lobna Hsairi

2021, International Journal of Advanced Computer Science and Applications

8 pages

In this paper, we show how to classify Arabic document images using a convolutional neural network, one of the most common supervised deep learning algorithms. The main motivation for using deep learning is its ability to automatically extract useful features from images, which eliminates the need for manual feature extraction. Convolutional neural networks extract features from images through a convolution process involving various filters. We collected a variety of Arabic document images from various sources and passed them to a convolutional neural network classifier. We adopt a VGG16 network pre-trained on ImageNet to classify the dataset into four classes: handwritten, historical, printed, and signboard. For the document image classification, we used the VGG16 convolutional layers, ran the dataset through them, and then trained a classifier on top of them. We extract features by freezing the pre-trained network's convolutional layers, then adding fully connected layers and training them on the dataset. We further modify the network by adding dropout after each max-pooling layer and to the fourteenth and seventeenth layers, which are fully connected layers. The proposed approach achieved a classification accuracy of 92%.
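To make the "convolution process involving various filters" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function name `convolve2d`, the toy 4x4 image, and the hand-written `vertical_edge` filter are all illustrative assumptions. In the setting described above, the filters are the ones already learned by VGG16 on ImageNet rather than hand-written ones.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` without padding (stride 1).

    This is the cross-correlation that CNN layers conventionally
    call "convolution". Both arguments are lists of lists of numbers.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Toy "image" (assumption): dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written filter (assumption) that responds to dark-to-bright
# transitions from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The resulting feature map is nonzero only in the column where the dark and bright halves meet, which is the sense in which a filter "extracts" a feature. A trained network applies many such filters per layer and learns their weights from data.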

MAROUAN ELMANSOURI

2020

Handwritten Arabic, like other handwritten scripts (such as Latin and Chinese), has received increasing attention from researchers. To preserve and promote wider access to the invaluable cultural and literary heritage held in public and private manuscript collections, researchers have proposed and developed several approaches based on annotation, metadata, and transcription. The need to access manuscript text is growing on a large scale. Traditional indexing methods such as annotation or transcription will therefore become outdated, as they require considerable manual effort and are unreliable. It is thus necessary to develop new tools for identifying and recognizing handwritten text in images. However, despite the progress of Convolutional Neural Networks (CNNs) in various computer vision tasks, they have seen little use in the field of Arabic manuscripts, even though deep learning methods for predicting character classes, such as handwritten numbers, have achieved strong results. Hence the idea of using deep learning techniques to classify words and characters in images of Arabic manuscripts. In this paper, we propose two classification methods to predict the class of each word, using the HADARA80P dataset. The first uses a simple neural network and the second uses a convolutional neural network. The experimental results obtained by these two methods are very interesting.
