10 pages
AI-generated Abstract
This research presents methodologies for Optical Character Recognition (OCR) of Sinhala script, encompassing digital, handwritten, and palm-leaf manuscripts. The initial case study focuses on developing a multi-font and multi-size OCR system that addresses the challenges posed by varying character representations. The second case study explores the recognition of handwritten Sinhala, highlighting the need for efficient data entry applications. Finally, the study discusses OCR for palm-leaf manuscripts, emphasizing the importance of preserving historical texts. The findings indicate that Artificial Neural Networks play a vital role in achieving reliable recognition results across different cases.
Sinhala is the language used by the Sinhalese, the major ethnic group in Sri Lanka, and most of the textual data gathered in Sri Lanka is in Sinhala. Converting data collected on handwritten paper into digital format enables automatic data retrieval and many other advantages, including editing, searching, and distribution over a network. The available solution is manual conversion of paper documents to electronic format, which requires an enormous amount of human labour. Creating an electronic document in Sinhala on a computer is especially difficult. If the conversion process can be automated using handwriting recognition, significant cost can be saved. Given the importance of the task, the research effort devoted to Sinhala handwriting recognition has not been adequate. This paper examines the current state of handwriting recognition for the Sinhala language and the problem areas that require attention.
2006
Abstract: Online handwritten character recognition on handheld devices is a challenging task because of limited memory, processing, and storage. This paper presents an online handwritten character recognition process for Sinhala alphabetic characters based on a Non-Deterministic Finite Automaton (NFA) and XML character pattern profiles. The process models a character as a sequence of directional states and organizes these sequences under a hierarchical XML tag structure.
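As a rough illustration of modeling a character as a sequence of directional states, the sketch below quantizes pen movements into eight directions and simulates a small NFA over them. It is an assumption-laden sketch, not the paper's method: the 8-direction quantization, the rule that each pattern element matches one or more repeats of a direction, and the profile names are all hypothetical, and the XML profile format is not reproduced.

```python
import math

def direction_codes(points):
    """Quantize successive pen movements into 8 directional states.

    Assumes y grows upward: 0=E, 2=N, 4=W, 6=S. With screen coordinates
    (y growing downward) the N/S codes would be swapped.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # ignore repeated samples with no movement
        angle = math.atan2(dy, dx)  # radians in (-pi, pi]
        codes.append(round(angle / (math.pi / 4)) % 8)
    return codes

def nfa_accepts(pattern, codes):
    """Simulate an NFA where pattern[i] matches one or more repeats of that direction.

    State i means "the first i pattern elements have been matched";
    the set `active` holds every state the NFA could currently be in.
    """
    n = len(pattern)
    active = {0}
    for c in codes:
        nxt = set()
        for i in active:
            if i > 0 and pattern[i - 1] == c:
                nxt.add(i)      # stay: another repeat of the current direction
            if i < n and pattern[i] == c:
                nxt.add(i + 1)  # advance to the next directional state
        active = nxt
        if not active:
            return False        # no surviving path: reject early
    return n in active

def matching_profiles(profiles, points):
    """Return the names of all (hypothetical) profiles whose NFA accepts the stroke."""
    codes = direction_codes(points)
    return [name for name, pattern in profiles.items() if nfa_accepts(pattern, codes)]

# Hypothetical profiles, not real Sinhala character definitions.
profiles = {"down_right": [6, 0], "hook": [0, 6, 4]}
stroke = [(0, 10), (0, 5), (0, 0), (5, 0), (10, 0)]
print(matching_profiles(profiles, stroke))  # ['down_right']
```

Because several states can be active at once, the same simulation handles patterns where adjacent elements share a direction without any backtracking.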
Preprocessing is an essential step in any image processing task. This paper describes the issues that must be addressed when developing a preprocessing engine for a Sinhala OCR system based on template matching. A detailed description of the implemented procedures is also given, along with their results.
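A minimal sketch of what such a preprocessing engine might feed into template matching, assuming dark ink on a light background, Otsu's global threshold for binarization, bounding-box cropping, and nearest-neighbour size normalization. The functions and the 2 × 2 template names are hypothetical illustrations; the paper's actual procedures may differ.

```python
def otsu_threshold(pixels):
    """Otsu's method on a flat list of 0..255 values; pixels <= t form the dark class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0, best_t, best_var = 0, 0, 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # maximize between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray):
    """Map a 2D grayscale image to 1 = ink, 0 = background (dark text assumed)."""
    t = otsu_threshold([p for row in gray for p in row])
    return [[1 if p <= t else 0 for p in row] for row in gray]

def crop_to_ink(binary):
    """Crop to the bounding box of ink pixels."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        raise ValueError("image contains no ink pixels")
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]

def resize_nearest(binary, h, w):
    """Nearest-neighbour resize so the glyph matches the template size."""
    H, W = len(binary), len(binary[0])
    return [[binary[r * H // h][c * W // w] for c in range(w)] for r in range(h)]

def match_score(a, b):
    """Fraction of pixels on which two equal-size binary images agree."""
    cells = [x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(cells) / len(cells)

def recognize(gray, templates):
    """Return the name of the best-matching (hypothetical) template."""
    glyph = crop_to_ink(binarize(gray))
    def score(name):
        t = templates[name]
        return match_score(resize_nearest(glyph, len(t), len(t[0])), t)
    return max(templates, key=score)

img = [[200, 200, 200, 200],
       [200, 30, 30, 200],
       [200, 30, 200, 200],
       [200, 200, 200, 200]]
templates = {"corner": [[1, 1], [1, 0]], "bar": [[1, 1], [0, 0]]}
print(recognize(img, templates))  # corner
```

A real engine would normally add noise removal and skew correction before cropping; they are omitted here to keep the sketch short.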
2013 IEEE Global Humanitarian Technology Conference: South Asia Satellite (GHTC-SAS), 2013
A rotationally invariant optical character recognition system for the Sinhala language is developed using the two-dimensional Fourier transform and artificial neural networks. Sinhala characters of different fonts and font sizes are recognized with over 85% accuracy. The histogram-based segmentation method used in this system achieves over 70% segmentation accuracy for complex Sinhala characters.
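The abstract does not say how the histogram is used; one common interpretation is projection-profile segmentation, sketched below under that assumption. Rows containing ink form text lines, and within each line, columns containing ink form candidate characters. The function names are hypothetical, and touching or overlapping glyphs would merge into a single segment under this simple rule.

```python
def runs(profile, min_count=1):
    """Return (start, end) pairs, end exclusive, of consecutive entries >= min_count."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v >= min_count and start is None:
            start = i
        elif v < min_count and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment(binary):
    """Split a binary page (1 = ink) into lines via the row histogram,
    then into character boxes (top, bottom, left, right) via the column histogram."""
    result = []
    row_hist = [sum(row) for row in binary]
    for top, bottom in runs(row_hist):
        line = binary[top:bottom]
        col_hist = [sum(col) for col in zip(*line)]
        result.append([(top, bottom, left, right) for left, right in runs(col_hist)])
    return result

page = [[1, 1, 0, 1, 0],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1]]
print(segment(page))
# [[(0, 2, 0, 2), (0, 2, 3, 4)], [(3, 4, 1, 3), (3, 4, 4, 5)]]
```

Raising `min_count` above 1 is one simple way to tolerate stray noise pixels in the gaps.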
2005
India is a multilingual country, and a large number of scripts are used to write its languages. A goal of vision researchers is to develop an integrated optical character recognition (OCR) system able to process all such scripts. Such a system, if realized, would not only enable a faster flow of information across the country but also have a profound effect on its scientific and economic development. Successful efforts have been made toward developing systems capable of recognizing machine-printed or handwritten characters and/or numerals. However, most Indian scripts do not have an integrated OCR system, and a unified system capable of processing all Indian scripts is still a dream. This article presents a survey of the current literature on the development of OCRs for Indian scripts. After reviewing the basis of and motivation for OCR systems, the article analyzes the various methodologies employed in general-purpose pattern recognition systems. A critical analysis of work on OCR systems for Indian languages, with pointers toward possible future work, is also presented.
At present, much importance is given to the "paperless office", so more and more communication and document storage is performed digitally. Documents and files in Hindi and Marathi that were once stored physically on paper are now being converted into electronic form to facilitate quicker additions, searches, and modifications, and to prolong the life of such records. Consequently, there is great demand for software that automatically extracts, analyzes, recognizes, and stores information from physical documents for later retrieval. Skew detection is used in text line position determination in digitized documents, automated page orientation, skew angle detection for binary document images, skew detection in handwritten scripts, compensation for Internet audio applications, and correction of scanned documents.
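One widely used skew detection technique, not necessarily the one used in this paper, is the projection-profile method: rotate the foreground pixels by candidate angles and keep the angle whose row histogram has the sharpest peaks. The sketch below scores sharpness as the sum of squared bin counts and searches a hypothetical range of ±10° in 0.5° steps; both choices are assumptions.

```python
import math
from collections import Counter

def projection_score(ink, angle_deg):
    """Sum of squared row-bin counts after undoing a skew of angle_deg.

    For a pixel on a line with row = y0 + x*tan(angle), the rotated row
    y*cos - x*sin is constant, so all its pixels fall into one bin.
    """
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    bins = Counter(round(y * c - x * s) for x, y in ink)
    return sum(n * n for n in bins.values())

def estimate_skew(ink, max_angle=10.0, step=0.5):
    """ink: (x, y) foreground pixel coordinates with y = row index.

    Returns the candidate angle in degrees with the sharpest row profile.
    """
    ink = list(ink)
    k = int(round(max_angle / step))
    candidates = [i * step for i in range(-k, k + 1)]
    return max(candidates, key=lambda a: projection_score(ink, a))

# Two synthetic text lines, 200 pixels long, skewed by 5 degrees.
tan5 = math.tan(math.radians(5))
ink = [(x, round(y0 + x * tan5)) for y0 in (20, 60) for x in range(200)]
print(estimate_skew(ink))  # expected to be close to 5.0
```

Deskewing then amounts to rotating the image by the negative of the estimated angle. On large pages, a coarse-to-fine search over angles would reduce the cost.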
2019
Published By: Blue Eyes Intelligence Engineering & Sciences Publication. Retrieval Number F8614088619/2019©BEIESP. DOI: 10.35940/ijeat.F8614.088619. Abstract: Optical character recognition (OCR) is a technique for recognizing characters in optically scanned and digitized pages. OCR plays an important role in Indian script research. Odia is the official language of the state of Odisha. OCR faces great difficulty in recognizing Odia because of similarly shaped characters, their complex nature, the complicated way in which they combine to form compound characters, the use of matras, etc. Each character and numeral is passed through several modules such as binarization, noise removal, segmentation, line segmentation, word segmentation, skeletonization, deskewing, thinning, and thickening. The input image is normalized to a 50 × 50 two-dimensional image. HMM is a stochastic process that has been used in various applications, for example speech recognition, handwriting recognition, gesture rec...
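The abstract is truncated before it explains how the HMM is applied, so the following is only a sketch of the standard scaled forward algorithm for discrete-observation HMMs, with one model per character and classification by highest likelihood. How a 50 × 50 image becomes an observation sequence (for example, one quantized symbol per column) is an assumption not taken from the paper, and the model parameters below are invented.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for discrete observations.

    start[i]    = P(first state is i)
    trans[i][j] = P(next state is j | current state is i)
    emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")           # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
        log_like += math.log(scale)         # log P(obs) = sum of log scale factors
    return log_like

def classify(obs, models):
    """models: {label: (start, trans, emit)}; pick the label with the highest likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))

# Invented single-state models over a 2-symbol alphabet.
models = {
    "A": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "B": ([1.0], [[1.0]], [[0.1, 0.9]]),
}
print(classify([0, 0, 1], models))  # A
```

Training the per-character models, for example with Baum-Welch re-estimation, is outside this sketch.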
International Journal on Advances in ICT for Emerging Regions (ICTer), 2021
While optical character recognition for Latin-based scripts has reached near-human performance, accuracy for the rounded scripts of South Asia still lags behind. Work on Sinhala OCR has mainly reported performance on constrained classes of font faces and has therefore been inconclusive. This paper provides a comprehensive series of experiments using conventional machine learning as well as deep learning on texts and font faces of diverse types and resolutions, in order to present a realistic estimate of the complexity of recognizing the rounded Sinhala script. While texts in both old and contemporary books can be recognized with over 87% accuracy, those in old newspapers are much harder to recognize owing to poor print quality and resolution.
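The abstract reports accuracy without defining it here; a common choice for OCR is character accuracy, 1 minus the Levenshtein edit distance divided by the reference length. The sketch below assumes that definition. It counts Unicode code points, so a Sinhala syllable written as a consonant plus a vowel sign counts as two units.

```python
def levenshtein(a, b):
    """Minimum number of single-element insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]

def char_accuracy(reference, hypothesis):
    """1 - (edit distance / reference length), clipped at 0.

    Operates on Unicode code points, not grapheme clusters.
    """
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - levenshtein(reference, hypothesis) / len(reference))

print(char_accuracy("abcd", "abxd"))  # 0.75
# Consonant KA (U+0D9A) + vowel sign AA (U+0DCF): one syllable, two code points.
print(char_accuracy("\u0d9a\u0dcf", "\u0d9a"))  # 0.5
```

Scoring on grapheme clusters instead would require a segmentation step such as the one defined in Unicode Standard Annex #29, which the standard library does not provide.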
Optical Character Recognition (OCR) is the electronic translation of images of handwritten, typewritten, or printed text into machine-encoded text. OCR is a very important task in pattern recognition. Character recognition for foreign languages, especially English, has been studied extensively by many researchers, but because of the complexity of Indian languages such as Hindi, Punjabi, Telugu, and Malayalam, research work on them is limited and constrained. This paper presents research work related to all Indian languages and discusses various approaches to character recognition, along with some of its applications. The aim of this paper is to provide an overview of ongoing research on OCR systems for Indian scripts. This survey was felt necessary because research on OCR for Indian scripts is still a challenging task. Hence, a brief introduction to general OCR and the typical steps in the development of an OCR system are give...
2013
Character recognition is an active field of research today. It draws on pattern recognition and image processing, and it has gained enormous attention because of its applications in various fields. Character recognition is broadly categorized into Optical Character Recognition (OCR) and Handwritten Character Recognition (HCR). OCR systems are most suitable for applications such as multiple-choice examinations and printed postal address resolution, while HCR has a wider range of applications than OCR: it is useful in cheque processing in banks, almost all kinds of form-processing systems, handwritten postal address resolution, and many more. Even though substantial studies have been performed on foreign scripts such as Chinese, Japanese, and Arabic, as well as Indian scripts such as Devanagari, some scripts still remain understudied. In this survey, we provide a detailed study of classifiers as well as various techniques proposed by different researchers for handwritten chara...