2011, International Journal of Computer Vision and Image Processing
…
4 pages
Automatic recognizers of machine-printed optical characters or text (OCR) are highly desirable for a multitude of modern IT applications, including digital library software. However, state-of-the-art OCR systems cannot handle Myanmar script, as the language poses many challenges for document understanding. Therefore, the authors design an Optical Character Recognition System for Myanmar Printed Document (OCRMPD), with several proposed techniques that can automatically recognize Myanmar printed text from document images. To obtain a more accurate system, the authors propose a method for isolating character images that uses not only projection methods but also structural analysis of wrongly segmented characters. To reveal the effectiveness of the segmentation technique, the authors follow a new hybrid feature extraction method and choose the SVM classifier for recognition of the character image. The proposed algorithms have been tested on a variety of Myanmar printed...
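The projection step of character isolation can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows with 1 for ink and 0 for background; the function names and the `min_gap` parameter are hypothetical, and the structural analysis the authors apply to wrongly segmented characters is not shown.

```python
def vertical_projection(image):
    """Count ink pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def segment_by_projection(profile, min_gap=1):
    """Return (start, end) column ranges whose projection is non-zero.

    A run of at least `min_gap` empty columns closes the current segment;
    shorter gaps are treated as part of the same character. `end` is
    exclusive, so image columns start..end-1 belong to the segment.
    """
    segments = []
    start = None
    end = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
            end = i
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, end + 1))
                start = None
                gap = 0
    if start is not None:
        segments.append((start, end + 1))
    return segments


glyphs = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0],
]
profile = vertical_projection(glyphs)  # [3, 2, 0, 0, 3, 1]
print(segment_by_projection(profile))  # [(0, 2), (4, 6)]
```

Touching characters leave no empty column between them, so a pure projection cut merges them into one segment; this is the failure case that motivates the additional structural analysis described above.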
2019
Myanmar optical character recognition remains an open research area. Much work has been done on Myanmar handwritten character recognition, but work on recognizing old printed Myanmar characters is limited. In character recognition, feature extraction is a very important task for achieving high recognition accuracy. This paper describes a relevant feature extraction method for recognizing old printed Myanmar characters and compares recognition performance with and without the feature extraction method.
International Journal of Recent …, 2009
This paper describes an Optical Character Recognition (OCR) System for printed text documents in Malayalam, a South Indian language. Indian scripts are rich in patterns, and the combinations of such patterns make the problem even more complex; these complex patterns are exploited to arrive at the solution. The system segments the scanned document image into text lines, words, and then characters and sub-characters. The proposed segmentation algorithm is motivated by the structure of the script. A novel set of features that are computationally simple to extract is proposed. The approaches used here are based on the distinctive structural features of machine-printed text lines in these scripts. A lateral cross-sectional analysis is performed along each row of the normalized binary image matrix, resulting in distinct features. The final recognition is achieved through classifiers based on the Support Vector Machine (SVM) method. The proposed algorithms have been tested on a variety of printed Malayalam characters and currently achieve recognition rates between 90.22% and 95.31%.
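The abstract names a row-wise (lateral cross-sectional) analysis of a normalized binary matrix but does not define the resulting features. A minimal sketch, assuming that normalization means nearest-neighbour resizing to a fixed grid and that each row contributes its count of background-to-ink transitions; both choices and the function names are illustrative assumptions, not the paper's definition.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a binary image to out_h x out_w.

    Assumed normalization step: every character is mapped onto the same
    grid so that feature vectors have a fixed length.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def row_transition_features(image):
    """For each row, count 0 -> 1 transitions (ink strokes crossed by the row).

    Assumed feature: one integer per row, so an N-row normalized image
    yields an N-dimensional vector.
    """
    features = []
    for row in image:
        count = 0
        previous = 0  # treat the left border as background
        for pixel in row:
            if pixel == 1 and previous == 0:
                count += 1
            previous = pixel
        features.append(count)
    return features


glyph = [[0, 1, 1, 0, 1], [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
print(row_transition_features(glyph))  # [2, 1, 0]
```

Vectors like this could then be fed to an SVM, as the abstract describes; the classifier itself is not sketched here.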
22nd International Conference on Computer and Information Technology (ICCIT) (Publisher: IEEE), 2019
Optical Character Recognition (OCR) is a major computer vision task in which the characters in an image are detected and recognized by comparison with training-set images. Detecting characters is one of the most perplexing tasks in computer vision, because the input image is often not correctly aligned or is noisy. This paper presents a complete OCR system that works for English characters, mainly in the Calibri font. The system first corrects the skew of the input image if it is not correctly aligned, and then reduces noise in the image. The cleaned image passes through line and character segmentation, and the segmented characters are passed to the recognition module, which recognizes them. In experiments on a set of 50 images, the average accuracy is 92%, and 98% for the Calibri font. Moreover, the developed technique is computationally efficient and requires less time than other optical character recognition systems.
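The abstract does not say which noise-reduction filter is used. A median filter is a common choice for removing isolated specks from scanned text, so the sketch below assumes one; `median_filter` and its `radius` parameter are hypothetical names, and borders are handled by repeating the nearest edge pixel.

```python
def median_filter(image, radius=1):
    """Apply a (2*radius+1) x (2*radius+1) median filter to a 2-D image.

    Out-of-range neighbours are replaced by the nearest edge pixel. For a
    binary image with radius=1 the median of 9 values is a majority vote,
    so single-pixel specks disappear while solid strokes survive.
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        new_row = []
        for c in range(w):
            window = sorted(
                image[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in range(-radius, radius + 1)
                for dc in range(-radius, radius + 1)
            )
            new_row.append(window[len(window) // 2])
        out.append(new_row)
    return out


speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 1  # one isolated noise pixel
print(median_filter(speck) == [[0] * 5 for _ in range(5)])  # True
```

Median filtering also rounds off the corners of thin strokes, which is one reason OCR pipelines usually keep the window small.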
2017
Optical Character Recognition (OCR) has been a topic of interest for many years. It is defined as the process of digitizing a document image into its constituent characters. Despite decades of intense research, developing OCR with capabilities comparable to those of humans remains an open challenge. Because of this challenging nature, researchers from industry and academia have directed their attention towards Optical Character Recognition. Over the last few years, the number of academic laboratories and companies involved in research on character recognition has increased dramatically. This research summarizes the work done so far in the field of OCR. It provides an overview of different aspects of OCR and discusses corresponding proposals aimed at resolving issues of OCR.
This paper presents a literature review on English OCR techniques. An English OCR system is essential for converting the numerous published English books into editable computer text files. Recent research in this area has developed new methodologies to overcome the complexity of English writing styles. However, these algorithms have not been tested on the complete set of English alphabet characters. Hence, a system is required that can handle all classes of English text and identify characters among these classes.
Today there is a growing demand for software systems that recognize characters when information is scanned from paper documents, since many newspapers and books on different subjects exist in printed form. There is a huge demand for "storing the information available in these paper documents on a computer storage disk and later reusing this information through a search process". One simple way to store the information in these paper documents in a computer system is to first scan the documents and then store them as IMAGES. But to reuse this information, it is very difficult to read the individual contents and search these documents line-by-line and word-by-word. The reason for this difficulty is that the font characteristics of the characters in paper documents differ from the fonts of the characters in the computer system. As a result, the computer is unable to recognize the characters while reading them. This concept of storing the contents of paper documents in computer storage and then reading and searching the content is called DOCUMENT PROCESSING. Sometimes in document processing we need to process information in languages other than English. For this document processing we need a software system called a CHARACTER RECOGNITION SYSTEM. This process is also called DOCUMENT IMAGE ANALYSIS (DIA).
International Journal of Scientific Research in Science and Technology, 2019
Optical Character Recognition (OCR) is a technology widely adopted for automatically converting hardcopy text into editable text. The language dependence of the technology leaves it far less developed for less widely used languages such as Myanmar. In addition, the uniqueness and complexity of the Myanmar writing system, such as touching and complex characters, continue to pose serious challenges to OCR researchers. In this paper, we propose a new technique for developing a Myanmar OCR system. Our technique implements skew angle detection and skew removal, noisy border correction, extra page elimination, and line segmentation for scanned images of Myanmar text. The performance of the proposed method is tested on 430 documents comprising printed and handwritten Myanmar text in various fonts and sizes, with multiple columns, tables, stamps or photos, and background effects. Our method gives an accuracy of 100% for line segmentation and 99.92% for skew angle detection and skew removal. The ability of ou...
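The abstract does not describe how the skew angle is detected. One common approach, swapped in here as an assumption, is a projection-profile search: try candidate angles, undo each one, and keep the angle whose row histogram is sharpest (text lines collapse into few rows). The function names and the `max_angle`/`step` parameters are hypothetical; the deskewing rotation itself is not shown.

```python
import math
from collections import Counter


def projection_score(pixels, angle_deg):
    """Sum of squared row-histogram counts after removing a candidate skew.

    `pixels` holds (row, col) ink coordinates. A text line tilted by
    `angle_deg` satisfies row = row0 + col * tan(angle), so we histogram
    row0; the correct angle packs each line into as few bins as possible.
    """
    slope = math.tan(math.radians(angle_deg))
    bins = Counter(round(r - c * slope) for r, c in pixels)
    return sum(n * n for n in bins.values())


def estimate_skew(image, max_angle=5.0, step=0.5):
    """Return the candidate angle in degrees with the sharpest profile."""
    pixels = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    steps = int(round(max_angle / step))
    candidates = [k * step for k in range(-steps, steps + 1)]
    return max(candidates, key=lambda a: projection_score(pixels, a))


# Two synthetic one-pixel-thick "text lines" tilted by 2 degrees.
slope = math.tan(math.radians(2.0))
page = [[0] * 100 for _ in range(40)]
for col in range(100):
    for base in (10, 25):
        page[base + round(col * slope)][col] = 1
print(estimate_skew(page))  # 2.0
```

Squaring the bin counts rewards concentration: the same number of pixels scores higher when packed into one row than when spread over several, which is why the sharpest profile marks the skew angle.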
At present, there is growing demand for software systems that recognize characters in a computer when information is scanned from paper documents. This paper presents a detailed review of the field of Optical Character Recognition. Various techniques that have been proposed to realize the core of character recognition in an optical character recognition system are identified. The paper also covers selection and feature extraction based on Optical Character Recognition (OCR). By using OCR, image data can be converted into text data, which is easy to manipulate. In the proposed method, a particular number is selected, the selected region of the image is cropped, and its features are then extracted. The text from the OCR process is compared with the selected number from the loaded image. The overall accuracy of the proposed method is 92%.
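The abstract reports 92% accuracy from comparing the OCR text with the selected number but does not define the metric. A widely used option, assumed here, is character accuracy computed as one minus the character error rate, where the error rate is the Levenshtein edit distance divided by the reference length. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """1 - CER, clipped at 0, where CER = edit distance / reference length."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    cer = levenshtein(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)


print(levenshtein("kitten", "sitting"))  # 3
```

For example, recognizing the reference number "12345" as "12845" is one substitution, giving a character accuracy of 0.8.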
This paper presents a detailed review of the field of Optical Character Recognition. Various techniques that have been proposed to realize the core of character recognition in an optical character recognition system are identified. Numerous studies and papers describe techniques for converting textual content from a paper document into machine-readable form. Optical character recognition is a process in which the computer automatically interprets an image of handwritten script and converts it into classified characters. This material serves as a guide and update for readers working in the character recognition area. Selecting a relevant feature extraction method is probably the single most important factor in achieving high recognition accuracy in character recognition systems.
M S Thesis, 2001
Automatic recognition of characters by a machine is one of the challenging problems in Artificial Intelligence. The motivation for the design of such a machine comes from the human visual system (HVS). The HVS is endowed with astonishing versatility and constitutes the ultimate physical (albeit neural) realization of a pattern recognition system whose performance is not affected by geometric transformations of patterns, like characters of various styles and sizes. The prime goal of the design of such a machine is to replace the HVS in practical applications involving repetitive, monotonous tasks such as mass digitization of printed manuscripts, processing of letters and mail in postal services, job applications, and banking papers. Most research endeavors and commercial software packages focus on the Roman script. In the case of Indian scripts, the problem of automatic recognition is still a topic of considerable interest. In this thesis, an attempt to develop an integrated Optical Character Recognition (OCR) system for printed Odiya script is presented. The task of automatic recognition of documents has the following major subtasks: Digitization, Preprocessing, Segmentation, Feature Extraction, and Classification. In this thesis, a novel binarization technique based on windows of variable width is developed and implemented. The width of the window is selected based upon the local statistics of the image. Skew in the document is detected with a two-level precise skew detection algorithm employing the Hough transform and statistical properties of the image. Individual lines are segmented from the text using horizontal projection vectors, while words are separated from lines using vertical projection vectors. The segmented words are then subjected to connected component analysis to obtain the basic characters and associated matras.
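The thesis binarizes with windows whose width is chosen from local image statistics; that selection rule is not given here. As a simplified stand-in, the sketch below uses Niblack's classic local threshold T = mean + k * std over a fixed square window, which shows the underlying idea of thresholding each pixel against its neighbourhood rather than against one global value. The `window` and `k` values are common defaults, not the thesis's parameters, and the function name is hypothetical.

```python
import math


def niblack_binarize(gray, window=15, k=-0.2):
    """Binarize a grayscale image (0 = black .. 255 = white) with Niblack's rule.

    For each pixel, T = mean + k * std over a window centred on it (clipped
    at the image border). Pixels darker than T become ink (1), others 0.
    Fixed window size: a simplification of the variable-width scheme.
    """
    h, w = len(gray), len(gray[0])
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            values = [
                gray[rr][cc]
                for rr in range(max(0, r - half), min(h, r + half + 1))
                for cc in range(max(0, c - half), min(w, c + half + 1))
            ]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            threshold = mean + k * math.sqrt(var)
            out[r][c] = 1 if gray[r][c] < threshold else 0
    return out


stroke = [[255, 255, 0, 255, 255] for _ in range(5)]  # one dark vertical stroke
print(niblack_binarize(stroke, window=3)[0])  # [0, 0, 1, 0, 0]
```

Plain Niblack tends to mark noise in large blank regions where the local deviation is tiny, which is one motivation for adapting the window to local statistics as the thesis does.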
Identifying and extracting the right features with minimal error is one of the most important tasks in automatic recognition of documents. The ability of various types of features to discriminate Odiya characters is analyzed, and the features that exhibit better discriminating capabilities are chosen for use in the recognition phase. Of the tested features, the projection profiles of the characters were found to yield better discrimination. Apart from these features, some heuristic-based features are also employed in the final classification phase. An important requirement of pattern classifiers is their robustness to noise in the input patterns. In an attempt to design a robust classifier, various classification techniques reported in the literature are tried. These include the nearest neighbor, k-NN, and modified k-NN classifiers. Apart from these classical pattern classification techniques, modern techniques involving Support Vector Machines (SVMs) are also employed.
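Projection-profile features and a k-NN classifier can be combined in a short sketch. It assumes binary character images as lists of rows, uses the concatenated row and column sums as the feature vector, and votes among the k nearest training samples by squared Euclidean distance. The heuristic features and the modified k-NN variant mentioned above are not reproduced; all names are hypothetical.

```python
from collections import Counter


def projection_features(image):
    """Concatenate row sums (horizontal profile) and column sums (vertical profile)."""
    rows = [sum(row) for row in image]
    cols = [sum(col) for col in zip(*image)]
    return rows + cols


def knn_classify(sample, training, k=3):
    """Label `sample` by majority vote among its k nearest training vectors.

    `training` is a non-empty list of (feature_vector, label) pairs and the
    distance is squared Euclidean. A tied vote goes to the tied label whose
    member appears first in the distance-sorted neighbour list (the closest).
    """
    neighbours = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(sample, item[0])),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    for _, label in neighbours:
        if votes[label] == best:
            return label


bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
bar_foot = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]
dash = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
training = [
    (projection_features(bar), "I"),
    (projection_features(bar_foot), "I"),
    (projection_features(dash), "-"),
]
noisy_bar = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
print(knn_classify(projection_features(noisy_bar), training, k=3))  # I
```

With k=1 this reduces to the nearest neighbor classifier listed above; larger k trades some sensitivity for robustness to a single noisy training sample.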
International Journal of Computer Applications, 2017
International Journal of Computer Science and Information Security (IJCSIS), Vol. 19 No. 2, February 2021
Lecture Notes in Computer Science, 2015
International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 2024
Proceedings of the 2nd International Conference on ICT for Digital, Smart, and Sustainable Development, ICIDSSD 2020, 27-28 February 2020, Jamia Hamdard, New Delhi, India, 2021
7th International Conference on Networking, Systems and Security, 2020
International Journal of Advance Research In Science And Engineering (IJARSE), India, ISSN 2319-8346 (P), ISSN-2319-8354(E), Vol.3, Issue 7, Pages 261- 274, 2014
Jurnal Elektronika dan Telekomunikasi
Journal of Zhejiang University SCIENCE, 2005
International Journal of Advanced Research in Computer Science and Software Engineering