1996, IEEE Transactions on Pattern Analysis and Machine Intelligence
Character segmentation has long been a critical area of the OCR process. The higher recognition rates for isolated characters vs. those obtained for words and connected character strings well illustrate this fact. A good part of recent progress in reading unconstrained printed and written text may be ascribed to more insightful handling of segmentation.
Proceedings of the International Conference on Advances in Computer Science and Electronics Engineering, 2012
Character segmentation is a critical area of the Optical Character Recognition process. The higher recognition rates for isolated characters as compared to those obtained for words and connected character strings illustrate this fact. This paper provides a review of various techniques of character segmentation, which are classified mainly into four classes. In the classical approach, the input image is partitioned into sub-images, which are then classified. The operation of attempting to decompose the image into classifiable units is called "dissection". In the second class of methods, dissection is avoided and the image is segmented either explicitly, by classification of prespecified windows, or implicitly, by classification of subsets of spatial features collected from the image. The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these sub-images. Finally, holistic approaches avoid segmentation by recognizing entire character strings as units.
2019
In identifying the characters from a given image, character segmentation plays an important role. In a given line of text, first, we have to segment the words. Then, in each word there will be a character-by-character segmentation. There have been some rapid developments in this area. Many algorithms have been implemented to increase the accuracy range and decrease the word error rate. This paper aims to provide a review of some of the developments that have happened in this domain.
International Journal of Computer Vision and Image Processing, 2011
Automatic recognizers for machine-printed characters or text (OCR) are highly desirable for a multitude of modern IT applications, including Digital Library software. However, state-of-the-art OCR systems cannot handle Myanmar script, as the language poses many challenges for document understanding. Therefore, the authors design an Optical Character Recognition System for Myanmar Printed Document (OCRMPD), with several proposed techniques that can automatically recognize Myanmar printed text from document images. To obtain a more accurate system, the authors propose a method for isolating the character image that uses not only projection methods but also structural analysis for wrongly segmented characters. To demonstrate the effectiveness of the segmentation technique, the authors follow a new hybrid feature extraction method and choose the SVM classifier for recognition of the character image. The proposed algorithms have been tested on a variety of Myanmar printed...
Proceedings of The IEEE, 1992
This paper discusses character segmentation methods, a key technology for character recognition that determines the usability and applicability of optical character readers. A pattern-oriented segmentation method that leads to document structure analysis is presented. A first example of advanced character segmentation is touching handwritten numeral segmentation. Connected pattern components are extracted instead of a pixel image, and spatial interrelations between components are measured to group them into meaningful character patterns. Stroke shapes are analyzed in the case of touching characters. A method of finding the touching positions can separate about 95% of connected numerals correctly. Ambiguities are handled by multiple hypotheses and verification by recognition. An extended form of pattern-oriented segmentation is also discussed by presenting another example of tabular form recognition. Document images of tabular forms are analyzed, and frames in the tabular structure can be extracted. By identifying semantic relationships between label frames and data frames, information on the form can be properly recognized. Advanced character segmentation with a document structure analysis capability is becoming increasingly significant in automating information extraction from various kinds of documents.
Character segmentation is the process of separating characters from words. Character segmentation of handwritten text is a challenging task in OCR because of its features and the varied writing styles of different writers. Handwritten text is also prone to the problems of overlapped, touching, skewed, and broken characters, which make the segmentation process more complicated. The accuracy of character segmentation depends on the extent to which these problems are tackled and the characters are segmented. In this paper we provide a review of various techniques used for character segmentation and also discuss existing problems of segmentation. Correct segmentation is necessary for correct recognition of characters.
Procedia Computer Science, 2013
Character extraction is the most critical pre-processing step for any off-line text recognition system because characters are the smallest unit of any language script. The paper proposes an approach to segment character images from text containing images and computer-printed or handwritten words. This segmentation approach is based on a set of properties for each connected component (object) in the whole binary image of the machine-printed or handwritten text containing some other images. The words, which are printed along with some images, are of different lengths and are printed in different cursive fonts of different sizes. This character extraction technique is applied to the segmentation of untouched characters from machine-printed or handwritten words of varying length written on a noisy background containing some images. Very promising results are achieved, which reveal the robustness of the proposed character detection and extraction technique.
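As a rough illustration of what "properties for each connected component" can mean in practice, the following minimal sketch labels 8-connected foreground blobs in a binary image and keeps those whose bounding box passes a size test. It is not the paper's method: the function names, the choice of 8-connectivity, and the size filter are all assumptions made for this example, and the image is assumed to be a list of rows with 1 for ink and 0 for background.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected blobs.

    `image` is assumed to be a list of equal-length rows of 0/1 values.
    Boxes are returned in raster-scan order of each blob's first pixel.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def filter_by_size(boxes, min_height, min_width):
    """Keep boxes at least min_height x min_width (a hypothetical noise filter)."""
    return [b for b in boxes
            if b[2] - b[0] + 1 >= min_height and b[3] - b[1] + 1 >= min_width]
```

A real system would likely compute further per-component properties (area, aspect ratio, pixel density) before deciding which components are characters; the size filter here only stands in for that step.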
Procedia Computer Science, 2013
Character segmentation is the most crucial step for any OCR (Optical Character Recognition) system. The choice of segmentation algorithm is the key factor in deciding the accuracy of an OCR system. If the characters are segmented well, the recognition accuracy will also be high. Segmentation of words into characters becomes very difficult due to the cursive and unconstrained nature of handwritten script. This paper proposes a new vertical segmentation algorithm in which the segmentation points are located after thinning the word image to a stroke width of a single pixel. Knowledge of the shape and geometry of English characters is used in the segmentation process to detect ligatures. The proposed segmentation approach is tested on a local benchmark database and achieves high segmentation accuracy.
This paper examines various approaches for handling the problem of overlapped or fused / merged characters while performing character segmentation of scanned document images. A three pass method is proposed that involves a naïve profile based segmentation followed by methods to resolve overlapped characters and fused / merged characters. It is found that erosion of white regions gives excellent results for separating fused / merged characters. Experiments performed on a large number of pages clearly demonstrate the efficacy of the proposed method.
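The "naïve profile based segmentation" used as the first pass is commonly understood as a vertical projection profile: count the ink pixels in each column and cut wherever the count drops to zero. The sketch below illustrates only that general idea, under the assumption of a binary image stored as a list of rows (1 for ink, 0 for background); the function names are hypothetical and the later passes for overlapped and fused characters are not covered.

```python
def column_profile(image):
    """Number of foreground (1) pixels in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def split_on_gaps(profile):
    """Return inclusive (start, end) column ranges where the profile is non-zero.

    Each range is a candidate character; columns with no ink act as cuts.
    """
    segments = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                        # entering an inked region
        elif count == 0 and start is not None:
            segments.append((start, x - 1))  # leaving an inked region
            start = None
    if start is not None:                    # region runs to the right edge
        segments.append((start, len(profile) - 1))
    return segments
```

Touching or overlapping characters leave no empty column between them, so a profile of this kind returns them as a single segment, which is why additional passes are needed.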
2015
Handwritten character recognition has been an area of research for many years. Automation of existing manual systems is a need of most industries as well as government areas. Recognition of handwritten characters is in demand in many fields. In this paper we discuss our approach to handwritten character segmentation. This paper discusses various methodologies to segment a text-based image at various levels of segmentation. It serves as a guide for people working on the text-based image segmentation area of Computer Vision. First, the need for segmentation is justified in the context of text-based information retrieval. Then, the various factors affecting the segmentation process are discussed. Next, the levels of text segmentation are explored. The available techniques are also reviewed with their advantages and weaknesses, and directions for quick referral are suggested. Finally, we present our approach to text segmentation in brief.
Proceedings of the IEEE, 1992
It is time for a major change of approach to character recognition research. The traditional approach, focusing on the correct classification of isolated characters, has been exhausted. The demonstration of the superiority of a new classification method under operational conditions requires large experimental facilities and data bases beyond the resources of most researchers. In any case, even perfect classification of individual characters is insufficient for the conversion of complex archival documents to a useful computer-readable form. Many practical OCR tasks require integrated treatment of entire documents and well-organized typographic and domain-specific knowledge. New OCR systems should take advantage of the typographic uniformity of paragraphs or other layout components. They should also exploit the unavoidable interaction with human operators to improve themselves without explicit "training."
Ymer, 2022
Text segmentation, whether printed, handwritten or cursive, is one of the most complicated phases in any OCR system. The accuracy of recognition relies heavily on good segmentation. Image segmentation is a crucial component of image analysis and the field of computer vision. Researchers have developed several techniques for segmentation, each of which is used for different types of segmented objects. At present, no universal method is available for image segmentation. Existing image segmentation techniques cannot deal with every type of image. This survey looks at a variety of image segmentation techniques, evaluates them, and discusses the issues that arise from using them.
Over the last five years, optical character recognition approaches have undergone an enormous number of changes. Many efforts have been made, and a wide range of algorithms have been used, to improve the performance of existing OCR methods in many languages. This paper presents an overview of feature extraction methods for character recognition in different texts. The feature extraction stage is an important component of any recognition system. It is also very much dependent on the task, input, and recognition algorithm used. The feature extraction methods are discussed in terms of invariance properties and expected distortions and variability of the characters.
Procedia Technology, 2014
In computer vision, segmentation is the process of partitioning a digital image into multiple segments (sets of pixels). Image segmentation is thus inevitable. Segmentation of text-based images aims at retrieving specific information from the entire image. This information can be a line, a word, or even a character. This paper proposes various methodologies to segment a text-based image at various levels of segmentation. This material serves as a guide and update for readers working on the text-based segmentation area of Computer Vision. First, the need for segmentation is justified in the context of text-based information retrieval. Then, the various factors affecting the segmentation process are discussed. Next, the levels of text segmentation are explored. Finally, the available techniques are reviewed with their strengths and weaknesses, and directions for quick referral are suggested. Special attention is given to handwriting recognition, since this area requires more advanced techniques for efficient information extraction and to reach the ultimate goal of machine simulation of human reading.
IJSRD, 2013
Character recognition has recently gained a lot of attention in the field of pattern recognition due to its application in various fields. It is one of the most successful applications of automatic pattern recognition. Research in OCR is popular for its application potential in banks, post offices, office automation, etc. Handwritten character recognition (HCR) is useful in cheque processing in banks, almost all kinds of form processing systems, handwritten postal address resolution, and many more. This paper presents a simple and efficient approach for the implementation of OCR and the translation of scanned images of printed text into machine-encoded text. It makes use of different image analysis phases followed by image detection via pre-processing and post-processing. This paper also describes scanning the entire document (the same as segmentation in our case) and recognizing individual characters from the image irrespective of their position, size, and font style. It deals with recognition of the symbols of the English language, which is internationally accepted.
2017
Optical Character Recognition (OCR) systems are well developed and provide high accuracy for extracting text from printed or handwritten documents and images in English. However, they do not yet achieve comparable accuracy for Indian languages due to some language constraints. The Hindi language uses the Devanagari script. In this paper we present a brief survey of various segmentation techniques applied by different researchers to the Devanagari and Gujarati scripts separately.
2010
A major difficulty in designing a document image segmentation methodology is selecting proper values for all the parameters involved. This is usually done through experimentation or through a supervised training phase, which is a tedious process since the corresponding segmentation ground truth has to be created. In this paper, we propose a novel automatic unsupervised parameter selection methodology that can be applied to the character segmentation problem. It is based on clustering the entities obtained as a result of the segmentation for different values of the parameters involved in the segmentation method. The clustering is performed using features extracted from the segmented entities based on zones and from the area that is formed from the projections of the upper/lower and left/right profiles. Optimization of an appropriate intra-class distance measure yields the optimal parameter vector. The method is evaluated on two segmentation algorithms, namely a recently proposed character segmentation technique based on skeleton segmentation paths and the well-known RLSA technique. The proposed parameter selection method is capable of finding the segmentation parameters that correspond to the optimal or near-optimal segmentation result, as determined by counting the number of matches between the entities detected by the segmentation algorithm and the entities in the ground truth.
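To make concrete the kind of parameter such a method would have to select, here is a minimal sketch of the horizontal pass of run-length smoothing (RLSA): background runs no longer than a threshold are filled when they lie between two foreground pixels on the same row. The function names and the `threshold` parameter name are assumptions for this example, and it deliberately shows one simplified variant only. It leaves runs touching the row border untouched, and it omits the vertical pass and the combination step of the full technique, as well as the clustering-based parameter selection the abstract describes.

```python
def rlsa_row(row, threshold):
    """Fill background runs of length <= threshold lying between two 1s."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for x, pixel in enumerate(row):
        if pixel == 1:
            if last_fg is not None:
                gap = x - last_fg - 1
                if 0 < gap <= threshold:
                    for k in range(last_fg + 1, x):
                        out[k] = 1  # merge the two inked runs
            last_fg = x
    return out


def rlsa_horizontal(image, threshold):
    """Apply the horizontal smoothing pass to every row of a binary image."""
    return [rlsa_row(row, threshold) for row in image]
```

With a small threshold, neighbouring strokes of one character merge while the gaps between characters survive; with a larger one, whole words merge. That sensitivity is why the threshold is usually tuned per document collection.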
7th International Conference on Networking, Systems and Security, 2020
In a world of digitization, optical character recognition holds the key to automating written history. An optical character recognition system basically converts printed images into editable text for better storage and usability. To be completely functional, the system needs to go through some crucial steps such as pre-processing and segmentation. Pre-processing helps make printed data noise-free and removes skew efficiently, whereas segmentation helps fragment the image precisely into lines, words, and characters for better conversion. These steps open the door to better accuracy and consistent results when a printed image is prepared for conversion. Our proposed algorithm is able to segment characters from both ideal and non-ideal cases of scanned or captured images, giving a sustainable outcome. The implementation of our work is provided here: https://cutt.ly/rgdfBIa.