Skip to main content

Lakshmana Pandian S

Followers

70

Following

41

Public Views

Harpreet S I N G H Bedi

University of Aizu

federica bianco

Lauren Horstmyer '22

C ABDUL HAKEEM COLLEGE OF ENGG AND TECH

SIVA PHANIRAM JOSYULA

Interests

Uploads

Papers by Lakshmana Pandian S

End-To-End Deep Learning Based Tamil Handwritten Document Recognition and Classification Model

Handwritten recognition (HR) remains a challenging process in various real-world applications. Ta... more Handwritten recognition (HR) remains a challenging process in various real-world applications. Tamil handwritten text recognition involves the recognition of text in scanned images. Recognition of handwritten Tamil characters is a tedious process because of the differences in sizes, style and orientation angle. Prior studies concentrated on character-level segmentation and each character was subsequently classified. Segmentation is then used, first at the word level and subsequently at the line level. The recently developed machine learning (ML) and deep learning (DL) approaches can be utilized for Tamil HCR. With this motivation, this paper presents an end-to-end deep learning-enabled Tamil handwritten document recognition (ETEDL-THDR) model. The ETEDL-THDR paragraph text recognition can be accomplished by the use of two modules such as line segmentation and line recognition. Initially, the ETEDL-THDR model enables the improvement of the quality of the input images by the use of th...

Semantic Role Labeling for Tamil Documents

The aim of this work is to design and implement a system to identify, analyze and tag the constit... more The aim of this work is to design and implement a system to identify, analyze and tag the constituents in the sentence which fill a semantic role expressed by some target verbs of a sentence in Tamil. The system reads a Tamil text document and performs tagging of semantic roles associated with a given target verb such as Agent, Patient, Instrument, etc. and also adjuncts such as Locative, Temporal, Manner, Cause, etc. within such a document using a hybrid approach by considering syntactic, semantic, and statistical evidence in the sentences. It consists of two main phases-a Learning Phase and an Evaluation Phase. The Learning phase consists of two main components namely a Maximum Entropy Model (MEM) and a Learning Component. The Evaluation Phase consists of four main components namely MEM Evaluator, Verb Frame Invoker, Rule Based Probability Assigner and Expectation Maximizer Component. A number of different performance measures are charted and the performance of the system is judge...

UNL Deconverter for Malayalam

A State of Art Approaches on Handwriting Recognition Models

2019 Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), 2019

Presently, handwriting recognition (HWR) become an open research issue to recognize the handwritt... more Presently, handwriting recognition (HWR) become an open research issue to recognize the handwritten words out of a scanned document which is unconstrained. In spite of the extensive implication of computer in office, handwriting still stays as a significant mode of annotating and apprehending textual data. In recent days, there is an exponential growth in the number of researches carried out in this field because of its benefits. On the past decade, great efforts are paid to the online hand-drawn content interpretation from handwritten text. Keeping this in mind, in this paper, we made a review of existing HWR model in different aspects have been reviewed.

Malayalam POS Tagger - A Comparison Using SVM and HMM

Many Parts Of Speech (POS) taggers for the Malayalam language has been implemented using Support ... more Many Parts Of Speech (POS) taggers for the Malayalam language has been implemented using Support Vector Machine (SVM), Memory-Based Language Processing (MBLP), Hidden Markov Model (HMM) and other similar techniques. The objective was to find an improved POS tagger for the Malayalam language. This work proposed a comparison of the Malayalam POS tagger using the SVM and Hidden Markov model (HMM). The tagset used was the popular Bureau of Indian Standard (BIS) tag set. A manually created data set which has around 52,000 words has been taken from various Malayalam news sites. The preprocessing steps that have done for news text are also mentioned. Then POS tagging has been done using SVM and HMM. As POS tagging requires the extraction of multiple class labels, a multi-class SVM is used. It also performs feature extraction, feature selection, and classification. The word sense disambiguation and misclassification of words are the two major issues identified in SVM. Hidden Markov Model pr...

Modified Convolutional Neural Network of Tamil Character Recognition

Lecture Notes in Networks and Systems, 2020

The handwritten Tamil character recognition in offline mode is challenging tasks as there are vir... more The handwritten Tamil character recognition in offline mode is challenging tasks as there are virtually different people who have different styles of writing the same characters. Deep convolution neural networks are playing a virtual role nowadays in recognizing handwritten character by automatically learning discriminative features from high dimensionality of input data. This work presents a modified convolution neural network \(\left( \text {M-CNN} \right) \) architecture to achieve a faster convergence rate and also to get the highest recognition accuracy. The M-CNN on different aspects along with layers design, activation function, loss function and optimization is discussed. Systematic experiments on isolated handwritten Tamil character dataset collected from various schools by ourselves. For these collected datasets, the proposed system recognized the characters with 97.07%.

Causal Discovery using Dimensionality Reduction Partial Association Tree

International Research Journal on Advanced Science Hub, 2021

Decision tree is a model to classify data based on labelled attribute values. This model is a sup... more Decision tree is a model to classify data based on labelled attribute values. This model is a supervised learning approach through which one can classify a new entry into an appropriate class. If we want to know the cause behind this classification then decision tree cannot provide the same. When we infer causes behind the classification then they will provide a rich knowledge for better decision making. Causal Bayesian Networks, Structural Equation Models, Potential Outcome Models are the some of the models that are used to get causal relationships. These models need experimental data. But it is not possible/ it is very expensive to conduct full experiments. So a model is needed to identify causes from effects from observational data rather than experimental data. In this paper a novel approach is proposed for causal inference rule mining which can infer the causes from observational data in a faster way and also scalable. Statistical tools and techniques named partial association test, correlation are used to develop the model. A new way of constructing a tree called Dimensionality Reduction Partial Association Tree (DRPAT) is introduced. Sometimes the existing causality cannot be extracted where low associated dimensions are involved in data and hiding the underlying causality and this model extracts causal association in case of hidden causality in data.. The model is applied on "Cardiovascular Disease dataset" sourced from Kaggle Progression System. The result is a Partial Association Tree. From this tree one can get a set of causal rules which can form a basis for better data analytics and then the better decision making.

A Survey on Detection and Prevention of Cross-Site Scripting Attack

International Journal of Security and Its Applications, 2015

In present-day time, securing the web application against hacking is a big challenge. One of the ... more In present-day time, securing the web application against hacking is a big challenge. One of the common types of hacking technique to attack the web application is Cross-Site Scripting (XSS). Cross-Site Scripting (XSS) vulnerabilities are being exploited by the attackers to steal web browser's resources such as cookies, credentials etc. by injecting the malicious JavaScript code on the victim's web applications. Since Web browsers support the execution of commands embedded in Web pages to enable dynamic Web pages attackers can make use of this feature to enforce the execution of malicious code in a user's Web browser. The analysis of detection and prevention of Cross-Site Scripting (XSS) help to avoid this type of attack. We describe a technique to detect and prevent this ki nd of manipulation and hence eliminate Cross-Site Scripting attack.

Cross Site Request Forgery: Preventive Measures

International Journal of Computer Applications

Monitoring of Indian Agriculture using LPC2148

International Journal of Computer Applications, 2015

In this paper, we describe about the monitoring of Indian agriculture using LPC2148.Monitoring of... more In this paper, we describe about the monitoring of Indian agriculture using LPC2148.Monitoring of Indian agriculture is done using information of the temperature and humidity content. This is mainly used for saving water and monitoring agriculture without human presence. Temperature sensor and humidity sensor will continuously sense the information regarding the field. When the values are less than or greater than the threshold values it will do certain operations. We are using GSM technology also for sending information to the farmer. We have included two modes for monitoring manual and automatic mode. In manual mode farmer will be sending message and to monitor and control the water pump .In automatic mode farmer will not be involved to control the operations of motor .It will be operating the water pump motor automatically. So, that he can monitor from remote places also. As ARM7 processor is RISC architecture, it is so flexible to program and as advancements has increased in this field we have done code optimization for the program in this project.

Multilingual acquiring of e-content definition based on universal networking language

2014 IEEE International Conference on MOOC, Innovation and Technology in Education (MITE), 2014

E-content information that exists in the web are available only in a particular language. But peo... more E-content information that exists in the web are available only in a particular language. But people who know only the local language may not be able to use these resources, though they are available. The UNL infrastructure intends to overcome the language barrier over the Internet. It is used to enconvert the information from any natural language into the form of universal semantic networks and then deconvert into any other required languages. A framework towards conversion of e-content from web that exhibit in the English language to UNL is presented in this paper. This approach involves conversion of dependency relation extracted from the parser into UNL relation and identifies pedagogical clues to determine the level of e-content according to the educational taxonomy.

Annotation for Query Result Records Based on Domain-Specific Ontology

International Journal on Natural Language Computing, 2014

The World Wide Web is enriched with a large collection of data, scattered in deep web databases a... more The World Wide Web is enriched with a large collection of data, scattered in deep web databases and web pages in unstructured or semi structured formats. Recently evolving customer friendly web applications need special data extraction mechanisms to draw out the required data from these deep web, according to the end user query and populate to the output page dynamically at the fastest rate. In existing research areas web data extraction methods are based on the supervised learning (wrapper induction) methods. In the past few years researchers depicted on the automatic web data extraction methods based on similarity measures. Among automatic data extraction methods our existing Combining Tag and Value similarity method, lags to identify an attribute in the query result table. A novel approach for data extracting and label assignment called Annotation for Query Result Records based on domain specific ontology. First, an ontology domain is to be constructed using information from query interface and query result pages obtained from the web. Next, using this domain ontology, a meaning label is assigned automatically to each column of the extracted query result records.

Machine Translation from English to Tamil using Hybrid Technique

International Journal of …, 2012

The corpus based techniques in Machine Translation involves parallel corpora, but it is not appli... more The corpus based techniques in Machine Translation involves parallel corpora, but it is not applicable for the languages for which there are less or no parallel corpora available. In such case the Rule based machine Translation suits best. The main objective of our work is to build a translation system that translates English sentences to Tamil Sentences. Due to the less availability of parallel corpora for English to Tamil the system is implemented using a Hybrid Technique (the combination of both Rule Based Technique and Statistical Technique). The system is first implemented in a Rule Based approach which involves segmentation and tagging, Rule Based Reordering, Morphological Analyzing, and dictionary based translation to the Target language. Then the errors in the translated sentences are corrected by applying Statistical technique.

UNL based Document Summarization based on Level of Users

The primary goal is to develop an NLP system to perform automatic document summarization by conve... more The primary goal is to develop an NLP system to perform automatic document summarization by converting the English sentences into the expressions of an Interlingua called Universal Networking Language (UNL). UNL has been designed at the United Nations University (UNU)/Institute of Advanced Studies (IAS), Tokyo in 1990. UNL represents knowledge in the form of semantic network. The nodes represent concepts and links represent semantic relations between the concepts. Initially, the complex document is represented in UNL form by undergoing enconversion process and the document is deconverted to produce the summarized document for different levels of users thus reducing the complexity of the document and helps in understanding and decision making. The aim of this NLP system is to represent the exact meaning of the document represented in a usual language. Since UNL is a language independent meaning representation language, summarization is carried out by analyzing and filtering out the U...

Design and Experimentation of Causal Relationship Discovery among Features of Healthcare Datasets

Intelligent Automation & Soft Computing, 2021

Morpheme based Language Model for Part-of-Speech Tagging

Polibits, 2008

The paper describes a Tamil Part of Speech (POS) tagging using a corpus-based approach by formula... more The paper describes a Tamil Part of Speech (POS) tagging using a corpus-based approach by formulating a Language Model using morpheme components of words. Rule based tagging, Markov model taggers, Hidden Markov Model taggers and transformation-based learning tagger are some of the methods available for part of speech tagging. In this paper, we present a language model based on the information of the stem type, last morpheme, and previous to the last morpheme part of the word for categorizing its part of speech. For estimating the contribution factors of the model, we follow generalized iterative scaling technique. Presented model has the overall F-measure of 96%.

Tamil Question Classification Using Morpheme Features

Question classification plays an important role in question answering systems. This paper present... more Question classification plays an important role in question answering systems. This paper presents the Conditional Random field (CRF) model based on Morpheme features for Tamil question classification. It is a process that analyzes a question and labels it based on its question type and expected answer type (EAT). The selected features are the morpheme parts of the question terms and its dependent terms. The main contribution in this work is in the way of selection of features for constructing CRF Model. They discriminates the position of expected answer type information with respect to question term’s position. The CRF model to find out the phrase which contains the information about EAT is trained with tagged question corpus. The EAT is semantically derived by analyzing the phrase obtained from CRF engine using WordNet. The performance of this morpheme based CRF model is compared with the generic CRF engine.

CRF Models for Tamil Part of Speech Tagging and Chunking

Conditional random fields (CRFs) is a framework for building probabilistic models to segment and ... more Conditional random fields (CRFs) is a framework for building probabilistic models to segment and label sequence data. CRFs offer several advantages over hidden Markov models (HMMs) and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. CRFs also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. In this paper we propose the Language Models developed for Part Of Speech (POS) tagging and chunking using CRFs for Tamil. The Language models are designed based on morphological information. The CRF based POS tagger has an accuracy of about 89.18%, for Tamil and the chunking process performs at an accuracy of 84.25% for the same language.

Energy efficient Robust On-Demand Multicast Routing Protocol for MANETs

International Journal of Ad Hoc and Ubiquitous Computing, 2008

Int. J. Ad Hoc and Ubiquitous Computing, Vol. 3, No. 2, 2008 ... Energy efficient Robust On-Deman... more

Named Entity Recognition in Tamil using Context-cues and the EM algorithm

End-To-End Deep Learning Based Tamil Handwritten Document Recognition and Classification Model

Handwritten recognition (HR) remains a challenging process in various real-world applications. Ta... more Handwritten recognition (HR) remains a challenging process in various real-world applications. Tamil handwritten text recognition involves the recognition of text in scanned images. Recognition of handwritten Tamil characters is a tedious process because of the differences in sizes, style and orientation angle. Prior studies concentrated on character-level segmentation and each character was subsequently classified. Segmentation is then used, first at the word level and subsequently at the line level. The recently developed machine learning (ML) and deep learning (DL) approaches can be utilized for Tamil HCR. With this motivation, this paper presents an end-to-end deep learning-enabled Tamil handwritten document recognition (ETEDL-THDR) model. The ETEDL-THDR paragraph text recognition can be accomplished by the use of two modules such as line segmentation and line recognition. Initially, the ETEDL-THDR model enables the improvement of the quality of the input images by the use of th...

Semantic Role Labeling for Tamil Documents

The aim of this work is to design and implement a system to identify, analyze and tag the constit... more The aim of this work is to design and implement a system to identify, analyze and tag the constituents in the sentence which fill a semantic role expressed by some target verbs of a sentence in Tamil. The system reads a Tamil text document and performs tagging of semantic roles associated with a given target verb such as Agent, Patient, Instrument, etc. and also adjuncts such as Locative, Temporal, Manner, Cause, etc. within such a document using a hybrid approach by considering syntactic, semantic, and statistical evidence in the sentences. It consists of two main phases-a Learning Phase and an Evaluation Phase. The Learning phase consists of two main components namely a Maximum Entropy Model (MEM) and a Learning Component. The Evaluation Phase consists of four main components namely MEM Evaluator, Verb Frame Invoker, Rule Based Probability Assigner and Expectation Maximizer Component. A number of different performance measures are charted and the performance of the system is judge...

UNL Deconverter for Malayalam

A State of Art Approaches on Handwriting Recognition Models

2019 Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), 2019

Presently, handwriting recognition (HWR) become an open research issue to recognize the handwritt... more Presently, handwriting recognition (HWR) become an open research issue to recognize the handwritten words out of a scanned document which is unconstrained. In spite of the extensive implication of computer in office, handwriting still stays as a significant mode of annotating and apprehending textual data. In recent days, there is an exponential growth in the number of researches carried out in this field because of its benefits. On the past decade, great efforts are paid to the online hand-drawn content interpretation from handwritten text. Keeping this in mind, in this paper, we made a review of existing HWR model in different aspects have been reviewed.

Malayalam POS Tagger - A Comparison Using SVM and HMM

Many Parts Of Speech (POS) taggers for the Malayalam language has been implemented using Support ... more Many Parts Of Speech (POS) taggers for the Malayalam language has been implemented using Support Vector Machine (SVM), Memory-Based Language Processing (MBLP), Hidden Markov Model (HMM) and other similar techniques. The objective was to find an improved POS tagger for the Malayalam language. This work proposed a comparison of the Malayalam POS tagger using the SVM and Hidden Markov model (HMM). The tagset used was the popular Bureau of Indian Standard (BIS) tag set. A manually created data set which has around 52,000 words has been taken from various Malayalam news sites. The preprocessing steps that have done for news text are also mentioned. Then POS tagging has been done using SVM and HMM. As POS tagging requires the extraction of multiple class labels, a multi-class SVM is used. It also performs feature extraction, feature selection, and classification. The word sense disambiguation and misclassification of words are the two major issues identified in SVM. Hidden Markov Model pr...

Modified Convolutional Neural Network of Tamil Character Recognition

Lecture Notes in Networks and Systems, 2020

The handwritten Tamil character recognition in offline mode is challenging tasks as there are vir... more The handwritten Tamil character recognition in offline mode is challenging tasks as there are virtually different people who have different styles of writing the same characters. Deep convolution neural networks are playing a virtual role nowadays in recognizing handwritten character by automatically learning discriminative features from high dimensionality of input data. This work presents a modified convolution neural network \(\left( \text {M-CNN} \right) \) architecture to achieve a faster convergence rate and also to get the highest recognition accuracy. The M-CNN on different aspects along with layers design, activation function, loss function and optimization is discussed. Systematic experiments on isolated handwritten Tamil character dataset collected from various schools by ourselves. For these collected datasets, the proposed system recognized the characters with 97.07%.

Causal Discovery using Dimensionality Reduction Partial Association Tree

International Research Journal on Advanced Science Hub, 2021

Decision tree is a model to classify data based on labelled attribute values. This model is a sup... more Decision tree is a model to classify data based on labelled attribute values. This model is a supervised learning approach through which one can classify a new entry into an appropriate class. If we want to know the cause behind this classification then decision tree cannot provide the same. When we infer causes behind the classification then they will provide a rich knowledge for better decision making. Causal Bayesian Networks, Structural Equation Models, Potential Outcome Models are the some of the models that are used to get causal relationships. These models need experimental data. But it is not possible/ it is very expensive to conduct full experiments. So a model is needed to identify causes from effects from observational data rather than experimental data. In this paper a novel approach is proposed for causal inference rule mining which can infer the causes from observational data in a faster way and also scalable. Statistical tools and techniques named partial association test, correlation are used to develop the model. A new way of constructing a tree called Dimensionality Reduction Partial Association Tree (DRPAT) is introduced. Sometimes the existing causality cannot be extracted where low associated dimensions are involved in data and hiding the underlying causality and this model extracts causal association in case of hidden causality in data.. The model is applied on "Cardiovascular Disease dataset" sourced from Kaggle Progression System. The result is a Partial Association Tree. From this tree one can get a set of causal rules which can form a basis for better data analytics and then the better decision making.

A Survey on Detection and Prevention of Cross-Site Scripting Attack

International Journal of Security and Its Applications, 2015

In present-day time, securing the web application against hacking is a big challenge. One of the ... more In present-day time, securing the web application against hacking is a big challenge. One of the common types of hacking technique to attack the web application is Cross-Site Scripting (XSS). Cross-Site Scripting (XSS) vulnerabilities are being exploited by the attackers to steal web browser's resources such as cookies, credentials etc. by injecting the malicious JavaScript code on the victim's web applications. Since Web browsers support the execution of commands embedded in Web pages to enable dynamic Web pages attackers can make use of this feature to enforce the execution of malicious code in a user's Web browser. The analysis of detection and prevention of Cross-Site Scripting (XSS) help to avoid this type of attack. We describe a technique to detect and prevent this ki nd of manipulation and hence eliminate Cross-Site Scripting attack.

Cross Site Request Forgery: Preventive Measures

International Journal of Computer Applications

Monitoring of Indian Agriculture using LPC2148

International Journal of Computer Applications, 2015

In this paper, we describe about the monitoring of Indian agriculture using LPC2148.Monitoring of... more In this paper, we describe about the monitoring of Indian agriculture using LPC2148.Monitoring of Indian agriculture is done using information of the temperature and humidity content. This is mainly used for saving water and monitoring agriculture without human presence. Temperature sensor and humidity sensor will continuously sense the information regarding the field. When the values are less than or greater than the threshold values it will do certain operations. We are using GSM technology also for sending information to the farmer. We have included two modes for monitoring manual and automatic mode. In manual mode farmer will be sending message and to monitor and control the water pump .In automatic mode farmer will not be involved to control the operations of motor .It will be operating the water pump motor automatically. So, that he can monitor from remote places also. As ARM7 processor is RISC architecture, it is so flexible to program and as advancements has increased in this field we have done code optimization for the program in this project.

Multilingual acquiring of e-content definition based on universal networking language

2014 IEEE International Conference on MOOC, Innovation and Technology in Education (MITE), 2014

E-content information that exists in the web are available only in a particular language. But peo... more E-content information that exists in the web are available only in a particular language. But people who know only the local language may not be able to use these resources, though they are available. The UNL infrastructure intends to overcome the language barrier over the Internet. It is used to enconvert the information from any natural language into the form of universal semantic networks and then deconvert into any other required languages. A framework towards conversion of e-content from web that exhibit in the English language to UNL is presented in this paper. This approach involves conversion of dependency relation extracted from the parser into UNL relation and identifies pedagogical clues to determine the level of e-content according to the educational taxonomy.

Annotation for Query Result Records Based on Domain-Specific Ontology

International Journal on Natural Language Computing, 2014

The World Wide Web is enriched with a large collection of data, scattered in deep web databases a... more The World Wide Web is enriched with a large collection of data, scattered in deep web databases and web pages in unstructured or semi structured formats. Recently evolving customer friendly web applications need special data extraction mechanisms to draw out the required data from these deep web, according to the end user query and populate to the output page dynamically at the fastest rate. In existing research areas web data extraction methods are based on the supervised learning (wrapper induction) methods. In the past few years researchers depicted on the automatic web data extraction methods based on similarity measures. Among automatic data extraction methods our existing Combining Tag and Value similarity method, lags to identify an attribute in the query result table. A novel approach for data extracting and label assignment called Annotation for Query Result Records based on domain specific ontology. First, an ontology domain is to be constructed using information from query interface and query result pages obtained from the web. Next, using this domain ontology, a meaning label is assigned automatically to each column of the extracted query result records.

Machine Translation from English to Tamil using Hybrid Technique

International Journal of …, 2012

The corpus based techniques in Machine Translation involves parallel corpora, but it is not appli... more The corpus based techniques in Machine Translation involves parallel corpora, but it is not applicable for the languages for which there are less or no parallel corpora available. In such case the Rule based machine Translation suits best. The main objective of our work is to build a translation system that translates English sentences to Tamil Sentences. Due to the less availability of parallel corpora for English to Tamil the system is implemented using a Hybrid Technique (the combination of both Rule Based Technique and Statistical Technique). The system is first implemented in a Rule Based approach which involves segmentation and tagging, Rule Based Reordering, Morphological Analyzing, and dictionary based translation to the Target language. Then the errors in the translated sentences are corrected by applying Statistical technique.

UNL based Document Summarization based on Level of Users

The primary goal is to develop an NLP system to perform automatic document summarization by conve... more The primary goal is to develop an NLP system to perform automatic document summarization by converting the English sentences into the expressions of an Interlingua called Universal Networking Language (UNL). UNL has been designed at the United Nations University (UNU)/Institute of Advanced Studies (IAS), Tokyo in 1990. UNL represents knowledge in the form of semantic network. The nodes represent concepts and links represent semantic relations between the concepts. Initially, the complex document is represented in UNL form by undergoing enconversion process and the document is deconverted to produce the summarized document for different levels of users thus reducing the complexity of the document and helps in understanding and decision making. The aim of this NLP system is to represent the exact meaning of the document represented in a usual language. Since UNL is a language independent meaning representation language, summarization is carried out by analyzing and filtering out the U...

Design and Experimentation of Causal Relationship Discovery among Features of Healthcare Datasets

Intelligent Automation & Soft Computing, 2021

Morpheme based Language Model for Part-of-Speech Tagging

Polibits, 2008

The paper describes a Tamil Part of Speech (POS) tagging using a corpus-based approach by formula... more The paper describes a Tamil Part of Speech (POS) tagging using a corpus-based approach by formulating a Language Model using morpheme components of words. Rule based tagging, Markov model taggers, Hidden Markov Model taggers and transformation-based learning tagger are some of the methods available for part of speech tagging. In this paper, we present a language model based on the information of the stem type, last morpheme, and previous to the last morpheme part of the word for categorizing its part of speech. For estimating the contribution factors of the model, we follow generalized iterative scaling technique. Presented model has the overall F-measure of 96%.

Tamil Question Classification Using Morpheme Features

Question classification plays an important role in question answering systems. This paper present... more Question classification plays an important role in question answering systems. This paper presents the Conditional Random field (CRF) model based on Morpheme features for Tamil question classification. It is a process that analyzes a question and labels it based on its question type and expected answer type (EAT). The selected features are the morpheme parts of the question terms and its dependent terms. The main contribution in this work is in the way of selection of features for constructing CRF Model. They discriminates the position of expected answer type information with respect to question term’s position. The CRF model to find out the phrase which contains the information about EAT is trained with tagged question corpus. The EAT is semantically derived by analyzing the phrase obtained from CRF engine using WordNet. The performance of this morpheme based CRF model is compared with the generic CRF engine.

CRF Models for Tamil Part of Speech Tagging and Chunking

Conditional random fields (CRFs) is a framework for building probabilistic models to segment and ... more Conditional random fields (CRFs) is a framework for building probabilistic models to segment and label sequence data. CRFs offer several advantages over hidden Markov models (HMMs) and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. CRFs also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. In this paper we propose the Language Models developed for Part Of Speech (POS) tagging and chunking using CRFs for Tamil. The Language models are designed based on morphological information. The CRF based POS tagger has an accuracy of about 89.18%, for Tamil and the chunking process performs at an accuracy of 84.25% for the same language.

Energy efficient Robust On-Demand Multicast Routing Protocol for MANETs

International Journal of Ad Hoc and Ubiquitous Computing, 2008

Int. J. Ad Hoc and Ubiquitous Computing, Vol. 3, No. 2, 2008 ... Energy efficient Robust On-Deman... more

Named Entity Recognition in Tamil using Context-cues and the EM algorithm