2013
In this paper we present our experiments in parsing Telugu. We explore two data-driven parsers, Malt and MST, and compare the results of both. We describe the data and parser settings used in detail; some of these are specific to one particular Indian language or to Indian languages in general. The averages of the best unlabeled attachment, labeled attachment and labeled accuracy scores are 88.43%, 69.71% and 70.01% respectively. We also present which parser gives the best results for different sentence types in Telugu.
2019
In this paper, we describe a manually annotated Telugu corpus developed by following the DS guidelines (2009), and we experiment with our Telugu dependency treebank data on data-driven parsers, Malt (Nivre et al., 2007a) and MST (McDonald et al., 2006), for parsing Telugu sentences. In the dependency annotation, we link heads and dependents with their dependency relations (drels), assigning kāraka and non-kāraka relations to them. The annotated Telugu data contains tokens with their morph information, POS tags, chunks and drels. We used our final Telugu treebank data in CoNLL format for parsing with the Malt and MST parsers. We evaluated the labeled attachment score (LAS), unlabeled attachment score (UAS) and labeled accuracy (LA) for both parsers and also compared their scores on individual dependency relations. Finally, we analyzed the most frequent errors that occurred after parsing the sentences and explained them with relevant examples and appropriate linguistic analysis, so that we can improve the...
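The CoNLL token format used by both Malt and MST can be read with a few lines of code. A minimal sketch, assuming the CoNLL-X column layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL); the Telugu tokens, morph features and drel labels in the sample are illustrative, not drawn from the treebank:

```python
# Minimal CoNLL-X reader: blank lines separate sentences, tabs separate columns.
def read_conll(text):
    cols = ["id", "form", "lemma", "cpos", "pos", "feats", "head", "deprel"]
    sent, sents = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # sentence boundary
            if sent:
                sents.append(sent)
                sent = []
            continue
        tok = dict(zip(cols, line.split("\t")))
        tok["id"], tok["head"] = int(tok["id"]), int(tok["head"])
        sent.append(tok)
    if sent:
        sents.append(sent)
    return sents

# Illustrative two-token sentence "rAmu vaccADu" (hypothetical morph/drel values).
sample = ("1\trAmu\trAmu\tn\tNN\tcase-nom\t2\tk1\n"
          "2\tvaccADu\tvaccu\tv\tVM\ttense-past\t0\troot\n")
sents = read_conll(sample)
```

The `head` column holds the 1-based index of each token's head, with 0 marking the root, which is all a parser's evaluation of UAS/LAS needs.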
Journal of King Saud University - Computer and Information Sciences, 2017
In this paper we explore different statistical dependency parsers for parsing Telugu. We consider five popular dependency parsers, namely MaltParser, MSTParser, TurboParser, ZPar and the Easy-First Parser. We experiment with different parser and feature settings and show the impact of each. We also provide a detailed analysis of the performance of all the parsers on major dependency labels. We report our results on the test data of the Telugu dependency treebank provided in the ICON 2010 tools contest on Indian language dependency parsing. We obtain state-of-the-art performance of 91.8% unlabeled attachment score and 70.0% labeled attachment score. To the best of our knowledge, ours is the only work that has explored all five popular dependency parsers and compared their performance under different feature settings for Telugu.
Translation Today, 2021
This paper is an attempt in building a rule-based dependency parser for Telugu which can parse simple sentences. This study adopts Pāṇini's Grammatical (PG) tradition i.e., the dependency model to parse sentences. A detailed description of mapping semantic relations to vibhaktis (case suffixes and postpositions) in Telugu using PG is presented. The paper describes the algorithm and the linguistic knowledge employed while developing the parser. The research further provides results, which suggest that enriching the current parser with linguistic inputs can increase the accuracy and tackle ambiguity better than existing data-driven methods.
Very few attempts at dependency parsing for Tamil have been reported in the literature. In this paper, we report results obtained for Tamil dependency parsing with rule-based and corpus-based approaches. We designed an annotation scheme partially based on the Prague Dependency Treebank (PDT) and manually annotated Tamil data (about 3,000 words) with dependency relations. For the corpus-based approach, we used two well-known parsers, MaltParser and MSTParser; for the rule-based approach, we implemented a series of linguistic rules (for resolving coordination, complementation, predicate identification and so on) to build dependency structures for Tamil sentences. Our initial results show that both the rule-based and corpus-based approaches achieved an accuracy of more than 74% on the unlabeled task and more than 65% on the labeled task. Rule-based parsing accuracy dropped considerably when the input was tagged automatically.
ACM Transactions on Asian and Low-Resource Language Information Processing, 2015
We show that Combinatory Categorial Grammar (CCG) supertags can improve Telugu dependency parsing. In this process, we first extract a CCG lexicon from the dependency treebank. Using both the CCG lexicon and the dependency treebank, we create a CCG treebank using a chart parser. Exploring different morphological features of Telugu, we develop a supertagger using maximum entropy models. We provide CCG supertags as features to the Telugu dependency parser (MST parser). We get an improvement of 1.8% in the unlabelled attachment score and 2.2% in the labelled attachment score. Our results show that CCG supertags improve the MST parser, especially on verbal arguments for which it has weak rates of recovery.
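One simple way to expose supertags to a CoNLL-trained parser is to fold them into the FEATS column of each token line. A sketch under stated assumptions: the CoNLL-X column order, and a `stag=` feature key that is invented here for illustration, not taken from the paper:

```python
def add_supertag(conll_line, supertag):
    """Append a predicted CCG supertag to the FEATS column (6th field)
    of a tab-separated CoNLL-X token line."""
    fields = conll_line.split("\t")
    feats = fields[5]
    # "_" is the CoNLL convention for "no features"
    fields[5] = f"stag={supertag}" if feats == "_" else f"{feats}|stag={supertag}"
    return "\t".join(fields)

line = "2\tvaccADu\tvaccu\tv\tVM\t_\t0\troot"
tagged = add_supertag(line, "S\\NP")   # an intransitive-verb category
```

Because most data-driven parsers already read the FEATS column, this kind of augmentation requires no change to the parser itself, only to its feature configuration.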
We present a comparative error analysis of two parsers, MALT and MST, on Telugu Dependency Treebank data. MALT and MST are currently two of the most dominant data-driven dependency parsers. We discuss the performance of both parsers in relation to the Telugu language. We also discuss in detail both the algorithmic issues of the parsers and the language-specific constraints of Telugu. The purpose is to better understand how to help the parsers deal with complex structures, make sense of implicit language-specific cues, and build a more informed treebank.
2013
In this paper, we present our approach to dependency parsing of Hindi as part of the Hindi Shared Task on Parsing at COLING 2012. Our approach studies the effect of using the different settings available in MaltParser, following the two-step parsing strategy, i.e., splitting the data into interChunks and intraChunks, to obtain the best possible LAS, UAS and LA accuracy. Our system achieved the best LAS of 90.99% on the Gold Standard track and the second-best LAS of 83.91% on the Automated track.
In this paper we present our experiments in parsing Hindi. We first explored the Malt and MST parsers. Considering the strengths of both, we developed a hybrid approach that combines the output of the two parsers in an intuitive manner. We report our results on both the development and test data provided in the Hindi Shared Task on Parsing at the Workshop on MT and Parsing in Indian Languages, COLING 2012. Our system secured labeled attachment scores of 90.66% and 80.77% on the gold standard and automatic tracks respectively. These accuracies are the 3rd best and 5th best for the gold standard and automatic tracks respectively.
Proceedings of the NAACL …, 2010
This paper analyzes the relative importance of different linguistic features for data-driven dependency parsing of Hindi, using a feature pool derived from two state-of-the-art parsers. The analysis shows that the greatest gain in accuracy comes from the addition of morpho-syntactic features related to case, tense, aspect and modality. Combining features from the two parsers, we achieve a labeled attachment score of 76.5%, which is 2 percentage points better than the previous state of the art. We finally provide a detailed ...
2010
DeSR is a statistical transition-based dependency parser that learns from annotated corpora which actions to perform for building parse trees while scanning a sentence. We describe the experiments performed for the ICON 2010 Tools Contest on Indian Dependency Parsing. DeSR was configured to exploit specific features of the Indian treebanks. The submitted run used a stacked combination of four configurations of the DeSR parser and achieved the best unlabeled accuracy scores in all languages. The contribution of various choices to the result is analyzed.
2016
This paper describes a dependency grammar framework for a Marathi parser. Dependency grammar is a grammar formalism that captures direct word-to-word relations in a sentence. A parser is a tool that automatically analyzes a sentence and draws its syntactic tree, and a grammar formalism is the mechanism for developing such a parser. Today, the fields of computational linguistics, natural language processing and artificial intelligence work with two kinds of grammar formalism: phrase structure grammar and dependency grammar. Both formalisms have their own limitations for developing a parser. In this paper I use the computational Paninian approach to dependency grammar. Computational Paninian grammar has a 37-label dependency tagset, and these tags have been used to annotate Indian languages such as Hindi, Telugu and Bangla. I examine this dependency tagset for Marathi and annotate a corpus that is useful for developing a Marathi parser. To annotate d...
Syntactic parsing in NLP is the task of working out the grammatical structure of sentences. Some purely formal approaches to parsing, such as phrase structure grammar and dependency grammar, have been successfully employed for a variety of languages. While phrase-structure-based constituent analysis is possible for fixed-order languages such as English, dependency analysis between grammatical units has been suitable for many free word order languages such as the Indian languages. All these parsing approaches rely on identifying linguistic units based on their formal syntactic properties and establishing the relationships between such units in the form of a tree. The Dravidian languages spoken in Southern India are morphologically rich, agglutinative languages whose characterization in purely structural terms such as adjectives, adverbs, conjunctions and postpositions, as well as traditional interpretations of tense and finiteness, poses problems for their syntactic analysis that are well discussed in the literature. We propose that the morpho-syntactic structures of Dravidian languages are better analysed from the theoretical perspectives of “Cognitive Grammar” or “Construction Grammar”, where every grammatical structure is treated as a symbol that directly maps to meaningful conceptualizations. In other words, natural language is not treated as a formal system but as a functional system that is entirely symbolic or semiotic, right from lexicon to grammar. Through linguistic evidence we point out that morpho-syntactic structures in Dravidian languages have their basis in meaningful discourse conceptualizations. Subsequently we hierarchically arrange all these conceptualizations into construction schemas that exhibit multiple-inheritance relationships, and we explain all concrete morpho-syntactic structures as instances of these schemas.
Based on this fresh theoretical grounding, we model parsing as the automatic identification of meaningful dependency relations between such meaningful construction units. We formulated an annotation scheme for labelling the construction units and the dependency relations that can exist between them. Our approach to full parser annotation shows an average MALT LAS of 82.21% on a Tamil gold-annotated corpus of 935 sentences in a five-fold validation experiment. We conducted experiments varying the training data size, the annotation scheme, sentence length in terms of number of chunks, and the granularity of tags, and report the parser results for these scenarios. Finally, we build a pipeline with a splitter, construction labeller and grouper as intermediate layers before the MALT parser input, and release the working full parser module.
In this paper we address two dependency parsers for a free-word-order Indian language, namely Bengali. One of the parsers is grammar-driven, whereas the second is data-driven. The grammar-driven parser is an extension of a previously developed parser, whereas the data-driven parser is the MaltParser customized for Bengali. Both parsers are evaluated on two datasets: the ICON NLP Tool Contest data and Dataset-II (developed by us). The evaluation shows that the grammar-based parser outperforms the MaltParser on the ICON data, based on which the demand frames of the Bengali verbs were developed, but its performance degrades when dealing with completely unknown data, i.e., Dataset-II. However, MaltParser performs better on Dataset-II and on the whole data. Evaluation and error analysis further reveal that the parsers show some complementary capabilities, which indicates a future scope for their integration to improve overall parsing efficiency.
Lecture Notes in Computer Science, 2010
This paper describes an effort towards building a Telugu Dependency Treebank. We discuss the basic framework and the issues we encountered while annotating. 1,487 sentences have been annotated in the Paninian framework. We also discuss how some of the annotation decisions would affect the development of a parser for Telugu. (Hindi is a South Asian language and an official language of India spoken by 300 million people; Telugu is a Dravidian language and an official language of India spoken by 75 million people.)
IAEME PUBLICATION, 2021
Malt and Maximum Spanning Tree (MST) parsers are two popular approaches, and the base parsers, in dependency parsing; they are also known as transition-based and graph-based parsers respectively. Each parser has its own method of constructing a dependency tree. This paper describes approaches for integrating transition-based and graph-based parsers for parsing Telugu sentences. Combining these parsers at learning time is called stacking, and combining them at parsing time is called ensembling. Stacking has two levels, level-0 and level-1. In level-0, a model is trained under the transition-based approach and generates augmented training data, which is used in level-1. The augmented data is then trained with the graph-based parser in level-1. This has shown better results than the base parsers. The ensembled approach uses variations of the base parsers: six variations were built, of which four are transition-based and two are graph-based. Majority, Attardi and Eisner are the three ensembling methods used for evaluating the parsing results. The majority approach outperformed the Attardi and Eisner methods. Different numbers of parsers were used for evaluating performance, and good results were obtained with three variations of the base parsers: Covington projective, Covington non-projective, and non-projective with second-order features. The stacking and ensembling methods have shown improved results compared to the transition-based and graph-based base parsers alone.
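The majority method described above can be sketched as a per-token vote over the head predictions of the base parsers. A minimal illustration with hypothetical predictions (ties broken toward the first parser; note that a plain per-token vote does not guarantee a well-formed tree, which is what the Attardi and Eisner combination methods address by reparsing the weighted graph):

```python
from collections import Counter

def majority_heads(head_lists):
    """Per-token majority vote over head indices from several base parsers."""
    voted = []
    for preds in zip(*head_lists):        # preds = all parsers' heads for one token
        votes = Counter(preds)
        top_count = votes.most_common(1)[0][1]
        tied = [h for h, c in votes.items() if c == top_count]
        # break ties toward the first parser's prediction when possible
        voted.append(preds[0] if preds[0] in tied else tied[0])
    return voted

# Three hypothetical base parsers predicting heads for a 4-token sentence
# (1-based head indices, 0 = root).
ensembled = majority_heads([[2, 0, 2, 3],
                            [2, 0, 4, 3],
                            [2, 0, 2, 2]])
```

With an odd number of base parsers and a fixed tie-break order, the vote is deterministic, which keeps evaluation reproducible across runs.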
Syntactic parsing in NLP is the task of working out the grammatical structure of sentences. Some purely formal approaches to parsing, such as phrase structure grammar and dependency grammar, have been successfully employed for a variety of languages. While phrase-structure-based constituent analysis is possible for fixed-order languages such as English, dependency analysis between grammatical units has been suitable for many free word order languages. These approaches rely on identifying linguistic units based on their formal syntactic properties and establishing the relationships between such units in the form of a tree. Instead, we characterize every morphosyntactic unit as a mapping between form and function along the lines of Construction Grammar, and parsing as the identification of dependency relations between such conceptual units. Our approach to parser annotation shows an average MALT LAS of 82.21% on a Tamil gold-annotated corpus of 935 sentences in a five-fold validation experiment.
ACM Transactions on Asian and Low-Resource Language Information Processing
Building computational resources and tools for under-resourced languages is strenuous for any Natural Language Processing (NLP) task. This paper presents the first dependency parser for an under-resourced Indian language, Nepali. A prerequisite for developing a parser for a language is a corpus annotated with the desired linguistic representations, known as a treebank. With an aim of cross-lingual learning and typological research, we use a Bengali treebank to build a Bengali-Nepali parallel corpus and apply the method of annotation projection from the Bengali treebank to build a treebank for Nepali. With the developed treebank, MaltParser (with all algorithms for projective dependency structures) and a neural-network-based parser have been used to build Nepali parser models. The neural-network-based parser produced state-of-the-art results with 81.2 Unlabeled Attachment Score (UAS), 73.2 Label Accuracy (LA) and 66.1 Labeled Attachment Score (LAS) on the gold test data. The pars...
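Annotation projection of the kind used here can be sketched in its simplest case: dependency heads carried across a 1-to-1 word alignment. A toy illustration, assuming 1-based token indices with 0 for the root; real projection must also handle 1-to-many links and unaligned words, which this sketch merely marks as unknown:

```python
def project_heads(src_heads, align, tgt_len):
    """src_heads: 1-based head index per source token (0 = root).
    align: 1-based source index -> 1-based target index (1-to-1).
    Returns target heads; -1 marks tokens whose head could not be projected."""
    tgt_heads = [-1] * tgt_len
    for s, t in align.items():
        h = src_heads[s - 1]
        if h == 0:                        # source root projects to target root
            tgt_heads[t - 1] = 0
        elif h in align:                  # head word is aligned: follow the link
            tgt_heads[t - 1] = align[h]
    return tgt_heads

# Toy 3-token sentence pair: source heads [2, 0, 2], alignment swaps the
# order of tokens 2 and 3 on the target side.
projected = project_heads([2, 0, 2], {1: 1, 2: 3, 3: 2}, 3)
```

The unprojected (-1) positions are exactly where filtering or heuristic completion is needed before the projected trees can serve as training data for MaltParser or a neural parser.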