2009, Proceedings of the Third …
This paper attempts to show that an intermediary level of analysis is an effective way of carrying out various NLP tasks for linguistically similar languages. We describe a process for developing a simple parser for such tasks. This parser uses a grammar-driven approach to annotate dependency relations (both inter- and intra-chunk) at an intermediary level; the ease of identifying a particular dependency relation dictates the degree of analysis the parser reaches. To establish the efficiency of the simple parser, we show that it improves on previous grammar-driven dependency parsing approaches for Indian languages such as Hindi. We also discuss how the simple parser could be useful for other Indian languages that are similar in nature.
2016
This paper describes a dependency-grammar framework for a Marathi parser. Dependency grammar is a grammar formalism that captures direct word-to-word relations in a sentence, and a parser is a tool that automatically analyses a sentence and draws its syntactic tree; a grammar formalism is the mechanism with which a parser is developed. In computational linguistics, natural language processing and artificial intelligence today, two kinds of grammar formalism are in use, phrase structure grammar and dependency grammar, and each has its own limitations for parser development. In this paper I use the computational Paninian approach to dependency grammar. Computational Paninian grammar has a tag-set of 37 dependency tags, which have been used to annotate Indian languages such as Hindi, Telugu and Bangla. I examine how this dependency tag-set applies to Marathi and annotate a corpus that can be used to develop a Marathi parser. To annotate d...
ACM Transactions on Asian and Low-Resource Language Information Processing
Building computational resources and tools for under-resourced languages is strenuous for any Natural Language Processing (NLP) task. This paper presents the first dependency parser for an under-resourced Indian language, Nepali. A prerequisite for developing a parser for a language is a corpus annotated with the desired linguistic representations, known as a treebank. With an aim of cross-lingual learning and typological research, we use a Bengali treebank to build a Bengali-Nepali parallel corpus and apply annotation projection from the Bengali treebank to build a treebank for Nepali. With the developed treebank, MaltParser (with all algorithms for projective dependency structures) and a neural-network-based parser have been used to build Nepali parser models. The neural-network-based parser produced state-of-the-art results, with 81.2 Unlabeled Attachment Score (UAS), 73.2 Label Accuracy (LA) and 66.1 Labeled Attachment Score (LAS) on the gold test data. The pars...
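The annotation projection described in this abstract can be sketched as follows: dependency heads from a parsed source sentence are copied onto the aligned target sentence through a word alignment. The alignment and the tiny tree below are toy data, not the actual Bengali-Nepali resources.

```python
# Illustrative sketch of annotation projection across a word alignment.
# Indices are 1-based token positions; head 0 denotes the root.

def project_dependencies(src_heads, alignment):
    """src_heads: dict src_index -> src_head_index (0 = root).
    alignment: dict src_index -> tgt_index (1-to-1 for simplicity).
    Returns tgt_heads: dict tgt_index -> tgt_head_index."""
    tgt_heads = {}
    for s, t in alignment.items():
        src_head = src_heads[s]
        if src_head == 0:                 # the root stays the root
            tgt_heads[t] = 0
        elif src_head in alignment:       # head word is aligned too
            tgt_heads[t] = alignment[src_head]
        # tokens whose head is unaligned are left unattached; a real
        # projection system needs a fallback heuristic for these
    return tgt_heads

# Toy source tree: token 2 is the root, tokens 1 and 3 depend on it
src_heads = {1: 2, 2: 0, 3: 2}
alignment = {1: 1, 2: 3, 3: 2}   # hypothetical word alignment
print(project_dependencies(src_heads, alignment))  # {1: 3, 3: 0, 2: 3}
```

A real pipeline would add filters for many-to-one alignments and non-projective artefacts before training a parser on the projected trees.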
Translation Today, 2021
This paper is an attempt at building a rule-based dependency parser for Telugu that can parse simple sentences. The study adopts Pāṇini's Grammatical (PG) tradition, i.e. the dependency model, to parse sentences. A detailed description of mapping semantic relations to vibhaktis (case suffixes and postpositions) in Telugu using PG is presented. The paper describes the algorithm and the linguistic knowledge employed while developing the parser. The research further provides results which suggest that enriching the current parser with linguistic inputs can increase accuracy and tackle ambiguity better than existing data-driven methods.
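The vibhakti-to-relation mapping that such a parser relies on can be sketched as a rule table in the spirit of the Paninian approach. The suffix-to-label pairs below are illustrative examples only, not the paper's actual rule set.

```python
# Hedged sketch: mapping (illustrative) Telugu case suffixes to
# karaka-style dependency labels, as in Paninian dependency grammar.

VIBHAKTI_RULES = {
    "ni": "k2",    # accusative-like suffix -> karma (object)
    "ki": "k4",    # dative-like suffix     -> sampradana (recipient)
    "tO": "k3",    # instrumental-like      -> karana (instrument)
    "lO": "k7",    # locative-like          -> adhikarana (location)
}

def guess_relation(token_suffix, is_subject_candidate=False):
    """Return a dependency label for a noun given its case suffix."""
    if token_suffix in VIBHAKTI_RULES:
        return VIBHAKTI_RULES[token_suffix]
    # unmarked nouns default to karta (agent) when they can be subjects
    return "k1" if is_subject_candidate else "unk"

print(guess_relation("ni"))                           # k2
print(guess_relation("", is_subject_candidate=True))  # k1
```

In practice the mapping is many-to-many, so a real rule set also consults the verb's demand frame before committing to a label.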
2013
In this paper, we present our approach to dependency parsing of Hindi as part of the Hindi Shared Task on Parsing at COLING 2012. Our approach examines the effect of different settings available in MaltParser, following the two-step parsing strategy, i.e. splitting the data into inter-chunk and intra-chunk relations, to obtain the best possible LAS, UAS and LA. Our system achieved the best LAS of 90.99% for the Gold Standard track and the second-best LAS of 83.91% on Automated data.
In this paper we address two dependency parsers for a free-word-order Indian language, namely Bengali. One of the parsers is grammar-driven whereas the second is data-driven. The grammar-driven parser is an extension of a previously developed parser, whereas the data-driven parser is the MaltParser customized for Bengali. Both parsers are evaluated on two datasets: the ICON NLP Tool Contest data and Dataset-II (developed by us). The evaluation shows that the grammar-based parser outperforms the MaltParser on the ICON data, based on which the demand frames of the Bengali verbs were developed, but its performance degrades when dealing with completely unknown data, i.e. Dataset-II. However, the MaltParser performs better on Dataset-II and on the whole data. Evaluation and error analysis further reveal that the parsers show some complementary capabilities, which indicates future scope for their integration to improve overall parsing efficiency.
2010
DeSR is a statistical transition-based dependency parser which learns from annotated corpora which actions to perform for building parse trees while scanning a sentence. We describe the experiments performed for the ICON 2010 Tools Contest on Indian Dependency Parsing. DeSR was configured to exploit specific features from the Indian treebanks. The submitted run used a stacked combination of four configurations of the DeSR parser and achieved the best unlabeled accuracy scores for all languages. The contribution of the various choices to this result is analyzed.
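The action-based construction that transition parsers like DeSR learn can be sketched with a minimal arc-standard transition system: a stack, a buffer, and SHIFT / LEFT-ARC / RIGHT-ARC actions that build a dependency tree in one pass. The sentence and the gold action sequence below are toy examples.

```python
# Minimal arc-standard transition system: executes a given action
# sequence and returns the dependency arcs it builds.

def parse(tokens, actions):
    stack, buffer, arcs = [], list(range(len(tokens))), []
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":          # second-top becomes dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RIGHT-ARC":         # top becomes dependent of second-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs  # list of (head_index, dependent_index)

# "economic news attracted attention": token 2 is the verb/root
tokens = ["economic", "news", "attracted", "attention"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC",   # economic <- news
           "SHIFT", "LEFT-ARC",            # news <- attracted
           "SHIFT", "RIGHT-ARC"]           # attracted -> attention
print(parse(tokens, actions))  # [(1, 0), (2, 1), (2, 3)]
```

In a trained parser, the action at each step is predicted by a classifier over stack/buffer features rather than read from a fixed list, but the tree-building machinery is exactly this.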
2019
In this paper, we develop manually annotated Telugu corpora following the DS guidelines (2009) and experiment with our Telugu dependency treebank data on data-driven parsers, namely Malt (Nivre et al., 2007a) and MST (McDonald et al., 2006), for parsing Telugu sentences. In the dependency annotation, we link heads and dependents with their dependency relations (drels), assigning kāraka and non-kāraka relations. The Telugu annotated data contains tokens with their morph information, POS, chunk and drels. We used our final Telugu treebank data in CoNLL format for parsing with the Malt and MST parsers. We evaluated the labeled attachment score (LAS), unlabeled attachment score (UAS) and label accuracy (LA) for both parsers and also compared their scores per dependency relation. Finally, we analysed the most frequent errors that occurred after parsing the sentences and explain them with relevant examples and appropriate linguistic analysis, so that we can improve the...
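The CoNLL-style input such parsers consume is a one-token-per-line, tab-separated format. The sketch below uses a reduced subset of the fields (ID, FORM, LEMMA, POS, HEAD, DEPREL); the full format has more columns, and the transliterated Telugu example is illustrative only.

```python
# Reading a reduced CoNLL-style block: one token per line,
# tab-separated fields, sentence boundary = blank line (omitted here).

conll = """\
1\trAmudu\trAmudu\tNN\t2\tk1
2\tvaccAdu\tvaccu\tVM\t0\tmain"""

def read_conll(text):
    rows = []
    for line in text.strip().split("\n"):
        idx, form, lemma, pos, head, drel = line.split("\t")
        rows.append({"id": int(idx), "form": form, "lemma": lemma,
                     "pos": pos, "head": int(head), "drel": drel})
    return rows

sent = read_conll(conll)
print([(t["form"], t["head"], t["drel"]) for t in sent])
# [('rAmudu', 2, 'k1'), ('vaccAdu', 0, 'main')]
```

Here head 0 marks the root (the verb), and the noun attaches to it with the kāraka relation k1.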
In recent years, transition-based parsers have shown promise in terms of efficiency and accuracy. Though these parsers have been extensively explored for multiple Indian languages, there is still considerable scope for improvement by properly incorporating syntactically relevant information. In this article, we enhance transition-based parsing of Hindi and Urdu by redefining the features and feature extraction procedures that have been previously proposed in the parsing literature of Indian languages. We propose and empirically show that properly incorporating syntactically relevant information like case marking, complex predication and grammatical agreement in an arc-eager parsing model can significantly improve parsing accuracy. Our experiments show an absolute improvement of ∼2% LAS for parsing of both Hindi and Urdu over a competitive baseline which uses rich features like part-of-speech (POS) tags, chunk tags, cluster ids and lemmas. We also propose some heuristics to identify ezafe constructions in Urdu texts which show promising results in parsing these constructions.
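The idea of folding case marking into an arc-eager model's feature set can be sketched as follows: alongside standard POS features on the stack top and buffer front, the case marker, when present, is added as an explicit feature. The feature names and the tiny example are illustrative, not the authors' actual templates.

```python
# Hedged sketch of case-aware feature extraction for a transition parser.

def extract_features(stack_top, buffer_front):
    """Each token is a dict with a 'pos' key and an optional 'case' key."""
    feats = {
        "s0.pos": stack_top["pos"],
        "b0.pos": buffer_front["pos"],
    }
    # syntactically relevant extras: case markers, when present
    if "case" in stack_top:
        feats["s0.case"] = stack_top["case"]
    if "case" in buffer_front:
        feats["b0.case"] = buffer_front["case"]
    return feats

s0 = {"pos": "NN", "case": "ne"}   # noun with Hindi ergative marker 'ne'
b0 = {"pos": "VM"}                  # main verb, no case marker
print(extract_features(s0, b0))
```

The same pattern extends to agreement and complex-predicate features: compute them from token attributes at each configuration and hand them to the classifier as extra named features.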
2013
The present paper describes a three-stage technique to parse Hindi sentences. In the first stage, we create a model with features of the head words of each chunk and their dependency relations; here the dependency relations are inter-chunk relations. We have experimentally fixed a feature set for learning this model. In the second stage, we extract the intra-chunk dependency relations using a set of rules. The first stage is combined with the second to build a two-stage word-level Hindi dependency parser. In the third stage, we formulate rules based on features and use them to post-process the output of the two-stage parser.
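The third, rule-based post-processing stage can be sketched as a pass over the parser's output that relabels tokens when their features contradict the predicted relation. The single rule shown (relabelling an object as k4 when the dependent carries a dative-like marker) is a made-up example of this kind of feature-based rule, not a rule from the paper.

```python
# Sketch of rule-based post-processing of two-stage parser output.

def post_process(parse):
    """parse: list of token dicts with 'form', 'case', 'drel' keys.
    Returns a corrected copy; the input is left unmodified."""
    fixed = []
    for tok in parse:
        tok = dict(tok)
        if tok["drel"] == "k2" and tok.get("case") == "ko":
            tok["drel"] = "k4"   # dative-like marker suggests recipient
        fixed.append(tok)
    return fixed

parsed = [{"form": "rAm",   "case": "ko", "drel": "k2"},
          {"form": "kitAb", "case": "",   "drel": "k2"}]
print([t["drel"] for t in post_process(parsed)])  # ['k4', 'k2']
```

Because each rule is a deterministic check on token features, such a stage is cheap to run and easy to extend with more corrections as error analysis uncovers systematic mistakes.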