2009
…
4 pages
AI-generated Abstract
Initial results on parsing Arabic through treebank-based parsers and automatic Lexical Functional Grammar (LFG) f-structure annotation methods are presented. The study employs the Arabic Annotation Algorithm (A3) to utilize the functional annotations in the Penn Arabic Treebank (ATB), resulting in dependency f-scores of 77% in evaluations against the DCU250 Arabic gold standard dependency bank. The results reveal challenges related to data-sparseness and the complexity of phrasal categories, but establish a foundation for future enhancements in treebank-based Arabic parsing methodologies.
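Dependency f-scores like the 77% above are commonly computed by matching predicate-argument triples between the parser-derived f-structures and the gold standard. The following is a minimal sketch. It assumes each f-structure has already been flattened into a set of (relation, head, dependent) triples, with tokens suffixed by their position. The function name `triple_prf` and the example tokens are hypothetical, and the actual DCU250 triple format and scoring script may differ.

```python
from typing import Iterable, Tuple

# Hypothetical triple format: (relation, head, dependent), tokens suffixed by position.
Triple = Tuple[str, str, str]


def triple_prf(gold: Iterable[Triple], test: Iterable[Triple]) -> Tuple[float, float, float]:
    """Precision, recall and F-score over sets of dependency triples."""
    gold_set, test_set = set(gold), set(test)
    matched = len(gold_set & test_set)  # triples found in both gold and parser output
    precision = matched / len(test_set) if test_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


gold = {("subj", "kataba_1", "walad_2"), ("obj", "kataba_1", "risala_3"),
        ("adjunct", "kataba_1", "ams_4")}
test = {("subj", "kataba_1", "walad_2"), ("obj", "kataba_1", "ams_4")}
print(triple_prf(gold, test))  # one of two predicted triples is correct
```

Treating triples as sets means a wrongly labelled relation counts as both a false positive and a false negative, which is the usual behaviour of triple-based f-structure evaluation.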
The International Conference on Information and Communication Technology Research (ICTRC), 2015
A Treebank is a linguistic resource composed of a large collection of manually annotated and verified, syntactically analyzed sentences. Statistical Natural Language Processing (NLP) approaches have successfully used these annotations to develop basic NLP tasks such as tokenization, diacritization, part-of-speech tagging, and parsing. In this paper, we address the problem of exploiting Treebank resources for statistical parsing of Modern Standard Arabic (MSA) sentences. Statistical parsing is significant for NLP tasks that take parsed text as input, such as Information Retrieval and Machine Translation. We conducted an experiment on the Penn Arabic Treebank (PATB), and the parsing performance obtained in terms of Precision, Recall, and F-measure was 82.4%, 86.6%, and 84.4%, respectively.
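As a consistency check on the reported figures, assuming the F-measure is the standard balanced F1 (the harmonic mean of precision and recall, which the abstract does not state explicitly):

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 82.4 \times 86.6}{82.4 + 86.6}
    = \frac{14271.68}{169.0}
    \approx 84.4\%
```

This matches the reported F-measure of 84.4%.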
This paper compares two well-known Arabic parsers, the Stanford Parser and the Bikel parser, using the Arabic Treebank (ATB). The comparison covers both model training and testing. For this purpose, we developed software that converts the ATB format into a grammar format, converts the Arabic morphological tags into Penn tags, and evaluates the parsers' output by computing Precision, Recall, F-Score, and Tag Accuracy. We also modify the Bikel parser to use the Penn tags during training, which improves the Precision, Recall, F-Score, and Tag Accuracy of the parse output.
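The tag-conversion and Tag Accuracy steps above can be sketched as follows. The paper's actual conversion table is not given. The `MORPH_TO_PENN` dictionary below is a hypothetical, heavily simplified mapping in the style of the reduced ATB tag set (for example `DET+NOUN...` becoming `DTNN`). The helper names `to_penn` and `tag_accuracy` are illustrative only.

```python
from typing import List

# Hypothetical, heavily simplified mapping from ATB morphological tag components
# to Penn-style POS tags; a real conversion table covers many more cases.
MORPH_TO_PENN = {
    "NOUN": "NN",
    "NOUN_PROP": "NNP",
    "ADJ": "JJ",
    "PV": "VBD",  # perfective verb
    "IV": "VBP",  # imperfective verb
    "PREP": "IN",
    "CONJ": "CC",
}


def to_penn(morph_tag: str) -> str:
    """Map an ATB-style tag such as 'DET+NOUN+CASE_DEF_NOM' to a Penn-style tag."""
    parts = morph_tag.split("+")
    has_det = "DET" in parts
    for part in parts:
        base = part.split(":")[0]  # drop feature values such as ':3MS'
        if base in MORPH_TO_PENN:
            penn = MORPH_TO_PENN[base]
            # Definite nouns and adjectives get a DT prefix (DTNN, DTJJ, ...).
            if has_det and penn.startswith(("NN", "JJ")):
                return "DT" + penn
            return penn
    return "UNK"  # component not covered by this toy mapping


def tag_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("sequences must be aligned token by token")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


print(to_penn("DET+NOUN+NSUFF_FEM_SG+CASE_DEF_NOM"))
print(tag_accuracy(["NN", "VBD", "IN"], ["NN", "VBP", "IN"]))
```

Mapping by scanning tag components keeps the sketch short; a production converter would more likely use an explicit lookup over full tag strings.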
Proceedings of the 23rd International Conference on Computational Linguistics, 2010
In this paper, we offer broad insight into the underperformance of Arabic constituency parsing by analyzing the interplay of linguistic phenomena, annotation choices, and model design. First, we identify sources of syntactic ambiguity understudied in the existing parsing literature. Second, we show that although the Penn Arabic Treebank is similar to other treebanks in gross statistical terms, annotation consistency remains problematic. Third, we develop a human interpretable grammar that is competitive with a latent variable PCFG. Fourth, we show how to build better models for three different parsers. Finally, we show that in application settings, the absence of gold segmentation lowers parsing performance by 2-5% F1.
The Arabic Treebank at the Linguistic Data Consortium has significantly revised and enhanced its annotation guidelines and annotation procedure over the past year. The revised syntactic guidelines are now being applied in annotation production, and the combination of the revised guidelines and a period of intensive annotator training has raised inter-annotator agreement f-measure scores already. Revised morphological/part-of-speech (POS) guidelines are nearly complete as well, and will be applied in annotation production in the near future. This paper reports on an experiment in automatically enhancing the old morphological/POS tags in the right direction and the resulting parsing improvement. Finally, a new division of the POS analysis marking both morphological form and POS function is proposed.
2010
We investigate Arabic Context Free Grammar parsing with dependency annotation, comparing lexicalised and unlexicalised parsers. We study how the percolation of morphosyntactic and function tag information, in the form of grammar transforms (Johnson, 1998), affects the performance of a parser and helps dependency assignment. We focus on the three most frequent functional tags in the Arabic Penn Treebank: subjects, direct objects and predicates. We merge these functional tags with their phrasal categories and (where appropriate) percolate case information to the non-terminal (POS) category to train the parsers. We then automatically enrich the output of these parsers with full dependency information in order to annotate trees with Lexical Functional Grammar (LFG) f-structure equations, which produce f-structures, i.e. attribute-value matrices approximating basic predicate-argument-adjunct structure representations. We present a series of experiments evaluating how well lexicalised, history-based, generative (Bikel) and latent variable PCFG (Berkeley) parsers cope with the enriched Arabic data. We measure the quality and coverage of both the output trees and the generated LFG f-structures. We show that joint percolation of functional and morphological information improves both the recovery of trees and the dependency results in the form of LFG f-structures.
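The function-tag merging transform described above can be illustrated with a small tree rewrite. This is a minimal sketch. It assumes trees are nested tuples `(label, child, ...)` with words as plain strings, and ATB-style labels such as `NP-SBJ-1`. Only the SBJ, OBJ and PRD tags are kept on the phrasal category, while other function tags and coindices are dropped. Case percolation to POS categories is not shown. The names `transform_label` and `transform` are hypothetical.

```python
from typing import Tuple, Union

# A tree is (label, child1, child2, ...); a leaf child is a plain word string.
Tree = Union[str, Tuple]

KEPT_FUNCTION_TAGS = {"SBJ", "OBJ", "PRD"}


def transform_label(label: str) -> str:
    """Merge SBJ/OBJ/PRD into the category; drop other function tags and coindices."""
    if label == "-NONE-":  # empty-element label must not be split on '-'
        return label
    category, *tags = label.split("-")
    kept = [t for t in tags if t in KEPT_FUNCTION_TAGS]
    return "-".join([category] + kept[:1])


def transform(tree: Tree) -> Tree:
    """Apply transform_label to every internal node, leaving words untouched."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return (transform_label(label), *(transform(c) for c in children))


tree = ("S", ("VP", ("PV", "kataba"),
              ("NP-SBJ-1", ("NOUN", "walad")),
              ("NP-OBJ", ("NOUN", "risala")),
              ("PP-LOC", ("PREP", "fi"), ("NP", ("NOUN", "bayt")))))
print(transform(tree))
```

Because the merged labels (for example `NP-SBJ`) become distinct non-terminals, a parser trained on the transformed trees learns separate rules for subjects, objects and predicates, which is what makes the later dependency assignment easier.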
Federated Conference on Computer Science and Information Systems (FedCSIS 2012), 2012
A number of trainable dependency parsers have been presented in the literature. These parsers require tagged input: this may potentially cause a problem, because taggers are not in general 100% accurate, and any errors in tagging are likely to lead to errors in the output of the parsers. The current paper investigates the relationship between tagging errors and parsing errors. The investigation is carried out on Arabic text, using specific taggers and parsers, but the lessons that can be learned are applicable to other languages and other tools of the same kind.
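One simple way to relate tagging errors to parsing errors is to split the unlabelled attachment score by whether each token's POS tag was correct. The paper's exact method is not described here, so this is only a hypothetical sketch. It assumes aligned token sequences, with heads given as 1-based token indices and 0 for the root. The function name `attachment_by_tag_correctness` is illustrative.

```python
from typing import Dict, List


def attachment_by_tag_correctness(
    gold_tags: List[str],
    pred_tags: List[str],
    gold_heads: List[int],
    pred_heads: List[int],
) -> Dict[str, float]:
    """Unlabelled attachment score (UAS), split by whether each token's POS tag was correct."""
    if not (len(gold_tags) == len(pred_tags) == len(gold_heads) == len(pred_heads)):
        raise ValueError("all sequences must be aligned token by token")
    # Each bucket holds [tokens with the correct head, tokens in the bucket].
    buckets = {"tag_correct": [0, 0], "tag_wrong": [0, 0]}
    for gt, pt, gh, ph in zip(gold_tags, pred_tags, gold_heads, pred_heads):
        key = "tag_correct" if gt == pt else "tag_wrong"
        buckets[key][0] += int(gh == ph)
        buckets[key][1] += 1
    return {key: (c / n if n else 0.0) for key, (c, n) in buckets.items()}


scores = attachment_by_tag_correctness(
    gold_tags=["NN", "VBD", "NN", "IN"],
    pred_tags=["NN", "VBD", "JJ", "IN"],
    gold_heads=[2, 0, 2, 2],
    pred_heads=[2, 0, 1, 2],
)
print(scores)
```

A large gap between the two scores over a test corpus would indicate that tagging errors propagate into attachment errors.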
2014
2006
This paper describes the construction of a dependency bank gold standard for Arabic, DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). For parsing and automatically extracting grammatical and lexical resources from treebanks, it is necessary to evaluate against established gold standard resources. Gold standards for various languages have been developed, but to our knowledge, such a resource has not yet been constructed for Arabic. The construction of the DCU 250 marks the first step towards the creation of an automatic LFG f-structure annotation algorithm for the ATB, and for the extraction of Arabic grammatical and lexical resources.
2009 International Multiconference on Computer Science and Information Technology, 2009
Corpora are a basic information source for various NLP applications, and their construction has become necessary nowadays. This paper presents a tool created to help with the syntactic tagging of an Arabic Treebank. It is based on an already constructed grammar called ArabTAG which ...
2018
This paper presents a methodology for a rule-based bottom-up parsing technique for Modern Standard Arabic (MSA) in the Context Free Grammar (CFG) formalism with a Phrase Structure Grammar (PSG) representation, where the grammar is automatically extracted from a syntactically annotated corpus. The extracted grammar is used to build an automatic lexicon and grammar rules module. Furthermore, the extracted CFG is transformed, also automatically, into a Probabilistic Context Free Grammar (PCFG) that could be used in a hybrid approach. The corpus used is the Penn Arabic Treebank (PATB), and the algorithm is implemented with the Natural Language Toolkit (NLTK). The parser showed that automatic extraction of the grammar improved the grammar-building phase in both coverage of structures and time needed, but the grammar still needs further manual constraints. Automatic extraction of grammar is able to enhance rule-based grammar parsers and it will enable a new paradigm of statistica...
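The grammar-extraction and PCFG steps above can be sketched as follows. One CFG rule is read off per treebank node. Rules whose right-hand side is a word form the lexicon. Each rule then gets a relative-frequency probability, P(A → β) = count(A → β) / count(A). NLTK provides this through `Tree.productions()` and `nltk.induce_pcfg`. The sketch below instead reimplements the idea in plain Python over nested-tuple trees so it runs without NLTK. The names `productions` and `induce_pcfg` here are hypothetical stand-ins, not NLTK's API.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple, Union

# A tree is (label, child1, child2, ...); a leaf child is a plain word string.
Tree = Union[str, Tuple]
Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand-side symbols)


def productions(tree: Tree) -> List[Rule]:
    """Read off one CFG rule per internal node, top-down and left to right."""
    if isinstance(tree, str):
        return []
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for child in children:
        rules.extend(productions(child))
    return rules


def induce_pcfg(trees: List[Tree]) -> Dict[Rule, float]:
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(r for t in trees for r in productions(t))
    lhs_counts: Dict[str, int] = defaultdict(int)
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


treebank = [
    ("S", ("VP", ("PV", "kataba"), ("NP", ("NOUN", "walad")))),
    ("S", ("VP", ("PV", "qara"), ("NP", ("NOUN", "walad")), ("NP", ("NOUN", "kitab")))),
]
for rule, prob in sorted(induce_pcfg(treebank).items()):
    print(rule, round(prob, 3))
```

Relative-frequency estimation is the maximum-likelihood estimate for a PCFG read off a treebank, and it guarantees that the probabilities of all rules sharing a left-hand side sum to one.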
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), 2014
Zenodo (CERN European Organization for Nuclear Research), 2022
ACM Transactions on Asian and Low-Resource Language Information Processing, 2022