2002, The KIPS Transactions: Part B
ABSTRACT Since natural language is inherently structurally ambiguous, one of the difficulties of parsing is resolving structural ambiguity. Recently, probabilistic approaches to this disambiguation problem have received considerable attention because they offer attractions such as automatic learning, wide coverage, and robustness. In this paper, we focus on a probabilistic parsing model for Korean that uses head co-occurrence. Because head co-occurrence is lexical, it is prone to the data sparseness problem, so handling that problem is more important than anything else. To mitigate it, we use a restricted and simplified phrase-structure grammar together with a back-off model for smoothing. The proposed model achieves an accuracy of about 84%.
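A minimal sketch of the kind of back-off smoothing described above, assuming hypothetical count tables (counts_lex, counts_pos, and pos_of are illustrative names, not the authors' code):

```python
from collections import defaultdict

def backoff_prob(head, dep, counts_lex, counts_pos, pos_of, alpha=0.4):
    """P(dep | head): use lexical co-occurrence counts when the pair was seen,
    otherwise back off to part-of-speech co-occurrence, discounted by alpha."""
    if counts_lex[head][dep] > 0:
        return counts_lex[head][dep] / sum(counts_lex[head].values())
    h_pos, d_pos = pos_of[head], pos_of[dep]   # back off to the POS level
    pos_total = sum(counts_pos[h_pos].values())
    if pos_total == 0:
        return 1e-9                            # floor for entirely unseen events
    return alpha * counts_pos[h_pos][d_pos] / pos_total

# Toy example: one seen head-dependent pair, one backed-off pair.
counts_lex = defaultdict(lambda: defaultdict(int))
counts_pos = defaultdict(lambda: defaultdict(int))
counts_lex["먹다"]["사과"] = 3
counts_pos["V"]["N"] = 10
pos_of = {"먹다": "V", "사과": "N", "배": "N"}
print(backoff_prob("먹다", "사과", counts_lex, counts_pos, pos_of))  # lexical estimate
print(backoff_prob("먹다", "배", counts_lex, counts_pos, pos_of))    # backed-off estimate
```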
Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics - ACL '05, 2005
This paper reports the development of log-linear models for disambiguation in wide-coverage HPSG parsing. The estimation of log-linear models requires high computational cost, especially with wide-coverage grammars. Using techniques to reduce the estimation cost, we trained the models using 20 sections of the Penn Treebank. A series of experiments empirically evaluated the estimation techniques, and also examined the performance of the disambiguation models on the parsing of real-world sentences.
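As a sketch (not the paper's implementation, which relies on packed forests and specialized estimation techniques), a log-linear model assigns each candidate parse a probability proportional to the exponentiated weighted sum of its features:

```python
import math

def loglinear_best(candidates, features, weights):
    """argmax over candidates of p(t|s) = exp(w . f(t)) / Z(s),
    where Z(s) normalizes over the candidate set for the sentence."""
    scores = [sum(weights.get(f, 0.0) * v for f, v in features(t).items())
              for t in candidates]
    z = sum(math.exp(s) for s in scores)       # normalizer over the candidate set
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], math.exp(scores[best]) / z
```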
IEEE Transactions on Learning Technologies, 2002
2000
In this paper we argue in favour of an integration between statistically and syntactically based parsing, where syntax is understood in terms of shallow parsing with elementary trees. None of the statistically based analyses produces an accuracy level comparable to the one obtained by means of linguistic rules. Of course, their data refer strictly to English, with the exception of [2, 3, 4]. As to Italian, purely statistically based approaches are inefficient, basically due to the great sparsity of tag distribution: 50% or fewer tags are unambiguous when punctuation is subtracted from the total count, as has been reported. We shall discuss our general statistical and syntactic framework and then report on an experiment with four different setups. The first two approaches are bottom-up driven, i.e. driven by local tag combinations:
A. statistics-only tag disambiguation;
B. statistics plus syntactic biases.
The second two approaches are top-down driven, i.e. driven by syntactic structural cues in terms of elementary trees:
C. syntactic-driven disambiguation with no statistics;
D. syntactic-driven disambiguation with conditional probabilities computed on syntactic constituents (see the sketch after this abstract).
In a preliminary experiment with an automatic tagger, we obtained 99% accuracy on the training set and 98% on the test set using the combined approaches; accuracy for purely statistical tagging is well below 95% even on the training set, and the same applies to purely syntactic tagging.
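A rough sketch of what setup D could look like, under the assumption (ours, not spelled out in the abstract) that the tag score combines word-level statistics with a conditional probability computed on the enclosing syntactic constituent:

```python
def disambiguate(word, constituent, p_tag_given_word, p_tag_given_constituent):
    """Pick the tag maximizing P(tag|word) * P(tag|constituent);
    both tables are hypothetical relative-frequency estimates."""
    candidates = p_tag_given_word[word]        # {tag: probability}
    return max(candidates, key=lambda t:
               candidates[t] * p_tag_given_constituent[constituent].get(t, 1e-6))
```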
This paper presents a new approach to syntactic disambiguation based on lexicalized grammars. While existing disambiguation models decompose the probability of parsing results into that of primitive dependencies of two words, our model selects the most probable parsing result from a set of candidates allowed by a lexicalized grammar. Since parsing results given by the lexicalized grammar cannot be decomposed into independent sub-events, we apply a maximum entropy model for feature forests, which allows probabilistic modeling without the independence assumption. Our approach provides a general method of producing a consistent probabilistic model of parsing results given by lexicalized grammars.
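The contrast can be sketched as follows (a toy formulation with hypothetical feature and probability tables): a decomposed model multiplies independent word-pair dependency probabilities, whereas the maximum entropy model scores each whole candidate and normalizes over the candidate set, so no independence assumption is needed:

```python
import math

def decomposed_prob(parse, dep_prob):
    """Product of primitive two-word dependency probabilities
    (the independence assumption the paper's model avoids)."""
    p = 1.0
    for head, dep in parse.dependencies:
        p *= dep_prob[(head, dep)]
    return p

def maxent_prob(parse, candidates, feats, weights):
    """p(parse | s) with features drawn from the whole parse."""
    def score(t):
        return math.exp(sum(weights.get(f, 0.0) for f in feats(t)))
    return score(parse) / sum(score(t) for t in candidates)
```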
Research on Language and Computation, 2005
This article details our experiments on HPSG parse disambiguation, based on the Redwoods treebank. Using existing and novel stochastic models, we evaluate the usefulness of different information sources for disambiguation – lexical, syntactic, and semantic. We perform careful comparisons of generative and discriminative models using equivalent features and show the consistent advantage of discriminatively trained models. Our best system performs at over 76% sentence exact match accuracy.
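Schematically (a toy contrast under equivalent features, not the Redwoods models themselves), the two model families differ only in where the per-feature scores come from:

```python
def pick_best(candidates, feats, score_of):
    """Select the parse with the highest summed feature score."""
    return max(candidates, key=lambda t: sum(score_of.get(f, 0.0) for f in feats(t)))

# Generative: score_of[f] is a log relative frequency estimated from the
# treebank, so the model maximizes the joint likelihood of trees and features.
# Discriminative: score_of[f] is a weight trained to maximize the conditional
# probability of the correct parse among its competitors, which is where the
# consistent accuracy advantage is observed.
```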
Natural Language Engineering, 1997
In this paper, we introduce a method to represent phrase structure grammars for building a large annotated corpus of Korean syntactic trees. Korean differs from English in word order and word composition. Our study showed that the differences are significant enough to induce meaningful changes in the tree annotation scheme for Korean with respect to the schemes for English. A tree annotation scheme defines the grammar formalism to be assumed, the categories to be used, and the rules that determine correct parses for unsettled issues in parse construction. Korean is partially free in word order, and essential components such as subjects and objects of a sentence can be omitted with greater freedom than in English. We propose a restricted representation of phrase structure grammar to handle these characteristics of Korean more efficiently. The proposed representation is shown, by means of an extensive experiment, to gain improvements in parsing time as well as grammar size. We also describe Teb, a software environment set up with the goal of building a tree-annotated corpus of Korean containing more than one million units.
1993
In this paper we will show that Grammatical Inference is applicable to Natural Language Processing. Given the wide and complex range of structures appearing in an unrestricted Natural Language like English, full Grammatical Inference, yielding a comprehensive syntactic and semantic definition of English, is too much to hope for at present. Instead, we focus on techniques for dealing with ambiguity resolution by probabilistic ranking; this does not require a full formal Chomskyan grammar. We give a short overview of the different levels and methods being investigated at CCALAS for probabilistic ranking of candidates in ambiguous English input. Grammatical Inference from English corpora. An earlier title for this paper was "Overview of grammar acquisition research at CCALAS, Leeds University", but this was modified to avoid the impression of an incoherent set of research strands with no integrated, focussed common techniques or applications. The researchers in our group have no detailed development plan imposed 'from above', but are working on independent PhD programmes; however, there are common theoretical tenets, ideas, and potential applications linking individual projects. In fact, preparing for the Colloquia on Grammatical Inference has helped us to appreciate these overarching, linking themes, as we realised that the definitions stated in the Programme clearly applied to our own work at CCALAS: 'Grammatical Inference ... has suffered from the lack of a focused research community ... Simply stated, the grammatical inference problem is to learn an efficient description that captures the essence of a set of data. This description may be used subsequently to classify data, or to generate further examples of similar data.' The data in our case is unrestricted English input, as exemplified by a Corpus or large collection of text samples. This presents a much harder challenge to Grammatical Inference than artificial languages, or selected examples of well-formed English sentences. The range of lexical items and grammatical constructs appearing in an unrestricted English Corpus is very large; and the problem is not just one of scale. The Corpus-based approach carries with it a blurring of the classical Chomskyan distinction between 'grammatical' and 'ungrammatical' English sentences. Indeed, [Sampson 87] went to the extreme of positing that there is NO boundary between grammatical and ungrammatical sentences in English; this might seem to imply that it is hopeless and even invalid to attempt to infer a grammar for English. Furthermore, the Corpus-based approach eschews the use of 'intuitively constructed' examples in training: a learning algorithm should be trained with 'real' sentences from a Corpus. It would seem to follow from this that we are also proscribed from artificially constructing negative counterexamples for our learning algorithms: we cannot guarantee that such counterexamples are truly illegal.
2004
Abstract We describe how simple, commonly understood statistical models, such as statistical dependency parsers, probabilistic context-free grammars, and word-to-word translation models, can be effectively combined into a unified bilingual parser that jointly searches for the best English parse, Korean parse, and word alignment, where these hidden structures all constrain each other. The model used for parsing is completely factored into the two parsers and the TM, allowing separate parameter estimation.
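Because the model is completely factored, the joint search objective can be sketched as a simple sum of component log-scores (the interfaces below are hypothetical; the actual search is joint over all three hidden structures):

```python
def joint_score(eng_parse, kor_parse, alignment, eng_model, kor_model, tm):
    """Factored bilingual objective: English parse score + Korean parse score
    + word-to-word translation score summed over the alignment links."""
    return (eng_model.logprob(eng_parse)
            + kor_model.logprob(kor_parse)
            + sum(tm.logprob(e, k) for e, k in alignment))
```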
Computational Linguistics, 1995
This article presents an efficient, implemented approach to cross-linguistic parsing based on Government-Binding (GB) Theory (Chomsky 1986) and subsequent work. One of the drawbacks of alternative GB-based parsing approaches is that they generally adopt a filter-based paradigm. These approaches typically generate all possible candidate structures of the sentence that satisfy X-bar theory, and then apply filters to eliminate those structures that violate GB principles. (See, for example, Abney 1989; Correa 1991; Dorr 1993; Fong 1991.) The current approach provides an alternative to filter-based designs that avoids these difficulties by applying principles to descriptions of structures without actually building the structures themselves. Our approach is similar to that of Lin (1993) in that structure-building is deferred until the descriptions satisfy all principles; however, the current approach differs in that it provides a parameterization mechanism along the lines of D...
Linguistik online, 2003
Natural Language is highly ambiguous, on every level. This article describes a fast broad-coverage state-of-the-art parser that uses a carefully handwritten grammar and probability-based machine learning approaches on the syntactic level. It is shown in detail which statistical learning models based on Maximum-Likelihood Estimation (MLE) can support a highly developed linguistic grammar in the disambiguation process.
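As an illustration of the MLE component (a minimal sketch, not the article's parser): relative-frequency estimation of rule probabilities from observed parses, which can then rank the analyses licensed by the hand-written grammar:

```python
from collections import Counter

def mle_rule_probs(observed_rules):
    """observed_rules: list of (lhs, rhs) pairs from disambiguated parses.
    Returns P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(observed_rules)
    lhs_counts = Counter(lhs for lhs, _ in observed_rules)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

# Example: P(NP -> DET N) = 2/3, P(NP -> N) = 1/3
rules = [("NP", ("DET", "N")), ("NP", ("DET", "N")), ("NP", ("N",))]
print(mle_rule_probs(rules))
```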
New Developments in Formal Languages and …, 2008
Computación y Sistemas, 2014
Proceedings of the 12th Conference of the European …, 2009
Computational Linguistics, 2003
Proceedings of the workshop on Speech and Natural Language - HLT '91, 1992