Proceedings of the 2014 International C* Conference on Computer Science & Software Engineering - C3S2E '14
The grammatical structure of natural language shapes and defines nearly every mode of communication, especially in the digital and written form; the misuse of grammar is a common and natural nuisance, and a strategy for automatically detecting mistakes in grammatical syntax presents a challenge worth solving. This thesis research seeks to address the challenge, and in doing so, defines and implements a unique approach that combines machine-learning and statistical natural language processing techniques. Several important methods are established by this research: (1) the automated and systematic generation of grammatical errors and parallel error corpora; (2) the definition and extraction of over 150 features of a sentence; and (3) the application of various machine-learning classification algorithms on extracted feature data, in order to classify and predict the grammaticality of a sentence.

I express my greatest gratitude to my supervisor, Dr. Eric Harley, for introducing and piquing my interest in the topic; I am humbled and grateful for his enduring assistance, tireless patience, and thoughtful encouragement. He has provided advice and direction, especially where I have encountered pause or hesitation, and has inspired new ideas and avenues for exploration within this research. I am thankful for his endless support. I also extend thanks to the members of my thesis dissertation committee, Dr. Alex Ferworn, Dr. Cherie Ding, and Dr. Isaac Woungang, for their time and effort in reviewing my work. Their valuable feedback and insights have served to improve the relevancy and composition of this thesis, as well as my academic mettle. Lastly, I wish to convey my appreciation to the Department of Computer Science at Ryerson University, the faculty and staff, who have instructed and encouraged me to pursue my academic goals along the way.
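The first method above — systematic generation of parallel error corpora — can be sketched with a single toy corruption rule. The rule here (deleting the first article) is an illustrative assumption, not the thesis's actual transformation set:

```python
# Minimal sketch of automated error-corpus generation, assuming one
# corruption rule: delete the first article to make a well-formed
# sentence ungrammatical, yielding a parallel (clean, corrupted) pair.
ARTICLES = {"a", "an", "the"}

def inject_article_deletion(sentence):
    """Return a (original, corrupted) pair, or None if no article is present."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok.lower() in ARTICLES:
            corrupted = tokens[:i] + tokens[i + 1:]
            return sentence, " ".join(corrupted)
    return None

pair = inject_article_deletion("The cat sat on the mat")
# pair == ("The cat sat on the mat", "cat sat on the mat")
```

A real pipeline would apply many such rules across a large corpus, keeping each clean sentence aligned with its corrupted counterpart.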
2020
Grammatical Error Correction (GEC) is the task of correcting different types of errors in written texts. To manage this task, large amounts of annotated data that contain erroneous sentences are required. This data, however, is usually annotated according to each annotator's standards, making it difficult to manage multiple sets of data at the same time. The recently introduced Error Annotation Toolkit (ERRANT) tackled this problem by presenting a way to automatically annotate data that contain grammatical errors, while also providing a standardisation for annotation. ERRANT extracts the errors and classifies them into error types, in the form of an edit that can be used in the creation of GEC systems, as well as for grammatical error analysis. However, we observe that certain errors are falsely or ambiguously classified. This could obstruct any qualitative or quantitative grammatical error type analysis, as the results would be inaccurate. In this work, we use a sample of the F...
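The edits ERRANT extracts can be pictured as a span, a correction, and an error-type label. The sketch below hard-codes one such edit rather than computing the alignment; the type code "R:VERB:SVA" (replacement fixing subject-verb agreement) follows ERRANT's labelling scheme:

```python
# Sketch of an ERRANT-style edit: a source token span, a correction,
# and an error-type label. The alignment is hard-coded for illustration.
from dataclasses import dataclass

@dataclass
class Edit:
    start: int        # token index where the edit begins in the source
    end: int          # token index where it ends (exclusive)
    correction: str   # replacement text ("" means deletion)
    error_type: str   # e.g. "R:VERB:SVA" (subject-verb agreement)

source = "She have a dog".split()
edit = Edit(start=1, end=2, correction="has", error_type="R:VERB:SVA")
corrected = source[:edit.start] + [edit.correction] + source[edit.end:]
# " ".join(corrected) == "She has a dog"
```

Misclassification of the `error_type` field is exactly what would skew a downstream error-type analysis.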
2007
This paper compares a deep and a shallow processing approach to the problem of classifying a sentence as grammatically well-formed or ill-formed. The deep processing approach uses the XLE LFG parser and English grammar: two versions are presented, one which uses the XLE directly to perform the classification, and another which uses a decision tree trained on features consisting of the XLE's output statistics. The shallow processing approach predicts grammaticality based on n-gram frequency statistics: we present two versions, one which uses frequency thresholds and one which uses a decision tree trained on the frequencies of the rarest n-grams in the input sentence. We find that the use of a decision tree improves on the basic approach only for the deep parser-based approach. We also show that combining both the shallow and deep decision tree features is effective. Our evaluation is carried out using a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting grammatical errors into well-formed BNC sentences.
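The shallow frequency-threshold version can be sketched in a few lines: flag a sentence as ungrammatical when its rarest n-gram falls below a threshold. The bigram counts here are toy values standing in for BNC-scale statistics:

```python
# Sketch of the shallow approach: classify a sentence as ungrammatical
# when its rarest bigram is below a frequency threshold. Counts are toy.
def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

corpus_counts = {  # hypothetical corpus bigram frequencies
    ("the", "cat"): 120, ("cat", "sat"): 45, ("sat", "on"): 300,
    ("on", "the"): 900, ("the", "mat"): 80,
}

def is_grammatical(sentence, threshold=5):
    toks = sentence.lower().split()
    rarest = min(corpus_counts.get(bg, 0) for bg in bigrams(toks))
    return rarest >= threshold

is_grammatical("the cat sat on the mat")   # True: all bigrams frequent
is_grammatical("the cat mat on the sat")   # False: unseen bigrams count as 0
```

The decision-tree variant replaces the fixed threshold with a tree trained on the rarest-n-gram frequencies themselves.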
Some grammatical error detection methods, including the ones currently used by the Educational Testing Service's e-rater system, are tuned for precision because of the perceived high cost of false positives (i.e., marking fluent English as ungrammatical). Precision, however, is not optimal for all tasks, particularly the HOO 2012 Shared Task on grammatical errors, which uses F-score for evaluation. In this paper, we extend e-rater's preposition and determiner error detection modules with a large-scale n-gram method that complements the existing rule-based and classifier-based methods. On the HOO 2012 Shared Task, the hybrid method performed better than its component methods in terms of F-score, and it was competitive with submissions from other HOO 2012 participants.
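The precision-versus-F-score tension above is easy to see numerically. The counts below are illustrative, not taken from the paper:

```python
# Sketch: a precision-tuned detector vs a more aggressive hybrid under
# F-score evaluation. The counts are illustrative assumptions.
def precision_recall_f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Conservative detector: few false positives, but misses many errors.
conservative = precision_recall_f1(tp=10, fp=1, fn=40)   # high P, low R
# Aggressive hybrid: more false positives, far fewer misses.
aggressive = precision_recall_f1(tp=35, fp=15, fn=15)    # lower P, higher F1
```

Under F-score, the aggressive configuration wins even though its precision is lower — which is why a precision-tuned system is at a disadvantage on an F-score task.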
Many evaluation issues for grammatical error detection have previously been overlooked, making it hard to draw meaningful comparisons between different approaches, even when they are evaluated on the same corpus. To begin with, the three-way contingency between a writer's sentence, the annotator's correction, and the system's output makes evaluation more complex than in some other NLP tasks, which we address by presenting an intuitive evaluation scheme. Of particular importance to error detection is the skew of the data – the low frequency of errors as compared to non-errors – which distorts some traditional measures of performance and limits their usefulness, leading us to recommend the reporting of raw measurements (true positives, false negatives, false positives, true negatives). Other issues that are particularly vexing for error detection focus on defining these raw measurements: specifying the size or scope of an error, properly treating errors as graded rather than discrete phenomena, and counting non-errors. We discuss recommendations for best practices with regard to reporting the results of system evaluation for these cases, recommendations which depend upon making clear one's assumptions and applications for error detection. By highlighting the problems with current error detection evaluation, the field will be better able to move forward.
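The skew problem motivating the raw-counts recommendation can be shown with toy figures (1,000 tokens, 20 errors — assumed numbers, not from the paper):

```python
# Sketch: under heavy class skew, accuracy hides detection failure,
# which is why reporting raw counts (TP, FP, FN, TN) is recommended.
def report(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "accuracy": accuracy}

# A system that flags nothing, on 1000 tokens containing 20 errors:
do_nothing = report(tp=0, fp=0, fn=20, tn=980)
# accuracy == 0.98 despite detecting zero errors; the raw counts
# (TP=0, FN=20) make the failure visible where accuracy does not.
```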
2009
A classifier which is capable of distinguishing a syntactically well-formed sentence from a syntactically ill-formed one has the potential to be useful in an L2 language-learning context. In this article, we describe a classifier which classifies English sentences as either well-formed or ill-formed using information gleaned from three different natural language processing techniques. We describe the issues involved in acquiring data to train such a classifier and present experimental results for this classifier on a variety of ill-formed sentences. We demonstrate that (a) the combination of information from a variety of linguistic sources is helpful, (b) the trade-off between accuracy on well-formed sentences and accuracy on ill-formed sentences can be fine-tuned by training multiple classifiers in a voting scheme, and (c) the performance of the classifier varies, with better performance on transcribed spoken sentences produced by less advanced language learners.
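The voting scheme in point (b) amounts to requiring k of n component classifiers to agree. The component classifiers below are toy stand-ins, not the article's three NLP techniques:

```python
# Sketch of the voting trade-off: flag a sentence as ill-formed when at
# least k of n classifiers vote "ill-formed". Component classifiers are
# toy heuristics standing in for the article's real models.
def vote_ill_formed(classifiers, sentence, k):
    """Flag the sentence when at least k classifiers call it ill-formed."""
    votes = sum(1 for clf in classifiers if clf(sentence))
    return votes >= k

clfs = [  # hypothetical component classifiers (True = ill-formed)
    lambda s: "has" not in s,       # toy heuristic 1
    lambda s: len(s.split()) < 3,   # toy heuristic 2
    lambda s: s != s.capitalize(),  # toy heuristic 3
]
# k=1 is aggressive (higher recall on ill-formed sentences);
# k=3 is conservative (higher accuracy on well-formed sentences).
```

Sweeping k from 1 to n traces out the accuracy trade-off curve the article describes.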
2010
This paper proposes a novel approach to the problem of training classifiers to detect and correct grammar and usage errors in text by selectively introducing mistakes into the training data. When training a classifier, we would like the distribution of examples seen in training to be as similar as possible to the one seen in testing. In error correction problems, such as correcting mistakes made by second language learners, a system is generally trained on correct data, since annotating data for training is expensive.
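Selectively introducing mistakes can be sketched as corrupting clean text at a chosen rate. The single confusion pair ("their"/"there") and the rate are illustrative assumptions:

```python
# Sketch: inject errors into clean training text at a controlled rate,
# so the training distribution resembles the test-time distribution.
# The confusion pair and rate are illustrative, not from the paper.
import random

def corrupt(tokens, error_rate, rng):
    out = []
    for tok in tokens:
        if tok == "their" and rng.random() < error_rate:
            out.append("there")   # simulated learner confusion
        else:
            out.append(tok)
    return out

rng = random.Random(0)
clean = "they lost their keys".split()
noisy = corrupt(clean, error_rate=1.0, rng=rng)
# noisy == ["they", "lost", "there", "keys"]
```

In practice the rate would be set per error type, estimated from learner data rather than fixed at 1.0.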
Proceedings of CoNLL, 2020
We present a method for classifying syntactic errors in learner language, namely errors whose correction alters the morphosyntactic structure of a sentence. The methodology builds on the established Universal Dependencies syntactic representation scheme, and provides complementary information to other error-classification systems. Unlike existing error classification methods, our method is applicable across languages, which we showcase by producing a detailed picture of syntactic errors in learner English and learner Russian. We further demonstrate the utility of the methodology for analyzing the outputs of leading Grammatical Error Correction (GEC) systems.
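The core test — does the correction alter the morphosyntactic structure? — can be sketched by comparing dependency heads and labels before and after correction. The parses below are hard-coded (token, head index, deprel) triples; a real system would obtain them from a UD parser:

```python
# Sketch: call an error "syntactic" when correcting it changes the
# Universal Dependencies tree (heads or relation labels). Parses are
# hard-coded triples; a real pipeline would run a UD parser.
def syntactic_change(parse_src, parse_cor):
    """True when head indices or dependency labels differ between parses."""
    return [(h, d) for _, h, d in parse_src] != [(h, d) for _, h, d in parse_cor]

# "He like it" -> "He likes it": the word form changes, the tree does not.
src = [("He", 2, "nsubj"), ("like", 0, "root"), ("it", 2, "obj")]
cor = [("He", 2, "nsubj"), ("likes", 0, "root"), ("it", 2, "obj")]
syntactic_change(src, cor)   # False: morphological, not syntactic
```

Because UD uses one relation inventory across languages, the same comparison applies unchanged to learner English and learner Russian.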
2020
1,2,3 Student, Dept. of Computer Engineering, Vidyalankar Institute of Technology, Mumbai, India; 4 Professor, Dept. of Computer Engineering, Vidyalankar Institute of Technology, Mumbai, India. This paper identifies and examines the key principles underlying the building of a state-of-the-art grammatical error correction system. Techniques used include rule-based, syntax-based, statistical, classification, and neural-network approaches. This paper presents previous work on Grammatical Error Correction and Detection systems and the challenges related to these systems, and finally suggests future directions. We also present a possible scheme for the classification of grammar errors. Among our main observations, we found that efficient and robust grammar-checking tools are scarce for real-time applications. Natural Language consists of the many sentence...
1987
The Constituent Likelihood Automatic Word-tagging System (CLAWS) was originally designed for the low-level grammatical analysis of the million-word LOB Corpus of English text samples. CLAWS does not attempt a full parse, but uses a first-order Markov model of language to assign word-class labels to words. CLAWS can be modified to detect grammatical errors, essentially by flagging unlikely word-class transitions in the input text. This may seem an intuitively implausible and theoretically inadequate model of natural language syntax, but it can nevertheless successfully pinpoint most grammatical errors in a text. Several modifications to CLAWS have been explored. The resulting system cannot detect all errors in typed documents; but then neither do far more complex systems that attempt a full parse, requiring much greater computation.
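Flagging unlikely word-class transitions under a first-order Markov model can be sketched with a small transition table. The probabilities are toy stand-ins for counts estimated from a tagged corpus like LOB:

```python
# Sketch of the CLAWS-style check: flag positions where the POS-tag
# bigram is improbable under a first-order Markov model. The transition
# table is a toy stand-in for corpus-estimated probabilities.
transition_prob = {  # hypothetical P(tag2 | tag1)
    ("DET", "NOUN"): 0.60, ("DET", "ADJ"): 0.30,
    ("NOUN", "VERB"): 0.50, ("VERB", "DET"): 0.40,
    ("DET", "VERB"): 0.01,
}

def flag_unlikely(tags, threshold=0.05):
    """Return indices where the tag transition falls below the threshold."""
    return [i for i, pair in enumerate(zip(tags, tags[1:]))
            if transition_prob.get(pair, 0.0) < threshold]

flag_unlikely(["DET", "NOUN", "VERB"])   # []: all transitions plausible
flag_unlikely(["DET", "VERB"])           # [0]: DET -> VERB is unlikely
```

No parse tree is built at any point — the model sees only adjacent tag pairs, which is exactly why it is cheap compared with full-parse systems.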
2009
Applications like word processors and other writing tools typically include a grammar checker. The purpose of a grammar checker is to identify sentences that are grammatically incorrect based on the syntax of the language. The proposed grammar checker is a rule-based system that identifies sentences most likely to contain errors. The set of rules is automatically generated from a part-of-speech-tagged corpus. The result from the grammar checker is a list of error sentences, error descriptions, and suggested corrections. A grammar checker for other languages can be similarly constructed, given a tagged corpus and a set of stop words.
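Generating rules automatically from a tagged corpus can be sketched as collecting the tag bigrams the corpus attests and reporting any unseen bigram in new text. The two-sentence corpus is an illustrative assumption:

```python
# Sketch: derive checking rules from a POS-tagged corpus by treating
# attested tag bigrams as "allowed" and flagging unseen ones in new
# text. The tiny corpus is an illustrative assumption.
tagged_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
allowed = {(t1, t2)
           for sent in tagged_corpus
           for (_, t1), (_, t2) in zip(sent, sent[1:])}

def check(tagged_sentence):
    """Return (index, tag pair) for transitions never seen in the corpus."""
    return [(i, (t1, t2))
            for i, ((_, t1), (_, t2)) in enumerate(
                zip(tagged_sentence, tagged_sentence[1:]))
            if (t1, t2) not in allowed]

check([("the", "DET"), ("barks", "VERB")])   # [(0, ("DET", "VERB"))]
```

A production system would also attach an error description and a suggested correction to each flagged transition, as the abstract describes.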