
Classification and Generation of Grammatical Errors

2008, Proceedings of the 2014 International C* Conference on Computer Science & Software Engineering - C3S2E '14

The grammatical structure of natural language shapes and defines nearly every mode of communication, especially in digital and written form. The misuse of grammar is a common and natural nuisance, and a strategy for automatically detecting mistakes in grammatical syntax presents a challenge worth solving. This thesis research seeks to address that challenge and, in doing so, defines and implements a unique approach that combines machine-learning and statistical natural language processing techniques. Several important methods are established by this research: (1) the automated and systematic generation of grammatical errors and parallel error corpora; (2) the definition and extraction of over 150 features of a sentence; and (3) the application of various machine-learning classification algorithms to the extracted feature data, in order to classify and predict the grammaticality of a sentence.

I express my greatest gratitude to my supervisor, Dr. Eric Harley, for introducing me to the topic and piquing my interest in it; I am humbled by and grateful for his enduring assistance, tireless patience, and thoughtful encouragement. He has provided advice and direction, especially where I have encountered pause or hesitation, and has inspired new ideas and avenues for exploration within this research. I am thankful for his endless support. I also extend thanks to the members of my thesis committee, Dr. Alex Ferworn, Dr. Cherie Ding, and Dr. Isaac Woungang, for their time and effort in reviewing my work. Their valuable feedback and insights have served to improve the relevancy and composition of this thesis, as well as my academic mettle. Lastly, I wish to convey my appreciation to the faculty and staff of the Department of Computer Science at Ryerson University, who have instructed and encouraged me to pursue my academic goals along the way.