Academia.edu

INTRODUCTION TO CORPUS LINGUISTICS

Abstract

Corpus linguistics emerges as an important methodology in linguistics, characterized by a variety of methods that have evolved with advancements in computer technology. The paper traces the historical development of corpus linguistics, from its initial formulation in the early 1980s to its current state, highlighting key contributions such as the Brown Corpus, which was the first electronic collection of English texts. The definition of a corpus is explored, distinguishing between different types, such as sample and monitor corpora, as well as annotated and unannotated corpora, emphasizing their significance in linguistic research.

Key takeaways

  • However, as they further observe, in the late 1950s the corpus methodology was severely criticised and became marginalised. Developments in computer technology then made the exploitation of massive corpora possible, and the marriage of corpora with computer technology revived interest in the corpus methodology.
  • Sinclair (1991) distinguishes two types of corpora: the sample corpus and the monitor corpus.
  • Květoň and Oliva (2002:19) observe that "the quality of corpus annotation is certainly among the pressing problems in current corpus linguistics."
  • Among others, there is also the problem of how representative a given corpus is, and the problem of what size it should have in order to be representative. Kohnen (2007) notes that a first major difficulty in corpus linguistics is connected with corpus size, as it is not known exactly how large corpora must be in order to qualify for valid linguistic research.
  • Corpus linguistics and corpora have been with us since the early 1960s, when linguistic research first started to be assisted by computers.