Corpus linguistics emerges as an important methodology in linguistics, characterized by a variety of methods that have evolved with advancements in computer technology. The paper traces the historical development of corpus linguistics, from its initial formulation in the early 1980s to its current state, highlighting key contributions such as the Brown Corpus, which was the first electronic collection of English texts. The definition of a corpus is explored, distinguishing between different types, such as sample and monitor corpora, as well as annotated and unannotated corpora, emphasizing their significance in linguistic research.
Stubbs (2006), in his state-of-the-art overview, draws attention to the frequent reticence or vagueness of corpus analysts in discussing their operational methods within a scientific context (a context addressed in detail in Partington (forthcoming)). This lack of clarity about the methodological framework employed is perhaps most surprising given that corpus linguistics situates itself within a scientific frame and lays such claims to a scientific nature. This brief paper, then, addresses the question posed in its title, namely, “What is Corpus Linguistics?” (is it a discipline, a methodology, a paradigm, or none or all of these?), but does not attempt to offer any definitive answers. Rather, the aim is to present the reader with a number of observations on how corpus linguistics has been construed in its own literature and then to leave the question open, in the hope of stimulating further discussion.
Corpus Pragmatics
Alongside advances in technology and the advent of computers in language studies, we have witnessed a boom in new books on Corpus Linguistics (see, for example, Dash & Ramamoorthy, 2019; Paquot & Gries, 2020; Seoane & Biber, 2021). Among the informative books in this fast-growing field is the current one, authored by Barth and Schnell in 2022. The work is organized in 11 chapters, which provide readers with state-of-the-art concepts of theory and practice for conducting research in Corpus Linguistics. The first two chapters function as an introduction in which the authors succinctly shed light on the basic concept of a corpus, its divergence from other approaches, and its convergence with other usage-oriented fields within linguistics such as Sociolinguistics, Linguistic Typology and Language Change. The authors provide the reader with definitions of corpus and Corpus Linguistics, of word, lexeme, type and token, and of some basic statistical concepts such as mode, mean and median. Later on, the authors distinguish between structural context, syntagmatic context and constructional context in order to delineate the role of context in corpora. There are different types of corpora with specific composition criteria, which need to be delineated for readers. In this regard, Chapter three, which is thematically divided into two parts, is a detailed description of corpus composition criteria and typology. In the first part, the authors enumerate size, balance, representativeness, authenticity and spontaneity as the core criteria for compiling a corpus. Furthermore, a subtle distinction is made between raw, primary
Library Hi Tech, 2018
Purpose: The purpose of this paper is to generate awareness of and interest in the techniques used in computer-based corpus linguistics, focusing on their methodological implications for research in library and information science (LIS). Design/methodology/approach: This methodology paper provides an overview of computer-based corpus linguistics, describes the main techniques used in this field, assesses its strengths and weaknesses, and presents examples to illustrate the value of corpus linguistics to LIS research. Findings: Overall, corpus-based techniques are simple, yet powerful, and they support both quantitative and qualitative analyses. While corpus methods alone may not be sufficient for research in LIS, they can be used to complement and to help triangulate the findings of other methods. Corpus linguistics techniques also have the potential to be exploited more fully in LIS research that involves a higher degree of automation (e.g. recommender systems, knowledge discovery sys...
2011
Perspectives on Corpus Linguistics is a collection of interviews with fourteen well-known researchers in the field of linguistics. Each interview consists of a set of ten questions: the first seven are common to all contributors while the last three are connected to the research experience of each guest. In the general questions, the invited scholars explore (sometimes controversial) topics such as the concept of representativeness, the role of intuition and the status of Corpus Linguistics. In the specific questions, they provide a thorough discussion of materials and methods in corpus research as well as theoretical and applied perspectives on the use of corpora in language studies. Whether experts or novices, the volume should be of interest to all those who want to learn about corpus linguistics and carry out research in this fascinating and growing area.
International Journal of Corpus Linguistics, 2001
During the last decade, it has been common practice among the linguistic community in Europe, both on the continent and in the British Isles, to use corpus linguistics to verify the results of classical linguistics. In North America, however, the situation is different. There, the Philadelphia-based Linguistic Data Consortium, responsible for the dissemination of language resources, is addressing the commercially oriented market of language engineering rather than academic research, the latter often being more interested in universal grammar or semantic universals than in the idiosyncrasies of natural languages. American corpus linguists such as Doug Biber or Nancy Ide and general linguists who are corpus users by conviction such as Charles Fillmore are almost better known in Europe than in the United States, which is even more astonishing when we take into account that the first real corpus in the modern sense, the Brown Corpus, was compiled in Providence, R.I., during the sixties. Meanwhile, European corpus linguistics is gradually becoming a subdiscipline in its own right. Unfortunately, during the last few years, this has led to a slight bias towards 'self-centred' issues such as the problems of corpus compilation, encoding, annotation and validation, the procedures needed for transforming raw corpus data into artificial intelligence applications and automatic language processing software, not to mention the problem of standardisation with regard to form and content (cf. the long-term project EAGLES [Expert Advisory Group on Language Engineering Standards] and
Special Issue Revista Española de Lingüística Aplicada, 25, 2012
Corpus analysis is an area of research that has broadened the scope of a number of different fields of language analysis. One aspect of this research is quantitative. For more than sixty years, linguists have demonstrated that language features can be counted and frequencies calculated, and that these data are useful for the interpretation and understanding of language. For this reason, corpus analysis has been used in several fields of knowledge to support or challenge hypotheses and theories. In this volume our intention is to show that corpus analysis not only deals with large amounts of numbers and quantities but also comprises studies that combine quantitative and qualitative analysis. Although the various writers use current techniques to compile and investigate corpora, our main interest is in how researchers apply corpus analysis. To this end, we include papers that cover a range of issues. The discourse types investigated include academic discourse, literary texts and teaching materials. The papers explore topics such as modality, cognition, language learning, lexicography, terminology, and typologies, and employ approaches ranging from comparative analysis to genre studies. Taken together, the papers in this special issue have been selected to provide readers with an example of how researchers are developing and exploiting corpus methods to improve linguistic research.
2001
The appearance of not one but two introductions to corpus linguistics within the same series shows the maturation and diversification of this fledgling subdiscipline within linguistics. McEnery and Wilson offer an overview or annotated report on work done within the computer-corpus research paradigm, including computational linguistics, whereas Barnbrook offers a guide or manual on the procedures and methodology of corpus linguistics, particularly with regard to machine-readable texts in English and to the type of results thereby generated.
2009
Corpus linguistics is one of the fastest-growing methodologies in contemporary linguistics. In a conversational format, this article answers a few questions that corpus linguists regularly face from linguists who have not used corpus-based methods so far. It discusses some of the central assumptions ('formal distributional differences reflect functional differences'), notions (corpora, representativity and balancedness, markup and annotation), and methods of corpus linguistics (frequency lists, concordances, collocations), and discusses a few ways in which the discipline still needs to mature.
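The three methods named in this abstract (frequency lists, concordances, collocations) can be illustrated with a minimal Python sketch. It is not taken from the article: the function names, the toy sample text, and the naive regex tokenizer are all assumptions made for illustration, and real corpus tools use more careful tokenization and statistical association measures rather than raw co-occurrence counts.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase alphabetic strings only.
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(tokens, n=5):
    # Frequency list: the n most common word forms with their counts.
    return Counter(tokens).most_common(n)

def concordance(tokens, node, width=3):
    # Key-word-in-context lines: (left context, node word, right context).
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def collocates(tokens, node, span=2):
    # Raw co-occurrence counts of words within `span` tokens of the node.
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical toy "corpus" for demonstration only.
sample = "The corpus is large. The corpus is balanced. A corpus of texts."
tokens = tokenize(sample)

print(frequency_list(tokens, 3))
for left, node, right in concordance(tokens, "corpus", width=2):
    print(f"{left:>12} [{node}] {right}")
print(collocates(tokens, "corpus", span=1).most_common(2))
```

Raw counts are used for collocates here only to keep the sketch short; corpus studies typically rank collocates with an association measure instead.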
In this paper I attempt to describe and characterise Corpus Linguistics (hereafter CL); define important terms used in the field; point out different perspectives; show applications; present its limitations; and, above all, examine the contribution of CL to linguistic research, particularly for the production of educational materials.
A catalogue record for this publication is available from the British Library. ISBN 978-1-108-74485-0 Paperback ISSN 2632-8097 (online) ISSN 2632-8089 (print)
English for Specific Purposes, 1994
This paper is an overview of the application of corpus linguistics methodologies, with special reference to the field of cross-cultural studies. It discusses the application of corpus techniques in the study of grammar, semantics, evaluation, contemporary language evolution, translation, discourse studies and cross-cultural issues. While some of these linguistic aspects are investigated by resorting to general corpora or, more precisely, ‘heterogeneric’ corpora, more specific research objectives may be achieved by compiling ‘monogeneric’, that is, ‘ad-hoc’ specialized corpora. Empirical data provided by both types of corpora may help cross-cultural studies to become more systematic in detecting the shifts in cultural practices as reflected in language.
UAD TEFL International Conference
In this digital era, the role of computer technology as a resource for the instruction of foreign language learners is increasing as educators recognise its ability to produce both independent and collaborative learning environments. Computer technologies such as the Internet, multimedia, and hypermedia have been introduced in English Language Learning and Teaching (ELLT) to foster the language learning process, all of which fall under the category of Computer-Assisted Language Learning (CALL). Corpus linguistics is a systematic analysis of the actual (real) production of language (either spoken or written), in which texts are assembled using computer technology (concordancer) to form a large collection of authentic texts, called a corpus (plural: corpora), that comes in various sizes. Despite immense research on corpus linguistics in recent decades, the potentials and limitations of Data-Driven Learning (DDL), the application of corpus linguistics in ELLT h...
System, 2022
As a cutting-edge and rapidly developing area in modern language research and teaching, corpus linguistics (CL) can support the English for Academic Purposes (EAP) community by equipping researchers, practitioners and students with the knowledge of academic language. The timely book Corpus Linguistics for English for Academic Purposes, by Vander Viana and Aisling O'Boyle, investigates the CL-EAP interface and provides a detailed discussion of the key concepts, practices and research applications of CL which are relevant to the EAP community. As corpus linguists and academic English writers, we found the book informative: we had already recognized the merits of applying CL methods and resources to language teaching and learning, yet few volumes have looked specifically into how to use CL for EAP. Accordingly, this book fills the gap in the current knowledge regarding this interface and is thus expected to contribute to the ever-expanding areas of CL and EAP. The book consists of ten chapters. To give readers a clear picture, we suggest that the book can be divided into two large parts, apart from the concluding chapter. The first six chapters provide a comprehensive and reader-friendly basis for novices in the field of EAP (Ch. 1-2), CL and the CL-EAP interface (Ch. 3-6). These chapters take up about half of the volume. The next three chapters (Ch. 7-9), from our perspective, constitute the fascinating part and the essence of this book, where the authors introduce corpus studies on spoken, written, and computer-mediated academic discourse (CMAD) respectively. To be more specific, in the first part (Ch. 1-6), the first two chapters concern the topic under investigation, i.e., EAP. Referring to key seminal works, the authors describe how EAP, once a branch of English for Specific Purposes (ESP), has since grown into a full-fledged discipline in its own right, and point out that CL is one of the five important approaches to EAP.
Chapters 3 and 4 describe the interface between the two fields of CL and EAP. It is noteworthy that the authors present a number of freely available EAP corpora in Section 3.4, including those in American English, British English and English as a Lingua Franca. In Chapter 3, the authors also pay particular attention to the Corpus of Journal Articles (CJA) 2014, and provide a case study on the synonyms of the sequence "this article" (p. 41), which serves to give the reader a first impression of a practical corpus-based EAP study. Chapter 5 touches upon corpus compilation. Because compiling a large-scale balanced corpus is time-consuming and costly, general linguists adopting a corpus-based approach usually do not compile a corpus themselves, yet doing so is feasible for EAP research. Various aspects of corpus compilation are discussed, including 25 potential criteria in the process, sampling, balance, size, ethical matters, and so on. The role of Chapter 6 is twofold. On the one hand, it serves as an introduction to some basic concepts put forward by the traditional corpus linguistic approach (also known as the Neo-Firthian or Birmingham school), such as concordance, collocation, and the like. On the other hand, it can be seen as a user manual for the corpus software AntConc, aimed at beginners. However, one point worth noting is that the latest version of AntConc seems to have undergone a significant change, with the underlying code completely rewritten. One consequence is that it no longer supports the standard .xml format of corpora introduced in the book. Thus, readers might find it hard to consult the book if they attempt to use the
Glottometrics, 2017
The study aims to conduct a bibliometric analysis of corpus-related studies in linguistics between 2000 and 2015. Results show that the output of corpus-related publications has increased significantly in the past 15 years. In addition, traditional scientific powers such as the United States play leading roles in the area, while developing countries such as China also exert their impact. More importantly, findings reveal that corpora have permeated a wide range of research areas in linguistics and have changed these areas, at least in terms of methodology.
Asian-Pacific Journal of Second and Foreign Language Education, 2019
In this paper we argue that corpus linguistics needs to expand to cover a wider set of languages. While the reasons that some languages have not been provided with corpus data to date are clear, the intellectual and moral imperative to extend the range of corpus linguistics is strong. However, there are technical problems to be faced in such an extension of corpus linguistics. These problems are reviewed here and possible solutions to them explored. Following on from this, we consider what possible benefits the provision of appropriate corpus data may bring to languages currently untouched by the development of corpus linguistics.