Corpus linguistics provides the methodology to extract meaning from texts. Taking as its starting point the fact that language is not a mirror of reality but lets us share what we know, believe and think about reality, it focuses on language as a social phenomenon, and makes visible the attitudes and beliefs expressed by the members of a discourse community.
The Routledge Handbook of Discourse Analysis, 2023
Synergy between corpus linguistics and discourse analysis
Discourse analysis covers a vast range of areas and is also one of the least clearly defined fields in applied linguistics. An early conceptualisation is provided in Schiffrin et al. (2001), who define discourse in the following terms: (1) language in use, (2) language structure beyond the sentence level, and (3) social practices and ideologies associated with language. Blommaert (2005: 2) notes that, traditionally, discourse has been treated in linguistic terms as 'language-in-use', informing areas such as pragmatics and speech act theory. However, for Blommaert discourse has a wider interpretation as 'language-in-action', i.e., 'meaningful symbolic behaviour', as representing social practices and ideologies. A useful distinction is made by Gee (2005), who defines the 'language-in-use' aspect as 'discourse' (with a little 'd') and the more 'language-in-action' orientation as 'Discourse' (with a capital D), involving not only linguistic practices but other semiotic elements. Discourses are created through recognition work of 'ways with words, actions, beliefs, emotions, values, interactions, people, objects, tools and technologies' that constitute a way of being a member of a particular discourse community (Gee, 2005: 20). Corpus linguistics is a field of enquiry whose essential nature, like that of discourse analysis, has also come under scrutiny. The main contention revolves around 'corpus-driven' vs. 'corpus-based' linguistics and whether corpus linguistics is a theory or a methodology. The 'corpus-driven' approach sees corpus linguistics as essentially a theory, seeking to describe the corpus as comprehensively as possible without being influenced by preconceptions and existing theories (Tognini Bonelli, 2001). The corpus-based approach, on the other hand, views corpus linguistics as a methodology for validating existing descriptions of language and making adjustments where necessary.
In spite of these different theoretical positions, corpus linguistics is generally regarded as a methodology, and 'corpus-based' is used as an umbrella term for a range of corpus enquiries, which is the sense adopted in this chapter. Criticisms have been levelled against both corpus linguistic analysis and critical discourse analysis (CDA), with those of the latter applying equally to discourse analysis in general. CDA may rely on a small set of arbitrarily selected texts which lack representativeness (Stubbs, 1996), the analysis may be overly informed by the analyst's subjective preconceptions (Widdowson, 2000) and the approach is mainly qualitative. The main limitations of corpus-based analyses are that the texts present decontextualised examples of language use and the field does not easily
Critical Discourse Studies, 2019
This paper is an overview of the application of corpus linguistics methodologies, with special reference to the field of cross-cultural studies. It discusses the application of corpus techniques in the study of grammar, semantics, evaluation, contemporary language evolution, translation, discourse studies and cross-cultural issues. While some of these linguistic aspects are investigated by resorting to general corpora or, more precisely, ‘heterogeneric’ corpora, more specific research objectives may be achieved by compiling ‘monogeneric’, that is, ‘ad-hoc’ specialized corpora. Empirical data provided by both types of corpora may help cross-cultural studies to become more systematic in detecting the shifts in cultural practices as reflected in language.
Academic Writing
This volume explores the interaction between two traditions of investigating written academic prose that might broadly be called 'discourse analysis' and 'corpus linguistics'. The two traditions have much in common. Both take selected examples of naturally occurring discourse as their starting point. Both attempt to identify recurring patterns in those examples. Both relate their findings to the social, intellectual or ideological contexts in which the discourse plays a role. The priorities of the two approaches do tend to diverge, ...
Asian-Pacific Journal of Second and Foreign Language Education, 2019
Proceedings of EUROPHRAS 2017, London, 2017
This working paper presents the progress made thus far in the development of a corpus-lexicographical approach to discourse analysis, more specifically the application of Hanks' [5, 6] Corpus Pattern Analysis (CPA) procedure to a (critical) discourse analysis task. The theoretical basis of CPA is explained, followed by some practical applications of CPA, namely lexicography and the proposed method of discourse analysis. Examples are taken from an ongoing investigation into the use of 'killing' verbs in contemporary British English, which draws upon two corpora: the British National Corpus (BNC) and the animal-themed 'People', 'Products', 'Pests' and 'Pets' (PPPP) corpus [8]. Preliminary findings suggest that a CPA-assisted, or corpus-lexicographical, discourse analysis is one with a strong theoretical basis, whose transparency and systematicity empower the analyst to make precise and persuasive arguments.
The corpus approach in its contemporary framework marks the return of linguistics to the empirically founded sciences after the long predominance of introspection-based analysis. The first part of this paper elaborates on the advent of corpus-based research in linguistics. The second part describes the notion and types of corpora. The third part describes the advantages of corpus-based research and the basic characteristics of corpus linguistics. The last part of the paper explains certain limitations of corpus-based analysis.
Stubbs (2006), in his state-of-the-art overview, draws attention to the frequent reticence or vagueness of corpus analysts in discussing their operational methods within a scientific context (a context addressed in detail in Partington (forthcoming)). This lack of clarity in discussing the methodological framework employed is, perhaps, most surprising given the way in which corpus linguistics situates itself within a scientific frame and lays such claims to a scientific nature. This brief paper, then, addresses the question posed in its title, namely, “What is Corpus Linguistics?”: is it a discipline, a methodology, a paradigm, or none or all of these? It does not attempt to offer any definitive answers. Rather, the aim is to present the reader with a number of observations on how corpus linguistics has been construed in its own literature and then to leave the question open, in the hope of stimulating further discussion.
2020
The Routledge Handbook of Corpus Approaches to Discourse Analysis highlights the diversity, breadth, and depth of corpus approaches to discourse analysis, compiling new and original research from notable scholars across the globe. Chapters showcase recent developments influenced by the exponential growth in linguistic computing, advances in corpus design and compilation, and the applications of sound quantitative and interpretive techniques in analyzing text and discourse patterns. Key discourse domains covered by 35 empirical chapters include: • Research contexts and methodological considerations; • Naturally occurring spoken, professional, and academic discourse; • Corpus approaches to conversational discourse, media discourse, and professional and academic writing. The Routledge Handbook of Corpus Approaches to Discourse Analysis is key reading for both experienced and novice researchers working at the intersection of corpus linguistics and discourse analysis, as well as anyone interested in related fields and adjacent research approaches.
Abstract: This book, Sociolinguistics and Corpus Linguistics, was published in the United Kingdom by Edinburgh University Press, Edinburgh, in 2010. The author is Paul Baker. The book is about 189 pages long. Sociolinguistics studies the relationship between society and language, which plays an important role in society. Corpus linguistics is a relatively recent branch of linguistics, made popular since the advent of the personal computer in the 1990s. Typically, corpus linguistics is the study of language based on examples of real-life language use. The word corpus comes from the Latin word for body; its plural is corpora. The book aims to show how corpus linguistics methods can be used gainfully to aid sociolinguistics. It is intended for readers curious about corpus techniques and for corpus linguists who want to investigate sociolinguistic problems. The theme of the book is well chosen. The chapters are arranged in a straightforward order and are consistent with one another. Each chapter builds on the previous one until the conclusions are reached. The book consists of seven chapters, as follows. The first chapter, ''Introduction'' (1-30), presents the various types of corpora (written, spoken, general or specialized, 12-15) and the essential methods and concepts of corpus linguistics, such as ''concordance'', ''annotation'' and ''frequency''. The second chapter, ''Corpora and sociolinguistic variation'' (31-56), presents the possibilities of investigating the different registers (social varieties of a language) using corpus linguistic methods. The third chapter, ''Diachronic variation'' (57-80), illustrates how linguistic changes can be observed using corpora of different time depths. Chapter four, ''Synchronic variation'' (81-101), is dedicated to the possibilities of comparing synchronic differences, e.g. between the different varieties of English all over the world.
Chapter five, ''Corpora and interpersonal communication'' (102-120), shifts attention to the value CL has for interactional linguistics (IL). Chapter six, ''Uncovering discourses'' (121-145), demonstrates how CL can be used to ''show evidence for constructed differences (e.g. men are constructed as *x*, women are constructed as *y*)'' (143). Chapter seven, ''Conclusion'' (146-156), sums up the book and offers prospects of future developments in .
Systematic Review, 2020
This article presents a systematic survey of notable corpus research conducted by researchers affiliated with institutions all over the world. The overview covers 20 notable studies from a range of research-leading institutions. These projects employ corpus techniques and technology to address a wide range of research questions relevant to linguistic studies, language teaching and learning, cultural studies, and discourse analysis. These varied applications of corpus techniques and advances clearly illustrate the considerable potential and opportunities that corpora applied in linguistics can offer to those who intend to research, teach, and learn the language.
Reduplication is important in language studies. Its linguistic form at the lexical level has long been explored in terms of various formalist theories. However, the linguistic function at other levels such as the discourse layer tends to be ignored. A reduplication corpus (ongoing compilation; 1687 items in total thus far) has been constructed as the baseline for an integrated approach to the interplay of various kinds of repetition in the use of language. The frequency of each token was calculated based on its occurrence in the British National Corpus (BNC). Then a wordlist with the top 102 items was proposed for related research topics such as frequency, percentage coverage, concordance, and collocation in terms of McCarthy’s framework (1990 and later) using MonoConc Pro, WordSmith 4.0 and the SARA 3.2 software. The probability of collocation was calculated in terms of mutual information (MI). The higher the MI score, the more genuine the association between two items (Church and Hanks, 1990). A powerful search engine, Google, was further employed to locate relevant texts on websites for the analysis of reduplication from lexical to discourse levels. Both reduplication and repetition do play a significant role and extensively exhibit a certain language musicality in our everyday life. © 2004 Elsevier B.V. All rights reserved.
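The MI score mentioned in this abstract can be illustrated with a small sketch. It assumes the common pointwise formulation associated with Church and Hanks (1990), MI(x, y) = log2(P(x, y) / (P(x) P(y))), estimated from raw counts in a token list. The function name `mutual_information`, the right-hand `window` parameter and the toy sentence are hypothetical choices made for illustration; they are not taken from the paper, which used MonoConc Pro, WordSmith and SARA on the BNC.

```python
import math
from collections import Counter


def mutual_information(tokens, window=1):
    """Pointwise MI for ordered pairs (x, y) where y occurs within
    `window` tokens to the right of x. Illustrative sketch only."""
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + 1 + window]:
            pair_freq[(x, y)] += 1
    # MI = log2( P(x,y) / (P(x) * P(y)) ) = log2( f(x,y) * N / (f(x) * f(y)) )
    return {
        (x, y): math.log2(f_xy * n / (word_freq[x] * word_freq[y]))
        for (x, y), f_xy in pair_freq.items()
    }


# Toy data (hypothetical), not drawn from the BNC.
tokens = "tick tock tick tock the clock goes tick tock".split()
scores = mutual_information(tokens)
print(round(scores[("tick", "tock")], 3))  # 1.585
print(round(scores[("the", "clock")], 3))  # 3.17
```

In this toy sample the pair that occurs only once ("the clock") scores higher than the pair that recurs three times ("tick tock"). MI is known to favour rare pairs in this way, which is one reason a minimum frequency threshold is often applied before MI scores are interpreted.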
2006
Series editors' preface
Preface
Acknowledgements
SECTION A: INTRODUCTION
Unit A1 Corpus linguistics: the basics
A1.1 Introduction; A1.2 Corpus linguistics: past and present; A1.3 What is a corpus?; A1.4 Why use computers to study language?; A1.5 The corpus-based approach vs. the intuition-based approach; A1.6 Corpus linguistics: a methodology or a theory?; A1.7 Corpus-based vs. corpus-driven approaches; Summary; Looking ahead
Unit A2 Representativeness, balance and sampling
A2.1 Introduction; A2.2 What does representativeness mean in corpus linguistics?; A2.3 The representativeness of general and specialized corpora; A2.4 Balance; A2.5 Sampling; Summary; Looking ahead
Unit A3 Corpus markup
A3.1 Introduction; A3.2 The rationale for corpus markup; A3.3 Corpus markup schemes; A3.4 Character encoding; Summary; Looking ahead
Unit A4 Corpus annotation
A4.1 Introduction; A4.2 Corpus annotation = added value; A4.3 How is corpus annotation achieved?; A4.4 Types of corpus annotation; A4.5 Embedded vs. standalone annotation; Summary; Looking ahead
Unit A5 Multilingual corpora
A5.1 Introduction; A5.2 Multilingual corpora: terminological issues; A5.3
This chapter offers an introduction to corpus linguistics as a methodology for studying language, literature, and other fields in the humanities. It defines corpus linguistics, explores its theoretical background, and discusses the steps and procedures involved in building and analyzing corpora. The benefits of the methodology are also highlighted in the chapter. Throughout the chapter I rely on my own corpus linguistic experiences to explain and show how corpus linguistic procedures actually work.
“More recently, it seems that use of CL techniques is becoming increasingly popular in critical approaches to discourse analysis” (Baker et al., 2008: 274-275). This interest is visible in the sheer number of publications using a corpus approach: according to the Scopus database, the total number of publications combining critical discourse analysis and corpus linguistics came to 3 in the 1990s and 29 in the 2000s, and since 2010 it has already reached 47. This tendency is also evident at the leading conferences in the field: at the 2014 Critical Approaches to Discourse Analysis Among Disciplines (CADAAD) conference, almost 40 authors used some form of corpus analysis in their studies. Both the methods used and the research problems are highly diverse. Methods vary from simple frequency analysis (Alcaraz-Ariza, 2002; Mautner, 2007) through analysis of collocations (Don et al., 2010; Freake et al., 2010; Lischinsky, 2011; Mautner, 2007; Weninger, 2010) or keywords (Bachmann, 2011; Don et al., 2010; Lukac, 2011; Weninger, 2010) to the analysis of key semantic domains (Prentice, 2010). Research subjects range from national identity issues (Don et al., 2010; Freake et al., 2010; Prentice, 2010), through various social problems (Kirejczyk, 1999; Lukac, 2011; Yasin et al., 2012), to the social construction of businesswomen (Koller, 2004) or economic crisis (Lischinsky, 2011). Despite the growing interest, the variety of methods and the diversity of research subjects, the steadily growing body of corpus-supported CDA studies has not been critically reviewed in order to identify the most vulnerable points of the research practice and suggest improvements. We attempt to fill this gap in the present paper. Our review is based on over 30 papers whose authors declared that they used some technique of corpus linguistics for some form of Critical Discourse Analysis (CDA/CL).
We analyze the methods used by the papers' authors as well as the results provided by those methods in order to propose some points for improvement. The analysis concentrates on two major issues: • the relation between the methods used, the results obtained and the conclusions drawn, e.g. the degree to which the results support the conclusion; • the relation of the research practice to the benefits of using CL for CDA. Some of those benefits have been pointed out in the literature: reduction of researcher bias (Mautner, 2009), explicitness and systematicity (Marko, 2008), exhaustive rather than selective description (Hardt-Mautner, 1995) or a more focused approach to texts achieved by highlighting lexical and grammatical regularities (Lischinsky, 2011). As a result, we describe seven main points in which such CDA/CL analysis may be improved. The first concerns corpus design: we show how decisions taken at this stage may limit the results. Secondly, we refer to the use of statistics and demonstrate how some results can be improved by extending the range of issues to which statistics are applied. Moreover, we point to some inconsistencies which may arise during the research process, concerning both adherence to the rules declared by the author and attention to numbers such as word frequencies. Another problem we discuss can be called the “mind-reading problem”: while the results concern the properties of texts, the conclusions concern the cognitive states of their users. We also refer to the so-called cherry-picking problem (Breeze, 2011), which the use of CL techniques is claimed to solve (Degano, 2007; Lischinsky, 2011). Finally, we briefly discuss the role of the researcher's intuition and show some stages of CDA/CL research in which intuition continues to play a crucial role. For each of these points we present some examples from research practice.
In conclusion, we offer some suggestions for improvement which may benefit the growing community of CDA/CL practitioners, so that the potential of corpus linguistics tools to reveal socially important discursive constructions can be fully used. It is our hope that such a critical review of the research practice provides some valuable insights not only for CDA researchers but for all who use some combination of quantitative corpus methods and qualitative in-depth analysis.
2009
Corpus linguistics is one of the fastest-growing methodologies in contemporary linguistics. In a conversational format, this article answers a few questions that corpus linguists regularly face from linguists who have not used corpus-based methods so far. It discusses some of the central assumptions ('formal distributional differences reflect functional differences'), notions (corpora, representativity and balancedness, markup and annotation), and methods of corpus linguistics (frequency lists, concordances, collocations), and discusses a few ways in which the discipline still needs to mature.
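Two of the methods named in this abstract, frequency lists and concordances, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any particular corpus tool: the helper names `tokenize`, `frequency_list` and `concordance`, the simple regular-expression tokenizer and the sample sentence are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes.
    A deliberately crude tokenizer, assumed for illustration."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens, top=None):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common(top)


def concordance(tokens, node, width=3):
    """Return key-word-in-context lines as (left, node, right) tuples,
    with up to `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width) : i])
            right = " ".join(tokens[i + 1 : i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text.
tokens = tokenize("The corpus is small. A corpus can be large.")
print(frequency_list(tokens, top=1))  # [('corpus', 2)]
for left, node, right in concordance(tokens, "corpus", width=2):
    print(f"{left:>10} [{node}] {right}")
```

Returning concordance lines as tuples rather than formatted strings keeps the sketch easy to sort by left or right context, which is how concordances are typically read for recurring patterns.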
Selected Proceedings from the Designing the …, 2009
In this paper I attempt to describe and characterise Corpus Linguistics (hereafter CL); define important terms used in the field; point out different perspectives; show applications; present its limitations and, above all, examine the contribution of CL to linguistic research, particularly for the production of educational materials.