This corpus-based study investigates the use of the words "strong" and "powerful" to identify similarities and differences in their contextual applications. The analysis reflects on relevant corpus-linguistic methodologies, focusing on the two words' meanings, grammatical roles, and connotations in various contexts. Findings indicate distinct patterns in collocation and frequency of use, suggesting implications for language learners and translators.
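The abstract does not name the software used; as a purely illustrative sketch of the kind of collocation comparison it describes, the Python snippet below counts the words that co-occur with "strong" and "powerful" in NLTK's copy of the Brown Corpus (the corpus choice and the three-word window are assumptions, not details from the study).

```python
# Minimal sketch (not the study's actual method) of comparing the
# collocates of "strong" and "powerful" in a reference corpus.
# Assumes NLTK is installed; the Brown corpus is downloaded on first use.
from collections import Counter

import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)

words = [w.lower() for w in brown.words()]
WINDOW = 3  # collocation span: three words either side of the node word


def collocates(node, tokens, window=WINDOW):
    """Count words co-occurring with `node` within +/- `window` tokens."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


strong = collocates("strong", words)
powerful = collocates("powerful", words)

# Collocates attracted to "strong" but unattested with "powerful",
# the kind of contrast ("strong coffee" vs. "powerful engine") the study reports.
for w, n in strong.most_common(20):
    if powerful[w] == 0 and n >= 3:
        print(f"strong + {w}: {n}")
```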
Input a Word, Analyze the World represents current perspectives on Corpus Linguistics (CL) from a variety of linguistic subdisciplines. Corpus Linguistics has proven itself an excellent methodology for the study of language variation and change, and is well-suited for interdisciplinary collaboration, as shown by the studies in this volume. Its title is inspired by the use of CL to assess language in different registers and with a variety of purposes. This collection contains thirty contributions by scholars in the field from across the globe, dealing with current topics on corpus production and corpus tools; lexical analysis, phraseology and grammar; translation and contrastive linguistics; and language learning. Language specialists will find these papers inspiring, as they present new insights on aspects related to research and teaching.
English for Specific Purposes, 1994
Corpus Pragmatics
Contemporaneously with the advances of technology and the advent of computers in language studies, we have witnessed a boom in the emergence of new books in Corpus Linguistics (see for example Dash & Ramamoorthy, 2019; Paquot & Gries, 2020; Seoane & Biber, 2021). Among the informative books in this fast-growing field of knowledge is the current one, authored by Barth and Schnell in 2022. This work of scholarship is organized into 11 chapters, which provide readers with state-of-the-art concepts of theory and practice for conducting research in the domain of Corpus Linguistics. The first two chapters function as an introduction in which the authors succinctly shed some light on the basic concept of the corpus, its divergence from other approaches as well as its convergence with other usage-oriented fields within linguistics such as Sociolinguistics, Linguistic Typology and Language Change. The authors provide the reader with definitions of corpus and Corpus Linguistics, of word, lexeme, type and token, as well as of some basic statistical concepts such as mode, mean and median. Later on, the authors distinguish between structural context, syntagmatic context and constructional context in order to delineate the role of context in corpora. There are different types of corpora with specific composition criteria, which need to be delineated for the readers. In this regard, Chapter three, which is thematically divided into two parts, is a detailed description of corpus composition criteria and typology. In the first part, the authors enumerate such concepts as size, balance, representativeness as well as authenticity and spontaneity as the core criteria for compiling a corpus. Furthermore, a subtle distinction is made between raw, primary
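The basic notions listed in this review (word type, token, mode, mean, median) are easy to make concrete; the Python sketch below is purely illustrative and is not taken from the book under review.

```python
# Illustrative sketch of the basic notions mentioned above: types vs. tokens
# and the mode/mean/median of a simple frequency distribution.
from collections import Counter
from statistics import mean, median, mode

text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()   # every running word is a token
types = set(tokens)     # each distinct word form is a type

print(len(tokens), "tokens,", len(types), "types")
print("type/token ratio:", len(types) / len(tokens))

# Frequency list of the word forms, and simple descriptive statistics over it
freqs = Counter(tokens)
counts = list(freqs.values())
print("mean:", mean(counts), "median:", median(counts), "mode:", mode(counts))
```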
Special Issue Revista Española de Lingüística Aplicada, 25, 2012
Corpus analysis is an area of research that has broadened the scope of a number of different fields of language analysis. One aspect of this research is quantitative. For more than sixty years, linguists have demonstrated that language features can be counted and frequencies calculated, and that these data are useful for the interpretation and understanding of language. For this reason, corpus analysis has been used in several fields of knowledge to support or challenge hypotheses and theories. In this volume our intention is to show that corpus analysis does not only deal with large quantities of numbers; it also comprises studies that combine quantitative and qualitative analysis. Although the various writers use current techniques to compile and investigate corpora, our main interest is in how researchers apply corpus analysis. To this end, we include papers that cover a range of issues. The discourse types investigated include academic discourse, literary texts and teaching materials. The papers explore topics such as modality, cognition, language learning, lexicography, terminology, and typologies, and employ approaches ranging from comparative analysis to genre studies. Taken together, the papers in this special issue have been selected to provide readers with an example of how researchers are developing and exploiting corpus methods to improve linguistic research.
Stubbs (2006), in his state-of-the-art overview, draws attention to the frequent reticence or vagueness of corpus analysts in discussing their operational methods within a scientific context (a context addressed in detail in Partington (forthcoming)). This lack of clarity in discussing the methodological framework employed is perhaps most surprising given the way in which corpus linguistics situates itself within a scientific frame and lays such claims to a scientific nature. This brief paper, then, addresses the question posed in its title, namely "What is Corpus Linguistics?": is it a discipline, a methodology, a paradigm, none of these, or all of them? It does not attempt to offer any definitive answers. Rather, the aim is to present the reader with a number of observations on how corpus linguistics has been construed in its own literature and then to leave the question open, in the hope of stimulating further discussion.
Beyond Philology An International Journal of Linguistics, Literary Studies and English Language Teaching
Recently, teaching and learning processes have been significantly influenced by modern technologies. The teacher's position has thus shifted from that of the sole authority in the classroom to that of a guide or facilitator, who should possess the knowledge and skills to use modern technologies and to freely access data. This change is particularly visible in the teaching and learning of languages with the help of various educational platforms and software. Since this situation has been widely discussed since the 1990s, only selected aspects are taken into account for the purposes of this article. The major focus of the present article is to present language corpus analysis as a method of activating teachers and students as participants in the Data-Driven Learning (DDL) process.
International Journal of Corpus Linguistics, 2018
UAD TEFL International Conference
In this digital era, the role of computer technology as a resource for the instruction of foreign language learners is increasing as educators recognise the ability of computer technology to produce both independent and collaborative learning environments. Computer technologies such as the Internet, multimedia, and hypermedia have been introduced into English Language Learning and Teaching (ELLT) to foster the language learning process, all of which fall under the category of Computer-Assisted Language Learning (CALL). Corpus linguistics is the systematic analysis of actual (real) language production, either spoken or written, in which texts are assembled using computer technology (a concordancer) to form a large collection of authentic texts, called a corpus (plural: corpora), which comes in various sizes. Despite immense research on corpus linguistics in these recent decades, the potentials and limitations of Data-Driven Learning (DDL), the application of corpus linguistics in ELLT h...
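The concordancer mentioned above typically produces KWIC (key word in context) lines; the following is a minimal, tool-agnostic Python sketch of that idea, run on an invented sample text rather than a real corpus.

```python
# Minimal KWIC (key word in context) concordance sketch, the kind of output
# a concordancer produces for Data-Driven Learning. The input text is invented.
def kwic(tokens, node, context=5):
    """Yield concordance lines: `context` tokens either side of `node`."""
    node = node.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            yield f"{left:>40}  [{tok}]  {right}"


sample = ("Learners examine authentic examples because authentic input "
          "helps learners notice patterns in authentic texts").split()
for line in kwic(sample, "authentic"):
    print(line)
```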
LSP International Journal, 2021
This paper is intended for researchers involved in or contemplating research in corpus linguistics, and is concerned in particular with the language of corpus linguistics. It introduces and explains technical terms in the context in which they are normally used. Technical terms lead on to the concepts to which they refer, and the concepts are related to the procedures, including tagging and parsing, by which they are implemented. English and Malay are used as the languages of illustration, and for the benefit of readers who do not know Malay, Malay examples are translated into English. The paper has a historical dimension, and the language of corpus linguistics is traced to traditional usage in the language classroom, and in particular to the study of Latin in Europe. The inheritance from the past is evident in the design of MaLex, which is a working device that does empirical Malay corpus linguistics, and is presented here as a contribution to the digital humanities.
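As a generic illustration of the tagging step mentioned here (and emphatically not of MaLex itself, whose internals the abstract does not describe), the snippet below uses NLTK's off-the-shelf English part-of-speech tagger; the example sentence is invented.

```python
# Generic illustration of corpus annotation by part-of-speech tagging,
# using NLTK's stock English tagger (this is NOT MaLex).
import nltk

# The tokenizer and tagger models are downloaded on first use.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Corpus linguistics studies language on the basis of real texts."
tokens = nltk.word_tokenize(sentence)   # split the sentence into word tokens
tagged = nltk.pos_tag(tokens)           # pair each token with a POS tag
print(tagged)                           # e.g. [('Corpus', 'NNP'), ...]
```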
International Journal of English Studies (IJES), 2009
Recent and Applied Corpus-based Studies. Corpus-based studies are gaining momentum in current linguistic research. Just a quarter of a century ago it was rather uncommon to draw conclusions from data previously arranged, systematized and analyzed. In the first part of the 21st century, this is precisely the most frequent method of analysis in both historical and synchronic linguistics. There seem to be sound reasons to proceed in this way: since samples of language in digital format are nowadays easily accessible and computers allow for the quick processing of huge amounts of data, the research paradigm is shifting from theoretically based constructs to data-based ones. Moreover, theoretical constructs have proved to be rather precarious, if we consider their instability across the history of ideas, theories and theoretical proposals advanced again and again by different authors. Language is, after all, something we can easily 'grasp' in so far as it is a formal system subject to quantification in the fields of lexis, morphology and syntax, and in the supporting sound or graphical systems. From a scientific point of view, the analysis of language or linguistic items should not be subordinated to prefabricated theories that are not representative of real language use. This is precisely what corpora claim and what corpora facilitate in linguistic studies.
2001
The appearance of not one but two introductions to corpus linguistics within the same series shows the maturation and diversification of this fledgling subdiscipline within linguistics. McEnery and Wilson offer an overview or annotated report on work done within the computer-corpus research paradigm, including computational linguistics, whereas Barnbrook offers a guide or manual on the procedures and methodology of corpus linguistics, particularly with regard to machine-readable texts in English and to the type of results thereby generated.
Corpus linguistics provides the methodology to extract meaning from texts. Taking as its starting point the fact that language is not a mirror of reality but lets us share what we know, believe and think about reality, it focuses on language as a social phenomenon, and makes visible the attitudes and beliefs expressed by the members of a discourse community.
International Journal of Corpus Linguistics, 2001
During the last decade, it has been common practice among the linguistic community in Europe – both on the continent and in the British Isles – to use corpus linguistics to verify the results of classical linguistics. In North America, however, the situation is different. There, the Philadelphia-based Linguistic Data Consortium, responsible for the dissemination of language resources, is addressing the commercially oriented market of language engineering rather than academic research, the latter often being more interested in universal grammar or semantic universals than in the idiosyncrasies of natural languages. American corpus linguists such as Doug Biber or Nancy Ide, and general linguists who are corpus users by conviction such as Charles Fillmore, are almost better known in Europe than in the United States, which is even more astonishing when we take into account that the first real corpus in the modern sense, the Brown Corpus, was compiled in Providence, R.I., during the sixties. Meanwhile, European corpus linguistics is gradually becoming a subdiscipline in its own right. Unfortunately, during the last few years, this has led to a slight bias towards 'self-centred' issues such as the problems of corpus compilation, encoding, annotation and validation, the procedures needed for transforming raw corpus data into artificial intelligence applications and automatic language processing software, not to mention the problem of standardisation with regard to form and content (cf. the long-term project EAGLES [Expert Advisory Group on Language Engineering Standards] and
2018
A corpus is defined here as a principled collection of naturally occurring texts which are stored on a computer to permit investigation using special software. A corpus is principled because texts are selected for inclusion according to pre-defined research purposes. Usually texts are included on external rather than internal criteria. For example, a researcher who wants to investigate metaphors used in university lectures will attempt to collect a representative sample of lectures across a number of disciplines, rather than attempting to collect lectures that include a lot of figurative language. Most commercially available corpora are made up of samples of a particular language variety which aim to be representative of that variety. Here are some examples of the different types of corpora and how they represent a particular variety:

General corpora: An example of a general corpus is the British National Corpus, which "… aims to represent the universe of contemporary British English [and] to capture the full range of varieties of language use" (Aston & Burnard 1998: 5). As a result of this aim the corpus is very large (containing some 100 million words) and contains a balance of texts from a wide variety of different domains of spoken and written language. Large general corpora are sometimes referred to as reference corpora because they are often used as a baseline against which judgements about the language varieties held in more specialised corpora can be made.

Specialised corpora: Specialised corpora contain texts from a particular genre or register or a specific time or context. They may contain a sample of this type of text or, if the dataset is finite and of a manageable size (for example all of Shakespeare's plays), be complete. There are numerous examples of specialised corpora; these include The Michigan Corpus of Spoken English (approximately 1.7 million words of spoken data collected from a variety of different encounters at the University of Michigan), the International Corpus of Learner English (20,000 words taken from essays of students learning English as a foreign language) and the Nottingham Health Communication Corpus (see section 5.3 for more details).

Comparable corpora: Two or more corpora constructed along similar parameters but each containing a different language or a different variety of the same language can be regarded as comparable corpora. An example of this type is the CorTec Corpus, which contains examples of technical language in texts from five areas in both English and Portuguese.

Parallel corpora: These are similar to comparable corpora in that they hold two or more collections of texts in different languages. The main difference lies in the fact that they have been aligned so that the user can view all the examples of a particular search term in one language and all the translation equivalents in a second language. The Arabic English Parallel News Corpus contains 2 million words of news stories in Arabic and their English translations collected between 2001 and 2004, and is aligned at sentence level.
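To make the last distinction concrete, sentence-level alignment of the kind described for the Arabic English Parallel News Corpus can be pictured as paired sentences; the snippet below is a toy Python sketch with invented placeholder sentences, not data from any of the corpora named above.

```python
# Toy sketch of a sentence-aligned parallel corpus: each source-language
# sentence is stored next to its translation, so a search on one side can
# surface the equivalents on the other. The sentences are invented placeholders.
parallel = [
    ("source-language sentence one", "translation of sentence one"),
    ("source-language sentence two", "translation of sentence two"),
]


def search_parallel(corpus, term):
    """Return (source, translation) pairs whose source side contains `term`."""
    return [(src, tgt) for src, tgt in corpus if term.lower() in src.lower()]


for src, tgt in search_parallel(parallel, "sentence one"):
    print(src, "=>", tgt)
```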
English Corpus Linguistics is a step-by-step guide to creating and analyzing linguistic corpora. It begins with a discussion of the role that corpus linguistics plays in linguistic theory, demonstrating that corpora have proven to be very useful resources for linguists who believe that their theories and descriptions of English should be based on real, rather than contrived, data. Charles F. Meyer goes on to describe how to plan the creation of a corpus, how to collect and computerize data for inclusion in a corpus, how to annotate the data that are collected, and how to conduct a corpus analysis of a completed corpus. The book concludes with an overview of the future challenges that corpus linguists face to make both the creation and analysis of corpora much easier undertakings than they currently are. Clearly organized and accessibly written, this book will appeal to students of linguistics and English language.
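The workflow described here (plan, collect and computerize, annotate, analyze) can be caricatured in a few lines of Python; this is only a sketch under the assumption of a hypothetical local directory corpus/ of plain-text files, and it stands in for no particular tool discussed in the book.

```python
# Compressed sketch of a collect -> computerize -> analyse workflow.
# Assumes a local directory "corpus/" of plain-text files (hypothetical).
from collections import Counter
from pathlib import Path

tokens = []
for path in Path("corpus").glob("*.txt"):    # "collect": gather the texts
    text = path.read_text(encoding="utf-8")
    tokens.extend(text.lower().split())      # "computerize": crude tokenization

freqs = Counter(tokens)                      # "analyse": build a frequency list
for word, count in freqs.most_common(10):
    print(f"{word}\t{count}")
```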