2019, Proceedings of Recent Advances in Natural Language Processing
https://doi.org/10.26615/978-954-452-056-4_041
7 pages
This paper describes a set of tools that offers comprehensive solutions for corpus lexicography. The tools perform a range of tasks, including construction of a corpus lexicon, integration of information from external dictionaries, internal analysis of the lexicon, and lexical analysis of the corpus. The set of tools is particularly useful for creating dictionaries for under-resourced languages. The tools are integrated into a general-purpose software package that includes additional tools for various research tasks, such as analysis of linguistic development. Equipped with a user-friendly interface, the described system can be easily incorporated into research in a variety of fields.
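As a rough illustration of the first two tasks mentioned above, the sketch below builds a word-frequency lexicon from a plain-text corpus and merges in glosses from an external dictionary. The file names, the tab-separated dictionary format, and the tokenisation are assumptions made for the example, not details of the paper's actual tools.

```python
# Minimal sketch, under assumptions: build a frequency lexicon from a
# raw corpus, then attach glosses from an external dictionary file.
from collections import Counter
import re

def build_corpus_lexicon(corpus_path):
    """Tokenize a plain-text corpus and count word-form frequencies."""
    counts = Counter()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            counts.update(re.findall(r"\w+", line.lower()))
    return counts

def merge_external_dictionary(lexicon, dict_path):
    """Attach glosses from a hypothetical headword<TAB>gloss file;
    words missing from the dictionary keep a None gloss."""
    with open(dict_path, encoding="utf-8") as f:
        glosses = dict(line.rstrip("\n").split("\t", 1)
                       for line in f if "\t" in line)
    return {word: {"freq": freq, "gloss": glosses.get(word)}
            for word, freq in lexicon.most_common()}

lexicon = build_corpus_lexicon("corpus.txt")        # assumed file name
entries = merge_external_dictionary(lexicon, "external_dict.tsv")
```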
To analyse corpus data, lexicographers need software that allows them to search, manipulate and save data, a 'corpus tool'. A good corpus tool is key to a comprehensive lexicographic analysis; a corpus without a good tool to access it is of little use. Both corpus compilation and corpus tools have been swept along by general technological advances over the last three decades. Compiling and storing corpora has become far faster and easier, so corpora tend to be far larger.
Lexikos, 2010
This article presents various approaches used in corpus-based computational lexicography. A claim is made that in order for computational lexicography to be efficient, precise and comprehensive, it should utilize the method where the corpus text is first analysed, and the results of this analysis are then processed further to meet the needs of a dictionary. This method has several advantages, including high precision and recall, as well as the possibility to automate the process much further than with more traditional computational methods. The frequency list obtained by using the lemma (the equivalent of the headword) as the basis helps in selecting the words to be included in the dictionary. The approach is demonstrated through various phases by applying SALAMA (the Swahili Language Manager) to the process. Manual work will be needed in the phase when examples of use are selected from the corpus, and possibly modified. However, the list of examples of use, arranged alphabetically according to the corresponding headword, can also be produced automatically. Thus the alphabetical list of headwords with examples of use is the material on which the lexicographer works manually. The article deals with problems encountered in compiling traditional printed dictionaries, and it excludes electronic dictionaries and thesauri.
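To make the workflow above concrete, here is a minimal sketch of the two automatable steps: a lemma frequency list for headword selection, and an alphabetically arranged list of headwords with example sentences for manual review. The (lemma, sentence) input format is an assumption of the example; it is not how SALAMA itself is driven.

```python
# Hedged sketch of lemma-based headword selection and automatic
# example listing; input format is assumed, not SALAMA's.
from collections import Counter, defaultdict

def headword_candidates(analysed_corpus, min_freq=5):
    """analysed_corpus: list of (lemma, sentence) pairs from a
    morphological analyser. Returns lemmas frequent enough to include."""
    freq = Counter(lemma for lemma, _ in analysed_corpus)
    return [lemma for lemma, n in freq.most_common() if n >= min_freq]

def examples_by_headword(analysed_corpus, headwords, per_word=3):
    """Collect up to `per_word` example sentences per headword,
    arranged alphabetically for manual lexicographic review."""
    examples = defaultdict(list)
    selected = set(headwords)
    for lemma, sentence in analysed_corpus:
        if lemma in selected and len(examples[lemma]) < per_word:
            examples[lemma].append(sentence)
    return dict(sorted(examples.items()))
```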
2020
In this paper, we describe the development of Skema and its features. Skema [ˈskiːmə] is a new corpus pattern editor system which supports the manual annotation of concordance lines with user-defined labels (each concordance has its own set of labels) and the editing of the corresponding patterns in terms of slots, attributes, examples and other features following the lexicographic technique of Corpus Pattern Analysis. Skema is integrated into the web-based Sketch Engine and can be used by any user for annotating both preloaded and user corpora. Each annotation label is linked to the pattern structure (stored in JSON format) which can be easily customized to individual projects, a generic pattern structure (i.e. a list of user-defined attributes) being available by default. The paper illustrates the use of Skema in three specific projects, i.e. Woordcombinaties for Dutch verbs, Typed Predicate-Argument Structures for Italian Verbs (T-PAS) and its sister project for Croatian Verbs (C...
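Purely as an illustration of the kind of user-defined, JSON-stored pattern structure the abstract mentions, the snippet below shows a hypothetical pattern for one verb in the CPA style. The attribute names and values are invented for the example and are not taken from Skema's actual schema.

```python
# Invented example of a generic CPA-style pattern structure serialised
# to JSON; field names are illustrative assumptions only.
import json

pattern = {
    "verb": "break",
    "pattern_id": 1,
    "slots": {
        "subject": {"semantic_type": "Human"},
        "object": {"semantic_type": "Physical Object"},
    },
    "attributes": {"register": "neutral", "domain": "general"},
    "examples": ["She broke the vase while dusting the shelf."],
}
print(json.dumps(pattern, indent=2))
```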
What can translations tell us about ongoing semantic changes? The case of must (KARIN AIJMER)
Taking a Language to Pieces: art, science, technology (GUY COOK)
The textual dimensions of Lexical Priming (MICHAEL HOEY, MATTHEW BROOK O'DONNELL)
No corpus linguist is an island: Collaborative and cross-disciplinary work in researching phraseology (UTE RÖMER)
Papers
A corpus-based study for assessing the collocational competence in learner production across proficiency levels (MAHA N. ALHARTHI)
'Sure he has been talking about coming for the last year or two': the Corpus of Irish English Correspondence and the use of discourse markers (CAROLINA P. AMADOR-MORENO, KEVIN MCCAFFERTY)
Developing AntConc for a new generation of corpus linguists (LAURENCE ANTHONY)
Bridging lexical and constructional synonymy, and linguistic variants: the Passive and its auxiliary verbs in British and American English (ANTTI ARPPE, DAGMARA DOWBOR)
An open-access gold-standard multi-annotated corpus with huge user-base and impact: The Quran
There are many benefits to using corpora. In order to reap those rewards, how should someone who is setting up a dictionary project proceed? We describe a practical experience of such ‘setting up’ for a new Portuguese-English, English-Portuguese dictionary being written at Oxford University Press. We focus on the Portuguese side, as OUP did not have Portuguese resources prior to the project. We collected a very large (3.5-billion-word) corpus from the web, removing all unwanted material and duplicates. We then identified the best lemmatizing and parsing tools for Portuguese, and undertook the very large task of parsing the corpus. We then used the dependency parses output by the parser to create word sketches (one-page summaries of a word’s grammatical and collocational behavior). We plan to adapt to Portuguese an existing system for automatically identifying good candidate dictionary examples, and to add salient information about regional words to the word sketches. All of the data and associated support tools for lexicography are available to the lexicographer in the Sketch Engine corpus query system.
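The core of the word-sketch step described above can be sketched as follows: group a headword's collocates by the grammatical relation assigned by the dependency parser. The (head, relation, dependent) triple format is an assumption of this example; the actual Sketch Engine pipeline is considerably more elaborate.

```python
# Rough sketch, under assumptions, of building a word sketch from
# dependency triples: most frequent collocates per grammatical relation.
from collections import Counter, defaultdict

def word_sketch(dep_triples, headword, top_n=10):
    """dep_triples: iterable of (head_lemma, relation, dependent_lemma).
    Returns the most frequent collocates in each relation, with an
    'inv_' prefix when the headword is the dependent."""
    by_relation = defaultdict(Counter)
    for head, rel, dep in dep_triples:
        if head == headword:
            by_relation[rel][dep] += 1
        elif dep == headword:
            by_relation["inv_" + rel][head] += 1
    return {rel: c.most_common(top_n) for rel, c in by_relation.items()}

triples = [("drink", "obj", "coffee"), ("drink", "obj", "tea"),
           ("drink", "obj", "coffee"), ("strong", "amod", "coffee")]
print(word_sketch(triples, "coffee"))  # collocates of 'coffee' by relation
```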
Input a Word, Analyze the World represents current perspectives on Corpus Linguistics (CL) from a variety of linguistic subdisciplines. Corpus Linguistics has proven itself an excellent methodology for the study of language variation and change, and is well-suited for interdisciplinary collaboration, as shown by the studies in this volume. Its title is inspired by the use of CL to assess language in different registers and with a variety of purposes. This collection contains thirty contributions by scholars in the field from across the globe, dealing with current topics on corpus production and corpus tools; lexical analysis, phraseology and grammar; translation and contrastive linguistics; and language learning. Language specialists will find these papers inspiring, as they present new insights on aspects related to research and teaching.
1995
The elaboration of the DECIDE lexicon follows two parallel lines: dictionary-based construction and corpus-based construction. The originality of the DECIDE project was indeed the wish to combine collocational information extracted from dictionaries and textual corpora. The relevance of corpus analysis no longer needs to be demonstrated. Research done so far has already produced very promising results and shown how the variety and intrinsically authentic quality of the information extracted from large corpora can complement the formalized and selective information contained in dictionaries. Although it is undeniable that machine-readable dictionaries provide a fertile resource for the extraction of lexical information for the base vocabulary of a language (see for instance the ACQUILEX project, which demonstrated that it is possible to automatically construct a hierarchy of word types in a number of languages), the completeness of the lexical information offered by monolingual dictionaries is hampered by the historical purpose of the dictionary itself:
- being geared to human users, some definitions require complex world knowledge in order to be exploited;
- general language dictionaries are usually meant to cover the basic language and omit technical words and expressions that are nonetheless likely to appear in any specific corpus;
- their contents usually lag behind usage changes.
One further factor that prompted us to explore corpus-based technologies in addition to our dictionary analysis is the fact that the collection of MRDs is a finite resource. The creation of each dictionary requires hundreds of person-years, an effort that limits their production. On the other hand, free text is becoming available in seemingly unlimited quantities on CD-ROMs, from newsgroups, and in publicly available archives on the Internet. In an ecological sense, unrestricted text is a 'renewable resource' which can be mined without limit, making corpus-based techniques a promising source of lexical information. In the first part of this deliverable (chapter II), we will focus on the contents of the lexicons produced as one of the outputs of the DECIDE project. After explaining how the subfield of speech act nouns was chosen and detailing the criteria used for the selection of the lexical entries, we will look in detail at the dictionary-based lexicon construction, explaining the work done to fine-tune and enhance the dictionary tools and relating an experiment to retrieve collocations from the Cobuild dictionary with the help of the tagger developed in the MECOLB project. Then, we will examine the corpus-based lexicon construction, presenting the various tools that were used and developed within the framework of the DECIDE project to retrieve collocations from various textual corpora. Finally, in the second part of the deliverable (chapter III), we will present and document the architecture of our lexicon, explaining the rationale for choosing this specific format and providing a few commented examples for illustration purposes.
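As a minimal sketch of the corpus-based side of such collocation retrieval, the example below counts co-occurrences of a node word within a fixed window of a tokenised corpus. The window size, tokenisation, node word, and file name are assumptions for illustration, not the settings used in DECIDE.

```python
# Hedged sketch of window-based collocation retrieval from raw text;
# parameters are illustrative assumptions.
from collections import Counter
import re

def window_collocates(tokens, node, window=4):
    """Count tokens appearing within `window` positions of `node`."""
    collocates = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        collocates.update(t for t in tokens[lo:hi] if t != node)
    return collocates

with open("corpus.txt", encoding="utf-8") as f:   # assumed file name
    tokens = re.findall(r"\w+", f.read().lower())
print(window_collocates(tokens, "promise").most_common(20))
```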
In R.V. Fjeld and J.M. Torjusen (eds.), Proceedings of the 15th EURALEX International Congress. Oslo: Reprosentralen, University of Oslo, pp. 404-412 (2012). ISBN: 978-82-303-2228-4.
The latest generation of lexical profiling software (which developed out of the probability measures originally proposed by Church and Hanks) has recently been used as a central source of linguistic data for a new, written-from-scratch pedagogical dictionary. The "Word Sketch" software uses parsed corpus data to identify salient collocates, in separate lists, for the whole range of grammatical relations in which a given word participates. It also links these collocate lists to corpus examples instantiating each combination so identified. Lexicographers found that the Word Sketches not only streamlined the process of searching for significant word combinations, but often provided a more revealing, and more efficient, way of uncovering the key features of a word's behaviour than the (now traditional) method of scanning concordances.
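The probability measure referred to here is (pointwise) mutual information, which Church and Hanks (1990) proposed for ranking collocates: PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The sketch below estimates it from raw corpus counts; the counts in the usage example are invented for illustration.

```python
# Worked sketch of pointwise mutual information estimated from counts.
import math

def pmi(pair_count, x_count, y_count, corpus_size):
    """PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with probabilities
    estimated by relative frequency in the corpus."""
    p_xy = pair_count / corpus_size
    p_x = x_count / corpus_size
    p_y = y_count / corpus_size
    return math.log2(p_xy / (p_x * p_y))

# Invented counts, e.g. 'strong' + 'tea' in a modifier relation:
print(round(pmi(pair_count=30, x_count=9000, y_count=1500,
                corpus_size=10_000_000), 2))
```

A high PMI flags word pairs that co-occur far more often than chance would predict, which is exactly the signal a Word-Sketch-style collocate list ranks by.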