Papers by Maksymilian Michal Dabkowski

arXiv (Cornell University), Jun 12, 2023
Recursion is one of the hallmarks of human language. While many design features of language have been shown to exist in animal communication systems, recursion has not. Previous research shows that GPT-4 is the first large language model (LLM) to exhibit metalinguistic abilities (Beguš, Dąbkowski, and Rhodes, 2023). Here, we propose several prompt designs aimed at eliciting and analyzing recursive behavior in LLMs, both linguistic and non-linguistic. We demonstrate that when explicitly prompted, GPT-4 can both produce and analyze recursive structures. Thus, we present one of the first studies investigating whether meta-linguistic awareness of recursion, a uniquely human cognitive property, can emerge in transformers with a high number of parameters such as GPT-4.
1 Introduction
Recursion is a process or pattern that repeats in a self-similar or self-referential manner. In linguistics, recursion refers to the embedding of phrases within phrases of the same type. This provides linguistic units with potentially infinite layers of depth and complexity. Recursion is, to the best of our knowledge, one of the few properties entirely unique to human language (Hockett, 1960). Despite previous claims to the contrary (Fitch and Hauser, 2004; Gentner et al., 2006), no other animal communication system has been convincingly shown to feature recursion (Beecher, 2021; Corballis, 2007). For this reason, recursion has become one of the most widely studied, but also hotly debated aspects of language (

The Austronesian voice system (AVS) is among the most typologically intriguing and well-studied phenomena in syntax. Previous diachronic accounts have used the comparative method to argue that either the voice function or the nominalization function of the voice affixes should be reconstructed to Proto-Austronesian (PAn). We propose an alternative path of development using internal reconstruction as a primary methodological tool. First, we reconstruct both the voice and nominalization functions to PAn. We then argue that the non-active voice affixes originated in Pre-PAn as prepositions, which were incorporated into the verb complex as postverbs; the nominalizing function, on the other hand, arose through an inter-stage with compounds. This proposal accounts for a number of properties of AVS, including the prominence of arguments promoted to subject position and the subject-only restriction, and is supported by a typological parallel in Dinka. Finally, we discuss methodological issues in reconstructing typologically unusual morphosyntactic phenomena.

Phonology, Nov 1, 2021
A'ingae (or Cofán) is a language isolate spoken in the Ecuadorian and Colombian Amazon. This study presents a description and analysis of the language's morphologically conditioned verbal stress assignment. Specifically, I show that A'ingae verbal morphemes can be classified with two binary parameters: the presence or absence of prestressing and the presence or absence of stress deletion (i.e. dominance), which vary independently. I formalise my analysis in Cophonology Theory, a non-representational theory of the phonology–morphology interface, which captures morpheme-specific phonology with constraint rankings particularised to morphological constructions. I argue that while non-representational approaches such as Cophonology Theory can handle the facts of A'ingae stress deletion straightforwardly, representational approaches lack the expressive power necessary to capture the stress facts of the language.

The intervocalic position favors voicing in stops. Yet, some languages have been reported to feature the opposite (unnatural) process of intervocalic devoicing. This paper investigates two such case studies. Pre-Berawan intervocalic *b and *g have developed into Berawan k. Pre-Kiput intervocalic *g, *ɟʝ, and *v have developed into Kiput k, cç, and f, respectively. To account for the data, we invoke Beguš's (2018, 2019) blurring process model of sound change. The model proposes that unnatural phonology derives from a sequence of at least three phonetically motivated sound changes. We argue that the steps involved in intervocalic devoicing are (i) the intervocalic fricativization of voiced stops, (ii) devoicing of fricatives, and (iii) the occlusion of devoiced fricatives. Each of the steps is independently attested and motivated. We demonstrate that our blurring process proposal explains aspects of the historical development unaccounted for by previous approaches, and present new evidence suggesting that a single sound change could not have operated in the prehistory of Berawan. Thus, we maintain the conservative position that unnatural diachronic developments arise from sequences of natural and phonetically grounded sound changes.

Proceedings of the Linguistic Society of America, Apr 27, 2023
This paper discusses and analyzes the variation between ai and ɨi in A'ingae (or Cofán, an Amazonian isolate, ISO 639-3: con) by comparing the data reported in Borman's (1976) dictionary with contemporary productions. In Borman (1976), ai does not generally appear after labial consonants; the distribution of ɨi is not restricted. In some modern productions, postlabial ai is allowed when the diphthong crosses a morpheme boundary (a+i). I propose that Borman's (1976) distribution of ai and ɨi is a consequence of a diachronic change of ai to ɨi after labial consonants (*ai > ɨi / B _). The contemporary distribution reflects paradigm leveling and contact-induced replacement: Borman's (1976) ɨi corresponds to contemporary ai if a is present in another related form. In novel productively-formed words, the availability of postlabial raising is speaker-specific. The proposed sound change of postlabial raising (*ai > ɨi / B _) is unusual and lacks obvious phonetic motivation. I speculate that postlabial raising reflects postlabial rounding (*ai > *ui / B _) opacified by subsequent unconditioned unrounding and centralizing of the back round vowel (*u > ɨ).
Keywords. A'ingae; Cofán; postlabial raising; paradigmatic leveling; sound change; internal reconstruction; Amazon; Andes; telescoping; unnatural rule; fieldwork
1. Introduction. In this paper, I discuss and analyze the diachronic relationship between the closing front diphthong ai and the high fronting diphthong ɨi in A'ingae (or Cofán, an Amazonian isolate, ISO 639-3: con). To do so, I compare the realizations of morphologically simple and complex words reported in Borman's (1976) dictionary with their contemporary productions. I find systematic differences between Borman (1976) and contemporary A'ingae, which I take as evidence of recent language change.
In Borman (1976), ai does not generally appear after labial consonants; the distribution of ɨi is not restricted. In forms reported by some contemporary speakers, postlabial ai is sometimes allowed, especially when the diphthong falls across a morpheme boundary (a+i). I propose that Borman's (1976) distribution of ai and ɨi is a consequence of a diachronic change of ai to ɨi after labial consonants (*ai > ɨi / B _). The contemporary distribution reflects paradigm leveling and contact-induced replacement: ɨi is sometimes replaced by ai if a is present in another transparently related form and in identifiable borrowings from languages known by A'ingae speakers. In new productive formations, the availability of postlabial raising varies with the speaker. This shows that a diachronic change has been variably phonologized by contemporary speakers, leading to considerable language-internal variation. Finally, I note that the proposed postlabial raising (*ai > ɨi / B _) lacks any obvious phonetic motivation. Thus, it is an instance of an unusual and unexpected sound change. I speculate that
* First of all, my heartfelt thanks to my Cofán collaborators who have shared their language with me. Thanks especially to Jorge Mendúa, Shen Aguinda, and Raúl Quieta for their kindness, patience, and insight. I would also like to thank

Semantics and Linguistic Theory, Mar 2, 2021
We explore the semantics and typology of functional morphemes encoding apprehensional, i.e. negative prospective, meanings through a detailed case study of the adjunct uses of =sa'ne 'APPR' in A'ingae (or Cofán, ISO 639-3: con, an Amazonian isolate). We provide one of the first formal accounts of apprehension: In a structure [p [q=sa'ne]], =sa'ne 'APPR' encodes a modal semantics where the goal worlds of the actor responsible for p avoid a salient situation r ⇒ q. Finally, we reveal two inherent asymmetries among apprehensional functions (precautioning asymmetry and timitive asymmetry), thus making substantial predictions with regard to typological patterns in apprehensional morphology.

Final nasalization of voiced stops is phonetically unmotivated (i.e. not a consequence of universal articulatory or perceptual tendencies). As such, final nasalization has been deemed an impossible sound change. Nonetheless, Blust (2005; 2016) proposes that final nasalization took place in four Austronesian languages: Kayan-Murik, Berawan dialects, Kalabakan Murut, and Karo Batak. In this paper, we argue final nasalization in these languages is not a single sound change and reduce it to a combination of phonetically grounded changes. We demonstrate that in Austronesian, final nasalization involved four steps: (i) fricativization of voiced stops, (ii) devoicing of the fricatives, (iii) spontaneous nasalization before voiceless fricatives, and (iv) occlusion of the nasalized fricatives to nasal stops. Finally, we extend our account to final nasalization in Dakota (Siouan) and propose a new explanation for the development of the unnatural final voicing in the related Lakota language. Our results shed light on the role of phonetic naturalness in diachrony and synchrony. We maintain that while phonetically unnatural phonological processes may arise via a sequence of sound changes or analogical extension, sound changes are always natural and phonetically grounded.

Proceedings of the ... International Conference on Head-Driven Phrase Structure Grammar, Oct 21, 2017
This paper explores the conundrum posed by two different control constructions in Yucatec Maya (YM), a Mayan language spoken by around 800,000 speakers in the Yucatán Peninsula and northern Belize. The basic syntactic structure of the language is introduced, and a general SBCG treatment of control in YM is presented, along with an example of motion verbs as control matrices. The unruly case of intransitive subjunctive control, where the controllee appears with an unexpected status (incompletive) and without set-A morphology, is discussed and a proposal to treat it as nominalization is evaluated. The nominalization proposal is rejected on the following grounds: (1) nominalization tends to attract definitive morphology, which is absent from intransitive subjunctive control constructions, (2) nominalization does not truly explain the lack of set-A morphology if one desires to provide a unified account of set-A morphemes, (3) verbs bereft of otherwise expected set-A morphemes have an independent motivation in the form of agent focus constructions.

Proceedings of the Annual Meetings on Phonology, May 13, 2023
First of all, my heartfelt thanks to my Cofán collaborators who have welcomed me to their community and shared their language with me. Thanks especially to my primary consultant on this project, Jorge Mendúa, for his kindness, patience, and insight. I would also like to thank Hannah Sande, the Phorum (Berkeley Phonetics, Phonology and Psycholinguistics Forum) audience, and the AMP reviewers for helpful discussions and invaluable feedback. My research was supported in part by an Oswalt Endangered Language Grant for the project "A'ingae nominal and deverbal morphophonology."
1 The following glossing abbreviations have been used: 3 = third person, DIST = distal, GPLS = greater plural subject, IPFV = imperfective, PASS = passive, PLS = plural subject, PRCL = preculminative, PROX = proximal, SMFC = semelfactive.

arXiv (Cornell University), May 1, 2023
The performance of large language models (LLMs) has recently improved to the point where the models can perform well on many language tasks. We show here that for the first time, the models can also generate coherent and valid formal analyses of linguistic data and illustrate the vast potential of large language models for analyses of their metalinguistic abilities. LLMs are primarily trained on language data in the form of text; analyzing and evaluating their metalinguistic abilities improves our understanding of their general capabilities and sheds new light on theoretical models in linguistics. In this paper, we probe into GPT-4's metalinguistic capabilities by focusing on three subfields of formal linguistics: syntax, phonology, and semantics. We outline a research program for metalinguistic analyses of large language models, propose experimental designs, provide general guidelines, discuss limitations, and offer future directions for this line of research. This line of inquiry also exemplifies behavioral interpretability of deep learning, where models' representations are accessed by explicit prompting rather than internal representations.

Proceedings of the Linguistic Society of America, May 5, 2022
Cross-linguistically, affix order is commonly determined by semantic scope (Rice 2006) or a morphological template. Less frequently, affix order is free, which means that suffixes can be reordered without a concomitant change in scope. To address the question of what gives rise to and constrains free affix order (FAO), I present a case study of Paraguayan Guaraní (or PG, Tupí-Guaraní, Paraguay, ISO 639-3: gug). I argue that FAO in PG should be analyzed as driven by prosodic factors. The prosodic analysis has previously been proposed only for Chintang (Bickel et al. 2007). Two major analyses of FAO see the phenomenon as driven by either morphology (e.g. Ryan 2010) or prosody (Bickel et al. 2007). The morphological analysis proposes that FAO is a consequence of free variation within the morphological template. The prosodic analysis models FAO with prosodic subcategorization for phonologically prominent positions. I argue that the two analyses make different predictions as to the preconditions for and the extent of FAO. I show that both the morphological and the prosodic profile of FAO are attested. I propose that FAO in PG is an instance of the latter. Thus, I demonstrate that FAO is not a unified phenomenon, but rather should be typologized as driven by either morphological or prosodic factors.

Natural Language & Linguistic Theory
This paper describes and analyzes phonological processes pertinent to the glottal stop in A’ingae (or Cofán, ISO 639-3: ). The operations which the glottal stops undergo and trigger reveal an interaction of two morphophonological parameters: stratum and stress dominance. First, verbal suffixes are organized in two morphophonological domains, or strata. Within the inner domain, glottal stops affect stress placement, which I analyze as an interaction with foot structure. In the outer domain, glottal stops do not have any effects on stress. Second, some verbal suffixes delete stress (i.e. they are dominant). Dominance is unpredictable and independent of the suffix’s morphophonological domain, but dominance and the phonological domain interact in a non-trivial way: only inner dominant suffixes delete glottalization. To account for the A’ingae data, I adopt Cophonologies by Phase (Sande et al. 2020), which (i) models phonological stratification while (ii) allowing for morpheme-specific ...

Proceedings of the Annual Meetings on Phonology
In Paraguayan Guaraní (Tupian, ISO 639-3: gug), suffix order is determined by several factors, including syntactic scope, morphotactic restrictions, free variation, and prosody. Paraguayan Guaraní suffixes form two syntactic classes: predicate-level suffixes and clause-level suffixes. Both syntactic classes include stressable and stressless suffixes. Predicate-level suffixes typically precede clause-level suffixes. However, stressable suffixes always precede stressless ones. Furthermore, within both groups (stressable or stressless), the order of suffixes is largely free. I propose that stressable suffixes are independently prosodified phonological words and stressless suffixes are non-prosodified. I analyze the Paraguayan Guaraní suffix order as an interaction of mirroring between the order of suffixes and the order of syntactic operations, on the one hand, and prosodic subcategorization and demands on phonological well-formedness, on the other. Thus, I document and analyze an unusua...

Video recordings and notes from in-class and small-group elicitation sessions pertaining to lexicon, grammar, and phonology, and of narrative texts. The course was carried out during the COVID-19 pandemic on Zoom. Recordings were made as compressed M4A files on Zoom, instead of as uncompressed WAV files using a digital recorder. Notes were taken using Google Docs, instead of with a notebook and pen; the Google documents are archived as PDF/A files. File bundles 004, 005, and 042 were deleted during the course of creating the archival deposit.

The intention of this proposal is to advocate for the inclusion of an emoji, the pretzel, as a Unicode emoji character. The recognition of the pretzel, imbued with versatile symbolism, is widespread and cross-cultural, despite its long European tradition. The pretzel, a timeless and widely appreciated baked food, has emotional and cultural significance for many people across the world, especially in Europe and America. The pretzel emoji has potential for versatile usage due to its unique looped shape, which can be reinterpreted to stand for many things.
Introduction
Pretzels are found all around the globe. Even though the details of the pretzel’s advent have been lost somewhere in the pages of history, there is a legend which ties its origins to a 7th-century Italian monastery, where a monk once decided to present his pupils with baked goods shaped like crossed arms, a traditional prayer gesture. With the increasing popularity of the pretzel, the three pretzel holes eventu...