Papers by Emanuela Cresti

CHIMERA: Revista de Corpus de Lenguas Romances y Estudios Lingüísticos
The article deals with instances of détachement, an aspect of spoken language that departs from the binary Topic-Comment structure considered to form, both semantically and informationally, the basic unit of spoken language. According to Language into Act Theory, détachement instances are specific information units called Appendix of Comment (APC), clearly distinct from the Topic unit. The APC can be formally identified in the corpus through its distribution after the Comment and its prosodic performance via a suffix unit. The APC accounts for 4.28% of reference units, significantly lower than the Topic (close to 20%). The morpho-syntactic fillings of the APC show a kind of “randomness” that cannot truly be generalized, unlike those of the Topic, since they are employed “in the moment”, as late adjunctions, echoes, repetitions, deictics, and formulas. The APC does not constitute a syntactic/semantic island, as the Topic does, and its content is ultimately...

Contenuto/Content
3 Linguistica delle varietà e Multilinguismo / Variety linguistics and Multilingualism
Gaetano Berruto, La nozione di 'varietà di lingua': una categoria obsoleta?
Maria Vender and Maria Teresa Guasti, L'apprendimento della lettura nei bambini con italiano L2
Tanja Kupisch, Italian as a heritage language in Germany: Acquisition outcomes and the role of cross-linguistic influence
4 Le lingue del Trentino-Alto Adige / Languages in Trentino-South Tyrol
Giampaolo Salvi, Come mettersi d'accordo se si è persa la testa? L'accordo parziale nel sintagma nominale delle varietà ladine: il caso dei sintagmi nominali con testa non-espressa
Silvia Dal Negro and Katrin Tartarotti, "Muttårschpråche daitsch, però ho sempre parlato italiano". Comunità linguistiche di confine nella Bassa Atesina
Collezione dei preprint 1997-98, Lablita, 1998
Frontiers in Communication, Apr 17, 2023
Macro-Syntaxe et Pragmatique, L'analyse linguistique de l'oral, 2002

C-ORAL-ROM is a multilingual corpus of spontaneous speech of around 1,200,000 words representing the four main Romance languages: French, Italian, Portuguese and Spanish. The resource will be delivered in standard textual format, aligned to the audio source in a multimedia edition. C-ORAL-ROM aims to ensure both a sufficient representation of spontaneous speech variation in each language resource and comparability among the four resources with respect to a definite set of variation parameters. The multimedia conception of C-ORAL-ROM allows simultaneous alignment and full appreciation of the acoustic information through the speech software WINPITCHCORPUS. The storage of spoken language resources is based on the identification of utterances in the four corpora through perceptively relevant prosodic properties. In C-ORAL-ROM, all the textual information is tagged simultaneously with respect to prosodic parsing and utterance limits. Each prosodic unit corresponding to an utterance is easily and directly aligned to its acoustic counterpart, thus ensuring a natural text-sound correspondence and the definition of a database of possible speech acts in the four Romance languages.
The project is directed by Emanuela Cresti, with the participation of researchers and doctoral students affiliated with LABLITA. It is devoted to corpus-based and corpus-driven grammatical research on the lexicon and constructions of spontaneous speech. The project aims to highlight the heuristic value, innovative with respect to traditional grammar, of empirical research grounded in the analysis of large spoken corpora. Specifically, the data for the various lines of research come from the different LABLITA subcorpora (Campionamento LABLITA, C-ORAL-ROM Italia). The research draws on the expertise and computing tools made available by LABLITA. The theoretical framework shared by the different research areas is Emanuela Cresti's "Teoria della lingua in atto" (Language into Act Theory).

This paper briefly introduces the Language into Act Theory (L-AcT), which proposes a pragmatic framework for the corpus-based collection and analysis of spontaneous speech. The L-AcT methodology takes the utterance (i.e. the counterpart of a speech act) as the reference unit for analysis. A set of large-scale Romance corpora has been collected in accordance with the L-AcT methodology (LABLITA Corpus, C-ORAL-ROM, C-ORAL-BRASIL, Cor-DiAL). Data from the corpora can be compared across languages, since they are built using the same corpus design, which entails a set of variation parameters relevant for representing spontaneous speech and, specifically, its pragmatic variation. LABLITA-C-ORAL corpora are text/sound aligned at the utterance level. Empirical research carried out by LABLITA has verified a systematic correspondence between stretches of speech ending with a terminal prosodic break and the accomplishment of an illocutionary force, thus identifying utterances. Within the latter, ...
IX Giornate di Studio del Gruppo di Fonetica Sperimentale dell'AIA. Aspetti computazionali in fonetica, linguistica e didattica delle lingue: modelli e algoritmi
... 163 Maria Fernanda Bacelar do Nascimento, José Bettencourt Gonçalves, Rita Veloso, Sandra Antunes, Florbela Barreto, and Raquel Amaro 5.1 History ... We are also especially obliged to the project reviewers Johanna Moore and Louis ten Bosch: their suggestions have been ...

This paper introduces the RIDIRE corpus, built by means of an open-source tool (RIDIRE-CPI) for creating specifically designed web corpora through a targeted crawling strategy. The RIDIRE-CPI architecture combines existing open-source tools with specifically developed modules, comprising a robust crawler, a user-friendly web interface, several conversion and cleaning tools, an anti-duplicate filter, a language guesser, and a PoS-tagger. The RIDIRE corpus is a balanced Italian web corpus (1.5 billion tokens) designed to support the study of Italian as a second language, while also being exploitable for lexicographic purposes. The targeted crawling was performed through content selection, metadata assignment, and validation procedures. These features allowed the construction of a large corpus with a specific design, covering a variety of language usage domains (News, Business, Administration and Legislation, Literature, Fiction, Design, Cookery, Sport, Tourism, Religion, Fine Arts,...

Linguistik Online, 2018
This paper introduces the question of how to define reference units for speech, under the necessary condition that they must be an adequate and useful means for analyzing large spoken corpora. According to Language into Act Theory (L-AcT), the utterance is the proper reference unit and the counterpart of the speech act (Austin 1962), being demarcated by prosody within the flow of speech. The pragmatic foundations of the utterance and its information structure are described; both are closely connected to the role of prosody in their identification. A pragmatic and informational analysis of English and Romance examples is presented, with examples taken from representative spoken corpora (C-ORAL-ROM, C-ORAL-BRASIL, S. Barbara Corpus). Regarding the information structure, the Comment unit is considered the core of the Information Pattern and, since its role is the expression of the illocution, it automatically conveys the new information. The Comment may be accompanied and s...

Proceedings e report, 2008
The Proceedings of the 9th Conference of the International Society of Italian Linguistics and Philology (SILFI), «Prospects in the study of Italian vocabulary» (Florence, 14-17 June 2006), comprise 88 contributions by scholars from Italy and abroad. The essays are divided into twelve sections, each representing a study prospect, thus illustrating the vitality of the great tradition of Italian studies on language. The Conference confirms the importance of tradition, but also points up how the new areas of study – concerning the use of information infrastructures for the acquisition and conservation of the linguistic heritage – are by now pivotal both for research and for the establishment of essential resources for the defence and promotion of our language. Meditation on the Italian lexicon at this moment in time signifies retrieving the relation between our language and our culture, which tends to be overshadowed in a period of globalisation and of vehicular language such as the pre...
Studies in Corpus Linguistics, 2014