Emanuela Cresti

Papers by Emanuela Cresti

The Appendix of Comment according to Language into Act Theory

CHIMERA: Revista de Corpus de Lenguas Romances y Estudios Lingüísticos

The article deals with the détachement instances, an aspect of spoken language differing from the binary structure (Topic-Comment) considered to both semantically and informationally form the basic unit of spoken language. According to Language into Act Theory, détachement instances are considered specific information units called Appendix of Comment (APC), with a clear distinction from the Topic unit. The APC may be formally identified in the corpus through its distribution after the Comment and its prosodic performance via a suffix unit. The APC records a frequency value of 4.28% of reference units, which is significantly lower than that of the Topic (close to 20%). The morpho-syntactic fillings of the APC show a kind of “randomness” that cannot truly be generalized, unlike those of the Topic, since they are employed “in the moment”, as late adjunctions, echoes, repetitions, deictics, and formulas. The APC does not constitute a syntactic/semantic island, as the Topic does, and its content is ultimately...

The Discourse Connector according to the Language into Act Theory: data from IPIC Italian

Contenuto/Content
3 Linguistica delle varietà e Multilinguismo - Variety linguistics and Multilingualism
Gaetano Berruto, La nozione di 'varietà di lingua': una categoria obsoleta?
Maria Vender - Maria Teresa Guasti, L'apprendimento della lettura nei bambini con italiano L2
Tanja Kupisch, Italian as a heritage language in Germany - Acquisition outcomes and the role of cross-linguistic influence
4 Le lingue del Trentino-Alto Adige - Languages in Trentino-South Tyrol
Giampaolo Salvi, Come mettersi d'accordo se si è persa la testa? L'accordo parziale nel sintagma nominale delle varietà ladine: il caso dei sintagmi nominali con testa non-espressa
Silvia Dal Negro - Katrin Tartarotti, "Muttårschpråche daitsch, però ho sempre parlato italiano". Comunità linguistiche di confine nella Bassa Atesina

L'intonazione delle illocuzioni naturali rappresentative. Analisi e validazione percettiva

Collezione dei preprint 1997-98, Lablita, 1998

Il connettore discorsivo secondo la teoria della lingua in atto

Il progetto C-ORAL-ROM. Integrated reference corpora for spoken romance languages

The LABLITA Corpus & the Language into Act Theory: Analysis of Viterbo Excerpts

The role of prosody for the expression of illocutionary types. The prosodic system of questions in spoken Italian and French according to Language into Act Theory

Frontiers in Communication, Apr 17, 2023

L'intonation des illocutions naturelles représentatives ; analyse et validation perceptive

Macro-Syntaxe et Pragmatique, L'analyse linguistique de l'oral, 2002

Sessione Plenaria 1

The C-ORAL-ROM Project. New methods for spoken language archives in a multilingual romance corpus

C-ORAL-ROM is a multilingual corpus of spontaneous speech of around 1,200,000 words representing the four main Romance languages: French, Italian, Portuguese and Spanish. The resource will be delivered in standard textual format, aligned to the audio source in a multimedia edition. C-ORAL-ROM aims to ensure both a sufficient representation of spontaneous speech variation in each language resource, and comparability among the four resources with respect to a definite set of variation parameters. The multimedia conception of C-ORAL-ROM allows simultaneous alignment and full appreciation of the acoustic information through the speech software WINPITCHCORPUS. The storage of spoken language resources is based on the identification of utterances in the four corpora through perceptively relevant prosodic properties. In C-ORAL-ROM, all the textual information is tagged simultaneously with respect to prosodic parsing and utterance limits. Each prosodic unit corresponding to an utterance is easily and directly aligned to its acoustic counterpart, thus ensuring a natural text-sound correspondence and the definition of a database of possible speech acts in the four Romance languages.

Ricerche grammaticali corpus based e corpus driven

The project is directed by Emanuela Cresti, with the participation of researchers and doctoral students affiliated with LABLITA. It is devoted to corpus-based and corpus-driven grammatical research on the lexicon and constructions of spontaneous speech. The project aims to highlight the heuristic value, innovative with respect to traditional grammar, of empirical research grounded in the analysis of large spoken corpora. Specifically, the data for the various lines of research come from the different LABLITA subcorpora (Campionamento LABLITA, C-ORAL-ROM Italia). The research draws on the expertise and computing facilities made available by LABLITA. The theoretical framework shared by the different research areas is Emanuela Cresti's "Teoria della lingua in atto" (Language into Act Theory).

The Language into Act Theory: A Pragmatic Approach to Speech in Real-Life

This paper briefly introduces the Language into Act Theory (L-AcT), which proposes a pragmatic framework for the corpus-based collection and analysis of spontaneous speech. The L-AcT methodology takes the utterance (i.e. the counterpart of a speech act) as the reference unit for analysis. A set of large-scale Romance corpora has been collected in accordance with the L-AcT methodology (LABLITA Corpus, C-ORAL-ROM, C-ORAL-BRASIL, Cor-DiAL). Data from each corpus can be compared across languages, since the corpora are built using the same corpus design, which entails a set of variation parameters relevant for representing spontaneous speech and, specifically, its pragmatic variation. LABLITA-C-ORAL corpora are text/sound aligned at the utterance level. Empirical research carried out by LABLITA has verified a systematic correspondence between stretches of speech ending with a terminal prosodic break and the accomplishment of an illocutionary force, thus identifying utterances. Within the latter, ...

Seconda Giornata

IX Giornate di Studio del Gruppo di Fonetica Sperimentale dell'AIA Aspetti computazionali in...

C-ORAL-ROM : integrated reference corpora for spoken Romance languages

by Emanuela Cresti, Massimo Moneglia, and Antonio M Sandoval

... 163 Maria Fernanda Bacelar do Nascimento, Jose Bettencourt Gonçalves, Rita Veloso, Sandra Ant...

Corpus and Tools for the Acquisition of Italian L2

This paper introduces the RIDIRE corpus, built by means of an open source tool (RIDIRE-CPI) for creating specifically designed web corpora through a targeted crawling strategy. The RIDIRE-CPI architecture combines existing open source tools with specifically developed modules, comprising a robust crawler, a user-friendly web interface, several conversion and cleaning tools, an anti-duplicate filter, a language guesser, and a PoS-tagger. The RIDIRE corpus is a balanced Italian web corpus (1.5 billion tokens) designed for enhancing the study of Italian as a second language, while also being exploitable for lexicographic purposes. The targeted crawling was performed through content selection, metadata assignment, and validation procedures. These features allowed the construction of a large corpus with a specific design, covering a variety of language usage domains (News, Business, Administration and Legislation, Literature, Fiction, Design, Cookery, Sport, Tourism, Religion, Fine Arts,...

The illocution-prosody relationship and the Information Pattern in spontaneous speech according to the Language into Act Theory (L-AcT)

Linguistik Online, 2018

This paper introduces the question of how to define reference units for speech, subject to the necessary condition that they be an adequate and useful means for analyzing large spoken corpora. According to Language into Act Theory (L-AcT), the utterance is the proper reference unit and the counterpart of the speech act (Austin 1962), being demarcated by prosody within the flow of speech. The pragmatic foundations of the utterance and its information structure are described; both are closely connected to the role of prosody in their identification. Pragmatic and information analyses of English and Romance examples are presented, taken from representative spoken corpora (C-ORAL-ROM, C-ORAL-BRASIL, S. Barbara Corpus). Regarding the information structure, the Comment unit is considered the core of the Information Pattern and, since its role is the expression of the illocution, it automatically conveys the new information. The Comment may be accompanied and s...

Prospettive nello studio del lessico italiano

Proceedings e report, 2008

The Proceedings of the 9th Conference of the International Society of Italian Linguistics and Philology (SILFI), «Prospects in the study of Italian vocabulary» (Florence, 14-17 June 2006), comprise 88 contributions by scholars from Italy and abroad. The essays are divided into twelve sections, each representing a study prospect, thus illustrating the vitality of the great tradition of Italian studies on language. The Conference confirms the importance of tradition, but also points up how the new areas of study – concerning the use of information infrastructures for the acquisition and conservation of the linguistic heritage – are by now pivotal both for research and for the establishment of essential resources for the defence and promotion of our language. Meditation on the Italian lexicon at this moment in time signifies retrieving the relation between our language and our culture, which tends to be overshadowed in a period of globalisation and of vehicular language such as the pre...

Preprint 2001 N1

Introduzione ai corpora dell’italiano

Syntactic properties of spontaneous speech in the Language into Act Theory

Studies in Corpus Linguistics, 2014
