Papers by Francesco Cutugno
In this paper we explore the usefulness of prosodic features for syllable classification. To do this, we represent the syllable as a static analysis unit, so that its acoustic-temporal dynamics can be merged into a set of features that the SVM classifier considers as a whole. In the first part of our experiment we used MFCCs as features for classification, obtaining a maximum accuracy of 86.66%. The second part of our study tests whether prosodic information is complementary to cepstral information for syllable classification. The results show that combining the two types of information does improve classification, but further analysis is necessary for a more successful combination of the two types of features.
Automatic Speech Segmentation for Italian (ASSI): tools, models, evaluation and applications
HAL (Le Centre pour la Communication Scientifique Directe), Jan 26, 2011
The main aim of this work is to provide a set of tools for automatic segmentation of Italian speech into phones. We make available already trained acoustic models, a compiled version of the automatic aligner, a script to convert the segmentation ...
Automatic speech segmentation for Italian: tools, models, evaluation, and applications
On this website, a set of statistical models is made available that can be used for the automatic segmentation of Italian speech into phones. Segmenting a number of speech signals therefore becomes straightforward and fast: no training of the acoustic models is necessary. The ...
Pitch and Functional Characterization of Hesitation Phenomena in Italian Discourse
Schettino L, Betz S, Cutugno F, Wagner P. Pitch and functional characterization of hesitation phenomena in Italian discourse. Presented at Phonetics and Phonology in Europe, Barcelona, Spain.

Multiple-source Data Collection and Processing into a Graph Database Supporting Cultural Heritage Applications
Journal on Computing and Cultural Heritage, 2021
The continuous growth of available resources on the web, both in the form of Linked Open Data and on Social Networks, provides an important opportunity to gather information concerning specific kinds of touristic activities such as cultural tourism, eco-tourism, bike-tourism, and so on. Both decision makers and tourists can take advantage of these data, as demonstrated by previous works, with institutional actors foreseeing an increase in the use of these data to replace other time-consuming and expensive approaches. However, managing multiple sources built with different goals and structures is not straightforward, so specific design choices must be made when assembling this kind of information. Graph databases represent an ideal way to combine multiple-source data but, to be successful, strategies accounting for inconsistencies and format differences have to be defined to support coherent analysis. Also, the continuously changing nature of crowd-sourced data makes i...
On the use of the rhythmogram for automatic syllabic prominence detection
Interspeech 2011, 2011
Authors: Bogdan Ludusan, Antonio Origlia, Francesco Cutugno.

Percezione e Categorizzazione DI Foni Vocalici: Adeguatezza Delle Procedure Sperimentali
Abstract. This paper discusses some problems concerning the investigation methods used in studies of phonetic perception. In particular, it aims to highlight the risk of circularity inherent in experimental procedures that rely on perception tests on isolated segments with a forced-choice response paradigm. The results of the experiment presented in this contribution call into question the usefulness and adequacy of such procedures and, above all, caution the experimenter against attributing general and absolute validity to the categorization induced by the specific task required and by the set of responses made available. Introduction. The working hypothesis stems from the analysis of the results of two perception experiments on vowel stimuli, aimed at defining categories and locating boundaries between adjacent vowel phones. In both experiments, attention was directed to the portion of the F1/F2 plane corresponding to the area of existence of the Italian vowels lying on the continuum [a, E, e, i]. In the first experiment [1], a vowel identification test was carried out with synthetic vowel stimuli, divided into three series along the [a-i] axis, in order to identify the perceptual boundaries between the four vowel categories under examination: the results of the test confirmed that areas of perceptual existence delimited by sharp, well-defined boundaries could be identified, and led to the hypothesis that a perceptual evaluation could serve as a useful supplementary criterion for defining vowel categories, alongside the traditional articulatory-acoustic one.
Limiti e complessità del recupero dell'informazione da Treebank sintattiche
Su Alcune Correlazioni Tra Riduzioni Segmentali Tratti Prosodici Nel Parlato Spontaneo: Il Ruolo Del Fattore Tempo
Un'indagine sulla definizione del confine percettivo tra foni vocalici
The author presents and discusses the results of two identification tests carried out to verify the existence of perceptual boundaries between different vowel phonemes in Italian. The focus is on the portion of the F1/F2 diagram that delimits the area of the central and front vowels (a, e, e, i). Within this area, the author studies the fluctuations of the perceptual boundaries between contiguous phonemes which, in spoken Italian, show significant zones of overlap.
API: Archivio del parlato italiano
The vowel system of Italian connected speech
Lecture Notes in Computer Science, 2005
We propose a Multigranular Automatic Speech Recognizer. The hypothesis is that the speech signal contains information distributed across several different time scales. Many works from various scientific fields, ranging from neurobiology to speech technologies, seem to agree on this assumption. In a broad sense, it seems that speech recognition in humans is optimal because of a partial parallelization process in which the left-to-right stream of speech is captured in a multilevel grid where several linguistic analyses take place simultaneously. Our investigation aims, in this view, to apply these new ideas to the design of more robust and efficient recognizers.

A dialogue system for multimodal human-robot interaction
Proceedings of the 15th ACM on International conference on multimodal interaction, 2013
This paper presents a POMDP-based dialogue system for multimodal human-robot interaction (HRI). Our aim is to exploit a dialogical paradigm to allow a natural and robust interaction between the human and the robot. The proposed dialogue system should improve the robustness and the flexibility of the overall interactive system, including multimodal fusion, interpretation, and decision-making. The dialogue is represented as a Partially Observable Markov Decision Process (POMDP) to cast the inherent communication ambiguity and noise into the dialogue model. POMDPs have been used in spoken dialogue systems, mainly for tourist information services, but their application to multimodal human-robot interaction is novel. This paper presents the proposed model for dialogue representation and the methodology used to compute a dialogue strategy. The whole architecture has been integrated on a mobile robot platform and has been tested in a human-robot interaction scenario to assess the overall performance with respect to baseline controllers.
Proceedings of LREC2010, 2010
In this work we present further development of the SpLaSH (Spoken Language Search Hawk) project. SpLaSH implements a data model for annotated speech corpora integrated with textual markup (i.e., POS tagging, syntax, pragmatics) including a toolkit used to perform ...
Sillabificazione fonologica e sillabificazione fonetica
Atti del XXXIII, Congresso della Società di …, 2001
AN. ANA. S.: aligning text to temporal syntagmatic progression in Treebanks
… of the 5th Corpus Linguistics Conference …, 2009
1. Introduction. The impressive results derived from multimillion-word corpora and the experience gained in the automatic processing of syntactic data in the computational linguistics field in recent decades facilitate the pursuit of further objectives today. From a linguistic point of view, ...
EVALITA 2009: Abla srl Participant Report
In this paper we describe the two systems we presented at the EVALITA 2009 workshop, for the connected digits recognition task. The former is an Abla srl proprietary speech recognizer, based on standard decoding algorithms, with syllabic acoustic models. The ...
Time-and Text-Aligned Annotations: the SpLaSH Data Model
In this work we present the SpLaSH data model. SpLaSH (Spoken Language Search Hawk) is a freely available toolkit able to perform complex queries on spoken language corpora. The proposed system implements a data model for the integration of any kind of phonetic annotation ...
Reducing hardware and software complexity in Eye-tracking techniques
In recent years, eye tracking has become one of the most promising methodologies for human-computer interaction: its applications are expected in many different fields, such as human-computer interface implementation, adaptive interfaces based on the user's ...