Papers by Dijana Petrovska-delacrétaz

Biometrics and cryptography are two tools which have high potential for providing information security and privacy. A combination of these two can eliminate their individual shortcomings, such as non-revocability, non-diversity, and privacy issues in biometrics and the need for strong authentication in cryptography. Cryptobiometric systems combine techniques from biometrics and cryptography for these purposes and, more interestingly, to obtain biometrics-based cryptographic keys. In this paper, we address the problem of sharing these keys. We propose a cryptobiometric scheme in which two clients can share a session key securely and establish a secure communication session. The scheme involves a Central Authority for Registration and Authentication (CARA) with which the clients are registered. The CARA stores biometric data only in transformed, cancelable form, allowing for easy revocation of the templates and protecting privacy. There are two distinctive features of this protocol: (1) it achieves mutual authentication and starts secure communication between two clients which may be previously unknown to each other, and (2) it works even if the two clients use different biometric modalities in the same (as well as in different) sessions. Nowadays, information exchange via electronic means is a widely employed task. While the transfer of information is a day-to-day need, it is also required to protect the privacy of this information. The information may be sensitive and be meant for use by designated entities only. In order to protect the privacy of this information, its transmission is generally secured through cryptographic means. The information is first converted into unreadable form through a process called encryption before it is sent. The receiver needs to perform a decryption operation to retrieve the information after receiving it.
The encryption and decryption operations depend on long cryptographic keys. Generally, only those who have the correct keys in their possession can recover the transmitted information. However, because these keys are long (e.g., the Advanced Encryption Standard (AES) [aes01] requires keys of 128, 192, or 256 bits), they need to be stored somewhere. Therefore, in order to maintain the secrecy and privacy of the information, these keys should be kept secret, and access control mechanisms are required to share the keys only with the designated entities.
Biometrics-Based Secure Authentication Protocols
Springer eBooks, 2012
Cancelable Biometric System
Springer eBooks, 2012
Locality preserving binary face representations using auto‐encoders
IET Biometrics, Sep 1, 2022
This article describes the POLYCOST database, which is dedicated to speaker recognition applications over telephone lines. The characteristics of the database are: a large corpus with varied content (100 speakers), English spoken by non-native speakers, read digits and free speech, recordings made over international telephone lines, and eight or more sessions per speaker.
Semi-Automatic Identification of Leopard Frogs
HAL (Le Centre pour la Communication Scientifique Directe), Mar 1, 2014
Text-independent speaker verification using automatically labelled acoustic segments
Conference of the International Speech Communication Association, 1998

Une empreinte audio à base d’ALISP appliquée à l’identification audio dans un flux radiophonique
HAL (Le Centre pour la Communication Scientifique Directe), May 1, 2012
This article presents an audio identification system that detects and identifies advertisements and pieces of music in radio streams using acoustic units. These units, named ALISP (Automatic Language Independent Speech Processing), are learned fully automatically through temporal decomposition, vector quantization, and HMM models. The originality of the approach is that no transcription is used to train the HMM models. To identify pieces of music and advertisements, the ALISP transcriptions of the reference items are compared with the transcriptions of the test radio stream using the Levenshtein distance. For advertisement identification, we obtain a precision of 99% and a recall of 94% on a test stream containing 4401 advertisements. For music identification, we obtain a precision of 100% and a recall of 95% on a test stream containing 505 pieces of music.
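As an illustration of the comparison step in this abstract, the following is a minimal sketch of matching a stream segment against reference items with the Levenshtein distance. The symbol names (`Ha`, `Hx`, ...), the reference names, and the length normalization are assumptions made for the example; the abstract does not say how ALISP symbols are labelled or how distances are thresholded.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_match(references, segment):
    """Return (name, normalized distance) of the reference closest to the segment.

    Dividing by the longer length is an assumption of this sketch.
    """
    scored = {
        name: levenshtein(seq, segment) / max(len(seq), len(segment), 1)
        for name, seq in references.items()
    }
    name = min(scored, key=scored.get)
    return name, scored[name]


# Hypothetical symbol transcriptions of two reference items and one stream segment.
references = {
    "ad_1": ["Ha", "Hb", "Hc", "Hd"],
    "song_1": ["Hx", "Hy", "Hz", "Hx"],
}
segment = ["Ha", "Hb", "Hd"]
match = best_match(references, segment)
```

In a real stream, a decision threshold on the normalized distance would be needed to reject segments that match no reference; that threshold is not given in the abstract.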

HAL (Le Centre pour la Communication Scientifique Directe), Nov 26, 2012
The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes its participation in the TRECVID 2012 semantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stage processing pipeline to compute scores for the likelihood that a video shot contains a target concept. These scores are then used to produce a ranked list of the images or shots most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.2378, which ranked us 4th out of 16 participants. For the instance search task, our approach uses two steps. First, the individual methods of the participants are used to compute the similarity between an example image of an instance and the keyframes of a video clip. Then a two-step fusion method is used to combine these individual results and obtain a score for the likelihood that an instance appears in a video clip. These scores are used to obtain a ranked list of the clips most likely to contain the queried instance. The best IRIM run has a MAP of 0.1192, which ranked us 29th out of 79 fully automatic runs.
Cohort selection for text-dependent speaker verification score normalization
In this paper, a speaker-dependent cohort selection for T-norm score normalization is proposed in the context of text-dependent speaker verification. The goal of the proposed technique is to find a set of cohort speakers who are close to the target speaker. In order to properly select the subset of speakers for the normalization, a distance between each target speaker model and the available normalization models is computed, and the nearest models are chosen to represent the cohort set for that target model. The proposed system is evaluated on Part 1 of the RSR2015 database. With the proposed normalization method, a relative improvement of 71% in terms of the Equal Error Rate (EER) is achieved.
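The selection and normalization steps in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: speaker models are represented as plain vectors, the model distance is Euclidean, and all names and values are hypothetical. The abstract does not specify which distance the authors use.

```python
import math


def select_cohort(target_model, norm_models, k):
    """Return the names of the k normalization models nearest to the target model.

    Euclidean distance between model vectors is an assumption of this sketch.
    """
    ranked = sorted(norm_models,
                    key=lambda name: math.dist(target_model, norm_models[name]))
    return ranked[:k]


def t_norm(raw_score, cohort_scores):
    """T-norm: center and scale the raw score by the mean and standard
    deviation of the test utterance's scores against the cohort models."""
    mu = sum(cohort_scores) / len(cohort_scores)
    var = sum((s - mu) ** 2 for s in cohort_scores) / len(cohort_scores)
    sigma = math.sqrt(var)
    return (raw_score - mu) / sigma if sigma > 0 else raw_score - mu


# Hypothetical target model and normalization models.
target = [0.0, 0.0]
norm_models = {
    "spk_a": [1.0, 0.0],
    "spk_b": [5.0, 5.0],
    "spk_c": [0.0, 2.0],
    "spk_d": [-1.0, -1.0],
}
cohort = select_cohort(target, norm_models, k=2)

# Hypothetical scores of one test utterance against every normalization model.
utterance_scores = {"spk_a": 1.0, "spk_b": -3.0, "spk_c": 0.5, "spk_d": 3.0}
normalized = t_norm(4.0, [utterance_scores[name] for name in cohort])
```

Because the cohort depends on the target model, it can be selected once at enrollment and reused for every test utterance scored against that target.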

DOAJ (DOAJ: Directory of Open Access Journals), Apr 1, 2004
This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step in dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. The paper concludes by giving a few research trends in speaker verification for the next couple of years.
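To make the evaluation step mentioned in this abstract concrete, here is a minimal sketch that computes the operating points behind a DET curve (false rejection rate against false acceptance rate) and an approximate Equal Error Rate. The scores are hypothetical, and taking the EER at the threshold where the two rates are closest is a simplification chosen for this example.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FRR, FAR) when scores at or above the threshold are accepted."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far


def det_points(genuine, impostor):
    """One (threshold, FRR, FAR) point per distinct observed score."""
    thresholds = sorted(set(genuine) | set(impostor))
    return [(t, *error_rates(genuine, impostor, t)) for t in thresholds]


def approx_eer(genuine, impostor):
    """Average of FRR and FAR at the threshold where they are closest."""
    _, frr, far = min(det_points(genuine, impostor),
                      key=lambda point: abs(point[1] - point[2]))
    return (frr + far) / 2


# Hypothetical verification scores (higher means more likely the target speaker).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer = approx_eer(genuine_scores, impostor_scores)
```

A DET plot draws the same (FAR, FRR) points on normal-deviate axes; the plotting itself is left out of this sketch.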
Cryptographic Key Regeneration Using Biometrics
Springer eBooks, 2012

2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)
Biometric systems suffer from non-revocability. In this paper, we propose a cancelable speaker verification system based on the classical Gaussian Mixture Model (GMM) methodology, enriched with the desired characteristics of revocability and privacy. The GMM model is transformed into a binary vector that is used by a shuffling scheme to generate a cancelable template and to guarantee the cancelability of the overall system. Leveraging the shuffling scheme, the speaker model can be revoked and another model can be reissued. Our proposed method enables the generation of multiple cancelable speaker templates from the same biometric modality that cannot be linked to the same user. The proposed system is evaluated on the RSR2015 databases. It outperforms the basic GMM system, and experiments show a significant improvement in speaker verification performance, achieving an Equal Error Rate (EER) of 0.01%.
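The idea of revoking a template by changing a key can be illustrated with a small key-driven block permutation of a binary vector. This is a sketch under assumptions: the abstract does not describe the shuffling scheme in detail, so the block layout, the key format, and the function names here are hypothetical stand-ins.

```python
def shuffle_blocks(bits, key_bits):
    """Permute a binary vector using a binary key.

    The vector is split into len(key_bits) equal blocks. Blocks whose key bit
    is 1 are placed first, followed by blocks whose key bit is 0, keeping the
    original order within each group.
    """
    n = len(key_bits)
    if n == 0 or len(bits) % n != 0:
        raise ValueError("vector length must be a multiple of the key length")
    size = len(bits) // n
    blocks = [bits[i * size:(i + 1) * size] for i in range(n)]
    ones = [blk for blk, k in zip(blocks, key_bits) if k == 1]
    zeros = [blk for blk, k in zip(blocks, key_bits) if k == 0]
    return [bit for blk in ones + zeros for bit in blk]


def hamming(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical binary vector derived from a speaker model, and a probe vector.
enrolled = [1, 0, 0, 0, 1, 1, 0, 1]
probe = [1, 0, 0, 1, 1, 1, 0, 1]

old_key = [0, 1, 1, 0]
new_key = [1, 0, 0, 1]  # issued after the old template is revoked

template_old = shuffle_blocks(enrolled, old_key)
template_new = shuffle_blocks(enrolled, new_key)
```

Since the same permutation is applied to the enrolled and probe vectors when they share a key, their Hamming distance is unchanged, while templates issued under different keys differ from each other.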

Methodologies of Audio-Visual Biometric Performance Evaluation for the H2020 SpeechXRays Project
2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)
Biometric recognition is nowadays widely used in different services and applications, making user authentication easier and more secure than traditional authentication systems. Starting from this idea, the EU H2020 SpeechXRays project developed and evaluated, in real-life environments, a user recognition platform based on face and voice modalities. Since the proposed biometric solution was evaluated in real-life environments where the recorded biometric data was not accessible because of the General Data Protection Regulation (GDPR), the ground truth of the conducted evaluation was not available. To correctly report the performance evaluation, some methodologies were proposed to detect the errors caused by the absence of ground truth. This paper describes the biometric solution provided by the project and presents the biometric performance evaluation carried out in three real-life use case pilots on more than 2,000 users.
i-Vectors for Language Dialect Recognition
HAL (Le Centre pour la Communication Scientifique Directe), Dec 1, 2015

Voice characteristics from isolated rapid eye movement sleep behavior disorder to early Parkinson's disease
Parkinsonism & Related Disorders, 2022
Speech disorders are amongst the first symptoms to appear in Parkinson's disease (PD). We aimed to characterize PD voice signature from the prodromal stage (isolated rapid eye movement sleep behavior disorder, iRBD) to early PD using an automated acoustic analysis and compare male and female patients. We carried out supervised learning classifications to automatically detect patients using voice only. Speech samples were acquired in 256 French speakers (117 participants with early PD, 41 with iRBD, and 98 healthy controls), with a professional quality microphone, a computer microphone and their own telephone. High-level features related to prosody, phonation, speech fluency and rhythm abilities were extracted. Group analyses were performed to determine the most discriminant features, as well as the impact of sex, vocal tasks, and microphone type. These speech features were used as inputs of a support vector machine and were combined with classifiers using low-level features. PD related impairments were found in prosody, pause durations and rhythmic abilities, from the prodromal stage. These alterations were more pronounced in men than in women. Early PD detection was achieved with a balanced accuracy of 89% in males and 70% in females. Participants with iRBD were detected with a balanced accuracy of 63% (reaching 70% in the subgroup with mild motor symptoms). This study provides new insight in the characterization of sex-dependent early PD speech impairments, and demonstrates the valuable benefit of including automated voice analysis in future diagnostic procedures of prodromal PD.
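The results in this abstract are reported as balanced accuracy, which is the mean of sensitivity and specificity and is therefore less affected by unequal group sizes than plain accuracy. A minimal sketch with hypothetical confusion-matrix counts (these are not the study's data):

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on patients) and specificity (recall on controls)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(tp, fn, tn, fp):
    """Fraction of all participants classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical counts for an imbalanced test set: 120 patients, 20 controls.
counts = dict(tp=90, fn=30, tn=10, fp=10)
bal_acc = balanced_accuracy(**counts)
plain_acc = accuracy(**counts)
```

With these counts, the classifier detects 75% of patients but only 50% of controls, so the balanced accuracy is lower than the plain accuracy, which is dominated by the larger patient group.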

Multi-Channel Biometrics for eHealth Combining Acoustic and Machine Vision Analysis of Speech, Lip Movement and Face: a Case Study
2019 IEEE International Conference on Imaging Systems and Techniques (IST), 2019
The purpose of this work is to present a solution that combines user-friendly and cost-effective use of audio (speech) and visual (video/image) biometrics for eHealth, and that is able to achieve better accuracy and increase the ability to avoid counterfeiting. This work shows the evaluation results for an eHealth pilot study that tested the security, privacy, usability, and cost-effectiveness of a user authentication platform for the management of sensitive heterogeneous multi-scale medical data (i.e., medical imaging such as MRI/CT scans, physical reports, and laboratory results), through easy acquisition of biometric data via laptops and tablets equipped with cameras and microphones. For user enrollment and verification, audio-visual biometric information from an individual is captured, processed, and stored as a biometric template. In subsequent uses, biometric information is captured and compared with the stored biometric templates. If the comparison is successful, the verified user can be allowed to sign in to a medical collaboration platform of the hospital's infrastructure. In this work we present the biometric platform that was developed, the testing methodology, the administrative framework and legal processes related to GDPR for the eHealth pilot study, and the results of the quantitative and qualitative analysis that was performed.

ArXiv, 2021
This paper outlines the EMPATHIC Research & Innovation project, which aims to research, innovate, explore and validate new interaction paradigms and platforms for future generations of Personalized Virtual Coaches to assist elderly people living independently at and around their home. Innovative multimodal face analytics, adaptive spoken dialogue systems, and natural language interfaces are part of what the project investigates and innovates, aiming to help dependent aging persons and their carers. It will use remote, non-intrusive technologies to extract physiological markers of emotional states and adapt respective coach responses. In doing so, it aims to develop causal models for emotionally believable coach-user interactions, which shall engage elders and thus keep off loneliness, sustain health, enhance quality of life, and simplify access to future telecare services. Through measurable end-user validations performed in Spain, Norway and France (and complementary user evaluati...

First time encounters with Roberta : a humanoid assistant for conversational autobiography creation
During eNTERFACE we developed a dialog system design and conversation material for Roberta, an anthropomorphic assistant robot. The focus was on the first stage of what we call LifeLine dialogs, i.e. the conversational creation of users’ life stories. Our goal is to help senior citizens record semiautobiographical narratives while combating the deterioration of memory and speech abilities. We successfully completed modelling dialog scenarios for first-time users. This allows Roberta to personalize future conversations based on each user’s place of origin, work and education history, and hobbies, all of which is gathered during a user’s first conversation with Roberta. We accomplished this through (1) an adaptable dialog system with topic management and multi-modal functionalities (specifically face recognition), by extending a RavenClaw-type dialog management framework, (2) using the Wizard of Oz (WOZ) data collection technique for categorizing introductory conversation ma...

IRIM - Indexation et Recherche d'Information Multimedia GDR-ISIS
The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes its participation in the TRECVID 2012 semantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stage processing pipeline to compute scores for the likelihood that a video shot contains a target concept. These scores are then used to produce a ranked list of the images or shots most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.2378, which ranked us 4th out of 16 participants. For the instance search task, our approach uses two steps. First individual methods of participants are used to compute simil...