Papers by Benjamin Nguyen
Model, Design and Construction of a Service-Oriented Web-Warehouse
We propose a new methodology, a language and tools for the design and construction of Web data warehouses. Our approach is Service Oriented, in that our framework makes extensive use of Web Services and semi-structured data (XML) to define the data structures, the services and the connections between them. We present an experimental version of the
Journées Bases de Données Avancées, 2002
In this article, we examine the problem of constructing a temporal data warehouse using web services. There are many important aspects in the construction of such a warehouse. Our particular contribution in this article regards the global architecture of a system that can (i) acquire specific pages from the web, (ii) control page changes, (iii) easily be enhanced using various
Computing Research Repository, 2010
In this paper, we present a method and a tool for deriving a skeleton of an ontology from XML schema files. We first recall what an ontology is and its relationships with XML schemas. Next, we focus on ontology building methodology and associated tool requirements. Then, we introduce Janus, a tool for building an ontology from various XML schemas in
In this article, we describe a novel application of XML and Web based technologies: a sociological study of the W3C standardization process. We propose a new methodology and tools, to be used by sociologists to study the standardization process, illustrated by the W3C XQuery Working Group. The novelty of our approach has many facets. Information Technology (IT) has received
DELOS Workshops, 2000
We consider a query subscription system that can provide users with information about web changes that interest them. We present a query subscription language and a system that combines monitoring of page changes and continuous queries, i.e., queries that are evaluated regularly.

Pluggable Personal Data Server
An increasing amount of personal data is automatically gathered on servers by administrations, hospitals and private companies, while several security surveys highlight the failure of database servers to keep confidential data really private. The advent of powerful secure tokens, combining the security of smart card microcontrollers with the storage capacity of NAND Flash chips, introduces a credible alternative to the systematic centralization of personal data. By embedding a full-fledged database server in such a device, an individual can now store her personal data in her own secure token, kept under her control, and never disclose her private data in the clear to the untrusted outside world. This demonstration shows the benefit of the proposed approach in terms of privacy protection and pervasiveness through a healthcare scenario. This scenario is extracted from a field experiment where medical folders embedded in secure tokens are used to improve the coordination of medical care at ho...

An increasing amount of personal data is automatically gathered and stored on servers by administrations, hospitals, insurance companies, etc. Citizens themselves often count on internet companies to store their data and make them reliable and highly available through the internet. However, these benefits must be weighed against the privacy risks incurred by centralization. This paper suggests a radically different way of considering the management of personal data. It builds upon the emergence of new portable and secure devices combining the security of smart cards and the storage capacity of NAND Flash chips. By embedding a full-fledged Personal Data Server in such devices, the user's control over how her sensitive data is shared with others (by whom, for how long, according to which rule, for which purpose) can be fully reestablished and convincingly enforced. To give sense to this vision, Personal Data Servers must be able to interoperate with external servers and must provide traditional data...
Serveurs Personnels Sécurisés de Données

How do you keep a secret about your personal life in an age where your daughter’s glasses record and share everything she senses, your wallet records and shares your financial transactions, and your set-top box records and shares your family’s energy consumption? Your personal data has become a prime asset for many companies around the Internet, but can you avoid -- or even detect -- abusive usage? Today, there is a wide consensus that individuals should have increased control over how their personal data is collected, managed and shared. Yet there is no appropriate technical solution to implement such personal data services: centralized solutions sacrifice security for innovative applications, while decentralized solutions sacrifice innovative applications for security. In this paper, we argue that the advent of secure hardware in all personal IT devices, at the edges of the Internet, could trigger a sea change. We propose the vision of trusted cells: personal data servers running on...
XLive 2P : Intégration Dynamique et Optimisation dans un Médiateur Pair-à-Pair

2012 Tenth Annual International Conference on Privacy, Security and Trust, 2012
Application forms are often used by companies and administrations to collect personal data about applicants and tailor services to their specific situation. For example, tax rates, social care, or personal loans are usually calibrated based on a set of personal data collected through application forms. In the eyes of privacy laws and directives, the set of personal data collected to achieve a service must be restricted to the minimum necessary. This reduces the impact of data breaches, in the interest of both service providers and applicants. In this article, we study the problem of limiting data collection in those application forms, which are used to collect data and subsequently feed decision-making processes. In practice, the set of data collected is far larger than necessary because application forms are filled in without any means of knowing what data will really impact the decision. To overcome this problem, we propose a reverse approach, where the set of strictly required data items to fill in the application form can be computed on the user's side. We formalize the underlying NP-hard optimization problem, propose algorithms to compute a solution, and validate them with experiments. Our proposal leads to a significant reduction of the quantity of personal data filled in application forms while still reaching the same decision.
On Indexing Multidimensional Values In A P2P Architecture
Journées Bases de Données Avancées, 2006
In this paper we present the state of advancement of the French ANR WebStand project. The objective of this project is to construct a customizable XML based warehouse platform to acquire, transform, analyze, store, query and export data from the web, in particular mailing lists, with the final intention of using this data to perform sociological studies focused on social groups of the World Wide Web, with a specific emphasis on the temporal aspects of this data. We are currently using this system to analyze the standardization process of the W3C, through its social network of standard setters. Comment: W3C Workshop on the Future of Social Networking
On Indexing Multidimensional Data In A P2P Context
Based on their remarkable properties (fault tolerance, scalability, decentralization), P2P systems tend to be largely accepted as a common support for deploying massively distributed data management applications. Some of the existing P2P systems are built over hybrid ...
The goal of Privacy-Preserving Data Publishing (PPDP) is to generate a sanitized (i.e. harmless) view of sensitive personal data (e.g. a health survey), to be released to some agencies or simply the public. However, traditional PPDP practices all make the assumption that the process is run on a trusted central server. In this article, we argue that the trust assumption on the central server is far too strong. We propose MetAP, a generic fully distributed protocol, to execute various forms of PPDP algorithms on an asymmetric architecture composed of low power secure devices and a powerful but untrusted infrastructure. We show that this protocol is both correct and secure against honest-but-curious or malicious adversaries. Finally, we provide an experimental validation showing that this protocol can support PPDP processes scaling up to nation-wide surveys.
Data Warehousing: Analysing Web Data Application to the Study of the W3C Standardization Process
Sets of Pages of Interest
Building an active content warehouse
Non-quantitative content represents a large part of the information available nowadays, such as Web pages, e-mails, metadata about photos, etc. In order to manage this new type of information, we introduce the concept of content warehousing, the management of ...
THESUS: Effective thematic selection and organization of web document collections based on link semantics