Papers by Pedro V. Contreras
4th International Conference on Intelligent Environments (IE 08), 2008
We present a prototype multi-agent system whose goal is to support a 3D application for e-retailing. The prototype demonstrates how the use of agent environments can be amongst the most promising and flexible approaches to engineer e-retailing applications. We illustrate this point by showing how the agent environment GOLEM uses semantic web concepts to develop the e-retailing application. In this context we describe the features of GOLEM that allow a user to become an avatar and explore the environment by searching and dynamically discovering new products and services.

Peer-to-Peer Information Access and Retrieval
Ingénierie des systèmes d'information, 2005
ABSTRACT: In this article the peer-to-peer paradigm is discussed as a means for the provision of complex information access and retrieval systems. A review of peer-to-peer technologies precedes a discussion of their potential to provide the basis from which a rich information service may be built. The benefits of such a system and the problems inherent to peer-to-peer approaches are discussed. The peer-to-peer development platform, JXTA, and its use in the implementation of an image database retrieval system is highlighted. To provide richer information services a system should be aware of its context. Context awareness in sensory information systems is used as a means to model behaviour that will lead to the development of other context aware information systems. To demonstrate the potential of a context aware peer-to-peer system a case study of an active camera network is included. KEYWORDS: peer-to-peer, P2P, context awareness, information systems
Understanding how search engines work
A stop list is the list or set of stop words; there are as many stop lists as there are languages. That is, if a system processes text that includes English, French, German and Spanish, it should have a stop list for each of these languages.
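A minimal sketch of per-language stop lists, assuming tokens have already been split and the language of the text is known. The `STOP_LISTS` table and the `remove_stop_words` helper are hypothetical names, and the word sets are tiny illustrative samples rather than complete stop lists.

```python
# Hypothetical, deliberately tiny stop lists: one set per language code.
STOP_LISTS = {
    "en": {"the", "a", "of", "and"},
    "fr": {"le", "la", "de", "et"},
    "de": {"der", "die", "das", "und"},
    "es": {"el", "la", "de", "y"},
}


def remove_stop_words(tokens, language):
    """Drop every token that appears in the stop list for `language`."""
    stop_words = STOP_LISTS[language]  # raises KeyError for an unsupported language
    return [token for token in tokens if token.lower() not in stop_words]


print(remove_stop_words(["The", "history", "of", "Spain"], "en"))
print(remove_stop_words(["La", "historia", "de", "España"], "es"))
```

Keeping one set per language matters because the same word can be a stop word in one language and a content word in another.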
Working with Databases and Java
The most frequent SQL data transactions are: Insert, used to insert rows into a table; Delete, used to delete rows in a table; Update, used to modify values in an existing table; Select, used to retrieve data; From, used to indicate from which tables the data will be taken; Where, used ...
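The statements listed above can be tried out in a few lines. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely to stay self-contained; it is not the Java setup the paper's title refers to. The `products` table, its columns, and the `run_demo` function are hypothetical.

```python
import sqlite3


def run_demo():
    """Run INSERT, UPDATE, DELETE and SELECT ... FROM ... WHERE on a toy table."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    cur.execute("CREATE TABLE products (name TEXT, price REAL)")

    # INSERT: add rows to the table.
    cur.executemany(
        "INSERT INTO products (name, price) VALUES (?, ?)",
        [("pen", 1.5), ("book", 12.0), ("lamp", 30.0)],
    )
    # UPDATE: modify values in existing rows.
    cur.execute("UPDATE products SET price = ? WHERE name = ?", (10.0, "book"))
    # DELETE: remove rows matching a condition.
    cur.execute("DELETE FROM products WHERE price < ?", (2.0,))
    # SELECT ... FROM ... WHERE: retrieve the remaining rows of interest.
    cur.execute(
        "SELECT name, price FROM products WHERE price >= ? ORDER BY price",
        (5.0,),
    )
    rows = cur.fetchall()
    conn.close()
    return rows


print(run_demo())  # [('book', 10.0), ('lamp', 30.0)]
```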
The main goal of this project is to identify new mathematical approaches to best match and proximity searching with particular application to very large data stores. In this dissertation I develop a new, linear time hierarchical clustering algorithm and I validate it in a wide range of cases.

In this paper, we analyze the preservation of original semantic similarity among objects when dimensionality reduction is applied on the original data source and a further clustering process is performed on the dimensionally reduced data. An experiment is designed to test Baire, or longest common prefix ultrametric, and K-Means when prior random projection is applied. A data matrix extracted from a cultural heritage database has been prepared for the experiment. Given that the random projection produces a vector with components ranging on the interval [0, 1], clusters are obtained at different precision levels. Next, the mean semantic similarity of clusters is calculated using a modified version of the Jaccard index. Our findings show that semantics is difficult to preserve by these methods. However, a Student's hypothesis test on mean similarity indicates that Baire clusters objects semantically better than K-Means when we increase the digit precision, but paying an increasing cos...
Electronic Workshops in Computing, 2005
We present the concept of the Virtual Observatory through the AstroGrid project, which is a Virtual Organization making intensive use of the Internet to produce a working data-grid for Astronomical applications. AstroGrid has implemented a series of tools which facilitate collaborative work such as news, forum and discussion pages, all of them inspired by the open source and licensing philosophy. Additionally we explore how such an Astronomical Grid project can be used in the future as an e-Learning service platform and as a tool for a much wider audience than researchers and astronomers, such as students and the wider public interested in this area.
P-Adic Numbers, Ultrametric Analysis, and Applications, 2012
We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We offer insights also on practical implications of precision of data measurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections.
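One way to picture "reading a hierarchy from one pass through the data" is prefix bucketing. The sketch below is an illustration under stated assumptions, not the authors' implementation: each observation is assumed to be already reduced to a single value in [0, 1) (for example by a random projection) and written as a string of its decimal digits, and the function name `prefix_hierarchy` is hypothetical.

```python
from collections import defaultdict


def prefix_hierarchy(digit_strings, max_depth):
    """Group observation indices by shared digit prefix at each precision level.

    Level 1 buckets share the first digit, level 2 the first two digits, and so
    on. Each observation is visited once and no pairwise distance is computed.
    """
    levels = [defaultdict(list) for _ in range(max_depth)]
    for index, digits in enumerate(digit_strings):
        for depth in range(1, max_depth + 1):
            levels[depth - 1][digits[:depth]].append(index)
    return [dict(level) for level in levels]


# Digits after the decimal point of 0.4251, 0.4278, 0.4310 and 0.9120.
data = ["4251", "4278", "4310", "9120"]
for depth, buckets in enumerate(prefix_hierarchy(data, max_depth=2), start=1):
    print(depth, buckets)
```

Each bucket at level k+1 is nested inside exactly one bucket at level k, so the levels together form a tree. The cost is proportional to the number of observations times the chosen depth, hence linear in the number of observations for a fixed precision.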
Intelligent Systems Reference Library, 2012
Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. "Structure" can be understood as symmetry, and a range of symmetries are expressed by hierarchy. Such symmetries directly point to invariants that pinpoint intrinsic properties of the data and of the background empirical domain of interest. We review many aspects of hierarchy here, including ultrametric topology, generalized ultrametric, linkages with lattices and other discrete algebraic structures and with p-adic number representations. By focusing on symmetries in data we have a powerful means of structuring and analyzing massive, high dimensional data stores. We illustrate the power of hierarchical clustering in case studies in chemistry and finance, and we provide pointers to other published case studies.
Science: Image in Action - Proceedings of the 7th International Workshop on Data Analysis in Astronomy Livio Scarsi and Vito DiGesù, 2011
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we develop a clusterwise nearest neighbor regression procedure for this.
Linear time Baire hierarchical clustering for enterprise information retrieval
Studies in Classification, Data Analysis, and Knowledge Organization, 2010
The Baire or longest common prefix ultrametric allows a hierarchy, a multiway tree, or ultrametric topology embedding, to be constructed very efficiently. The Baire distance is a 1-bounded ultrametric. For high dimensional data, one approach for the use of the Baire distance is to base the hierarchy construction on random projections. In this paper we use the Baire distance on the Sloan Digital Sky Survey (SDSS, http://www.sdss.org) archive. We are addressing the regression of (high quality, more costly to collect) spectroscopic and (lower quality, more readily available) photometric redshifts. Nonlinear regression is used for mapping photometric and astrometric redshifts.
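A small sketch of a longest common prefix distance may help make "1-bounded ultrametric" concrete. It assumes one common convention, namely distance 2^(-n) where n is the length of the shared digit prefix, with distance 0 for identical strings; the base 2 and the function names are assumptions of this sketch, not details taken from the abstract.

```python
from itertools import product


def common_prefix_length(a, b):
    """Number of leading characters that `a` and `b` share."""
    length = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        length += 1
    return length


def baire_distance(a, b, base=2.0):
    """Longest common prefix distance between two digit strings.

    Returns 0 for identical strings, 1 when the first digits differ, and
    base ** -n when the first n digits agree (base 2 is assumed here).
    """
    if a == b:
        return 0.0
    return base ** -common_prefix_length(a, b)


print(baire_distance("425", "427"))  # 0.25 (two shared digits)
print(baire_distance("425", "525"))  # 1.0  (no shared prefix, the upper bound)

# Check the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)).
points = ["425", "427", "431", "912"]
assert all(
    baire_distance(x, z) <= max(baire_distance(x, y), baire_distance(y, z))
    for x, y, z in product(points, repeat=3)
)
```

The distance never exceeds 1 because the shared prefix length is never negative, and the strong triangle inequality holds because the prefix shared by x and z is at least as long as the shorter of the prefixes shared with any third point y.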
WIREs Data Mining and Knowledge Discovery, 2011
We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self‐organizing maps, and mixture models. We review grid‐based clustering, focusing on hierarchical density‐based approaches. Finally, we describe a recently developed very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid‐based algorithm. © 2011 Wiley Periodicals, Inc. This article is categorized under: Algorithmic Development > Hierarchies and Trees; Technologies > Structure Discovery and Clustering

Opto-Ireland 2002: Optical Metrology, Imaging, and Machine Vision, 2003
Information navigation and search on the part of a user requires thorough description of the information content of signal and image datasets and archives. Large signal and image databases need comprehensive metadata to facilitate user access. There is no unique way to describe the semantics of images and signals. Therefore a conceptual model serves as an initial platform. From the conceptual model, a database design can be derived, or a definition of metadata. The different steps from model to description can benefit from tools such as the Unified Modeling Language (UML) for the conceptual model, standard Entity/Relationship (ER) models for database design, and eXtensible Markup Language (XML) for metadata description. As examples of the process of conceptual design and semantic description, we consider the case of a signal database, and the case of astronomical image databases.
Artificial Intelligence Review - AIR, 2003
Following a short survey of input data types on which to construct interactive visual user interfaces, we report on a new and recent implementation taking concept hierarchies as input data. The visual user interfaces express domain ontologies which are based on these concept hierarchies. We detail a web-based implementation, and show examples of usage. An appendix surveys related systems, many of them commercial.

SIAM Journal on Scientific Computing, 2008
Coding of data, usually upstream of data analysis, has crucial implications for the data analysis results. By modifying the data coding, through use of less than full precision in data values, we can aid appreciably the effectiveness and efficiency of the hierarchical clustering. In our first application, this is used to lessen the quantity of data to be hierarchically clustered. The approach is a hybrid one, based on hashing and on the Ward minimum variance agglomerative criterion. In our second application, we derive a hierarchical clustering from relationships between sets of observations, rather than the traditional use of relationships between the observations themselves. This second application uses embedding in a Baire space, or longest common prefix ultrametric space. We compare this second approach, which is of O(n log n) complexity, to k-means.

Journal of Classification, 2012
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
cs.rhul.ac.uk
Years ago the IT community was excited over XML standards that opened avenues for a completely new type of interoperability of software across networks. The dream of each individual using the other's applications to develop new ad-hoc services ...