Papers by Ricardo Rodrigues Ciferri

Lecture Notes in Computer Science, 2011
Drill-across SOLAP queries (spatial OLAP queries) allow for strategic decision-making through the use of numeric measures from distinct fact tables that share dimensions and by the evaluation of spatial predicates. Despite the importance of these queries in geographic data warehouses (GDWs), there is a lack of research aimed at their study. In this paper, we investigate three challenging aspects related to the efficient processing of drill-across SOLAP queries over GDWs: (i) the design of a GDW schema to enable the performance evaluation of drill-across SOLAP query processing; (ii) the definition of classes of drill-across SOLAP queries to be issued over the proposed GDW schema; and (iii) the analysis of different approaches to process drill-across SOLAP queries, as follows: star-join computation, materialized views and a new proposed approach based on the SB-index, which is named DrillAcrossSB. We conclude that the DrillAcrossSB approach speeds up the processing of drill-across SOLAP queries by 39% up to 98%.
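To make the notion of a drill-across SOLAP query concrete, here is a minimal sketch of the general idea described in the abstract: measures from two fact tables that share a dimension are aggregated under a spatial predicate and then combined. The table contents, the city coordinates and all function names are hypothetical, and the code does not reproduce DrillAcrossSB, the SB-index or any other approach evaluated in the paper.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, measure). Both fact tables share the "city" dimension.
sales_fact = [("Recife", 100.0), ("Recife", 50.0), ("Olinda", 70.0)]
shipping_fact = [("Recife", 12.0), ("Olinda", 8.0), ("Olinda", 4.0)]

# Hypothetical spatial dimension: each city is represented by a single (x, y) point.
city_location = {"Recife": (2.0, 3.0), "Olinda": (9.0, 9.0)}


def within_window(point, window):
    """Spatial predicate: is the point inside the axis-aligned query window?"""
    x, y = point
    xmin, ymin, xmax, ymax = window
    return xmin <= x <= xmax and ymin <= y <= ymax


def aggregate(fact_rows, window):
    """Sum the measure per city, keeping only cities that satisfy the predicate."""
    totals = defaultdict(float)
    for city, measure in fact_rows:
        if within_window(city_location[city], window):
            totals[city] += measure
    return dict(totals)


def drill_across(window):
    """Aggregate each fact table separately, then join on the shared dimension."""
    sales = aggregate(sales_fact, window)
    shipping = aggregate(shipping_fact, window)
    shared = sales.keys() & shipping.keys()
    return {city: (sales[city], shipping[city]) for city in shared}
```

Calling `drill_across((0, 0, 5, 5))` keeps only the city whose point falls inside the query window and returns its sales and shipping totals side by side. Aggregating each fact table before the join is a design choice that avoids pairing individual rows of the two fact tables.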

This paper describes an experiment performed using different approaches for spatial data clustering, aiming to assist the delineation of management classes in Precision Agriculture (PA). These approaches were established from the partitional clustering algorithm Fuzzy c-Means (FCM), traditionally used in this context, and from the hierarchical clustering algorithm HACC-Spatial, especially designed for this PA task. We also performed experiments using traditional ensemble approaches from the literature, evaluating their behavior to achieve consensus solutions from individual clusterings obtained from feature splitting or from running one of the abovementioned algorithms. Results showed some differences between FCM and HACC-Spatial, mainly for the visualization of management classes in the form of maps. Considering the consensus clusterings provided by ensembles, the attempt to achieve an agreement result that most closely matches the original clusterings became clear, showing us som...
International Journal of Data Warehousing and Mining
A cloud data warehouse (cloud DW) is a subject-oriented, integrated, time-variant, voluminous, nonvolatile and multidimensional distributed database that is hosted in a cloud. A solution to ensure data confidentiality for a cloud DW is cryptography. In this article, the authors propose an encryption methodology for a cloud DW stored according to the star schema, considering both the data confidentiality maintenance of the DW and the capability of processing analytical queries directly over the encrypted DW. The proposed encryption methodology comprises an encryption strategy for DW called MV-HO (MultiValued and HOmomorphic) for the definition of how the different types of…
Faster Cloud Star Joins with Reduced Disk Spill and Network Communication
Procedia Computer Science, 2016
nsP-index: A Robust and Persistent Index for Nucleotide Sequences
Adbis, 2008
Fifus
Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '15, 2015

Is, 2011
Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper, we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests.
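As background for the range queries mentioned above, the following is a minimal sketch of a similarity range query in a metric space that uses a single pivot and the triangle inequality to skip distance computations. It illustrates only the generic pivot-based filtering idea common to metric access methods; it is not the Onion-tree's partitioning or query algorithm, and the function names and the choice of Euclidean distance are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as an example metric."""
    return math.dist(a, b)


def precompute_pivot_distances(dataset, pivot, dist=euclidean):
    """Store the distance from every element to the pivot (done once at build time)."""
    return [dist(elem, pivot) for elem in dataset]


def range_query(dataset, pivot, pivot_dists, query, radius, dist=euclidean):
    """Return the elements within `radius` of `query`.

    By the triangle inequality, |d(q, p) - d(e, p)| <= d(q, e). If that lower
    bound already exceeds the radius, the element cannot qualify and the
    distance d(q, e) is never computed.
    """
    d_query_pivot = dist(query, pivot)
    result = []
    for elem, d_elem_pivot in zip(dataset, pivot_dists):
        if abs(d_query_pivot - d_elem_pivot) > radius:
            continue  # pruned without computing d(query, elem)
        if dist(query, elem) <= radius:
            result.append(elem)
    return result
```

A possible usage: build `pivot_dists` once with `precompute_pivot_distances(dataset, pivot)` and reuse it for every query, so that each query pays one distance computation to the pivot plus one per element that survives the filter.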
A performance comparison among the traditional R-trees, the Hilbert R-tree and the SR-tree
23rd International Conference of the Chilean Computer Science Society, 2003. SCCC 2003. Proceedings., 2003
This work investigates the performance of several spatial access methods with respect to the distribution of the indexed spatial objects. Although storage and insertion costs were also gathered, this work focuses on some issues regarding query costs. The performance results showed that the R+-tree was the best spatial index structure for the point queries and the enclosure range
In this paper we propose two algorithms aimed at increasing the performance of drill-across queries in data warehousing environments. The JM-G algorithm is used when numeric measures of different data warehouses are usually required together. The EG-JM algorithm extends the first one by also considering storage requirements. The performance tests carried out using the TPC-H benchmark showed a huge improvement in query performance with a reduced additional storage requirement.
This technical report aims to present the main characteristics of Sickle Cell Anemia (SCA), a hereditary, hematological, non-contagious and incurable disease whose complications are treatable. It also discusses the emergence of the disease and the number of people affected in the United States and Brazil. The main information about SCA is highlighted, such as symptoms, treatments and effects. The drug hydroxyurea is cited in the literature as a successful medicine for alleviating the recurrent pain crises of patients with SCA. The objective of this report is to provide material for teachers, students, researchers and people interested in SCA to know the causes and consequences of this disease, which affects millions of people and is considered a public health problem in Brazil.
ABSTRACT. This article presents a study of the benchmark technique applied in the context of performance analysis and carries out a comparative analysis of the main benchmarks aimed at the performance analysis of object-oriented database management systems (OODBMSs), namely the OO1 Benchmark, the HyperModel Benchmark and the OO7 Benchmark. These are evaluated with respect to their set of transactions, the characterization of the data that make up the database, the execution model, the ease of implementation and their representativeness for object-oriented applications. Keywords: databases, object orientation, performance analysis, benchmark.
Centro de Informática – Universidade Federal de Pernambuco – Recife, PE – Brasil
Geographical Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis. For these, several conceptual and logical data models have been proposed in the literature. However, little attention has been devoted to the study of how spatial data redundancy affects query performance over GDW. In this paper, we investigate this issue. Firstly, we compare redundant and non-redundant GDW schemas and conclude that redundancy is related to high performance losses. Further, we analyze the indexing issue, aiming at improving query performance on a redundant GDW. Comparisons among the SB-index approach, the star-join aided by R-tree and the star-join aided by GiST showed that SB-index significantly improves the elapsed time on query processing from 25% up to 95%.
Spatial Data Warehouses (SDWs) enable the simultaneous processing of multidimensional queries and spatial analysis. In the literature, little attention has been devoted to the development of benchmarks for analyzing the performance of query processing over SDWs. In this paper, we propose a novel benchmark, called Spatial SSB, designed specifically to perform controlled experimental performance evaluation of SDW environments. The Spatial SSB proposes a non-redundant SDW schema and controls the generation of data, the query selectivity and the data distribution in the extent. In addition, the Spatial SSB supports increasing the data volume, varies the complexity of spatial object geometries, and generates a certain number of objects that intersect an ad hoc spatial query window.
ADI-Minebio: A Graph Mining Algorithm for Biomedical Data
Graph mining is concerned with mining frequent subgraph patterns over a collection of graphs, aiming to find novel and useful knowledge. It has been used to analyze data from different domains, sometimes using algorithms tailored for a specific area of ...

A data warehouse is a solution for organizing and storing multidimensional data related to decision-making processes in companies, generating a historical, highly voluminous, subject-oriented and nonvolatile database. A geographic data warehouse (GDW) stores spatial data (represented by crisp geometries) as attributes in dimension tables or as measures in fact tables. Thus, spatial data have exact location in the space and well-defined boundaries. However, modern geographic applications require the storage of vague spatial data, which have inaccurate location or uncertain boundaries. This master's project aims at incorporating vague spatial data into GDWs. More specifically, we address the implementation of a new abstract data type (ADT) called VagueGeometry to represent vague spatial data in the Spatial Database Management System PostgreSQL with the PostGIS extension. The proposal of the ADT VagueGeometry encompasses the issue of physical storage and the management of vague spati...

A similarity-based data warehousing environment for medical images
Computers in Biology and Medicine, 2015
A core issue of the decision-making process in the medical field is to support the execution of analytical (OLAP) similarity queries over images in data warehousing environments. In this paper, we focus on this issue. We propose imageDWE, a non-conventional data warehousing environment that enables the storage of intrinsic features taken from medical images in a data warehouse and supports OLAP similarity queries over them. To comply with this goal, we introduce the concept of perceptual layer, which is an abstraction used to represent an image dataset according to a given feature descriptor in order to enable similarity search. Based on this concept, we propose the imageDW, an extended data warehouse with dimension tables specifically designed to support one or more perceptual layers. We also detail how to build an imageDW and how to load image data into it. Furthermore, we show how to process OLAP similarity queries composed of a conventional predicate and a similarity search predicate that encompasses the specification of one or more perceptual layers. Moreover, we introduce an index technique to improve the OLAP query processing over images. We carried out performance tests over a data warehouse environment that consolidated medical images from exams of several modalities. The results demonstrated the feasibility and efficiency of our proposed imageDWE to manage images and to process OLAP similarity queries. The results also demonstrated that the use of the proposed index technique guaranteed a great improvement in query processing.

Spatial data warehouses and spatial OLAP come towards the cloud: design and performance
Distributed and Parallel Databases, 2015
Cloud computing systems handle large volumes of data by using almost unlimited computational resources, while spatial data warehouses (SDWs) are multidimensional databases that store huge volumes of both spatial data and conventional data. Cloud computing environments have been considered adequate to host voluminous databases, process analytical workloads and deliver database as a service, while spatial online analytical processing (spatial OLAP) queries issued over SDWs are intrinsically analytical. However, hosting an SDW in the cloud and processing spatial OLAP queries over such a database impose novel obstacles. In this article, we introduce novel concepts such as cloud SDW and spatial OLAP as a service, and afterwards detail the design of novel schemas for cloud SDW and spatial OLAP query processing over cloud SDW. Furthermore, we evaluate the performance of processing spatial OLAP queries in cloud SDWs using our own query processor aided by a cloud spatial index. Moreover, we describe the cloud spatial bitmap index to improve the performance of processing spatial OLAP queries in cloud SDWs, and assess it through an experimental evaluation. Results derived from our experiments revealed that such an index was capable of reducing the query response time by 58.20% up to 98.89%.
MetricSPlat - A platform for quick development, testing and visualization of content-based retrieval techniques
The development and testing of content-based data retrieval systems is a time-consuming task. Based on the concept of metric space, such systems must integrate the three factors that define an indexing environment. These factors are feature extraction, metric structures and distance functions, not to mention a suitable user interface. This integration diverts the work from the real focus of research, hindering quick experimentation with ideas. In this context, we present the Metric Space Platform (MetricSPlat), a system designed for content-based retrieval enabled with plug-in features. With minimal effort, MetricSPlat substantially speeds up the experimentation of new techniques by providing a well-defined framework aided with interactive data visualization techniques.