Papers by Alejandro Vaisman
This report aims to give a comprehensive introduction to the subjects of Data Warehousing and OLAP. It also gives an overview of the related topic of Materialized Views, which become important when trying to improve the performance of an OLAP system. The main intention is to describe the state of the art in these fields in order to identify research opportunities. Topics covered here are: architecture, design, querying, optimization and implementation. Relevant research papers are discussed, and some commercial products are reviewed, mainly to highlight their differences. ROLAP and MOLAP are also analyzed and, finally, an extensive bibliography is presented in the references.

A generic data model and query language for spatiotemporal OLAP cube analysis
Proceedings of the 15th International Conference on Extending Database Technology - EDBT '12, 2012
Nowadays, organizations need to use OLAP (On Line Analytical Processing) tools together with geographical information. To support this, the notion of SOLAP (Spatial OLAP) arose, aimed at exploring spatial data in the same way as OLAP operates over tables. SOLAP, however, only accounts for discrete spatial data. More sophisticated GIS-based decision support systems are increasingly needed to handle more complex types of data, like continuous fields. Fields describe physical phenomena that change continuously in time and/or space (e.g., temperature). Although many models have been proposed for adding spatial information to OLAP tools, none allows the user to perceive data as a cube and analyze any type of spatial data, continuous or discrete, together with typical alphanumerical discrete OLAP data, using only the classic OLAP operators (e.g., Roll-up, Drill-down). In this paper we propose an algebra that operates over data cubes, independently of the underlying data types and physical data representation. That means that, in our approach, the final user only sees the typical OLAP operators at the query level. At lower abstraction levels we provide discrete and continuous spatial data support as well as different ways of partitioning the space. We also describe a proof-of-concept implementation to illustrate the ideas presented in the paper. As far as we are aware, this is the first proposal that allows analyzing discrete and continuous spatiotemporal data and OLAP cubes together, using just the traditional OLAP operations, thus providing a very general framework for spatiotemporal data analysis.
Enhancing Web access using data mining techniques
14th International Workshop on Database and Expert Systems Applications, 2003. Proceedings., 2003
In this paper we study data mining techniques as tools for reducing the time needed to access Web pages in the environment of a corporation where users connect to the Internet through a proxy server. We add a data mining server to the traditional Web architecture. This server computes, using sequential patterns, the pages likely to be requested by the
Requirements Elicitation for Decision Support Systems: A Data Quality Approach
ABSTRACT
In the last few years there has been an increasing interest in the so-called e-Science field. This poses new challenges to the database community, as the variety and distribution of the information involved lead to data representation, management, storage and access problems. In this paper, we study decentralized peer-to-peer architectures based on the JXTA protocol as a solution for scientific data sharing and exchange, applied in particular to biodiversity information, with the purpose of creating a repository maintained by different parties (mainly the community of researchers in the field). Based on this idea, we present an implementation (along with preliminary testing results), introduce a data model and a query language for biodiversity databases, and show that our proposal can be easily extended to other e-Science fields.
Time in Multidimensional Databases
In spite of the obvious importance of time in data warehousing and OLAP, current commercial systems do not support tracking the history of a data warehouse, either at the schema or instance level. In this chapter we address this issue, introducing the Temporal Multidimensional Model and a query language, denoted TOLAP, allowing expressing temporal OLAP queries at a high level
Very Large Data Bases, 2004
Different models have been proposed recently for representing temporal data, tracking historical information, and recovering the state of the document as of any given time, in XML documents. We address the problem of indexing temporal XML documents. In particular we show that by indexing continuous paths, i.e. paths that are valid continuously during a certain interval in a
Temporal Queries in OLAP
Very Large Data Bases, 2000
Commercial OLAP systems usually consider OLAP dimensions as static entities. In practice, dimension updates are often necessary in order to adapt the multidimensional database to changing requirements. We have already defined a taxonomy for these dimension up- ...

2007 IEEE 23rd International Conference on Data Engineering Workshop, 2007
Moving objects databases (MOD) have been receiving increasing attention from the database community in recent years, mainly due to the wide variety of applications that technology allows nowadays. Trajectories of moving objects, like cars or pedestrians, can be reconstructed by means of samples describing the locations of these objects at certain points in time. Although there are many proposals for modeling and querying moving objects, only a few of them address the problem of aggregating moving objects data in a GIS (Geographic Information Systems) scenario. In previous work we presented a formal model where the geometric components of the thematic layers in a GIS are represented as an OLAP (On Line Analytical Processing) dimension hierarchy, and introduced the notion of spatial aggregation. In this paper we extend this proposal in order to address moving object aggregation over a GIS. In this way, complex aggregate queries can be expressed in an elegant fashion. We present the data model, characterize the kinds of queries that may appear in this scenario, and show how these queries can be expressed as an aggregation over the result given by a first order formula expressing constraints over the geometries of the layers.

Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '09, 2009
Data warehouses and On-Line Analytical Processing (OLAP) provide an analysis framework supporting the decision making process. In many application domains, complex analysis tasks often require taking geographical information into account. Several proposals exist for integrating OLAP and Geographic Information Systems (GIS). However, there are very few attempts to support continuous fields, i.e., phenomena that are perceived as having a value at each point in space and/or time. Examples of such phenomena include temperature, altitude, or land use. In this paper, we extend a conceptual multidimensional model with continuous fields, showing that this can be achieved by defining an appropriate data type that encapsulates the different operations needed for manipulating such fields. We also define a query language based on relational calculus that allows expressing spatial OLAP queries involving continuous fields, and use this language to formally characterize this class of queries.

Proceedings of the 4th ACM international workshop on Data warehousing and OLAP - DOLAP '01, 2001
Enhancing multidimensional database models with aggregation hierarchies allows viewing data at different levels of aggregation. Usually, hierarchy instances are represented by means of so-called rollup functions. Rollups between adjacent levels in the hierarchy are given extensionally, while rollups between connected nonadjacent levels are obtained by means of function composition. In many real-life cases, this model cannot accurately capture the meaning of common situations, particularly when exceptions arise. Exceptions may appear due to corporate policies, unreliable data or uncertainty, and their presence may render the notion of rollup composition unsuitable for representing real relationships in the aggregation hierarchies. In this paper we present a language that allows augmenting traditional extensional rollup functions with intensional knowledge. We denote this language IRAH (Intensional Redefinition for Aggregation Hierarchies). Programs in IRAH consist of intensional rules, which can be regarded as patterns for: (a) overriding natural composition between rollup functions on adjacent levels in the concept hierarchy, (b) canceling the effect of rollup functions for specific values. Our proposal is presented as a stratified default theory. We show that a unique model for the underlying theory always exists, and can be computed in a bottom-up fashion. Finally, we present an algorithm that computes the revised dimension in polynomial time, although under more realistic assumptions, complexity becomes linear in the number of paths in the hierarchy of the dimension instance.
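The rollup composition described in the abstract above, and the two kinds of exceptions it mentions, can be illustrated with a minimal Python sketch. This is not IRAH syntax or semantics: the dimension members, the `compose` and `apply_exceptions` helpers, and the override values are all hypothetical and chosen only for illustration.

```python
# Hypothetical dimension instance: rollups between adjacent levels,
# given extensionally as plain mappings (member -> parent member).
city_to_region = {"Rosario": "Litoral", "Cordoba": "Centro", "Mendoza": "Cuyo"}
region_to_country = {"Litoral": "Argentina", "Centro": "Argentina", "Cuyo": "Argentina"}


def compose(lower, upper):
    """Rollup between non-adjacent levels, obtained by function composition."""
    return {member: upper[parent] for member, parent in lower.items()}


def apply_exceptions(rollup, overrides, cancelled):
    """Revise a composed rollup (illustrative only, not the IRAH semantics).

    overrides: member -> new target, replacing the natural composition.
    cancelled: members whose rollup is dropped altogether.
    """
    revised = {m: p for m, p in rollup.items() if m not in cancelled}
    revised.update(overrides)
    return revised


city_to_country = compose(city_to_region, region_to_country)
revised = apply_exceptions(
    city_to_country,
    overrides={"Mendoza": "Export Zone"},  # hypothetical policy exception
    cancelled={"Cordoba"},                 # hypothetical unreliable member
)
```

In this sketch the exceptions are applied after composition as plain dictionary edits; the paper instead expresses them as intensional rules in a stratified default theory.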
2008 IEEE 24th International Conference on Data Engineering, 2008
The nature of semistructured data in web collections is evolving. Increasingly, XML web documents (or documents exchanged via web services) are valid with regard to a schema, yet the actual structure of such documents exhibits significant variations across collections for several reasons: the schema is very lax (e.g., RSS feeds), the schema is large and different subsets are used (e.g., industry standards like UBL), or open content models allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). Many web development tasks that incorporate XPath queries to process XML documents require an understanding of the actual structure present in the collection.

Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '09, 2009
A common problem in moving object databases (MOD) is the reconstruction of a trajectory from a trajectory sample (i.e., a finite sequence of time-space points). A typical solution to this problem is linear interpolation. A more realistic model is based on the notion of uncertainty modelled by space-time prisms, which capture the positions where the object could have been when it moved from a to b. Often, object positions measured by location-aware devices are not on a road network. Thus, matching the user's position to a location on the digital map is required. This problem is called map matching. In this paper we study the relation between map matching and uncertainty, and propose an algorithm that combines weighted k-shortest paths with space-time prisms. We apply this algorithm to two real-world case studies and we show that accounting for uncertainty leads to obtaining more positive matchings.
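The two trajectory models contrasted in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: samples are hypothetical `(t, x, y)` tuples in the plane with strictly increasing timestamps, the prism test assumes unconstrained Euclidean movement with a maximum speed `vmax`, and neither function is the paper's map-matching algorithm.

```python
import math
from bisect import bisect_right


def interpolate(sample, t):
    """Linear interpolation of a position at time t.

    sample: list of (t, x, y) tuples with strictly increasing t (assumed).
    """
    times = [p[0] for p in sample]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time interval")
    i = bisect_right(times, t) - 1
    if i == len(sample) - 1:  # t equals the last timestamp
        return sample[-1][1:]
    t0, x0, y0 = sample[i]
    t1, x1, y1 = sample[i + 1]
    f = (t - t0) / (t1 - t0)
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


def in_prism(a, b, p, vmax):
    """True if point p = (t, x, y) lies in the space-time prism between
    samples a and b: p must be reachable from a, and b from p, without
    exceeding the speed bound vmax (an assumed parameter)."""
    (ta, xa, ya), (tb, xb, yb), (t, x, y) = a, b, p
    if not ta <= t <= tb:
        return False
    return (math.hypot(x - xa, y - ya) <= vmax * (t - ta)
            and math.hypot(xb - x, yb - y) <= vmax * (tb - t))


sample = [(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 10.0, 20.0)]
midpoint = interpolate(sample, 5)
```

Linear interpolation commits to a single position at each instant, whereas the prism test accepts every position compatible with the speed bound, which is what allows off-network measurements to be treated as uncertain rather than wrong.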

Lecture Notes in Computer Science, 2010
Although many proposals exist for extending Geographic Information Systems (GIS) with OLAP and data warehousing capabilities (a topic denoted SOLAP), only recently has the importance of supporting continuous fields (i.e., phenomena that are perceived as having a value at each point in space and/or time) been acknowledged. Examples of such phenomena include temperature, altitude, or land use. In this paper we discuss physical design issues arising when a spatial data warehouse includes a combination of spatial and non-spatial dimensions and measures, and spatio-temporal dimensions representing continuous fields. We give the syntax and semantics of the data types (and their operators) needed to support fields in SOLAP environments, and present an implementation of these types on top of spatial-SQL. We also show how queries using the spatio-temporal operators for fields are written, parsed, and executed.
Efficient constraint evaluation in categorical sequential pattern mining for trajectory databases
Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09, 2009
The classic Generalized Sequential Patterns (GSP) algorithm returns all frequent sequences present in a database. However, usually only a few are interesting from a user's point of view. Thus, post-processing tasks are required in order to discard uninteresting sequences. To avoid this drawback, languages based on regular expressions (RE) were proposed to restrict frequent sequences to the
Proceedings of the 2008 ACM symposium on Applied computing - SAC '08, 2008
We address aggregate queries over GIS data and moving object data, where non-spatial information is stored in a data warehouse. We propose a formal data model and query language to express complex aggregate queries. Next, we study the compression of trajectory data, produced by moving objects, using the notions of stops and moves. We show that stops and moves are expressible in our query language and we consider a fragment of this language, consisting of regular expressions to talk about temporally ordered sequences of stops and moves. This fragment can be used not only for querying, but also for expressing data mining and pattern matching tasks over trajectory data.
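The idea of using regular expressions over temporally ordered stops and moves, mentioned in the abstract above, can be illustrated with Python's standard `re` module. This is a loose sketch, not the paper's query language: the one-letter place codes, the `-` symbol for a move, and the `encode` helper are hypothetical.

```python
import re

# Hypothetical encoding: each stop is a one-letter place code
# (H = hotel, R = restaurant, M = museum); "-" stands for a move.


def encode(trajectory):
    """Turn a temporally ordered list of (kind, place) events into a string."""
    return "".join("-" if kind == "move" else place for kind, place in trajectory)


trajectory = [
    ("stop", "H"), ("move", None),
    ("stop", "R"), ("move", None),
    ("stop", "M"),
]
encoded = encode(trajectory)

# Trajectories that start at the hotel and end at the museum,
# with any number of intermediate stops.
hotel_to_museum = re.compile(r"H(-[A-Z])*-M")
matches = hotel_to_museum.fullmatch(encoded) is not None
```

Compressing a trajectory to such a symbol sequence discards the raw sample points, which is why the same pattern can serve both as a query and as a pattern-matching filter over many trajectories.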

Proceedings of the 7th ACM international workshop on Data warehousing and OLAP - DOLAP '04, 2004
A peer-to-peer (P2P) data management system consists essentially of a network of peer systems, each maintaining full autonomy over its own data resources. Data exchange between peers occurs when one of them, in the role of a local peer, needs data available in other nodes, denoted the acquaintances of the local peer. No global schema is assumed to exist for any data under this computing paradigm. Hence, data provided by an acquaintance of a local peer must be adapted so that answers to queries posed by local peer users conform to the view those users have of their data. Because a multidimensional database normally consists of a collection of views of aggregated data, a careful translation process is needed in this case, in order to transform any summary concept that appears in a peer acquaintance into a summary concept meaningful to the requesting peer. We present a model for multidimensional data distributed in a P2P network, and a query rewriting technique that allows a local peer to propagate OLAP queries among its acquaintances, obtaining a meaningful and correct answer.

Proceedings of the ACM tenth international workshop on Data warehousing and OLAP - DOLAP '07, 2007
Data aggregation in Geographic Information Systems (GIS) is a desirable feature, although only marginally present in commercial systems nowadays, mostly through ad-hoc solutions. Integration between GIS and OLAP systems is still needed in commercial systems. With this in mind, we have developed Piet, a system that makes use of a novel query processing technique: first, a process called subpolygonization decomposes each thematic layer into open convex polygons; then, another process computes and stores in a database the overlay of those layers for later use by the query processor. We describe in detail the implementation of Piet, and provide evidence, through experimentation over real-world maps, that overlay precomputation for spatial aggregate queries can be competitive with GIS systems that employ indexing schemes based on R-trees. Given that the data model and implementation do not prevent the use of traditional indexing techniques such as R-trees and aR-trees, we compare our proposal against these techniques in our experiments.

Lecture Notes in Computer Science, 2002
Commercial OLAP systems usually treat OLAP dimensions as static entities. In practice, dimension updates are often necessary in order to adapt the multidimensional database to changing requirements. In earlier work we proposed a temporal multidimensional model and TOLAP, a query language supporting it, accounting for dimension updates and schema evolution at a high level of abstraction. In this paper we present our implementation of the model and the query language. We show how to translate a TOLAP program to SQL, and present a real-life case study, a medical center in Buenos Aires. We apply our implementation to this case study in order to show how our approach can address problems that occur in real situations and that current non-temporal commercial systems cannot deal with. We present results on query and dimension update performance, and briefly describe a visualization tool that allows editing and running TOLAP queries, performing dimension updates, and browsing dimensions across time.