2010, ACM Transactions on Database Systems
In relational database management systems, views supplement basic query constructs to cope with the demand for "higher-level" views of data. Moreover, in traditional query optimization, answering a query using a set of existing materialized views can yield a more efficient query execution plan. Due to their effectiveness, views are attractive to data stream management systems.
a book on data stream …, 2004
IEEE Data(base) Engineering Bulletin, 2003
We propose to demonstrate a Data Stream Management System (DSMS) called STREAM, for STanford stREam datA Manager. The challenges in building a DSMS instead of a traditional DBMS arise from two fundamental differences: (1) In addition to managing traditional stored data such as relations, a DSMS must handle multiple continuous, unbounded, possibly rapid and time-varying data streams. (2) Due to the continuous nature of the data, a DSMS typically supports long-running continuous queries, which are expected to produce answers in a continuous and timely fashion.
2002
In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues. … systems, view management, sequence databases, and others. Although much of this work clearly has applications to data stream processing, we hope to show in this paper that there are many new problems to address in realizing a complete DSMS.
IEEE Transactions on Knowledge and Data Engineering, 1990
Data streams are long, relatively unstructured sequences of characters that contain information such as electronic mail or a tape backup of various documents and reports created in an office. This paper deals with a conceptual framework, using relational algebra and relational databases, within which data streams may be queried. As information is extracted from the data stream, it is put into a relational database that may be queried in the usual manner. The database schema evolves as the user's knowledge of the content of the data stream changes. Operators are defined in terms of relational algebra that can be used to extract data from a specially defined relation that contains all or part of the data stream. This approach to querying data streams permits the integration of unstructured data with structured data. The operators defined extend the functionality of relational algebra, in much the same way that the join does relative to the basic operators (select, project, union, difference, and Cartesian product).
2006
Prior work on languages to express continuous queries over streams has defined a stream as a sequence of tuples that represents an infinite append-only relation. In this paper, we show that composition of queries is not possible in the append-only model. Query composition is a fundamental property of any query language - composition makes it possible to build
VLDB '02: Proceedings of the 28th International Conference on Very Large Databases, 2002
This paper introduces monitoring applications, which we will show differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS that is currently under construction at Brandeis University, Brown University, and M.I.T. We describe the basic system architecture, a stream-oriented set of operators, optimization tactics, and support for real-time operation.
2010
Several query languages have been developed for data stream management systems (DSMSs), including CQL (Stanford), StreamSQL (StreamBase), WaveScript (MIT), and SCSQL (Uppsala University). This thesis is the research phase of a two-phase project whose final goal is to provide CQL support to the Super Computer Stream Query processor (SCSQ), a DSMS developed by the Uppsala DataBase Laboratory. In this paper, the main properties of CQL, the extent to which they are implemented by the Stanford STREAM project, and the expressibility of the Linear Road (LR) benchmark using CQL are investigated. An overview and comparison of SQL, CQL, StreamSQL and WaveScript is also given.
2003
Emerging data stream processing systems rely on windowing to enable on-the-fly processing of continuous queries over unbounded streams. As a result, several recent efforts have developed window-aware implementations of query operators such as joins and aggregates. This focus on individual operators, however, ignores the larger issue of how to coordinate the pipelined execution of such operators when combined into a full windowed query plan. In this paper, we first show how the straightforward application of traditional pipelined query processing techniques to sliding window queries can result in inefficient and incorrect behavior. We then present three alternative execution techniques that guarantee correct behavior for pipelined sliding window queries and develop new algorithms for correctly evaluating window-based duplicate-elimination, Group-By and Set operators in this context. We implemented all of these techniques in a prototype data stream system and report the results of a detailed performance study of the system.
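As a rough illustration of the windowing idea this abstract refers to, the following minimal sketch maintains a time-based sliding-window sum over (timestamp, value) tuples. The class name `SlidingWindowSum`, the window convention `(ts - range, ts]`, and the in-order arrival assumption are all choices made here for illustration; they are not taken from the paper, and the sketch does not reproduce its pipelined execution techniques.

```python
from collections import deque


class SlidingWindowSum:
    """Time-based sliding-window sum (hypothetical helper, for illustration only)."""

    def __init__(self, window_range):
        self.window_range = window_range  # window length in time units
        self.buffer = deque()             # (timestamp, value) pairs currently in the window
        self.total = 0                    # running sum of the buffered values

    def insert(self, ts, value):
        # Assumption: timestamps arrive in non-decreasing order.
        self.buffer.append((ts, value))
        self.total += value
        # Expire tuples that are no longer inside the window (ts - range, ts].
        while self.buffer and self.buffer[0][0] <= ts - self.window_range:
            _, old_value = self.buffer.popleft()
            self.total -= old_value
        return self.total


window = SlidingWindowSum(10)
results = [window.insert(ts, v) for ts, v in [(1, 5), (4, 3), (11, 2), (15, 7)]]
print(results)
```

Expiration here is driven only by new arrivals, which is one simple design choice; a real operator in a pipelined plan would also need to propagate expirations downstream, which is the coordination problem the abstract raises.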
2005
In many data streaming applications, streams may contain data tuples that are either redundant, repetitive, or that are not "interesting" to any of the standing continuous queries. Processing such tuples may waste system resources without producing useful answers. To the contrary, some other tuples can be categorized as promising. This paper proposes that stream query engines can have the option to execute on promising tuples only and not on all tuples. We propose to maintain intermediate stream summaries and indices that can direct the stream query engine to detect and operate on promising tuples. As an illustration, the proposed intermediate stream summaries are tuned towards capturing promising tuples that (1) maximize the number of output tuples, (2) contribute to producing a faithful representative sample of the output tuples (compared to the output produced when assuming infinite resources), or (3) produce the outlier or deviant results. Experiments are conducted in the context of Nile [24], a prototype stream query processing engine developed at Purdue University.
Proceedings of the 2019 International Conference on Management of Data
Real-time data analysis and management are increasingly critical for today's businesses. SQL is the de facto lingua franca for these endeavors, yet support for robust streaming analysis and management with SQL remains limited. Many approaches restrict semantics to a reduced subset of features and/or require a suite of non-standard constructs. Additionally, use of event timestamps to provide native support for analyzing events according to when they actually occurred is not pervasive, and often comes with important limitations. We present a three-part proposal for integrating robust streaming into the SQL standard, namely: (1) time-varying relations as a foundation for classical tables as well as streaming data, (2) event time semantics, (3) a limited set of optional keyword extensions to control the materialization of time-varying query results. Motivated and illustrated using examples and lessons learned from implementations in Apache Calcite, Apache Flink, and Apache Beam, we show how with these minimal additions it is possible to utilize the complete suite of standard SQL semantics to perform robust stream processing.
2006
By providing integrated and optimized support for user-defined aggregates (UDAs), data stream management systems (DSMS) can achieve superior power and generality while preserving compatibility with current SQL standards. This is demonstrated by the Stream Mill system that, through its Expressive Stream Language (ESL), efficiently supports a wide range of applications, including very advanced ones such as data stream mining, streaming XML processing, time-series queries, and RFID event processing. ESL supports physical and logical windows (with optional slides and tumbles) on both built-in aggregates and UDAs, using a simple framework that applies uniformly to both aggregate functions written in external procedural languages and those natively written in ESL. The constructs introduced in ESL extend the power and generality of DSMS, and are conducive to UDA-specific optimization and efficient execution as demonstrated by several experiments.
Proceedings of the 2007 conference of the center for advanced studies on Collaborative research - CASCON '07, 2007
Query processing for data streams raises challenges that cannot be directly handled by existing database management systems (DBMS). Most related work in the literature mainly focuses on developing techniques for a dedicated data stream management system (DSMS). These systems typically either do not permit joining data streams with conventional relations or simply convert relations to streams before joining. In this paper, we present techniques to process queries that join data streams with relations, without treating relations as special streams. We focus on a typical type of
2008
Many modern applications need to process queries over potentially infinite data streams to provide answers in real-time. This dissertation proposes novel techniques to optimize CPU and memory utilization in stream processing by exploiting metadata on streaming data or queries. It focuses on four topics: 1) exploiting stream metadata to optimize SPJ query operators via operator configuration, 2) exploiting stream metadata to optimize SPJ query plans via query-rewriting, 3) exploiting workload metadata to optimize parameterized queries via indexing, and 4) exploiting event constraints to optimize event stream processing via run-time early termination. The first part of this dissertation proposes algorithms for one of the most common and expensive query operators, namely join, that identify and purge no-longer-needed data from the state at runtime, based on punctuations. Exploitation of the combination of punctuations and commonly used window constraints is also studied. Extensive experimental evaluations demonstrate both reductions in memory usage and improvements in execution time due to the proposed strategies. The second part proposes herald-driven runtime query plan optimiza-
2008
This paper introduces the DataCell, a data stream management system designed as a seamless integration of continuous queries based on bulk event processing in an SQL software stack. The continuous stream queries are based on predicate windows, called "basket" expressions, which support arbitrarily complex SQL subqueries including, but not limited to, temporal and sequence constraints.
IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017
Data Stream Management Systems (DSMSs) are conceived for running continuous queries (CQs) on the most recently streamed data. This model does not completely fit the needs of several modern data-intensive applications that need to manage recent, historical, and static data and to execute both CQs and one-time queries (OTQs) joining such data. In order to cope with these new needs, some DSMSs have moved towards the integration of DBMS functionalities to augment their capabilities. In this paper we adopt the opposite perspective and we lay the groundwork for extending DBMSs to natively support streaming facilities. To this end, we introduce a new kind of table, the streaming table, as a persistent structure where streaming data enters and remains stored for a long period, ideally forever. Streaming tables feature a novel access paradigm: continuous writes and one-time as well as continuous reads. We present a streaming table implementation and two novel types of indices that efficiently support both high update rates and high scan rates. A detailed experimental evaluation shows the effectiveness of the proposed technology.
2004
We study the fundamental limitations of relational algebra (RA) and SQL in supporting sequence and stream queries, and present effective query language and data model enrichments to deal with them. We begin by observing the well-known limitations of SQL in application domains which are important for data streams, such as sequence queries and data mining. Then we present a formal proof that, for continuous queries on data streams, SQL suffers from additional expressive power problems. We begin by focusing on the notion of nonblocking (NB) queries that are the only continuous queries that can be supported on data streams. We characterize the notion of nonblocking queries by showing that they are equivalent to monotonic queries. Therefore the notion of NB-completeness for RA can be formalized as its ability to express all monotonic queries expressible in RA using only the monotonic operators of RA. We show that RA is not NB-complete, and SQL is not more powerful than RA for monotonic queries.
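The link between nonblocking and monotonic queries stated in this abstract can be made concrete with a small sketch. The function names `select_nb` and `max_blocking` are hypothetical and chosen here for illustration only; the example merely contrasts a selection, whose results for a stream prefix are never retracted, with an aggregate whose answer can change as more input arrives. It is not the paper's formal proof.

```python
def select_nb(stream, predicate):
    """Nonblocking selection: emit each qualifying tuple as soon as it arrives."""
    for item in stream:
        if predicate(item):
            yield item


def max_blocking(stream):
    """Blocking aggregate: the answer is final only once the whole input is seen."""
    return max(stream)


data = [3, 8, 1, 9, 4]

# Selection is monotonic: the output on a prefix is a prefix of the full output.
prefix_out = list(select_nb(data[:3], lambda x: x > 2))
full_out = list(select_nb(data, lambda x: x > 2))
is_monotonic = full_out[:len(prefix_out)] == prefix_out

# MAX is not monotonic: the answer on the prefix is replaced by a later one.
prefix_max = max_blocking(data[:3])
full_max = max_blocking(data)

print(prefix_out, full_out, is_monotonic, prefix_max, full_max)
```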
Proceedings of the 27th Annual ACM Symposium on Applied Computing - SAC '12, 2012
The use of stream-based applications is expanding in many contexts, and easy and efficient data stream management is crucial for such applications. That is why numerous solutions for stream query processing have been proposed by the scientific community. Several query processors exist and offer heterogeneous querying capabilities. This paper reports a formal work on the operators behind such query processing solutions. It points out the semantic heterogeneity of some important operators and how this leads to semantic ambiguity that may affect the application semantics. This paper revisits the definition of the main operators used for stream query processing and proposes definitions which are semantically unambiguous. The main issue is the positional order of data items in a stream and its propagation across the operators. The proposed formalization deepens the understanding of stream queries and facilitates the comparison of the semantics implemented by existing systems. This paper also presents the prototype implementing our formal proposal.
2003
Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for on-line analysis of rapidly changing data streams.
Proceedings of the 2006 ACM SIGMOD international conference on Management of data - SIGMOD '06, 2006
Recent data stream systems such as TelegraphCQ have employed the well-known property of duality between data and queries. In these systems, query processing methods are classified into two dual categories, data-initiative and query-initiative, depending on whether query processing is initiated by selecting a data element or a query. Although the duality property has been widely recognized, previous data stream systems do not take full advantage of this property since they use the two dual methods independently: data-initiative methods only for continuous queries and query-initiative methods only for ad-hoc queries. We contend that continuous query processing can be better optimized by adopting an approach that integrates the two dual methods. Our primary contribution is based on the observation that spatial join is a powerful tool for achieving this objective. In this paper, we first present a new viewpoint of transforming the continuous query processing problem to a multi-dimensional spatial join problem. We then present a continuous query processing algorithm based on spatial join, which we name Spatial Join CQ. This algorithm processes continuous queries by finding the pairs of overlapping regions from a set of data elements and a set of queries, both defined as regions in the multi-dimensional space. The algorithm achieves the advantages of the two dual methods simultaneously. Experimental results show that the proposed algorithm outperforms earlier algorithms by up to 36 times for simple selection continuous queries and by up to 7 times for sliding window join queries.
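The problem formulation in this abstract, matching data elements against queries by finding overlapping regions, can be sketched with a naive nested-loop join over axis-aligned boxes. This is only an illustration of the formulation under assumptions made here (boxes as lists of `(low, high)` intervals, data points as degenerate boxes, and the hypothetical names `overlaps` and `naive_spatial_join`); it is not the Spatial Join CQ algorithm, whose details the abstract does not give.

```python
def overlaps(box_a, box_b):
    """True if two axis-aligned boxes intersect in every dimension.

    Each box is a list of (low, high) intervals, one per dimension.
    """
    return all(
        a_low <= b_high and b_low <= a_high
        for (a_low, a_high), (b_low, b_high) in zip(box_a, box_b)
    )


def naive_spatial_join(data, queries):
    """Return (data_id, query_id) pairs whose regions overlap (nested loop)."""
    return [
        (data_id, query_id)
        for data_id, data_box in data.items()
        for query_id, query_box in queries.items()
        if overlaps(data_box, query_box)
    ]


# Two selection queries as regions over two attributes (illustrative values).
queries = {"q1": [(0, 10), (0, 10)], "q2": [(20, 30), (0, 5)]}
# Data elements as points, represented as boxes with low == high.
data = {"d1": [(5, 5), (5, 5)], "d2": [(25, 25), (7, 7)], "d3": [(22, 22), (3, 3)]}

matches = naive_spatial_join(data, queries)
print(matches)
```

A nested loop costs one comparison per data-query pair; the appeal of casting the problem as a spatial join is that index-based or partition-based join methods could replace this loop.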