2012
As data proliferates at increasing rates, the need for real-time stream processing applications increases as well. In the same way that data stream management systems have emerged from the database community, there is now a similar concern in managing dynamic knowledge among the Semantic Web community. Unfortunately, early relevant approaches are to a large extent theoretical and do not present convincing evidence of their efficiency in real dynamic environments. In this paper, we present a framework for the effective, real-time processing of streaming data and we define and analyze in depth its key components. Our framework serves as a basis for the implementation of the SensorStream prototype, on which we run numerous performance and scalability measurements that outline its behaviour and demonstrate its suitability and scalability for solutions that require real-time information processing from distributed and heterogeneous data sources.
2016
Due to the growing need to timely process and derive valuable information and knowledge from data produced in the Semantic Web, RDF stream processing (RSP) has emerged as an important research domain. Of course, modern RSP engines have to address the volume and velocity characteristics encountered in the Big Data era. This comes at the price of designing high throughput, low latency, fault tolerant, highly available and scalable engines. The cost of implementing such systems from scratch is very high and usually one prefers to program components on top of a framework that possesses these properties, e.g., Apache Hadoop or Apache Spark. The research conducted in this PhD adopts this approach and aims to create a production-ready RSP engine which will be based on domain standards, e.g., Apache Kafka and Spark Streaming. In a nutshell, the engine aims to i) address basic event modeling to guarantee the completeness of input data in window operators, ii) process real-time RDF stream in a distr...
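The window-completeness goal mentioned in point i) can be illustrated with a minimal sketch: a tumbling window buffers timestamped items and only emits a window once a watermark guarantees that no earlier item can still arrive. The class and method names here are hypothetical, not taken from any RSP engine.

```python
from collections import defaultdict

class TumblingWindow:
    """Buffers timestamped items into fixed-size windows and emits a
    window only after a watermark guarantees its completeness
    (illustrative sketch, not an actual RSP engine API)."""

    def __init__(self, size):
        self.size = size                  # window length in time units
        self.buckets = defaultdict(list)  # window index -> buffered items

    def insert(self, timestamp, item):
        self.buckets[timestamp // self.size].append(item)

    def advance_watermark(self, watermark):
        """Emit every window that closed at or before the watermark."""
        closed = sorted(b for b in self.buckets
                        if (b + 1) * self.size <= watermark)
        return [self.buckets.pop(b) for b in closed]

w = TumblingWindow(size=10)
w.insert(3, "t1"); w.insert(7, "t2"); w.insert(12, "t3")
print(w.advance_watermark(10))  # → [['t1', 't2']] — first window is complete
```

Holding back a window until the watermark passes is what lets the operator claim completeness even when items arrive slightly out of order.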
The Semantic Web – ISWC 2017, 2017
Real-time processing of data streams emanating from sensors is becoming a common task in Internet of Things scenarios. The key implementation goal consists in efficiently handling massive incoming data streams and supporting advanced data analytics services like anomaly detection. In an ongoing industrial project, a 24/7 available stream processing engine usually faces dynamically changing data and workload characteristics. These changes impact the engine's performance and reliability. We propose Strider, a hybrid adaptive distributed RDF Stream Processing engine that optimizes its logical query plan according to the state of data streams. Strider has been designed to guarantee important industrial properties such as scalability, high availability, fault tolerance, high throughput and acceptable latency. These guarantees are obtained by designing the engine's architecture with state-of-the-art Apache components such as Spark and Kafka. We highlight the efficiency of Strider on real-world and synthetic data sets (e.g., on a single machine, up to 60x gain on throughput compared to state-of-the-art systems, and a throughput of 3.1 million triples/second on a 9-machine cluster, a major breakthrough in this system's category).
International Journal on Semantic Web and Information Systems, 2016
This paper presents a generic approach to integrate environmental sensor data efficiently, allowing the detection of relevant situations and events in near real-time through continuous querying. Data variety is addressed with the use of the Semantic Sensor Network ontology for observation data modelling, and semantic annotations for environmental phenomena. Data velocity is handled by distributing sensor data messaging and serving observations as RDF graphs on query demand. The stream processing engine presented in the paper, morph-streams++, provides adapters for different data formats and distributed processing of streams in a cluster. An evaluation of different parallelization and semantic annotation configurations shows that the described approach reduces the average latency of message processing in several cases.
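Modelling an observation with the Semantic Sensor Network vocabulary can be sketched as a small mapping from a raw sensor reading to RDF triples. The SOSA namespace below is the real W3C namespace underlying SSN, but the specific mapping and IRIs are illustrative only; a production pipeline would use a full RDF library and a richer model.

```python
SOSA = "http://www.w3.org/ns/sosa/"

def observation_to_ntriples(obs_iri, sensor_iri, property_iri, value, time_iso):
    """Serialize one sensor reading as SOSA/SSN-style N-Triples lines
    (illustrative modelling; real mappings typically include units,
    features of interest, and typed literals)."""
    return [
        f"<{obs_iri}> <{SOSA}madeBySensor> <{sensor_iri}> .",
        f"<{obs_iri}> <{SOSA}observedProperty> <{property_iri}> .",
        f'<{obs_iri}> <{SOSA}hasSimpleResult> "{value}" .',
        f'<{obs_iri}> <{SOSA}resultTime> "{time_iso}" .',
    ]

triples = observation_to_ntriples(
    "http://example.org/obs/1",          # hypothetical IRIs
    "http://example.org/sensor/t7",
    "http://example.org/prop/airTemp",
    21.4, "2016-05-01T12:00:00Z")
print("\n".join(triples))
```

Serving each observation as such a small RDF graph is what allows downstream continuous queries to treat heterogeneous sensors uniformly.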
Lecture Notes in Computer Science, 2016
Processing data streams is increasingly gaining momentum, given the need to process these flows of information in real-time and at Web scale. In this context, RDF Stream Processing (RSP) and Stream Reasoning (SR) have emerged as solutions to combine semantic technologies with stream and event processing techniques. Research in these areas has proposed an ecosystem of solutions to query, reason and perform real-time processing over heterogeneous and distributed data streams on the Web. However, so far one basic building block has been missing: a mechanism to disseminate and exchange RDF streams on the Web. In this work we close this gap, proposing TripleWave, a reusable and generic tool that enables the publication of RDF streams on the Web. The features of TripleWave were selected based on requirements of real use-cases, and support a diverse set of scenarios, independent of any specific RSP implementation. TripleWave can be fed with existing Web streams (e.g. Twitter and Wikipedia streams) or time-annotated RDF datasets (e.g. the Linked Sensor Data dataset). It can be invoked through both pull-and push-based mechanisms, thus enabling RSP engines to automatically register and receive data from TripleWave.
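The pull- and push-based invocation modes described above can be contrasted with a minimal in-memory hub: push delivers each published graph to registered callbacks immediately, while pull lets a consumer drain a buffer at its own pace. This is a hypothetical sketch of the two delivery styles, not TripleWave's actual interfaces.

```python
class StreamHub:
    """Minimal publish/subscribe sketch contrasting push- and
    pull-based delivery of RDF stream elements (hypothetical API)."""

    def __init__(self):
        self.buffer = []        # items awaiting pull-based consumers
        self.subscribers = []   # callbacks for push-based consumers

    def register(self, callback):
        self.subscribers.append(callback)

    def publish(self, rdf_graph):
        self.buffer.append(rdf_graph)
        for callback in self.subscribers:
            callback(rdf_graph)            # push: immediate delivery

    def poll(self):
        items, self.buffer = self.buffer, []
        return items                        # pull: consumer drains buffer

hub = StreamHub()
pushed = []
hub.register(pushed.append)
hub.publish({"@id": "ex:g1"})              # hypothetical JSON-LD-ish graph
hub.publish({"@id": "ex:g2"})
print(len(pushed), len(hub.poll()))        # both modes saw both graphs
```

An RSP engine would typically register a callback (push) for low latency, while a batch consumer would poll (pull) on its own schedule.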
2018
Due to the rapid growth of stream data being generated by sensors, micro-blogs, e-businesses, etc., many organizations require on-line processing of their data for real-time analysis and actionable alerts. It is not possible to process such high-volume, high-velocity data in real time using traditional centralized stream processing engines. Hence distributed stream processing has emerged to facilitate such large-scale real-time processing. In this work we present a smart distributed event-driven stream processing approach. In contrast to ordinary stream processing, event-driven stream processing generates query results only on the occurrence of specified events. In basic event-driven stream processing, even when no event is raised, input stream tuples are still continuously processed by query operators although they do not generate any query result. This results in increased system load and wastage of system resources. Whereas in the smart event-driven stream processing scheme, in...
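The contrast between basic and smart event-driven processing can be made concrete by counting operator invocations: in the basic scheme every tuple passes through the (expensive) query operator and non-event results are discarded, whereas the smart scheme gates tuples before the operator. The operator, event condition, and data below are all hypothetical.

```python
calls = {"n": 0}

def heavy_query_op(tup):
    calls["n"] += 1            # stands in for expensive operator work
    return tup * 2

def event_active(tup):
    return tup >= 10           # hypothetical event condition

stream = [1, 2, 15, 3, 20]

# Basic event-driven: operators run on every tuple; results are kept
# only while an event is active.
basic = [r for t in stream for r in [heavy_query_op(t)] if event_active(t)]
basic_calls = calls["n"]       # every tuple reached the operator

calls["n"] = 0
# Smart event-driven: tuples are gated *before* the operators,
# so no work is spent outside event occurrences.
smart = [heavy_query_op(t) for t in stream if event_active(t)]
smart_calls = calls["n"]       # only event tuples reached the operator

assert basic == smart          # identical results, less work
print(basic_calls, smart_calls)
```

The saved invocations (5 versus 2 here) are exactly the "wastage of system resources" the smart scheme avoids.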
2008
RFIDs, cell phones, and sensor nodes produce streams of sensor data that help computers monitor, react to, and affect the changing status of the physical world. Our goal in this paper is to allow these data streams to be first-class citizens on the World Wide Web. We present a new Web primitive called stream feeds that extend traditional XML feeds such as blogs and Podcasts to accommodate the large size, high frequency, and real-time nature of sensor streams.
VLDB '02: Proceedings of the 28th International Conference on Very Large Databases, 2002
This paper introduces monitoring applications, which we will show differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS that is currently under construction at Brandeis University, Brown University, and M.I.T. We describe the basic system architecture, a stream-oriented set of operators, optimization tactics, and support for real-time operation.
J. Inf. Data Manag., 2016
Today, large amounts of data are produced by sensor networks, which continuously generate data streams about real-world phenomena. However, these data streams are produced in raw and heterogeneous formats, lacking the semantics to describe their meaning, which imposes barriers to accessing and using them. To tackle this problem, several solutions using Linked Data principles have been proposed. In this paper, we survey the main solutions developed by the research communities for publishing data streams in the Web of Data. The major contributions of the paper are the identification of the strengths and limitations of these solutions and, on that basis, the main steps that one should follow to publish data streams so that anyone can use them with minimal understanding of the underlying details. We also highlight the main challenges that emerge from this survey, concluding with a list of research tasks for future work.
In the last years, there has been an increase in the amount of real-time data generated. Sensors attached to things are transforming how we interact with our environment. Extracting meaningful information from these streams of data is essential for some application areas and requires processing systems that scale to varying conditions in data sources, complex queries, and system failures. This paper describes ongoing research on the development of a scalable RDF streaming engine.
2016
Processing data as they arrive has recently gained momentum as a way to mine continuous, high-volume and unbounded sequences of data streams. Due to the heterogeneity and multi-modality of these data, RDF is widely used to provide a unified metadata layer in streaming contexts. In response to this ever-increasing demand, a number of systems and languages have been produced, aiming at RDF stream processing (RSP). However, most of them adopt a centralized execution approach, which hinders correct behavior and high scalability under certain circumstances, such as concurrent queries and increasing input load. Only a few systems have sought to distribute processing, and their implementations are still in their infancy. None of them provides a full-fledged and production-ready RSP engine that is easy to use, supports all SPARQL 1.1 operators and is adapted to industrial needs. As a solution, we present a distributed, fault-tolerant and scalable RSP system that exploits the Apache Storm framework.
2018
Due to the growing need to timely process and derive valuable information and knowledge from data produced in the Semantic Web, RDF stream processing (RSP) has emerged as an important research domain. In this paper, we describe the design of an RSP engine that is built upon state of the art Big Data frameworks, namely Apache Kafka and Apache Spark. Together, they support the implementation of a production-ready RSP engine that guarantees scalability, fault-tolerance, high availability, low latency and high throughput. Moreover, we highlight that the Spark framework considerably eases the implementation of complex applications requiring libraries as diverse as machine learning, graph processing, query processing and stream processing.
Abstract. The emergence of dynamic information sources, including sensor networks, has led to large streams of real-time data on the Web. Research studies suggest that these dynamic networks have created more data in the last three years than in the entire history of civilization, and this trend will only increase in the coming years [1]. With this coming data explosion, real-time analytics software must either adapt or die [2].
Corr, 2010
The Resource Description Framework (RDF) provides a common data model for the integration of "real-time" social and sensor data streams with the Web and with each other. While there exist numerous protocols and data formats for exchanging dynamic RDF data, or RDF updates, these options should be examined carefully in order to enable a Semantic Web equivalent of the high-throughput, low-latency streams of typical Web 2.0, multimedia, and gaming applications. This paper contains a brief survey of RDF update formats and a high-level discussion of both TCP- and UDP-based transport protocols for updates. Its main contribution is the experimental evaluation of a UDP-based architecture which serves as a real-world example of a high-performance RDF streaming application in an Internet-scale distributed environment.
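The UDP-based transport idea can be sketched in a few lines: each RDF update is sent as a fire-and-forget datagram, trading delivery guarantees for latency. This loopback demo is a minimal illustration under assumed framing (one N-Triples line per datagram); real RDF streaming formats batch and compress updates.

```python
import socket

def send_update(triple_line, addr):
    """Fire-and-forget delivery of one N-Triples update over UDP
    (sketch; no retransmission, ordering, or batching)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(triple_line.encode("utf-8"), addr)
    finally:
        sock.close()

# Loopback demo: a receiver socket plays the role of the consumer.
recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))            # OS picks a free port
send_update('<s> <p> "o" .', recv.getsockname())
data, _ = recv.recvfrom(65535)
recv.close()
print(data.decode("utf-8"))
```

The appeal for streaming is that a lost datagram never blocks later updates, which is exactly the latency behavior TCP's retransmission would compromise.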
IEEE Data(base) Engineering Bulletin, 2003
We propose to demonstrate a Data Stream Management System (DSMS) called STREAM, for STanford stREam datA Manager. The challenges in building a DSMS instead of a traditional DBMS arise from two fundamental differences: (i) in addition to managing traditional stored data such as relations, a DSMS must handle multiple continuous, unbounded, possibly rapid and time-varying data streams; (ii) due to the continuous nature of the data, a DSMS typically supports long-running continuous queries, which are expected to produce answers in a continuous and timely fashion.
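The long-running continuous query described in point (ii) can be sketched as a generator that re-emits an answer after every arrival, here a sliding-window average. This is an illustrative toy, not STREAM's CQL semantics.

```python
from collections import deque

def continuous_avg(stream, window):
    """Continuous query sketch: after each arriving value, emit the
    average over the last `window` values (illustrative only)."""
    buf = deque(maxlen=window)     # sliding window over the stream
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)

answers = list(continuous_avg([2, 4, 6, 8], window=2))
print(answers)  # → [2.0, 3.0, 5.0, 7.0]
```

Unlike a one-shot DBMS query, the result is itself a stream: a fresh answer per input, produced "in a continuous and timely fashion".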
a book on data stream …, 2004
Proc. VLDB Endow. 10(12), 2017
Real-time processing of data streams emanating from sensors is becoming a common task in industrial scenarios. An increasing number of processing jobs executed over such platforms are requiring reasoning mechanisms. The key implementation goal is thus to efficiently handle massive incoming data streams and support reasoning and data analytics services. Moreover, in an ongoing industrial project on anomaly detection in large potable water networks, we are facing the effect of dynamically changing data and workload characteristics in stream processing. The Strider system addresses these research and implementation challenges by considering scalability, fault-tolerance, high throughput and acceptable latency properties. We will demonstrate the benefits of Strider on an Internet of Things-based real-world industrial setting.