
A New Architecture for Real Time Data Stream Processing

2017, International Journal of Advanced Computer Science and Applications

Abstract

Processing data streams in real time is crucial for many applications; however, processing large amounts of data from different sources, such as sensor networks, web traffic, social media, video streams and others, represents a huge challenge. The main problem is that big data systems are based on Hadoop technology, and in particular on MapReduce for processing. MapReduce is a highly scalable and fault-tolerant framework. It processes large amounts of data in batches and provides insight into older data, but it can only process a bounded set of data. MapReduce is therefore not appropriate for real-time stream processing, whereas it is very important to process data the moment it arrives in order to achieve fast responses and good decision making. Hence the need for a new architecture that allows real-time data processing at high speed and with low latency. The main aim of this paper is to give a clear survey of the different open-source technologies that exist for real-time data stream processing, including their system architectures. We also propose a new architecture, based mainly on these comparisons, that combines real-time processing with machine learning and Storm technology.
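The contrast between batch processing (MapReduce) and stream processing can be illustrated with a word count. The sketch below is plain Python written for illustration only; it does not use Hadoop, and the function names (`map_phase`, `batch_word_count`, `stream_word_count`) are hypothetical. The batch version only yields a result once the whole bounded input has been mapped, grouped and reduced, while the stream version updates its state per record and can answer after every arrival.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def map_phase(record: str) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one record."""
    return [(word, 1) for word in record.split()]


def batch_word_count(records: list[str]) -> dict[str, int]:
    """MapReduce-style batch job: a result exists only after the whole (bounded) input is processed."""
    grouped: dict[str, list[int]] = {}
    for record in records:                            # map over every record first...
        for word, one in map_phase(record):
            grouped.setdefault(word, []).append(one)  # ...group values by key (shuffle)...
    return {word: sum(ones) for word, ones in grouped.items()}  # ...then reduce each group


def stream_word_count(records: Iterable[str]) -> Iterator[dict[str, int]]:
    """Stream job: state is updated as each record arrives, so an up-to-date result is available at once."""
    counts: Counter[str] = Counter()
    for record in records:                            # the input may be unbounded
        counts.update(record.split())
        yield dict(counts)                            # emit a fresh snapshot after every record


events = ["storm spark", "storm hadoop"]
print(batch_word_count(events))                       # {'storm': 2, 'spark': 1, 'hadoop': 1}
for snapshot in stream_word_count(events):
    print(snapshot)  # {'storm': 1, 'spark': 1}, then {'storm': 2, 'spark': 1, 'hadoop': 1}
```

Both functions agree on the final counts; the difference is *when* a usable answer exists, which is the latency argument made above.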

Key takeaways

  • Stream processing: Storm can be used to process a stream of new data and update databases in real time.
  • Table 1 shows that Storm is the best tool for real-time stream processing, Hadoop performs batch processing, and Spark is capable of micro-batching.
  • The Lambda architecture unifies real-time and batch processing in a single framework, which provides low latency and better results.
  • A Kappa architecture is like the Lambda architecture with the batch processing layer removed.
  • Storm allows processing a very large volume of data with low latency and high velocity.
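Storm applications are organized as a topology in which spouts emit a stream of tuples and bolts process each tuple as it arrives. The sketch below only mimics that spout/bolt vocabulary in plain Python to show tuple-at-a-time processing; it is not Storm's API (real topologies are normally written against Storm's Java API), and `sentence_spout`, `split_bolt`, `CountBolt` and `run_topology` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Iterable, Iterator

# A "bolt" here is any callable that takes one tuple and yields zero or more output tuples.
Bolt = Callable[[tuple], Iterable[tuple]]


def sentence_spout(lines: Iterable[str]) -> Iterator[tuple]:
    """Spout: the source of the stream; emits one tuple per incoming event."""
    for line in lines:
        yield (line,)


def split_bolt(tup: tuple) -> Iterator[tuple]:
    """First bolt: split a sentence tuple into one tuple per lower-cased word."""
    (sentence,) = tup
    for word in sentence.lower().split():
        yield (word,)


class CountBolt:
    """Second bolt: keep a running count per word and emit the updated count immediately."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def __call__(self, tup: tuple) -> Iterator[tuple]:
        (word,) = tup
        self.counts[word] += 1
        yield (word, self.counts[word])


def run_topology(spout: Iterable[tuple], bolts: list[Bolt]) -> Iterator[tuple]:
    """Push each spout tuple through the whole bolt chain before pulling the next one."""
    for tup in spout:
        stream: list[tuple] = [tup]
        for bolt in bolts:
            # Each bolt consumes the previous bolt's output for this single input tuple.
            stream = [out for t in stream for out in bolt(t)]
        yield from stream


for out in run_topology(sentence_spout(["Storm processes streams", "Storm scales"]),
                        [split_bolt, CountBolt()]):
    print(out)  # ('storm', 1), ('processes', 1), ('streams', 1), ('storm', 2), ('scales', 1)
```

Because results are emitted per tuple, the count for "storm" is already updated when the second sentence arrives; in real Storm, spouts and bolts would additionally run in parallel across worker processes.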
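The Table 1 bullet distinguishes Storm's record-at-a-time processing from Spark's micro-batching. A simplified latency model makes the difference concrete: with micro-batching, an event must wait for its batch interval to close before it is processed. This is an idealized sketch, assuming no queueing, a hypothetical fixed processing cost, and integer millisecond timestamps; the function names and numbers are illustrative and are not measurements from the paper.

```python
from __future__ import annotations


def per_tuple_latency_ms(arrivals_ms: list[int], cost_ms: int) -> list[int]:
    """Tuple-at-a-time (Storm-like): each event is processed as soon as it arrives."""
    return [cost_ms for _ in arrivals_ms]


def micro_batch_latency_ms(arrivals_ms: list[int], interval_ms: int, cost_ms: int) -> list[int]:
    """Micro-batching (Spark-Streaming-like): each event waits until its batch interval closes."""
    latencies = []
    for t in arrivals_ms:
        # Batch k covers [k * interval, (k + 1) * interval) and runs when that interval ends.
        batch_end = (t // interval_ms + 1) * interval_ms
        latencies.append(batch_end - t + cost_ms)
    return latencies


arrivals = [100, 400, 900, 1200]  # hypothetical event arrival times (ms)
print(per_tuple_latency_ms(arrivals, cost_ms=10))                      # [10, 10, 10, 10]
print(micro_batch_latency_ms(arrivals, interval_ms=1000, cost_ms=10))  # [910, 610, 110, 810]
```

Under these assumptions the added delay of micro-batching is bounded by the batch interval, so shrinking the interval narrows the gap at the cost of scheduling more, smaller batches.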
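The Lambda architecture bullet can be sketched as three parts: a batch layer that periodically recomputes a view from the complete, append-only master dataset; a speed layer that updates a real-time view per event for data the batch view has not yet absorbed; and a serving layer that merges the two at query time. The toy class below is a hypothetical single-process illustration of that split using page-view counts; it is not the paper's implementation and uses no Hadoop or Storm APIs.

```python
from __future__ import annotations

from collections import Counter


class LambdaCounter:
    """Toy Lambda architecture: page-view counts served from a batch view plus a real-time view."""

    def __init__(self) -> None:
        self.master_log: list[str] = []               # append-only master dataset
        self.batch_view: Counter[str] = Counter()     # recomputed by the batch layer
        self.realtime_view: Counter[str] = Counter()  # updated per event by the speed layer
        self.batched_upto = 0                         # number of log entries covered by batch_view

    def ingest(self, page: str) -> None:
        # New data goes to the master log (for the batch layer) and to the speed layer.
        self.master_log.append(page)
        self.realtime_view[page] += 1

    def run_batch_job(self) -> None:
        # Batch layer: recompute from scratch over all data logged so far (slow but exact).
        self.batched_upto = len(self.master_log)
        self.batch_view = Counter(self.master_log[: self.batched_upto])
        # The speed layer keeps only events that the new batch view does not cover.
        self.realtime_view = Counter(self.master_log[self.batched_upto:])

    def query(self, page: str) -> int:
        # Serving layer: merge the complete-but-stale batch view with the fresh real-time view.
        return self.batch_view[page] + self.realtime_view[page]


lam = LambdaCounter()
for p in ["home", "about", "home"]:
    lam.ingest(p)
print(lam.query("home"))  # 2 (answered entirely by the speed layer)
lam.run_batch_job()
lam.ingest("home")
print(lam.query("home"))  # 3 (2 from the batch view + 1 from the speed layer)
```

In a real deployment the batch job runs concurrently with ingestion, which is why the speed layer must retain the events that arrive after the batch snapshot is taken.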
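The Kappa architecture bullet removes the batch layer. In the sketch below everything flows through a single streaming job that reads an append-only log, and changing the processing logic means replaying that log through the new version of the job rather than running a separate batch recomputation. This is a hypothetical, single-process illustration (the `KappaPipeline` class and the two update functions are invented for the example); production systems would typically use a durable, replayable log and run the old and new jobs side by side before switching.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


class KappaPipeline:
    """Toy Kappa architecture: one append-only log, one streaming job, reprocessing by replay."""

    def __init__(self, update: Callable[[Counter[str], str], None]) -> None:
        self.log: list[str] = []                 # the single source of truth
        self.update = update                     # current version of the streaming job
        self.view: Counter[str] = Counter()      # view maintained by that job

    def ingest(self, event: str) -> None:
        self.log.append(event)
        self.update(self.view, event)            # processed immediately, no batch layer

    def redeploy(self, new_update: Callable[[Counter[str], str], None]) -> None:
        # To change the logic, replay the whole log through the new job into a fresh view.
        fresh: Counter[str] = Counter()
        for event in self.log:
            new_update(fresh, event)
        self.update, self.view = new_update, fresh


def count_exact(view: Counter[str], event: str) -> None:
    view[event] += 1


def count_case_insensitive(view: Counter[str], event: str) -> None:
    view[event.lower()] += 1


kp = KappaPipeline(count_exact)
for e in ["Home", "home", "About"]:
    kp.ingest(e)
print(kp.view["home"])  # 1
kp.redeploy(count_case_insensitive)
print(kp.view["home"])  # 2
kp.ingest("HOME")
print(kp.view["home"])  # 3
```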