Lambda Architecture for Real Time Big Data Analytic

Zirije Hasani


2014


Abstract

Defining the environment for analyzing streamed big data in real time is not an easy task. Many architectures have been proposed for real-time big data analytics, but the most interesting one for our problem is the Lambda Architecture. In this paper we present the motivation for developing such an architecture, explain how it works, and describe our practical work on implementing it. The Lambda Architecture consists of three layers: the batch, speed, and serving layers. Thus far we have implemented the batch layer using the Hadoop framework. We also briefly review the other two layers, which we plan to implement in the next phase of our work; for the serving and speed layers we conclude that Storm is the best choice. A practical example demonstrates the analytical process in Hadoop on Wikipedia text data.
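
How the three layers divide the work can be illustrated with a small in-memory sketch. This is not the authors' implementation (which uses Hadoop for the batch layer and plans Storm for the others): the class `LambdaWordCount` and its methods are hypothetical, and plain Python counters stand in for HDFS, Hadoop, and Storm. The batch layer recomputes a view from the whole append-only master dataset, the speed layer incrementally counts data not yet covered by that view, and the serving layer merges both at query time.

```python
from collections import Counter


class LambdaWordCount:
    """Toy Lambda Architecture over word counts (all names are hypothetical)."""

    def __init__(self) -> None:
        self.master: list[str] = []              # immutable, append-only master dataset
        self.batch_view: Counter = Counter()     # precomputed from the master dataset
        self.realtime_view: Counter = Counter()  # counts for data not yet in batch_view

    def ingest(self, record: str) -> None:
        # New data is appended to the master dataset (batch layer input) and
        # immediately folded into the realtime view (speed layer).
        self.master.append(record)
        self.realtime_view.update(record.split())

    def run_batch(self) -> None:
        # Batch layer: recompute the view from scratch over the whole master dataset.
        view: Counter = Counter()
        for record in self.master:
            view.update(record.split())
        self.batch_view = view
        # Everything ingested so far is now covered by the batch view. In this
        # sequential sketch nothing arrives during the batch run, so the speed
        # layer's state can simply be discarded.
        self.realtime_view = Counter()

    def query(self, word: str) -> int:
        # Serving layer: answer queries by merging the batch and realtime views.
        return self.batch_view[word] + self.realtime_view[word]


lam = LambdaWordCount()
lam.ingest("big data big")
lam.run_batch()
lam.ingest("data stream")
print(lam.query("big"), lam.query("data"), lam.query("stream"))  # 2 2 1
```

In a real deployment the batch job runs while new data keeps arriving, so the speed layer would expire only the portion of its state that the finished batch view has absorbed, rather than resetting everything.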

Figures (4)

The batch layer stores the master copy of the dataset and precomputes batch views (arbitrary functions) on that master dataset (Fig. 2). Hadoop is the typical example of a batch processing system. Figure 3 illustrates the main principle of the Hadoop implementation, which uses MapReduce: it divides a large problem into sub-problems, applies the same function to each sub-problem (map), and finally combines the outputs of all sub-problems (reduce).
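
The divide, map, and combine steps described above can be sketched in a few lines of plain Python. This is an in-process illustration of the MapReduce idea, not Hadoop's API: the functions `map_phase`, `shuffle`, `reduce_phase`, and `map_reduce`, the line-based splitting, and the lowercasing of words are our own assumptions. The sketch also makes explicit the shuffle step that Hadoop performs between map and reduce, grouping intermediate pairs by key.

```python
import math
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    # Map: the same function runs on every sub-problem (input split),
    # emitting an intermediate (word, 1) pair per word.
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    # Shuffle: group all intermediate values by key, as Hadoop does
    # between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine the partial results collected for one key.
    return key, sum(values)


def map_reduce(document: str, n_splits: int = 3) -> dict[str, int]:
    # Divide the large problem into sub-problems: up to n_splits contiguous chunks of lines.
    lines = document.splitlines()
    size = max(1, math.ceil(len(lines) / n_splits))
    splits = ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


print(map_reduce("Big data\nbig DATA analytics\nreal time"))
# {'big': 2, 'data': 2, 'analytics': 1, 'real': 1, 'time': 1}
```

In Hadoop the splits would be blocks of a file in HDFS processed by map tasks on different nodes, and the shuffle would move intermediate pairs across the network; here all three phases run in one process.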

Fig. 5. Part of the result of the wordcount example
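
A wordcount job like the one whose output Fig. 5 shows could also be written for Hadoop Streaming, which runs any program that reads records from standard input and writes tab-separated key/value lines to standard output as a mapper or reducer. The sketch below is an assumption, not the code used in the paper: `mapper`, `reducer`, `simulate_job`, and the `\w+` tokenization are hypothetical, and `simulate_job` only mimics locally the sort that Hadoop performs between the two phases.

```python
import re
from itertools import groupby
from typing import Iterable, Iterator

# Assumption: a "word" is any run of letters, digits or underscores, case-folded.
TOKEN = re.compile(r"\w+")


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit one "word<TAB>1" record per token, as a streaming mapper writes to stdout.
    for line in lines:
        for word in TOKEN.findall(line.lower()):
            yield f"{word}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    # Input is sorted by key, so all records for one word arrive consecutively.
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def simulate_job(lines: Iterable[str]) -> list[str]:
    # Local stand-in for the framework: map, sort (Hadoop's shuffle/sort), reduce.
    return list(reducer(sorted(mapper(lines))))


sample = ["Lambda Architecture: batch layer, speed layer", "serving layer"]
# Prints one tab-separated line per word:
# architecture 1, batch 1, lambda 1, layer 3, serving 1, speed 1
for record in simulate_job(sample):
    print(record)
```

Because Hadoop delivers the reducer's input sorted by key, the reducer can aggregate each word in a single pass with `itertools.groupby` instead of holding a dictionary of every word in memory.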

Related papers

Chandrakanth Lekkala

Leveraging Lambda Architecture for Efficient Real-Time Big Data Analytics, 2020

In this era of big data, firms struggle to process, analyze, and make sense of "big data" in real time, and therefore to extract valuable information in time for decision-making. The Lambda Architecture has become a potent framework that allows large data systems to handle both batch and real-time (streaming) data sets. This paper discusses the Lambda Architecture and its relation to up-to-date data analysis. We review the core architectural parts, namely the batch, speed, and serving layers, and show how they work together to provide a practical solution for large-scale data processing and serving. Moreover, we examine the use of the Lambda Architecture with comprehensive big data technologies, including Apache Kafka for data ingestion, Apache Hadoop for batch processing, Apache Spark for stream processing, and Apache Cassandra for serving the results. We also discuss the benefits and the issues that may arise when applying the Lambda Architecture, and provide some real examples from this area. The performance of the scheme we implemented shows that the Lambda Architecture provides a sustainable and functional system for exploiting big data for quick analytics and decision-making in all areas of life.

