DA Module2

Big Data Technologies encompass software utilities designed to analyze and process large datasets, integrating with technologies like AI and IoT. Apache Hadoop, an open-source framework, facilitates distributed storage and processing through its core modules: HDFS, YARN, and MapReduce. Data discovery and mobile business intelligence are essential components, enabling organizations to uncover insights and make data-driven decisions efficiently.

Uploaded by

pikki.pavankumar

Module 2:

Big Data Technologies

Hadoop's Parallel World, Data Discovery, Open Source Technology for Big Data Analytics, Cloud and Big Data, Predictive Analytics, Mobile Business Intelligence and Big Data.

What are Big Data Technologies?

Big data technologies are software utilities designed to analyze, process, and extract information from very large data sets and from data with extremely complex structures, which traditional data processing software struggles to handle.

Big data technologies are among the most discussed topics in technology and are closely associated with deep learning, machine learning, artificial intelligence (AI), and the Internet of Things (IoT). Combined with these technologies, big data technologies focus on analyzing and handling large amounts of both real-time data and batch data.

Hadoop's Parallel World:

Hadoop is an open-source framework from Apache for storing, processing, and analyzing very large volumes of data in parallel across a cluster of machines. It is written in Java and has been used by companies such as Yahoo, Facebook, LinkedIn, and Twitter.

Modules of Hadoop:

Apache Hadoop consists of four core modules that together provide distributed storage and processing of large datasets:
• Hadoop Common:
This module provides the essential utilities and libraries that support the other Hadoop modules. It contains the Java libraries and scripts required to start Hadoop and is used by HDFS, YARN, and MapReduce.
• Hadoop Distributed File System (HDFS):
HDFS is a distributed file system designed to store very large files across multiple machines in a cluster. It provides high-throughput access to application data and ensures fault tolerance by replicating data blocks across different nodes.
• Hadoop YARN (Yet Another Resource Negotiator):
YARN is the resource management layer of Hadoop. It is responsible for managing compute resources in the cluster and scheduling user applications. YARN separates resource management from job scheduling, allowing processing engines other than MapReduce to run on Hadoop.
• Hadoop MapReduce:
MapReduce is a programming model and processing engine for parallel processing of large datasets. It provides a framework for developing applications that process vast amounts of data in a distributed and fault-tolerant manner across a cluster of commodity hardware.

1. HDFS (Hadoop Distributed File System): HDFS was developed on the basis of Google's published paper on the Google File System (GFS). Files are broken into blocks, and the blocks are stored on different nodes across the distributed architecture.
2. YARN (Yet Another Resource Negotiator): YARN is used for job scheduling and for managing the cluster.
3. MapReduce: This framework helps Java programs perform parallel computation on data using key-value pairs. The Map task takes input data and converts it into a data set of key-value pairs. The output of the Map task is consumed by the Reduce task, and the output of the reducer gives the desired result.
4. Hadoop Common: These Java libraries are used to start Hadoop and are used by the other Hadoop modules.
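The Map, shuffle, and Reduce steps described above can be illustrated with the classic word-count example. The sketch below is a single-process Python simulation, not Hadoop's Java API: the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the `shuffle` function stands in for the grouping by key that the framework performs between the Map and Reduce tasks on a real cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all values that share a key (done by the framework in Hadoop)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine every value for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    documents = ["big data needs big clusters", "Hadoop processes big data"]
    print(word_count(documents))
    # -> {'big': 3, 'data': 2, 'needs': 1, 'clusters': 1, 'hadoop': 1, 'processes': 1}
```

On a real cluster, many map tasks would run in parallel on different HDFS blocks and the reducers would run on separate nodes; this single-process version only shows how the data flows through the key-value pairs.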

What is data discovery?

Data discovery enables an organization to identify, catalog, and classify business-critical and sensitive data so that it can be governed for meaningful purposes with greater transparency. Data discovery helps you:

• Uncover new insights and opportunities for business value creation
• Apply data protection to lower the risk of misuse and comply with privacy mandates
• Drive high-value business outcomes in operations where data is the fuel of the modern business

Data discovery provides the data intelligence an organization needs to develop new products and services, optimize data use, and protect data from risk exposure. As today's enterprises collect ever greater volumes of data, discovering and understanding that data opens up opportunities for new revenue sources.

For example, information captured from a company's consumers, such as personal preferences and transaction records, may lack the necessary transparency when it is scattered across enterprise systems. Data discovery helps automate the building of a metadata repository, using AI and machine learning to show where data is located and where it is being moved and used. This helps determine the data's value to the organization and makes it available through data democratization efforts such as a data marketplace.
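As a rough illustration of how discovery tooling might build a metadata catalog, the sketch below scans sample rows from a data source and tags columns that look like personal data. It is a minimal, rule-based sketch under stated assumptions: the regular expressions, the `ColumnEntry` record, and the `discover` function are all hypothetical, and real discovery products typically combine rules like these with the AI and machine-learning classifiers mentioned above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical classification rules: label -> pattern a value must fully match.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d -]{7,}\d"),
}


@dataclass
class ColumnEntry:
    """One entry in a (hypothetical) metadata catalog."""
    source: str
    column: str
    tags: set[str] = field(default_factory=set)


def discover(source: str, rows: list[dict[str, str]]) -> list[ColumnEntry]:
    """Scan sample rows and tag columns whose values all look like personal data."""
    catalog: list[ColumnEntry] = []
    columns = list(rows[0].keys()) if rows else []
    for column in columns:
        entry = ColumnEntry(source, column)
        values = [row.get(column) or "" for row in rows]
        non_empty = [v for v in values if v]
        for label, pattern in PATTERNS.items():
            if non_empty and all(pattern.fullmatch(v) for v in non_empty):
                entry.tags.add(label)
        if entry.tags:
            entry.tags.add("sensitive")  # flag the column for data-protection policies
        catalog.append(entry)
    return catalog


if __name__ == "__main__":
    sample = [
        {"name": "Asha", "email": "asha@example.com", "phone": "+91 98765 43210"},
        {"name": "Ravi", "email": "ravi@example.org", "phone": "080-1234-5678"},
    ]
    for entry in discover("crm.customers", sample):
        print(entry.source, entry.column, sorted(entry.tags))
```

Each catalog entry records where a column lives and how it was classified, which is the kind of metadata a data marketplace or governance tool could then expose.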

Open Source technology for Big Data Analytics

Apache Hadoop is a collection of open-source software utilities that facilitates using a
network of many computers to solve problems involving massive amounts of data and
computation. It provides a software framework for distributed storage and processing of big
data using the MapReduce programming model. Hadoop was originally designed for
computer clusters built from commodity hardware, which is still the common use. It has since
also found use on clusters of higher-end hardware. All the modules in Hadoop are designed
with a fundamental assumption that hardware failures are common occurrences and should be
automatically handled by the framework.

A Brief History of Apache Hadoop:

Apache Hadoop is a free, Java-based software framework for big data analytics. It stores vast amounts of data efficiently across a cluster, a group of connected computers (nodes). The special feature of the framework is that it runs in parallel on the cluster and can process huge amounts of data across all of its nodes. Its storage system, popularly known as the Hadoop Distributed File System (HDFS), splits large volumes of data and distributes them across the nodes of the cluster. HDFS also replicates data within the cluster, providing high availability and recovery from failure, which increases fault tolerance.
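The splitting and replication described above can be sketched in a few lines. The constants reflect HDFS's default block size (128 MB in Hadoop 2.x and later) and its default replication factor of 3, but the placement function is a deliberate simplification: it spreads each block's replicas round-robin over distinct nodes, whereas real HDFS uses a rack-aware placement policy. The function names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size (dfs.blocksize), in bytes
REPLICATION = 3                  # HDFS default replication factor (dfs.replication)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> list[list[str]]:
    """Simplified placement: each block's replicas go to distinct nodes, round-robin.
    (Real HDFS uses a rack-aware policy instead.)"""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return [[nodes[(b + r) % len(nodes)] for r in range(replication)]
            for b in range(num_blocks)]


def readable_blocks(placement: list[list[str]], failed: set[str]) -> int:
    """Count blocks that still have at least one replica on a live node."""
    return sum(1 for replicas in placement if any(n not in failed for n in replicas))


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(len(blocks), nodes)
    print([size // (1024 * 1024) for size in blocks])          # -> [128, 128, 44]
    print(readable_blocks(placement, failed={"node1", "node2"}))  # -> 3
```

Because every block has three copies on different nodes, losing any two nodes in this example still leaves every block readable, which is the fault tolerance the paragraph describes.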

Cloud and Big Data:

Cloud Computing: Cloud computing is the on-demand delivery of resources such as servers, databases, networking, software, analytics, applications, and computational power over the Internet, offering speed, flexibility, and economies of scale. It helps lower operational costs and can improve reliability. Vast amounts of computing resources can be provisioned within minutes or even less.

Big Data Analytics: Big data analytics is the process of finding complex patterns and relationships within large volumes of varied data (big data) and using that analysis to make informed and effective business decisions. Large data sets are analyzed to draw conclusions about them. Below is a table of differences between Cloud Computing and Big Data Analytics:
Data Analytics: Data analytics is the process of deducing logical sets and patterns from raw data by filtering it and applying the required transformations and models. Following these steps helps analysts explore the behavioral patterns in the data and draw the necessary conclusions.
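The filter, transform, and model steps above can be illustrated with a small summary over hypothetical sales records. The record fields, the `analyze` function, and the choice of a per-region average as the "model" are assumptions made purely for illustration; the tools listed below offer far richer versions of each step.

```python
from collections import defaultdict
from statistics import mean


def analyze(records: list[dict]) -> dict[str, float]:
    """Filter, transform, and summarise raw sales records (hypothetical schema)."""
    # 1. Filter: discard records whose amount is missing.
    valid = [r for r in records if r.get("amount") is not None]
    # 2. Transform: normalise region names and convert amounts to numbers.
    by_region: dict[str, list[float]] = defaultdict(list)
    for r in valid:
        by_region[r["region"].strip().title()].append(float(r["amount"]))
    # 3. Model: summarise each group by its mean, a simple pattern to compare regions.
    return {region: mean(amounts) for region, amounts in by_region.items()}


if __name__ == "__main__":
    raw = [
        {"region": "north ", "amount": "120"},
        {"region": "North", "amount": "80"},
        {"region": "south", "amount": None},
        {"region": "South", "amount": "50"},
    ]
    print(analyze(raw))  # -> {'North': 100.0, 'South': 50.0}
```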

The top tools available for data analytics in the market are R Programming, Python, SAS,
Tableau Public, KNIME, Apache Spark, Excel, QlikView, and OpenRefine.

Predictive Analytics:

Predictive analytics makes predictions about future outcomes by studying current and past data trends. It uses data modeling, data mining, machine learning, and deep learning algorithms to extract the required information from data and project behavioral patterns into the future.
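One of the simplest ways to project a past trend into the future is to fit an ordinary least-squares straight line to historical values and extend it. The sketch below is a minimal illustration of that idea, not the method used by any particular predictive analytics product; `fit_line` and `forecast` are hypothetical names, and the model assumes the trend is roughly linear.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def forecast(history: list[float], steps_ahead: int = 1) -> list[float]:
    """Fit a line to past values (time = 0, 1, 2, ...) and extend it into the future."""
    xs = list(range(len(history)))
    slope, intercept = fit_line(xs, history)
    return [intercept + slope * t
            for t in range(len(history), len(history) + steps_ahead)]


if __name__ == "__main__":
    monthly_sales = [10, 12, 14, 16]  # hypothetical past observations
    print(forecast(monthly_sales, steps_ahead=2))  # -> [18.0, 20.0]
```

Real predictive models add seasonality, uncertainty estimates, and validation on held-out data; a straight-line fit only shows the core idea of learning from past data to predict future values.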

Some industry tools used for Predictive analytics are Periscope Data, Google AI Platform,
SAP Predictive Analytics, Anaconda, Microsoft Azure, Rapid Insight Veera and KNIME
Analytics Platform.

Mobile Business Intelligence and Big Data:

Mobile Business Intelligence (BI) is the ability to access and perform BI-related data analysis on mobile phones and tablets. It helps users make data-driven decisions wherever they are. Mobile BI is different from big data, which is a term for large and complex data sets that require advanced tools and techniques to process and analyze.

Since the introduction of business intelligence software, managers and executives have traditionally accessed the information they need on desktop and laptop computers. As the use of mobile computing devices, including Internet-capable mobile phones, has increased, business intelligence applications have been developed for these devices. Mobile business intelligence applications give users access to the software that stores the information they need.

Need for mobile BI:

Mobile phones' data storage capacity has grown along with their use. In today's fast-paced environment, you are expected to make decisions and act quickly, and the number of businesses relying on mobile BI for support in such situations is growing by the day.

Mobile BI can help expand a business or boost its productivity, and it works for both small and large businesses. It can help whether you are a salesperson or a CEO. Demand for mobile BI is high because it reduces the time spent retrieving information and frees that time for quick decision-making.

Advantages of mobile BI

1. Simple access

Mobile BI is not restricted to a single mobile device or a particular location. You can view your data at any time and from anywhere. Real-time visibility into the firm improves production and the daily efficiency of the business, and getting an overview of the company with a single click simplifies the process.

2. Competitive advantage

Many firms are seeking better and more responsive ways of doing business to stay ahead of the competition. Easy access to real-time data improves business opportunities and can increase sales and capital. It also helps in making the necessary decisions as market conditions change.

3. Simple decision-making

As stated above, mobile BI provides access to real-time data at any time and from any location. Because it delivers information on demand, users can obtain what they need at the moment they need it. As a result, decisions are made quickly.

4. Increased productivity

By extending BI to mobile devices, the organization's teams can access critical company data when they need it. Obtaining corporate data with a single click frees up a significant amount of time to focus on the smooth and efficient operation of the firm. Increased productivity results in a smooth, fast-running firm.

Disadvantages of mobile BI

1. Stack of data: The primary function of mobile BI is to store data systematically and present it to the user as required. As a result, mobile BI stores all of the information and ends up with heaps of older data. The corporation needs only a small portion of this historical data, but it must store all of it, which results in a growing stack of data.

2. Expensive
Mobile BI can be quite costly at times. Large corporations can afford to keep paying for these expensive services, but small businesses often cannot. Beyond the cost of mobile BI itself, organizations must also consider the cost of the IT staff needed to keep BI running smoothly, as well as the hardware costs involved.
