UNIT-1 - Big Data and Hadoop
CONCEPTS OF BIG DATA: Concept of Big Data Platform – Evolution and Challenges of
Conventional Systems - Intelligent data analysis – Nature of Data - Analytic Processes and
Tools - Analysis vs Reporting - Modern Data Analytic Tools- Applications of big data.
Questions:
Big data is a term that is used to describe data that is high volume, high velocity, and/or high
variety; requires new technologies and techniques to capture, store, and analyze it; and is used to
enhance decision making, provide insight and discovery, and support and optimize processes.
There are three dimensions to big data known as Volume, Variety and Velocity.
Characteristics of Big Data: initially only 3 Vs were defined; a 4th V was added later.
1. Volume (Scale) – Data volume is increasing exponentially: a 44x increase from 2009 to 2020, from 0.8 zettabytes to 35 ZB.
2. Velocity (Speed) – Data is being generated fast and needs to be processed fast (online data analytics); late decisions mean missed opportunities. Examples:
• E-Promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you.
• Healthcare monitoring: sensors monitoring your activities and body; any abnormal measurement requires an immediate reaction.
3. Variety (Complexity) – Various formats, types, and structures: text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc., as well as static data vs. streaming data. A single application can generate and collect many types of data, and to extract knowledge all these types of data need to be linked together.
4. Variability – This refers to the inconsistency which can be shown by the data at times, thus hampering the process of handling and managing the data effectively.
1 TB = 1024 GB
1 ZB (zettabyte) = 1024 EB
1 YB (yottabyte) = 1024 ZB
Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to
many petabytes of data.[
Big Data is a complex set of data: a collection of huge volumes of data that traditional tools struggle to store and process.
“Big Data” is a relative term depending on who is discussing it. Big Data to Amazon or Google
is very different than Big Data to a medium-sized insurance organization, but no less “Big” in
the minds of those contending with it.
Such foundational steps to the modern conception of Big Data involve the development of
computers, smart phones, the internet, and sensory (Internet of Things) equipment to provide
data. Credit cards also played a role, by providing increasingly large amounts of data, and
certainly social media changed the nature of data volumes in novel and still developing ways.
The evolution of modern technology is interwoven with the evolution of Big Data.
In its true essence, Big Data is not something that is completely new or only of the last two
decades. Over the course of centuries, people have been trying to use data analysis and analytics
techniques to support their decision-making process. The ancient Egyptians around 300 BC
already tried to capture all existing ‘data’ in the library of Alexandria. Moreover, the Roman
Empire used to carefully analyze statistics of their military to determine the optimal distribution
for their armies.
However, in the last two decades, the volume and speed with which data is generated has
changed – beyond measures of human comprehension. The total amount of data in the world was
4.4 zettabytes in 2013. That is set to rise steeply to 44 zettabytes by 2020. To put that in
perspective, 44 zettabytes is equivalent to 44 trillion gigabytes. Even with the most advanced
technologies today, it is impossible to analyze all this data. The need to process these
increasingly larger (and unstructured) data sets is how traditional data analysis transformed into
‘Big Data’ in the last decade.
To illustrate this development over time, the evolution of Big Data can roughly be sub-divided
into three main phases. Each phase has its own characteristics and capabilities. In order to
understand the context of Big Data today, it is important to understand how each phase
contributed to the contemporary meaning of Big Data.
Database management and data warehousing are considered the core components of Big Data
Phase 1. These provide the foundation of modern data analysis as we know it today, using well-known
techniques such as database queries, online analytical processing and standard reporting
tools.
In Big Data Phase 2, since the early 2000s, the Internet and the Web began to offer unique data collections and data
analysis opportunities. With the expansion of web traffic and online stores, companies such as
Yahoo, Amazon and eBay started to analyze customer behaviour by analyzing click-rates, IP
specific location data and search logs. This opened a whole new world of possibilities.
From a data analysis, data analytics, and Big Data point of view, HTTP-based web traffic
introduced a massive increase in semi-structured and unstructured data. Besides the standard
structured data types, organizations now needed to find new approaches and storage solutions to
deal with these new data types in order to analyze them effectively. The arrival and growth of
social media data greatly aggravated the need for tools, technologies and analytics techniques
that were able to extract meaningful information out of this unstructured data.
In Big Data Phase 3, mobile devices not only give the possibility to analyze behavioral data (such as clicks and search
queries), but also give the possibility to store and analyze location-based data (GPS-data). With
the advancement of these mobile devices, it is possible to track movement, analyze physical
behaviour and even health-related data (number of steps you take per day). This data provides a
whole new range of opportunities, from transportation, to city design and health care.
In summary: Phase 1 centered on database management and data warehousing, Phase 2 on web-based and social media data, and Phase 3 on mobile and sensor-based (IoT) data.
Q 3. Write the different types of data. Give examples of file types that fall under the different types.
Types of Data
Generally, Big Data consists of unstructured data.
• Structured Data
Structured data concerns all data which can be stored in a SQL database, in tables with rows and columns. It has relational keys and can easily be mapped into pre-designed fields. Structured data is highly organized information that uploads neatly into a relational database. Structured data is relatively simple to enter, store, query, and analyze, but it must be strictly defined in terms of field name and type.
• Unstructured Data
Unstructured data may have its own internal structure, but does not conform neatly into a
spreadsheet or database.
The fundamental challenge of unstructured data sources is that they are difficult for nontechnical
business users and data analysts alike to unbox, understand, and prepare for analytic use.
• Photographs and video: This includes security, surveillance, and traffic video.
• Website content: This comes from any site delivering unstructured content, like YouTube, Flickr, or Instagram.
Examples of semi-structured data: CSV, XML, and JSON (JavaScript Object Notation) documents are semi-structured, and NoSQL databases are also considered semi-structured.
Semi-structured data is a form of structured data that does not conform to the formal structure
of data models associated with relational databases or other forms of data tables, but nonetheless
contains tags or other markers to separate semantic elements and enforce hierarchies of records
and fields within the data.
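As an illustrative sketch (the record and its field names are invented for illustration, not taken from these notes), the short Python snippet below shows how a JSON document uses tags (keys) and nesting to separate semantic elements and enforce a hierarchy without a fixed relational schema:

```python
import json

# A hypothetical customer record as semi-structured JSON: keys act as tags that
# separate semantic elements, nesting enforces a hierarchy, but there is no
# fixed relational schema -- "orders" may contain any number of entries.
record = """
{
  "customer_id": 101,
  "name": "Asha",
  "orders": [
    {"item": "laptop", "price": 55000},
    {"item": "mouse",  "price": 700}
  ]
}
"""

data = json.loads(record)                 # parse the document into a dict
print(data["name"], len(data["orders"]))  # -> Asha 2
```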
Q 4: What do you mean by Big data Analytics?
Artificial Intelligence (AI), mobile, social and Internet of Things (IoT) are driving data
complexity, new forms and sources of data. Big data analytics is the use of advanced analytic
techniques against very large, diverse data sets that include structured, semi-structured and
unstructured data, from different sources, and in different sizes from terabytes to zettabytes.
Big data is a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process with low latency. It has one or more of the following characteristics: high volume, high velocity, or high variety. Big data comes from sensors, devices, video/audio, networks, log files, transactional applications, the web, and social media, much of it generated in real time and at a very large scale.
Analyzing big data allows analysts, researchers, and business users to make better and faster
decisions using data that was previously inaccessible or unusable. Using advanced analytics
techniques such as text analytics, machine learning, predictive analytics, data mining, statistics,
and natural language processing, businesses can analyze previously untapped data sources independently of or together with their existing enterprise data to gain new insights, resulting in better and faster decisions.
• Analytics has, in a sense, been around since 1663, when John Graunt dealt with
“overwhelming amounts of information,” using statistics to study the bubonic plague. In
2017, 2,800 experienced professionals who worked with Business Intelligence were
surveyed, and they predicted Data Discovery and Data Visualization will become an
important trend. Data Visualization is a form of visual communication (think
infographics). It describes information which has been translated into schematic format,
and includes changes, variables, and fluctuations. A human brain can process visual
patterns very efficiently.
• Visualization models are steadily becoming more popular as an important method for gaining insights from Big Data. (Graphics are common, and animation will become common. At present, data visualization models are a little clumsy and could use some improvement.) A number of businesses now offer Big Data visualization models.
Big Data is revolutionizing entire industries and changing human culture and behavior. It is a
result of the information age and is changing how people exercise, create music, and work. The
following provides some examples of Big Data use.
• Big Data is being used in healthcare to map disease outbreaks and test alternative treatments.
• NASA uses Big Data to explore the universe.
• The music industry replaces intuition with Big Data studies.
• Utilities use Big Data to study customer behavior and avoid blackouts.
• Nike uses health-monitoring wearables to track customers and provide feedback on their health.
• Big Data is being used in cybersecurity to stop cybercrime.
Q 6. What are the different features provided by big data platforms? List
four big data platforms.
A big data platform is a type of IT solution that combines the features and capabilities of several big data applications and utilities within a single solution. It is an enterprise-class IT platform that enables organizations to develop, deploy, operate and manage a big data infrastructure/environment.
A big data platform generally consists of big data storage, servers, databases, big data management, business intelligence and other big data management utilities. It also supports custom development, querying and integration with other systems. The primary benefit of a big data platform is to reduce the complexity of multiple vendors/solutions into one cohesive solution. Big data platforms are also delivered through the cloud, where the provider offers an all-inclusive big data solution and services. Examples of big data platforms include Apache Hadoop, Cloudera, Hortonworks Data Platform, and Microsoft Azure HDInsight.
Traditional (conventional) systems are characterized by:
• Clearly defined fields organized in records. Records are usually stored in tables. Fields have names, and relationships are defined between different fields.
• Schema-on-write, which requires data to be validated against a schema before it can be written to disk. A significant amount of requirements analysis, design, and effort up front can be involved in putting the data into clearly defined structured formats. This can increase the time before business value can be realized from the data.
• A design that gets data from the disk and loads it into memory to be processed by applications. This is an extremely inefficient architecture when processing large volumes of data: the data is extremely large and the programs are small, so the big component must move to the small component for processing.
• The use of Structured Query Language (SQL) for managing and accessing the data.
• Relational and warehouse database systems that often read data in 8k or 16k block sizes. These block sizes load data into memory, and then the data are processed by applications. When processing large volumes of data, reading the data in these block sizes is extremely inefficient.
• Organizations today contain large volumes of information that is not actionable or being
leveraged for the information it contains.
• An order management system is designed to take orders. A web application is designed for
operational efficiency. A customer system is designed to manage information on customers.
Data from these systems usually reside in separate data silos. However, bringing this
information together and correlating with other data can help establish detailed patterns on
customers.
• In a number of traditional siloed environments, data scientists can spend 80% of their time looking for the right data and only 20% of their time doing analytics. A data-driven environment must have data scientists spending far more of their time doing analytics.
Google realized that if it wanted to be able to rank the Internet, it had to design a new way
of solving the problem. It started with looking at what was needed:
• Inexpensive storage that could store massive amounts of data cost effectively
• To scale cost effectively as the data volume continued to increase
• To analyze these large data volumes very fast
• To be able to correlate semi-structured and unstructured data with existing structured data
• To work with unstructured data that had many forms that could change frequently; for example, data structures from organizations such as Twitter can change regularly
• Inexpensive storage. The most inexpensive storage is local storage from off-the-shelf disks.
•A data platform that could handle large volumes of data and be linearly scalable at cost and
performance.
• A highly parallel processing model that was highly distributed to access and compute the data
very fast.
• A data repository that could break down the silos and store structured, semi-structured, and
unstructured data to make it easy to correlate and analyze the data together.
Traditional systems vs. Big Data systems:
• Traditional systems are designed from the ground up to work with data that is primarily structured, with clearly defined fields organized in records; records are usually stored in tables, fields have names, and relationships are defined between different fields. Big Data systems, in contrast, contain a data repository that can break down the silos (a data silo is a repository of fixed data that remains under the control of one department and is isolated from the rest of the organization) and store structured, semi-structured, and unstructured data, making it easy to correlate and analyze the data together.
• Traditional systems give lower accuracy in data analytics: because data is so expensive to store, data is filtered and large volumes are thrown out because of the cost of storage. Minimizing the data to be analyzed reduces the accuracy and confidence of the results and also limits an organization's ability to identify business opportunities. Big Data systems give higher accuracy of data analysis: because of the relatively low cost of storage in Hadoop, the detailed records are stored in Hadoop's storage system HDFS, and traditional data can then be analyzed with nontraditional data in Hadoop to find correlation points that provide much higher accuracy of data analysis.
A robust Big Data architecture saves the company money, helps it predict future trends, and improves decision making. The main components of a Big Data architecture are described below.
1. Data Sources
Data sources govern Big Data architecture. It involves all those sources from where the data
extraction pipeline gets built. Data Sources are the starting point of the big data pipeline.
Data arrives through multiple sources including relational databases, sensors, company
servers, IoT devices, static files generated from apps such as Windows logs, third-party data
providers, etc. This data can be batch data or real-time data. Big Data architecture is designed
in such a way that it handles this vast amount of data.
2. Data Storage
Data Storage is the receiving end for Big Data. Data Storage receives data of varying formats
from multiple data sources and stores them. It even changes the format of the data received
from data sources depending on the system requirements. For example, Big Data architecture
stores unstructured data in distributed file storage systems like HDFS or NoSQL database. It
stores structured data in RDBMS.
4. Batch Processing
The architecture requires a batch processing system for filtering, aggregating, and processing
data which is huge in size for advanced analytics. These are generally long-running batch
jobs that involve reading the data from the data storage, processing it, and writing outputs to
the new files. The most commonly used solution for Batch Processing is Apache Hadoop.
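The notes do not prescribe any particular code, but the MapReduce pattern behind Hadoop batch jobs can be sketched locally in a few lines of Python; on a real cluster the map and reduce steps would run as distributed tasks over files in HDFS rather than over an in-memory list:

```python
# A minimal local simulation of the MapReduce batch pattern (word count).
from itertools import groupby
from operator import itemgetter

lines = ["big data needs batch processing", "batch jobs read big files"]

# Map: emit (word, 1) pairs for every word
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the pairs by key (Hadoop does this between map and reduce)
mapped.sort(key=itemgetter(0))

# Reduce: sum the counts for each word
counts = {word: sum(count for _, count in group)
          for word, group in groupby(mapped, key=itemgetter(0))}

print(counts)  # e.g. {'batch': 2, 'big': 2, 'data': 1, ...}
```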
5. Stream Processing
There is only a slight difference between stream processing and real-time message ingestion. Stream processing handles all streaming data, which occurs in windows or streams, and then writes the data to the output sink. Common tools include Apache Spark, Apache Storm, Apache Flink, etc.
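A minimal PySpark Structured Streaming sketch is shown below, assuming a local Spark installation and a text source on localhost port 9999 (for example one started with `nc -lk 9999`); it counts words over the incoming stream and prints running totals:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamWordCount").getOrCreate()

# Read an unbounded stream of lines from the socket source
lines = spark.readStream.format("socket") \
    .option("host", "localhost").option("port", 9999).load()

# Count words over the stream and write the running totals to the console sink
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```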
6. Analytical Data Store
After processing the data, we need to bring it to one place so that we can accomplish an analysis of the entire data set. The analytical data store is important as it stores all our processed data in one place, making analysis comprehensive. It is optimized mainly for analysis
rather than transactions. It can be a relational database or cloud-based data warehouse
depending on our needs.
8. Orchestration
Moving data through these systems requires orchestration in some form of automation.
Ingesting data, transforming the data, moving data in batches and stream processes, then
loading it to an analytical data store, and then analyzing it to derive insights must be in a
repeatable workflow. This allows us to continuously gain insights from our big data.
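As one possible illustration, the sketch below expresses such a repeatable ingest–transform–load workflow as an Apache Airflow DAG; Airflow is not mentioned in these notes, and the task functions are hypothetical placeholders:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():    print("pull raw data from the sources")
def transform(): print("clean and aggregate the batch")
def load():      print("load results into the analytical data store")

with DAG(dag_id="big_data_pipeline",
         start_date=datetime(2024, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    t1 = PythonOperator(task_id="ingest", python_callable=ingest)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3   # ingest, then transform, then load -- every day
```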
2. Scaling
Big Data architecture must be designed in such a way that it can scale up when the need
arises. Otherwise, the system performance can degrade significantly.
3. Security
Data Security is the most crucial part. It is the biggest challenge while dealing with big data.
Hackers and fraudsters may try to add their own fake data or skim companies' data for sensitive information. Cybercriminals could easily mine company data if companies do not encrypt the data, secure the perimeters, and work to anonymize the data to remove sensitive information.
1. Reducing costs: Big data technologies such as Apache Hadoop significantly reduce
storage costs.
2. Improve decision making: The use of Big data architecture streaming component
enables companies to make decisions in real-time.
3. Future trends prediction: Big Data analytics helps companies to predict future
trends by analyzing big data from multiple sources.
4. Creating new products: Companies can understand customers' requirements by analyzing customers' previous purchases and create new products accordingly.
There are four types of data analysis:
1. Prescriptive – This type of analysis reveals what actions should be taken. This is the most valuable kind of analysis and usually results in rules and recommendations for next steps.
2. Predictive – An analysis of likely scenarios of what might happen. The deliverables are usually a predictive forecast. Ex: weather forecast, share price forecast, exit poll.
3. Diagnostic – A look at past data to determine why something happened. The result of this analysis is often an analytic dashboard. Ex: finding the reasons for winning an election by analyzing social media.
4. Descriptive – What is happening now, based on incoming data.
Reporting vs. Analysis:
1. Definition – Reporting is the process of organizing data into informational summaries in order to monitor how different areas of a business are performing. Analysis is the process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.
2. Translation/Transformation – Reporting translates raw data into information. Analysis transforms data and information into insights.
6. Types – There are three main types of reporting: canned reports, dashboards, and alerts. Analysis has two main types: ad hoc responses and analysis presentations.
Dimensions of data quality:
• Accuracy – The data was recorded correctly.
• Completeness – All relevant data was recorded.
• Uniqueness – Entities are recorded once.
• Timeliness – The data is kept up to date.
• Consistency – The data agrees with itself.
It is becoming easier for enterprises to store and acquire the large amounts of data. These data
sets can facilitate improved decision making, richer analytics, and increasingly, provide training
data for Machine Learning. However, data quality remains a major concern, and dirty data
can lead to incorrect decisions and unreliable analysis. Examples of common errors include
missing values, typos, mixed formats, replicated entries of the same real-world entity, outliers
and violations of business rules. Analysts must consider the effects of dirty data before making
any decisions, and as a result, data cleaning has been a key phase of data analytics.
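A small pandas sketch of such basic data-quality checks on a made-up customer table (missing values, duplicated entities, and mixed formats) might look like this:

```python
import pandas as pd

# Hypothetical dirty data: a missing city, a duplicated customer, mixed casing
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "city":        ["Durg", "durg ", None, "Raipur"],
    "salary":      [42000, 51000, 51000, None],
})

print(df.isna().sum())                      # completeness: missing values per column
print(df.duplicated("customer_id").sum())   # uniqueness: repeated entities

df["city"] = df["city"].str.strip().str.title()     # consistency: normalize mixed formats
clean = df.drop_duplicates("customer_id").dropna()  # one simple cleaning policy
print(clean)
```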
In statistics, an outlier is an observation point that is distant from other observations. Outliers
are sometimes excluded from the data set. For example: a person’s data with height 7.2”.
One of the key differentiating factors is how to define data error (i.e., error detection).
Quantitative techniques, largely used for outlier detection, employ statistical methods to
identify abnormal behaviors and errors (e.g., “a salary that is three standard deviations away from the mean salary is an error”). On the other hand, qualitative techniques use constraints, rules, and patterns to detect errors (e.g., “there cannot exist two employees at the same level where the one located in Raipur earns less than the one not located in Raipur”).
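The quantitative three-standard-deviation rule quoted above can be sketched with NumPy on synthetic salary data (the numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
salaries = rng.normal(loc=45000, scale=5000, size=200)  # typical salaries
salaries = np.append(salaries, 400000)                   # one injected error

# Flag any value more than three standard deviations from the mean
mean, std = salaries.mean(), salaries.std()
outliers = salaries[np.abs(salaries - mean) > 3 * std]
print(outliers)  # only the 400000 record is flagged
```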
Intelligent data analysis is the use of statistical, pattern recognition, machine learning, data abstraction, and visualization tools for the analysis of data and discovery of the mechanisms that created the data.
Intelligent Data Analysis (IDA) is an interdisciplinary study concerned with the effective
analysis of data. IDA draws the techniques from diverse fields, including artificial intelligence,
databases, high-performance computing, pattern recognition, and statistics.
Intelligent data analysis mimics a human being and his/her intelligence in the analysis of complex datasets. It is a way of data analysis based on artificial intelligence, using methods that enable finding information and knowledge for a particular data domain.
IDA finds rules and knowledge in data; that is to say, it extracts value from data. Though IDA algorithms are too numerous to count exactly, they can be summarized by means of their developing trends, which are (a) the algorithm principle, (b) the scale of the dataset, and (c) the type of the dataset.
Analysis is separating a whole into its parts, studying the parts individually and their relationships with one another.
For example, if we have a whole data set and we are doing analysis on it, we pull a sample data set from the whole data and then learn more about it and how it is related to the other samples.
Analysis is a way to interpret the data and derive meaningful insights from the data. Essentially,
you may use the analytical tools such as Microsoft Excel to plot the graph, pivot, chart to delve
into the subject of interest. Let’s take a very simple example: Your executive wants to know,
“Who are the top 10 salesforce folks who exceeded the targets this year in U.S. region?”. Well,
you can extract the U.S. sales data from the tool and sort it by descending order to arrive at the
top 10 (see the sketch below). Your leadership team might think of surprise gift vouchers for them as a token of their hard work and determination!
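A minimal pandas sketch of that kind of analysis, with invented column names and figures, would filter the U.S. rows and sort by sales:

```python
import pandas as pd

# Hypothetical sales table; in practice this would be extracted from the BI tool
sales = pd.DataFrame({
    "rep":    ["Asha", "Bob", "Carla", "Dev"],
    "region": ["US", "US", "EU", "US"],
    "sales":  [120000, 98000, 150000, 87000],
})

top_us = (sales[sales["region"] == "US"]       # keep only the U.S. region
          .sort_values("sales", ascending=False)
          .head(10))                           # top 10 once more rows exist
print(top_us)
```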
Analytics: This also holds true in deriving meaningful insights from the data. The difference is,
analytics involves statistical tools & techniques with business acumen to bring out the hidden
patterns, stories from the data. I would say analysis is a sub-set of analytics whereas the latter
involves some complex techniques to solve the problem. Ex: Google recommends search
ideas when you start typing your keywords. Let’s say, you want to know “how to make a
website”. Google has the search data from your country’s demographics who had already
searched about the similar keywords. Using machine learning algorithm in real-time, your search
query is suggested by the search engine before you complete the keywords!
• Data analytics and data analysis tend to be used interchangeably. Data analysis refers to the
process of examining in close detail the components of a given data set – separating them
out and studying the parts individually and their relationship between one another. Data
analytics, on the other hand, is a broader term referring to a discipline that encompasses
the complete management of data – including collecting, cleaning, organizing, storing,
governing, and analyzing data – as well as the tools and techniques used to do so. So,
data analysis is a process, whereas data analytics is an overarching discipline (which
includes data analysis as a necessary subcomponent). Both data analytics and data
analysis are used to uncover patterns, trends, and anomalies lying within data, and
thereby deliver the insights businesses need to enable evidence-based decision making.
Where they differ, however, is in their approach to data – to put this simply, data analysis
looks at the past, while data analytics tries to predict the future.
• Essentially, the primary difference between analytics and analysis is a matter of scale, as
data analytics is a broader term of which data analysis is a subcomponent. Data analysis
refers to the process of examining, transforming and arranging a given data set in specific
ways in order to study its individual parts and extract useful information. Data analytics
is an overarching science or discipline that encompasses the complete management of
data. This not only includes analysis, but also data collection, organisation, storage, and
all the tools and techniques used.
• It’s the role of the data analyst to collect, analyse, and translate data into information that’s
accessible. By identifying trends and patterns, analysts help organisations make better
business decisions. Their ability to describe, predict, and improve performance has
placed them in increasingly high demand globally and across industries.
Q 17. What are the Top 10 Big Data Tools for Analysis?
“ Information is the oil of the 21st century, and analytics is the combustion engine. ” –
Peter Sondergaard
In the old days, people generally traveled using a horse cart or bullock cart. But it is not feasible to use such carts in today's world. Right? And why? Because of the growing population, and because the time required to travel by horse or bullock cart is high.
Similarly, in technology world, data is generated at a high rate and it is impossible to store these
massive amounts of data in a traditional way. Thus there is a need for some efficient, modern
and feasible way for the storage of such a large amount of data.
Big data tools for analysis are used to solve the problem of handling and managing data. These
tools perform data analysis tasks in a way that is both time- and cost-effective. Also, these tools help in exploring business insights and enhance the effectiveness of a business.
1. Tableau
The primary objective of Tableau is to focus on business intelligence. It is a highly efficient data visualization tool. In Tableau, users do not have to write a program in order to create maps, charts, etc. For live data in visualizations, Tableau provides a web connector to connect to a database or API.
Features of Tableau :
• Tableau provides a central location to delete, manage schedules and tag, and change
permissions.
• It does not require complicated software setup.
• Real-time collaboration is available.
• Without any integration cost, it can blend various datasets like relational datasets,
structured datasets, etc.
Spins up and terminates clusters, and only pays for what is needed.
3. Teradata
Teradata is a tool used for developing large-scale data warehousing applications. It is a well-known relational database management system. It generally offers end-to-end solutions for data warehousing. Its development is based on the MPP (Massively Parallel Processing) architecture.
Features of Teradata :
• Teradata can connect network-attached systems or mainframes.
• Its significant components are a node, parsing engine, the message passing layer, and the
access module processor (AMP).
• It is highly scalable.
• It supports industry-standard SQL in order to interact with the data.
4. R – Programming
R Programming language is used for statistical computing, graphics and for big data analysis. It
provides a wide variety of statistical tests.
Features of R programming tool:
• R programming tools provide an effective data handling and storage facility.
• It provides a coherent and integrated collection of big data tools for data analysis.
• It also provides graphical facilities for data analysis which display either on-screen or on hardcopy.
5. Spark
Apache Spark is one of the most powerful open-source big data analytics tools. It is used by
many organizations to process large datasets. It offers high-level operators that make it easy to
build parallel apps.
Features of Spark:
• It offers Fast Processing
• Has the ability to integrate with Hadoop and existing Hadoop Data
• Using Spark, an application can run in a Hadoop cluster up to 100 times faster in memory and ten times faster on disk (see the sketch below).
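For illustration, a minimal PySpark batch job is sketched below (the input path and column names are assumptions); the same code can run locally or, submitted with spark-submit, in parallel across a Hadoop/YARN cluster:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("SalesByRegion").getOrCreate()

# Hypothetical CSV of orders stored in HDFS
orders = spark.read.csv("hdfs:///data/orders.csv", header=True, inferSchema=True)

# Aggregate total sales per region and show the largest first
totals = orders.groupBy("region").agg(F.sum("amount").alias("total_sales"))
totals.orderBy(F.desc("total_sales")).show()
```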
6. Lumify
Lumify is a platform that involves big data fusion, analysis, and visualization. It is a free and
open source tool for analytics. It supports the cloud-based environment and also works well with
Amazon’s AWS.
Features of Lumify:
• Lumify’s primary features include the full-text search, 2D and 3D graph visualizations, link
analysis between graph entities, automatic layout, integration with mapping systems,
geospatial layouts, multimedia analysis, real-time collaboration through a set of projects or
workspaces.
• It is usually built on proven, scalable big data technologies.
• It is secure, scalable, and supported by a dedicated full-time development team.
7. Talend
Talend simplifies and automates big data integration. Its graphical wizard generates native code. It also allows big data integration, data quality checking, and master data management.
Features of Talend:
• Talend Big Data Platform generates native code which simplifies using MapReduce and
Spark.
• It accelerates time to value for big data projects.
• It also simplifies ETL & ELT for big data.
8. Microsoft HDInsight
Azure HDInsight is a Spark and Hadoop service in the cloud. Standard and Premium are the two
data cloud offerings provided by Azure HDInsight. For running the Big data workloads of the
organization it also provides an enterprise-scale cluster.
Features of HDInsight:
• Offers enterprise-grade security and monitoring.
• Protects data assets and extends on-premises security and governance controls to the cloud.
• Provides a high-productivity platform for developers and scientists.
9. Skytree
Skytree is a big data analytics tool that helps data scientists to build more accurate models faster.
It also offers accurate predictive machine learning models that are easy to use.
Features of Skytree:
• Helps to develop Highly Scalable Algorithms.
• Allows data scientists to visualize and understand the logic behind Machine Learning
decisions.
• Solves robust predictive problems with data preparation capabilities.
10. Pentaho
Pentaho is software that can access, prepare and analyze any data from any source. It is a popular choice for data integration, orchestration, and business analytics platforms. The main motto of this tool is to turn big data analytics into big insights.
Features of Pentaho:
• Pentaho generally supports a wide range of big data sources.
• No coding is required, and it can deliver the data effortlessly to your business.
• It generally permits checking data with easy access to analytics, like charts, visualizations, etc.
• It can also access and integrate data for data visualization effectively.
Q 18. What are the advantages of Big Data?
There is no denying the fact that in less than a decade, Big Data has become a multi-billion-dollar industry. Today, the Big Data revolution has arrived with the growth of the internet, wireless networks, smartphones, social media and other technologies.
We can define Big data as a very large dataset that can be analyzed to reveal trends, patterns, and
associations. It is beneficial for both big and small businesses. They are making data-driven
decisions using Big data.
Now let us look at some of the most important Advantages of Big Data.
1. Advantages of Big Data for understanding the Market Conditions
Better understanding of current market conditions is possible by analyzing the Big data. Let’s
take an example – by analyzing a customer’s purchasing behaviour, a company can find out the
products which are sold most. It helps to analyze the trend and what customers want. Using this,
a particular business can get ahead of its competitors.
Some fast-food chains are using Big Data analytics to monitor their drive-through lanes, and this also helps them change their menu features. If the food order line is really backed up, the menu features change to reflect only those items which can be quickly prepared and served. If the line is relatively short, the menu features display items that take a bit more time to prepare.
Consequently you can observe all these menu changes on the LCD screen at food outlets.
At Disneyland park entry, they give a wrist device called a Magic Band to every visitor. That band provides key information regarding ride times, queuing times, other activities, etc. All this is done to give you a magical experience from their end.
Now let us learn what the magic is behind this “Magic Band”.
The Magic Band is developed with RFID technology; it interacts with thousands of sensors strategically placed all around the amusement park. Those sensors monitor activities and gather information about them.
Thus Big Data helps to enhance the customer experience and helps to increase the operational
efficiency at Disneyland park.
By analyzing purchasing behaviour, a company can also find the products customers purchase most and advertise them more and more. As a result, this makes a business more reliable and builds loyalty among customers.
Big Data also helps reduce costs, but it brings cybersecurity risks, since storing sensitive data at large scale makes it a target.