UNIT-1 - Big Data and Hadoop
CONCEPTS OF BIG DATA: Concept of Big Data Platform – Evolution and Challenges of
Conventional Systems - Intelligent data analysis – Nature of Data - Analytic Processes and
Tools - Analysis vs Reporting - Modern Data Analytic Tools- Applications of big data.
Questions:
Big data is a term that is used to describe data that is high volume, high velocity, and/or high
variety; requires new technologies and techniques to capture, store, and analyze it; and is used to
enhance decision making, provide insight and discovery, and support and optimize processes.
There are three dimensions to big data known as Volume, Variety and Velocity.
Characteristics of Big Data: initially only 3 Vs were defined; a 4th V was added later.
1. Volume (Scale) – Data volume is increasing exponentially: a 44x increase from 2009 to 2020, from 0.8 zettabytes to 35 ZB.
2. Velocity (Speed) – Data is being generated fast and needs to be processed fast (online data analytics); late decisions mean missed opportunities. Examples:
• E-Promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you.
• Healthcare monitoring: sensors monitoring your activities and body; any abnormal measurement requires an immediate reaction.
3. Variety (Complexity) – Various formats, types, and structures: text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc., as well as static data vs. streaming data. A single application can generate and collect many types of data, and to extract knowledge all these types of data need to be linked together.
4. Variability – This refers to the inconsistency which can be shown by the data at times, thus hampering the process of handling and managing the data effectively.
1 TB = 1024 GB
1 ZB (zettabyte) = 1024 EB
1 YB (yottabyte) = 1024 ZB
Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to
many petabytes of data.[
Big Data is a complex set of data: a collection of huge volumes of data that traditional tools struggle to store and process.
“Big Data” is a relative term depending on who is discussing it. Big Data to Amazon or Google
is very different than Big Data to a medium-sized insurance organization, but no less “Big” in
the minds of those contending with it.
Such foundational steps to the modern conception of Big Data involve the development of
computers, smart phones, the internet, and sensory (Internet of Things) equipment to provide
data. Credit cards also played a role, by providing increasingly large amounts of data, and
certainly social media changed the nature of data volumes in novel and still developing ways.
The evolution of modern technology is interwoven with the evolution of Big Data.
In its true essence, Big Data is not something that is completely new or only of the last two
decades. Over the course of centuries, people have been trying to use data analysis and analytics
techniques to support their decision-making process. The ancient Egyptians around 300 BC
already tried to capture all existing ‘data’ in the library of Alexandria. Moreover, the Roman
Empire used to carefully analyze statistics of their military to determine the optimal distribution
for their armies.
However, in the last two decades, the volume and speed with which data is generated has
changed – beyond measures of human comprehension. The total amount of data in the world was
4.4 zettabytes in 2013. That is set to rise steeply to 44 zettabytes by 2020. To put that in
perspective, 44 zettabytes is equivalent to 44 trillion gigabytes. Even with the most advanced
technologies today, it is impossible to analyze all this data. The need to process these
increasingly larger (and unstructured) data sets is how traditional data analysis transformed into
‘Big Data’ in the last decade.
To illustrate this development over time, the evolution of Big Data can roughly be sub-divided
into three main phases. Each phase has its own characteristics and capabilities. In order to
understand the context of Big Data today, it is important to understand how each phase
contributed to the contemporary meaning of Big Data.
Database management and data warehousing are considered the core components of Big Data
Phase 1. These provide the foundation of modern data analysis as we know it today, using well-known
techniques such as database queries, online analytical processing and standard reporting
tools.
In Big Data Phase 2, since the early 2000s, the Internet and the Web began to offer unique data collections and data
analysis opportunities. With the expansion of web traffic and online stores, companies such as
Yahoo, Amazon and eBay started to analyze customer behaviour by analyzing click-rates, IP
specific location data and search logs. This opened a whole new world of possibilities.
From a data analysis, data analytics, and Big Data point of view, HTTP-based web traffic
introduced a massive increase in semi-structured and unstructured data. Besides the standard
structured data types, organizations now needed to find new approaches and storage solutions to
deal with these new data types in order to analyze them effectively. The arrival and growth of
social media data greatly aggravated the need for tools, technologies and analytics techniques
that were able to extract meaningful information out of this unstructured data.
In Big Data Phase 3, mobile devices not only give the possibility to analyze behavioral data (such as clicks and search
queries), but also give the possibility to store and analyze location-based data (GPS-data). With
the advancement of these mobile devices, it is possible to track movement, analyze physical
behaviour and even health-related data (number of steps you take per day). This data provides a
whole new range of opportunities, from transportation, to city design and health care.
In summary: Phase 1 centered on database management and data warehousing, Phase 2 on web-based and social media data, and Phase 3 on mobile and sensor-based (IoT) data.
Q 3. Write the different types of data. Give examples of file types that fall under the different types.
Types of Data
Generally, Big Data consists of unstructured data.
• Structured Data
Structured data concerns all data which can be stored in a SQL database, in tables with rows and columns. It has relational keys and can easily be mapped into pre-designed fields. Structured data is highly organized information that uploads neatly into a relational database. Structured data is relatively simple to enter, store, query, and analyze, but it must be strictly defined in terms of field name and type.
• Unstructured Data
Unstructured data may have its own internal structure, but does not conform neatly into a
spreadsheet or database.
The fundamental challenge of unstructured data sources is that they are difficult for nontechnical
business users and data analysts alike to unbox, understand, and prepare for analytic use.
• Photographs and video: This includes security, surveillance, and traffic video.
• Website content: This comes from any site delivering unstructured content, like YouTube, Flickr, or Instagram.
Examples of semi-structured data: CSV, XML, and JSON (JavaScript Object Notation) documents are semi-structured, and NoSQL databases are also considered semi-structured.
Semi-structured data is a form of structured data that does not conform to the formal structure
of data models associated with relational databases or other forms of data tables, but nonetheless
contains tags or other markers to separate semantic elements and enforce hierarchies of records
and fields within the data.
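As an illustrative sketch (the record and its field names are invented for illustration, not taken from these notes), the short Python snippet below shows how a JSON document uses tags (keys) and nesting to separate semantic elements and enforce a hierarchy without a fixed relational schema:

```python
import json

# A hypothetical customer record as semi-structured JSON: keys act as tags that
# separate semantic elements, nesting enforces a hierarchy, but there is no
# fixed relational schema -- "orders" may contain any number of entries.
record = """
{
  "customer_id": 101,
  "name": "Asha",
  "orders": [
    {"item": "laptop", "price": 55000},
    {"item": "mouse",  "price": 700}
  ]
}
"""

data = json.loads(record)                 # parse the document into a dict
print(data["name"], len(data["orders"]))  # -> Asha 2
```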
Q 4: What do you mean by Big data Analytics?
Artificial Intelligence (AI), mobile, social and Internet of Things (IoT) are driving data
complexity, new forms and sources of data. Big data analytics is the use of advanced analytic
techniques against very large, diverse data sets that include structured, semi-structured and
unstructured data, from different sources, and in different sizes from terabytes to zettabytes.
Big data is a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process with low latency. It has one or more of the following characteristics: high volume, high velocity, or high variety. Big data comes from sensors, devices, video/audio, networks, log files, transactional applications, the web, and social media, much of it generated in real time and at a very large scale.
Analyzing big data allows analysts, researchers, and business users to make better and faster
decisions using data that was previously inaccessible or unusable. Using advanced analytics
techniques such as text analytics, machine learning, predictive analytics, data mining, statistics,
and natural language processing, businesses can analyze previously untapped data sources independently of or together with their existing enterprise data to gain new insights, resulting in better and faster decisions.
• Analytics has, in a sense, been around since 1663, when John Graunt dealt with
“overwhelming amounts of information,” using statistics to study the bubonic plague. In
2017, 2,800 experienced professionals who worked with Business Intelligence were
surveyed, and they predicted Data Discovery and Data Visualization will become an
important trend. Data Visualization is a form of visual communication (think
infographics). It describes information which has been translated into schematic format,
and includes changes, variables, and fluctuations. A human brain can process visual
patterns very efficiently.
• Visualization models are steadily becoming more popular as an important method for gaining insights from Big Data. (Graphics are common, and animation will become common. At present, data visualization models are a little clumsy and could use some improvement.) A number of businesses now offer Big Data visualization models.
Big Data is revolutionizing entire industries and changing human culture and behavior. It is a
result of the information age and is changing how people exercise, create music, and work. The
following provides some examples of Big Data use.
• Big Data is being used in healthcare to map disease outbreaks and test alternative treatments.
• NASA uses Big Data to explore the universe.
• The music industry replaces intuition with Big Data studies.
• Utilities use Big Data to study customer behavior and avoid blackouts.
• Nike uses health-monitoring wearables to track customers and provide feedback on their health.
• Big Data is being used in cybersecurity to stop cybercrime.
Q 6. What are the different features provided by big data platforms? List
four big data platforms.
A big data platform is a type of IT solution that combines the features and capabilities of several big data applications and utilities within a single solution. It is an enterprise-class IT platform that enables organizations to develop, deploy, operate and manage a big data infrastructure/environment.
A big data platform generally consists of big data storage, servers, databases, big data management, business intelligence and other big data management utilities. It also supports custom development, querying and integration with other systems. The primary benefit of a big data platform is to reduce the complexity of multiple vendors/solutions into one cohesive solution. Big data platforms are also delivered through the cloud, where the provider offers an all-inclusive big data solution and services. Examples of big data platforms include Apache Hadoop, Cloudera, Hortonworks Data Platform, and Microsoft Azure HDInsight.
Traditional (conventional) systems are characterized by:
• Clearly defined fields organized in records. Records are usually stored in tables. Fields have names, and relationships are defined between different fields.
• Schema-on-write, which requires data to be validated against a schema before it can be written to disk. A significant amount of requirements analysis, design, and effort up front can be involved in putting the data into clearly defined structured formats. This can increase the time before business value can be realized from the data.
• A design that gets data from the disk and loads it into memory to be processed by applications. This is an extremely inefficient architecture when processing large volumes of data: the data is extremely large and the programs are small, so the big component must move to the small component for processing.
• The use of Structured Query Language (SQL) for managing and accessing the data.
• Relational and warehouse database systems that often read data in 8k or 16k block sizes. These block sizes load data into memory, and then the data are processed by applications. When processing large volumes of data, reading the data in these block sizes is extremely inefficient.
• Organizations today contain large volumes of information that is not actionable or being
leveraged for the information it contains.
• An order management system is designed to take orders. A web application is designed for
operational efficiency. A customer system is designed to manage information on customers.
Data from these systems usually reside in separate data silos. However, bringing this
information together and correlating with other data can help establish detailed patterns on
customers.
• In a number of traditional siloed environments, data scientists can spend 80% of their time looking for the right data and only 20% of their time doing analytics. A data-driven environment must have data scientists spending far more of their time doing analytics.
Google realized that if it wanted to be able to rank the Internet, it had to design a new way
of solving the problem. It started with looking at what was needed:
• Inexpensive storage that could store massive amounts of data cost effectively
• To scale cost effectively as the data volume continued to increase
• To analyze these large data volumes very fast
• To be able to correlate semi-structured and unstructured data with existing structured data
• To work with unstructured data that had many forms that could change frequently; for example, data structures from organizations such as Twitter can change regularly
• Inexpensive storage. The most inexpensive storage is local storage from off-the-shelf disks.
•A data platform that could handle large volumes of data and be linearly scalable at cost and
performance.
• A highly parallel processing model that was highly distributed to access and compute the data
very fast.
• A data repository that could break down the silos and store structured, semi-structured, and
unstructured data to make it easy to correlate and analyze the data together.
Traditional systems vs. Big Data systems:
• Traditional systems are designed from the ground up to work with data that is primarily structured, with clearly defined fields organized in records; records are usually stored in tables, fields have names, and relationships are defined between different fields. Big Data systems, in contrast, contain a data repository that can break down the silos (a data silo is a repository of fixed data that remains under the control of one department and is isolated from the rest of the organization) and store structured, semi-structured, and unstructured data, making it easy to correlate and analyze the data together.
• Traditional systems give lower accuracy in data analytics: because data is so expensive to store, data is filtered and large volumes are thrown out because of the cost of storage. Minimizing the data to be analyzed reduces the accuracy and confidence of the results and also limits an organization's ability to identify business opportunities. Big Data systems give higher accuracy of data analysis: because of the relatively low cost of storage in Hadoop, the detailed records are stored in Hadoop's storage system HDFS, and traditional data can then be analyzed with nontraditional data in Hadoop to find correlation points that provide much higher accuracy of data analysis.
A robust Big Data architecture saves the company money, helps it predict future trends, and improves decision making. The main components of a Big Data architecture are described below.
1. Data Sources
Data sources govern Big Data architecture. It involves all those sources from where the data
extraction pipeline gets built. Data Sources are the starting point of the big data pipeline.
Data arrives through multiple sources including relational databases, sensors, company
servers, IoT devices, static files generated from apps such as Windows logs, third-party data
providers, etc. This data can be batch data or real-time data. Big Data architecture is designed
in such a way that it handles this vast amount of data.
2. Data Storage
Data Storage is the receiving end for Big Data. Data Storage receives data of varying formats
from multiple data sources and stores them. It even changes the format of the data received
from data sources depending on the system requirements. For example, Big Data architecture
stores unstructured data in distributed file storage systems like HDFS or NoSQL database. It
stores structured data in RDBMS.
4. Batch Processing
The architecture requires a batch processing system for filtering, aggregating, and processing
data which is huge in size for advanced analytics. These are generally long-running batch
jobs that involve reading the data from the data storage, processing it, and writing outputs to
the new files. The most commonly used solution for Batch Processing is Apache Hadoop.
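The notes do not prescribe any particular code, but the MapReduce pattern behind Hadoop batch jobs can be sketched locally in a few lines of Python; on a real cluster the map and reduce steps would run as distributed tasks over files in HDFS rather than over an in-memory list:

```python
# A minimal local simulation of the MapReduce batch pattern (word count).
from itertools import groupby
from operator import itemgetter

lines = ["big data needs batch processing", "batch jobs read big files"]

# Map: emit (word, 1) pairs for every word
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the pairs by key (Hadoop does this between map and reduce)
mapped.sort(key=itemgetter(0))

# Reduce: sum the counts for each word
counts = {word: sum(count for _, count in group)
          for word, group in groupby(mapped, key=itemgetter(0))}

print(counts)  # e.g. {'batch': 2, 'big': 2, 'data': 1, ...}
```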
5. Stream Processing
There is only a slight difference between stream processing and real-time message ingestion. Stream processing handles all streaming data, which occurs in windows or streams, and then writes the data to the output sink. Common tools include Apache Spark, Apache Storm, Apache Flink, etc.
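A minimal PySpark Structured Streaming sketch is shown below, assuming a local Spark installation and a text source on localhost port 9999 (for example one started with `nc -lk 9999`); it counts words over the incoming stream and prints running totals:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamWordCount").getOrCreate()

# Read an unbounded stream of lines from the socket source
lines = spark.readStream.format("socket") \
    .option("host", "localhost").option("port", 9999).load()

# Count words over the stream and write the running totals to the console sink
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```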
6. Analytical Data Store
After processing the data, we need to bring it to one place so that we can accomplish an analysis of the entire data set. The analytical data store is important as it stores all our processed data in one place, making analysis comprehensive. It is optimized mainly for analysis
rather than transactions. It can be a relational database or cloud-based data warehouse
depending on our needs.
8. Orchestration
Moving data through these systems requires orchestration in some form of automation.
Ingesting data, transforming the data, moving data in batches and stream processes, then
loading it to an analytical data store, and then analyzing it to derive insights must be in a
repeatable workflow. This allows us to continuously gain insights from our big data.
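As one possible illustration, the sketch below expresses such a repeatable ingest–transform–load workflow as an Apache Airflow DAG; Airflow is not mentioned in these notes, and the task functions are hypothetical placeholders:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():    print("pull raw data from the sources")
def transform(): print("clean and aggregate the batch")
def load():      print("load results into the analytical data store")

with DAG(dag_id="big_data_pipeline",
         start_date=datetime(2024, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    t1 = PythonOperator(task_id="ingest", python_callable=ingest)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3   # ingest, then transform, then load -- every day
```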
2. Scaling
Big Data architecture must be designed in such a way that it can scale up when the need
arises. Otherwise, the system performance can degrade significantly.
3. Security
Data Security is the most crucial part. It is the biggest challenge while dealing with big data.
Hackers and fraudsters may try to add their own fake data or skim companies' data for sensitive information. Cybercriminals could easily mine company data if companies do not encrypt the data, secure the perimeters, and work to anonymize the data to remove sensitive information.
1. Reducing costs: Big data technologies such as Apache Hadoop significantly reduce
storage costs.
2. Improve decision making: The use of Big data architecture streaming component
enables companies to make decisions in real-time.
3. Future trends prediction: Big Data analytics helps companies to predict future
trends by analyzing big data from multiple sources.
4. Creating new products: Companies can understand customers' requirements by analyzing customers' previous purchases and create new products accordingly.
There are four types of data analysis:
1. Prescriptive – This type of analysis reveals what actions should be taken. This is the most valuable kind of analysis and usually results in rules and recommendations for next steps.
2. Predictive – An analysis of likely scenarios of what might happen. The deliverables are usually a predictive forecast. Ex: weather forecast, share price forecast, exit poll.
3. Diagnostic – A look at past data to determine why something happened. The result of this analysis is often an analytic dashboard. Ex: finding the reasons for winning an election by analyzing social media.
4. Descriptive – What is happening now, based on incoming data.
Reporting vs. Analysis:
1. Definition – Reporting is the process of organizing data into informational summaries in order to monitor how different areas of a business are performing. Analysis is the process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.
2. Translation/Transformation – Reporting translates raw data into information. Analysis transforms data and information into insights.
6. Types – There are three main types of reporting: canned reports, dashboards, and alerts. Analysis has two main types: ad hoc responses and analysis presentations.
Dimensions of data quality:
• Accuracy – The data was recorded correctly.
• Completeness – All relevant data was recorded.
• Uniqueness – Entities are recorded once.
• Timeliness – The data is kept up to date.
• Consistency – The data agrees with itself.
It is becoming easier for enterprises to store and acquire the large amounts of data. These data
sets can facilitate improved decision making, richer analytics, and increasingly, provide training
data for Machine Learning. However, data quality remains a major concern, and dirty data
can lead to incorrect decisions and unreliable analysis. Examples of common errors include
missing values, typos, mixed formats, replicated entries of the same real-world entity, outliers
and violations of business rules. Analysts must consider the effects of dirty data before making
any decisions, and as a result, data cleaning has been a key phase of data analytics.
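A small pandas sketch of such basic data-quality checks on a made-up customer table (missing values, duplicated entities, and mixed formats) might look like this:

```python
import pandas as pd

# Hypothetical dirty data: a missing city, a duplicated customer, mixed casing
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "city":        ["Durg", "durg ", None, "Raipur"],
    "salary":      [42000, 51000, 51000, None],
})

print(df.isna().sum())                      # completeness: missing values per column
print(df.duplicated("customer_id").sum())   # uniqueness: repeated entities

df["city"] = df["city"].str.strip().str.title()     # consistency: normalize mixed formats
clean = df.drop_duplicates("customer_id").dropna()  # one simple cleaning policy
print(clean)
```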
In statistics, an outlier is an observation point that is distant from other observations. Outliers
are sometimes excluded from the data set. For example: a person’s data with height 7.2”.
One of the key differentiating factors is how to define data error (i.e., error detection).
Quantitative techniques, largely used for outlier detection, employ statistical methods to
identify abnormal behaviors and errors (e.g., “a salary that is three standard deviations away from the mean salary is an error”). On the other hand, qualitative techniques use constraints, rules, and patterns to detect errors (e.g., “there cannot exist two employees at the same level where the one located in Raipur earns less than the one not located in Raipur”).
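The quantitative three-standard-deviation rule quoted above can be sketched with NumPy on synthetic salary data (the numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
salaries = rng.normal(loc=45000, scale=5000, size=200)  # typical salaries
salaries = np.append(salaries, 400000)                   # one injected error

# Flag any value more than three standard deviations from the mean
mean, std = salaries.mean(), salaries.std()
outliers = salaries[np.abs(salaries - mean) > 3 * std]
print(outliers)  # only the 400000 record is flagged
```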
Intelligent data analysis is the use of statistical, pattern recognition, machine learning, data abstraction, and visualization tools for the analysis of data and discovery of the mechanisms that created the data.
Intelligent Data Analysis (IDA) is an interdisciplinary study concerned with the effective
analysis of data. IDA draws the techniques from diverse fields, including artificial intelligence,
databases, high-performance computing, pattern recognition, and statistics.
Intelligent data analysis mimics a human being and his/her intelligence in the analysis of complex datasets. It is a way of data analysis based on artificial intelligence, using methods that enable finding information and knowledge for a particular data domain.
IDA finds rules and knowledge in data; that is to say, it extracts value from data. Though IDA algorithms are too numerous to count exactly, they can be summarized by means of their developing trends, which are (a) the algorithm principle, (b) the scale of the dataset, and (c) the type of the dataset.
Analysis is separating a whole into its parts, studying the parts individually and their relationships with one another.
For example, if we have a whole data set and we are doing analysis on it, we pull a sample data set from the whole data and then learn more about it and how it is related to the other samples.
Analysis is a way to interpret the data and derive meaningful insights from the data. Essentially,
you may use the analytical tools such as Microsoft Excel to plot the graph, pivot, chart to delve
into the subject of interest. Let’s take a very simple example: Your executive wants to know,
“Who are the top 10 salesforce folks who exceeded the targets this year in U.S. region?”. Well,
you can extract the U.S. sales data from the tool and sort it by descending order to arrive at the
top 10 (see the sketch below). Your leadership team might think of surprise gift vouchers for them as a token of their hard work and determination!
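A minimal pandas sketch of that kind of analysis, with invented column names and figures, would filter the U.S. rows and sort by sales:

```python
import pandas as pd

# Hypothetical sales table; in practice this would be extracted from the BI tool
sales = pd.DataFrame({
    "rep":    ["Asha", "Bob", "Carla", "Dev"],
    "region": ["US", "US", "EU", "US"],
    "sales":  [120000, 98000, 150000, 87000],
})

top_us = (sales[sales["region"] == "US"]       # keep only the U.S. region
          .sort_values("sales", ascending=False)
          .head(10))                           # top 10 once more rows exist
print(top_us)
```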
Analytics: This also holds true in deriving meaningful insights from the data. The difference is,
analytics involves statistical tools & techniques with business acumen to bring out the hidden
patterns, stories from the data. I would say analysis is a sub-set of analytics whereas the latter
involves some complex techniques to solve the problem. Ex: Google recommends search
ideas when you start typing your keywords. Let’s say, you want to know “how to make a
website”. Google has the search data from your country’s demographics who had already
searched about the similar keywords. Using machine learning algorithm in real-time, your search
query is suggested by the search engine before you complete the keywords!
• Data analytics and data analysis tend to be used interchangeably. Data analysis refers to the
process of examining in close detail the components of a given data set – separating them
out and studying the parts individually and their relationship between one another. Data
analytics, on the other hand, is a broader term referring to a discipline that encompasses
the complete management of data – including collecting, cleaning, organizing, storing,
governing, and analyzing data – as well as the tools and techniques used to do so. So,
data analysis is a process, whereas data analytics is an overarching discipline (which
includes data analysis as a necessary subcomponent). Both data analytics and data
analysis are used to uncover patterns, trends, and anomalies lying within data, and
thereby deliver the insights businesses need to enable evidence-based decision making.
Where they differ, however, is in their approach to data – to put this simply, data analysis
looks at the past, while data analytics tries to predict the future.
• Essentially, the primary difference between analytics and analysis is a matter of scale, as
data analytics is a broader term of which data analysis is a subcomponent. Data analysis
refers to the process of examining, transforming and arranging a given data set in specific
ways in order to study its individual parts and extract useful information. Data analytics
is an overarching science or discipline that encompasses the complete management of
data. This not only includes analysis, but also data collection, organisation, storage, and
all the tools and techniques used.
• It’s the role of the data analyst to collect, analyse, and translate data into information that’s
accessible. By identifying trends and patterns, analysts help organisations make better
business decisions. Their ability to describe, predict, and improve performance has
placed them in increasingly high demand globally and across industries.
Q 17. What are the Top 10 Big Data Tools for Analysis?
“ Information is the oil of the 21st century, and analytics is the combustion engine. ” –
Peter Sondergaard
In the old days, people generally traveled using a horse cart or bullock cart. But it is not feasible to use such carts in today's world. Right? And why? Because of the growing population, and because the time required to travel by horse or bullock cart is high.
Similarly, in technology world, data is generated at a high rate and it is impossible to store these
massive amounts of data in a traditional way. Thus there is a need for some efficient, modern
and feasible way for the storage of such a large amount of data.
Big data tools for analysis are used to solve the problem of handling and managing data. These
tools perform data analysis tasks in a way that is both time- and cost-effective. Also, these tools help in exploring business insights and enhance the effectiveness of a business.
1. Tableau
The primary objective of Tableau is to focus on business intelligence. It is a highly efficient data visualization tool. In Tableau, users do not have to write a program in order to create maps, charts, etc. For live data in visualizations, Tableau provides a web connector to connect to a database or API.
Features of Tableau :
• Tableau provides a central location to delete, manage schedules and tag, and change
permissions.
• It does not require complicated software setup.
• Real-time collaboration is available.
• Without any integration cost, it can blend various datasets like relational datasets,
structured datasets, etc.
Spins up and terminates clusters, and only pays for what is needed.
3. Teradata
Teradata is a tool used for developing large-scale data warehousing applications. It is a well-known relational database management system. It generally offers end-to-end solutions for data warehousing. Its development is based on the MPP (Massively Parallel Processing) architecture.
Features of Teradata :
• Teradata can connect network-attached systems or mainframes.
• Its significant components are a node, parsing engine, the message passing layer, and the
access module processor (AMP).
• It is highly scalable.
• It supports industry-standard SQL in order to interact with the data.
4. R – Programming
R Programming language is used for statistical computing, graphics and for big data analysis. It
provides a wide variety of statistical tests.
Features of R programming tool:
• R programming tools provide an effective data handling and storage facility.
• It provides a coherent and integrated collection of big data tools for data analysis.
• It also provides graphical facilities for data analysis which display either on-screen or on hardcopy.
5. Spark
Apache Spark is one of the most powerful open-source big data analytics tools. It is used by
many organizations to process large datasets. It offers high-level operators that make it easy to
build parallel apps.
Features of Spark:
• It offers Fast Processing
• Has the ability to integrate with Hadoop and existing Hadoop Data
• Using Spark, an application can run in a Hadoop cluster up to 100 times faster in memory and ten times faster on disk (see the sketch below).
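For illustration, a minimal PySpark batch job is sketched below (the input path and column names are assumptions); the same code can run locally or, submitted with spark-submit, in parallel across a Hadoop/YARN cluster:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("SalesByRegion").getOrCreate()

# Hypothetical CSV of orders stored in HDFS
orders = spark.read.csv("hdfs:///data/orders.csv", header=True, inferSchema=True)

# Aggregate total sales per region and show the largest first
totals = orders.groupBy("region").agg(F.sum("amount").alias("total_sales"))
totals.orderBy(F.desc("total_sales")).show()
```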
6. Lumify
Lumify is a platform that involves big data fusion, analysis, and visualization. It is a free and
open source tool for analytics. It supports the cloud-based environment and also works well with
Amazon’s AWS.
Features of Lumify:
• Lumify’s primary features include the full-text search, 2D and 3D graph visualizations, link
analysis between graph entities, automatic layout, integration with mapping systems,
geospatial layouts, multimedia analysis, real-time collaboration through a set of projects or
workspaces.
• It is usually built on proven, scalable big data technologies.
• It is secure, scalable, and supported by a dedicated full-time development team.
7. Talend
Talend simplifies and automates big data integration. Its graphical wizard generates native code. It also allows big data integration, data quality checking, and master data management.
Features of Talend:
• Talend Big Data Platform generates native code which simplifies using MapReduce and
Spark.
• It accelerates time to value for big data projects.
• It also simplifies ETL & ELT for big data.
8. Microsoft HDInsight
Azure HDInsight is a Spark and Hadoop service in the cloud. Standard and Premium are the two
data cloud offerings provided by Azure HDInsight. For running the Big data workloads of the
organization it also provides an enterprise-scale cluster.
Features of HDInsight:
• Offers enterprise-grade security and monitoring.
• Protects data assets and extends on-premises security and governance controls to the cloud.
• Provides a high-productivity platform for developers and scientists.
9. Skytree
Skytree is a big data analytics tool that helps data scientists to build more accurate models faster.
It also offers accurate predictive machine learning models that are easy to use.
Features of Skytree:
• Helps to develop Highly Scalable Algorithms.
• Allows data scientists to visualize and understand the logic behind Machine Learning
decisions.
• Solves robust predictive problems with data preparation capabilities.
10. Pentaho
Pentaho is software that can access, prepare and analyze any data from any source. It is a popular choice for data integration, orchestration, and business analytics platforms. The main motto of this tool is to turn big data analytics into big insights.
Features of Pentaho:
• Pentaho generally supports a wide range of big data sources.
• No coding is required, and it can deliver the data effortlessly to your business.
• It generally permits checking data with easy access to analytics, like charts, visualizations, etc.
• It can also access and integrate data for data visualization effectively.
Q 18. What are the advantages of Big Data?
There is no denying the fact that in less than a decade, Big Data has become a multi-billion-dollar industry. Today, the Big Data revolution has arrived with the growth of the internet, wireless networks, smartphones, social media and other technologies.
We can define Big data as a very large dataset that can be analyzed to reveal trends, patterns, and
associations. It is beneficial for both big and small businesses. They are making data-driven
decisions using Big data.
Now let us look at some of the most important Advantages of Big Data.
1. Advantages of Big Data for understanding the Market Conditions
Better understanding of current market conditions is possible by analyzing the Big data. Let’s
take an example – by analyzing a customer’s purchasing behaviour, a company can find out the
products which are sold most. It helps to analyze the trend and what customers want. Using this,
a particular business can get ahead of its competitors.
Some fast-food chains are using Big Data analytics to monitor their drive-through lanes, and this also helps them change their menu features. If the food order line is really backed up, the menu features change to reflect only those items which can be quickly prepared and served. If the line is relatively short, the menu features display items that take a bit more time to prepare.
Consequently you can observe all these menu changes on the LCD screen at food outlets.
At Disneyland park entry, they give a wrist device called a Magic Band to every visitor. That band provides key information regarding ride times, queuing times, other activities, etc. All this is done to give you a magical experience from their end.
Now let us learn what the magic is behind this “Magic Band”.
The Magic Band is developed with RFID technology; it interacts with thousands of sensors strategically placed all around the amusement park. Those sensors monitor activities and gather information about them.
Thus Big Data helps to enhance the customer experience and helps to increase the operational
efficiency at Disneyland park.
By analyzing purchasing behaviour, a company can also find the products customers purchase most and advertise them more and more. As a result, this makes a business more reliable and builds loyalty among customers.
Big Data also helps reduce costs, but it brings cybersecurity risks, since storing sensitive data at large scale makes it a target.