
UNIT I

INTRODUCTION TO BIG DATA

I ) TYPES OF DIGITAL DATA

DIGITAL DATA
Digital data is information stored on a computer system as a series of 0s and 1s in a binary language. Digital data jumps from one value to the next in a step-by-step sequence.
Example: Whenever we send an email, read a social media post, or take pictures with a digital camera, we are working with digital data.
Digital data can be classified into three forms:
a. Unstructured Data: Data that does not conform to a data model, or is not in a form that can be used easily by a computer program, is categorized as unstructured data. About 80-90% of an organization's data is in this format.
Example: Memos, chat logs, PowerPoint presentations, images, videos, letters, research reports, white papers, the body of an email, etc.
b. Semi-Structured Data: The data which does not conform to a data model but has some structure
is categorized as semi-structured data. However, it is not in a form that can be used easily by a
computer program.
Example: Emails, XML, markup languages like HTML, etc. Metadata for this data is available but is not sufficient.
c. Structured Data: Data that is in an organized form (i.e., in rows and columns) and can be easily used by a computer program is categorized as structured data. Relationships exist between entities of the data, such as classes and their objects.
Example: Data stored in databases.
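
To make the three forms concrete, here is a minimal Python sketch (the records, tags, and field names are invented for illustration) showing how readily each form can be consumed by a program:

```python
import json
import xml.etree.ElementTree as ET

# Structured: rows and columns with a fixed schema -- trivially machine-readable.
row = {"id": 1, "name": "Asha", "balance": 2500.0}  # like a row in an RDBMS table
print(row["name"])

# Semi-structured: self-describing tags give some structure (XML here).
doc = ET.fromstring("<email><to>ops@example.com</to><subject>Hi</subject></email>")
print(doc.find("subject").text)

# Semi-structured JSON works the same way.
parsed = json.loads('{"to": "ops@example.com", "subject": "Hi"}')
print(parsed["subject"])

# Unstructured: free text -- no fields can be pulled out without extra
# processing such as natural language processing.
body = "Hi team, please review the attached memo before Friday."
print(len(body.split()), "words, no schema")
```

Structured data is addressable by field out of the box; semi-structured data needs a parser; unstructured data needs heavier processing before any field can be extracted.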

II ) HISTORY OF BIG DATA


The 21st century is characterized by rapid advancement in the field of information technology. IT has become an integral part of daily life, as well as of industries like health, education, entertainment, science and technology, genetics, and business operations. These industries generate enormous amounts of data, which can be called Big Data. Big Data consists of large datasets that cannot be managed efficiently by common database management systems. These datasets range in size from terabytes to exabytes.
Mobile phones, credit cards, Radio Frequency Identification (RFID) devices, and social
networking platforms create huge amounts of data that may reside unutilized at unknown servers
for many years.
And with the evolution of Big Data, this data can be accessed and analyzed on a regular basis to
generate useful information.
"Big Data" is a relative term that depends on who is discussing it. For example, Big Data for Amazon or Google is very different from Big Data for a medium-sized insurance organization.

III) INTRODUCTION TO BIG DATA PLATFORM


A big data platform is a type of IT solution that combines the features and capabilities of several big data applications and utilities within a single solution, which is then used for managing as well as analyzing Big Data.
It focuses on providing its users with efficient analytics tools for massive datasets.
The users of such platforms can custom-build applications according to their use case, such as calculating customer loyalty (an e-commerce use case), and so on.
Goal: The main goal of a Big Data Platform is to achieve: Scalability, Availability, Performance,
and Security.
Example: Some of the most commonly used Big Data Platforms are :

• Hadoop Delta Lake Migration Platform


• Data Catalog Platform
• Data Ingestion Platform
• IoT Analytics Platform

IV) DRIVERS FOR BIG DATA


Big Data has quickly risen to become one of the most sought-after topics in the
industry. The main business drivers for this rising demand for Big Data
analytics are:
1. The digitization of society

2. The drop in technology costs


3. Connectivity through cloud computing
4. Increased knowledge about data science
5. Social media applications
6. The rise of Internet-of-Things(IoT)

V ) BIG DATA ARCHITECTURE :
Big data architecture is designed to handle the ingestion, processing, and analysis of data that is
too large or complex for traditional database systems.

1. Data sources

Data is sourced from multiple inputs in a variety of formats, including both structured and
unstructured. Sources include relational databases allied with applications such as ERP or CRM, data
warehouses, mobile devices, social media, email, and real-time streaming data inputs such as IoT
devices. Data can be ingested in batch mode or in real-time.

2. Data storage

This is the data receiving layer, which ingests data, stores it, and converts unstructured data into a
format analytic tools can work with. Structured data is often stored in a relational database, while
unstructured data can be housed in a NoSQL database such as MongoDB Atlas. A specialized
distributed system like Hadoop Distributed File System (HDFS) is a good option for high-volume
batch processed data in various formats.
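
As a sketch of this layer, the snippet below stores schema-flexible documents with pymongo (assuming a MongoDB instance is running locally; the database, collection, and document contents are hypothetical):

```python
# Minimal storage-layer sketch: schema-flexible documents in MongoDB.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["bigdata_demo"]

# Semi-structured documents need no fixed schema -- each one can differ.
db.events.insert_one({"source": "mobile", "type": "click", "payload": {"x": 10}})
db.events.insert_one({"source": "iot", "temp_c": 21.4})

# Query by whatever fields the documents happen to share.
for doc in db.events.find({"source": "iot"}):
    print(doc)
```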

3. Batch processing

With very large data sets, long-running batch jobs are required to filter, combine, and generally render
the data usable for analysis. Source files are typically read and processed, with the output written to
new files. Hadoop is a common solution for this.
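
A minimal batch job of this kind, sketched here with PySpark rather than classic Hadoop MapReduce (the input and output paths are assumptions): it reads source files, combines records, and writes the output to new files.

```python
# Batch word count: read source files, filter/combine, write results out.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, split

spark = SparkSession.builder.appName("batch-wordcount").getOrCreate()

lines = spark.read.text("logs.txt")                    # read the source files
words = lines.select(explode(split(col("value"), r"\s+")).alias("word"))
counts = words.filter(col("word") != "").groupBy("word").count()

counts.write.mode("overwrite").csv("wordcounts_out")   # output written to new files
spark.stop()
```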

4. Real-time message ingestion

This component focuses on categorizing the data for a smooth transition into the deeper layers of the
environment. An architecture designed for real-time sources needs a mechanism to ingest and store
real-time messages for stream processing. Messages can sometimes just be dropped into a folder, but
in other cases, a message capture store is necessary for buffering and to enable scale-out processing,
reliable delivery, and other queuing requirements.
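
As an illustration of a message capture store, here is a sketch using the kafka-python client (the broker address, topic name, and payload are assumptions): the broker buffers messages so consumers can scale out and receive them reliably.

```python
# Sketch of buffered message ingestion with a Kafka broker (assumed at
# localhost:9092; topic and payload are illustrative).
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
# A source drops messages into the buffer as they are generated.
producer.send("sensor-readings", {"device": "d42", "temp_c": 21.7})
producer.flush()

# Downstream processors consume at their own pace; consumer groups give
# scale-out processing and reliable delivery.
consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="localhost:9092",
    group_id="stream-workers",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)
```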

5. Stream processing

Once captured, the real-time messages have to be filtered, aggregated, and otherwise prepared for
analysis, after which they are written to an output sink. Options for this phase include Azure Stream
Analytics, Apache Storm, and Apache Spark Streaming.
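
A minimal Spark Structured Streaming sketch of this phase (using a socket source purely for demonstration; a production job would read from a message broker and write to a real sink):

```python
# Stream processing: filter and aggregate captured messages, write to a sink.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

# Read a stream of text lines (a real job would read from Kafka or Event Hubs).
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

# Filter and aggregate the messages -- "prepared for analysis".
counts = lines.filter(col("value") != "").groupBy("value").count()

# Write the running result to an output sink (console, for demonstration).
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```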

6. Analytical data store

The processed data can now be presented in a structured format – such as a relational data warehouse
– for querying by analytical tools, as is the case with traditional business intelligence (BI) platforms.
Other alternatives for serving the data are low-latency NoSQL technologies or an interactive Hive
database.
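
The serving role of this layer can be sketched with Python's built-in sqlite3 standing in for a real warehouse or Hive store (the table and figures are invented); BI tools issue exactly this kind of aggregate query against the store:

```python
# Toy analytical data store: structured, queryable storage for BI queries.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("south", 80.0), ("north", 60.0)])

# An aggregate query of the kind analytical tools run against the store.
for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(region, total)
```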

7. Analysis and reporting

Most Big Data platforms are geared to extracting business insights from the stored data via analysis
and reporting. This requires multiple tools. Structured data is relatively easy to handle, while more
advanced and specialized techniques are required for unstructured data. Data scientists may
undertake interactive data exploration using various notebooks and tool-sets. A data modeling layer
might also be included in the architecture, which may also enable self-service BI using popular
visualization and modeling techniques.

Analytics results are sent to the reporting component, which replicates them to various output systems
for human viewers, business processes, and applications. After visualization into reports or
dashboards, the analytic results are used for data-driven business decision making.

8. Orchestration

The cadence of Big Data analysis involves multiple data processing operations followed by data
transformation, movement among sources and sinks, and loading of the prepared data into an
analytical data store. These workflows can be automated with orchestration systems such as Apache Oozie or Azure Data Factory; Apache Sqoop, often mentioned alongside these, is a bulk data-movement tool rather than an orchestrator.

VI) BIG DATA CHARACTERISTICS :

1. Volume:
 The name ‘Big Data’ itself relates to a size which is enormous.
 Volume refers to the huge amount of data.
 To determine the value of data, the size of the data plays a very crucial role. If the volume of data is very large, it is actually considered ‘Big Data’. This means that whether particular data can be considered Big Data or not depends upon the volume of data.
 Hence, while dealing with Big Data, it is necessary to consider the characteristic ‘Volume’.
 Example: In 2016, the estimated global mobile traffic was 6.2 exabytes (6.2 billion GB) per month, and it was projected that by 2020 there would be almost 40,000 exabytes of data (a unit-conversion sketch follows this list).
2. Velocity:
 Velocity refers to the high speed of accumulation of data.
 In Big Data, velocity means data flows in from sources like machines, networks, social media, mobile phones, etc.
 There is a massive and continuous flow of data. This determines the potential of the data: how fast it is generated and processed to meet demands.
 Sampling data can help in dealing with issues of velocity.
3. Variety:
 It refers to the nature of the data: structured, semi-structured, and unstructured.
 It also refers to heterogeneous sources.
 Variety is basically the arrival of data from new sources, both inside and outside of an enterprise. It can be structured, semi-structured or unstructured.
 Structured data: This is basically organized data. It generally refers to data with a defined length and format.
 Semi-structured data: This is basically semi-organized data. It is generally data that does not conform to the formal structure of data. Log files are an example of this type of data.
 Unstructured data: This basically refers to unorganized data. It generally refers to data that doesn’t fit neatly into the traditional row-and-column structure of a relational database. Texts, pictures, videos, etc. are examples of unstructured data, which can’t be stored in the form of rows and columns.
4. Veracity:
 It refers to inconsistencies and uncertainty in data: the available data can sometimes get messy, and its quality and accuracy are difficult to control.
 Big Data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources.
 Example: Data in bulk can create confusion, whereas a smaller amount of data can convey half or incomplete information.
5. Value:
 After taking the above V’s into account, there comes one more V, which stands for Value! Bulk data with no value is of no good to a company unless it is turned into something useful.
 Data in itself is of no use or importance; it needs to be converted into something valuable to extract information. Hence, you can state that Value is the most important V of all the 6 V’s.
6. Variability:
 To what extent, and how fast, is the structure of your data changing?
 How often does the meaning or shape of your data change?
 Example: it is as if you ate the same ice-cream daily and the taste just kept changing.
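
To ground the Volume example above, here is the exabyte arithmetic spelled out (decimal units, as the "billion GB" gloss implies):

```python
# Exabyte arithmetic behind the Volume example (decimal units assumed).
GB_PER_EB = 10**9                      # 1 exabyte = one billion gigabytes

monthly_traffic_eb = 6.2               # 2016 global mobile traffic per month
print(monthly_traffic_eb * GB_PER_EB)  # 6.2e9 GB, i.e. 6.2 billion GB

projected_2020_eb = 40_000             # projected total data by 2020
print(projected_2020_eb * GB_PER_EB)   # 4.0e13 GB
```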

VII) BIG DATA TECHNOLOGY COMPONENTS :

Extract, transform and load (ETL) is the process of preparing data for analysis. While the actual ETL
workflow is becoming outdated, it still works as a general terminology for the data preparation layers
of a big data ecosystem. Concepts like data wrangling and extract, load, transform are becoming
more prominent, but all describe the pre-analysis prep work. Working with big data requires
significantly more prep work than smaller forms of analytics.
With different data structures and formats, it’s essential to approach data analysis with a thorough plan that addresses all incoming data. Sometimes you’re taking in completely unstructured audio and video; other times it’s simply a lot of perfectly structured, organized data, but all with differing schemas, requiring realignment.
The first two layers of a big data ecosystem, ingestion and storage, include ETL and are worth
exploring together.

Data Sources/Ingestion
The ingestion layer is the very first step of pulling in raw data. Data comes from internal sources, relational databases, nonrelational databases, and more. It can even come from social media, emails, phone calls or somewhere else. There are two kinds of data ingestion:

1. Batch, in which large groups of data are gathered and delivered together. Data collection can
be triggered by conditions, launched on a schedule or ad hoc.
2. Streaming, which is a continuous flow of data. This is necessary for real-time data analytics.
It locates and pulls data as it’s generated. This requires more resources because it is constantly
monitoring for changes in data pools.
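
A toy Python contrast of the two modes (the file path and the `source.poll()` interface are hypothetical): batch delivers a bounded load on demand, while streaming constantly monitors for new records.

```python
import time

def batch_ingest(path):
    """Batch: gather a large group of records and deliver them together."""
    with open(path) as f:
        return f.readlines()              # one scheduled (or ad hoc) bounded load

def stream_ingest(source):
    """Streaming: locate and pull each record as it is generated."""
    while True:                           # constantly monitoring for new data
        record = source.poll()            # hypothetical non-blocking source API
        if record is not None:
            yield record
        else:
            time.sleep(0.1)               # back off briefly when idle
```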

It’s all about just getting the data into the system. Parsing and organizing comes later. It’s like when
a dam breaks; the valley below is inundated. It’s quick, it’s massive and it’s messy. This presents lots
of challenges, some of which are:

 Maintaining security and compliance: With so much data flowing in, making sure that any
single dataset isn’t introducing security vulnerabilities is a legitimate worry. Additionally,
legal regulations don’t go away just because there is so much content to sift through. All data
must be obtained ethically and within the bounds of the law, which can be difficult to manage
and validate with such large quantities.
 Variable data speeds: Data sources have different infrastructures for transporting data. A particularly slow source with limited export resources can bog down the entire process and even introduce errors if its speed lags too far behind other sources.
 Ensuring data quality: Just because there is a large sum of data available doesn’t mean it’s
all relevant and useful. Having too much irrelevant, tangential or even incorrect, corrupt and
incomplete data can cause issues in analysis and processing down the line. The next step of
ETL helps address this.
Data Massaging, Cleansing and Organizing
As the data comes in, it needs to be sorted and translated appropriately before it can be used for
analysis. Because there is so much data that needs to be analyzed in big data, getting as close to
uniform organization as possible is essential to process it all in a timely manner in the actual analysis
stage. The components in the storage layer are responsible for making data readable, homogenous
and efficient.
Data arrives in different formats and schemas. It’s up to this layer to unify the organization of all
inbound data. This task will vary for each data project, whether the data is structured or unstructured.
If it’s the latter, the process gets much more convoluted.
Depending on the form of unstructured data, different types of translation need to happen. For things like social media posts, emails, letters and anything in written language, natural language processing software needs to be utilized. Formats like videos and images are broken down into chunks of pixels and audio for analysis by grouping. Once all the data is converted into readable formats, it needs to be organized into a uniform schema.
A schema simply defines the characteristics of a dataset, much like the X and Y axes of a spreadsheet or a graph. It’s a roadmap to data points. For structured data, aligning schemas is all that is needed. For unstructured and semi-structured data, semantics need to be given before the data can be properly organized. Sometimes semantics come pre-loaded in semantic tags and metadata. For example, a photo taken on a smartphone carries time and geo stamps and user/device information.
The metadata can then be used to help sort the data or give it deeper insights in the actual analytics.
Once all the data is as similar as can be, it needs to be cleansed. This means getting rid of redundant
and irrelevant information within the data.
When data comes from external sources, it’s very common for some of those sources to duplicate or
replicate each other. Often they’re just aggregations of public information, meaning there are hard
limits on the variety of information available in similar databases. Other times, the info contained in
the database is just irrelevant and must be purged from the complete dataset that will be used for
analysis.
After all the data is converted, organized and cleaned, it is ready for storage and staging for analysis.
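
A minimal cleansing pass of this kind, sketched with pandas (the column names and records are invented): align the schema to one convention, drop duplicated records, and purge incomplete or irrelevant rows.

```python
# Toy cleansing/organizing step: unify schema, deduplicate, purge bad rows.
import pandas as pd

raw = pd.DataFrame({
    "User": ["a1", "a1", "b2", "c3"],
    "Email": ["a@x.com", "a@x.com", "b@x.com", None],
    "Note": ["ok", "ok", "spam", "ok"],
})

clean = (
    raw.rename(columns=str.lower)        # align schemas to one convention
       .drop_duplicates()                # sources often replicate each other
       .dropna(subset=["email"])         # purge incomplete records
)
clean = clean[clean["note"] != "spam"]   # drop irrelevant rows
print(clean)
```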

Storage
The final step of ETL is the loading process. This is where the converted data is stored in a data lake
or warehouse and eventually processed. This is the actual embodiment of big data: a huge set of usable, homogeneous data, as opposed to simply a large collection of random, incoherent data.
Many consider the data lake/warehouse the most essential component of a big data ecosystem. It
needs to contain only thorough, relevant data to make insights as valuable as possible. It must be
efficient with as little redundancy as possible to allow for quicker processing. It needs to be accessible
with a large output bandwidth for the same reason.
Lakes differ from warehouses in that they preserve the original raw data, meaning little has been
done in the transformation stage other than data quality assurance and redundancy reduction.
Comparatively, data stored in a warehouse is much more focused on the specific task of analysis, and
is consequently much less useful for other analysis efforts. Because of the focus, warehouses store
much less data and typically produce quicker results.

This also means that a lot more storage is required for a lake, along with more significant
transforming efforts down the line. Modern capabilities and the rise of lakes have created a
modification of extract, transform and load: extract, load and transform.

Analysis
Analysis is the big data component where all the dirty work happens.
You’ve done all the work to find, ingest and prepare the raw data. Now it’s time to crunch it all together. In the analysis layer, data gets passed through several tools, shaping it into actionable insights.
There are four types of analytics on big data: diagnostic, descriptive, predictive and prescriptive.

 Diagnostic: Explains why a problem is happening. Big data allows analytics to take a deep
dive into things like customer information, marketing metrics and key performance indicators
to explain why certain actions didn’t produce the expected results. Projects are undertaken
with an expectation of certain results based on certain estimations of markets, customers and
other similar criteria. Diagnostic analytics digs into which assumed contributors didn’t meet
their projected metrics.
 Descriptive: Describes the current state of a business through historical data. It summarizes past trends in things like sales rates, seasonal impacts and more. In big data, the use of far-reaching market data and customer insights helps contextualize internal metrics and increase the intelligence of a business’s position amongst its competitors. In boiled-down terms, it answers “what” questions.
 Predictive: Projects future results based on historical data. By highlighting patterns and evaluating trajectories of relevant metrics, predictive analytics estimates future outcomes (a minimal sketch follows this subsection).
 Prescriptive: Takes predictive analytics a step further by projecting best future efforts. By
tweaking inputs and changing actions, prescriptive analytics allows businesses to decide how
to put their best foot forward. Different actions will yield different results, and prescriptive
analytics helps decision makers try to decide the best way to proceed.
Just as the ETL layer is evolving, so is the analysis layer. AI and machine learning are moving the
goalposts for what analysis can do, especially in the predictive and prescriptive landscapes. We can
now discover insights impossible to reach by human analysis.
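
As a minimal illustration of the predictive bucket referenced above, the sketch below fits a trend to invented historical sales with scikit-learn and projects it forward:

```python
# Predictive-analytics sketch: learn a trajectory from history, project ahead.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)           # historical time index
sales = np.array([10, 12, 13, 15, 14, 16, 18, 19, 21, 22, 24, 25])

model = LinearRegression().fit(months, sales)      # evaluate the trajectory
future = np.array([[13], [14], [15]])              # next quarter
print(model.predict(future))                       # projected future results
```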

Consumption
The final big data component involves presenting the information in a format digestible to the end-user. This can materialize in the form of tables, advanced visualizations and even single numbers if
requested. This is what businesses use to pull the trigger on new processes.
The most important thing in this layer is making sure the intent and meaning of the output is
understandable. Up until this point, every person actively involved in the process has been a data
scientist, or at least literate in data science. But in the consumption layer, executives and decision-
makers enter the picture. They need to be able to interpret what the data is saying.

VIII) BIG DATA IMPORTANCE AND APPLICATIONS
Big Data Importance :
The importance of Big Data doesn’t revolve around the amount of data a company has, but in how the company utilizes the gathered data.
Every company uses its collected data in its own way. The more effectively a company uses its data, the more rapidly it grows.
By analyzing big data pools effectively, companies can realize benefits such as:
Cost Savings :
o Some tools of Big Data like Hadoop can bring cost advantages to business when large amounts
of data are to be stored.
o These tools help in identifying more efficient ways of doing business.
Time Reductions :
o The high speed of tools like Hadoop and in-memory analytics makes it easy to identify new sources of data, which helps businesses analyze data immediately.
o This helps in making quick decisions based on the learnings.

o For example: By analyzing customers’ purchasing behaviours, a company can find out which products sell the most and produce products according to this trend. By this, it can get ahead of its competitors.
Control online reputation :
o Big data tools can do sentiment analysis.
o Therefore, you can get feedback about who is saying what about your company.
o If you want to monitor and improve the online presence of your business, then big data tools can help with all of this (a minimal sentiment-analysis sketch follows this subsection).
Using Big Data Analytics to Boost Customer Acquisition (purchase) and Retention :
o The customer is the most important asset any business depends on.
o No single business can claim success without first having to establish a solid customer base.
o If a business is slow to learn what customers are looking for, then it is very likely to deliver poor
quality products.
o The use of big data allows businesses to observe various customer-related patterns and trends.
Using Big Data Analytics to Solve Advertisers’ Problems and Offer Marketing Insights :
o Big data analytics can help change all business operations, like the ability to match customer expectations, changing the company’s product line, and ensuring that the marketing campaigns are powerful.
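
The sentiment-analysis point under “Control online reputation” can be sketched with the TextBlob library (the sample posts are invented; polarity runs from -1, negative, to +1, positive):

```python
# Minimal sentiment-analysis sketch over invented customer posts.
from textblob import TextBlob

posts = [
    "Love the new checkout flow, so fast!",
    "Support took three days to reply. Terrible.",
]
for post in posts:
    polarity = TextBlob(post).sentiment.polarity  # -1 (negative) .. +1 (positive)
    print(f"{polarity:+.2f}  {post}")
```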

Big Data Applications :


In today’s world, big data has several applications; some of them are listed below :
Tracking Customer Spending Habits and Shopping Behaviour :
In big retail stores, the management team has to keep data on customers’ spending habits, shopping behaviour, most-liked products, and which products are searched for or sold most; based on that data, the production/collection rate of each product is fixed.
Recommendation :
By tracking customer spending habits and shopping behaviour, big retail stores provide recommendations to their customers.
Smart Traffic System :
Data about the traffic conditions on different roads is collected through cameras and GPS devices placed in vehicles.
All such data is analyzed, and jam-free or less congested, less time-consuming routes are recommended. An additional benefit is that fuel consumption can be reduced.
Secure Air Traffic System :
Sensors are present at various parts of an aircraft.
These sensors capture data like the speed of the flight, moisture, temperature, and other environmental conditions.
Based on analysis of such data, environmental parameters within the flight are set up and varied.
By analyzing the flight’s machine-generated data, it can be estimated how long the machine can operate flawlessly and when it should be replaced or repaired.
Auto Driving Car :
At various spots on the car, cameras and sensors are placed that gather data like the size of surrounding cars, obstacles, and the distance from them.
These data are analyzed and various calculations are carried out. These calculations help the car take action automatically.
Virtual Personal Assistant Tool :
Big data analysis helps virtual personal assistant tools like Siri, Cortana and Google Assistant provide answers to the various questions asked by users.
These tools track the location of the user, their local time, the season, other data related to the question asked, etc. Analyzing all such data, they provide an answer.
Example: Suppose a user asks “Do I need to take an umbrella?” The tool collects data like the location of the user and the season and weather conditions at that location, then analyzes these data to conclude whether there is a chance of rain, and provides the answer.
IoT :
Manufacturing companies install IoT sensors in machines to collect operational data.
By analyzing such data, it can be predicted how long a machine will work without any problem and when it will require repair.
Thus, the cost of replacing the whole machine can be saved.
Education Sector :
Organizations conducting online educational courses utilize big data to find candidates interested in a course.
If someone searches for a YouTube tutorial video on a subject, then an online or offline course provider organization for that subject sends that person an online ad about their course.
Media and Entertainment Sector :
Media and entertainment service providers like Netflix, Amazon Prime and Spotify analyze data collected from their users.
Data like what types of videos and music users watch or listen to most, how long users spend on the site, etc. are collected and analyzed to set the next business strategy.

IX) BIG DATA FEATURES - SECURITY, COMPLIANCE, AUDITING AND PROTECTION

BIG DATA SECURITY :
Big data security is the collective term for all the measures and tools used to guard both the data
and analytics processes from attacks, theft, or other malicious activities that could harm or
negatively affect them.

For companies that operate on the cloud, big data security challenges are multi-faceted.

When customers give their personal information to companies, they trust them with personal data
which can be used against them if it falls into the wrong hands.

BIG DATA COMPLIANCE :

Data compliance is the practice of ensuring that sensitive data is organized and managed in such
a way as to enable organizations to meet enterprise business rules along with legal and
governmental regulations.

Organizations that don’t comply with these regulations can be fined up to tens of millions of dollars and can even receive a 20-year penalty.

BIG DATA AUDITING :

Auditors can use big data to expand the scope of their projects and draw comparisons over larger
populations of data.

Big data also helps financial auditors to streamline the reporting process and detect fraud.

These professionals can identify business risks in time and conduct more relevant and accurate
audits.

BIG DATA PROTECTION :

Data protection refers to the measures and tools used to guard data against loss, corruption, theft, or misuse.

When customers give their personal information to companies, they trust them with personal data which can be used against them if it falls into the wrong hands.

That’s why data privacy exists: to protect those customers, but also companies and their employees, from security breaches.

Data protection is also important because organizations that don’t comply with these regulations can be fined up to tens of millions of dollars and can even receive a 20-year penalty.

X) BIG DATA PRIVACY AND ETHICS

Most data is collected through surveys, interviews, or observation.


When customers give their personal information to companies, they trust them with personal data
which can be used against them if it falls into the wrong hands.

That’s why data privacy is there to protect those customers but also companies and their employees
from security breaches.

One of the main reasons why companies comply with data privacy regulations is to avoid fines.

Organizations that don’t comply with these regulations can be fined up to tens of millions of dollars and can even receive a 20-year penalty.

The reasons why we need to take data privacy seriously are :

• Data breaches could hurt your business.


• Protecting your customers’ privacy
• Maintaining and improving brand value
• It gives you a competitive advantage
• It supports the code of ethics

XI) BIG DATA ANALYTICS

Big data analytics is a complex process of examining big data to uncover information such as hidden patterns, correlations, market trends and customer preferences.

This can help organizations make informed business decisions.

Data Analytics technologies and techniques give organizations a way to analyze data sets and gather
new information.

Big Data Analytics enables enterprises to analyze their data in full context quickly and some also
offer real-time analysis.

Importance of Big Data Analytics :

Organizations use big data analytics systems and software to make data-driven decisions that can
improve business-related outcomes.

The benefits include more effective marketing, new revenue opportunities, customer personalization
and improved operational efficiency.

With an effective strategy, these benefits can provide competitive advantages over rivals.
Big Data Analytics tools also help businesses save time and money and aid in gaining insights to
inform data-driven decisions.

Big Data Analytics enables enterprises to narrow their Big Data to the most relevant information
and analyze it to inform critical business decisions.

XII) CHALLENGES OF CONVENTIONAL SYSTEMS

• Big data is the storage and analysis of large data sets.


• These are complex data sets that can be either structured or unstructured.
• They are so large that it is not possible to work on them with traditional analytical tools.
• One of the major challenges of conventional systems was the uncertainty of the
Data Management Landscape.
• Big data is continuously expanding, and new companies and technologies are being developed every day.
• A big challenge for companies is to find out which technology works best for them without introducing new risks and problems.
• These days, organizations are realising the value they get out of big data analytics and
hence they are deploying big data tools and processes to bring more efficiency in their
work environment.

XIII) INTELLIGENT DATA ANALYSIS, NATURE OF DATA


Intelligent Data Analysis (IDA) is one of the most important approaches in the field of data mining.
Based on the basic principles of IDA and the features of datasets that IDA handles, the
development of IDA is briefly summarized from three aspects :

• Algorithm principle
• The scale
• Type of the dataset
Intelligent Data Analysis (IDA) is one of the major topics in artificial intelligence and information science.
Intelligent data analysis discloses hidden facts that were not previously known and provides potentially important information or facts from large quantities of data.
It also helps in making decisions.
Based on machine learning, artificial intelligence, pattern recognition, and records and visualization technology, IDA helps to obtain useful information, necessary data and interesting models from the large amounts of data available online in order to make the right choices.
IDA includes three stages:
(1) Preparation of data
(2) Data mining

(3) Data validation and Explanation
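
A toy walk through the three stages with scikit-learn (the data and parameters are purely illustrative): prepare by scaling, mine by clustering, then validate the discovered structure.

```python
# IDA stages sketch: (1) prepare, (2) mine, (3) validate and explain.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# (1) Preparation of data: scale raw measurements to a common range.
raw = np.array([[1.0, 200], [1.2, 210], [8.0, 900], [8.3, 880]])
X = StandardScaler().fit_transform(raw)

# (2) Data mining: discover structure that was not known beforehand.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# (3) Validation and explanation: check that the discovered groups are real.
print("clusters:", labels, "silhouette:", silhouette_score(X, labels))
```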

XIV) ANALYTIC PROCESSES AND TOOLS


Big Data Analytics is the process of collecting large chunks of structured/unstructured data,
segregating and analyzing it and discovering the patterns and other useful business insights from
it.
These days, organizations are realising the value they get out of big data analytics and hence
they are deploying big data tools and processes to bring more efficiency in their work environment.
Many big data tools and processes are being utilised by companies these days in the processes
of discovering insights and supporting decision making.
Big data processing is a set of techniques or programming models used to access large-scale data to extract useful information for supporting and providing decisions.
Below is a list of some of the data analytics tools most used in the industry :

• R Programming (Leading Analytics Tool in the industry)


• Python
• Excel
• SAS
• Apache Spark
• Splunk
• RapidMiner
• Tableau Public
• KNIME

XV) ANALYSIS VS REPORTING


Reporting :

• Once data is collected, it will be organized using tools such as graphs and tables.
• The process of organizing this data is called reporting.
• Reporting translates raw data into information.
• Reporting helps companies to monitor their online business and be alerted when data falls
outside of expected ranges.
• Good reporting should raise questions about the business from its end users.
Analysis :

• Analytics is the process of taking the organized data and analyzing it.
• This helps users to gain valuable insights on how businesses can improve their performance.
• Analysis transforms data and information into insights.
• The goal of the analysis is to answer questions by interpreting the data at a deeper
level and providing actionable recommendations.
Conclusion :

• Reporting shows us “what is happening”.


• The analysis focuses on explaining “why it is happening” and “what we can do about it”.
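
A toy pandas contrast of the two (the revenue figures are invented): the report organizes raw data into a table answering “what is happening”, while the analysis interprets it to get at “why”.

```python
# Reporting vs analysis on the same invented data.
import pandas as pd

df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "channel": ["web", "web", "web", "store", "store", "store"],
    "revenue": [100, 90, 60, 80, 82, 81],
})

# Reporting: organize raw data into a table -- "what is happening".
report = df.pivot_table(index="month", columns="channel", values="revenue")
print(report)

# Analysis: interpret it at a deeper level -- "why": web revenue is falling
# month over month while store revenue holds steady.
web = df[df["channel"] == "web"]
print("web month-over-month change:", web["revenue"].diff().dropna().tolist())
```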

XVI) MODERN DATA ANALYTIC TOOLS


• These days, organizations are realising the value they get out of big data analytics and
hence they are deploying big data tools and processes to bring more efficiency to their
work environment.
• Many big data tools and processes are being utilised by companies these days in the
processes of discovering insights and supporting decision making.
• Data Analytics tools are types of application software that retrieve data from one or more
systems and combine it in a repository, such as a data warehouse, to be reviewed and
analysed.
• Most organizations use more than one analytics tool including spreadsheets with statistical
functions, statistical software packages, data mining tools, and predictive modelling tools.
• Together, these Data Analytics Tools give the organization a complete overview of the
company to provide key insights and understanding of the market/business so smarter
decisions may be made.
• Data analytics tools not only report the results of the data but also explain why the results
occurred to help identify weaknesses, fix potential problem areas, alert decision-makers to
unforeseen events and even forecast future results based on decisions the company might
make.
• Below is a list of some data analytics tools :
• R Programming (Leading Analytics Tool in the industry)
• Python
• Excel
• SAS
• Apache Spark
• Splunk
• RapidMiner
• Tableau Public
