Unit - I DA

The document provides an overview of data analytics, including its definitions, classifications, and characteristics, with a focus on Big Data. It outlines the data analytics lifecycle, detailing phases such as discovery, data preparation, model planning, and operationalization, while emphasizing the need for various roles in successful analytics projects. Additionally, it contrasts traditional analytics with Big Data analytics and discusses the importance of modern tools and technologies for managing large datasets.

Uploaded by

ianurags2509
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
49 views107 pages

Unit - I DA

The document provides an overview of data analytics, including its definitions, classifications, and characteristics, with a focus on Big Data. It outlines the data analytics lifecycle, detailing phases such as discovery, data preparation, model planning, and operationalization, while emphasizing the need for various roles in successful analytics projects. Additionally, it contrasts traditional analytics with Big Data analytics and discusses the importance of modern tools and technologies for managing large datasets.

Uploaded by

ianurags2509
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 107

Unit - I

• Introduction to Data Analytics: Sources and nature of data, classification of data (structured, semi-structured, unstructured), characteristics of data, introduction to Big Data platform, need of data analytics, evolution of analytic scalability, analytic process and tools, analysis vs reporting, modern data analytic tools, applications of data analytics.
• Data Analytics Lifecycle: Need, key roles for
successful analytic projects, various phases of data
analytics lifecycle – discovery, data preparation,
model planning, model building, communicating
results, operationalization.
Source of Data
What is Big data
CLASSIFICATION OF DATA
• Data classification is broadly defined as the process of organizing data by relevant categories so that it may be used and protected more efficiently. At a basic level, classification makes data easier to locate and retrieve. Data classification is of particular importance for risk management, compliance, and data security.
• Big Data involves huge volume, high velocity, and an extensible variety of data. It falls into three classes: structured data, semi-structured data, and unstructured data.
Structured data
• Structured data is data whose elements are addressable for effective analysis. It is organized into a formatted repository, typically a database: data that can be stored in a SQL database in tables with rows and columns. Such data have relational keys and can easily be mapped into pre-designed fields. Structured data is the easiest kind to process and manage. Example: relational data.
Unstructured data
• Unstructured data is data that is not organized in a pre-defined manner and does not follow a pre-defined data model, so it is not a good fit for a mainstream relational database. Alternative platforms exist for storing and managing unstructured data; it is increasingly prevalent in IT systems and is used by organizations in a variety of business intelligence and analytics applications. Examples: Word documents, PDFs, text files, media logs.
Semi-Structured data
• Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database (though this can be very hard for some kinds of semi-structured data), but semi-structured formats are often kept as-is, since their embedded tags already provide usable structure. Example: XML data.
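To make the contrast concrete, here is a minimal sketch (using a hypothetical <orders> XML snippet, not data from any real system) of how the tags in semi-structured data act like field names, so values can be addressed directly even without a fixed relational schema:

```python
import xml.etree.ElementTree as ET

# A small made-up XML fragment: no table schema, but the tags
# give each value an addressable name.
xml_data = """
<orders>
    <order id="1"><customer>Asha</customer><total>250.0</total></order>
    <order id="2"><customer>Ravi</customer><total>120.5</total></order>
</orders>
"""

root = ET.fromstring(xml_data)
# The <total> tags let us pull out every order amount directly.
totals = [float(order.find("total").text) for order in root.iter("order")]
print(sum(totals))  # 370.5
```

Doing the same with truly unstructured data (say, free-text emails describing orders) would require text mining first, which is exactly why semi-structured data sits between the two extremes.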
CHARACTERISTICS OF DATA
Volume
Variety
Veracity
Value
Velocity
How Much Data
CERN's Large Hadron Collider
TYPES OF DATA
VARIETIES BIG DATA COLLECTED
What is Big data
• Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population. – Merv Adrian
• "Big data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. – McKinsey Global Institute
Summary
• Definition of Data Analytics
• Data Analytics vs Data Mining
• Definitions to Big data
• Classifications of Big data
• Characteristics of big data
• Applications of Big data
TRADITIONAL ANALYTICS VS BIG
DATA ANALYTICS
360-degree view
EVOLUTION OF ANALYTIC
SCALABILITY
• The amount of data organizations process continues to increase, and the old methods for handling data no longer work efficiently.
• Important technologies for handling big data include:
MPP (Massively Parallel Processing)
The cloud
Grid computing
MapReduce
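The MapReduce model listed above can be sketched in a few lines: map each input record to (key, value) pairs, shuffle the pairs by key, then reduce each group. This is a single-process illustration of the programming model only, not a distributed implementation, and the word-count task and input lines are made up:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the line.
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    # Reduce: collapse one key's values into a single result.
    return (word, sum(counts))

lines = ["big data big analytics", "data data tools"]

mapped = [pair for line in lines for pair in map_phase(line)]
# Shuffle: bring all pairs with the same key together.
mapped.sort(key=itemgetter(0))
result = dict(reduce_phase(w, (c for _, c in group))
              for w, group in groupby(mapped, key=itemgetter(0)))
print(result)  # {'analytics': 1, 'big': 2, 'data': 3, 'tools': 1}
```

In a real cluster the map calls run in parallel on many nodes and the shuffle moves data across the network, but the map/shuffle/reduce contract is the same.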
MODERN DATABASE
ARCHITECTURE
Massively Parallel Processing
What is cloud computing ?
Grid Computing
Map Reduce
Working process
Good & Bad
Technologies can integrate and work
together
Evolution of Analytical Processes
Definition of Analytical framework
An internal Configuration
An External Configuration
A Hybrid Configuration
Benefits
Definition of ADS
The data that is pulled together in order to create an analysis or model:
• In the format required for the specific analysis at hand
• Generated by transforming, aggregating, and combining data
• Helps to bridge the gap between efficient storage and ease of use
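As a minimal sketch of those three points, the following builds a tiny analysis data set by aggregating raw transaction rows and combining them with a second source, producing one row per customer in the shape a model would consume. All field names and values here are illustrative, not from any real data set:

```python
from collections import defaultdict

# Raw inputs: transaction rows plus a separate demographic lookup.
transactions = [
    {"customer": "C1", "amount": 100.0},
    {"customer": "C1", "amount": 50.0},
    {"customer": "C2", "amount": 75.0},
]
demographics = {"C1": {"region": "North"}, "C2": {"region": "South"}}

# Aggregate: total spend and visit count per customer.
agg = defaultdict(lambda: {"total_spend": 0.0, "visits": 0})
for t in transactions:
    agg[t["customer"]]["total_spend"] += t["amount"]
    agg[t["customer"]]["visits"] += 1

# Combine: join the aggregates with the demographic attributes,
# yielding one analysis-ready row per customer.
ads = [{"customer": c, **metrics, **demographics[c]} for c, metrics in agg.items()]
print(ads[0])  # {'customer': 'C1', 'total_spend': 150.0, 'visits': 2, 'region': 'North'}
```

The raw tables stay in efficient storage; the ADS is the analysis-friendly view derived from them, which is the gap-bridging role described above.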
Two Primary kinds of Analytics Data sets
Traditional Analytics data sets
Enterprise Analytic Data Set
EDA Set - Structure
Summary Table or View?
Embedded Scoring
Model and Score Management
• Model and score management procedures will need to
be in place to scale the use of models by an
organization.
REPORTING Vs ANALYSIS
• Reporting: The process of organizing data into
informational summaries in order to monitor how
different areas of a business are performing.

– They select the reports they want to run
– Get the reports executed
– View the results
• Analysis: The process of exploring data and reports
in order to extract meaningful insights, which can be
used to better understand and improve business
performance.

– Track the problem
– Find the data required
– Analyze the data
– Interpret the results
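The contrast can be shown on one (made-up) sales data set: reporting produces a fixed summary to monitor performance, while analysis explores further to surface an insight the summary alone would hide. The regions, channels, and numbers below are purely illustrative:

```python
sales = [
    {"region": "North", "channel": "web",   "revenue": 120},
    {"region": "North", "channel": "store", "revenue": 80},
    {"region": "South", "channel": "web",   "revenue": 40},
    {"region": "South", "channel": "store", "revenue": 160},
]

# Reporting: a standard summary of revenue by region.
report = {}
for row in sales:
    report[row["region"]] = report.get(row["region"], 0) + row["revenue"]
print(report)  # {'North': 200, 'South': 200}

# Analysis: the regional totals match, but exploring the channel mix
# reveals a large difference the report hides.
web_share = {
    region: sum(r["revenue"] for r in sales
                if r["region"] == region and r["channel"] == "web") / report[region]
    for region in report
}
print(web_share)  # {'North': 0.6, 'South': 0.2}
```

The report answers "what happened"; the analysis step digs into "why", which is where actionable insight comes from.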
Difference
Making Inference
• To produce a great analysis, it is necessary to infer potential actions
– Make initial inferences based on the analysis
– Visualization plays a vital role in understanding
– An effective visualization can bring out many more inferences
– Today's visualization tools allow multiple tabs and can link graphs and charts
– A newer direction for visualization is 3-D
Applications
• Open source software has been around for some time
– In many cases, open source products are outside the mainstream
• Many individuals contribute to improving the functionality
– Bugs can be patched quickly
Data Analytics Lifecycle
• Big Data analysis differs from traditional data analysis primarily due to the volume, velocity, and variety characteristics of the data being processed.
• To address the distinct requirements for performing analysis on Big Data, a step-by-step methodology is needed to organize the activities and tasks involved with acquiring, processing, analyzing, and repurposing data.
Key Roles for a Successful Analytics
Project
• Business User – understands the domain area
• Project Sponsor – provides requirements
• Project Manager – ensures meeting objectives
• Business Intelligence Analyst – provides business
domain expertise based on deep understanding of the
data
• Database Administrator (DBA) – creates DB
environment
• Data Engineer – provides technical skills, assists data
management and extraction, supports analytic sandbox
• Data Scientist – provides analytic techniques and
modeling
Data Analytics Lifecycle (cont..)
• The data analytics lifecycle is designed for Big Data problems and data science projects
• The cycle is iterative to represent a real project
• Work can return to earlier phases as new information is uncovered
Data Analytics Lifecycle-Abstract
View
Discovery
• In this phase, the data science team must learn and investigate the problem, develop context and understanding, and learn about the data sources needed and available for the project.
• In addition, the team formulates initial hypotheses that can later be tested with data.
• The team should perform five main activities during this step
of the discovery.
• Identify data sources: Make a list of data sources the team
may need to test the initial hypotheses outlined in this phase.
Make an inventory of the datasets currently available and
those that can be purchased or otherwise acquired for the
tests the team wants to perform.
• Capture aggregate data sources: This is for previewing the
data and providing high-level understanding.
It enables the team to gain a quick overview of the data and
perform further exploration on specific areas.
• Review the raw data: Begin understanding the
interdependencies among the data attributes.
Become familiar with the content of the data, its quality,
and its limitations
• Evaluate the data structures and tools needed: The data
type and structure dictate which tools the team can use to
analyze the data.
• Scope the sort of data infrastructure needed for this type of
problem: In addition to the tools needed, the data influences
the kind of infrastructure that's required, such as disk storage
and network capacity.
• Unlike many traditional stage-gate processes, in which the
team can advance only when specific criteria are met, the Data
Analytics Lifecycle is intended to accommodate more
ambiguity
• For each phase of the process, it is recommended to pass
certain checkpoints as a way of gauging whether the team is
ready to move to the next phase of the Data Analytics
Lifecycle.
Data preparation
• This phase includes steps to explore, preprocess, and condition data prior to modeling and analysis.
• It requires the presence of an analytic sandbox (workspace), in which the team can work with data and perform analytics for the duration of the project.
✔ The team needs to execute Extract, Load, and Transform (ELT) or Extract, Transform, and Load (ETL) to get data into the sandbox.
✔ In ETL, users perform processes to extract data from a datastore, perform data transformations, and load the data back into the datastore.
✔ The combination of ELT and ETL is sometimes abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it.
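A minimal ETL sketch of getting data into a sandbox follows, using an in-memory SQLite database as a stand-in for the sandbox datastore. The source records, table name, and cleaning rules are all illustrative assumptions, not part of any specific methodology:

```python
import sqlite3

# Extract: raw records pulled from a source system (hard-coded here).
raw = [("2024-01-05", "  Alice ", "100"), ("2024-01-06", "bob", "250")]

# Transform: condition the data — trim and normalize names, cast amounts.
cleaned = [(date, name.strip().title(), float(amount)) for date, name, amount in raw]

# Load: place the conditioned data into the sandbox for analysis.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sandbox_sales (sale_date TEXT, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO sandbox_sales VALUES (?, ?, ?)", cleaned)

total = conn.execute("SELECT SUM(amount) FROM sandbox_sales").fetchone()[0]
print(total)  # 350.0
```

In an ELT variant the raw rows would be loaded first and the cleanup expressed as SQL inside the sandbox; either way, the team ends up with conditioned data it can analyze for the duration of the project.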
Data preparation (Cont.,)
Data preparation (Cont.,)
Common Tools for the Data
Preparation Phase
Model Planning
Common Tools for the Model
Planning Phase
Model Building
Communicate Results
Operationalize
Common Tools for the Model
Building Phase
Key outputs for each of the main
stakeholders
