Data Analytics

Data analytics involves collecting, processing, and analyzing data from various sources, categorized into primary, secondary, internal, and external data. It encompasses structured, semi-structured, and unstructured data, each requiring different storage and processing methods. The need for data analytics arises from its ability to inform decision-making, improve efficiency, and enhance customer experience in today's data-driven landscape.


Data Analytics

Sources and Nature of Data in Data Analytics
Data analytics involves collecting, processing, and analyzing
data from various sources. These sources can be categorized
into different types:
Data Sources
• Primary Data: Data that is collected first-hand through direct
interaction or original research. Methods include:
• Surveys: Questionnaires, online polls, or face-to-face interactions.
• Interviews: Direct conversations with subjects to gather
qualitative insights.
• Experiments: Controlled conditions to observe specific outcomes.
Sources of Data
• Secondary Data: Data collected by someone else, often available
publicly or through licensed databases. Examples include:
• Government Reports: Statistical data from census, economic surveys, etc.
• Research Papers: Academic journals, white papers.
• Market Data: Purchased or publicly available industry reports, financial data.
• Internal Data: Data generated from within an organization’s own
processes. This can include:
• Sales Data: Information from point-of-sale systems, CRM platforms.
• Operational Data: Logs from manufacturing, inventory, and supply chains.
• Customer Data: Behavioral data from customer interactions, user feedback.
Sources of Data

• External Data: Data obtained from outside the
organization, such as:
• Public Databases: Open data portals (e.g., government or
world health databases).
• Social Media: User-generated content, sentiment analysis,
trends.
• Third-party Data Providers: Data purchased from vendors.
Nature of Data
• Data used in analytics can come in different forms and qualities. It's important to
understand its characteristics:
• Structured Data: Organized data that follows a specific format, easily stored in
databases (e.g., SQL databases). Examples include:
• Tables of numbers: Sales figures, financial statements.
• Log data: User interaction logs, transaction logs.
• Unstructured Data: Data that doesn’t fit into predefined models or structures,
requiring more complex processing. Examples include:
• Text: Emails, social media posts, documents.
• Multimedia: Images, audio, and video files.
• Semi-Structured Data: Data that is not fully structured but has some organizational
properties, often stored in formats like XML or JSON. Examples include:
• Sensor Data: IoT device logs, environmental readings.
• Web Data: Web scraping results, clickstream data.
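The contrast between semi-structured and structured data can be illustrated with a short Python sketch: a nested JSON record (the kind produced by clickstream or IoT logging) is parsed and flattened into the fixed-column row a relational store expects. The record and field names here are invented for illustration.

```python
import json

# A hypothetical semi-structured clickstream record, as it might arrive
# from web scraping or an IoT device log (field names are illustrative).
raw = '{"user": "u42", "page": "/home", "meta": {"device": "mobile", "ms": 120}}'

record = json.loads(raw)  # parse JSON into nested Python dicts

# Flatten the nested structure into a fixed set of columns: the kind of
# row a structured (relational) store can hold directly.
row = {
    "user": record["user"],
    "page": record["page"],
    "device": record["meta"]["device"],
    "load_ms": record["meta"]["ms"],
}
print(row)
```

The flattening step is exactly what makes semi-structured data harder to store than structured data: the schema has to be decided by the analyst rather than being given in advance.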
Data Collection Methods
• Manual Collection: Human-driven, such as filling out surveys or manually
inputting data.
• Automated Collection: Data gathered through software tools, sensors, or
web scraping.

Understanding the sources and nature of data is critical for ensuring
that the analytics process yields meaningful insights and drives
decision-making.
Classification of Data in Data Analytics
• Data in analytics can be classified based on its structure, which
impacts how it is stored, processed, and analyzed. The three main
types of data classification are structured, semi-structured, and
unstructured data.

Data Type            | Structure                        | Examples                                   | Storage Method
Structured Data      | Predefined format (rows/columns) | SQL databases, financial records, POS data | Relational Databases (SQL, Oracle)
Semi-Structured Data | Partially organized (tags/keys)  | JSON, XML, log files, NoSQL documents      | Document Stores (MongoDB, CouchDB)
Unstructured Data    | No specific structure            | Text documents, multimedia, social media   | Data Lakes, Cloud Storage (HDFS)
Characteristics of Data in
Data Analytics
• Understanding the characteristics of data is essential in data
analytics, as it affects the way data is collected, processed,
stored, and analyzed. Below are the key characteristics that
define data in the context of analytics.
• Accuracy: Accuracy refers to how closely the data
represents the true values or conditions of the entities being
measured. High accuracy ensures that the data is reliable
and can be used confidently in analysis and decision-making.
• Example: A weather sensor measuring the correct temperature
without any deviations or errors.
Characteristics of Data in
Data Analytics
• Completeness: Completeness refers to whether all necessary data is
available. Missing or incomplete data can lead to inaccurate results
or biased conclusions in the analysis process.
• Example: A customer database where all customers have complete
information (name, address, phone number, etc.) vs. a database where
some key information is missing.
• Consistency: Consistency ensures that data across different sources
or datasets follows the same formats, conventions, and units of
measurement. It ensures that data does not conflict when combined
from multiple sources.
• Example: Sales data across different branches of a company recorded in the
same currency, with uniform product codes.
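The completeness and consistency checks described above can be sketched in a few lines of plain Python. The toy customer records and the "upper-case country code" rule are invented for illustration; real pipelines apply the same logic with tools like Pandas.

```python
# A minimal sketch of completeness and consistency checks on a toy
# customer dataset (records and rules are invented for illustration).
customers = [
    {"name": "Asha", "phone": "555-0101", "country": "IN"},
    {"name": "Ben", "phone": None, "country": "in"},  # missing phone, odd casing
]

required = ("name", "phone", "country")

# Completeness: every required field must be present and non-empty.
incomplete = [c for c in customers if any(not c.get(f) for f in required)]

# Consistency: enforce one convention (upper-case country codes) across records.
for c in customers:
    c["country"] = c["country"].upper()

print(len(incomplete))                    # records failing the completeness check
print({c["country"] for c in customers})  # a single, consistent convention
```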
Big Data
• Big Data refers to datasets that are so large and complex that they
cannot be processed using conventional data management tools.
The key characteristics of Big Data are:
• Volume: The size of the data is extremely large, often ranging from
terabytes to petabytes and beyond.
• Velocity: Data is generated and processed at high speed, often in real
time or near-real-time.
• Variety: Data comes in different formats, such as structured, semi-structured,
and unstructured data (e.g., text, images, videos, social
media posts).
Big Data

Other characteristics include:
• Veracity: The trustworthiness and reliability of data; unstructured
data in particular is often noisy or uncertain.
• Value: The potential insights or business value derived from
Big Data analytics.
Components of a Big Data
Platform
• A Big Data platform typically includes a variety of tools and
technologies to handle the challenges posed by Big Data.
Some key components are:
• Data Storage:
• Distributed File Systems: Big Data platforms often use distributed
storage solutions to handle large datasets.
• Examples include the Hadoop Distributed File System (HDFS) and cloud
storage solutions (Amazon S3, Google Cloud Storage).
• NoSQL Databases: These databases (e.g. MongoDB) are designed
to store and manage semi-structured and unstructured data.
Components of a Big Data
Platform
• Data Processing Frameworks:
• Batch Processing: Frameworks like Apache Hadoop allow for the
processing of large datasets in batches.
• Stream Processing: Tools like Apache Kafka and Apache Flink enable
real-time or near-real-time processing of continuous data streams.
• Data Management Tools:
• Data Integration: Tools like Apache NiFi and Talend help in integrating
various data sources, allowing data to be collected, cleaned, and
transformed for analysis.
• Data Governance: Ensures data security, privacy, and compliance with
regulations using tools like Apache Ranger or AWS Lake Formation.
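The stream-processing idea above can be mimicked in miniature with Python generators: events are filtered and forwarded one at a time as they "arrive", rather than loaded as a batch. The event amounts and fraud threshold are invented; platforms like Kafka and Flink do this at scale, distributed and fault-tolerant.

```python
# A toy stand-in for a stream-processing pipeline: events flow through
# and are filtered one at a time instead of being processed as a batch.
# (Real platforms like Kafka or Flink distribute this across machines.)

def event_stream():
    """Yield hypothetical transaction amounts as they 'arrive'."""
    for amount in [120.0, 15.5, 990.0, 42.0, 1500.0]:
        yield amount

def flag_large(stream, threshold=1000.0):
    """Emit only events above a threshold, e.g. for fraud review."""
    for amount in stream:
        if amount > threshold:
            yield amount

flagged = list(flag_large(event_stream()))
print(flagged)  # events that would be routed onward for review
```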
Components of a Big Data
Platform
• Data Analytics Tools:
• Big Data Querying: Apache Hive and Apache Impala allow users
to query large datasets using SQL-like queries.
• Machine Learning Frameworks: Big Data platforms often
integrate with machine learning libraries (e.g., Apache Spark
MLlib, TensorFlow) for advanced predictive analytics and data
modeling.
• Data Visualization: Tools like Tableau, Power BI, and
Apache Superset provide visual representations of large
datasets, allowing for easier interpretation and insights.
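The SQL-like querying that Hive and Impala provide over large datasets can be previewed at small scale with Python's built-in sqlite3 module. The `sales` table and its values are invented; the point is that the same declarative aggregate query applies whether the engine is SQLite or a distributed warehouse.

```python
import sqlite3

# An in-memory database standing in for a warehouse table; Hive/Impala
# run essentially this kind of SQL over distributed storage instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("south", 250.0), ("north", 50.0)],
)

# A declarative aggregate query: total sales per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)
conn.close()
```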
Big Data Technologies
• Apache Hadoop: One of the most widely used frameworks, Hadoop enables
distributed storage (HDFS) and distributed processing (MapReduce) of large datasets.
• Apache Spark: A powerful data processing engine that supports both batch and real-
time analytics. Spark is often used for large-scale machine learning, graph processing,
and stream analytics.
• NoSQL Databases: Unlike traditional relational databases, NoSQL databases such as
MongoDB, Cassandra, and HBase can handle unstructured or semi-structured data,
providing scalability and flexibility.
• Apache Kafka: A distributed streaming platform that enables the ingestion of real-
time data streams for processing.
• Cloud-based Big Data Platforms: Major cloud providers such as Amazon Web Services
(AWS), Google Cloud Platform (GCP), and Microsoft Azure offer scalable Big Data
solutions, including data lakes, distributed computing, and machine learning services.
Need for Data Analytics
• Data analytics is essential in today's digital age for several reasons, offering key benefits
across industries and organizations:
• Informed Decision-Making: Analytics helps businesses make data-driven decisions by
uncovering trends, patterns, and insights from large datasets.
• Example: Companies use sales data to optimize product pricing and marketing
strategies.
• Improving Efficiency: By analyzing operational data, businesses can identify bottlenecks,
reduce costs, and improve overall efficiency.
• Example: Manufacturers use predictive analytics for maintenance, reducing downtime.
• Enhancing Customer Experience: Data analytics provides insights into customer
preferences, allowing businesses to personalize services and improve customer
satisfaction.
• Example: E-commerce platforms recommend products based on user behavior and
purchase history.
Need for Evolution of Analytics
Scalability

• The evolution of analytics scalability is driven by the
increasing complexity, volume, and diversity of data in
today's digital landscape. The need for scalable analytics
arises from the following factors:
• Growing Data Volume: With the explosion of data from social
media, IoT devices, and other sources, traditional systems
cannot handle the massive datasets. Scalable analytics
enables processing and analyzing large data efficiently.
• Example: A retailer analyzing billions of transactions to optimize
inventory and supply chain.
• Real-Time Analytics: As businesses demand quicker
insights for real-time decision-making, scalable systems are
essential to process high-velocity data streams without
delays.
• Example: Financial institutions using real-time analytics to detect
fraud instantly.
• Variety of Data: Data comes in structured, semi-
structured, and unstructured formats (e.g., text, images,
videos), requiring scalable analytics platforms that can
process and integrate diverse data sources.
• Example: Social media analytics that combine text, image, and
video content for sentiment analysis.
Analytic Process and
Tools
• The data analytics process consists of several steps that help in
turning raw data into actionable insights:
• Data Collection: Gathering data from various sources such as
databases, sensors, social media, or surveys.
• Tools: Web scraping tools (e.g., Scrapy), database management systems
(e.g., MySQL, MongoDB).
• Data Cleaning: Preparing the data by handling missing values,
correcting errors, and removing duplicates to ensure data quality.
• Tools: OpenRefine, Trifacta, Pandas (Python).
• Data Exploration: Analyzing the data to understand its patterns and
distributions using descriptive statistics or visualizations.
• Tools: Excel, Tableau, Power BI, Python libraries (e.g., Matplotlib, Seaborn).
• Data Modelling: Applying statistical models, machine learning
algorithms, or predictive analytics to extract meaningful insights.
• Tools: R, Python (e.g., Scikit-learn, TensorFlow), SAS, SPSS.
• Data Interpretation: Drawing conclusions from the analysis and
translating them into actionable business strategies or solutions.
• Tools: Visualization tools (e.g., Power BI, Tableau), reporting tools (e.g., Google
Data Studio).
• Deployment & Monitoring: Implementing the analytics models in real-
world scenarios and continuously monitoring the results for
improvements.
• Tools: Apache Kafka (real-time processing), Jenkins (automation), cloud
platforms (e.g., AWS, Azure).
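The cleaning and exploration steps above can be sketched end to end in plain Python. The dataset is invented, with the usual defects (a missing value and duplicates); in practice tools like Pandas and OpenRefine perform the same operations at scale.

```python
import statistics

# Raw observations with typical quality problems: a missing value
# and duplicate entries (values are invented for illustration).
raw = [10.0, 12.5, None, 12.5, 14.0, 10.0, 11.5]

# Data cleaning: drop missing values, then remove duplicates
# while preserving the original order.
cleaned = [x for x in raw if x is not None]
seen, deduped = set(), []
for x in cleaned:
    if x not in seen:
        seen.add(x)
        deduped.append(x)

# Data exploration: simple descriptive statistics.
print(len(deduped))              # remaining observations
print(statistics.mean(deduped))  # central tendency
print(statistics.median(deduped))
```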
