Data Similarity and Dissimilarity

Data science is a multi-disciplinary field that utilizes scientific methods and algorithms to extract insights from both structured and unstructured data. The data science life cycle includes stages such as data acquisition, pre-processing, model building, pattern evaluation, and knowledge representation. It employs various analyses like descriptive, diagnostic, predictive, and prescriptive to inform business decisions and optimize outcomes.


Data Similarity and Dissimilarity
What is Data Science?
• A multi-disciplinary field that uses scientific methods, algorithms,
processes, and systems to extract knowledge and insights from
structured and unstructured data.
What is Data Science?

• A “concept to unify statistics, data analysis, machine learning and their
related methods” in order to “understand and analyze actual
phenomena with data”.
• Employs various techniques and theories drawn from many fields within
mathematics, statistics, computer science, and information science.
Life Cycle of Data Science
Data Acquisition
• We already know that data comes from multiple sources
and in multiple formats, so our first step is to integrate
all of this data and store it in a single location. From
this integrated data, we then select the particular
section on which to perform our Data Science task.
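The acquisition step above can be sketched in a few lines of Python. The sources, record layout, and selection criterion here are all hypothetical, chosen only to illustrate "integrate, then select":

```python
# Sketch of data acquisition: integrate records from multiple
# (hypothetical) sources into one collection, then select the
# section relevant to the task.

# Two hypothetical sources, already parsed into a common record shape
source_a = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}]
source_b = [{"id": 3, "city": "Pune"}]

# Step 1: integrate everything into a single location
integrated = source_a + source_b

# Step 2: select the particular section to work on (e.g. one city)
selected = [rec for rec in integrated if rec["city"] == "Pune"]

print(len(integrated), len(selected))  # 3 2
```

In practice this step usually involves connectors, format conversion, and a data store rather than in-memory lists, but the shape of the work is the same.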
Data Pre-processing
• Once the data acquisition is done, it’s time for pre-processing.
• The raw data we have acquired cannot be used directly for Data Science
tasks.
• It needs to be processed by applying operations such as
normalization and aggregation.
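The two operations named above can be sketched as follows; the input values are hypothetical:

```python
# Sketch of two common pre-processing operations:
# min-max normalization and aggregation.

values = [10, 20, 30, 40, 50]  # hypothetical raw values

# Min-max normalization: rescale every value into the [0, 1] range
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Aggregation: summarize the raw values into a single statistic
mean = sum(values) / len(values)

print(normalized[0], normalized[-1], mean)  # 0.0 1.0 30.0
```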
Model Building
• Once pre-processing is done, it is time for the most important step in the Data
Science life cycle, which is model building.
• Here, we apply different scientific algorithms such as linear regression, k-
means clustering, and random forest to find interesting insights.
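As a minimal illustration of one algorithm named above, here is a least-squares linear regression fitted in pure Python. The data points are hypothetical and lie exactly on y = 2x + 1, so the fitted line recovers those coefficients:

```python
# Least-squares linear regression on hypothetical points
# that lie on y = 2x + 1.

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(x, y) / variance(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(slope, intercept)  # 2.0 1.0
```

Real model building would use a library implementation (and k-means or random forest would follow the same fit-then-inspect pattern), but this shows the core idea of fitting a model to data.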
Pattern Evaluation
• After we build the model on top of our data and extract some patterns, it’s
time to check for the validity of these patterns, i.e., in this step, we check if
the obtained information is correct, useful, and new.
• We consider the information valid only if it satisfies all three of these
conditions.
Knowledge Representation
• Once the information is validated, it is time to represent
it with simple, clear graphs.
Introduction to Data Science
• What is Data Science?
Data science is the domain of study that deals with vast volumes of data
using modern tools and techniques to find unseen patterns, derive
meaningful information, and make business decisions.
Data science uses complex machine learning algorithms to build
predictive models.
Data science is the study of data to extract meaningful insights for
business.
It is a multidisciplinary approach that combines principles and practices
from the fields of mathematics, statistics, artificial intelligence, and
computer engineering to analyze large amounts of data.
This analysis helps data scientists ask and answer questions such as what
happened, why it happened, what will happen, and what can be done
with the results.
What is data science used for?
1. Descriptive analysis
Descriptive analysis examines data to gain insights into what happened or
what is happening in the data environment. It is characterized by data
visualizations such as pie charts, bar charts, line graphs, tables, or
generated narratives. For example, a flight booking service may record data
like the number of tickets booked each day. Descriptive analysis will reveal
booking spikes, booking slumps, and high-performing months for this
service.
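The flight-booking example above can be sketched as a simple summary over hypothetical monthly ticket counts:

```python
# Descriptive-analysis sketch: summarize hypothetical monthly
# booking counts to reveal the high-performing month.

bookings = {"Jan": 120, "Feb": 95, "Mar": 210, "Apr": 130}

best_month = max(bookings, key=bookings.get)  # booking spike
total = sum(bookings.values())                # overall volume

print(best_month, total)  # Mar 555
```

In a real setting these numbers would feed a chart or dashboard rather than a print statement.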
2. Diagnostic analysis
Diagnostic analysis is a deep-dive or detailed data examination to
understand why something happened. It is characterized by techniques
such as drill-down, data discovery, data mining, and correlations. Multiple
data operations and transformations may be performed on a given data set
to discover unique patterns in each of these techniques. For example, the
flight service might drill down on a particularly high-performing month to
better understand the booking spike. This may lead to the discovery that
many customers visit a particular city to attend a monthly sporting event.
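A drill-down like the one described can be sketched as grouping the spike month's records by destination. The records below are hypothetical:

```python
# Diagnostic-analysis sketch: drill down into the high-performing
# month to see which destination drove the spike.

march_bookings = [
    {"city": "Pune", "reason": "sporting event"},
    {"city": "Pune", "reason": "sporting event"},
    {"city": "Delhi", "reason": None},
]

# Group by destination city and count
counts = {}
for rec in march_bookings:
    counts[rec["city"]] = counts.get(rec["city"], 0) + 1

top_city = max(counts, key=counts.get)
print(top_city, counts[top_city])  # Pune 2
```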
3. Predictive analysis
Predictive analysis uses historical data to make forecasts about data
patterns that may occur in the future. It is characterized by techniques such as
machine learning, forecasting, pattern matching, and predictive modeling. In each of
these techniques, computers are trained to recognize patterns and relationships in
past data and project them forward. For example, the flight service team might use data science to predict
flight booking patterns for the coming year at the start of each year. The computer
program or algorithm may look at past data and predict booking spikes for certain
destinations in May. Having anticipated their customer’s future travel requirements,
the company could start targeted advertising for those cities from February.
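A deliberately simple stand-in for the forecasting step is a moving average over recent months. Real predictive models are far richer than this, and the booking numbers are hypothetical:

```python
# Predictive-analysis sketch: forecast next month's bookings as the
# average of the last three months of (hypothetical) history.

history = [100, 120, 140, 160, 180, 200]  # monthly bookings

window = history[-3:]
forecast = sum(window) / len(window)

print(forecast)  # 180.0
```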
4. Prescriptive analysis
Prescriptive analytics takes predictive data to the next level. It not only predicts what
is likely to happen but also suggests an optimum response to that outcome. It can
analyze the potential implications of different choices and recommend the best
course of action. It uses graph analysis, simulation, complex event processing,
neural networks, and recommendation engines from machine learning.
Back to the flight booking example, prescriptive analysis could look at historical
marketing campaigns to maximize the advantage of the upcoming booking spike. A
data scientist could project booking outcomes for different levels of marketing spend
on various marketing channels. These data forecasts would give the flight booking
company greater confidence in their marketing decisions.
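The "project outcomes, then recommend an action" loop described above can be sketched as follows. The spend levels and projected bookings are hypothetical, and the decision rule (best projected bookings per unit spent) is just one possible choice:

```python
# Prescriptive-analysis sketch: given projected booking outcomes for
# different hypothetical marketing-spend levels, recommend the spend
# with the best projected return per unit spent.

projections = {1000: 150, 2000: 320, 3000: 360}  # spend -> bookings

best_spend = max(projections, key=lambda s: projections[s] / s)
print(best_spend)  # 2000
```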
The CRoss Industry Standard Process for Data Mining (CRISP-DM)
• It has six sequential phases:
1. Business understanding – What does the business need?
2. Data understanding – What data do we have / need? Is it clean?
3. Data preparation – How do we organize the data for modeling?
4. Modeling – What modeling techniques should we apply?
5. Evaluation – Which model best meets the business objectives?
6. Deployment – How do stakeholders access the results?
