Data Similarity and Dissimilarity

Data science is a multi-disciplinary field that utilizes scientific methods and algorithms to extract insights from both structured and unstructured data. The data science life cycle includes stages such as data acquisition, pre-processing, model building, pattern evaluation, and knowledge representation. It employs various analyses like descriptive, diagnostic, predictive, and prescriptive to inform business decisions and optimize outcomes.


Data Similarity and Dissimilarity
What is Data Science?
• A multi-disciplinary field that uses scientific methods, algorithms,
processes, and systems to extract knowledge and insights from
structured and unstructured data.
What is Data Science?

• A “concept to unify statistics, data analysis, machine learning and their
related methods” in order to “understand and analyze actual
phenomena with data”.
• Employs various techniques and theories drawn from many fields within
mathematics, statistics, computer science, and information science.
Life Cycle of Data Science
Data Acquisition
• We already know that data comes from multiple sources
and in multiple formats, so our first step is to integrate
all of this data and store it in a single location. From
this integrated data, we then select the particular
section on which to perform our Data Science task.
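The acquisition step above can be sketched in a few lines of Python. The sources, record layout, and selection criterion here are all hypothetical, chosen only to illustrate "integrate, then select":

```python
# Sketch of data acquisition: integrate records from multiple
# (hypothetical) sources into one collection, then select the
# section relevant to the task.

# Two hypothetical sources, already parsed into a common record shape
source_a = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}]
source_b = [{"id": 3, "city": "Pune"}]

# Step 1: integrate everything into a single location
integrated = source_a + source_b

# Step 2: select the particular section to work on (e.g. one city)
selected = [rec for rec in integrated if rec["city"] == "Pune"]

print(len(integrated), len(selected))  # 3 2
```

In practice this step usually involves connectors, format conversion, and a data store rather than in-memory lists, but the shape of the work is the same.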
Data Pre-processing
• Once the data acquisition is done, it’s time for pre-processing.
• The raw data we have acquired cannot be used directly for Data Science
tasks.
• It needs to be processed by applying operations such as
normalization and aggregation.
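The two operations named above can be sketched as follows; the input values are hypothetical:

```python
# Sketch of two common pre-processing operations:
# min-max normalization and aggregation.

values = [10, 20, 30, 40, 50]  # hypothetical raw values

# Min-max normalization: rescale every value into the [0, 1] range
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Aggregation: summarize the raw values into a single statistic
mean = sum(values) / len(values)

print(normalized[0], normalized[-1], mean)  # 0.0 1.0 30.0
```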
Model Building
• Once pre-processing is done, it is time for the most important step in the Data
Science life cycle, which is model building.
• Here, we apply different scientific algorithms such as linear regression, k-
means clustering, and random forest to find interesting insights.
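As a minimal illustration of one algorithm named above, here is a least-squares linear regression fitted in pure Python. The data points are hypothetical and lie exactly on y = 2x + 1, so the fitted line recovers those coefficients:

```python
# Least-squares linear regression on hypothetical points
# that lie on y = 2x + 1.

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(x, y) / variance(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(slope, intercept)  # 2.0 1.0
```

Real model building would use a library implementation (and k-means or random forest would follow the same fit-then-inspect pattern), but this shows the core idea of fitting a model to data.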
Pattern Evaluation
• After we build the model on top of our data and extract some patterns, it’s
time to check for the validity of these patterns, i.e., in this step, we check if
the obtained information is correct, useful, and new.
• We consider the information valid only if it satisfies all three of these
conditions.
Knowledge Representation
• Once the information is validated, it is time to represent
it with simple, clear graphs.
Introduction to Data Science
• What is Data Science?
Data science is the domain of study that deals with vast volumes of data
using modern tools and techniques to find unseen patterns, derive
meaningful information, and make business decisions.
Data science uses complex machine learning algorithms to build
predictive models.
Data science is the study of data to extract meaningful insights for
business.
It is a multidisciplinary approach that combines principles and practices
from the fields of mathematics, statistics, artificial intelligence, and
computer engineering to analyze large amounts of data.
This analysis helps data scientists ask and answer questions such as what
happened, why it happened, what will happen, and what can be done
with the results.
What is data science used for?
1. Descriptive analysis
Descriptive analysis examines data to gain insights into what happened or
what is happening in the data environment. It is characterized by data
visualizations such as pie charts, bar charts, line graphs, tables, or
generated narratives. For example, a flight booking service may record data
like the number of tickets booked each day. Descriptive analysis will reveal
booking spikes, booking slumps, and high-performing months for this
service.
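The flight-booking example above can be sketched as a simple summary over hypothetical monthly ticket counts:

```python
# Descriptive-analysis sketch: summarize hypothetical monthly
# booking counts to reveal the high-performing month.

bookings = {"Jan": 120, "Feb": 95, "Mar": 210, "Apr": 130}

best_month = max(bookings, key=bookings.get)  # booking spike
total = sum(bookings.values())                # overall volume

print(best_month, total)  # Mar 555
```

In a real setting these numbers would feed a chart or dashboard rather than a print statement.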
2. Diagnostic analysis
Diagnostic analysis is a deep-dive or detailed data examination to
understand why something happened. It is characterized by techniques
such as drill-down, data discovery, data mining, and correlations. Multiple
data operations and transformations may be performed on a given data set
to discover unique patterns in each of these techniques. For example, the
flight service might drill down on a particularly high-performing month to
better understand the booking spike. This may lead to the discovery that
many customers visit a particular city to attend a monthly sporting event.
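A drill-down like the one described can be sketched as grouping the spike month's records by destination. The records below are hypothetical:

```python
# Diagnostic-analysis sketch: drill down into the high-performing
# month to see which destination drove the spike.

march_bookings = [
    {"city": "Pune", "reason": "sporting event"},
    {"city": "Pune", "reason": "sporting event"},
    {"city": "Delhi", "reason": None},
]

# Group by destination city and count
counts = {}
for rec in march_bookings:
    counts[rec["city"]] = counts.get(rec["city"], 0) + 1

top_city = max(counts, key=counts.get)
print(top_city, counts[top_city])  # Pune 2
```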
3. Predictive analysis
Predictive analysis uses historical data to make forecasts about data
patterns that may occur in the future. It is characterized by techniques such as
machine learning, forecasting, pattern matching, and predictive modeling. In each of
these techniques, computers are trained to recognize patterns and relationships in
past data and project them forward. For example, the flight service team might use data science to predict
flight booking patterns for the coming year at the start of each year. The computer
program or algorithm may look at past data and predict booking spikes for certain
destinations in May. Having anticipated their customer’s future travel requirements,
the company could start targeted advertising for those cities from February.
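A deliberately simple stand-in for the forecasting step is a moving average over recent months. Real predictive models are far richer than this, and the booking numbers are hypothetical:

```python
# Predictive-analysis sketch: forecast next month's bookings as the
# average of the last three months of (hypothetical) history.

history = [100, 120, 140, 160, 180, 200]  # monthly bookings

window = history[-3:]
forecast = sum(window) / len(window)

print(forecast)  # 180.0
```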
4. Prescriptive analysis
Prescriptive analytics takes predictive data to the next level. It not only predicts what
is likely to happen but also suggests an optimum response to that outcome. It can
analyze the potential implications of different choices and recommend the best
course of action. It uses graph analysis, simulation, complex event processing,
neural networks, and recommendation engines from machine learning.
Back to the flight booking example, prescriptive analysis could look at historical
marketing campaigns to maximize the advantage of the upcoming booking spike. A
data scientist could project booking outcomes for different levels of marketing spend
on various marketing channels. These data forecasts would give the flight booking
company greater confidence in their marketing decisions.
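The "project outcomes, then recommend an action" loop described above can be sketched as follows. The spend levels and projected bookings are hypothetical, and the decision rule (best projected bookings per unit spent) is just one possible choice:

```python
# Prescriptive-analysis sketch: given projected booking outcomes for
# different hypothetical marketing-spend levels, recommend the spend
# with the best projected return per unit spent.

projections = {1000: 150, 2000: 320, 3000: 360}  # spend -> bookings

best_spend = max(projections, key=lambda s: projections[s] / s)
print(best_spend)  # 2000
```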
The CRoss Industry Standard Process for Data Mining (CRISP-DM)
• It has six sequential phases:
1. Business understanding – What does the business need?
2. Data understanding – What data do we have / need? Is it clean?
3. Data preparation – How do we organize the data for modeling?
4. Modeling – What modeling techniques should we apply?
5. Evaluation – Which model best meets the business objectives?
6. Deployment – How do stakeholders access the results?
