Data Science Theory: Analysis and Analytics

This document discusses key concepts in data science theory including: 1) It distinguishes between analysis (examining past data) and analytics (predicting future patterns). Qualitative analysis uses intuition while quantitative analysis uses formulas. 2) Data science can improve predictive accuracy by analyzing data extracted from various activities. Business intelligence analyzes historical data to explain past events. 3) Machine learning uses data to make predictions and analyze patterns without explicit programming. Artificial intelligence simulates human decision making. 4) The document outlines approaches for working with different data types from raw to processed data to information and techniques for analyzing big data.

Uploaded by

Nonameforever

🧠

Data science Theory

Class Data science theory

Completed

Created Jun 12, 2020 1019 PM

Materials

Source Udemy

Type Lecture

Analysis and analytics

Analysis-

Performing analysis on things that have already happened in the past.

Example: How the sales decreased in the summer.

We do analysis to explain how or why something happened.

Analytics-

Exploring patterns to work out what we can do in the future.

There are two types of analytics:

Qualitative analytics = intuition and analysis

Quantitative analytics = formulas and algorithms

Introduction:

In business, some activities are data driven while others are subjective or
experience driven.

Business needs -

Business case studies - real-world experience of how companies succeed
and fail. We don't need a data set to understand case studies.

Qualitative analytics - it is all about intuition and knowledge of the market.
This includes working with tools to predict future behavior.

Preliminary data reporting

Reporting with visuals

Creating dashboards

Sales forecasting

👆 In the figure above, the pink items are data driven.

👆 The yellow items are experience driven.

Some of these terms refer to activities that aim to explain past
behavior (this is called analysis), while others refer to activities used for
predicting future behavior (this is called analytics).

Here the business case studies are analysis, while qualitative analytics is
about predicting the future (analytics).

NOTE: Business analytics = business analysis + business analytics.

Data science: Can be used to improve the accuracy of predictions based on
data extracted from various activities.

Business Intelligence (BI): The process of analysing and reporting historical
business data. It aims to explain past events using business data and is a
preliminary step of predictive analytics.

• Analyse past data and extract useful insights

• Create appropriate models

Reporting with visuals and creating dashboards are both part of BI.

Machine Learning: The ability of machines to predict outcomes without being
explicitly programmed. It is all about creating and implementing algorithms that
let machines receive data and use this data to

• make predictions

• analyse patterns

• give recommendations

Artificial intelligence: Simulating human knowledge and decision making with
computers.

Data science Handbook :

Approaches and techniques for working with traditional data.
From raw data to processed data and then to information.

• Raw facts or raw data

  • Cannot be analysed straight away

  • It is untouched data you have accumulated and stored on the server

  • Data collection

    • Surveys: data can be collected through surveys, for example how much
      people like or dislike the product on a scale of 1 to 10.

    • Cookies: they provide companies with detailed information about
      users' activities on a website.

• Processed data

  • Data pre-processing:

    • Before data processing we do data pre-processing, which comes
      after data collection. It is a group of operations that convert
      your raw data into a format that is more understandable.

    • Example: in an SQL database, a person's age is entered as 932
      or a name is entered as "United Kingdom".

    • Before any analysis, such data should be marked as invalid or
      corrected.

  • Methods in pre-processing:

    • Class labeling -

      • This includes labeling each data point with the correct data
        type, or arranging data by category.

      • The label can be

        • Numerical - e.g. the number of units sold in a day

        • Categorical - cannot be manipulated mathematically.

    • Data cleansing = data cleaning = data scrubbing

      • It deals with inconsistent data

      • Example: correcting spelling mistakes and dealing with
        missing values.

    • Examples of data pre-processing:

      • Balancing: Imagine you have compiled a survey to gather
        data on the shopping habits of men and women, to find out
        who spends more money at the weekend. When you look at
        the data, 80% of the respondents are women and 20% are
        men, so the trends you notice will reflect women far more
        than men. To counteract this, apply balancing techniques,
        such as taking an equal number of respondents from each
        group so that the ratio is 50/50.

      • Data shuffling: Shuffling the observations in the dataset
        is just like shuffling a deck of cards. It prevents unwanted
        patterns, improves predictive performance and helps avoid
        misleading results. Shuffling is the process of randomizing
        the order of the data.

• Information

Visualization represents databases containing traditional data
(visualization of a relational database management system).

Entity relationship diagram (ER diagram): Shows how the tables in the
database are related.

Relational schema: Here each rectangle represents a distinct data table, and
the lines show which table is related to which.
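The pre-processing steps described above (marking invalid values, balancing and shuffling) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the survey rows, the `max_age=120` cut-off and the fixed random seed are made up for the example and do not come from the lecture. Invalid rows are simply dropped here, although the notes also allow correcting them.

```python
import random

# Hypothetical survey rows: (gender, age, weekend_spend). All values are made up.
# 8 women and 2 men, mirroring the 80% / 20% split in the notes.
rows = [
    ("F", 34, 120.0), ("F", 29, 80.0), ("F", 932, 95.0), ("F", 41, 60.0),
    ("F", 25, 150.0), ("F", 38, 70.0), ("F", 52, 110.0), ("F", 47, 90.0),
    ("M", 31, 100.0), ("M", 45, 130.0),
]


def is_valid(row, max_age=120):
    """Flag impossible ages such as 932 (max_age is an assumed threshold)."""
    _, age, _ = row
    return 0 <= age <= max_age


# Validation: keep only rows that pass the check.
valid_rows = [row for row in rows if is_valid(row)]

# Balancing: take an equal number of respondents from each group (50/50).
women = [row for row in valid_rows if row[0] == "F"]
men = [row for row in valid_rows if row[0] == "M"]
group_size = min(len(women), len(men))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
balanced = rng.sample(women, group_size) + rng.sample(men, group_size)

# Shuffling: randomize the order so that no group sits in one block.
rng.shuffle(balanced)

print(len(valid_rows), "valid rows,", len(balanced), "rows after balancing")
```

Downsampling the larger group, as done here, throws data away; it is the approach the notes describe, but other balancing techniques exist.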

Techniques for working with big data

Here there is much more variety beyond categorical and numerical data. Examples of
big data include numbers, text, digital images, digital video data and digital audio
data.

A wider range of data types comes with a wider range of data cleansing
methods.
There are techniques that verify that a digital image observation is ready for
processing.

Text data mining: The process of deriving valuable, unstructured data from
text.

Data masking: analysing the information without compromising private details.

Business intelligence (BI) analysis:

Data skills + business knowledge and intuition to explain the past performance
of the company.

How we measure business performance:

We start by collecting observations.

For example, collecting variables such as sales volume or new
customers enrolled on your website.

Each month's revenue or each customer is considered a single
observation.

Then we must quantify that information. Quantification is the process of
representing observations as numbers.

Measure: a measure is the accumulation of observations to show some
information.

For example: if you total the revenue of all three months to obtain the
value of $350, that will be the measure of the revenue of the first quarter
of that year.

Similarly, add together the number of new customers for the same period
(50) and you have another measure.

Analyze the data
Metrics - a metric is a value derived from the measures you obtain that
aims at gauging business performance or progress.
NOTE: Metric = measure + business meaning

☝ This is useful for comparison.
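As a small worked example of the step from observations to measures to a metric, the sketch below totals monthly observations into the two measures mentioned in the notes ($350 revenue and 50 new customers for the quarter) and then combines them. The individual monthly figures and the "revenue per new customer" metric are assumptions chosen for illustration; only the two totals come from the notes.

```python
# Observations: one value per month. The monthly split is hypothetical;
# only the quarterly totals (350 and 50) appear in the notes.
monthly_revenue = {"Jan": 100, "Feb": 120, "Mar": 130}
monthly_new_customers = {"Jan": 15, "Feb": 15, "Mar": 20}

# Measures: accumulate the observations.
quarterly_revenue = sum(monthly_revenue.values())
quarterly_new_customers = sum(monthly_new_customers.values())

# Metric: a measure with business meaning attached (an illustrative choice).
revenue_per_new_customer = quarterly_revenue / quarterly_new_customers

print(f"Q1 revenue: ${quarterly_revenue}")
print(f"Q1 new customers: {quarterly_new_customers}")
print(f"Revenue per new customer: ${revenue_per_new_customer:.2f}")
```

Because the metric is a ratio, it can be compared across quarters even when the raw totals differ in size.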

Can we keep track of all possible metrics we can extract from a data set? - YES

Does it make sense to do that? - NO

What you need to do is choose the metrics that are tightly aligned with your
business objectives. These metrics are called KPIs (Key Performance Indicators).
KPIs = metrics + business objectives

Key - related to your business goals

Performance - how successfully you have performed within a specified time
frame.
Indicators - values that indicate how the business is performing.

Metric: The traffic of a page from your website that was visited by any type
of user.

KPI: The traffic generated only from users who have clicked on a link provided
in your ad campaign.

As the next step, every quantitative insight you extracted must be
visualized.

Traditional methods
At this stage we start applying analytics.

Techniques for working with traditional data

Regression: A model used for quantifying causal relationships among the
different variables included in your analysis.

For example:
Linear regression models

The example data is house price against house size in square feet, which can
be described with a linear regression model.

Here the red line is the regression line, because all the points are close to
the red line while they are not close to the green line. So the green line is
not the regression line.

So this red line can be written as

y = bx

Here, y - house price, b - coefficient and x - house size
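To make the line y = bx concrete, here is a minimal sketch that estimates the coefficient b by least squares for a model with no intercept, using the closed form b = sum(x*y) / sum(x*x). The house sizes and prices are invented for the example, and `fit_slope` is a hypothetical helper name, not something from the lecture.

```python
def fit_slope(sizes, prices):
    """Least-squares slope b for the no-intercept model y = b * x."""
    numerator = sum(x * y for x, y in zip(sizes, prices))
    denominator = sum(x * x for x in sizes)
    return numerator / denominator


# Hypothetical data: house size in square feet and price in dollars.
house_sizes = [500, 1000, 1500, 2000]
house_prices = [105_000, 195_000, 310_000, 395_000]

b = fit_slope(house_sizes, house_prices)

# Use the fitted line to predict the price of a 1200 sq ft house.
predicted_price = b * 1200

print(f"b = {b:.2f} dollars per square foot")
print(f"Predicted price for 1200 sq ft: {predicted_price:.0f}")
```

The fitted line is the one that keeps the points as close to it as possible in the squared-error sense, which is why the red line in the notes qualifies as the regression line and the green one does not.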

Logistic regression

The values on the vertical axis will be 1s or 0s only.

Such models are used in decision-making processes.

Companies apply logistic regression algorithms to filter job candidates
during their screening process.

If the algorithm estimates that the probability that a prospective candidate
will perform well in the company is above 50%, it predicts 1, a successful
application. Otherwise it predicts 0.
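The 50% rule above can be sketched as follows. A logistic regression model turns a weighted sum of inputs into a probability with the sigmoid function and then applies the threshold. The two features (years of experience and a test score), the weights and the bias are all hypothetical values picked for illustration; in practice they would be learned from data.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, hand-picked parameters (a real model would learn these).
BIAS = -4.0
WEIGHT_EXPERIENCE = 0.5
WEIGHT_TEST_SCORE = 0.05


def probability_of_success(years_experience, test_score):
    z = BIAS + WEIGHT_EXPERIENCE * years_experience + WEIGHT_TEST_SCORE * test_score
    return sigmoid(z)


def screen_candidate(years_experience, test_score, threshold=0.5):
    """Return 1 (successful application) if the probability exceeds the threshold, else 0."""
    return 1 if probability_of_success(years_experience, test_score) > threshold else 0


print(screen_candidate(6, 80))  # strong candidate
print(screen_candidate(1, 40))  # weak candidate
```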

Cluster analysis

For example, suppose the graph of house price against house size in square
feet looks like the one below.

Here the red line is the regression line. But here we can do more: cluster
analysis.

This is another technique that will take into account that certain observations
exhibit similar house sizes and prices.

Here the clusters are:
city center: high cost and small houses;
far from the city: big houses that cost less;
nice neighborhoods in the city: high cost and big houses.
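One common way to find such groups is k-means clustering; the notes do not name a specific algorithm, so this is a swapped-in technique shown only as a sketch. The nine houses below are made up to mimic the three groups (city center, far from the city, nice neighborhoods), with size in hundreds of square feet and price in tens of thousands of dollars so that both axes have a similar scale. The starting centroids are picked by hand to keep the example deterministic; real implementations usually choose them randomly.

```python
import math


def kmeans(points, centroids, max_iterations=100):
    """Plain k-means: assign points to the nearest centroid, then move
    each centroid to the mean of its points, until nothing changes."""
    clusters = []
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [math.dist(point, centroid) for centroid in centroids]
            clusters[distances.index(min(distances))].append(point)
        new_centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: assignments match the current centroids
        centroids = new_centroids
    return centroids, clusters


# Hypothetical houses: (size in 100s of sq ft, price in $10,000s).
houses = [
    (5, 40), (5.5, 42), (6, 41),    # small and expensive (city center)
    (20, 15), (21, 14), (22, 16),   # big and cheap (far from the city)
    (20, 50), (21, 52), (22, 51),   # big and expensive (nice neighborhoods)
]
initial_centroids = [(5, 40), (20, 15), (20, 50)]  # hand-picked for the sketch

centroids, clusters = kmeans(houses, initial_centroids)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```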

In this example we only have the house size and house price.
But when the data has more variables, as in this table, the mathematical
expression for the regression model is:

y = a + b1 x1 + b2 x2 + b3 x3 + ... + bn xn

NOTE: x, the explanatory variable, is also known as the regressor,
independent variable or predictor variable.

For example, consider analyzing a survey that consists of 100 questions.

In this case the regression model is:

y = a + b1 x1 + b2 x2 + b3 x3 + ... + b100 x100

Here factor analysis comes into play.

In the example:
Question 1 : I like animals ⭕⭕⭕⭕⭕
Question 2 : I care about animals ⭕⭕⭕⭕⭕
Question 3 : I am against animal cruelty ⭕⭕⭕⭕⭕

Whoever marks 5 on the first question is most likely to give 5 for the other
two questions. In other words, if you strongly agree with one of these
questions you will not disagree with the other two.
With factor analysis we can combine all three questions into one factor:
general attitude towards animals.

z1 = { x1: 1. I like animals
       x2: 2. I care about animals
       x3: 3. I am against animal cruelty }

In this way we can reduce the number of regressors from 100 to 10, which can
lead to more accurate predictions.

y = n + n1 z1 + n2 z2 + n3 z3 + ... + n10 z10
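The idea of collapsing x1, x2 and x3 into a single z1 can be illustrated with a deliberately simplified stand-in: averaging the three answers into one composite score. This is not real factor analysis, which estimates how strongly each question loads on the underlying factor; it only shows the dimensionality reduction the notes describe. The survey responses are invented for the example.

```python
# Hypothetical answers on a 1-5 scale to the three animal questions.
responses = [
    {"x1": 5, "x2": 5, "x3": 5},
    {"x1": 1, "x2": 2, "x3": 1},
    {"x1": 4, "x2": 4, "x3": 5},
]


def attitude_towards_animals(response):
    """Composite score z1: the mean of the three related questions.
    A simple average is an assumption, not a fitted factor model."""
    answers = [response["x1"], response["x2"], response["x3"]]
    return sum(answers) / len(answers)


# Three regressors (x1, x2, x3) become one (z1) per respondent.
z1_scores = [attitude_towards_animals(response) for response in responses]
print(z1_scores)
```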

Time series
Plotting values against time. Time is always on the x-axis.

Example for traditional methods

Example: User experience

Imagine you are the head of the user experience (UX) department of a website
selling goods on a global scale.
As the head of UX, your goal is to maximize user satisfaction.

Assume you have already designed and implemented a survey that measured the
attitude of your customers towards the latest global product you have
launched.

When you plot the survey data, as in the graph on the left, you should do a
cluster analysis.
Once we find out there are 4 separate groups, it makes sense to run four
separate tests.

Machine learning
Creating an algorithm, which a computer then uses to find a model that fits the
data as well as possible and makes very accurate predictions based on that.

Machine learning algorithm - A trial-and-error process. Each consecutive trial
is at least as good as the previous one.

There are 4 ingredients:

• Data

• Model

• Objective function - to measure the inaccuracy

• Optimization algorithm - to improve the model
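The four ingredients can be seen together in a tiny gradient descent sketch. The data, the one-parameter model y = b * x, mean squared error as the objective function, the learning rate and the number of trials are all assumptions chosen to keep the example small; they are not taken from the lecture.

```python
# 1. Data (hypothetical): the targets follow y = 2x exactly.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]


# 2. Model: a line through the origin with one parameter b.
def model(x, b):
    return b * x


# 3. Objective function: mean squared error measures the inaccuracy.
def objective(b):
    errors = [(model(x, b) - y) ** 2 for x, y in zip(inputs, targets)]
    return sum(errors) / len(errors)


# 4. Optimization algorithm: gradient descent nudges b to reduce the error.
def gradient(b):
    terms = [2 * (model(x, b) - y) * x for x, y in zip(inputs, targets)]
    return sum(terms) / len(terms)


b = 0.0               # starting guess
learning_rate = 0.01  # assumed step size
losses = []
for _ in range(100):  # each pass is one "trial"
    losses.append(objective(b))
    b -= learning_rate * gradient(b)

print(f"learned b = {b:.4f}, final loss = {objective(b):.6f}")
```

With this step size the recorded loss never increases from one trial to the next, which matches the note that each trial is at least as good as the previous one.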

Types of machine learning:

• Supervised learning - uses prior results; here the data is labeled.

• Unsupervised learning - here the data is unlabeled.

 Reinforcement learning -
