
Business Data Science
MIS382N

Constantine Caramanis
The University of Texas at Austin
constantine@[Link]
Business Data Science

• Prof. Constantine Caramanis


• [Link]

• Teaching Assistant: Debajit Chakraborty


• email: debajit@[Link]

• Recap of administrative issues via the syllabus


• This class and next semester’s class
What is Artificial Intelligence?

• Our traditional conception of knowledge is axiomatic:
  • Example: We prove theorems of geometry based on Euclid’s axioms.

• Even in the empirical sciences, we use experiments to try to formulate general laws of nature, and then use those laws to make predictions about how things work / the future.

• In its modern rendition, AI and machine learning place much more emphasis on the pathway directly from data to predictions.
What is Artificial Intelligence?

Two pathways to knowledge:
  Axioms → Theorems → “Knowledge” → Results
  Data → Predictions → Decisions
John Snow &
Cholera

• 1854 was a bad year in the


SoHo neighborhood, in the
heart of London.
• For yet another time in the
history of the city, Cholera
was claiming scores of
lives…

• Why? What was the cause?


John Snow &
Cholera
• Living conditions were already
challenging…

• General lack of hygiene

• Overpopulation

• Very dense living


conditions

• Pollution in air and water


John Snow & Cholera
• Cholera incidences in the SoHo neighborhood (map)
• Population without cholera (map)
John Snow &
Cholera
• Possible explanations /
theories at the time about what
was the cause of the epidemic

• Miasma in the air


John Snow &
Cholera
• Possible explanations /
theories at the time about what
was the cause of the epidemic

• Person to person transmission,


like Flu or Covid
John Snow
& Cholera
• Possible explanations
/ theories at the time
about what was the
cause of the epidemic
1. Miasma in the air
2. Transmission A → B
John Snow
& Cholera
• Possible explanations
/ theories at the time
about what was the
cause of the epidemic
1. Miasma in the air
2. Transmission A → B
3. Something else?
John Snow
& Cholera
• Possible explanations
/ theories at the time
about what was the
cause of the epidemic
1. Miasma in the air
2. Transmission A → B
3. Cluster according to
proximity to water
wells
Modern problems in ML/AI: what has changed?
• Essentially infinite computational power.
• Essentially infinite data.
• Problems that combine images/video (Computer Vision), natural language – English or other languages, or multiple languages (Natural Language Processing), and dynamic decision making.
Some of the basic or foundational problems in ML/AI

Prediction:
• Classification
• Regression
• Images/Video, Language, Combinations

Generative AI:
• Natural Language
• Images
• Video
• Multimodal/combinations
Computer Vision: Classification
Computer Vision: Regression
Natural Language Processing: Classification
Natural Language Processing: Regression
Generative AI: Computer Vision
[Link]
Generative AI: Computer Vision
[Link]
Generative AI: Computer Vision
[Link]
Generative AI: Natural Language
Overview of the landscape

Terms used:

Data science
Machine Learning
Artificial Intelligence
Big Data
Data Mining
Statistics
Overview of the landscape

Terms used: Data Science, Machine Learning, Artificial Intelligence, Big Data, Data Mining, Statistics.

@jeremyjarvis: “A data scientist is a statistician who lives in San Francisco.”

@BigDataBorat: “Data science is statistics on a Mac.”

@josh_wills: “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”

(anonymous): “The difference between statistics and data science is about 30k per year.”

(anonymous): “It is statistics if it’s done in R. It is machine learning if it’s done in Python. It is AI if it’s done in PowerPoint.”
Overview of the landscape

Statistics: The very first line of the American Statistical Association’s definition of statistics is “Statistics is the science of learning from data...” Given that the words “data” and “science” appear in the definition, one might assume that data science is just a rebranding of statistics. Statisticians are not happy that they are not getting the research funding and salary bumps involved.

Data science: A broad and modern term. Includes much more software-engineering knowledge. Usually done in Python (as opposed to R/Stata/SAS/SPSS/Excel, etc.). Includes analytics on bigger datasets (e.g., terabytes or petabytes, using tools like Apache Spark and Hadoop MapReduce, which enable distributed processing). Includes data collection and data cleaning pipelines (data engineering / data wrangling), plus connections to database backends and web-serving front ends. To some extent it includes machine learning and AI as sub-areas.

Machine learning: The more mathematically complex part of data science, focused on modeling (as opposed to software). Includes supervised learning, predictive modeling, unsupervised learning (like clustering), and text and image understanding.

Artificial intelligence: A broad and classic term that includes machine learning as a sub-discipline: allowing computers to do things that humans do when they say they are thinking. Includes perception (image understanding, speech understanding), language translation, playing games, and statistical machine learning techniques, but also logic-based symbolic AI, reasoning, planning, and robotics.

Data mining: The applied version of machine learning. Includes more large-scale software and performance issues. Also intersects the database community.

Big data: Focused on scaling data analytics to very large datasets. The part of data-science vocabulary that will hopefully follow “the information superhighway” and “internet surfing” into obsolete historical nomenclature.
In terms of research communities:
Statistics
Research published in stats journals. Top venues: Annals of Statistics, JASA, Journal of the Royal Statistical Society.

Data science:
Not a properly defined research community.

Machine learning:
Research published in top ML conferences: NeurIPS, ICML, also more recently ICLR. Also includes KDD (more applied, data
mining).
[Link]

Artificial Intelligence:
Includes ML conferences but also AAAI and IJCAI as top venues.

Data Mining: Applied version of Machine learning. Includes more large-scale software and performance issues. Also intersects
the database community.
Research published in top Data Mining conferences: KDD, SDM.

Big Data:
Not a properly defined research community.
Engineers who can set up Hadoop/Spark clusters. Can work on data directly on disk and process at massive scale.
Supervised and Unsupervised Learning
A taxonomy for machine learning

• Supervised Learning: learning how to predict labels

• Unsupervised Learning: finding structure in data, without labels.


Supervised learning: Binary classification
• Given a table of training data containing features (x1,x2,..) and a target
variable y we want to predict.
• Example: Taste-test: Predict if a new beverage will be evaluated as having
‘Great taste’ from a focus group.

          Acidity (A)   Sweetness (S)   y = 'Great taste'?
Bev1      0.8           0.8             1
Bev2      0.3           0.25            0
Bev3      0.2           0.8             0
Bev4      0.3           0.7             0
Bev5      0.9           0.7             1
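
A minimal sketch (not from the slides, and assuming pandas is available) of this training set as a table in code; the column names are illustrative:

```python
# The taste-test training data: two features and a binary target.
import pandas as pd

train = pd.DataFrame(
    {
        "Acidity":    [0.8, 0.3, 0.2, 0.3, 0.9],
        "Sweetness":  [0.8, 0.25, 0.8, 0.7, 0.7],
        "GreatTaste": [1, 0, 0, 0, 1],   # the target variable y
    },
    index=["Bev1", "Bev2", "Bev3", "Bev4", "Bev5"],
)
print(train)
```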
Supervised learning: Binary classification
Data-driven taste-test example
• Jargon: Every row is called a training sample; together they form the training set.
• We have 2 features (Acidity and Sweetness). In practice we may have many more (Color, Carbonation level, ...). In statistics these are also called covariates or regressors.
• The target variable y is binary here (great taste or not): binary classification.
• y could have multiple levels (Poor, Mediocre, Good, Great): multi-class classification.
• Or y could be a continuous number to predict (a taste score from 1 to 100): regression.

          Acidity (A)   Sweetness (S)   y = 'Great taste'?
Bev1      0.8           0.8             1
Bev2      0.3           0.25            0
Bev3      0.2           0.8             0
Bev4      0.3           0.7             0
Bev5      0.9           0.7             1
Binary classification with a short tree: a decision stump

          Acidity (A)   Sweetness (S)   y = 'Great taste'?   Model 1 predicts
Bev1      0.8           0.8             1
Bev2      0.3           0.25            0
Bev3      0.2           0.8             0
Bev4      0.3           0.7             0
Bev5      0.9           0.7             1

Model 1 (a decision stump): if S >= 0.75, predict f(x) = 1; otherwise (o/w), predict f(x) = 0.

Binary classification with a short tree: a decision stump

Accuracy of this model on the training set is: ? / 5

          Acidity (A)   Sweetness (S)   y = 'Great taste'?   Model 1 predicts
Bev1      0.8           0.8             1                    1
Bev2      0.3           0.25            0                    0
Bev3      0.2           0.8             0                    1
Bev4      0.3           0.7             0                    0
Bev5      0.9           0.7             1                    0

Model 1: if S >= 0.75, predict f(x) = 1; otherwise, predict f(x) = 0.

Binary classification with a short tree: a decision stump

Accuracy of this model on the training set is: 3 / 5

          Acidity (A)   Sweetness (S)   y = 'Great taste'?   Model 1 predicts
Bev1      0.8           0.8             1                    1
Bev2      0.3           0.25            0                    0
Bev3      0.2           0.8             0                    1
Bev4      0.3           0.7             0                    0
Bev5      0.9           0.7             1                    0

Model 1: if S >= 0.75, predict f(x) = 1; otherwise, predict f(x) = 0.

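A minimal sketch (not from the slides) that reproduces this count: implement Model 1 as a small Python function and score it on the five training beverages.

```python
# Model 1, the decision stump: predict 1 when Sweetness >= 0.75, else 0.
def model1(acidity, sweetness):
    return 1 if sweetness >= 0.75 else 0

X = [(0.8, 0.8), (0.3, 0.25), (0.2, 0.8), (0.3, 0.7), (0.9, 0.7)]  # (A, S) per beverage
y = [1, 0, 0, 0, 1]                                                # 'Great taste'?

preds = [model1(a, s) for a, s in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(preds, accuracy)   # [1, 0, 1, 0, 0] -> 3/5 = 0.6
```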
Partitioning the feature space

          Acidity (A)   Sweetness (S)   y = 'Great taste'?
Bev1      0.8           0.8             1
Bev2      0.3           0.25            0
Bev3      0.2           0.8             0
Bev4      0.3           0.7             0
Bev5      0.9           0.7             1

Let's position Bev1 on this feature space (Acidity on the horizontal axis, Sweetness on the vertical axis).

Model 1: if S >= 0.75, predict f(x) = 1; otherwise, predict f(x) = 0.

Partitioning the feature space

          Acidity (A)   Sweetness (S)   y = 'Great taste'?
Bev1      0.8           0.8             1
Bev2      0.3           0.25            0
Bev3      0.2           0.8             0
Bev4      0.3           0.7             0
Bev5      0.9           0.7             1

Each binary classifier has a decision region: how it partitions the feature space. For Model 1 (if S >= 0.75 predict f(x) = 1, otherwise f(x) = 0), the decision boundary is the line S = 0.75 in the (Acidity, Sweetness) plane.
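
A minimal plotting sketch (not from the slides, assuming matplotlib is available): the five beverages in the (Acidity, Sweetness) plane together with Model 1's decision boundary at S = 0.75.

```python
# Plot the training points and Model 1's decision boundary S = 0.75.
import matplotlib.pyplot as plt

A = [0.8, 0.3, 0.2, 0.3, 0.9]
S = [0.8, 0.25, 0.8, 0.7, 0.7]
y = [1, 0, 0, 0, 1]

colors = ["red" if label == 1 else "blue" for label in y]
plt.scatter(A, S, c=colors)
plt.axhline(0.75, linestyle="--")      # boundary: predict 1 above the line, 0 below
plt.xlabel("Acidity")
plt.ylabel("Sweetness")
plt.show()
```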
Binary classification with a depth-2 decision tree

Accuracy of this model on the training set is: ?

          Acidity (A)   Sweetness (S)   y = 'Great taste'?   Model 2 predicts
Bev1      0.8           0.8             1                    ?
Bev2      0.3           0.25            0                    ?
Bev3      0.2           0.8             0                    ?
Bev4      0.3           0.7             0                    ?
Bev5      0.9           0.7             1                    ?

Model 2: This model splits first on A with threshold 0.5 and then on S with threshold 0.75.
  o/w (A < 0.5): predict f(x) = 1
  A >= 0.5: if S > 0.75 predict f(x) = 1; o/w predict f(x) = 0

Binary classification with a depth-2 decision tree

Accuracy of this model on the training set is: ?

          Acidity (A)   Sweetness (S)   y = 'Great taste'?   Model 2 predicts
Bev1      0.8           0.8             1                    1
Bev2      0.3           0.25            0                    1
Bev3      0.2           0.8             0                    1
Bev4      0.3           0.7             0                    1
Bev5      0.9           0.7             1                    0

Model 2: This model splits first on A with threshold 0.5 and then on S with threshold 0.75.
  o/w (A < 0.5): predict f(x) = 1
  A >= 0.5: if S > 0.75 predict f(x) = 1; o/w predict f(x) = 0

Could you get better training accuracy by labeling the leaves differently?
What is the highest training accuracy you can get?

Binary classification with a depth-2 decision tree

          Acidity (A)   Sweetness (S)   y = 'Great taste'?   Model 2 predicts
Bev1      0.8           0.8             1                    1
Bev2      0.3           0.25            0                    1
Bev3      0.2           0.8             0                    1
Bev4      0.3           0.7             0                    1
Bev5      0.9           0.7             1                    0

Model 2 partitions the (Acidity, Sweetness) feature space into three regions:
  leaf 1: A < 0.5 (o/w)
  leaf 2: A >= 0.5 and S <= 0.75 (o/w)
  leaf 3: A >= 0.5 and S > 0.75
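
As a quick check, a minimal sketch (not from the slides) of Model 2 as drawn above; the leaf labels leaf 1 -> 1, leaf 2 -> 0, leaf 3 -> 1 are taken from the diagram, and the keyword arguments let you experiment with relabeling the leaves.

```python
# Model 2: split on Acidity at 0.5, then on Sweetness at 0.75.
def model2(acidity, sweetness, leaf1=1, leaf2=0, leaf3=1):
    if acidity < 0.5:
        return leaf1                       # "otherwise" branch of the root
    return leaf3 if sweetness > 0.75 else leaf2

X = [(0.8, 0.8), (0.3, 0.25), (0.2, 0.8), (0.3, 0.7), (0.9, 0.7)]
y = [1, 0, 0, 0, 1]
preds = [model2(a, s) for a, s in X]
print(preds, sum(p == t for p, t in zip(preds, y)), "/ 5")
# Try different leaf1/leaf2/leaf3 values to see how high the training accuracy can go.
```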
Binary classification with a linear classifier

          Acidity (A)   Sweetness (S)   y = 'Great taste'?   Model 3 predicts
Bev1      0.8           0.8             1                    ?
Bev2      0.3           0.25            0                    ?
Bev3      0.2           0.8             0                    ?
Bev4      0.3           0.7             0                    ?
Bev5      0.9           0.7             1                    ?

Model 3: f(A, S) = 1 if A + S - 1 ≥ 0, and 0 otherwise.

Compute the predictions of this model. Draw the decision boundary in the (Acidity, Sweetness) plane.
Binary classification with a linear classifier

          Acidity (A)   Sweetness (S)   y = 'Great taste'?   Model 4 predicts
Bev1      0.8           0.8             1                    ?
Bev2      0.3           0.25            0                    ?
Bev3      0.2           0.8             0                    ?
Bev4      0.3           0.7             0                    ?
Bev5      0.9           0.7             1                    ?

Model 4: f(A, S) = 1 if A + S - 1.3 ≥ 0, and 0 otherwise.

Compute the predictions of this model. Draw the decision boundary in the (Acidity, Sweetness) plane.
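
A minimal sketch (not from the slides) of the two linear classifiers, predicting 1 when A + S - b >= 0 with b = 1 for Model 3 and b = 1.3 for Model 4; the decision boundary in each case is the line A + S = b.

```python
# Linear classifier: predict 1 when A + S - b >= 0, else 0.
def linear_classifier(acidity, sweetness, b):
    return 1 if acidity + sweetness - b >= 0 else 0

beverages = {"Bev1": (0.8, 0.8), "Bev2": (0.3, 0.25), "Bev3": (0.2, 0.8),
             "Bev4": (0.3, 0.7), "Bev5": (0.9, 0.7)}

for b in (1.0, 1.3):
    preds = {name: linear_classifier(a, s, b) for name, (a, s) in beverages.items()}
    print(f"threshold b = {b}:", preds)
```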
Predicting Diabetes

Diabetes is the 8th leading cause of death in the US. Major goals include predicting and preventing diabetes.

• First patient in our dataset → Result: is the patient diabetic?
• Second patient in our dataset
• X denotes the features (columns X1, X2, ...); y is what we want to predict (the outcome / target).
Predicting
Diabetes
• What is X?
• What is y?
Predicting
Diabetes
Eye-balling it: bottom left
should be blue (outcome = 0),
top right probably should be
red (outcome = 1)

We want an algorithm: we
want an implementable (on a
computer) procedure that
takes two numbers, X1, X2,
and outputs “0” or “1”
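
For concreteness, a minimal sketch (not from the slides) of what such a procedure looks like; the threshold 45 is illustrative and anticipates the rule on the next slide, with 1 standing for red (diabetic) and 0 for blue.

```python
# A hypothetical example of an "algorithm" in this sense: any computable function
# that maps the two numbers (X1, X2) to an output of 0 or 1.
def classify(x1, x2):
    return 1 if x2 >= 45 else 0   # 1 = red (outcome 1), 0 = blue (outcome 0)
```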
A first algorithm: Decision Trees

Algorithm:
  X2 ≥ 45: red
  X2 < 45: blue

A first algorithm: Decision Trees

Algorithm:
  X1 ≥ 150: red
  X1 < 150: blue
Decision Trees

Decision Tree with Depth = 1: test Xi ≥ α; if not (o/w), predict y = a; if so, predict y = b.

We have 4 parameters:
(1) i
(2) α
(3) a
(4) b
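
A minimal sketch (not from the slides) of this depth-1 tree as an explicit function of its four parameters; the convention that b is predicted when the split condition holds, and a otherwise, is assumed from the examples that follow (i = 2, α = 45, a = blue, b = red).

```python
# A depth-1 decision tree (a stump) as a function of its parameters (i, alpha, a, b).
def stump_predict(x, i, alpha, a, b):
    return b if x[i] >= alpha else a

# Hypothetical patient with features (X1, X2) = (120, 50); Python indexing is
# 0-based, so i=1 below means "split on X2".
print(stump_predict([120, 50], i=1, alpha=45, a="blue", b="red"))   # -> "red"
```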
Decision Trees
Decision Tree with Depth = 1: test Xi ≥ α; if not, predict y = a; if so, predict y = b.

We have four parameters:
(1) i = 2
(2) α = 45
(3) a = blue
(4) b = red
Decision Trees
Decision Tree with Depth = 1: test Xi ≥ α; if not, predict y = a; if so, predict y = b.

We have four parameters:
(1) i = 1
(2) α = 150
(3) a = blue
(4) b = red
Decision Trees

Decision Tree with Depth = 2: test Xi ≥ α1 at the root; on the "otherwise" branch test Xj ≥ α2 (leaves y = a and y = b); on the "Xi ≥ α1" branch test Xk ≥ α3 (leaves y = c and y = d).

We have 10 parameters:
(1-3): i, j, k
(4-6): α1, α2, α3
(7-10): a, b, c, d
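
A minimal sketch (not from the slides) of the depth-2 tree as a function of its ten parameters; the branch layout is assumed to mirror the description above.

```python
# A depth-2 decision tree as a function of (i, j, k, alpha1, alpha2, alpha3, a, b, c, d).
def tree2_predict(x, i, j, k, alpha1, alpha2, alpha3, a, b, c, d):
    if x[i] >= alpha1:
        return d if x[k] >= alpha3 else c   # right subtree: split on feature k
    return b if x[j] >= alpha2 else a       # left ("otherwise") subtree: split on feature j
```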


Which of the following could be a decision tree?
The “best” depth-1 decision tree

Decision Tree with Depth = 1: test Xi ≥ α; if not, predict y = a; if so, predict y = b.
Keeping a = blue and b = red, we sweep through candidate splits (i, α) on the diabetes data:

(1) i = 2, α = 18
(2) i = 2, α = 32
(3) i = 2, α = 43
(4) i = 2, α = 53
(5) i = 1, α = 59
(6) i = 1, α = 82
(7) i = 1, α = 117
(8) i = 1, α = 147
(9) i = 1, α = 180
The “best” depth 1 decision tree

The best decision tree is the one that has


the lowest loss, in this case, the one that
makes the fewest mistakes on the data!
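
A minimal sketch of this search (not code from the lecture): brute force over the feature index, the thresholds observed in the data, and the two leaf labels, keeping the stump with the fewest training mistakes. The taste-test data at the end is just an example input.

```python
# Find the depth-1 tree (i, alpha, a, b) with the fewest mistakes on the training data.
def best_stump(X, y, labels=(0, 1)):
    best, best_err = None, float("inf")
    for i in range(len(X[0])):                       # which feature to split on
        for alpha in sorted({x[i] for x in X}):      # candidate thresholds from the data
            for a in labels:
                for b in labels:
                    preds = [b if x[i] >= alpha else a for x in X]
                    err = sum(p != t for p, t in zip(preds, y))
                    if err < best_err:
                        best, best_err = (i, alpha, a, b), err
    return best, best_err

# Example on the taste-test data: features (Acidity, Sweetness), labels y.
X = [(0.8, 0.8), (0.3, 0.25), (0.2, 0.8), (0.3, 0.7), (0.9, 0.7)]
y = [1, 0, 0, 0, 1]
print(best_stump(X, y))   # a split on Acidity separates this data with 0 mistakes
```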
Supervised learning paradigm

Features X (Acidity, Sweetness, Color, Carbonation, others…)  →  Model: h  →  prediction: h(X)

Goal: design and train the model to make good predictions on data it has not seen.
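
A minimal sketch of this paradigm in code (not from the slides, and assuming scikit-learn is available): fit a model h on training features X and labels y, then predict on data the model has not seen. The max_depth setting and the new beverage are illustrative.

```python
# Fit a small decision tree on the taste-test data and predict for an unseen beverage.
from sklearn.tree import DecisionTreeClassifier

X_train = [[0.8, 0.8], [0.3, 0.25], [0.2, 0.8], [0.3, 0.7], [0.9, 0.7]]  # (Acidity, Sweetness)
y_train = [1, 0, 0, 0, 1]

h = DecisionTreeClassifier(max_depth=2)
h.fit(X_train, y_train)

X_new = [[0.6, 0.9]]          # a beverage the model has not seen
print(h.predict(X_new))
```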
Several Important Objectives
• What algorithmic tools are available, and how to use and extend them, given computational constraints.
• How to use the algorithmic tools and statistical knowledge to ask the right questions.
• How to use algorithmic tools and statistical knowledge to interpret the results.
  • Are they meaningful?
  • Can we trust them?
