Introduction to
Data Science & ML
AI
Sources:
https://becominghuman.ai/how-to-get-the-perfect-start-in-ai-ml-as-newbie-learn-the-art-in-just-5-mins-cba28d2705e4
neuefische.de 3
WHAT IS “NOT MACHINE
LEARNING”?
Humans and algorithms
Which problems can be solved by “NOT MACHINE LEARNING”?
1. "Rock paper scissors"
2. "Tic tac toe"
3. Cookie Monster eats 10 kg of cookies each day. For every 10 kg
that he eats, he gets fatter by 5 kg. (The rest of the energy is
consumed by having to hunt for cookies.)
How many kg does Cookie Monster weigh today if his initial
weight was 100 kg and he has been eating cookies for 5 days?
Solution:
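Because every rule in the Cookie Monster problem is fixed, a few lines of ordinary code are enough to solve it. A minimal sketch in Python (the function and parameter names are our own and do not come from the slides):

```python
def weight_after(days, initial_weight_kg=100.0, cookies_per_day_kg=10.0,
                 gain_per_kg_cookies=0.5):
    """Deterministic rule: every 10 kg of cookies adds 5 kg of body weight."""
    weight = initial_weight_kg
    for _ in range(days):
        weight += cookies_per_day_kg * gain_per_kg_cookies
    return weight

print(weight_after(5))  # 125.0
```

Nothing is learned here: the program only applies the rule it was given.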
Humans and algorithms
Algorithm definition
“A finite set of unambiguous instructions that, given some set of initial conditions,
can be performed in a prescribed sequence to achieve a certain goal and that has a
recognizable set of end conditions.”
“Learning” - the act, process, or
experience of gaining knowledge
or skill.
In the examples above the Machine is
not learning; it is doing what you told it to do.
So who’s doing the “learning”?
Humans and algorithms
Does “Not Machine Learning” have limitations?
Cookie Monster sometimes gets a visit from his auntie and they drink tea together. His auntie brings 15 kg of cookies with her every time for
her favourite nephew to consume. His auntie comes only on days when she is in a good mood, and not more often than twice a week.
The following is known about Cookie Monster's auntie's mood swings:
- She likes it when it's sunny outside
- She doesn't like it if it's more than 28 degrees outside
- She doesn't like it if her neighbour is looking out of the window when she is leaving the house
- She likes to take tram number 1 and not tram number 3
The auntie is only in a good mood if the number of likes on the day outweighs the number of dislikes. It is also known that on average she is in a
good mood 3 times a week.
How many kg does Cookie Monster weigh today if his initial weight is 100 kg and he has been eating cookies for 5 days, his auntie has already come to
visit once this week, and it's been a nice week at 25 degrees, but tram number 1 is not working?
Solution:
Determinism and probability
Uncertainty
A deterministic system is one in which the occurrence of all events is known
with certainty. If the description of the system state at a particular point of time
of its operation is given, the next state can be perfectly predicted.
A probabilistic system is one in which the occurrence of events cannot be
perfectly predicted. Though the behavior of such a system can be described in
terms of probability, a certain degree of error is always attached to the
prediction of the behavior of the system.
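The difference can be sketched in code. The example below is an illustration under our own assumptions: the daily visit probability of 3/7 and the extra 7.5 kg per visit are simplifications chosen for the sketch, not values worked out on the slides.

```python
import random

def deterministic_day(weight_kg):
    # Same input always gives the same output: 10 kg of cookies -> +5 kg.
    return weight_kg + 5.0

def probabilistic_day(weight_kg, rng, p_visit=3 / 7):
    # Assumption for illustration: auntie visits with a fixed daily
    # probability, and her 15 kg of cookies add another 7.5 kg.
    extra = 7.5 if rng.random() < p_visit else 0.0
    return weight_kg + 5.0 + extra

rng = random.Random(42)
print(deterministic_day(100.0))  # always 105.0
# The probabilistic version can end in 105.0 or 112.5 for the same input.
print({probabilistic_day(100.0, rng) for _ in range(1000)})
```

For the deterministic system the next state is known exactly; for the probabilistic one we can only state how likely each outcome is.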
Determinism and probability
Heuristics / baseline model
A heuristic (/hjʊˈrɪstɪk/; from Ancient Greek εὑρίσκω (heurískō) 'I find, discover'), or heuristic
technique, is an approach to problem solving or self-discovery using 'a calculated guess' derived
from previous experiences. Heuristics are mental shortcuts that ease the cognitive load of
making a decision. Usually the opposite process to heuristics is the application of
algorithms. Algorithms involve calculated answers and guesswork is eliminated.
In our case: we could assume that auntie comes once a week. It’s not 100% right, but it’s not
completely wrong either.
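The "once a week" heuristic can serve as a baseline model. A minimal sketch, assuming (this is our assumption, not stated on the slides) that auntie's 15 kg of cookies turn into weight at the same 50% rate as the daily cookies:

```python
def baseline_weight(days, initial_weight_kg=100.0, visits_assumed=1):
    # Known rule: 10 kg of cookies per day adds 5 kg.
    daily_gain = 5.0 * days
    # Heuristic: assume a fixed number of auntie visits. We also assume
    # her 15 kg of cookies convert to weight at the same 50% rate.
    auntie_gain = visits_assumed * 15.0 * 0.5
    return initial_weight_kg + daily_gain + auntie_gain

print(baseline_weight(5))  # 132.5
```

A baseline like this gives any later, more complex model a number to beat.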
Humans or machines learning?
Cookie Monster's weight gain - an uncertain mystery
Very little is known about how Cookie Monster gains weight.
However, the following observations are available:
day | Kilograms of cookies consumed | Neighbour looking out of the window | Temperature outside | Tram 1 working | Lake water temperature | Evgeny teaching ML class | Weight beginning of day | Weight end of day
1   | 15 | Yes | 25 | 1 | 15   | 1 | 100   | 114.3
2   | 10 | No  | 23 | 0 | 15.5 | 0 | 114.3 | 120.7
3   | 40 | Yes | 29 | 1 | 15.3 | 1 | 120.7 | 135.4
Cookie Monster has a birthday in 2 weeks and the local municipality would like to give him a
postcard with his exact weight written on it. Can you accurately predict it?
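A quick check of the observations shows why a simple rule no longer works. The sketch below uses only the cookie and weight columns from the table; the helper name is our own.

```python
# Observations copied from the table: (cookies_kg, weight_start, weight_end)
observations = [
    (15, 100.0, 114.3),
    (10, 114.3, 120.7),
    (40, 120.7, 135.4),
]

def gain_per_kg(cookies_kg, weight_start, weight_end):
    """Weight gained per kilogram of cookies eaten on one day."""
    return (weight_end - weight_start) / cookies_kg

ratios = [gain_per_kg(*row) for row in observations]
for day, ratio in enumerate(ratios, start=1):
    print(f"day {day}: {ratio:.2f} kg gained per kg of cookies")
```

The ratio falls from roughly 0.95 on day 1 to under 0.4 on day 3, so "5 kg per 10 kg of cookies" does not describe this data, and three rows are far too few to tell which of the other columns matter.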
Humans or machine learning?
What if the system is non-deterministic and also highly
complex?
● It is difficult to understand what the rules are
● The rules are too complex to write down
● There are too many rules
● Rules sometimes apply and sometimes don’t and you
don’t know when or why
● You have tried heuristics and they don’t work well
Machine Learning
Perhaps the Machine can figure it out?
If it’s too much for you to figure out, perhaps the Machine could?
Human learning Machine Learning
MACHINE LEARNING IS A TOOL
TO DEAL WITH UNCERTAINTY IN
PROBABILISTIC SYSTEMS
Use it when you have exhausted all other options and not because you were too lazy to think and explore
DO NOT SOLVE
DETERMINISTIC PROBLEMS WITH
MACHINE LEARNING
Using Machine Learning introduces complexity and overheads that can only be justified if they are absolutely necessary
WHAT IS DATA?
once upon a time
interactions
& learning
interactions
& learning
now
interactions
& learning
interactions
& learning
now
everything can be data:
a click, walking with your phone, opening
zoom, accessing a website, the weather,
buying something online, paying by card
we both produce data and are clients of the
data systems that collect our data
data lifecycle
interaction - collection - transformation - enriching - modeling - getting
insights - improving the application
WHO DOES WHAT
IN DATA?
some data roles in keywords
data engineer - data warehouse, data lake, data infrastructure, data
pipeline, data transformation and enriching, ETL, automation, software
engineering
closely related roles : data ops, ml ops
some data roles in keywords
data analyst - data warehouse, data pipeline, data transformation and
enriching, ETL, data analysis, EDA, KPIs, statistics, data exploration,
dashboards, visualization, communicating, assessing data products
closely related roles : product analyst, data scientist, data visualizer,
(growth hacker...)
some data roles in keywords
data scientist - data pipeline, data analysis, KPIs, statistics, data
exploration, visualization, EDA, communicating, data modeling,
predicting, building data products, deep learning
closely related roles : product analyst, machine learning engineer, data
visualizer
some data roles in keywords
machine learning engineer - data pipeline, data analysis, data
modeling, predicting, building data products, automation, software
engineering
closely related roles : data scientist, data engineer
WHAT IS AI?
What is AI?
It’s not the Terminator
It is a branch of Computer Science!! with
subdomains
Narrow AI: real AI .. math / computational
statistics on steroids .. solves one task
General AI: imaginary AI .. killer robots,
paperclip machine (decides to build paper clips
and drowns all mankind)
Technochauvinism: believing that all problems
can be solved by tech
Meredith Broussard, Artificial Unintelligence
https://www.c-span.org/video/?457638-2/artificial-unintelligence
Machine Learning
When was the term Machine Learning coined?
What about Neural Networks?
Machine Learning
When was the term Machine learning coined?
1959 Arthur Samuel
The term machine learning was coined in 1959 by Arthur Samuel, an American IBMer and
pioneer in the field of computer gaming and artificial intelligence.
What about neural networks?
1958 psychologist Frank Rosenblatt
The first artificial neural network was invented in 1958 by psychologist Frank Rosenblatt. Called
the Perceptron, it was intended to model how the human brain processed visual data and learned to
recognize objects.
Machine Learning - what changed
AI winter because the ideas were “ahead of their time”
Computing power and Democratization of algorithms
Now it is AT SCALE
ML Applications in Society
How can Machine learning help?
Smarter weather prediction and agriculture
Energy optimization
Self-driving cars
AI in healthcare / Drug discovery
Finance / Fraud detection
On-demand language translation
Machine Learning - applications
What can we do with it?
Product recommendations
Demand prediction for a service
Dynamic Pricing in transport
Predictive maintenance
Winning a game of chess
Sentiment analysis
Personalized medication
https://www.projectpro.io/article/10-awesome-machine-learning-applications-of-today/364
AI - Effect on Society
How can AI be dangerous?
Autonomous weapons
Social manipulation
Invasion of privacy and social grading
Recruiting
Amplifies discrimination
check out Coded Bias on Netflix
AI and discrimination - PULSE AI
Twitter storm
AI and the big players
Ethics or profit?
Facebook (meta), Google, Twitter, Amazon, Apple ..
are constantly in the news with stories about their
algorithms not being properly regulated
https://www.theregreview.org/2022/01/03/cusumano-yoffie-gawer-pushing-social-media-self-regulate/
Awful AI
https://github.com/daviddao/awful-ai
WHERE IS BIAS
COMING FROM?
who is contributing to the data?
what do we do with the data?
algorithms can also be biased,
examples:
do they care about the average?
is the target of the model really what the system
should optimise for?
today in the Markup newsletter
https://www.wsj.com/articles/facebook-algorithm-change-zuckerberg-11631654215
MACHINE
LEARNING
Machine Learning
AI can answer only 5
questions
1. How much?/How many?
2. Which class/category?
3. Which group?
4. Is it weird?
5. Which action?
Sources:
Microsoft AI
https://www.flaticon.com/free-icon/hand_328035
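Each of the five questions corresponds to a family of ML tasks. The lookup below is a small sketch of the usual mapping; the dictionary name is our own and the task labels are the common textbook terms, not wording from the slide.

```python
# Common mapping of the five questions to ML task families.
QUESTION_TO_TASK = {
    "How much? / How many?": "regression",
    "Which class/category?": "classification",
    "Which group?": "clustering",
    "Is it weird?": "anomaly detection",
    "Which action?": "reinforcement learning",
}

for question, task in QUESTION_TO_TASK.items():
    print(f"{question:<24} -> {task}")
```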
Machine Learning
Birds-Eye View
Sources:
https://datute.net/bigdata.html
Machine Learning
Supervised Learning
Training data (known data) includes
the desired output (response) as well.
Example:
Predicting house prices based on given
features like: number of rooms,
bathrooms, garage space, year it was
built, location, etc.
[Figure: known data with a known response ("these are apples") trains a model; given new data, the model returns a new response ("This is an apple").]
Sources:
Apple: https://www.flaticon.com/free-icon/apple_415682?term=apple&page=1&position=12
Machine Learning:
https://www.flaticon.com/free-icon/machine-learning_2464316?term=machine%20learning&page=2&position=5
Computer: https://www.flaticon.com/free-icon/pc-monitor_81793?term=computer%20screen&page=6&position=14
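The house price example can be sketched as a one-feature regression. The numbers below are invented for illustration and are perfectly linear on purpose; real house prices are noisier and depend on many more features.

```python
# Toy training data (invented for illustration): rooms -> price in thousands.
rooms = [2, 3, 4, 5]
prices = [200.0, 300.0, 400.0, 500.0]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# "Training": known data plus known response.
slope, intercept = fit_line(rooms, prices)

# "Prediction": new data in, new response out.
new_house_rooms = 6
print(slope * new_house_rooms + intercept)  # 600.0
```

The model never sees a rule such as "100 per room"; it recovers it from the labeled examples.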
Machine Learning
Unsupervised Learning
The training data (known data)
does NOT include the desired
output (response)
Example:
Grouping customers by
purchasing behavior
[Figure: input data goes into a model, which reports "I can see a pattern!"]
Sources:
Apple/Banana/And Pear: https://www.flaticon.com/packs/summer-food-drink
Machine Learning:
https://www.flaticon.com/free-icon/machine-learning_2464316?term=machine%20learning&page=2&position=5
Computer: https://www.flaticon.com/free-icon/pc-monitor_81793?term=computer%20screen&page=6&position=14
Thinking Bubble: https://www.flaticon.com/free-icon/thinking_522938?term=thinking%20bubble&page=1&position=17
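Grouping customers without labels can be sketched with a tiny k-means on a single feature. This is an illustration only: the customer data is invented, the function name is our own, and k-means is one possible clustering technique, not necessarily the one the slide has in mind.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Tiny k-means for one feature (assumes k >= 2); returns (centroids, labels)."""
    # Deterministic start: spread the initial centroids between min and max.
    lo, hi = min(values), max(values)
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels

# Invented data: purchases per month for six customers (no labels given).
purchases = [1, 2, 3, 20, 21, 22]
centroids, labels = kmeans_1d(purchases)
print(centroids)  # [2.0, 21.0]
print(labels)     # [0, 0, 0, 1, 1, 1]
```

The algorithm is never told which customers are "occasional" or "frequent" buyers; it only finds that the data falls into two groups.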
Machine Learning
Semi-supervised Learning
Training data includes SOME of the
desired output
Example:
Photo archive, where only some images
are labeled (e.g. dog, cat, person) and the
majority is unlabeled.
Machine Learning
Reinforcement Learning
Training data has a feedback loop
Example:
autonomous video game player
Sources:
https://www.kdnuggets.com/2018/03/5-things-reinforcement-learning.html
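The feedback loop can be sketched with a very small agent that learns which button to press. Everything here is a hypothetical toy: the two-action "game", the reward values and the epsilon-greedy strategy are our own choices for illustration.

```python
import random

# Hypothetical environment: pressing "right" scores a point, "left" does not.
REWARDS = {"left": 0.0, "right": 1.0}

def train(steps=100, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = list(REWARDS)
    value = {a: 0.0 for a in actions}   # estimated reward per action
    count = {a: 0 for a in actions}
    for step in range(steps):
        if step < len(actions):
            action = actions[step]                  # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)            # explore
        else:
            action = max(actions, key=value.get)    # exploit the best estimate
        reward = REWARDS[action]                    # feedback from the environment
        count[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        value[action] += (reward - value[action]) / count[action]
    return value

value = train()
print(max(value, key=value.get))  # right
```

There is no labeled training set here; the agent generates its own data by acting and uses the reward to adjust its next decision.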
Machine Learning
Regression vs.
Classification
Sources:
https://datute.net/bigdata.html
Supervised learning
Classification vs. Regression
Unsupervised learning
Dimensionality reduction
Sources:
Hands-on Machine Learning, Géron
Unsupervised learning
Clustering
Sources:
sklearn data set, own visualization
Deep Learning
Definition
Deep Learning is a class of ML
algorithms that uses multiple layers
to progressively extract higher level
features from the raw input.
For example, in image processing,
lower layers may identify edges, while
higher layers may identify the
concepts relevant to a human such as
digits or letters or faces.
[Figure: nested sets, AI contains ML, ML contains DL]
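The idea of stacking layers can be shown with a toy two-layer network. This is a sketch under strong simplifications: the weights are set by hand rather than learned, and the task (exclusive or on two inputs) is our own choice, far smaller than image processing.

```python
def relu(x):
    return max(0.0, x)

def tiny_network(x1, x2):
    # Layer 1: two simple features of the raw input.
    h1 = relu(x1 + x2)        # how many inputs are on
    h2 = relu(x1 + x2 - 1.0)  # fires only when both inputs are on
    # Layer 2: combine them into a higher-level concept (exclusive or).
    return h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, tiny_network(a, b))
```

No single layer of this kind can compute exclusive or on its own; the second layer works because the first one has already extracted useful intermediate features.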
and more
Time Series Forecasting
A time series is a series of data
points indexed in time order; most
commonly the data points are taken
at equal intervals.
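A very simple forecast for such a series is the average of the most recent points. The sketch below uses invented daily measurements and a helper name of our own; it is a baseline, not a full forecasting method.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` points."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    recent = series[-window:]
    return sum(recent) / window

# Invented daily measurements taken at equal intervals.
daily_weight_kg = [100.0, 105.0, 110.0, 115.0, 120.0]
print(moving_average_forecast(daily_weight_kg))  # 115.0
```

Note that this baseline lags behind a steady upward trend, which is one reason dedicated time series models exist.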
and more
Natural Language Processing
NLP is the field dealing with how to
program computers to process and
analyze large amounts of natural
language data.
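A common first step in processing text is to turn it into word counts ("bag of words"). A minimal sketch using only the Python standard library; the example sentence and the function name are our own.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

counts = bag_of_words("Cookie Monster eats cookies. Cookie Monster likes cookies!")
print(counts)
```

Counts like these are numbers a model can work with, even though word order and meaning are lost.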
BECOMING A
DATA SCIENTIST
some data roles in keywords
data scientist - data pipeline, data analysis, KPIs, statistics, data
exploration, visualization, communicating, data modeling, predicting,
building data products, EDA, deep learning
closely related roles : product analyst, machine learning engineer, data
visualizer
learn about the subject and where
your past experience fits in
book: https://www.manning.com/books/build-a-career-in-data-science
podcast:
https://open.spotify.com/show/78Nft51TuU3X2urEKfCuys?si=-f7cN3v2Sgu0pyDelBc-Yg&dl_branch=1
Getting started
Try it out: Kaggle, Zindi, ...
and more
Sources:
https://www.kaggle.com/c/reducing-commercial-aviation-fatalities/overview
Week 3
Getting started with coding
■ Working with IDEs and Python scripts
■ pandas and NumPy
■ SQL
■ Visualization
■ Data Cleaning
■ Exploratory Data Analysis