
03 Intro To Data Science ML

The document provides an introduction to data science and machine learning, explaining concepts such as algorithms, deterministic vs. probabilistic systems, and the roles of various data professionals. It discusses the limitations of machine learning, its applications in society, and the importance of understanding data biases. Additionally, it outlines different types of machine learning, including supervised, unsupervised, and reinforcement learning, along with their respective examples.

Uploaded by

raul.ogz
Copyright © All Rights Reserved

Introduction to

Data Science & ML


AI

Sources:
https://becominghuman.ai/how-to-get-the-perfect-start-in-ai-ml-as-newbie-learn-the-art-in-just-5-mins-cba28d2705e4
neuefische.de
WHAT IS “NOT MACHINE LEARNING”?

Humans and algorithms

Which problems can be solved by “NOT MACHINE LEARNING”?

1. "Rock paper scissors"


2. "Tic tac toe"

3. Cookie Monster eats 10 kg of cookies each day. For every 10 kg that he eats, he gets 5 kg fatter. (The rest of the energy is consumed by having to hunt for cookies.)

How many kg does Cookie Monster weigh today if his initial weight was 100 kg and he has been eating cookies for 5 days?

Solution:
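Because every quantity in this problem is fixed, a plain deterministic algorithm answers it exactly; nothing is learned. A minimal Python sketch, assuming the 5 kg gain per 10 kg of cookies applies in full every day (the function and parameter names are illustrative):

```python
def cookie_monster_weight(initial_kg: float, days: int,
                          cookies_per_day_kg: float = 10.0,
                          gain_per_10kg_cookies: float = 5.0) -> float:
    """Deterministic rule: the same inputs always produce the same weight."""
    daily_gain = cookies_per_day_kg / 10.0 * gain_per_10kg_cookies
    return initial_kg + days * daily_gain

print(cookie_monster_weight(100, 5))  # 125.0
```

The answer is 125.0 kg, and it will be 125.0 kg every time the code runs, which is exactly what makes this a “NOT MACHINE LEARNING” problem.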

Humans and algorithms

Algorithm definition

“A finite set of unambiguous instructions that, given some set of initial conditions,
can be performed in a prescribed sequence to achieve a certain goal and that has a
recognizable set of end conditions.”

“Learning”: the act, process, or experience of gaining knowledge or skill.

In the examples above, the machine is not learning; it is doing what you told it to. So who is doing the “learning”?

Humans and algorithms

Does “Not Machine Learning” have limitations?


Cookie Monster sometimes gets a visit from his auntie, and they drink tea together. Every time, his auntie brings 15 kg of cookies for her favourite nephew to eat. She only comes on days when she is in a good mood, and never more than 2 times a week.

The following is known about the auntie's mood swings:

- She likes it when it's sunny outside
- She doesn't like it when it's more than 28 degrees outside
- She doesn't like it when her neighbour is looking out of the window as she leaves the house
- She likes to take tram number 1 and not tram number 3

The auntie is only in a good mood if the number of likes on that day outweighs the number of dislikes. It is also known that, on average, she is in a good mood 3 times a week.

How many kg does Cookie Monster weigh today if his initial weight was 100 kg, he has been eating cookies for 5 days, his auntie has already visited once this week, and it's been a nice week at 25 degrees, but tram number 1 is not working?

Solution:
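Writing the auntie's rules down as code shows where the rule-based approach breaks: one input is simply unknown. This is a sketch under explicit assumptions: the “nice week” is read as sunny, tram 1 being out counts as a missing like rather than a dislike, and nobody knows whether the neighbour was at the window, so both cases are tried. The function and parameter names are illustrative.

```python
def auntie_in_good_mood(sunny: bool, temperature_c: float,
                        neighbour_watching: bool, tram_1_running: bool) -> bool:
    """Rule from the text: good mood only if likes outweigh dislikes."""
    # Assumption: no tram 1 means one like fewer, not an extra dislike.
    likes = int(sunny) + int(tram_1_running)
    dislikes = int(temperature_c > 28) + int(neighbour_watching)
    return likes > dislikes

# Known: sunny (assumed), 25 degrees, tram 1 not running.
# Unknown: whether the neighbour was looking out of the window.
possible = {auntie_in_good_mood(True, 25, watching, False) for watching in (True, False)}
print(possible)  # contains both True and False: the rules alone cannot decide
```

Under a stricter reading, where taking tram 3 counts as a dislike, the outcome flips to a definite “no mood, no visit”. That the answer depends on how a rule is interpreted is itself a sign of how fragile hand-written rules become.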

Determinism and probability

Uncertainty

A deterministic system is one in which the occurrence of all events is known with certainty. If the description of the system state at a particular point in time of its operation is given, the next state can be perfectly predicted.
A probabilistic system is one in which the occurrence of events cannot be
perfectly predicted. Though the behavior of such a system can be described in
terms of probability, a certain degree of error is always attached to the
prediction of the behavior of the system.
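The contrast can be sketched in a few lines of Python. The deterministic step always maps the same state to the same next state; the probabilistic step can only be described by a probability. The 3-in-7 daily visit chance is an assumption derived from “good mood 3 times a week” (ignoring the two-visit cap), and a 15 kg cookie visit is assumed to add 7.5 kg under the same 5-per-10 rule. The seeded generator only makes the run repeatable.

```python
import random

def deterministic_step(weight_kg: float) -> float:
    """Fully predictable: 10 kg of cookies per day always adds 5 kg."""
    return weight_kg + 5.0

def probabilistic_step(weight_kg: float, rng: random.Random,
                       p_visit: float = 3 / 7) -> float:
    """Predictable only in probability: an auntie visit (assumed chance p_visit) adds 7.5 kg."""
    gain = 5.0
    if rng.random() < p_visit:
        gain += 7.5
    return weight_kg + gain

rng = random.Random(42)  # seeded only so the run is repeatable
print(deterministic_step(100.0), deterministic_step(100.0))  # same input, same output every time
outcomes = {probabilistic_step(100.0, rng) for _ in range(20)}
print(outcomes)  # a subset of {105.0, 112.5}: same input, different possible outputs
```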

Determinism and probability

Heuristics / baseline model

A heuristic (/hjʊˈrɪstɪk/; from Ancient Greek εὑρίσκω (heurískō) 'I find, discover'), or heuristic technique, is an approach to problem solving or self-discovery using 'a calculated guess' derived from previous experiences. Heuristics are mental shortcuts that ease the cognitive load of making a decision.[1][2] Usually the opposite process to heuristics is the application of algorithms. Algorithms involve calculated answers, and guesswork is eliminated.

In our case: we could assume that the auntie comes once a week. It's not 100% right, but it's not completely wrong either.
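In code, a heuristic baseline is just a fixed rule of thumb. A sketch assuming, as suggested above, exactly one auntie visit per week, spread evenly over the days, with each visit's 15 kg of cookies adding 7.5 kg under the 5-per-10 rule (the function name is illustrative):

```python
def heuristic_weight(initial_kg: float, days: int) -> float:
    """Rule of thumb: own cookies add 5 kg/day; one auntie visit per week adds 7.5 kg."""
    own_gain_per_day = 5.0
    auntie_gain_per_day = 7.5 / 7  # one visit per week, spread evenly over the days
    return initial_kg + days * (own_gain_per_day + auntie_gain_per_day)

print(round(heuristic_weight(100, 7), 1))  # 142.5
```

A baseline like this is also a yardstick: any model built later should at least beat it, otherwise the extra complexity isn't paying off.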

Humans or machine learning?

Cookie Monster's weight gain: an uncertain mystery

Very little is known about how Cookie Monster gains weight. However, the following observations are available:

day | kilograms of cookies consumed | neighbour looking out of the window | temperature outside | tram 1 working | lake water temperature | Evgeny teaching ML class | weight at beginning of day (kg) | weight at end of day (kg)
----|-------------------------------|-------------------------------------|---------------------|----------------|------------------------|--------------------------|---------------------------------|--------------------------
1   | 15                            | Yes                                 | 25                  | 1              | 15                     | 1                        | 100                             | 114.3
2   | 10                            | No                                  | 23                  | 0              | 15.5                   | 0                        | 114.3                           | 120.7
3   | 40                            | Yes                                 | 29                  | 1              | 15.3                   | 1                        | 120.7                           | 135.4

Cookie Monster has his birthday in 2 weeks, and the local municipality would like to give him a postcard with his exact weight written on it. Can you predict it accurately?
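Before reaching for a model, it helps to check whether any simple rule fits the three observations. The sketch below copies part of the table into Python and compares the actual daily gain with the deterministic rule from the first exercise (0.5 kg gained per kg of cookies); the choice of columns and the variable names are ours.

```python
# (day, cookies consumed in kg, weight at beginning of day, weight at end of day), from the table above
observations = [
    (1, 15, 100.0, 114.3),
    (2, 10, 114.3, 120.7),
    (3, 40, 120.7, 135.4),
]

gains_per_kg = []
for day, cookies, start, end in observations:
    actual_gain = end - start
    rule_gain = 0.5 * cookies  # first exercise's rule: 5 kg per 10 kg of cookies
    gains_per_kg.append(actual_gain / cookies)
    print(f"day {day}: actual gain {actual_gain:.1f} kg, rule predicts {rule_gain:.1f} kg")

print("kg gained per kg of cookies:", [round(g, 2) for g in gains_per_kg])
```

The gain per kilogram of cookies swings from about 0.37 to about 0.95, so neither the old rule nor any other constant explains the data. With only three rows and several candidate input columns, the data alone cannot pin down an exact weight for the postcard either.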

Humans or machine learning?

What if the system is non-deterministic and also highly complex?
● It is difficult to understand what the rules are
● The rules are too complex to write down
● There are too many rules
● Rules sometimes apply and sometimes don’t, and you don’t know when or why
● You have tried heuristics and they don’t work well

Machine Learning

Perhaps the Machine can figure it out?

If it’s too much for you to figure out, perhaps the Machine could?
[Figure: human learning vs. machine learning]
MACHINE LEARNING IS A TOOL TO DEAL WITH UNCERTAINTY IN PROBABILISTIC SYSTEMS
Use it when you have exhausted all other options, not because you were too lazy to think and explore.
DO NOT SOLVE DETERMINISTIC PROBLEMS WITH MACHINE LEARNING
Using machine learning introduces complexity and overhead that can only be justified if they are absolutely necessary.
WHAT IS DATA?

once upon a time
[Figure: interactions & learning]

now
[Figure: interactions & learning]


now

everything can be data: a click, walking with your phone, opening Zoom, accessing a website, the weather, buying something online, paying by card

we both produce data and are clients of the data systems that collect our data

data lifecycle

interaction - collection - transformation - enriching - modeling - getting insights - improving the application
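The lifecycle can be read as a chain of steps, each consuming the previous step's output. A toy Python sketch with hypothetical click events; every function name and field here is illustrative, and “modeling” is reduced to a simple average.

```python
from statistics import mean

def collect():
    # Interaction + collection: raw events as an app might log them (hypothetical data)
    return [{"user": "a", "page": "home", "seconds": "12"},
            {"user": "b", "page": "shop", "seconds": "45"},
            {"user": "a", "page": "shop", "seconds": "30"}]

def transform(events):
    # Transformation: fix types (seconds arrive as strings)
    return [{**e, "seconds": int(e["seconds"])} for e in events]

def enrich(events):
    # Enriching: add a derived field
    return [{**e, "is_shop": e["page"] == "shop"} for e in events]

def model(events):
    # Modeling, kept trivial here: average time spent on shop pages
    return mean(e["seconds"] for e in events if e["is_shop"])

insight = model(enrich(transform(collect())))
print(f"average time on shop pages: {insight} s")  # an insight that could feed back into the app
```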

WHO DOES WHAT IN DATA?

some data roles in keywords

data engineer - data warehouse, data lake, data infrastructure, data pipeline, data transformation and enriching, ETL, automation, software engineering

closely related roles: DataOps, MLOps

some data roles in keywords

data analyst - data warehouse, data pipeline, data transformation and enriching, ETL, data analysis, EDA, KPIs, statistics, data exploration, dashboards, visualization, communicating, assessing data products

closely related roles: product analyst, data scientist, data visualizer (growth hacker...)

some data roles in keywords

data scientist - data pipeline, data analysis, KPIs, statistics, data exploration, visualization, EDA, communicating, data modeling, predicting, building data products, deep learning

closely related roles: product analyst, machine learning engineer, data visualizer

some data roles in keywords

machine learning engineer - data pipeline, data analysis, data modeling, predicting, building data products, automation, software engineering

closely related roles: data scientist, data engineer

WHAT IS AI?

What is AI?

It’s not the Terminator

It is a branch of computer science, with subdomains

Narrow AI: real AI; math / computational statistics on steroids; solves one task

General AI: imaginary AI; killer robots, the paperclip machine (decides to build paper clips and drowns all of mankind in them)

Technochauvinism: believing that all problems can be solved by tech

Meredith Broussard, Artificial Unintelligence
https://www.c-span.org/video/?457638-2/artificial-unintelligence

Machine Learning

When was the term Machine Learning coined?

What about Neural Networks?

Machine Learning

When was the term Machine learning coined?

1959 Arthur Samuel

The term machine learning was coined in 1959 by Arthur Samuel, an American IBMer and
pioneer in the field of computer gaming and artificial intelligence.

What about neural networks?

1958 psychologist Frank Rosenblatt

The first artificial neural network was invented in 1958 by psychologist Frank Rosenblatt. Called the Perceptron, it was intended to model how the human brain processed visual data and learned to recognize objects.

Machine Learning - what changed

AI winter: the ideas were “ahead of their time”

Computing power and democratization of algorithms

Now it is AT SCALE

ML Applications in Society

How can Machine learning help?

Smarter weather prediction and agriculture

Energy optimization

Self-driving cars

AI in healthcare / Drug discovery

Finance / Fraud detection

On-demand language translation

Machine Learning - applications

What can we do with it?

Product recommendations

Demand prediction for a service

Dynamic Pricing in transport

Predictive maintenance

Winning a game of chess

Sentiment analysis

Personalized medication

https://www.projectpro.io/article/10-awesome-machine-learning-applications-of-today/364
neuefische.de 32
AI - Effect on Society

How can AI be dangerous?

Autonomous weapons

Social manipulation

Invasion of privacy and social grading

Recruiting

Amplifies discrimination

Check out Coded Bias on Netflix.

AI - Effect on Society

AI and discrimination - PULSE AI

Twitter storm

AI and the big players

Ethics or profit?

Facebook (Meta), Google, Twitter, Amazon, Apple ...

are constantly in the news with stories about their algorithms not being properly regulated

https://www.theregreview.org/2022/01/03/cusumano-yoffie-gawer-pushing-social-media-self-regulate/

Awful AI

https://github.com/daviddao/awful-ai

WHERE IS BIAS COMING FROM?

who is contributing to the data?

what do we do with the data?

algorithms can also be biased, for example:
do they care about the average?
is the target of the model really what the system should optimise for?

today in the Markup newsletter

https://www.wsj.com/articles/facebook-algorithm-change-zuckerberg-11631654215
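The question “do they care about the average?” can be made concrete: one average error can look acceptable while a smaller group is served badly. The error values below are invented purely for illustration.

```python
from statistics import mean

# Hypothetical absolute prediction errors, tagged by group: 9 majority cases, 1 minority case
errors = [("majority", 1.0)] * 9 + [("minority", 10.0)]

overall = mean(e for _, e in errors)
by_group = {g: mean(e for grp, e in errors if grp == g) for g in ("majority", "minority")}
print(overall)   # 1.9: looks fine on average
print(by_group)  # the minority group's error is 10 times the majority's
```

Optimising only the overall number would leave the minority group's error untouched, which is why it pays to look at errors per group.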

MACHINE LEARNING

Machine Learning

AI can answer only 5 questions

1. How much?/How many?


2. Which class/category?
3. Which group?
4. Is it weird?
5. Which action?

Sources:
Microsoft AI
https://www.flaticon.com/free-icon/hand_328035
Machine Learning

Bird's-Eye View

Sources:
https://datute.net/bigdata.html
Machine Learning

Supervised Learning
[Figure: a model trained on known data with a known response (“these are apples”) gives a new response (“This is an apple”) for new data]

Machine Learning

Supervised Learning

Training data (known data) includes the desired output (response) as well.

Example:
Predicting house prices based on given features like number of rooms, bathrooms, garage space, year it was built, location, etc.

[Figure: known data with known response (“these are apples”) → model → new data gets a new response (“This is an apple”)]

Sources:
Apple: https://www.flaticon.com/free-icon/apple_415682?term=apple&page=1&position=12
Machine Learning: https://www.flaticon.com/free-icon/machine-learning_2464316?term=machine%20learning&page=2&position=5
Computer: https://www.flaticon.com/free-icon/pc-monitor_81793?term=computer%20screen&page=6&position=14
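A minimal supervised-learning sketch: the training pairs include the known response (price), the model learns from them, and it then predicts a response for new data. To stay dependency-free, it fits a one-feature least-squares line (price from number of rooms) instead of using all the features listed above; the house data are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Known data with known response (hypothetical): rooms -> price in thousands of euros
rooms = [1, 2, 3, 4, 5]
price = [150, 200, 250, 300, 350]

slope, intercept = fit_line(rooms, price)
print(slope * 6 + intercept)  # new data: predicted price of a 6-room house -> 400.0
```

With real data you would use several features and a library model (for example scikit-learn's linear regression), but the pattern is the same: fit on known responses, predict on new inputs.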
Machine Learning

Unsupervised Learning
[Figure: unlabeled input data (apples, bananas, pears) → model: “I can see a pattern!”]

Sources:
Apple/Banana/Pear: https://www.flaticon.com/packs/summer-food-drink
Machine Learning: https://www.flaticon.com/free-icon/machine-learning_2464316?term=machine%20learning&page=2&position=5
Computer: https://www.flaticon.com/free-icon/pc-monitor_81793?term=computer%20screen&page=6&position=14
Thinking Bubble: https://www.flaticon.com/free-icon/thinking_522938?term=thinking%20bubble&page=1&position=17
Machine Learning

Unsupervised Learning

The training data (known data) does NOT include the desired output (response).

Example:
Grouping customers by purchasing behavior

[Figure: input data → model: “I can see a pattern!”]

Sources:
Apple/Banana/Pear: https://www.flaticon.com/packs/summer-food-drink
Machine Learning: https://www.flaticon.com/free-icon/machine-learning_2464316?term=machine%20learning&page=2&position=5
Computer: https://www.flaticon.com/free-icon/pc-monitor_81793?term=computer%20screen&page=6&position=14
Thinking Bubble: https://www.flaticon.com/free-icon/thinking_522938?term=thinking%20bubble&page=1&position=17
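Grouping customers by purchasing behavior is a clustering task. Below is a bare-bones k-means sketch in pure Python. It assumes two made-up features per customer (orders per month, average basket in euros) and k = 2. The initial centroids are simply the first two points, which keeps the run reproducible; real implementations use smarter initialisation.

```python
from math import dist

def kmeans(points, k, iterations=10):
    centroids = list(points[:k])  # naive initialisation: first k points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (orders per month, average basket in euros); no labels given
customers = [(1, 20), (10, 90), (2, 25), (1, 15), (12, 80), (9, 100)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)
```

The algorithm never sees a “correct” group; it only finds that occasional small-basket buyers and frequent big-basket buyers sit apart.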
Machine Learning

Semi-supervised Learning

Training data includes SOME of the desired output

Example:
Photo archive, where only some images are labeled (e.g. dog, cat, person) and the majority is unlabeled.
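One simple semi-supervised strategy is self-training (pseudo-labelling): use the few labeled items to label the unlabeled ones, then train on everything. A hedged sketch, assuming each photo has already been reduced to a 2-D feature vector (made up here) and using the nearest labeled neighbour to assign pseudo-labels in a single pass, with no confidence threshold:

```python
from math import dist

labeled = {(0.9, 0.1): "dog", (0.1, 0.9): "cat"}                 # the few labeled photos
unlabeled = [(0.8, 0.2), (0.2, 0.8), (0.95, 0.05), (0.15, 0.7)]  # the unlabeled majority

def pseudo_label(point, labeled):
    """Give an unlabeled point the label of its closest labeled point."""
    nearest = min(labeled, key=lambda q: dist(point, q))
    return labeled[nearest]

pseudo = {p: pseudo_label(p, labeled) for p in unlabeled}
training_set = {**labeled, **pseudo}  # every photo now has a (pseudo-)label
print(training_set)
```

Practical self-training usually retrains a model on the enlarged set and keeps only pseudo-labels the model is confident about; this sketch shows just the labelling step.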

Machine Learning

Reinforcement Learning

Learning happens through a feedback loop: the agent acts, receives a reward or penalty, and adjusts its behaviour

Example:
autonomous video game player

Sources:
https://www.kdnuggets.com/2018/03/5-things-reinforcement-learning.html
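In reinforcement learning, an agent acts, gets a reward back from the environment, and adjusts. A minimal tabular Q-learning sketch on a made-up 5-cell corridor “game” (reward 1 for reaching the right end). The environment, rewards and hyperparameters are all assumptions for illustration.

```python
import random

N_STATES = 5       # cells 0..4; cell 4 is the goal
ACTIONS = (-1, 1)  # move left or move right

def env_step(state, action):
    """Hypothetical environment: stay inside the corridor, reward 1 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily on current estimates
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = env_step(state, action)
            # Feedback loop: nudge the estimate towards reward + discounted future value
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # greedy action per non-goal cell; 1 means "move right"
```

Nobody tells the agent that “right” is correct; it discovers this from the rewards, which is the feedback loop in action.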
Machine Learning

Regression vs. Classification

Sources:
https://datute.net/bigdata.html

Supervised learning

Classification vs. Regression
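Regression answers “how much?” with a number; classification answers “which class?” with a label. A tiny classification sketch: a one-feature decision stump that learns the weight threshold with the fewest training mistakes. The fruit weights and labels are invented, and a stump is only one of many possible classifiers.

```python
def fit_stump(xs, labels, low_label, high_label):
    """Learn a threshold t for the rule: x < t -> low_label, otherwise high_label."""
    def errors(t):
        return sum((low_label if x < t else high_label) != y for x, y in zip(xs, labels))
    # Try every observed value as a threshold; keep the one with the fewest mistakes
    return min(sorted(set(xs)), key=errors)

weights_g = [120, 130, 150, 160, 170, 190]
fruit = ["apple", "apple", "apple", "pear", "pear", "pear"]

threshold = fit_stump(weights_g, fruit, "apple", "pear")

def predict(weight):
    return "apple" if weight < threshold else "pear"

print(threshold, predict(140), predict(185))
```

The output is a category, not a quantity; swapping the label column for a numeric one (such as price) would turn the same data into a regression problem.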

Unsupervised learning

Dimensionality reduction

Sources:
Hands-On Machine Learning, Géron
Unsupervised learning

Clustering

Sources:
sklearn data set, own visualization
Deep Learning

Definition

Deep Learning is a class of ML algorithms that uses multiple layers to progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify concepts relevant to a human, such as digits, letters or faces.

[Figure: nested circles AI ⊃ ML ⊃ DL]
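To make “layers extract progressively higher-level features” concrete, here is a two-layer network with hand-picked weights (not learned) that computes XOR. The hidden layer builds the intermediate features OR and AND, and the output layer combines them. Real deep learning learns such weights from data and uses far more layers and units; this sketch only shows the layered structure.

```python
def step(z):
    return 1 if z > 0 else 0

def layer(inputs, weights, biases):
    """One dense layer: weighted sum of inputs plus bias, then a step activation."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b) for ws, b in zip(weights, biases)]

HIDDEN_W = [[1, 1], [1, 1]]  # unit 0 acts as OR, unit 1 as AND
HIDDEN_B = [-0.5, -1.5]
OUTPUT_W = [[1, -1]]         # combines them: OR and not AND
OUTPUT_B = [-0.5]

def xor_net(a, b):
    hidden = layer([a, b], HIDDEN_W, HIDDEN_B)  # lower layer: simple features
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]  # higher layer: the concept XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

No single layer of this kind can compute XOR on its own, which is a small-scale hint at why stacking layers helps.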
and more

Time Series Forecasting

A time series is a series of data points indexed in time order; most commonly, the data points are taken at equal intervals.
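Two classic baseline forecasts for a time series are the naive forecast (next value = last value) and the moving average of the last k points. A sketch assuming equally spaced observations; the daily values are invented, and k = 3 is an arbitrary choice.

```python
def naive_forecast(series):
    """Predict that the next value equals the last observed one."""
    return series[-1]

def moving_average_forecast(series, k=3):
    """Predict the mean of the last k observations."""
    window = series[-k:]
    return sum(window) / len(window)

daily_values = [100.0, 105.0, 112.5, 117.5, 122.5]  # hypothetical, one value per day
print(naive_forecast(daily_values))           # 122.5
print(moving_average_forecast(daily_values))  # 117.5
```

Both ignore trend and seasonality, so they mainly serve as yardsticks that a proper forecasting model should beat.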
and more

Natural Language Processing

NLP is the field dealing with how to program computers to process and analyze large amounts of natural language data.
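A common first step in NLP is turning text into numbers, for instance with a bag-of-words count. A minimal standard-library sketch; the lower-casing and the regex tokeniser are simplifying assumptions, and real pipelines handle punctuation, stop words and word forms more carefully.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lower-case the text, split it into word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

counts = bag_of_words("Cookie Monster eats cookies. Auntie brings cookies!")
print(counts.most_common(2))
```

Note that “cookie” and “cookies” are counted separately here; stemming or lemmatisation would merge them.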
BECOMING A DATA SCIENTIST

some data roles in keywords

data scientist - data pipeline, data analysis, KPIs, statistics, data exploration, visualization, communicating, data modeling, predicting, building data products, EDA, deep learning

closely related roles: product analyst, machine learning engineer, data visualizer

learn about the subject and where your past experience fits in

book: https://www.manning.com/books/build-a-career-in-data-science

podcast:
https://open.spotify.com/show/78Nft51TuU3X2urEKfCuys?si=-f7cN3v2Sgu0pyDelBc-Yg&dl_branch=1

Getting started

Try it out: Kaggle, Zindi, and more

Sources:
https://www.kaggle.com/c/reducing-commercial-aviation-fatalities/overview
Week 3

Getting started with coding


■ Working with IDEs and Python scripts

■ pandas and NumPy

■ SQL

■ Visualization

■ Data Cleaning

■ Exploratory Data Analysis
