
Introduction to Data Science & ML

Introduction to Data Science and Machine Learning

Speaker:
Dr. Venkateswara Rao,
NIT Warangal
Note: These slides were assembled by Dr. K. V. Rao, with grateful acknowledgement of the many others who
made their course materials available online.
Outline
• Introduction to Data Science
o What is data science
o Why do we need Data Science ?
o What do Data Scientists do?
o Concentration in Data Science
• Introduction to Machine Learning
o Learning
o Types of Machine Learning
Data All Around
• Lots of data is being collected and warehoused
• Social Data
o likes/dislikes, comments, uploads
o Social Network, tweets
o Provides valuable insights into customer behavior
o Enormously influential in market analysis.
o Public web is good source of social data.
• Machine Data
o Information which is generated by industrial equipment
• Sensors that are installed in machinery
• Web logs which track user behavior, etc.
o This type of data is expected to grow exponentially as the Internet of Things grows
o Sensors such as medical devices, smart meters, road cameras, satellites,
games and the rapidly growing Internet of Things will deliver data of high velocity, value,
volume and variety in the very near future.
Data All Around
• Transactional data
o Financial transactions, bank/credit transactions, etc.
o e-commerce
o Invoices, payment orders, storage records, delivery receipts
o Online trading and purchasing

• We have various forms of data
o Relational Data (Tables/Transaction), Text Data (Web), Semi-structured Data
(XML), Graph Data, Social Network, Semantic Web (RDF), Streaming Data, and
so forth.
How Much Data Do We Have?
• The amount of data in the world was estimated to be 44
zettabytes at the beginning of 2020.
• By 2025, the amount of data generated each day is expected to reach 463
exabytes globally.
• Google, Facebook, Microsoft, and Amazon store at least 1,200 petabytes of
information.
• The world spends almost $1 million per minute on commodities on the Internet.
• Electronic Arts processes roughly 50 terabytes of data every day.
• By 2025, there are expected to be 75 billion Internet-of-Things (IoT) devices in the world.
• By 2030, nine out of every ten people aged six and above are expected to be digitally
active.
• In 2017, 26 billion texts were sent each day by 277 million people in the US. That is
about 94 texts per day per person.
• Over 2.5 quintillion bytes of data are created every single day, and it’s only going
to grow from there. It’s estimated that 1.7 MB of data will be created every
second for every person on earth (2020).
• Source: https://www.thinkful.com/blog/what-is-data-science/
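The figures above mix several byte units. A minimal sketch for putting them on one scale, assuming decimal (SI) units and the short-scale reading of "quintillion" (10^18); the helper name `convert` is made up for this illustration:

```python
# Decimal (SI) byte units. Assumption: the slide's figures use powers of 1000.
UNITS = {"MB": 10**6, "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(amount, from_unit, to_unit):
    """Convert an amount between two byte units listed in UNITS."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


# "2.5 quintillion bytes" per day, expressed in exabytes.
daily_exabytes = 2.5 * 10**18 / UNITS["EB"]

# 44 zettabytes and 1,200 petabytes, both expressed in exabytes.
world_2020_exabytes = convert(44, "ZB", "EB")
big_four_exabytes = convert(1200, "PB", "EB")

print(daily_exabytes, world_2020_exabytes, big_four_exabytes)
```

Expressed this way, the quantities can be compared directly instead of by unit name.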
• Can we draw meaningful patterns out of this data?
• Can we use these patterns to expand the business?
• Can we use these patterns to help the user with the information overload problem?
• Can we use the knowledge extracted from data to make the right decisions?

Yes
What is Data Science?
• An area that manages, manipulates, extracts, and interprets knowledge
from tremendous amounts of data.

• Data science (DS) is a multidisciplinary field of study whose goal is to address
the challenges in big data.
o Computer Science
• Pattern recognition, visualization, data warehousing, High performance computing,
Databases, AI
o Mathematics
• Mathematical Modeling
o Statistics
• Statistical and Stochastic modeling, Probability.

• Data science principles apply to all data – big and small

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
Why do we need Data Science?
• Simple Business Intelligence tools are not capable of processing huge
volumes and varieties of data.
• To understand the requirements of a customer
o Recommender systems
• Decision making
o Self driving cars
• Predictive analysis
o Weather forecasting
• And many others
Data Scientists
• Data scientists are the key to realizing the opportunities presented by
big data. They bring structure to it, find compelling patterns in it, and
advise executives on the implications for products, processes, and
decisions.
• "Data Scientist: The Sexiest Job of the 21st Century" (Davenport et al., Harvard
Business Review)
o Isn’t it exciting?
• If the output of our work directly addresses customers’ needs
• To tell a story from the data.
• To stay up to date.
o There is increasing demand and there are many opportunities.
• They find stories and extract knowledge. They are not reporters.
What do Data Scientists do?
• National Security
• Cyber Security
• Business Analytics
• Engineering
• Healthcare
• And more ….
Real Life Examples
• Companies learn your secrets, shopping patterns, and preferences
o For example, can we know if a woman is pregnant, even if she doesn’t want us to
know?
• Data science and elections (2008, 2012)
o 1 million people installed the Obama Facebook app that gave access to info on
“friends”
• Identifying and predicting disease
• Personalized healthcare recommendations
• Optimizing shipping routes in real-time
• Finding the next slew of world-class athletes
• Stamping out tax fraud
• Automating digital ad placement
• Algorithms that help you find love
• Predicting incarceration rates
Concentration in Data Science
• Mathematics and Applied Mathematics
• Applied Statistics/Data Analysis
• Solid Programming Skills (R, Python, Julia, SQL)
• Data Mining
• Database Storage and Management
• Machine Learning and discovery
Introduction to Machine Learning
Learning?
• We say we are learning something when our performance
improves with experience.

• Learning = Improving with experience at some task.

• Humans learn from experience.


Machine Learning
• Machine Learning?
o Improve over task T.
o With respect to performance measure P.
o Based on experience E.

• What are T, P, and E? How do we formulate a machine learning problem?


A few Examples
• Handwritten digit recognition
o T – classifying handwritten digits within images.
o P – percent of digits correctly classified.
o E – database of handwritten digits with given classifications.
• Robot Driving
o T – Driving on public four-lane highways using vision sensors.
o P – Average distance traveled before an error.
o E – sequences of images and steering commands recorded observing a
human driver.
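The T, P, E formulation can be sketched in a few lines. This is an illustration only, not the method behind the examples above: the 3x3 "digit" glyphs, the nearest-example classifier, and all names are assumptions made up for this sketch.

```python
# Hypothetical 3x3 binary glyphs standing in for handwritten digits 0 and 1.
ZERO = (1, 1, 1,
        1, 0, 1,
        1, 1, 1)
ONE = (0, 1, 0,
       0, 1, 0,
       0, 1, 0)


def flip(glyph, i):
    """Return a copy of glyph with pixel i inverted (simulates sloppy writing)."""
    return tuple(1 - p if j == i else p for j, p in enumerate(glyph))


def classify(glyph, experience):
    """Task T: give a glyph the label of the most similar stored example."""
    def distance(example):
        return sum(a != b for a, b in zip(glyph, example[0]))
    return min(experience, key=distance)[1]


def percent_correct(test_set, experience):
    """Performance P: percentage of test glyphs classified correctly."""
    hits = sum(classify(g, experience) == label for g, label in test_set)
    return 100.0 * hits / len(test_set)


TEST_SET = [(flip(ZERO, 0), 0), (flip(ZERO, 4), 0),
            (flip(ONE, 2), 1), (flip(ONE, 7), 1)]

# Experience E: labelled examples. More experience should raise P.
little_experience = [(ZERO, 0)]
more_experience = [(ZERO, 0), (ONE, 1)]

print(percent_correct(TEST_SET, little_experience))  # 50.0
print(percent_correct(TEST_SET, more_experience))    # 100.0
```

With only a "0" example stored, every glyph is labelled 0 and half the test set is wrong; adding a labelled "1" raises P, which is what "improving with experience" means here.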
Machine Learning

• Machine learning is an application of artificial intelligence (AI) that
provides systems the ability to automatically learn and improve from
experience without being explicitly programmed.

• The primary aim is to allow computers to learn
automatically, without human intervention or assistance, and to adjust
their actions accordingly.
Example: Classification – Swan Vs. Duck

https://animalscomparison.com/swan-vs-duck-vs-goose-difference-and-comparison/

• If the neck length is large
➔ Swan
• Else
➔ Duck
➔ We need not define the rules explicitly; a machine learning algorithm learns them automatically from the given data.
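As a rough sketch of what "learning the rule from data" could look like for the swan vs. duck example: the neck lengths below are invented, and the single-threshold learner is one simple choice among many, not necessarily what the slide had in mind.

```python
# Hypothetical labelled examples: (neck length in cm, species). Values are invented.
BIRDS = [(12, "duck"), (14, "duck"), (16, "duck"),
         (45, "swan"), (50, "swan"), (55, "swan")]


def learn_threshold(examples):
    """Pick the neck-length cut-off that misclassifies the fewest examples."""
    lengths = sorted(length for length, _ in examples)
    # Candidate cut-offs: midpoints between neighbouring neck lengths.
    candidates = [(a + b) / 2 for a, b in zip(lengths, lengths[1:])]

    def errors(threshold):
        return sum((length > threshold) != (label == "swan")
                   for length, label in examples)

    return min(candidates, key=errors)


def classify(neck_length, threshold):
    """Apply the learned rule: long neck means swan, otherwise duck."""
    return "swan" if neck_length > threshold else "duck"


threshold = learn_threshold(BIRDS)
print(threshold)                 # 30.5
print(classify(40, threshold))   # swan
print(classify(20, threshold))   # duck
```

Nobody typed in what counts as a "large" neck length; the cut-off comes out of the labelled examples.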
Classic Approaches vs Machine Learning

• Let’s say we want to predict the price
of a house based on the size of the
house, the size of its garden, and the
number of rooms it has.

Source: https://towardsdatascience.com/introduction-to-machine-learning-f41aabc55264
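A hedged sketch of the contrast for the house-price example. In the classic approach a programmer writes the pricing formula by hand; in the machine learning approach the weights are fitted to examples. Everything here is assumed for illustration: the training data is invented (prices were generated from a simple linear rule so the fit can be checked), and ordinary least squares is just one possible learner.

```python
# Hypothetical data: (house size m^2, garden size m^2, rooms) -> price (thousands).
HOUSES = [
    ((50, 0, 2), 169.0), ((80, 20, 3), 264.0), ((120, 100, 4), 408.0),
    ((65, 10, 3), 222.5), ((150, 200, 5), 535.0), ((95, 50, 3), 313.5),
    ((70, 30, 2), 231.0),
]


def classic_price(size, garden, rooms):
    """Classic approach: a rule a programmer guessed and hard-coded."""
    return 3.0 * size + 10.0 * rooms


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear(examples):
    """ML approach: least-squares weights [bias, w_size, w_garden, w_rooms]."""
    X = [[1.0, *features] for features, _ in examples]
    y = [price for _, price in examples]
    k = len(X[0])
    # Normal equations: (X^T X) w = X^T y
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(weights, features):
    """Price predicted by the fitted linear model."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


weights = fit_linear(HOUSES)
new_house = (100, 40, 3)
print(classic_price(*new_house))               # hand-written rule
print(round(predict(weights, new_house), 2))   # rule learned from the examples
```

The hand-written rule ignores the garden entirely because its author did; the fitted model picks up whatever relationship the examples contain.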
Why is Machine Learning Important?
• Explosive growth of data (source: https://www.thinkful.com/blog/what-is-data-science)
• Data is the lifeblood of all business.
• Data-driven decisions increasingly make the difference between keeping up with competition or
falling further behind.
• Machine learning can be the key to unlocking the value of corporate and customer data and
enacting decisions that keep a company ahead of the competition.

• Machine Learning Use Cases
o Manufacturing. Predictive maintenance and condition monitoring
o Retail. Upselling and cross-channel marketing
o Healthcare and life sciences. Disease identification and risk stratification
o Travel and hospitality. Dynamic pricing
o Financial services. Risk analytics and regulation
o Energy. Energy demand and supply optimization
o And many more.
Opportunities
• Machine Learning is one of the best career choices of the 21st
century.
• It offers plenty of job opportunities with high salaries.
• Machine Learning is on its way to drastically changing the world
of automation.
• Further, there is wide scope for Machine Learning in India.
Types of Machine Learning
Types of ML algorithms
• Supervised Learning (Regression, Classification)
o Medical diagnosis
o Spam filtering
o Weather forecasting
o Image classification
o Fraud detection, etc.
• Unsupervised Learning (Clustering, Dimensionality Reduction)
o Targeted marketing
o Customer segmentation
o Image color compression
o Structure discovery
o Meaningful compression, etc.
• Reinforcement Learning
o Game AI
o Robot navigation
o Self driving cabs
o Real-time decisions
o Skill acquisition, etc.
Supervised Learning

• The data is labelled
• The label acts like a supervisor to
guide the learning process.
Supervised Learning: The data is labelled
[Figure: examples labelled Female or Male, and one unlabelled example marked ???]
Supervised Learning: Feature Extraction
• Eyebrow width
• Hair length
• Eye width
• Height
• Waist length
• Leg length
Supervised Learning: Classification

S.No.   Height (cm)   Waist length (cm)   Gender
1       156           28                  Female
2       178           32                  Male
3       168           30                  Male
4       154           29                  Female
5       169           30                  Male
6       153           24                  Female
7       165           29                  ???
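One way the unlabelled row 7 could be filled in is a k-nearest-neighbours vote over rows 1 to 6. The slides do not name an algorithm, so k-NN and k = 3 are assumptions chosen for this sketch.

```python
import math
from collections import Counter

# Rows 1-6 of the classification table: (height cm, waist length cm) -> gender.
TRAINING = [
    ((156, 28), "Female"),
    ((178, 32), "Male"),
    ((168, 30), "Male"),
    ((154, 29), "Female"),
    ((169, 30), "Male"),
    ((153, 24), "Female"),
]


def knn_predict(query, examples, k=3):
    """Label the query by majority vote among its k closest training rows."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Row 7 of the table: height 165 cm, waist length 29 cm.
prediction = knn_predict((165, 29), TRAINING)
print(prediction)  # Male
```

With k = 3 the closest rows are 3, 5 and 1, so the vote is two Male to one Female. A different k or a different distance measure could give a different answer on such a small table.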
Supervised Learning: Regression

S.No.   Height (cm)   Waist length (cm)   Weight (kg)
1       156           28                  68
2       178           32                  72
3       168           30                  69
4       154           29                  64.5
5       169           30                  74.9
6       153           24                  52.3
7       165           29                  ???
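For the regression version, a minimal sketch is a straight line fitted by least squares. As a simplifying assumption it uses height only and ignores waist length; the slides do not say which model was intended.

```python
# Rows 1-6 of the regression table: height (cm) -> weight (kg).
HEIGHTS = [156, 178, 168, 154, 169, 153]
WEIGHTS = [68, 72, 69, 64.5, 74.9, 52.3]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(HEIGHTS, WEIGHTS)

# Row 7 of the table: height 165 cm.
predicted_weight = slope * 165 + intercept
print(round(predicted_weight, 1))
```

On these six rows the line predicts a weight close to 68 kg for row 7. Unlike the classification case, the output is a number on a continuous scale rather than one of a fixed set of labels.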
Supervised Learning: Regression or classification?

Problem                               Regression or Classification?
E-mail spam and non-spam filtering    Classification
House Price Prediction                Regression
Detection of news article type        Classification
Marks Prediction                      Regression
Grade prediction                      Classification
Weather forecasting                   Classification
Temperature forecasting               Regression
https://www.thinkful.com/blog/what-is-data-science
Source: https://towardsdatascience.com/introduction-to-machine-learning-f41aabc55264

Unsupervised Learning: Unlabelled data
Unsupervised Learning
• The computer is trained with unlabeled data.

• Useful in cases where the human expert doesn’t know what to look for in the data.

• A family of machine learning algorithms mainly used for pattern detection and descriptive
modeling.

• There are no output categories or labels on which the algorithm can base a model of relationships.

• These algorithms apply techniques to the input data to mine for rules, detect patterns, and
summarize and group the data points, which helps in deriving meaningful insights and describing
the data better to users.

• The main types of unsupervised learning algorithms include clustering algorithms and association
rule learning algorithms.
https://www.thinkful.com/blog/what-is-data-science
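Clustering, one of the two families named above, can be sketched with k-means. The points and the starting centroids below are invented for illustration, and k-means is only one of several clustering algorithms.

```python
import math

# Hypothetical unlabelled 2-D points; no categories are given to the algorithm.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centring."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else old for c, old in zip(clusters, centroids)]
    return centroids, clusters


# Assumption: start from two of the data points as initial centroids.
centroids, clusters = k_means(POINTS, centroids=[POINTS[0], POINTS[3]])
print(clusters[0])
print(clusters[1])
```

The algorithm is never told which group a point belongs to; the two groups emerge from the distances between points alone.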
Reinforcement Learning
• Learning to interact with an
environment
o Robots, games, process control
o With limited human training
o Where the ‘right thing’ isn’t obvious
Reinforcement Learning

1. Observation of the environment
2. Deciding how to act using some strategy
3. Acting accordingly
4. Receiving a reward or penalty
5. Learning from the experiences and refining our strategy
6. Iterate until an optimal strategy is found
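The six steps can be sketched as a loop. The environment here is a deliberately trivial, made-up one (two actions, one of which is always rewarded and the other always penalised), and the epsilon-greedy strategy is an assumption, not something the slides specify.

```python
import random

# Hypothetical environment: "right" earns a reward, "left" earns a penalty.
REWARDS = {"left": -1.0, "right": 1.0}


def run(steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy learning of the average reward of each action."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in REWARDS}  # current strategy: reward estimates
    count = {action: 0 for action in REWARDS}
    for _ in range(steps):  # step 6: iterate
        # Step 1: observe. This toy environment has a single state, so there is
        # nothing to read; a real task would inspect sensors or a game board here.
        # Step 2: decide. Mostly exploit the best estimate, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(list(REWARDS))
        else:
            action = max(value, key=value.get)
        # Steps 3 and 4: act, then receive a reward or penalty.
        reward = REWARDS[action]
        # Step 5: learn by moving the estimate towards the observed reward.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value


value = run()
best_action = max(value, key=value.get)
print(best_action)  # right
```

Nobody tells the agent that "right" is the correct action; it finds out because the penalty for "left" lowers that action's estimate.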