Big Data Lecture # 08

The document provides an overview of machine learning (ML), detailing its types, applications, and tools used in big data analytics. It explains various ML paradigms such as supervised, unsupervised, semi-supervised, and reinforcement learning, highlighting their unique characteristics and use cases. Additionally, it covers predictive modeling processes, including model creation, testing, validation, and evaluation, emphasizing the importance of selecting appropriate models for specific problems.

Uploaded by

Ali


BIG DATA ANALYTICS

Lecture 8 --- Week 9

Content

 Overview of Machine Learning

 Task Types in Machine Learning

 Big Data and Machine Learning

 Tools for Machine Learning

 Overview of Predictive Modeling

Overview of Machine Learning

 Machine learning (ML) is concerned with algorithms and techniques that allow
computers to learn.
 ML covers several main application domains, such as data mining,
difficult-to-program applications, and software applications.
 It is a collection of a variety of algorithms that can provide multivariate,
nonlinear, nonparametric regression or classification.
 The remarkable simulation capabilities of the ML-based methods have
resulted in their extensive applications in science and engineering.
 Recently, ML techniques have found many applications in astronomy, the
geosciences, and remote sensing.
 More specifically, these techniques have proved practical for cases where
the system’s deterministic model is computationally expensive or there is no
deterministic model to solve the problem.
Task Types in Machine Learning

 The first step in applying ML is teaching the algorithm using a training
dataset.
 The training dataset is a collection of independent variables with the
corresponding dependent variables.
 The machine uses the training data to learn how the independent variables
(input) relate to the dependent variable (output).
 Later, when the algorithm is applied to new input data, it can apply that
relationship and return a prediction.
 After the algorithm is trained, it needs to be tested to get a measure of how
well it can make predictions from new data.
 This requires another dataset with independent and dependent variables, but
the dependent variables are not provided to the learner.
 The algorithm predictions are compared to the withheld data to determine
the quality of the predictions.
 This process requires a dataset that is large enough to be split in two for
training and testing.
 The type of ML method, the size and nature of the training and test datasets,
and the evaluation method should be chosen to optimize the trade-off
between bias and variance to give a meaningful result for the problem at
hand.
 ML algorithms can be classified into many different paradigms, based on the
desired outcome of the algorithm.
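
The train-then-test workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data (y = 2x + 1), the 25% test fraction, and the helper names `train_test_split`, `fit_line`, and `mean_squared_error` are hypothetical and not part of the lecture.

```python
import random

def train_test_split(pairs, test_fraction=0.25, seed=0):
    """Shuffle (x, y) pairs and split them into training and test lists."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(train):
    """Least-squares fit of y = a * x + b on the training pairs."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mean_squared_error(model, test):
    """Compare predictions with the withheld dependent variable."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in test) / len(test)

# Hypothetical toy data: independent variable x, dependent variable y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(data)
model = fit_line(train)                       # learn from the training data only
test_error = mean_squared_error(model, test)  # score on the withheld data
```

The learner never sees the test outputs while fitting; they are used only afterwards to judge the quality of the predictions.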
Supervised Learning

 Supervised learning is one of the most widely used ML paradigms.

 In supervised learning, the training data you use are already labeled.
 The training data are used to infer a mapping function from the input
variable (X) to the output variable (Y).
 The correct answers or desired outputs (labels) are already known, given a
labeled set of input-output pairs $M = \{(X_i, Y_i)\}_{i=1}^{N}$, where N is
simply the number of training examples.
 The training input $X_i$ is a d-dimensional vector of numbers, also known as
features or attributes.
 More generally, the input $X_i$ can be an image, an email message, a time
series, a molecular shape, or a graph.
 The output $Y_i$, also known as a response variable, is a categorical or nominal
variable for a classification problem or a real value for a regression problem.
 Classification algorithms and regression techniques are two types of
supervised learning widely used to develop predictive models.
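
As a rough illustration of learning from labeled input-output pairs, the sketch below classifies a new point by copying the label of its nearest training example (a 1-nearest-neighbor rule). The technique, the data, and the name `predict_1nn` are assumptions chosen for brevity; the lecture does not prescribe a particular classifier.

```python
import math

def predict_1nn(training_pairs, x):
    """Return the label Y_i of the training input X_i closest to x."""
    _, nearest_label = min(
        training_pairs, key=lambda pair: math.dist(pair[0], x)
    )
    return nearest_label

# Hypothetical labeled set M = {(X_i, Y_i)}: 2-dimensional features, nominal labels.
labeled = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]
prediction_a = predict_1nn(labeled, (2.0, 1.0))
prediction_b = predict_1nn(labeled, (8.5, 9.5))
```

Because the output here is a nominal label, this is a classification problem; predicting a real value from the same kind of pairs would make it a regression problem.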
Unsupervised Learning

 Unsupervised learning (also known as knowledge discovery) uses unlabeled,
unclassified, and uncategorized training data.
 The main goal of unsupervised learning is to discover hidden and interesting
patterns in unlabeled data.
 Unlike supervised learning, unsupervised learning methods cannot be directly
applied to a regression or a classification problem as one has no idea what the
values for the output might be.
 Clustering is the most common unsupervised learning technique, used in
exploratory data analysis to find hidden patterns or groupings in the data.
 Applications for cluster analysis include gene sequence analysis, market
research and object recognition.
 Common algorithms used in unsupervised learning include clustering, anomaly
detection, neural networks, and approaches for learning latent variable
models.
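
Clustering can be illustrated with a minimal k-means sketch (Lloyd's algorithm). The lecture does not name a specific clustering algorithm, so k-means, the sample points, and the starting centroids are all assumptions made for the example.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Unlabeled data: no output values are given, only the inputs.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (6.0, 6.0)])
```

The algorithm discovers the two groupings without ever being told which point belongs where.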
Semi-supervised Learning

 Semi-supervised learning is a combination of supervised and unsupervised ML
methods.
 Semi-supervised learning algorithms make use of partially labeled training
data, typically a small amount of labeled data with a large amount of
unlabeled data.
 Training on this combination can improve learning accuracy compared with
using the small labeled set alone.
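
One simple way to combine labeled and unlabeled data is self-training: fit a model on the labeled points, pseudo-label the unlabeled point the model is most confident about, and repeat. The sketch below assumes 1-D features and a nearest-centroid classifier; the approach, function names, and data are illustrative assumptions, not the lecture's method.

```python
def nearest_centroid_fit(labeled):
    """Return {label: mean feature value} for 1-D labeled data."""
    groups = {}
    for x, label in labeled:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the most confident unlabeled point.
    Assumes at least two classes appear in the labeled data."""
    labeled, remaining = list(labeled), list(unlabeled)
    while remaining:
        centroids = nearest_centroid_fit(labeled)

        def margin(x):
            # Confidence: gap between the two smallest centroid distances.
            distances = sorted(abs(x - c) for c in centroids.values())
            return distances[1] - distances[0]

        best = max(remaining, key=margin)
        labeled.append((best, predict(centroids, best)))
        remaining.remove(best)
    return nearest_centroid_fit(labeled)

# A small amount of labeled data and a larger amount of unlabeled data.
few_labels = [(1.0, "low"), (9.0, "high")]
many_unlabeled = [1.5, 2.0, 2.5, 8.0, 8.5, 7.5]
final_centroids = self_train(few_labels, many_unlabeled)
```

After self-training, the class centroids reflect all eight points rather than just the two that were labeled by hand.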
Reinforcement Learning

 Reinforcement learning is a type of dynamic programming that trains algorithms
using a system of reward and penalty.
 The learning system, called the agent in this context, learns by interacting
with an environment.
 The agent selects and performs actions, receiving rewards for performing
correctly and penalties for performing incorrectly.
 In reinforcement learning the agent learns by itself, without human
intervention, the best strategy to maximize reward in a particular situation
using dynamic programming.
 Reinforcement learning differs from unsupervised learning in its goal: while
the goal in unsupervised learning is to find similarities and differences
between data points, the goal in reinforcement learning is to find a suitable
action model that would maximize the total cumulative reward of the agent.
[Figure: the basic idea and elements involved in a reinforcement learning model]
 Typical practical applications of reinforcement learning include building
artificial intelligence for playing computer games, robotics and industrial
automation, text summarization engines, and dialogue agents (text, speech).
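
The reward-and-penalty loop between agent and environment can be sketched with tabular Q-learning, a standard reinforcement learning algorithm that the lecture does not name explicitly. The five-state corridor environment, the reward values, and the hyperparameters below are all assumptions chosen to keep the example small.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right
GOAL_REWARD = 1.0     # reward for reaching the goal
STEP_PENALTY = -0.1   # penalty for every step that does not reach it

def step(state, action_index):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, GOAL_REWARD, True
    return next_state, STEP_PENALTY, False

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values Q[state][action] with an epsilon-greedy agent."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)                         # explore
            else:
                action = max((0, 1), key=lambda a: q[state][a])   # exploit
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
# Greedy policy for the non-terminal states: index of the best action.
policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

No one tells the agent that moving right is correct; it infers this from the cumulative reward it collects while interacting with the environment.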
Big Data and Machine Learning
 Big Data and Machine Learning are the blue-chips of the current IT Industry.
 Big data systems store, analyze, and extract information from bulk data sets.
 On the other hand, Machine learning is the ability to automatically learn and improve from
experience without being explicitly programmed.
 Machine Learning provides efficient and automated tools for data gathering, analysis, and
assimilation.
 Combined with the strengths of cloud computing, machine learning brings agility to
processing and integrates large amounts of data regardless of its source.
 Machine learning algorithms can be applied to every element of Big Data operation including:
 Data Segmentation
 Data Analytics
 Simulation
 All these stages are integrated to create the big picture out of Big Data, with insights and
patterns that are later categorized and packaged into an understandable format.
 The fusion of Machine Learning and Big Data is a never-ending loop. The algorithms created
for certain purposes are monitored and perfected over time as information flows into and
out of the system.
Tools for Machine Learning

 Python – This is one of the most dominant languages for data science in the
industry today because of its ease of use, flexibility, and open-source nature.
It has gained rapid popularity and acceptance in the ML community.
 R – It is another very commonly used and respected language in data science.
R has a thriving and incredibly supportive community and it comes with a
plethora of packages and libraries that support most machine learning tasks.
 Apache Spark – Spark was open-sourced by UC Berkeley in 2010 and has since
built one of the largest communities in big data. It is known as the Swiss
Army knife of big data analytics as it offers multiple advantages such as
flexibility, speed, and computational power.
 Jupyter Notebooks – These notebooks are widely used for coding in Python.
While they are predominantly used for Python, they also support other
languages such as Julia and R.
 SAS – It is a very popular and powerful tool, commonly used in the banking and
financial sectors. It has a very high share in private organizations like
American Express, JP Morgan, Mu Sigma, and Royal Bank of Scotland.
 SPSS – Short for Statistical Package for the Social Sciences, SPSS was acquired
by IBM in 2009. It offers advanced statistical analysis, a vast library of machine
learning algorithms, text analysis, and much more.
 Matlab – Matlab is really underrated in the organizational landscape but it is
widely used in academia and research divisions. It has lost a lot of ground in
recent times to the likes of Python, R, and SAS but universities, especially in the
US, still teach a lot of undergraduate courses using Matlab.
 Weka – Short for Waikato Environment for Knowledge Analysis. Weka is open
source, easy to use, and user-friendly for applying ML algorithms. It has a graphical
user interface and also a command-line interface through which all features of the
software can be used. The command line is useful when working with massive
datasets, where scripting helps automate the work.
Overview of Predictive Modeling

 Predictive modeling is the process of creating, testing and validating a model to best predict
the probability of an outcome.
 A number of modeling methods from machine learning, artificial intelligence, and statistics
are available in predictive analytics software solutions for this task.
 The model is chosen on the basis of testing, validation, and evaluation, using detection
theory to estimate the probability of an outcome given a set amount of input data.
 Models can use one or more classifiers in trying to determine the probability of a set of data
belonging to another set.
 The different models available in the modeling portfolio of predictive analytics software
enable users to derive new information about the data and to develop predictive models.
 Each model has its own strengths and weaknesses and is best suited to particular types of
problems. A model is reusable: it is created by training an algorithm on historical data and
then saved, so that the common business rules it captures can be shared and applied to
similar data, and results can be analyzed with the trained algorithm without the historical
data.
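
The idea of training once on historical data and reusing the saved model can be sketched as follows. The model form (a line through the origin), the data, and the use of Python's `pickle` module for saving are illustrative assumptions; predictive analytics products have their own model formats.

```python
import pickle

def train(history):
    """Fit y = slope * x by least squares on historical (x, y) pairs."""
    slope = sum(x * y for x, y in history) / sum(x * x for x, _ in history)
    return {"slope": slope}

def predict(model, x):
    """Apply the trained model to a new input."""
    return model["slope"] * x

# Hypothetical historical data, used only at training time.
historical_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
saved = pickle.dumps(train(historical_data))   # in practice, written to a file

# Later, possibly elsewhere: the historical data is no longer needed.
reloaded = pickle.loads(saved)
forecast = predict(reloaded, 4.0)
```

Only the trained parameters travel with the saved model, which is what makes it shareable and reusable on similar data.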
Business process on Predictive Modeling

 Creating the model: Software solutions allow you to create a model to run
one or more algorithms on the data set.
 Testing the model: Test the model on the data set. In some scenarios, the
testing is done on past data to see how well the model predicts.
 Validating the model: Validate the model run results using visualization tools
and business data understanding.
 Evaluating the model: Evaluate the best-fit model among the models used
and choose the model best suited to the data.
Predictive modeling process

 The process involves running one or more algorithms on the data set where
prediction is going to be carried out.
 This is an iterative process and often involves training the model, using
multiple models on the same data set, and finally arriving at the best-fit
model based on the business data understanding.
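
Trying multiple models on the same data set and keeping the best fit might look like the sketch below. The two candidate models, the deterministic toy data, and the use of mean squared error on a held-out validation split are assumptions for illustration only.

```python
def fit_mean(train):
    """Baseline model: always predict the mean of the training outputs."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_linear(train):
    """Least-squares straight line through the training pairs."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: my + slope * (x - mx)

def mse(predict, pairs):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in pairs) / len(pairs)

# Hypothetical data: roughly y = 3x + 2 with a small deterministic wobble.
data = [(float(x), 3.0 * x + 2.0 + (x % 3 - 1) * 0.5) for x in range(12)]
train, validation = data[::2], data[1::2]

candidates = {"mean baseline": fit_mean, "linear regression": fit_linear}
scores = {name: mse(fit(train), validation) for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)   # model with the lowest validation error
```

In practice the loop would be repeated with further candidates or settings until the chosen model also makes sense from the business point of view.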
Models Category

 Predictive models: The models in the predictive category analyze past
performance to make future predictions.
 Descriptive models: The models in the descriptive category quantify the
relationships in data in a way that is often used to classify data sets into
groups.
 Decision models: The decision models describe the relationship between all
the elements of a decision in order to predict the results of decisions
involving many variables.
Features in Predictive Modeling

 Data analysis and manipulation: Tools to analyze data and to create, modify,
combine, categorize, merge, and filter data sets.
 Visualization: Visualization features include interactive graphics and reports.
 Statistics: Statistics tools to create and confirm the relationships between
variables in the data. Statistics from different statistical software can be
integrated into some of the solutions.
 Hypothesis testing: Creation of models, evaluation, and choosing of the right
model.
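
As one small example of using statistics to confirm a relationship between variables, the sketch below computes the Pearson correlation coefficient. The variable names and values are made up for illustration, and Pearson correlation is just one of many statistics such tools provide.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical variables from a business data set.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(ad_spend, sales)   # close to 1: strong positive linear relationship
```

A value near 1 or -1 suggests a strong linear relationship worth modeling; a value near 0 suggests there is little linear signal to exploit.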
