What is Machine Learning?
Machine Learning (ML) is a branch of Artificial Intelligence (AI) where computers
learn patterns from data and make decisions or predictions without being explicitly
programmed.
In simple words: Instead of writing rules for every task, we give the machine data +
examples, and it learns by itself.
Definition of ML
Machine Learning is the field of study that gives computers the ability to learn from data and
improve their performance on a task without being explicitly programmed.
– Arthur Samuel (who coined the term "Machine Learning" in 1959)
Real-Life Applications of ML
1. Email Spam Detection
o Gmail automatically detects spam/junk emails.
o Uses ML models trained on examples of spam and non-spam emails.
2. Movie & Music Recommendations
o Netflix, YouTube, Spotify suggest content based on what you previously
watched or listened to.
o ML models analyze user behavior and predict preferences.
3. Medical Diagnosis
o ML helps doctors detect diseases (like cancer from X-rays or diabetes from
reports).
o Trained on large sets of medical data.
4. Self-Driving Cars
o Cars like Tesla use ML to recognize objects (pedestrians, traffic signals, other
vehicles) and make driving decisions.
5. Voice Assistants
o Siri, Alexa, Google Assistant use ML for speech recognition and natural
language understanding.
6. Fraud Detection
o Banks use ML to detect unusual transactions (credit card fraud).
Descriptive vs Predictive Data Tasks

| S. No | Comparison | Descriptive Data Task | Predictive Data Task |
|---|---|---|---|
| 1 | Basic | Determines what happened in the past by analyzing stored data. | Determines what can happen in the future using past data analysis. |
| 2 | Preciseness | Provides accurate data. | Produces results but does not ensure accuracy. |
| 3 | Practical Analysis Methods | Uses standard reporting, query/drill-down, and ad-hoc reporting. | Uses predictive modeling, forecasting, simulation, and alerts. |
| 4 | Data Type | Works mostly with unlabeled data. | Works with labeled data (input + output). |
| 5 | Type of Approach | Follows a reactive approach. | Follows a proactive approach. |
| 6 | Examples | Customer segmentation, Market basket analysis, Social network analysis. | Weather forecasting, Spam email detection, Stock price prediction. |
Supervised vs Unsupervised vs Semi-Supervised vs Reinforcement Learning

| Aspect | Supervised Learning | Unsupervised Learning | Semi-Supervised Learning | Reinforcement Learning |
|---|---|---|---|---|
| Data Used | Labeled data (input + correct output). | Unlabeled data (only input, no output). | Small labeled + large unlabeled data. | No fixed dataset; learns from the environment. |
| Goal | Learn mapping between input and output. | Find hidden patterns or groups in data. | Use both labeled and unlabeled data for better accuracy. | Learn the best strategy by trial and error. |
| Output | Predictions or classifications. | Clusters, patterns, structures. | Improved prediction with less labeled data. | Sequence of actions (policy) to maximize reward. |
| Feedback | Direct supervision with correct answers. | No supervision, only data exploration. | Partial supervision. | Reward/Penalty after each action. |
| Examples | Spam detection, house price prediction, medical diagnosis. | Customer segmentation, market basket analysis. | Medical imaging, speech recognition, web classification. | Self-driving cars, game playing, robotics. |
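To make the first two columns concrete, here is a minimal sketch (assuming scikit-learn is installed; the tiny dataset is made up for illustration) contrasting supervised learning, which is given labels, with unsupervised learning, which must find groups on its own:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: each input comes with a known output label.
X_labeled = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]                      # correct answers (labels)
clf = LogisticRegression().fit(X_labeled, y)
print(clf.predict([[2.5], [11.5]]))         # -> [0 1]

# Unsupervised: only inputs are given; the model groups them by itself.
X_unlabeled = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_unlabeled)
print(km.labels_)                           # e.g. [0 0 0 1 1 1] (cluster ids, not true labels)
```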
What is a Feature?
In Machine Learning, a feature is an individual measurable property, attribute, or
characteristic of the data that is used as input to the model.
Features are basically the independent variables (X) that help in predicting the
output (Y).
👉 In simple words: A feature is a column in your dataset.
Examples of Features
1. In a house price prediction dataset:
o Features → Size of house, Number of rooms, Location, Age of house.
o Target → Price of house.
2. In a student performance dataset:
o Features → Hours studied, Attendance, Past grades.
o Target → Final exam marks.
3. In a spam detection dataset:
o Features → Number of links in email, Presence of suspicious words, Length of
email.
o Target → Spam / Not Spam.
Types of Features
1. Numerical Features
o Represent quantities.
o Example: Height, Weight, Salary.
2. Categorical Features
o Represent categories or labels.
o Example: Gender (Male/Female), Blood Group (A, B, O).
3. Boolean Features
o Represent True/False values.
o Example: “Does email contain the word FREE?” (Yes = 1, No = 0).
4. Derived Features (Engineered Features)
o New features created from existing ones.
o Example: BMI (Body Mass Index) derived from Weight and Height.
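As a small illustrative sketch (assuming pandas is installed; the column names and values are hypothetical), all four feature types can sit side by side as columns of one dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [170, 160, 180],      # numerical feature
    "weight_kg": [70, 55, 90],         # numerical feature
    "blood_group": ["A", "B", "O"],    # categorical feature
    "is_smoker": [1, 0, 0],            # boolean feature (Yes = 1, No = 0)
})

# Derived (engineered) feature: BMI = weight / height^2, with height in metres.
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2
print(df)
```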
Feature Construction in Machine Learning
What is Feature Construction?
Feature Construction is the process of creating new features from the existing raw
data to make the dataset more useful for machine learning models.
It is a part of Feature Engineering.
Goal: Improve model performance by providing more meaningful and informative
inputs.
👉 In simple words: We take the existing data and construct new features that better
represent the problem.
Why Feature Construction is Important?
Raw data often does not contain features in the exact form needed by ML models.
Constructing new features can:
o Improve accuracy of predictions.
o Make hidden patterns more visible.
o Reduce noise and irrelevant information.
o Allow simpler models to perform better.
Examples of Feature Construction
1. From Date/Time Data
o Raw feature: “2025-08-20 18:30”
o Constructed features: Day, Month, Year, Hour, Day of week,
Weekend/Weekday.
o Useful in: Sales forecasting, traffic prediction (see the code sketch after this list).
2. From Text Data
o Raw feature: Customer reviews (text).
o Constructed features: Word counts, Sentiment score (positive/negative),
Presence of keywords.
o Useful in: Sentiment analysis, spam detection.
3. From Numerical Data
o Raw features: Height (cm), Weight (kg).
o Constructed feature: Body Mass Index (BMI = weight / height²).
o Useful in: Health/medical predictions.
4. From Transaction Data
o Raw feature: Purchase history of customer.
o Constructed features: Total spending, Average spending, Frequency of
purchase.
o Useful in: Customer segmentation, fraud detection.
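A minimal pandas sketch of examples 1 and 3 above (the column names and values are made up for illustration) shows how new columns can be constructed from a raw timestamp and from height/weight:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": ["2025-08-20 18:30", "2025-08-23 09:15"],
    "height_cm": [172, 158],
    "weight_kg": [68, 54],
})

# From date/time data: split one raw timestamp into several informative features.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["day"] = df["timestamp"].dt.day
df["month"] = df["timestamp"].dt.month
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek          # 0 = Monday
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)

# From numerical data: construct BMI from height and weight.
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2
print(df)
```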
Steps in Feature Construction
1. Understand the problem & dataset – Know what the model needs.
2. Analyze raw data – Identify which attributes are useful.
3. Create new features – Using domain knowledge (like BMI from height & weight).
4. Test features – Check if new features improve model accuracy.
Feature Selection in Machine Learning
What is Feature Selection?
Feature Selection is the process of choosing only the most relevant features from
the dataset and removing irrelevant or redundant ones.
It helps in reducing the size of data while keeping only the important information.
Goal: Improve model performance (accuracy, speed, interpretability).
👉 In simple words: If your dataset has many columns (features), feature selection picks the
best ones for training the model.
Why Feature Selection is Needed?
1. Reduces Overfitting → Less noise, fewer irrelevant features.
2. Improves Accuracy → Focus on important variables only.
3. Reduces Training Time → Smaller dataset, faster training.
4. Better Interpretability → Easier to understand the model.
Examples of Feature Selection
1. House Price Prediction
o Raw features: Size, Location, Rooms, Flooring type, Owner’s name, House
color, Date of construction.
o Selected features: Size, Location, Rooms → these directly affect price.
o Ignored: Owner’s name, House color (not useful).
2. Spam Email Detection
o Raw features: Email length, Number of links, Words like “FREE”, Sender’s font
style.
o Selected features: Number of links, Suspicious words → useful for
classification.
3. Medical Diagnosis
o Raw features: Blood test results, Age, Patient ID, Room number.
o Selected features: Blood test results, Age.
o Ignored: Patient ID, Room number.
Methods of Feature Selection
1. Filter Methods – Use statistical tests (correlation, chi-square, mutual information) to
select features.
2. Wrapper Methods – Use machine learning models to test different feature subsets
(Forward selection, Backward elimination).
3. Embedded Methods – Feature selection happens during model training (like Lasso
Regression, Decision Trees).
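A hedged sketch of a filter method and an embedded method (assuming scikit-learn; the synthetic dataset and the choice of k = 5 are only illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 4 of which are actually informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# Filter method: score each feature with a statistical test and keep the best k.
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
print("Filter method kept features:", selector.get_support(indices=True))

# Embedded method: Lasso shrinks the coefficients of irrelevant features towards zero.
lasso = Lasso(alpha=0.05).fit(X, y)
print("Lasso non-zero coefficients:", (lasso.coef_ != 0).sum())
```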
Training Dataset vs Testing Dataset
| S. No | Aspect | Training Dataset | Testing Dataset |
|---|---|---|---|
| 1 | Purpose | Used to train the machine learning model by allowing it to learn patterns and rules. | Used to evaluate the performance and accuracy of the trained model. |
| 2 | Data Usage | Model learns from this dataset (adjusts weights, parameters). | Model does not learn; it only predicts outcomes to check performance. |
| 3 | Presence in Training | Always used during the training phase. | Never used during training; only used in the evaluation phase. |
| 4 | Size | Usually larger to allow better learning (e.g., 70–80% of the total data). | Usually smaller, used to validate results (e.g., 20–30% of the total data). |
| 5 | Role in Overfitting | Helps in reducing underfitting when used correctly. | Helps detect overfitting if the model performs well on training data but poorly on testing data. |
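A minimal sketch of the usual 80/20 split (assuming scikit-learn; the regression dataset is synthetic and only for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=1000, n_features=4, noise=10, random_state=0)

# 80% of the rows go to training, 20% are held back for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)    # the model learns only from training data
print("Train R^2:", model.score(X_train, y_train))  # performance on data it has seen
print("Test  R^2:", model.score(X_test, y_test))    # performance on unseen data
```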
K-Fold Cross-Validation
K-Fold Cross-Validation is a technique used to evaluate a machine learning model's
performance more reliably.
It reduces the risk of overfitting or underfitting during model selection.
Steps of K-Fold Cross-Validation
1. Split the dataset into k equal-sized subsets (folds).
2. For each iteration (k times):
o Use one fold as the testing set.
o Use the remaining (k-1) folds as the training set.
3. Train the model on the training set and evaluate it on the testing set.
4. Repeat this process k times (each fold becomes the test set once).
5. Calculate the average performance score (e.g., accuracy, precision) of all k runs.
Example (k=5)
Suppose you have 100 data points and choose k = 5:
Each fold will have 20 samples.
Iteration 1: Train on folds 2–5 → Test on fold 1
Iteration 2: Train on folds 1, 3–5 → Test on fold 2
Iteration 3: Train on folds 1–2, 4–5 → Test on fold 3
Iteration 4: Train on folds 1–3, 5 → Test on fold 4
Iteration 5: Train on folds 1–4 → Test on fold 5
Then, you average the results.
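The same 5-fold procedure on 100 synthetic data points can be written as a short loop (a sketch assuming scikit-learn; the choice of classifier is arbitrary):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    # Train on the 4 remaining folds (80 samples), test on the held-out fold (20 samples).
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print("Fold accuracies:", np.round(scores, 2))
print("Average accuracy:", np.mean(scores))
```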
Advantages
More reliable than a single train-test split.
Uses all data for both training and testing.
Reduces bias in evaluation.
Tabular Representation (k=5)
| Fold | Training Data (Folds Used) | Testing Data (Fold Used) |
|---|---|---|
| 1 | 2, 3, 4, 5 | 1 |
| 2 | 1, 3, 4, 5 | 2 |
| 3 | 1, 2, 4, 5 | 3 |
| 4 | 1, 2, 3, 5 | 4 |
| 5 | 1, 2, 3, 4 | 5 |
Scenario
You are developing a machine learning model to predict house prices based on features like
location, area, number of bedrooms, and amenities.
Real-Life Problem
You have a dataset of 1,000 houses, but you want your model to perform well on new
houses that it hasn’t seen before.
Applying K-Fold Cross Validation (let's say k = 5)
1. Split the dataset into 5 equal parts (folds) → each fold has 200 houses.
2. Iteration 1: Train on folds 2–5 (800 houses), test on fold 1 (200 houses).
3. Iteration 2: Train on folds 1, 3, 4, 5, test on fold 2.
4. Iteration 3: Train on folds 1, 2, 4, 5, test on fold 3.
5. Iteration 4: Train on folds 1, 2, 3, 5, test on fold 4.
6. Iteration 5: Train on folds 1–4, test on fold 5.
After all 5 iterations, average the 5 test scores (e.g., error or R²) to get the final model performance.
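In code, the whole scenario collapses to one call (a sketch assuming scikit-learn, with a synthetic regression dataset standing in for the 1,000 houses):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Stand-in for the house data: 1,000 rows, 4 features (e.g., area, bedrooms, ...).
X, y = make_regression(n_samples=1000, n_features=4, noise=15, random_state=0)

# cross_val_score runs the 5 train/test iterations and returns one score per fold.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("R^2 per fold:", scores.round(3))
print("Average R^2:", scores.mean().round(3))
```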
Why use it in real life?
Ensures the model performs well on unseen data (generalization).
Makes full use of your dataset (every house is used for both training and testing).
Reduces the risk of overfitting or underfitting caused by random train-test splits.
Leave-One-Out Cross-Validation (LOOCV)
Definition:
LOOCV is a special case of k-fold cross-validation where the number of folds k = number of
samples (n).
Each time, only one sample is used as the test set, and the remaining n−1
samples are used for training.
This process is repeated for each sample, and the performance is averaged.
How It Works
1. Suppose you have 5 samples: A, B, C, D, E
2. Iteration 1 → Train on B, C, D, E → Test on A
3. Iteration 2 → Train on A, C, D, E → Test on B
4. … and so on, until each sample has been tested once.
5. Average the errors/accuracy from all iterations to get the final result.
Real-Life Example
You are creating a model to predict student exam performance from their study hours:
You have 50 students’ data.
Each time, train on 49 students and test on the remaining 1.
Repeat 50 times, then calculate the average prediction error.
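A minimal sketch of LOOCV (assuming scikit-learn; the study-hours data below is invented and much smaller than 50 students, just to keep the example short):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])   # hours studied
y = np.array([35, 42, 50, 55, 62, 70, 74, 82])           # exam marks

# One iteration per sample: train on n-1 students, test on the remaining one.
loo = LeaveOneOut()
errors = -cross_val_score(LinearRegression(), X, y, cv=loo,
                          scoring="neg_mean_absolute_error")
print("Number of iterations:", len(errors))               # equals n (here 8)
print("Average absolute error:", errors.mean().round(2))
```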
Advantages
Uses maximum data for training in each iteration (n−1 samples).
Reduces bias in performance estimation.
Best for very small datasets.
Disadvantages
Computationally expensive for large datasets (training occurs n times).
Variance in the evaluation may still be high.