Linear Regression
Simple and Multiple: used to predict a continuous target y from input features x by finding the line of best fit that minimizes a loss function (typically mean squared error).
Only models linear relationships between the features and the target
Strengths: Simple model and easy to interpret, easy to compute, low risk of overfitting
Weaknesses: Assumes linearity, multicollinearity and outliers can cause issues, sensitive to feature scaling
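A minimal scikit-learn sketch of the fit-and-predict pattern described above (the toy data is made up for illustration):
```python
# Fit a simple linear regression and read off the line of best fit.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # single feature
y = np.array([2.1, 4.0, 6.2, 7.9])           # roughly y = 2x

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)          # slope and intercept of the best-fit line
print(model.predict([[5.0]]))                 # prediction for a new point
```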
Logistic Regression
Used for binary classification problems; passes a linear combination of the features through a sigmoid function bounded between 0 and 1
Maps outputs to a probability, and a class is assigned based on a probability threshold
Strengths: Simple and interpretable, fast training and prediction, provides probability scores, works well with linear boundaries, less prone to overfitting than more complex models, good for small to medium datasets
Weaknesses: Assumes a linear relationship between the features and the log-odds, struggles with non-linear boundaries, sensitive to outliers, may underperform with highly imbalanced data, requires more data for stable estimates, features should be independent
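A minimal sketch of logistic regression's probability output and thresholding, using scikit-learn and invented toy data:
```python
# Binary classification: sigmoid-based probabilities, then a 0.5 threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])                  # toy labels

clf = LogisticRegression().fit(X, y)
probs = clf.predict_proba([[2.0], [3.8]])[:, 1]   # P(class = 1)
labels = (probs >= 0.5).astype(int)               # threshold maps probability -> class
print(probs, labels)
```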
Feature Engineering
Feature engineering involves extracting and transforming raw data variables into optimized features to improve machine learning model performance and predictive power.
Binning to turn continuous data into discrete categories, creating polynomial features for nonlinear relationships, combining features to create new ones
EX: Aggregate data (mean, median, etc.), encoding categorical data, Scaling numeric data, PCA (to reduce dimensionality)
Feature selection: choose a subset of features from data through reducing dimensionality, importance-based selection from algorithms, or removing highly correlated features
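A minimal sketch of a few of the transformations mentioned above (binning, polynomial features, scaling) with scikit-learn; the feature values are hypothetical:
```python
# Common feature-engineering transforms on a small made-up feature matrix.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, PolynomialFeatures, StandardScaler

X = np.array([[1.0, 10.0], [2.0, 14.0], [3.0, 30.0], [4.0, 41.0]])

binned = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform").fit_transform(X)
poly   = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)  # adds squared and interaction terms
scaled = StandardScaler().fit_transform(X)                                  # zero mean, unit variance
print(binned.shape, poly.shape, scaled.shape)
```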
Bias and Variance
Bias is caused by a model that is too simple and cannot capture the relationships in the data, resulting in poor performance on test and training data (underfitting)
Variance is caused by an overly complex model that is overfitting on the training data, including all noise, and performs poorly on new data
Variance and Bias are inversely related, and there is a tradeoff between them
As dimensionality grows, the amount of data needed increases exponentially (the curse of dimensionality)
Decision Trees
Uses nodes to create a flow chart to classify data (root node: first node and best predictor; internal nodes: decision nodes that split the data; leaf node: terminal node that assigns the class)
Each split is measured by an impurity score, evaluating how well the split reduces the mixture of classes
Entropy: -Σ (Pi * log2(Pi)), where Pi is the fraction of records in class i
Gini: 1 - Σ (Pi^2)
Reduce overfitting in trees with pre-pruning (max depth, min samples to split)
or post-pruning, which removes branches that are not significant for predictions
Strengths: Inexpensive, quick at classifying new records, easy to interpret, can handle noisy data, can use numerical and categorical data, can handle redundant features
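A minimal sketch of the two impurity measures above, computed from class counts with NumPy:
```python
# Entropy and Gini impurity for a node, given the class counts at that node.
import numpy as np

def entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log2(p)).sum()        # -sum(p_i * log2 p_i)

def gini(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)           # 1 - sum(p_i^2)

print(entropy([5, 5]), gini([5, 5]))      # maximally impure node: 1.0 and 0.5
print(entropy([10, 0]), gini([10, 0]))    # pure node: 0.0 and 0.0
```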
Overfitting and Cross Validation
To avoid overfitting and other complications during the model's training, a validation set is used to test unseen data and adjust hyperparameters before using the testing data.
Cross-validation can be used when data is limited to avoid over and underfitting on data.
K-fold: data is divided into k folds, the model is trained on k-1 folds and tested on the remaining fold, repeated k times, providing a better estimate of model performance
LOOCV: trains on all datapoints but one and uses the left-out datapoint as validation; repeated for each datapoint
Stratified: used for imbalanced class distributions; each fold preserves the class proportions
Weaknesses: Expensive; data leaks can happen from preprocessing data before splits
Strengths: Stable performance estimate, Reduces overfitting and underfitting, and robust hyperparameter selection
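A minimal sketch of stratified k-fold cross-validation with scikit-learn; the iris data and logistic regression model are stand-ins:
```python
# 5-fold stratified cross-validation to estimate generalization performance.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # folds keep class proportions
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean(), scores.std())        # average accuracy and its spread across folds
```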
KNN
A lazy learner: there is no training phase, and predictions are made directly from the stored data
Works by assigning the classification of new data to the closest observations in the training data
The comparison metric is distance, and the choice of distance metric can drastically change the effectiveness of the model
Voting for class can be by majority or weighted
K is the most important hyperparameter for KNN: it determines how many neighbors are considered for a prediction, and the optimal k depends on the data (noisy data works better with a high k, clean data works better with a low k)
CV (cross-validation) is used to determine the best k
Strengths: Easy to understand and implement, instance based
Weaknesses: Data must be scaled, class imbalance can cause issues if voting is not weighted, struggles under high dimensionality, irrelevant features must be removed
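A minimal sketch of KNN with scaling in a pipeline and cross-validation over a few k values; the wine dataset is a stand-in:
```python
# KNN needs scaled features; compare a few k values with cross-validation.
from sklearn.datasets import load_wine
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
for k in (1, 5, 15):
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    print(k, cross_val_score(pipe, X, y, cv=5).mean())   # pick the k with the best CV score
```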
Naive Bayes
One of the Bayes classifiers; Naive Bayes assumes that features are conditionally independent given the class
Strengths: Robust performance in most applications, Usually used for text classification, Handles high dimensionality well and resistant to overfitting
Use regular (non-naive) Bayes when: feature dependence is important, data is ample in size, the model needs complexity, and the problem requires understanding the joint probability
Laplace smoothing avoids zero probabilities by adding a small count to each probability estimate
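A minimal sketch of multinomial Naive Bayes for text classification, where the alpha parameter is the Laplace smoothing term; the documents and labels are made up:
```python
# Text classification with bag-of-words counts and Laplace-smoothed Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs   = ["cheap pills now", "meeting at noon", "cheap meds offer", "lunch meeting today"]
labels = [1, 0, 1, 0]                                # 1 = spam, 0 = ham (toy labels)

vec = CountVectorizer()
X = vec.fit_transform(docs)
clf = MultinomialNB(alpha=1.0).fit(X, labels)        # alpha=1.0 adds one pseudo-count per word
print(clf.predict(vec.transform(["cheap lunch offer"])))
```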
Evaluation
Accuracy: fraction of correct predictions on the testing set (error rate is the fraction of incorrect predictions)
Recall: TP/(TP+FN), used for measuring false negatives (when the cost of a FN is high)
Precision: TP/(TP+FP), used when the cost of a FP is high
F1 Score: 2 * (Precision*Recall)/(Precision+Recall): a good metric for imbalanced datasets; it weighs FP and FN equally, so the costs are treated as equal
ROC curve: Shows how well a model separates P and N classes across different thresholds
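A minimal sketch computing the metrics above with scikit-learn; the labels and scores are invented:
```python
# Accuracy, precision, recall, F1, and ROC AUC from toy predictions.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred  = [1, 0, 0, 1, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]   # probabilities, used for the ROC curve

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))    # separation of P and N across thresholds
```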
Ensemble Methods
Bagging: train multiple models in parallel to reduce variance (trains on multiple bootstrap subsets of the original dataset and aggregates each model's prediction); use for models that overfit easily
Boosting: train models sequentially, where each model focuses on the previous errors to reduce bias (each instance has a weight that gets adjusted each round); used on complex datasets with misclassifications
Stacking: train diverse base models on the same dataset, create a new dataset from the outputs of the base models, and train a meta-model on that new dataset for the final prediction; use when multiple diverse models are available and need to be combined
Random Forest: uses multiple decision trees with bagging and random feature subsets at each split
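A minimal sketch comparing bagging, boosting, and a random forest with scikit-learn; the dataset is a stand-in:
```python
# Compare the three ensemble styles on the same data via cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
models = {
    "bagging":  BaggingClassifier(n_estimators=50, random_state=0),      # parallel, reduces variance
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),     # sequential, reduces bias
    "forest":   RandomForestClassifier(n_estimators=50, random_state=0), # bagging + random feature subsets
}
for name, m in models.items():
    print(name, cross_val_score(m, X, y, cv=5).mean())
```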
SVM
Searches for an optimal linear separator in the feature space (the hyperplane that best separates the 2 classes)
The goal is to find the hyperplane with the maximum margin, i.e., maximize the distance between the closest points from each class (these closest points are called support vectors). The margin width is M = 2 / ||w||, so SVM aims to minimize ||w|| to maximize the margin
All new points after training are labeled depending on which side of the hyperplane they land on. SVM must balance maximizing the margin between the 2 classes with minimizing classification error
All data used for SVM must be scaled (distance-based model) to avoid bias towards larger-scale numbers
Decision boundaries for SVM are only affected by the support vectors; other datapoints are irrelevant
Multiclass for SVM is done by One-vs-All (train one binary SVM per class, each treating that class as + and all others as -); a new datapoint is scored by all of them and assigned the highest-scoring class,
or One-vs-One (train K(K-1)/2 binary SVMs, one for each class pair, e.g., A vs B, A vs C, B vs C). Each SVM votes, and the class with the most votes wins
Soft-Margin SVM considers a tradeoff between margin and errors, since data can be noisy and perfect separation leads to overfitting. The slack variable penalizes points that are either within the margin or misclassified.
Kernel methods are used if data cannot be separated linearly: the data is transformed into a higher-dimensional space where it becomes separable.
SVM is a robust model that is not badly affected by the curse of dimensionality and guarantees a globally optimal solution; however, it is sensitive to the kernel choice and needs feature scaling
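A minimal sketch of a soft-margin SVM with feature scaling and an RBF kernel (C trades margin width against errors); the dataset is a stand-in:
```python
# Soft-margin SVM: scale features, use an RBF kernel, evaluate with CV.
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
print(cross_val_score(svm, X, y, cv=5).mean())   # larger C = fewer margin violations allowed
```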
Neural Networks
Takes input(s), multiplies each input by a weight, sums the weighted inputs, adds a bias to the sum, and applies an activation function to get an output
Advantages: Pattern recognition for nonlinear relations, can handle high dimensionality, automated feature extraction, better than most traditional models, combines both structured and unstructured data, supports NLP, image analysis, and anomaly detection, easily fine-tuned
3 levels of AI. AI: computers mimicking human behavior; ML: the ability to learn without being explicitly programmed; DL: extracting patterns from data using deep neural networks
Traditional ML requires manual feature engineering and selection, DL uses neural networks to auto learn patterns from the data
Activation functions are used on layers to add non-linearity to the model to capture more complex relations. Binary classification: sigmoid; multiclass: softmax; regression: no function; hidden layers: ReLU, to avoid the vanishing gradient problem caused by deep networks with many layers
Training involves updating weights until some stop condition is met: number of epochs reached, or error/loss falls below a threshold. Weights are adjusted until the model learns to produce outputs close to the actual targets
Forward propagation: computing predictions, Backward propagation: Adjust weights using gradients
Neural networks suffer from being data hungry, overfitting, poor interpretability, and high computational cost.
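A minimal sketch of the single-neuron computation described above (weighted sum, bias, activation) with made-up inputs and weights:
```python
# One neuron: weighted sum of inputs plus bias, passed through a sigmoid.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])       # inputs
w = np.array([0.8, 0.1, -0.4])       # weights
b = 0.2                              # bias

z = np.dot(w, x) + b                 # weighted sum plus bias
print(sigmoid(z))                    # activation gives the neuron's output
```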
K-Means Clustering
Clustering groups data into sets based on similarity and no prior knowledge of the labels (Unsupervised learning), clusters are potential classes found in unlabeled data
3 types of clustering: Centroid-based: each cluster is represented by a central vector; Density-based: looks for datapoints that are packed together vs. those that are spread thin; Hierarchical: clusters form a tree that splits or merges depending on the approach (divisive or agglomerative)
Centroid-based clustering example: K-Means splits data into k clusters based on the distance between points and centroids, aiming to minimize within-cluster variance. K random points are chosen as centroids, points are assigned to the closest centroid, new centroids are computed from the mean of each cluster, and the process repeats.
Categorical data must be one-hot encoded, empty clusters can have their centroids reselected, and centroids can get stuck in local minima, giving a non-optimal solution
Characteristics: Supports various data (categorical must be OHE), fast, bisecting k-means and k-means++ help with initialization, sensitive to outliers, suffers from the curse of dimensionality, assumes roughly spherical clusters
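A minimal K-Means sketch using k-means++ initialization on generated blob data:
```python
# K-Means with k=3; k-means++ and multiple restarts reduce the risk of bad local minima.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)           # final centroids
print(km.inertia_)                   # within-cluster variance being minimized
```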
DBSCAN
Used when clusters have arbitrary shapes; points are grouped by proximity. A point has a radius eps, all points within eps are its neighbors, and minpts is the minimum number of points required in a neighborhood for it to be considered dense. Core points meet minpts; border points don't meet minpts but are within the eps of a core point; outliers are neither core nor border points
2 core points within each other's eps are in the same cluster; border points are assigned to the first core point's cluster and not reassigned
Characteristics: Can capture arbitrary shapes, robust to noise/outliers, works across datatypes (spatial, image, network), struggles with high-dimensional spaces, sensitive to the distance metric used, poor at handling varied densities
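A minimal DBSCAN sketch on generated two-moons data, where eps and min_samples correspond to the radius and minpts above:
```python
# DBSCAN on non-spherical data; points labeled -1 are treated as outliers/noise.
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(set(labels))                   # cluster ids found, plus -1 for noise points
```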
Hierarchical Clustering
2 types: Agglomerative (bottom-up): each point starts as its own cluster and clusters are merged based on similarity; Divisive (top-down): one cluster is split based on dissimilarity. For both, each level represents a different level of granularity. On a dendrogram, merge points show which clusters are combined, and merge height indicates the level of dissimilarity (lower means more similar)
Agglomerative: at each step, combine the closest pair of clusters based on a distance metric, update distances after new cluster is made, repeat until only 1 cluster remains
Core operation of merging clusters is done by deciding a distance metric, and how to define the distance between clusters (Linkage)
Linkage types: Single (min), Complete (max), Average (avg. distance between all pairs), Centroid (distance between mean vectors, with possible inversions), Ward's (minimizes the increase in within-cluster variance)
A proximity matrix tracks and updates the distance between all clusters; the smallest entry in the matrix determines the next merge. After the new cluster is created, its distances to the other clusters are computed and the matrix is updated
Characteristics: Expensive computation, sensitive to noise/outliers, the choice of distance metric drastically changes results
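A minimal agglomerative clustering sketch comparing linkage choices on generated blob data:
```python
# Agglomerative clustering: the linkage choice changes how cluster distances are defined.
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=50, centers=3, random_state=0)
for link in ("single", "complete", "average", "ward"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=link).fit_predict(X)
    print(link, labels[:10])         # assignments can differ with the linkage used
```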
Anomaly Detection
Look for rare or unusual events that deviate from the majority of the data, used due to the difficulty in defining abnormal and the lack of outlier data.
Global perspective: flags anomalies based on deviation from the whole dataset; Local: flags anomalies based on local neighborhoods (subsets) of the data
Sequential detection: used for streaming data and minimizing costs; Batch: used when relationships between datapoints can be analyzed comprehensively
Masking: occurs when too many anomalies hide the detection of other anomalies; Swamping: occurs when normal points are misflagged as anomalies due to the influence of actual anomalies
Model-based detection: detects anomalies using constructed statistical models and evaluates points by how well they fit the model (probability distributions)
Density-Based: Labels points in low-density regions as anomalies (DBSCAN), adaptive to different shapes in clusters
Cluster-based: groups similar datapoints and identifies anomalies if they don't fit well into any cluster, lie far from cluster centers, or have low membership scores
Proximity-based: datapoints with the largest distance to their kth nearest neighbor are labeled anomalies
Isolation Forests: Recursively isolate points, score points based on how quickly they were isolated, minimal parameters needed, low cost, effective in high dimensions
One-Class SVM: used for novelty detection (training data does not contain outliers, but we want to detect anomalies in new data); detects deviations from normal patterns, trains on normal data points only, and handles high dimensions well but needs careful kernel selection and feature scaling
Statistical methods: best for well-understood distributions; Density-based: ideal for spatial data; Isolation Forest: excellent for high-dimensional data; Deep learning: best for complex patterns in large datasets
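A minimal sketch of Isolation Forest and One-Class SVM on toy data with a few injected outliers (the contamination and nu values are illustrative):
```python
# Two detectors on the same toy data: Isolation Forest sees everything,
# One-Class SVM is fit on normal points only (novelty detection).
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal   = rng.normal(0, 1, size=(200, 2))
outliers = rng.uniform(-6, 6, size=(10, 2))
X = np.vstack([normal, outliers])

print(IsolationForest(contamination=0.05, random_state=0).fit_predict(X)[-10:])  # -1 = anomaly
print(OneClassSVM(kernel="rbf", nu=0.05).fit(normal).predict(outliers))          # deviations from normal
```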
Association Analysis
Data mining technique used to find patterns between different items in a large dataset (market-basket analysis: identify rules that predict occurrence of an item based on other items)
Rules suggest a strong correlation between 2 items (items can also be groups of items) and are typically represented as an if-then statement
Transaction width: number of items in a transaction; Itemset: a collection of 0 or more items (a k-itemset is an itemset with k items); Support count: number of transactions that contain a given itemset; Frequent itemset: an itemset whose support is higher than minsup
Support(X → Y) = (# of transactions containing X and Y) / (total # of transactions): how often a rule is applicable to a dataset
Confidence(X → Y) = (# of transactions containing X and Y) / (# of transactions containing X): how frequently Y appears in transactions that contain X
Frequent itemset generation: find all itemsets that satisfy minsup; Strong rule generation: find all rules in the frequent itemsets that satisfy minconf
Apriori principle: if an itemset is frequent, all of its subsets must also be frequent; if an itemset is infrequent, all of its supersets must also be infrequent. This is used to trim exponential candidate growth. Ex: if bread is infrequent, there is no need to test itemsets containing bread
Find individual items and their support, prune items that do not meet minsup, create 2-itemsets, prune based on minsup, create 3-itemsets, prune based on minsup, and prune any candidate itemset that has an infrequent subset.
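A minimal sketch computing support and confidence for one candidate rule from a toy transaction list:
```python
# Support and confidence for the rule {bread} -> {milk} over made-up transactions.
transactions = [
    {"bread", "milk"}, {"bread", "diapers", "beer"}, {"milk", "diapers"},
    {"bread", "milk", "diapers"}, {"bread", "milk", "beer"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"bread"}, {"milk"}
print(support(X | Y))                       # support of the rule X -> Y
print(support(X | Y) / support(X))          # confidence of the rule X -> Y
```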
Rule Gen
Maximal frequent itemset: a frequent itemset with no immediate frequent supersets; Closed itemset: a frequent itemset that has no immediate superset with the same support count
Antecedents: left side of the rule, Consequent: Right side of the rule
Apriori rule generation: create rules from frequent itemsets with 1 item as the consequent; only high-confidence rules from the prior set are used to generate new candidates (based on the consequents from the last round); repeat and evaluate all candidate rules against minconf
Evaluation for Association Analysis
Lift measures how much more often the antecedent and consequent of a rule occur together than expected if they were statistically independent. Lift can help to identify rules that might be interesting even if the support is not very high. However, lift alone doesn't tell us about the actual frequency of the rule. Lift = P(X,Y) / (P(X)P(Y)) = c(X → Y) / s(Y)
Leverage measures the difference between the observed frequency of a rule and the frequency that would be expected if the rule's items were independent. It adds context to the
support and confidence by showing whether the rule occurs more often than would be expected based on the individual items' frequencies. 𝐿𝑒𝑣𝑒𝑟𝑎𝑔𝑒 = 𝑃(𝑋 ∩ 𝑌) − 𝑃(𝑋)𝑃(𝑌)
Conviction measures the degree to which the antecedent depends on the consequent. It indicates the expected error rate of the rule if the antecedent and consequent were assumed to be independent. Conviction = (1 − P(Y)) / (1 − c(X → Y))
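A minimal sketch computing lift, leverage, and conviction from the formulas above; the probabilities are made up:
```python
# Rule-interest measures from assumed probabilities P(X), P(Y), and P(X and Y).
p_x, p_y, p_xy = 0.4, 0.5, 0.3

confidence = p_xy / p_x
lift       = p_xy / (p_x * p_y)             # equivalently confidence / P(Y)
leverage   = p_xy - p_x * p_y               # observed minus expected-under-independence
conviction = (1 - p_y) / (1 - confidence)   # undefined when confidence = 1

print(lift, leverage, conviction)
```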