Data Science Notes

Support vector machines (SVMs) are a supervised machine learning method that finds the optimal hyperplane to categorize new examples. Decision trees are another supervised method that splits data into subsets based on attribute values to classify new examples. Both have advantages and disadvantages, such as SVMs performing poorly on large or noisy datasets while decision trees are prone to overfitting. Ensemble methods like boosting and bagging combine multiple models to overcome weaknesses of individual models.
Decision Trees and Support Vector Machines
Support Vector Machine
A supervised, non-linear machine learning method (it can capture more complex data) that, given the training data, finds the best hyperplane to categorise new examples.

Hyperplanes

A plane of dimensionality lower than your data


If your data is 2D, hyperplane is a line (1D)
If your data is 1D, hyperplane is a point (0D)
If your data is 3D, hyperplane is a plane (2D) etc…

How to separate the data

linearly separable
non-linearly separable

Choosing a hyperplane:

We need to maximise the margin while obeying some constraints.
The margin is the distance from the plane to the closest points – in SVM we optimise this.
SVM considers only the closest (hardest-to-classify) points – these are the support vectors (used to calculate the dot product).
The projection is used to assign a class.

Doing this mathematically:

1. Calculate d, the distance between two vectors

2. The constraint:

We need to ensure that no point is classified on the wrong side of the line.

3. The optimisation

We can find the hyperplane that maximises the distance x2 – x1, which is
the distance between the support vectors.
However, even with this, not all data will fall onto the correct side of any line, so
there is a need to relax the constraint; this is called soft-margin SVM.
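As a minimal sketch, the standard hard- and soft-margin objectives can be written as follows (the symbols w, b, ξᵢ and C are conventional notation, not defined in these notes):

```latex
% Hard-margin SVM: maximise the margin 2/||w||
\min_{w,b} \; \tfrac{1}{2}\|w\|^2
\quad \text{subject to} \quad y_i \left(w^\top x_i + b\right) \ge 1 \;\; \forall i

% Soft-margin SVM: relax the constraint with slack variables \xi_i,
% traded off against the margin by the hyperparameter C
\min_{w,b,\xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
\quad \text{subject to} \quad y_i \left(w^\top x_i + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0
```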

The kernel trick


this is used when a reasonable hyperplane can't be drawn in the original space
it maps the lower-dimensional space to a higher-dimensional space

Types of kernel

polynomial kernel where d is a hyperparameter


radial basis function (RBF) kernel
Creates non-linear combinations of the features to lift the samples onto a
higher-dimensional feature space where a linear decision boundary can
separate the classes
The most commonly used kernel in SVM
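A minimal scikit-learn sketch comparing the polynomial and RBF kernels; the toy dataset and hyperparameter values are illustrative assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Non-linearly separable toy data
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Polynomial kernel: the degree d is a hyperparameter
poly_svm = SVC(kernel="poly", degree=3, C=1.0).fit(X_train, y_train)

# RBF kernel: the most commonly used kernel in SVM
rbf_svm = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_train, y_train)

print("poly accuracy:", poly_svm.score(X_test, y_test))
print("rbf accuracy: ", rbf_svm.score(X_test, y_test))
```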
Pros and cons of SVMs

Pros:
It works really well with a clear margin of separation
It is effective in high dimensional spaces.
It is effective in cases where the number of dimensions is greater than
the number of samples.
It uses a subset of training points in the decision function (called support
vectors), so it is also memory efficient.
Cons:
It doesn't perform well on large data sets because the required training
time is higher
It also doesn't perform very well when the data set is noisy, i.e. the
target classes overlap
SVM doesn't directly provide probability estimates; these are calculated
using an expensive five-fold cross-validation (available via the SVC class
of the Python scikit-learn library)

Decision trees
it is an intuitive algorithm

Entropy and information

entropy quantifies how homogeneous the data is

information gain is:
Simply the difference between the entropy before and after splitting.
If the information gain is high, the entropy after splitting is almost zero; if it is
low, the entropy remains almost 1.
So we want to find the combination of splits and thresholds that maximises
the information gain over the tree
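A minimal sketch of computing entropy and the information gain of a candidate split; the binary labels and the helper names are illustrative assumptions:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(parent)
    weighted_child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_child

# Toy example: a split that separates the classes perfectly has maximal gain
parent = np.array([0, 0, 0, 1, 1, 1])
print(information_gain(parent, parent[:3], parent[3:]))  # -> 1.0
```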

Decision trees pros and cons

Pros:
Computationally cheap to use, easy for humans to interpret the results, and it
can deal with irrelevant features
Cons:
Prone to overfitting (the model fits the training data so closely that noise in
new data harms its performance)

Boosting + Bagging
To overcome the limitations of a weak learner we can use boosting or bagging.
Both methods use an ensemble of weak learners to build a strong learner
Boosting – choose the next learner based on the errors of the last learner
(gradient-boosted decision trees)
Bagging – stochastically choose the next learners (random forests)
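A minimal scikit-learn sketch of the two ensemble styles mentioned above; the dataset and hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging-style ensemble: many trees trained on bootstrap samples
bagging = RandomForestClassifier(n_estimators=100, random_state=0)

# Boosting-style ensemble: each tree fits the errors of the previous ones
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("random forest", bagging), ("gradient boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```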
Exploratory Data Analysis
method of looking at data that does not include formal statistical modelling
and inference

Classes of EDA:
Univariate non-graphical
Univariate graphical
Multivariate non-graphical
Multivariate graphical

Univariate graphical
looking at a single variable from an experiment and getting an idea about the
distribution of its values

Categorical, Ordinal, Interval data


A categorical variable (sometimes called a nominal variable) is one that has
two or more categories, but there is no intrinsic ordering to the categories.
Example – hair colour
An ordinal variable is similar to a categorical variable. The difference between
the two is that there is a clear ordering of the categories.
Example – economic groups
An interval variable is similar to an ordinal variable, except that the intervals
between the values of the numerical variable are equally spaced.
Example – evenly spaced price ranges

Categorical non-graphical representations


The characteristics of interest for a categorical variable are simply the
range of values and the frequency of occurrence of each value
A simple tabulation of the frequency of each category is the best univariate
non-graphical EDA for categorical data
Quantitative data representations
The characteristics of the population distribution of a quantitative variable are
its centre, spread, modality (number of peaks in the probability distribution
function), shape (including “heaviness of the tails”), and outliers.

Non graphical representations of quantitative data


In most situations it is worthwhile to think of univariate nongraphical EDA as
telling you about aspects of the histogram of the distribution of the variable of
interest.
If the quantitative variable does not have too many distinct values, a
tabulation, as we used for categorical data, will be a worthwhile univariate,
non-graphical technique.
Mostly, for quantitative variables we are concerned here with the quantitative
numeric (non-graphical) measures which are the various sample statistics

Descriptors of quantitative data


1. Modality - the number of peaks there are

2. Central tendency

Mean - the common and useful measures are the arithmetic mean, median
and mode. There are other means such as the geometric, harmonic, truncated
or Winsorized means
Median - the middle value after all values are placed in an ordered list. For
symmetric distributions, the mean and median coincide

3. Spread - how far away from the centre we are still likely to find data values.
The standard deviation is the square root of the variance
Variance and standard deviation
Variances have the important property that they are additive over any number
of different independent sources of variation
The standard deviation has the same units as the original data
Inter-quartile range
The IQR is a more robust measure of spread than the variance or
standard deviation.
The IQR is not affected by extreme outliers as strongly (if at all).
Percentiles
a more flexible version of quartiles
4. Skew - measure of asymmetry

5. Kurtosis - measure of peakedness relative to a Gaussian shape
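A minimal sketch of computing these descriptors for a sample with NumPy and SciPy; the data here is randomly generated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=1000)  # illustrative sample

print("mean:    ", np.mean(x))
print("median:  ", np.median(x))
print("variance:", np.var(x, ddof=1))            # additive across independent sources
print("std dev: ", np.std(x, ddof=1))            # same units as the data
q1, q3 = np.percentile(x, [25, 75])
print("IQR:     ", q3 - q1)                       # robust measure of spread
print("skew:    ", stats.skew(x))                 # asymmetry
print("kurtosis:", stats.kurtosis(x))             # peakedness relative to a Gaussian
```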

Univariate and Graphical


1. Histograms

A barplot in which each bar represents the frequency (count) or proportion
(count/total count) of cases for a range of values
The only one of these techniques that makes sense for categorical data
Generally you will choose between about 5 and 30 bins. It is often
worthwhile to try a few different bin sizes/numbers
It is very instructive to look at multiple samples from the same population to
get a feel for the variation that will be found in histograms

2. Boxplots

Boxplots are very good at presenting information about the central tendency,
symmetry and skew, as well as outliers, although they can be misleading
about aspects such as multimodality

3. Outliers

The term “outlier” is not well defined in statistics, and the definition varies
depending on the purpose and situation. The “outliers” identified by a boxplot,
which could be called “boxplot outliers” are defined as any points more than
1.5 IQRs above Q3 or more than 1.5 IQRs below Q1.
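A minimal sketch of the 1.5-IQR boxplot-outlier rule; the small data array is illustrative:

```python
import numpy as np

x = np.array([1.2, 1.5, 1.7, 2.0, 2.1, 2.3, 2.4, 9.8])  # 9.8 is an obvious outlier
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1

# Boxplot outliers: more than 1.5 IQRs below Q1 or above Q3
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = x[(x < lower) | (x > upper)]
print("boxplot outliers:", outliers)
```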

4. Violin plots

A violin plot is like a box plot, which shows peaks in the data. It is used to
visualize the distribution of numerical data. Unlike a box plot that can only
show summary statistics, violin plots depict summary statistics and the
density of each variable.

Multivariate non graphical


1. Cross tabulation - the basic bivariate non-graphical EDA technique

2. Correlation of variables

Cramer’s V is used to calculate the correlation between nominal categorical
variables. Recall that nominal variables are ones that take on category labels
but have no natural ordering.
The value for Cramer’s V ranges from 0 to 1, with 0 indicating no association
between the variables and 1 indicating a strong association between the
variables.
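A minimal sketch of computing Cramér's V from a contingency table of two nominal variables; the table values are an illustrative assumption:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Illustrative contingency table: rows = hair colour, columns = eye colour
table = np.array([[30, 10,  5],
                  [15, 25, 10],
                  [ 5, 10, 20]])

chi2, p, dof, expected = chi2_contingency(table)
n = table.sum()
r, k = table.shape
cramers_v = np.sqrt(chi2 / (n * (min(r, k) - 1)))
print("Cramér's V:", cramers_v)  # 0 = no association, 1 = strong association
```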

3. Quantitative variable statistics (covariance)


For two quantitative variables, the basic statistics of interest are the sample
covariance and/or sample correlation. Positive covariance values suggest that
when one measurement is above the mean the other will probably also be
above the mean, and vice versa.

Correlation is closely related to covariance

4. Covariance and correlation matrices

When we have many quantitative variables, the most common non-graphical
EDA technique is to calculate all of the pairwise covariances and/or
correlations and assemble them into a matrix.

Graphical multivariate
1. Univariate plots by category

When we have one categorical (usually explanatory) and one quantitative
(usually outcome) variable, graphical EDA usually takes the form of
“conditioning” on the categorical random variable. This simply indicates that
we focus on all of the subjects with a particular level of the categorical
random variable, then make plots of the quantitative variable for those
subjects.

2. Scatterplots

For two quantitative variables, the basic graphical EDA technique is the
scatterplot which has one variable on the x-axis, one on the y-axis and a point
for each case in your dataset. If one variable is explanatory and the other is
outcome, it is a very, very strong convention to put the outcome on the y
(vertical) axis.
In a scatterplot we can increase the dimensionality with things like marker
size, colour, shape etc… but don’t go too far or you will simply overload the
viewer.

Clustering and Dimensionality Reduction
Clustering
Finding natural groups in data
Automatically identifying what is common between data points.

Dimensionality reduction
Transforming a high dimensional space to a low dimensional one.
For visualisation
For feature selection
Can help combat the curse of dimensionality

K-means clustering
One of the simplest clustering approaches; its limitations include:
Assumes spherical distributions
Assumes equal cluster sizes
Requires an estimate of k
A hard assignment method – each point belongs to one and only one cluster
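A minimal scikit-learn sketch; the toy blobs data and k = 3 are illustrative assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data with three roughly spherical clusters
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k must be supplied up front; each point gets exactly one (hard) label
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_[:10])
```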

Distances
K-means relies on the Euclidean distance
But there are many other ways that we can measure distance
We need to take a quick look at some of these:
where are they useful, and where are they problematic?

What is Euclidean distance?


The simplest distance metric
It measures the straight-line distance between vectors x and y over their dimensions i
The Euclidean distance is simple to calculate and can be useful.
However, one must be careful that all units of each dimension have been
normalised, or it will skew the distance.
Also, the Euclidean distance tends to become less useful as we move into
spaces with dimensionality much greater than three.
What is Cosine distance?
The cosine of the angle between x and y
Overcomes some of the difficulties encountered by the Euclidean distance in
higher dimensions
Only considers directions of vectors and not their magnitude

What is Manhattan Distance?


The distance between two points if they could only move at right angles
Rather similar to the Euclidean distance.
However, the Manhattan distance does not suffer as badly as the Euclidean
distance in higher dimensions.
As with Euclidean distance, care should be taken to normalise all dimensions
before calculating the Manhattan distance.

What is Minkowski Distance?


A generalisation of Manhattan and Euclidean distances

Minkowski distance allows for a great deal of flexibility


It is also advisable to explore the simpler metrics first and develop a feeling
for how the choice of p might affect the performance of the distance metric
that you choose
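A minimal sketch of the four metrics using scipy.spatial.distance; the two example vectors are arbitrary:

```python
from scipy.spatial import distance

x = [1.0, 2.0, 3.0]
y = [2.0, 0.0, 5.0]

print("Euclidean:", distance.euclidean(x, y))
print("Cosine:   ", distance.cosine(x, y))          # 1 - cos(angle); ignores magnitude
print("Manhattan:", distance.cityblock(x, y))       # right-angle ("city block") moves only
print("Minkowski:", distance.minkowski(x, y, p=3))  # p=1 -> Manhattan, p=2 -> Euclidean
```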

Scaling vectors
In several cases above we mentioned normalizing degrees of freedom
This is generally VERY important in machine learning
We should ensure that all dimensions are of a similar scale
Otherwise arbitrary choices of unit could show up as important trends in data
We can rescale, standardize or normalize

Rescaling - MinMaxScaler
Subtracts the minimum value in the feature and then divides by the range
MinMaxScaler preserves the shape of the original distribution. It doesn’t
meaningfully change the information embedded in the original data.
Note that MinMaxScaler doesn’t reduce the importance of outliers.

Rescale data - RobustScaler


Transforms the feature vector by subtracting the median and then dividing by
the interquartile range (75% value — 25% value)
Note that the range for each feature after RobustScaler is applied is larger
than it was for MinMaxScaler.
Use RobustScaler if you want to reduce the effects of outliers, relative to
MinMaxScaler.

Rescaling – StandardScaler
Standardizes a feature by subtracting the mean and then scaling to unit
variance
Unit variance means dividing all the values by the standard deviation
StandardScaler is the industry’s go-to algorithm
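A minimal scikit-learn sketch contrasting the three scalers on the same illustrative feature (the single column with one large outlier is an assumption made for demonstration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# One feature with an outlier, as a column vector
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

print("MinMax:  ", MinMaxScaler().fit_transform(X).ravel())    # squashes into [0, 1]; outlier dominates
print("Robust:  ", RobustScaler().fit_transform(X).ravel())    # median/IQR; less affected by the outlier
print("Standard:", StandardScaler().fit_transform(X).ravel())  # zero mean, unit variance
```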

Clustering algorithms – Gaussian Mixture Model

A Gaussian mixture is a function that is comprised of several Gaussians, each
identified by k ∈ {1,…, K}, where K is the number of clusters of our dataset.
Each Gaussian k in the mixture is comprised of the following parameters:
A mean μ that defines its centre.
A covariance Σ that defines its width. This would be equivalent to the
dimensions of an ellipsoid in a multivariate scenario.
A mixing probability π that defines how big or small the Gaussian function
will be.

GMM expectation maximisation

Start with an initial set of parameters


Repeat until convergence:
Use parameters to calculate latent variables of the model
Use the latent variables to obtain new optimal parameters

GMM - difference from k-means

GMM assigns a probability of belonging to a cluster; k-means assigns only 1
or zero
GMM can handle different shapes of cluster, depending on how free we allow
the covariance matrix to be
GMM is generally more expensive but more nuanced.
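A minimal scikit-learn sketch; the data is illustrative, and covariance_type="full" is chosen so each cluster can take its own elliptical shape:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fitted by expectation maximisation; each component has a mean, covariance and mixing weight
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

print(gmm.means_)                 # cluster centres (mu)
print(gmm.weights_)               # mixing probabilities (pi)
print(gmm.predict_proba(X[:3]))   # soft assignments, unlike k-means' hard labels
```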

Curse of dimensionality
As the dimension of the data increases, the growth in volume causes data to become sparse
This causes problems with statistical significance
Sparsity also makes points look uniformly dissimilar, so distance-based methods suffer

Reducing dimensionality

Dimensionality reduction generally involves finding a smaller set of dimensions
that preserve as much information about the data as possible

Principal component analysis


Takes all the factors in the original data and uses them to form new factors
which are:
uncorrelated with one another
ranked in order of importance
The steps of PCA:

Step 1: Standardise the data, transforming all dimensions to zero mean and
unit variance

z = (value − mean) / standard deviation

Step 2: Set up the covariance matrix

Step 3:
Obtain the eigenvectors of the covariance matrix
The eigenvectors provide us with a new set of vectors, and each eigenvalue
tells us how much of the original data that eigenvector explains

Step 4: Choose how many components to keep; the retained eigenvectors
form the feature vector

Step 5: Recast the original data

The feature vector formed from the eigenvectors of the covariance matrix is
used to reorient the data from the original axes to the ones represented by the
principal components
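A minimal NumPy sketch of these five steps on randomly generated illustrative data; in practice sklearn.decomposition.PCA performs an equivalent decomposition for you:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])  # correlated toy data

# Step 1: standardise to zero mean and unit variance
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix
cov = np.cov(Z, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]           # rank by importance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: keep the top components -> the "feature vector"
n_components = 2
feature_vector = eigvecs[:, :n_components]

# Step 5: recast the original data onto the principal components
X_pca = Z @ feature_vector
print(eigvals / eigvals.sum())  # fraction of variance explained by each component
print(X_pca[:3])
```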

Other ways to reduce dimensionality:

Non-negative matrix factorization


Stochastic methods

Eigenvectors and eigenvalues


Eigenvectors are a special set of vectors associated with a linear system of
equations (i.e., a matrix equation)
In our case the linear equations are nothing but the original data features
Eigenvalues are the weights associated with each eigenvector
There are numerous methods for obtaining the eigenvectors and eigenvalues
of a given matrix
Linear Regression & Naive Bayes Classification
Linear regression

Establish if there is a relationship between two variables


Forecast new observations
The least squares method
How to find the linear equation that best fits the data

Within this, there are two kinds of variables:

the dependent variable - its values depend on another variable
the independent variable - independent
y = mx + c is the usual form, but the model never fits the data exactly, so
errors need to be taken into account via a residual term. Errors can also
arise because of noise

Minimising the errors

We focus on the squared errors.
This is usually done by calculating the coefficients:
getting the mean values of the data and calculating the slope and intercept

Measuring the quality of the regression:


1. Sum of squared errors (SSE)

Pro – works to compare models for the same data
Cons – not easy to interpret; it scales with the number of data points and
is in squared units
2. Root mean squared error (RMSE)

Divide by the number of samples so it no longer depends on the data size
Take the square root so it is in the same units as the underlying data
If it is small compared with the typical size of the data values, the fit is good, and vice versa
3. R squared value

The denominator is the SSE that results if we predict every y value to be the
average of the y in the data
Captures the value added by using a model:
r2 = 0 – no value
r2 = 1 – perfect model
Allows comparison across models and data
It is unitless
Beware – for hard problems even a good model can have r2 close to zero
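A minimal sketch of the three metrics on an illustrative straight-line fit; the use of scikit-learn's LinearRegression and metrics here is an assumption about tooling:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100).reshape(-1, 1)
y = 3.0 * x.ravel() + 2.0 + rng.normal(scale=1.0, size=100)  # y = mx + c + noise

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

sse = np.sum((y - y_pred) ** 2)                # scales with the number of points
rmse = np.sqrt(mean_squared_error(y, y_pred))  # same units as y
r2 = r2_score(y, y_pred)                       # 0 = no value added, 1 = perfect
print(sse, rmse, r2)
```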

Multiple linear regression


can have more than one independent variable
same procedure as the linear regression above
use r2 values to decide if the model has improved with the addition of new
independent variables

Classification with Naive Bayes


Example:

we have some base distribution whose labels we know, but we want to be
able to predict the labels for some data which is not in this set

Terminology:
1. Posterior - the probability that a point belongs to class y given the observed data

2. Prior - the probability that the data belongs to y without any other info

3. Likelihood - the probability of the observed data given the class (the conditioning is reversed relative to the posterior)

Now, with all of this, we can compare the classes


Probabilities are multiplicative

When introducing a new data point, we can project it onto the one-dimensional
marginal distributions first, and from there determine what we will be
calculating

Setting the priors

This is what we believe about the data before making any measurement
So here since it is a new data point, we know nothing about x1 and x2 of a
point - how likely is it going to belong to y0 and y1
If we look at the training data ½ belongs to y0 and ½ to y1, so it is reasonable
to say that a new point is 50:50 in the absence of any information
Therefore p(y0)/p(y1) = 0.5/0.5 = 1

We can calculate the log likelihoods for the various terms in the previous
equation.
It turns out that the log likelihood of belonging to the red distribution is ~ -11
and that for the blue distribution is ~ -4. So the blue distribution is more likely
It is interesting to note that the biggest difference was for the x_2 value
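A minimal scikit-learn sketch of Gaussian naive Bayes on two features and two classes, mirroring the x1/x2, y0/y1 example above; the generated data and the new point are illustrative assumptions:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two classes (y0 = 0, y1 = 1) with a 50:50 prior, described by features x1 and x2
X_y0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X_y1 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))
X = np.vstack([X_y0, X_y1])
y = np.array([0] * 50 + [1] * 50)

clf = GaussianNB().fit(X, y)  # fits 1-D Gaussian marginals per feature, per class

new_point = np.array([[2.5, 3.5]])
print(clf.predict_proba(new_point))  # posterior probabilities for y0 and y1
print(clf.predict(new_point))        # most likely class
```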
Modelling Data
Machine learning
computer systems which are able to learn and adapt without following explicit
instructions
A numerical simulation always follows the same algorithm and always gives
the same outcome (ML updates the model on the basis of the data observed)
ML starts with a core algorithm and some data and then updates parameters
within the core algorithm to best represent the data observed
it is essentially representation + evaluation + optimisation

Supervised ML

Data plus labels


Learning a function that maps an input to an output based on example
input-output pairs.

Unsupervised ML

Data do not have labels


Identifying trends in unlabelled datasets
E.g. cluster analysis, is used for exploratory data analysis to find hidden
patterns or grouping in data

Classification

Identifying to which of a set of categories a new sample belongs, on the basis
of a training set
E.g. a spam filter, or which crystal structure gives a certain pattern

Regression

Models a target prediction value based on the independent variables
Linear regression is a classical method, whereas neural-network-type models
are deep methods
Evaluation/model selection
Parameters and hyperparameters

Parameters - properties of the model which are modified during training

Hyperparameters - values that define the model and how it trains, but they
are not updated during training

Features

When machine learning approaches the data, it will consist of several or more
features, which are simply input variables for the model
A feature is a measurable quantity of something that is observable
Models need features to learn
examples would include the identification and differentiation of a cat and a
car
Feature engineering includes transforming raw data into features that better
represent the underlying problem. It turns inputs into things that an
algorithm can understand. But sometimes some inputs are not algorithm
ready, so we need to convert them into something useful. This is where one-hot
encoding comes into play

One-hot encoding

It is used for classification problems, where the vector's length is the same as the
number of categories.
Each element is the probability that the data represents a given class
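A minimal sketch of one-hot encoding a categorical feature; the hair-colour categories are illustrative:

```python
from sklearn.preprocessing import OneHotEncoder

# Each category becomes one element of a vector of length n_categories
colours = [["brown"], ["blonde"], ["black"], ["brown"]]
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(colours).toarray()  # dense array, one column per category

print(encoder.categories_)  # alphabetical: black, blonde, brown
print(one_hot)              # e.g. "brown" -> [0. 0. 1.]
```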

Optimisation/Evaluation
1. Evaluation
An objective function or scoring function distinguishes good models from
bad ones. It must always represent the goodness of a model in a single
number

Evaluation metrics

1. Mean squared error


used in regression since the square ensures a single minimum, avoids trapping
in local minima, and is easy to calculate
2. Mean absolute error

this is similar to mean squared error but there is no quadratic term


it is more robust to outliers, but MSE penalises large differences more
than MAE
3. Huber loss

quadratic close to the minimum and linear far from the minimum
it is more expensive to calculate, but it overcomes the problems of MSE and MAE
4. Cross entropy

it is used for classification problems and tells us how similar our model
distribution is to the true distribution
penalises all errors, but particularly those which are most inaccurate
5. Hinge loss

used for classification


it does not seek to reproduce the distribution of the data

Test and validation sets


The model must always be validated on data that is not used for training.
Most of the time, only 20% of the data is used for validation.
Need to make sure that the validation and training distributions are the same
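A minimal scikit-learn sketch of an 80/20 hold-out split; the dataset is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Hold out 20% of the data for validation; stratify so the class
# distributions of the two splits stay the same
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(X_train.shape, X_val.shape)  # (800, 20) (200, 20)
```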
Neural Networks & Deep Learning
History of deep neural nets (DNN)
originally a device (the perceptron) which was intended for binary classification
it produces a single output from a matrix of inputs using weights and biases
early neural networks had a single layer
what was key to NNs:
the back-propagation algorithm, which was put in place to reduce the error
the gradients could be used to minimise the error
modifications back-propagate through the network using the chain rule

Layers of a DNN
it is a multi-layer perceptron

the layers are also called fully connected layers

There are two ways to program a NN (TensorFlow/Keras), as sketched below:

sequential: quick and easy

functional: more complex but flexible
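A minimal sketch of the same small model written both ways; the layer sizes and the binary-classification head are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential API: quick and easy, a plain stack of layers
sequential_model = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Functional API: more verbose but flexible (branches, multiple inputs/outputs)
inputs = keras.Input(shape=(10,))
x = layers.Dense(32, activation="relu")(inputs)
outputs = layers.Dense(1, activation="sigmoid")(x)
functional_model = keras.Model(inputs=inputs, outputs=outputs)

sequential_model.summary()
functional_model.summary()
```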

Activation functions
1. Linear: simplest form of activation function

2. Sigmoid

It suffers from the vanishing gradient problem.
Secondly, its output isn't zero-centred: 0 < output < 1, which makes the
gradient updates go too far in different directions and makes optimisation
harder.
Sigmoids saturate and kill gradients.
Sigmoids have slow convergence.

3. Tanh

Output is zero centered


Usually preferred to sigmoid as it converges better
Still it suffers from vanishing gradient problem

4. ReLU

Roughly a 6× improvement in convergence over the tanh function


Should only be used within the hidden layers of a neural network model
the most used activation function

5. LeakyReLU
Some ReLU gradients can be fragile during training and can die:
a weight update can make a neuron never activate on any data point again,
so ReLU can result in dead neurons.
LeakyReLU gives a small slope for negative inputs to avoid this.

Backpropagation

Optimisation
1. First order

Use the gradient of the loss function with respect to the parameters to minimise the loss
Relatively quick, but ignores curvature

2. Second order

Calculate the second derivative of the loss function with respect to the
parameters
Slower per step, but includes curvature so can be quicker overall

3. Stochastic gradient descent

Gradient descent – calculate the gradient of the loss of the entire set with
respect to the parameters
SGD – calculated per sample rather than on the entire batch. This is much
quicker to calculate, but can lead to high variance
Mini-batch SGD – calculate the loss gradient on batches of a set size, which is
essentially the best of both worlds

4. Momentum

High-variance oscillations in SGD make it hard to converge


Momentum softens the oscillations in irrelevant directions

5. Nesterov

Momentum still has problems:


the momentum is high even close to the minimum, so we often overshoot
Nesterov accelerated gradient jumps ahead in the momentum direction, then
estimates a correction to update the parameters

6. Adaptive Methods

Some parameters update much more often than others


Therefore different learning rates can be appropriate for different parameters
Adagrad modifies the learning rate η at each time step for every parameter
based on the past gradients computed for that parameter

7. AdaDelta
Adagrad suffers because the gradients from all previous steps are
accumulated, so the learning rate continuously decays
AdaDelta circumvents this by storing gradients only from n previous steps

8. Adam

Similar to AdaDelta
Add in information about the mean of the momentum of previous steps too
Works very well in most situations

Regularisation
