DIFFERENCE BETWEEN ML/DL/DATA SCIENCE/AI
AI:
AI enables machines to think.
Anything which enables a machine to mimic humans.
A computer system uses math and logic to simulate the reasoning that people
use to learn from new information and make decisions.
Example: Manufacturing robots, Self-driving cars, Marketing chatbots
ML:
Machine learning is considered a subset of AI.
ML uses statistical techniques to explore data.
Algorithms that enable systems to identify patterns, make decisions, and improve
themselves through experience
Example: Product Recommendations, Image Recognition, Predict Potential Heart
Failure
DL:
Deep Learning is a subset of Machine Learning built on artificial neural
networks (including recurrent neural networks).
The algorithms are created much like in machine learning, but they
consist of many more levels (layers) of algorithms.
All of these networks together are called an artificial neural
network.
In much simpler terms, it imitates the human brain, where all the
neurons are interconnected.
Example: image recognition tools, natural language processing (NLP) and
speech recognition software.
Data Science:
Data Science is an interdisciplinary field that combines various scientific processes,
algorithms, tools, and machine learning techniques,
working together to find common patterns and gather sensible insights
from the given raw input data using statistical and mathematical
analysis.
WHERE ARE ML AND DEEP LEARNING USED?
MACHINE LEARNING:
What Are the Different Types of Machine Learning?
1- Supervised Learning
It is defined by its use of labeled datasets to train algorithms to classify data or
predict outcomes accurately.
TYPES OF SUPERVISED LEARNING:
Classification
o Classification algorithms are used to predict/classify
discrete values such as Male or Female, True or
False, Spam or Not Spam, etc.
Classification Algorithms can be further divided into the
following types:
o Logistic Regression
o K-Nearest Neighbours
o Support Vector Machines
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
REGRESSION:
o Regression algorithms are used to predict
continuous values such as price, salary, age, etc.
o Types of Regression Algorithm:
o Simple Linear Regression
o Multiple Linear Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
1- Logistic Regression:
o Logistic Regression is used when the dependent variable (target) is categorical.
o MATHEMATICS:
Output = 0 or 1
Hypothesis: Z = WX + B
hΘ(x) = sigmoid(Z) = 1 / (1 + e^(-Z))
Here X is the independent variable, e is Euler's constant, and y is the output.
The sigmoid in logistic regression simply converts the linear combination of the
independent variables into a probability between 0 and 1 with respect to the
dependent variable.
Types of Logistic Regression
1. Binary Logistic Regression
The categorical response has only two possible outcomes. Example: Spam
or Not Spam
2. Multinomial Logistic Regression
Three or more categories without ordering. Example: Predicting which food
is preferred more (Veg, Non-Veg, Vegan)
3. Ordinal Logistic Regression
Three or more categories with ordering. Example: Movie rating from 1 to 5
Decision Boundary:
o To predict which class a data point belongs to, a threshold can be set. Based
upon this threshold, the obtained estimated probability is classified
into classes.
o Say, if predicted_value ≥ 0.5, then classify email as spam else as not
spam.
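A minimal sketch of this hypothesis and decision boundary with scikit-learn; the feature values and labels below are made-up data for illustration, and predict_proba supplies the estimated probability to which the 0.5 threshold is applied:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # independent variable
y = np.array([0, 0, 0, 1, 1, 1])                          # binary target (e.g. not spam / spam)

model = LogisticRegression()
model.fit(X, y)

probs = model.predict_proba(X)[:, 1]      # estimated probability P(y = 1 | x), i.e. sigmoid(WX + B)
predictions = (probs >= 0.5).astype(int)  # apply the 0.5 decision boundary
print(probs, predictions)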
2-KNN K-NEAREST NEIGHBOURS
o K-Nearest Neighbour is one of the simplest Machine Learning
algorithms based on Supervised Learning technique.
o The K-NN algorithm assumes similarity between the new case/data point and
the available cases, and puts the new case into the category that is most
similar to the available categories.
o K-NN can be used for Regression as well as for Classification,
but it is mostly used for Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make
any assumptions about the underlying data.
o It is also called a lazy learner algorithm because it does not learn
from the training set immediately; instead it stores the dataset and, at
the time of classification, performs an action on the dataset.
How does K-NN work?
o Step-1: Select the number K of the neighbors
o Step-2: Calculate the Euclidean distance of K number of neighbors
o Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
o Step-4: Among these k neighbors, count the number of the data
points in each category.
o Step-5: Assign the new data points to that category for which the
number of the neighbor is maximum.
o Step-6: Our model is ready.
How is the value of k chosen?
In KNN, finding the value of k is not easy. A small value of k means that noise will have a
higher influence on the result, and a large value makes it computationally expensive.
1- Data scientists usually choose k as an odd number if the number of classes is 2.
2- Another simple approach to select k is to set k = sqrt(n), where n is the number of samples.
3- Assume you have a training set X_train and a test set X_test. Create the
model with k = 1, predict on the test set data, and check the
accuracy and other metrics; then repeat the same process,
increasing the k value by 1 each time (see the sketch below).
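A rough sketch of this "increase k and compare" approach, using the iris dataset as a stand-in for your own train/test split (the dataset choice and k range are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)   # create the model with the current k
    knn.fit(X_train, y_train)
    print(k, knn.score(X_test, y_test))         # test accuracy for each k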
3-SUPPORT VECTOR MACHINE:
o Support vector machine is highly preferred by many as it produces
significant accuracy with less computation power.
o Support Vector Machine, abbreviated as SVM, can be used for both
regression and classification tasks, but it is mostly used for
classification.
o Idea behind SVM: SVM is based on the idea of finding a hyperplane
that best separates the features into different classes. The
hyperplane is a function which is used to differentiate between
features.
o We also draw two lines parallel to the hyperplane, called the
margins. One margin passes through the closest positive point and
the other margin passes through the closest negative point.
o The distance between the margin and the hyperplane is called the marginal
distance.
o These margin lines act as a cushion: if a test point falls between the
hyperplane and a margin, we can still easily distinguish its class.
o The points closest to the hyperplane are called the support vector
points.
o The basic intuition to develop here is that the farther the SV
points are from the hyperplane, the higher the probability of correctly
classifying the points into their respective regions or classes.
o SV points are very critical in determining the hyperplane because if
the position of these vectors changes, the hyperplane's position is
altered. Technically this hyperplane can also be called a
margin-maximizing hyperplane.
o All these techniques apply to linearly separable points.
o If we try to separate non-linearly separable points with a linear
hyperplane, the accuracy can drop below 50 percent.
o The closest points, which the margins pass through, are called support
vectors.
o SVM uses the kernel trick for non-linearly separable points (see the
sketch below).
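A minimal sketch of the kernel trick with scikit-learn, comparing a linear SVM with an RBF-kernel SVM on toy data that is not linearly separable (the make_circles dataset and its parameters are assumptions for illustration):

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)  # non-linear points
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)   # linear hyperplane only
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)         # kernel trick for non-linear points

print("linear:", linear_svm.score(X_test, y_test))
print("rbf:", rbf_svm.score(X_test, y_test))

On such data the RBF kernel typically separates the classes far better than the plain linear hyperplane.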
NAÏVE BAYES CLASSIFIER:
o A Naive Bayes classifier is a probabilistic machine learning model
that is used for classification tasks.
o Naïve Bayes is based on Bayes Theorem.
o Using Bayes theorem, we can find the probability of A happening,
given that B has occurred. Here, B is the evidence and A is the
hypothesis
o Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
TYPES OF NAÏVE BAYES ALGORITHM
GAUSSIAN NAÏVE BAYES
o It is used where the features are continuous in nature.
o We can apply Gaussian Naive Bayes to the iris dataset (see the sketch
after this list).
o It relies on the probability density function (PDF) of a normal
distribution, which is described by its mean, variance, and standard
deviation.
BERNOULLI NAÏVE BAYES
o If the features are binary in nature, then we can apply Bernoulli Naive
Bayes.
MULTINOMIAL NAÏVE BAYES
o If the features are counts of occurrences (e.g. word counts), then you should
pick Multinomial Naive Bayes.
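A minimal sketch of Gaussian Naive Bayes on the iris dataset with scikit-learn, as mentioned in the Gaussian bullet above (the train/test split settings are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)                  # continuous features
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gnb = GaussianNB().fit(X_train, y_train)
print(gnb.score(X_test, y_test))                   # accuracy on the held-out split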
What is Exploratory Data Analysis?
Exploratory Data Analysis (EDA) is an approach to analyze the data
using visual techniques. It is used to discover trends, patterns, or to
check assumptions with the help of statistical summary and graphical
representations
We can analyze the data through:
o df.shape
o df.describe()
o df.info()
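A quick sketch of these EDA calls with pandas; "your_data.csv" is a placeholder for your own file:

import pandas as pd

df = pd.read_csv("your_data.csv")
print(df.shape)        # (number of rows, number of columns)
print(df.describe())   # statistical summary of the numeric columns
df.info()              # column names, dtypes and non-null counts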
HANDLING MISSING VALUES
# drop the data points: since such data points are very few, we can drop them
# remove the column: but if the significance of the attribute is large, we can't drop it
# replace null values with some other value, like the mean
Now let's check if there are any missing values in our dataset or not.
We can see that every column has a different number of missing
values. For example, Gender has 145 missing values and Salary has 0. For
handling these missing values there can be several options, such as
dropping the rows containing NaN or replacing NaN with the mean,
median, mode, or some other value.
If the column is categorical, then we can take the mode of the values.
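A minimal sketch of these options, assuming a DataFrame df with a numeric "Salary" column and a categorical "Gender" column (both column names are assumptions for illustration):

print(df.isnull().sum())                                     # missing values per column

df_dropped = df.dropna()                                     # option 1: drop rows containing NaN
df["Salary"] = df["Salary"].fillna(df["Salary"].mean())      # numeric column: fill with the mean
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])   # categorical column: fill with the mode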
Data visualization
Data Visualization is the process of analyzing data in the form of graphs or
maps, making it a lot easier to understand the trends or patterns in the data.
There are various types of visualizations –
Univariate analysis: This type of data consists of only one variable. The
analysis of univariate data is thus the simplest form of analysis since the
information deals with only one quantity that changes. It does not deal with
causes or relationships and the main purpose of the analysis is to describe
the data and find patterns that exist within it.
Bi-Variate analysis: This type of data involves two different variables. The
analysis of this type of data deals with causes and relationships and the
analysis is done to find out the relationship among the two variables.
Multi-Variate analysis: When the data involves three or more variables, it is
categorized under multivariate.
Histogram
It can be used for both uni and bivariate analysis.
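As a quick sketch, a univariate histogram can be drawn with matplotlib; the DataFrame df and the "Age" column are assumptions for illustration:

import matplotlib.pyplot as plt

plt.hist(df["Age"], bins=20)   # distribution of a single variable
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()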
PERCENTILE:
percentiles are the values below which a certain percentage of the data in a data
set is found. If you want to know where you stand compared to the rest of the
crowd, you need a statistic that reports relative standing, and that statistic is
called a percentile.
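A small sketch of computing percentiles with NumPy on made-up scores:

import numpy as np

scores = np.array([40, 55, 60, 65, 70, 75, 80, 85, 90, 95])
print(np.percentile(scores, 25))   # 25th percentile (Q1)
print(np.percentile(scores, 50))   # 50th percentile (the median)
print(np.percentile(scores, 90))   # value below which 90% of the data falls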
Removing Outliers
To remove an outlier, one must follow the same process as removing any other
entry from the dataset, using its exact position in the dataset, because all of the
above methods of detecting outliers end up producing a list of the data
items that satisfy the outlier definition according to the method used.
Example: We will detect the outliers using IQR and then we will remove them.
We will also draw the boxplot to see if the outliers are removed or not.
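A minimal sketch of this IQR-based detection and removal, assuming a DataFrame df with a numeric "Salary" column (the column name is an assumption for illustration):

import matplotlib.pyplot as plt

q1 = df["Salary"].quantile(0.25)
q3 = df["Salary"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr                      # IQR rule for outlier limits

df_clean = df[(df["Salary"] >= lower) & (df["Salary"] <= upper)]   # drop rows outside the limits

plt.boxplot(df_clean["Salary"])                                    # check that the outliers are removed
plt.show()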
BIAS:
Bias is the gap between actual value and the predicted value.
Low bias means the gap between the predicted and actual values is small.
Bias corresponds to the error on the training data.
VARIANCE:
How scattered the predicted values are.
Low variance means the predicted values are less scattered.
Variance corresponds to the error on the testing data.
UNDERFITTING:
When a model is trained on the training dataset and the error is very high, this is
called underfitting. The accuracy is quite low for the training data and it is quite
low for the testing data as well.
If we take the example of linear regression, the points are scattered far from the
fitted line.
High bias, low variance
Techniques to reduce underfitting:
Increase model complexity
Increase the number of features, performing feature engineering
Remove noise from the data.
Increase the number of epochs or increase the duration of training
to get better results.
OVERFITTING:
When a model is trained on the training dataset and the error is (almost) zero,
this is called overfitting. The accuracy is quite high for the
training data but quite low for the testing data.
If we take the example of 4th-degree polynomial regression, the points
lie exactly on the fitted curve.
Low bias, high variance.
There are some ways by which we can reduce the occurrence of
overfitting in our model (a cross-validation sketch follows this list):
Cross-Validation
Training with more data
Removing features
Early stopping the training
Regularization
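As a rough sketch of the cross-validation idea from the list above (the dataset and model below are assumptions for illustration), a large gap between training accuracy and cross-validated accuracy is a warning sign of overfitting:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

scores = cross_val_score(tree, X, y, cv=5)   # 5-fold cross-validation
print(scores.mean(), scores.std())           # average held-out accuracy and its spread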
PERFECT FIT:
If we use a polynomial of 2nd degree, the model fits well.
Low bias, low variance.
BIAS VARIANCE TRADE OFF:
To understand the bias-variance trade-off, we first need to understand what bias
and variance are.
We plot the relation between prediction error and model complexity.
At low model complexity the prediction error is high for both training and
testing data; this is low variance and high bias, called underfitting. As we
increase the model complexity, the prediction error of the training dataset keeps
decreasing, while the prediction error of the test data first goes down and then
increases again; this is low bias and high variance, i.e. overfitting.
The sweet spot in between, which gives low error on the training as well as the
testing data, is referred to as the best point to choose for training the
algorithm.
LINEAR V/S NON-LINEAR:
How would you tell if a given dataset is linear or non-linear in nature? Of
course, the selection of the models to be utilized will depend on it.
So the idea is to apply simple linear regression to the dataset and then to
check the least-squares error. If the least-squares error is low (the fit is
accurate), it implies the dataset is linear in nature; otherwise the dataset is non-linear.
DIFFERENCE BETWEEN FEATURE SELECTION AND DIMENSION REDUCTION:
While both methods are used for reducing the number of features in a dataset, there is
an important difference. Feature selection is simply selecting and excluding given
features without changing them. Dimensionality reduction transforms features into a
lower dimension that conveys similar information more concisely.
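A small sketch contrasting the two with scikit-learn (the iris dataset and the choice of 2 features/components are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)  # feature selection: keeps 2 original columns
X_reduced = PCA(n_components=2).fit_transform(X)              # dimensionality reduction: 2 new combined components
print(X_selected.shape, X_reduced.shape)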
What are the differences between MSE and
RMSE
MSE (Mean Squared Error) represents the difference
between the original and predicted values, obtained by
averaging the squared differences over the data
set. The lower the Mean Squared Error, the closer the fit is
to the data set.
RMSE: A metric that tells us the square root of the average
squared difference between the predicted values and the actual
values in a dataset
RMSE = √( Σ(ŷᵢ − yᵢ)² / n )
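A minimal sketch of computing both metrics on made-up actual/predicted values:

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.0])

mse = mean_squared_error(y_true, y_pred)   # average of the squared differences
rmse = np.sqrt(mse)                        # square root of the MSE
print(mse, rmse)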
what is the difference between correlation
and covariance
Covariance and correlation are two related terms that are both used in statistics and
regression analysis. Covariance shows you how two variables vary together, whereas
correlation shows you how strongly the two variables are related.
Correlation is bounded between -1 and +1.
Covariance is unbounded, ranging from -∞ to +∞.
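A quick sketch with NumPy on made-up variables:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 6])

print(np.cov(x, y)[0, 1])       # covariance: unbounded, depends on the units of x and y
print(np.corrcoef(x, y)[0, 1])  # correlation: always between -1 and +1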
PERFORMANCE METRICS:
Consider a classifier that reports 90% accuracy on a set of ten test images. As you can guess, this
brilliant performance is an illusion. It doesn't come from a smart classifier; it comes
from the fact that only one image in the test data is a platypus. The classifier always
denies that an image represents a platypus, so it gets it right in the other 9 cases.
Precision
Here is how precision works. Take all the positive results from the classifier. In
our example, those would be all the images that the system classifies as platypuses.
How many of those are actually platypuses? That’s the classifier’s precision.
As a concrete example, imagine a platypus classifier that’s fussy and particular.
This classifier won’t say that an image is a platypus unless it’s pretty darn sure that
it’s looking at a platypus. Let’s say that for every 100 images it identifies as
platypuses, 98 are indeed platypuses. That’s 98% precision. On the other hand,
the classifier's uncompromising attitude might result in a few false negatives.
That's when the classifier turns down a perfectly good platypus.
RECALL
The counterpart to precision is recall, and we can wrap it up like this: take
all the platypuses in the data. How many of them does the system classify
correctly? Like precision, recall is often expressed as a percentage. If a system
has 97% recall, that means it recognizes 97 out of every 100 platypuses.
As a rule of thumb, I should trust a system with high precision
when it says "yes". Conversely, I should trust a system with high recall when
it says "no".
On the other hand, in its eagerness to catch 'em all, a high-recall system might also
catch a few false positives.
Picking the Right Metric
Imagine that you’re comparing two systems: one has high
precision, the other has high recall. Which of the two is
better?
You can replace that question with a simpler one: when the
system makes a mistake, which would you rather get–a false
positive, or a false negative? If you want to minimize false
positives, prefer precision. To minimize false negatives,
prefer recall.
Here comes a concrete example: a machine learning-powered
fire alarm. In that
case, a false positive means that the alarm rings even though
no fire is going on.
By contrast, a false negative means that the system fails to
recognize an ongoing fire.
In this case, a false positive isn’t a big deal, but a false
negative is. Most people won’t mind the occasional
unnecessary walk outside, but nobody wants to linger in a
burning building. For this system, we should focus on recall,
even at the expense of some precision. If the system has high
recall, then we can trust it when it says: “No, there’s no fire
going on.”
F1:
The F1 score is the harmonic mean of precision and recall.
The closer the score is to 1, the better the model.
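A small sketch of these three metrics with scikit-learn on made-up binary labels (1 = platypus, 0 = not a platypus):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(precision_score(y_true, y_pred))   # of the predicted positives, how many are correct
print(recall_score(y_true, y_pred))      # of the actual positives, how many are found
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall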
ROC CURVE:
It is applied to binary classification problems.
If we are using logistic regression, we need to decide a threshold, which we
usually take as 0.5. In many use cases this threshold plays an important role.
In some situations we need a higher true positive rate, and in others we need a lower
false positive rate, so a domain expert will guide you about that.
The domain expert will advise you after seeing how your model is working
in terms of the graph.
The output values of the use case are 0 and 1, and the model predicts values in
the range 0 to 1.
We start with threshold values like 0, 0.2, 0.4, 0.6, 0.8.
After this, we calculate the true positive rate and false positive rate and plot a
point for each threshold (0, 0.2, 0.4, etc.).
Now join the points with a line.
If the domain expert says that we need a higher true positive rate, then we choose
a threshold that gives more true positives and fewer false positives.
If they say they need more true positives and don't care about the false positive
rate, we choose the threshold with the highest true positive rate.
TRUE POSITIVE RATE AND FALSE POSITIVE RATE:
The true positive rate (TPR, also called sensitivity) is calculated as TP / (TP + FN). TPR is the
probability that an actual positive will test positive. The true negative rate (also called
specificity) is the probability that an actual negative will test negative. It is calculated as
TN / (TN + FP).
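A minimal sketch of computing TPR/FPR across thresholds and the AUC with scikit-learn; the labels and probability scores below are made-up model outputs for illustration:

from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3, 0.75, 0.5]

fpr, tpr, thresholds = roc_curve(y_true, y_scores)   # TPR and FPR at each candidate threshold
print(list(zip(thresholds, tpr, fpr)))
print(roc_auc_score(y_true, y_scores))               # area under the ROC curve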
CONVOLUTIONAL NEURAL NETWORK:
Artificial neural networks (ANNs) are a core element of deep
learning algorithms
The CNN is another type of neural network that can uncover key
information in both time series and image data. For this reason, it
is highly valuable for image-related tasks, such as image
recognition, object classification and pattern recognition.
To identify patterns within an image, a CNN leverages principles
from linear algebra, such as matrix multiplication
A deep learning CNN consists of three layers: a convolutional
layer, a pooling layer and a fully connected (FC) layer. The
convolutional layer is the first layer while the FC layer is the last.
The first two, convolution and pooling layers, perform
feature extraction, whereas the third, a fully connected
layer, maps the extracted features into final output, such
as classification
The convolution layer plays a key role in a CNN; it is
composed of a stack of mathematical operations, such as
convolution, a specialized type of linear operation.
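A minimal Keras sketch of the three layer types described above (convolution, pooling, fully connected); the 28x28 grayscale input shape and 10 output classes are assumptions for illustration:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # convolutional layer: feature extraction
    layers.MaxPooling2D((2, 2)),                                            # pooling layer
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                                 # fully connected (FC) output layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()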
OPTIMIZATION OF MACHINE LEARNING MODEL
The concept of optimisation is integral to machine learning. Most
machine learning models use training data to learn the relationship
between input and output data. The models can then be used to make
predictions about trends or classify new input data. This training is a
process of optimisation, as each iteration aims to improve the model’s
accuracy and lower the margin of error.
Optimisation is measured through a loss or cost function, which is
typically a way of defining the difference between the predicted and
actual value of data. Machine learning models aim to minimise this loss
function, or lower the gap between prediction and reality of output data.
Iterative optimisation will mean that the machine learning model
becomes more accurate at predicting an outcome or classifying data.
Tuning or optimising hyperparameters allows the model to be adapted
to specific use cases and different datasets.
The ROC (Receiver Operating Characteristic) Curve tells us how well the
model can distinguish between two things (e.g. whether a patient has a disease or not).
The AUC is the area under the ROC curve. This score gives us a good idea of
how well the model performs. An AUC score of 0.5 means that the model is
performing poorly and its predictions are almost random.
Pandas:
Pandas is a Python library used primarily in data science to examine, sort, or modify data.
Pandas creates DataFrames, which have rows and columns.
DATAFRAME
Used for any kind of analysis or grouping.
Can be exported to Excel.
Example operations (see the sketch below):
o View only the column names
o View only the crim column's values
o View the first three rows
o Select the records which are at the river side
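A rough sketch of these operations, assuming a housing-style DataFrame df with a "crim" column and a "chas" column marking river-side records (the column names other than "crim" are assumptions for illustration):

import pandas as pd

print(df.columns)            # only view the column names
print(df["crim"])            # only the crim column's values
print(df.head(3))            # first three rows
print(df[df["chas"] == 1])   # records which are at the river side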
Data science
Data collection, data cleaning, data exploration, model building, explaining models, model deployment
are all things that data scientists do on the job.
SERIES: A single column, or data related to one variable. It corresponds to a 1-dimensional
array of a DataFrame.
DATAFRAME: It consists of multiple columns, i.e. it relates to multiple variables. It
corresponds to a 2-dimensional array.
We can create a Series:
import pandas as pd
s = pd.Series([1, 2, 3])
We can also create a Series using a NumPy array:
import numpy as np
arr = np.array([1, 2, 3])
s = pd.Series(arr)