DIFFERENCE BETWEEN ML/DL/DATA SCIENCE/AI
AI:
AI enables machines to think.
Anything which enables a machine to mimic humans.
A computer system uses math and logic to simulate the reasoning that people
use to learn from new information and make decisions.
Example: Manufacturing robots, Self-driving cars, Marketing chatbots
ML:
Machine learning is considered a subset of AI.
ML uses statistical techniques to explore data.
Algorithms that enable systems to identify patterns, make decisions, and improve
themselves through experience
Example: Product Recommendations, Image Recognition, Predict Potential Heart
Failure
DL:
Deep Learning is a subset of Machine Learning built on artificial neural
networks (including recurrent neural networks).
The algorithms are created much like in machine learning, but they
consist of many more levels (layers) of algorithms.
All of these networks together are called an artificial neural
network.
In much simpler terms, it imitates the human brain, where all the
neurons are interconnected.
Example: image recognition tools, natural language processing (NLP) and
speech recognition software.
Data Science:
Data Science is an interdisciplinary field that combines various scientific processes,
algorithms, tools, and machine learning techniques,
working together to find common patterns and gather sensible insights
from the given raw input data using statistical and mathematical
analysis.
WHERE ARE ML AND DEEP LEARNING USED?
MACHINE LEARNING:
What Are the Different Types of Machine Learning?
1- Supervised Learning
It is defined by its use of labeled datasets to train algorithms to classify data or
predict outcomes accurately.
TYPES OF SUPERVISED LEARNING:
Classification
o Classification algorithms are used to predict/classify
discrete values such as Male or Female, True or
False, Spam or Not Spam, etc.
Classification Algorithms can be further divided into the
following types:
o Logistic Regression
o K-Nearest Neighbours
o Support Vector Machines
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
REGRESSION:
o Regression algorithms are used to predict
continuous values such as price, salary, age, etc.
o Types of Regression Algorithm:
o Simple Linear Regression
o Multiple Linear Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
1- Logistic Regression:
o Logistic Regression is used when the dependent variable (target) is categorical.
o MATHEMATICS:
Output = 0 or 1
Hypothesis: Z = WX + B
hΘ(x) = sigmoid(Z) = 1 / (1 + e^(-Z))
Here X is the independent variable, e is Euler's constant, and y is the output.
The sigmoid in logistic regression simply converts the linear combination of the
independent variables into a probability between 0 and 1 with respect to the
dependent variable.
Types of Logistic Regression
1. Binary Logistic Regression
The categorical response has only two possible outcomes. Example: Spam
or Not Spam
2. Multinomial Logistic Regression
Three or more categories without ordering. Example: Predicting which food
is preferred more (Veg, Non-Veg, Vegan)
3. Ordinal Logistic Regression
Three or more categories with ordering. Example: Movie rating from 1 to 5
Decision Boundary:
o To predict which class a data point belongs to, a threshold can be set. Based
upon this threshold, the obtained estimated probability is classified
into classes.
o Say, if predicted_value ≥ 0.5, then classify email as spam else as not
spam.
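A minimal sketch of this hypothesis and decision boundary with scikit-learn; the feature values and labels below are made-up data for illustration, and predict_proba supplies the estimated probability to which the 0.5 threshold is applied:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # independent variable
y = np.array([0, 0, 0, 1, 1, 1])                          # binary target (e.g. not spam / spam)

model = LogisticRegression()
model.fit(X, y)

probs = model.predict_proba(X)[:, 1]      # estimated probability P(y = 1 | x), i.e. sigmoid(WX + B)
predictions = (probs >= 0.5).astype(int)  # apply the 0.5 decision boundary
print(probs, predictions)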
2-KNN K-NEAREST NEIGHBOURS
o K-Nearest Neighbour is one of the simplest Machine Learning
algorithms based on Supervised Learning technique.
o The K-NN algorithm assumes similarity between the new case/data point and
the available cases, and puts the new case into the category that is most
similar to the available categories.
o K-NN can be used for Regression as well as for Classification,
but it is mostly used for Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make
any assumptions about the underlying data.
o It is also called a lazy learner algorithm because it does not learn
from the training set immediately; instead it stores the dataset and, at
the time of classification, performs an action on the dataset.
How does K-NN work?
o Step-1: Select the number K of the neighbors
o Step-2: Calculate the Euclidean distance of K number of neighbors
o Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
o Step-4: Among these k neighbors, count the number of the data
points in each category.
o Step-5: Assign the new data points to that category for which the
number of the neighbor is maximum.
o Step-6: Our model is ready.
How is the value of k chosen?
In KNN, finding the value of k is not easy. A small value of k means that noise will have a
higher influence on the result, and a large value makes it computationally expensive.
1- Data scientists usually choose k as an odd number if the number of classes is 2.
2- Another simple approach to select k is to set k = sqrt(n), where n is the number of samples.
3- Assume you have a training set X_train and a test set X_test. Create the
model with k = 1, predict on the test set data, and check the
accuracy and other metrics; then repeat the same process,
increasing the k value by 1 each time (see the sketch below).
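A rough sketch of this "increase k and compare" approach, using the iris dataset as a stand-in for your own train/test split (the dataset choice and k range are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)   # create the model with the current k
    knn.fit(X_train, y_train)
    print(k, knn.score(X_test, y_test))         # test accuracy for each k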
3-SUPPORT VECTOR MACHINE:
o Support vector machine is highly preferred by many as it produces
significant accuracy with less computation power.
o Support Vector Machine, abbreviated as SVM, can be used for both
regression and classification tasks, but it is mostly used for
classification.
o Idea behind SVM: SVM is based on the idea of finding a hyperplane
that best separates the features into different classes. The
hyperplane is a function which is used to differentiate between
features.
o We also draw two lines parallel to the hyperplane, called the
margins. One margin passes through the closest positive point and
the other margin passes through the closest negative point.
o The distance between the margin and the hyperplane is called the marginal
distance.
o These margin lines act as a cushion: if a test point falls between the
hyperplane and a margin, we can still easily distinguish its class.
o The points closest to the hyperplane are called the support vector
points.
o The basic intuition to develop here is that the farther the SV
points are from the hyperplane, the higher the probability of correctly
classifying the points into their respective regions or classes.
o SV points are very critical in determining the hyperplane because if
the position of these vectors changes, the hyperplane's position is
altered. Technically this hyperplane can also be called a
margin-maximizing hyperplane.
o All these techniques apply to linearly separable points.
o If we try to separate non-linearly separable points with a linear
hyperplane, the accuracy can drop below 50 percent.
o The closest points, which the margins pass through, are called support
vectors.
o SVM uses the kernel trick for non-linearly separable points (see the
sketch below).
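A minimal sketch of the kernel trick with scikit-learn, comparing a linear SVM with an RBF-kernel SVM on toy data that is not linearly separable (the make_circles dataset and its parameters are assumptions for illustration):

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)  # non-linear points
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)   # linear hyperplane only
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)         # kernel trick for non-linear points

print("linear:", linear_svm.score(X_test, y_test))
print("rbf:", rbf_svm.score(X_test, y_test))

On such data the RBF kernel typically separates the classes far better than the plain linear hyperplane.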
NAÏVE BAYES CLASSIFIER:
o A Naive Bayes classifier is a probabilistic machine learning model
that is used for classification tasks.
o Naïve Bayes is based on Bayes Theorem.
o Using Bayes theorem, we can find the probability of A happening,
given that B has occurred. Here, B is the evidence and A is the
hypothesis
o Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
TYPES OF NAÏVE BAYES ALGORITHM
GAUSSIAN NAÏVE BAYES
o It is used where the features are continuous in nature.
o We can apply Gaussian Naive Bayes to the iris dataset (see the sketch
after this list).
o It relies on the probability density function (PDF) of a normal
distribution, which is described by its mean, variance, and standard
deviation.
BERNOULLI NAÏVE BAYES
o If the features are binary in nature, then we can apply Bernoulli Naive
Bayes.
MULTINOMIAL NAÏVE BAYES
o If the features are counts of occurrences (e.g. word counts), then you should
pick Multinomial Naive Bayes.
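A minimal sketch of Gaussian Naive Bayes on the iris dataset with scikit-learn, as mentioned in the Gaussian bullet above (the train/test split settings are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)                  # continuous features
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gnb = GaussianNB().fit(X_train, y_train)
print(gnb.score(X_test, y_test))                   # accuracy on the held-out split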
What is Exploratory Data Analysis?
Exploratory Data Analysis (EDA) is an approach to analyze the data
using visual techniques. It is used to discover trends, patterns, or to
check assumptions with the help of statistical summary and graphical
representations
We can analyze the data through:
o df.shape
o df.describe()
o df.info()
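A quick sketch of these EDA calls with pandas; "your_data.csv" is a placeholder for your own file:

import pandas as pd

df = pd.read_csv("your_data.csv")
print(df.shape)        # (number of rows, number of columns)
print(df.describe())   # statistical summary of the numeric columns
df.info()              # column names, dtypes and non-null counts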
HANDLING MISSING VALUES
# drop the data points: since such data points are very few, we can drop them
# remove the column: but if the significance of the attribute is large, we can't drop it
# replace null values with some other value, like the mean
Now let's check if there are any missing values in our dataset or not.
We can see that every column has a different number of missing
values. For example, Gender has 145 missing values and Salary has 0. For
handling these missing values there can be several options, such as
dropping the rows containing NaN or replacing NaN with the mean,
median, mode, or some other value.
If the column is categorical, then we can take the mode of the values.
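A minimal sketch of these options, assuming a DataFrame df with a numeric "Salary" column and a categorical "Gender" column (both column names are assumptions for illustration):

print(df.isnull().sum())                                     # missing values per column

df_dropped = df.dropna()                                     # option 1: drop rows containing NaN
df["Salary"] = df["Salary"].fillna(df["Salary"].mean())      # numeric column: fill with the mean
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])   # categorical column: fill with the mode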
Data visualization
Data Visualization is the process of analyzing data in the form of graphs or
maps, making it a lot easier to understand the trends or patterns in the data.
There are various types of visualizations –
Univariate analysis: This type of data consists of only one variable. The
analysis of univariate data is thus the simplest form of analysis since the
information deals with only one quantity that changes. It does not deal with
causes or relationships and the main purpose of the analysis is to describe
the data and find patterns that exist within it.
Bi-Variate analysis: This type of data involves two different variables. The
analysis of this type of data deals with causes and relationships and the
analysis is done to find out the relationship among the two variables.
Multi-Variate analysis: When the data involves three or more variables, it is
categorized under multivariate.
Histogram
It can be used for both uni and bivariate analysis.
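As a quick sketch, a univariate histogram can be drawn with matplotlib; the DataFrame df and the "Age" column are assumptions for illustration:

import matplotlib.pyplot as plt

plt.hist(df["Age"], bins=20)   # distribution of a single variable
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()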
PERCENTILE:
percentiles are the values below which a certain percentage of the data in a data
set is found. If you want to know where you stand compared to the rest of the
crowd, you need a statistic that reports relative standing, and that statistic is
called a percentile.
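A small sketch of computing percentiles with NumPy on made-up scores:

import numpy as np

scores = np.array([40, 55, 60, 65, 70, 75, 80, 85, 90, 95])
print(np.percentile(scores, 25))   # 25th percentile (Q1)
print(np.percentile(scores, 50))   # 50th percentile (the median)
print(np.percentile(scores, 90))   # value below which 90% of the data falls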
Removing Outliers
To remove an outlier, one must follow the same process as removing any other
entry from the dataset, using its exact position in the dataset, because all of the
above methods of detecting outliers end up producing a list of the data
items that satisfy the outlier definition according to the method used.
Example: We will detect the outliers using IQR and then we will remove them.
We will also draw the boxplot to see if the outliers are removed or not.
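A minimal sketch of this IQR-based detection and removal, assuming a DataFrame df with a numeric "Salary" column (the column name is an assumption for illustration):

import matplotlib.pyplot as plt

q1 = df["Salary"].quantile(0.25)
q3 = df["Salary"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr                      # IQR rule for outlier limits

df_clean = df[(df["Salary"] >= lower) & (df["Salary"] <= upper)]   # drop rows outside the limits

plt.boxplot(df_clean["Salary"])                                    # check that the outliers are removed
plt.show()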
BIAS:
Bias is the gap between actual value and the predicted value.
Low bias means the gap between the predicted and actual values is small.
Bias corresponds to the error on the training data.
VARIANCE:
How scattered the predicted values are.
Low variance means the predicted values are less scattered.
Variance corresponds to the error on the testing data.
UNDERFITTING:
When a model is trained on the training dataset and the error is very high, this is
called underfitting. The accuracy is quite low for the training data and it is quite
low for the testing data as well.
If we take the example of linear regression, the points are scattered far from the
fitted line.
High bias, low variance
Techniques to reduce underfitting:
Increase model complexity
Increase the number of features, performing feature engineering
Remove noise from the data.
Increase the number of epochs or increase the duration of training
to get better results.
OVERFITTING:
When a model is trained on the training dataset and the error is (almost) zero,
this is called overfitting. The accuracy is quite high for the
training data but quite low for the testing data.
If we take the example of 4th-degree polynomial regression, the points
lie exactly on the fitted curve.
Low bias, high variance.
There are some ways by which we can reduce the occurrence of
overfitting in our model (a cross-validation sketch follows this list):
Cross-Validation
Training with more data
Removing features
Early stopping the training
Regularization
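As a rough sketch of the cross-validation idea from the list above (the dataset and model below are assumptions for illustration), a large gap between training accuracy and cross-validated accuracy is a warning sign of overfitting:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

scores = cross_val_score(tree, X, y, cv=5)   # 5-fold cross-validation
print(scores.mean(), scores.std())           # average held-out accuracy and its spread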
PERFECT FIT:
If we use a polynomial of 2nd degree, the model fits well.
Low bias, low variance.
BIAS VARIANCE TRADE OFF:
To understand the bias-variance trade-off, we first need to understand what bias
and variance are.
We plot the relation between prediction error and model complexity.
At low model complexity the prediction error is high for both training and
testing data; this is low variance and high bias, called underfitting. As we
increase the model complexity, the prediction error of the training dataset keeps
decreasing, while the prediction error of the test data first goes down and then
increases again; this is low bias and high variance, i.e. overfitting.
The sweet spot in between, which gives low error on the training as well as the
testing data, is referred to as the best point to choose for training the
algorithm.
LINEAR V/S NON-LINEAR:
How would you tell if a given dataset is linear or non-linear in nature? Of
course, the selection of the models to be utilized will depend on it.
So the idea is to apply simple linear regression to the dataset and then to
check the least-squares error. If the least-squares error is low (the fit is
accurate), it implies the dataset is linear in nature; otherwise the dataset is non-linear.
DIFFERENCE BETWEEN FEATURE SELECTION AND DIMENSION REDUCTION:
While both methods are used for reducing the number of features in a dataset, there is
an important difference. Feature selection is simply selecting and excluding given
features without changing them. Dimensionality reduction transforms features into a
lower dimension that conveys similar information more concisely.
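A small sketch contrasting the two with scikit-learn (the iris dataset and the choice of 2 features/components are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)  # feature selection: keeps 2 original columns
X_reduced = PCA(n_components=2).fit_transform(X)              # dimensionality reduction: 2 new combined components
print(X_selected.shape, X_reduced.shape)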
What are the differences between MSE and
RMSE
MSE (Mean Squared Error) represents the difference
between the original and predicted values, obtained by
averaging the squared differences over the data
set. The lower the Mean Squared Error, the closer the fit is
to the data set.
RMSE: A metric that tells us the square root of the average
squared difference between the predicted values and the actual
values in a dataset
RMSE = √( Σ(ŷᵢ − yᵢ)² / n )
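A minimal sketch of computing both metrics on made-up actual/predicted values:

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.0])

mse = mean_squared_error(y_true, y_pred)   # average of the squared differences
rmse = np.sqrt(mse)                        # square root of the MSE
print(mse, rmse)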
what is the difference between correlation
and covariance
Covariance and correlation are two related terms that are both used in statistics and
regression analysis. Covariance shows you how two variables vary together, whereas
correlation shows you how strongly the two variables are related.
Correlation is bounded between -1 and +1.
Covariance is unbounded, ranging from -∞ to +∞.
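A quick sketch with NumPy on made-up variables:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 6])

print(np.cov(x, y)[0, 1])       # covariance: unbounded, depends on the units of x and y
print(np.corrcoef(x, y)[0, 1])  # correlation: always between -1 and +1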
PERFORMANCE METRICS:
Consider a classifier that reports 90% accuracy on a set of ten test images. As you can guess, this
brilliant performance is an illusion. It doesn't come from a smart classifier; it comes
from the fact that only one image in the test data is a platypus. The classifier always
denies that an image represents a platypus, so it gets it right in the other 9 cases.
Precision
Here is how precision works. Take all the positive results from the classifier. In
our example, those would be all the images that the system classifies as platypuses.
How many of those are actually platypuses? That’s the classifier’s precision.
As a concrete example, imagine a platypus classifier that’s fussy and particular.
This classifier won’t say that an image is a platypus unless it’s pretty darn sure that
it’s looking at a platypus. Let’s say that for every 100 images it identifies as
platypuses, 98 are indeed platypuses. That’s 98% precision. On the other hand,
the classifier's uncompromising attitude might result in a few false negatives.
That's when the classifier turns down a perfectly good platypus.
RECALL
The counterpart to precision is recall, and we can wrap it up like this: take
all the platypuses in the data. How many of them does the system classify
correctly? Like precision, recall is often expressed as a percentage. If a system
has 97% recall, that means it recognizes 97 out of every 100 platypuses.
As a rule of thumb, I should trust a system with high precision
when it says "yes". Conversely, I should trust a system with high recall when
it says "no".
On the other hand, in its eagerness to catch 'em all, a high-recall system might also
catch a few false positives.
Picking the Right Metric
Imagine that you’re comparing two systems: one has high
precision, the other has high recall. Which of the two is
better?
You can replace that question with a simpler one: when the
system makes a mistake, which would you rather get–a false
positive, or a false negative? If you want to minimize false
positives, prefer precision. To minimize false negatives,
prefer recall.
Here comes a concrete example: a machine learning-powered
fire alarm. In that
case, a false positive means that the alarm rings even though
no fire is going on.
By contrast, a false negative means that the system fails to
recognize an ongoing fire.
In this case, a false positive isn’t a big deal, but a false
negative is. Most people won’t mind the occasional
unnecessary walk outside, but nobody wants to linger in a
burning building. For this system, we should focus on recall,
even at the expense of some precision. If the system has high
recall, then we can trust it when it says: “No, there’s no fire
going on.”
F1:
The F1 score is the harmonic mean of precision and recall.
The closer the score is to 1, the better the model.
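A small sketch of these three metrics with scikit-learn on made-up binary labels (1 = platypus, 0 = not a platypus):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(precision_score(y_true, y_pred))   # of the predicted positives, how many are correct
print(recall_score(y_true, y_pred))      # of the actual positives, how many are found
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall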
ROC CURVE:
It is applied to binary classification problems.
If we are using logistic regression, we need to decide a threshold, which we
usually take as 0.5. In many use cases this threshold plays an important role.
In some situations we need a higher true positive rate, and in others we need a lower
false positive rate, so a domain expert will guide you about that.
The domain expert will advise you after seeing how your model is working
in terms of the graph.
The output values of the use case are 0 and 1, and the model predicts values in
the range 0 to 1.
We start with threshold values like 0, 0.2, 0.4, 0.6, 0.8.
After this, we calculate the true positive rate and false positive rate and plot a
point for each threshold (0, 0.2, 0.4, etc.).
Now join the points with a line.
If the domain expert says that we need a higher true positive rate, then we choose
a threshold that gives more true positives and fewer false positives.
If they say they need more true positives and don't care about the false positive
rate, we choose the threshold with the highest true positive rate.
TRUE POSITIVE RATE AND FALSE POSITIVE RATE:
The true positive rate (TPR, also called sensitivity) is calculated as TP / (TP + FN). TPR is the
probability that an actual positive will test positive. The true negative rate (also called
specificity) is the probability that an actual negative will test negative. It is calculated as
TN / (TN + FP).
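A minimal sketch of computing TPR/FPR across thresholds and the AUC with scikit-learn; the labels and probability scores below are made-up model outputs for illustration:

from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3, 0.75, 0.5]

fpr, tpr, thresholds = roc_curve(y_true, y_scores)   # TPR and FPR at each candidate threshold
print(list(zip(thresholds, tpr, fpr)))
print(roc_auc_score(y_true, y_scores))               # area under the ROC curve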
CONVOLUTIONAL NEURAL NETWORK:
Artificial neural networks (ANNs) are a core element of deep
learning algorithms
The CNN is another type of neural network that can uncover key
information in both time series and image data. For this reason, it
is highly valuable for image-related tasks, such as image
recognition, object classification and pattern recognition.
To identify patterns within an image, a CNN leverages principles
from linear algebra, such as matrix multiplication
A deep learning CNN consists of three layers: a convolutional
layer, a pooling layer and a fully connected (FC) layer. The
convolutional layer is the first layer while the FC layer is the last.
The first two, convolution and pooling layers, perform
feature extraction, whereas the third, a fully connected
layer, maps the extracted features into final output, such
as classification
The convolution layer plays a key role in a CNN; it is
composed of a stack of mathematical operations, such as
convolution, a specialized type of linear operation.
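A minimal Keras sketch of the three layer types described above (convolution, pooling, fully connected); the 28x28 grayscale input shape and 10 output classes are assumptions for illustration:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # convolutional layer: feature extraction
    layers.MaxPooling2D((2, 2)),                                            # pooling layer
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                                 # fully connected (FC) output layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()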
OPTIMIZATION OF MACHINE LEARNING MODEL
The concept of optimisation is integral to machine learning. Most
machine learning models use training data to learn the relationship
between input and output data. The models can then be used to make
predictions about trends or classify new input data. This training is a
process of optimisation, as each iteration aims to improve the model’s
accuracy and lower the margin of error.
Optimisation is measured through a loss or cost function, which is
typically a way of defining the difference between the predicted and
actual value of data. Machine learning models aim to minimise this loss
function, or lower the gap between prediction and reality of output data.
Iterative optimisation will mean that the machine learning model
becomes more accurate at predicting an outcome or classifying data.
Tuning or optimising hyperparameters allows the model to be adapted
to specific use cases and different datasets.
The ROC (Receiver Operating Characteristic) Curve tells us how well the
model can distinguish between two things (e.g. whether a patient has a disease or not).
The AUC is the area under the ROC curve. This score gives us a good idea of
how well the model performs. An AUC score of 0.5 means that the model is
performing poorly and its predictions are almost random.
Pandas:
Pandas is a Python library used primarily in data science to examine, sort, or modify data.
Pandas creates DataFrames, which have rows and columns.
DATAFRAME
Used for any kind of analysis or grouping.
Can be exported to Excel.
Example operations (see the sketch below):
o View only the column names
o View only the crim column's values
o View the first three rows
o Select the records which are at the river side
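A rough sketch of these operations, assuming a housing-style DataFrame df with a "crim" column and a "chas" column marking river-side records (the column names other than "crim" are assumptions for illustration):

import pandas as pd

print(df.columns)            # only view the column names
print(df["crim"])            # only the crim column's values
print(df.head(3))            # first three rows
print(df[df["chas"] == 1])   # records which are at the river side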
Data science
Data collection, data cleaning, data exploration, model building, explaining models, model deployment
are all things that data scientists do on the job.
SERIES: A single column, or data related to one variable. It corresponds to a 1-dimensional
array of a DataFrame.
DATAFRAME: It consists of multiple columns, i.e. it relates to multiple variables. It
corresponds to a 2-dimensional array.
We can create a Series:
import pandas as pd
s = pd.Series([1, 2, 3])
We can also create a Series using a NumPy array:
import numpy as np
arr = np.array([1, 2, 3])
s = pd.Series(arr)