| Category | Sub-Category | Type of Question | Question | Level | |
|---|---|---|---|---|---|
| Data Science | Maths | Subjective | What do you understand by eigenvectors and eigenvalues? | L2 | |
| Data Science | Misc | Subjective | What made you decide you wanted to be a Data Scientist? | L2 | |
| Data Science | Misc | Subjective | What is your favorite and least favorite thing about being a Data Scientist? | L2 | |
| Data Science | Misc | Subjective | How do you approach a project? Walk me through your problem statement formulation, tools you use, etc. | L2 | |
| Data Science | Misc | Subjective | What are your career aspirations? | L1 | |
| Data Science | Misc | Subjective | Tell us about a project you loved and describe how you solved each of the phases. | L2 | |
| Data Science | Misc | Subjective | When deciding to accept or reject a data science project, what are some red flags or success criteria you look for? | L3 | |
| Data Science | Misc | Subjective | How would you handle an imbalanced dataset? | L2 | |
| Data Science | Misc | Subjective | How would you implement a recommendation system for our company’s users? | L3 | |
| Data Science | Model Evaluation | Subjective | What’s the difference between Type 1 and Type 2 errors? | L1 | |
| Data Science | Statistics | Subjective | What are collinearity and multicollinearity? | L2 | |
| Data Science | Statistics | Subjective | Explain prior probability, likelihood and marginal likelihood in the context of the Naive Bayes algorithm. | L1 | |
| Data Science | Statistics | Tech | Is it possible to capture the correlation between a continuous and a categorical variable? If yes, how? | L1 | |
| Data Science | Time Series | Subjective | What cross-validation technique would you use on a time series data set? Is it k-fold or LOOCV? | L2 | |
| Deep Learning | Misc | Subjective | What is deep learning, and how does it contrast with other machine learning algorithms? | L3 | |
| Pre-processing | Feature Engineering | Subjective | What do you mean by PCA? How does it work? | L1 | |
| Pre-processing | Feature Engineering | Subjective | What are the limitations of PCA? | L2 | |
| Machine Learning | Decision Trees | Subjective | How is a decision tree pruned? | L2 | |
| Machine Learning | Distance | Tech | In k-means or kNN, we use Euclidean distance to calculate the distance between nearest neighbors. Why not Manhattan distance? | L1 | |
| Machine Learning | Gradient descent | Subjective | How do you choose the learning rate? | L2 | Loss / Cost / Error |
| Machine Learning | Loss function | Subjective | What are loss functions in machine learning? How do they work? | L2 | MAE, MAPE, MSE, RMSE |
| Machine Learning | Misc | Subjective | When should you use classification over regression? | L1 | |
| Machine Learning | Misc | Subjective | What are ensemble learning techniques? | L1 | |
| Machine Learning | Misc | Subjective | How do you ensure you’re not overfitting with a model? | L2 | AIC, BIC, cross-entropy |
| Machine Learning | Misc | Subjective | Pick an algorithm. Write the pseudocode for a parallel implementation. | L3 | |
| Machine Learning | Misc | Subjective | What are the different types of Machine Learning? | L1 | |
| Machine Learning | Misc | Subjective | What are bagging and boosting in Machine Learning? | L2 | Hinge Loss - SVM |
| Machine Learning | Misc | Subjective | How is random forest different from the gradient boosting algorithm (GBM)? | L2 | |
| Machine Learning | Misc | Subjective | You’ve got a data set to work with that has p (no. of variables) > n (no. of observations). Why is OLS a bad option to work with? Which techniques would be best to use? Why? | L2 | |
| Machine Learning | Model Evaluation | Subjective | What cross-validation technique | L2 | |
| Machine Learning | Model Evaluation | Subjective | What’s the F1 score? How would you use it? | L1 | |
| Machine Learning | Model Evaluation | Subjective | What do you understand by Precision and Recall? | L1 | |
| Machine Learning | Model Evaluation | Subjective | Explain false negative, false positive, true negative and true positive with a simple example. | L1 | |
| Machine Learning | Model Evaluation | Subjective | What is a Confusion Matrix? | L1 | |
| Machine Learning | Model Evaluation | Tech | How is KNN different from K-means clustering? | L1 | |
| Machine Learning | Model Evaluation | Subjective | Is it better to have too many false positives or too many false negatives? Explain. | L1 | |
| Machine Learning | Model Evaluation | Subjective | What is the difference between Entropy and Information Gain? | L2 | |
| Machine Learning | Model Evaluation | Subjective | Your model is suffering from low bias and high variance. Which algorithm should you use to tackle it? Why? | L1 | |
| Machine Learning | Model Evaluation | Subjective | You’ve built a random forest model with 10000 trees. You were delighted to get a training error of 0.00, but the validation error is 34.23. What is going on? | L2 | |
| Machine Learning | Model Optimization | Subjective | Explain the difference between L1 and L2 regularization. | L2 | |
| Machine Learning | Model Optimization | Subjective | When does regularization become necessary in Machine Learning? | L2 | |
| Machine Learning | Model Selection | Tech | When is Ridge regression preferable to Lasso regression? | L2 | |
| Machine Learning | Neural Networks | Subjective | In which scenarios would you use a neural network? | L2 | |
| Machine Learning | PCA | Subjective | Can PCA + Linear Regression work? Why or why not? | L3 | |
| Model Evaluation | Model Evaluation | Subjective | How do you validate your model? | L2 | |
| Pre-processing | Feature Engineering | Subjective | How do you select important variables? Explain your methods. | L2 | |
| Pre-processing | Feature Extraction | Tech | You are given a data set. The data set contains many variables, some of which are highly correlated and you know about it. Your manager has asked you to run PCA. Would you remove correlated variables first? Why? | L2 | |
| Pre-processing | Missing values | Subjective | How do you impute missing values? | L1 | |
| Pre-processing | Outliers | Subjective | How do you deal with outliers? | L1 | |
| Pre-processing | Outliers | Subjective | How would you screen for outliers and what should you do if you find one? | L1 | |
| Python | Misc | Tech | Name a few libraries in Python used for Data Analysis and Scientific Computations. | L1 | |
| Python | Misc | Tech | How are NumPy and SciPy related? | L1 | |
Level: L1 = Beginner, L2 = Intermediate, L3 = Expert
Type: Tech, Subjective