DS - Sample Questions (Practical)

Uploaded by

almallugamer0420

Data Science Sample Interview Questions – Practical

🔹 1. Python for Data Science

1. Q: What is the purpose of using Pandas in Data Science?
A: Pandas is used for data manipulation and analysis. It provides data structures like
Series and DataFrame to handle structured data efficiently.
2. Q: How do you read a CSV file in Python using Pandas?
A: import pandas as pd; df = pd.read_csv('file.csv')
3. Q: What function would you use to check for null values in a DataFrame?
A: df.isnull().sum()
4. Q: How can you select a subset of columns in a DataFrame?
A: df[['column1', 'column2']]
5. Q: What is the difference between loc and iloc in Pandas?
A: loc selects by label (slices include both endpoints); iloc selects by integer
position (slices exclude the end).
6. Q: How do you handle missing data using Pandas?
A: Use df.fillna() to fill or df.dropna() to remove missing data.
7. Q: How do you group data in Pandas?
A: Use df.groupby('column').agg({'value': 'sum'}) for aggregation.
8. Q: What does the apply() function do in Pandas?
A: It applies a function along an axis of the DataFrame or Series.
9. Q: How do you merge two DataFrames?
A: pd.merge(df1, df2, on='key')
10. Q: How can you convert a column’s datatype in Pandas?
A: df['col'] = df['col'].astype(int)
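
The answers above can be combined into one short workflow. This is a minimal sketch, assuming a small made-up CSV read from an in-memory string instead of 'file.csv'; the column names (city, units, price) and the regions lookup table are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'file.csv'.
csv_text = """city,units,price
Pune,3,10.0
Pune,,12.0
Delhi,5,8.0
Delhi,2,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column.
null_counts = df.isnull().sum()

# Fill missing values: 0 for units, the column mean for price.
df["units"] = df["units"].fillna(0).astype(int)
df["price"] = df["price"].fillna(df["price"].mean())

# loc selects by label, iloc by integer position.
first_city_by_label = df.loc[0, "city"]
first_city_by_position = df.iloc[0, 0]

# Group and aggregate.
units_per_city = df.groupby("city").agg({"units": "sum"})

# Merge with a second (hypothetical) DataFrame on a shared key.
regions = pd.DataFrame({"city": ["Pune", "Delhi"], "region": ["West", "North"]})
merged = pd.merge(df, regions, on="city")
```

Filling with 0 or the mean is only one possible choice here; the right strategy depends on what a missing value means in the data.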

🔹 2. Statistics & Machine Learning Concepts

11. Q: What is the Central Limit Theorem?
A: It states that the sampling distribution of the sample mean approaches a normal
distribution as the sample size becomes large, regardless of the shape of the
population distribution (provided it has finite variance).
12. Q: Define p-value.
A: The p-value is the probability of observing results at least as extreme as those
actually observed, assuming the null hypothesis is true.
13. Q: What is the difference between Type I and Type II error?
A: Type I is a false positive (rejecting a true null), Type II is a false negative (failing to
reject a false null).
14. Q: What does R-squared represent?
A: It represents the proportion of variance in the dependent variable explained by
the independent variables.
15. Q: What is multicollinearity?
A: It refers to high correlation between independent variables in regression, which
can distort results.
16. Q: How do you check for normal distribution in data?
A: Use histograms, Q-Q plots, or statistical tests like Shapiro-Wilk.
17. Q: What is hypothesis testing?
A: It is a statistical method to test assumptions (hypotheses) about a parameter in a
population using sample data.
18. Q: What is the difference between supervised and unsupervised learning?
A: Supervised uses labeled data (e.g., regression), unsupervised uses unlabeled data
(e.g., clustering).
19. Q: What is overfitting in machine learning?
A: When a model performs well on training data but poorly on unseen data.
20. Q: How can you prevent overfitting?
A: Use techniques like cross-validation, regularization, pruning, and reducing model
complexity.
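
The Central Limit Theorem from question 11 can be checked with a small simulation. This is a rough sketch using only the standard library; the exponential population, the sample sizes, and the number of repetitions are illustrative choices, not part of the original answers.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n draws from a skewed (exponential) population with mean 1."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


# Sampling distribution of the mean for a small and a larger sample size.
means_small = [sample_mean(5) for _ in range(2000)]
means_large = [sample_mean(100) for _ in range(2000)]

# The spread of the sample mean shrinks roughly like 1 / sqrt(n).
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)
print(f"n=5:   mean={statistics.fmean(means_small):.3f}, sd={spread_small:.3f}")
print(f"n=100: mean={statistics.fmean(means_large):.3f}, sd={spread_large:.3f}")
```

Plotting a histogram of means_large should look close to a bell curve even though the underlying population is strongly skewed.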

🔹 3. Data Wrangling & Visualization

21. Q: What is data wrangling?
A: It’s the process of cleaning and transforming raw data into a usable format.
22. Q: Name a few Python libraries for data visualization.
A: Matplotlib, Seaborn, Plotly.
23. Q: How do you plot a histogram using Seaborn?
A: sns.histplot(data=df, x='column')
24. Q: What is the difference between a bar plot and a histogram?
A: Bar plots show categorical data; histograms show frequency of numerical data
bins.
25. Q: How to identify outliers using box plots?
A: Outliers are shown as points outside the whiskers of a box plot.
26. Q: How can you handle categorical variables?
A: Through one-hot encoding or label encoding.
27. Q: How do you deal with duplicates in a dataset?
A: Use df.drop_duplicates(), which returns a new DataFrame with duplicate rows removed.
28. Q: What does the describe() function do in Pandas?
A: Provides summary statistics (count, mean, std, min, quartiles, max) for numerical
columns by default.
29. Q: How to rename a column in Pandas?
A: df = df.rename(columns={'old': 'new'})
30. Q: What is feature scaling and why is it important?
A: It brings features to a similar scale to improve model performance; common
methods include MinMax and Standard Scaler.
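
Two of the ideas above, box-plot outliers and feature scaling, reduce to short formulas. The sketch below uses only the standard library and made-up numbers; it assumes the common 1.5 * IQR whisker rule, and the quartile method is one of several conventions, so other tools may place the cut-offs slightly differently.

```python
import statistics

values = [12.0, 14.0, 15.0, 15.0, 16.0, 18.0, 95.0]  # made-up data

# Box-plot style outlier rule: points beyond 1.5 * IQR from the quartiles.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

# Min-max scaling maps values into [0, 1].
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# Standardization gives mean 0 and standard deviation 1.
mean = statistics.fmean(values)
std = statistics.pstdev(values)
standardized = [(v - mean) / std for v in values]
```

Note how the single extreme value squeezes the other min-max scaled values toward 0, which is one reason outliers are usually handled before scaling.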

🔹 4. SQL & Databases

31. Q: What is a primary key?
A: A column (or combination of columns) that uniquely identifies each row in a table;
it cannot contain NULL values.
32. Q: How do you select all columns from a table?
A: SELECT * FROM table_name;
33. Q: How do you retrieve unique values from a column?
A: SELECT DISTINCT column_name FROM table_name;
34. Q: How to filter data using SQL?
A: Use the WHERE clause. Example: SELECT * FROM table_name WHERE age > 25;
35. Q: What is the difference between INNER JOIN and LEFT JOIN?
A: INNER JOIN returns only matching records; LEFT JOIN returns all records from the
left table and matching from the right.
36. Q: How do you find the average in SQL?
A: SELECT AVG(column_name) FROM table_name;
37. Q: How can you sort results in SQL?
A: Use the ORDER BY clause. Example: SELECT * FROM table_name ORDER BY age DESC;
38. Q: What does GROUP BY do?
A: It groups rows that have the same values in specified columns so that aggregate
functions (e.g., SUM, COUNT, AVG) can be applied per group.
39. Q: How do you limit results in SQL?
A: Use LIMIT n (MySQL/PostgreSQL) or TOP n (SQL Server).
40. Q: How to use subqueries in SQL?
A: By nesting a SELECT statement inside another. Example: SELECT name FROM
employees WHERE id IN (SELECT emp_id FROM sales);
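
The queries above can be tried without a database server by using Python's built-in sqlite3 module. This is a small sketch with an in-memory database; the employees and sales tables follow the subquery example, while the rows, the age column, and the amount column are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE sales (emp_id INTEGER, amount REAL);
    INSERT INTO employees VALUES (1, 'Asha', 31), (2, 'Ben', 24), (3, 'Chen', 42);
    INSERT INTO sales VALUES (1, 100.0), (1, 50.0), (3, 70.0);
""")

# WHERE filter combined with ORDER BY.
over_25 = conn.execute(
    "SELECT name FROM employees WHERE age > 25 ORDER BY age DESC"
).fetchall()

# LEFT JOIN keeps employees without sales; GROUP BY aggregates per employee.
totals = conn.execute("""
    SELECT e.name, COALESCE(SUM(s.amount), 0) AS total
    FROM employees AS e
    LEFT JOIN sales AS s ON s.emp_id = e.id
    GROUP BY e.id
    ORDER BY e.id
""").fetchall()

# Subquery: employees with at least one sale.
sellers = conn.execute(
    "SELECT name FROM employees WHERE id IN (SELECT emp_id FROM sales) ORDER BY id"
).fetchall()

conn.close()
```

With an INNER JOIN instead of the LEFT JOIN, the employee with no sales would drop out of totals entirely.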

🔹 5. Machine Learning Algorithms

41. Q: What is linear regression?
A: A method to model the relationship between a dependent variable and one or
more independent variables using a linear function (a straight line when there is a
single independent variable).
42. Q: What is logistic regression used for?
A: For classification problems, most commonly binary classification; it outputs
class probabilities.
43. Q: Name three distance-based algorithms.
A: KNN, K-means, Hierarchical clustering.
44. Q: How does KNN work?
A: It classifies data points based on the majority class among their ‘k’ nearest
neighbors.
45. Q: What is the cost function of logistic regression?
A: Log-loss or binary cross-entropy.
46. Q: What is regularization?
A: A technique to penalize large coefficients in regression to avoid overfitting (L1 and
L2).
47. Q: What is a decision tree?
A: A tree-based model that splits data into branches to make decisions.
48. Q: What are Random Forests?
A: An ensemble of decision trees used for classification and regression.
49. Q: What is Gradient Boosting?
A: A method that builds models sequentially to correct the previous model’s errors.
50. Q: What is cross-validation?
A: A method to evaluate model performance by dividing data into training and
validation sets multiple times.
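
The KNN description in question 44 translates almost directly into code. Below is a minimal from-scratch sketch using Euclidean distance and a simple majority vote; the function name knn_predict and the toy points are hypothetical, and a real project would more likely use a library implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Two well-separated toy clusters labelled "a" and "b".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(points, labels, (1.1, 1.0))
prediction_near_b = knn_predict(points, labels, (5.0, 4.8))
```

Because the vote depends on raw distances, features on very different scales should be scaled first, otherwise the largest-valued feature dominates.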

🔹 6. Deep Learning & NLP

51. Q: What is deep learning?
A: A subset of machine learning that uses neural networks with multiple layers to
learn from data.
52. Q: What is a neural network?
A: A model inspired by the human brain, consisting of layers of nodes (neurons) to
learn complex patterns.
53. Q: What is the activation function?
A: A function that introduces non-linearity to the model. Examples: ReLU, Sigmoid,
Tanh.
54. Q: What is the use of an optimizer in neural networks?
A: It updates the weights of the network to minimize loss. Common optimizers: SGD,
Adam.
55. Q: What is backpropagation?
A: A process to update weights in neural networks using gradients calculated from
the loss function.
56. Q: What is a convolutional neural network (CNN) used for?
A: For image processing tasks like classification and object detection.
57. Q: What is a recurrent neural network (RNN) used for?
A: For sequence data like text or time series.
58. Q: What is the vanishing gradient problem?
A: When gradients become very small during training, making it hard for the model
to learn.
59. Q: How can you mitigate vanishing gradients?
A: Use ReLU activation or architectures like LSTM/GRU.
60. Q: What is the difference between LSTM and GRU?
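A: Both are gated RNN variants. An LSTM has three gates and a separate cell state; a
GRU has two gates and no separate cell state, so it has fewer parameters and usually
trains faster. Which one performs better depends on the task.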
61. Q: What is word embedding?
A: A technique to represent text in vector space, e.g., Word2Vec, GloVe.
62. Q: What does TF-IDF stand for?
A: Term Frequency-Inverse Document Frequency; it measures word importance in
documents.
63. Q: What is tokenization in NLP?
A: The process of splitting text into words or sentences (tokens).
64. Q: What is lemmatization?
A: Reducing words to their base or dictionary form.
65. Q: What is the Bag of Words model?
A: A representation of text that counts word frequency while ignoring grammar and
word order.
66. Q: What is a stop word?
A: Common words (like 'the', 'is') removed during preprocessing to reduce noise.
67. Q: What is named entity recognition (NER)?
A: The task of identifying entities like people, organizations, and locations in text.
68. Q: What is a language model?
A: A model that learns to predict the probability of a sequence of words.
69. Q: How is NLP used in chatbots?
A: To process and understand user input using techniques like intent recognition and
response generation.
70. Q: What is sentiment analysis?
A: A technique to determine the sentiment (positive/negative/neutral) of text data.
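
Tokenization, stop-word removal, Bag of Words, and TF-IDF fit together as a small preprocessing chain. The following is a simplified sketch in plain Python; the stop-word list and documents are made up, and the TF-IDF formula shown is just one common variant (libraries often add smoothing and normalization).

```python
import math
from collections import Counter

STOP_WORDS = {"the", "is", "a", "on"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


docs = ["The cat sat on the mat", "The dog sat on the log", "The cat is a cat"]
tokenized = [tokenize(d) for d in docs]

# Bag of Words: word counts per document, ignoring order.
bags = [Counter(tokens) for tokens in tokenized]


def tf_idf(term, doc_index):
    """One common variant: (count / doc length) * log(N / document frequency)."""
    tf = bags[doc_index][term] / len(tokenized[doc_index])
    df = sum(1 for bag in bags if term in bag)
    return tf * math.log(len(docs) / df) if df else 0.0
```

For the first document, tf_idf("mat", 0) is larger than tf_idf("sat", 0) because "mat" appears in only one document while "sat" appears in two.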

🔹 7. Projects & Business Case Studies

71. Q: Why are real-world projects important in data science?
A: They demonstrate the application of concepts to solve practical business
problems.
72. Q: What’s a good approach to solving a data science case study?
A: Understand the problem, explore the data, preprocess, model, and interpret
results.
73. Q: What kind of data is used in a customer churn project?
A: Customer demographics, transaction history, usage behavior, and service details.
74. Q: How would you approach a fraud detection problem?
A: Use supervised models on labeled data or anomaly detection on unlabeled data.
75. Q: In a sales forecasting project, what algorithms might you use?
A: Time series models like ARIMA, Prophet, or LSTM.
76. Q: What is EDA and why is it important?
A: Exploratory Data Analysis; it helps understand data distribution, patterns, and
anomalies.
77. Q: How can you present your data science project?
A: Use dashboards, reports, and presentations with visuals and key metrics.
78. Q: What is A/B testing used for?
A: To compare two versions of a product or model to determine which performs
better.
79. Q: How do you measure model performance for classification tasks?
A: Metrics like accuracy, precision, recall, F1-score, ROC-AUC.
80. Q: What is a confusion matrix?
A: A table that shows true vs predicted classifications to evaluate model
performance.
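
For the A/B testing question, one common way to compare two conversion rates is a two-proportion z-test. This is a sketch under that assumption, using only the standard library; the function name and the visitor and conversion counts are invented, and other tests may suit other metrics better.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 120 of 1000 visitors convert on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A small p-value suggests the observed difference would be unlikely if both versions truly performed the same; the significance threshold should be chosen before running the test.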

🔹 8. Tools & Deployment

81. Q: What is Jupyter Notebook used for?
A: An interactive environment for writing and running code, especially in data
science.
82. Q: What is Anaconda?
A: A distribution of Python for scientific computing, including tools like Jupyter and
libraries like Pandas.
83. Q: What is Git used for?
A: Version control – tracking changes in code and collaboration.
84. Q: What is the purpose of Docker in Data Science?
A: To package projects with dependencies into containers for easy deployment.
85. Q: How does Flask help in ML deployment?
A: It’s a micro web framework for building REST APIs to serve machine learning
models.
86. Q: What is Streamlit?
A: A Python library to build interactive web apps for data science projects easily.
87. Q: What is the difference between REST API and a web app?
A: A REST API exposes data and functionality to other programs over HTTP; a web app
provides a user interface for people and often calls such APIs behind the scenes.
88. Q: What is MLOps?
A: A practice to streamline and automate the lifecycle of ML models, including
development, deployment, and monitoring.
89. Q: What are pipelines in ML?
A: A way to automate a sequence of data processing and modeling steps.
90. Q: What is a model registry?
A: A central place to track, version, and manage machine learning models.
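
The pipeline idea from question 89 can be illustrated without any ML library. The sketch below chains one preprocessing step and a toy model; all class names (Standardizer, ThresholdModel, Pipeline) are hypothetical stand-ins written for this example, not the API of any particular framework.

```python
import statistics


class Standardizer:
    """Preprocessing step: learn mean/std on fit, rescale on transform."""

    def fit(self, xs):
        self.mean = statistics.fmean(xs)
        self.std = statistics.pstdev(xs) or 1.0  # avoid division by zero
        return self

    def transform(self, xs):
        return [(x - self.mean) / self.std for x in xs]


class ThresholdModel:
    """Toy model: predict 1 when the (scaled) input is above 0."""

    def predict(self, xs):
        return [1 if x > 0 else 0 for x in xs]


class Pipeline:
    """Runs every preprocessing step in order, then the final model."""

    def __init__(self, steps, model):
        self.steps, self.model = steps, model

    def fit(self, xs):
        for step in self.steps:
            xs = step.fit(xs).transform(xs)
        return self

    def predict(self, xs):
        for step in self.steps:
            xs = step.transform(xs)
        return self.model.predict(xs)


pipeline = Pipeline([Standardizer()], ThresholdModel()).fit([10.0, 20.0, 30.0])
predictions = pipeline.predict([5.0, 25.0])
```

The point of the design is that the statistics learned during fit are reused unchanged at prediction time, so new data is processed exactly like the training data.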

🔹 9. Evaluation & Best Practices

91. Q: How do you handle class imbalance?
A: Use techniques like SMOTE, class weighting, or oversampling/undersampling.
92. Q: What is precision and recall?
A: Precision = TP / (TP + FP), Recall = TP / (TP + FN)
93. Q: What is F1-score?
A: Harmonic mean of precision and recall; useful in imbalanced datasets.
94. Q: What is ROC-AUC?
A: Measures the area under the ROC curve, indicating model performance at various
thresholds.
95. Q: What’s the difference between bagging and boosting?
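A: Bagging trains models independently on bootstrap samples and mainly reduces
variance (e.g., Random Forest); boosting trains models sequentially and mainly
reduces bias (e.g., XGBoost).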
96. Q: What are hyperparameters?
A: Parameters set before training a model (e.g., learning rate, depth).
97. Q: How do you tune hyperparameters?
A: Using Grid Search or Random Search with cross-validation.
98. Q: What is feature engineering?
A: Creating new input features from existing ones to improve model performance.
99. Q: What is dimensionality reduction?
A: Reducing the number of features using techniques like PCA or t-SNE (t-SNE is used
mainly for visualization).
100. Q: What are some best practices in data science projects?
A: Clear problem definition, clean data, reproducible code, proper documentation,
and model monitoring.
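
The precision, recall, and F1 formulas from questions 92 and 93 can be worked through on a small example. This is a plain-Python sketch with made-up binary labels, where 1 is treated as the positive class; the helper name confusion_counts is hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Made-up ground truth and predictions for ten samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / len(y_true)
```

Here TP = 3, FP = 1, FN = 1, and TN = 5, so precision, recall, and F1 all equal 0.75 while accuracy is 0.8.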
