Basic Concepts of Machine Learning (ML) for Interviews
Definition of Machine Learning
Machine Learning is a subset of Artificial Intelligence (AI) that allows systems to learn from
data and improve performance without being explicitly programmed.
Interview Tip: If asked 'What is ML?', say:
ML is a way to create systems that automatically learn patterns from data and make
predictions or decisions without manual programming.
Types of Machine Learning
There are three main types:
1. Supervised Learning
- Data has input (features) and output (labels).
- Goal → Learn a mapping from input to output.
- Examples: Predicting house prices, spam detection.
- Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM,
Neural Networks.
2. Unsupervised Learning
- Data has only input, no labels.
- Goal → Find hidden patterns or groups in data.
- Examples: Customer segmentation, market basket analysis.
- Algorithms: K-Means, Hierarchical Clustering, PCA.
3. Reinforcement Learning
- Learning by trial and error with rewards and penalties.
- Examples: Self-driving cars, game-playing bots.
- Algorithms: Q-Learning, Deep Q-Networks (DQN).
Interview Tip: If asked 'difference between supervised and unsupervised', stress labeled vs.
unlabeled data; the sketch below shows both in code.
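To make the contrast concrete, here is a minimal sketch (assuming scikit-learn is installed;
the tiny age/salary numbers are invented purely for illustration): supervised learning fits a
mapping from labeled data, while unsupervised learning finds groups in unlabeled data.

    # Supervised: features X come with labels y, and the model learns X -> y.
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    X = [[25, 30], [35, 60], [45, 80], [50, 120]]   # e.g., age, salary (in thousands)
    y = [0, 0, 1, 1]                                 # e.g., did the customer buy?
    clf = LogisticRegression().fit(X, y)
    print(clf.predict([[40, 70]]))                   # predicted label for a new input

    # Unsupervised: only X is available; the algorithm looks for groups on its own.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)                                # cluster assignment for each row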
Important ML Terminologies
1. Feature (Input Variable) → An independent variable used for prediction (e.g., age, salary).
2. Label (Target Variable) → The output we want to predict (e.g., house price).
3. Model → The mathematical representation of patterns learned from data, used to make predictions.
4. Training Data → Data used to teach the model.
5. Testing Data → Data used to evaluate model performance.
6. Overfitting → Model performs well on training data but poorly on unseen data (too
complex).
7. Underfitting → Model is too simple and fails to capture patterns.
8. Bias-Variance Tradeoff → Balance between a model that is too simple (high bias) and one that is too flexible (high variance).
Interview Tip: If asked about overfitting → 'Overfitting happens when the model memorizes
training data instead of learning general patterns.'
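A quick way to demonstrate this in an interview is a hedged sketch like the one below (it uses
scikit-learn's synthetic make_classification data): an unconstrained decision tree scores
near-perfectly on the training set it memorized, but noticeably worse on unseen test data.

    # Overfitting in one picture: train vs. test accuracy gap for a very deep tree.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)
    print("train:", deep.score(X_train, y_train))   # typically ~1.0 (memorized)
    print("test :", deep.score(X_test, y_test))     # noticeably lower -> overfitting

    # Limiting complexity (e.g., max_depth=3) usually narrows the gap.
    shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
    print("test (depth 3):", shallow.score(X_test, y_test))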
Steps in a Machine Learning Project
1. Data Collection → Gather raw data.
2. Data Preprocessing → Handle missing values, remove duplicates, normalize/standardize.
3. Feature Engineering → Select or create the most relevant features.
4. Model Selection → Choose suitable algorithms.
5. Training → Fit the model with training data.
6. Evaluation → Check accuracy, precision, recall, F1-score, RMSE, etc.
7. Deployment → Use the model in real-world applications.
8. Monitoring → Track model performance over time.
Interview Tip: If asked 'How do you handle missing values?' → 'By removing rows, imputing
mean/median, or using advanced imputation techniques.'
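A compressed sketch of steps 2, 5, and 6, including mean imputation for missing values
(assumes scikit-learn and pandas; the DataFrame and its column names are invented purely for
illustration):

    # Steps 2-6 in miniature: impute missing values, scale, train, evaluate.
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, f1_score

    df = pd.DataFrame({
        "age":    [25, 32, 47, np.nan, 52, 38],
        "salary": [30, 45, np.nan, 80, 95, 60],   # in thousands
        "bought": [0, 0, 1, 1, 1, 0],             # label
    })
    X, y = df[["age", "salary"]], df["bought"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=0, stratify=y)

    model = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),   # fill missing values with the mean
        ("scale", StandardScaler()),                  # standardize features
        ("clf", LogisticRegression()),                # fit the classifier
    ])
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print("accuracy:", accuracy_score(y_test, pred), "F1:", f1_score(y_test, pred))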
Evaluation Metrics
Classification Metrics:
- Accuracy
- Precision (how many predicted positives are correct)
- Recall (how many actual positives are detected)
- F1 Score (harmonic mean of precision and recall)
- ROC-AUC
Regression Metrics:
- MSE (Mean Squared Error)
- RMSE (Root Mean Squared Error)
- MAE (Mean Absolute Error)
- R² (Coefficient of Determination)
Interview Tip: If asked 'Why not only accuracy?' → 'Accuracy fails with imbalanced datasets,
so we prefer Precision, Recall, or F1-score.'
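A small sketch that makes the tip concrete (assumes scikit-learn; the label vectors below are
invented): on an imbalanced example, accuracy looks high while recall exposes the problem.

    # Classification metrics on an invented, imbalanced example.
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    y_true   = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]     # 2 actual positives out of 10
    y_pred   = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]     # model finds only 1 of them
    y_scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.9, 0.45]  # predicted probabilities

    print("accuracy :", accuracy_score(y_true, y_pred))    # 0.9 -- looks great
    print("precision:", precision_score(y_true, y_pred))   # 1.0 (predicted positives correct)
    print("recall   :", recall_score(y_true, y_pred))      # 0.5 (misses half the positives)
    print("f1       :", f1_score(y_true, y_pred))          # ~0.67
    print("roc_auc  :", roc_auc_score(y_true, y_scores))   # uses scores, not hard labels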
Common ML Algorithms
1. Linear Regression → Predict continuous values.
2. Logistic Regression → Binary classification (yes/no).
3. Decision Trees → Tree-like structure for classification/regression.
4. Random Forest → Ensemble of decision trees for better accuracy.
5. Support Vector Machines (SVM) → Finds the maximum-margin boundary between classes.
6. K-Nearest Neighbors (KNN) → Classifies a point by majority vote of its k nearest neighbors.
7. Naïve Bayes → Based on Bayes’ Theorem, good for text classification.
8. K-Means Clustering → Groups similar data points.
9. Neural Networks / Deep Learning → Complex models for images, NLP, etc.
Interview Tip: Be ready to explain one algorithm in detail (e.g., 'Logistic Regression uses a
sigmoid function to output probabilities').
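A hedged sketch of that sigmoid idea, using made-up weights: the model computes a weighted sum
z = w·x + b and squashes it through σ(z) = 1 / (1 + e^(−z)) to get a probability.

    # What "Logistic Regression uses a sigmoid" means, with toy numbers.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w = np.array([0.8, -0.3])         # hypothetical learned weights for two features
    b = 0.1                           # hypothetical learned bias

    x = np.array([2.0, 1.0])          # one input example
    z = np.dot(w, x) + b              # linear score: w.x + b
    p = sigmoid(z)                    # probability of the positive class
    print(p, "-> predict 1" if p >= 0.5 else "-> predict 0")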
Feature Engineering & Selection
Feature Scaling → Standardization (Z-score), Normalization (0–1 range).
Dimensionality Reduction → PCA (reduce the number of features while retaining most of the variance).
Encoding → One-Hot Encoding, Label Encoding for categorical data.
Interview Tip: If asked 'Why scaling?' → 'Because some algorithms like SVM and KNN are
distance-based and sensitive to feature magnitude.'
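A short sketch of scaling and encoding with scikit-learn (the salary and city values are
invented for illustration): standardize a numeric column and one-hot encode a categorical one.

    # Feature scaling and categorical encoding on toy columns.
    import numpy as np
    from sklearn.preprocessing import StandardScaler, OneHotEncoder

    salary = np.array([[30], [45], [60], [120]])                    # numeric (thousands)
    city = np.array([["Delhi"], ["Mumbai"], ["Delhi"], ["Pune"]])   # categorical

    scaled = StandardScaler().fit_transform(salary)                 # zero mean, unit variance
    onehot = OneHotEncoder().fit_transform(city).toarray()          # one column per city

    print(scaled.ravel())   # z-scores for the salary column
    print(onehot)           # Delhi -> [1,0,0], Mumbai -> [0,1,0], Pune -> [0,0,1]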
Real-World Applications of ML
Healthcare → Disease prediction, medical imaging.
Finance → Fraud detection, stock prediction.
Retail → Recommendation systems.
Autonomous Vehicles → Self-driving cars.
Natural Language Processing → Chatbots, translation.
Interview Tip: If asked 'Where have you applied ML?' → Give a small project example (even
a dataset like Titanic survival prediction).
Challenges in Machine Learning
1. Data Quality → Noisy or missing data.
2. Overfitting & Underfitting.
3. Imbalanced Datasets.
4. Interpretability of Complex Models.
5. Scalability for Big Data.
Interview Tip: If asked about challenges → 'The biggest challenge is getting quality data,
since ML models heavily depend on data.'