Data Scientist & Data Analyst Interview Cheat Sheet
1. Machine Learning (ML) Core Concepts
- Supervised vs Unsupervised Learning
- Overfitting vs Underfitting
- Bias-Variance Tradeoff
- Cross-validation techniques (K-Fold, Stratified)
- Classification vs Regression
- Feature Engineering, Feature Scaling (StandardScaler, MinMaxScaler)
- Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC
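As a quick refresher on how the evaluation metrics above relate to each other, here is a minimal sketch that derives them from confusion-matrix counts for a binary problem. The function name `classification_metrics` and the toy labels are illustrative assumptions, not part of any library.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, made up for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision asks "of the predicted positives, how many were right?", while recall asks "of the actual positives, how many were found?". F1 is their harmonic mean.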
2. Common ML Algorithms
- Linear/Logistic Regression, Decision Trees, Random Forests
- SVM, Naive Bayes, KNN, K-Means (clustering)
- PCA, t-SNE (Dimensionality Reduction)
- Gradient Descent (Batch, Stochastic, Mini-batch)
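The three gradient descent variants differ only in how many samples feed each update. The sketch below fits a one-feature linear model with mini-batches. The function name `minibatch_gd`, the hyperparameters, and the noise-free toy data are assumptions chosen for illustration.

```python
import random


def minibatch_gd(xs, ys, lr=0.05, epochs=500, batch_size=4, seed=0):
    """Fit y = w*x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new sample order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2 * error * xs[i]
                grad_b += 2 * error
            # Average the gradient over the batch, then step downhill.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


# Toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2 * x + 1 for x in xs]
w, b = minibatch_gd(xs, ys)
print(round(w, 3), round(b, 3))
```

Setting `batch_size=1` turns this into stochastic gradient descent, and `batch_size=len(xs)` into batch gradient descent.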
3. Deep Learning
- Neural Networks, Activation Functions (ReLU, Sigmoid)
- CNN vs RNN, LSTM
- Backpropagation, Vanishing Gradient
- Dropout, Batch Normalization
- Transfer Learning, Fine-tuning, Early Stopping
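One way to explain the vanishing gradient problem in an interview is through the chain rule: backpropagation multiplies local derivatives layer by layer, and the sigmoid derivative never exceeds 0.25. The sketch below is a deliberate simplification (it ignores weights and assumes the same pre-activation at every layer), and all function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z == 0


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


def chained_gradient(local_grad, z, depth):
    """Product of `depth` identical local derivatives (weights assumed to be 1)."""
    return local_grad(z) ** depth


# Best case for sigmoid (z = 0) still shrinks geometrically with depth.
sigmoid_chain = chained_gradient(sigmoid_grad, 0.0, 10)
# An active ReLU unit passes the gradient through unchanged.
relu_chain = chained_gradient(relu_grad, 1.0, 10)
print(sigmoid_chain, relu_chain)
```

This is one reason ReLU-style activations, and techniques such as batch normalization, are commonly preferred in deep networks.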
4. Python Libraries (ML & DL)
- scikit-learn: model building, training, evaluation
- pandas: data manipulation, EDA
- NumPy: numerical computations
- TensorFlow/Keras: neural network design & training
- PyTorch: neural network design & training with dynamic computation graphs (popular in research)
5. SQL (For Analysts and Data Scientists)
- SELECT, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
- JOINs: INNER, LEFT, RIGHT, FULL
- Subqueries, CTEs (WITH), Window Functions (ROW_NUMBER, RANK)
- Aggregations: COUNT, SUM, AVG, MIN, MAX
- NULL handling: IS NULL, COALESCE, IFNULL (dialect-specific, e.g. MySQL, SQLite)
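Several of the SQL topics above can be combined in a single query. The sketch below uses Python's built-in sqlite3 module with made-up `products` and `sales` tables (hypothetical schema and data) and assumes a SQLite build with window function support (3.25 or later). It shows a CTE, a LEFT JOIN, COALESCE for products with no sales, and RANK within each category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER);
INSERT INTO products VALUES
    (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'),
    (3, 'Desk', 'Furniture'),     (4, 'Chair', 'Furniture');
INSERT INTO sales (product_id, quantity) VALUES (1, 3), (1, 2), (2, 7), (3, 4);
""")

query = """
WITH totals AS (
    -- LEFT JOIN keeps products with no sales; COALESCE turns their NULL sum into 0.
    SELECT p.category, p.name, COALESCE(SUM(s.quantity), 0) AS units_sold
    FROM products AS p
    LEFT JOIN sales AS s ON s.product_id = p.product_id
    GROUP BY p.category, p.name
)
SELECT category, name, units_sold,
       RANK() OVER (PARTITION BY category ORDER BY units_sold DESC) AS category_rank
FROM totals
ORDER BY category, category_rank;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Filtering on `category_rank = 1` in an outer query would return the top seller per category, a common interview follow-up.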
6. Data Analyst Specific
- KPIs, Dashboards, Reporting
- Data Cleaning (handling missing values, duplicates)
- EDA (Histograms, Boxplots, Correlation Matrix)
- Visualization tools: Matplotlib, Seaborn, Power BI, Tableau
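For the data cleaning item above, a minimal sketch in plain Python: drop duplicate records by key, then fill missing numeric values with the median. The function name `clean_records`, the field names, and the choice of median imputation are assumptions for illustration. In practice pandas would be the usual tool for this.

```python
from statistics import median


def clean_records(records, key="user_id", numeric_field="age"):
    """Drop duplicate keys (keeping the first) and median-fill missing values."""
    seen = set()
    deduped = []
    for row in records:
        if row[key] in seen:
            continue  # later duplicates are discarded
        seen.add(row[key])
        deduped.append(dict(row))  # copy so the input is not mutated

    observed = [r[numeric_field] for r in deduped if r[numeric_field] is not None]
    fill_value = median(observed) if observed else None
    for r in deduped:
        if r[numeric_field] is None:
            r[numeric_field] = fill_value
    return deduped


# Toy records, made up for illustration.
raw = [
    {"user_id": 1, "age": 30},
    {"user_id": 2, "age": None},
    {"user_id": 1, "age": 30},
    {"user_id": 3, "age": 40},
]
cleaned = clean_records(raw)
print(cleaned)
```

The median is less sensitive to outliers than the mean, which is a reasonable default to mention when asked how to impute skewed data.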
7. Coding/Real-World Scenarios
- Writing Python functions to clean or aggregate data
- SQL queries to find top-selling products, churned users
- Case studies: A/B Testing, Retention Analysis, Customer Segmentation
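A/B testing case studies often come down to comparing two conversion rates. The sketch below implements a pooled two-proportion z-test using only the standard library. The function name and the conversion counts are made up for illustration, and the test assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control converts 100/1000, variant 130/1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(z, 2), round(p_value, 3))
```

A p-value below the chosen significance level (commonly 0.05) suggests the difference is unlikely to be due to chance alone. Practical significance and sample-size planning are separate questions worth raising in an interview.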