DATA ANALYST INTERVIEW QUESTIONS (Top 50)
1. General & Process
1. What is a Data Analyst?
A professional who collects, cleans, analyzes, and visualizes data to generate actionable insights.
2. Steps in Data Analysis?
Define objectives -> collect data -> clean/wrangle -> analyze -> visualize -> interpret.
3. Data Cleaning Importance?
Ensures accuracy by removing duplicates, fixing formats, handling outliers/missing values.
4. Structured vs Unstructured Data?
Structured: rows/columns (SQL, Excel).
Unstructured: free format (text, images).
5. What is Data Profiling?
Assess quality: distribution, uniqueness, NULLs, consistency.
6. What is a Data Pipeline?
End-to-end data flow: capture -> process -> load for analysis.
2. SQL & Database
7. Define JOIN types:
INNER, LEFT, RIGHT, FULL/FULL OUTER, CROSS JOINS.
8. WHERE vs HAVING?
WHERE: filters before grouping.
HAVING: filters after GROUP BY.
9. Find duplicates in SQL?
SELECT col, COUNT(*) FROM tbl GROUP BY col HAVING COUNT(*)>1
10. LEFT JOIN use case:
All from left table + matching from right.
11. EXISTS vs NOT EXISTS:
EXISTS: checks existence in subquery; NOT EXISTS: opposite.
12. How to optimize slow queries?
Use indexes, avoid SELECT *, analyze execution plans, write efficient joins.
13. What are CTEs and views?
CTE: temporary named result.
View: stored query.
14. Window functions examples:
ROW_NUMBER(), RANK(), SUM(...) OVER (...).
15. Union vs Union ALL:
UNION removes duplicates; UNION ALL keeps all rows.
16. Convert DATETIME -> DATE:
Use CAST(column AS DATE) or DATE(column) based on DB.
17. Subquery vs JOIN:
Subquery: nested SELECT. JOIN: merges tables.
18. Aggregate functions:
COUNT(), SUM(), AVG(), MAX(), MIN()
19. GROUP BY example:
SELECT dept, AVG(sal) FROM emp GROUP BY dept HAVING AVG(sal)>X
20. TOP vs LIMIT/OFFSET:
Use TOP (SQL Server), LIMIT (MySQL), OFFSET-FETCH (ANSI).
3. Data Cleaning & Stats
21. Handling missing data?
Drop rows, fill (mean/median/mode), regression, KNN, multiple imputation.
22. Detect outliers?
IQR method (1.5xIQR rule), standard deviation (+/-3 sigma), box plots.
23. Normalization vs Standardization?
Normalization: scale [0,1]. Standardization: mean=0, sd=1.
24. Mean vs Median vs Mode?
Mean: average; median: middle; mode: most frequent.
25. Descriptive vs Inferential Stats?
Descriptive: summarize data. Inferential: draw conclusions about population.
26. Correlation vs Causation?
Correlation is not causation; avoid incorrect inference.
27. Hypothesis testing steps?
Define H0 & H1, choose test (t/z/chi-square), compute p-value, accept/reject.
28. Type I / II errors?
Type I: false positive; Type II: false negative.
29. When to use t-test vs z-test?
t-test: n<30 or unknown sigma. z-test: n>=30, known sigma.
30. Explain p-value:
Probability of observing results under the null hypothesis.
4. Data Analysis & Modeling
31. What is EDA?
Visual + statistical exploration: distributions, correlations, outliers.
32. Time-series analysis?
Trend, seasonality, forecasting using ARIMA, exponential smoothing.
33. A/B testing?
Compare two versions, establish hypothesis, test statistical significance.
34. Cross-validation importance?
Avoid overfitting; K-fold cross-validation ensures reliable model performance.
35. Regression metrics (MAE, MSE, R-squared)?
MAE: avg abs error; MSE: squared error; R-squared: variance explained.
36. Overfitting vs Underfitting?
Overfitting: model too complex; Underfitting: too simple.
37. Clustering algorithms:
K-Means, Hierarchical, DBSCAN.
38. Bootstrapping meaning?
Resampling technique to estimate accuracy without assuming distributions.
39. Random Forest vs Decision Tree:
RF: ensemble of trees -> better accuracy, less overfit; DT: simple, interpretable.
40. When to use KNN imputation?
When missing values relate to similar records.
5. Visualization & Tools
41. Popular visualization tools?
Tableau, Power BI, Excel, Python's Matplotlib/Seaborn.
42. Pivot table use-case?
Summarize data quickly via drag-and-drop metrics and dimensions.
43. Bar vs Line charts?
Bar: categorical comparison. Line: trends over time.
44. Good dashboard attributes?
Clear KPIs, filters, minimal clutter, story-focused visuals.
45. Calculated fields & parameters in Tableau/Power BI?
Use for dynamic metrics, interactivity.
6. Behavioral & Scenario
46. STAR technique?
Structure answers: Situation -> Task -> Action -> Result.
47. Tell a data story?
Present context, findings, impact using visuals.
48. Describe solving a business problem.
Walk through data approaches and impact outcomes.
49. Explain to non-technical audience?
Use simple language, visuals, emphasize business outcomes.
50. Questions to ask interviewer?
Ask about typical day, team structure, tools, growth opportunities.