Amazon Data Analyst Interview Guide
✅ Top Skills & Tools to Focus On
1. SQL (Most Important)
🔹 Redshift SQL (PostgreSQL-based): Amazon uses Redshift, so understand large-scale
querying, performance, and syntax quirks.
🔹 Must master:
o JOINs (INNER, LEFT, etc.)
o GROUP BY, HAVING, WHERE
o Window functions (ROW_NUMBER(), RANK(), LEAD/LAG, etc.)
o Subqueries and CTEs
o Performance tuning for large datasets
🧠 Why it's important: Core skill tested at every stage (assessment, virtual loop, Bar Raiser).
2. Statistical Reasoning & A/B Testing
🔹 Understand:
o P-values, confidence intervals, effect size
o T-tests and multiple comparisons problem (e.g., Bonferroni correction)
o A/B test design, control vs test groups, network effects
🔹 Example Qs:
o How to assess if a 0.04 p-value is significant
o Debugging biased or skewed test results
🧠 Why it's important: Tests your judgment in drawing conclusions from data, which is vital at
Amazon.
3. Business & Analytical Thinking
🔹 Ability to:
o Define & analyze KPIs
o Translate insights into actionable business decisions
o Build dashboards and reports (Tableau or Excel)
🔹 Practice:
o Business case questions like “Why are Prime renewals dropping?”
o Analyzing customer, supply chain, or product metrics
🧠 Why it's important: Amazon expects customer-focused insights, not just technical correctness.
4. Behavioral & Leadership Principle Alignment
🔹 Use the STAR method for storytelling.
🔹 Key Amazon Leadership Principles to highlight:
o Ownership
o Dive Deep
o Deliver Results
o Bias for Action
o Earn Trust
🔹 Sample behavioral Qs:
o "Tell me about a time you exceeded expectations"
o "How do you explain technical insights to non-technical people?"
🧠 Why it's important: Leadership alignment is a make-or-break factor—especially in the Bar Raiser
round.
5. Communication & Stakeholder Collaboration
🔹 Practice:
o Explaining SQL logic or test results to non-technical stakeholders
o Connecting data to business impact
o Presenting clearly using visualizations and summaries
🔹 Tools that help:
o Tableau for dashboards
o PowerPoint/Notion for summaries
🧠 Why it's important: You must influence decisions, not just share data.
6. Tools to Know
Tool Why It Matters
SQL (Redshift) Core querying tool; must master
Tool Why It Matters
Excel For quick analysis, trend finding
Tableau For stakeholder communication & dashboards
Python (Pandas/NumPy) Optional but helpful for data cleaning & analysis
Jupyter Notebooks For Python-based data storytelling
📋 Interview Rounds Overview
Stage Focus Areas
Recruiter Call Tech background, tools, stakeholder experience
Online Assessment SQL + data interpretation + logic puzzles
Virtual/On-site Loop SQL round, business case, behavioral (STAR), stakeholder comms
Bar Raiser Round Deep dives into decisions, ambiguity, ownership mindset
🧠 Preparation Strategy
✅ Daily Practice Goals:
🚀 1–2 SQL challenges (focus on JOINs + window functions)
📊 1 business case scenario weekly (interpret a dashboard or KPIs)
🧠 Behavioral story writing (2–3 STAR stories ready)
📘 Learn A/B testing concepts
🎯 Mock interviews for communication practice
📝 Amazon-Specific Focus Areas
Category What to Study / Practice
SQL Redshift-style large dataset queries, CTEs, ranking
Statistics Hypothesis testing, p-values, A/B test design
Business Sense KPI creation, metric analysis, decision-making using data
Communication Data storytelling, simplifying insights, dashboards
Leadership Fit STAR stories matching Leadership Principles like “Bias for Action”
💬 Final Tip: Think Like an Amazonian
Amazon loves people who “think like an owner”—always solving real problems, diving into the data,
and delivering measurable results.
✅ Highlight:
Projects where you took initiative
Automated something
Caught a data anomaly before it became a problem
Helped others understand insights
Important Terms
🧩 1. Data Fundamentals
Data
Dataset
Structured data
Unstructured data
Semi-structured data
Big Data
Data Pipeline
Data Source
Data Warehouse
Data Lake
Data Mart
ETL (Extract, Transform, Load)
ELT (Extract, Load, Transform)
Data Lifecycle
🧹 2. Data Cleaning & Wrangling
Missing values
Outliers
Data normalization
Data standardization
Data transformation
Null values
Duplicates
Data types (int, float, string, datetime, etc.)
Encoding (Label, One-hot)
Imputation
Feature scaling
📊 3. Data Analysis
Exploratory Data Analysis (EDA)
Descriptive statistics
Summary statistics
Correlation
Covariance
Grouping
Aggregation
Filtering
Sorting
Pivot tables
Cross-tabulation
Time series analysis
📈 4. Visualization
Bar chart
Line graph
Histogram
Box plot
Scatter plot
Pie chart
Heatmap
Treemap
Area chart
Dashboard
Slicers/Filters
Storytelling with data
💡 5. Business Intelligence (BI) Tools
Power BI
Tableau
Looker
Excel (Advanced features like PivotTable, VLOOKUP, XLOOKUP, etc.)
KPI (Key Performance Indicator)
Metric
Dashboard vs. Report
💻 6. SQL Terms
Tables
Columns / Rows
Primary Key / Foreign Key
Joins (INNER, LEFT, RIGHT, FULL)
Subquery
CTE (Common Table Expression)
Window functions (ROW_NUMBER, RANK, etc.)
Aggregate functions (SUM, COUNT, AVG, MIN, MAX)
GROUP BY / HAVING
WHERE / ORDER BY
Indexing
Normalization (1NF, 2NF, 3NF)
🐍 7. Python for Data Analysis
Pandas (DataFrame, Series)
NumPy
Matplotlib / Seaborn
Functions
List comprehension
Lambda functions
Map / Filter / Reduce
Exception handling
File handling (CSV, Excel, JSON)
Regex
JSON handling
🤖 8. Intro to Analytics & Modeling
Predictive analytics
Descriptive analytics
Prescriptive analytics
Regression
Classification
Clustering
Hypothesis testing (p-value, t-test, chi-square test)
Confidence interval
A/B testing
🧠 9. Soft Skills / Business Thinking
Problem-solving
Communication (verbal and visual)
Business acumen
Domain knowledge (Retail, HR, Finance, etc.)
Stakeholder management
Requirement gathering
Root cause analysis
Storytelling
🔐 10. Data Governance / Security
PII (Personally Identifiable Information)
Data privacy
GDPR
Data accuracy
Data lineage
Data integrity
⭐ Bonus Terms to Learn (Advanced/Optional)
API (for data extraction)
Web scraping
Cloud platforms (AWS, GCP, Azure)
DAX (in Power BI)
SQL optimization
Git/GitHub
Version control
Data annotation (for ML)
Data catalog
Techniques performed on Data
📊 A. Data Understanding & Preparation
1. Data Collection
2. Data Profiling
3. Data Cleaning
4. Handling Missing Values
5. Handling Outliers
6. Data Type Conversion
7. Duplicates Removal
8. Data Normalization / Standardization
9. Data Transformation
10. Feature Engineering
11. Feature Selection
12. Encoding Categorical Variables
13. Data Integration (Merging, Joining)
14. Data Reduction (PCA, etc.)
📈 B. Exploratory Data Analysis (EDA)
15. Univariate Analysis
16. Bivariate Analysis
17. Multivariate Analysis
18. Descriptive Statistics
19. Grouping and Aggregation
20. Correlation Analysis
21. Trend Analysis
22. Distribution Analysis
23. Time Series Decomposition (Trend, Seasonality)
24. Crosstab Analysis
📉 C. Data Visualization Techniques
25. Bar Chart / Column Chart
26. Pie Chart / Donut Chart
27. Histogram
28. Box Plot
29. Heatmap
30. Scatter Plot
31. Line Chart
32. Area Chart
33. Pair Plot
34. Treemap
35. Violin Plot
36. Funnel Chart
37. KPI Cards / Scorecards
38. Interactive Dashboards (e.g., Power BI, Tableau)
📚 D. Statistical Techniques
39. Hypothesis Testing
40. Z-test / T-test / Chi-square test / ANOVA
41. Confidence Intervals
42. Sampling Techniques
43. Central Tendency Measures (Mean, Median, Mode)
44. Dispersion Measures (Range, Variance, Std Dev)
45. Probability Distributions (Normal, Poisson, etc.)
🔍 E. Advanced Analytical Techniques
46. Time Series Analysis
47. Clustering (K-Means, Hierarchical)
48. Regression Analysis (Linear, Multiple)
49. Classification Techniques
50. Segmentation Analysis
51. A/B Testing
52. Principal Component Analysis (PCA)
53. Forecasting
54. Text Analysis / NLP
55. Association Rules (Market Basket)
💡 F. Business Analysis & Interpretation
56. KPI Measurement
57. Root Cause Analysis
58. Business Performance Analysis
59. Customer Cohort Analysis
60. Churn/Retention Analysis
61. Sales Funnel Analysis
62. ROI Analysis
63. Profitability Analysis
64. Cost-Benefit Analysis
Here’s a full list of Data Manipulation Techniques (topics only), useful for data analysts working with tools like
Python (pandas), SQL, Excel, etc.
🔁 1. Data Selection & Filtering
Column selection
Row filtering (conditions)
Boolean indexing
.loc[] and .iloc[] in pandas
WHERE clause in SQL
🔄 2. Sorting & Ordering
Sort by one or multiple columns
Ascending/descending order
Custom sort
🔄 3. Renaming & Reshaping
Renaming columns/rows
Transposing data
Pivoting and unpivoting
Melt and stack/unstack (pandas)
PIVOT / UNPIVOT in SQL
🔢 4. Aggregation & Grouping
groupby() in pandas
GROUP BY in SQL
Aggregation functions (sum, mean, count, min, max, median, std)
➕ 5. Creating New Columns
Column calculations
Conditional columns
Date/time extraction (year, month, weekday)
String manipulation (split, strip, replace)
🧹 6. Handling Missing Values
Detecting nulls
Dropping nulls
Filling nulls (mean, median, forward fill, etc.)
Interpolation
🚫 7. Handling Duplicates
Identifying duplicates
Removing duplicates
Keeping first/last/all
✂️8. Data Slicing & Subsetting
Head/tail (view top/bottom rows)
Slicing by index
Date slicing
Random sampling
🔀 9. Data Merging & Joining
Concatenation (rows/columns)
Merging (left, right, inner, outer joins)
JOIN operations in SQL
Merge on index or columns
🧪 10. Data Type Conversion
Convert string to numeric or datetime
astype() in pandas
CAST() / CONVERT() in SQL
📏 11. Scaling & Transformation
Normalization / Standardization
Min-Max scaling
Log transformation
Binning (equal width/frequency)
Z-score
📆 12. Date-Time Manipulation
Converting to datetime
Extracting parts (day, month, year, etc.)
Calculating time differences
Sorting or filtering by date
🧬 13. String Operations
Case conversion (upper/lower)
Replace/substitute
Extract substring
Regex operations
🔧 14. Advanced Techniques
Apply functions (apply, map, lambda)
Window functions (rolling, expanding)
Cumulative metrics (cumsum, cumprod)
User-defined functions (UDFs)
Let me know if you want this as a visual PDF or mind map!