DATA ENGINEERING
INTERVIEW QUESTIONS
SALARY - 40 LPA TO 50 LPA
EXPERIENCE - 7 TO 15 YEARS
DEVIKRISHNA R
LinkedIn: @Devikrishna R Email: [email protected]
QUESTIONS
Data Warehousing Concepts & Architecture
1. What is a Slowly Changing Dimension (SCD)? Explain different
types with examples.
2. Compare Star vs Snowflake schema. In what scenarios would
you choose each?
3. Explain the concept of a surrogate key and its role in data
warehousing.
4. What are Factless Fact Tables? When are they used?
5. How do you design a schema to handle late-arriving
dimensions?
SQL for Analytics & Data Warehousing
6. Write a SQL query using a window function to calculate a
rolling average over 7 days.
7. How would you handle deduplication in a massive fact table
with multiple duplicates?
8. Explain how CTEs (Common Table Expressions) help in
modularizing complex warehouse queries.
9. Describe a scenario where you used a CROSS JOIN effectively.
10. Write a SQL query to get the second highest sale per
region using DENSE_RANK.
Performance & Optimization
11. What is partitioning in SQL warehouses? How does it
improve performance?
12. Difference between clustered and non-clustered indexes.
Which is better in data warehousing?
13. How do materialized views help improve data
warehouse performance?
14. How would you optimize a slow query running on a 10
TB fact table?
15. Explain the trade-offs between denormalization and
normalization in warehouses.
ETL & Data Pipeline Scenarios
16. How do you implement CDC (Change Data Capture) in
your data pipeline?
17. Describe your approach to incremental loading in a SQL-
based data warehouse.
18. How do you ensure idempotency in data warehouse
pipelines?
19. How do you validate and reconcile large volumes of data
post-ingestion?
20. Explain your experience handling schema evolution in a
data warehouse.
Data Warehousing Concepts & Architecture
1. What is a Slowly Changing Dimension (SCD)? Explain different
types with examples.
Answer:
SCD manages changes in dimensional data over time.
Type 1: Overwrites old data (e.g., correcting a typo).
Type 2: Keeps history by adding a new row (e.g., address
change).
Type 3: Tracks limited history in the same row (e.g., previous
and current manager). Use Type 2 in most real-world scenarios
to preserve historical accuracy.
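A minimal SCD Type 2 sketch under assumed names (dim_customer, stg_customer, effective_from/effective_to/is_current are illustrative, and the UPDATE ... FROM form is PostgreSQL-flavoured): expire the current row, then insert the new version.
-- Step 1: close out current rows whose tracked attribute changed.
UPDATE dim_customer d
SET    effective_to = s.load_date,
       is_current   = FALSE
FROM   stg_customer s
WHERE  d.customer_id = s.customer_id
  AND  d.is_current  = TRUE
  AND  d.address    <> s.address;
-- Step 2: insert a new current version for changed or brand-new customers.
INSERT INTO dim_customer (customer_id, address, effective_from, effective_to, is_current)
SELECT s.customer_id, s.address, s.load_date, DATE '9999-12-31', TRUE
FROM   stg_customer s
LEFT JOIN dim_customer d
       ON d.customer_id = s.customer_id AND d.is_current = TRUE
WHERE  d.customer_id IS NULL OR d.address <> s.address;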
2. Compare Star vs Snowflake schema. When would you choose
each?
Answer:
Star Schema: Denormalized, faster query performance, easy
joins. Ideal for dashboards.
Snowflake Schema: Normalized, reduces redundancy, easier to maintain when dimensions are large or change frequently.
Choose Star for query speed and simpler joins; choose Snowflake when dimension redundancy, storage, or update anomalies are the bigger concern.
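A compact sketch of the structural difference, using hypothetical product tables: the star keeps one flat dimension, while the snowflake normalizes the same attributes into separate tables.
-- Star: one flat, denormalized dimension.
CREATE TABLE dim_product_star (
    product_sk      INT PRIMARY KEY,
    product_name    VARCHAR(100),
    category_name   VARCHAR(100),   -- repeated for every product in the category
    department_name VARCHAR(100)
);
-- Snowflake: the same attributes split into normalized tables.
CREATE TABLE dim_department (department_sk INT PRIMARY KEY, department_name VARCHAR(100));
CREATE TABLE dim_category   (category_sk INT PRIMARY KEY, category_name VARCHAR(100),
                             department_sk INT REFERENCES dim_department);
CREATE TABLE dim_product    (product_sk INT PRIMARY KEY, product_name VARCHAR(100),
                             category_sk INT REFERENCES dim_category);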
3. What is a surrogate key and why is it used?
Answer:
A surrogate key is a system-generated unique identifier (usually
an integer) used instead of natural keys.
Ensures data consistency across systems.
Allows tracking changes over time (e.g., in SCD Type 2).
Useful when source keys are not unique or stable.
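A small sketch of a surrogate-keyed dimension (standard-SQL identity column; some engines use sequences or IDENTITY(1,1) instead, and all names are illustrative):
CREATE TABLE dim_customer (
    customer_sk    BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key
    customer_id    VARCHAR(20) NOT NULL,                             -- natural key from the source system
    customer_name  VARCHAR(100),
    effective_from DATE,
    is_current     BOOLEAN
);
-- Facts reference customer_sk, so source-system key changes never ripple into the fact table.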
4. What are Factless Fact Tables? When are they used?
Answer:
Factless fact tables contain no numeric measures, only foreign
keys.
Used to track events (e.g., student attendance) or coverage
(e.g., promotion eligibility). Enables powerful analysis via joins
and counts.
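A minimal sketch with assumed names (fact_attendance, dim_date): the fact holds only foreign keys, and the analysis is done with counts.
-- One row per student per class per day attended; no numeric measures.
CREATE TABLE fact_attendance (
    date_sk    INT NOT NULL,
    student_sk INT NOT NULL,
    class_sk   INT NOT NULL
);
-- Attendance per class per month, derived purely from row counts.
SELECT d.month, f.class_sk, COUNT(*) AS days_attended
FROM   fact_attendance f
JOIN   dim_date d ON d.date_sk = f.date_sk
GROUP  BY d.month, f.class_sk;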
5. How do you handle late-arriving dimensions in a warehouse?
Answer:
Use a placeholder surrogate key (e.g., -1 or "Unknown") for the
missing dimension.
Once the dimension arrives, update the fact record via a
merge/upsert. Modern ETL tools can do this automatically with
CDC and upsert logic.
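A sketch of the placeholder-then-repoint pattern with illustrative tables (fact_sales keeps the natural customer_id so rows can be re-linked later; the UPDATE ... FROM form is PostgreSQL-flavoured):
-- Fact load: unknown customers fall back to a pre-seeded "Unknown" dimension row (surrogate key -1).
INSERT INTO fact_sales (order_id, customer_id, customer_sk, amount)
SELECT s.order_id,
       s.customer_id,
       COALESCE(d.customer_sk, -1) AS customer_sk,
       s.amount
FROM   stg_sales s
LEFT JOIN dim_customer d
       ON d.customer_id = s.customer_id AND d.is_current = TRUE;
-- Once the real dimension row arrives, repoint the placeholder facts.
UPDATE fact_sales f
SET    customer_sk = d.customer_sk
FROM   dim_customer d
WHERE  d.customer_id = f.customer_id
  AND  d.is_current  = TRUE
  AND  f.customer_sk = -1;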
SQL for Analytics & Data Warehousing
6. Write a SQL query using a window function to calculate a
rolling average over 7 days.
SELECT
user_id,
event_date,
AVG(metric_value) OVER (
PARTITION BY user_id
ORDER BY event_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_avg_7d
FROM events;
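Note: the ROWS frame counts the 6 preceding rows plus the current one, so it assumes exactly one row per user per day. If dates can be missing, join to a calendar (date spine) first, or use a date-interval RANGE frame where the engine supports it, so the window truly covers 7 calendar days.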
7. How do you deduplicate a large fact table with multiple
duplicates?
Answer:
Use ROW_NUMBER() or RANK() with a partition.
WITH deduped AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY transaction_id ORDER BY
updated_at DESC) AS rn
FROM fact_sales
)
SELECT * FROM deduped WHERE rn = 1;
8. Explain how CTEs help modularize complex warehouse
queries.
CTEs (Common Table Expressions) break queries into readable, reusable layers, which aids debugging, logic reuse, and maintainability.
Especially useful for multi-step transformations, recursive
queries, and fact-to-fact joins.
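A small sketch of that layering, with hypothetical fact_sales columns; each CTE is one readable, testable step.
WITH daily_sales AS (
    SELECT order_date, region, SUM(amount) AS total_sales
    FROM   fact_sales
    GROUP  BY order_date, region
),
region_rank AS (
    SELECT order_date, region, total_sales,
           RANK() OVER (PARTITION BY order_date ORDER BY total_sales DESC) AS rnk
    FROM   daily_sales
)
SELECT order_date, region, total_sales
FROM   region_rank
WHERE  rnk <= 3;   -- top 3 regions per day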
9. Describe a scenario where you used a CROSS JOIN effectively.
Answer:
Used in generating a calendar table, building every combination of campaigns and regions, or enumerating A/B-test variant combinations.
SELECT *
FROM campaigns
CROSS JOIN regions;
10. Write a SQL query to get the second highest sale per
region using DENSE_RANK.
SELECT * FROM (
SELECT region, sale_amount,
DENSE_RANK() OVER (PARTITION BY region ORDER BY
sale_amount DESC) AS rnk
FROM sales
) ranked
WHERE rnk = 2;
Performance & Optimization
11. What is partitioning? How does it improve performance?
Answer:
Partitioning divides large tables into smaller, more manageable
parts (by date, region, etc.).
Speeds up query performance via partition pruning.
Makes ETL jobs faster and easier to maintain.
Avoids scanning the entire table for filtered queries.
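A minimal sketch using PostgreSQL declarative partitioning syntax (table and column names are illustrative; other warehouses expose the same idea with different DDL):
CREATE TABLE fact_sales (
    sale_id   BIGINT,
    sale_date DATE NOT NULL,
    region    TEXT,
    amount    NUMERIC
) PARTITION BY RANGE (sale_date);

CREATE TABLE fact_sales_2024_q1 PARTITION OF fact_sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

-- A filter on the partition key lets the planner prune untouched partitions.
SELECT region, SUM(amount) AS total_sales
FROM   fact_sales
WHERE  sale_date >= DATE '2024-01-01' AND sale_date < DATE '2024-04-01'
GROUP  BY region;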
12. Difference between clustered and non-clustered indexes.
Answer:
Clustered Index: Alters physical order of table. Only one
allowed.
Non-Clustered Index: Separate structure; multiple allowed.
Put the clustered index on the column most queries sort or range-filter on (often the date key), and add non-clustered indexes for other filter and join columns. Many modern warehouses lean on columnar storage and partitioning instead of traditional row indexes.
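A short sketch in SQL Server syntax, where the clustered/non-clustered choice is explicit (names are illustrative):
CREATE CLUSTERED INDEX ix_fact_sales_date
    ON fact_sales (sale_date);           -- physically orders the table by date
CREATE NONCLUSTERED INDEX ix_fact_sales_customer
    ON fact_sales (customer_sk)
    INCLUDE (amount);                    -- covering index for customer-level lookups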
13. How do materialized views help in warehouses?
Answer:
Materialized views pre-compute and store results of heavy
queries.
Boost performance for repetitive queries.
Scheduled refreshes maintain accuracy.
Ideal for aggregated dashboards or complex joins.
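A minimal sketch (PostgreSQL-style syntax; refresh mechanics vary by engine, and the names are illustrative):
CREATE MATERIALIZED VIEW mv_daily_region_sales AS
SELECT sale_date, region, SUM(amount) AS total_sales, COUNT(*) AS order_count
FROM   fact_sales
GROUP  BY sale_date, region;

-- Refresh on a schedule to keep the pre-computed results current.
REFRESH MATERIALIZED VIEW mv_daily_region_sales;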
14. How would you optimize a slow query on a 10TB fact
table?
Answer:
Use partitioning and indexing.
Filter early, avoid SELECT *, use projections.
Use CTEs or materialized views.
Apply columnar formats (like Parquet).
Analyze query plan and tune joins.
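A sketch of what an optimized probe often looks like in practice, assuming a date-partitioned fact_sales table: filter on the partition key, project only the needed columns, and check the plan with EXPLAIN (plan output varies by engine).
EXPLAIN
SELECT customer_sk, SUM(amount) AS total_amount
FROM   fact_sales
WHERE  sale_date >= DATE '2024-06-01'    -- partition-pruning filter applied early
  AND  sale_date <  DATE '2024-07-01'
GROUP  BY customer_sk;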
15. Denormalization vs normalization: trade-offs?
Answer:
Denormalization: Faster reads, redundancy risk, storage-heavy.
Normalization: Efficient storage, slower queries, complex joins.
In warehousing, denormalization is preferred for read-heavy OLAP use cases.
ETL & Data Pipeline Scenarios
16. How do you implement Change Data Capture (CDC)?
Answer:
CDC tracks changes in source tables using:
Timestamps (updated_at column)
Triggers/log-based CDC (e.g., Debezium, SQL Server CDC)
Merge or UPSERT operations in target warehouse
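A sketch of applying a captured change feed with ANSI-style MERGE (clause support varies slightly by engine; cdc_customer_changes and its op flag are assumed names for the CDC output):
MERGE INTO dim_customer AS tgt
USING cdc_customer_changes AS src
   ON tgt.customer_id = src.customer_id
WHEN MATCHED AND src.op = 'DELETE' THEN
    DELETE
WHEN MATCHED THEN
    UPDATE SET customer_name = src.customer_name,
               address       = src.address,
               updated_at    = src.updated_at
WHEN NOT MATCHED AND src.op <> 'DELETE' THEN
    INSERT (customer_id, customer_name, address, updated_at)
    VALUES (src.customer_id, src.customer_name, src.address, src.updated_at);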
17. How do you handle incremental loading?
Answer:
Filter data using WHERE updated_at > last_sync_time.
Use MERGE or UPSERT to avoid duplicates.
Maintain a watermark table to track last load timestamp.
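A watermark-driven sketch with assumed names (etl_watermark, src_orders, stg_orders):
-- Pull only rows changed since the previous run's high-water mark.
INSERT INTO stg_orders
SELECT o.*
FROM   src_orders o
WHERE  o.updated_at > (SELECT last_sync_time
                       FROM   etl_watermark
                       WHERE  table_name = 'src_orders');

-- After a successful load, advance the watermark.
UPDATE etl_watermark
SET    last_sync_time = (SELECT MAX(updated_at) FROM stg_orders)
WHERE  table_name = 'src_orders';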
18. How do you ensure idempotency in pipelines?
Answer:
Use unique constraints + MERGE statements
Deduplicate using ROW_NUMBER()
Write to staging first, validate, then load final tables
Ensures reruns don’t cause data duplication.
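One common idempotent pattern, sketched for a hypothetical daily partition load: delete the slice being (re)loaded, then insert it, so a rerun replaces rather than duplicates.
DELETE FROM fact_sales
WHERE  sale_date = DATE '2024-06-01';    -- the run/partition being (re)loaded

INSERT INTO fact_sales (sale_id, sale_date, customer_sk, amount)
SELECT sale_id, sale_date, customer_sk, amount
FROM   stg_sales
WHERE  sale_date = DATE '2024-06-01';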
19. How do you reconcile large volumes of data post-
ingestion?
Answer:
Row count validation
Checksum or hash comparison
Summary aggregates (min/max/sum)
Automate using data quality frameworks (e.g., Great
Expectations)
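A sketch of the first two checks in SQL, with assumed src_sales/fact_sales tables (the hash function name varies by engine):
-- Row counts and summary aggregates, source vs target.
SELECT 'source' AS side, COUNT(*) AS row_cnt, SUM(amount) AS total_amount,
       MIN(sale_date) AS min_date, MAX(sale_date) AS max_date
FROM   src_sales
UNION ALL
SELECT 'target', COUNT(*), SUM(amount), MIN(sale_date), MAX(sale_date)
FROM   fact_sales;

-- Row-level hash comparison to surface records that differ.
SELECT s.sale_id
FROM   src_sales s
JOIN   fact_sales f ON f.sale_id = s.sale_id
WHERE  MD5(CONCAT(s.sale_date, '|', s.amount)) <> MD5(CONCAT(f.sale_date, '|', f.amount));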
20. How do you handle schema evolution in a data
warehouse?
Answer:
Use tools that support schema-on-read (BigQuery, Delta Lake)
Maintain versioned schemas
Use nullable fields or JSON columns for flexibility
Backfill missing data when new columns are added
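A sketch of an additive change plus backfill, with illustrative names (the nullable column keeps existing loads working; the UPDATE ... FROM backfill is PostgreSQL-flavoured):
ALTER TABLE fact_orders ADD COLUMN discount_pct NUMERIC;   -- nullable, so old pipelines are unaffected

-- Backfill historical rows from a one-off source extract once the column exists.
UPDATE fact_orders f
SET    discount_pct = s.discount_pct
FROM   stg_orders_backfill s
WHERE  f.order_id = s.order_id
  AND  f.discount_pct IS NULL;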