Databricks

The document provides a series of scenario-based interview questions and answers related to data engineering, specifically focusing on PySpark pipelines, real-time joins, snapshot tables, serverless ingestion patterns, tokenization strategies, upsert logic, and data ingestion from various sources. It highlights common struggles faced by candidates and emphasizes the importance of preparation for interviews in the field. Prominent Academy offers services such as mock interviews, hands-on training, and personalized coaching to help candidates succeed in their job search.


Scenario-Based Interview Questions
www.prominentacademy.in
+91 98604 38743
Question : How can you implement automated testing for
PySpark pipelines across CI/CD stages?

Answer :
Use pytest + chispa (for DataFrame comparison)
Run the tests in GitHub Actions or Azure DevOps as part of the CI/CD pipeline
Example test:

python

from chispa import assert_df_equality

def test_cleaning_logic(spark):
    # Arrange: small input and expected DataFrames for the transformation under test
    input_df = spark.createDataFrame([...])
    expected_df = spark.createDataFrame([...])
    # Act + assert: chispa compares both schema and row content
    result_df = cleaning_func(input_df)
    assert_df_equality(result_df, expected_df)
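
The test above receives `spark` as a pytest fixture; a minimal conftest.py sketch that provides a local session could look like this:

python

# conftest.py (sketch): one local SparkSession shared across the test session
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return (SparkSession.builder
            .master("local[2]")
            .appName("pipeline-tests")
            .getOrCreate())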

⚠️ Common Struggles:
❌ No mock data
❌ Only testing schema, not content

Your next opportunity is closer than you think. Let’s get you there!
📞 Don’t wait—call us at +91 98604 38743 today
Question : How do you perform a real-time join between
a large fact stream and a small dimension table while
minimizing skew?

Answer :
Use a broadcast join for small static dimensions
Use the salting technique if both tables are large (replicate the dimension across the salt values, as below)

python

from pyspark.sql.functions import expr, explode, sequence, lit

# Replicate each dimension row for salts 0-9 so every salted fact row finds a match
dim_df = (spark.read.parquet("/mnt/dim")
          .withColumn("salt", explode(sequence(lit(0), lit(9)))))
# Assign each fact row a random salt so hot keys spread across partitions
fact_df = spark.readStream... \
    .withColumn("salt", expr("CAST(rand() * 10 AS INT)"))
joined = fact_df.join(dim_df, ["join_key", "salt"])
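
For the small static dimension case mentioned above, a broadcast join avoids the shuffle entirely; a minimal sketch:

python

from pyspark.sql.functions import broadcast

# Ship the small dimension to every executor; no shuffle, no skew on the join key
joined = fact_df.join(broadcast(dim_df), "join_key")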

⚠️ Common Struggles:
❌ Joining large tables directly without optimization
❌ Not monitoring shuffle partitions and skew

Your next opportunity is closer than you think. Let’s get you there!
📞 Don’t wait—call us at +91 98604 38743 today
Question : How do you design a snapshot table to track
incremental daily changes without full reload?

Answer :
Use MERGE INTO + effective_date columns:
Maintain history using versioned Delta + Z-Ordering

sql

MERGE INTO snapshot_table tgt
USING daily_input src
ON tgt.id = src.id AND tgt.effective_date = current_date()
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
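
The note on versioned Delta + Z-Ordering can be applied as a periodic maintenance step; a minimal sketch on Databricks, reusing the table name from the example above:

python

# Periodic maintenance: co-locate rows by id so lookups and merges prune files efficiently
spark.sql("OPTIMIZE snapshot_table ZORDER BY (id)")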

⚠️ Common Struggles:
❌ Overwriting historical data
❌ No time partition → slow queries

Your next opportunity is closer than you think. Let’s get you there!
📞 Don’t wait—call us at +91 98604 38743 today
Question : How do you build a serverless ingestion
pattern triggered on file arrival in Azure?

Answer :
Configure an Event Grid subscription on ADLS Gen2 that calls a webhook on an Azure Function
The Azure Function triggers the Databricks job via the REST API

python

requests.post("https://<workspace>/api/2.1/jobs/run-now", json=
{"job_id": job_id})

⚠️ Common Struggles:
❌ Polling ADLS instead of using events
❌ No retry mechanism on Event Grid webhook failure

Your next opportunity is closer than you think. Let’s get you there!
📞 Don’t wait—call us at +91 98604 38743 today
Question : How do you design a tokenization strategy for
PCI/PII data in Delta Lake while keeping raw access
restricted?

Answer :
Use hashing or encryption on sensitive fields before writing
Store the mapping table (ID ↔ token) in a secure, access-restricted zone

python

from pyspark.sql.functions import sha2, col
from cryptography.fernet import Fernet

cipher = Fernet(key)  # key loaded from a secret scope, never hard-coded

# One-way hash of the sensitive field; reversible encryption needs a UDF around cipher.encrypt
df = df.withColumn("ssn_token", sha2(col("ssn"), 256))
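
If reversible tokens are required, the cipher call has to run inside a UDF; a hedged sketch (assumes `key` comes from a secret scope, and note that Fernet output is non-deterministic, so the ID ↔ token mapping table stays the lookup path):

python

from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType
from cryptography.fernet import Fernet

@udf(returnType=StringType())
def encrypt_value(value):
    # key is a small bytes value captured by the closure; nulls pass through untouched
    return Fernet(key).encrypt(value.encode()).decode() if value is not None else None

df = df.withColumn("ssn_token", encrypt_value(col("ssn")))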

⚠️ Common Struggles:
❌ Storing raw + token in the same Delta table
❌ Using reversible masking without encryption

Your next opportunity is closer than you think. Let’s get you there!
📞 Don’t wait—call us at +91 98604 38743 today
Question : How do you implement upsert logic in a
streaming write with dynamic partition overwrite?

Answer :
Use foreachBatch with MERGE INTO inside:

python

def upsert_to_delta(batch_df, batch_id):
    # Expose the micro-batch as a temp view, then MERGE it into the Delta target
    batch_df.createOrReplaceTempView("updates")
    # Use the micro-batch's own session so the temp view is visible to the MERGE
    batch_df.sparkSession.sql("""
        MERGE INTO silver_table tgt
        USING updates src
        ON tgt.id = src.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

stream_df.writeStream.foreachBatch(upsert_to_delta).start()
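
When wiring the stream in, a checkpoint location makes the upsert recoverable across restarts; a minimal sketch (the path is illustrative):

python

(stream_df.writeStream
    .foreachBatch(upsert_to_delta)
    .option("checkpointLocation", "/mnt/checkpoints/silver_table")  # illustrative path
    .start())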

⚠️ Common Struggles:
❌ Using append mode with duplicates
❌ Overwriting full partitions — causes high I/O

Your next opportunity is closer than you think. Let’s get you there!
📞 Don’t wait—call us at +91 98604 38743 today
Question : How do you ingest data from Kafka, Azure SQL,
and Blob Storage into a unified Delta Lake model?

Answer :
Kafka → Structured Streaming
Azure SQL → incremental copy using ADF or JDBC
Blob → Auto Loader for schema inference
Then merge the sources:

python

df_kafka = spark.readStream...                        # Structured Streaming from Kafka
df_sql = spark.read.format("jdbc")...                 # incremental batch pull from Azure SQL
df_blob = spark.readStream.format("cloudFiles")...    # Auto Loader from Blob Storage

# unionByName assumes aligned schemas; streaming and batch frames cannot be unioned
# directly, so in practice each source is landed in a bronze Delta table first
merged_df = df_kafka.unionByName(df_sql).unionByName(df_blob)
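
The Auto Loader read mentioned above typically needs a schema location for inference and evolution; a hedged sketch (the file format and paths are assumptions):

python

df_blob = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")                               # assumed source format
    .option("cloudFiles.schemaLocation", "/mnt/schemas/blob_source")   # illustrative path
    .load("/mnt/landing/blob_source"))                                 # illustrative path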

⚠️ Common Struggles:
❌ Schema mismatch across sources
❌ Inconsistent ingestion frequency or deduplication

Your next opportunity is closer than you think. Let’s get you there!
📞 Don’t wait—call us at +91 98604 38743 today
#AzureSynapse #DataEngineering
#InterviewPreparation #JobReady
#MockInterviews #Deloitte #CareerSuccess
#ProminentAcademy

❌ Think your skills are enough?

Think again: these data engineer scenario-based questions could cost you your data engineering job.
In recent interviews at big MNCs, our students faced scenario-based data engineering questions, and many candidates struggled to answer them correctly. These questions are designed to test your real-world knowledge and ability to solve complex data engineering problems.

Unfortunately, many students failed to answer these questions confidently. The truth is, preparation is key, and that’s where Prominent Academy comes in!

We specialize in preparing you for Spark and data engineering interviews by:

Offering scenario-based mock interviews
Providing hands-on training with data engineering features
Optimizing your resume & LinkedIn profile
Giving personalized interview coaching to ensure you’re job-ready

Don’t leave your future to chance!

📞 Call us at +91 98604 38743 and get the interview prep you need to succeed
