Data Engineering Interview Prep

This document collects data engineering interview questions asked at various companies. The questions cover Spark, Python, SQL, data modeling, AWS services, streaming, and more, and are meant to assess a candidate's skills and knowledge in big data, cloud, and related domains.

Uploaded by venket s

Interview Questions - Visa

Manager round

1. Explain your project.

2. What Spark optimizations have you worked on?

3. What is shuffling? Explain.

4. Describe a scenario where you deployed code and then received failure alerts.

5. What is the difference between coalesce and repartition?

6. What lessons did you learn while leading the team?

7. Broadcast join

8. Explain your project.

Technical round

1. Python question

Find all unique pairs of numbers in an array n that sum to a value s = 15.
n = [1, 9, 42, 6, 2, 0, 14, 15]

2. Explain hash tables and hash functions.

3. How do you handle long-running jobs in Spark?

4. How do you handle data skew?

5. Two tables:

merchant_volume, with columns merchant_name and volume

merchant_category, with columns category and merchant_name

Write a SQL query to select the top merchant in each category.

a. What happens if different categories have the same merchant name?
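A minimal Python sketch for the coding questions in this round, under stated assumptions. The pair search keeps visited values in a `set`, which is Python's built-in hash table, so each complement lookup is O(1) on average. The SQL part runs against an in-memory SQLite database (window functions need SQLite 3.25 or newer); the sample rows and the helper names `unique_pairs` and `top_merchants` are hypothetical. `RANK()` is used so that merchants tied for first place are all returned.

```python
import sqlite3


def unique_pairs(nums, target):
    """Return the unique (smaller, larger) pairs in nums that add up to target."""
    seen = set()   # values visited so far (hash-based membership checks)
    pairs = set()  # normalized pairs, so (6, 9) and (9, 6) are counted once
    for x in nums:
        complement = target - x
        if complement in seen:
            pairs.add((min(x, complement), max(x, complement)))
        seen.add(x)
    return sorted(pairs)


def top_merchants(conn):
    """Return (category, merchant_name, volume) for the top merchant(s) per category."""
    query = """
        SELECT category, merchant_name, volume
        FROM (
            SELECT c.category, c.merchant_name, v.volume,
                   RANK() OVER (PARTITION BY c.category
                                ORDER BY v.volume DESC) AS rnk
            FROM merchant_category AS c
            JOIN merchant_volume AS v
              ON v.merchant_name = c.merchant_name
        ) AS ranked
        WHERE rnk = 1
        ORDER BY category
    """
    return conn.execute(query).fetchall()


n = [1, 9, 42, 6, 2, 0, 14, 15]
print(unique_pairs(n, 15))  # [(0, 15), (1, 14), (6, 9)]

# Hypothetical sample data, one volume row per merchant
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE merchant_volume (merchant_name TEXT, volume INTEGER);
    CREATE TABLE merchant_category (category TEXT, merchant_name TEXT);
    INSERT INTO merchant_volume VALUES ('A', 100), ('B', 250), ('C', 80), ('D', 300);
    INSERT INTO merchant_category VALUES
        ('grocery', 'A'), ('grocery', 'B'), ('travel', 'C'), ('travel', 'D');
""")
print(top_merchants(conn))  # [('grocery', 'B', 250), ('travel', 'D', 300)]
```

On sub-question a: because this join matches on `merchant_name` alone, a merchant name that appears in several categories gets the same volume row attributed to every one of those categories. Keying both tables on a merchant ID (or storing volume per category) would remove that ambiguity.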

CELEBAL TECHNOLOGIES

1ST TECHNICAL ROUND

1. Explain your project story.

2. Databricks Runtime

3. What is the difference between CSV and Parquet?

4. Which transformations did you use in your project?

5. Is it possible to union two DataFrames with different schemas? How can we do it?

6. Find the non-matching records between two DataFrames.

7. Which operators have you used in Airflow?

8. What happens in Databricks if an incremental daily file does not arrive on a given day?

9. How is incremental loading done in Databricks?

10. Higher-order functions and anonymous functions in Scala

11. Are PySpark and Spark SQL the same in terms of execution? What is the difference?
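For questions 5 and 6: in PySpark, `DataFrame.unionByName(other, allowMissingColumns=True)` (Spark 3.1+) unions DataFrames whose columns differ by filling the missing columns with nulls, and a `left_anti` join returns the rows of one DataFrame that have no match in the other. The sketch below only mimics those semantics on lists of dicts in plain Python so it runs without a Spark cluster; it is an illustration, not Spark code, and the function names and sample rows are hypothetical.

```python
def union_by_name(left, right):
    """Union two lists of row dicts, filling columns missing on either side with None."""
    # Column order: left's columns first, then any extra columns from right
    columns = list(dict.fromkeys(c for row in left + right for c in row))
    return [{c: row.get(c) for c in columns} for row in left + right]


def left_anti(left, right, key):
    """Rows of `left` whose `key` value does not appear anywhere in `right`."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


df1 = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
df2 = [{"id": 2, "city": "x"}, {"id": 3, "city": "y"}]

print(union_by_name(df1, df2))
print(left_anti(df1, df2, "id"))  # [{'id': 1, 'name': 'a'}]
print(left_anti(df2, df1, "id"))  # [{'id': 3, 'city': 'y'}]
```

Running the anti join in both directions gives the full set of non-matches between the two inputs.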

2nd round:

AWS Design round

1. Different Redshift cluster types

2. Glue crawlers: what happens if the schema changes?

3. Different EC2 instance types

4. Why does Redshift not enforce primary keys?

5. There are three data sources, and the client wants a single source of data. How would you model the data?

6. What is a delta load?

7. What is the difference between incremental load and CDC?

8. What is the difference between a data lake and Delta Lake?

9. What is the difference between Athena and Redshift?

10. How can we increase the maximum execution time (timeout) of a Lambda function?

11. Which service is used to migrate databases?

12. What are the different S3 storage classes, and how do they differ?

13. How can a Glue job be triggered? What if one job depends on another?

SMART CUBE

2nd Round:

1. Explain a challenging situation you faced in a project.

2. What is denormalization?

3. What is the difference between UNION and UNION ALL?

4. List comprehension

5. Lambda functions

6. Which data structures have you used?

7. Find the even elements in a list.

8. Display only the unmatched records from two lists.

9. Why would pandas be preferred over Spark?

10. How do you explode nested JSON into rows and columns in PySpark?

11. What happens internally when we submit a Spark job?
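A short Python sketch covering questions 3, 4, 5, 7, and 8 of this round. The sample lists and table names are hypothetical, and the UNION comparison runs on an in-memory SQLite database. "Unmatched records" is interpreted here as items that appear in only one of the two lists, kept in their original order.

```python
import sqlite3

numbers = [3, 8, 11, 14, 20, 27]

# List comprehension: keep only the even elements (question 7)
evens = [x for x in numbers if x % 2 == 0]

# The same filter written with a lambda function (question 5)
evens_via_lambda = list(filter(lambda x: x % 2 == 0, numbers))

# Unmatched records between two lists (question 8)
list_a = [1, 2, 3, 4]
list_b = [3, 4, 5, 6]
set_a, set_b = set(list_a), set(list_b)  # build once for O(1) membership checks
unmatched = [x for x in list_a if x not in set_b] + [x for x in list_b if x not in set_a]

# UNION removes duplicate rows; UNION ALL keeps every row (question 3)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (x INTEGER);
    INSERT INTO t1 VALUES (1), (2), (2);
    INSERT INTO t2 VALUES (2), (3);
""")
union_rows = [r[0] for r in conn.execute("SELECT x FROM t1 UNION SELECT x FROM t2 ORDER BY x")]
union_all_rows = [r[0] for r in conn.execute("SELECT x FROM t1 UNION ALL SELECT x FROM t2 ORDER BY x")]

print(evens, unmatched)             # [8, 14, 20] [1, 2, 5, 6]
print(union_rows, union_all_rows)   # [1, 2, 3] [1, 2, 2, 2, 3]
```

Because UNION has to deduplicate, it generally does extra work compared with UNION ALL, so UNION ALL is the usual choice when duplicates are impossible or acceptable.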

3rd Round:

1. Describe a complex architecture you built.

2. Which AWS services have you used?

3. Find duplicates in a DataFrame.

4. Find the top 2 customers per month by sales (SQL).

5. list = [1, 2, 3, 4, 5, 6]

Find the sum of the elements at odd indexes, with and without built-in functions.
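A minimal Python sketch for questions 4 and 5, assuming "odd indexes" means the 0-based positions 1, 3, 5 (values 2, 4, 6). The `sales` table, its columns, and the rows are hypothetical; totals are aggregated per customer and month first, then ranked with `ROW_NUMBER()` in SQLite (3.25+). Swapping in `DENSE_RANK()` would also return customers tied for second place.

```python
import sqlite3

lst = [1, 2, 3, 4, 5, 6]

# With built-ins: slice every second element starting at index 1, then sum
with_builtins = sum(lst[1::2])

# Without built-ins: track the index manually in a plain loop
without_builtins = 0
index = 0
for value in lst:
    if index % 2 == 1:
        without_builtins += value
    index += 1

print(with_builtins, without_builtins)  # 12 12

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer TEXT, sale_month TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('c1', '2024-01', 100), ('c2', '2024-01', 300), ('c3', '2024-01', 200),
        ('c1', '2024-01', 50),  ('c1', '2024-02', 500), ('c2', '2024-02', 100),
        ('c3', '2024-02', 400);
""")
top2 = conn.execute("""
    WITH totals AS (
        SELECT sale_month, customer, SUM(amount) AS total
        FROM sales
        GROUP BY sale_month, customer
    ),
    ranked AS (
        SELECT sale_month, customer, total,
               ROW_NUMBER() OVER (PARTITION BY sale_month ORDER BY total DESC) AS rn
        FROM totals
    )
    SELECT sale_month, customer, total
    FROM ranked
    WHERE rn <= 2
    ORDER BY sale_month, rn
""").fetchall()
print(top2)
```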

COFORGE

ROUND 1

1. DIFFERENCE BETWEEN RANK AND DENSE_RANK

2. DEEP COPY VS SHALLOW COPY

3. DATAFRAME VS SERIES

4. LIST VS TUPLE

5. GROUPBYKEY VS REDUCEBYKEY

6. DOES GLUE RUN IN MEMORY?

7. RDD VS DATAFRAME

8. SYNCHRONOUS AND ASYNCHRONOUS INVOCATION IN LAMBDA
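A small Python sketch for questions 1, 2, and 4, using hypothetical sample data. With a tie, `RANK()` skips the following rank numbers while `DENSE_RANK()` does not; a shallow copy shares nested objects with the original while a deep copy duplicates them; and a tuple, unlike a list, cannot be modified in place and is hashable.

```python
import copy
import sqlite3

# RANK vs DENSE_RANK on a tie at 80 (SQLite 3.25+ for window functions)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (name TEXT, score INTEGER);
    INSERT INTO scores VALUES ('a', 90), ('b', 80), ('c', 80), ('d', 70);
""")
ranks = conn.execute("""
    SELECT name, score,
           RANK() OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
""").fetchall()
print(ranks)  # [('a', 90, 1, 1), ('b', 80, 2, 2), ('c', 80, 2, 2), ('d', 70, 4, 3)]

# Shallow vs deep copy
original = [[1, 2], [3, 4]]
shallow = copy.copy(original)    # new outer list, same inner list objects
deep = copy.deepcopy(original)   # inner lists are copied as well
original[0].append(99)
print(shallow[0], deep[0])       # [1, 2, 99] [1, 2]

# List vs tuple
point_list = [1, 2]
point_list[0] = 10               # lists are mutable
point_tuple = (1, 2)
tuple_is_immutable = False
try:
    point_tuple[0] = 10          # tuples reject item assignment
except TypeError:
    tuple_is_immutable = True
lookup = {point_tuple: "tuples are hashable, so they can be dict keys"}
```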

WALMART

ROUND 1

1. Find the maximum length of a subset of an array whose sum is 0.

2. Find the expiry date by adding the remaining days to the recharge date in PySpark.

3. Find the count of the top trending hashtags, where duplicates within the same line are not counted.

4. Spark architecture

5. Spark optimizations

6. Partitioning in Spark

7. YARN architecture

8. Shuffle partitions

9. RANK vs DENSE_RANK

10. Data skew

11. Airflow architecture
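A Python sketch for questions 1 and 3. Question 1 is read here as the common "longest contiguous subarray with sum 0" problem, which is an assumption; if an arbitrary (non-contiguous) subset was meant, this approach does not apply. It relies on prefix sums: if the same running total appears at two indexes, the elements between them sum to 0. For question 3, lowercasing the text and breaking ties alphabetically are also assumptions.

```python
import re
from collections import Counter


def max_len_zero_sum(nums):
    """Length of the longest contiguous run of nums that sums to 0."""
    first_seen = {0: -1}  # prefix sum -> earliest index where it occurred
    prefix = 0
    best = 0
    for i, x in enumerate(nums):
        prefix += x
        if prefix in first_seen:
            # same prefix sum seen before: the elements in between sum to 0
            best = max(best, i - first_seen[prefix])
        else:
            first_seen[prefix] = i
    return best


def trending_hashtags(lines, top_n=3):
    """Top hashtags, counting each hashtag at most once per line."""
    counts = Counter()
    for line in lines:
        # set() drops repeats of the same hashtag within one line
        counts.update(set(re.findall(r"#\w+", line.lower())))
    # sort by count descending, then alphabetically so ties are deterministic
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


print(max_len_zero_sum([15, -2, 2, -8, 1, 7, 10, 23]))  # 5

posts = [
    "#spark is fast #spark #bigdata",
    "learning #python and #spark",
    "#python #python tips",
]
print(trending_hashtags(posts))  # [('#python', 2), ('#spark', 2), ('#bigdata', 1)]
```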

PUBLICIS

ROUND 2:

1. SERVICES WORKED ON IN AWS

2. PARTITIONING IN HIVE

3. JOBS, STAGES, AND TASKS IN SPARK

4. SPARK ARCHITECTURE

5. LST = ['a', 'a', 'b', 'b', 'c', 'c']

Find the count of occurrences in Python and PySpark.

6. Airflow architecture

7. Project architecture
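For question 5, a plain-Python sketch showing two ways to count occurrences: `collections.Counter` and a manual dictionary loop.

```python
from collections import Counter

LST = ["a", "a", "b", "b", "c", "c"]

# Built-in approach: Counter maps each element to its number of occurrences
with_counter = Counter(LST)

# Manual approach with a plain dict
manual = {}
for item in LST:
    manual[item] = manual.get(item, 0) + 1

print(dict(with_counter))  # {'a': 2, 'b': 2, 'c': 2}
print(manual)              # {'a': 2, 'b': 2, 'c': 2}
```

In PySpark, the equivalent would typically be to build a single-column DataFrame from the list and call `groupBy("value").count()` on it; that part is not run here.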

EPAM

ROUND 2:

1. AWS GLUE, S3, EMR

2. TRANSIENT VS LONG-RUNNING CLUSTERS IN EMR

3. STEP EXECUTION IN EMR

4. BACKEND OF LAMBDA

5. SYNCHRONOUS AND ASYNCHRONOUS INVOCATION IN LAMBDA

6. Spark optimizations

7. Relationship between CPU cores and partitions

8. Ways to solve data skew

9. Can we repartition on columns?

10. Adaptive query execution (AQE) in Spark

11. Generators and decorators

12. List comprehension

13. SCD implementation using PySpark

14. *args in Python

15. Checkpointing in Spark

16. Limitations of Lambda

17. Where can we see the EMR logs?

18. Difference between a data lake and Delta Lake

19. Serialization in Spark

20. Checkpointing in Spark
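A compact Python sketch for questions 11 and 14. The decorator `timed`, the generator `read_in_batches`, and the function `total` are hypothetical examples: the decorator wraps a function to record its last call duration, the generator yields one batch at a time instead of building every batch in memory, and `*args` collects any number of positional arguments into a tuple.

```python
import functools
import time


def timed(func):
    """Decorator: wraps func and stores the duration of its most recent call."""
    @functools.wraps(func)  # keep func's __name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_duration = time.perf_counter() - start
        return result
    wrapper.last_duration = None
    return wrapper


def read_in_batches(rows, batch_size):
    """Generator: lazily yields lists of up to batch_size rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch, if any
        yield batch


@timed
def total(*args):
    """*args gathers all positional arguments into a tuple."""
    return sum(args)


batches = list(read_in_batches(range(7), 3))
print(batches)                          # [[0, 1, 2], [3, 4, 5], [6]]
print(total(1, 2, 3), total.__name__)   # 6 total
```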

WALMART

ROUND 2:

1. WHAT IS DATA SPILLING?

2. HOW DO YOU DEFINE THE NUMBER OF SHUFFLE PARTITIONS FOR A 500 GB FILE WITH 10 GB OF EXECUTOR MEMORY?

3. Broadcast join and sort-merge join

4. Broadcast nested loop join

5. Shuffle partition concepts

6. Spark Streaming

7. SQL LeetCode problems

8. Data spilling

9. How do you identify long-running jobs?

10. How do you assign resources to Spark jobs?

11. Case classes in Scala

12. Z-ordering

13. How do you solve out-of-memory errors in Spark?
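For question 2, one common rule of thumb (an assumption, not a Spark default) is to size shuffle partitions at roughly 100 to 200 MB each, so 500 GB at a 128 MB target gives 500 × 1024 / 128 = 4000 partitions, compared with Spark's default `spark.sql.shuffle.partitions` of 200. Executor memory (10 GB here) is shared by the tasks running concurrently on that executor's cores, so it mainly limits how large each partition can safely be. The helper below and its optional rounding to a multiple of a hypothetical total core count are illustrative only.

```python
import math


def suggest_shuffle_partitions(data_gb, target_partition_mb=128, total_cores=None):
    """Rough shuffle partition count: data size divided by a target partition size.

    Optionally rounds up to a multiple of total_cores so that every wave of
    tasks keeps all cores busy. The 128 MB target and the core count are
    rule-of-thumb assumptions, not Spark defaults.
    """
    partitions = math.ceil(data_gb * 1024 / target_partition_mb)
    if total_cores:
        partitions = math.ceil(partitions / total_cores) * total_cores
    return partitions


print(suggest_shuffle_partitions(500))                  # 4000
print(suggest_shuffle_partitions(500, total_cores=48))  # 4032
```

The result would then be applied through the `spark.sql.shuffle.partitions` setting; with adaptive query execution enabled, Spark can also coalesce small shuffle partitions at runtime.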
