AWS Interview QA Detailed

The document contains a comprehensive list of AWS interview questions and detailed answers covering various services such as EMR, Glue, S3, and RDS. It addresses topics like node types, scaling, job submission, data storage, and performance tuning. Each section provides specific queries related to the respective AWS service, aimed at preparing candidates for technical interviews in cloud computing roles.

Uploaded by hirwesaurav

AWS Interview Questions - Detailed Answers

1. Tell me about yourself.

2. Tell me about your project.

3. EMR:

a. How many master nodes are there? What types of nodes are in the cluster?

b. What is the difference between core, master, and task nodes?

c. What is the size of your EMR nodes?

d. What are the guidelines for choosing the size of nodes? (Horizontal scaling vs. vertical scaling)

e. What types of cluster scaling are there? (Auto scaling)

f. How do you distribute the PySpark process load across different core nodes? (spark-submit with --deploy-mode cluster; the other deploy mode is client, while local is a --master setting)

g. How do you submit these jobs? (Use EMR steps with step concurrency)

h. What are the different applications you have used on EMR?

i. What are these applications used for?

j. Tell me about the use of Hive. (What is the Hive metastore?)

k. What is the difference between external and managed tables?

l. What version of EMR have you used, and which application versions (Spark, Python, etc.)?

m. How do you optimize the cost of EMR? (Auto scaling, transient clusters)

n. What is a transient cluster?

o. What type of nodes did you provision? (On-Demand, Spot, Reserved)

p. What is the difference between these types of nodes, and when should each type be used?

q. What is EMR Serverless? What is EMR Studio, and when do you use it?

r. When do you use EMR vs. Glue/Athena/MSK/Kinesis?

s. How do you store application logs from EMR?

t. How do you monitor EMR applications?

u. How do you debug EMR jobs?

v. What are the Spark application and resource monitoring applications, and how do you access them?

w. How do these applications help you to debug the jobs?

x. What is the architecture of Spark processing?

y. How do you terminate an EMR cluster? What happens when we terminate vs. stop the cluster?

z. What is an AMI, and how can we use one to launch an EMR cluster?

aa. What was the disk volume type and size for your EMR cluster?

bb. Was data stored on HDFS or S3?

cc. What is EMRFS? How do you access data stored on S3 in EMR?

dd. What are tags? Why do you use them?

ee. What are bootstrap actions? What are they used for?

ff. What is an instance fleet? When do we use it?

gg. How do you increase the instance quota limit for an AZ?

hh. What is auto termination in EMR? When do we use it?

ii. What is the use of security groups in EMR?

jj. What was the volume of data? What was the format of the data?

kk. What are the Parquet and ORC file formats? (What is the difference between columnar and non-columnar data formats?)

ll. How do you performance tune your EMR jobs?
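
As an illustration of the hints in items f and g above (spark-submit in cluster mode, submitted as an EMR step), the following is a minimal sketch of how such a step definition might be built in Python. The helper name `build_spark_step`, the step name, and the S3 path are hypothetical; the dictionary layout follows the step format accepted by boto3's EMR `add_job_flow_steps` call, which is only referenced in a comment so the sketch runs without AWS access.

```python
def build_spark_step(name, script_uri, script_args=()):
    """Return an EMR step definition that runs spark-submit in cluster mode.

    In cluster mode the Spark driver runs inside the cluster (under YARN)
    rather than on the machine that submitted the job.
    """
    return {
        "Name": name,
        # Keep the cluster running and continue with later steps on failure.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                *script_args,
            ],
        },
    }


# Hypothetical job name and script location, for illustration only.
step = build_spark_step(
    "daily-aggregation",
    "s3://example-bucket/jobs/aggregate.py",
    ["--run-date", "2024-01-01"],
)

# With credentials and a running cluster, the step could be submitted like this:
# boto3.client("emr").add_job_flow_steps(JobFlowId="<cluster-id>", Steps=[step])
print(step["HadoopJarStep"]["Args"])
```

Whether several such steps run in parallel depends on the cluster's step concurrency setting, which is configured on the cluster rather than on the individual step.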

4. Glue:

a. What is Glue? Why do we use the Glue service?

b. What type of Glue engine have you used?

c. What Python version have you used?

d. What is a DPU (data processing unit)?

e. What are the different transformations you have used in your Glue job?

f. Have you used custom scripts for your Glue job?

g. What are the different libraries and packages you have used?

h. How do you use custom packages or libraries? How did you configure them for a Glue job?

i. How do we pass parameters to a Glue job?

j. What is the use of tags?

k. How do you track changes in a Glue job? (Versioning of Glue jobs)

l. How do we monitor a Glue job?

m. How do we troubleshoot a Glue job?

n. How do we develop a Glue job on a local machine?

o. How can we push the Glue job code to a code repository?

p. My Glue job is taking a long time to execute. What should I do?

q. How do I set a timeout or increase the timeout for a Glue job?

r. How do I create a Glue workflow? What is it used for?

s. What is the difference between Airflow and a Glue workflow?

t. What is a Glue crawler? When do we use it?

u. What is the Glue catalog? How do you update the Glue catalog?

v. How do you automatically detect changes in the schema?

w. What are schema registries, and what are they used for?

x. How do you performance tune your Glue jobs?
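
For the questions above on passing parameters and setting a timeout for a Glue job (items i and q), one possible approach is to supply both when starting a job run. The sketch below only builds the request; the helper name `build_glue_run_request`, the job name, and the parameter names are hypothetical. The keys `JobName`, `Arguments`, and `Timeout` are those of boto3's Glue `start_job_run` call, which is shown only in a comment so the sketch runs without AWS access.

```python
def build_glue_run_request(job_name, params, timeout_minutes=None):
    """Build keyword arguments for starting a Glue job run.

    params is a plain dict such as {"run_date": "2024-01-01"}; Glue expects
    each job argument name to carry a leading "--" and each value to be a string.
    """
    request = {
        "JobName": job_name,
        "Arguments": {f"--{key}": str(value) for key, value in params.items()},
    }
    if timeout_minutes is not None:
        # The job run timeout is expressed in minutes.
        request["Timeout"] = timeout_minutes
    return request


# Hypothetical job and parameter names, for illustration only.
request = build_glue_run_request(
    "example-etl-job",
    {"run_date": "2024-01-01", "env": "dev"},
    timeout_minutes=120,
)

# With credentials configured, the run could be started like this:
# boto3.client("glue").start_job_run(**request)
print(request)
```

Inside the job script, the same parameters are typically read with `getResolvedOptions(sys.argv, ["run_date", "env"])` from `awsglue.utils`, which is available in the Glue runtime rather than in a plain local Python install.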

5. S3:

a. What is the purpose of S3? What is an object store?

b. What is the difference between an object store and a block store?

c. What is the maximum size of a single object stored on S3?

d. What is the size of S3? What is the availability of S3?

e. Can we have buckets with the same name? What if one bucket is in one account and another bucket in a

f. Can we have buckets with the same name in two different accounts in different regions?

g. What is bucket versioning? (If someone deleted a file stored on S3 and I want to retrieve that file, how do I do that?)

h. What is a delete marker? What happens when you delete the delete marker in S3?

i. How is the data stored on S3 encrypted? (Encryption of data at rest and data in transit)

j. In which different layers (storage classes) is the data stored? (To optimize the cost, how do I store my data so that it costs me less?)

k. How do I log when an object is accessed from the bucket, and how do I get notified for a few actions? (Event notification)

l. How do I automatically classify data into different layers to save cost?

m. What is Glacier storage? When do we use it?

n. How do I prevent users from deleting objects stored on S3?

o. How do we restrict users' access to S3 data?

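p. How do you give access to users from different AWS accounts? (Cross-account access via bucket policies or IAM roles; note that CORS governs cross-origin browser requests, not cross-account access)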

q. How do you monitor S3 bucket size, object counts, etc.?

r. I want to automatically remove data that has not been used, or will not be used, after 180 days. How do I do that?

s. How do I make sure that I do not lose data if a region failure occurs?
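
The questions above about moving data to cheaper layers and removing it after 180 days (items l and r) are commonly answered with an S3 lifecycle configuration. The following is a minimal sketch under stated assumptions: the helper name `build_lifecycle_configuration`, the rule ID, the `logs/` prefix, and the 90-day archive threshold are hypothetical choices, and only the 180-day figure comes from the question. The dictionary layout follows the format accepted by boto3's S3 `put_bucket_lifecycle_configuration` call, referenced only in a comment.

```python
def build_lifecycle_configuration(prefix="", archive_after_days=90, expire_after_days=180):
    """Build a lifecycle configuration that archives objects, then deletes them.

    Lifecycle rules act on object age (days since creation), not on when
    an object was last accessed.
    """
    if archive_after_days >= expire_after_days:
        raise ValueError("objects must be archived before they expire")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Status": "Enabled",
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = build_lifecycle_configuration(prefix="logs/")

# With credentials configured, the rule could be applied like this:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="<bucket-name>", LifecycleConfiguration=config)
print(config["Rules"][0]["Expiration"])
```

Because these rules are age-based, data whose access pattern is unknown may be better served by the S3 Intelligent-Tiering storage class, which moves objects between tiers according to access.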

6. RDS:

a. What is the difference between using RDS and using a database hosted on an on-premises system or in

b. What databases are available for RDS?

c. What is the difference between these databases? Which one have you used?

d. I have an operational system whose database is in RDS. How do I make sure my RDS instance

e. How do I handle load on an RDS instance?

f. How do I manage storage if my data is growing over time?

g. How do I make a backup of my database automatically?

h. How do I secure my data?

i. Who does patching, upgrades, and security scanning in the case of RDS?

j. Can you provide maintenance windows to AWS for RDS maintenance?

k. How do you monitor and debug usage and query performance?

l. How do you restrict access to an RDS instance?

m. What is the difference between SQL and NoSQL databases?

n. What is the difference between a snapshot and a backup?

o. What type of RDS instances have you used, and for what purpose (dev/test/prod)?
