0% found this document useful (0 votes)

12 views1 page

Amazon SQL Interview Questions - StrataScratch

Uploaded by

vinthk1992

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views1 page

Amazon SQL Interview Questions - StrataScratch

Uploaded by

vinthk1992

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Blog

Pricing

Amazon SQL Interview

Questions

Categories Amazon SQL Interviews Companies

Written by:
Nathan Rosidi

Author Bio

September 25th, 2023

Main Topics

The Data Science Role at Amazon

Technical Concepts Tested in Amazon SQL
Interview Questions
Amazon SQL Interview Question for Data
Scientist Position
Framework to Solve this Amazon SQL
Interview Question
Solution 1: Common Table Expressions
and INTERVAL function
Solution 2: Inner Query / Window
Function
Solution 3: CTEs and no analytic
functions
Comparison of Approaches
Conclusion

In this article, we will discuss how to solve one of

the interesting Amazon SQL interview questions
using several different approaches!

Amazon is a company that requires no introduction.

Launched by Jeff Bezos in 1994 as an online
bookstore, Amazon is now the largest online retailer
in the world. As the company currently employs over
a million people in over 200 teams across nearly 30
branches and products, 1 out of every 153 American
workers can call themselves an Amazon employee.
With its continuous growth and the aspiration to
innovate and automate, Amazon is always looking
for talented data scientists to join its team.

The Data Science Role at

Amazon

The exact tasks of a data scientist at Amazon highly

depend on the speciOc team a person works in. For
instance, a data scientist position at a retail
department might concern recommender systems
while the same position in the logistics team may
deal more with optimization techniques.

However, we can expect that most of Amazon's data

scientists will share tasks such as making
predictions and providing forecasting insights using
models and machine learning or deep learning
solutions. Aside from data scientists, Amazon also
employs data analysts whose role is to interpret
data, build and maintain dashboard, and design data
pipelines.

Check our Amazon Data Scientist Interview Guide to

get insight into the interview process, tips and tricks
to ace the interview, and most importantly, the kind
of questions asked in the interviews at Amazon.

Technical Concepts Tested in

Amazon SQL Interview
Questions

As the knowledge and understanding of SQL are the

key requirements for both data scientists and data
analysts at Amazon, many interview questions
require writing solutions using this language. In fact,
Amazon asks much less questions about theoretical
concepts and their products when compared to
other tech companies. This is why it’s crucial to
practice solving SQL interview questions before an
Amazon interview.

The Amazon SQL interview questions test a wide

range of SQL concepts but some notions appear in
them more often. Nearly 70% of Amazon SQL
interview questions concern data stores in multiple
tables and ask for merging the data using JOIN
clauses, Common Table Expressions or subqueries.
Another highly prevailing concept is data
aggregation using SQL functions such as COUNT of
SUM in combination with the GROUP BY clause or
using the more complicated window functions.
Many questions also require data Oltering and
sorting using the WHERE and ORDER BY clauses.

Amazon SQL Interview

Question for Data Scientist
Position

Revenue Over Time

The SQL interview question that we are going to

examine in detail in this article has been asked
during an interview at Amazon. With the title
“Revenue Over Time”, it challenges the candidate to
calculate the rolling average of total revenue from
Amazon purchases using SQL and based on the
rolling average deOnition provided in the question.

Link:
https://platform.stratascratch.com/coding/10314-
revenue-over-time

Multiple Solutions to Data Sci…

We’re being asked to Ond the 3-month rolling

average of total revenue from purchases given a
table with users, their purchase amount, and date
purchased. What’s more, we shouldn’t include
returns which are represented by negative purchase
values and we need to output the month and 3-
month rolling average of revenue, sorted from
earliest month to latest month.

Furthermore, this Amazon SQL interview question

speciOes that a 3-month rolling average is deOned by
calculating the average total revenue from all user
purchases for the current month and previous two
months. The Orst two months will not be a true 3-
month rolling average since we are not given data
from last year.

This is one of the hard Amazon SQL interview

questions because it deals with a difcult concept of
a rolling average and it has many additional
requirements and restrictions that we need to
account for. But it’s also one of the important
Amazon SQL interview questions because
calculating a rolling average is a common data
science problem. There are also multiple ways in
which this can be solved using SQL and that’s why
we will cover several different approaches.

Framework to Solve this Amazon

SQL Interview Question

Let's make the process of solving our interview

question easier. We'll follow an easy framework that
can be applied to any data science interview
problem. This framework has only three easy steps
but it creates a logical pipeline for approaching any
data science problems concerning writing code for
manipulating data. The three steps are following:

1. Understand your data:

1. First, you have to list your assumptions
about the data columns so that you know
which columns you need to use
2. If you still don’t feel conOdent you
understand your data enough, view the
Orst couple of rows of your data (single
or multiple tables). Or if in an interview,
ask your interviewer for some example
values to understand the actual data, not
just the columns. It’ll help you identify
edge cases and limit your solution to the
bounds of your assumption.
2. Formulate your approach:
1. Write down the logical steps you are
supposed to program/code.
2. Now, identify the main functions you are
going to use/implement to perform the
logic.
3. Interviewers will be watching you; they
will intervene when needed, make sure
you ask them to clarify any ambiguity,
they can also specify if you'll use ready-
made functions, or you should write code
from scratch.
3. Code Execution:
1. Build up your code but don't oversimplify
or overcomplicate.
2. I like to build it in steps based on the
steps I’ve outlined with the interviewer.
That means that the code is probably
won't be efcient. That’s One. Here you
will be talking about optimization at the
end with the interviewer.
3. The most important point here is not to
overcomplicate your code with multiple
logical statements and rules in your each
code block. A block of code can be
deOned as a CTE
or a subquery because it’s self-contained
and separate from the rest of the code.
4. You need to speak up! Talk as you’re
laying down code. Remember that your
interviewer will be evaluating your
problem-solving skills.

Understand your data

The Orst step, according to the framework, is to

understand the data. In this case, there is only one,
rather simple table called amazon_purchases. It has
three columns: user_id, created_at and
purchase_amt.

At an actual interview, we’re not usually given any

speciOc data but we can expect that each row in this
table represents a single purchase where for each
purchase we have the ID of the user who made it, the
date when it happened and the amount paid for it.
This Amazon SQL interview question also informs
us that it is possible for a value in the purchase_amt
to be negative which means that a user purchased a
product but then returned it.

amazon_purchases

user_id int
created_at datetime
purchase_amt int

Table: amazon_purchases

user_id created_at purchase_amt

10 2020-01-01 3742

11 2020-01-04 1290

12 2020-01-07 4249

13 2020-01-10 4899

14 2020-01-13 -4656

Show all Toggle dTypes

In many cases, we can solve an interview question

without using the entirety of the table. We can go
back to the task description and try to predict which
columns will be used in our code. Here, we’re
interested in the rolling average of total revenue
from purchases and thus, we’ll need the dates and
prices of the transactions. This means that we’ll only
focus on columns created_at and purchase_amt
while the column user_id won’t be particularly useful
for us.

Solution 1: Common Table

Expressions and INTERVAL
function

Formulate Approach

Let’s jump right into the Orst approach to solving this

Amazon SQL interview question! In this solution, we
will use the concept of Common Table Expressions,
or CTEs for short and an SQL function INTERVAL
that will come very handy to calculate the rolling
average in a simple way. But before we start writing
any code, let’s formulate our approach and write
down a few general steps that we’ll follow in the
solution.

1. Starting with what we have now, just separate

rows for all purchases, let’s start by
aggregating the data. And since before we can
calculate the rolling average we need to know
the average revenue for each month, the Orst
step will be to create a new table with only two
columns: the month and the average revenue
for this respective month.
And what’s more, let’s store this table in the
form of a CTE so this Common Table
Expression. The CTE is basically a special
syntax that allows to deOne a query, give it an
alias and then reuse it multiple times. It’s
somewhat similar to a subquery but it can
easily be used more than once and for many
people, the CTE syntax is much clearer and
easier to read. You’ll see exactly how it looks
once we start writing the code, for now let’s
just say that as the Orst step, we want to deOne
a CTE with total revenue per month.
2. When aggregating by month, we need to
remember about something that may not seem
obvious at Orst, namely, date formatting. We
can assume that the values in the created_at
column include the year, month and day. In
reality, however, the dates may be stored in
other formats or may not be consistent and we
should account for all of it. For now, let’s just
write down that we need to ensure the correct
encoding of dates and once we get to writing
the code, we can Ogure out how to solve it.
3. Remember that this Amazon SQL interview
question asks us to disregard the negative
purchase values so we can Olter them out in
the CTE from step one. The third step will
therefore be to edit the query in the CTE in
such a way that it won’t output the negative
amounts.
4. Now, we have a table with monthly revenues
that we can simply reuse using the CTE. How
to get from it to a rolling average? We will need
to know, for each month, not only the revenue
of this month but also of the 2 previous
months. To achieve this, we can use our CTE
query and merge it with itself. You can think of
it as taking the two columns that we already
have: month and revenue, and we’ll add to it
the revenue column but we’ll shift all the values
by one row. Then let’s repeat it one more time
and in each row we’ll have 3 different revenues
from the past 3 months.
5. Having this, we can simply add a SELECT
clause to turn these three separate columns
with revenues into a single column with their
average.
o. The very Onal step will again be about
adjusting the format to what is required by the
interview question. It says the output should
be sorted from earliest month to latest month,
so that’s what we need to do in the end.

Code Execution

Once we have written down the general steps for

solving this task, we can get to coding. The Orst
thing is to write a query returning total revenue per
month. We can start by selecting the column
created_at and the sum of the column
purchase_amt, both from the table
amazon_purchases. Since we use the aggregation
function SUM(), we need to add a GROUP BY
statement. Here, we aggregate by date so we say
GROUP BY created_at.

SELECT created_at,
sum(purchase_amt) AS revenue_month
FROM amazon_purchases
GROUP BY created_at

PostgreSQL

1 SELECT created_at,
2 sum(purchase_amt) AS
revenue_month
3 FROM amazon_purchases
4 GROUP BY created_at

Reset Run Code

Use ⌥ + Enter to run query

Highlight some code to execute selection
only

If we run it, you can see that not much has changed.
That’s because we’re not really aggregating by
month but by day - it has to do with how the date is
stored. And this is why we have this second step,
ensuring the proper encoding of date. So let’s
change the format of the date to year and month
only. We can start by casting the column created_at
to a date format using the two colons (::). In this very
case if we don’t do it, the solution will still work
because all values in this column happen to be
correctly formatted as dates, but we need to
anticipate possible errors in data. Once we have it,
we can use the to_char() function to convert the
date into the format we want, in this case YYYY-MM
which means year followed by month. Let’s give this
column an alias ‘month_year’ and remember to
update this alias in the GROUP BY clause.

SELECT to_char(created_at::date, 'YYYY-MM') AS month_y

sum(purchase_amt) AS revenue_month
FROM amazon_purchases
GROUP BY month_year

The results look much better now, we have each

month only once and the total monthly revenue is
calculated. But before moving on, let’s change one
more thing. Later on, we will use the INTERVAL
function and for this function, we need to have the
date together with a day. So we can change this
year-month construction that we have now, to year-
month-01, as in the Orst day of each month. This
won’t change the data, just a format in which the
date is stored. We can do it with a to_date() function
and specifying the desired format.

SELECT to_date(to_char(created_at::date, 'YYYY-MM'),'Y

AS month_year,
sum(purchase_amt) AS revenue_month
FROM amazon_purchases
GROUP BY month_year

Event Tree Analysis in Risk Management
No ratings yet
Event Tree Analysis in Risk Management
9 pages
BHULEKH250705221736850
No ratings yet
BHULEKH250705221736850
20 pages
ComColor Black FW1230 - Brochure
No ratings yet
ComColor Black FW1230 - Brochure
8 pages
Unit 1
No ratings yet
Unit 1
83 pages
DN 4.0 - Java FSE Mandatory Hands-On Detail
No ratings yet
DN 4.0 - Java FSE Mandatory Hands-On Detail
6 pages
IP Logger Statistics View
100% (1)
IP Logger Statistics View
1 page
Pseudocode - L1, L2
No ratings yet
Pseudocode - L1, L2
12 pages
Ms Access Practical Questions
93% (15)
Ms Access Practical Questions
2 pages
Agile Dev Plan for Project Teams
No ratings yet
Agile Dev Plan for Project Teams
14 pages
3-Sisteme de Marcare, Unelte Și Materiale de Instalare
No ratings yet
3-Sisteme de Marcare, Unelte Și Materiale de Instalare
696 pages
Creating Personal Narrative Presentations
No ratings yet
Creating Personal Narrative Presentations
13 pages
Case Study ISS
No ratings yet
Case Study ISS
5 pages
AWS Redshift Infographic Final
No ratings yet
AWS Redshift Infographic Final
1 page
Ukm
No ratings yet
Ukm
13 pages
Ricoh MP 401
No ratings yet
Ricoh MP 401
616 pages
Installing Mgmtconsole Accessserver
No ratings yet
Installing Mgmtconsole Accessserver
56 pages
Siemens MC45 (DIL/NetPC DNP/9200 - microHOWTO)
No ratings yet
Siemens MC45 (DIL/NetPC DNP/9200 - microHOWTO)
2 pages
ND PBT 1028 1155 1158
No ratings yet
ND PBT 1028 1155 1158
38 pages
Software Engineering Question Bank 2019-20
No ratings yet
Software Engineering Question Bank 2019-20
21 pages
Key IT Innovations and Trends Explained
No ratings yet
Key IT Innovations and Trends Explained
6 pages
The Rise of Fog Computing in The Digital Era PDF
100% (1)
The Rise of Fog Computing in The Digital Era PDF
305 pages
Web Design Tips for Beginners
No ratings yet
Web Design Tips for Beginners
4 pages
MIS Notes Unit 1 PDF
No ratings yet
MIS Notes Unit 1 PDF
19 pages
Data Base Management System
No ratings yet
Data Base Management System
29 pages
Understanding DevOps Principles and Practices
100% (1)
Understanding DevOps Principles and Practices
6 pages
1 s2.0 S0965997814000775 Main
No ratings yet
1 s2.0 S0965997814000775 Main
16 pages
Dattatrya Synopsis 1
No ratings yet
Dattatrya Synopsis 1
6 pages
Full Stack Developer Interview Preparation - Leavoda Inc
No ratings yet
Full Stack Developer Interview Preparation - Leavoda Inc
5 pages
Momentum
No ratings yet
Momentum
469 pages
LOGIQ-P10-XDclear Brochure Giu Glob jb03032xx March2021
No ratings yet
LOGIQ-P10-XDclear Brochure Giu Glob jb03032xx March2021
5 pages

Amazon SQL Interview Questions - StrataScratch

Uploaded by

Amazon SQL Interview Questions - StrataScratch

Uploaded by

Blog

Amazon SQL Interview

Categories Amazon SQL Interviews Companies

September 25th, 2023

The Data Science Role at Amazon

In this article, we will discuss how to solve one of

Amazon is a company that requires no introduction.

The Data Science Role at

The exact tasks of a data scientist at Amazon highly

However, we can expect that most of Amazon's data

Check our Amazon Data Scientist Interview Guide to

Technical Concepts Tested in

As the knowledge and understanding of SQL are the

The Amazon SQL interview questions test a wide

Amazon SQL Interview

Revenue Over Time

The SQL interview question that we are going to

Multiple Solutions to Data Sci…

We’re being asked to Ond the 3-month rolling

Furthermore, this Amazon SQL interview question

This is one of the hard Amazon SQL interview

Framework to Solve this Amazon

Let's make the process of solving our interview

1. Understand your data:

Understand your data

The Orst step, according to the framework, is to

At an actual interview, we’re not usually given any

user_id created_at purchase_amt

Show all Toggle dTypes

In many cases, we can solve an interview question

Solution 1: Common Table

Let’s jump right into the Orst approach to solving this

1. Starting with what we have now, just separate

Once we have written down the general steps for

Reset Run Code

Use ⌥ + Enter to run query

SELECT to_char(created_at::date, 'YYYY-MM') AS month_y

The results look much better now, we have each

SELECT to_date(to_char(created_at::date, 'YYYY-MM'),'Y

You might also like