Blog
Pricing
Amazon SQL Interview
Questions
Categories Amazon SQL Interviews Companies
Written by:
Nathan Rosidi
Author Bio
September 25th, 2023
Main Topics
The Data Science Role at Amazon
Technical Concepts Tested in Amazon SQL
Interview Questions
Amazon SQL Interview Question for Data
Scientist Position
Framework to Solve this Amazon SQL
Interview Question
Solution 1: Common Table Expressions
and INTERVAL function
Solution 2: Inner Query / Window
Function
Solution 3: CTEs and no analytic
functions
Comparison of Approaches
Conclusion
Share
Follow
In this article, we will discuss how to solve one of
the interesting Amazon SQL interview questions
using several different approaches!
Amazon is a company that requires no introduction.
Launched by Jeff Bezos in 1994 as an online
bookstore, Amazon is now the largest online retailer
in the world. As the company currently employs over
a million people in over 200 teams across nearly 30
branches and products, 1 out of every 153 American
workers can call themselves an Amazon employee.
With its continuous growth and the aspiration to
innovate and automate, Amazon is always looking
for talented data scientists to join its team.
The Data Science Role at
Amazon
The exact tasks of a data scientist at Amazon highly
depend on the speciOc team a person works in. For
instance, a data scientist position at a retail
department might concern recommender systems
while the same position in the logistics team may
deal more with optimization techniques.
However, we can expect that most of Amazon's data
scientists will share tasks such as making
predictions and providing forecasting insights using
models and machine learning or deep learning
solutions. Aside from data scientists, Amazon also
employs data analysts whose role is to interpret
data, build and maintain dashboard, and design data
pipelines.
Check our Amazon Data Scientist Interview Guide to
get insight into the interview process, tips and tricks
to ace the interview, and most importantly, the kind
of questions asked in the interviews at Amazon.
Technical Concepts Tested in
Amazon SQL Interview
Questions
As the knowledge and understanding of SQL are the
key requirements for both data scientists and data
analysts at Amazon, many interview questions
require writing solutions using this language. In fact,
Amazon asks much less questions about theoretical
concepts and their products when compared to
other tech companies. This is why it’s crucial to
practice solving SQL interview questions before an
Amazon interview.
The Amazon SQL interview questions test a wide
range of SQL concepts but some notions appear in
them more often. Nearly 70% of Amazon SQL
interview questions concern data stores in multiple
tables and ask for merging the data using JOIN
clauses, Common Table Expressions or subqueries.
Another highly prevailing concept is data
aggregation using SQL functions such as COUNT of
SUM in combination with the GROUP BY clause or
using the more complicated window functions.
Many questions also require data Oltering and
sorting using the WHERE and ORDER BY clauses.
Amazon SQL Interview
Question for Data Scientist
Position
Revenue Over Time
The SQL interview question that we are going to
examine in detail in this article has been asked
during an interview at Amazon. With the title
“Revenue Over Time”, it challenges the candidate to
calculate the rolling average of total revenue from
Amazon purchases using SQL and based on the
rolling average deOnition provided in the question.
Link:
https://platform.stratascratch.com/coding/10314-
revenue-over-time
Multiple Solutions to Data Sci…
We’re being asked to Ond the 3-month rolling
average of total revenue from purchases given a
table with users, their purchase amount, and date
purchased. What’s more, we shouldn’t include
returns which are represented by negative purchase
values and we need to output the month and 3-
month rolling average of revenue, sorted from
earliest month to latest month.
Furthermore, this Amazon SQL interview question
speciOes that a 3-month rolling average is deOned by
calculating the average total revenue from all user
purchases for the current month and previous two
months. The Orst two months will not be a true 3-
month rolling average since we are not given data
from last year.
This is one of the hard Amazon SQL interview
questions because it deals with a difcult concept of
a rolling average and it has many additional
requirements and restrictions that we need to
account for. But it’s also one of the important
Amazon SQL interview questions because
calculating a rolling average is a common data
science problem. There are also multiple ways in
which this can be solved using SQL and that’s why
we will cover several different approaches.
Framework to Solve this Amazon
SQL Interview Question
Let's make the process of solving our interview
question easier. We'll follow an easy framework that
can be applied to any data science interview
problem. This framework has only three easy steps
but it creates a logical pipeline for approaching any
data science problems concerning writing code for
manipulating data. The three steps are following:
1. Understand your data:
1. First, you have to list your assumptions
about the data columns so that you know
which columns you need to use
2. If you still don’t feel conOdent you
understand your data enough, view the
Orst couple of rows of your data (single
or multiple tables). Or if in an interview,
ask your interviewer for some example
values to understand the actual data, not
just the columns. It’ll help you identify
edge cases and limit your solution to the
bounds of your assumption.
2. Formulate your approach:
1. Write down the logical steps you are
supposed to program/code.
2. Now, identify the main functions you are
going to use/implement to perform the
logic.
3. Interviewers will be watching you; they
will intervene when needed, make sure
you ask them to clarify any ambiguity,
they can also specify if you'll use ready-
made functions, or you should write code
from scratch.
3. Code Execution:
1. Build up your code but don't oversimplify
or overcomplicate.
2. I like to build it in steps based on the
steps I’ve outlined with the interviewer.
That means that the code is probably
won't be efcient. That’s One. Here you
will be talking about optimization at the
end with the interviewer.
3. The most important point here is not to
overcomplicate your code with multiple
logical statements and rules in your each
code block. A block of code can be
deOned as a CTE
or a subquery because it’s self-contained
and separate from the rest of the code.
4. You need to speak up! Talk as you’re
laying down code. Remember that your
interviewer will be evaluating your
problem-solving skills.
Understand your data
The Orst step, according to the framework, is to
understand the data. In this case, there is only one,
rather simple table called amazon_purchases. It has
three columns: user_id, created_at and
purchase_amt.
At an actual interview, we’re not usually given any
speciOc data but we can expect that each row in this
table represents a single purchase where for each
purchase we have the ID of the user who made it, the
date when it happened and the amount paid for it.
This Amazon SQL interview question also informs
us that it is possible for a value in the purchase_amt
to be negative which means that a user purchased a
product but then returned it.
amazon_purchases
user_id int
created_at datetime
purchase_amt int
Table: amazon_purchases
user_id created_at purchase_amt
10 2020-01-01 3742
11 2020-01-04 1290
12 2020-01-07 4249
13 2020-01-10 4899
14 2020-01-13 -4656
Show all Toggle dTypes
In many cases, we can solve an interview question
without using the entirety of the table. We can go
back to the task description and try to predict which
columns will be used in our code. Here, we’re
interested in the rolling average of total revenue
from purchases and thus, we’ll need the dates and
prices of the transactions. This means that we’ll only
focus on columns created_at and purchase_amt
while the column user_id won’t be particularly useful
for us.
Solution 1: Common Table
Expressions and INTERVAL
function
Formulate Approach
Let’s jump right into the Orst approach to solving this
Amazon SQL interview question! In this solution, we
will use the concept of Common Table Expressions,
or CTEs for short and an SQL function INTERVAL
that will come very handy to calculate the rolling
average in a simple way. But before we start writing
any code, let’s formulate our approach and write
down a few general steps that we’ll follow in the
solution.
1. Starting with what we have now, just separate
rows for all purchases, let’s start by
aggregating the data. And since before we can
calculate the rolling average we need to know
the average revenue for each month, the Orst
step will be to create a new table with only two
columns: the month and the average revenue
for this respective month.
And what’s more, let’s store this table in the
form of a CTE so this Common Table
Expression. The CTE is basically a special
syntax that allows to deOne a query, give it an
alias and then reuse it multiple times. It’s
somewhat similar to a subquery but it can
easily be used more than once and for many
people, the CTE syntax is much clearer and
easier to read. You’ll see exactly how it looks
once we start writing the code, for now let’s
just say that as the Orst step, we want to deOne
a CTE with total revenue per month.
2. When aggregating by month, we need to
remember about something that may not seem
obvious at Orst, namely, date formatting. We
can assume that the values in the created_at
column include the year, month and day. In
reality, however, the dates may be stored in
other formats or may not be consistent and we
should account for all of it. For now, let’s just
write down that we need to ensure the correct
encoding of dates and once we get to writing
the code, we can Ogure out how to solve it.
3. Remember that this Amazon SQL interview
question asks us to disregard the negative
purchase values so we can Olter them out in
the CTE from step one. The third step will
therefore be to edit the query in the CTE in
such a way that it won’t output the negative
amounts.
4. Now, we have a table with monthly revenues
that we can simply reuse using the CTE. How
to get from it to a rolling average? We will need
to know, for each month, not only the revenue
of this month but also of the 2 previous
months. To achieve this, we can use our CTE
query and merge it with itself. You can think of
it as taking the two columns that we already
have: month and revenue, and we’ll add to it
the revenue column but we’ll shift all the values
by one row. Then let’s repeat it one more time
and in each row we’ll have 3 different revenues
from the past 3 months.
5. Having this, we can simply add a SELECT
clause to turn these three separate columns
with revenues into a single column with their
average.
o. The very Onal step will again be about
adjusting the format to what is required by the
interview question. It says the output should
be sorted from earliest month to latest month,
so that’s what we need to do in the end.
Code Execution
Once we have written down the general steps for
solving this task, we can get to coding. The Orst
thing is to write a query returning total revenue per
month. We can start by selecting the column
created_at and the sum of the column
purchase_amt, both from the table
amazon_purchases. Since we use the aggregation
function SUM(), we need to add a GROUP BY
statement. Here, we aggregate by date so we say
GROUP BY created_at.
SELECT created_at,
sum(purchase_amt) AS revenue_month
FROM amazon_purchases
GROUP BY created_at
PostgreSQL
1 SELECT created_at,
2 sum(purchase_amt) AS
revenue_month
3 FROM amazon_purchases
4 GROUP BY created_at
Reset Run Code
Use ⌥ + Enter to run query
Highlight some code to execute selection
only
If we run it, you can see that not much has changed.
That’s because we’re not really aggregating by
month but by day - it has to do with how the date is
stored. And this is why we have this second step,
ensuring the proper encoding of date. So let’s
change the format of the date to year and month
only. We can start by casting the column created_at
to a date format using the two colons (::). In this very
case if we don’t do it, the solution will still work
because all values in this column happen to be
correctly formatted as dates, but we need to
anticipate possible errors in data. Once we have it,
we can use the to_char() function to convert the
date into the format we want, in this case YYYY-MM
which means year followed by month. Let’s give this
column an alias ‘month_year’ and remember to
update this alias in the GROUP BY clause.
SELECT to_char(created_at::date, 'YYYY-MM') AS month_y
sum(purchase_amt) AS revenue_month
FROM amazon_purchases
GROUP BY month_year
The results look much better now, we have each
month only once and the total monthly revenue is
calculated. But before moving on, let’s change one
more thing. Later on, we will use the INTERVAL
function and for this function, we need to have the
date together with a day. So we can change this
year-month construction that we have now, to year-
month-01, as in the Orst day of each month. This
won’t change the data, just a format in which the
date is stored. We can do it with a to_date() function
and specifying the desired format.
SELECT to_date(to_char(created_at::date, 'YYYY-MM'),'Y
AS month_year,
sum(purchase_amt) AS revenue_month
FROM amazon_purchases
GROUP BY month_year