0% found this document useful (0 votes)
164 views9 pages

DoorDash SQL Interview Question

The document presents a SQL interview question for DoorDash regarding the analysis of extremely late deliveries, defined as orders delivered more than 20 minutes past the predicted time. It outlines a structured approach for the solution, including dataset exploration, assumptions, and a step-by-step SQL coding framework to calculate the percentage of extremely late deliveries by month. The document emphasizes the importance of this metric for customer satisfaction and business performance.

Uploaded by

yvettegyy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
164 views9 pages

DoorDash SQL Interview Question

The document presents a SQL interview question for DoorDash regarding the analysis of extremely late deliveries, defined as orders delivered more than 20 minutes past the predicted time. It outlines a structured approach for the solution, including dataset exploration, assumptions, and a step-by-step SQL coding framework to calculate the percentage of extremely late deliveries by month. The document emphasizes the importance of this metric for customer satisfaction and business performance.

Uploaded by

yvettegyy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

DoorDash SQL Interview Question​

Extremely Late Delivery​


The question we want to talk about is this one.​

If you have researched the company beforehand (highly recommended!), this is the time to
demonstrate your understanding of the business.​

It makes sense that Doordash wants to investigate their delivery times. Especially in the food
industry, on-time delivery is crucial to ensuring customer satisfaction. Aside from this, it also
affects food quality and safety. Food needs to be served fresh!​

Imagine you’re a data analyst at DoorDash, and Management wants to know, for every month,
what percentage of their orders are ‘extremely late deliveries’. A delivery is considered
‘extremely late’ if the order is delivered more than 20 minutes late of the predicted delivery
time.​

Intuitively, we would monitor the number of ‘extremely late’ deliveries month on month.
However, we cannot look at this number alone. The total deliveries will change monthly and
fluctuate based on season, holidays, and due to the general growth of the company. A better
metric to track is the percentage of total orders that are ‘extremely late’. This is exactly what
this DoorDash SQL interview question is asking us to do.​
Ideally, we want to see this metric lower over time or at least, be kept within a certain threshold.
Otherwise, this is a cause for concern, leading to low ratings, less repeat business, and poor food
quality. Sooner than you think, the company is out of business. And you’re out of your job!​

We can’t prepare you for losing your job, but we can for getting one!​
Solution Approach Framework to Solve this DoorDash SQL
Interview Question​

Now that you understand the problem you need to solve, let’s start formulating the solution.​

Using a framework for that will help you structure your thoughts and communicate them in a
way that the interviewer can follow. This is important because oftentimes, it is not only your
coding skills that are being tested but your communication skills as well.​

The framework we recommend consists of the following steps.​


1. Explore and understand the dataset​
2. Confirm and state your assumptions​
3. Outline your high-level approach in English or pseudo-code​
4. Code the solution​
5. Code optimizations (if there are any)​

If you are not yet familiar with this, go check out this video where all the steps are explained in
detail.​

1. Explore the Dataset​


We are provided with the delivery_orders table showing the delivery ID, the time when the
order was placed, when it was predicted to be delivered, when it was actually delivered, the
rating provided to that delivery, as well as the IDs of the dasher (rider), the restaurant, and the
consumer (customer).​

Here’s the table structure.​

A preview of the table shows the following data.​


Table: delivery_orders​
Show allToggle dTypes​
Once an order is placed, it is assigned a unique delivery ID, and the relevant details about the
order like the customer, the restaurant, the date, and time of order are then logged. The delivery
time/expected time of arrival of delivery is computed, and the order is assigned to a
dasher/rider whose ID is also included in the table. Once the order is delivered, the exact time is
logged, and the customer provides their feedback through a rating.​
2. Confirm and State Your Assumptions​
The question doesn’t offer info on what happens if the order is canceled or the delivery is in
transit, i.e., placed but not yet delivered.

You can clarify this with the interviewer. We don’t have this option, so we’ll assume that the
dataset contains only fulfilled orders.​
3. Outline the High-Level Approach​
Now, let’s unpack what is required to get the target output table. This DoorDash SQL interview
question asks us to show the proportion of extremely late orders for each month.

The output should show months in a YYYY-MM format, with the corresponding percentage of
extremely late orders in the second column.​

To get the dates in a desired format, we need to extract the month and year information from
the order_placed_date column.​
As for the percentage, it requires calculating the total number of late orders and the total
number of orders for every month.​

Getting to the number of total orders is easy: we only need to count the number of unique
delivery IDs. But what about the total number of extremely late orders per month?​

In order to determine whether an order is an ‘extremely’ late order or not, we will have to
compare the columns predicted_delivery_time and actual_delivery_time. If the actual delivery
time exceeds the predicted delivery time by more than 20 minutes, then we have an ‘extremely
late’ order.​

To weed the extremely late orders, we’ll use the CASE statement.​

Turning this verbose description into steps:​


1. TO_CHAR() – extract month and year from the column order_placed_date in the format YYYY-
MM​
2. CASE statement – identify extremely late deliveries at the row level​
3. EXTRACT() – get the difference between the actual and predicted delivery time in minutes​
4. SUM() – get the total number of the extremely late deliveries​
5. COUNT() and DISTINCT – get the total number of all deliveries; divide the previous step to get
the percentage​
6. CAST() - get the decimal point in the percentage (FLOAT)​
7. GROUP BY – get the output by month​

Once you list all the steps, check them with the interviewer. It helps you get feedback early on. If
all’s good, you can start coding.​
4. Code the Solution​
Coding will now simply mean translating the above steps into an SQL code.​
1. Extract Month and Year​
We will use the TO_CHAR() function to get data in the desired format.​
1 SELECT TO_CHAR(order_placed_time,'YYYY-MM') AS year_month
2 FROM delivery_orders;

2. Identify the Extremely Late Deliveries​


The CASE statement is an IF-THEN-ELSE statement in SQL that will allow us to identify the
extremely late deliveries.​
1 SELECT TO_CHAR(order_placed_time, 'YYYY-MM') AS year_month,
2 CASE
3 WHEN (prediced_delivery_time - actual_delivery_time) > 20 THEN 1
4 ELSE 0
5 END
6 FROM delivery_orders;

However, for this to work, we need to extract minutes and subtract them.​
3. Show Difference in Minutes​
In this step we’ll use the EXTRACT() function. The code looks like this so far.​
1 SELECT TO_CHAR(order_placed_time, 'YYYY-MM') AS year_month,
2 CASE
3 WHEN EXTRACT(MINUTE
4 FROM (actual_delivery_time - predicted_delivery_time)) >
5 ELSE 0
6 END
7 FROM delivery_orders;

4. Number of Extremely Late Deliveries​


The CASE statement will show integer 1 besides every extremely late delivery. To get their
number, we use the SUM() aggregate function.​
1 SELECT TO_CHAR(order_placed_time, 'YYYY-MM') AS year_month,
2 SUM(CASE
3 WHEN EXTRACT(MINUTE
4 FROM (actual_delivery_time - predicted_delivery_time
5 ELSE 0
6 END)
7 FROM delivery_orders;
5. Number of All Deliveries​
The next step is to find the number of all unique deliveries using COUNT() and DISTINCT. Also,
the number of extremely late deliveries has to be divided by this number.​
1 SELECT TO_CHAR(order_placed_time, 'YYYY-MM') AS year_month,
2 SUM(CASE
3 WHEN EXTRACT(MINUTE
4 FROM (actual_delivery_time - predicted_delivery_time
5 ELSE 0
6 END)/COUNT(DISTINCT delivery_id)*100
7 FROM delivery_orders;

6. Convert to FLOAT​
The above get will show the division result as an integer. Namely, as zero because the number of
extremely late deliveries is lower than total deliveries.​

To get around this, you can use the CAST() function. We’ll use the double-colon (::), which is a
shorthand for this function.​
1 SELECT TO_CHAR(order_placed_time, 'YYYY-MM') AS year_month,
2 SUM(CASE
3 WHEN EXTRACT(MINUTE
4 FROM (actual_delivery_time - predicted_delivery_time
5 ELSE 0
6 END)/COUNT(DISTINCT delivery_id)::FLOAT*100 AS perc_extremely_late
7 FROM delivery_orders;

7. Group by Month and Year​


All now remains is to use the GROUP BY clause, and you got yourself the full answer to this
DoorDash SQL interview question.​
1 SELECT TO_CHAR(order_placed_time, 'YYYY-MM') AS year_month,
2 SUM(CASE
3 WHEN EXTRACT(MINUTE
4 FROM (actual_delivery_time - predicted_delivery_time
5 ELSE 0
6 END)/COUNT(DISTINCT delivery_id)::FLOAT*100 AS perc_extremely_late
7 FROM delivery_orders
8 GROUP BY 1;

This result calls for attention as Doordash has some months with a high proportion of extremely
late deliveries. The worst performance was in January 2022, with around 36% of the orders
being extremely late. Likewise, November 2021 also had a troubling delivery performance at
31%. With this, the company management would want to investigate the root cause so that
extremely late deliveries are kept to a minimum; otherwise, the long wait might be shunning
away Doordash’s valuable customers!​
5. Code Optimizations​
Whether you’ll need this step or not depends on the code itself. Some codes don’t require
optimization, while others do. Interviewers often like to ask for a code to be optimized. The
interviewees’ ability to do that shows that they can write a code that works and an efficient
one.​

If you have more time left at the end of the interview, think about how you can improve the
solution, even if the interviewer doesn’t ask you to optimize it.​

Regarding our DoorDash question solution, we can improve it. A handy trick is to use the
Common Table Expressions or CTEs.​

If there are column transformations or the code is more complex, it can make your solution
harder to understand and debug in case of errors. CTEs are especially helpful here!​
By way of writing a CTE before aggregating the final results, we can optimize our solution.​
1 WITH orders AS
2 (SELECT TO_CHAR(order_placed_time, 'YYYY-MM') AS year_month,
3 delivery_id,
4 CASE
5 WHEN EXTRACT(MINUTE
6 FROM (actual_delivery_time - predicted_delivery_time)) > 20 THEN 1
7 ELSE 0
8 END AS extremely_late
9 FROM delivery_orders)
10
11 SELECT year_month,
12 SUM(extremely_late)/COUNT(DISTINCT delivery_id)::FLOAT*100 AS perc_extrem
13 FROM orders
14 GROUP BY 1;

You might also like