1.
Import the dataset and do usual exploratory analysis steps like checking the
structure & characteristics of the dataset
Q1. Data type of columns in a table
Ans:
In our dataset we have following data types:
STRING, INTEGER, FLOAT, TIMESTAMP.
Q2.Time period for which the data is given
Ans:
Data is given from 2016 to 2018 time period.
Q3. Cities and States of customers ordered during the given period.
Ans:
SELECT cs.customer_city,cs.customer_state,odr.order_purchase_timestamp
FROM `civic-surge-382404.Target_SQL.customers` as cs
Join `civic-surge-382404.Target_SQL.orders` as odr
on cs.customer_id = odr.customer_id
order by odr.order_purchase_timestamp desc
2. In-depth Exploration:
Q1. Is there a growing trend on e-commerce in Brazil? How can we describe a
complete scenario? Can we see some seasonality with peaks at specific months?
Ans:
SELECT count(order_id) order_count,
EXTRACT(YEAR FROM order_purchase_timestamp) AS year,
EXTRACT(MONTH FROM order_purchase_timestamp) AS month
FROM `civic-surge-382404.Target_SQL.orders` group by
month,year order by count(order_id) desc
No, there is not growing trend on e-commerce in Brazil. Compare to 2018 there was more
item purchased on E-commerce in year 2017.In the month on Nov-2017 highest order
happened – 7544.
Q2. What time do Brazilian customers tend to buy (Dawn, Morning, Afternoon or
Night)?
Ans:
select count(order_id) order_count,
EXTRACT(HOUR FROM order_purchase_timestamp) AS Hour
from `Target_SQL.orders`
group by hour order by order_count desc
- Night time Brazilian customers tend to buy.
3. Evolution of E-commerce orders in the Brazil region:
Q1. Get month on month orders by states
select count(odr.order_id)as order_count,cust.customer_state,
date_trunc(odr.order_purchase_timestamp, Month) AS Purchase_date,
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
group by purchase_date, cust.customer_state
order by cust.customer_state limit 100
Q2. Distribution of customers across the states in Brazil
select count(cust.customer_id)as customers,
cust.customer_state,
from `Target_SQL.customers` as cust
group by
cust.customer_state
order by cust.customer_state
4. Impact on Economy: Analyze the money movement by e-commerce by looking at
order prices, freight and others.
Q1. Get % increase in cost of orders from 2017 to 2018 (include months between Jan
to Aug only) - You can use “payment_value” column in payments table
with cte as
(SELECT EXTRACT(year from b.order_purchase_timestamp) as year, sum(a.payment_val
ue) as
totalcost_orders from `Target_SQL.payments` a
join Target_SQL.orders b
on a.order_id = b.order_id
where extract(month from b.order_purchase_timestamp) between 1 and 8
group by 1
order by 1),
cte1 as (select c.totalcost_orders as temp from cte c where c.year = 2017)
select year, totalcost_orders, case when year = 2017 then 0
else round(100*(totalcost_orders-
(select temp from cte1))/(select temp from cte1), 2)
end as percentincrease from cte
order by 1
Q2. Mean & Sum of price and freight value by customer state
select round(sum(odi.price),2)as price, round(sum(odi.freight_value),2)as freight_v
alue,round(
(sum((odi.price * odi.freight_value))/row_number() over()),2)as mean,
cust.customer_state from `Target_SQL.customers` cust
join `Target_SQL.orders` odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odi
on odi.order_id = odr.order_id
group by cust.customer_state;
5. Analysis on sales, freight and delivery time
Q1. Calculate days between purchasing, delivering and estimated delivery
select odr.order_id,DATE_DIFF(odr.order_estimated_delivery_date, odr.order_purchase
_timestamp, DAY) as purchaseDate_estimationDate,
DATE_DIFF(odr.order_delivered_customer_date, odr.order_purchase_timestamp, DAY) as
purchaseDate_deliverDate,
from `civic-surge-382404.Target_SQL.orders` as odr
where odr.order_delivered_customer_date is not null
Q2. Find time_to_delivery & diff_estimated_delivery. Formula for the same given below
Find time to delivery
select odr.order_id,
DATE_DIFF( odr.order_delivered_customer_date ,odr.order_purchase_timestamp, DAY) as
time_to_delivery,
from `civic-surge-382404.Target_SQL.orders` as odr
where odr.order_delivered_customer_date is not null
diff_estimated_delivery
select odr.order_id,
DATE_DIFF( odr.order_estimated_delivery_date,odr.order_delivered_customer_date, DAY
) as diff_estimated_delivery,
from `civic-surge-382404.Target_SQL.orders` as odr
where odr.order_delivered_customer_date is not null
Q3. Group data by state, take mean of freight_value, time_to_delivery,
diff_estimated_delivery
select cust.customer_state,
odri.freight_value,
DATE_DIFF(odr.order_estimated_delivery_date, odr.order_purchase_timestamp, DAY) as
purchaseDate_estimationDate,
DATE_DIFF(odr.order_delivered_customer_date, odr.order_purchase_timestamp, DAY) as
purchaseDate_deliverDate,
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state,
purchaseDate_estimationDate,
purchaseDate_deliverDate,
odri.freight_value having purchaseDate_deliverDate is not null
Q5: Top 5 states with highest average freight value - sort in desc limit 5
select cust.customer_state, avg(odri.freight_value) val
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state
order by val desc limit 5
Top 5 states with lowest average freight value - sort in asc limit 5
select cust.customer_state, avg(odri.freight_value) val
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state
order by val limit 5
Q6. Top 5 states with highest average time to delivery
-- Top 5 states with highest average time to delivery
select cust.customer_state,avg(
DATE_DIFF(odr.order_delivered_customer_date, odr.order_purchase_timestamp, DAY)) as
days,
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state having days is not null
order by days desc limit 5
Top 5 states with lowest average time to delivery
select cust.customer_state,avg(
DATE_DIFF(odr.order_delivered_customer_date, odr.order_purchase_timestamp, DAY)) as
days,
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state having days is not null
order by days limit 5
Q7. Top 5 states where delivery is really fast/ not so fast compared to estimated date.
select cust.customer_state,max(
DATE_DIFF(odr.order_delivered_customer_date, odr.order_purchase_timestamp, DAY)) as
days,
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state having days is not null
order by days limit 5