0% found this document useful (0 votes)
17 views9 pages

Project Target

The document outlines a comprehensive analysis of e-commerce data in Brazil from 2016 to 2018, detailing exploratory analysis, trends, customer behavior, and economic impacts. Key findings include a lack of growth in e-commerce orders, peak purchasing times at night, and various metrics related to order costs, delivery times, and freight values across different states. The document also includes SQL queries for extracting relevant insights from the dataset.

Uploaded by

asifbaruani
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views9 pages

Project Target

The document outlines a comprehensive analysis of e-commerce data in Brazil from 2016 to 2018, detailing exploratory analysis, trends, customer behavior, and economic impacts. Key findings include a lack of growth in e-commerce orders, peak purchasing times at night, and various metrics related to order costs, delivery times, and freight values across different states. The document also includes SQL queries for extracting relevant insights from the dataset.

Uploaded by

asifbaruani
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

1.

Import the dataset and do usual exploratory analysis steps like checking the
structure & characteristics of the dataset

Q1. Data type of columns in a table

Ans:
In our dataset we have following data types:
STRING, INTEGER, FLOAT, TIMESTAMP.

Q2.Time period for which the data is given

Ans:

Data is given from 2016 to 2018 time period.

Q3. Cities and States of customers ordered during the given period.

Ans:

SELECT cs.customer_city,cs.customer_state,odr.order_purchase_timestamp
FROM `civic-surge-382404.Target_SQL.customers` as cs
Join `civic-surge-382404.Target_SQL.orders` as odr
on cs.customer_id = odr.customer_id
order by odr.order_purchase_timestamp desc

2. In-depth Exploration:

Q1. Is there a growing trend on e-commerce in Brazil? How can we describe a


complete scenario? Can we see some seasonality with peaks at specific months?

Ans:

SELECT count(order_id) order_count,


EXTRACT(YEAR FROM order_purchase_timestamp) AS year,
EXTRACT(MONTH FROM order_purchase_timestamp) AS month
FROM `civic-surge-382404.Target_SQL.orders` group by
month,year order by count(order_id) desc
No, there is not growing trend on e-commerce in Brazil. Compare to 2018 there was more
item purchased on E-commerce in year 2017.In the month on Nov-2017 highest order
happened – 7544.

Q2. What time do Brazilian customers tend to buy (Dawn, Morning, Afternoon or
Night)?

Ans:
select count(order_id) order_count,
EXTRACT(HOUR FROM order_purchase_timestamp) AS Hour
from `Target_SQL.orders`
group by hour order by order_count desc

- Night time Brazilian customers tend to buy.


3. Evolution of E-commerce orders in the Brazil region:

Q1. Get month on month orders by states

select count(odr.order_id)as order_count,cust.customer_state,


date_trunc(odr.order_purchase_timestamp, Month) AS Purchase_date,
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
group by purchase_date, cust.customer_state
order by cust.customer_state limit 100

Q2. Distribution of customers across the states in Brazil

select count(cust.customer_id)as customers,


cust.customer_state,
from `Target_SQL.customers` as cust
group by
cust.customer_state
order by cust.customer_state
4. Impact on Economy: Analyze the money movement by e-commerce by looking at
order prices, freight and others.

Q1. Get % increase in cost of orders from 2017 to 2018 (include months between Jan
to Aug only) - You can use “payment_value” column in payments table

with cte as
(SELECT EXTRACT(year from b.order_purchase_timestamp) as year, sum(a.payment_val
ue) as
totalcost_orders from `Target_SQL.payments` a
join Target_SQL.orders b
on a.order_id = b.order_id
where extract(month from b.order_purchase_timestamp) between 1 and 8
group by 1
order by 1),
cte1 as (select c.totalcost_orders as temp from cte c where c.year = 2017)

select year, totalcost_orders, case when year = 2017 then 0


else round(100*(totalcost_orders-
(select temp from cte1))/(select temp from cte1), 2)
end as percentincrease from cte
order by 1

Q2. Mean & Sum of price and freight value by customer state

select round(sum(odi.price),2)as price, round(sum(odi.freight_value),2)as freight_v


alue,round(
(sum((odi.price * odi.freight_value))/row_number() over()),2)as mean,
cust.customer_state from `Target_SQL.customers` cust
join `Target_SQL.orders` odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odi
on odi.order_id = odr.order_id
group by cust.customer_state;
5. Analysis on sales, freight and delivery time

Q1. Calculate days between purchasing, delivering and estimated delivery

select odr.order_id,DATE_DIFF(odr.order_estimated_delivery_date, odr.order_purchase


_timestamp, DAY) as purchaseDate_estimationDate,

DATE_DIFF(odr.order_delivered_customer_date, odr.order_purchase_timestamp, DAY) as


purchaseDate_deliverDate,
from `civic-surge-382404.Target_SQL.orders` as odr
where odr.order_delivered_customer_date is not null

Q2. Find time_to_delivery & diff_estimated_delivery. Formula for the same given below

 Find time to delivery

select odr.order_id,
DATE_DIFF( odr.order_delivered_customer_date ,odr.order_purchase_timestamp, DAY) as
time_to_delivery,
from `civic-surge-382404.Target_SQL.orders` as odr
where odr.order_delivered_customer_date is not null
 diff_estimated_delivery

select odr.order_id,
DATE_DIFF( odr.order_estimated_delivery_date,odr.order_delivered_customer_date, DAY
) as diff_estimated_delivery,
from `civic-surge-382404.Target_SQL.orders` as odr
where odr.order_delivered_customer_date is not null

Q3. Group data by state, take mean of freight_value, time_to_delivery,


diff_estimated_delivery

select cust.customer_state,
odri.freight_value,
DATE_DIFF(odr.order_estimated_delivery_date, odr.order_purchase_timestamp, DAY) as
purchaseDate_estimationDate,
DATE_DIFF(odr.order_delivered_customer_date, odr.order_purchase_timestamp, DAY) as
purchaseDate_deliverDate,
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state,
purchaseDate_estimationDate,
purchaseDate_deliverDate,
odri.freight_value having purchaseDate_deliverDate is not null

Q5: Top 5 states with highest average freight value - sort in desc limit 5

select cust.customer_state, avg(odri.freight_value) val


from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state
order by val desc limit 5

Top 5 states with lowest average freight value - sort in asc limit 5

select cust.customer_state, avg(odri.freight_value) val


from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state
order by val limit 5
Q6. Top 5 states with highest average time to delivery

-- Top 5 states with highest average time to delivery

select cust.customer_state,avg(
DATE_DIFF(odr.order_delivered_customer_date, odr.order_purchase_timestamp, DAY)) as
days,
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state having days is not null
order by days desc limit 5

 Top 5 states with lowest average time to delivery

select cust.customer_state,avg(
DATE_DIFF(odr.order_delivered_customer_date, odr.order_purchase_timestamp, DAY)) as
days,
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state having days is not null
order by days limit 5
Q7. Top 5 states where delivery is really fast/ not so fast compared to estimated date.

select cust.customer_state,max(
DATE_DIFF(odr.order_delivered_customer_date, odr.order_purchase_timestamp, DAY)) as
days,
from `civic-surge-382404.Target_SQL.customers` as cust
join `civic-surge-382404.Target_SQL.orders` as odr
on cust.customer_id = odr.customer_id
join `civic-surge-382404.Target_SQL.order_items` odri
on odr.order_id = odri.order_id
group by cust.customer_state having days is not null
order by days limit 5

You might also like