0% found this document useful (0 votes)
121 views23 pages

Spotify Data Analysis SQL Project 1712710947

Uploaded by

benkhelfa.o
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
121 views23 pages

Spotify Data Analysis SQL Project 1712710947

Uploaded by

benkhelfa.o
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

SQL PROJECT

SPOTIFY ANALYSIS
Project :
Explore Spotify DataSet and answer all the Question

Tools
For Data manipulation and analysis SQL

DataSet
Spotify Dataset From 2010-2019 detials of all songs

SPOTIFY ANALYSIS
DataSet Attributes :

column1: This column seems to be an identifier or index for each entry in the dataset.
It might not contain useful information for analysis if it's just a sequential or arbitrary
identifier.
title: This column represents the title of the song.
artist: This column contains the name of the artist who performed the song.
top_genre: This column indicates the primary genre of the song, likely based on
popularity or classification.
year: This column represents the year in which the song was released.
bpm: This column denotes the beats per minute (BPM) of the song, which indicates its
tempo or speed.
nrgy: This column represents the energy level of the song, which could be a measure of
intensity or excitement.
dnce: This column likely represents the danceability of the song, indicating how
suitable it is for dancing.
dB: This column could represent the volume or loudness of the song, measured in
decibels (dB).

SPOTIFY ANALYSIS
DataSet Attributes :

clive: This column might indicate whether the song was recorded live or in a studio
setting.
val: This column could represent the valence or mood of the song, indicating its
positivity or negativity.
dur: This column represents the duration of the song in some unit of time (e.g.,
seconds or minutes).
acous: This column represents the acousticness of the song, indicating the extent to
which it features acoustic elements.
spch: This column represents the speechiness of the song, which could indicate the
presence of spoken words or vocals.
pop: This column might represent the popularity of the song, though the specific
measure or scale of popularity is not specified.

SPOTIFY ANALYSIS
DataSet View :

SPOTIFY ANALYSIS
Task:
Retrieve all columns for songs released in a specific year.
Calculate the average BPM (beats per minute) for all songs in the dataset.
Count the number of songs by each artist.
Find the maximum and minimum values of the "nrgy" (energy) column.
Calculate the total duration of all songs in minutes.
Identify the top 5 genres with the highest average danceability (dnce).
Find the song with the highest energy (nrgy) value for each year.
Calculate the median value of the "val" column.
Determine the average speechiness (spch) of songs released in the 2010s.
Identify the artist with the most songs having a BPM greater than 120.
Detect any outliers in the "dur" (duration) column using statistical methods.
Identify any correlations between BPM and energy levels across different
genres.
Determine the trend of speechiness over the years.

SPOTIFY ANALYSIS
Analysis by Basic level
Queries

SPOTIFY ANALYSIS
Retrieve all columns for songs released in a
2010, 2015, 2019 year
select *
from
Spotify
where year in ('2010','2015','2019')

SPOTIFY ANALYSIS
Retrieve all columns top 3 entries for songs released in a
2010, 2015, 2019 year order by
energy ( high value )
WITH RankedSpotify AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY year order by nrgy desc) AS RowNum
FROM
Spotify
WHERE
year IN ('2010', '2015', '2019')
)
SELECT * FROM RankedSpotify WHERE RowNum <= 3;

SPOTIFY ANALYSIS
Calculate the average BPM (beats per minute)
for all songs in the dataset.

SELECT
SUM(BPM) / COUNT(BPM) AS AVERAGE_BPM
FROM SPOTIFY

SPOTIFY ANALYSIS
Count the number of songs by each artist.
SELECT COUNT(TITLE) AS NUMBER_OF_SONGS,
ARTIST
FROM SPOTIFY
GROUP BY ARTIST

SPOTIFY ANALYSIS
Find the maximum and minimum values of the
"nrgy" (energy) column

SELECT
MAX(NRGY) AS MAXIMUM_ENERGY,
MIN(NRGY) AS MINIMUM_ENERGY
FROM SPOTIFY

SPOTIFY ANALYSIS
Calculate the total duration of all songs in
minutes.

SELECT SUM(DUR)/60 AS
TOTAL_DURATION_MINUTES FROM SPOTIFY

SPOTIFY ANALYSIS
Analysis by Medium
level Queries

SPOTIFY ANALYSIS
Identify the top 5 genres with the highest
average danceability (dnce)

SELECT TOP 5 TOP_GENRE ,


SUM(DNCE)/COUNT(TOP_GENRE) AS AVERAGE_DNCE
FROM SPOTIFY
GROUP BY TOP_GENRE
ORDER BY AVERAGE_DNCE DESC

SPOTIFY ANALYSIS
Find the song with the highest energy (nrgy)
value for each year.

SELECT TITLE, NRGY,


YEAR
FROM SPOTIFY
WHERE (NRGY, YEAR) IN
(
SELECT MAX(NRGY)
AS MAX_NRGY, YEAR
FROM SPOTIFY
GROUP BY YEAR
);

SPOTIFY ANALYSIS
Calculate the median value of the "val" column
WITH SORTDATA AS (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY VAL ASC) AS ROW_NUM
FROM
SPOTIFY
)
SELECT VAL AS MEDIAN_VALUE FROM SORTDATA
WHERE
CASE
WHEN (SELECT COUNT(ROW_NUM) FROM SORTDATA)%2=0
THEN ROW_NUM= (SELECT MAX(ROW_NUM) FROM SORTDATA)/2
WHEN (SELECT COUNT(ROW_NUM) FROM SORTDATA)%2=1
THEN ROW_NUM= ((SELECT MAX(ROW_NUM) FROM SORTDATA)+1)/2
END;

SPOTIFY ANALYSIS
Determine the average speechiness (spch) of
songs released in the 2010s

SELECT AVG(SPCH)
AS
AVERAGE_SPCH_2010
FROM
SPOTIFY
WHERE YEAR='2010'

SPOTIFY ANALYSIS
Identify the artist with the most songs having a
BPM greater than 120.

SELECT COUNT(ARTIST) AS NUM_SONGS, ARTIST


FROM SPOTIFY
WHERE BPM>120
GROUP BY ARTIST
ORDER BY NUM_SONGS DESC
LIMIT 3
;

SPOTIFY ANALYSIS
Analysis by Complex
level Queries

SPOTIFY ANALYSIS
Detect any outliers in the "dur" (duration)
column using statistical methods.

WITH Stats AS (
SELECT
AVG(dur) AS Mean,
STDDEV_POP(dur) AS StdDev
FROM
spotify
)
SELECT
dur,
CASE
WHEN ABS(dur - Mean) > 3 * StdDev THEN 'Outlier'
ELSE 'Not an Outlier'
END AS Outlier_Status
FROM
spotify, Stats
having Outlier_Status='Outlier';

SPOTIFY ANALYSIS
Identify any correlations between BPM and energy
levels across different genres.
SELECT
`top genre`,
correlation,
CASE
WHEN correlation > 0 THEN 'Positive Correlation'
WHEN correlation < 0 THEN 'Negative Correlation'
WHEN correlation = 0 THEN 'Zero Correlation'
END AS correlation_status
FROM (
SELECT
`top genre`,
(COUNT(*) * SUM(bpm * nrgy) - SUM(bpm) * SUM(nrgy)) /
(SQRT((COUNT(*) * SUM(bpm * bpm) - SUM(bpm) * SUM(bpm))
* (COUNT(*) * SUM(nrgy * nrgy) - SUM(nrgy) * SUM(nrgy)))) AS
correlation
FROM
spotify
GROUP BY
`top genre`
) AS subquery_alias;

SPOTIFY ANALYSIS
Determine the trend of speechiness over the
years.

SELECT
YEAR,
AVG(SPCH) AS AVG_SPEECHINESS,
MIN(SPCH) AS MIN_SPEECHINESS,
MAX(SPCH) AS MAX_SPEECHINESS
FROM
SPOTIFY
GROUP BY
YEAR
ORDER BY
YEAR;

SPOTIFY ANALYSIS

You might also like