The document describes a Spotify playlist recommender system. It defines the task as generating recommended tracks to continue an incomplete playlist. It describes the dataset of 1 million playlists and metrics like R-precision and NDCG to evaluate recommendations. It proposes solutions like collaborative filtering, KNN, frequent pattern growth and matrix factorization. Exploratory data analysis is performed on the dataset and different methods are tested, with playlist-based and song-based KNN performing the best in terms of metrics but being slower than other methods.

Uploaded by

Jyotimoy das

Spotify Playlist Recommender

Table of Contents
1. The task
2. The dataset
3. Metrics
4. Proposed Solutions
5. EDA
6. Result
The task
The goal of the challenge is to develop a system for the task of
automatic playlist continuation. Given a set of playlist features,
participants' systems shall generate a list of recommended tracks
that can be added to that playlist, thereby 'continuing' the playlist.
We define the task formally as follows:
Input: N incomplete playlists.
Output: for each playlist, a list of 500 recommended candidate tracks,
ordered by relevance in decreasing order.
The dataset
The Million Playlist Dataset (MPD) contains 1,000,000 playlists
created by users on the Spotify platform. These playlists were
created during the period of January 2010 through October 2017.

Test set
To replicate the competition's test set, we hold out some
playlists from the original dataset such that:

All tracks in the challenge set appear in the MPD

All holdout tracks appear in the MPD
Test set size: 1000 playlists

Train set
The train set is the original dataset minus the data held out for the test set.
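One way to build such a split can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure actually used here: the function name `split_playlists`, the parameters `n_test` and `n_seed`, and the choice to remove whole test playlists from the train set are all hypothetical.

```python
import random
from collections import Counter


def split_playlists(playlists, n_test, n_seed, rng):
    """Split {playlist_id: [track, ...]} into a train dict and a test dict.

    Assumption: a playlist may go to the test set only if each of its
    tracks still occurs in some playlist that stays in the train set.
    """
    # Number of not-yet-removed playlists that contain each track.
    remaining = Counter(t for tracks in playlists.values() for t in set(tracks))
    candidates = list(playlists)
    rng.shuffle(candidates)

    test = {}
    for pid in candidates:
        if len(test) == n_test:
            break
        tracks = playlists[pid]
        if len(tracks) <= n_seed:
            continue  # nothing would be left to hold out
        if all(remaining[t] > 1 for t in set(tracks)):
            for t in set(tracks):
                remaining[t] -= 1
            # The first n_seed tracks stay visible, the rest are withheld.
            test[pid] = {"seed": tracks[:n_seed], "holdout": tracks[n_seed:]}

    train = {pid: tracks for pid, tracks in playlists.items() if pid not in test}
    return train, test


# Toy data (hypothetical), not taken from the MPD.
playlists = {1: ["a", "b", "c"], 2: ["a", "b", "c", "d"],
             3: ["a", "b", "c", "d"], 4: ["x"]}
train, test = split_playlists(playlists, n_test=1, n_seed=1, rng=random.Random(0))
```

The greedy check keeps both constraints listed above: seed tracks and holdout tracks of every test playlist still appear somewhere in the train data.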
Metrics
G: the ground truth set of tracks
R: the ordered list of recommended tracks
The size of a set or list is denoted by | ⋅ |

R-precision

R-precision is the number of retrieved relevant tracks divided by
the number of known relevant tracks (i.e., the number of withheld
tracks).
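In symbols this is |G ∩ R'| / |G|, where R' is the part of R that is counted as "retrieved". A minimal sketch, assuming R' is the first |G| entries of R (as in the usual definition of R-precision); the function name `r_precision` is illustrative:

```python
def r_precision(ground_truth, recommended):
    """Fraction of the withheld tracks found in the first |G| recommendations."""
    g = set(ground_truth)
    if not g:
        return 0.0
    # Assumption: only the first |G| recommended tracks count as retrieved.
    top = recommended[:len(g)]
    return len(g & set(top)) / len(g)


# Toy example (hypothetical track ids): 2 of the 4 withheld tracks
# appear among the first 4 recommendations.
score = r_precision({"a", "b", "c", "d"}, ["a", "x", "b", "y", "c"])
```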
Normalized discounted cumulative gain (NDCG)

Discounted cumulative gain (DCG) measures the ranking quality of
the recommended tracks, increasing when relevant tracks are
placed higher in the list.

The ideal DCG (IDCG) is the DCG of a perfectly ranked list. If the
intersection of G and R is empty, the DCG is equal to 0. The NDCG
metric is the DCG divided by the IDCG.
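A minimal sketch of this computation follows. It assumes binary relevance (1 if the track is in G, else 0), a log2(i) discount from position 2 onward, an ideal list that places all |G| relevant tracks first, and no duplicate tracks in R; the exact discount and the length of the ideal list are assumptions and may differ from the formula used in the evaluation.

```python
import math


def dcg(relevances):
    """rel_1 + sum over i >= 2 of rel_i / log2(i), positions counted from 1."""
    return sum(rel if i == 1 else rel / math.log2(i)
               for i, rel in enumerate(relevances, start=1))


def ndcg(ground_truth, recommended):
    g = set(ground_truth)
    relevances = [1.0 if track in g else 0.0 for track in recommended]
    if not any(relevances):
        return 0.0  # empty intersection of G and R
    ideal = dcg([1.0] * len(g))  # assumption: ideal list has |G| relevant tracks
    return dcg(relevances) / ideal


perfect = ndcg({"a", "b"}, ["a", "b", "c"])   # both relevant tracks ranked first
shifted = ndcg({"a", "b"}, ["x", "a", "b"])   # relevant tracks pushed down one slot
```

Pushing the relevant tracks down the list lowers the score, which is the behaviour the metric is meant to reward or penalise.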
Recommended Songs clicks

Recommended Songs is a Spotify feature that, given a set of tracks
in a playlist, recommends 10 tracks to add to the playlist. The list
can be refreshed to produce 10 more tracks. Recommended Songs
clicks is the number of refreshes needed before a relevant track is
encountered.

If the metric does not exist (i.e. if there is no relevant track in R), a
value of 51 is picked (which is 1 + the maximum number of clicks
possible).
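Read this way, the metric is the zero-based index of the 10-track page on which the first relevant track appears. A minimal sketch under that reading; the function name and the `page_size` and `no_hit_value` parameters are illustrative:

```python
def recommended_songs_clicks(ground_truth, recommended, page_size=10, no_hit_value=51):
    """Number of refreshes before the first relevant track is shown."""
    g = set(ground_truth)
    for index, track in enumerate(recommended):  # index counts from 0
        if track in g:
            return index // page_size  # page 0 needs no refresh
    return no_hit_value  # no relevant track anywhere in R


# Toy list (hypothetical): the only relevant track sits at position 11,
# i.e. on the second page, so one refresh is needed.
recs = ["t%d" % i for i in range(1, 21)]
clicks = recommended_songs_clicks({"t11"}, recs)
```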
Proposed Solutions
For our problem, we treat each playlist as a "user" and each song as an "item".
Because the dataset does not provide much information about each
song, we do not use content-based filtering. We therefore
focus only on:

KNN
Collaborative Filtering
Frequent Pattern Growth
Matrix Factorization
Collaborative Filtering (CF)
Playlist-based CF: From the similarity between playlists and
how other playlists "rate" (include or not) a track, I can infer the
current playlist's "rating" of that track.
Song-based CF: From the similarity between songs and how the
current playlist "rates" other songs, I can infer its "rating" of the current song.
Advantages:
easy to implement
new data can be added easily and incrementally
does not need the content of items
scales well with correlated items
Disadvantages:
depends on human ratings
cold-start problem for new users and items
sparsity of the rating matrix
limited scalability for large datasets
1. Construct the playlist-song matrix

"1" means that the song is included in the playlist and "0" otherwise.
For example, playlist 1 contains song 2 and song 3, and song 2 is also
included in playlist 2.
2. Calculate the cosine similarity between songs (song-song) or
between playlists (playlist-playlist). For playlist-playlist similarity,
we take each row as a vector, while for song-song similarity we take
each column as a vector.
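The two steps above can be sketched with NumPy on the small example from step 1. This is a dense toy version; the helper names are illustrative, and a dataset of this size would presumably need sparse matrices instead.

```python
import numpy as np


def build_matrix(playlists, n_songs):
    """Binary playlist-song matrix: rows are playlists, columns are songs."""
    matrix = np.zeros((len(playlists), n_songs))
    for row, songs in enumerate(playlists):
        matrix[row, list(songs)] = 1.0
    return matrix


def cosine_similarity(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows get similarity 0 instead of NaN
    unit = vectors / norms
    return unit @ unit.T


# Zero-based indices: playlist 1 contains songs 2 and 3, playlist 2 contains song 2.
R = build_matrix([[1, 2], [1]], n_songs=3)
playlist_sim = cosine_similarity(R)    # rows as vectors
song_sim = cosine_similarity(R.T)      # columns as vectors
```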
For playlist-playlist, the predicted score that playlist p contains song s is
the weighted sum of whether each other playlist contains song
s, where the weight is the cosine similarity between that
playlist and the input playlist p. The result is then normalized by
the sum of the absolute similarities.
$$\hat{r}_{ps} = \frac{\sum_{p'} \mathrm{sim}(p, p')\, r_{p's}}{\sum_{p'} \lvert \mathrm{sim}(p, p') \rvert}$$

With song-song, we simply replace the similarity matrix of playlists by
that of songs.
$$\hat{r}_{ps} = \frac{\sum_{s'} \mathrm{sim}(s, s')\, r_{ps'}}{\sum_{s'} \lvert \mathrm{sim}(s, s') \rvert}$$
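Both predictions can be written as matrix products. The following is a minimal sketch, assuming a dense binary matrix `r` and assuming the sums run over the other playlists (or other songs) only, so the diagonal of the similarity matrix is zeroed; the function names are illustrative.

```python
import numpy as np


def cosine_similarity(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = vectors / norms
    return unit @ unit.T


def playlist_based_scores(r):
    """Score[p, s] = sum_p' sim(p, p') r[p', s] / sum_p' |sim(p, p')|."""
    sim = cosine_similarity(r)
    np.fill_diagonal(sim, 0.0)  # assumption: a playlist does not vote for itself
    denom = np.abs(sim).sum(axis=1, keepdims=True)
    denom[denom == 0] = 1.0
    return (sim @ r) / denom


def song_based_scores(r):
    """Score[p, s] = sum_s' sim(s, s') r[p, s'] / sum_s' |sim(s, s')|."""
    sim = cosine_similarity(r.T)
    np.fill_diagonal(sim, 0.0)  # assumption: a song does not vote for itself
    denom = np.abs(sim).sum(axis=0, keepdims=True)
    denom[denom == 0] = 1.0
    return (r @ sim) / denom


# Toy matrix (hypothetical): 2 playlists, 3 songs.
r = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
by_playlist = playlist_based_scores(r)
by_song = song_based_scores(r)
```

With only two playlists, each playlist's scores are simply the other playlist's row, so the first playlist gets a full score for the song it is missing.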
KNN
Playlist-based: Find the playlists most similar to the current playlist
and add all of their songs that are not already in it.
Song-based: Find the songs most similar to the songs currently in the
playlist and add them to the playlist.
Advantages:
Same as Collaborative Filtering
Disadvantages:
Need to maintain a large similarity matrix
Cold-start problem
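The playlist-based variant can be sketched in plain Python. For binary playlists, the cosine similarity reduces to |A ∩ B| / sqrt(|A| |B|). Ranking the candidate songs by the summed similarity of the neighbours that contain them is an assumption made here for illustration, as are the names `playlist_knn`, `k` and `n_recs`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two playlists given as sets of tracks."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def playlist_knn(seed_tracks, train_playlists, k, n_recs):
    seed = set(seed_tracks)
    # The k playlists most similar to the seed playlist.
    neighbours = sorted(train_playlists,
                        key=lambda p: cosine(seed, set(p)),
                        reverse=True)[:k]
    scores = Counter()
    for playlist in neighbours:
        weight = cosine(seed, set(playlist))
        for track in set(playlist) - seed:  # skip songs already in the playlist
            scores[track] += weight
    return [track for track, _ in scores.most_common(n_recs)]


# Toy data (hypothetical track ids).
train = [["a", "b", "c"], ["a", "b", "d", "e"], ["x", "y"]]
recs = playlist_knn(["a", "b"], train, k=2, n_recs=3)
```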
Frequent Pattern Growth
Given the tracks currently in a playlist, I look at other playlists to find
tracks that frequently occur together with the current tracks and add them.
Advantages:
Scalability
No problem with sparsity
Disadvantage:
Expensive model building
1. The first step of FP-growth is to calculate item frequencies and
identify frequent items.
2. The second step of FP-growth uses a prefix tree (FP-tree)
structure to encode transactions without explicitly generating candidate
sets, which are usually expensive to generate.
3. After the second step, the frequent itemsets can be extracted
from the FP-tree.
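Steps 1 and 2 can be sketched as follows, treating each playlist as a transaction. This is a minimal sketch only: the `FPNode` class and `build_fp_tree` function are illustrative names, and step 3 (mining itemsets from the tree) is omitted.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Step 1: count item frequencies and keep only the frequent items.
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in freq.items() if c >= min_support}

    # Step 2: insert each transaction into the tree, with its frequent
    # items sorted by descending frequency so common prefixes share nodes.
    root = FPNode(None, None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root, frequent


# Toy transactions (hypothetical track ids).
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b", "d"]]
root, frequent = build_fp_tree(transactions, min_support=2)
```

Here all four transactions share the prefix "b", so the tree stores it once with a count of 4 instead of four separate copies.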
Matrix Factorization
Decompose the playlist-song matrix into a product of smaller
matrices and use these matrices to make inferences.

Matrix factorization can be done with orthogonal factorization
(SVD), probabilistic matrix factorization (PMF) or non-negative matrix
factorization (NMF).
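As an illustration of the SVD variant only, a truncated SVD of a small dense matrix can be sketched with NumPy. The function name `truncated_svd_scores` and the `rank` parameter are illustrative, and the toy matrix is hypothetical.

```python
import numpy as np


def truncated_svd_scores(r, rank):
    """Rank-`rank` approximation of r, used as a matrix of predicted scores."""
    u, s, vt = np.linalg.svd(r, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Toy playlist-song matrix (hypothetical): 3 playlists, 4 songs.
r = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
scores = truncated_svd_scores(r, rank=2)
```

Entries of `scores` that are zero in `r` can be read as predicted affinities for songs not yet in the playlist; lowering `rank` trades reconstruction accuracy for a smaller model.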
Advantages:
Better addresses the sparsity and scalability problems
Improves prediction performance
Disadvantages:
Expensive model building
Trade-off between prediction performance and scalability
Loss of information through dimensionality reduction

Due to the large memory use when computing the smaller matrices, I will
not implement Matrix Factorization.
Word2vec
We can use Word2vec to reduce the dimensionality of the matrix and
perform KNN in the reduced space.
Exploratory Data Analysis
Number of:
playlists: 1,000,000
Tracks: 66,346,428
Unique tracks: 2,262,292
Unique albums: 734,684
Unique Titles: 92,944
Distribution of: Playlist length, Number of Albums / Playlist,
Number of Artist / Playlist, Number of edits / Playlist, Number
of Followers / Playlist, Number of Tracks / Playlist

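As we can see, all distributions are right-skewed (a long tail of large
values), which means that if we are looking for a typical value, we
should use the median, not the mean. For example, the mean number of
tracks per playlist is about 66.3 (66,346,428 / 1,000,000), while the
median is 49.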
Median of playlist length: 11422438.0 (presumably milliseconds, i.e. about 3.2 hours)
Median of number of albums in each playlist: 37.0
Median of number of artists in each playlist: 29.0
Median of number of edits in each playlist: 29.0
Median of number of followers in each playlist: 1.0
Median of number of tracks in each playlist: 49.0
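The effect of a long tail on the mean can be illustrated with a small example. The follower counts below are hypothetical and not taken from the dataset.

```python
from statistics import mean, median

# Hypothetical follower counts: most playlists have very few followers,
# one playlist has a great many.
followers = [1, 1, 1, 1, 2, 2, 3, 5, 8, 1000]

average = mean(followers)    # pulled up by the single large value
typical = median(followers)  # barely affected by it
```

A single very popular playlist moves the mean far away from what most playlists look like, while the median stays close to the bulk of the data.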
Top 20 Songs in Spotify Playlists

Top 20 Artists in Spotify Playlists

Result

Method  R-precision  NDCG  Song Click  Time Taken (s)
Playlist-based KNN 0.7766 1.6010 0.0 41.42
Song-based KNN 0.7847 0.7975 0.0 4183
Word2Vec + Song-based 0.0030 0.004 10.35
Word2Vec + Playlist-based 0.0171 0.015 8.086
Playlist-based CF (get top-K rated songs) 0.7844 0.8011 0.0 approx 12000
FP Growth
Conclusion
Playlist-based and song-based KNN perform really well on this
dataset. Thanks to multi-processing, inference is really
fast. However, the similarity matrix and the playlist-song matrix must be
built beforehand.
Collaborative filtering also shows results comparable to KNN
but takes much longer to infer.
Word2Vec dimension reduction failed to capture the similarity
between songs / playlists. This suggests that although playlists
are sequences of songs, their behaviour is different from that of text
sequences.
Further improvement
Implement FP-growth in Spark and compare with current
solutions
Matrix Factorization
