Music Recommendation System
Spotify, Saavn, YouTube Music, Gaana, and Amazon Prime Music are some examples of
music streaming services. Such a service recommends songs based on the songs a user has listened to before.
Q1) For building the system, what is the business metric?
o Time spent on the platform (depends on the UI, the collection of songs, and ads).
o # listens / # recommendations.
For a given time t, precision @ top k: of the top k recommended songs, the fraction the user actually listens to.
‘k’ is a parameter to be fixed by discussing with the business people.
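As an illustration of precision @ top k, here is a minimal sketch. The function name precision_at_k, the song ids, and the toy listening data are all hypothetical; it assumes a recommendation counts as a hit when the user listened to the song.

```python
def precision_at_k(recommended, listened, k):
    """Fraction of the top-k recommended songs that the user listened to.

    recommended: list of song ids, best recommendation first.
    listened: set of song ids the user actually listened to.
    Divides by k even when fewer than k songs were recommended.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for song in top_k if song in listened)
    return hits / k


# Toy data (made up for illustration).
recommended = ["s1", "s2", "s3", "s4", "s5"]
listened = {"s2", "s5", "s9"}

print(precision_at_k(recommended, listened, k=5))  # 2 hits out of 5
```

A smaller k rewards getting the first few recommendations right, which is why the value is worth agreeing on with the business side.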
Q2) What data is required for the project?
User data [age, language, gender, location].
Item (song) data [era, singer, key instrument, genre, sub-genre].
User-item ratings [most users listen to music passively, so rating information
is very sparse: they rate only about 1 percent of songs].
User-item affinity [0 if the user skips the song, 1 if the user listens to it again and again].
User-item timestamps.
Search history.
Song: raw music audio [used later to derive more complex features].
Currently trending songs.
Musician/artist data.
The Yahoo music dataset is available and can be used if needed.
Q3) How will you do the train/test split?
Time-based splitting: train on older interactions and test on more recent ones.
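A time-based split can be sketched as follows. The event layout (dicts with user, song, and timestamp keys) and the cutoff value are assumptions made for illustration, not a fixed schema.

```python
def time_based_split(events, cutoff):
    """Put events before the cutoff in train and the rest in test.

    events: list of dicts with a numeric "timestamp" key (hypothetical layout).
    cutoff: timestamp separating the past (train) from the future (test).
    """
    train = [e for e in events if e["timestamp"] < cutoff]
    test = [e for e in events if e["timestamp"] >= cutoff]
    return train, test


# Toy listening log (made up for illustration).
events = [
    {"user": "u1", "song": "s1", "timestamp": 100},
    {"user": "u2", "song": "s3", "timestamp": 150},
    {"user": "u1", "song": "s2", "timestamp": 200},
    {"user": "u2", "song": "s1", "timestamp": 250},
]

train, test = time_based_split(events, cutoff=200)
print(len(train), len(test))
```

Splitting on time instead of at random keeps future listens out of the training set, which matches how the model is used in production.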
Q4) How do you model this as an ML problem? (Classification / regression / recommendation
system)
Weighted regression and recommendation system.
We have the user and item information needed to construct a classifier (will listen / will skip).
The business metric is precision at k, which is a classification metric.
Given an item, we find its content-based filtering features using genre,
artist, time, and the wave file itself (FFT, spectrogram).
Let k be the number of most-liked songs taken from the user's most recent past.
[Link]
The spectrogram is treated as an image and a CNN is built on top of it.
40 features are obtained for each song.
These features were plotted with t-SNE.
Q) What types of model will work well?
Logistic regression (LR) / SVM.
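Q) If using a kernel SVM, can we skip the support vectors (SVs) and just store the hyperplane separating the points?
Suppose we have x_i and x_j belonging to R^d.
One way to measure their similarity is the inner product x_i^T x_j.
A kernel K(x_i, x_j) also returns a real value.
But the separating hyperplane lives in a different feature space of dimension d', which the kernel never constructs explicitly.
If we don't store these points, how will we compute the plane? In general we cannot, so the support vectors must be kept.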
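The point can be checked numerically with a small sketch. The support vectors, dual coefficients (alpha_i * y_i), bias, and gamma below are made-up numbers, not the output of a trained model. With a linear kernel the sum over support vectors collapses into a single weight vector w, so the points can be dropped. With an RBF kernel there is no finite w in R^d, so the decision value needs the support vectors themselves.

```python
import math


def linear_kernel(a, b):
    """Plain inner product a^T b."""
    return sum(p * q for p, q in zip(a, b))


def rbf_kernel(a, b, gamma=0.5):
    """exp(-gamma * ||a - b||^2); gamma is an assumed value."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-gamma * sq_dist)


def decision(x, support_vectors, dual_coefs, bias, kernel):
    """f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b."""
    total = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs))
    return total + bias


# Made-up values for illustration only.
support_vectors = [[1.0, 2.0], [3.0, 0.5]]
dual_coefs = [0.7, -0.4]
bias = 0.1
x = [2.0, 1.0]

# Linear kernel: w = sum_i (alpha_i * y_i) * sv_i replaces the support vectors.
w = [sum(c * sv[j] for sv, c in zip(support_vectors, dual_coefs)) for j in range(2)]
f_from_w = linear_kernel(w, x) + bias
f_from_svs = decision(x, support_vectors, dual_coefs, bias, linear_kernel)
print(abs(f_from_w - f_from_svs) < 1e-9)  # both routes agree

# RBF kernel: the only way to evaluate f(x) here is through the support vectors.
f_rbf = decision(x, support_vectors, dual_coefs, bias, rbf_kernel)
print(f_rbf)
```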
Q) How do we productionize the recommendation system?
Use recent data: the last five songs the user listened to.
Handle cold start (new users).
Scalability: we can use Spark.
We will store the user and item vectors in distributed hash tables and Redis.
Use a very simple model: take the last five songs and find the songs most similar to them.
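The simple model above can be sketched as follows. Everything here is an assumption for illustration: the in-memory dict stands in for the Redis / hash-table store, the 2-dimensional vectors are toy values, and averaging the recent songs and ranking by cosine similarity is one possible way to define "most similar".

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(recent_ids, song_vectors, k=5):
    """Rank songs the user has not just heard by similarity to the recent average."""
    profile = mean_vector([song_vectors[s] for s in recent_ids])
    scored = [
        (cosine(profile, vec), song)
        for song, vec in song_vectors.items()
        if song not in recent_ids
    ]
    scored.sort(reverse=True)
    return [song for _, song in scored[:k]]


# Toy item vectors (made up); in production these would be fetched from the store.
song_vectors = {
    "rock1": [1.0, 0.0],
    "rock2": [0.9, 0.1],
    "rock3": [0.8, 0.2],
    "jazz1": [0.0, 1.0],
    "jazz2": [0.1, 0.9],
}
recent = ["rock1", "rock2"]  # would be the last five songs for a real user

print(recommend(recent, song_vectors, k=1))
```

A brute-force scan like this is linear in the catalogue size, so a large catalogue would likely need an approximate nearest-neighbour index on top of the stored vectors.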