
Mini Project Report

On

Music Genre Classification using Machine Learning
For
Subject: Advanced Artificial Intelligence (REV- 2019 ‘C’ Scheme) of
Fourth Year,
(BE Sem-VIII)

Course code: CSL801


in
Computer Science and Engineering (AI&ML)

By
Prabhuwalawalkar Kadambari Avinash (R-21-0285)

Under the guidance of

Prof. Rahulkumar P. Tivarekar

Hope Foundation’s
Finolex Academy of Management & Technology, Ratnagiri
Academic Year 2024-25
CERTIFICATE

This is to certify that the project entitled Music Genre Classification using
Machine Learning is a bonafide work of Prabhuwalawalkar Kadambari
Avinash (R-21-0285) submitted to the University of Mumbai in partial
fulfillment of the requirement for the award of Mini Project Report for the
subject Advanced Artificial Intelligence (REV- 2019 ‘C’ Scheme) of Fourth
Year, (BE Sem-VIII) in Computer Science and Engineering (AI & ML) as
laid down by University of Mumbai during academic year 2024-25.

Prof. Rahulkumar P. Tivarekar (Subject Teacher)
Prof. V.V. Nimbalkar (Head of Department)

INDEX

1. ABSTRACT
2. INTRODUCTION
3. LITERATURE REVIEW
4. METHODOLOGY
5. IMPLEMENTATION
6. RESULTS AND FINDINGS
7. FUTURE SCOPE AND IMPROVEMENTS
8. CONCLUSION
9. REFERENCES
1. ABSTRACT

This study explores the application of machine learning techniques for music genre
classification, utilizing a comprehensive dataset of 114,000 tracks spanning 125 distinct genres
sourced from Spotify. The primary objective is to assess the predictive capabilities of various
audio features in determining a song's genre and to identify which genres present the most
significant challenges for classification models. The dataset encompasses a rich array of audio
features, including danceability, energy, and acousticness, alongside essential metadata such
as track names, album details, and popularity scores.
To facilitate a thorough analysis, the research begins with an exploratory data analysis (EDA)
phase, which involves checking for duplicates and null values, examining the distributions of
both numeric and categorical variables, analyzing the relationships between features, and
verifying that the target variable is balanced. The EDA process is crucial for understanding the
dataset's structure and preparing it for subsequent modeling.
Data preparation involves several key steps, including the removal of duplicates, the exclusion
of non-sound-based genres, and the elimination of non-explanatory features. Categorical
variables are transformed through one-hot encoding, while numerical features are scaled to
enhance model performance. To address the inherent similarities among genres, hierarchical
clustering is employed to reduce the number of genres, with a dendrogram visualizing the
clustering results.
The study proceeds to implement various machine learning models, including a Neural
Network, XGBoost, K-Nearest Neighbors (KNN) Classifier, and an ensemble model derived
from weak learners. The performance of these models is evaluated using top-k categorical
accuracy, focusing specifically on the top three predicted genres.
Ultimately, this research aims to deepen the understanding of the complexities involved in
music genre classification through machine learning, leveraging audio features and advanced
modeling techniques to enhance accuracy and relevance in music recommendations. The
findings are expected to contribute valuable insights into the effectiveness of different
algorithms in classifying music genres and the potential for improving user experiences in
music streaming platforms.

2. INTRODUCTION

The proliferation of digital music platforms has transformed the way listeners access and
engage with music, leading to an overwhelming volume of available tracks across diverse
genres. As a result, effective music genre classification has become increasingly important for
enhancing user experience, enabling personalized recommendations, and facilitating music
discovery. Machine learning techniques offer promising solutions for automating the
classification of music genres by leveraging audio features and metadata associated with tracks.

This study focuses on the application of machine learning for music genre classification,
utilizing a comprehensive dataset of 114,000 tracks sourced from Spotify, which encompasses
125 distinct genres. The dataset includes a rich array of audio features, such as danceability,
energy, and acousticness, alongside essential metadata like track names, album details, and
popularity scores. By analyzing these features, the research aims to assess the predictive
capabilities of various machine learning models in accurately determining a song's genre.

The initial phase of the study involves exploratory data analysis (EDA), which is critical for
understanding the dataset's structure and identifying any potential issues, such as duplicates or
null values. EDA also covers the examination of feature distributions and relationships, and
a check that the target variable is balanced before modeling. Following this, the data preparation
process includes the removal of non-sound-based genres and non-explanatory features, as well as
the transformation of categorical variables through one-hot encoding and the scaling of numerical
features.

To address the inherent similarities among genres, hierarchical clustering is employed to reduce
the number of genres, providing a clearer framework for classification. The study then
implements various machine learning models, including Neural Networks, XGBoost,
K-Nearest Neighbors (KNN) Classifier, and ensemble models derived from weak learners. The
performance of these models is evaluated using top-k categorical accuracy, with a specific
focus on the top three predicted genres.

3. LITERATURE REVIEW

1. Tzanetakis, G., & Cook, P. (2002). "Musical Genre Classification of Audio Signals."
IEEE Transactions on Speech and Audio Processing, 10(5), 293-302.
Tzanetakis and Cook (2002) provide a foundational study in the field of music genre
classification, focusing on the extraction of audio features for genre identification. The authors
propose a framework that utilizes both timbral and rhythmic features, demonstrating that a
combination of these elements significantly enhances classification accuracy. Their work
emphasizes the importance of feature selection and the role of machine learning algorithms,
such as k-Nearest Neighbors (k-NN) and Gaussian mixture models (GMM), in achieving
effective genre classification. This study serves as a cornerstone for subsequent research,
highlighting the necessity of robust feature extraction techniques in the development of genre
classification systems.
2. Bertin-Mahieux, T., Ellis, D. P. W., Whitman, B., & Lamere, P. (2011). "The Million Song
Dataset." Proceedings of the 12th International Society for Music Information Retrieval
Conference (ISMIR).
Bertin-Mahieux et al. (2011) introduce the Million Song Dataset, a large-scale collection of
audio features and metadata designed to facilitate research in music information retrieval. This
dataset has become a critical resource for researchers in the field, enabling the development
and evaluation of various machine learning models for genre classification. The authors discuss
the challenges associated with genre labeling and the implications of using large datasets for
training models. Their work underscores the importance of data quality and diversity in
improving the performance of genre classification systems, paving the way for future studies
that leverage extensive datasets for enhanced accuracy.
3. Choi, K., Fazekas, G., & Sandler, M. (2016). "Automatic Music Genre Classification: A
Review of the State of the Art." IEEE Transactions on Multimedia, 18(5), 1000-1012.
In their comprehensive review, Choi et al. (2016) analyze the state of the art in automatic music
genre classification, summarizing various approaches and methodologies employed in recent
studies. The authors categorize existing techniques into traditional machine learning methods
and deep learning approaches, highlighting the shift towards neural networks for feature
learning and classification tasks. They discuss the advantages and limitations of each approach,
emphasizing the need for hybrid models that combine the strengths of both traditional and
modern techniques. This review provides valuable insights into the evolving landscape of
music genre classification and identifies key areas for future research, including the integration
of contextual information and user preferences.

4. Huang, Y., & Yang, Y. (2020). "Deep Learning for Music Genre Classification: A
Review." Journal of Computer Science and Technology, 35(1), 1-20.
Huang and Yang (2020) present a thorough examination of deep learning techniques applied
to music genre classification, focusing on convolutional neural networks (CNNs) and recurrent
neural networks (RNNs). The authors highlight the advantages of deep learning in
automatically extracting relevant features from raw audio signals, which significantly reduces
the need for manual feature engineering. They also discuss the challenges associated with
training deep learning models, such as overfitting and the requirement for large labeled
datasets. The review concludes with a discussion of future directions, including the potential
for transfer learning and the incorporation of multimodal data to enhance classification
performance. This work underscores the transformative impact of deep learning on the field of
music genre classification and sets the stage for further exploration of advanced neural
architectures.

4. METHODOLOGY

4.1 Tools and Technologies Used


In this study of music genre classification, a variety of tools and technologies were employed
to facilitate data processing, model development, and evaluation. The primary programming
language used was Python, known for its versatility and extensive libraries that support data
analysis and machine learning tasks. For data manipulation and analysis, Pandas and NumPy
were utilized, allowing for efficient handling of large datasets and numerical computations.
Data visualization was enhanced through the use of Matplotlib and Seaborn, which provided
the means to create informative plots and charts for exploratory data analysis.

Machine learning models were developed using Scikit-learn, which offers a wide range of
algorithms for classification and evaluation, alongside XGBoost for gradient boosting tasks.
For deep learning applications, TensorFlow and its high-level API, Keras, were employed to
build and train neural network models. Hierarchical clustering and dendrogram visualizations
were facilitated by the SciPy library. The development environment was primarily Jupyter
Notebook, which allowed for interactive coding and documentation of the workflow. Version
control was managed using Git, ensuring a systematic approach to tracking changes in the
codebase. Additionally, Google Colab was considered for computationally intensive tasks,
providing free access to GPUs for training deep learning models.
4.2 System Architecture
The system architecture for the music genre classification project is designed to facilitate the
seamless flow of data from collection to model evaluation. It encompasses several key
components, each serving a specific function in the overall process. The architecture can be
divided into the following layers:
1. Data Collection Layer
This layer is responsible for gathering the dataset used for music genre classification.
The primary source is the Spotify API, which provides access to a rich dataset of audio
features and metadata for a wide range of tracks. The data is collected in a structured
format, including attributes such as track names, artists, album details, audio features
(e.g., danceability, energy, acousticness), and genre labels.

2. Data Preprocessing Layer
Once the data is collected, it is passed to the data preprocessing layer, where several
critical tasks are performed:
• Data Cleaning: This involves removing duplicates and handling missing values to
ensure data integrity.
• Feature Selection: Non-relevant features are dropped, and only those that contribute
to genre classification are retained.
• Encoding and Scaling: Categorical variables are transformed using one-hot
encoding, while numerical features are scaled to ensure uniformity across the
dataset.
• Hierarchical Clustering: This step groups similar genres together, reducing the
number of genres for classification and simplifying the modeling process.
3. Model Development Layer
In this layer, various machine learning models are developed and trained using the
preprocessed data. The models include:
• Traditional Machine Learning Models: Algorithms such as k-Nearest Neighbors
(KNN) and XGBoost are implemented to classify music genres based on the
extracted features.
• Deep Learning Models: Neural networks are constructed using TensorFlow and
Keras and trained on the same preprocessed audio features.
The models are trained on the training dataset, and hyperparameter tuning is performed
to optimize their performance.
4. Evaluation Layer
After training, the models are evaluated using a separate testing dataset. The evaluation
layer focuses on assessing the performance of each model through metrics such as top-
k categorical accuracy and confusion matrices. This step helps identify the strengths
and weaknesses of each model, guiding further refinements.
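The layers above can be sketched as a single scikit-learn pipeline. This is a minimal illustration under stated assumptions, not the project's actual code: the data is randomly generated, the column names (`danceability`, `energy`, `acousticness`, `key`, `track_genre`) and the choice to treat `key` as categorical are assumptions, and KNN stands in for whichever classifier is plugged into the model layer.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the tracks table (random values, not real data).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "danceability": rng.random(n),
    "energy": rng.random(n),
    "acousticness": rng.random(n),
    "key": rng.integers(0, 12, n),  # assumed categorical for this sketch
    "track_genre": rng.choice(["rock", "jazz", "techno"], n),
})

numeric_cols = ["danceability", "energy", "acousticness"]
categorical_cols = ["key"]

# Preprocessing layer: scale numeric features, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Model development layer: any classifier can be slotted into the "clf" step.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", KNeighborsClassifier(n_neighbors=5)),
])

X = df[numeric_cols + categorical_cols]
y = df["track_genre"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model.fit(X_train, y_train)

# Evaluation layer: plain accuracy on the held-out rows.
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Keeping preprocessing inside the pipeline means the scaler and encoder are fitted on the training rows only, so no information from the test set leaks into training.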

5. IMPLEMENTATION

5.1 Steps of Model Building


Step 1: Define Objectives: Clearly outline the goals of the project, such as improving music
genre classification accuracy and enhancing user recommendations on music platforms.

Step 2: Data Collection: Identify and access the dataset, such as the one from Spotify, which
includes audio features and genre labels for a large number of tracks.
Ensure that the dataset is comprehensive and representative of the genres to be classified.

Step 3: Exploratory Data Analysis (EDA): Inspect the dataset for duplicates and missing
values. Analyze the distribution of audio features and genre labels to understand the dataset's
characteristics. Visualize relationships between features and genres to identify patterns and
potential challenges.
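The EDA checks in Step 3 might look like the following pandas sketch. The five-row table is hand-made for illustration, and the column names (`track_id`, `danceability`, `energy`, `track_genre`) are assumptions about the dataset's schema.

```python
import pandas as pd

# Tiny hand-made stand-in for the tracks table; values are illustrative only.
df = pd.DataFrame({
    "track_id": ["a1", "a1", "b2", "c3", "d4"],
    "danceability": [0.61, 0.61, 0.45, None, 0.80],
    "energy": [0.70, 0.70, 0.32, 0.55, 0.91],
    "track_genre": ["rock", "rock", "jazz", "jazz", "techno"],
})

# 1. Duplicates: count rows that exactly repeat an earlier row.
n_duplicates = int(df.duplicated().sum())

# 2. Missing values per column.
null_counts = df.isna().sum()

# 3. Target balance: tracks per genre after dropping duplicate rows.
genre_counts = df.drop_duplicates()["track_genre"].value_counts()

# 4. Summary statistics for the numeric features.
summary = df[["danceability", "energy"]].describe()

print("duplicate rows:", n_duplicates)
print("nulls per column:", null_counts.to_dict())
print("tracks per genre:", genre_counts.to_dict())
```

On the real dataset, a strongly uneven `genre_counts` would signal class imbalance that has to be handled before modeling.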

Step 4: Data Preprocessing:


• Data Cleaning: Remove any duplicates and handle missing values appropriately.
• Feature Selection: Identify and retain only the relevant features that contribute to genre
classification, discarding non-informative attributes.
• Encoding: Convert categorical variables into numerical format using techniques like
one-hot encoding.
• Scaling: Standardize or normalize numerical features to ensure they are on a similar
scale, which is important for many machine learning algorithms.
• Hierarchical Clustering: Perform clustering on genres to group similar ones together,
reducing the total number of genres for classification.
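The hierarchical clustering step above can be sketched with SciPy. This is an assumption-laden illustration: the genre names and centroid values are invented, each genre is represented by the mean of its (already scaled) features, and Ward linkage with a fixed number of groups is one reasonable choice rather than the report's confirmed setting.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical per-genre mean feature vectors:
# (danceability, energy, acousticness). Values are made up.
genres = ["rock", "hard-rock", "jazz", "blues", "techno", "house"]
centroids = np.array([
    [0.50, 0.80, 0.10],
    [0.48, 0.85, 0.08],
    [0.55, 0.35, 0.75],
    [0.52, 0.40, 0.70],
    [0.75, 0.85, 0.02],
    [0.78, 0.80, 0.03],
])

# Agglomerative clustering of the genre centroids with Ward linkage.
Z = linkage(centroids, method="ward")

# Cut the tree so that at most 3 merged genre groups remain.
labels = fcluster(Z, t=3, criterion="maxclust")
genre_to_group = {g: int(l) for g, l in zip(genres, labels)}

# Dendrogram layout without drawing; drop no_plot to render via matplotlib.
tree = dendrogram(Z, labels=genres, no_plot=True)

print(genre_to_group)
print("leaf order:", tree["ivl"])
```

The resulting `genre_to_group` mapping can then replace the original genre column, so that the classifiers predict the merged groups instead of every fine-grained genre.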

Step 5: Model Development: Choose a variety of machine learning algorithms to test,
including traditional models (e.g., k-Nearest Neighbors, XGBoost) and deep learning models
(e.g., neural networks).
Split the dataset into training and testing sets, typically using an 80-20 split. Train the
selected models on the training set. Optimize model hyperparameters to improve performance
through techniques like grid search or random search.
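A minimal sketch of the 80-20 split and grid search described in Step 5, using scikit-learn. The data comes from `make_classification` rather than the Spotify tracks, and the KNN parameter grid is an illustrative assumption, not the grid used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic multi-class data standing in for the audio-feature table.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# 80-20 split, stratified so each class keeps its share in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so it is refit on every CV fold.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Hypothetical search space; step names follow make_pipeline's convention.
param_grid = {
    "kneighborsclassifier__n_neighbors": [3, 5, 11],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, f"test accuracy: {test_accuracy:.2f}")
```

The test set is touched only once, after the search has picked its best configuration on cross-validated training folds.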

Step 6: Model Evaluation: Evaluate the trained models using the testing dataset.
Use metrics such as top-k categorical accuracy and confusion matrices to assess model
performance and identify areas for improvement.
Compare the performance of different models to determine the best-performing approach.

Step 7: Model Selection: Based on evaluation results, select the model that demonstrates the
highest accuracy and reliability for genre classification.
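The two evaluation metrics named in the steps above, top-k categorical accuracy and the confusion matrix, can be written out in a few lines of NumPy. The helper names and the score matrix below are hypothetical and exist only to make the definitions concrete; scikit-learn also offers `top_k_accuracy_score` and `confusion_matrix` in `sklearn.metrics`.

```python
import numpy as np

def top_k_accuracy(y_true, proba, k=3):
    """Share of rows whose true class index is among the k highest scores."""
    top_k = np.argsort(proba, axis=1)[:, -k:]
    hits = (top_k == np.asarray(y_true)[:, None]).any(axis=1)
    return float(hits.mean())

def confusion_counts(y_true, y_pred, n_classes):
    """Matrix whose entry [t, p] counts rows with true class t predicted as p."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up class scores for 4 tracks over 5 genre indices.
proba = np.array([
    [0.10, 0.50, 0.20, 0.15, 0.05],  # true genre 1: top-1 hit
    [0.30, 0.10, 0.25, 0.20, 0.15],  # true genre 2: top-3 hit only
    [0.40, 0.30, 0.20, 0.06, 0.04],  # true genre 4: miss
    [0.05, 0.10, 0.15, 0.30, 0.40],  # true genre 3: top-3 hit only
])
y_true = np.array([1, 2, 4, 3])

top1 = top_k_accuracy(y_true, proba, k=1)
top3 = top_k_accuracy(y_true, proba, k=3)
cm = confusion_counts(y_true, proba.argmax(axis=1), n_classes=5)

print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
print(cm)
```

In this toy example only one of four tracks is a top-1 hit, while three of four have the true genre among the three highest scores, which is the same kind of gap between standard and Top-3 accuracy that this report discusses.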

5.2 Challenges Faced


1. Data Quality and Availability
• Inconsistent Genre Labels: Many tracks may have ambiguous or inconsistent
genre labels, leading to difficulties in training accurate models. Some songs may
belong to multiple genres, complicating classification.
• Limited Data for Certain Genres: Some genres may be underrepresented in
the dataset, resulting in imbalanced classes that can bias the model towards more
prevalent genres.
2. Feature Extraction and Selection
• Complexity of Audio Features: Extracting meaningful features from audio
signals can be challenging. While tools exist for feature extraction, determining
which features are most relevant for genre classification requires careful
analysis.
• High Dimensionality: The presence of numerous features can lead to the "curse
of dimensionality," where the model's performance deteriorates due to
overfitting and increased computational complexity.
3. Model Selection and Tuning
• Choosing the Right Algorithm: With various machine learning and deep
learning algorithms available, selecting the most appropriate one for genre
classification can be daunting. Each algorithm has its strengths and weaknesses,
and performance can vary significantly based on the dataset.

• Hyperparameter Tuning: Optimizing hyperparameters for different models
can be time-consuming and computationally intensive, requiring extensive
experimentation to achieve the best results.
4. Overfitting and Generalization
• Overfitting: Models, especially complex ones like deep neural networks, may
perform well on training data but poorly on unseen data. Ensuring that the model
generalizes well to new examples is a significant challenge.
• Validation Techniques: Implementing effective validation techniques, such as
cross-validation, is essential to assess model performance accurately and avoid
overfitting.

6. RESULTS AND FINDINGS

All three models achieved significantly higher performance on Top-3 categorical accuracy
than on standard (top-1) accuracy, which highlights how hard it is to pinpoint the exact genre
and how often the correct genre still appears within the top 3 predictions.
• XGBoost: Achieved the highest Top-3 categorical accuracy (73.74%) and the highest
standard accuracy (47.47%) of the three models. While powerful, it likely requires
more computational resources for training.
• Neural Network: Achieved a Top-3 categorical accuracy (69.54%) comparable to
XGBoost's, with a slightly lower standard accuracy (44.0%). Neural networks can also
be computationally expensive to train.
• KNN Classifier: Its Top-3 categorical accuracy (70.36%) was below XGBoost's and
close to the Neural Network's, and its standard accuracy (41.0%) was the lowest of the
three. This is a respectable performance considering its significantly faster training
time compared to XGBoost and Neural Networks. The wide gap between its Top-3 and
standard accuracy suggests it might be more effective at suggesting relevant
genres within its top choices.
KNN offers a compelling balance between accuracy and efficiency. While its Top-3
categorical accuracy is lower than XGBoost's, it achieves a respectable result in a fraction of
the training time required by XGBoost and Neural Networks. This makes it a valuable option
when computational resources are limited or real-time predictions are crucial.

Fig. 1: Accuracies of all three models

Fig. 2: Accuracies by genre
7. FUTURE SCOPE AND IMPROVEMENTS

1. Exploring Better Ensemble Strategies:


Implementing advanced ensemble methods, such as stacking classifiers, can
improve prediction accuracy by combining the strengths of multiple models. This
approach allows different algorithms to contribute to the final decision, potentially
leading to more robust genre classifications.
2. Incorporating Lyrics and Non-Audio Parameters:
Integrating textual data, such as song lyrics, along with other non-audio features
(e.g., artist popularity, release year) can enhance the model's understanding of
music context. This multimodal approach can provide richer insights and improve
overall prediction accuracy.
3. Using Multi-Label Classifiers:
Employing multi-label classification techniques allows the model to assign
multiple genres to a single track, reflecting the reality that many songs belong to
more than one genre. This can enhance the predictive power and provide a more
nuanced classification of music.
4. Employing a Two-Stage Model:
Developing a two-stage classification model can streamline the process by first
categorizing tracks into broader genres and then further classifying them into
specific sub-genres. This hierarchical approach can improve accuracy and
efficiency in genre classification, making it easier to manage complex genre
relationships.
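As an illustration of the first item above, a stacking ensemble could be sketched with scikit-learn's `StackingClassifier`. The base learners (KNN and a shallow decision tree), the logistic-regression meta-learner, and the synthetic data are all assumptions chosen for brevity; they are not the ensemble evaluated in this report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for the genre-labelled feature table.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base learners produce out-of-fold class probabilities (cv=3), and the
# final estimator learns how to combine them into one prediction.
stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train, y_train)

stacked_accuracy = stack.score(X_test, y_test)
proba = stack.predict_proba(X_test)  # usable for top-k evaluation
print(f"stacked accuracy: {stacked_accuracy:.2f}")
```

Because the stacked model exposes `predict_proba`, it can be scored with the same Top-3 metric as the individual models, which allows a like-for-like comparison.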

8. CONCLUSION

In this project we explored using various machine learning models for classifying music
genres from audio features extracted from the Spotify dataset. The XGBoost classifier
achieved the highest Top-3 Categorical Accuracy of 73.74%, closely followed by Neural
Networks and KNN. Genres with distinct sound characteristics were easier to classify, while
those with ambiguous or overlapping characteristics proved more challenging. The ensemble
model did not outperform individual models, possibly due to the logistic regression
component’s limitations for multi-class classification and the computational cost of the
Support Vector Classifier.
While promising results were obtained, the inherent subjectivity and overlap between certain
genres remain challenges.

9. REFERENCES

[1] Music Theory Site. “Time Signatures.” http://musictheorysite.com/

[2] Wikipedia. “Pitch Class.” https://en.wikipedia.org/wiki/Pitch_class

[3] Miller, Tristan. “How Does Decision Tree Output Predict Proba?” Medium, ml-byte-size, 12 Aug. 2020. https://medium.com/ml-byte-size/how-does-decision-tree-output-predict-proba-12c78634c9d5

[4] Stack Overflow. “How Does Sklearn SVM Svcs Function Predict_proba Work Internally?” Stack Overflow, 20 Dec. 2012. https://stackoverflow.com/questions/30674164/confusing-probabilities-of-the-predict-proba-of-scikit-learns-svm

[5] Blind, N. N. “How to Interpret Mean Decrease in Accuracy and Mean Decrease Gini in Random Forest?” Stack Exchange, Stats, 26 Oct. 2016. https://stats.stackexchange.com/questions/420710/range-of-values-for-random-forest-mean-decrease-in-accuracy

[6] Codecademy. “Feature Importance & Feature Selection with Random Forests.” https://www.codecademy.com/learn/machine-learning-random-forests-decision-trees

[7] scikit-learn documentation. “sklearn.tree.DecisionTreeClassifier.” https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

[8] maharshipandya (2021). Spotify Tracks Dataset. [Hugging Face]. Retrieved from https://huggingface.co/datasets/maharshipandya/spotify-tracks-dataset
