2008
We apply a new machine learning tool, kernel combination, to the task of semantic music retrieval. We use 4 different types of acoustic content and social context feature sets to describe a large music corpus and derive 4 individual kernel matrices from these feature sets. Each kernel is used to train a support vector machine (SVM) classifier for each semantic tag (e.g., 'aggressive', 'classic rock', 'distorted electric guitar') in a large tag vocabulary. We examine the individual performance of each feature kernel and then show how to learn an optimal linear combination of these kernels using convex optimization. We find that the retrieval performance of the SVMs trained using the combined kernel is superior to SVMs trained using the best individual kernel for a large number of tags. In addition, the weights placed on individual kernels in the linear combination reflect the relative importance of each feature set when predicting a tag.
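As a rough illustration of the kernel-combination idea, the sketch below forms a convex combination of precomputed Gram matrices (one per feature set) and trains a per-tag SVM on it with scikit-learn. The fixed weights and the use of SVC(kernel='precomputed') are assumptions of this sketch; the convex optimization that learns the optimal weights is not shown.

    import numpy as np
    from sklearn.svm import SVC

    def combine_kernels(kernels, weights):
        # Convex combination of precomputed Gram matrices, one per feature set.
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
        return sum(wi * K for wi, K in zip(w, kernels))

    def train_tag_svm(kernels, weights, y):
        # y: binary labels for one tag (e.g. 'aggressive') over the training songs
        K_train = combine_kernels(kernels, weights)
        clf = SVC(kernel='precomputed', C=1.0)
        clf.fit(K_train, y)
        return clf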
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 2009
When attempting to annotate music, it is important to consider both acoustic content and social context. This paper explores techniques for collecting and combining multiple sources of such information for the purpose of building a query-by-text music retrieval system. We consider two representations of the acoustic content (related to timbre and harmony) and two social sources (social tags and web documents). We then compare three algorithms that combine these information sources: calibrated score averaging (CSA), RankBoost, and kernel combination support vector machines (KC-SVM). We demonstrate empirically that each of these algorithms is superior to algorithms that use individual information sources.
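For the score-fusion side, a minimal sketch of calibrated score averaging is given below: each source's raw relevance scores are mapped to comparable probabilities with a monotone calibration function and then averaged. Using isotonic regression as the calibration map is an assumption of this sketch, not necessarily the calibration used in the paper.

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    def calibrated_score_average(train_scores, y_train, test_scores):
        # train_scores / test_scores: lists with one 1-D score array per information source
        calibrators = []
        for s in train_scores:
            ir = IsotonicRegression(out_of_bounds='clip')
            ir.fit(s, y_train)                 # monotone map from raw score to P(relevant)
            calibrators.append(ir)
        calibrated = [ir.predict(s) for ir, s in zip(calibrators, test_scores)]
        return np.mean(calibrated, axis=0)     # fused relevance score per song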
2006
Efficient and intelligent music information retrieval is a very important topic of the 21st century. With the ultimate goal of building personal music information retrieval systems, this paper studies the problem of identifying "similar" artists using both lyrics and acoustic data. We present a clustering algorithm that integrates features from both sources to perform bimodal learning.
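A very simplified sketch of bimodal integration is shown below: it standardizes lyric and acoustic feature matrices, concatenates them, and clusters artists with k-means. This early-fusion-plus-k-means shortcut stands in for the paper's actual clustering algorithm and is only meant to illustrate the setup.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    def cluster_artists(lyric_features, audio_features, n_clusters=10):
        # lyric_features: (n_artists, d1); audio_features: (n_artists, d2)
        X_lyrics = StandardScaler().fit_transform(lyric_features)
        X_audio = StandardScaler().fit_transform(audio_features)
        X = np.hstack([X_lyrics, X_audio])     # naive early fusion of the two modalities
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)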
2006
There is an increasing interest in customizable methods for organizing music collections. Relevant music characterization can be obtained from short-time features, but it is not obvious how to combine them to get useful information. First, the relevant information might not be evident at the short-time level, and these features have to be combined at a larger temporal level into a new feature vector in order to capture the relevant information. Second, we need to learn a model for the new features that generalizes well to new data. In this contribution, we will study how multivariate analysis (MVA) and kernel methods can be of great help in this task. More precisely, we will present two modified versions of a MVA method known as Orthonormalized Partial Least Squares (OPLS), one of them being a kernel extension, that are well-suited for discovering relevant dynamics in large music collections. The performance of both schemes will be illustrated in a music genre classification task.
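To make the two steps concrete, the sketch below first pools short-time features over a longer segment (mean and standard deviation pooling is an assumed choice) and then computes linear OPLS projections by solving the generalized eigenproblem max tr(U'SxySyxU) subject to U'SxxU = I; the kernel extension is not shown.

    import numpy as np
    from scipy.linalg import eigh

    def pool_short_time_features(frames):
        # frames: (n_frames, d) short-time features for one segment
        return np.hstack([frames.mean(axis=0), frames.std(axis=0)])

    def opls_projections(X, Y, n_components, reg=1e-6):
        # X: (n_segments, d) pooled features; Y: (n_segments, c) one-hot genre labels
        Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
        Sxx = Xc.T @ Xc / len(X) + reg * np.eye(X.shape[1])
        Sxy = Xc.T @ Yc / len(X)
        # max tr(U' Sxy Syx U)  s.t.  U' Sxx U = I   ->   generalized eigenproblem
        _, evecs = eigh(Sxy @ Sxy.T, Sxx)
        return evecs[:, ::-1][:, :n_components]      # leading eigenvectors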
2009
In the process of automatically annotating songs with descriptive labels, multiple types of input information can be used. These include keyword appearances in web documents, acoustic features of the song's audio content, and similarity with other tagged songs. Given these individual data sources, we explore the question of how to aggregate them. We find that fixed-combination approaches like sum and max perform well but that trained linear regression models work better. Retrieval performance improves with more data sources. On the other hand, for large numbers of training songs, Bayesian hierarchical models that aim to share information across individual tag regressions offer no advantage.
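A minimal sketch of the two aggregation styles compared here, assuming one relevance score per source for each (song, tag) pair: fixed sum/max fusion versus a per-tag linear regression learned on training songs.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def fuse_fixed(scores, how='sum'):
        # scores: (n_songs, n_sources) per-source relevance scores for one tag
        return scores.sum(axis=1) if how == 'sum' else scores.max(axis=1)

    def fuse_learned(train_scores, y_train, test_scores):
        # Learn per-tag fusion weights from training songs, then score new songs.
        reg = LinearRegression().fit(train_scores, y_train)
        return reg.predict(test_scores)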
Distributed Framework and Applications (…), 2010
Digital audio has become an almost ubiquitous medium, and for many consumers it is the primary distribution and storage form of music. Numerous on-line music stores account for a growing share of record sales, and the widespread adoption of digital audio on home computers and especially on mobile devices shows the size of this market. Handling the ever-growing size of both private and commercial collections, however, becomes increasingly difficult. Computer algorithms that can understand and interpret characteristics of music, and organise and recommend it to users, can be of great assistance. Music is an inherently multi-modal type of data, and the lyrics associated with the music are as essential to the reception and the message of a song as the audio. Album covers are carefully designed by artists to convey a message consistent with the music and image of a band. Music videos, fan sites and other sources of information add to this in a usually coherent manner. In this paper, we focus on exploring the lyrics domain of music and how this information can be combined with the acoustic domain. We evaluate our approach by means of a common task in music information retrieval, musical genre classification. Advancing over previous work, where we successfully demonstrated improvements from simple feature fusion of different representations of music, we apply a more sophisticated machine learning technique: ensemble classification. The results show that this approach is superior to the best choice of a single algorithm on a single feature set; moreover, it also releases the user from having to make this choice explicitly.
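One simple way to realize such an ensemble (not necessarily the authors' exact scheme) is to train a separate classifier per feature set and fuse their class probabilities, as sketched below; the choice of base classifiers is an assumption.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier

    def ensemble_genre_predict(audio_tr, lyrics_tr, y_tr, audio_te, lyrics_te):
        # One classifier per feature set, fused by averaging class probabilities,
        # so no single algorithm/feature-set choice has to be made up front.
        clf_audio = SVC(probability=True).fit(audio_tr, y_tr)
        clf_lyrics = KNeighborsClassifier(n_neighbors=5).fit(lyrics_tr, y_tr)
        proba = (clf_audio.predict_proba(audio_te) + clf_lyrics.predict_proba(lyrics_te)) / 2
        return clf_audio.classes_[proba.argmax(axis=1)]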
2003
Automatic musical genre classification is very useful for music indexing and retrieval. In this paper, an efficient and effective automatic musical genre classification approach is presented. A set of features is extracted and used to characterize music content. A multi-layer classifier based on support vector machines is applied to musical genre classification. Support vector machines are used to learn the optimal class boundaries between different genres of music from training data. Experimental results show that multi-layer support vector machines achieve good performance in musical genre classification and are more advantageous than a traditional Euclidean-distance-based method and other statistical learning methods.
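A rough sketch of a two-layer SVM arrangement is given below: a first SVM assigns a coarse group, and a second SVM within each group picks the final genre. The particular hierarchy (the mapping from genres to coarse groups) is an assumption of the sketch.

    import numpy as np
    from sklearn.svm import SVC

    class TwoLayerGenreSVM:
        def __init__(self, coarse_of):
            self.coarse_of = coarse_of          # maps fine genre -> coarse group (assumed)
            self.top, self.sub = SVC(), {}

        def fit(self, X, y):
            y = np.asarray(y)
            coarse = np.array([self.coarse_of[g] for g in y])
            self.top.fit(X, coarse)             # layer 1: coarse group boundaries
            for c in np.unique(coarse):
                idx = coarse == c
                self.sub[c] = SVC().fit(X[idx], y[idx])   # layer 2: genres within a group
            return self

        def predict(self, X):
            X = np.asarray(X)
            coarse = self.top.predict(X)
            return np.array([self.sub[c].predict(x[None, :])[0] for c, x in zip(coarse, X)])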
2007
The scenarios opened by the increasing availability, sharing and dissemination of music across the Web are pushing for fast, effective and abstract ways of organizing and retrieving music material. Automatic classification is a central activity in modelling most of these processes, so its design plays a relevant role in advanced Music Information Retrieval. In this paper, we adopted a state-of-the-art machine learning algorithm, i.e. Support Vector Machines, to design an automatic classifier of music genres. In order to optimize classification accuracy, we implemented some already proposed features and engineered new ones to capture aspects of songs that have been neglected in previous studies. The classification results on two datasets suggest that our model, based on very simple features, reaches state-of-the-art accuracy on the ISMIR dataset and very high performance on a music corpus collected locally.
Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, 2015
While content-based approaches for music information retrieval (MIR) have been heavily investigated, user-centric approaches are still in their early stage. Existing user-centric approaches use either the music-context or the user-context to personalize the search, but none of them lets the user choose the context suitable for their needs. In this paper we propose KISS MIR, a versatile approach for music information retrieval that combines both music-context and user-context to rank search results. The core contribution of this work is the investigation of different types of contexts derived from social networks. We distinguish semantic and social information and use them to build semantic and social profiles for music and users. The different contexts and profiles can be combined and personalized by the user. We have assessed the quality of our model using a real dataset from Last.fm. The results show that using the user-context to rank search results performs twice as well as using the music-context. More importantly, the combination of semantic and social information is crucial for satisfying user needs.
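A tiny sketch of how such a user-chosen mix of contexts could rank results, with hypothetical scoring helpers standing in for the semantic/social profile matching described above.

    def rank_results(tracks, query, user, music_score, user_score, alpha=0.5):
        # music_score(track, query) and user_score(track, user) are hypothetical callables
        # returning relevance from the music-context and user-context profiles; alpha lets
        # the user choose the mix (alpha=1: music-context only, alpha=0: user-context only).
        scored = [(alpha * music_score(t, query) + (1 - alpha) * user_score(t, user), t)
                  for t in tracks]
        return [t for _, t in sorted(scored, key=lambda p: p[0], reverse=True)]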
2012
We have developed a novel hybrid representation for Music Information Retrieval. Our representation is built by incorporating audio content into the tag space in a tag-track matrix, and then learning hybrid concepts using latent semantic analysis. We apply this representation to the task of music recommendation, using similarity-based retrieval from a query music track. We also develop a new approach to evaluating music recommender systems, based on which tracks users like. We are interested in measuring recommendation quality and the rate at which cold-start tracks are recommended. Our hybrid representation outperforms a tag-only representation in terms of both recommendation quality and the rate at which cold-start tracks are included as recommendations.
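A compact sketch of the hybrid representation, assuming a track-by-tag matrix and per-track audio features: the two blocks are stacked, reduced with truncated SVD (latent semantic analysis), and recommendations are retrieved by cosine similarity from the query track. The relative weighting of the audio block is an assumption.

    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    def hybrid_lsa_space(track_tag, audio_feats, n_components=50, audio_weight=1.0):
        # track_tag: (n_tracks, n_tags) tag weights; audio_feats: (n_tracks, d) content features
        X = np.hstack([track_tag, audio_weight * audio_feats])
        return TruncatedSVD(n_components=n_components).fit_transform(X)

    def recommend(latent, query_index, top_k=10):
        sims = cosine_similarity(latent[query_index:query_index + 1], latent)[0]
        order = np.argsort(-sims)
        return [i for i in order if i != query_index][:top_k]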
2007
Query-by-semantic-description (QBSD) is a natural paradigm for retrieving content from large databases of music. A major impediment to the development of good QBSD systems for music information retrieval has been the lack of a cleanly labeled, publicly available, heterogeneous data set of songs and associated annotations. We have collected the Computer Audition Lab 500-song (CAL500) data set by having humans listen to and annotate songs using a survey designed to capture 'semantic associations' between music and words. We adapt the supervised multi-class labeling (SML) model, which has shown good performance on the task of image retrieval, and use the CAL500 data to learn a model for music retrieval. The model parameters are estimated using the weighted mixture hierarchies expectation-maximization algorithm, which has been specifically designed to handle real-valued semantic associations between words and songs, rather than binary class labels. The output of the SML model, a vector of class-conditional probabilities, can be interpreted as a semantic multinomial distribution over a vocabulary. By also representing a semantic query as a query multinomial distribution, we can quickly rank-order the songs in a database based on the Kullback-Leibler divergence between the query multinomial and each song's semantic multinomial. Qualitative and quantitative results demonstrate that our SML model can both annotate a novel song with meaningful words and retrieve relevant songs given a multi-word, text-based query.
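The retrieval step can be sketched as below: each song is summarized by a semantic multinomial over the vocabulary, the text query is turned into a query multinomial (uniform mass on the query words is one simple choice, assumed here), and songs are ranked by the Kullback-Leibler divergence between the two.

    import numpy as np

    def kl_divergence(q, p, eps=1e-12):
        q, p = np.asarray(q, float) + eps, np.asarray(p, float) + eps
        q, p = q / q.sum(), p / p.sum()
        return float(np.sum(q * np.log(q / p)))

    def retrieve(query_words, vocab, song_multinomials, top_k=10):
        q = np.zeros(len(vocab))
        for w in query_words:
            q[vocab.index(w)] = 1.0             # uniform query multinomial over query words
        divs = [kl_divergence(q, p) for p in song_multinomials]
        return np.argsort(divs)[:top_k]         # smallest KL(query || song) first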
Complexity, 2018
Automatic retrieval of music information is an active area of research in which problems such as automatically assigning genres or descriptors of emotional content to music emerge. Recent advancements in the area rely on the use of deep learning, which allows researchers to operate on a low-level description of the music. Deep neural network architectures can learn to build feature representations that summarize music files from the data itself, rather than from expert knowledge. In this paper, a novel approach to applying feature learning in combination with support vector machines to musical data is presented. A spectrogram of the music file, which is too complex to be processed by an SVM directly, is first reduced to a compact representation by a recurrent neural network. An adjustment to the loss function of the network is proposed so that the network learns to build a representation space that replicates a certain notion of similarity between annotations, rather than to explicitly make predictions. We evaluate the approach on five datasets, focusing on emotion recognition and complementing it with genre classification. In experiments, the proposed loss function adjustment is shown to improve results in classification and regression tasks, but only when the learned similarity notion corresponds to a kernel function employed within the SVM. These results suggest that adjusting deep learning methods to build data representations that target a specific classifier or regressor can open up new perspectives for the use of standard machine learning methods in the music domain.
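The loss-function adjustment can be illustrated as below: the network's embedding similarities are pushed toward a target kernel computed on the annotations (an RBF kernel here, assuming that is the kernel later used inside the SVM). Only the loss is sketched, not the recurrent network or its training loop.

    import numpy as np

    def similarity_matching_loss(Z, Y, gamma=1.0):
        # Z: (n, d) learned representations of the spectrograms
        # Y: (n, k) annotation vectors (e.g. emotion ratings)
        sq_dists = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        K_target = np.exp(-gamma * sq_dists)    # RBF similarity between annotations
        K_embed = Z @ Z.T                       # similarity in the learned representation space
        return float(((K_embed - K_target) ** 2).mean())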
IEEE Transactions on Multimedia, 2000
Efficient and intelligent music information retrieval is a very important topic of the 21st century. With the ultimate goal of building personal music information retrieval systems, this paper studies the problem of intelligent music information retrieval. Huron [10] points out that since the preeminent functions of music are social and psychological, the most useful characterization would be based on four types of information: genre, emotion, style, and similarity. This paper introduces Daubechies Wavelet Coefficient Histograms (DWCH) for music feature extraction for music information retrieval. The histograms are computed from the coefficients of the db8 Daubechies wavelet filter applied to three seconds of music. A comparative study of sound features and classification algorithms on a dataset compiled by Tzanetakis shows that combining DWCH with timbral features (MFCC and FFT), together with multi-class extensions of the Support Vector Machine, achieves approximately 80% accuracy, which is a significant improvement over the previously known result on this dataset. On another dataset the combination achieves 75% accuracy.
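A rough sketch of DWCH-style feature extraction, assuming PyWavelets: a db8 decomposition of a roughly three-second excerpt, with each subband summarized by a coefficient histogram and simple moments. The number of decomposition levels, histogram bins and the exact statistics are assumptions, not the paper's precise settings.

    import numpy as np
    import pywt

    def dwch_features(excerpt, wavelet='db8', level=7, bins=8):
        # excerpt: 1-D array of audio samples (roughly three seconds)
        coeffs = pywt.wavedec(excerpt, wavelet, level=level)
        feats = []
        for c in coeffs:                        # approximation + detail subbands
            hist, _ = np.histogram(c, bins=bins, density=True)
            feats.extend(hist)                  # coefficient histogram per subband
            feats.extend([c.mean(), c.std()])   # low-order moments per subband
        return np.asarray(feats)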
2011
Music prediction tasks include predicting tags given a song or clip of audio, predicting the name of the artist, and predicting related songs given a song, clip, artist name or tag. That is, we are interested in every semantic relationship between the different musical concepts in our database. In realistically sized databases, the number of songs is measured in the hundreds of thousands or more, and the number of artists in the tens of thousands or more, providing a considerable challenge to standard machine learning techniques. In this work, we propose a method that scales to such datasets and that attempts to capture the semantic similarities between the database items by modeling audio, artist names, and tags in a single low-dimensional semantic space. This space is learnt by optimizing the set of prediction tasks of interest jointly using multi-task learning. Our method both outperforms baseline methods and, in comparison to them, is faster and consumes less memory. We then demonstrate how our method learns an interpretable model, where the semantic space captures well the similarities of interest.
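Once trained, every task reduces to scoring items in the shared space by inner products, as in the small sketch below; the learned embedding matrices are assumed given, and the joint multi-task training itself is not shown.

    import numpy as np

    def rank_tags_for_audio(x, W_audio, tag_embeddings, tag_names, top_k=5):
        # x: (d,) audio feature vector; W_audio: (d, k) learned linear map into the
        # semantic space; tag_embeddings: (n_tags, k) rows living in the same space.
        scores = tag_embeddings @ (x @ W_audio)
        return [tag_names[i] for i in np.argsort(-scores)[:top_k]]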
2012
Music Information Retrieval systems are often based on the analysis of a large number of low-level audio features. When dealing with problems of musical genre description and visualization, however, it would be desirable to work with a very limited number of highly informative and discriminant macro-descriptors. In this paper we focus on a specific class of training-based descriptors, which are obtained as the log-likelihood of a Gaussian Mixture Model trained with short musical excerpts that selectively exhibit a ...
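A minimal sketch of such training-based macro-descriptors: fit one Gaussian mixture per reference class on short excerpts, then describe any new excerpt by its average log-likelihood under each model. The number of mixture components is an assumption.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_reference_models(excerpts_by_class, n_components=8):
        # excerpts_by_class: dict mapping class name -> (n_frames, d) feature matrix
        return {c: GaussianMixture(n_components=n_components).fit(X)
                for c, X in excerpts_by_class.items()}

    def macro_descriptor(frames, models):
        # One value per trained model: the mean log-likelihood of the excerpt's frames.
        return np.array([m.score(frames) for m in models.values()])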
2008
Social tags are free text labels that are applied to items such as artists, albums and songs. Captured in these tags is a great deal of information that is highly relevant to Music Information Retrieval (MIR) researchers including information about genre, mood, instrumentation, and quality. Unfortunately there is also a great deal of irrelevant information and noise in the tags. Imperfect as they may be, social tags are a source of human-generated contextual knowledge about music that may become an essential part of the solution to many MIR problems. In this article, we describe the state of the art in commercial and research social tagging systems for music. We describe how tags are collected and used in current systems. We explore some of the issues that are encountered when using tags, and we suggest possible areas of exploration for future research.
MM'09 - Proceedings of the 2009 ACM Multimedia Conference, with Co-located Workshops and Symposiums, 2009
Music listeners frequently use words to describe music. Personalized music recommendation systems such as Last.fm and Pandora rely on manual annotations (tags) as a mechanism for querying and navigating large music collections. A well-known issue in such recommendation systems is the cold-start problem: it is not possible to recommend new songs/tracks until those songs/tracks have been manually annotated. Automatic tag annotation based on content analysis is a potential solution to this problem and has recently been gaining attention. We describe how stacked generalization can be used to improve the performance of a state-of-the-art automatic tag annotation system for music based on audio content analysis and report results on two publicly available datasets.
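A sketch of the stacking idea, with logistic regression standing in for the state-of-the-art first-stage annotator: out-of-fold first-stage tag probabilities become the input features of a second-stage classifier per tag, so correlations between tags can correct the individual outputs.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    def train_stacked_annotator(X, Y):
        # X: (n_songs, d) audio features; Y: (n_songs, n_tags) binary tag matrix
        n_tags = Y.shape[1]
        stage1 = [LogisticRegression(max_iter=1000).fit(X, Y[:, t]) for t in range(n_tags)]
        # Out-of-fold stage-1 probabilities are the stage-2 inputs (avoids leakage).
        P = np.column_stack([
            cross_val_predict(LogisticRegression(max_iter=1000), X, Y[:, t],
                              cv=5, method='predict_proba')[:, 1]
            for t in range(n_tags)])
        stage2 = [LogisticRegression(max_iter=1000).fit(P, Y[:, t]) for t in range(n_tags)]
        return stage1, stage2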
IEEE Transactions on Audio, Speech & Language Processing, 2008
We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our "query-by-text" system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.
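The modeling side can be sketched as below, with ordinary per-word GMM training on pooled frames standing in for the weighted mixture hierarchies EM described above; evaluating the word models on a new track and normalizing yields its semantic multinomial.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_word_models(frames_by_song, annotations, vocab, n_components=8):
        # frames_by_song: dict song -> (n_frames, d); annotations: dict song -> set of words
        models = {}
        for w in vocab:
            pooled = np.vstack([frames_by_song[s] for s, words in annotations.items() if w in words])
            models[w] = GaussianMixture(n_components=n_components).fit(pooled)
        return models

    def semantic_multinomial(frames, models):
        loglik = np.array([m.score(frames) for m in models.values()])
        p = np.exp(loglik - loglik.max())       # normalize per-word likelihoods
        return p / p.sum()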
Neural Computation, 1998
In music genre classification the decision time is typically on the order of several seconds, whereas most automatic music genre classification systems focus on short-time features derived from 10–50 ms windows. This work investigates two models, the multivariate Gaussian model and the multivariate autoregressive model, for modelling short-time features. Furthermore, it was investigated how these models can be integrated over a segment of short-time features into a kernel such that a support vector machine can be applied. Two kernels with this property were considered: the convolution kernel and the product probability kernel. In order to examine the different methods, an 11-genre music setup was utilized in which Mel-Frequency Cepstral Coefficients were used as the short-time features. The accuracy of the best-performing model on this data set was approximately 44%, compared to a human performance of approximately 52% on the same data set.
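For the Gaussian model, the product probability kernel with exponent 1/2 reduces to the Bhattacharyya coefficient between the two segment Gaussians, which has the closed form sketched below; the resulting Gram matrix can then be fed to an SVM with a precomputed kernel. The diagonal regularization is an assumption.

    import numpy as np
    from numpy.linalg import inv, slogdet

    def fit_segment_gaussian(frames):
        # frames: (n_frames, d) MFCCs within one decision-time segment
        return frames.mean(axis=0), np.cov(frames, rowvar=False) + 1e-6 * np.eye(frames.shape[1])

    def bhattacharyya_kernel(g1, g2):
        # Product probability kernel with exponent 1/2 between two Gaussians
        (m1, S1), (m2, S2) = g1, g2
        S = 0.5 * (S1 + S2)
        d = m1 - m2
        dist = (0.125 * d @ inv(S) @ d
                + 0.5 * (slogdet(S)[1] - 0.5 * slogdet(S1)[1] - 0.5 * slogdet(S2)[1]))
        return float(np.exp(-dist))

    # Gram matrix for SVC(kernel='precomputed'): K[i, j] = bhattacharyya_kernel(g[i], g[j])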