National Conference on Signal and Image Processing Applications
Music information retrieval is currently an active research area that addresses the extraction of musically important information from audio signals, and the applications of such information. The extracted information can be used for search and retrieval of music in recommendation systems, to aid musicological studies, or even in music learning. Sophisticated signal processing techniques are applied to convert low-level acoustic signal properties into musical attributes, which are further embedded in a rule-based or statistical classification framework to link with high-level descriptions such as melody, genre, mood and artist type. Vocal music comprises a large and interesting category of music where the lead instrument is the singing voice. The singing voice is more versatile than many musical instruments and therefore poses interesting challenges to information retrieval systems. In this paper, we provide a brief overview of research in vocal music processing, followed by a description of related work at IIT Bombay leading to the development of an interface for melody detection of the singing voice in polyphony.
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries - JCDL '05, 2005
With the explosive growth of networked collections of musical material, there is a need to establish a mechanism like a digital library to manage music data. This paper presents a content-based processing paradigm of popular song collections to facilitate the realization of a music digital library. The paradigm is built on the automatic extraction of information of interest from music audio signals. Because the vocal part is often the heart of a popular song, we focus on developing techniques to exploit the solo vocal signals underlying an accompanied performance. This supports the necessary functions of a music digital library, namely, music data organization, music information retrieval/recommendation, and copyright protection.
Lecture Notes in Computer Science, 2005
This paper investigates the problem of retrieving popular music by singing. In contrast to the retrieval of MIDI music, where the main melody is easily acquired by selecting the relevant symbolic tracks, retrieving polyphonic objects in CD or MP3 format requires extracting the main melody directly from the accompanied singing signals, which is difficult to handle well using conventional pitch estimation alone. To reduce the interference of background accompaniments during main melody extraction, methods are proposed to estimate the underlying sung notes in a music recording by taking into account the characteristic structure of popular song. In addition, to accommodate users' unprofessional or personal singing styles, methods are proposed to handle the inaccuracies of tempo, pause, transposition, off-key singing, etc., that inevitably exist in queries. The proposed system has been evaluated on a music database consisting of 2613 phrases extracted manually from 100 Mandarin pop songs. The experimental results indicate the feasibility of retrieving pop songs by singing.
In recent years, digital music has grown into a billion-dollar market, with the US remaining its most profitable region. Owing to this digital shift, people today have access to millions of music clips from online music applications through their smartphones. In this context, issues arise between music listeners and music search engines when querying and retrieving music clips from a large music data set. Classification is one of the fundamental problems in music information retrieval (MIR), yet hurdles remain in categorizing music collections according to listeners' preferences. In this paper, different music feature extraction techniques are addressed, which can be used in various music classification tasks such as listener mood detection, instrument recognition, artist identification, genre classification, query-by-humming, and music annotation. This review illustrates various features that can be used for addressing the research challenges posed by music mining.
2009
Detecting distinct features in modern pop music is an important problem with significant applications in areas such as multimedia entertainment. Such features can be used, for example, to give a visually coherent representation of the sound. We propose to integrate a singing voice detector with a multimedia, multi-touch game where the user has to perform simple tasks at certain key points in the music. While the ultimate goal is to automatically create visual content in response to features extracted from the music, here we focus on the detection of voice segments in songs. The solution presented extracts the Mel-Frequency Cepstral Coefficients of the sound and uses a Hidden Markov Model to infer whether the sound contains voice. The classification rate obtained is high compared to other singing voice detectors that use Mel-Frequency Cepstral Coefficients.
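The decoding step of an MFCC-plus-HMM voice detector like the one described above typically reduces to the Viterbi algorithm. A minimal sketch, assuming a 2-state model ("no voice" / "voice") with toy frame log-likelihoods standing in for the Gaussians that would normally score MFCC frames:

```python
import numpy as np

def viterbi(log_lik, log_trans, log_init):
    """log_lik: (T, S) per-frame log-likelihoods; returns the best state path."""
    T, S = log_lik.shape
    delta = np.zeros((T, S))            # best path score ending in each state
    back = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = log_init + log_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # (S, S): prev -> next
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_lik[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):       # trace back the winning path
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy scores: frames 0-2 favour state 0 (instrumental), frames 3-5 state 1 (voice).
log_lik = np.log(np.array([[.9, .1], [.8, .2], [.7, .3],
                           [.2, .8], [.1, .9], [.1, .9]]))
log_trans = np.log(np.array([[.9, .1], [.1, .9]]))  # "sticky" states smooth the path
log_init = np.log(np.array([.5, .5]))
print(viterbi(log_lik, log_trans, log_init))  # -> [0 0 0 1 1 1]
```

The sticky transition matrix is what gives the HMM its advantage over frame-wise thresholding: isolated misclassified frames are smoothed away by the path cost.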
2012
This dissertation is concerned with the problem of describing the singing voice within the audio signal of a song. This work is motivated by the fact that the lead vocal is the element that attracts the attention of most listeners. For this reason it is common for music listeners to organize and browse music collections using information related to the singing voice such as the singer name. Our research concentrates on the three major problems of music information retrieval: the localization of the source to be described (i.e. the recognition of the elements corresponding to the singing voice in the signal of a mixture of instruments), the search of pertinent features to describe the singing voice, and finally the development of pattern recognition methods based on these features to identify the singer. For this purpose we propose a set of novel features computed on the temporal variations of the fundamental frequency of the sung melody. These features, which aim to describe the vib...
2000
This paper presents a method for extracting vocal melodies from popular songs. Underlying the extraction procedure is a sinusoidal representation applied to the input song signal. The desired vocal melody is isolated by focusing on specific (amplitude- and frequency-modulated) sinusoids that are identified as vocal, with the identification based on minimum mean square error (MMSE) estimation of the singing
National Conference on Communications, Bombay, …, 2002
This paper describes some early attempts at developing a music indexing and retrieval system based on melody, or tune, of songs. In the envisaged system, the "query", a song fragment whistled or sung by the user into a microphone, is used to search a database of soundtracks to find the entry that is best matched to it in tune. The challenging issues that this project raises are described. Signal processing tools suitable for melody detection are presented, and finally some experimentally obtained results are discussed.
IEEE Transactions on Multimedia, 2000
Efficient and intelligent music information retrieval is a very important topic of the 21st century. With the ultimate goal of building personal music information retrieval systems, this paper studies the problem of intelligent music information retrieval. Huron [10] points out that since the preeminent functions of music are social and psychological, the most useful characterization would be based on four types of information: genre, emotion, style, and similarity. This paper introduces Daubechies Wavelet Coefficient Histograms (DWCH) for music feature extraction for music information retrieval. The histograms are computed from the coefficients of the db8 Daubechies wavelet filter applied to three seconds of music. A comparative study of sound features and classification algorithms on a dataset compiled by Tzanetakis shows that combining DWCH with timbral features (MFCC and FFT), with the use of multi-class extensions of Support Vector Machine, achieves approximately 80% accuracy, which is a significant improvement over the previously known result on this dataset. On another dataset the combination achieves 75% accuracy.
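The DWCH construction described above can be sketched in a few lines: decompose the signal with a discrete wavelet transform, then summarize each detail subband by a coefficient histogram plus a moment. The paper uses the db8 filter on three seconds of audio; this hedged sketch substitutes a hand-rolled Haar transform on a synthetic signal to stay self-contained.

```python
import numpy as np

def haar_dwt(x):
    """One Haar level: returns (approximation, detail) coefficients."""
    x = x[: len(x) // 2 * 2]            # even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def dwch_features(x, levels=4, bins=8):
    """Per subband: a normalized coefficient histogram plus its std (DWCH-style)."""
    feats = []
    for _ in range(levels):
        x, d = haar_dwt(x)              # descend one level; d is the detail band
        hist, _ = np.histogram(d, bins=bins, density=True)
        feats.extend(hist)
        feats.append(d.std())           # energy-style subband moment
    return np.array(feats)

t = np.linspace(0, 1, 4096, endpoint=False)
signal = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 660 * t)
f = dwch_features(signal)
print(f.shape)  # -> (36,)  i.e. levels * (bins + 1)
```

The resulting fixed-length vector is what would then be concatenated with timbral features (MFCC, FFT) and fed to the SVM.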
2017
The performance of existing search engines for image retrieval faces challenges, often returning inappropriate, noisy data rather than the accurate information searched for. This is because the retrieval methodology is mostly based on text information input by the user. In certain areas, human computation can give better results than machines. In the proposed work, two approaches are presented. In the first approach, Unassisted and Assisted Crowd Sourcing techniques are implemented to extract attributes for classical music by involving users (players) in the activity. In the second approach, signal processing is used to automatically extract relevant features from classical music. The Mel Frequency Cepstral Coefficient (MFCC) is used for feature learning, which generates primary-level features from the music audio input. To extract high-level features related to the target class and to enhance the primary-level features, feature enhancement is done. During the lea...
Automatic singing detection and singing phoneme recognition are two MIR research topics that have gained a lot of attention in recent years. The first approaches borrowed successful techniques widely used in Automatic Speech Recognition (ASR), as speech and singing share similar acoustical features since they are produced by the same apparatus. Moving from monophonic to polyphonic audio signals, the problem becomes more complex as the background instrumental accompaniment is regarded as a noise source that has to be attenuated. This thesis presents research into the problem of singing phoneme detection in polyphonic audio in which the lyrics are in English. Specifically, we are interested in building statistical classification models that are able to automatically distinguish sung consonants and vowels from pure instrumental music in polyphonic music recordings.
Adaptive Multimedia Retrieval. Large-Scale Multimedia Retrieval and Evaluation, 2013
The effectiveness of audio content analysis for music retrieval may be enhanced by the use of available metadata. In the present work, observed differences in singing style and instrumentation across genres are used to adapt acoustic features for the singing voice detection task. Timbral descriptors traditionally used to discriminate singing voice from accompanying instruments are complemented by new features representing the temporal dynamics of source pitch and timbre. A method to isolate the dominant source spectrum serves to increase the robustness of the extracted features in the context of polyphonic audio. While demonstrating the effectiveness of combining static and dynamic features, experiments on a culturally diverse music database clearly indicate the value of adapting feature sets to genre-specific acoustic characteristics. Thus commonly available metadata, such as genre, can be useful in the front-end of an MIR system.
Lecture Notes in Computer Science, 2010
The problem of identifying sections of singing voice and instruments is investigated in this paper. Three classification techniques are presented and compared: the Linde-Buzo-Gray algorithm (LBG), Gaussian Mixture Models (GMM), and the feed-forward Multi-Layer Perceptron (MLP). All techniques are based on Mel-frequency Cepstral Coefficients (MFCC), which are commonly used in the speech and speaker recognition domains. All the proposed approaches yield a decision only every 125 ms. A large experimental data set is extracted from the music genre database RWC, covering various styles (68 pieces, 25 subcategories). The recognition scores are evaluated both on data used in the training session and on data never seen by the proposed systems. The best results are obtained with the GMM (94% on training data and 80.5% on test data).
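The GMM approach above amounts to fitting one model per class on labelled MFCC frames and labelling new frames by the higher log-likelihood. A minimal sketch with a single diagonal Gaussian per class (a degenerate 1-component GMM) and random stand-ins for MFCC frames, not the paper's actual models or data:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussian(frames):
    """Diagonal Gaussian: per-dimension mean and variance (variance floored)."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def log_lik(frames, mean, var):
    """Per-frame diagonal-Gaussian log-likelihood."""
    return -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var).sum(axis=1)

# Stand-in "MFCC" frames: 13-dimensional, well-separated class means.
voice_train = rng.normal(loc=2.0, scale=1.0, size=(200, 13))
instr_train = rng.normal(loc=-2.0, scale=1.0, size=(200, 13))
models = [fit_gaussian(instr_train), fit_gaussian(voice_train)]  # class 0, class 1

# Classify unseen frames: 5 instrumental-like, then 5 voice-like.
test = np.vstack([rng.normal(-2.0, 1.0, (5, 13)), rng.normal(2.0, 1.0, (5, 13))])
scores = np.stack([log_lik(test, m, v) for m, v in models], axis=1)
print(scores.argmax(axis=1))  # each frame should map to its own class model
```

A real system would use several mixture components per class (fitted by EM) and pool decisions over 125 ms windows, as the paper does.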
1997
This paper describes a system designed to retrieve melodies from a database on the basis of a few notes sung into a microphone. The system first accepts acoustic input from the user, transcribes it into common music notation, then searches a database of 9400 folk tunes for those containing the sung pattern, or patterns similar to the sung pattern; retrieval is ranked according to the closeness of the match. The paper presents an analysis of the performance of the system using different search criteria involving melodic contour, musical intervals and rhythm; tests were carried out using both exact and approximate string matching. Approximate matching used a dynamic programming algorithm designed for comparing musical sequences. Current work focuses on developing a faster algorithm.
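The approximate-matching step described above can be illustrated with the classic dynamic-programming edit distance applied to melodies encoded as semitone intervals (which makes the match transposition-invariant). This is a generic sketch; the paper's actual cost model for musical sequences may weight contour, interval, and rhythm differently.

```python
import numpy as np

def edit_distance(a, b):
    """Levenshtein distance between two sequences via dynamic programming."""
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i, j] = min(d[i - 1, j] + 1,                           # deletion
                          d[i, j - 1] + 1,                           # insertion
                          d[i - 1, j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return int(d[-1, -1])

def intervals(midi_notes):
    """Semitone steps between successive notes: transposition-invariant."""
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]

tune  = [60, 62, 64, 65, 67]   # C D E F G
query = [62, 64, 66, 67, 69]   # same tune sung a whole tone higher
print(edit_distance(intervals(tune), intervals(query)))  # -> 0 (perfect match)
```

Ranking the database by this distance yields the "closeness of match" ordering the abstract mentions; a sung wrong note or extra note costs one edit rather than ruining the match.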
2019
In our daily lives, we are constantly surrounded by music, and we are deeply influenced by music. Making music together can create strong ties between people, while fostering communication and creativity. This is demonstrated, for example, by the large community of singers active in choirs or by the fact that music constitutes an important part of our cultural heritage. The availability of music in digital formats and its distribution over the world wide web has changed the way we consume, create, enjoy, explore, and interact with music. To cope with the increasing amount of digital music, one requires computational methods and tools that allow users to find, organize, analyze, and interact with music--topics that are central to the research field known as \emph{Music Information Retrieval} (MIR). The Dagstuhl Seminar 19052 was devoted to a branch of MIR that is of particular importance: processing melodic voices (with a focus on singing voices) using computational methods. It is of...
The 10th IEEE International Symposium on Signal Processing and Information Technology, 2010
Song and music discrimination plays a significant role in multimedia applications such as genre classification and singer identification. The problem of identifying sections of singing voice and instrument signals is addressed in this paper. A system must therefore be able to detect when a singer starts and stops singing. In addition, it must be efficient in all circumstances: whether the interpreter is a man or a woman, whatever the register (soprano, alto, baritone, tenor or bass) or style of music, and independently of the number of instruments. Our approach does not assume a priori knowledge of song and music segments. We use simple and efficient threshold-based distance measurements for discrimination. The Linde-Buzo-Gray vector quantization algorithm and Gaussian Mixture Models (GMMs) are used for comparison purposes. Our approach is validated on a large experimental dataset from the music genre database RWC that includes many styles (25 styles and 272 minutes of data).
2005 IEEE International Conference on Multimedia and Expo
In this work we strive to find an optimal set of acoustic features for the discrimination of speech, monophonic singing, and polyphonic music, in order to robustly segment acoustic media streams for annotation and interaction purposes. Furthermore, we introduce ensemble-based classification approaches within this task. From a basis of 276 attributes, we select the most efficient set by SVM-SFFS. Additionally, the relevance of single features is presented by calculation of the information gain ratio. As a basis of comparison, we reduce dimensionality by PCA. We show extensive analysis of different classifiers within the named task, among them Kernel Machines, Decision Trees, and Bayesian Classifiers. Moreover, we improve single-classifier performance by Bagging and Boosting, and finally combine the strengths of classifiers by StackingC. The database is formed by 2,114 samples of speech and singing from 58 persons. 1,000 music clips have been taken from the MTV-Europe-Top-20 1980-2000. The outstanding discrimination results of a working realtime-capable implementation stress the practicability of the proposed novel ideas.
International Journal of Innovative Research in Computer and Communication Engineering, 2015
An audio signal is an acoustic signal with a frequency range of roughly 20 to 20,000 Hz. The human auditory system has a remarkable ability to focus effectively on a particular sound in the surroundings. Most audio signals are mixtures of several sound sources. Separating the singing voice from music has a wide range of applications, such as lyrics recognition and alignment, singer identification, and music information retrieval. The musical accompaniment is often non-stationary and harmonic, and the singing voice occupies time-frequency segments of the audio signal. An audio signal classification system should be able to categorize different audio classes such as speech, background noise, and musical genres, and to support tasks such as singer identification and karaoke. This paper discusses the separation techniques and classifiers used for separating the singing voice from music. Non-negative matrix factorization (NMF) is used for separation from music, with Gaussian mixture model (GMM) and Support vector machine (SVM) classifiers for th...
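The NMF step mentioned above factors a non-negative magnitude spectrogram V into spectral bases W and activations H. A minimal sketch using the standard Lee-Seung multiplicative updates for the Euclidean cost, on a random stand-in spectrogram; a real separation system would then group the learned bases into voice versus accompaniment, a step omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.random((20, 30))          # stand-in |spectrogram|: 20 bins x 30 frames
k = 4                             # number of basis spectra
W = rng.random((20, k)) + 0.1     # spectral bases (kept strictly positive)
H = rng.random((k, 30)) + 0.1     # per-frame activations
eps = 1e-9                        # guards against division by zero

for _ in range(200):
    # Multiplicative updates preserve non-negativity by construction.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.3f}")
```

Because both factors stay non-negative, each column of W behaves like an additive spectral template, which is what makes assigning templates to "voice" or "accompaniment" meaningful.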
ACM Computing Surveys, 2018
A huge increase in the number of digital music tracks has created the necessity to develop an automated tool to extract useful information from these tracks. As this information has to be extracted from the contents of the music, the field is known as content-based music information retrieval (CB-MIR). In the past two decades, several research outcomes have been observed in the area of CB-MIR. There is a need to consolidate and critically analyze these research findings to evolve future research directions. In this survey article, various tasks of CB-MIR and their applications are critically reviewed. In particular, the article focuses on eight MIR-related tasks: vocal/non-vocal segmentation, artist identification, genre classification, raga identification, query-by-humming, emotion recognition, instrument recognition, and music clip annotation. The fundamental concepts of Indian classical music are detailed to attract future research on this topic. The article elaborates on the...
Archives of Acoustics, 2008
This paper presents the main issues related to the music information retrieval (MIR) domain. MIR is a multi-disciplinary area. Within this domain, there exists a variety of approaches to musical instrument recognition, musical phrase classification, melody classification (e.g. query-by-humming systems), rhythm retrieval, high-level-based music retrieval such as looking for emotions in music or differences in expressiveness, music search based on listeners' preferences, etc. The key issue, however, lies in the parameterization of a musical event. In this paper, some aspects related to MIR are briefly reviewed in the context of possible and current applications in this domain.