multilingualphilippines.com
Speech rhythm is one of the cues human listeners use to discriminate between languages. This paper gives the rhythm of Filipino speech empirical validation using the computational and statistical methods prescribed by Ramus, Nespor, and Mehler (RNM), implemented on the Filipino Speech Corpus of the Digital Signal Processing Laboratory. Under the conventional two-class grouping into syllable-timed and stress-timed languages, Filipino was shown to be syllable-timed; under the extended classification system, it was shown to be mora-timed. The study was intended to support comprehensive modeling of the prosodic parameters of Filipino speech for a natural-sounding Filipino text-to-speech (TTS) system and a robust automatic speech recognition system. The paper has already been published and presented at a digital speech processing conference, but it was recently brought to our attention that the study has major implications for the linguistic description of the Filipino language and for the recently ratified mother-tongue-based multilingual education. Like a digital speech system, a person studying a second language (L2) must decide which segment units to use. Without proper guidance, the student will segment speech according to the rhythm of the native tongue (L1). Awareness of such speech parameters is very important for the student, to avoid confusion, and especially for the teacher, who will need to design an appropriate program to facilitate effective language acquisition.
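As a rough illustration of the kind of interval statistics used in the RNM approach, the sketch below computes the proportion of vocalic intervals (%V) and the standard deviations of vocalic and consonantal interval durations (ΔV, ΔC) for one sentence. The function name, the input layout, and the sample durations are hypothetical, and the use of the population standard deviation is an assumption; none of this is taken from the paper's own implementation.

```python
from statistics import pstdev


def rhythm_metrics(intervals):
    """Compute %V, delta-V and delta-C for one sentence.

    intervals: list of (kind, duration_seconds) pairs, where kind is
    "V" for a vocalic interval and "C" for a consonantal interval.
    Population standard deviation is an assumption of this sketch.
    """
    vocalic = [dur for kind, dur in intervals if kind == "V"]
    consonantal = [dur for kind, dur in intervals if kind == "C"]
    total = sum(vocalic) + sum(consonantal)
    return {
        "percent_V": 100.0 * sum(vocalic) / total,
        "delta_V": pstdev(vocalic),
        "delta_C": pstdev(consonantal),
    }


# Hypothetical segmentation of a short utterance (durations in seconds).
sentence = [
    ("C", 0.08), ("V", 0.10),
    ("C", 0.06), ("V", 0.12),
    ("C", 0.10), ("V", 0.08),
]
metrics = rhythm_metrics(sentence)
print(metrics)
```

Per-sentence values like these are the sort of features that can then be compared across languages or passed to a classifier.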
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation. 2011., 2011
This study incorporates computational and perceptual methods to classify Filipino speech rhythm. Speech rhythm may be described as a language's distinguishing durational sound pattern, resulting from the complexity of the language's syllable inventory. Computational methods correlate rhythm types with acoustic features such as vocalic and consonantal intervals; one such method is Multivariate Discriminant Analysis (MDA). Perceptual methods contrast the rhythm of an unclassified language with prototype syllable-timed and stress-timed sentences. To isolate rhythm from speech, a data-stripping technique called flat sasasa resynthesis was implemented, in which consonants are replaced with /s/ and vowels with /a/, producing resynthesized alternating "sasasa" sounds at a constant pitch (F0). The rhythm discrimination and classification were closely examined for consistency between the data modeling and the listening test results. The computational experiment showed that an MDA classifier trained to distinguish English and Japanese sentences tends to label Filipino sentences as Japanese 67% of the time, and the perceptual experiment showed that listeners perceive Filipino to be more similar to Japanese. Together, these results give computational and perceptual validation that Filipino is syllable-timed, just like Japanese.
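The labeling step behind the flat sasasa transformation described above can be sketched as follows. This covers only the symbolic mapping of a phoneme-level transcription to /s/ and /a/ with durations preserved; the actual audio resynthesis at constant F0 is not shown. The vowel inventory, the helper names, and the sample word with its durations are assumptions made for illustration.

```python
VOWELS = {"a", "e", "i", "o", "u"}  # assumed vowel inventory for this sketch


def to_sasasa(segments):
    """Replace every vowel with /a/ and every consonant with /s/.

    segments: list of (phoneme, duration_seconds) pairs.
    Durations are kept unchanged so that only rhythm survives.
    """
    return [("a" if ph in VOWELS else "s", dur) for ph, dur in segments]


def merge_adjacent(segments):
    """Merge neighbouring segments with the same label into one interval."""
    merged = []
    for label, dur in segments:
        if merged and merged[-1][0] == label:
            merged[-1] = (label, merged[-1][1] + dur)
        else:
            merged.append((label, dur))
    return merged


# Hypothetical phoneme-level transcription of "trabaho" (durations in seconds).
word = [
    ("t", 0.04), ("r", 0.05), ("a", 0.10),
    ("b", 0.06), ("a", 0.11), ("h", 0.05), ("o", 0.12),
]
flat = merge_adjacent(to_sasasa(word))
print("".join(label for label, _ in flat))
```

The consonant cluster at the start of the word collapses into a single, longer /s/ interval, which is how syllable complexity shows up in the stripped signal.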
International Journal of Technology Enhanced Learning, 2023
This paper explores the extraction and analysis of prosodic features in children’s Filipino speech for application in automated oral reading fluency assessment. Automatic syllabication was optimised in the context of children’s Filipino read speech. Using the Children Filipino Speech Corpus, prosodic features were automatically extracted which were then classified according to human rater assessment of fluency. Analysis of variance showed that speech and articulation rates, pauses, syllable duration, and pitch can be used to classify children’s oral reading fluency in Filipino into three levels, namely, independent, instructional and frustration. Using machine learning classification methods, fivefold cross-validation showed that speech rate, articulation rate and number of pauses can be used to predict oral reading fluency at 92%, 85% and 76% accuracy for 2, 3 and 4 levels of fluency classification, respectively. Pitch and syllable duration patterns were also characterised for the assessment of phrasing and expression between fluent and non-fluent readers.
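The three predictors named above can be derived from a syllable count and a list of pause durations. The sketch below uses the common definitions (speech rate over total time, articulation rate over time excluding pauses); the function name, its arguments, and the sample values are hypothetical, and the abstract does not say how pauses were detected or thresholded.

```python
def fluency_features(n_syllables, total_duration, pauses):
    """Compute speech rate, articulation rate and pause count.

    n_syllables: number of syllables in the reading.
    total_duration: total reading time in seconds, pauses included.
    pauses: list of pause durations in seconds (detection method assumed).
    """
    pause_time = sum(pauses)
    phonation_time = total_duration - pause_time
    return {
        # syllables per second over the whole reading
        "speech_rate": n_syllables / total_duration,
        # syllables per second over speaking time only
        "articulation_rate": n_syllables / phonation_time,
        "num_pauses": len(pauses),
    }


# Hypothetical reading: 40 syllables in 20 s with four pauses.
features = fluency_features(
    n_syllables=40, total_duration=20.0, pauses=[0.5, 1.0, 0.5, 2.0]
)
print(features)
```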
Text-to-speech (TTS) is a system that synthesizes speech from text. The quality of TTS can be judged by its intelligibility and naturalness. Prosody is one of the parameters that can improve the quality of TTS. This study develops a prosody model based on Indonesian syntactic category information. A syntactic category is a word or combination of words that can be categorized as a subject, predicate, object, or complement in a sentence. The prosody model developed in this study uses a chunking method to determine the syntactic phrase category, and hidden Markov models (HMMs) to predict the pitch curve templates that match the input sequence of syntactic phrase categories. The hidden states of the HMM are the pitch curve template types, and the observations are the syntactic phrase categories. The pitch curve templates are developed by combining prosodic theory with the Fujisaki pitch contour model. The prosody generation method converts the template pitch curves into phoneme codes, durations, and pitch values that represent the pitch contour of each input sentence.
2019
This paper presents the development of an accent recognition system for native speakers of Bikol and Tagalog using deep learning. The results serve as a baseline for advancing the recognition of speakers with Tagalog and Bikol accents in the Filipino language. A monologue written in Filipino was prepared as the script for developing the speech corpus and was used to capture the Bikol and Tagalog accents in the recordings. The corpus was validated, cleaned, and divided in an 80:20 ratio for training and testing. Praat was then used to analyze and extract prosodic features such as F1 and energy of speech. When tested, the model yielded 79.28% and 78.33% accuracy for the Tagalog and Bikol accents, respectively.
… : APSIPA ASC 2009: …, 2009
In this paper we describe the development of an intonation model and a duration model to generate prosody for the Filipino language. Z-scores of normalized durations are used for the duration model and the Tilt parameters are used for the intonation model. The Filipino ...
2006
Utterance-type information has been used in spoken dialogue systems, speech recognition systems, and machine translation. In a typical spoken dialogue system, a user can ask a question or give information to the system. The system, in turn, should be capable of recognizing the user's intention in order to give the correct response. In this dissertation, an automatic utterance-type recognizer is proposed to distinguish declarative questions from statements in Indonesian speech. Since utterances of these two types have the same words in the same order and differ only in their intonation, their classification requires not only a word recognizer but also an intonation recognizer. At first, the utterance-type recognizer is designed based on the Fujisaki model. The utterance-type recognizer uses a combination of the Fujisaki model parameters as the features to recognize the two utterance types. The best performance of the Fujisaki model based utterance-type rec...
13th National Natural Language Processing Research Symposium, 2017
The language learning system employs a speech corpus that contains selected phrases from an Ilokano phrasebook authored by Dr. Carl Rubino [7]. A survey was conducted in Narvacan, Ilocos Sur to select Ilokano phrases rated as "commonly used". Using the survey results, the speech corpus was generated from the recordings of 10 middle-age (ages 18-28) native Ilokano speakers, which were transcribed at the phoneme level. The system has a built-in assessment subsystem that evaluates the user's learning in three categories: reading, listening, and comprehension. The assessment uses a Reading Miscue Detector (RMD) that employs a forced alignment method. An HMM-based Automated Speech Recognition (ASR) system was developed using the phoneme-level transcribed speech corpus as training data and Viterbi alignment as the main method for likelihood scoring. The 3-state HMMs were generated using the Hidden Markov Model Toolkit (HTK). A six-week pilot study was conducted to measure how much of the language the users learned from the system. The offline test set contains 3 subsets of full system passages by 3 users, in which each phrase has one phoneme with wrong pronunciation. The test set was used to measure the False Alarm Rate (FAR) and Misdetection Rate (MdR) of the RMD; both were fairly low, so the RMD may be considered good according to previous studies presented in the literature [4] [10] [11] [12] [13]. The authors demonstrated that 65.75% of the reading miscues can be detected by the system at a false alarm rate of 25.16%, which is acceptable relative to similar systems presented in the literature, such as [4] [10] [12]. Another finding of this study is that the speech rhythm of the Ilokano language is mora-timed [8].
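To make the reported rates concrete, the sketch below computes a detection rate, a misdetection rate, and a false alarm rate from per-phoneme ground truth and detector output. The definitions used here (misdetections over actual miscues, false alarms over correctly read items) are one common convention and are an assumption; the abstract does not state the exact formulas, and the function name and sample data are hypothetical.

```python
def rmd_rates(truth, flagged):
    """Compute detection, misdetection and false alarm rates.

    truth: list of booleans, True where a phoneme was actually misread.
    flagged: list of booleans, True where the detector reported a miscue.
    Rate definitions are assumptions of this sketch (see lead-in).
    """
    pairs = list(zip(truth, flagged))
    hits = sum(1 for t, f in pairs if t and f)
    misses = sum(1 for t, f in pairs if t and not f)
    false_alarms = sum(1 for t, f in pairs if f and not t)
    miscues = hits + misses
    correct_items = sum(1 for t, _ in pairs if not t)
    return {
        "detection_rate": hits / miscues,
        "misdetection_rate": misses / miscues,
        "false_alarm_rate": false_alarms / correct_items,
    }


# Hypothetical ten-phoneme passage with three real miscues.
truth = [False, True, False, False, True, False, False, False, True, False]
flagged = [False, True, True, False, False, False, False, False, True, False]
rates = rmd_rates(truth, flagged)
print(rates)
```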
1994
In order to cope with the problems of spontaneous speech (including, for example, hesitations and non-words) it is necessary to extract from the speech signal all information it contains. Modeling of words by segmental units should be supported by suprasegmental units since valuable information is represented in the prosody of an utterance. We present an approach to flexible and efficient modeling of speech by segmental units and describe extraction and use of suprasegmental information.