Automatic title generation for spoken documents is believed to be an important key to browsing and navigation over huge quantities of multimedia content. A new framework for automatic title generation for Chinese spoken documents is proposed in this paper, using a delicate scored Viterbi algorithm performed over automatically generated text summaries of the spoken documents under test. The Viterbi beam search is guided by a delicate score evaluated from three sets of models: the term selection model identifies the most suitable terms to include in the title, the term ordering model gives the best ordering of those terms to make the title readable, and the title length model estimates a reasonable length for the title. The models are trained on a corpus that is not required to match the spoken documents under test. Both objective evaluation based on the F1 measure and subjective human evaluation of relevance and readability indicate that the approach is very attractive.
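As a rough illustration of how such a three-model score can guide a beam search, here is a minimal sketch. The model tables (`SELECT`, `BIGRAM`, `LEN_SCORE`) and their values are hypothetical stand-ins for the paper's trained term selection, term ordering and title length models, not the actual models:

```python
import math

# Toy model tables (hypothetical stand-ins for the paper's trained models):
SELECT = {"taiwan": 0.9, "election": 0.8, "rain": 0.1}              # term selection
BIGRAM = {("taiwan", "election"): 0.7, ("election", "taiwan"): 0.2} # term ordering
LEN_SCORE = {1: 0.2, 2: 0.7, 3: 0.1}                                # title length

def beam_search_title(terms, beam=3, max_len=3):
    """Beam search over term sequences, each scored by
    log P(select) + log P(order) + log P(length)."""
    beams = [((), 0.0)]              # (partial title, running log-score)
    best = ((), float("-inf"))
    for _ in range(max_len):
        expanded = []
        for seq, lp in beams:
            for t in terms:
                if t in seq:         # use each term at most once
                    continue
                lp2 = lp + math.log(SELECT.get(t, 1e-6))
                if seq:
                    lp2 += math.log(BIGRAM.get((seq[-1], t), 1e-6))
                expanded.append((seq + (t,), lp2))
        expanded.sort(key=lambda h: h[1], reverse=True)
        beams = expanded[:beam]
        for seq, lp in beams:        # complete each hypothesis with the length prior
            total = lp + math.log(LEN_SCORE.get(len(seq), 1e-6))
            if total > best[1]:
                best = (seq, total)
    return list(best[0])
```

With these toy tables, `beam_search_title(["taiwan", "election", "rain"])` prefers a two-term title in the readable order, since the length prior favors two terms and the ordering model favors "taiwan" before "election".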
We propose an HMM Trajectory Tiling (HTT) approach to high-quality TTS, which is our entry to Blizzard Challenge 2010. In HTT, first a refined HMM is trained with the Minimum Generation Error (MGE) criterion; the trajectory generated by the refined HMM then guides the search for the closest waveform segment "tiles" in synthesis. Normalized distances between the HMM trajectory and those of the waveform unit candidates are used to select final candidates in a unit sausage (lattice). Normalized cross-correlation, a good concatenation measure because of its high relevance to spectral similarity, phase continuity and concatenation time instants, is used to find the best unit sequence in the sausage. This sequence serves as the set of segment tiles that most closely follows the HMM trajectory guide. Tested on the four tasks {EH1, EH2, MH1, MH2} of Blizzard Challenge 2010, the new HTT approach delivers high-quality, natural-sounding TTS speech without sacrificing intelligibility, as confirmed subjectively by naturalness and intelligibility listening-test scores.
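Normalized cross-correlation as a concatenation measure can be sketched directly from its definition; the sine-frame construction below is an illustrative assumption, not the paper's actual windowing:

```python
import math

def normalized_xcorr(a, b):
    """Normalized cross-correlation of two equal-length waveform frames
    around a candidate concatenation point. Values near 1 mean the frames
    are in phase and spectrally similar, so the join is likely inaudible;
    values near -1 indicate an out-of-phase join that would click."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

# A frame correlates perfectly with itself and anti-correlates with its inverse.
frame = [math.sin(2 * math.pi * 0.05 * n) for n in range(40)]
```

In unit selection, a measure like this would be evaluated at each candidate join and folded into the search cost alongside the trajectory distance.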
Multi-layered summarization of spoken document archives by information extraction and semantic structuring
Conference of the International Speech Communication Association, 2006
… The problem is that the query given by the user is usually very short and thus not specific enough, and as a result a large … [18] S.-C. Chen and L.-S. Lee, "Automatic title generation for Chinese spoken documents using an adaptive k-nearest-neighbor approach," in Proc. …
This paper proposes a set of approaches to automatically extract key terms from spoken course lectures, drawing on audio signals, ASR transcriptions and slides. We divide the key terms into two types, key phrases and keywords, and develop different approaches to extract them in order. Key phrases are extracted using right/left branching entropy; keywords are extracted by learning from three sets of features: prosodic features, lexical features, and semantic features from Probabilistic Latent Semantic Analysis (PLSA). The learning approaches include an unsupervised method (K-means exemplar) and two supervised ones (AdaBoost and neural networks). Very encouraging preliminary results were obtained on a corpus of course lectures, and all of the approaches and feature sets proposed here are found to be useful.
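A minimal sketch of the branching-entropy idea: compute the entropy of the tokens observed to the right (or left) of a candidate phrase; a sharp rise suggests a phrase boundary. The helper name and toy context lists are assumptions for illustration only:

```python
import math
from collections import Counter

def branching_entropy(contexts):
    """Entropy (bits) of the distribution of tokens observed adjacent to a
    candidate phrase. Low entropy means the next token is predictable (we
    are still inside a phrase); high entropy suggests a phrase boundary."""
    counts = Counter(contexts)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Inside a phrase the next token is predictable (low entropy);
# at a phrase boundary many different tokens follow (high entropy).
inside = ["markov"] * 8                      # e.g. "hidden" almost always followed by "markov"
boundary = ["is", "can", "models", "gives"]  # tokens following "hidden markov model"
```

Scanning a candidate phrase character by character (or token by token) and cutting where the entropy jumps yields the key-phrase boundaries.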
We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word-level transfer learning via pretrained word embeddings as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word-level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias.
Proceedings of The 3rd Workshop on e-Commerce and NLP
Product reviews are a huge source of natural language data in e-commerce applications. Several million customers write reviews on a variety of topics. We categorize these topics into two groups: "category-specific" topics and "generic" topics that span multiple product categories. While we can use a supervised learning approach to tag review text for generic topics, it is impossible to use supervised approaches to tag category-specific topics due to the sheer number of possible topics for each category. In this paper, we present a semi-supervised approach to tag Indonesian-language product reviews with several product category-specific tags. We show that our proposed method works at scale on real product reviews at Tokopedia, a major e-commerce platform in Indonesia. Manual evaluation shows that the proposed method can efficiently generate category-specific product tags.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
We present easy-to-use TensorFlow Hub sentence embedding models with good task transfer performance. Model variants allow for trade-offs between accuracy and compute resources. We report the relationship between model complexity, resources, and transfer performance. Comparisons are made with baselines without transfer learning and with baselines that incorporate word-level transfer. Transfer learning using sentence-level embeddings is shown to outperform models without transfer learning, and often those that use only word-level transfer. We show good transfer task performance with minimal training data and obtain encouraging results on Word Embedding Association Tests (WEAT) of model bias.
Proceedings of The Third Workshop on Representation Learning for NLP
We present a novel approach to learn representations for sentence-level semantic similarity using conversational data. Our method trains an unsupervised model to predict conversational input-response pairs. The resulting sentence embeddings perform well on the semantic textual similarity (STS) benchmark and SemEval 2017's Community Question Answering (CQA) question similarity subtask. Performance is further improved by introducing multitask training combining the conversational input-response prediction task and a natural language inference task. Extensive experiments show the proposed model achieves the best performance among all neural models on the STS benchmark and is competitive with the state-of-the-art feature-engineered and mixed systems in both tasks.
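The input-response prediction idea can be sketched with a toy encoder. The hashed bag-of-words `embed` below is a hypothetical stand-in for the paper's learned encoder, used only to show how candidate responses are scored and ranked against a conversational input:

```python
import math

def embed(sentence, dim=64):
    """Toy sentence encoder: hashed bag-of-words, L2-normalized.
    A hypothetical stand-in for the learned encoder in the paper."""
    vec = [0.0] * dim
    for tok in sentence.lower().split():
        vec[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def response_score(inp, response):
    """Dot-product score of an (input, response) pair; training would push
    the true response to outscore random candidates."""
    return sum(a * b for a, b in zip(embed(inp), embed(response)))

def rank_responses(inp, candidates):
    """Rank candidate responses for a conversational input, best first."""
    return sorted(candidates, key=lambda r: response_score(inp, r), reverse=True)
```

The same scoring function, applied to sentence pairs rather than input-response pairs, is what makes such embeddings usable for semantic-similarity benchmarks.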
A Multi-layered Summarization System for Multimedia Archives by Understanding and Structuring of Chinese Spoken Documents
Chinese Spoken Language Processing, 2006
Multimedia archives are very difficult to display on screen, and very difficult to retrieve and browse. It is therefore important to develop technologies that summarize entire archives of network content to help users browse and retrieve them. In a recent paper [1] we proposed a complete set of multi-layered technologies to handle at least …
Multi-layered Summarization of Spoken Document Archives by Information Extraction and Semantic Structuring
Spoken documents are very difficult to display on screen, and very difficult to retrieve and browse. It is therefore important to develop technologies that summarize entire archives of the huge quantities of spoken documents in network content, to help users browse and retrieve them. In this paper we propose a complete set of multi-layered technologies to handle at least some of these issues: (1) Automatic Generation of Titles and Summaries for each spoken document, so that the spoken documents become much easier to browse; (2) Global Semantic Structuring of the entire spoken document archive, offering the user a global picture of the semantic structure of the archive; and (3) Query-based Local Semantic Structuring for the subset of spoken documents retrieved by the user's query, providing the user with the detailed semantic structure of the relevant spoken documents given the query entered. The Probabilistic Latent Semantic Analysis (PLSA) …
Automatic title generation for Chinese spoken documents with a delicate scored Viterbi algorithm
2008 IEEE Spoken Language Technology Workshop, 2008
Automatic title generation for spoken documents is believed to be an important key to browsing and navigation over huge quantities of multimedia content. A new framework for automatic title generation for Chinese spoken documents is proposed in this paper, using a delicate scored Viterbi algorithm performed over automatically generated text summaries of the spoken documents under test. The Viterbi beam search …
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014
It takes a very long time to go through a complete online course. Without the proper background, it is also difficult to understand retrieved spoken paragraphs. This paper therefore presents a new approach to spoken knowledge organization for course lectures, aimed at efficient personalized learning. Automatically extracted key terms are taken as the fundamental elements of the semantics of the course. A key-term graph, constructed by connecting related key terms, forms the backbone of the global semantic structure. Audio/video signals are divided into a multi-layer temporal structure of paragraphs, sections and chapters, each of which includes a summary as its local semantic structure. The interconnection between the semantic and temporal structures, together with spoken term detection, jointly offers learners efficient ways to navigate the course knowledge along personalized learning paths that consider their personal interests, available time and background knowledge. A preliminary prototype system has also been successfully developed.
2006 IEEE Spoken Language Technology Workshop, 2006
… By considering the special structure of the Chinese language as discussed in section 2.5, we used not only words (W) as the terms tj in the above formulation but also characters (C), overlapping … The two proposed scoring measures, Topic Significance (TS) and Topic Entropy (TE) …
Learning on demand - course lecture distillation by information extraction and semantic structuring for spoken documents
2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009
Sheng-yi Kong, Miao-ru Wu, Che-kuang Lin, Yi-sheng Fu, Lin-shan Lee.
Improved Spoken Document Summarization Using Probabilistic Latent Semantic Analysis (PLSA)
2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006
In this paper we propose a set of new methods that explore the topical information embedded in spoken documents and use it for their automatic summarization. By introducing a set of latent topic variables, probabilistic latent semantic analysis (PLSA) finds the underlying probabilistic relationships between documents and terms. Two useful measures, referred to here as topic significance and term entropy, are proposed based on the PLSA model to determine the terms, and thus the sentences, important to a document, which can then be used to construct the summary. Experimental results for preliminary tests performed on broadcast news stories in Mandarin Chinese indicate improved performance compared to some existing approaches.
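The term-entropy measure can be sketched directly from its definition as the entropy of a term's topic distribution; the toy posteriors below are assumed values, not actual PLSA output:

```python
import math

def term_entropy(topic_posterior):
    """Entropy (bits) of P(topic | term) under a PLSA model. A low value
    means the term concentrates on few latent topics, a useful cue for
    selecting summary-worthy terms and sentences."""
    return -sum(p * math.log2(p) for p in topic_posterior if p > 0)

# Toy topic posteriors (assumed values, not PLSA output):
focused = [0.9, 0.05, 0.05]      # topical content word: low entropy
general = [1 / 3, 1 / 3, 1 / 3]  # general function word: maximal entropy
```

Ranking terms by ascending entropy (and sentences by the terms they contain) is the mechanism by which such measures feed the summarizer.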
Three-phase doubly fed induction generators: an overview
IET Electric Power Applications, 2010
… Brushless doubly fed induction machines (BDFM) have also been studied and tested for performance in [42 … of DC capacitor and (iii) the poor line power factor and harmonic distortion in … PSCAD was used [127] for transient analysis of grid-connected wind turbines in external …
Papers by Sheng-yi Kong