Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2015
We introduce Discriminative BLEU (∆BLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on a scale of [−1, +1] to weight multi-reference BLEU. In tasks involving generation of conversational responses, ∆BLEU correlates reasonably with human judgments and outperforms sentence-level and IBM BLEU in terms of both Spearman's ρ and Kendall's τ.
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2014
Discussion forums offer a new source of insight into the experiences and challenges faced by individuals affected by mental disorders. Language technology can help domain experts gather insight from these forums by aggregating themes and user behaviors across thousands of conversations. We present a novel model for web forums that captures both thematic content and user-specific interests. Applying this model to the Aspies Central forum (which covers issues related to Asperger's syndrome and autism spectrum disorder), we identify several topics of concern to individuals who report being on the autism spectrum. We evaluate the model on data collected from the Aspies Central forum, comprising 1,939 threads, 29,947 posts, and 972 users. Quantitative evaluations demonstrate that the topics extracted by this model are substantially more coherent than those obtained by Latent Dirichlet Allocation and the Author-Topic Model. Qualitative analysis by subject-matter experts suggests intriguing directions for future investigation.
2010 International Conference on Pattern …, Jan 1, 2010
In Dirichlet process (DP) mixture models, the number of components is implicitly determined by the sampling parameters of the Dirichlet process. However, this kind of model usually produces many small mixture components when modeling real-world data, especially high-dimensional data. In this paper, we propose a new class of Dirichlet process mixture models with additional constraints, named constrained Dirichlet process (CDP) mixture models. Building on general DP mixture models, we add a resampling step to obtain the latent parameters. In this way, CDP mixture models can suppress noise and generate compact patterns of the data. Experimental results on data clustering show the remarkable performance of the CDP mixture models.
2009 International Conference on Machine …, Jan 1, 2009
Sparse representation has been exploited in machine learning in recent years. Several sparse-representation-based classification algorithms have been developed for applications such as face recognition. In this paper, we propose an improved sparse-representation-based classification algorithm. First, to obtain a discriminative representation, a non-negativity constraint on the sparse coefficients is added to the sparse representation problem. Second, Mahalanobis distance is employed instead of Euclidean distance to measure the similarity between the original data and the reconstructed data. The proposed classification algorithm has been evaluated for face recognition under varying illumination and pose using standard face databases. The experimental results demonstrate that our algorithm performs better than the recent face recognition algorithm based on sparse representation.
Papers by Yangfeng Ji