weihua916/LCTM

Latent Concept Topic Model (LCTM)

This is an implementation of LCTM [1]. LCTM aims to mitigate data sparsity in short texts by inferring topics from the document-level co-occurrence of latent concepts.

Installation

Requirements

You must have the following already installed on your system.

  • A C++11-compatible compiler
  • make
  • Eigen

Input and output

The following input files are required.

  • Initial concept assignment file: The i-th row gives the initial concept assignment of the i-th word type. Concept assignments can be initialized by running k-means clustering on the word embeddings.
  • Word embeddings file: The first row contains the vocabulary size and the dimensionality of the word vectors. From the second row onward, the i-th row contains the vector representation of the (i-1)-th word type. The paper used 50-dimensional GloVe vectors trained on Wikipedia.
  • Indexed corpus: Each row lists the indices of the words contained in one document.
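
As a rough illustration of how these three inputs relate, the sketch below parses an embeddings file in the layout described above, derives initial concept assignments with a plain k-means, and converts tokenized documents into rows of word indices. It is not part of this repository: the function names are hypothetical, and the whitespace separator, the 0-based indices, and the dropping of out-of-vocabulary words are all assumptions. Check the files in input/sample for the conventions the code actually expects.

```python
import random


def parse_embeddings(text):
    """Parse the layout described above: a 'V D' header, then one vector per row.

    Assumption: values are separated by whitespace.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    vocab_size, dim = (int(x) for x in lines[0].split())
    vectors = [[float(x) for x in ln.split()] for ln in lines[1:]]
    if len(vectors) != vocab_size or any(len(v) != dim for v in vectors):
        raise ValueError("header does not match the vectors that follow")
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_assign(vectors, n_concepts, n_iters=20, seed=0):
    """Plain Lloyd's k-means; returns one concept index per word type."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, n_concepts)]
    assignment = [0] * len(vectors)
    for _ in range(n_iters):
        # Assignment step: pick the nearest center for every word vector.
        assignment = [
            min(range(n_concepts), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # Update step: move each center to the mean of its members.
        for c in range(n_concepts):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def index_corpus(documents, vocabulary):
    """Turn tokenized documents into rows of word indices.

    Assumptions: indices are 0-based and follow the row order of the
    embeddings file; words missing from the vocabulary are dropped.
    """
    word_to_index = {w: i for i, w in enumerate(vocabulary)}
    return [
        " ".join(str(word_to_index[w]) for w in doc if w in word_to_index)
        for doc in documents
    ]
```

Writing one entry of the `kmeans_assign` result per row would give a file in the shape of the initial concept assignment file, and the strings returned by `index_corpus` correspond to the rows of the indexed corpus. For real vocabularies a library k-means would be much faster than this pure-Python loop.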

The software outputs the following files.

  • theta: Document-topic distribution
  • phi: Topic-concept distribution
  • mu: Concept vectors
  • noise: Noise for each concept
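
A common first step with these outputs is to look up the most probable topic for each document. The sketch below assumes that theta is written as whitespace-separated text with one document per row and one column per topic. This README does not specify the on-disk layout, so treat that as an assumption to verify against an actual output file. The function names are hypothetical.

```python
def read_matrix(text):
    """Read a whitespace-separated numeric matrix, one row per line.

    Assumption: this matches how theta (and phi) are written.
    """
    return [[float(x) for x in ln.split()] for ln in text.splitlines() if ln.strip()]


def dominant_topics(theta):
    """Return, for each document row, the index of its largest topic weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]
```

If phi has the same kind of layout, with one topic per row and one column per concept, applying `read_matrix` and the same argmax would give the most probable concept for each topic.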

Quick start

Type make; sh run.sh

This compiles the code and runs it with default parameters on the dataset in the directory input/sample. To change the parameters or the path to the dataset directory, edit the corresponding lines in run.sh.

Reference

[1] Weihua Hu and Jun'ichi Tsujii. 2016. A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016, short paper).
