You need to download the dataset and some tools first:
- Download the Yelp dataset.
- Convert the following files from JSON format to CSV format using `json_to_csv_converter.py`:
  `yelp_academic_dataset_review.json` and `yelp_academic_dataset_user.json`.
- Download LIBLINEAR. After downloading LIBLINEAR, you can refer to its Installation instructions to install it. It is suggested that you put LIBLINEAR under the directory `SocializedWordEmbeddings`.
- Download Stanford CoreNLP. Only `stanford-corenlp.jar` is required; `SocializedWordEmbeddings/preprocess/Split_NN.jar` and `SocializedWordEmbeddings/preprocess/Split_PPL.jar` need to reference `stanford-corenlp.jar`. It is suggested that after getting `stanford-corenlp.jar` you put it under the directory `SocializedWordEmbeddings/resources`; otherwise, you should modify the default `Class-Path` in `Split_NN.jar` and `Split_PPL.jar`.
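The Yelp files are in JSON-lines format (one JSON object per line). For readers who want to see what the conversion step amounts to, here is a minimal sketch; it is not the provided `json_to_csv_converter.py`, which also handles nested fields and picks the columns automatically:

```python
import csv
import json

def json_lines_to_csv(json_path, csv_path, columns):
    """Convert a JSON-lines file to CSV, keeping only `columns`.

    Simplified sketch: the real json_to_csv_converter.py also
    flattens nested fields, which is omitted here.
    """
    with open(json_path) as fin, open(csv_path, "w") as fout:
        writer = csv.writer(fout)
        writer.writerow(columns)
        for line in fin:
            record = json.loads(line)
            # Missing keys become empty cells.
            writer.writerow([record.get(col, "") for col in columns])
```

The column list (e.g. `user_id`, `stars`, `text` for reviews) is an illustrative choice here, not the exact schema the converter emits.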
cd SocializedWordEmbeddings/preprocess
Modify ./run.py by specifying --input (the path to the Yelp dataset), then run:
python run.py
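The run.py scripts are driven by command-line-style arguments. A minimal sketch of how `--input` might be declared (the actual run.py may define it differently; this is only an assumption for illustration):

```python
import argparse

def build_parser():
    """Build an argument parser with the --input flag described above."""
    parser = argparse.ArgumentParser(description="Preprocess the Yelp dataset")
    parser.add_argument("--input", required=True,
                        help="Path to the Yelp dataset")
    return parser
```

With this parser, `python run.py --input /path/to/yelp` makes the path available as `args.input`.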
cd SocializedWordEmbeddings/train
You may modify the following arguments in ./run.py:
- --para_lambda: the trade-off parameter between the log-likelihood and the regularization term
- --para_r: the constraint on the L2-norm of the user vector
- --yelp_round: the round number of the Yelp data, e.g. {8, 9}
python run.py
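Since --para_r bounds the L2-norm of each user vector, one standard way to enforce such a constraint is to project the vector back onto the L2 ball of radius r after each update. This is a sketch of that general technique, not necessarily the exact update rule used in this codebase:

```python
import numpy as np

def project_to_l2_ball(vec, r):
    """Project `vec` onto the L2 ball of radius `r`.

    If ||vec|| <= r the vector is unchanged; otherwise it is
    rescaled to have norm exactly r. Illustrative sketch only.
    """
    norm = np.linalg.norm(vec)
    if norm > r:
        vec = vec * (r / norm)
    return vec
```

Larger values of --para_r therefore allow user vectors with larger norms, while --para_lambda controls how strongly the regularization term weighs against the log-likelihood.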
cd SocializedWordEmbeddings/sentiment
You may modify the following arguments in ./run.py:
- --para_lambda: the trade-off parameter between the log-likelihood and the regularization term
- --para_r: the constraint on the L2-norm of the user vector
- --yelp_round: the round number of the Yelp data, e.g. {8, 9}
python run.py
cd SocializedWordEmbeddings/perplexity
You may modify the following arguments in ./run.py:
- --para_lambda: the trade-off parameter between the log-likelihood and the regularization term
- --para_r: the constraint on the L2-norm of the user vector
- --yelp_round: the round number of the Yelp data, e.g. {8, 9}
python run.py
We thank Tao Lei, as our code is developed based on his code.
You can reproduce our results under different settings (Table 5 in the paper) by modifying SocializedWordEmbeddings/attention/run.sh:
[1] Add user and word embeddings by specifying --user_embs and --embedding.
[2] Add train/dev/test files by specifying --train, --dev, and --test respectively.
[3] The three settings in our experiments can be obtained by specifying --user_atten and --user_atten_base:
- set '--user_atten 0' for 'Without attention';
- set '--user_atten 1 --user_atten_base 1' for 'Trained attention';
- set '--user_atten 1 --user_atten_base 0' for 'Fixed user vector as attention'.
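If you script the three Table 5 runs, the flag combinations above can be kept in one place. A small sketch (the setting names are our own labels; only the flags come from the description above):

```python
# Map each experimental setting to its command-line flags.
SETTINGS = {
    "without_attention": ["--user_atten", "0"],
    "trained_attention": ["--user_atten", "1", "--user_atten_base", "1"],
    "fixed_user_vector": ["--user_atten", "1", "--user_atten_base", "0"],
}

def attention_flags(setting):
    """Return the flag list for one of the three attention settings."""
    return SETTINGS[setting]
```

These lists can then be appended to the rest of the run.sh command (embeddings, train/dev/test paths) for each run.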
Dependencies:
- Python 2.7
- Theano >= 0.7
- NumPy
- Gensim
- PrettyTable