You need to download the dataset and some tools first:
- Download the Yelp dataset.
- Convert the following files from JSON format to CSV format using `json_to_csv_converter.py`:
  `yelp_academic_dataset_review.json` and `yelp_academic_dataset_user.json`.
- Download LIBLINEAR. After downloading LIBLINEAR, you can refer to its Installation instructions to install it. It is suggested that you put LIBLINEAR under the directory `SocializedWordEmbeddings`.
- Download Stanford CoreNLP. Only `stanford-corenlp.jar` is required; `SocializedWordEmbeddings/preprocess/Split_NN.jar` and `SocializedWordEmbeddings/preprocess/Split_PPL.jar` need to reference `stanford-corenlp.jar`. It is suggested that after getting `stanford-corenlp.jar` you put it under the directory `SocializedWordEmbeddings/resources`; otherwise, you should modify the default `Class-Path` in `Split_NN.jar` and `Split_PPL.jar`.
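The Yelp files are in JSON-lines format (one JSON object per line). For readers who want to see what the conversion step amounts to, here is a minimal sketch; it is not the provided `json_to_csv_converter.py`, which also handles nested fields and picks the columns automatically:

```python
import csv
import json

def json_lines_to_csv(json_path, csv_path, columns):
    """Convert a JSON-lines file to CSV, keeping only `columns`.

    Simplified sketch: the real json_to_csv_converter.py also
    flattens nested fields, which is omitted here.
    """
    with open(json_path) as fin, open(csv_path, "w") as fout:
        writer = csv.writer(fout)
        writer.writerow(columns)
        for line in fin:
            record = json.loads(line)
            # Missing keys become empty cells.
            writer.writerow([record.get(col, "") for col in columns])
```

The column list (e.g. `user_id`, `stars`, `text` for reviews) is an illustrative choice here, not the exact schema the converter emits.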
cd SocializedWordEmbeddings/preprocess
Modify ./run.py by specifying --input (the path to the Yelp dataset), then run:
python run.py
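The run.py scripts are driven by command-line-style arguments. A minimal sketch of how `--input` might be declared (the actual run.py may define it differently; this is only an assumption for illustration):

```python
import argparse

def build_parser():
    """Build an argument parser with the --input flag described above."""
    parser = argparse.ArgumentParser(description="Preprocess the Yelp dataset")
    parser.add_argument("--input", required=True,
                        help="Path to the Yelp dataset")
    return parser
```

With this parser, `python run.py --input /path/to/yelp` makes the path available as `args.input`.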
cd SocializedWordEmbeddings/train
You may modify the following arguments in ./run.py:
- --para_lambda: the trade-off parameter between the log-likelihood and the regularization term
- --para_r: the constraint on the L2-norm of the user vector
- --yelp_round: the round number of the Yelp data, e.g. {8, 9}
python run.py
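Since --para_r bounds the L2-norm of each user vector, one standard way to enforce such a constraint is to project the vector back onto the L2 ball of radius r after each update. This is a sketch of that general technique, not necessarily the exact update rule used in this codebase:

```python
import numpy as np

def project_to_l2_ball(vec, r):
    """Project `vec` onto the L2 ball of radius `r`.

    If ||vec|| <= r the vector is unchanged; otherwise it is
    rescaled to have norm exactly r. Illustrative sketch only.
    """
    norm = np.linalg.norm(vec)
    if norm > r:
        vec = vec * (r / norm)
    return vec
```

Larger values of --para_r therefore allow user vectors with larger norms, while --para_lambda controls how strongly the regularization term weighs against the log-likelihood.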
cd SocializedWordEmbeddings/sentiment
You may modify the following arguments in ./run.py:
- --para_lambda: the trade-off parameter between the log-likelihood and the regularization term
- --para_r: the constraint on the L2-norm of the user vector
- --yelp_round: the round number of the Yelp data, e.g. {8, 9}
python run.py
cd SocializedWordEmbeddings/perplexity
You may modify the following arguments in ./run.py:
- --para_lambda: the trade-off parameter between the log-likelihood and the regularization term
- --para_r: the constraint on the L2-norm of the user vector
- --yelp_round: the round number of the Yelp data, e.g. {8, 9}
python run.py
We thank Tao Lei, as our code is developed based on his code.
You can reproduce our results under different settings (Table 5 in the paper) by modifying SocializedWordEmbeddings/attention/run.sh:
[1] Add user and word embeddings by specifying --user_embs and --embedding.
[2] Add train/dev/test files by specifying --train, --dev, and --test respectively.
[3] The three settings in our experiments can be obtained by specifying --user_atten and --user_atten_base:
- set '--user_atten 0' for 'Without attention';
- set '--user_atten 1 --user_atten_base 1' for 'Trained attention';
- set '--user_atten 1 --user_atten_base 0' for 'Fixed user vector as attention'.
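If you script the three Table 5 runs, the flag combinations above can be kept in one place. A small sketch (the setting names are our own labels; only the flags come from the description above):

```python
# Map each experimental setting to its command-line flags.
SETTINGS = {
    "without_attention": ["--user_atten", "0"],
    "trained_attention": ["--user_atten", "1", "--user_atten_base", "1"],
    "fixed_user_vector": ["--user_atten", "1", "--user_atten_base", "0"],
}

def attention_flags(setting):
    """Return the flag list for one of the three attention settings."""
    return SETTINGS[setting]
```

These lists can then be appended to the rest of the run.sh command (embeddings, train/dev/test paths) for each run.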
Dependencies:
- Python 2.7
- Theano >= 0.7
- NumPy
- Gensim
- PrettyTable