DASI

Code and data for INTERSPEECH 2020 paper "Improved learning of Word Embeddings with Dictionary Definitions and Semantic Injection"

Evironment

python3
tensorflow >= 1.12
nltk >= 3.2.5
pandas >= 0.22.0
scikit-learn >= 0.19.2
matplotlib
gensim

Steps

Unzip files in tmp_data and raw_data
Preprocess wordnet data & build vocabulary
using python utils/extract_data.py
get synonyms and antonyms from wordnet using
python utils/extract_syn.py
Run different models under different settings. An example:

python run.py --model=DASI --config=DASI_pretrain_big_corpus --pretrain=w2v_embs_big_corpus --seed=1 --syn_coef=1.0 --retrofit_coef=0 --CP_coef=25 --batch_size=256 --semantic=all --use_emb=definition --sem_model=dasi

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
eval		eval
models		models
raw_data		raw_data
tmp_data		tmp_data
utils		utils
README.md		README.md
__init__.py		__init__.py
evaluation.py		evaluation.py
preprocess.py		preprocess.py
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DASI

Evironment

Steps

About

Uh oh!

Releases

Packages

Languages

thu-spmi/DASI

Folders and files

Latest commit

History

Repository files navigation

DASI

Evironment

Steps

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages