Spanish Word Embeddings Learned on Word Association Norms

Gerardo Sierra

Spanish Word Embeddings Learned on Word Association Norms

Gerardo Sierra

2019

Sign up for access to the world's latest research

checkGet notified about relevant papers

checkSave papers to use in your research

checkJoin the discussion with peers

checkTrack your impact

Abstract

Word embeddings are vector representations of words in an n-dimensional space used for many natural language processing tasks. A large training corpus is needed for learning good quality word embeddings. In this work, we present a method based on the node2vec algorithm for learning embeddings based on paths in a graph. We used a collection of Word Association Norms in Spanish to build a graph of word connections. The nodes of the network correspond to the words in the corpus, whereas the edges correspond to a pair of words given in a free association test. We evaluated our word vectors in human annotated benchmarks, achieving better results than those trained on a billion-word corpus such as, word2vec, fasttext, and glove.

Gerardo Sierra

Semantic Web

Word embeddings are powerful for many tasks in natural language processing. In this work, we learn word embeddings using weighted graphs from word association norms (WAN) with the node2vec algorithm. Although building WAN is a difficult and time-consuming task, training the vectors from these resources is a fast and efficient process. This allows us to obtain good quality word embeddings from small corpora. We evaluate our word vectors in two ways: intrinsic and extrinsic. The intrinsic evaluation was performed with several word similarity benchmarks, WordSim-353, MC30, MTurk-287, MEN-TR-3k, SimLex-999, MTurk-771 and RG-65, and different similarity measures achieving better results than those obtained with word2vec, GloVe, and fastText, trained on a huge corpus. The extrinsic evaluation was done by measuring the quality of sentence embeddings using transfer tasks: sentiment analysis, paraphrase detection, natural language inference, and semantic textual similarity.

Log In

Spanish Word Embeddings Learned on Word Association Norms

Sign up for access to the world's latest research

Abstract

Related papers

Related topics