This repository was archived by the owner on Apr 8, 2025. It is now read-only.

Add glove, word2vec and fasttext as Language Model#285

Merged
Timoeller merged 12 commits into master from word2vec_lm
Apr 24, 2020

@Timoeller Timoeller commented Mar 19, 2020

This PR adds Language Models based on word embeddings (GloVe, Word2Vec, fastText) to FARM, in a very basic form.

General idea: Instead of contextualizing word embeddings through BERT, we simply map each word to its corresponding static word embedding. The Prediction Head on top of the Language Model then does all the heavy lifting.
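To make the idea concrete, here is a minimal, dependency-free sketch. It is not FARM's implementation: the embedding values, the mean pooling step, and the names `embed`, `mean_pool`, and `feed_forward_head` are all assumptions chosen for illustration. It only shows that the "language model" reduces to a table lookup, while the trainable part lives in the head.

```python
import math

# Toy static embedding table (hypothetical values, not real GloVe vectors).
EMBEDDINGS = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}
UNK = [0.0, 0.0]  # fallback vector for out-of-vocabulary words


def embed(tokens):
    """Look up one fixed vector per token; no context is taken into account."""
    return [EMBEDDINGS.get(t.lower(), UNK) for t in tokens]


def mean_pool(vectors):
    """Average the token vectors into a single document vector (assumed pooling)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def feed_forward_head(doc_vec, weights, bias):
    """One linear layer plus sigmoid, standing in for the prediction head."""
    logit = sum(w * x for w, x in zip(weights, doc_vec)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


doc_vec = mean_pool(embed("good movie".split()))
score = feed_forward_head(doc_vec, weights=[2.0, 0.0], bias=0.0)
```

Because the lookup ignores context, "good" receives the same vector in every sentence; anything task-specific has to be learned by the weights of the head.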

Example scripts for:

  • using Glove or Word2Vec as base Language Model for doc classification
  • using Fasttext as base Language Model for doc classification
  • embedding extraction

Features:

  • pretrained German GloVe and fastText embeddings can be loaded from S3 via the model names fasttext-german-uncased and glove-german-uncased
  • conversion script from gensim Glove/Word2Vec models to FARM models

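As a rough illustration of what such a conversion involves, the sketch below splits embeddings in the common plain-text format (one word followed by its values per line, optionally preceded by a word2vec-style `count dim` header) into a vocabulary list and a list of vectors. This is not the conversion script from this PR: the function name `parse_text_embeddings` and the output layout are hypothetical, and the real script works from gensim models.

```python
def parse_text_embeddings(lines):
    """Split text-format embeddings into a vocab list and a list of vectors.

    Hypothetical helper for illustration only; not part of FARM or gensim.
    """
    vocab, vectors = [], []
    for line in lines:
        parts = line.strip().split(" ")
        if parts == [""]:
            continue  # skip blank lines
        # word2vec text files start with a "<count> <dim>" header; GloVe files do not.
        if not vocab and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        word, values = parts[0], [float(x) for x in parts[1:]]
        if vectors and len(values) != len(vectors[0]):
            raise ValueError(f"inconsistent dimension for word {word!r}")
        vocab.append(word)
        vectors.append(values)
    return vocab, vectors


example = ["2 3", "haus 0.1 0.2 0.3", "baum 0.4 0.5 0.6"]
vocab, vectors = parse_text_embeddings(example)
```

The row order of `vectors` matches the order of `vocab`, so a word's index in the vocabulary doubles as its row in the embedding matrix.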
Limitations

  • the embeddings are static. They do not change over the course of training
  • no saving or loading of finetuned models (on a downstream task)
  • prediction heads only support Feed Forward Neural Networks
  • no multiprocessing for data conversion
  • performance on tasks like GermEval18 is inferior to BERT-based approaches

@Timoeller Timoeller added the enhancement New feature or request label Mar 19, 2020
@Timoeller Timoeller self-assigned this Mar 19, 2020
@Timoeller Timoeller changed the title WIP add glove and word2vec as Language Model WIP add glove, word2vec and fasttext as Language Model Apr 2, 2020
@Timoeller Timoeller changed the title WIP add glove, word2vec and fasttext as Language Model Add glove, word2vec and fasttext as Language Model Apr 2, 2020
@Timoeller Timoeller requested a review from tholor April 2, 2020 18:46
@Timoeller Timoeller merged commit 034dac5 into master Apr 24, 2020