
Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer

NAACL-HLT 2018
Overview

• Propose a new type of deep contextualised word representation (ELMo) that models:

‣ Complex characteristics of word use (e.g., syntax and semantics)

‣ How these uses vary across linguistic contexts (i.e., to model polysemy)

• Show that ELMo can improve existing neural models on various NLP tasks

• Argue that ELMo captures more abstract linguistic characteristics in its higher layers
Example
GloVe mostly learns a sport-related context for the example word, while ELMo can distinguish the word sense based on the context.
Method

• Embeddings from Language Models: ELMo

• Learn word representations by building bidirectional language models (biLMs)

‣ biLMs consist of a forward and a backward LM

✦ Forward: p(t1, t2, …, tN) = ∏_{k=1}^{N} p(tk | t1, t2, …, tk−1)

✦ Backward: p(t1, t2, …, tN) = ∏_{k=1}^{N} p(tk | tk+1, tk+2, …, tN)
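As a quick numeric illustration of these factorisations (plain Python, with made-up conditional probabilities), the joint probability of a sentence is simply the product of the per-token conditionals in each direction:

```python
import math

# Hypothetical per-token conditional probabilities for "have a nice one"
# (made-up numbers, purely for illustration).
forward_conditionals = {   # p(tk | t1 ... tk-1)
    "have": 0.10, "a": 0.40, "nice": 0.05, "one": 0.20,
}
backward_conditionals = {  # p(tk | tk+1 ... tN)
    "have": 0.30, "a": 0.25, "nice": 0.08, "one": 0.15,
}

def joint_log_prob(conditionals):
    # log p(t1 ... tN) = sum_k log p(tk | context)
    return sum(math.log(p) for p in conditionals.values())

print("forward  log p(sentence):", joint_log_prob(forward_conditionals))
print("backward log p(sentence):", joint_log_prob(backward_conditionals))
# The biLM objective maximises the sum of both directions' log-likelihoods.
```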
Method

The biLM is built with long short-term memory (LSTM) networks that predict the next word in both directions.

[Figure: the forward LM architecture, unrolled in the forward direction at position k — embedding layer x_k, stacked LSTM hidden layers h^LM_{k,1} and h^LM_{k,2}, and output layer o_k, illustrated on the sequence "… have a nice one …".]
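As a rough illustration (not the paper's exact implementation, which uses a character CNN input and projection layers), here is a minimal PyTorch sketch of one direction of the biLM; the dimensions and the plain token embedding are simplifying assumptions. The backward LM is identical but reads the sequence reversed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForwardLM(nn.Module):
    """Minimal sketch of one direction of the biLM: embedding -> 2 LSTM layers -> softmax."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)   # x_k
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            num_layers=num_layers,
                            batch_first=True)                 # produces h^LM_{k,1}, h^LM_{k,2}
        self.output = nn.Linear(hidden_dim, vocab_size)       # o_k

    def forward(self, token_ids):
        x = self.embedding(token_ids)      # (batch, seq, emb_dim)
        hidden, _ = self.lstm(x)           # top-layer hidden states (batch, seq, hidden_dim)
        return self.output(hidden)         # logits over the next token

# Predict t_{k+1} from t_1 ... t_k; the backward LM does the same on the reversed sequence.
model = ForwardLM(vocab_size=10000)
tokens = torch.randint(0, 10000, (1, 5))   # e.g. "... have a nice one ..."
logits = model(tokens[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
```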
Method

ELMo represents a word tk as a linear combination of the corresponding biLM hidden layers (including its embedding):

ELMo_k^task = γ^task × Σ_{j=0}^{L} s_j^task × h^LM_{k,j}

where h^LM_{k,0} = x_k is the token embedding (written [x_k; x_k] to match the dimension of the concatenated states) and, for j ≥ 1, h^LM_{k,j} is the concatenation of the forward and backward LM hidden states at layer j.

ELMo is a task-specific representation: a downstream task learns the weighting parameters s_j^task and the scalar γ^task.

Unlike usual word embeddings, ELMo is assigned to every token instead of to a type.

[Figure: forward and backward LMs over token tk; their per-layer hidden states are concatenated and then combined with the learned weights s0, s1, s2.]
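A minimal PyTorch sketch of this task-specific weighting, assuming the per-layer representations have already been computed and L = 2 (class and variable names are mine, not the paper's):

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Task-specific combination: ELMo_k = gamma * sum_j softmax(s)_j * h^LM_{k,j}."""
    def __init__(self, num_layers=3):                    # layer 0 = [x_k; x_k], layers 1..L = LSTM states
        super().__init__()
        self.s = nn.Parameter(torch.zeros(num_layers))   # s_j^task, softmax-normalised below
        self.gamma = nn.Parameter(torch.ones(1))          # gamma^task

    def forward(self, layer_reps):
        # layer_reps: list of (batch, seq, 2*hidden) tensors, one per biLM layer
        weights = torch.softmax(self.s, dim=0)
        mixed = sum(w * h for w, h in zip(weights, layer_reps))
        return self.gamma * mixed

# Three dummy layer representations for a 4-token sentence.
layers = [torch.randn(1, 4, 512) for _ in range(3)]
elmo_vectors = ScalarMix()(layers)    # (1, 4, 512): one ELMo vector per token
```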
Method

ELMo can be integrated into almost any neural NLP model by simply concatenating it to the embedding layer.

[Figure: for each token of the input ("have a nice …"), the biLM produces an ELMo vector; the usual inputs are enhanced with these ELMo vectors before the task model is trained on its corpus.]
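A minimal sketch of that integration, assuming pre-computed (frozen) ELMo vectors and an arbitrary LSTM tagger as the downstream model:

```python
import torch
import torch.nn as nn

batch, seq_len = 2, 6
word_emb_dim, elmo_dim, num_tags = 100, 512, 10

word_embeddings = torch.randn(batch, seq_len, word_emb_dim)   # usual task embeddings
elmo_vectors = torch.randn(batch, seq_len, elmo_dim)          # frozen biLM output (placeholder)

# Enhance the inputs: concatenate ELMo with the usual embeddings, token by token.
enhanced = torch.cat([word_embeddings, elmo_vectors], dim=-1)  # (batch, seq, 612)

# The downstream model consumes the enhanced inputs unchanged.
encoder = nn.LSTM(word_emb_dim + elmo_dim, 128, batch_first=True)
tagger = nn.Linear(128, num_tags)
hidden, _ = encoder(enhanced)
tag_logits = tagger(hidden)                                    # (batch, seq, num_tags)
```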
Evaluation

Many linguistic tasks are improved by using ELMo:

‣ Question answering

‣ Textual entailment

‣ Semantic role labelling

‣ Coreference resolution

‣ Named entity recognition

‣ Sentiment analysis
Analysis

The higher layer seemed to learn semantics, while the lower layer probably captured syntactic features.

[Figure: using different biLM layers for word sense disambiguation and PoS tagging.]
Analysis

The higher layer seemed to learn semantics while the lower layer probably captured syntactic features???

‣ Most models preferred the "syntactic (probably)" features, even in sentiment analysis.
Analysis

ELMo-enhanced models can make use of small datasets more efficiently.

[Figure: performance vs. amount of training data on textual entailment and semantic role labelling, with and without ELMo.]
Comments

• Pre-trained ELMo models are available at https://allennlp.org/elmo (see the usage sketch below)

‣ AllenNLP is a deep NLP library built on top of PyTorch

‣ AllenNLP is a product of AI2 (the Allen Institute for Artificial Intelligence), which also works on other interesting projects such as Semantic Scholar

• ELMo processes character-level inputs

‣ Japanese (Chinese, Korean, …) ELMo models are therefore likely to be possible
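For reference, a minimal usage sketch with AllenNLP's Elmo module; the options/weight file paths below are placeholders for the published files linked from allennlp.org/elmo:

```python
from allennlp.modules.elmo import Elmo, batch_to_ids

# Placeholder paths: substitute the published options/weight files from allennlp.org/elmo.
options_file = "elmo_options.json"
weight_file = "elmo_weights.hdf5"

# num_output_representations=1 gives one learned weighted mix of the biLM layers per token.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)

# batch_to_ids builds the character-level inputs mentioned above.
sentences = [["have", "a", "nice", "one"], ["ELMo", "handles", "characters"]]
character_ids = batch_to_ids(sentences)
output = elmo(character_ids)
elmo_vectors = output["elmo_representations"][0]  # (batch, max_seq_len, dim)
```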
