Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
NAACL-HLT 2018
Overview
• Propose a new type of deep contextualised word representation (ELMo) that models:
‣ Complex characteristics of word use (e.g., syntax and semantics)
‣ How these uses vary across linguistic contexts (i.e., to model polysemy)
• Show that ELMo can improve existing neural models on various NLP tasks
• Argue that ELMo captures more abstract linguistic characteristics in its higher layers
Example
[Figure: nearest-neighbour comparison — GloVe mostly learns the sport-related context, while ELMo can distinguish the word sense based on the surrounding context]
Method
• Embeddings from Language Models: ELMo
• Learn word embeddings by building bidirectional language models (biLMs)
‣ biLMs consist of a forward and a backward LM (the joint training objective is sketched below)
✦ Forward: p(t_1, t_2, …, t_N) = ∏_{k=1}^{N} p(t_k | t_1, t_2, …, t_{k−1})
✦ Backward: p(t_1, t_2, …, t_N) = ∏_{k=1}^{N} p(t_k | t_{k+1}, t_{k+2}, …, t_N)
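As a recap of how the two factorisations are used in the paper, the forward and backward LMs are trained jointly by maximising the sum of their log-likelihoods, sharing the token-embedding and softmax parameters while keeping separate LSTM parameters per direction (a sketch in LaTeX notation):

```latex
% Joint biLM objective: forward + backward log-likelihoods.
% \Theta_x (token embedding) and \Theta_s (softmax) are shared across directions;
% the forward and backward LSTMs keep their own parameters.
\sum_{k=1}^{N} \Big(
    \log p(t_k \mid t_1, \ldots, t_{k-1};\ \Theta_x, \overrightarrow{\Theta}_{LSTM}, \Theta_s)
  + \log p(t_k \mid t_{k+1}, \ldots, t_N;\ \Theta_x, \overleftarrow{\Theta}_{LSTM}, \Theta_s)
\Big)
```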
Method
With a long short-term memory (LSTM) network, the biLMs are built by predicting the next word in both directions. A minimal code sketch of the forward direction follows the figure description.
[Figure: the forward LM architecture, expanded in the forward direction over positions k —
Output layer: o_k (predicts the next word, e.g. "a", "nice", "one" for the input "… have a nice one …")
Hidden layers (LSTMs): h^LM_{k,2}, h^LM_{k,1}
Embedding layer: x_k for the input token t_k]
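This is not the authors' implementation (the paper uses a character CNN over tokens and much larger dimensions); it is a minimal PyTorch sketch of the forward LM idea with assumed hyperparameters and word-level inputs:

```python
import torch
import torch.nn as nn

class ForwardLM(nn.Module):
    """Minimal forward language model: embed t_1..t_{k-1}, run a stacked LSTM,
    and predict the next token at every position (a sketch, not the paper's biLM)."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)            # x_k
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers,      # h_{k,1}, h_{k,2}
                            batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)              # o_k

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        x = self.embed(tokens)
        h, _ = self.lstm(x)
        return self.out(h)                                        # next-token logits

# Training step: shift targets by one so position k predicts token k+1.
model = ForwardLM(vocab_size=10000)
tokens = torch.randint(0, 10000, (2, 6))                          # toy batch
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
```

The backward LM is the same network run over the reversed sequence, predicting the previous token instead of the next one.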
Method
ELMo represents a word t_k as a linear combination of the corresponding hidden layers (including its embedding). ELMo is a task-specific representation: the downstream task learns the weighting parameters (a minimal code sketch follows).

ELMo_k^task = γ^task × Σ_j s_j^task × h_{k,j}^LM

where
h_{k,j}^LM = [→h_{k,j}^LM ; ←h_{k,j}^LM]  (concatenation of the forward and backward LM hidden layers, j = 1, 2)
h_{k,0}^LM = x_k = [x_k ; x_k]  (the embedding layer)

Unlike usual word embeddings, ELMo is assigned to every token instead of to a type.
[Figure: forward and backward LMs over t_k, with their hidden layers concatenated and weighted]
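A minimal PyTorch sketch of this weighting, assuming the biLM layer states are already computed; following the paper, the s_j^task weights are softmax-normalised and γ^task is a learned scalar (class name and shapes are illustrative):

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Task-specific weighting of biLM layers (a sketch of the ELMo formula):
    ELMo_k^task = gamma^task * sum_j s_j^task * h_{k,j}^LM, with s = softmax(w)."""
    def __init__(self, num_layers=3):                     # layer 0 = embedding [x_k; x_k]
        super().__init__()
        self.w = nn.Parameter(torch.zeros(num_layers))    # -> s_j^task after softmax
        self.gamma = nn.Parameter(torch.ones(1))          # gamma^task

    def forward(self, layer_states):
        # layer_states: (num_layers, batch, seq_len, 2 * lstm_dim),
        # each layer being the concatenation [forward h_{k,j}; backward h_{k,j}].
        s = torch.softmax(self.w, dim=0)
        mixed = (s.view(-1, 1, 1, 1) * layer_states).sum(dim=0)
        return self.gamma * mixed

# Toy usage: 3 biLM layers, 2 sentences, 6 tokens, 1024-dim concatenated states.
states = torch.randn(3, 2, 6, 1024)
elmo_vectors = ScalarMix()(states)                        # (2, 6, 1024)
```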
Method
ELMo can be integrated into almost any neural NLP model by simply concatenating it to the embedding layer: the pre-trained biLMs run over the corpus, the usual inputs are enhanced with ELMo vectors, and the task model is then trained as before (a concatenation sketch follows).
[Figure: the biLMs produce an ELMo vector for each token ("have", "a", "nice", …), which is concatenated with the usual inputs before training]
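The "enhance inputs with ELMo" step is just a feature-wise concatenation; the dimensions below are assumptions for illustration:

```python
import torch

# Hypothetical shapes: (batch, seq_len, 300) from the task model's usual embedding
# layer, and (batch, seq_len, 1024) ELMo vectors from the frozen biLM.
task_embeddings = torch.randn(2, 6, 300)
elmo_vectors = torch.randn(2, 6, 1024)

# Concatenate along the feature dimension and feed the result to the task
# model's encoder in place of the plain embeddings.
enhanced_inputs = torch.cat([task_embeddings, elmo_vectors], dim=-1)  # (2, 6, 1324)
```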
Evaluation
Many linguistic tasks are improved by using ELMo:
[Table: benchmark results for Q&A, textual entailment, semantic role labelling, coreference resolution, named entity recognition, and sentiment analysis]
Analysis
The higher layer seemed to learn semantics, while the lower layer probably captured syntactic features.
[Figure: per-layer results for word sense disambiguation and PoS tagging]
Analysis
Did the higher layer really learn semantics while the lower layer captured syntactic features? Most models preferred the "syntactic (probably)" features, even in sentiment analysis.
[Figure: learned layer weights across tasks]
Analysis
ELMo-enhanced models can make use of small datasets more efficiently.
[Figure: performance versus training-set size for textual entailment and semantic role labelling]
Comments
• Pre-trained ELMo models are available at https://allennlp.org/elmo (a usage sketch follows)
‣ AllenNLP is a deep NLP library built on top of PyTorch
‣ AllenNLP is a product of AI2 (Allen Institute for Artificial Intelligence), which also works on other interesting projects such as Semantic Scholar
• ELMo can process character-level inputs
‣ Japanese (Chinese, Korean, …) ELMo models are therefore likely to be possible
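For reference, this is roughly how the pre-trained models were loaded in older AllenNLP releases via the allennlp.modules.elmo module; the file names are hypothetical local copies of the options/weights downloadable from https://allennlp.org/elmo, and the exact API may differ in current versions, so check the AllenNLP documentation:

```python
from allennlp.modules.elmo import Elmo, batch_to_ids

# Hypothetical local paths to the pre-trained files from https://allennlp.org/elmo.
options_file = "elmo_options.json"
weight_file = "elmo_weights.hdf5"

elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)

# Character-level inputs: sentences are given as lists of tokens,
# then converted to character ids for the biLM's character CNN.
sentences = [["have", "a", "nice", "one"]]
character_ids = batch_to_ids(sentences)

# One contextual vector per token, e.g. shape (1, 4, 1024) for this sentence.
embeddings = elmo(character_ids)["elmo_representations"][0]
```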