Introduction To NLP 2021

The document provides an introduction to natural language processing (NLP). It covers course information, including prerequisites and grading, gives an overview of NLP and some of its achievements, such as machine translation and virtual assistants, describes the levels of linguistic description used in NLP (phonetics, phonology, morphology, syntax, semantics and pragmatics), and outlines key NLP applications in 2019, such as computational linguistics, question answering, machine translation and summarization.


Introduction to Natural Language Processing (NLP)
Dr. Tran Hong-Viet
UET-VNU

Content
• Course Information
• Some achievements of NLP
• Overview of NLP
• Linguistic levels of description
• Why is NLP difficult?
• Conclusion

Course information
• Course: Natural Language Processing (NLP)
• Instructor: Dr. Tran Hong Viet, Information Faculty
  Email: [email protected]
  Tel: 0975486888

Course information
• Course web page: https://courses.uet.vnu.edu.vn/ (choose the NLP course)
  • Up-to-date information
  • Lecture notes
  • Relevant dates, links, etc.
• Prerequisites: programming principles, discrete mathematics for computing, software design and software engineering concepts, AI. Good knowledge of C++, Java, Python.
• Python is required for programming assignments.
• Grading: 30% for midterm + homework/assignments, 10% for attendance, 60% for final


Policy & Practical issues
• Discussion is encouraged, but assignments must be your individual work
• Code copied from books or other libraries must be explicitly acknowledged
• Sharing or copying code is strictly prohibited

Reference
• Slides
• Textbooks:
  1) Speech and Language Processing, Daniel Jurafsky & James H. Martin, second edition, Prentice Hall, 2009 (https://web.stanford.edu/~jurafsky/slp3/)
  2) Natural Language Processing, Jacob Eisenstein, 2018 (https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
  3) Foundations of Statistical Natural Language Processing, Christopher D. Manning & Hinrich Schütze, 2001


NLP in Industry

Communication With Machines


Google Translate & Vietgle Translate

Virtual Assistant
• Conversational agents combine:
  • Speech recognition
  • Language analysis
  • Dialogue processing
  • Information retrieval
  • Text-to-speech
• Google Now, Alexa, Siri, Cortana, VAV…


Machine Translation vs. Human

Watson system – IBM 2011 (Question Answering)
• IBM built a computer that won Jeopardy! in 2011
• Question answering technology built on 200 million text pages, encyclopedias, dictionaries, thesauri, taxonomies, ontologies, and other databases

Google’s Knowledge Graph
• Goal: move beyond keyword search / document retrieval to directly answering questions
  • easier for mobile device users
• Google’s Knowledge Graph (“things not strings”):
  • built on top of Freebase
  • entries are synthesised from Wikipedia, news stories, etc.
  • manually updated

Key Applications in 2019
• Computational linguistics (i.e., modeling the human capacity for language computationally)
• Information extraction, especially “open” IE
• Question answering, chatbots (e.g., Watson, Google Now)
• Machine translation
• Summarization
• Opinion and sentiment analysis
• Social media analysis
• Fake news recognition


NLP Careers: So hot!
• Industry
• Government
• Academia

What is NLP?
• Natural language processing (NLP) is a subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages.
• Natural-language-generation systems convert information from computer databases into normal-sounding human language. Natural-language-understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.


What is Natural Language Processing?
• Computers using natural language as input and/or output
• [Diagram: natural language → computer → natural language, with Understanding (NLU) on the input side and Generation (NLG) on the output side]

Natural language processing and computational linguistics
• Natural language processing (NLP) develops methods for solving practical problems involving language:
  • Automatic speech recognition
  • Machine translation
  • Sentiment analysis
  • Information extraction from documents
• Computational linguistics (CL) focuses on using technology to support/implement linguistics:
  • how do we understand language?
  • how do we produce language?
  • how do we learn language?


Level Of Linguistic Knowledge

Phonetics and phonology
• Phonetics (ngữ âm) studies the sounds of a language
• Phonology (âm vị học) studies the distributional properties of these sounds


Morphology
• Morphology studies the structure of words
• Morphological derivation exhibits hierarchical structure
  Example: re+vital+ize+ation
• The suffix usually determines the syntactic category of the derived word

Syntax
• Syntax studies the ways words combine to form phrases and sentences
• Syntactic parsing helps identify who did what to whom, a key step in understanding a sentence

Semantics and pragmatics
• Semantics studies the meaning of words, phrases and sentences
  Ex: I have a dinner in/for an hour
• Pragmatics (Ngữ dụng) studies how we use language to do things in the world
  Ex: Con vịt chạy đến Mary và liếm chân cô. (“The duck ran up to Mary and licked her leg.”)

The lexicon
• A language has a lexicon, which lists for each morpheme
  • how it is pronounced (phonology),
  • its distributional properties (morphology and syntax),
  • what it means (semantics), and
  • its discourse properties (pragmatics)
• The lexicon interacts with all levels of linguistic representation


What’s driving NLP and CL research?
• Tools for managing the “information explosion”
  • extracting information from and managing large text document collections
  • NLP is often offered as free tools integrated with main products to sell more ads
    Ex: speech recognition, machine translation, document clustering (news), etc.
• Mobile and portable computing
  • keyword search / document retrieval don’t work well on very small devices
  • we want to be able to talk to our computers (speech recognition) and have them say something intelligent back (NL generation)

Factors Changing the NLP Landscape
• Increases in computing power
• The rise of the web, then the social web
• Advances in machine learning
• Advances in understanding of language in social context


Natural Language Processing
• Applications
  • Machine Translation
  • Information Retrieval
  • Question Answering
  • Dialogue Systems
  • Information Extraction
  • Summarization
  • Sentiment Analysis
  • …
• Core Technologies (NLP sub-problems)
  • Language modeling
  • Part-of-speech tagging
  • Syntactic parsing
  • Named-entity recognition
  • Word sense disambiguation
  • Semantic role labeling
  • …
NLP lies at the intersection of computational linguistics and machine learning.

Why is NLP difficult?
• Ambiguity
• Sparsity
• Abstractly, most NLP applications can be viewed as prediction problems
  • Should be able to solve them with machine learning
  • The label set is often the set of all possible sentences
    • infinite (or at least astronomically large)
  • Training data for supervised learning is often not available
    • Unsupervised/semi-supervised techniques for training from available data
• Algorithmic challenges
  • vocabulary can be large (e.g., 50K words)
  • data sets are often large (GB or TB)


Ambiguity ???
• “At last, a computer that understands you like your mother”
• “Ông già đi nhanh quá” (an ambiguous Vietnamese sentence: roughly “The old man walks too fast” or “He is getting old too fast”)

Ambiguity
• “At last, a computer that understands you like your mother”
  • It understands you as well as your mother understands you
  • It understands (that) you like your mother
  • It understands you as well as it understands your mother


Ambiguity at Many Levels
• At the acoustic level (speech recognition):
  • “… a computer that understands you like your mother”
  • “… a computer that understands you lie cured mother”

Ambiguity at Many Levels
• At the syntactic level:
  • Different structures lead to different interpretations
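To make this concrete, here is a minimal sketch (an illustrative addition, not from the slides; it assumes the NLTK library, and the toy grammar and sentence are made up for the example) showing one sentence receiving two distinct parse trees:

# Illustrative sketch: a classic PP-attachment ambiguity parsed with a toy grammar.
# Assumes NLTK is installed; the grammar and sentence are chosen only for the example.
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | Det N PP | 'I'
VP -> V NP | V NP PP
PP -> P NP
Det -> 'an' | 'my'
N  -> 'elephant' | 'pajamas'
V  -> 'shot'
P  -> 'in'
""")

parser = nltk.ChartParser(grammar)
# Prints two trees: shot [an elephant] [in my pajamas]  vs.  shot [an elephant in my pajamas]
for tree in parser.parse("I shot an elephant in my pajamas".split()):
    print(tree)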


More Syntactic Ambiguity

Ambiguity at Many Levels
• At the semantic (meaning) level:
  • Two definitions of “bank”
    • an organization where people and businesses can invest or borrow money, change it to foreign money, etc., or a building where these services are offered
    • sloping raised land, especially along the sides of a river
  • This is an instance of word sense ambiguity
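One quick way to inspect this kind of sense inventory in practice (an illustrative addition, not from the slides; it assumes NLTK with the WordNet data downloaded) is to list the noun senses WordNet records for “bank”:

# Illustrative sketch: list WordNet's noun senses for the ambiguous word "bank".
# Assumes NLTK is installed and the data has been fetched via nltk.download('wordnet').
from nltk.corpus import wordnet as wn

for synset in wn.synsets('bank', pos=wn.NOUN):
    print(synset.name(), '-', synset.definition())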


More Word Sense Ambiguity
• At the semantic (meaning) level:
  • They put money in the bank
  • I saw her duck with a telescope

Dealing with Ambiguity
• How can we model ambiguity?
  • Non-probabilistic methods (e.g., CKY parsers for syntax) return all possible analyses
  • Probabilistic models (HMMs for POS tagging, PCFGs for syntax) and algorithms (Viterbi, probabilistic CKY) return the best possible analysis, i.e., the most probable one
  • But the “best” analysis is only good if our probabilities are accurate. Where do they come from?
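To illustrate the HMM/Viterbi idea mentioned above, here is a minimal sketch (the two-tag model and all probabilities below are invented for the example, not taken from the slides) that returns the single most probable tag sequence for a short sentence:

# Minimal Viterbi decoder for a toy HMM POS tagger (illustrative; toy probabilities).
from math import log

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a simple HMM."""
    # V[i][t]: log-probability of the best tag sequence for words[:i+1] that ends in tag t.
    V = [{t: log(start_p[t]) + log(emit_p[t].get(words[0], 1e-12)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, V[i - 1][p] + log(trans_p[p][t]) + log(emit_p[t].get(words[i], 1e-12)))
                 for p in tags),
                key=lambda x: x[1])
            V[i][t], back[i][t] = score, prev
    # Trace back from the best final tag.
    best = max(tags, key=lambda t: V[-1][t])
    path = [best]
    for i in range(len(words) - 1, 0, -1):
        path.insert(0, back[i][path[0]])
    return path

# Toy model with two tags; the numbers are made up for the example.
tags = ['N', 'V']
start_p = {'N': 0.7, 'V': 0.3}
trans_p = {'N': {'N': 0.4, 'V': 0.6}, 'V': {'N': 0.8, 'V': 0.2}}
emit_p = {'N': {'time': 0.5, 'flies': 0.3, 'fast': 0.2},
          'V': {'time': 0.1, 'flies': 0.6, 'fast': 0.3}}
print(viterbi(['time', 'flies', 'fast'], tags, start_p, trans_p, emit_p))  # e.g. ['N', 'V', 'N']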


Corpora
• A corpus is a collection of text
  • Often annotated in some way
  • Sometimes just lots of text
• Examples
  • Penn Treebank: 1M words of parsed WSJ
  • Canadian Hansards: 10M+ words of French/English sentences
  • Yelp reviews
  • VLSP Corpus (Vietnamese)

Statistical NLP
• Like most other parts of AI, NLP is dominated by statistical methods
  • Typically more robust than rule-based methods
  • Relevant statistics/probabilities are learned from data
  • Normally requires lots of data about any particular phenomenon


Sparsity
• Order words by frequency. What is the frequency of the nth-ranked word?
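One way to explore this question empirically (an illustrative sketch, not from the slides; 'corpus.txt' is a placeholder path) is to count word frequencies in any plain-text corpus; in natural text the counts typically fall off roughly as 1/rank (a Zipf-like curve), leaving a long tail of rare words:

# Illustrative sketch: rank the words of a plain-text corpus by frequency.
from collections import Counter

with open('corpus.txt', encoding='utf-8') as f:   # placeholder path
    counts = Counter(f.read().lower().split())

# Print rank, word and count for the most frequent words; the counts drop off
# quickly, and most of the vocabulary appears only a handful of times.
for rank, (word, freq) in enumerate(counts.most_common(20), start=1):
    print(rank, word, freq)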


Sparsity
• Regardless of how large our corpus is, there will be a lot of infrequent words
• This means we need to find clever ways to estimate probabilities for things we have rarely or never seen
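As one standard illustration of such a technique (add-one, or Laplace, smoothing; it is not named on the slides, and the toy corpus below is made up), unseen words can be given a small non-zero probability by adding one pseudo-count to every vocabulary item:

# Illustrative sketch: add-one (Laplace) smoothed unigram probabilities.
from collections import Counter

def laplace_prob(word, counts, vocab_size):
    """P(word) with one pseudo-count added per vocabulary item, so it is never zero."""
    total = sum(counts.values())
    return (counts[word] + 1) / (total + vocab_size)

counts = Counter("the cat sat on the mat".split())   # toy corpus
vocab_size = len(counts) + 1                          # toy vocabulary: observed words plus one unseen word
print(laplace_prob('the', counts, vocab_size))        # frequent word
print(laplace_prob('dog', counts, vocab_size))        # unseen word still gets a small probability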

Fields with Connections to NLP

Today’s Applications
• Conversational agents
• Information extraction and question answering
• Machine translation
• Summarization
• Opinion and sentiment analysis
• Social media analysis
• Visual understanding
• Essay evaluation
• Mining legal, medical, or scholarly literature
• …

What is this course?
• Linguistic Issues
  • What is the range of language phenomena?
  • What are the knowledge sources that let us disambiguate?
  • What representations are appropriate?
  • How do you know what to model and what not to model?
• Statistical Modeling Methods (almost Machine Learning)
  • Increasingly complex model structures
  • Learning and parameter estimation
  • Efficient inference: dynamic programming, search
  • Deep neural networks for NLP: LSTM, CNN, Seq2seq, Transformer

Outline of Topics
• Words and Sequences
  • Text classification
  • Probabilistic language models
  • Vector semantics and word embeddings
  • Sequence labeling: POS tagging, NER
  • HMM
• Parsers
• Semantics
• Applications
  • Machine translation, Question Answering, Dialog Systems

Goals of this Course
• Learn about the problems and possibilities of natural language analysis:
  • What are the major issues?
  • What are the major solutions?
• At the end you should:
  • Agree that language is difficult, interesting and important
  • Be able to assess language problems
  • Know which solutions to apply when, and how
  • Feel some ownership over the algorithms
  • Be able to use software to tackle some NLP tasks
  • Know language resources
  • Be able to read papers in the field


Journals and Conferences in NLP
• http://anthology.aclweb.org/

Conclusion
• Computational linguistics and natural language processing:
  • were originally inspired by linguistics,
  • but are now largely applications of machine learning and statistics
• We solve these problems using standard methods from machine learning:
  • Define a probabilistic model over the relevant variables
  • Factor the model into small components that we can learn
  • Ex: HMMs, SVMs, CRFs and PCFGs
  • End-to-end: deep learning
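As a concrete instance of defining a model and factoring it into learnable components (a standard formulation for the HMM taggers mentioned above, written out here for reference rather than taken from the slides), an HMM factors the joint probability of a word sequence and a tag sequence as

P(w_{1:n}, t_{1:n}) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \cdot P(w_i \mid t_i)

where each transition probability P(t_i | t_{i-1}) and emission probability P(w_i | t_i) is a small component that can be estimated from counts in an annotated corpus.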


References
• Slides from NLP courses at CMU and the University of Toronto
• Some NLP tutorials
