AD-802(A) Natural Language Processing
UNIT 1
NLP TASKS IN SYNTAX, SEMANTICS, AND PRAGMATICS
@DR. PRITIKA BAHAD, DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE, PIEMR, INDORE 1
Content
• NLP
• Syntax, Semantics, and Pragmatics
• NLP tasks in Syntax, Semantics, and Pragmatics
Natural Language Processing
•NLP is the branch of computer science focused on developing systems
that allow computers to communicate with people using everyday
language.
•Also called Computational Linguistics
•Also concerns how computational methods can aid the understanding of
human language
Natural Language Processing
[Figure] Ref: CSE 628 - Introduction to NLP (Professor Niranjan Balasubramanian); image from: [Link]
Communication
•The goal in the production and comprehension of natural language is
communication.
•Communication for the speaker:
◦ Intention: Decide when and what information should be transmitted
(a.k.a. content selection, strategic generation). May require planning
and reasoning about agents’ goals and beliefs.
◦ Generation: Translate the information to be communicated (held in an
internal logical representation or “language of thought”) into a string of
words in the desired natural language (a.k.a. surface realization, tactical
generation).
◦ Synthesis: Output the string in the desired modality, text or speech.
Communication
•Communication for the hearer:
◦ Perception: Map input modality to a string of words, e.g. optical
character recognition (OCR) or speech recognition.
◦ Analysis: Determine the information content of the string.
– Syntactic interpretation (parsing): Find the correct parse tree
showing the phrase structure of the string.
– Semantic interpretation: Extract the (literal) meaning of the string
(logical form).
– Pragmatic interpretation: Consider the effect of the overall context on
altering the literal meaning of a sentence.
◦ Incorporation: Decide whether or not to believe the content of the
string and add it to the KB.
Syntax
•Syntax concerns the proper ordering of words and its effect on meaning.
◦ The dog bit the boy.
◦ The boy bit the dog.
◦ * Bit boy dog the the. (the asterisk marks an ungrammatical string)
◦ Colorless green ideas sleep furiously. (grammatical, yet nonsensical)
Semantics
•Semantics concerns the (literal) meaning of words, phrases, and
sentences.
◦ “plant” as a photosynthetic organism
◦ “plant” as a manufacturing facility
◦ “plant” as the act of sowing
Pragmatics
•Pragmatics concerns the overall communicative and social context and
its effect on interpretation.
◦ The ham sandwich wants another beer. (co-reference, anaphora)
◦ John thinks vanilla. (ellipsis)
Modular Comprehension
sound waves → Acoustic/Phonetic → words → Syntax → parse trees → Semantics → literal meaning → Pragmatics → meaning (contextualized)
Natural Language Tasks
•Processing natural language text involves a variety of syntactic,
semantic, and pragmatic tasks, in addition to other problems.
Syntactic Tasks
Word Segmentation
• Breaking a string of characters into a sequence of words.
• In some written languages (e.g. Chinese) words are not separated by
spaces.
• Even in English, characters other than white-space can be used to
separate words [e.g. , ; . - : ( ) ]
• Examples from English URLs:
– [Link] → jump the shark .com
– [Link]/pluckerswingbar → myspace .com pluckers wing bar
  or → myspace .com plucker swing bar
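The ambiguity in the last URL can be reproduced with a small dictionary-based segmenter. This is only a minimal sketch: the function name `segmentations` and the tiny vocabulary `VOCAB` are assumptions made for illustration, and practical segmenters usually rank the candidate splits with a statistical model instead of listing them all.

```python
def segmentations(text, vocab):
    """Return every way to split `text` into words drawn from `vocab`."""
    if not text:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocab:
            # Keep this prefix as a word and segment the remainder recursively.
            for rest in segmentations(text[end:], vocab):
                results.append([prefix] + rest)
    return results


# Hypothetical toy vocabulary; a real system would use a large lexicon.
VOCAB = {"plucker", "pluckers", "swing", "wing", "bar", "jump", "the", "shark"}

for words in segmentations("pluckerswingbar", VOCAB):
    print(" ".join(words))
```

With this vocabulary the string "pluckerswingbar" yields both readings from the slide, which is why segmentation generally needs some way to prefer one split over another.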
Morphological Analysis
• Morphology is the field of linguistics that studies the
internal structure of words. (Wikipedia)
• A morpheme is the smallest linguistic unit that has
semantic meaning (Wikipedia)
– e.g. “carry”, “pre”, “ed”, “ly”, “s”
Morphological Analysis
• Morphological analysis is the task of segmenting a word
into its morphemes:
– carried → carry + ed (past tense)
– independently → in + (depend + ent) + ly
– Googlers → (Google + er) + s (plural)
– unlockable → un + (lock + able) ?
             → (un + lock) + able ?
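The two bracketings of "unlockable" can be enumerated with a toy affix-stripping analyzer. The sketch below is an illustration under stated assumptions, not a real morphological analyzer: the affix lists, the stem set, and the names `restore` and `analyses` are all hypothetical, and only one spelling rule (y → i, as in carry → carri-) is handled.

```python
# Hypothetical toy inventories, chosen only to cover the slide examples.
PREFIXES = ["un", "in", "pre"]
SUFFIXES = ["ly", "ed", "s", "er", "able", "ent"]
STEMS = {"carry", "depend", "lock"}


def restore(form):
    """Return the form itself plus the form with a final i turned back into y."""
    candidates = [form]
    if form.endswith("i"):
        candidates.append(form[:-1] + "y")
    return candidates


def analyses(word):
    """Return all analyses of `word` as nested (left, right) tuples."""
    results = []
    for form in restore(word):
        if form in STEMS:
            results.append(form)
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            # Attach the prefix last: prefix + (analysis of the rest).
            for rest in analyses(word[len(prefix):]):
                results.append((prefix, rest))
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            # Attach the suffix last: (analysis of the rest) + suffix.
            for rest in analyses(word[:-len(suffix)]):
                results.append((rest, suffix))
    return results


for word in ["carried", "unlockable"]:
    print(word, analyses(word))
```

The analyzer finds a single analysis for "carried" but two for "unlockable", mirroring the two question-marked readings above. It also overgenerates on longer words such as "independently", so a practical system would need constraints on which affixes attach to which stems.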
Part Of Speech (POS) Tagging
• Annotate each word in a sentence with a part-of-speech.
I    ate  the  spaghetti  with  meatballs.
Pro  V    Det  N          Prep  N

John  saw  the  saw  and  decided  to    take  it   to    the  table.
PN    V    Det  N    Con  V        Part  V     Pro  Prep  Det  N
• Useful for subsequent syntactic parsing and word sense
disambiguation.
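To make the ambiguity of "saw" and "to" concrete, here is a minimal rule-based sketch. The lexicon, the two context rules, and the function name `tag` are assumptions written only for the two example sentences; taggers used in practice are typically statistical or neural and learn such preferences from annotated data.

```python
# Hypothetical toy lexicon: each word lists its possible tags, default first.
LEXICON = {
    "john": ["PN"], "i": ["Pro"], "it": ["Pro"],
    "ate": ["V"], "decided": ["V"], "take": ["V"],
    "saw": ["V", "N"],
    "the": ["Det"], "and": ["Con"],
    "to": ["Part", "Prep"], "with": ["Prep"],
    "spaghetti": ["N"], "meatballs": ["N"], "table": ["N"],
}


def tag(sentence):
    """Return a list of (word, tag) pairs using the lexicon and two context rules."""
    original = sentence.rstrip(".").split()
    words = [w.lower() for w in original]
    tags = []
    for i, word in enumerate(words):
        options = LEXICON.get(word, ["N"])  # unknown words default to noun
        choice = options[0]
        if len(options) > 1:
            prev_tag = tags[-1] if tags else None
            next_options = LEXICON.get(words[i + 1], ["N"]) if i + 1 < len(words) else []
            if word == "saw" and prev_tag == "Det":
                choice = "N"      # "the saw": a determiner precedes a noun
            elif word == "to" and "V" not in next_options:
                choice = "Prep"   # "to the table": no verb follows
        tags.append(choice)
    return list(zip(original, tags))


print(tag("John saw the saw and decided to take it to the table."))
```

On the second example sentence this assigns V to the first "saw" and N to the second, and Part to the first "to" and Prep to the second, matching the annotation above.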
Phrase Chunking
• Find all non-recursive noun phrases (NPs) and verb
phrases (VPs) in a sentence.
– [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs].
– [NP He ] [VP reckons ] [NP the current account deficit ]
[VP will narrow ] [PP to ] [NP only # 1.8 billion ] [PP in ] [NP September ]
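Chunking can be sketched as a greedy left-to-right scan over POS tags. The tag names follow the POS tagging slide, but the grouping rules and the names `chunk` and `bracket` are assumptions for illustration; chunkers in practice are usually trained as sequence labelers.

```python
def chunk(tagged):
    """Group (word, tag) pairs into flat, non-recursive chunks."""
    chunks = []
    i = 0
    while i < len(tagged):
        tag = tagged[i][1]
        j = i
        if tag in ("Det", "N", "PN", "Pro"):
            label = "NP"
            if tagged[j][1] == "Det":
                j += 1  # optional leading determiner
            while j < len(tagged) and tagged[j][1] in ("N", "PN", "Pro"):
                j += 1  # one or more noun-like words
        elif tag == "V":
            label = "VP"
            while j < len(tagged) and tagged[j][1] == "V":
                j += 1  # a run of verbs forms one verb chunk
        elif tag == "Prep":
            label, j = "PP", i + 1
        else:
            label, j = "O", i + 1  # anything else stays outside NP/VP/PP
        chunks.append((label, [word for word, _ in tagged[i:j]]))
        i = j
    return chunks


def bracket(chunks):
    """Render chunks in the bracketed notation used on the slide."""
    return " ".join("[{} {}]".format(label, " ".join(words)) for label, words in chunks)


tagged = [("I", "Pro"), ("ate", "V"), ("the", "Det"),
          ("spaghetti", "N"), ("with", "Prep"), ("meatballs", "N")]
print(bracket(chunk(tagged)))
```

Because each chunk is a flat run of tags, the output never nests one phrase inside another, which is what "non-recursive" means here.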
Syntactic Parsing
• Produce the correct syntactic parse tree for a sentence.
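As one possible illustration, the sketch below applies the CKY algorithm to a tiny hand-written grammar in Chomsky normal form. The grammar, the lexicon, and the name `cky_parse` are assumptions chosen to cover the sentence "the dog bit the boy"; the sketch keeps a single tree per category and span, whereas real parsers use much larger grammars and score competing trees.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parent.
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICAL = {"the": "Det", "dog": "N", "boy": "N", "bit": "V"}


def cky_parse(words):
    """Return one parse tree rooted in S as nested tuples, or None."""
    n = len(words)
    # chart[(i, j)] maps a category to one tree covering words[i:j].
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        if word in LEXICAL:
            chart[(i, i + 1)][LEXICAL[word]] = (LEXICAL[word], word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # try every split point
                for left_cat, left_tree in chart[(i, k)].items():
                    for right_cat, right_tree in chart[(k, j)].items():
                        parent = BINARY.get((left_cat, right_cat))
                        if parent:
                            chart[(i, j)][parent] = (parent, left_tree, right_tree)
    return chart[(0, n)].get("S")


print(cky_parse("the dog bit the boy".split()))
print(cky_parse("bit boy dog the the".split()))
```

The first call returns a tree with S at the root, while the scrambled word order in the second call has no parse under this grammar and returns `None`.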
Next Lecture
• Semantic Tasks
• Pragmatic Tasks
References
1. [Link]
2. SBU CS Graduate Course: CSE 628 - Introduction to NLP (Professor Niranjan Balasubramanian)
3. Coursera Course on Introduction to Natural Language Processing by Prof. Dragomir Radev