
Introduction to Natural Language Processing

This document provides an introduction to natural language processing (NLP). It discusses how NLP is used in applications like information retrieval, information extraction, machine translation, question answering, and processing user-generated content. It also outlines some of the key challenges in NLP, including issues related to syntax, semantics, information extraction, information retrieval, and machine translation. The document provides an overview of the basic levels of linguistic analysis and how NLP fits within computer science.

Uploaded by

Rohit RBr
Copyright
© Attribution Non-Commercial (BY-NC)

Introduction to NLP

Natural Language Processing

Mohit B. R.
MCA102010037

V.J.T.I.
Any Light at The End of The Tunnel?

• Yahoo, Google, Microsoft → Information Retrieval
• Monster.com, HotJobs.com (job finders) → Information Extraction + Information Retrieval
• Systran powers Babelfish → Machine Translation
• Ask Jeeves → Question Answering
• Myspace, Facebook, Blogspot → Processing of User-Generated Content
• Tools for “business intelligence”
• All the “Big Guys” have (several) strong NLP research labs:
– IBM, Microsoft, AT&T, Xerox, Sun, etc.
Why Natural Language Processing?
• Huge amounts of data
– Internet = at least 20 billion pages
– Intranet
• Applications for processing large amounts of text require NLP expertise
• Classify text into categories
• Index and search large texts
• Automatic translation
• Speech understanding
– Understand phone conversations
• Information extraction
– Extract useful information from resumes
• Automatic summarization
– Condense 1 book into 1 page
• Question answering
• Knowledge acquisition
• Text generation / dialogues
Natural?
• Natural Language?
– Refers to the language spoken by people, e.g. English,
Japanese, Swahili, as opposed to artificial languages, like
C++, Java, etc.
• Natural Language Processing
– Applications that deal with natural language in one way or another
• [Computational Linguistics
– Doing linguistics on computers
– More on the linguistic side than NLP, but closely related]
Why Natural Language Processing?
• kJfmmfj mmmvvv nnnffn333
• Uj iheale eleee mnster vensi credur
• Baboi oi cestnitze
• Coovoel2^ ekk; ldsllk lkdf vnnjfj?
• Fgmflmllk mlfm kfre xnnn!
Computers Lack Knowledge!
• Computers “see” text in English the same way you saw the previous text!
• People have no trouble understanding language
– Common sense knowledge
– Reasoning capacity
– Experience
• Computers have
– No common sense knowledge
– No reasoning capacity
Where does it fit in the CS taxonomy?
• Computers
  – Databases
  – Artificial Intelligence
    – Robotics
    – Natural Language Processing
      – Information Retrieval
      – Machine Translation
      – Language Analysis
        – Semantics
        – Parsing
    – Search
  – Algorithms
  – Networking
Linguistic Levels of Analysis
• Speech
• Written language
– Phonology: sounds / letters / pronunciation
– Morphology: the structure of words
– Syntax: how sequences of words are structured
– Semantics: the meaning of those sequences
Issues in Syntax
“the dog ate my homework” - Who did what?
1. Identify the part of speech (POS)
dog = noun; ate = verb; homework = noun
English POS tagging accuracy: 95%

2. Identify collocations
mother in law, hot dog
Compositional versus non-compositional collocations
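The POS step above can be sketched with a lookup table. This is only a toy illustration: the `LEXICON` table, its tag names, and the `tag` function are assumptions made for this example and do not come from any real tagger or tagset. Real taggers use context to resolve ambiguous words, which a plain lookup cannot do.

```python
# Hypothetical toy lexicon: word -> part-of-speech tag (illustrative only).
LEXICON = {
    "the": "DET",
    "my": "DET",
    "dog": "NOUN",
    "homework": "NOUN",
    "ate": "VERB",
}


def tag(sentence):
    """Return (word, tag) pairs; words missing from the lexicon default to NOUN."""
    return [(word, LEXICON.get(word.lower(), "NOUN")) for word in sentence.split()]


if __name__ == "__main__":
    for word, pos in tag("the dog ate my homework"):
        print(f"{word} = {pos}")
```

Defaulting unknown words to NOUN is a common baseline choice, but it is exactly the kind of guess that keeps a simple tagger from being fully accurate.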
Issues in Syntax
• Shallow parsing:
“the dog chased the bear”
“the dog” “chased the bear”
subject - predicate
Identify basic structures
NP-[the dog] VP-[chased the bear]
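A minimal sketch of the subject / predicate split shown above, assuming the sentence has already been tagged. The `chunk` function and the rule it uses (everything before the first verb is the NP, the rest is the VP) are simplifications invented for this example; practical shallow parsers use richer patterns or trained models.

```python
def chunk(tagged):
    """Split (word, tag) pairs into a subject NP and a predicate VP at the first verb."""
    for i, (_, pos) in enumerate(tagged):
        if pos == "VERB":
            subject = " ".join(word for word, _ in tagged[:i])
            predicate = " ".join(word for word, _ in tagged[i:])
            return [("NP", subject), ("VP", predicate)]
    # No verb found: treat the whole input as one noun phrase.
    return [("NP", " ".join(word for word, _ in tagged))]


if __name__ == "__main__":
    # Hand-tagged input, so the sketch does not depend on any tagger.
    tagged = [
        ("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"),
        ("the", "DET"), ("bear", "NOUN"),
    ]
    print(" ".join(f"{label}-[{text}]" for label, text in chunk(tagged)))
```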
More Issues in Syntax
• Anaphora Resolution:
“The dog entered my room. It scared me” (does “It” refer to the dog or the room?)

• Prepositional Phrase Attachment
“I saw the man in the park with a telescope” (who has the telescope?)
Issues in Semantics
• Understand language! How?
• “plant” = industrial plant
• “plant” = living organism
• Words are ambiguous
• Importance of semantics?
– Machine Translation: wrong translations
– Information Retrieval: wrong information
– Anaphora Resolution: wrong referents
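One simple way to pick between the two senses of “plant” is to count how many context words overlap with a short list of clue words per sense, in the spirit of the simplified Lesk approach. The sketch below is an illustration under stated assumptions: the `SENSES` clue lists and the `disambiguate` function are made up for this example and are not taken from a real dictionary or library.

```python
# Hypothetical clue words for each sense of "plant" (illustrative only).
SENSES = {
    "industrial plant": {"factory", "industrial", "workers", "production", "machinery"},
    "living organism": {"grow", "leaves", "water", "garden", "sunlight", "soil"},
}


def disambiguate(sentence):
    """Return the sense whose clue words overlap most with the sentence."""
    words = set(sentence.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))


if __name__ == "__main__":
    print(disambiguate("The plant needs water and sunlight to grow"))
    print(disambiguate("Workers at the plant operate heavy machinery"))
```

When no clue word matches, `max` falls back to the first sense listed, which shows why overlap counting alone is a weak signal.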
Why Semantics?
• The sea is home to millions of plants and animals
• English → French [commercial MT (Machine Translation) system]
• Le mer est a la maison de billion des usines et des animaux
• French → English
• The sea is at the home for billions factories and animals
Issues in Information Extraction
• “There was a group of about 8-9 people close to
the entrance on Highway 75”
• Who? “8-9 people”
• Where? “Highway 75”

• Extract information
• Detect new patterns:
– Detect hacking / hidden information / etc.
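The Who / Where example can be sketched with two hand-written patterns. The regular expressions and the `extract` function below are assumptions tailored to this single sentence; real extraction systems need far more general patterns or learned models.

```python
import re

# Hypothetical patterns written for the example sentence only.
WHO = re.compile(r"\b\d+(?:-\d+)?\s+people\b")   # e.g. "8-9 people"
WHERE = re.compile(r"\bHighway\s+\d+\b")          # e.g. "Highway 75"


def extract(text):
    """Return the first 'who' and 'where' matches, or None when a pattern fails."""
    who = WHO.search(text)
    where = WHERE.search(text)
    return {
        "who": who.group(0) if who else None,
        "where": where.group(0) if where else None,
    }


if __name__ == "__main__":
    sentence = (
        "There was a group of about 8-9 people close to "
        "the entrance on Highway 75"
    )
    print(extract(sentence))
```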
Issues in Information Retrieval
• General model:
– A huge collection of texts
– A query
• Task: find documents that are relevant to the given
query
• How? Create an index, like the index in a book
• More …
– Vector-space models
– Boolean models
• Examples: Google, Yahoo, etc.
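The “index in a book” idea can be sketched as an inverted index: a map from each term to the set of documents that contain it, queried here with a Boolean AND. The document collection, `build_index`, and `search` are illustrative names chosen for this sketch, which ignores ranking, stemming, and punctuation.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Boolean AND query: return ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical three-document collection.
    docs = {
        1: "the dog ate my homework",
        2: "the dog chased the bear",
        3: "a bear ate the fish",
    }
    index = build_index(docs)
    print(sorted(search(index, "dog ate")))
    print(sorted(search(index, "the bear")))
```

A vector-space model would replace the set intersection with a similarity score between the query and each document, so that results can be ranked instead of only filtered.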
Issues in Information Retrieval
• Retrieve specific information
• Question Answering
• “What is the height of Mount Everest?”
• About 29,000 feet
Issues in Information Retrieval
• Find information across languages!
• Cross Language Information Retrieval
• “What is the minimum age requirement for car
rental in Italy?”
• Also search Italian texts for “età minima per noleggio macchine”
• Integrate a large number of languages
• Integrate into high-performance IR engines
Issues in Machine Translation
• Text-to-Text Machine Translation
• Speech-to-Speech Machine Translation

• Most of the work has addressed pairs of widely spoken languages, such as English-French and English-Chinese
Issues in Machine Translation
• How to translate text?
– Learn from previously translated data
→ Need parallel corpora
• French-English, Chinese-English have the Hansards
• Reasonable translations
• Chinese-Hindi – no such tools available today!
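To make “learn from previously translated data” concrete, here is a minimal sketch that guesses word translations from a tiny parallel corpus by co-occurrence, scored with the Dice coefficient. This is a deliberately simple stand-in chosen for illustration, not the method used by any particular MT system; the three sentence pairs and the function names are assumptions.

```python
from collections import Counter


def dice_table(pairs):
    """Score every (source word, target word) pair by the Dice coefficient."""
    src_count, tgt_count, both = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        both.update((s, t) for s in src_words for t in tgt_words)
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in both.items()
    }


def best_translation(table, word):
    """Return the target word with the highest score for `word`, or None."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


if __name__ == "__main__":
    # Hypothetical English-French sentence pairs.
    pairs = [
        ("the house", "la maison"),
        ("the sea", "la mer"),
        ("a house", "une maison"),
    ]
    table = dice_table(pairs)
    for word in ("the", "house", "sea"):
        print(word, "->", best_translation(table, word))
```

Even this toy needs several sentence pairs before “house” and “maison” stand out, which is why language pairs without large parallel corpora are hard to cover.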
Current Applications and
Advantages of NLP
• Intro to Perl
– Great for text processing
– Fast: one person can do the work of ten others
– Easy to pick up
• Some linguistic basics
– Structure of English
– Parts of speech, phrases, parsing
• Morphology
• Part of speech tagging
• Syntactic parsing
• Semantics
– Word sense disambiguation
– Semantic relations
Thank you!
The End
