INTERNATIONAL JOURNAL OF TECHNOLOGY ENHANCEMENTS AND EMERGING ENGINEERING RESEARCH, VOL 1, ISSUE 4 131
ISSN 2347-4289
Natural Language Processing
Abhimanyu Chopra, Abhinav Prashar, Chandresh Sain
CSE Department, Dronacharya College of Engineering, Gurgaon, India
ABSTRACT: Language is a means of communicating thoughts and of understanding the world, and it lets speakers be as vague or as precise as they like. Natural languages are the languages spoken by people. Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages, and is therefore closely related to human-computer interaction. It encompasses everything a computer needs in order to understand natural language and to generate it. The need for NLP arises because a vast amount of information is recorded in natural language and could be made accessible via computers. Information is constantly generated in the form of books, news, business and government reports, and scientific papers, many of which are available online. A system that requires a great deal of information must be able to process natural language in order to retrieve much of the information available on computers. NLP is an interesting and difficult field in which representation and reasoning theories must be developed and evaluated. All of the problems of AI arise in this domain; solving "the natural language problem" is as difficult as solving "the AI problem", because any field can be expressed in natural language.
Keywords: Natural language processing (NLP), Syntactic, Semantic, Pragmatic, Discourse Integration, Morphological, Lexical, Linguistics, Generation, Machine Learning.
1. INTRODUCTION
Natural languages are the languages spoken by people. Natural language processing (NLP) encompasses everything a computer needs in order to understand natural language and to generate it. NLP is a subfield of artificial intelligence and linguistics devoted to making computers understand statements or words written in human languages. A natural language, also known as an ordinary language, is one that is spoken or written by people for general-purpose communication. The need for natural language interfaces arises because users who wish to communicate with a computer cannot be forced to learn a machine-specific language; such interfaces cater to users, such as managers or children, who do not have the time to learn or become skilled in new, specialised languages. The language may be any natural language, for example Hindi, French, English, or Chinese. A language is a system: a set of symbols and a set of rules.
1. Symbols are combined and used to convey information.
2. Rules govern the handling of symbols.
NLP covers anything a computer needs in order to understand typed or spoken natural language.

2. NATURAL LANGUAGE PROCESSING
NL Input -> Computer (Understanding, Generation) -> NL Output
Fig 1.1 Natural Language Processing

2.1 NATURAL LANGUAGE UNDERSTANDING:
Its task is to understand and reason about input given in a natural language.

2.2 NATURAL LANGUAGE GENERATION:
It is a subfield of natural language processing concerned with producing natural language output. It is also referred to as text generation.

3. HISTORY OF NLP
The history of NLP generally starts in the 1950s. In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence", which proposed what is now called the Turing test as a criterion of intelligence. Some notably successful natural language systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies, and ELIZA, which was written by Joseph Weizenbaum between 1964 and 1966.

4. LINGUISTICS AND LANGUAGE PROCESSING:
Linguistics is the science of language. Its study includes:
1. Sounds, which refers to phonology
2. Word formation, which refers to morphology
3. Sentence structure, which refers to syntax
4. Meaning, which refers to semantics
5. Understanding, which refers to pragmatics

5. PHASES OF LINGUISTIC ANALYSIS
I. Higher level corresponds to SPEECH RECOGNITION
II. Lower level corresponds to NATURAL LANGUAGE PROCESSING
Copyright © 2013 IJTEEE.
6. LEVELS
Fig 6.1 Speech Recognition
Acoustic signal -> Phones -> Letter strings -> Morphemes -> Words

Fig 6.2 Natural Language Processing
Morphemes -> Words -> Phrases and sentences -> Meaning out of context -> Meaning in context

7. STEPS OF NATURAL LANGUAGE PROCESSING
There are 5 phases involved in natural language processing:
1. Morphological and Lexical Analysis
2. Syntactic Analysis
3. Semantic Analysis
4. Discourse Integration
5. Pragmatic Analysis

7.1 Morphological and Lexical Analysis
The lexicon of a language is its vocabulary, which includes its words and expressions. Morphology is the analysis, identification, and description of the structure of words.

7.1.1 Lexical Analysis
It involves dividing a text into paragraphs, sentences, and words.

7.2 Syntactic Analysis
This involves analysing the words in a sentence to determine the grammatical structure of the sentence. The words are transformed into a structure that shows how the words relate to each other. E.g. "the girl the go to the school" would be rejected by an English syntactic analyser.

7.3 Semantic Analysis
This derives the dictionary meaning, or exact meaning, from context. The structures created by the syntactic analyser are assigned meaning. There is a mapping between the syntactic structures and the objects in the task domain. E.g. "colorless blue idea" would be rejected by the analyser, as "colorless" and "blue" do not make sense together.

7.4 Discourse Integration
The meaning of any single sentence depends upon the sentences that precede it and also influences the meaning of the sentences that follow it. E.g. the word "it" in the sentence "she wanted it" depends upon the prior discourse context.

7.5 Pragmatic Analysis
It means deriving the purposeful use of language in situations, particularly those aspects of language which require world knowledge. The main focus is on reinterpreting what was said in terms of what it actually means. E.g. "close the window?" should be interpreted as a request rather than an order.

8. NATURAL LANGUAGE PROCESSING USING MACHINE LEARNING
Fig. 8.1 NLP Procedure (figure elements: String; NLP application, e.g. voice recognition; Structure Engine; Resultant String)

Many NLP algorithms are based on machine learning, mainly statistical machine learning. The machine-learning paradigm differs from most prior attempts at language processing. Prior implementations of language-processing tasks typically involved the direct hand coding of large sets of rules. The machine-learning paradigm calls instead for using general learning algorithms to automatically learn such rules through the analysis of large collections of typical real-world examples. Many different classes of machine learning algorithms have been applied to NLP tasks. These algorithms take as input a large set of characteristics (features) that are generated from the input data. Some of the earliest algorithms used, such as decision trees, produced systems of hard if-then rules similar to the systems of hand-written rules that were then common. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to each input feature. Systems based on machine-learning algorithms have many advantages over hand-produced rules:

8.1 The learning procedures used during machine learning automatically focus on the most common cases, whereas when writing rules by hand it is often not at all obvious where the effort should be directed.
8.2 Automatic learning procedures can make use of statistical inference algorithms to produce models that are robust to unfamiliar input (e.g. input containing words or structures that have not been seen before).

8.3 Systems based on automatically learning the rules can be made more accurate simply by supplying more input data. However, systems based on hand-written rules can only be made more accurate by increasing the complexity of the rules, which is a much more difficult task.

9. Major tasks in NLP
This section lists some of the tasks that have been researched in NLP.

9.1 Automatic summarization
It produces a readable summary of a body of text. It is used to provide summaries of text of a known type.

9.2 Coreference resolution
Given a sentence or larger body of text, it determines which words refer to the same objects. An example is matching up pronouns with the nouns or names that they refer to.

9.3 Discourse analysis
One task is identifying the discourse structure of connected text, i.e. the nature of the discourse relationships between sentences (e.g. elaboration, explanation, contrast). Another possible task is recognizing and classifying the speech acts in a body of text (e.g. yes-no question, content question, statement, assertion).

9.4 Machine translation
It automatically translates text from one human language to another.

9.4.1 Morphological segmentation
It separates words into individual morphemes and identifies the class of the morphemes. The difficulty of this task depends greatly on the complexity of the morphology (i.e. the structure of words) of the language being considered.

9.5 Named entity recognition (NER)
Given a stream of text, it determines which items in the text refer to proper names, such as people or places, and what the type of each such name is.

9.5.1 Natural language understanding
It converts bodies of text into more formal representations, such as first-order logic structures, that are easier for computer programs to manipulate.

10. Other tasks include
a. Optical character recognition (OCR): Given an image representing printed text, it determines the corresponding text.
b. Part-of-speech tagging: Given a sentence, it determines the part of speech for each word.
c. Parsing: It determines the parse tree (grammatical analysis) of a given sentence.
d. Question answering: Given a human-language question, it determines its answer.

11. Statistical NLP:
Statistical natural language processing uses stochastic, probabilistic, and statistical methods to resolve some of the difficulties of language processing, especially those which arise because longer sentences are highly ambiguous when processed with realistic grammars. Statistical NLP comprises all quantitative approaches to automated language processing.

12. FUTURE OF NLP
Human-level natural language processing is an AI-complete problem. It is equivalent to solving the central artificial intelligence problem of making computers as intelligent as people, so that they can think and solve problems as humans do and also perform activities that humans cannot, more efficiently than humans. NLP's future is therefore closely linked to the growth of artificial intelligence. As natural language understanding improves, computers will be able to learn from the information available online and apply what they have learned in the real world. Combined with natural language generation, computers will become more and more capable of receiving and giving useful information.

13. CONCLUSION
The ability to use natural language for query specification and retrieval has an edge over keyword and keyphrase approaches. We believe that the restricted use of natural language in captions for multimedia data abstraction is a less cumbersome task than full natural language fact abstraction, and feel that we have a system that can be evaluated and built upon, not only for abstracting images but also for other forms of multimedia data (audio, video, text, etc.).
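
As an illustration of the contrast drawn in Section 8 between hand-written rules and statistical models, the following is a minimal sketch in Python; it is not a method taken from this paper. It assumes a toy, made-up training set and a small multinomial Naive Bayes classifier (the names rule_based_is_question, NaiveBayes and TRAINING are hypothetical) that decides whether a sentence is a question or a statement. The hand-written rule makes a hard decision based on a final "?", while the learned model attaches real-valued weights to words and returns a probability, so it can still give a useful answer for input that the rule misses.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up training data (an assumption for illustration only).
TRAINING = [
    ("is the window open", "question"),
    ("can you close the window", "question"),
    ("do you like the school", "question"),
    ("the window is open", "statement"),
    ("the girl goes to the school", "statement"),
    ("she wanted it", "statement"),
]


def rule_based_is_question(sentence):
    """Hand-written hard rule: a question must end with '?'."""
    return sentence.strip().endswith("?")


class NaiveBayes:
    """Small multinomial Naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def probabilities(self, text):
        """Return a dict mapping each label to P(label | text)."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n_examples in self.label_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(n_examples / total)  # log prior
            for word in text.lower().split():
                count = self.word_counts[label][word]  # 0 for unseen words
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            log_scores[label] = score
        # Normalise the log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        unnormalised = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(unnormalised.values())
        return {k: v / z for k, v in unnormalised.items()}


model = NaiveBayes()
model.train(TRAINING)

sentence = "can you open the window"  # a question typed without '?'
print("rule says question:", rule_based_is_question(sentence))
print("model probabilities:", model.probabilities(sentence))
```

With only six training sentences the probabilities are illustrative only. In line with point 8.3, such a model would be made more accurate by supplying more examples, whereas the rule would need additional hand-written cases.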