NATURAL LANGUAGE PROCESSING
(An Implementation in Querying Database)
ABSTRACT
Natural language processing (NLP) is a subfield of artificial intelligence
and linguistics. It studies the problems of automated generation and
understanding of natural human languages. Natural language generation
systems convert information from computer databases into normal-sounding
human language, and natural language understanding systems convert
samples of human language into more formal representations that are easier
for computer programs to manipulate.
Natural Language Processing is the artificial intelligence concept in which
machines understand natural languages like English, Korean, French,
Telugu, Hindi, etc. We are going to develop a tool that will take
database queries in the form of natural language, process them and
give the result. The tool includes several subcomponents: the Language Analyzer,
the Query Builder and the Viewer. The system will first parse the query in natural
language and find the major parts in the string. It first looks for the
table name, then parses the string for the WHERE clause and then for
the ORDER BY clause. After parsing, it constructs the query string based on
the data available. The generated SQL query is posted to the database to
fetch the results.
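The parsing order described above (table name first, then the WHERE clause, then the ORDER BY clause) can be illustrated with a small Java sketch. This is not the project's Query Builder. The table and column vocabularies, the recognized phrasings ("<column> is <value>" and "sorted by <column>"), and the class name NaturalQueryBuilder are all hypothetical. The sketch targets a current JDK, since Set.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical Query Builder sketch following the order in the text:
// find the table name, then WHERE conditions, then ORDER BY.
class NaturalQueryBuilder {
    // Assumed vocabularies; a real system would read them from the schema.
    private static final Set<String> TABLES = Set.of("employee", "department");
    private static final Set<String> COLUMNS = Set.of("name", "salary", "city", "age");
    private static final Set<String> SORT_CUES = Set.of("sorted", "ordered", "order");

    static String build(String sentence) {
        // Keep letters and digits only, so values can never contain quotes.
        String[] tokens = sentence.replaceAll("[^A-Za-z0-9]+", " ").trim().split(" ");
        String[] words = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            words[i] = tokens[i].toLowerCase(Locale.ROOT);
        }

        // Step 1: the first word naming a known table (a trailing "s" is stripped).
        String table = null;
        for (String w : words) {
            String singular = w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
            if (TABLES.contains(w)) { table = w; break; }
            if (TABLES.contains(singular)) { table = singular; break; }
        }
        if (table == null) {
            throw new IllegalArgumentException("No table name found in: " + sentence);
        }

        // Step 2: "<column> is <value>" becomes a WHERE condition (value keeps its case).
        List<String> conditions = new ArrayList<>();
        for (int i = 0; i + 2 < words.length; i++) {
            if (COLUMNS.contains(words[i]) && words[i + 1].equals("is")) {
                conditions.add(words[i] + " = '" + tokens[i + 2] + "'");
            }
        }

        // Step 3: "sorted/ordered/order by <column>" becomes ORDER BY.
        String orderBy = null;
        for (int i = 1; i + 1 < words.length; i++) {
            if (words[i].equals("by") && SORT_CUES.contains(words[i - 1])
                    && COLUMNS.contains(words[i + 1])) {
                orderBy = words[i + 1];
            }
        }

        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        if (orderBy != null) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("Show employees whose city is Delhi sorted by salary"));
    }
}
```

Running main prints SELECT * FROM employee WHERE city = 'Delhi' ORDER BY salary. Values are restricted to letters and digits here. A fuller version would bind them as parameters of a java.sql.PreparedStatement rather than concatenating them into the string.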
INTRODUCTION
Human understanding of language requires background or common-sense
knowledge of the world. Human consciousness is tightly coupled with
both language and our internal models of the outer world. Indeed, many
argue that it is our consciousness that creates our own world (i.e., we create
the worlds that we live in). It makes little sense to assume that the real world
is static and is not affected by conscious entities living in that world. So, in
trying to understand life and consciousness, it is important to understand the
context of experiences in the world. Children at play often make up new
words spontaneously that, for the children involved, have real meaning in the
context of their lives. There are two basic approaches, depending on whether
we want to write an effective "natural language front end" to a software
system or we are motivated to do fundamental research on minds and
consciousness by building a system that acquires structure and intelligence
through its interaction with its environment.
Parsing approaches include finite state machines that recognize word sequences
as syntactically valid sentences, and Conceptual Dependency parsers that stress
semantics rather than syntax. The system uses an ATN-based parser built on the
WordNet lexicon. ATN (Augmented Transition Network) parsers are finite state
machines, augmented with recursion and registers, that recognize word sequences
as specific words, noun phrases, verb phrases, etc. Context-free approaches
to NLP face the following difficulties:
Dealing with different sentence structures that have the same meaning.
Handling number agreement between subjects and verbs.
Determining the deep structure of input texts.
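The ATN idea can be illustrated with the simplest piece of such a network: a finite state recognizer for noun phrases. The sketch below assumes the words have already been tagged. It accepts the pattern DET? ADJ* NOUN+, meaning an optional determiner, any number of adjectives, and then one or more nouns. The tag and state names are illustrative. A real ATN would add registers and recursive calls into other subnetworks, such as one for verb phrases.

```java
import java.util.List;

// Illustrative finite state machine for one ATN-style subnetwork.
// It accepts tag sequences of the form DET? ADJ* NOUN+.
class NounPhraseFsm {
    enum Tag { DET, ADJ, NOUN, VERB, OTHER }

    private enum State { START, MODIFIERS, ACCEPT, REJECT }

    static boolean isNounPhrase(List<Tag> tags) {
        State state = State.START;
        for (Tag tag : tags) {
            switch (state) {
                case START:
                    // An optional determiner or the first adjective opens the phrase.
                    if (tag == Tag.DET || tag == Tag.ADJ) state = State.MODIFIERS;
                    else if (tag == Tag.NOUN) state = State.ACCEPT;
                    else state = State.REJECT;
                    break;
                case MODIFIERS:
                    // Any number of adjectives, then the head noun.
                    if (tag == Tag.ADJ) state = State.MODIFIERS;
                    else if (tag == Tag.NOUN) state = State.ACCEPT;
                    else state = State.REJECT;
                    break;
                case ACCEPT:
                    // Further nouns form a compound ("employee table"); anything else fails.
                    state = (tag == Tag.NOUN) ? State.ACCEPT : State.REJECT;
                    break;
                default:
                    // REJECT is a trap state: once entered, it is never left.
                    break;
            }
        }
        return state == State.ACCEPT;
    }

    public static void main(String[] args) {
        System.out.println(isNounPhrase(List.of(Tag.DET, Tag.ADJ, Tag.NOUN))); // true
        System.out.println(isNounPhrase(List.of(Tag.DET, Tag.VERB)));          // false
    }
}
```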
The term morphological tags refers to labeling words with part-of-speech
tags. Some examples are as follows:
Noun – cat, dog, boy, etc.
Pronouns – he, she, it
o Relative pronouns – which, who, that
Verb – run, throw, see, etc.
Determiners
o Articles – a, an, the
o Possessives – my, your, theirs, etc.
o Demonstratives – this, that, these, those
o Numbers
Adjectives – big, small, purple, etc.
Adverbs
o Describe how something is done – fast, well, etc.
o Time – after, soon, etc.
o Questioning – how, why, when, where
o Place – down, up, here, etc.
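A first approximation to tagging is a plain dictionary lookup. The hypothetical LexiconTagger below is seeded only with the example words from the list above, so its tag names and coverage are illustrative. It also exposes the ambiguity problem discussed next. The list gives "that" both as a relative pronoun and as a demonstrative, so a lookup can only return a set of candidate tags.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical dictionary-lookup tagger seeded with the example words above.
// A word may receive several candidate tags (kept sorted in a TreeSet).
class LexiconTagger {
    private static final Map<String, Set<String>> LEXICON = new HashMap<>();

    private static void add(String tag, String... words) {
        for (String w : words) {
            LEXICON.computeIfAbsent(w, k -> new TreeSet<>()).add(tag);
        }
    }

    static {
        add("NOUN", "cat", "dog", "boy");
        add("PRONOUN", "he", "she", "it");
        add("REL_PRONOUN", "which", "who", "that");
        add("VERB", "run", "throw", "see");
        add("ARTICLE", "a", "an", "the");
        add("POSSESSIVE", "my", "your", "theirs");
        add("DEMONSTRATIVE", "this", "that", "these", "those");
        add("ADJECTIVE", "big", "small", "purple");
        add("ADVERB_MANNER", "fast", "well");
        add("ADVERB_TIME", "after", "soon");
        add("ADVERB_QUESTION", "how", "why", "when", "where");
        add("ADVERB_PLACE", "down", "up", "here");
    }

    static Set<String> tag(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // Numbers (a kind of determiner in the list) are recognized by their digits.
        if (w.matches("\\d+")) {
            return Set.of("NUMBER");
        }
        Set<String> tags = LEXICON.get(w);
        return tags == null ? Set.of("UNKNOWN") : Collections.unmodifiableSet(tags);
    }

    public static void main(String[] args) {
        System.out.println(tag("that")); // [DEMONSTRATIVE, REL_PRONOUN]
        System.out.println(tag("dog"));  // [NOUN]
    }
}
```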
In general, accurately assigning morphological tags to input text
is a difficult problem. Hidden Markov Model and Bayesian techniques are used
for assigning word types. English grammar is complex. The important steps in
building NLP technology into your own programs are:
Reduce the domain of discourse to a minimum.
Create a set of "use cases" to focus your effort in designing and
writing ATNs, and to use for testing your NLP system during
development.
When possible, capture text input from real users of your system,
and incrementally build up a set of use cases that your system can
handle correctly.
Map identified words and parts of speech to actions that the system
should perform.
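The Hidden Markov Model approach mentioned earlier resolves tag ambiguity by choosing the most probable tag sequence for the whole sentence, not for each word in isolation. The Viterbi sketch below uses a three-tag model. Its start, transition and emission probabilities are invented purely for illustration and are not estimated from any corpus. The example shows "runs" being tagged as a verb after a noun but as a noun after a determiner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Viterbi decoding over a toy HMM. All probabilities are made up for illustration.
class HmmTagger {
    static final String[] TAGS = {"DET", "NOUN", "VERB"};
    static final double[] START = {0.6, 0.3, 0.1};
    static final double[][] TRANS = {
        {0.05, 0.90, 0.05},  // from DET
        {0.10, 0.30, 0.60},  // from NOUN
        {0.50, 0.30, 0.20}   // from VERB
    };
    static final Map<String, double[]> EMIT = new HashMap<>();
    static {
        EMIT.put("the",  new double[]{0.7, 0.0, 0.0});
        EMIT.put("a",    new double[]{0.3, 0.0, 0.0});
        EMIT.put("dog",  new double[]{0.0, 0.5, 0.0});
        EMIT.put("cat",  new double[]{0.0, 0.3, 0.0});
        EMIT.put("runs", new double[]{0.0, 0.2, 0.6});
        EMIT.put("sees", new double[]{0.0, 0.0, 0.4});
    }
    private static final double FLOOR = 1e-9;  // avoids log(0)

    private static double logEmit(String word, int tag) {
        double[] p = EMIT.get(word.toLowerCase(Locale.ROOT));
        double prob = (p == null) ? 1e-6 : p[tag];  // unknown words: small uniform score
        return Math.log(Math.max(prob, FLOOR));
    }

    static List<String> tag(List<String> words) {
        int n = words.size(), k = TAGS.length;
        if (n == 0) return new ArrayList<>();
        double[][] score = new double[n][k];  // best log-probability ending in tag s at t
        int[][] back = new int[n][k];         // best previous tag for that cell
        for (int s = 0; s < k; s++) {
            score[0][s] = Math.log(START[s]) + logEmit(words.get(0), s);
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int prev = 0; prev < k; prev++) {
                    double cand = score[t - 1][prev] + Math.log(TRANS[prev][s]);
                    if (cand > best) { best = cand; arg = prev; }
                }
                score[t][s] = best + logEmit(words.get(t), s);
                back[t][s] = arg;
            }
        }
        // Pick the best final tag, then follow back-pointers to recover the path.
        int last = 0;
        for (int s = 1; s < k; s++) if (score[n - 1][s] > score[n - 1][last]) last = s;
        String[] path = new String[n];
        for (int t = n - 1; t >= 0; t--) {
            path[t] = TAGS[last];
            last = back[t][last];
        }
        return Arrays.asList(path);
    }

    public static void main(String[] args) {
        System.out.println(tag(Arrays.asList("the", "dog", "runs"))); // [DET, NOUN, VERB]
        System.out.println(tag(Arrays.asList("the", "runs")));        // [DET, NOUN]
    }
}
```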
Lexicon data is used to indicate the types of many of the words. We will
use the WordNet lexical database to build a lexicon.
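The document does not say how WordNet is accessed. One possibility, sketched here as an assumption, is to read WordNet's plain-text index files (index.noun, index.verb, index.adj and index.adv in its dict directory). In these files, each entry line starts with the lemma followed by a one-letter part-of-speech code, and the license header lines begin with spaces. The class name WordNetLexicon, the tag strings, and the command-line usage are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical lexicon built from WordNet index files: word -> parts of speech.
class WordNetLexicon {
    private final Map<String, Set<String>> entries = new HashMap<>();

    void addIndexLine(String line) {
        // License/header lines in the index files start with spaces.
        if (line.isEmpty() || line.startsWith(" ")) return;
        String[] fields = line.split(" ");
        if (fields.length < 2) return;
        // Collocations such as "ice_cream" use underscores in place of spaces.
        String lemma = fields[0].replace('_', ' ').toLowerCase(Locale.ROOT);
        String tag;
        switch (fields[1]) {
            case "n": tag = "NOUN"; break;
            case "v": tag = "VERB"; break;
            case "a": case "s": tag = "ADJECTIVE"; break;
            case "r": tag = "ADVERB"; break;
            default: return;  // unexpected code: ignore the line
        }
        entries.computeIfAbsent(lemma, k -> new TreeSet<>()).add(tag);
    }

    void load(Path wordNetDictDir) throws IOException {
        for (String name : new String[]{"index.noun", "index.verb", "index.adj", "index.adv"}) {
            for (String line : Files.readAllLines(wordNetDictDir.resolve(name),
                                                  StandardCharsets.ISO_8859_1)) {
                addIndexLine(line);
            }
        }
    }

    Set<String> partsOfSpeech(String word) {
        return entries.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java WordNetLexicon /path/to/WordNet/dict run
        WordNetLexicon lexicon = new WordNetLexicon();
        lexicon.load(Paths.get(args[0]));
        System.out.println(lexicon.partsOfSpeech(args[1]));
    }
}
```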
REQUIREMENTS ANALYSIS DOCUMENT
Introduction
a) Purpose of the System
The main purpose of the system is to design and develop a system
that can understand natural languages like English and convert
them into database queries. The queries are executed in the
DBMS and the response is given in natural language.
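Executing the generated query in the DBMS and phrasing the answer in natural language might look roughly like the sketch below. It uses only standard JDBC calls from java.sql. The sentence template in describe, the class name QueryRunner and the command-line arguments are assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: run generated SQL, then describe the rows in English.
class QueryRunner {
    // Executes an SQL string and returns each row as an ordered column -> value map.
    static List<Map<String, String>> run(Connection conn, String sql) throws SQLException {
        List<Map<String, String>> rows = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next()) {
                Map<String, String> row = new LinkedHashMap<>();
                for (int c = 1; c <= meta.getColumnCount(); c++) {
                    row.put(meta.getColumnLabel(c).toLowerCase(Locale.ROOT), rs.getString(c));
                }
                rows.add(row);
            }
        }
        return rows;
    }

    // Turns result rows into simple English sentences (assumed template).
    static String describe(String table, List<Map<String, String>> rows) {
        if (rows.isEmpty()) return "No " + table + " records match your question.";
        StringBuilder sb = new StringBuilder("Found " + rows.size() + " " + table + " record(s).");
        int i = 1;
        for (Map<String, String> row : rows) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<String, String> e : row.entrySet()) {
                parts.add(e.getKey() + " is " + e.getValue());
            }
            sb.append(" Record ").append(i++).append(": ")
              .append(String.join(", ", parts)).append('.');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws SQLException {
        // Usage (hypothetical): java QueryRunner <jdbc-url> <user> <password> <table> <sql>
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            System.out.println(describe(args[3], run(conn, args[4])));
        }
    }
}
```

With the Oracle JDBC driver on the classpath, the first argument would be a thin-driver URL of the form jdbc:oracle:thin:@//localhost:1521/ORCL. The host, port and service name in that URL are placeholders.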
b) Scope of the System
The scope of the system includes developing a system that can
understand natural language, built as a natural language processor
using artificial intelligence concepts.
c) Objective and Success Criteria of the Project
The main objective of the project is to design and implement an ATN
parser in Java, to create a database interface, to create a natural language
engine, to create a smart Date and to create a help file.
d) Definitions, acronyms and abbreviations
Current System
In the current system, queries are written in high-level languages like SQL.
The person using the system must learn SQL and write the
queries in it.
Proposed System
Overview:
The proposed system is an intelligent system which will understand
natural language and convert a natural language query into an SQL
query. The system uses English parts of speech: it divides the input and
identifies the nouns, verbs and conjunctions. The SQL query is executed in
the Oracle database. The results are again shown in natural language.
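Conjunctions matter when building the query because they decide how conditions combine. A minimal sketch follows, assuming every condition is phrased as "<column> is <value>". It splits the phrase at "and"/"or" and maps those words to SQL AND/OR. The class name ConjunctionSplitter is hypothetical. Phrases that mix both conjunctions rely on SQL's own precedence, in which AND binds more tightly than OR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: map "and"/"or" between simple conditions to SQL AND/OR.
class ConjunctionSplitter {
    static String toWhereClause(String phrase) {
        String[] tokens = phrase.trim().split("\\s+");
        StringBuilder sql = new StringBuilder();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (lower.equals("and") || lower.equals("or")) {
                // A conjunction closes the current condition.
                sql.append(toPredicate(current)).append(' ')
                   .append(lower.toUpperCase(Locale.ROOT)).append(' ');
                current.clear();
            } else {
                current.add(token);
            }
        }
        sql.append(toPredicate(current));
        return sql.toString();
    }

    // Expects exactly "<column> is <value>"; anything else is rejected.
    private static String toPredicate(List<String> words) {
        if (words.size() != 3 || !words.get(1).equalsIgnoreCase("is")) {
            throw new IllegalArgumentException("Unsupported condition: " + String.join(" ", words));
        }
        String value = words.get(2).replace("'", "''");  // escape single quotes for SQL
        return words.get(0).toLowerCase(Locale.ROOT) + " = '" + value + "'";
    }

    public static void main(String[] args) {
        System.out.println(toWhereClause("city is Delhi and age is 30"));
        // city = 'Delhi' AND age = '30'
    }
}
```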
Functional Requirements:
The major functional requirements of the system are as follows:
1. To create a natural language processor.
2. To create a DB interface to connect to the database.
3. To implement a natural language engine which consists of search
techniques for the words.
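The requirements do not specify which search techniques the engine uses. One common choice, shown purely as an illustration, is to match each input word against the known table and column names by Levenshtein (edit) distance, so that small misspellings such as "salry" still resolve to "salary". The vocabulary, the distance threshold and the class name WordMatcher are assumptions.

```java
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical word search: nearest known name by Levenshtein distance.
class WordMatcher {
    // Classic dynamic-programming edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;  // prev now holds row i
        }
        return prev[b.length()];
    }

    // Returns the closest vocabulary entry within maxDistance (first one wins ties).
    static Optional<String> closest(String word, Collection<String> vocabulary, int maxDistance) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        String w = word.toLowerCase(Locale.ROOT);
        for (String candidate : vocabulary) {
            int d = editDistance(w, candidate.toLowerCase(Locale.ROOT));
            if (d < bestDist) { bestDist = d; best = candidate; }
        }
        return (best != null && bestDist <= maxDistance) ? Optional.of(best) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(closest("salry", List.of("name", "salary", "city"), 2)); // Optional[salary]
    }
}
```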
Non-Functional Requirements:
The major non-functional requirements of the system are as follows:
1. The queries from the client.
2. The data in the database.
1. Usability
The system is designed as a completely automated process,
hence there is little or no user intervention.
2. Reliability
The system inherits reliability from the chosen platform, Java,
whose type checking, exception handling and automatic memory
management help prevent common classes of programming errors.
3. Performance
The system is designed to perform well through careful
optimization. It relies on Java's automatic garbage collection for memory management.
4. Supportability
The system is designed to be cross-platform.
It is supported on a wide range of hardware and on any
software platform that has a JVM installed.
5. Implementation
The system is implemented using the platform-independent, lightweight
Java Foundation Classes component set called Java Swing. Core Java
classes are used to implement the AI concepts.
6. Interface
The user interface is completely based on Swing
components.
7. Packaging
The entire application is packaged into a single package
named nlp.
8. Legal
User permissions for the code in this project are issued under
the GPL (GNU General Public License).
Hardware & Software Mapping
Hardware Requirements
CPU : Intel Pentium 4 Processor
RAM : 512 MB
HDD : 80 GB
Network : NIC Card Required
Software Requirements
Programming Language : Java (Version JDK 1.5)
Database Backend : Oracle 10g Release 2
Technologies : Servlets, JSP
Scripting Language : JavaScript
Operating System : Windows XP Professional With
Service