Controlled Vocabulary

Uploaded by

molydey61
Copyright
© © All Rights Reserved

7
Vocabulary control

Introduction


Vocabulary control is one of the most important components of an information retrieval system. As we have noted from its simple model given in Chapter 1, an information retrieval system tries to match user queries with the stored documents (document surrogates) and retrieves those that match. In order to match the contents of the user requirements (the search terms) with the contents of the stored documents (the document records), one must follow a vocabulary that is common to both. In other words, user requirements need to be translated and put to the retrieval system in the same language (using the same terms, for example) as was used to express the contents of the document records. This leads us to the concept of using a standard or controlled vocabulary in an information retrieval environment.
Davis and Rush1 define the term 'vocabulary control' in a simple way that can be reproduced as follows. Indexing may be thought of as a process of labelling items for future reference. Considerable order can be introduced into the process by standardizing the terms that are to be used as labels. This standardization is known as vocabulary control, the systematic selection of preferred terms.
According to Lancaster2 the process of subject indexing involves two quite distinct intellectual steps: the 'conceptual analysis' of the documents and 'translation' of the conceptual analysis into a particular vocabulary. The second step in any information retrieval environment involves a 'controlled vocabulary', that is a limited set of terms that must be used to represent the subject matter of documents. Similarly, the process of preparing the search strategy also involves two stages: conceptual analysis and translation. The first step involves an analysis of the request (submitted by the user) to determine what the user is really looking for, and the second step involves translation of the conceptual analysis to the vocabulary of the system.
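The translation step on the search side can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the lead-in table, the document records and the function names are hypothetical and are not taken from any particular retrieval system.

```python
# Hypothetical lead-in vocabulary: non-preferred term -> preferred term.
LEAD_INS = {
    "apartments": "flats",
    "pvc": "polyvinyl chloride",
}

# Hypothetical document records, each indexed with preferred terms only.
RECORDS = {
    "doc1": {"flats", "housing"},
    "doc2": {"polyvinyl chloride", "pipes"},
    "doc3": {"housing", "pipes"},
}


def translate(term):
    """Translation step: map a user's term onto the system vocabulary."""
    key = term.strip().lower()
    return LEAD_INS.get(key, key)


def search(query_terms):
    """Return the ids of records indexed with every translated query term."""
    wanted = {translate(t) for t in query_terms}
    return sorted(doc for doc, terms in RECORDS.items() if wanted <= terms)
```

Because the indexer and the searcher both pass through the same vocabulary, a request phrased with 'Apartments' still reaches the record that was indexed under 'flats'.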
A number of vocabulary control tools have been designed over the years: they differ in their structure and design features, but they all have the same purpose in an information retrieval environment. Availability of vocabulary control helps both the indexers - people who are engaged in creating document records, particularly those who create subject representation for the documents (by using keywords in a post-coordinate system, for example) - as well as the end-users in the formulation of their search expressions. A large number of software packages are now available that allow the record creator to automatically switch to one or more chosen online vocabulary control tools in order to select appropriate terms for representing the document in hand. This helps in a number of ways - the document records do not only contain a number of terms that are representative of the contents of the document, but these are also standardized (in terms of their usage, spelling, form, and so on) and are likely to be chosen by the user for searching purposes. Similarly, there are programs available by which end-users may go to the appropriate page of a particular online vocabulary control tool in order to choose the most appropriate term(s) for preparing the search expression. Vocabulary control tools also help end-users modify their previously formulated search expressions by either widening or narrowing down the search expressions.

156 INTRODUCTION TO MODERN INFORMATION RETRIEVAL

Lancaster2 identifies two major objectives of vocabulary control in an information retrieval environment:

► to promote the consistent representation of subject matter by indexers and searchers, thereby avoiding the dispersion of related materials. This is achieved through the control (merging) of synonymous and nearly synonymous expressions and by distinguishing among homographs
► to facilitate the conduct of a comprehensive search on some topic by linking together terms whose meanings are related paradigmatically or syntagmatically.

Lancaster adds that indexing tends to be more consistent when the vocabulary used is controlled, because indexers are more likely to agree on the terms needed to describe a particular topic if they are selected from a pre-established list than when they are given a free hand to use any terms that they wish. Similarly, from the searcher's point of view, it is easier to identify the terms appropriate to information needs if these terms must be selected from a definitive list. Thus controlled vocabulary tends to match the language of indexers and searchers.
A large number of documents have appeared covering the details of various vocabulary control tools. There are also standards such as BS 5723,3 BS 6723,4 ISO 2788,5 ISO 5964 6 and UNISIST guidelines.7,8 In this chapter, we shall try to understand what a vocabulary control tool is and how it controls the vocabulary in an information retrieval environment. We shall also learn about the various vocabulary control tools, their characteristic features, mechanisms of development, and so on. Finally we shall look into the creation, maintenance and usage aspects of online vocabulary control devices.

Controlled vs natural indexing

Controlled indexing languages are those in which both the terms that are used to represent subjects and the process whereby terms are assigned to particular documents are controlled or executed by a person.9 Normally there is a list of terms, a subject headings list or a thesaurus, that acts as the authority list in identifying the terms that may be assigned to documents, and indexing involves the assignation of terms from this list to specific documents. The searcher is expected to use the same controlled list during formulation of a search strategy. In natural language indexing, any term that appears in the title, abstract or text of a document record may be an index term. There is no mechanism to control the use of terms for indexing. Similarly, the searcher is not expected to use any controlled list of terms.
Svenonius divides the debate concerning natural and controlled vocabulary into three eras, which Rowley modified into four, as shown in Table 7.1.9

Table 7.1 The four eras of debate on controlled vs. natural language indexing
Era 1  Controlled vocabulary.
Era 2  Comparisons of natural and controlled language: major experimental studies noted that natural language can perform as well as controlled vocabulary, but other factors, such as the number of access points, are also significant.
Era 3  Many case studies of limited generalizability. Searching online databases was considered. It was noted that the best performance can be achieved by a combination of controlled and natural language; the number of access points was reaffirmed to have a significant effect; full-text and bibliographic databases were noted to have produced different results.
Era 4  New advances in user-based systems including OPACs. The value of controlled vocabulary in the context of user-friendly interfaces and the development of knowledge bases were noted.

Aitchison and Gilchrist provide a comparison of controlled and natural language indexing, which is shown in Table 7.2 on the next page.11 Rowley mentions that despite much debate extending over more than a century, together with a range of research projects, information scientists have failed to resolve the issue concerning the relative merits and demerits of controlled and natural language.9 However, practice and tested research have suggested that controlled language and natural language should be used in conjunction with one another.

Vocabulary control tools

As the name suggests, these are the tools used to control the vocabulary of indexing and retrieval. These are natural language tools, meaning that these tools contain natural language terms that can be used for indexing and retrieval purposes. What an indexer and an index user need is a set of guidelines for the proper selection of terms. Syndetic structures are devices that provide these guidelines by showing the relationships among terms or concepts, and they fall into two major categories: classification schemes, and subject heading lists. Classification schemes, being tools for organizing knowledge, could be of great help for vocabulary control, but the vocabulary of classification schemes is an artificial language, whereas for vocabulary control we need a natural language representation. Indexes to classification schemes could serve the purpose of vocabulary control, but here terms appear alphabetically and thus the logical (semantic) organization of knowledge is not available. Some attempts have been made to
l ""
combine the features of the main arrangement in classification schemes with those that appear in the index to the classification scheme to generate some kind of faceted or classified thesaurus (see below for discussion).

Table 7.2 Comparison between controlled and natural language
Natural language | Controlled language
High specificity gives precision. Excels in retrieving 'individual' terms - names of persons, organizations, etc. | Lack of specificity, even in detailed systems.
Exhaustivity gives potential for high recall (does not apply to title-only databases). [...] useful in multilingual systems. | Cost of indexing to the level of natural language is prohibitive. Also terms may be omitted in error by indexers.
Up to date. New terms are immediately available. | Not immediately up to date. Time lag while terms are added to the thesaurus.
Words of author used - no misinterpretation by indexer. | Words of author liable to be misconstrued. Errors in indexing terms can cause losses.
Natural language words used by indexer as well as the searcher. | Artificial language has to be learnt by the searcher.
Low input costs. | High input costs.
Easier exchange of material between databases - language incompatibility removed. | Incompatibility a barrier to easy exchange.
Intellectual effort placed on searcher. Problems arise with terms having many synonyms and near-synonyms. | Eases the burden of searching: controls synonyms and near-synonyms and leads to specific preferred terms to broaden search; qualifies homographs; provides scope notes; displays broader, narrower and related terms; expresses concepts elusive in free text.
Syntax problems. Danger of false drops through incorrect term association. | Overcomes syntax problems with compound terms and other devices.
Exhaustivity may lead to loss of precision. An asset in numerical databases and multilingual systems. | At normal levels of indexing, avoids precision loss through over-exhaustivity (retrieval of minor concepts of peripheral interest).

Both subject heading lists and thesauri contain alphabetically arranged terms with necessary cross-references and notes that can be used for indexing or searching in an information retrieval environment. However, there is a major difference. Subject heading lists were initially developed to prepare entries/headings in a subject catalogue that could duplicate the classified arrangement of document records. Therefore, they include rather broader subject terms or headings. On the other hand, thesauri have been developed on specific subject fields with a view to bringing together the different representations of terms (synonyms, spelling variants, homonyms, and so on) along with an indication of a mapping of that term in the universe of knowledge system by indicating the broader (superordinate), narrower (subordinate) and related (coordinate and collateral) terms. However, this distinction has gradually faded and the latest Library of Congress subject headings list indicates the terms' features as shown in normal thesauri.
The following sections discuss different kinds of vocabulary control tools, including subject headings lists, thesauri, the thesaurofacet and the classaurus. However, emphasis will be given to the concept, development, use, and so on, of the thesaurus.

Subject headings lists

A subject heading list is an alphabetical list of terms and phrases, with appropriate cross-references and notes, which can be used as a source of headings in order to represent the subject content of an information resource. Although it is primarily arranged alphabetically by term, under each term or phrase we can find a list of other terms or phrases that are semantically related to the term or phrase. Subject heading lists were designed to complement bibliographic classification in the sense that although a bibliographic classification scheme helps us to assign a class number (built of notations) to an information resource that represents its subject content, a subject heading list allows us to assign an appropriate heading, as a term or a phrase, to an information resource that represents its subject content. A list of subject headings, or a subject index as it is often called, can be used to search or browse a collection of information resources. Subject heading lists help us produce a pre-coordinated index of a collection.
Library of Congress Subject Headings (LCSH) is an example of a subject heading list; it is used widely as a controlled vocabulary for catalogues and bibliographies. It was originally designed as a controlled vocabulary for representing the subject and form of the books and serials in the Library of Congress collection, with the objective of providing subject access points to the bibliographic records contained in the library's catalogues. It is now most widely used for assigning subject headings to bibliographic information resources.
LCSH is the most extensive list of subject headings, and is used widely throughout the English-speaking world. Sears' List of Subject Headings (2004) is a smaller work designed for small to medium-sized libraries.12 LCSH contains the entry vocabulary of the Library of Congress catalogues. It is available in various formats including hard copy, CD-ROM and web. It is now in its 29th edition, containing over 280,000 headings and references (www.loc.gov/aba/cataloging/subject). LCSH is the most widely used tool for assigning subject headings to manual and machine-readable catalogues.
In specifying the prescribed headings, and also deciding which headings are not to be used, LCSH has a number of policies, the fundamental ones being user needs, literary warrant, use of uniform and unique headings, provision of direct access to specific subjects, stability and consistency.

The approved subject headings in LCSH are set in bold face, while those in the entry vocabulary only, for example, synonyms, appear in normal type face. Each entry may be accompanied by all or some of the following:

► a scope note showing how the term may be used
► a list of headings to which see also references may be made
► a list of headings from which see references may be made
► a list of headings from which see also references may be made.

Figure 7.1 shows an example of a typical entry in LCSH. Each preferred term (appearing in bold face) is followed by an LCC class number. There may also be a scope note, as appears under Computer software, which delineates the scope of the term/phrase.

Computer software
    [QA76.755]
    Here are entered general works on computer programs along with documentation such as manuals, diagrams and operating instructions, etc.
    UF  Software, Computer
    RT  Computer software industry
        Computers
    SA  subdivisions Software and Juvenile software under subjects for actual software items
    NT  Application software
        Systems software
  - Accounting
    [HF5681.C57]
  -- Law and legislation
    (May Subd Geog)
  - Catalogs
    UF  Computer programs - Catalogs
  - Development
    [QA76.76.D47]

Figure 7.1 LCSH Headings

UF (used for) denotes the non-preferred headings for the given term or phrase, and BT, NT and RT denote broader terms, narrower terms and related terms, respectively. SA (see also) provides hints as to where related materials may be found. Some headings can be further subdivided geographically and this is indicated by the phrase 'May Subd Geog' immediately after the heading. A preferred heading may be further subdivided to generate an appropriate preferred heading, for example 'Computer software-Accounting', and 'Computer software-Accounting-Law and legislation'.
LCSH provides the reciprocal entries for USE/UF, NT/BT and RT/RT relations. For example, the heading Computer software has Computer programs as one of the NTs. Now if we look at the entry under Computer programs we find the heading Computer software shown as the Broader Term (Figure 7.2).
Figure 7.3 shows an example of the output of a subject search from a typical OPAC. It may be noted that the subject search was conducted on 'digital libraries', and the results page shows the number of records available in the library under the specific heading in LCSH. This allows the user to get an idea of the various subheadings under the sought subject, and thus provides some sort of a map of the collection on the subject.

Computer programs
    Here are entered works limited to computer programs ..
    UF  Computer program files
        Files, Computer program
    BT  Computer files
        Computer software

Figure 7.2 Reciprocal entries in LCSH
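The reciprocity described for USE/UF, NT/BT and RT/RT relations can be sketched in a few lines of code. This is only an illustration under stated assumptions: the dictionary layout and the function name `add_reciprocals` are hypothetical, and the sample entry is a simplified stand-in for a real LCSH record.

```python
# Each relation tag and the tag used for its reciprocal entry.
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}


def add_reciprocals(entries):
    """Return a copy of `entries` in which every reference has its reciprocal.

    `entries` maps a heading to {relation tag: set of target headings}.
    """
    complete = {
        heading: {tag: set(targets) for tag, targets in rels.items()}
        for heading, rels in entries.items()
    }
    for heading, rels in entries.items():
        for tag, targets in rels.items():
            for target in targets:
                # Create the target entry on demand and point it back.
                back = complete.setdefault(target, {})
                back.setdefault(RECIPROCAL[tag], set()).add(heading)
    return complete


# Simplified, hypothetical entry modelled on the Computer software example.
entries = {
    "Computer software": {
        "NT": {"Computer programs"},
        "RT": {"Computers"},
        "UF": {"Software, Computer"},
    },
}
full = add_reciprocals(entries)
```

After the call, the entry for Computer programs carries Computer software as its BT, which is the pairing the text describes for Figure 7.2.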

Sears' List of Subject Headings is smaller in scope than LCSH, but is used extensively, particularly in smaller libraries, for the purpose of assigning subject headings.12 This tool prescribes as a general rule: enter a work under the most specific term - subject heading - that accurately and precisely represents its content. For example, a work on 'Bridges' should be entered under 'Bridges' and not under a broader heading such as 'Engineering'. However, there are a number of rules that have been formulated in this list in order to resolve conflict between alternative headings. A typical entry in the Sears' List would look like the following:

Skis and Skiing
    see also Water Skiing
    x Skiing, Snow
    xx Winter Sports

An entry beginning with 'see also' indicates that reference entries from the general heading are to be made to the specific heading appearing after 'see also'. An entry beginning with 'xx' indicates that see also references are to be made from the specific heading to the general heading appearing after 'xx'. An entry beginning with 'x' indicates that see reference entries from the unused heading appearing after 'x' are to be made to the used heading.
Sears now uses thesaurus-type cross-references (BT, NT, RT, SA, USE and UF), and detailed explanations concerning these relations and the previously used see and see also references appear at the beginning of the list.12 It may be noted that Sears' List only lists some headings, and many more headings can be constructed by the cataloguer following specific rules laid down in the section 'Headings to be Added by the Cataloger'. Detailed discussions on Sears' List of Subject Headings appear in Satija and Haynes.

Thesauri

Thesauri appeared in the late 1950s. They were designed for use with the emerging post-coordinate indexing systems of that time, which needed simple terms with low pre-coordination, which was not provided by existing indexing languages.15 These tools represent a popular method of ordering combinatory documentary languages, and thus a priori relationships between concepts are made explicit. Guinchat and Menou define 'thesauri' as tools consisting of a controlled set of terms linked by hierarchical or associative relations, which mark any needed equivalence relations (synonyms) with terms from the natural language and concentrate on a particular area of knowledge.16 Rowley and Hartley define a thesaurus as a compilation of words and phrases showing synonyms and hierarchical and other relationships and dependencies, the function of which is to provide a standardized vocabulary for information storage and retrieval systems.14
From the two definitions given above we can see that a thesaurus is a tool containing a controlled set of terms arranged alphabetically, and various relationships among the terms are shown in order to facilitate indexing and retrieval.
The major objective of a thesaurus is thus to exert terminology control in indexing, and to aid in searching by alerting the searcher to the index terms that have been applied.
Recognition of the thesaurus as a widely used form of indexing tool came with the first international standard for the construction of monolingual thesauri in 1974.15 Since then the processes of developing and maintaining thesauri have been standardized. There are international (ISO 2788:1986 and ISO 5964:1985), British (BS 5723:1987 and BS 6723:1985) and UNISIST standards (UNISIST Guidelines, 1980, 1981). BS 5723 has now been withdrawn and replaced by a new standard, BS 8723, two parts of which have come out: BS 8723-Part 1 (2005) and BS 8723-Part 2 (2005). BS 8723 has five parts:

BS 8723: Structured vocabularies for information retrieval
Part 1: Definitions, symbols and abbreviations
Part 2: Thesauri
Part 3: Vocabularies other than thesauri
Part 4: Interoperation between multiple vocabularies
Part 5: Interoperation between vocabularies and other components of information storage and retrieval systems

Parts 1 and 2 of BS 8723 broadly correspond to ISO 2788-1986. The fundamental aim of a thesaurus, according to BS 8723, is to guide indexers and searchers to choose the same term for the same concept. There are three major features of a thesaurus - vocabulary control, thesaural relationships and thesaurus display.
In a thesaurus all the concepts (words and phrases) in a given domain are listed. Some of these terms are valid index terms, called preferred terms - they can be used for the purpose of indexing, while others are called non-preferred terms - they cannot be used as valid index terms, and appropriate references are created from non-preferred to preferred terms to guide the indexer and the searcher. Thus the preferred terms are used for indexing and searching, whereas non-preferred terms function as lead-ins to the preferred terms.

Relationships between terms in a thesaurus

According to Aitchison there are two types of relationships in a thesaurus: (1) the macro-level, which is the arrangement of the whole domain of the thesaurus with its subject fields and sub-fields containing sets of hierarchically and associatively related terms, and (2) the inter-term relationships.15 Three general classes of fundamental thesaural relationships have been established:

► the equivalence relationship
► the hierarchical (or whole-part) relationship
► the associative relationship.

Equivalence relationships

The equivalence relationship denotes the relationship between preferred and non-preferred terms in an indexing language. This is denoted by USE (the prefix used for the preferred term) and UF (used for, the prefix used for the non-preferred term). This general relationship covers two kinds of term: synonym and quasi-synonym.
Synonyms are terms whose meaning can be regarded as the same in a wide range of contexts, so that they are virtually interchangeable. There could be several cases of synonymity, for example:

► terms with different linguistic origin, such as polyglot and multilingual
► popular names and scientific names, such as allergy and hypersensitivity
► variant spellings, such as encyclopaedia and encyclopedia
► terms from different cultures, such as flats and apartments
► abbreviations and full names, such as PVC and polyvinyl chloride
► the factored and unfactored forms of a compound term, such as coal mining and coal & mining.

Quasi-synonyms are terms whose meanings are generally regarded as different in ordinary usage, but which are treated as synonyms for indexing purposes, for example, hardness and softness.

Hierarchical relationships

The hierarchical relationship is the basic relationship that distinguishes a systematic thesaurus from other organized lists of terms (subject heading lists). Pairs of terms are represented in their superordinate or subordinate status, the superordinate term representing the whole and the subordinate term representing a member or a part. The superordinate term is represented by BT (broader term), and the subordinate term by NT (narrower term). In a thesaurus a pair of superordinate and subordinate terms are represented reciprocally as follows:

CAPITAL MARKETS
    BT Financial markets

FINANCIAL MARKETS
    NT Capital markets

BS 5723 identifies three relational situations representing hierarchical relationships:

► the generic relationship
► the hierarchical whole-part relationship, and
► the polyhierarchical relationship.

The generic relationship identifies the relationship between a class or category and its member species. This relationship has a 'hierarchical force', that is, whatever is true of a given class is also true of all classes subsumed under it.15
The hierarchical whole-part relationship covers a limited number of classes of terms in which the name of the part implies the name of the whole regardless of the context, so that the terms can be organized as logical hierarchies. There are only four situations when the whole-part relationship may be considered hierarchical: otherwise, it is an associative relationship.15 These four cases are:

1 systems and organs of the body, such as
    GASTROINTESTINAL SYSTEMS
        BT Digestive systems
        NT Intestines
        NT Stomach
2 geographic locations, such as
    INDIA
        BT Asia
        NT West Bengal
        NT Calcutta
3 disciplines or fields of discourse, such as
    CHEMISTRY
        BT Science
        NT Physical chemistry
        NT Thermodynamics
4 hierarchical social structures, such as
    CHURCH OF ENGLAND
        NT Dioceses
        NT Parishes

The polyhierarchical relationship occurs when a concept belongs to more than one category, for example:

PRINTING EQUIPMENT
    NT Computer printers

COMPUTER PERIPHERAL EQUIPMENT
    NT Computer printers

COMPUTER PRINTERS
    BT Computer peripheral equipment
    BT Printing equipment

Associative relationships

An associative relationship denotes the relationship between terms that is neither hierarchical nor equivalence, yet the terms are mentally associated to such an extent that the link between them should be made explicit in the thesaurus, on the grounds that it would
reveal alternative terms that could be used for indexing or retrieval. This relationship is reciprocal and is represented by RT. This relationship is the most difficult one to define and therefore to determine between a pair of terms. BS 5723 provides a general guideline that states that one of the terms should always be implied, according to the common frames of reference shared by the users of an index, whenever the other is used as an indexing term. This standard further identifies two broad categories of terms that can be bound by the associative relationship:

1 terms belonging to the same category: this usually refers to siblings with overlapping meanings, such as 'ships' and 'boats', which can be precisely defined, yet they are sometimes used loosely and almost interchangeably, and
2 terms belonging to different categories: this usually refers to such terms that satisfy the requirement that one of the terms should be implied when the other is used in indexing, for example:
    (a) a discipline or field of study and the objects or phenomena studied, e.g. entomology and insects
    (b) a process or operation and its agent or instrument, e.g. illumination and lamps
    (c) an action and the products of the action, e.g. programming and software
    (d) an action and its patient, e.g. harvesting and crops
    (e) concepts related to their properties, e.g. poisons and toxicity
    (f) concepts related to their origins, e.g. India and Indians
    (g) concepts linked by causal dependence, e.g. diseases and pathogens
    (h) a thing and its counter agent, e.g. insects and insecticides
    (i) syncategorematic phrases and their embedded nouns, e.g. model buses and buses.

Display of terms in a thesaurus

Terms and their relationships in a thesaurus can be displayed in one of the following methods:

► alphabetical display, with scope notes and relationships indicated at each term
► systematic display with an alphabetical index
► graphic display with an alphabetical index.

Alphabetical display

In this form of display all indexing terms, whether preferred or non-preferred, are organized in a single alphabetical sequence. BS 8723-1 (2005) proposes a number of symbols and abbreviations for use in a thesaurus, such as:

► SN: scope note
► DEF: definition
► HN: history note
► USE: indicates that the term following USE is the preferred term
► UF: used for (it indicates that the term following UF is the non-preferred term)
► USE+: the two or more preferred terms following USE+ should be used together to represent the indicated concept
► UF+: the non-preferred term that follows UF+ should be represented by a combination of preferred terms including the preferred term that precedes UF+
► TT: top term
► BT: broader term
► BTG: broader term (generic)
► BTI: broader term (instantial - showing an instance, e.g. capital cities and London)
► BTP: broader term (partitive - showing a whole-part relationship, e.g. nervous system and central nervous system)
► NT: narrower term
► NTG: narrower term (generic)
► NTI: narrower term (instantial)
► NTP: narrower term (partitive)
► RT: related term.

The alphabetical form of thesaurus is easy to organize. However, there is a shortcoming of this form of thesaurus from the user's point of view, as all the broader and narrower terms that constitute a hierarchy in an alphabetical thesaurus cannot be surveyed at a single position. Extra relational information can be added to an alphabetical display, for example, the top term in the hierarchy to which a specific concept belongs. Similarly, the level of subordination and superordination can also be shown using BT1, BT2, NT1, NT2 and so on.
Figure 7.4 shows a typical example of an entry in a thesaurus (Unesco thesaurus) that shows the preferred term (Information Processing, appearing in bold), the reference from a non-preferred term (Information Handling), and narrower terms Cataloguing and Bibliographic Control, at two different levels designated by NT1 and NT2.

Systematic display

A thesaurus that is organized systematically should have two parts:

1 categories or hierarchies of terms arranged according to their meanings and logical relationships, and
2 an alphabetical index that directs the user to the appropriate part of the systematic section.
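The two parts of a systematically organized thesaurus can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the function names and the short notation codes are hypothetical, and the library terms are sample data rather than part of any published thesaurus. The sketch builds a small hierarchy (the systematic section) and derives from it an alphabetical index in which each term points back to its place in the hierarchy.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One term in the systematic section (hypothetical structure)."""
    notation: str                      # assumed code that locates the term
    term: str
    children: List["Node"] = field(default_factory=list)


def systematic_section(node: Node, depth: int = 0) -> Iterator[str]:
    """Yield one indented line per term, broader terms before narrower ones."""
    yield f"{'  ' * depth}{node.notation}  {node.term}"
    for child in node.children:
        yield from systematic_section(child, depth + 1)


def alphabetical_index(root: Node) -> List[Tuple[str, str]]:
    """Return (term, notation) pairs sorted alphabetically by term."""
    pairs = []
    stack = [root]
    while stack:
        current = stack.pop()
        pairs.append((current.term, current.notation))
        stack.extend(current.children)
    return sorted(pairs, key=lambda pair: pair[0].lower())


# Illustrative sample data only.
libraries = Node("L", "Libraries", [
    Node("La", "Academic libraries", [
        Node("Lac", "College libraries"),
        Node("Lah", "University libraries"),
    ]),
    Node("Ld", "Public libraries", [
        Node("Ldc", "City libraries"),
        Node("Ldf", "Rural libraries"),
    ]),
])

if __name__ == "__main__":
    print("\n".join(systematic_section(libraries)))
    for term, notation in alphabetical_index(libraries):
        print(f"{term}  {notation}")
```

A searcher who knows only a term would look it up in the index, read off the code, and then survey the neighbouring broader and narrower terms in the hierarchy.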

The link between the systematic section and the alphabetical index is maintained through notation. The systematic display is helpful both for indexers and searchers, for it gives a bird's-eye view of the topic and puts it into the context of the whole subject field. In a systematic thesaurus, the systematic part is regarded as the main part of the thesaurus (the part that carries the most relational information) and the alphabetical index is regarded as a secondary component.

Information needs
    MT 5.15 Information management
    FR Besoin d'information
    SP Necesidad de informacion
    UF Information demand, Information user needs
    BT1 Information users
    RT Access to information
    RT Information user instruction
    RT Information user studies
    RT Library users

Information officers USE Information scientists

Information processing
    MT 5.35 Documentary information processing
    FR Traitement de l'information
    SP Procesamiento de la informacion
    SN The storage and processing of items of information especially by computer.
    UF Documentary information processing, Information handling, Information storage and retrieval, Information work, Information/library operations
    NT1 Cataloguing
    NT2 Bibliographic control

Figure 7.4 A typical entry from the Unesco thesaurus (www2.ulcc.ac.uk/unesco/terms/list75.htm)

Graphic display
Graphic display shows the indexing terms and their relationships in the form of two-dimensional figures, which are supplemented by alphabetical sections. The advantage of graphic display is that it provides an immediate overall view of the environment of the concepts; a disadvantage is that it does not always show equivalent terms or scope notes, nor does it distinguish between hierarchical and associative relationships: the details are given in the alphabetical section. Moreover, in the printed form, graphic display may be bulky and not always easy to consult.

Thesaurofacet
Thesaurofacet is a specialized kind of retrieval language with both a thesaurus-type and a classification-type display, each containing some terms unique to itself and not found in the other.17 Lancaster and Lancaster et al. describe the features of a thesaurofacet in the following example, a hypothetical thesaurofacet in library science (Figure 7.5).2,18

Faceted display
L    Libraries
  La   Academic libraries
    Lac  College libraries
    Lah  University libraries
  Ld   Public libraries
    Ldc  City libraries
    Ldf  Rural libraries
  Li   Industrial libraries
  Lk   Government libraries

Thesaurus display
City libraries Ldc
    UF Municipal libraries
    RT City government Qp I
Industrial libraries Li
    BT (A) Industrial information services
Municipal libraries
    USE City libraries Ldc

Figure 7.5 Examples from a hypothetical thesaurofacet

The figure shows that the two parts complement one another: while the faceted component takes care of the hierarchical relationship, all the other relationships appear in the thesaurus component. The thesaurus component shows the notation for each term, which acts as the link between the thesaurus display and the faceted display. According to Lancaster2 the obvious advantage of the thesaurofacet is that: 'It can be used for arranging books on the shelves of a special library as well as for indexing the items in a database. Moreover, shelf arrangement and database will be fully compatible.'
In the thesaurofacet, the thesaurus part replaces the alphabetical subject index that is available in a conventional faceted classification scheme. On the other hand, the faceted classification part replaces the usual hierarchical structure of a thesaurus through the BT/NT relationships. Since the development of the first thesaurofacet by Jean Aitchison et al. in 1969, a number of vocabulary control tools have appeared based on the same idea, for example the Unesco Thesaurus (1977) and ROOT thesaurus (1981).

Classaurus
The Classaurus is a vocabulary control tool developed by Bhattacharyya and used in POPSI (the POstulate-based Permuted Subject Indexing system, also developed by Bhattacharyya). It is a category-based (faceted) systematic scheme of hierarchical classification incorporating all the essential features of a conventional information retrieval thesaurus - control of synonyms, quasi-synonyms and antonyms in an extended sense.19 The application of a scheme of this type calls for a complementary alphabetical index giving the address of each term occurring in it.
A classaurus can be designed either before starting the indexing work or along with the indexing work. But in all cases, its designing warrants both a priori and pragmatic approaches. The structure and style of presentation of a classaurus have been illustrated by Bhattacharyya as follows:19

A SYSTEMATIC PART
A1 Common Modifiers
A11 Form Modifiers
A12 Time Modifiers
A13 Environment Modifiers
A2 Inter-subject Relation Modifiers
A3 Disciplines and Subdisciplines
A4 Entities
A41 Part entities
A42 Type entities
A5 Properties
A6 Actions

The arrangement in the systematic part is governed by the following rules:

► Each term in the systematic part under each category is enumerated by displaying its Coordinate-Superordinate-Subordinate-Collateral (COSSCO) relationships in a hierarchy of arrays.
► For each term in the systematic part the following order is followed vertically: definition or scope note, and synonyms, quasi-synonyms and antonyms.
► No non-hierarchically related terms are enumerated for any term in a classaurus because of its category-based (faceted) structure, and because POPSI itself takes the responsibility of revealing this relationship as precisely as possible.
► Each array in the classaurus is open and discontinuous.
► Each term in the systematic part is assigned a unique address, which, if desired, can also be a class number.

The alphabetical index part contains each and every term, including synonyms, quasi-synonyms and antonyms, occurring in the systematic part, along with its address. The address refers to the systematic part where all synonyms, superordinates, subordinates, coordinates, and collaterals of the term concerned are found to occur. Figure 7.6 shows an example of classaurus entries, along with the necessary notes, which has been used by Bhattacharyya himself.19

Field: AGRICULTURE AND RELATED SCIENCE AND TECHNOLOGY

Disciplines and sub-disciplines
SOIL SCIENCE
AGRICULTURE
AGRICULTURE MACHINERY (Study of)
    = Agricultural tool
    = Farm machinery
    = Machinery, Agricultural
    = Machinery, Farm
    = Tool, Agricultural
AGRICULTURAL STRUCTURE (Study of)
FIELD CROP (Culture of)
HORTICULTURE
FRUIT CULTURE
VEGETABLE CULTURE
FLORICULTURE
FORESTRY
ANIMAL HUSBANDRY
VETERINARY MEDICINE
DAIRY TECHNOLOGY
FISHERY AND FISH CULTURE
MOLLUSC CULTURE
CRUSTACEAN CULTURE
INSECT CULTURE
APICULTURE
SERICULTURE
Etc.

Note: The basic consideration for creating disciplines and sub-disciplines is the purpose of bringing all information pertaining to an area (denoted by a concept-term) together.

For discipline: FIELD CROP (Culture of)

Entities
Parts
ROOT
STEM
LEAF
FLOWER
FRUIT
SEED
Etc.
Wholes (types)
CEREAL
RICE
WHEAT
OAT
RYE
CORN
BARLEY
MILLET
ROOT CROP
TUBER CROP
SUGAR CROP
ALKALOIDAL CROP
FIBRE CROP
FORAGE CROP
Etc.

Properties
INJURY
ENVIRONMENTAL INJURY
DISEASE
BACTERIAL DISEASE
FUNGUS DISEASE
VIRAL DISEASE
DAMAGE
DAMAGE (by) PEST
DAMAGE (by) INSECT PEST
DAMAGE (by) PARASITE
DAMAGE (by) WEED
GROWTH
Etc.

Actions
CULTIVATION
TILLAGE
BREEDING
DEVELOPING
HARVESTING
MOWING
REAPING
STACKING
THRESHING
HUSKING
SHELLING
CLEANING
WINNOWING
GRADING
STORING
PACKING
DRY FARMING
DRAINING
IRRIGATION
MANURING
CONTROL
Etc.

Figure 7.6 Sample classaurus entries
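The role of the address in a classaurus can be illustrated with a short Python sketch. It is not Bhattacharyya's own scheme: the dotted address format, the `Entry` class and the function names are assumptions made for illustration, and the placement of the sample terms under AGRICULTURE is likewise assumed. The sketch shows an alphabetical index that sends every term and synonym to one address, and a lookup that reads the superordinate, subordinates and coordinates of a term from that address (collaterals are left out for brevity).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One term in the systematic part (hypothetical structure)."""
    address: str                       # assumed dotted address, e.g. "A3.1.2"
    term: str
    synonyms: List[str] = field(default_factory=list)


def parent_address(address: str) -> Optional[str]:
    """Drop the last dotted component; None if there is no parent level."""
    head, sep, _ = address.rpartition(".")
    return head if sep else None


def build_index(entries: List[Entry]) -> Dict[str, str]:
    """Map every preferred term and synonym (lower-cased) to its address."""
    index = {}
    for entry in entries:
        for name in [entry.term, *entry.synonyms]:
            index[name.lower()] = entry.address
    return index


def relatives(address: str, entries: List[Entry]) -> Dict[str, object]:
    """Collect superordinate, subordinate and coordinate terms for an address."""
    by_address = {e.address: e for e in entries}
    parent = parent_address(address)
    return {
        "superordinate": by_address[parent].term if parent in by_address else None,
        "subordinates": [e.term for e in entries
                         if parent_address(e.address) == address],
        "coordinates": [e.term for e in entries
                        if e.address != address and parent is not None
                        and parent_address(e.address) == parent],
    }


# Sample terms; the addresses and nesting are assumed for illustration.
ENTRIES = [
    Entry("A3.1", "AGRICULTURE"),
    Entry("A3.1.1", "AGRICULTURE MACHINERY (Study of)",
          ["Agricultural tool", "Farm machinery"]),
    Entry("A3.1.2", "HORTICULTURE"),
    Entry("A3.1.2.1", "FRUIT CULTURE"),
    Entry("A3.1.2.2", "VEGETABLE CULTURE"),
    Entry("A3.1.2.3", "FLORICULTURE"),
]

if __name__ == "__main__":
    index = build_index(ENTRIES)
    print(index["farm machinery"])
    print(relatives(index["horticulture"], ENTRIES))
```

Because synonyms share the address of their preferred term, a search on any variant lands at the same point in the systematic part, which is the behaviour the alphabetical index part is meant to provide.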

Guidelines for developing a thesaurus
All the available standards3-8 provide guidelines for the compilation of thesauri. First it must be decided how the thesaurus is to be displayed (in alphabetical, systematic or graphic form) and how the terms are to be collected. There are two approaches to the latter: the deductive method and the inductive method. With the deductive method, terms are extracted from the literature during a preliminary indexing stage, but no attempt is made to determine the relationships between terms until a sufficient number have been collected. With the inductive method, on the other hand, new terms are admitted into

the thesaurus as soon as they are encountered in the literature, each term being designated as a member of one or more categories established on an ad hoc basis during the indexing process. However, a combination of both the inductive and deductive methods may be applied. The necessary steps are as follows:

1 Recording of terms: Each term is recorded on a form (a sample of which is shown in Figure 7.7). The record for a term should indicate the source, the date of inclusion, and references to synonyms, scope notes, and broader, narrower and related terms.
2 Term verification: Each term should be verified before it is included in the thesaurus. There are various sources that can be consulted for the purpose, such as standard technical dictionaries and encyclopedias, existing thesauri, classification schemes, indexes to technical journals, indexes to abstract bulletins, current textbooks and handbooks, and subject specialists.
3 Deciding the specificity: The use of specific terms should be restricted to the core area of the subject field concerned.
4 Admission and deletion of terms: The job of inclusion of terms along with all their relationships into the thesaurus, and their display in the chosen form, can be very difficult. However, a number of software packages are now available that can arrange all the sets of terms automatically in the chosen format. At this stage some terms may need to be added or deleted.
5 Review: Once the thesaurus has been compiled, it has to be reviewed by subject experts and modified as necessary.
6 Maintenance: Development of a thesaurus is a continuous process as subjects are constantly changing their nature, connotations and consequently vocabularies. New terms and new relationships appear constantly, and these changes are to be incorporated into the thesaurus regularly.

Thesaurus form          Class number
Term
UF                      Definition
RT                      Scope note:
BT                      Source:
NT                      Date:

Figure 7.7 Sample thesaurus form

Criteria for evaluating a thesaurus
Davis and Rush propose the following criteria to be employed for evaluating a thesaurus:1

► Terminology: Is it appropriate for the field, up to date and accurate?
► Scope: Is it too broad or too narrow to cover the field adequately?
► Subdivisions: Are there reasonable subdivisions?
► Definitions and notes: Are enough included for clarity?
► References: Are they adequate in both number and form?
► Format: Is it legible?
► Classification numbers: Is the listing keyed to any kind of classification scheme, if appropriate?

Use of thesauri in online information retrieval
Vocabulary control, particularly in an electronic information environment, has been an interesting area of research. Rowley has summarized several studies arguing for and against the need for vocabulary control.9 Recent developments in the world wide web (discussed in Chapter 18) and digital libraries (discussed in Chapter 22) have given rise to new research projects related to vocabulary control. For example, Shapiro and Yan28 suggest that vocabulary control is essential in digital libraries, while Milstead comments that thesauri will be used in an information retrieval environment quite differently as they will be blended into systems of machine-aided indexing and text retrieval systems, and they will be used more in helping users define search terms.29 Glassel and Wells report on the Scout Report Signpost (www.signpost.org), which demonstrates that internet resources can be catalogued, classified and arranged using existing taxonomies such as the Library of Congress Classification and Subject Headings.30 Further discussions relating to the use of vocabulary control tools in digital libraries appear in Chapter 22.
The growing application of online and electronic versions of domain-specific thesauri for query formulation and expansion can be traced back to the late 1970s when a number of information retrieval researchers began to develop prototype systems in order to explore ways of enabling users to search within information retrieval systems. The development of expert system and artificial intelligence technologies in the 1980s provided the grounds for a growing interest in applying thesauri as the knowledge bases of a number of expert systems and intelligent front-ends. Efthimiadis31 provides a detailed review of these thesaurus-enhanced systems,
most of them using expert system techniques, which were designed and developed to assist users in formulating and expanding queries in one way or another. Many of these systems embedded thesauri as part of their search facilities to provide users with a choice of search terms. Some of these systems used mapping techniques for matching user-submitted terms with their thesaurus knowledge base and displayed hierarchical structures associated with the entered term. Most of these expert and intermediary systems used standard thesauri such as Medical Subject Headings (MeSH) and Information Service for Physics and Engineering Communities (INSPEC) to provide either thesaurus-browsing or thesaurus-mapping capabilities.
The selection of search terms for query formulation and expansion in the context of online information retrieval has been studied from a range of perspectives. These studies can be broadly divided into two groups based on the algorithmic and human approaches. The focus of the algorithmic approach is to develop and evaluate different types of algorithms for selecting, weighting and/or ranking search terms in the process of query formulation or expansion to improve information retrieval. Several instances of research of this type have been reported in the literature.
The human approach, in contrast, is concerned with studying and evaluating the ways in which users choose terms for formulating, expanding or modifying their queries during the search process. It deals with cognitive and behavioural models and issues that affect the selection of search terms by users. Research has focused on user-centred variables such as those relating to information needs, user intentions, personal characteristics, and different user information-seeking profiles, and investigates their relationship to term selection in the search process.41,42

Query expansion using thesauri
Several studies have reported the construction and use of different types of thesauri as aids to the query expansion process. In general, thesauri within information retrieval systems can be categorized as belonging to one of three main types: standard manually constructed thesauri, searching thesauri and automatically constructed thesauri.
Standard thesauri with hierarchical, equivalence and associative relationships have been widely used for search term selection and query expansion purposes. Much of the research in this area has focused on comparing the performance and effectiveness of controlled vocabularies versus free text terms in information retrieval.43-49 These types of thesauri have also been incorporated as knowledge bases or interface components in several prototype expert and intelligent systems to assist users in the process of search term selection and query expansion.31
Searching thesauri, also referred to as end-user thesauri, are defined as a category of thesauri enhanced with a large number of entry terms that are synonyms, quasi-synonyms or term variants, which assist end-users to find alternative terms to add to their search queries. A number of searching thesauri have been designed and developed and have been evaluated in query expansion research.
The design and testing of several types of automatically constructed thesauri has also been extensively reported in the literature. A number of researchers have constructed co-occurrence-based thesauri to evaluate the performance of thesaurus-based query expansion.58,59 Using a laboratory environment and the TREC test collections, these studies resulted in a slight improvement in retrieval performance. General-purpose thesauri such as WordNet have also been evaluated in the query expansion process but have demonstrated little difference in retrieval effectiveness.60 Thesauri constructed automatically using a linguistic approach have also demonstrated a marginal improvement in retrieval performance.61 Combining different types of thesauri for query expansion has shown better retrieval results than using only one type of thesaurus.62
Automatically constructed thesauri have also been evaluated in user-oriented environments.63-65 In addition some researchers have found that the integration of automatically and manually constructed thesauri has a positive effect on the query expansion process.68,69

Subject headings lists and thesauri in the organization of internet resources
Although subject heading lists were primarily devised to assign subject headings in catalogues, many researchers have used these tools for organizing internet resources. Examples of some such efforts are given below.

INFOMINE
INFOMINE is a service providing access to several thousand web resources comprising databases, electronic journals, guides to the internet for most disciplines, textbooks and conference proceedings (INFOMINE). It began in January 1994 as a project of the Library of the University of California, Riverside70 (INFOMINE uses the Library of Congress Subject Headings for indexing the information resources). Users can simply select a discipline and enter the search terms or phrases to conduct a search. The catalogue can also be browsed by author, title, keyword and subject. If the option for browsing by subject is chosen, users are taken to an alphabetical list of subjects created by LCSH.

Scout Report
The Internet Scout project is based at the University of Wisconsin-Madison and is part of the National Science Foundation's National Science Digital Library (NSDL) Project. The project is funded by several funding bodies including the US National Science Foundation, the Andrew W. Mellon Foundation, Microsoft and the University of Wisconsin-Madison. 'Since 1994, the Internet Scout Project has focused on research and development projects that provide better tools and services

for finding, filtering and delivering online information and metadata.'71 Scout Report Archives is a searchable and browseable database containing 23,000 catalogued Scout Report summaries that can be searched as well as browsed using LCSH.71

Intute: Health and Life Sciences
Intute Health and Life Sciences (formerly BIOME) offers free access to a searchable catalogue of internet sites and resources covering the health and life sciences. There are over 31,000 resource descriptions listed here that are freely accessible for keyword searching or browsing.72 Users can browse several subject collections such as: medicine, nursing and allied health, veterinary science, bioresearch, natural history, agriculture, food and forestry, and these collections can be browsed using one or more vocabulary control tools such as Defense Documentation Center (DDC), CAB thesaurus, MeSH and the Royal College of Nursing thesaurus. By selecting a specific collection and the corresponding vocabulary control tool, the user gets an alphabetical list of subject headings along with the number of associated records in the collection. Similar thesaurus-based search facilities are also available in other Intute services, for example, Intute: Science, Engineering and Technology, Intute: Social Sciences, and Intute: Arts and Humanities. In each case a number of specific vocabulary control tools are available with the search interface and the user can choose a specific tool to select terms for searching.

Discussion
Vocabulary control tools - subject heading lists and thesauri - are controlled natural language tools used for facilitating access to information by using pre-defined and pre-coordinated natural language terms. They play a key role in information retrieval. These tools also allow users to choose appropriate search terms for conducting and/or modifying a search. Although subject heading lists such as LCSH are primarily used for indexing catalogue records, some researchers have used them in indexing internet resources. Thesauri have long been used for indexing online databases and almost all major online databases come with online thesaurus interfaces. Ontologies, which can be considered as some sort of controlled vocabularies expressed in an ontology representation language, are now used in organizing information in the internet and intranet environment; and more specifically in the context of the semantic web.73-74

References
1 Davis, C. H. and Rush, J. E., Guide to Information Science, Westport, CT, Greenwood Press, 1979.
2 Lancaster, F. W., Vocabulary Control for Information Retrieval, 2nd edn, Arlington, VA, Information Resources Press, 1986.
3 BS 5723: 1987 Guidelines for the Establishment and Development of Monolingual Thesauri, London, British Standards Institution.
4 BS 6723: 1985 Guidelines for the Establishment and Development of Monolingual Thesauri, London, British Standards Institution.
5 ISO 2788: 1986 Guidelines for the Establishment and Development of Monolingual Thesauri, Geneva, International Organization for Standardization.
6 ISO 5964: 1985 Guidelines for the Establishment and Development of Monolingual Thesauri, Geneva, International Organization for Standardization.
7 UNISIST Guidelines for the Establishment and Development of Monolingual Thesauri, rev. edn, Paris, Unesco, 1980.
8 UNISIST Guidelines for the Establishment and Development of Monolingual Thesauri, 2nd edn, Paris, Unesco, 1981.
9 Rowley, J. E., The Controlled Versus Natural Indexing Languages Debate Revisited: a perspective on information retrieval practice and research, Journal of Information Science, 20 (2), 1994, 108-19.
10 Svenonius, E., Unanswered Questions in the Design of Controlled Vocabularies, Journal of the American Society for Information Science, 37 (5), 1986, 331-40.
11 Aitchison, J. and Gilchrist, A., Thesaurus Construction: a practical manual, 2nd edn, London, Aslib, 1987.
12 Sears List of Subject Headings®, 19th edn, H. W. Wilson, 2007, www.hwwilson.com/print/searslst_19th.cfm.
13 Satija, M. P. and Haynes, E., User's Guide to Sears List of Subject Headings, Lanham, MD, Scarecrow Press, 2008.
14 Rowley, J. E. and Hartley, R., Organizing Knowledge: an introduction to managing access to information, 4th edn, Ashgate, 2008.
15 Aitchison, J., Indexing languages and indexing. In Dossett, P. (ed.), Handbook of Special Librarianship and Information Work, 6th edn, London, Aslib, 1992, 191-233.
16 Guinchat, C. and Menou, M., General Introduction to the Techniques of Information and Documentation Work, Paris, Unesco, 1983.
17 Townley, H. M. and Gee, R. D., Thesaurus-making: grow your own word-stock, London, Andre Deutsch, 1980.
18 Lancaster, F. W., Elliker, C. and Connell, T. H., Subject Analysis, Annual Review of Information Science and Technology, 24, 1989, 35-84.
19 Bhattacharyya, G., Classaurus: its fundamentals, design and use, Studien zur Klassifikation, 11, 1982, 139-48.
20 Devadason, F. J., Online Construction of Alphabetical Thesaurus: a vocabulary control and indexing tool, Information Processing and Management, 21, 1985, 11-26.
21 Fugmann, R., An Interactive Classaurus on PC, International Classification, 17 (3-4), 1990, 133-7.
22 Chowdhury, G. G., Neelameghan, A. and Chowdhury, S., Vocabulary Control Online in MicroIsis Databases: a Pascal interface, International Congress of CDS/ISIS, Colombia, May 1995, unpublished.
