Controlled Vocabulary
7
Vocabulary control
Indexers consult vocabulary control tools in order to select appropriate terms for representing the document in hand. This helps in a number of ways: the document records not only contain a number of terms that are representative of the contents of the document, but these terms are also standardized (in terms of their usage, spelling, form, and so on) and are likely to be chosen by the user for searching purposes. Similarly, there are programs available by which end-users may go to a particular online vocabulary control tool in order to choose the appropriate term(s) for preparing the search expression. Vocabulary control tools also help end-users modify their previously formulated search expressions by either widening or narrowing down the search expressions.

Lancaster identifies two major objectives of vocabulary control in an information retrieval environment:2

► to promote the consistent representation of subject matter by indexers and searchers, thereby avoiding the dispersion of related materials. This is achieved through the control (merging) of synonymous and nearly synonymous expressions and by distinguishing among homographs
► to facilitate the conduct of a comprehensive search on some topic by linking together terms whose meanings are related paradigmatically or syntagmatically.

Lancaster adds that indexing tends to be more consistent when the vocabulary used is controlled, because indexers are more likely to agree on the terms needed to describe a particular topic if they are selected from a pre-established list than when they are given a free hand to use any terms that they wish. Similarly, from the searcher's point of view, it is easier to identify the terms appropriate to information needs if these terms must be selected from a definitive list. Thus controlled vocabulary tends to match the language of indexers and searchers.

A large number of documents have appeared covering the details of various vocabulary control tools. There are also standards such as BS 5723,3 BS 6723,4 ISO 2788,5 ISO 59646 and UNISIST guidelines.7, 8 In this chapter, we shall try to understand what a vocabulary control tool is and how it controls the vocabulary in an information retrieval environment. We shall also learn about the various vocabulary control tools, their characteristic features, mechanisms of development, and so on. Finally we shall look into the creation, maintenance and usage aspects of online vocabulary control devices.

Controlled vs natural indexing
Controlled indexing languages are those in which both the terms that are used to represent subjects and the process whereby terms are assigned to particular documents are controlled or executed by a person.9 Normally there is a list of terms, a subject headings list or a thesaurus, that acts as the authority list in identifying the terms that may be assigned to documents, and indexing involves the assignation of terms from this list to specific documents. The searcher is expected to use the same controlled list during formulation of a search strategy. In natural language indexing, any term that appears in the title, abstract or text of a document record may be an index term. There is no mechanism to control the use of terms for indexing. Similarly, the searcher is not expected to use any controlled list of terms.

Svenonius divides the debate concerning natural and controlled vocabulary into three eras,10 which Rowley modified into four, as shown in Table 7.1.9

Table 7.1 The four eras of debate on controlled vs. natural language indexing
Era 1  Controlled vocabulary.
Era 2  Comparisons of natural and controlled language: major experimental studies noted natural language can perform as well as controlled vocabulary, but other factors, such as the number of access points, are also significant.
Era 3  Many case studies of limited generalizability. Searching online databases was considered. It was noted that the best performance can be achieved by a combination of controlled and natural language; the number of access points was reaffirmed to have a significant effect; full-text and bibliographic databases were noted to have produced different results.
Era 4  New advances in user-based systems including OPACs. The value of controlled vocabulary in the context of user-friendly interfaces and the development of knowledge bases were noted.

Aitchison and Gilchrist provide a comparison of controlled and natural language indexing, which is shown in Table 7.2 on the next page.11 Rowley mentions that despite much debate extending over more than a century, together with a range of research projects, information scientists have failed to resolve the issue concerning the relative merits and demerits of controlled and natural language.9 However, practice and tested research have suggested that controlled language and natural language should be used in conjunction with one another.

Vocabulary control tools
As the name suggests, these are the tools used to control the vocabulary of indexing and retrieval. These are natural language tools, meaning that these tools contain natural language terms that can be used for indexing and retrieval purposes. What an indexer and an index user need is a set of guidelines for the proper selection of terms. Syndetic structures are devices that provide these guidelines by showing the relationships among terms or concepts, and they fall into two major categories: classification schemes, and subject heading lists and thesauri. Classification schemes, being tools for organizing knowledge, could be of great help for vocabulary control, but classification schemes use an artificial language, whereas for vocabulary control we need natural language representation. Indexes to classification schemes could serve the purpose of vocabulary control, but here terms appear alphabetically and thus the role of vocabulary control tools for organization of knowledge is not available.
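The difference between controlled and natural language indexing can be made concrete with a small sketch. It assumes a hypothetical authority list (the `AUTHORITY` mapping and its terms are invented for illustration and do not come from any real subject heading list) and shows both the indexer and the searcher resolving their terms through the same list, with synonyms merged and homographs qualified.

```python
# Hypothetical authority list: entry term -> preferred term.
# Synonyms are merged; homographs carry a parenthetical qualifier.
AUTHORITY = {
    "automobiles": "cars",
    "motor cars": "cars",
    "cars": "cars",
    "mercury (planet)": "mercury (planet)",
    "mercury (metal)": "mercury (metal)",
}


def to_preferred(term):
    """Return the preferred term for an entry term, or None if it is not listed."""
    return AUTHORITY.get(term.strip().lower())


def index_document(candidate_terms):
    """Controlled indexing: keep only terms that resolve through the authority list."""
    resolved = (to_preferred(t) for t in candidate_terms)
    return {t for t in resolved if t is not None}


def matches(query_term, index_terms):
    """The searcher's term is normalized with the same list before matching."""
    preferred = to_preferred(query_term)
    return preferred is not None and preferred in index_terms


doc_terms = index_document(["Automobiles", "Mercury (metal)", "speed"])
```

In this sketch a search on "motor cars" finds the document indexed under "Automobiles" because both resolve to the same preferred term, whereas the free-text word "speed" is simply not available as an index term.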
INTRODUCTION TO MODERN INFORMATION RETRIEVAL

Table 7.2 Comparison between controlled and natural language

Natural language | Controlled language
High specificity gives precision. Excels in retrieving 'individual' terms - names of persons, organizations, etc. | Lack of specificity, even in detailed systems.
Exhaustivity gives potential for high recall (does not apply to title-only databases). Useful in multilingual systems. | Cost of indexing to the level of natural language is prohibitive. Also terms may be omitted in error by indexers.
Up to date. New terms are immediately available. | Not immediately up to date. Time lag while terms are added to thesaurus.
Words of author used - no misinterpretation by indexer. | Words of author liable to be misconstrued. Errors in indexing terms can cause losses.
Natural language words used by indexer as well as the searcher. | Artificial language has to be learnt by the searcher.
Low input costs. | High input costs.
Easier exchange of material between databases - language incompatibility removed. | Incompatibility a barrier to easy exchange.
Intellectual effort placed on searcher. Problems arise with terms having many synonyms and near-synonyms. | Eases the burden of searching: controls synonyms and near synonyms and leads to specific preferred terms to broaden search; qualifies homographs; provides scope notes; displays broader, narrower and related terms; expresses concepts elusive in free text.
Syntax problems. Danger of false drops through incorrect term association. | Overcomes syntax problems with compound terms and other devices.
Exhaustivity may lead to loss of precision. An asset in numerical databases and multilingual systems. | At normal levels of indexing, avoids precision loss through over-exhaustivity (retrieval of minor concepts of peripheral interest).

Some attempts have been made to combine the features of the main arrangement in classification schemes with those that appear in the index to the classification scheme to generate some kind of faceted or classified thesaurus (see below for discussion).

Both subject heading lists and thesauri contain alphabetically arranged terms with necessary cross-references and notes that can be used for indexing or searching in an information retrieval environment. However, there is a difference. Subject heading lists were initially developed to prepare entries/headings in a subject catalogue that duplicate the classified arrangement of document records. Therefore, they include rather broader subject terms or headings. On the other hand, thesauri have been developed on specific subject fields with a view to bringing together the different representations of terms (synonyms, spelling variants, homonyms, and so on) along with an indication of a mapping of that term in the universe of knowledge system by indicating the broader (superordinate), narrower (subordinate) and related (coordinate and collateral) terms. However, this distinction has gradually faded and the latest Library of Congress subject headings list indicates the terms' features as shown in normal thesauri.

The following sections discuss different kinds of vocabulary control tools, including subject headings lists, thesauri, the thesaurofacet and the classaurus. However, emphasis will be given to the concept, development, use, and so on, of the thesaurus.

Subject headings lists
A subject heading list is an alphabetical list of terms and phrases, with appropriate cross-references and notes, which can be used as a source of headings in order to represent the subject content of an information resource. Although it is primarily arranged alphabetically by term, under each term or phrase we can find a list of other terms or phrases that are semantically related to the term or phrase. Subject heading lists were designed to complement bibliographic classification in the sense that although a bibliographic classification scheme helps us to assign a class number (built of notations) to an information resource that represents its subject content, a subject heading list allows us to assign an appropriate heading, as a term or a phrase, to an information resource that represents its subject content. A list of subject headings, or a subject index as it is often called, can be used to search or browse a collection of information resources. Subject heading lists help us produce a pre-coordinated index of a collection.

Library of Congress Subject Headings (LCSH) is an example of a subject heading list; it is used widely as a controlled vocabulary for catalogues and bibliographies. It was originally designed as a controlled vocabulary for representing the subject and form of the books and serials in the Library of Congress collection, with the objective of providing subject access points to the bibliographic records contained in the library's catalogues. It is now most widely used for assigning subject headings to bibliographic information resources.

LCSH is the most extensive list of subject headings, and is used widely throughout the English-speaking world. Sears' List of Subject Headings (2004) is a smaller work designed for small to medium-sized libraries.12 LCSH contains the entry vocabulary of the Library of Congress catalogues. It is available in various formats including hard copy, CD-ROM and web. It is now in its 29th edition, containing over 280,000 headings and references (www.loc.gov/aba/cataloging/subject). LCSH is the most widely used tool for assigning subject headings to manual and machine-readable catalogues.

In specifying the prescribed headings, and also deciding which headings are not to be used, LCSH has a number of policies, the fundamental ones being user needs, literary warrant, use of uniform and unique headings, provision of direct access to specific subjects, stability and consistency.
The approved subject headings in LCSH are set in bold face, while those in the entry vocabulary only, for example, synonyms, appear in normal type face. Each entry may be accompanied by all or some of the following:

► a scope note showing how the term may be used
► a list of headings to which see also references may be made
► a list of headings from which see references may be made
► a list of headings from which see also references may be made.

Figure 7.1 shows an example of a typical entry in LCSH. Each preferred term (appearing in bold face) is followed by an LCC class number. There may also be a scope note, as appears under Computer software, which delineates the scope of the term/phrase.

[Figure 7.1: entry under Computer software, partially recovered]
  SA subdivisions Software and Juvenile software under subjects for actual software items
  NT Application software
     Systems software
  - Accounting
    [HF5681.C57]
  - - Law and legislation
    (May Subd Geog)
  - Catalogs
    UF Computer programs - Catalogs
  - Development
    [QA76.76.D47]

UF (used for) denotes the non-preferred headings for the given term or phrase, and BT, NT and RT denote broader terms, narrower terms and related terms, respectively. SA (see also) provides hints as to where related materials may be found. Some headings can be further subdivided geographically and this is indicated by the phrase 'May Subd Geog' immediately after the heading. A preferred heading may further be subdivided to generate an appropriate preferred heading, for example 'Computer software-Accounting', and 'Computer software-Accounting-Law and legislation'.

LCSH provides the reciprocal entries for USE/UF, NT/BT and RT/RT relations. For example, the heading Computer software has Computer programs as one of the NTs. Now if we look at the entry under Computer programs we find the heading Computer software shown as the Broader Term (Figure 7.2).

Figure 7.3 shows an example of the output of a subject search from a typical OPAC. It may be noted that the subject search was conducted on 'digital libraries', and the results page shows the number of records available in the library under the specific heading in LCSH. This allows the user to get an idea of the various subheadings under the sought subject, and thus provides some sort of a map of the collection on the subject.

Figure 7.2 Reciprocal entries in LCSH
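The reciprocal USE/UF, NT/BT and RT/RT entries can be pictured as a data structure in which recording one side of a relation always records the other. The following is a minimal sketch under that assumption; the `HeadingList` class is hypothetical, and the non-preferred form 'Software, Computer' is used purely for illustration rather than quoted from the LCSH entry.

```python
from collections import defaultdict

# Reciprocal of each relationship tag: NT pairs with BT, UF with USE, RT with itself.
RECIPROCAL = {"NT": "BT", "BT": "NT", "RT": "RT", "UF": "USE", "USE": "UF"}


class HeadingList:
    """Toy subject heading list that keeps both sides of every relation."""

    def __init__(self):
        # heading -> tag -> set of related headings
        self.entries = defaultdict(lambda: defaultdict(set))

    def relate(self, heading, tag, other):
        """Record a relation and its reciprocal in one step."""
        self.entries[heading][tag].add(other)
        self.entries[other][RECIPROCAL[tag]].add(heading)

    def related(self, heading, tag):
        """Return the headings linked to `heading` by `tag`, sorted alphabetically."""
        return sorted(self.entries[heading][tag])


lcsh = HeadingList()
lcsh.relate("Computer software", "NT", "Computer programs")
lcsh.relate("Computer software", "UF", "Software, Computer")  # illustrative entry term
```

Looking up `lcsh.related("Computer programs", "BT")` then yields Computer software, mirroring the reciprocal entry described in the text.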
(c) an action and the products of the action, e.g. programming and software
(d) an action and its patient, e.g. harvesting and crops
(e) concepts related to their properties, e.g. poisons and toxicity
(f) concepts related to their origins, e.g. India and Indians
(g) concepts linked by causal dependence, e.g. diseases and pathogens
(h) a thing and its counter agent, e.g. insects and insecticides
(i) syncategorematic phrases and their embedded nouns, e.g. model buses and buses.

Display of terms in a thesaurus
Terms and their relationships in a thesaurus can be displayed in one of the following methods:

► alphabetical display, with scope notes and relationships indicated at each term
► systematic display with an alphabetical index
► graphic display with an alphabetical index.

Alphabetical display
In this form of display all indexing terms, whether preferred or non-preferred, are organized in a single alphabetical sequence. BS 8723-1 (2005) proposes a number of symbols and abbreviations for use in a thesaurus, such as:

► NTI: narrower term (instantial)
► NTP: narrower term (partitive)
► RT: related term.

The alphabetical form of thesaurus is easy to organize. However, there is a shortcoming of this form of thesaurus from the user's point of view, as all the broader and narrower terms that constitute a hierarchy in an alphabetical thesaurus cannot be surveyed at a single position. Extra relational information can be added to an alphabetical display, for example, the top term in the hierarchy to which a specific concept belongs. Similarly, as shown above, the level of subordination and superordination can also be shown using BT1, BT2, NT1, NT2 and so on.

Figure 7.4 shows a typical example of an entry in a thesaurus (Unesco thesaurus) that shows the preferred term (Information Processing, appearing in bold), the reference from a non-preferred term (Information Handling), and narrower terms Cataloguing and Bibliographic Control, at two different levels designated by NT1 and NT2.

Systematic display
A thesaurus that is organized systematically should have two parts:

1 categories or hierarchies of terms arranged according to their meanings and logical relationships, and
2 an alphabetical index that directs the user to the appropriate part of the systematic section.
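The level markers used in the alphabetical display (BT1, BT2, NT1, NT2 and so on) can be derived mechanically from a stored hierarchy. The sketch below assumes a hypothetical fragment of a hierarchy (the `NARROWER` mapping and its terms are invented, not taken from the Unesco thesaurus) in which every term has at most one immediate broader term.

```python
# Hypothetical hierarchy fragment: term -> immediate narrower terms.
NARROWER = {
    "Crops": ["Cereals", "Root crops"],
    "Cereals": ["Rice", "Wheat"],
}


def narrower_lines(term, level=1):
    """Yield 'NTn term' lines for every descendant of `term`, depth first."""
    for child in sorted(NARROWER.get(term, [])):
        yield f"NT{level} {child}"
        yield from narrower_lines(child, level + 1)


def broader_lines(term):
    """Yield 'BTn term' lines by walking up from `term` to the top term."""
    parents = {c: p for p, children in NARROWER.items() for c in children}
    level = 1
    while term in parents:
        term = parents[term]
        yield f"BT{level} {term}"
        level += 1


def entry(term):
    """Build one alphabetical-display entry with levelled BT and NT lines."""
    lines = list(broader_lines(term)) + list(narrower_lines(term))
    return [term] + [f"  {line}" for line in lines]
```

Calling `entry("Crops")` lists Cereals at NT1 with Rice and Wheat at NT2 beneath it, while `entry("Rice")` shows Cereals as BT1 and Crops as BT2, so the whole hierarchy is visible from a single position in the alphabetical sequence.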
► No non-hierarchically related terms are enumerated for any term in a classaurus because of its category-based (faceted) structure, and because POPSI itself takes the responsibility of revealing this relationship as precisely as possible.
► Each array in the classaurus is open and discontinuous.
► Each term in the systematic part is assigned a unique address, which, if desired, can also be a class number.

The alphabetical index part contains each and every term, including synonyms, quasi-synonyms and antonyms, occurring in the systematic part, along with its address. The address refers to the systematic part where all synonyms, superordinates, subordinates, coordinates, and collaterals of the term concerned are found to occur. Figure 7.6 shows an example of classaurus entries, along with the necessary notes, which has been used by Bhattacharyya himself.19

For discipline: FIELD CROP (Culture of) (denoted by a concept-term)

Entities
  Parts: ROOT, STEM, LEAF, FLOWER, FRUIT, SEED, etc.
  Wholes (types): CEREAL - RICE, WHEAT, etc.
Actions
  DEVELOPING, HARVESTING, MOWING, REAPING, STACKING, THRESHING, HUSKING, SHELLING, CLEANING, WINNOWING, GRADING, STORING, PACKING, DRY FARMING, DRAINING, IRRIGATION, MANURING, CONTROL

Figure 7.6 Sample classaurus entries
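The link between the two parts of a classaurus, a systematic part in which every term has a unique address and an alphabetical index that lists every term and synonym with that address, can be sketched as follows. The addresses and the synonym PADDY are hypothetical values chosen for illustration; they are not taken from Bhattacharyya's scheme.

```python
# Hypothetical systematic part: (address, preferred term, synonyms).
SYSTEMATIC = [
    ("E1", "CEREAL", []),
    ("E1.1", "RICE", ["PADDY"]),
    ("E1.2", "WHEAT", []),
    ("A1", "HARVESTING", []),
]


def check_unique_addresses(systematic):
    """Each term in the systematic part must have its own address."""
    addresses = [address for address, _, _ in systematic]
    return len(addresses) == len(set(addresses))


def build_index(systematic):
    """Alphabetical index: every term, synonyms included, with its address."""
    index = {}
    for address, term, synonyms in systematic:
        for name in [term, *synonyms]:
            if name in index:
                raise ValueError(f"duplicate index term: {name}")
            index[name] = address
    # Return the entries in alphabetical order of the index terms.
    return dict(sorted(index.items()))


index = build_index(SYSTEMATIC)
```

A synonym and its preferred term share one address, so a user who looks up PADDY in the index is directed to the same place in the systematic part as a user who looks up RICE.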
the thesaurus as soon as they are encountered in the literature, each term being designed as a member of one or more categories established on an ad hoc basis during the indexing process. However, a combination of both the inductive and deductive methods may be applied. The necessary steps are as follows:

  Scope note:
  BT                    Source:
  NT                    Date:

Figure 7.7 Sample thesaurus form

2 Term verification: Each term should be verified before it is included in the thesaurus. There are various sources that can be consulted for the purpose, such as standard technical dictionaries and encyclopedias, existing thesauri, classification schemes, indexes to technical journals, indexes to abstract bulletins, current textbooks and handbooks, and subject specialists.
3 Deciding the specificity: The use of specific terms should be restricted to the core area of the subject field concerned.
4 Admission and deletion of terms: The job of inclusion of terms along with all their relationships into the thesaurus, and their display in the chosen form can be very difficult. However, a number of software packages are now available that can arrange all the sets of terms automatically in the chosen format. At this stage some terms may need to be added or deleted.
5 Review: Once the thesaurus has been compiled, it has to be reviewed by subject experts and modified as necessary.
6 Maintenance: Development of a thesaurus is a continuous process as subjects are constantly changing their nature, connotations and consequently vocabularies. New terms and new relationships appear constantly, and these changes are to be incorporated into the thesaurus regularly.

Use of thesauri in online information retrieval
Vocabulary control, particularly in an electronic information environment, has been an interesting area of research. Rowley has summarized several studies arguing for and against the need for vocabulary control.9 Recent developments in the world wide web (discussed in Chapter 18) and digital libraries (discussed in Chapter 22) have given rise to new research projects related to vocabulary control. For example, Shapiro and Yan28 suggest that vocabulary control is essential in digital libraries, while Milstead comments that thesauri will be used in an information retrieval environment quite differently as they will be blended into systems of machine-aided indexing and text retrieval systems, and they will be used more in helping users define search terms.29 Glassel and Wells report on the Scout Report Signpost (www.signpost.org), which demonstrates that internet resources can be catalogued, classified and arranged, using existing taxonomies such as the Library of Congress Classification and Subject Headings.30 Further discussions relating to the use of vocabulary control tools in digital libraries appear in Chapter 22.

The growing application of online and electronic versions of domain-specific thesauri for query formulation and expansion can be traced back to the late 1970s when a number of information retrieval researchers began to develop prototype systems in order to explore ways of enabling users to search within information retrieval systems. The development of expert system and artificial intelligence technologies in the 1980s provided the grounds for a growing interest in applying thesauri as the knowledge bases of a number of expert systems and intelligent front-ends.
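As a rough illustration of how a thesaurus can support query formulation and expansion, the sketch below maps a user's entry term to its preferred term and then adds non-preferred, narrower and, optionally, broader or related terms. The `THESAURUS` fragment is hypothetical (it is not an extract from MeSH or any other published thesaurus), and the Boolean OR string is only one assumed way of passing the expanded terms to a retrieval system.

```python
# Hypothetical thesaurus fragment using the relationship tags from this chapter.
THESAURUS = {
    "heart attack": {"USE": ["myocardial infarction"]},
    "myocardial infarction": {
        "UF": ["heart attack"],
        "BT": ["heart diseases"],
        "NT": [],
        "RT": ["coronary thrombosis"],
    },
}


def expand(term, widen=False, related=False):
    """Return the preferred term plus its UF and NT terms; optionally add BT and RT."""
    term = term.lower()
    record = THESAURUS.get(term, {})
    # Map an entry (non-preferred) term to its preferred term first.
    if "USE" in record:
        term = record["USE"][0]
        record = THESAURUS.get(term, {})
    terms = [term] + record.get("UF", []) + record.get("NT", [])
    if widen:
        terms += record.get("BT", [])
    if related:
        terms += record.get("RT", [])
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(terms))


def to_boolean(terms):
    """Join the expanded terms into a quoted OR expression."""
    return " OR ".join(f'"{t}"' for t in terms)
```

Setting `widen=True` corresponds to widening a search by moving up the hierarchy, while leaving both options off keeps the query close to the user's original concept.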
Efthimiadis31 provides a detailed review of these thesaurus-enhanced systems, most of them using expert system techniques, which were designed and developed to assist users in formulating and expanding queries in one way or another. Many of these systems embedded thesauri as part of their search facilities to provide the users with a choice of search terms. Some of these systems used mapping techniques for matching user-submitted terms with their thesaurus knowledge base and displayed hierarchical structures associated with the entered term. Most of these expert and intermediary systems used standard thesauri such as Medical Subject Headings (MeSH) and Information Service for Physics and Engineering Communities (INSPEC) to provide either thesaurus-browsing or thesaurus-mapping capabilities.

The selection of search terms for query formulation and expansion in the context of online information retrieval has been studied from a range of perspectives. These studies can be broadly divided into two groups based on the algorithmic and human approaches.32 The focus of the algorithmic approach is to develop and evaluate different types of algorithms for selecting, weighting and/or ranking search terms in the process of query formulation or expansion to improve information retrieval. Several instances of research of this type have been reported in the literature.33-40 The human approach, in contrast, is concerned with studying and evaluating the ways in which users choose terms for formulating, expanding or modifying their queries during the search process. It deals with cognitive and behavioural models and issues that affect the selection of search terms by users. Research has focused on user-centred variables such as those relating to information needs, user intentions, personal characteristics, and different user information-seeking profiles, and investigates their relationship to term selection in the search process.41, 42

Query expansion using thesauri
Several studies have reported the construction and use of different types of thesauri as aids to the query expansion process. In general, thesauri within information retrieval systems can be categorized as belonging to one of three main types: standard manually constructed thesauri, searching thesauri and automatically constructed thesauri.

Standard thesauri with hierarchical, equivalence and associative relationships have been widely used for search term selection and query expansion purposes. Much of the research in this area has focused on comparing the performance and effectiveness of controlled vocabularies versus free text terms in information retrieval.43-49 These types of thesauri have also been incorporated as knowledge bases or interface components in several prototype expert and intelligent systems to assist users in the process of search terms selection and query expansion.31

Searching thesauri, also referred to as end-user thesauri, are defined as a category of thesauri enhanced with a large number of entry terms that are synonyms, quasi-synonyms or term variants, which assist end-users to find alternative terms to add to their search queries. A number of searching thesauri have been designed and developed and have been evaluated in query expansion research.

The design and testing of several types of automatically constructed thesauri has also been extensively reported in the literature. A number of researchers have constructed co-occurrence-based thesauri to evaluate the performance of thesaurus-based query expansion.58, 59 Using a laboratory environment and the TREC test collections, these studies resulted in a slight improvement in retrieval performance. General-purpose thesauri such as WordNet have also been evaluated in the query expansion process but have demonstrated little difference in retrieval effectiveness.60 Thesauri constructed automatically using a linguistic approach have also demonstrated a marginal improvement in retrieval performance.61 Combining different types of thesauri for query expansion has shown better retrieval results than using only one type of thesaurus.62

Automatically constructed thesauri have also been evaluated in user-oriented environments.63-65 In addition some researchers have found that the integration of automatically and manually constructed thesauri has a positive effect on the query expansion process.68, 69

Subject headings lists and thesauri in the organization of internet resources
Although subject heading lists were primarily devised to assign subject headings in catalogues, many researchers have used these tools for organizing internet resources. Examples of some such efforts are given below.

INFOMINE
INFOMINE is a service providing access to several thousand web resources comprising databases, electronic journals, guides to the internet for most disciplines, textbooks and conference proceedings (INFOMINE). It began in January 1994 as a project of the Library of the University of California, Riverside70 (INFOMINE uses the Library of Congress Subject Headings for indexing the information resources). Users can simply select a discipline and enter the search terms or phrases to conduct a search. The catalogue can also be browsed by author, title, keyword and subject. If the option for browsing by subject is chosen, users are taken to an alphabetical list of subjects created by LCSH.

Scout Report
The Internet Scout project is based at the University of Wisconsin-Madison and is part of the National Science Foundation's National Science Digital Library (NSDL) Project. The project is funded by several funding bodies including the US National Science Foundation, the Andrew W. Mellon Foundation, Microsoft and the University of Wisconsin-Madison. Since 1994, the Internet Scout Project has focused on research and development projects that provide better tools and services
. ·1 , ·ng and deliverine. online infonnation and metadata.' 71 Scout Lancaster, F. w., Vocabula ry Control for !nformotio n Retrieval, 2nd
for fiIlldIll!!. 11 1t.:n ~ · ·
Report edn, Arlington.
· ~- . lia'·l' O I dS
Arch ives 1s a searc · u c ••md
· browscable database contammg .23,00 cata oge cout VA, Jnformatiiln Resources . I 986.
Report swrnnaries that can be searched as well as browsed usmg LCSH. 71 BS 5723: 1987'.Guidelines f or the Establis hment and Developm ent of
Mo110/ingua/
Th esauri, Lonhon, British Standards Institution.
4 BS 6723: 1985 Guidelines f ar the Estnblishm ent and De ve/npment of Mono/i11gua/
Jntute: Health and Life Sciences Thesauri, London, British Standards lnstirution.
Jnt11te Heath and Life Sciences (fom1crly BIOME) offers free access ISO 2788: J986 Guidelin es for th e Establishm ent and D evelopmen t
to a searchable of Mono/i11gua/
catalogue of internet sites and resources covering the health and Thesauri, Geneva, Interna tional Organiza tion for Standardiz
life sciences. There ation.
arc over 31.000 resource descriptions listed here that are freely accessible 6 ISO 5964: 1985 Guidelines for th e Establishm ent and Developm ent
for keyword of Monolingual
searching or brnwsing. 72 Users can browse several subject collection Thesa 11n", Geneva, International organ izat ion for Standardiz
s such as: ation.
medicine, nursing and allied health, veterinary science, bioresearc UNISIST Guidelines f or th e Establishm ent and Developm ent
h, natural hi story, of Monolingual Thesaun,
amculture, food and forestry, aud these collections can be browsed rev. edn, Paris, Unesco, 1980.
using one or more
v~abulary control tools such as Defense Documentation Center UNISIST Guidelines for the Establishm ent and Developm enl
(DDC), CAB of M onolingual Thm,uri.
thesaurus, MeSH and the Royal College of Nursing thesaums. By 2nd edn, Paris, Unesco, 198 1.
selecting a specific
collection and the corresponding vocabulary control tool, the user Rowley, J. E., The Controlled Versus Natural Indexing Languages Debate
gets an alphabetical Revisi1ed: a
list of subject headings along with the number of associated records perspective on information retrieval practice and research, .Journal of
in the collection. /11/ormation
Similar thesaurus-based search facilities are also available: Science, 20 (2), 1994, I08- 19.
in other lntutc
services, for example, lntute: Science, Engineeri ng and Technolo JO Svenonius, E., Unanswered Questions in the Des ign of Controll ed Vocabulari
gy, Intute: Social es.
Sciences, and lntute: Arts and Humanities. In each case a .loumal of the American Socie1y for Inform ation Science, 37
( 5), 1986, 3, !-40
number of specific
vocabulary control tools are available with the search i11te1face JI Aitchison, J. and Gilchrist, A., Thesaurus Cons /ruction: a practical
and the user can man ual, 2nd edn.
choose a ;pecific tool to select terms for searching. London, Aslib, 1987.
I 12
l3
Sears list of Subject Headings® , 19th edn, H. W. Wi lson,
www.hwwilson.com/print/searslst_ I 9th.cfm.
2007,
Satija, M. P. and Haynes, E., User's Guide to Sears List of Subject Headings.
\ Discus sion MD, Scarecrow Press, 2008.
Lanham.
',i
Vocabulary control tools - subject heading lists and thesauri - are 14 Rowley, J. E. and Hartley, R. Organizing !(i,owledg e: a ii ,troduction
Ji controlled natmal 11 m.:naoin~
language tools used for facilitatin g access to information by usmg access to i11fo n11ation I, 4th ed.u, Ashgatc, 2008. 10
~
~re-~efined ~nd " '
pre-coordinated natural lai;guage terms. They play a key role 15 Aitchison, J., Indexing languages and indexing. [n Dosst:11, F. (s!d.), llw
111 mforrnation ,dboul< uf
retrieval. These tools also allo;v users to choose appropriate Special librariansh ip and lnfarmmio n Work, 6th edn, London,
search terms for As lib, , I9i°-233.
conducting andior modifying a search. Although subject heading sts s Gumcbat, C. and_ Men~u, M., General lrurod11ctio 11 to the T,,c/mi ues 1992
I
li 16
uch as LCSH a11d Documentallon Work, Pans, Unesco. I983. of ln 'i:irmation
are primaril y used for indexing catalogue records, some research~r q · ~•-
s _have used them 17 Townley, H. M. and Gee, R. D., Th esmm.,s-m aking · ro
in indexing internet resources . Thesauri have long been used
lor mdexmg onlme London, Andre Deutsch, 1980.
databases and almost all maJ.or c, nJ'me daIahases come with onlme thesaurus g w y our 011 ·11 wo:-d-.ctock .
· . d 18 Lancaster, F. W., Ellikcr, C. and Colone ll. T. H ., Subjecl Anal .
. b
interfaces. Ontologies, which can e cons1-d e1.,~c'I as some s011 of controlle . . .
d. Jnfarmatio11 Science and 7'ech11ology, 24. I ygg_ J5-_ _
vocabularies expressed in an ontol ogy representa tion language, • ys is, A111nw/ Re\'/eiv o/
are now use 11: l9 84
l3hanacharyya. Ci, Classaurus: its fundamen tals. design
. th e mtemet
. and
organ izing info rmation m an d 111
· tr,·anet env ironment; and more
Klassifikati on, 11. I 982, 139-48.
.
specifically in the context of the semant ic web _7l- 74 use , S111d1c11 wr
20 Dcvadasan, F. .I., Online Construc1ion of A!phabetical Th ..
and indc;-,i:,~ :ot,! . lnf.u,:~~tio11 P rocessing and Man :-! ar->n
e~.iurus: a vocabu lary connnl
Refere nces
21 Fucmann. :: .. t ... ~ .. '.::-dcuve Classaurus on PC
1990. 133-7 . 1 . .: · .
1
e,,,,
2 t . I 985. 11 - 26
· '1 , ..: , ,,a t,on,.1/ Class{fica1ion. 1. } (3 -ti.
Davis, C H and Rush. C. L.. Guide lo ln(nn1111rw:i Sci~1rre , Wcs1po11. CT, lircenwnod 22 Chowdhury. G. Ci.. ~ee l"'.neghnn. A and l_"l •owd hury. S .. Vu .
i 1979
, J;
~
M1crofs~s Datab.::e\ n P~sc~/ mterfo~e. 1"1nr
('ol nmb 1=1 . !\•l:ty ·- h t )9 ""1 llil fHl hli-;ht:d
l11 1er na 1iona / ~:1hulary Cu111 rlll 011 !1 11~
Congress oFcns-,sl.\ .
11 1