0% found this document useful (0 votes)

64 views48 pages

Chapter 3 Indexing

Uploaded by

Dawit Sebhat

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

64 views48 pages

Chapter 3 Indexing

Uploaded by

Dawit Sebhat

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 48

Indexing: Basic Concepts

 Indexing is an arrangement of index terms to

permit fast searching and reducing memory space
requirement
 used to speed up access to desired information
from document collection as per users query such
that
 It enhances efficiency in terms of time for
retrieval. Relevant documents are searched and
retrieved quick
 Index file usually has index terms in a sorted
order. Which list is easier to search?
fox pig zebra hen ant cat dog lion ox
ant cat dog fox hen lion ox pig zebra
 An index file consists of records, called index
entries. Index files are much smaller than the
original file.
 Remember Heaps Law: in 1 GB of text
collection the vocabulary has a size of only 5
MB. This size may be further reduced by
Linguistic pre-processing (or text
operations).
 The usual unit for indexing is the word
 Index terms - are used to look up records in
a file.
Major Steps in Index Construction
Source file: Collection of text document
A document can be described by a set of
representative keywords called index terms.
Index Terms Selection: apply text operations or
preprocessing
Tokenize: identify words in a document, so that each
document is represented by a list of keywords or
attributes
Stop words removal: words with high frequency are
non-content bearing and needs to be removed from
text collection
… continued
Word stem: reduce words with similar meaning
into their stem/root word
Term relevance weight: Different index terms
have varying relevance when used to describe
document contents. This effect is captured
through the assignment of numerical weights
to each index term of a document. There are
different index terms weighting methods:
including TF, IDF,TF*IDF, …
Indexing structure: a set of index terms
(vocabulary) are organized in Index File to easily
identify documents in which each term occurs in.
Basic Indexing Process
Documents to
be indexed. Friends, Romans, countrymen.

Token Tokenizer
stream. Friends Romans countrymen

Modified Linguistic friend roman countryman

tokens. preprocessor

Index File Indexer

friend 2 4

roman 1 2
Inverted file countryman 13 16
Index file Evaluation Metrics
Running time of the main operations
 Access/search time
 How much is the running time to find the required search
key from the list?
Update time (Insertion time, Deletion time)
 How much time does it take to update existing records in
an attempt to add new terms or delete existing
unnecessary terms?
 Does the indexing structure allows incremental update or
re-indexing?
Space overhead
 Computer storage space consumed for keeping the list.
Building Index file
An index file of a document is a file consisting of a list of
index terms and a link to one or more documents that has
the index term
An index file is list of search terms that are organized for
associative look-up, i.e., to answer user’s query:
In which documents does a specified search term appear?
Where within each document does each term appear? (There may be
several occurrences.)
For organizing index file for a collection of documents,
there are various options available:
Decide what data structure and/or file structure to use. Is it
sequential file, inverted file, suffix tree, etc. ?
Sequential File
•Sequential file is the most primitive file structures.
• It has no vocabulary as well as linking pointers.
•The records are generally arranged serially, one after
another, but in lexicographic order on the value of
some key field. i.e
• a particular attribute is chosen as primary key whose value
will determine the order of the records.
• when the first key fails to discriminate among records, a
second key is chosen to give an order.
Example:
Given a collection of documents, they are parsed to
extract words and these are saved with the
Document ID.

I did enact Julius

Doc 1 Caesar I was killed
I the Capitol;
Brutus killed me.

So let it be with
Doc 2 Caesar. The noble
Brutus has told you
Caesar was ambitious
Sorting the Vocabulary
Term Doc #
I 1
 After all documents did
enact
1
1
Sequential file
have been julius 1
Doc
caesar 1
tokenized, I 1 Term No.
stopwords are was
killed
1
1 1 ambition 2
removed, and I 1
2 brutus 1
the 1
normalization and capitol 1
3 brutus 2
stemming are brutus 1
killed 1 4 capitol 1
applied, to generate me 1
so 2 5 caesar 1
index terms let 2
6 caesar 2
 These index terms it
be
2
2 7 caesar 2
in sequential file are with
caesar
2
2 8 enact 1
sorted in the 2
9 julius 1
alphabetical order noble
brutus
2
2 10 kill 1
hath 2
told 2 11 kill 1
you 2
caesar 2 12 noble 2
was 2
ambitious 2
Sequential File
To access records search serially;
starting at the first record read and investigate all the
succeeding records until the required record is found
or end of the file is reached.
Its main advantages:
easy to implement;
provides fast access to the next record using
lexicographic order.
 Can be searched quickly, using binary search, O(log n)
…continued
Update options: Is the index needs to be rebuilt or
incremental update is supported?
Its disadvantages:
No weights attached to terms.
Random access is slow: since similar terms are
indexed individually, we need to find all terms
that match with the query
Inverted file
A word oriented indexing mechanism based on sorted list
of keywords, with each keyword having links to the
documents containing it
 Building and maintaining an inverted index is a relatively low cost
risk. On a text of n words an inverted index can be built in O(n)
time
 This list is inverted from a list of terms in location order to a list of
terms in alphabetical order.

Word Extraction Word IDs

Original
Documents •W1:d1,d2,d3
•W2:d2,d4,d7,d9
•…
•Wn :di,…dn
Document IDs
•Inverted Files
Example
 Text:
1 6 12 16 18 25 29 36 40 45 54 58 66 70

That house has a garden. The garden has many flowers. The flowers are beautiful

 Inverted file
Vocabulary Occurrences
beautiful 70
flowers 45, 58
garden 18, 29
house 6
• Space Requirements(for inverted file)
• The space required for the vocabulary is rather
small. According to Heaps’ law the vocabulary
grows as O(n), where  is a constant between 0.4
and 0.6 in practice (sublinear)
• On the other hand, the occurrences demand much
more space. Since each word appearing in the text
is referenced once in that structure, the extra space
is O(n)
• To reduce space requirements, a technique called
block addressing is used
•Block Addressing
• The text is divided in blocks
• The occurrences point to the blocks where
the word appears
• Advantages:
– the number of pointers is smaller than positions
– all the occurrences of a word inside a single
block are collapsed to one reference
• Disadvantages:
– online search over the qualifying blocks if exact
positions are required
•Example
• Text:
Block 1 Block 2 Block 3 Block 4

That house has a garden. The garden has many flowers. The flowers are beautiful

• Inverted file:
Vocabulary Occurrences
beautiful 4
flowers 3
garden 2
house 1
Inverted file
Data to be held in the inverted file includes
The vocabulary (List of terms): is the set of all distinct
words (index terms) in the text collection.
Having information about vocabulary (list of terms) speeds
searching for relevant documents
For each term: the inverted file contains information
related to
Location: all the text locations/positions where the word occurs
frequency of occurrence of terms in a document collection
 TFij, number of occurrences of term tj in document di
 DFj, number of documents containing tj
 CF, total frequency of tj in the corpus n
 mi, maximum frequency of any term in di
 n, total number of documents in a collection
 ….
Inverted file

Having information about the location of each term within

the document helps for:
user interface design: highlight location of search term
proximity based ranking: adjacency and near operators (in
Boolean searching)
Having information about frequency is used for:
calculatingterm weighting (like TF, TF*IDF, …)
optimizing query processing
Inverted File
Documents are organized by the terms/words they contain
Term CF Doc ID TF Location
term 1 3 2 1 66
This is called
19 1 213 an index file.
29 1 45
term 2 4 3 1 94
19 2 7, 212 Text operations
are performed
22 1 56
before building
term 3 1 5 1 43 the index.
term 4 3 11 2 3, 70
34 1 40

Is it possible to keep all these information during searching?

Construction of Inverted file
An inverted index consists of two files: vocabulary and
posting files
A vocabulary file (Word list):
 stores all of the distinct terms (keywords) that appear in any of
the documents (in lexicographical order) and
 For each word a pointer to posting file
Records kept for each term j in the word list contains the
following:
 term j
 number of documents in which term j occurs (DFj)
 Collection frequency of term j (Cf)
 pointer to inverted (postings) list for term j
Postings File (Inverted List)
For each distinct term in the vocabulary, stores a list of
pointers to the documents that contain that term.
Each element in an inverted list is called a posting, i.e., the
occurrence of a term in a document
Each list consists of one or many individual postings

Advantage of dividing inverted file:

Keeping a pointer in the vocabulary to the list in the posting
file allows:
 the vocabulary to be kept in memory at search time even for
large text collection, while the Posting file is kept on disk for
accessing to documents
General structure of Inverted File
 The following figure shows the general structure of
inverted index file.
Organization of Index File
Vocabulary
Postings
(word list) Documents
(inverted list)
Pointer
Term DF TF To
posting

term 1 3 3 Inverted
term 2 3 4 lists

term 3 1 1
term 4 2 3
Example:
Given a collection of documents, they are parsed to
extract words and these are saved with the
Document ID.

I did enact Julius

Doc 1 Caesar I was killed
I the Capitol;
Brutus killed me.

So let it be with
Doc 2 Caesar. The noble
Brutus has told you
Caesar was ambitious
Sorting the Vocabulary Term Doc #
Term Doc # ambitious 2
I 1 be 2
 After all documents did 1 brutus 1
enact 1 brutus 2
have been tokenized julius 1 capitol 1
the inverted file is caesar
I
1
1
caesar
caesar
1
2
sorted by terms was
killed
1
1
caesar 2
did 1
I 1 enact 1
the 1 has 1
capitol 1 I 1
brutus 1 I 1
killed 1 I 1
me 1 it 2
so 2
julius 1
let 2
killed 1
it 2
killed 1
be 2
let 2
with 2
caesar 2 me 1
the 2 noble 2
noble 2 so 2
brutus 2 the 1
hath 2 the 2
told 2 told 2
you 2 you 2
caesar 2 was 1
was 2 was 2
ambitious 2 with 2
Remove stop words, stemming & compute
frequency
Multiple term Term Doc #
entries in a ambition 2 Term Doc # TF
single ambition 2 1
brutus 1
document are brutus 1 1
brutus 2
merged and brutus 2 1
capitol 1
frequency capitol 1 1
caesar 1
information
caesar 2 caesar 1 1
added
caesar 2 caesar 2 2
enact 1 enact 1 1
julius 1 julius 1 1
kill 1 kill 1 2
kill 1 noble 2 1
noble 2
Vocabulary and postings file
The file is commonly split into a Dictionary and a Posting file
vocabulary posting
Term Doc # TF Term DF CF Doc # TF
ambition 2 1 2 1
ambitious 1 1
brutus 1 1 1 1
brutus 2 1 brutus 2 2
2 1
capitol 1 1 capitol 1 1
1 1
caesar 1 1 caesar 2 3 1 1
caesar 2 2 enact 1 1 2 2
enact 1 1 julius 1 1 1 1
julius 1 1 kill 1 2 1 1
kill 1 2 1 2
noble 1 1
noble 2 1 2 1

Pointers
Searching on Inverted File
 Since the whole index file is divided into two,
searching can be done faster by loading vocabulary list
which takes less memory even for large document
collection
 Using binary Search the searching takes logarithmic
time
 The search is in the vocabulary lists

 Updating inverted file is complex.

 We need to update both vocabulary and posting files
Example: Create Inverted file
 Map the file names to file IDs
 Consider the following Original Documents
D1 The Department of Computer Science was established in
1984.
D2 The Department launched its first BSc in Computer
Studies in 1987.
D3 Followed by the MSc in Computer Science which was
started in 1991.
D4 The Department also produced its first PhD graduate in
1994.
D5 Our staff have contributed intellectually and
professionally to the advancements in these fields.
Suffix tree
 What is Suffix? A suffix is a substring that exists at the end of the
given string.
 Each position in the text is considered as a text suffix

 If txt=t1t2...ti...tn is a string, then Ti=ti, ti+1...tn is the suffix of txt that starts at
position i, where 1≤ i ≤ n
 Example: txt = mississippi txt = GOOGOL
T1 = mississippi; T1 = GOOGOL
T2 = ississippi; T2 = OOGOL
T3 = ssissippi; T3 = OGOL
T4 = sissippi; T4 = GOL
T5 = issippi; T5 = OL
T6 = ssippi; T6 = L
T7 = sippi;
T8 = ippi; Exercise: generate suffix of “technology” ?
T9 = ppi;
T10 = pi;
T11 = i;
Suffix trie
•A suffix trie is an ordinary trie in which the input
strings are all possible suffixes.
–Principles: The idea behind suffix TRIE is to assign to each
symbol in a text an index corresponding to its position in the
text. (i.e: First symbol has index 1, last symbol has index n
(#of symbols in text).
• To build the suffix TRIE we use these indices instead of the
actual object.
•The structure has several advantages:
–It requires less storage space.
–We do not have to worry how the text is represented (binary,
ASCII, etc).
–We do not have to store the same object twice (no
duplicate).
Suffix Trie
•Construct suffix trie for the following string: GOOGOL
•We begin by giving a position to every suffix in the text starting
from left to right as per characters occurrence in the string.
• TEXT: G O O G O L $
POSITION: 1 2 3 4 5 6 7
•Build a SUFFIX TRIE for all n suffixes of the text.
•Note: The resulting tree has n leaves and height n.

• This structure is
particularly useful
for any application
requiring prefix
based ("starts
with") pattern
matching.
Suffix tree
A suffix tree is an extension of
suffix trie that construct a Trie of
all the proper suffixes of S
 The suffix tree is created by
compacting unary nodes of
the suffix TRIE.
We store pointers rather
than words in the leaves.
 It is also possible to replace
strings in every edge by a pair
(a,b), where a & b are the
beginning and end index of
the string. i.e.
(3,7) for OGOL$
(1,2) for GO
(7,7) for $
Example
 Text:
1 6 12 16 18 25 29 36 40 45 54 58 66 70

That house has a garden. The garden has many flowers. The flowers are beautiful

 Suffix Trie
70
‘b’ ‘ ’ 58
‘l’ ‘o’ ‘w’ ‘e’ ‘r’ ‘s’
‘f’ 45
‘g’ ‘.’
‘e’‘n’ ‘ ’ 29
‘a’ ‘r’ ‘d’
h’ 18
6 ‘.’
Example
 Text:
1 6 12 16 18 25 29 36 40 45 54 58 66 70

That house has a garden. The garden has many flowers. The flowers are beautiful

 Suffix Tree
70
‘b’ 8 ‘.’ 45
1 ‘f’ 58
‘g’ ‘ ’
‘.’ 18
‘h’ 7
‘’ 29
6
Suffix Arrays
 An array containing all the pointers to the text
suffixes listed in lexicographical order
 The space requirements are almost the same as those for
inverted indices
 The main drawbacks of suffix array are its costly
construction process
 Allow binary searches done by comparing the
contents of each pointer
 Supra-indices (for large suffix array)
 The space requirements of suffix array with vocabulary
supra-index are exactly the same as for inverted indices
Example
 Text:
1 6 12 16 18 25 29 36 40 45 54 58 66 70

That house has a garden. The garden has many flowers. The flowers are beautiful

 Vocabulary Supra-Index
beautiful flower garden house

 Suffix Array
70 58 45 29 18 6

 Inverted List
70 45 58 18 29 6
Example: Suffix tree
•Let s=abab, a suffix tree of s is a
compressed trie of all suffixes of s=abab$
•We label each leaf with the
starting point of the
•{ corresponding suffix.
• $ •$
• b$ •ab
•b •5
• ab$ •$
• bab$
• bab$ } •$ •ab$ •4
•ab$
•3
•2
•1
Generalized suffix tree
• Given a set of strings S, a generalized suffix tree of S is a
compressed trie of all suffixes of s  S
•To make suffixes prefix-free we add a special char, $, at the end of
s. To associate each suffix with a unique string in S add a different
special symbol to each s
• Build a suffix tree for the string s1$s2#, where `$' and `#'
are a special terminator for s1,s2.
•Ex.: Let s1=abab & s2=aab, a generalized suffix tree for s1 & s2 is:
•{ •a •$ •#
• $ # •b
•# •5 •4
• b$ b#
•b
• ab$ ab# •ab$ •ab$ •$
• bab$ aab# •3
•ab$ •# •4
• abab$ •$ •1
•2
•} •1 •3 •2
Search in suffix tree
 Searching for all instances of a substring S in a suffix tree is easy
since any substring of S is the prefix of some suffix.
 Pseudo-code for searching in suffix tree:
 Start at root
 Go down the tree by taking each time the corresponding path
 If S correspond to a node, then return all leaves in sub-tree
 the places where S can be found are given by the pointers in all the leaves
in the subtree rooted at x.
 If S encountered a NIL pointer before reaching the end, then S
is not in the tree
Example:
 If S = "GO" we take the GO path and return:
GOOGOL$,GOL$.
 If S = "OR" we take the O path and then we hit a NIL pointer so
"OR" is not in the tree.
Exercise
 Given the following index terms:
worker, word, world, run & information
construct index file using suffix tree?
Suffix Tree Applications
Suffix Tree can be used to solve a large number of string
problems that occur in:
 text-editing,
 free-text search,
 etc.

Some examples of string problems are given below.

 String matching
 Longest Common Substring
 Longest Repeated Substring
 Palindromes
 etc..
Complexity Analysis
 The suffix tree for a string has been built in O(n2)
time.
 Searching is very fast: The search time is linear in the
length of string S.
 The number of leaves is n+1, where n is the number of input
strings.
 Furthermore, in the leaves, we may store either the strings
themselves or pointers to the strings (that is, integers).
 Searching for a substring[1..m], in string[1..n], can be
solved in O(m) time.
 Expensive memory-wise because Suffix trees consume
a lot of space
Signature Files
 Characteristics
 Word-oriented index structures based on hashing
 Low overhead (10%~20% over the text size) at the cost of
forcing a sequential search over the index
 Suitable for not very large texts
 Inverted files outperform signature files for most
applications
Construction and Search
 Word-oriented index structures base on hashing
 Maps words to bit masks of B bits
 Divides the text in blocks of b words each
 The mask is obtained by bitwise ORing the signatures of
all the words in the text block.
 Search
 Hash the query to a bit mask W
 If W & Bi = W, the text block may contain the word
 For all candidate blocks, an online traversal must be
performed to verify if the word is actually there

Introduction To Information Storage and Retrieval: Chapter Four: Indexing Structure
100% (1)
Introduction To Information Storage and Retrieval: Chapter Four: Indexing Structure
34 pages
Indexing for Efficient Retrieval
No ratings yet
Indexing for Efficient Retrieval
26 pages
Understanding Indexing Structures
No ratings yet
Understanding Indexing Structures
38 pages
3 Index Construction
No ratings yet
3 Index Construction
43 pages
Index Construction Guide
No ratings yet
Index Construction Guide
43 pages
Chapter 4 IR
No ratings yet
Chapter 4 IR
56 pages
3 Indexing
No ratings yet
3 Indexing
28 pages
3-Index Construction
No ratings yet
3-Index Construction
43 pages
4 Indexing
No ratings yet
4 Indexing
59 pages
IR Chapter Three
No ratings yet
IR Chapter Three
30 pages
Heaps Law Linguistic Pre-Processing Index Terms
No ratings yet
Heaps Law Linguistic Pre-Processing Index Terms
8 pages
4 Indexing
No ratings yet
4 Indexing
29 pages
Indexing Structures Explained
No ratings yet
Indexing Structures Explained
44 pages
IR Chap3
No ratings yet
IR Chap3
45 pages
Ir Chapter Three
No ratings yet
Ir Chapter Three
41 pages
IR Chapter Three
No ratings yet
IR Chapter Three
59 pages
Completed UNIT-III 20.9.17
No ratings yet
Completed UNIT-III 20.9.17
61 pages
Chapter 3,4, 5 and 6
No ratings yet
Chapter 3,4, 5 and 6
145 pages
Data Structures & Algorithms for IR
No ratings yet
Data Structures & Algorithms for IR
34 pages
Indexing Techniques for Search Engines
No ratings yet
Indexing Techniques for Search Engines
56 pages
Unit 3 Indexing
100% (1)
Unit 3 Indexing
10 pages
IR Chapter Three
No ratings yet
IR Chapter Three
30 pages
Indexing 1
No ratings yet
Indexing 1
61 pages
Unit 2
No ratings yet
Unit 2
10 pages
Indexing and Searching Techniques
No ratings yet
Indexing and Searching Techniques
15 pages
IRS Module5-I
No ratings yet
IRS Module5-I
15 pages
IR Unit 2 Dictionaries and Query Processing
No ratings yet
IR Unit 2 Dictionaries and Query Processing
20 pages
Assignment 3
No ratings yet
Assignment 3
3 pages
L05
No ratings yet
L05
33 pages
Text Indexing
No ratings yet
Text Indexing
11 pages
Ir Mod4 Notes
No ratings yet
Ir Mod4 Notes
19 pages
Course Name: Advanced Information Retrieval
No ratings yet
Course Name: Advanced Information Retrieval
6 pages
FOP Efficiency Indexing 13
No ratings yet
FOP Efficiency Indexing 13
22 pages
Inverted Index Construction Explained
No ratings yet
Inverted Index Construction Explained
21 pages
Lecture 2 Inverted Index PDF
No ratings yet
Lecture 2 Inverted Index PDF
24 pages
Inverted Index
No ratings yet
Inverted Index
13 pages
4.index Construction - New
No ratings yet
4.index Construction - New
46 pages
Efficient IR Indexing Techniques
No ratings yet
Efficient IR Indexing Techniques
153 pages
Unit 2 IR
No ratings yet
Unit 2 IR
13 pages
03lecture 3 - Biomedical IR-indexing
No ratings yet
03lecture 3 - Biomedical IR-indexing
27 pages
Assignment No: 3: Aim: Objective: Theory:-Inverted Index
No ratings yet
Assignment No: 3: Aim: Objective: Theory:-Inverted Index
2 pages
Unit II-1
No ratings yet
Unit II-1
57 pages
Inverted File
No ratings yet
Inverted File
20 pages
03 - Lect3 Search Engines-Part2
No ratings yet
03 - Lect3 Search Engines-Part2
32 pages
115 Ir 9
No ratings yet
115 Ir 9
4 pages
(Wiki) Inverted Index
No ratings yet
(Wiki) Inverted Index
3 pages
Chap5 Index Construction
No ratings yet
Chap5 Index Construction
38 pages
Chapter-2 - Automatic Text Anlysis
No ratings yet
Chapter-2 - Automatic Text Anlysis
67 pages
Inverted Indexing for Information Retrieval
No ratings yet
Inverted Indexing for Information Retrieval
32 pages
Module 1-1
No ratings yet
Module 1-1
12 pages
Inverted Index Construction Explained
No ratings yet
Inverted Index Construction Explained
10 pages
IR Unit III - Notes
No ratings yet
IR Unit III - Notes
18 pages
AI6122 Topic 3.1 - Index
No ratings yet
AI6122 Topic 3.1 - Index
40 pages
Inverted Index in Information Retrieval
No ratings yet
Inverted Index in Information Retrieval
24 pages
Chapter 3
No ratings yet
Chapter 3
90 pages
Understanding Information Retrieval Systems
No ratings yet
Understanding Information Retrieval Systems
18 pages
Multi Media Material
No ratings yet
Multi Media Material
101 pages
Red It
No ratings yet
Red It
30 pages
Chapter 4
No ratings yet
Chapter 4
83 pages
PHP and MySQL for Dynamic Websites
No ratings yet
PHP and MySQL for Dynamic Websites
29 pages
TCP/IP Basics for Network Admins
No ratings yet
TCP/IP Basics for Network Admins
38 pages
Chapter 2
No ratings yet
Chapter 2
58 pages
Chapter 5 Retrieval Efective
No ratings yet
Chapter 5 Retrieval Efective
24 pages
Chapter 2
No ratings yet
Chapter 2
25 pages
4 Year 2 Semester Final Exam Schedule
No ratings yet
4 Year 2 Semester Final Exam Schedule
2 pages
Understanding DNS and ICANN Functions
No ratings yet
Understanding DNS and ICANN Functions
60 pages
Ethics ch1
No ratings yet
Ethics ch1
25 pages
IT-Chapter 1 B PPT 2015-20
No ratings yet
IT-Chapter 1 B PPT 2015-20
21 pages
AI Problem Solving Techniques
No ratings yet
AI Problem Solving Techniques
52 pages
Session Validation
No ratings yet
Session Validation
2 pages
IT Chapter 2 2015
No ratings yet
IT Chapter 2 2015
26 pages
IT Chapter 5 2015
No ratings yet
IT Chapter 5 2015
41 pages
IT Chapter 4 2015
100% (1)
IT Chapter 4 2015
30 pages
IT Chapter 6 2015
No ratings yet
IT Chapter 6 2015
20 pages
MID - Exam For Emerging Technology
100% (5)
MID - Exam For Emerging Technology
4 pages
IT Final Exam Paper 2016
No ratings yet
IT Final Exam Paper 2016
7 pages
Model Exam For Remedial Alliance
No ratings yet
Model Exam For Remedial Alliance
4 pages
Chemistry MODEL EXAM - 2
No ratings yet
Chemistry MODEL EXAM - 2
2 pages
Remedial Chemistry Model Exam 2024
100% (3)
Remedial Chemistry Model Exam 2024
6 pages
Mini Monitor Module Installation Guide: Troubleshooting
No ratings yet
Mini Monitor Module Installation Guide: Troubleshooting
2 pages
Amith Kumar
No ratings yet
Amith Kumar
2 pages
Term Paper Table of Contents
No ratings yet
Term Paper Table of Contents
5 pages
Autocad Mep 2009
100% (1)
Autocad Mep 2009
98 pages
Computers Buyers & Importers in India
No ratings yet
Computers Buyers & Importers in India
7 pages
Beyer Dynamic MCM 800
No ratings yet
Beyer Dynamic MCM 800
2 pages
DS WhitePapers Working With Derived Format Converter
No ratings yet
DS WhitePapers Working With Derived Format Converter
58 pages
Robotics1 MidTerm 2023-24 23.11.15
No ratings yet
Robotics1 MidTerm 2023-24 23.11.15
11 pages
IOS Operating System
No ratings yet
IOS Operating System
14 pages
X Flowww
No ratings yet
X Flowww
3 pages
Danfoss FC 202P22KT4E20H2XGXXXXSXXXXAXBXCXXXXDX 131F6765
No ratings yet
Danfoss FC 202P22KT4E20H2XGXXXXSXXXXAXBXCXXXXDX 131F6765
31 pages
Cybersecurity Incident Simulation
No ratings yet
Cybersecurity Incident Simulation
8 pages
Prashant Kumar CV
No ratings yet
Prashant Kumar CV
4 pages
Joining Form Indigo
No ratings yet
Joining Form Indigo
2 pages
Data Mining Classifiers Guide
No ratings yet
Data Mining Classifiers Guide
23 pages
p7211 e PDF
No ratings yet
p7211 e PDF
4 pages
EyeBall™ Set-Up and Calibration Guide (Large PC - Cammegh
No ratings yet
EyeBall™ Set-Up and Calibration Guide (Large PC - Cammegh
13 pages
MetaboScape 5.0 User Manual
No ratings yet
MetaboScape 5.0 User Manual
272 pages
Gantt Charts in LaTeX with pgfgantt
No ratings yet
Gantt Charts in LaTeX with pgfgantt
103 pages
Backup: (Note 1)
No ratings yet
Backup: (Note 1)
20 pages
Analyzing Function Continuity
No ratings yet
Analyzing Function Continuity
49 pages
Parts of Microsoft Word and Shortcut Keys
No ratings yet
Parts of Microsoft Word and Shortcut Keys
18 pages
Aruba Switches for Partners
No ratings yet
Aruba Switches for Partners
2 pages
Introduction To Informaiton Technology Express Learning by ITL Education Solutions Limited
No ratings yet
Introduction To Informaiton Technology Express Learning by ITL Education Solutions Limited
409 pages
KNX & DALI Lighting Control Training
No ratings yet
KNX & DALI Lighting Control Training
19 pages
Guide - Class Setups - Post-Moon Lord - Official Calamity Mod Wiki
No ratings yet
Guide - Class Setups - Post-Moon Lord - Official Calamity Mod Wiki
15 pages
Associate Operations Engineer
No ratings yet
Associate Operations Engineer
2 pages
India's IT Sector: Growth & Impact
No ratings yet
India's IT Sector: Growth & Impact
4 pages
Digital SAT Math Section Guide
No ratings yet
Digital SAT Math Section Guide
2 pages
SC-300 Exam Prep for IT Pros
No ratings yet
SC-300 Exam Prep for IT Pros
34 pages

Chapter 3 Indexing

Uploaded by

Chapter 3 Indexing

Uploaded by

Indexing: Basic Concepts

 Indexing is an arrangement of index terms to

Modified Linguistic friend roman countryman

Index File Indexer

I did enact Julius

Word Extraction Word IDs

Having information about the location of each term within

Is it possible to keep all these information during searching?

Advantage of dividing inverted file:

I did enact Julius

 Updating inverted file is complex.

Some examples of string problems are given below.

You might also like