XI-1

XI.

Document Indexing Based on Relevance Feedback T. L. Brauen, R. C. Holt and T. R. Wilcox

Abstract A relatively simple method is proposed in this study for using accumulated relevance feedback information in an automatic information retrieval system to improve precision and recall. This method involves

the dynamic modification of relevant document vectors after each retrieval run. Experiments on a collection of 425 documents indicate More

that this method does improve precision and recall significantly. experiments must be conducted to perfect the method for large scale practical applications.

1.

Introduction User feedback techniques have been developed in an attempt to

achieve higher precision and recall in automatic retrieval systems such as SMART. The user gives the system an indication of the relevance to his This information about what the query should

query of certain documents.

retrieve is used by the system to modify query and document vectors so that the desired documents are retrieved. It is hoped that other

documents in the same area of the document space will also be relevant to the query. Current proposals use this relevance information only in the processing of one query. There is, however, reason to believe that this

information will also be helpful in the processing of similar queries

XI-2

from other users.

It seems quite reasonable to assume that a number of

knowledgeable users submitting similar queries may select roughly the same set of documents as relevant to their query. If this is indeed the

case, it will be of advantage to the system to save relevance information for each of the processed queries. This information can be used as a

first approximation to future users1 responses in the feedback process. As the relevance information for a given type of query builds up, this first approximation should become a very good one and thus reduce the number of feedback iterations necessary to satisfy the users' requests. The time saved can be used to continue searching for more relevant documents, thereby improving the recall. Given that feedback information should be retained by the system, it is clear that normal query modification will not achieve this aim since queries are not a permanent part of the system. It is, therefore,

natural to focus one's attention on the document vectors since they are a permanent part of any retrieval system. Some work has been done in the area of document vector modification. Friedman, Maceyak and Weiss [1] warp the documents closer to the query and thus improve the chances of a better recall performance. is, however, only temporary. This modification

Davis, Linsky and Zelkowitz [2] also modify The modification

the entire document space in an effort to improve recall. is permanent.

These methods are disadvantageous in that they are time

consuming, the modifications being complicated, and requiring the manipulation of the entire document space. These two methods would probably not be

feasible in a large retrieval system.

XI-3

2.

Method The problem is to find a method which retains the feedback

information, but which is practical in the sense that it uses limited amounts of time and space regardless of the size of the document collection. procedure. To achieve these ends the authors propose the following The original query vector q is repeatedly modified, using

any of the existing relevance feedback techniques, until a query vector q is obtained which retrieves a set of documents D acceptable to the The terms and weights of the document vectors in D are now d, x in D

user.

modified to decrease the angle between each document vector

and the original query vactor (formula (1))

q .

Since the cosine correlation

n cos (q j=i V j=i j=i
(i)

" 1

4- \-\

(where d. denotes the j

term weight of vector d.)

and d., o x a decrease in the angle between vectors increases their correlation. For the original query vector q and its relevant documents d,

is used to determine the correlation between the vectors

q

in D defined above, the document modification process takes place in two steps. For each relevant document vector, the query vector is first

normalized to the "length" of the document vector by making the sum of weights of q equal to the corresponding sum for d.. If A is the

XI-4

sum of the weights of the query vector, and if

A

is the sum for the

document vector, then the normalized query vector is defined as:

a o

= q o

(A./A ) . d q d.

(2)

The weights of the relevant document vector according to the formula:

are then modified

d? = d? + 1 1

a(cp - d?) ^o I

where 0<a<l.

(3)

By normalizing the query vector as in (2) the above modification is based on vectors of equal length. means d. Since (3) is a linear transformation, this d. a. iq
o

will have the same length as With 0<a<l, the term

and prevents document vectors - d.)
i

from shrinking.*

may be fractional.

In

this case, the fraction is rounded to the nearest integer. Of course, in actual systems, a document term whose weight zero is not stored in the document vector. a non-zero weight in d. is

Thus, if a term appears with

q but is absent in d. , the term is added to the ^o I document vector with a weight proportional to its corresponding weight in the query vector. Terms appearing in both the query and document vectors

lead to modifications based directly on (3), while terms appearing in neither vector are not considered. a
^o

Fig. 1 illustrates the modification process for

q

and

d.
i

in two-space,
^

The length of any document vector Z d? . From (3) a3
1
3

d,

equals the sum of its term weights

J
= a3 + a (or3 1
3

a3)
1
3

O

z a = z a
i

+ a(z q
J since
.

- z a3)
J
. ^o

D D E a3 « Z a3
i . i

. i

o

, i

I q3 = Z a3 D 3
. i

from (2).

D

D

XI-5 Term 2

Term 1 Document Modification Process Fig. 1

Two features of the modification process should be noted.

First,

only the vectors of documents judged relevant to the query are modified. Second, if the same query is submitted by another user, the documents judged relevant by the first user will be retrieved more quickly because the modification has increased their correlation with the query vector. Ideally, most of these documents will also be relevant to the second user, and retrieval performance will therefore be improved. Since queries are phrased in precise terms, similar queries may reasonably be expected to map to similar query vectors. If user B submits

a query similar to, but not the same as, a query previously submitted by user Af the documents judged relevant by user A should be retrieved quickly This is true because the modification process increases the

for user B.

correlation between user A's query vector and his relevant document vectors, and because user B's query vector is similar to user A's query vector.

XI-6

The possibility of improved retrieval performance for resubmitted identical queries under the modification process is obvious, but is of limited value. However, the fact that improved performance may also be expected for queries that are somewhat similar,but not identical, gives this method potential for practical application. This improvement for queries that

are only similar is demonstrated in the experimental results given below. The method takes little time to execute, and so is not costly to implement. Modification of vectors in large systems does, however, raise This is

the possibility of necessary reclustering for search purposes. not as large a concern as may perhaps appear *

In large systems, the per-

centage of document vectors modified for any query will be quite small, and if a is small, any changes made for a query will be slight. Thus,

reclustering need not be undertaken often, and may well be necessary no more frequently than the reclustering required by the addition of new document vectors.

3.

The Experiment The method was tested using two document collections — the Cranfield

200 collection, consisting of 200 documents in aerodynamics, and 42 queries for which relevance judgments are available, and the Cranfield 400 collection, consisting of 425 documents and 155 queries with relevance judgments. The latter is the Cranfield 1400 collection compressed to in-

clude only those documents relevant to at least one query.

XI-7

The following is a summary of the test procedure.

The set of

query vectors, along with a list of their relevant documents, and the set of document vectors are read into the system of testing programs. The query vectors are ordered in a random sequence and divided into two groups. The first group is used by the system to modify the

document vectors while the second group is saved for later comparison. For each query from the first group, the query vector and all its relevant document vectors are recovered from the query and document vector spaces. The a priori knowledge of relevant documents for Each

the query simulates the results of relevance feedback iterations.

relevant document vector is then modified using the normalized query vector according to formula (3). The modified document vectors are then returned to the document vector space. When all relevant document vectors for all queries in group one have been modified, the second group of queries is used to test the effectiveness of the modification. Each query from the second

group is submitted to a program which correlates it with vectors in the modified document space. The ten highest correlating documents are

printed out along with the ranks of the relevant documents. For each query submitted, global (normalized) precision and global (normalized) recall measures are computed. If there are n

relevant documents in a space of N documents, and if r. is the rank of the i relevant document, the normalized precision is given by

n P = 1- I ( \ C > norm i=l log r ± - ) loq r. - \ i«l log i) / log (Nl/tN-n) In!) | log i) / log (N'/(N-n) Jnl)! (4)

XI-8

and normalized recall is given by n R = 1 - (l/n)(\ norm / i-1 These normalized measures are used because they reflect the rankings of the relevant documents over the whole collection rather than rankings up to some arbitrary point. This eliminates the need for a subjective (r. - i) / (N-n)). 1 (5)

choice of cut-off point which may bias the results. When all queries from group two have been processed, an average normalized precision and recall are computed for all queries in the group. The second group of queries is also correlated with document vectors in the original, unmodified document vector space, and the same normalized precision and recall measures are computed. The average global precision

and the average global recall results for the two sets of correlations are compared using a t-test. From this, it is determined whether the differences

in the measures observed over the two sets of runs are significant.

4.

Experimental Results A number of experiments were performed using the Cranfield 200

collection.

In each run, the set of 42 queries was divided into a group

of 37 queries, used to modify the document vectors, and a group of 5 queries used to test the effect of these modifications. Comparison of normalized

recall and precision results for the 5 testing queries submitted to the modified and unmodified spaces shows no significant change in performance. These results are caused by the small size of the collection. With 37

queries modifying about 4 relevant documents each, each document is modified an average of 0.75 times. Thus, significant modification to the document

XI-9

vector space is not achieved, and results are much the same as those for the original document vector space. have overlapping relevant documents. It also appears that few queries As a result, modifications by

the first 37 queries have only a small effect on the relevant documents of the 5 test queries. A number of experiments performed using the Cranfield 400 collection produces more favorable results. In each run, the set of

155 queries was partitioned into a group of 124 queries, used to modify the document vectors and a group of 31 queries, used to test the effect of the modifications. Experiments were run using two randomly generated

partitions, designated A and B and four values of a, ranging from 0.05 to 0.4. In one special experiment, designated III-A', modifications

were made to a null space in which all document vectors were initialized to zero, instead of to the Cranfield document vectors. these experiments are shown in Table 1. The results of

; Run No. a

-i

;

i

I

K Ave Normed P r e c i s i o n H Ave Normed R e c a l l Mod % % Unmod Mod 1 Unmod S p a c e S p a c e Change Space S p a c e Change .6274 .6274 .5434 .6294 .6788 .7073 .5766 .7182 +8.2 +12.7 +6.1 +8.1 .8891 .8891 .8439 .8891 .9100 .9231 .8624 .9309 +2.4 +3.8 +2.2 +4.7

IA IIA

!

;

I IB IIIA ^ ^ IIIB IVB

.05 .10 .10 .25

s&\ 1 yMf 3 4 .54 .25

/ & & .5960 .40 1 . 5 4 3 4 . 5 9 6 3

/A*A/7 +9.
+9,8 |

\/M9i,'j&iffy 1 y/A/y/A +2.8 .8439 .867
.8439 .8595

+1.8

]

Normalized Precision and Recall Results Table 1

* For run IIIA' modifications were made to the null space.

XI-10

For the first part of this discussion of results, special experiment IIIA' is ignored. In all runs, the average normalized recall and precision

values for the modified document space are considerably higher for the modified document space than the corresponding values for the unmodified space. A t-test applied to the results presented in Table 1 indicates a Standard deviations over

significance level of 0.01 or better in all cases.

the global precision and recall values for the 31 testing queries were calculated for each run. The standard deviations after the modification This indicates that the modifications

were essentially the same as before.

provided a uniform improvement across the document space. As can be seen from Table 1, varying a from 0.05 to 0.4 has no significant effect on the amount of improvement. It appears that an in-

crease of a from 0.05 to 0.4 produces two conflicting effects on the search results. For small values of a only, small changes are made in relevant

document vectors and therefore the modifications have only a small effect on the search results. The fact that significant improvement is nevertheless

achieved indicates that the modifications are mostly beneficial to the test queries. In the case of large a rather drastic modifications are made

to the document vectors, and therefore the modifications have a large effect on the retrieval performance. At the same time, the drastic extent

of the modifications causes a more sporadic and less consistent improvement. That is, beneficial modifications become more beneficial to the document space while any non-beneficial modifications become more harmful. The

net result is that variations in a in the experiments had little effect.

XI-11

The normalized recall values for the unmodified document space, as given in the table, are close to 0.9. Since the maximum possible

global recall is 1.0, a large improvement in the measure cannot be expected. In contrast, the global precision for the unmodified space In the

is about 0.6, leaving room for considerable improvement.

experiments, normalized recall is improved by three per cent on the average while normalized precision is improved by nine per cent on the average. It would appear that this difference between 3 and 9 per cent

reflects characteristics of the measures and the collection rather than a characteristic of the method. The special experiment IIIA1 is different from the rest in that the document vectors in the modified space are derived entirely independently of the Cranfield document vectors. Experiment IIIA1 is

identical with experiment IIIA, except that a dummy document space is substituted for the Cranfield document space. The effect of zero

document vectors is simulated in the dummy space by initializing the document vectors to a single large term weight which is not used in the Cranfield collection. This simulation is necessary so that during Using the

normalization, the query vectors would not be set to zero.

normalized recall measure in experiment IIIA1, the performance of the document vectors derived solely from the 124 modifying queries is seen to be surprisingly good — only 8.4 per cent below the performance of the unmodified Cranfield document vectors. One is tempted to hypothesize

that, given a few hundred more queries, the performance of the original Cranfield vectors would be surpassed. are necessary to test this hypothesis. Experiments with larger collections

XI-12

The most important overall result of these experiments is the fact that in all cases the modified Cranfield document vectors performed significantly better than the unmodified vectors. From Table 1, improvements in normalized precision range from 6.1 to 12.7 per cent, while improvements in normalized recall range from 1.8 to 4.7 per cent. This seems to indicate that in an actual automatic retrieval system, where a very large number of queries are available for modifications and a can be made small, this method could be of practical value. These experiments tend to support the assumption that over a period of time similar query vectors can be expected to be received requesting similar sets of documents.

5. Discussion Several interesting aspects of this method should be mentioned. As non-zero term weights are added to a document vector, the space required to represent the vector increases. In the experiments described here, the

modifications added enough non-zero terms so that the storage required for the document space increased by over 25 per cent. In an operating information

retrieval system, a routine could be used to compress the modified document space by setting small terms to zero. With continual modification of the document vectors, the original indexing information contained in the document vectors is lost. Equation C3) can be rewritten as follows: d? - tl-ot) d? + aP
i i
^o

.

(6)

XI-13

From equation (6) it is obvious that after one modification, a document term d. contains the fraction (1-a) of the original document term

information (d.) and the fraction a of query term information (q ) . Regardless of the values of the original document vector and the relevant queries, the fraction of original indexing information remaining in a document vector after n modifications is (1-a) . For example, in

experiment IVB, where a = 0.4, the fraction of original indexing information remaining in a document vector after 2 modifications to this document vector is (1-0.4) 2 = 0.36. In an operating information retrieval system,

a should be much smaller than 0.4, possibly on the order of 0.01, so that a few "noisy" queries cannot seriously damage the document space, i.e., so that the indexing reflects the judgment of a large number of users. In the Cranfield 400 collection, each query is associated with about 8 relevant documents. Since 124 queries were used for modification, about On the average, then, each document was

992 documents were modified. modified 2 to 3 times.

Thus, for a = 0.4, the preceding argument indicates

that less than about 0.36 of the original indexing information remains after the modifications. Thus, in experiment IVB, normalized precision and recall

were improved by replacing most of the original indexing information in each document vector by information derived from 2 or 3 queries. It may seem disturbing that the information in the original document vectors, which may have been constructed at considerable expense, may eventually be lost to the system. If desired, a simple modification to

the proposed method, such as flagging the original document term weights, could insure that these terms would remain intact. However, the advantage

of saving terms which are seldom if every used is not obvious.

XI-14

The results presented above suggest the following areas for research and application. Information retrieval centers might collect queries with

corresponding relevant documents. This information could form the basis for future indexing schemes. The proposed method might be used in a bootstrapping operation. A minimal and inexpensive technique is used for the initial indexing of a document collection. As the system handles more and more queries, the indexing is constantly corrected in a direction which improves performance. Similarly, in a field where the vocabulary is changing, correction can automatically be made in the direction of the new vocabulary. Further research is required to see which of these suggestions is of practical value.

References

]

Friedman, S. R. , Maceyak, J. A., and Weiss, S. F., A Relevance Feedback System Based on Document Transformation, Report No. ISR-12 to the National Science Foundation, Information Storage and Retrieval, Department of Computer Science, Cornell University, Ithaca, N. Y., Section X, June 1967. Davis, M., Linsky, M., and Zelkowitz, M. , A Relevance Feedback System Employing a Dynamically Evolving Document Space, Computer Science 221 Term Project, Cornell University, Spring 1968.

]

XI-16

Appendix

The Normal Precision and Normal Recall measures shown in Table 1 are global measures. It is often also helpful to visualize the effects of

retrieval runs with the use of non-global measures. The usual recall and precision measures used for this purpose are defined as:

No. of relevant documents retrieved
D p /~i -a T l
— • ._ .

No. of relevant documents

, .

No. of relevant documents retrieved No. of documents retrieved

The measures are computed as each document is retrieved for a given query. In the test runs, a record was kept of the first ten documents retrieved for each query. The following graphs show recall-precision

measures for four randomly chosen queries with respect to the first ten documents retrieved in the modified and the unmodified document spaces. For each query, three graphs are included, indicating the effects of document space modification by different values of alpha.

XI-17

Query 47 Modified a = .05 Unmodified

J i L j_ J i ' .2 .3 .4 .5 .6 .7 .8 .9 1.0 Recall
Fig.
Al

Query 47 Modified a = .10 Unmodified

J I I I I i L .2 .3 .4 .5 .6 .7 .8 .9 1.0 Recall
Fig.
A2

XI-18

1.0

Query 47 Modified a = .25 Unmodified
i • /1

.9 .8
c-7 •w . 6 2
I- •

£.4
.3 .2 .1

i
[-

J

I

I

I

I

I

I

I

L

.2 .3

.4 .5 .6 .7 .8 .9 1.0

Recall
Fig. A3

1.0

<

.

Query 115
— KAr\r\\f ioH ———^— l Y l U U I I l C U

.9 .8

•2.6
" 5
_ . - '
i

c

.7

a = .05 Unmodified

£.4
.2

.3 -

'

%

> > <> ^ t

•
1 1 1

I--1 1 L_

.1 1 1 L_ J

2 .3 .4 .5 .6 .7 .8 .9 1.0

Recall
Fig. A4

XI-19

Query 115 Modified a = .1 Unmodified

I

I !

!.

J I I L .2 .3 .4 .5 .6 .7 .8 .9 1.0
j I
L
j_

Recall
Fig.
A5

•

Query 115 Modified a = .25 Unmodified

• ' \

I
j i
JL

I t.
J I I I L .4 .5 .6 .7 .8 .9 1.0

L

.2 .3

Recall
Fig.
A6

XI-20

Query 17 Modified a = .05 Unmodified
i l -

•

]AA
/:
i

.4
• |
3

/

•2r

!

I L I .2 .3 .4 .5 .6 .7 .8 .9 1.0
j i
L

x J

Recall
Fig.
I•

A7

,k- •

Query 17 Modified

A
i

a = .1
Unmodified

I I I L .2 .3 .4 .5 .6 .7 .8 .9 1.0
-L

-L J

Recall
Fig.
A8

XI-21

Query 17 Modified a = .25 Unmodified

A
j
J
L

-L

J

L

-L

J_

.1 .2 .3 .4 .5 .6 .7 .8 .9 1.0

Recall
Fig.
A9

Query 99 Modified a = .05 Unmodified

A.

-L

J

I

I

I

I

L

I .2 .3 .4 .5 .6 .7 .8 .9 1.0

Recoil
Fig.
A10

Query 99 Modified

a =.l
Unmodified

I I

i

i ' I I I i I I L .2 .3 .4 .5 .6 .7 .8 .9 1.0

Recall
Fig.
ih-

All

•

•

Query 99 Modified a= .25 Unmodified

i

j

i

L

j

I

i

i

I

L

I .2 .3 .4 .5 .6 .7 .8 .9 1.0

Recall
Fig.
A12