XI-1 XI. Document Indexing Based on Relevance Feedback T. L. Brauen, R. C. Holt and T. R. Wilcox Abstract A relatively simple method is proposed in this study for using accumulated relevance feedback information in an automatic information retrieval system to improve precision and recall. This method involves the dynamic modification of relevant document vectors after each retrieval run. Experiments on a collection of 425 documents indicate More that this method does improve precision and recall significantly. experiments must be conducted to perfect the method for large scale practical applications. 1. Introduction User feedback techniques have been developed in an attempt to achieve higher precision and recall in automatic retrieval systems such as SMART. The user gives the system an indication of the relevance to his This information about what the query should query of certain documents. retrieve is used by the system to modify query and document vectors so that the desired documents are retrieved. It is hoped that other documents in the same area of the document space will also be relevant to the query. Current proposals use this relevance information only in the processing of one query. There is, however, reason to believe that this information will also be helpful in the processing of similar queries XI-2 from other users. It seems quite reasonable to assume that a number of knowledgeable users submitting similar queries may select roughly the same set of documents as relevant to their query. If this is indeed the case, it will be of advantage to the system to save relevance information for each of the processed queries. This information can be used as a first approximation to future users1 responses in the feedback process. As the relevance information for a given type of query builds up, this first approximation should become a very good one and thus reduce the number of feedback iterations necessary to satisfy the users' requests. The time saved can be used to continue searching for more relevant documents, thereby improving the recall. Given that feedback information should be retained by the system, it is clear that normal query modification will not achieve this aim since queries are not a permanent part of the system. It is, therefore, natural to focus one's attention on the document vectors since they are a permanent part of any retrieval system. Some work has been done in the area of document vector modification. Friedman, Maceyak and Weiss [1] warp the documents closer to the query and thus improve the chances of a better recall performance. is, however, only temporary. This modification Davis, Linsky and Zelkowitz [2] also modify The modification the entire document space in an effort to improve recall. is permanent. These methods are disadvantageous in that they are time consuming, the modifications being complicated, and requiring the manipulation of the entire document space. These two methods would probably not be feasible in a large retrieval system. XI-3 2. Method The problem is to find a method which retains the feedback information, but which is practical in the sense that it uses limited amounts of time and space regardless of the size of the document collection. procedure. To achieve these ends the authors propose the following The original query vector q is repeatedly modified, using any of the existing relevance feedback techniques, until a query vector q is obtained which retrieves a set of documents D acceptable to the The terms and weights of the document vectors in D are now d, x in D user. modified to decrease the angle between each document vector and the original query vactor (formula (1)) q . Since the cosine correlation n cos (q j=i V j=i j=i (i) " 1 4- \-\ (where d. denotes the j term weight of vector d.) and d., o x a decrease in the angle between vectors increases their correlation. For the original query vector q and its relevant documents d, is used to determine the correlation between the vectors q in D defined above, the document modification process takes place in two steps. For each relevant document vector, the query vector is first normalized to the "length" of the document vector by making the sum of weights of q equal to the corresponding sum for d.. If A is the XI-4 sum of the weights of the query vector, and if A is the sum for the document vector, then the normalized query vector is defined as: a o = q o (A./A ) . d q d. (2) The weights of the relevant document vector according to the formula: are then modified d? = d? + 1 1 a(cp - d?) ^o I where 0 norm i=l log r ± - ) loq r. - \ i«l log i) / log (Nl/tN-n) In!) | log i) / log (N'/(N-n) Jnl)! (4) XI-8 and normalized recall is given by n R = 1 - (l/n)(\ norm / i-1 These normalized measures are used because they reflect the rankings of the relevant documents over the whole collection rather than rankings up to some arbitrary point. This eliminates the need for a subjective (r. - i) / (N-n)). 1 (5) choice of cut-off point which may bias the results. When all queries from group two have been processed, an average normalized precision and recall are computed for all queries in the group. The second group of queries is also correlated with document vectors in the original, unmodified document vector space, and the same normalized precision and recall measures are computed. The average global precision and the average global recall results for the two sets of correlations are compared using a t-test. From this, it is determined whether the differences in the measures observed over the two sets of runs are significant. 4. Experimental Results A number of experiments were performed using the Cranfield 200 collection. In each run, the set of 42 queries was divided into a group of 37 queries, used to modify the document vectors, and a group of 5 queries used to test the effect of these modifications. Comparison of normalized recall and precision results for the 5 testing queries submitted to the modified and unmodified spaces shows no significant change in performance. These results are caused by the small size of the collection. With 37 queries modifying about 4 relevant documents each, each document is modified an average of 0.75 times. Thus, significant modification to the document XI-9 vector space is not achieved, and results are much the same as those for the original document vector space. have overlapping relevant documents. It also appears that few queries As a result, modifications by the first 37 queries have only a small effect on the relevant documents of the 5 test queries. A number of experiments performed using the Cranfield 400 collection produces more favorable results. In each run, the set of 155 queries was partitioned into a group of 124 queries, used to modify the document vectors and a group of 31 queries, used to test the effect of the modifications. Experiments were run using two randomly generated partitions, designated A and B and four values of a, ranging from 0.05 to 0.4. In one special experiment, designated III-A', modifications were made to a null space in which all document vectors were initialized to zero, instead of to the Cranfield document vectors. these experiments are shown in Table 1. The results of ; Run No. a -i ; i I K Ave Normed P r e c i s i o n H Ave Normed R e c a l l Mod % % Unmod Mod 1 Unmod S p a c e S p a c e Change Space S p a c e Change .6274 .6274 .5434 .6294 .6788 .7073 .5766 .7182 +8.2 +12.7 +6.1 +8.1 .8891 .8891 .8439 .8891 .9100 .9231 .8624 .9309 +2.4 +3.8 +2.2 +4.7 IA IIA ! ; I IB IIIA ^ ^ IIIB IVB .05 .10 .10 .25 s&\ 1 yMf 3 4 .54 .25 / & & .5960 .40 1 . 5 4 3 4 . 5 9 6 3 /A*A/7 +9. +9,8 | \/M9i,'j&iffy 1 y/A/y/A +2.8 .8439 .867 .8439 .8595 +1.8 ] Normalized Precision and Recall Results Table 1 * For run IIIA' modifications were made to the null space. XI-10 For the first part of this discussion of results, special experiment IIIA' is ignored. In all runs, the average normalized recall and precision values for the modified document space are considerably higher for the modified document space than the corresponding values for the unmodified space. A t-test applied to the results presented in Table 1 indicates a Standard deviations over significance level of 0.01 or better in all cases. the global precision and recall values for the 31 testing queries were calculated for each run. The standard deviations after the modification This indicates that the modifications were essentially the same as before. provided a uniform improvement across the document space. As can be seen from Table 1, varying a from 0.05 to 0.4 has no significant effect on the amount of improvement. It appears that an in- crease of a from 0.05 to 0.4 produces two conflicting effects on the search results. For small values of a only, small changes are made in relevant document vectors and therefore the modifications have only a small effect on the search results. The fact that significant improvement is nevertheless achieved indicates that the modifications are mostly beneficial to the test queries. In the case of large a rather drastic modifications are made to the document vectors, and therefore the modifications have a large effect on the retrieval performance. At the same time, the drastic extent of the modifications causes a more sporadic and less consistent improvement. That is, beneficial modifications become more beneficial to the document space while any non-beneficial modifications become more harmful. The net result is that variations in a in the experiments had little effect. XI-11 The normalized recall values for the unmodified document space, as given in the table, are close to 0.9. Since the maximum possible global recall is 1.0, a large improvement in the measure cannot be expected. In contrast, the global precision for the unmodified space In the is about 0.6, leaving room for considerable improvement. experiments, normalized recall is improved by three per cent on the average while normalized precision is improved by nine per cent on the average. It would appear that this difference between 3 and 9 per cent reflects characteristics of the measures and the collection rather than a characteristic of the method. The special experiment IIIA1 is different from the rest in that the document vectors in the modified space are derived entirely independently of the Cranfield document vectors. Experiment IIIA1 is identical with experiment IIIA, except that a dummy document space is substituted for the Cranfield document space. The effect of zero document vectors is simulated in the dummy space by initializing the document vectors to a single large term weight which is not used in the Cranfield collection. This simulation is necessary so that during Using the normalization, the query vectors would not be set to zero. normalized recall measure in experiment IIIA1, the performance of the document vectors derived solely from the 124 modifying queries is seen to be surprisingly good — only 8.4 per cent below the performance of the unmodified Cranfield document vectors. One is tempted to hypothesize that, given a few hundred more queries, the performance of the original Cranfield vectors would be surpassed. are necessary to test this hypothesis. Experiments with larger collections XI-12 The most important overall result of these experiments is the fact that in all cases the modified Cranfield document vectors performed significantly better than the unmodified vectors. From Table 1, improvements in normalized precision range from 6.1 to 12.7 per cent, while improvements in normalized recall range from 1.8 to 4.7 per cent. This seems to indicate that in an actual automatic retrieval system, where a very large number of queries are available for modifications and a can be made small, this method could be of practical value. These experiments tend to support the assumption that over a period of time similar query vectors can be expected to be received requesting similar sets of documents. 5. Discussion Several interesting aspects of this method should be mentioned. As non-zero term weights are added to a document vector, the space required to represent the vector increases. In the experiments described here, the modifications added enough non-zero terms so that the storage required for the document space increased by over 25 per cent. In an operating information retrieval system, a routine could be used to compress the modified document space by setting small terms to zero. With continual modification of the document vectors, the original indexing information contained in the document vectors is lost. Equation C3) can be rewritten as follows: d? - tl-ot) d? + aP i i ^o . (6) XI-13 From equation (6) it is obvious that after one modification, a document term d. contains the fraction (1-a) of the original document term information (d.) and the fraction a of query term information (q ) . Regardless of the values of the original document vector and the relevant queries, the fraction of original indexing information remaining in a document vector after n modifications is (1-a) . For example, in experiment IVB, where a = 0.4, the fraction of original indexing information remaining in a document vector after 2 modifications to this document vector is (1-0.4) 2 = 0.36. In an operating information retrieval system, a should be much smaller than 0.4, possibly on the order of 0.01, so that a few "noisy" queries cannot seriously damage the document space, i.e., so that the indexing reflects the judgment of a large number of users. In the Cranfield 400 collection, each query is associated with about 8 relevant documents. Since 124 queries were used for modification, about On the average, then, each document was 992 documents were modified. modified 2 to 3 times. Thus, for a = 0.4, the preceding argument indicates that less than about 0.36 of the original indexing information remains after the modifications. Thus, in experiment IVB, normalized precision and recall were improved by replacing most of the original indexing information in each document vector by information derived from 2 or 3 queries. It may seem disturbing that the information in the original document vectors, which may have been constructed at considerable expense, may eventually be lost to the system. If desired, a simple modification to the proposed method, such as flagging the original document term weights, could insure that these terms would remain intact. However, the advantage of saving terms which are seldom if every used is not obvious. XI-14 The results presented above suggest the following areas for research and application. Information retrieval centers might collect queries with corresponding relevant documents. This information could form the basis for future indexing schemes. The proposed method might be used in a bootstrapping operation. A minimal and inexpensive technique is used for the initial indexing of a document collection. As the system handles more and more queries, the indexing is constantly corrected in a direction which improves performance. Similarly, in a field where the vocabulary is changing, correction can automatically be made in the direction of the new vocabulary. Further research is required to see which of these suggestions is of practical value. References ] Friedman, S. R. , Maceyak, J. A., and Weiss, S. F., A Relevance Feedback System Based on Document Transformation, Report No. ISR-12 to the National Science Foundation, Information Storage and Retrieval, Department of Computer Science, Cornell University, Ithaca, N. Y., Section X, June 1967. Davis, M., Linsky, M., and Zelkowitz, M. , A Relevance Feedback System Employing a Dynamically Evolving Document Space, Computer Science 221 Term Project, Cornell University, Spring 1968. ] XI-16 Appendix The Normal Precision and Normal Recall measures shown in Table 1 are global measures. It is often also helpful to visualize the effects of retrieval runs with the use of non-global measures. The usual recall and precision measures used for this purpose are defined as: No. of relevant documents retrieved D p /~i -a T l — • ._ . No. of relevant documents , . No. of relevant documents retrieved No. of documents retrieved The measures are computed as each document is retrieved for a given query. In the test runs, a record was kept of the first ten documents retrieved for each query. The following graphs show recall-precision measures for four randomly chosen queries with respect to the first ten documents retrieved in the modified and the unmodified document spaces. For each query, three graphs are included, indicating the effects of document space modification by different values of alpha. XI-17 Query 47 Modified a = .05 Unmodified J i L j_ J i ' .2 .3 .4 .5 .6 .7 .8 .9 1.0 Recall Fig. Al Query 47 Modified a = .10 Unmodified J I I I I i L .2 .3 .4 .5 .6 .7 .8 .9 1.0 Recall Fig. A2 XI-18 1.0 Query 47 Modified a = .25 Unmodified i • /1 .9 .8 c-7 •w . 6 2 I- • £.4 .3 .2 .1 i [- J I I I I I I I L .2 .3 .4 .5 .6 .7 .8 .9 1.0 Recall Fig. A3 1.0 < . Query 115 — KAr\r\\f ioH ———^— l Y l U U I I l C U .9 .8 •2.6 " 5 _ . - ' i c .7 a = .05 Unmodified £.4 .2 .3 - ' % > > <> ^ t • 1 1 1 I--1 1 L_ .1 1 1 L_ J 2 .3 .4 .5 .6 .7 .8 .9 1.0 Recall Fig. A4 XI-19 Query 115 Modified a = .1 Unmodified I I ! !. J I I L .2 .3 .4 .5 .6 .7 .8 .9 1.0 j I L j_ Recall Fig. A5 • Query 115 Modified a = .25 Unmodified • ' \ I j i JL I t. J I I I L .4 .5 .6 .7 .8 .9 1.0 L .2 .3 Recall Fig. A6 XI-20 Query 17 Modified a = .05 Unmodified i l - • ]AA /: i .4 • | 3 / •2r ! I L I .2 .3 .4 .5 .6 .7 .8 .9 1.0 j i L x J Recall Fig. I• A7 ,k- • Query 17 Modified A i a = .1 Unmodified I I I L .2 .3 .4 .5 .6 .7 .8 .9 1.0 -L -L J Recall Fig. A8 XI-21 Query 17 Modified a = .25 Unmodified A j J L -L J L -L J_ .1 .2 .3 .4 .5 .6 .7 .8 .9 1.0 Recall Fig. A9 Query 99 Modified a = .05 Unmodified A. -L J I I I I L I .2 .3 .4 .5 .6 .7 .8 .9 1.0 Recoil Fig. A10 Query 99 Modified a =.l Unmodified I I i i ' I I I i I I L .2 .3 .4 .5 .6 .7 .8 .9 1.0 Recall Fig. ih- All • • Query 99 Modified a= .25 Unmodified i j i L j I i i I L I .2 .3 .4 .5 .6 .7 .8 .9 1.0 Recall Fig. A12