XII. Query Splitting in Relevance Feedback Systems

A. Borodin, L. Kerr and F. Lewis

Abstract

A modification of normal relevance feedback is presented. It is suggested that instead of simply modifying a query on each iteration, two or more new queries may be formed. Evaluation of experimental results shows that this method produces some improvement. A new measure is introduced which permits extrapolation of these results to large document collections. With regard to this measure, query splitting appears to be best suited for large systems.

1. Introduction

Query splitting is an extension of the standard relevance feedback procedure. Instead of simply modifying a query on each iteration, two or more new queries may be formed. Such a procedure often provides improved results.

Consider the example shown in Fig. 1. Q_0 is the original query. If a simple feedback algorithm moves the query toward one of the groups of relevant documents, then the relevant documents in the other group will not be retrieved. In fact, the query may not move significantly toward either group. Ideally, one would like to replace Q_0 with two queries, Q_1 and Q_2. The user then makes relevance judgments on the documents retrieved by Q_1 and Q_2 separately, and a relevance feedback algorithm can be applied to Q_1 and Q_2 separately, producing new queries Q_1' and Q_2'.

Fig. 1: Query Splitting Example (x = relevant document, Δ = query)

Query splitting is beneficial when the relevant documents are located in distinct groups in the document space. Such a grouping is not merely hypothetical. User requests may be general and deal with more than one topic. Moreover, the structure of the document space is based on predetermined correlation judgments which do not necessarily reflect the user's conception of relevance. Query splitting is not always required, of course, and indiscriminate use of the method may lead to inefficiencies. Part of the query splitting strategy, therefore, is to decide when a query should be split.

2. The Query Splitting Algorithm

As mentioned above, query splitting is embedded within a relevance feedback system. The system first reads in the user's initial query, Q_0. New queries are then formed iteratively, as follows:

1. The documents in the collection are ranked according to their correlation with Q_i.

2. The five highest ranking documents, not previously retrieved, are presented to the user for relevance judgments.

3. The user indicates to the system which of the retrieved documents were relevant.

4. The system examines these relevant documents to see if they form distinct groups. This decision is based on the document-document correlations of the relevant documents, relative to the average correlation between Q_i and the first five documents retrieved by Q_i. If a document-document correlation does not exceed a constant (TP) times the average query-document correlation, then the two documents are considered to belong to separate groups. In this way, the system clusters the relevant documents into zero (if no relevant documents are retrieved), one, or several groups.

5. For each group j a new query Q_{i+1} is formed according to the relevance feedback formula

   Q_{i+1} = Q_i + Σ_j r_j − Σ_k n_k

   where {r_j} represents the relevant documents in the group and {n_k} represents the two highest ranking nonrelevant documents retrieved by Q_i. If Q_i retrieved no new relevant documents, then just one new query is formed, using the negative feedback formula

   Q_{i+1} = Q_i − Σ_k n_k

   where {n_k} again represents the two top nonrelevant documents. (A small sketch of the grouping test and query construction follows this list.)

6. The above steps are then repeated for each of the new queries separately, with the exception that in step 2 only three documents are retrieved from each split query. This avoids a proliferation of retrieved documents when query splitting occurs.
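As a concrete illustration of steps 4 and 5, the following sketch groups the relevant documents with a TP threshold and forms one feedback query per group. It is a minimal sketch of one possible reading of the algorithm, not the program used in the experiments; the vector representation, the single-link interpretation of the grouping test, and all function and variable names are assumptions introduced here.

```python
import numpy as np

def cosine(a, b):
    """Cosine correlation between two term-weight vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def split_into_groups(rel_docs, query, retrieved, tp=0.75):
    """Step 4: cluster the relevant documents into groups.

    Two documents are kept in the same group only when their document-document
    correlation exceeds TP times the average correlation between the query and
    the retrieved documents; groups are built transitively (single link).
    """
    avg_qd = np.mean([cosine(query, d) for d in retrieved])
    threshold = tp * avg_qd

    groups = []
    for d in rel_docs:
        for g in groups:
            if any(cosine(d, member) > threshold for member in g):
                g.append(d)
                break
        else:                       # too weakly correlated with every group -> new group
            groups.append([d])
    return groups

def split_queries(query, groups, nonrel_docs):
    """Step 5: one new query per group, Q_{i+1} = Q_i + sum(r_j) - sum(n_k);
    negative feedback only (Q_i - sum(n_k)) when no relevant documents were found."""
    neg = sum(nonrel_docs[:2])      # two highest-ranking nonrelevant documents
    if not groups:
        return [query - neg]
    return [query + sum(g) - neg for g in groups]
```

Under this reading, two relevant documents whose correlation falls below the threshold end up in separate groups unless a chain of more strongly correlated relevant documents links them.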
The query splitting algorithm used in the experimental analysis was not as specific as the one presented above. It allowed for more extensive query splitting as well as a more general feedback formula. A description of the program is given in the Appendix. The algorithm described above was derived by using the strategy which gave the best results after trying a number of different parameter values.

3. Evaluation and Results

In order to implement and test the query splitting strategy, a program was written to simulate the feedback portion of the SMART system. The Cranfield thesaurus collection of 200 documents and 42 queries was used to compare various strategies. Final evaluation is provided by a comparison of the performance of relevance feedback with and without the query splitting strategy.

Twenty-four of the forty-two queries in the Cranfield collection produced some query splitting (i.e. two or more relevant documents were retrieved on some iteration). Evaluation is restricted to these twenty-four queries, since for the other queries regular feedback and query splitting perform identically. The following methods were compared:

1. Regular feedback.

2. Query splitting with the threshold parameter TP = 1.5.

3. Query splitting with TP = 0.75 (results in fewer splits).

One possible measure of performance is a "user measure" in which recall and precision values are determined from the order in which the user receives the documents. Conventions for such ordering are not difficult to establish. Table 1 below shows the improvement over regular feedback in the number of relevant documents retrieved, as a function of the number of retrieved documents. Only the queries for which such differences appear are listed. These results tend to favor query splitting, especially for larger numbers of retrieved documents.

Table 1: The User Measure. Increase in the number of relevant documents retrieved over regular feedback for the affected queries (3, 5, 7, 13, 15, 16, 28, 31, 35, 36, 38), for TP = 1.5 and TP = 0.75, at 5, 10, 15, 20, and 25 retrieved documents. Total improvement for TP = 1.5: 0, -2, -2, +2, +3; for TP = 0.75: 0, 0, +1, +1, +2.

Another measure, better suited for evaluating overall system performance, is obtained as follows: recall and precision values are computed for the ranking of documents in each iteration, considering previously retrieved documents to be removed from the document space.
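One way such an iteration-by-iteration evaluation could be computed is sketched below. The exact handling of the removed documents and all names are assumptions made for illustration, not a description of the SMART evaluation routines.

```python
def iteration_recall_precision_curve(ranking, relevant, already_retrieved):
    """Recall/precision pairs down the ranking for one feedback iteration.

    Documents retrieved on earlier iterations are treated as removed from the
    document space: they are dropped from the ranking, and only the relevant
    documents not yet seen count toward recall.
    """
    remaining_ranking = [d for d in ranking if d not in already_retrieved]
    remaining_relevant = relevant - already_retrieved

    curve, hits = [], 0
    for rank, doc in enumerate(remaining_ranking, start=1):
        if doc in remaining_relevant:
            hits += 1
        recall = hits / len(remaining_relevant) if remaining_relevant else 1.0
        precision = hits / rank
        curve.append((recall, precision))
    return curve

# e.g. second iteration: pass the new ranking, the relevant-document ids,
# and the set of documents already shown on the first iteration.
```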
The recall-precision curves averaged over 24 search requests are shown in Figs. 2 and 3. Once again, general improvement for 0.75 query splitting is illustrated, while improvement for 1.5 query splitting is restricted to the higher recall range.

There exist two reasons for believing that the previous results understate the value of query splitting:

1. The Cranfield 200 is a small homogeneous collection which provides neither the diversification of document space nor the user population which leads to the type of general request mentioned in the introduction.

2. The measures used, especially the user measure, are too "sensitive" to the large number of documents retrieved in proportion to the size of the document space.

Fig. 4 illustrates how sensitive Query 7 is to the number of retrieved documents per iteration. Gross differences in recall and precision occur depending on whether five or ten documents are retrieved after the first iteration. Moreover, Fig. 4 shows that Document 95 would never be retrieved with regular feedback procedures in a large collection if the document rankings were kept proportionate. This is in contrast to the performance of query splitting illustrated in Fig. 4b. The document-document correlations for the relevant documents of Query 7 are shown in Fig. 5.

Fig. 2: Query Splitting (TP = 0.75) vs. Regular Feedback — precision against recall, averaged over 24 queries, for the 2nd, 3rd, and 4th iterations.

Fig. 3: Query Splitting (TP = 1.5) vs. Regular Feedback — precision against recall, averaged over 24 queries, for the 2nd, 3rd, and 4th iterations.

Fig. 4: Sensitivity to the number of retrieved documents per search iteration for Query 7 — document rankings by iteration for regular feedback and for query splitting (TP = 0.75) with its split subqueries; R indicates a relevant document and * indicates a retrieved document.

A new measure is introduced which attempts to capture the notion of "relative improvement" between iterations. Consider a relevant document with initial rank R. Suppose one iteration of relevance feedback causes the document to attain a new rank of R/3. Then, if subsequent iterations sustain this "rate of convergence", the number of iterations required to retrieve the document is the least i which satisfies

   (1/3)^i · R ≤ n

where n is the number of documents given to the user. For this example, the rate of convergence r is 1/3.

In the case of query splitting, the descendants become independent queries, each with its own rate of convergence for any previously unseen relevant document. The best of these rates is taken to be the true rate of convergence. This is justifiable, since query splitting is designed to iterate towards individual groups, and the document need only be retrieved by one of the split queries. However, to be consistent with regular feedback, a document is considered to be retrieved only when its rank becomes less than n/S, where S is the number of queries into which the original query was split.

Fig. 5: Correlations between the relevant documents of Query 7 (Fig. 4), classified as weak, medium, or strong.

Formally, let r be the expected rate of convergence, and assume that r is maintained throughout all iterations. This assumption may not be completely valid; however, it was found experimentally that r is maintained at least as well for query splitting as it is with regular feedback. A document with initial rank R will be retrieved within i iterations if

   r^i · R ≤ n/S

where S is the expected number of queries generated in the splitting process. Algebraic manipulation yields

   i ≥ [log (R/n) + log S] / (− log r).

Since r < 1, log r is negative. This motivates defining a rate of improvement M = − log r.

From the experimental results, values of r and S were obtained for each query, for the three methods. Then i was computed in terms of a parameter K' = log (R/n). These values of i were averaged over the seventeen queries of the original twenty-four which did not retrieve all relevant documents on the first iteration, and the average, I, was plotted against K' for each method. The results are shown in Fig. 6.

Fig. 6: Expected iterations required for relevant document retrieval — the average number of iterations I plotted against K' (= log R/n) for query splitting with TP = 1.5 and TP = 0.75 and for regular feedback; each method is fitted by a straight line (slopes of roughly 4.40, 4.90, and 5.21 respectively).

Since I (the average number of iterations) is an indication of retrieval effort, the best method is the one which yields minimal values of I. The curves of Fig. 6 show that query splitting is not beneficial unless K' > 2.2, i.e. R/n > 150. Since R is of the same order of magnitude as the size of the document collection, this means that in large collections query splitting would probably produce better results than those obtained with the Cranfield 200, for which K' ≈ 1.
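As a worked example of this measure, the sketch below computes the least i satisfying r^i · R ≤ n/S together with K'. The inequality is the one derived above; the base-10 logarithm (consistent with K' > 2.2 corresponding to R/n > 150) and the function names are assumptions.

```python
import math

def convergence_rate(initial_rank, next_rank):
    """Rate of convergence r: the factor by which one feedback iteration
    reduces the rank of a relevant document (e.g. R -> R/3 gives r = 1/3)."""
    return next_rank / initial_rank

def expected_iterations(R, n, r, S=1):
    """Least i with r**i * R <= n/S, assuming the rate r is sustained.

    R : initial rank of the relevant document
    n : number of documents given to the user per iteration
    r : rate of convergence (best rate over the split queries)
    S : number of queries the original query was split into
    """
    if r >= 1.0:
        return math.inf                 # the document never converges
    return math.ceil((math.log10(R / n) + math.log10(S)) / (-math.log10(r)))

def k_prime(R, n):
    """K' = log10(R/n); K' > 2.2 corresponds to R/n > ~150."""
    return math.log10(R / n)

# Example: initial rank 100, 5 documents shown, r = 1/3, no split:
print(expected_iterations(R=100, n=5, r=1/3))   # -> 3, since (1/3)**3 * 100 ~ 3.7 <= 5
print(k_prime(100, 5))                          # -> ~1.3
```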
4. Conclusions and Suggestions for Further Research

As in all systems, one must weigh expected gains against the cost of a proposed improvement. Although the query splitting algorithm is easy to implement, the cost of the increased computing time may prove prohibitive. However, the algorithm can be readily modified for use with clustered document spaces. In this form, it should be realistically considered for use with the large collections for which query splitting appears to be most appropriate.

The following topics are suggested for further research:

1. Some scheme for dropping nonproductive queries would result in improved precision and a reduction in computing time (one possible such rule is sketched after this list).

2. A potentially useful modification would consist in generating, by negative feedback, an additional query on each iteration. Such a strategy might retrieve documents which could not be retrieved in any other way. These queries would have a higher probability of being nonproductive, however, so this modification should be implemented in conjunction with suggestion 1.

3. A better understanding of the topology of relevant documents in the document space might lead to more complex but more efficient query splitting strategies.
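Suggestion 1 is not specified further in the report; purely as an illustration, one possible pruning rule is to drop any split query that has gone some fixed number of iterations without retrieving a new relevant document. The rule, the threshold, and the names below are hypothetical.

```python
def prune_nonproductive(queries, barren_iterations, max_barren=2):
    """Hypothetical pruning rule: keep only those split queries that have
    retrieved at least one new relevant document within the last
    `max_barren` iterations.

    queries           : list of active query identifiers
    barren_iterations : dict mapping query id -> consecutive iterations
                        without a new relevant document
    """
    return [q for q in queries if barren_iterations.get(q, 0) < max_barren]
```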
References

[1] S. Friedman, J. Maceyak, S. Weiss, A Relevance Feedback System Based on Document Transformation, Information Storage and Retrieval, Report ISR-12 to the National Science Foundation, Section X, Department of Computer Science, Cornell University, June 1967.

[2] H. Hall and N. Weiderman, The Evaluation Problem in Relevance Feedback Systems, Information Storage and Retrieval, Report ISR-12 to the National Science Foundation, Section XII, Department of Computer Science, Cornell University, June 1967.

[3] E. Ide, User Interaction with an Automated Information Retrieval System, Information Storage and Retrieval, Report ISR-12 to the National Science Foundation, Section VIII, Department of Computer Science, Cornell University, June 1967.

[4] J. Kelly, Negative Response Relevance Feedback, Information Storage and Retrieval, Report ISR-12 to the National Science Foundation, Section IX, Department of Computer Science, Cornell University, June 1967.

[5] G. Salton, Search and Retrieval Experiments in Real-Time Information Retrieval, Technical Report 68-8, Department of Computer Science, Cornell University, February 1968.

Appendix: A Brief Introduction to the FALTER System

1. Introduction

The FALTER system was developed as a programming device used to test the feasibility of query splitting. In order to check this idea thoroughly, a general purpose experimental system was required. Flexibility, simplicity, and modularity are maintained throughout the system. It consists of ten independent subroutines controlled by a main program. The interfaces between the routines are exceedingly simple so that modifications and substitutions can be made easily. Every aspect of the relevance feedback portion is input controlled. Also, the amount of output can be indicated through input parameters. These features are described below.

2. General Algorithm

The FALTER system is built on a skeleton of relevance feedback, producing new queries according to the formula

   Q_{i+1} = π·Q_i + ω·Q_0 + (α/lna)·Σ_{j=1}^{na} r_j − (μ/lnb)·Σ_{k=1}^{nb} n_k

where {r_j} and {n_k} are the relevant and nonrelevant documents retrieved by Q_i, and π, ω, α, μ, na, lna, nb, and lnb correspond to the feedback constants PI, OMEGA, ALPHA, MU, NA, LNA, NB, and LNB read in by the main program.

A method of query splitting has been added to the standard relevance feedback mechanism. This is done by first clustering the relevant documents retrieved by the query, using the correlation coefficient for this purpose. The above formula is applied to produce the new queries, taking one cluster at a time.
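A minimal vector sketch of this feedback step, under the reconstruction of the formula given above, is shown below. The sign of the nonrelevant term, the treatment of lna and lnb as the numbers of documents actually used, and all names are assumptions.

```python
import numpy as np

def falter_feedback(q_i, q_0, rel_docs, nonrel_docs,
                    pi=1.0, omega=0.0, alpha=1.0, mu=1.0, na=None, nb=2):
    """One FALTER-style feedback step:
    Q_{i+1} = pi*Q_i + omega*Q_0 + (alpha/lna)*sum(rel) - (mu/lnb)*sum(nonrel)."""
    rel = rel_docs if na is None else rel_docs[:na]     # relevant documents used
    non = nonrel_docs[:nb]                              # nonrelevant documents used
    lna, lnb = max(len(rel), 1), max(len(non), 1)       # normalizing counts

    new_q = pi * q_i + omega * q_0
    if rel:
        new_q = new_q + (alpha / lna) * np.sum(rel, axis=0)
    if non:
        new_q = new_q - (mu / lnb) * np.sum(non, axis=0)
    return new_q
```

Under this reading, setting pi = 1, omega = 0, alpha = lna, and mu = lnb reduces the step to the Q_i + Σ r_j − Σ n_k formula used in the body of the report.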
3. System Operation

The main portions of the system are indicated by the flow chart of Fig. 7. Some of the options in the system are described below:

a) All of the constants (ALPHA, MU, NA, etc.) are read in by the program.

b) The numbers of iterations desired, documents retrieved, and queries processed are also read in.

c) Splitting is controlled by input parameters.

d) Three levels of output are provided as a trace feature. These levels are selected by input.

A listing of the routines follows:

1. Main Program — controls operation, input and output. Variables involved:
   PI, ALPHA, MU, OMEGA, NA, LNA, NB, LNB — feedback constants
   NUMIT — number of iterations
   IRUN — number of queries
   NUM — number of documents retrieved per iteration
   HINUM — number of documents retrieved without a split
   LONUM — number of documents retrieved per query on a split
   NNUM — length of the output chart
   ISW — 0: positive feedback only; 1: standard split, no negative feedback; 2: regular feedback, no split
   ITRACE — -1: final output only; 0: slight trace; 1: much output
   TP — controls splitting (set to 0.75)

2. NEWQRY — reads in information about a query:
   TITLE — the query itself
   NUMSEQ — its sequence number (1, 2, 3, etc.)
   NUMCOL — its "collection number"
   CONCPT — the concepts
   WGHT — their weights
   SUMSQ — sum of weights squared
   NUMREL — number of relevant documents
   RELDOC — the relevant documents

3. INDOCS — reads the documents into the document space (ISPACE) and the display table (IDISP); each entry consists of a sequence number, the number of concepts, and the concept/weight pairs.
   SOCSQ — sum of weights squared for the documents

4. CORLAT — computes a document-document correlation according to
   Σ_i d_i·d'_i / sqrt(Σ_i d_i² · Σ_i d'_i²)
   (a small sketch of this computation follows the listing).

5. COSINE — computes a query-document correlation according to
   Σ_i d_i·q_i / sqrt(Σ_i d_i² · Σ_i q_i²).
   The result is a list of the document numbers (DOCNUM) and the correlations (CORR).

6. SORT — simple n²/2 sort on CORR and DOCNUM.

7. PUTOUT — produces the output chart LIST, made up from DOCNUM with correlations in ACORR and relevance indications in ISREL.

8. JJREL — checks the relevance of a document and returns an R where applicable.

9. FEDBAK — performs the vector modification and constructs the new query.

10. UANLIZ — the relevant and nonrelevant documents retrieved are indicated in IREL and JREL and are noted in GOTIT.

11. CONSOL — sorts part of the output array.

12. STRTGY — the splitting routine, which determines the query structure for the current iteration using the clustering operation.

13. REGFDB — sets up constants.

14. CONST — sets up constants.

15. EVAL — executive routine for the system, which merges split queries into a composite query and computes recall and precision values.
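The CORLAT and COSINE formulas in the listing above can both be read as the usual cosine correlation between concept/weight vectors, with the stored sums of squared weights (SUMSQ for queries, SOCSQ for documents) supplying the normalizing denominators. The sketch below illustrates this reading; the dictionary representation and the names are assumptions.

```python
import math

def cosine_correlation(vec_a, vec_b):
    """Cosine correlation between two sparse concept/weight vectors given as
    {concept number: weight} dictionaries (the pairs stored for each document
    in ISPACE); the denominators correspond to the sums of squared weights."""
    dot = sum(w * vec_b.get(c, 0.0) for c, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Both CORLAT (document-document) and COSINE (query-document) reduce to this
# computation under the reading above.
print(cosine_correlation({3: 2.0, 7: 1.0}, {3: 1.0, 9: 4.0}))
```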