IX-1 IX. The Use of S t a t i s t i c a l S i g n i f i c a n c e in R e l e v a n c e Feedback J. Steven Brown and Paul D. Reilly Abstract A new approach tically feedback shown significant iteration to relevance f e e d b a c k , the s t a t i s presented; concepts concept (SSC) a p p r o a c h , is queries are constructed significant documents in using to be statistically from n o n r e l e v a n t and query vectors the differentiating than using entire relevant document types for testing are rather as e n t i t i e s . Three new query of testing SSC are p r e s e n t e d , and are g i v e n . the results these queries The experimental to queries found to be approximately equivalent Rocchio-type evaluation exponential is recommended methods in results p r o d u c e d , regardless a newly developed of the frozen study criterion ranking and (including factor [FERF]) used, but future are courses of investigation outlined. 1. Introduc t ion One of the major problems which an information of the re- trieval system must solve is the d e t e r m i n a t i o n the correspondence which the user between a given query and really wishes to o b t a i n . information the user Often IX-2 ^ supplies a request which for precise One method is too i n a c c u r a t e , too b r i e f , or of the documents too poorly worded to his n e e d s . a document during retrieval for relevant of improving the performance items found retrieval system is to display a preliminary to score search of the document as either files and relevant to ask or n o n query the user relevant these documents The system to his query. then generates a new and by combining the information from these judgments from the known c h a r a c t e r i s t i c s bibliographic (words used, ideas expressed, retrieved. (]_ 2, 3 e n t r i e s , etc.) of the documents them that Several a l g o r i t h m s , among and that of R. Crawford this of J. J. Rocchio ( 4 ) , have been feedback. and H. Melzer of relevance develope to address technique Nearly all of the relevance feedback experimentation formula cited to date has utilized by Crawford the general query update and Melzer ( 4 ) : 3 ( ) , = a O . + 3 Q n + Y^ R. + 6£ - + £ w -d. + l+l i 0 ' i . - i . - i i i=l i=l i=l N N 1 N 2 N (1) N . I v,#c. - 1 . 1 , where th Q. = H query at l J l le i=l where Q. - query l at i iteration returned returned R. = relevant l documents N. = nonrelevant documents d. = vectors of a set of documents considered as " e n v i r o n m e n t " IX-3 c. = v e c t o r s imposed a of concepts showing relationships w 9 $> Y> 5> -> v. = w e i g h t s Table 1 details the c o n d i t i o n s of some of the of v e c t o r s as experiments. indivisible as a term, In each a p p r o a c h , a c o m b i n a t i o n entities with (that i s , the entire vector individual is used D O use of only The p a r t s ) is utilized. paper select investigation reported in the present tests to considers concepts Concepts relevant the effect of using statistical to be m a n i p u l a t e d shown and in relevance feedback algorithms. between to be significant in d i f f e r e n t i a t i n g are used (Table to nonrelevant documents forms construct one of three different trieval p e r f o r m a n c e this statistically feedback query 2) whose r e for to is then tested. significant is based The rationale concept (SSC) approach relevance on the following hypotheses: (1) only The user bases his relevance a few of the concepts present judgments in each on document; (2) The small set of concepts which the user represents than query; employs in his selection more accurately in which he is interested set of concepts the i n f o r m a t i o n does the total in the search (3) Those concepts which the user by a considers important analys i s . can be determined statistical IX-4- a Ide (5) Riddle, Horwitz, Dietz (6) 1 Crawford, Melzer (4) Rocchio (1,2,3) 1 0 0 1 6 0 l 0 0 Y 1 1 1 6 0 0 0 w. l v . l 0 0 0 0 0 0 0 1 / N 1 -1/N 2 0 Experimental conditions Table 1 Positively significant and negatively significant cone ep t s Nonsignificant concep t s Concept-correlated query Nonsignificant elements query Mean of relevant document concept values Mean of concept values for all documents 0 Remaining elements of original query, if any 0 Strictly significant query Mean of relevant document concept values Query Definitions Table 2 IX-5 The effort based find a statistical The basic depends on these premises is therefore to process which will SSC approach satisfy ( 3 ) . in this study relevance developed the on the c o r r e l a t i o n between to a given query, and (user-judged) a of a document concept has the w e i g h t s or particular in each retrieved (relevant nonrelevant) in Table with 3. document. One can think of the process v e c t o r s d_. as shown As is e v i d e n t , the document weights as necessary so that are are padded zero each vector has then aligned the weight the same the number of e l e m e n t s ; the v e c t o r s element concept d.. j so that = w.. r e p r e s e n t s in document from an i of (which numbering in C = is determined enumera}, tion of all concepts where C has n {c | c 6 d_. , i = 1, The column vector of concept ..., n S. —i elements). w.. jl then in contains as entries the w e i g h t s 6 c. I each document either binary each retrieved. or spectral The relevance vector (graded) relevance included R includes for judgments x) 0.4436 0.5495 0.7159 Correlation Cutoff Table 4 Cutoff levels of 0.8000, 0.6000, and 0.4000 were chosen for investigation since these values cover the confidence IX-10 level range fairly well, giving low and high confidence points as well as an average (0.6000) figure. The above justifications, then, indicate that after the correlations are performed, the cutoff can be used to determine which concepts are significant in the determination of the relevance of the documents retrieved. for which Those v. 1 v. > x (x = cutoff value) are termed posit ively v. < -x are termed s ignificant concepts, while those for which negatively s ignif icant concepts. signif icant concept s. Other v. are called non- In summary, the key idea underlying the SSC approach to relevance feedback is that document and query vectors are treated as strings of components (the individual conceptweight pairs) rather than as inseparable units. As noted earlier, both the Rocchio and Crawford-Melzer strategies deal with whole vectors, whereas the SSC-oriented methods break the relevance determination into finer levels. 2. Query Construction The investigation as executed made use of the queries outlined in Table 2, though other combinations of the information obtained by the methods described above are readily apparent. The queries investigated were chosen heuristically as being likely to yield fruitful results. IX-11 The first of correlated document the three query by using t y p e s , the the mean of conceptrelevant and other- q u e r y , is formed weights for positively significant concepts the mean of concept values wise. This construction for all retrieved documents is similar to the iteration query proposed component of by Crawford approach and Melzer [4] but differs by the described above and by the weights. from application Clearly, the the significance tests in determining concept-correlated query future (in document query may be farther original space) than d e s i r a b l e , and examine the attempt of a possible to lesseu this i n v e s t i g a t i o n might by using movement the vector composed the mean of concepts the relevant and the document w e i g h t s remaining query. for positively significant concepts of (i.e., n o n s i g n i f i c a n t ) original The nonsignificant a p o s s i b l e means by Ide elements query is intended as out within for approaching relevant the difficulty (pointed [7]) of mixed space: and nonrelevant documents document X A R X 1 X = nonrelevant A = query R = relevant document document X Separated Groups of Relevant Items Figure 1 IX-12 In this s i t u a t i o n , exemplified information about by the classic sample query requesting certain the a e r o d y n a m i c s of b i r d s , over- concepts of the query (aerodynamics) may greatly of the search; shadow others (birds) in the influencing elements query the n o n s i g n i f i c a n t only those elements is thus constructed query which the using of the original significance It seems test has feasible splitting search shown to be overshadowed that this (nonsignificant). type of query might be useful or in the final stages of an to boost in a queryiterative possible). query, algorithm (in an effort The third the recall as high as query type, the strictly significant is the diametric q u e r y , since original opposite of the n o n s i g n i f i c a n t includes only elements of the the former shown those concepts significant query to be positively of relevant (the mean is query toward of the concept weight used as the entry retrieved documents for each e l e m e n t ) . The use of this in the search in iteration will cleary what will the user judged thus effectively produce a shift relevant block on the previous i t e r a t i o n , and (in [4] have is the retrieval As Crawford of separated and Melzer document space) new m a t e r i a l . noted, h o w e v e r , this characteristic highly pleased with the results has value if the user of the previous retrievals. sparse The last query v e c t o r s two methods will in general produce and hence cannot always be used effectively; IX-13 this quality the concepts either of sparseness and (depending on the generality level) may cause of the c o r r e l a t i o n cutoff the retrieval of a great many of documents documents and the or the return of a very of small number disappearance study, inter- the vector on successive indicate that iterations. the set sum A further (or possibly h o w e v e r , may section elements strictly elements taken in the case of high retrieval q u e r y ) of those documents query from the nonsiginficant the retrieved the under significant and under nonsignificant than either method query may produce better results alone. 3. Conduct of the Experiment has been carried [8] and out using the word 42 (and form The study Cranfield 200 Collection the accompanying queries b e c a u s e manageably) this grouping provides a reasonably and large number of documents q u e r i e s , in time and view of practical facilities. of being against limitations on experimental has This c o l l e c t i o n the a d d i t i o n a l advantage ranked infor- composed queries of documents which have been relevance on a five-grade rather s c a l e ; this in m a t i o n was used runs than the binary judgments some in an effort to test whether would the use of finer the relevance distinctions appreciably improve IX-14- performance of the q u e r i e s . run, which was information conducted reThe 1, described The form of an experimental within trieval zeroth 2, and vectors query, the general system, was context of the SMART that of a t h r e e - i t e r a t i o n search. iteration was a full search, while 3 were performed (but the same using iterations one of the previously in all i t e r a t i o n s ) as the the SMART entry, iteration at four Interfacing with system occurred (2) main points: (1) parameter and initialization, docuThe routines, OURCON, (3) a c q u i s i t i o n m e n t s , and first while which MODQUE storage of v e c t o r s of retrieved of the new query v e c t o r . (4) c o m p u t a t i o n three points were covered by trivial m e c h a n i c a l by the routine the fourth point was accomplished effectively in creating assumed the role of the SMART subprogram queries. runs of the three (1) the effects cutoff level, query of In this c o n t e x t , then, several types were made the parametric (2) the effect as opposed each of the in an effort variation to determine of the c o r r e l a t i o n of the use of spectral d e c i s i o n s , and types shown relevance judgments of to binary (3) the p e r f o r m a n c e in Figure 1. three query 4. Experimental With Results to the cutoff l e v e l , it was observed that regard IX-15 for the computing facilities a v a i l a b l e , the 0.6000 figure ( c o r r e s p o n d i n g , as noted level query of about type in Table 2, to a confidence of the 0.05) was most suitable r e g a r d l e s s investigated. Experimental runs made using a several for 0.4000 cutoff hundred in all cases produced queries with c o n c e p t s , so that of processing the core storage soon exceeded required continuation that available. to The 0.8000 l e v e l , on the other hand, caused shrink n o t i c e a b l y , entirely struct during so that the query 7 out of the 4 2 queries iteration (the attempt vanished the first to c o n search. cutoff the first Because experimental query v e c t o r ) of the of this s i t u a t i o n , then, the 0.6000 runs. The query types was used in all p r o d u c t i o n w e r e checked spectral of for p e r f o r m a n c e schemes using both the binary a and the relevance (Appendix A contains in Table summary the spectral s c o r e s ) , and, as shown 5, no was differbinary appreciable detected. ences method d i f f e r e n c e between Queries 11 and the types of judgments consistent the 24 show the most in p e r f o r m a n c e , and provided Cases better in each of these cases than did the final ranking spectral be approach. judged in which the spectral method might superior (Q 1 3 , 1 6 , 28) had d i f f e r e n c e s confined document. to changes of two or less in the rank of a single IX-16 Number of Differences Query Type in Final Ranks of Relevant Documents between Binary and Spectral Schemes Number of Differences Affecting Order of Presentation of Document (iteration level) to User Strictly significant 0 0 8 Concept-correlated (Q 11, 13, 15, 16, 24, 26, 28, 42) 2 (Q 11, 24) Nonsignificant elements 2 (Q 11, 24) 1 (Q 24) Performance Difference between Spectral and Binary Relevance Table 5 IX-17 The following graphs of document level recall and precision constitute Figure 2. Figure 2a shows the performance (7), Figure of the Rocchio-type method characterized by equation 2b illustrates the behavior of the strictly significant query, Figure 2c shows the curve for the concept-correlated query, and elements Figure 2d details the performance of the nonsignificant query. All plots are for information obtained from averaged (excluding queries 6, 9, 12, 14, 21, results over 30 queries 23, 25, 29, 32, 33, 35, 36, in which all relevant documents were associated with the Cranfield 200 work form collection. In the graphs, a solid line denotes performance on the zeroth iteration (full search), a dashed line indicates performance on the first iteration, and a dotted line shows results of the second and third iterations (grouped together). Where two iterations map to the same point, the lower numbered iteration key predominates. IX-18 I.Oi 5 c .4 o ° 31 Rocchio 2,3 l 0.0 .5 .6 2(a) .8 1.0 Recall Fig. I .Or— 5h c o <> / a 0) -3 Strictly Significant Q_ • 2h ^••..2,3 6^ 0.0' 0.0 i" 1 .1 .4 .5 .6 .8 1.0 Recall Fig. 2(b) Recall Precision Graphs Fig. 2 IX-19 1.0 .5 .4 3 .2 r Concept - Correlated c o 0) Q_ 6~ 2,3 o.o1— 0.0 .1 .4 _L .5 -L .6 .7 .8 .9 1.0 Recall Fig. 2(c) 1.0 T a o Ah Nonsignificant Q_ 0.0 0.0 I .4 .5 .6 .7 .8 .9 1.0 Recall Fig. 2(d) Recall Precision Graphs Fig. 2 (contd) IX-20 The conclusion that the spectral relevance scheme is not advantageous agrees with what one might expect, since if the matter of binary relevance cannot be consistently judged by different individuals (this fact has been observed by Cleverdon, Mills, and Keen [8]), then the more subjective matter of degrees of relevance would provide a shaky basis indeed for feedback modification. For the above reasons the general evaluation of the three query types is carried out in the context of a 0.6000 correlation cutoff level and binary relevance grades. In addition, in twelve cases (Q 6,9,12,14,21,23,25,29,32,33, 35,36) all relevant documents were returned to the user on the zeroth iteration (the full search), and, in accordance (e.g., Ide [7]), with methods used by other investigators these queries are excluded from the evaluation. Plots of the document level recall-precision curves are shown in Figures 2b,2c, and 2d for the strictly significant query, the concept-correlated query, and the The curves nonsignificant elements query, respectively. are generally lower than the curve for a Rocchio-type strategy with similar feedback (Figure 2 a ) : N i R. (a=l,y=l, all other coefficients=0 in equation(1)) (7) Q.,- = Q. + £ i=l IX-21 Three points should, h o w e v e r , be noted: (1) any The experiment was structured query failed so that if at any time an iteration to return of ten new relevant d o c u m e n t s documents carded and shown in the group to the u s e r , that query was query retrieved and disused the original in its place until was satisfied either the iteration relevant count was or another document retrieved. (2) If the original relevant original position user; thus full search retrieved no documents before position j query j continued to be used (j > 1 0 ) , the until to the was reached the number of in the return iterations in which reduced. the e x p e r i m e n t a l query was used was (3) using The search comparisons were a cosine correlation vectors accomplished and between query document taken as e n t i t i e s . A promight only cedure more parallel have used with to the SSC approach correlation in the a concept-wise dealing query. those concepts appearing The first called the "roller characteristic leads to what might be c o a s t e r " e f f e c t , in which previously unfetched post-search documents analysis moved shows that relevant up sharply during an iteration of the experimental to the original 6. query, but were query. Examples lost w h e n a return was made of this effect are shown in Table IX-2 2 Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Query 18 Iteration 0* 1 2* 96R 140 97R 199R 46 64 68 52 47 42 141 108 49 178 161 21 85 171 40 100 10 44 61 54 112 187 55 181 157 143 96R 140 97R 199R 46 64 68 52 47 42 90 119 120 117 18 27 95 99 194 193 153 89 104 72 155R 141 121 173 134 93 96R 140 97R 199R 46 64 68 52 47 42 90 119 120 117 18 27 95 99 194 193 141 108 49 178 161 21 85 171 40 100 I Rank 3* 96R 140 97R 199R 46 64 68 52 47 42 90 119 120 117 18 27 95 99 194 193 141 108 49 178 161 21 85 171 40 100 ' 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 55 69 74 75 110 128 132 134 187 188 189 Query 22 Iteration 0* 1 2* 163 179 167 200 112 130R 150 125 166 164 14 57 184 58 31 42 102 30 198 189 109 147 39 142 32 60 178 145 103 129 128R 128R 163 179 167 200 112 130R 150 125 166 164 10 137 129 14 142 160 106 136 143 135 140 108 107 139 11 153 24 83 184 149 128R 163 179 167 200 112 130R 150 125 166 164 10 137 129 14 142 160 106 136 143 135 57 184 58 31 42 102 30 198 189 109 3* 163 179 167 200 112 130R 150 125 166 164 10 137 129 14 14 2 160 106 136 143 135 37 134 58 31 M 132 30 198 139 139 128R 131R 127R 127R 131R 131R 131R 127R 127R * r. Original quer y used "Roller Coaster" Effect from Discarding Experimental Iteration Query (Strictly significant query, binary judgments, 0.6000 cutoff) Table 6 IX-23 It is not c l e a r , h o w e v e r , that to continue fails using the e x p e r i m e n t a l the best approach is it iteration query w h e n into the group to draw any new relevant the u s e r . Table documents that seen by 7 shows the same experimental worsen obtained can in this the query may perform well the results in another that case. in one case but may actually query alone would part of the query have the original Undoubtedly, that this behavior types tested be explained study by the fact (particularly the strictly toward significant q u e r y ) move query vector but m o r e decidedly previously retrieved documents, and into well investigation into query groups characteristics the reasons why under before best there exist of requests w h i c h do is one iterative scheme and not under another conclusion necessary the any c o m p r e h e n s i v e can be drawn as to expressed in path to take in resolving the problem (1) a b o v e . The somewhat second point that shows that the results are cases (for biased by the fact in several example, used Q 1 0 , 1 1 , 2 0 , 26) the experimental on iteration bad 2. For query query was first 1 the results w e r e query had outstandingly a zero experimental in all efforts -- the original c o r r e l a t i o n with all relevant method was ever applied d o c u m e n t s , and no For future at a l l . studies, a documents wiser a c t i o n in the situation in which no relevant 1X-24 Rank Query 5 I t e r a t l on 2 0* 1 59R 162 197 58R 160 29 182 189 185 184 60R 165 115 150 9 191 10 198 164 11 156 139 168 92 28 16 180 57 200R 90 13R 8R 59R 162 197 58R 160 29 182 189 185 184 13R 89 141 60R 200R 21 15 27 56 126 172 176 171 139 44 30 123 4 140 147 8R 59R 162 197 58R 160 29 182 189 185 184 13R 89 141 60R 200R 21 15 27 56 126 123 167 122 109 94 121 18 163 165 35 Rank 3* 59R 162 197 58R 160 29 182 189 185 184 13R 89 141 60R 200R 21 15 27 56 126 123 167 122 109 94 121 18 163 165 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 34 35 38 51 65 69 79 80 102 Query 7 Iteration 0* 1 2 41R 90R 11 60 45 76 160 111 100 176 185 133 192 159 117 110 156 71 83 29 132 158 195 198 150 154 114 199 23 184 95R 72R 41R 90R 11 60 45 76 160 111 100 176 95R 91 93 36 199 64 173 32 155 96 94 193 43 33 140 116 99 50 23 103 42R 41R 90R 11 60 45 76 160 111 100 176 95R 91 93 36 199 64 173 32 155 96 117 94 119 103 192 195 92 153 121 120 3* 41R 90R 11 60 45 76 160 111 ICO 176 95R 91 93 36 199 64 173 32 15 5 9 6 ' 117 94 119 103 19 2 195 92 153 121 120 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 j 26 27 28 29 30 112 133 I 140 144 8R 7 2R 72R 42R 42R 72R 42R * , Original auerv used • A d v a n t a g e o u s Effects from Discarding E x p e r i m e n t a l Iteration Query (Strictly significant query, binary j u d g m e n t s , 0.6000 c u t o f f ) Table 7 IX-25 are returned equation other would probably be to utilize a variant a= 1, y = 1, 6 = 1, and the query of ( 1 ) , perhaps with all space. coefficients = 0, to move in document feedback some Alternatively, strategy such as (8) into the inclusion of a negative the e x p e r i m e n t a l q u e r i e s , through equation Q-.i = Q + c N , where N is a vector elements) concepts, (possibly with weighted of negatively and where significant Q represents the r ^e e x p e r i m e n t a l query of type e, might Qn be e f f e c t i v e , w h e r e Q could be replaced by Q. or if no relevant documents were returned. of specific query queries produced With regard types, to the p e r f o r m a n c e that it was discovered in all of the 30 user elements query in the a n a l y s i s , the n o n s i g n i f i c a n t final rankings lower than either the strictly significant, as the c o n c e p t - c o r r e l a t e d , in equation type query collection or the R o c c h i o - t y p e indicating (formulated that in (7)) q u e r i e s , thus for which or that either the this it is effective is not present the method is generally inapplicable. conclusion sample that involved query It i s , of c o u r s e , impossible answering this question from to draw a general the small request the implication in the present e x p e r i m e n t , but useful this alone) may not be p a r t i c u l a r l y (at least when used IX-26 could follow from either case. the experimental nonan One should significant average concepts n o t e , h o w e v e r , that queries (first elements iteration) contained to an average of This states finding 9.1 of 8.0 for c o n c e p t s , as compared queries. 1, which request the original corroborate hypothesis few (2) of section that only a very important is in ideas of the original are really determining noteworthy the user's n e e d s . for its possible This last conclusion to the application interpretatio it may imply becau of an original natural that a detailed language r e q u e s t , since analysis of the query is not necessary to abstract the some quick method discriminatory might be developed ideas. significant query performed in general The strictly very Table other, similarly 8 details to the R o c c h i o - t y p e examples in which query of equation ( 7 ) . either method surpassed th thus lending to more support to the feeling retrieval that perhaps a of real key successful queries is the development into One groups should a strategy for which note that by which can be classified is a p p r o p r i a t e . a particular experimental method results confirm pre-test query moved projections close cases in that the strictly significant decidedly in some to the previously relevant retrieved d o c u m e n t s , so that pushed away d o c u m e n t s were actually from retrieval (Table 7 ) . IX-27 Strictly Significant Query Query 3 (Strictly Signif icant)* 10 14 15 20 21 22 1 4 11 14 15 144 12 14 40 119 5 22 24 32 33 1 3 4 32R 33R 30R 4R 57R 31R 59R 58R 13R 60R 200R 8R 92R 45R 16R 44R 94R 90R 93R 91R 95R 96R 97R 199R 154R 17R 136R 135R 157R Concep t-Correlated Query 10 16 18 26 27 32 Rocchio-Type Method 10 18 21 22 24 124 32R 30R 4R 57R 31R 33R 32R 30R 33R 4R 31R 57R 59R 58R 200R 60R 13R 8R 92R 45R 16R 44R 94R 90R 93R 91R 95R 96R 97R 199R 154R 17R 135R 157R 136R Query 5 (Strictly signif icant) 1 4 15 32 134 141 12 14 40 115 5 22 24 32 38 1 3 4 1 3 15 51 140 I 59R 58R 4 12 200R 60R 1 7 13R 56 8R 82 12 14 61 72 5 20 21 22 25 Query 11 (Concep t correlat ed) Query 17 (Rocchio) 9 2R 45R 44R 16R 9 4R 95R 91R 90R 93R 9 6R 9 7R 199R 15 5R 154R 17R 135R 15 7R 13 6R Query 18 (Rocchio) 1 3 4 1 3 J2 Query 39 (Strictly significant) 1 3 14 16 20 20 39 * < method judged best by FERF criterion over three iterations (Section 3) All ranks less than 10 were set by the initial full search Final Ranks of Relevant Documents for Selected Queries Table 8 IX-28 The c o n c e p t - c o r r e l a t e d pattern query followed the same general as the R o c c h i o - t y p e m e t h o d , but generally worse than either query. the Rocchio If rankings the query or performed the the noticeably strictly FERF query significant are made using coefficient never (defined below), concept-correlated (the two are the strictly 31 . (Table tied surpasses the Rocchio method 31) and surpasses on queries significant 16,27,28 and query only on queries 1 1 , 1 6 , 2 6 , 2 7 , 2 8 , and 5 and 39 In a d d i t i o n , for some q u e r i e s , such as queries 8), the c o n c e p t - c o r r e l a t e d than does The most the noise either method performs considerably significant is worse type. that the Rocchio or the strictly for probable explanation introduced this behavior for the by the entries affecting nonsignificant discrimination concepts of the (Table 2) is adversely search. Document level r e c a l l - p r e c i s i o n analyzed, is slighly query and graphs (Figure 2) show that for the 30 queries the R o c c h i o - t y p e higher the method the p r o v i d e s a curve which nonsignificant q u e r y , and significant experimental unless elements than that of concept-correlated strictly the three about the same pattern as that of the It is difficult lie so close that query. queries to explain why in together performance in this be one concludes the SSC approach as used investigation (though perhaps different results would IX-29 found if the three points raised above were resolved) contributes no information to the search that a simple Rocchio method does not also produce. In an effort to gain a more solid quantitative measure of the performance of an iteration method in a frozen feedback situation (in which documents retrieved have their ranks locked, so that the highest ranking a document can receive on iteration i is i * N , where N documents are returned to the user on each iteration), the frozen exponential ranking factor (FERF) has been developed: r-1 En. j=0 J (9) gr = T - , r = 1,2,...,i 0 if gr=0 (10) f ' r 8r otherwise i-1 Z (10 ** (i-j)) * f J+ 1 ' j=0 (11) P = FERF = where T = total number of documents relevant to a query n, = number of relevant documents reth trieved on the k iteration J = number of iterations L initial full (not counting search) performed IX-30 The quantity p, which will always lie in the range answer [0, to the 10 ** i ] , has been evaluation The FERF retrieved in w h i c h problem introduced pointed as a possible out by Hall and W e i d e r m a n [ 1 0 ] . of relevant all are not and it documents found, is not affected on the full by the number (provided search case the evaluation breaks d o w n ) , coefficient does assign a higher retrieves to a method which promptly the new m a t e r i a l than to a method iteration. which retrieves same m a t e r i a l is in some number on a later F u r t h e r m o r e , the FERF of the sense " n o r m a l i z e d " relevant note that since it is independent to a query. the FERF is a rather of d o c u m e n t s One should gross rank measure of d e s i r a b i l i t y within in that it makes no evaluation of shown to the u s e r . consequence This an iteration group limitation, will however, is not one of major examine the entire since the user in any presumably group returned case. associated which An Goodness directly with is intuitively illustration of result has for these reasons been of the FERF (an assignment the m a g n i t u d e p l e a s i n g , as seen in the example b e l o w ) . is given in Table 9. rankings for of the FERF these Following the v a r i o u s i d e a s , the following queries types of experimental in this study obtain IX-31 Global conditions: Query Q i = 2 N = 5 T = 7 (notation identical to that in body of the paper) Method A 1 2 R* 3 4 R 5 M_e thod _B 1 2 R 3 4 R 5 6 R 7 8 9 10 R Method _C 1 2 R 3 4 R 5 Full Search Iteration 1 6 R 7 g l = 5 8 R 9 R f = 0.6 10 11 12 13 14 15 6 7 = 5 g l = 8 R = 0.4 9 R f l = 10 R 11 12 J3 g l = 5 0.6 f l " g2 = 2 Iteration 2 f 2 = o.o FERF = 60 11 R 12 R g 2 -= 3 13 R 14 = 1.0 f 2 " 15 FERF = 50 g2 = 2 f 14 15 R 2 " 0.5 F E R F •• = 65 * - indicates position of a relevant document An Illustration of the FERF Approach Table 9 IX-3 2 when averages over the 30 user queries are taken: Rocchio Strictly (equation (7)) 547 475 398 291 significant Concept-correlated Nonsignificant elements Overall Significance Table 10 Evaluation This m e a s u r e not surpass also shows that the experimental query queries do the simple R o c c h i o - t y p e performance. investigation, that necessary The SSC m e t h o d , as tested does require a sizeable amount for an ordinary The information Rocchio-type in this of time beyond relevance feedback search. query f e t c h e s , significance the machine the time c a l c u l a t i o n s , and by construction approximately the user which for increase 1 5 % and time of a search (and effort) required (a from quantity base than to judge the relevance necessary of ten documents is probably to provide a d e f e n s i b l e is markedly the statistical calculations) greater in u n e m b e l l i s h e d methods. Consequently, the SSC approach IX-33 must be proved results resource can be capable of producing strategies substantially before the better than do existing expenditure increased ideas necessary to utilize the SSC justified. 5. Conclusions Although and Recommendations conducted with the the i n v e s t i g a t i o n s queries of Table 2 have not been outstandingly of relevance successful the testing approach on in obtaining authors better m e t h o d s feedback, of feel that because of as broad of the impossibility all aspects a concept as the SSC feedback in a single rather which before the present limited study e x p e r i m e n t , some of the ideas should be further In is based investigated they are dropped of treating from c o n s i d e r a t i o n . documents and particular, as strings than the a p p r o a c h of concept queries beads which can be broken a p a r t , rather be added, as i n d i v i s i b l e bars which must and weighted as entities seems subtracted, it allows the parts tests to have value because in filtering the investigator noise introduced to be more by selective out in irrelevant information contained of a document or query v e c t o r . The use of statistical IX-3M- to a s c e r t a i n relevant those concepts important in distinguishing be be investigated developed, in which the from nonrelevant documents types should should f u r t h e r , and a d d i t i o n a l p e r h a p s along the lines query suggested previously, characteristics For of the SSC approach can be fully investigate a modified exploited. e x a m p l e , one could concept-correl entered and i ) query with in which positively the m e a n of significant concepts are documents (or the w e i g h t s in relevant each remaining query means query (unused) concept of the original is entered with of r e s o l u t i o n retrieves its weight u n c h a n g e d . in which documents negative Similarly, the iteration of the situation relevant no further for for a particul feedback into i t e r a t i o n and means the SSC approach Another extension introducing be should considered. is the retrie p o s s i b l e area of future research of the c o n c e p t - w i s e procedure in the third to the actual evaluative of d o c u m e n t s , as outlined mentioned the area known, here in Section 4. of checking point in Further work could cutoff also be done correlation for l e v e l s ; it is now reported workable for i n s t a n c e , that 0.4000 and the type of feedback the range of from that 0.8000 are beyond of varying l e v e l s , but to a lesser the effect the cutoff 0.6000 degree has not been studied. IX-35 REFERENCES [ 1] R o c c h i o , J.J. "Relevance Feedback in Information R e t r i e v a l " , Scientific Report N o . ISR-9 to the N a t i o n a l Science F o u n d a t i o n , Section I I I , Harvard C o m p u t a t i o n L a b o r a t o r y , August 1 9 6 5 . R o c c h i o , J.J. "Document Retrieval System O p t i m i z a tion and E v a l u a t i o n " , Harvard U n i v e r s i t y D o c t o r a l T h e s i s , Scientific Report N o . ISR-10 to the N a t i o n a l Science F o u n d a t i o n , Harvard C o m p u t a t i o n L a b o r a t o r y , March 1 9 6 6 . R o c c h i o , J.J., S a l t o n , G. "Search O p t i m i z a t i o n and Iterative Retrieval T e c h n i q u e s " , P r o c e e d i n g s of the Fall Joint Computer C o n f e r e n c e , Las V e g a s , November 1965. C r a w f o r d , R.G., M e l z e r , H . Z . "The Use of Relevant D o c u m e n t s Instead of Queries in R e l e v a n c e F e e d b a c k " , Scientific Report N o . ISR-14 to the N a t i o n a l Science F o u n d a t i o n , Section X I I I , D e p a r t m e n t of Computer S c i e n c e , Cornell U n i v e r s i t y , October 1 9 6 8 . R i d d l e , W., H o r w i t z , T., D i e t z , R. "Relevance F e e d b a c k in an I n f o r m a t i o n Retrieval System", Scientific Report N o . ISR-11 to the N a t i o n a l Science F o u n d a t i o n , Section V I , D e p a r t m e n t of Computer S c i e n c e , Cornell U n i v e r s i t y , June 1 9 6 6 . Ide, E. "User Interaction with an Automated R e t r i e v a l S y s t e m " , Scientific Report N o . ISR-12 to the N a t i o n a l Science F o u n d a t i o n , Section X I I , D e p a r t m e n t of Computer S c i e n c e , Cornell U n i v e r s i t y , June 1 9 6 7 . I d e , E. "Relevance Feedback in an A u t o m a t i c D o c u m e n t R e t r i e v a l S y s t e m " , Cornell U n i v e r s i t y Master of Science T h e s i s , Scientific Report N o . ISR-15 to the N a t i o n a l Science F o u n d a t i o n , January 1 9 6 9 . C l e v e r d o n , C , M i l l s , J., K e e n , M. ASLIB Cranfield R e s e a r c h Project - Factors D e t e r m i n i n g the P e r formance of Indexing S y s t e m s , C r a n f i e l d , 1 9 6 6 . [ 2] [ 3] [ 4] [ 5] [ 6] [ 7] [ 8] IX-36 REFERENCES (Continued) [ 9] Spiegel, M.R, Statistics, New York, Schaum Publishing Co. , 1961 (p. 247). Hall, H., Weiderman, N. "The Evaluation Problem in Relevance Feedback Systems", Scientific Report No. ISR-12 to the National Science Foundation, Section XII, Department of Computer Science, Cornell University, June 1967. [10] IX-37 APPENDIX Origins A Judgments of the Spectral R e l e v a n c e for the Cranfield 200 Document Collection Method (Abstracted of Retrieving Documents Keen [ 8 ] , V o l . 1, p.79) from C l e v e r d o n , M i l l s , and Stage 1: Authors of d o c u m e n t s in the c o l l e c t i o n construct search questions assessment citations in the (queries) and make a in the relevance of items listed of their bibliographic included own d o c u m e n t s which are collection. Stage 2: Using Stage every the document c o l l e c t i o n and competent questions examine from 1, technically document people in relation to every question to find any a d d i t i o n a l noted (to the b i b l i o g r a p h i c documents. citations in Stage 1) relevant Stage 3: The document documents assessment authors receive the additional produced of by Stage 2 and make a final relevance. IX-38 APPENDIX A (Continued) Method of Marking Relevance (Abstracted from Cleverdon, Mills, and Keen [8], Vol. 2, p.123; codes changed to conform to the usage in the present experiment) Grade 4: References which are a complete answer to the question. Grade 3: References which are of a high degree of relevance, the lack of which would have made the research (to answer the query) impracticable or would have resulted in a considerable amount of extra work. Grade 2: References which are useful, either as general background to the work or as suggesting methods of tackling certain aspects of the work. Grade 1: References which are of minimum interest (for example, those that have been included from a historical viewpoint). Grade 0: References which are of no interest.