VII-1 VII. Search and Retrieval Experiments in Real-Time Information Retrieval G. Salton Abstract Future operating document retrieval systems may be based on fullyautomatic information analysis methods instead of manual indexing, and on real-time search procedures which allow the user to interact with the system during the search process. Performance characteristics are first given for fully-automatic information retrieval systems, and comparisons are made with presently operating partly-manual systems. Thereafter, various user-controlled search strategies are described, and the potential of these strategies in improving systems performance is discussed. The evaluation results for the real-time retrieval procedures are used to derive design criteria for future automatic information systems. 1. Introduction Throughout the world, the design and operations of large-scale information systems has become of concern to an ever-increasing segment of the scientific and professional population. Furthermore, as the amount and complexity of the available information has continued to grow, the use of mechanized or partly mechanized procedures for various information storage and retrieval tasks has also become more widespread. As a result, a number of large information systems are now in operation in which at least the search operations — that is, the comparison of incoming search VII-2 requests with stored information items is carried out automatically. Typical examples in the United States are the NASA Scientific and Technical Information Facility, and the MEDLARS system at the National Library of Medicine. While these operational information systems are thus able rapidly to search vast storage files, often containing many hundreds of thousands of items, most of the operations other than the search itself are performed manually with the help of human experts. In particular, all the content analysis and indexing operations, leading to the assignment of suitably chosen combinations of index terms to the stored documents and to incoming search requests are normally performed by specialists who know the given subject area, as well as the performance characteristics of the retrieval environment within which they operate. As will be seen in the next section, many of the information systems which base their operations on manual indexing but largely automatic search methods are quite successful in isolating,from the large mass of largely irrelevant stored material, many of the items which prove pertinent to the users1 information needs. Nevertheless, the feeling that manual systems and procedures should be replaced by suitably chosen automatic methods has continued to grow, and a number of fully-automatic information storage and retrieval systems have been designed and put into operation, at least on an experimental basis. The SMART system represents one such effort to replace the intellectual indexing by sophisticated automatic text analysis procedures, and thereby to produce a retrieval environment in which all document and query handling procedures are performed automatically [1,2,3]. VII-3 In the next section, the performance characteristics of presently operating information systems are briefly outlined and a comparison is made with the performance of the alternative fully-automatic environment. Various procedures are then described leading to an improvement in systems performance, and conjectures are made concerning the design of future automatic information systems. 2. Performance Characteristics of Information Systems Many different criteria may suggest themselves for measuring the performance of an information system. In the present context, the system effectiveness is assumed to depend on its ability to satisfy the users' information needs by retrieving wanted material, while rejecting unwanted items. Two measures have been widely used for this purpose, known as recall and precision, and representing respectively, the proportion of relevant material actually retrieved, and the proportion of retrieved material actually relevant [4]. (Ideally, all relevant items should be retrieved, while at the same time, all nonrelevant items should be rejected, as reflected by perfect recall and precision values equal to 1 ) . It should be noted that both the recall and precision figures achievable by a given system are adjustable, in the sense that a relaxation of the search conditions often leads to high recall, while a tightening of the search criteria leads to high precision. Unhappily, experience has shown that, on the average, recall and precision tend to vary inversely since the retrieval of more relevant items normally also leads to the retrieval of more irrelevant ones. In practice, a compromise is usually made, and a performance level is chosen such that much of the VII-4 relevant material is retrieved, while the number of nonrelevant items which are also retrieved is kept within tolerable limits. As an example, the MEDLARS system at the National Library of Medicine is said to achieve an average recall of 0.58 and a precision of 0.50; that is, an average user may expect to retrieve 58 percent of the relevant material contained in the system, while only one half of the retrieved material is actually unwanted. Obviously, other operating points are achievable by altering the indexing or search techniques, producing either higher recall normally with low precision, or higher precision with lower recall. In order to make it possible to design more effective retrieval systems, it is important to be aware of the reasons for system failure under presently existing operating conditions. Tables 1 and 2 summarize the perIt may be seen formance for 18 queries processed by the MEDLARS system. from Table 1 that a large proportion of the recall failures of the system (that is, failures to retrieve relevant material) is due to the lack of sufficient user-system interaction, while Table 2 reveals that many of the precision failures (that is, failures to reject irrelevant material) are caused by lack of specificity in the document indexing. In both cases, the manual analysis process is at least partially to blame, since the query analyses as well as the document indexing are performed by experts who cannot — in the absence of specific information concerning the user population — always be aware of the appropriate indexing requirements. In the automatic SMART system, an attempt is made to replace the intellectual indexing effort used in the conventional situations by a fully-automatic computer analysis of the document and query texts. Specif- ically, a variety of language analysis procedures — including suffix cut-off Type of Recall Failure (resulting in relevant document not retrieved) A. Indexing Errors 1, Important term missing from document specification (indexer omission) Indexing insufficiently exhaustive (some aspect of document content not reflected in the indexing) Indexing insufficiently specific (index terms used are too broad) Number of Instances Percentage of Total 6 11% 2. 6 11% 3. 9 17% | B. Search Errors 4. Important term missing from query specification (searcher omission) Lack of sufficient usersystem interaction (searcher misunderstands user need) Missing items because of selective print-out (retrieved item not shown to user due to sampling) 8 | 15% 5. 21 40% 6. 3 6% Typical Recall Failures in Conventional Retrieval System (18 queries processed by MEDLARS retrieval system) Table 1 Type of Precision Failure (resulting in nonrelevant document retrieved) A. Indexing Errors 1* Indexing too exhaustive (minor aspect of document content reflected in indexing) Indexing not sufficiently specific (indexing language used is too general) Number of 1 Instances Percentage of Total 17 11% 2. 53 34% 3 . Important index term missing , or inappropriately or incorrectly used 4. Incorrect term relationship specified by indexing rules 7 5% 8 5% B. Search Errors 5. Search formulation insufficiently specific Lack of sufficient user-system interaction Search formulation insufficiently exhaustive (not enough Boolean formulation) Miscellaneous errors (on the part of searcher or requester) 18 11% 6. 24 15% 7. 22 14% 8. 7 5% Typical Precision Failures in Conventional Retrieval System (18 queries processed by MEDLARS) Table 2 VII-7 methods, thesaurus look-up, phrase generation methods, statistical term associations, syntactic analysis, and others — are used to reduce document and query texts into analyzed concept (or term) vectors. concept vectors attached to the documents are then matched with the vectors derived from the search requests, and the documents, arranged in decreasing query correlation order, are submitted to the user as answers to the query. One might expect that an automatic text analysis of the type used in the SMART system would necessarily produce retrieval results which are much inferior to those obtained in a system based on manual indexing. In actual fact, the automatic environment makes it possible The to use for analysis purposes relatively large sections of text, such as abstracts or summaries, thus insuring a high degree of indexing exhaustivity; furthermore, the importance of some assigned terms can be enhanced by automatic weighting methods, leading to a more sophisticated matching process between analyzed queries and documents than is normally possible. As a result, the benefits of the index language control supplied in the more conventional retrieval situation by the human indexers appear to be balanced by a deeper and more complex type of analysis available in an automatic environment; this is reflected by evaluation results which indicate that the search effectiveness of the fully-automatic systems is not inferior to that obtainable at present in a partly-manual system. Consider as an example, the performance characteristics of the SMART system, presented in the recall-precision graphs of Fig. 1. The left-hand graph (Fig. 1(a)) shows the performance of an automatic wordstem matching procedure, using weighted word stems extracted from document VII-8 abstracts and from search requests respectively/ for three different document collections in the areas of documentation (ADI), aerodynamics (CRAN), and computer engineering (IRE). The performance for the three collections is generally comparable, ranging from an average precision of 0.80 at recall of 0.10, to an average precision between 0.10 and 0.30 for a recall of 1. Fig. 1(a) also exhibits the large distance which remains to be covered between the performance of the automatic word-stem matching system and that of an ideal system (in the upper right-hand corner of the graph) where both recall and precision values are close to 1. A variety of more refined language analysis procedures can be used in an attempt to improve the performance of automatic information systems [4,5], The use of a stored thesaurus capable of recognizing synonyms and other closely related terms is one such process which can be incorporated into an automatic content analysis system. Fig. 1(b) represents average performance for 780 documents in the computer field for both the wordstem matching process and a thesaurus look-up method in which all word stems are first replaced by thesaurus group entries (concepts) prior to the comparison between query and documents. It may be seen that the synonym recognition implicit in the thesaurus procedure serves to improve the retrieval performance by approximately ten percent. Other language analysis methods produce some further improvement but do not come close to the ideal performance area [5]. A comparison between the automatic SMART procedures using abstract processing, and the conventional operational retrieval situations based on the matching of manually assigned index terms shows that the retrieval VII-9 u c o / c E •— i -j 3 tw « in "XD w. 3 O Wor The « / > lis / / H oo <> x 3 O UJ c o X3 "5 o c a> cr •—i o Of) O <> / 4> JC h- N1 V„ O D E 0) E */> (/) 6 D CJ XD w o w> a o 3 U 0> o en. u UJ £ < oo <0 (VJ O > T> O o 'Z en CL •^ c «/> 2 w aroct t> « U w num a> c «n O -H -C O " £ o S * o ^ ° * «§ I a ~8 •-• t/> o *o c a> o u c o Proc 0> €1 fa k. 10 E t- * * — o Com Ilecti s o »4 • o fc_ < LJ < 9 ! i u a < J • o •— a> E (J 4> o a. 15 u >s a> CL CO *> i. o £ p o w VII-10 effectiveness achievable is quite comparable in the two cases. Table 3(a) shows "normalized recall" and "normalized precision" values averaged over 42 queries for 200 documents in aerodynamics formerly used as part of the Aslib-Cranfield tests [4]. Columns 1 and 3 of Table 3(a) give the performance for the manually assigned index terms both without and with the use of word normalization through the thesaurus. Columns 2 and 4 do the same for the SMART word stem and thesaurus procedures. It may be seen that, in each case, the evaluation measures exhibit the same order of magnitude, the stem process being slightly better for the index terms, and the thesaurus process favoring slightly the automatic abstract process. The same general conclusions can be derived from the recall-precision comparisons of Table 3(b), which reflect the average performance of 18 queries and 273 documents in biomedicine. Column 1 indicates the performance for the MEDLARS manual indexing process used at the National Library of Medicine, while columns 2 to 4 contain figures for three automatic methods incorporated into the SMART system. It is seen that, for the small sample collection, somewhat better average recall is obtained with the SMART methods, and somewhat better precision through the manual indexing. The description of the test environment used to produce the data of Table 3 must remain outside the scope of the present study [5,6]. How- ever, if the data given are assumed to be representative of performance differences between presently operating semi-manual retrieval systems and the fully-automatic analysis systems of the future, then a conclusion that the performance level is comparable seems reasonable. Moreover, in both cases, the performance is far below that which an ordinary user submitting a search query to a retrieval system might be hoping to achieve. VII-11 Stem Indexing Normalized Recall Normalized Precision 0.890 0.683 SMART Stem |I Thes-3 Indexing Abstracts 0.864 0.670 0.873 0.694 SMART Thes-3 Abstracts 0.884 0.695 a) Comparison of Cranfield Index Term Match with SMART Abstract Processing (Cranfield 200 documents; 42 queries) MEDLARS Indexing Recall Adjusted Precision 0.643 0.625 SMART Word Form 0.704 0.571 SMART Word Stem 0.718 0.570 SMART Thesaurus 0.690 0.611 b) Comparison of MEDLARS Index Term Match with SMART Abstract Processing (MEDLARS 273 documents; 18 queries, macro-average) Comparison of Index Term Matching with Automatic Abstract Processing Table 3 VII-12 In the next few sections, various interactive search techniques are described which may help in raising the search effectiveness closer to the optimal area in the upper right-hand corner of the recall-precision graph, 3. User Feedback Retrieval Methods A) General Methodology One of the main hopes in obtaining a retrieval performance which goes beyond that presently reached under normal operating conditions, is to include the customer in the search process. In particular, fewer errors are likely to be made if the information obtained from the users is not restricted to the search request proper, but is supplemented by a variety of special user need indications, or by evaluation data about the acceptability of items previously retrieved by the system in answer to the search requests. User-system interaction is now current for many computer applications, often implemented by special input-output console devices, with the help of operating systems which enable the system to render more or less simultaneous service to a large class of users. In an information retrieval environment, user interaction may take the form of simple dictionary display routines which can be used to present to the user selected dictionary excerpts as an aid in formulating the original search requests, or in reformulating queries which were originally inadequate [7,8], Alternatively, more sophisticated methods may be used in which the reformulation of the search requests is automatically performed based on feedback information obtained from the user population [9,10]. The VII-13 relevance feedback process incorporated into the SMART system is particularly well adapted to a time-sharing computer organization with simple console equipment, since it requires only a minimum of interaction with the user, and places most of the burden on internally stored routines. Specifically, an initial search is first performed for each request received, and a small amount of output, consisting of some of the highest scoring documents, is presented to the user. Some of the retrieved output is then examined by the user who identifies each document as being either relevant (R) or not relevant (N) to his purpose. These relevance judgments are later returned to the system, and used automatically to adjust the initial search request in such a way that query terms, or concepts, present in the relevant documents are promoted (by increasing their weight), whereas terms occurring in the documents designated as nonrelevant are similarly demoted. This process produces an altered search request which may be expected to exhibit greater similarity with the relevant document subset, and greater dissimilarity with the nonrelevant set. The altered request can next be submitted to the system, and a second search can be performed using the new request formulation. If the system performs as expected, additional relevant material may then be retrieved, or, in any case, the relevant items may produce a greater similarity with the altered request than with the original. The newly retrieved items can again be examined by the user, and new relevance assessments can be used to obtain a second reformulation of the request. This process can be continued over several iterations, until such time as the user is satisfied with the results obtained. VII-14 A large number of feedback experiments have been performed with the SMART system in order to identify those methods which appear to be most effective in improving retrieval performance. These experiments are based on a query alteration algorithm described by the following equation: ^ q n 2 n 3 + n 4 ' (1) i+l = aq i + Sq 0 + Y / ^i " i=l 6 / ^i * / i=l ^i^i i=l / ^i^i i=l st Here, the (i+1) " query statement, < -. i /is defined as a composite 34obtained from the ith query formulation q. , the initial q , a set of n_ documents, r, , identified by the user as relevant, a set of n_ 1 —i 2 documents, —a s. , identified as not relevant, a set of n^ 3 documents, d. , —i supplied by the user without specific relevance indications, and a set of n important concept numbers, £. , also obtained from the user. A number of parameters (a,6,y,6,w., and v,) serve as weighting functions. These parameters are set equal to zero when the corresponding information items are not specified. Alternatively, they may be set equal to 1 or greater if the corresponding items are to be added to (or substracted from) the present query statement in order to produce the new query. Equation (1) represents a vector transformation in the document space, which moves the query close to the documents, or concepts, identified as important by the user, and away from the documents, or concepts, specified as unimportant. The effect of various types of transformations is examined in the next few subsections. B) Positive Feedback One of the simplest feedback methods is obtained by using previously VII-15 retrieved relevant items to update the search requests. Fig. 2(a) represents the output obtained for the Cranfield collection by retrieving, in each case, 5 documents at a time, asking the user to identify any relevant items, and adding these relevant items to the search requests to obtain new query formulations. Fig. 2 presents averages over 42 search requests for initial runs, as well as first and second feedback iterations, using equation (1) in the following simplified form: q i+i= •^ •£ Some Fig. 2(a) shows that an average improvement of twenty percent is obtained in the precision between initial run and first feedback iteration. further improvement is produced by the second feedback run, particularly in the high recall region [11,12]. The output of Fig. 2(a) is obtained by a process illustrated for query 8 in Table 4(a). Five documents are first retrieved by the initial run, labelled respectively 187, 173, 39, 33, and 139. Document 39 is now identified as relevant (marked 'Rf in Table 4(a)), and is used to modify the query. The modified query now produces the five documents Relevant document 39 had already labelled 39, 42, 179, 112, and 181. been retrieved in the initial run, but 42 is new, and after it is identified as relevant, a new query is constructed which, in turn, produces five additional documents, and so on, until the user obtains no further useful information. Obviously, the choice of cut-off, which in the example is set at five retrieved items, is arbitrary, and may, in practice, be made to depend on the user's wishes. Initial Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Doc 187 173 39R 33 139 185 36 188 42R 199 41R 15 98 178 23 a) Corr .2315 .1949 .1949 .1949 .1771 .1743 .1714 .1702 .1666 .1621 .1542 .1291 .1238 .1231 .1208 Doc 39R 42R 179 112 181 188 97 45 41R 2 173 62 187 101 185 Feedback Iterations 2 3 1 Corr Doc Corr Doc .9641 .6533 .5200 .5095 .4429 .3865 .3559 .3529 .3022 .2925 .2606 .2494 .2474 .2371 .2355 1 1 1 1 1 1 1 1 1 1 1 1 1 39R 42R 179 112 181 188 97 45 2 173 62 117 116 187 41R .9143 .8871 .5804 .5609 .4600 .4090 .3757 .3735 .3296 .2703 .2661 .2615 .2612 .2510 .2430 39R 42R 179 112 181 188 97 45 2 173 62 117 116 187 41R Corr .9143 .8871 .5804 .5609 .4600 .4090 .3757 .3735 .3296 .2703 .2661 .2615 .2612 .2510 .2430 Feedback Results for Query 8 (all effects) Initial Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 134 143 200 b) Doc 187 173 39R 33 139 185 36 188 42R 199 41R 15 98 178 23 0 0 0 40R Corr .2315 .1949 .1949 .1949 .1771 .1743 .1714 .1702 .1666 .1621 .1542 .1291 .1238 .1231 .1208 0 0 0 .0000 Feedback Iterations 2 3 Corr Doc Corr Doc .2510 .2703 .9143 .0317 .0865 187 173 39R 33 139 42R 179 112 181 188 Corr .2510 .2703 .9143 .0317 .0865 .8871 .5804 .5609 .4600 .4090.. Y/ffl • //62 ,3296 .2661 .2430 0 .0368 0 V/ltl/ / 'M~& r 1 I 4IR 0 40R 0 Feedback Results for Query 8 (feedback effect only) "Increment Only" Strategies for Query 8 (Cranfield thesaurus; cosine numeric N=5) Table 4 VII-17 < c o C C ^ c + -•-•.Oi-iJt r 9 •» * a «•_ iO 00 tr «*. u ii 0 j; c UJ o • 5 c 2 < J a > i o o < (\J 1* O O) (7) . ai E o CO ™«+-L o c 00 0) (NJ •? iD CM cr a. 3 c cr o 4 < _i C o o w u c o E T5 J? o 00 .t: *_ 8— o < r «_ « C « <0 a o JO TO o JO ^ _ o ••- i o (\J _ A o CO *5 o rr^—L 00 a> C\J cr CL esaur (feed boc *> o c E • •» c 3 k. o £ J& i VII-18 The example of Table 4(a) illustrates the fact that an improvement in recall and precision may be obtained from one search iteration to the next for two different reasons, known as the ranking effect and the feedback effect, respectively [13] . The former is exemplified by document 39 whose rank is improved from 3 to 1 between the initial run and the first feedback iteration, following its identification as a relevant item. The feedback effect proper is illustrated by relevant document 42, which was not initially retrieved, but jumps from rank 9 to 2 as a result of the feedback action. One may argue that the ranking effect, which reflects improvements in rank of items already retrieved, should be disregarded, since the user may not care to look at retrieved items more than once. The feedback effect alone may be measured by freezing the ranks of all items as they are retrieved. This is reflected in the output of Table 4(b), where the first 5 ranks are maintained throughout, the first 10 ranks after iteration one, the first 15 after iteration two, and so on. Document 42 is now seen to improve only from rank 9 to 6, since the top 5 ranks are preempted. The corresponding recall-precision graph shown in Fig. 2(b) shows that the feedback effect produces less improvement than the ranking effect. In a practical retrieval system, a compromise might be made between the output of Table 4(a) and 4(b) by taking documents previously identified as either relevant or not relevant out of the system, but leaving the others alone. For the example of Table 4, this would delete document 39 after the initial run, and documents 39 and 42 following the first iteration. VII-19 The output of Fig. 2 and Table 4 clearly shows that user feedback information can help in improving retrieval performance by moving the query close to the area identified as important by the user. Occasionally, however, it happens that some relevant items are lost as other new ones are retrieved. Consider, for example, the relevant document 41 which After the first initially receives rank 11 in the output of Table 4. iteration, it gains from rank 11 to 9, but after document 42 is identified as relevant, item 41 decreases from 9 to 15. approaches document 42, it recedes from 41. That is, as the query This situation is shown graphically in Fig. 3, where query 8 first approaches document 39, and then document 42, while at the same time, it gets away from document 41. In Fig. 3, the size of the correlation coefficient between query and document is represented as varying inversely with the physical distance between them. It is clear that in a situation such as the one illustrated in Fig. 3, the search request covers several subject areas. To obtain good retrieval performance, the query must then be split into two parts, one to reflect the subject area of documents 39 and 42, and the other the area of document 41. Experimental query splitting algorithms are presently under construction. C) Negative Feedback The feedback process illustrated in the last subsection, operates with a fixed number of retrieved documents, and uses only positive information (that is, relevant documents identified by the user) for feedback purposes. In a practical system, one may anticipate that some users would VII-20 a) Initial Search b) First Feedback (after retrieval of 39) D O © query relevant items relevant retrieved c) Second Feedback (after retrieval of 39 and 42) Feedback Illustration for Query 8 (Query Splitting) Fig. 3 VII-21 be willing to look at more information than others before supplying feedback data. This may be the case particularly in situations where In such a nothing useful is retrieved within the top 5, or top 10 items. case, two different procedures are available, known respectively as variable and negative feedback. In the variable process, the cut-off value, which determines the number of documents to be retrieved for the user before feedback is returned to the system, varies from query to query. One possible strategy might consist in first retrieving five items; if the user is now in a position to supply a relevance judgment, retrieval stops, and the query is updated in preparation for a subsequent search operation; if not, additional documents are retrieved (in decreasing correlation order with the query) until such time as a relevant item is identified by the user. The recall-precision performance of a typical variable feedback system is shown in Fig. 4(a) for the 200 document Cranfield collection, averaged over 42 search requests. In the output of Fig. 4(a), the cut-off is chosen directly after the first relevant item, with a maximum possible cut-off of 15. It is seen that the variable cut-off process operates much more successfully than the standard system based on a fixed cut-off of 5 items (the second iteration of a standard cut-off process is shown superimposed on the graphs of Fig. 4). Furthermore, out of 42 queries used in the Cranfield tests, only two were found without any relevant items in the top 15 ranks, the average number of items examined being only four. While the variable process thus shows promise, particularly at the high-precision end of the performance scale, some queries may always be VII-22 < U Si c? _ O in > 00 O u & H- o iS 8 > 8 * U.^ XT M •c c o o <\j Z B o w I 5 IS (/> 00 UJ <> x (\J ^I *z S» o < o i i o OJ o ± ° •c *- •• O o ID o UJ Q: 00 u > OJ a. VII-23 submitted for which relevant items are difficult to identify without an exhaustive examination of a large part of the collection. Under these circumstances, a negative strategy can be used, designed to inform the system about what the user does not wish to retrieve. One of the simplest strategies consists simply in identifying the top document as nonrelevant. The query will then be modified in such a way as to decrease its similarity with the item identified as nonrelevant; at the same time, the assumption is that this modification may increase its similarity with the relevant items in the collection [11]. Consider, as an example, the output for query 1 processed against the collection of 200 Cranfield documents. The standard feedback strategy, which consists in incrementing the concept weights from documents identified as relevant, is illustrated in Table 5(a). A cut-off of 5 documents is again assumed. It is seen from the table that no relevant document is ever identified within the top 15 documents, so that the feedback procedure is unavailing in that case. The relevant documents 21, 22, and 1 remain at their initial ranks 32, 33, and 200 respectively. On the other hand, when the negative feedback strategy is used in the form = q. + ) r. - s. % + 1 = 3, + \ £. - s, 1 Lr x , (3) the altered query retrieves relevant document 22, which, in turn, is used to retrieve the remaining relevant items (21 and 1 ) . The process is illustrated in the output of Table 5(b), and the graphical representation of Fig. 5. The first nonrelevant item Cno. 125) is first identified by the user; VII-24 Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 32 33 200 0 125 12 I 159 64 66 123 67 65 24 29 190 25 127 136 85 21R 22R 1R 1 1 Documents 1 2 125 12 159 64 66 123 67 65 24 29 190 25 127 136 85 21R 22R 1R 1 1 1 1 1 1 1 1 | I 1 1 1 1 I 1 1 1 1 3 125 12 159 64 66 123 67 65 24 29 190 25 127 136 85 21R 22R 1R 1 125 12 159 64 66 123 67 65 24 29 190 25 127 136 85 21R 22R 1R a) "Increment Only" Strategy for Query 1 (Initial Run (0) and three Feedback Runs (1-3)) Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 32 33 199 200 0 12 159 64 66 123 67 65 24 29 190 25 127 136 85 21R 22R 0 1R Documents 1 2 125 1 12 1 159 1 v\£y& 64 1 66 1 1 22R 22STl 1 161 l^V \ 1 155 I fS$ y l 1 184 I 184- 1 1 189 la^/i 1 70 21R 1 i 4 IR i 1 136 141 \ i 37 59 V 1 31 30 \ 1 0 0 1 1 0 0 1 1 1R 0 1 1 21R 0 1 I 11 . I 3 125 12 159 64 66 22R 161 155 184 189 21&/1 Q i/^ MX /yh\ ' 69 y 30 0 0 0 0 b) "Decrement High" Strategy for Query 1 (feedback only evaluation) "Decrement High" Strategy for Query 1 Table 5 VII-25 a) Initial Search (no relevant retrieved) b) First Negative Feedback (document 125) • O © query relevant items relevant retrievec non-relevant item c) Second Positive Feedback (document 22) d) Third Positive O Feedback (document 1,21) Negative Feedback Query 1 Fig. 5 Illustration VII-26 as the query moves away from document 125, it approaches item 22, whose rank improves from 33 to 6 between initial runs and first iteration. When item 22 is identified as relevant after the first iteration, the other relevant items 1 and 21 increase their positions from ranks 199 and 200 to 11 and 12 (the top ten ranks are frozen by this time in the output of Table 5(b)) . The negative feedback strategy profits from the fact that in a reasonably homogeneous document space, the various alternative subject areas should be reachable in a small number of negative feedback steps [14]. The case of query 1 previously discussed renders plausible the large average improvement shown for the 42 Cranfield queries in Fig. 4(b) between the negative feedback strategy of equation (3) and the positive strategy of equation (2). Fig 4(b) shows the first iteration curve using negative feedback, compared with the second iteration positive strategy. The im- provements in retrieval effectiveness exhibited in Fig. 4 for the variable cut-off and negative feedback methods make it appear that the two strategies might be combined in an actual operational situation. The feedback procedures described in the present section are likely, substantially, to improve retrieval performance, over the presently achievable performance levels. In some cases, the improvement may be as high as 30 percent in the high precision area and 20 percent in the high recall area. The recall-precision graphs included in this study demonstrate, however, that the distance to the theoretically desirable ideal system is still large. A perfect system, exhibiting both high recall and high precision may well not be reachable in the near future. VII-27 X document vector • cluster ctfntroid A typical query /Zf part of collection actually searched Sample Clustered Document Space PRECISION 4 5% search (n) full search search of n clusters (2, \s