IX-1

Interactive Search and Retrieval Methods Using Automatic Information Displays

M. E. Lesk and G. Salton

Abstract

Presently available information storage and retrieval systems do not produce retrieval results which will satisfy the information needs of all potential users. Interactive search methods using console displays and conversational computing methods promise to furnish retrieval results which are far superior to those achievable by conventional procedures. In the present study, various interactive search strategies are used in conjunction with the automatic SMART document retrieval system, and an attempt is made to evaluate the effectiveness of each method as part of a retrieval system. In particular, the usefulness of each method in retrieving wanted and rejecting unwanted items is discussed, as well as the cost of the user-system interaction in terms of additional user effort and computer time. It is found that for all but the most experienced users, automatic methods requiring little user effort are preferable to more sophisticated procedures which may produce somewhat better retrieval results at somewhat higher cost.

1. Introduction

Throughout the world, the design and operation of large-scale information systems have become of concern to an ever-increasing segment of the scientific and professional population. Furthermore, as the amount and complexity of the available information have continued to grow, the use of mechanized or partly mechanized procedures for various information storage and retrieval tasks has also become more widespread. While a number of retrieval systems are already in operation in which the search operations needed to compare the incoming information requests with the stored items are performed automatically, no systematic study has ever been made of the use of man-machine interaction as a part of a mechanized text analysis and information processing system. Specifically, the recent development of high-capacity random-access storage mechanisms and conversational input-output consoles should permit a rapid interchange of information between users and system. Such an interchange can then be used to produce improved search formulations, resulting in a more effective retrieval service.

The present report describes and evaluates the performance of a variety of such interactive search and retrieval procedures in which information supplied by the user population is taken into account in an attempt to achieve improved system responses.

Several basic approaches to user-system interaction are possible.

On the one hand, an attempt can be made to construct refined query formulations, using dictionary displays and similar methods, before any file search is actually attempted. On the other hand, an original query can be processed when it is first received, and a query reformulation attempted after the results of an initial search are actually available. These two procedures, termed pre-search and post-search, respectively, can in turn be executed in several different ways: either the system assumes most of the burden of the query reformulation through an automatic query alteration process, or the users themselves can rephrase their queries using the available automatic displays. In the latter case, the skill of the user population becomes a more important factor. The stored data most important in the pre-search methods might include synonym dictionaries and thesauruses, word frequency statistics, and lists of significant words; the post-search information, on the other hand, consists of the titles, abstracts, or texts of documents retrieved by a previous search process.

The investigation of the various interactive search and retrieval methods is carried out with the help of the automatic SMART document retrieval system [1,2]. The SMART system is a large computer-based

retrieval system capable of performing a variety of different text analysis, search, and retrieval operations. Completely automatic text analysis and information searches are made using several different analysis methods and search strategies. Among the main text analysis procedures are synonym recognition, word disambiguation, phrase recognition, statistical term association, and hierarchical text expansion methods.

The effectiveness of the various analysis and search methods may be evaluated using the familiar recall and precision measures, representing respectively the proportion of relevant material actually retrieved, and the proportion of retrieved material actually relevant. Ideally, all relevant items should be retrieved for the user, while at the same time, all nonrelevant items should be rejected, thus leading to a system where both recall and precision are equal to 1. The performance effectiveness of an operating system can then be estimated by averaging recall and precision figures over many searches and comparing the results with the ideal situation where recall and precision are equal to 1.
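The two measures just defined can be illustrated with a minimal sketch. The function names and the use of document identifiers held in sets are assumptions made for illustration only; they are not taken from the SMART programs.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a single search.

    recall    = relevant items retrieved / all relevant items
    precision = relevant items retrieved / all retrieved items
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def average_over_searches(results):
    """Average recall and precision over (retrieved, relevant) pairs."""
    pairs = [recall_precision(ret, rel) for ret, rel in results]
    n = len(pairs)
    return (sum(r for r, _ in pairs) / n, sum(p for _, p in pairs) / n)
```

For instance, a search that retrieves four documents of which two are relevant, out of three relevant documents in the file, has a recall of 2/3 and a precision of 1/2.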

The SMART system automatically generates for each search a set of recall-precision graphs first introduced by Cleverdon [3], and also includes procedures for performing computations of the statistical significance of the results. Evaluation data for a wide variety of automatic text processing, search and retrieval methods have previously been published [4].

In addition to the recall-precision data which reflect the capability of the system to deliver to the user the information he requests, it is also important in an interactive computing environment to take into account the amount of effort required from the user to obtain satisfactory results. Thus, the standard performance of fully-automatic search and retrieval operations must be compared against the improvements obtainable through interactive procedures at additional cost in user effort and computer time.

In the remainder of this study, the effectiveness of various types of interactive search methods is examined, including both pre-search and post-search methods, and semi- or fully-automatic query reformulation procedures. The results are compared using, in each case, the evaluation methods incorporated into the SMART system. Construction principles are then derived for future information services designed to use man-machine interaction during the search process.

2. Fully-Automatic Retrieval

In the SMART system, various fully-automatic language analysis procedures are used to normalize the text of incoming search requests and of stored documents. The normalized, reduced forms of the information items, consisting generally of weighted "concept" numbers, are then compared, and the document representations which are most similar to the request representations are extracted from the file as answers to the queries. The language normalization procedures incorporated into the SMART system range from simple word stem matching methods to more sophisticated processes using stored synonym dictionaries and hierarchies, as well as statistical and syntactic analysis methods [1,2].

Three of the simplest language analysis methods, known, respectively, as word form, word stem, and thesaurus processes, may be described as follows:

a) in the word form, or suffix 's', process, no word normalization in the proper sense is used at all, and the original words with only the final 's' removed (to confound, for example, "book" and "books") are compared directly;

b) in the word stem method, the original text words are reduced to word stems by a suffix cut-off process to confound words like "analyzer", "analysis", "analyzed", and so on, before the comparison between queries and documents;

c) in the thesaurus process, each word stem is looked up in a synonym dictionary, or thesaurus, where it is replaced by one or more so-called concept numbers, representing synonym classes; the concepts extracted from the thesaurus are then matched instead of the original word forms or word stems.

In all analysis methods, the terms are normally weighted, using word frequency and other criteria, before a comparison is made between stored documents and search requests.
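The word form and word stem processes can be sketched as follows. This is only an illustration: the suffix list, the minimum stem length, and the use of raw occurrence counts as weights are assumptions, and the suffix cut-off rules actually used by SMART are not given here. A realistic suffix list would be needed to confound forms such as "analyzer" and "analysis".

```python
from collections import Counter

# Hypothetical suffix list, chosen only for this illustration.
SUFFIXES = ("ing", "ed", "es", "s")


def word_form(word):
    """Suffix 's' process: remove only a final 's'."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and len(w) > 1 else w


def word_stem(word, suffixes=SUFFIXES, min_stem=3):
    """Suffix cut-off: strip the longest listed suffix, keeping a minimum stem."""
    w = word.lower()
    for suffix in sorted(suffixes, key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


def term_vector(text, normalize):
    """Weighted term vector; weights here are simple occurrence counts."""
    return Counter(normalize(token) for token in text.split())
```

With these definitions, `term_vector("books book Books", word_form)` assigns weight 3 to the single term "book", and `word_stem` maps "winding", "booked", and "books" to "wind", "book", and "book" respectively.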

An excerpt from a typical, manually constructed thesaurus is shown in Table 1. Three of the synonym classes defined by the thesaurus mapping are shown in the right-hand side of Table 1. Concept class 346, for example, contains words specifying objects which fly; category 345 lists words associated with weather. If a request were made, asking

"do planes fly when the weather is bad?"

the system would retrieve a document stating

"proper meteorological conditions are necessary for the successful piloting of aircraft",

since both document and query would be assigned the concepts 345 and 346.

The handling of ambiguous words in the thesaurus is exemplified by the entry for "wind", which could be either the noun, referring to weather, or the verb, indicating a method of constructing loops or coils. The table shows that "wind" is in two categories, 345 and 233. Category 345, containing also "weather" and "atmosphere", represents the noun, and 233, which contains such words as "winding", "wire-wound", and "solenoid", represents the verbal meaning. Whenever "wind" appears, both 345 and 233 will be entered into the concept vector. Because the word is considered ambiguous, the weight will be divided between these two categories; each will receive half of the weight assigned to "wind".

It should be noted that the thesaurus entries may consist of word stems, so that "meteorolog" suffices to look up "meteorology" and "meteorological". If desired, however, suffixed forms of a word may be entered in

Table 1. Thesaurus Excerpt

Alphabetic Order
  Words:          Wide, Will, Wind, Winding, Wipe, Wire, Wire-wound
  Concept codes:  438, 32032, 345, 233, 233, 403, 232, 105, 233
  Syntax codes:   001 043 040 009 070 043 044 049 070 043 070 136 137 043 070 070 043 001

Numeric Order
  Concept   Words
  344       obstacle, target
  345       atmosphere, meteorolog, weather, wind
  346       aircraft, airplane, bomber, craft, helicopter, missile, plane

Table 2. Typical Query Updating Methods

Query Alteration Process        Explanation

Pre-Search
  1. Repeated Concepts          User chooses query terms to be repeated for emphasis
  2. Thesaurus Display          User chooses terms obtained from thesaurus display to update query (with or without time restrictions)
  3. Word Frequency             User looks at display of word frequency information before updating query
  4. Source Document            User looks at display of source document before updating

Post-Search
  5. Title Display              User looks at titles of first five retrieved documents before updating
  6. Abstract Display           User looks at abstracts of first five retrieved documents
  7. Relevance Feedback         Query is updated automatically using relevance judgments supplied by user following an initial search

Combined Methods
  8. Abstract plus Thesaurus    User looks at pre- and post-search information

the thesaurus; this has been done with "winding", since if only "wind" were in the dictionary, "winding" would also be treated as ambiguous, but the presence of "winding" in the thesaurus makes it possible to identify "winding" in the text with category 233 only. The high concept number identifies "will" as a so-called common word, not to be used for content identification. The syntax codes shown with the thesaurus entries in Table 1 are not used in the simple automatic thesaurus process.

Since the fully-automatic thesaurus process based on concept number matching is often an effective analysis tool, more sophisticated language normalization methods may not normally be required in an operational retrieval system.
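The thesaurus process described above, including the division of weight for an ambiguous word, can be sketched as follows. The entries are taken from Table 1 and the accompanying text; the longest-prefix lookup, the numeric threshold used to recognize common words, and the cosine measure of query-document similarity are assumptions made for this illustration and are not stated in the text.

```python
import math

# Entries from Table 1 and the discussion above (stems mapped to concept numbers).
THESAURUS = {
    "wind": [345, 233], "winding": [233], "wire-wound": [233],
    "weather": [345], "atmosphere": [345], "meteorolog": [345],
    "aircraft": [346], "airplane": [346], "plane": [346],
    "will": [32032],
}
COMMON_WORD_THRESHOLD = 32000  # assumed cut-off for "high" concept numbers


def lookup(word):
    """Return the concepts of the longest thesaurus entry that prefixes the word."""
    w = word.lower()
    matches = [entry for entry in THESAURUS if w.startswith(entry)]
    return THESAURUS[max(matches, key=len)] if matches else []


def concept_vector(text):
    """Build a weighted concept vector; ambiguous words split their weight."""
    vector = {}
    for token in text.lower().split():
        token = token.strip('.,;:?!"')
        concepts = [c for c in lookup(token) if c < COMMON_WORD_THRESHOLD]
        for c in concepts:
            vector[c] = vector.get(c, 0.0) + 1.0 / len(concepts)
    return vector


def cosine(a, b):
    """Cosine similarity between two concept vectors (assumed measure)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Under these assumptions, `concept_vector("wind")` gives weight 0.5 to each of 345 and 233, `concept_vector("winding")` gives weight 1.0 to 233 alone, and the sample request and document quoted above both reduce to the concepts 345 and 346. A plain prefix match would of course also capture unrelated words beginning with the same letters, so a practical lookup would need to be more careful.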

3. User Interaction Through Pre-Search Methods

One of the main hopes in obtaining a retrieval performance which goes beyond that presently reached under normal operating conditions is to include the customer in the search process. In particular, fewer errors are likely to be made if the information obtained from the users is not restricted to the search request proper, but is supplemented by a variety of special user indications, or by evaluation data about the acceptability of items previously retrieved by the system in answer to the search requests.

User-system interaction is now current for many computer applications, often implemented by special input-output console devices, with the help of operating systems which enable the system to render more or less simultaneous service to a large class of users.

In an information retrieval environment, user interaction may take the form of simple dictionary display routines which can be used to present to the user selected dictionary excerpts as an aid in formulating the original search requests, or in reformulating queries which were originally inadequate [5,6]. Alternatively, more sophisticated methods may be used in which the reformulation of the search requests is automatically performed based on feedback information obtained from the user population [7,8].

The conceptually simpler methods are the pre-search procedures which are based on term and dictionary displays of previously stored information. In each case, a user would look at the displayed information and, based on the available data, would decide before any file search is actually attempted how his query could best be reformulated in order properly to reflect his information needs. The following types of pre-search information could be displayed for this purpose:

a) lists of terms included in the user's original search formulation together with word frequency information giving the frequency of use of each word in one or more of the stored document collections;

b) thesaurus excerpts corresponding to the terms included in the user's search formulation, and consisting, for each of the originally available terms, of a complete thesaurus class, including synonyms and other terms related to the original;

c) titles and abstracts of source documents, that is, of documents originally known to the user as relevant to his search query (the intent of the user is then to retrieve new documents similar to the source items).
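A display of the first kind, listing the query terms with their frequencies in a stored collection, might be produced along the following lines. This is a sketch under stated assumptions: the frequency is taken to be the number of documents containing the term, and the fraction above which a term is flagged as general (`high_fraction`) is a hypothetical parameter.

```python
from collections import Counter


def collection_frequencies(documents):
    """Count, for each word, the number of documents in which it occurs."""
    freq = Counter()
    for doc in documents:
        freq.update(set(doc.lower().split()))
    return freq


def frequency_display(query, documents, high_fraction=0.5):
    """Return (term, document count, flag) rows for each query term."""
    freq = collection_frequencies(documents)
    n = len(documents)
    rows = []
    for term in query.lower().split():
        count = freq[term]
        flag = "general" if n and count / n >= high_fraction else ""
        rows.append((term, count, flag))
    return rows
```

A user shown such rows could delete the terms flagged as general and repeat the rare, specific ones, which is the kind of decision the display is meant to support.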

The principal differences between fully-automatic retrieval and retrieval using pre-search interaction are summarized in the flowcharts of Figs. 1(a) and 1(b). The pre-search requires the generation of a computer display followed by a manual choice of terms on the part of the user during the query formulation process.

The display of word frequency information is designed to inform the user of the characteristics of the vocabulary which may be used to express his information needs. Thus, if a user notices that many of the terms included in his search request are general terms with a very high frequency of occurrence in the stored document collection (for example, terms such as "computer" and "automatic" in a collection on computer science), he may decide that it is wise to delete these terms from his query so as to prevent the generation of high query-document correlations for many nonrelevant documents. On the other hand, the user may decide to emphasize many highly specific, low-frequency terms by repeating them in the query formulation.

A thesaurus display can be used for manual query updating by requesting a printout of the complete thesaurus classes corresponding to each term included in the original query. Consider, as an example, a query dealing with the "contraction of satellite orbits", and assume that the user signifies that he is interested in the "satellite" class. The computer might then type out terms such as Discoverer, Sputnik, Vanguard, Cosmos, Moon, rocket, trajectory, countdown, drag, telemetry, etc.

After studying the display, the user might decide that his original query formulation had been insufficiently specific, and the query might then be altered by addition of the terms "Discoverer, Sputnik, Vanguard, Cosmos, drag, and telemetry". The other displayed terms would, however, be rejected as not being germane to the search topic. A second expansion might begin by typing in the term "drag", and then considering the new display of terms related to "drag".

Thesaurus displays are also occasionally useful for the removal from the query formulation of incorrectly used and ambiguous terms. For example, a user interested in information retrieval who identifies his search topic as "IR" might discover that the thesaurus display produces a list of synonyms in the area of "infra-red spectroscopy". As a result, the term "IR" would, of course, be removed from the search formulation.

The use of thesaurus displays for manual query updating provides an opportunity for a selective choice of synonym and related terms. That is, the user can choose some terms to be added to the original query, and others to replace already existing ones in an attempt to improve search precision. On the other hand, the automatic thesaurus process operates less selectively and provides synonym recognition by the standard process of automatically replacing the word stems originally included in the search requests and documents by the corresponding concept class numbers extracted from the thesaurus. The automatic thesaurus process is thus designed to normalize query and document statements by generalizing the respective formulations rather than by making them more specific. Such a process may be expected to improve recall, since more relevant documents could now


match the query statements and could thus be retrieved in answer to the respective search requests.

Obviously, the manual query updating methods using thesaurus displays place a considerable burden on the user, since he is forced to consider a large number of alternative possibilities before eventually making a choice. Moreover, the choice must be made before a search has actually been performed, at a time when he cannot know as yet how well the machine will perform with any potential query formulation. A comparison of the effectiveness of manual and automatic thesaurus procedures is contained in Section 5 together with the other evaluation output.
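The manual updating step using a thesaurus display can be sketched as follows, with the "satellite" class from the example earlier in this section. The function names and the representation of a query as a list of terms are assumptions for illustration; the choice of terms to add or remove is the user's and is simply passed in.

```python
# Thesaurus class shown to the user in the "satellite" example of this section.
THESAURUS_CLASSES = {
    "satellite": ["Discoverer", "Sputnik", "Vanguard", "Cosmos", "Moon",
                  "rocket", "trajectory", "countdown", "drag", "telemetry"],
}


def display_class(term, classes=THESAURUS_CLASSES):
    """Return the complete thesaurus class for a query term (empty if unknown)."""
    return list(classes.get(term.lower(), []))


def update_query(query_terms, add=(), remove=()):
    """Apply the user's choices: drop rejected terms, append newly selected ones."""
    removed = {t.lower() for t in remove}
    updated = [t for t in query_terms if t.lower() not in removed]
    present = {t.lower() for t in updated}
    for term in add:
        if term.lower() not in present:
            updated.append(term)
            present.add(term.lower())
    return updated
```

For the query on the "contraction of satellite orbits", the user would inspect `display_class("satellite")` and call `update_query` with the six chosen terms; for the "IR" example, the same function is called with `remove=["IR"]`.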

4. User Interaction Through Post-Search Methods

The post-search methods are those applicable after an initial search has first been performed. In such a case, one or more documents will already be available, including in particular those items which were initially judged to be most similar to the search requests. These items can now be used in a manner analogous to that previously utilized for the thesaurus displays. Specifically, the titles, or abstracts, of the first few retrieved documents can be examined, and document terms which appear to reflect the wanted subject area can be added to the query statement, while ambiguous and unwanted terms can be removed.

Consider, for example, the previously cited query dealing with the "contraction of satellite orbits", and assume that the first two retrieved items are entitled "Discoverer satellite and South Pacific splash down",

and "The moon and the tides". A user could now proceed to add "Discoverer satellite" to the original query, but could avoid the addition of "South Pacific".

The document feedback expansion may be even more difficult to carry out than the dictionary display procedure, since the user is forced to make sophisticated decisions using relatively large text excerpts. Thus, whereas the dictionary display procedure can often be performed in less than a minute per query, approximately four minutes are required on the average for the use of five typical document abstracts. Furthermore, the document expansion process also entails a higher cost in machine time and storage space than the dictionary display, since document abstracts in natural language form constitute a much greater bulk than dictionary excerpts. In addition, an initial retrieval run must first be made before document feedback can be used. On the other hand, a stored dictionary need, of course, not be available for the document feedback method.

Another post-search method is designed particularly for those users who do not wish to assume the burden of query reformulation themselves. For such users, an automatic relevance feedback method is available which requires only a minimum of interaction with the user, since most of the burden is placed on internally stored routines [7,8,9,10]. Specifically, an initial search is first performed for each request received, and a small amount of output, consisting of some of the highest scoring documents, is presented to the user. Some of the retrieved output is then examined by the user who identifies each document as being either relevant (R) or not relevant (N) to his purpose. These relevance judgments are later returned to the system, and used automatically to adjust the initial search request in such a way that query terms, or concepts, present in the relevant documents are promoted (by increasing their weight), whereas terms occurring in the documents designated as nonrelevant are similarly demoted. If the terms from the relevant items are added to the search requests, while terms from nonrelevant items are subtracted, the first query updating operation can be represented by the equation:

$$q_1 = q_0 + \sum_i r_i - \sum_i s_i$$

where $q_0$ is the original query formulation, $q_1$ is the updated query, $r_i$ is the set of terms identifying the $i$-th document specified as relevant by the user, and $s_i$ is the set of terms identifying the $i$-th nonrelevant document.

This process produces an altered search request which may be expected to exhibit greater similarity with the relevant document subset, and greater dissimilarity with the nonrelevant set.

The altered request can next be submitted to the system, and a second search can be performed using the new request formulation. If the system performs as expected, additional relevant material may then be retrieved, or, in any case, the relevant items may produce a greater similarity with the altered request than with the original. The newly retrieved items can again be examined by the user, and new relevance assessments can be used to obtain a second reformulation of the request. This process can be continued over several iterations, until such time as the user is satisfied with the results obtained. Since the method makes very few demands on the user, the automatic relevance feedback process may be expected to be preferred by users unfamiliar with the system operations. On the other hand, the process is not likely to be effective if the user is unable to identify for the system at least one document which is clearly relevant to his needs.

The post-search methods as well as the combined methods making use of pre- as well as post-search information are illustrated in the bottom half of Fig. 1. A summarization of all the query updating methods is given in Table 2.
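The first relevance feedback update described in this section, in which the terms of relevant documents are added to the query and those of nonrelevant documents subtracted, can be sketched as follows. Representing queries and documents as dictionaries of term weights is an assumption for illustration. Whether negative weights are kept or dropped is not stated in the text; this sketch keeps them.

```python
def relevance_feedback(q0, relevant, nonrelevant):
    """Return q1 = q0 + sum of relevant vectors - sum of nonrelevant vectors.

    q0, and each element of relevant and nonrelevant, is a dict mapping
    a term (or concept number) to its weight.
    """
    q1 = dict(q0)
    for r in relevant:
        for term, weight in r.items():
            q1[term] = q1.get(term, 0.0) + weight
    for s in nonrelevant:
        for term, weight in s.items():
            q1[term] = q1.get(term, 0.0) - weight
    return q1


# One iteration on a toy query: one document judged R, one judged N.
q0 = {"satellite": 1, "orbit": 1}
judged_relevant = [{"satellite": 1, "drag": 1}]
judged_nonrelevant = [{"moon": 1, "orbit": 1}]
q1 = relevance_feedback(q0, judged_relevant, judged_nonrelevant)
```

Further iterations apply the same function to `q1` with the judgments obtained from the next search.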

5. Evaluation Results and Discussion

The experimental results included in this section are based on the manipulation of a collection of 200 abstracts of documents in aerodynamics, together with 42 search requests proposed by scientists active in aerodynamics. Complete relevance judgments, prepared by these same scientists, were available which identify for each query the set of relevant documents. The aerodynamics collection has previously been used for test purposes by the Aslib-Cranfield project [3] and by the SMART system [4].

The thesaurus used for both the manual and automatic query expansion operations contains 3230 word stems and 736 thesaurus classes. This thesaurus was constructed by SMART staff members using text concordances, word frequency lists, standard dictionaries and reference works, and word lists obtained earlier from the Cranfield project. An attempt was made to time the query expansion operations by restricting the use of the thesaurus display to either one minute, two minutes, or more than two minutes. While the output of Table 3 shows that increasingly more terms can be added to the queries as more time becomes available for the updating operations, the differences in retrieval effectiveness are small, and the evaluation output shown represents the output obtained for a display time of two minutes.

The main results are presented first in terms of recall-precision graphs, and then in terms of cost and user effort.

A) Recall-Precision Results

The evaluation output is presented in Figs. 2 to 7 using the standard recall-precision graphs, averaged in each case over the 42 queries used with the collection of 200 documents. The curves are, as usual, monotonically decreasing, reflecting the fact that as more relevant items are retrieved (as recall goes up), more irrelevant items are also retrieved (causing the precision to go down). Increasingly more effective retrieval performance is reflected by recall-precision curves close to the upper right-hand corner of the graph where both recall and precision take on ideal values of 1. Next to the graphs, some of the numeric values are presented in terms of recall-precision tables, giving the average precision values at certain selected recall values.

Significance values, computed by a standard t-test, are also included in the output figures, representing in each case the probability that the performance values for two specified processing methods are in fact derived from the same distribution. Thus, if the computed probability value is high, the two methods are assumed to be statistically indistinguishable; on the other hand, if the probability value is low (say 0.05 or less), the likelihood that the evaluation results could have been derived from the same data set is very small, and the differences in performance can then be assumed to be statistically significant.

The following principal conclusions can be drawn from the output of Figs. 2-7:

a) automatic thesaurus vs. pre-search using thesaurus display (Fig. 2): the automatic thesaurus expansion and the manual expansion using pre-search thesaurus display both produce an improvement in performance over the word stem matching process. Overall, the automatic thesaurus (which requires no user intervention) is superior. At high precision, however, the greater selectivity of the words chosen by the manual process produces better results. The superiority of the automatic thesaurus at medium and high recall is attributed to the previously mentioned difficulty of selecting appropriate terms from the thesaurus display.

b) automatic thesaurus vs. pre-search using source document display (Fig. 3): the source document display produces a precision improvement of up to ten percent over and above the automatic thesaurus process; however, the table appearing with Fig. 3 shows that the improvement is not statistically significant. The relatively modest increase in performance may be due to the fact that the source documents and queries used in the experiment originated with the same authors, so that the source documents contain many of the terms already included in the query statements; also, some of the source documents appear only marginally relevant to the actual queries; both of the interactive pre-search methods turn out therefore to be not substantially superior to the fully-automatic thesaurus method (except at high precision).

c) post-search procedures using displays of titles or abstracts of previously retrieved items (Figs. 4 and 5): title and abstract post-search displays are superior to both of the pre-search displays, as shown in Figs. 4 and 5. Improvement with title display is limited to the high precision regions, since the titles are so short that words not in the query are rarely included. The query alterations due to title display are therefore limited to deletion of unnecessary concepts, improving mostly the precision. Abstract displays produce both precision and recall improvements, at the cost of greatly increased work on the user's part. The amount of text examined during an abstract display process is about 1000 words, from which five to ten may be selected for query expansion.

d) automatic thesaurus vs. post-search updating using abstract display and relevance feedback (Fig. 6): both the manual post-search method with abstract display and the automatic relevance feedback process are superior to the standard word stem process; the abstract display is best in the very high precision ranges. The performance differences between the two post-search methods are not significant at high precision, although the improvements obtained with both methods over the standard word stem process are significant. The relevance feedback output included in Fig. 6 is obtained by retrieving, in each case, 5 documents at a time, asking the user to identify any relevant items, and adding the corresponding terms to the search request.

e) combined pre-search dictionary and post-search abstract display (Fig. 7): Fig. 7 shows that a combination of abstracts and thesaurus displays offers an overall improvement of about twenty percent over the standard word stem process, and of ten to fifteen percent over the thesaurus process; in both cases, the improvement is statistically significant. When word frequency information is added to the display, a further improvement results for the word stem procedure, since the user can now ensure that all parts of the query are properly weighted. The output of Fig. 7 is approximately equivalent to the automatic relevance feedback process (Fig. 6); however, the combined pre- and post-search process requires much more user effort and experience than the relevance feedback method before it can operate successfully.
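The recall-precision tables mentioned under A), giving average precision at selected recall values, can be approximated with the following sketch. The averaging and interpolation conventions used by SMART are not described in this section; the convention assumed here takes, at each recall level, the best precision reached at that recall or higher, and then averages over queries. The recall levels shown are hypothetical defaults.

```python
def precision_at_recall(ranking, relevant, levels=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Precision at selected recall levels for one ranked search output."""
    relevant = set(relevant)
    points = []  # (recall, precision) after each relevant document retrieved
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    # Assumed convention: best precision at any recall >= the requested level.
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]


def average_curve(curves):
    """Average several per-query curves level by level."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]
```

Averaging the curves of all queries with `average_curve` gives one table row per processing method, which can then be compared level by level across methods.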

B) Overall Evaluation

The performance of the various interactive procedures is summarized in Table 4. The first column reflects computer demands; the second, user effort; and the last two reflect search effectiveness in terms of recall and precision improvements over and above the normal word stem matching method.

Since the post-search methods require two separate file searching operations (one prior to the interactive process and one following it), the computer demands are comparatively higher for post-search than for the other methods. Thus, when search time may be expected to be considerable (for example, for very large collection sizes), the pre-search procedures may become mandatory.

From the user's viewpoint, the less information is displayed, the easier will normally be the interactive process. Thus the relevance feedback procedure is simplest, since the user must merely identify one or another document as either relevant or nonrelevant; the pre-search thesaurus displays, and the post-search abstract displays are hardest, since complicated decisions are required to update the search requests.

Table 3. Variation in Query Length

Query Type                  Average Number of Significant Terms per Query
Original Query              8.3
Terms added in 1 minute     3.6
Terms added in 2 minutes    2.0
Terms added later           1.0

Table 4. Performance Summary

                                                           Precision Improvement over Word Stem Match
Processing Method            Demands on    Demands on      Low Recall    High Recall
                             Computer      User
A) Fully Automatic
   word stem match           normal        none
   automatic thesaurus       normal        none            +4%           +6%
B) Pre-Search Interaction
   thesaurus display         normal +      medium-high     +6%           +4%
   source document display   normal +      medium          +8%           +5%
C) Post-Search Interaction
   title display             high          medium          +13%          +2%
   abstract display          high          very high       +17%          +7%
   relevance feedback        high          low             +10%          +7%

Turning now to the performance parameters, it is seen in Table 4 that, everything else being equal, the post-search methods are more powerful than the pre-search procedures. (Unfortunately, those are also the methods which put the highest demands on the computer.) One obvious reason why the post-search methods operate more reliably is that a computer search has already been performed before the user is asked to update the query. Thus, the query alteration process is undertaken with prior knowledge of how well the original query has performed. The post-search alteration can then

be used to initiate small changes for queries requiring only little improvement, and more massive changes for the others. For the pre-search methods, no such prior information is available. Of the post-search methods, the best performance is obtained with abstract display; however, this method also makes the greatest demands on the user. The relevance feedback method is superior and much preferable from the

user's viewpoint. To summarize the performance and cost indications, the following search strategy would appear to be useful under most circumstances: a) normally, use standard automatic thesaurus method without user interaction; b) if improvement is needed and search time is not excessive, use relevance feedback;

c) if search time is at a premium, use pre-search source document or thesaurus display;

d) on the other hand, if high retrieval performance is mandatory, try post-search abstract display.
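The four rules a) to d) can be read as a simple decision procedure. The sketch below is one possible encoding, offered only as an illustration: the function name, the boolean parameters, and in particular the order in which rules c) and d) are tested when both conditions hold are assumptions, since the text does not state a precedence.

```python
def choose_search_strategy(improvement_needed,
                           search_time_at_premium=False,
                           high_performance_mandatory=False):
    """Pick a search method following rules a) to d); precedence is an assumption."""
    if not improvement_needed:
        # rule a): the normal case, no user interaction
        return "automatic thesaurus"
    if high_performance_mandatory:
        # rule d): highest performance, greatest user effort
        return "post-search abstract display"
    if search_time_at_premium:
        # rule c): avoid the second file search of the post-search methods
        return "pre-search source document or thesaurus display"
    # rule b): improvement needed and search time not excessive
    return "relevance feedback"
```

For example, `choose_search_strategy(True)` selects relevance feedback, while `choose_search_strategy(True, search_time_at_premium=True)` falls back to a pre-search display.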

The difficulties of the manual query updating methods may be illustrated by the example of query 317, reproduced in Table 5. The original word stem retrieval run produces the two relevant documents in rank positions 4 and 10. From the thesaurus display, the following words were selected: "elastic", "resilient", "unstiffened", "modulus", "aeroelastic", "laminar-boundary-layer". This promotes the two relevant documents to rank positions 2 and 5; however, the automatic thesaurus run yields rank positions 2 and 4, without any user interaction.

When the post-search displays are used, the results are similar. Title display is not very effective for this particular query, yielding only an indication that "theoretical" should be increased in weight, which raises the rank positions of the relevant documents from 4 and 10 (in the original word stem run) to 4 and 9. Abstract display is more fruitful, adding "elastic" and "resilient" as well. This raises the relevant documents to rank positions 1 and 6. However, the same query, now processed through the automatic thesaurus (abstract display and automatic thesaurus run), yields perfect performance, as does the automatic thesaurus run with relevance feedback.

To achieve perfect performance using only manual updating methods and word stem matches, it is necessary to utilize a combined thesaurus display, abstract display, and word-frequency information, which yields the following rather complex set of changes: delete "anyone" and "investigate"; increase the weight on "theoretical" and "flexibility" by a factor of two; add with weight of one the words "analytic", "resilient", "calculate", "unstiffened", "aeroelastic", and "laminar-boundary"; add with weight of two "flexure"; and add with weight of three "elastic". These changes produce a word stem run with perfect performance, but at far greater time and trouble than the automatic thesaurus with abstract display run. The exact adjustment of the term weights is normally performed more accurately and more easily by the automatic thesaurus. The manual methods are thus best reserved for users with the skill and interest to consult lengthy displays and to make complex decisions.

A meaningful cost analysis is difficult to make without the use of an operational time sharing system to perform the experiments. Table 6 contains an estimated cost summary based on running times for the IBM 7094. Machine and user costs are assumed to be $75.00 and $10.00 per hour, respectively. Scanning time is 5 milliseconds per document, and additional central processor time is ignored. Table 6 shows that the post-search methods are clearly the most expensive (they also are the most effective), with relevance feedback cheaper than abstract displays. In general, the automatic procedures appear economically and operationally better suited to the retrieval operations than the manual methods. Since the cost of human time may be expected to continue to increase relative to the cost of machine time, the automatic procedures may grow even more attractive in the future.

[Figure: recall-precision graph with T-test significance table. Legible tabulated precision values, three per row: .634 .691 .669; .534 .594 .605; .462 .510 .541; .343 .376 .411; .253 .292 .314. Legible T-test significance values: .798 .339 .168 .181 .252 and .580 .203 .199 .138 .061.]

[Figure: rotated figure page; labels and values are illegible in the source.]

[Figure: recall-precision graph with T-test significance table; numeric values are illegible in the source.]

[Figure: recall-precision graph with T-test significance table; numeric values are illegible in the source.]

[Figure: rotated figure page; labels and values are illegible in the source.]

[Figure: recall-precision graph with T-test significance table. Legible tabulated precision values: .787 .695 .631 .469 .361 in one column, and the pairs .634 .794; .534 .668; .462 .595; .343 .445; .253 .349. Legible T-test significance values: .009 .002 .006 .001 .002 and .009 .007 .014 .001 .002.]

Query 317: Has anyone investigated theoretically whether surface flexibility can stabilize a laminar boundary layer? (Two Relevant Documents)

Processing Method                             Terms "Added" or Deleted                    Ranks of Relevant Documents

A) Fully Automatic
   word stem match                                                                        4, 10
   automatic thesaurus                                                                    2, 4

B) Improved Searches
   word stem plus thesaurus display           "unstiffened", "modulus", "elastic",        2, 5
     (pre-search)                             "resilient", "aeroelastic"
   word stem plus title display               "theoretical"                               4, 9
     (post-search)
   word stem plus abstract display            "elastic", "resilient", "theoretical"       1, 6
     (post-search)

C) Perfect Searches
   automatic thesaurus plus                                                               1, 2
     relevance feedback
   word stem plus abstract display            "elastic", "resilient"                      1, 2
     plus automatic thesaurus
   thesaurus display plus abstract            anyone, investigate, "theoretical",         1, 2
     display plus word frequency              "flexibility", "analytic", "resilient",
     display                                  "calculate", "unstiffened", "aeroelastic",
                                              "laminar-boundary", "flexure", "elastic"

Typical Manual Query Updating
Table 5
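The combined manual update in the last row of Table 5, with the weights given in the text, amounts to three operations on a weighted term vector: deletion, rescaling, and addition. A minimal sketch follows. The function `update_query`, the dictionary representation, and the starting vector (every significant word of query 317 at weight 1.0) are assumptions made for illustration; the source does not give the original term weights.

```python
def update_query(query, delete=(), scale=None, add=None):
    """Apply a manual update to a term -> weight dictionary and return a new one."""
    dropped = set(delete)
    updated = {term: w for term, w in query.items() if term not in dropped}
    for term, factor in (scale or {}).items():
        if term in updated:
            updated[term] *= factor           # e.g. double an existing weight
    for term, weight in (add or {}).items():
        updated[term] = updated.get(term, 0.0) + weight
    return updated


# Assumed starting vector: each significant word of query 317 at weight 1.0.
query_317 = {term: 1.0 for term in (
    "anyone", "investigate", "theoretical", "surface", "flexibility",
    "stabilize", "laminar", "boundary", "layer",
)}

# The changes listed in the text for the combined display run.
manual_317 = update_query(
    query_317,
    delete=("anyone", "investigate"),
    scale={"theoretical": 2.0, "flexibility": 2.0},
    add={
        "analytic": 1.0, "resilient": 1.0, "calculate": 1.0,
        "unstiffened": 1.0, "aeroelastic": 1.0, "laminar-boundary": 1.0,
        "flexure": 2.0, "elastic": 3.0,
    },
)
```

Even in this compact form the user must supply twelve separate decisions, which illustrates why the text regards the manual route as far more troublesome than the automatic thesaurus.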

Processing Method                                  Estimated Cost per Query
                                                   50,000 documents    100,000 documents

A) Fully Automatic
   word stem match                                 $ 5.00              $10.00
   automatic thesaurus                             $ 5.00              $10.00

B) Interactive Pre-Search
   thesaurus display                               $ 6.00              $11.00
   source document display                         $ 5.50              $10.50

C) Interactive Post-Search
   title display                                   $10.50              $20.50
   abstract display                                $13.00              $23.00
   relevance feedback                              $10.50              $20.50

D) Partial Search
   cluster searches (one-tenth of collection)      $ 0.50              $ 1.00
   cluster search plus relevance feedback          $ 6.00              $11.50
   cluster search plus abstract display            $ 8.50              $14.00

Assumptions:
   machine cost                $75.00/hour
   document scan               5 msec/doc
   central processing cost     0
   human time                  $10.00/hour

Estimated Cost Figures
Table 6

The bottom part of Table 6 shows that processing cost goes down drastically if partial searches of the collection are performed, rather than full searches. Such partial "cluster" searches are implemented with the SMART system; however, the cluster searches cannot be used if a recall performance higher than about 50 percent is required [11].
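The machine component of the Table 6 estimates follows directly from the stated assumptions ($75.00 per hour, 5 milliseconds per document scanned, no other processor cost). The sketch below reproduces that arithmetic. The function names and the `searches`, `fraction_scanned`, and `user_minutes` parameters are hypothetical; in particular, the user time per method is not stated in the text, so no attempt is made to reproduce the interactive entries exactly.

```python
MACHINE_COST_PER_HOUR = 75.00      # stated assumption
USER_COST_PER_HOUR = 10.00         # stated assumption
SCAN_SECONDS_PER_DOCUMENT = 0.005  # 5 msec per document, stated assumption


def machine_cost(documents_scanned):
    """Cost in dollars of scanning the given number of documents."""
    hours = documents_scanned * SCAN_SECONDS_PER_DOCUMENT / 3600.0
    return hours * MACHINE_COST_PER_HOUR


def query_cost(collection_size, searches=1, fraction_scanned=1.0, user_minutes=0.0):
    """Machine cost of all file searches plus the cost of the user's time."""
    scanned = collection_size * fraction_scanned * searches
    return machine_cost(scanned) + user_minutes / 60.0 * USER_COST_PER_HOUR


full_50k = query_cost(50_000)                           # one full search
full_100k = query_cost(100_000)
cluster_50k = query_cost(50_000, fraction_scanned=0.1)  # one-tenth of collection
post_search_50k = query_cost(50_000, searches=2)        # two file searches
```

One full search of 50,000 documents comes to about $5.21 and a one-tenth cluster search to about $0.52, in line with the rounded $5.00 and $0.50 entries of Table 6; the two file searches of a post-search method double the machine cost before any user time is added.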

6. Conclusion

The best overall process for precision purposes is the abstract display used in conjunction with a word stem matching procedure. For recall purposes, a combination of abstract display with thesaurus word normalization appears best. The automatic relevance feedback approximates the abstract display method while requiring much less user effort. Considering the complexity of the abstract display system, a sensible set of recommendations for high performance real-time retrieval would be the following:

a) for highest precision, use title display and word stem matching;

b) for highest recall with normal users, use the automatic thesaurus followed by automatic relevance feedback; with experienced and patient users, use abstract display and dictionary display plus frequency information;

c) for maximum cost reduction at lower performance, use partial searches of the document collection.

These rules provide a graded set of feedback methods, ranging from automatic procedures which make only minimal demands on the user and are suitable for novices (automatic thesaurus expansion, relevance feedback), to methods permitting sophisticated user-system interaction which combine the best features of manual and automatic query adjustment (thesaurus and abstract display). One may expect that a suitable mix of user feedback procedures can be found to produce optimal retrieval under many different conditions over many types of user classes.

References

[1] G. Salton and M. E. Lesk, The SMART Automatic Document Retrieval System - An Illustration, Communications of the ACM, Vol. 8, No. 6, June 1965.

[2] G. Salton, et al., Scientific Reports on the SMART System to the National Science Foundation, Nos. ISR-11, ISR-12, ISR-13, Department of Computer Science, Cornell University, Ithaca, New York, June 1966, June 1967, and January 1968.

[3] C. W. Cleverdon, Jack Mills, and E. M. Keen, Factors Determining the Performance of Indexing Systems, Vol. 1, Design, Aslib-Cranfield Research Project, Cranfield College of Aeronautics, 1966.

[4] G. Salton and M. E. Lesk, Computer Evaluation of Indexing and Text Processing, Journal of the ACM, Vol. 15, No. 1, January 1968.

[5] R. M. Curtice and V. Rosenberg, Optimizing Retrieval Results with Man-machine Interaction, Center for the Information Sciences Report, Lehigh University, Bethlehem, Pa., 1965.

[6] H. Borko, Utilization of On-Line Interactive Displays, in Information Systems Science and Technology, D. Walker, editor, Thompson Book Co., Washington, D. C., 1967.

[7] J. J. Rocchio, Jr., Document Retrieval Systems Optimization and Evaluation, Harvard University Doctoral Thesis, Scientific Report No. ISR-10 to the National Science Foundation, Harvard Computation Laboratory, March 1966.

[8] J. J. Rocchio and G. Salton, Information Search Optimization and Iterative Retrieval Techniques, Proceedings of the AFIPS Fall Joint Computer Conference, Vol. 27, Spartan Books, November 1965.

[9] G. Salton, Search and Retrieval Experiments in Real-Time Information Retrieval, Proceedings IFIP Congress - 68, Edinburgh, August 1968.

[10] E. Ide, User Interaction with an Automated Information Retrieval System, Scientific Report No. ISR-12 to the National Science Foundation, Section VIII, Department of Computer Science, Cornell University, June 1967.

[11] R. T. Grauer and M. Messier, An Evaluation of Rocchio's Clustering Algorithm, Scientific Report No. ISR-12 to the National Science Foundation, Section VI, Cornell University, Department of Computer Science, June 1967.