IX-1

IX.

The Use of S t a t i s t i c a l S i g n i f i c a n c e in R e l e v a n c e Feedback J. Steven Brown and Paul D. Reilly

Abstract A new approach tically feedback shown significant iteration to relevance f e e d b a c k , the s t a t i s presented; concepts

concept

(SSC) a p p r o a c h , is

queries are constructed significant documents in

using

to be statistically from n o n r e l e v a n t and query vectors the

differentiating than using entire

relevant document types for testing are

rather

as e n t i t i e s .

Three new

query of

testing

SSC are p r e s e n t e d , and are g i v e n .

the results

these queries

The experimental to

queries

found

to be approximately

equivalent

Rocchio-type evaluation exponential is recommended

methods

in results

p r o d u c e d , regardless a newly developed

of the frozen study

criterion ranking and

(including

factor

[FERF]) used, but future are

courses

of investigation

outlined.

1.

Introduc t ion One of the major problems which an information of the re-

trieval

system must

solve is the d e t e r m i n a t i o n the

correspondence which the user

between a given query and really wishes to o b t a i n .

information the user

Often

IX-2

^

supplies

a request which for precise One method

is too

i n a c c u r a t e , too b r i e f , or of the documents

too

poorly worded to his n e e d s . a document during

retrieval for

relevant of

improving

the performance items found

retrieval

system

is to display

a preliminary to score

search of the document as either

files and relevant

to ask or n o n query

the user relevant

these documents The system

to his query.

then generates

a new and

by combining

the information

from these judgments

from

the known c h a r a c t e r i s t i c s bibliographic

(words used, ideas

expressed, retrieved. (]_ 2, 3

e n t r i e s , etc.) of the documents them that

Several a l g o r i t h m s , among and that of R. Crawford this

of J. J. Rocchio ( 4 ) , have been feedback.

and H. Melzer of relevance

develope

to address

technique

Nearly

all of the relevance

feedback

experimentation formula cited

to date has utilized by Crawford

the general query update

and Melzer ( 4 ) : 3 ( ) , = a O . + 3 Q n + Y^ R. + 6£ - + £ w -d. + l+l i 0 ' i . - i . - i i i=l i=l i=l
N

N

1

N

2

N

(1)

N
.

I v,#c.
- 1 . 1

, where

th Q. = H query at l J
l

le

i=l

where

Q. - query
l

at i

iteration returned returned

R. = relevant
l

documents

N. = nonrelevant

documents

d. = vectors of a set of documents considered as " e n v i r o n m e n t "

IX-3

c. = v e c t o r s imposed
a

of concepts showing relationships
w

9

$> Y>

5>

->

v. = w e i g h t s

Table

1 details

the c o n d i t i o n s

of some of the of v e c t o r s as

experiments. indivisible as a term,

In each a p p r o a c h , a c o m b i n a t i o n entities with (that

i s , the entire vector individual

is used

D O use of only The

p a r t s ) is

utilized. paper select

investigation

reported

in the present tests to

considers concepts Concepts relevant

the effect

of using

statistical

to be m a n i p u l a t e d shown and

in relevance

feedback

algorithms. between

to be significant

in d i f f e r e n t i a t i n g are used (Table to

nonrelevant

documents forms

construct

one of three different trieval p e r f o r m a n c e this statistically feedback

query

2) whose r e for to

is then tested. significant is based

The rationale

concept

(SSC) approach

relevance

on the following

hypotheses:

(1) only

The user bases his relevance a few of the concepts present

judgments in each

on

document;

(2)

The small

set of concepts which

the user represents than query;

employs

in his selection more accurately in which he is interested set of concepts

the i n f o r m a t i o n does the total

in the search

(3)

Those

concepts which

the user by a

considers

important analys i s .

can be determined

statistical

IX-4-

a Ide (5) Riddle, Horwitz, Dietz (6) 1 Crawford, Melzer (4) Rocchio (1,2,3) 1 0 0 1

6
0 l 0 0

Y 1 1 1

6 0 0 0

w.
l

v .
l

0 0 0

0 0 0 0

1 / N 1 -1/N 2 0

Experimental

conditions

Table 1

Positively significant and negatively significant cone ep t s

Nonsignificant concep t s

Concept-correlated query Nonsignificant elements query

Mean of relevant document concept values

Mean of concept values for all documents

0

Remaining elements of original query, if any 0

Strictly significant query

Mean of relevant document concept values

Query Definitions Table 2

IX-5

The effort based find a statistical The basic depends

on these premises

is therefore

to

process which will SSC approach

satisfy ( 3 ) . in this study relevance

developed the

on the c o r r e l a t i o n between to a given query, and

(user-judged) a

of a document concept has

the w e i g h t s or

particular

in each retrieved

(relevant

nonrelevant) in Table with 3.

document.

One can think of the process v e c t o r s d_.

as shown

As is e v i d e n t , the document weights as necessary so that are

are padded

zero

each vector has then aligned the weight

the same the

number

of e l e m e n t s ; the v e c t o r s element concept d.. j

so that

= w.. r e p r e s e n t s

in document from an

i

of

(which numbering in C =

is determined

enumera},

tion of all concepts where C has n

{c | c 6 d_. , i = 1, The column vector of concept

..., n S.
—i

elements). w..
jl

then in

contains

as entries

the w e i g h t s
6

c.
I

each document either binary each

retrieved. or spectral

The relevance vector (graded) relevance included

R

includes for

judgments

<d. , the latter being

to a s c e r t a i n affects

whether the

the type of r e l e v a n c e judgment m a t e r i a l l y correlation. For R using each

i = 1, 2,....,m, J . is correlated 3 moment:

against

the P e a r s o n product

IX-6

C o r r e l a t i o n vector

V:

v . J

correlation between

71

S r 1 r r
: •
w

ll

W

12

W

13

|

•• •

W

lj 2j 3j

lm w0 2m 3m
-

2 3

W

21

w22 1
w

w23
w

W W

|W31

32

33
•

I
W

r n

1

W

nl

n2

W

n3

w .

w

nm

d ~n

1
Column vectors Column vector Vector entry Row vectors d S. = concept vectors
-1
r

R - relevance vector of user j udgments w.. = weight of i th concep t in document l = document vectors

Correlation between concepts and relevance weights Table 3

IX-7

I
i=l

(x
1

- y ) (y
1 1

-

y )
*•

(2)

P

1 where y = — 1 n

n

Z x. , . -, l i=l V = (v , v

K

y 2

1 = — n

n

L ' y. . .. J l 1=1 formed

A c o r r e l a t i o n vector using the relation
V

, ..., v ) is then

(3)

i

"

P

R

S.

'

This c o e f f i c i e n t value

of c o r r e l a t i o n

differs

from

the

cosine

n E (4) q xy

x. •y.
x x

i=i
?
• i i=l

2
1

"
• i i=l
1

2

in that

each o c c u r r e n c e

of a vector formula

element

x.
l

or the

y,
i

in the cosine x. - y or

coefficient

is replaced

by y.

term mean

y. - y

, as a p p r o p r i a t e , where vector. The

is the

of the entries moment

of the particular

Person regardless

thus provides values

ranging

from -1 to +1

of the signs of the vector on the other vector

e l e m e n t s ; the cosine non-negative

correlation, if all

hand, will be strictly non-negative.

entries are

IX-8

As an example of the d i f f e r e n c e b e t w e e n c o e f f i c i e n t s , one can consider vectors : (5) The cosine q._ = 1.0 AB tions are A = (1, 10) B = (1, 10) for C = the following

the

two

two-element

(10, 1) are

correlation and P q AC = q BL

coefficients = 0.20 P

these vectors the Pearson . The

, while = P

correlalatter

= 1.0

and

= -0.98

v a l u e of -0.98 magnitude vectors

in the P e a r s o n

set is indicative

of a strong

of a s s o c i a t i o n between vectors A and C and is nearly as strong as that

B and C; this a s s o c i a t i o n B, but

b e t w e e n A and versed implies (that that

the " d i r e c t i o n " of a s s o c i a t i o n

is r e of A

i s , a high value of the first component the first component of B will

also have a high

v a l u e , but value,

that

the first

component

of C will have a low to hold among A, B, the re-

if the a s s o c i a t i o n s

are assumed

and C in g e n e r a l ) .

The Pearson moment

thus d i s t i n g u i s h e s (in information is a

three cases of high negative

correlation

t r i e v a l , an indication of active user disinterest c o n c e p t ) , values near zero

(a sign of user u n c o n c e r n correlation

regarding of

a c o n c e p t ) , and high positive active interest in a c o n c e p t ) .

(an indication

These categories

correspond relationships of the

to the intuitive

ideas of p o s i t i v e and negative a basis for a possible

as well as provide present study

extension

to include

some variety

of negative

feedback.

IX-9

For either correlation method, there exists a test based on Student's t-test (Spiegel [9]) for significance of v.
1

the difference of the correlation vector component zero : JLZ_
\!

from

(6)

t =

( - r x p / ( - 2) 1 N
N - 2 represents the number of

where

degrees of freedom of the experiment and r xy represents the correlation, r N = 10 (10 documents retrieved) one

In the case for which

finds that for a one-tailed t-distribution, the following confidence level correlation cutoff values obtain:

Confidence Level p = 0.10 p = 0.05 p = 0.01

Cutoff x ( |r |> x) 0.4436 0.5495 0.7159

Correlation Cutoff Table 4

Cutoff levels of 0.8000, 0.6000, and 0.4000 were chosen for investigation since these values cover the confidence

IX-10

level range fairly well, giving low and high confidence points as well as an average (0.6000) figure. The above justifications, then, indicate that after the correlations are performed, the cutoff can be used to determine which concepts are significant in the determination of the relevance of the documents retrieved. for which Those v.
1

v. > x (x = cutoff value) are termed posit ively v. < -x are termed

s ignificant concepts, while those for which negatively s ignif icant concepts. signif icant concept s. Other v.

are called non-

In summary, the key idea underlying the SSC approach to relevance feedback is that document and query vectors are treated as strings of components (the individual conceptweight pairs) rather than as inseparable units. As noted

earlier, both the Rocchio and Crawford-Melzer strategies deal with whole vectors, whereas the SSC-oriented methods break the relevance determination into finer levels.

2.

Query Construction The investigation as executed made use of the queries

outlined in Table 2, though other combinations of the information obtained by the methods described above are readily apparent. The queries investigated were chosen

heuristically as being likely to yield fruitful results.

IX-11

The first of correlated document

the three query by using

t y p e s , the the mean of

conceptrelevant and other-

q u e r y , is formed

weights

for positively

significant

concepts

the mean of concept values wise. This construction

for all retrieved

documents

is similar

to the iteration

query

proposed component of

by Crawford approach

and Melzer

[4] but differs by

the

described

above and by the weights. from

application Clearly, the the

significance

tests

in determining

concept-correlated query future (in document

query may be farther

original

space) than d e s i r a b l e , and examine the attempt of

a possible to lesseu this

i n v e s t i g a t i o n might by using

movement

the vector

composed

the mean of concepts the

relevant and the

document w e i g h t s remaining query.

for positively

significant concepts of

(i.e., n o n s i g n i f i c a n t )

original

The nonsignificant a p o s s i b l e means by Ide

elements

query

is intended

as out within

for approaching relevant

the difficulty

(pointed

[7]) of mixed space:

and nonrelevant

documents

document

X A

R X 1

X = nonrelevant A = query R = relevant

document

document

X

Separated

Groups

of Relevant

Items

Figure 1

IX-12

In this

s i t u a t i o n , exemplified information about

by the classic sample

query

requesting certain

the a e r o d y n a m i c s

of b i r d s , over-

concepts

of the query

(aerodynamics) may greatly of the search;

shadow others

(birds) in the influencing elements query

the n o n s i g n i f i c a n t only those elements

is thus constructed query which the

using

of the original

significance It seems

test has feasible splitting search

shown to be overshadowed that this

(nonsignificant).

type of query might be useful or in the final stages of an to boost

in a queryiterative possible). query,

algorithm

(in an effort The third

the recall as high as

query

type, the strictly

significant

is the diametric q u e r y , since original

opposite of the n o n s i g n i f i c a n t includes only

elements of the

the former shown

those concepts significant

query

to be positively of relevant

(the mean is query toward

of the concept weight used as the entry

retrieved

documents

for each e l e m e n t ) .

The use of this in the search

in iteration will cleary what will the user judged thus effectively

produce a shift

relevant block

on the previous

i t e r a t i o n , and (in [4] have is

the retrieval As Crawford

of separated and Melzer

document

space) new m a t e r i a l .

noted, h o w e v e r , this characteristic highly pleased with the results

has value

if the user

of the previous

retrievals. sparse

The last query v e c t o r s

two methods will

in general produce

and hence cannot

always be used

effectively;

IX-13

this quality the concepts either

of sparseness and

(depending

on the generality level) may cause

of

the c o r r e l a t i o n

cutoff

the retrieval

of a great many of documents

documents and the

or the

return

of a very of

small number

disappearance study, inter-

the vector

on successive indicate that

iterations. the set sum

A further (or possibly

h o w e v e r , may section elements strictly elements taken

in the case of high retrieval q u e r y ) of those documents query

from

the

nonsiginficant the

retrieved the

under

significant

and under

nonsignificant than either method

query may produce better

results

alone.

3.

Conduct

of the

Experiment has been carried [8] and out using the word 42 (and form

The study Cranfield

200 Collection

the accompanying

queries b e c a u s e manageably)

this grouping

provides

a reasonably and

large number

of documents

q u e r i e s , in time and

view of practical facilities. of being against

limitations

on experimental has

This c o l l e c t i o n

the a d d i t i o n a l

advantage ranked infor-

composed queries

of documents which have been relevance

on a five-grade rather

s c a l e ; this in

m a t i o n was used runs

than the binary

judgments

some

in an effort

to test whether would

the use of

finer the

relevance distinctions

appreciably

improve

IX-14-

performance

of the q u e r i e s . run, which was information conducted reThe 1, described

The form of an experimental within trieval zeroth 2, and vectors query, the general system, was context

of the SMART

that of a t h r e e - i t e r a t i o n

search.

iteration was a full search, while 3 were performed (but the same using

iterations

one of the previously

in all i t e r a t i o n s ) as the the SMART entry,

iteration at four

Interfacing

with

system occurred (2)

main points:

(1) parameter and

initialization, docuThe routines, OURCON,

(3) a c q u i s i t i o n m e n t s , and first while which MODQUE

storage of v e c t o r s of retrieved of the new query v e c t o r .

(4) c o m p u t a t i o n

three points were covered

by trivial m e c h a n i c a l by the routine

the fourth point was accomplished effectively in creating assumed

the role of the SMART

subprogram

queries. runs of the three (1) the effects cutoff level, query of

In this c o n t e x t , then, several types were made the parametric (2) the effect as opposed each of the in an effort variation

to determine

of the c o r r e l a t i o n

of the use of spectral d e c i s i o n s , and types shown

relevance

judgments of

to binary

(3) the p e r f o r m a n c e in Figure 1.

three query

4.

Experimental With

Results to the cutoff l e v e l , it was observed that

regard

IX-15

for

the computing

facilities

a v a i l a b l e , the 0.6000

figure

( c o r r e s p o n d i n g , as noted level query of about type

in Table 2, to a

confidence of the

0.05) was most

suitable r e g a r d l e s s

investigated.

Experimental

runs made using a several for

0.4000 cutoff hundred

in all cases produced

queries with

c o n c e p t s , so that of processing

the core storage soon exceeded

required

continuation

that

available. to

The 0.8000 l e v e l , on the other hand, caused shrink n o t i c e a b l y , entirely struct during so that

the query

7 out of the 4 2 queries iteration (the attempt

vanished

the first

to c o n search. cutoff

the first Because

experimental

query v e c t o r ) of the

of this s i t u a t i o n , then, the 0.6000 runs. The query types

was used

in all p r o d u c t i o n

w e r e checked spectral of

for p e r f o r m a n c e schemes

using both

the binary a

and

the

relevance

(Appendix A contains in Table

summary

the spectral

s c o r e s ) , and, as shown

5, no was differbinary

appreciable detected. ences method

d i f f e r e n c e between Queries 11 and

the types of judgments consistent the

24 show the most

in p e r f o r m a n c e , and provided Cases better

in each of these cases than did the

final ranking

spectral be

approach. judged

in which

the spectral method

might

superior

(Q 1 3 , 1 6 , 28) had d i f f e r e n c e s

confined document.

to changes

of two or less

in the rank of a single

IX-16

Number of Differences Query Type in Final Ranks of Relevant Documents between Binary and Spectral Schemes

Number of Differences Affecting Order of Presentation of Document (iteration level) to User

Strictly

significant

0

0

8 Concept-correlated (Q 11, 13, 15, 16, 24, 26, 28, 42)

2 (Q 11, 24)

Nonsignificant elements

2 (Q 11, 24)

1 (Q 24)

Performance Difference between Spectral and Binary Relevance Table 5

IX-17

The following graphs of document level recall and precision constitute Figure 2. Figure 2a shows the performance (7), Figure

of the Rocchio-type method characterized by equation

2b illustrates the behavior of the strictly significant query, Figure 2c shows the curve for the concept-correlated query, and elements

Figure 2d details the performance of the nonsignificant query.

All plots are for information obtained from averaged (excluding queries 6, 9, 12, 14, 21,

results over 30 queries

23, 25, 29, 32, 33, 35, 36, in which all relevant documents were associated with the Cranfield 200 work form collection. In the graphs, a solid line denotes performance on the zeroth iteration (full search), a dashed line indicates performance on the first iteration, and a dotted line shows results of the second and third iterations (grouped together). Where

two iterations map to the same point, the lower numbered iteration key predominates.

IX-18

I.Oi 5 c .4 o ° 31 Rocchio

2,3

l 0.0

.5

.6
2(a)

.8

1.0

Recall
Fig.

I .Or—

5h
c o <> /

a
0)

-3

Strictly

Significant

Q_

• 2h

^••..2,3

6^
0.0' 0.0

i"

1
.1 .4 .5 .6 .8 1.0

Recall
Fig. 2(b)

Recall Precision Graphs Fig. 2

IX-19

1.0 .5 .4 3 .2

r
Concept - Correlated

c o 0) Q_

6~ 2,3

o.o1—
0.0

.1

.4

_L .5

-L .6

.7

.8

.9

1.0

Recall
Fig. 2(c)

1.0

T

a o

Ah Nonsignificant

Q_

0.0 0.0

I

.4

.5

.6

.7

.8

.9

1.0

Recall
Fig. 2(d)

Recall Precision Graphs

Fig. 2 (contd)

IX-20

The conclusion that the spectral relevance scheme is not advantageous agrees with what one might expect, since if the matter of binary relevance cannot be consistently judged by different individuals (this fact has been observed

by Cleverdon, Mills, and Keen [8]), then the more subjective matter of degrees of relevance would provide a shaky basis indeed for feedback modification. For the above reasons the general evaluation of the three query types is carried out in the context of a 0.6000 correlation cutoff level and binary relevance grades. In addition, in twelve cases (Q 6,9,12,14,21,23,25,29,32,33,

35,36) all relevant documents were returned to the user on the zeroth iteration (the full search), and, in accordance (e.g., Ide [7]),

with methods used by other investigators

these queries are excluded from the evaluation. Plots of the document level recall-precision curves are shown in Figures 2b,2c, and 2d for the strictly significant query, the concept-correlated query, and the The curves

nonsignificant elements query, respectively.

are generally lower than the curve for a Rocchio-type strategy with similar feedback (Figure 2 a ) :
N

i R. (a=l,y=l, all other coefficients=0 in equation(1))

(7)

Q.,- = Q. +

£ i=l

IX-21

Three points

should, h o w e v e r , be

noted:

(1) any

The experiment

was structured query failed

so that

if at any

time an iteration

to return of ten

new relevant d o c u m e n t s documents carded and shown

in the group

to the u s e r , that query was query retrieved and

disused

the original

in its place until was satisfied

either

the iteration relevant

count was

or another

document

retrieved.

(2) If the original relevant original position user; thus

full search

retrieved

no

documents before position j query j continued to be used

(j > 1 0 ) , the until to the

was reached the number of

in the return iterations

in which reduced.

the e x p e r i m e n t a l

query was used was

(3) using

The search comparisons were a cosine correlation vectors

accomplished and

between query

document

taken as e n t i t i e s .

A promight only

cedure more parallel have used with

to the SSC approach correlation in the

a concept-wise

dealing query.

those concepts appearing

The first called the "roller

characteristic

leads

to what might

be

c o a s t e r " e f f e c t , in which previously unfetched

post-search documents

analysis moved

shows

that

relevant

up sharply

during

an iteration

of the

experimental to the original 6.

query, but were query. Examples

lost w h e n a return was made

of this effect are shown in Table

IX-2 2

Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Query 18 Iteration 0* 1 2* 96R 140 97R 199R 46 64 68 52 47 42 141 108 49 178 161 21 85 171 40 100 10 44 61 54 112 187 55 181 157 143 96R 140 97R 199R 46 64 68 52 47 42 90 119 120 117 18 27 95 99 194 193 153 89 104 72 155R 141 121 173 134 93 96R 140 97R 199R 46 64 68 52 47 42 90 119 120 117 18 27 95 99 194 193 141 108 49 178 161 21 85 171 40 100

I Rank 3* 96R 140 97R 199R 46 64 68 52 47 42 90 119 120 117 18 27 95 99 194 193 141 108 49 178 161 21 85 171 40 100 ' 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 55 69 74 75 110 128 132 134 187 188 189

Query 22 Iteration 0* 1 2* 163 179 167 200 112 130R 150 125 166 164 14 57 184 58 31 42 102 30 198 189 109 147 39 142 32 60 178 145 103 129 128R 128R 163 179 167 200 112 130R 150 125 166 164 10 137 129 14 142 160 106 136 143 135 140 108 107 139 11 153 24 83 184 149 128R 163 179 167 200 112 130R 150 125 166 164 10 137 129 14 142 160 106 136 143 135 57 184 58 31 42 102 30 198 189 109

3* 163 179 167 200 112 130R 150 125 166 164 10 137 129 14 14 2 160 106 136 143 135 37 134 58 31 M 132 30 198 139 139

128R 131R 127R 127R 131R 131R 131R 127R 127R

*

r.

Original quer y used

"Roller Coaster" Effect from Discarding Experimental Iteration Query (Strictly significant query, binary judgments, 0.6000 cutoff) Table 6

IX-23

It is not c l e a r , h o w e v e r , that to continue fails using the e x p e r i m e n t a l

the best approach

is it

iteration query w h e n into the group

to draw any new relevant the u s e r . Table

documents that

seen by

7 shows

the same

experimental worsen obtained can in this the

query may perform well the results in another that case.

in one case but may actually query alone would part of the query have

the original Undoubtedly, that

this behavior types tested

be explained study

by the fact

(particularly

the strictly toward

significant

q u e r y ) move

query vector but m o r e

decidedly

previously

retrieved

documents, and into well

investigation

into query groups

characteristics

the reasons why under before best

there exist

of requests w h i c h do is

one iterative

scheme and not under another conclusion

necessary the

any c o m p r e h e n s i v e

can be drawn as to expressed in

path

to take in resolving

the problem

(1) a b o v e . The somewhat second point that shows that the results are cases (for biased

by the fact

in several

example, used

Q 1 0 , 1 1 , 2 0 , 26) the experimental on iteration bad 2. For query

query was first

1 the results w e r e query had

outstandingly a zero experimental

in all efforts -- the original

c o r r e l a t i o n with all relevant method was ever applied

d o c u m e n t s , and no For future

at a l l .

studies, a documents

wiser a c t i o n

in the situation

in which no relevant

1X-24

Rank

Query 5 I t e r a t l on 2 0* 1 59R 162 197 58R 160 29 182 189 185 184 60R 165 115 150 9 191 10 198 164 11 156 139 168 92 28 16 180 57 200R 90 13R 8R 59R 162 197 58R 160 29 182 189 185 184 13R 89 141 60R 200R 21 15 27 56 126 172 176 171 139 44 30 123 4 140 147 8R 59R 162 197 58R 160 29 182 189 185 184 13R 89 141 60R 200R 21 15 27 56 126 123 167 122 109 94 121 18 163 165 35

Rank 3* 59R 162 197 58R 160 29 182 189 185 184 13R 89 141 60R 200R 21 15 27 56 126 123 167 122 109 94 121 18 163 165 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 34 35 38 51 65 69 79 80 102

Query 7 Iteration 0* 1 2 41R 90R 11 60 45 76 160 111 100 176 185 133 192 159 117 110 156 71 83 29 132 158 195 198 150 154 114 199 23 184 95R 72R 41R 90R 11 60 45 76 160 111 100 176 95R 91 93 36 199 64 173 32 155 96 94 193 43 33 140 116 99 50 23 103 42R 41R 90R 11 60 45 76 160 111 100 176 95R 91 93 36 199 64 173 32 155 96 117 94 119 103 192 195 92 153 121 120

3* 41R 90R 11 60 45 76 160 111 ICO 176 95R 91 93 36 199 64 173 32 15 5 9 6 ' 117 94 119 103 19 2 195 92 153 121 120

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 j 26 27 28 29 30 112 133 I 140 144

8R

7 2R 72R 42R 42R 72R 42R

* , Original auerv used •

A d v a n t a g e o u s Effects from Discarding E x p e r i m e n t a l Iteration Query (Strictly significant query, binary j u d g m e n t s , 0.6000 c u t o f f ) Table 7

IX-25

are returned equation other

would

probably

be to utilize a variant a= 1, y = 1, 6 = 1, and the query

of

( 1 ) , perhaps with

all space.

coefficients

= 0, to move

in document feedback some

Alternatively, strategy such as (8) into

the inclusion

of a negative

the e x p e r i m e n t a l

q u e r i e s , through

equation

Q-.i

=

Q

+ c N

, where

N

is a vector elements) concepts,

(possibly with weighted of negatively and where

significant

Q represents the r ^e e x p e r i m e n t a l query of type e,

might Qn

be e f f e c t i v e , w h e r e

Q

could

be replaced

by

Q. or

if no relevant

documents were

returned. of specific query queries produced

With regard types,

to the p e r f o r m a n c e that

it was discovered

in all of the 30 user elements query

in the a n a l y s i s , the n o n s i g n i f i c a n t final rankings lower than either

the strictly

significant, as

the c o n c e p t - c o r r e l a t e d , in equation type query collection

or the R o c c h i o - t y p e indicating

(formulated that in

(7)) q u e r i e s , thus for which or that

either

the this

it is effective

is not present

the method

is generally

inapplicable. conclusion sample that involved query

It i s , of c o u r s e , impossible answering this question from

to draw a general the small request the implication

in the present

e x p e r i m e n t , but useful

this alone)

may not be p a r t i c u l a r l y

(at least when used

IX-26

could

follow

from

either

case. the experimental nonan

One should significant average concepts

n o t e , h o w e v e r , that queries (first

elements

iteration) contained to an average of This states finding 9.1

of 8.0 for

c o n c e p t s , as compared queries. 1, which request

the original

corroborate

hypothesis few

(2) of section

that only a very important is in

ideas of the original

are really

determining noteworthy

the user's n e e d s . for its possible

This last conclusion to the

application

interpretatio it may imply becau

of an original natural that a detailed

language

r e q u e s t , since

analysis

of the query

is not necessary to abstract the

some quick method discriminatory

might be developed

ideas. significant query performed in general

The strictly very Table other, similarly 8 details

to the R o c c h i o - t y p e examples in which

query of equation ( 7 ) . either method surpassed th

thus lending to more

support

to the feeling retrieval

that perhaps a of

real key

successful queries

is the development into One groups should

a strategy for which note that

by which

can be classified is a p p r o p r i a t e .

a particular experimental

method

results confirm pre-test query moved

projections close cases

in that

the strictly

significant

decidedly in some

to the previously relevant

retrieved

d o c u m e n t s , so that pushed away

d o c u m e n t s were actually

from

retrieval

(Table 7 ) .

IX-27

Strictly Significant Query Query 3 (Strictly Signif icant)* 10 14 15 20 21 22 1 4 11 14 15 144 12 14 40 119 5 22 24 32 33 1 3 4 32R 33R 30R 4R 57R 31R 59R 58R 13R 60R 200R 8R 92R 45R 16R 44R 94R 90R 93R 91R 95R 96R 97R 199R 154R 17R 136R 135R 157R

Concep t-Correlated Query 10 16 18 26 27
32

Rocchio-Type Method 10 18 21 22 24 124 32R 30R 4R 57R 31R 33R

32R 30R 33R 4R 31R 57R 59R 58R 200R 60R 13R 8R 92R 45R 16R 44R 94R 90R 93R 91R 95R 96R 97R 199R 154R 17R 135R 157R 136R

Query 5 (Strictly signif icant)

1 4 15 32 134 141 12 14 40 115 5 22 24 32 38 1 3 4 1 3 15 51 140

I 59R 58R 4 12 200R 60R 1 7 13R 56 8R 82 12 14 61 72 5 20 21 22
25

Query 11 (Concep t correlat ed) Query 17 (Rocchio)

9 2R 45R 44R 16R 9 4R 95R 91R 90R 93R 9 6R 9 7R 199R 15 5R 154R 17R 135R 15 7R 13 6R

Query 18 (Rocchio)

1 3 4 1 3
J2

Query 39 (Strictly significant)

1 3 14 16 20

20 39

* < method judged best by FERF criterion over three iterations (Section 3) All ranks less than 10 were set by the initial full search

Final Ranks of Relevant Documents for Selected Queries Table 8

IX-28

The c o n c e p t - c o r r e l a t e d pattern

query

followed

the same

general

as the R o c c h i o - t y p e m e t h o d , but generally worse than either query. the Rocchio If rankings the query or

performed the the

noticeably strictly FERF query

significant

are made using

coefficient never

(defined

below),

concept-correlated (the two are the strictly 31 . (Table tied

surpasses

the Rocchio method 31) and surpasses

on queries significant

16,27,28 and query only

on queries

1 1 , 1 6 , 2 6 , 2 7 , 2 8 , and 5 and 39

In a d d i t i o n , for some q u e r i e s , such as queries 8), the c o n c e p t - c o r r e l a t e d than does The most the noise either method performs

considerably significant is

worse type. that

the Rocchio

or the strictly for

probable explanation introduced

this behavior for the

by the entries affecting

nonsignificant discrimination

concepts of the

(Table 2) is adversely

search. Document level r e c a l l - p r e c i s i o n analyzed, is slighly query and graphs (Figure 2) show

that

for

the 30 queries

the R o c c h i o - t y p e higher the

method the

p r o v i d e s a curve which nonsignificant q u e r y , and significant experimental unless elements

than that of

concept-correlated strictly the three

about

the same pattern as that of the It is difficult lie so close that

query. queries

to explain why in

together

performance in this be

one concludes

the SSC approach

as used

investigation

(though perhaps different

results would

IX-29

found if the three points raised above were resolved) contributes no information to the search that a simple Rocchio method does not also produce. In an effort to gain a more solid quantitative

measure of the performance of an iteration method in a frozen feedback situation (in which documents retrieved

have their ranks locked, so that the highest ranking a document can receive on iteration i is i * N , where N documents

are returned to the user on each iteration), the frozen exponential ranking factor (FERF) has been developed: r-1 En. j=0 J

(9)

gr

= T -

, r = 1,2,...,i

0 if gr=0 (10) f ' r 8r otherwise i-1 Z (10 ** (i-j)) * f J+ 1 ' j=0

(11)

P = FERF = where

T = total number of documents relevant to a query n, = number of relevant documents reth trieved on the k iteration J = number of iterations L initial full (not counting search) performed

IX-30

The

quantity

p, which will

always

lie in the range answer

[0, to the

10 ** i ] , has been evaluation The FERF retrieved in w h i c h problem

introduced pointed

as a possible

out by Hall and W e i d e r m a n [ 1 0 ] . of relevant all are not and it documents found,

is not affected on the full

by the number (provided

search

case the evaluation breaks d o w n ) , coefficient

does

assign a higher retrieves

to a method which

promptly the

new m a t e r i a l

than to a method iteration.

which retrieves

same m a t e r i a l is in some number

on a later

F u r t h e r m o r e , the

FERF of the

sense " n o r m a l i z e d " relevant note that

since it is independent to a query. the FERF is a rather

of d o c u m e n t s One should

gross rank

measure

of d e s i r a b i l i t y within

in that

it makes no evaluation of shown to the u s e r . consequence This

an iteration group

limitation, will

however,

is not one of major examine the entire

since the user in any

presumably

group returned

case. associated which An

Goodness directly with is intuitively illustration

of result

has for these reasons been of the FERF (an assignment

the m a g n i t u d e

p l e a s i n g , as seen in the example b e l o w ) . is given in Table 9. rankings for

of the FERF these

Following the v a r i o u s

i d e a s , the following queries

types of experimental

in this study

obtain

IX-31

Global

conditions:

Query Q i = 2 N = 5 T = 7

(notation identical to that in body of the paper)

Method A 1 2 R* 3 4 R 5

M_e thod _B 1 2 R 3 4 R 5 6 R 7 8 9 10 R

Method _C 1 2 R 3 4 R 5

Full Search

Iteration 1

6 R 7 g l = 5 8 R 9 R f = 0.6 10 11 12 13 14 15

6 7 = 5 g l = 8 R = 0.4 9 R f l = 10 R 11 12
J3

g

l

=

5 0.6

f

l "

g2 = 2

Iteration 2

f 2 = o.o
FERF = 60

11 R 12 R g 2 -= 3 13 R 14 = 1.0 f 2 " 15 FERF = 50

g2 = 2
f

14 15 R

2 "

0.5

F E R F •• = 65

* - indicates position of a relevant document

An Illustration of the FERF Approach Table 9

IX-3 2

when averages

over

the 30 user queries

are

taken:

Rocchio Strictly

(equation

(7))

547 475 398 291

significant

Concept-correlated Nonsignificant elements

Overall

Significance Table 10

Evaluation

This m e a s u r e not surpass

also

shows

that the experimental query

queries

do

the simple R o c c h i o - t y p e

performance. investigation, that necessary

The SSC m e t h o d , as tested does require a sizeable amount for an ordinary The information Rocchio-type

in this

of time beyond

relevance

feedback

search. query

f e t c h e s , significance the machine the time

c a l c u l a t i o n s , and by

construction approximately the user which for

increase 1 5 % and

time of a search

(and

effort) required (a

from quantity base than

to judge

the relevance necessary

of ten documents

is probably

to provide a d e f e n s i b l e is markedly

the

statistical

calculations)

greater

in u n e m b e l l i s h e d

methods.

Consequently,

the SSC

approach

IX-33

must be proved results resource can be

capable of producing strategies

substantially before the

better

than do existing expenditure

increased ideas

necessary

to utilize

the SSC

justified.

5.

Conclusions Although

and

Recommendations conducted with the

the i n v e s t i g a t i o n s

queries

of Table

2 have not been outstandingly of relevance

successful the testing approach on

in obtaining authors

better m e t h o d s

feedback, of

feel that because of as broad

of the impossibility

all aspects

a concept

as the SSC feedback

in a single rather which before the present

limited study

e x p e r i m e n t , some of the ideas should be further In

is based

investigated

they are dropped of treating

from c o n s i d e r a t i o n . documents and

particular, as strings than

the a p p r o a c h of concept

queries

beads which

can be broken a p a r t , rather be added,

as i n d i v i s i b l e bars which must and weighted as entities seems

subtracted, it allows the parts tests

to have value because in filtering

the investigator noise introduced

to be more by

selective

out in

irrelevant

information

contained

of a document

or query v e c t o r .

The use of statistical

IX-3M-

to a s c e r t a i n relevant

those concepts

important

in

distinguishing be be investigated developed, in which the

from nonrelevant

documents types

should should

f u r t h e r , and a d d i t i o n a l p e r h a p s along the lines

query

suggested

previously,

characteristics For

of the SSC approach can be fully investigate a modified

exploited.

e x a m p l e , one could

concept-correl entered and i )

query with

in which positively the m e a n of

significant

concepts are documents (or

the w e i g h t s

in relevant

each remaining query means query

(unused) concept

of the original

is entered with of r e s o l u t i o n retrieves

its weight u n c h a n g e d . in which documents negative

Similarly, the iteration

of the situation relevant

no further for

for a particul feedback into

i t e r a t i o n and means the SSC approach Another extension

introducing be

should

considered. is the retrie

p o s s i b l e area of future research

of the c o n c e p t - w i s e procedure in the third

to the actual evaluative

of d o c u m e n t s , as outlined mentioned the area known, here in Section 4. of checking

point in

Further work could cutoff

also be done

correlation for

l e v e l s ; it is now reported workable

for

i n s t a n c e , that 0.4000 and

the type of feedback the range of from

that

0.8000 are beyond of varying

l e v e l s , but to a lesser

the effect

the cutoff

0.6000

degree has not been

studied.

IX-35

REFERENCES

[

1]

R o c c h i o , J.J. "Relevance Feedback in Information R e t r i e v a l " , Scientific Report N o . ISR-9 to the N a t i o n a l Science F o u n d a t i o n , Section I I I , Harvard C o m p u t a t i o n L a b o r a t o r y , August 1 9 6 5 . R o c c h i o , J.J. "Document Retrieval System O p t i m i z a tion and E v a l u a t i o n " , Harvard U n i v e r s i t y D o c t o r a l T h e s i s , Scientific Report N o . ISR-10 to the N a t i o n a l Science F o u n d a t i o n , Harvard C o m p u t a t i o n L a b o r a t o r y , March 1 9 6 6 . R o c c h i o , J.J., S a l t o n , G. "Search O p t i m i z a t i o n and Iterative Retrieval T e c h n i q u e s " , P r o c e e d i n g s of the Fall Joint Computer C o n f e r e n c e , Las V e g a s , November 1965. C r a w f o r d , R.G., M e l z e r , H . Z . "The Use of Relevant D o c u m e n t s Instead of Queries in R e l e v a n c e F e e d b a c k " , Scientific Report N o . ISR-14 to the N a t i o n a l Science F o u n d a t i o n , Section X I I I , D e p a r t m e n t of Computer S c i e n c e , Cornell U n i v e r s i t y , October 1 9 6 8 . R i d d l e , W., H o r w i t z , T., D i e t z , R. "Relevance F e e d b a c k in an I n f o r m a t i o n Retrieval System", Scientific Report N o . ISR-11 to the N a t i o n a l Science F o u n d a t i o n , Section V I , D e p a r t m e n t of Computer S c i e n c e , Cornell U n i v e r s i t y , June 1 9 6 6 . Ide, E. "User Interaction with an Automated R e t r i e v a l S y s t e m " , Scientific Report N o . ISR-12 to the N a t i o n a l Science F o u n d a t i o n , Section X I I , D e p a r t m e n t of Computer S c i e n c e , Cornell U n i v e r s i t y , June 1 9 6 7 . I d e , E. "Relevance Feedback in an A u t o m a t i c D o c u m e n t R e t r i e v a l S y s t e m " , Cornell U n i v e r s i t y Master of Science T h e s i s , Scientific Report N o . ISR-15 to the N a t i o n a l Science F o u n d a t i o n , January 1 9 6 9 . C l e v e r d o n , C , M i l l s , J., K e e n , M. ASLIB Cranfield R e s e a r c h Project - Factors D e t e r m i n i n g the P e r formance of Indexing S y s t e m s , C r a n f i e l d , 1 9 6 6 .

[

2]

[

3]

[ 4]

[

5]

[

6]

[

7]

[ 8]

IX-36

REFERENCES

(Continued)

[ 9]

Spiegel, M.R, Statistics, New York, Schaum Publishing Co. , 1961 (p. 247). Hall, H., Weiderman, N. "The Evaluation Problem in Relevance Feedback Systems", Scientific Report No. ISR-12 to the National Science Foundation, Section XII, Department of Computer Science, Cornell University, June 1967.

[10]

IX-37

APPENDIX Origins

A Judgments

of the Spectral R e l e v a n c e for the

Cranfield

200 Document

Collection

Method (Abstracted

of Retrieving

Documents Keen [ 8 ] , V o l . 1, p.79)

from C l e v e r d o n , M i l l s , and

Stage

1:

Authors

of d o c u m e n t s

in the c o l l e c t i o n

construct

search questions assessment citations in the

(queries) and make a in the

relevance

of items listed of their

bibliographic included

own d o c u m e n t s which are

collection.

Stage

2:

Using Stage every

the document

c o l l e c t i o n and competent

questions examine

from

1, technically document

people

in relation

to every

question

to

find

any a d d i t i o n a l noted

(to the b i b l i o g r a p h i c documents.

citations

in Stage 1) relevant

Stage

3:

The document documents assessment

authors

receive

the

additional

produced of

by Stage

2 and make a final

relevance.

IX-38

APPENDIX A

(Continued)

Method of Marking Relevance (Abstracted from Cleverdon, Mills, and Keen [8], Vol. 2, p.123; codes changed to conform to the usage in the present experiment)

Grade 4:

References which are a complete answer to the question.

Grade 3:

References which are of a high degree of

relevance,

the lack of which would have made the research (to answer the query) impracticable or would have resulted in a considerable amount of extra work.

Grade 2:

References which are useful, either as general background to the work or as suggesting methods of tackling certain aspects of the work.

Grade 1:

References which are of minimum interest (for example, those that have been included from a historical viewpoint).

Grade 0:

References which are of no interest.