Figure A1 Test Data Summary

raw/source       collection   size              primary indexing
material name    name         reqs     docs

Cranfield        C1400I       225      1400     manual from documents
UKCIS            U27000T      182*     27361    automatic from titles
                 U27000P      182*     27361    automatic from titles via profiles
NPL              N11500A      93       11429    automatic from abstracts
                 N11500T      93       11416    automatic from titles
Evans (INSPEC)   E2500T       39       2542     automatic from titles
                 E2500P       39       2542     automatic from titles via profiles

* Most of the tests were done with a subset of requests having strict as opposed to fancy Boolean form for the original SDI profiles. These subsidiary collections with 75 requests are named U27000Tb and U27000Pb.

The C1400I, U27000P and E2500P requests are manual, the others automatic.

Figure A2 Test Raw Material

1. Cranfield
   Cleverdon, Mills and Keen 1966: Project on factors determining the performance of indexing systems.
   Size: 221 queries, 1400 documents; several subsets.
   Subject: Aeronautics.
   Indexing source: Full texts; (title +) abstracts/titles.
42 2Q0 B Q 1 O* U rH 0 C C D D ON CM C\J U 9i U Av reldocs C\J C D o -p ON CO LfN ON CO LT\ GO CO KNCC CO C\J CO - o o CJ KN CO O CO ON ON CO ON ON CO Av docs per terms VO TO CO NO LTN CO ON CO o o VO 58 CQ CO NO T— ir\ CO CD CD CD £ C D CD O C O ON CO CO CO KN CO CD CD -P O -p • vo NO CO 00 T VO KN KN VO LT\ O «tf<*fr KN O KN C O CO T- CO CD CO T~ «*f ON LCN U <8 OQ C7< CD CO CO CO CO KN ON KN ON ON KN ON KN O CO CO CO > O -*• nd GO & C C D D PUP o u u vo ITN KN ON ON o o KN vo eg ml S fc o > C C O D D c o 1 •HI 70 00 llec EH 4001 UN o c EH PH O CJ T- CM o fc> Iz: CM M F9 Figure A7 Subsidiary Collections No documents No requests No reldocs Av reldocs per req CranfieLi ITKCIS C1400e C1400o U27000Te TJ27000TO U27000Tf U27000T1 U27000T3/4 U27000T1/2 U27000T1/4 U27000T1/8 U27000T1/16 U27000TS1/4 U27000Tb J U27000Pb * U27000Tbe • > U27000Pbe J U27000Tbo 3 U27000Pbo * U27000Pbf U27000Pbl U27000Pb3/4 U27000Pbl/2 U27000Pbl/4 U27000Pbl/8 U27000Pbl/l6 U27000PbSl/4 700 700 13680 13681 11613 15748 20521 13681 6841 3421 1711 6840 27561 13680 13681 11613 15748 20521 13681 6841 5421 1711 6840 5714 5706 5715 5710 2 J 225 225 182 182 182 182 182 182 182 182 182 182 854 780 3.7 3.5 4058 6657 8036 5358 2679 1339 670 2702 3759 1857 1902 1444 2295 2804 1870 935 467 234 902 1022 1061 * * * * * 22.3 36.6 44.? 29.4 14.7 7.4 5.7 14.8 49.9 24.5 25.4 19.3 50.6 * * * * * 57.4 * 24.9 * 12.5 6.? 5.1 * 12.0 1100 11.4 11.4 11.7 75 75 75 75 75 75 75 75 75 75 75 93 93 39 39 NPL N11500Ae N11500Te N11500AO N11500TO E2500Te E2500Pe E2500TO Evans 1271 1271 443 456 2 * These figures are estimated as percentages of the whole set. Note tha^^ay n?fte1iave any relevant documents in a given subset; if this is allowed for slightly different averages are obtained. F10 Figure A8 Collection Term Retrieval Facts Av terms per req Av Av max Av No match- match- match- reling ing ing docs terms terms retr. terms Av docs retr. Av reldocs retr. per req per doc per reldoc &. 
% tot per req per req

Collections: C1400I U27000T U27000P U27000Pb N11500A N11500T E2500T E2500P

7.9 7.0 29.4 21.9 5.7 2.8 5.7 2.9 4.7 5.7 4.4 4.2 1.7 1.1 1.2 1.1 1.5 1.1 1.5 1.2 5.5 1.4 1.8 1.4 2.8 108 1.9 1.9 7.2 7.2 52.4 48.0 746.7 6.9 1549 96.0 1841.4 25.0 4544 42.4 8844 2668.7 48.6 82.5 2820 1481.5 57.6 75.4 5171.0 21.9 2054 97.6 1529.0 17.7 1645 79.0 678 749.5 17.4 75.4 477.6 18.6 725 80.6

The profile figures refer to simple term matching with any NOT terms removed from the profile term lists.

Figure A9 Performance Representation Methods

A
Document value (dv): recall/precision figures obtained by averaging by numbers across matching values, with precision at standard recall levels subsequently obtained by linear interpolation.

B
Recall cutoff (rc): recall/precision figures obtained by averaging over precision computed for individual requests at standard recall levels using pessimistic interpolation (with fully and completely ranked output).
Document rank (dr): recall/precision figures obtained as for dv, but averaging across matching ranks (fully and completely ranked output).
Precision rank (pr): recall/precision figures obtained by averaging by numbers for specific ranks (fully and completely ranked output).

C
Total retrieved (tr): average relevant and total documents retrieved, by the lowest matching value, obtained by averaging by numbers.
Relevant rank (rr): average numbers of relevant documents retrieved at specific ranks, obtained by averaging by numbers (fully and completely ranked output).
Cumulative requests (cr): cumulative proportion of requests retrieving a relevant document by specific ranks (fully and completely ranked output).
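The recall cutoff (rc) method of Figure A9 can be illustrated with a minimal Python sketch. It rests on an assumption: the figure does not define "pessimistic interpolation", so the sketch reads it as taking, for each standard recall level, the precision at the first rank where the request's recall reaches or passes that level (and 0 if the level is never reached). The function names and the ten standard levels 10 to 100 are illustrative choices, not taken from the report.

```python
from typing import Dict, List, Sequence

# Standard recall levels in percent (assumed to match the 10..100 rows of the run tables).
STANDARD_RECALL = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]


def precision_at_recall_levels(ranking: Sequence[bool], total_relevant: int) -> Dict[int, float]:
    """Precision (percent) at each standard recall level for one request.

    `ranking` is the fully ranked output, True marking a relevant document.
    Assumed reading of pessimistic interpolation: use the precision at the
    first rank where recall reaches or passes the level; 0.0 if never reached.
    """
    result = {level: 0.0 for level in STANDARD_RECALL}
    if total_relevant <= 0:
        return result
    pending = list(STANDARD_RECALL)  # levels not yet reached, ascending
    found = 0
    for rank, is_relevant in enumerate(ranking, start=1):
        if not is_relevant:
            continue
        found += 1
        precision = 100.0 * found / rank
        # Integer comparison avoids rounding trouble at exact recall levels.
        while pending and 100 * found >= pending[0] * total_relevant:
            result[pending.pop(0)] = precision
    return result


def average_over_requests(per_request: List[Dict[int, float]]) -> Dict[int, float]:
    """Average the per-request precision values level by level."""
    return {
        level: sum(r[level] for r in per_request) / len(per_request)
        for level in STANDARD_RECALL
    }


# Two toy requests: relevant documents at ranks 1 and 3, and at rank 2.
request_a = precision_at_recall_levels([True, False, True, False], total_relevant=2)
request_b = precision_at_recall_levels([False, True], total_relevant=1)
averaged = average_over_requests([request_a, request_b])
```

Averaging per request, as here, gives every request equal weight regardless of how many relevant documents it has; the dv method in the same figure instead averages "by numbers" across matching values, which this sketch does not attempt to reproduce.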
Figure B1 Special Cases for Weighting Functions

Definition            Documents in which            Functions   Implications for document
                      term occurs                               when term is:
                                                                present        absent

r = 0                 non-relevant only             F1, F4      bad            indifferent
n-r = 0               relevant only                 F4          good           indifferent
R-r = 0               all relevant & some others    F4          indifferent    bad
N-n-R+r = 0           some relevant & all others    F4          indifferent    good
n-r = 0, R-r = 0      all relevant & no others      F4          good           bad
r = 0, N-n-R+r = 0    no relevant & all others      F4          bad            good

bad        : the document should never be retrieved, i.e. should be at bottom rank
good       : the document should always be retrieved, i.e. should be at top rank
indifferent: the document should be unaffected, i.e. should be at the rank determined by its other terms

Figure B2 Numbers of Retrieved Documents

Columns: at score 1: av. rel, av. nrel, av. nrel per rel; at recall 30%: av. rel, av. nrel, av. nrel per rel

C1400Io
Rows: T; FIRST1 F4p; FIRST2 F4p; BEST1 F4p; BEST2 F4p; ALL F4p; ALL F4r
12.8 3.3 2.8 2.8 2.5 2.6 2.8 3.1 375.4 64.0 67.4 48.8 52.8 64.1 39.9 113.7 23.2 24.1 19.8 20.6 22.7 12.7 1.1 1.0 13.5 4.0 3.2 4.3 3.6 3.3 1.0 3.9 3.0 4.0 3.4 3.1 1.0

U27000Pbo
Rows: T; FIRST1 F4p; FIRST2 F4p; FIRST3 F4p; BEST1 F4p; BEST2 F4p; BEST3 F4p; ALL F4p; ALL F4r
18.9 10.5 12.0 14.4 724.2 114.5 146.5 183.7 93.5 116.9 131.4 335.5 336.6 38.3 11.0 12.2 12.8 7.6 7.6 7.7 7.6 7.8 7.7 7.8 7.8 7.8 173.3 20.9 23.9 25.9 35.0 22.6 22.4 22.7 2.7 3.1 3.4 4.5 2.9 2.9 1.2 0.4 9.6 11.7 12.4 18.7 18.8 9.7 10.0 10.6 18.0 9.6 3.5 17.9

The slight variations in the numbers of relevant documents at standard recall are due to the use of interpolated document numbers.

[Graphs 1-5 (pages G1-G3): recall/precision plots. Legible panel titles: Graph 2 C1400Io, Graph 4 C1400Io, Graph 5 U27000Pbo. Curves and axis values are not recoverable from the scan.]
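The zero-count conditions listed in Figure B1 can be made concrete with a small Python sketch of the F4 relevance weight. The formula is the standard one from Robertson and Sparck Jones (1976), cited in the references; it is not printed in this appendix. The parameter `k` and the function name `f4` are illustrative, and treating `k = 0.5` as the correction used for the predictive runs is an assumption.

```python
import math


def f4(r: int, n: int, R: int, N: int, k: float = 0.0) -> float:
    """F4 relevance weight for one term (standard formula, see lead-in).

    N: documents in the collection, R: relevant documents for the request,
    n: documents containing the term, r: relevant documents containing it.
    k is added to each of the four cells; k = 0.5 is the usual correction
    (assumed here, not stated in this appendix).
    """
    a = r + k              # relevant, term present
    b = R - r + k          # relevant, term absent
    c = n - r + k          # non-relevant, term present
    d = N - n - R + r + k  # non-relevant, term absent
    numerator = a * d
    denominator = b * c
    # With k = 0 an empty cell makes the weight infinite or undefined;
    # these are the conditions r = 0, n-r = 0, R-r = 0, N-n-R+r = 0.
    if numerator == 0 and denominator == 0:
        return math.nan
    if denominator == 0:
        return math.inf
    if numerator == 0:
        return -math.inf
    return math.log(numerator / denominator)


ordinary = f4(r=2, n=4, R=4, N=20)                 # no empty cell
term_in_no_relevant = f4(r=0, n=4, R=4, N=20)      # r = 0
term_in_relevant_only = f4(r=3, n=3, R=4, N=20)    # n-r = 0
corrected = f4(r=0, n=0, R=4, N=20, k=0.5)         # finite after correction
```

The sketch only shows where the uncorrected weight breaks down; how a document should be ranked in each case when the term is present or absent is what the figure itself sets out.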
[Graphs 6-20 (pages G4-G11): recall/precision plots. Legible panel titles: Graph 7 N11500To, Graph 8 E2500To, Graph 9 E2500Po, Graph 10 C1400Io, Graph 11 U27000Pbo, Graph 13 C1400Io, Graph 14 E2500Po, Graph 15 C1400Io, Graph 16 U27000Pbo, Graph 17 N11500Ao, Graph 18 N11500To, Graph 20 E2500Po. Curves and axis values are not recoverable from the scan.]

[Graphs 21-35 (pages G12-G19): recall/precision plots, with a recall cutoff axis on the earlier panels. Legible panel titles: Graph 24 N11500To, Graph 25 E2500To, Graph 26 E2500Po, Graph 27 C1400Io, Graph 28 U27000Pbo, Graph 29 E2500Po, Graph 30 C1400Io, Graph 32 N11500Ao, Graph 34 E2500To, Graph 35 E2500Po. Curves and axis values are not recoverable from the scan.]

Note on the organisation of the Run Tables

The performance evaluation methods used in the project are described in Chapter 4 of Section A and summarised in Figure A9. The organisation of the search output is discussed in Chapter 5 of Section A. Each table groups runs for closely related indexing/searching strategies, with the parallel runs for different collections constituting a run set. The tables are divided into two sets:

Main Tables: these contain recall/precision results obtained with the main performance representation method used, namely dv.

Secondary Tables: these contain different representations of results obtained with other performance representation methods, namely rc, dr, pr, tr, rr, and cr.

Any particular run (run set) can therefore have corresponding entries, for the different representation methods, in a Main Table and corresponding Secondary Tables. There are 12 table groups of runs, as indicated in Section A, Chapter 5: thus R, for example, refers to a group of relevance weighting run sets.
The corresponding Main and Secondary Tables are distinguished by the prefixes M and S respectively, with the latter further distinguished by the particular representation method used. For a given run set in the R group, say 9, we therefore refer to the particular search output representation figures of the Main Table as MR9, and to the figures of the rc representation as SrcR9.

In addition, again as described in Chapter 5 of Section A, particular indexing/searching strategies were tested with variant relevance sets, as opposed to the regular ones. The results essentially provide further alternative tables, involving corresponding Main and Secondary representations. The variant sets fall into two groups, labelled v1 and v2, and these labels are used to flag the appropriate tables within the Main and Secondary sets respectively: thus we may refer to run set v2/MR9 or v2/SrcR9. However, as there are relatively few run sets involving the variants, their tables are summary ones following the complete sets of regular Main and Secondary Tables respectively.

An initial Key Table lists the table group and run set specifications, and the following Summary Table shows which runs have actually been done. The actual results for these all appear in the Main Tables, while those with corresponding Secondary Table entries (mainly rc, other than tr) are marked with a prime ('). An Other Table, labelled O, gives some miscellaneous results.

The overall order of the tables is therefore:

Key Table                      K
Summary Table                  S
Main Tables, regular           M
Main Tables, variant           v/M
Secondary Tables, regular      Sx
Secondary Tables, variant      v/Sx
Other Table                    O

Key Table

SPECIFICATION OF TABLES

Table  Description                                                      Runs
T      Terms, i.e. simple term matching of original requests            T1
W      Collection frequency weights, applied to original requests       W1, W1.1
R      Relevance weights, applied to original requests                  R1-12, R2.1, R2.2
S      Substitutions of relevant document terms for original requests   S1-5
SW     Substitutions with collection frequency weights                  SW1-5
SR     Substitutions with relevance weights                             SR1-5
C      Classifications enlarging original requests                      C1-2
CW     Classifications with collection frequency weights                CW1-2
CR     Classifications with relevance weights                           CR1-6
E      Expansions by relevant documents of original requests            E1-5
EW     Expansions with collection frequency weights                     EW1-5
ER     Expansions with relevance weights                                ER1-5

The individual runs in each table are listed below:

Key Table

SPECIFICATION OF RUNS

Table  Name   Description
T      T1     Original request terms
W      W1     Collection frequency weights, F0, using max n
       W1.1   Collection frequency weights, F0, using N
R      R1     Relevance weights, F1, predictive, i.e. F1p
       R2     Relevance weights, F4, predictive, i.e. F4p
       R3     Relevance weights, F1, predictive retrospective, i.e. F1pr
       R4     Relevance weights, F4, predictive retrospective, i.e. F4pr
       R5     Relevance weights, F1, retrospective yardstick, i.e. F1r
       R6     Relevance weights, F4, retrospective yardstick, i.e. F4r
       R7     Relevance weights, U1, predictive, i.e. U1p
       R8     Relevance weights, U3, predictive, i.e. U3p
       R9     Relevance weights, U1, predictive retrospective, i.e. U1pr
       R10    Relevance weights, U1, retrospective yardstick, i.e. U1r
       R11    Relevance weights, U3, retrospective yardstick, i.e. U3r
       R12    Relevance weights, H1, predictive, i.e. H1p
       R2.1   Relevance weights, F4, predictive, u=0 if n=0
       R2.2   Relevance weights, F4, predictive, w=0, u=0 if n=0
S      S1     F1 relevant document used as request
       S2     B1 relevant document used as request
       S3     F2 relevant documents used as request
       S4     B2 relevant documents used as request
       S5     ALL relevant documents used as request
SW     SW1    Combines S1 specification and W1 specification
       SW2    Combines S2 specification and W1 specification
       SW3    Combines S3 specification and W1 specification
       SW4    Combines S4 specification and W1 specification
       SW5    Combines S5 specification and W1 specification
SR     SR1    Combines S1 specification and R2 specification
       SR2    Combines S2 specification and R2 specification
       SR3    Combines S3 specification and R2 specification
       SR4    Combines S4 specification and R2 specification
       SR5    Combines S5 specification and R2 specification
C      C1     STARS classification enlarging request
       C2     MST classification enlarging request
CW     CW1    Combines C1 specification and W1 specification
       CW2    Combines C2 specification and W1 specification
CR     CR1    Combines C1 specification and R2 specification
       CR2    Combines C1 specification and R7 specification
       CR3    Combines C1 specification and R12 specification
       CR4    Combines C2 specification and R2 specification
CR, E  CR5  CR6  E1  E2  E3  E4  E5
" " " " " R7 " " " " " R12 " P1 relevant document enlarging request ft Tt If M B1 »l tt tt tt F2 tt tt tt tt B2 ALI. oontd. S SW SR C CW CR E T4 Key Table contd. Etf ER EW1 EW2 EW5 EW4 EW5 ER1 ER2 ER3 ER4 ER5 Combines E1 specification and W1 specification f t E2 " " ,f " f f " E3 " " " w r t f t n E4 " " E5 " " M " Combines E1 specification and R2 specification " E2 " " " " tf f f » E3 " " , f " E4 " " " w " E5 " " " T5 i« • • m • •" • »• 1 » .ili 1 ' • 9 ' "O • » " ft—nii »•••'< > • • ' 4 Summary Table ••• • • > 1 Cranfield 1400 UKCIS 27000 MPL 11500 Evans 25OO I O T 0 p A b 0 0 T 0 T 0 P 1 1si b 0 0 1 a 1_ reqs m m a 1 a J a J a J a J a1 m J X X X X X X X X X m X X X ml x« X X X mJ m 1 a J X X X X X X X X X X X X X _sij a| xf X aJ a J X X a X X m1 m1 X X X RUN T1 W1 W1.1 x» x» X X1 xf X xf X xf xf R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R1? R2.1 R2.2 xf xf X X X X X X X X X X x ! X X X X xf X 1 xf X1 X X X X X X X X x xf X X X X xf X xf x» xf X xf xf xf X X X X1 X 1 xf X X xf xf X xf X X I x» X X xf xf xf xf X X1 xM X X X X S1 S2 S3 S5 s.4 X X X X X X X X X SW1 SW2 SW3 SW4 SV5 SR1 SR2 SR3 SR4 SR5 C1 C2 X X X X X X X X X CW1 CW2 CR1 CR2 CR3 CR4 CR5 CR6 4 xf X X x' X X i I • 9 4 1 J 1 1 4 I I 4 4 I 9 • 4 I 1 T6 Sumrarv nVsle eontd. M.»I . - i . . ». .i • » . • i « Cranf ield 1400 |l UKCIS 27000 T NPL 11500 T Evans 2500 Jo m Si m a m ra lo m ll m \sk m a a a m x X X X X X X X X X X X X1 a 4 • • T7 v1/Summary Table TIKCIS 27000 T generation set application set reos 1 1 16- 8 1 IS 8 4 2 Si Si Si si si Si 4 m 1 I p b 1 1 m 1 4 1 1 2 4 51 m si m m HUN R1 B2 x X T8 v2/Summajry Table Cranfield 1400 I gen. set app. set reqs RUIf R2 R7 R12 R2.1 R2.2 SR1 SR2 SR3 SR4 CR1 CR4 CR6 ER1 ER2 ER3 ER4 P1 o m F2 UKCIS 27000 T B1 |B2 P1 o m o m o P2 o P3 o B1 0 B2 o B3 o P b P1 P2 o F3 m B1 B2 o a P m lo I ° m r3 m m l° m IX1 x x x T9 v2/Summaxy Table contd. 
MPL 11500 A gen, set app, set reqs PI o F2 o P5 o B1 o B2 o B3 0 T PI 0 P2 o ¥3 o B1 o 32 33 o o a a a RUN R2 R7 R12 R201 R2.2 x X X X X X X1 X T10 v2/STimmary Table contd. Evans 2500 T gen. set app» set reqs RUN R2 R7 R12 R2#1 R2.2 X' XT P1 F2 P3 B1 B2 B3 F1 P2 F3 B1 B2 3 r P 0 o o o o 0 o o o o !° a m m m X X x x X X II T11 Precision Cranfield 1400 Re- I call 'o reqs T1 Table T7 o O z 2 3 2 2 2 3 4T & o o 2 2 a 3 3 C 7 o 0 2 2 o i z 5" Z 3 ? 10 0 0 o 0 0 I I O 0 0 0 I 0 2 0 o 2. 2 3 *t 1 I I r 7 3 ? IX «5| 7 »o i\ !2o| 3o 41 0 0 o 0 Z 3 10 \ 3 5 I 1 % <\ M M *l 3 S 12 '^ x* 13 < f lo ii 11 |4°l I I T13 Precision Cranfield 1400 I o m ra UECIS 27000 T O Table M R FPL Evans 2500 T Q , p 1 Recall reqs RUN R1 11500 A b 0 0 T P 0 0 si b o 1_ aJ a I a J a | a | a 6 0 0 0 0 2 & 0 0 0 0 0 0 2 m1 m j ml 0 0 0 rn | -Sij| m a | a a aI a a m m 1 100 90 80 70 60 50 40 30 20 10 100 90 80 70 1 60 50 v 0 1 2 * 0 0 O 0 O 12 20 32 0 0 * 1 k 1 o ^ 5 /I 0 0 0 0 0 2 ! 6 4 8 * 13 o 3 3 * > it ? o *H <3 0 •1 1 <2 0 O 0 0 4 7 S 21 5 1 '3 14 3o 3 "7 ,2 R2 o "^ 1 lo 15 20 7 o o o 0 40 30 20 10 R5 1 212? 38 0 1 0 0 0 0 0 0 1 <\ s 100 90 80 '70 60 50 40 —J30 20 10 W 4| 0 0 0 3oj 0 0 0 0 0 0 0 it 0 1*3 o 2 * (4 2 1 2© 31 2 3 2» o o O 0 ^ 3 1 Id 31 22 32 34 n - O O O O S 5" *l 33 q3 2 0 0 0 0 12 n '1 53 o 0 0 34 *• 0 0 0 0 4' 42 o 0 n Yl •M 1 2S 38 *1 51 0 0 3 5 8 II 6 6 O I 4 4 IS 22 23 s *7 37 9i 1 i 1 12 0 0 O R 4 100 90 80 j 70 60 50 40 30 20 10 100 90 80 70 60 50 40 30 20 10 n 0 6 0 0 0 II 0 0 Y 4 o © 2 % S 1- 11 0 0 0 20 0 O 3 4 7 t 1 7 4 IO 1 15 I* 2 0 22 O 0 O \% 0 Y Y 1 •Y 2 0 \€ 27 D O O ?3 W 0 0 0 3t> 22 32. II IS 21 2* 33 n R5 n 0 2 X3 I 2 2? *2 ? k<> 4 2 2 5 ^ H 1o 0 0 0 2 1© 2 o 0 0 0 © 0 3 24 0 0 0 22. m- *H \ 3*f x«| \Si+\ 14© 42, 7 0 0 0 0 0 0 ° o ° o 7 21 lo 14 Y 0 0 0 ' * 9^ C3j 0 0 0 11 22 I 0° 0 0 10 t* 31 Yq 40 !3t| 0 7 I £ n o o 2 0 6 I O U-4 o < 3 0 O 6 © o a 4 12 12 h 20 51 51 42 © 0 12 35 Y2. 
21 Y* 53 22 34 58 't 1 IY3 \So 22, 33 \ ^ \ 'Y 1 21 7o 77 51 \*>l\ «7 0 0 0 1 1* 4 ^ t 7 1 " i3 0 0 o «> i * 1 2-1 6 O 0 2 o 0 3 & it 4 M 10 17 13 »s *4 2f 32 Uff 4 i U? 4• 7 MY 21 s 1* 4 21 1 J 32 I I 3o Y* 42 55* 57 1 1 i? ^' Ia « I I n 25 55 72 81 1 f% I 1 i 1 \ 1q T14 Table MR contd. Precision i • • • » i i « e » i i i i i • • »• 0 H I « 1 1 » I t * Cranfield 1400 ReI 0 UKCIS 27000 T O MPL , '4 1 D Evans 2500 T 0 11500 P A b 0 0 T 0 p 0 call reqs RUN 6 4 ij 0 m m a1 a J a J d O 0 O O 0 2 (6 3* 0 0 0 0 0 0 a a | a1 m J 0 0 0 0 0 0 m ' m | j^_ .§£1 m Jm | O 0 O * C|3 52 0 0 0 a a J 0 a ' a1 0 0 0 a a m Im 0 0 0 100 90 80 70 60 50 40 50 20 10 0 o 12 2 0 o % 7 O O O 21 27 3s 3* * 3 1-2 56 M A\ «f . 31 <*2 ** *» Si 34- 3b i S VI *7 Xt\ 4S 7* 0 i f it31 0 ? 7 W *1 *M 54 43 45 m l S II (8 23 32 ?* U Si O 0 0 0 H- 1 10 j I? 30 7 S* 17 0 0 0 12 100 90 80 70 60 50 40 30 20 10! %n\ is *4 0 2 /b 25 3S 0 0 0 '** 27 40 4b 51 3 S b 12. 45 1 77 1b 0 0 0 5 0 0 0 1 ib 22 % \ l*0 31 S tt> * n ! 80 | '70 I 60 50 40 30 20 10 3 * •0 ' H ix. 19. 22 ' 9 100 90 80 70 60 50 40 30 20 10 si. o 2 'T 5" S Sb 4* 71 w *r 0 0 0 IS 12 22 It 31 2-1 i*' *10100 90 80 70 60 50 40 30 20 10 H 0 2 \*t 0 © 0 43 75 0 0 © 0 0 15 33 5t S * 5* «» ! 0 2% i?" It ii hi 13 77 \ 1 h I 5" I V 24 a ^ i 3 1 in ^ *1 \ 110 13 I § hiu 1 *» So i^ 33 k I Iffl 73 i^2l ^ *2 SS* is 1 T15 Precision Cranfield 1400 I m m 0 UKCTS Table MR contd. HPL 11500 A T Evans 2500 T a a m ra Recall reqs RUN 100 90 80 70 60 50 40 30 20 10 R12100 90 80 70 60 50 40 30 20 R2 10 R11 27000 T Si m m '0 Jo |l ra ra |sj m X 5 12. H> 23L » o I X O O 0 3 o E> ll II 13 8 /o It 2fe 'ho 90 80 70 60 50 40 30 20 10 R2 •?oo 90 80 70 60 50 40 30 20 10 T16 Precision • ' 1 ' • » • »i i nil i» Table FB mil 0 i • O i '<•' C • i i ii Cranfield 1400 I UKCIS 27000 T WPL 11500 A "o T Evans 2500 'o m Dl Si m m o |i | si m m m m ra 2 5" ? il 3 3 10 0 1 2 3 *f 3 7 10 5 n o \ 2 3 t> ? 
to Ik 0 1 2 2 IT 4 10 T17 Precision Cranfield 1400 I UKCIS Table MSW NPL 11500 T Evans 2500 T Recall 27000 T Si ra m m in o ll m ra Isi m m ra reqs RUN 100 90 80 70 60 50 40 30 20 10 SW2. 100 90 80 70 60 50 40 30 20 10 SW1. 6 I 1 3 11 IS" af O 1 a \ 1 % I I sw5. 100 90 80 70 60 50 40 30 20 10 SW4, 100 90 80 70 60 50 40 30 20 10 SW5, 100 90 80 70 60 50 40 30 20 10 'M **f 0 1 X 3 ffc 13 lo 33 0 1 3 1 o • o n . 11 i i f • i •» ' < ' i > • • •• Cranf ield 1400 II UKCIS 27000 T o HPL 11500 T Evans 2500 T a a m ra Jo m m 0 I 2 m m m ra m 7 I3 IB T23 Precision Cranfield 1400 I H i Table ME NPL 11500 T Evans 2500 T UKCIS 27000. T Recall reqs RUN E1 100 90 80 70 60 50 40 30 20 10 E2 100 90 BO 70 60 50 40 30 I Si |o fl [ S | m a m m m ra m a m ra O s 2 5 & 10 13 o 1 2 I 1 % 0 1 20 10 E3 100 90 .80 70 60 50 40 30 20 10 E4 100 90 80 70 60 50 40 30 20 10 E5 100 90 80 70 60 50 40 30 20 10 2 3 5 t % \o n o I 2 I 4 5 10 1 t n o I 2 2 3 * 5 T24 Precision • • •", Table MEW + Cranfield 1400 ReI call 'o reqs RUN ^100 90 80 70 60 50 40 30 20 10 EW2100 90 80 70 60 50 40 30 20 10 EW3100 90 .80 70 60 50 40 30 20 10 EW4 100 90 80 70 60 50 40 30 20 10 FW5100 90 ra UKCIS 27000 T HPL 11500 T Evans 2500 » » lo m a m ll m |S^ m a m ra m ra o « 3 22.1 3o 3d W\ o o Q O n 23 12 30 20 10 3o 31 ER4100 I 90 80 70 60 50 40 301 20 10! ER5100 ro o o o 0 n V 0 o 0 it, 3i 90 80 I 7o| 60! 50! 401 301 20 10 n n 3| 3b o 45 T26 Precision UKCIS Recall generation set application set reas HU1T 27000 T 1 vS 8 1 1 4 Si Si 4 4 2 4 8 si si si Si Si si m Si m m m m 0 0 0 % 1 2 I P b 1 1 1 1 2 R1 100 90 80 70 60 50 40 30 20 10 100 90 80 70 60 50 40 30 20 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 O 0 R2 t 1 \o 13 13 o 0 0 0 0 0 0 a o o o a ,t X5" U IS 13 o 13 31 i t o *7 33 33 0 32 3* 37 z ^ !V7 2.2 i1 hz 4? 30 53 3 * s% 4t 51 0 0 o 2 8 o 0 6 12. O 0 0 \n T27 Precision Cranfield 1400 I F1 o P2 0 v2/Table M R TJKCIS 27000 T Recall gen. s e t app. 
s e t reqs RUN 82 IB1 0 B2 o m 0 F1 0 F2 0 |P 113 [ P3 B1 |B2 |B3 i p i O 0 0 0 0 P2 0 P3 0 B1 |B2 |B3 0 0 0 1m 100 90 80 70 60 50 40 30 20 10 R7 m 0 0 ni 0 0 0 * a a a a a a m O O O O 0 0 m O O O 0 0 0 I* m 0 0 m O O O 0 O 0 0 18 m O O 0 0 0 0 2 m O O 0 0 0 0 2V 1S\ 0 0 5 8 II io 25" lo o 1 L ts \t o 0 S * •' 221 » 0 0 1 "7 n zs z» 36 2o\ 2if 3 1 III IS 7 25" 22 40 0 0 0 n s 2fc 23 41 **l a 3o 100 90 80 70 60 50 40 30 20 10 41 ft *7 3* 1* * 2t t 3 II »«t It 3| O I 2. 1 fe> 10 13 ib 20 30 r 7 0 O 0 5" /«•• "7 2 2 3/ m M 40 R12 100 90 80 i 70 60 50 40 30 20 10 R2#1 3 5 % \o 2tf *fr Ife 0 100 90 80 70 60 50 40 30 20 10 10 0 z 7 2? 38 »z 45 VS *2 112,2 100 90 80 70 60 J ' i i i I { 4 i * J 50 40 30 1 T28 v2/Table MR contcU Precision HPL Recall gen # s e t app* s e t reqs RUN R2 100 90 80 70 60 50 40 30 20 10 100 90 80 i 70 60 50 40 30 20 10 100 90 80 70 60 50 40 30 20 10 © 0 % O O O O O 0 © 11500 A F1 0 T F3 0 P2 0 B1 0 B2 o a B3 o a PI 0 P2 o a P3 o a B1 0 B2 I 33 o a 0 a a a a |a a a ia 0 0 0 3 M !(> 22 ^%\ 2*7 •7 12 S 31 3| R7 38 *7 0 1 2fcl 3 2 34o 0 0 3 5" 3 »3 ! ^ ir 23 23 R12 R2.1 100 90 80 70 o o 7 22 27 31 37 6 o © o o ° 6o 50 40 30 20 10 R2#1 100 90 80 70 60 •3 0 n 2a 29 32 0 3« 0 r 12 IS" 2$- 3 0 3 - 5 i 'o 2* 21 \o It r 7 5 IS I? h Ilk jl*! 2| IT R12 '1 >7 *7 R2 1 - O 0 0 0 0 0 0 0 0 \ 0 0 6 0 0 0 IS 2J» 2 4 3*7 * 7 0 0 •° 0 0 0 0 22 ^ 5"l I J j 4 4 R2#1 0 0 0 0 0 0 0 0 I 0 0 © 0 0 0 0 4 »Sl T50 v2/Table MSR Precision r 1 — » » 1 • 1f " % •" •" r ' 9 w' ' • "• » y ' » • »• • ' * ' »' " " » ' '""i Recall Cranfield 1400 TJKCIS 27000 gen. set app. set reqs RUI? SR1 P b F1 F2 |B1 I B2 P1 F2 F3 rB1 B2 ! 
3 F1 F2 F3 B1 B2 B3 B 0 o Io o 0 0 O 0 0 0 0 0 0 0 0 0 I T m 100 90 80 70 60 50 40 30 20 10 100 90 80 70 60 50 40 20 10 0 0 m m m a a a a a a m m m m m Im 0 0 o o o M 2\ i J i J 4 I 'A'PP Table Src T, V, CRt ER NPL 11500 Precision Cranfield 1400 I UKCIS 27000 T Evans 2500 T m Recall I si m m m m m reqe m RUN T1 100 90 \0 80 12 70 60 2© 50 11 40 30 20 10 *1 W1 100 90 80 70 60 50 40 30 20 10 CR11001 si. m 21 is 1? 11 23 2» 3 r 3 12. f 2. I 3 4 % 3 4 (I IS* '1 3 J* 132 11 11 IS! 14 121 23 3? IS 2* ? M l V 2l 2» 23 24 3e> 3S 31 *l So 20 21 2S 90 80 70 60 50 40 30 20 10 CR4100 I 90 80 70 60 50 40! 30 I 20 ER5 10 100 I |35-| 90 80 70! 60] 50 40 30 20 *7 So 24 24 2? 301 35- 44 H| ri T34 Precision Cranfield Recall UKCIS 27000 0 O Table SrcR NPL Evans T 0 1400 I mI m , P 1 a | 11500 A b O 0 2500 T 0 p 0 si b 0 l__ reqs RUN R1 a1 a1 aJ aJ a J mJ m a mJ m 1m 1 sil | a | a | a1 a | a m | m 100 90 80 70 60 50 40 30 20 10 100 xo 2o 22 XS\ 3o 3* *1 44 21 21 2b 32 *H*» \ R2 10 22 90 80 70 60 50 40 30 20 10 R3 2n\ 4o »4 t> XI 24 IS * 35" 41 *1 7 IO S" 23 3 J R7 R12 11 X| 3 3r T38 v2/Table SrcR contd. Precision KPL Recall gen, set app. set reqs EOT R2 11500 A F1 o F2 |F3 B1 o o o B2 o B3 o T F1 o P2 o P3 B1 B2 B3 o o a o 0 100 90 80 70 60 50 40 30 20 10 100 90 80 70 60 50 40 30 20 10 100 90 80 70 60 50 40 30 20 10 1 12. o 1 «? R7 \n t *7 n 33 R12 T39 Precision 1» • • • • '• v2/Table SrcR contd0 » 0 *¥ • ' » > • • » ^ • Evans Recall gen. set app. eet reqs 2500 T p F1 P2 F3 |BI |B2 B3 PI P2 F3 |BI |B2 B3 0 0 0 0 0 0 0 0 t o 0 0 0 a a a a | a a m m m m m m 100 90 80 70 60 50 40 30 20 10 100 RUN R2 1 1 1 1 % s n zr 3? 10 *t 6 IS Zfc 3 18 2* 1 ft H 90 80 70 60 50 40 30 20 10 R12 100 90 80 70 60 50 40 30 20 10 t II I 1 J J i 11 J i i 1 1 T40 v2/Table SroCR ( No tables v2/Sdr, pr, rr CR ) Precision Cranfield 1400 I P1 o m 100 90 80 70 60 50 40 30 20 10 100 90 80 70 60 50 40 50 20 10 100 90 80 70 60 50 40 30 20 10 F2 o m UKCIS Recall gen. set app. 
set reqs RTO CR1 27000 T B1 !B2 F1 P2 o o o !o m m o B1 o B2 o B3 o p b P1 F2 |F5 jo B1I o m B2 jo ni B3 o m o in Im CR4 Zo 32 n 3* CR6 3& «H H i° T41 Precision Cranfield 1400 I "o m UKCIS 27000 T Table Sdr T. V. CR. ER NPL 11500 Evans 2500 T Recall Si lo ll I s j m m m m m reqs m m r n RUN T1 100 90 80 70 60 50 n.«u M.^ 40 30 20 10 W1 100 90 80 70 60 rt.*J 50 40 30 20 10 CR1100 90 80 .70 60 !*.«!• 50 40 30 20 10 CR4100 90 80 70 6o »>.*. 50 40 30 20 10 ER5100 90 80 70 60 50 40 30 20 10 z n.A« o o o I I ? •J X 3? X\ 3 5 " 12. O I I 2. 2. .3 iu«. 4> . 1 n.q. T42 Precision Table SdrR « * i Cranfield 1400 Recall I reqs m RUN R1 100 90 80 70 60 50 40 30 20 10 R2 100 90 UKCIS 27000 T St NPL 11500 A o Evans 2500 m lo H Is* a a m m m m m m rt.O. o n is M.A. ft, 38 10 20 50 100 H3 5 10 20 50 100 R4 5 10 20 50 100 R5 #1 rf.A »«.| 3LJW |222t[ 5 10 •20 50 R6 100 fl.A 5 10 20 50 100 R7 IH.A. En h.«.l art ri,4 5 i<7 10 20 50 100 R8 5 10 20 50 100 R9 n,ft 5 10 20 50 100 5 10 20 k-«J R10 n-fcj T50 Table SprR contd. Recall f precision • • Cranfield 1400 Hank I "o reqs RUN R11 m UKCIS 27000 T NPL 11500 A 'o Evans 2500 T Si m m m ra m m m ra 5 10 20 50 100 R12 5 4.4. 10 20 50 100 I I 4 1 4 J • i T51 v2/Table SprR Recall, precision Cranfield 1400 I UKCIS 27000 T P b B1 B2 B3 P1 F2 |P5 B1 B2 B3 0 o o I o [o o 0 0 m m m m m m Rank gen. Bet app. set reqs RUN R2 F1 P2 B1 B2 F1 P2 o o o o o o o m m Im m 5 10 20 50 100 5 10 20 50 100 5 10 20 50 100 M.A Kft. R7 R12 rt.« T52 v2/Table SprR oontd. Recall, precision NPL 11500 A PI o P2 |F5 o o B1 o B2 o B3 o F1 o F2 o P3 o B1 o B2 o B3 o Rank gen. set app. set reqs RUN R2 5 10 20 50 100 lira hi,*f R7 5 10 20 50 100 R12 5 10 20 50 100 T53 contd, Recall, precision I Evans Rank gen. set app. 
set reqs 2500 T PI o F2 o P3 B1 B2 B3 PI P2 P3 B1 B2 o o o o o |m o m o |m !o ID o m r3 o fm mm R2 5 10 20 50 100 \*4\ MM R7 5 10 20 50 100 R12 5 10 20 50 100 II T54 r ^ a v e r a g e relevant r e t r i e v e d t • average t o t a l retrieved • • * « 2*0-41 f1 7^4 •w 2M 14 11.I ll.|J 17.7 1.1 *-f *IH I M t"» m37)0 I H ^ •1^-»H 3fcM3nM| wwiil-i tU-3 KfXl.S W 1 gird *ni w3 W 224J l?4j «no mi 1-") r 3.3 t im R2 r 2* t U.1 R3 r 3.3l t 3syd KM 6.«H 3.1 R4 r id t Tti\\ 117J.0 R5 r J-1\ 3-3 2H4 t 5^2.y 151-0 1103-1 R5 < r 34 2->3 t M5-3 43.0 HTM R7 r 3.3 t 3%» R8 r 3.3| t 3 ^ 3-3 R9 r 't 333& R10 r 33 t 2tSl| R11 r J3j t itso R12 r si t 30t*| R2.1 r 1U^ id "r?J •stJ «••« A.4. 1S3-1 1 t«J *s ni IM HtM TVSi W l*l 22 J 736-2 I W 1*7 UH (t-i\ nt 14 M tmi* 3tli -H wi r\\ lob t^a 1IW t'1,-3 15-5 *d «H «U 7M *IM A. A n» ^« #\ A W 10.1 3S7S 4©tt *« u 7J Oil 22M I8S-0 14 Hl-S 14 /IS-3 4- R2.2 r t S1 S2 S3 S4 S5 r t r t r t r t r t SW H H til-fc 3-H 4H-S 3-H a W I 3-«H WI-1 1 i I 4 1 § 1 1 1 [ 1 I \t k 1 1 | T55 Cranfield 1400 I "o reqs RUN !W1 IW2 ra UKCIS MPL Table S t r contd. •" i > ii i • • Evans 2500 27000 T 11500 A Si a m li I sj m m ra m m m ra r t r sm AttSf 3n >•» . • • • i m Cranfield 1400 I 'o reqs RUN t2yS t3oa m m TJKCIS HPL 11500 A |o m m in Evans T 27000 T jl |SJ m m 2500 1 m 1 A 3o.d 2/S 13.b| 2-3 11-3 2.1 lb.*] T57 r = average relevant retrieved t - average total retrieved tlKCIS 27000 T generation set application set reqs RTJIJ R1 R2 r t r t 1 P b 1 v1/Table Str IT 8 4 Si Si 1 1 1 I 2 £>4 1 1 si 4 Si fi m si si si m m 8 4 4 m m *>.«(, K.O. *v«. " < . ».« I1 MI 1* *77| •«-7l T58 r • average relevant retrieved t « average total retrieved f » » » 9 ' • * • ' * > 9 • • v2/Table Str •' ' » ' * Cranfield 1400 I geru s e t app. 
reqs RWR2 R7 R12 R2.1 R2.2 SR1 SR2 SR3 SR4 CR1 CR4 CR6 ER1 ER2 ER3 ER4 set TJKCIS 27000 T B1 0 1 p F1 I P2 0 0 B2 0 P1 0 F2 0 P3 B1 B2 B3 P1 P2 P3 B1 B2 B3 0 0 0 0 0 0 0 0 0 0 m m m m a a a a a a m m m m m m r t r r t t r t r t r ] 2-8 2-» 2-S ftffl ioih Sl-S 5S-^ £^ 5.3 v ,2 37 m •2-0 *J u 117 Il-J *l ISM in-11 Icvst-l n«t IW-7 U-zJ /J-7 #13 IV mi t r t r r t r t.f 20\ *7 14 7-1 t HI t r t r 20 3-3 «*23-l t r t r t r t r t 111 1.0 73 II .0 1*3 i 9 I 1 \ i1 I i I * i < J i A T59 v2/Table Str contd. fflPL 11500 A gen. app, reqs RUN r t r t r t r t r t 1115J22H Clb'f T P3 B1 o o B2 o B3 o PI o F2 o F3 o B1 o a B2 o set set P1 o F2 o 33 o «3 ? IS W-l R7 R12 R2.1 R2.2 2KI3U*UH *3 HI 2MWI •*3 1-1 6-7 73 It)-3112*01 "7-31 T60 v2/Table Str contd. Evans 2500 T P P3 |B1 B2 o B3 o P1 o m r t r t r t r t r t P2 o m 2* \b •) «H gen. s e t app. s e t reqs EDIT R2 P1 |P2 ?3 B1 B2 B3 o 0 o o m m m n R7 R12 H2.1 R2.2 31 7 n.4 4.1 W1 5 10 20 50 100 CR1 5 10 20 50 100 n.4 CR4 5 10 20 50 100 • * • 50 100 89 5 10 20 50 100 !(10 0 10 ?0 M- I*.«J T63 Table SrrR oontd. Nbe relevant retrieved Cranfield 1400 I TJKCIS 27000 T 'o NPL 11500 Evans 2500 T Rank Si m m m ra m m m reqs RUN R1T m m 5 10 20 50 100 5 10 20 R12 \*4, 50 100 I 1 I I J 9 I T64 v2/Table SrrR No. relevant retrieved Cranfield 1400 I F1 o m F2 UKCIS 27000 T P Rank gen, set app. set regs ROT 12 B1 |B2 P1 F2 F3 B1 B2 B3 P1 F2 |P5 B1 B2 B3 o m o lo o o o o |o |o m o Im P jo m m l ° m o m |m Im 5 10 20 50 100 n-a *\A. R7 5 10 20 50 100 R12 5 10 ?0 nA - 50 100 I T65 No. relevant retrieved NPL Rank gen, set app. set reqs RUN R2 11500 A F1 F2 F3 B1 B2 B3 PI F2 F3 B1 B2 B3 o o o o o o o o o o o o a v2/Table SrrR contd. 5 10 20 50 100 It R7 5 10 20 50 100 5 10 20 50 100 R12 T66 v2/Table SrrR contd. No. relevant retrieved Evans Rank gen. set app. set reqs RUN R2 2500 IT P1 |P2 |P3 I B3 o P1 |o jo P2 |F3 B1 o o o Im Im Im B2 B3 ! 
° jo m m im 5 10 20 50 100 5 10 20 50 100 5 10 20 50 100 Kg h.*- R7 R12 I J T67 Proportion requests Cranfield 1400 I 'o m Hi Table Scr T, W. CR. ER UKCIS NPL 11500 A "o T Evans 2500 T 27000 T Rank reqs Si m m |o |l m III |sj m m m 111] 1H-7 1">H- RUN T1 5 10 I77-* 20 93»[ 50 121 12-1 100 *N H-7 74-4] tSl II-1 13-1 1H 1?3 12-/ W1 17 H9 5 10 20 50 100 5 10 ?0 50 100 5 10 20 H10 T69 Proportion requests UKCIS 27000 T Table ScrR contd. Cranfield 1400 I 'o reqs RUN R11 m m NPL 11500 Evans 2500 T T m ra m m m 5 R12 10 20 50 100 5 10 20 50 100 k • 4 * § % 9 9 T70 v2/ Table ScrR Proportion requests Cranfield 1400 I F1 o F2 o B1 o B2 o TJKCIS 27000 T F1 o P2 o Rank gen. set app. set reqs RUU H2 P3 B1 B2 B3 o o o o P b F1 F2 o F5 o m m m m m m m m 5 10 20 50 100 5 10 20 50 100 R7 R12 5 10 20 50 100 T71 v2/Table ScrF contd. Proportion requests NPL Rank 11500 A PI o gen. s e t app. s e t reqs RUN R2 P2 o F3 o B1 o B2 |o B3 o PI o P2 o P3 o B1 o B2 o 33 o 5 10 20 50 100 5 10 20 50 100 5 10 20 50 100 ffcd % i 17 Si R7 R12 T72 v2/Table ScrR contd. Proportion requests IEvans Rank gen. set app. set reqs RUN R2 2500 IT P1 |P2 |P3 I lo o B3 F1 P2 P3 B1 B2 B3 o lo m o o o m o ! m |m fm 5 10 20 50 100 mi 12- I*f7| hv-7 I'M R7 5 10 20 50 100 R12 5 10 20 50 100 I I T73 v2/Table ScrCR Proportion requests Cranfield 1400 I UKCIS 27000 T P b B2 B3 PI P2 P3 B1 B2 B3 o o o o 0 o o m m m m m m Rank gen. set app. set reqs HON CR1 F1 P2 B1 B2 P1 F2 P3 B1 o o o o o o o o m m m ID 5 10 20 50 100 CR4 5 10 20 50 100 7*7 tl-t •7l*f CR6 5 10 20 50 100 %$ 1*7 T74 Cranf ield 1400 II UKCIS 27000 T NPL 11500 A Other Table 0 •+ * Evans 2500 T T Jo m m St |o |l m [Sj m m m m m m 0 n f, to lo \ 1* o o 1 II IS 1**1 ?7 R1 REFERENCES BARKER, F. H.f VEAL, D. C. and WYATT, B. K. , 'Towards automatic construction', Journal of Documentation, 28, 44-55 (1972). profile BARKER, F. H., VEAL, D. C. and WYATT, B. K. 
Retrieval Experiments Based on Chemical Abstracts Condensates, Research Report No. 2, UKCIS, University of Nottingham (1974).
BARRACLOUGH, E. D. et al., The Medusa Current Awareness Experiment, Computing Laboratory, University of Newcastle upon Tyne (1975).
CLEVERDON, C. W., MILLS, J. and KEEN, E. M., Factors Determining the Performance of Indexing Systems, 2 vols., College of Aeronautics, Cranfield (1966).
CROFT, W. B. and HARPER, D. J., 'Using probabilistic models of document retrieval without relevance information', Journal of Documentation, 35, 285-295 (1979).
EVANS, L., Search Strategy Variations in SDI Profiles, Report R75/1, INSPEC, Institution of Electrical Engineers, London (1975a).
EVANS, L., Methods of Ranking SDI and IR Outputs, Report R75/3, INSPEC, Institution of Electrical Engineers, London (1975b).
HARPER, D. J. and VAN RIJSBERGEN, C. J., 'An evaluation of feedback in document retrieval using cooccurrence data', Journal of Documentation, 34, 189-216 (1978).
HARPER, D. J., Relevance Feedback in Document Retrieval Systems: An Evaluation of Probabilistic Strategies, Ph.D. Thesis, Computer Laboratory, University of Cambridge (1980).
MILLER, W. L., The Evaluation of Large Information Retrieval Systems with Application to MEDLARS, Ph.D. Thesis, University of Newcastle (1970).
ROBERTSON, S. E., A Theoretical Model of the Retrieval Characteristics of Information Retrieval Systems, Ph.D. Thesis, University of London (1976).
ROBERTSON, S. E. and SPARCK JONES, K., 'Relevance weighting of search terms', Journal of the American Society for Information Science, 27, 129-146 (1976).
ROBERTSON, S. E., VAN RIJSBERGEN, C. J. and PORTER, M. F., 'Probabilistic models of indexing and searching', paper presented at the BCS/ACM Symposium on Research and Development in Information Retrieval, Cambridge, 1980; in press.
ROBSON, A. and LONGMAN, J. S., 'Automatic aids to profile construction', Journal of the American Society for Information Science, 27, 213-223 (1976).
ROCCHIO, J. J., 'Relevance feedback in information retrieval', in The SMART Retrieval System, Ed. G. SALTON, Prentice-Hall, Englewood Cliffs, N.J. (1971).
SALTON, G. (Ed.), The SMART Retrieval System, Prentice-Hall, Englewood Cliffs, N.J. (1971).
SPARCK JONES, K., Automatic Keyword Classification for Information Retrieval, Butterworths, London (1971).
SPARCK JONES, K., 'A statistical interpretation of term specificity and its application in retrieval', Journal of Documentation, 28, 11-21 (1972).
SPARCK JONES, K., 'A performance yardstick for test collections', Journal of Documentation, 31, 266-272 (1975).
SPARCK JONES, K., 'Performance averaging for recall and precision', Journal of Informatics, 2, 95-105 (1977).
SPARCK JONES, K. and BATES, R. G., Research on Automatic Indexing 1974-1976, 2 vols., British Library Research and Development Report 5464, Computer Laboratory, University of Cambridge (1977).
SPARCK JONES, K., 'Search term relevance weighting given little relevance information', Journal of Documentation, 35, 30-48 (1979a).
SPARCK JONES, K., 'Experiments in relevance weighting of search terms', Information Processing and Management, 15, 133-144 (1979b).
SPARCK JONES, K., 'Search term relevance weighting: some recent results', Journal of Information Science, (1980).
VASWANI, P. K. T. and CAMERON, J. B., The National Physical Laboratory Experiments in Statistical Word Associations and Their Use in Document Indexing and Retrieval, National Physical Laboratory, Teddington (1970).
VAN RIJSBERGEN, C. J., 'A theoretical basis for the use of cooccurrence data in information retrieval', Journal of Documentation, 33, 106-119 (1977).
VAN RIJSBERGEN, C. J., HARPER, D. J. and PORTER, M. F., 'The selection of good search terms', paper presented at the Seventh Cranfield International Conference on Mechanised Information Storage and Retrieval Systems, 1979; in press.
VERNIMB, C , 'Automatic query adjustment in document Information Processing and Management, 13, 339-353 (1977). YU, C. T. and SALTON, G., 'Effective information retrieval accuracy', Communications of the ACM, 20, 135-142 (1977). retrieval', using term YU, C. T., LAM, K. and SALTON .G., 'Optimum term weighting, in information retrieval using the term precision model', Department of Computer Science, Cornell University (1980). Gl 1 GLOSSARY alternative request different form of request primary indexing application (set) document set used for searching with weighted requests baseline simple term matching retrieval performance collection see test collection collection frequency number of documents (descriptions) in a collection containing a term collection frequency weight term weight based on its collection frequency completely ordered fully ordered search output including all documents in a collection component of a retrieval system, in the most general sense cross check comparison altering values of a variable other than the focal one description characterisation of a document or request description exhaustivity see indexing description exhaustivity document cutoff basis for recall and precision computation exhaustivity see indexing description exhaustivity factor system component affecting performance fully ordered ordered search output with one retrieved document per rank generation (set) document set used to calculate term weights Gl 2 indexing description exhaustivity number of index terms per description indexing factor factor affecting the descriptions of documents and requests indexing mode automatic or manual preparation of primary indexing indexing source form of document or request used for indexing indexing vocabulary term set used for collection primary indexing indexing vocabulary specificity number of documents per term' input factor factor affecting input documents and requests matching condition form of matching between requests and documents 
matching rank: point of comparison between request search outputs
matching value: point of comparison between request search outputs
material (difference): difference between recall/precision graphs of at least 10%
mode: see indexing mode
noticeable (difference): difference between recall/precision graphs of at least 5%
ordered (output): partially or fully ordered search output
output factor: factor affecting search conduct and output
output form: output as naturally generated by a search procedure
output type: output as used for comparative evaluation
parameter: environmental factor (not subject to control)
partially ordered: ordered search output allowing more than one retrieved document per rank
primary indexing: word or word-stem lists representing document and request sources
proposition: statement germane to a particular research topic and related to experiments
raw material: initial documents and request need statements
recall cutoff: basis for recall and precision computation
relevance feedback: retrieval techniques exploiting relevance assessments to control searching
relevance weight: term weight utilising relevance information
relevance variants: sets of relevant documents defined by different relevance grades
run: averaged search output for specific request, document and relevance sets, involving specific indexing and searching procedures
run set: specific run output for different collections
scanning strategy: form of search inspection of a collection
scoring criterion: type of scoring for a request-document match
setting: of a parameter
source: see indexing source
source material: forms of documents and request need statements exploited in indexing and relevance assessment
specificity: see indexing language specificity
subsidiary collection: variant, e.g. part of, a test collection
term: extracted word stem
term weighting: any weighting of index terms in a description
test collection: particular set of primary indexing descriptions for some source material
topic: subject of investigation
unordered (output): minimally ordered output, i.e. retrieved/not retrieved
value: of a system variable
variable: system factor (subject to control)
variant collection: collection with relevance set alternative to main one
version: form of raw material represented by a specific collection
vocabulary: see indexing vocabulary
yardstick: performance defined by optimal weighting
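
The entries for collection frequency and collection frequency weight can be illustrated with a minimal sketch. The glossary says only that the weight is "based on" the collection frequency; the logarithmic form log(N/n) used here is an assumption, chosen so that terms occurring in fewer documents receive larger weights. The function names and the toy word-stem descriptions are hypothetical.

```python
import math
from collections import Counter


def collection_frequencies(descriptions):
    """Return, for each term, the number of document descriptions containing it."""
    freq = Counter()
    for terms in descriptions:
        # A term counts once per document, however often it occurs in it.
        freq.update(set(terms))
    return freq


def collection_frequency_weights(descriptions):
    """Assumed weight form: log(N / n), with N documents and collection frequency n."""
    n_docs = len(descriptions)
    freq = collection_frequencies(descriptions)
    return {term: math.log(n_docs / n) for term, n in freq.items()}


# Hypothetical toy collection: each description is a list of word stems.
docs = [
    ["wing", "flow", "boundari"],
    ["flow", "heat"],
    ["flow", "wing"],
    ["shock", "wave"],
]
weights = collection_frequency_weights(docs)
```

Under this assumed form a term found in most descriptions ("flow", three of four documents) receives a small weight, while a term found in a single description ("shock") receives the largest.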