7 System description

7.1 Introduction

This chapter describes and illustrates the catalogue systems used for live evaluation. The description includes some background design information, some indication of what is going on "behind the screen" and occasional evaluative remarks.

As described in Chapter 6, for the live testing we constructed two systems which differ in the extent of the automatic search aids they provide. These are the experimental system (EXP) and the control system (CTL). There was also a third system (OSTEM) which used the same method of term combination as the live systems but performed no stemming or spelling normalisation. This was only used by the experimenters for comparative repetition of live searches.

Both the EXP and the CTL catalogues offer only a subject search. This fact is clearly displayed on the introductory screen (Fig 7.1), and was emphasised while we were collecting the evaluation data by means of a notice on top of each terminal. Both catalogues accessed the same bibliographic files, and the record displays and screen layouts are identical. It is unlikely that more than a very few users noticed any difference between the two systems since the two terminals were on different floors of the library.

The CTL catalogue applies "weak" stemming and spelling standardisation (6.2) to all search words. The EXP catalogue uses, in addition, "strong" stemming, a phrase dictionary and some automatic cross-referencing, and will sometimes (about half the time) suggest an alternative for a word which it is unable to find.

The general appearance of the systems is quite similar to the subject search in the original version of Okapi (Okapi '84) described in the first Okapi report (1, Ch 7). The most noticeable difference is an additional screen. This is the lookup and search results screen illustrated in Fig 7.4 etc. It appears as soon as a user has entered a search.
This screen records the progress of the search and displays options as appropriate. In Okapi '84 the bottom half of the input screen was used for these purposes.

Fig 7.1 Introductory screen

    OKAPI '86

    EXPERIMENTAL COMPUTER CATALOGUE FOR SUBJECT SEARCHES

    This catalogue has been developed by the library research team at PCL

    You might have used Okapi '84 before. This version has been improved:
    Okapi '86 is easier to use and is more effective

    Until this version is complete
    THIS CATALOGUE WILL ONLY DO SUBJECT SEARCHES

    If you want to look for a particular book - use the other catalogues

    To start, press the GREEN KEY on the right of the keyboard ...

Fig 7.2 Empty input screen

    SUBJECT SEARCH » OKAPI

    The computer will look for books which include all (or most) of your
    words in their titles or subject descriptions

    Type a word or a phrase which describes the books you want:
    _

Fig 7.3 Input screen after user has started to type

    SUBJECT SEARCH » OKAPI

    The computer will look for books which include all (or most) of your
    words in their titles or subject descriptions

    Type a word or a phrase which describes the books you want:
    arbitration, mediation, conciliation_

    Press the GREEN KEY when you have finished
              WHITE KEY to change what you have typed
              BLUE KEY  to get rid of what you have typed

7.2 Keyboard and display

The user stations are Apple IIe microcomputers, with six of the peripheral keys painted for use as function keys. One of them, the yellow key, is not used in any of the systems described here. In Okapi '84 it was used to invoke help or advice.
Examination of logs from the earlier system showed that it was rarely used, and this seems to be the case for most online catalogues. The type of system we devised was intended to be usable at sight, and the incorporation of help adds an extraneous dependent variable to the more important ones which the systems were designed to investigate. (The yellow key is retained because it has a use in an as yet untested relevance feedback system.)

The other keys have the following general meanings:

    GREEN   proceed, browse forwards
    BLUE    go back one step, browse backwards
    RED     stop, go back (more drastic than BLUE)
    WHITE   delete last character (only works during input of search terms)
    BLACK   return to introductory screen (end session)

None of the screens indicates the function of the black key, but there is a label next to the key which says PRESS WHEN FINISHED.

There is, of course, a time-out. This returns to the introductory screen if no key has been pressed for 80 seconds - except during record display when the time- [...]

[...] message, or a "CAN'T FIND ..." message. Since the components of the weak-stemmed string are not, in general, suitable for display, the system shows the portion of the (preprocessed) source string which gave rise to the stem which it is looking up. In the case of a "CAN'T FIND", the user has to take some action - replace the word, tell the system to ignore it or abort the search. The display is illustrated in Fig 7.5.
If the weak stem has already been looked up, but the source word is not the same (example: EFFECTIVE COST AND SOCIAL COSTS), there is a message of the form

    ("costs" included under "cost")

Fig 7.5 Display while looking up - word not found

    Your search: 'aristotles poegics'

    Looking up these words
      25 books under 'aristotles'
      CAN'T FIND 'poegics'

    BLUE KEY  to change this word
    GREEN KEY to continue without this word
    (RED KEY to abandon this search)

Fig 7.6 shows the user being prompted for a replacement search term. If the user substitutes another word or words, the top line of the screen alters accordingly (Fig 7.7), the replacement string is weak-stemmed, the "CAN'T FIND" message disappears and look-up continues. The system assumes that the most usual and appropriate action will be to replace the word, and so this is the first option.

When the user does choose to have the word ignored, we were unable to arrive at a wholly satisfactory way of indicating this in the top line of the screen. If the word is simply deleted from the displayed string one is often left with a non-meaningful phrase or sentence. For example, in the search POST WAR GRAPHIC DESIGNS NI THE USA, the user might quite sensibly tell the computer to ignore NI, but

    Your search: 'post war graphic designs the usa'

does not look very sensible. The easiest compromise was to replace the ignored word by something which might remind the user that a word has been omitted: at the risk of looking prudish the system displays an asterisk, and shows "post war graphic designs * the usa".

Whether the word is replaced or ignored, the "CAN'T FIND" message disappears. This is mainly to prevent the record of the progress of a search from overflowing the screen.
Also because of screen capacity, if the number of words with non-zero hits reaches 12, the remainder of the search statement is ignored.

Fig 7.6 Retyping a misspelt word

    Your search: 'aristotles poegics'

    Looking up these words
      25 books under 'aristotles'
      CAN'T FIND 'poegics'

    Type your new word : poetics_

    GREEN KEY when you have finished
    (RED KEY to abandon this search)

Fig 7.7 Display during and after merging

    Your search: 'aristotles poetics'

    Looking up these words
      25 books under 'aristotles'
      67 books under 'poetics'

    Looking for books described by your search

    1 book matches your search exactly
    (91 books found altogether)

    GREEN KEY to look at the book(s) found
              (the most similar books should appear first)
    BLUE KEY  to correct or change your search
    RED KEY   to do a different search

The look-up procedure is quite fast - half a second or less for each term - and it is independent of the number of postings.

When look-up is complete, the system assigns weights to the terms and calculates maximum, "good" and "minimum acceptable" weights for records as described in 6.5.2. The line "Looking for books described by your search" (Fig 7.7) appears. The system makes an estimate of the number of postings it is going to have to examine, and displays "- please wait.." and a countdown if it is going to take more than a few seconds. A term combination, or merge, then follows.
(This is, of course, trivial for single-term searches.)

7.4.2 Experimental system

Instead of looking for individual words, EXP submits the whole of the remainder of the weak-stemmed string for index look-up (after removal of any stop words from the beginning of the string). This look-up procedure looks for the longest match from the left of the string up to a word boundary, and returns either the number of characters matched, or failure. For example, the search UNITED STATES AND THE WORLD WAR 2 ECONOMY matches on the 13 characters of UNITED STATES. The stop words AND and THE are removed. Look-up is re-entered with WORLD WAR 2 ECONOMY. WORLD WAR 2 is matched and removed, and finally ECONOMY is processed.

The count displayed for each term is the number of postings for the weak stem, just as in the control system. In the case of a single-word search this can result in an inconsistency between the number of "books under" the word and the final number of books retrieved. No interviewed user mentioned this.

Whenever a single word which is not a member of an equivalence class is retrieved, it is subjected to "strong stemming" and the resulting strong stem is also looked up. In the example, UNITED STATES and WORLD WAR 2 are not strong-stemmed because they are "go phrases", but ECONOMY is looked up both as a weak and a strong stem. As a strong stem this will bring in postings for records containing "economics", "economic" and also "economical" and other potential causes of false drops.
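The longest-match look-up described in 7.4.2 can be sketched as follows. This is a minimal illustration in Python, not the Okapi implementation: the index contents, the postings counts, the stop list and the function names `longest_match` and `look_up` are all assumptions made for the example, and the input string is taken to be already weak-stemmed. Strong stemming and equivalence classes are not modelled.

```python
# Hypothetical stop list and index (term or phrase -> number of postings).
# The counts are invented for illustration only.
STOP_WORDS = {"AND", "THE", "OF", "IN"}
INDEX = {
    "UNITED": 300,
    "UNITED STATES": 120,
    "WORLD": 400,
    "WORLD WAR 2": 85,
    "ECONOMY": 60,
}


def longest_match(text, index):
    """Return the number of characters of the longest index entry that
    starts `text` and ends at a word boundary, or 0 on failure."""
    words = text.split()
    # Try the longest candidate first, shortening one word at a time.
    for n in range(len(words), 0, -1):
        candidate = " ".join(words[:n])
        if candidate in index:
            return len(candidate)
    return 0


def look_up(search, index=INDEX, stop_words=STOP_WORDS):
    """Process a search string from the left, returning a list of
    (term, postings) pairs; postings is None for a term not found."""
    text = " ".join(search.upper().split())  # normalise case and spacing
    found = []
    while text:
        first, _, rest = text.partition(" ")
        if first in stop_words:
            # Stop words at the start of the remaining string are dropped.
            text = rest
            continue
        matched = longest_match(text, index)
        if matched == 0:
            # Look-up failed on the first word: report it and move on.
            found.append((first, None))
            text = rest
        else:
            term = text[:matched]
            found.append((term, index[term]))
            text = text[matched:].lstrip()
    return found


result = look_up("UNITED STATES AND THE WORLD WAR 2 ECONOMY")
```

With the hypothetical index above, the first call to `longest_match` returns 13 (the length of UNITED STATES), and the search is split into the three terms UNITED STATES, WORLD WAR 2 and ECONOMY. Trying the longest candidate first is what lets a phrase take precedence over its first word when both are indexed.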
When both the weak stem and the strong stem look-up return failure - that is, the index contains no match even on the strong stem of the first word of the sought string - the approximate matching technique described in 6.4 comes into play, if the word is long enough. If the system can find what it thinks is a near match, this is presented as shown in Figs 7.8 and 7.9. If there is no candidate replacement, or if the user rejects what the system offers, the procedure is as for the control system.

If the weak-stem look-up fails, but the strong stem succeeds, the "CAN'T FIND" message takes the form

    CAN'T FIND 'manageability' - 3866 books under similar words

In the example the system will have found postings for stems arising from "manage", "manager", "management" etc. The user is not given the option of replacing the word (but can, of course, abort the search with the red key). Sometimes the strong stemming causes a user's misspelling to be matched, often to a misspelling in the source file or to a foreign word.

Fig 7.8 EXP system suggests a replacement

    Your search: 'introductory sociolgy'

    Looking up these words
      318 books under 'introductory'
      CAN'T FIND 'sociolgy' - closest match found is 'sociology'

    GREEN KEY to use 'sociology' instead
    BLUE KEY  to type a different word
    (RED KEY to abandon this search)

Fig 7.9 User accepts suggested replacement

    Your search: 'introductory sociology'

    Looking up these words
      318 books under 'introductory'
      3497 books under 'sociology'

    Looking for books described by your search - please wait..
    18 books match your search exactly

    GREEN KEY to look at the book(s) found
    BLUE KEY  to correct or change your search
    RED KEY   to do a different search

The "... included under ..." message is more frequent in the experimental system than in the control system: this message is displayed if two terms in the search belong to the same equivalence class (this is of course not known until the second term is looked up). An example is the search TERTIARY EMPLOYMENT IN UNITED KINGDOM (UK), giving

    ("uk" included under "united kingdom")

which is presumably what the user intended. (It is rather rare for users to think of including synonyms.)

7.5 Term combination - the merge

There is a description of the process itself in Chapter 6; here we show how it is presented to the user.

If the merge was always very quick (up to about three seconds, say), there would be no need to indicate anything further to the user except the number of books found. However, with multi-word searches containing some heavily posted terms, particularly on the experimental system where both strong and weak stems enter the merge, this process could take a minute or more.

The computer displays "Looking for books described by your search" (Fig 7.9). If the total number of postings to be examined is more than a few hundred, "please wait" is appended. This is followed by a "countdown", which is actually the number of disk accesses needed to read all the postings which may need to be considered. It is certainly not obvious to users what the decreasing number represents - at least one person thought it was a time in seconds - but people can see that something is happening, and none of those interviewed found it off-putting. It would be easy to display an estimate of the time, or to provide a range of messages, but we did not think that this was necessary.
At the end of the merge, the system knows how many postings there are in its output list, and also how many of these are of maximum possible weight (i.e. would have been retrieved by a boolean AND of all the terms), and how many are of "good" weight. There are about ten different results messages depending on the distribution of weights in the output list. Partly to avoid making the system sometimes look silly and partly to avoid cluttering the screen, the message is never more than two lines. Some of the possible combinations are illustrated in Figs 7.7, 7.9, 7.10 and 7.11.

Fig 7.10 A search for two common terms which do not co-occur

    Your search: 'psychology of plants'

    Looking up these words
      3034 books under 'psychology'
      1278 books under 'plants'

    Looking for books described by your search - please wait

    No books match your search

    BLUE KEY to correct or change your search
    RED KEY  to do a different search

Fig 7.11 Two "rare" terms which do not co-occur

    Your search: 'yachting and boating'

    Looking up these words
      5 books under 'yachting'
      5 books under 'boating'

    Looking for books described by your search

    10 books found but none match your search very well

    GREEN KEY to look at the book(s) found
    BLUE KEY  to correct or change your words
    RED KEY   to do a different search

The green (record display) option, the blue (correct or change your search) and the red (do a different search) are always in the same order and relative positions on the screen. In a production system this would be a design weakness, because there are some searches, such as HISTORY or SOCIOLOGY, which may be sensible (the user may want to see where in the library these general topics are shelved), but are more likely to be associated with misconceptions about the size of the collection or the way the catalogue works.
When the number of postings with "good" weight is greater than, say, fifty, it is appropriate to suggest quite prominently that the user might like to "make the search more specific" (while not preventing the display of records if that is what the user wants).

7.6 Record display

At first we tried providing both a brief one-line record and a "full" record. It was clear that single-line records are not generally appropriate for subject searching because it is not possible to display the subject headings, or even the complete title. Experience with users of previous Okapi systems suggested that subject headings, while not being very good sources of index terms, are felt to be useful in making relevance judgments. On the other hand, brief records do display slightly more quickly, and at least one user asked for them.

The most serious difficulty is that of the number of options which would have to be offered during record display. If brief display were the default, with full records selected by line number and browsing forwards and backwards provided at both levels, the red key could be used to return from full to brief display as in previous Okapi systems. However, there is little doubt that full records should be the default. It may, on balance, be worth providing an additional function key for switching between full and brief displays. We did not provide this because the systems described here also form the basis of an experimental "relevance feedback" (RF) system, in which there are more options and questions at the foot of the full display screen (LOOK FOR OTHER BOOKS LIKE THIS ONE; SEE BOOKS SHELVED NEAR THIS ONE; IS THIS THE SORT OF THING YOU ARE LOOKING FOR?).

Fig 7.12 shows the information displayed and the layout of the records. The display is almost identical to earlier Okapis. Records never occupy more than one screen. In the rare cases when there would be an overflow they are arbitrarily truncated.
Fig 7.12 Full record display

    SUBJECT SEARCH          FULL DISPLAY          Book 3 of 3
    'sociology of unemployment in the uk'

    AUTHOR(S)     (THOMPSON K)
    TITLE(S)      The unemployed
    PUBLICATION   Harrap, Kay 1964
    SUBJECT(S)    Great Britain. Industrial sociology. Employment.
                  Sociological perspectives

    Not in this branch
    No. of copies in other PCL libraries: hfld (1)
    Shelved at : 306.36 W R O

    RED KEY   to search again or to finish
    BLUE KEY  to see the PREVIOUS book again
    GREEN KEY to see the NEXT book

7.6.1 Highlighting of search terms in records

Where possible, words in records which match search terms are highlighted. This feature is almost universally liked, but it is not at all easy to implement economically and universally in a system where the records retrieved may not contain the actual words of the search.

By far the easiest way to implement highlighting is by storing with each bibliographic record all the terms under which it is indexed, in the form in which they appear in the index - i.e. words and phrases, with spelling normalised and subjected to both weak and strong stemming. These index terms would be stored with pointers to the words or phrases in the body of the record which they represent. The storage overhead might be around 50%.
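The stored-terms approach to highlighting can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not a description of any Okapi code: the `Record` class, the `highlight` function, the use of character offsets as the "pointers", the asterisk as a highlight marker and the stems SOCIOLOG and EMPLOY are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A displayed record plus the index terms stored with it.

    term_spans maps each index term (in its indexed form, e.g. a stem)
    to the (start, end) character offsets in `text` that it represents.
    """
    text: str
    term_spans: dict = field(default_factory=dict)


def highlight(record, query_terms, mark="*"):
    """Return record.text with every span belonging to a query term
    wrapped in `mark`. Terms not stored with the record are ignored."""
    # Collect the spans for all query terms, de-duplicated and in order.
    spans = sorted(
        {span for term in query_terms for span in record.term_spans.get(term, [])}
    )
    pieces, pos = [], 0
    for start, end in spans:
        if start < pos:
            # Skip a span that overlaps one already highlighted.
            continue
        pieces.append(record.text[pos:start])
        pieces.append(mark + record.text[start:end] + mark)
        pos = end
    pieces.append(record.text[pos:])
    return "".join(pieces)


# Hypothetical record: two stems, each pointing at one word in the text.
rec = Record(
    text="Industrial sociology. Employment.",
    term_spans={"SOCIOLOG": [(11, 20)], "EMPLOY": [(22, 32)]},
)
shown = highlight(rec, ["EMPLOY"])
```

Because the record carries its terms in indexed form, a query stem such as EMPLOY can mark "Employment" even though the user never typed that word. The cost is the extra storage per record noted above.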
(The stored index terms could also be used to compute measures of "similarity" between retrieved records and the query: the search WAR AND PEACE may be a better match with a record indexed only under WAR and PEACE than it is with one indexed additionally by AUSTRALIA 1914-1946.)

A slightly different approach to highlighting, avoiding the storage overhead, is to extract the index terms from a retrieved record before it is formatted for display, as if it were being indexed. (This is, in fact, done, in preparation for a relevance feedback system in which words extracted from relevant records would be considered for query expansion.) However, shortage of memory prevents online storage of the look-up table (go/see list) [...]

7.6.2 Options following record display

In Okapi '86 the current search is lost on exit from record display - except that the user is reminded of it on the next input screen (Fig 7.13). There is a case for the red key returning the user to a screen showing the results of the current search. This screen would contain options to start a new search, to alter the current search or redisplay the records (perhaps with a sub-option of returning to the record display at the point at which it was left). This is the approach provided in the SWALCAP LIBERTAS online catalogue. It is difficult to think of a suitably concise and expressive message for the red key prompt. We have no evidence as to the relative merits of the two approaches: the first has the virtue of simplicity, but the second provides more flexibility.
7.7 Second and subsequent input screens

When the user presses the red key from a record display, a modified input screen appears (Fig 7.13). Okapi '84 made it too easy for users to repeat their previous searches by always returning to an input screen with the previous search occupying the input field. In the new systems second and subsequent input screens have a blank input field but carry a brief reminder of up to three previous searches.

Fig 7.13 Subsequent input screen showing the results of previous searches

    Previous search(es)        No of books
    "domestic violence"        2
    "battered women"           22
    "women"                    more than 500

    Type a word or a phrase which describes the books you want:
    _

The "number of books" for previous searches is the total number retrieved, not the number which matched "exactly" or "quite well". It would be better to display something like Fig 7.14.

Fig 7.14 Improved subsequent input screen

    Previous search(es)        No of books found
                               good matches    total
    "domestic violence"        2               2
    "battered women"           1               22
    "women"                                    more than 500

Reference

1 MITEV N N, VENNER G M and WALKER S. Designing an online public access catalogue : Okapi, a catalogue on a local area network (Library and Information Research Report 39). London : the British Library, 1985.