An operational evaluation of weighting, ranking and relevance feedback via a front-end system

S.E. Robertson
C.L. Thompson

Department of Information Science
The City University
Northampton Square
London EC1V 0HB, U.K.

Final report to the British Library Research and Development Department on Project Number SI/G/703 (January 1985 - June 1987)

July 1987

Suggested keywords: information retrieval, weighted searching, ranked output, relevance feedback, front-end systems, Boolean searching, evaluation

Abstract

Cirt is a front-end system which allows Weighted searching (i.e. search term weighting, ranking of output documents by matching value, and modification of the weights by relevance feedback) on a Boolean host. In this project, Cirt was first modified to improve its usability, and then used for a comparative evaluation of Weighted versus Boolean searching in an operational environment. Searches were conducted by experienced intermediaries in the presence of the end-users, in three University of London institutions as well as at City University, mainly on the Medline or Inspec databases. Each user was randomly allocated to either Weighted or Boolean searching. Evaluation parameters included subjective reactions of the user and the intermediary and cost-related factors, as well as the more traditional relevance-related parameters. Because of various delays in the early stages of the project, because the independent-sample (as opposed to matched-pair) experimental design requires large samples, and because the differences between the systems are generally small, few of the results obtained were statistically significant. The implementation of Weighted searching in a front-end is a limiting factor, as it does not allow many terms to be used, nor any form of query expansion. Nevertheless, it appears that Weighted searching is a feasible way to do real searches, and that it gives results comparable to those obtained from Boolean searching.
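The kind of Weighted searching described in the abstract — scoring each document by the sum of its matching term weights, ranking by that score, and revising the weights as the user marks documents relevant — can be sketched as follows. This is an illustrative sketch only, not Cirt's actual implementation: the weighting formula shown is the well-known Robertson/Sparck Jones relevance weight, and the function and variable names (`rsj_weight`, `rank`, `docs`) are invented for this example.

```python
import math

def rsj_weight(r, n, R, N):
    # Robertson/Sparck Jones relevance weight for a single search term:
    #   r = known relevant documents containing the term
    #   n = documents in the collection containing the term
    #   R = total known relevant documents, N = collection size
    # The 0.5 terms are the usual smoothing that keeps the log finite.
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def rank(docs, weights):
    # Score each document (a set of index terms) by the sum of the
    # weights of the query terms it contains; rank by descending score.
    scored = sorted(((sum(weights.get(t, 0.0) for t in terms), doc_id)
                     for doc_id, terms in docs.items()), reverse=True)
    return [doc_id for _, doc_id in scored]
```

With no relevance information yet (r = R = 0) the weight reduces to an IDF-like value: for a term occurring in 50 of 1000 documents it is log(950.5/50.5), roughly 2.94. After the user judges some output relevant, r and R are updated and the weights recomputed — this recomputation is the relevance-feedback step.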
Preface

This report describes the second of two projects supported by the British Library concerning the front-end system Cirt. The first project was concerned with the development of Cirt; this second project has been aimed at evaluating weighted searching via Cirt under operational conditions.

The project would not have been possible without the help of a number of people. John Bovey, Mike Macaskill and Helen Mickleburgh were concerned at various stages with developing and maintaining the system. Alina Vickery and the staff at the University of London Central Library Services were very helpful in setting up the contacts with the other London University institutions. A number of such institutions, particularly the School of Hygiene and Tropical Medicine and Middlesex (and also the City University Skinner's Library), gave us the publicity which we needed for the search service offered at City as part of the project. A number of other institutions expressed interest in the project, but had to withdraw from taking part for a variety of practical reasons. Inspec provided some royalty-free searches on their databases.

Particular thanks are due to the intermediaries who undertook all the searching and liaison with users, and in this way contributed a very substantial amount of their valuable time to the project. Apart from Catherine Thompson, these were:

Alain Besson (St. Bartholomew's)
Sheila Dibley (Imperial College)
Elizabeth Lyon (St. George's)

Stephen Robertson
July 1987

Contents

Abstract
Preface
Contents

1. Introduction
1.1. The theory
1.2. Aims
1.3. Functional description of Cirt
1.4. Technical description of Cirt
1.5. Structure of the report

2. Refinements and modifications to Cirt
2.1. Initial work
2.2. Two-process system
2.3. Refining usability
2.3.1. Search tree
2.3.2. Limits
2.3.3. Deleting
2.3.4. Adding terms offline
2.3.5. Saving searches
2.3.6. Look mode
2.3.7. Printing offline
2.4. Discussion: Cirt and the relevance feedback model
2.4.1. Limits
2.4.2. Ignore
2.4.3. Print off-line
2.4.4. Delete and save

3. Methodology
3.1. Sample size
3.2. Variables
3.2.1. Retrieval effectiveness
3.2.2. User effort
3.2.3. Cost
3.2.4. Subjective user reactions
3.2.5. User characteristics
3.2.6. Request characteristics
3.2.7. Intermediary's contribution
3.2.8. Search process characteristics
3.3. Data collection instruments
3.3.1. Questionnaires
3.3.2. Evaluation of offline prints
3.3.3. The logs
3.4. The participants
3.5. Procedure for data collection
3.6. Discussion

4. Results
4.1. Results from the presearch form
4.2. Results from the post-search user questionnaire
4.2.1. Satisfaction
4.2.2. Assessment of the search
4.2.3. Search results
4.2.4. Searcher's contribution
4.2.5. Closeness to original enquiry
4.2.6. Number of references expected
4.2.7. References marked as relevant
4.2.8. References viewed online
4.3. Results from the post-search intermediary's questionnaire
4.3.1. Overall satisfaction
4.3.2. Assessment of the search process
4.3.3. Results of the search
4.3.4. Reason for finishing
4.4. Results from logs
4.4.1. PSS packets sent and received
4.4.2. Online time
4.4.3. Online citations
4.4.4. Offline citations
4.4.5. Terms used in the search, terms added or amended
4.5. Results from the relevance assessments
4.5.1. Total number of documents assessed
4.5.2. Total relevant retrieved
4.5.3. Precision
4.6. Summary of results by category of variable
4.6.1. Retrieval effectiveness
4.6.2. User effort
4.6.3. Cost
4.6.4. Subjective user reactions
4.6.5. User characteristics, request characteristics
4.6.6. Intermediary's contribution
4.6.7. Perceived online time

5. Problems
5.1. The York Box and the size problem
5.2. System crash
5.3. Staffing
5.4. Capturing searches
5.5. Proposal for extension

6. Conclusions and recommendations
6.1. Experimental methodology
6.2. Proposals for future research
6.3. Final remarks

References

Appendices
A1 Macaskill, M.J. Splitting Cirt into two processes
A2 Robertson, S.E. On sample sizes for non-matched-pair IR experiments
A3 Thompson, C.L. The Cirt manual
A4 Publicity for free searches
A5 Random allocator cards
A6 The questionnaires
A7 Sample logs
A8 Tables of results