REPORT ON A DESIGN STUDY FOR THE 'IDEAL' INFORMATION RETRIEVAL TEST COLLECTION

K. Sparck Jones
R. G. Bates

Computer Laboratory
University of Cambridge
Corn Exchange Street
Cambridge

October 1977

Summary

This Report presents the findings of a Design Study for the 'Ideal' Information Retrieval Test Collection.

Part A, Design, covers a detailed collection specification, an investigation of sources of document material and methods of obtaining requests and relevance assessments, and estimates of the costs of building various versions of the collection. The conclusion is that a collection consisting of a main set of 30,000 scientific documents and 750 requests with adequate relevance assessments, plus a supporting set of 3000 social science documents with 250 requests and assessments, providing a range of characterisations for the documents and requests but not citation data, could be provided in a two year building programme for about £85K. A similar collection with citation data for the documents could cost £94K. One with more supporting document sets and their own requests and assessments, but without citations, could cost £109K, and a collection with this range of sets and citations could cost £123K.

In Part B, Uses, information on possible uses of the 'ideal' collection collected as part of the Design Study work is summarised, to allow some evaluation of the proposed collection as a tool for worthwhile research and teaching.

The collection design and costings were considered by the Study Project's Advisory Panel, and in Part C, Discussion, the main comments made by the Panel are noted. Consequent changes to the initial design and costings are indicated, and a possible cut-price collection costing perhaps £55K is presented.
Professor Cleverdon has questioned the need for a specially-built test collection, and more specifically argues that sufficiently useful test material may be obtained as a byproduct of operational system investigations. We therefore examine the relationship between a purpose-built 'ideal' collection and an 'incidental' one, in the context of the types of information retrieval research which may be desirable; and conclude that unless very severe restrictions are placed on the kind of research done, an 'ideal' collection, if only a cut-price one, is needed for effective and efficient research.

Acknowledgement

The Design Study was carried out under British Library Research and Development Department Grant SI/G/231. We are very grateful to the many people who provided information for the Study.

Contents

PART A : DESIGN

   1. Objectives, scope and conduct of the Design Study
   2. Summary of the collection specification input to the Design Study
   3. Conduct of the Study
   4. Document sets
   5. Request sets
   6. Relevance judgement sets
   7. Document and request characterisations
   8. Citations
   9. Summary of data base selections
  10. Possible uses of the 'ideal' collection
  11. Costs

PART B : USES

   1. Research projects
   2. Teaching activities

PART C : DISCUSSION

   1. Comments on the Study
   2. Retrieval experiments

References

Appendices

   1. List of reference works consulted
   2. Sample entry from Williams and Rouse's Data Base Directory
   3. Example of data base questionnaire as sent out
   4. List of CA and CAB subbases
   5. Tabulated data base questionnaire replies
   6. CAB subbase sizes
   7. Analysis of relevance judgement requirements
   8. Research project questionnaire
   9. Teaching and on-line education questionnaires