Department of Computer Science
Cornell University
Ithaca, New York 14850

Scientific Report No. ISR-14

INFORMATION STORAGE AND RETRIEVAL

to

The National Science Foundation

Reports on Analysis, Search and Iterative Retrieval

Ithaca, New York
October 1968

Gerard Salton
Project Director

© Copyright 1968 by Cornell University

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Staff of the Department of Computer Science
Cornell University

Kenneth M. Brown
Robert L. Constable
Richard W. Conway
Barbara Evers
Sally Grove
Juris Hartmanis
John E. Hopcroft
Howard L. Morgan
Joann Newman
Rosalind Pasquali
Christopher Pottle
Gerard Salton
Alan C. Shaw
Stephen Stephenson
Roland A. Sweet
Robert A. Wagner
Robert J. Walker
Peter Wegner
Donna Williamson
Robert E. Williamson
William S. Worley

Project Staff in the Division of Engineering and Applied Physics
Harvard University

Jeffrey Bean
Jeffrey Golden
Michael Lesk
E. Ricardo Quinones


REPORTS ON ANALYSIS, SEARCH AND ITERATIVE RETRIEVAL

TABLE OF CONTENTS

                                                              Page
SUMMARY                                                       xiii

PART ONE
SYSTEM DESIGN

I.    LESK, M. E.
      "Design of a Revised On-Line Information Retrieval System"
      1. Introduction                                         I-1
      2. Supervisor Organization                              I-4
      3. System Procedures                                    I-11
          A) Request and Text Input                           I-13
          B) Request and Text Lookup                          I-14
          C) Automatic Thesaurus Processing                   I-27
          D) Phrase Processing                                I-30
          E) Hierarchical Processing                          I-34
          F) Concept Vector Formation and Storage             I-36
          G) Searching of Document Collections                I-39
          H) Clustering of Document Collections               I-43
          I) Relevance Feedback                               I-43
          J) Dictionary Displays                              I-44
          K) Citation Searching                               I-47
          L) Class Information                                I-49
          M) Selective Information Dissemination              I-49
          N) User Information Files                           I-50
      4. Equipment                                            I-50
      5. Summary                                              I-52
      References                                              I-54
      Appendix                                                I-56

II.   WILLIAMSON, D.
      "The Cornell Implementation of the SMART System"
      Abstract                                                II-1
      1. Introduction                                         II-1
      2. Basic Cornell System Organization                    II-2
      3. The SMART System Routines                            II-12
          A) Control Routines                                 II-12
          B) Inner System Routines                            II-15
      References                                              II-20

PART TWO
ANALYSIS AND SEARCH

III.  LESK, M. E. AND SALTON, G.
      "Relevance Assessments and Retrieval System Evaluation"
      Abstract                                                III-1
      1. Introduction                                         III-1
      2. The Relevance Problem                                III-3
      3. The Experiment                                       III-8
      4. Experimental Results                                 III-14
      5. Judgment Consistency and Performance Measures        III-23
      6. Machine Search Effectiveness                         III-28
      References
      Appendix

IV.   COYAUD, M.
      "Resolution of Lexical Ambiguities in Ophthalmology"
      1. Introduction
      2. Procedures for Devising the Polysemy Rules
          A) The Rules Inspired by Corpus A
          B) Notes to the Polysemy Rules
          C) The Control of the Rules by Corpus B
      3. Conclusion
      References
      Annex I

V.    DATTOLA, R. T.
      "A Fast Algorithm for Automatic Classification"
      Abstract
      1. Introduction
      2. The N² Problem
      3. Doyle's Algorithm
      4. Satisfaction of Termination Condition
          A) Non-convergence of Doyle's Algorithm
          B) Termination of Modified Algorithm
      5. Implementation
      6. Experimental Results
          A) The Scoring Function
          B) Movement of Documents
          C) Initial Clusters
          D) Evaluation of Results
      7. Conclusion
      References

VI.   SALTON, G. AND WILLIAMSON, D. K.
      "A Comparison Between Manual and Automatic Indexing Methods"
      Abstract
      1. Introduction
      2. The Evaluation of Information Systems
      3. The Test Design
          A) The MEDLARS Evaluation Study
          B) Design of the SMART Test
      4. SMART-MEDLARS Comparison
      5. Comparison of SMART Analysis Methods
      6. Conclusions
      References
      Appendix A
      Appendix B

PART THREE
USER FEEDBACK PROCEDURES

VII.  SALTON, G.
      "Search and Retrieval Experiments in Real-Time Information Retrieval"
      Abstract                                                VII-1
      1. Introduction                                         VII-1
      2. Performance Characteristics of Information Systems   VII-3
      3. User Feedback Retrieval Methods                      VII-12
          A) General Methodology                              VII-12
          B) Positive Feedback                                VII-14
          C) Negative Feedback                                VII-19
      References                                              VII-29

VIII. IDE, E.
      "New Experiments in Relevance Feedback"
      Abstract                                                VIII-1
      1. The Relevance Feedback Procedure                     VIII-1
      2. The Experimental Environment                         VIII-4
      3. Earlier Results in the Same Environment              VIII-6
      4. Evaluation of Retrieval Performance                  VIII-7
          A) The "Feedback Effect" in Evaluation              VIII-7
          B) Performance Measures                             VIII-10
          C) Statistical Tests                                VIII-12
      5. Experimental Results                                 VIII-14
          A) Two Strategies Using Relevant Documents Only     VIII-15
          B) Varying the Amount of Feedback                   VIII-16
          C) Strategies Using Nonrelevant Documents           VIII-20
      6. Summary and Recommendations                          VIII-28
      References                                              VIII-30

IX.   LESK, M. E. AND SALTON, G.
      "Interactive Search and Retrieval Methods Using Automatic Information Displays"
      Abstract                                                IX-1
      1. Introduction                                         IX-1
      2. Fully-Automatic Retrieval                            IX-4
      3. User Interaction Through Pre-Search Methods          IX-8
      4. User Interaction Through Post-Search Methods         IX-13
      5. Evaluation Results and Discussion                    IX-16
          A) Recall-Precision Results                         IX-17
          B) Overall Evaluation                               IX-20
      6. Conclusion                                           IX-31
      References                                              IX-35

X.    DAVIS, M. C., LINSKY, M. D., AND ZELKOWITZ, M. V.
      "A Relevance Feedback System Employing a Dynamically Evolving Document Space"
      Abstract                                                X-1
      1. Introduction                                         X-1
      2. Proposed Study                                       X-5
      3. Experimental Results                                 X-9
      4. Results and Conclusions                              X-23
      References                                              X-25
      Appendix                                                X-26

XI.   BRAUEN, T. L., HOLT, R. C., AND WILCOX, T. R.
      "Document Indexing Based on Relevance Feedback"
      Abstract                                                XI-1
      1. Introduction                                         XI-1
      2. Method                                               XI-3
      3. The Experiment                                       XI-6
      4. Experimental Results                                 XI-8
      5. Discussion                                           XI-12
      References                                              XI-15
      Appendix                                                XI-16

XII.  BORODIN, A., KERR, L., AND LEWIS, F.
      "Query Splitting in Relevance Feedback Systems"
      Abstract                                                XII-1
      1. Introduction                                         XII-1
      2. The Query Splitting Algorithm                        XII-3
      3. Evaluation and Results                               XII-5
      4. Conclusions and Suggestions for Further Research     XII-14
      References                                              XII-15
      Appendix                                                XII-16
          1. Introduction                                     XII-16
          2. General Algorithm                                XII-16
          3. System Operation                                 XII-17

XIII. CRAWFORD, R. G., AND MELZER, H. Z.
      "The Use of Relevant Documents Instead of Queries in Relevance Feedback"
      Abstract                                                XIII-1
      1. Introduction                                         XIII-1
      2. Motivation and Assumptions                           XIII-3
      3. Implementation                                       XIII-6
      4. Results                                              XIII-8
          A) Feedback Using Only Relevant Documents           XIII-8
          B) Source Document Used as Original Query           XIII-18
      5. Conclusions                                          XIII-21
      References                                              XIII-26

PART FOUR
EDITING PROGRAMS

XIV.  BEAN, JEFFREY
      "Bean's Automatic Tape Manipulator - A Description, and Operating Instructions"
      1. General Description                                  XIV-1
      2. Operating Instructions                               XIV-2

XV.   QUINONES, RICARDO, E.
      "EDIT - An Editing Subroutine"
      1. General Description
      2. Requirements and Specifications for BATMAN
      3. EDIT Control Card Format
          A) Serialization
          B) Editing Commands
          C) Special Commands
      4. Editing Principles
          A) Temporary Cards ("!" and D Cards)
          B) Permanent Cards (P and X Cards)
          C) String Contents
      5. Miscellany and Tables