Department of Computer Science
Cornell University
Ithaca, New York 14850

Scientific Report No. ISR-16

INFORMATION STORAGE AND RETRIEVAL

to

The National Science Foundation

Ithaca, New York
September 1969

Gerard Salton
Project Director


Copyright 1969 by Cornell University

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.


SMART Project Staff

Robert Crawford
Barbara Evers
Marcia Kerchner
Michael Lesk (Harvard)
Harry Melzer
Rosalind Pasquali
Jacob Razon
Angie Rettig
Gerard Salton
Donna Williamson
Robert Williamson
Steven Worona


TABLE OF CONTENTS

SUMMARY

PART ONE
SMART SYSTEM DESIGN

I. WILLIAMSON, D., WILLIAMSON, R., AND LESK, M.
   "The Cornell Implementation of the SMART System"
   Abstract
   1. Introduction
   2. Basic System Organization
      A) Input of Printed Text
      B) Document Clustering for Search Purposes
      C) The Selection of Documents to be Searched
      D) The Searching of the Document Groups
      E) Search Evaluation
   3. Access to the SMART System
   4. Basic SMART System Flowchart
   References

II. MURRAY, D.
   "A Scatter Storage Scheme For Dictionary Lookups"
   1. Introduction
   2. Basic Scatter Storage
      A) Method
      B) Collisions
      C) Table Layout and Search Procedure
      D) Theoretical Expectations
   3. Virtual Scatter Storage
      A) Method
      B) Collision Problem
   4. Experiments with Algorithms for Generating Hash Addresses
      A) Dictionaries
      B) Hash Coding Algorithms
      C) Evaluation
   5. A Practical Lookup Scheme
      A) General Description
      B) Table Layout
      C) Search Considerations
      D) Performance
      E) Comparisons
   6. Extensions
      A) Larger Dictionaries
      B) Suffix Removal
   7. Conclusions
   References

III. JOINER, J. AND WERNER, L.
   "A New Evaluation Measure"
   Abstract
   1. Introduction
   2. Problems of Evaluation
   3. Criteria for a Good Evaluation Measure
   4. The Probability Measure
   5. Tests
   Bibliography

PART TWO
CONTENT ANALYSIS METHODS

IV. SALTON, G.
   "Automatic Processing of Foreign Language Documents"
   Abstract
   1. Introduction
   2. The SMART System
   3. The Evaluation of Language Analysis Methods
   4. Multi-lingual Thesaurus
   5. Foreign Language Retrieval Experiment
   6. Failure Analysis
   7. Conclusion
   References
   Appendix

V. WEISS, S. F.
   "Syntax in Text Analysis"
   Abstract
   1. Introduction
   2. Statistical Phrases
   3. Syntactic Phrases
   4. Cooccurrence
   5. Elimination of the Phrase List
   6. Analysis of Results
   7. Conclusion
   References

VI. WEISS, S. F.
   "Template Analysis and its Application to Natural Language Processing"
   Abstract
   1. The Basics of Template Analysis
      A) Introduction
      B) Types of templates
      C) Applicability of template analysis
   2. An Implementation of Natural Language Analysis by Template Analysis
      A) Keyword Analysis
      B) Implementation conventions
   3. An Implementation of Template Analysis
      A) Date phrases
      B) Journal phrases
      C) Author phrases
      D) Experiments and results
      E) Conclusion
   Bibliography
   Appendix A
   Appendix B

VII. FAITH, B. AND JENSEN, J.
   "The Combination of Thesaurus and Word Form Vectors"
   Abstract
   1. Introduction
   2. Procedure
   3. Results
   4. Further Studies
   References

VIII. McNEIL, J. W., AND WETHERELL, C. S.
   "Bibliographic Data as an Aid to Document Retrieval"
   Abstract
   1. Introduction
   2. The Experiment
   3. The Statistical Measure
   4. The Results
   5. Conclusions
   References

PART THREE
USER FEEDBACK PROCEDURES

IX. BROWN, J. S., AND REILLY, P. D.
   "The Use of Statistical Significance in Relevance Feedback"
   Abstract
   1. Introduction
   2. Query Construction
   3. Conduct of the Experiment
   4. Experimental Results
   5. Conclusions and Recommendations
   References
   Appendix A

X. CIRILLO, C., CHANG, Y. K., AND RAZON, J.
   "Evaluation of Feedback Retrieval using Modified Freezing, Residual Collection and Test and Control Groups"
   Abstract
   1. Introduction
   Part A: Evaluation of Feedback Retrieval Using Modified Freezing
      1. Introduction
      2. Modified Freezing
      3. Evaluation Results
      4. Discussion
   Part B: Evaluation of Feedback Retrieval Using Residual Collection Feedback
      1. Statement of the Problem
      2. Summary of Methods
      3. Results and Conclusions
   Part C: Evaluation of Feedback Retrieval Using Test and Control Groups
      1. Introduction
      2. Process Description
      3. Experimental Results and Evaluation
      4. Conclusions
   References

XI. IDE, E., AND SALTON, G.
   "Interactive Search Strategies and Dynamic File Organization in Information Retrieval"
   Abstract
   1. Retrieval System Performance
   2. Request Space Modifications
      A) Relevance Feedback
      B) Positive and Negative Strategies
      C) Selective Negative Feedback
   3. Document Clustering
   4. Document Space Modification
   5. Conclusion
   References

XII. LEVENTHAL, T., AND MILLER, R.
   "Query Splitting Using Relevant Documents Instead of Queries in Relevance Feedback"
   Abstract
   1. Introduction
   2. Motivations and Assumptions
   3. Implementation
   4. Evaluation and Results
   5. Conclusions
   References

PART FOUR
CLUSTERING METHODS

XIII. DATTOLA, R.
   "Experiments with a Fast Algorithm for Automatic Classification"
   Abstract
   1. Introduction
   2. General Description
   3. Implementation
      A) Initial Clusters
      B) Overlap
      C) Algorithm
   4. Evaluation
      A) Evaluation Measures
      B) Internal Evaluation
      C) Initial Clusters
      D) Number of Clusters
      E) Overlap
      F) Cutoff
      G) Percent Loose Clustered
      H) External Evaluation
   5. Conclusion
   References

XIV. RIEBER, S., AND MARATHE, V. P.
   "The Single Pass Clustering Method"
   Abstract
   1. Introduction
   2. The Program
   3. Investigation and Results
      A) Correlation Comparison
      B) Disjoint-Overlapping Comparison
      C) Variation of Document Order
      D) SMART Evaluation
   4. Conclusions
   References
   Appendix 1
   Appendix 2

XV. WORONA, S.
   "Query Clustering in a Large Document Space"
   Abstract
   1. Introduction
   2. Generating Clusters
   3. Searching Clustered Collections
   4. Parameters for Evaluating Cluster Searches
   5. The Experiment
   6. Results
   References
   Appendix A
   Appendix B