Chapter 4. RESULTS

4.1 Problem Statements - general characteristics

We collected problem statements from 27 interviewees, which ranged from 490 to 66 tokens in length. Our initial analysis of these statements was to remove non-significant words from the text of the interviews. As a measure of the 'content density' we computed the ratio of tokens before and after this procedure for each interview, and as a measure of 'redundancy', the type-token ratio after removing non-significant words. These data are displayed in Table 1, from which one can note that the problem statements are all quite similar in these characteristics, regardless of length.

Using the results of the text analysis program, Table 2 indicates the maximum and minimum (of the top 40 associates) association strengths for each oral problem statement, and the number of types (after removing non-significant words). Maximum association strengths ranged from 1481 to 166, minimum from 132 to 25, and number of types from 99 to 21.

We were concerned to see if there were correlations between association strengths and text characteristics, and if there were differences among the oral problem statements, written problem statements and abstracts. For the oral problem statements, the values of r (the product-moment correlation) calculated were:

1. Highest association strength vs. number of types    r = 0.5676
2. Range of association strengths vs. number of types  r = 0.5547

Both these values are significant at the 1% level (two-tailed t test).

Interview No.    Tokens Pre/Post    Types/Tokens (Post)
 1               2.49               0.48
 2               2.78               0.67
 3               3.09               0.78
 4               3.06               0.66
 5               2.40               0.78
 6               2.97               0.74
 7               2.00               0.69
 8               2.27               0.89
 9               3.76               0.71
10               2.89               0.67
11               2.13               0.45
12               3.06               0.62
13               2.68               0.81
14               2.69               0.53
15               2.30               0.61
16               2.63               0.73
17               2.59               0.37
18               2.57               0.60
19               2.20               0.70
20               2.47               0.64
21               2.27               0.70
22               2.76               0.76
23               3.78               0.72
24               2.78               0.66
25               2.87               0.77
26               2.39               0.69
27               2.98               0.82
Mean             2.7                0.69
Variance (S²)    0.3                0.01

Pre  - Before Identification of Significant Words
Post - After Identification of Significant Words

Table 1. Token-token and type-token ratios for oral problem statements (from Brooks, 1978).

Interview No.    Highest Association    Lowest Association    Range of     No. of
                 Strength               Strength              Strengths    Types
 1                879                   132                    747         95
 2                316                    50                    266         68
 3                198                    33                    165         61
 4                257                    50                    207         47
 5                258                    33                    225         55
 6                430                    42                    372         56
 7                631                    33                    598         31
 8                273                    33                    240         33
 9                497                    58                    439         60
10                297                    50                    247         37
11                464                    26                    438         64
12                924                    75                    849         49
13                273                    58                    215         43
14               1481                    52                   1349         87
15               1114                    83                   1031         84
16                183                    25                    158         37
17                166                    25                    141         30
18                489                    48                    431         80
19                249                    25                    224         21
20                390                   100                    290         99
21                297                    56                    231         32
22                265                    33                    232         34
23                264                    33                    231         26
24                447                    50                    397         53
25                297                    58                    239         46
26                290                    50                    240         42
27                264                    66                    198         46

Table 2. Association strengths and number of types for oral problem statements (from Brooks, 1978).

In order to see if there were consistent differences between general text characteristics of oral and written problem statements, we performed the same analysis on eight written problem statements. These results are displayed in Tables 3 and 4, which show a mean 'content density' of 2.05, as compared to 2.7 for oral problem statements. The values for r, the product-moment correlation, for the written problem statements were:

1. Highest association strength vs. number of types    r = 0.377
2. Range of association strengths vs. number of types  r = 0.384

Neither of these values is significant, even when recomputed after having removed number 35 from the data because its maximum association strength is so much higher. Thus, there appears to be no relation between this text parameter and association strength values.

Script No.       Tokens Pre/Post    Types/Tokens (Post)
30               2.08               0.7
31               2.29               0.85
32               1.84               0.68
33               2.00               0.75
34               1.86               0.71
35               1.55               0.71
36               2.09               0.76
37               2.68               0.68
Mean             2.05               0.73
Variance (S²)    0.113              0.01

Table 3. Token-token and type-token ratios for written problem statements (from Brooks, 1978).

Script No.    Highest Association    Lowest Association    Range of Association    No. of
              Strength               Strength              Strengths               Types
30             198                   33                     165                    42
31             141                   33                     108                    35
32             183                   33                     150                    35
33             100                   25                      75                    24
34             657                   33                     624                    42
35            1242                   33                    1209                    39
36             274                   33                     241                    34
37             258                   25                     233                    15

Table 4. Association strengths and number of types for written problem statements (from Brooks, 1978).

Finally, in order to indicate the subject spread of the problem statements, we classed them into the five broad categories indicated in Table 5. The social sciences are well represented in our sample, as is medicine, with perhaps some under-representation of technology and the natural sciences. Nevertheless, the spread, given sample size, is reasonably broad.

4.2 Problem Statements - evaluation *

The point of the surveys of users and authors was to see whether the analyses of the problem statements and abstracts were in general accord with the originators' own perceptions of their information needs or of the ideas they were attempting to communicate; and, if there were disparities, then to see if there might be suggestions for improvement. Of course, for retrieval purposes it may not be necessary for the representations to be congruent with the originators' ideas about them, but as a first method of evaluation the technique seemed reasonable. If the subjects were unanimous in their disapproval of the representations, then we could be fairly sure that we should try something else.

We wished to determine in evaluating the problem statement representation:

1. how accurately, in the interviewee's opinion, the two formats described her/his ASK at the time of the interview; and
2. how the two formats compared with one another.

Response to the survey was good, with 63% of the group (N = 27) returning completed questionnaires.
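As a quick arithmetic check on the response rate quoted above, the sketch below converts the reported percentage into an approximate count of returned questionnaires. The function names are hypothetical, and rounding to the nearest whole respondent is an assumption; the text gives only the percentage and the group size, not the count itself.

```python
def implied_respondents(rate: float, group_size: int) -> int:
    """Return the whole number of respondents closest to rate * group_size."""
    return round(rate * group_size)


def percent(count: int, group_size: int) -> int:
    """Return count / group_size as a whole-number percentage."""
    return round(100 * count / group_size)


# 0.63 and 27 are the figures quoted in the text; the count is inferred, not reported.
returned = implied_respondents(0.63, 27)
print(returned, percent(returned, 27))
```

Under this rounding assumption, the inferred count is the only whole number of respondents out of 27 whose percentage rounds back to the quoted 63%.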
Table 6 is a summary of replies to the Association Map questionnaire, Table 7 to the Association Clusters questionnaire, and Table 8 to the comparative questionnaire. From these tables it is evident that the analysis, presented in the Association Map format, provided a generally adequate representation of the information needs of the interviewees. The major criticism of the analysis is that some concepts were too weakly associated, and this seems

* This section, and section 4.4, are based on Brooks, Oddy and Belkin (1979).

Subject area                                  Interview numbers          Total
Psychology/Education/Sociology/Linguistics    1 2 6 8 12 13 22 24        8
Medicine                                      3 4 5 19 20 21 23 25       8
Agriculture                                   9 10 11                    3
Information Science                           14 15 16 18                4
Biology/Chemistry/Biochemistry                7 17 26 27                 4

Table 5. Subject areas of interviewees by interview number (from Brooks, 1978).

[Tables 6-8 (questionnaire response summaries) are illegible in the source. Legible fragments of the comparative questionnaire: "1. Any preference?" with the figures 81.8 and 18.2; "2. Prefer Association Map" and "3. Prefer Association Cluster" with the figures 73.0 and 26.6 (N = 11); "4. If no preference, were both unsuccessful?" with the figure 75.0.]