
APPENDICES

A1  Macaskill, M.J.   Splitting Cirt into two processes
A2  Robertson, S.E.   On sample sizes for non-matched-pair IR experiments
A3                    Sample logs
A4                    Publicity for free searches
A5  Thompson, C.L.    The Cirt manual
A6                    Random allocator cards
A7                    The questionnaires
A8                    Tables of results

APPENDIX A1-1

Splitting cirt into two processes

M. J. Macaskill

The original idea in project B [1,2] was to split cirt into two processes as follows:

[Diagram: Parent (front end, searching algorithm) <-> Universal Search Protocol <-> Child n (X29 & DB dependent code) <-> search stmts, docs, etc. <-> Host]

The idea was that this work should be done during the data gathering stage of project A [1]. It was not possible, however, to add any more features to cirt without running out of core space. So we were forced to do something in the way of splitting cirt as part of project A in order to make cirt viable as an experimental tool.

The split is not as sophisticated as that outlined in [2], in that it does not attempt to handle more than one host. Nevertheless, it uses the same fundamental method as that outlined in [2]: the parent calls pipe twice to set up a two-way communication channel. It then calls fork and, if fork returns zero, it spawns the child process using execl, passing the file descriptors (one read and one write) as arguments.

The flow of control in the child is as follows:

1. Get next command from parent.
2. Hold dialogue with host.
3. Send reply to parent.
4. Go to 1.

The main program for the parent process is in cirt.c; the main program for the child process is in subpro.c. The parent and child processes both access general functions held in the program library, /lib/libgs.a. The files are distributed between the two processes as follows:

May 30, 1986


Parent:  main + functions:  cirt.c
         functions:         print.c  search.c  /lib/libgs.a

Child:   main + functions:  subpro.c
         functions:         x29.c  lex.yy.c  /lib/libgs.a

The contents of files cirt.c, print.c, search.c, x29.c and lex.yy.c are described in [3].
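The two-pipe set-up and the child's command loop described above can be sketched as follows. This is an illustration only, written in Python rather than the C of cirt itself. It assumes a POSIX system, runs the child loop directly in the forked process instead of calling execl with the descriptors as arguments, and replaces the host dialogue with a stand-in that upper-cases the command. The names child_loop, spawn_child, ask and demo, and the command text "send:heart:", are hypothetical.

```python
import os


def child_loop(cmd_fd, reply_fd):
    """Child: read a command, 'talk to the host' (stubbed), reply, repeat."""
    with os.fdopen(cmd_fd, "r") as commands, os.fdopen(reply_fd, "w") as replies:
        for line in commands:                    # 1. get next command from parent
            answer = line.strip().upper()        # 2. stand-in for the host dialogue
            replies.write("1:" + answer + "\n")  # 3. send reply to parent
            replies.flush()                      # 4. go round again


def spawn_child():
    """Parent: two pipes give a two-way channel; fork returns 0 in the child."""
    to_child_r, to_child_w = os.pipe()
    from_child_r, from_child_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(to_child_w)      # the child keeps one read and one write end
            os.close(from_child_r)
            child_loop(to_child_r, from_child_w)
        finally:
            os._exit(0)               # never fall back into the parent's code
    os.close(to_child_r)
    os.close(from_child_w)
    return pid, os.fdopen(to_child_w, "w"), os.fdopen(from_child_r, "r")


def ask(to_child, from_child, command):
    """Send one command down the pipe and wait for the one-line reply."""
    to_child.write(command + "\n")
    to_child.flush()
    return from_child.readline().strip()


def demo():
    pid, to_child, from_child = spawn_child()
    reply = ask(to_child, from_child, "send:heart:")
    to_child.close()                  # end-of-file ends the child's loop
    from_child.close()
    os.waitpid(pid, 0)
    return reply
```

Closing the unused pipe ends in each process matters: the child only sees end-of-file on its command pipe once every write end has been closed.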

The parent sends commands to the child via the UNIX* pipe, and the child then either sends a message to the network or reads from the network via yylex. In talkthro mode, where cirt allows the searcher to talk directly to the host (Data-Star), the parent reads from the pipe while the child directly controls terminal I/O and network I/O.

As an example, consider the case of a single term search. The parent will send

<send-msg-command>:<search term>:

to the child. When the child receives this request, it sends a '1' back to the parent acknowledging this. The child then sends the search term to the host. To get the search result, the parent then sends

<call-yylex-command>:

to the child. The child then calls yylex to read the result from the network. If the search was valid, then the parent will be sent

1:<yylex-return-value>:<doc set no.>:<search result>:

If there was a yylex error, then the parent will be sent

1:<yylex-return-value>:<possible other information>:

There are three possibilities for the other information, depending on the yylex return value:

yylex return value    other information returned
zero                  call status (1-up; 0-down)
5                     Data-Star error code
other value           nothing

If the yylex return value is zero (a network event of some sort) then the appropriate action is taken and the command is aborted. If the return value is 5, then an error message is displayed and the result is taken as zero.

If a program error has occurred, then the parent is sent a '0' indicating a serious error; the parent will then kill the child and exit itself.

*UNIX is a Trademark of Bell Laboratories.
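As an illustration of the reply formats just described, the parent's side of the exchange might decode a reply along the following lines. This is a sketch in Python, not the code in cirt.c. It assumes that a valid search result is recognised by having exactly two fields after the yylex return value, and that search results contain no ':' characters; neither point is stated in the text. The function name parse_reply and the dictionary keys are hypothetical.

```python
def parse_reply(reply):
    """Split one colon-terminated reply from the child into a small dict."""
    fields = reply.strip().split(":")
    if fields[-1] == "":
        fields.pop()                       # nothing follows the final ':'
    if not fields or fields[0] == "0":
        return {"kind": "program-error"}   # parent should kill the child and exit
    if len(fields) == 1:
        return {"kind": "ack"}             # bare '1' acknowledging a command
    yylex_value = int(fields[1])
    extra = fields[2:]
    if len(extra) == 2:                    # assumed shape of a valid search result
        return {"kind": "result", "doc_set": extra[0], "search_result": extra[1]}
    if yylex_value == 0:                   # network event: call status 1-up, 0-down
        return {"kind": "network-event", "call_up": extra[:1] == ["1"]}
    if yylex_value == 5:                   # Data-Star error code follows
        return {"kind": "host-error", "code": extra[0] if extra else None}
    return {"kind": "yylex-error", "value": yylex_value}
```

For example, under these assumptions parse_reply("1:5:17:") is classed as a host error carrying the code "17", and a bare "0" is classed as a program error.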

[1] Robertson, S.E. (1984) Project Proposal. City front-end project.
[2] Robertson, S.E. and Bovey, J. (1983) A front-end for IR experiments. BLRDD Report Number 5807.
[3] Macaskill, M.J. (1986) An Overview of Cirt (doc in dept).


APPENDIX A2-1

On sample sizes for non-matched-pair IR experiments

S.E. Robertson
Department of Information Science, The City University, Northampton Square, London EC1V 0HB, U.K.

The use in IR experiments of formal statistical methods such as significance tests has been relatively unusual. This gap has to do in part with the difficulty of establishing the validity of particular tests or even of defining a suitable framework for such tests (IR experimental data is notoriously difficult to pin down in any neat statistical model). Some of the difficulties are discussed elsewhere.

One consequence is that the use of experimental design ideas from statistics, even to the extent of attempts to determine required sample size, has been almost non-existent. A notable exception was in a proposal to the British Library in 1979 by Jamieson and Oddy [2]. Here an argument was given to suggest a particular minimum sample size for an operational comparison of two systems, using independent samples of requests on the two systems. The object of this paper is to place the Jamieson/Oddy argument in the context of a more general discussion of experimental design for IR, drawn in part from Robertson, and to review and extend the argument. Ideas are also borrowed from Sparck Jones and Bates [3], and Gilbert and Sparck Jones [4], who address particularly the matched-pair experimental design (rather than independent samples).

1  STATISTICS IN IR TESTS

1.1  Experimental design: system comparison

An experiment is normally undertaken to answer, or to provide evidence which will help to answer, a research question or questions. Formal methods of experimental design are used to establish experimental techniques and procedures which have the best chance of answering these questions. Such formal methods appeal to logical or statistical arguments.

The research questions in an IR experiment will frequently have to do with comparing two or more variant systems, or with diagnostic approaches to less-than-perfect performance on the part of one system. In this paper, I concentrate on the former (comparison) type of question. A comparison will normally involve the measurement (observation) of some variables on a number of instances of use of the systems, and the comparison of averages or distributions of observations on the different systems. The variables may be traditional performance measures such as recall or precision, cost factors such as time taken, user satisfaction, or any other variable of interest.

An IR experiment will normally involve a system or systems (that is, computer programs, human procedures, methods, rules etc.), documents and requests. Both the set of documents used and the set of requests put to the systems must be regarded as some sort of sample of all possible documents/requests. Although it is seldom the case that the sample can be assumed to be a genuine random sample, nevertheless any inference from the results of the experiment must be based on the assumption that the documents and requests used in the test do represent those that may come to the system later.

For documents, this may not be so much of a problem: there is not usually any shortage of documents to choose from. For requests, on the other hand, it is a major problem: requests (in the sense of Anomalous States of Knowledge or ASKs, perceived and acted upon) exist only for a short time, and must be caught then. Any observation that depends on the state of knowledge of the requester, such as relevance judgements or indications of satisfaction or otherwise, must also be trapped at the same time. Requests are extremely variable in their characteristics; any comparison between two systems must seek to eliminate such variations from the comparison. That is, one must try to ensure that any inferred difference between the two systems is actually a systematic difference and not some accidental effect of the requests used.

For these reasons, the most usual way of using requests in an IR experiment, from Cranfield on, has been based on a "matched-pair" design. That is, the same set of requests has been put to both or all the systems being compared. This should clearly prevent some of the grosser inter-request differences from affecting the comparison, although it will not eliminate interaction effects (a particular system being better for particular types of request). (This kind of argument is taken further in the Latin-square experimental designs of Keen.)

The matched-pair design is based on the assumption that the same request can be put to the two or more systems, and equivalent measurements or observations can then be made. In a way this assumption dates from an earlier era of information retrieval: the time of black-box batch computer retrieval systems. However, as befits the on-line age, we now see IR processes as essentially iterative and interactive, and the "request" or "need" or ASK as something fluid and difficult to get hold of. In particular the user, having once made a search on a particular problem at a particular time, and having responded to the results of that search, no longer has the same ASK as before, and the search cannot therefore be repeated on another system.

This problem was confronted by Jamieson and Oddy [2], and they therefore proposed, for a comparison of two highly interactive systems, to obtain two independent samples of requests. The matched-pair design has become so much the norm that few other IR researchers have taken up the independent-sample method; nevertheless, as we get to grips with the problems of interaction, it seems likely that more use will have to be made of this method. It was adopted by the author for an experiment currently under way.

1.2  Tests of significance

An obvious application of statistical ideas to IR experiments is to apply significance tests to the results. In the case of a comparative evaluation of two or more systems, such a test would normally address the question of whether any observed difference between the systems was real, or alternatively could be attributed to chance effects due to the samples involved.

Some such tests have been used in IR experiments, though by no means universally. The earliest such tests were based on treating the document collection as a sample, ignoring the fact that the request set should also be seen in that light. Since (as we have seen) document collections are usually much larger than request sets, such tests would attribute significance where it was not always justified.

The problem of dealing with both samples simultaneously has seldom been tackled in IR, and looks pretty intractable. Given the point about relative sizes, it therefore seems more profitable to concentrate on request sets. Some tests based on requests have been tried; they come into two categories:

(a) Tests based on the matched-pair design of the experiment, whereby observations on all or both systems on each query are compared, and the test applied to this comparative information (one datum for each request);

(b) Tests based on examining the distribution of observations over all queries for each system, and comparing the distributions for the different systems.

The former includes for example the sign test and Wilcoxon's signed ranks test, and is clearly only applicable in the case of a matched-pair design.

One problem which needs to be addressed when deciding on a statistical significance test is what (if any) assumptions can be made about the shapes of the distributions. Many tests depend on strong assumptions about these shapes. Unfortunately, IR data is notoriously difficult to pin down in this respect: if one looks, for example, at recall or precision values, one finds the whole range (0-100%) covered, with points of concentration at those values which happen to be low-denominator fractions (half, two-thirds etc.). Of course the actual distribution will depend on which particular variable is being measured as well as the circumstances of measurement; but the situation does not look good for such strong assumptions. We are therefore led towards non-parametric tests.

Given this point, and given also that we are looking at the independent sample rather than the matched-pair design, the choice of test is fairly limited. Jamieson and Oddy [2] suggest the Mann-Whitney U test, a suggestion which is followed in the present paper.

The matched-pair design, and its consequences for choice of significance test and sample size calculation, were analysed extensively in the context of the 'ideal' test collection (Sparck Jones and Bates [3]; Gilbert and Sparck Jones [4]). Several of the ideas discussed there re-appear below.

1.3  Experimental design: sample size

A common application of experimental design ideas outside IR is in the determination of the sample size that might be required in order to answer a particular research question. Sample sizes are difficult to determine in advance, chiefly because the calculation involves guessing what the results will look like. In some kinds of experimental endeavour, it may be reasonable to suppose that such guesses can be made. Generally this is not the case in IR experiments.

The basis for the determination of sample size is as follows:

(a) Guess the shapes of the distributions of the variables involved;

(b) Guess the magnitude of the difference whose existence is to be established, or the precision of the estimate required;

(c) Choose a test of significance and a required level of significance;

(d) Calculate, given the above, the minimum sample size which can be expected to give the right result.

Thus, for example, suppose we have a theory that rose bushes reared on XGRO should produce 10% more flowers than those grown on ordinary fertilisers. We conduct an experiment with some bushes given XGRO and some not, with a view to establishing (if our theory is correct) at least that there is a significant difference between the experimental and control groups. For (a), we need to know or guess something about the usual distribution of numbers of flowers per bush; for (b) our theory has given us the figure of 10%; we choose a test and a significance level for (c), and the calculation can then, in principle, be made.

Some of the difficulties of applying these ideas to IR will already be apparent: the distribution shape has been identified as a problem, and theories in IR are rarely so explicit as to predict a particular quantitative improvement. For (b), we may design an experiment which will detect a difference between two systems if it exceeds a certain amount; in other words, we specify a minimum difference we would like to be able to detect. That will be the approach taken here.

For (a), the situation may not be quite so bad as appears at first. Although we have rejected any significance test which is based on assumptions about distribution shapes, it may nevertheless be appropriate to use particular shapes in a sample-size calculation. This would at least give us some first approximation to an appropriate sample size, and a number of calculations based on different shapes would indicate to what extent the calculation was dependent on the distribution. In other words, over-simplified assumptions about distributions are not a suitable basis for a choice of significance test, but may be suitable for a sample-size calculation.

The next section develops various sample-size calculations for the independent-samples design of IR experiment, based on different distributional assumptions.

2  SAMPLE SIZES FOR INDEPENDENT SAMPLES

2.1  Variables and distributions

As indicated above, a comparison between two or more systems may involve measurement or observation of a number of variables. For example, these may include traditional performance measures such as recall and precision, unnormalised performance data such as number of relevant documents retrieved, cost of searching or of various aspects of the searching process (such as on-line costs), other cost-related factors such as time taken (human or machine), and more subjective factors such as user satisfaction.

Most of these variables have a range of values which is restricted in some logical fashion, which will affect the choice of distribution. Thus recall and precision are restricted to the range 0-100%; cost factors are normally constrained to be positive. In both cases, the fact that observed distributions tend to push the limits of the range means the exclusion of the statistician's favourite distribution, the Normal. A calculation based on the Normal distribution is included below, but is considered to be the least realistic assumption for most IR data.

2.2  The Mann-Whitney U test and the sample size calculation

We assume, then, that two systems are being compared, and that the distribution of observations of the variable in question over the set of requests, for one system, is known. We assume further that the two distributions A, B (for the two systems) are of the same shape, and differ only in location. The density functions we consider will take the form $f_\mu(x)$ and $f_{\mu+\delta}(x)$, where $\mu$ is the mean for the lower of the two distributions, A, and $\mu+\delta$ the mean for the higher, B.

The U test involves ranking the observations of the two systems together, and counting the number of instances in which an observation from B precedes (is less than) an observation from A. For continuous distributions, the expected value of this count will be

$$U_e = N^2 \int_{-\infty}^{\infty} f_{\mu+\delta}(x) \int_{x}^{\infty} f_\mu(y)\,dy\,dx \qquad (1)$$

where N is the sample size (in these calculations, both samples are assumed to be the same size N). For a discrete distribution, where the density function f is replaced by a probability p, this would read

$$U_e = N^2 \sum_{x} p_{\mu+\delta}(x) \sum_{y>x} p_\mu(y)$$

However, in most of the cases discussed here, the distributions are assumed continuous. The limits of $\pm\infty$ are chosen to represent "all possible values" upwards or downwards respectively; as we have seen, the values may in fact be restricted.

For N>20, under the null hypothesis (no difference between the samples), U is approximately Normally distributed; in fact

$$z = \frac{U - N^2/2}{\sqrt{N^2(2N+1)/12}}$$

is approximately Normally distributed with mean zero and standard deviation one. Probability tables for the Normal distribution can then be consulted for significance levels for z. In most examples below, N is in the hundreds, and a further approximation is taken, namely $(2N+1) \approx 2N$, hence

$$z = \sqrt{6N}\left(\frac{U}{N^2} - \frac{1}{2}\right)$$

In the examples we will be looking for significance at the 5% level, i.e. for sample sizes which will give an expected value $z_e$ below -1.96 (having chosen A as the lower of the two distributions, $z_e$ will be negative). This value is based on a two-tailed test, i.e. assuming no prior implication as to which system is better.

From (1), $U_e/N^2$ is independent of N. Hence

$$z_e = \sqrt{6N}\left(\frac{U_e}{N^2} - \frac{1}{2}\right)$$

and the requirement for significance becomes

$$N > \frac{(1.96)^2}{6\left(U_e/N^2 - \frac{1}{2}\right)^2} \qquad (2)$$

It should be noted that the requirement that the expected value of z should be below -1.96 allows a substantial chance (50% in the extreme case) that the value actually observed is nevertheless above this limit and therefore not significant. In statistical terms, the test would be described as having 50% power. A more stringent requirement would be that the test should have 95% power, i.e. that there should be no more than a five per cent chance of the experiment yielding a non-significant result (this is the position taken in the "ideal" test collection studies).

A sample size calculation for a 95% power test in the present case would require an analysis of the distribution of the Mann-Whitney U statistic under assumptions other than the null hypothesis, which has not been attempted in this paper. (The 50% power calculation requires only the mean of this distribution, which is relatively easy to establish.) In relation to the results presented below, it should be stressed that a 95% power test would require considerably larger sample sizes.

Two further tests are used in the sample-size calculations below: a t test for Normal distributions and a chi-squared test for binary data.
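The counting step of the U test and the two forms of the z statistic used in this section can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, ties are simply not counted, and both samples are assumed to have the same size n, as in the text.

```python
import math


def u_count(sample_a, sample_b):
    """Number of (a, b) pairs in which the observation from B is less than that from A."""
    return sum(1 for a in sample_a for b in sample_b if b < a)


def z_score(u, n):
    """z under the null hypothesis: U has mean n^2/2 and variance n^2(2n+1)/12."""
    return (u - n * n / 2) / math.sqrt(n * n * (2 * n + 1) / 12)


def z_large_n(u, n):
    """The same statistic with the further approximation (2n+1) ~ 2n."""
    return math.sqrt(6 * n) * (u / (n * n) - 0.5)
```

With n in the hundreds the two forms differ only in the third decimal place, which is why the simpler one is used for the sample-size formula.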

2.3  Jamieson and Oddy's rectangles

Jamieson and Oddy considered the measure precision in their attempt to calculate sample size. As precision is restricted to a limited range, they assumed for the purpose of the calculation that the distributions were rectangular (i.e. uniform over the range r). Then the distributions will appear as in Figure 1. The absolute position of the pair of distributions is not important; we will assume for the calculation that A starts at 0. Then $f_\mu(x)$ is $1/r$ for $0<x<r$, and 0 elsewhere; $f_{\mu+\delta}(x)$ is $1/r$ for $\delta<x<r+\delta$, and 0 elsewhere.

[insert figure 1 about here]

Formula (1) gives

$$U_e = N^2 \int_{\delta}^{r} \frac{1}{r} \int_{x}^{r} \frac{1}{r}\,dy\,dx = \frac{1}{2} N^2 \left(1 - \frac{\delta}{r}\right)^2$$

(the end-point of the range of integration for the first integral is taken as r because in the range $(r, r+\delta)$ the second integral is zero). Then (2) gives

$$N > \frac{(1.96)^2}{6\left(\frac{\delta}{r} - \frac{\delta^2}{2r^2}\right)^2}$$

From this formula, we can calculate a table of minimum values for N given different $\delta/r$ (Table 1).

$\delta/r$    Minimum N
2.5%          1050
5%            269
10%           71
20%           20

Table 1. Minimum N values for two rectangle distributions (U test)
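The figures in Table 1 can be checked with a few lines of Python. The sketch below assumes the 50% power requirement of section 2.2, takes the square of 1.96 as 3.84 (the rounded value quoted later for chi-squared) and rounds to the nearest integer; these last two choices are assumptions that happen to reproduce the printed values, not something the text states. The function names are hypothetical.

```python
Z_SQUARED = 3.84  # 1.96 squared, rounded to two decimals (an assumption)


def min_n(p_b_less_a, z_squared=Z_SQUARED):
    """Minimum N from the significance requirement, given P(B < A) = U_e / N^2."""
    return z_squared / (6 * (0.5 - p_b_less_a) ** 2)


def rectangle_p(shift_over_range):
    """P(B < A) for two rectangles of range r offset by delta: (1 - delta/r)^2 / 2."""
    return 0.5 * (1 - shift_over_range) ** 2


table_1 = {d: round(min_n(rectangle_p(d))) for d in (0.025, 0.05, 0.10, 0.20)}
```

Here table_1 maps each value of delta/r to its minimum N.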

It is instructive to reconsider some aspects of the original Jamieson and Oddy version of this argument. Firstly, the sample sizes they present are less than those in Table 1. The reason for this is that they chose a one-tailed test of significance rather than a two-tailed test. They took the theory to predict that one of the systems would necessarily work better than the other; the only question was, was it significantly better? This shows a touching faith in IR theory, and actually a misconception of the theory under test (which theory was due to the present author!). At any rate, it is not the normal situation in IR experiments.

Secondly, the discussion by Jamieson and Oddy suggests that the difference $\delta$ between the distributions is normalised by the mean $\mu$ of the lower distribution. That is, they discuss a factor $\delta/\mu$ instead of what they actually calculate, namely $\delta/r$. Since they are considering precision, r is likely to be 1 (or 100%); $\mu$ might easily be 1/2 (50%). Thus their specification of the minimum difference they would like to be able to detect is actually expressed in units which are perhaps twice as big as they intended. The sample size required to detect, say, $\delta/\mu$ = 5% would be larger still than that required to detect $\delta/r$ = 5% as in Table 1.

Thirdly, the form of distribution assumed contradicts the assumption that precision values are likely to cover the whole 0-100% range, since the end-points of the two distributions do not coincide. It would be worth attempting a formulation that satisfied this assumption.

2.4  Trapezium distribution

In pursuit of a simple model for a variable like precision, a trapezium-shaped distribution on the range (0,1) suggests itself. A pair of trapezium distributions are represented in Figure 2. $f_\mu(x)$ is given by the straight line through the three points $(0, 1-3a)$, $(1/2, 1)$, and $(1, 1+3a)$ for some a such that $-1/3<a<1/3$. Then $f_\mu(x) = 1-3a+6ax$ and $\mu = \frac{1}{2}(1+a)$. $f_{\mu+\delta}(x)$ is given by the same formula with b replacing a; $\mu+\delta = \frac{1}{2}(1+b)$, and $\delta = \frac{1}{2}(b-a)$.

[insert figure 2 about here]

From (1),

$$U_e = N^2 \int_0^1 (1-3b+6bx) \int_x^1 (1-3a+6ay)\,dy\,dx = N^2\left(\frac{1}{2} - \delta\right)$$

and from (2)

$$N > \frac{(1.96)^2}{6\delta^2}$$

Table 2 gives minimum N for different values of $\delta$ from this formula. These $\delta$ values are not normalised, but the range is necessarily 1, so they can be taken as equivalent to the $\delta/r$ values in Table 1.

It may be seen that the figures are in fact very close to those derived from the two rectangle model.

$\delta$      Minimum N
2.5%          1024
5%            256
10%           64
20%           16

Table 2. Minimum N values for two trapezium distributions (U test)
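The trapezium case lends itself to a numerical check of formula (1). The Python sketch below evaluates the double integral by a simple midpoint rule, with the inner integral done analytically, and then applies the significance requirement of section 2.2. It assumes 3.84 for the square of 1.96; the function names and the step count are hypothetical choices made for illustration.

```python
def p_b_less_a(a, b, steps=2000):
    """Midpoint-rule estimate of P(B < A) for trapezium parameters a (for A) and b (for B)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        density_b = 1 - 3 * b + 6 * b * x                     # density of B at x
        tail_a = (1 - 3 * a) * (1 - x) + 3 * a * (1 - x * x)  # integral of A's density from x to 1
        total += density_b * tail_a * h
    return total


def min_n(p, z_squared=3.84):
    """Minimum N from the significance requirement, given P(B < A)."""
    return z_squared / (6 * (0.5 - p) ** 2)


# a = 0 and b = 0.1 correspond to delta = (b - a) / 2 = 0.05
p = p_b_less_a(0.0, 0.1)
```

For a = 0 and b = 0.1 the numerical value of p agrees with 1/2 - delta = 0.45, and min_n(p) rounds to the 256 shown for 5% in Table 2.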

2.5  Normal distributions

Turning now to the case of an unrestricted variable, an obvious model to try is a pair of Normal distributions of the same variance. Figure 3 shows two Normal distributions $N(\mu,\sigma)$ and $N(\mu+\delta,\sigma)$.

[insert figure 3 about here]

From (1) and the Normal distribution function, we have

$$U_e = N^2 \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu-\delta)^2/2\sigma^2} \int_x^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-(y-\mu)^2/2\sigma^2}\,dy\,dx$$

Solution of this equation is not simple. However a solution to an equation of which this is a special case is given in Robertson [8]. From that, we can infer that

$$U_e = \frac{N^2}{\sqrt{2\pi}} \int_{\delta/\sigma\sqrt{2}}^{\infty} e^{-t^2/2}\,dt$$

and from (2)

$$N > \frac{(1.96)^2}{6\left(\frac{1}{\sqrt{2\pi}} \int_0^{\delta/\sigma\sqrt{2}} e^{-t^2/2}\,dt\right)^2}$$

The integral is simply a probability from N(0,1) and can be looked up in suitable tables. Table 3 gives the minimum sample sizes for different values of $\delta/\sigma$. It should be noted that normalising $\delta$ by the standard deviation $\sigma$ is not equivalent to the methods adopted in the two previous examples; there is strictly no such equivalent. Therefore, making comparisons between Table 3 and the two previous tables is not really valid. A rule-of-thumb comparison can be devised by saying that 95% of the observations in a Normal distribution lie within ±2 standard deviations, so the range might perhaps be taken as $4\sigma$. On this basis, 5% $\delta/r$ in Table 1 would be equivalent to 20% $\delta/\sigma$. By this argument it would appear to be a little easier to distinguish Normal distributions than rectangle or trapezium.

$\delta/\sigma$    Minimum N
10%                822
20%                206
40%                53
80%                14

Table 3. Minimum N values for two Normal distributions (U test)
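For the Normal case the tabulated probability can be replaced by the error function from the Python standard library. The sketch below evaluates the sample-size requirement directly; it assumes 3.84 for the square of 1.96, and the function names are hypothetical. Exact evaluation gives figures slightly below those printed in Table 3 for the smaller differences (about 806 rather than 822 at 10%). One possible explanation, which is a guess and not stated in the text, is that the printed values came from table look-ups with the argument rounded to two decimal places.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of N(0,1), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def normal_min_n(delta_over_sigma, z_squared=3.84):
    """Minimum N for two Normal distributions whose means differ by delta (U test, 50% power)."""
    half_minus_p = normal_cdf(delta_over_sigma / math.sqrt(2)) - 0.5
    return z_squared / (6 * half_minus_p ** 2)


exact_values = {d: normal_min_n(d) for d in (0.1, 0.2, 0.4, 0.8)}
```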

2.6  Exponential distributions

As discussed above, the Normal distribution is not likely to represent many IR variables very closely. Those that are not constrained to a finite interval (like precision) tend to be constrained by a lower limit (e.g. zero, like any cost-related factor). Furthermore, their distributions tend to be very highly skew. A candidate distribution having these properties is the (negative) exponential. A pair of exponential distributions is shown in Figure 4. The exponential is described by a single parameter $\mu$ which is both mean and standard deviation.

[insert figure 4 about here]

From (1) and the exponential distribution function,

$$U_e = N^2 \int_0^{\infty} \frac{1}{\mu+\delta} e^{-x/(\mu+\delta)} \int_x^{\infty} \frac{1}{\mu} e^{-y/\mu}\,dy\,dx = N^2 \frac{\mu}{2\mu+\delta}$$

From (2),

$$N > \frac{2(1.96)^2 (2+\delta/\mu)^2}{3(\delta/\mu)^2}$$

whence we may calculate Table 4. The normalisation $\delta/\mu$ is comparable with the normalisation $\delta/\sigma$ for the Normal case. On the basis of the rather arbitrary comparison discussed in the previous section, exponential distributions are somewhat more difficult to distinguish than any of the other cases analysed so far.

$\delta/\mu$    Minimum N
10%             1129
20%             310
40%             92
80%             31

Table 4. Minimum N values for two exponential distributions (U test)
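Table 4 can be checked in the same way. For two exponential distributions with means mu and mu + delta, the probability that an observation from B falls below one from A is mu/(2 mu + delta), which depends only on the ratio delta/mu. The Python sketch below assumes 3.84 for the square of 1.96 and rounding to the nearest integer; the function names are hypothetical.

```python
def exponential_p(delta_over_mu):
    """P(B < A) for exponentials with means mu and mu + delta: mu / (2 mu + delta)."""
    return 1 / (2 + delta_over_mu)


def min_n(p, z_squared=3.84):
    """Minimum N from the significance requirement, given P(B < A)."""
    return z_squared / (6 * (0.5 - p) ** 2)


table_4 = {d: round(min_n(exponential_p(d))) for d in (0.1, 0.2, 0.4, 0.8)}
```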

2.7  Normal distribution with t-test

It is worth considering alternative tests as well as alternative distributions, to see whether the choice of test is likely to be critical. In particular, we should ask whether there is much to be gained by using a parametric test. One obvious candidate is the t-test for a difference between two means.

The expected value of the t statistic for the case of two Normal distributions (as in 2.5) is

$$t_e = \frac{\delta}{\sigma}\sqrt{\frac{N}{2}}$$

(for two samples both of size N). For large N, t is approximately Normally distributed with mean 0 and variance 1 under the null hypothesis. Therefore, for a 95% significance level with 50% power, we need $t_e>1.96$. This yields

$$N > \frac{2(1.96)^2}{(\delta/\sigma)^2}$$

from which we calculate the minimum sample size required (Table 5). (The formula requires a slight correction when N is not large, which has been taken into account in the last two values in the table.)

$\delta/\sigma$    Minimum N
10%                768
20%                192
40%                51
80%                14

Table 5. Minimum N values for two Normal distributions (t test)

These results are a little but not very much better than those already obtained for Normal distributions: it seems that we do not gain much by allowing the strong parametric assumptions of the t-test.
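A sketch of the large-sample t-test calculation, in Python, is given below. It assumes 3.84 for the square of 1.96 and applies no small-sample correction, so it reproduces the first two entries of Table 5 (768 and 192) but gives 48 and 12 where the table, after correction, shows 51 and 14. The function name is hypothetical.

```python
def t_test_min_n(delta_over_sigma, z_squared=3.84):
    """Large-sample minimum N for the t test: t_e = (delta/sigma) * sqrt(N/2) must exceed 1.96."""
    return 2 * z_squared / delta_over_sigma ** 2


uncorrected = {d: round(t_test_min_n(d)) for d in (0.1, 0.2, 0.4, 0.8)}
```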

2.8  Binary distributions with chi-squared test

All the distributions discussed so far have been continuous. Many variables involved in retrieval tests are (either essentially or because of a particular operational definition) discrete rather than continuous. An extreme version of a discrete distribution is a binary distribution (i.e. the variable has only two values). Comparing two such distributions suggests a chi-squared test. (Although the Mann-Whitney is capable of dealing with ties in the ranking, as is bound to occur with a discrete variable, the number of ties in the extreme case of a binary variable renders that test unsuitable.)

A binary distribution may be represented by a probability for one of the values, which would also be the mean if the value chosen is arbitrarily associated with unity, the other being zero. In other words the mean is $\mu=P(1)$, and $1-\mu=P(0)$. We assume two distributions, A with mean $\mu-\delta/2$ and B with mean $\mu+\delta/2$.

The results of the experiment will be represented by a 2x2 table:

        A            B
1       $S_A$        $S_B$        $S_A+S_B$
0       $N-S_A$      $N-S_B$      $2N-S_A-S_B$
        $N$          $N$          $2N$

The expected value of $S_A$ (the number of "successes" with A) is $N(\mu-\delta/2)$, of $S_B$ is $N(\mu+\delta/2)$. If we perform a chi-squared test on this data, the expected value under the null hypothesis for cell A1 will be $(S_A+S_B)/2$, with similar values for the other cells. We may therefore calculate an "expected" value for the chi-squared statistic. (The quotation marks are used because this is not a true expected value, as $\chi^2$ is a non-linear function of the cell values. Therefore the sample sizes reported here are approximate only.) The calculation gives

$$\chi^2_e = \frac{N\delta^2}{2\mu(1-\mu)}$$

This formula depends on $\mu$, but we can calculate a worst case: if $\mu=1/2$,

$$\chi^2_e = 2N\delta^2$$

otherwise it is larger. The 5% significance level of $\chi^2$ with one degree of freedom is 3.84; hence we need, in the worst case,

$$N > \frac{1.92}{\delta^2}$$

This gives the values in Table 6. As in Table 2, the $\delta$ values are not normalised, but the range is necessarily 1.

$\delta$      Minimum N
2.5%          3072
5%            768
10%           192
20%           48

Table 6. Minimum N values for two binary distributions ($\chi^2$ test)

It may be seen that it is very much more difficult to demonstrate a significant difference for this kind of variable.
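The "expected" chi-squared value can be computed cell by cell from the 2x2 table of expected counts, which gives a check on the closed form. The Python sketch below does this and then derives the minimum N; the function names are hypothetical, and the default of mu = 1/2 is the worst case discussed in the text.

```python
def expected_chi_squared(n, mu, delta):
    """Chi-squared computed from the expected cell counts of the 2x2 table."""
    s_a = n * (mu - delta / 2)           # expected successes with A
    s_b = n * (mu + delta / 2)           # expected successes with B
    observed = [[s_a, s_b], [n - s_a, n - s_b]]
    chi2 = 0.0
    for row in observed:
        expected = sum(row) / 2          # null hypothesis: same rate under A and B
        chi2 += sum((cell - expected) ** 2 / expected for cell in row)
    return chi2


def binary_min_n(delta, mu=0.5, critical=3.84):
    """Minimum N such that N * delta^2 / (2 mu (1 - mu)) reaches the critical value."""
    return critical * 2 * mu * (1 - mu) / delta ** 2


table_6 = {d: round(binary_min_n(d)) for d in (0.025, 0.05, 0.10, 0.20)}
```

With N = 768, mu = 1/2 and delta = 0.05 the cell-by-cell value is 3.84, exactly at the critical level, in agreement with the 5% row of Table 6.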

3  DISCUSSION

It should be remembered that the sample sizes given in the tables apply to each of the systems being compared, i.e. a comparison of two systems would require a total of twice the number of requests given in the tables.

The present study has been limited in a number of ways.

First, it has dealt with a small number of somewhat artificial distributions. Real-life distributions tend to be much messier, and in this respect the results are indicative only.

Second, it has considered mainly continuous distributions. Discrete distributions are likely to be more problematic (i.e. to require larger samples) with Mann-Whitney, because of the possibility of ties in the ranking; this point is confirmed by the analysis of section 2.8, where a discrete distribution was used, albeit with a different test.

Third, it provides only for tests of 50% power (i.e. for the given difference, a sample of the given size would have a 50% chance only of producing a significant result). Requiring higher power would involve larger samples again.

Thus, in general, the figures given here may be seen to underestimate the sample sizes required.

4  CONCLUSIONS

The major conclusion to be drawn from the present analysis must be the extreme difficulty of conducting adequate non-matched-pair tests in IR. On the one hand, differences in IR performance between alternative systems have often been small; on the other hand, few IR researchers have found it feasible to obtain and work with query set sizes greater than a hundred or so. Between these two facts and the numbers arrived at above, there seems little opportunity for establishing statistically significant differences in properly conducted operational experiments.

On a somewhat more positive note, this analysis seems to indicate the need for more experiments based on some compromise between reality and laboratory controls. While it is clearly desirable to go beyond the highly artificial laboratory experiments that have been common since Cranfield, attempting total realism is likely to introduce insuperable methodological problems. The difficulty lies, of course, in the selection of appropriate compromises for particular research aims.

REFERENCES

1. ROBERTSON, S.E. The methodology of information retrieval experiment. In: SPARCK JONES, K., ed. Information Retrieval Experiment. London: Butterworths, 1981, pp 9-31.

2. JAMIESON, S.H. and ODDY, R.N. Implementation and evaluation of interactive retrieval through an intelligent terminal. Project proposal to the British Library Research and Development Department, 1979.

3. SPARCK JONES, K. and BATES, G. Report on a design study for the 'ideal' test collection. Computer Laboratory, University of Cambridge, 1977 (BLR&D Report No. ).

4. GILBERT, H. and SPARCK JONES, K. Statistical bases of relevance assessment for the 'ideal' test collection. Computer Laboratory, University of Cambridge, 1979 (BLR&D Report No. 5481).

5. KEEN, E.M. and DIGGER, J.A. Report of an information science index languages test. Aberystwyth: College of Librarianship Wales, 1972.

6. ROBERTSON, S.E., THOMPSON, C.L., MACASKILL, M.J. and BOVEY, J.D. Weighting ranking and relevance feedback in a front-end system. Journal of Information Science, 12, 1986, 71-75.

7. SIEGEL, S. Nonparametric statistics for the behavioural sciences. New York: McGraw-Hill, 1956.

8. ROBERTSON, S.E. The parametric description of retrieval tests. Part II: overall measures. Journal of Documentation, 25, 1969, 93-107.

APPENDIX A3-1

The following is a log of a Weighted search, in the abbreviated form used for analysis (an unabbreviated log is given in the Cirt manual, Appendix A5). More specifically, this is the log of the Cirt/searcher dialogue; the corresponding Cirt/Data-Star dialogue is given below (A3-4 on). The searcher's contributions are underlined; brief explanations are given on the right.

Enter id for offline prints- 275 w
Enter query identifier- 275
Enter y or Y for boolean search; n,N or RETURN for weighted search-
-> aof food-formulated-ut composition        [2 terms added offline]
-> oq
rels t rels
1. 0 0 food-formulated-ut
2. 0 0 composition
-> li                                        [Automatic login]
No host name given - dstar assumed
call established
USERID :
DATA BASE NAME : meyy
Login successful
Any limits ?
-> add all
food-formulated-ut not found                 [1 of the 2 not found]
3.F30911 W4.7
-> add food-formulated                       [3 more attempts, 1 successful]
4.F1340 W7.8
-> add "food-formulated with ut
food-formulated with ut not found
-> add "food-formulated with ut"
food-formulated with ut not found
-> s                                         [weighted search]
Searching..Searched
Search tree
82 0 12.5 food-formulated composition        [best set has 82 docs]
-> l
set: 82 iplq?l                               [display 1st title]
1?? q                                        [quit displaying]
-> add elemental                             [add new term]
8.F1527 W7.7

APPENDIX A3-2

-> s
Searching..Searched
Search tree
12 0 20.2 food-formulated composition elemental     [best set now has 12 docs,
164 0 15.5 food-formulated elemental                 second best has 164]
-> l
set: 12 iplq?l
document 1 already seen                              [1st & 2nd titles marked as relevant.
1?? rl                                                Abstract displayed for 3rd, then marked
2?? rl                                                rel; remainder of set printed]
3?? f1
3?? rp
set: 164 iplq?q
rels                                                 [weights recalculated using relevance info]
7.8 9.8 3 food-formulated
4.7 6.6 3 composition
7.7 9.6 3 elemental
-> add "fatty adj acid"
12.F11757 W3.7
-> s
Searching..Searched
Search tree
3 0 23.1 food-formulated elemental fatty adj acid
6 0 20.1 food-formulated composition fatty adj acid
2 0 20.0 composition elemental fatty adj acid
161 0 19.4 food-formulated elemental
-> l
set: 3 iplq?l
1?? rl
2?? rl
3?? rl
set: 6 iplq?l
1?? l
2?? l
3?? q
rels: 6
9.8 10.4   food-formulated
6.6 4.7    composition
9.6 10.3   elemental
3.7 5.7 3  fatty adj acid

APPENDIX A3-3

-> l
set: 3 iplq?i
set: 161 iplq?q
-> add amino-acids
21.F33138 W2.0
-> s
Searching..Searched
Search tree
18 0 22.7 food-formulated elemental amino-acids
-> l
set: 18 iplq?l
1?? rl
2?? l
3?? l
4?? rp
Search exhausted                              [no more sets in search tree]
rels: 8
10.4 10.7 8 food-formulated
4.7 4.2 3 composition
10.3 10.5 8 elemental
5.7 5.2 3 fatty adj acid
2.0 3.7 2 amino-acids
-> q                                          [all docs marked relevant, plus all sets requested, printed offline]
No. of documents in off-line print set is 33
*CONNECT TIME MEYY: 0:12:37 HH:MM:SS 0.210 DEC HRS. SESSION 1842*
*SIGN-OFF 17.28.17 16.03.87
Clearing the call
Hours 0, Minutes 13, Seconds 16, Rx 278, Tx 60

APPENDIX A3-4

### Service Packet - Set parameters - Received ###
/# (7,25) (3,2) (4,0)
D A T A - S T A R , PLEASE ENTER YOUR USERID :
ENTER YOUR A-M-I-S PASSWORD
OOOOOOOO XXXXXXXX ********
- SUNDAY 22. MARCH 1987: DUE TO MAINTENANCE, D-S WILL NOT BE AVAILABLE FROM 0700 TO 1700 HOURS SWISS TIME (0600-1600GMT).
- HSELINE, THE DATABASE OF THE UK HEALTH AND SAFETY EXECTUVIE TO BE LAUNCHED SOON. SEE BROADCAST MESSAGE.
- THE EUROPEAN DIRECTORY OF AGROCHEMICAL PRODUCTS DATABANK TO BE LAUNECHED SOON. SEE BROADCAST MESSAGE.
ENTER YES IF BROADCAST MSG IS DESIRED_:
ENTER DATA BASE NAME_: meyy
*SIGN-ON 17.15.28 16.03.87 SESSION 1841
D-S/MEDL/MEDLINE 1983 - APRIL 1987
D-S - SEARCH MODE - ENTER QUERY
1_: docz
RESULT 3378409
2_: food-formulated-ut
FOOD-FORMULATED-UT KEYWORD NOT IN DICTIONARY
RESULT 0
3_: composition
RESULT 30911
4_: food-formulated
RESULT 1340
5_: food-formulated with ut
RESULT 0
6_: food-formulated with ut
RESULT 0
7_: 4 and 3
RESULT 82
8_: ..print 7 /AN,TI/doc=1

APPENDIX A3-5

1 TI Effect of enteral formula infusion rate, osmolality, and chemical composition upon clinical tolerance and carbohydrate absorption in normal subjects.
END OF DOCUMENT
:..s
D-S - SEARCH MODE - ENTER QUERY
8_: elemental
RESULT 1527
9_: 7 and 8
RESULT 12
10_: 4 not 3 and 8
RESULT 164
11_: ..print 9 /AN,TI/doc=1
1 TI Effect of enteral formula infusion rate, osmolality, and chemical composition upon clinical tolerance and carbohydrate absorption in normal subjects.
END OF DOCUMENT
:..print 9 /AN,TI/doc=2
2 TI Influence of the intake and composition of elemental diets on bile acid metabolism and hepatic lipids in the rat.
END OF DOCUMENT
:..print 9 /AN,TI/doc=3
3 TI Energy needs and nutritional rehabilitation in undernourished adolescents and young adult patients with cystic fibrosis.
END OF DOCUMENT
:..print 9 /AN,TI/doc=4
4 TI (Immediate postoperative enteral feeding with an elemental diet (Survimed) using a new application form of the so-called fine needle catheter jejunostomy. A prospective study).
END OF DOCUMENT
:..print 9 /AB/doc=4

APPENDIX A3-6

END OF DOCUMENT
D-S - SEARCH MODE - ENTER QUERY
11_: (86308376).AN. or (85057528).AN. or (84107858).AN.
RESULT 3
12_: fatty adj acid
RESULT 11757
13_: 11 and 12
RESULT 0
14_: 7 not 8 and 12
RESULT 6
15_: 10 and 12
RESULT 3
16_: 3 not 4
RESULT 30829
17_: 16 and 8
RESULT 236
18_: 17 and 12
RESULT 2
19_: ..print 15 /AN,TI/doc=1
1 TI Essential fatty acid deficiency after prolonged treatment with elemental diet (letter).
END OF DOCUMENT
:..print 15 /AN,TI/doc=2
2 TI Essential fatty acid deficiency after prolonged treatment with elemental diet (letter).
END OF DOCUMENT
:..print 15 /AN,TI/doc=3
3 TI Alterations in gastrointestinal contents induced by elemental diets.

APPENDIX A3-7

D-S - SEARCH MODE - ENTER QUERY
19_: ..print 14 /AN,TI/doc=1
1 TI Modulation of human erythrocyte shape and fatty acids by diet.
END OF DOCUMENT
:..print 14 /AN,TI/doc=2
2 TI Changes in essential fatty acids in plasma lipid fractions of traumatized patients.
END OF DOCUMENT
:..print 14 /AN,TI/doc=3
3 TI Essential fatty acid status in premature newborns fed by nasoduodenal technique.
END OF DOCUMENT
:..s
D-S - SEARCH MODE - ENTER QUERY
19_: (86308376).AN. or (85057528).AN. or (84107858).AN. or (81073924).AN. or (81051304).AN. or (80031286).AN.
RESULT 6
20_: ..s
D-S - SEARCH MODE - ENTER QUERY
20_: 10 not 12
RESULT 161
21_: ..s
D-S - SEARCH MODE - ENTER QUERY
21_: amino-acids
RESULT 33138
22_: 19 and 21
RESULT 0
23_: 20 and 21
RESULT 18

APPENDIX A3-8

24_: 18 and 21
RESULT 0
25_: ..print 23 /AN,TI/doc=1
1 TI Nutritional support: how much for how much?
END OF DOCUMENT
:..print 23 /AN,TI/doc=2
2 TI Alteration of methotrexate toxicity in rats by manipulation of dietary components.
END OF DOCUMENT
:..print 23 /AN,TI/doc=3
3 TI Efficacy of two elemental diets: a pair feeding study.
END OF DOCUMENT
:..print 23 /AN,TI/doc=4
4 TI (Experimental study on intestinal absorption of elemental diet).
END OF DOCUMENT
:..s
D-S - SEARCH MODE - ENTER QUERY
25_: (86308376).AN. or (85057528).AN. or (84107858).AN. or (81073924).AN. or (81051304).AN. or (80031286).AN. or (87081546).AN. or (85163376).AN.
RESULT 8
26_: 25 or 9 or 23
RESULT 33
27_: ..po 26/ALL/ALL/M. Wilson 275w;275
..PRINTOFF 26/ALL/DOC=ALL/ID=M. WILSON 275W;
14632 TOTAL DOCUMENTS FOR OFFLINE PRINT: 33
14607 PRINTOFF SAVED AS QUERY Q0469
D-S - SEARCH MODE - ENTER QUERY
27_: ..po 26/ALL/ALL/M. Wilson 275w;275;evaluation copy
..PRINTOFF 26/ALL/DOC=ALL/ID=M. WILSON 275W;
14632 TOTAL DOCUMENTS FOR OFFLINE PRINT: 33

APPENDIX A3-9

14607 PRINTOFF SAVED AS QUERY Q0470
D-S - SEARCH MODE - ENTER QUERY
27_: ..o
*CONNECT TIME MEYY: 0:12:37 HH:MM:SS 0.210 DEC HRS. SESSION 1842*
*SIGN-OFF 17.28.17 16.03.87
### Service Packet - Inv. to Clear - Received ###
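The sign-off message reports the connect time both as HH:MM:SS and as decimal hours. The conversion can be checked with a short sketch; the function name is an illustrative assumption and is not part of Cirt or Data-Star.

```python
def decimal_hours(hms):
    """Convert an 'H:MM:SS' connect time into decimal hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) / 3600


# Connect time reported in the sign-off message of this log.
print(f"{decimal_hours('0:12:37'):.3f}")
```

Rounded to three places, 0:12:37 comes to 0.210 hours, in agreement with the figure Data-Star prints.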

APPENDIX

A4-1


APPENDIX A4-2

FREE COMPUTER SEARCHES

Do you need a reference for:
   an essay
   a supplement to required reading
   further information on a specific subject
   research?

Don't go searching the shelves... come and have a FREE computer search at City University Department of Information Science.

Online bibliographic searches can produce just the references to journal articles, reviews or books that you require in only a matter of minutes. Generally you pay for this convenience and charges range around £20.00 BUT now you can have one done FREE.

The City University Department of Information Science is conducting an experiment involving online database searches and welcomes your particular enquiry. All you have to do is:

1. Make sure it is a subject enquiry related to Medicine or Psychology so it can be searched for on the U.S. National Library of Medicine database MEDLINE. Examples of subject enquiries might be
   - The effects of cigarette smoking on the lungs
   - Arthritis and drug therapy
   - Schizophrenia and diet
   etc. etc. etc.

2. Come to City University Department of Information Science and be there while the search is taking place.

3. Answer a brief questionnaire and evaluate the success of your particular search.

If you are interested would you please contact:

   Catherine Thompson
   Department of Information Science
   The City University
   Northampton Square
   London EC1V OHB
   phone: 253 4399 ext. 3901

Don't be shy come along and have a try...

APPENDIX A5-1

CIRT USERS MANUAL

Catherine Thompson

January, 1986

APPENDIX A5-2

Preface

This manual is divided into three sections

1. Search guide
2. Alphabetical listing of commands
3. Sample searches

Note: the terms citation, reference, and document all refer to the same thing, ie the document surrogate offered by the host Data-Star.

APPENDIX A5-3

REFERENCE CARD

->DELete,del       deletes terms before or after the search
->LIST,list        lists the terms presently active on the search
->Look,l           displays titles of documents
->NewWeights,nw    recalculates the weights based on relevance feedback information
->Out,o            log-off Data-Star
->RESET,reset      start a new search or changes databases
->Quit,q           log-off Cirt
->Search,s         sends search terms to Data-Star

??rl               indicates the present document as relevant and displays the title of the next document
??rq               indicates the present document is relevant then quits the whole set and returns to command mode ->
??r                indicates the present document is relevant and asks what you would like to do with the entire set ie ignore, print, look, or quit
??f                would like to see further fields
                   1.Abstract 2.Descriptors 3.Year
                   4.Language 5.Source 6.Author & Source
??l                look at the title of the next document
??i                ignores the whole set and skips to the next set
??p                print the whole set off-line in full format
??q                returns to command mode ->
??CR               asks what you would like to do to the entire set ie ignore, print, look or quit

APPENDIX A5-4

TABLE OF CONTENTS TO SEARCH GUIDE

1. Introduction
2. Weighting Ranking and Relevance Feedback
3. Modes
   3.1 COMMAND MODE
   3.2 DISPLAY MODE
4. Searching
   4.1 LOGGING ON
       4.1.1 Logging on to Cirt
       4.1.2 Logging on to Data-Star
       4.1.3 Boolean Searches
       4.1.4 Cirt Searches
   4.2 BUILDING A QUERY
       4.2.1 Limits
       4.2.2 Adding and Deleting
             Adding
             Deleting
   4.3 EXECUTING A SEARCH
   4.4 LOOKING AT DOCUMENTS
   4.5 RELEVANCE FEEDBACK
       4.5.1 Relevance
       4.5.2 New Weights
       4.5.3 Reset
   4.6 LOGGING OFF
   4.7 ADDITIONAL FEATURES
       4.7.1 Commands
       4.7.2 Abandoning a Search
5. Printing off-line
   5.1 WEIGHTED
   5.2 BOOLEAN
6. Index

APPENDIX A5-5

SEARCH GUIDE

CIRT SEARCH GUIDE

1. Introduction

Cirt is a prototype front-end system which permits weighting, ranking and relevance feedback on a traditional Boolean retrieval system, Data-Star. Cirt is strictly dependent on the host Data-Star, but within that host it is able to access the following databases: MEZZ, MEDL, PSYC, INSP, BJZZ.

Cirt permits two forms of searching. One is to use the weighting, ranking and relevance feedback facilities of Cirt, and the other is to use Cirt transparently, providing access to the traditional Boolean retrieval offered by the host. For the experiment both types of searches will be done on Cirt, in order to keep a log of the transactions and provide a consistent basis for later comparison.

Although Cirt will provide the mechanism for exploring and developing weighted, ranked and relevance feedback retrieval, it is not the finished front-end, but essentially a tool for this experiment. Therefore it is somewhat cumbersome and, like all prototypes, lacks the refinements which will result from further development.

2. Weighting Ranking and Relevance Feedback

This form of retrieval offers three processes to enhance retrieval effectiveness: weighting, ranking, and relevance feedback. As terms in the enquiry are put to the chosen database, each term is weighted according to its frequency of occurrence within the database, the most infrequently used (and perhaps most specific) terms having the highest weight. Searches are then done with combinations of terms, giving sets of documents ranked by matching value (sum of weights). References within each retrieved set can then be examined. The title is automatically displayed; if more information is required in order to judge relevance, there is also the option to see other fields such as Abstract, Source etc. Having seen these references, judgements are made as to each citation's relevance or lack of it by tagging the reference with an "r" to indicate relevance, and blank or no response to indicate not relevant. After viewing and tagging any number of references in this fashion it is possible to calculate new weights based not on the frequency of occurrence information mentioned previously but on the relevance information supplied by the tagged references. Using these new weights and perhaps additional terms it is possible to re-search, thereby achieving more effective results.
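The manual does not give the weighting formula. As a minimal sketch, the initial weights shown in the sample log of Appendix A3 are consistent with the natural logarithm of (database size / term frequency), and the matching value of a set is the sum of the weights of its terms. The function names are illustrative, and taking the result of the "docz" query as the database size is an assumption.

```python
import math


def initial_weight(frequency, database_size):
    """Frequency-based weight: rarer terms get higher weights.

    Assumed form ln(N / n); it reproduces the figures in the sample
    log but is not stated in the manual.
    """
    return math.log(database_size / frequency)


def matching_value(weights):
    """Matching value of a set: the sum of the weights of its terms."""
    return sum(weights)


N = 3378409  # result of the "docz" query in the A3 log, assumed to be the database size
frequencies = {"composition": 30911, "food-formulated": 1340, "elemental": 1527}

weights = {term: initial_weight(n, N) for term, n in frequencies.items()}
for term, weight in weights.items():
    print(f"{term}: {weight:.1f}")

best = matching_value([weights["food-formulated"], weights["composition"]])
print(f"food-formulated + composition: {best:.1f}")
```

The printed values can be compared with the W figures and the first search tree line in the Appendix A3 log.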

3. Modes

Not unlike Data-Star, Cirt has two modes.

3.1 COMMAND MODE which has the prompt ->. This mode expects any one of the commands mentioned in the alphabetical listing of commands preceded by the -> prompt.

APPENDIX A5-7

All commands are in the form

-> command argument1 argument2 ...

Valid arguments are dependent on the command. For instance:

-> add term1 term2...

Commands and arguments can be in upper or lower case. Any command or argument in the list below can be abbreviated by leaving out only letters given in lower case, eg if a command is specified as LogIn it can be abbreviated as "li" or "logi" or "lin" or entered in full as "login", but it cannot be abbreviated to "l" or "log". Spaces separate the command and arguments, but there should be no space within the command itself.

3.2 DISPLAY MODE which has the prompt ??. This mode expects any one of the responses mentioned in the alphabetical listing of commands preceded by the ?? prompt. This mode essentially asks two questions, one related to relevance and the other related to displaying documents; it can therefore accept either a single or a double response.

? When addressing the question of relevance you must positively indicate relevance by typing "r". All documents tagged with an "r" will automatically be printed offline. Nonrelevance is indicated by a blank or no response.

? The next question queries what you would like to do with the following document or the whole set.

Responses to the ?? prompt can take three forms.

1. Further information required

f permits the display of additional fields, after which you are returned to the ?? prompt. There are six possible options with corresponding numbers:

1.Abstract 2.Descriptors 3.Year
4.Language 5.Source 6.Author & Source

It is possible to skip this menu and indicate directly the additional field you require. This can be done by adding the corresponding number of the desired field after typing "f", ie "f1" would display the abstract, "f6" would display author and source etc.

APPENDIX A5-8

2. Indicating relevance and automatically printing offline.

rl - indicates the present document is relevant and displays the title of the next document
rq - indicates the present document is relevant, and quits the set to return to command mode ->
r  - indicates the present document is relevant and asks whether you would like to ignore or print the entire set, look or quit to command mode ->

3. Non-relevant documents

l  - displays the title of the next document
q  - returns to command mode ->
CR - asks what you would like to do with the entire set ie ignore, print, look or quit to command mode ->
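The single and double responses accepted at the ?? prompt can be summarised in a short parsing sketch. It is not Cirt's own code: the function name and the returned tuple are illustrative, and allowing "r" to be combined with any of the actions (including "i", "p" and "f") is an assumption beyond the combinations listed in the manual.

```python
ACTIONS = {"l": "look", "q": "quit", "i": "ignore", "p": "print"}


def parse_display_response(response):
    """Split a ?? response into (relevant, action, field_number).

    relevant      True if the response starts with "r"
    action        "look", "quit", "ignore", "print", "fields", or
                  "ask" (bare "r" or RETURN: ask about the whole set)
    field_number  1-6 when a field is named directly (eg "f1"), else None
    """
    text = response.strip().lower()
    relevant = text.startswith("r")
    if relevant:
        text = text[1:]
    if text == "":
        return relevant, "ask", None
    if text in ACTIONS:
        return relevant, ACTIONS[text], None
    if text == "f":
        return relevant, "fields", None
    if len(text) == 2 and text[0] == "f" and text[1] in "123456":
        return relevant, "fields", int(text[1])
    raise ValueError(f"not a valid ?? response: {response!r}")


for reply in ("rl", "rq", "r", "f1", "l", ""):
    print(repr(reply), parse_display_response(reply))
```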

4. Searching

4.1 LOGGING ON

4.1.1 Logging on to Cirt

Call up the LSI 11/23 (which you will be instructed to do according to your particular circumstances). Once you are connected to the 11/23, log-in using the ID, which you have been given, and your password, which you should assign yourself. This should put you directly into Cirt. If you have been successful you will get the header:

W E L C O M E   T O   C I R T

Using dstar: When you get the Cirt prompt -> Type li(login) dstar and press RETURN. When you get the next Cirt prompt, you can start using any of the Cirt commands ( add, search, etc. ). To end a dstar session use the Cirt command o(ut).

Quitting Cirt: Use the Cirt command q(uit). When you quit Cirt, you will be left in Unix (the Unix prompt is * ). Logoff by typing *ii ( control-r).

Looking at documents: Use the Cirt command l(ook). To look at further fields, use the f command. Type f and then press RETURN (if you wish to be prompted for a field number), or f followed by one of the following numbers for the associated field(s).

1. Abstract 2. Descriptors 3. Year 4. Language 5. Source 6. Author & Source

APPENDIX A5-9

The first input you will be asked to type is

"Enter id for offline prints"

This is identical to the ID required on the printoff statement of Data-Star; all the appropriate punctuation will apply (ie ";" to produce a new line etc). This statement generally refers to the person to whom the off-line prints are to be sent.

You will then be asked for the query identifier.

"Enter query identifier"

This is the number printed on the card which randomly allocated either a Cirt or Boolean search. The same number should be put on all questionnaires and forms relating to this particular enquiry.

Lastly you will be requested to indicate which type of search you will be doing eg Cirt or Boolean.

"Enter y or Y for Boolean search; n,N or RETURN for weighted search"

When these three questions have been answered you will be in command mode with the -> prompt.

4.1.2 Logging on to Data-Star

After you have received the command mode prompt -> simply type

-> li dstar

A successful log-in will be indicated. If you do encounter any problems the error messages are self explanatory and generally involve retyping the log-on command ie "li dstar".

The search processes will divide at this stage.

4.1.3 Boolean Searches

If you have successfully logged on to Data-Star you will receive the message

You Are Connected to the Host

Simply enter RETURN and you will receive the standard Data-Star search mode prompt.

1_:

4.1.4 Cirt Searches

If you have successfully logged on to Data-Star you will receive the message

Any Limits?

APPENDIX A5-10

4.2 BUILDING A QUERY

4.2.1 Limits

These limits apply to MEDLINE SEARCHES ONLY; if you are searching on another database you need only reply N or No and the program will skip this section and prompt with the Cirt command mode prompt ->. Now you can start to build your query (see next section). If you are going to search Medline and require limits for Year, Language, Human, Animal, Female, Male or any other limits, you need only reply Y or Yes to this request for limits.

There are four types of responses to the limit requests.

Specify  - this response requires a statement of the type of limit you want. It applies to YR, LG or "other limits".
Y or Yes - a positive limit; for instance everything written pertaining to females
N or No  - a negative limit; do not want anything written on animals
CR       - skips to the next limit, does not apply this limit

Limits and their appropriate responses are:

YR           - Specify any two digit year ie 84, or Carriage Return (This will include all documents from the specified year to the most current. It is not possible to specify only one isolated year unless it is the most recent year).
LG           - Specify or Carriage Return (All Medline language abbreviations apply. If you are making a list put a comma, space between each abbreviation. In addition it is possible to exclude a language by specifying "not" before the language ie "not fr" or "not gr")
human        - Yes No or Carriage Return
animal       - Yes No or Carriage Return
female       - Yes No or Carriage Return
male         - Yes No or Carriage Return
other limits - Specify or Carriage Return (This includes any applicable Medline check tag or negative Boolean statement, such as adolescence, or not electron)

APPENDIX A5-11

4.2.2 Adding and Deleting terms

Adding: The command to add terms to the database is "add" followed by the term in any of the following forms.

1. Single natural language terms
   ->add term1
   ->add term2 ...
   OR
   A string of terms separated by a space
   ->add term1 term2 term3...

2. Data-Star search terms
   Truncated - psycholog$6
   If right-truncation is used and more than 100 terms match, Cirt will simply pass on the Data-Star error message.
   Adjacency - "long adj term"
   Boolean - "labor or labour"
   Paragraph qualification - television$ L ti.de.

3. Mesh search term facilities
   Explosion - lung#
   Mesh heading - lung-diseases

N.B. QUOTATION MARKS - If the search term contains blanks then it must be enclosed in quotes. There is no harm in enclosing a term in quotes so, if in doubt, use quotes.

After using the command "add" Cirt will respond with

Search set number, term, frequency and weight

Deleting: Terms can be deleted in one of two ways

1. Before the search is executed:
   ->del term1 term2...
   OR
   ->List (to get the following display)

-> list
Query terms
No.     frequency   weight    rels  text
S 4.    5           12.6586   0     rubbish
S 2.    68          10.0485   0     thalidomide-ae
S 8.    1075        7.2879    0     asia
S 9.    1539        6.9291    0     america
S 5.    3439        6.1251    0     europe
S 10.   5152        5.7209    0     africa
S 7.    8130        5.2647    0     us
S 3.    10375       5.0209    0     uk
S 6.    19193       4.4057    0     australia

APPENDIX A5-12

one situation in which it may be necessary to type "search" again would be if you require more than a minimum of 15 documents for display. The searching algorithm has an internal parameter known as search size, which is the target number of documents to be found, ie the target length of the ranked list of documents. By default, search size has been set at 15. Since documents are found in sets, the actual number found is likely to be more than 15; it can be less than 15 (if all single term searches yield less than 15 between them). Furthermore there may be additional documents which are retrievable by the search statement, but Cirt is unaware of them because the program stopped when it reached the minimum of 15 documents. If you do a search and then display or print some documents, you may reach the end of the documents known to Cirt; at this point you will receive the prompt "Search exhausted". If you type "search" again, Cirt will carry on and try to find a further 15 documents beyond those you have already seen.

4.4 LOOKING AT DOCUMENTS

After searching it is possible to look at the documents one set at a time using the command

-> l or look

Cirt will respond with the number of documents in the set, the weight of all the documents within that set and the individual terms and their weights. In addition you will be asked if you wish to "ignore, print, look or quit" (all responses can be abbreviated to the first letter ie i, p, l or q).

l or look - This "look" refers only to the current document, not to the whole set. If you choose this option the title will automatically be displayed, and you will receive a ?? Cirt display mode prompt. If you wish to see additional fields in this citation type "f", which will display the six options available. They are:

1.Abstract 2.Descriptors 3.Year 4.Language 5.Source 6.Author & Source

It is possible to skip this display and indicate directly the additional field you require. This can be done by adding the corresponding number of the desired field after typing "f", ie "f1" would display the abstract, "f6" would display author and source etc. (for additional details on display mode see page ).

i or ignore - This will ignore the whole set and skip to the next set.

p or print - Print will print the whole of the set off-line in full format.

q or quit - This command will return you to command mode with the -> prompt.

APPENDIX A5-13

then type
->del 9
OR
->del 9 6...

2. Deleting after the search is done in the same form as above, but as the term has been searched it will still appear in the listing but with a zero weighting. For example

-> list
Query terms
No.     frequency   weight    rels  text
S 4.    5           12.6586   0     rubbish
S 2.    68          10.0485   0     thalidomide-ae
S 8.    1075        7.2879    0     asia
S 9.    1539        6.9291    0     america
S 5.    3439        6.1251    0     europe
S 10.   5152        5.7209    0     africa
S 7.    8130        5.2647    0     us
S 3.    10375       5.0209    0     uk
S 6.    19193       4.4057    0     australia
Search tree
No.  seen  weight  terms
3    0     31.33   asia america europe africa us
1    0     31.08   asia america europe africa uk
17   0     30.47   asia america europe africa australia
-> delete 4 8 9 10
-> list
Query terms
No.     frequency   weight    rels  text
4.      5           0.0000    0     rubbish
2.      68          10.0485   0     thalidomide-ae
8.      1075        0.0000    0     asia
9.      1539        0.0000    0     america
5.      3439        6.1251    0     europe
10.     5152        0.0000    0     africa
7.      8130        5.2647    0     us
3.      10375       5.0209    0     uk
6.      19193       4.4057    0     australia
Search tree
No.  seen  weight  terms
1    0     16.17   thalidomide-ae europe

4.3 EXECUTING A SEARCH

Once you have constructed an enquiry using the terms you deem necessary, the command to search these terms is:

search or s

This command will automatically send sets of Boolean statements to Data-Star which are prescribed by the search algorithm (Robertson and Bovey, 1983). While the search is automatically being executed the terminal displays the message

Searching

Full stops are added after the word "Searching" to show the user the search is in progress. Once the search is complete you will be prompted with

Searching....Searched

The length of time required for each search varies considerably and is dependent on host usage, response time, and the number of terms used for the enquiry. The search time increases exponentially in relation to the number of terms used - the more terms the longer it takes.
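The exponential growth can be seen by counting the possible term combinations: n terms give 2^n - 1 non-empty combinations, each a candidate set with its own matching value. The sketch below simply enumerates and ranks them. It is not the search algorithm of Robertson and Bovey (1983), which the manual does not describe and which need not visit every combination; the function name is illustrative and the weights are those shown after the first relevance feedback in the Appendix A3 log.

```python
from itertools import combinations


def ranked_combinations(weights):
    """List every non-empty combination of terms, best matching value first."""
    terms = sorted(weights)
    combos = []
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            value = round(sum(weights[term] for term in combo), 1)
            combos.append((value, combo))
    combos.sort(key=lambda item: item[0], reverse=True)
    return combos


weights = {
    "food-formulated": 9.8,
    "composition": 6.6,
    "elemental": 9.6,
    "fatty adj acid": 3.7,
}
tree = ranked_combinations(weights)
print(len(tree), "combinations for", len(weights), "terms")
for value, combo in tree[:5]:
    print(value, " ".join(combo))
```

Adding a fifth term would double the number of candidate combinations, which is why each extra term lengthens the search.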

The search procedure is designed in such a way that it never needs to resend any search statement. Hence if you type "search" after just completing a search the program will simply list the results of the search again. Nevertheless

APPENDIX A5-14

N.B. - Once a document has been seen or a document set has been ignored, or printed, it CANNOT BE SEEN AGAIN unless you RESET and start the search over. In other words the only way to go back is to start over. This is due to the nature of the search algorithm.

4.5 RELEVANCE FEEDBACK

4.5.1 Relevance

Having seen the title (or any other fields) it is possible to tag the citation as relevant or not. You will receive the ?? prompt, which is essentially asking two questions: firstly, is the document relevant? and secondly, what would you like to do with the next document or the whole set? If you wish to indicate relevance you will need to reply with any one of the following responses:

rl - indicates the present document is relevant and displays the title of the next document
rq - indicates the present document is relevant, and quits the set to return to command mode ->
r  - indicates the present document is relevant and asks whether you wish to ignore or print the entire set, look at the next document or quit to command mode ->

4.5.2 New Weights

After looking at and tagging any number of references as relevant it is then possible to recalculate the weights based on the relevance information provided. This will be done automatically when you leave display mode. Nevertheless in command mode you can ask for a list of new weights at any stage. The command "newweights" or "nw" will provide the following display

3 relevant documents
old wt.   new wt.   rels.
6.30      8.24      3      cigarette$
5.10      7.04      3      smoking
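The manual does not state how the new weights are calculated. As a minimal sketch, the recalculated figures in the sample log of Appendix A3 are consistent with a relevance weight of the Robertson/Sparck Jones form with a 0.5 correction; this is an inference from those figures, not a statement of Cirt's implementation. In the code, r is the number of relevant documents containing the term, R the number of documents marked relevant, n the term frequency and N the database size (assumed to be the "docz" result in the log).

```python
import math


def relevance_weight(r, R, n, N):
    """Relevance weight with 0.5 correction (assumed form, see lead-in).

    r  relevant documents containing the term
    R  documents marked relevant so far
    n  documents containing the term (frequency)
    N  documents in the database
    """
    relevant_odds = (r + 0.5) / (R - r + 0.5)
    other_odds = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(relevant_odds / other_odds)


N = 3378409  # "docz" result in the A3 log, assumed to be the database size

# After 3 documents were marked relevant, all containing the first two terms:
print(f"food-formulated: {relevance_weight(3, 3, 1340, N):.1f}")
print(f"composition:     {relevance_weight(3, 3, 30911, N):.1f}")
# A term added afterwards, found in none of the 3 relevant documents:
print(f"fatty adj acid:  {relevance_weight(0, 3, 11757, N):.1f}")
```

A term that occurs in all the relevant documents gains weight, while a newly added term that occurs in none of them starts below its frequency-based weight.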

If you then add new terms or execute another search the new weights will be based on the relevance information you have provided.

4.5.3 Reset

When the search on the one database is exhausted and you wish to change to another database, start over with a new search or revise a previous search, the RESET command will provide any of these facilities. Upon entering "reset" you will first be asked

Enter database name

If you wish to change, enter the appropriate 4 letter abbreviation MEZZ, PSYC, INSP, INZZ or simply carriage return, which will default to MEDL. You will then receive the Cirt command mode prompt and you can start building your new query.

APPENDIX A5-15

4.6 LOGGING OFF

It is sometimes necessary to log-off as many as three separate "layers".

display mode
Data-Star
Cirt

It is always necessary to log-off at least two separate "layers".

Data-Star
Cirt

Display mode: Valid answers are r, l, f, q, rl, rq or return. The only responses which will return you to command mode are: q or rq or r followed by q

Command mode -> type "o" and you will receive the Data-Star log-off message which will finish with a Cirt command mode prompt ie

-> o
*CONNECT TIME INSP: 0:00:53 HH:MM:SS 0.015 DEC HRS. SESSION 1106*
*SIGN-OFF 18.15.29 27.01.86
Clearing the call
Hours 0, Minutes 6, Seconds 21, Rx 166, Tx 79
COST of call ( ) 0.00

At the -> prompt type

->q

which will take you out of Cirt. It will also log you off the City University.

4.7 ADDITIONAL FEATURES

4.7.1 Commands

Accounting

This permits you to examine the number of City PSS units used and the allocated limit. This command is most appropriately used BEFORE YOU LOG-ON TO DATA-STAR, OR AFTER YOU LOG-OFF.

SET SearchSize=n

This command permits you to increase the number of documents Cirt "gathers" for display. After the search has been executed the program seeks a minimum number of 15 documents for display. For instance if you searched four terms Cirt progresses through the sets until it has "gathered for display" a minimum of 15 documents


Search tree
No.   seen   weight   terms
3     0      31.33    asia america europe africa us
1     0      31.08    asia america europe africa uk
17    0      30.47    asia america europe africa australia

There are now 21 documents ready for display, that is 6 more than the minimum of 15. Another example could look like this:

No.   seen   weight   terms
1     0      16.36    health uk funding care
62    0      15.27    health funding

There are now 63 documents ready for display, that is 48 more than the minimum of 15 required by Cirt. If you require more than 15 documents for display it is possible to adjust Cirt with the command

-> set ss=n

where n is any number of documents over the default of 15 set by Cirt.
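The gathering rule described above can be sketched in a few lines of Python. This is only an illustration, assuming (as the example displays suggest) that Cirt works down the search tree from the highest weighted set and stops as soon as the running total reaches the minimum. The function name `gather_sets` and its arguments are hypothetical and are not part of Cirt.

```python
def gather_sets(set_sizes, search_size=15):
    """Walk sets in ranked order until at least `search_size` documents are gathered.

    `set_sizes` holds the "No." column of a search tree, best set first.
    Returns the sizes of the gathered sets and the total number gathered.
    Hypothetical helper for illustration; not Cirt's own code.
    """
    gathered = []
    total = 0
    for size in set_sizes:
        if total >= search_size:
            break  # minimum reached, stop gathering further sets
        gathered.append(size)
        total += size
    return gathered, total


# Set sizes taken from the example displays in this section.
print(gather_sets([3, 1, 17]))   # default minimum of 15
print(gather_sets([1, 62]))      # a large set is taken whole
```

Because whole sets are gathered, the total can exceed the requested minimum, as in the 21 and 63 document examples.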

For example: Set Searchsize=50 will look for a minimum of 50 documents for display. Therefore the results of a search may look something like this:

No.   seen   weight   terms
2     0      26.43    embolism stroke thrombosis infarction myocardial
1     0      25.87    embolism angina thrombosis infarction acute
4     0      24.93    embolism angina infarction myocardial acute
1     0      24.48    embolism stroke infarction myocardial acute
12    0      24.27    embolism thrombosis infarction myocardial acute
6     0      24.20    angina stroke infarction myocardial acute
22    0      23.99    angina thrombosis infarction myocardial acute
3     0      23.54    stroke thrombosis infarction myocardial acute

There are now 51 documents gathered for display, 1 more than your required 50.

4.7.2 Abandoning a Search

When the message

Searching.

is being displayed, if for any reason you need to abandon a search simply press CONTROL C. This will break the search and send the following message

Search interrupted
Call talk to clear any remaining input or continue
->

You respond with the command "talk", which puts you into Data-Star search mode ie

1_:

Here you respond CONTROL P, which will put you back into Cirt command mode. In summary - to abandon a search your responses are:

control c
->t
control p

Once you have abandoned a search it is impossible to pick up where you left off. If you wish to re-do the same search or start a new and totally different search the only method for doing this is to use the command RESET. This command offers several facilities: at the end of a search or after you have abandoned a search RESET allows you to re-enter the previous search, execute a completely new subject search, or to change databases. Those databases offered by Cirt are MEDL, MEZZ, PSYC, INSP and INZZ.

5. Printing off-line

5.1 PRINTING OFF-LINE USING WEIGHTED RETRIEVAL

In display mode:

??p   will print the whole set of documents offline in full format
??r   marks a document as relevant and automatically prints it offline

Cirt will automatically print two sets of full format offline prints, one for the user to keep and one for the evaluation. If the offline print set is

a) Less than 50, there will be two identical print-off sets

b) Greater than 50, there will be one complete set and another of a selection of 50 documents only: the first and last 25 of the whole set.
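As a rough illustration of rule b), the evaluation copy can be pictured as the first 25 and last 25 documents of the ranked set. The sketch below is an example under stated assumptions rather than Cirt's own code: the function name `evaluation_selection` is invented, and a set of exactly 50 documents (a case the text does not cover) is treated here as printed in full.

```python
def evaluation_selection(documents, limit=50):
    """Return the documents printed for the evaluation copy.

    Sets larger than `limit` are cut down to the first and last `limit // 2`
    documents; smaller sets are returned whole. Treating a set of exactly
    `limit` documents as "whole" is an assumption, since the manual only
    describes "less than 50" and "greater than 50".
    """
    if len(documents) <= limit:
        return list(documents)
    half = limit // 2
    return list(documents[:half]) + list(documents[-half:])


docs = list(range(1, 66))              # a print-off set of 65 documents, numbered 1..65
selection = evaluation_selection(docs)
print(len(selection), selection[0], selection[-1])
```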
5.2 PRINTING OFF-LINE USING BOOLEAN RETRIEVAL

A minimum of two print off statements must be made

1) For the user to keep. It will consist of a complete set in whatever form you feel appropriate to the user.

2) For the evaluation.
   a. Less than 50 documents to the print-off set* will require a complete set
   b. Greater than 50 documents to the print-off set* will require two statements
      1. the first 25 of the set
      2. the last 25 of the set

for example if there are 65 documents in the print-off set

..po 6/all/docs=1-25/id=abcd
..po 6/all/docs=40-65/id=abcd

All three print statements can be entered on separate lines or, if you are courageous, all three can be entered in one line, stacking the commands eg

..po6/all/doc=1-25/id=abcd/..po6/all/doc=40-65/id=abcd/..po6/bibl/doc=all

* If there is more than one set to be printed off-line it is advisable to "OR" all the statements together into one set.


INDEX TO SEARCH GUIDE

abandoning a search 4.7.2
abstract 2, 3.2
ac 4.7
accounting 4.7
add 3.1, 4.2.2
adjacency 4.2.2
algorithm 4.3, 4.4
animal 4.2.1
author 3.2
boolean 1, 4.1.1, 4.1.3, 4.2.1
   printing off-line 5.2
   search 4.3
   statements 4.2.2, 4.3
carriage return 3.2, 4.1.3, 4.6
Cirt
   search 4.1.1, 4.1.4
   transparently 1, 4.1.3
City University 4.6, 4.7.1
command mode 3.1, 3.2, 4.1.1, 4.1.2, 4.5.1, 5.4.3, 4.6, 4.7.2
CONTROL C 4.7.2
CONTROL P 4.7.2
CR 3.2, 4.1.3, 4.2.1, 4.6
Data-Star 1, 3, 4, 4.1.1, 4.1.3, 4.1.4, 4.2.2, 4.3, 4.7.2
default 4.3
del 3, 4.2.2
delete or del 3, 4.2.2
descriptors 3.2
display mode 3.2, 4.6
entire set 3.2, 4.5.1
explosion 4.2.2
exponentially 4.3
f 3.2, 4.6
f6 3.2
female 4.2.1
frequency 2
front-end 1
host 1, 4.1.3, 4.3
human 4.2.1
i 4.4
ID 4.1.1
ignore or i 4.4
INSP 1, 4.5.3
internal parameter 4.3
INZZ 1, 4.5.3, 4.7.2
l 3.2, 4.4, 4.6
language 3.2, 4.2.1
layers 4.7
li 3.1
li dstar 4.1.2
limit responses 4.2.1
   specify 4.2.1
   y or yes 4.2.1
   n or no 4.2.1
   CR (Carriage Return) 4.2.1
limits 4.1.4, 4.2.1
   YR
   LG
   human
   animal
   female
   male
   other limits
lin 3.1
list 4.2.2
log 1
logi 3.1
login 3.1
logging on 4.1, 4.7.1
   cirt 4.1.1
   Data-Star 4.1.2
   Boolean 4.1.3
logging off 4.6, 4.7.1
look or l 4.4
LSI 11/23 4.1.1
male 4.2.1
MEDLINE 1, 4.2.1
   check tag 4.2.1
MEDL 1, 4.5.3, 4.7.2
menu 3.2
Mesh heading 4.2.2
MEZZ 1, 4.5.3, 4.7.2
N 4.2.1
N or No 4.2.1
new weights 2, 4.5.2
non-relevant 3.2
nw 4.5.2
o 4.6
offline prints 3.2, 4.1.1, 5

P 4.4, 5.1
paragraph qualification 4.2.2
password 4.1.1
print or p 4.4
printing 4.1.1
   off-line 5
print off 4.1.1
prototypes 1
pss 4.7.1
PSYC 1, 4.5.3, 4.7.2
q 3.2, 4.4, 4.6
query 4.2
query identifier 4.1.1
questionnaires 4.1.1
quit or q 4.4
quotation marks 4.2.2
r 3.2, 4.5.1, 4.6, 5.1
ranking 1, 2
relevance 2, 3.1, 3.2, 4.5
   feedback 1, 2, 4.5
reset 4.5.3, 4.7.2
response time 4.3
return 3.2, 4.1.3, 4.2.1, 4.6
rl 3.2, 4.5.1, 4.6
rq 3.2, 4.5.1, 4.6
s 4.3
search
   abandoning 4.7.2
   algorithm 4.3, 4.4
   exhausted 4.3
   set number 4.2.2
   size 4.3
   statement 4.3
   time 4.3
set 3.2, 4.5.1
set searchsize= 4.7.1
set ss= 4.7.1
source 2, 3.2
specify 4.2.1
string 4.2.2
t 4.7.2
tag 1, 4.5.1, 4.5.2
talk 4.7.2
title 4.4
truncated 4.2.2
user
   ID 4.1.1
   password 4.1.1
weighting 1, 2, 4.5
   zero 4.2.2
weighting ranking and relevance feedback 1, 2
weighted
   search 4.1.1
   printing off-line 5.1
whole set 3.2, 4.5.1
y or yes 4.2.1
year 3.2, 4.2.1
zero weighting 4.2.2

ALPHABETICAL LISTING OF COMMANDS

The nature of commands and responses.

Not unlike Data-Star, Cirt has two modes

COMMAND MODE, which has the prompt ->. This mode expects any one of the commands mentioned in the alphabetical listing of commands, preceded by the -> prompt. All commands are in the form

-> command argument1 argument2 ...

Valid arguments are dependent on the command. For instance:

-> add term1 term2...

Commands and arguments can be in upper or lower case. Any command or argument in the list below can be abbreviated by leaving out any of the letters given in lower case, eg if a command is specified as LogIn it can be abbreviated as "li" or "logi" or "lin" or entered in full as "login", but it cannot be abbreviated to "l" or "log". Spaces separate the command and arguments, but there should be no space within the command itself.
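The abbreviation rule can be illustrated with a short Python sketch. It assumes that the capital letters in a command's specification are mandatory, that each lower-case letter may be dropped independently, and that the letters typed must stay in their original order. The function name `matches_command` is hypothetical and this is not Cirt's own matching code.

```python
def matches_command(spec, typed):
    """Return True if `typed` is an acceptable abbreviation of `spec`.

    `spec` is written as in the listing, e.g. "LogIn": upper-case letters
    must be typed, lower-case letters may be left out. The comparison is
    case-insensitive. Illustrative sketch only.
    """
    typed = typed.lower()

    def match(i, j):
        # i indexes into spec, j indexes into typed
        if i == len(spec):
            return j == len(typed)
        ch = spec[i]
        # Option 1: the user typed this letter.
        if j < len(typed) and typed[j] == ch.lower() and match(i + 1, j + 1):
            return True
        # Option 2: the letter is optional (lower case) and was left out.
        return ch.islower() and match(i + 1, j)

    return match(0, 0)


for attempt in ("li", "lin", "logi", "login", "LOGIN", "l", "log"):
    print(attempt, matches_command("LogIn", attempt))
```

Under these assumptions the first five attempts are accepted and the last two are rejected, which agrees with the LogIn example in the text.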

DISPLAY MODE, which has the prompt ??. This mode expects any one of the responses mentioned in the alphabetical listing of commands, preceded by the ?? prompt.

This mode essentially asks two questions, one related to relevance and the other related to displaying documents, therefore it can accept either a single or double response.

- When addressing the question of relevance you must positively indicate relevance by typing "r". All documents tagged with an "r" will automatically be printed offline. Non-relevance is indicated by a blank or no response.

- The next question queries what you would like to do with the following document or the whole set.

**********************************
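The single or double response can be pictured as an optional relevance flag followed by an optional action. The following Python sketch is an illustration under assumptions: it covers only the responses named in this guide (r, l, q, rl, rq, f, f1 to f6, i, p and a bare carriage return), and the function `parse_display_response` and the labels it returns are invented for the example.

```python
import re

# Actions that may follow the optional "r"; the labels are illustrative only.
AFTER_R = {"l": "look", "q": "quit"}
# Actions that are typed on their own.
ALONE = {"l": "look", "q": "quit", "i": "ignore", "p": "print"}


def parse_display_response(response):
    """Split a ?? response into (relevant, action).

    `relevant` is True when the response starts with "r". `action` is a
    label from the tables above, "field N" or "field menu" for the f
    responses, or "ask" when no action is given (the guide says Cirt then
    asks what to do with the set). Raises ValueError for anything else.
    """
    text = response.strip().lower()
    field = re.fullmatch(r"f([1-6])?", text)
    if field:
        number = field.group(1)
        action = "field " + number if number else "field menu"
        return False, action
    if text == "":
        return False, "ask"          # bare carriage return
    if text[0] == "r":
        rest = text[1:]
        if rest == "":
            return True, "ask"
        if rest in AFTER_R:
            return True, AFTER_R[rest]
    elif text in ALONE:
        return False, ALONE[text]
    raise ValueError("unrecognised response: " + repr(response))


for reply in ("rl", "rq", "r", "", "f6", "q"):
    print(repr(reply), parse_display_response(reply))
```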

COMMAND MODE ->

Accounting

A special command in Cirt which is not part of the search program, therefore it must be used either before you log on to Data-Star or after you log-off (see sample searches). The purpose of the command is to examine the number of City PSS units used and the allocated limit. It is particularly useful toward the end of the month (units are allocated monthly) to ensure there are sufficient units.


-> ADD

The command to add terms to the database. Terms can take any of the following forms:

1. Single natural language terms

->add term1
->add term2 ...

OR a string of terms separated by a space

->add term1 term2 term3...

2. Data-Star search terms

Truncated - psycholog$6
If right-truncation is used and more than 100 terms match, Cirt will simply pass on the Data-Star error message.
Adjacency - "long adj term"
Boolean - "labor or labour"
Specific field - television$Lti.de.

3. Mesh search term facilities

Explosion - lung (hash)
Mesh heading - lung-diseases

N.B. QUOTATION MARKS - If the search term contains blanks then it must be enclosed in quotes. There is no harm in enclosing a term in quotes, so if in doubt use quotes.

After using the command "add" Cirt will respond with

Search set number, term, frequency and weight

-> BReak

In transparent mode, ie when searching Data-Star directly, if you have requested a number of documents to be displayed at the terminal and no longer require all those documents to be transmitted to the terminal, BREAK suppresses further output. It must be used in conjunction with CONTROL P and TALK. In other words, to stop a Data-Star print command type

control p
br
t

This should stop the printing of documents at the terminal. (You should be in Data-Star print mode; type ..s to get out of Data-Star print mode.)

-> CLr

A backup command which clears the call to Data-Star. If ->q or ..o for some reason do not clear the call, type "cl" and this should clear the call.

-> DELete

Terms can be deleted in one of two ways

1. Before the search is executed:

del term1 term2...

OR list (to get the following display)

-> list
Query terms
No.   frequency   weight    rels   text
3.    32          10.5024   0      snobol
11.   38          10.3306   0      bliss
7.    157         8.9119    0      compass
10.   518         7.7182    0      prolog
9.    650         7.4912    0      algol
2.    736         7.3669    0      cobol
8.    1100        6.9651    0      macro
4.    2723        6.0587    0      pascal
6.    3482        5.8128    0      fortran
      101923      2.4362    0
No search
-> del 9 6
term 9 deleted
term 6 deleted

then type:

del 9 OR del 9 6...

2. Deleting after the search is done in the same form as above, but as the term has been searched it will still appear in the listing but with a zero weighting. For example

-> list
Query terms
     No.   frequency   weight    rels   text
S    4.    5           12.6586   0      rubbish
S    2.    68          10.0485   0      thalidomide-ae
S    8.    1075        7.2879    0      asia
S    9.    1539        6.9291    0      america
S    5.    3439        6.1251    0      europe
S    10.   5152        5.7209    0      africa
S    7.    8130        5.2647    0      us
S    3.    10375       5.0209    0      uk
S    6.    19193       4.4057    0      australia
Search tree
No.   seen   weight   terms
3     0      31.33    asia america europe africa us
1     0      31.08    asia america europe africa uk
17    0      30.47    asia america europe africa australia
-> delete 4 8 9 10
-> list
Query terms
     No.   frequency   weight    rels   text
S    4.    5           0.0000    0      rubbish
S    2.    68          10.0485   0      thalidomide-ae
S    8.    1075        0.0000    0      asia
S    9.    1539        0.0000    0      america
S    5.    3439        6.1251    0      europe
S    10.   5152        0.0000    0      africa
S    7.    8130        5.2647    0      us
S    3.    10375       5.0209    0      uk
S    6.    19193       4.4057    0      australia
Search tree
No.   seen   weight   terms
1     0      16.17    thalidomide-ae europe
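In the displays reproduced in this guide, the weight shown for each set in the search tree matches the sum of the weights of the terms it contains, which is why a deleted (zero-weighted) term no longer contributes. The Python sketch below checks that observation against the figures above. It is an inference from the examples rather than a statement of the search algorithm, and the helper name `set_weight` is hypothetical.

```python
# Term weights copied from the "list" display above.
WEIGHTS = {
    "rubbish": 12.6586, "thalidomide-ae": 10.0485, "asia": 7.2879,
    "america": 6.9291, "europe": 6.1251, "africa": 5.7209,
    "us": 5.2647, "uk": 5.0209, "australia": 4.4057,
}


def set_weight(terms, weights, deleted=()):
    """Sum the weights of `terms`, counting deleted terms as zero."""
    return sum(0.0 if term in deleted else weights[term] for term in terms)


# Top set before the deletion, and the remaining top set after it.
print(round(set_weight(["asia", "america", "europe", "africa", "us"], WEIGHTS), 2))
print(round(set_weight(["thalidomide-ae", "europe"], WEIGHTS), 2))
```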

-> NewWeights

Permits a recalculation of the weights based on relevance feedback information obtained from tagging documents after the first search. This command applies only after the first search. The following information is displayed.

-> nw
There are 3 known relevant documents
old wt.   new wt.   rels.
6.30      8.24      3       cigarette$2
5.10      7.04      3       smoking
4.20      4.69      2       long adj term
4.19      6.12      3       lung£

-> Out

The log-off command to log-off Data-Star

-> Quit

The log-off command to log out of Cirt

-> RESET

If you wish to do the same search or start a new and totally different search the only method for doing this is to use the command RESET. This command offers several facilities: at the end of a search or after you have abandoned a search RESET allows you to re-enter the previous search, execute a completely new subject search, or to change databases. Those databases offered by Cirt are MEDL, MEZZ, PSYC, INSP and INZZ.

-> Search

This command automatically sends sets of boolean statements to Data-Star which are prescribed by the search algorithm (Robertson and Bovey, 1983). While the search is automatically being executed the terminal prompts the user with

Searching

Full stops are added after the word "Searching" to show the user the search is in progress. Once the search is complete you will be prompted with

Searching Searched

The length of time required for each search varies considerably and is dependent on host usage, response time, and the number of terms used for the enquiry. The search time increases exponentially in relation to the number of terms used - the more terms the longer it takes. The search procedure is designed in such a way that it never needs to resend any search statement. Hence if you type "search" after just completing a search the program will simply list the results of the search again. Alternatively if you execute a few commands (such as adding terms) then "search" again, the new search will bring it all up to date. If after looking at, printing or ignoring some documents you get the message "search exhausted"

-> LIST

Lists the terms presently active on this search. Presents the following display

Query terms
     No.   frequency   weight    rels   text
S    4.    5           12.6586   0      rubbish
S    2.    68          10.0485   0      thalidomide-ae
S    8.    1075        7.2879    0      asia
S    9.    1539        6.9291    0      america
S    5.    3439        6.1251    0      europe
S    10.   5152        5.7209    0      africa

-> LogIn

The command used to log-in to Data-Star. This command is always used in conjunction with the host name. The standard abbreviation for this command is

li dstar

-> Look

After searching it is possible to look at the documents one set at a time using the command

-> l or look

Cirt will respond with the number of documents in the set, the weight of all the documents within that set and the individual terms and their weights. In addition you will be asked if you wish to "ignore, print, look or quit" (all responses can be abbreviated to the first letter ie i, p, l or q).

l or look - This "look" refers only to the current document not to the whole set. If you choose this option the title will automatically be displayed, and you will receive a ?? Cirt display mode prompt. If you wish to see additional fields in this citation type "f" which will display the six options available. They are:

1.Abstract      4.Language
2.Descriptors   5.Source
3.Year          6.Author & Source

It is possible to skip this display and indicate directly the additional field you require. This can be done by adding the corresponding number of the desired field after typing "f" ie "f1" would display the abstract, "f6" would display author and source etc.

ignore - This will ignore the whole set and skip to the next set.

print - Print will print the whole of the set off-line in full format.

quit - This command will return you to command mode ->

then search again to attempt to gather some more documents to look at (See search guide page for further details).

-> SET SearchSize=n

This command permits you to increase the number of documents Cirt gathers for display. After the search has been executed the program seeks a minimum number of 15 documents for display. For instance if you searched four terms Cirt progresses through the sets until it has a minimum of 15 documents. For example:

Search tree
No.   seen   weight   terms
3     0      31.33    asia america europe africa us
1     0      31.08    asia america europe africa uk
17    0      30.47    asia america europe africa australia
There are now 21 documents ready for display, that is 6 more than the minimum of 15. Another example could look like this:

No.   seen   weight   terms
1     0      16.36    health uk funding care
62    0      15.27    health funding

There are now 63 documents ready for display, that is 48 more than the minimum of 15. If you require more than 15 documents for display it is possible to adjust Cirt with the command

-> SET SS=n

where n is any number of documents over the default of 15 set by Cirt.

For example: SET SS=50 will look for a minimum of 50 documents for display. Therefore the results of a search may look something like this:

No.   seen   weight   terms
2     0      26.43    embolism stroke thrombosis infarction myocardial
1     0      25.87    embolism angina thrombosis infarction acute
4     0      24.93    embolism angina infarction myocardial acute
1     0      24.48    embolism stroke infarction myocardial acute
12    0      24.27    embolism thrombosis infarction myocardial acute
6     0      24.20    angina stroke infarction myocardial acute
22    0      23.99    angina thrombosis infarction myocardial acute
3     0      23.54    stroke thrombosis infarction myocardial acute

There are now 51 documents gathered for display, 1 more than your required 50. However, this procedure is not normally necessary. If you look at, print or ignore all the gathered documents, executing search again will cause Cirt to gather the next 15 documents anyway.

-> Talk

Permits the user to talk directly to Data-Star. In order to make certain you are in Data-Star search mode it is best to type

..s

which will respond with the usual Data-Star search mode prompt ie

6_:

To return to Cirt simply press CONTROL P, which will prompt with Cirt command mode ->

DISPLAY MODE ??

?rl - indicates the present document is relevant and displays the title of the next document

?rq - indicates the present document is relevant, and quits the set to return to command mode ->

?r - indicates the present document is relevant and asks what you would like to do with the entire set ie ignore, print or quit.

?f - permits the display of additional fields. There are six possible options with corresponding numbers:

1.Abstract      4.Language
2.Descriptors   5.Source
3.Year          6.Author & Source

It is possible to skip this display and indicate directly the additional field you require. This can be done by adding the corresponding number of the desired field after typing "f" ie "f1" would display the abstract, "f6" would display author and source etc.

?l - displays the title of the next document

?i - ignores the whole set and skips to the next set

?p - prints the whole set off-line in full format.

?q - returns to command mode ->

?CR - asks what you would like to do with the entire set ie ignore, print or quit

**********************************

CONTROL C

Not a Cirt command. Used for abandoning a search while in progress. If for any reason you need to abandon a search at the message

Searching

simply press CONTROL C. This will break the search and send the following message

Search interrupted
Call talk to clear any remaining input or continue
->

You respond with the command "talk", which puts you into Data-Star search mode ie

1_:

Here you respond CONTROL P, which will put you back into Cirt command mode ->. In summary - to abandon a search your responses are:

control c
t
control p

Then use RESET to "reinitialize" the search tree.

CONTROL P

Not a Cirt command. Used after the command "talk" to return you to Cirt.

Limits

Applicable to Medline searches only. If you are searching another database you need only reply N or No and the program will skip this section and prompt with the Cirt command mode prompt ->. Now you can start to build your enquiry (see next section). If you are going to search Medline and require limits for Year, Language, Human, Animal, Female, Male or any other limits, you need only reply Y or Yes to this request for limits. There are four types of responses to limit requests.

Specify - this response requires a statement of the type of limit you want. It applies to YR, LG or "other limits".

Y or Yes - a positive limit; for instance everything written pertaining to females

N or No - a negative limit; do not want anything written on animals

CR - skips to the next limit; this limit does not apply

Limits and their appropriate responses are:

YR - Specify any two digit year ie 84, or Carriage Return (This will include all documents from the specified year to the most current. It is not possible to specify only one isolated year unless it is the most recent year).

LG - Specify or Carriage Return (All Medline language abbreviations apply. If you are making a list put a comma and a space between each abbreviation. In addition it is possible to exclude a language by specifying "not" before the language ie "not fr" or "not gr")

human - Yes, No or Carriage Return
animal - Yes, No or Carriage Return
female - Yes, No or Carriage Return
male - Yes, No or Carriage Return

other limits - Specify or Carriage Return (This includes any applicable Medline check tag or negative boolean statement, such as adolescence, or not electron)

SAMPLE SEARCHES

Jan 27 15:59 1986   500 Page 1

Enter id for offline prints- cathy
Enter query identifier- 500
Enter y or Y for boolean search; n, N or RETURN for weighted search-
-> li dstar
call established
D A T A - S T A R , PLEASE ENTER YOUR USERID :raluab
ENTER YOUR A-M-I-S PASSWORD
OOOOOOOOXXXXXXXX********
- TELEPAC NUA 464110115 IS OVERLOADED, PLEASE USE 484110115. THANK YOU!
- TWO NEW PREDICASTS DATABASES (PTNP/PTMA) ONLINE NOW. SEE BROADCAST.
- FAIRBASE, THE DIRECTORY ON TRADE FAIRS. ONLINE NOW. SEE BROADCAST.
- MESH AND TREE 1986: SEE BROADCAST.
ENTER YES IF BROADCAST MSG IS DESIRED.:
ENTER DATA BASE NAME_: medl
Login successful
Any limits ?n
-> add cigarette$2 smoking 'long adj term' lung£
2. 'cigarette$2' added, freq=2897, weight= 6.2966
3. 'smoking' added, freq=9615, weight= 5.0969
4. 'long adj term' added, freq=23636, weight= 4.1975
5. 'lung£' added, freq=23756, weight= 4.1924
-> s
Searching
Searched
Search tree
No.   seen   weight   terms
13    0      19.78    cigarette$2 smoking long adj term lung£
89    0      15.59    cigarette$2 smoking long adj term
-> add smoking.ti.
9. 'smoking.ti.' added, freq=2627, weight= 6.3944
-> s
Searching
Searched
Search tree
No.   seen   weight   terms
1     0      26.18    cigarette$2 smoking long adj term lung£ smoking.ti.
35    0      21.99    cigarette$2 smoking long adj term smoking.ti.
-> l
1 documents with weight 26.18
cigarette$2(6.3) smoking(5.1) long adj term(4.2) lung£(4.2) smoking.ti.(6.4)
ignore, print, look or quit? l
TI Morphologic and morphometric effects of prolonged cigarette smoking on the small airways.
?? f6
AU Cosio-M-G, Hale-K-A, Niewoehner-D-E.
SO Am-Rev-Respir-Dis 1980 Aug, VOL: 122 (2), P: 265-21, ISSN: 0003-0805.
?? f1
AB We studied lungs from 25 smokers and 14 lifelong nonsmokers, all over 40 yr of age, to examine the relationship of long-term cigarette smoking to histopathologic changes in the small airways. Despite considerable overlap between the 2 groups, smokers had a significantly higher score (p < 0.01) for small airway disease. The specific morphologic features separating smokers from nonsmokers were increases in goblet cell metaplasia (p < 0.001), smooth muscle hypertrophy (p < 0.05), inflammation in the walls of bronchioles (p < 0.01), and respiratory bronchiolitis (p < 0.001). The average bronchiolar diameter was not significantly different in smokers compared with nonsmokers; however, smokers had an excess of airways less than 400 microns in diameter (p < 0.03). Among smokers, the severity of small airway disease correlated with the percentage of airways that are less than 400 microns in diameter (rs = 0.63) and with the extent of centrilobular emphysema (r = 0.53). Smokers also had an increase in the proportion of bronchial gland mass (p < 0.05), but this pathologic feature was not related to the severity of either small airway disease or centrilobular emphysema. We concluded that prolonged cigarette smoking is associated with progressive pathologic changes in the small airways that may be an important cause of airflow obstruction and that may predispose to the development of centrilobular emphysema. Author.
35 documents with weight 21.99
cigarette$2(6.3) smoking(5.1) long adj term(4.2) smoking.ti.(6.4)
ignore, print, look or quit? q
-> reset
Enter data base name
->
Any limits ?n
-> add cisplatin-ae heart£
2. 'cisplatin-ae' added, freq=637, weight= 7.8112
3. 'heart£' added, freq=42904, weight= 3.6013
-> s
Searching....
Searched
Search tree
No.   seen   weight   terms
2     0      11.41    cisplatin-ae heart£
635   0      7.81     cisplatin-ae
-> l
2 documents with weight 11.41
cisplatin-ae(7.8) heart£(3.6)
ignore, print, look or quit? l
TI Combined intravenous and intra-arterial cyclophosphamide, doxorubicin, and cisplatin (CISCA) in the management of select patients with invasive urothelial tumors.
?? rl
TI High-dose cis-platinum in combination with adriamycin in the treatment of ovarian carcinoma.
?? r
635 documents with weight 7.81
cisplatin-ae(7.8)
ignore, print, look or quit? q
-> add female

7. 'female' added, freq=529019, weight= 1.0892
-> nw
There are 2 known relevant documents
old wt.   new wt.   rels.
7.81      9.42      2       cisplatin-ae
3.60      5.18      2       heart£
1.09      2.29      2       female
-> s
Searching
Searched
Search tree
No.   seen   weight   terms
363   0      11.71    cisplatin-ae female
-> reset
Enter data base name
->
Any limits ?n
-> add cigarette$2 smoking 'long adj term' lung£
2. 'cigarette$2' added, freq=2897, weight= 6.2966
3. 'smoking' added, freq=9615, weight= 5.0969
4. 'long adj term' added, freq=23636, weight= 4.1975
5. 'lung£' added, freq=23756, weight= 4.1924
-> s
Searching
Searched
Search tree
No.   seen   weight   terms
13    0      19.78    cigarette$2 smoking long adj term lung£
89    0      15.59    cigarette$2 smoking long adj term
-> l
13 documents with weight 19.78
cigarette$2(6.3) smoking(5.1) long adj term(4.2) lung£(4.2)
ignore, print, look or quit? l
TI Clearance of polonium-210-enriched cigarette smoke from the rat trachea and lung.
?? rl
TI Factors affecting the 'alveolar deposition' of 5 microns inhaled particles in healthy subjects.
?? rl
TI The pulmonary alveolar macrophage.
??
10 docs remaining in this set:
ignore, print, look or quit? p
89 documents with weight 15.59
cigarette$2(6.3) smoking(5.1) long adj term(4.2)
ignore, print, look or quit?
Search exhausted
-> s
Searching
Searched
Search tree
No.   seen   weight   terms
205   0      15.59    cigarette$2 smoking lung£
-> l
205 documents with weight 15.59
cigarette$2(6.3) smoking(5.1) lung£(4.2)
ignore, print, look or quit? l
TI Endobronchial foreign body (cigarette filter tip) with inflammatory pseudotumor of the lung. A 'true' xanthogranuloma.
?? rl
TI Nicotine and cigarette smoking: an alternative hypothesis.
??
203 docs remaining in this set:
ignore, print, look or quit? q
-> nw
There are 3 known relevant documents
old wt.   new wt.   rels.
6.30      8.24      3       cigarette$2
5.10      7.04      3       smoking
4.20      4.69      2       long adj term
4.19      6.12      3       lung£
-> add asbestos
13. 'asbestos' added, freq=1480, weight= 6.9682
-> s
Searching
Searched
Search tree
No.   seen   weight   terms
17    0      28.37    cigarette$2 smoking lung£ asbestos
-> reset
Enter data base name
->
Any limits ?n
-> add cigarette$2 smoking 'long adj term' lung-diseases£
2. 'cigarette$2' added, freq=2897, weight= 6.2966
3. 'smoking' added, freq=9615, weight= 5.0969
4. 'long adj term' added, freq=23636, weight= 4.1975
5. 'lung-diseases£' added, freq=48551, weight= 3.4776
-> s
Searching
Searched
Search tree
No.   seen   weight   terms
19    0      19.07    cigarette$2 smoking long adj term lung-diseases£
-> o
*CONNECT TIME MEDL: 0:08:29 HH:MM:SS 0.141 DEC HRS. SESSION 1102*
*SIGN-OFF 16.58.18 27.01.86
Clearing the call
Hours 0, Minutes 9, Seconds 51, Rx 274, Tx 71
Cost of call ( ) 0.00
-> q


Enter id for offline prints- mjm Enter query identifier- ql23 Enter y or Y for boolean search; n,N or RETURN for weighted search-> ac f call established "lou have used 10146 units (max: 75000 units) •> li dstar c a l l established D A T A - S T A R , PLEASE ENTER YOUR USERID : r a l u a b ENTER YOUR A-M-I-S PASSWORD O0000OD0XXXXXXXX******** - TELEPAC NUA 464110115 IS OVERLOADED, PLEASE USE 484110115. THANK YOU! - SATURDAY 25. JAN: DUE HARDWARE INSTALLATIONS, D-S WILL NOT BE AVAILAE:LE FROM 0630 TO 1900 HOURS SWISS TIME. - TWO NEW PREDICASTS DATABASES TO BE LAUNCHED ON JANUARY 20, 1986. CONSULT BROADCAST. - FAIRBASE, THE DIRECTORY ON TRADE FAIRS, EXHIBITIONS, CONFERENCES ONLINE FROM JANUARY 27, 1986, SEE BROADCAST. - DATAPRO PC-SOFTWARE DATABASE ONLINE NOW. SEE BROADCAST. - MUNZINGER-LAENDERARCHIV ONLINE NOW. SEE BROADCAST. - MESH AND TREE 1986: SEE E-'ROADCAST.' ENTER YES IF BROADCAST MSG IS DESIRED.: RESTART IS POSSIBLE FOR DATABASE ' MEDL' REPLY 'Y' IF DESIRED - IF NOT, PRESS CARRIAGE RETURN : ENTER DATA BASE NAME.: medl Login successful Any limits ?y YR? 80 LG^' FR, EN human? animal? female? male? other limits? -> add drug-ae drug-ae not found -> add drugfl 'long adj term adj effects' 5. •drugfl1 added, freq=63477, weight= 2. 9623 6. 'long adj term adj effects' added, freq=871, weight= 7.2511 -> s Searching. . . Searched Search tree No. seen weight terms 113 0 10.21 long adj term adj effects drugil -> 1 113 documents with weight 10.21 long adj term adj effects(7.3) drugfl(3. 0) ignore, print, look or quit? 1 Tj Enalapril in essential hypertension.


APPENDIX

A5-39

112 docs remaining in this set:
ignore, print, look or quit? q
-> 1
112 documents with weight 10.21
long adj term adj effects(7.3) drug$1(3.0)
112 docs remaining in this set:
ignore, print, look or quit? 1
TI The long-term effects of cranial irradiation on the central nervous system.
?? 1
TI Long-term effects of neuroleptic drugs on the neuroendocrine system.
?? q
-> add 'nervous system'
8. 'nervous system' added, freq=73375, weight= 2.8174
-> delete 8
term 8 deleted
-> add 'nervous adj system'
9. 'nervous adj system' added, freq=10259, weight= 4.7848
-> s
Searching.......
Searched
Search tree
No. seen weight terms
7 0 15.00 long adj term adj effects drug$1 nervous adj system
13 0 12.04 long adj term adj effects nervous adj system
-> 1
7 documents with weight 15.00
long adj term adj effects(7.3) drug$1(3.0) nervous adj system(4.8)
ignore, print, look or quit? 1
document 1 already seen
TI Long-term behavioral effects of phenobarbital in suckling rats.
?? 1
TI Nadolol: evidence for sympathetic nerve inhibition by a beta blocker in essential hypertension.
?? r
4 docs remaining in this set:
ignore, print, look or quit? 1
TI Haemodynamic consequences of intrinsic sympathomimetic activity in relation to changes in plasma renin activity and noradrenaline during beta-blocker therapy for hypertension.
?? rl
TI Time course of regression of left ventricular hypertrophy in treated hypertensive patients.
?? q
-> add beta-blocker$1
beta-blocker$1 not found
-> add beta-blocker
beta-blocker not found
-> 1
2 documents with weight 15.00
long adj term adj effects(7.3) drug$1(3.0) nervous adj system(4.8)
2 docs remaining in this set:

APPENDIX

A5-40

ignore, print, look or quit? 1
TI Neurotoxicity of industrial chemicals and contaminants: aspects of biochemical mechanisms and effects.
?? 1
TI Pharmacologic considerations in the therapy of neonatal apnea.
13 documents with weight 12.04
long adj term adj effects(7.3) nervous adj system(4.8)
ignore, print, look or quit? 1
TI Transient enhancement of sympathetic nervous system activity by longterm restriction of sodium intake.
?? 1
TI Autonomic neuroeffector mechanisms in smooth muscle.
??
11 docs remaining in this set:
ignore, print, look or quit? 1
TI Long term effects of exposure to viral infections in utero.
?? fl
AB An analysis was conducted of the major findings of a long term follow up study of 3076 subjects who were exposed to viral infections in utero and who at the time of analysis were up to 40 years of age. Mortality and morbidity were compared with those in a control population matched for sex and date and area of birth. An excess of cancers (16 cases against seven) appeared to be clustered among those exposed to herpes viruses (varicella or cytomegalovirus). There was evidence of an increased risk of diabetes among those exposed to mumps during the first trimester (four cases among 128 subjects against none in 148 controls). The most surprising finding was a decrease of diseases of the skin and subcutaneous tissue and of the nervous system among subjects exposed to antenatal varicella zoster infection. The mechanism for the association may include production of fetal anti-idiotype antibodies in response to transplacentally acquired maternal autoantibodies. Author.
?? q
-> reset
Enter data base name
->
Any limits ?n
-> add sclerosis
2. 'sclerosis' added, freq=5281, weight= 5.6961
-> add lateral
3. 'lateral' added, freq=13615, weight= 4.7491
-> s
Searching...
Searched
Search tree
No. seen weight terms
461 0 10.45 sclerosis lateral
-> add amytrophic
amytrophic not found
-> add neuromuscular
6. 'neuromuscular' added, freq=3265, weight= 6.1770
-> s

APPENDIX A5-41

Searching....
Searched
Search tree
No. seen weight terms
28 0 16.62 sclerosis lateral neuromuscular
-> add niven
niven not found
-> 1
28 documents with weight 16.62
sclerosis(5.7) lateral(4.7) neuromuscular(6.2)
ignore, print, look or quit? 1
TI (Results of muscular x-ray computed tomography in 145 cases of neuromuscular disease).
?? f 2
DE Adolescence; Adult; Amyotrophic Lateral Sclerosis/radiography; Comparative Study; Diagnosis, Differential; English Abstract; Human; Hypertrophy/radiography; Muscles/*radiography; Muscular Atrophy/radiography; Muscular Dystrophy/familial & genetic, radiography; Myositis/radiography; Myotonia Atrophica/radiography; Neuromuscular Diseases/classification, *radiography; Peripheral Nerve Diseases/radiography; Tomography, X-Ray Computed/*.
?? q
-> add amyotrophic
9. 'amyotrophic' added, freq=444, weight= 8.1722
-> s
Searching
Searched

Search tree
No. seen weight terms
28 0 24.79 sclerosis lateral neuromuscular amyotrophic
-> 1
28 documents with weight 24.79
sclerosis(5.7) lateral(4.7) neuromuscular(6.2) amyotrophic(8.2)
ignore, print, look or quit? 1
document 1 already seen
TI (Computed tomography of the skeletal muscles in neuromuscular diseases).
?? rl
TI Analog specificity of the thyrotropin-releasing hormone receptor in the central nervous system: possible clinical implications.
?? rq
-> 1
25 documents with weight 24.79
sclerosis(5.7) lateral(4.7) neuromuscular(6.2) amyotrophic(8.2)


Jan 20 13:07 1986


A5-42

25 docs remaining in this set:
ignore, print, look or quit? p
Search exhausted
-> reset
Enter data base name
->
Any limits ?n
-> t ..d type po
11027 THE FOLLOWING PRINTOFF QUERIES FOUND:
Q0029 Q0030
**** END OF DISPLAY ****
D-S - SEARCH MODE - ENTER QUERY
2_: ..purge q0029 q0030
11064 PRINTOFF QUERY Q0029 PURGED
11064 PRINTOFF QUERY Q0030 PURGED
D-S - SEARCH MODE - ENTER QUERY
2_:
-> o
*CONNECT TIME MEDL: 0:33:26 HH:MM:SS 0.557 DEC HRS. SESSION 1084*
*SIGN-OFF 13.59.30 20.01.86
Clearing the call
Hours 0, Minutes 34, Seconds 3, Rx 334, Tx 67
Cost of call ( ) 0.00
-> ac
call established
Killed with signal 3
stop child reading from network
get child to clear call

APPENDIX A5-43

NEW COMMANDS IN CIRT

Six new commands have been added to Cirt. They provide several capabilities. Firstly, they enable you to add terms offline before you log on, thereby saving online time. Secondly, they let you save searches and execute them subsequently on other databases, either at the same session or at a later date. Lastly, to overcome the problems involved in limiting by year in MEZZ (this has been overloading the system), the new commands will let you save searches and then execute them on the smaller databases, ie

ME74 1966 - 1974
ME82 1975 - 1982
MEDL 1983 -

N.B. All commands are listed here in upper case; this is purely for emphasis. Cirt will accept either upper or lower case.

ADDING TERMS OFFLINE

The command for this is Add Off Line

AOF

The most appropriate time to use this command is just after you have been asked which type of search you hope to do, ie Cirt or Boolean (see Cirt Users Manual 4.1.1 page 4). Having added the terms you then log on to Data Star. You will get the usual Data Star response and be requested to designate any limits. When the command mode prompt -> appears it is advisable to check that your terms are there, with the command

OQ (Old Query)

SENDING THE TERMS DOWNLINE TO DATA STAR

If the terms are satisfactory, send them down the line to Data Star with the command ADD ALL. If only a few of the terms are appropriate, it is possible to select only those terms to be sent to Data Star by typing in ADD and the corresponding numbers, eg ADD 1 3 etc. You then proceed with the search in the usual way, typing S to search or adding additional terms. The whole process should look something like this:
f-ntpr i d for o f f l i n e p r i n t s - « j n K n t p r HijMPij i d e n t i f i e r - n i c k E n t e r n o r Y f o r h o o l e a n s e a r c h ; n, U ov RFTURN I n r wfMrjMf><l s e . * r c h - > a n t ^ h p a r t s t r o d e v a ) ^ t h r o n h o s i s p*hn»]«*» i n f a r c t i o n
- > Or,

rpls 1
0

X

rels bean
v/d)vP

. 0

0 0 0 0

n

.n

-> l i cktai r a i l established n A T A - S T A R ,

n o

0 0

stroke
t hronbosis
enbolISM

infarct ion
i F'LFASF ENTER YOUR USE^TP - T - l u - h

i MfFR YfUM.' A - M - - ] - ^ F'A^SIJCIRD

ncunonnoxxxxxxxx********
- MONDAY 1?. MAY 199<S: DUr TECHNICAL MAINTENANCE, f>-S WIU WAI1ABJ.F FROM 0<S30 TO 0^00 HOURS SWISS TIME, ENTER YES IF BROADCAST MSO IS DESIRED : FNTER DATA fcASE NAME_: nedl I onin successful 7 APM 1i«its n ->"add I ? 6 ?. 'heart* added, freq-3n076, weioht= : . 32 W < 3. Stroke' adde^ fren=3QM, weight* f . 415B l 4. 'infarction' added, ' reo=£Sl7, weinht= 4. tf-313 f -> s NOT f-F.

APPENDIX
Searching.. . . Searched

A5-44

Search tree
No. seen weight terms
133 0 13.59 stroke infarction heart
-> 1
133 documents with weight 13.59
stroke(5.4) infarction(4.8) heart(3.3)
ignore, print, look or quit? 1
TI A prognostic comparison of asymptomatic left ventricular hypertrophy and unrecognized myocardial infarction: the Framingham Study.
??
132 docs remaining in this set:
ignore, print, look or quit?
Search exhausted
There are 1 known relevant documents
old wt. new wt. rels.
5.4?    6.51    1     stroke
4.84    5.94    1     infarction
3.33    4.43    1     heart

CHANGING DATABASES

After a RESET, OQ will give you a list of your query terms as they were before the reset. Again you can ADD ALL or ADD 1 2 ... to repeat the search on the new database.

SAVING SEARCH TERMS FOR LATER SESSIONS

If you wish to log off completely and come back to the query at a later date, this is possible with the combination of the SAVE and RESTORE commands, in this manner. Immediately after you log off Data Star (->O) or change database using RESET, save the query with ->SAVE (file name of up to 10 letters), eg SAVE MIKE, then log off Cirt (->Q).
- ) rpset No. of documents in o f f - l i n e p r i n t set is 1 Fnter data base name • CONNI-CT TTMF MFPM.: 0:04:57 HH:MM:SS 0. ilfn HFr MRS. Anij l i n i t s ?n
- . ci^p
%

SESSION

1408*

n jrt

- * nq

rels t rels 1.1 1 st roke ?, 1 1 infarction 3. 1 1 heart *rn{WrT TIME DFMM: 0:00:45 HH:MM:S* 0 O H Oi-T: HK'S. SFS^IDN *SIGN-0FF 10.53.14 0V. 05 RAflen ln^j tb*» rail Hours 0, Minutes 6, Seconds 4, Rx 61, T* )&

1409*

SEARCHING AT A LATER DATE

At the new session, before you log on to Data Star (ie after you have been asked what type of search you want - Boolean or weighted), type in RESTORE and the file name. This will bring back your previously saved file and restore it ready for use. To be on the safe side you can get a listing of the terms saved by typing OQ. Then log on to Data Star and specify limits, send the terms to Data Star with the command -> ADD ALL and lastly search -> S

F m e r \d for o f f l i n p p r i n t s - n j n APFEMDIX A 5 - 4 5 F O I P T qupru lriei.t i f i p r mrk E n t # r IJ o r ~ Y f o r b o o l e a n s e a r c h ; n, N o r RETURN f o r w e i g h t e d ^ p ^ r r j . -> rpstore njn - ) oq rvl«= t re Is 1 i 1 stroke ? \ 1 infarction .>. ) 1 heart -> 1\ dMar cal1 established D A T A - S T A R , PLFASE ENTER YOUR USERID : r a l u a b FNTFP YOUR A - H - I - S PASSUORD

orv"innonrr<xY*xyxx********
- MONDAY 1?. MAY 198*: DUE TECHNICAL MAINTENANCE, D-S WILL NOT BE AVAII AW E FROM 0630 TO 0900 HOURS SWISS TIME. EN7FR YE^ TF BROADCAST MSG IS DESIRED \ ENTFR DATA RASE NAME_: nedl Login successful
Ang 1 i m t s ?n base na«p

-> reset
Enter data

- * nrrz

0:00:14 HH:MM;SS 0. 004 DEC HRS. •CONNFCT TIME MFDL: 9 Ann limits g LP? en hunan7 aninal? fe^ale^ nale? Oth*T limits? -> add all ? •strode' added, freq=9877, weight* 6.9051 3. 'infarction' added, freg=18195, weinht= 6.2917 4. 'heart" added. freq=37785, weight* 4.6965 -> s Searching. . . . Searched

SESSION

1413*

Search tree weinht terns No sp^n 17.fl9 201 0 stroke infarction heart * C O 3 N F C T TIMF MF77: 0:08:18 HH:MM:SS 0. 138 DEC HRS. SESSION *SIGN-OFF 12.44.34 09.05.86Clearing the call Hours 0, Minutes 8, Seconds 56, Rx 36, Tx 13

1414*

LISTING AND PURGING EXISTING FILES

The command UNSAVE provides two functions:
1. When entered with the file name of one of your saved files it will erase the file.
2. Typed in on its own it will provide a list of all the file names currently being used.

SUMMARY

aof          Adding terms offline; permits you to type in terms
             before logging on.

oq           Displays the terms and relevance information currently
             held in the "old query terms table".

add all      Sends the saved query down line to Data Star, then lists
             the term frequency and weighting information usually
             expected from the "add" command.

save         Specifies a file into which the present query will
(file name)  be saved.

unsave       1. With a file name erases the specified file.
             2. Without a file name provides a list of current files
                holding saved queries.

restore      1. Brings back a previously saved query for use during
                the current session.
             2. Without a file name provides a list of current files
                holding saved queries.
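The behaviour of the six commands summarised above can be sketched as a small model. This is only an illustration in Python, not Cirt's own code (Cirt is a C program); the class name SavedQueries, its method names, the in-memory dictionary standing in for saved files, and the 1-based term numbering are all assumptions made for the example.

```python
class SavedQueries:
    """Hypothetical model of the old query table and the saved-query files."""

    MAX_NAME = 10  # the manual allows file names of up to 10 letters

    def __init__(self):
        self.old_query = []  # terms typed offline or kept from before a reset
        self.files = {}      # saved queries keyed by file name (assumed store)

    def aof(self, *terms):
        """AOF: add terms offline, before logging on."""
        self.old_query.extend(terms)

    def oq(self):
        """OQ: list the terms held in the old query table."""
        return list(self.old_query)

    def add_all(self):
        """ADD ALL: every old query term would be sent down the line."""
        return list(self.old_query)

    def add(self, *numbers):
        """ADD n ...: select terms by number (1-based numbering assumed)."""
        return [self.old_query[n - 1] for n in numbers]

    def save(self, name):
        """SAVE name: store the present query under a file name."""
        if len(name) > self.MAX_NAME:
            raise ValueError("file name longer than 10 letters")
        self.files[name] = list(self.old_query)

    def restore(self, name):
        """RESTORE name: bring a saved query back for the current session."""
        self.old_query = list(self.files[name])

    def unsave(self, name=None):
        """UNSAVE name erases a file; UNSAVE alone lists the file names."""
        if name is not None:
            del self.files[name]
        return sorted(self.files)


session = SavedQueries()
session.aof("heart", "stroke", "infarction")
session.save("mike")
session.old_query = []      # stands in for logging off and starting again
session.restore("mike")
print(session.oq())         # terms are back, ready for ADD ALL
```

The model leaves out everything that happens on the Data Star side (frequencies, weights, relevance information); it only shows how the old query table and the saved files relate to each other.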

APPENDIX

A6-1

OUTLINE OF PROCEDURES

When the user comes requesting a search, briefly check that:
   a. it is a subject search
   b. you will be present
   c. the database you want is available on Cirt.

Explain briefly about the experiment, give them the introductory letter and have them sign it.

Draw the random card
   a. write their name clearly on the card
   b. staple the numbered card to the signed introductory letter

Write the query number on the top right corner of the purple presearch questionnaire. Ask the user to fill in the form, perhaps while you are both discussing the search.

Do the search - see Cirt manual for details of executing a search and printing offline.

If for any reason you do not go online, reuse the random card; alternatively, if you go online and the search fails for any reason, consider this a failed search and do not reuse the random card.

Write the same query number on the blue and green questionnaires. Give the blue one to the user. Fill in the green one yourself.

When the off-line prints come, enclose the pink form, and make sure the query number is on the copy of the prints to be evaluated.

Cross fingers and hope they return the offline prints.

Keep this mass of paper together so I can come and collect it.

Many Thanks,

APPENDIX A6-2 Random Allocator Cards

APPENDIX

A7-1

THE CITY UNIVERSITY
Northampton Square London EC 1V OHB telephone: 01-253 4399 telex: 263896 Department of Information Science Head: Professor R. T. Bottle

The City Front End Project is a two year research grant funded by British Library Research and Development Department, headed by Dr. S. E. Robertson of the Department of Information Science at City University. The purpose of the project is to conduct an experiment in an operational environment comparing weighted retrieval with traditional Boolean retrieval. The weighted retrieval data will be provided by a front end system "Cirt" which was created under another BLR&D grant by Robertson and Bovey (1983). The Boolean retrieval data will be supplied from conventional searching using Data-Star. The project will consist of collecting a large number of searches randomly designated to either "Cirt's" weighted retrieval or traditional Boolean retrieval. The searches will be evaluated and the results compared to see whether there is a significant difference between the two systems. Your participation in the project will involve completing a questionnaire in three parts:

1. The first part will be given before the search, indicating your expectations of the search.

2. The second part will be given just after finishing the online search. This will provide us with background information and most importantly your assessment of certain aspects of the search.

3. The last part will involve evaluating a copy of the offline prints.

This research is registered under the Data Protection Act. All the information will be STRICTLY CONFIDENTIAL and used for no other purpose than the experiment. The data will be held on a computer only for the duration of the experiment and will be used for statistical processing. No individuals will be identified in the final report. If you are willing to participate would you please sign the bottom of this form. Many thanks for your cooperation.

Signed

Date

APPENDIX

A7-2

Query no

QUESTIONNAIRE

CONFIDENTIAL

Instructions: Please answer each question. We encourage comments on the back, but do please number your comments so we can match them to the question.

1. NAME
   DEPARTMENTAL ADDRESS
   DAYTIME TELEPHONE
   STATUS:   Post Graduate   Consultant   Researcher   Doctor   Lecturer
             Other please specify

2. How do you intend to use this information? eg teaching, research, patient care etc

3. Indicate your general assessment of the NATURE of your SUBJECT ENQUIRY
   General     Vague or Waffly     Precise or Accurate

4. What type of search do you require?
   BROAD - ie all the references on a subject including peripheral material.
   NARROW - ie only very specific references

5. Have you had online searches done for you before?   Yes   No
   If YES about how many?

6. Have you done an online search on your own without an intermediary?   Yes   No
   If YES about how many?

APPENDIX A7-3
Query no QUESTIONNAIRE

CONFIDENTIAL

Instructions: Please answer each question from 1 - 6 and EITHER 7 OR 8. We encourage comments on the back, but do please number your comments so we can match them to the question.

1. Indicate your SATISFACTION with the search on the basis of the scale below.
   Excellent   Good   Satisfactory   Poor   Bad

2. Please provide a general assessment of the SEARCH.
   Easy   Average   Difficult

3. Generally speaking were the RESULTS of the search.
   Excellent   Good   Satisfactory   Poor   Bad

4. Please assess the SEARCHER'S CONTRIBUTION to the search.
   Essential   Helpful   Satisfactory   Poor   Bad

5. How close was the online search to your original or intended enquiry?
   Exact   Fairly close   Considerably altered

6. Did you GET the number of REFERENCES EXPECTED?
   About as expected   Less than expected   More than expected

***********************************

"CIRT" SEARCHES ONLY 7. Did you mark any seen references as relevant? Yes No

If YES did it appear to make the search more effective? Yes No BOOLEAN SEARCHES ONLY 8. Did you view any references while online? Yes No

If YES did you modify the search on the basis of the references you saw online? Yes No

APPENDIX

A7-4

Query no
CONFIDENTIAL

Instructions:

Please answer all the questions.

1. NAME

2.

Indicate your OVERALL SATISFACTION with the search. Excellent Good Satisfactory Poor Bad

3.

Please provide a general assessment of the SEARCH PROCESS. Easy Average Difficult

4.

Generally speaking were the RESULTS of the search Excellent Good Satisfactory Poor Bad

5. What was the total NUMBER of TERMS in your PRESEARCH strategy ie after the interview but before going online.

6.

Approximately how long did it take to PREPARE the SEARCH (starting when you met the user ending when you went online).

7.

What was your REASON FOR FINISHING the search? Found what was required Technical difficulties The search strategy failed Others please specify

APPENDIX A8-1

Tables of results

The results reported in the following tables relate to the 190 searches eventually obtained (96 Boolean and 94 Weighted). (The totals in individual tables may vary slightly because of occasional missing values.) The tables are numbered to correspond to the sub-sections of section 4 in which they are discussed.

There are two kinds of table. For those that refer to nominal variables, the results are presented as numbers of searches in each category. At the bottom of the table is the value of the chi-square statistic and a significance value (i.e. p value).

For the quantitative variables, the results are first summarised as an overall mean and standard deviation. Then the Mann-Whitney U test is used: the individual observations are ranked, ignoring sign, and the mean ranks for Boolean and Weighted are calculated. The statistic given is the z statistic, being the transformation of the U statistic into a form that is (under the null hypothesis) approximately normal with mean 0 and variance 1. A p value is also given.

The significance criterion used in the text is 5%: i.e. a p value of less than 0.05 is regarded as significant. All these results were derived using the SPSS package.
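As an illustration of how the chi-square figures for the nominal tables can be checked, the following minimal Python sketch recomputes the statistic for the counts printed in table 4.2.2 (49, 33, 14 and 49, 31, 13). It is not the SPSS procedure itself; the function names are hypothetical, and the p value uses the closed form of the chi-square tail that holds only for an even number of degrees of freedom (which covers the two-row tables with three or five categories used here).

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_even_df(stat, df):
    """Upper-tail probability of chi-square; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive, even df")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Counts as printed for table 4.2.2 (Easy, Average, Difficult), one row per group.
stat, df = chi_square([[49, 33, 14], [49, 31, 13]])
print(round(stat, 5), df, round(p_value_even_df(stat, df), 4))
```

The result agrees with the printed values for that table (0.05193 and 0.9744) to the precision shown.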

APPENDIX A8-2

                 Boolean   Weighted
Excellent           31        20
Good                45        47
Satisfactory        14        19
Poor                 6         5
Bad                  0         2

Chi-Square       5.21821
Significance     0.2656

4.2.1  User's Satisfaction with the Search

                 Boolean   Weighted
Easy                49        49
Average             33        31
Difficult           14        13

Chi-Square       0.05193
Significance     0.9744

4.2.2  User's Assessment of the Search

                 Boolean   Weighted
Excellent           20        17
Good                50        48
Satisfactory        19        20
Poor                 5         6
Bad                  2         1

Chi-Square       0.64913
Significance     0.9575

4.2.3  User's Assessment of Results

APPENDIX A8-3

                 Boolean   Weighted
Essential           75        65
Helpful             21        25
Satisfactory         0         3

Chi-Square       4.01550
Significance     0.1343

4.2.4  Searcher's Contribution

                 Boolean   Weighted
Exact               41        36
Close               48        55
Difficult            7         2

Chi-Square       3.53145
Significance     0.1711

4.2.5  Match of Search to Enquiry

                 Boolean   Weighted
Less                21        25
Expected            49        41
More                26        27

Chi-Square       1.03045
Significance     0.5974

4.2.6  Expected References

APPENDIX A8-4

                 Boolean   Weighted
Excellent           12        13
Good                51        52
Satisfactory        28        16
Poor                 4        10
Bad                  0         3

Chi-Square       8.88882
Significance     0.0639

4.3.1  Intermediary's Satisfaction with the Search

                 Boolean   Weighted
Easy                20        31
Average             56        46
Difficult           20        17

Chi-Square       3.57553
Significance     0.1673

4.3.2  Intermediary's Assessment of the Search

                 Boolean   Weighted
Excellent           13        11
Good                51        57
Satisfactory        25        20
Poor                 6         5
Bad                  1         1

Chi-Square       0.64913
Significance     0.9575

4.3.3  Intermediary's Assessment of Results

APPENDIX A8-5

                     Boolean   Weighted
Found Req Results       83        85
Tech Difficulties        0         1
Strategy Failed          2         1
Others                   4         2
Found & Tech Diffs       7         5

Chi-Square           2.33635
Significance         0.6742

4.3.4  Reason for Finishing

APPENDIX A8-6

Mean             69.994
Std. Deviation   57.827

                 Boolean   Weighted
Mean Rank         113.17     66.57
Cases                 90        89

Z                -6.0160
Significance      0.0000

4.4.1  Packet Switching Packets Sent

Mean             22.206
Std. Deviation   16.931

                 Boolean   Weighted
Mean Rank          94.92     85.03
Cases                 90        89

Z                -1.2766
Significance      0.2017

4.4.2  Online Time (minutes)
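The z values in the quantitative tables can be approximately recovered from the printed mean ranks and case counts. The sketch below is a minimal Python illustration using the usual normal approximation to the Mann-Whitney U statistic; it applies no correction for tied ranks (the SPSS figures probably include one), and the function names are hypothetical. It takes the group printed with mean rank 85.03 and 89 cases in table 4.4.2, the other group having 90 cases.

```python
import math


def mann_whitney_z(mean_rank, n, n_other):
    """z for one group from its mean rank; normal approximation, no tie correction."""
    rank_sum = mean_rank * n
    u = rank_sum - n * (n + 1) / 2                    # U statistic for this group
    mean_u = n * n_other / 2                          # mean of U under the null hypothesis
    sd_u = math.sqrt(n * n_other * (n + n_other + 1) / 12)
    return (u - mean_u) / sd_u


def two_sided_p(z):
    """Two-sided p value for a standard normal z."""
    return math.erfc(abs(z) / math.sqrt(2))


z = mann_whitney_z(85.03, 89, 90)
print(round(z, 3), round(two_sided_p(z), 3))
```

Because the mean ranks are printed to only two decimal places and ties are ignored, the result matches the tabulated z (-1.2766) and significance (0.2017) only to about two or three decimal places.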

Mean             28.103
Std. Deviation   25.349

                 Boolean   Weighted
Mean Rank          84.34    100.66
Cases                 92        92

Z                -2.0796
Significance      0.0376

4.4.3  Online Citations

APPENDIX A8-7

Mean             68.099
Std. Deviation   67.747

                 Boolean   Weighted
Mean Rank          83.11     98.80
Cases                 90        91

Z                -2.0148
Significance      0.0439

4.4.4  Offline Citations

Mean              7.940
Std. Deviation    4.813

                 Boolean   Weighted
Mean Rank         108.83     76.17
Cases                 92        92

Z                -4.1729
Significance      0.0000

4.4.5 (a)  Terms Used in the Search

Mean             -0.011
Std. Deviation    3.842

                 Boolean   Weighted
Mean Rank          82.29    103.59
Cases                 92        93

Z                -2.7621
Significance      0.0057

4.4.5 (b)  Terms Added and Amended

APPENDIX A8-8

Mean             15.912
Std. Deviation   12.166

                 Boolean   Weighted
Mean Rank          97.88     85.26
Cases                 90        92

Z                -1.6179
Significance      0.1057

4.5.2 (a)  Total Relevant1 Retrieved

Mean             24.253
Std. Deviation   15.268

                 Boolean   Weighted
Mean Rank          97.18     85.94
Cases                 90        92

Z                -1.4400
Significance      0.1499

4.5.2 (b)  Total Relevant2 Retrieved

Mean              0.477
Std. Deviation    0.275

                 Boolean   Weighted
Mean Rank          87.92     95.01
Cases                 90        92

Z                -0.9080
Significance      0.3639

4.5.3 (a)  Precision1

APPENDIX A8-9

Mean              0.708
Std. Deviation    0.267

                 Boolean   Weighted
Mean Rank          86.67     96.22
Cases                 90        92

Z                -1.2266
Significance      0.2200

4.5.3 (b)  Precision2