47.

5.

Final Discussion

Our main feeling at the completion of this project is some disappointment that the logistic model has not so far shown that it can reach the performance level of existing methods. Since the

new model is also somewhat more complex in application than the old, we have no grounds for claiming any practical use for logistic methods.

Nevertheless, there are some positive conclusions we can draw. The theoretical properties of the class of logistic models

make it particularly attractive as a vehicle for experimental and theoretical work. For example, the manner in which dependence

parameters can be incorporated into the model to any desired extent clearly makes the model suitable for investigation of dependence; work. similarly it would be suitable for query expansion

In both these cases, the theoretical work described in section 4 is relevant. Although we failed to achieve a simple,

practical rule for when the dimension of the model should be increased, we believe we have made a substantial step in that direction. Further, in the process we have gained some insight into the problem. We can state that the possible profitable use of a new

parameter depends not so much on whether that property exists or is correlated with relevance, as on whether we have enough data to make use of the property. Thus, we should no longer argue between

independence and dependence models on the grounds of the existence of dependence, but rather on the basis of whether we can use the fact. Furthermore, and irrespective of the dependent/independent

argument, the usefulness of a new parameter depends a great deal on how many parameters we have already.

On the problem of estimation which was one of the central problem areas originally identified, our progress has not been great. We had high hopes of the logistic approach because of the non-random

48.

nature of the estimating sample.(the same argument is used forcefully in the medical context). However, the hoped-for

benefits did not emerge; we now consider, as indicated in section 2.4.2, that the use of the complement method effectively negates that advantage of the logistic model. We further failed to obtain

much advantage from the ability to choose any prior distribution for the logistic model (although clearly there is scope for further experiments on these lines). It appears that the "prior"

that is implied in the RSJ model, "(7.5" formula, is as good as anything else we tried. It may be worth analysing the characteristics

of this implicit prior, and trying to reproduce them in the logistic model. Further, it may be that where very small samples are

concerned, the RSJ model (although not ideal) still has an advantage over the maximum posterior method used with the logistic model.

Finally, the method we have developed for realistic evaluation of feedback in searching should be useful for future experiments. Indeed, we would like to see a number of such

experiments, since the few that we have done suggest that the benefits to be obtained from relevance feedback have been exagerated by past experiments.

49.

Acknowledgement s

We are grateful for the availability of the test collections used for our experiments, immediately to Dr. K. Sparck Jones and Professor C.J. van Rijsbergen, and ultimately to Mr. L. Evans of INSPEC and Dr. P.K.T. Vaswani of the National Physical Laboratory.

50.

References

CROFT, W.B. (1981)

Document representation in probabilistic Journal of the American Society

models of information retrieval.

for Information Science, 32, 451-457.

CROFT, W.B. and

HARPER, D.J. (1979).

Using probabilistic models of Journal of

document retrieval without relevance information. Documentation, 35, 285-295.

DAWID, A.P. (1976) Properties of diagnostic data distributions. Biometrics, 32, 647-658.

EVANS, L. (1975a) Search strategy variations in SDI profiles. Report R75/1, INSPEC, London.

EVANS, L. (1975b) Methods of ranking SDI and IR output. Report R75/3, INSPEC, London.

HARPER, D.J. (1980).

Relevance feedback in document retrieval.

Ph.D. Thesis, University of Cambridge.

HARPER, D.J. and VAN RIJSBERGEN, C.J. (1978).

An evaulation of Journal

feedback in document retrieval using co-occurrence data. of Documentation, 34, 189-216.

PORTER, M.F. (1980)

An algorithm

for suffix stripping.

Program, 14, 130-137. Porter (1980).

Also in : van Rijsbergen, Robertson and

ROBERTSON, S.E. (1976).

A theoretical model of the retrieval Ph.D. Thesis,

characteristics of information retrieval systems. University of London.

ROBERTSON, S.E. (1977).

The probability ranking principle in IR.

Journal of Documentation, 33, 294-304

ROBERTSON, S.E. and SPARCK JONES, K. (1976). search terms. 2^7, 129-146.

Relevance weighting of

Journal of the American Society for Information Science,

51.

ROBERTSON, S.E., VAN RIJSBERGEN, C.J. and PORTER, M.F. (1981). Probabilistic models of indexing and searching. Information Also in :

Retrieval Research, Butterworths, London (pp 35-56). Van Rijsbergen, Robertson and Porter (1980).

SPARCK JONES, K. (1972) A statistical interpretation of term specificity and its application in retrieval. Documentation, 28, 11-21. Journal of

SPARCK JONES, K. (1979a) search terms.

Experiments in relevance weighting of

Information Processing and Management, 15, 133-144.

SPARCK JONES, K. (1979b).

Search term relevance weighting given Journal of Documentation, 35, 30-48

little relevance information.

SPARCK JONES, K. (1980).

Search term weighting : some recent

results. Journal of Information Science,!, 325-332.

SPARCK JONES K. and BATES, R.G. (1977) indexing 1974-1976.

Research on automatic

BL R&D Report No.5464

SPARCK JONES, K. and WEBSTER, C.A. (1980). Research on relevance weighting 1976-1979. BL R&D Report No.5553.

TITTERINGTON, D.M. et al (1981) Comparison of discrimination techniques applied to a complex data set of head injured patients. Journal of the Royal Statistical Society A, 144, 145-175.

VAN RIJSBERGEN, C.J. (1979)

Information retrieval

Butterworths, London (2nd edition),

VAN RIJSBERGEN, C.J. , HARPER, D.J. and PORTER, M.F. (1981). The selection of good search terms. Management, 17, 77-91. Porter (1980). Information Processing and

Also in Van Rijsbergen, Robertson and

VAN RIJSBERGEN, C.J., ROBERTSON, S.E. and PORTER, M.F. (1980). New models in probabilistic information retrieval. BL R & D Report No. 5587.

52.

VASWANI, P.K.T. and CAMERON, J.B. (1970)

The National Physical

Laboratory Experiments in statistical word associations and their use in document indexing and retrieval. Teddington. National Physical Laboratory,

YU, C.T.., LAM, K. and SALTON, G. (1982).

Term weighting in Journal

information retrieval using the term precision model.

of the Association for Computing Machinery, 29, 152-170.