Prof. Tetsuya Sakai visited us and gave a talk on "Designing Test Collections based on Statistical Requirements"

2014/10/15

Evaluation conferences such as TREC (Text Retrieval Conference) and NTCIR (NII Testbeds and Community for Information access Research) build test collections every year to enable fair comparisons across research groups. However, the design principles they follow tend to be ad hoc: for example, organizers typically decide to build n=50 topics and then set the pool depth to pd=100, so that the test collections contain a "reasonable" number of relevant documents sampled from the large target corpora. In this talk, I will show that the topic set size n can be determined from statistical requirements, such as the maximum allowed length of confidence intervals, or the minimum performance difference to be detected under given probabilities of Type I and Type II errors. The pool depth can likewise be determined from the available assessment budget. I will also show that test collections need to be designed with specific evaluation measures in mind, and that this statistical approach can cut assessment costs dramatically.
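To make the idea of deriving n from statistical requirements concrete, here is a minimal sketch, not the method presented in the talk. It assumes a normal approximation (an exact treatment would use the t distribution, which yields a slightly larger n) and a known variance `sigma2` of per-topic score differences between two systems, e.g. estimated from pilot data. The function names and the example inputs are hypothetical. Because the required n grows with `sigma2`, and that variance differs from one evaluation measure to another, the sketch also shows why a collection has to be designed with a particular measure in mind.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def n_for_ci_width(sigma2: float, delta: float, alpha: float = 0.05) -> int:
    """Hypothetical helper: smallest n such that the 100(1-alpha)% confidence
    interval for the mean score difference is no wider than delta, i.e.
    2 * z_{alpha/2} * sqrt(sigma2 / n) <= delta (normal approximation)."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return ceil(4 * z * z * sigma2 / (delta * delta))


def n_for_min_difference(sigma2: float, min_d: float,
                         alpha: float = 0.05, beta: float = 0.20) -> int:
    """Hypothetical helper: smallest n such that a two-sided paired test at
    Type I error rate alpha detects a true mean difference min_d with
    Type II error rate beta (power 1 - beta), using the normal approximation
    n >= (z_{alpha/2} + z_beta)^2 * sigma2 / min_d^2."""
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = _STD_NORMAL.inv_cdf(1 - beta)
    return ceil((z_a + z_b) ** 2 * sigma2 / (min_d * min_d))


if __name__ == "__main__":
    # Hypothetical inputs: variance 0.01 of per-topic score differences,
    # target CI width / minimum detectable difference of 0.05.
    print(n_for_ci_width(0.01, 0.05))        # 62
    print(n_for_min_difference(0.01, 0.05))  # 32
```

Under these made-up inputs the confidence-interval requirement is the stricter one; with a different variance or tolerance the numbers change accordingly.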

Tetsuya's talk at RUC