A Dataset for Hierarchical Diversity Measures
Xiaojie Wang, wangxiaojie@ruc.edu.cn
Zhicheng Dou, dou*AT*ruc*dot*edu*dot*cn
Renmin University of China
Overview
This page provides a dataset for hierarchical diversity measures, together with evaluation results for runs submitted to the Text REtrieval Conference (TREC) 2009-2013.
More details about hierarchical diversity measures can be found in the following paper. If you use this dataset for research, please kindly cite it.
X. Wang, Z. Dou, T. Sakai, and J. Wen. Evaluating Search Result Diversity using Intent Hierarchies. In SIGIR, 2016.
@inproceedings{Wang:SIGIR16:HEVAL,
author = {Wang, Xiaojie and Dou, Zhicheng and Sakai, Tetsuya and Wen, Ji-Rong},
title = {Evaluating Search Result Diversity using Intent Hierarchies},
booktitle = {Proceedings of SIGIR '16},
year = {2016},
}
Download our copy: [Evaluating Search Result Diversity using Intent Hierarchies]
Intent Hierarchies
For each topic (query) in TREC 2009-2013, the official intents are manually grouped into an original intent hierarchy (OIH).
These original intent hierarchies can be found in Original-Intent-Hierarchies.txt. Note that the official intents that receive no relevant documents are removed.
We provide the original intent hierarchies in the following format (separated by tab):
<Topic ID> <1st-layer Node ID> <1st-layer Node Weight> <2nd-layer Node ID> <2nd-layer Node Weight>, etc.
The leaf node IDs correspond one-to-one with the official intent IDs. A parent node ID is the minimum of its child nodes' IDs. A minimal parsing sketch is given after the example below.
For example, the original intent hierarchy for topic No. 77 in TREC 2010, which has 4 official intents, looks like the following:
77    2    0.25000000
77    1    0.75000000    4    0.33333333
77    1    0.75000000    1    0.66666667    1    0.50000000
77    1    0.75000000    1    0.66666667    3    0.50000000
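Each line of the example above describes one root-to-leaf path of the hierarchy. The following Python sketch shows one possible way to read Original-Intent-Hierarchies.txt back into per-topic paths; it assumes the tab-separated layout described above, and the function name is ours for illustration, not part of the dataset.

    # Minimal sketch: parse the original intent hierarchies (assumed layout:
    # <Topic ID> followed by (node ID, node weight) pairs, one pair per layer).
    from collections import defaultdict

    def load_intent_hierarchies(path="Original-Intent-Hierarchies.txt"):
        """Return {topic ID: list of root-to-leaf paths}, each path being a
        list of (node ID, node weight) tuples ordered from the 1st layer down."""
        hierarchies = defaultdict(list)
        with open(path, encoding="utf-8") as f:
            for line in f:
                fields = line.rstrip("\n").split("\t")
                if not fields[0]:
                    continue  # skip blank lines
                topic_id = fields[0]
                # Remaining fields come in (node ID, node weight) pairs,
                # one pair per layer; skip empty trailing columns if present.
                path = [(int(fields[i]), float(fields[i + 1]))
                        for i in range(1, len(fields) - 1, 2)
                        if fields[i]]
                hierarchies[topic_id].append(path)
        return hierarchies

    # Usage: print the root-to-leaf paths of topic 77 (leaf IDs are intent IDs).
    # for path in load_intent_hierarchies()["77"]:
    #     print(" -> ".join(f"node {nid} (weight {w:.4f})" for nid, w in path))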
Evaluation Results
We provide evaluation results for runs submitted to TREC 2009-2013, organized first by year and then by measure.
These evaluation results can be found in EvaluationResults.zip.
We provide the evaluation results in the following format (separated by tab); a minimal loading sketch is given after the example below:
<Topic ID> <Run1 Score> <Run2 Score>, etc.
For example, the evaluation results by N-rec@20 using OIH in TREC 2010 look like the following:
      cmuComb10    cmuFuTop10D    cmuWi10D    ICTNETDV10R1
51    1.000        1.000          0.800       1.000
52    0.500        0.500          0.375       0.500
53    0.400        0.400          0.000       0.800
...
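The Python sketch below shows one possible way to load a single results file extracted from EvaluationResults.zip and average each run's score over topics. It assumes the tab-separated layout above (a header row of run names, then one row per topic); the function name and the file name in the usage comment are illustrative only.

    # Minimal sketch: load one evaluation-results file and compute per-run means.
    def load_eval_results(path):
        """Return (run names, {topic ID: {run name: score}})."""
        with open(path, encoding="utf-8") as f:
            header = f.readline().rstrip("\n").split("\t")
            run_names = [name for name in header[1:] if name]
            per_topic = {}
            for line in f:
                fields = line.rstrip("\n").split("\t")
                if not fields[0]:
                    continue  # skip blank lines
                per_topic[fields[0]] = {run: float(value)
                                        for run, value in zip(run_names, fields[1:])
                                        if value}
        return run_names, per_topic

    # Usage: average each run's score over all topics in one results file.
    # run_names, per_topic = load_eval_results("some-results-file.txt")  # illustrative name
    # for run in run_names:
    #     mean = sum(per_topic[t][run] for t in per_topic) / len(per_topic)
    #     print(f"{run}\t{mean:.3f}")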