Search Result Diversification

Studies have shown that most queries submitted to search engines are short and ambiguous about the user's intent. Different users may have completely different information needs and goals when issuing exactly the same query. For example, User A may issue the query "apple" to find information about Apple Inc., while User B issues the same query to find information about the fruit. For such a query, a search engine returns a list of documents that mixes different topics, and each user has to spend time picking out the information he or she wants. Search Result Diversification is an effective way to address this problem: it returns a result list that covers as many aspects of the query as possible, so that most users can be satisfied by the top results.
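
The page does not say which diversification algorithm is used. Purely as an illustration, the sketch below implements Maximal Marginal Relevance (MMR), a classic greedy re-ranking technique swapped in here. At each step it picks the document that best balances relevance to the query against similarity to the results already chosen. The relevance scores, the term-set document representation, the Jaccard similarity, and the trade-off parameter `lam` are all assumptions of this sketch, not details from the original work.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Term-set similarity between two documents (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def marginal_score(doc: str, selected: List[str], relevance: Dict[str, float],
                   terms: Dict[str, Set[str]], lam: float) -> float:
    """Relevance to the query, penalized by similarity to already chosen results."""
    redundancy = max((jaccard(terms[doc], terms[s]) for s in selected), default=0.0)
    return lam * relevance[doc] - (1 - lam) * redundancy


def mmr_rerank(relevance: Dict[str, float], terms: Dict[str, Set[str]],
               k: int, lam: float = 0.5) -> List[str]:
    """Greedily build a top-k list using Maximal Marginal Relevance.

    lam = 1.0 reduces to plain relevance ranking; smaller values favor novelty.
    """
    selected: List[str] = []
    candidates = sorted(relevance)  # sorted so that ties break deterministically
    while candidates and len(selected) < k:
        best = max(candidates,
                   key=lambda d: marginal_score(d, selected, relevance, terms, lam))
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy "apple" query: d1 and d2 are about the company, d3 is about the fruit.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
terms = {
    "d1": {"apple", "iphone", "company"},
    "d2": {"apple", "iphone", "store"},
    "d3": {"apple", "fruit", "orchard"},
}
print(mmr_rerank(relevance, terms, k=2))           # ['d1', 'd3']
print(mmr_rerank(relevance, terms, k=2, lam=1.0))  # ['d1', 'd2']
```

With `lam=1.0` the ranking is purely by relevance, so both Apple Inc. documents fill the top two slots. With `lam=0.5` the near-duplicate `d2` is penalized and the fruit document `d3` moves up, so both interpretations of "apple" appear in the top two results.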

Query Facet/Dimension Mining

We address the problem of finding multiple groups of words or phrases that describe the underlying aspects of a query; we refer to these groups as query dimensions or facets. We assume that the important aspects of a query are usually presented, and repeated, in the query's top retrieved documents in the form of lists, so query facets can be mined by aggregating these significant lists.
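
The paragraph states the idea but not the algorithm. The sketch below is a much-simplified, hypothetical illustration of "aggregating significant lists": it assumes the lists have already been extracted from each top document (for example from HTML `<ul>`/`<ol>` elements, tables, or comma-separated enumerations), greedily clusters lists whose items overlap, and keeps only items repeated across at least `min_support` documents. The function names, the overlap coefficient, and both thresholds are assumptions of this sketch, not the group's published method.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def overlap(a: Set[str], b: Set[str]) -> float:
    """Overlap coefficient: shared items relative to the smaller set."""
    return len(a & b) / min(len(a), len(b))


def mine_facets(doc_lists: Dict[str, List[List[str]]],
                min_overlap: float = 0.5,
                min_support: int = 2) -> List[List[str]]:
    """Aggregate lists extracted from top-ranked documents into query facets.

    doc_lists maps a (hypothetical) document id to the lists already
    extracted from that document. Returns facets, best-supported first.
    """
    # Each cluster: (union of its items, [(doc id, item set), ...]).
    clusters: List[Tuple[Set[str], List[Tuple[str, Set[str]]]]] = []
    for doc in sorted(doc_lists):
        for raw in doc_lists[doc]:
            items = {x.strip().lower() for x in raw if x.strip()}
            if len(items) < 2:  # a single item is not a useful list
                continue
            for union, members in clusters:
                if overlap(items, union) >= min_overlap:
                    union |= items  # grows the cluster's set in place
                    members.append((doc, items))
                    break
            else:  # no similar cluster found: start a new one
                clusters.append((set(items), [(doc, items)]))

    facets: List[Tuple[int, List[str]]] = []
    for _, members in clusters:
        docs = {doc for doc, _ in members}
        if len(docs) < min_support:
            continue  # lists seen in too few documents are treated as noise
        support: Counter = Counter()
        for doc in docs:
            # Count each item at most once per document.
            support.update(set().union(*(s for d, s in members if d == doc)))
        ranked = sorted(support.items(), key=lambda p: (-p[1], p[0]))
        items = [item for item, count in ranked if count >= min_support]
        if items:
            facets.append((len(docs), items))

    facets.sort(key=lambda f: -f[0])  # stable sort: ties keep discovery order
    return [items for _, items in facets]


# Toy lists extracted from four top results for the query "apple".
doc_lists = {
    "d1": [["iPhone", "iPad", "Mac"], ["Gala", "Fuji", "Honeycrisp"]],
    "d2": [["iPhone", "Mac", "Apple Watch"]],
    "d3": [["Fuji", "Gala", "Granny Smith"]],
    "d4": [["Home", "About", "Contact"]],  # a navigation menu
}
print(mine_facets(doc_lists))  # [['iphone', 'mac'], ['fuji', 'gala']]
```

Here the repeated lists surface as a product facet and a fruit-variety facet, while the navigation menu, which appears in only one document, is discarded. Requiring repetition across documents is this sketch's stand-in for the assumption that important aspects are repeated in the top results.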


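To some extent, search engines have solved the information overload problem caused by the sheer scale of the Web. By entering a few simple keywords, users can find relevant websites or web pages among the vast content of the Internet. In recent years, however, with the rapid growth of the Internet, and of the mobile Internet in particular, the number of Internet documents and the richness and complexity of their content have increased greatly. As the Internet moves into the era of big data, users' information needs are also becoming more complex. Beyond basic information retrieval, there is a growing demand for in-depth understanding and aggregate analysis of large collections of related documents, a kind of information need that traditional Web search engines can no longer satisfy. To address this problem, we propose the concept of an "Internet Analysis Engine".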


Past Projects