Exploiting Document Boltzmann Machine in Query Extension
- DOI
- 10.2991/cnct-16.2017.80
- Keywords
- Document Boltzmann Machine, Query Extension, Model Selection, CIF.
- Abstract
Most work on query extension (QE) assumes that terms in a document are independent, and many QE models use the multinomial distribution to model feedback documents. We argue that in QE methods, the relevance model (RM) that generates the feedback documents should be modeled with a more suitable distribution, so that term associations in the feedback documents are handled naturally. Recently, the Document Boltzmann Machine (DBM) was proposed for document modeling in information retrieval; this model relaxes the independence assumption, i.e., it captures term dependency naturally. It has been shown that DBM can be seen as a generalization of the traditional unigram language model and that it achieves better ad hoc retrieval performance. In this paper, we replace the multinomial distribution in the traditional unigram RM method with DBM, while leaving the main QE framework unchanged to keep the model simple. The relevance model is thus estimated by a DBM trained on the feedback documents, called the relevance DBM (rDBM). The extended query is generated from the learnt rDBM, and we give the final extended query likelihood according to the parameter values in rDBM. One difficulty in learning rDBM is data sparseness, which can lead to overfitted rDBMs and harm retrieval performance. To solve this problem, we adopt Confident Information First (CIF) as the model selection principle to reduce the complexity of rDBM, which makes our proposed query extension method more efficient and practical. Experiments on several standard TREC collections show the effectiveness of our QE method with DBM and the model selection method.
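For orientation, the baseline that the abstract refers to as the traditional unigram RM method can be sketched as follows. This is only an illustrative sketch, not the authors' code: it assumes an RM1-style estimate, p(w|R) proportional to the sum over feedback documents of p(w|d) p(q|d), with Dirichlet-smoothed multinomial document models. The function names (`unigram_lm`, `relevance_model`, `expansion_terms`) and the smoothing parameter `mu` are hypothetical choices made for this example. The paper's proposal would replace the multinomial document model used here with a DBM.

```python
from collections import Counter


def unigram_lm(doc_tokens, collection_lm, mu=2000.0):
    """Return a Dirichlet-smoothed unigram (multinomial) model of one document.

    collection_lm maps term -> background probability (assumed precomputed).
    """
    counts = Counter(doc_tokens)
    length = len(doc_tokens)

    def prob(term):
        # Terms missing from the background model get a tiny floor probability.
        background = collection_lm.get(term, 1e-9)
        return (counts[term] + mu * background) / (length + mu)

    return prob


def relevance_model(query, feedback_docs, collection_lm, mu=2000.0):
    """RM1-style estimate: p(w|R) proportional to sum_d p(w|d) * p(q|d).

    Terms are treated as independent given the document, which is the
    assumption the abstract argues should be relaxed.
    """
    weights = Counter()
    for doc in feedback_docs:
        p = unigram_lm(doc, collection_lm, mu)
        query_likelihood = 1.0
        for term in query:
            query_likelihood *= p(term)
        for term in set(doc):
            weights[term] += p(term) * query_likelihood
    total = sum(weights.values())
    return {term: w / total for term, w in weights.items()}


def expansion_terms(rm, k=5):
    """Pick the k highest-probability terms as candidates for the extended query."""
    return sorted(rm, key=rm.get, reverse=True)[:k]
```

In this sketch the feedback documents are token lists from a first-pass retrieval, and the returned dictionary is a normalized distribution over the terms that occur in them.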
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Li-ming HUANG
AU  - Xiao-zhao ZHAO
AU  - Yue-xian HOU
AU  - Ya-ping ZHANG
PY  - 2016/12
DA  - 2016/12
TI  - Exploiting Document Boltzmann Machine in Query Extension
BT  - Proceedings of the International Conference on Computer Networks and Communication Technology (CNCT 2016)
PB  - Atlantis Press
SP  - 585
EP  - 592
SN  - 2352-538X
UR  - https://doi.org/10.2991/cnct-16.2017.80
DO  - 10.2991/cnct-16.2017.80
ID  - HUANG2016/12
ER  -