Topic Model based Approach for Improved Indexing in Content based Document Retrieval
- DOI
- 10.2991/ijndc.2016.4.1.6
- Keywords
- Information Retrieval; CBIR; TFIDF; LDA; Per-Category Indexing; Inverted Indexing
- Abstract
Information retrieval systems play an essential role in web services. However, web services that let users upload files as attachments typically support only limited search conditions and often rely solely on the title or description that users provide during upload. We present a topic-model-based framework for fast and effective content-based document retrieval that draws on the actual contents of the attachment. The proposed system is analyzed and compared with conventional methods in several respects. In particular, we propose an efficient keyword extraction method based on Latent Dirichlet Allocation (LDA) and compare it with the Term Frequency-Inverse Document Frequency (TF-IDF) approach typically used in conventional systems. We also propose a per-category indexing structure and compare it with the existing total indexing scheme. Our experimental results validate the utility of the proposed system for web services that allow document attachments to be uploaded.
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
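The abstract contrasts TF-IDF keyword extraction with an LDA-based method, and a single total index with a per-category index. The following is a minimal Python sketch of the conventional side of that comparison only, not the authors' implementation: it ranks keywords by TF-IDF and builds both index layouts over a made-up toy corpus. The function names, the corpus, and the pre-assigned category labels are all assumptions for illustration; the abstract does not say how categories are assigned, and the LDA-based extraction is not reproduced here.

```python
import math
from collections import Counter, defaultdict

# Toy corpus: (doc_id, category, text). All values are invented for illustration.
DOCS = [
    ("d1", "sports", "football match goal team goal"),
    ("d2", "sports", "team coach match season"),
    ("d3", "finance", "stock market bank stock"),
    ("d4", "finance", "bank loan interest market"),
]

def tfidf_keywords(docs, top_k=2):
    """Return the top_k terms per document, ranked by TF-IDF."""
    tokenized = {doc_id: text.split() for doc_id, _, text in docs}
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    n_docs = len(tokenized)
    keywords = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        scores = {t: (c / len(tokens)) * math.log(n_docs / df[t])
                  for t, c in tf.items()}
        # Sort by descending score, then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[doc_id] = ranked[:top_k]
    return keywords

def build_total_index(docs, keywords):
    """Single inverted index over all documents: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, _, _ in docs:
        for term in keywords[doc_id]:
            index[term].add(doc_id)
    return index

def build_per_category_index(docs, keywords):
    """One inverted index per category: category -> term -> set of doc ids."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, category, _ in docs:
        for term in keywords[doc_id]:
            index[category][term].add(doc_id)
    return index

keywords = tfidf_keywords(DOCS)
total_index = build_total_index(DOCS, keywords)
per_category_index = build_per_category_index(DOCS, keywords)
```

In this sketch, a query routed to one category consults only that category's smaller index, whereas the total index is searched in full for every query. Whether that trade-off pays off in practice is what the article's experiments evaluate.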
Cite this article
TY  - JOUR
AU  - Moon Soo Cha
AU  - So Yeon Kim
AU  - Jae Hee Ha
AU  - Min-June Lee
AU  - Young-June Choi
AU  - Kyung-Ah Sohn
PY  - 2016
DA  - 2016/01/01
TI  - Topic Model based Approach for Improved Indexing in Content based Document Retrieval
JO  - International Journal of Networked and Distributed Computing
SP  - 55
EP  - 64
VL  - 4
IS  - 1
SN  - 2211-7946
UR  - https://doi.org/10.2991/ijndc.2016.4.1.6
DO  - 10.2991/ijndc.2016.4.1.6
ID  - Cha2016
ER  -