International Journal of Networked and Distributed Computing

Volume 4, Issue 1, January 2016, Pages 55 - 64

Topic Model based Approach for Improved Indexing in Content based Document Retrieval

Authors
Moon Soo Cha, So Yeon Kim, Jae Hee Ha, Min-June Lee, Young-June Choi, Kyung-Ah Sohn
Corresponding Author
Moon Soo Cha
Available Online 1 January 2016.
DOI
10.2991/ijndc.2016.4.1.6
Keywords
Information Retrieval; CBIR; TFIDF; LDA; Per-Category Indexing; Inverted Indexing
Abstract

Information retrieval systems play an essential role in web services. However, web services that let users upload files as attachments typically offer limited search conditions and often rely only on the title or description that users provide at upload time. We present a topic-model-based framework for fast and effective content-based document information retrieval that retrieves information from the actual contents of the attachments. The proposed system is analyzed and compared with conventional methods in several respects. In particular, we propose an efficient keyword extraction method based on Latent Dirichlet Allocation (LDA) and compare it with the Term Frequency-Inverse Document Frequency (TF-IDF) approach typically used in conventional systems. We also propose a per-category indexing structure and compare it with the existing total indexing scheme. Our experimental results validate the utility of the proposed system for web services that support document attachment uploads.
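
To make the two indexing schemes named in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes keywords are selected with a plain TF-IDF score (the conventional baseline the abstract mentions; an LDA-based extractor could be swapped in at that step) and that each document already carries a category label. The function names, the toy documents, and the category labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tfidf_keywords(docs, top_k=3):
    """Return the top_k terms per document ranked by TF-IDF.

    docs maps a document id to its list of tokens (assumed input shape).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    keywords = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[doc_id] = ranked[:top_k]
    return keywords

def build_total_index(keywords):
    """Single inverted index over all documents: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, terms in keywords.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def build_per_category_index(keywords, categories):
    """One inverted index per category: category -> term -> set of doc ids."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, terms in keywords.items():
        for term in terms:
            index[categories[doc_id]][term].add(doc_id)
    return index

# Hypothetical toy corpus and category labels, for illustration only.
docs = {
    "d1": "neural network training network".split(),
    "d2": "network routing protocol".split(),
    "d3": "protein folding simulation".split(),
}
categories = {"d1": "ml", "d2": "networking", "d3": "biology"}

keywords = tfidf_keywords(docs)
total_index = build_total_index(keywords)
per_category_index = build_per_category_index(keywords, categories)

print(sorted(total_index["network"]))
print(sorted(per_category_index["ml"]["network"]))
```

In this sketch a query against the total index scans one posting list that spans every category, while a query against the per-category index only touches the posting list of the chosen category. Whether that trade-off matches the structure evaluated in the paper is an assumption here.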

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Journal
International Journal of Networked and Distributed Computing
Volume-Issue
4 - 1
Pages
55 - 64
Publication Date
2016/01/01
ISSN (Online)
2211-7946
ISSN (Print)
2211-7938

Cite this article

TY  - JOUR
AU  - Moon Soo Cha
AU  - So Yeon Kim
AU  - Jae Hee Ha
AU  - Min-June Lee
AU  - Young-June Choi
AU  - Kyung-Ah Sohn
PY  - 2016
DA  - 2016/01/01
TI  - Topic Model based Approach for Improved Indexing in Content based Document Retrieval
JO  - International Journal of Networked and Distributed Computing
SP  - 55
EP  - 64
VL  - 4
IS  - 1
SN  - 2211-7946
UR  - https://doi.org/10.2991/ijndc.2016.4.1.6
DO  - 10.2991/ijndc.2016.4.1.6
ID  - Cha2016
ER  -