Document Retrieval System Based on Topic Clustering Method
- DOI
- 10.2991/icst-18.2018.118
- Keywords
- document retrieval; topic model; clustering
- Abstract
Document retrieval aims to find documents in a collection of unstructured text that meet a user's information needs. A document retrieval system requires a search engine that performs the entire process automatically: processing the document text in the collection, feature selection, feature extraction, query text processing, and searching for documents relevant to the query. Three main factors determine search engine performance: the feature selection method, the method of weighting features in the document collection, and the method of searching documents in the collection. This paper applies one method to each factor. For feature selection, Term Frequency-Inverse Document Frequency based on Luhn's Idea is used to select document features. For feature weighting, Fuzzy Gibbs Latent Dirichlet Allocation is used as the feature extraction method that weights the document features. To search for documents relevant to the query, this paper uses a Document Retrieval based on Topic Clustering method. With this method, all documents are clustered based on the feature weights obtained through feature extraction. Clusters that are relevant to the combination of query terms are selected, and all documents in those clusters are displayed as search results. The results show that this method can retrieve the set of documents in the clusters relevant to the query. The method therefore eliminates the query-document distance calculation from the retrieval process, so the search process is expected to run faster.
- Copyright
- © 2018, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
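The retrieval step described in the abstract (cluster documents by their topic weights, then return every document in the cluster matching the query terms) can be illustrated with a minimal sketch. This is not the authors' implementation: the weights below are invented for illustration and stand in for the output of a topic model such as Fuzzy Gibbs Latent Dirichlet Allocation, clustering by dominant topic is a simplifying assumption, and all names (`DOC_TOPIC`, `TERM_TOPIC`, `build_clusters`, `select_cluster`, `retrieve`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical document-topic weights (invented values), standing in for
# the feature weights a topic model would produce for each document.
DOC_TOPIC = {
    "doc1": [0.80, 0.15, 0.05],
    "doc2": [0.70, 0.20, 0.10],
    "doc3": [0.10, 0.85, 0.05],
    "doc4": [0.05, 0.10, 0.85],
    "doc5": [0.20, 0.70, 0.10],
}

# Hypothetical term-topic weights (invented values) for the selected features.
TERM_TOPIC = {
    "retrieval": [0.6, 0.3, 0.1],
    "query": [0.7, 0.2, 0.1],
    "cluster": [0.1, 0.8, 0.1],
    "topic": [0.2, 0.7, 0.1],
    "fuzzy": [0.1, 0.1, 0.8],
}


def build_clusters(doc_topic):
    """Assign each document to the cluster of its highest-weighted topic.

    Assumption: one cluster per topic; the paper may cluster differently.
    """
    clusters = defaultdict(list)
    for doc_id, weights in doc_topic.items():
        dominant = max(range(len(weights)), key=weights.__getitem__)
        clusters[dominant].append(doc_id)
    return dict(clusters)


def select_cluster(query, term_topic, n_topics):
    """Sum the topic weights of the known query terms and pick the top topic.

    Returns None when no query term is a known feature.
    """
    scores = [0.0] * n_topics
    for term in query.lower().split():
        for k, weight in enumerate(term_topic.get(term, [])):
            scores[k] += weight
    if not any(scores):
        return None
    return max(range(n_topics), key=scores.__getitem__)


def retrieve(query, clusters, term_topic, n_topics):
    """Return all documents in the selected cluster.

    No query-document distance is computed at query time.
    """
    topic = select_cluster(query, term_topic, n_topics)
    if topic is None:
        return []
    return sorted(clusters.get(topic, []))


clusters = build_clusters(DOC_TOPIC)
results = retrieve("topic cluster", clusters, TERM_TOPIC, n_topics=3)
```

In this sketch the clustering is done once, ahead of time, so answering a query only requires scoring the query terms against the topics and looking up one cluster. That is the property the abstract points to when it says the query-document distance calculation is eliminated.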
Cite this article
TY  - CONF
AU  - P.M. Prihatini
AU  - I.K.G.D. Putra
AU  - I.A.D. Giriantari
AU  - M. Sudarma
PY  - 2018/12
DA  - 2018/12
TI  - Document Retrieval System Based on Topic Clustering Method
BT  - Proceedings of the International Conference on Science and Technology (ICST 2018)
PB  - Atlantis Press
SP  - 568
EP  - 573
SN  - 2589-4943
UR  - https://doi.org/10.2991/icst-18.2018.118
DO  - 10.2991/icst-18.2018.118
ID  - Prihatini2018/12
ER  -