Information Retrieval Using Matrix Methods
Case Study: Three Popular Online News Sites in Indonesia
- DOI
- 10.2991/acsr.k.220202.032
- Keywords
- Data mining; Text mining; Matrix methods; Cosine similarity; Sparse matrix
- Abstract
This research falls within data mining, specifically the sub-areas of information retrieval and text mining. It focuses on an approach for retrieving relevant online news documents that satisfy a specific similarity threshold, and on improving computational performance when retrieving relevant documents from large collections. The authors use news from three sites that are popular in Indonesia and ranked in the top 10 of the Alexa Traffic Rank (ATR) 2021: tribunnews.com, detik.com, and liputan6.com. To search for relevant news documents, the authors first determine the threshold value by calculating the average similarity value of the documents used as the experimental sample; this threshold then determines which documents, based on their similarity values, are retained. Several techniques support the research process: text mining with the Tala stemming method, representation of news documents using matrix methods, and finally the cosine similarity measure to determine the similarity between documents and the matrix-based search data. The results indicate that the approach using the matrix method together with matrix compression yields good computational performance, suggesting that it is suitable for implementation on large document collections.
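The abstract outlines a pipeline: represent documents as a term-document matrix, store it in compressed (sparse) form, score each document against a query with cosine similarity, cos(q, d) = (q · d) / (‖q‖ ‖d‖), and keep documents at or above a threshold derived from the average similarity. The Python sketch below illustrates that pipeline under assumptions not stated in the abstract: raw term counts as weights (the paper may use another weighting), a simple regex tokenizer in place of Tala stemming, a hand-rolled compressed sparse row (CSR) layout rather than any particular library, and the threshold taken as the mean query-document similarity over the sample. The function names `build_csr`, `query_vector`, `cosine_scores`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens only; the paper's Tala
    # stemming step for Indonesian is NOT reproduced here.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_csr(docs):
    """Build a compressed sparse row (CSR) term-document matrix.

    Row i is document i; only nonzero term counts are stored.
    Returns (vocab, data, indices, indptr).
    """
    vocab = {}
    data, indices, indptr = [], [], [0]
    for doc in docs:
        counts = Counter(tokenize(doc))
        for term in sorted(counts):
            col = vocab.setdefault(term, len(vocab))  # new term -> new column
            indices.append(col)
            data.append(counts[term])
        indptr.append(len(data))  # row i spans data[indptr[i]:indptr[i+1]]
    return vocab, data, indices, indptr


def query_vector(query, vocab):
    # Query terms absent from the collection are dropped; this rescales all
    # scores uniformly, so the set of documents passing the mean threshold
    # does not change.
    counts = Counter(t for t in tokenize(query) if t in vocab)
    return {vocab[t]: c for t, c in counts.items()}


def cosine_scores(qvec, data, indices, indptr):
    """Cosine similarity between the query and every row of the CSR matrix."""
    qnorm = math.sqrt(sum(v * v for v in qvec.values()))
    scores = []
    for i in range(len(indptr) - 1):
        start, end = indptr[i], indptr[i + 1]
        dot = sum(data[k] * qvec.get(indices[k], 0) for k in range(start, end))
        dnorm = math.sqrt(sum(data[k] ** 2 for k in range(start, end)))
        scores.append(dot / (qnorm * dnorm) if qnorm and dnorm else 0.0)
    return scores


def retrieve(query, docs):
    vocab, data, indices, indptr = build_csr(docs)
    scores = cosine_scores(query_vector(query, vocab), data, indices, indptr)
    # Assumption: threshold = mean similarity over the sample documents.
    threshold = sum(scores) / len(scores)
    # Documents with zero similarity are never returned, even if threshold is 0.
    relevant = [i for i, s in enumerate(scores) if s >= threshold and s > 0]
    return threshold, scores, relevant


if __name__ == "__main__":
    sample = [
        "harga beras naik di pasar",
        "timnas indonesia menang di piala asia",
        "harga cabai dan beras turun",
    ]
    threshold, scores, relevant = retrieve("harga beras", sample)
    print(relevant)  # [0, 2]
```

Storing only the nonzero entries keeps memory proportional to the number of distinct terms per document rather than to the full vocabulary, which is the property that makes a compressed matrix attractive for large news collections. A production system would more likely rely on an existing sparse-matrix library than on this hand-written layout.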
- Copyright
- © 2022 The Authors. Published by Atlantis Press International B.V.
- Open Access
- This is an open access article under the CC BY-NC license.
Cite this article
TY  - CONF
AU  - Ferry Wiranto
AU  - I Made Tirta
PY  - 2022
DA  - 2022/02/08
TI  - Information Retrieval Using Matrix Methods
BT  - Proceedings of the International Conference on Mathematics, Geometry, Statistics, and Computation (IC-MaGeStiC 2021)
PB  - Atlantis Press
SP  - 167
EP  - 172
SN  - 2352-538X
UR  - https://doi.org/10.2991/acsr.k.220202.032
DO  - 10.2991/acsr.k.220202.032
ID  - Wiranto2022
ER  -