Improving Topic Modeling Performance with Term Frequency for Feature Optimization on Final Project Dataset
- DOI
- 10.2991/978-94-6463-587-4_7
- Keywords
- Bag of Words; Final Project; Latent Dirichlet Allocation; Term Frequency-Inverse Document Frequency
- Abstract
Unstructured data is generated in huge volumes every day from sources such as emails, social media, and business documents. Although this data is extremely valuable for decision-making, it requires extensive preprocessing and feature extraction before it can be used. Text mining extracts meaningful patterns from large text datasets and reveals new insights through feature extraction, which identifies the distinctive attributes of the data. Topic modeling, especially with Latent Dirichlet Allocation, plays an important role in this process by uncovering the semantic structures hidden within large text collections. Latent Dirichlet Allocation operates on a Bag of Words representation, but it can be improved by integrating Term Frequency-Inverse Document Frequency, which filters out less important words and thereby improves topic accuracy and processing speed. This research compares the performance of the Bag of Words and Term Frequency-Inverse Document Frequency methods with Latent Dirichlet Allocation for extracting topics from Indonesian final project abstracts. Evaluation with the coherence score metric shows that Latent Dirichlet Allocation combined with Term Frequency-Inverse Document Frequency extracts the hidden topics better than Latent Dirichlet Allocation with Bag of Words.
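The abstract does not say exactly how TF-IDF is combined with LDA. The Python sketch below shows one plausible reading using only the standard library. First, each term is scored with TF-IDF. Next, any term whose TF-IDF weight never reaches a threshold in any document is dropped; this includes a word such as "penelitian" ("research") that appears in every abstract. The remaining Bag of Words counts are what would then be handed to LDA. The threshold rule, the function names (`bag_of_words`, `tfidf`, `filter_by_tfidf`, `umass_coherence`), and the toy Indonesian tokens are all hypothetical. The coherence function uses the UMass formulation as a stand-in, because the abstract does not state which coherence measure the authors used.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Bag of Words: raw term counts for each tokenized document."""
    return [Counter(doc) for doc in docs]


def tfidf(bows):
    """TF-IDF weight of every term in every (non-empty) document.

    tf = count / document length, idf = log(N / df), so a term that
    occurs in every document gets weight 0.
    """
    n_docs = len(bows)
    df = Counter(term for bow in bows for term in bow)  # each doc counts a term once
    weights = []
    for bow in bows:
        length = sum(bow.values())
        weights.append({term: (count / length) * math.log(n_docs / df[term])
                        for term, count in bow.items()})
    return weights


def filter_by_tfidf(bows, threshold):
    """Drop terms whose highest TF-IDF weight in any document is below `threshold`.

    Hypothetical filtering rule for illustration only; the surviving counts
    are what would be passed on to an LDA implementation.
    """
    best = Counter()
    for doc_weights in tfidf(bows):
        for term, weight in doc_weights.items():
            best[term] = max(best[term], weight)
    keep = {term for term, weight in best.items() if weight >= threshold}
    return [Counter({t: c for t, c in bow.items() if t in keep}) for bow in bows]


def umass_coherence(top_words, docs):
    """UMass coherence of a topic's top words, ordered most to least probable.

    Sum over i > j of log((D(w_i, w_j) + 1) / D(w_j)), where D counts the
    documents containing all the given words. Higher means more coherent.
    Assumes every top word occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_count(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log((doc_count(top_words[i], top_words[j]) + 1)
                              / doc_count(top_words[j]))
    return score


if __name__ == "__main__":
    # Toy tokenized "abstracts" (made-up data, not the paper's dataset).
    docs = [
        ["sistem", "informasi", "penelitian", "aplikasi", "web"],
        ["sistem", "jaringan", "penelitian", "keamanan", "jaringan"],
        ["aplikasi", "android", "penelitian", "web", "aplikasi"],
        ["jaringan", "keamanan", "penelitian", "server", "sistem"],
    ]
    bows = bag_of_words(docs)
    filtered = filter_by_tfidf(bows, threshold=0.1)
    print(sorted(set().union(*filtered)))  # vocabulary that survives the filter
    print(umass_coherence(["jaringan", "keamanan", "server"], docs))
```

The LDA training step itself is left out here. In practice, the filtered counts would be converted into the corpus format of an LDA library such as gensim (`LdaModel`), whose `CoherenceModel` supports both `u_mass` and `c_v` coherence. Keeping a term only if it is distinctive in at least one document removes corpus-wide words, whose IDF is zero, while preserving words that characterize individual abstracts.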
- Copyright
- © 2024 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Putu Manik Prihatini
AU  - I Ketut Gede Sudiartha
AU  - Sri Andriati Asri
AU  - Elina Rudiastari
AU  - I Nyoman Eddy Indrayana
AU  - Putu Indah Ciptayani
PY  - 2024
DA  - 2024/12/01
TI  - Improving Topic Modeling Performance with Term Frequency for Feature Optimization on Final Project Dataset
BT  - Proceedings of the International Conference on Sustainable Green Tourism Applied Science - Engineering Applied Science 2024 (ICoSTAS-EAS 2024)
PB  - Atlantis Press
SP  - 53
EP  - 62
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-587-4_7
DO  - 10.2991/978-94-6463-587-4_7
ID  - Prihatini2024
ER  -