Proceedings of the International Conference on Sustainable Green Tourism Applied Science - Engineering Applied Science 2024 (ICoSTAS-EAS 2024)

Improving Topic Modeling Performance with Term Frequency for Feature Optimization on Final Project Dataset

Authors
Putu Manik Prihatini¹,*, I Ketut Gede Sudiartha¹, Sri Andriati Asri¹, Elina Rudiastari¹, I Nyoman Eddy Indrayana¹, Putu Indah Ciptayani¹
¹ Information Technology Department, Politeknik Negeri Bali, Bali, Indonesia
*Corresponding author. Email: manikprihatini@pnb.ac.id
Available Online 1 December 2024.
DOI
10.2991/978-94-6463-587-4_7
Keywords
Bag of Words; Final Project; Latent Dirichlet Allocation; Term Frequency-Inverse Document Frequency
Abstract

Huge volumes of unstructured data are generated every day from sources such as emails, social media, and business documents. Although this data is valuable for decision-making, it requires extensive preprocessing and feature extraction before it can be used. Text mining extracts meaningful patterns from large text datasets; a key step is feature extraction, which identifies the distinctive attributes of the data. Topic modeling, especially with Latent Dirichlet Allocation (LDA), plays an important role in this process by uncovering the semantic structures hidden within large text collections. LDA conventionally takes a Bag of Words (BoW) representation as input, but it can be improved by integrating Term Frequency-Inverse Document Frequency (TF-IDF), which filters out less important words and improves topic accuracy and processing speed. This research compares the performance of BoW and TF-IDF features with LDA for extracting topics from Indonesian final project abstracts. Evaluation with the coherence score metric shows that LDA combined with TF-IDF extracts the hidden topics better than LDA with BoW.
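
To make the difference between the two feature representations concrete, the following is a minimal sketch in Python, not the authors' implementation. The toy corpus, the function names, and the specific weighting variant (term frequency normalized by document length, multiplied by log(N / document frequency)) are all assumptions chosen for illustration; the abstract does not state which variant or library was used. The sketch only covers feature extraction and filtering; fitting Latent Dirichlet Allocation and computing coherence are left out.

```python
import math
from collections import Counter

# Hypothetical toy corpus of tokenized Indonesian abstracts (not data from the paper).
docs = [
    ["sistem", "informasi", "penjualan", "berbasis", "web"],
    ["sistem", "pakar", "diagnosa", "penyakit"],
    ["sistem", "informasi", "akademik", "berbasis", "mobile"],
]


def bag_of_words(doc):
    """Bag of Words: raw term counts for one document."""
    return Counter(doc)


def document_frequency(corpus):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    return df


def tf_idf(doc, df, n_docs):
    """Assumed variant: tf = count / len(doc), idf = log(n_docs / df)."""
    counts = Counter(doc)
    return {
        term: (count / len(doc)) * math.log(n_docs / df[term])
        for term, count in counts.items()
    }


def filter_terms(weighted_doc, threshold=0.0):
    """Keep only terms whose weight exceeds the (assumed) threshold."""
    return [term for term, weight in weighted_doc.items() if weight > threshold]


df = document_frequency(docs)
bow = [bag_of_words(d) for d in docs]
weights = [tf_idf(d, df, len(docs)) for d in docs]
filtered = [filter_terms(w) for w in weights]
```

In this toy corpus, "sistem" occurs in every document, so its inverse document frequency is log(3/3) = 0 and it is filtered out, whereas the Bag of Words representation keeps it with a count of 1 in each document. Passing the filtered vocabulary to a topic model is one plausible reading of how TF-IDF "filters out less important words" before Latent Dirichlet Allocation.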

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Sustainable Green Tourism Applied Science - Engineering Applied Science 2024 (ICoSTAS-EAS 2024)
Series
Advances in Engineering Research
Publication Date
1 December 2024
ISBN
978-94-6463-587-4
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Putu Manik Prihatini
AU  - I Ketut Gede Sudiartha
AU  - Sri Andriati Asri
AU  - Elina Rudiastari
AU  - I Nyoman Eddy Indrayana
AU  - Putu Indah Ciptayani
PY  - 2024
DA  - 2024/12/01
TI  - Improving Topic Modeling Performance with Term Frequency for Feature Optimization on Final Project Dataset
BT  - Proceedings of the International Conference on Sustainable Green Tourism Applied Science - Engineering Applied Science 2024 (ICoSTAS-EAS 2024)
PB  - Atlantis Press
SP  - 53
EP  - 62
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-587-4_7
DO  - 10.2991/978-94-6463-587-4_7
ID  - Prihatini2024
ER  -