Identifying Indonesian Sentences Containing Idiomatic Expression Using the BERT Model
- DOI
- 10.2991/978-94-6463-413-6_16
- Keywords
- Idiomatic Expression; BERT Classifier Model; Idiomatic Sentence Identification
- Abstract
Idiomatic expressions are sequences of two or more words whose meaning cannot be predicted from the meanings of the individual words that compose them. They exist in almost all languages but are difficult to extract because no algorithm can precisely decipher their structure. As a result, most rule-based machine translation systems translate idiomatic expressions word by word, and the resulting translations do not convey the true meaning of the idiom. In this research, the BERT model is used to identify Indonesian sentences that contain idiomatic expressions. The dataset is a collection of basic Indonesian sentences, some containing idiomatic expressions and some not. It comprises 2000 sentences, manually labeled as idiomatic or non-idiomatic based on an Indonesian idiom dictionary, with 1000 sentences per label. Classification using Bidirectional Encoder Representations from Transformers (BERT) obtained an accuracy of 0.97, precision of 0.96, recall of 0.98, and F1-score of 0.97, with a learning rate of 2e-5 and 5 epochs.
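For readers less familiar with the four reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1-score are derived from the counts of a binary confusion matrix, with "idiomatic" treated as the positive class. The function name `classification_metrics` and the counts in the usage example are hypothetical: the counts are not taken from the paper and were chosen only so that the rounded results happen to coincide with the values in the abstract.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary classification metrics from confusion counts.

    tp: idiomatic sentences predicted as idiomatic
    fp: non-idiomatic sentences predicted as idiomatic
    fn: idiomatic sentences predicted as non-idiomatic
    tn: non-idiomatic sentences predicted as non-idiomatic
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (NOT the paper's data).
metrics = classification_metrics(tp=98, fp=4, fn=2, tn=96)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that recall higher than precision, as in the abstract, means the classifier misses few idiomatic sentences at the cost of occasionally flagging a non-idiomatic one.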
- Copyright
- © 2024 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - A. A. I. N. Eka Karyawati
AU  - N. M. Yuli Cahyani
PY  - 2024
DA  - 2024/05/13
TI  - Identifying Indonesian Sentences Containing Idiomatic Expression Using the BERT Model
BT  - Proceedings of the First International Conference on Applied Mathematics, Statistics, and Computing (ICAMSAC 2023)
PB  - Atlantis Press
SP  - 160
EP  - 169
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-413-6_16
DO  - 10.2991/978-94-6463-413-6_16
ID  - Karyawati2024
ER  -