Proceedings of the First International Conference on Applied Mathematics, Statistics, and Computing (ICAMSAC 2023)

Identifying Indonesian Sentences Containing Idiomatic Expression Using the BERT Model

Authors
A. A. I. N. Eka Karyawati¹,*, N. M. Yuli Cahyani¹
¹Udayana University, South Kuta, Jimbaran, Indonesia
*Corresponding author. Email: eka.karyawati@unud.ac.id
Corresponding Author
A. A. I. N. Eka Karyawati
Available Online 13 May 2024.
DOI
10.2991/978-94-6463-413-6_16
Keywords
Idiomatic Expression; BERT Classifier Model; Idiomatic Sentence Identification
Abstract

Idiomatic expressions are sequences of two or more words whose meaning cannot be predicted from the meanings of the individual words that compose them. They exist in almost all languages but are difficult to extract because no algorithm can precisely decipher their structure. As a result, most rule-based machine translation systems translate idiomatic expressions word by word, and the resulting translation does not convey the true meaning of the idiom. In this research, the BERT model is used to identify Indonesian sentences that contain idiomatic expressions. The dataset is a collection of basic Indonesian sentences, some containing idiomatic expressions and some not. It comprises 2000 sentences that were manually labeled as idiomatic or non-idiomatic based on an Indonesian idiom dictionary, with 1000 sentences per label. Classification using Bidirectional Encoder Representations from Transformers (BERT) obtained an accuracy of 0.97, precision of 0.96, recall of 0.98, and F1-score of 0.97, with a learning rate of 2e-5 and 5 epochs.
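The four scores reported in the abstract are standard binary-classification metrics. The following is a minimal sketch of how they are usually computed, assuming label 1 marks an idiomatic sentence and label 0 a non-idiomatic one. The function name `binary_metrics` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall, and F1 for the positive class (1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (assumption: 1 = idiomatic, 0 = non-idiomatic).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With a balanced dataset such as the one described (1000 sentences per label), accuracy and F1 tend to be close to each other, which is consistent with the reported values of 0.97 for both.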

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the First International Conference on Applied Mathematics, Statistics, and Computing (ICAMSAC 2023)
Series
Advances in Computer Science Research
Publication Date
13 May 2024
ISBN
978-94-6463-413-6
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - A. A. I. N. Eka Karyawati
AU  - N. M. Yuli Cahyani
PY  - 2024
DA  - 2024/05/13
TI  - Identifying Indonesian Sentences Containing Idiomatic Expression Using the BERT Model
BT  - Proceedings of the First International Conference on Applied Mathematics, Statistics, and Computing (ICAMSAC 2023)
PB  - Atlantis Press
SP  - 160
EP  - 169
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-413-6_16
DO  - 10.2991/978-94-6463-413-6_16
ID  - Karyawati2024
ER  -