Hate Speech Detection Based on Multiple Machine Learning Algorithms

Jialin Lu

doi:10.2991/978-94-6463-300-9_25


Authors

Jialin Lu¹,*

¹Computer Science, University of British Columbia, Vancouver, V6T1Z4, Canada

*Corresponding author. Email: jialinlu@student.ubc.ca

Corresponding Author

Jialin Lu

Available Online 27 November 2023.

DOI: 10.2991/978-94-6463-300-9_25
Keywords: Hate speech; Natural language processing; BERT
Abstract: Social media platforms such as Facebook, Twitter, and Reddit have seen a substantial surge in user base and popularity over the past decade, connecting billions of people worldwide. The major platforms have also become a place where users freely spread hate speech, which can be defined as offensive language directed against a specific group of people. Online hate speech has become a serious issue on social media platforms and can have negative psychological effects on the people targeted. Finding an effective model to classify a sequence as hate speech or not is therefore crucial. This paper treated the task as binary sequence classification, where the labels are hate speech and not hate speech, and conducted a comparative analysis of multiple models on the binary-label version of the ETHOS dataset. Four metrics (accuracy, recall, precision, and F1 score) were used to evaluate the trained or fine-tuned models, and the performance of each classification model trained or fine-tuned on the ETHOS dataset was analyzed to discover potential weaknesses of existing models. The results show that the single-task fine-tuned BERT classifier achieved the highest accuracy, recall, precision, and F1 score. Surprisingly, the simple probabilistic Naïve Bayes model also performed well on hate speech classification on the test set. Further experimentation shows that the predictions of the Naïve Bayes and BiLSTM models are strongly affected by the presence of words that are often associated with and used in hate speech.
Copyright: © 2023 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
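
Illustrative note: the abstract names four evaluation metrics for the binary task (accuracy, recall, precision, and F1 score). The sketch below shows how these are conventionally computed from true and predicted labels. It is not the author's code; the function name `binary_metrics`, the label encoding (1 = hate speech, 0 = not hate speech), and the toy labels are assumptions made for illustration and are not drawn from the ETHOS dataset or the paper's results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumed encoding (hypothetical): 1 = hate speech, 0 = not hate speech.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not results from the paper.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
toy_scores = binary_metrics(toy_true, toy_pred)
print(toy_scores)
```

With these toy labels there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four metrics come out to 0.75. Reporting precision and recall alongside accuracy matters for a task like this because a classifier can reach high accuracy while still missing many hateful sequences or flagging many benign ones.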


Volume Title: Proceedings of the 2023 International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2023)
Series: Advances in Computer Science Research
Publication Date: 27 November 2023
ISBN: 978-94-6463-300-9
ISSN: 2352-538X

Cite this article


TY  - CONF
AU  - Jialin Lu
PY  - 2023
DA  - 2023/11/27
TI  - Hate Speech Detection Based on Multiple Machine Learning Algorithms
BT  - Proceedings of the 2023 International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2023)
PB  - Atlantis Press
SP  - 244
EP  - 252
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-300-9_25
DO  - 10.2991/978-94-6463-300-9_25
ID  - Lu2023
ER  -
