ISSN: A/F

Bilingual Hate Speech Detection on Social Media: Amharic and Afaan Oromo

Abstract

Detecting hate speech on social media is especially difficult in bilingual settings, where language mixing complicates identification. This study focuses on Amharic and Afaan Oromo, two widely spoken languages in Ethiopia, and investigates how deep learning techniques can improve bilingual hate speech detection. It examines five aspects: the impact of language mixing on detection accuracy, the effectiveness of hybrid deep learning classifiers, the role of feature extraction techniques, the significance of linguistic features, and the influence of bilingual communication on hate speech propagation. Using classifiers such as CNN, BiLSTM, CNN-BiLSTM, and BiGRU, together with feature extraction methods such as Keras word embedding, word2vec, and FastText, the study finds that hybrid models outperform conventional approaches. Language mixing reduces detection accuracy, while advanced feature extraction techniques and the integration of linguistic features significantly improve performance. These results help address gaps in the existing literature and offer guidance for optimizing bilingual hate speech detection models. Future research should explore real-time detection methods and broader linguistic applications to strengthen hate speech mitigation on social media.
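To make the notion of language mixing concrete, the following is a minimal sketch of one way a post could be screened for mixed-script content before classification. It is not the method used in the study. It assumes that Ethiopic-script tokens stand in for Amharic and Latin-script tokens for Afaan Oromo, which is only a rough proxy, since either language can appear in the other script on social media. The function names `token_script` and `mixing_ratio` are hypothetical and introduced here for illustration only.

```python
import unicodedata


def token_script(token: str) -> str:
    """Label a token 'ethiopic', 'latin', 'mixed', or 'other' from its letters."""
    scripts = set()
    for ch in token:
        if not ch.isalpha():
            continue  # skip digits, punctuation, emoji
        name = unicodedata.name(ch, "")
        if name.startswith("ETHIOPIC"):
            scripts.add("ethiopic")
        elif name.startswith("LATIN"):
            scripts.add("latin")
        else:
            scripts.add("other")
    if not scripts:
        return "other"
    if len(scripts) == 1:
        return scripts.pop()
    return "mixed"


def mixing_ratio(text: str) -> float:
    """Share of Ethiopic/Latin tokens written in the less frequent of the two scripts.

    Assumption: script is used as a stand-in for language. Returns a value
    between 0.0 (single script) and 0.5 (evenly mixed).
    """
    labels = [token_script(t) for t in text.split()]
    ethiopic = labels.count("ethiopic")
    latin = labels.count("latin")
    total = ethiopic + latin
    if total == 0:
        return 0.0
    return min(ethiopic, latin) / total


if __name__ == "__main__":
    # Illustrative mixed-script string: one Ethiopic token, two Latin tokens.
    sample = "\u1230\u120b\u121d akkam jirta"
    print(round(mixing_ratio(sample), 3))
```

A score like this could serve as an extra input feature or as a way to split an evaluation set into mixed and unmixed posts, so that the effect of mixing on accuracy can be measured separately.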


How to Cite

Kanchan Vishwakarma (2025-03-06). Bilingual Hate Speech Detection on Social Media: Amharic and Afaan Oromo. JANOLI International Journal of Big Data, Volume ALIHWmliJRGzmKonHyii, Issue 1.