Distinguishing Human from Bot Texts: A Graph-Based and Few-Shot Learning Approach
1 (Computer Engineering Department, College of Alborz, University of Tehran, Tehran, Iran)
Abdol-Hossein Vahabie 2 (Tehran University)
Keywords: Arabic text bot detection, Graph neural networks, Graph attention networks, Graph convolutional networks, SetFit model
Abstract:
Bots are automated programs created to carry out specific tasks on the internet. On social networks, bots frequently disseminate automated or misleading content, which undermines the integrity and reliability of these platforms. Detecting and mitigating bot activity is therefore crucial for maintaining the trustworthiness of social media environments. The task is harder still in low-resource languages such as Arabic, where intricate linguistic structures and limited annotated datasets complicate accurate classification. To address this, we introduce a framework for distinguishing human-generated from bot-generated Arabic text, using the AutoTweet-Dataset. The framework evaluates two categories of models: graph neural networks and the SetFit model. The first category comprises two architectures: graph attention networks and graph convolutional networks. The SetFit model, in contrast, relies on few-shot learning to classify efficiently from a small number of labeled examples. Beyond building an accurate detector for bot-generated text, our primary objective is to compare graph neural network-based models with the SetFit model in handling the complexities of Arabic text processing. The evaluation results show that the SetFit model achieved the highest accuracy, 88.35%, demonstrating its effectiveness in differentiating between text generated by humans and by bots. This research advances bot detection techniques for low-resource languages. By introducing scalable and efficient methods, it improves the accuracy of automated content detection and contributes to the security and authenticity of social media interactions in the face of increasingly sophisticated bot activity.
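For readers unfamiliar with graph convolutional networks, the following is a minimal sketch of a single graph-convolution step in the commonly used symmetric-normalized form, ReLU(D^{-1/2}(A + I)D^{-1/2} X W). It is illustrative only and is not the architecture evaluated in this work: the three-node toy graph, the two-dimensional node features, and the identity weight matrix are all assumptions made for the example, and the abstract does not state how the text graph is constructed.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features during aggregation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    a_norm = normalize_adjacency(adj)
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Hypothetical toy graph: a path 0 - 1 - 2 (e.g. three related text nodes).
toy_adj = [[0.0, 1.0, 0.0],
           [1.0, 0.0, 1.0],
           [0.0, 1.0, 0.0]]
# Hypothetical 2-dimensional node features (stand-ins for text embeddings).
toy_features = [[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]]
# Identity weights, so the output shows the neighborhood averaging alone.
toy_weights = [[1.0, 0.0],
               [0.0, 1.0]]

toy_output = gcn_layer(toy_adj, toy_features, toy_weights)
```

Each output row is a degree-weighted mix of a node's own features and those of its neighbors. A graph attention network replaces these fixed normalization coefficients with learned attention weights over the same neighborhood.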