An Experimental Study on Performance of Text Representation Models for Sentiment Analysis
Subject Areas: Data Mining
Sajjad Jahanbakhsh Gudakahriz 1, Amir Masoud Eftekhari Moghaddam 2 *, Fariborz Mahmoudi 3
1 - Qazvin Branch, Islamic Azad University, Qazvin, Iran
2 - Qazvin Branch, Islamic Azad University, Qazvin, Iran
3 - General Motors Company
Keywords: Text Representation Models, Sentiment Analysis, Sentiment Classification, Ensemble Classifiers
Abstract:
Sentiment analysis in social networks has been an active research field since 2000 and is highly useful in the decision-making processes of various domains and applications. Its goal is to analyze the opinion texts posted on social networks and other web-based resources and to extract the necessary information from them. The data collected from social networks and websites have no structured format, and this lack of structure is the main challenge in handling such data. To overcome this challenge, the texts must first be represented with a text representation model; the required analysis can then be carried out. Research on text modeling started a few decades ago, and various models have been proposed since. The main purpose of this paper is to evaluate the efficiency and effectiveness of a number of common and well-known text representation models for sentiment analysis. The evaluation is carried out by using these models for sentiment classification with ensemble methods: after preprocessing, the texts are represented by the selected models, and an ensemble classifier performs the sentiment classification. The selected models are TF-IDF, LSA, Word2Vec, and Doc2Vec, and the evaluation measures are Accuracy, Precision, Recall, and F-Measure. The results show that, in general, the Doc2Vec model performs better than the other models in sentiment analysis, with a best accuracy of 0.72.
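The four evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state which ensemble scheme is used, so simple majority voting is assumed here purely for illustration; the function names, the "pos"/"neg" labels, and the toy predictions are all hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine label lists from several classifiers by simple majority voting.

    Assumption: the paper's ensemble method is not specified in the abstract;
    majority voting is used here only as an illustrative stand-in.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        # most_common(1) returns [(label, count)] for the most frequent label
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def binary_metrics(y_true, y_pred, positive="pos"):
    """Compute Accuracy, Precision, Recall, and F-Measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical toy data: gold labels and the outputs of three base classifiers.
y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
clf_a = ["pos", "neg", "neg", "pos", "pos", "neg"]
clf_b = ["pos", "pos", "neg", "neg", "neg", "neg"]
clf_c = ["neg", "pos", "pos", "pos", "pos", "neg"]

ensemble = majority_vote([clf_a, clf_b, clf_c])
metrics = binary_metrics(y_true, ensemble)
print(ensemble)
print(metrics)
```

In a comparison such as the one described, the same measures would be computed once per text representation model, so that the models can be ranked on identical test data.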